Senior Software Engineer — AI Evaluation & Benchmarks (Python)

G2i is a hiring platform for engineers by engineers focused on React & React Native.

Employee count: 11-50

Salary: 166k-208k USD

Locations: AL, AT + 38 more

Before Applying

This role is open to contractors in accepted locations only. Please confirm your country is on the list before applying; we're unable to process applications from unlisted locations. See the list of accepted countries and locations.

For US applicants: This is a 1099 independent contractor role. It is not compatible with F-1 OPT, STEM OPT, or any visa status that requires W-2 employment, guaranteed hours, or employer sponsorship. We are unable to provide offer letters or employment verification for this role.

What You'll Be Doing

Design and build the coding benchmarks and evaluation pipelines used to test frontier AI models on real software engineering work:

  • Design coding benchmarks that evaluate frontier models on real-world programming tasks — reasoning, debugging, and production-quality code

  • Build and maintain scalable data pipelines for evaluation workflows

  • Analyze model-generated code for correctness, reliability, and edge-case failures

  • Construct structured evaluation scenarios across large repos and multi-language environments

  • Provide detailed technical feedback on model performance and failure patterns

  • Contribute to evaluation frameworks that set the bar for how coding ability is measured

End result: benchmarks that meaningfully separate what frontier models can and can't do — and shape how the next generation is trained and improved.

AI coding evaluation in one line: Design task → build harness → run model → analyze failures → feed findings back into the benchmark → evaluations that actually distinguish strong models from weak ones.
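As a rough illustration of that loop, the sketch below wires a toy harness together in Python. Everything in it is an assumption made for illustration and none of it comes from the posting: the `Task` and `Result` types, the `run_benchmark` and `summarize` helpers, and the two stand-in "models" (plain functions that map a prompt to an output string) are all hypothetical. A real harness would call an actual model and run real test suites in a sandbox.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    """Hypothetical benchmark task: a prompt plus a pass/fail check."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # True when the model output is acceptable


@dataclass
class Result:
    task: str
    passed: bool
    failure: Optional[str]  # short failure label, None on success


def run_benchmark(tasks, model):
    """Run every task through the model and record pass/fail per task."""
    results = []
    for task in tasks:
        try:
            output = model(task.prompt)
            passed = task.check(output)
            failure = None if passed else "wrong_output"
        except Exception as exc:  # a crash counts as a failure, labeled by type
            passed, failure = False, type(exc).__name__
        results.append(Result(task.name, passed, failure))
    return results


def summarize(results):
    """Aggregate the pass rate and count failures by label."""
    total = len(results)
    passed = sum(r.passed for r in results)
    failures = Counter(r.failure for r in results if not r.passed)
    return {
        "pass_rate": passed / total if total else 0.0,
        "failures": dict(failures),
    }


# Two toy tasks (assumed for illustration).
tasks = [
    Task("add", "return 2+3", lambda out: out.strip() == "5"),
    Task("upper", "uppercase 'abc'", lambda out: out == "ABC"),
]

# Stand-in models: lookup tables instead of real model calls.
strong_answers = {"return 2+3": "5", "uppercase 'abc'": "ABC"}
weak_answers = {"return 2+3": "5"}  # has no answer for the second task


def strong_model(prompt):
    return strong_answers[prompt]


def weak_model(prompt):
    return weak_answers[prompt]  # raises KeyError on unknown prompts


strong_summary = summarize(run_benchmark(tasks, strong_model))
weak_summary = summarize(run_benchmark(tasks, weak_model))
print(strong_summary)
print(weak_summary)
```

The failure counts are the part that would feed back into benchmark design: if every model fails a task for the same reason, the task or its check may be at fault rather than the models.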

What You'll Need

  • 4+ years of professional software engineering experience (non-negotiable)

  • Expert Python — clean, performant, well-tested code

  • Hands-on experience working in large, complex codebases

  • Proven experience designing and implementing LLM coding benchmarks and evaluation data pipelines

  • Strong command of Git and modern development workflows

  • Track record at a high-growth tech company or top-tier software organization

  • Strong written English communication

Identity verification: Applicants will be required to verify their identity and confirm they have valid documentation to work as an independent contractor in their country of residence.

Nice to have

  • Senior or Lead-level profile with a history of technical ownership

  • Bachelor's or Master's in CS, ML, or related field (or equivalent professional experience)

  • Proficiency in additional languages: JavaScript, Go, C++, or others

  • CI/CD experience and writing robust unit tests (pytest, Mocha, JUnit)

  • Background in security engineering or significant open-source contributions

  • Familiarity with AI/ML evaluation methodologies or model benchmarking

Logistics

  • Location: Fully remote — work from anywhere on the accepted locations list

  • Compensation: $80–$100/hr based on location and seniority

  • Contract length: 3 months, with potential for extension

  • Hours: Full-time availability preferred — hours vary by project and are not guaranteed week to week

  • Engagement: 1099 independent contractor

  • Payment: Weekly via PayPal or Stripe

⚠️ Important: Hours are project-dependent and can vary week to week. We recommend keeping other work options open alongside this engagement rather than relying on it as your sole source of income.

About the job

Job type: Contractor

About G2i

G2i is a hiring platform for engineers by engineers focused on React & React Native. We provide engineers on contract or direct hire to companies big and small (from 1 billion to 1 million in valuation).

We are passionate about the React ecosystem and are involved in the community. Everyone on the team has programming experience of some sort, and we offer training for new team members, whether on the sales side or the hiring team.

We technically vet every candidate that comes through our platform so companies don't have to waste their time with candidates that won't pass technical interviews.

Employee benefits

  • Paid vacation: 21 days of paid time off

  • 4-day workweeks: we work 4-day, 32-hour weeks

  • Healthcare benefits: 95% of base healthcare premium covered
Company size: 11-50 employees

Founded in: 2012

Chief executive officer: Gabe Greenberg
