Senior AI Infrastructure Engineer

WEX Inc. is a global commerce platform that simplifies the business of running a business by offering personalized technology solutions for employee benefits, mobility and fleet management, and corporate payments.

Employee count: 5000+

Salary: 122k-146k USD

United States only

This is a remote position; however, the candidate must reside within 30 miles of one of the following locations: Portland, ME; Boston, MA; Chicago, IL; Dallas, TX; San Francisco Bay Area, CA; or Seattle, WA.

About the Team

We are the backbone of the AI organization, building the high-performance compute foundation that powers our generative AI and machine learning initiatives. Our team bridges the gap between hardware and software, ensuring that our researchers and data scientists have a reliable, scalable, and efficient platform to train and deploy models. We focus on maximizing GPU utilization, minimizing inference latency, and creating a seamless "paved road" for AI development.

How You’ll Make an Impact

You are a systems thinker who loves solving hard infrastructure challenges. You will architect the underlying platform that serves our production AI workloads, ensuring they are resilient, secure, and cost-effective. By optimizing our compute layer and deployment pipelines, you will directly accelerate the velocity of the entire AI product team, transforming how we deliver AI at scale.

Responsibilities

  • Platform Architecture: Design and maintain a robust, Kubernetes-based AI platform that supports distributed training and high-throughput inference serving.

  • Inference Optimization: Engineer low-latency serving solutions for LLMs and other models, optimizing engines (e.g., vLLM, TGI, Triton) to maximize throughput and minimize cost per token.

  • Compute Orchestration: Manage and scale GPU clusters in cloud (AWS) or on-prem environments, implementing efficient scheduling, auto-scaling, and spot instance management to optimize costs.

  • Operational Excellence (MLOps): Build and maintain "Infrastructure as Code" (Terraform/Ansible) and CI/CD pipelines to automate the lifecycle of model deployments and infrastructure provisioning.

  • Reliability & Observability: Implement comprehensive monitoring (Prometheus, Grafana) for GPU health, model latency, and system resource usage; lead incident response for critical AI infrastructure.

  • Developer Experience: Create tools and abstraction layers (SDKs, CLI tools) that allow data scientists to self-serve compute resources without managing underlying infrastructure.

  • Security & Compliance: Ensure all AI infrastructure meets strict security standards, handling sensitive data encryption and access controls (IAM, VPCs) effectively.

Experience You’ll Bring

  • 5+ years of experience in DevOps, Site Reliability Engineering (SRE), or Platform Engineering, with at least 2 years focused on Machine Learning infrastructure.

  • Production Expertise: Proven track record of managing large-scale production clusters (Kubernetes) and distributed systems.

  • Hardware Fluency: Deep understanding of GPU architectures (NVIDIA A100/H100), CUDA drivers, and networking requirements for distributed workloads.

  • Serving Proficiency: Experience deploying and scaling open-source LLMs and embedding models using containerized solutions.

  • Automation First: Strong belief in "Everything as Code"—you automate toil wherever possible using Python, Go, or Bash.

Technical Skills

  • Core Engineering: Expert proficiency in Python and Go; comfortable digging into lower-level system performance.

  • Orchestration & Containers: Mastery of Kubernetes (EKS/GKE), Helm, Docker, and container runtimes. Experience with Ray or Slurm is a huge plus.

  • Infrastructure as Code: Advanced skills with Terraform, CloudFormation, or Pulumi.

  • Model Serving: Hands-on experience with serving frameworks like Triton Inference Server, vLLM, Text Generation Inference (TGI), or TorchServe.

  • Cloud Platforms: Deep expertise in AWS (EC2, EKS, SageMaker) or GCP, specifically regarding GPU instance types and networking.

  • Observability: Proficiency with Prometheus, Grafana, DataDog, and tracing tools (OpenTelemetry).

  • Networking: Understanding of service mesh (Istio), load balancing, and high-performance networking (RPC, gRPC).

The base pay range represents the anticipated low and high end of the pay range for this position. Actual pay rates will vary and will be based on various factors, such as your qualifications, skills, competencies, and proficiency for the role. Base pay is one component of WEX's total compensation package. Most sales positions are eligible for commission under the terms of an applicable plan. Non-sales roles are typically eligible for a quarterly or annual bonus based on their role and applicable plan. WEX's comprehensive and market-competitive benefits are designed to support your personal and professional well-being. Benefits include health, dental, and vision insurance, retirement savings plan, paid time off, health savings account, flexible spending accounts, life insurance, disability insurance, tuition reimbursement, and more. For more information, check out the "About Us" section.

Pay Range: $121,500.00 - $145,500.00

About the job

Job type

Full Time

Experience level

Senior

Hiring timezones

United States +/- 0 hours

About WEX

WEX Inc. is a global commerce platform that simplifies the business of running a business. Many of our customers face the challenge of managing complex operational processes due to the rapid pace of regulatory, economic, and societal change worldwide. They are often stretched thin and lack the in-house expertise to solve these intricate problems. This is why WEX offers personalized technology solutions designed to simplify employee benefits, mobility and fleet management, and accounts payable and receivables processes. From our origins as a pioneer in fleet card payments in 1983, we have expanded our scope to become a multi-channel provider of corporate payment solutions, helping businesses navigate these complexities and achieve greater efficiency.

Our customers in the fleet industry, for example, need robust tools to manage fuel and maintenance expenses, ensure driver safety, and optimize operations. WEX Fleet provides them with fuel cards, telematics, and data analytics to meet these needs. For businesses involved in travel, managing cross-border payments and streamlining back-end accounting can be a significant hurdle. WEX's travel and corporate solutions, including virtual payment solutions, help these clients automate processes, reduce costs, and gain better insights into their spending. Similarly, in the healthcare sector, employers and employees alike grapple with the administration of benefits and healthcare payments. WEX Health offers a cloud-based platform to simplify the management of Health Savings Accounts (HSAs), Flexible Spending Accounts (FSAs), and other benefit plans, making it easier for millions of consumers to manage their healthcare expenses. By embedding our solutions into our customers' workflows and leveraging our expertise in data and analytics, we empower them to make smarter decisions, reduce operating costs, and ultimately, reach their full potential.

Employee benefits

Company equity

WEX offers company equity.

Life insurance

WEX offers life insurance.

Paid sick days

WEX provides paid sick days.

Sabbatical

WEX offers sabbatical leave.
