Lead Site Reliability Engineer - Data Platforms

Jobgether

Salary: 125k-162k USD

United States only

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Lead Site Reliability Engineer - Data Platforms in the United States.

We are seeking a Lead Site Reliability Engineer to manage and optimize data platform operations, ensuring high availability, scalability, and performance. In this role, you will oversee cloud infrastructure, end-to-end data pipelines, and containerized applications while collaborating closely with data science, ML/GenAI, and development teams. You will implement Infrastructure as Code, monitor systems and improve their observability, and troubleshoot complex issues to maintain operational excellence. The ideal candidate thrives in a fast-paced environment, embraces automation, and drives innovation across cloud and data platforms. This role combines hands-on technical expertise with strategic system design, delivering measurable impact on business-critical data workflows.

Accountabilities

  • Manage cloud-based infrastructure, including AWS services (S3, EMR, Redshift) and containerized environments (ECS, Docker), to support data pipelines and ML/GenAI workloads.
  • Design, deploy, and maintain automated infrastructure using tools like Terraform, Chef, Ansible, and CI/CD pipelines.
  • Monitor and enhance observability across data systems, applications, and platforms.
  • Collaborate with engineering and ML teams to optimize the performance, reliability, and scalability of data and AI systems.
  • Participate in code/design reviews, troubleshoot complex system issues, and document root cause analyses (RCAs).
  • Support release planning, on-call rotation, and problem resolution to ensure uninterrupted data operations.

Requirements

  • 8+ years of experience with Big Data technologies, data pipelines, and Linux administration.
  • Strong scripting proficiency in Bash or Python.
  • 5+ years managing cloud platforms (AWS, Azure) with hands-on experience in ECS, EKS, AKS, Terraform, and Helm.
  • Experience with Infrastructure as Code, configuration management (Chef, Ansible), CI/CD tools (Jenkins), and version control systems (Git).
  • Familiarity with Generative AI platforms (SageMaker, Bedrock, Azure ML) and vector databases.
  • Solid knowledge of networking (DNS, load balancers), MySQL, Apache Spark, and BI/data lake platforms.
  • Excellent communication skills; self-driven and capable of independently resolving complex issues and delivering projects on time.
  • Strong interest in AI technologies and continuous improvement of operational practices.

Benefits

  • Competitive salary ($125,000 - $162,000), based on experience, skills, and location.
  • Fully remote work with flexibility to balance personal and professional life.
  • Comprehensive healthcare and benefits package.
  • Opportunities to work with cutting-edge Big Data, ML, and GenAI technologies.
  • Professional growth through collaboration with cross-functional global teams.
  • Supportive, inclusive, and innovation-driven company culture.

Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.

When you apply, your profile goes through our AI-powered screening process designed to identify top talent efficiently and fairly.
🔍 Our AI thoroughly evaluates your CV and LinkedIn profile, analyzing your skills, experience, and achievements.
📊 It compares your profile to the job’s core requirements and historical success factors to determine your match score.
🎯 Based on this analysis, the top 3 candidates with the highest match are automatically shortlisted for the role.
🧠 When necessary, our human team performs an additional manual review to ensure no strong profile is overlooked.

This process is transparent, skills-based, and free of bias — focusing solely on your fit for the role. Once the shortlist is finalized, it is shared directly with the hiring company, whose internal team manages the final selection steps, including interviews and additional assessments.

Thank you for your interest!

About the job

Job type: Full Time
Experience level: Senior
Salary: 125k-162k USD
Hiring timezones: United States +/- 0 hours