Director of Systems Reliability & Field Resilience

Salary: 210k-250k USD

United States only

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses.

The Serve fleet has been delighting merchants, customers, and pedestrians along the way in Los Angeles while doing commercial deliveries. We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.

Who We Are

We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.

Serve Robotics is seeking a Director of Systems Reliability & Field Resilience, responsible for continuously improving end-to-end operational reliability across our robotic delivery operations infrastructure. In this role, you and your team will proactively identify, triage, and resolve complex, cross-domain issues that impact delivery service quality/efficiency, and will work cross-functionally to build monitoring, alerting, automation and resiliency into our platform.

In this role, you will provide leadership and direction to your team while also contributing directly to defining, building, and deploying solutions. You will work closely with engineering, product, and operations to prioritize the work, and you'll hire, allocate resources, and support your team to deliver capabilities from concept to production.

The Serve Robotics delivery platform spans a wide range of technologies: cloud and networking infrastructure that powers delivery matching, front-end solutions for robot fleet supervisors and field agents, and on-robot embedded and autonomous systems. All of these must work together seamlessly to support our daily delivery growth and economics. You will lead a team of experts with backgrounds in SRE, DevOps, and cloud infrastructure, and partner across the entire engineering organization to ensure a robust and resilient delivery infrastructure.

The ideal candidate will have a strong track record of hands-on leadership of small, highly technical software engineering teams. You will have experience hiring, mentoring, and coaching senior-level engineers and building a high-performing, collaborative team. You are a highly capable technical generalist who is comfortable working across all components of a complex system. You partner with domain experts and functional teams to identify issues, perform detailed root cause analysis, and develop short- and long-term solutions, which will often require close technical collaboration between your team and other engineering teams.

Responsibilities

Full-Stack Troubleshooting & System Deep Dives: Become the go-to expert for identifying root causes of service issues—whether they're in cloud APIs, robot hardware, network layers, or operational workflows—and coordinate with the respective owning teams to resolve and prevent them.
Build and Lead a Global Systems Reliability Team: Hire, mentor, and grow a multidisciplinary team of high-context generalists who can investigate system-wide failures, document their learnings, and drive improvements across organizational boundaries.
Own the On-Call & Incident Management Process: Take over and evolve the company's on-call process into a mature, well-documented, and inspectable system. Define SLAs, escalation policies, and a best-in-class paging infrastructure that aligns with our service goals.
Establish and Maintain a Knowledge Base: Ensure on-call responders have access to actionable documentation, playbooks, and troubleshooting guides. Make knowledge capture a core part of incident response.
Reliability Analytics & Intuition Building: Use incident and operational data to build a deep intuition about where our systems are most fragile. Create predictive frameworks and reliability metrics that help the organization stay ahead of failures.
Service Health & Performance Dashboards: Build and maintain dashboards that monitor the health of end-to-end services—not just software, but everything that supports customer delivery. Highlight systemic issues, performance regressions, and areas needing investment.
Cross-Functional Collaboration: Work closely with engineering, infrastructure, hardware, field ops, customer support, and leadership to align on reliability priorities and drive systemic improvement efforts.

Qualifications

8+ years of experience in a technical engineering or operations role, with at least 3 years in a leadership position. Background in both software engineering and IT/DevOps a plus.
Deep experience with complex distributed systems, infrastructure, and system debugging, triage and root cause analysis. Familiarity with observability tools like Datadog, Grafana, Prometheus, ELK, etc. a plus.
Strong understanding of hardware/software integration, particularly in cloud-connected device infrastructure including robotics, consumer electronics, and embedded systems.
Proven success leading incident response or SRE-style functions, and managing on-call teams.
Ability to drive organization-wide improvements by building trusted cross-functional relationships and technical collaboration across teams.
Strong data and dashboarding skills; can translate operational data into clear insights and action plans.
Excellent communication and organizational skills; comfortable writing high-quality docs and leading blameless postmortems.

What Makes You Stand Out

Relentless Drive for Quality: You set high standards for code and system design, continually raising the bar for your team and the organization.
Strong Cross-Functional Communicator: You effectively collaborate with product, operations, and executive teams to ensure technology and business goals are aligned.
Strategic Vision Paired with Execution: You think beyond immediate tasks to chart a roadmap that ensures platform longevity and innovation. You excel at driving changes that boost overall team cohesion and performance.
Passion for Innovation: You bring curiosity and enthusiasm for solving complex challenges in delivery and fleet management, keeping up with the latest trends and technologies in the space.


About the job

Apply before: Jul 16, 2025
Posted on: May 17, 2025
Job type: Full Time
Experience level: Director
Salary: 210k-250k USD
Location requirements: United States
Hiring timezones: United States +/- 0 hours



