Senior Site Reliability Engineer

RapidSOS is an intelligent safety platform that securely links life-saving data from connected devices, apps, and sensors to 9-1-1 and first responders, empowering faster and more effective emergency response.

RapidSOS

Employee count: 201-500

Salary: 160k-195k USD

United States only

In the time it takes you to read this job description, RapidSOS will have handled ~1,380 emergencies.

At RapidSOS, we are committed to using technology to build a safer, stronger future and working together to save lives. We’re in an exciting phase of growth, welcoming new members from across the globe to our mission-driven, ambitious, and inclusive team. Our work is founded on our values of elevating purpose, inventing tomorrow, delivering with urgency, serving with integrity, and winning together, all of which support a company culture where people can innovate, collaborate, grow, and, above all, make an impact.

RapidSOS is the leading public safety AI company that unlocks mission-critical intelligence for first responders and security teams – enabling faster, smarter and more accurate emergency response. Real-time data from the world’s largest safety network of 700M+ devices, 200+ global enterprises, and 23,000+ federal, state and local agencies fuels the RapidSOS HARMONY AI engine that delivers this intelligence to those who need it most. Learn more at www.RapidSOS.com.

What this role is about: Are you excited to work on systems where reliability directly impacts real-world outcomes? At RapidSOS, we build technology that powers emergency response, ensuring critical data gets to the right place at the right time. When these systems degrade or fail, the impact is real. Reliability isn’t a background function; it’s fundamental to how our product shows up in critical moments.

We’re seeking a Senior Site Reliability Engineer to own the performance and stability of services that operate at scale in real-world, high-stakes environments. You’ll work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code, identifying and resolving issues at their root cause while proactively shaping how systems are built to improve reliability from the start. You’ll go beyond surface-level fixes, digging into everything from service behavior in Kubernetes to application-level decisions that impact performance, cost, and reliability. You’ll collaborate closely with engineering teams to improve how our systems are built, observed, and operated. Along the way, you’ll help shape how we approach reliability as a discipline—closing visibility gaps, improving resilience, and ensuring our platform performs when it matters most.

What you’ll do:

Own performance and reliability outcomes: Take ownership of how application-level decisions create system-level impact, including connection pooling, database architecture, traffic routing patterns, and memory allocation. Partner directly with the engineering teams that own specific domains to improve reliability and performance across their systems.
Design for system resilience: Strengthen reliability through proactive design decisions, including safer deployment patterns, failover strategies, and redundancy approaches that improve system behavior under stress.
Build observability into system behavior: Proactively instrument services with structured logging, metrics, and alerting so systems are easier to understand and debug. The focus is on creating clear signals from production behavior before issues escalate.
Own incidents from signal to resolution: Take production issues from first signal through resolution, including investigation across infrastructure and application layers, root cause identification, and implementation of fixes that restore stability and strengthen system behavior long term.
Work across the stack without a permission slip: You’ll work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code. When issues come up, you don’t wait for a handoff; you take ownership directly and drive the issue through to resolution.

What we’re looking for in our ideal candidate:

5+ years of professional engineering experience with deep expertise in Python
Real cloud infrastructure experience with AWS: networking, managed databases, cost implications of traffic routing decisions, IAM, DNS-based routing and failover
Hands-on Kubernetes experience with containerized workloads in production across EKS, ECS, or Fargate: you can read events, understand resource limits, know when to drain vs. delete a node, and understand the tradeoffs between orchestration models
Strong understanding of distributed systems and how they fail, including resource exhaustion, replication lag, queue backpressure, and other common failure modes
Experience operating high-throughput messaging systems (RabbitMQ, Kafka, AWS SNS / SQS, etc.) and the infrastructure around them, including infrastructure-as-code (e.g., Terraform) and CI/CD pipelines, with an emphasis on improving reliability and scalability
Experience building or improving observability through logging, metrics, and alerting
Demonstrable experience using AI safely and securely to enhance velocity and improve the reliability and recoverability of services
Strong communication and interpersonal skills; a team player with a positive attitude
Highly self-motivated, with a strong sense of ownership and the ability to adapt and learn quickly in a fast-paced environment
Strong proficiency in coding best practices – ability to write clean, maintainable, and testable code
Demonstrated expertise in problem solving – comfortable working across both infrastructure and application layers to diagnose and resolve issues at the source
Ability and willingness to collaborate in-person a few times per quarter, or as needed

Nice-to-have experience (but not required!):

Experience supporting production systems in an on-call or similar capacity where reliability matters
Experience with observability and GitOps tooling; hands-on with Datadog (APM, alerting), Elasticsearch/OpenSearch, and ArgoCD-based GitOps deployments; comfortable modernizing legacy CI/CD pipelines (e.g., Concourse, Jenkins) toward cloud-native approaches

What we offer:

The chance to work with a passionate team on solving one of the largest challenges globally
Competitive salary, benefits, and equity participation
A dynamic, flexible and fun start-up work environment with a highly talented team

If you're curious to learn more about RapidSOS, you can check out https://rapidsos.com/blog/

Starting pay for a successful applicant will depend on a variety of job-related factors, which may include experience, relevant skills, training, education, location, business needs, or market demands. The salary range for this role is $160,000 - $195,000. This role will also be eligible to receive equity options.

RapidSOS is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or Veteran status.

Interested in the role but you don’t meet 100% of the requirements? We’d love to hear from you! We encourage you to apply; we’d be excited to see if your unique skill set and experience could be a match.

About the job

Apply before: Jun 23, 2026
Posted on: Apr 24, 2026
Job type: Full Time
Experience level: Senior
Salary: 160k-195k USD
Experience: 5 years minimum
Location requirements: United States
Hiring timezones: United States +/- 0 hours

About RapidSOS

RapidSOS is an intelligent safety platform that securely links life-saving data from connected devices, apps, and sensors to 9-1-1 and first responders. Founded in 2012 by Michael Martin and Nicholas Horelik, the company aims to transform emergency response by providing critical information to emergency communication centers (ECCs) and field responders. This data can include precise location, medical information, vehicle telematics, building sensor data, and more, empowering faster and more effective emergency response. RapidSOS partners with a wide range of technology companies, including major players in the IoT, mobile, and automotive industries, to integrate its platform into their products and services. This allows data from over 540 million connected devices to be accessible to more than 16,000 public safety agencies.

The company's mission is driven by the understanding that traditional 9-1-1 systems, largely built for landlines, often lack the rich, dynamic data available from modern technology. RapidSOS bridges this gap, ensuring that when an emergency occurs, first responders have access to a comprehensive picture of the situation, often before they even arrive on scene. This can significantly reduce response times and improve outcomes in critical situations. The platform supports various emergency scenarios, from car accidents and medical emergencies to home invasions and natural disasters. RapidSOS offers several products, including RapidSOS Safety, RapidSOS Portal, RapidSOS Premium, and RapidSOS Integrations, and has recently unveiled RapidSOS Unite, an AI-powered platform that further enhances data synthesis and accessibility for emergency personnel. The company's technology is designed to work in harmony with existing public safety infrastructure, augmenting its capabilities and providing a more robust and modern emergency response ecosystem.

Tech stack

JavaScript

Python

CircleCI

PostgreSQL

React

Redux

NetSuite

Employee benefits

Paid holidays
Life insurance
Paid sick days
Dental insurance
