Senior System Reliability Engineer

Lirio is a behavior change AI platform that unites behavioral science with artificial intelligence to improve the patient health journey and drive individuals toward positive behavior change at scale.

Lirio

Employee count: 51-200

Salary: 130k-150k USD

CA, DE + 3 more

Lirio is a technology/software company that provides expertise in a variety of behavioral science domains (e.g., behavioral economics, social psychology, public health), data science, and machine learning to drive consumer engagement, close gaps in preventive and chronic care, and promote health and well-being across an individual’s lifespan. Lirio’s behavior change AI platform unites behavioral science with advanced artificial intelligence (AI) to deliver Precision Nudging health interventions. Precision Nudging is the application of behavioral science to health interventions, personalized by AI to each individual, that overcome barriers to action at the right time and place for scalable behavior change.

This is a remote role with the opportunity to be hybrid if located in Tennessee. All applicants must be authorized to work in the US without sponsorship.

To ensure an excellent onboarding experience and integration into the company, new colleagues will spend their first week onsite at one of our offices in Tennessee. Travel expenses will be paid. This is a requirement.

Position Summary

The Senior System Reliability Engineer (SRE) at Lirio is responsible for the reliability, scalability, and performance of our cloud-native applications and infrastructure. This role leads the design and implementation of automation, monitoring, and incident response processes, and mentors other engineers in SRE best practices. The Senior SRE partners with development teams to ensure robust, secure, and highly available systems, and drives continuous improvement in operational excellence.

This is a senior, hands-on reliability engineering role embedded with product and platform teams. The Senior SRE is accountable for defining and enforcing service-level objectives (SLOs), reducing operational toil through automation, and improving system reliability through proactive engineering rather than reactive support. This is not a ticket-driven operations role; the Senior SRE is expected to influence architecture, development practices, and incident readiness across the platform.

Essential Duties & Responsibilities

Reliability Engineering & Automation (40%)

  • Architect, implement, and maintain automated solutions for deployment, monitoring, alerting, and incident response using Lirio’s technology stack (AWS, Azure, Kubernetes, Kafka, Java, TypeScript, Groovy, Databases/SQL).
  • Develop and manage infrastructure as code (e.g., Terraform, AWS CloudFormation).
  • Build and optimize CI/CD pipelines for seamless, reliable delivery.
  • Define, implement, and continuously refine service-level indicators (SLIs), service-level objectives (SLOs), and error budgets for critical services.
  • Identify and reduce operational toil through automation, platform improvements, and architectural changes.
  • Analyze and optimize the performance of Lirio systems and services.
  • Ensure high availability and scalability of services through proactive engineering, load testing, and capacity planning across multi-tenant and client-specific environments.

Peer Reviews & Collaboration (10%)

  • Review infrastructure changes, automation scripts, and reliability-impacting code changes to ensure production readiness.
  • Collaborate with software engineers to embed reliability, security, and operational best practices into development workflows.
  • Partner with software engineering teams during design and architecture discussions to identify reliability risks early.

Operational Support & Incident Management (20%)

  • Monitor system health using modern observability tools (e.g., Prometheus, Grafana, Datadog).
  • Participate in a defined on-call rotation supporting production systems, with clear escalation paths and expectations.
  • Contribute to and maintain incident severity definitions, response procedures, and no-blame postmortem practices.
  • Lead incident response, root cause analysis, and postmortems for production issues.
  • Triage and resolve issues, ensuring minimal downtime and rapid recovery.
  • Support client onboarding and production rollouts by ensuring reliability, observability, and operational readiness standards are met.

Mentorship & Knowledge Sharing (10%)

  • Mentor and coach engineers on reliability engineering principles, operational ownership, and incident response best practices.
  • Design processes to share operational knowledge and avoid single points of failure.
  • Advise colleagues on architecture and reliability strategies.
  • Help establish shared operational ownership across teams to reduce single points of failure and knowledge silos.

Continuous Learning & Innovation (10%)

  • Stay current with industry trends in reliability engineering, cloud operations, and automation.
  • Bring innovation to operational practices and system design, evaluating and introducing new tools and technologies as appropriate for Lirio.
  • Evaluate new tooling with an emphasis on operational simplicity, security, and long-term maintainability.

Documentation & Process Improvement (5%)

  • Define and document operational processes, incident response playbooks, and reliability standards.
  • Contribute to operational planning, incident reviews, and reliability documentation.

Qualifications

  • 5-7 years of related experience
  • Bachelor's degree in a related field
  • Linux systems and networking fundamentals (DNS, TCP/IP, TLS)
  • Distributed systems debugging and failure analysis
  • Load, stress, and fault-injection testing
  • CI/CD tools and processes
  • Version control (e.g., Git)
  • Cloud platforms (e.g., AWS, Azure)
  • Containers and orchestration (Kubernetes)
  • Kafka (messaging/streaming)
  • Scripting and programming languages (e.g., Java, TypeScript, Groovy, Python)
  • Agile methodologies (e.g., Scrum, XP, SAFe)
  • Databases/SQL
  • Observability/monitoring tools (Datadog)

Benefits

  • Medical (HSA available)
  • Dental
  • Vision
  • Short-term & long-term disability (company-paid)
  • Life & AD&D (company-paid)
  • 401(k) with company match
  • 10 paid holidays, quarterly company closure dates, and a holiday-week company closure
  • Flexible time off policy
  • Work from home
  • 6 weeks paid parental leave
  • Salary range: $130k-$150k

About the job

Job type: Full Time

Education: Bachelor's degree

Experience: 5 years minimum

Hiring timezones: United States +/- 0 hours, and 4 other timezones

About Lirio

We are Lirio, a company dedicated to improving health outcomes by combining behavioral science with artificial intelligence. Our mission is to power your ability to move people along their unique journey to better health through person-centered communication. We believe everyone deserves a better life and a healthier future. Our team is a dynamic group of change-makers who bring our whole selves to the mission of personalized care through the responsible use of technology. We strive to make a positive, lasting impact at the individual and global level through our work and are committed to diversity, inclusion, community, and freedom of expression.

Our Precision Nudging™ technology is at the core of what we do. This approach applies tailored behavioral science solutions to overcome patient-specific barriers to action, delivering interventions at the right time and place. By understanding an individual's context and health history, our AI-driven recommendations augment human-to-human relationships built on trust and empathy, encouraging patients to take health actions aligned with expert clinical guidance. We continuously learn from each individual's interactions, adapting our behavioral interventions in real time to motivate people to stay engaged with their health and their healthcare organizations. This leads to an empowered and healthier patient population and reduced cost of care. We aim to help healthcare organizations reach more people earlier with the right care solutions, address health equity issues, and close gaps in care. Our organization has deep roots in behavioral science and is founded on a solid, widespread ambition to do better. We are always eager to learn, unafraid to adapt, and determined to do better, approaching each new season as an opportunity for growth and improvement.
