This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in India.
We are seeking a highly skilled Site Reliability Engineer to ensure the resilience, observability, and continuous improvement of disaster recovery environments. In this role, you will collaborate with cross-functional teams including DR architects, security, infrastructure, and engineering to define and maintain SLIs/SLOs, reduce operational toil, and drive platform reliability initiatives. You will lead chaos engineering exercises, implement automation for failover and recovery, and participate in failover/failback simulations to validate system robustness. This is an opportunity to work in a fast-paced, innovative environment, optimizing critical cloud infrastructure across Azure, AWS, and private cloud platforms while contributing directly to operational excellence. The role emphasizes proactive problem-solving, collaboration, and a strong focus on system performance and reliability.
Accountabilities
- Design, build, and maintain observability dashboards and proactive alerting systems for DR environments across multiple cloud platforms.
- Define and monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets aligned with RPO/RTO targets.
- Collaborate on runbook automation, synthetic testing, and validation pipelines to ensure DR readiness.
- Lead chaos engineering exercises and game-day simulations to proactively identify system weaknesses.
- Conduct post-incident reviews, implement feedback loops, and manage the automation backlog.
- Drive infrastructure as code (IaC) adoption and reliability improvements across platforms.
- Contribute to compliance reporting and performance monitoring for protected applications.
Requirements
- 5+ years of experience in SRE, DevOps, or Platform Engineering roles.
- Hands-on expertise with observability tools such as Grafana, Prometheus, Datadog, or Splunk.
- Experience defining and tracking SLIs/SLOs, error budgets, and availability dashboards.
- Proficiency in at least one scripting or programming language (Python, Bash, Go).
- Knowledge of disaster recovery principles, failover practices, and RPO/RTO targets.
- Familiarity with IaC tools like Terraform, Ansible, or CloudFormation.
- Experience with CI/CD pipelines, automated testing, and cloud-native deployments (Azure or AWS).
- Strong problem-solving and collaboration skills, with the ability to work effectively across functions.
- Fluent in written and spoken English.
Nice to have: Experience with Zerto, Veeam, chaos engineering tools, Kubernetes, TISAX/ISO 27001 compliance, or platform reliability for mission-critical systems.
Benefits
- Remote work from India.
- Exposure to cutting-edge disaster recovery and cloud platform technologies.
- Opportunity to lead initiatives that improve system resilience and operational efficiency.
- Collaboration with cross-functional teams in a fast-paced, innovative environment.
- Professional development and upskilling opportunities.
- Competitive compensation package for the duration of the contract.
Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.
When you apply, your profile goes through our AI-powered screening process designed to identify top talent efficiently and fairly.
🔍 Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.
📊 It compares your profile to the job’s core requirements and past success factors to determine your match score.
🎯 Based on this analysis, we automatically shortlist the 3 candidates with the highest match to the role.
🧠 When necessary, our human team may perform an additional manual review to ensure no strong profile is missed.
The process is transparent, skills-based, and free of bias, focusing solely on your fit for the role.
Once the shortlist is completed, we share it directly with the company that owns the job opening. The final decision and next steps (such as interviews or additional assessments) are then made by their internal hiring team.