Gaurav Gogoi
@gauravgogoi
Site Reliability Engineer improving AWS distributed systems through observability, automation, and incident response.
What I'm looking for
I’m a Site Reliability Engineer/DevOps Engineer with 3+ years of experience building and operating distributed systems on AWS. I focus on practical reliability work: turning production signals into clear SLIs and SLOs, tight feedback loops, and faster incident recovery.
At Zuora, I owned on-call incident response and cross-team coordination using PagerDuty, Grafana, and Kibana, reducing MTTR by 50% through structured runbook-driven triage. I also deployed event-driven autoscaling with KEDA and HPA on Kubernetes (AWS EKS), reducing workload-related production incidents by 70%.
I built latency monitoring for 30+ Tier-1 services with Prometheus and Grafana. By defining SLIs and aligning alerting with SLOs and error budgets, I cut SLA breaches and customer escalations by 60% and reduced alert noise during incidents.
I further strengthened observability and delivery by deploying a self-hosted Prometheus Blackbox Exporter for 1000+ endpoints across 100+ services, improving upstream dependency visibility and reducing customer-reported issues by 80%. I’ve also driven observability and deployment modernization: migrating to Grafana (saving $400K annually) and implementing a GitOps-driven continuous deployment platform with Terraform and FluxCD for AWS ECS, making deployments 10x faster.
Experience
Work history, roles, and key accomplishments
Owned on-call incident response and cross-team coordination, reducing MTTR by 50% through runbook-driven triage. Implemented event-driven autoscaling with KEDA on AWS EKS and improved SLI/SLO-aligned alerting, reducing production incidents by 70% and SLA breaches by 60%.
Defined SLIs and SLOs across 40+ critical services and implemented log-based alerting, reducing customer-reported incidents by 50%. Built an AI-powered incident management system and self-healing automation with PagerDuty integration, cutting mitigation and response times by 80% and 70%, respectively.
Education
Degrees, certifications, and relevant coursework
NIT Silchar
Bachelor of Technology, Electrical, Electronics and Communications Engineering
2019 - 2023
Grade: CGPA: 8.37
