Gaurav Gogoi
@gauravgogoi
Site Reliability Engineer improving AWS distributed systems through observability, automation, and incident response.
What I'm looking for
I’m a Site Reliability Engineer/DevOps Engineer with 3+ years of experience building and operating distributed systems on AWS. I focus on practical reliability work: turning production signals into clear SLIs and SLOs, tightening feedback loops, and speeding up incident recovery.
At Zuora, I owned on-call incident response and cross-team coordination using PagerDuty, Grafana, and Kibana, reducing MTTR by 50% through structured runbook-driven triage. I also deployed event-driven autoscaling with KEDA and HPA on Kubernetes (AWS EKS), reducing workload-related production incidents by 70%.
I built latency monitoring for 30+ Tier-1 services with Prometheus and Grafana, defining SLIs and aligning alerting with SLOs and error budgets. This cut SLA breaches and customer escalations by 60% and reduced alert noise during incidents.
I further strengthened observability by deploying a self-hosted Prometheus Blackbox Exporter for 1000+ endpoints across 100+ services, improving upstream dependency visibility and reducing customer-reported issues by 80%. I’ve also driven observability and deployment modernization: migrating to Grafana (saving $400K annually) and implementing a GitOps-driven continuous deployment platform with Terraform and FluxCD for AWS ECS, making deployments 10x faster.
Experience
Work history, roles, and key accomplishments
Owned on-call incident response and cross-team coordination, reducing MTTR by 50% through runbook-driven triage. Implemented event-driven autoscaling with KEDA on AWS EKS and improved SLI/SLO-aligned alerting, reducing production incidents by 70% and SLA breaches by 60%.
Defined SLIs and SLOs across 40+ critical services and implemented log-based alerting, reducing customer-reported incidents by 50%. Built an AI-powered incident management system and self-healing automation with PagerDuty integration, cutting mitigation and response times by 80% and 70%, respectively.
Education
Degrees, certifications, and relevant coursework
NIT Silchar
Bachelor of Technology, Electrical, Electronics and Communications Engineering
2019 - 2023
CGPA: 8.37
B.Tech in Electrical, Electronics and Communications Engineering (ECE) at NIT Silchar (2019–2023), achieving a CGPA of 8.37.
