Mhathesh TSR
@mhatheshtsr
I’m a Site Reliability Engineer delivering multi-cloud Kubernetes reliability, SLOs, and resilient incident response.
What I'm looking for
I’m a Site Reliability Engineer with 5+ years owning production reliability across multi-cloud Kubernetes platforms. I act as first responder and incident commander, with SLO/SLI ownership and blameless postmortems backed by prevention playbooks.
I’ve resolved a production Elasticsearch split-brain with zero data loss, and I build reliability guardrails through observability and policy-driven operations. I own the observability stack end-to-end: Prometheus, Grafana, Datadog (SLOs, APM, Synthetic), New Relic, CloudWatch, and OpenTelemetry. This lets error budgets surface reliability gaps before customer impact.
I drive infrastructure and delivery improvements using Terraform-driven IaC and Jenkins CI/CD, cutting release validation from 48+ hours to under 2 hours. I’ve also delivered FIPS 140-2 compliance for banking-grade workloads and built production-used AI-assisted tooling (a Go MCP Slack agent) to reduce incident triage time.
Experience
Work history, roles, and key accomplishments
DevOps Engineer - SRE
BigID Inc
Jan 2025 - Present (1 year 5 months)
Led production incident response for an Elasticsearch split-brain, restoring service with zero data loss and delivering blameless postmortems plus prevention playbooks. Improved reliability and deployment speed by implementing Datadog SLOs/Synthetic monitors and Jenkins pipelines that reduced release validation from 48+ hours to under 2 hours.
Senior DevOps Engineer
Softsensor.ai
Nov 2022 - Dec 2024 (2 years 1 month)
Architected and operated multi-cloud (AWS/GCP/Azure) high-availability and disaster-recovery infrastructure with independent failover. Built an observability platform on Prometheus/Grafana/Alertmanager plus Loki/Tempo/OpenTelemetry and reduced tenant onboarding from 3 weeks to 3 hours using Terraform automation.
DevOps Engineer
Grootan Technologies
Mar 2022 - Oct 2022 (7 months)
Built Kubernetes infrastructure for AI/ML platforms, including low-latency LLM model serving. Drove ISO 27001 and SOC 2 certification by implementing security controls, aligning CI/CD to compliance requirements, and producing audit-ready configurations.
AWS Cloud Engineer
Contraly LLC
Jun 2021 - Mar 2022 (9 months)
Built an OTT streaming platform on AWS serving 15+ countries and supporting 2,000 concurrent users. Implemented scalable media and serverless architecture using Lambda, DynamoDB, MediaConvert, S3, CloudFront, Route 53, and CloudWatch monitoring.
Education
Degrees, certifications, and relevant coursework
Karunya Institute of Technology and Sciences
Bachelor of Technology, Bioinformatics
2017 - 2021
B.Tech in Bioinformatics from Karunya Institute of Technology and Sciences, completed between 2017 and 2021.
Tech stack
Software and tools used professionally
GitHub
SonarQube
Kubernetes
Jenkins
GitHub Actions
MySQL
PostgreSQL
MongoDB
Gmail
Okta
Terraform
Azure DevOps
JFrog Artifactory
Loki
Istio
MongoDB Atlas
Grafana
Prometheus
OpenTelemetry
Ubuntu
CentOS
Linux
New Relic
Datadog
JFrog
Elasticsearch
Ansible
Root Cause
Trivy
Kyverno
Wiz
ArgoCD
Bash
Checkov
Karpenter
Prowler
Conftest
Arch
KEDA
BigID
Jan
Blameless
