Open to opportunities

Mhathesh TSR

@mhatheshtsr

I’m a Site Reliability Engineer delivering multi-cloud Kubernetes reliability, SLOs, and resilient incident response.

Zimbabwe
What I'm looking for

I’m looking for a team where I can own production reliability end to end (SLOs, incident response, and observability) while improving delivery with IaC and CI/CD. I want a blameless, prevention-focused culture in a multi-cloud Kubernetes environment.

I’m a Site Reliability Engineer with 5+ years owning production reliability across multi-cloud Kubernetes platforms. I act as first responder and incident commander, with SLO/SLI ownership and blameless postmortems backed by prevention playbooks.

I’ve resolved a production Elasticsearch split-brain with zero data loss, and I build reliability guardrails through observability and policy-driven operations. I own the observability stack end to end: Prometheus, Grafana, Datadog (SLOs, APM, Synthetic), New Relic, CloudWatch, and OpenTelemetry. This lets error budgets surface reliability gaps before customers are affected.

I drive infrastructure and delivery improvements using Terraform-driven IaC and Jenkins CI/CD, cutting release validation from 48+ hours to under 2 hours. I’ve also delivered FIPS 140-2 compliance for banking-grade workloads and built AI-assisted tooling that is used in production (a Go MCP Slack agent) to reduce incident triage time.

Experience

Work history, roles, and key accomplishments

Current

DevOps Engineer - SRE

BigID Inc

Jan 2025 - Present (1 year 5 months)

Led production incident response for an Elasticsearch split-brain, restoring service with zero data loss and delivering blameless postmortems plus prevention playbooks. Improved reliability and deployment speed by implementing Datadog SLOs/Synthetic monitors and Jenkins pipelines that reduced release validation from 48+ hours to under 2 hours.

Senior DevOps Engineer

Softsensor.ai

Nov 2022 - Dec 2024 (2 years 1 month)

Architected and operated multi-cloud (AWS/GCP/Azure) high-availability and disaster-recovery infrastructure with independent failover. Built an observability platform on Prometheus/Grafana/Alertmanager plus Loki/Tempo/OpenTelemetry and reduced tenant onboarding from 3 weeks to 3 hours using Terraform automation.

DevOps Engineer

Grootan Technologies

Mar 2022 - Oct 2022 (7 months)

Built Kubernetes infrastructure for AI/ML platforms, including low-latency LLM serving. Drove ISO 27001 and SOC 2 certification by implementing security controls, aligning CI/CD to compliance requirements, and producing audit-ready configurations.

Education

Degrees, certifications, and relevant coursework

Karunya Institute of Technology and Sciences

Bachelor of Technology, Bioinformatics

2017 - 2021

B.Tech in Bioinformatics from Karunya Institute of Technology and Sciences, completed between 2017 and 2021.
