Open to opportunities

Sundar Bishnoi

@sundarbishnoi

Senior Site Reliability Engineer focused on self-healing, automation, and scalable reliability improvements.

India

What I'm looking for

I’m looking for a role where I can build observable, resilient systems: automating remediation, improving MTTR/MTTD, and partnering with engineering teams to scale self-healing infrastructure with measurable reliability gains.

I’m a Senior Site Reliability Engineer with 7+ years of experience improving reliability, automation, and scalability for large-scale private cloud environments. I focus on reducing MTTR/MTTD, strengthening production systems, and building self-healing infrastructure that keeps services resilient under pressure.

At Morgan Stanley, I designed and deployed Python-based self-healing systems for 20,000+ VMware ESXi hosts, automating remediation of infrastructure failures and eliminating ~85% of manual interventions while reducing incident resolution time (MTTR) by ~70–75%. I also redesigned monitoring and alerting pipelines using PagerDuty and observability tooling to cut MTTD by ~50–55%, and built Ansible-driven ESXi patching automation to orchestrate pre-checks, rollout, and validation across enterprise clusters.

Earlier at Capgemini, I delivered end-to-end observability dashboards, improved alert signal-to-noise through health rule optimization, and enhanced incident response metrics through proactive monitoring. I’ve also supported chaos testing and production releases, and I mentor others to drive adoption of SRE best practices, so reliability improvements become repeatable, measurable outcomes.

Experience

Work history, roles, and key accomplishments

Morgan Stanley

SRE3 Director (SRE)

Jun 2022 - Feb 2026 (3 years 8 months)

Designed and deployed Python-based self-healing systems for 20,000+ VMware ESXi hosts, eliminating ~85% of manual interventions and reducing MTTR by ~70–75% while improving availability. Reduced MTTD by ~50–55% by redesigning monitoring and alerting pipelines with PagerDuty and observability tooling; also built Ansible-driven ESXi patch automation and a troubleshooting engine that cut manual debug

Capgemini

Associate Consultant (SRE)

Oct 2019 - Jun 2022 (2 years 8 months)

Automated CPU metric collection from AppDynamics for IHS applications across Zone2/Zone3 during Active/Passive testing using Python, packaged as a reusable utility for team adoption. Built end-to-end observability dashboards, reduced alert noise via health rule optimization, improved incident response metrics (MTTR/MTTD/MTTM), and executed chaos testing to find infrastructure bottlenecks.

Education

Degrees, certifications, and relevant coursework

National Institute of Technology, Tiruchirappalli

Bachelor of Technology (B.Tech.)

2014 - 2018

Earned a Bachelor of Technology (B.Tech.) at the National Institute of Technology, Tiruchirappalli (2014 to 2018).
