Open to opportunities

Sridevi Bejj

@sridevibejj

Staff Site Reliability Engineer focused on automation, reliability, and scalable infrastructure.

United States

What I'm looking for

I seek roles where I can architect and operate scalable, secure platforms, drive automation and observability, mentor teams, and improve production reliability in a collaborative, DevOps-focused culture.

I am a Staff Site Reliability Engineer with 15+ years supporting large-scale Linux, Big Data, cloud, and ML platforms in financial and enterprise environments. I specialize in automation, production reliability, observability, and 24x7 incident management, with proven expertise in Kubernetes, Hadoop ecosystems, GPU-enabled ML platforms, CI/CD pipelines, and Infrastructure as Code.

I have designed and managed cloud and on-prem infrastructure using Terraform, Chef, and Ansible, operated multi-tenant Kubernetes ML platforms, and maintained mission-critical Hadoop clusters. I deliver measurable reliability improvements through automation, monitoring (Prometheus, Grafana, Splunk), security controls (Kerberos, Ranger, LDAP), and disciplined ITSM practices.

Experience

Work history, roles, and key accomplishments

Visa
Current

Staff Site Reliability Engineer

May 2023 - Present (2 years 9 months)

Maintain and support large-scale Hadoop clusters and Kubernetes-based ML platforms, improving availability and performance through upgrades, tuning, automation, and security controls. Lead incident response, vulnerability remediation, and monitoring to ensure production reliability for ETL and ML workloads.

Linux Consultant

Broadridge Financial

Jul 2021 - Apr 2023 (1 year 9 months)

Built cloud and on-prem infrastructure with Terraform and automated provisioning using Chef and Ansible, improving deployment consistency and patching workflows. Implemented enterprise monitoring and scheduled patch automation to support production reliability.

Information Technology Specialist

New York State ITS

Aug 2018 - Jul 2021 (2 years 11 months)

Provided production support and automation for Linux servers, led OS upgrades and migrations (VMware to AWS), and managed configuration frameworks and Kubernetes/Docker environments to maintain 24x7 operations.

Thomson Reuters

Lead Systems Engineer

Feb 2009 - Feb 2015 (6 years)

Supported 1000+ Linux and Solaris servers, performing kernel tuning, storage management, and data center migrations to sustain production services and reduce incidents. Executed server builds, upgrades, and emergency changes via CAB processes.

Infrastructure Specialist

Merrill Lynch

Jan 2008 - Feb 2009 (1 year 1 month)

Supported 2500+ production and development servers across the Americas, handling patching, backup recovery, cluster administration, and incident management to ensure enterprise service continuity.

Genpact

Infrastructure Engineer

Apr 2003 - Jan 2008 (4 years 9 months)

Managed enterprise Linux and Solaris infrastructure for 6500+ servers, leading incident and change management under ITIL, and administering storage, backups, and kernel tuning to maintain operational stability.

Education

Degrees, certifications, and relevant coursework

Osmania University

Master of Information Systems, Information Systems

2001 - 2003

Completed Master's degree in Information Systems with coursework and practicals relevant to enterprise IT, systems administration, and infrastructure management.

Osmania University

Bachelor of Computer Applications, Computer Applications

1998 - 2001

Completed Bachelor of Computer Applications with foundational studies in programming, databases, and operating systems.
