Sridevi Bejj
@sridevibejj
Staff Site Reliability Engineer focused on automation, reliability, and scalable infrastructure.
What I'm looking for
I am a Staff Site Reliability Engineer with 15+ years of experience supporting large-scale Linux, Big Data, cloud, and ML platforms in financial and enterprise environments. I specialize in automation, production reliability, observability, and 24x7 incident management, with proven expertise in Kubernetes, Hadoop ecosystems, GPU-enabled ML platforms, CI/CD pipelines, and Infrastructure as Code.
I have designed and managed cloud and on-prem infrastructure using Terraform, Chef, and Ansible, operated multi-tenant Kubernetes ML platforms, and maintained mission-critical Hadoop clusters. I deliver measurable reliability improvements through automation, monitoring (Prometheus, Grafana, Splunk), security controls (Kerberos, Ranger, LDAP), and disciplined ITSM practices.
Experience
Work history, roles, and key accomplishments
Maintain and support large-scale Hadoop clusters and Kubernetes-based ML platforms, improving availability and performance through upgrades, tuning, automation, and security controls. Lead incident response, vulnerability remediation, and monitoring to ensure production reliability for ETL and ML workloads.
Linux Consultant
Broadridge Financial
Jul 2021 - Apr 2023 (1 year 9 months)
Built cloud and on-prem infrastructure with Terraform and automated provisioning using Chef and Ansible, improving deployment consistency and patching workflows. Implemented enterprise monitoring and scheduled patch automation to support production reliability.
Information Technology Specialist
New York State ITS
Aug 2018 - Jul 2021 (2 years 11 months)
Provided production support and automation for Linux servers, led OS upgrades and migrations (VMware to AWS), and managed configuration frameworks and Kubernetes/Docker environments to maintain 24x7 operations.
Supported 1000+ Linux and Solaris servers, performing kernel tuning, storage management, and data center migrations to sustain production services and reduce incidents. Executed server builds, upgrades, and emergency changes via CAB processes.
Infrastructure Specialist
Merrill Lynch
Jan 2008 - Feb 2009 (1 year 1 month)
Supported 2500+ production and development servers across the Americas, handling patching, backup recovery, cluster administration, and incident management to ensure enterprise service continuity.
Managed enterprise Linux and Solaris infrastructure for 6500+ servers, leading incident and change management under ITIL, and administering storage, backups, and kernel tuning to maintain operational stability.
Education
Degrees, certifications, and relevant coursework
Osmania University
Master of Information Systems, Information Systems
2001 - 2003
Completed a Master's degree in Information Systems, with coursework and practicals relevant to enterprise IT, systems administration, and infrastructure management.
Osmania University
Bachelor of Computer Applications, Computer Applications
1998 - 2001
Completed a Bachelor of Computer Applications, with foundational studies in programming, databases, and operating systems.
