Shiva Muppa
@shivamuppa
Data Engineer with expertise in scalable cloud-native data solutions.
What I'm looking for
I am a Data Engineer with extensive experience building scalable, cloud-native data solutions on AWS and Azure. I specialize in designing and optimizing ETL pipelines and real-time streaming systems using Apache Spark, Kafka, and Airflow. I have a track record of delivering high-performance solutions that comply with standards such as HIPAA, and I aim to apply my technical expertise and cross-functional collaboration skills to building secure, efficient data platforms.
Throughout my career, I have achieved significant milestones, such as reducing latency by 65% through the development of real-time ingestion pipelines and cutting reporting times by 50% by optimizing Snowflake queries. I have automated multi-source data ingestion, increasing engineering team efficiency by 40%, and developed Infrastructure as Code (IaC) solutions that reduced provisioning time by 70%. My commitment to security and compliance has enabled secure, scalable data access, improving governance across various projects.
Experience
Work history, roles, and key accomplishments
Data Engineer
Paychex
Aug 2023 - Present (1 year 11 months)
Designed and built scalable ETL/ELT pipelines using AWS Glue, Athena, and Python, handling over 10 billion records monthly across payroll and benefits domains. Developed real-time ingestion pipelines using Apache Kafka, integrating with AWS Lambda and Step Functions, reducing batch latency by 70%. Built serverless microservices for preprocessing and cleansing data using Lambda, with robust error h
Data Engineer
Motivity Labs
Apr 2021 - Jul 2022 (1 year 3 months)
Developed modular, scalable pipelines using Azure Data Factory, Databricks, and PySpark, standardizing logic across ingestion flows. Built metadata-driven architecture leveraging parameterized datasets, triggers, and Data Lake Gen2, enabling dynamic, reusable pipeline configurations. Optimized Spark job performance by adjusting executor memory, partition counts, and cache strategies, improving loa
Data Engineer
Magellanic Cloud
Sep 2019 - Mar 2021 (1 year 6 months)
Re-architected legacy ETL pipelines as distributed Apache Spark jobs using Scala and PySpark, reducing end-to-end job time by 60%. Built custom RDD transformations to convert complex XML/CSV inputs into Parquet, supporting schema evolution and format standardization. Implemented SCD Type 2 logic in Amazon Redshift, preserving historical snapshots of evolving customer data.
Data Engineer
AQM Technologies
Sep 2018 - Aug 2019 (11 months)
Developed batch ETL pipelines using Spark and Scala, processing large-scale insurance claims and customer demographics data. Used Kafka with Spark Streaming to stream underwriting events in real time, enabling timely alerting and reporting. Built Hive-based preprocessing and cleansing layers, applying filters, validations, and deduplication prior to ingestion.
Education
Degrees, certifications, and relevant coursework
University of North Texas
Master of Science, Data Science
2022 - 2024
Completed coursework in Applied Machine Learning, Data Analysis, and Data Modeling. Studied Python Programming, Data Harvesting and Storage, and Natural Language Processing.
Website
linkedin.com/in/mshiva0595
