Open to opportunities

Sanjana Ananthula

@sanjanaananthula

Senior Data Engineer building scalable batch and real-time data pipelines on cloud platforms.

Zimbabwe

What I'm looking for

I want to build and optimize scalable batch and real-time data pipelines, strengthen data quality and governance, and partner with cross-functional teams to deliver trustworthy analytics in cloud-first environments.

I’m a data engineer with 5+ years of experience in big data, focused on building reliable, production-ready pipelines. I bring strong expertise in Python, Hadoop, Spark, and SQL, and I enjoy turning messy data into dependable analytics foundations.

In my recent role at Walmart USA, I designed scalable batch and real-time ETL/ELT pipelines using PySpark, Spark SQL, and Python, orchestrated workflows with Apache Airflow, and supported near real-time insights with Kafka and Spark Streaming. I integrated AWS services such as S3, EMR, Glue, Lambda, and Redshift, and improved Spark performance through partitioning, caching, join optimization, and resource tuning.

Earlier, at PwC and Accenture, I delivered cloud data workflows across AWS, Azure, and GCP, implemented data validation and reconciliation to ensure accuracy, and automated recurring batch processing with Airflow and Oozie. I’ve also consulted on Snowflake data platform architecture, built internal tooling for RDBMS vs. Hadoop validation, and supported stakeholders with dashboards and extracts using Tableau and Power BI.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Walmart

Sep 2024 - Present (1 year 10 months)

Designed and built scalable batch and real-time data pipelines using PySpark, Spark SQL, Python, Kafka, and Spark Streaming to support retail, customer, and transactional analytics. Implemented Airflow orchestration, optimized Spark performance, added data quality/reconciliation checks, and delivered reporting datasets and dashboards using Tableau and Power BI.

PySpark, Apache Airflow, Kafka, AWS S3, AWS Glue, Snowflake, Tableau

Data Engineer

PwC

Jan 2022 - Jun 2023 (1 year 5 months)

Designed and built scalable ETL pipelines using Python, SQL, and Apache Spark across AWS, Azure, and GCP, including data ingestion from APIs, databases, and flat files. Delivered Snowflake/Redshift warehousing improvements, implemented data validation/monitoring and governance practices, and supported real-time Kafka/Spark Streaming pipelines and BI-ready datasets.

Python, Apache Spark, SQL, AWS S3, Snowflake, BigQuery, Kafka, Apache Airflow

Data Engineer

Accenture

Jan 2020 - Dec 2021 (1 year 11 months)

Developed and maintained enterprise data pipelines using Python, Spark, Hive, and SQL for integration, cleansing, and analytics. Migrated legacy data into Hadoop and cloud platforms using Sqoop and ETL frameworks, automated recurring batch jobs with Oozie/Airflow, and performed data profiling, reconciliation, and production troubleshooting.

Python, Apache Spark, Hive, HDFS, Sqoop, Oozie, Apache Airflow, Oracle