Sanjana Ananthula
@sanjanaananthula
Senior Data Engineer building scalable batch and real-time data pipelines on cloud platforms.
What I'm looking for
I’m a data engineer with 5+ years of experience in Big Data, focused on building reliable, production-ready pipelines. I bring strong expertise in Python, Hadoop, Spark, and SQL, and I enjoy turning messy data into dependable analytics foundations.
In my recent role at Walmart USA, I designed scalable batch and real-time ETL/ELT pipelines using PySpark, Spark SQL, and Python, orchestrated workflows with Apache Airflow, and supported near real-time insights with Kafka and Spark Streaming. I integrated AWS services such as S3, EMR, Glue, Lambda, and Redshift, and improved Spark performance through partitioning, caching, join optimization, and resource tuning.
Earlier, at PwC and Accenture, I delivered cloud data workflows across AWS, Azure, and GCP, implemented data validation and reconciliation to ensure accuracy, and automated recurring batch processing with Airflow and Oozie. I also consulted on Snowflake data platform architecture, built internal tooling for RDBMS vs. Hadoop validation, and supported stakeholders with dashboards and extracts using Tableau and Power BI.
Experience
Work history, roles, and key accomplishments
Designed and built scalable batch and real-time data pipelines using PySpark, Spark SQL, Python, Kafka, and Spark Streaming to support retail, customer, and transactional analytics. Implemented Airflow orchestration, optimized Spark performance, added data quality/reconciliation checks, and delivered reporting datasets and dashboards using Tableau and Power BI.
Designed and built scalable ETL pipelines using Python, SQL, and Apache Spark across AWS, Azure, and GCP, including data ingestion from APIs, databases, and flat files. Delivered Snowflake/Redshift warehousing improvements, implemented data validation/monitoring and governance practices, and supported real-time Kafka/Spark Streaming pipelines and BI-ready datasets.
Developed and maintained enterprise data pipelines using Python, Spark, Hive, and SQL for integration, cleansing, and analytics. Migrated legacy data into Hadoop and cloud platforms using Sqoop and ETL frameworks, automated recurring batch jobs with Oozie/Airflow, and performed data profiling, reconciliation, and production troubleshooting.
Education
Degrees, certifications, and relevant coursework
University of Central Missouri
Master's in Computer Science
2023 - 2024
Completed a Master’s program in Computer Science at the University of Central Missouri from 2023 to 2024.
Tech stack
Software and tools used professionally
Airbyte
Azure HDInsight
Azure Synapse
Apache Spark
Talend
QlikView
Google Cloud Platform
Google Cloud Storage
Azure Storage
GitHub
Bitbucket
Kubernetes
Jenkins
GitHub Actions
Jupyter
NumPy
Pandas
PySpark
Dask
Navicat
DB
Sqoop
MySQL
PostgreSQL
MongoDB
Cassandra
Hadoop
HBase
Gmail
Node.js
.NET
Yarn
Databricks
Terraform
Visual Studio
PyCharm
Azure DevOps
Jira
JavaScript
Java
MATLAB
Neuro
TensorFlow
PyTorch
scikit-learn
Keras
NLTK
Kafka
Ambari
Zookeeper
Linux
Windows
Visual Studio Code
Sublime Text
Notepad++
RStudio
Airflow
Time Analytics
Root Cause
Amazon Web Services (AWS)
SQL
SciPy
AWS KMS
Cosmos
Bash
Factory
Jan
Movement
Seaborn
