Open to opportunities

Purbarag Pathak Choudhury

@purbaragchoudhury

Senior Data Engineer (AdTech) @ HelloFresh | PySpark • Snowflake • Databricks • dbt • Airflow • IaC

Germany

What I'm looking for

I’m looking for a senior data engineering role where I can build reliable data platforms, drive operational excellence along with cost and performance wins, and mentor teams, while partnering closely with analytics stakeholders to turn data into faster decisions.

With over seven years of experience in data engineering, I specialize in building and optimizing cloud-based data pipelines. As a Senior Data Engineer at HelloFresh, I focus on leveraging tools like PySpark, dbt, Airflow, and Snowflake to enable data-driven decision-making while supporting teams in Marketing, Reporting, and Data Science. My approach emphasizes collaboration, quality assurance, and aligning technical solutions with business goals.

By driving the creation of robust data workflows and optimizations with technologies like Delta Lake, our team has streamlined data operations. I am committed to delivering efficient, reliable, and scalable data solutions while maintaining clear communication with stakeholders to ensure timely project delivery.

Currently based in Berlin, Germany, and actively open to fully remote opportunities.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

HelloFresh

Jul 2025 - Present (1 year)

Led centralization of programmatic ad data across 6 platforms into a single source of truth, reducing cross-market reporting discrepancies by 80%+.
Built an event-driven marketing conversion pipeline that cut time-to-insight from 1 day to 2 hours and optimized the Snowflake serving layer to reduce load time by 72%.

PySpark • SQL • Airflow • AWS Lambda • Snowflake • dbt • Databricks • Query Optimization • Data Quality

Data Engineer

HelloFresh

Oct 2022 - Jun 2025 (2 years 8 months)

Built Snowflake + dbt conversion cohort pipelines orchestrated in Airflow with 150 Soda checks to reach 90% SLA compliance, reducing data issues by 30% and Snowflake storage costs by 75%.
Migrated legacy Salesforce CRM CDP pipelines to Delta Lake on Databricks, reducing execution time by 80% and delivering €172.6K in annual compute cost savings.

Python • PySpark • SQL • Airflow • dbt • Snowflake • Databricks • Kafka • Terraform • CI/CD

Associate Consultant

ZS Associates

Sep 2021 - Sep 2022 (1 year)

Led a 4-person team to deliver on-time PySpark batch pipelines on AWS EMR processing 600 GB/day of clinical behavioral data for risk mitigation models.
Implemented SCD Type 2 change-tracking across 20+ tables to provide full historical audit trails for compliance reporting.

PySpark • SQL • ETL • Data Modeling • S3 • Batch Processing

Solutions Engineer

Zaloni Inc.

Aug 2018 - Aug 2021 (3 years)

Migrated 12 TB of SQL Server data to MongoDB using PySpark, Sqoop, and EMR, completing the transition to a NoSQL architecture in under 90 days.
Designed cross-region replication (RDS + Lambda + Kafka + Debezium), achieving an average replication lag under 10 seconds and 99.9% uptime for critical workloads.

PySpark • SQL Server • MongoDB • Sqoop • Kafka • Debezium • AWS Lambda • REST APIs