Open to opportunities

Pratyush Dulal

@pratyushdulal

Senior data engineer building scalable lakehouse and AI-enabled data pipelines in cloud ecosystems.

United States

What I'm looking for

I’m looking for a team where I can build secure, governed lakehouse and streaming data platforms, partner with ML/AI teams on RAG pipelines, and deliver measurable reliability, performance, and cost improvements end-to-end.

I’m a Senior Data Engineer with 7+ years of experience designing and optimizing scalable, cloud-native data platforms and high-performance ETL/ELT pipelines. I build batch and real-time solutions across AWS, Azure, and GCP, routinely processing 20+ TB of data daily while cutting latency by 70% and lowering cloud costs by 35%.

In my most recent role, I architected a cloud-native healthcare data platform with Azure Databricks, Snowflake, and ADLS Gen2, serving 1,000+ business users. I led a lakehouse modernization on Delta Lake that cut onboarding timelines to under 3 days, automated orchestration with Airflow and Azure Data Factory (99.8% execution success), improved throughput by 4.5x, and retired 50+ legacy workflows.

I also develop RAG-enabled AI data pipelines using Azure OpenAI, vector databases, embedding models, and LLM orchestration frameworks to support enterprise knowledge retrieval. From governance and security (Unity Catalog, RBAC, Purview) to observability (Great Expectations, Juno, monitoring), I focus on delivering secure, governed, analytics-ready data that drives measurable business value and operational reliability.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Johnson & Johnson

Mar 2024 - Present (2 years 4 months)

Architected and scaled a cloud-native healthcare data platform on Azure Databricks, Snowflake, and ADLS Gen2, processing 18+ TB of data daily for 1,000+ business users. Improved reliability and speed by achieving a 99.8% pipeline success rate, cutting data delivery latency to under 10 minutes, delivering insights 35% faster, and reducing cloud costs by 35%+ ($1.2M annually).

Azure Databricks, Snowflake, Delta Lake, PySpark, Azure Data Factory, Unity Catalog, Kafka, Airflow, Azure Monitor, Azure OpenAI

Data Engineer

Amgen

Jun 2022 - Feb 2024 (1 year 8 months)

Built scalable batch and real-time data pipelines on GCP using Dataflow, Apache Beam, Pub/Sub, and BigQuery, moving 10+ TB/day across clinical and research domains. Increased orchestration reliability to 99.7%, reduced production data defects by 35%, and lowered cloud spend by 25% through performance and cost optimization.

Google Cloud Dataflow, Apache Beam, Pub/Sub, BigQuery, Databricks, PySpark, Cloud Composer, Terraform, Cloud Monitoring, Snowflake

Data Engineer

HCA Healthcare

Aug 2019 - May 2022 (2 years 9 months)

Developed batch ingestion and ETL workflows for healthcare and operational analytics, processing 2+ TB weekly using Python, SQL, Spark, and Hadoop. Improved pipeline performance and quality, achieving a 98%+ Airflow success rate, 20% faster batch processing, 30% better Redshift query performance, and 25% fewer recurring validation issues.