Pratyush Dulal
@pratyushdulal
Senior data engineer building scalable lakehouse and AI-enabled data pipelines in cloud ecosystems.
What I'm looking for
I’m a Senior Data Engineer with 7+ years designing and optimizing scalable, cloud-native data platforms and high-performance ETL/ELT pipelines. I build batch and real-time solutions across AWS, Azure, and GCP, routinely processing 20+ TB of data daily while cutting latency by 70% and lowering cloud costs by 35%.
In my recent role, I architected a cloud-native healthcare data platform with Azure Databricks, Snowflake, and ADLS Gen2, serving 1,000+ business users. I’ve led lakehouse modernization using Delta Lake (cutting onboarding timelines to less than 3 days), automated orchestration with Airflow/Data Factory (99.8% execution success), and improved throughput by 4.5x, while also retiring 50+ legacy workflows.
I also develop RAG-enabled AI data pipelines using Azure OpenAI, vector databases, embedding models, and LLM orchestration frameworks to enable enterprise knowledge retrieval. From governance and security (Unity Catalog, RBAC, Purview) to observability (Great Expectations, Juno, monitoring), I focus on secure, governed, analytics-ready data that drives measurable business value and operational reliability.
Experience
Work history, roles, and key accomplishments
Senior Data Engineer
Johnson & Johnson
Mar 2024 - Present (2 years 3 months)
Architected and scaled a cloud-native healthcare data platform on Azure Databricks, Snowflake, and ADLS Gen2, processing 18+ TB of data daily for 1,000+ business users. Improved reliability and speed by achieving a 99.8% pipeline success rate, cutting data delivery latency to under 10 minutes, delivering insights 35% faster, and reducing cloud costs by 35%+ ($1.2M annually).
Data Engineer
Amgen
Jun 2022 - Feb 2024 (1 year 8 months)
Built scalable batch and real-time data pipelines on GCP using Dataflow, Apache Beam, Pub/Sub, and BigQuery, moving 10+ TB/day across clinical and research domains. Increased orchestration reliability to 99.7%, reduced production data defects by 35%, and lowered cloud spend by 25% through performance and cost optimization.
Data Engineer
HCA Healthcare
Aug 2019 - May 2022 (2 years 9 months)
Developed batch ingestion and ETL workflows for healthcare and operational analytics, processing 2+ TB weekly using Python, SQL, Spark, and Hadoop. Improved pipeline performance and quality with 98%+ Airflow success rates, 20% faster batch processing, 30% better Redshift query performance, and 25% fewer recurring validation issues.
Education
Degrees, certifications, and relevant coursework
Fisk University
Bachelor of Science in Computer Science
Earned a Bachelor of Science in Computer Science from Fisk University.
Tech stack
Software and tools used professionally
Amazon Redshift
Azure Synapse
Apache Spark
AWS Glue
Apache Hive
AWS IAM
Amazon S3
Google Cloud Storage
GitHub
Kubernetes
Azure Kubernetes Service
GitHub Actions
PySpark
dbt
Sqoop
MySQL
PostgreSQL
MongoDB
Hadoop
HBase
Gmail
Databricks
Terraform
Java
Apache Flume
Kafka
Azure Monitor
Google Cloud Dataflow
Google Cloud Pub/Sub
Airflow
Apache Beam
Apache Oozie
Root Cause
SQL
Azure Cosmos DB
Delta Lake
Great Expectations
Bash
Unity Catalog
Azure Data Factory
Movement
Microsoft Purview
