Open to opportunities

Saujan Baniya

@saujanbaniya

I am a Senior Data Engineer building scalable cloud-native data platforms.

United States

What I'm looking for

I’m seeking a senior role building scalable, secure cloud data platforms and real-time pipelines in collaborative, compliance-focused teams where I can lead architecture, mentor engineers, and deliver production-grade analytics.

I am a results-driven Senior Data Engineer with over 7 years of experience designing and modernizing cloud-native data platforms across finance, healthcare, and telecom.

I have built multi-terabyte data warehouses and orchestrated PySpark ETL in Databricks, implemented Medallion Architecture with Delta Lake, and integrated dbt to standardize transformations and testing. I designed real-time processing with Kafka, Spark, and Flink to reduce data latency and enable operational insights.

I am proficient across AWS, Azure, and GCP and automate infrastructure using Terraform, CloudFormation, and CI/CD tools like GitHub Actions and Azure DevOps. I enforce data quality and governance with Great Expectations and Azure Purview while ensuring compliance with HIPAA, GDPR, and SOX.

I consistently deliver production-ready, secure solutions: I author documentation, mentor junior engineers, and build dashboards and APIs that support predictive analytics, regulatory reporting, and enterprise decision-making.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Pfizer

Aug 2022 - Present (3 years 11 months)

Designed and deployed a multi-terabyte data warehouse on AWS Redshift and built PySpark ETL workflows in Databricks across AWS S3 and GCP Storage to enable scalable transformations. Implemented Medallion Architecture with Delta Lake and dbt, built real-time Kafka/Spark/Flink pipelines to reduce data latency, automated IaC with Terraform, and enforced HIPAA/GDPR controls.

PySpark, Databricks, Redshift, Delta Lake, dbt, Kafka, Terraform, AWS

Data Engineer

Goldman Sachs

Aug 2021 - Jul 2022 (11 months)

Designed scalable ETL/ELT pipelines using Apache Spark and Azure Data Factory, ingesting data from over 25 sources into Azure Data Lake Storage and Snowflake. Built real-time streaming with Kafka and Event Hubs, integrated dbt and Great Expectations for testing, automated infrastructure as code and CI/CD, and led migrations that reduced operational costs by 30%.

Apache Spark, Azure Data Factory, Snowflake, Kafka, dbt, Great Expectations, Terraform, Azure DevOps

Data Engineer

LifePoint Health

Aug 2020 - Jul 2022 (1 year 11 months)

Built ETL pipelines with Azure Data Factory and Delta Lake in Azure Databricks to support CDC and modeled ML-ready datasets in Azure Synapse for analytics and reporting. Deployed dbt models and Great Expectations validations, automated infrastructure with Terraform and CI/CD, and delivered Power BI dashboards while enforcing HIPAA/GDPR controls.

Azure Data Factory, Databricks, Delta Lake, dbt, Great Expectations, Terraform, Azure Synapse, Power BI

Data Engineer

Verizon

Jan 2018 - Jun 2020 (2 years 5 months)

Developed Hadoop and Spark pipelines processing multi-terabyte clickstream and log data, migrating batch workflows to Spark to achieve 5x performance gains and lower compute costs. Built Kafka and NiFi streaming ingestion, set up Delta Lake and Parquet data lakes in S3, automated infrastructure with Terraform and CI/CD, and added data quality checks across pipelines.

Apache Spark, Hadoop, Kafka, Delta Lake, Terraform, Airflow, PySpark, S3