Saujan Baniya
@saujanbaniya
I am a Senior Data Engineer building scalable cloud-native data platforms.
What I'm looking for
I am a results-driven Senior Data Engineer with over 7 years of experience designing and modernizing cloud-native data platforms across finance, healthcare, and telecom.
I have built multi-terabyte data warehouses, orchestrated PySpark ETL in Databricks, implemented the Medallion Architecture with Delta Lake, and integrated dbt to standardize transformations and testing. I designed real-time processing pipelines with Kafka, Spark, and Flink to reduce data latency and enable operational insights.
I am proficient across AWS, Azure, and GCP, and I automate infrastructure using Terraform and CloudFormation with CI/CD tools such as GitHub Actions and Azure DevOps. I enforce data quality and governance with Great Expectations and Azure Purview while ensuring compliance with HIPAA, GDPR, and SOX.
I consistently deliver secure, production-ready solutions. I author documentation, mentor junior engineers, and build dashboards and APIs that support predictive analytics, regulatory reporting, and enterprise decision-making.
Experience
Designed and deployed a multi-terabyte data warehouse on AWS Redshift and built PySpark ETL workflows in Databricks across AWS S3 and Google Cloud Storage to enable scalable transformations. Implemented the Medallion Architecture with Delta Lake and dbt, built real-time Kafka/Spark/Flink pipelines to reduce data latency, automated infrastructure as code (IaC) with Terraform, and enforced HIPAA/GDPR controls.
Designed scalable ETL/ELT pipelines using Apache Spark and Azure Data Factory, ingesting data from more than 25 sources into Azure Data Lake Storage and Snowflake. Built real-time streaming with Kafka and Azure Event Hubs, integrated dbt and Great Expectations for data testing, automated IaC and CI/CD, and led migrations that reduced operational costs by 30%.
Data Engineer
LifePoint Health
Aug 2020 - Jul 2022 (1 year 11 months)
Built ETL pipelines with Azure Data Factory and Delta Lake in Azure Databricks to support change data capture (CDC), and modeled ML-ready datasets in Azure Synapse for analytics and reporting. Deployed dbt models and Great Expectations validations, automated infrastructure with Terraform and CI/CD, and delivered Power BI dashboards while enforcing HIPAA/GDPR controls.
Data Engineer
Verizon
Jan 2018 - Jun 2020 (2 years 5 months)
Developed Hadoop and Spark pipelines processing multi-terabyte clickstream and log data, migrating batch workflows to Spark for 5x performance gains and lower compute costs. Built streaming ingestion with Kafka and NiFi, implemented Delta Lake and Parquet data lakes on S3, automated infrastructure with Terraform and CI/CD, and added data quality checks across pipelines.
Education
The University of Findlay
Master of Business Administration, Business Analytics
MBA with a concentration in Business Analytics from The University of Findlay, focusing on data-driven decision-making.
Tech stack
Azure Synapse
Apache Spark
AWS Glue
Apache Flink
SAS
Data Studio
Amazon QuickSight
GitHub
GitLab
Kubernetes
Jenkins
GitHub Actions
PySpark
dbt
DB
Sqoop
MySQL
PostgreSQL
MongoDB
Cassandra
Hadoop
Gmail
Django
Databricks
Terraform
AWS CloudFormation
Azure DevOps
Jira
Java
TensorFlow
PyTorch
MLflow
scikit-learn
Kafka
Apache NiFi
Grafana
Kibana
Prometheus
Azure Monitor
Zookeeper
Ubuntu
Linux
macOS
Windows
Azure Active Directory
Elasticsearch
AWS Lambda
Serverless
pytest
Airflow
SQL
ServiceNow
XGBoost
LightGBM
CatBoost
Delta Lake
Great Expectations
