Open to opportunities

ANISH BARAL

@anishbaral1

Senior Data/ML Engineer specializing in cloud-native, scalable data and ML platforms.

United States

What I'm looking for

I seek senior-level roles building cloud-native data platforms and MLOps systems, with strong engineering practices, governance, and cross-functional leadership.

I am a Senior Data/ML Engineer with 6+ years building cloud-native, scalable data platforms across healthcare, retail, and finance. I design secure, HIPAA-compliant data lakes and medallion architectures and migrate legacy workloads to modern cloud warehouses.

I build modular batch and streaming ETL pipelines with PySpark, Spark (Scala), Delta Lake, Databricks, Kafka, and AWS/Azure services, and I integrate ingestion frameworks (NiFi, ADF, Glue) to onboard 100+ sources. I apply dbt for modular SQL transformations and CI-driven data quality enforcement.

I collaborate with ML teams to deliver production MLOps, deploying models with MLflow, SageMaker, and Azure ML, and I have led initiatives in deep learning, NLP, computer vision, and LLMs for cancer diagnostics and patient stratification. I develop low-latency APIs and monitoring with FastAPI, Lambda, Prometheus, and Grafana.

I lead and mentor cross-functional teams, implement CI/CD and IaC (Terraform, GitHub Actions, Azure DevOps), and promote data governance, observability, and domain-driven architectures to deliver reliable analytics and data products that drive business outcomes.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Cardinal Health

May 2023 - Present (3 years 2 months)

Led ML and data engineering initiatives to build HIPAA-compliant medallion data platforms and production MLOps pipelines, migrating legacy Hadoop workloads to Azure Synapse/Databricks and improving query response times by 3x while maintaining 95%+ SLA adherence.

Databricks, Delta Lake, Azure Synapse, MLflow, SageMaker, Azure Data Factory, Apache NiFi, dbt, Airflow, PySpark

Data Engineer

Pfizer

Jan 2021 - Apr 2023 (2 years 3 months)

Built scalable ETL and ML pipelines for healthcare analytics, deployed HIPAA-compliant data lakes on S3 with Delta Lake, and enabled real-time patient insights using Kafka and Spark Structured Streaming to support clinical decision workflows.

PySpark, AWS Glue, S3, Delta Lake, Kafka, Spark Structured Streaming, dbt, Redshift, Snowflake, Airflow

Data Engineer

Dollar General

Jul 2018 - Dec 2020 (2 years 5 months)

Developed Spark-based ETL and real-time streaming pipelines on EMR and Kafka/Kinesis, optimized Spark jobs to reduce runtimes from 90 to 30 minutes, and implemented CI/CD and monitoring to improve pipeline reliability.

PySpark, Kafka, Kinesis, Redshift, GitLab CI, Oozie, Airflow, S3