ANISH BARAL
@anishbaral1
Senior Data/ML Engineer specializing in cloud-native, scalable data and ML platforms.
What I'm looking for
I am a Senior Data/ML Engineer with 6+ years of experience building cloud-native, scalable data platforms across healthcare, retail, and finance. I design secure, HIPAA-compliant data lakes and medallion architectures and migrate legacy workloads to modern cloud warehouses.
I build modular batch and streaming ETL pipelines with PySpark, Spark (Scala), Delta Lake, Databricks, Kafka, and AWS/Azure services, and I integrate ingestion frameworks (NiFi, ADF, Glue) to onboard 100+ sources. I apply dbt for modular SQL transformations and CI-driven data quality enforcement.
I collaborate with ML teams to deliver production MLOps, deploying models with MLflow, SageMaker, and Azure ML, and have led initiatives in deep learning, NLP, computer vision, and LLMs for cancer diagnostics and patient stratification. I develop low-latency APIs and monitoring with FastAPI, Lambda, Prometheus, and Grafana.
I lead and mentor cross-functional teams, implement CI/CD and IaC (Terraform, GitHub Actions, Azure DevOps), and promote data governance, observability, and domain-driven architectures to deliver reliable analytics and data products that drive business outcomes.
Experience
Work history, roles, and key accomplishments
Senior Data Engineer
Cardinal Health
May 2023 - Present (2 years 4 months)
Led ML and data engineering initiatives to build HIPAA-compliant medallion data platforms and production MLOps pipelines, migrating legacy Hadoop workloads to Azure Synapse/Databricks and improving query response times by 3x while maintaining 95%+ SLA adherence.
Data Engineer
Pfizer
Jan 2021 - Apr 2023 (2 years 3 months)
Built scalable ETL and ML pipelines for healthcare analytics, deployed HIPAA-compliant data lakes on S3 with Delta Lake, and enabled real-time patient insights using Kafka and Spark Structured Streaming to support clinical decision workflows.
Data Engineer
Dollar General
Jul 2018 - Dec 2020 (2 years 5 months)
Developed Spark-based ETL and real-time streaming pipelines on EMR and Kafka/Kinesis, optimized Spark jobs to reduce runtimes from 90 to 30 minutes, and implemented CI/CD and monitoring to improve pipeline reliability.
Education
Degrees, certifications, and relevant coursework
Texas A&M University
Master of Science, Business Analytics
Completed a Master's program focused on business analytics with coursework in data analysis, statistical modeling, and data-driven decision making.
Tech stack
Software and tools used professionally
OpenAPI
Airbyte
Fivetran
Azure Synapse
Apache Spark
AWS Glue
Apache Flink
Talend
Amazon Quicksight
AWS IAM
Amazon S3
AWS Step Functions
GitHub
GitLab
Kubernetes
AWS CodePipeline
Jenkins
GitHub Actions
GitLab CI
NumPy
Pandas
PySpark
dbt
DB
Sqoop
PostgreSQL
MongoDB
Cassandra
Hadoop
Vertica
Gmail
Spring Boot
Databricks
Amazon Neptune
Neo4j
Redis
Terraform
Azure DevOps
Jira
JavaScript
TensorFlow
PyTorch
MLflow
scikit-learn
Neptune
Kafka
Apache NiFi
Apache Pulsar
FastAPI
PagerDuty
Grafana
Prometheus
Datadog
GraphQL
Elasticsearch
AWS Lambda
Serverless
Azure Functions
Azure SQL Database
Kafka Streams
pytest
Airflow
Apache Beam
Apache Oozie
Time Analytics
Root Cause
Amazon EMR
Amazon Athena
SQL
Amazon SageMaker
Azure Cosmos DB
XGBoost
AWS KMS
MinIO
LangChain
Ollama
Pinecone
Monte Carlo
Delta Lake
Great Expectations
