Open to opportunities

Sandip Jaishwal

@sandipjaishwal

Senior Data Engineer specializing in scalable cloud data platforms and GenAI pipelines.

United States

What I'm looking for

I am seeking senior data engineering roles building secure, scalable cloud data platforms and pipelines that support GenAI, with opportunities for cross-functional collaboration and measurable impact.

I am a Senior Data Engineer with over seven years building scalable data solutions across AWS, Azure, and GCP, and six years designing Spark pipelines in Scala on AWS EMR. I design both real-time and batch pipelines, enabling robust analytics and ML use cases.

I have delivered data platforms that power Generative AI/LLM solutions using LangChain, RAG pipelines, and MLflow, and I’ve collaborated closely with data science teams to productionize models and feature stores. My work has supported healthcare and finance analytics, compliance reporting, and executive dashboards.

I build ingestion frameworks with Kafka, Kinesis, NiFi, and Fivetran; implement ETL/ELT with Glue, Airflow, dbt, and Spark (Scala/PySpark); and manage cloud infrastructure with Terraform, CloudFormation, and CI/CD pipelines. I also implement data quality frameworks using Great Expectations and Glue DataBrew to ensure reliable data delivery.

I prioritize secure, governed architectures (IAM, KMS, Lake Formation, Key Vaults, VPC endpoints, and HIPAA/GDPR compliance) while enabling self-service analytics through QuickSight, Looker, and Power BI. I work effectively in Agile teams and focus on measurable improvements in reliability, performance, and data quality.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

UnitedHealth Group

Jan 2023 - Present (3 years 2 months)

Built a central S3 data lake and Spark/Glue ETL pipelines to support clinical, pharmacy, and insurance analytics, improving report performance and supporting GenAI knowledge bases for LLM search and reasoning.

AWS Glue, PySpark, Scala, EMR, Redshift, S3, Terraform, Airflow, Great Expectations

Data Engineer

Berkshire Hathaway

Aug 2020 - Nov 2022 (2 years 3 months)

Migrated legacy ETL to Scala Spark pipelines on EMR and designed Redshift models and data lakes, reducing processing time and data quality incidents while ensuring HIPAA-compliant governance.

Scala, Spark, EMR, Redshift, Snowflake, dbt, Great Expectations, Terraform, AWS Glue

ETL Developer

The Cigna Group

Jun 2017 - Jul 2020 (3 years 1 month)

Developed batch and near-real-time ETL frameworks for IoT and EHR data using Spark on EMR and Azure Synapse, enabling ingestion of 100+ GB/day and improving predictive maintenance scheduling by 35%.

PySpark, Scala, EMR, Azure Synapse, Data Lake, Terraform, Kafka, API Gateway