Sandip Jaishwal
@sandipjaishwal
Senior Data Engineer specializing in scalable cloud data platforms and GenAI pipelines.
What I'm looking for
I am a Senior Data Engineer with over seven years of experience building scalable data solutions across AWS, Azure, and GCP, and six years designing Spark pipelines in Scala on AWS EMR. I design both real-time and batch pipelines that support analytics and ML use cases.
I have delivered data platforms that power Generative AI/LLM solutions using LangChain, RAG pipelines, and MLflow, and I’ve collaborated closely with data science teams to productionize models and feature stores. My work has supported healthcare and finance analytics, compliance reporting, and executive dashboards.
On the technical side, I build ingestion frameworks with Kafka, Kinesis, NiFi, and Fivetran, implement ETL/ELT with Glue, Airflow, dbt, and Spark (Scala/PySpark), and manage cloud infrastructure with Terraform, CloudFormation, and CI/CD pipelines. I also implement data quality frameworks using Great Expectations and Glue DataBrew to ensure reliable data delivery.
I prioritize secure, governed architectures built on IAM, KMS, Lake Formation, Key Vaults, and VPC endpoints, with HIPAA/GDPR compliance, while enabling self-service analytics through QuickSight, Looker, and Power BI. I work effectively in Agile teams and focus on measurable improvements in reliability, performance, and data quality.
Experience
Work history, roles, and key accomplishments
Senior Data Engineer
UnitedHealth Group
Jan 2023 - Present (2 years 8 months)
Built a central S3 data lake and Spark/Glue ETL pipelines to support clinical, pharmacy, and insurance analytics, improving report performance and supporting GenAI knowledge bases for LLM search and reasoning.
Data Engineer
Berkshire Hathaway
Aug 2020 - Nov 2022 (2 years 3 months)
Migrated legacy ETL to Scala Spark pipelines on EMR and designed Redshift models and data lakes, reducing processing time and data quality incidents while ensuring HIPAA-compliant governance.
ETL Developer
The Cigna Group
Jun 2017 - Jul 2020 (3 years 1 month)
Developed batch and near-real-time ETL frameworks for IoT and EHR data using Spark on EMR and Azure Synapse, enabling ingestion of 100+ GB/day and improving predictive maintenance scheduling by 35%.
Tech stack
Software and tools used professionally
Amazon Redshift
Airbyte
Fivetran
Azure Synapse
Apache Spark
AWS Glue
Apache Flink
AtScale
Superset
Data Studio
Amazon QuickSight
AWS IAM
Google Cloud Platform
Amazon CloudWatch
Amazon S3
Google Cloud Storage
GitHub
Bitbucket
Kubernetes
AWS CodePipeline
Jenkins
GitHub Actions
NumPy
Pandas
PySpark
AWS Glue DataBrew
AWS Data Pipeline
dbt
PostgreSQL
Hadoop
HBase
Gmail
Databricks
Dist
Terraform
AWS CloudFormation
Azure DevOps
Jira
Java
JSON
XML
TensorFlow
MLflow
scikit-learn
Kafka
Apache NiFi
Prometheus
Datadog
AWS X-Ray
Amazon Kinesis
Amazon Macie
Avro
AWS Lambda
Serverless
Airflow
AWS Backup
NetSuite
SQL
ServiceNow
XGBoost
Workato
Hugging Face
AWS KMS
Mode Analytics
Apache Iceberg
LangChain
Hightouch
Hex
Ray
Delta Lake
Great Expectations
