Sonia Hayes
@soniahayes
I’m a Principal Data Engineer designing enterprise lakehouse and AI-enabled distributed data platforms.
What I'm looking for
I’m a Principal Data Engineer and Data Architect with 10+ years designing enterprise-grade, multi-cloud lakehouse and distributed data platforms across healthcare, fintech, and enterprise domains. I focus on exactly-once streaming, CDC frameworks, metadata-driven ingestion, Data Vault modeling, and compliance with governance standards including HIPAA, HITRUST, and SOC 2.
I’ve built AI-enabled data foundations using feature stores, embedding pipelines, vector search, drift detection, and RAG-ready architectures across Databricks, Snowflake, Kafka, and Spark ecosystems. I also lead observability-driven DataOps, Spark performance tuning, partitioning strategies, and cost optimization, delivering 99.99% platform availability at scale.
Recently at DataForest, I architected a multi-cloud lakehouse ingesting 6TB+ daily across 140+ systems, designed exactly-once Kafka pipelines sustaining 75K events/sec at peak throughput, and reduced Spark shuffle skew by 39%. Earlier, at CorroHealth, I built a HIPAA/HITRUST/HITECH-aligned FHIR/HL7 lakehouse processing 2.7B+ claims and clinical records monthly, strengthening security with PHI masking, encryption at rest, and audit logging while reducing warehouse compute costs by 28%.
Experience
Work history, roles, and key accomplishments
Principal Data Engineer
DataForest
Jul 2023 - Present (2 years 11 months)
Architected a multi-cloud lakehouse ingesting 6TB+ daily across 140+ enterprise systems using Databricks and Snowflake. Designed exactly-once Kafka streaming pipelines (75K events/sec) and metadata-driven ingestion that reduced production failures by 61%, while achieving 99.99% pipeline availability and cutting annual compute spend by $1.6M.
Tech stack
Software and tools used professionally
AWS Glue
Apache Flink
Amazon S3
Google Cloud Storage
GitHub
GitLab
Kubernetes
GitHub Actions
GitLab CI
PySpark
Debezium
dbt
MySQL
PostgreSQL
MongoDB
Cassandra
Gmail
Databricks
Terraform
Java
MLflow
Kafka
Milvus
Airflow
Apache Beam
SQL
Dagster
Apache Iceberg
LangChain
LlamaIndex
Pinecone
Monte Carlo
Feast
DataHub
Delta Lake
Great Expectations
OpenMetadata
Trino
Apache Hudi
Collibra
dbt Cloud
Bash
Faiss
Microsoft Fabric
OpenLineage
Unity Catalog
Factory
Beam
Microsoft Purview
