Peter Wong
@peterwong
Senior Data Engineer building governed lakehouse and real-time streaming platforms for regulated healthcare.
What I'm looking for
I’m a Senior Data Engineer with 12+ years of experience building hyperscale data platforms across Google, Databricks, and healthcare at Optum. I focus on designing governed, AI-ready systems that improve performance, reduce cost, and accelerate time-to-insight in high-stakes environments.
At Optum, I architected HIPAA-compliant real-time streaming pipelines using Apache Kafka, Debezium CDC, and Databricks Spark Structured Streaming to process 5M+ patient events daily with sub-5-minute latency. I also designed and deployed medallion lakehouse architectures on Delta Lake with Unity Catalog governance and Apache Iceberg compatibility, reducing query latency by 75% and lowering storage costs while supporting multimodal RAG and AI use cases.
I lead agentic, AI-assisted data quality and observability work with zero-ETL integrations, eliminating 90% of manual validation and achieving 99.99% data freshness SLAs across 200+ consumers. I optimize ELT orchestration with dbt and Apache Airflow for petabyte-scale datasets, and I’ve built production feature stores and contract-first ingestion capabilities that accelerate ML deployment cycles by 10x. This work builds on earlier lakehouse and streaming platform migrations at Databricks and foundational hyperscale pipelines on Google Cloud.
Experience
Work history, roles, and key accomplishments
Architected HIPAA-compliant real-time streaming pipelines with Kafka/Debezium and Databricks Spark, processing 5M+ patient events daily with sub-5-minute latency and improving predictive readmission accuracy by 22%. Designed a Delta Lake medallion lakehouse with Unity Catalog governance, reducing query latency by 75% and lowering storage costs while delivering a 99.99% data freshness SLA across 200+ consumers.
Delivered enterprise lakehouse migrations using Delta Lake, Apache Iceberg, and Unity Catalog, improving query performance by 80% and reducing infrastructure costs for Fortune 500 customers. Built real-time CDC and streaming pipelines handling 100M+ events/day at 99.99% uptime, and developed reusable Databricks workflow patterns that cut pipeline development time by 70%.
Designed and scaled production data pipelines with Google Cloud (Dataflow/Apache Beam, BigQuery) to process multi-petabyte datasets with sub-second latency. Led migration of legacy Hadoop workloads to cloud-native Pub/Sub + Dataflow + BigQuery, reducing operational overhead by 60% and enabling real-time analytics, and implemented governance controls that achieved 99.9% data reliability.
Education
Degrees, certifications, and relevant coursework
The University of Texas at Austin
Bachelor of Science, Computer Science
2011 - 2015
Earned a Bachelor of Science in Computer Science from The University of Texas at Austin (2011–2015).
Tech stack
Software and tools used professionally
Splunk
Apache Spark
Superset
Metabase
GitHub
GitLab
Kubernetes
Jenkins
CircleCI
GitHub Actions
GitLab CI
PySpark
Debezium
dbt
Hadoop
Django
Spring Boot
Databricks
Terraform
Azure DevOps
Java
Kafka
FastAPI
Grafana
Prometheus
OpenTelemetry
Datadog
OpenSearch
Apache Beam
Time Analytics
SQL
Buildkite
Apache Iceberg
Hex
Delta Lake
Lightdash
Bash
Agentic
OpenLineage
Unity Catalog
