Open to opportunities

Peter Wong

@peterwong

Senior Data Engineer building governed lakehouse and real-time streaming platforms for regulated healthcare.

United States

What I'm looking for

I’m looking for a senior data engineering role where I can build governed lakehouse and real-time streaming platforms, apply agentic AI to data quality and observability, and deliver measurable performance and cost wins for analytics and ML teams.

I’m a Senior Data Engineer with 12+ years of experience building hyperscale data platforms at Google and Databricks and, most recently, in healthcare at Optum. I focus on designing governed, AI-ready systems that improve performance, reduce cost, and accelerate time-to-insight in high-stakes environments.

At Optum, I architected HIPAA-compliant real-time streaming pipelines using Apache Kafka, Debezium CDC, and Databricks Spark Structured Streaming that process 5M+ patient events daily with sub-5-minute latency. I also designed and deployed medallion lakehouse architectures on Delta Lake with Unity Catalog governance and Apache Iceberg compatibility, cutting query latency by 75% and lowering storage costs while supporting multimodal RAG and other AI use cases.

I lead agentic, AI-assisted data quality and observability built on zero-ETL integrations, eliminating 90% of manual validation and achieving 99.99% data freshness SLAs across 200+ consumers. I optimize ELT orchestration with dbt and Apache Airflow for petabyte-scale datasets, and I’ve built production feature stores and contract-first ingestion capabilities that have accelerated ML deployment cycles by 10x. This work builds on earlier lakehouse and streaming platform migrations at Databricks and foundational hyperscale pipelines on Google Cloud.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Optum

May 2021 - Present (5 years 2 months)

Architected HIPAA-compliant real-time streaming pipelines with Kafka/Debezium and Databricks Spark, processing 5M+ patient events daily with sub-5-minute latency and improving predictive readmission accuracy by 22%. Designed a Delta Lake medallion lakehouse with Unity Catalog governance, cutting query latency by 75%, lowering storage costs, and delivering a 99.99% data freshness SLA across 200+ consumers.

Debezium, Delta Lake, Unity Catalog, Medallion Lakehouse Architecture, dbt, Kafka, Airflow

Senior Data Engineer

Databricks

May 2017 - Apr 2021 (3 years 11 months)

Delivered enterprise lakehouse migrations using Delta Lake, Apache Iceberg, and Unity Catalog, improving query performance by 80% and reducing infrastructure costs for Fortune 500 customers. Built real-time CDC and streaming pipelines handling 100M+ events/day at 99.99% uptime, and developed reusable Databricks workflow patterns that cut pipeline development time by 70%.

Databricks, Unity Catalog, Delta Lake, Apache Iceberg, PySpark, Spark Structured Streaming, dbt, Liquid Clustering, Photon Engine, Kafka

Data Engineer

Google

Sep 2015 - May 2017 (1 year 8 months)

Designed and scaled production data pipelines on Google Cloud (Dataflow/Apache Beam, BigQuery) to process multi-petabyte datasets with sub-second latency. Led the migration of legacy Hadoop workloads to a cloud-native Pub/Sub + Dataflow + BigQuery stack, reducing operational overhead by 60% and enabling real-time analytics, and implemented governance controls that achieved 99.9% data reliability.

BigQuery, Google Cloud Dataflow, Pub/Sub, Cloud Composer, Real-Time CDC, Data Governance, PySpark, Kafka