Henry Phan
@henryphan
Senior data engineer building cloud-native platforms, reducing latency, and improving data quality.
What I'm looking for
I’m a data engineer with 7 years of experience building cloud-native data platforms and analytics solutions across energy and software, focused on scalable pipelines, measurable performance, and trustworthy data. I’ve used Python, SQL, ETL/ELT, Airflow, dbt, Spark, and Kafka to reduce ETL latency, improve data quality, and enable self-serve analytics.
Most recently, I designed a GCP data lake with BigQuery, Cloud Storage, and Dataflow (reducing query time by 60%) and delivered near-real-time order processing with Kafka and Dataflow (cutting order-to-visibility latency from minutes to seconds). I also reduced BigQuery costs by 30% and introduced observability with Prometheus and Grafana, improving incident response times by 45%, while mentoring engineers and building CI/CD for safer dbt and Airflow deployments.
Experience
Work history, roles, and key accomplishments
Senior Data Engineer
CookUnity
May 2025 - Mar 2026 (10 months)
Designed a cloud-native data lake on GCP, centralizing datasets to reduce query time by 60%. Built Airflow and Kafka/Dataflow pipelines to improve data freshness and cut order-to-visibility latency from minutes to seconds.
Data Engineer
Umbrage
Aug 2023 - May 2025 (1 year 9 months)
Led migration of legacy ETL jobs from on-prem Hadoop to Dataproc and Cloud Storage, making processing 3x faster and reducing infrastructure costs by 25%. Built event-driven ingestion and modular Airflow pipelines, cutting monthly manual interventions during failures by 70%.
Data Engineer
Bluware
Jan 2021 - Sep 2023 (2 years 8 months)
Built scalable Spark workflows for seismic and well log data, making processing 4x faster and accelerating interpretation cycles. Developed ETL pipelines and dbt/SQL transformations to improve data integrity and downstream ML feature accuracy.
Data Engineer
EnergyMakers Advisory Group
Jan 2019 - Jan 2021 (2 years)
Designed a centralized data platform consolidating SCADA, meter, and market data, improving asset-optimization visibility and reducing retrieval times by 50%. Built ETL pipelines and scheduled Spark/Airflow jobs to normalize time-series data and cut manual reconciliation effort by 65%.
Education
Degrees, certifications, and relevant coursework
Rice University
Master of Science in Subsurface Data Science
2018 - 2019
Texas A&M University
Bachelor of Science in Geology/Earth Science
2015 - 2017
Tech stack
Software and tools used professionally
Splunk
Apache Spark
Apache Flink
Google Cloud Platform
Stackdriver
Google Cloud Storage
GitHub
Kubernetes
Jenkins
CircleCI
GitHub Actions
NumPy
Pandas
Dask
dbt
MySQL
PostgreSQL
MongoDB
Hadoop
Redis
Terraform
Java
JSON
Kafka
Grafana
Prometheus
OpenTelemetry
Datadog
Elasticsearch
Ansible
Kafka Streams
Airflow
Apache Beam
Time Analytics
SQL
Dagster
Bash
