Drake Nguyen
@drakenguyen1
I’m an AI Data Engineer with 8+ years of experience building scalable ML data pipelines and reproducible infrastructure, with a focus on GPU utilization.
What I'm looking for
I’m an AI Data Engineer with 8+ years of experience designing and operating large-scale data pipelines and machine-learning data infrastructure that power model training, evaluation, and continual improvement. I focus on dataset versioning, lineage tracking, and reproducibility controls so that ML teams can trust results and audit regulated data assets.
Across AWS and Databricks, I build high-throughput ingestion and transformation systems optimized for GPU utilization, with strong data quality, privacy redaction, and consent enforcement at scale. I also drive end-to-end observability and governance using tools like Unity Catalog, RBAC, and HashiCorp Vault, and I partner closely with ML researchers and engineers to translate research requirements into reliable, cost-optimized production pipelines.
Experience
Work history, roles, and key accomplishments
AI Data Engineer
Rearc
Sep 2025 - Present (9 months)
Architected and operated petabyte-scale AI data ingestion and transformation pipelines on AWS for multimodal enterprise ML workloads. Built Spark/Ray data loading systems to maximize GPU utilization, implemented dataset versioning and lineage for reproducible training, and added privacy redaction and data-quality observability to protect regulated datasets.
Senior AI Data Platform Engineer
H-E-B
Aug 2023 - Aug 2025 (2 years)
Managed and scaled AWS and Databricks data platforms used by 25+ data engineering and analytics teams, including ML workloads at petabyte scale. Implemented Unity Catalog governance (RBAC, lineage/provenance), automated credential lifecycle with Vault, improved monitoring for data drift and pipeline health, and reduced storage costs via Delta Lake compaction and compression.
Led development of incremental ETL pipelines with Databricks, Spark, and Delta Lake to deliver fresh, versioned time-series datasets for analytical modeling. Built a self-serve analytics platform with Snowflake and Tableau, revamped legacy pipelines to improve accuracy and reduce redundant computation, and introduced automated validation and anomaly detection with lineage documentation for reproducibility.
Built and operated large-scale Apache Spark pipelines processing millions of high-definition map records for autonomous vehicle programs across the US and Canada. Developed PySpark applications for spatial variance validation, automated daily quality reporting, and modernized deployments with Kubernetes and Azure pipeline-as-code to improve reproducibility and reduce pipeline runtimes.
Developed a Java application to automate and standardize internal combustion engine simulation preparation, reducing manual setup time and improving consistency of inputs. Modernized a legacy Apache Struts web application by migrating it to Java Spring to improve security, maintainability, and code review/testing practices.
Education
Degrees, certifications, and relevant coursework
Texas A&M University
Bachelor of Science, Computer Engineering
2015 - 2017
Bachelor of Science in Computer Engineering. Completed coursework focused on foundational engineering and computing concepts.
Lone Star College
Associate of Science, Engineering
2013 - 2015
Associate of Science in Engineering. Completed foundational engineering coursework.