Open to opportunities

Drake Nguyen

@drakenguyen1

I’m an AI Data Engineer with 8+ years of experience building scalable ML data pipelines and reproducible infrastructure, with a focus on GPU utilization.

United States

What I'm looking for

I’m looking for an AI data engineering role where I can build GPU-efficient, reproducible training pipelines with strong governance, privacy controls, and observability—working closely with ML teams to ship reliable, cost-optimized data infrastructure.

I’m an AI Data Engineer with 8+ years designing and operating large-scale data pipelines and machine-learning data infrastructure that powers model training, evaluation, and continual improvement. I focus on dataset versioning, lineage tracking, and reproducibility controls—so ML teams can trust results and audit regulated data assets.

Across AWS and Databricks, I build high-throughput ingestion and transformation systems optimized for GPU utilization, with strong data quality, privacy redaction, and consent enforcement at scale. I also drive end-to-end observability and governance using Unity Catalog, role-based access control (RBAC), and HashiCorp Vault, and I partner closely with ML researchers and engineers to translate research requirements into reliable, cost-optimized production pipelines.

Experience

Work history, roles, and key accomplishments

Current

AI Data Engineer

Rearc

Sep 2025 - Present (10 months)

Architected and operated petabyte-scale AI data ingestion and transformation pipelines on AWS for multimodal enterprise ML workloads. Built Spark/Ray data loading systems to maximize GPU utilization, implemented dataset versioning and lineage for reproducible training, and added privacy redaction and data-quality observability to protect regulated datasets.

Apache Spark, Ray, Delta Lake, Versioned Datasets, Parquet, Observability, S3

Senior AI Data Platform Engineer

H-E-B

Aug 2023 - Aug 2025 (2 years)

Managed and scaled AWS and Databricks data platforms used by 25+ data engineering and analytics teams, including ML workloads at petabyte scale. Implemented Unity Catalog governance (RBAC, lineage/provenance), automated credential lifecycle with Vault, improved monitoring for data drift and pipeline health, and reduced storage costs via Delta Lake compaction and compression.

Glue, IAM, CloudWatch, Databricks, Unity Catalog, Terraform, Delta Lake, Data Lineage, Delta Lake Optimization (Compaction, Z-Order), Performance Optimization, S3

Lead Data Engineer

Bonusly

Sep 2021 - Jul 2023 (1 year 10 months)

Led development of incremental ETL pipelines with Databricks, Spark, and Delta Lake to deliver fresh, versioned time-series datasets for analytical modeling. Built a self-serve analytics platform with Snowflake and Tableau, revamped legacy pipelines to improve accuracy and reduce redundant computation, and introduced automated validation and anomaly detection with lineage documentation for reproducibility.

Databricks, Apache Spark, Delta Lake, Snowflake, Tableau, Versioned Datasets, Data Validation

Big Data Engineer

General Motors

Mar 2019 - Sep 2021 (2 years 6 months)

Built and operated large-scale Apache Spark pipelines processing millions of high-definition map records for autonomous vehicle programs across the US and Canada. Developed PySpark applications for spatial variance validation, automated daily quality reporting, and modernized deployments with Kubernetes and Azure pipeline-as-code to improve reproducibility and reduce pipeline runtimes.

Apache Spark, PySpark, Spatial Validation (Variance), Kubernetes, Azure Pipelines

Software Engineer

General Motors

Jan 2018 - Mar 2019 (1 year 2 months)

Developed a Java application to automate and standardize internal combustion engine simulation preparation, reducing manual setup time and improving consistency of inputs. Modernized a legacy Apache Struts web application by migrating it to Java Spring to improve security, maintainability, and code review/testing practices.

Java, Spring Framework, Apache Struts Migration, Application Security, Testing, Maintainability and Compliance, Refactoring