Open to opportunities

Michael Ma

@michaelma1

Senior data engineer building scalable lakehouse and streaming pipelines for analytics and experimentation at massive scale.

United States

What I'm looking for

I’m looking for a team where I can build scalable lakehouse/streaming pipelines, enforce strong data quality and lineage, and enable fast, reliable experimentation. I want ownership of end-to-end datasets that drive global product decisions.

I’m a Senior Data Engineer focused on turning complex, multi-source event data into reliable, production-grade datasets for analytics, ranking, and experimentation. I build end-to-end rider lifecycle and marketplace feature pipelines that directly power decision-making at scale.

At Uber, I modeled and operationalized the Ride Session Analytics Platform, which covers shopping through matching to trip completion, enabling analytics across 200M+ monthly active users and roughly 3B quarterly trips. I’ve also aligned metric definitions across 10+ teams, improving consistency in marketplace performance reporting.

I’ve engineered scalable lakehouse pipelines (Bronze/Silver/Gold) using Spark, Python, SQL, and AWS S3, processing multi-terabyte daily datasets from thousands of upstream sources. I integrated 5+ heterogeneous data sources into unified analytical tables, while implementing data quality checks for freshness, completeness, duplication, and schema drift.

Previously at Airbnb and DoorDash, I delivered Airflow-based ETL pipelines, batch and near-real-time integrations using Kafka/CDC patterns, and feature datasets supporting hundreds of concurrent A/B tests. I’m especially energized by optimizing incremental processing for late-arriving data, tuning Spark workloads to reduce runtime, and improving dataset usability through documentation, lineage, and ownership tracking.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Uber

Jan 2022 - Present (4 years 5 months)

Modeled the end-to-end rider lifecycle for Ride Session Analytics, enabling analytics across 200M+ monthly active users and roughly 3B quarterly trips. Built scalable lakehouse (Bronze/Silver/Gold) pipelines and improved their reliability and efficiency, cutting recurring batch runtimes by 20–30% and making marketplace reporting more consistent across 10+ teams.

Data Engineer

Airbnb

Jun 2019 - Dec 2021 (2 years 6 months)

Built data pipelines for Search & Discovery ranking and personalization, contributing to 90%+ of booking conversions through search and recommendation flows. Developed Airflow-based ETL with Spark/Hive/Presto, integrated Kafka/CDC for fresher data, and improved execution efficiency by ~25% while supporting datasets for hundreds of concurrent A/B tests.

Data Engineer

DoorDash

Jun 2016 - May 2019 (2 years 11 months)

Implemented ETL pipelines integrating consumer, merchant, and dasher data into centralized datasets used by multiple business and operations teams. Modeled core entities with star schema for delivery lifecycle reporting and supported experimentation datasets (including switchback testing) through data validation, backfills, and schema updates to improve production reliability.

Education

Degrees, certifications, and relevant coursework

University of Houston

Bachelor's Degree in Computer Science

2012 - 2016

Earned a bachelor's degree in computer science at the University of Houston from 2012 to 2016.
