Looking for a job

Michael Li

@michaelli1

Senior data engineer building reliable Spark/Databricks pipelines to power analytics and ML products.

United States

What I'm looking for

I’m looking to build end-to-end data platforms and ELT/ETL systems with modern tooling, partner closely with product and ML teams, and deliver measurable gains in data latency, reliability, and decision-making through clean, reusable models.

I’m a Senior Data Engineer who designs and ships production-grade ELT/ETL and real-time data pipelines that make analytics and machine learning possible at scale. I’ve built Spark-based pipelines in Databricks and used streaming feature pipelines to improve ETA prediction and location ranking for search and routing systems.

At Lyft, I designed a Medallion architecture and dbt models in Snowflake, built feature pipelines to support ML teams, and worked with product, mapping, and data science partners to translate data/ML requirements into scalable architectures. I improved pickup/dropoff accuracy, reduced data latency by 17% for routing and search systems, and helped reduce deployment friction across ML teams.

I also focus heavily on reliability and data quality. I implemented Airflow workflows and monitoring for core mapping and mobility datasets, reducing incidents in critical operational datasets by 19%. I also delivered semantic layers and curated datasets for pickup/dropoff funnel and driver supply-demand metrics, cutting average pickup time by 7% across major markets.

Previously, at DigitalOcean and as a Data Analyst at Snowflake, I scaled batch and real-time pipelines with Airflow, Spark, and Kafka, re-architected transformations into bronze/silver/gold layers with dbt, and established data quality checks to reduce data incidents and discrepancies. I’m known for building simple, dependable systems that create measurable business impact while enabling self-service analytics.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Lyft

Jan 2023 - Present (3 years 6 months)

Built Spark-based ELT/ETL pipelines on Databricks to ingest ride, GPS, and map signal data, improving pickup/dropoff accuracy and reducing data latency by 17% for routing and search. Designed a Medallion architecture and dbt models in Snowflake, and developed streaming feature pipelines for ML-based ETA prediction and location ranking to improve model consistency and reduce deployment friction.

Apache Spark, Databricks, dbt, Snowflake, Medallion Architecture, Data Pipelines, Routing and Search Systems

Senior Data Engineer

DigitalOcean

Mar 2019 - Dec 2022 (3 years 9 months)

Designed and scaled batch and real-time data pipelines with Airflow, Spark, and Kafka, improving data availability latency by 27% for growth marketing and analytics teams. Re-architected transformations into dbt bronze/silver/gold layers, reducing downstream inconsistencies by 25%, and implemented dbt tests and Airflow monitoring to cut data incidents by 30%.

Airflow, Apache Spark, Kafka, dbt, Databricks, Data Modeling, Data Pipelines

Data Analyst

Snowflake

Jan 2018 - Mar 2019 (1 year 2 months)

Built and maintained SQL-based reporting and dashboards for product and GTM teams, improving decision-making speed through standardized, accessible metrics. Translated business requirements into reusable data models and performed reconciliations across multiple sources, reducing data discrepancies by 25%.

SQL, Snowflake, Reports and Dashboards, Data Modeling, KPI Tracking, Data Reconciliation, Analytics

Research Assistant

University of Florida

Apr 2017 - Dec 2017 (8 months)

Developed SQL/Python scripts to extract and transform data from multiple sources, improving data accessibility and reducing manual analysis effort for research teams. Collaborated with faculty and researchers to turn research questions into data-driven analyses supporting publications and project deliverables.

Python, SQL, Data Analysis, Research Collaboration, Data Processing