Michael Li
@michaelli1
Senior data engineer building reliable Spark/Databricks pipelines to power analytics and ML products.
What I'm looking for
I’m a Senior Data Engineer who designs and ships production-grade ELT/ETL and real-time data pipelines that make analytics and machine learning possible at scale. I’ve built Spark-based pipelines in Databricks and used streaming feature pipelines to improve ETA prediction and location ranking for search and routing systems.
At Lyft, I designed a Medallion architecture and dbt models in Snowflake, built feature pipelines to support ML teams, and worked with product, mapping, and data science partners to translate data/ML requirements into scalable architectures. I improved pickup/dropoff accuracy, reduced data latency by 17% for routing and search systems, and helped reduce deployment friction across ML teams.
I also focus heavily on reliability and data quality. I implemented Airflow workflows and monitoring for core mapping and mobility datasets, reducing incidents in critical operational datasets by 19%. I also delivered semantic layers and curated datasets for pickup/dropoff funnel and driver supply-demand metrics, cutting average pickup time by 7% across major markets.
Previously at DigitalOcean and as a Data Analyst, I scaled batch and real-time pipelines with Airflow, Spark, and Kafka, re-architected transformations into bronze/silver/gold layers with dbt, and established data quality checks to reduce data incidents and discrepancies. I’m known for building simple, dependable systems that create measurable business impact while enabling self-service analytics.
Experience
Work history, roles, and key accomplishments
Built Spark-based ELT/ETL pipelines on Databricks to ingest ride, GPS, and map signal data, improving pickup/dropoff accuracy and reducing data latency by 17% for routing and search. Designed a Medallion architecture and dbt models in Snowflake, and developed streaming feature pipelines for ML-based ETA prediction and location ranking to improve model consistency and reduce deployment friction.
Designed and scaled batch and real-time data pipelines with Airflow, Spark, and Kafka, improving data availability latency by 27% for growth marketing and analytics teams. Re-architected transformations into dbt bronze/silver/gold layers, reducing downstream inconsistencies by 25%, and implemented dbt tests and Airflow monitoring to cut data incidents by 30%.
Built and maintained SQL-based reporting and dashboards for product and GTM teams, improving decision-making speed through standardized, accessible metrics. Translated business requirements into reusable data models and performed reconciliations across multiple sources, reducing data discrepancies by 25%.
Developed SQL/Python scripts to extract and transform data from multiple sources, improving data accessibility and reducing manual analysis effort for research teams. Collaborated with faculty and researchers to turn research questions into data-driven analyses supporting publications and project deliverables.
Education
Degrees, certifications, and relevant coursework
University of Florida
Master of Science, Computer Science
2015 - 2017
Grade: 3.8
Earned a Master of Science in Computer Science at the University of Florida from 2015 to 2017.
University of Florida
Bachelor of Science, Computer Science
2011 - 2015
Earned a Bachelor of Science in Computer Science at the University of Florida from 2011 to 2015.
