Justin Feng
@justinfeng
I’m a data engineer building scalable batch and streaming analytics platforms delivering near-real-time insights.
What I'm looking for
I’m a data engineer focused on building scalable data platforms and analytics systems that drive product decisions. As a Staff Software Engineer in Data Engineering, I helped design and deliver a Snowflake-backed merchant analytics platform ingesting and transforming 10B+ daily commerce events via Kafka and Apache Spark.
I build analytics-grade data models and transformations with dbt and SQL, delivering <15-minute data freshness across dashboards and APIs. I also lead hybrid streaming + batch pipeline architecture using Spark Structured Streaming, Airflow, and Snowflake ingestion patterns, reducing end-to-end latency by ~40% and improving reliability to a 99.9%+ SLA.
I’m equally committed to correctness and operational excellence: I set data quality, observability, and governance standards to cut customer-visible incidents by ~50% and accelerate experimentation. From an event-driven insurance platform to batch warehouse ETL with Airflow, Python, Spark, and Redshift, I consistently translate business requirements into feature-ready datasets while optimizing cost/performance and mentoring engineers on modern cloud data architecture.
Experience
Work history, roles, and key accomplishments
Designed and delivered a Snowflake-backed merchant analytics platform ingesting and transforming 10B+ daily commerce events via Kafka and Apache Spark, enabling near-real-time customer analytics with <15-minute data freshness. Built hybrid streaming + batch pipelines (Spark Structured Streaming, Airflow) to cut end-to-end latency ~40% and improve reliability to 99.9%+ SLA while leading data qualit
Senior Data Engineer
Homesite Insurance
Feb 2021 - Jul 2025 (4 years 5 months)
Designed an event-driven data platform for the full policy lifecycle, ingesting 300M+ daily events into AWS S3 and transforming them in Databricks (Spark) for downstream analytics and product features. Built real-time underwriting analytics with <30-minute quote-level signals, implemented a Snowflake analytics warehouse with dbt powering 100+ dashboards, and improved operational outcomes including
Data Engineer
Engage3
May 2019 - Feb 2021 (1 year 9 months)
Designed and implemented scalable batch data pipelines on AWS to ingest and normalize multi-channel customer engagement data (email, web, and campaign events) using Python, Apache Spark, and Amazon S3. Built and maintained a Redshift-based analytical warehouse with Airflow-orchestrated ETL, improving pipeline reliability and reducing end-user report latency ~40% through Spark tuning and Redshift q
Designed and delivered enterprise ETL pipelines for the State of Ohio eProcurement platform supporting 50,000+ users, and provided technical architecture recommendations that improved data processing efficiency ~30%.
Education
Degrees, certifications, and relevant coursework
Georgia Institute of Technology
Master of Science, Computer Science
2019 - 2022
Master of Science in Computer Science at Georgia Institute of Technology (2019–2022).
California State University
Bachelor's Degree, Business Information Systems
2013 - 2017
Bachelor's degree in Business Information Systems at California State University (2013–2017).
