Open to opportunities

Justin Feng

@justinfeng

I’m a data engineer building scalable batch and streaming analytics platforms delivering near-real-time insights.

United States

What I'm looking for

I’m looking to build product-driven data platforms—owning batch + streaming pipelines, data quality/governance, and analytics performance—while partnering with product teams and mentoring engineers to deliver fast, trustworthy insights.

I’m a data engineer focused on building scalable data platforms and analytics systems that drive product decisions. As a Staff Software Engineer in Data Engineering, I helped design and deliver a Snowflake-backed merchant analytics platform ingesting and transforming 10B+ daily commerce events via Kafka and Apache Spark.

I build analytics-grade data models and transformations with dbt and SQL, delivering <15-minute data freshness across dashboards and APIs. I also lead hybrid streaming + batch pipeline architecture using Spark Structured Streaming, Airflow, and Snowflake ingestion patterns, reducing end-to-end latency by ~40% and raising reliability to a 99.9%+ SLA.

I’m equally committed to correctness and operational excellence: I set data quality, observability, and governance standards to cut customer-visible incidents by ~50% and accelerate experimentation. From an event-driven insurance platform to batch warehouse ETL with Airflow, Python, Spark, and Redshift, I consistently translate business requirements into feature-ready datasets while optimizing cost/performance and mentoring engineers on modern cloud data architecture.

Experience

Work history, roles, and key accomplishments

Current

Staff Software Engineer

Shopify

Jul 2025 - Present (1 year)

Designed and delivered a Snowflake-backed merchant analytics platform ingesting and transforming 10B+ daily commerce events via Kafka and Apache Spark, enabling near-real-time customer analytics with <15-minute data freshness. Built hybrid streaming + batch pipelines (Spark Structured Streaming, Airflow) that cut end-to-end latency by ~40% and raised reliability to a 99.9%+ SLA, while leading data quality…

Snowflake Apache Spark Kafka dbt SQL Python Airflow Spark Structured Streaming AWS Data Architecture

Senior Data Engineer

Homesite Insurance

Feb 2021 - Jul 2025 (4 years 5 months)

Designed an event-driven data platform for the full policy lifecycle, ingesting 300M+ daily events into AWS S3 and transforming them in Databricks (Spark) for downstream analytics and product features. Built real-time underwriting analytics with <30-minute quote-level signals, implemented a Snowflake analytics warehouse with dbt powering 100+ dashboards, and improved operational outcomes including…

Apache Spark Databricks Snowflake dbt SQL Python Data Modeling S3

Data Engineer

Engage3

May 2019 - Feb 2021 (1 year 9 months)

Designed and implemented scalable batch data pipelines on AWS to ingest and normalize multi-channel customer engagement data (email, web, and campaign events) using Python, Apache Spark, and Amazon S3. Built and maintained a Redshift-based analytical warehouse with Airflow-orchestrated ETL, improving pipeline reliability and reducing end-user report latency by ~40% through Spark tuning and Redshift q…

S3 Python Apache Spark Amazon Redshift SQL ETL Airflow Data Warehouse

Data & Analytics Associate

KPMG

Jul 2017 - Feb 2019 (1 year 7 months)

Designed and delivered enterprise ETL pipelines for the State of Ohio eProcurement platform supporting 50,000+ users, and provided technical architecture recommendations that improved data processing efficiency ~30%.

ETL Data Pipeline Architecture Data Integration Performance Optimization Data Engineering