Open to opportunities

Jerry Su

@jerrysu

Senior data engineer specializing in scalable ETL/ELT, data warehousing, and real-time analytics pipelines on AWS and Spark.

United States

What I'm looking for

I’m looking for a role where I can build scalable ETL/ELT and data warehousing platforms, improve pipeline reliability and SLAs, and deliver real-time data that helps analytics, reporting, and machine learning teams make faster, data-driven decisions.

I’m a Senior Data Engineer with extensive experience designing and building scalable data platforms, ETL/ELT pipelines, and real-time data processing systems. I focus on modern data architectures (data lakes, lakehouse patterns, and cloud data warehouses) that provide reliable analytics foundations for business and machine learning use cases.

At CVS Health, I built end-to-end healthcare claims and pharmacy analytics pipelines using Python, SQL, Airflow, AWS Glue, and Databricks. I implemented a Medallion architecture with Delta Lake on Amazon S3, optimized Spark workloads, and developed Snowflake warehouse models with dbt to reduce errors and accelerate deployments. I also enabled near real-time ingestion via Kinesis and improved data quality with Great Expectations.

Previously at Stripe and Amazon, I engineered low-latency event streaming with Apache Kafka and Spark Structured Streaming for fraud detection and risk scoring, and delivered batch and streaming financial analytics pipelines. I also built metadata-driven reporting in Redshift, automated data quality across 100+ datasets, and partnered with cross-functional teams to turn complex data into trustworthy, actionable insights.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

CVS Health

Apr 2023 - Present (3 years 3 months)

Designed and built healthcare claims and pharmacy analytics ETL/ELT pipelines processing millions of records using Python, SQL, Spark, and AWS Glue/Databricks. Implemented Medallion (Bronze/Silver/Gold) on Delta Lake, built Snowflake/dbt models, and reduced data latency via Kinesis-based near real-time ingestion while improving data quality through validation frameworks and monitoring.

Python, SQL, Apache Airflow, AWS Glue, Databricks, Delta Lake, Snowflake, dbt, Amazon Kinesis

Data Engineer

Stripe

Mar 2019 - Mar 2023 (4 years)

Enhanced real-time payment event streaming pipelines with Kafka and Kafka Streams, enabling low-latency processing of millions of transactions daily for fraud detection and risk scoring. Built batch and streaming ETL/ELT workflows with Spark, EMR, S3, and Snowflake, and migrated selected batch jobs to near real-time using Structured Streaming to improve timeliness.

Apache Kafka, Kafka Streams, Spark Structured Streaming, Amazon S3, Amazon EMR, Python, SQL, Snowflake, AWS Glue, Amazon CloudWatch

Data Engineer

Amazon

Jan 2014 - Feb 2019 (5 years 1 month)

Built scalable Seller Central data pipelines with Python, SQL, and Spark on EMR, cutting data preparation time by 30%. Developed a metadata-driven reporting system on Redshift using a star schema, automated data quality across 100+ datasets, and enabled near real-time analytics using Kinesis Firehose and Lambda, with BI in Tableau and QuickSight.