Jerry Su
@jerrysu
Senior data engineer specializing in scalable ETL/ELT, data warehousing, and real-time analytics pipelines on AWS and Spark.
What I'm looking for
I’m a Senior Data Engineer with extensive experience designing and building scalable data platforms, ETL/ELT pipelines, and real-time data processing systems. I focus on modern data architectures (data lakes, lakehouse patterns, and cloud data warehouses) that give business and machine learning use cases a reliable analytics foundation.
At CVS Health, I built end-to-end healthcare claims and pharmacy analytics pipelines using Python, SQL, Airflow, AWS Glue, and Databricks. I implemented a Medallion architecture with Delta Lake on Amazon S3, optimized Spark workloads for efficiency, and developed Snowflake warehouse models with dbt to reduce errors and accelerate deployments. I also enabled near real-time ingestion via Kinesis and improved data quality with Great Expectations.
Previously at Stripe and Amazon, I engineered low-latency event streaming with Apache Kafka and Spark Structured Streaming for fraud detection and risk scoring, and delivered batch and streaming financial analytics pipelines. I also built metadata-driven reporting in Redshift, automated data quality across 100+ datasets, and partnered with cross-functional teams to turn complex data into trustworthy, actionable insights.
Experience
Work history, roles, and key accomplishments
Senior Data Engineer
CVS Health
Apr 2023 - Present (3 years 2 months)
Designed and built healthcare claims and pharmacy analytics ETL/ELT pipelines processing millions of records using Python, SQL, Spark, and AWS Glue/Databricks. Implemented Medallion (Bronze/Silver/Gold) on Delta Lake, built Snowflake/dbt models, and reduced data latency via Kinesis-based near real-time ingestion while improving data quality through validation frameworks and monitoring.
Stripe
Enhanced real-time payment event streaming pipelines with Kafka and Kafka Streams, enabling low-latency processing of millions of transactions daily for fraud detection and risk scoring. Built batch and streaming ETL/ELT workflows with Spark, EMR, S3, and Snowflake, and migrated selected batch jobs to near real-time using Structured Streaming to improve timeliness.
Amazon
Built scalable Seller Central data pipelines with Python, SQL, and Spark on EMR, cutting data preparation time by 30%. Developed a metadata-driven reporting system on Redshift using a star schema, automated data quality across 100+ datasets, and enabled near real-time analytics using Kinesis Firehose and Lambda, with BI in Tableau and QuickSight.
Education
Degrees, certifications, and relevant coursework
University of California, Merced
Bachelor of Science, Computer Science
2009 - 2013
Earned a Bachelor of Science in Computer Science at the University of California, Merced.
Tech stack
Software and tools used professionally
Amazon Redshift
Azure Synapse
Apache Spark
AWS Glue
Apache Flink
Amazon QuickSight
AWS IAM
Amazon CloudWatch
Amazon S3
Google Cloud Storage
PySpark
AWS Data Pipeline
dbt
MySQL
PostgreSQL
MongoDB
Hadoop
Gmail
Databricks
Java
Kafka
Amazon DynamoDB
Amazon Kinesis
Amazon Kinesis Firehose
Avro
AWS Lambda
Kafka Streams
Airflow
Time Analytics
Google BigQuery
Amazon EMR
Amazon Athena
SQL
Delta Lake
Great Expectations
Trino
Bash
Enhance
Factory
Remote
Jan
