Open to opportunities

Ryan Jepsen

@ryanjepsen

Senior data engineer specializing in low-latency real-time market data pipelines and scalable lakehouse platforms.

United States

What I'm looking for

I’m looking for a Senior Data Engineering role to build low-latency streaming platforms, modernize ETL into lakehouse architectures, and own data quality, governance, and observability from pipeline design through production.

I’m a Senior Data Engineer focused on building low-latency, real-time market data systems. At IMC Trading, I led the development of a Real-Time Market Data Processing Platform, architecting Spark and Scala pipelines integrated with Kafka and Kinesis, deployed on AWS, and reducing processing latency by 35%.

I also delivered an event-driven High-Frequency Data Ingestion Framework using Kafka with AWS SNS and SQS, implementing schema enforcement and fault-tolerant design to support millions of market events per second. I re-architected legacy batch ETL into Spark Structured Streaming jobs orchestrated via Airflow, persisting curated datasets into AWS S3 and Amazon Redshift for downstream analytics.

Beyond performance and reliability, I strengthen resiliency through idempotent processing, retries, and dead-letter queues, and I embed observability, data quality validation, and governance into pipelines delivered through CI/CD and Git-based version control. I’ve optimized distributed workloads with Spark tuning and dimensional modeling (star and snowflake schemas) to support petabyte-scale time-series datasets.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

IMC Trading

May 2021 - Present (5 years 2 months)

Led development of a real-time market data processing platform, architecting low-latency Spark/Scala pipelines integrated with Kafka and Kinesis, reducing processing latency by 35%. Built high-frequency ingestion and streaming ETL workflows (Airflow + Spark Structured Streaming) and improved throughput by 28% while reducing production incidents by 50%.

Apache Spark, Scala, Apache Kafka, Amazon Kinesis, AWS Glue, AWS S3, Amazon Redshift, Apache Airflow, Amazon CloudWatch

Big Data Engineer

KPMG US

Oct 2019 - May 2021 (1 year 7 months)

Spearheaded enterprise data modernization by migrating legacy SQL Server workflows into distributed Spark/PySpark pipelines on Azure Databricks using Delta Lake storage. Built regulatory reporting ELT pipelines with dbt and improved processing time by 45% through Spark performance tuning and orchestration.

Python, SQL, PySpark, Apache Spark, dbt, Azure Databricks, Delta Lake, Snowflake, Amazon Redshift, Apache Airflow

Data Engineer

Conversant LLC

Jun 2016 - Oct 2019 (3 years 4 months)

Engineered distributed data processing platforms handling 250B+ log records daily using Spark/Scala with Kafka, Hadoop, and YARN clusters. Migrated legacy MapReduce to Spark/Spark Streaming pipelines and improved execution performance by 60%, enabling near real-time analytics via Kafka + Flume + Elasticsearch.