Looking for a job

Bryan Schaefer

@bryanschaefer1

Lead Data Engineer building scalable data platforms for analytics and machine learning.

United States

What I'm looking for

I am seeking a senior data engineering role building reliable, scalable data platforms that support analytics and ML, on a collaborative team with strong CI/CD and observability practices.

I am a Lead Data Engineer specializing in architecting and operating scalable data platforms that power analytics, reporting, and machine learning workloads. I design and implement both batch and streaming pipelines, cloud-native architectures, and distributed processing frameworks using modern data stack technologies.

At Flatiron Health, I directed development of 30 batch and streaming pipelines processing 5TB daily and integrated 12 healthcare data sources to deliver curated analytics datasets that support BI and ML teams. I established data reliability and observability frameworks across 50 production pipelines, reducing failure rates by 35%, and delivered feature-ready datasets for 20 ML models.

Previously, I engineered ETL and streaming pipelines at Flare and Uber, optimizing multi-terabyte workflows, improving query performance and pipeline throughput, and strengthening data quality monitoring to reduce incidents. I mentor engineers, have introduced CI/CD and modular pipeline standards, and translate complex business and clinical requirements into production-grade, maintainable data solutions.

I bring strong foundations in data modeling, orchestration, observability, and automated data quality controls, and I collaborate closely with engineering, analytics, and data science teams to ensure platform scalability, reliability, and long-term sustainability.

Experience

Work history, roles, and key accomplishments

Current

Lead Data Engineer

Flatiron Health

Sep 2021 - Present (4 years 10 months)

Directed development of 30 batch and streaming pipelines processing 5TB daily to enable analytics and ML, established observability and reliability frameworks that reduced pipeline failures by 35%, and mentored a team of 6 engineers while improving deployment efficiency by 40%.

PySpark Data Modeling Observability CI/CD Data Quality Airflow

Senior Data Engineer

Flare

Dec 2018 - Aug 2021 (2 years 8 months)

Engineered 25 ETL and streaming pipelines ingesting 3TB daily from 10 systems, standardized transformation frameworks to reduce duplication by 30%, and optimized queries to cut average runtimes by 40%.

Spark Streaming ETL Data Modeling Query Optimization Data Quality Orchestration SQL

Data Engineer

Uber

Feb 2018 - Oct 2018 (8 months)

Built high-throughput Spark pipelines processing billions of ride and event records, improved pipeline throughput by 25% via partitioning strategies, and maintained reliability across 20 production workflows supporting operational analytics.

Apache Spark Partitioning ETL SQL Anomaly Detection Data Warehouse

Data Analyst Intern

MindEase

Jan 2017 - Jan 2018 (1 year)

Analyzed operational datasets with SQL and Python to support 10 reporting dashboards, implemented validation checks that improved reporting accuracy by 20%, and produced BI dashboards tracking 15 KPIs.

SQL Python Data Validation KPI Tracking Analytics Dashboarding

Education

Degrees, certifications, and relevant coursework

Texas Tech University

Bachelor of Science, Computer Science

2013 - 2016

Grade: 3.8

Completed a Bachelor of Science in Computer Science with coursework in algorithms, data structures, distributed systems, and database systems; applied Python, Java, and SQL in academic projects.