Open to opportunities

Philip John

@philipjohn1

Senior data platform engineer building scalable cloud-native pipelines and reliable high-throughput data products.

United States

What I'm looking for

I’m looking to lead data platform and distributed pipeline work: improving reliability, latency, and cost efficiency with Python/Spark and cloud-native infrastructure, backed by strong observability and automation.

I’m a Senior Data Platform Engineer with 8+ years of experience designing scalable cloud-native data platforms, distributed data pipelines, and infrastructure automation across financial services, healthcare, and telecommunications. I specialize in Python, SQL, Spark-based processing, Kubernetes orchestration, and cloud data architecture, with a track record of delivering high-throughput systems processing 20+ TB/day while improving reliability, performance, and cost efficiency.

At Capital One, I designed an enterprise data platform supporting 200+ data products, built pipelines with Python/PySpark, Kafka, and Delta Lake, and reduced end-to-end latency from 8 hours to under 40 minutes. I also implemented reusable ingestion frameworks, CI/CD using Terraform and GitHub Actions, and observability with Prometheus and Grafana that reduced MTTR by 60%, while optimizing AWS infrastructure to save $1.2M annually. Previously, at UnitedHealth Group, I built HIPAA-compliant pipelines, migrated legacy ETL workflows to Snowflake, Airflow, and dbt, orchestrated 600+ workflows, improved data accuracy to 99.8%, and reduced cloud warehouse costs by 38%. At AT&T, I modernized Hadoop workloads to a Spark-based cloud architecture.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Capital One

Oct 2023 - Present (2 years 9 months)

Designed an enterprise data platform supporting 200+ data products across analytics and risk domains. Built Python/PySpark, Kafka, and Delta Lake pipelines processing 25+ TB/day, reducing end-to-end latency from 8 hours to under 40 minutes, and cut MTTR by 60% through observability built on Prometheus and Grafana.

Python SQL PySpark Kafka Delta Lake AWS Terraform GitHub Actions Prometheus Grafana

Data Platform Engineer

UnitedHealth Group

Apr 2020 - Oct 2023 (3 years 6 months)

Built HIPAA-compliant data pipelines processing 30M+ patient records and migrated legacy ETL to Snowflake, Airflow, and dbt. Orchestrated 600+ workflows, improved data accuracy to 99.8% using Python validation, and reduced cloud warehouse costs by 38%.

Python SQL Snowflake Airflow dbt Data Validation Query Optimization Partitioning HIPAA Compliance Machine Learning Pipelines

Data Engineer

AT&T

Oct 2017 - Feb 2020 (2 years 4 months)

Built ingestion pipelines processing 5B+ telecom events per month and developed Python ETL services for transformation and enrichment. Migrated Hadoop workloads to Spark-based cloud architecture and improved query performance by 60% using indexing and partitioning, with monitoring dashboards for pipeline health.

Dashboarding Python SQL ETL Spark Indexing Partitioning Data Enrichment