Open to opportunities

Karol Yamazaki

@karolyamazaki

Data Engineer turning raw data into low-latency intelligence for cloud-native, AI-first products.

Poland

What I'm looking for

I’m looking to build and scale low-latency ETL/streaming systems for cloud-native, AI-first products—using Airflow/Kafka/Spark, infrastructure as code, and strong observability—so teams can ship confidently with measurable SLAs and data governance.

I’m a Data Engineer with 10+ years transforming raw data into actionable intelligence for cloud-native and AI-first products. I focus on designing scalable ETL and streaming systems that move data from ingestion to insight with measurable impact.

Most recently at Stripe (remote), I migrated core billing ETL from monolith jobs to Apache Airflow DAGs and AWS Lambda, cutting end-to-end latency by 46% and reducing monthly compute spend by 33%. I also architected streaming ingestion with Apache Kafka and Flink to process 8 million events per day, improving near-real-time analytics freshness from 4 hours to under 5 minutes.

Before that, at Endava (remote) I consolidated pipelines across GCP and AWS to reduce maintenance overhead by 48%, and automated releases with GitHub Actions and Jenkins—bringing release lead time from days to hours. I hardened delivery with encryption, IAM policies, and data contract validation, preventing 95% of breaking changes during releases.

I started as a FullStack/Frontend developer and grew into production ML and data platforms through work at Preferred Networks and PKSHA Technology. Along the way, I became strong in infrastructure as code (Terraform), containerization (Docker/Kubernetes), observability (Prometheus/Grafana), and reliable ML deployment patterns with TensorFlow and PyTorch—always pairing engineering rigor with practical adoption.

Experience

Work history, roles, and key accomplishments

Current

Senior Software Engineer

Stripe

Feb 2023 - Present (3 years 5 months)

Led migration of core billing ETL from monolithic jobs to Apache Airflow DAGs and AWS Lambda, cutting end-to-end latency by 46% and monthly compute spend by 33%. Built Kafka/Flink streaming ingestion for 8M events/day and improved near-real-time analytics freshness from 4 hours to under 5 minutes.

Python SQL Apache Airflow AWS Lambda Apache Kafka Apache Flink Terraform Kubernetes Amazon Redshift

Senior Backend Developer

Endava

Mar 2020 - Jan 2023 (2 years 10 months)

Designed and delivered cloud-native data platforms on GCP and AWS, consolidating fragmented pipelines into a unified stack and reducing maintenance overhead by 48%. Integrated CI/CD with GitHub Actions and Jenkins to automate ETL releases, cutting release lead time from days to hours for 15 services, and scaled Spark on Kubernetes to support a 3x daily data-volume increase while maintaining SLA ta

Google Cloud Platform (GCP) AWS Apache Spark Kubernetes Apache Kafka Amazon Kinesis GitHub Actions Jenkins PostgreSQL MySQL

Junior FullStack Developer

Preferred Networks

Feb 2017 - Feb 2020 (3 years)

Developed model training orchestration and feature pipelines in Python and Spark, enabling production ML experiments to run 3x faster and reducing iteration time from days to hours. Deployed containerized inference services with Docker and Kubernetes, serving 50k predictions/day with 98.7% uptime and sub-120 ms p99 latency.

Python Apache Spark Apache Airflow MLflow Docker Kubernetes FastAPI gRPC MongoDB HDFS

Frontend Developer

PKSHA Technology

Sep 2015 - Jan 2017 (1 year 4 months)

Created interactive dashboards with React and D3.js to visualize model outputs, reducing stakeholder decision time by 33% and increasing usage to 1,200 monthly users. Improved TypeScript frontends integrated with GraphQL and optimized client performance by reducing bundle size and lazy-loading, decreasing time-to-interactive by 47% on average.

React D3.js TypeScript GraphQL TensorFlow Unit Testing Integration Testing Performance Optimization