Karol Yamazaki
@karolyamazaki
Data Engineer turning raw data into low-latency intelligence for cloud-native, AI-first products.
What I'm looking for
I’m a Data Engineer with 10+ years of experience transforming raw data into actionable intelligence for cloud-native and AI-first products. I focus on designing scalable ETL and streaming systems that move data from ingestion to insight with measurable impact.
Most recently at Stripe (remote), I migrated core billing ETL from monolith jobs to Apache Airflow DAGs and AWS Lambda, cutting end-to-end latency by 46% and reducing monthly compute spend by 33%. I also architected streaming ingestion with Apache Kafka and Flink to process 8 million events per day, improving near-real-time analytics freshness from 4 hours to under 5 minutes.
Before that, at Endava (remote), I consolidated pipelines across GCP and AWS to reduce maintenance overhead by 48%, and automated releases with GitHub Actions and Jenkins, bringing release lead time from days to hours. I hardened delivery with encryption, IAM policies, and data contract validation, preventing 95% of breaking changes during releases.
I started as a full-stack and frontend developer and grew into production ML and data platforms through work at Preferred Networks and PKSHA Technology. Along the way, I built strong skills in infrastructure as code (Terraform), containerization (Docker/Kubernetes), observability (Prometheus/Grafana), and reliable ML deployment patterns with TensorFlow and PyTorch, always pairing engineering rigor with practical adoption.
Experience
Work history, roles, and key accomplishments
Stripe
Led migration of core billing ETL from monolithic jobs to Apache Airflow DAGs and AWS Lambda, cutting end-to-end latency by 46% and monthly compute spend by 33%. Built Kafka/Flink streaming ingestion for 8M events/day and improved near-real-time analytics freshness from 4 hours to under 5 minutes.
Endava
Designed and delivered cloud-native data platforms on GCP and AWS, consolidating fragmented pipelines into a unified stack and reducing maintenance overhead by 48%. Integrated CI/CD with GitHub Actions and Jenkins to automate ETL releases, cutting release lead time from days to hours for 15 services, and scaled Spark on Kubernetes to support a 3x daily data-volume increase while maintaining SLA targets.
Junior FullStack Developer
Preferred Networks
Feb 2017 - Feb 2020 (3 years)
Developed model training orchestration and feature pipelines in Python and Spark, enabling production ML experiments to run 3x faster and reducing iteration time from days to hours. Deployed containerized inference services with Docker and Kubernetes, serving 50k predictions/day with 98.7% uptime and sub-120 ms p99 latency.
Frontend Developer
PKSHA Technology
Sep 2015 - Jan 2017 (1 year 4 months)
Created interactive dashboards with React and D3.js to visualize model outputs, reducing stakeholder decision time by 33% and increasing usage to 1,200 monthly users. Improved TypeScript frontends integrated with GraphQL and optimized client performance through smaller bundles and lazy loading, decreasing time-to-interactive by 47% on average.
Education
Degrees, certifications, and relevant coursework
University of Hyogo
Bachelor’s Degree in Computer Science
2011 - 2015
Earned a Bachelor's degree in Computer Science at the University of Hyogo from 2011 to 2015.
Tech stack
Software and tools used professionally
Apache Spark
D3.js
GitHub
Kubernetes
Jenkins
GitHub Actions
Dask
MySQL
PostgreSQL
MongoDB
Hadoop
Next.js
pre-commit
Terraform
Jira
JavaScript
HTML5
Java
TensorFlow
PyTorch
MLflow
scikit-learn
Kafka
RabbitMQ
FastAPI
Grafana
Prometheus
Datadog
GraphQL
gRPC
Elasticsearch
AWS Lambda
pytest
Airflow
Time Analytics
SQL
Dagster
Bash
Increase
Core ML
