Hajio Shao
@hajioshao
Senior Data Engineer specializing in ML infrastructure, scalable data pipelines, and cloud.
What I'm looking for
I am a Senior Data Engineer and ML Infrastructure Engineer with over eight years of experience building scalable data platforms and production ML systems. I design low-latency feature generation pipelines and integrate ML models into production to enable real-time personalization.
My work spans distributed processing with Apache Spark, Flink, and Kafka, orchestration using Airflow and Dagster, and cloud deployments across AWS, GCP, and Azure. I have led infrastructure efforts using Kubernetes, Docker, Terraform, and CI/CD tooling to automate model and service delivery.
I have delivered large-scale solutions including real-time indexing and serving, streaming feature generation, payment and fraud systems, and consolidated billing platforms. I emphasize reliability through monitoring, alerting, and data governance to maintain high-quality data for ML.
I mentor junior engineers, drive technical strategy, and collaborate with product teams to translate business needs into scalable architectures that support complex ML and AI use cases.
Experience
Work history, roles, and key accomplishments
Led development and optimization of Core ML feature infrastructure using Spark and Flink to enable low-latency feature generation across product surfaces, improving real-time personalization and pipeline reliability.
Senior Data Engineer
DiDi
Aug 2020 - Apr 2022 (1 year 8 months)
Built data platforms for a large-scale ride-sharing and payments ecosystem, implementing KYC, anti-fraud, and bad-debt models that improved payment security and risk detection across the 99Pay product.
Cloud Data Engineer
Classic Computers Corp
Aug 2017 - Aug 2020 (3 years)
Designed and automated cloud-based payment transaction pipelines on AWS using Kafka, Lambda, RDS, and S3, improving real-time processing, reconciliation, and scalable storage for millions of transactions.
Full Stack Engineer
Marlabs Inc
Mar 2016 - Apr 2017 (1 year 1 month)
Developed full-stack applications and RESTful APIs using Node.js and Python, implemented cloud storage on AWS S3, and automated backend processes to support scalable data ingestion and ETL workflows.
Education
Degrees, certifications, and relevant coursework
New York University
Master of Science, Computer Science
2013 - 2015
Completed a Master's degree in Computer Science, with coursework focused on advanced algorithms and systems.
University of Shanghai for Science and Technology
Bachelor of Science, Computer Science
2008 - 2012
Completed a Bachelor's degree in Computer Science, focusing on foundational computing principles and software development.
Tech stack
Software and tools used professionally
Azure Synapse
Apache Spark
Apache Flink
Apache Hive
Talend
AWS IAM
GitHub
GitLab
Kubernetes
AWS Fargate
Jenkins
GitHub Actions
GitLab CI
dbt
MySQL
PostgreSQL
MongoDB
Cassandra
Hadoop
Gmail
Node.js
Django
Yarn
Redis
Terraform
AWS CloudFormation
JavaScript
Java
AWS CloudTrail
TensorFlow
PyTorch
MLflow
scikit-learn
Keras
Kubeflow
Kafka
Grafana
Prometheus
Zookeeper
Xoom
Datadog
Elasticsearch
Ansible
AWS Lambda
Serverless
Airflow
Braintree
SQL
XGBoost
Hugging Face
LightGBM
CatBoost
Seldon
Dagster
