Humza Mahmud
@humzamahmud
Principal/Staff Data Engineer specializing in cloud-native distributed and streaming data platforms.
What I'm looking for
I am a Principal/Staff Data Engineer with over a decade of experience designing and operating cloud-native data platforms, distributed data systems, and analytics infrastructure across AWS, GCP, and Azure. I specialize in ETL/ELT, real-time streaming architectures, lakehouse solutions, and observability to deliver reliable, compliant data pipelines.
I have architected large-scale event-driven systems using Kafka, Spark, Flink, Databricks, Snowflake, BigQuery, and related services to support fraud analytics, healthcare analytics, and autonomous systems. My work includes performance tuning, cost optimization, secure data governance, and building ML-ready feature stores and model pipelines.
I drive platform modernization through Infrastructure as Code, reusable Terraform and Kubernetes modules, comprehensive observability (OpenTelemetry, Prometheus, Grafana, Datadog), and automated CI/CD. I mentor engineers, lead design reviews, and focus on delivering scalable, compliant platforms that reduce costs and speed up incident recovery.
Experience
Principal Distributed Data Systems Engineer
Lithic
Jan 2024 - Present (2 years 1 month)
Architected large-scale distributed data systems and real-time fraud analytics pipelines, reducing incident recovery time by 50% and cutting infrastructure costs by 35% while meeting SOC 2, PCI-DSS, and GDPR compliance requirements.
Senior Analytics Platform Engineer
Lightouch
Aug 2021 - Dec 2023 (2 years 4 months)
Designed and led development of a cloud-native healthcare lakehouse platform using Delta Lake and Spark, implemented PHI masking and RBAC, and reduced compute costs by 30% while enabling ML-ready feature stores.
Streaming Data Engineer
Ralpanda
Mar 2019 - Jul 2021 (2 years 4 months)
Built real-time telemetry ingestion and feature computation pipelines for autonomous driving using Apache Beam, Pub/Sub, and BigQuery, reducing BigQuery costs by 35% and improving pipeline reliability for ML serving.
ETL/ELT Data Engineer
MotherDuck
Aug 2016 - Jul 2019 (2 years 11 months)
Built and maintained scalable ETL/ELT pipelines with Airflow and AWS Glue, optimized Redshift performance, and improved pipeline reliability via idempotency and structured error handling.
Education
Unknown Institution
Bachelor of Science, Computer Science
Coursework and training supporting data engineering, distributed systems, and analytics platform development.
Tech stack
Amazon Redshift
Apache Spark
AWS Glue
Apache Flink
GitHub
GitLab
Kubernetes
Jenkins
GitHub Actions
GitLab CI
Pandas
PySpark
Debezium
dbt
Gmail
Databricks
Redis
Terraform
Java
JSON
Kafka
Apache Pulsar
FastAPI
Grafana
Prometheus
OpenTelemetry
Datadog
AWS Lambda
Airflow
Apache Beam
SQL
MotherDuck
Pinecone
Delta Lake
Great Expectations
ArgoCD
Lithic
Factory
Website
humzamahmud.dev
