Open to opportunities

Humza Mahmud

@humzamahmud

Principal/Staff Data Engineer specializing in cloud-native distributed and streaming data platforms.

United States

What I'm looking for

I am looking for roles building and scaling cloud-native distributed data platforms and real-time analytics systems, with a strong focus on observability, Infrastructure as Code (IaC), and compliance.

I am a Principal/Staff Data Engineer with over a decade of experience designing and operating cloud-native data platforms, distributed data systems, and analytics infrastructure across AWS, GCP, and Azure. I specialize in ETL/ELT, real-time streaming architectures, lakehouse solutions, and observability to deliver reliable, compliant data pipelines.

I have architected large-scale event-driven systems using Kafka, Spark, Flink, Databricks, Snowflake, BigQuery, and related services to support fraud analytics, healthcare analytics, and autonomous systems. My work includes performance tuning, cost optimization, secure data governance, and building ML-ready feature stores and model pipelines.

I drive platform modernization through Infrastructure as Code, reusable Terraform and Kubernetes modules, strong monitoring (OpenTelemetry, Prometheus, Grafana, Datadog), and automated CI/CD. I mentor engineers, lead design reviews, and focus on delivering scalable, compliant platforms that reduce costs and improve incident recovery.

Experience

Work history, roles, and key accomplishments

Current

Principal Distributed Data Systems Engineer

Lithic

Jan 2024 - Present (2 years 6 months)

Architected large-scale distributed data systems and real-time fraud analytics pipelines, reducing incident recovery time by 50% and cutting infrastructure costs by 35% while meeting SOC 2, PCI-DSS, and GDPR compliance.

Kafka, Apache Spark, Step Functions, Terraform, Kubernetes, OpenTelemetry, Prometheus, Datadog

Senior Analytics Platform Engineer

Lightouch

Aug 2021 - Dec 2023 (2 years 4 months)

Designed and led a cloud-native healthcare lakehouse platform using Delta Lake and Spark, implemented PHI masking and RBAC, and reduced compute costs by 30% while enabling ML-ready feature stores.

Delta Lake, Apache Spark, Databricks, Airflow, Terraform, GitHub Actions, DataHub, PHI Masking

Streaming Data Engineer

Ralpanda

Mar 2019 - Jul 2021 (2 years 4 months)

Built real-time telemetry ingestion and feature computation pipelines for autonomous driving using Apache Beam, Pub/Sub, and BigQuery, reducing BigQuery costs by 35% and improving pipeline reliability for ML serving.

Apache Beam, BigQuery, Dataflow, Kubeflow, Monitoring

ETL/ELT Data Engineer

MotherDuck

Aug 2016 - Jul 2019 (2 years 11 months)

Built and maintained scalable ETL/ELT pipelines with Airflow and AWS Glue, optimized Redshift performance, and improved pipeline reliability via idempotency and structured error handling.

Airflow, Python, SQL, AWS Glue, Amazon Redshift, ETL, Data Modeling