Ali Shahid
@alishahid1
Senior Data Engineer specializing in cloud-scale Lakehouse architectures, streaming systems, and data reliability.
What I'm looking for
I am a Senior Data Engineer with 13 years of experience designing, building, and running cloud-scale data platforms across AWS, Azure, and GCP. I specialize in Lakehouse architectures; scalable batch and streaming pipelines built on Spark, Flink, and Kafka; change data capture (CDC); and data governance, delivering reliable, well-governed data for analytics and ML.
In recent roles I have architected cloud-native Lakehouse platforms using Delta Lake and Apache Iceberg, built real-time ingestion and CDC pipelines with Flink, Kafka, and Debezium, and implemented metadata-driven orchestration with Airflow and Dagster. I have also driven performance optimizations for Spark workloads, standardized platform infrastructure with Terraform and Kubernetes, and established observability with OpenTelemetry, Prometheus, and Grafana.
I bring a strong programming background in Python, Scala, SQL, Go, and Rust, and a practical focus on DataOps automation, data quality, cost-efficient platform design, and federated analytics. I'm looking to apply these skills to build reliable, scalable data platforms that power analytics, real-time reporting, and machine learning.
Experience
Work history, roles, and key accomplishments
Staff Data Engineer
Datafold
Sep 2021 - Present (4 years 5 months)
Led design and evolution of a cloud-native Lakehouse platform across AWS and Azure, built real-time CDC and ingestion pipelines with Flink, Kafka, and Debezium, and implemented metadata-driven orchestration and observability to reduce incidents and optimize compute costs.
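The CDC pattern behind pipelines like these can be shown in miniature: Debezium wraps each row change in an envelope with `op`, `before`, and `after` fields, and a downstream consumer applies those changes to a target table. A minimal sketch in plain Python, assuming simplified JSON envelopes (real deployments read these from Kafka topics via Flink or a Kafka consumer):

```python
import json

def apply_cdc_event(table: dict, raw_event: str) -> None:
    """Apply one Debezium-style change event to an in-memory table keyed by id.

    op codes follow Debezium conventions: c=create, u=update, d=delete, r=snapshot read.
    """
    event = json.loads(raw_event)
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        table[row["id"]] = row  # insert or overwrite with the new row image
    elif op == "d":
        table.pop(event["before"]["id"], None)  # delete if present

# Tiny demo: replay a create, an update, another create, and a delete.
table: dict = {}
events = [
    '{"op": "c", "before": null, "after": {"id": 1, "name": "ali"}}',
    '{"op": "u", "before": {"id": 1, "name": "ali"}, "after": {"id": 1, "name": "Ali"}}',
    '{"op": "c", "before": null, "after": {"id": 2, "name": "sara"}}',
    '{"op": "d", "before": {"id": 2, "name": "sara"}, "after": null}',
]
for raw in events:
    apply_cdc_event(table, raw)

print(table)  # {1: {'id': 1, 'name': 'Ali'}}
```

Applying events in log order like this keeps the replica eventually consistent with the source table, which is the core idea behind Debezium-based ingestion.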
Senior Data Engineer
Sigmoid
Apr 2019 - Aug 2021 (2 years 4 months)
Designed and operated scalable streaming pipelines processing billions of IoT events daily, migrated analytics to Snowflake with dbt and automated data quality checks, and enabled federated analytics via Trino/Starburst to unify cross-store queries.
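Automated data quality checks of the kind mentioned above typically assert expectations (non-null keys, uniqueness, value ranges) against each batch before it is published. A minimal sketch in plain Python, with hypothetical IoT-style rules standing in for tools like Great Expectations or Deequ:

```python
def check_batch(rows: list[dict]) -> list[str]:
    """Return a list of human-readable data quality failures for a batch of events."""
    failures = []
    ids = [r.get("device_id") for r in rows]
    if any(i is None for i in ids):
        failures.append("null device_id")          # completeness check
    if len(ids) != len(set(ids)):
        failures.append("duplicate device_id")     # uniqueness check
    if any(not (-50 <= r.get("temp_c", 0) <= 150) for r in rows):
        failures.append("temp_c out of range [-50, 150]")  # validity check
    return failures

good = [{"device_id": "a1", "temp_c": 21.5}, {"device_id": "b2", "temp_c": 19.0}]
bad = [{"device_id": "a1", "temp_c": 999.0}, {"device_id": "a1", "temp_c": 20.0}]

print(check_batch(good))  # []
print(check_batch(bad))   # ['duplicate device_id', 'temp_c out of range [-50, 150]']
```

In a real pipeline the batch would only be promoted to the serving layer when the failure list is empty; otherwise it is quarantined for inspection.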
Data Engineer
AlphaSense
Jul 2015 - Mar 2019 (3 years 8 months)
Built and maintained large-scale ETL pipelines on AWS EMR and Azure HDInsight using PySpark/Scala, implemented CDC with Kafka Connect/Debezium, and automated data quality checks to improve pipeline reliability and reduce compute costs.
Junior Data Engineer
Enigma Technologies
May 2012 - May 2015 (3 years)
Developed foundational ETL pipelines with Talend, Python, and SQL to ingest ERP/CRM data into PostgreSQL and Hadoop, designed dimensional models for BI, and supported migration of on-prem Hadoop workloads to AWS S3/EMR.
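Dimensional modeling of the kind described above splits data into a fact table of measures keyed to descriptive dimension tables. A minimal star-schema sketch using Python's built-in sqlite3, with hypothetical ERP-style tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: one row per customer, descriptive attributes only.
cur.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
# Fact table: one row per order, measures plus a foreign key into the dimension.
cur.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer_key INTEGER, amount REAL)"
)

cur.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                [(1, "Acme", "EU"), (2, "Globex", "US")])
cur.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)",
                [(100, 1, 250.0), (101, 1, 75.0), (102, 2, 40.0)])

# Typical BI query: total order amount by region, fact joined to dimension.
cur.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_orders f JOIN dim_customer d USING (customer_key)
    GROUP BY d.region ORDER BY d.region
""")
totals = cur.fetchall()
print(totals)  # [('EU', 325.0), ('US', 40.0)]
```

The same join pattern scales from PostgreSQL to Hadoop-era warehouses: facts stay narrow and additive, while dimensions carry the attributes BI users slice by.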
Education
Degrees, certifications, and relevant coursework
Punjab University
Bachelor of Science, Computer Science
Completed a Bachelor of Science in Computer Science focused on core computing principles and software development.
Tech stack
Software and tools used professionally
Azure HDInsight
Azure Synapse
AWS Glue
Apache Flink
Druid
Talend
Dremio
GitHub
Kubernetes
Jenkins
GitHub Actions
PySpark
Debezium
dbt
MySQL
PostgreSQL
Cassandra
Hadoop
InfluxDB
Gmail
Neo4j
Terraform
Pulumi
JSON
MLflow
Kubeflow
Kafka
FastAPI
Grafana
Prometheus
OpenTelemetry
Avro
Airflow
SQL
ClickHouse
Dagster
Apache Iceberg
Datafold
Tecton
Feast
DataHub
Delta Lake
Great Expectations
Trino
Amundsen
Starburst
Collibra
Deequ
OpenLineage
Beam
Website
alishahid.dev