Open to opportunities

Ali Shahid

@alishahid1

Senior Data Engineer specializing in cloud-scale Lakehouse architectures, streaming systems, and data reliability.

United States

What I'm looking for

I am looking for a senior data platform role where I can build scalable, governed Lakehouse and streaming systems with strong DataOps, observability, and cost efficiency at enterprise scale.

I am a Senior Data Engineer with 13 years of experience designing, building, and running cloud-scale data platforms across AWS, Azure, and GCP. I specialize in Lakehouse architectures, scalable batch and streaming pipelines (Spark, Flink, Kafka), CDC, and data governance to deliver reliable, well-governed data for analytics and ML.

In recent roles, I architected cloud-native Lakehouse platforms using Delta Lake and Apache Iceberg, built real-time ingestion and CDC pipelines with Flink, Kafka, and Debezium, and implemented metadata-driven orchestration with Airflow and Dagster. I also drove performance optimizations for Spark workloads, standardized platform infrastructure with Terraform and Kubernetes, and established observability with OpenTelemetry, Prometheus, and Grafana.

I bring a strong programming background in Python, Scala, SQL, Go, and Rust, along with a practical focus on DataOps automation, data quality, cost-efficient platform design, and federated analytics. I want to apply these skills to build reliable, scalable data platforms that support analytics, real-time reporting, and machine learning.

Experience

Work history, roles, and key accomplishments

Current

Staff Data Engineer

Datafold

Sep 2021 - Present (4 years 5 months)

Led the design and evolution of a cloud-native Lakehouse platform across AWS and Azure. Built real-time CDC and ingestion pipelines with Flink, Kafka, and Debezium, and implemented metadata-driven orchestration and observability to reduce incidents and optimize compute costs.


Data Engineer

AlphaSense

Jul 2015 - Mar 2019 (3 years 8 months)

Built and maintained large-scale ETL pipelines on AWS EMR and Azure HDInsight using PySpark/Scala, implemented CDC with Kafka Connect/Debezium, and automated data quality checks to improve pipeline reliability and reduce compute costs.


Junior Data Engineer

Enigma Technologies

May 2012 - May 2015 (3 years)

Developed foundational ETL pipelines with Talend, Python, and SQL to ingest ERP/CRM data into PostgreSQL and Hadoop, designed dimensional models for BI, and supported migration of on-prem Hadoop workloads to AWS S3/EMR.

Education

Degrees, certifications, and relevant coursework


Punjab University

Bachelor of Science, Computer Science

Completed a Bachelor of Science in Computer Science focused on core computing principles and software development.
