Ali Shahid
@alishahid1
Senior Data Engineer specializing in cloud-scale Lakehouse architectures, streaming systems, and data reliability.
What I'm looking for
I am a Senior Data Engineer with 13 years of experience designing, building, and running cloud-scale data platforms across AWS, Azure, and GCP. I specialize in Lakehouse architectures; scalable batch and streaming pipelines built on Spark, Flink, and Kafka; change data capture (CDC); and data governance, delivering reliable, well-governed data for analytics and ML.
In recent roles I have architected cloud-native Lakehouse platforms using Delta Lake and Apache Iceberg, built real-time ingestion and CDC pipelines with Flink, Kafka, and Debezium, and implemented metadata-driven orchestration with Airflow and Dagster. I have also driven performance optimizations for Spark workloads, standardized platform infrastructure with Terraform and Kubernetes, and established observability with OpenTelemetry, Prometheus, and Grafana.
I bring a strong programming background in Python, Scala, SQL, Go, and Rust, and a practical focus on DataOps automation, data quality, cost-efficient platform design, and federated analytics. I'm looking to apply these skills to build reliable, scalable data platforms that power analytics, real-time reporting, and machine learning.
Experience
Work history, roles, and key accomplishments
Staff Data Engineer
Datafold
Sep 2021 - Present (4 years 5 months)
Led design and evolution of a cloud-native Lakehouse platform across AWS and Azure, built real-time CDC and ingestion pipelines with Flink, Kafka, and Debezium, and implemented metadata-driven orchestration and observability to reduce incidents and optimize compute costs.
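The CDC pattern behind pipelines like these can be shown in miniature: Debezium wraps each row change in an envelope with `op`, `before`, and `after` fields, and a downstream consumer applies those changes to a target table. A minimal sketch in plain Python, assuming simplified JSON envelopes (real deployments read these from Kafka topics via Flink or a Kafka consumer):

```python
import json

def apply_cdc_event(table: dict, raw_event: str) -> None:
    """Apply one Debezium-style change event to an in-memory table keyed by id.

    op codes follow Debezium conventions: c=create, u=update, d=delete, r=snapshot read.
    """
    event = json.loads(raw_event)
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        table[row["id"]] = row  # insert or overwrite with the new row image
    elif op == "d":
        table.pop(event["before"]["id"], None)  # delete if present

# Tiny demo: replay a create, an update, another create, and a delete.
table: dict = {}
events = [
    '{"op": "c", "before": null, "after": {"id": 1, "name": "ali"}}',
    '{"op": "u", "before": {"id": 1, "name": "ali"}, "after": {"id": 1, "name": "Ali"}}',
    '{"op": "c", "before": null, "after": {"id": 2, "name": "sara"}}',
    '{"op": "d", "before": {"id": 2, "name": "sara"}, "after": null}',
]
for raw in events:
    apply_cdc_event(table, raw)

print(table)  # {1: {'id': 1, 'name': 'Ali'}}
```

Applying events in log order like this keeps the replica eventually consistent with the source table, which is the core idea behind Debezium-based ingestion.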
Senior Data Engineer
Sigmoid
Apr 2019 - Aug 2021 (2 years 4 months)
Designed and operated scalable streaming pipelines processing billions of IoT events daily, migrated analytics to Snowflake with dbt and automated data quality checks, and enabled federated analytics via Trino/Starburst to unify cross-store queries.
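Automated data quality checks of the kind mentioned above typically assert expectations (non-null keys, uniqueness, value ranges) against each batch before it is published. A minimal sketch in plain Python, with hypothetical IoT-style rules standing in for tools like Great Expectations or Deequ:

```python
def check_batch(rows: list[dict]) -> list[str]:
    """Return a list of human-readable data quality failures for a batch of events."""
    failures = []
    ids = [r.get("device_id") for r in rows]
    if any(i is None for i in ids):
        failures.append("null device_id")          # completeness check
    if len(ids) != len(set(ids)):
        failures.append("duplicate device_id")     # uniqueness check
    if any(not (-50 <= r.get("temp_c", 0) <= 150) for r in rows):
        failures.append("temp_c out of range [-50, 150]")  # validity check
    return failures

good = [{"device_id": "a1", "temp_c": 21.5}, {"device_id": "b2", "temp_c": 19.0}]
bad = [{"device_id": "a1", "temp_c": 999.0}, {"device_id": "a1", "temp_c": 20.0}]

print(check_batch(good))  # []
print(check_batch(bad))   # ['duplicate device_id', 'temp_c out of range [-50, 150]']
```

In a real pipeline the batch would only be promoted to the serving layer when the failure list is empty; otherwise it is quarantined for inspection.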
Data Engineer
AlphaSense
Jul 2015 - Mar 2019 (3 years 8 months)
Built and maintained large-scale ETL pipelines on AWS EMR and Azure HDInsight using PySpark/Scala, implemented CDC with Kafka Connect/Debezium, and automated data quality checks to improve pipeline reliability and reduce compute costs.
Junior Data Engineer
Enigma Technologies
May 2012 - May 2015 (3 years)
Developed foundational ETL pipelines with Talend, Python, and SQL to ingest ERP/CRM data into PostgreSQL and Hadoop, designed dimensional models for BI, and supported migration of on-prem Hadoop workloads to AWS S3/EMR.
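Dimensional modeling of the kind described above splits data into a fact table of measures keyed to descriptive dimension tables. A minimal star-schema sketch using Python's built-in sqlite3, with hypothetical ERP-style tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: one row per customer, descriptive attributes only.
cur.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
# Fact table: one row per order, measures plus a foreign key into the dimension.
cur.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer_key INTEGER, amount REAL)"
)

cur.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                [(1, "Acme", "EU"), (2, "Globex", "US")])
cur.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)",
                [(100, 1, 250.0), (101, 1, 75.0), (102, 2, 40.0)])

# Typical BI query: total order amount by region, fact joined to dimension.
cur.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_orders f JOIN dim_customer d USING (customer_key)
    GROUP BY d.region ORDER BY d.region
""")
totals = cur.fetchall()
print(totals)  # [('EU', 325.0), ('US', 40.0)]
```

The same join pattern scales from PostgreSQL to Hadoop-era warehouses: facts stay narrow and additive, while dimensions carry the attributes BI users slice by.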
Education
Degrees, certifications, and relevant coursework
Punjab University
Bachelor of Science, Computer Science
Completed a Bachelor of Science in Computer Science focused on core computing principles and software development.
Tech stack
Software and tools used professionally
Azure HDInsight
Azure Synapse
AWS Glue
Apache Flink
Druid
Talend
Dremio
GitHub
Kubernetes
Jenkins
GitHub Actions
PySpark
Debezium
dbt
MySQL
PostgreSQL
Cassandra
Hadoop
InfluxDB
Gmail
Neo4j
Terraform
Pulumi
JSON
MLflow
Kubeflow
Kafka
FastAPI
Grafana
Prometheus
OpenTelemetry
Avro
Airflow
SQL
ClickHouse
Dagster
Apache Iceberg
Datafold
Tecton
Feast
DataHub
Delta Lake
Great Expectations
Trino
Amundsen
Starburst
Collibra
Deequ
OpenLineage
Beam
Website
alishahid.dev