Haz Khalid
@hazkhalid1
Principal Data Engineer building scalable, cost-efficient cloud lakehouse platforms and real-time streaming systems.
What I'm looking for
I’m a Principal Data Engineer with 10+ years of experience building and scaling large-scale (TB–PB) data platforms across Fintech, E-Commerce, and SaaS. I specialize in Databricks Lakehouse architecture (Delta Medallion) and advanced PySpark optimization, consistently delivering 40%+ performance improvements.
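For a concrete flavor of that optimization work, here is a minimal PySpark sketch of two common levers, broadcast joins and adaptive query execution. The table names, schema, and settings are illustrative, not taken from any specific project:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Illustrative job: every table name and setting here is hypothetical.
spark = (
    SparkSession.builder.appName("join-optimization-sketch")
    # Adaptive Query Execution re-plans joins and coalesces shuffle
    # partitions at runtime based on observed data sizes.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

orders = spark.read.table("silver.orders")        # large fact table
merchants = spark.read.table("silver.merchants")  # small dimension

# Broadcasting the small dimension avoids shuffling the large fact
# table across the cluster, a frequent source of slow joins.
enriched = orders.join(broadcast(merchants), "merchant_id", "left")

# Partitioning the output by a common filter column lets downstream
# readers prune partitions instead of scanning the whole table.
(enriched.write.format("delta")
    .partitionBy("order_date")
    .mode("overwrite")
    .saveAsTable("gold.orders_enriched"))
```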
At Slickdeals (Jan 2022 – Present), I lead the org-wide data platform strategy for pipelines processing 3–5TB+ daily, enabling real-time personalization for 12M+ MAUs. I defined enterprise standards built on the Medallion architecture, data contracts, and data quality SLAs, reducing data quality incidents by 90%+ while improving scalability, reliability, and engineering efficiency. I also redesigned 50+ Airflow DAGs to strengthen SLAs and observability, improving reliability by 60%+ at scale.
I architected real-time streaming platforms using Kafka and Spark (500K+ events/hour), cutting latency from hours to seconds, and I drive multi-cloud implementations across AWS, GCP, and Azure. I've built governance and compliance controls with Unity Catalog (RBAC, PII masking, data lineage) across 200+ datasets, and mentored engineers toward a 70% reduction in deployment failures. Earlier roles strengthened my foundation in PySpark/ELT pipelines, dbt testing, and warehouse engineering, shaping how I deliver durable, cost-aware data products.
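As one example of the Unity Catalog governance work, the column-mask pattern below hides PII from non-privileged groups. All function, table, and group names here are hypothetical:

```python
# Hypothetical names throughout; assumes a Unity Catalog-enabled
# Databricks workspace where `spark` is the active session.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.governance.mask_email(email STRING)
    RETURNS STRING
    RETURN CASE
        WHEN is_account_group_member('pii_readers') THEN email
        ELSE '***REDACTED***'
    END
""")

# Attaching the mask filters the column for every query based on
# group membership, with no changes needed in downstream readers.
spark.sql("""
    ALTER TABLE main.gold.customers
    ALTER COLUMN email SET MASK main.governance.mask_email
""")
```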
Experience
Work history, roles, and key accomplishments
Principal Data Engineer
Slickdeals
Jan 2022 - Present (4 years 3 months)
Led the org-wide strategy for a Databricks Lakehouse data platform processing 3–5TB+ daily to power real-time personalization for 12M+ MAUs. Defined architecture standards (Medallion, data contracts, quality SLAs) and redesigned 50+ Airflow DAGs, improving reliability 60%+ and reducing data quality incidents by 90%+.
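A stripped-down sketch of the SLA-aware DAG pattern described above; the DAG name, schedule, thresholds, and callback are illustrative placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_failure(context):
    # Placeholder alert hook; in practice this might page on-call via
    # PagerDuty using the task details available in `context`.
    print(f"Task failed: {context['task_instance'].task_id}")


def load_bronze():
    print("ingest raw events")  # stands in for the real ingest logic


with DAG(
    dag_id="bronze_ingest_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        # Missed SLAs are recorded by the scheduler and can drive
        # alerting, one way to make pipeline latency observable.
        "sla": timedelta(minutes=45),
        "on_failure_callback": notify_failure,
    },
) as dag:
    PythonOperator(task_id="load_bronze", python_callable=load_bronze)
```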
Senior Data Engineer
Fivetran
Mar 2020 - Dec 2021 (1 year 9 months)
Designed and maintained PySpark pipelines on Databricks handling 2TB daily sync workloads across 150+ connectors. Implemented Delta Lake optimizations to reduce pipeline failures 65%, built Kafka/Spark streaming to cut latency from hours to minutes, and delivered 40% cost savings while improving data contract stability with dbt testing.
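The Kafka-to-Delta hop at the core of such a streaming pipeline looks roughly like the sketch below; brokers, topic, schema, and paths are placeholders, and `spark` is assumed to be an active session:

```python
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

# Placeholder schema for the incoming JSON events.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("occurred_at", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "user-events")                # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes; parse it into typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# The checkpoint provides exactly-once semantics into the Delta sink,
# turning an hourly batch feed into a seconds-scale stream.
(events.writeStream.format("delta")
    .option("checkpointLocation", "/chk/user_events")  # placeholder path
    .outputMode("append")
    .toTable("bronze.user_events"))
```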
Data Engineer
Mozart Data
Jun 2017 - Feb 2020 (2 years 8 months)
Engineered Python and SQL ETL pipelines ingesting 50M+ records/day from 20+ sources (REST APIs, SFTP, Oracle, SQL Server) into Redshift and BigQuery. Built star/snowflake models with SCD Type 1 & 2 for regulatory reporting, and improved scheduling efficiency 35% and reporting query performance 75% using Airflow orchestration and SQL tuning.
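For reference, here is the SCD Type 2 pattern that bullet refers to, sketched with Delta Lake's merge API rather than the warehouse SQL used in the role; table and column names are illustrative:

```python
from delta.tables import DeltaTable
from pyspark.sql.functions import current_timestamp, lit

# Illustrative tables: `changes` holds one row per customer_id with
# the latest attribute values from today's extract.
dim = DeltaTable.forName(spark, "gold.dim_customer")
changes = spark.read.table("staging.customer_updates")

# Pass 1: close out current rows whose tracked attributes changed.
(dim.alias("t")
    .merge(changes.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.email <> s.email OR t.tier <> s.tier",
        set={"is_current": "false", "valid_to": "current_timestamp()"})
    .execute())

# Pass 2: after pass 1, changed and brand-new keys no longer have an
# open row, so an anti-join yields exactly the versions to insert.
still_open = spark.read.table("gold.dim_customer").where("is_current = true")
new_versions = (
    changes.join(still_open.select("customer_id"), "customer_id", "left_anti")
    .withColumn("valid_from", current_timestamp())
    .withColumn("valid_to", lit(None).cast("timestamp"))
    .withColumn("is_current", lit(True))
)
new_versions.write.format("delta").mode("append").saveAsTable("gold.dim_customer")
```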
Cloud and ELT Specialist
Cognizant
Jan 2015 - May 2017 (2 years 4 months)
Built and maintained SQL/Python ETL pipelines loading Oracle and SQL Server data into enterprise warehouses supporting 10,000+ daily users. Reduced average incident resolution time 55% via root-cause analysis and proactive validation, and improved migration reconciliation accuracy to 98.8% through schema mapping and profiling.
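Reconciliation checks like the one behind that accuracy figure typically compare row counts and column aggregates between source and target; a hypothetical PySpark sketch:

```python
from pyspark.sql import functions as F

# Hypothetical tables: compare a migrated table against its source on
# row count and simple per-column aggregates.
source = spark.read.table("legacy.orders")
target = spark.read.table("warehouse.orders")

checks = {
    "row_count": (source.count(), target.count()),
    "sum_amount": (
        source.agg(F.sum("amount")).first()[0],
        target.agg(F.sum("amount")).first()[0],
    ),
    "distinct_order_ids": (
        source.select("order_id").distinct().count(),
        target.select("order_id").distinct().count(),
    ),
}

for name, (src, tgt) in checks.items():
    status = "OK" if src == tgt else "MISMATCH"
    print(f"{name}: source={src} target={tgt} {status}")
```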
Education
Degrees, certifications, and relevant coursework
Haz hasn't added their education
Tech stack
Software and tools used professionally
Airbyte
Fivetran
Mozart Data
Azure Synapse
Apache Spark
Apache Flink
AWS Step Functions
GitHub
Kubernetes
Jenkins
GitHub Actions
PySpark
dbt
Gmail
Databricks
Terraform
MLflow
Kafka
PagerDuty
Grafana
Airflow
Apache Beam
Google BigQuery
SQL
Dagster
Apache Iceberg
Pinecone
Monte Carlo
Delta Lake
OpenAI API
Great Expectations
Apache Hudi
Trunk
Bash
Transform
pgvector
Unity Catalog