Haz Khalid
@hazkhalid1
Principal Data Engineer building scalable, cost-efficient cloud lakehouse platforms and real-time streaming systems.
What I'm looking for
I’m a Principal Data Engineer with 10+ years of experience building and scaling large-scale (TB–PB) data platforms across Fintech, E-Commerce, and SaaS. I specialize in Databricks Lakehouse architecture (Delta Medallion) and advanced PySpark optimization, consistently delivering 40%+ performance improvements.
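For a concrete flavor of that optimization work, here is a minimal PySpark sketch of two common levers, broadcast joins and adaptive query execution. The table names, schema, and settings are illustrative, not taken from any specific project:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Illustrative job: every table name and setting here is hypothetical.
spark = (
    SparkSession.builder.appName("join-optimization-sketch")
    # Adaptive Query Execution re-plans joins and coalesces shuffle
    # partitions at runtime based on observed data sizes.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

orders = spark.read.table("silver.orders")        # large fact table
merchants = spark.read.table("silver.merchants")  # small dimension

# Broadcasting the small dimension avoids shuffling the large fact
# table across the cluster, a frequent source of slow joins.
enriched = orders.join(broadcast(merchants), "merchant_id", "left")

# Partitioning the output by a common filter column lets downstream
# readers prune partitions instead of scanning the whole table.
(enriched.write.format("delta")
    .partitionBy("order_date")
    .mode("overwrite")
    .saveAsTable("gold.orders_enriched"))
```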
At Slickdeals (Jan 2022 – Present), I lead the org-wide data platform strategy for pipelines processing 3–5TB+ daily, enabling real-time personalization for 12M+ MAUs. I defined enterprise standards built on the Medallion architecture, data contracts, and data quality SLAs, reducing data quality incidents by 90%+ while improving scalability, reliability, and engineering efficiency. I also redesigned 50+ Airflow DAGs to strengthen SLAs and observability, improving reliability by 60%+ at scale.
I architected real-time streaming platforms using Kafka and Spark (500K+ events/hour), cutting latency from hours to seconds, and I drive multi-cloud implementations across AWS, GCP, and Azure. I've built governance and compliance controls with Unity Catalog (RBAC, PII masking, data lineage) across 200+ datasets, and mentored engineers toward a 70% reduction in deployment failures. Earlier roles strengthened my foundation in PySpark/ELT pipelines, dbt testing, and warehouse engineering, shaping how I deliver durable, cost-aware data products.
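As one example of the Unity Catalog governance work, the column-mask pattern below hides PII from non-privileged groups. All function, table, and group names here are hypothetical:

```python
# Hypothetical names throughout; assumes a Unity Catalog-enabled
# Databricks workspace where `spark` is the active session.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.governance.mask_email(email STRING)
    RETURNS STRING
    RETURN CASE
        WHEN is_account_group_member('pii_readers') THEN email
        ELSE '***REDACTED***'
    END
""")

# Attaching the mask filters the column for every query based on
# group membership, with no changes needed in downstream readers.
spark.sql("""
    ALTER TABLE main.gold.customers
    ALTER COLUMN email SET MASK main.governance.mask_email
""")
```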
Experience
Work history, roles, and key accomplishments
Principal Data Engineer
Slickdeals
Jan 2022 - Present (4 years 3 months)
Led the org-wide strategy for a Databricks Lakehouse data platform processing 3–5TB+ daily to power real-time personalization for 12M+ MAUs. Defined architecture standards (Medallion, data contracts, quality SLAs) and redesigned 50+ Airflow DAGs, improving reliability 60%+ and reducing data quality incidents by 90%+.
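A stripped-down sketch of the SLA-aware DAG pattern described above; the DAG name, schedule, thresholds, and callback are illustrative placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_failure(context):
    # Placeholder alert hook; in practice this might page on-call via
    # PagerDuty using the task details available in `context`.
    print(f"Task failed: {context['task_instance'].task_id}")


def load_bronze():
    print("ingest raw events")  # stands in for the real ingest logic


with DAG(
    dag_id="bronze_ingest_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        # Missed SLAs are recorded by the scheduler and can drive
        # alerting, one way to make pipeline latency observable.
        "sla": timedelta(minutes=45),
        "on_failure_callback": notify_failure,
    },
) as dag:
    PythonOperator(task_id="load_bronze", python_callable=load_bronze)
```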
Senior Data Engineer
Fivetran
Mar 2020 - Dec 2021 (1 year 9 months)
Designed and maintained PySpark pipelines on Databricks handling 2TB daily sync workloads across 150+ connectors. Implemented Delta Lake optimizations to reduce pipeline failures 65%, built Kafka/Spark streaming to cut latency from hours to minutes, and delivered 40% cost savings while improving data contract stability with dbt testing.
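The Kafka-to-Delta hop at the core of such a streaming pipeline looks roughly like the sketch below; brokers, topic, schema, and paths are placeholders, and `spark` is assumed to be an active session:

```python
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

# Placeholder schema for the incoming JSON events.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("occurred_at", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "user-events")                # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the payload as bytes; parse it into typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# The checkpoint provides exactly-once semantics into the Delta sink,
# turning an hourly batch feed into a seconds-scale stream.
(events.writeStream.format("delta")
    .option("checkpointLocation", "/chk/user_events")  # placeholder path
    .outputMode("append")
    .toTable("bronze.user_events"))
```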
Data Engineer
Mozart Data
Jun 2017 - Feb 2020 (2 years 8 months)
Engineered Python and SQL ETL pipelines ingesting 50M+ records/day from 20+ sources (REST APIs, SFTP, Oracle, SQL Server) into Redshift and BigQuery. Built star/snowflake models with SCD Type 1 & 2 for regulatory reporting, and improved scheduling efficiency 35% and reporting query performance 75% using Airflow orchestration and SQL tuning.
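For reference, here is the SCD Type 2 pattern that bullet refers to, sketched with Delta Lake's merge API rather than the warehouse SQL used in the role; table and column names are illustrative:

```python
from delta.tables import DeltaTable
from pyspark.sql.functions import current_timestamp, lit

# Illustrative tables: `changes` holds one row per customer_id with
# the latest attribute values from today's extract.
dim = DeltaTable.forName(spark, "gold.dim_customer")
changes = spark.read.table("staging.customer_updates")

# Pass 1: close out current rows whose tracked attributes changed.
(dim.alias("t")
    .merge(changes.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.email <> s.email OR t.tier <> s.tier",
        set={"is_current": "false", "valid_to": "current_timestamp()"})
    .execute())

# Pass 2: after pass 1, changed and brand-new keys no longer have an
# open row, so an anti-join yields exactly the versions to insert.
still_open = spark.read.table("gold.dim_customer").where("is_current = true")
new_versions = (
    changes.join(still_open.select("customer_id"), "customer_id", "left_anti")
    .withColumn("valid_from", current_timestamp())
    .withColumn("valid_to", lit(None).cast("timestamp"))
    .withColumn("is_current", lit(True))
)
new_versions.write.format("delta").mode("append").saveAsTable("gold.dim_customer")
```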
Cloud and ELT Specialist
Cognizant
Jan 2015 - May 2017 (2 years 4 months)
Built and maintained SQL/Python ETL pipelines loading Oracle and SQL Server data into enterprise warehouses supporting 10,000+ daily users. Reduced average incident resolution time 55% via root-cause analysis and proactive validation, and improved migration reconciliation accuracy to 98.8% through schema mapping and profiling.
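Reconciliation checks like the one behind that accuracy figure typically compare row counts and column aggregates between source and target; a hypothetical PySpark sketch:

```python
from pyspark.sql import functions as F

# Hypothetical tables: compare a migrated table against its source on
# row count and simple per-column aggregates.
source = spark.read.table("legacy.orders")
target = spark.read.table("warehouse.orders")

checks = {
    "row_count": (source.count(), target.count()),
    "sum_amount": (
        source.agg(F.sum("amount")).first()[0],
        target.agg(F.sum("amount")).first()[0],
    ),
    "distinct_order_ids": (
        source.select("order_id").distinct().count(),
        target.select("order_id").distinct().count(),
    ),
}

for name, (src, tgt) in checks.items():
    status = "OK" if src == tgt else "MISMATCH"
    print(f"{name}: source={src} target={tgt} {status}")
```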
Education
Degrees, certifications, and relevant coursework
Haz hasn't added their education
Tech stack
Software and tools used professionally
Airbyte
Fivetran
Mozart Data
Azure Synapse
Apache Spark
Apache Flink
AWS Step Functions
GitHub
Kubernetes
Jenkins
GitHub Actions
PySpark
dbt
Gmail
Databricks
Terraform
MLflow
Kafka
PagerDuty
Grafana
Airflow
Apache Beam
Google BigQuery
SQL
Dagster
Apache Iceberg
Pinecone
Monte Carlo
Delta Lake
OpenAI API
Great Expectations
Apache Hudi
Trunk
Bash
Transform
pgvector
Unity Catalog