Noah Anwar
@noahanwar
Principal Data Engineer building streaming-first, lakehouse data platforms for real-time analytics and ML.
What I'm looking for
I’m a Principal/Senior Data Engineer with 11+ years of experience building and scaling cloud-native, data-intensive platforms across AWS, Azure, and GCP. I focus on streaming-first architectures, real-time data pipelines, and modern Lakehouse solutions using Apache Spark, Kafka, Flink, and Snowflake.
In my most recent work, I architected a streaming-first platform using Apache Kafka, Kafka Connect, and Apache Flink, then unified streaming and historical data with a Lakehouse approach (Delta Lake and Snowflake). I developed end-to-end machine learning pipelines in Python with Scikit-learn, TensorFlow, and MLflow, delivering demand forecasting models that reduced stock-outs by 20%.
I also lead on DataOps and governance: establishing data quality validation, lineage tracking, observability, and CI/CD with infrastructure-as-code. I enjoy translating complex business requirements into scalable, high-performance solutions, and mentoring engineers while aligning data platform strategy with business objectives and key performance indicators.
Experience
Work history, roles, and key accomplishments
Principal Data Integration Engineer
Falkonry
Aug 2021 - Present (4 years 8 months)
Architected a streaming-first data platform using Kafka, Kafka Connect, and Flink, and built a Lakehouse on Delta Lake and Snowflake (bronze/silver/gold) to unify real-time and historical analytics. Developed Python ML pipelines with Scikit-learn, TensorFlow, and MLflow, delivering demand forecasting models that reduced stock-outs by 20%, and deployed cloud-native infrastructure with Terraform.
Data Engineering Team Lead
Current Health
May 2018 - Jul 2021 (3 years 2 months)
Led a real-time health data platform ingesting streaming IoT and wearable data, implementing event-driven pipelines with Kafka and Spark Structured Streaming for low-latency, fault-tolerant processing. Built scalable GCP-based lake and analytics foundations (GCS, Dataflow, BigQuery) and introduced DataOps practices (CI/CD, automated testing) to improve deployment efficiency and reduce failures.
Data Engineer
Seeq Corporation
Feb 2015 - Apr 2018 (3 years 2 months)
Built real-time ingestion and processing systems with Kafka and Flink to deliver high-throughput, low-latency data pipelines, and engineered scalable storage and ETL workflows using Parquet and Python/Spark. Implemented OCR/document processing pipelines and established data quality, validation, and monitoring, deploying reproducible environments via Terraform and AWS CloudFormation.
Education
Degrees, certifications, and relevant coursework
University of the Punjab
Bachelor of Science, Computer Science
2010 - 2014
Grade: 3.7
Tech stack
Software and tools used professionally
Google Tag Manager
Apache Spark
Apache Flink
Talend
Microsoft Azure
Google Cloud Platform
GitLab
Kubernetes
Cloudflare
Jenkins
CircleCI
GitLab CI
Jupyter
dbt
MySQL
PostgreSQL
MongoDB
SQLite
Cassandra
Hadoop
InfluxDB
HBase
Gmail
Node.js
Google Analytics
Databricks
Redis
Terraform
AWS CloudFormation
Pulumi
React
JavaScript
Python
HTML5
Java
CSS 3
TensorFlow
PyTorch
MLflow
scikit-learn
Kafka
Apache NiFi
Ansible
Kafka Streams
Apache Storm
TypeScript
Docker
Airflow
Time Analytics
TimescaleDB
SQL
Azure Blob Storage
Delta Lake
Bash
Transform
Factory
Unify