Noah Anwar
@noahanwar
Principal Data Engineer building streaming-first, lakehouse data platforms for real-time analytics and ML.
What I'm looking for
I’m a Principal/Senior Data Engineer with 11+ years of experience building and scaling cloud-native, data-intensive platforms across AWS, Azure, and GCP. I focus on streaming-first architectures, real-time data pipelines, and modern Lakehouse solutions using Apache Spark, Kafka, Flink, and Snowflake.
In my most recent work, I architected a streaming-first platform using Apache Kafka, Kafka Connect, and Apache Flink, then unified streaming and historical data with a Lakehouse approach (Delta Lake and Snowflake). I developed end-to-end machine learning pipelines in Python with Scikit-learn, TensorFlow, and MLflow, delivering demand forecasting models that reduced stock-outs by 20%.
I also lead on DataOps and governance: establishing data quality validation, lineage tracking, observability, and CI/CD with infrastructure-as-code. I enjoy translating complex business requirements into scalable, high-performance solutions, and mentoring engineers while aligning data platform strategy with business objectives and key performance indicators.
Experience
Work history, roles, and key accomplishments
Principal Data Integration Engineer
Falkonry
Aug 2021 - Present (4 years 8 months)
Architected a streaming-first data platform using Kafka, Kafka Connect, and Flink, and built a Lakehouse on Delta Lake and Snowflake (bronze/silver/gold) to unify real-time and historical analytics. Developed Python ML pipelines with Scikit-learn, TensorFlow, and MLflow, delivering demand forecasting models that reduced stock-outs by 20%, and deployed cloud-native infrastructure with Terraform.
Data Engineering Team Lead
Current Health
May 2018 - Jul 2021 (3 years 2 months)
Led a real-time health data platform ingesting streaming IoT and wearable data, implementing event-driven pipelines with Kafka and Spark Structured Streaming for low-latency, fault-tolerant processing. Built scalable GCP-based lake and analytics foundations (GCS, Dataflow, BigQuery) and introduced DataOps practices (CI/CD, automated testing) to improve deployment efficiency and reduce failures.
Data Engineer
Seeq Corporation
Feb 2015 - Apr 2018 (3 years 2 months)
Built real-time ingestion and processing systems with Kafka and Flink to deliver high-throughput, low-latency data pipelines, and engineered scalable storage and ETL workflows using Parquet and Python/Spark. Implemented OCR/document processing pipelines and established data quality, validation, and monitoring, deploying reproducible environments via Terraform and AWS CloudFormation.
Education
Degrees, certifications, and relevant coursework
University of the Punjab
Bachelor of Science, Computer Science
2010 - 2014
Grade: 3.7
Tech stack
Software and tools used professionally
Google Tag Manager
Apache Spark
Apache Flink
Talend
Microsoft Azure
Google Cloud Platform
GitLab
Kubernetes
Cloudflare
Jenkins
CircleCI
GitLab CI
Jupyter
dbt
MySQL
PostgreSQL
MongoDB
SQLite
Cassandra
Hadoop
InfluxDB
HBase
Gmail
Node.js
Google Analytics
Databricks
Redis
Terraform
AWS CloudFormation
Pulumi
React
JavaScript
Python
HTML5
Java
CSS 3
TensorFlow
PyTorch
MLflow
scikit-learn
Kafka
Apache NiFi
Ansible
Kafka Streams
Apache Storm
TypeScript
Docker
Airflow
Time Analytics
TimescaleDB
SQL
Azure Blob Storage
Delta Lake
Bash
Transform
Factory
Unify