Open to opportunities

MICHAEL HUSS

@michaelhuss

Senior Data Architect specializing in HIPAA-compliant data platforms and real-time/batch pipelines for AI analytics.

United States

What I'm looking for

I’m looking for a role where I can design reliable, cost-optimized, governance-first data platforms—owning real-time and batch pipelines that enable AI-ready analytics at scale, with strong cross-team collaboration.

I’m a Senior Data Architect and Senior Data Engineer with 7+ years designing and scaling enterprise-grade data platforms across healthcare, e-commerce, and digital health. I build high-impact data products using Python, Java, R, SQL, Spark, and multi-cloud ecosystems (AWS, Azure, GCP), with a strong emphasis on reliability, cost optimization, and data governance.

Most recently, at Highmark Health, I architected an Enterprise Clinical Data Fabric integrating EHR, claims, and real-time patient telemetry with Apache Kafka, AWS S3, Snowflake, and Databricks, processing 10B+ records/month and moving data delivery from batch to near real time. I’ve also led medallion architecture (Delta Lake) improvements that made queries 3x faster, productionized ML-ready feature pipelines that cut training time by 35%, and implemented enterprise-grade governance (RBAC, data masking, lineage, audit logging) to ensure HIPAA compliance.

Experience

Work history, roles, and key accomplishments

Current

Data Architect

Highmark Health

May 2025 - Present (1 year 2 months)

Architected an enterprise clinical data fabric integrating EHR, claims, and real-time telemetry, processing 10B+ records/month and reducing latency to near real time. Implemented a Delta Lake medallion architecture for 3x faster analytics/ML queries and cut Snowflake compute costs by 40% while enforcing HIPAA-compliant governance (RBAC, masking, lineage, audit logs).

Kafka, S3, Snowflake, Databricks, Delta Lake, PySpark, Airflow, AWS Glue, Data Governance, HIPAA compliance

Senior Data Engineer

Fanatics

Sep 2023 - May 2025 (1 year 8 months)

Led development of a real-time personalization engine ingesting and processing 5M+ user events/hour using Kafka, Spark Structured Streaming, and AWS Kinesis, driving an 18% conversion lift. Built scalable ELT pipelines with Airflow, dbt, and Snowflake, migrated legacy ETL to modular dbt for 2x faster deployments, and improved observability to reduce MTTD/MTTR by more than 50%.

Kafka, Spark Structured Streaming, AWS Kinesis, Airflow, dbt, Snowflake, AWS Lambda, Datadog, SQL Tuning, Observability

Senior Data Engineer

Twin Health

Oct 2022 - Aug 2023 (10 months)

Engineered a metabolic health intelligence platform ingesting IoT health signals via Kafka and GCP Pub/Sub, processing billions of events/month. Built GCP pipelines (Dataflow, BigQuery, Cloud Storage) and feature engineering with PySpark to improve chronic disease prediction accuracy by 25%, while reducing data inconsistencies by 30% through validation/anomaly detection and cutting release cycle t

Kafka, GCP, Google Cloud Dataflow, BigQuery, Cloud Storage, PySpark, Python, Data Validation, Anomaly Detection, Docker

Software Engineer

GE Healthcare

Aug 2020 - Oct 2022 (2 years 2 months)

Built an imaging analytics pipeline with PySpark, AWS EMR, and S3 for radiology imaging metadata, reducing processing time by 45%. Developed backend services and APIs using Java (Spring Boot) and Python, and improved production reliability with logging, monitoring, and alerting.

S3, PySpark, Java, Spring Boot, Python, Distributed Systems, API Development, Monitoring

Data Engineer

Komodo Health

Sep 2018 - Aug 2020 (1 year 11 months)

Developed a healthcare data lake platform with AWS S3, Glue, and Redshift, integrating multi-source datasets (claims, EMR, lab results) and processing terabytes daily via batch and streaming pipelines. Optimized Redshift performance with distribution/sort keys and query tuning to reduce execution time by 35%, and implemented data quality and lineage frameworks to improve trust in analytics outputs