MICHAEL HUSS
@michaelhuss
Senior Data Architect specializing in HIPAA-compliant data platforms and real-time/batch pipelines for AI analytics.
What I'm looking for
I’m a Senior Data Architect and Senior Data Engineer with 7+ years designing and scaling enterprise-grade data platforms across healthcare, e-commerce, and digital health. I build high-impact data products using Python, Java, R, SQL, Spark, and multi-cloud ecosystems (AWS, Azure, GCP), with a strong emphasis on reliability, cost optimization, and data governance.
Most recently, at Highmark Health, I architected an Enterprise Clinical Data Fabric that integrates EHR, claims, and real-time patient telemetry using Apache Kafka, AWS S3, Snowflake, and Databricks. It processes 10B+ records/month and moved data latency from batch to near real time. I've also led Delta Lake medallion architecture improvements that made queries 3x faster, productionized ML-ready feature pipelines that cut model training time by 35%, and implemented enterprise-grade governance (RBAC, data masking, lineage, audit logging) to ensure HIPAA compliance.
Experience
Architected an enterprise clinical data fabric integrating EHR, claims, and real-time telemetry, processing 10B+ records/month and reducing latency to near real time. Implemented a Delta Lake medallion architecture for 3x faster analytics/ML queries and cut Snowflake compute costs by 40% while enforcing HIPAA-compliant governance (RBAC, masking, lineage, audit logs).
Led development of a real-time personalization engine that ingests and processes 5M+ user events/hour using Kafka, Spark Structured Streaming, and AWS Kinesis, driving an 18% conversion lift. Built scalable ELT pipelines with Airflow, dbt, and Snowflake, migrated legacy ETL to modular dbt models for 2x faster deployments, and improved observability to reduce MTTD and MTTR by more than 50%.
Engineered a metabolic health intelligence platform ingesting IoT health signals via Kafka and GCP Pub/Sub, processing billions of events/month. Built GCP pipelines (Dataflow, BigQuery, Cloud Storage) and feature engineering with PySpark to improve chronic disease prediction accuracy by 25%, while reducing data inconsistencies by 30% through validation/anomaly detection and cutting release cycle t
Built an imaging analytics pipeline with PySpark, AWS EMR, and S3 for radiology imaging metadata, reducing processing time by 45%. Developed backend services and APIs using Java (Spring Boot) and Python, and improved production reliability with logging, monitoring, and alerting.
Developed a healthcare data lake platform with AWS S3, Glue, and Redshift, integrating multi-source datasets (claims, EMR, lab results) and processing terabytes daily via batch and streaming pipelines. Optimized Redshift performance with distribution/sort keys and query tuning to reduce execution time by 35%, and implemented data quality and lineage frameworks to improve trust in analytics outputs.
Education
University of Central Florida
Bachelor of Computer Science, Computer Science
2014 - 2018
