Open to opportunities

Yasir Ch

@yasirch1

Staff data engineer and solutions architect delivering cloud lakehouse and real-time streaming platforms with governance and cost optimization.

United States

What I'm looking for

I’m looking to build cloud lakehouse and real-time data platforms where data quality, HIPAA/GDPR governance, and FinOps optimization are non-negotiable. I want to translate business ambiguity into scalable architectures, mentor teams, and ship reliable CI/CD-enabled data products.

I’m a Staff Data Engineer and Solutions Architect with 10+ years of hands-on experience designing and delivering enterprise-grade data platforms across healthcare, finance, and technology. I lead complex migrations from on-premises Hadoop ecosystems to modern cloud lakehouses on AWS, Azure, and GCP, architecting high-throughput ETL/ELT pipelines and real-time streaming systems.

I focus on data quality, governance (HIPAA/GDPR), and FinOps optimization to maximize ROI. I’ve designed scalable data mesh approaches, real-time CDC pipelines, and lakehouse architectures that support enterprise analytics and AI/ML workloads, backed by strong lineage and metadata frameworks.

In healthcare, I work confidently with standards like HL7, FHIR, and C-CDA, including clinical data mapping and EHR integrations. I’ve built FHIR-compliant data connectors to normalize HL7/C-CDA messages into standardized clinical datasets mapped to ICD-10, CPT, LOINC, and RxNorm.

I’m also a technical leader who bridges strategy with execution, translating ambiguous business requirements into scalable, future-ready data architectures. I mentor cross-functional engineering teams, drive CI/CD adoption, and deliver analytics-ready, AI/ML-enabled data products that reduce operational cost and accelerate decision-making.

Experience

Work history, roles, and key accomplishments

Current

Lead Data Engineer

Axuall

Oct 2023 - Present (2 years 9 months)

Led migration of 50+ TB healthcare data from on-prem Hadoop to AWS EMR and Snowflake, improving performance and cutting infrastructure costs. Built batch/CDC/real-time pipelines with PySpark, Kafka, and AWS Kinesis processing 10M+ events/day, and implemented data quality checks using Great Expectations and Monte Carlo.

Snowflake PySpark Kafka AWS Kinesis Great Expectations Monte Carlo Terraform Airflow

Senior Data Engineer

Census

May 2019 - Sep 2023 (4 years 4 months)

Designed and rolled out a unified AWS lakehouse platform using S3, Redshift, and Delta Lake to support enterprise analytics and AI/ML workloads. Built Kafka/Spark streaming pipelines for near real-time reporting and improved reliability with automated dbt testing and documentation for 300+ models. Implemented governance (RBAC, data contracts, lineage, PII masking) across AWS, Azure, and GCP.

Redshift Delta Lake Kafka dbt OpenLineage Amundsen RBAC PII Masking S3

Data Engineer

BuyerQuest

Nov 2015 - Apr 2019 (3 years 5 months)

Automated procurement and reporting workflows with Python and AWS Glue, reducing manual processing time by ~90%. Built star/snowflake models and optimized Snowflake queries, Spark jobs, and ETL to cut report generation from 4 hours to under 30 minutes, and created Tableau/Power BI dashboards that reduced ad hoc reporting requests by ~60%.

Python AWS Glue Snowflake Apache Spark ETL ELT Redshift Tableau Power BI