Open to opportunities

Anna Ch

@annach

Data Engineer building reliable, high-performance Python/Spark data pipelines in large-scale Hadoop/Spark lakehouses.

Canada

What I'm looking for

I’m looking for a role where I can own end-to-end reliability and data quality in large-scale lakehouse environments: building and testing pipelines, preventing silent failures, optimizing Spark performance, and improving operational health through strong engineering practices.

I’m a Data Engineer with 5 years of experience designing and implementing data platforms at Citi, operating in a large-scale 17 PB Hadoop/Spark data lakehouse. I focus on data quality engineering, pipeline reliability, performance optimization, and backend service development that keeps production work trustworthy.

At the center of my impact is SparkEX, a proprietary Python-based Spark execution framework I’ve developed and extended. It supports 1,500+ production pipelines, and I’ve contributed new data quality modules, including a statistical anomaly detection engine that applies Statistical Process Control (SPC) with 3-sigma (3σ) limits to automatically halt critical feeds during upstream data loss.
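
For readers unfamiliar with the technique, a 3-sigma check on a feed's daily row counts can be sketched roughly as below. This is an illustrative sketch only, not SparkEX code: the function names, the choice of row count as the monitored metric, and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def three_sigma_limits(history):
    """Return (lower, upper) control limits: mean +/- 3 sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    center = mean(history)
    spread = stdev(history)  # sample standard deviation
    return center - 3 * spread, center + 3 * spread


def should_halt(history, todays_count):
    """Return True when today's value falls outside the control limits."""
    lower, upper = three_sigma_limits(history)
    return not (lower <= todays_count <= upper)


# Hypothetical daily row counts for one feed (assumed data, for illustration).
history = [1000, 1010, 990, 1005, 995, 1002, 998]

if should_halt(history, todays_count=400):
    print("halt feed: row count outside 3-sigma control limits")
```

A sharp drop in row count, such as an upstream source delivering a partial file, lands far below the lower limit and trips the check, while ordinary day-to-day variation stays inside the band.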

I also improve stability by resolving Spark performance bottlenecks and debugging failures across metadata-driven ETL frameworks. I’ve led production operational ownership of 1,500+ pipelines, performing attribute-level impact analysis for schema migrations, standardizing UAT validation workflows, and implementing SLA-protective job scheduling logic so that pipelines don’t fail silently or drift out of compliance.

Experience

Work history, roles, and key accomplishments

Citi

Data Engineer

Jul 2021 - Jan 2026 (4 years 6 months)

Data Engineering on a team managing 1,500+ production pipelines across a 17 PB Hadoop data lakehouse. Extended SparkEX, a proprietary Python/Spark execution framework. Key work included Spark performance optimization, forensic SQL debugging, data lineage and impact analysis, semantic schema discovery in undocumented Hive environments, and large-scale data parity validation using Starburst.

Education

Degrees, certifications, and relevant coursework

Queen's University

Bachelor of Computing, Data Distribution & Platform Engineering

2017 - 2021

Grade: GPA 3.71

Activities and societies: Teaching Assistant (Sep 2018–Apr 2021) for CISC 101, 102, 181, 221, 325; facilitated Python labs and virtual lectures and provided real-time code review and debugging support for 50+ students per term.

Graduated on the Dean’s Honour List and received an Excellence Scholarship. As a Teaching Assistant for CISC courses, supported lab sessions and assignment evaluation.
