Anna Ch
@annach
Data Engineer building reliable, high-performance Python/Spark data pipelines in large-scale Hadoop/Spark lakehouses.
What I'm looking for
I’m a Data Engineer with 5 years of experience designing and implementing data platforms at Citi, operating in a large-scale 17 PB Hadoop/Spark data lakehouse. I focus on data quality engineering, pipeline reliability, performance optimization, and backend service development that keeps production work trustworthy.
At the center of my impact is SparkEX, a proprietary Python-based Spark execution framework that I’ve developed and extended. It supports 1,500+ production pipelines, and I’ve contributed new data quality modules, including a statistical anomaly detection engine that applies Statistical Process Control (SPC) with 3-sigma (3σ) limits to automatically halt critical feeds during upstream data loss.
I also improve stability by resolving Spark performance bottlenecks and debugging failures across metadata-driven ETL frameworks. I’ve led operational ownership of 1,500+ production pipelines: performing attribute-level impact analysis for schema migrations, standardizing UAT validation workflows, and implementing SLA-protective job scheduling logic, so that pipelines don’t fail silently or drift out of compliance.
Experience
Work history, roles, and key accomplishments
Data Engineering on a team managing 1,500+ production pipelines across a 17 PB Hadoop data lakehouse. Extended SparkEX, a proprietary Python/Spark execution framework. Key work included Spark performance optimization, forensic SQL debugging, data lineage and impact analysis, semantic schema discovery in undocumented Hive environments, and large-scale data parity validation using Starburst.
Education
Degrees, certifications, and relevant coursework
Queen's University
Bachelor of Computing, Data Distribution & Platform Engineering
2017 - 2021
Grade: GPA 3.71
Activities and societies: Teaching Assistant (Sep 2018–Apr 2021) for CISC 101, 102, 181, 221, 325; facilitated Python labs and virtual lectures and provided real-time code review and debugging support for 50+ students per term.
Named to the Dean’s Honour List and received an Excellence Scholarship. As a Teaching Assistant for CISC courses, supported lab sessions and assignment evaluation.
