John Gitau
@johngitau
Skilled Data Engineer passionate about building high-throughput data pipelines.
What I'm looking for
I am a skilled Data Engineer with a passion for architecting and building large-scale, high-throughput data pipelines. My experience includes designing batch and real-time systems, optimizing data retrieval and storage, and delivering intuitive analytics tools that drive business value. I have a proven track record of implementing Apache Spark-based ETL pipelines that process large-scale public health datasets, significantly reducing runtime and enhancing data transformation efficiency.
In my current role at Global Programs for Research and Training, I engineered a data lake architecture for storing and retrieving anonymized patient data, which laid the groundwork for future AWS S3 integration. I have also automated data validation processes, ensuring compliance with data governance standards. My adaptability across different programming languages, particularly Python and Scala, has allowed me to contribute effectively to various projects, streamlining deployment pipelines and enhancing overall data engineering practices.
Experience
Work history, roles, and key accomplishments
Data Engineer
Global Programs for Research and Training Affiliate of The U
Jul 2022 - Present (2 years 10 months)
Designed and implemented Apache Spark-based ETL pipelines for large-scale public health datasets, optimizing job runtime and enabling near-real-time insights. Engineered a data lake architecture using S3-compatible storage and automated data validation, reducing errors. Contributed to Scala-based Spark jobs and streamlined deployment pipelines by integrating automated testing.
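Below is a minimal sketch of the kind of Spark ETL described above: raw CSVs from an S3-compatible landing zone are validated and written to a partitioned Parquet data lake. The paths, column names, and validation rule are illustrative placeholders, not the production pipeline.

```python
# Illustrative PySpark ETL: raw CSV -> validated, partitioned Parquet.
# Paths, column names, and the validation rule are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("public-health-etl-sketch").getOrCreate()

raw = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("s3a://landing-zone/surveillance/*.csv")  # S3-compatible object storage
)

# Basic automated validation: drop rows missing key fields and report the count.
validated = raw.dropna(subset=["facility_id", "report_date"])
rejected = raw.count() - validated.count()
print(f"Rejected {rejected} rows failing validation")

# Write to the data lake, partitioned by report date for efficient retrieval.
(
    validated
    .withColumn("report_date", F.to_date("report_date"))
    .write
    .mode("overwrite")
    .partitionBy("report_date")
    .parquet("s3a://data-lake/curated/surveillance/")
)
```

Partitioning the curated layer by report date keeps downstream queries scanning only the dates they need, which is one common way such pipelines reduce job runtime.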
Mid-Level Business Intelligence Developer
Global Programs for Research and Training Affiliate of the U
Jul 2021 - Jun 2022 (11 months)
Authored complex SQL-based ETL stored procedures to flatten multi-source surveillance tables, significantly improving query performance. Pioneered the use of Apache Spark with Airflow for automated daily data ingestion, achieving sub-hourly SLAs. Assisted in proof-of-concept migration to AWS Glue and developed reusable Python modules for data validation and transformation.
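As an illustration of the Spark-plus-Airflow pattern mentioned above, here is a minimal daily-ingestion DAG; the DAG id, schedule, connection, and script path are hypothetical placeholders rather than the actual production configuration.

```python
# Illustrative Airflow DAG: schedule a daily Spark ingestion job.
# DAG id, schedule, connection, and paths are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_surveillance_ingestion",
    start_date=datetime(2021, 7, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest = SparkSubmitOperator(
        task_id="spark_ingest",
        application="/opt/jobs/ingest_surveillance.py",  # hypothetical job script
        conn_id="spark_default",
        application_args=["--run-date", "{{ ds }}"],  # Airflow-templated run date
    )
```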
Internship
Global Programs for Research and Training
Apr 2021 - Jun 2021 (2 months)
Conducted routine Moodle data analysis for over 5000 users and served as a Moodle learning management system administrator. Wrote ETL scripts using an in-house SQL tool to load data into analysis databases. Developed a Power BI dashboard to track course uptake and provided technical assistance on the learning management system.
Education
Degrees, certifications, and relevant coursework
KCA University
Bachelor of Science, Information Technology
2017 - 2021
Completed the Bachelor of Science in Information Technology program from September 2017 to April 2021.