John Gitau
Open to opportunities

@johngitau

Skilled Data Engineer passionate about building high-throughput data pipelines.

Kenya

What I'm looking for

I am looking for a role that fosters innovation and collaboration, where I can leverage my data engineering skills to drive impactful insights and contribute to meaningful projects.

I am a skilled Data Engineer with a passion for architecting and building large-scale, high-throughput data pipelines. My experience includes designing batch and real-time systems, optimizing data retrieval and storage, and delivering intuitive analytics tools that drive business value. I have implemented Apache Spark-based ETL pipelines that process large-scale public health datasets, reducing job runtime and making data transformation more efficient.

In my current role at Global Programs for Research and Training, I engineered a data lake architecture for storing and retrieving anonymized patient data, which laid the groundwork for future AWS S3 integration. I have also automated data validation processes, ensuring compliance with data governance standards. My adaptability across different programming languages, particularly Python and Scala, has allowed me to contribute effectively to various projects, streamlining deployment pipelines and enhancing overall data engineering practices.

Experience

Work history, roles, and key accomplishments

Current

Data Engineer

Global Programs for Research and Training Affiliate of The U

Jul 2022 - Present (2 years 10 months)

Designed and implemented Apache Spark-based ETL pipelines for large-scale public health datasets, optimizing job runtime and enabling near-real-time insights. Engineered a data lake architecture using S3-compatible storage and automated data validation, reducing errors. Contributed to Scala-based Spark jobs and streamlined deployment pipelines by integrating automated testing.

Mid-Level Business Intelligence Developer

Global Programs for Research and Training Affiliate of The U

Jul 2021 - Jun 2022 (11 months)

Authored complex SQL-based ETL stored procedures to flatten multi-source surveillance tables, significantly improving query performance. Pioneered the use of Apache Spark with Airflow for automated daily data ingestion, achieving sub-hourly SLAs. Assisted in proof-of-concept migration to AWS Glue and developed reusable Python modules for data validation and transformation.

Internship

Global Programs for Research and Training

Apr 2021 - Jun 2021 (2 months)

Conducted routine Moodle data analysis for more than 5,000 users and served as a Moodle learning management system administrator. Wrote ETL scripts using an in-house SQL tool to load data into analysis databases. Developed a Power BI dashboard to track course uptake and provided technical assistance on the learning management system.

Education

Degrees, certifications, and relevant coursework

KCA University

Bachelor of Science, Information Technology

2017 - 2021

Studied Information Technology at KCA University from September 2017 to April 2021.
