Pemba Moktan
@pembamoktan
Senior Data Engineer with 6+ years of experience in cloud, ETL, ML pipelines, and data compliance.
What I'm looking for
I'm seeking a Senior Data Engineer role focused on building scalable cloud-based data platforms using Spark, Airflow, dbt, and Databricks. I value teams that prioritize clean architecture, automation, and governance. I'm especially interested in impactful domains like healthcare or finance, and I enjoy mentoring, learning, and taking ownership of end-to-end data solutions.
I'm a Senior Data Engineer with over 6 years of experience designing and deploying cloud-native data solutions across AWS, Azure, and GCP. My background spans the healthcare, insurance, and financial sectors, where I’ve led the development of secure, scalable, and high-performance ETL and real-time ML pipelines using tools like Apache Spark, Airflow, Kafka, dbt, and Databricks.
One of my proudest achievements was modernizing a legacy data platform at DaVita, which reduced processing time by 50% and increased SLA compliance by 30%. I’ve also built analytics environments compliant with HIPAA, GDPR, and SOX, enhancing data security and governance across organizations.
Beyond hands-on development, I enjoy mentoring junior engineers, improving CI/CD workflows, and contributing to architectural decisions that align data strategies with business goals. I'm passionate about building systems that are not only technically sound but also drive real business impact.
Outside of work, I enjoy exploring emerging data tools, contributing to open-source projects, and staying active in the data engineering community.
Experience
Work history, roles, and key accomplishments
● Designed and deployed cross-cloud data pipelines across Azure, AWS, and GCP, supporting batch and streaming workloads for advanced analytics, machine learning inference, and real-time business reporting, and enabling 24/7 availability across multi-region deployments.
● Engineered Spark SQL pipelines in Databricks to process diverse data formats (JSON, Parquet, Avro).
● Architected hybrid cloud analytics workflows on Azure, provisioning infrastructure as code using Terraform and YAML to support scalable, governed pipelines across enterprise datasets.
● Modernized legacy Hadoop pipelines by replatforming Scala-based Spark jobs to Azure Databricks using Delta Lake and the medallion architecture, improving data reliability and developer velocity.
● Gathered business requirements, performed business analysis, and designed various data products.
● Built and automated Spark-based ETL workflows across legacy (Informatica + Hive) and modern (Talend + Snowflake/SQL Server) environments, using Airflow and shell scripting to streamline daily production processes.
Education
Degrees, certifications, and relevant coursework
University of North Texas
Bachelor of Business Administration, Business Analytics
Grade: 3.5
Tech stack
Software and tools used professionally
Snowflake
AWS Glue
Microsoft Azure
GitHub
Azure Kubernetes Service
AWS CodePipeline
Azure Pipelines
MySQL Workbench
MySQL
PostgreSQL
Microsoft SQL Server
PipelineDB
Databricks
AWS Cloud Development Kit
Kafka Manager
Python
Java
Kafka
Azure Active Directory
Azure Database for PostgreSQL
Google Cloud Dataflow
AWS Lambda
Azure SQL Database
Kafka Streams
Docker
Data Warehouses by Freshpaint
Cloud AI Platform Pipelines
Amazon Web Services (AWS)
AWS Database Migration Service
SQL
