Mehmood Ghojaria
@mehmoodghojaria
Senior data engineer specializing in scalable AWS streaming analytics and governed data platforms that power machine learning and BI.
What I'm looking for
I’m a Senior Data Engineer with extensive experience building scalable AWS cloud data platforms that support streaming analytics, machine learning, and business intelligence. I design robust ETL/ELT pipelines using Python, SQL, Kafka, Spark, Airflow, AWS Glue, and Snowflake—focused on reliability in production.
Across enterprise environments, I implement real-time streaming architectures with Kafka, Kinesis, Lambda, and distributed processing frameworks. I also build and optimize curated datasets, API-enabled data services, and governed data warehouses that support analysts, stakeholders, and partners in decision-making.
I strengthen data trust through governance, validation frameworks, monitoring, and automated alerting systems. I pair that with CI/CD automation and Infrastructure-as-Code (Terraform and related tooling) to deliver secure, compliant, cost-optimized cloud data architectures that consistently drive measurable outcomes.
Experience
Work history, roles, and key accomplishments
Designed scalable streaming pipelines using Kafka, Spark, AWS services, and Python to support real-time enterprise financial risk evaluation. Built ETL/ELT workflows, curated datasets, API-enabled services, and Snowflake/Synapse solutions with data quality validation, monitoring, and automated alerting.
Developed scalable ETL/ELT pipelines with AWS Glue, Python, Spark, Lambda, and SQL for enterprise analytical processing. Built streaming ingestion with Kafka and Kinesis, implemented Airflow/Step Functions orchestration, and delivered Snowflake/Redshift warehousing with monitoring, validation, and CI/CD automation.
Built cloud-native ETL pipelines using Dataflow, Apache Beam, Python, and SQL for enterprise pharmaceutical analytics. Implemented streaming with Pub/Sub and Dataflow, delivered governed BigQuery/warehouse architectures, and automated monitoring, validation, and alerting for regulatory compliance.
Designed enterprise streaming pipelines using Kafka, Spark, Airflow, and Hadoop for real-time transaction analytics and monitoring. Developed ETL frameworks for structured and unstructured datasets, delivered Snowflake/Redshift/Hive warehouse solutions, and implemented monitoring and validation using Prometheus, Grafana, and CloudWatch.
Big Data Engineer
ACT Fibernet
Jan 2016 - Mar 2018 (2 years 2 months)
Developed Hadoop and Spark processing pipelines for large-scale distributed analytics and operational reporting. Implemented streaming ingestion with Kafka and Spark Streaming, built ETL workflows with Hive/Pig/PySpark/SQL, and automated orchestration with Airflow and Oozie including schema optimization and data quality frameworks.
Built Hadoop-based ETL processing frameworks using MapReduce, Spark, Hive, and SQL for enterprise analytical and operational reporting. Implemented ingestion with Sqoop/Flume, orchestrated pipelines with Oozie and scripts, and delivered optimized Hive schemas with data quality validation and monitoring.
Education
Degrees, certifications, and relevant coursework
Westcliff University
Master of Information Technology, Information Technology
Completed a Master’s in Information Technology at Westcliff University.