Open to opportunities

Keval Dainik

@kevaldainik

Senior Data Engineer specializing in scalable ETL and real-time analytics across GCP and AWS.

Canada

What I'm looking for

I’m looking to build and scale reliable ETL and real-time data platforms on GCP/AWS, optimize Spark/Hive/SQL performance, strengthen security and data quality, and partner with data science teams to productionize analytics and AI-driven use cases.

I’m a Senior Data Engineer with more than 6 years of IT experience delivering end-to-end data platforms across diverse industries. I’ve worked hands-on with Cloudera and Hortonworks, and I’m proficient across the Hadoop ecosystem (Spark, MapReduce, Hive, Kafka, HBase, Impala), with cluster management via Ambari.

At Deloitte, I designed and implemented scalable ETL pipelines using Scala with Apache Spark on Dataproc, and built real-time and batch processing with Cloud Dataflow, Apache Beam, Cloud Pub/Sub, and Cloud Composer. I led enterprise workflow migration from Teradata to GCP, refactoring legacy stored procedures into modular BigQuery SQL tuned with partitioning, clustering, and materialized views. I also implemented data validation, restart capabilities, and security controls (including row-level security).

Earlier, at Ford and Capgemini, I implemented AWS-based solutions (EC2, S3, RDS, VPC, EMR), optimized MapReduce and ETL pipelines, and supported data warehouse development with SQL Server/SSIS and automation for reliable data operations. I enjoy troubleshooting through root cause analysis, collaborating in Agile sprints, and turning complex data work into dependable, production-ready pipelines.

Experience

Work history, roles, and key accomplishments

Current

Senior Data Engineer

Deloitte

Apr 2023 - Present (3 years 3 months)

Designed and implemented scalable ETL pipelines with Scala/Spark on GCP (Dataproc, Dataflow, Pub/Sub) and migrated enterprise workflows from Teradata to BigQuery. Refactored legacy stored procedures into BigQuery SQL tuned with partitioning, clustering, and materialized views, improving product model performance by 50%. Also built secure, role-based data access and Looker dashboards.

Scala, Apache Spark, PySpark, Google BigQuery, Dataflow, Pub/Sub, Cloud Composer, Cloud Functions

Data Engineer

Ford

Nov 2021 - Apr 2023 (1 year 5 months)

Built AWS-based ETL and data pipelines using EC2, S3, RDS, EMR, and Lambda, integrating AWS sources and APIs into Redshift and HDFS for downstream analytics. Automated infrastructure with Terraform and CI/CD, and improved processing reliability using SnapLogic and Python-based Spark/MapReduce jobs.

AWS EC2, AWS S3, AWS RDS, AWS Lambda, Terraform, SnapLogic, PySpark, Apache Flume

Data Warehouse Developer

Capgemini

Apr 2020 - Nov 2021 (1 year 7 months)

Created and optimized SQL Server database objects and stored procedures to support reporting and application performance, including automated maintenance routines with SSIS. Managed security roles and imports from multiple sources into a centralized SQL Server instance, and led physical-to-virtual and virtual-to-virtual server migrations with follow-up stability monitoring.

SQL Server, T-SQL, SSIS, Stored Procedures, Database Security, VBA, WMI, Virtualization