Open to opportunities

Merry Shah

@merryshah

Lead Data Engineer crafting real-time, cloud-native data pipelines and predictive analytics for data-driven decisions.

United States

What I'm looking for

I’m looking to lead cloud-native data engineering work: real-time pipelines, strong governance with automated validation, and predictive/ML-enabled analytics. I want to work with teams that value scalable architecture, reliability, and measurable data quality outcomes.

I’m a Lead Data Engineer with 9+ years of experience designing, developing, and optimizing data pipelines, cloud architectures, and analytics solutions. I focus on scalable ETL workflows, cloud data warehouses, and real-time processing that turn events into actionable insight.

In my current role, I architected high-performance real-time pipelines using Apache Kafka, Apache Flink, and Apache Spark Streaming that handle 100M+ daily events with sub-second latency. I’ve migrated on-prem warehouses to AWS Redshift and Snowflake, improving query performance by 40% and reducing infrastructure costs by 25%. I also cut processing latency and pipeline run time through incremental loads, partitioning, and tuning.

I’m especially strong in building reliable data platforms with orchestration, validation, and governance. I use Apache Airflow, AWS Glue, and automated data validation and monitoring (including EvidentlyAI and Prometheus) to improve data quality and achieve 99.9% pipeline uptime, and I have built HIPAA-compliant healthcare data pipelines.

I also lead with a product mindset, integrating machine learning and predictive analytics (Python, Scikit-learn, XGBoost, Spark MLlib) into data pipelines and delivering interactive BI dashboards with Tableau and Power BI. I enjoy mentoring teams and creating maintainable systems that support data maturity, governance, lineage, and strategic growth.

Experience

Work history, roles, and key accomplishments

Current

Lead Data Engineer

Wavicle Solutions

Jun 2023 - Present (3 years 1 month)

Designed and implemented real-time Kafka/Flink/Spark Streaming pipelines handling 100M+ daily events with sub-second latency. Migrated warehouses to AWS Redshift and Snowflake, improving query performance by 40% and reducing infrastructure costs by 25%, while building HIPAA-compliant data pipelines and achieving 99.9% pipeline uptime through automated validation.

Apache Kafka, Apache Flink, AWS Glue, PySpark, AWS Redshift, Snowflake, Apache Airflow, Tableau, Data Validation

Senior Data Engineer

Datavail

Sep 2019 - May 2023 (3 years 8 months)

Designed and developed Spark/Python ETL pipelines that reduced processing time by 30%. Migrated legacy systems to AWS, improving processing speeds by 35% while cutting operational costs by 20%. Implemented secure RBAC and governance for data privacy, and automated Trino cluster monitoring with Prometheus/Grafana, reducing downtime by 15%.

Apache Spark, PySpark, AWS, Snowflake, AWS Redshift, Prometheus, Grafana, RBAC, Data Governance

Data Engineer

Accenture

Aug 2017 - Aug 2019 (2 years)

Assisted in migrating multi-terabyte relational workloads to Hadoop and AWS Redshift using Sqoop and Flume, and optimized PostgreSQL/MySQL queries to improve retrieval times by 20%. Built batch processing and automated ETL orchestration/monitoring with Spark on AWS EMR plus Airflow and AWS Glue, reducing manual intervention by 40%.

Sqoop, Apache Flume, Apache Spark, AWS Glue, Apache Airflow, PostgreSQL, MySQL, ETL