Merry Shah
@merryshah
Lead Data Engineer crafting real-time, cloud-native data pipelines and predictive analytics for data-driven decisions.
What I'm looking for
I’m a Lead Data Engineer with 9+ years of experience designing, developing, and optimizing data pipelines, cloud architectures, and analytics solutions. I focus on scalable ETL workflows, cloud data warehouses, and real-time processing that turn events into actionable insight.
In my current role, I architected high-performance real-time pipelines using Apache Kafka, Apache Flink, and Apache Spark Streaming, handling 100M+ daily events with sub-second latency. I’ve migrated on-prem warehouses to AWS Redshift and Snowflake, improving query performance by 40% and reducing infrastructure costs by 25%. I also cut processing latency and pipeline run time through incremental loads, partitioning, and tuning.
I’m especially strong at building reliable data platforms with orchestration, validation, and governance. I use Apache Airflow, AWS Glue, and automated data validation (including EvidentlyAI and Prometheus) to improve data quality and achieve 99.9% pipeline uptime, and I have built HIPAA-compliant healthcare pipelines.
I also lead with a product mindset, integrating machine learning and predictive analytics (Python, scikit-learn, XGBoost, Spark MLlib) into data pipelines and delivering interactive BI dashboards with Tableau and Power BI. I enjoy mentoring teams and creating maintainable systems that support data maturity, governance, lineage, and strategic growth.
Experience
Work history, roles, and key accomplishments
Lead Data Engineer
Wavicle Solutions
Jun 2023 - Present (3 years)
Designed and implemented real-time Kafka/Flink/Spark Streaming pipelines handling 100M+ daily events with sub-second latency. Migrated warehouses to AWS Redshift and Snowflake, improving query performance by 40% and reducing infrastructure costs by 25%, while building HIPAA-compliant data pipelines and achieving 99.9% pipeline uptime through automated validation.
Senior Data Engineer
Datavail
Sep 2019 - May 2023 (3 years 8 months)
Designed and developed Spark/Python ETL pipelines that reduced processing time by 30%, and migrated legacy systems to AWS, improving processing speeds by 35% while cutting operational costs by 20%. Implemented secure RBAC and governance for data privacy, and automated Trino cluster monitoring with Prometheus/Grafana, reducing downtime by 15%.
Assisted in migrating multi-terabyte relational workloads to Hadoop and AWS Redshift using Sqoop and Flume, and optimized PostgreSQL/MySQL queries to improve retrieval times by 20%. Built batch processing and automated ETL orchestration/monitoring with Spark on AWS EMR plus Airflow and AWS Glue, reducing manual intervention by 40%.
Education
Degrees, certifications, and relevant coursework
Merry hasn't added their education
Tech stack
Software and tools used professionally
Fivetran
Apache Spark
AWS Glue
Apache Flink
Talend
D3.js
GitHub
GitLab
Bitbucket
Kubernetes
Jenkins
CircleCI
GitHub Actions
NumPy
Pandas
PySpark
dbt
DB
Sqoop
MySQL
PostgreSQL
MongoDB
Cassandra
Hadoop
HBase
Vertica
Gmail
Databricks
Terraform
Java
Logstash
TensorFlow
PyTorch
MLflow
scikit-learn
Kafka
Grafana
Kibana
Prometheus
Datadog
Elasticsearch
Avro
Ansible
AWS Lambda
Airflow
Time Analytics
SQL
XGBoost
Hugging Face
LightGBM
Apache Iceberg
Pinecone
Delta Lake
Great Expectations
Trino
Starburst
Bash
Transform
Instantly
Movement
