Rabina Lama
@rabinalama1
Senior Data Engineer modernizing cloud-native lakehouses and streaming platforms to deliver governed, near real-time analytics.
What I'm looking for
I’m a Senior Data Engineer with 8+ years of experience designing, modernizing, and operating enterprise-scale cloud-native data platforms across insurance, healthcare, and financial services. I specialize in scalable Lakehouse architectures and distributed data processing frameworks using AWS, Azure, and GCP.
In my roles, I’ve built high-performance Spark and PySpark pipelines for large-scale batch and real-time ingestion, processing multi-terabyte datasets with optimized partitioning and workload tuning. I’ve also architected secure, governed data lakes and warehouses using Redshift, Synapse Analytics, BigQuery, and Delta Lake to deliver analytics-ready datasets for underwriting, actuarial modeling, clinical research, regulatory compliance, and executive reporting.
I bring hands-on experience implementing CDC-driven streaming pipelines with Kafka and Event Hubs to reduce latency and enable near real-time decision-making. I’m also strong in ELT/ETL design, dimensional modeling, and transformation frameworks using Airflow orchestration and dbt to improve data quality, lineage, and governance.
I lead modernization and migration initiatives that reduce operational costs, improve reliability, and strengthen cross-functional collaboration with analytics, actuarial, compliance, and business stakeholders. I focus on infrastructure automation with Terraform, CI/CD with Jenkins and Azure DevOps, and monitoring with CloudWatch and Azure Monitor to detect failures proactively and speed up incident response.
Experience
Work history, roles, and key accomplishments
Led modernization of the data ecosystem by migrating underwriting, claims, and policy servicing workflows to AWS, reducing legacy infrastructure dependency by 80% and saving $1.2M annually. Built AWS lakehouse and CDC-driven ingestion pipelines with Spark and Kafka, cutting reporting latency from 48 hours to under 3 hours and improving analytics and reporting performance by 35%.
Architected an Azure lakehouse platform using ADLS Gen2, Databricks, and Synapse to deliver secure clinical and pharmaceutical datasets at scale. Optimized Delta Lake storage with partitioning and Z-ORDER, reducing query latency by 45%, and orchestrated batch/stream pipelines with Azure Data Factory and Airflow to cut operational overhead by 30%.
Designed scalable healthcare data pipelines for claims processing, pharmacy benefits, and population health analytics across millions of member records. Built Databricks and Spark ingestion pipelines with Event Hubs, improved Synapse performance by 30%, reduced data discrepancies by 25%, and lowered manual processing effort by 35% through orchestrated batch/stream workflows.
Built Python backend services and REST APIs to automate ingestion of policy, billing, and claims data into centralized analytics platforms, improving accessibility for reporting teams. Optimized SQL extraction and ETL logic to reduce batch processing latency by 30% and improved pipeline stability by reducing recurring data load failures by 20%.
Education
Degrees, certifications, and relevant coursework
University of Houston-Downtown
Bachelor's in Computer Science