Open to opportunities

Kalveen Joseph

@kalveenjoseph

Senior Big Data Engineer architecting petabyte-scale batch and real-time platforms with 99.99% uptime.

United States

What I'm looking for

I’m looking to own end-to-end big-data platforms at petabyte scale: building dependable batch and real-time pipelines, modernizing lakehouses, and establishing data quality and governance that teams can trust.

I’ve built and led big-data ecosystems for nine years, focused on turning complex pipelines into reliable, high-performance platforms. At Oznolo, I architected and owned a healthcare data platform that processes 28B events daily across 38 hospital network clients, runs on a 2,400-node Spark cluster, and delivers 4.7PB of data with 99.99% uptime.

I’m especially strong at modernizing lakehouse and streaming architectures. I redesigned batch processing from MapReduce to Spark 3.4, cutting nightly ETL from 14 hours to 47 minutes, and built real-time streaming with Kafka and Apache Flink for sub-60-second sepsis and deterioration alerts. I’ve driven measurable wins across governance, quality, and cost: migrating HDFS to a Delta Lake lakehouse on S3 (68% storage cost reduction, 12x query performance lift), enforcing automated data quality at scale with Great Expectations (18,000 contracts nightly), and establishing HIPAA-compliant governance with Unity Catalog to support 280 analysts and 45 ML engineers.

Experience

Work history, roles, and key accomplishments

Current

Senior Big Data Engineer

Oznolo

Mar 2022 - Present (4 years 2 months)

Architected and owned a healthcare big data platform processing 28B clinical and claims events daily across 38 hospital clients, managing a 2,400-node AWS EMR Spark cluster serving 4.7PB with 99.99% uptime. Redesigned batch and real-time pipelines (MapReduce to Spark; Kafka/Flink) and migrated HDFS to Delta Lake on S3, cutting ETL runtime from 14 hours to 47 minutes and reducing storage costs by 6

Apache Spark Apache Kafka Apache Flink Delta Lake S3 Apache Airflow Great Expectations HL7 FHIR

Big Data Engineer

Humana

Jun 2019 - Feb 2022 (2 years 8 months)

Built and operated Hadoop/Spark claims analytics processing 820M records quarterly and led a 6-person data platform team. Delivered real-time fraud detection across 2.4M transactions daily and led a Cloudera-to-AWS EMR/S3 migration of 180 Hive jobs, cutting infrastructure costs by 52% and improving throughput by 8x.

Hadoop Apache Spark Apache Flink S3 Delta Lake Apache Hive

Data Engineer

Anthem (Now Elevance Health)

Aug 2016 - May 2019 (2 years 9 months)

Supported an enterprise healthcare data lake managing 2.1PB on Hadoop for clinical analytics, actuarial modeling, and regulatory reporting. Built pharmacy and provider analytics pipelines in Spark/Hive, created CMS regulatory reporting automation, and optimized claims data with custom Spark UDFs/partitioning to reduce query costs by 40%.

Apache Hadoop Apache Spark Apache Hive Spark UDFs Data Lake SQL Data Pipelines