kalveen joseph
@kalveenjoseph
Senior Big Data Engineer architecting petabyte-scale batch and real-time platforms with 99.99% uptime.
What I'm looking for
I’ve built and led big-data ecosystems for nine years, focused on turning complex pipelines into reliable, high-performance platforms. At Oznolo, I architected and owned a healthcare data platform that processes 28B events daily across 38 hospital network clients, runs on a 2,400-node Spark cluster, and serves 4.7PB of data with 99.99% uptime.
I’m especially strong at modernizing lakehouse and streaming architectures: I redesigned batch processing from MapReduce to Spark 3.4, cutting nightly ETL from 14 hours to 47 minutes, and built real-time streaming with Kafka and Apache Flink for sub-60-second sepsis and deterioration alerts. I’ve driven measurable wins across governance, quality, and cost: migrating HDFS to a Delta Lake lakehouse on S3 (68% storage cost reduction, 12x query performance lift), enforcing automated data quality at scale with Great Expectations (18,000 contracts nightly), and establishing HIPAA-compliant governance with Unity Catalog to support 280 analysts and 45 ML engineers.
Experience
Work history, roles, and key accomplishments
Senior Big Data Engineer
Oznolo
Mar 2022 - Present (4 years 2 months)
Architected and owned a healthcare big data platform processing 28B clinical and claims events daily across 38 hospital clients, managing a 2,400-node AWS EMR Spark cluster serving 4.7PB with 99.99% uptime. Redesigned batch and real-time pipelines (MapReduce to Spark; Kafka/Flink) and migrated HDFS to Delta Lake on S3, cutting ETL runtime from 14 hours to 47 minutes and reducing storage costs by 68%.
Built and operated Hadoop/Spark claims analytics at 820M records quarterly and led a 6-person data platform team. Delivered real-time fraud detection at 2.4M transactions daily and led a Cloudera-to-AWS EMR/S3 migration for 180 Hive jobs, cutting infrastructure costs by 52% and improving throughput by 8x.
Supported an enterprise healthcare data lake managing 2.1PB on Hadoop for clinical analytics, actuarial modeling, and regulatory reporting. Built pharmacy and provider analytics pipelines in Spark/Hive, created CMS regulatory reporting automation, and optimized claims data with custom Spark UDFs/partitioning to reduce query costs by 40%.
Education
Degrees, certifications, and relevant coursework
Georgia Institute of Technology
Master of Science, Computer Science
Earned an M.S. in Computer Science (Computing Systems) from Georgia Institute of Technology.
