Open to opportunities

JOSEPH TANG

@josephtang

Staff data engineer building real-time lakehouses and governed data platforms.

United States

What I'm looking for

I’m looking to build governed, real-time data platforms—lakehouses, streaming pipelines, and feature stores—using scalable multi-cloud infrastructure, strong observability, and automation that improves latency, cost, and lineage for ML/analytics teams.

I’m a Staff Data Engineer at Databricks, focused on enterprise-grade lakehouse and streaming platforms that power petabyte-scale analytics and ML workloads. I architected the Enterprise Lakehouse Platform (ELP), enabling multi-tenant data access across 50+ enterprise customers while reducing data latency by 45%.

I designed “StreamHub,” a real-time ingestion and feature pipeline system built on Spark Structured Streaming and Kafka, to support sub-second data availability for downstream ML inference and analytics. I also orchestrated Delta Lake + Spark lakehouse pipelines across massive datasets, improving pipeline throughput by 3x and reducing compute cost by 35%.

I drive governance, reliability, and operational excellence through Unity Catalog-based controls, including fine-grained RBAC and data lineage, plus a data quality and observability framework with anomaly detection and alerting that reduced data incidents by 60%. Across multi-cloud environments, I implement scalable medallion architectures, optimize PySpark performance, and build reproducible infrastructure with Terraform, CI/CD, and orchestration to support production ML/LLM workflows.

Experience

Work history, roles, and key accomplishments

Databricks
Current

Staff Data Engineer

Jul 2017 - Present (8 years 9 months)

Architected the Enterprise Lakehouse Platform (ELP) enabling multi-tenant analytics for 50+ enterprise customers and reducing data latency by 45%. Built real-time ingestion and feature pipelines and orchestrated Delta Lake processing to improve throughput 3x, cut compute costs 35%, and reduce data incidents 60%.

Education

Degrees, certifications, and relevant coursework

California Institute of Technology

Bachelor of Science, Computer Science

2010 - 2014

Earned a B.S. in Computer Science from the California Institute of Technology (2010 to 2014).
