Open to opportunities

JOSEPH TANG

@josephtang

Staff data engineer building real-time lakehouses and governed data platforms.

United States

What I'm looking for

I’m looking to build governed, real-time data platforms (lakehouses, streaming pipelines, and feature stores) on scalable multi-cloud infrastructure, with strong observability and automation that improve latency, cost, and lineage for ML and analytics teams.

I’m a Staff Data Engineer at Databricks, focused on enterprise-grade lakehouse and streaming platforms that power petabyte-scale analytics and ML workloads. I architected the Enterprise Lakehouse Platform (ELP), enabling multi-tenant data access across 50+ enterprise customers while reducing data latency by 45%.

I designed real-time ingestion and feature pipelines in my “StreamHub” system, using Spark Structured Streaming and Kafka to give downstream ML inference and analytics sub-second data availability, a 20% improvement. I also orchestrated Delta Lake and Spark lakehouse pipelines across massive datasets, improving pipeline throughput by 3x and reducing compute cost by 35%.

I drive governance, reliability, and operational excellence through Unity Catalog-based controls, including fine-grained RBAC and data lineage, plus a data quality and observability framework with anomaly detection and alerting that reduced data incidents by 60%. Across multi-cloud environments, I implement scalable medallion architectures, optimize PySpark performance, and build reproducible infrastructure with Terraform, CI/CD, and orchestration to support production ML/LLM workflows.

Experience

Work history, roles, and key accomplishments

Databricks
Current

Staff Data Engineer

Jul 2017 - Present (8 years 11 months)

Architected the Enterprise Lakehouse Platform (ELP) enabling multi-tenant analytics for 50+ enterprise customers and reducing data latency by 45%. Built real-time ingestion and feature pipelines and orchestrated Delta Lake processing to improve throughput 3x, cut compute costs 35%, and reduce data incidents 60%.

Education

Degrees, certifications, and relevant coursework

California Institute of Technology

Bachelor of Science, Computer Science

2010 - 2014

Earned a B.S. in Computer Science from the California Institute of Technology from 2010 to 2014.
