JOSEPH TANG
@josephtang
Staff data engineer building real-time lakehouses and governed data platforms.
What I'm looking for
I’m a Staff Data Engineer at Databricks, focused on enterprise-grade lakehouse and streaming platforms that power petabyte-scale analytics and ML workloads. I architected the Enterprise Lakehouse Platform (ELP), enabling multi-tenant data access across 50+ enterprise customers while reducing data latency by 45%.
I designed real-time ingestion and feature pipelines through my “StreamHub” system, using Spark Structured Streaming and Kafka to support sub-second data availability for downstream ML inference and analytics. I also orchestrated Delta Lake + Spark lakehouse pipelines across massive datasets, improving pipeline throughput by 3x and reducing compute cost by 35%.
I drive governance, reliability, and operational excellence through Unity Catalog-based controls, including fine-grained RBAC and data lineage, plus a data quality and observability framework with anomaly detection and alerting that reduced data incidents by 60%. Across multi-cloud environments, I implement scalable medallion architectures, optimize PySpark performance, and build reproducible infrastructure with Terraform, CI/CD, and orchestration to support production ML/LLM workflows.
Experience
Work history, roles, and key accomplishments
Architected the Enterprise Lakehouse Platform (ELP) enabling multi-tenant analytics for 50+ enterprise customers and reducing data latency by 45%. Built real-time ingestion and feature pipelines and orchestrated Delta Lake processing to improve throughput 3x, cut compute costs 35%, and reduce data incidents 60%.
Built scalable distributed data-processing pipelines with BigQuery and Dataflow, processing 10+ TB/day and reducing query latency by 30%. Developed dimensional models and ETL workflows for real-time dashboards and experimentation, reducing compute usage by 25%.
Developed backend data services and ETL pipelines using Java and SQL for enterprise-scale transactional systems. Designed relational data models and optimized OracleDB indexing strategies to improve query performance and system reliability.
Education
Degrees, certifications, and relevant coursework
California Institute of Technology
Bachelor of Science, Computer Science
2010 - 2014
Earned a B.S. in Computer Science from the California Institute of Technology, attending from 2010 to 2014.
Portfolio
linkedin.com/in/joseph-t-a4702139a
