Open to opportunities

David Chou

@davidchou

Infrastructure-focused Staff Software Engineer specializing in global load balancing, failover orchestration, and production observability.

United States

What I'm looking for

I’m looking for infrastructure engineering roles where I can own multi-year roadmaps for global reliability (disaster recovery, traffic orchestration, and distributed tracing), pairing automation and observability with deep mentorship to improve recovery time and production confidence.

I’m an infrastructure-focused Staff Software Engineer with deep expertise in global load balancing, failover orchestration, distributed tracing, and production-scale load testing at Meta. I own technical strategy and multi-year roadmapping to support Meta-scale availability and capacity planning.

At Meta, I lead cross-functional initiatives across disaster recovery, site reliability, capacity, and traffic infrastructure teams. I drive architectural decisions and technical problem-solving for organization-spanning challenges, including automated failover evolution and advanced traffic orchestration under extreme failure scenarios.

I act as a primary technical backstop and influencer, scoping high-ambiguity projects and partnering with EMs/PMs to define priorities that ladder into broader infrastructure objectives. I also elevate team performance through deep mentorship of senior engineers and by setting and enforcing engineering excellence standards.

I champion CI/CD, observability, and load testing strategies at org scale, helping reduce recovery times and increase confidence in production changes while maintaining Meta’s high-reliability bar. Earlier, I led development of Canopy, Meta’s distributed performance tracing system, spanning frontend visualization, backend instrumentation, and trace aggregation at massive scale.

Experience

Work history, roles, and key accomplishments

Meta
Current

Staff Software Engineer

Feb 2021 - Present (5 years 5 months)

Own technical strategy and multi-year roadmap for global user traffic management and disaster recovery systems, aligning mitigations with long-term resilience goals for large-scale availability and capacity planning. Lead cross-functional architectural decisions on automated failover evolution, traffic orchestration, CI/CD, observability, and load testing to reduce recovery times.

Meta

Senior Software Engineer

Aug 2016 - Feb 2021 (4 years 6 months)

Technical lead for a team in the disaster recovery organization that owns end-to-end global user traffic management systems for site reliability and capacity. Designed automated traffic balancing, failover orchestration, traffic shifting tools, and production load testing frameworks, while driving technical strategy and CI/CD pipeline evolution.

Meta

Software Engineer

Oct 2013 - Aug 2016 (2 years 10 months)

Led development of Canopy, Meta's distributed end-to-end performance tracing system, including internal visualization tools, sampling policy configuration interfaces, and trace data aggregation backends. Built instrumentation APIs/libraries and contributed full-stack capabilities across frontend (React/JavaScript/CSS) and backend services (Hack/PHP) while maintaining scalable, low-latency trace ag

Education

Degrees, certifications, and relevant coursework

University of California, Berkeley

Bachelor of Science, Computer Science

2009 - 2013

Bachelor of Science in Computer Science at the University of California, Berkeley from 2009 to 2013.

Tech stack

Software and tools used professionally
