Looking for a job

James Jia

@jamesjia

Senior site reliability engineer focused on cloud reliability, cost optimization, and scalable platform operations for customer-facing products.

United States

What I'm looking for

I’m looking for a mission-driven platform/reliability role where I can design resilient systems, automate observability and rollouts, optimize cloud costs, and mentor engineers while delivering measurable customer impact.

I’m a senior reliability and DevOps engineer who builds and operates high-availability platforms at scale. I focus on end-to-end reliability—capacity planning, progressive rollouts, automated monitoring, and incident recovery—so customer experiences stay fast and dependable.

At Google, I spearheaded reliability work for high-visibility smart home ecosystems. I helped achieve 99.99% uptime for a streaming launch, automated monitoring and alerting for Gemini AI features, and executed staged, zero-disruption firmware rollouts across a device fleet.

I also kept 4K streaming and AI-powered recommendations running without interruption during peak demand. When internet connectivity failed, I used edge capabilities to preserve smart home functionality, and I implemented self-healing automation to reduce manual operational burden and shorten time-to-resolution.

Previously at Amazon, I owned asynchronous trial workflows and migrated trial approval/state management to event-driven architecture, cutting end-to-end setup latency by 60%+. I led fault-tolerant state machine designs with chaos engineering, improved availability and latency on critical paths, and drove infrastructure-as-code and automated testing that reduced hotfixes and monthly infrastructure costs.

Experience

Work history, roles, and key accomplishments

Google
Current

Senior Site Reliability Engineer

Jan 2020 - Present (6 years 5 months)

Spearheaded reliability, capacity planning, and progressive rollouts for Google TV Streamer and smart home ecosystems, cutting cloud costs 30% and halving service recovery times. Delivered 99.99% uptime for launch and automated monitoring/alerting and self-healing for Gemini AI features, enabling zero-disruption firmware rollouts at peak demand.

Amazon

Software Development Engineer II

Aug 2015 - Oct 2019 (4 years 2 months)

Owned the core asynchronous trial workflows for Prime Wardrobe, designing scalable state machines for trials, deferred billing, and returns while collaborating across frontend, ML, and fulfillment teams. Led the migration to event-driven state management, reducing setup latency by 60% and improving throughput by ~40%, while maintaining 99.99% availability and sub-10ms p99 latency on critical paths.

Education

Degrees, certifications, and relevant coursework

Northwestern University

Master of Science (MS), Computer Engineering

Completed in May 2015.
