Looking for a job

Francis Lobo

@francislobo

Senior Systems & Resilience Engineer building large‑scale load, telemetry, and reliability platforms for cloud‑native systems (~350k RPS, 26M users)

Australia

What I'm looking for

I’m looking for a role where I can lead performance and resilience engineering for large distributed systems: building load generation, failure-mode testing, and observability that improve scalability, tail latency, and reliability over time.

Senior Systems & Resilience Engineer with a background that spans low‑level embedded systems and cloud‑native platforms. I’ve spent the last several years building and leading large‑scale workload simulation and resilience initiatives, including a distributed load‑generation framework that sustained ~350k RPS (3× production peak) for a national‑scale serverless system serving ~26M users.
My focus is understanding how complex systems behave under extreme load and failure: combining low‑level performance debugging, distributed telemetry pipelines, and failure‑mode discovery to drive architectural changes that improve scalability, reliability, and tail latency.
Recent work includes designing population‑scale workload models and multi‑hour soaks (10+ hours, 3B+ multi‑step transactions), building out‑of‑core telemetry pipelines on AWS (Fargate, EFS, DuckDB) for per‑second percentile analysis, and uncovering issues such as DynamoDB hot partitions and Lambda cold‑start amplification that directly informed system design and capacity decisions.
Earlier in my career, I built hardware CI and test infrastructure for embedded and Set‑Top Box systems (Cisco, Motorola, Philips), and later moved into distributed systems, banking platforms, and marketing SaaS (Tyro, Campaign Monitor). Across these environments, I’ve consistently worked at the intersection of performance, reliability, and test infrastructure, helping teams move from ad‑hoc testing to engineering‑owned quality and SRE‑style practices.
Areas of interest: large‑scale distributed systems, SRE and production engineering, platform reliability, capacity planning, failure injection and game days, observability and telemetry pipelines, and bridging the gap between test infrastructure and production‑grade reliability.

Experience

Current

Senior Systems & Resilience Engineer

Slalom Consulting

Jan 2023 - Present (3 years 3 months)

Owned performance at scale for a national serverless platform (~26M users), building an in-house distributed load-generation and resilience platform from scratch to enable safe pre-launch validation. Delivered a Python/Locust framework sustaining ~350k RPS (3× peak), plus long-duration workload simulation and failure-mode testing that improved tail-latency stability during burst traffic.

Senior Quality Engineering Architect

Jan 2021 - Jan 2023 (2 years)

Developed and maintained API test automation suites and baseline performance checks for microservice and event-driven architectures, integrating them into CI/CD quality gates. Designed mutation testing strategies and embedded exploratory/quality practices to reduce regressions reaching staging and production.

Quality Coach & Automation Engineer

Campaign Monitor

Jan 2018 - Jan 2021 (3 years)

Engineered CI/CD deployment gates using Terraform + AWS ECS and implemented Pact contract testing to prevent breaking API changes from reaching production. Built distributed load-testing frameworks (Locust on AWS) and led failure-mode analysis for distributed systems to improve resilience and validation strategies.

Quality Coach

Campaign Monitor

Jan 2018 - Jan 2021 (3 years)

Engineered CI/CD deployment gates with Terraform and AWS ECS, enforcing API compatibility using Pact contract testing to prevent breaking changes from reaching production. Established the organization’s first distributed load-testing frameworks (Locust on AWS) and led failure-mode analysis bridging functional testing and early reliability engineering.

QA Technical Lead

Audience / Knowles

Jan 2014 - Jan 2015 (1 year)

Led validation of Android audio driver stacks, defining synthetic and real-world audio test streams across hardware–software integration layers. Defined a structured testing strategy for Audio HAL and ALSA components to verify embedded audio behavior.

Systems Automation Engineer

Jan 2008 - Jan 2015 (7 years)

Built hardware CI and automation infrastructure for Set-Top Box systems, enabling automated middleware and driver-level testing on physical devices. Developed remote hardware test farms and an IP-based testing framework for direct command injection and live telemetry streaming, enabling global remote debugging to reduce on-site support needs.

Tour Director

Tour of Nilgiris

Jan 2008 - Jan 2015 (7 years)

Scaled a grassroots cycling initiative into a multi-day national tour by owning end-to-end operations and real-time incident response. Designed structured communication and telemetry systems for event operations that were later adopted as an industry model.

Education

SJC Institute of Technology

Master of Technology, Digital Communication and Network Engineering

2004 - 2006

R V College of Engineering

Bachelor of Engineering, Telecommunication Engineering

2000 - 2004