Tom Muga
@tommuga
C++ engineer building benchmark-driven, high-performance systems with memory efficiency and correctness.
What I'm looking for
I’m a C++ engineer who builds performance-critical systems where execution speed, memory efficiency, and correctness under constraint matter simultaneously. I justify architectural choices against alternatives, and I benchmark every performance claim rather than relying on intuition.
I achieved a 100× throughput improvement by replacing a Python computation core with a purpose-built C++ core, measured against the Python baseline and validated empirically. Across my work, I focus on systems with tight memory budgets, high-throughput simulation, parallel computation, and clean C++/Python interoperability layers.
My open-source workflow is “read → isolate → change → measure → verify,” and I publish the evidence on GitHub so that each result can be checked. From variance-aware probabilistic search tweaks to entropy-guided stochastic optimization and cache-efficient simulation engines, I aim to turn constraints into measurable, reliable speedups.
Experience
Work history, roles, and key accomplishments
Causal Simulation Engine
Proprietary
Rebuilt a Python stochastic simulation core in C++ to achieve 100× throughput improvement versus the original implementation. Also reduced experimental search space by 90% and demonstrated cross-instance generalisation by learning intervention dynamics from one instance and generalising to 998 additional cases.
C++ Performance Engineer
Open Source
Delivered 100× throughput improvements by replacing Python computation cores with purpose-built C++ implementations, validating gains with benchmarks and hardware counter analysis. Built performance-critical simulation and optimization systems (C++/Python interoperability, parallel execution, and numerically stable algorithms) and shipped verified PRs with reproducible results.
Monte Carlo Simulation Engine
Improved a Monte Carlo simulation engine to reach 2.38× throughput over a Python baseline by restructuring parallelisation and eliminating intermediate storage. Fixed a parallel pragma placement bug, used a single-pass Welford algorithm for numerically stable mean/stddev, and added risk estimation (expected revenue and probability of negative cashflow).
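For readers unfamiliar with the single-pass Welford update mentioned above, here is a minimal sketch. It is not the engine's code: the names RunningStats, RiskSummary, and summarise are hypothetical, and the sample (n - 1) variance is an assumption. It shows how the mean, standard deviation, and the probability of a negative cashflow can all be accumulated in one pass without storing intermediate results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Running mean/variance accumulator using Welford's single-pass update.
struct RunningStats {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the current mean

    void push(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);  // uses the already-updated mean
    }

    // Sample standard deviation (n - 1 denominator); 0 for fewer than 2 samples.
    double stddev() const {
        return n > 1 ? std::sqrt(m2 / static_cast<double>(n - 1)) : 0.0;
    }
};

// Hypothetical result type for the risk figures named in the text.
struct RiskSummary {
    double expected;       // mean cashflow
    double stddev;         // sample standard deviation
    double prob_negative;  // fraction of samples below zero
};

// Summarise simulated cashflows in a single pass over the data.
RiskSummary summarise(const std::vector<double>& cashflows) {
    assert(!cashflows.empty());
    RunningStats stats;
    std::size_t negatives = 0;
    for (double c : cashflows) {
        stats.push(c);
        if (c < 0.0) ++negatives;
    }
    return {stats.mean, stats.stddev(),
            static_cast<double>(negatives) / static_cast<double>(cashflows.size())};
}
```

In a parallel setting, each thread would typically keep its own accumulator and the partial results would be merged at the end; that merge step is omitted here.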
Multi-Agent Belief Framework
Built a fixed-capacity belief distribution framework with a compile-time memory pool to eliminate dynamic allocation and reduce fragmentation. Implemented a certainty-aware two-objective Dijkstra (biasing toward higher certainty) and reduced Section size from 32 to 24 bytes by member reordering and replacing std::string with char.
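The effect of member reordering on struct size can be illustrated with a small sketch. The fields below are hypothetical and are not the framework's actual Section layout; the byte counts in the comments assume a typical 64-bit ABI in which double is 8-byte aligned.

```cpp
#include <cstdint>

// Hypothetical "before" layout: small members interleaved with doubles
// force padding after each of them.
struct SectionLoose {
    char kind;         // 1 byte, then 7 bytes of padding
    double certainty;  // 8 bytes
    std::uint32_t id;  // 4 bytes, then 4 bytes of padding
    double cost;       // 8 bytes
};                     // 32 bytes on a typical 64-bit ABI

// Hypothetical "after" layout: members ordered from largest to smallest
// alignment, so padding only appears once at the tail.
struct SectionPacked {
    double certainty;  // 8 bytes
    double cost;       // 8 bytes
    std::uint32_t id;  // 4 bytes
    char kind;         // 1 byte, then 3 bytes of tail padding
};                     // 24 bytes on a typical 64-bit ABI

// Guard against regressions if someone reorders the members again.
static_assert(sizeof(SectionPacked) <= sizeof(SectionLoose),
              "reordered layout should not be larger");
```

A smaller element means more elements per cache line, which matters when many of them live in a fixed-capacity pool.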
Causal Simulation Engine Engineer
Rebuilt a Python stochastic causal simulation core in C++ to achieve 100× throughput improvement versus the original. Implemented a contiguous, allocation-minimal C++ engine with a Python orchestration layer for configuration and aggregation.
Entropy-Guided Stochastic Optimizer Engineer
Optimized an entropy-guided stochastic optimizer to reach 22.38× throughput improvement through profiling-guided hot-path changes. Reduced cache miss rate from 53.64% to 12.15% (−89% absolute misses), with instruction and branch misprediction reductions, while improving numerical stability with regularized Cholesky.
WebAssembly Cross-Runtime C++ Engineer
Built a zero-serialization, zero-copy WebAssembly execution model using C++ as the memory owner/scheduler, C# as a stateless compute module, and minimal JS bootstrapping. Achieved stable 60fps at 1,024 entities with 99.95% frame consistency via shared WASM linear memory and a single JS boundary crossing per frame.
Entropy-Guided Stochastic Optimizer
Implemented a multi-stage optimization pipeline (Gaussian sampling, evaluation, hierarchical clustering, adaptive refocusing) with numerically stable Cholesky decomposition using jitter. Improved throughput by 22.38× and reduced cache miss rate from 53.64% to 12.15% (−89% absolute misses) via targeted hot-path changes.
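A minimal sketch of Cholesky decomposition with diagonal jitter follows. It is an illustration under stated assumptions rather than the optimizer's code: the function names, the starting jitter of 1e-10, the tenfold growth, and the retry limit are all hypothetical choices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Plain Cholesky factorisation A = L * L^T for a symmetric matrix.
// Returns false if a pivot is not strictly positive.
bool cholesky(const Matrix& a, Matrix& l) {
    const std::size_t n = a.size();
    l.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= l[i][k] * l[j][k];
            if (i == j) {
                if (!(sum > 0.0)) return false;  // also rejects NaN
                l[i][i] = std::sqrt(sum);
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    return true;
}

// Try the factorisation as-is; on failure, add a growing jitter to the
// diagonal of a copy of A and retry. Default values are assumptions.
bool cholesky_with_jitter(const Matrix& a, Matrix& l,
                          double jitter = 1e-10, int max_tries = 8) {
    assert(jitter > 0.0);
    if (cholesky(a, l)) return true;
    Matrix regularised = a;
    for (int attempt = 0; attempt < max_tries; ++attempt) {
        for (std::size_t i = 0; i < a.size(); ++i) {
            regularised[i][i] = a[i][i] + jitter;
        }
        if (cholesky(regularised, l)) return true;
        jitter *= 10.0;  // escalate until the matrix becomes positive definite
    }
    return false;
}
```

The jitter trades a small bias in the covariance for a factorisation that does not fail when sampled points become nearly collinear.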
IS-MCTS Variance-Aware C++ Engineer
Modified IS-MCTS selection to incorporate action-level variance into the UCB decision rule for risk-adjusted search. Empirically showed variance penalisation amplifies first-mover advantage with statistically significant divergence from the canonical implementation.
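One way a variance penalty could enter a UCB-style selection rule is sketched below. The exact form used in the project is not stated here, so the formula (mean plus exploration bonus minus lambda times the reward standard deviation), the names, and the parameters c and lambda are all assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-action statistics kept at a search node.
struct ActionStats {
    int visits;
    double mean_reward;
    double reward_variance;
};

// UCB1-style score with an assumed standard-deviation penalty.
// lambda = 0 recovers the unpenalised rule.
double risk_adjusted_ucb(const ActionStats& a, int total_count,
                         double c, double lambda) {
    assert(a.visits > 0 && total_count > 0);
    const double explore =
        c * std::sqrt(std::log(static_cast<double>(total_count)) / a.visits);
    const double risk = lambda * std::sqrt(a.reward_variance);
    return a.mean_reward + explore - risk;
}

// Pick the index of the highest-scoring action. Unvisited actions are
// assumed to be expanded elsewhere before this is called.
std::size_t select_action(const std::vector<ActionStats>& actions,
                          int total_count, double c, double lambda) {
    assert(!actions.empty());
    std::size_t best = 0;
    double best_score = risk_adjusted_ucb(actions[0], total_count, c, lambda);
    for (std::size_t i = 1; i < actions.size(); ++i) {
        const double score = risk_adjusted_ucb(actions[i], total_count, c, lambda);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

With lambda set to zero the higher-mean action wins; with a positive lambda a slightly lower mean can be preferred when its outcomes are less volatile.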
C++ Pathfinding Library Engineer
Recast Navigation
Identified and refactored a redundant two-pass hot-path loop (rcCalcBounds) into a single-pass fused min/max operation. Improved throughput by ~10% (106 ns → 95 ns) and reduced instruction and cycle counts per hardware counters.
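The idea of fusing two passes into one can be sketched as follows. This is an illustrative function, not the Recast Navigation source or the submitted patch: the name calc_bounds and its signature are hypothetical, and the input is assumed to be packed (x, y, z) float triples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Compute the axis-aligned bounds of num_verts packed (x, y, z) triples.
// Min and max are updated in the same loop, so each vertex is read once.
void calc_bounds(const float* verts, std::size_t num_verts,
                 float* bmin, float* bmax) {
    assert(verts != nullptr && num_verts > 0);
    // Seed both bounds from the first vertex.
    for (int axis = 0; axis < 3; ++axis) {
        bmin[axis] = verts[axis];
        bmax[axis] = verts[axis];
    }
    for (std::size_t i = 1; i < num_verts; ++i) {
        const float* v = verts + 3 * i;
        for (int axis = 0; axis < 3; ++axis) {
            bmin[axis] = std::min(bmin[axis], v[axis]);
            bmax[axis] = std::max(bmax[axis], v[axis]);
        }
    }
}
```

Compared with a separate min pass and max pass, the fused loop halves the number of loads over the vertex array, which is the kind of change that shows up in instruction and cycle counters.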
Monte Carlo Parallel Simulation Engineer
Improved a Monte Carlo simulation engine by 2.38× over a Python baseline and an additional 1.71× by removing intermediate storage and restructuring parallelization. Fixed a structural OpenMP bug that made the MC loop sequential and used a fused single-pass Welford algorithm for mean/standard deviation.
Education
Degrees, certifications, and relevant coursework
Ubunifu College
Software Engineering Diploma, Software Engineering
Completed a Software Engineering Diploma in 2022.
Portfolio
github.com/tommygrammar
