Tom Muga
@tommuga
C++ engineer building benchmark-driven, high-performance systems with memory efficiency and correctness.
What I'm looking for
I’m a C++ engineer who builds performance-critical systems where execution speed, memory efficiency, and correctness under constraint matter simultaneously. I justify architectural choices against alternatives, and I benchmark every performance claim rather than relying on intuition.
I achieved a 100× throughput improvement by replacing a Python computation core with a purpose-built C++ core, measured against the baseline and validated empirically. Across my work, I focus on tight-memory-budget systems, high-throughput simulation, parallel computation, and clean C++/Python interoperability layers.
My open-source engineering workflow is “read → isolate → change → measure → verify,” and I publish the evidence on GitHub so that anyone can check it. From variance-aware probabilistic search tweaks to entropy-guided stochastic optimization and cache-efficient simulation engines, I aim to turn constraints into measurable, reliable speedups.
Experience
Work history, roles, and key accomplishments
Causal Simulation Engine
Proprietary
Rebuilt a Python stochastic simulation core in C++ to achieve 100× throughput improvement versus the original implementation. Also reduced experimental search space by 90% and demonstrated cross-instance generalisation by learning intervention dynamics from one instance and generalising to 998 additional cases.
C++ Performance Engineer
Open Source
Delivered 100× throughput improvements by replacing Python computation cores with purpose-built C++ implementations, validating gains with benchmarks and hardware counter analysis. Built performance-critical simulation and optimization systems (C++/Python interoperability, parallel execution, and numerically stable algorithms) and shipped verified PRs with reproducible results.
Monte Carlo Simulation Engine
Improved a Monte Carlo simulation engine to reach 2.38× throughput over a Python baseline by restructuring parallelisation and eliminating intermediate storage. Fixed a parallel pragma placement bug, used a single-pass Welford algorithm for numerically stable mean/stddev, and added risk estimation (expected revenue and probability of negative cashflow).
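As a rough illustration of the single-pass approach described above, the following is a minimal sketch of a Welford accumulator that also counts negative outcomes. The type and member names (RunningStats, push, prob_negative) are hypothetical and are not taken from the project itself.

```cpp
#include <cmath>
#include <cstddef>

// Single-pass running statistics (Welford's update).
// Hypothetical names; a sketch, not the project's actual code.
struct RunningStats {
    std::size_t n = 0;         // samples seen so far
    std::size_t negative = 0;  // samples with value < 0
    double mean = 0.0;         // running mean (e.g. expected revenue)
    double m2 = 0.0;           // sum of squared deviations from the mean

    void push(double x) {
        ++n;
        if (x < 0.0) ++negative;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);  // uses the updated mean
    }

    // Unbiased sample variance; zero until at least two samples exist.
    double variance() const {
        return n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
    }
    double stddev() const { return std::sqrt(variance()); }

    // Fraction of samples below zero (e.g. probability of negative cashflow).
    double prob_negative() const {
        return n > 0 ? static_cast<double>(negative) / static_cast<double>(n) : 0.0;
    }
};
```

Because every sample is folded into the accumulator as it is produced, no intermediate array of trial results needs to be stored.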
Multi-Agent Belief Framework
Built a fixed-capacity belief distribution framework with a compile-time memory pool to eliminate dynamic allocation and reduce fragmentation. Implemented a certainty-aware two-objective Dijkstra (biasing toward higher certainty) and reduced Section size from 32 to 24 bytes by member reordering and replacing std::string with char.
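The effect of member reordering can be sketched with two hypothetical layouts. The field names and sizes below are illustrative only and do not reproduce the actual Section type; the byte counts in the comments assume a typical 64-bit ABI.

```cpp
#include <cstdint>

// Hypothetical layout with members in an unfortunate order.
struct Padded {
    std::uint8_t  kind;       // 1 byte, then padding up to the double's alignment
    double        certainty;  // 8 bytes
    std::uint32_t id;         // 4 bytes
    std::uint8_t  flags;      // 1 byte, then tail padding
};

// Same members, ordered from largest to smallest alignment.
struct Compact {
    double        certainty;
    std::uint32_t id;
    std::uint8_t  kind;
    std::uint8_t  flags;      // only a small amount of tail padding remains
};

// On a typical 64-bit ABI this is 16 bytes versus 24 bytes.
static_assert(sizeof(Compact) <= sizeof(Padded),
              "reordering should never make the struct larger here");
```

A static_assert on sizeof is a cheap way to stop a later edit from silently reintroducing padding.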
Causal Simulation Engine Engineer
Rebuilt a Python stochastic causal simulation core in C++ to achieve 100× throughput improvement versus the original. Implemented a contiguous, allocation-minimal C++ engine with a Python orchestration layer for configuration and aggregation.
Entropy-Guided Stochastic Optimizer Engineer
Optimized an entropy-guided stochastic optimizer to reach 22.38× throughput improvement through profiling-guided hot-path changes. Reduced cache miss rate from 53.64% to 12.15% (−89% absolute misses), with instruction and branch misprediction reductions, while improving numerical stability with regularized Cholesky.
WebAssembly Cross-Runtime C++ Engineer
Built a zero-serialization, zero-copy WebAssembly execution model using C++ as the memory owner/scheduler, C# as a stateless compute module, and minimal JS bootstrapping. Achieved stable 60fps at 1,024 entities with 99.95% frame consistency via shared WASM linear memory and a single JS boundary crossing per frame.
Entropy-Guided Stochastic Optimizer
Implemented a multi-stage optimization pipeline (Gaussian sampling, evaluation, hierarchical clustering, adaptive refocusing) with numerically stable Cholesky decomposition using jitter. Improved throughput by 22.38× and reduced cache miss rate from 53.64% to 12.15% (−89% absolute misses) via targeted hot-path changes.
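One common way to realise a Cholesky decomposition with jitter is sketched below: attempt the factorisation, and if a non-positive pivot appears, retry with a small value added to the diagonal. The function names, the row-major Matrix alias, and the jitter schedule (start at 1e-10, multiply by 10 per retry) are assumptions for illustration, not details from the project.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

using Matrix = std::vector<double>;  // row-major n x n (assumed layout)

// Attempts A = L * L^T; returns nullopt if a pivot is not strictly positive.
std::optional<Matrix> cholesky(const Matrix& a, std::size_t n) {
    Matrix l(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) sum -= l[i * n + k] * l[j * n + k];
            if (i == j) {
                if (!(sum > 0.0)) return std::nullopt;  // also rejects NaN
                l[i * n + i] = std::sqrt(sum);
            } else {
                l[i * n + j] = sum / l[j * n + j];
            }
        }
    }
    return l;
}

// Retries with increasing diagonal jitter until the factorisation succeeds.
std::optional<Matrix> cholesky_with_jitter(const Matrix& a, std::size_t n,
                                           double jitter = 1e-10,
                                           int max_tries = 8) {
    if (auto l = cholesky(a, n)) return l;
    for (int t = 0; t < max_tries; ++t, jitter *= 10.0) {
        Matrix b = a;
        for (std::size_t i = 0; i < n; ++i) b[i * n + i] += jitter;
        if (auto l = cholesky(b, n)) return l;
    }
    return std::nullopt;
}
```

The jitter slightly perturbs the covariance, so a real pipeline would likely want to record how much was added.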
IS-MCTS Variance-Aware C++ Engineer
Modified IS-MCTS selection to incorporate action-level variance into the UCB decision rule for risk-adjusted search. Showed empirically that variance penalisation amplifies first-mover advantage, with statistically significant divergence from the canonical implementation.
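A variance-penalised selection rule of this kind might look like the sketch below. The exact formula is an assumption (mean plus a UCB1 exploration bonus minus lambda times the reward standard deviation), and the names ActionStats, risk_adjusted_ucb and select_action are hypothetical. As in standard IS-MCTS, the exploration term uses the number of times the action was available rather than the parent visit count.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Per-action statistics; hypothetical names for illustration.
struct ActionStats {
    std::size_t visits = 0;        // times this action was selected
    std::size_t availability = 0;  // times this action was available
    double mean = 0.0;             // running mean reward
    double m2 = 0.0;               // sum of squared deviations (Welford)

    void update(double reward) {
        ++visits;
        const double delta = reward - mean;
        mean += delta / static_cast<double>(visits);
        m2 += delta * (reward - mean);
    }
    double variance() const {
        return visits > 1 ? m2 / static_cast<double>(visits - 1) : 0.0;
    }
};

// Assumed scoring rule: mean + exploration bonus - lambda * stddev.
double risk_adjusted_ucb(const ActionStats& a, double c, double lambda) {
    if (a.visits == 0) return std::numeric_limits<double>::infinity();
    const double avail = static_cast<double>(std::max<std::size_t>(a.availability, 1));
    const double explore = c * std::sqrt(std::log(avail) / static_cast<double>(a.visits));
    return a.mean + explore - lambda * std::sqrt(a.variance());
}

// Returns the index of the highest-scoring action (first one wins ties).
std::size_t select_action(const std::vector<ActionStats>& actions,
                          double c, double lambda) {
    std::size_t best = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < actions.size(); ++i) {
        const double score = risk_adjusted_ucb(actions[i], c, lambda);
        if (score > best_score) { best_score = score; best = i; }
    }
    return best;
}
```

With lambda set to zero the rule reduces to the usual UCB1-style selection, which gives a convenient baseline for comparing against the canonical behaviour.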
C++ Pathfinding Library Engineer
Recast Navigation
Identified and refactored a redundant two-pass hot-path loop (rcCalcBounds) into a single-pass fused min/max operation. Improved throughput by ~10% (106 ns → 95 ns) and reduced instruction and cycle counts per hardware counters.
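The general idea of fusing two passes into one can be sketched as follows. This is a standalone illustration with a hypothetical function name (calc_bounds) and is not the Recast Navigation source or the submitted patch; it assumes vertices are stored as consecutive xyz triples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Computes the axis-aligned bounds of `count` xyz vertices in one pass.
// Each vertex is loaded once and updates both the min and the max.
void calc_bounds(const float* verts, std::size_t count,
                 float* bmin, float* bmax) {
    assert(count >= 1);  // bounds of an empty set are undefined here
    for (int axis = 0; axis < 3; ++axis) {
        bmin[axis] = verts[axis];
        bmax[axis] = verts[axis];
    }
    for (std::size_t i = 1; i < count; ++i) {
        const float* v = verts + i * 3;
        for (int axis = 0; axis < 3; ++axis) {
            bmin[axis] = std::min(bmin[axis], v[axis]);
            bmax[axis] = std::max(bmax[axis], v[axis]);
        }
    }
}
```

Whether a fused loop is actually faster depends on the compiler and data size, which is why the change was validated against hardware counters rather than assumed.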
Monte Carlo Parallel Simulation Engineer
Improved a Monte Carlo simulation engine by 2.38× over a Python baseline and an additional 1.71× by removing intermediate storage and restructuring parallelization. Fixed a structural OpenMP bug that made the MC loop sequential and used a fused single-pass Welford algorithm for mean/standard deviation.
Education
Degrees, certifications, and relevant coursework
Ubunifu College
Software Engineering Diploma, Software Engineering
Completed a Software Engineering Diploma in 2022.
Portfolio
github.com/tommygrammar
