Osborne Andrew
@osborneandrew
Senior DevOps & Platform Engineer architecting Kubernetes-native AI/ML infrastructure—cutting costs, accelerating releases, and ensuring high-SLA reliability.
What I'm looking for
I’m a Senior DevOps and Platform Engineer focused on designing and operating enterprise AI/ML infrastructure across AWS, Azure, and GCP. At GitLab, I lead AI infrastructure for the GitLab Duo Agent Platform—architecting Kubernetes-native MLOps systems that support 100K+ daily operations with 99.999% SLA on mission-critical LLM workloads.
I turn infrastructure complexity into measurable outcomes: $450K+ annual cloud savings, 40% faster release cycles, and a 3× reduction in P1 incidents year-over-year. I’ve built and scaled RAG and inference platforms (vLLM + Triton, Istio routing, KEDA autoscaling). I’ve also implemented zero-trust security with Vault, OPA/Gatekeeper, and OIDC workload identity federation to reach SOC 2 Type II compliance, and automated multi-region disaster recovery to maintain a sub-15-minute RTO.
Experience
Work history, roles, and key accomplishments
Senior DevOps / Platform Engineer
GitLab
Mar 2020 - Present (6 years 3 months)
Architected Kubernetes-native MLOps for GitLab Duo across AWS EKS and Azure AKS, automating model training and inference orchestration and reducing provisioning from days to under 10 minutes while supporting 29% YoY growth without SLA regression. Delivered multi-cloud LLM inference with vLLM/Triton and zero-trust security (Vault, OPA/Gatekeeper, OIDC) plus multi-region disaster recovery, achieving
DevOps Engineer
HashiCorp
Jul 2017 - Feb 2020 (2 years 7 months)
Designed 40+ production-grade Terraform modules for AWS/Azure/GCP adopted by 100+ enterprise clients, cutting environment onboarding from 3 weeks to under 4 days. Built Vault dynamic secrets and optimized GitHub Actions + Terraform Cloud CI/CD with Sentinel policy-as-code, reducing deployment errors by 70% and cutting infrastructure change cycles from 3 hours to 28 minutes.
Cloud Engineer
RunPod
Oct 2016 - Mar 2017 (5 months)
Operated GPU infrastructure for A100/H100/H200 workloads, using Terraform and Ansible to reduce pod spin-up times by 50%+ for enterprise AI teams. Designed serverless LLM inference with KEDA autoscaling and NVMe model weight caching to achieve sub-3-second cold starts, saving customers an average of $120K/month in wasted compute.
Education
Degrees, certifications, and relevant coursework
Georgia Institute of Technology
Bachelor of Science, Computer Engineering
2012 - 2016
Bachelor of Science in Computer Engineering, Georgia Institute of Technology, 2012 to 2016.
Tech stack
Software and tools used professionally
Availability
Location
Authorized to work in
Website
andrewosborne.dev
Job categories
Skills
