Open to opportunities

Dzmitry Shulhin

@shulhd

Staff software engineer specializing in distributed systems and AI performance engineering for reliable, high-load LLM inference.

Poland

What I'm looking for

I’m looking for a team where I can work on high-load reliability in distributed systems and on AI performance engineering: optimizing LLM inference, GPU efficiency, and production MLOps.

I’m a Staff Software Engineer with a decade of experience in distributed systems, leading engineering from architecture to delivery. I focus on high-load reliability, technical excellence, and turning performance constraints into production-grade solutions.

In my recent work at Visa, I led cross-functional, platform-wide initiatives for high-load payment acceptance: upgrading TLS across microservices, enforcing PCI compliance, and migrating from embedded Hazelcast to a shared distributed cluster. In earlier roles at Allegro, Samba TV, and Amazon, I re-engineered agent availability tracking with Kafka broadcasting, built Spark/Databricks ETL pipelines that cut processing runtimes by 30%, developed an Alexa TTS fine-tuning pipeline, improved AWS Polly cache hit rate by 18%, and built AWS pipelines for petabyte-scale data processing.

Experience

Work history, roles, and key accomplishments

Current

AI Performance Engineering Fellow

Nebius Academy

Mar 2026 - Present (4 months)

Specialize in LLM architecture and inference optimization using KV-caching, Mixture of Experts (MoE), and LoRA fine-tuning to maximize model efficiency. Build production-ready MLOps stacks with vLLM, Kubernetes, and MLflow for scalable, observable deployments.

LLM Inference Optimization, Mixture of Experts (MoE), vLLM, Kubernetes, MLflow, MLOps, Performance Engineering

Current

Staff Software Engineer

Visa

Feb 2025 - Present (1 year 5 months)

Led cross-functional initiatives for high-load payment acceptance systems, orchestrating a platform TLS protocol upgrade while ensuring PCI compliance across microservices and maintaining high availability. Directed migration from embedded Hazelcast to a shared distributed cluster to improve resilience and observability while reducing infrastructure overhead and technical debt.

Distributed Systems, TLS, PCI Compliance, Microservices, High Availability, Hazelcast, Observability, Engineering Leadership

Software Engineer

Allegro

Mar 2024 - Feb 2025 (11 months)

Re-engineered third-party agent availability tracking by replacing high-overhead MongoDB polling with Kafka broadcasting and local caching, improving real-time synchronization and reducing redundant database usage. Architected GPT-based agents and deployed ML-driven fraud prevention services, reducing false positives by 11%.

Kafka, MongoDB, Caching, Real-Time Systems, Machine Learning, Fraud Detection, Microservices

Software Engineer

Samba TV

Jun 2023 - Jan 2024 (7 months)

Improved the reliability of high-throughput data ingestion and downstream analytics by implementing automated data cleaning and validation in Spark-based pipelines. Supported performance-focused ETL development to improve runtime efficiency for large-scale daily data processing.

Apache Spark, Databricks, ETL, Data Cleaning, Data Validation, Performance Optimization

Software Development Engineer

Amazon

Dec 2020 - May 2023 (2 years 5 months)

Developed a Dockerized fine-tuning pipeline for Alexa custom voice generation using BERT and vocoders to improve speech synthesis quality. Improved AWS Polly in-memory cache hit rate by 18% and built Kinesis-to-S3 Lambda architecture pipelines for petabyte-scale data processing.

AWS Polly, Kinesis, S3, AWS Lambda, Docker, BERT, Vocoders, Text To Speech, Fine Tuning, Pipelines

Software Engineer

Samba TV

Mar 2019 - Jan 2020 (10 months)

Engineered Spark-based ETL pipelines on Databricks, optimizing orchestration and performance tuning to reduce processing runtimes by 30%. Managed high-throughput terabyte-scale daily ingestion with automated data cleaning and validation for reliable downstream analytics.

Apache Spark, Databricks, ETL, Data Validation, Data Cleaning, Data Ingestion, Distributed Data Processing, Performance Optimization

Education

Degrees, certifications, and relevant coursework

Nebius Academy

AI Performance Engineering Fellow, AI Performance Engineering

2026 -

Activities and societies: Building production-ready MLOps ecosystems using vLLM, Kubernetes, and MLflow for end-to-end AI lifecycle management with enterprise-grade observability and scale.

AI Performance Engineering Fellow specializing in LLM architecture and inference optimization, including KV-caching, Mixture of Experts (MoE), and LoRA fine-tuning.