Open to opportunities

Dhanush User

@dhanushuser2

Senior MLOps Engineer (7+ years) building production ML infrastructure in finance and healthcare, enabling scalable, compliant self-service platforms.

United States

What I'm looking for

I’m looking to build end-to-end, audit-ready MLOps systems—distributed serving, drift detection, and CI/CD—where I can drive self-service adoption, reduce operational overhead, and improve reliability, latency, and governance in regulated environments.

I’m a Senior MLOps Engineer with 7+ years of experience building production machine-learning infrastructure in finance and healthcare, two of the highest-compliance, highest-stakes environments. At Morgan Stanley and Bloomberg, I engineered GPU-accelerated training clusters, spot-optimized serving infrastructure, and self-service ML platforms adopted by 60+ data scientists, eliminating idle compute waste and shortening model risk management (MRM) audit cycles.

I specialize in end-to-end ML platform engineering: distributed model serving, drift detection, CI/CD pipelines, champion/challenger deployments, and audit-ready governance. At Morgan Stanley, I cut infrastructure cost by 20% while improving reliability and performance, migrated prediction services to Kubernetes to improve P99 latency and throughput by 35%, and standardized ML lifecycle workflows to reduce audit cycles by 25%. I also enforced SOC 2-compliant observability by integrating Evidently AI with Prometheus and Grafana (cutting manual monitoring overhead by 40%), and scaled self-service adoption, cutting onboarding time from weeks to days. Previously, at Bloomberg and Optum, I deployed low-latency serving with Seldon Core, built distributed training with Ray and PyTorch, and delivered HIPAA-compliant pipeline automation and model registry infrastructure, improving release cadence and strengthening data governance.

Experience

Work history, roles, and key accomplishments

Current

Senior MLOps Engineer

Morgan Stanley

Oct 2024 - Present (1 year 9 months)

Engineered model risk and inference infrastructure and built a self-service ML platform for 12+ quantitative research teams, reducing infrastructure costs by ~20% and improving P99 inference latency and throughput by 35%. Standardized ML pipelines with MLflow + JFrog Artifactory GitOps and implemented SOC 2-compliant drift observability (Evidently AI, Prometheus/Grafana) to cut MRM audit cycles by 25% and ma

Kubernetes, Helmfile, Docker, Terraform, MLflow, JFrog Artifactory, Evidently AI, Grafana

MLOps Platform Engineer

Bloomberg

Jan 2023 - Sep 2024 (1 year 8 months)

Deployed low-latency real-time news sentiment serving using Seldon Core on Kubernetes and tuned gRPC interfaces to reach 10k+ requests per second at sub-20ms P99 latency. Led distributed training with Ray/PyTorch on AWS EKS, implemented drift-triggered Kubeflow/MLflow promotion workflows, and standardized artifact management via JFrog + Azure DevOps/Helmfile to cut experimentation cycles by 25% an

Kubernetes, Seldon, gRPC, Ray, PyTorch, Kubeflow, JFrog Artifactory, Go

MLOps Engineer

Optum Healthcare

Sep 2020 - Dec 2022 (2 years 3 months)

Automated HIPAA-compliant clinical claims ML pipelines and model registry, reducing manual claims review by 12% and enabling production processing for 2M+ monthly records with 92% precision. Built a centralized MLflow registry for 50+ experiments and doubled release cadence (monthly to bi-weekly) via Jenkins/Git CI/CD, while securing PHI transit with Terraform and AES-256 encryption.

Docker, Jenkins, MLflow, HIPAA, Terraform, SQL Server, AES-256 Encryption, Model Registry, Airflow

Graduate Engineer Trainee

BHEL

Jan 2019 - Jun 2020 (1 year 5 months)

Built real-time IoT data ingestion pipelines processing 150k+ sensor signals per hour using Python and shell scripting, achieving 99.9% uptime. Developed predictive maintenance models for thermal stress across four steam turbines, delivering 88% detection accuracy and reducing unplanned downtime by 10%.

Python, Shell Scripting, Data Ingestion, Predictive Maintenance, Feature Extraction, Alerting, Reliability Engineering