Open to opportunities

Song Ding

@songding

Senior AI Engineer building scalable multimodal ML systems from research to production.

United States

What I'm looking for

I’m looking to build production-grade multimodal AI systems, improve latency/accuracy with strong MLOps, and ship research breakthroughs into reliable inference pipelines at global scale.

I’m a Senior AI Engineer with 15 years of experience architecting and deploying scalable machine learning systems, specializing in computer vision, speech processing, and multimodal architectures. I manage the end-to-end model lifecycle, from data ingestion and distributed training to high-availability inference pipelines, serving millions of users globally while improving both model latency and precision.

At ByteDance, I architected a multimodal AI framework that increased user engagement by 25% and reduced end-to-end inference latency by 35% while improving Word Error Rate (WER) by 18% using advanced quantization in speech recognition pipelines. Earlier at Yahoo, I improved transcription accuracy from 87% to 95%, enabled real-time edge processing with TensorRT optimization, reduced training cycles by 40% with proprietary synthetic data augmentation, and built scalable MLOps pipelines. I’m passionate about moving research into production by integrating large-scale generative models and proprietary neural architectures to tackle complex challenges and maintain a competitive advantage.

Experience

Work history, roles, and key accomplishments

Current

Senior Software Engineer

ByteDance

Sep 2021 - Present (4 years 10 months)

Led development of a multimodal AI platform combining speech, vision, and language models to deliver more natural interactions for millions of users. Built backend AI services, model-serving infrastructure, and API integrations for real-time inference, improving performance via model optimization and distributed deployment.

Multimodal AI, Speech Recognition, Computer Vision, Language Models, Model Serving, GPU Acceleration, API Integration, MLOps

Current

Senior AI Engineer

ByteDance

Aug 2021 - Present (4 years 11 months)

Architected a multimodal AI framework integrating vision transformers with large language models, increasing user engagement by 25%. Reduced end-to-end inference latency by 35% and improved WER by 18% using advanced quantization, while boosting throughput by 50% via PyTorch + TensorRT and reducing infrastructure costs by 20%.

AI Engineer

Yahoo!

Dec 2014 - Jun 2021 (6 years 6 months)

Architected a multimodal enterprise platform unifying computer vision with ASR pipelines, improving workflow efficiency by 20%. Engineered transformer-based acoustic models that raised transcription accuracy from 87% to 95%. Enabled real-time edge processing by reducing latency by 35% with TensorRT quantization and distributed computing, and cut training cycles by 40% using synthetic data augmentation.

Computer Vision, Automatic Speech Recognition (ASR), Transformer Acoustic Models, Multimodal Systems, TensorRT, Model Quantization, Distributed Computing, Synthetic Data Augmentation, Latency Optimization, Edge Inference

AI Software Engineer

Yahoo!

Dec 2014 - Jun 2021 (6 years 6 months)

Developed machine learning and computer vision solutions to automate large-scale content analysis and improve operational efficiency. Built backend services and data processing pipelines to support real-time AI applications and analytics platforms, from research through deployment and monitoring.

Machine Learning, Computer Vision, Data Pipelines, Backend Services, Model Monitoring, Analytics

Machine Learning Engineer

Yahoo Beijing

Jun 2010 - Nov 2014 (4 years 5 months)

Developed deep neural networks for voice biometric authentication, achieving 99% precision and reducing unauthorized access by 40%, and built a CNN-based automated inspection pipeline to triple throughput. Implemented scalable MLOps pipelines with PyTorch and AWS SageMaker to reduce model retraining latency and improved authentication success rates by 15% YoY using novel acoustic feature extraction techniques.