Open to opportunities

Song Ding

@songding

Senior AI Engineer building scalable multimodal ML systems from research to production.

United States

What I'm looking for

I’m looking to build production-grade multimodal AI systems, improve latency and accuracy through strong MLOps, and ship research breakthroughs into reliable inference pipelines at global scale.

I’m a Senior AI Engineer with 15 years of experience architecting and deploying scalable machine learning systems, specializing in computer vision, speech processing, and multimodal architectures. I manage the full model lifecycle, from data ingestion and distributed training to high-availability inference pipelines that serve millions of users globally, while improving both model latency and precision.

At ByteDance, I architected a multimodal AI framework that increased user engagement by 25%. I also reduced end-to-end inference latency by 35% and lowered Word Error Rate (WER) by 18% using advanced quantization in speech recognition pipelines. Earlier at Yahoo, I improved transcription accuracy from 87% to 95%, enabled real-time edge processing with TensorRT optimization, reduced training cycles by 40% with proprietary synthetic data augmentation, and built scalable MLOps pipelines. I’m passionate about moving research into production by integrating large-scale generative models and proprietary neural architectures to tackle complex challenges and maintain competitive advantage.

Experience

Work history, roles, and key accomplishments

ByteDance
Current

Senior AI Engineer

Aug 2021 - Present (4 years 10 months)

Architected a multimodal AI framework integrating vision transformers with large language models, increasing user engagement by 25%. Reduced end-to-end inference latency by 35% and lowered WER by 18% using advanced quantization. Boosted throughput by 50% with PyTorch and TensorRT and reduced infrastructure costs by 20%.

AI Engineer

Yahoo!

Dec 2014 - Jun 2021 (6 years 6 months)

Architected a multimodal enterprise platform unifying computer vision with ASR pipelines, improving workflow efficiency by 20%. Engineered transformer-based acoustic models that raised transcription accuracy from 87% to 95%. Enabled real-time edge processing by reducing latency by 35% with TensorRT quantization and distributed computing, and cut training cycles by 40% using synthetic data augmentation.

Machine Learning Engineer

Yahoo Beijing

Jun 2010 - Nov 2014 (4 years 5 months)

Developed deep neural networks for voice biometric authentication, achieving 99% precision and reducing unauthorized access by 40%. Built a CNN-based automated inspection pipeline that tripled throughput. Implemented scalable MLOps pipelines with PyTorch and AWS SageMaker to reduce model retraining latency, and improved authentication success rates by 15% year over year using novel acoustic feature extraction techniques.

Education

Degrees, certifications, and relevant coursework

Peking University

Master of Science, Computer Science – Artificial Intelligence

2007 - 2010

Earned an M.S. in Computer Science with a focus on Artificial Intelligence at Peking University (2007–2010).
