Ivan Podluzhny
@ivanpodluzhny
Applied Scientist specializing in speech recognition and machine learning.
What I'm looking for
I am an Applied Scientist at Amazon, where I work on improving the naturalness and expressiveness of text-to-speech (TTS) voices, using Direct Preference Optimization (DPO) to improve TTS synthesis generations. I designed and deployed an automated measurement pipeline that evaluates speaker identity, accent, and prosody drift. This work also included developing a tone-sensitive neural TTS front-end system for Alexa in Japanese, which reduced relative SER by 41%.
Prior to my role at Amazon, I worked as a Speech Researcher at STC Group, where I created and released speech recognition models for multiple languages, including English, Spanish, Russian, and Arabic. Using the BPE-dropout technique for low-resource tasks, I increased the out-of-vocabulary (OOV) recognition rate by 38%. I also developed a punctuation model for automatic speech recognition (ASR) results that achieved an F1 score of 0.9, and contributed to various academic papers and conferences in the field.
Experience
Work history, roles, and key accomplishments
Applied Scientist
Amazon
Jan 2023 - Present (2 years 5 months)
Improved the naturalness and expressiveness of new TTS voices using Direct Preference Optimization (DPO). Designed and deployed an automated measurement pipeline for speaker identity, accent, and prosody drift in LLM TTS synthesis generations. This work included developing a tone-sensitive neural TTS front-end system for Alexa in Japanese, which reduced relative SER by 41% and achieved strong prefer
Speech Researcher
STC Group
Jan 2019 - Present (6 years 5 months)
Created and released speech recognition models for various languages, including English, Spanish, Russian, and Arabic. Increased the Out-Of-Vocabulary (OOV) recognition rate by 38% using the BPE-dropout technique for low-resource tasks. Additionally, developed and published an NLP post-processing punctuation model for ASR results, achieving a 0.9 F1 score.
Education
Degrees, certifications, and relevant coursework
ITMO University
Ph.D. in Computer Science
Engaged in research on semi-supervised techniques for end-to-end Speech Recognition. Developed expertise in cutting-edge machine learning and AI methodologies.
Saint Petersburg State University
M.S. in Mathematics
Grade: 4.96 GPA
Focused on advanced topics in Probability Theory, Random Processes, and Statistics, achieving a high academic standing. Gained a comprehensive understanding of mathematical principles applicable to various scientific fields.
