As an R&D TTS Speech Applied Scientist, you will be a core member of our Speech Team, dedicated to expanding our high-quality TTS services into new languages. Your focus will be on the end-to-end process of localizing and deploying synthetic voices, from initial linguistic analysis and data preparation to final model deployment.
Responsibilities
The primary focus is the research, development, and implementation of robust TTS and Voice Cloning systems for global expansion into new and diverse locales, covering the entire speech synthesis pipeline.
1. End-to-End Pipeline Ownership
This role requires end-to-end involvement in launching TTS for new languages, ensuring quality and scalability across all stages.
- Language Analysis: Conduct thorough phonetic and phonological studies of new target languages and define their phoneme sets. Design and implement the lexicon and G2P (Grapheme-to-Phoneme) development process to ensure accurate pronunciation modeling.
- Voice Creation & Data Curation: Actively participate in the initial stages of voice creation projects:
  - Assist in voice talent selection to meet aesthetic and linguistic requirements.
  - Collaborate on corpus design, defining sentence structure and coverage, and overseeing the corpus creation, recording, and quality review process.
  - Cooperate on voice style design to define the desired emotional and speaking characteristics for the synthetic voice.
- TTS Model Development:
  - Lead TTS model training and evaluation for multiple languages, ensuring high-quality synthesis and speaker consistency.
  - Adapt or extend the current TTS voice data pipeline for new languages and actively contribute to developing and training new models.
- Quality Assurance: Conduct rigorous listening tests (e.g., MOS evaluation) and error analysis to drive model improvements, collaborating closely with internal listening testers.
2. Voice Cloning & Low-Resource Research
- Conduct applied research into low-resource voice adaptation and few-shot voice cloning techniques to rapidly deploy high-quality new voices across various markets.
3. Production & Scale
- Work closely with MLOps and engineering teams to transition successful models into a low-latency, high-scale production environment for global deployment.
- Prototype new research ideas and optimize existing model architectures for real-world performance.
4. Agile Methodologies & Collaboration
- Actively participate in Agile software development processes, including sprint planning, daily stand-ups, and retrospectives to ensure timely and high-quality deliverables.
- Work closely with cross-functional teams, including product managers, designers, and other engineers, to gather requirements and ensure alignment on project goals.
- Participate in project planning, including research and development.
- Contribute to the backlog of tasks with improvements and suggestions.
- Implement Proof of Concepts (PoC) to introduce new solutions and ideas to the team.
- Effectively manage time and meet deadlines.
5. Contribute actively and effectively as an integrated team member
- Meet regularly with the line manager to review progress.
- Manage issue resolution and escalate critical issues when needed.
- Work effectively with other teams, units, and departments.
- Manage issues with clarity and ensure effective information flow and team working.
- Support the organization's other priority activities when necessary.
- Act as an Omilia ambassador.
Requirements
- MSc degree in Computer Science, Engineering, or a related subject.
- 2+ years of experience in speech synthesis development roles.
- Ph.D. in a relevant field is a plus but not required.
- Proven experience in developing AI-driven applications, particularly in speech synthesis, voice cloning, or related fields.
- Strong understanding of state-of-the-art voice LLM techniques.
- Proficiency in Python and deep learning frameworks like PyTorch or TensorFlow.
- Hands-on experience with TTS frameworks (e.g., FastPitch, VITS, StyleTTS, StyleTTS2) and neural vocoders (e.g., HiFi-GAN, WaveGlow, Vocos).
- Hands-on experience with LLMs, diffusion models, and neural audio codecs.
- Familiarity with zero-shot synthesis approaches and multi-speaker TTS systems.
- Self-motivated and driven to create extraordinary things.
- Ability to work under pressure and on strict deadlines.
- Continuous innovation mindset.
- Excellent written and oral communication skills in English.
- Effective time management skills and the ability to meet deadlines.
Nice to have
- Experience with AWS cloud platform for scalable model deployment and monitoring.
- Experience with NVIDIA Triton Inference Server.
- Experience with MLOps practices.
Benefits
- Fixed compensation;
- Long-term employment with vacation counted in working days;
- Professional development opportunities (courses, training, etc.);
- The chance to work on successful, cutting-edge technology products that are making a global impact in the service industry;
- Proficient and fun-to-work-with colleagues;
- Apple gear.
Omilia is proud to be an equal opportunity employer and is dedicated to fostering a diverse and inclusive workplace. We believe that embracing diversity in all its forms enriches our workplace and drives our collective success. We are committed to creating an environment where everyone feels welcomed, valued, and empowered to contribute their unique perspectives. All eligible candidates will be given consideration for employment without regard to factors such as race, color, religion, gender, gender identity or expression, sexual orientation, national origin, heredity, disability, age, or veteran status.
