We’re looking for a Machine Learning Engineer to own and scale our multilingual data pipeline—from sourcing and curation to evaluation and continuous improvement. You’ll work closely with researchers and infra engineers to ensure our models perform robustly across languages, scripts, and cultural contexts.

This role sits at the intersection of data, research, and production ML and is ideal for someone who cares deeply about data quality, linguistic diversity, and model generalization beyond English.

What You’ll Do

Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages
Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling
Implement quality filters using statistical, heuristic, and model-based methods
Work with researchers to define language coverage, benchmarks, and evaluation metrics
Analyze dataset bias, coverage gaps, and failure modes across regions and scripts
Support training, fine-tuning, and distillation workflows with high-quality multilingual data
Continuously iterate on datasets based on model performance and real-world usage

What We’re Looking For

3+ years of experience as an ML Engineer or Applied Scientist, or in a similar role
Strong experience working with multilingual or non-English datasets
Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)
Experience building scalable data pipelines (Python, Spark, Ray, or similar)
Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks
Comfort collaborating with researchers and translating research needs into production systems

Nice to Have

Experience with low-resource languages or multilingual benchmarks (e.g. FLORES, XTREME)
Exposure to LLM training, fine-tuning, or distillation
Linguistics background or experience working with native language experts
Contributions to open-source datasets or ML tooling
Experience with data quality evaluation at scale

Why Join

Real ownership over a core differentiator of the product
Work on models used globally, not just in English-speaking markets
Small, high-caliber team with deep ML and systems experience
Competitive compensation plus meaningful equity at the Series A stage

Machine Learning Engineer — Multilingual Data

About Featherless AI
