Senior Data Architect

Omilia is a Conversational AI company that provides an enterprise-grade cloud platform for automated voice and chat customer service solutions, aiming to improve customer experience and reduce operational costs.

Omilia

Employee count: 201-500

Poland only

Accountabilities

  • Own the Training Environment data architecture end-to-end: dataset design and schema for all ML training pipelines, including dialog corpora for LLM training, conversational steps for NLU models, annotated evaluation sets, and whole-call recordings for speech-to-speech model development.
  • Define and govern data selection and sampling strategy: establish criteria that determine which production conversations have the highest training value, including diversity-optimized sampling, confidence-based filtering, edge-case prioritization, and deduplication strategies.
  • Build and maintain the data catalog and dataset discovery infrastructure: enable ML engineers across LLM, NLU, Speech, and Agentic teams to find, understand, and use training data without friction.
  • Define annotation pipeline architecture: establish requirements for data labeling — intent annotation, entity tagging, dialog act classification, task completion scoring, and agentic reasoning evaluation — across internal annotators and external vendors.
  • Architect the data flywheel: the closed-loop system where real customer conversations feed back into training data collection, curation, annotation, model retraining, and evaluation.
  • Own and maintain data pipelines and infrastructure spanning Snowflake, AWS S3, ETL/ELT pipelines (Airflow), and integration with ML training workflows on AWS SageMaker.

Key Responsibilities

  • Work directly with LLM, NLU, and Agentic systems teams to understand training data requirements — what conversational patterns improve zero-shot routing accuracy, what dialog structures train better task planners, what edge cases stress-test agentic reasoning — and translate these into concrete dataset specifications and pipeline configurations.
  • Define and maintain the data architecture for Omilia's Training Environment: schema design, data flow patterns from production (OCP) to centralized training infrastructure, storage strategy (Snowflake + S3), cross-pipeline consistency, and clear auditable data lineage, including anonymization requirements as part of the compliance layer.
  • Design data quality frameworks that directly improve model outcomes: content-based deduplication, diversity-maximizing sampling, confidence-based filtering using NLU scores and behavioral signals, and dedicated NLU improvement corpus extraction from low-confidence and no-match production data.
  • Define annotation requirements for ML model development — intent labeling guidelines, entity tagging schemas, dialog act classification, task completion scoring, and reasoning quality assessment — and design annotation workflows that produce consistent, high-quality labels at scale; evaluate and manage external data annotation vendors.
  • Build and maintain the data catalog that enables cross-team dataset discovery: document dataset contents, schemas, lineage, quality metrics, intended use cases, and known limitations; define the taxonomy for organizing training datasets across model types (LLM, S2S, NLU, ASR, TTS, agentic).
  • Architect the closed-loop data flywheel: production conversations → data selection → anonymization → curation → annotation → model training → evaluation → safe redeployment → back to production; define feedback mechanisms that route model failure cases into targeted training data collection.
  • Identify gaps in production training data and define requirements for external data acquisition (public datasets, synthetic data generation, vendor-sourced corpora); design data augmentation strategies for underrepresented languages, domains, or conversational patterns.
  • Work closely with LLM/NLU/S2S/ASR/TTS/VB Tech Leads and Senior Engineers to align data architecture with model training requirements; collaborate with Platform Engineering, Security & Compliance, and Product Management stakeholders.
  • Maintain comprehensive documentation of data architecture, dataset specifications, pipeline configurations, and data catalog; produce data architecture RFCs for significant changes and share best practices with ML teams.

Requirements

Technical / Professional Skills

  • 5+ years in data architecture, data engineering, or LLM/ML data infrastructure, with demonstrated ownership of production data systems serving ML/AI model development.
  • Strong understanding of ML training data requirements — what makes training data high-quality, diverse, and useful for LLM and NLU model development, not just clean and well-structured.
  • Deep experience with data modeling, schema design, and data pipeline architecture.
  • Strong proficiency with Snowflake, AWS S3, and ETL/ELT orchestration tools (Airflow, dbt, or similar).
  • Experience defining annotation requirements and managing data annotation workflows — intent labeling, entity tagging, dialog classification, or similar NLP annotation tasks.
  • Experience with data cataloging, metadata management, and dataset discovery at scale.
  • Strong SQL and Python skills for data pipeline development and data quality analysis.
  • Experience with data quality frameworks: deduplication, sampling strategies, diversity optimization.
  • Desirable: hands-on experience with LLM training data preparation — instruction tuning datasets, preference data, RLHF/DPO annotation, synthetic data generation.
  • Desirable: experience with data anonymization and PII/PCI redaction as part of ML data pipelines.
  • Desirable: familiarity with AWS SageMaker ML pipeline integration and active learning/data selection strategies.
  • Desirable: knowledge of voice/audio data handling, storage, and processing at scale.

Soft / Behavioural Skills

  • Excellent communication skills — ability to translate ML team data needs into concrete pipeline specifications and explain data architecture decisions to both technical and compliance audiences.
  • Strong cross-functional collaboration skills: track record of working effectively with ML engineers, platform teams, and product stakeholders.
  • Analytical mindset with the ability to make informed trade-off decisions on data quality, diversity, and scale.
  • Self-driven ownership mentality: comfortable operating as the accountable technical owner of a critical platform domain.

Formal Requirements

  • Master's degree or PhD in Computer Science, Data Engineering, Information Systems, or a related field.
  • Experience with conversational AI data (dialog transcripts, ASR outputs, NLU annotations) is a strong advantage.
  • Experience with data governance for regulated industries (financial services, healthcare) is a plus.
  • Familiarity with NER/NLU-based data processing approaches (spaCy, HuggingFace, custom entity recognition) is desirable.

Benefits

  • Fixed compensation;
  • Long-term employment with vacation counted in working days;
  • Professional development (courses, training, etc.);
  • Being part of successful cutting-edge technology products that are making a global impact in the service industry;
  • Proficient and fun-to-work-with colleagues;
  • Apple gear.

Omilia is proud to be an equal opportunity employer and is dedicated to fostering a diverse and inclusive workplace. We believe that embracing diversity in all its forms enriches our workplace and drives our collective success. We are committed to creating an environment where everyone feels welcomed, valued, and empowered to contribute their unique perspectives. All eligible candidates will be given consideration for employment without regard to factors such as race, color, religion, gender, gender identity or expression, sexual orientation, national origin, heredity, disability, age, or veteran status.

About the job

Job type: Full Time

Hiring timezones: Poland +/- 0 hours

About Omilia

Omilia is a Conversational AI pioneer, dedicated to revolutionizing how customers interact with enterprises. Many customers experience frustration with traditional automated systems, like complex IVR menus, that fail to understand their needs or provide efficient solutions. This is why Omilia developed its enterprise-grade Omilia Cloud Platform (OCP). Our platform empowers businesses to deploy advanced voice and chat AI assistants that engage in natural, end-to-end conversations, making customer service more intuitive and effective. We understand that businesses need to cut costs, protect their customers, and ultimately, delight them with superior service. Our solutions are designed to deliver rapid ROI by automating self-service, which frees up human agents to concentrate on high-value, complex interactions. Furthermore, Omilia incorporates robust contact center security, including voice biometric verification and multi-layered anti-fraud mechanisms, to safeguard customer data and ensure regulatory compliance.

Our customers span various industries, including finance, insurance, retail, utilities, automotive, travel, hospitality, and healthcare, all facing the common challenge of meeting ever-increasing customer expectations for 24/7, personalized service. Omilia addresses these needs by providing a suite of AI-driven tools. This includes Conversational Voice & Chat, Contact Centre Security, Conversational Insights for data analytics, AI Agent Assist for real-time support to human agents, and Workforce AI for call quality management. We are committed to helping enterprises transform their customer care by providing technology that not only understands what customers are saying but also the intent behind their words. This deep understanding allows for higher task completion rates and a significant increase in self-service containment. Omilia started in a small garage in 2002 with a vision to reinvent customer service, and today, we are proud to serve billions of conversations in numerous languages across multiple countries, consistently recognized by industry leaders like Gartner and IDC for our innovative and impactful solutions.
