TLM, AI Evaluation Science

Diligent Robotics is a pioneering AI company creating robotic assistants like Moxi to improve efficiency in healthcare by handling routine tasks for clinical staff.

Diligent Robotics

Employee count: 201-500

United States only

What we’re doing isn’t easy, but nothing worth doing ever is.

We envision a future powered by robots that work seamlessly with human teams. We build artificial intelligence that enables service robots to collaborate with people and adapt to dynamic human environments. Join our mission-driven, venture-backed team as we build out current and future generations of humanoid robots.

The TLM, AI Evaluation Science will lead the team responsible for advancing the state of the art in measuring the performance of physical AI systems and for validating how our AI systems perform in the real world. This group defines requirements, builds metrics, and creates rigorous evaluation pipelines. This work ensures that our robots meet high bars for safety, reliability, task performance, and human trust. You’ll own the simulation, testing, labeling, and interpretability frameworks, making sure our robots not only work but work safely, repeatably, and explainably.

This is a hands-on leadership role in a startup environment. You’ll be both strategist and player-coach: defining evaluation standards, coding tools and models, and building the team that ensures our embodied AI is ready for deployment.

Responsibilities

  • Lead the AI Evaluation Science team, owning evaluation strategy for robot perception, planning, control, and multimodal models.
  • Define metrics and benchmarks for AI performance across safety, reliability, user experience, and robustness.
  • Develop and maintain large-scale simulation environments to test robot behaviors under diverse real-world conditions (edge cases, adversarial scenarios, rare failures).
  • Design evaluation frameworks that cover offline experiments, simulation, and live deployments.
  • Build scalable pipelines for test coverage, automated evaluation, and regression tracking.
  • Oversee labeling and data curation pipelines to generate high-quality ground truth for training and validation.
  • Drive interpretability and explainability in embodied AI models—ensuring failures are measurable, diagnosable, and improvable.
  • Collaborate closely with AI/Robotics engineering teams to define product requirements, set acceptance thresholds, and close the loop between evaluation and development.
  • Actively mentor engineers and scientists while contributing hands-on to code, experiments, and metrics design.

Skills and Experience

  • MS or PhD in Computer Science, Robotics, ML, EE, or related field along with 8+ years of AI/ML experience.
  • Proven leadership experience: built and managed technical teams in AI, simulation, or robotics evaluation.
  • Hands-on expertise building and evaluating large multimodal ML models (vision, language, action).
  • Strong background in defining and operationalizing metrics for AI/robotics systems (safety, robustness, reliability).
  • Demonstrated success in designing end-to-end evaluation pipelines: from data labeling and test definition to automated reporting and regression tracking.
  • Experience in evaluation, benchmarking, or safety in robotics, AVs, or similar domains.
  • Experience with simulation platforms for robotics or AVs.
  • Technical depth in ML interpretability, error analysis, and data-driven model improvement.
  • Ability to operate in a startup context: strategic, but hands-on in code and experimentation.
  • Excellent communication and cross-functional alignment skills—able to articulate risks, metrics, and trade-offs to executives, engineers, and non-technical stakeholders.

About the job

Job type

Full Time

Experience level

Senior

Hiring timezones

United States +/- 0 hours

About Diligent Robotics

Diligent Robotics is an AI company that creates robot assistants designed to enhance the capabilities of healthcare workers. Founded in 2017 by Andrea Thomaz and Vivian Chu, both experts in social robotics, the company is on a mission to improve patient care by allowing healthcare staff to focus on what they do best. The flagship product, Moxi, is a service robot that assists clinical staff with routine non-patient-facing tasks such as running patient supplies and delivering lab samples. By automating these logistical functions, Diligent Robotics aims to alleviate the workload on human staff, reduce stress, and improve overall efficiency in hospital environments.

With over 30 hospitals using Moxi, Diligent Robotics has positioned itself at the forefront of healthcare innovation and robotics integration. The company focuses on seamless human-robot interaction, enabling robots and humans to work side by side. Diligent Robotics prioritizes human-centered design principles to ensure that its robotic solutions are intuitive and provide real value in clinical settings. These efforts reflect a growing recognition of the potential of robotics to address challenges within the healthcare sector, especially in light of labor shortages and increasing demand for efficient patient care.

Diligent Robotics hiring TLM, AI Evaluation Science • Remote (Work from Home) | Himalayas