Site Reliability Engineer II

Veritone, Inc. is an American artificial intelligence (AI) technology company that provides AI-powered solutions across various industries through its proprietary aiWARE™ operating system. Founded in 2014, the company is headquartered in Irvine, California, and serves a global customer base.

Veritone

Employee count: 501-1000

Salary: 130k-140k USD

United States only

POSITION SUMMARY

The ideal candidate will have 7+ years of experience in Linux systems and software management, along with expertise in Terraform, Ansible, and cloud platforms such as AWS, Azure, and GCP. Experience with large-scale distributed systems, monitoring/alerting systems (Prometheus, Grafana), CI/CD pipelines, container orchestration (Docker, Kubernetes), and programming languages (Go, Java, Python) is essential. Because we are an AI-first company, this role also heavily involves engineering scalable infrastructure for machine learning workloads, including GPU provisioning and MLOps integrations. A background in implementing security controls, automating deployments, and troubleshooting complex systems is also required.

WHAT YOU'LL DO

  • Deploy and maintain a resilient, secure, and efficient SaaS application platform to meet established SLAs.

  • Build and maintain robust CI/CD pipelines and developer platforms to empower engineering teams to release features quickly and safely.

  • Design and deploy scalable infrastructure specifically optimized for AI/ML workloads, including managing GPU resources and integrating MLOps tools.

  • Automate monitoring, management, and incident response to achieve an auto-remediation system.

  • Participate in on-call rotation to ensure stability and uptime for our platforms.

  • Scale infrastructure to meet rapidly increasing demand.

  • Independently design and develop tools to aid in operations and automation for AI, and work jointly with other team members to deliver innovative solutions to complex business and technical challenges.

  • Provide deployment and operations support for multi-tiered distributed software applications.

  • Estimate engineering effort, plan implementation, and roll out system changes that meet requirements for functionality, performance, scalability, reliability, and adherence to development goals and principles.

  • Collaborate in a fast-paced environment with multiple teams (software development, release management, build and release, etc.).

  • Define how the desired behavior of large-scale systems can be achieved.

  • Measure and achieve reliability through engineering and operations automation.

  • Develop, document, and manage monitoring and alerting, with the goal of creating an auto-remediation system that brings platform stability.

  • Adapt security controls to products not typically native to GA releases.

  • Develop automation methods to extend standard deployment pipelines for bespoke implementations.

  • Perform patching, configuration management, policy enforcement, and audits of production systems.

  • Drive the Disaster Recovery process.

WHAT YOU'LL NEED

  • 5+ years of professional Linux and Windows systems and software management experience.

  • Expertise with Infrastructure-as-Code tools such as Terraform and CloudFormation.

  • Knowledge of programming languages including Python, Go, and Node.js.

  • Experience with managing infrastructure within Azure, GCP and AWS.

  • Expertise in Kubernetes management and upgrades.

  • Strong scripting skills for systems and data-driven solutions.

  • Strong GitOps and CI/CD experience with tools such as Jenkins, ArgoCD, and Helm.

  • Proven ability to lead root-cause analysis (RCA) and blameless post-mortems, actively driving strategic architectural changes to prevent incident recurrence.

  • Act as an infrastructure consultant to software engineering teams, guiding them on reliability best practices and system architecture during the design phase, not just at deployment.

  • Identify systemic weaknesses across our multi-tiered applications and strategically advocate for reliability roadmap items.

  • Drive a culture of observability, ensuring our AI/ML applications emit the right metrics so we can anticipate failures before our customers notice them.

  • Comprehensive background in monitoring, alerting, and auto-remediation systems, including Prometheus and Grafana.

  • Familiarity with deploying, scaling, and observing AI models, Vector Databases, or LLMs in production environments.

  • Proven examples of standardizing security controls and configuration management across large-scale infrastructure in multiple environments.

  • Comfort working within project/task management platforms.

Systems and Tools

  • Cloud/Infrastructure platforms: AWS and Azure.

  • Infrastructure & Configuration: Terraform, CloudFormation, Python.

  • Programming & Scripting: Go, Node.js, Python, and Bash.

  • CI/CD & GitOps: Jenkins, ArgoCD, GitHub Actions, Rundeck.

  • Datastores: Postgres, MySQL, MongoDB, MSSQL, Elasticsearch, Solr.

  • Container Orchestration: Docker, Kubernetes, EKS, AKS.

  • Monitoring/Alerting Tools: Prometheus, Grafana, Thanos, Runscope, CloudWatch, Monitor, VictorOps.

  • AI/MLOps: NVIDIA Triton, Kubeflow, MLflow, or similar model serving frameworks.

  • Security & Hardening: STIG, CIS, SELinux, IPTables, CJIS, FIPS 140-3.

  • Data & APIs: JSON data structures and database schemas. API Query language: REST, GQL.

Bonus Points If

  • Bachelor’s degree in Computer Science or related field.

  • Experience provisioning and managing GPU infrastructure (e.g., NVIDIA CUDA).

  • Have worked in regulated or public sector environments through development and assessment of cloud based solutions.

  • Experience with the following languages, platforms, and tools: Perl, Java, VMware.

  • Have concrete examples ready to present for creating auto-remediation systems and infrastructure with agentic solutions.

DISCLOSURE

Our company provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics.

(Colorado & California Only*): The posted annual salary range is $130,000.00 to $140,000.00. This base pay is for illustrative purposes only and will be determined based on skills and experience comparable to the job requirements. This position may be eligible for additional compensation and benefits including but not limited to: incentive compensation; health benefits; retirement benefits; life insurance; paid time off; parental leave and benefits; and other employee perks and benefits.
• Note: Disclosure as required by SB19-085 (8-5-20) of the minimum salary compensation for this role when being hired in California & Colorado.

About the job

Job type: Full Time

Experience level: Senior

Salary: 130k-140k USD

Experience: 5 years minimum

Hiring timezones: United States +/- 0 hours

About Veritone

Veritone, Inc. is an American artificial intelligence (AI) technology company headquartered in Irvine, California, with additional offices in London, New York City, San Diego, and Seattle. Founded in 2014 by brothers Chad and Ryan Steelberg, Veritone provides AI-powered solutions across various industries, including media and entertainment, government and public safety, energy, and talent acquisition. The company's core offering is its proprietary AI operating system, aiWARE™, which orchestrates an expanding ecosystem of machine learning models to transform audio, video, and other data sources into actionable intelligence. This platform enables customers to analyze vast amounts of structured and unstructured data at scale and in near real-time.

Veritone's aiWARE technology and solutions are licensed and utilized by over 3,500 customers globally, including major media conglomerates, professional sports teams, federal government agencies, energy utilities, and state and local police departments. The company's products and services are also used by its wholly-owned subsidiaries, which historically included advertising agency Veritone One and Veritone Digital, a provider of content management solutions and licensing services, though Veritone One was divested in October 2024. Veritone's suite of applications addresses needs such as digital evidence management, redaction, media management and monetization, recruitment, and energy management. For instance, Veritone Redact software is used by police departments for redacting sensitive information from evidence, and Veritone Attribute helps broadcasters measure advertising efficacy. The company has also expanded into the HR technology space with acquisitions like PandoLogic. Veritone went public via an Initial Public Offering (IPO) on May 12, 2017, and is traded on the NASDAQ Global Market under the ticker symbol VERI. The company emphasizes a human-centered approach to AI, aiming to augment human capabilities and drive operational efficiency and new opportunities for its clients.

Employee benefits

  • Pet insurance

  • Company equity

  • Life insurance

  • Paid sick days
