Principal Observability Architect

At ServiceNow, we make the world of work, work better for people.

ServiceNow

Employee count: 5000+

United States only

We are seeking a Principal Observability Architect to lead the strategic architecture, evolution, and operationalization of a modern, multi-tenant Observability Platform-as-a-Service (OPaaS) tailored for a hybrid on-prem and cloud-native SaaS product.

You will architect a cloud-agnostic, federated observability platform that supports real-time monitoring, advanced telemetry pipelines, and AI-powered insights to ensure platform reliability, developer productivity, and exceptional customer experiences. This role combines deep technical leadership with a strong focus on developer enablement, platform resiliency, and data governance.

What you get to do in this role:

Platform Architecture & Strategy

  • Lead architecture and roadmap for a multi-region, multi-cloud, multi-tenant observability platform scalable across diverse customer environments and service boundaries.
  • Architect near real-time telemetry ingestion pipelines with low-latency guarantees (seconds) using a mix of streaming and batch processing technologies.
  • Define observability blueprints including telemetry SLAs, data contracts, tenant data isolation, and cost-aware retention strategies for high-cardinality data.
  • Ensure observability systems are cloud-native and container-aware, supporting environments built on Kubernetes, service meshes, and serverless components.

Real-Time Monitoring & Detection

  • Design and implement real-time metrics, logs, traces, and event pipelines with technologies such as:
    • VictoriaMetrics, Prometheus, Grafana, Alertmanager
    • Cribl Stream and Edge for dynamic routing and filtering
    • VictoriaLogs for structured log analysis
  • Embed real-time anomaly detection and signal correlation, with context-aware alerting to reduce noise and MTTR.
  • Integrate with alerting and incident response tools (PagerDuty, Slack, ServiceNow) for automated incident routing and contextual enrichment.
  • Ensure observability of synthetic probes, end-user transactions, and critical SLOs with per-tenant granularity.

Instrumentation, Developer Enablement & CI/CD Integration

  • Standardize OpenTelemetry instrumentation across all services with prebuilt SDKs, language libraries, and semantic conventions.
  • Architect OpenTelemetry deployment patterns (agent-based, sidecar, collector pipelines) with support for Kubernetes, Lambda, and edge environments.
  • Embed observability validation gates into CI/CD workflows (e.g., GitHub Actions, GitLab CI) to enforce telemetry compliance before production rollout.
  • Provide self-service tools, templates, and training to enable developer teams to adopt observability by default.

AI for Observability & Productivity

  • Leverage AI/ML for:
    • Real-time anomaly detection and noise suppression
    • Predictive incident detection and impact forecasting
    • Auto-summarization of alert storms and telemetry bursts
    • Multi-tenant root cause and blast radius correlation
  • Build or integrate LLM-powered tools that support:
    • Natural language querying of live telemetry
    • AI-assisted debugging and dashboard generation
    • Generative runbooks and incident summaries

Data Platform Architecture

  • Architect hot and cold telemetry storage pipelines using:
    • VictoriaMetrics and Cribl for hot-path observability
    • Long-term retention in object storage (e.g., S3, GCS) using open formats (Parquet, JSON)
    • Federated querying engines like Trino for historical and cross-service analytics
  • Implement cost-aware ETL strategies, balancing real-time visibility with storage and ingestion optimization.
  • Incorporate data governance, PII handling, and regional data compliance (e.g., GDPR, SOC 2) into telemetry architecture.

SaaS Operations & ITSM Integration

  • Integrate observability into ITSM and incident response systems (e.g., ServiceNow, Jira):
    • Auto-create incidents enriched with correlated traces, logs, and metrics
    • Provide real-time telemetry context in change and problem management flows
  • Deliver customer-facing health dashboards, SLA monitoring, and per-tenant observability insights to support operational excellence and transparency.

Technical Leadership

  • Lead cross-functional collaboration with SRE, Platform, Security, and Engineering teams to evolve observability maturity.
  • Define and document observability patterns, anti-patterns, and escalation workflows.
  • Drive internal R&D around OpenTelemetry, AI in observability, high-cardinality telemetry, and eBPF-based observability tooling.

To be successful in this role you have:

  • Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
  • 10+ years in DevOps, SRE, or Observability roles, including 5+ years in architecture or platform engineering.
  • Proven experience designing and operating near real-time observability systems in global-scale SaaS environments.
  • Deep expertise in OpenTelemetry (including collector deployment, semantic conventions, sampling strategies).
  • Experience integrating observability in Kubernetes, microservices, and serverless ecosystems.
  • Hands-on experience with telemetry data pipelines using Cribl, Prometheus/VictoriaMetrics, and log/trace platforms.
  • Experience embedding telemetry validation in CI/CD workflows.
  • Familiarity with AI/ML for observability (anomaly detection, summarization, impact correlation).
  • Working knowledge of data privacy, retention, and compliance practices in observability.

Nice to Have:

  • Experience with Trino, S3 data lakes, and long-term observability analysis.
  • Experience building customer-facing observability features (dashboards, SLAs, health status pages).
  • Contributions to open-source observability tools or standards.
  • Knowledge of or hands-on experience with Agentic AI systems to drive autonomous remediation, telemetry analysis, or incident response.
  • Relevant certifications (e.g., AWS, GCP, Azure, OpenTelemetry, Observability Practitioner).

GCS-23

Work Personas

We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work and their assigned work location. To determine eligibility for a work persona, ServiceNow may confirm the distance between your primary residence and the closest ServiceNow office using a third-party service.

Equal Opportunity Employer

ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.

Accommodations

We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact globaltalentss@servicenow.com for assistance.

Export Control Regulations

For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.

From Fortune. ©2025 Fortune Media IP Limited. All rights reserved. Used under license.

It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today — ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.

About the job

Job type: Full Time

Experience level: Senior

Hiring timezones: United States +/- 0 hours

About ServiceNow

At ServiceNow, we make the world of work, work better for people. We deliver digital workflows that create great experiences and unlock productivity.

With 6,200+ customers, we serve ~80% of the Fortune 500, and we are on the 2020 list of FORTUNE World’s Most Admired Companies®. This is the future of work.

Who We Are

ServiceNow believes in the power of technology to reduce the complexity in our jobs and make work, work better for people.

What We Do

We transform old, manual ways of working into modern digital workflows. Employees and customers get what they need, when they need it—fast, simple, easy.

Careers

Want to ride a rocket ship? LinkedIn named us one of the top U.S. employers in 2019. Join a diverse, creative, fast-growing team that's changing how the world works.

Diversity, Inclusion and Belonging

ServiceNow embraces diversity, inclusion, and belonging as core to how we operate, how we recruit talent and develop our people, and how we create culture. We empower our employees to bring their best selves to work.

Leadership

“I believe ServiceNow has the potential to become one of the great enterprise software companies of this era.”
Bill McDermott, President and CEO

Employee benefits

Learn about the employee benefits and perks provided at ServiceNow.

Learning and development budget

Learning and development stipend to grow your skills.

Paid parental leave

Generous family leave for all parents to support you and your family.

Volunteer opportunities

Paid volunteer time and matched donations for nonprofits that matter to you.

Retirement benefits

Generous 401(k) with matching and regional retirement plans to help you invest in your future.
