Cloud Engineer - Senior (Observability)

Leidos is an American defense, aviation, information technology, and biomedical research company that provides scientific, engineering, systems integration, and technical services to government and commercial customers.

Leidos

Employee count: 5000+

Salary: 87k-157k USD

United States only

The Cloud Engineer - Senior (Observability) supports the SEC ISS contract by engineering, operating, and continuously improving the enterprise observability platform across hybrid cloud and containerized environments. This is a hands-on role: the engineer instruments services with distributed tracing, code-level profiling, and custom metrics; builds and tunes Datadog (or comparable) dashboards, alerts, APM, log pipelines, RUM, and synthetic monitors; and then uses that telemetry to solve production performance, reliability, and capacity problems. The engineer partners with cloud, platform, and application teams to embed observability into Azure, AWS, and container platforms (OpenShift/Kubernetes), and drives down alert noise, mean time to detect (MTTD), and mean time to resolve (MTTR). This position provides senior technical leadership for APM/distributed tracing strategy, SLO/SLI engineering, and data-driven operational decision-making in a 24x7x365 operating environment.

PRIMARY RESPONSIBILITIES

Observability Platform Engineering

  • Engineer and operate the enterprise observability stack (Datadog or comparable), including metrics, logs, traces, APM, RUM, synthetic monitoring, and network performance monitoring.
  • Build, tune, and maintain dashboards, monitors, SLOs/SLIs, and alerting policies that produce actionable signal and minimize noise.
  • Instrument services, infrastructure, and containerized workloads using agents, OpenTelemetry, and language-specific APM tracers (Java, .NET, Python, Node.js, Go) with consistent span tagging, W3C Trace Context propagation, and unified service tagging across the estate.
  • Develop and maintain integrations between observability platforms, ITSM (ServiceNow), CI/CD pipelines, and on-call/paging workflows.
  • Define and enforce a unified tagging standard (environment, service, version, team/ownership, data classification, cost center) across metrics, logs, and traces; manage tag cardinality, governance, and custom business tags to keep telemetry queryable, attributable, and cost-controlled.

Cloud and Container Monitoring Engineering

  • Design and deliver monitoring coverage for Microsoft Azure and AWS workloads, including PaaS services, serverless, networking, identity, managed databases, and cloud-native data services.
  • Engineer managed database observability across AWS RDS/Aurora (MySQL, PostgreSQL, SQL Server, Oracle), Azure SQL/PostgreSQL/MySQL, and NoSQL/cache services (DynamoDB, Cosmos DB, ElastiCache/Redis), including query-level performance analytics, slow-query and execution-plan capture, lock/deadlock/wait analysis, connection pool and session monitoring, replication lag, storage/IOPS saturation, and backup/HA health; correlate database spans with upstream APM traces.
  • Engineer container-platform observability for OpenShift/Kubernetes, covering cluster health, control plane, nodes, pods, namespaces, ingress, service mesh, and workload APM.
  • Build standardized, reusable monitoring modules deployable via infrastructure-as-code (Terraform, Bicep, ARM) and CI/CD.
  • Support hybrid visibility across on-premises, cloud, and containerized workloads with correlated telemetry.

Performance Engineering and Problem Solving

  • Lead data-driven investigation and resolution of complex performance, latency, saturation, and reliability issues across the estate.
  • Use APM distributed traces, service/dependency maps, continuous code profiling (CPU, memory, lock contention), database query analytics, exception/error tracking, and RUM-to-backend trace correlation to isolate bottlenecks in applications, platforms, middleware, and downstream dependencies.
  • Partner with engineering teams to define and implement remediation, tuning, and architectural improvements based on telemetry evidence.
  • Define and implement trace-based SLOs, deployment tracking, and change-correlation workflows so performance regressions are detected and attributed to specific releases, versions, or configuration changes.
  • Provide senior technical leadership during major incidents, delivering impact analysis, contributing to root-cause analysis, and owning post-incident observability gaps.

Capacity, Reliability, and Continuous Improvement

  • Analyze operational telemetry and trend data to identify capacity risks, recurring constraints, and opportunities for efficiency.
  • Build and maintain capacity and performance dashboards and reports that communicate posture, risk, and recommendations to technical and leadership stakeholders.
  • Define capacity thresholds, alert baselines, and trigger points for scaling, technology refresh, and resource reallocation.
  • Drive continuous improvement of observability coverage, alert quality, runbook linkage, and operational maturity aligned to SEC SLA/KPI expectations.

REQUIRED QUALIFICATIONS

Citizenship/Work Authorization: Must meet contract requirements.

Clearance: Ability to obtain and maintain SEC Public Trust (or higher if required).

EXPERIENCE

  • Minimum 8 years of experience in IT infrastructure or platform engineering roles, including 5+ years focused on observability, performance engineering, or site reliability engineering.
  • Demonstrated experience engineering and operating an enterprise observability platform (Datadog strongly preferred; equivalent experience with Dynatrace, New Relic, Splunk Observability, or Grafana/Prometheus stacks considered).
  • Proven experience building APM and distributed tracing coverage for production multi-tier applications -- including language-specific tracer deployment, custom instrumentation of business transactions, service/dependency mapping, continuous profiling, and RUM-to-backend trace correlation -- across cloud and containerized workloads.
  • Proven experience leading complex production performance and reliability problem-solving from telemetry to remediation.
  • Hands-on experience monitoring Kubernetes or OpenShift clusters and containerized workloads in production.

TECHNICAL SKILLS

  • Enterprise observability platforms (Datadog or comparable): metrics, logs, traces, APM, RUM, synthetic, NPM
  • Instrumentation with OpenTelemetry, Datadog agents/SDKs, and language-specific APM tracers (Java, .NET, Python, Node.js, Go), including custom spans, trace sampling strategies, W3C Trace Context propagation, and continuous profiling
  • Microsoft Azure and AWS monitoring services and integrations (Azure Monitor, Log Analytics, CloudWatch, AWS X-Ray)
  • Container and Kubernetes/OpenShift observability, including cluster, workload, and service mesh telemetry
  • Cloud database monitoring: AWS RDS/Aurora (including Performance Insights), Azure SQL/PostgreSQL/MySQL (Query Performance Insight), and NoSQL/cache (DynamoDB, Cosmos DB, ElastiCache/Redis); query-level performance tuning, execution-plan analysis, and Datadog DBM or equivalent deep database APM
  • Infrastructure-as-code for monitoring (Terraform, Bicep, ARM) and CI/CD-driven monitor/dashboard deployment
  • APM and distributed tracing: service/dependency maps, trace analytics, RUM-to-backend correlation, exception/error tracking, deployment tracking, and trace-based SLOs
  • Unified tagging strategy and cardinality governance across metrics/logs/traces (environment, service, version, ownership, data classification, cost center), including custom tag enrichment and tag-driven access/cost controls
  • Alert engineering, SLO/SLI design, error budget management, and alert-noise reduction
  • Performance engineering, capacity analysis, and telemetry-driven root-cause analysis
  • Integration of observability with ITSM (ServiceNow) and on-call/paging workflows

PREFERRED QUALIFICATIONS

  • Experience supporting federal agency IT environments under FISMA/FedRAMP/NIST-aligned security and compliance requirements.
  • Datadog certification (Fundamentals and/or Administrator) or comparable enterprise observability certification.
  • Hands-on experience with Red Hat OpenShift Virtualization (CNV/KubeVirt) or other KubeVirt-based container virtualization observability.
  • Experience with eBPF-based observability tooling and service mesh telemetry (Istio, Linkerd).
  • Experience implementing SLOs and error budgets at enterprise scale and integrating them into operational governance.
  • Experience with cost-aware observability practices, including telemetry volume optimization and retention tuning.
  • Experience integrating observability outputs with executive reporting, SLA/KPI dashboards, and capacity forecasting.

  • ITIL 4 Foundation
  • AWS Certified Solutions Architect - Associate (or higher)
  • Microsoft Certified: Azure Administrator Associate (or higher)
  • Red Hat Certified Specialist in OpenShift Administration (or equivalent)
  • HashiCorp Terraform Associate

WORK ENVIRONMENT / OTHER

Operational Support: Supports a 24x7x365 operating environment; participates in a defined on-call rotation and may require surge support based on operational needs.

Location: Telework

Travel: As required per contract direction.

EDUCATION & EXPERIENCE

BS and 4 – 8 years of prior relevant experience, or Master's with 2 – 6 years of prior relevant experience. Preferred degree in a relevant field (e.g., Information Technology, Computer Science, Engineering).

If you're looking for comfort, keep scrolling. At Leidos, we outthink, outbuild, and outpace the status quo — because the mission demands it. We're not hiring followers. We're recruiting the ones who disrupt, provoke, and refuse to fail. Step 10 is ancient history. We're already at step 30 — and moving faster than anyone else dares.

Original Posting:

May 19, 2026

For U.S. Positions: While subject to change based on business needs, Leidos reasonably anticipates that this job requisition will remain open for at least 3 days with an anticipated close date of no earlier than 3 days after the original posting date as listed above.

Pay Range: $87,100.00 - $157,450.00

The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.

About the job

Job type: Full Time

Education: Bachelor's degree (experience accepted in place of education)

Experience: 5 years minimum

Hiring timezones: United States +/- 0 hours

About Leidos

Leidos' story begins in 1969 when Dr. J. Robert Beyster, a visionary scientist, founded Science Applications Incorporated (SAI) in La Jolla, San Diego, California. With a modest investment and a powerful idea, Dr. Beyster embarked on a journey to apply scientific expertise to solve complex problems. The company's early days were marked by a focus on research and engineering, tackling challenges for various government and commercial clients. One of its initial significant projects involved studying radiation-based cancer therapy for the Los Alamos National Laboratory, which laid the groundwork for Leidos' future health business. SAI soon expanded its reach, opening an office in Albuquerque to support the Air Force Weapons Laboratory's work on electromagnetic phenomena, a precursor to the company's Physical Science Group.

Throughout the 1980s, the company, then known as Science Applications International Corporation (SAIC), strategically shifted its focus towards national security and defense, solidifying its position as a key government services provider. This era set the stage for substantial growth and diversification. The 1990s saw SAIC continue to expand its offerings and international presence, securing its first major global contract with the Kuwaiti Defense Forces. A pivotal moment arrived in 2013 when SAIC underwent a significant transformation, splitting into two independent, publicly traded companies: a new company retaining the SAIC name and the original company, which was rebranded as Leidos (a name derived from 'kaleidoscope'). Leidos, as the legal successor to the original SAIC, inherited its pre-2013 stock price and corporate filing history and established its new headquarters in Reston, Virginia. This strategic move allowed Leidos to sharpen its focus on national security, health, and engineering solutions. Another major milestone occurred in August 2016 when Leidos merged with Lockheed Martin's Information Systems & Global Solutions (IS&GS) business, a landmark transaction that created the defense industry's largest IT services provider and significantly expanded Leidos' capabilities and market share. Today, Leidos stands as a Fortune 500® global science and technology leader, employing approximately 47,000 people worldwide and generating billions in annual revenue, committed to making the world safer, healthier, and more efficient through innovation and technology.

Employee benefits

  • Paid sick days
  • Health insurance
  • Dental insurance
  • Vision insurance
