Site Reliability Engineer (SRE) - GCP

Devsu is a technology partner that provides software delivery and staff augmentation services to startups, scale-ups, and enterprise companies. They specialize in connecting companies with top-tier technology teams and solutions to enhance operational excellence and profitability.

Devsu

Employee count: 201-500

Peru only

We are seeking a Site Reliability Engineer (SRE) with deep expertise in monitoring, observability, and reliability engineering to support systems running across on-premises infrastructure and Google Cloud Platform (GCP).

This role is primarily responsible for designing, operating, and improving monitoring, alerting, and observability platforms, with a strong focus on Grafana and Kubernetes environments.

As a secondary responsibility, this role provides backup coverage for the Application Support team during periods of resource constraints or major incidents, offering L2/L3 technical support when required.

Responsibilities

Monitoring & Observability (Core Focus)

  • Own and operate the monitoring and observability stack across on-prem and GCP environments
  • Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and applications
  • Define, tune, and maintain alerts to ensure a high signal-to-noise ratio
  • Establish observability standards and best practices across teams
  • Improve visibility into system health, performance, and reliability

Site Reliability Engineering

  • Apply SRE principles to improve availability, performance, and resilience
  • Define and track SLIs, SLOs, and error budgets
  • Participate in on-call rotations and SEV incident response
  • Lead or contribute to incident investigations and root cause analysis (RCA)
  • Drive preventative actions to reduce repeat incidents

Kubernetes & Platform Reliability

  • Support and monitor Kubernetes environments (GKE and on-prem clusters)
  • Monitor cluster health, capacity, and resource utilization
  • Troubleshoot platform-level issues impacting application reliability
  • Collaborate with Platform and Engineering teams on reliability improvements

Secondary Responsibilities (Backup Application Support)

  • These responsibilities are activated only as needed and are not part of day-to-day operations
  • Provide L2/L3 application support coverage during:
    • Support team resource shortages
    • High-severity incidents (SEVs)
    • Peak support periods or escalations
  • Triage and troubleshoot application issues using existing runbooks and dashboards
  • Collaborate with Application Support and Engineering teams during incidents
  • Ensure all actions, findings, and resolutions are documented in ServiceNow (SNOW)

Requirements

  • Strong experience as a Site Reliability Engineer or Reliability Engineer
  • Deep hands-on expertise with Grafana (dashboards, alerting, troubleshooting)
  • Solid experience with monitoring and observability systems
  • Production experience operating Kubernetes environments
  • Experience supporting systems in GCP and on-prem environments
  • Strong Linux systems and troubleshooting skills
  • Fluent English (written and spoken)
  • Ability to work in the PST time zone
  • Ability to participate in an on-call rotation that includes coverage for one weekend day. Time worked during the weekend is compensated with one day off during the week, in accordance with the established work schedule.

Technology Stack

  • Observability: Grafana, Prometheus, logging platforms
  • Containers: Kubernetes (GKE and on-prem)
  • Cloud: Google Cloud Platform (GCP)
  • Operations: Linux, networking, infrastructure monitoring
  • Incident Tools: PagerDuty, ServiceNow, Slack (or equivalents)

Nice to have

  • Experience supporting application teams during SEV incidents
  • Knowledge of capacity planning and performance tuning
  • Scripting skills (Python, Bash, etc.)
  • Experience with hybrid infrastructure environments

Benefits

At Devsu, we believe in creating an environment where you can thrive both personally and professionally. By joining our team, you’ll enjoy:

  • A stable, long-term contract with opportunities for career growth
  • Private health insurance
  • A remote-friendly culture that promotes work-life balance
  • Continuous training, mentorship, and learning programs to keep you at the forefront of the industry
  • Free access to AI training resources and state-of-the-art AI tools to elevate your daily work
  • A flexible Paid Time Off (PTO) policy as well as paid holidays
  • Challenging, world-class software projects for clients in the US and LatAm
  • Collaboration with some of the most talented software engineers in Latin America and the US, in a diverse work environment

Join Devsu and discover a workplace that values your growth, supports your well-being, and empowers you to make a global impact.

About the job

Job type: Full Time

Experience level: Senior

Hiring timezones: Peru +/- 0 hours

About Devsu

At Devsu, our culture is built on a foundation of driving operational excellence and enhancing profitability for our clients. We are more than just a service provider; we see ourselves as a strategic partner, dedicated to aligning our deep industry expertise with our clients' goals to forge meaningful and impactful connections. Our mission is to empower businesses to not only adapt to the ever-evolving landscape of digital transformation but to truly thrive within it. This commitment to delivering tangible business value is at the heart of everything we do. We pride ourselves on assembling high-performing technology teams and sourcing top-notch talent, a success rooted in our rigorous vetting process that ensures we work only with the best. By integrating advanced methodologies and AI tools, we empower our teams to consistently deliver outstanding results.

Our team is composed of top-tier professionals, selected through a meticulous process, who are committed to excellence and driven by a shared passion for innovation. We cultivate an environment where mentoring, collaboration, and continuous learning flourish, allowing our talent to work on transformative projects for global brands while advancing their careers. Transparency, trust, and collaboration are woven into every relationship we build, ensuring our clients feel confident in partnering with us. We believe in staying tech-agnostic, focusing on finding the optimal tech stack for each product's vision to ensure peak performance and scalability. Ultimately, our goal is to help our clients become more competitive, efficient, and successful in their respective industries. This people-powered process, combined with our dedication to excellence, reassures clients of our capability to offer exceptional quality and innovative solutions.

Employee benefits

  • Computer provided: Devsu provides a computer for your work.
  • Digital library: Access to digital books or subscriptions.
  • Paid sick days: Sick leave is compensated (limits might apply).
  • Flexible schedule: Flexible hours and the freedom to attend to family needs or personal errands.
