Thrive Learning

Site Reliability Engineer

Thrive Learning is an all-in-one, AI-powered LMS designed to enhance skills development and communication for modern organizations, boasting a 99% customer retention rate.

Employee count: 51-200

United Kingdom only

As a Site Reliability Engineer within the SRE team, you’ll monitor and support the AWS-hosted applications behind the platforms and tools our customers use.

The SRE team specialises in giving delivery squads visibility of the performance of their services in production and support to investigate and contain potential problems.

Unlike traditional development roles, this position won't have you building features. Instead, you'll dive deep into troubleshooting issues, implementing automation solutions, containing bugs and implementing proactive measures to uphold our system's integrity and performance.

You’ll have freedom to help research and recommend solutions for hosting applications at scale. You’ll be fundamental in incident response, troubleshooting and containing issues.

Key responsibilities

  • Debug Node.js applications and contribute to their optimisation and performance tuning.

  • Configuration and ongoing management of environments and services on AWS.

  • Enhancing tools and processes for monitoring scalable applications on AWS.

  • Maintaining high availability through proactive measures.

  • Troubleshooting and resolving complex technical issues.

  • Documentation of Standard Operating Procedures.

  • Automation of SOPs and Run Books.

  • Respond to issues outside of working hours as per the on-call rota.

Basic Qualifications

  • Experience implementing environments for web-based microservices.

  • Experience of supporting MongoDB based web applications.

  • Experience of engineering, architecting, or supporting AWS solutions.

  • Familiarity with cloud virtualisation tools such as ECS and/or Docker containers.

  • Experience working with automated deployment systems (e.g. CloudFormation, CodeBuild).

  • Familiarity with a monitoring tool, e.g. New Relic, Datadog, Prometheus or Grafana.

  • Experience automating workloads using a scripting language such as Python or JavaScript.

  • Strong problem-solving skills and the ability to troubleshoot complex issues.

  • Good understanding of incident response best practices, post-incident reviews, and continuous improvement.

  • Ability and willingness to proactively improve ways of working and processes.

  • Desire to continually grow, develop and improve.

  • Experience debugging Node.js applications.

Useful Skills

  • Understanding of REST, GraphQL and asynchronous messaging.

  • Experience of using Git for version control.

  • Experience of Continuous Integration and Deployment is advantageous.

  • Familiarity with core SRE principles such as monitoring, alerting, error budgets and fault analysis.

  • Excellent written and verbal communication skills.

  • Familiarity with IT compliance and risk management requirements (e.g. security, privacy, GDPR).


What You’ll Get

• 💲 Competitive salary

• 🕒 Flexible working hours

• 🎂 Birthday off

• ☀️ 4 Day Summer Holidays (8 weeks p.a.)

• 🩺 Health cash plan

• 🌴 Unlimited holiday

• 🌍 Work from anywhere (4 weeks a year)

• 🍺 Thrive days (10-3 Fridays)

• 🎄 Christmas & New Year shutdown

• ⛷️ Company trip

About the job

Job type: Full Time

Experience level: Mid-level

Hiring timezones: United Kingdom +/- 0 hours

About Thrive Learning

Thrive Learning is on a mission to empower modern employers with impactful solutions that effortlessly upskill their teams. As an all-in-one, AI-powered learning management system (LMS), Thrive combines learning, skills development, and communication into a seamless platform for organizations looking to enhance their learning culture.

With hundreds of customers and a remarkable 99% customer retention rate, Thrive has proven its effectiveness in the industry. It caters to various use cases including onboarding, leadership training, and sales, enabling organizations to adapt and evolve in today’s fast-paced environment. The platform features a wide array of tools from compliance and social learning to analytics and AI-powered content authoring, ensuring that teams have access to the resources needed to thrive in their roles. Thrive's dedicated team of experts provides support and guidance, helping organizations implement the platform successfully and ensuring a positive experience from the outset.

Employee benefits

Cycle to work scheme

Save up to 42% on a new bike.

Company trip

Optional subsidised company trip.

Health Cash Back Plan with Health Shield

Prioritise your health with our comprehensive cash back plan.

Salary Sacrifice Pension

You will be enrolled into Thrive's salary exchange pension scheme.
