Site Reliability Engineer

Build, scale, and operate games faster, simpler, and without compromise.

AccelByte

Employee count: 201-500

Indonesia only

At AccelByte, our mission is to empower game creators by providing them with the backend platform and tools required to make scalable, reliable AAA-quality games. The company was founded in 2016 by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world, including Fortnite, Epic Store, Xbox Live, PlayStation Network, and EA Origin. We are backed by top investors including SoftBank, Sony Interactive Entertainment, Galaxy Interactive, NetEase, and Krafton. Our latest Series B funding has solidified our place as a top player in the gaming industry. AccelByte’s team has decades of experience building and shipping some of the largest game and distribution platforms in the world.

We believe that the best companies empower employees to make decisions, obsess about the best user experience, and are not afraid to make and learn from their mistakes. Our culture is based on humility, openness to feedback, drive, and collaboration, which we feel results in the best performing teams. As a company that values diversity, inclusion, and employee growth, our employees have opportunities to work with and learn from teams all over the world. We offer competitive salaries, a full range of health benefits, social activities, career growth opportunities, and an amazing team. Come join us!

Position Summary

As an SRE/Cloud Engineer - Observability, your primary responsibility revolves around enhancing the observability of our infrastructure. You play an important role in strategically optimizing resources and driving initiatives to ensure effective infrastructure management aligned with business objectives. Your focus lies in implementing tools and practices that enable comprehensive monitoring, logging, and tracing of system components and processes. By doing so, you contribute to improving system reliability, troubleshooting efficiency, and overall operational transparency.

Essential Functions/Responsibilities

The SRE/Cloud Engineer - Observability is accountable for the following functions and responsibilities:

  • Configure and maintain monitoring tools (Prometheus, Grafana, AWS CloudWatch) for real-time visibility into system performance and health.
  • Implement log management solutions (Elasticsearch, Fluentd, Kibana) to gain insights from application and system logs.
  • Enhance observability strategies and tools to monitor the performance, availability, and reliability of distributed systems.
  • Maintain robust monitoring and alerting solutions for timely issue detection and resolution.
  • Promote best practices in observability, including logging, tracing, and metrics collection within development teams.
  • Collaborate with development teams to ensure proper instrumentation of containerized applications for monitoring and observability.
  • Utilize Kubernetes (K8s) for container orchestration, scalability, reliability, and efficient resource utilization.
  • Assist in performance analysis, capacity planning, and optimizing system performance and resource utilization.
  • Identify and address bottlenecks, inefficiencies, and potential failure points in the system.
  • Assist in creating and enforcing cost control measures, monitor AWS resource utilization, and identify optimization opportunities to decrease infrastructure costs.
  • Implement containerization strategies to improve deployment efficiency and resource utilization in the AWS environment.
  • Contribute to the analysis of cloud resource usage patterns and identify opportunities for cost optimization.
  • Perform technical assessments of SRE/Cloud Engineer candidates.
  • Perform other duties as assigned.

Qualifications/Experience Required

  • Bachelor's degree, or relevant work experience, certification, or courses.
  • At least 3 years of experience specializing in roles such as Site Reliability Engineering (SRE) or similar, with a particular focus on improving observability within distributed systems.
  • Experience in designing and implementing log collection, aggregation, and visualization systems using Fluentd, Fluent Bit, Promtail, Loki (LogQL), Logstash, OpenSearch, and AWS Athena.
  • Experience in designing and implementing metric collection, aggregation, and visualization solutions using technologies like Prometheus (PromQL), Grafana, cAdvisor, metrics-server, and CloudWatch.
  • Practical knowledge of trace collection, aggregation, and visualization methodologies employing tools and techniques such as Grafana Tempo (TraceQL), tail sampling, and OpenTelemetry.
  • Basic experience in Kubernetes, including using kubectl, Flux, and other tools for debugging and modifying cluster state, and understanding the limitations and usage of containerization technology within a Kubernetes cluster.
  • Basic experience in containerization technology, particularly Docker and containers, including its limitations and practical applications within a Kubernetes cluster environment.
  • Basic experience in using Infrastructure-as-Code (IaC) tools (e.g., Terraform, CloudFormation) for provisioning and configuration management, including the ability to apply, modify, or delete modules and create custom Terraform modules.
  • Basic experience in performing cloud system operations on AWS infrastructure, including backups, snapshots, and other administrative tasks.
  • Practical knowledge of defining budgets, forecasting expenses, and building automated tools to identify cost trends and anomalies for cloud infrastructure.
  • Understanding of distributed systems architecture and best practices.
  • Experience in using one or more scripting or programming languages (e.g., Python, Go) for automation and tooling development.
  • Experience in managing and optimizing costs across multiple cloud accounts or subscriptions, with proficiency in cloud account management tools and techniques, is a plus point.
  • Familiarity with multi-cloud or hybrid cloud environments, including the ability to navigate and leverage different cloud platforms simultaneously, is preferred.
  • Experience at a AAA game studio or a software product company is preferred.
  • AWS Certified Solutions Architect is a big plus.
  • Experience working in a multinational technology startup is a big plus.
  • Eagerness to learn new languages and technologies.
  • Proficiency in written and verbal English language.
  • Flexibility to adjust to work routines/schedules, as required, to meet the needs of the company and the expectations of customers.

AccelByte Inc is an Equal Employment Opportunity Employer. All qualified candidates and applicants will receive consideration for employment without regard to race, religion, gender, national origin, sexual orientation, marital status, age, or disability. Our culture is innovative and inclusive, and we place the highest value on our people.

Please visit our careers page for a complete listing of our open positions: https://accelbyte.io/careers

About the job

Apply before: Jun 25, 2024
Posted on: Apr 26, 2024
Job type: Full Time
Experience level: Mid-level

Location requirements

Hiring timezones: Indonesia +/- 0 hours

About Accelbyte

AccelByte is a game backend platform that helps creators focus on what matters most: Making awesome games.

About us
We were founded by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world, including Fortnite, Epic Store, Xbox Live, PlayStation Network, and EA Origin.

Our Principles
We know there's no one-size-fits-all solution when it comes to making an online game. Our team has decades of collective experience in this industry, so we know that building a platform from scratch is no easy task.
Our mission is to provide accessible, best-in-class tools and game platform technology for creators to develop and operate world-class online gaming and entertainment experiences. Our functions are designed as discrete microservices, so you can use only what you need. Our technology is store-platform agnostic, so our services are available regardless of where your players play. We provide a single-tenant deployment of our services per client, so you have your own dedicated environment.
We'll grow with you: we set no artificial limits on transactions, storage, or anything else that could get in the way of your growth. And without question, you will always fully own your players' data.

Employee benefits

Healthcare benefits

AccelByte provides an allowance for medical needs.

Paid parental leave

Paid family leave for all parents to support you and your family.

Life insurance

Life insurance and accidental death & dismemberment insurance.

Learning and development budget

Regular employees receive training for professional and personal growth. Employees may also request to attend a seminar, workshop, conference, or similar event to improve themselves, and the company does its best to provide the itinerary and tickets for the event.

AccelByte

Company size: 201-500
Founded in: 2016
Chief executive officer: Junaili Lie
