Hitachi Solutions

Lead Site Reliability Engineer (AZURE) - Empower Product Group

Hitachi Solutions is a global IT consulting firm and systems integrator, focused exclusively on the Microsoft platform to deliver innovative business solutions.

Employee count: 5000+

Salary: 143k-199k USD

United States only

Hitachi Solutions is a global Microsoft solutions integrator seeking a Lead Site Reliability Engineer to design and implement CI/CD tooling using GitHub Actions/Azure DevOps, Azure Kubernetes Service (AKS) clusters, and related technologies. The successful candidate will work closely with application, engineering, security, and operations teams to engineer and build Kubernetes and Azure PaaS and IaaS solutions within an agile, modern, enterprise-grade operating model.

Requirements

  • Responsible for availability, latency, performance, efficiency, monitoring/observability, emergency response, and capacity planning; setting and maintaining SLOs, SLIs, and error budgets; and creating dashboards.
  • Analyze, troubleshoot, and resolve operational challenges that affect defined SLOs.
  • Manage site stability, performance, reliability, and maintain uptime for production environments.
  • Develop a fully automated multi-environment observability stack based on the existing system, and extend it to predict capacity needs from usage patterns.
  • Strive for automation to reduce toil and increase development velocity.
  • Perform application-specific production support, incident management, change management, problem management, RCAs, and service restoration as needed.
  • Identify product architecture changes from a reliability, performance, and availability perspective, using a data-driven approach.
  • Analyze and address complex technical challenges and issues that arise during the software development & run lifecycle. Debug, troubleshoot, and resolve technical problems efficiently.
  • Create and maintain technical documentation, including design specifications, user guides, run books and best practice guidelines.
  • Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
  • Collaborate with software development teams on release management, help shape the future roadmap, and establish strong operational readiness across teams.
  • Participate in Agile ceremonies, such as sprint planning, stand-up meetings, and retrospectives.
  • Collaborate with product managers, designers, and other engineers to ensure alignment and efficient project execution.
  • Share your expertise and mentor engineers, helping them grow and develop their skills. Foster a culture of continuous learning and improvement within the team.
  • Stay current with the latest technologies, tools, and cloud computing developments. Proactively learn and adapt to new technologies to drive innovation.
  • Collaborate with customers to understand their needs, gather feedback, and provide technical support and guidance as needed.
  • Triage incoming Web Support escalation requests, routing them to the applicable internal teams.
  • Contribute to incident root cause analysis and service restoration, and serve as an incident commander during outage events.
  • Strong background as an SRE supporting a 24x7 highly available production environment for a SaaS or cloud service provider.
  • Solid experience with monitoring/APM/observability tools (Datadog, Application Insights, Prometheus, Grafana, etc.).
  • Strong background with Azure resources such as Key Vault, Data Factory, Azure Databricks, and Storage Accounts.
  • Experience implementing observability plans around logs, metrics, and traces.
  • Experience in an agile development team developing software.
  • Implement and exercise best practices for CI/CD.
  • Experience with cloud infrastructure environments, preferably Azure, and Infrastructure as code (Terraform, Bicep, ARM).
  • Design, develop, and maintain infrastructure using popular IaC tools and technologies such as Terraform and Helm.
  • Strong experience with containerization technology and/or Kubernetes.
  • Experience with release automation, system administration, and configuration management.
  • Experience with programming languages (Python, Go, etc.).
  • Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts.
  • Strong interpersonal and teaming skills - ability to set and enforce process and influence engineers who are not direct reports.
  • Strong analytical and programming skills (Python, Go, etc.).
  • Experience with MLflow and other MLOps pipeline technology.

Benefits

  • Bonus Plan
  • Medical, Dental and Vision Coverage
  • Life Insurance and Disability Programs
  • Retirement Savings with Company Match
  • Paid Time Off
  • Flexible Work Arrangements including Remote Work

About the job

Job type: Full Time

Experience level: Senior, Manager

Hiring timezones: United States +/- 0 hours

About Hitachi Solutions

At Hitachi Solutions, we are at the forefront of digital transformation, pioneering groundbreaking solutions that redefine industries and empower societies. As a core IT company of the Hitachi Group, our mission is to contribute to a sustainable future through the development of superior, original technology and products. We are a global cloud-services and systems integrator with an unwavering focus on the Microsoft platform. This exclusive dedication allows us to harness the full power of Microsoft's ecosystem, delivering innovative and seamless solutions that drive business modernization for our clients worldwide. Our team of over 5,000 world-class professionals collaborates across the globe, leveraging cutting-edge digital technologies to co-create with our customers and solve the complex challenges facing business and society.

Our innovation is not just about technology; it's about creating tangible value and a lasting impact. We are accelerating Sustainability Transformation (SX) by integrating advanced solutions like Microsoft Dynamics 365, Power Platform, and Azure cloud services. These technologies form the bedrock of our custom-built solutions, designed to enhance operational excellence, foster agility, and unlock new opportunities for growth. From developing sophisticated data analytics platforms to implementing intelligent business applications, our work is geared towards making businesses more efficient, resilient, and forward-thinking. We take immense pride in our ability to understand our customers' unique needs and to architect tailored solutions that not only meet but exceed their expectations, laying the groundwork for an agile and prosperous future where no one is left behind.

Employee benefits

Dental Insurance

Dental insurance coverage.

Maternity leave

Paid leave for new mothers.

Paternity leave

Paid leave for new fathers.

Flexible working hours

Offers flex-time for employees.
