Responsibilities:
- Leadership in System Maintenance and Reliability: Lead the design and enhancement of operational monitoring frameworks, ensuring high availability and stability for all critical systems.
- Technologies: Operating Systems (Linux, Windows Server), Monitoring Tools (e.g., Prometheus, Grafana, OpenTelemetry, Nagios, ELK Stack), Virtualization (VMware, Hyper-V), Cost Tracking (OpenCost), Access Management (Keycloak, GitOps), Enterprise Logging (Loki).
- Advanced Infrastructure Management: Architect and manage complex on-premise and/or cloud environments, focusing on resilience, security, and compliance.
- Platforms: Cloud platforms such as AWS, Azure, or GCP.
- Tools: Configuration and patch management tools such as WSUS, SCCM, or Ansible.
- Mentorship and Collaboration: Serve as a mentor to SysOps Engineers, fostering a culture of continuous improvement and best practices.
- Strategic Problem-Solving and Incident Response: Tackle complex system performance issues and lead incident management, employing advanced analytical skills to restore and maintain service reliability and efficiency.
Minimum Qualifications:
- Bachelor’s degree in Computer Science, Engineering, IT, or related field; or equivalent practical experience.
- 5+ years of experience in System Operations or Systems Administration, including proficiency in scripting with Bash, PowerShell, or Python.
- Deep expertise in monitoring, incident response, and troubleshooting systems on both cloud platforms (AWS, Azure, Google Cloud) and on-premise infrastructure.
- Advanced knowledge of networking, security protocols, backup/recovery strategies, and database management.
- Leadership skills with the ability to mentor others, influence cross-functional teams, and lead by example in a collaborative environment.
- Strong collaborative skills, with the ability to work effectively across cross-functional teams.
- Exceptional English communication skills, with the ability to exchange information clearly and effectively with team members, stakeholders, and customers.
- Strong analytical and problem-solving skills, with a detail-oriented approach to identifying and resolving system issues efficiently.
- Self-motivated, with the ability to work independently and under pressure while managing multiple priorities and deadlines effectively.