Role: Site Reliability Engineer (Ex-Fidelity Experience)
Location: Remote
Position Type: Contract
Key Responsibilities
- Design, implement, and manage Kubernetes environments, including deployment, configuration, monitoring, and troubleshooting
- Build and maintain scalable, reliable infrastructure using infrastructure-as-code principles
- Develop comprehensive monitoring solutions and implement alerting strategies
- Analyze system performance bottlenecks and implement improvements
- Implement and maintain CI/CD pipelines for seamless deployments
- Conduct incident response and root cause analysis, and implement preventative measures
- Create and enhance automation tools, leveraging AI/ML where applicable
- Collaborate with development teams to improve application reliability and performance
Required Qualifications
- 5-7 years of experience in SRE or DevOps roles
- Strong expertise with Kubernetes ecosystem and container orchestration
- Deep understanding of Linux/Unix operating systems and performance analysis tools (e.g., nmon)
- Experience with log analysis, monitoring systems, and observability tools
- Proficiency in database administration and performance tuning (Oracle, SQL Server)
- Strong programming skills in at least one of: Python, Go, Java, or Node.js
- Experience developing automation tools and frameworks
- Proven track record of proactive problem identification and resolution
Preferred Qualifications
- Experience with AI/ML integration into operational workflows
- Cloud platform experience (AWS, GCP, Azure)
- Knowledge of service mesh technologies
- Experience with distributed systems architecture
- Familiarity with security best practices and compliance requirements
Personal Qualities
- Proactive mindset with strong analytical and problem-solving abilities
- Collaborative approach to working across development and operations teams
- Excellent communication skills, including the ability to explain complex technical concepts
- Self-motivated, with the ability to work both independently and as part of a team
- Passion for continuous improvement and learning