Lumen is seeking a Senior Site Reliability Engineer (SRE) to design, implement, and manage highly available, scalable systems on AWS EKS, leveraging tools like Terraform, ArgoCD, and GitHub Actions. The role requires strong troubleshooting skills, robust monitoring practices, and the ability to optimize systems for performance, reliability, and cost-efficiency. This is a fully remote position within the United States.
Requirements
- 10+ years of related experience in software development, systems engineering, and/or networking
- Kubernetes Expertise: Deep hands-on experience managing Kubernetes clusters (AWS EKS or similar)
- Infrastructure as Code & Automation: Expertise in Terraform for provisioning and automating infrastructure
- System Guardrails & Application Monitoring: Proficiency in Prometheus, Grafana, and incident management workflows
- Cloud Expertise: Advanced knowledge of AWS services, including EKS, EC2, CloudWatch, Route53, Aurora, and S3
- Familiarity with auto-scaling, load balancing, and cloud cost optimization
- Programming & Scripting Skills: Strong proficiency in Python, Go, or Bash for scripting and automation tasks
- Systems Troubleshooting: Proven ability to troubleshoot complex, distributed systems
- Strong listening and communication skills
Benefits
- Health benefits
- Life benefits
- Voluntary Lifestyle benefits