Join Truelogic, a leading provider of nearshore staff augmentation services, as a Site Reliability Engineer (AWS). You will play a key role in platform enablement, building and maintaining the core infrastructure tooling that lets teams deploy and operate services reliably on AWS and Kubernetes.
Responsibilities
- Designs, implements, and evolves shared AWS CDK and CDK8s constructs used across multiple services and teams.
- Maintains core infrastructure components including VPC, EKS clusters and node groups, RDS, OpenSearch, and MSK.
- Operates and extends Kubernetes cluster addons such as ingress controllers, cert-manager, autoscalers, and monitoring/logging stacks.
- Ensures high reliability through structured alerting systems (Prometheus, CloudWatch), autoscaling strategies, and recovery mechanisms.
- Manages and publishes baseline templates, configuration schemas, and comprehensive documentation for infrastructure usage.
- Owns the CI/CD pipelines for Infrastructure as Code (IaC) codebases and platform component releases.
- Collaborates with engineering teams to troubleshoot infrastructure-related issues and deliver scalable, reliable solutions.
- Applies Site Reliability Engineering (SRE) principles, including SLIs, SLOs, observability, and fault tolerance, to all shared platform services.
- Supports best practices for IAM roles, secrets management, and tenant isolation.
Benefits
- 100% Remote Work
- Highly Competitive USD Pay
- Paid Time Off
- Work with Autonomy
- Work with Top American Companies
