We are looking for a Senior Cloud Engineer to architect and manage the AWS ecosystem that will power our Data and AI platform. Your primary mission is to create a "Developer Self-Service" environment, ensuring that our full-stack and AI engineers have the infrastructure, CI/CD pipelines, and observability tools they need to move fast without compromising security.
You will be the architect of our cloud-native strategy, moving beyond manual configuration to Infrastructure as Code (IaC). You aren't just "managing servers"; you are building a scalable, automated platform that treats infrastructure as a product.
Requirements
7+ years of experience in Infrastructure/Systems Engineering, with a heavy focus on AWS.
Infrastructure as Code (IaC) Expert: Mastery of Terraform or Pulumi. We expect "ClickOps" to be non-existent; everything must be version-controlled.
Container Orchestration: Deep experience with Amazon EKS (Kubernetes) or ECS, including Fargate, service mesh, and scaling policies.
CI/CD Architect: Ability to design and maintain complex pipelines (GitHub Actions, GitLab CI, or Jenkins) that include automated testing, security scanning, and blue/green deployments.
Cloud-Native Networking: Expertise in VPC design, VPC peering, Transit Gateways, and load balancing (ALB/NLB), especially for real-time data streams.
Security & Compliance: Implementation of IAM least-privilege policies, secrets management (AWS Secrets Manager/HashiCorp Vault), and encryption at rest and in transit.
Monitoring & Observability: Experience setting up the four "Golden Signals" of monitoring (latency, traffic, errors, and saturation) using tools like Prometheus/Grafana, Datadog, or New Relic.
Database Ops: Experience configuring and tuning Amazon RDS (PostgreSQL) and vector databases for high availability and performance.
Nice to have:
Data/AI Infrastructure: Experience with Amazon SageMaker, MLOps pipelines (Kubeflow), or managing GPU-backed instances for AI inference.
Serverless Mastery: High proficiency with AWS Lambda and Amazon EventBridge for the project's real-time event stream requirements.
Cost Optimization: Proven track record of using AWS Cost Explorer and Reserved Instances/Savings Plans to keep cloud spend under control.
SRE Mindset: Familiarity with defining Service Level Objectives (SLOs) and conducting Blameless Post-Mortems.
