Our client's Cloud Operations team is a group of talented engineers passionate about building highly reliable, scalable, and secure solutions in public and private cloud environments. We are looking to hire a highly motivated Cloud Operations engineer with strong working experience in production operations, as well as in cloud infrastructure design and implementation. Together, we will design, develop, and implement the best public, private, and local cloud solutions for our customers. You will also be expected to participate in continuous cloud service operations and to troubleshoot and resolve complex issues in production.
Responsibilities:
1. Manage and maintain the client's cloud infrastructure in AWS, GCP, and Azure.
2. Provide technical leadership in cloud infrastructure design and implementation.
3. Ensure secure and reliable communication across different regions and cloud service providers.
4. Deploy and configure middleware services, such as SQL and NoSQL databases and message queue systems.
5. Evaluate, recommend, and implement CloudOps / DevOps technology and solutions.
6. Participate in continuous cloud service operations with the US and remote teams.
7. Troubleshoot and follow up on production infrastructure and application issues.
8. Drive root cause analysis and resolution.
9. Communicate with Dev/QA as well as external carriers to resolve and prevent issues.
10. Design and implement a deployment automation platform for Kubernetes-based microservices.
11. Improve service availability and scalability through tuning, automation, tools, and process.
12. Analyze service performance, identify bottlenecks, and provide actionable improvement plans.
Requirements:
1. BS level technical degree required; Computer Science or Engineering background preferred.
2. 8+ years of experience in a CloudOps / DevOps role.
3. Hands-on experience with AWS or any other public cloud (Azure, GCP, etc.).
4. Knowledge of Linux, security and networking fundamentals.
5. Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
6. Working knowledge of deployment automation development (Terraform, Helm, ArgoCD).
7. Experience in diagnosing and resolving complex application problems.
8. Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Flink, Kafka, and RabbitMQ.
9. Experience with monitoring tools (Nagios, Grafana, Prometheus).
10. Experience with cloud security and compliance implementation is a plus.
11. Strong follow-through and initiative to stay with issues until they are resolved.
12. Comfortable working within a distributed team located in multiple time zones.