About Us:
At Apolo, we're committed to simplifying AI/ML operations for organizations. By addressing the deployment challenges of AI/ML in varied environments, we provide cost-effective and hassle-free solutions. Our managed services and comprehensive tools allow businesses to focus on their core objectives, ensuring seamless AI integration and innovation without the operational complexity.
The Role:
We are looking for an Infrastructure Engineer who will play a crucial role in managing our product infrastructure. This role requires technical expertise, leadership qualities, and a proactive mindset to ensure our systems are secure, efficient, and in line with our product goals. Ideal candidates are resourceful, excel at problem-solving, and can work autonomously with minimal supervision.
Requirements:
- Extensive knowledge and hands-on experience with Kubernetes, including overall cluster administration.
- Proficiency with cloud service providers (AWS, GCP, Azure).
- Experience in managing bare metal infrastructure.
- Proficiency in Terraform for infrastructure automation.
- Expertise in Helm for package management.
- Strong foundation in Linux system administration, with skills in performance tuning, troubleshooting, and understanding operating system internals.
- Solid networking knowledge, including TCP/IP, DNS, load balancing, and firewall configurations, to ensure secure and efficient network operations.
- Expertise in container engines such as containerd and Docker, with practical experience in configuring, managing, and optimizing containerized environments.
- Proficiency in CI/CD practices, particularly with GitHub Actions.
Responsibilities:
- Oversee infrastructure across cloud, on-premises, and bare metal environments.
- Manage resources in multiple cloud service providers (AWS, GCP, Azure).
- Enhance observability across all environments.
- Implement and integrate solutions that align with our product goals.
- Streamline provisioning pipelines, focusing on the automation of manual processes.
- Apply Infrastructure as Code (IaC) principles using tools like Terraform and Helm.
- Facilitate certification processes and maintain compliance with industry standards.
- Implement robust security hardening practices.
Desirable Skills:
- Experience with CNI, Ingress Controllers, Service Meshes, Gateways.
- Experience with CSI, NAS, NFS and other related storage technologies.
- Experience with Prometheus/Thanos, Grafana, and related observability tools.
- Proficiency in Python for scripting and automation.
What We Offer:
- Work remotely, in a time zone that allows effective collaboration with the team.
- Shape the product's direction and success by taking ownership of essential components.
- Solve complex and innovative challenges.
- Join a supportive and dynamic team environment.
- Receive a competitive salary and benefits package.