Founded in 2010 in The Netherlands, AnywhereNow is a global leader in Enterprise Dialogue Management, with a vision to ensure every employee and customer feels heard, understood, and valued. With around 240 employees in working from 22 different countries, we partner with over 2,000 leading enterprises, including Mazda, the UN International Organization for Migration, Adecco Group, and the University of Cape Town, to deliver exceptional customer experiences through the power of Microsoft Teams and AI-driven insights. Our commitment to innovation, customer focus, and accountability drives our success.
We are seeking a highly experienced Cloud Operations Engineer to strengthen our South Africa-based team supporting US (Eastern Time) operations.
This role is focused on operating and improving our Azure-based production environments, including infrastructure managed via Terraform and workloads running on Azure Kubernetes Service (AKS).
You will act as a senior technical escalation point (L4), drive automation initiatives, and contribute to reliability-focused improvements across the platform. This role requires strong hands-on Azure expertise, deep troubleshooting capability, and a practical understanding of infrastructure-as-code and Kubernetes operations.
This role is based in South Africa (remote) and requires working hours aligned to US Eastern Time (ET).
Key Responsibilities
Operate and maintain Azure production environments, ensuring high availability, performance, and stability.
Operational Stability & SRE: Collaborate with the wider CloudOps team to improve platform stability. This includes applying practical SRE principles like reducing manual "toil" through better documentation and helping define proactive alerting thresholds for voice health.
Act as an L4 escalation point for complex cloud incidents across compute, networking, storage, identity, and AKS layers.
Contribute to and maintain Terraform-managed infrastructure, including reviewing, modifying, and troubleshooting infrastructure-as-code.
Operate and troubleshoot AKS clusters and workloads, including cluster health, scaling, networking, ingress, and performance issues.
On-Call Rotation: Participate in the Cloud Operations on-call rotation, providing expert-level coverage for critical production incidents.
Documentation: Maintain clear, technical operational procedures and "runbooks" for common voice-related troubleshooting scenarios.
Requirements
Why we would like to have a dialogue with you
Ownership & accountability: You don’t just solve problems; you take responsibility for preventing them from recurring.
Communication: The ability to communicate clearly and effectively with individuals across the organization, and to be responsive to their needs and concerns.
Action-oriented: The ability to act quickly and decisively, even in the face of uncertainty, to move projects forward and achieve business goals.
Commitment: The ability to consistently meet or exceed the quality standards expected by customers.
Collaboration: The ability to work effectively with others and to build strong relationships based on trust and mutual respect, recognizing that everyone has something to contribute.
Systems thinking: You understand how small changes impact the broader system, and you consider scalability and long-term impact.
Customer centricity: the ability to provide high-quality service (to customers), including understanding their needs, solving problems, and responding to feedback.
Adaptability: The ability to adjust to changing situations and work effectively in environments that may be uncertain or unpredictable.
Competencies are key, but to be successful in this role you need to bring a few essentials to kickstart the conversation
5+ years of experience in Cloud Operations, Platform Engineering, or SRE-type roles.
Strong hands-on experience with Microsoft Azure, including VMs, networking, identity, storage, and monitoring.
Proven experience working with Terraform in production environments.
General Systems Troubleshooting: A methodical approach to isolation—distinguishing between application, network, and provider-level issues in a complex, distributed environment.
Strong operational experience with Azure Kubernetes Service (AKS).
Experience building automation using scripting languages such as PowerShell, Python, or Bash.
Experience optimizing Azure environments for reliability and cost efficiency.
Monitoring Tools: Experience using Azure Monitor to track performance trends.
Some last notes
AnywhereNow is committed to creating a diverse environment and is proud to be an equal-opportunity employer. We accept difference and we thrive on it for the benefit of our employees, our products, and our community.
Please note that we have a background check policy. The background check differs per country and position. If you would like to know more, the recruiters are happy to answer any questions!
Highlights
Own Azure reliability at scale. Lead L4 incident response, automate with Terraform, operate AKS, and turn SRE principles into resilient, customer‑critical platforms.
