Join Udemy, a leading AI-powered reskilling platform, and help shape the future of learning. As a Principal Database Reliability Engineer, you will oversee the day-to-day activities and engineering strategies of the Datastore Infrastructure (DSI) team, ensuring uptime, security, and compliance for millions of learners worldwide.
Requirements
- 8-10 years of professional experience on a Cloud Engineering, SRE, or DBRE team, with infrastructure responsibility for managing large production workloads.
- Proficiency with managing MySQL at scale.
- Strong understanding of running production workloads in Kubernetes.
- Proficiency with tools such as Terraform, Ansible, and Git, and experience working with Infrastructure as Code and automated provisioning.
- Strong experience in Kafka cluster management, topic configuration, performance tuning, and ensuring high availability and fault tolerance.
- Experience with message queues (MQ/SQS) and caching (Redis, Memcached), or similar products.
- Experience with Python.
- Knowledge of configuration management tools, monitoring systems (Datadog or similar) for database infrastructure, and scaling strategies for handling increased data volumes.
- Strong troubleshooting skills to diagnose complex database issues.
- Hands-on experience with AWS cloud infrastructure and a grasp of security best practices.
Benefits
- Flexible work arrangements
- Health insurance
- Retirement savings plan
- Stock options
- Free access to Udemy courses
- Monthly UDay for self-improvement
- Budget for professional development