This is a Virtualization Server Hosting Engineering position in Enterprise Technology.
Virtualization Hosting service enablement
Capacity Management
- Conduct capacity planning and forecasting for the platforms, including Compute/Virtual Machine (VM), memory, storage, and network resources, to ensure scalability and prevent resource exhaustion
- Analyze resource utilization trends and make recommendations for infrastructure scaling, consolidation, or optimization
- Collaborate with application teams and stakeholders to understand future demand and project capacity needs
- Develop and maintain capacity models and reports to support strategic planning
Automation & Efficiency
- Develop automation solutions (scripts, playbooks) for repetitive VMware/OSV tasks, including configuration changes, VM management (such as snapshot removal), auditing, remediation, and integration with ticketing systems
- Use automation to deliver operator updates and changes efficiently at scale
- Implement Site Reliability Engineering (SRE) principles and practices to improve overall platform stability, performance, and operational efficiency
- Deploy and audit Role-Based Access Control (RBAC)
- Manage namespaces and resource quotas (CPU, disk, and storage)
Observability, Monitoring, Logging & Troubleshooting
- Implement and maintain comprehensive end-to-end observability solutions (monitoring, logging, tracing) for the VMware/OSV environment, including integration with tools like Dynatrace and Prometheus/Grafana
- Explore and implement Event-Driven Architecture (EDA) for enhanced real-time monitoring and response
- Develop capabilities to flag and report abnormalities and identify "blind spots" in observability
- Perform deep-dive Root Cause Analysis (RCA), using available tooling where appropriate, to quickly identify and resolve issues across the global compute environment
- Pinpoint the "needle in a haystack" (the single unhealthy component) anywhere in the global compute environment to shorten time to resolution
- Proactively monitor VM health, resource usage, and performance metrics
- Monitor for unusual activity that might indicate a compromise or misconfiguration
Solution Design & Consulting
- Provide technical consulting and expertise to application teams requiring VMware/OSV solutions
- Design, implement, and validate custom or dedicated OSV clusters and VM solutions for critical applications with unique or complex requirements (e.g., specialized appliances)
Knowledge Management
- Create, maintain, and update comprehensive internal documentation and customer-facing content to facilitate self-service and clearly articulate platform capabilities
Support
- Provide L1–L3 support to Operations teams for environment-related issues. Monthly after-hours and weekend work is required
