- Serve as the senior technical leader for the L1 and L2 Cloud Operations Engineers, providing day-to-day guidance, coaching, and knowledge transfer.
- Lead by example on complex incidents, walking junior engineers through advanced troubleshooting methodologies and resolution strategies.
- Define and maintain standard operating procedures, runbooks, and escalation workflows to ensure consistent and high-quality service delivery.
- Act as the technical point of contact during operational hours, overseeing ticket queues, prioritization, and SLA adherence across the team.
- Conduct internal training and knowledge-sharing sessions on new technologies, processes, and client-specific environments.
- Identify skill gaps within the team and recommend training paths and certification goals to the Head of Managed Services.
- Serve as the highest-level technical escalation point within the service desk, resolving the most complex incidents spanning cloud, networking, security, and back-end infrastructure.
- Own and lead major incident processes end-to-end, including triage, communication, escalation to third parties, root cause analysis, and post-incident reviews.
- Perform advanced troubleshooting across multi-tenant Azure environments, hybrid infrastructure, and complex networking topologies.
- Coordinate with third-party vendors, ISPs, and technology partners during critical incidents and change management activities.
- Drive problem management initiatives to identify trends, reduce repeat incidents, and improve overall service reliability.
- Document and track all escalations, major incidents, and problem records in Jira Service Management with thorough root cause analysis.
- Architect and manage complex Azure environments, including Azure AD, Conditional Access, Azure Virtual Desktop (AVD), Azure Networking (VNets, NSGs, ExpressRoute), and Azure Automation.
- Administer and optimize Office 365 tenants at an advanced level, including Exchange Online mail flow, hybrid configurations, security and compliance policies, and tenant-to-tenant migrations.
- Manage and troubleshoot Windows Server infrastructure (2016, 2019, 2022) at an advanced level, including Active Directory design, Group Policy architecture, DNS and DHCP, DFS, and Certificate Services.
- Oversee VMware ESXi and virtualization environments, including capacity planning, performance optimization, host management, and migration strategies.
- Lead VDI environment management, including Azure Virtual Desktop, Citrix, and thin client deployments at scale.
- Perform intermediate to advanced network troubleshooting and configuration across TCP/IP, DNS, DHCP, VLANs, routing protocols, and WAN connectivity.
- Configure and manage Fortinet FortiGate firewalls, including advanced policy management, SD-WAN, IPS or IDS, web filtering, and high-availability configurations.
- Manage Cisco Meraki environments at scale, including complex wireless deployments, SD-WAN, switch stacking, and security appliance policies.
- Design, configure, and troubleshoot SSL VPN and IPsec VPN solutions across multiple client environments.
- Perform advanced Cisco networking tasks, including routing configuration (OSPF, BGP basics), ACLs, inter-VLAN routing, and spanning tree optimization.
- Lead wireless network design and troubleshooting, including site surveys, heat mapping, and both controller-based and cloud-managed deployments.
- Serve as the subject matter expert for advanced desktop and endpoint issues that cannot be resolved at L1 or L2, including complex OS corruption, driver conflicts, and application compatibility.
- Design and optimize Intune or Endpoint Manager deployment strategies, including Autopilot, compliance policies, and application packaging.
- Architect Group Policy structures and manage complex AD environments across multiple client tenants.
- Liaise directly with client stakeholders on escalated issues, service reviews, and change management activities.
- Support client onboarding and handover processes, ensuring smooth transitions and comprehensive documentation.
- Collaborate with the account management teams on client escalations, service improvement plans, and quarterly business reviews.
- Contribute to pre-sales technical assessments and scoping for new client engagements.
- Design and implement automation solutions using PowerShell, Azure Automation, and other scripting tools to eliminate manual overhead and improve operational efficiency.
- Own and optimize Jira Service Management workflows, SLA configurations, dashboards, and reporting for the Cloud Operations team.
- Manage Jira project boards for change management, infrastructure projects, and operational improvement initiatives.
- Lead process improvement initiatives across the service desk, including SLA optimization, ticket workflow refinement, and monitoring enhancements.
- Evaluate and recommend new tools, technologies, and processes to enhance the team’s capabilities and service delivery.
- A minimum of five years of experience in a third-line (L3) or senior helpdesk, service desk, or cloud operations role within an MSP or MSSP environment.
- At least two years of experience in a technical leadership, senior escalation, or mentorship capacity.
- Extensive experience supporting enterprise Windows environments, Microsoft Azure, Office 365, and hybrid infrastructure.
- A proven track record of leading major incident resolution and driving root cause analysis in client-facing environments with stringent SLAs.
- Hands-on experience with Jira Service Management and Jira for project and change management.
- Subject matter expertise in Microsoft Azure (Azure AD, AVD, Networking, Automation, Conditional Access, ExpressRoute) and Office 365 administration.
- Advanced Windows Server administration (2016, 2019, 2022): AD architecture, GPO design, DNS and DHCP, DFS, Certificate Services, and PowerShell automation.
- Expert-level Windows desktop troubleshooting and endpoint management (Intune, Autopilot, SCCM or MECM).
- Advanced networking: TCP/IP, VLANs, routing (OSPF, BGP basics), subnetting, enterprise wireless design, and WAN optimization.
- Strong Fortinet FortiGate experience, including advanced firewall policies, SD-WAN, VPN (SSL and IPsec), IPS or IDS, and high-availability configurations.
- Advanced Cisco Meraki management and intermediate Cisco IOS networking (switching, routing, ACLs).
- VMware ESXi and virtualization: capacity planning, performance tuning, migration, and host management.
- Advanced PowerShell scripting and automation experience with proven results in reducing manual operational overhead.
- Experience with monitoring and alerting platforms (e.g., PRTG, Datadog, Azure Monitor, LogicMonitor).
- Demonstrated technical leadership ability, with experience guiding, coaching, and developing junior and mid-level engineers.
- Exceptional client communication skills, with the ability to manage expectations and deliver difficult messages professionally.
- Strong organizational skills, with the ability to manage competing priorities across multiple clients and team responsibilities.
- A strategic thinker with a proactive approach to identifying and resolving operational challenges before they impact service delivery.
- ITIL Foundation certification or demonstrated working knowledge of ITIL service management frameworks.
- Microsoft certifications: AZ-104, AZ-305, AZ-500, MS-102, or similar.
- Fortinet NSE 4+ certifications.
- Cisco CCNP, CCNA, or Meraki certifications.
- ITIL Intermediate or higher certifications.
- Experience with SIEM platforms, EDR solutions, and security incident response.
- DevOps exposure: CI/CD pipelines, Infrastructure as Code (Terraform, ARM or Bicep templates).
- SQL proficiency in operational reporting and data analysis.
- Prior experience in financial services, hedge funds, or trading technology environments.
- A degree in Computer Science, Information Technology, or a related field.
- Should be willing to accept a long-term work-from-home arrangement.
- Should be amenable to a permanent night shift schedule.
