As a Senior Engineer (L3) specializing in Defect Management & DevOps, you will play a critical role in driving operational excellence, ensuring defect-free delivery pipelines, and strengthening reliability across cloud-native platforms. You will collaborate closely with engineering, QA, SRE, and product teams to manage end-to-end defect processes, streamline automation, and enhance service observability. The role demands deep analytical capability, strong DevOps experience, and the ability to influence cross-functional improvements through data-driven insights and advanced troubleshooting.
You will act as a subject matter expert (SME) in DevOps and GCP/AWS, overseeing end-to-end release processes, governance, and delivery pipelines. This role requires leadership, deep technical knowledge, and excellent communication skills.
Core Responsibilities
- Serve as the Subject Matter Expert (SME) for cloud platforms, primarily AWS (GCP exposure is a plus), providing guidance on cloud best practices, architectural decisions, and solution design.
- Support customers with core Managed Services technologies, including Cloud, Automation, Terraform, CI/CD, and containerization.
- Design, implement, and optimize cloud-native and DevOps solutions aligned with customer and organizational objectives.
- Lead technical discussions, demos, and customer engagements while effectively communicating complex technical concepts to both technical and non-technical stakeholders.
- Assist with team-building activities such as interviewing, onboarding, and aligning technical resources.
- Provide technical leadership, coaching, and mentorship to junior team members.
- Maintain strong project and situational awareness to ensure deliverables meet timelines and organizational expectations.
- Develop high-quality documentation including architectures, workflows, runbooks, and other written deliverables.
- Act as a technical expert in internal knowledge-sharing initiatives and external client interactions.
- Influence cloud governance, operational policies, best practices, and process improvements across teams and customer environments.
- Ensure precision, accuracy, and strong attention to detail across all tasks and deliverables.
Defect Management Responsibilities
- Act as the SME for Defect Management processes, governance, tooling, and reporting.
- Own and manage the full defect lifecycle, including logging, triage, prioritization, RCA, corrective actions, and closure.
- Partner with Development, QA, SRE, and Product teams to ensure timely resolution of high-impact issues.
- Establish and maintain defect dashboards, KPIs, and trend analytics to drive quality and process improvements.
- Develop standardized runbooks, escalation workflows, and operational procedures for defect handling.
- Lead cross-team Root Cause Analysis (RCA) investigations and drive Corrective and Preventive Actions (CAPA) implementations.
- Improve operational readiness through enhanced monitoring, alerting, and structured incident-to-defect workflows.
- Provide guidance on CI/CD optimization, automation strategies, infrastructure stability, and reliability engineering.
- Mentor junior engineers in DevOps principles, tooling, defect analysis techniques, and troubleshooting best practices.
Requirements
- Defect Management Expertise
  - Full ownership of the defect lifecycle, ensuring SLA adherence.
  - Deep understanding of SDLC, change management, and ITIL best practices.
  - Ability to analyze defect patterns, severity trends, root causes, and long-term systemic issues.
  - Conduct structured RCA using 5 Whys, Fishbone diagrams, and Fault Tree Analysis.
  - Define and enforce severity, categorization, and prioritization standards.
  - Create dashboards and quality metrics to drive continuous improvement.
- Tools & Skills
  - Strong JIRA workflow, automation rule, dashboard, and reporting expertise.
  - Ability to visualize defect trends and quality metrics effectively.
- Observability, Monitoring & SIEM Tools
  - Hands-on experience with Dynatrace, Datadog, Prometheus, Grafana, CloudWatch, or similar tooling.
  - Skilled in APM analysis, log correlation, anomaly detection, service mapping, and performance troubleshooting.
  - Build and maintain dashboards and alert frameworks.
  - Integrate monitoring insights with DevOps and operational workflows.
  - Exposure to SIEM event analysis for operational and security correlation.
Core DevOps Responsibilities
- Build, enhance, and support CI/CD pipelines across multiple environments using AWS CodePipeline, CodeBuild, CodeDeploy, and Git-based workflows.
- Collaborate on automation initiatives using Terraform, CloudFormation, AWS CDK, or equivalent IaC tools to standardize and streamline deployments.
- Deploy and manage AWS cloud-native services including EKS, ECS, Lambda, API Gateway, S3, IAM, and supporting architectures.
- Work with containers and orchestration platforms such as Kubernetes, EKS, ECS, and AKS (where required).
- Implement deployment best practices such as blue/green, rolling updates, and automated rollback strategies to ensure safe, repeatable releases.
- Troubleshoot complex deployment issues, environment drift, infrastructure failures, performance bottlenecks, and service-level degradations.
- Implement and maintain observability using CloudWatch, Prometheus, Grafana, Datadog, Dynatrace, or equivalent monitoring stacks.
- Ensure AWS workloads adhere to resiliency, compliance, security, and operational excellence guidelines.
- Strong hands-on, production-grade DevOps experience in AWS (primary cloud).
- Deep expertise in Kubernetes, containerized workloads, microservices, autoscaling, and cloud networking.
- Advanced troubleshooting across AWS services, distributed systems, CI/CD pipelines, and API-driven workflows.
- Knowledge of AWS cost optimization, tagging, FinOps alignment, and resource lifecycle governance.
- Exposure to building or maintaining CI/CD pipelines within GCP ecosystems (Cloud Build, GKE, Artifact Registry, etc.).
- Ability to work with GCP cloud-native services where required, ensuring consistency across hybrid/multi-cloud deployments.
- Familiarity with GCP IAM, VPC architecture, and core compute/storage/networking components is a plus.
General Qualifications
- Strong communication, leadership, and mentoring capabilities.
- 6–10+ years of experience in DevOps, SRE, QA Engineering, or Cloud Operations.
- Expert-level AWS knowledge (GCP exposure would be a plus).
- Strong command of IaC tools such as Terraform, CloudFormation, and AWS CDK.
- Experience with CI/CD systems: Jenkins, GitLab CI, AWS CodePipeline.
- Proficiency with Docker, Kubernetes, and container orchestration.
- Experience with monitoring technologies: Datadog, Grafana, Prometheus.
- Experience with JIRA workflows and project tracking.
- Ability to excel in dynamic, fast-paced environments.
Expectations
- Demonstrate deep expertise across DevOps, cloud platforms, automation, and engineering practices.
- Balance hands-on delivery with leadership responsibilities and strategic initiatives.
- Continuously assess, refine, and enhance processes, documentation, and operational workflows.
- Adapt effectively to evolving customer requirements, project priorities, and technology landscapes.
- Engage confidently with senior stakeholders, providing clear communication and technical guidance.
- Lead scoping, planning, and methodology definition for major technical initiatives and transformations.
- Contribute to the development of new engineering standards, frameworks, and best practices across teams.
- Take senior-level ownership of critical defects, escalations, and operational issues, driving them to resolution.
- Influence and drive cross-team improvements in tooling, quality, automation, and operational efficiency.
- Ensure prevention mechanisms, automation guardrails, and reliability practices are embedded early in delivery cycles.
- Lead initiatives focused on defect prevention, observability enhancements, and overall DevOps maturity uplift.
- Participate in on-call rotations and provide Tier-3 technical expertise for complex issues.
- Continuously propose, design, and implement enhancements across tooling, automation, and operational frameworks.