Open to opportunities

Viktor Domovnikov

@viktordomovnikov

Senior DevOps/SRE leading infrastructure reliability and observability, cutting MTTD 40–60% with automated monitoring and incident response.

Armenia

What I'm looking for

I’m looking for a remote-first team where I can own observability and infrastructure reliability end-to-end: building monitoring, alerting, and incident response tied to SLAs, reducing MTTD, and advancing SRE day-2 operations.

I’m a Senior DevOps / SRE Engineer with 9+ years of experience owning infrastructure reliability and observability end-to-end. I focus on measurable outcomes: I’ve reduced MTTD by 40–60% through robust monitoring platforms, automated alerting, and structured incident response.

I’ve built and operated observability platforms from scratch on Prometheus, Zabbix, Grafana, and Alertmanager, and sped up onboarding with 40+ reusable monitoring templates. I design high-performance metrics pipelines (Python + Redis + Docker) that maintain tight polling SLAs while keeping load on target systems under control.

My approach treats observability as a core SRE discipline: metrics, logs, and alerting that map directly to SLAs and operational runbooks. I’ve also led Kubernetes monitoring (sidecar patterns, kube-state-metrics) and implemented centralized log aggregation with ELK, while onboarding and mentoring engineers on architecture and day-2 operations.

Experience

Work history, roles, and key accomplishments

DevOps / Monitoring Engineer

Coms

Jan 2025 - Mar 2026 (1 year 2 months)

Built and maintained an observability platform from scratch (Prometheus, Zabbix, Grafana, Alertmanager), reducing MTTD by 60% across engineering teams. Developed 40+ monitoring templates, improved onboarding by 95%, and maintained a ≤10 sec metrics polling SLA using Python, Redis, and Docker.

Prometheus, Zabbix, Grafana, Alertmanager, Python, Redis, Docker, Kubernetes, JMX Exporter

Systems / DevOps Engineer

Dobrotsen

Mar 2024 - Dec 2024 (9 months)

Deployed an end-to-end observability stack (Prometheus, Grafana, Alertmanager) with Telegram alerts, reducing incident detection time by 40%. Centralized logs via ELK across 30+ servers, migrated an unstable NAS to Nextcloud for 300+ employees, and improved infrastructure inventory with NetBox.

Prometheus, Grafana, Alertmanager, ELK Stack, Telegram, Nextcloud, NetBox, Active Directory

Systems Engineer (24/7 Ops)

Galamart

Jul 2021 - Dec 2023 (2 years 5 months)

Administered and extended Zabbix monitoring at scale through template development, trigger tuning, and host onboarding. Managed terminal server farms and key services (RDS, IIS, DFS, SharePoint) and automated recurring operations with PowerShell.

Zabbix, PowerShell, Windows Server, RDS, IIS, SharePoint

Systems Administrator

Alteit

Mar 2017 - Dec 2021 (4 years 9 months)

Managed MSP infrastructure for multiple clients, supporting Linux and Windows Server environments and deploying Zabbix + Grafana monitoring. Configured Veeam Backup & Replication, administered PostgreSQL and MS SQL databases, and operated virtualized platforms using ESXi, Proxmox, Hyper-V, and KVM.