ISO New England is the independent system operator responsible for ensuring the safe and reliable flow of electricity in our region and planning for the future of the electric grid. We are at the forefront of New England’s ongoing transition to clean energy.
ISO New England is seeking a Senior Site Reliability Engineer to strengthen the reliability, automation, and observability of our hybrid infrastructure spanning on-premises and public cloud environments. This hands-on technical role focuses on the administration, optimization, and automation of Splunk and related observability tooling, with opportunities to grow into broader platform leadership.
This role will also help shape ISO’s overall observability landscape, integrating tools and data sources such as Splunk, Dynatrace, ExtraHop, PRTG, and others to provide a complete picture of system health and business reliability.
You will collaborate with other SREs and with the infrastructure, cybersecurity, platform engineering, NOC, and application teams to improve monitoring visibility, establish observability best practices, and support proactive detection and faster incident response.
What we offer you:
- Distance-based relocation assistance available
- 100% remote with travel to the office when required
- Competitive compensation with a base salary + performance bonus
- Robust benefits package, including:
  - Enhanced 401(k) and financial planning support
  - Tuition reimbursement and professional development
  - Wellness programs, including an onsite gym
  - Free coffee at our onsite café
  - Flexible work hours
  - Employee Business Networks
- A stable, mission-driven workplace where your impact truly matters
How you will make an impact:
- Administer and optimize Splunk Enterprise and related observability platforms for performance, scale, and data integrity.
- Manage data ingestion pipelines, index management, dashboards, and alerting.
- Develop and maintain automation for Splunk deployment, configuration, and integrations using Ansible and Terraform.
- Partner with SRE and application teams to improve metrics collection, log aggregation, and tracing standards.
- Help drive incident response, root cause analysis, and reliability improvement efforts.
- Take stock of the broader observability ecosystem — including Dynatrace, ExtraHop, and other telemetry solutions — to assess how they fit together, identify overlap, and evolve toward a unified visibility strategy that improves reliability and enables better business insights.
- Contribute to the design and implementation of SLOs/SLIs and other reliability metrics that guide engineering priorities.
- Stay current with observability and automation trends, recommending enhancements to existing platforms.
What you need to be successful in this role:
- 4+ years of experience in observability, monitoring, and site reliability engineering roles.
- Hands-on expertise with Splunk Enterprise (admin, search, data onboarding, and dashboarding).
- Familiarity with hybrid infrastructure monitoring across Linux (RHEL), Windows Server, networking, storage, and other core infrastructure and application environments.
- Experience automating configuration and deployments with Ansible and Terraform.
- Proficiency in Python is preferred, but the key requirement is a solid understanding of scripting and coding fundamentals, including the ability to read, adapt, and extend automation scripts or integrations as needed.
- Strong knowledge of Linux (RHEL) and Windows Server environments.
- Understanding of incident management and reliability best practices.
- Excellent collaboration and communication skills with a growth mindset toward leadership.
Desired, but not required:
- Experience with Splunk, Dynatrace, CloudWatch, and Azure Monitor.
- Experience mentoring peers and influencing cross-team observability practices.
- Interest in learning new scripting languages and automation frameworks as the environment evolves.
- Desire to evolve toward a technical lead and platform owner role within SRE/Observability.
This employer will not sponsor applicants for work visas for this position (e.g., H-1B, F-1/CPT/OPT, O-1, E-3, TN, J).
The expected salary range for this position is $141,000 - $167,000 per year. This role is also eligible for an annual performance bonus, comprehensive health insurance (medical, dental and vision), flexible spending and health savings accounts, a 401(k) plan with generous employer contributions and a student debt benefit, life and AD&D insurance, disability insurance, critical illness and hospital indemnity benefits, paid time off, paid leave, a wellness program, an employee assistance program and other great company perks.
This is a U.S.-based role. If the successful candidate resides outside of the U.S., relocation will be required.
Equal Opportunity: We are proud to be an EEO employer. Applicants for employment are considered without regard to race, color, religion, creed, sex (including pregnancy, childbirth, and related medical conditions), gender identity or expression, sexual orientation, citizenship, national origin, age, ancestry, marital status, disability (including learning, mental, intellectual, and physical), service in the uniformed services, genetic information, or any other status protected by applicable law.
Drug-Free Environment: We maintain a drug-free workplace and perform pre-employment substance abuse testing.
