Site Reliability Engineer

Maneva is a leader in AI-driven solutions for the manufacturing industry, enhancing productivity and efficiency with cutting-edge technologies.

Maneva

Employee count: 11-50

Canada only

About Maneva

Maneva builds and deploys edge AI solutions powering real-time intelligence for industrial environments. Our systems run on distributed edge compute devices (NVIDIA Jetson platforms), integrate with local network cameras, PLCs, sensors, and other on-premise equipment, and securely communicate with cloud services via client- or site-based VPNs. Our customers rely on our systems around the clock, and we take reliability seriously.

We're seeking a Site Reliability Engineer (SRE) who enjoys solving complex operational challenges, improving observability and automation, and supporting mission-critical workloads in production.

About the Role

As a Site Reliability Engineer at Maneva, you will ensure the reliability, availability, and performance of our edge AI deployments at customer sites. This includes gaining deep familiarity with Maneva's hardware platform, networking configurations, and application stack so that you can rapidly diagnose and resolve issues as they arise.

The role includes participating in an on-call rotation for 24/7 incident response, including off-hour coverage as part of a structured global support model. When not responding to incidents, you will contribute to long-term engineering initiatives around monitoring, automation, reliability, and documentation.

Responsibilities

Operational Support & Incident Response

  • Serve as a first responder for production issues, alarms, and system outages (24/7 rotation required).
  • Troubleshoot Linux system issues, hardware problems, networking connectivity, and edge-device performance.
  • Perform root-cause analysis (RCA) and implement corrective and preventive solutions.
  • Document incidents, contributing to a culture of transparency and process improvement.

Proactive Monitoring & Observability

  • Build and maintain robust monitoring dashboards and alerts using Prometheus, Grafana, and similar tools.
  • Continuously improve observability, including metrics, logs, traces, and health checks.
  • Analyze trends to proactively identify reliability risks before incidents occur.
  • Develop automation to reduce alert noise and make alerts more actionable.
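
For illustration only, one common way to reduce alert noise is to fire an alert only after a check has failed several times in a row, similar in spirit to the `for` clause in Prometheus alerting rules. The sketch below is a minimal, hypothetical example; the class name `DebouncedAlert` and the threshold of three samples are assumptions, not part of Maneva's tooling.

```python
from collections import deque


class DebouncedAlert:
    """Hypothetical helper: report an alert only after `required`
    consecutive unhealthy samples, so brief flaps stay silent."""

    def __init__(self, required: int = 3) -> None:
        if required < 1:
            raise ValueError("required must be at least 1")
        self.required = required
        # Keep only the most recent `required` samples.
        self._recent = deque(maxlen=required)

    def observe(self, healthy: bool) -> bool:
        """Record one health sample; return True if the alert should fire."""
        self._recent.append(healthy)
        window_full = len(self._recent) == self.required
        # Fire only when the window is full and every sample in it failed.
        return window_full and not any(self._recent)


if __name__ == "__main__":
    alert = DebouncedAlert(required=3)
    for sample in [False, True, False, False, False, True]:
        print(sample, "->", alert.observe(sample))
```

A single failed sample, or a failure followed by a recovery, does not fire; only an unbroken run of failures does.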

Systems Reliability & DevOps Engineering

  • Improve deployment workflows, CI/CD pipelines, configuration management, and automated provisioning.
  • Create tools and scripts in Python/Bash to streamline operational processes.
  • Contribute to load testing, system validation, and network health verification for edge deployments.
  • Implement best practices for secure, scalable, and maintainable infrastructure.
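
As a rough illustration of the kind of Python tooling described above, network health verification can start with a simple TCP reachability check against each device and port. This is a minimal sketch using only the standard library; the function names `tcp_reachable` and `check_targets`, and the example address and ports in the usage block, are hypothetical and not taken from Maneva's systems.

```python
import socket
from typing import Dict, Iterable, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_targets(
    targets: Iterable[Tuple[str, int]], timeout: float = 2.0
) -> Dict[str, bool]:
    """Check each (host, port) pair and map 'host:port' to reachability."""
    return {
        f"{host}:{port}": tcp_reachable(host, port, timeout)
        for host, port in targets
    }


if __name__ == "__main__":
    # Hypothetical targets: SSH on a device and RTSP on a camera.
    for name, ok in check_targets([("192.0.2.10", 22), ("192.0.2.10", 554)]).items():
        print(f"{name}: {'reachable' if ok else 'UNREACHABLE'}")
```

A check like this only confirms that a port accepts connections; it says nothing about whether the service behind it is healthy, so it would typically be one input among several.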

Infrastructure & Application Ownership

  • Understand and operate Maneva's end-to-end edge AI stack:
    • Jetson/embedded Linux systems
    • GPU-accelerated workloads for computer vision
    • Video pipelines (RTSP, camera interfaces, data ingestion)
    • Local integrations (PLCs, industrial hardware, APIs, network resources)
    • VPN-based connectivity (client-based or site-to-site)
  • Maintain visibility into device health and fleet-wide system performance.

Documentation & Process Development

  • Create and maintain SOPs for on-site customer teams and internal engineering workflows.
  • Produce detailed incident reports and reliability documentation.
  • Maintain internal knowledge bases, troubleshooting guides, and playbooks.

Requirements

Technical Skills

  • Strong Linux systems administration experience (Ubuntu, embedded Linux, ARM systems).
  • Proficiency in Python and/or Bash for scripting and operations automation.
  • Solid networking fundamentals: TCP/IP, routing, DNS, DHCP, VPNs, VLANs, firewall rules.
  • Familiarity with troubleshooting tools: tcpdump, nmap, iftop, netstat, etc.
  • Hands-on experience with Prometheus, Grafana, or similar monitoring/alerting platforms.
  • Experience with logging/observability stacks (ELK/EFK, Loki, Fluentd, etc.) is a plus.
  • Experience with Docker or containerized applications is desirable.
  • Comfort supporting distributed or remote device fleets.

Soft Skills

  • Excellent diagnostic and analytical abilities under pressure.
  • Strong communication skills with both technical and non-technical stakeholders.
  • High ownership mentality and ability to follow issues through to resolution.
  • Comfortable working independently in a fully remote environment.
  • Willingness to participate in on-call rotation, including off-hours and weekends.

Preferred Qualifications

  • Experience supporting machine learning, computer vision, or GPU-accelerated systems.
  • Familiarity with NVIDIA Jetson or other embedded AI hardware.
  • Prior SRE/DevOps/Systems Engineer experience in a 24/7 operational environment.
  • Exposure to industrial IoT, manufacturing systems, or operational technology (OT).
  • Experience writing customer-facing operational documentation or SOPs.

Benefits

What We Offer

  • Fully remote work environment with flexibility (within on-call requirements).
  • Opportunities to work with cutting-edge edge compute and AI deployments.
  • A high-impact role shaping reliability practices from early stages.
  • Contract or full-time options, with competitive compensation.
  • A collaborative team committed to transparency, improvement, and excellence.

About the job

Job type: Full Time

Experience level: Mid-level, Senior

Hiring timezones: Canada +/- 0 hours

About Maneva

Maneva is an innovative firm that specializes in applying artificial intelligence to revolutionize manufacturing processes. The company provides an autonomous AI-powered digital workforce designed to enhance the efficiency, quality, and productivity of factories. Established in 2022 and based in North York, Ontario, Maneva's mission is to support manufacturers by automating workflows and digitizing key tasks, ultimately transforming how production facilities operate.

With its pioneering platform, Maneva empowers manufacturers to implement AI solutions that minimize downtime, ensure superior product quality, and cut operational costs. The technology integrates seamlessly into existing systems, enabling rapid implementation without the need for custom hardware. Maneva's advanced solutions cater specifically to the unique demands of the manufacturing sector, addressing challenges that were once thought to be unmanageable.
