Positron.ai

Sr Software Engineer

Positron AI is a hardware startup that develops custom, energy-efficient hardware to accelerate machine learning applications and AI inference, offering an alternative to traditional GPU-based systems.

Employee count: 11-50

United States only

About Us:

Positron.ai specializes in developing custom hardware systems to accelerate AI inference. These systems deliver significant gains over traditional GPU-based systems in both performance per dollar and performance per watt. Positron exists to create the world's best AI inference systems.

Senior Software Engineer – Machine Learning Systems & High-Performance LLM Inference

We are seeking a Senior Software Engineer to help develop the high-performance software that runs open-source large language models (LLMs) on our custom appliance. The appliance combines FPGAs and x86 CPUs to accelerate transformer-based models. The software stack is written primarily in modern C++ (C++17/20) and relies heavily on templates, SIMD optimizations, and efficient parallel computing techniques.

Key Areas of Focus & Responsibilities

  • Design and implement high-performance inference software for LLMs on custom hardware.
  • Develop and optimize C++ libraries that make efficient use of SIMD instructions, threading, and the memory hierarchy.
  • Work closely with FPGA and systems engineers to ensure efficient data movement and computational offloading between x86 CPUs and FPGAs.
  • Improve model execution through low-level optimizations, including vectorization, cache efficiency, and hardware-aware scheduling.
  • Contribute to performance profiling tools and methodologies to analyze execution bottlenecks at the instruction and data flow levels.
  • Apply NUMA-aware memory management techniques to optimize memory access patterns for large-scale inference workloads.
  • Implement ML system-level optimizations such as token streaming, KV cache optimizations, and efficient batching for transformer execution.
  • Collaborate with ML researchers and software engineers to integrate model quantization techniques, sparsity optimizations, and mixed-precision execution.
  • Ensure all code contributions include unit, performance, acceptance, and regression tests as part of a continuous integration-based development process.

Required Skills & Experience

  • 7+ years of professional experience in C++ software development, with a focus on performance-critical applications.
  • Strong understanding of C++ templates and modern memory management.
  • Hands-on experience with SIMD programming (AVX-512, SSE, or equivalent) and intrinsics-based vectorization.
  • Experience in high-performance computing (HPC), numerical computing, or ML inference optimization.
  • Experience with ML model execution optimizations, including efficient tensor computations and memory access patterns.
  • Knowledge of multi-threading, NUMA architectures, and low-level CPU optimization.
  • Proficiency with systems-level software development, profiling tools (Perfetto, VTune, Valgrind), and benchmarking.
  • Experience working with hardware accelerators (FPGAs, GPUs, or custom ASICs) and designing efficient software-hardware interfaces.

Preferred Skills (Nice to Have)

  • Familiarity with LLVM/Clang or GCC compiler optimizations.
  • Experience in LLM quantization, sparsity optimizations, and mixed-precision computation.
  • Knowledge of distributed inference techniques and networking optimizations.
  • Understanding of graph partitioning and execution scheduling for large-scale ML models.

Why Join Us?

  • Work on a cutting-edge ML inference platform that redefines performance and efficiency for LLMs.
  • Tackle challenging low-level performance engineering problems in AI and HPC.
  • Collaborate with a team of hardware, software, and ML experts building an industry-first product.
  • Opportunity to contribute to and shape the future of open-source AI inference software.

About the job

Job type: Full Time

Experience level: Senior

Hiring timezones: United States +/- 0 hours

About Positron.ai

Positron AI is a hardware company at the forefront of accelerating artificial intelligence, with a mission to make advanced machine learning more accessible and efficient. Founded in the spring of 2023, the company is dedicated to providing solutions that offer superior performance per dollar and enhanced energy efficiency. All of Positron's products are designed, fabricated, and assembled in the United States, emphasizing a commitment to domestic manufacturing and supply chain security. The team at Positron brings together over 400 years of combined experience in AI, systems, silicon, and cloud technologies. The company was established to address the growing problem of GPUs becoming a financial burden for organizations deploying AI. The core mission is to deliver inference that is highly efficient, affordable, and proudly American-made, with the ultimate goal of making GPUs an optional component in the AI technology stack.

The company's first-generation product, Atlas, is the world's first accelerator designed specifically for LLM inference. Developed and shipped within 18 months of the company's inception, Atlas was created to tackle the cost and energy constraints that are hindering growth in the AI sector. Positron is already developing its second-generation system, Titan, which aims to be even faster and more efficient. Drawing on the insights gained from Atlas, Titan is being designed to unlock virtually limitless context and to run numerous models and AI agents concurrently, supported by terabytes of memory per accelerator. Positron's hardware architecture is engineered to resolve the power, memory, and scalability challenges inherent in legacy infrastructures, thereby offering the lowest total cost of ownership for transformer models. The company's systems are compatible with Hugging Face transformer models and serve inference requests through an OpenAI API-compatible endpoint, ensuring seamless integration into existing AI workflows.
