We are developing a real-time voice AI platform that combines speech recognition, LLM-driven reasoning, and natural-sounding text-to-speech capabilities. This is production-grade software designed to run continuously, scale under heavy load, and deliver human-like conversations with minimal latency.
If you are a seasoned Python engineer who thrives on building resilient systems and solving tough concurrency challenges, this is the role for you.
What You’ll Do (Essential Job Functions)
- Architect and build the conversation orchestration service: ASR → LLM inference → TTS streaming in real time
- Write robust, asynchronous Python code designed to handle high concurrency without deadlocks, race conditions, or memory leaks
- Design and maintain clean, well-structured APIs for future scalability and ease of debugging
- Manage interaction data using SQLAlchemy (or equivalent) with efficient schema design and safe migrations
- Implement observability: structured logging, metrics, and tracing across the system so issues can be diagnosed quickly
- Partner with ML and Product teams to rapidly iterate on conversation flow and user experience
- Enforce a strong testing culture: automated unit tests, E2E flows, and load testing
- Build resilient systems capable of handling real-world edge cases like noisy audio, unreliable APIs, and flaky networks
- Continuously profile, optimize, and reduce end-to-end latency
Requirements: What We Expect You to Know
- Deep Python expertise: 5+ years of Python, including required experience with production systems, and a strong command of context managers, generators, event loops, the GIL, and asyncio
- Database fundamentals: data modeling, efficient queries, ORM best practices
- Networking & I/O: streaming, backpressure, and resilient design for unreliable networks
- Testing discipline: you deliver production-ready, well-tested code
- Observability mindset: metrics, logs, and traces are integral to your coding process
- Production readiness: You’ve built and supported systems running live at scale.
What You’re Like
- Curious: You don’t just fix bugs; you find their root causes
- Calm under pressure: You can diagnose incidents, resolve them quickly, and prevent recurrences
- Pragmatic: You solve problems without over-engineering
- Collaborative: You write code for your teammates and your future self
- Quality-driven: You refuse to compromise on correctness and reliability
- Data-informed: You make decisions based on real latency metrics, throughput, and error rates
What To Expect
This is not a feature-factory role. You will be responsible for building a real-time system that stays online, hits latency targets, and performs reliably under pressure.
Team Culture
- Direct, collaborative, and low on politics
- High ownership: see your work running in production, serving real users
- Obsessed with clarity, correctness, and reliability
- Fast-moving: minimal ceremony, maximum impact
- Pragmatic planning: no endless planning-poker sessions; we scope, assign, design, and deploy
Your Daily Work
- Ensure performance, reliability, and observability in everything you build
- Collaborate closely with ML and Product teams on speech recognition, TTS voices, and LLM behavior
- Monitor, debug, and improve the system as it runs in production
Working Terms: This role requires flexibility and working US hours until at least 6 PM ET. Candidates must also have their own computer and work setup for remote work.