About MediaRadar
MediaRadar, now including the data and capabilities of Vivvix, powers the mission-critical marketing and sales decisions that drive competitive advantage. Our competitive advertising intelligence platform enables clients to achieve peak performance with always-on data and insights spanning the media, creative, and business strategies of five million brands across 30+ media channels. By bringing the advertising past, present, and future into focus, the platform lets our clients act rapidly on the competitive moves and emerging advertising trends impacting their business.
Job Summary:
We’re continuing to build a best-in-class AI and Machine Learning team focused on delivering advanced capabilities that empower both our data organization and customers.
This team is responsible for developing scalable, intelligent systems that automate complex data workflows, improve data quality, and enable smarter insights through cutting-edge AI, LLM, and retrieval technologies.
As a Machine Learning Engineer, you’ll be a key contributor in designing, implementing, and optimizing machine learning solutions that power our data products and enhance our customers’ experience. This is a hands-on role for someone who enjoys solving technically challenging problems at the intersection of data, engineering, and AI.
Stack highlights: PostgreSQL + pgvector, LangChain, Azure OpenAI, SQLAlchemy/Alembic, Pydantic, pytest, async I/O.
Responsibilities:
Retrieval & Relevance:
- Improve retrieval quality through scoring optimization, fusion methods (reciprocal rank fusion vs. weighted score blending; see the sketch after this group), and query normalization.
- Implement heuristics and relevance-tuning logic to enhance matching precision and recall.
- Design and evaluate hybrid retrieval workflows combining semantic (vector) and lexical (keyword) search.
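To give a flavor of the fusion work above, here is a minimal reciprocal rank fusion (RRF) sketch; the document IDs and the k=60 constant are illustrative, not taken from our codebase.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked result lists (e.g., vector and keyword search)
    into one ranking using reciprocal rank fusion (RRF).

    result_lists: iterable of lists of document IDs, best match first.
    k: damping constant; 60 is the value commonly used for RRF.
    """
    scores = defaultdict(float)
    for ranking in result_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a semantic (pgvector) ranking with a lexical (pg_trgm) ranking.
semantic = ["doc_3", "doc_1", "doc_7"]
lexical = ["doc_1", "doc_9", "doc_3"]
print(reciprocal_rank_fusion([semantic, lexical]))  # doc_1 and doc_3 rise to the top
```

A weighted alternative would instead blend normalized semantic and lexical scores with tunable weights, which is the trade-off referenced in the bullet above.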
Model Development & Evaluation:
- Build, fine-tune, and evaluate LLM-based agents for classification, deduplication, and decision-making tasks.
- Develop pipelines to measure accuracy, precision, recall, and model reliability (a minimal harness is sketched after this group).
- Implement guardrails, confidence thresholds, and fallback logic to ensure consistent, explainable results, with observability via Langfuse.
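As a minimal example of the kind of evaluation harness referenced above, assuming binary decisions (e.g., duplicate / not duplicate); the function and test names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    accuracy: float
    precision: float
    recall: float

def evaluate(predictions: list[bool], labels: list[bool]) -> EvalResult:
    """Compute accuracy, precision, and recall for binary decisions
    (e.g., "is this pair a duplicate?") made by an LLM agent."""
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum((not p) and y for p, y in zip(predictions, labels))
    correct = sum(p == y for p, y in zip(predictions, labels))
    return EvalResult(
        accuracy=correct / len(labels),
        precision=tp / (tp + fp) if tp + fp else 0.0,
        recall=tp / (tp + fn) if tp + fn else 0.0,
    )

def test_evaluate():
    # pytest-style check on a tiny labeled set.
    result = evaluate([True, True, False, False], [True, False, False, True])
    assert result.accuracy == 0.5
    assert result.precision == 0.5
    assert result.recall == 0.5
```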
Data Engineering & Infrastructure:
- Optimize data vectorization and ingestion jobs (batching, concurrency, retry logic, and backfills).
- Maintain ORM models and database migrations using SQLAlchemy, pgvector, and Alembic (see the model sketch after this group).
- Ensure data schema consistency and efficient vector indexing with pgvector.
- Develop clean, scalable ETL/ELT workflows to support data enrichment and ML readiness.
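For illustration, a sketch of an ORM model with a pgvector embedding column and an HNSW index, as it might look in this stack; the table name, column names, and 1536-dimension assumption are hypothetical.

```python
from pgvector.sqlalchemy import Vector
from sqlalchemy import Index, String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class BrandDocument(Base):
    """Hypothetical table holding text chunks and their embeddings."""
    __tablename__ = "brand_documents"

    id: Mapped[int] = mapped_column(primary_key=True)
    brand_name: Mapped[str] = mapped_column(String(255), index=True)
    content: Mapped[str] = mapped_column(Text)
    # 1536 dims assumed to match a typical Azure OpenAI embedding model.
    embedding: Mapped[list[float]] = mapped_column(Vector(1536))

# HNSW index for approximate nearest-neighbour search with cosine distance.
Index(
    "ix_brand_documents_embedding_hnsw",
    BrandDocument.embedding,
    postgresql_using="hnsw",
    postgresql_with={"m": 16, "ef_construction": 64},
    postgresql_ops={"embedding": "vector_cosine_ops"},
)
```

In this stack the table and index changes would typically be applied through an Alembic migration rather than created inline.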
Operational Excellence:
- Create observability tools, logging, and metrics dashboards to support production ML systems.
- Produce reviewer-friendly exports, lightweight CLIs, and analytical reports for QA and ops teams (a CLI sketch follows this group).
- Contribute to documentation, design standards, and operational best practices for ML pipelines.
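As one example of a reviewer-friendly export, a lightweight CLI sketch that pulls low-confidence decisions out of a JSONL run log for manual QA; the file format and field names are assumptions for illustration.

```python
import argparse
import csv
import json
import sys

def main() -> None:
    """Export low-confidence model decisions from a JSONL run log to CSV."""
    parser = argparse.ArgumentParser(description="Export records for manual QA review.")
    parser.add_argument("run_log", help="Path to a JSONL file of model decisions.")
    parser.add_argument("--min-confidence", type=float, default=0.8,
                        help="Rows below this confidence are exported for review.")
    args = parser.parse_args()

    writer = csv.writer(sys.stdout)
    writer.writerow(["record_id", "prediction", "confidence"])
    with open(args.run_log, encoding="utf-8") as fh:
        for line in fh:
            row = json.loads(line)
            if row["confidence"] < args.min_confidence:
                writer.writerow([row["id"], row["prediction"], row["confidence"]])

if __name__ == "__main__":
    main()
```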
Success Measures:
- Retrieval Performance: Demonstrable improvements in model recall, precision, and fusion quality.
- System Reliability: Scalable, high-throughput ingestion and vectorization with minimal downtime.
- Model Impact: Proven improvement in automation, deduplication, or classification accuracy.
- Code Quality: Robust, well-tested, and maintainable codebase with strong documentation.
- Operational Efficiency: Faster iteration cycles, reproducibility, and measurable performance gains.
Requirements
Key Qualifications and Role Requirements:
- Expert Python engineering skills — strong understanding of typing, packaging, async I/O, and performance optimization.
- Deep PostgreSQL expertise — SQL, indexing (pg_trgm, ivfflat/hnsw), and query plan optimization.
- Proficiency in machine learning system design with emphasis on retrieval, RAG, or LLM-based architectures.
- Experience with LangChain, OpenAI/Azure OpenAI, or equivalent LLM frameworks.
- Strong testing and evaluation mindset (pytest, metrics, eval harnesses).
- Hands-on experience with LLM agents and Retrieval-Augmented Generation (RAG) pipelines.
- Familiarity with asyncio or ThreadPoolExecutor for concurrent I/O-bound workloads (a bounded-concurrency sketch follows this list).
- Experience with Docker, devcontainers, or Kubernetes for scalable deployments.
- Background in observability, metrics logging, or offline evaluation frameworks (e.g., Langfuse).
- Exposure to both relational and NoSQL databases (PostgreSQL, MongoDB).
- Experience integrating ML components into production-grade APIs or services.
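To illustrate the concurrency expectations above, a sketch of batched embedding with bounded concurrency and retries; `embed_batch` is a stand-in for a real embedding call (e.g., Azure OpenAI), not an actual client API.

```python
import asyncio
import random

async def embed_batch(texts: list[str]) -> list[list[float]]:
    """Placeholder for a real embedding call; returns one vector per text."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [[random.random()] * 8 for _ in texts]

async def embed_with_retry(texts, attempts=3, base_delay=1.0):
    """Retry a batch with exponential backoff on transient failures."""
    for attempt in range(attempts):
        try:
            return await embed_batch(texts)
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)

async def embed_all(texts, batch_size=64, max_concurrency=4):
    """Split texts into batches and embed them with bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]

    async def run(batch):
        async with semaphore:
            return await embed_with_retry(batch)

    results = await asyncio.gather(*(run(b) for b in batches))
    return [vec for batch in results for vec in batch]

if __name__ == "__main__":
    vectors = asyncio.run(embed_all([f"doc {i}" for i in range(200)]))
    print(len(vectors))  # 200
```

The semaphore caps in-flight requests so backfills stay within provider rate limits while keeping throughput high.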
