This is a remote position.
Our client, a leader in media intelligence and AI-driven content creation, is looking for an innovative and driven AI Engineer to join their team. They have recently expanded their work in AI voice and image technologies and are developing the next generation of cutting-edge products. This role focuses on creating, classifying, and organizing large volumes of AI-generated media, and on leading R&D into AI voice and audio generation and advanced image intelligence capabilities.
Job Description
Responsibilities:
- Design, train, and deploy classification models for the content pipeline, including style detection, quality scoring, content moderation, filtering, and semantic categorization of generated media.
- Develop and maintain automated tagging and organization systems for the media library: extracting attributes, detecting visual features, clustering similar content, and enabling intelligent search.
- Build and optimize training data pipelines: create annotation tooling, curate datasets, establish active learning loops, and ensure high-quality labeled data.
- Lead R&D into AI voice and audio generation, including voice cloning, text-to-speech, and audio synthesis; prototype integrations and establish a clear path from research prototypes to production-ready features.
- Research and prototype image intelligence technologies such as face/body analysis, pose estimation, style transfer, and image-to-image consistency.
- Develop evaluation frameworks to measure the accuracy of classifiers, the quality of generation models, and model drift over time.
- Optimize inference pipelines for performance, cost, and latency using batching, quantization, caching, and model-serving strategies.
- Integrate with GPU compute infrastructure and deliver models via production APIs.
Requirements
- 3+ years of experience building and deploying machine learning models in production, particularly in classification, tagging, or content understanding.
- Hands-on experience with model training, including dataset curation, architecture experimentation, hyperparameter tuning, and debugging.
- Strong background in image classification and computer vision techniques (e.g., CNNs, vision transformers, CLIP).
- Experience or demonstrated interest in voice/audio AI (e.g., text-to-speech, voice cloning, audio classification).
- Proficiency in Python, with experience in PyTorch or TensorFlow.
- Experience building data labeling pipelines, annotation workflows, or active learning systems.
- Understanding of model serving in production environments, including REST APIs and latency optimization.
Qualifications:
- Bachelor’s degree or higher in Computer Science, Engineering, or related field.
- Experience in AI/ML, particularly in content classification, tagging, and media organization systems.
- Proven experience with Python and ML frameworks like PyTorch or TensorFlow.
- Strong communication skills for collaborating with R&D teams and bringing new technologies into production.
