Who are we?
Join our team building production ML infrastructure for enterprise-scale machine learning pipelines. You'll work on a platform that orchestrates end-to-end ML workflows, from data ingestion through model training, evaluation, and deployment.
How will you contribute?
- Build and maintain Apache Airflow DAGs for ML pipeline orchestration
- Develop SageMaker training jobs for NLP models (NeMo, PyTorch)
- Implement MLflow tracking and model registry integrations
- Write infrastructure-as-code using Terraform (AWS S3, IAM, VPC)
- Create comprehensive tests for ML pipeline components
- Follow spec-driven development practices with Claude Code
- Contribute to ML observability and evaluation frameworks
What will you bring?
- Experience with PyTorch, transformers, or other ML libraries
- Familiarity with ML model evaluation and experimentation
- Interest in ML/AI infrastructure and operations
- Strong problem-solving and debugging skills
- Comfort with Linux/command-line environments
- Knowledge of AWS services (S3, SageMaker, IAM)
- Exposure to Apache Airflow or workflow orchestration
- Understanding of CI/CD, testing, or infrastructure-as-code
