Pixalate is seeking a PhD-level Big Data Engineer to develop intelligent, self-healing data systems that process petabyte-scale data. The role involves research in distributed ML systems and AI-enhanced data optimization, offering the opportunity to shape business operations by applying cutting-edge AI research to fundamental data challenges.
Requirements
- PhD in Computer Science, Data Science, or Distributed Systems
- Expert-level SQL
- Proficiency in Python
- Proficiency in Scala or Java
- Big Data Stack: Spark 3.5+, Flink, Kafka, Ray, Dask
- Storage & Orchestration: Delta Lake, Iceberg, Airflow, Dagster, Temporal
- Cloud Platforms: GCP (BigQuery, Dataflow, Vertex AI), AWS (EMR, SageMaker), Azure (Databricks)
- ML Systems: MLflow, Kubeflow, feature stores, vector databases, scikit-learn with hyperparameter search (e.g., GridSearchCV), H2O AutoML, auto-sklearn, GCP Vertex AI AutoML Tables
- Neural Architecture Search: KerasTuner, AutoKeras, Ray Tune, Optuna, PyTorch Lightning + Hydra
- Demonstrated research skills
- Proven track record working with 100TB+ datasets
- Experience with lakehouse architectures, streaming ML, and graph processing at scale
- Understanding of distributed systems theory and ML algorithm implementation
- Experience applying LLMs to data engineering challenges
- Experience implementing feature engineering automation or NAS experiments