Responsibilities
- Build and manage reliable, production-quality batch and streaming data pipelines in Python on AWS (S3, Athena, Glue, Lambda, Kinesis/Kafka)
- Model, catalog, and optimize data in a data lake/lakehouse environment (Parquet, partitioning, schema evolution, Glue Data Catalog, Lake Formation)
- Create and maintain Airflow DAGs with robust retry, dependency, SLA, and alerting behavior
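To give candidates a flavor of the lakehouse partitioning work described above, here is a minimal sketch of Hive-style partition paths, the layout that lets Athena and Glue prune partitions and scan only the data a query touches (the bucket, prefix, and file names are purely illustrative assumptions, not part of this posting):

```python
from datetime import date

def partition_key(base: str, dt: date, filename: str) -> str:
    """Build a Hive-style partition path (year=/month=/day=) under an S3 prefix.

    Athena and Glue recognize key=value path segments as partitions, so a
    query filtered on year/month/day reads only the matching objects,
    which reduces both scan time and per-query cost.
    """
    return f"{base}/year={dt.year:04d}/month={dt.month:02d}/day={dt.day:02d}/{filename}"

# Hypothetical bucket and prefix, for illustration only.
key = partition_key("s3://example-lake/events", date(2024, 5, 17), "part-0000.parquet")
print(key)  # s3://example-lake/events/year=2024/month=05/day=17/part-0000.parquet
```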
Requirements
- 4+ years of data engineering experience building Python data pipelines
- Deep experience with AWS analytics services and advanced SQL, including performance and cost optimization
- Proven track record with batch and streaming pipelines and Airflow orchestration in production
Benefits
- Paid time off
- Retirement savings (e.g., 401(k), pension schemes)
- Bonus/incentive eligibility
- Equity grants
- Participation in our employee stock purchase plan
- Competitive health benefits
- Parental leave
