Design, build, and optimize ETL pipelines using AWS Glue 3.0+ and PySpark. Implement scalable and secure data lakes on Amazon S3, following bronze/silver/gold (medallion) zoning.
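For context on the day-to-day work, a minimal bronze-to-silver Glue 3.0 (PySpark) job might look like the sketch below. The bucket paths, column names (event_id, event_ts), and cleansing rules are hypothetical placeholders for illustration, not details of this role.

```python
# Minimal bronze -> silver Glue job sketch. Paths and columns are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)
spark = glue_context.spark_session

# Read raw JSON events landed in the bronze zone.
bronze = spark.read.json("s3://example-data-lake/bronze/events/")

# Cleanse for the silver zone: dedupe, enforce types, drop rows missing a key.
silver = (
    bronze.dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .withColumn("event_date", F.to_date("event_ts"))
    .filter(F.col("event_id").isNotNull())
)

# Write partitioned Parquet to the silver zone for Athena to query.
silver.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-data-lake/silver/events/"
)

job.commit()
```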
Requirements
- Strong hands-on experience with AWS: Glue, S3, Athena, Step Functions, EventBridge, CloudWatch, Glue Data Catalog.
- Programming skills in Python 3.x, PySpark, and SQL (Athena/Presto).
- Proficient with pandas and NumPy for data wrangling, feature extraction, and time-series slicing.
- Strong command of data quality and governance tooling such as Great Expectations and OpenMetadata or Amundsen.
- Familiarity with tagging sensitive and business-critical metadata (PII, KPIs, model inputs).
- Capable of creating audit logs for QA checks and rejected records (an audit-trail sketch follows this list).
- Experience in feature engineering: rolling averages, deltas, and time-window tagging (a feature-engineering sketch follows this list).
- Experience preparing BI-ready datasets for Sigma; exposure to Power BI or Tableau is a nice-to-have.
- Excellent communication and collaboration skills, working closely with data scientists, QA, and business users.
- Self-starter with strong problem-solving and critical thinking abilities.
- Ability to translate business KPIs and domain requirements into technical implementations.
- Detail-oriented with a high standard of data quality and compliance.
- Demonstrated accountability, confidentiality, and ethical standards in handling data.
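To illustrate the audit-log expectation above, here is a minimal pure-pandas sketch that routes rejected records to a quarantine file and appends a summary line to an audit log. The rule set, column names (order_id, amount), and file paths are assumptions for illustration; in practice the same pattern could be driven by a tool such as Great Expectations.

```python
# Minimal QA audit-trail sketch. Columns, rules, and paths are hypothetical.
import json
from datetime import datetime, timezone

import pandas as pd


def validate_and_audit(df: pd.DataFrame, run_id: str) -> pd.DataFrame:
    # Example rules (assumed): non-null key and non-negative amount.
    bad_key = df["order_id"].isna()
    bad_amount = df["amount"] < 0

    # Tag rejected rows with a reason so QA can trace each rejection.
    reasons = pd.Series("", index=df.index)
    reasons[bad_amount] = "negative_amount"
    reasons[bad_key] = "null_order_id"  # key rule takes precedence

    rejected = df[bad_key | bad_amount].copy()
    rejected["reject_reason"] = reasons[rejected.index]
    accepted = df[~(bad_key | bad_amount)]

    # Quarantine the rejects and append a summary record to the audit log.
    rejected.to_csv(f"rejected_{run_id}.csv", index=False)
    audit_record = {
        "run_id": run_id,
        "run_ts": datetime.now(timezone.utc).isoformat(),
        "rows_in": int(len(df)),
        "rows_accepted": int(len(accepted)),
        "rows_rejected": int(len(rejected)),
    }
    with open("audit_log.jsonl", "a") as f:
        f.write(json.dumps(audit_record) + "\n")
    return accepted
```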
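And a short sketch of the feature-engineering techniques named above (rolling averages, deltas, time-window tagging), using pandas and NumPy; the device_id/ts/reading schema is a hypothetical example.

```python
# Rolling averages, deltas, and time-window tags on a toy time series.
# The schema (device_id, ts, reading) is a hypothetical example.
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "device_id": ["a"] * 6,
        "ts": pd.date_range("2024-01-01", periods=6, freq="h"),
        "reading": [10.0, 12.0, 11.0, 15.0, 14.0, 18.0],
    }
).sort_values(["device_id", "ts"])

grouped = df.groupby("device_id")["reading"]

# Rolling average over the last 3 observations, computed per device.
df["reading_roll3"] = grouped.transform(lambda s: s.rolling(3, min_periods=1).mean())

# Delta: change from the previous observation, per device.
df["reading_delta"] = grouped.diff()

# Time-window tagging: bucket each row into a coarse window for aggregation.
df["window_6h"] = df["ts"].dt.floor("6h")
df["day_part"] = np.where(df["ts"].dt.hour < 12, "AM", "PM")
```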
Benefits
- Flexible work arrangement in India