We are looking for a Data Engineer to lead the development of scalable data pipelines within the Databricks ecosystem. You will be responsible for architecting robust ETL/ELT processes using a "configuration-as-code" approach, ensuring our data lakehouse is governed, performant, and production-ready.
Responsibilities
- Pipeline Architecture: Design and implement declarative data pipelines with Lakeflow, and package and deploy them through Databricks Asset Bundles (DABs) for reliable CI/CD.
- Data Ingestion: Build efficient, scalable ingestion patterns using Auto Loader and Change Data Capture (CDC) to handle high-volume data streams.
- Governance & Security: Manage metadata, lineage, and access control through Unity Catalog.
- Orchestration: Develop and maintain complex workflows using Databricks Jobs and related orchestration tooling.
- Infrastructure as Code: Use Terraform to manage AWS resources (S3, EC2) and Databricks workspaces.
Requirements
- Expertise: Expert-level command of PySpark and advanced SQL.
- Platform: Extensive experience in the Databricks environment (Workflows, Delta Lake).
- Cloud: Familiarity with AWS infrastructure and cloud-native data patterns.
Benefits
- Competitive compensation
- Opportunity to work with an industry-leading company
- The chance to lead the development of scalable data pipelines end to end
