We are seeking a highly skilled Databricks Engineer to join our data engineering team. This role is pivotal in driving the migration of existing Hadoop workloads to the modern Databricks Lakehouse platform, enabling our organization to deliver scalable, efficient data solutions. The ideal candidate will design, develop, optimize, and monitor the data pipelines and architectures that support our business intelligence and analytics initiatives. This position offers the opportunity to work with modern cloud technologies, collaborate across multiple teams, and contribute to a phased, strategic migration roadmap that will transform our data infrastructure.
Responsibilities
- Identify and classify current Hadoop jobs and data sources to understand migration scope and complexity.
- Collaborate with stakeholders to prioritize and select use cases for Minimum Viable Product (MVP) migration initiatives.
- Set up and manage Databricks environments on Azure, ensuring optimal configuration for performance and security.
- Lead pilot projects to migrate Hadoop workloads to Databricks, validating data integrity and system performance post-migration.
- Design scalable, repeatable ETL processes using Apache Spark (Scala or PySpark) to transform and load data efficiently.
- Implement monitoring solutions to track pipeline health, query performance, and cost management.
- Utilize Databricks capabilities such as Delta Lake, Lakehouse Federation, Liquid Clustering, and Unity Catalog to enhance data governance and performance.
- Work closely with data scientists, analysts, DevOps, and business teams to ensure alignment and successful delivery of data solutions.
- Define and monitor KPIs related to migration success, data quality, system performance, and cost efficiency.
- Assist in developing a phased, strategic roadmap for the full migration from Hadoop to the Databricks Lakehouse platform.
- Implement and maintain data governance policies to ensure data security, privacy, and regulatory compliance throughout the migration process.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent professional experience.
- 4-6 years of experience in data engineering, with hands-on experience in Databricks and Hadoop ecosystems.
- Proficiency in Spark (Scala or PySpark), ETL pipeline development, and data migration practices.
- Experience with Lakehouse architecture, Delta Lake, and advanced Databricks features (e.g., Lakehouse Federation, Liquid Clustering, Unity Catalog) is a strong plus.
- Solid understanding of cloud platforms (Azure) and data governance concepts.
- Strong analytical and problem-solving skills with attention to detail.
- Ability to work collaboratively in cross-functional teams and communicate effectively with both technical and non-technical stakeholders.
Nice-to-Have Skills
- Experience with Databricks Migration Accelerator or similar migration tools.
- Familiarity with synthetic data generation and pipeline health monitoring.
- Understanding of performance tuning, query optimization, and cost management on cloud data platforms.
- Ability to document migration patterns and contribute to best practices for broader adoption.