Key Responsibilities
- Design and implement modern data architectures using the Databricks Lakehouse platform (a minimal sketch follows this list)
- Lead the technical design of data warehouse and data lake migration initiatives from legacy systems
- Develop data engineering frameworks and reusable components to accelerate delivery
- Establish CI/CD pipelines and infrastructure-as-code practices for data solutions
- Implement data catalog solutions and governance frameworks
- Create technical specifications and architecture documentation
- Provide technical leadership to data engineering teams
- Collaborate with cross-functional teams to ensure alignment of data solutions
- Evaluate and recommend technologies, tools, and approaches for data initiatives
- Ensure data architectures meet security, compliance, and performance requirements
- Mentor junior team members on data architecture best practices
- Stay current with emerging technologies and industry trends
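To make the Lakehouse and reusable-component responsibilities above a bit more concrete, here is a minimal PySpark + Delta Lake sketch of a reusable bronze-to-silver step; the table and column names are illustrative assumptions, not details of this role:

```python
# Minimal sketch of a reusable bronze-to-silver Delta Lake step.
# Table and column names are illustrative placeholders.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

def bronze_to_silver(bronze_table: str, silver_table: str) -> DataFrame:
    """Read a raw (bronze) Delta table, apply light cleansing, and write the silver table."""
    df = spark.read.table(bronze_table)
    cleaned = (
        df.dropDuplicates()
          .withColumn("processed_at", F.current_timestamp())
    )
    cleaned.write.format("delta").mode("overwrite").saveAsTable(silver_table)
    return cleaned

if __name__ == "__main__":
    bronze_to_silver("bronze.events_raw", "silver.events")
```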
Qualifications
- Extensive experience in data architecture design and implementation
- Strong software engineering background with expertise in Python or Scala
- Proven experience building data engineering frameworks and reusable components
- Experience implementing CI/CD pipelines for data solutions
- Expertise in infrastructure-as-code and automation
- Experience implementing data catalog solutions and governance frameworks
- Deep understanding of Databricks platform and Lakehouse architecture
- Experience migrating workloads from legacy systems to modern data platforms (one migration step is sketched after this list)
- Strong knowledge of healthcare data requirements and regulations
- Experience with cloud platforms (AWS, Azure, GCP) and their data services
- Bachelor's degree in Computer Science, Information Systems, or a related field; advanced degree preferred
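One concrete shape the legacy-migration experience can take is landing tables from a legacy warehouse into Delta over JDBC. The sketch below assumes a JDBC-reachable legacy warehouse; the connection URL, credentials, and table names are placeholders:

```python
# Minimal sketch: land one legacy warehouse table into a Delta bronze table via JDBC.
# URL, credentials, and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("legacy-migration").getOrCreate()

legacy_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://legacy-dw.example.com:5432/warehouse")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "<from-secret-store>")  # in practice, read from a secret scope
    .load()
)

legacy_df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders_legacy")
```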
Technical Skills
- Programming languages: Python and/or Scala (required)
- Data processing frameworks: Apache Spark, Delta Lake
- CI/CD tools: Jenkins, GitHub Actions, Azure DevOps
- Infrastructure-as-code tools (optional): Terraform, CloudFormation, Pulumi
- Data catalog tools: Databricks Unity Catalog, Collibra, Alation
- Data governance frameworks and methodologies
- Data modeling and design patterns
- API design and development
- Cloud platforms: AWS, Azure, GCP
- Container technologies: Docker, Kubernetes
- Version control systems: Git
- SQL and NoSQL databases
- Data quality and testing frameworks (see the test sketch after this list)
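As a small example of the testing and CI/CD skills listed above, a data quality check of the kind that could run in a CI pipeline, written with pytest against a local SparkSession; the column names and the null-check rule are assumptions for illustration:

```python
# Minimal sketch of a data quality test runnable in CI (pytest + local Spark).
# Column names and the null-check rule are illustrative.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("dq-tests").getOrCreate()

def test_patient_id_is_never_null(spark):
    df = spark.createDataFrame(
        [("p-001", "2024-01-01"), ("p-002", "2024-01-02")],
        ["patient_id", "admit_date"],
    )
    assert df.filter(df.patient_id.isNull()).count() == 0
```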
Healthcare Industry Knowledge (Optional)
- Healthcare data standards (HL7, FHIR, etc.); a small FHIR example follows this list
- Clinical and operational data models
- Healthcare interoperability requirements
- Healthcare analytics use cases
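For candidates less familiar with FHIR, a minimal sketch of what working with healthcare data standards can look like: flattening a simplified FHIR Patient resource into a tabular record. The payload below is a trimmed illustration, not a complete FHIR R4 resource:

```python
# Minimal sketch: flatten a simplified FHIR Patient resource into a tabular record.
# The sample payload is illustrative and omits most FHIR Patient fields.
import json

patient_json = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Chalmers", "given": ["Peter", "James"]}],
  "gender": "male",
  "birthDate": "1974-12-25"
}
"""

def flatten_patient(resource: dict) -> dict:
    # Use the first recorded name, tolerating a missing or empty "name" list.
    name = (resource.get("name") or [{}])[0]
    return {
        "patient_id": resource.get("id"),
        "family_name": name.get("family"),
        "given_names": " ".join(name.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }

print(flatten_patient(json.loads(patient_json)))
```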