Role: Data Engineer
Overview:
We are looking for a Data Engineer to design, build, and maintain the data pipelines and storage that power our analytics, ML models, and business reporting.
Role & Responsibilities:
- Data Pipeline Development & Optimization:
  - Design, build, and maintain scalable, reliable data pipelines to support analytics, ML models, and business reporting.
  - Collaborate with data scientists and analysts to ensure data is available, clean, and optimized for downstream use.
  - Implement data quality checks, monitoring, and validation processes (see the validation sketch after this list).
- Data Architecture & Integration:
  - Work with cross-functional teams to design efficient ETL/ELT workflows using modern data tools.
  - Integrate data from multiple sources (databases, APIs, third-party tools) into centralized storage solutions (data lakes/warehouses).
  - Support cloud-based infrastructure for data storage and retrieval.
- Performance & Scalability:
  - Monitor, troubleshoot, and optimize existing data pipelines to handle large-scale, real-time data flows.
  - Implement best practices for query optimization and cost-efficient data storage.
  - Ensure data is available and accessible for business-critical operations.
- Collaboration & Documentation:
  - Partner with product, engineering, and business stakeholders to understand data requirements.
  - Document data workflows, schemas, and best practices.
  - Support a culture of data reliability, governance, and security.
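To give a flavor of the data quality work above, here is a minimal sketch of a row-level validation check in Python. The record shape, field names, and rules are hypothetical examples for illustration, not an actual pipeline from this role.

```python
# Minimal data quality check sketch; field names and rules are
# illustrative assumptions, not a real schema from this posting.
from datetime import datetime

REQUIRED_FIELDS = {"order_id", "amount", "created_at"}

def validate_row(row: dict) -> list[str]:
    """Return a list of human-readable violations for one record."""
    errors = []
    missing = REQUIRED_FIELDS - row.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    amount = row.get("amount")
    if amount is not None and not isinstance(amount, (int, float)):
        errors.append("amount is not numeric")
    elif isinstance(amount, (int, float)) and amount < 0:
        errors.append("amount is negative")
    created_at = row.get("created_at")
    if created_at is not None:
        try:
            datetime.fromisoformat(created_at)
        except (TypeError, ValueError):
            errors.append("created_at is not a valid ISO-8601 timestamp")
    return errors

if __name__ == "__main__":
    bad = {"order_id": 1, "amount": -5, "created_at": "2024-13-01"}
    print(validate_row(bad))
    # -> ['amount is negative', 'created_at is not a valid ISO-8601 timestamp']
```

In production this kind of check would typically run inside the pipeline itself and feed monitoring/alerting rather than printing to stdout.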
Requirements:
- Proficiency in Python and SQL for data engineering tasks.
- Strong understanding of ETL/ELT processes, data warehousing, and data modeling.
- Hands-on experience with cloud platforms (AWS, GCP, or Azure) and data storage solutions (BigQuery, Redshift, Snowflake, etc.).
- Familiarity with data orchestration tools such as Airflow and Airbyte is a must (see the example DAG after this list).
- Experience with containerization & deployment tools (Docker, Kubernetes) is a plus.
- Knowledge of data governance, security, and best practices for handling sensitive data.
- Familiarity with Git and GitHub.
- Experience with Dataform is a must.
- Strong skills in eliciting requirements from cross-functional stakeholders and translating them into actionable data engineering tasks.
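As a pointer to the orchestration style named in the requirements, below is a minimal Airflow 2.x DAG sketch. The dag_id, schedule, and task callables are illustrative assumptions, not a real pipeline definition from this role.

```python
# Minimal Airflow DAG sketch (Airflow 2.4+, which uses `schedule`;
# older 2.x versions use `schedule_interval`). All names are
# hypothetical examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw records from a source API")

def load():
    print("write cleaned records to the warehouse")

with DAG(
    dag_id="example_daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    # Run extract before load.
    extract_task >> load_task
```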
Experience:
- 2+ years in data engineering, building and maintaining data pipelines.
- 2+ years in SQL and Python development for production environments.
- Experience working in fast-growing startup environments is a plus.
- Exposure to real-time data processing frameworks (Kafka, Spark, Flink) is a plus.
