L3 Big Data Developer (7-10 Years)
1. Design, develop, and implement highly scalable, distributed big data solutions using Hadoop ecosystem technologies such as HBase, Hive, Kudu, and Spark.
2. Architect HBase schemas and data models to accommodate evolving business requirements and ensure optimal performance for data storage and retrieval operations.
3. Develop complex Hive queries and data processing pipelines to transform raw data into structured formats suitable for analysis and reporting.
4. Implement data ingestion pipelines using Spark Streaming and Spark SQL for real-time processing of streaming data sources, ensuring high throughput and low latency.
5. Optimize Spark applications for performance and resource utilization, including tuning RDD transformations, optimizing data partitioning strategies, and leveraging in-memory caching.
6. Utilize advanced features of Spark MLlib for machine learning tasks such as classification, regression, clustering, and collaborative filtering.
7. Design and deploy Kudu tables for fast analytical queries and real-time analytics, leveraging Kudu's unique combination of fast scans and fast data ingestion.
8. Collaborate with data scientists to integrate machine learning models into Spark workflows and productionize them for real-time predictions and analytics.
9. Troubleshoot performance bottlenecks, data quality issues, and system failures in big data applications and infrastructure, and implement solutions to address them.
10. Stay abreast of emerging technologies and best practices in big data processing and analytics, and evaluate their potential impact on our architecture and solutions.
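
The streaming ingestion responsibility in item 4 could look like the following minimal Scala sketch, assuming a Kafka source; the topic name, broker address, and output/checkpoint paths are illustrative placeholders, not part of the role description:

```scala
import org.apache.spark.sql.SparkSession

// Minimal Structured Streaming ingestion sketch: Kafka in, Parquet out.
object StreamingIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("streaming-ingest")
      .getOrCreate()

    // Read a stream of raw events from a hypothetical Kafka topic "events".
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; cast the payload to a string
    // before any downstream parsing with Spark SQL.
    val events = raw.selectExpr("CAST(value AS STRING) AS json")

    // Write micro-batches as Parquet; the checkpoint directory gives the
    // query exactly-once recovery after failures.
    val query = events.writeStream
      .format("parquet")
      .option("path", "/data/events")
      .option("checkpointLocation", "/checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```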
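The tuning work in item 5 can be sketched as below; the dataset path, key column, and partition count are assumptions chosen for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Sketch of common Spark tuning moves: key-based repartitioning plus
// caching a dataset that several actions reuse.
object TuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("tuning").getOrCreate()

    val df = spark.read.parquet("/data/events")

    // Repartition by the aggregation key so the downstream shuffle is
    // balanced across executors (200 partitions is a placeholder value).
    val partitioned = df.repartition(200, df("customer_id"))

    // Persist a dataset reused by multiple actions; MEMORY_AND_DISK spills
    // to disk under memory pressure instead of recomputing from source.
    partitioned.persist(StorageLevel.MEMORY_AND_DISK)

    val total = partitioned.count()                          // materializes the cache
    val perKey = partitioned.groupBy("customer_id").count()  // reuses it

    perKey.write.mode("overwrite").parquet("/data/events_by_customer")
    partitioned.unpersist()
    spark.stop()
  }
}
```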
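For the MLlib tasks in item 6, a classification pipeline might be sketched as follows; the tiny in-memory training set and column names are invented for the example:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Sketch of an MLlib classification pipeline: assemble features, fit a model.
object MllibSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    // Toy labeled data; in practice this would come from Hive or Kudu.
    val training = spark.createDataFrame(Seq(
      (0.0, 1.0, 0.1),
      (1.0, 0.0, 0.9),
      (0.0, 1.2, 0.2),
      (1.0, 0.1, 0.8)
    )).toDF("label", "f1", "f2")

    // MLlib estimators expect a single vector column of features.
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")

    val lr = new LogisticRegression().setMaxIter(10)

    // A Pipeline bundles feature engineering with the estimator, which is
    // what makes the fitted model easy to productionize as one artifact.
    val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)
    model.transform(training).select("label", "prediction").show()
  }
}
```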
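The Kudu table design in item 7 can be sketched with the kudu-spark integration; the master address, table name, and key column are placeholders, and the key column must be non-nullable in the source schema:

```scala
import scala.collection.JavaConverters._
import org.apache.kudu.client.CreateTableOptions
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

// Sketch: create a hash-partitioned Kudu table and upsert a DataFrame into it.
object KuduSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("kudu-sketch").getOrCreate()

    // KuduContext wraps the Kudu client for table DDL and row operations.
    val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

    val df = spark.read.parquet("/data/events")

    // Hash partitioning on the primary key spreads ingest load across
    // tablets while keeping scans and point lookups fast.
    if (!kuduContext.tableExists("events")) {
      kuduContext.createTable(
        "events",
        df.schema,
        Seq("event_id"),
        new CreateTableOptions()
          .addHashPartitions(List("event_id").asJava, 4)
          .setNumReplicas(3))
    }

    // Upsert handles both inserts and updates, which suits streaming ingest.
    kuduContext.upsertRows(df, "events")
  }
}
```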
