This is a remote position.
Project Description
A high-performance, distributed data processing platform built using Java and Scala to support real-time analytics and large-scale data transformation. The system ingests structured and unstructured data from multiple sources, processes it using scalable frameworks, and delivers insights to downstream applications.
The platform leverages Apache Spark for big data processing and integrates with microservices developed in Java. It is designed with a focus on fault tolerance, scalability, and low-latency processing to support enterprise-grade business intelligence and decision-making.
Roles & Responsibilities
- Develop and maintain scalable backend services using Java and Scala.
- Design and implement data pipelines using Apache Spark (RDD, DataFrame, and Dataset APIs).
- Collaborate with cross-functional teams to gather requirements and translate them into technical solutions.
- Optimize performance of data processing jobs and ensure efficient memory utilization.
- Build RESTful APIs using Java frameworks like Spring Boot.
- Integrate data pipelines with messaging systems such as Apache Kafka.
- Write clean, modular, and testable code following best practices.
- Participate in code reviews, debugging, and troubleshooting production issues.
- Ensure application reliability through logging, monitoring, and alerting mechanisms.
- Work in Agile/Scrum environments and contribute to sprint planning and delivery.
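As a flavor of the transformation work described above: Spark's RDD operations (`map`, `filter`, `reduceByKey`) mirror ordinary Scala collection combinators, so the shape of a pipeline stage can be sketched without a Spark cluster. This is a minimal, hypothetical illustration (the `Event` record and names are invented for the sketch, not taken from the platform):

```scala
// Hypothetical ingested records; field names are illustrative only.
case class Event(source: String, bytes: Long)

object PipelineSketch {
  // Spark-style stage expressed with plain Scala collections:
  // filter -> map/reduce per key (groupMapReduce plays the role of
  // Spark's reduceByKey on an RDD of (source, bytes) pairs).
  def bytesBySource(events: Seq[Event]): Map[String, Long] =
    events
      .filter(_.bytes > 0)                       // drop empty events
      .groupMapReduce(_.source)(_.bytes)(_ + _)  // sum bytes per source

  def main(args: Array[String]): Unit = {
    val sample = Seq(Event("web", 100L), Event("api", 50L), Event("web", 25L))
    println(bytesBySource(sample)) // per-source byte totals
  }
}
```

In actual Spark code the same logic would run distributed across executors, with the DataFrame/Dataset APIs adding Catalyst query optimization on top.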
Technical Requirements
Core Skills
- Strong proficiency in Java (8 or above) and Scala
- Hands-on experience with Apache Spark
- Good understanding of object-oriented and functional programming concepts
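The object-oriented and functional pairing called for above is idiomatic Scala: subtype hierarchies (OO) combined with immutable data, pattern matching, and higher-order functions (FP). A minimal, hypothetical sketch of that blend:

```scala
// Sealed hierarchy: OO subtyping that the compiler can check
// exhaustively in pattern matches.
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rect(w: Double, h: Double) extends Shape

object Geometry {
  // Pure function: pattern matching over the sealed hierarchy,
  // no mutation, no side effects.
  def area(s: Shape): Double = s match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // Higher-order style: compute a total via map + sum.
  def totalArea(shapes: Seq[Shape]): Double = shapes.map(area).sum
}
```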
Frameworks & Tools
- Experience with Spring Boot / Spring ecosystem
- Knowledge of distributed systems and microservices architecture
- Familiarity with Apache Kafka or similar messaging systems
Data & Cloud
- Experience with SQL and NoSQL databases
- Exposure to big data ecosystems (Hadoop, Hive)
- Experience with cloud platforms such as AWS or Azure (preferred)
Other Skills
- Understanding of CI/CD pipelines (e.g., Jenkins) and version control with Git
- Strong problem-solving and debugging skills
- Good communication and teamwork abilities
Nice-to-Have
- Experience with streaming data (Spark Streaming / Kafka Streams)
- Knowledge of containerization with Docker and orchestration with Kubernetes
- Exposure to data warehousing and ETL processes
