Open to opportunities

Manas Singh User

@manassinghuser

Data engineer building cloud-native ETL, warehouses, and real-time pipelines.

India

What I'm looking for

I’m looking for cloud-native data engineering work where I can build Databricks/AWS warehouses, cost-optimized ETL/ELT, and real-time streaming pipelines. I want to collaborate with engineering and analytics teams to deliver reliable data products, dashboards, and ML-ready datasets.

I’m a Data Engineer focused on shipping production warehouses and archival pipelines that move multi-source data efficiently while cutting storage cost. I’m currently architecting Analytics Vidhya’s Databricks warehouse on AWS, turning high-volume sources into query-ready layers.

In my current role, I built a centralized Databricks warehouse ingesting 500+ GB from PostgreSQL, MariaDB, MongoDB, and GA4 into an S3 medallion landing layer, cutting the storage footprint ~5x (to ~100 GB) with Snappy-compressed Parquet. I designed the Bronze → Silver → Gold medallion flow; the Silver catalog is live, and Gold aggregates are in progress to power B2C/B2B reporting.

I also shipped “Project Brahma,” an archival pipeline moving MongoDB user-activity data into S3 Glacier with ~5x compression and lower cold-storage spend. Earlier, I built Spark pipelines and Streamlit dashboards for YouTube sentiment analysis, and I’ve worked across Kafka, Spark, and Azure analytics/ML pipelines with a strong emphasis on schema design, data contracts, and cost optimization.

Experience

Work history, roles, and key accomplishments

Current

Data Engineer

Analytics Vidhya

Aug 2025 - Present (1 year)

Architected a centralized Databricks warehouse ingesting 500+ GB from PostgreSQL, MariaDB, MongoDB, and GA4 into an S3 medallion landing layer, reducing footprint ~5x to ~100 GB via Parquet/Snappy. Shipped an archival pipeline moving MongoDB user-activity data into S3 Glacier with ~5x compression and reduced cold-storage spend while defining schemas and data contracts with backend and analytics st

S3 Databricks Apache Spark Kafka Parquet Medallion Architecture Data Modeling Schema Design

Data Engineering Intern

Xebia

Jun 2024 - Aug 2024 (2 months)

Built a Spark batch pipeline for YouTube comment sentiment analysis over large scraped datasets to surface audience-reaction patterns across videos. Shipped Streamlit dashboards (sentiment distributions and engagement breakdowns) to support non-technical stakeholder decision-making.

Apache Spark PySpark Python Sentiment Analysis Streamlit ETL Data Visualization NLP

Education

Degrees, certifications, and relevant coursework

Microsoft

Microsoft Certified: Azure AI Engineer Associate, Azure AI

2024

Earned the Microsoft Certified: Azure AI Engineer Associate certification in July 2024.

University of Petroleum and Energy Studies (UPES)

Bachelor of Technology (B.Tech), Computer Science Engineering (Big Data Specialization)

2021 - 2025

Grade: CGPA 7.95/10

Completed a B.Tech in Computer Science Engineering with a Big Data specialization. Coursework covered Big Data Analytics, Distributed Systems, Data Mining, and Machine Learning.