Looking for a job

Sanskar Srivastava

@sanskarsrivastava

I build data science and ML systems for real-world AI impact.

United States

What I'm looking for

I’m looking for a team where I can build production-ready ML/LLM systems, especially RAG, real-time data pipelines, and scalable workflows, and use rigorous evaluation to drive measurable business impact and continuous model iteration.

I’m a Data Scientist and ML Engineer building AI systems, RAG, and deep learning models, with a strong focus on turning messy, unstructured data into structured insights. At Indiana University, I developed a dual-pipeline LLM + OCR/computer-vision redaction detection system for 40,000+ legal PDFs, reaching 92% combined accuracy, and I architected GPU-accelerated pipelines that reduced processing time by 10x.

In parallel, I extended transformer-based computational modeling of mental-health discourse, using SBERT and 500K+ Reddit posts to generate triplet training data and fine-tune models that map cognitive beliefs into belief networks. My work consistently emphasizes production readiness (scalable architectures, rigorous evaluation, and measurable outcomes), from causal churn modeling with uplift strategies to low-latency analytics and multimodal e-commerce systems.

Experience

Work history, roles, and key accomplishments

Current

Machine Learning Engineer

Indiana University

Jan 2026 - Present (6 months)

Developed a dual-pipeline redaction detection system for 40,000+ legal PDFs using a Qwen-based LLM contextual approach and OCR/computer vision, achieving 92% combined accuracy. Architected GPU-accelerated HPC pipelines to convert unstructured legal documents into structured datasets, reducing processing time by 10x and enabling analysis of court confidentiality practices.

Qwen, LLM, Computer Vision, HPC, Data Engineering, NLP

LLM Engineer

Soda Labs

Sep 2025 - Present (10 months)

Extended transformer-based computational modeling of mental-health discourse by generating triplet training data from 500K+ Reddit posts with SBERT to capture semantic patterns in depression discussions. Fine-tuned models to map cognitive beliefs into belief networks, identifying patterns and underlying cognitive structures in mental health conditions.

Transformers, Fine Tuning, Triplet Training, NLP, Reddit Data, Modeling, Deep Learning

Education

Degrees, certifications, and relevant coursework

Indiana University Bloomington

Master’s in Data Science

2024 - 2026

Grade: GPA 3.94

Coursework included Applied Machine Learning, Data Mining, Advanced Database, Big Data principles, and Intro/Elements of AI and LLMs.