Open to opportunities

Vanija Joshi

@vanijajoshi

Principal AI & Data Engineer building production LLM systems end-to-end.

India

What I'm looking for

I’m looking to own production LLM-driven systems end-to-end—RAG and agentic pipelines, continuous training/evaluation, and robust data infrastructure—with strong CI/CD, observability, and autonomy to ship and monitor in multi-environment setups.

I’m a Senior/Principal AI & Data Engineer with 6+ years shipping production LLM-driven systems, autonomous data pipelines, and AI-integrated platforms end-to-end, from architecture through deployment and monitoring. I currently build and own two production AI systems at Health Catalyst without handoff, taking each from a blank page to a deployed, monitored production system.

My core work focuses on agentic pipeline architecture and LLM integration, including RAG pipelines (FAISS vector search and cross-encoder reranking), OpenAI API usage (embeddings, completions, fine-tuning), and continuous training & model evaluation with MLflow. For TIGR Chart Abstraction, I designed an event-driven inference pipeline that auto-populates clinical registry fields from unstructured notes, plus a continuous training pipeline that ingests model outputs, prepares fine-tuning datasets, evaluates in MLflow, and performs loss-gated deployment.

I also lead end-to-end data infrastructure and observability—Databricks (AWS and Azure), Python validation, and CI/CD across multi-environment DEV/PROD. At Health Catalyst, I built CCI (Cost & Clinical Intelligence) on Azure Databricks and created structured outputs for the PowerCosting UI, reducing healthcare implementation cycles by 800 hours, while observability reduced manual data quality checks by 60% and maintained a 99.9% SLA across 12+ clients processing 10TB+ daily.

Experience

Work history, roles, and key accomplishments

Current

Principal AI & Data Engineer

Health Catalyst

Apr 2024 - Present (2 years 2 months)

Architected and owned two production AI systems—TIGR Chart Abstraction (RAG-based clinical registry auto-population) and CCI Cost & Clinical Intelligence—deployed with event-driven Azure inference and outputs for hospital clients. Built continuous training with MLflow evaluation and loss-gated rollovers, and implemented CI/CD and data observability to cut manual data checks 60% while maintaining 9

Data Engineer

Revol Greens

Jan 2023 - Sep 2023 (8 months)

Built an end-to-end AWS data platform to integrate IoT sensor data, ERP systems, and external APIs, processing 5M+ data points daily across 15+ greenhouse facilities. Created crop yield forecasting and real-time anomaly detection models (40% accuracy improvement) and developed Power BI dashboards for climate and yield KPIs, reducing operational waste 15%.

Senior Data Analyst / Data Engineer

Vayra Renewable Energy Solutions

Apr 2020 - Feb 2023 (2 years 10 months)

Architected a dimensional warehouse on SQL Server and Azure Data Factory for 8+ energy sites, and used Databricks to build Lakehouse ingestion and transformations for SCADA telemetry, market pricing, and ERP data. Optimized star-schema and SCD models to reduce report load times 50% and improved plant efficiency 12% via predictive maintenance ML, with automated ETL pipelines at 99.5% reliability.

Education

Degrees, certifications, and relevant coursework

NIT Warangal

Master of Technology (M.Tech), Engineering Physics

2016 - 2019

Completed an M.Tech in Engineering Physics at NIT Warangal (2016–2019) with a focus on data science, machine learning, and data acquisition.
