Open to opportunities

Vanija Joshi

@vanijajoshi

Principal AI & Data Engineer building production LLM systems end-to-end.

India

What I'm looking for

I’m looking to own production LLM-driven systems end-to-end—RAG and agentic pipelines, continuous training/evaluation, and robust data infrastructure—with strong CI/CD, observability, and autonomy to ship and monitor in multi-environment setups.

I’m a Senior/Principal AI & Data Engineer with 6+ years shipping production LLM-driven systems, autonomous data pipelines, and AI-integrated platforms end-to-end—from architecture through deployment and monitoring. I currently build and own two production AI systems at Health Catalyst, operating without handoff: blank page to deployed, monitored production system.

My core work focuses on agentic pipeline architecture and LLM integration, including RAG pipelines (FAISS vector search and cross-encoder reranking), OpenAI API usage (embeddings, completions, fine-tuning), and continuous training & model evaluation with MLflow. For TIGR Chart Abstraction, I designed an event-driven inference pipeline that auto-populates clinical registry fields from unstructured notes, plus a continuous training pipeline that ingests model outputs, prepares fine-tuning datasets, evaluates in MLflow, and performs loss-gated deployment.

I also lead end-to-end data infrastructure and observability: Databricks (AWS and Azure), Python validation, and CI/CD across multi-environment DEV/PROD. At Health Catalyst, I built CCI (Cost & Clinical Intelligence) on Azure Databricks and created structured outputs for the PowerCosting UI, reducing healthcare implementation cycles by 800 hours. The observability work reduced manual data quality checks by 60% and maintained a 99.9% SLA across 12+ clients processing 10TB+ daily.

Experience

Work history, roles, and key accomplishments

Current

Principal AI & Data Engineer

Health Catalyst

Apr 2024 - Present (2 years 1 month)

Architected and owned two production AI systems, TIGR Chart Abstraction (RAG-based clinical registry auto-population) and CCI (Cost & Clinical Intelligence), deployed with event-driven Azure inference and outputs for hospital clients. Built continuous training with MLflow evaluation and loss-gated rollovers, and implemented CI/CD and data observability to cut manual data checks by 60% while maintaining 9…

Data Engineer

Revol Greens

Jan 2023 - Sep 2023 (8 months)

Built an end-to-end AWS data platform to integrate IoT sensor data, ERP systems, and external APIs, processing 5M+ data points daily across 15+ greenhouse facilities. Created crop yield forecasting and real-time anomaly detection models (40% accuracy improvement) and developed Power BI dashboards for climate and yield KPIs, reducing operational waste by 15%.

Senior Data Analyst / Data Engineer

Vayra Renewable Energy Solutions

Apr 2020 - Feb 2023 (2 years 10 months)

Architected a dimensional warehouse on SQL Server and Azure Data Factory for 8+ energy sites, and used Databricks to build Lakehouse ingestion and transformations for SCADA telemetry, market pricing, and ERP data. Optimized star-schema and SCD models to reduce report load times by 50%, improved plant efficiency by 12% via predictive maintenance ML, and ran automated ETL pipelines at 99.5% reliability.

Education

Degrees, certifications, and relevant coursework

NIT Warangal

Master of Technology (M.Tech), Engineering Physics

2016 - 2019

Completed an M.Tech in Engineering Physics at NIT Warangal (2016–2019) with a focus on data science, machine learning, and data acquisition.
