Vanija Joshi
@vanijajoshi
Principal AI & Data Engineer building production LLM systems end-to-end.
What I'm looking for
I’m a Senior/Principal AI & Data Engineer with 6+ years of experience shipping production LLM-driven systems, autonomous data pipelines, and AI-integrated platforms end-to-end, from architecture through deployment and monitoring. I currently build and own two production AI systems at Health Catalyst with no handoffs, taking each from a blank page to a deployed, monitored production system.
My core work focuses on agentic pipeline architecture and LLM integration, including RAG pipelines (FAISS vector search and cross-encoder reranking), OpenAI API usage (embeddings, completions, fine-tuning), and continuous training & model evaluation with MLflow. For TIGR Chart Abstraction, I designed an event-driven inference pipeline that auto-populates clinical registry fields from unstructured notes, plus a continuous training pipeline that ingests model outputs, prepares fine-tuning datasets, evaluates in MLflow, and performs loss-gated deployment.
I also lead end-to-end data infrastructure and observability: Databricks (AWS and Azure), Python validation, and CI/CD across DEV and PROD environments. At Health Catalyst, I built CCI (Cost & Clinical Intelligence) on Azure Databricks and created structured outputs for the PowerCosting UI, reducing healthcare implementation cycles by 800 hours. The observability work reduced manual data quality checks by 60% and maintained a 99.9% SLA across 12+ clients processing 10TB+ daily.
Experience
Work history, roles, and key accomplishments
Principal AI & Data Engineer
Health Catalyst
Apr 2024 - Present (2 years 1 month)
Architected and owned two production AI systems—TIGR Chart Abstraction (RAG-based clinical registry auto-population) and CCI Cost & Clinical Intelligence—deployed with event-driven Azure inference and outputs for hospital clients. Built continuous training with MLflow evaluation and loss-gated rollovers, and implemented CI/CD and data observability to cut manual data checks 60% while maintaining 9
Oracle OTBI & ETL Expert
Kornit Digital
Sep 2023 - Mar 2024 (6 months)
Designed and deployed 25+ production Oracle OTBI dashboards and reports integrating Oracle E-Business Suite and Oracle Cloud data for global manufacturing operations. Built ETL processes and optimized SQL/T-SQL to improve report performance 40%, supporting requirements and UAT with business analysts and IT stakeholders.
Data Engineer
Revol Greens
Jan 2023 - Sep 2023 (8 months)
Built an end-to-end AWS data platform to integrate IoT sensor data, ERP systems, and external APIs, processing 5M+ data points daily across 15+ greenhouse facilities. Created crop yield forecasting and real-time anomaly detection models (40% accuracy improvement) and developed Power BI dashboards for climate and yield KPIs, reducing operational waste 15%.
Senior Data Analyst / Data Engineer
Vayra Renewable Energy Solutions
Apr 2020 - Feb 2023 (2 years 10 months)
Architected a dimensional warehouse on SQL Server and Azure Data Factory for 8+ energy sites, and used Databricks to build Lakehouse ingestion and transformations for SCADA telemetry, market pricing, and ERP data. Optimized star-schema and SCD models to reduce report load times 50% and improved plant efficiency 12% via predictive maintenance ML, with automated ETL pipelines at 99.5% reliability.
Education
Degrees, certifications, and relevant coursework
NIT Warangal
Master of Technology, Engineering Physics
2016 - 2019
Completed an M.Tech in Engineering Physics at NIT Warangal (2016–2019) with a focus on data science, machine learning, and data acquisition.
