Vanija Joshi
@vanijajoshi
Principal AI & Data Engineer building production LLM systems end-to-end.
What I'm looking for
I’m a Senior/Principal AI & Data Engineer with 6+ years of experience shipping production LLM-driven systems, autonomous data pipelines, and AI-integrated platforms end-to-end, from architecture through deployment and monitoring. I currently build and own two production AI systems at Health Catalyst with no handoffs, taking each from a blank page to a deployed, monitored production system.
My core work focuses on agentic pipeline architecture and LLM integration, including RAG pipelines (FAISS vector search and cross-encoder reranking), OpenAI API usage (embeddings, completions, fine-tuning), and continuous training & model evaluation with MLflow. For TIGR Chart Abstraction, I designed an event-driven inference pipeline that auto-populates clinical registry fields from unstructured notes, plus a continuous training pipeline that ingests model outputs, prepares fine-tuning datasets, evaluates in MLflow, and performs loss-gated deployment.
I also lead end-to-end data infrastructure and observability: Databricks (AWS and Azure), Python validation, and CI/CD across multi-environment DEV/PROD. At Health Catalyst, I built CCI (Cost & Clinical Intelligence) on Azure Databricks and created structured outputs for the PowerCosting UI, reducing healthcare implementation cycles by 800 hours. The observability work reduced manual data quality checks by 60% and maintained a 99.9% SLA across 12+ clients processing 10TB+ daily.
Experience
Work history, roles, and key accomplishments
Principal AI & Data Engineer
Health Catalyst
Apr 2024 - Present (2 years 2 months)
Architected and owned two production AI systems, TIGR Chart Abstraction (RAG-based clinical registry auto-population) and CCI (Cost & Clinical Intelligence), deployed with event-driven Azure inference and outputs for hospital clients. Built continuous training with MLflow evaluation and loss-gated rollouts, and implemented CI/CD and data observability to cut manual data checks by 60% while maintaining 9…
Oracle OTBI & ETL Expert
Kornit Digital
Sep 2023 - Mar 2024 (6 months)
Designed and deployed 25+ production Oracle OTBI dashboards and reports integrating Oracle E-Business Suite and Oracle Cloud data for global manufacturing operations. Built ETL processes and optimized SQL/T-SQL to improve report performance 40%, supporting requirements and UAT with business analysts and IT stakeholders.
Data Engineer
Revol Greens
Jan 2023 - Sep 2023 (8 months)
Built an end-to-end AWS data platform to integrate IoT sensor data, ERP systems, and external APIs, processing 5M+ data points daily across 15+ greenhouse facilities. Created crop yield forecasting and real-time anomaly detection models (40% accuracy improvement) and developed Power BI dashboards for climate and yield KPIs, reducing operational waste 15%.
Senior Data Analyst / Data Engineer
Vayra Renewable Energy Solutions
Apr 2020 - Feb 2023 (2 years 10 months)
Architected a dimensional warehouse on SQL Server and Azure Data Factory for 8+ energy sites, and used Databricks to build Lakehouse ingestion and transformations for SCADA telemetry, market pricing, and ERP data. Optimized star-schema and SCD models to reduce report load times 50% and improved plant efficiency 12% via predictive maintenance ML, with automated ETL pipelines at 99.5% reliability.
Education
Degrees, certifications, and relevant coursework
NIT Warangal
Master of Technology, Engineering Physics
2016 - 2019
Completed an M.Tech in Engineering Physics at NIT Warangal (2016–2019) with a focus on data science, machine learning, and data acquisition.
