Open to opportunities

Hassan Pasha

@hassanpasha1

Senior AI Engineer building low-latency multimodal LLM systems with RAG and safety.

United States

What I'm looking for

I’m looking for a team where I can build and scale production LLM/AI systems end-to-end—RAG, low-latency multimodal inference, and rigorous evaluation—while partnering with Product and MLOps to ship safety-conscious features fast.

I’m a Senior AI Engineer with 5+ years designing, deploying, and scaling production AI/ML systems for real-time and multimodal applications. I specialize in LLM-based architectures—RAG pipelines, prompt-based workflows, fine-tuning—plus vision and ASR, with hands-on GPU optimization and low-latency inference.

Across my work, I own the full ML lifecycle, from data ingestion and training through deployment, monitoring, evaluation, and iteration. I build evaluation frameworks to measure faithfulness, relevance, and ranking quality, and I continuously improve accuracy while reducing bias across outputs.

I also operationalize agentic and tool-using LLM systems, including RAG, structured outputs, guardrails, fallback chains, and safety/bias mitigation strategies. In educational environments, I’ve implemented regulatory-compliance guardrails aligned with FERPA/COPPA and helped translate classroom needs into scalable, production-ready AI.

Partnering closely with Product, Engineering, and MLOps teams, I communicate tradeoffs between model complexity, performance, and cost, and I contribute to long-term AI roadmap and architecture decisions. My goal is to deliver reliable, safety-conscious AI systems that perform under real constraints—latency, cost, and deployment velocity.

Experience

Work history, roles, and key accomplishments

Current

Senior AI Engineer

AbbVie

May 2023 - Present (3 years 2 months)

Designed and deployed real-time voice and vision AI systems for educational R&D environments, reducing inference latency by 20–30% and improving system efficiency by 15%. Operationalized LLM-based RAG and agent workflows, accelerating deployment cycles by 25% and improving generative model accuracy by 10–15%, while implementing FERPA/COPPA safety guardrails and bias mitigation.

Real-Time AI Inference, Vision and ASR Pipelines, Fine-Tuning, Safety Guardrails (FERPA/COPPA)

Machine Learning Engineer

Mass General Brigham

Apr 2020 - Apr 2023 (3 years)

Built end-to-end machine learning pipelines in Python for large-scale healthcare datasets, enabling clinical decision-making workflows. Developed and optimized supervised and deep learning models for patient risk prediction and medical text classification, and implemented modular LLM RAG systems with cross-encoder reranking for improved clinical retrieval precision.

Machine Learning Pipelines, scikit-learn, Cross-Encoder Re-Ranking

Data Scientist

7-Eleven

Feb 2017 - Mar 2020 (3 years 1 month)

Developed end-to-end machine learning models in Python and scikit-learn for customer behavior analysis and sales prediction, supporting retail business decisions. Engineered distributed ETL and forecasting workflows using Spark/PySpark and time-series methods, and built segmentation, recommendation, and NLP sentiment pipelines, using A/B testing to measure their business impact.

Forecasting (ARIMA, Moving Averages), ETL, Spark, PySpark, Recommendation Systems, NLP, TF-IDF, A/B Testing