Jason Chou
@jasonchou
Senior data scientist delivering NLP, semantic search, and production ML solutions.
What I'm looking for
I’m a data science and machine learning professional with 7+ years delivering NLP, semantic search, and entity resolution for document-centric analytics. I thrive on turning mission requirements into prioritized AI and analytics use cases with real stakeholder impact.
In my work, I’ve designed transformer-based retrieval and ranking models in PyTorch and HuggingFace, improving precision@10 by 22%. I’ve built OCR/ICR-driven ingestion pipelines that processed 5M documents and reduced parsing errors by 47%, and I’ve engineered end-to-end semantic search integrated with Elasticsearch and Databricks to cut median query latency by 40% and boost recall by 30%.
I also focus on production readiness and measurable trust: I’ve implemented graph-based entity resolution with Neo4j to reduce duplicates by 78% across a 20M-entity index, deployed models via REST APIs and scheduled batch pipelines using Docker, Airflow, and cloud services, and used SHAP/LIME-based explainability to increase XAI adoption by 65% across pilots.
Experience
Work history, roles, and key accomplishments
Senior Data Scientist / ML Engineer
Integer Group
Sep 2021 - Mar 2026 (4 years 6 months)
Delivered NLP and semantic search pilots with 10+ stakeholders, improving precision@10 by 22% using transformer-based retrieval and reducing median query latency by 40% while boosting recall by 30%. Built end-to-end OCR/ICR ingestion for 5M documents (47% fewer parsing errors), enabled entity resolution to cut duplicates by 78%, and shipped 12 Tableau/Looker dashboards that reduced analyst time-to
Data Scientist / ML Engineer
Integer Group
Feb 2019 - Aug 2021 (2 years 6 months)
Partnered with product owners and data engineers to scope and run reproducible Databricks experiments, contributing to 6 pilot initiatives. Improved NER recall by 18% and precision by 12%, cut data-refresh lag from 48 hours to 4 hours with Airflow/Spark scheduling, and built a 2M-page labeled OCR corpus for downstream NLP models.
Data Analyst Intern
Divercety LLC
May 2017 - Aug 2017 (3 months)
Cleaned and analyzed marketing datasets using R and SQL, reducing missing-value rates by 27% across key tables. Built NLP preprocessing with tokenization and TF-IDF for a 30,000-document corpus and produced 4 Tableau/Matplotlib reports for marketing and analytics teams.
Education
Degrees, certifications, and relevant coursework
University of Texas at Dallas
Master of Science in Computer Science
2012 - 2018
Earned a Master of Science in Computer Science at the University of Texas at Dallas from 2012 to 2018.
