I want to build end-to-end ML systems in multimodal AI, computer vision, and generative AI — trained on real benchmarks, deployed as production services, and solving real problems. Open to global remote roles at startups and product companies.
Huzefa Merchant
@huzefamerchant
I'm an ML engineer building deployed multimodal, GenAI and computer vision systems.
What I'm looking for
I'm an ML Engineer focused on Generative AI, Multimodal AI, and Computer Vision, building real systems trained on real benchmarks like Flickr30k and COCO 2017. I rebuilt the same image captioning system 7 times — from InceptionV3+LSTM to ConvNeXt+Perceiver+GPT-2 — because getting it right mattered more than getting it done.
In my flagship LensToWords multimodal captioning pipeline, I designed an end-to-end model (ConvNeXt-Tiny → Perceiver with 16 latents → cross-attention injected into all 12 GPT-2 blocks) and drove it through phased training. I first trained on Flickr30k to validate the architecture, then scaled up to COCO 2017, where the model reached BLEU-4 = 0.1341.
I also ship deployed ML applications: a Streamlit Movie Recommender System with a production similarity matrix, a FastAPI + Streamlit Potato Disease Detector structured as a deployable two-layer service, and NLP/vision products like the WhatsApp Chat Analyser and Fashion Recommender System. My skills come from hands-on research and Stanford/DeepLearning.AI certifications. I iterate with diagnosis-first engineering, and I'm motivated by measurable outcomes, not demos.
Experience
Work history, roles, and key accomplishments
Built and deployed end-to-end ML systems across Generative AI, multimodal AI, and computer vision. Flagship project: LensToWords, an image captioning system rebuilt through 7 architecture iterations (InceptionV3+LSTM → ConvNeXt+Perceiver+GPT-2), achieving a BLEU-4 of 0.1341 on COCO 2017. Shipped 8 public GitHub repositories, including deployed applications in recommendation systems and NLP analytics.
Automated repetitive operational tasks using shell scripting, reducing manual effort. Monitored system logs and diagnosed anomalies, resolving minor issues independently and escalating complex ones to development teams. Maintained enterprise-grade infrastructure stability.
Education
Degrees, certifications, and relevant coursework
Medicaps University
Bachelor of Technology, Mechanical Engineering
2017 - 2021
Grade: 7.2
Activities and societies: Social Club, NSS (National Service Scheme)
Bachelor of Technology in Mechanical Engineering at Medicaps University, Indore, starting in 2017.
DeepLearning.AI (Coursera)
Deep Learning Specialization, Deep Learning
Completed the Deep Learning Specialization on Coursera by DeepLearning.AI.
Stanford University / DeepLearning.AI (Coursera)
Machine Learning Specialization, Machine Learning
Completed the Machine Learning Specialization on Coursera by Stanford University and DeepLearning.AI.
Website
huzefamerchant.vercel.app
Portfolio
huzefamerchant.vercel.app
