AI Engineer - Public Sector

Unstructured.io is a company that specializes in transforming unstructured data from various formats into LLM-ready data, enabling enterprises to leverage their internal data for AI applications.

Unstructured Technologies

Employee count: 51-200

United States only

The Mission

At Unstructured, we are obsessed with transforming messy, unstructured data into a format that LLMs can actually use. Our Public Sector team has recently secured several high-impact contracts that demand more than just "off-the-shelf" solutions. We are looking for an AI Engineer who thrives at the intersection of R&D and production-grade software engineering.
You won’t just be building notebook demos; you will be architecting, prototyping, and shipping novel multimodal data processing, RAG, and agentic systems that solve critical problems for Government and Military clients. Your work will bridge the gap between one-off custom builds and a repeatable, scalable product roadmap.

What You’ll Do Day-to-Day

You will be a high-agency individual contributor, owning the lifecycle of AI solutions from initial research to AWS deployment.
  • 50% Building & Shipping: Design and implement production-grade RAG pipelines and agentic workflows using Python. You’ll build systems that handle real-world "messy" data (PDFs, scanned docs, images, full motion video) and ensure they are performant and scalable.
  • 30% Research & Experimentation: Stay at the bleeding edge. You’ll evaluate new models (LLMs, embedding models, object detection), prototype approaches for SBIR/government deliverables, and run experiments to prove what actually works.
  • 20% Strategy & Collaboration: Partner with the team to document architectures, contribute to technical reports for contract deliverables, and participate in pre-sales calls to architect solutions for complex client needs.

The Ideal Candidate

We are looking for a self-directed engineer who excels in high-stakes, ambiguous environments. You are likely a strong fit if your professional background reflects the following:
  • Systems-First Engineering: You prioritize building reliable, scalable systems over experimental scripts. You have a track record of moving AI models out of notebooks and into production environments where latency, cost, and accuracy are treated as first-class citizens.
  • Technical Resourcefulness: You are comfortable working in restricted or air-gapped environments. When commercial APIs aren’t an option, you have the expertise to deploy, fine-tune, and optimize open-source models to achieve the mission objective.
  • Autonomous Problem Solving: You can take a high-level government requirement and translate it into a technical roadmap. You don't require constant oversight to identify the right tool for a job, whether it’s a specific vector database or a custom multimodal pipeline.
  • A "Generalist" Mindset: While you specialize in AI, you understand the full stack. You are as comfortable discussing embedding strategies as you are configuring AWS GovCloud infrastructure or debugging a FastAPI endpoint.

Must-Haves

  • Proven experience deploying production RAG pipelines against real-world, messy datasets.
  • Deep expertise in agentic system design (tool use, multi-agent orchestration).
  • Strong Python engineering skills: writing clean, scalable, and maintainable code.
  • Experience operating within AWS/GovCloud environments.

Nice-to-Haves

  • Experience fine-tuning NLP or object detection models.
  • Familiarity with LLM evaluation frameworks (hallucination detection, drift monitoring).
  • Knowledge of government security standards, plus experience working on-prem and across different classification environments.
  • Security Clearance: Existing Secret/TS clearance or eligibility is a significant plus.

Your Technical Toolkit

  • Languages: Python (expert-level), SQL
  • LLM & Agentic Frameworks: LangChain, LangGraph, CrewAI, or similar orchestration frameworks
  • RAG Stack: Retrieval with vector databases (Pinecone, Weaviate, Chroma, pgvector), graph databases (Neo4j), Elasticsearch, BM25, and Sentence-Transformers; NLP enrichment with spaCy, GLiNER, and Transformers; optimization using embedding models, reranking pipelines, and DSPy
  • Evaluation & Observability: RAGAS, DeepEval, Arize Phoenix, and synthetic annotations
  • Cloud & Infrastructure: AWS, SageMaker, Bedrock, S3, Lambda, Docker, and FastAPI
  • Data Processing: Complex pipelines for unstructured and multimodal data, including PDFs, scanned documents, images, and audio

Why Join Us

  • Opportunity to join a dynamic team working on cutting-edge machine learning projects.
  • Collaborative and innovative work environment with a focus on learning and growth.
  • Impactful role in shaping the company's direction and driving innovation in unstructured data processing.
  • Competitive compensation package, including benefits and stock options.

About the job

Job type: Full Time

Hiring timezones: United States +/- 0 hours

About Unstructured Technologies

Unstructured addresses a significant challenge many enterprises face: leveraging their vast amounts of unstructured data for use with large language models (LLMs) and other AI applications. Customers often struggle with data in various formats like PDFs, Word documents, PowerPoint presentations, HTML files, images, and more, which are not readily usable by machine learning models. This is where Unstructured steps in, providing solutions to automate the preprocessing of this messy, human-generated data. Our platform transforms raw data into clean, structured formats, making it compatible with LLMs for tasks such as fine-tuning, pre-training, and Retrieval Augmented Generation (RAG).

Our customers need to unlock the potential of their internal data to enhance productivity, drive innovation, and gain actionable intelligence. Unstructured offers open-source libraries and commercial API products designed to simplify and accelerate this data transformation process. We enable organizations to connect their enterprise data, regardless of file type or layout, to LLMs efficiently. This means data scientists and engineers no longer need to spend the majority of their time on the laborious task of data preprocessing, which traditionally involves building custom, brittle pipelines for each data type. By providing robust tools for data ingestion, partitioning, cleaning, and staging, Unstructured empowers businesses to build powerful AI applications based on their own specific, high-quality data, rather than relying solely on generic, pre-trained models. This allows for more accurate, relevant, and secure AI-driven insights and workflows.
