Research Crawling Engineer

MLabs is a Haskell and Rust consultancy specializing in mission-critical software development, AI, Fintech, and cross-team collaboration, with a global team dedicated to providing innovative and robust solutions.

Employee count: 51-200

Salary: 80k-175k USD

United States only

Location: Remote - Must have a 6-hour overlap with EST

Remote | Full-time

Compensation: $80K - $175K

We are hiring on behalf of our client, a technical infrastructure firm that delivers massive-scale web data to organizations developing advanced artificial intelligence models. The organization supports high-capacity bandwidth-sharing networks and operates a distributed crawler capable of accessing high-quality public web data at a global scale. The team has also engineered sophisticated pipelines for the ingestion, segmentation, and annotation of billions of multimedia files, facilitating dataset creation for frontier research labs.

The organization operates as a lean, technical team that prioritizes speed and direct execution. As a Research Crawling Engineer, the successful candidate will design and operate large-scale web data acquisition systems. This role encompasses distributed systems, scraping infrastructure, and data pipelines, focusing on providing high-quality inputs for research and model development.

Key Responsibilities

  • Construct and maintain large-scale web crawlers across diverse domains.
  • Design high-throughput, fault-tolerant systems for data collection, managing volumes ranging from millions to billions of URLs per day.
  • Navigate anti-bot systems, rate limits, and dynamic, JavaScript-heavy websites.
  • Develop robust pipelines for data cleaning, deduplication, filtering, and normalization.
  • Build and maintain datasets specifically structured for research and machine learning model training.
  • Monitor and optimize crawl performance, coverage, and data quality through rapid iteration.
  • Collaborate with research teams to ensure data collection efforts align with modeling requirements.
  • Optimize infrastructure to ensure cost-efficiency, low latency, and reliability.

Requirements

  • Extensive programming experience in one or more of the following: Go, Rust, Python, Java, or C++.
  • Proven experience in building web crawlers or large-scale data pipelines.
  • Solid understanding of HTTP, networking protocols, and browser behavior.
  • Familiarity with distributed systems and parallel processing techniques.
  • Experience handling large datasets, ideally at the terabyte to petabyte scale.
  • Demonstrated ability to debug and maintain systems within unstable or adversarial environments.

Preferred Qualifications

  • Experience with NLP pipelines or dataset curation for machine learning.
  • Familiarity with LLM pre-training data or retrieval systems.
  • Practical experience with headless browsers (e.g., Playwright, Puppeteer, or Chrome DevTools Protocol).
  • Knowledge of proxy systems, IP rotation, and large-scale request orchestration.
  • Background in data quality evaluation or benchmarking.
  • Experience running workloads on cloud or bare-metal infrastructure.

Benefits

  • Impactful Opportunity: Contribute to the development of a web-scale crawler and knowledge graph at the forefront of AI data accessibility.
  • High-Performance Culture: Join a lean, low-ego team that prioritizes high output and professional growth.
  • Remote Work: This position is part of a fully remote team, offering flexibility and autonomy.
  • Competitive Compensation: A package including a competitive salary, comprehensive benefits, and equity, commensurate with experience and the ability to operate at scale.



Interview Process

  1. Recruiter Coordination Call
  2. Hiring Manager Interview
  3. Founder / CEO Interview
  4. Secondary Executive Interview
  5. Final Interview

Due to the high volume of applications we anticipate, we regret that we are unable to provide individual feedback to all candidates. If you do not hear back from us within 4 weeks of your application, please assume that you have not been successful on this occasion. We genuinely appreciate your interest and wish you the best in your job search.

Commitment to Equality and Accessibility:

At MLabs, we are committed to offering equal opportunities to all candidates. We do not discriminate, we keep our job adverts accessible, and we provide information in accessible formats. Our goal is to foster a diverse, inclusive workplace with equal opportunities for all. If you need any reasonable adjustments during any part of the hiring process, or you would like to see the job advert in an accessible format, please let us know at the earliest opportunity by emailing human-resources@mlabs.city.

MLabs Ltd collects and processes the personal information you provide such as your contact details, work history, resume, and other relevant data for recruitment purposes only. This information is managed securely in accordance with MLabs Ltd’s Privacy Policy and Information Security Policy, and in compliance with applicable data protection laws. Your data may be shared only with clients and trusted partners where necessary for recruitment purposes. You may request the deletion of your data or withdraw your consent at any time by contacting legal@mlabs.city.

About the job

Job type: Full Time

Hiring timezones: United States +/- 0 hours

About MLabs

MLabs was founded in 2018 by Mark Florisson as a consultancy specializing in Fintech, AI, and IT. With a primary focus on functional programming (particularly Haskell), compilers, and full-stack development, Mark steers the company's business strategy and also actively manages several client relationships and internal projects. The company's core mission is to deliver sustainable and robust solutions for even the most intricate and mission-critical software projects. MLabs prides itself on a world-class software engineering team that spans 32 countries, fostering innovation and providing support through a rich diversity of disciplines and specializations.

The ethos of MLabs is deeply rooted in helping clients solve complex problems while delivering cutting-edge value. They have a significant footprint in the blockchain sector, offering essential services to smart contract projects, especially within the Cardano, Polkadot, and Solana ecosystems. Their support is tailored, sometimes assisting clients in overcoming internal development hurdles, while in other instances, MLabs takes on the entirety of the development process. This flexibility extends to startups, many of which approach MLabs with promising product ideas but lack the execution capability or clear launch strategies. MLabs steps in to fill these gaps, thereby reducing both the time to market and overall development costs. Their offerings include complete DevOps solutions and customized services, underscoring their comprehensive and adaptable approach. The company champions values such as diversity, collaboration, innovation, creativity, and pragmatism, believing that diverse cultural perspectives are fundamental to their shared culture of respect and their multi-perceptual approach to building solutions. Technologists at heart, they continuously encourage curiosity and ideation, striving to surpass expectations and generate industry-leading solutions for their clientele.

Employee benefits

Flexible work arrangements

Offering flexible work arrangements when possible.

Generous vacation time

Promotes work-life balance with generous vacation time.

Gym discounts

Comprehensive benefits packages that support mental, physical, and financial health, including gym discounts.

Health insurance

Comprehensive benefits packages that support mental, physical, and financial health, including health insurance.
