Machine Learning Researcher, RL & Agentic Systems

Protege

entry level to mid-level

Work Type

Remote

Seniority

entry level to mid-level

Posted

May 28, 2026

Expected Salary

$100k+

Job Description

Company Overview: We are building Protege to solve the biggest unmet need in AI — getting access to the right training data.

The process today is time-intensive, incredibly expensive, and often ends in failure.

The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity.

We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI.

The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact.

Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

ABOUT DATALAB

DataLab exists because truly useful data is rare, and the frontier of AI development only moves forward when high-quality data makes it possible.

We believe data is one of the most underdeveloped layers of the AI stack.

Our work focuses on building and evaluating high-value datasets grounded in real-world workflows and economically meaningful tasks.

We work across multiple domains to create safe, high-fidelity datasets that preserve the structure and context needed to train advanced AI systems.

Our research spans data quality, evaluation design, privacy-preserving transformation, workflow reconstruction, and task-grounded AI training data.

At DataLab, applied research is tightly connected to real-world deployment.

Researchers work directly with large-scale datasets, production systems, and frontier AI training problems.

ROLE OVERVIEW

Data is the foundation of AI performance, and we believe model quality starts with data quality.

As AI systems become more agentic, a critical challenge is understanding which real-world datasets, tasks, and environments actually lead to better model behavior.

We’re seeking a Machine Learning Researcher focused on RL and agentic systems to help define, design, and evaluate the datasets, tasks, environments, and benchmarks used to assess advanced AI systems.

In this role, you’ll work closely with research and engineering teams to translate real-world workflows into high-value datasets and evaluation assets: structured tasks, interactive environments, benchmark suites, and quality scorecards that help us understand how models perform in realistic settings.

You’ll help define what “high-quality agentic data” means in practice, using statistical, computational, and ML-driven methods to evaluate dataset quality, task design, environment fidelity, and downstream model performance.

You’ll work on the core problems of benchmarking real-world data, measuring how well models perform on that data, and designing RL-style or agentic environments that capture the structure of meaningful work.

This is an ideal role for someone with a strong machine learning background who is excited by reinforcement learning, agentic systems, evaluation, and the role of data in shaping model behavior.

You should be excited by the opportunity to build the datasets and benchmarks that help define what high-quality real-world data looks like for frontier AI systems.

WHAT YOU’LL DO

DESIGN AND BUILD DATASETS, TASKS, AND ENVIRONMENTS

Design and build datasets, tasks, environments, and evaluation assets for benchmarking agentic systems and multi-step model behavior.

Translate real-world workflows into structured tasks, interaction traces, trajectories, stateful environments, and verifiable outcomes that can be used to evaluate advanced AI systems.

DEVELOP FRAMEWORKS FOR EVALUATING REAL-WORLD DATA QUALITY

Develop frameworks that assess diversity, realism, coverage, fidelity, informativeness, and downstream usefulness of datasets for agentic systems.

Build quality scorecards and evaluation methods that make dataset strengths, weaknesses, and failure modes legible across teams.

BENCHMARK MODEL BEHAVIOR IN RL AND AGENTIC SETTINGS

Evaluate planning, tool use, robustness, recovery from failure, task completion, and generalization behavior in RL-style or agentic environments.

Connect model failures back to concrete dataset, environment, or task-design gaps, and recommend improvements grounded in empirical evidence.

BUILD SCALABLE EVALUATION AND VALIDATION TOOLING

Contribute to tools and systems that automate dataset validation, environment generation, rollout analysis, benchmark construction, and evaluation workflows.

Improve internal infrastructure for reproducible experimentation, benchmark management, and evaluation quality.

PARTNER ACROSS RESEARCH, ENGINEERING, AND PRODUCT

Collaborate closely with research and engineering teams to identify data bottlenecks, improve evaluation methodology, and shape internal best practices around task-grounded AI training data.

Represent DataLab’s perspective in cross-functional discussions around dataset quality, benchmark design, and frontier agentic-system evaluation.

WHAT SUCCESS LOOKS LIKE

NEAR-TERM: ESTABLISH A STRONG EVALUATION BASELINE

Create clear benchmark frameworks, evaluation assets, and dataset-quality scorecards that help Protege reason about how real-world data impacts advanced agentic systems.

Use rigorous evaluation methods to identify meaningful dataset improvements, improve benchmark fidelity, and sharpen the company’s understanding of what high-impact agentic data actually looks like in practice.

WHAT YOU BRING

- PhD, or an equivalent Master’s degree plus 4+ years of industry experience, in machine learning, computer science, statistics, engineering, mathematics, economics, or a related quantitative field.
- Strong understanding of AI model training pipelines, evaluation methodology, and the role of data in shaping model performance.
- Experience working with large unstructured or semi-structured datasets used to train or evaluate ML systems.
- Experience with reinforcement learning, sequential decision-making, agentic systems, tool-using models, or multi-step model evaluation.
- Experience designing tasks, benchmarks, environments, simulations, or evaluation frameworks for real-world model behavior.
- Strong intuition for realism, coverage, difficulty, fidelity, and meaningful outcome structure in datasets.
- Strong experimental design, evaluation, benchmarking, and data-validation skills.
- High ownership and ability to independently identify and solve high-impact problems.

NICE TO HAVE

- Experience developing evaluation frameworks or performance metrics for datasets, agentic systems, or training data.
- Experience translating real-world workflows into structured tasks or environments for model evaluation.
- Experience with RLHF, RLAIF, imitation learning, reward modeling, online or offline RL, or related methods.
- Experience with Harbor or other agent evaluation frameworks.
- Publications or open-source contributions in reinforcement learning, agents, evaluation, or data-centric AI.
- Experience collaborating cross-functionally with product, infrastructure, or partnership teams.
- Experience with synthetic data generation, trajectory generation, or simulation-based environments.

PROTEGE'S VALUES

Pass the Loved Ones' Test

We act with integrity and do the right thing, especially when it's hard and no one is watching.

Always Find a Way

We are resourceful, resilient builders who solve hard problems and push through obstacles.

Go Fast and Grow Fast

Velocity matters.

We move with urgency, learn quickly, and continuously improve as individuals and as a company.

Practice Kindness and Candor

We communicate directly and respectfully, building trust through honest feedback and genuine care for one another.

Deliver Together

We win as one team.

Collaboration, accountability, and shared ownership drive our success.

Own the Outcome.

Hone the Craft.

We take pride in our work, sweat the details, and continuously raise the bar for excellence.
