Schmidt Sciences NDIF Cadenza Labs

Red Team: Lie Detection Competition — Request for Proposals

Proposals Due: March 31, 2026

TL;DR Submit a proposal describing your concept for creating datasets of on-policy LLM lies, along with your team’s background, by March 31, 2026. Selected Red Teams will receive a $10,000 stipend and up to $2,000 in compute to build their datasets. Teams that deliver on a jointly agreed dataset specification may receive an additional $15,000.

Apply Now →

1. Competition Overview

Schmidt Sciences, NDIF, and Cadenza are jointly running a competition to advance the science of lie detection in large language models. The competition is structured as a Red Team vs. Blue Team challenge:

The central research question is: What methods best detect whether a model lied, and how much does white-box access (weights, activations, activation steering) contribute beyond black-box tools alone?

Definition of Lying

We adopt the assertion-based definition from Fallis (2009), as operationalized in Liars’ Bench (Kretschmar et al., 2025): the assistant lies if it states something it believes to be false. Beliefs are measured through consistent responses in contexts without direct pressure to lie.

For example, if a model generates “I fixed the code, it runs now” after observing failed tests in-context, its apparent beliefs (informed by the failed tests) contradict its statement. See Section 3 in Liars’ Bench for more details.

Background: Liars’ Bench

This competition builds on Liars’ Bench, a testbed of 72,863 labeled examples of lies and honest responses generated by four open-weight models across seven datasets. These datasets will serve as the validation set during development of the detectors by Blue Teams. Red Teams are expected to create new datasets that go beyond the scenarios already captured, introducing novel and challenging types of lies for Blue Teams to detect.


2. Red Team Task

Red Teams create static datasets of on-policy LLM lies, similar in structure to Liars’ Bench but exploring new or more challenging settings. The goal is to produce lies that are difficult for current detection methods to identify.

Blue Team Affordances

Blue Teams will create general-purpose deception detectors, not dataset-specific ones. They will have the following affordances:

What Blue Teams can do:

What Blue Teams cannot do:

Dataset Requirements

Preferred Dataset Properties

Design Dimensions to Explore

Liars’ Bench categorizes lies along two axes that Red Teams could consider when designing new datasets:

Axis Category Examples
Object of Belief (What the model lies about) World-knowledge Factual claims, trivia, domain expertise
  Self-knowledge Own capabilities, past actions, policies, private state
Reason for Lying (Why the model lies) Encouraged Explicit instructions, role-play, goal-directed pressure
  Inherent Learned during training or fine-tuning (e.g., backdoors, alignment faking)

3. Deliverables

Upon selection for the final competition, Red Teams must submit:

Datasets

⚠️ Important: The datasets should not be made public until after the competition.

Datasets must be uploaded to HuggingFace, in a private repository, in the following schema:

Column Description
model Full HuggingFace model identifier used for generation (e.g., meta-llama/Llama-3.3-70B-Instruct)
messages Full conversation transcript in ChatML format; the last message must be an assistant message
deceptive Boolean label indicating whether the last assistant message is a lie

Fine-tuned Models

Any fine-tuned models used in the dataset creation must be provided, e.g. as LoRA adapters.

Report

A written report of no more than 4 pages (plus optional appendices) that:


4. Compensation

Phase Amount
(1) Accepted Proposal: Upon entering implementation phase $10,000 stipend + up to $2,000 in compute
(2) Upon selection for the final competition Additional $15,000
(3) Co-authorship Included on the competition report

⚠️ Important: The payments may take several weeks to process after milestone completion.

Upon proposal acceptance (1), organizers and Red Teams will jointly agree on a dataset specification. Teams that deliver datasets meeting the agreed-upon specification will be selected for the final competition and receive the additional $15,000 (2).


5. Timeline

Milestone Date
RFP published March 9, 2026
Proposal decisions (rolling) Deadline: March 31, 2026 (AoE)
Codebase + report submission June 15, 2026
Selection for final competition June 30, 2026
Blue Team competition Late Summer 2026

6. How to Apply

Submit a proposal (max 4500 characters) describing:

  1. Dataset concept: Your approach to creating datasets of on-policy lies, including which types of lies you plan to target, how you will verify labels, and why you expect your datasets to be challenging for current detection methods.
  2. Team background: Your team’s relevant experience. Backgrounds in LLM fine-tuning, interpretability, AI safety evaluation, or adversarial red-teaming are particularly valued.

Applications are reviewed on a rolling basis. The deadline is March 31, 2026.

If a full implementation is the only barrier to applying, the organizers are open to discussing joint development.

Contact for questions: competition@cadenzalabs.org

Submit Your Proposal →

References

Fallis, D. (2009). What is lying? Journal of Philosophy, 106(1), 29–56.

Kretschmar, K., Laurito, W., Maiya, S., & Marks, S. (2025). Liars’ Bench: Evaluating Lie Detectors for Language Models. arXiv:2511.16035.