Unassisted Evidence Checkpoint

strong evidence · Student Learning

After scaffolded practice, run an unassisted check: a problem with no AI help. It separates what the learner can do with support from what they can do independently, which is critical for preventing phantom attainment.

What it does

After scaffolded practice, the method schedules and administers an unassisted check: a problem or question the learner must attempt with no AI help, hints, or scaffolding. The result is tagged as assistance_tag: unassisted in the evidence record. This directly operationalises the guardrail motivated by Bastani et al. (2025), who found that students who used GPT-4 freely on practice problems performed 17% worse on subsequent unassisted assessments than students who practised without AI. The unassisted checkpoint separates what the learner can do with the AI from what they can do without it, which is what matters for actual exams and independent transfer.
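As an illustration of the tagging step, here is a minimal Python sketch of how an evidence record might carry the assistance tag. Only the field name assistance_tag and the value unassisted come from the description above; the EvidenceRecord class, its other fields, the assisted value, and both helper functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AssistanceTag(str, Enum):
    ASSISTED = "assisted"      # assumed counterpart value, not named in the text
    UNASSISTED = "unassisted"  # the value named in the text


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical evidence record; only assistance_tag comes from the text."""
    learner_id: str
    topic_or_skill: str
    correct: bool
    assistance_tag: AssistanceTag


def record_unassisted_check(learner_id: str, topic_or_skill: str, correct: bool) -> EvidenceRecord:
    """Store a checkpoint result, always tagged as unassisted."""
    return EvidenceRecord(learner_id, topic_or_skill, correct, AssistanceTag.UNASSISTED)


def independent_evidence(records: List[EvidenceRecord]) -> List[EvidenceRecord]:
    """Keep only the records that reflect work done without help."""
    return [r for r in records if r.assistance_tag is AssistanceTag.UNASSISTED]
```

Keeping the tag on every record, rather than storing unassisted results separately, lets assisted and unassisted attempts on the same topic be compared later.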

The evidence behind it

Bastani et al. (2025), published in the Proceedings of the National Academy of Sciences, ran a controlled study with high school students learning algebra. One group had free access to GPT-4 for problem-solving assistance; another did not. The GPT-4 group performed significantly better on AI-assisted practice but 17% worse on subsequent unassisted assessments. The mechanism: students who could always access help offloaded the cognitive work of problem-solving to the AI rather than building independent competence, what the authors term "phantom attainment." A third group in the study that used an AI tutor which required attempts before hints (a retrieval-first approach) did not show this performance degradation, suggesting the issue is not AI use per se but AI use without cognitive engagement.

Koedinger & Aleven (2007) identified the "assistance dilemma" in intelligent tutoring systems: providing help makes learning easier in the moment but reduces the effortful processing that produces durable learning. Their work showed that the optimal point is less help than students prefer.

Roediger & Karpicke (2006) and Bjork et al. (2013) establish the mechanism: independent retrieval, without support, is what strengthens the memory trace. Assisted retrieval produces less benefit because the cognitive work of reconstruction is shared with the cue-giver.

Sources

Bastani et al. (2025) — Generative AI can harm learning (PNAS) — unguarded GPT-4 reduced unassisted performance by 17%
Koedinger & Aleven (2007) — Exploring the assistance dilemma in experiments with cognitive tutors
Kapur & Rummel (2012) — Productive failure and the nature of prior knowledge for learning
Roediger & Karpicke (2006) — Testing effect and independent retrieval
Bjork et al. (2013) — Self-regulated learning: beliefs, techniques, and illusions

How to use it in your lesson

For the best results with EvidenceLesson, give it:

topic_or_skill — The topic or skill that was just practised with scaffolding
scaffolded_session_summary — Brief description of what was covered in the scaffolded session
prior_unassisted_results (optional) — Results from previous unassisted checkpoints on the same topic
developmental_band (optional) — Learner age or stage for calibrating difficulty
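The four inputs above can be sketched as a small Python structure. This is an illustration only: the field names match the list, but the CheckpointRequest class, the field types, and the validate and to_payload methods are assumptions made for this example, not a documented EvidenceLesson interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CheckpointRequest:
    """Hypothetical container for the inputs listed above."""
    topic_or_skill: str
    scaffolded_session_summary: str
    prior_unassisted_results: Optional[List[str]] = None  # optional input; type assumed
    developmental_band: Optional[str] = None              # optional input; type assumed

    def validate(self) -> None:
        """Reject a request whose required fields are empty or whitespace."""
        for name in ("topic_or_skill", "scaffolded_session_summary"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required and must not be empty")

    def to_payload(self) -> Dict[str, object]:
        """Return the inputs as a dict, leaving out optional fields that were not set."""
        self.validate()
        payload: Dict[str, object] = {
            "topic_or_skill": self.topic_or_skill,
            "scaffolded_session_summary": self.scaffolded_session_summary,
        }
        if self.prior_unassisted_results is not None:
            payload["prior_unassisted_results"] = self.prior_unassisted_results
        if self.developmental_band is not None:
            payload["developmental_band"] = self.developmental_band
        return payload
```

A request that supplies only the two required fields produces a payload with just those two keys. Passing prior results on the same topic, when available, gives the checkpoint more context.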

Known limitations

The unassisted checkpoint cannot be cryptographically enforced. The skill relies on the learner's honest participation — it cannot verify they haven't consulted another AI, their notes, or a textbook. It is designed for self-determined learners who want to know their genuine independent capability, not as an assessment instrument.

Problem calibration matters significantly. If the unassisted problem is too different in surface form from the scaffolded practice, the gap between assisted and unassisted performance may reflect novelty rather than dependence on scaffolding. The skill instructs the AI to use a slightly different surface form, but calibrating "slightly different" correctly requires good judgment.

A single checkpoint is insufficient to characterise a learner's independent capability. One unassisted problem is a data point, not a reliable measure. The pattern across multiple sessions — consistent gaps, improving independence — is more informative than any single checkpoint result.
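One way to look at the pattern across sessions is to track the gap between assisted and unassisted accuracy and withhold any reading until enough sessions exist. The Python sketch below is an assumption-laden illustration: the SessionResult class, the three-session minimum, and the 0.05 tolerance are all hypothetical choices, not values taken from the method or the cited studies.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_SESSIONS = 3      # assumed minimum before any trend is reported
TREND_TOLERANCE = 0.05  # assumed threshold for calling a change meaningful


@dataclass(frozen=True)
class SessionResult:
    """Hypothetical per-session summary; accuracies are fractions from 0.0 to 1.0."""
    assisted_accuracy: float
    unassisted_accuracy: float


def independence_gaps(sessions: List[SessionResult]) -> List[float]:
    """Gap per session: how much better the learner did with help than without."""
    return [s.assisted_accuracy - s.unassisted_accuracy for s in sessions]


def gap_trend(sessions: List[SessionResult]) -> Optional[str]:
    """Compare the first and last gap; return None when there is too little data."""
    if len(sessions) < MIN_SESSIONS:
        return None  # a single checkpoint is a data point, not a measure
    gaps = independence_gaps(sessions)
    change = gaps[-1] - gaps[0]
    if change < -TREND_TOLERANCE:
        return "narrowing"  # independence is improving
    if change > TREND_TOLERANCE:
        return "widening"   # reliance on support is growing
    return "stable"
```

Comparing only the first and last gap is deliberately crude. With more sessions, a fitted slope or a rolling average would be less sensitive to one unusual checkpoint.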

The firm no-help rule can distress anxious learners. The redirect scripts are designed to be warm while holding the line, but for highly anxious students, a boundary that can't be negotiated may feel punishing. Context matters — a student who is already distressed may need the rule explained more carefully, or the checkpoint deferred to a calmer moment.