Teach-Back Evaluator

moderate evidence · Student Learning

The learner teaches the concept to the AI, which plays a curious novice peer and identifies gaps through authentic questions. Use when the learner wants to test their understanding — teaching forces a different kind of organisation than studying.

What it does

The learner teaches the concept to the AI, which plays the role of a curious, slightly confused peer who has not studied this material. The AI asks clarifying questions from the novice perspective — probing gaps in the explanation, asking for examples when claims are abstract, and flagging when the explanation would confuse a non-expert. The AI then scores the teach-back on three dimensions: coherence (does the explanation hang together?), completeness (are the key ideas present?), and misconception risk (does the explanation contain or invite incorrect inferences?). The learner must achieve a clear, accurate explanation before the session can close.
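The three scoring dimensions could be recorded in a small structure like the one below. This is a minimal sketch and not EvidenceLesson's actual implementation: the class name `TeachBackScore`, the 0 to 4 scale, and the closing threshold are all assumptions made for illustration. The description above specifies only the three dimensions and the rule that the session closes once the explanation is clear and accurate.

```python
from dataclasses import dataclass

# Hypothetical scale and threshold: the method names the three dimensions
# but does not define a numeric scale, so these values are illustrative.
MAX_SCORE = 4
PASS_THRESHOLD = 3


@dataclass(frozen=True)
class TeachBackScore:
    coherence: int           # does the explanation hang together?
    completeness: int        # are the key ideas present?
    misconception_risk: int  # higher means more likely to mislead a novice

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0..MAX_SCORE range.
        for name in ("coherence", "completeness", "misconception_risk"):
            value = getattr(self, name)
            if not 0 <= value <= MAX_SCORE:
                raise ValueError(f"{name} must be between 0 and {MAX_SCORE}")

    def can_close(self) -> bool:
        """True when the explanation is clear and accurate enough to end."""
        return (
            self.coherence >= PASS_THRESHOLD
            and self.completeness >= PASS_THRESHOLD
            and self.misconception_risk <= MAX_SCORE - PASS_THRESHOLD
        )
```

Note that misconception risk is scored in the opposite direction from the other two dimensions, so a strong teach-back has high coherence and completeness but low risk.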

The evidence behind it

Bargh & Schul (1980) provided an early demonstration of what is now often called the "protégé effect": students who expected to teach material learned it more thoroughly than students who expected to be tested, even before any teaching occurred. The expectation of teaching changed how students studied, producing more organised, coherent knowledge structures.

Biswas et al. (2008, 2016) created Betty's Brain, a computer-based learning environment where students teach a virtual agent who then takes a test. Students who taught Betty showed stronger science reasoning and metacognitive skills than control students. The proposed mechanism is that the teaching process revealed gaps that motivated further learning.

Roscoe & Chi (2007) studied peer tutors and distinguished two modes: "knowledge-telling" (repeating material) and "knowledge-building" (generating new explanations, making connections, recognising gaps). Only knowledge-building produced learning gains for the tutor. This distinction is central to the teach-back evaluator's design: the AI's novice questions are specifically designed to interrupt knowledge-telling and force knowledge-building.

Fiorella & Mayer (2013) found that learning by teaching produces durable learning gains specifically because it requires the learner to generate explanations and connections not directly present in the source material. This is the generation effect operating at the level of an explanation rather than a single sentence.

Sources

Biswas et al. (2016) — Betty's Brain: a computer-based learning environment that promotes science reasoning and metacognition
Biswas et al. (2008) — Learning by teaching a computer agent
Bargh & Schul (1980) — On the cognitive benefits of teaching
Roscoe & Chi (2007) — Understanding tutor learning: knowledge-building and knowledge-telling in peer tutors' explanations
Fiorella & Mayer (2013) — The relative benefits of learning by teaching and teaching expectancy

How to use it in your lesson

For the best results with EvidenceLesson, give it:

concept_to_teach — The concept the learner will teach to the AI
context — Course and level — helps calibrate what depth is appropriate
prior_understanding_level (optional) — Self-reported or assessed understanding level before the teach-back
developmental_band (optional) — Learner age or stage
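The inputs above could be bundled as follows. This is a hedged sketch only: the four field names come from the list above, but the class name `TeachBackRequest`, the `to_payload` helper, the example values, and the assumption that the two unmarked fields are required are all hypothetical. How EvidenceLesson actually accepts these inputs is not specified here.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TeachBackRequest:
    # Field names follow the input list; everything else is illustrative.
    concept_to_teach: str                            # concept the learner will teach
    context: str                                     # course and level
    prior_understanding_level: Optional[str] = None  # self-reported or assessed
    developmental_band: Optional[str] = None         # learner age or stage

    def __post_init__(self) -> None:
        # Assumption: the two fields not marked "(optional)" are required.
        if not self.concept_to_teach.strip():
            raise ValueError("concept_to_teach is required")
        if not self.context.strip():
            raise ValueError("context is required")

    def to_payload(self) -> dict:
        """Return the inputs as a dict, omitting optional fields left unset."""
        return {k: v for k, v in asdict(self).items() if v is not None}


request = TeachBackRequest(
    concept_to_teach="photosynthesis",
    context="Year 9 biology",
)
payload = request.to_payload()
```

Leaving the optional fields unset keeps them out of the payload entirely, rather than sending empty values that might be mistaken for a real understanding level or age band.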

Known limitations

The teach-back evaluator requires the AI to maintain two distinct modes: novice peer and evaluating coach. Inconsistency between these modes — slipping into coach mode during the teach-back with corrective framing — undermines the novice authenticity and can short-circuit the knowledge-building process. The role separation must be maintained explicitly.
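One way to make the role separation explicit is to track the current mode and screen draft replies against it. The sketch below is an assumption-laden illustration, not the tool's design: the names `Mode`, `TeachBackSession`, and `CORRECTIVE_MARKERS` are hypothetical, and simple phrase matching is a crude stand-in for whatever check a real system would use to detect corrective framing.

```python
from enum import Enum


class Mode(Enum):
    NOVICE_PEER = "novice_peer"
    EVALUATING_COACH = "evaluating_coach"


# Hypothetical phrases that suggest corrective, coach-style framing.
CORRECTIVE_MARKERS = (
    "that's incorrect",
    "actually,",
    "the correct answer",
    "you should have",
)


class TeachBackSession:
    def __init__(self) -> None:
        # Every session starts in the novice role.
        self.mode = Mode.NOVICE_PEER

    def finish_teaching(self) -> None:
        """Explicit switch to coach mode once the teach-back is over."""
        self.mode = Mode.EVALUATING_COACH

    def check_reply(self, reply: str) -> bool:
        """Return True if a draft AI reply is acceptable in the current mode."""
        if self.mode is Mode.EVALUATING_COACH:
            return True  # corrective feedback is allowed when coaching
        lowered = reply.lower()
        return not any(marker in lowered for marker in CORRECTIVE_MARKERS)
```

The point of the design is that the mode change is a deliberate call rather than something that can happen implicitly in the middle of the learner's explanation.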

Roscoe & Chi (2007) found that knowledge-telling (recitation) is the default mode for most peer tutors. Without novice questions interrupting recitation, the teach-back can become a fluency exercise rather than a genuine test of understanding. The quality of the AI novice's questions is therefore the critical variable, and generating authentically "novice" questions requires the AI to model accurately what a non-expert genuinely would not know.

The scoring dimensions are approximate and require subject-domain knowledge. "Misconception risk" in particular requires the AI to know not only whether the learner's explanation is correct but whether it could produce incorrect inferences in a non-expert listener. This is a sophisticated judgment that depends on the AI's accuracy in the specific subject domain.

The teach-back format is more effective for some content types than others. It works best for conceptual material with clear mechanisms and relationships (biology, physics, economics). It is less effective for procedural skills (calculations, code execution) and for highly contextual material (literary analysis, historical interpretation), where the "correct explanation" is less well-defined.