
Confidence Calibration Check

moderate evidence · Student Learning

Capture confidence ratings before and after a learning attempt to identify overconfidence and underconfidence patterns. Use when a student wants to understand how well they actually know something versus how well they think they know it.

What it does

Captures a confidence rating (0–100) before a knowledge attempt and again after the learner receives feedback, then compares the ratings with the learner's actual performance. The AI names the pattern it finds: overconfidence (high confidence, poor performance) or underconfidence (low confidence, good performance). Both are worth surfacing. Over time, tracking calibration accuracy itself becomes a metacognitive skill: learners who can accurately predict their own knowledge gaps allocate study time more effectively (Thiede et al., 2003). This skill makes the "illusion of competence" visible and actionable.
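
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual logic: the `CalibrationCheck` name, the 0-100 performance score, and the 15-point tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CalibrationCheck:
    """One check: two confidence ratings plus a performance score (all assumed 0-100)."""
    pre_confidence: int   # rating given before the attempt
    post_confidence: int  # rating given after feedback
    score: float          # how well the attempt actually went


def classify(check: CalibrationCheck, tolerance: float = 15.0) -> str:
    """Name the pattern from the gap between pre-attempt confidence and score.

    The tolerance is an illustrative threshold, not a value from the research.
    """
    gap = check.pre_confidence - check.score
    if gap > tolerance:
        return "overconfident"
    if gap < -tolerance:
        return "underconfident"
    return "calibrated"


def confidence_shift(check: CalibrationCheck) -> int:
    """How far the learner moved their rating after seeing feedback."""
    return check.post_confidence - check.pre_confidence
```

A learner who rates 90 before the attempt, scores 40, and rates 60 afterwards would be labelled overconfident under these assumptions, with a confidence shift of -30.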

The evidence behind it

Bjork & Bjork (2011) identified the illusion of competence as one of the primary obstacles to effective self-study: re-reading material produces a feeling of familiarity that learners mistake for genuine understanding. Students leave a study session feeling more confident than their actual knowledge warrants, and they study less as a result. Koriat & Bjork (2005) demonstrated that self-referential judgements made during study (asking "do I know this?") produce systematically biased predictions in which learners overestimate their own performance, particularly when material was recently studied. Thiede et al. (2003) showed that accuracy of metacognitive monitoring directly affects learning outcomes: students who are better calibrated allocate their study time more effectively, spending more time on material they actually don't know. Kruger & Dunning (1999) documented the broader pattern: novices in a domain not only perform poorly but lack the knowledge to recognise their own performance gaps, producing inflated self-assessment. Hacker et al. (2000) found in a classroom study that students who made test predictions before sitting exams, then compared predictions to results, showed improved performance on subsequent assessments, suggesting that the comparison act itself has metacognitive training value.

Sources

Bjork & Bjork (2011). Making things hard on yourself, but in a good way: creating desirable difficulties to enhance learning
Koriat & Bjork (2005). Illusions of competence in monitoring one's knowledge during study
Thiede et al. (2003). Accuracy of metacognitive monitoring affects learning of texts
Kruger & Dunning (1999). Unskilled and unaware of it: how difficulties in recognising one's own incompetence lead to inflated self-assessments
Hacker et al. (2000). Test prediction and performance in a classroom context

How to use it in your lesson

For the best results with EvidenceLesson, give it:

topic_or_question — The specific concept or question the calibration check is built around
context — Course or study context
prior_calibration_data (optional) — Previous confidence ratings and accuracy scores from this session or prior sessions
developmental_band (optional) — Learner age or stage
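
As a rough sketch, the four inputs above could be gathered into a small Python structure before being handed to the skill. The field names come from the list above. Everything else is an assumption for illustration: the `CalibrationRequest` class, the shape of each `prior_calibration_data` entry, and the sample values. How EvidenceLesson actually accepts these inputs is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CalibrationRequest:
    """Hypothetical container for the inputs listed above."""
    topic_or_question: str
    context: str
    # Assumed entry shape: {"confidence": 0-100, "score": 0-100}
    prior_calibration_data: List[Dict[str, float]] = field(default_factory=list)
    developmental_band: Optional[str] = None

    def __post_init__(self) -> None:
        # The two required inputs must carry real text.
        if not self.topic_or_question.strip():
            raise ValueError("topic_or_question must not be empty")
        if not self.context.strip():
            raise ValueError("context must not be empty")
        # Confidence ratings are expected on the 0-100 scale.
        for entry in self.prior_calibration_data:
            if not 0 <= entry["confidence"] <= 100:
                raise ValueError("confidence ratings must be between 0 and 100")


# Illustrative values only.
request = CalibrationRequest(
    topic_or_question="Why does a larger sample narrow a confidence interval?",
    context="Introductory statistics, revision for a mid-term",
    prior_calibration_data=[{"confidence": 80, "score": 55}],
    developmental_band="undergraduate",
)
```

Leaving the two optional fields at their defaults gives the minimal request: a topic and a study context.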

Known limitations

The 0–100 confidence scale is intuitive but not psychometrically calibrated. Different learners interpret the scale differently. A 70 from a naturally anxious learner may represent stronger actual knowledge than a 70 from an overconfident one. The skill uses relative calibration (before vs. after, across sessions) rather than absolute numbers — which is more informative than any single rating.
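
One way to picture relative calibration is to read each new rating against the learner's own history instead of against an absolute standard. The sketch below is an assumption about how that could be computed, not a description of the skill's internals. The function names and the (confidence, score) pair format are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def bias(confidence: float, score: float) -> float:
    """Signed gap: positive means the rating exceeded the performance."""
    return confidence - score


def relative_bias(confidence: float, score: float,
                  history: List[Tuple[float, float]]) -> float:
    """Current gap measured against the learner's own average gap.

    history holds earlier (confidence, score) pairs for the same learner.
    With no history, the raw gap is returned unchanged.
    """
    baseline = mean(bias(c, s) for c, s in history) if history else 0.0
    return bias(confidence, score) - baseline


# Two learners both give a rating of 70, but against different habits.
anxious_history = [(60, 80), (65, 85)]        # usually rates about 20 below score
overconfident_history = [(90, 60), (85, 65)]  # usually rates about 25 above score
```

Under these assumptions, a 70 followed by a score of 90 is ordinary for the first learner (relative bias 0), while a 70 followed by a score of 50 is slightly better than usual for the second (relative bias -5).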

The calibration pattern depends on the quality of the performance evaluation. If the AI misjudges the accuracy of the learner's attempt — calling something "partial" when it was actually strong, or missing a genuine misconception — the calibration feedback will be misleading. The AI's subject-matter accuracy is a hard dependency.

Frequent calibration checks can produce strategic responding if learners game the rating. A learner who understands the pattern may artificially lower pre-attempt confidence to appear "well-calibrated" after a good performance. This is unlikely with genuinely motivated learners but worth noting.

Longitudinal calibration tracking requires stored session history. A single calibration check gives only a snapshot; patterns such as persistent overconfidence or improving calibration become visible only over multiple sessions. This skill pairs with 20-10 (SRL Session Wrapper) and 20-12 (Weekly Agency Review) for longitudinal tracking.
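
To illustrate why several sessions are needed, here is a minimal sketch of how stored checks could be summarised into a trend. The data layout (a list of sessions, each a list of (confidence, score) pairs), the 10-point tolerance, and the three-session minimum are all illustrative assumptions. They are not settings taken from 20-10 or 20-12.

```python
from statistics import mean
from typing import Dict, List, Tuple

Session = List[Tuple[float, float]]  # (confidence, score) pairs from one session


def session_summary(checks: Session) -> Dict[str, float]:
    """Average signed gap and average absolute gap for one session."""
    gaps = [confidence - score for confidence, score in checks]
    return {"mean_gap": mean(gaps), "mean_abs_gap": mean(abs(g) for g in gaps)}


def describe_trend(sessions: List[Session], tolerance: float = 10.0,
                   min_sessions: int = 3) -> str:
    """Label the pattern across sessions, oldest first."""
    if len(sessions) < min_sessions:
        return "insufficient history"
    summaries = [session_summary(s) for s in sessions]
    if all(s["mean_gap"] > tolerance for s in summaries):
        return "persistent overconfidence"
    if all(s["mean_gap"] < -tolerance for s in summaries):
        return "persistent underconfidence"
    if summaries[-1]["mean_abs_gap"] < summaries[0]["mean_abs_gap"]:
        return "improving calibration"
    return "no clear pattern"
```

With fewer than three sessions the function declines to name a pattern, which mirrors the snapshot limitation described above.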