Formative Assessment Loop Designer for AI Systems

strong evidence · ⏱ 5 minutes · Ai Learning Science

Design an adaptive assessment loop where each student response triggers the next instructional move. Use when building technology-enhanced formative assessment cycles.

What it does

Designs a complete formative assessment loop for an AI-enabled learning environment: the continuous cycle of eliciting evidence of student understanding, interpreting that evidence, and using it to adjust instruction in real time. Black & Wiliam's (1998) seminal review found that formative assessment is one of the most powerful interventions in education (effect sizes of 0.40-0.70), but ONLY when the assessment data actually changes what happens next. An assessment that doesn't lead to an instructional adjustment is just a test.

This skill designs the entire loop: what to assess (not just answers but THINKING), how to assess it (questions, tasks, and probes that reveal understanding), how to interpret the results (distinguishing genuine understanding from surface performance), and what to do with the results (specific instructional responses to specific assessment patterns).

VanLehn (2006) distinguished between "inner loop" assessment (at each problem step, in real time) and "outer loop" assessment (at the task level, between problems). AI systems are well suited to inner-loop assessment (monitoring student reasoning step by step and adjusting in real time), which is where VanLehn located the larger learning gains.

The evidence behind it

Black & Wiliam (1998) conducted the most influential review of formative assessment research, analysing over 250 studies and finding effect sizes ranging from 0.40 to 0.70, larger than those of most educational interventions. They defined formative assessment as "all those activities undertaken by teachers, and/or by their students, which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged." The critical insight: the ASSESSMENT is not the valuable part; the MODIFICATION is. Data collection without instructional adjustment is summative assessment with a different label.

Black & Wiliam (2009) developed a more refined theoretical framework, identifying five key strategies of formative assessment: (1) clarifying and sharing learning intentions and criteria for success, (2) engineering effective classroom discussions and tasks that elicit evidence of learning, (3) providing feedback that moves learners forward, (4) activating students as instructional resources for one another, and (5) activating students as owners of their own learning. All five strategies involve a cycle: elicit → interpret → act.

Wiliam (2011) translated this framework into practical classroom strategies, emphasising that formative assessment must be EMBEDDED in instruction: not an add-on activity but a continuous process of checking, adjusting, and responding. He argued that the biggest barrier to effective formative assessment is not data collection but DATA USE: teachers often collect data but don't change their teaching in response to it.

VanLehn (2006) analysed the behaviour of tutoring systems and identified two levels of assessment loop. The "outer loop" operates between problems: after a student completes a problem, the system decides what to do next (another similar problem, a harder problem, a review, or a new topic). The "inner loop" operates within problems: at each step, the system assesses the student's response and provides feedback, hints, or scaffolding. VanLehn found that the inner loop was the more important determinant of intelligent tutoring system (ITS) effectiveness: systems that assessed and responded at the step level dramatically outperformed systems that only assessed at the problem level.

Shute & Zapata-Rivera (2012) reviewed adaptive educational systems and found that the most effective systems combined continuous assessment with immediate instructional adaptation, creating a "tight" formative assessment loop where the time between assessment and response was minimised.
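The two loops VanLehn (2006) describes can be pictured as an inner loop over the steps of a problem, followed by an outer-loop decision once the problem is finished. The Python sketch below is only an illustration and is not taken from any of the cited systems: the Step and Problem types, the exact-match check, the hint text, and the error thresholds in outer_loop are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    prompt: str
    expected: str
    hint: str  # hypothetical scaffold shown after a wrong attempt


@dataclass
class Problem:
    name: str
    steps: List[Step]


def inner_loop(problem: Problem, answer: Callable[[str], str], max_tries: int = 2) -> int:
    """Assess each step as it happens; return the number of wrong attempts."""
    errors = 0
    for step in problem.steps:
        for _ in range(max_tries):
            response = answer(step.prompt)          # elicit
            if response.strip() == step.expected:   # interpret
                break
            errors += 1
            print(f"Hint: {step.hint}")             # act: immediate feedback
    return errors


def outer_loop(errors: int) -> str:
    """Choose the next task after a problem ends (assumed thresholds)."""
    if errors == 0:
        return "harder problem"
    if errors <= 2:
        return "similar problem"
    return "review"


problem = Problem("rectangle", [
    Step("Perimeter of a 3 x 4 rectangle?", "14", "Add all four sides."),
    Step("Area of a 3 x 4 rectangle?", "12", "Multiply length by width."),
])
scripted = iter(["12", "14", "12"])  # a student who first gives the area
errors = inner_loop(problem, lambda prompt: next(scripted))
next_move = outer_loop(errors)
print(errors, next_move)
```

In this scripted run the student receives a hint immediately after the first wrong step, while the choice of the next problem waits until the whole problem is complete. A system that only has outer_loop would learn that something went wrong, but not at which step.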

Sources

Black & Wiliam (1998) — Assessment and classroom learning (seminal meta-analysis)
Black & Wiliam (2009) — Developing the theory of formative assessment
Wiliam (2011) — Embedded formative assessment
VanLehn (2006) — The behavior of tutoring systems (inner loop vs. outer loop)
Shute & Zapata-Rivera (2012) — Adaptive educational systems

How to use it in your lesson

For the best results with EvidenceLesson, give it:

learning_objective — The specific learning objective that the formative assessment loop should monitor — what students are trying to learn
current_assessment_approach — How assessment currently works in this context — when teachers check understanding, what they check, and what they do with the information
student_level (optional) — Age/year group and proficiency level
subject_area (optional) — The curriculum subject
ai_system_capabilities (optional) — What the AI system can do — real-time monitoring, adaptive questioning, dashboard reporting, or other
class_size (optional) — How many students the system needs to support simultaneously
assessment_frequency (optional) — How often assessment data should be collected — continuous, per-task, daily, weekly
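
If these inputs are collected programmatically, one option is a small Python dataclass that enforces the two required fields and leaves the rest optional. This is a sketch under assumptions: the field names come from the list above, but the LoopDesignRequest class, the to_prompt helper, and the example values are hypothetical and are not part of EvidenceLesson's interface.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LoopDesignRequest:
    # Required inputs
    learning_objective: str
    current_assessment_approach: str
    # Optional inputs
    student_level: Optional[str] = None
    subject_area: Optional[str] = None
    ai_system_capabilities: Optional[str] = None
    class_size: Optional[int] = None
    assessment_frequency: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject blank values for the two required fields.
        for name in ("learning_objective", "current_assessment_approach"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required and cannot be empty")

    def to_prompt(self) -> str:
        """Render only the fields that were supplied, one per line."""
        lines = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None:
                lines.append(f"{f.name}: {value}")
        return "\n".join(lines)


# Example values are illustrative only.
request = LoopDesignRequest(
    learning_objective="Distinguish area from perimeter for rectangles",
    current_assessment_approach="End-of-week quiz, marked but not acted on",
    class_size=30,
    assessment_frequency="continuous",
)
print(request.to_prompt())
```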

Known limitations

Formative assessment loops require continuous data flow. The loop design assumes the AI system can assess student responses in real time and adapt immediately. Systems with batch processing (collect data, analyse overnight, adjust tomorrow) cannot implement the inner loop. The outer loop is still possible and still valuable, but the learning gains from inner-loop assessment (VanLehn, 2006) require real-time processing.

The interpretation framework is probabilistic. A student who chooses "area" for a perimeter question PROBABLY has a conceptual confusion, but they might have misread the question, clicked the wrong option, or not understood the word "perimeter." The system should never make a definitive diagnosis from a single response. The interpretation framework uses PATTERNS of responses across multiple problems to build confidence in the diagnosis.
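
One way to make "patterns, not single responses" concrete is to update a belief after each response and to flag a diagnosis only once there is enough evidence. The Python sketch below uses a simple Bayes update. It is an illustration, not the framework's actual method: the prior, the two likelihoods (which stand in for misreads and misclicks), the 0.9 threshold, and the three-response minimum are all assumed values.

```python
from typing import Sequence, Tuple


def update(prior: float, showed_error: bool,
           p_error_if_confused: float = 0.8,
           p_error_if_not: float = 0.1) -> float:
    """One Bayes update of P(student confuses area and perimeter)."""
    like_confused = p_error_if_confused if showed_error else 1 - p_error_if_confused
    like_not = p_error_if_not if showed_error else 1 - p_error_if_not
    evidence = like_confused * prior + like_not * (1 - prior)
    return like_confused * prior / evidence


def diagnose(responses: Sequence[bool], prior: float = 0.2,
             threshold: float = 0.9, min_responses: int = 3) -> Tuple[float, bool]:
    """Return (belief, confident) after a sequence of responses.

    Each response is True if the student gave the area for a perimeter question.
    """
    belief = prior
    for showed_error in responses:
        belief = update(belief, showed_error)
    confident = len(responses) >= min_responses and belief >= threshold
    return belief, confident


print(diagnose([True]))               # one error: belief rises, no diagnosis
print(diagnose([True, True, True]))   # a consistent pattern: diagnosis flagged
print(diagnose([True, True, False]))  # mixed evidence: belief drops again
```

With these assumed numbers, a single wrong answer moves the belief from 0.2 to about 0.67, which is suggestive but well short of a diagnosis. Only a repeated pattern crosses the threshold.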

Black & Wiliam's (1998) review covered a wide range of formative assessment practices. The effect sizes (0.40-0.70) apply to formative assessment broadly, not specifically to AI-implemented formative assessment. The principles are sound, but the specific effect size of an AI formative loop in THIS context has not been empirically measured. The design is evidence-informed, not evidence-proven.
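
For readers less familiar with effect sizes, a standardised mean difference can be read as a percentile shift. The Python sketch below assumes normally distributed scores with equal spread in both groups. That assumption is made here for illustration only and is not a claim from Black & Wiliam.

```python
from statistics import NormalDist


def percentile_of_average_student(effect_size: float) -> float:
    """Percentile, within the comparison group, reached by the average
    student in the intervention group (normal scores assumed)."""
    return 100 * NormalDist().cdf(effect_size)


for d in (0.40, 0.70):
    print(f"d = {d:.2f}: about the {percentile_of_average_student(d):.0f}th percentile")
```

Under that assumption, an effect size of 0.40 moves the average student from the 50th to roughly the 66th percentile, and 0.70 to roughly the 76th.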

Teacher dashboard data can be overwhelming. The teacher dashboard provides detailed information about individual students and class-level patterns. In a class of 30, this is manageable. In a year group of 120 students, the data volume may be too large to act on. Dashboard design for larger scales requires more aggressive filtering and summarisation.
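
As an illustration of the filtering and summarisation mentioned above, the Python sketch below collapses per-student diagnoses into the few most common patterns and caps the number of names shown for each. The summarise function, the diagnosis labels, and the synthetic year group of 120 are hypothetical. They are not data from a real class or the design of any particular dashboard.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarise(diagnoses: Dict[str, str], top_n: int = 3,
              max_names: int = 5) -> List[Tuple[str, int, List[str]]]:
    """Collapse per-student diagnoses into the most common concerns.

    diagnoses maps a student id to a label; "secure" means no concern.
    Returns (label, student count, first few student ids) per concern.
    """
    concerns = {s: d for s, d in diagnoses.items() if d != "secure"}
    counts = Counter(concerns.values())
    summary = []
    for label, count in counts.most_common(top_n):
        names = sorted(s for s, d in concerns.items() if d == label)[:max_names]
        summary.append((label, count, names))
    return summary


# Synthetic year group of 120 students, for illustration only.
labels = ["secure", "secure", "secure",
          "area/perimeter confusion", "area/perimeter confusion", "unit error"]
year_group = {f"student_{i:03d}": labels[i % len(labels)] for i in range(120)}

summary = summarise(year_group, top_n=2)
for label, count, names in summary:
    print(f"{label}: {count} students, e.g. {', '.join(names)}")
```

The teacher sees two lines instead of 120 rows, with a handful of names to start from. The full per-student detail would remain available on request.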

The assessment loop may inadvertently narrow the curriculum. If the AI loop focuses exclusively on area vs. perimeter identification and calculation, students may develop competence in this specific skill but miss the broader mathematical understanding (measurement as a concept, connections to other topics). Wiliam (2011) warns that formative assessment should serve learning, not define it.