Disciplinary AI Literacy Sequence Designer

moderate evidence · ⏱ 5 minutes · AI Literacy

Design a sequence where students compare AI's handling of the same question across disciplines, developing a mental model of where AI is reliable vs. distorting based on knowledge type.

What it does

Generates a multi-lesson sequence in which students systematically compare how AI handles the same type of question across disciplines. The aim is a principled mental model of where AI is reliable and where it distorts, based on the type of knowledge each discipline produces.

The central insight is that AI's reliability is not uniform: it handles settled factual knowledge differently from contested interpretive claims, and sequential procedural knowledge differently from dispositional knowledge about values and judgment. A student who understands why AI is generally reliable when explaining photosynthesis but unreliable when interpreting the causes of the French Revolution has developed transferable AI literacy: not just a list of examples of what AI gets wrong, but a predictive framework.

The sequence follows the logic of Maton's (2013) semantic wave. It starts from concrete discipline-specific examples, builds to an abstract framework (AI reliability varies by knowledge type), then returns to concrete predictions ("for this assignment, in this subject, I expect AI to be X reliable because..."). The skill draws on the library's existing knowledge architecture framework. Sequential knowledge (structured, hierarchical, cumulative, as in mathematics or scientific procedures) tends to be well served by AI. Horizontal knowledge (multiple valid frameworks that don't displace each other, as in historiography or literary criticism) is where AI is most likely to flatten genuine complexity or present a contested position with false certainty.
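The last step of the wave, returning to a concrete prediction, can be sketched as a small lookup. This is an illustrative Python sketch only: the `EXPECTED_RELIABILITY` table and the `prediction` function are hypothetical names, the entries simply restate the tendencies described above, and they describe patterns rather than guarantees. Dispositional knowledge is left out because the text does not state a direction for it.

```python
# Hypothetical lookup restating the tendencies described in the text.
# Keys are knowledge types; values are (expected reliability, reason).
EXPECTED_RELIABILITY = {
    "settled factual": (
        "generally",
        "the knowledge is settled",
    ),
    "sequential procedural": (
        "generally",
        "the knowledge is structured, hierarchical and cumulative",
    ),
    "contested interpretive": (
        "less",
        "multiple valid frameworks coexist and AI may flatten them",
    ),
}


def prediction(subject: str, knowledge_type: str) -> str:
    """Fill in the student's prediction sentence for one subject."""
    if knowledge_type not in EXPECTED_RELIABILITY:
        raise ValueError(f"no stated tendency for knowledge type: {knowledge_type!r}")
    level, reason = EXPECTED_RELIABILITY[knowledge_type]
    return (
        f"For this assignment, in {subject}, I expect AI to be "
        f"{level} reliable because {reason}."
    )


print(prediction("Biology", "settled factual"))
print(prediction("History", "contested interpretive"))
```

In a classroom the reasoning matters more than the rating, so students would write the "because" clause themselves; the table only shows the shape of the framework.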

The evidence behind it

Willingham (2007) argued that critical thinking is domain-specific: students who think critically in history do not automatically transfer that skill to biology, because what counts as good evidence differs by discipline. This has a direct AI corollary: what students learn about AI's limitations in one domain does not automatically carry over to predictions about other domains. Disciplinary AI literacy requires domain-by-domain evaluation.

McPeck (1981) made the stronger claim that critical thinking is constituted by disciplinary knowledge: you cannot evaluate AI output in history without understanding how historians argue, any more than you can evaluate scientific AI output without understanding scientific reasoning. The sequence here operationalises this: students use their disciplinary knowledge as the evaluative instrument.

Bernstein's (1999) distinction between vertical discourse (hierarchical, cumulative knowledge structures where new knowledge subsumes or displaces old, as in the natural sciences) and horizontal discourse (segmented knowledge structures where competing frameworks co-exist, as in the social sciences and humanities) directly predicts AI reliability patterns. AI models are trained on a very large body of human-produced text. In vertical discourse domains, that text tends to converge on correct answers; in horizontal discourse domains, a model averages across competing frameworks, potentially flattening the distinctions between them.

Maton's (2013) semantic wave concept provides the instructional design logic: the sequence must move students between concrete examples (AI answers about specific topics in specific disciplines) and abstract principles (the knowledge-type framework that explains the pattern), then back to concrete predictions.

Wineburg (2007), writing on historical thinking as "unnatural", provides a specific example: the skills experts use in a discipline are not intuitive, so students must be taught them. The same applies to disciplinary AI literacy: students must be explicitly taught to ask "what kind of knowledge is this discipline producing, and what does that mean for AI's reliability?"

Sources

Willingham (2007) — Critical thinking: why is it so hard to teach? (domain-specificity)
McPeck (1981) — Critical Thinking and Education (domain-specificity of critical thinking)
Bernstein (1999) — Vertical and horizontal discourse: an essay
Maton (2013) — Making semantic waves: a key to cumulative knowledge-building
Wineburg (2007) — Unnatural and essential: the nature of historical thinking

How to use it in your lesson

For the best results with EvidenceLesson, give it:

target_disciplines — Which two or three school subjects to compare — e.g. 'Biology and History', 'Maths, History, and Ethics', 'Physics and Literary Studies'
student_level — Age/year group
anchor_question_type (optional) — The type of question to translate across disciplines — causal (why did X happen?), evaluative (how good/significant is X?), procedural (how does X work?), or definitional (what is X?)
knowledge_type_focus (optional) — Which knowledge structure distinction to emphasise: vertical (sequential) vs. horizontal (Bernstein), or factual vs. interpretive vs. dispositional
subject_area (optional) — The curriculum subject context if this is embedded in a specific unit
time_available (optional) — Time available for the sequence
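
As an illustration, the inputs above could be gathered into one structure and checked before being passed to the skill. This is a minimal sketch in Python: the `SequenceRequest` class and the `validate` helper are hypothetical names, the example values are made up, and EvidenceLesson's actual input format is not specified here. Only the field names and the four question types come from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Question types named in the anchor_question_type description.
QUESTION_TYPES = {"causal", "evaluative", "procedural", "definitional"}


@dataclass
class SequenceRequest:
    """Hypothetical container for the inputs listed above (not an official API)."""
    target_disciplines: List[str]               # two or three school subjects
    student_level: str                          # age or year group
    anchor_question_type: Optional[str] = None  # one of QUESTION_TYPES
    knowledge_type_focus: Optional[str] = None
    subject_area: Optional[str] = None
    time_available: Optional[str] = None


def validate(request: SequenceRequest) -> List[str]:
    """Return a list of problems; an empty list means the request looks usable."""
    problems = []
    if not 2 <= len(request.target_disciplines) <= 3:
        problems.append("target_disciplines should name two or three subjects")
    if not request.student_level.strip():
        problems.append("student_level is required")
    if (request.anchor_question_type is not None
            and request.anchor_question_type not in QUESTION_TYPES):
        problems.append("anchor_question_type should be one of: "
                        + ", ".join(sorted(QUESTION_TYPES)))
    return problems


example = SequenceRequest(
    target_disciplines=["Biology", "History"],
    student_level="Year 10",         # illustrative value
    anchor_question_type="causal",
    time_available="three lessons",  # illustrative value
)
print(validate(example))
```

The optional fields default to `None` so that a request with only the two required inputs is still valid, mirroring the "(optional)" labels in the list.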

Known limitations

The vertical/horizontal distinction is not binary. Bernstein's categories are analytical — in practice, every discipline has both settled and contested knowledge. The framework should be taught as a useful heuristic, not a classification system. Biology has contested areas; history has settled facts. The framework predicts PATTERNS, not guarantees.

AI models vary in how well they handle contested claims. Some models are specifically trained to acknowledge debate and hedge their confidence in interpretive domains, so the sequence may produce different findings with different AI tools. This is itself a useful finding: discussing why models handle these questions differently can deepen students' AI literacy.

The sequence requires students to have sufficient domain knowledge to evaluate AI's answers. Students cannot assess whether AI's causal account of Germany's WWI defeat is oversimplified if they haven't studied WWI sufficiently. The sequence is most effective mid-unit or end-of-unit, after foundational knowledge is in place.

Direct empirical evidence for disciplinary AI literacy pedagogy is very limited. The knowledge-type framework (Bernstein, Willingham, Maton) is strongly evidenced for general curriculum design. Its specific application as a framework for predicting AI reliability is principled but novel — there is not yet substantial research on teaching students to predict AI reliability using knowledge-structure reasoning.