A pair of ears in every lesson? Using AI to give trainee teachers richer feedback

Written by: Miles Berry, Professor of Computing Education, University of Roehampton, UK

AI can now generate detailed, evidence-referenced feedback from a transcript of any lesson, but how does this fit with professional formation?

When ChatGPT launched in November 2022, one of the first things that many teachers did was ask it to write a lesson plan. The results were typically disappointing: ‘Write a lesson plan for Year 9 on the periodic table’ produced something generic and often shallow, rarely anything that a teacher would actually use. My concern was what trainee teachers might do with plans such as these.

That early encounter pointed to a tension that runs through AI in initial teacher training: the risk of cognitive outsourcing. If a trainee can generate a lesson plan in seconds, then do they learn the craft of planning? Does the hard slog of constructing a bespoke lesson for a particular class, with particular pupils, bringing particular knowledge and enthusiasms, still matter?

I think that it does. But I also think that there is a role for AI in initial teacher training that is quite different from plan generation and more powerful: using AI not to do the thinking for trainees, but to reflect back on what happened when their thinking met reality, in both the lesson that they planned and the lesson that they taught.

The lesson planning question

The Teachers’ Standards (DfE, 2011) make it clear that trainees must plan and teach well-structured lessons. There is a parallel between downloading a ready-made lesson plan from a publisher and asking a language model to generate one. Both can serve as starting points, but neither constitutes lesson planning as a professional act. Just as we would be reluctant to allow trainees to teach exclusively from others’ plans, we should be cautious about AI-generated plans that short-circuit the thinking that makes planning valuable. This is the sort of cognitive outsourcing that seems detrimental to the development of a trainee’s craft skills.

The research supports this caution. Dornburg and Davin (2024) found that ChatGPT-generated foreign language lesson plans were often of reasonable surface quality but showed troubling variability and embedded historical biases, reflecting outdated pedagogical approaches. Kalenda et al. (2024) found that pre-service teachers’ confidence in ChatGPT as a planning tool decreased once they engaged in careful analysis of its outputs, suggesting that the capacity to evaluate and adapt AI-generated material critically should be a core component of teacher education. Prompt engineering helps: grounding requests in curriculum documents or exam specifications yields better results than a zero-shot prompt. But the generated plan still belongs to the model, and not to the teacher.

The research case for AI feedback on teaching

The research is more encouraging when AI is used not to generate teaching but to analyse it. A growing body of work has examined whether language models and natural language processing tools can provide specific feedback on classroom practice.

Demszky et al. (2025) conducted a pre-registered randomised controlled trial with 224 mathematics and science teachers. Those who received automated feedback on their use of questions that press pupils to explain and reflect, rather than simply recall, increased their use of such questions by 20 per cent compared to a control group. The feedback came from an AI system working from classroom audio; no human observer was required. Where both human mentors and AI provided feedback on the same lessons, the AI tended to be more comprehensive, attending to the whole lesson rather than to salient moments alone. Jacobs et al. (2025) found similar results in their work on automating feedback on classroom discourse patterns, noting that AI could surface patterns in teacher talk that post-lesson conversations rarely reached.

This picture is echoed by Sert et al. (2025), who found that automatic question-detection tools embedded in teacher education programmes prompted reflection on classroom interaction when accompanied by structured discussion and mentoring support. AI data alone is not sufficient. Nygren et al. (2025), comparing AI and expert human mentoring in simulations, found that AI feedback was more consistent and broader in scope, while human feedback was better attuned to pedagogical moments and the emotional texture of teaching. The emerging consensus is that AI feedback and human mentoring are complementary rather than competing, and that the most productive use of AI positions it as a prompt for professional conversation, not a substitute for one. While AI feedback may be more objective, and is often more detailed, human feedback is more likely to be acted on, by pupils and trainees alike (Zhang et al., 2026).

AI as critical friend

This is the role that I have been exploring with my PGCE (Postgraduate Certificate in Education) trainees at Roehampton: AI as a critical friend, not just on the lesson plan but also on the lesson as taught.

We already expect trainees to share their lesson plans with mentors before teaching, typically a day ahead. The mentor reads the plan, offers questions and suggestions, and the trainee refines their thinking. The cognitive work of planning remains theirs. We would not expect the mentor to write the plan, and nor should we expect the AI to: its role, like the mentor's, is to ask questions of a plan that the trainee has written.

A trainee who sits with a thoughtful commentary on their lesson – ‘here is where you introduced new vocabulary’, ‘here is where questioning was concentrated on a small number of pupils’, ‘here is one thing to try differently next time’ – is engaged in professional reflection. They did the teaching. The AI has been, in effect, a pair of ears in the room, listening carefully to the whole lesson, and is now offering what a mentor might offer, if mentors had the time.

Grounded in specific statements from the ITTECF (Initial Teacher Training and Early Career Framework; DfE, 2024) and organised against the Teachers’ Standards (DfE, 2011), the feedback can point to a particular moment in the lesson and say: ‘Here is where a little more wait time, as described in ITTECF 4n, might have given more pupils the chance to formulate a response’ or ‘This is a good example of 1h, acknowledging pupil effort and emphasising progress’. The feedback is developmental, evidence-referenced and entirely non-judgemental. Very few mentors have the time to give feedback of this depth and specificity on every lesson.

The system in practice

The tool that I have developed for this is a straightforward web application, designed to work on a phone or laptop browser. A trainee uploads or records audio of their lesson; the app reduces background noise and transcribes it using OpenAI’s Whisper. The original audio is then discarded immediately. The transcript is shown to the trainee, who can edit it, removing personally identifiable information or anything raising safeguarding concerns, before submitting it for feedback. No account or login is required. Nothing is stored.

The feedback is generated by prompting Anthropic’s Claude Haiku model, chosen for its speed, cost-effectiveness and Anthropic’s approach to responsible AI development (Anthropic, 2026), using a structured knowledge base drawn from the ITTECF and Teachers’ Standards. The response opens with a summary of the lesson and its phases, identifies two or three strengths with specific ITTECF references, names areas for development with concrete suggestions and closes with a single priority next step. The whole thing is clearly caveated: the feedback is AI-generated, has not been reviewed by any human, and is intended as the basis of a discussion with the mentor, not a replacement for that conversation.
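To make the shape of that response concrete, here is a minimal sketch, assuming a very simplified knowledge base, of how a prompt might combine framework statements, the required four-part structure and the trainee's edited transcript. Everything named here (`Statement`, `KNOWLEDGE_BASE`, `build_prompt`, `with_caveat`) is hypothetical; the actual tool's prompt wording and knowledge-base format are not published in this article, the two ITTECF entries are paraphrased from the examples above rather than quoted from the framework, and the call to Claude Haiku itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical knowledge-base entry: an ITTECF code and a paraphrase."""
    code: str
    text: str


# Illustrative two-entry knowledge base, paraphrased from this article's
# examples; a real one would draw on the full ITTECF and Teachers' Standards.
KNOWLEDGE_BASE = [
    Statement("1h", "Acknowledging pupil effort and emphasising progress"),
    Statement("4n", "Using wait time so more pupils can formulate a response"),
]

# The four parts of the response, in the order described above.
SECTIONS = [
    "Summary of the lesson and its phases",
    "Two or three strengths, each with a specific ITTECF reference",
    "Areas for development, each with a concrete suggestion",
    "One priority next step",
]

CAVEAT = (
    "This feedback is AI-generated and has not been reviewed by any human. "
    "It is intended as the basis of a discussion with your mentor, "
    "not a replacement for that conversation."
)


def build_prompt(transcript: str, kb: list[Statement]) -> str:
    """Assemble criteria, required structure and transcript into one prompt."""
    if not transcript.strip():
        raise ValueError("transcript is empty")
    criteria = "\n".join(f"- ITTECF {s.code}: {s.text}" for s in kb)
    structure = "\n".join(f"{i}. {name}" for i, name in enumerate(SECTIONS, 1))
    return (
        "You are a supportive critical friend to a trainee teacher. "
        "Give developmental, non-judgemental feedback with no grades.\n\n"
        f"Refer only to these statements:\n{criteria}\n\n"
        f"Structure your response as:\n{structure}\n\n"
        f"Lesson transcript (edited by the trainee):\n{transcript}"
    )


def with_caveat(feedback: str) -> str:
    """Attach the standard caveat to whatever text the model returns."""
    return f"{feedback.rstrip()}\n\n{CAVEAT}"
```

For example, `build_prompt("T: What do you think, Sam?", KNOWLEDGE_BASE)` yields a single prompt string ending with the transcript, and whatever the model returns would pass through `with_caveat` before the trainee sees it. Keeping the caveat outside the model's control, rather than asking the model to add it, is one way to ensure it always appears.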

No one other than the trainee, at the time of use, has access to the transcript or the feedback generated. Qualitative responses from trainees have been positive: they consistently describe the feedback as more detailed than mentor feedback, and usefully specific. In my own trials, running transcripts of lessons that I have observed through the same system, the AI’s commentary has rarely contradicted the human observer’s judgement, but has regularly gone further, attending to aspects of the lesson that post-lesson conversations did not reach.

Limitations and open questions

Working from a transcript, the AI cannot see the room. Behaviour management incidents that a mentor would notice in seconds may not appear in the audio at all. Non-verbal communication, whiteboard work and the physical arrangement of the classroom are all absent. However, in most classroom subjects, a transcript is a rich record, as much teaching happens through the spoken word. Even so, feedback built on a transcript should never be mistaken for a full lesson observation.

There is also a subtler concern. One thing that I value about the post-lesson conversation – the mentor and trainee with a cup of coffee, talking through what happened – is precisely that it requires the trainee to stop and think. The cognitive work of reflecting on a lesson is itself part of professional formation. There is a risk that reading a detailed AI commentary becomes a substitute for that thinking, rather than a prompt for it. The feedback is most valuable, I suspect, when it is the starting point for a professional conversation, and not the end of one. I worry about outsourcing the reflection itself, even to a machine that may, in some respects, be better at listening than any of us.

These are questions that the field is only beginning to address (Demszky et al., 2021; Wang and Demszky, 2023). What I have so far is only proof-of-concept work; assurance that it does no harm and robust evidence of its effectiveness are needed before it can be recommended at scale.

Conclusion

AI is not going to replace the craft of teaching or the work needed in learning to teach. What it may be able to do is ensure that every lesson that a trainee teacher teaches is also, in a meaningful sense, attended to – not in the threatening, high-stakes way of a formal observation, but quietly, non-judgementally, as a critical friend who listens carefully and has some constructive thoughts. The emerging research suggests that this kind of feedback, grounded in good criteria and offered without grades or grades-adjacent language, can improve practice. Whether it does so for trainee teachers, in the conditions of initial teacher education in England, is a question worth pursuing carefully.

The examples of AI use and specific tools in this article are for context only. They do not imply endorsement or recommendation of any particular tool or approach by the Department for Education or the Chartered College of Teaching, and any views stated are those of the individual. Any use of AI also needs to be carefully planned, and what is appropriate in one setting may not be elsewhere. You should always follow the DfE’s Generative AI in Education policy position and product safety standards, in addition to aligning any AI use with the DfE’s latest Keeping Children Safe in Education guidance. You can also find teacher and leader toolkits on gov.uk.
