DALE BASSETT, DIRECTOR OF ASSESSMENT, UNITED LEARNING, UK
Assessment is integral to education, helping us to understand how students are progressing. Whether formal annual tests, exit-ticket quizzes or informal learning checks in the classroom, ‘traditional’ assessment isn’t going away – but it is hugely impacted by AI, both positively and negatively.
United Learning’s ongoing development of the United Curriculum and supporting assessments, for our schools and for those outside the group, requires us to consider how to manage the risks and take advantage of the opportunities presented by the advent of generative AI (GenAI). This paper draws together our own experience with research and emerging innovative practice to explore how schools can safely use AI in assessment.
How AI is changing traditional assessment
AI has the potential to reduce teacher workload while maintaining or improving assessment quality (Roy et al., 2024). GenAI can rapidly generate questions, texts or images to use in assessments, reducing workload or prompting more creative ways to assess a topic – for example, if a biology teacher asks ChatGPT for some innovative questions assessing students’ knowledge of cells in unfamiliar contexts, it will happily oblige.
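To make that question-generation workflow concrete, here is a minimal sketch using the openai Python client; the model name, prompt wording and year group are illustrative assumptions only, and any generated questions would still need the teacher checks discussed later in this article.

```python
# Minimal sketch: drafting assessment questions with a GenAI API.
# Assumes the openai Python client is installed and OPENAI_API_KEY is set;
# the model name and prompt are illustrative, not a recommendation.
from openai import OpenAI

client = OpenAI()

prompt = (
    "You are helping a secondary biology teacher. Write five questions that "
    "assess students' knowledge of cell structure in unfamiliar contexts, "
    "suitable for Year 10. For each question, note the knowledge being assessed."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any capable chat model could be substituted here
    messages=[{"role": "user", "content": prompt}],
)

# Draft questions still need a teacher's review for accuracy and curriculum fit.
print(response.choices[0].message.content)
```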
A plethora of tools already promise the ‘silver bullet’ of automated feedback and marking, reducing workload and enhancing impact by providing instant feedback and next steps for teachers and students. AI could also produce a range of exemplar responses to support more consistent marking, show students what good work looks like and demonstrate common mistakes and misconceptions.
Assessment generates huge amounts of data, which teachers spend a lot of time analysing. AI excels at identifying patterns and could provide automated QLA (question-level analysis), quickly spotting areas where students are struggling so that teachers can focus on teaching rather than poring over the data. GenAI even offers the potential to skip marking altogether and directly analyse students’ work to identify common errors and gaps – for example, an English teacher could upload students’ essays and ask AI to formatively analyse the grammatical structures used in their work, to inform planning and interventions. (It is essential to ensure that students’ data and IP (intellectual property) are protected, as discussed later in this article.)
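As an illustration of what automated QLA might look like in practice, the sketch below aggregates a hypothetical marks spreadsheet with pandas; the file name, column names and maximum marks are assumptions for the example, not a prescribed format.

```python
# Minimal sketch of automated question-level analysis (QLA) from a marks sheet.
# The file name, columns ('Q1'..'Q5') and maximum marks are illustrative assumptions.
import pandas as pd

marks = pd.read_csv("year9_biology_test.csv")  # hypothetical marks file
max_marks = {"Q1": 2, "Q2": 3, "Q3": 4, "Q4": 3, "Q5": 5}

# Facility = average proportion of available marks gained on each question.
facility = {q: marks[q].mean() / total for q, total in max_marks.items()}
qla = pd.Series(facility).sort_values()

# Questions with the lowest facility point to topics that may need re-teaching.
print(qla.head(3))
```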
GenAI can provide quality assurance by reviewing assessment tasks for clarity, accessibility and potential bias. Teachers can input questions or tasks and receive feedback on reading-age appropriateness, cultural or linguistic bias, clarity of instructions and construct-irrelevant barriers that might prevent students from demonstrating their learning.
Differentiation and accessibility can be significantly enhanced using AI, which can help to create multiple versions of assessment tasks to meet different learning needs, while maintaining consistent learning objectives. This might include adjusting reading levels while maintaining subject complexity, breaking down complex tasks into manageable steps or suggesting accommodations for students with specific learning needs. AI for Education (2024a) has explored how to use GenAI tools to systematically review and modify an assessment to maximise accessibility and validity – the degree to which an assessment measures what it is intended to measure, and the extent to which proposed interpretations and uses are justified.
AI can support teachers, but not replace them
While there is significant scope for teachers to use GenAI to support them in producing high-quality assessments, there are risks involved in doing so, especially when using ‘generalist’ tools such as ChatGPT or Claude. Keeping a ‘human in the loop’ is critical to check the quality of anything generated by AI.
Teachers need to ensure that AI-generated questions are aligned to the curriculum, at the right level and assessing the right things, and that feedback is relevant and meaningful. AI can help to spot human bias but it can have substantial biases of its own. It is well known that GenAI suffers from ‘hallucinations’ – its responses can contain factual errors or completely invented content – so any materials produced must be checked for factual inaccuracies. When using specialist educational tools incorporating AI, teachers should critically evaluate the quality of the AI-powered provision and how well it aligns to their curriculum; meaningful and useful feedback needs to be consistent with the curriculum sequence and key knowledge, build on students’ prior learning and highlight common misconceptions.
GenAI tools powered by large language models (LLMs) also tend to be more suited to language-based subjects and tasks. Developers are racing to build the mathematical and reasoning capabilities of their models but generalist LLMs can still get confused by anything but the most basic maths problems.
Ethical practice
As with any technology, schools must consider data security, intellectual property rights and safeguarding implications when implementing AI in assessment.
Policies should specify what can be shared and where, and how GDPR (General Data Protection Regulation) and other data protection requirements will be complied with. Schools should be particularly mindful of data security when handling pupil data, with anonymisation recommended for any analysis using GenAI tools. Different tools treat data differently, but it should always be assumed that anything shared with GenAI could be visible to the public unless schools are confident otherwise. A data protection impact assessment (DPIA) should be in place for any tools used.
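As a simple illustration of the anonymisation step, the sketch below pseudonymises pupil names before any records are shared with an external GenAI tool; the data structure and field names are assumptions for the example, and a real pipeline would also strip dates of birth, addresses and any other identifying details.

```python
# Minimal sketch of pseudonymising pupil data before any external GenAI analysis.
# Record structure and names are illustrative assumptions only.
import uuid

records = [
    {"name": "Alex Smith", "essay": "My essay text..."},
    {"name": "Priya Patel", "essay": "Another essay..."},
]

key = {}  # kept locally within the school so results can be re-identified later
anonymised = []
for record in records:
    pupil_id = key.setdefault(record["name"], f"PUPIL-{uuid.uuid4().hex[:8]}")
    anonymised.append({"pupil_id": pupil_id, "essay": record["essay"]})

# Only the anonymised records (never 'key') would be shared with the external tool.
print(anonymised)
```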
Teachers should be mindful of IP considerations when uploading students’ work or third-party resources into AI tools. Schools should ensure that safeguarding policies cover AI use, and it is good practice to be transparent with parents about how AI is used. Developing students’ AI literacy through the curriculum is vital to ensure that they learn how to use AI tools safely.
Integrity, authenticity and malpractice
The advent of widely available (and free) GenAI tools means that schools face increasing difficulty in ensuring the authenticity and integrity of students’ work. However, research from Stanford University suggests that the incidence of cheating has not increased since the advent of ChatGPT; rather, students who previously sought short cuts are simply using different tools (Lee et al., 2024).
In formal non-exam assessment within regulated qualifications, students and schools must adapt their practice to avoid committing malpractice, even unintentionally.
The Joint Council for Qualifications (JCQ) places the burden firmly on teachers and schools to identify whether work is a student’s own and to provide clear guidance on the use of AI tools in assessments. Their guidance highlights the importance of teaching students about AI use and misuse, what constitutes malpractice, how to reference AI usage and the importance of the candidate declaration certifying that the work is the student’s own (JCQ, 2024). Schools also need robust procedures for checking work authenticity, including comparing with students’ previous work, looking for ‘telltale signs’ of AI use and, potentially, discussing submissions with students.
JCQ also suggests that schools could consider using AI detection tools. However, research has shown that such tools are unreliable (Perkins, Roe et al., 2024), a view recently supported by Ofqual’s Chief Regulator (Bauckham, 2024); they can also demonstrate bias against non-native English speakers (Liang et al., 2023). A reasonable approach could eschew detection tools but ensure that the other proposed checks are robustly implemented.
While managing risks is crucial, banning AI use or failing to engage with it is not a viable option. Students must learn to use AI effectively and responsibly, as it will be an integral part of their future work lives. Good policies, practices and culture, as well as well-designed assessment tasks, can help schools to ameliorate the risks and make the most of the opportunities.
‘AI-resistant’ assessments
Outside of regulated qualifications, schools can develop AI-resistant assessments that are inherently difficult to complete using AI alone.
Critical here is designing assessment tasks that require cognitive engagement and reduce opportunities for plagiarism and cognitive offloading. Good assessment tasks might focus on process instead of product, or skills instead of content, and can be rooted in the student’s specific experience so that GenAI cannot know the answer.
Examples of such assessment tasks might include student presentations with live Q&A sessions, group discussions where students must reference specific peer contributions or project work based in personal experience or local context (AI for Education, 2024b). These are not just AI-resistant but, potentially, also more valid and authentic assessments.
From ‘open book’ to ‘open ChatGPT’
We may think of students using AI in assessment primarily as a risk, but it is also an opportunity to set more interesting and challenging tasks, assess higher-order skills and develop students’ skills in using AI.
AI literacy is itself a skill to be developed and assessed. Students can learn to evaluate AI-generated content for accuracy and bias, compare AI-generated and human-written texts, and prompt AI tools effectively. This develops critical thinking about AI’s capabilities and limitations, while preparing students for future workplace demands.
Researcher-practitioners such as Kentz (2024) are experimenting with using AI to assess students’ creativity, critical thinking and metacognition, while developing their AI literacy. By asking students to use GenAI tools iteratively to respond to a task, and then reviewing the transcripts of students’ interactions with the AI, teachers can assess not only how the student is using AI but also how their thought process follows and responds to the dialogue with the AI tool. Evolving the ‘open book’ assessment into an ‘open ChatGPT’ assessment in this way has the potential to give teachers a window into their students’ thinking that traditional assessment struggles to provide.
A framework for implementation
To realise these benefits while managing risks, schools need a structured approach to implementing AI in assessment. The AI Assessment Scale (Perkins, Furze et al., 2024) provides a framework for ethically integrating GenAI into assessment practices, recognising different levels of appropriate AI use depending on the assessment context and learning objectives.
When developing their policies for AI use in assessment, schools should consider how to create a space in which teachers feel safe and confident to begin low-stakes experimentation – what this author’s colleague Lauren Thorpe refers to as ‘bowling with the bumpers up’.
Key principles for implementation include:
- matching AI use to assessment stakes – experimenting with innovative approaches in low-stakes, formative contexts
- maintaining the teacher’s central role in AI-supported assessment processes, especially assessment design and quality assurance
- ensuring transparency about AI use with all stakeholders, including parents
- developing students’ AI literacy through the curriculum, with a particular focus on safe and ethical AI use
- prioritising data security, IP protection and safeguarding
- ensuring alignment with curriculum and pedagogical approaches.
Balancing challenges and opportunities
The emergence of GenAI tools presents both challenges and opportunities for assessment in education. Success lies not in preventing AI use but in harnessing its potential while managing risks effectively. This requires clear policies, professional development for staff and a willingness to experiment with new assessment approaches that develop students’ AI literacy alongside traditional subject knowledge and skills.
By taking a balanced approach – neither ignoring AI nor allowing it to drive pedagogical decisions – schools can begin to safely experiment with using AI in assessment to realise its benefits. The focus should remain on valid assessment of learning, while preparing students for a future where AI will be an integral part of their working lives.
The examples of AI use and specific tools in this article are for context only. They do not imply endorsement or recommendation of any particular tool or approach by the Department for Education (the ministerial department responsible for children’s services and education in England) or the Chartered College of Teaching, and any views stated are those of the individual. Any use of AI also needs to be carefully planned, and what is appropriate in one setting may not be elsewhere. You should always follow the DfE’s Generative AI in Education policy position and product safety expectations, in addition to aligning any AI use with the DfE’s latest Keeping Children Safe in Education guidance. You can also find teacher and leader toolkits on gov.uk.