For four years I have taught in a wide variety of schools and I have always been confused about how we use the word ‘validity’. I knew the word had something to do with ‘goodness’, but goodness of what? And how could I know what was good and what was not?
That is what motivated me to enrol for the University of Cambridge postgraduate certificate in educational assessment and examinations. On this course I examined issues in assessment such as validity, validation, reliability, fairness and bias. I also came across Michael Kane and, in particular, his work with Terry Crooks and Allan Cohen (Crooks, Kane and Cohen, 1996). Of all the readings I reflected on during the course, this provided an answer to at least one of my two central questions and supported the development of assessment in my department and school.
Although other popular assessment books such as Daisy Christodoulou’s Making Good Progress? (2017) present validity in fairly concrete, black-and-white terms, my postgraduate study showed me that validity can be conceptualised in many different ways. For the working teacher, being faced with a dense and somewhat impenetrable definition of validity is not helpful:
‘Validity is an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment’ (Messick, 1989: p.13).
Not only did I not know where to start forming this ‘integrated evaluative judgement’, I also did not know how to. Further reading led me to an attempt by Shaw and Crisp to develop and apply an approach to validation for large-cohort and high-stakes assessment. In Shaw and Crisp (2012), a single qualification (A-level physics) for the October/November 2009 examinations was validated. Eighteen separate pieces of complex evidence over 47 pages were interrogated to come to a decision on validity. The sheer scale of the work appeared to show me that an attempt at validation within my school would not be feasible.
It seemed as if I would not be able to take the theoretical perspectives from researchers and apply them with high fidelity to my classroom. However, Crooks, Kane and Cohen (1996) provided a way to operationalise validity by stating clear validation criteria that can work within any assessment structure. Following their approach has enabled me to have powerful discussions about validity with every teacher, in every department.
The single most important discussion starter was to ask each assessment developer: ‘What is your intended purpose for this?’ Validation, in whatever form, cannot start without this. But this alone did not revolutionise my practice. As a curriculum and assessment designer for science, I needed more than a single, sharp question. What I got from Crooks, Kane and Cohen (1996) were clear stages I could use to develop a series of questions.
They break down validation into eight linked stages and use the metaphor of a chain to emphasise the importance of the weakest link; the chain (validation argument) is only as strong as the weakest link. The eight stages are:
- Administration – how the assessment is given to the student.
- Scoring – how the assessment is scored.
- Aggregation – how smaller tests are combined to give one larger test score.
- Generalisation – how much this single test gives us useful information about the rest of the assessment domain (for example, how well does one topic test on mathematics tell us about performance on all the possible topic questions generally).
- Extrapolation – what the assessment can tell us about real-world performance (in the previous example: mathematics ability).
- Evaluation – how the student’s performance is judged.
- Decision – what decisions are made because of the judgements.
- Impact – what effect the assessment has on students, teachers, parents and other participants.
This is still relatively technical and I found it hard to bring together for the benefit of students and teachers. I recalled how powerful the purpose question was, and realised that, much like our students, staff and I responded much better to questions than to declarative statements. Everyone knows what to do with a question (answer it!) as opposed to a statement (such as the definition by Messick).
Even then, with clear questions, the answers seemed to have a high noise-to-signal ratio. Yet again, Crooks, Kane and Cohen (1996) provided a solution. By setting out the threats to validity associated with each stage, the paper allowed people working with the questions to sharpen their answers. I took the separate insights from the paper and formed a useable framework, based on the eight stages of validation.
As an example, for the administration stage, I suggested questions such as: what were the administration procedures for this assessment? Were students treated differently? Where and how did the students do the assessment? Alongside these questions, I identified possible threats, such as anxiety and inappropriate conditions. The full framework for each of the eight stages can be seen in Figure 1.
Printed out on A3 paper and used during roundtable discussions with other colleagues, this framework presented a clear route through the forest of validity and validation. Teachers came to find the intellectual clarity empowering and exhilarating, and our discussions on assessment became far more meaningful. Some teachers have signed up to assessment courses around the country, and if that’s the impact from reading a single paper, we all ought to read more.
Christodoulou D (2017) Making Good Progress? 1st ed. Oxford: Oxford University Press.
Crooks T, Kane M and Cohen A (1996) Threats to the valid use of assessments. Assessment in Education: Principles, Policy & Practice, 3(3): 265-286.
Messick S (1989) Validity. In: Linn R (ed) Educational Measurement. New York: Macmillan, pp.13-103.
Shaw S and Crisp V (2012) An approach to validation: Developing and applying an approach for the validation of general qualifications. Research Matters: A Cambridge Assessment Publication, Special Issue 3.