
Measuring what matters: Redefining data’s role in schools


We [often] question the judgement of experts whenever we seek out a second opinion on a medical diagnosis. Unfortunately, when it comes to our own knowledge and opinions, we often favor feeling right over being right… We need to develop the habit of forming our own second opinions.

Grant, 2021, p. 18

We distinctly remember telling parents/carers, ‘Your child has been working hard this year; it shows because their end-of-term scores have gone up.’ We also remember the guilt at realising that the quantitative measure that we were sharing didn’t fully represent the progress and development that the pupil had made across the year, because it couldn’t. It’s something we still sit with today and is the reason we felt compelled to write this article. Not because we believe that schools are doing something ‘wrong’, but because the great responsibility that we have as teachers means that we should constantly question our existing practices. We must ensure that we favour being right over feeling right. 

In 2018 alone, the Department for Education published its report on tackling onerous and inappropriate data requirements (Teacher Workload Advisory Group, 2018); Professor Becky Allen wrote about issues surrounding ‘progress’ measures (Allen, 2018); and the National Director of Education wrote for Ofsted that time invested in marking and data practices wasn’t being reflected in benefits to learning (Harford, 2018). We acknowledge that this article may make for uneasy reading as it attempts to scratch the surface of ideas that are at times unwieldy, often abstract, but always important.

How do we know whether students are getting better?

Let’s return to the statement we shared: ‘Their scores have gone up.’

We wanted to communicate whether students were ‘getting better’: if the numbers are increasing, then surely students are improving? We’d assumed that scores alone could pinpoint how students were doing and whether they were getting better. When we interrogated this, the walls started crumbling.

Consider how the tests were administered: we teach some topics across a term before assessing in exam conditions. If learning is knowledge that has been permanently changed (Bjork and Bjork, 2011), then would we feel confident that students would score similarly if they repeated that test a week or a term later (i.e. that their knowledge had been permanently changed)? We know that forgetting is inevitable (Murre and Dros, 2015) and we’ve all experienced students being able to do something well only to struggle after even a small delay. What this means is that end-of-unit/term tests are too short-term a measure to draw valid inferences about learning (Christodoulou, 2016).

What about scores/grades from longer-term measures? Grades describe a range of scores, so are inherently imprecise, and they are also arbitrary: cut-off points placed along the normal distribution of student performance (Christodoulou, 2016). What about scores? Let’s again consider how tests are administered: students sit an assessment; it gets marked and a score is produced. A question to ask here is whether a single score accurately reflects ability. Would a student get the same score in a test if they took it in the morning vs the afternoon (Christodoulou, 2016)? Normal variations in mental performance mean that we can’t treat a given score as precise; instead, it will reflect a point within a range that a student would likely achieve under any given circumstances (Allen, 2018).

A grade/score alone can’t reflect progress because:

  • Termly/unit tests are too short-term a measure
  • Any single test samples only a small part of the domain
  • Grades communicate broadly how well students might be doing in a subject, but they are imprecise by design
  • Scores are unreliable because they represent a point within a probable range.


So, grades are too imprecise to be compared and a comparison of scores is a comparison of ranges (which can’t be done), which means we can’t measure progress (Allen, 2018). So how can we know that students are getting better? Because our curriculums are designed to ensure this. We sequence and teach the curriculum so that knowledge is built, returned to, and embedded over time (e.g. by regularly providing low-stakes retrieval opportunities) and check whether students are ready to move on by using frequent, targeted formative checks for understanding.

What can grades or scores tell us?

While grades/scores aren’t precise enough to tell us what students know, they can communicate proficiency within a subject. Getting an 8 at GCSE maths tells us that a student is highly competent at maths and will likely be successful in future maths-based study/employment. Outside of national exams, things get a little trickier.

The whole purpose of grades is to communicate a shared meaning (Wiliam and Black, 1996); for example, GCSE grades are designed to evaluate student performance against an entire national cohort, to give an idea of relative proficiency. However, schools sometimes apply them as a ‘point-in-time’ measure, which can be problematic when it comes to shared meaning. If we award a student a grade 3 for their end-of-year-7 exam, does it mean that we expect them to get a 3 at GCSE? Or that, if they took their GCSE today, they’d get a 3? Or something else entirely?

An alternative might be to provide scores, but again we run into problems. As discussed, scores are highly variable and, in more subjective disciplines (e.g. English and art), can also be influenced by who does the marking. In our experience, teachers will often agree qualitatively about the quality of a piece of work, but the challenge comes in trying to assign a score based on criteria that require interpretation, with the result often varying depending on the marker – it’s why moderation exists. An alternative might be to use comparative judgement, which instead requires assessors to compare pupil work and simply decide which is better (Wheadon et al., 2020; Jones et al., 2016).

As the above perhaps demonstrates, there are no easy solutions to this incredibly complex problem:

  • Scores give a false sense of precision
  • Grades avoid this but can create confusion if they overlap with national exam grades
  • Both can be threatened by subjective marking, which might be eased by comparative judgement, but its use must be driven by the purpose that those comparisons would serve.


As former teachers, we appreciate the pressures of the system and know that you can’t just stop considering grades/scores. However, we hope that this article supports you to reflect upon the meaning attributed to them, the sources used to select them and the way in which you communicate them to stakeholders.

What are the leadership implications?

Leaders could consider the evidence behind what they are requesting and challenge ‘how we’ve always done it’. Too often, teachers feel that they are being asked for data that doesn’t seem to serve a purpose. Sometimes, leaders believe that specific data (e.g. target grades) can impact wider school improvement, but this purpose doesn’t get shared with staff. Sometimes, these clear reasons are missing. When one of us was a primary school assessment lead, it was common practice to collect data about pupils in all subjects, regardless of whether those subjects had been taught since the previous assessment. There was a box on the software programme that seemed to need filling. Where use of data is effective, leaders already ensure that teachers understand the rationale. To communicate that rationale clearly with staff, leaders need to be clear themselves about why the data is being collected.

Alongside this, leaders could be highly selective in the data that they ask colleagues to collect. Because of the time and accountability pressures, ‘we sometimes use data that we have to hand rather than what we need’ (EEF, 2019, p. 15). So, we might end up with data that was collected to serve a different purpose from the one for which we’re using it, which can result in misplaced justifications under the guise of ‘evidence’, e.g. a failure to meet the grade/progress target is seen as a lack of teacher/curriculum effectiveness. We can mitigate this by:

  1. using multiple sources to triangulate and draw more confident conclusions (e.g. accompanying end-of-year tests with teacher reflections when understanding attainment for individual students)
  2. acknowledging the original sources of data (e.g. ‘The single grade “FFT estimate” actually summarises a distribution of grades. A particular pupil is estimated to have an x% chance of achieving a grade 9, a y% chance of achieving a grade 8, and so on. The single grade estimate is the mid-point of that distribution.’ (FFT Education Datalab, 2022, para. 5))
  3. acknowledging the strengths/limitations of data and selecting others that will balance those out (e.g. lesson observations carried out regularly and by different people to mitigate against their subjective nature).


While we’ve discussed how leaders might select data effectively, the questions preceding this are: What are we hoping to achieve by collecting this data? How will it improve teaching and learning? Is it more important that teachers can accurately assess in the moment, making decisions about how to support pupils during a lesson? Many schools have made shifts towards this approach already. What would happen if leaders in even more schools focused more on what inferences teachers are making with pupils day to day, considering the suggestions above, and less on each pupil’s row in their master spreadsheet?

What are the systemic implications?

We’ve considered teachers and leaders. However, a common and valid pushback is that all teachers and leaders work within a system of accountability. The 2016 report of the Independent Teacher Workload Review Group stated that ‘Although the Ofsted framework has changed, there is evidence to suggest that workload pressures associated with inspection have not been eased.’ (p. 6)

One challenge here is the constantly moving goalposts. Daniel Kebede made this case when responding to the idea that the Ofsted framework has changed for the better: ‘the Ofsted framework has changed five times in nine years. That is incredibly difficult for a profession to keep up with… What sort of evidence do they collect in that Ofsted framework window? It can be really problematic, creating its own internal pressures.’ (Education Committee, 2023, p. 3) While the changes might be positive, the pace of change means that school leaders share expectations with staff, only to find that those no longer match the inspection framework, and sometimes labour-intensive data-gathering is not fit for purpose. Some of these changes reflect the fact that the focus is now on curriculum (intent, implementation and impact), rather than quality assurance by assessing outcomes. However, some schools are still collecting more data than others to evidence impact or are using it to drive their curriculum work; ‘“data drops” are more frequent in schools currently judged by Ofsted as requires improvement (RI) or inadequate’ (Fischer Family Trust, 2019, p. 4).


Despite changes in policies and frameworks, practices have lagged. One suggestion is that this is because using assessment data provides an efficient, low-cost and seemingly objective way in which to measure the impact of teaching via student progress (Bitler et al., 2021). Our hope is that this article prompts you to consider the following questions:

  • Are teachers managing the tension between moving through the curriculum and responding to the needs of the students (identified through regular checks for understanding)?
  • What inferences are we making from the data?
  • How do we know that those inferences are valid/accurate? 
  • Are leaders communicating well-informed rationales for data collection?


Confronting these questions is no small ask and may mean difficult conversations with parents/carers, colleagues and perhaps ourselves. Yet it is something that we must ask of ourselves, our schools and our systems: to be brave in our leadership and give ourselves permission to be led by our curriculum and our teaching – to make the difficult choice of prioritising being right over feeling right.

