Teacher development: Measuring what matters

Chris Larvin, Research Specialist, Teach First, UK
Jenny Griffiths, Knowledge & Research Manager, Teach First, UK
Luke Bocock, Head of Research & Learning, Teach First, UK

Recent research suggests that the impact of high-quality teacher development on pupil outcomes can be equivalent to the difference between being taught by a teacher with 10 years of experience and being taught by a graduate (EPI, 2020). Despite an abundance of claims about the impact of professional development, these claims are rarely scrutinised for validity and reliability. Anecdotes and testimonials from past participants, while an effective marketing tool, can be highly subjective and unreliable (Guskey, 2009). Similarly, traditional approaches to measuring and evaluating the impact of programmes generally rely heavily on self-reported impact. So how can we know whether or not a teacher development programme is effective?

Measuring teacher development

Most sources of information on the impact of professional development on teachers and their pupils rely on snapshots of teachers’ performance – for example, classroom observations and quantitative measures of pupils’ outcomes through summative assessment. Both of these measures are problematic. Classroom observations, frequently used to evaluate performance for the award of QTS or performance management, tell us little about development and come with demonstrable concerns over reliability and validity (Coe et al., 2014; Mihaly et al., 2013; Strong et al., 2011). As a measure of teacher performance, pupil outcomes fail to control for out-of-school contextual factors or to account for the varied impact of multiple teachers over time (Cambridge CEM, 2019; Shakeshaft et al., 2013). Studies have also failed to demonstrate the impact of classroom observation on pupil outcomes (EEF, 2018). 

As a result, neither of these approaches is particularly helpful for evaluating teacher development courses, especially at scale. Hybrid teacher development programmes have also made gauging impact more complex. Metrics generated by learning management systems for online units of study can give insight into teachers’ experience, but tell us little about the complexity of translating teacher development content into classroom practice.

For example: 

  • Measures of engagement, such as attendance, are poor proxies for what teachers may have learnt.
  • Teachers’ interest and participation in their learning can provide insight into their motivation for learning, but are subjective and may have no impact in the classroom.
  • Satisfaction and other factors not related to the programme’s effectiveness are inadequate measures for evaluating the impact of a programme, and surveys can be subject to selection bias.
  • Teacher knowledge is complex, and while recall of curriculum content is valuable, as a measure it fails to account for the application of that knowledge and so must be validated with other measures of learning (Singhghutaura, 2017).
  • The demonstration of a teacher’s competence to apply a skill during a training episode is insufficient evidence that they can recall and apply it in their classroom later.


Given the complexity of teacher development and the difficulty of evaluating its impact on pupil outcomes, the evaluation process requires triangulation of multiple sources of evidence. Evaluation of teacher development programmes should therefore consider the intended development of teachers’ knowledge, competence and decision-making, and what teachers can apply in their classroom. These are attainable measures that can help programme designers and evaluators to better understand the effective mechanisms of teachers’ professional growth.

Developing new tools 

Evaluating large-scale teacher development programmes poses a particular challenge in ensuring high levels of validity. To combat this challenge, Teach First and researchers at the University of York, led by Professor Robert Klassen, have been developing three assessment tools to use with teacher development programmes: knowledge checks to evaluate the development of teachers’ knowledge; bespoke self-beliefs inventories to track shifts in self-efficacy; and scenario-based learning (SBL) to explore decision-making. While we recognise the limitations of each approach, the combination is designed to help us better understand the effectiveness of our programmes and the development of the teachers participating in them. 

To understand changes in teachers’ knowledge of key aspects of the curriculum, robust knowledge checks are integrated at the start and end of a learning sequence, serving both formative and summative purposes. The automatically graded multiple-choice quizzes provide feedback to move teachers forward in their learning, while ensuring a level of challenge that continues to motivate teachers to engage with their online learning. Rather than simplistic recall questions, items have been developed that reflect greater depths of knowledge, such as requiring teachers to evaluate or assess the most appropriate options. Teachers must possess in-depth knowledge and a degree of discriminating judgement to select the correct option from a list that includes plausible distractors. As shown in Figure 1, teachers’ aggregate performances can provide insight into their starting points and development across a whole year of a two-year programme. The tool also signposts areas of the curriculum where teachers do not retain key learning and where programme improvements are needed.

Figure 1: Changes in mean pre- and post-scores across six modules of the Teach First Early Career Framework programme (n = 2,500 to 5,000+)
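
As a rough illustration of the kind of aggregation that sits behind a chart like Figure 1, the sketch below computes mean pre- and post-scores, and the change between them, for each module. It is a minimal sketch under stated assumptions rather than the analysis used for the programme: the function name `mean_change_by_module`, the record layout and the scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_module(records):
    """Summarise pre/post knowledge-check scores per module.

    `records` is an iterable of (module, pre_score, post_score) tuples,
    one per teacher per module. This layout is an assumption made for
    illustration only.
    """
    scores = defaultdict(lambda: {"pre": [], "post": []})
    for module, pre, post in records:
        scores[module]["pre"].append(pre)
        scores[module]["post"].append(post)

    summary = {}
    for module, values in scores.items():
        pre_mean = mean(values["pre"])
        post_mean = mean(values["post"])
        summary[module] = {
            "n": len(values["pre"]),        # number of paired responses
            "pre_mean": pre_mean,
            "post_mean": post_mean,
            "change": post_mean - pre_mean,  # cohort-level change, not individual
        }
    return summary


# Made-up percentage scores, purely illustrative.
records = [
    ("Module 1", 50, 70),
    ("Module 1", 60, 80),
    ("Module 2", 40, 55),
    ("Module 2", 60, 65),
]
summary = mean_change_by_module(records)
for module, stats in summary.items():
    print(module, stats)
```

A summary of this kind reports cohort-level change only; it says nothing about why scores moved.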

The teacher self-beliefs inventory seeks to understand changes in teachers’ beliefs about their capabilities throughout a programme. Teachers reflect and record judgements against a scale of teacher self-efficacy, a measure of their perceived capabilities in specific areas of their teaching (Bandura, 1986). While there are existing teacher self-efficacy scales (e.g. Skaalvik and Skaalvik, 2007; Tschannen-Moran and Woolfolk Hoy, 2001), we sought to develop a novel scale that incorporates contemporary conceptions of effective teaching. Our scale was informed by professional frameworks and the structure of Evidence Based Education’s ‘Great Teaching Toolkit’ (Coe et al., 2020). Self-efficacy is of particular interest because of its influential role in teacher development and its established empirical relationships with teachers’ practices, enthusiasm and commitment (Klassen and Tze, 2014). The results from this tool are analysed against demographic and group information, such as teaching phase or subject, to understand the differences in perceived capabilities and to inform subsequent programme improvements.

Teacher development requires teachers not only to know more and to believe that they are effective, but also to make successful decisions in the classroom. SBL (sometimes called case-based learning or near-world simulation) is a promising area of development in teacher education, whereby programme members engage with realistic scenarios of critical incidents related to a specific area of teaching (Klassen et al., 2021). As a component within a module, a teacher is presented with a complex classroom scenario and asked to evaluate the appropriateness of several possible responses. A panel of expert teachers has previously rated how appropriate each option is in the classroom, and each option generates additional feedback for the teacher. In addition to providing insight into teachers’ decision-making and how this may shift throughout a programme, this tool represents an integrated reflection-feedback cycle that enables teachers to think deeply about an area of their practice, gain insight into the experts’ reasoning and re-evaluate their own perspective. This tool is built into the programme at strategic points to provide opportunities for programme members to monitor their professional development and build on these experiences in future SBL components.
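
One way to picture how a teacher’s SBL responses might be compared with the expert panel is sketched below. The scoring rule is an assumption made for illustration: the article does not describe how responses are scored, and the rating scale, the option labels and the function name `alignment_with_experts` are all hypothetical. Here, alignment is taken as the mean absolute distance between the teacher’s ratings and the panel’s ratings, so a lower value means closer agreement.

```python
from statistics import mean


def alignment_with_experts(teacher_ratings, expert_ratings):
    """Mean absolute distance between teacher and expert-panel ratings.

    Both arguments map an option label to an appropriateness rating on
    the same numeric scale (assumed here to be 1-4). A result of 0 means
    the teacher rated every option exactly as the panel did.
    """
    if teacher_ratings.keys() != expert_ratings.keys():
        raise ValueError("Teacher and expert ratings must cover the same options")
    distances = [
        abs(teacher_ratings[option] - expert_ratings[option])
        for option in expert_ratings
    ]
    return mean(distances)


# Made-up ratings for one scenario with three response options.
expert = {"A": 4, "B": 1, "C": 3}
teacher = {"A": 3, "B": 1, "C": 1}
alignment = alignment_with_experts(teacher, expert)
print(alignment)
```

Tracking a figure like this at several points in a programme would show whether teachers’ judgements move towards the panel’s over time, although other scoring rules are equally plausible.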

From data to conclusions

When drawing conclusions from these tools, we must acknowledge their limitations. For example, a component of teachers’ improved performance shown in Figure 1 may be due to the practice effect of familiarity with quiz questions. The self-report nature of the self-beliefs inventory may reflect social desirability bias, as teachers report levels of confidence that they feel would be viewed favourably. Therefore, inferences drawn from these tools are made at cohort and subgroup levels rather than at the level of the individual. This enables us to analyse and consider potential biases relating to, for example, gender or ethnicity. Findings can then be incorporated into the iterative development, refinement and validation cycle of programmes. Further explanatory qualitative evaluation can also be carried out, such as on the new NPQ for ‘Leading Teacher Development’, where focus group discussions were used to help understand whether considerable self-efficacy growth in ‘professional development’ was due to programme design or expertise gained through practical leadership experience.

Early indications are that these tools provide far greater insight into the impact of our teacher development programmes, with statistical correlations supporting the principle of triangulation. They will contribute to an increased understanding of how teachers develop across programmes throughout differing stages of their careers. Ultimately, we hope that this work will pave the way for ITT and teacher development providers – schools, trainees and mentors – to have a more objective, granular understanding of what trainees have learned and what they can do, and therefore enable them to better identify further development needs.
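
To make the idea of correlating measures concrete, the sketch below computes a Pearson correlation between two cohort-level measures, such as mean knowledge-check gain and mean self-efficacy gain per subgroup. It is a minimal sketch under stated assumptions: the article does not say which statistic was used, and the function name `pearson_r` and the figures are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("Correlation is undefined when a sequence has no variation")
    return cov / sqrt(ss_x * ss_y)


# Made-up subgroup-level gains on two measures, purely illustrative.
knowledge_gain = [1, 2, 3, 4]
self_efficacy_gain = [2, 1, 4, 3]
r = pearson_r(knowledge_gain, self_efficacy_gain)
print(round(r, 2))
```

A positive correlation between independent measures is consistent with triangulation, but on its own it does not show that the programme caused the change.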
