Impact Journal

Identifying evidence-based professional development: Programmes, forms and mechanisms

Written by: Sam Sims and Harry Fletcher-Wood

Teachers acquire new skills in a number of ways: through experience, through working with other high-quality teachers and through participating in effective professional development (for a summary, see Allen and Sims, 2018). There are several ways in which policymakers and school leaders can try to affect teacher quality through each route. But effective professional development (PD) has the potential to be both scalable and portable, making it a particularly attractive option for improving teaching quality.

The difficulty lies in understanding the apparently innocuous word ‘effective’. How do we know what is effective, as opposed to just plausible, interesting or enjoyable? How do research funders like the Education Endowment Foundation (EEF) in England know which types of interventions to test? And how do school leaders and teacher trainers know which types of PD to opt for? In this article, we set out three different foci for identifying effective PD: programmes, forms and mechanisms. For each, we provide a definition, give examples and discuss the advantages and disadvantages of the approach.

Programmes

PD programmes are specific sets of activities and materials that have their own identity and tend to be located in, or associated with, specific people or institutions. By activities, we mean the actions and tasks that teachers go through, and by materials, we mean the concrete stimuli and curricular resources that are provided for teachers. In well-established programmes, the activities are sometimes codified in a programme manual and the materials can sometimes be acquired off-the-shelf as part of a resource pack. These will often be subject to copyright, are therefore specific to the programme and tend to come under a brand name. The identity of the programme is often intertwined with the team that developed it or the institution in which it is based.

For example, the Dialogic Teaching programme, developed by Robin Alexander at the University of Cambridge, aims to improve the way in which teachers use discussion in the classroom (Alexander, 2017). It consists of a set of activities, including induction training, in-school mentoring, discussion of video and audio recordings of teaching, and planning and review activities. All of this is encapsulated in a 68-page handbook that describes what happens during each week of the course. The programme also provides teachers with materials, including the book on which the programme is based, proformas to support the planning and review sessions, and a laminated sheet summarising the dialogic repertoires that the programme focuses on developing.

The specific and codified nature of PD programmes means that they are often suitable for evaluation using experimental or quasi-experimental methods. This involves providing access to the programme for one ‘treatment’ group and denying it to another ‘control’ group, and then measuring the differences in outcomes between these two groups. Randomised controlled trials (RCTs) determine which teachers end up in the treatment or control group by, in essence, the flip of a coin. For the same reason that, over many coin flips, we would expect to get approximately 50 per cent heads and 50 per cent tails, RCTs tend to produce a treatment group whose teachers have (approximately) identical characteristics to those in the control group. Because the two groups have near-identical characteristics, any differences in outcomes are inferred to be the result of the one remaining difference between the two groups: receipt of the PD programme. The Dialogic Teaching programme, for example, has been evaluated using just this method and found to have a positive impact in science (Jay et al., 2017).
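The coin-flip logic can be made concrete with a small simulation. This is a minimal Python sketch, not drawn from any of the trials cited: the pool of 2,000 teachers, the assumed 30 per cent share of ‘highly motivated’ teachers and the helper names `allocate` and `share_motivated` are all hypothetical. It allocates each teacher to an arm by a simulated coin flip and then compares the share of motivated teachers in each arm.

```python
import random


def allocate(teachers, rng):
    """Allocate each teacher to 'treatment' or 'control' by a fair coin flip."""
    groups = {"treatment": [], "control": []}
    for teacher in teachers:
        arm = "treatment" if rng.random() < 0.5 else "control"
        groups[arm].append(teacher)
    return groups


def share_motivated(group):
    """Proportion of teachers in a group flagged as highly motivated."""
    return sum(t["motivated"] for t in group) / len(group) if group else 0.0


# Hypothetical pool: 2,000 teachers, roughly 30 per cent of them 'highly motivated'.
rng = random.Random(42)
teachers = [{"id": i, "motivated": rng.random() < 0.3} for i in range(2000)]

groups = allocate(teachers, rng)
for arm, members in groups.items():
    print(arm, len(members), round(share_motivated(members), 3))
```

With a pool this large, the two shares usually land within a few percentage points of each other; re-running with a different seed or a much smaller pool shows how the gap between the arms varies.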

While the results of the Dialogic Teaching trial certainly provide support for its use in schools, we might want to hold off declaring it to be effective PD. Several PD programmes have shown positive results in initial RCTs only for larger, follow-up RCTs to find no impact. For example, Thinking, Doing, Talking Science achieved impressive results in an efficacy trial with 41 schools (Hanley et al., 2015), but the impact on student learning disappeared when the same programme was retested in a larger RCT with 200 schools (Kitmitto et al., 2018).

There are at least two reasons why this might happen. One is that when the programme is scaled up, the original developers have less influence and the programme can become warped. Another is that the randomisation did a poor job of achieving balance. Although we would always expect many coin tosses to result in approximately 50 per cent heads and 50 per cent tails, every now and then somebody will flip a sizable majority of heads. In trial terms, this might result in more of the motivated teachers, for example, ending up in the treatment group. Better outcomes in the treatment group could then reflect either the impact of the programme or the effect of having more motivated teachers in the treatment group. We just don’t know. For both reasons, replicating the results of trials is very important. Dialogic Teaching is currently undergoing just such a replication attempt. Other PD programmes that have shown positive evidence of impact in replicated RCTs include Reading Recovery (Sirinides et al., 2018) and My Teaching Partner (Allen et al., 2015).
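To illustrate the second explanation, the sketch below estimates how often coin-flip allocation leaves the two arms noticeably unbalanced, using the trial sizes mentioned above (41 and 200 schools). The 30 per cent share of motivated schools, the 15-percentage-point threshold for ‘poor balance’ and the function name `imbalance_rate` are illustrative assumptions; the sketch also ignores refinements such as stratified randomisation that real trials may use.

```python
import random


def imbalance_rate(n_schools, p_motivated=0.3, threshold=0.15, n_trials=2000, seed=1):
    """Estimate how often coin-flip allocation leaves the two arms differing by
    more than `threshold` in their share of motivated schools (all hypothetical)."""
    rng = random.Random(seed)
    poorly_balanced = 0
    for _ in range(n_trials):
        # [motivated count, total count] for each arm of one simulated trial
        treatment, control = [0, 0], [0, 0]
        for _ in range(n_schools):
            motivated = rng.random() < p_motivated
            arm = treatment if rng.random() < 0.5 else control  # the coin flip
            arm[0] += motivated
            arm[1] += 1
        if treatment[1] == 0 or control[1] == 0:
            poorly_balanced += 1  # every school landed in one arm
            continue
        gap = abs(treatment[0] / treatment[1] - control[0] / control[1])
        if gap > threshold:
            poorly_balanced += 1
    return poorly_balanced / n_trials


for n in (41, 200):  # trial sizes taken from the examples above
    print(f"{n} schools: {imbalance_rate(n):.1%} of simulated trials poorly balanced")
```

Under these assumptions, serious imbalance turns up in a substantial minority of simulated 41-school trials but only rarely in 200-school trials, which is one reason a larger replication can fail to reproduce a smaller trial's result.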

A major advantage of examining the effectiveness of PD programmes (as opposed to forms or mechanisms) is that they are relatively unambiguous. It is clear what we are talking about when we discuss the Dialogic Teaching programme: Alexander’s book explains the theory of the programme, and the programme manual sets out the precise activities and materials involved. This is an important advantage given the ambiguity of language in the social sciences and the potential for misunderstanding. For example, two teachers might both experience ‘coaching’ PD, but in practice one might be experiencing instructional coaching and the other ‘non-directive’ coaching, which bear little resemblance to each other. Despite this advantage, identifying effective PD programmes may not always be much help to school leaders and policymakers. They may not be able to access effective programmes, for example, because the programmes are based in the US and are not available elsewhere. US programmes may also not translate effectively to schools in other countries (see, for example, Success for All; Miller et al., 2017). Even programmes based in the same country can only expand so quickly, given the need to maintain fidelity to the programme design.

Forms

An alternative approach to trying to identify effective PD is to focus on forms. A form is a type or category of PD that is specified at a higher level of abstraction than a programme. Forms are defined by a set of characteristics: typical, identifying features. Unlike programmes, forms can accommodate variation in the specific materials and activities involved and are not uniquely associated with specific people or institutions.

Instructional coaching, for example, has been defined as (Kraft et al., 2018, p. 553):

“instructional experts work[ing] with teachers to discuss classroom practice in a way that is (a) individualized – coaching sessions are one-on-one; (b) intensive – coaches and teachers interact at least every couple of weeks; (c) sustained – teachers receive coaching over an extended period of time; (d) context specific – teachers are coached on their practices within the context of their own classroom; and (e) focused – coaches work with teachers to engage in deliberate practice of specific skills.”

All PD programmes with these characteristics are examples of instructional coaching, regardless of other characteristics, such as the specific materials they employ or whether teachers receive additional training sessions. One example is the Content Focused Coaching programme (Matsumura et al., 2012).

A form cannot be evaluated in the same way as a programme. This is because while you can allocate a teacher to receive a set of activities and materials that make up a programme, you cannot allocate them to receive a category of interventions. A more fruitful way to evaluate a form of PD is to look at the evidence across many programmes of a particular form. Meta-analyses of RCTs or good quasi-experimental studies allow us to do this by answering two important questions. Firstly, do examples of the form work on average? Kraft et al. (2018) find that, on average, instructional coaching has a positive impact. Secondly, under what circumstances does it appear to work best? Kraft et al. (2018) show, for example, that combining coaching with group training and instructional resources is associated with a larger impact, whereas the amount of coaching provided is not.
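The first question, whether a form works on average, is answered by pooling effect sizes across studies. As a minimal sketch, the code below uses the simplest pooling method, a fixed-effect inverse-variance weighted mean; this is an assumption made for illustration and not necessarily the method Kraft et al. (2018) used. The four effect sizes and standard errors are invented, not taken from any study, and `pooled_effect` is a hypothetical helper name.

```python
import math


def pooled_effect(effects, std_errors):
    """Fixed-effect meta-analysis: weight each study's effect size by the
    inverse of its sampling variance; return (pooled estimate, its standard error)."""
    weights = [1 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total
    return estimate, math.sqrt(1 / total)


# Hypothetical effect sizes (in standard deviations) from four coaching programmes.
effects = [0.25, 0.05, 0.15, -0.02]
std_errors = [0.10, 0.05, 0.08, 0.12]

estimate, se = pooled_effect(effects, std_errors)
low, high = estimate - 1.96 * se, estimate + 1.96 * se  # approximate 95% interval
print(f"pooled effect {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With these invented numbers the pooled effect is positive (about 0.09) even though one programme shows a negative effect, which echoes the caution that an effective form does not guarantee that every instance of it works.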

Seeking effective forms of professional development has some advantages over seeking effective programmes. A form is more portable than a programme: it is generally easier to design PD around a set of characteristics than to reproduce a specific programme. Forms can also be adapted more easily to suit the needs of a particular school. However, knowing that a form of professional development is effective on average does not guarantee that a particular instance will work. For example, some PD programmes that include instructional coaching have not had the desired impact on student learning (see Garet et al., 2011). Identifying effective forms also requires many good evaluations, whereas a programme can be established as effective (under the replication standard proposed above) with as few as two studies. Indeed, instructional coaching is the only form of PD that we know of to have demonstrated a positive effect in a meta-analysis of rigorous studies. Identifying effective forms will therefore take time.

Mechanisms

A third approach is to focus on identifying effective mechanisms. A mechanism is an ‘active ingredient’ – that is, it could not be removed from PD without making that PD less effective. Susan Michie and colleagues further define a mechanism as an ‘observable, replicable and irreducible component of an intervention designed to alter or redirect causal processes that regulate behaviour’ (Michie et al., 2013, p. 5). ‘Observable’ implies that it is concrete as opposed to abstract. For example, being ‘inspirational’ is not observable. ‘Replicable’ implies that it could be used in many contexts. Having a single celebrity speaker deliver part of a PD programme is arguably not replicable, for example, since they can only be in one place at a time. ‘Irreducible’ implies that it cannot be split into further constituent parts. This is intended to emphasise that mechanisms are basic building blocks. Finally, in the context of teacher PD, the ‘behaviour’ being altered should be interpreted as teachers’ practice.

Michie and her team have exhaustively identified 93 such mechanisms, organised in 16 clusters. For example, Cluster 1, ‘Goals and planning’, covers nine specific mechanisms, including planning of implementation and planned reviews of whether specific goals were achieved. As well as giving examples of specific mechanisms, we can characterise PD programmes in terms of the interlocking set of mechanisms of which they are made up. For example, Content Focused Coaching (Matsumura et al., 2012) appears to consist of nine mechanisms, including providing an observable example of a technique, providing communication from a credible source in favour of a technique, and prompting rehearsal of a specific technique.

While useful, Michie’s taxonomy only catalogues potentially effective mechanisms: the empirical evidence for the efficacy of each mechanism has not yet been established. Moreover, the value of a specific mechanism depends on the other mechanisms with which it is combined. For example, prompting rehearsal of a technique may not be of any use if the teacher has not been provided with an observable example of that technique. So, how can we identify which mechanisms will be effective for PD? We have suggested elsewhere that this requires two types of evidence. Firstly, the mechanism should have empirical evidence of being effective for changing practice across a range of contexts, including outside education (Clarke et al., 2014). This increases our confidence that the mechanism is a genuinely reproducible, active ingredient. Secondly, the mechanism should appear in PD programmes or forms that have evidence of being effective. This increases our confidence that the mechanism can be combined with others to improve teachers’ practice in particular.
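The two criteria can be expressed as a simple filter over programmes described as sets of mechanisms. In this minimal Python sketch, the programme names, their mechanism sets and every evidence flag are hypothetical; the mechanism labels only loosely echo the examples mentioned above. For simplicity it checks programmes but not forms, and it ignores the point that a mechanism's value depends on what it is combined with.

```python
# Hypothetical data: each PD programme is described by the set of mechanisms it
# uses and whether it has (replicated) evidence of improving teachers' practice.
programmes = {
    "Programme A": {"mechanisms": {"observable example", "credible source", "prompt rehearsal"},
                    "effective": True},
    "Programme B": {"mechanisms": {"credible source", "action planning"},
                    "effective": False},
    "Programme C": {"mechanisms": {"observable example", "review of goals"},
                    "effective": True},
}

# Hypothetical flags: does each mechanism have evidence of changing behaviour
# across a range of contexts, including outside education?
cross_context_evidence = {
    "observable example": True,
    "credible source": True,
    "prompt rehearsal": True,
    "action planning": True,
    "review of goals": False,
}


def supported_mechanisms(programmes, cross_context_evidence):
    """Return mechanisms meeting both criteria: cross-context evidence AND
    presence in at least one PD programme with evidence of effectiveness."""
    in_effective_pd = set()
    for info in programmes.values():
        if info["effective"]:
            in_effective_pd |= info["mechanisms"]
    return {m for m in in_effective_pd if cross_context_evidence.get(m, False)}


print(sorted(supported_mechanisms(programmes, cross_context_evidence)))
# → ['credible source', 'observable example', 'prompt rehearsal']
```

Here ‘action planning’ is excluded because it appears only in a programme without evidence of effectiveness, and ‘review of goals’ is excluded because it lacks cross-context evidence.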

An important advantage of mechanisms is that they can be deployed and combined flexibly. In addition, when the mechanisms behind an effective programme are accurately understood and specified, this can help to guard against a ‘lethal mutation’ – a variation that is no longer faithful to the underlying principles of the intervention. There are, however, disadvantages to trying to identify effective mechanisms. As with forms, mechanisms leave the content of PD underspecified: they are more or less silent on what teachers should work on. More importantly, perhaps, they also set very high evidential standards. Understanding whether mechanisms work requires both evaluations of teacher PD programmes and evaluations from other contexts, as well as a good deal of interpretative effort to bring these together. The evidence base is not yet sufficient to apply this approach confidently in education, though research is developing rapidly.

Conclusion

In this article, we have set out three possible foci for identifying effective professional development. While we have presented them separately, there are benefits to considering all three levels of the framework when discussing, designing, commissioning and evaluating PD. For example, understanding the mechanisms behind a programme can help to ensure that it is implemented faithfully, or make clear the scope for adaptation. Similarly, understanding when forms appear to be effective can help to identify under what conditions, or in which settings, a specific programme should be applied. Despite their distinctive (dis)advantages, programmes, forms and mechanisms should therefore be considered complementary sources of evidence.

References

Alexander RJ (2017) Developing dialogue: Process, trial, outcomes. In: 17th Biennial EARLI Conference, Tampere, Finland.

Allen JP, Hafen CA, Gregory AC et al. (2015) Enhancing secondary school instruction and student achievement: Replication and extension of the My Teaching Partner-Secondary intervention. Journal of Research on Educational Effectiveness 8(4): 475–489.

Allen R and Sims S (2018) The Teacher Gap. London: Routledge.

Clarke B, Gillies D, Illari P et al. (2014) Mechanisms and the evidence hierarchy. Topoi 33(2): 339–360.

Garet MS, Wayne AJ, Stancavage F et al. (2011) Middle School Mathematics Professional Development Impact Study: Findings After the Second Year of Implementation. NCEE 2011–4024. Washington, DC: National Center for Education Evaluation and Regional Assistance.

Hanley P, Slavin R and Elliott L (2015) Thinking, Doing, Talking Science: Evaluation Report and Executive Summary. London: Education Endowment Foundation.

Jay T, Willis B and Thomas P (2017) Dialogic Teaching: Evaluation Report and Executive Summary. London: Education Endowment Foundation.

Kitmitto S, González R and Mezzanote J (2018) Thinking, Doing, Talking Science: Evaluation Report and Executive Summary. London: Education Endowment Foundation.

Kraft MA, Blazar D and Hogan D (2018) The effect of teacher coaching on instruction and achievement: A meta-analysis of the causal evidence. Review of Educational Research 88(4): 547–588.

Matsumura LC, Garnier HE and Spybrook J (2012) The effect of content-focused coaching on the quality of classroom text discussions. Journal of Teacher Education 63(3): 214–228.

Michie S, Richardson M, Johnston M et al. (2013) The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: Building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine 46(1): 81–95.

Miller S, Biggart A, Sloan S et al. (2017) Success for All: Evaluation Report and Executive Summary. London: Education Endowment Foundation.

Sirinides P, Gray A and May H (2018) The impacts of Reading Recovery at scale: Results from the 4-year i3 external evaluation. Educational Evaluation and Policy Analysis 40(3): 316–335.