The Case for Ofsted: is there one?

“We have done more to raise standards in 21 years of existence than any other organisation.” (Michael Wilshaw, Chief Inspector of Schools in England, 2014)

“The reality is that Ofsted is no longer fit for purpose, if it ever was. It […] does nothing to improve the quality of education.” (Bernadette Hunter, National President, NAHT, 2013)

1. Introduction

In 1992, John Major’s government, dissatisfied with what it saw as the ‘progressive’ tendencies of existing school inspection systems, established a new school accountability system in England. At its centre was a new national body, the Office for Standards in Education (Ofsted), charged with supervising the inspection of all state-funded schools in England. Then, as now, Ofsted’s defining mantra was “improvement through inspection” (Ofsted 1993:8).

From its beginnings, Ofsted’s validity, reliability and value for money were questioned by the research community (e.g. Fitz-Gibbon 1995), and its alleged negative impacts on morale and professionalism were articulated by teachers (e.g. Brimblecombe, Ormston and Shaw 1995, Jeffrey and Woods 1996). In the twenty-one years since its first inspection, attitudes to Ofsted have hardly changed. It is still seen by many as an unaccountable and unregulated body, which bases its judgements on questionable evidence and enjoys disproportionate power and influence over schools and teachers (see, for example, comments in Gray and Gardner 1999, Coe 2013a, NASUWT 2014, NUT 2014a).

When I wrote this essay in April 2014, the three main teachers’ unions were holding their annual conferences and, unsurprisingly, all had something to say about Ofsted. Christine Blower, General Secretary of the NUT, described it as a “destructive system of inspection” (NUT 2014b). Her counterpart at NASUWT, Chris Keates, regarded Ofsted as “seriously tainted” (NASUWT 2014); and Mary Bousted, General Secretary of ATL, opined that “Ofsted can no longer claim that its inspection reports are worth the paper they are written on” (Exley 2014).

Rhetoric of this kind from those most frequently at the sharp end of Ofsted’s judgements is unsurprising, but the bad blood had well and truly spilled over into government, with Chief Inspector Michael Wilshaw accusing the then Secretary of State for Education, Michael Gove, of briefing against the inspectorate through think tanks such as Civitas and Policy Exchange (Withnall 2014). Although Wilshaw later backpedalled on his accusation (Sellgren 2014), both organisations had indeed publicly questioned Ofsted’s effectiveness (de Waal 2008, Waldegrave and Simons 2014).

Despite almost continuous criticism from teachers and researchers since its inception, however, Ofsted has enjoyed support from three consecutive governments, four political parties, and eleven Secretaries of State for Education. In 1998, Ofsted’s responsibilities were extended to include inspection of Local Education Authorities (LEAs) and teacher training establishments (Elliot 2012). Then, in 2001, inspection of day care and childminding was added (Plomin 2001). Subsequently, in 2007, inspection of post-16 government-funded education, social care services for children, and the welfare inspection of independent and maintained boarding schools were added to its remit (Carvel and Ward 2007). It now also oversees voluntary inspection of British Schools Overseas (HMG 2014). In the 21 years of its existence, Ofsted has formally updated its inspection framework only twice. All of this suggests robust government satisfaction with Ofsted’s effectiveness.

Given the collective uncertainty over Ofsted’s fitness for purpose, summarised in the two quotes opening this essay, what evidence is there that Ofsted lives up to its defining mantra of ‘improvement through inspection’? Drawing on empirical evidence, this essay addresses that question and assesses whether any conclusion can confidently be drawn about the extent to which inspection acts as a catalyst for school improvement.

2. Defining School Improvement

It is all very well for Ofsted’s Chief Inspector and the General Secretaries of teachers’ unions to state that inspection either is or is not a catalyst for school improvement, but until ‘school improvement’ has been defined such statements are essentially meaningless. There is no clear consensus on what constitutes ‘improvement’. ‘Raising standards’ recurs regularly as a theme in Ofsted’s own pronouncements, with successive Chief Inspectors defining Ofsted’s role as one of “improving standards of achievement and quality of education” (Shaw et al. 2003:64). But this does not get us any closer to understanding what is meant by those terms. Shaw et al. note that Ofsted considers exam results to be indicative of standards and quality, and therefore an appropriate measure “to gauge the effect of its agents’ inspections” (Shaw et al. 2003:65).

However, in criticising Ofsted, teachers have accused it of failing to acknowledge the ethos and dedication of the staff, hampering innovation, and ignoring teachers’ efforts to inspire students’ curiosity (NASUWT 2014). So one might infer that, for some at least, these are appropriate measures of success.

Another measure of school effectiveness might relate to teaching approaches. Thus, the extent to which a school uses teaching approaches that have been shown empirically to improve student achievement might serve as an indicator of how effective that school is. By extension, encouraging the adoption of those approaches where they are absent might be regarded as an appropriate catalyst for improvement. However rational such an approach might seem, Chief Inspector Michael Wilshaw has been at pains to emphasise that Ofsted favours no particular teaching style (RSA 2012). He refers to Ofsted’s latest guidance to inspectors, which states that they “must not expect teaching staff to teach in any specific way or follow a prescribed methodology” (Ofsted 2012:34). Yet it has been convincingly demonstrated that inspectors continue to penalise teachers for not following particular pedagogical approaches (Old 2013). Ofsted does not appear to have reached an internal consensus about the relationship between teaching approaches and educational quality, let alone a consensus that incorporates the views of others.

This lack of agreement on what is termed ‘improvement’ invites uncertainty over the construct validity of any assessment (or proclamation) of the effects of inspection, and makes assessing the research evidence problematic.

3. Inferring Effects of Inspection

Inferring the effects of inspection on improvement is further complicated by inconsistencies in methodological quality among the relevant studies. There is a dearth of studies that can be considered sufficiently methodologically rigorous to allow for causal inferences to be made confidently (de Wolf and Janssens 2007:388).

4. Systematic Reviews

Faced with the sheer volume of literature on school inspection, the problems associated with defining school improvement and the uneven methodological quality of the published research, I have chosen to concentrate on systematic reviews of empirical research that address the primary question of the effect of school inspection on school improvement. Good systematic reviews assess the research literature in methodical and transparent ways: they define the question they seek to answer, identify and critically assess the available evidence, and synthesise the findings to draw relevant conclusions. These conclusions can then be used by practitioners to inform their policy and practice, and by researchers to plan worthwhile future research.

I located four relevant systematic reviews (see footnote) and assessed the quality of each using the guidelines suggested by Oxman and Guyatt (1988). Of the four, a review conducted by Nelson and Ehren in 2014 was the most methodologically sound and the most recently conducted. It also incorporated the findings of the reviews that preceded it. I have therefore chosen to summarise and discuss the evidence on the effects of school inspections in the light of their findings.

5. Review and synthesis of evidence on the (mechanisms of) impact of school inspections (Nelson and Ehren, 2014)

5.1. Review question

The review aims “to identify and summarize findings from international empirical research on the impact of inspections” (p1). This broad scope is qualified by characterising ‘impact of inspection’ in four main ways:

  • School improvement
  • Improvement/introduction of school self-evaluation
  • Behavioural change of teachers and school leaders
  • Student achievement results

These characterisations help to focus the scope of the review. However, with the exception of student achievement results, which can be regarded as relatively objective, they are not qualified further, so the caveats I have expressed about defining improvement still stand.

5.2. Methodology

The reviewers describe a thorough search of the literature, using a range of resources, including general and education-specific databases, web portals, paper journals, and books. They list their key words and search terms to allow for replication and provide an annotated bibliography for the 107 studies included in the review. While studies from a range of countries are covered, only reports written in English are included. Each study was assessed for its relevance to the overarching aim of the review and for its use of empirical evidence, and was weighted for internal validity using the Maryland Scientific Methods Scale, a five-point scale on which 5 indicates the highest level of methodological rigour. While the weighting appears in the annotated bibliography, the narrative account of the research does not make clear how this weighting was taken into account when drawing conclusions from the synthesis. In any case, it is telling that of the 107 studies cited, only five were rated 3 or higher on the scale. In commenting on the paucity of studies allowing causality to be confidently inferred, the reviewers add their voice to de Wolf and Janssens’ (2007) criticism of the field.

5.3. Findings

Nelson and Ehren introduce their review by stating that “the reviewed research suggests that inspection may” (p1, their emphasis) have an effect on any or all of their four characterisations of ‘impact’, and that these impacts are not all positive. They also describe previous systematic reviews, noting a high degree of agreement among the conclusions they draw. They go on to summarise their findings under their four characterisations of ‘impact’.

5.3.1. School improvement

The reviewers find that school improvement following inspection is dependent on four conditions: acceptance, feedback, support for improvement, and leadership.

Acceptance

A small number of studies suggest that, for school improvement to take place following inspection, schools must accept the findings of the inspection as valid. This may seem axiomatic, but its importance should not be overlooked, especially in the light of teachers’ unions’ rejection of Ofsted’s validity.

In addition, studies disagree about the validity and reliability of some of Ofsted’s assessment tools. For example, Coe (2013b) describes the frailty of classroom observation as an indicator of teacher effectiveness. Drawing on the work of Mihaly et al. (2013) on the reliability of lesson observations, Coe warns that inter-rater reliability is so variable that teachers rated as ‘outstanding’ should do everything they can to avoid being rated again: three times out of four they would be downgraded. He then paints a worrying picture of the validity of lesson observations, concluding that judging teacher effectiveness through lesson observation, even by experienced observers, is no more accurate than tossing a coin (Coe 2013b, citing Strong et al. 2013). In contrast, Hussain (2011) asserts that two Ofsted inspectors are likely to arrive at similar conclusions about the quality of classroom teaching, but comments that there is “almost no empirical evidence” on the validity of observations (Hussain 2011:12).

Recent criticism of Tribal, one of the companies sub-contracted by Ofsted to conduct inspections, has done little to support the image of Ofsted’s trustworthiness. Tribal has been accused of generating ‘cut-and-paste’ reports (BBC 2012), and of having conflicts of interest related to the academisation of failing schools (Harris 2013, #teacherROAR 2013). In a leaked email to all of its inspectors, Tribal’s Directors of Inspection acknowledged Ofsted’s criticism of the standard of the company’s reports (Bald 2013).

If acceptance is one of the keys to the effectiveness of school inspection, it would seem prudent for Ofsted to seek to address the uncertainties about, and criticisms of, its practices.

Feedback

Good-quality feedback is seen as important in encouraging school improvement. Studies from the UK, Sweden, and the USA all suggest that quality feedback tailored to the needs of the school, including specific advice on how to improve (not just what to improve), is more likely to have an impact on school improvement.

Support for improvement

Echoing the need for precise feedback on how to improve practice, three studies cited in the review find that support for improvement after inspection is important. It is interesting, then, that HMI inspection, which Ofsted replaced in 1992, is seen to have been much more concerned with helping LEAs to develop and maintain supportive relationships with their schools with a view to promoting sustained improvement. This was perceived as embodying a developmental ethos rather than the more judgemental approach of Ofsted (Gray and Gardner 1999, Field et al. 1998).

Leadership

Strong leadership, which translates external demands for change into an internally owned desire to improve, is reported to be important in studies from the Netherlands, the USA and Germany. The German studies deemed it important that inspection results are not made public but are instead used internally by schools to guide development. By contrast, one of the key pillars of Ofsted’s framework is the publication of its reports.

5.3.2. Improvement/introduction of school self-evaluation

School self-evaluation (SSE) is viewed as an indirectly related feature of inspection systems in many countries. The review cites one study in the UK that reports satisfaction on the part of school leaders when Ofsted inspections validate the school’s assessment of itself. However, a number of studies warn that SSEs are often written to comply with the expectations of the inspectorates rather than to reflect honestly on the strengths and weaknesses of the school. Good leadership is posited as key to effective SSE, so Ofsted’s capacity to remove ineffective leaders could be used to justify its claim of a direct effect on school improvement. However, Courtney (2012) observes that SSE has been marginalised in the UK following the “abolition of the self evaluation form” (Courtney 2012:13).

5.3.3. Behavioural change of teachers and school leaders

The review cites three UK studies suggesting that Ofsted has a powerful influence over actions taken, or intended to be taken, by teachers and school leaders following inspection. Cited in the references, but not fully explored in the review, are studies by Brimblecombe et al. (1996) and Chapman (2001), which found that 38% and 22% of teachers respectively were inclined to consider changing their practice as a result of Ofsted inspections. Neither study ascertained whether this minority of teachers actually went on to do so.

The review describes Courtney’s (2012) case study of head teachers’ reactions to inspection under the revised Ofsted framework of 2012. Courtney claims that schools prioritise only those areas being inspected (considerably slimmed down in the revisions), to the detriment of other areas. He characterises this as inspection for control rather than for improvement. Whether this is a bad thing depends on whether the areas Ofsted’s framework prioritises are demonstrably the right priorities.

5.3.4. Student achievement results

This section deals with perhaps the only objective measure of school effectiveness. Exam results are widely used (if not universally accepted) as measures of both individual and corporate success at school, and thus one might expect them to be common in the research literature as a yardstick by which to assess the effects of school inspections. This is not so.

The only study in the review rated 5 on the Maryland Scientific Methods Scale randomly allocated schools to receive an inspection or not, then compared their subsequent exam results. It concluded that “inspections do no harm but seem to have little or no effect on student performance” (Luginbuhl, Webbink and de Wolf 2009:235).

The other studies reviewed report a mix of conclusions. Some associate inspection with a fall in exam results (Cullingford, Daniels and Brown 1999, Rosenthal 2003), while others find improved exam results among already high-achieving schools (Shaw et al. 2003) and among schools judged to be failing (Allen and Burgess 2012, Hussain 2012).

5.3.5. Unintended consequences of school inspections

Finally, the review considers evidence about the unintended consequences of school inspection. It describes a number of ways in which schools may engage in either intended or unintended strategic behaviour in response to inspection. The former includes window dressing, fraud, misrepresentation, and other instances of ‘gaming’ the system. In the latter, schools concentrate only on the elements they know will be assessed in inspections. The reviewers conclude that “these types of behaviour may negatively affect student achievement” (p8, their emphasis).

6. Conclusion

Returning to the question with which this essay began: Is school inspection, and by extension Ofsted, a catalyst for school improvement?

I have presented a summary of the findings of a synthesis of the best available evidence on the effects of school inspection. Although many of the studies included in the review are methodologically weak, in the absence of studies that allow confident causal inferences this remains the best evidence available, and Ofsted should use it to inform its practice. It is unclear to what extent it does so. The review suggests that quality feedback from inspectors is helpful, so it is encouraging that Ofsted has attempted to improve the way it feeds back to teachers (Chapman 2001). The evidence also suggests that inspection helps very low- and very high-performing schools to improve, and recent changes to Ofsted now place greater emphasis on inspecting such schools (Exley 2014b). In contrast, however, the review’s conclusions about support for improvement, the (non)publication of reports, and the validity and reliability of lesson observations appear to contradict Ofsted’s actual practice (e.g. Waldegrave and Simons 2014, Courtney 2012, Coe 2013a).

The lack of quality research allowing confident causal inferences about the effects of inspections lends weight to Coe’s (2013a) assertion that the onus is on Ofsted to prove that its judgements are correct, not on its critics to prove otherwise. I agree, to an extent. If Ofsted is to be taken seriously, it must draw on empirical evidence generated by high-quality studies to support what it does. In the absence of such evidence, it should call vigorously for such studies to be conducted.

However, I also feel that teachers’ unions are letting their members down if they do not hold themselves to an equally high standard when criticising Ofsted. They have a professional obligation to refer to the best evidence to support their assertions, rather than issue non-specific proclamations about ‘fitness for purpose’, citing anecdotal evidence (NASUWT 2014) and using rhetoric more suited to the dispatch box (Exley 2014a). Unions must be equally vocal in calling for high-quality empirical research to be conducted, so that the profession can be confident in its assessment of the effects of inspection. Until then, the collective uncertainty summarised by the quotes opening this essay remains.


The three other systematic reviews considered for this essay are marked in the references using an asterisk.


Michael Wilshaw quote:

Bernadette Hunter quote:

