Misunderstandings and misrepresentations of randomised trials and what is claimed for them

I recently read this blog post, which presents and expands on Angus Deaton and Nancy Cartwright’s recent thoughts about RCTs. It is worth a read because, in my view, it is an excellent case study of the misunderstanding and misrepresentation of the claims made in favour of RCTs. Deaton and Cartwright must take the lion’s share of the blame here, not the author of the post, though he does add to the commentary some startling assertions. Here are two of many:

RCTs do not have external validity.


A key argument in favour of randomization is the ability to blind both those receiving the treatment and those administering it.

Both assertions are just plain wrong, although the author does demonstrate a more nuanced understanding of the value of randomised trials elsewhere:

The results of RCTs must be integrated with other knowledge, including the practical wisdom of policy makers if they are to be usable outside the context in which they were constructed.

In response to a comment I made on his post, suggesting that Deaton’s and Cartwright’s arguments unfairly target randomised trials when the criticisms they make are equally applicable to all kinds of intervention research, the author responded thusly:


I think that this argument fails to acknowledge the single defining feature of a randomised trial and also misrepresents what is claimed for them by people whom the author has called ‘randomistas’. Moreover, what Deaton considers to be the position of so-called ‘randomistas’, specifically, is irrelevant (or at best thoughtless) when his criticisms are not actually of randomised trials but of all types of research.

My response to the above comment is reproduced below.

You may be correct about how Deaton views ‘randomistas’, but if so, he really needs to give examples of people claiming that the results of RCTs are superior to results obtained using other methods. I am a proud ‘randomista’ and I work with a lot of people who might be classified as such, and the idea that people like me say that the results of RCTs are always superior to alternative methods is just not a familiar one. In fact, when reading reports of RCTs it is common to find loads of caveats about the findings.

People who understand what RCTs are and what they are not know that the only unique feature of the design is that they generate comparison groups by randomly allocating cases to conditions. That’s it.

I don’t think it is controversial for ‘randomistas’ to argue that this is the best way of generating comparison groups that differ only as a result of the play of chance, rather than as a result of some systematic (non-random) characteristic. In any population there will be things that we know and can measure (so for example we could deliberately match cases based on these factors – say age, gender, or test scores). But there are also things that might be relevant that we don’t or can’t know about our participants and therefore can’t take into account when generating comparison groups. If we accept that there are things that we don’t or can’t know about our participants, then the only way around it, if you want to create probabilistically similar groups, is to use random allocation. Random allocation thus acknowledges and accounts for the limitations of our knowledge.

So, the notion of ‘superiority’ centres around the question ‘how confident am I that the groups being compared were similar in all important known and unknown (and possibly unknowable) characteristics?’

Of course, if your research question is one that does not involve comparisons and causal description then RCTs are not appropriate. You would be hard pressed to find a ‘randomista’ arguing that you need an RCT to help understand the views or opinions of a population of interest, for example. In addition you will be unlikely to find a ‘randomista’ arguing that you need an RCT when observational studies have reported very dramatic effects. Take for example the tired old chestnut about not needing an RCT to find out if parachutes work. 99.9% of people who do not open their parachutes after jumping out of a plane die. This is a highly statistically significant finding and is extremely dramatic. There is no need to go beyond observation here.

Unfortunately for us, the effects of interventions in the social sciences are rarely so dramatic. Therefore, one key element in making causal inferences is ensuring that when we compare alternative interventions or approaches we are, in the best way we know how, comparing like with like. This means that any differences in outcome that we observe between groups can be more confidently attributed to the interventions being compared rather than to an effect of non-random differences between groups.

That’s the strength of an RCT.
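For readers who like to see the argument run, here is a minimal sketch of the point about unknown characteristics. Everything in it is made up for illustration: the `hidden` trait stands in for something we cannot measure (motivation, say), and the "self-selected" rule is just one hypothetical way that allocation might depend on it. It is a toy simulation under those assumptions, not an analysis of any real trial.

```python
import random
import statistics


def simulate(n=10_000, seed=1):
    """Compare how well two allocation methods balance an unmeasured trait."""
    rng = random.Random(seed)

    # Hypothetical trait we cannot observe (e.g. motivation), one value per person.
    hidden = [rng.gauss(0, 1) for _ in range(n)]

    # Non-random allocation (assumed rule): people with more of the hidden
    # trait are more likely to end up in the intervention group.
    self_selected = [h + rng.gauss(0, 1) > 0 for h in hidden]

    # Random allocation: a fair coin toss that ignores every characteristic.
    randomised = [rng.random() < 0.5 for _ in range(n)]

    def gap(allocation):
        # Difference between groups in the mean of the hidden trait.
        treated = [h for h, a in zip(hidden, allocation) if a]
        control = [h for h, a in zip(hidden, allocation) if not a]
        return statistics.mean(treated) - statistics.mean(control)

    return gap(self_selected), gap(randomised)


self_selected_gap, randomised_gap = simulate()
print(f"Hidden-trait gap, self-selected groups: {self_selected_gap:.2f}")
print(f"Hidden-trait gap, randomised groups:    {randomised_gap:.2f}")
```

Under these assumptions the self-selected groups differ substantially on the trait nobody measured, while the randomised groups differ only by the play of chance, and that chance difference shrinks as the number of participants grows.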

A Three Hundred Year Old Disagreement

It was my birthday yesterday and I was given a really fantastic present.

Hoole - Title Page.jpg

This is the title page of a book originally published in 1684, though my version is an impression from 1700. It is a textbook for learners of Latin containing Latin colloquies (passages of everyday speech). Not untypically for publications of the time, the title of the book is 58 words long (see above).

Hoole - Excerpt.jpg

What is really cool about those 58 words are the 19 pictured above.

Now compare them to these words taken from the founding statutes of Adams’ Free Grammar School in Shropshire in 1656.

Fifteenth rule (combined scans).jpg

Fifteenthly No scholars that have attained to such a progress in learning as to be able to speak Latin, shall neither within School or without, when they are among the Scholars of the same or a higher form, speak English. And that the Master shall appoint which are the forms, that shall observe this order of speaking Latin, and shall take care that it be observed and due correction given to those that do neglect it.

Now, these 17th century expressions of the place a language learner’s mother tongue has in his or her education might seem familiar to teachers of language learners in the 21st century.

Here is England’s DfE on mother tongue use from 2006:

Screen Shot 2015-11-24 at 14.52.37.png

And here is the first principle of inlingua language schools’ method from 2015:

Screen Shot 2015-11-24 at 14.55.29.png

So, who’s right? Do we have any empirical evidence to settle this disagreement, a disagreement at least 331 years old? Are there any empirical studies that compare the effects of allowing children to use their mother tongue with the effects of forbidding it? We know that bilingual schools, on average, produce better linguistic and academic results for language learners than do all-English schools. But does this mean that allowing mother tongue use in language classes or in mainstream classes has the same effect?

I think it’s time this ancient argument was settled. And that’s what I’m hoping to achieve with my research.

Asking for Evidence

Yesterday my attention was drawn to an article on the Wales Online news website that reports on a letter written to a local newspaper by Toby Belfield, headteacher of Ruthin School in Denbighshire, North Wales. In the letter he responds to a parent’s suggestion that the language of instruction in schools in Wales should be Welsh by suggesting that the result of “forcing young people to learn both English and Welsh (arguably, both to a substandard level) is that young people in Wales will continue to be educationally weaker than their peers in England and abroad.” (Williams 2015a).

This is an interesting claim and one which could be supported or refuted by empirical evidence.

Toby Belfield's letter

So, in the spirit of encouraging members of the educational community to base their assertions on evidence, I decided to ask Toby Belfield for the evidence upon which he based those in his letter. To do this I used the excellent Ask For Evidence website. The Ask For Evidence initiative was set up by Sense About Science to help “people request for themselves the evidence behind news stories, marketing claims and policies” (AskForEvidence 2011).

Here is the text of my request for evidence:

Dear Mr Belfield

I read with interest about the disagreement being voiced between you and some members of the Welsh community with regards to the effects that Welsh language education has on the attainment of children educated in Wales (Wales Online 14 May 2015).

I am a doctoral researcher in Oxford, with a particular interest in the way that first and second languages interact during the learning process. My thesis investigates whether making use of a child’s first language in an otherwise monolingual English speaking environment is helpful, harmful or makes no difference.

I am currently preparing a systematic review of empirical studies that have investigated this question. This involves as exhaustive a search as possible to find reports of studies addressing this issue, so that a comprehensive picture of the effects can be assembled, as free as possible from bias.

I would be very grateful if you felt able to provide me with the evidence upon which your comments to the press in Wales were based. This would allow me to incorporate into my review studies that I may have missed.

As an educator of some 20 years prior to going into research, I feel that it is absolutely vital that decisions in education are based on sound empirical evidence, so that we as teachers can be confident that what we are doing is as effective as possible in raising attainment of the children to whom we are responsible.

Please note that in addition to my own research, I am asking this as part of the Ask for Evidence campaign and will share the response I get publicly.

I look forward to hearing from you.

Yours sincerely

Hamish Chalmers

I think that it is unlikely that Mr Belfield will be able to provide me with evidence to support his claim that learning two languages in and of itself is harmful to the academic prospects of Welsh children. The evidence of which I am aware would suggest either that it makes no difference in the long term, or that it is beneficial. But this evidence is derived in the main from studies conducted in the USA or Canada, and so he may be aware of similar studies carried out in Wales that would suggest otherwise.

Where he might have a point is in his assertion that teaching Welsh and English “arguably, both to a substandard level” (Williams 2015a) could have negative effects on the overall attainment of Welsh children. This, though, is not a question of what one teaches, but rather the competence with which it is taught. Indeed, reading between the lines of his letter, there is more than language of instruction that he is worried about, and he makes a number of further claims on which he might be reasonably challenged.

I sympathise with his question later in his letter asking “Why is the Welsh education system one of the weakest in the world?” (ibid). He makes a leap of logic, though, to suggest that it is weak because of the necessity in some schools in Wales to learn Welsh. But that – in the absence of any evidence – is merely conjecture. We shouldn’t be basing educational claims on conjecture.

You can follow my AskForEvidence request here if you’re interested, and I’ll be updating on my blog as the request progresses.


As I was writing this post I saw that Mr Belfield has made an unreserved apology to people who were offended by his letter. He modified his position somewhat by stating that “If all teachers had to not only speak Welsh, but had to teach through the medium of Welsh, then the pool of teachers available to work in Welsh schools will dramatically be reduced, and high quality academics will not necessarily be able to guide/teach our Welsh children.” (Williams 2015b). This is a not unreasonable position to adopt, and further study to determine the Welsh language proficiency of teachers in the Welsh system would be warranted to assess whether it is defensible.

The baying crowd on social media has forced this headteacher to apologise for offence he has caused. If people are genuinely offended by the personal opinions of a private school headteacher, then I guess that’s their prerogative. For me the bigger offence is to not take account of the evidence when commenting on approaches to education.


AskForEvidence (2011) Ask For Evidence. Available online at [Accessed 15.05.2015]

Williams K (2015a) ‘Forcing pupils to learn Welsh will keep them weaker than English counterparts’ Private school head causes outcry with language claim. Wales Online. Available online at [Accessed 15.05.2015]

Williams K (2015b) Unreserved apology from Ruthin private school head who sparked Welsh language row. Wales Online. Available online at [Accessed 15.05.2015]

The Case for Ofsted: is there one?

“We have done more to raise standards in 21 years of existence than any other organisation.” (Michael Wilshaw, Chief Inspector of Schools in England, 2014)

“The reality is that Ofsted is no longer fit for purpose, if it ever was. It […] does nothing to improve the quality of education.” (Bernadette Hunter, National President NAHT, 2013)

1. Introduction

In 1992, John Major’s government, dissatisfied with what it saw as the ‘progressive’ tendencies of existing school inspection systems, established a new school accountability system in England. This new national body, called the Office for Standards in Education (Ofsted), was charged with supervising the inspection of all state funded schools in England. Then, as now, Ofsted’s defining mantra was “improvement through inspection” (Ofsted 1993:8).

From its beginnings Ofsted’s validity, reliability and value for money were questioned by the research community (e.g. Fitz-Gibbon 1995), and its alleged (negative) impacts on morale and professionalism were articulated by teachers (e.g. Brimblecomb, Ormston and Shaw 1995, Jeffery and Woods 1996). In the twenty-one years since its first inspection, attitudes to Ofsted have hardly changed. It is still seen by many as an unaccountable and unregulated body, which bases its judgements on questionable evidence, and which enjoys disproportionate power and influence over schools and teachers (see, for example, comments in Gray and Gardner 1999, Coe 2013a, NASUWT 2014, NUT 2014a).

When I wrote this essay in April 2014 the three main teachers’ unions were holding their annual conferences and, unsurprisingly, all had something to say about Ofsted. Christine Blower, General Secretary of the NUT described it as a “destructive system of inspection” (NUT 2014b). Her counterpart Chris Keates at NASUWT regards Ofsted as “seriously tainted” (NASUWT 2014); and Mary Bousted, General Secretary of ATL has opined that “Ofsted can no longer claim that its inspection reports are worth the paper they are written on” (Exley 2014).

Rhetoric of this kind from those most frequently at the sharp end of Ofsted’s judgements is unsurprising, but the bad blood had well and truly spilled over into government, with Chief Inspector, Michael Wilshaw, accusing then Secretary of State for Education, Michael Gove, of briefing against the inspectorate through think tanks such as Civitas and Policy Exchange (Withnall 2014). While Wilshaw later back-pedalled on his accusation (Sellgren 2014), it is true that both organisations had publicly questioned Ofsted’s effectiveness (de Waal 2008, Waldergrave and Simons 2014).

Despite almost continuous criticism from teachers and researchers since its inception, however, Ofsted has enjoyed support from three consecutive governments, four political parties, and eleven Secretaries of State for Education. In 1998, Ofsted’s responsibilities were extended to include inspection of Local Education Authorities (LEAs) and teacher training establishments (Elliot 2012). Then, in 2001, inspection of day care and childminding was added (Plomin 2001). Subsequently, in 2007, inspection of post-16 government-funded education, social care services for children, and the welfare inspection of independent and maintained boarding schools were added to its remit (Carvel and Ward 2007). It is now overseeing voluntary inspection of British Schools Overseas (HMG 2014). In the 21 years of its existence, Ofsted has formally updated its inspection framework only twice. All of this suggests robust government satisfaction with Ofsted’s effectiveness.

Given the collective uncertainty over Ofsted’s fitness for purpose, summarised in the two quotes opening this essay, what evidence is there that Ofsted meets its defining mantra of ‘Improvement through Inspection’? Drawing on empirical evidence, this essay addresses that question and attempts to assess whether any conclusion can be confidently drawn about the extent to which inspection can be considered a catalyst for school improvement.