(When) Is it OK to deny half of your pupils a promising new teaching approach?

One of the objections sometimes voiced about conducting RCTs in education is that they are unethical. This, it is often claimed, is because requiring a control group (a group of children who do not receive the teaching approach that you wish to evaluate) wilfully ‘denies’ that group access to the new teaching approach. I encountered this argument at a seminar session at BERA 2016, and explored my response to it in this post. I’ve been thinking about it some more.

First, it is important to stress that there is nothing peculiar to RCTs about this allegedly unethical behaviour. Research of any design that evaluates the effects of a new teaching approach by comparing it to an alternative involves ‘denying’ some children access to the new approach. Matched pairs designs, regression discontinuity designs, non-random comparisons, multiple baseline interrupted time series, stepped wedge designs, and designs with a waitlist control, all ‘deny’ the new approach to some children, either for the duration of the evaluation or for a portion of it. Equally, all designs (experiments, quasi-experiments, and non-experimental observations) ‘deny’ access to the new approach to any children who are not in the study. Moreover, and by the same logic, children receiving the experimental approach in any of the above contexts are ‘denied’ the control approach for the duration of the study.

This last point brings us to my second, more germane, observation. Implicit in the argument that children in a control group are ‘denied’ the new approach is the assumption that new approaches are always superior to existing approaches. The argument fails to acknowledge the possibility that a new approach might be inferior to existing approaches. It fails, therefore, to acknowledge that children may be harmed by exposure to a new teaching approach. The argument that control groups are unethical is, therefore, lopsided. By the standard applied here, we must acknowledge that ‘denying’ children access to the control approach is unethical as well, as exposure to the new approach may be harmful. This does not leave us in a very informed position.

A recent tweet by Vinay Prasad (see below) explored this argument in relation to the use of sham surgical procedures in evaluations of new surgical approaches. Here, a new surgical procedure is compared to a surgical placebo. Basically, the members of the control group undergo a surgical procedure (anaesthesia, incision, stitching, etc.) but do not undergo the actual surgery. This is important because the potential harms of undergoing surgery may outweigh potential benefits of the surgery itself. For example, in the treatment of prostate cancer, surgery to remove the cancer can cause complications such as incontinence and impotence without changing the life expectancy of the patient (many men die with prostate cancer, not because of it). Without a sham procedure we are less well informed about the relative quality of life following the surgery and, therefore, we are in a less well-informed position to decide the best course of action for the patient.

Prasad illustrated this in the flow diagram below, making the point that introducing a new surgical procedure without comparing it to an alternative (sham) procedure risks harming patients for no apparent reason.

Harms in education are less easy to spot. They are rarely as dramatic as some of those found in medicine, but they do exist (see, for example, evaluations of Chatterbooks and Mate-Tricks).

It is perhaps better to think in terms of opportunity costs. For example, a child taught using an approach that is less effective than available alternatives may still make progress, but at a slower rate. Or, if a child spends time away from their mainstream classroom to receive a targeted intervention, they miss whatever is going on in their classroom during that time. Or, in cases where a new approach is no better and no worse than existing approaches, there are costs of time, money and effort associated with changing the way teachers teach. What could be done with that time, money and effort instead of implementing the new approach for no relative gain in primary outcomes?

Building on Prasad’s work, I mapped out my thoughts on the use of control groups in education. See below.


Ethics of Control Groups in Education

I invite those who argue that ‘denying’ children access to a new teaching approach is unethical to consider the question at the bottom of that diagram. My own position is that when there is uncertainty about the effects of a new teaching approach, the only ethical course of action is to evaluate it in relation to the best available alternative, which necessitates having a control group.

A final note on RCTs. As I have said, there is nothing peculiar to RCTs about ‘denying’ an approach to some children while making it available to others. What is peculiar to RCTs is the method by which children are allocated to different approaches. The single defining feature of an RCT is that children are allocated to alternatives fairly. No child (or school or classroom if you are doing a clustered RCT) stands a better or worse chance of being allocated to either the experimental or control group than any other child when the decision is a random one. That’s the whole point of randomisation.
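The mechanics of fair allocation are simple enough to sketch in a few lines of Python. This is a minimal illustration rather than a trial protocol: the `allocate` function, the pupil labels and the seed are hypothetical names made up for the example.

```python
import random


def allocate(units, seed=None):
    """Randomly split units into experimental and control groups.

    Every unit has the same chance of ending up in either group,
    because group membership depends only on a random shuffle.
    """
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(units)         # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Hypothetical class list of 20 pupils
pupils = [f"pupil_{i:02d}" for i in range(1, 21)]
groups = allocate(pupils, seed=2016)

print("Experimental:", sorted(groups["experimental"]))
print("Control:     ", sorted(groups["control"]))
```

For a clustered RCT the same function could be given a list of classrooms or schools instead of individual pupils, so that whole clusters are allocated together.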

When my brother and I were children, our dad used to toss a coin to decide which of us got lumbered with the washing up after family meals. While it never felt fair to the one who ended up wearing the Marigolds, we could hardly argue with the ethics of our dad’s method of choosing. By the same token, when we must decide who receives what in a comparison of alternative teaching approaches, I contend that random allocation is not just the most effective way of creating unbiased comparison groups, but it is the most ethical way, too.

For anyone interested, you can download a PDF version of my diagram here. Feedback welcome.


Misunderstandings and misrepresentations of randomised trials and what is claimed for them

I recently read this blog post, which presents and expands on Angus Deaton and Nancy Cartwright’s recent thoughts about RCTs. It is worth a read because, in my view, it is an excellent case study of the misunderstanding and misrepresentation of the claims made in favour of RCTs. Deaton and Cartwright must take the lion’s share of the blame here, not the author of the post, though he does add to the commentary some startling assertions. Here are two of many:

RCTs do not have external validity.


A key argument in favour of randomization is the ability to blind both those receiving the treatment and those administering it.

Both assertions are just plain wrong, although the author does demonstrate a more nuanced understanding of the value of randomised trials elsewhere:

The results of RCTs must be integrated with other knowledge, including the practical wisdom of policy makers if they are to be usable outside the context in which they were constructed.

In a response to a comment I made on his post suggesting that Deaton’s and Cartwright’s arguments unfairly target randomised trials, when the criticisms they make are equally applicable to all kinds of intervention research, the author responded thusly:


I think that this argument fails to acknowledge the single defining feature of a randomised trial and also misrepresents what is claimed for them by people who the author has called ‘randomistas’. Moreover, what Deaton considers to be the position of so-called ‘randomistas’, specifically, is irrelevant (or at best thoughtless) when his criticisms are not actually of randomised trials but of all types of research.

My response to the above comment is reproduced below.

You may be correct about how Deaton views ‘randomistas’, but if so, he really needs to give examples of people claiming that the results of RCTs are superior to results obtained using other methods. I am a proud ‘randomista’ and I work with a lot of people who might be classified as such, and the idea that people like me say that the results of RCTs are always superior to alternative methods is just not a familiar one. In fact when reading reports of RCTs it is common to find loads of caveats about the findings.

People who understand what RCTs are and what they are not know that the only unique feature of the design is that it generates comparison groups by randomly allocating cases to conditions. That’s it.

I don’t think it is controversial for ‘randomistas’ to argue that this is the best way of generating comparison groups that differ only as a result of the play of chance, rather than as a result of some systematic (non-random) characteristic. In any population there will be things that we know and can measure (so for example we could deliberately match cases based on these factors – say age, gender, or test scores). But there are also things that might be relevant that we don’t or can’t know about our participants and therefore can’t take into account when generating comparison groups. If we accept that there are things that we don’t or can’t know about our participants, then the only way around it, if you want to create probabilistically similar groups, is to use random allocation. Random allocation thus acknowledges and accounts for the limitations of our knowledge.

So, the notion of ‘superiority’ centres around the question ‘how confident am I that the groups being compared were similar in all important known and unknown (and possibly unknowable) characteristics?’

Of course, if your research question is one that does not involve comparisons and causal description then RCTs are not appropriate. You would be hard pressed to find a ‘randomista’ arguing that you need an RCT to help understand the views or opinions of a population of interest, for example. In addition you will be unlikely to find a ‘randomista’ arguing that you need an RCT when observational studies have reported very dramatic effects. Take for example the tired old chestnut about not needing an RCT to find out if parachutes work. 99.9% of people who do not open their parachutes after jumping out of a plane die. This is a highly statistically significant finding and is extremely dramatic. There is no need to go beyond observation here.

Unfortunately for us, the effects of interventions in the social sciences are rarely so dramatic. Therefore, one key element in making causal inferences is ensuring that when we compare alternative interventions or approaches we are, in the best way we know how, comparing like with like. This means that any differences in outcome that we observe between groups can be more confidently attributed to the interventions being compared rather than to an effect of non-random differences between groups.

That’s the strength of an RCT.
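The point about unknown characteristics can be illustrated with a small simulation. The sketch below rests on invented assumptions: each child has a hidden trait that nobody measures, and the function name `hidden_gap`, the "volunteer" allocation rule and all the numbers are hypothetical. It compares how far apart the two groups end up on that hidden trait when allocation is random and when children effectively select themselves into the new approach.

```python
import random
import statistics


def hidden_gap(allocation, n=1000, seed=0):
    """Return the difference in mean hidden trait between the two groups."""
    rng = random.Random(seed)
    # A characteristic that matters but that the researcher cannot observe
    hidden = [rng.gauss(0, 1) for _ in range(n)]

    if allocation == "random":
        # Random allocation: shuffle the children and treat the first half
        order = list(range(n))
        rng.shuffle(order)
        treated = set(order[: n // 2])
    else:
        # Non-random allocation (assumed for illustration): children who are
        # higher on the hidden trait are more likely to opt in
        treated = {i for i, h in enumerate(hidden) if h + rng.gauss(0, 1) > 0}

    treated_vals = [hidden[i] for i in treated]
    control_vals = [hidden[i] for i in range(n) if i not in treated]
    return statistics.mean(treated_vals) - statistics.mean(control_vals)


# Repeat each kind of study 200 times and average the size of the imbalance
random_gaps = [abs(hidden_gap("random", seed=s)) for s in range(200)]
volunteer_gaps = [abs(hidden_gap("volunteer", seed=s)) for s in range(200)]

print("Mean imbalance, random allocation:   ", statistics.mean(random_gaps))
print("Mean imbalance, volunteer allocation:", statistics.mean(volunteer_gaps))
```

Under these assumptions the randomly allocated groups differ on the hidden trait only by the play of chance, while the self-selected groups differ systematically, even though the trait was never measured in either case.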

A Three Hundred Year Old Disagreement

It was my birthday yesterday and I was given a really fantastic present.

Hoole - Title Page.jpg

This is the title page of a book originally published in 1684, though my version is an impression from 1700. It is a text book for learners of Latin containing Latin colloquies (passages of everyday speech). Not untypical for publications of the time, the title of the book is 58 words long (see above).

Hoole - Excerpt.jpg

What is really cool about those 58 words are the 19 pictured above.

Now compare them to these words taken from the founding statutes of Adams’ Free Grammar School in Shropshire in 1656.

Fifteenth rule (combined scans).jpg

Fifteenthly No scholars that have attained to such a progress in learning as to be able to speak Latin, shall neither within School or without, when they are among the Scholars of the same or a higher form, speak English. And that the Master shall appoint which are the forms, that shall observe this order of speaking Latin, and shall take care that it be observed and due correction given to those that do neglect it.

Now, these 17th century expressions of the place a language learner’s mother tongue has in his or her education might seem familiar to teachers of language learners in the 21st century.

Here is England’s DfE on mother tongue use from 2006:

Screen Shot 2015-11-24 at 14.52.37.png

And here is the first principle of inlingua language schools’ method from 2015:

Screen Shot 2015-11-24 at 14.55.29.png

So, who’s right? Do we have any empirical evidence to settle this disagreement, a disagreement at least 331 years old? Are there any empirical studies that compare the effects of allowing children to use their mother tongue with the effects of forbidding it? We know that bilingual schools, on average, produce better linguistic and academic results for language learners than do all-English schools. But does this mean that allowing mother tongue use in language classes or in mainstream classes has the same effect?

I think it’s time this ancient argument was settled. And that’s what I’m hoping to achieve with my research.

Asking for Evidence

Yesterday my attention was drawn to an article on the Wales Online news website that reports on a letter written to a local newspaper by Toby Belfield, headteacher of Ruthin School in Denbighshire, North Wales. In the letter he responds to a parent’s suggestion that the language of instruction in schools in Wales should be Welsh by claiming that the consequence of “forcing young people to learn both English and Welsh (arguably, both to a substandard level) is that young people in Wales will continue to be educationally weaker than their peers in England and abroad” (Williams 2015a).

This is an interesting claim and one which could be supported or refuted by empirical evidence.

Toby Belfield's letter

So, in the spirit of encouraging members of the educational community to base their assertions on evidence, I decided to ask Toby Belfield for the evidence upon which he based those in his letter. To do this I used the excellent Ask For Evidence website. The Ask For Evidence initiative was set up by Sense About Science to help “people request for themselves the evidence behind news stories, marketing claims and policies” (AskForEvidence 2011).

Here is the text of my request for evidence:

Dear Mr Belfield

I read with interest about the disagreement being voiced between you and some members of the Welsh community with regard to the effects that Welsh language education has on the attainment of children educated in Wales (Wales Online 14 May 2015).

I am a doctoral researcher in Oxford, with a particular interest in the way that first and second languages interact during the learning process. My thesis investigates whether making use of a child’s first language in an otherwise monolingual English speaking environment is helpful, harmful or makes no difference.

I am currently preparing a systematic review of empirical studies that have investigated this question. This involves as exhaustive a search as possible to find reports of studies addressing this issue, so that a comprehensive picture of the effects can be assembled, as free as possible from bias.

I would be very grateful if you felt able to provide me with the evidence upon which your comments to the press in Wales were based. This would allow me to incorporate into my review studies that I may have missed.

As an educator of some 20 years prior to going into research, I feel that it is absolutely vital that decisions in education are based on sound empirical evidence, so that we as teachers can be confident that what we are doing is as effective as possible in raising attainment of the children to whom we are responsible.

Please note that in addition to my own research, I am asking this as part of the Ask for Evidence campaign and will share the response I get publicly.

I look forward to hearing from you.

Yours sincerely

Hamish Chalmers

I think that it is unlikely that Mr Belfield will be able to provide me with evidence to support his claim that learning two languages in and of itself is harmful to the academic prospects of Welsh children. The evidence of which I am aware would suggest either that it makes no difference in the long term, or that it is beneficial. But this evidence is derived in the main from studies conducted in the USA or Canada, and so he may be aware of similar studies carried out in Wales that would suggest otherwise.

Where he might have a point is in his assertion that teaching Welsh and English “arguably, both to a substandard level” (Williams 2015a) could have negative effects on the overall attainment of Welsh children. This, though, is not a question of what one teaches, but rather the competence with which it is taught. Indeed, reading between the lines of his letter, there is more than language of instruction that he is worried about, and he makes a number of further claims on which he might be reasonably challenged.

I sympathise with his question later in his letter asking “Why is the Welsh education system one of the weakest in the world?” (ibid). He makes a leap of logic, though, to suggest that it is weak because of the necessity in some schools in Wales to learn Welsh. But that – in the absence of any evidence – is merely conjecture. We shouldn’t be basing educational claims on conjecture.

You can follow my AskForEvidence request here if you’re interested, and I’ll be updating on my blog as the request progresses.


As I was writing this post I saw that Mr Belfield has made an unreserved apology to people who were offended by his letter. He modified his position somewhat by stating that “If all teachers had to not only speak Welsh, but had to teach through the medium of Welsh, then the pool of teachers available to work in Welsh schools will dramatically be reduced, and high quality academics will not necessarily be able to guide/teach our Welsh children.” (Williams 2015b). This is a not unreasonable position to adopt, and further study to determine the Welsh language proficiency of teachers in the Welsh system would be warranted to assess whether it is defensible.

The baying crowd on social media has forced this headteacher to apologise for offence he has caused. If people are genuinely offended by the personal opinions of a private school headteacher, then I guess that’s their prerogative. For me the bigger offence is to not take account of the evidence when commenting on approaches to education.


AskForEvidence (2011) Ask For Evidence. Available online at [Accessed 15.05.2015]

Williams K (2015a) ‘Forcing pupils to learn Welsh will keep them weaker than English counterparts’ Private school head causes outcry with language claim. Wales Online. Available online at [Accessed 15.05.2015]

Williams K (2015b) Unreserved apology from Ruthin private school head who sparked Welsh language row. Wales Online. Available online at [Accessed 15.05.2015]