I recently read this blog post, which presents and expands on Angus Deaton and Nancy Cartwright’s recent thoughts about RCTs. It is worth a read because, in my view, it is an excellent case study of the misunderstanding and misrepresentation of the claims made in favour of RCTs. Deaton and Cartwright must take the lion’s share of the blame here, not the author of the post, though he does add to the commentary some startling assertions. Here are two of many:
RCTs do not have external validity.
A key argument in favour of randomization is the ability to blind both those receiving the treatment and those administering it.
Both assertions are just plain wrong, although the author does demonstrate a more nuanced understanding of the value of randomised trials elsewhere:
The results of RCTs must be integrated with other knowledge, including the practical wisdom of policy makers if they are to be usable outside the context in which they were constructed.
In a response to a comment I made on his post suggesting that Deaton’s and Cartwright’s arguments unfairly target randomised trials, when the criticisms they make are equally applicable to all kinds of intervention research, the author responded thusly:
I think that this argument fails to acknowledge the single defining feature of a randomised trial and also misrepresents what is claimed for them by people who the author has called ‘randomistas’. Moreover, what Deaton considers to be the position of so called ‘randomistas’, specifically, is irrelevant (or at best thoughtless) when his criticisms are not actually of randomised trials but of all types of research.
My response to the above comment is reproduced below.
You may be correct about how Deaton views ‘randomistas’, but if so, he really needs to give examples of people claiming that the results of RCTs are superior to results obtained using other methods. I am a proud ‘randomista’ and I work with a lot of people who might be classified as such, and the idea that people like me say that the results of RCTs are always superior to alternative methods is just not a familiar one. In fact, when reading reports of RCTs it is common to find loads of caveats about the findings.
People who understand what RCTs are and what they are not know that the only unique feature of the design is that they generate comparison groups by randomly allocating cases to conditions. That’s it.
I don’t think it is controversial for ‘randomistas’ to argue that this is the best way of generating comparison groups that differ only as a result of the play of chance, rather than as a result of some systematic (non-random) characteristic. In any population there will be things that we know and can measure (so for example we could deliberately match cases based on these factors – say age, gender, or test scores). But there are also things that might be relevant that we don’t or can’t know about our participants and therefore can’t take into account when generating comparison groups. If we accept that there are things that we don’t or can’t know about our participants, then the only way around it, if you want to create probabilistically similar groups, is to use random allocation. Random allocation thus acknowledges and accounts for the limitations of our knowledge.
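The point about unknown characteristics can be illustrated with a small simulation. This is only a sketch under invented assumptions: the "trait" is a made-up, unmeasured characteristic (think of something like motivation), the function name `allocation_gaps` is hypothetical, and the self-selection rule is just one of many ways a non-random allocation might arise. It compares how far apart the two groups end up on the hidden trait under random allocation and under self-selection.

```python
import random
from statistics import mean


def allocation_gaps(n=1000, seed=0):
    """Return (gap under random allocation, gap under self-selection).

    Each gap is the difference between groups in the mean of a hidden trait.
    """
    rng = random.Random(seed)
    # Hypothetical trait that the researcher cannot measure.
    trait = [rng.gauss(0, 1) for _ in range(n)]

    # Random allocation: shuffle everyone, first half gets the intervention.
    order = list(range(n))
    rng.shuffle(order)
    treated = set(order[: n // 2])
    random_gap = (mean(trait[i] for i in treated)
                  - mean(trait[i] for i in range(n) if i not in treated))

    # Self-selection: people with more of the trait tend to opt in.
    opted_in = {i for i in range(n) if trait[i] + rng.gauss(0, 1) > 0}
    selected_gap = (mean(trait[i] for i in opted_in)
                    - mean(trait[i] for i in range(n) if i not in opted_in))
    return random_gap, selected_gap


random_gap, selected_gap = allocation_gaps()
print(f"random allocation gap: {random_gap:+.2f}")
print(f"self-selection gap:    {selected_gap:+.2f}")
```

Under random allocation the gap on the hidden trait should hover near zero, with any remainder due to the play of chance. Under self-selection it is systematic, even though nobody measured the trait or allocated on it deliberately.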
So, the notion of ‘superiority’ centres around the question ‘how confident am I that the groups being compared were similar in all important known and unknown (and possibly unknowable) characteristics?’
Of course, if your research question is one that does not involve comparisons and causal description then RCTs are not appropriate. You would be hard pressed to find a ‘randomista’ arguing that you need an RCT to help understand the views or opinions of a population of interest, for example. In addition you will be unlikely to find a ‘randomista’ arguing that you need an RCT when observational studies have reported very dramatic effects. Take for example the tired old chestnut about not needing an RCT to find out if parachutes work. 99.9% of people who do not open their parachutes after jumping out of a plane die. This is a highly statistically significant finding and is extremely dramatic. There is no need to go beyond observation here.
Unfortunately for us, the effects of interventions in the social sciences are rarely so dramatic. Therefore, one key element in making causal inferences is ensuring that when we compare alternative interventions or approaches we are, in the best way we know how, comparing like with like. This means that any differences in outcome that we observe between groups can be more confidently attributed to the interventions being compared rather than to an effect of non-random differences between groups.
That’s the strength of an RCT.
Yesterday the Royal Statistical Society hosted a seminar entitled The Rise of RCTs in Education: Statistical and Practical Challenges. The panel consisted of four researchers with a great deal of experience of conducting investigations into the effects and effectiveness of educational interventions using the design. They were Carole Torgerson of the University of Durham, Ben Styles of the NfER, Vic Menzies of the CEM at University of Durham, and Kevan Collins of the Education Endowment Foundation. I will split up my thoughts on the session for the sake of manageability over the next couple of posts. This first post will concentrate on the part of Carole Torgerson’s talk where she described the history of RCTs and the situation as it stands now, interspersed with some of my thoughts.
What are RCTs?
An RCT (or randomised controlled trial) is a research design that can be used to compare the effects of alternative interventions. The strength in the design, and indeed the only unique thing about it, is that participants in RCTs are randomly allocated to the different interventions being compared. As a consequence, any differences in the characteristics between groups are due to the play of chance rather than due to systematic differences (known or unknown, measurable or unmeasurable). As a result, we can be more confident that like is being compared with like, and, therefore, that any observed differences in outcomes between the groups are due to the intervention.
There are a few circumstances when it is either impossible, unethical or unnecessary to conduct an RCT. But when it is possible, and an appropriate research question is under investigation, RCTs should be first preference. RCTs are superior to other models of ‘what works’ research designs in which either no comparison group is included, so effects of the intervention cannot be disentangled from effects of other things (for example, the passage of time), or where comparison groups are systematically different to each other (for example, participants are from different socio-economic backgrounds) and so like is not being compared with like.
Some people get quite cross about the suggestion that RCTs are useful for educational research. They use words like ‘positivist paradigm’ and they regard RCTs as only applicable to quantitative methods. This is nonsense. For a start, there are no such things as quantitative and qualitative methods, only quantitative and qualitative data. An RCT can be used to create unbiased comparison groups from which qualitative data can be collected just as readily as quantitative data. In addition, some people misunderstand the term ‘controlled’ and think that it means something like ‘being in control of every possible influence that may come to bear on the thing that you are studying’. It doesn’t. In the term RCT the word ‘control’ is essentially synonymous with the term ‘comparison group’, to which participants are ‘randomly’ assigned. It has nothing to do with ‘being in control’. Thomas C Chalmers, pioneer of RCTs in medicine (no relation), suggested that RCTs should be recast as randomised-control trials to make the point that the control group is generated by random allocation. Iain Chalmers of the Cochrane Collaboration and James Lind Initiative (relation) thinks that the word ‘controlled’ in this construction is superfluous and that RCTs should be called simply ‘randomised trials’, because participants are randomly allocated not just to the control but to all the other arms of the trial as well.
But, I digress.
Only quite recently in the UK have RCTs started to be used to compare the effects of alternative educational interventions.
They are considered by the UK Cabinet Office Behavioural Insights Team to be the best way to determine whether a policy is working (Haynes et al. 2012), and according to the National Foundation for Educational Research (NfER) “should be considered as the first choice to establish whether an intervention works” (Hutchison and Styles 2010:7). In addition, the Education Endowment Foundation will only fund ‘what works’ research that uses the design.
Torgerson kicked off the seminar with a short history of RCTs in education, before describing where we are now.
It was interesting to hear that, contrary to received wisdom, RCTs in education pre-date what is regarded by many to be the first large scale RCT in health – the streptomycin trial*. Torgerson suggested that a trial conducted in Purdue University, published in 1931, by JE Walters may have been the first education RCT.
Walters’ trial compared the effects of assigning Seniors at the university to act as counsellors for Freshmen who had shown themselves to be at risk of “scholastic mortality”.
“The 220 delinquent Freshmen were divided into two groups by random sampling. One-half of them were counselled by the Seniors who used definite procedures, and the other half were left to progress as in previous years. These latter constituted the control group.” (Walters 1931:446)
He found that using Seniors as counsellors for Freshmen was more effective than doing nothing and, incidentally, would save the University $891 per year in lost learning (less $77 to pay the Seniors for their time at the going rate of thirty-five cents an hour).
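To put those figures in perspective, here is a quick worked sum. It is only a sketch that uses the numbers quoted above and assumes that the $77 is the total annual payment to the Seniors; the variable names are mine, not Walters’.

```python
# Figures as reported above, held in whole cents to avoid rounding error.
saving_cents = 891 * 100      # annual saving attributed to the scheme
payment_cents = 77 * 100      # assumed total annual payment to the Seniors
rate_cents_per_hour = 35      # going rate per hour

net_saving_dollars = (saving_cents - payment_cents) // 100
hours_paid = payment_cents // rate_cents_per_hour
print(net_saving_dollars, hours_paid)
```

On that reading, the net saving is $814 a year, bought with 220 hours of Seniors’ time.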
There followed six more RCTs at Purdue before the 1948 streptomycin trial, but then RCTs in education fell out of favour, Torgerson reported. There was a notable blip on the otherwise scant horizon when, in 1985, an RCT involving 6,500 Tennesseean children was begun to assess the effects of class size on children’s attainment. Participating children were randomly allocated to small classes, normal sized classes, or normal sized classes with a paid assistant working with the teacher. Fred Mosteller of Harvard University described this trial as:
“…a controlled experiment which is one of the most important educational investigations ever carried out and illustrates the kind and magnitude of research needed in the field of education to strengthen schools.” (Mosteller 1995:113)
Perhaps because of Mosteller’s urging and the work of people like Robert Boruch, Judith Gueron and Robert Slavin, RCTs, in the nineties and beyond, regained favour in US educational research communities. The USA with Canada are now the world leaders in terms of the number of RCTs that have been conducted.
In the UK, RCTs have taken a little longer to get off the ground.
One important UK RCT that Torgerson’s talk brought to mind, but which she didn’t mention, was conducted in the mid-seventies by Tizard, Schofield and Hewison to assess the effects of asking parents to listen to their children read on a regular basis at home. They randomised classes in inner-city London schools to one of three reading interventions: parent involvement, extra teacher support, and business as usual controls. What they found from that study – that reading at home was the most effective of the three methods compared – went on to inform the practice of teachers sending books home from school that is so unexceptional now that one might be forgiven for thinking that it had always ever been thus.
Aside from that trial, RCTs have been almost non-existent in the UK until very recently. The first RCT funded by the DfE, Torgerson said, was only in 2007 – commissioned by Labour and published under the coalition government in 2010. Under the banner of the government’s Every Child Counts initiative, 409 children were randomly assigned to receive maths tuition using a learning programme called Numbers Count, or to a business as usual control. The trial found statistically significantly higher scores in the Numbers Count group after two years of the intervention.
Then, under the coalition government, the Education Endowment Foundation (EEF) was set up with an initial purse of £125 million to fund research intended to help close the attainment gap between disadvantaged children and their more advantaged peers. They have been instrumental in causing the ‘rise’ of RCTs around which the seminar was based. In the 50 months since the EEF’s foundation the number of government funded RCTs in education has risen from 1 (the Every Child Counts trial) to more than 100. This is a very positive direction to be headed in. If this momentum can be sustained, the resulting improvement in the quality of the research that informs our education decisions will pay dividends.
This section of the seminar really brought home how fledgling the use of RCTs in education is in the UK, and explains, in part, why there is such resistance to the design in some quarters of the educational research community (unfamiliarity breeds contempt). There is a long way to go, I feel, in helping people more widely to understand the benefits of RCTs when addressing ‘what works’ questions. However, we are well on the way.
Next post: All else being equal, well designed RCTs are the best way to compare alternative educational interventions. However, they are not without their challenges. These were addressed by Torgerson, and in talks by other members of the panel, and I will write about them next. In addition, I shall look at some of the examples that have been generated by RCTs in education that have helped us to avoid making dreadful mistakes in what we get our children and teachers to do.
*Though widely assumed to be the first RCT in health, the streptomycin trial is predated by a number of other instances where investigators used unbiased allocation schedules in comparisons of treatments. See The James Lind Library for more.
Haynes L, Service O, Goldacre B and Torgerson D (2012) Test, Learn, Adapt: Developing Public Policy with Randomised Controlled Trials. London: Cabinet Office, Behavioural Insights Team
Hutchison D and Styles B (2010) A Guide to Running Randomised Controlled Trials for Educational Researchers. Slough: NFER
Mosteller F (1995) The Tennessee Study of Class Size in the Early School Grades. Critical Issues for Children and Youths. 5:2 113-127
Tizard J, Schofield WN, and Hewison J (1982) Collaboration between teachers and parents in assisting children’s reading. British Journal of Educational Psychology 52:1 1-15
It is received wisdom in the EAL teaching community that making systematic use of EAL children’s mother tongues while carrying out class work helps them to do better at school. The theoretical basis for this position is Cummins’ (1979 and 1980) theories of linguistic interdependence and common underlying proficiency. From the underpinning assumptions of these theories, a number of specific recommendations for practice have been extrapolated … with varying quantities and quality of empirical evidence to support them. In conducting my MA dissertation, I sought to assess the effects of one such recommendation (made by the UK DfE, among others), for which I did not find convincing empirically derived support in my literature review.
To feed back to the people who helped me to conduct my trial (students, parents, teachers and others) I wrote a ‘plain language summary’ of it, outlining the background, methods and findings. Here it is:
Do children for whom English is an additional language do better in school if they are given opportunities to use their home languages while doing school work?*
Plain language summary
Hamish Chalmers 2014
Why did I do the study?
Teachers of children for whom English is an additional language are sometimes advised to provide opportunities for those children to use their home languages while doing school work. This is because some people believe that using home languages helps to improve those children’s English, as well as helping them in other areas of learning. However, teachers are sometimes advised to forbid the use of languages that are not English at school. This is because some people believe that the best way to get better at English is to use only English. No one knows which of these conflicting pieces of advice is more likely to help children in British schools. Evidence from research is needed to help us to find out.
Unfortunately, the results of research are not very clear. This is partly because a lot of it has been done in schools that are very different to British schools. For example, a lot of research has been done in schools in the USA that provide lessons in both Spanish and English for Hispanic children (bilingual schools); we cannot be confident that the findings of this research can be applied directly to non-bilingual schools in Britain.
Some of the research used to support the belief that children should be given opportunities to use their home languages while doing school work has measured things other than the effect on the quality of English and other learning. For example, researchers have asked students how happy they are when they are allowed to use their home languages. This does not tell us if using home languages helps them get better results at school.
Only a small number of studies have looked at the effects of using home languages on the quality of students’ English and other learning. However, the number of studies claiming that this helps is about the same as the number of studies claiming that it has no effect or that it hinders success. In addition, some of these studies have been done with university students, not school children, and most of them have been done outside of the UK.
I conducted the study summarised here to add to the small amount of research on the effects of structured use of home languages on the quality of students’ English and other learning, and to help teachers in British schools to decide whether they should act on advice to make use of children’s home languages when they do school work.
Who participated in the study?
At one state primary school in Oxford, UK, I invited the parents of all of the children registered as having English as an additional language in Years 1 and 2 (ages 5 to 7) to give permission for their children to take part in the study. Thirty-six children took part out of a total of 45 who were invited. The children were allocated at random to one of two groups. Depending on the group to which they had been allocated, the children then carried out a learning task using either their home language, or only English.
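For readers curious about what allocating "at random" can look like in practice, here is a minimal sketch. The summary does not describe the procedure actually used, so everything here is an assumption for illustration: the function name `allocate`, the anonymous participant numbers, the seed, and the choice of two equal-sized groups produced by a simple shuffle.

```python
import random


def allocate(participant_ids, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {
        "home_language": sorted(ids[:half]),
        "english_only": sorted(ids[half:]),
    }


# 36 hypothetical participant numbers; the seed only makes the run repeatable.
groups = allocate(range(1, 37), seed=2014)
for name, members in groups.items():
    print(name, len(members), members)
```

The important feature is that nobody chooses who goes where: group membership depends only on the shuffle, not on anything about the child.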
What was the learning task?
The British government’s Department for Education includes the following in its advice to teachers of children for whom English is an additional language:
“Before the Literacy Hour, a bilingual teaching assistant and children [should] talk through the pictures and summarise the story of a Big Book in the home language.”**
There were 17 different languages spoken at home by the children in the study; the teaching assistants at the school do not speak all of these languages. Therefore, the advice above was adapted to invite parents to take on the role of the teaching assistant at home with their child.
I gave each child a picture book to take home to talk through and summarise with their parents. I gave their parents a set of questions to help guide this discussion. One group of parents had the questions written in English. The other group had the questions translated into their home languages.
I asked the parents to use these questions to help talk through and summarise the picture book with their child in the language to which they had been assigned (only English or only their home language). At the end of one week the children wrote a story in English based on what they had understood about the story in the picture book.
I graded the quality of the children’s stories using the school’s normal assessment criteria for English. I also used a measure of English writing proficiency designed specially for use with children who are learning English. I then compared the averages of the scores for each group to see if there were any clear differences between the quality of the stories that had been written by the children in the two comparison groups.
What did I find?
There were some small differences between the average scores of the two groups. However, these were too small to be considered clear differences, and could easily have been due to chance. Furthermore, from the answers given by parents and children to a questionnaire taken at the end of the study, it was not clear whether they had always carried out the task in the way intended.
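One way to ask whether a difference between two averages "could easily have been due to chance" is a permutation test: shuffle the group labels many times and see how often a difference at least as large turns up. The sketch below is illustrative only. The scores are invented, not data from the study, the function name `permutation_p_value` is hypothetical, and the summary does not say which statistical analysis the dissertation actually used.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Share of label shuffles giving a mean difference at least as large
    (in absolute size) as the one actually observed."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Invented story scores for two small groups (NOT the study's data).
home_language = [12, 14, 11, 15, 13, 12, 16, 10, 13]
english_only = [13, 12, 11, 14, 12, 13, 15, 11, 12]
p_small_gap = permutation_p_value(home_language, english_only)
print(f"proportion of shuffles at least as extreme: {p_small_gap:.2f}")
```

When a large share of the shuffles produces a gap as big as the observed one, as with these invented scores, the observed difference cannot be distinguished from the play of chance.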
What do the results mean?
This small study does not resolve the question of which language children for whom English is an additional language should use in this kind of learning task. So, more research is needed to guide teachers on what is most helpful for these children.
This study involved a small number of children and was conducted over a short period of time. To be more confident about the effects of using home languages in the way described, further studies should be done, involving a larger number of children, lasting for a longer period of time, and with support for parents in how to carry out the tasks.
I am very grateful to the children, parents and teachers who agreed to help me try to resolve the question of home language use in British schools by taking part in this study. I am also indebted to the many friends who helped to translate the discussion questions into the 17 different languages used by the children in the study.
*This is a plain language summary of an MA dissertation entitled Harnessing linguistic diversity in polylingual British-curriculum schools. Do L1 mediated home learning tasks improve learning outcomes for bilingual children? A randomised trial. A copy of the full report is available on request.
**Taken from “Home Languages in the Literacy Hour” by Jill Bourne. Published by the Department for Education and Skills, (2002)