Yesterday the Royal Statistical Society hosted a seminar entitled The Rise of RCTs in Education: Statistical and Practical Challenges. The panel consisted of four researchers with a great deal of experience of conducting investigations into the effects and effectiveness of educational interventions using the design. They were Carole Torgerson of the University of Durham, Ben Styles of the NFER, Vic Menzies of the CEM at the University of Durham, and Kevan Collins of the Education Endowment Foundation. For the sake of manageability, I will split my thoughts on the session over the next couple of posts. This first post concentrates on the part of Carole Torgerson’s talk in which she described the history of RCTs and the situation as it stands now, interspersed with some of my thoughts.
What are RCTs?
An RCT (or randomised controlled trial) is a research design that can be used to compare the effects of alternative interventions. The strength of the design, and indeed the only unique thing about it, is that participants in RCTs are randomly allocated to the different interventions being compared. As a consequence, any differences in the characteristics of the groups are due to the play of chance rather than to systematic differences (known or unknown, measurable or unmeasurable). As a result, we can be more confident that like is being compared with like, and, therefore, that any observed differences in outcomes between the groups are due to the intervention.
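To make the allocation mechanism concrete, here is a minimal Python sketch (my own illustration, not anything from the seminar; the function and participant names are invented) of simple random allocation to two arms. Shuffling and then dealing participants round-robin gives near-equal group sizes while keeping the assignment independent of every participant characteristic, known or unknown:

```python
import random

def randomise(participants, arms=("intervention", "control"), seed=None):
    """Randomly allocate participants to trial arms.

    The shuffle makes allocation independent of any participant
    characteristic; dealing round-robin keeps arm sizes (near-)equal.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    allocation = {arm: [] for arm in arms}
    for i, person in enumerate(shuffled):
        allocation[arms[i % len(arms)]].append(person)
    return allocation

# Allocate 20 hypothetical pupils to two equal-sized arms.
groups = randomise([f"pupil_{n}" for n in range(20)], seed=1)
print({arm: len(members) for arm, members in groups.items()})
```

Real trials typically use more elaborate schemes (stratified or blocked randomisation, or cluster randomisation at the class or school level, as in some of the trials discussed below), but the principle is the same.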
There are a few circumstances in which it is impossible, unethical or unnecessary to conduct an RCT. But when it is possible, and an appropriate research question is under investigation, RCTs should be the first preference. RCTs are superior to other ‘what works’ research designs in which either no comparison group is included, so effects of the intervention cannot be disentangled from effects of other things (for example, the passage of time), or the comparison groups are systematically different from each other (for example, participants are from different socio-economic backgrounds) and so like is not being compared with like.
Some people get quite cross about the suggestion that RCTs are useful for educational research. They use words like ‘positivist paradigm’ and they regard RCTs as only applicable to quantitative methods. This is nonsense. For a start, there are no such things as quantitative and qualitative methods, only quantitative and qualitative data. An RCT can be used to create unbiased comparison groups from which qualitative data can be collected just as readily as quantitative data. In addition some people misunderstand the term ‘controlled’ and think that it means something like ‘being in control of every possible influence that may come to bear on the thing that you are studying’. It doesn’t. In the term RCT the word ‘control’ is essentially synonymous with the term ‘comparison group’, to which participants are ‘randomly’ assigned. It has nothing to do with ‘being in control’. Thomas C Chalmers, pioneer of RCTs in medicine (no relation), suggested that RCTs should be recast as randomised-control trials to make the point that the control group is generated by random allocation. Iain Chalmers of the Cochrane Collaboration and James Lind Initiative (relation) thinks that the word ‘controlled’ in this construction is superfluous and that RCTs should be called simply ‘randomised trials’, because participants are randomly allocated not just to the control arm but to all the other arms of the trial as well.
But, I digress.
Only quite recently in the UK have RCTs started to be used to compare the effects of alternative educational interventions.
They are considered by the UK Cabinet Office Behavioural Insights Team to be the best way to determine whether a policy is working (Haynes et al. 2012), and according to the National Foundation for Educational Research (NFER) “should be considered as the first choice to establish whether an intervention works” (Hutchison and Styles 2010:7). In addition, the Education Endowment Foundation will only fund ‘what works’ research that uses the design.
Torgerson kicked off the seminar with a short history of RCTs in education, before describing where we are now.
It was interesting to hear that, contrary to received wisdom, RCTs in education pre-date what is regarded by many to be the first large scale RCT in health – the streptomycin trial*. Torgerson suggested that a trial conducted at Purdue University by JE Walters, published in 1931, may have been the first education RCT.
Walters’ trial compared the effects of assigning Seniors at the university to act as counsellors for Freshmen who had shown themselves to be at risk of “scholastic mortality”.
“The 220 delinquent Freshmen were divided into two groups by random sampling. One-half of them were counselled by the Seniors who used definite procedures, and the other half were left to progress as in previous years. These latter constituted the control group.” (Walters 1931:446)
He found that using Seniors as counsellors for Freshmen was more effective than doing nothing and, incidentally, would save the University $891 per year in lost learning (less $77 to pay the Seniors for their time at the going rate of thirty-five cents an hour).
There followed six more RCTs at Purdue before the 1948 streptomycin trial, but then RCTs in education fell out of favour, Torgerson reported. There was a notable blip on the otherwise scant horizon when, in 1985, an RCT involving 6,500 Tennessean children was begun to assess the effects of class size on children’s attainment. Participating children were randomly allocated to small classes, normal sized classes, or normal sized classes with a paid assistant working with the teacher. Fred Mosteller of Harvard University described this trial as:
“…a controlled experiment which is one of the most important educational investigations ever carried out and illustrates the kind and magnitude of research needed in the field of education to strengthen schools.” (Mosteller 1995:113)
Perhaps because of Mosteller’s urging and the work of people like Robert Boruch, Judith Gueron and Robert Slavin, RCTs regained favour in US educational research communities in the nineties and beyond. The USA and Canada are now the world leaders in terms of the number of RCTs that have been conducted.
In the UK, RCTs have taken a little longer to get off the ground.
One important UK RCT that Torgerson’s talk brought to mind, but which she didn’t mention, was conducted in the mid-seventies by Tizard, Schofield and Hewison to assess the effects of asking parents to listen to their children read on a regular basis at home. They randomised classes in inner-city London schools to one of three reading interventions: parent involvement, extra teacher support, and business as usual controls. What they found from that study – that reading at home was the most effective of the three methods compared – went on to inform the now unexceptional practice of teachers sending books home from school, so unexceptional that one might be forgiven for thinking it had always been thus.
Aside from that trial, RCTs have been almost non-existent in the UK until very recently. The first RCT funded by the DfE, Torgerson said, was only in 2007 – commissioned by Labour and published under the coalition government in 2010. Under the banner of the government’s Every Child Counts initiative, 409 children were randomly assigned to receive maths tuition using a learning programme called Numbers Count, or to a business as usual control. The trial found statistically significantly higher scores in the Numbers Count group after two years of the intervention.
Then, under the coalition government, the Education Endowment Foundation (EEF) was set up with an initial purse of £125 million to fund research intended to help close the attainment gap between disadvantaged children and their more advantaged peers. They have been instrumental in causing the ‘rise’ of RCTs around which the seminar was based. In the 50 months since the EEF’s foundation the number of government funded RCTs in education has risen from 1 (the Every Child Counts trial) to more than 100. This is a very positive direction in which to be headed. If this momentum can be sustained, the improvement in the quality of the research that informs our education decisions will pay dividends.
This section of the seminar really brought home how fledgling the use of RCTs in education is in the UK, and explains, in part, why there is such resistance to the design in some quarters of the educational research community (unfamiliarity breeds contempt). There is a long way to go, I feel, in helping people more widely to understand the benefits of RCTs when addressing ‘what works’ questions. However, we are well on the way.
Next post: All else being equal, well designed RCTs are the best way to compare alternative educational interventions. However, they are not without their challenges. These were addressed by Torgerson, and in talks by other members of the panel, and I will write about them next. In addition, I shall look at some examples of RCTs in education that have helped us to avoid making dreadful mistakes in what we get our children and teachers to do.
*Though widely assumed to be the first RCT in health, the streptomycin trial is predated by a number of other instances where investigators used unbiased allocation schedules in comparisons of treatments. See The James Lind Library for more.
Haynes L, Service O, Goldacre B and Torgerson D (2012) Test, Learn, Adapt: Developing Public Policy with Randomised Controlled Trials. London: Cabinet Office, Behavioural Insights Team
Hutchison D and Styles B (2010) A Guide to Running Randomised Controlled Trials for Educational Researchers. Slough: NFER
Mosteller F (1995) The Tennessee Study of Class Size in the Early School Grades. Critical Issues for Children and Youths 5:2 113-127
Tizard J, Schofield WN and Hewison J (1982) Collaboration between teachers and parents in assisting children’s reading. British Journal of Educational Psychology 52:1 1-15