(When) Is it OK to deny half of your pupils a promising new teaching approach?

One of the objections sometimes voiced about conducting RCTs in education is that they are unethical. This, it is often claimed, is because requiring a control group (a group of children who do not receive the teaching approach that you wish to evaluate) wilfully ‘denies’ that group access to the new teaching approach. I encountered this argument at a seminar session at BERA 2016, and explored my response to it in this post. I’ve been thinking about it some more.

First, it is important to stress that there is nothing peculiar to RCTs about this allegedly unethical behaviour. Research of any design that evaluates the effects of a new teaching approach by comparing it to an alternative involves ‘denying’ some children access to the new approach. Matched pairs designs, regression discontinuity designs, non-random comparisons, multiple baseline interrupted time series, stepped wedge designs, and designs with a waitlist control, all ‘deny’ the new approach to some children, either for the duration of the evaluation or for a portion of it. Equally, all designs (experiments, quasi-experiments, and non-experimental observations) ‘deny’ access to the new approach to any children who are not in the study. Moreover, and by the same logic, children receiving the experimental approach in any of the above contexts are ‘denied’ the control approach for the duration of the study.

This last point brings us to my second, more germane, observation. Implicit in the argument that children in a control group are ‘denied’ the new approach is the assumption that new approaches are always superior to existing approaches. The argument fails to acknowledge the possibility that a new approach might be inferior to existing approaches. It fails, therefore, to acknowledge that children may be harmed by exposure to a new teaching approach. The argument that control groups are unethical is, therefore, lopsided. By the standard applied here, we must acknowledge that ‘denying’ children access to the control approach is unethical as well, as exposure to the new approach may be harmful. This does not leave us in a very informed position.

A recent tweet by Vinay Prasad (see below) explored this argument in relation to the use of sham surgical procedures in evaluations of new surgical approaches. Here, a new surgical procedure is compared with a surgical placebo. Basically, members of the control group go through the trappings of surgery (anaesthesia, incision, stitching, etc.) but do not undergo the actual procedure. This matters because the harms of undergoing surgery may outweigh the benefits of the surgery itself. For example, in the treatment of prostate cancer, surgery to remove the cancer can cause complications such as incontinence and impotence without changing the life expectancy of the patient (many men die with prostate cancer, not because of it). Without a sham procedure we know less about the relative quality of life following the surgery and are, therefore, less well placed to decide the best course of action for the patient.

Prasad illustrated this in the flow diagram below, making the point that introducing a new surgical procedure without comparing it to an alternative (sham) procedure risks harming patients for no apparent reason.

Harms in education are less easy to spot. They are rarely as dramatic as some of those found in medicine, but they do exist (see, for example, evaluations of Chatterbooks and Mate-Tricks).

It is perhaps better to think in terms of opportunity costs. For example, a child taught using an approach that is less effective than available alternatives may still make progress, but at a slower rate. Or, if a child spends time away from their mainstream classroom to receive a targeted intervention, they miss whatever is going on in their classroom during that time. Or, in cases where a new approach is no better and no worse than existing approaches, there are costs of time, money and effort associated with changing the way teachers teach. What could be done with that time, money and effort instead of implementing the new approach for no relative gain in primary outcomes?

Building on Prasad’s work, I mapped out my thoughts on the use of control groups in education. See below.


Ethics of Control Groups in Education

I invite those who argue that ‘denying’ children access to a new teaching approach is unethical to consider the question at the bottom of that diagram. My own position is that when there is uncertainty about the effects of a new teaching approach, the only ethical course of action is to evaluate it in relation to the best available alternative, which necessitates having a control group.

A final note on RCTs. As I have said, there is nothing peculiar to RCTs about ‘denying’ an approach to some children while making it available to others. What is peculiar to RCTs is the method by which children are allocated to different approaches. The single defining feature of an RCT is that children are allocated to alternatives fairly. No child (or school or classroom if you are doing a clustered RCT) stands a better or worse chance of being allocated to either the experimental or control group than any other child when the decision is a random one. That’s the whole point of randomisation.
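To make the idea of fair allocation concrete, here is a minimal Python sketch. It is my own illustration, not a procedure taken from any particular trial, and it assumes a simple two-arm design with a list of hypothetical pupil IDs. The same function works unchanged if the units are classrooms or schools, as in a clustered RCT. The allocation depends only on a random shuffle, and repeating it many times shows that every unit ends up in the experimental group about half the time.

```python
import random
from collections import Counter

def allocate(units, rng):
    """Randomly split units (pupils, classes or schools) into two groups.

    The split depends only on a shuffle, never on anything about the units,
    so no unit has a better or worse chance of either group than any other.
    """
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (experimental, control)

def share_in_experimental(units, trials=10_000, seed=0):
    """Repeat the allocation many times and record how often each unit
    lands in the experimental group."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        experimental, _control = allocate(units, rng)
        counts.update(experimental)
    return {unit: counts[unit] / trials for unit in units}

pupils = [f"pupil_{i}" for i in range(1, 9)]  # hypothetical IDs
for pupil, share in share_in_experimental(pupils).items():
    print(pupil, round(share, 3))
```

With eight pupils, each share comes out close to 0.5. With an odd number of units the spare one goes to the control group, but every unit still faces the same odds, because the shuffle never looks at who or what is being allocated.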

When my brother and I were children, our dad used to toss a coin to decide which of us got lumbered with the washing up after family meals. While it never felt fair to the one who ended up wearing the Marigolds, we could hardly argue with the ethics of our dad’s method of choosing. By the same token, when we must decide who receives what in a comparison of alternative teaching approaches, I contend that random allocation is not just the most effective way of creating unbiased comparison groups, but it is the most ethical way, too.

For anyone interested, you can download a PDF version of my diagram here. Feedback welcome.

If it ain’t backed up in three places, it doesn’t exist.

Tricks and tips for a happy PhD: Using cloudHQ to sync my Dropbox and Google Drive.

Back in the nineties this glorious piece of black comedy did the rounds (in the old-fashioned way, via forwarded emails). In it, a man leaves a voice message with a computer repair shop to which he has entrusted his laptop. From the off you know something’s up: his voice shakes, his words are clipped, and you can hear the rictus of his lips as he tries to suppress whatever lies barely concealed beneath an obviously unquiet surface. It’s all for nothing, though, as the pressure builds and he ultimately explodes into incandescent rage. The repair shop has removed and replaced his hard drive, and in the process lost “Everything I have been working on for the past two G********m f********g years of my life!!!!!!!” Lost in a maelstrom of bellowed expletives, his complaint only worsens from there.

While it’s hard to resist a chuckle at the plight of this poor soul, your heart really does go out to him. There’s definitely a moment of ‘there but for the grace of God …’, especially considering that back in the nineties the internet barely existed (let alone the cloud). Anything you wanted to back up had to go on floppy disks, which had a tendency to become corrupt without warning. That left you no better off than if you hadn’t spent an afternoon in mind-numbing tedium, waiting for a large document to be copied over to a box of half a dozen floppies.

Now we have the cloud, so I guess phone calls like the one above are fewer and further between. However, it is a fool indeed who thinks everything will be just fine without a backup.

I’ve spent the last two years writing a PhD thesis (and still counting). So my laptop holds not just lots and lots of writing, but all of the materials and data relating to my research. If my hard drive goes pop, I’m never getting all that back. So I signed up to Dropbox, which allows me to sync pretty much everything on my laptop to the cloud and gives me a bit of peace of mind. However, having been reminded recently of the maxim in the title of this post (‘if it ain’t backed up in three places, it doesn’t exist’), I’ve been pondering ways to belt-n’-braces myself and find a third safe place in which to keep the sum total of my PhD life.

My university gives all of its staff and students a big chunk of Google Drive, so this was the obvious choice. The problem, though, is this: how do I get the thesis folder on my hard drive to sync with both Dropbox and Google Drive? I don’t want to have two versions of the folder on my laptop – one for each cloud service. For one, this would take up twice as much room on my laptop (which is already overflowing), and it would mean having to manually upload files to Google Drive each time I modified or created them in Dropbox – which defeats the whole purpose of automatic syncing.

Today, however, I came across cloudHQ. cloudHQ acts as a messenger between my Dropbox and Google Drive. It’s great. It has an intuitive interface, where you drag and drop your cloud services into a simple schematic. This tells cloudHQ which folders on which cloud you want synced with which folders on which other cloud. The service then just ticks away in the background (whether you’re plugged in and online or not) and makes sure that you have the latest copy of whatever you’ve been working on in both places all the time.

There are loads of other features that help you keep your clouds in line (for example, integrating email accounts, integrating apps, bulk forwarding, and email archiving). But for me, the peace of mind of knowing that my data does exist (in three places) is what I’m after, and cloudHQ does just this.

Now that I’ve got this set up I hope I can safely say that I will never be the man in that harrowing voicemail.

Check out cloudHQ here.

Misunderstandings and misrepresentations of randomised trials and what is claimed for them

I recently read this blog post, which presents and expands on Angus Deaton and Nancy Cartwright’s recent thoughts about RCTs. It is worth a read because, in my view, it is an excellent case study of the misunderstanding and misrepresentation of the claims made in favour of RCTs. Deaton and Cartwright must take the lion’s share of the blame here, not the author of the post, though he does add some startling assertions of his own. Here are two of many:

RCTs do not have external validity.


A key argument in favour of randomization is the ability to blind both those receiving the treatment and those administering it.

Both assertions are just plain wrong. That said, the author does demonstrate a more nuanced understanding of the value of randomised trials elsewhere:

The results of RCTs must be integrated with other knowledge, including the practical wisdom of policy makers if they are to be usable outside the context in which they were constructed.

Responding to a comment I made on his post, in which I suggested that Deaton’s and Cartwright’s arguments unfairly target randomised trials when the criticisms they make are equally applicable to all kinds of intervention research, the author wrote:


I think that this argument fails to acknowledge the single defining feature of a randomised trial and also misrepresents what is claimed for them by people who the author has called ‘randomistas’. Moreover, what Deaton considers to be the position of so called ‘randomistas’, specifically, is irrelevant (or at best thoughtless) when his criticisms are not actually of randomised trials but of all types of research.

My response to the above comment is reproduced below.

You may be correct about how Deaton views ‘randomistas’, but if so, he really needs to give examples of people claiming that the results of RCTs are superior to results obtained using other methods. I am a proud ‘randomista’ and I work with a lot of people who might be classified as such, and the idea that people like me say that the results of RCTs are always superior to those of alternative methods is just not a familiar one. In fact, when reading reports of RCTs it is common to find loads of caveats about the findings.

People who understand what RCTs are and what they are not know that the only unique feature of the design is that they generate comparison groups by randomly allocating cases to conditions. That’s it.

I don’t think it is controversial for ‘randomistas’ to argue that this is the best way of generating comparison groups that differ only as a result of the play of chance, rather than as a result of some systematic (non-random) characteristic. In any population there will be things that we know and can measure (so for example we could deliberately match cases based on these factors – say age, gender, or test scores). But there are also things that might be relevant that we don’t or can’t know about our participants and therefore can’t take into account when generating comparison groups. If we accept that there are things that we don’t or can’t know about our participants, then the only way around it, if you want to create probabilistically similar groups, is to use random allocation. Random allocation thus acknowledges and accounts for the limitations of our knowledge.
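The claim that random allocation balances even characteristics we cannot measure can be checked with a small simulation. The Python sketch below is a hypothetical illustration, not data from any study: each simulated pupil has a single unmeasured trait (think of it as motivation) drawn from a standard normal distribution, and the ‘self-selected’ comparison assumes, purely for illustration, that pupils with more of that trait are more likely to opt in to the new approach. Averaged over many simulated studies, random allocation leaves the groups differing on the hidden trait only by the play of chance, whereas self-selection builds in a systematic gap.

```python
import random
import statistics

def make_pupils(n, rng):
    """Simulate n pupils, each with one hypothetical trait we never measure."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def random_split(traits, rng):
    """Random allocation: shuffle, then cut the list in half."""
    shuffled = traits[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def self_selected_split(traits, rng):
    """Hypothetical non-random allocation: pupils with more of the hidden
    trait are more likely to opt in (probability ranges from 0.3 to 0.7)."""
    opted_in, others = [], []
    for t in traits:
        p_opt_in = 0.5 + 0.1 * max(-2.0, min(2.0, t))
        if rng.random() < p_opt_in:
            opted_in.append(t)
        else:
            others.append(t)
    return opted_in, others

def average_gap(split, trials=2000, n=100, seed=0):
    """Mean difference in the hidden trait (first group minus second group),
    averaged over many simulated studies."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        group_a, group_b = split(make_pupils(n, rng), rng)
        gaps.append(statistics.mean(group_a) - statistics.mean(group_b))
    return statistics.mean(gaps)

print(f"random allocation: {average_gap(random_split):+.3f}")
print(f"self-selection:    {average_gap(self_selected_split):+.3f}")
```

Under these assumptions the average gap for random allocation sits close to zero, while the self-selected split favours the opt-in group by around 0.4 standard deviations on a trait nobody measured. In a real study, that gap would be indistinguishable from an effect of the teaching approach itself.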

So, the notion of ‘superiority’ centres around the question ‘how confident am I that the groups being compared were similar in all important known and unknown (and possibly unknowable) characteristics?’

Of course, if your research question is one that does not involve comparisons and causal description then RCTs are not appropriate. You would be hard pressed to find a ‘randomista’ arguing that you need an RCT to help understand the views or opinions of a population of interest, for example. Nor are you likely to find a ‘randomista’ arguing that you need an RCT when observational studies have reported very dramatic effects. Take, for example, the tired old chestnut about not needing an RCT to find out if parachutes work. 99.9% of people who do not open their parachutes after jumping out of a plane die. This is a highly statistically significant finding and is extremely dramatic. There is no need to go beyond observation here.

Unfortunately for us, the effects of interventions in the social sciences are rarely so dramatic. Therefore, one key element in making causal inferences is ensuring that when we compare alternative interventions or approaches we are, in the best way we know how, comparing like with like. This means that any differences in outcome that we observe between groups can be more confidently attributed to the interventions being compared rather than to an effect of non-random differences between groups.

That’s the strength of an RCT.

The Double Standard of Ethics in Educational Research

I am at the BERA 2016 conference this week. For an unapologetic empiricist, it's a bit of an odd place to be.


Yesterday I attended a small seminar at the conference, convened by BERA’s Practitioner Research Special Interest Group (SIG). There has been a lot of talk recently about the challenges faced in helping teachers to engage with research. A combination of lack of support from senior management, unhelpful writing styles, limited access to journal articles, and perfunctory or non-existent research methods courses in initial teacher education means that teachers tend not to engage fully with the research that is there to help inform what they do. This is in contrast to other, comparable, professions. For example, as a part of the everyday responsibility of being a nurse or a doctor there is the expectation that one will not only keep up to date with relevant research but be actively involved in new research. Not so in teaching.

It was, therefore, fascinating to hear at the seminar about a postgraduate degree delivered in Wales for newly qualified teachers, a part of which reflected the norms for nurses and doctors by helping these teachers to develop their research literacy and engagement. This Masters in Education Practice degree ran over three years and culminated in a research project designed and implemented by the teacher and guided by research mentors.

We were told about one such project where a teacher used an action research approach to explore a method of giving feedback to her students on their work. The project involved eight students in one class, who had been classified as having Additional Learning Needs (Wales’ equivalent of SEN), and ran for six sessions. It appeared to be a great success. The teacher was very happy with the results of her research enquiry and felt that she had tapped into a way to improve the outcomes for her students. At the end of her presentation she described what she felt were the limitations of her study and implications for further research.

I felt that this was a brilliant introduction to research for this teacher. She was clearly extremely switched on and reflective about her practice. Moreover, she demonstrated a keen understanding of how her research had informed her teaching and of its potential to continue to help her develop as a professional. I asked, therefore, whether she intended to build on this small-scale study to explore whether the approach could be helpful beyond the eight SEN children with whom she had developed her hypothesis.

This is where it got odd. I suggested that it would be interesting to involve all of her classes: to divide them into two groups, give one the promising approach that she had piloted with SEN children, continue to teach the others with her usual approach, and then compare the results. At this an audible intake of breath filled the room, followed by cries of “Ethics!”. “That would be unethical!” two said in unison. “Why?” I asked. To which the usual trope was trotted out: that denying an apparently promising teaching approach to one set of children, while delivering it to another, is ethically indefensible.

Rebuttals to this trope are not new, but it is worth going over them again with reference to this project. First, the ‘ethics criers’ appeared not to see the irony of their position, as they celebrated a research project that delivered a promising intervention to only eight of this teacher’s children while denying it to all the others. Neither did they acknowledge the double standard expressed in the idea that a teacher can deliver whatever untested approach she likes to all of her students without ethical approval, but that if she wants to try it out in only half of them, so that she has a better idea of its effects, she is acting unethically. Moreover, by implication, they express the notion that new teaching approaches are only ever positive, without acknowledging the possibility that some new teaching approaches can have negative effects, or can add nothing to what is already being done.

This raises two ethical issues. First, if we are convinced (preferably by good research) that a new teaching approach is categorically better than existing approaches, then the ‘ethics criers’ are correct: we mustn’t wilfully deny it to children who may benefit from it (for example, all the children in this teacher’s classes who were not in her action research group). However, if uncertainty exists about the effects of a new teaching approach (such as the uncertainties expressed by this teacher when she described the limitations of her study), the only ethical course of action is to assess these effects properly. The best way to assess the effects of a new teaching approach is to compare it with an alternative.

So, to add my voice to the conversation about why teachers don’t engage in research: one possibility is that when motivated and clever young teachers are told (by the Practitioner Research SIG of the British Educational Research Association, of all things) that they are entitled to develop thoughtful educational theories but not to test those theories properly, we are failing to educate them about educational research and are, therefore, imposing an embargo on their professional development in this regard. The promise of the Masters in Education Practice to raise research literacy in teachers was fatally compromised by this peculiar attitude to what is and what is not ethical in educational research.

How relevant is our research base to teaching EAL learners in the UK?

My guest post for NALDIC’s EAL Journal blog

EAL Journal

There has been a surge in EAL research over the past couple of years, but is it enough to give a solid evidence-base for teaching? In a guest post, Hamish Chalmers argues that it is not – yet – but sees much to be optimistic about for the future.


Welsh Language in Ruthin Under Threat?

Interesting developments from North Wales. Earlier this year the council decided it would shut the Denbighshire town of Ruthin’s two primary schools and merge them into one new school. One of the schools earmarked for closure is a ‘Category One’, Welsh language school. The other is a ‘Category Two’ dual stream Welsh/English school, where parents choose to have their children educated in either English or Welsh. Apparently, 80% of parents with children currently in this school choose the Welsh stream. The proposed merged school is also set to be a Category Two school, thus removing the possibility of Welsh-only education in the town. Ruthin locals fought the merger, arguing that the Category One school is necessary to protect the Welsh language.

This is interesting for a number of reasons, not least because even in an area of relatively high Welsh language use (42% of people in Ruthin are Welsh speakers) the perceived threat from English is obviously a worry. For me this underscores the difficulty of maintaining diverse language practices in the presence of a very dominant prestige language. If Wales (which has made enormous progress in reinvigorating Welsh language education over the past 30 years or so) is worried, one can only imagine the challenge for other minority languages.

I’d be interested in seeing evidence to help understand what sort of effect the closure of Category One schools in favour of Category Two schools has on Welsh language use. The fear articulated by opponents of the merger is that “the natural dynamic [of Category Two schools] will mean that Welsh-speaking pupils will turn to English”. Denbighshire council, on the other hand, says that it believes the Category Two school will “help generate more Welsh speakers.”

We watch with interest.

For more on Welsh language schooling in Ruthin see this post, which I wrote a while back when the head of an independent school in the area drew the ire of locals by suggesting that Welsh language schools harm the prospects of their children. By my calculations, he had his facts wrong.

Postgraduate Certificate in EAL – Oxford Brookes University

Oxford Brookes University offers a Postgraduate Certificate in Education. The fully accredited course focuses on multilingual children learning in mainstream, complementary and international schools. It aims to draw on current debates, policies, practice and research around multilingual learners, to enable participants to:

  • Evaluate and critically compare policies connected with the teaching and learning of the EAL/multilingual child
  • Identify theories of bilingualism, translanguaging and dynamic language
  • Appreciate the links between the learning and the use of different languages
  • Explore identity and self-esteem: the emotional experiences of the EAL/multilingual child
  • Evaluate teacher, teacher assistant, parent, and whole school responses to the EAL/multilingual child, including the use of technology
  • Theorise practice and pedagogy: explore the beliefs, theories and attitudes to language and the EAL/multilingual learner which underpin teacher choices

The course is equivalent to one third of a Master’s degree, and the 60 course credits it attracts can be combined with other modules to build a full MA.

It can be attended online or face to face.

Read more about the course in the attached flyer here.

Multilingual Learners in Context: A wrap

The Multilingual Learners in Context symposium at Oxford Brookes University on Saturday was an excellent gathering of academics and educators from related but importantly different fields, under the umbrella discipline of teaching multilingual learners. The order of events allowed a narrative to unfold over the course of the day, revealing common themes that were revisited and enriched as we heard about them from the perspectives of mainstream schooling, community schooling and the international sector. Victoria Murphy and Therese Hopfenbeck, both of the University of Oxford, bookended the day with discussions of quantitative data describing the achievement of multilingual learners in the UK, the character and extent of controlled intervention studies pertaining to EAL learners’ education, and international comparisons of literacy attainment. Murphy ended her talk by impressing on us four key takeaways: 1) more research on EAL in the UK is needed, 2) more funding is necessary for the type and scale of research required for us to really know what works, 3) understanding of appropriate pedagogy for EAL students needs to be better integrated into Initial Teacher Education, and 4) we need to abolish the monolingual hegemony. Whether by design or as a result of happy coincidence, these themes would recur throughout the day from very different starting points.

Figure shared by Victoria Murphy from Strand, Malmberg and Hall (2015), showing KS2 SATs achievement relative to the grand population mean, by ethnic classification and stratified by EAL and non-EAL status.


We heard from Ana Souza and Jane Spiro of Oxford Brookes University about the importance of grassroots advocacy, from London to Hawaii, for building and defending provision for appropriate education of multilingual learners. Both Spiro and Souza described the critical role of promotion of all aspects of a multilingual person’s self – their language, their history, their culture – in developing linguistic and inter-cultural competences, and the benefits of raising the visibility of multilingualism generally. As well as the promotion of inter-cultural competence through community schools, Souza described a kind of organic, student initiated, development of meta-linguistic competence as a result of attending these kinds of schools. This was an important acknowledgement of both the social justice aspect of community schooling and its academic utility. Spiro described the Hawaiian situation as a forty-year work in progress, underscoring the time and commitment needed for change to happen.

Segueing perfectly from this thought, Peeter Mehisto described his work (in more countries than I can remember) on bringing together stakeholders to improve the educational lot of multilingual learners. In a serendipitous call back to Murphy’s talk, Mehisto emphasised the need for good research reviews that summarise what we know about educating multilingual learners, and argued that these should be available to all stakeholders. He argued that buy-in from stakeholders is contingent on an unbiased understanding of what we know about multilingual education. He also described how crucial support from the upper echelons of power can be. A sympathetic ear at City Hall from the outset pays dividends when multilingual projects hit the inevitable road bumps caused by forces working against them, whether through wilful blindness or self-interested power, he said. Mehisto touched on ‘monolingual disadvantage’ in another related call back to Murphy’s four key takeaways, emphasising that multilingual education should not be reserved just for high achievers or members of the ‘elite’, nor used to suppress L1. Among a number of mechanisms necessary to move this agenda forward, he proposed shining a light on incompetence – wherever it lurks – as key. This put me in mind of Louis Dembitz Brandeis’ 1913 contention that “Sunlight is said to be the best of disinfectants; electric light the most efficient policeman.”

Drawing on the theme of stakeholders and the importance of what goes on in the homes of multilingual learners, Raymonde Sneddon described work she was engaged in, developing bi-literacy through dual language books and by helping children to create personal texts, inspired by Cummins’ ‘Identity Texts’. She said this started as an attempt to get languages acknowledged in the classroom, but quickly developed into a deeper exploration of bi-directional transfer of literacy skills and a fuller understanding of texts read in two languages. Self-report from parents and children suggested that this work had positive social and academic implications.

Oksana Afitska’s presentation on the use of L1s in promoting learning for multilingual learners used concrete illustrations to emphasise that if a student cannot demonstrate their knowledge using English, this doesn’t mean that they do not possess that knowledge. She showed some helpful examples from SATs that, if considered only in terms of the official marking scheme, would lead one to conclude that the children knew very little about the topics being assessed. Looking through an L1 lens, however, reveals a different picture. Her presentation allowed for some useful discussion about whether there have been any empirical demonstrations of the value of translanguaging as a pedagogic tool. For me, this discussion reinforced the notion that translanguaging has neither been sufficiently well defined, nor has it been sufficiently well assessed as a pedagogic tool for us to say anything particularly conclusive about it.

Cathie Wallace, of the IoE at UCL, framed her talk with the need to provide enriched and expanded text worlds for multilingual learners. She used case studies of pupils in London, describing their engagement with texts on their journeys to becoming literate users of English. She discussed ‘relevance’ and ‘resonance’, citing Arthur Miller’s A View from the Bridge as a text that, while on the face of it lacking relevance, actually resonated very well with some of the young learners with whom she worked. This brought to mind the recent debate about the appropriateness of the texts used in the KS2 SATs. These texts, some have argued, would precipitate success only for ‘white, middle-class girls’, such was their lack of relevance to other demographic groups. In the light of Wallace’s differentiation between relevance and resonance, I feel this is a debate worth revisiting.

Therese Hopfenbeck of the University of Oxford described PIRLS (Progress in International Reading Literacy Study), an assessment programme not dissimilar to PISA that is conducted every four years to provide data about how well children in different countries are doing in reading after four years of schooling. Her talk underscored for me the importance of teacher engagement in research for making it relevant, understandable and useful/useable. This is why I feel that we were lucky to have a small number of teachers from local schools present at the symposium, contributing to the discussion and helping to ensure that the end users in all of this were represented.

In sum, the cross-disciplinary nature of the event, the contribution of teachers, advisors, and academics and the broad diet of research approaches made the symposium an excellent educational experience for all in attendance. As a researcher who leans towards intervention studies as a preference, I found hearing first-hand about research which uses other approaches fascinating. Moreover, it helped me to better contextualize the world of research and provision for multilingual learners that I inhabit. This learning was reciprocal, and I was delighted to hear one presenter (more familiar with collection of qualitative data) say how great it was to learn about the data presented in Murphy’s and Hopfenbeck’s presentations as “we never get a chance to see it normally”.

My one regret about the day was the difficulty we had in making local teachers aware of it and its relevance to them, and encouraging them to attend. This is something we will need to work on for next time. However, the symposium was billed as an opportunity to examine the commonalities and peculiarities of differing multilingual contexts, exchange knowledge, and consider ways forward in meeting the needs of multilingual learners, with the potential to radically shape the way we think about and deliver effective provision for our language learners. From my perspective, I’d say mission accomplished.

The event was videoed and, subject to a bit of administration, should be available soon. I will post links when I have them.

*This post has been updated from the original to clean up some typos and rephrase some clumsy passages.