I recently read this blog post, which presents and expands on Angus Deaton and Nancy Cartwright’s recent thoughts about RCTs. It is worth a read because, in my view, it is an excellent case study of the misunderstanding and misrepresentation of the claims made in favour of RCTs. Deaton and Cartwright must take the lion’s share of the blame here, not the author of the post, though he does add to the commentary some startling assertions. Here are two of many:
RCTs do not have external validity.
A key argument in favour of randomization is the ability to blind both those receiving the treatment and those administering it.
Both assertions are just plain wrong. Although the author does demonstrate a more nuanced understanding of the value of randomised trials elsewhere:
The results of RCTs must be integrated with other knowledge, including the
practical wisdom of policy makers if they are to be usable outside the context in which they were constructed.
In a response to a comment I made on his post suggesting that Deaton’s and Cartwright’s arguments unfairly target randomised trials, when the criticisms they make are equally applicable to all kinds of intervention research, the author responded thusly:
I think that this argument fails to acknowledge the single defining feature of a randomised trial and also misrepresents what is claimed for them by people whom the author has called ‘randomistas’. Moreover, what Deaton considers to be the position of so-called ‘randomistas’, specifically, is irrelevant (or at best thoughtless) when his criticisms are not actually of randomised trials but of all types of research.
My response to the above comment is reproduced below.
You may be correct about how Deaton views ‘randomistas’, but if so, he really needs to give examples of people claiming that the results of RCTs are superior to results obtained using other methods. I am a proud ‘randomista’ and I work with a lot of people who might be classified as such, and the idea that people like me say that the results of RCTs are always superior to alternative methods is just not a familiar one. In fact, when reading reports of RCTs it is common to find loads of caveats about the findings.
People who understand what RCTs are and what they are not know that the only unique feature of the design is that they generate comparison groups by randomly allocating cases to conditions. That’s it.
I don’t think it is controversial for ‘randomistas’ to argue that this is the best way of generating comparison groups that differ only as a result of the play of chance, rather than as a result of some systematic (non-random) characteristic. In any population there will be things that we know and can measure (so for example we could deliberately match cases based on these factors – say age, gender, or test scores). But there are also things that might be relevant that we don’t or can’t know about our participants and therefore can’t take into account when generating comparison groups. If we accept that there are things that we don’t or can’t know about our participants, then the only way around it, if you want to create probabilistically similar groups, is to use random allocation. Random allocation thus acknowledges and accounts for the limitations of our knowledge.
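The logic of this can be illustrated with a short simulation. The sketch below (in Python, using an entirely hypothetical population and trait names of my own invention) randomly allocates cases to two groups and shows that, on average, the groups end up similar even on a characteristic we never measured or matched on:

```python
import random
import statistics

random.seed(42)

# Hypothetical population: each participant has a measurable trait (age)
# and an unmeasured trait ("grit") that we could never match groups on.
population = [
    {"age": random.randint(8, 11), "grit": random.gauss(50, 10)}
    for _ in range(1000)
]

# Random allocation: shuffle the population, then split it in two.
random.shuffle(population)
treatment, control = population[:500], population[500:]

# The groups are probabilistically similar even on the unmeasured trait.
mean_t = statistics.mean(p["grit"] for p in treatment)
mean_c = statistics.mean(p["grit"] for p in control)
print(f"treatment grit: {mean_t:.1f}, control grit: {mean_c:.1f}")
```

With a thousand cases the two group means for the unmeasured trait land within a fraction of a point of each other; any remaining difference is the play of chance, not a systematic characteristic.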
So, the notion of ‘superiority’ centres around the question ‘how confident am I that the groups being compared were similar in all important known and unknown (and possibly unknowable) characteristics?’
Of course, if your research question is one that does not involve comparisons and causal description then RCTs are not appropriate. You would be hard pressed to find a ‘randomista’ arguing that you need an RCT to help understand the views or opinions of a population of interest, for example. In addition you will be unlikely to find a ‘randomista’ arguing that you need an RCT when observational studies have reported very dramatic effects. Take for example the tired old chestnut about not needing an RCT to find out if parachutes work. 99.9% of people who do not open their parachutes after jumping out of a plane die. This is a highly statistically significant finding and is extremely dramatic. There is no need to go beyond observation here.
Unfortunately for us, the effects of interventions in the social sciences are rarely so dramatic. Therefore, one key element in making causal inferences is ensuring that when we compare alternative interventions or approaches we are, in the best way we know how, comparing like with like. This means that any differences in outcome that we observe between groups can be more confidently attributed to the interventions being compared rather than to an effect of non-random differences between groups.
That’s the strength of an RCT.
At the end of June I attended the Centre for Evidence Based Medicine’s (CEBM) annual EvidenceLive conference, in Oxford, UK.
It gave me the opportunity to consider how and if Evidence Based (or informed or supported if you prefer) Education can learn from the journey of Evidence Based Medicine over the past twenty years.
I was asked to write a guest blog post for the CEBM about my thoughts attending the conference from the perspective of an educator and education researcher. In addition to exploring what the term ‘evidence based’ means in the post, I consider how education is constrained in a way that medicine is not by the relative capacity of end users (teachers, pupils, parents vs patients) to contribute directly to how evidence based practice is conceptualised for them personally. I also suggest that those who take the reductive view that evidence based education is conceptualised only in terms of research evidence (and rather specifically, randomised trials) are mischaracterising the field, and need to reassess their position.
You can view the post at the CEBM site here: http://www.cebm.net/can-education-learn-evidence-based-medicine/
Note: At time of writing one of the links in the blog post is broken. It should link to Gary Thomas’ mischaracterisation of what constitutes ‘evidence’, from the Times Higher Education, and can be viewed here.
University of Reading’s Institute of Education held a ‘research into practice’ event on May 26th entitled Language(s) and Literacy at Primary. The event was billed as an opportunity to bring together practitioners and researchers interested in primary aged children’s language and literacy development. There was a specific expectation that the event would facilitate reflection on the relationships between research and practice, and that practitioners would suggest ways in which the research agenda could and should be taken forward. Given recent discussions about the challenge of productive engagement with research by teachers (TES 20 and 23 May 2016 – links at bottom of post) events like this are extremely welcome.
I am told that the organising team worked tirelessly to encourage teachers and local authority staff to attend the (free) event. If my recent experience of trying to drum up interest among local teachers to attend a similar event here at Oxford Brookes is any measure, they must be heartily congratulated for their efforts: forty-six delegates packed two lots of three parallel workshop sessions and a plenary keynote. This was a fabulous example of research into practice.
Each of the three parallel sessions focused on a different theme: Reading Development, EAL (English as an Additional Language) and Primary (Modern Foreign) Languages. But, of course, it was the EAL sessions that I attended. By way of illustration of the whole afternoon, here is an account of one session.
Vincent Trakulphadetkrai and Jeanine Treffers-Daller reported on a study they conducted with EAL learners on the relationships between reading comprehension and success in maths. Trakulphadetkrai made the point that while EAL learners tend to be pretty good at decoding, this does not mean that they understand the words they have decoded. This is further affected by the potential for confusion presented by words with multiple meanings, and the difference between common usage and specialist usage of these words. For example, table, translate, volume, and similar are all mathematical terms whose specialist meanings differ markedly from their everyday English ones.
In their research, Trakulphadetkrai and Treffers-Daller sought to address questions concerning the difference between the performance of first language English users and their EAL peers on maths problems (both wordless and word-based problems), and to assess the extent to which EAL learners’ reading comprehension and vocabulary knowledge is related to their performance in solving word-based problems in maths. An example of a wordless problem is 100÷4= and an example of a word-based problem is ‘The school playground is square. It takes 100 equal paces to walk the perimeter of the playground, how many paces is each side?’.
They worked with 33 Year 5 children, 17 EAL (representing 11 countries of origin and 11 different L1s) and 16 EL1 (English First Language). In tests on wordless and word-based maths problems they detected a statistically significant difference between the EAL and EL1 children on the word-based problems, but not on the wordless problems. This suggests that if we are to address the achievement gap between EAL learners and EL1 learners in maths, then word-based problem solving seems like a suitable target for our efforts.
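For readers unfamiliar with how a comparison between two unequal groups like this is typically tested, here is a minimal sketch using invented scores (not the study’s data). It computes Welch’s t statistic, a standard way of comparing two independent samples of different sizes:

```python
import math
import random
import statistics

random.seed(7)

# Hypothetical word-based problem scores for illustration only --
# these are NOT the study's data, just two simulated groups.
eal = [random.gauss(55, 10) for _ in range(17)]  # 17 EAL children
el1 = [random.gauss(65, 10) for _ in range(16)]  # 16 EL1 children

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / standard_error

t = welch_t(eal, el1)
# With roughly 30 degrees of freedom, |t| greater than about 2.04
# corresponds to p < 0.05.
print(f"t = {t:.2f}")
```

The same statistic applied to the wordless-problem scores would, on the study’s account, fall short of that threshold, which is what “no statistically significant difference” means here.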
Trakulphadetkrai and Treffers-Daller then assessed a variety of measures of reading proficiency (YARC test – comprehension, SWRT – vocabulary knowledge, and C-Test – language ability) and found statistically significant relationships between these measures and success in maths. They concluded that:
- Mathematical word-based problem-solving performance is related to EAL learners’ vocabulary knowledge.
- Language ability (C-test) and reading comprehension explain a large portion of EAL children’s word-based mathematical scores.
The floor was then opened to delegates to suggest what should be done next in the light of the findings of this study. Trakulphadetkrai led a discussion, soliciting descriptions of the experiences, ideas and suggestions of practitioners at the ‘chalk-face’ of EAL teaching. The group discussed the potential practical applications of the research and how it can be made most useful for those colleagues who are ultimately in charge of translating research into practice.
Being a part of these discussions was illuminating and has underscored for me the need to involve practitioners in setting agendas so that the research we do is useful. For what it’s worth, my thoughts were that if vocabulary knowledge is an important component of children’s success in maths, then maths teachers with EAL learners need to make good quality vocabulary instruction an everyday component of their teaching. For me, Trakulphadetkrai and Treffers-Daller would be adding important information to the research literature on EAL teaching in the UK if they conducted an intervention study comparing the effects of such an approach with business as usual. Given the reluctance that many of us in the EAL world have witnessed in our non-EAL-specialist colleagues to countenance incorporating EAL methods into their lessons (“I am a teacher of maths, not of English”), such a study should be accompanied by a process evaluation to see what the barriers are to implementing vocabulary teaching in maths lessons and how these might be overcome.
The organisers did a great job in conceptualising and executing this workshop, and I hope to attend many more like it, at many more institutions, as the voice of practitioners joins the voice of academics to make research count for all.
Read more about teachers’ engagement in research at the links below.
It was my birthday yesterday and I was given a really fantastic present.
This is the title page of a book originally published in 1684, though my version is an impression from 1700. It is a text book for learners of Latin containing Latin colloquies (passages of everyday speech). Not untypical for publications of the time, the title of the book is 58 words long (see above).
What is really cool about those 58 words is the 19 pictured above.
Now compare them to these words taken from the founding statutes of Adams’ Free Grammar School in Shropshire in 1656.
Fifteenthly No scholars that have attained to such a progress in learning as to be able to speak Latin, shall neither within School or without, when they are among the Scholars of the same or a higher form, speak English. And that the Master shall appoint which are the forms, that shall observe this order of speaking Latin, and shall take care that it be observed and due correction given to those that do neglect it.
Now, these 17th century expressions of the place a language learner’s mother tongue has in his or her education might seem familiar to teachers of language learners in the 21st century.
Here is England’s DfE on mother tongue use from 2006:
And here is the first principle of inlingua language schools’ method from 2015:
So, who’s right? Do we have any empirical evidence to settle this disagreement, a disagreement at least 331 years old? Are there any empirical studies that compare the effects of allowing children to use their mother tongue with the effects of forbidding it? We know that bilingual schools, on average, produce better linguistic and academic results for language learners than do all-English schools. But does this mean that allowing mother tongue use in language classes or in mainstream classes has the same effect?
I think it’s time this ancient argument was settled. And that’s what I’m hoping to achieve with my research.
This week I attended an online seminar called Research Impact and Public Engagement for Career Success. Five panellists discussed the expectation that public engagement and research impact should be woven into the fabric of the research process from conception to dissemination and beyond.
I’m not too clear on how I feel about the requirement for an obvious, pre-specified impact in order to validate research. On initial consideration it seems sensible that some assessment of the impact of research should add to any assessment of the value of that research. In terms of the kind of research that I am interested in – What Works research – it is probably important to be able to say something like “The impact of this study is that we will be able to say with some confidence whether teachers’ use of Intervention X will be helpful for students’ achievement of Outcome Y”. Important? Yes. A prerequisite? I’m not sure.
What of the ‘blue skies’ research for which impact is not immediately identifiable? Some individuals talk without irony about teachers needing to prepare students for jobs that haven’t been invented yet (a meme that is rightly criticised in some quarters for its silliness – or at least for the unremarkableness of its premise). But, if we are encouraged to value aspects of a future world that may or may not come into existence then surely we can value research the impact of which we cannot yet conceptualise.
3M is well known for funding with money, time and moral support blue skies research; research for the sake of itself. I’m willing to bet that a great many of the projects 3M funds end up at the bottom of tortuous rabbit holes. However, 3M’s approach to research and development has resulted in some notable, yet unforeseeable, success stories.
In 1968, while trying to create a super-strong glue, Spencer Silver, a scientist at 3M, accidentally created a ‘low-tack’ reusable adhesive that he had no idea what to do with. So flummoxed was he by its apparent lack of a tangible ‘impact’ that when he touted it around the halls of 3M in the succeeding years he referred to it as his ‘solution without a problem’. It wasn’t until 1974 that a colleague of his, Art Fry, deciding that he needed a way to anchor a bookmark to his hymnal, appropriated the glue. He applied it to some yellow paper (the colour of the only paper that happened to be lying around the science lab at the time) and created Press ‘n’ Peel. Even then it wasn’t until 1980 (twelve years after the research that led to the creation of the adhesive) that, consumer tested and rebranded, Post-it notes hit the shelves of stationers across the world. And the rest, as they say, is history.
If an impact assessment had been required of Silver and Fry to green light their research who knows if these ubiquitous little yellow squares would have been allowed to have the impact that they so clearly have.
Of course, what triggered an understanding of the potential impact of Silver’s creation was a form of public engagement. By touting his invention around 3M, both through informal chats with colleagues and formal seminars with peers, his solution eventually found its problem.
Public engagement is where I find myself feeling on firmer ground. There is a lot of waste in research. Some of it is due to things like publication bias, data hoarding, and unnecessary duplication, but some of it is because researchers are not engaging with the public. As a result researchers waste time, effort and money by addressing questions that no one is interested in seeing the answers to.
Consider this example. If you had a chronic poorly knee, what kind of research would you like to see being conducted to help you? Well, in 2000 Deborah Tallon and colleagues engaged with the public to find out what people with osteoarthritis of the knee and their doctors wanted researchers to research. They found that the top two research priorities among these demographic groups were knee replacements and education on how to manage pain. When Tallon and her colleagues looked at what dominated the research literature on interventions for this condition they found that drug treatments and surgery came top; a clear mismatch between what consumers wanted and what researchers were doing.
Tallon suggested that this pattern was unlikely to be confined to just knee problems, an assertion that was borne out by work by Sally Crowe and colleagues earlier this year. In a similar exercise to Tallon’s, Crowe and colleagues found the same pattern of preference for drug trials among researchers, while patients, carers and clinicians prioritised research on non-drug interventions. Some might argue that, despite expectations for public engagement, the medical research community, on average, remains woefully disengaged.
In the world of education I see no reason to be any more optimistic that researchers adequately engage with the end users of their research. I am currently preparing a systematic review of language-teaching interventions for school aged children. This involves screening thousands of titles and abstracts to find studies that might be relevant to a question that is of demonstrable interest to emergent bilinguals, their parents and their teachers (I know they’re interested because I’ve asked some of them). This question is, in essence, what can be done in the classroom and the home to improve outcomes for English language learners. I have so far screened 3000 potentially relevant abstracts and only about five percent of them have anything to do with teaching and learning strategies. Many, many times that number are on things like fMRI to locate which parts of the brain light up when people say things in different languages, or correlational studies that report associations between the skills in different languages of bilingual learners. Interesting? Perhaps. Impactful? Depends on how you define impact. Based on what consumers want? I doubt it.
Public engagement is critical if we as researchers hope to add value to the educational experiences of the consumers of our research. In a climate of top-down imposition, researchers, teachers and pupils will find power in public engagement. That is why, following this week’s seminar, I shall be redoubling my efforts to engage with the pupils, teachers, and parents to set my research agenda, so that together we can produce something useful, relevant, and perhaps even impactful.
Yesterday my attention was drawn to an article on the Wales Online news website that reports on a letter written to a local newspaper by Toby Belfield, headteacher of Ruthin School in Denbighshire, North Wales. In the letter he responds to a parent’s suggestion that the language of instruction in schools in Wales should be Welsh, arguing that the consequence of “forcing young people to learn both English and Welsh (arguably, both to a substandard level) is that young people in Wales will continue to be educationally weaker than their peers in England and abroad.” (Williams 2015a).
This is an interesting claim and one which could be supported or refuted by empirical evidence.
So, in the spirit of encouraging members of the educational community to base their assertions on evidence, I decided to ask Toby Belfield for the evidence upon which he based those in his letter. To do this I used the excellent Ask For Evidence website. The Ask For Evidence initiative was set up by Sense About Science to help “people request for themselves the evidence behind news stories, marketing claims and policies” (AskForEvidence 2011).
Here is the text of my request for evidence:
Dear Mr Belfield
I read with interest about the disagreement being voiced between you and some members of the Welsh community with regards to the effects that Welsh language education has on the attainment of children educated in Wales (Wales Online 14 May 2015).
I am a doctoral researcher in Oxford, with a particular interest in the way that first and second languages interact during the learning process. My thesis investigates whether making use of a child’s first language in an otherwise monolingual English speaking environment is helpful, harmful or makes no difference.
I am currently preparing a systematic review of empirical studies that have investigated this question. This involves as exhaustive a search as possible to find reports of studies addressing this issue, so that a comprehensive picture of the effects can be assembled, as free as possible from bias.
I would be very grateful if you felt able to provide me with the evidence upon which your comments to the press in Wales were based. This would allow me to incorporate into my review studies that I may have missed.
As an educator of some 20 years prior to going into research, I feel that it is absolutely vital that decisions in education are based on sound empirical evidence, so that we as teachers can be confident that what we are doing is as effective as possible in raising attainment of the children to whom we are responsible.
Please note that in addition to my own research, I am asking this as part of the Ask for Evidence campaign and will share the response I get publicly.
I look forward to hearing from you.
I think that it is unlikely that Mr Belfield will be able to provide me with evidence to support his claim that learning two languages in and of itself is harmful to the academic prospects of Welsh children. The evidence of which I am aware would suggest either that it makes no difference in the long term, or that it is beneficial. But this evidence is derived in the main from studies conducted in the USA or Canada, and so he may be aware of similar studies carried out in Wales that would suggest otherwise.
Where he might have a point is in his assertion that teaching Welsh and English “arguably, both to a substandard level” (Williams 2015a) could have negative effects on the overall attainment of Welsh children. This, though, is not a question of what one teaches, but rather the competence with which it is taught. Indeed, reading between the lines of his letter, there is more than language of instruction that he is worried about, and he makes a number of further claims on which he might be reasonably challenged.
I sympathise with his question later in his letter asking “Why is the Welsh education system one of the weakest in the world?” (ibid). He makes a leap of logic though to suggest that it is weak because of the necessity in some schools in Wales to learn Welsh. But that – in the absence of any evidence – is merely conjecture. We shouldn’t be basing educational claims on conjecture.
You can follow my AskForEvidence request here if you’re interested, and I’ll be updating on my blog as the request progresses.
As I was writing this post I saw that Mr Belfield has made an unreserved apology to people who were offended by his letter. He modified his position somewhat by stating that “If all teachers had to not only speak Welsh, but had to teach through the medium of Welsh, then the pool of teachers available to work in Welsh schools will dramatically be reduced, and high quality academics will not necessarily be able to guide/teach our Welsh children.” (Williams 2015b). This is a not unreasonable position to adopt, and further study to determine the Welsh language proficiency of teachers in the Welsh system would be warranted to assess whether it is defensible.
The baying crowd on social media has forced this headteacher to apologise for offence he has caused. If people are genuinely offended by the personal opinions of a private school headteacher, then I guess that’s their prerogative. For me the bigger offence is to not take account of the evidence when commentating on approaches to education.
AskForEvidence (2011) Ask For Evidence. Available online at http://askforevidence.org/about [Accessed 15.05.2015]
Williams K (2015a) ‘Forcing pupils to learn Welsh will keep them weaker than English counterparts’ Private school head causes outcry with language claim. Wales Online available online at http://www.walesonline.co.uk/news/wales-news/forcing-pupils-learn-welsh-keep-9256782 [Accessed 15.05.2015]
Williams K (2015b) Unreserved apology from Ruthin private school head who sparked Welsh language row. Wales Online. Available online at http://www.dailypost.co.uk/news/north-wales-news/unreserved-apology-ruthin-private-school-9257760 [Accessed 15.05.2015]
“We have done more to raise standards in 21 years of existence than any other organisation.” (Michael Wilshaw, Chief inspector of Schools in England, 2014)
“The reality is that Ofsted is no longer fit for purpose, if it ever was. It […] does nothing to improve the quality of education.” (Bernadette Hunter, National President NAHT, 2013)
In 1992, John Major’s government, dissatisfied with what it saw as the ‘progressive’ tendencies of existing school inspection systems, established a new school accountability system in England. This new national body, called the Office for Standards in Education (Ofsted), was charged with supervising the inspection of all state funded schools in England. Then, as now, Ofsted’s defining mantra was “improvement through inspection” (Ofsted 1993:8).
From its beginnings Ofsted’s validity, reliability and value for money were questioned by the research community (e.g. Fitz-Gibbon 1995), and its alleged (negative) impacts on morale and professionalism were articulated by teachers (e.g. Brimblecomb, Ormston and Shaw 1995, Jeffery and Woods 1996). In the twenty-one years since its first inspection, attitudes to Ofsted have hardly changed. It is still seen by many as an unaccountable and unregulated body, which bases its judgements on questionable evidence, and which enjoys disproportionate power and influence over schools and teachers (see, for example, comments in Gray and Gardner 1999, Coe 2013a, NASUWT 2014, NUT 2014a).
When I wrote this essay in April 2014 the three main teachers’ unions were holding their annual conferences and, unsurprisingly, all had something to say about Ofsted. Christine Blower, General Secretary of the NUT described it as a “destructive system of inspection” (NUT 2014b). Her counterpart Chris Keates at NASUWT regards Ofsted as “seriously tainted” (NASUWT 2014); and Mary Bousted, General Secretary of ATL has opined that “Ofsted can no longer claim that its inspection reports are worth the paper they are written on” (Exley 2014).
Rhetoric of this kind from those most frequently at the sharp end of Ofsted’s judgements is unsurprising, but the bad blood had well and truly spilled over into government, with Chief Inspector, Michael Wilshaw, accusing then Secretary of State for Education, Michael Gove, of briefing against the inspectorate through think tanks such as Civitas and Policy Exchange (Withnall 2014). While Wilshaw later back-pedalled on his accusation (Sellgren 2014), it is true that both organisations had publicly questioned Ofsted’s effectiveness (de Waal 2008, Waldergrave and Simons 2014).
Despite almost continuous criticism from teachers and researchers since its inception, however, Ofsted has enjoyed support from three consecutive governments, four political parties, and eleven Secretaries of State for Education. In 1998, Ofsted’s responsibilities were extended to include inspection of Local Education Authorities (LEAs) and teacher training establishments (Elliot 2012). Then, in 2001, inspection of day care and childminding were added (Plomin 2001). Subsequently, in 2007, inspection of post-16 government-funded education, social care services for children, and the welfare inspection of independent and maintained boarding schools were added to its remit (Carvel and Ward 2007). It is now overseeing voluntary inspection of British Schools Overseas (HMG 2014). In the 21 years of its existence, Ofsted has formally updated its inspection framework only twice. All of this suggests robust government satisfaction with Ofsted’s effectiveness.
Given the collective uncertainty over Ofsted’s fitness for purpose, summarised in the two quotes opening this essay, what evidence is there that Ofsted meets its defining mantra of ‘Improvement through Inspection’? Drawing on empirical evidence, this essay addresses that question and attempts to assess whether any conclusion can be confidently drawn about the extent to which inspection can be considered a catalyst for school improvement. Continue reading