Yesterday I went to the Stepped Wedge Trials Symposium at the London School of Hygiene and Tropical Medicine. The symposium was convened to mark the publication of a special edition of the journal Trials that focuses on this type of trial design, and which includes reports and methodological studies of research using the design. At the symposium we heard about the genesis of the term Stepped Wedge from Peter Smith, who used the design in the 1980s to assess the effects of hepatitis B vaccination of infants in The Gambia on the subsequent incidence of liver cancer in adulthood, and we learned about some of the issues that need to be considered in using the design. These are my thoughts on what I saw.
What is a Stepped Wedge Trial?
A stepped wedge trial is a type of cluster randomised trial that is a bit like a crossover trial. It phases in the intervention being tested over time so that participants in the study gradually cross over from being in the control condition to being in the experimental condition.
The clusters are randomly allocated to the point at which they cross over from the control to the experimental condition. During each phase there is the opportunity for a vertical comparison, assuming that outcomes arise sufficiently rapidly after the intervention is introduced. Then, on completion of the trial, there can be a horizontal comparison of effects.
So, by way of illustration, imagine that the diagram above represents a trial to assess the effects of introducing free hot lunches on the academic attainment of pupils at a junior school. Each column represents all the children in the school, and each cell within the column represents one of the school’s year groups. The first column represents time before the trial starts – no year group is receiving free hot lunches. This is the control condition. The second column represents the first term in the school year, at which point one year group, allocated at random, begins to receive its free hot lunches. Time passes and we enter the second term of the school year, at which point another year group, selected at random, starts to get free hot lunches. So it continues until all year groups are getting the hot lunches and are all effectively in the experimental condition.
Comparison of the academic attainment of those clusters (year groups) receiving hot lunches and those not receiving hot lunches can be made at three points along the time scale. These were described at the symposium as ‘vertical comparisons’. Then, at the end of the trial, a bit of statistical jiggery-pokery can be done to compare the effects of the hot lunches on pupil attainment using all the data points. This was termed ‘horizontal comparison’.
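The rollout described above is easy to sketch in code. The following is a minimal illustration, not anything presented at the symposium: the cluster names and the `stepped_wedge_schedule` helper are my own invention, chosen to mirror the hot-lunches example. Rows are clusters, columns are time periods, 0 means control and 1 means intervention.

```python
import random

def stepped_wedge_schedule(clusters, seed=None):
    """Randomly order the clusters, then cross one over per period."""
    rng = random.Random(seed)
    order = clusters[:]
    rng.shuffle(order)  # random allocation of crossover times
    n = len(order)
    # n + 1 periods in total: a baseline period with every cluster in
    # control, then one cluster crosses over at the start of each period.
    schedule = {}
    for step, cluster in enumerate(order, start=1):
        # control (0) before this cluster's step, intervention (1) after
        schedule[cluster] = [0] * step + [1] * (n + 1 - step)
    return schedule

schedule = stepped_wedge_schedule(["Year 3", "Year 4", "Year 5"], seed=1)
for cluster, row in sorted(schedule.items()):
    print(cluster, row)
```

Printing the schedule produces the familiar wedge-shaped diagram of 0s and 1s: vertical comparisons read down a column where some clusters are exposed and some are not, while a horizontal comparison uses every cell.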
Why do a Stepped Wedge Trial?
Good question. Why would you want to do a stepped wedge trial when you could be doing a parallel randomised trial, or, if you really had to, an interrupted time series, or a regression discontinuity? Well, the main reasons researchers elect to use this design, according to discussions at the symposium, were logistics and ethics.
The former, logistics, might be something like the availability of funding. It could be that money to pay for the hot lunches in the example above, for whatever reason, only comes online in stages. So, in order to conduct a trial with sufficient power to detect any differences that may exist between comparison groups, the only option is to introduce the intervention in stages. Because random allocation to time of joining the intervention is used, the risk of systematic bias related to who gets the intervention when is reduced.
The latter, ethics, is when there is tension between researchers (who want to compare interventions) and ethics committees (who feel that it would be unethical to deny the intervention to half of the participants). In these cases an unhappy compromise is reached whereby phased introduction of the intervention partially satisfies both parties.
I can kind of understand the logistical reasons for doing a stepped wedge trial, but I think that the ethics reason is massively problematic – and not restricted just to stepped wedge trials. It is an example of where those responsible for overseeing the ethical conduct of researchers can behave unethically themselves. If an ethics committee knows that the benefits of an intervention are sufficiently well established that it would be unethical to deny it to half the participants in a trial, then I can’t see how they have any business approving any sort of comparison of it with what they know to be an inferior intervention. If, on the other hand, they agree that there remains uncertainty about the effects of an intervention, such that a trial is the best way to resolve that uncertainty, then it is ethically wrong to force everybody in a trial to be subject to that intervention. It could be that the new, untested, intervention actually harms participants. Ethics committees that insist that all participants should, at some stage, be given the new intervention risk being responsible for exposing twice as many people to harm as might otherwise be the case. In addition, in a stepped wedge trial, according to one speaker at the symposium, the complexity of the design makes establishing stopping rules commensurately more complex. The implication is that any harmful effects (which in a parallel randomised trial might be detected while it is in progress and precipitate a shutdown of the trial) go undetected until many more people than might otherwise be the case have been exposed to the harm.
Even if one accepts these reasons for delaying introduction of the intervention to some groups, I can’t help but be concerned about the effects that that delay might have on the nature of the participants by the time they are introduced to the intervention. In a medical trial it is not unreasonable to assume that the nature of whatever condition is being treated in the trial might change over time. For example, participants might naturally get better, they might die, or their medical condition might change so that none of the interventions being compared will be, even theoretically, appropriate. If this happens then any bias reduction made possible by random allocation is compromised and the comparison groups become systematically different.
In social sciences there may be what one panelist at the symposium described as carry-over effects. That is to say, experiences that participants have early in the life span of the trial carry over to later periods of the trial and may, therefore, introduce systematic bias into the trial.
To illustrate this in an education setting, consider the following (admittedly rather crude) example. If I want to compare the effects on students’ critical thinking of teaching evolution, in a school where business as usual is to teach creationism, I could do a stepped wedge trial. Assuming that participants are all new to the school at the start of the trial, the cluster in the first phase of the trial gets little or no creationism education and is compared to children who have had commensurately little creationism education. As time passes, the business-as-usual clusters experience increasing amounts of creationism education, so that by the time the final cluster is introduced to the evolution education they are well and truly convinced by the doctrine of creationism and dismiss evolution as contradictory and therefore wrong. They are now systematically different to the comparison groups at the start of the trial.
It was an interesting symposium and it is always informative to hear about different methodological approaches to finding out what works. Based on this brief introduction to stepped wedge trials, though, I remain pretty unconvinced. I can’t help but see them as a capitulation to ethics committees, who, in my opinion, behave unethically when they insist that all participants in a trial must receive an intervention over which there is uncertainty. Stepped wedge trials also introduce a not inconsiderable risk of bias into what should be an unbiased form of comparison. My position may not be unique, as the final comment in the panel discussion that ended the symposium illustrates (regrettably, I can’t remember who said it). It was something like “when you can’t do any other kind of randomised trial then maybe you should consider doing a stepped wedge trial.”