Bias: how much difference does it really make in randomized trials?
Posted on January 21, 2019 by Giorgio Karam
Randomized controlled trials (RCTs) can be subject to different kinds of bias. Cochrane’s Risk of Bias tool, outlined in chapter 8 of the Cochrane Handbook (version 5.1), is a commonly used way to assess RCTs for bias. We can classify the biases listed by this tool in the following manner:
- Selection bias (due to inadequate sequence generation or inadequate allocation concealment)
- Performance bias (due to inadequate blinding of participants/clinicians)
- Detection bias (due to inadequate blinding of outcome assessors)
- Attrition bias (due to incomplete outcome data)
- Reporting bias (due to only selected outcomes being reported)
- Other forms of bias (e.g., imbalance in baseline characteristics)
These concerns were originally based on theory and anecdote. Starting inchoately in the 1980s, and primarily in the 1990s, researchers began comparing trials with and without these flaws to see how much the presence of bias changed the magnitude of the estimated effect. Such studies are known as meta-epidemiological studies. At present, the most comprehensive is the 2016 systematic review and meta-analysis of meta-epidemiological studies by Page et al. All results presented here come directly from this review’s meta-analyses or from studies cited therein. Note that results are presented either as a ratio of ratios (e.g., a ratio of odds ratios) or as a difference in standardized mean difference (difference in SMD). A ratio of ratios < 1, or a negative difference in SMD, indicates that bias caused the effect of treatment to be overestimated/exaggerated.
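A minimal sketch of how the two measures are formed and read for a single comparison. It assumes (as an illustration, not a claim about every study) that outcomes are coded so that an odds ratio below 1, or a negative SMD, favors the experimental treatment, and that each measure compares trials with a given flaw against trials without it. In practice these are estimated within each meta-analysis and then pooled. The function names and the example numbers are hypothetical.

```python
import math


def ratio_of_odds_ratios(or_flawed: float, or_sound: float) -> float:
    """Pooled OR in trials with the flaw divided by pooled OR in trials without it.

    Assumes outcomes are coded so that OR < 1 favors the experimental treatment.
    """
    return or_flawed / or_sound


def difference_in_smd(smd_flawed: float, smd_sound: float) -> float:
    """SMD in trials with the flaw minus SMD in trials without it.

    Assumes outcomes are coded so that a negative SMD favors the experimental treatment.
    """
    return smd_flawed - smd_sound


def direction(measure: float, null: float) -> str:
    """Read a ratio of ratios (null = 1) or a difference in SMD (null = 0)."""
    if math.isclose(measure, null, abs_tol=1e-12):
        return "no change"
    return "overestimated" if measure < null else "underestimated"


# Hypothetical pooled estimates, for illustration only
ror = ratio_of_odds_ratios(or_flawed=0.60, or_sound=0.70)
d_smd = difference_in_smd(smd_flawed=-0.50, smd_sound=-0.30)
print(f"ratio of odds ratios = {ror:.2f}: effect {direction(ror, 1.0)}")
print(f"difference in SMD = {d_smd:.2f}: effect {direction(d_smd, 0.0)}")
```

With these made-up inputs the flawed trials show a larger apparent benefit, so both measures fall on the "overestimated" side; a ratio of ratios above 1 would instead be read as an underestimation.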
Page et al. found that inadequate or unclear random sequence generation caused the effect of treatment to be overestimated (ratio of odds ratios 0.93, 95% confidence interval 0.86 to 0.99). Inadequate or unclear allocation concealment led to a similar overestimation (ratio of odds ratios 0.90, 95% confidence interval 0.84 to 0.97). For allocation concealment (but not for sequence generation), the bias was larger for subjective than for objective outcomes: the ratio of odds ratios was 0.80 (95% confidence interval 0.71 to 0.90) for subjective outcomes, whereas the effect for objective outcomes was not statistically significant (albeit with a wide 95% confidence interval of 0.84 to 1.15).
Note: to make these figures more applicable to continuous outcomes, divide the natural logarithm of the ratio, i.e., ln(ratio of odds ratios), by 1.814 to yield a difference in SMD.
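The conversion in the note can be written as a one-line function. The divisor 1.814 is π/√3 (about 1.8138), the standard deviation of the standard logistic distribution, which is the basis of this log-odds-to-SMD approximation. A minimal sketch, applied to the Page et al. selection-bias estimates quoted above; the function and constant names are hypothetical.

```python
import math

# pi / sqrt(3) ~= 1.8138: the logistic-distribution SD behind the 1.814 divisor
LOGIT_TO_SMD = math.pi / math.sqrt(3)


def ror_to_smd_difference(ratio_of_odds_ratios: float) -> float:
    """Approximate difference in SMD implied by a ratio of odds ratios."""
    return math.log(ratio_of_odds_ratios) / LOGIT_TO_SMD


# Ratio-of-odds-ratio estimates from Page et al. quoted in the text
estimates = {
    "sequence generation": 0.93,
    "allocation concealment": 0.90,
    "allocation concealment, subjective outcomes": 0.80,
}
for label, ror in estimates.items():
    print(f"{label}: difference in SMD ~ {ror_to_smd_difference(ror):.2f}")
```

These work out to roughly -0.04, -0.06 and -0.12 standard deviations; using the exact π/√3 rather than the rounded 1.814 makes no difference at two decimal places.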
The most rigorous evidence on performance bias due to lack of participant blinding comes from a 2014 meta-epidemiological study by Hróbjartsson and colleagues of trials in which patients were randomized to either a blinded or an unblinded arm. When patients were unblinded, the SMD was exaggerated by 0.56 standard deviations compared with when they were blinded (difference in SMD -0.56, 95% confidence interval -0.71 to -0.41). This is nearly the difference between what are conventionally considered small and large SMDs (0.2 and 0.8, respectively). The generalizability of these findings is limited, however, as all of these trials tested alternative therapies, mostly acupuncture. The acupuncture trials were more affected by bias (SMD overestimated by 0.63) than the non-acupuncture trials (SMD overestimated by 0.17). All outcomes were also subjective.
Another study, by Nüesch et al., which had a weaker design (it did not use studies where patients were randomized to a blinded or unblinded arm), found similar results for alternative therapy trials, but found much less exaggeration of subjective outcomes for non-alternative therapy trials (SMD difference -0.04, 95% confidence interval -0.10 to 0.18).
Page and colleagues also meta-analyzed three meta-epidemiological studies that focused mainly on binary outcomes, with a mix of subjective and objective outcomes. This suggested that inadequate or unclear blinding of participants may exaggerate treatment effects, although the evidence was weak, as the confidence interval included no effect (ratio of odds ratios 0.92, 95% confidence interval 0.81 to 1.04).
There was not enough evidence to make a statement about the importance of blinding clinicians.
The best available evidence on detection bias comes from three meta-epidemiological studies by Hróbjartsson et al., each focusing on a different type of outcome (continuous, binary, time-to-event). These studies are preferable because, instead of comparing trials with blinded outcome assessment against other trials without it, they compare blinded and unblinded assessment of the same outcome within trials that used both. For continuous outcomes, lack of assessor blinding exaggerated the SMD by 0.23 standard deviations (difference in SMD -0.23, 95% confidence interval -0.40 to -0.06), and the result was similar in trials where all patients received both blinded and unblinded assessment (difference in SMD -0.29, 95% confidence interval -0.49 to -0.09). For binary outcomes, the treatment effect was overestimated when assessment was not blinded (ratio of odds ratios 0.64, 95% confidence interval 0.43 to 0.96). Again, the result was similar in trials where all patients received both blinded and unblinded assessment (ratio of odds ratios 0.70, 95% confidence interval 0.52 to 0.96). For time-to-event outcomes, “typical” trials showed an overestimation of treatment effect (ratio of hazard ratios 0.73, 95% confidence interval 0.57 to 0.93), but a set of “atypical” trials, in which a new oral formulation of a drug was compared with its conventional IV formulation in CMV retinitis, appeared to show an underestimation of effect when assessment was not blinded (ratio of hazard ratios 1.33, 95% confidence interval 0.98 to 1.82; note that a ratio of ratios > 1 represents an underestimation). However, this last result should be interpreted with caution, as the split into “typical” and “atypical” trials was not prespecified.
Note that these three studies were only able to include subjective outcomes (e.g., disease severity score, progression of disease, or whether an injury had healed). Other studies included in Page et al. reported results for objective outcomes, but their confidence intervals are too wide to be informative.
Attrition, reporting, and other biases
There was little empirical evidence of the effects of attrition, reporting, or other kinds of biases. The definition of attrition bias varied considerably between meta-epidemiological studies, which may explain the heterogeneity in results found by Page et al. For the remaining biases, the meta-epidemiological studies that were performed were small and yielded results with wide confidence intervals.
The printed version of the Cochrane Handbook, published in 2008, states that “the evidence base remains incomplete” for the effect of bias on randomized trials. This is still true today, as these data have clear limitations.
First, the evidence is uneven across kinds of bias: attrition and reporting bias have very little data, and even for detection bias there are few data on objective outcomes. Second, most results showed at least moderate heterogeneity, so at best the figures are a rough approximation of the effect of bias. Third, newer meta-epidemiological studies have been published since the 2016 review by Page et al. (e.g., the 2018 ROBES study), so new data may change these figures. Finally, although these numbers show that bias generally leads to an overestimation of effect, bias may also favor a null effect or an underestimation of effect.
The direction of bias is not always easy to predict. For example, when allocation concealment is absent, the treatment arm may be filled disproportionately with sicker patients (because doctors want them to receive the newest therapy), which would lead to an underestimation of effect, or with healthier patients (because doctors want the trial to show that the therapy works), which would lead to an overestimation. For this reason, Hróbjartsson et al. (2012) wrote that “in any individual trial it is not possible to safely predict neither the direction nor the size of any bias. We would advise against using our pooled average as a simplistic correction factor.” On the other hand, the Cochrane Handbook suggests considering the likely direction and magnitude of bias. A reasonable compromise may be to use these figures only in trials where the likely direction of bias is relatively clear (e.g., lack of blinding in a trial of treatment vs. no treatment), and even then not to calculate the true, unbiased effect (which could only be determined by a trial at low risk of bias), but only as an aid to understanding how much bias could influence a trial.
To give one example of how to apply these figures: in the BATHE trial, children with atopic dermatitis were randomized to emollient bath additives or usual care, and the primary outcome was eczema severity assessed by parents/carers on the POEM scale. Since parents/carers knew which group the child was in, there is a clear risk of detection bias. Given that the outcome is subjective and the trial compares a common treatment against no additional treatment, it is reasonable to assume that any bias would favor treatment, as meta-epidemiological studies show it generally does. The mean POEM score over 16 weeks was 7.5 for treated children and 8.4 for untreated children (a lower score is better; the difference was not statistically significant), and the standard deviation in both groups was 6.0 points.
Above, we found that, on average, detection bias exaggerates subjective continuous outcomes by about 0.23 standard deviations. To express this on the POEM scale, we multiply it by the standard deviation of the POEM scores: 0.23 × 6.0 ≈ 1.4 points. Therefore, as a very rough approximation, the observed mean difference (8.4 – 7.5 = 0.9 points) may have been exaggerated by about 1.4 points simply because outcome assessment was not blinded. Because this bias would be expected to inflate the apparent benefit of the bath additives, accounting for it could only shrink the observed difference further, so detection bias is unlikely to threaten the trial authors’ conclusion that bath emollients had no appreciable benefit in treating atopic dermatitis.
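The arithmetic of this example can be sketched as follows, reusing the figures quoted above: a pooled exaggeration of 0.23 standard deviations (95% confidence interval 0.06 to 0.40 in absolute terms), a POEM standard deviation of 6.0 points, and an observed difference of 0.9 points. Subtracting the bias from the observed difference is done only to show how much room the bias leaves, not to estimate the true effect. The variable and function names are hypothetical.

```python
# Figures quoted in the text for the BATHE example
POEM_SD = 6.0                      # SD of POEM scores in each arm
observed_difference = 8.4 - 7.5    # usual care minus bath additives; lower POEM is better


def bias_in_points(exaggeration_sd: float, outcome_sd: float) -> float:
    """Express an exaggeration given in SD units on the outcome's own scale."""
    return exaggeration_sd * outcome_sd


# Pooled detection-bias exaggeration for subjective continuous outcomes (SD units)
exaggeration = {
    "point estimate": 0.23,
    "lower 95% limit": 0.06,
    "upper 95% limit": 0.40,
}

for label, sd_units in exaggeration.items():
    bias = bias_in_points(sd_units, POEM_SD)
    remaining = observed_difference - bias
    print(f"{label}: bias ~ {bias:.1f} points, "
          f"difference after removing it ~ {remaining:.1f} points")
```

At the point estimate, the possible bias (about 1.4 points) is larger than the 0.9-point observed difference; even at the lower confidence limit (about 0.4 points) it would account for a sizeable share of it. A negative remainder should not be read as evidence that usual care is better, only as a sign that the observed difference lies within what detection bias alone might produce.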