Julian PT Higgins, Jelena Savović, Matthew J Page, Roy G Elbers, Jonathan AC Sterne
- This chapter details version 2 of the Cochrane risk-of-bias tool for randomized trials (RoB 2), the recommended tool for use in Cochrane Reviews.
- RoB 2 is structured into a fixed set of domains of bias, focusing on different aspects of trial design, conduct and reporting.
- Each assessment using the RoB 2 tool focuses on a specific result from a randomized trial.
- Within each domain, a series of questions (‘signalling questions’) aim to elicit information about features of the trial that are relevant to risk of bias.
- A judgement about the risk of bias arising from each domain is proposed by an algorithm, based on answers to the signalling questions. Judgements can be ‘Low’ or ‘High’ risk of bias, or can express ‘Some concerns’.
- Answers to signalling questions and judgements about risk of bias should be supported by written justifications.
- The overall risk of bias for the result is the least favourable assessment across the domains of bias. Both the proposed domain-level and overall risk-of-bias judgements can be overridden by the review authors, with justification.
Cite this chapter as: Higgins JPT, Savović J, Page MJ, Elbers RG, Sterne JAC. Chapter 8: Assessing risk of bias in a randomized trial. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.1 (updated September 2020). Cochrane, 2020. Available from www.training.cochrane.org/handbook.
Cochrane Reviews include an assessment of the risk of bias in each included study (see Chapter 7 for a general discussion of this topic). When randomized trials are included, the recommended tool is the revised version of the Cochrane tool, known as RoB 2, described in this chapter. The RoB 2 tool provides a framework for assessing the risk of bias in a single result (an estimate of the effect of an experimental intervention compared with a comparator intervention on a particular outcome) from any type of randomized trial.
The RoB 2 tool is structured into domains through which bias might be introduced into the result. These domains were identified based on both empirical evidence and theoretical considerations. This chapter summarizes the main features of RoB 2 applied to individually randomized parallel-group trials. It describes the process of undertaking an assessment using the RoB 2 tool, summarizes the important issues for each domain of bias, and ends with a list of the key differences between RoB 2 and the earlier version of the tool. Variants of the RoB 2 tool specific to cluster-randomized trials and crossover trials are summarized in Chapter 23.
The full guidance document for the RoB 2 tool is available at www.riskofbias.info: it summarizes the empirical evidence underlying the tool and provides detailed explanations of the concepts covered and guidance on implementation.
8.2 Overview of RoB 2
8.2.1 Selecting which results to assess within the review
Before starting an assessment of risk of bias, authors will need to select which specific results from the included trials to assess. Because trials usually contribute multiple results to a systematic review, several risk-of-bias assessments may be needed for each trial, although it is unlikely to be feasible to assess every result for every trial in the review. It is important not to select results to assess based on the likely judgements arising from the assessment. Focusing on the main outcomes of the review (the results contributing to the review’s ‘Summary of findings’ table) may be the most appropriate approach (see also Chapter 7, Section 7.3.2).
8.2.2 Specifying the nature of the effect of interest: ‘intention-to-treat’ effects versus ‘per-protocol’ effects
Assessments for one of the RoB 2 domains, ‘Bias due to deviations from intended interventions’, differ according to whether review authors are interested in quantifying:
- the effect of assignment to the interventions at baseline, regardless of whether the interventions are received as intended (the ‘intention-to-treat effect’); or
- the effect of adhering to the interventions as specified in the trial protocol (the ‘per-protocol effect’) (Hernán and Robins 2017).
If some patients do not receive their assigned intervention or deviate from the assigned intervention after baseline, these effects will differ, and will each be of interest. For example, the estimated effect of assignment to intervention would be the most appropriate to inform a health policy question about whether to recommend an intervention in a particular health system (e.g. whether to instigate a screening programme, or whether to prescribe a new cholesterol-lowering drug), whereas the estimated effect of adhering to the intervention as specified in the trial protocol would be the most appropriate to inform a care decision by an individual patient (e.g. whether to be screened, or whether to take the new drug). Review authors should define the intervention effect in which they are interested, and apply the risk-of-bias tool appropriately to this effect.
The effect of principal interest should be specified in the review protocol: most systematic reviews are likely to address the question of assignment rather than adherence to intervention. On occasion, review authors may be interested in both effects.
The effect of assignment to intervention should be estimated by an intention-to-treat (ITT) analysis that includes all randomized participants (Fergusson et al 2002). The principles of ITT analyses are (Piantadosi 2005, Meinert 2012):
- analyse participants in the intervention groups to which they were randomized, regardless of the interventions they actually received; and
- include all randomized participants in the analysis, which requires measuring all participants’ outcomes.
An ITT analysis maintains the benefit of randomization: that, on average, the intervention groups do not differ at baseline with respect to measured or unmeasured prognostic factors. Note that the term ‘intention-to-treat’ does not have a consistent definition and is used inconsistently in study reports (Hollis and Campbell 1999, Gravel et al 2007, Bell et al 2014).
Patients and other stakeholders are often interested in the effect of adhering to the intervention as described in the trial protocol (the ‘per-protocol effect’), because it relates most closely to the implications of their choice between the interventions. However, two approaches to estimation of per-protocol effects that are commonly used in randomized trials may be seriously biased. These are:
- ‘as-treated’ analyses in which participants are analysed according to the intervention they actually received, even if their randomized allocation was to a different treatment group; and
- naïve ‘per-protocol’ analyses restricted to individuals who adhered to their assigned interventions.
Each of these analyses is problematic because prognostic factors may influence whether individuals adhere to their assigned intervention. If deviations are present, it is still possible to use data from a randomized trial to derive an unbiased estimate of the effect of adhering to intervention (Hernán and Robins 2017). However, appropriate methods require strong assumptions and published applications of such methods are relatively rare to date. When authors wish to assess the risk of bias in the estimated effect of adhering to intervention, use of results based on modern statistical methods may be at lower risk of bias than results based on ‘as-treated’ or naïve per-protocol analyses.
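The problem with naïve ‘per-protocol’ analyses can be illustrated with a small simulation. The sketch below is purely hypothetical: it assumes a true intervention effect of zero, a single prognostic factor, an outcome for which higher values are better, and an adherence rule (only participants with better-than-average prognosis adhere in the experimental group) chosen for illustration. None of these values come from a real trial.

```python
import random
from statistics import mean


def simulate(n_per_arm=20000, seed=0):
    """Compare an ITT estimate with a naive per-protocol estimate.

    Hypothetical set-up: the intervention has NO true effect, so any
    non-zero estimate reflects bias or chance.
    """
    rng = random.Random(seed)

    def participant():
        prognosis = rng.gauss(0, 1)            # prognostic factor
        outcome = prognosis + rng.gauss(0, 1)  # outcome depends on prognosis only
        return prognosis, outcome

    experimental = [participant() for _ in range(n_per_arm)]
    comparator = [participant() for _ in range(n_per_arm)]
    comparator_mean = mean(o for _, o in comparator)

    # ITT: analyse everyone in the group to which they were randomized.
    itt = mean(o for _, o in experimental) - comparator_mean

    # Naive per-protocol: keep only adherers in the experimental group.
    # Assumed adherence rule: participants with good prognosis adhere.
    adherers = [o for p, o in experimental if p > 0]
    naive_pp = mean(adherers) - comparator_mean
    return itt, naive_pp


itt_estimate, naive_pp_estimate = simulate()
print(f"ITT estimate: {itt_estimate:.3f}")
print(f"Naive per-protocol estimate: {naive_pp_estimate:.3f}")
```

Under these assumptions the ITT estimate stays close to zero, while the naïve per-protocol estimate suggests a benefit that arises only because adherers had a better prognosis.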
Trial authors often estimate the effect of intervention using more than one approach. They may not explain the reasons for their choice of analysis approach, or whether their aim is to estimate the effect of assignment or adherence to intervention. We recommend that when the effect of interest is that of assignment to intervention, the trial result included in meta-analyses, and assessed for risk of bias, should be chosen according to the following order of preference:
- the result corresponding to a full ITT analysis, as defined above;
- the result corresponding to an analysis (sometimes described as a ‘modified intention-to-treat’ (mITT) analysis) that adheres to ITT principles except that participants with missing outcome data are excluded (see Section 8.4.4; such an analysis does not prevent bias due to missing outcome data, which is addressed in the corresponding domain of the risk-of-bias assessment);
- a result corresponding to an ‘as-treated’ or naïve ‘per-protocol’ analysis, or an analysis from which eligible trial participants were excluded.
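The order of preference above can be sketched as a simple selection rule. This is an illustrative sketch only: the analysis labels, the `TrialResult` class and the `preferred_result` function are hypothetical names introduced here, not part of RoB 2.

```python
from dataclasses import dataclass

# Lower rank = more preferred when the effect of interest is the effect
# of assignment to intervention. The labels are hypothetical.
PREFERENCE = {
    "itt": 0,                           # full ITT analysis
    "mitt": 1,                          # ITT principles, missing outcomes excluded
    "as_treated": 2,
    "naive_per_protocol": 2,
    "post_randomization_exclusion": 2,  # eligible participants excluded
}


@dataclass
class TrialResult:
    analysis: str    # one of the keys of PREFERENCE
    estimate: float  # reported effect estimate


def preferred_result(results):
    """Return the reported result highest in the order of preference."""
    known = [r for r in results if r.analysis in PREFERENCE]
    if not known:
        raise ValueError("no result with a recognized analysis type")
    return min(known, key=lambda r: PREFERENCE[r.analysis])


reported = [TrialResult("naive_per_protocol", 0.70), TrialResult("mitt", 0.82)]
print(preferred_result(reported).analysis)
```

The selected result is still assessed for risk of bias in the usual way; the rule only determines which reported analysis is taken forward.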
8.2.3 Domains of bias and how they are addressed
The domains included in RoB 2 cover all types of bias that are currently understood to affect the results of randomized trials. These are:
- bias arising from the randomization process;
- bias due to deviations from intended interventions;
- bias due to missing outcome data;
- bias in measurement of the outcome; and
- bias in selection of the reported result.
Each domain is required, and no additional domains should be added. Table 8.2.a summarizes the issues addressed within each bias domain.
For each domain, the tool comprises:
- a series of ‘signalling questions’;
- a judgement about risk of bias for the domain, which is facilitated by an algorithm that maps responses to the signalling questions to a proposed judgement;
- free text boxes to justify responses to the signalling questions and risk-of-bias judgements; and
- an option to predict (and explain) the likely direction of bias.
The signalling questions aim to provide a structured approach to eliciting information relevant to an assessment of risk of bias. They seek to be reasonably factual in nature, but some may require a degree of judgement. The response options are:
- Yes;
- Probably yes;
- Probably no;
- No; and
- No information.
To maximize their simplicity and clarity, the signalling questions are phrased such that a response of ‘Yes’ may indicate either a low or high risk of bias, depending on the most natural way to ask the question. Responses of ‘Yes’ and ‘Probably yes’ have the same implications for risk of bias, as do responses of ‘No’ and ‘Probably no’. The definitive responses (‘Yes’ and ‘No’) would typically imply that firm evidence is available in relation to the signalling question; the ‘Probably’ versions would typically imply that a judgement has been made. Although not required, if review authors wish to calculate measures of agreement (e.g. kappa statistics) for the answers to the signalling questions, we recommend treating ‘Yes’ and ‘Probably yes’ as the same response, and ‘No’ and ‘Probably no’ as the same response.
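For review authors who do wish to calculate agreement, the recommended collapsing of responses can be sketched as follows. This is a minimal illustration using an unweighted Cohen's kappa for two assessors; the function name and the decision to keep ‘No information’ and ‘Not applicable’ as separate categories are assumptions made here, not part of the tool.

```python
from collections import Counter

# 'Yes'/'Probably yes' and 'No'/'Probably no' are treated as the same
# response. Keeping the remaining options as their own categories is an
# assumption of this sketch.
COLLAPSE = {
    "Yes": "Y/PY",
    "Probably yes": "Y/PY",
    "No": "N/PN",
    "Probably no": "N/PN",
    "No information": "NI",
    "Not applicable": "NA",
}


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa on collapsed signalling-question answers."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty answer lists of equal length")
    a = [COLLAPSE[x] for x in rater_a]
    b = [COLLAPSE[x] for x in rater_b]
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both assessors used one identical category throughout
    return (observed - expected) / (1 - expected)


first = ["Yes", "Probably yes", "No", "No information"]
second = ["Probably yes", "Yes", "Probably no", "No information"]
print(cohen_kappa(first, second))
```

In this example the two assessors differ only in their use of the ‘Probably’ versions, so they are in complete agreement once the responses are collapsed.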
The ‘No information’ response should be used only when both (1) insufficient details are reported to permit a response of ‘Yes’, ‘Probably yes’, ‘No’ or ‘Probably no’, and (2) in the absence of these details it would be unreasonable to respond ‘Probably yes’ or ‘Probably no’ given the circumstances of the trial. For example, in the context of a large trial run by an experienced clinical trials unit for regulatory purposes, if specific information about the randomization methods is absent, it may still be reasonable to respond ‘Probably yes’ rather than ‘No information’ to the signalling question about allocation sequence concealment.
The implications of a ‘No information’ response to a signalling question differ according to the purpose of the question. If the question seeks to identify evidence of a problem, then ‘No information’ corresponds to no evidence of that problem. If the question relates to an item that is expected to be reported (such as whether any participants were lost to follow-up), then the absence of information leads to concerns about there being a problem.
A response option ‘Not applicable’ is available for signalling questions that are answered only if the response to a previous question implies that they are required.
Signalling questions should be answered independently: the answer to one question should not affect answers to other questions in the same or other domains other than through determining which subsequent questions are answered.
Once the signalling questions are answered, the next step is to reach a risk-of-bias judgement, and assign one of three levels to each domain:
- Low risk of bias;
- Some concerns; or
- High risk of bias.
The RoB 2 tool includes algorithms that map responses to signalling questions to a proposed risk-of-bias judgement for each domain (see the full documentation at www.riskofbias.info for details). The algorithms include specific mappings of each possible combination of responses to the signalling questions (including responses of ‘No information’) to judgements of low risk of bias, some concerns or high risk of bias.
Use of the word ‘judgement’ is important for the risk-of-bias assessment. The algorithms provide proposed judgements, but review authors should verify these and change them if they feel this is appropriate. In reaching final judgements, review authors should interpret ‘risk of bias’ as ‘risk of material bias’. That is, concerns should be expressed only about issues that are likely to affect the ability to draw reliable conclusions from the study.
A free text box alongside the signalling questions and judgements provides space for review authors to present supporting information for each response. In some instances, when the same information is likely to be used to answer more than one question, one text box covers more than one signalling question. Brief, direct quotations from the text of the study report should be used whenever possible. It is important that reasons are provided for any judgements that do not follow the algorithms. The tool also provides space to indicate all the sources of information about the study obtained to inform the judgements (e.g. published papers, trial registry entries, additional information from the study authors).
RoB 2 includes optional judgements of the direction of the bias for each domain and overall. For some domains, the bias is most easily thought of as being towards or away from the null. For example, high levels of switching of participants from their assigned intervention to the other intervention may have the effect of reducing the observed difference between the groups, leading to the estimated effect of adhering to intervention (see Section 8.2.2) being biased towards the null. For other domains, the bias is likely to favour one of the interventions being compared, implying an increase or decrease in the effect estimate depending on which intervention is favoured. Examples include manipulation of the randomization process, awareness of interventions received influencing the outcome assessment and selective reporting of results. If review authors do not have a clear rationale for judging the likely direction of the bias, they should not guess it and can leave this response blank.
Table 8.2.a Bias domains included in version 2 of the Cochrane risk-of-bias tool for randomized trials, with a summary of the issues addressed
Bias arising from the randomization process
Bias due to deviations from intended interventions
When the review authors’ interest is in the effect of assignment to intervention (see Section 8.2.2):
When the review authors’ interest is in the effect of adhering to intervention (see Section 8.2.2):
Bias due to missing outcome data
Bias in measurement of the outcome
Bias in selection of the reported result
* For the precise wording of signalling questions and guidance for answering each one, see the full risk-of-bias tool at www.riskofbias.info.
8.2.4 Reaching an overall risk-of-bias judgement for a result
The response options for an overall risk-of-bias judgement are the same as for individual domains. Table 8.2.b shows the approach to mapping risk-of-bias judgements within domains to an overall judgement for the outcome.
Judging a result to be at a particular level of risk of bias for an individual domain implies that the result has an overall risk of bias at least this severe. Therefore, a judgement of ‘High’ risk of bias within any domain should have similar implications for the result, irrespective of which domain is being assessed. In practice this means that if the answers to the signalling questions yield a proposed judgement of ‘High’ risk of bias, the assessors should consider whether any identified problems are of sufficient concern to warrant this judgement for that result overall. If this is not the case, the appropriate action would be to override the proposed default judgement and provide justification. ‘Some concerns’ in multiple domains may lead review authors to decide on an overall judgement of ‘High’ risk of bias for that result or group of results.
Once an overall judgement has been reached for an individual trial result, this information will need to be presented in the review and reflected in the analysis and conclusions. For discussion of the presentation of risk-of-bias assessments and how they can be incorporated into analyses, see Chapter 7. Risk-of-bias assessments also feed into one domain of the GRADE approach for assessing certainty of a body of evidence, as discussed in Chapter 14.
Table 8.2.b Reaching an overall risk-of-bias judgement for a specific outcome
Overall risk-of-bias judgement: Criteria
Low risk of bias: The trial is judged to be at low risk of bias for all domains for this result.
Some concerns: The trial is judged to raise some concerns in at least one domain for this result, but not to be at high risk of bias for any domain.
High risk of bias: The trial is judged to be at high risk of bias in at least one domain for this result; or the trial is judged to have some concerns for multiple domains in a way that substantially lowers confidence in the result.
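The criteria in Table 8.2.b can be sketched as a small function. This is an illustration rather than the tool's own implementation: the function name and the boolean flag are hypothetical, and whether multiple ‘Some concerns’ judgements substantially lower confidence in the result remains a decision for the review authors, which the flag merely records.

```python
JUDGEMENTS = ("Low", "Some concerns", "High")


def overall_judgement(domain_judgements, concerns_substantially_lower_confidence=False):
    """Map domain-level judgements to a proposed overall judgement.

    domain_judgements: dict of domain name -> 'Low', 'Some concerns' or 'High'.
    concerns_substantially_lower_confidence: the review authors' own decision
        that 'Some concerns' in multiple domains warrants an overall 'High'.
    """
    values = list(domain_judgements.values())
    unknown = [v for v in values if v not in JUDGEMENTS]
    if not values or unknown:
        raise ValueError(f"invalid domain judgements: {unknown or 'none given'}")
    if "High" in values:
        return "High"
    n_concerns = values.count("Some concerns")
    if n_concerns == 0:
        return "Low"
    if n_concerns > 1 and concerns_substantially_lower_confidence:
        return "High"
    return "Some concerns"


example = {
    "randomization process": "Low",
    "deviations from intended interventions": "Some concerns",
    "missing outcome data": "Some concerns",
    "measurement of the outcome": "Low",
    "selection of the reported result": "Low",
}
print(overall_judgement(example))
print(overall_judgement(example, concerns_substantially_lower_confidence=True))
```

As with the domain-level algorithms, the returned value is a proposal that review authors can override with justification.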
8.3 Bias arising from the randomization process
If successfully accomplished, randomization avoids the influence of either known or unknown prognostic factors (factors that predict the outcome, such as severity of illness or presence of comorbidities) on the assignment of individual participants to intervention groups. This means that, on average, each intervention group has the same prognosis before the start of intervention. If prognostic factors influence the intervention group to which participants are assigned then the estimated effect of intervention will be biased by ‘confounding’, which occurs when there are common causes of intervention group assignment and outcome. Confounding is an important potential cause of bias in intervention effect estimates from observational studies, because treatment decisions in routine care are often influenced by prognostic factors.
To randomize participants into a study, an allocation sequence that specifies how participants will be assigned to interventions is generated, based on a process that includes an element of chance. We call this allocation sequence generation. Subsequently, steps must be taken to prevent participants or trial personnel from knowing the forthcoming allocations until after recruitment has been confirmed. This process is often termed allocation sequence concealment.
Knowledge of the next assignment (e.g. if the sequence is openly posted on a bulletin board) can enable selective enrolment of participants on the basis of prognostic factors. Participants who would have been assigned to an intervention deemed to be ‘inappropriate’ may be rejected. Other participants may be directed to the ‘appropriate’ intervention, which can be accomplished by delaying their entry into the trial until the desired allocation appears. For this reason, successful allocation sequence concealment is a vital part of randomization.
Some review authors confuse allocation sequence concealment with blinding of assigned interventions during the trial. Allocation sequence concealment seeks to prevent bias in intervention assignment by preventing trial personnel and participants from knowing the allocation sequence before and until assignment. It can always be successfully implemented, regardless of the study design or clinical area (Schulz et al 1995, Jüni et al 2001). In contrast, blinding seeks to prevent bias after assignment (Jüni et al 2001, Schulz et al 2002) and cannot always be implemented. This is often the situation, for example, in trials comparing surgical with non-surgical interventions.
8.3.1 Approaches to sequence generation
Randomization with no constraints is called simple randomization or unrestricted randomization. Sometimes blocked randomization (restricted randomization) is used to ensure that the desired ratio of participants in the experimental and comparator intervention groups (e.g. 1:1) is achieved (Schulz and Grimes 2002, Schulz and Grimes 2006). This is done by ensuring that the numbers of participants assigned to each intervention group are balanced within blocks of specified size (e.g. for every 10 consecutively entered participants): the specified number of allocations to experimental and comparator intervention groups is assigned in random order within each block. If the block size is known to trial personnel and the intervention group is revealed after assignment, then the last allocation within each block can always be predicted. To avoid this problem multiple block sizes may be used, and randomly varied (random permuted blocks).
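Random permuted blocks can be sketched in a few lines. The function name, the default block sizes and the 1:1 ratio below are illustrative assumptions; a real trial would generate and conceal its sequence using validated randomization software.

```python
import random


def permuted_block_sequence(n_participants, block_sizes=(4, 6, 8),
                            arms=("experimental", "comparator"), seed=None):
    """Generate a 1:1 allocation sequence using randomly varied block sizes."""
    if any(size % len(arms) for size in block_sizes):
        raise ValueError("each block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        size = rng.choice(block_sizes)              # block size varies at random
        block = list(arms) * (size // len(arms))    # equal numbers per arm
        rng.shuffle(block)                          # random order within the block
        sequence.extend(block)
    # The final block may be cut short if recruitment stops mid-block.
    return sequence[:n_participants]


allocations = permuted_block_sequence(20, seed=2020)
print(allocations.count("experimental"), allocations.count("comparator"))
```

Because the block size changes unpredictably, knowing earlier allocations no longer reveals with certainty where a block ends, which is the motivation for varying block sizes.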
Stratified randomization, in which randomization is performed separately within subsets of participants defined by potentially important prognostic factors, such as disease severity and study centres, is also common. In practice, stratified randomization is usually performed together with blocked randomization. The purpose of combining these two procedures is to ensure that experimental and comparator groups are similar with respect to the specified prognostic factors other than intervention. If simple (rather than blocked) randomization is used in each stratum, then stratification offers no benefit, but the randomization is still valid.
Another approach that incorporates both general concepts of stratification and restricted randomization is minimization. Minimization algorithms assign the next intervention in a way that achieves the best balance between intervention groups in relation to a specified set of prognostic factors. Minimization generally includes a random element (at least for participants enrolled when the groups are balanced with respect to the prognostic factors included in the algorithm) and should be implemented along with clear strategies for allocation sequence concealment. Some methodologists are cautious about the acceptability of minimization, while others consider it to be an attractive approach (Brown et al 2005, Clark et al 2016).
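A minimal sketch of a minimization step is given below. The chapter does not specify an imbalance measure, so the range-based score, the probability `p_best` of following the best-balancing assignment, and all names are assumptions made for illustration.

```python
import random


def minimize_assign(new_factors, allocated, arms=("experimental", "comparator"),
                    p_best=0.8, rng=random):
    """Choose an arm for a new participant by minimization.

    new_factors: dict of prognostic factor -> level for the new participant.
    allocated: list of (factors_dict, arm) for participants already enrolled.
    p_best: probability of taking the arm that minimizes imbalance
        (the random element; hypothetical default).
    """
    def imbalance_if(arm):
        # Sum, over factors, of the range of counts across arms among
        # participants sharing the new participant's level of that factor.
        total = 0
        for factor, level in new_factors.items():
            counts = {a: 0 for a in arms}
            for factors, assigned in allocated:
                if factors.get(factor) == level:
                    counts[assigned] += 1
            counts[arm] += 1
            total += max(counts.values()) - min(counts.values())
        return total

    scores = {arm: imbalance_if(arm) for arm in arms}
    best = min(scores.values())
    best_arms = [a for a in arms if scores[a] == best]
    if len(best_arms) == len(arms):
        return rng.choice(list(arms))  # groups balanced: assign purely at random
    if rng.random() < p_best:
        return rng.choice(best_arms)
    return rng.choice([a for a in arms if a not in best_arms])


enrolled = [({"severity": "high", "centre": "A"}, "experimental")] * 2
print(minimize_assign({"severity": "high", "centre": "A"}, enrolled, p_best=1.0))
```

With `p_best=1.0` the assignment is deterministic whenever the groups are unbalanced, which makes it predictable to anyone who knows the previous allocations; values below 1 retain a random element at the cost of slightly worse balance.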
8.3.2 Allocation sequence concealment and failures of randomization
If future assignments can be anticipated, leading to a failure of allocation sequence concealment, then bias can arise through selective enrolment of participants into a study, depending on their prognostic factors. Ways in which this can happen include:
- knowledge of a deterministic assignment rule, such as by alternation, date of birth or day of admission;
- knowledge of the sequence of assignments, whether randomized or not (e.g. if a sequence of random assignments is posted on the wall); and
- ability to predict assignments successfully, based on previous assignments.
The last of these can occur when blocked randomization is used and assignments are known to the recruiter after each participant is enrolled into the trial. It may then be possible to predict future assignments for some participants, particularly when blocks are of a fixed size and are not divided across multiple recruitment centres (Berger 2005).
Attempts to achieve allocation sequence concealment may be undermined in practice. For example, unsealed allocation envelopes may be opened, while translucent envelopes may be held against a bright light to reveal the contents (Schulz et al 1995, Schulz 1995, Jüni et al 2001). Personal accounts suggest that many allocation schemes have been deduced by investigators because the methods of concealment were inadequate (Schulz 1995).
The success of randomization in producing comparable groups is often examined by comparing baseline values of important prognostic factors between intervention groups. Corbett and colleagues have argued that risk-of-bias assessments should consider whether participant characteristics are balanced between intervention groups (Corbett et al 2014). The RoB 2 tool includes consideration of situations in which baseline characteristics indicate that something may have gone wrong with the randomization process. It is important that baseline imbalances that are consistent with chance are not interpreted as evidence of risk of bias. Chance imbalances are not a source of systematic bias, and the RoB 2 tool does not aim to identify imbalances in baseline variables that have arisen due to chance.
8.4 Bias due to deviations from intended interventions
This domain relates to biases that arise when there are deviations from the intended interventions. Such deviations could be the administration of additional interventions that are inconsistent with the trial protocol, failure to implement the protocol interventions as intended, or non-adherence by trial participants to their assigned intervention. Biases that arise due to deviations from intended interventions are sometimes referred to as performance biases.
The intended interventions are those specified in the trial protocol. It is often intended that interventions should change or evolve in response to the health of, or events experienced by, trial participants. For example, the investigators may intend that:
- in a trial of a new drug to control symptoms of rheumatoid arthritis, participants experiencing severe toxicities should receive additional care and/or switch to an alternative drug;
- in a trial of a specified cancer drug regimen, participants whose cancer progresses should switch to a second-line intervention; or
- in a trial comparing surgical intervention with conservative management of stable angina, participants who progress to unstable angina receive surgical intervention.
Unfortunately, trial protocols may not fully specify the circumstances in which deviations from the initial intervention should occur, or distinguish changes to intervention that are consistent with the intentions of the investigators from those that should be considered as deviations from the intended intervention. For example, a cancer trial protocol may not define progression, or specify the second-line drug that should be used in patients who progress (Hernán and Scharfstein 2018). It may therefore be necessary for review authors to document changes that are and are not considered to be deviations from intended intervention. Similarly, for trials in which the comparator intervention is ‘usual care’, the protocol may not specify interventions consistent with usual care or whether they are expected to be used alongside the experimental intervention. Review authors may therefore need to document what departures from usual care will be considered as deviations from intended intervention.
8.4.1 Non-protocol interventions
Non-protocol interventions that trial participants might receive during trial follow up and that are likely to affect the outcome of interest can lead to bias in estimated intervention effects. If possible, review authors should specify potential non-protocol interventions in advance (at review protocol writing stage). Non-protocol interventions may be identified through the expert knowledge of members of the review group, via reviews of the literature, and through discussions with health professionals.
8.4.2 The role of the effect of interest
As described in Section 8.2.2, assessments for this domain depend on the effect of interest. In RoB 2, the only deviations from the intended intervention that are addressed in relation to the effect of assignment to the intervention are those that:
- are inconsistent with the trial protocol;
- arise because of the experimental context; and
- influence the outcome.
For example, in an unblinded study participants may feel unlucky to have been assigned to the comparator group and therefore seek the experimental intervention, or other interventions that improve their prognosis. Similarly, monitoring patients randomized to a novel intervention more frequently than those randomized to standard care would increase the risk of bias, unless such monitoring was an intended part of the novel intervention. Deviations from intervention that do not arise because of the experimental context, such as a patient’s choice to stop taking their assigned medication, are not addressed in relation to the effect of assignment to intervention.
To examine the effect of adhering to the interventions as specified in the trial protocol, it is important to specify what types of deviations from the intended intervention will be examined. These will be one or more of:
- how well the intervention was implemented;
- how well participants adhered to the intervention (without discontinuing or switching to another intervention); and
- whether non-protocol interventions were received alongside the intended intervention and (if so) whether they were balanced across intervention groups.
If such deviations are present, review authors should consider whether appropriate statistical methods were used to adjust for their effects.
8.4.3 The role of blinding
Bias due to deviations from intended interventions can sometimes be reduced or avoided by implementing mechanisms that ensure the participants, carers and trial personnel (i.e. people delivering the interventions) are unaware of the interventions received. This is commonly referred to as ‘blinding’, although in some areas (including eye health) the term ‘masking’ is preferred. Blinding, if successful, should prevent knowledge of the intervention assignment from influencing contamination (application of one of the interventions in participants intended to receive the other), switches to non-protocol interventions or non-adherence by trial participants.
Trial reports often describe blinding in broad terms, such as ‘double blind’. This term makes it difficult to know who was blinded (Schulz et al 2002). Such terms are also used inconsistently (Haahr and Hróbjartsson 2006). A review of methods used for blinding highlights the variety of methods used in practice (Boutron et al 2006).
Blinding during a trial can be difficult or impossible in some contexts, for example in a trial comparing a surgical with a non-surgical intervention. Non-blinded (‘open’) trials may take other measures to avoid deviations from intended intervention, such as treating patients according to strict criteria that prevent administration of non-protocol interventions.
Lack of blinding of participants, carers or people delivering the interventions may cause bias if it leads to deviations from intended interventions. For example, low expectations of improvement among participants in the comparator group may lead them to seek and receive the experimental intervention. Such deviations from intended intervention that arise due to the experimental context can lead to bias in the estimated effects of both assignment to intervention and of adhering to intervention.
An attempt to blind participants, carers and people delivering the interventions to intervention group does not ensure successful blinding in practice. For many blinded drug trials, the side effects of the drugs allow the possible detection of the intervention being received for some participants, unless the study compares similar interventions, for example drugs with similar side effects, or uses an active placebo (Boutron et al 2006, Bello et al 2017, Jensen et al 2017).
Deducing the intervention received, for example among participants experiencing side effects that are specific to the experimental intervention, does not in itself lead to a risk of bias. As discussed, cessation of a drug intervention because of toxicity will usually not be considered a deviation from intended intervention. See the elaborations that accompany the signalling questions in the full guidance at www.riskofbias.info for further discussion of this issue.
Risk of bias in this domain may differ between outcomes, even if the same people were aware of intervention assignments during the trial. For example, knowledge of the assigned intervention may affect behaviour (such as number of clinic visits), while not having an important impact on physiology (including risk of mortality).
Blinding of outcome assessors, to avoid bias in measuring the outcome, is considered separately, in the ‘Bias in measurement of outcomes’ domain. Bias due to differential rates of dropout (withdrawal from the study) is considered in the ‘Bias due to missing outcome data’ domain.
8.4.4 Appropriate analyses
For the effect of assignment to intervention, an appropriate analysis should follow the principles of ITT (see Section 8.2.2). Some authors may report a ‘modified intention-to-treat’ (mITT) analysis in which participants with missing outcome data are excluded. Such an analysis may be biased because of the missing outcome data: this is addressed in the domain ‘Bias due to missing outcome data’. Note that the phrase ‘modified intention-to-treat’ is used in different ways, and may refer to inclusion of participants who received at least one dose of treatment (Abraha and Montedori 2010); our use of the term refers to missing data rather than to adherence to intervention.
Inappropriate analyses include ‘as-treated’ analyses, naïve ‘per-protocol’ analyses, and other analyses based on post-randomization exclusion of eligible trial participants on whom outcomes were measured (Hernán and Hernandez-Diaz 2012) (see also Section 8.2.2).
For the effect of adhering to intervention, appropriate analysis approaches are described by Hernán and Robins (Hernán and Robins 2017). Instrumental variable approaches can be used in some circumstances to estimate the effect of intervention among participants who received the assigned intervention.
8.5 Bias due to missing outcome data
Missing measurements of the outcome may lead to bias in the intervention effect estimate. Possible reasons for missing outcome data include (National Research Council 2010):
- participants withdraw from the study or cannot be located (‘loss to follow-up’ or ‘dropout’);
- participants do not attend a study visit at which outcomes should have been measured;
- participants attend a study visit but do not provide relevant data;
- data or records are lost or are unavailable for other reasons; and
- participants can no longer experience the outcome, for example because they have died.
This domain addresses risk of bias due to missing outcome data, including biases introduced by procedures used to impute, or otherwise account for, the missing outcome data.
Some participants may be excluded from an analysis for reasons other than missing outcome data. In particular, a naïve ‘per-protocol’ analysis is restricted to participants who received the intended intervention. Potential bias introduced by such analyses, or by other exclusions of eligible participants for whom outcome data are available, is addressed in the domain ‘Bias due to deviations from intended interventions’ (see Section 8.4).
The ITT principle of measuring outcome data on all participants (see Section 8.2.2) is frequently difficult or impossible to achieve in practice. Therefore, it can often only be followed by making assumptions about the missing outcome values. Even when an analysis is described as ITT, it may exclude participants with missing outcome data and be at risk of bias (such analyses may be described as ‘modified intention-to-treat’ (mITT) analyses). Therefore, assessments of risk of bias due to missing outcome data should be based on the issues addressed in the signalling questions for this domain, and not on the way that trial authors described the analysis.
8.5.1 When do missing outcome data lead to bias?
Analyses excluding individuals with missing outcome data are examples of ‘complete-case’ analyses (analyses restricted to individuals in whom there were no missing values of included variables). To understand when missing outcome data lead to bias in such analyses, we need to consider:
- the true value of the outcome in participants with missing outcome data: this is the value of the outcome that should have been measured but was not; and
- the missingness mechanism, which is the process that led to outcome data being missing.
Whether missing outcome data lead to bias in complete-case analyses depends on whether the missingness mechanism is related to the true value of the outcome. Equivalently, we can consider whether the measured (non-missing) outcomes differ systematically from the missing outcomes (the true values in participants with missing outcome data). For example, consider a trial of cognitive behavioural therapy compared with usual care for depression. If participants who are more depressed are less likely to return for follow-up, then whether a measurement of depression is missing depends on its true value, which implies that the measured depression outcomes will differ systematically from the true values of the missing depression outcomes.
The specific situations in which a complete-case analysis suffers from bias (when there are missing data) are discussed in detail in the full guidance for the RoB 2 tool at www.riskofbias.info. In brief:
- missing outcome data will not lead to bias if missingness in the outcome is unrelated to its true value, within each intervention group;
- missing outcome data will lead to bias if missingness in the outcome depends on both the intervention group and the true value of the outcome; and
- missing outcome data will often lead to bias if missingness in the outcome is related to its true value and, additionally, the effect of the experimental intervention differs from that of the comparator intervention.
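The three conditions above can be checked numerically. The following Python sketch is an illustration only and is not part of the RoB 2 tool. It assumes a dichotomous outcome (for example, still depressed at follow-up) and computes the expected complete-case risk in each arm from a true risk and two hypothetical probabilities that the outcome is observed: one for participants with the event and one for participants without it. All function names and numbers are invented for illustration.

```python
def complete_case_risk(true_risk, p_obs_event, p_obs_no_event):
    """Expected risk among participants whose outcome was observed."""
    observed_events = true_risk * p_obs_event
    observed_non_events = (1 - true_risk) * p_obs_no_event
    return observed_events / (observed_events + observed_non_events)


def risk_difference_bias(exp_arm, comp_arm):
    """Complete-case risk difference minus true risk difference.

    Each arm is a tuple: (true_risk, p_obs_event, p_obs_no_event).
    """
    true_rd = exp_arm[0] - comp_arm[0]
    cc_rd = complete_case_risk(*exp_arm) - complete_case_risk(*comp_arm)
    return cc_rd - true_rd


# Missingness unrelated to the true value within each arm
# (the amount of missing data may still differ between arms).
bias_unrelated = risk_difference_bias((0.30, 0.9, 0.9), (0.50, 0.7, 0.7))

# Missingness depends on both the intervention group and the true value.
bias_group_and_value = risk_difference_bias((0.30, 0.9, 0.9), (0.50, 0.6, 0.9))

# Missingness depends on the true value in the same way in both arms,
# and the interventions have different effects.
bias_value_with_effect = risk_difference_bias((0.30, 0.6, 0.9), (0.50, 0.6, 0.9))

# Same mechanism as above, but no difference in effect between arms.
bias_value_no_effect = risk_difference_bias((0.50, 0.6, 0.9), (0.50, 0.6, 0.9))

for label, bias in [
    ("unrelated to true value", bias_unrelated),
    ("depends on group and value", bias_group_and_value),
    ("depends on value, effects differ", bias_value_with_effect),
    ("depends on value, no effect", bias_value_no_effect),
]:
    print(f"{label}: bias in risk difference = {bias:+.4f}")
```

Under these assumed values, the bias in the risk difference is zero in the first and last scenarios and non-zero in the other two, which mirrors the conditions listed above.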
8.5.2 When is the amount of missing outcome data small enough to exclude bias?
It is tempting to classify risk of bias according to the proportion of participants with missing outcome data.
Unfortunately, there is no sensible threshold for ‘small enough’ in relation to the proportion of missing outcome data.
In situations where missing outcome data lead to bias, the extent of bias will increase as the amount of missing outcome data increases. There is a tradition of regarding a proportion of less than 5% missing outcome data as ‘small’ (with corresponding implications for risk of bias), and over 20% as ‘large’. However, the potential impact of missing data on estimated intervention effects depends on the proportion of participants with missing data, the type of outcome and (for dichotomous outcomes) the risk of the event. For example, consider a study of 1000 participants in the intervention group where the observed mortality is 2% for the 900 participants with outcome data (18 deaths). Even though the proportion of data missing is only 10%, if the mortality rate in the 100 missing participants is 20% (20 deaths), the overall true mortality of the intervention group would be nearly double (3.8% vs 2%) that estimated from the observed data.
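The arithmetic in this example can be reproduced with a few lines of Python. The sketch below uses only the figures given in the paragraph above; the function name is an illustrative assumption.

```python
def overall_risk(observed_events, n_observed, n_missing, risk_in_missing):
    """Risk in the whole group, given an assumed risk among the missing."""
    missing_events = n_missing * risk_in_missing
    return (observed_events + missing_events) / (n_observed + n_missing)


n_observed = 900
observed_deaths = 18   # 2% of the 900 participants with outcome data
n_missing = 100        # 10% of the 1000 participants

observed_risk = observed_deaths / n_observed
true_risk = overall_risk(observed_deaths, n_observed, n_missing, 0.20)

print(f"observed mortality: {observed_risk:.1%}")  # 2.0%
print(f"overall mortality:  {true_risk:.1%}")      # 3.8%
```

Changing the assumed mortality among the missing participants shows how strongly the overall figure depends on a quantity that cannot be observed.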
8.5.3 Judging risk of bias due to missing outcome data
It is not possible to examine directly whether the chance that the outcome is missing depends on its true value: judgements of risk of bias will depend on the circumstances of the trial. Therefore, we can only be sure that there is no bias due to missing outcome data when: (1) the outcome is measured in all participants; (2) the proportion of missing outcome data is sufficiently low that any bias is too small to be of importance; or (3) sensitivity analyses (conducted by either the trial authors or the review authors) confirm that plausible values of the missing outcome data could make no important difference to the estimated intervention effect.
Indirect evidence that missing outcome data are likely to cause bias can come from examining: (1) differences between the proportion of missing outcome data in the experimental and comparator intervention groups; and (2) reasons that outcome data are missing.
If the effects of the experimental and comparator interventions on the outcome are different, and missingness in the outcome depends on its true value, then the proportion of participants with missing data is likely to differ between the intervention groups. Therefore, differing proportions of missing outcome data in the experimental and comparator intervention groups provide evidence of potential bias.
Trial reports may provide reasons why participants have missing data. For example, trials of haloperidol to treat dementia reported various reasons such as ‘lack of efficacy’, ‘adverse experience’, ‘positive response’, ‘withdrawal of consent’, ‘patient ran away’ and ‘patient sleeping’ (Higgins et al 2008). It is likely that some of these (e.g. ‘lack of efficacy’ and ‘positive response’) are related to the true values of the missing outcome data. Therefore, these reasons increase the risk of bias if the effects of the experimental and comparator interventions differ, or if the reasons are related to intervention group (e.g. ‘adverse experience’).
In practice, our ability to assess risk of bias will be limited by the extent to which trial authors collected and reported reasons that outcome data were missing. The situation most likely to lead to bias is when reasons for missing outcome data differ between the intervention groups: for example if participants who became seriously unwell withdrew from the comparator group while participants who recovered withdrew from the experimental intervention group.
Trial authors may present statistical analyses (in addition to or instead of complete-case analyses) that attempt to address the potential for bias caused by missing outcome data. Approaches include single imputation (e.g. assuming the participant had no event; last observation carried forward), multiple imputation and likelihood-based methods (see Chapter 10, Section 10.12.2). Imputation methods are unlikely to remove or reduce the bias that occurs when missingness in the outcome depends on its true value, unless they use information additional to intervention group assignment to predict the missing values. Review authors may attempt to address missing data using sensitivity analyses, as discussed in Chapter 10, Section 10.12.3.
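One simple, deliberately extreme form of sensitivity analysis for a dichotomous outcome is to impute all missing outcomes in the direction most favourable to one arm, then to the other, and see whether the conclusion changes. The Python sketch below illustrates that idea under stated assumptions: the event is harmful, the effect measure is the risk difference, and the counts are hypothetical. It is not presented as the approach recommended in Chapter 10, which discusses more plausible alternatives.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    events: int    # events among participants with observed outcomes
    observed: int  # participants with observed outcomes
    missing: int   # participants with missing outcomes

    @property
    def randomized(self) -> int:
        return self.observed + self.missing


def complete_case_rd(exp: Arm, comp: Arm) -> float:
    """Risk difference using only participants with observed outcomes."""
    return exp.events / exp.observed - comp.events / comp.observed


def extreme_case_rds(exp: Arm, comp: Arm) -> tuple[float, float]:
    """Risk differences under the two most extreme imputations."""
    # No missing participant in the experimental arm had the event;
    # every missing participant in the comparator arm did.
    favours_exp = (exp.events / exp.randomized
                   - (comp.events + comp.missing) / comp.randomized)
    # The reverse assumption.
    favours_comp = ((exp.events + exp.missing) / exp.randomized
                    - comp.events / comp.randomized)
    return favours_exp, favours_comp


# Hypothetical trial counts, for illustration only.
experimental = Arm(events=20, observed=90, missing=10)
comparator = Arm(events=30, observed=95, missing=5)

cc_rd = complete_case_rd(experimental, comparator)
low_rd, high_rd = extreme_case_rds(experimental, comparator)
print(f"complete-case RD: {cc_rd:+.3f}")
print(f"extreme-case range: {low_rd:+.3f} to {high_rd:+.3f}")
```

With these invented counts the extreme-case range runs from a risk difference of about -0.15 to 0, so the apparent benefit would not be robust to the most pessimistic assumption about the missing outcomes.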
8.6 Bias in measurement of the outcome
Errors in measurement of outcomes can bias intervention effect estimates. These are often referred to as measurement error (for continuous outcomes), misclassification (for dichotomous or categorical outcomes) or under-ascertainment/over-ascertainment (for events). Measurement errors may be differential or non-differential in relation to intervention assignment:
- Differential measurement errors are related to intervention assignment. Such errors are systematically different between experimental and comparator intervention groups and are less likely when outcome assessors are blinded to intervention assignment.
- Non-differential measurement errors are unrelated to intervention assignment.
This domain relates primarily to differential errors. Non-differential measurement errors are not addressed in detail.
Risk of bias in this domain depends on the following five considerations.
1. Whether the method of measuring the outcome is appropriate. Outcomes in randomized trials should be assessed using appropriate outcome measures. For example, portable blood glucose machines used by trial participants may not reliably measure below 3.1 mmol/L, leading to an inability to detect differences in rates of severe hypoglycaemia between an insulin intervention and placebo, and under-representation of the true incidence of this adverse effect. Such a measurement would be inappropriate for this outcome.
2. Whether measurement or ascertainment of the outcome differs, or could differ, between intervention groups. The methods used to measure or ascertain outcomes should be the same across intervention groups. This is usually the case for pre-specified outcomes, but problems may arise with passive collection of outcome data, as is often the case for unexpected adverse effects. For example, in a placebo-controlled trial, severe headaches occur more frequently in participants assigned to a new drug than those assigned to placebo. These lead to more MRI scans being done in the experimental intervention group, and therefore to more diagnoses of symptomless brain tumours, even though the drug does not increase the incidence of brain tumours. Even for a pre-specified outcome measure, the nature of the intervention may lead to methods of measuring the outcome that are not comparable across intervention groups. For example, an intervention involving additional visits to a healthcare provider may lead to additional opportunities for outcome events to be identified, compared with the comparator intervention.
3. Who the outcome assessor is. The outcome assessor can be:
- the participant, when the outcome is a participant-reported outcome such as pain, quality of life, or self-completed questionnaire;
- the intervention provider, when the outcome is the result of a clinical examination, the occurrence of a clinical event or a therapeutic decision such as decision to offer a surgical intervention; or
- an observer not directly involved in the intervention provided to the participant, such as an adjudication committee, or a health professional recording outcomes for inclusion in disease registries.
4. Whether the outcome assessor is blinded to intervention assignment. Blinding of outcome assessors is often possible even when blinding of participants and personnel during the trial is not feasible. However, it is particularly difficult for participant-reported outcomes: for example, in a trial comparing surgery with medical management when the outcome is pain at 3 months. The potential for bias cannot be ignored even if the outcome assessor cannot be blinded.
5. Whether the assessment of outcome is likely to be influenced by knowledge of intervention received. For trials in which outcome assessors were not blinded, the risk of bias will depend on whether the outcome assessment involves judgement, which depends on the type of outcome. We describe most situations in Table 8.6.a.
Table 8.6.a Considerations of risk of bias in measurement of the outcome for different types of outcomes
Each entry gives the outcome type, a description, examples, who the outcome assessor is, and the implications for risk of bias if the outcome assessor is aware of the intervention assignment.
Outcome type: Participant-reported outcomes
Description: Reports coming directly from participants about how they function or feel in relation to a health condition or intervention, without interpretation by anyone else. They include any evaluation obtained directly from participants through interviews, self-completed questionnaires or hand-held devices.
Examples: Pain, nausea and health-related quality of life.
Who is the outcome assessor? The participant, even if a blinded interviewer is questioning the participant and completing a questionnaire on their behalf.
Implications: The outcome assessment is potentially influenced by knowledge of intervention received, leading to a judgement of at least ‘Some concerns’. Review authors will need to judge whether it is likely that participants’ reporting of the outcome was influenced by knowledge of intervention received, in which case risk of bias is considered high.
Outcome type: Observer-reported outcomes not involving judgement
Description: Outcomes reported by an external observer (e.g. an intervention provider, independent researcher, or radiologist) that do not involve any judgement from the observer.
Examples: All-cause mortality or the result of an automated test.
Who is the outcome assessor? The observer.
Implications: The assessment of outcome is usually not likely to be influenced by knowledge of intervention received.
Outcome type: Observer-reported outcomes involving some judgement
Description: Outcomes reported by an external observer (e.g. an intervention provider, independent researcher, or radiologist) that involve some judgement.
Examples: Assessment of an X-ray or other image, clinical examination and clinical events other than death (e.g. myocardial infarction) that require judgements on clinical definitions or medical records.
Who is the outcome assessor? The observer.
Implications: The assessment of outcome is potentially influenced by knowledge of intervention received, leading to a judgement of at least ‘Some concerns’. Review authors will need to judge whether it is likely that assessment of the outcome was influenced by knowledge of intervention received, in which case risk of bias is considered high.
Outcome type: Outcomes that reflect decisions made by the intervention provider
Description: Outcomes that reflect decisions made by the intervention provider, where recording of the decisions does not involve any judgement, but where the decision itself can be influenced by knowledge of intervention received.
Examples: Hospitalization, stopping treatment, referral to a different ward, performing a caesarean section, stopping ventilation and discharge of the participant.
Who is the outcome assessor? The care provider making the decision.
Implications: Assessment of outcome is usually likely to be influenced by knowledge of intervention received, if the care provider is aware of this. This is particularly important when preferences or expectations regarding the effect of the experimental intervention are strong.
Outcome type: Composite outcomes
Description: Combination of multiple endpoints into a single outcome. Typically, participants who have experienced any of a specified set of endpoints are considered to have experienced the composite outcome. Composite endpoints can also be constructed from continuous outcome measures.
Examples: Major adverse cardiac and cerebrovascular events.
Who is the outcome assessor? Any of the above.
Implications: Assessment of risk of bias for composite outcomes should take into account the frequency or contribution of each component and the risk of bias due to the most influential components.
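The differential ascertainment problem described under consideration 2 above (more MRI scans in one group leading to more incidental diagnoses) can be made concrete with a small calculation. The Python sketch below is illustrative only. It assumes the same true prevalence of symptomless tumours in both groups, that a scan always detects a tumour when one is present, and that being scanned is unrelated to having a tumour. All numbers are hypothetical.

```python
def expected_diagnoses(n_participants, true_prevalence, p_scanned):
    """Expected number of tumours diagnosed, given who gets scanned."""
    return n_participants * true_prevalence * p_scanned


n_per_group = 1000
true_prevalence = 0.01       # identical in both groups by assumption

# Hypothetical scan rates: headaches on the new drug trigger more MRI scans.
p_scanned_drug = 0.20
p_scanned_placebo = 0.05

diagnosed_drug = expected_diagnoses(n_per_group, true_prevalence, p_scanned_drug)
diagnosed_placebo = expected_diagnoses(n_per_group, true_prevalence, p_scanned_placebo)

apparent_risk_ratio = diagnosed_drug / diagnosed_placebo
true_risk_ratio = true_prevalence / true_prevalence

print(f"apparent risk ratio: {apparent_risk_ratio:.1f}")
print(f"true risk ratio:     {true_risk_ratio:.1f}")
```

Under these assumptions the apparent risk ratio equals the ratio of the scan rates, even though the drug has no effect on tumour incidence.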
8.7 Bias in selection of the reported result
This domain addresses bias that arises because the reported result is selected (based on its direction, magnitude or statistical significance) from among multiple intervention effect estimates that were calculated by the trial authors. Consideration of risk of bias requires distinction between:
- an outcome domain: this is a state or endpoint of interest, irrespective of how it is measured (e.g. presence or severity of depression);
- a specific outcome measurement (e.g. measurement of depression using the Hamilton rating scale 6 weeks after starting intervention); and
- an outcome analysis: this is a specific result obtained by analysing one or more outcome measurements (e.g. the difference in mean change in Hamilton rating scale scores from baseline to 6 weeks between experimental and comparator groups).
This domain does not address bias due to selective non-reporting (or incomplete reporting) of outcome domains that were measured and analysed by the trial authors (Kirkham et al 2010). For example, deaths of trial participants may be recorded by the trialists, but the reports of the trial might contain no data for deaths, or state only that the effect estimate for mortality was not statistically significant. Such bias puts the result of a synthesis at risk because results are omitted based on their direction, magnitude or statistical significance. It should therefore be addressed at the review level, as part of an integrated assessment of the risk of reporting bias (Page and Higgins 2016). For further guidance, see Chapter 7 and Chapter 13.
Bias in selection of the reported result typically arises from a desire for findings to support vested interests or to be sufficiently noteworthy to merit publication. It can arise for both harms and benefits, although the motivations may differ. For example, in trials comparing an experimental intervention with placebo, trialists who have a preconception or vested interest in showing that the experimental intervention is beneficial and safe may be inclined to be selective in reporting efficacy estimates that are statistically significant and favourable to the experimental intervention, along with harm estimates that are not significantly different between groups. In contrast, other trialists may selectively report harm estimates that are statistically significant and unfavourable to the experimental intervention if they believe that publicizing the existence of a harm will increase their chances of publishing in a high impact journal.
This domain considers:
1. Whether the trial was analysed in accordance with a pre-specified plan that was finalized before unblinded outcome data were available for analysis. We strongly encourage review authors to attempt to retrieve the pre-specified analysis intentions for each trial (see Chapter 7, Section 7.3.1). Doing so allows for the identification of any outcome measures or analyses that have been omitted from, or added to, the results report, post hoc. Review authors should ideally ask the study authors to supply the study protocol and full statistical analysis plan if these are not publicly available. In addition, if outcome measures and analyses mentioned in an article, protocol or trial registration record are not reported, study authors could be asked to clarify whether those outcome measures were in fact analysed and, if so, to supply the data.
Trial protocols should describe how unexpected adverse outcomes (that potentially reflect unanticipated harms) will be collected and analysed. However, results based on spontaneously reported adverse outcomes may lead to concerns that these were selected based on the finding being noteworthy.
For some trials, the analysis intentions will not be readily available. It is still possible to assess the risk of bias in selection of the reported result. For example, outcome measures and analyses listed in the methods section of an article can be compared with those reported. Furthermore, outcome measures and analyses should be compared across different papers describing the trial.
2. Selective reporting of a particular outcome measurement (based on the results) from among estimates for multiple measurements assessed within an outcome domain. Examples include:
- reporting only one or a subset of time points at which the outcome was measured;
- use of multiple measurement instruments (e.g. pain scales) and only reporting data for the instrument with the most favourable result;
- having multiple assessors measure an outcome domain (e.g. clinician-rated and patient-rated depression scales) and only reporting data for the measure with the most favourable result; and
- reporting only the most favourable subscale (or a subset of subscales) for an instrument when measurements for other subscales were available.
3. Selective reporting of a particular analysis (based on the results) from multiple analyses estimating intervention effects for a specific outcome measurement. Examples include:
- carrying out analyses of both change scores and post-intervention scores adjusted for baseline and reporting only the more favourable analysis;
- multiple analyses of a particular outcome measurement with and without adjustment for prognostic factors (or with adjustment for different sets of prognostic factors);
- a continuously scaled outcome converted to categorical data on the basis of multiple cut-points; and
- effect estimates generated for multiple composite outcomes with full reporting of just one or a subset.
Either type of selective reporting will lead to bias if selection is based on the direction, magnitude or statistical significance of the effect estimate.
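A rough calculation shows why choosing among several results matters. The Python sketch below assumes, purely for illustration, that there is no true effect, that each of k candidate measurements or analyses is tested at a two-sided 5% level, and that the tests are independent. The independence assumption is unrealistic for analyses of the same outcome, which are usually correlated, so the figures overstate the inflation that would arise in practice.

```python
def prob_at_least_one_significant(k, alpha=0.05):
    """Chance that at least one of k independent null tests is significant."""
    return 1 - (1 - alpha) ** k


for k in (1, 2, 5, 10):
    p = prob_at_least_one_significant(k)
    print(f"{k:2d} candidate results: P(at least one 'significant') = {p:.3f}")
```

With a single pre-specified analysis the chance is 5%. Under these assumptions it rises to roughly 23% with five candidates and roughly 40% with ten, which is the room that selective reporting has to work with.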
Insufficient detail in some documents may preclude full assessment of the risk of bias (e.g. trialists only state in the trial registry record that they will measure ‘pain’, without specifying the measurement scale, time point or metric that will be used). Review authors should indicate insufficient information alongside their responses to signalling questions.
8.8 Differences from the previous version of the tool
Version 2 of the tool replaces the first version, originally published in version 5 of the Handbook in 2008, and updated in 2011 (Higgins et al 2011). Research in the field has progressed, and RoB 2 reflects current understanding of how the causes of bias can influence study results, and the most appropriate ways to assess this risk.
Authors familiar with the previous version of the tool, which is used widely in Cochrane and other systematic reviews, will notice several changes:
- assessment of bias is at the level of an individual result, rather than at a study or outcome level;
- the names given to the bias domains describe more clearly the issues targeted and should reduce confusion arising from terms that are used in different ways or may be unfamiliar (such as ‘selection bias’ and ‘performance bias’) (Mansournia et al 2017);
- signalling questions have been introduced, along with algorithms to assist authors in reaching a judgement about risk of bias for each domain;
- a distinction is introduced between considering the effect of assignment to intervention and the effect of adhering to intervention, with implications for the assessment of bias due to deviations from intended interventions;
- the assessment of bias arising from the exclusion of participants from the analysis (for example, as part of a naïve ‘per-protocol’ analysis) is under the domain of bias due to deviations from the intended intervention, rather than bias due to missing outcome data;
- the concept of selective reporting of a result is distinguished from that of selective non-reporting of a result, with the latter concept removed from the tool so that it can be addressed (more appropriately) at the level of the synthesis (see Chapter 13);
- the option to add new domains has been removed; and
- an explicit process for reaching a judgement about the overall risk of bias in the result has been introduced.
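As stated in the key points of this chapter, the proposed overall risk of bias for a result is the least favourable assessment across the domains. The Python sketch below shows only that baseline rule. The class, function and dictionary keys are illustrative assumptions rather than part of any official RoB 2 software, and review authors may override the proposed judgement with justification.

```python
from enum import IntEnum


class Judgement(IntEnum):
    """Ordered so that a larger value is a less favourable judgement."""
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


def proposed_overall(domain_judgements):
    """Return the least favourable judgement across the domains."""
    if not domain_judgements:
        raise ValueError("at least one domain judgement is required")
    return max(domain_judgements.values())


# Hypothetical domain-level judgements for one trial result.
example = {
    "randomization process": Judgement.LOW,
    "deviations from intended interventions": Judgement.LOW,
    "missing outcome data": Judgement.SOME_CONCERNS,
    "measurement of the outcome": Judgement.LOW,
    "selection of the reported result": Judgement.LOW,
}

overall = proposed_overall(example)
print(overall.name)
```

Using an ordered enumeration keeps the rule to a single `max` call and makes the ordering of judgements explicit.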
Because most Cochrane Reviews published before 2019 used the first version of the tool, authors working on updating these reviews should refer to online Chapter IV for guidance on considering whether to change methodology when updating a review.
8.9 Chapter information
Authors: Julian PT Higgins, Jelena Savović, Matthew J Page, Roy G Elbers, Jonathan AC Sterne
Acknowledgements: Contributors to the development of bias domains were: Natalie Blencowe, Isabelle Boutron, Christopher Cates, Rachel Churchill, Mark Corbett, Nicky Cullum, Jonathan Emberson, Sally Hopewell, Asbjørn Hróbjartsson, Sharea Ijaz, Peter Jüni, Jamie Kirkham, Toby Lasserson, Tianjing Li, Barney Reeves, Sasha Shepperd, Ian Shrier, Lesley Stewart, Kate Tilling, Ian White, Penny Whiting. Other contributors were: Henning Keinke Andersen, Vincent Cheng, Mike Clarke, Jon Deeks, Miguel Hernán, Daniela Junqueira, Yoon Loke, Geraldine MacDonald, Alexandra McAleenan, Richard Morris, Mona Nasser, Nishith Patel, Jani Ruotsalainen, Holger Schünemann, Jayne Tierney, Sunita Vohra, Liliane Zorzela.
Funding: Development of RoB 2 was supported by the Medical Research Council (MRC) Network of Hubs for Trials Methodology Research (MR/L004933/2-N61) hosted by the MRC ConDuCT-II Hub (Collaboration and innovation for Difficult and Complex randomised controlled Trials In Invasive procedures – MR/K025643/1), by a Methods Innovation Fund grant from Cochrane and by MRC grant MR/M025209/1. JPTH and JACS are members of the National Institute for Health Research (NIHR) Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol, and the MRC Integrative Epidemiology Unit at the University of Bristol. JPTH, JS and JACS are members of the NIHR Collaboration for Leadership in Applied Health Research and Care West (CLAHRC West) at University Hospitals Bristol NHS Foundation Trust. JPTH and JACS received funding from NIHR Senior Investigator awards NF-SI-0617-10145 and NF-SI-0611-10168, respectively. MJP received funding from an Australian National Health and Medical Research Council (NHMRC) Early Career Fellowship (1088535). The views expressed are those of the authors and not necessarily those of the National Health Service, the NIHR, the UK Department of Health and Social Care, the MRC or the Australian NHMRC.
Abraha I, Montedori A. Modified intention to treat reporting in randomised controlled trials: systematic review. BMJ 2010; 340: c2697.
Bell ML, Fiero M, Horton NJ, Hsu CH. Handling missing data in RCTs; a review of the top medical journals. BMC Medical Research Methodology 2014; 14: 118.
Bello S, Moustgaard H, Hróbjartsson A. Unreported formal assessment of unblinding occurred in 4 of 10 randomized clinical trials, unreported loss of blinding in 1 of 10 trials. Journal of Clinical Epidemiology 2017; 81: 42-50.
Berger VW. Quantifying the magnitude of baseline covariate imbalances resulting from selection bias in randomized clinical trials. Biometrical Journal 2005; 47: 119-127.
Boutron I, Estellat C, Guittet L, Dechartres A, Sackett DL, Hróbjartsson A, Ravaud P. Methods of blinding in reports of randomized controlled trials assessing pharmacologic treatments: a systematic review. PLoS Medicine 2006; 3: e425.
Brown S, Thorpe H, Hawkins K, Brown J. Minimization--reducing predictability for multi-centre trials whilst retaining balance within centre. Statistics in Medicine 2005; 24: 3715-3727.
Clark L, Fairhurst C, Torgerson DJ. Allocation concealment in randomised controlled trials: are we getting better? BMJ 2016; 355: i5663.
Corbett MS, Higgins JPT, Woolacott NF. Assessing baseline imbalance in randomised trials: implications for the Cochrane risk of bias tool. Research Synthesis Methods 2014; 5: 79-85.
Fergusson D, Aaron SD, Guyatt G, Hebert P. Post-randomisation exclusions: the intention to treat principle and excluding patients from analysis. BMJ 2002; 325: 652-654.
Gravel J, Opatrny L, Shapiro S. The intention-to-treat approach in randomized controlled trials: are authors saying what they do and doing what they say? Clinical Trials (London, England) 2007; 4: 350-356.
Haahr MT, Hróbjartsson A. Who is blinded in randomized clinical trials? A study of 200 trials and a survey of authors. Clinical Trials (London, England) 2006; 3: 360-365.
Hernán MA, Hernandez-Diaz S. Beyond the intention-to-treat in comparative effectiveness research. Clinical Trials (London, England) 2012; 9: 48-55.
Hernán MA, Robins JM. Per-protocol analyses of pragmatic trials. New England Journal of Medicine 2017; 377: 1391-1398.
Hernán MA, Scharfstein D. Cautions as Regulators Move to End Exclusive Reliance on Intention to Treat. Annals of Internal Medicine 2018; 168: 515-516.
Higgins JPT, White IR, Wood AM. Imputation methods for missing outcome data in meta-analysis of clinical trials. Clinical Trials (London, England) 2008; 5: 225-239.
Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, Savović J, Schulz KF, Weeks L, Sterne JAC. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 2011; 343: d5928.
Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999; 319: 670-674.
Jensen JS, Bielefeldt AO, Hróbjartsson A. Active placebo control groups of pharmacological interventions were rarely used but merited serious consideration: a methodological overview. Journal of Clinical Epidemiology 2017; 87: 35-46.
Jüni P, Altman DG, Egger M. Systematic reviews in health care: Assessing the quality of controlled clinical trials. BMJ 2001; 323: 42-46.
Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, Williamson PR. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 2010; 340: c365.
Mansournia MA, Higgins JPT, Sterne JAC, Hernán MA. Biases in randomized trials: a conversation between trialists and epidemiologists. Epidemiology 2017; 28: 54-59.
Meinert CL. Clinical Trials – Design, Conduct, and Analysis. Second Edition. Oxford (UK): Oxford University Press; 2012.
National Research Council. The Prevention and Treatment of Missing Data in Clinical Trials. Panel on Handling Missing Data in Clinical Trials. Committee on National Statistics, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press; 2010.
Page MJ, Higgins JPT. Rethinking the assessment of risk of bias due to selective reporting: a cross-sectional study. Systematic Reviews 2016; 5: 108.
Piantadosi S. Clinical Trials: A Methodologic Perspective. 2nd ed. Hoboken (NJ): Wiley; 2005.
Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995; 273: 408-412.
Schulz KF. Subverting randomization in controlled trials. JAMA 1995; 274: 1456-1458.
Schulz KF, Grimes DA. Generation of allocation sequences in randomised trials: chance, not choice. Lancet 2002; 359: 515-519.
Schulz KF, Chalmers I, Altman DG. The landscape and lexicon of blinding in randomized trials. Annals of Internal Medicine 2002; 136: 254-259.
Schulz KF, Grimes DA. The Lancet Handbook of Essential Concepts in Clinical Research. Edinburgh (UK): Elsevier; 2006.