Joanne E McKenzie, Sue E Brennan, Rebecca E Ryan, Hilary J Thomson, Renea V Johnston
Key Points:
- Synthesis is a process of bringing together data from a set of included studies with the aim of drawing conclusions about a body of evidence. This will include synthesis of study characteristics and, potentially, statistical synthesis of study findings.
- A general framework for synthesis can be used to guide the process of planning the comparisons, preparing for synthesis, undertaking the synthesis, and interpreting and describing the results.
- Tabulation of study characteristics aids the examination and comparison of PICO elements across studies, and facilitates both synthesis of these characteristics and grouping of studies for statistical synthesis.
- Tabulation of extracted data from studies allows assessment of the number of studies contributing to a particular meta-analysis, and helps determine what other statistical synthesis methods might be used if meta-analysis is not possible.
Cite this chapter as: McKenzie JE, Brennan SE, Ryan RE, Thomson HJ, Johnston RV. Chapter 9: Summarizing study characteristics and preparing for synthesis. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane, 2022. Available from www.training.cochrane.org/handbook.
9.1 Introduction
Synthesis is a process of bringing together data from a set of included studies with the aim of drawing conclusions about a body of evidence. Most Cochrane Reviews on the effects of interventions will include some type of statistical synthesis. Most commonly this is the statistical combination of effect estimates from two or more separate studies (henceforth referred to as meta-analysis).
An examination of the included studies always precedes statistical synthesis in Cochrane Reviews. For example, examination of the interventions studied is often needed to itemize their content so as to determine which studies can be grouped in a single synthesis. More broadly, synthesis of the PICO (Population, Intervention, Comparator and Outcome) elements of the included studies underpins interpretation of review findings and is an important output of the review in its own right. This synthesis should encompass the characteristics of the interventions and comparators in included studies, the populations and settings in which the interventions were evaluated, the outcomes assessed, and the strengths and weaknesses of the body of evidence.
Chapter 2 defined three types of PICO criteria that may be helpful in understanding decisions that need to be made at different stages in the review:
- The review PICO (planned at the protocol stage) is the PICO on which eligibility of studies is based (what will be included and what excluded from the review).
- The PICO for each synthesis (also planned at the protocol stage) defines the question that the specific synthesis aims to answer, determining how the synthesis will be structured and specifying planned comparisons (including intervention and comparator groups, and any grouping of outcomes and population subgroups).
- The PICO of the included studies (determined at the review stage) is what was actually investigated in the included studies.
In this chapter, we focus on the PICO for each synthesis and the PICO of the included studies, as the basis for determining which studies can be grouped for statistical synthesis and for synthesizing study characteristics. We describe the preliminary steps undertaken before performing the statistical synthesis. Methods for the statistical synthesis are described in Chapter 10, Chapter 11 and Chapter 12.
9.2 A general framework for synthesis
Box 9.2.a A general framework for synthesis that can be applied irrespective of the methods used to synthesize results
Stage 1. At protocol stage
- Step 1.1. Set up the comparisons (Chapter 2 and Chapter 3).
Stage 2. Summarizing the included studies and preparing for synthesis
- Step 2.1. Summarize the characteristics of each study in a ‘Characteristics of included studies’ table (see Chapter 5), including examining the interventions to itemize their content and other characteristics (Section 9.3.1).
- Step 2.2. Determine which studies are similar enough to be grouped within each comparison by comparing the characteristics across studies (e.g. in a matrix) (Section 9.3.2).
- Step 2.3. Determine what data are available for synthesis (Section 9.3.3; extraction of data and conversion to the desired format is discussed in Chapter 5 and Chapter 6).
- Step 2.4. Determine if modification to the planned comparisons or outcomes is necessary, or new comparisons are needed, noting any deviations from the protocol plans (Section 9.3.4; and Chapter 2 and Chapter 3).
- Step 2.5. Synthesize the characteristics of the studies contributing to each comparison (Section 9.3.5).
Stage 3. The synthesis itself
- Step 3.1. Perform a statistical synthesis (if appropriate), or provide structured reporting of the effects (Section 9.5; and Chapter 10, Chapter 11 and Chapter 12).
- Step 3.2. Interpret and describe the results, including consideration of the direction of effect, size of the effect, certainty of the evidence (Chapter 14), and the interventions tested and the populations in which they were tested.
Box 9.2.a provides a general framework for synthesis that can be applied irrespective of the methods used to synthesize results. Planning for the synthesis should start at the protocol-writing stage, and Chapter 2 and Chapter 3 describe the steps involved in planning the review questions and comparisons between intervention groups. These steps include specifying which characteristics of the interventions, populations, outcomes and study design would be grouped together for synthesis (the PICO for each synthesis: stage 1 in Box 9.2.a).
This chapter primarily concerns stage 2 of the general framework in Box 9.2.a. After deciding which studies will be included in the review and extracting data, review authors can start implementing their plan, working through steps 2.1 to 2.5 of the framework. This process begins with a detailed examination of the characteristics of each study (step 2.1), and then comparison of characteristics across studies in order to determine which studies are similar enough to be grouped for synthesis (step 2.2). Examination of the type of data available for synthesis follows (step 2.3). These three steps inform decisions about whether any modification to the planned comparisons or outcomes is necessary, or new comparisons are needed (step 2.4). The last step of the framework covered in this chapter involves synthesis of the characteristics of studies contributing to each comparison (step 2.5). The chapter concludes with practical tips for checking data before synthesis (Section 9.4).
Steps 2.1, 2.2 and 2.5 involve analysis and synthesis of mainly qualitative information about study characteristics. The process used to undertake these steps is rarely described in reviews, yet can require many subjective decisions about the nature and similarity of the PICO elements of the included studies. The examples described in this section illustrate approaches for making this process more transparent.
9.3 Preliminary steps of a synthesis
9.3.1 Summarize the characteristics of each study (step 2.1)
A starting point for synthesis is to summarize the PICO characteristics of each study (i.e. the PICO of the included studies, see Chapter 3) and categorize these PICO elements in the groups (or domains) pre-specified in the protocol (i.e. the PICO for each synthesis). The resulting descriptions are reported in the ‘Characteristics of included studies’ table, and are used in step 2.2 to determine which studies can be grouped for synthesis.
In some reviews, the labels and terminology used in each study are retained when describing the PICO elements of the included studies. This may be sufficient in areas with consistent and widely understood terminology that matches the PICO for each synthesis. However, in most areas, terminology is variable, making it difficult to compare the PICO of each included study to the PICO for each synthesis, or to compare PICO elements across studies. Standardizing the description of PICO elements across studies facilitates these comparisons. This standardization includes applying the labels and terminology used to articulate the PICO for each synthesis (Chapter 3), and structuring the description of PICO elements. The description of interventions can be structured using the Template for Intervention Description and Replication (TIDieR) checklist, for example (see Chapter 3 and Table 9.3.a).
Table 9.3.a illustrates the use of pre-specified groups to categorize and label interventions in a review of psychosocial interventions for smoking cessation in pregnancy (Chamberlain et al 2017). The main intervention strategy in each study was categorized into one of six groups: counselling, health education, feedback, incentive-based interventions, social support, and exercise. This categorization determined which studies were eligible for each comparison (e.g. counselling versus usual care; single or multi-component strategy). The extract from the ‘Characteristics of included studies’ table shows the diverse descriptions of interventions in three of the 54 studies for which the main intervention was categorized as ‘counselling’. Other intervention characteristics, such as duration and frequency, were coded in pre-specified categories to standardize description of the intervention intensity and facilitate meta-regression (not shown here).
Table 9.3.a Example of categorizing interventions into pre-defined groups
Definition of (selected) intervention groups from the PICO for each synthesis

| Study ID | Precis of intervention description from study | Main intervention strategy | Other intervention components |
|---|---|---|---|
| Study 1 | | Counselling | Incentive |
| Study 2 | Routine prenatal advice on a range of health issues, from midwives and obstetricians plus: | Counselling | Social support |
| Study 3 | Midwives received two and a half days of training on theory of transtheoretical model. Participants received a set of six stage-based self-help manuals ‘Pro-Change programme for a healthy pregnancy’. The midwife assessed each participant’s stage of change and pointed the woman to the appropriate manual. No more than 15 minutes was spent on the intervention. | Counselling | Nil |
* The definition also specified eligible modes of delivery, intervention duration and personnel.
While this example focuses on categorizing and describing interventions according to groups pre-specified in the PICO for each synthesis, the same approach applies to other PICO elements.
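The coding in Table 9.3.a can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used by the review authors: the class `CodedStudy`, the function `eligible` and the shortened label `incentive` are hypothetical names introduced here, and the three studies are coded exactly as in the table rows above.

```python
from dataclasses import dataclass, field

# The six pre-specified intervention groups named in the text.
# "incentive" is a shortened label for "incentive-based interventions" (assumption).
INTERVENTION_GROUPS = {
    "counselling", "health education", "feedback",
    "incentive", "social support", "exercise",
}


@dataclass
class CodedStudy:
    """One study coded with the labels from the PICO for each synthesis (hypothetical structure)."""
    study_id: str
    main_strategy: str
    other_components: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject labels that were not pre-specified, so coding stays standardized.
        for label in [self.main_strategy, *self.other_components]:
            if label not in INTERVENTION_GROUPS:
                raise ValueError(f"{label!r} is not a pre-specified group")

    @property
    def is_multi_component(self) -> bool:
        return bool(self.other_components)


# Coding taken from the three rows of Table 9.3.a.
studies = [
    CodedStudy("Study 1", "counselling", ["incentive"]),
    CodedStudy("Study 2", "counselling", ["social support"]),
    CodedStudy("Study 3", "counselling"),  # 'Nil' other components
]


def eligible(coded, main_strategy, multi_component):
    """Return IDs of studies matching a main strategy and single/multi-component status."""
    return [
        s.study_id
        for s in coded
        if s.main_strategy == main_strategy
        and s.is_multi_component == multi_component
    ]


single = eligible(studies, "counselling", multi_component=False)
multi = eligible(studies, "counselling", multi_component=True)
print("Single-component counselling:", single)
print("Multi-component counselling:", multi)
```

Validating labels against a fixed set is a design choice made for this sketch: it forces every study description into the pre-specified vocabulary, which is the point of standardizing before comparing studies.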
9.3.2 Determine which studies are similar enough to be grouped within each comparison (step 2.2)
Once the PICO elements of the included studies have been coded using labels and descriptions specified in the PICO for each synthesis, it will be possible to compare PICO elements across studies and determine which studies are similar enough to be grouped within each comparison.
Tabulating study characteristics can help to explore and compare PICO elements across studies, and is particularly important for reviews that are broad in scope, have diversity across one or more PICO elements, or include large numbers of studies. Data about study characteristics can be ordered in many different ways (e.g. by comparison or by specific PICO elements), and tables may include information about one or more PICO elements. Deciding on the best approach will depend on the purpose of the table and the stage of the review. A close examination of study characteristics will require detailed tables; for example, to identify differences in characteristics that were pre-specified as potentially important modifiers of the intervention effects. As the review progresses, this detail may be replaced by standardized description of PICO characteristics (e.g. the coding of counselling interventions presented in Table 9.3.a).
Table 9.3.b illustrates one approach to tabulating study characteristics to enable comparison and analysis across studies. This table presents a high-level summary of the characteristics that are most important for determining which comparisons can be made. The table was adapted from tables presented in a review of self-management education programmes for osteoarthritis (Kroon et al 2014). The authors presented a structured summary of intervention and comparator groups for each study, and then categorized intervention components thought to be important for enabling patients to manage their own condition. Table 9.3.b shows selected intervention components, the comparator, and outcomes measured in a subset of studies (some details are fictitious). Outcomes have been grouped by the outcome domains ‘Pain’ and ‘Function’ (column ‘Outcome measure’ Table 9.3.b). These pre-specified outcome domains are the chosen level for the synthesis as specified in the PICO for each synthesis. Authors will need to assess whether the measurement methods or tools used within each study provide an appropriate assessment of the domains (Chapter 3, Section 3.2.4). A next step is to group each measure into the pre-specified time points. In this example, outcomes are grouped into short-term (<6 weeks) and long-term follow-up (≥6 weeks to 12 months) (column ‘Time points (time frame)’ Table 9.3.b).
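Grouping measurement time points into the pre-specified time frames is a mechanical rule, and a small sketch may make the cut-offs concrete. The function names and the month-to-week conversion (52/12 weeks per month) are assumptions made for illustration; the example does not define a time frame beyond 12 months, so the sketch returns `None` in that case.

```python
WEEKS_PER_MONTH = 52 / 12  # assumption: average month length, used only for conversion


def to_weeks(value, unit):
    """Convert a time point given in 'wks' or 'mths' to weeks."""
    if unit == "wks":
        return float(value)
    if unit == "mths":
        return value * WEEKS_PER_MONTH
    raise ValueError(f"unknown unit: {unit!r}")


def time_frame(value, unit):
    """Classify a time point as 'short' (<6 weeks) or 'long' (>=6 weeks to 12 months)."""
    weeks = to_weeks(value, unit)
    if weeks < 6:
        return "short"
    if weeks <= to_weeks(12, "mths"):
        return "long"
    return None  # outside the time frames defined in this example


for value, unit in [(2, "wks"), (1, "mths"), (8, "mths"), (12, "mths")]:
    print(f"{value} {unit}: {time_frame(value, unit)}")
```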
Variations on the format shown in Table 9.3.b can be presented within a review to summarize the characteristics of studies contributing to each synthesis, which is important for interpreting findings (step 2.5).
Table 9.3.b Table of study characteristics illustrating similarity of PICO elements across studies
| Study¹ | Comparator | Self-management intervention components | Outcome domain | Outcome measure | Time points (time frame)² | Data³ | Effect & SE |
|---|---|---|---|---|---|---|---|
| 1 | Attention control | BEH, MON, CON, SKL, NAV | Pain | Pain VAS | 1 mth (short), 8 mths (long) | Mean, N / group | Yes⁴ |
| | | | Function | HAQ disability subscale | 1 mth (short), 8 mths (long) | Median, IQR, N / group | Maybe⁴ |
| 2 | Acupuncture | BEH, EMO, CON, SKL, NAV | Pain | Pain on walking VAS | 1 mth (short), 12 mths (long) | MD from ANCOVA model, 95% CI | Yes |
| | | | Function | Dutch AIMS-SF | 1 mth (short), 12 mths (long) | Median, range, N / group | Maybe⁴ |
| 4 | Information | BEH, ENG, EMO, MON, CON, SKL, NAV | Pain | Pain VAS | 1 mth (short) | MD, SE | Yes |
| | | | Function | Dutch AIMS-SF | 1 mth (short) | Mean, SD, N / group | Yes |
| 12 | Information | BEH, SKL | Pain | WOMAC pain subscore | 12 mths (long) | MD from ANCOVA model, 95% CI | Yes |
| 3 | Usual care | BEH, EMO, MON, SKL, NAV | Pain | Pain VAS*; Pain on walking VAS | 1 mth (short); 1 mth (short) | Mean, SD, N / group | Yes |
| 5 | Usual care | BEH, ENG, EMO, MON, CON, SKL | Pain | Pain on walking VAS | 2 wks (short) | Mean, SD, N / group | Yes |
| 6 | Usual care | BEH, MON, CON, SKL, NAV | Pain | Pain VAS | 2 wks (short), 1 mth (short)* | MD, t-value and P value for MD | Yes |
| | | | Function | WOMAC disability subscore | 2 wks (short), 1 mth (short)* | Mean, N / group | Yes |
| 7 | Usual care | BEH, MON, CON, SKL, NAV | Pain | WOMAC pain subscore | 1 mth (short) | Direction of effect | No |
| | | | Function | WOMAC disability subscore | 1 mth (short) | Means, N / group; statistically significant difference | Yes⁴ |
| 8 | Usual care | MON | Pain | Pain VAS | 12 mths (long) | MD, 95% CI | Yes |
| 9 | Usual care | BEH, MON, SKL | Function | Global disability | 12 mths (long) | Direction of effect, NS | No |
| 10 | Usual care | BEH, EMO, MON, CON, SKL, NAV | Pain | Pain VAS | 1 mth (short) | No information | No |
| | | | Function | Global disability | 1 mth (short) | Direction of effect | No |
| 11 | Usual care | BEH, MON, SKL | Pain | WOMAC pain subscore | 1 mth (short), 12 mths (long) | Mean, SD, N / group | Yes |

BEH = health-directed behaviour; CON = constructive attitudes and approaches; EMO = emotional well-being; ENG = positive and active engagement in life; MON = self-monitoring and insight; NAV = health service navigation; SKL = skill and technique acquisition.
ANCOVA = Analysis of covariance; CI = confidence interval; IQR = interquartile range; MD = mean difference; SD = standard deviation; SE = standard error; NS = non-significant.
Pain and function measures: Dutch AIMS-SF = Dutch short form of the Arthritis Impact Measurement Scales; HAQ = Health Assessment Questionnaire; VAS = visual analogue scale; WOMAC = Western Ontario and McMaster Universities Osteoarthritis Index.
¹ Ordered by type of comparator.
² Short-term (denoted ‘immediate’ in the review Kroon et al (2014)) follow-up is defined as <6 weeks, long-term follow-up (denoted ‘intermediate’ in the review) is ≥6 weeks to 12 months.
³ For simplicity, in this example the available data are assumed to be the same for all outcomes within an outcome domain within a study. In practice, this is unlikely and the available data would likely vary by outcome.
⁴ Indicates that an effect estimate and its standard error may be computed through imputation of missing statistics, methods to convert between statistics (e.g. medians to means) or contact with study authors.
* Indicates the selected outcome when there was multiplicity in the outcome domain and time frame.
9.3.3 Determine what data are available for synthesis (step 2.3)
Once the studies that are similar enough to be grouped together within each comparison have been determined, a next step is to examine what data are available for synthesis. Tabulating the measurement tools and time frames as shown in Table 9.3.b allows assessment of the potential for multiplicity (i.e. when multiple outcomes within a study and outcome domain are available for inclusion (Chapter 3, Section 3.2.4.3)). In this example, multiplicity arises in two ways. First, from multiple measurement instruments used to measure the same outcome domain within the same time frame (e.g. ‘Short-term Pain’ is measured using the ‘Pain VAS’ and ‘Pain on walking VAS’ scales in study 3). Second, from multiple time points measured within the same time frame (e.g. ‘Short-term Pain’ is measured using ‘Pain VAS’ at both 2 weeks and 1 month in study 6). Pre-specified methods to deal with the multiplicity can then be implemented (see Table 9.3.c for examples of approaches for dealing with multiplicity). In this review, the authors pre-specified a set of decision rules for selecting specific outcomes within the outcome domains. For example, for the outcome domain ‘Pain’, the selected outcome was the highest on the following list: global pain, pain on walking, WOMAC pain subscore, composite pain scores other than WOMAC, pain on activities other than walking, rest pain or pain during the night. The authors further specified that if there were multiple time points at which the outcome was measured within a time frame, they would select the longest time point. The selected outcomes from applying these rules to studies 3 and 6 are indicated by an asterisk in Table 9.3.b.
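The two decision rules described above can be sketched as a single ranking function. Several details are assumptions made for illustration: the function name, the mapping of each measurement tool to a category in the hierarchy (e.g. treating ‘Pain VAS’ as global pain), the representation of 1 month as 4 weeks, and the order in which the rules are applied (hierarchy first, then longest time point).

```python
# Pre-specified hierarchy for the 'Pain' domain, most preferred first (from the text).
PAIN_HIERARCHY = [
    "global pain",
    "pain on walking",
    "WOMAC pain subscore",
    "composite pain scores other than WOMAC",
    "pain on activities other than walking",
    "rest pain or pain during the night",
]

# Assumption: how measures in Table 9.3.b map onto the hierarchy categories.
MEASURE_CATEGORY = {
    "Pain VAS": "global pain",
    "Pain on walking VAS": "pain on walking",
    "WOMAC pain subscore": "WOMAC pain subscore",
}


def select_outcome(candidates):
    """Pick one (measure, weeks) pair from those available in one study,
    outcome domain and time frame.

    Rule 1: prefer the measure highest on the hierarchy.
    Rule 2: among ties, prefer the longest time point.
    """
    def sort_key(candidate):
        measure, weeks = candidate
        rank = PAIN_HIERARCHY.index(MEASURE_CATEGORY[measure])
        return (rank, -weeks)

    return min(candidates, key=sort_key)


# Short-term pain candidates (time in weeks; 1 month taken as 4 weeks).
study_3 = [("Pain VAS", 4), ("Pain on walking VAS", 4)]
study_6 = [("Pain VAS", 2), ("Pain VAS", 4)]

print("Study 3:", select_outcome(study_3))
print("Study 6:", select_outcome(study_6))
```

Under these assumptions the sketch selects ‘Pain VAS’ for study 3 and the 1-month measurement for study 6, matching the asterisks in Table 9.3.b.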
Table 9.3.b also illustrates an approach to tabulating the extracted data. The available statistics are tabulated in the column labelled ‘Data’, from which an assessment can be made as to whether the study contributes the required data for a meta-analysis (column ‘Effect & SE’) (Chapter 10). For example, of the seven studies comparing health-directed behaviour (BEH) with usual care, six measured ‘Short-term Pain’, four of which contribute required data for meta-analysis. Reordering the table by comparison, outcome and time frame will more readily show the number of studies that will contribute to a particular meta-analysis, and will help determine what other synthesis methods might be used if the data available for meta-analysis are limited.
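The counts in this example can be reproduced by filtering a machine-readable copy of Table 9.3.b. The sketch below is illustrative only: the `Row` structure and variable names are hypothetical, and only the usual care rows needed for the count are transcribed (the Pain rows, plus study 9, which measured Function only).

```python
from typing import NamedTuple


class Row(NamedTuple):
    study: str
    comparator: str
    components: frozenset
    domain: str
    time_frames: frozenset
    effect_and_se: bool  # True where the 'Effect & SE' column reads 'Yes'


def row(study, comparator, components, domain, time_frames, effect_and_se):
    return Row(study, comparator, frozenset(components), domain,
               frozenset(time_frames), effect_and_se)


# Transcribed from the usual care rows of Table 9.3.b.
ROWS = [
    row("3", "Usual care", {"BEH", "EMO", "MON", "SKL", "NAV"}, "Pain", {"short"}, True),
    row("5", "Usual care", {"BEH", "ENG", "EMO", "MON", "CON", "SKL"}, "Pain", {"short"}, True),
    row("6", "Usual care", {"BEH", "MON", "CON", "SKL", "NAV"}, "Pain", {"short"}, True),
    row("7", "Usual care", {"BEH", "MON", "CON", "SKL", "NAV"}, "Pain", {"short"}, False),
    row("8", "Usual care", {"MON"}, "Pain", {"long"}, True),
    row("9", "Usual care", {"BEH", "MON", "SKL"}, "Function", {"long"}, False),
    row("10", "Usual care", {"BEH", "EMO", "MON", "CON", "SKL", "NAV"}, "Pain", {"short"}, False),
    row("11", "Usual care", {"BEH", "MON", "SKL"}, "Pain", {"short", "long"}, True),
]

# Studies comparing an intervention that includes BEH with usual care.
beh_vs_usual = {r.study for r in ROWS
                if r.comparator == "Usual care" and "BEH" in r.components}

# Of those, the rows measuring pain in the short-term time frame.
short_term_pain = [r for r in ROWS
                   if r.study in beh_vs_usual
                   and r.domain == "Pain" and "short" in r.time_frames]

# Of those, the studies with an effect estimate and standard error available.
contributing = [r.study for r in short_term_pain if r.effect_and_se]

print(len(beh_vs_usual), "studies compare BEH with usual care")
print(len(short_term_pain), "measured short-term pain")
print(len(contributing), "contribute data for meta-analysis:", contributing)
```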
Table 9.3.c Examples of approaches for selecting one outcome (effect estimate) for inclusion in a synthesis.* Adapted from López-López et al (2018)
| Approach | Description | Comment |
|---|---|---|
| Random selection | Randomly select an outcome (effect estimate) when multiple are available for an outcome domain | Assumes that the effect estimates are interchangeable measures of the domain and that random selection will yield a ‘representative’ effect for the meta-analysis. |
| Averaging of effect estimates | Calculate the average of the intervention effects when multiple are available for a particular outcome domain | Assumes that the effect estimates are interchangeable measures of the domain. The standard error of the average effect can be calculated using a simple method of averaging the variances of the effect estimates. |
| Median effect estimate | Rank the effect estimates of outcomes within an outcome domain and select the outcome with the middle value | An alternative to averaging effect estimates. Assumes that the effect estimates are interchangeable measures of the domain and that the median effect will yield a ‘representative’ effect for the meta-analysis. This approach is often adopted in Effective Practice and Organization of Care reviews that include broad outcome domains. |
| Decision rules | Select the most relevant outcome from multiple that are available for an outcome domain using a decision rule | Assumes that while the outcomes all provide a measure of the outcome domain, they are not completely interchangeable, with some being more relevant. The decision rules aim to select the most relevant. The rules may be based on clinical (e.g. content validity of measurement tools) or methodological (e.g. reliability of the measure) considerations. If multiple rules are specified, a hierarchy will need to be determined to specify the order in which they are applied. |
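As a rough sketch, the averaging and median approaches in Table 9.3.c might look like the following. The function names and the three effect estimates are hypothetical. The standard error calculation is one reading of ‘averaging the variances’ (the square root of the mean variance); López-López et al (2018) should be consulted for the exact method. For an even number of estimates, taking the lower of the two middle values is a further assumption.

```python
import math
import statistics


def average_effect(estimates):
    """Average several (effect, SE) pairs from one study and outcome domain.

    The SE of the average is taken as the square root of the mean variance,
    which is an interpretation of 'averaging the variances' (assumption).
    """
    effects = [effect for effect, _ in estimates]
    variances = [se ** 2 for _, se in estimates]
    return statistics.fmean(effects), math.sqrt(statistics.fmean(variances))


def median_effect(estimates):
    """Select the (effect, SE) pair with the middle effect estimate.

    With an even number of estimates the lower middle one is chosen (assumption).
    """
    ordered = sorted(estimates, key=lambda pair: pair[0])
    return ordered[(len(ordered) - 1) // 2]


# Hypothetical mean differences and standard errors for three pain measures.
estimates = [(-0.50, 0.20), (-0.10, 0.30), (-0.30, 0.25)]

avg, avg_se = average_effect(estimates)
print(f"Average effect: {avg:.2f} (SE {avg_se:.3f})")
print("Median effect:", median_effect(estimates))
```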
9.3.4 Determine if modification to the planned comparisons or outcomes is necessary, or new comparisons are needed (step 2.4)
The previous steps may reveal the need to modify the planned comparisons. Important variations in the intervention may be identified leading to different or modified intervention groups. Few studies or sparse data, or both, may lead to different groupings of interventions, populations or outcomes. Planning contingencies for anticipated scenarios is likely to lead to less post-hoc decision making (Chapter 2 and Chapter 3); however, it is difficult to plan for all scenarios. In the latter circumstance, the rationale for any post-hoc changes should be reported. This approach was adopted in a review examining the effects of portion, package or tableware size for changing selection and consumption of food, alcohol and tobacco (Hollands et al 2015). After preliminary examination of the outcome data, the review authors changed their planned intervention groups. They judged that intervention groups based on ‘size’ and those based on ‘shape’ of the products were not conceptually comparable, and therefore should form separate comparisons. The authors provided a rationale for the change and noted that it was a post-hoc decision.
9.3.5 Synthesize the characteristics of the studies contributing to each comparison (step 2.5)
A final step, and one that is essential for interpreting combined effects, is to synthesize the characteristics of studies contributing to each comparison. This description should integrate information about key PICO characteristics across studies, and identify any potentially important differences in characteristics that were pre-specified as possible effect modifiers. The synthesis of study characteristics is also needed for GRADE assessments, informing judgements about whether the evidence applies directly to the review question (indirectness) and analyses conducted to examine possible explanations for heterogeneity (inconsistency) (see Chapter 14).
Tabulating study characteristics is generally preferable to lengthy description in the text, since the structure imposed by a table can make it easier and faster for readers to scan and identify patterns in the information presented. Table 9.3.b illustrates one such approach. Tabulating characteristics of studies that contribute to each comparison can also help to improve the transparency of decisions made around grouping of studies, while also ensuring that studies that do not contribute to the combined effect are accounted for.
9.4 Checking data before synthesis
Before embarking on a synthesis, it is important to be confident that the findings from the individual studies have been collated correctly. Therefore, review authors must compare the magnitude and direction of effects reported by studies with how they are to be presented in the review. This is a reasonably straightforward way for authors to check for a number of potential problems, including typographical errors in studies’ reports, accuracy of data collection and manipulation, and data entry into RevMan. For example, the direction of a standardized mean difference may accidentally be wrong in the review. A basic check is to confirm that the qualitative findings (e.g. direction of effect and statistical significance) are the same in the data as presented in the review and in the data as available from the original study.
Results in forest plots should agree with data in the original report (point estimate and confidence interval) if the same effect measure and statistical model is used. There are legitimate reasons for differences, however, including: using a different measure of intervention effect; making different choices between change-from-baseline measures, post-intervention measures alone or post-intervention measures adjusted for baseline values; grouping similar intervention groups; or making adjustments for unit-of-analysis errors in the reports of the primary studies.
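The basic check of qualitative findings lends itself to automation. The sketch below is a minimal illustration, assuming each effect is stored with a 95% confidence interval and that ‘statistical significance’ means the interval excludes the null value; the class `Effect` and function `check_study` are hypothetical names, and the numbers are invented. A flagged difference is a prompt to look at the study report, since it may have one of the legitimate explanations listed above.

```python
from dataclasses import dataclass


@dataclass
class Effect:
    estimate: float
    ci_low: float
    ci_high: float
    null_value: float = 0.0  # 0 for differences; use 1 for ratio measures

    @property
    def direction(self) -> int:
        """-1, 0 or +1 depending on which side of the null the estimate lies."""
        if self.estimate > self.null_value:
            return 1
        if self.estimate < self.null_value:
            return -1
        return 0

    @property
    def significant(self) -> bool:
        """True when the confidence interval excludes the null value."""
        return not (self.ci_low <= self.null_value <= self.ci_high)


def check_study(reported: Effect, entered: Effect) -> list[str]:
    """Compare the effect in the study report with the effect entered in the review."""
    problems = []
    if reported.direction != entered.direction:
        problems.append("direction of effect differs")
    if reported.significant != entered.significant:
        problems.append("statistical significance differs")
    return problems


# Invented example: the sign of a standardized mean difference was flipped at data entry.
reported = Effect(-0.40, -0.75, -0.05)
entered = Effect(0.40, 0.05, 0.75)
print(check_study(reported, entered))
```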
9.5 Types of synthesis
The focus of this chapter has been describing the steps involved in implementing the planned comparisons between intervention groups (stage 2 of the general framework for synthesis (Box 9.2.a)). The next step (stage 3) is often performing a statistical synthesis. Meta-analysis of effect estimates, and its extensions, have many advantages. There are circumstances under which a meta-analysis is not possible, however, and other statistical synthesis methods might be considered, so as to make best use of the available data. Available summary and synthesis methods, along with the questions they address and examples of associated plots, are described in Table 9.5.a. Chapter 10 and Chapter 11 discuss meta-analysis (of effect estimates) methods, while Chapter 12 focuses on the other statistical synthesis methods, along with approaches to tabulating, visually displaying and providing a structured presentation of the findings. An important part of planning the analysis strategy is building in contingencies to use alternative methods when the desired method cannot be used.
Table 9.5.a Overview of available methods for summary and synthesis
| Type | Method | Questions addressed | Example plots |
|---|---|---|---|
| Summary | Text/Tabular | Narrative summary of evidence presented in either text or tabular form | Forest plot (plotting individual study effects without a combined effect estimate) |
| Statistical synthesis methods | Vote counting | Is there any evidence of an effect? | Harvest plot; Effect direction plot |
| Statistical synthesis methods | Combining P values | Is there evidence that there is an effect in at least one study? | Albatross plot |
| Statistical synthesis methods | Summary of effect estimates | What is the range and distribution of observed effects? | Box and whisker plot; Bubble plot |
| Statistical synthesis methods | Pairwise meta-analysis | What is the common intervention effect? (fixed-effect model) What is the average intervention effect? (random-effects model) | Forest plot |
| Statistical synthesis methods | Network meta-analysis | Which intervention of multiple is most effective? | Forest plot; Network diagram; Rankogram plots |
| Statistical synthesis methods | Subgroup analysis/meta-regression | What factors modify the magnitude of the intervention effects? | Forest plot; Box and whisker plot; Bubble plot |
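One way to plan contingencies is to count, for each method in Table 9.5.a, how many studies report the minimum data that method needs. The sketch below is illustrative only. The minimum data requirements in `MINIMUM_DATA` are assumptions made for this example and are not stated in this chapter (Chapter 12 describes the methods in detail); the four studies and the function name are hypothetical.

```python
# Assumed minimum data each method needs from a study (illustrative, see Chapter 12).
MINIMUM_DATA = {
    "meta-analysis": {"effect", "se"},
    "summary of effect estimates": {"effect"},
    "combining P values": {"p_value", "direction"},
    "vote counting": {"direction"},
}

# Hypothetical studies and the statistics each one reports.
available = {
    "A": {"effect", "se", "p_value", "direction"},
    "B": {"effect", "direction"},
    "C": {"p_value", "direction"},
    "D": {"direction"},
}


def studies_per_method(data_by_study):
    """For each method, list the studies reporting at least the minimum data."""
    return {
        method: sorted(study for study, data in data_by_study.items()
                       if required <= data)  # subset test
        for method, required in MINIMUM_DATA.items()
    }


coverage = studies_per_method(available)
for method, studies in coverage.items():
    print(f"{method}: {len(studies)} of {len(available)} studies {studies}")
```

A summary like this shows at a glance how many studies would be lost under each method, which can inform the choice of a fallback when few studies provide an effect estimate with its standard error.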
9.6 Chapter information
Authors: Joanne E McKenzie, Sue E Brennan, Rebecca E Ryan, Hilary J Thomson, Renea V Johnston
Acknowledgements: Sections of this chapter build on Chapter 9 of version 5.1 of the Handbook, with editors Jonathan Deeks, Julian Higgins and Douglas Altman. We are grateful to Julian Higgins, James Thomas and Tianjing Li for commenting helpfully on earlier drafts.
Funding: JM is supported by an NHMRC Career Development Fellowship (1143429). SB and RR’s positions are supported by the NHMRC Cochrane Collaboration Funding Program. HT is funded by the UK Medical Research Council (MC_UU_12017-13 and MC_UU_12017-15) and Scottish Government Chief Scientist Office (SPHSU13 and SPHSU15). RJ’s position is supported by the NHMRC Cochrane Collaboration Funding Program and Cabrini Institute.
9.7 References
Chamberlain C, O’Mara-Eves A, Porter J, Coleman T, Perlen SM, Thomas J, McKenzie JE. Psychosocial interventions for supporting women to stop smoking in pregnancy. Cochrane Database of Systematic Reviews 2017; 2: CD001055.
Hollands GJ, Shemilt I, Marteau TM, Jebb SA, Lewis HB, Wei Y, Higgins JPT, Ogilvie D. Portion, package or tableware size for changing selection and consumption of food, alcohol and tobacco. Cochrane Database of Systematic Reviews 2015; 9: CD011045.
Kroon FPB, van der Burg LRA, Buchbinder R, Osborne RH, Johnston RV, Pitt V. Self-management education programmes for osteoarthritis. Cochrane Database of Systematic Reviews 2014; 1: CD008963.
López-López JA, Page MJ, Lipsey MW, Higgins JPT. Dealing with effect size multiplicity in systematic reviews and meta-analyses. Research Synthesis Methods 2018; 9: 336–351.