Risk of bias versus quality assessment in systematic reviews: a comparison between ROBIS and AMSTAR




Long oral session 8: Methods for overviews


Wednesday 13 September 2017 - 16:00 to 17:30


Authors (in order):

Minozzi S1, Cinquini M2, Capobussi M3, Gonzalez-Lorenzo M3, Pecoraro V4, Banzi R5
1 Cochrane Review Group on Drugs and Alcohol, Department of Epidemiology, Lazio Regional Health Service, Via Cristoforo Colombo, 112, 00147 – Rome, Italy
2 Laboratory of Clinical Research Methodology, IRCCS-Mario Negri Institute for Pharmacological Research, Via G. La Masa 19, 20156 Milan, Italy
3 Department of Biomedical Sciences for Health, University of Milan, Via Pascal 36, 20133 Milan, Italy
4 Department of Laboratory Medicine and Pathological Anatomy, Laboratory of Toxicology. Ospedale Civile S. Agostino Estense, Azienda USL of Modena, Italy
5 Laboratory of Regulatory Policies IRCCS-Mario Negri Institute for Pharmacological Research, Via G. La Masa 19, 20156 Milan, Italy
Presenting author:

Silvia Minozzi

Abstract
Background: Systematic reviews (SRs) are widely used to support the development of clinical guidelines and other documents driving decisions in healthcare. Suboptimal SRs can be harmful, and a reliable assessment of their validity is essential. The AMSTAR checklist is widely used for this purpose, while the ROBIS tool was launched more recently to specifically assess the risk of bias of SRs.

Objectives: To evaluate the inter-rater reliability (IRR) of AMSTAR and ROBIS for individual domains and the overall judgment, their concurrent validity, and the time required to apply each tool.

Methods: Five raters with different levels of expertise assessed 31 SRs on pharmacological thromboprophylaxis using AMSTAR and ROBIS. For each question, domain, and overall risk of bias, we calculated Fleiss' kappa for multiple-rater IRR (for AMSTAR, low risk of bias: eight or more 'yes' answers; high risk of bias: three or fewer 'yes' answers). We assessed the concurrent validity of the two tools by comparing domains addressing similar items (Table). We recorded the time to complete each tool as the mean time spent by each reviewer per review. We classified agreement as: poor (≤0.00), slight (0.01-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80), almost perfect (0.81-1.00).
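The Fleiss' kappa calculation and the agreement bands described above can be sketched as follows (a minimal illustrative implementation, not the study's code; the example count matrix is invented, with five raters classifying each SR into three risk-of-bias categories):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for agreement among multiple raters.

    counts: one row per subject (here, per SR); each row holds how many
    of the n raters assigned each category (e.g. low/unclear/high risk
    of bias). All rows must sum to the same number of raters n.
    """
    N = len(counts)            # number of subjects
    n = sum(counts[0])         # raters per subject
    k = len(counts[0])         # number of categories
    # overall proportion of assignments falling in each category
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # observed agreement for each subject
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N                 # mean observed agreement
    P_e = sum(pj * pj for pj in p)     # chance-expected agreement
    return (P_bar - P_e) / (1 - P_e)

def agreement_band(kappa):
    """Map a kappa value to the Landis-Koch bands used in the abstract."""
    bands = [(0.00, "poor"), (0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

# Invented example: 4 SRs, 5 raters, 3 categories
rows = [[5, 0, 0], [0, 5, 0], [0, 0, 5], [5, 0, 0]]
print(fleiss_kappa(rows))        # perfect agreement -> 1.0
print(agreement_band(0.65))      # the abstract's overall kappa -> "substantial"
```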

Results: The kappa for agreement on individual domains ranged from 0.28 to 1.00 for AMSTAR and from 0.49 to 0.61 for ROBIS; the kappa for overall risk of bias was 0.65 for both tools (Figure). We found fair correlation between AMSTAR and ROBIS in the overall judgment (ρ=0.38), mainly because of discordance in the classification of SRs at intermediate risk of bias. The mean time to complete ROBIS was about twice that of AMSTAR (mean±standard deviation: 12.6±4.6 vs. 5.8±31.9; mean difference: 6.7±3.2). Concurrent validity on single domains will be presented.

Conclusions: We found similar, substantial IRR for both tools in the judgment of overall risk of bias. ROBIS requires more time to complete. The low correlation between AMSTAR and ROBIS may reflect differences in raters' judgments or genuine differences in what the tools aim to measure (methodological quality vs. risk of bias and appropriateness).