Inter-rater agreement and time to complete the new Cochrane Risk-of-Bias tool (RoB 2.0)




Poster session 2 Thursday: Evidence synthesis - methods / improving conduct and reporting


Thursday 14 September 2017 - 12:30 to 14:00


All authors in correct order:

Minozzi S (1), Saulle R (1), Mitrova Z (1), Cinquini M (2)
(1) Department of Epidemiology, Cochrane Review Group on Drugs and Alcohol, Lazio Regional Health Service, Italy
(2) IRCCS-Mario Negri Institute for Pharmacological Research, Italy
Presenting author and contact person

Presenting author:

Silvia Minozzi

Abstract text
Background: The RoB 2.0 tool, a revised tool for assessing risk of bias in randomised controlled trials (RCTs), was piloted during 2016 and officially released at the 2016 Cochrane Colloquium.

Objectives: To assess inter-rater agreement (IRR), the time needed to retrieve trial protocols and the time needed to complete the RoB 2.0 tool.

Methods: We used a convenience sample of 20 individually randomised, parallel-group RCTs included in 2 Cochrane reviews in the drug and alcohol addiction field. Nine studies compared a pharmacological intervention with placebo, and 11 compared a psychosocial intervention with no intervention or usual care.
Two raters with medium and high expertise in risk-of-bias assessment took part. For each relevant outcome, we used Cohen's weighted κ to assess IRR for signaling questions (SQ), individual domain judgments (DJ) and the overall judgment (OJ). We classified agreement as poor (≤0.00), slight (0.01-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80) or almost perfect (0.81-1.00).
Time to complete the tool was calculated as the mean time spent in minutes by each rater for each relevant outcome.
Time to search and acquire the study protocol was calculated as the mean time spent in minutes for each trial.
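
The agreement statistic and the classification bands described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The abstract does not state the weighting scheme, so linear disagreement weights are an assumption here, as are the function names, the ordering of the three judgment levels and the example ratings, which are hypothetical and not data from the study.

```python
from collections import Counter

# Ordered judgment levels used for the illustration (ordering is an assumption).
LEVELS = ["low", "some concerns", "high"]


def weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's weighted kappa with linear disagreement weights (assumed scheme)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint counts and each rater's marginal counts.
    observed = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    marg_a = Counter(index[a] for a in ratings_a)
    marg_b = Counter(index[b] for b in ratings_b)

    def weight(i, j):
        # 0 for identical ratings, 1 for the two most distant categories.
        return abs(i - j) / (k - 1)

    observed_dis = sum(weight(i, j) * c for (i, j), c in observed.items()) / n
    expected_dis = sum(
        weight(i, j) * marg_a[i] * marg_b[j] for i in range(k) for j in range(k)
    ) / (n * n)
    if expected_dis == 0:
        # Both raters used the same single category: kappa is undefined.
        return float("nan")
    return 1 - observed_dis / expected_dis


def classify(kappa):
    """Map a kappa value to the agreement bands listed in the Methods."""
    # Values between 0 and 0.01 are treated as "slight" (assumption; the
    # abstract leaves this gap between bands unspecified).
    if kappa <= 0.00:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings for four outcomes, for illustration only.
rater_1 = ["low", "some concerns", "high", "low"]
rater_2 = ["low", "high", "high", "some concerns"]
kappa = weighted_kappa(rater_1, rater_2, LEVELS)
print(round(kappa, 2), classify(kappa))
```

With linear weights, a disagreement between adjacent levels counts half as much as one between the two extreme levels. A quadratic scheme would penalise distant disagreements more heavily and could give different values.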

Results: Preliminary results of the first 6 outcomes from 4 trials are provided.
Randomisation process: SQ1.1: κ = 0.57, SQ1.2: κ = 0.57, SQ1.3: κ = 0.18, DJ1: κ = 0.08
Deviations from intended interventions: SQ2.1: κ = 0.45, SQ2.2: κ = 0.45, SQ2.3: κ = -0.36, SQ2.4: κ = 0, SQ2.5: κ = 0, DJ2: κ = -0.36
Missing outcome data: SQ3.1: κ = 0.57, SQ3.2: κ = -0.13, SQ3.3: κ = 0.20, DJ3: κ = 1
Measurement of the outcome: SQ4.1: κ = 0.18, SQ4.2: κ = 0.36, DJ4: κ = 0.67
Selection of the reported results: SQ5.1: κ = -1, SQ5.2: κ = -1, DJ5: κ = 0
Overall judgment: κ = 0
Mean time to complete the tool was 34.2 minutes; mean time to search for protocols was 20 minutes.

Conclusions: Preliminary results showed agreement ranging from poor to moderate for signaling questions and from slight to almost perfect for individual domain judgments, and poor agreement for overall judgments.