Many hands make light work: Progress in crowdsourcing and Cochrane
Posted on March 16, 2015 by Robert Kemp
When you have a question that needs answering, do you look at a couple of pieces of artificially selected evidence or do you examine all the available evidence there is? Systematic reviews are the process of searching, collecting and analysing all of the quality evidence on a single question to get a coherent answer. In medical research, the Cochrane Collaboration are held in high regard for producing good quality, authoritative systematic reviews that help doctors and other healthcare professionals improve the decisions they need to make. But what does the future of online networks and big data hold for Cochrane and its rigorous methods of analysis?
The problem of citation screening: heavy work
Medical research is being published at an alarming rate and authors of systematic reviews are increasingly being faced with two competing demands: making them comprehensive enough to be accurate, but also simple enough to be completed in a timely manner. Let’s illustrate this with a hypothetical (worst-case) example:
Consider a single small part of the review process: citation screening. This involves looking at the citations returned from the search and assessing if they are suitable to be included in the review. Large reviews can have over 10,000 citations to be screened, with each citation conventionally being screened by two members of the author team working independently, with a third on hand to arbitrate. Such screeners often represent a cross-section of the review team and can include a member with the relevant clinical background, a specialist epidemiologist, an expert systematic reviewer, and the review group’s trials search co-ordinator.
Doubling up in this way has been shown to be effective in reducing the risk of bias, with a clinical specialist and an epidemiologist, for example, possessing slightly differing criteria for accepting or rejecting citations. However, the trade-off for this methodological rigour is an increase in time spent on the task, even though it may take an experienced reviewer just 30 seconds to 2 minutes to screen a single citation. Assuming a constant screening rate of 1 minute per citation, screening 10,000 citations will take somewhere in the region of 167 hours, or 21 working days. Multiply that by two for screening in duplicate.
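The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function names are made up for this post, and the 8-hour working day is an assumption (it is what makes 167 hours come out at roughly 21 days).

```python
def screening_hours(citations, minutes_per_citation=1.0, screeners_per_citation=1):
    """Total person-hours needed to screen a set of citations."""
    total_minutes = citations * minutes_per_citation * screeners_per_citation
    return total_minutes / 60


def working_days(hours, hours_per_day=8):
    """Convert person-hours to working days (8-hour day is an assumption)."""
    return hours / hours_per_day


single = screening_hours(10_000)
duplicate = screening_hours(10_000, screeners_per_citation=2)

print(f"Single screening: {single:.0f} hours, {working_days(single):.0f} working days")
print(f"In duplicate:     {duplicate:.0f} hours, {working_days(duplicate):.0f} working days")
```

Changing `minutes_per_citation` to 0.5 or 2 gives the best and worst cases for the 30-second to 2-minute range quoted above.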
Since it’s estimated that roughly 75 new trials are published a day, this is 1500+ new trials over the course of those 21 days that could be included in our review that we’ve currently missed, and which could alter the conclusions of the review drastically. This time delay does not take account of any other stages in processing data, analysis or the writing and publication of the review. In fact, many reviews have over a year between the last search date and publication. This can have large consequences, as a review is only accurate if it is performed correctly and also includes all the available data. An out-of-date review can be a bit like a computer built 20 years ago; it may have used the best parts at the time but is no longer considered fit for purpose as it fails to include newer parts, or in the case of reviews, the latest research. There is evidence that this is a problem already. This issue is only going to get worse as more and more research is published and needs to be searched through.
Crowdsourced screening: many hands
In order to solve the problem of dealing with a large amount of evidence and to speed up the review process, a team led by Anna Noel-Storr at the Cochrane Dementia and Cognitive Improvement Group has been looking at utilising crowdsourcing in the review process. Crowdsourcing is effectively asking for help from your friends to complete a project, but instead of your friends, it’s strangers recruited across the Internet. By doing this they hope to speed up the review process but also increase the amount of data and pieces of evidence they can handle.
Systematic reviews provide a good opportunity for crowdsourcing; their systematic nature means people can follow easy predefined rules, and much of the review involves performing ‘micro tasks’ on large data sets, where each individual task (e.g. a single screening of a citation) may not take very long but collectively they mount up. Handling a large workload is less of a problem with a decent crowd, however, since the tasks can be widely distributed.
Within Cochrane, the first efforts at utilising crowdsourcing began with the ALOIS engagement project, designed to assess the feasibility of recruiting carers of people with dementia to read study reports and extract key characteristics. This initial task demonstrated not only that crowdsourcing was achievable but also suggested that crowd performance was comparable to traditional methods. Following this initial success, two projects were set up to assess the wider possibility of using crowdsourcing in systematic reviews. The TrialBlazers and MoCAsub studies looked at using crowdsourcing for screening citations for inclusion in CENTRAL (Cochrane’s central register of clinical trials) or for a diagnostic test accuracy review. The performance of the crowd was compared to the traditional method of screening citations described above. The measures of screening accuracy are sensitivity (the proportion of studies that should be included that are actually included) and specificity (the proportion of studies that should be rejected that are actually rejected), and on these two key metrics the crowd performance was close to the professional standard.
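To make the two metrics concrete, here is a minimal sketch of how sensitivity and specificity could be calculated by comparing crowd decisions against a reference standard. The eight decisions below are invented purely for illustration and are not data from any of the Cochrane projects; the function names are likewise hypothetical.

```python
def confusion_counts(crowd, reference):
    """Count agreements and disagreements; True means 'include'."""
    tp = sum(c and r for c, r in zip(crowd, reference))          # correctly included
    fn = sum((not c) and r for c, r in zip(crowd, reference))    # wrongly rejected
    tn = sum((not c) and (not r) for c, r in zip(crowd, reference))  # correctly rejected
    fp = sum(c and (not r) for c, r in zip(crowd, reference))    # wrongly included
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Share of citations that should be included that were included."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of citations that should be rejected that were rejected."""
    return tn / (tn + fp)


# Made-up example: decisions on eight citations.
reference = [True, True, False, False, False, True, False, False]
crowd = [True, True, False, True, False, False, False, False]

tp, fn, tn, fp = confusion_counts(crowd, reference)
print(f"Sensitivity: {sensitivity(tp, fn):.2f}")
print(f"Specificity: {specificity(tn, fp):.2f}")
```

For a screening task, sensitivity usually matters most: a wrongly included citation gets caught at a later stage, whereas a wrongly rejected one is lost to the review.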
Building on this initial work, current projects utilising crowdsourcing involve the EMBASE screening programme and the Cochrane Dementia and Cognitive Improvement Group’s screening programme. These look to extend the initial TrialBlazers study to screen citations for CENTRAL inclusion or for inclusion in a suite of reviews examining modifiable risk factors for dementia, respectively. Both have been a great success. The EMBASE project has managed to sign up 900 ‘screeners’ who have screened over 110,000 records with an accuracy of 99%. The Dementia and Cognitive Improvement screening project’s first round of screening came to a close on February 11th after screening several thousand reviews. These projects are still ongoing. You can sign up for the EMBASE project here and the second round of citation screening for dementia reviews is opening again in May or June of this year.
The future of crowdsourcing: lighter work?
Crowdsourcing promises to increase Cochrane’s capacity to deal with the increasing amounts of evidence being produced, as well as to perform systematic reviews in a more timely manner (at one minute per citation, a crowd of 500 people would need only 20 to 40 minutes to perform my hypothetical screening task, depending on whether each citation is screened once or twice).
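As a rough check on the crowd estimate, the same back-of-the-envelope sums can be shared out across a crowd. The sketch below assumes the work divides perfectly evenly, everyone screens at one minute per citation, and nobody drops out, none of which will hold exactly in practice; the function name is made up for this post.

```python
def crowd_minutes(citations, crowd_size, minutes_per_citation=1.0, decisions_per_citation=1):
    """Wall-clock minutes if the screening is split evenly across the crowd."""
    total_minutes = citations * minutes_per_citation * decisions_per_citation
    return total_minutes / crowd_size


once = crowd_minutes(10_000, crowd_size=500)
twice = crowd_minutes(10_000, crowd_size=500, decisions_per_citation=2)

print(f"Each citation screened once:  {once:.0f} minutes")
print(f"Each citation screened twice: {twice:.0f} minutes")
```

Either way, the answer lands in the tens of minutes rather than weeks.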
To make crowdsourcing acceptable to the community, effort is being made to demonstrate that the crowd’s performance brings no compromise in terms of accuracy in comparison to traditional methods. The screening tool used for the EMBASE screening project is fundamental to the success of the project. Its built-in ‘crowd algorithm’ (i.e. the number of decisions needed on a citation) has proved robust. Future work will involve continuing to monitor the effectiveness of the algorithm.
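The details of the tool's crowd algorithm are not described here, so the following is only a hypothetical illustration of what an agreement rule of this kind might look like, not the project's actual method. It assumes a citation is settled once three screeners agree (the threshold of three is an assumption), and that any disagreement is referred to an expert, mirroring the third reviewer who arbitrates in the traditional approach.

```python
from collections import Counter


def classify(decisions, required=3):
    """Hypothetical agreement rule for one citation.

    decisions: list of 'include' / 'exclude' votes received so far.
    required:  number of agreeing votes needed (assumed value, not the tool's).
    """
    counts = Counter(decisions)
    if counts["include"] and counts["exclude"]:
        return "refer"      # screeners disagree: pass to an expert
    for label in ("include", "exclude"):
        if counts[label] >= required:
            return label    # enough unanimous votes: decision is final
    return "pending"        # keep showing the citation to more screeners


print(classify(["exclude", "exclude", "exclude"]))
print(classify(["include", "include"]))
print(classify(["include", "exclude"]))
```

Raising `required` trades speed for caution: each citation needs more screeners, but a single careless click is less likely to decide its fate.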
Other features of the tool, such as the highlighted words and phrases, have made a significant impact on the number of citations the crowd have collectively been able to process. More analysis of the effect of the yellow and red highlights is planned, as it is felt that improvements to this feature alone may result in further improvements in accuracy and/or speed of screening.
Crowdsourcing’s continued success is contingent upon the crowd remaining committed. Nurturing the crowd and finding suitable rewards is a continuing challenge. However, if crowdsourcing is proved to be effective, the crowd may be considered suitable for further tasks. Systematic reviews provide a good opportunity for crowdsourcing work as the process potentially involves many ‘micro-tasks’ on large data sets, performed in a systematic manner in accordance with strict rules. Hopefully, if interest can be sustained, this model can allow more reviews to be published on a shorter timescale, leading to better evidence for us all. After all, many hands make light work.
Edwards, Phil et al. ‘Identification Of Randomized Controlled Trials In Systematic Reviews: Accuracy And Reliability Of Screening Records’. Statist. Med. 21.11 (2002): 1635-1640. Web. 4 Mar. 2015. [accessible online at onlinelibrary.wiley.com/doi/10.1002/sim.1190/abstract]
Bastian et al 2010, http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1000326
Sampson et al 2008, http://www.sciencedirect.com/science/article/pii/S089543560800053X
Beller et al 2013, http://www.systematicreviewsjournal.com/content/2/1/36
Shojania et al 2007, http://annals.org/article.aspx?articleid=736284