Crowdsourcing and COVID-19: a case study of Cochrane Crowd

Anna Noel-Storr et al.

Abstract
Cochrane has used crowdsourcing effectively to identify health evidence since 2014. To date, over 175,000 trials have been identified for Cochrane's Central Register of Controlled Trials via Cochrane Crowd (https://crowd.cochrane.org), Cochrane's citizen science platform, engaging a Crowd of over 20,000 people from 166 countries. The COVID-19 pandemic presented the evidence synthesis community with the enormous challenge of keeping up with the exponentially growing output of COVID-19 research. This case study will detail the new tasks we developed to aid the production of COVID-19 rapid reviews and to populate the Cochrane COVID-19 Study Register. The pandemic initially looked set to disrupt the Crowd team's plans for 2020 but has in fact served to further our understanding of the potential role crowdsourcing can play in the health evidence ecosystem.

Introduction
Crowdsourcing in health research has become increasingly popular over the last decade (1). Cochrane, an international network that produces systematic reviews, has been harnessing a type of crowdsourcing called "human intelligence tasking" since 2014 (2,3). Human intelligence tasking involves filtering or classifying large amounts of data or information via an online community. In May 2016, Cochrane launched Cochrane Crowd (https://crowd.cochrane.org), its citizen science platform, with its first crowdsourcing task: the identification of reports of randomised controlled trials (RCTs) from Embase. Other tasks followed soon after, and new tasks continue to be developed and rolled out. Our evaluations of the Crowd's accuracy demonstrated that a crowdsourcing approach to identifying RCTs was both robust and efficient (2). By early 2020, over 20,000 contributors from 166 countries had signed up to Cochrane Crowd and generated over 5 million individual classifications, helping to identify around 175,000 reports of randomised trials. The year 2020 looked set to be busy, but we did not anticipate how large an impact the COVID-19 pandemic would have on Cochrane Crowd. We had launched a new version of the Crowd platform in early March 2020, and work was about to begin on a new PICO (population, intervention, comparison, outcome) extraction task as part of Cochrane's trial surveillance initiative. Initially, the pandemic was hugely disruptive to this planned work, as our efforts were immediately re-focussed on helping with the response. One of the main challenges presented by the pandemic was the corresponding infodemic. According to the World Health Organization: "[A]n infodemic is too much information including false or misleading information in digital and physical environments during a disease outbreak. It causes confusion and risk-taking behaviors that can harm health. It also leads to mistrust in health authorities and undermines the public health response.
An infodemic can intensify or lengthen outbreaks when people are unsure about what they need to do to protect their health and the health of people around them" (4). The dramatic increase in COVID-19 research production and publication throughout 2020 and 2021 has created significant information retrieval challenges, arising both from the sheer volume of research and from the nature of the research output. One example was the so-called "preprint rush," with both demand for, and availability of, preprints soaring during 2020 (5,6). Cochrane was able to adapt existing skills and systems for organising COVID-19 research to assist with review production. Cochrane prioritised resources and developed initiatives to respond to the pandemic, including a programme of work to produce rapid reviews and the production of special collections of existing relevant health evidence on topics such as infection control and prevention measures and remote care through telehealth (7). Another major undertaking within the network was the development of a curated register of COVID-19 studies, the Cochrane COVID-19 Study Register (CCSR) (https://covid-19.cochrane.org) (8). The CCSR is a continuously updated open access repository of COVID-19 human studies that have been identified from a range of sources and tagged by study type, study design and study aim. Related reports about the same study are linked together to create a "study-based" register. The register went live in April 2020, and within twelve months over 57,000 COVID-19 studies had been identified and described. Cochrane Crowd was uniquely placed to help, as our thriving community of contributors was eager to support Cochrane's response to the pandemic.
This case study will detail four main areas of work undertaken by Cochrane Crowd during the first twelve months of the pandemic: 1) COVID Quest, a new Cochrane Crowd task; 2) direct review input and methodological research; 3) weekly screening challenges; 4) a COVID-19 machine learning classifier.

COVID Quest
We developed a new crowdsourced task: COVID Quest. In COVID Quest, the Crowd identify COVID-related studies by assessing title-abstract records (Figure 1). Unlike most Cochrane Crowd tasks, it is a "multi-question" task, made up of a series of questions about the record. COVID Quest also asks contributors to identify a range of study types and study designs, whereas other mainstream Cochrane Crowd tasks relate to the identification or description of randomised controlled trials. This is crucial because, in a pandemic, a range of study types is needed to answer urgent questions regarding treatment, diagnostics, health services, mental health and the larger societal impact. Controlled vocabularies are used for each question within the task. Anyone can join, though completion of a brief training module is mandatory. We launched the task in June 2020 after a rapid development and testing phase, and to date (June 2021) the Crowd have amassed around 60,000 assessments, helping to identify and describe thousands of studies for the CCSR. We have evaluated Crowd accuracy against a gold standard dataset of 2000 records assessed by Cochrane information specialists working on the register. Within this set, 566 records were eligible for the CCSR. The Crowd correctly identified 558 of them as eligible, giving a Crowd sensitivity of 98.5%. The Crowd achieved similarly high sensitivity for the other two components of COVID Quest: 98.2% for study type (whether the study described was an observational, interventional, qualitative, or mathematical modelling study) and 97.6% for the specific study design used (RCT, cohort study/case control, case report, cross-sectional study, etc.). In addition, around 85% of records assessed had matching classifications under our agreement algorithm, with only 15% requiring resolution by an "expert" after discordant classifications between Crowd contributors.
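As a worked check on the sensitivity figure above, the calculation can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for COVID Quest: the function name `sensitivity` and the variable names are hypothetical, and only the two counts (566 eligible records, 558 correctly identified) come from the evaluation described above.

```python
def sensitivity(true_positives: int, total_eligible: int) -> float:
    """Proportion of truly eligible records that were identified as eligible."""
    if total_eligible <= 0:
        raise ValueError("total_eligible must be positive")
    if not 0 <= true_positives <= total_eligible:
        raise ValueError("true_positives must lie between 0 and total_eligible")
    return true_positives / total_eligible


# Counts reported for the CCSR eligibility question in the gold standard set.
eligible_records = 566
identified_by_crowd = 558

crowd_sensitivity = sensitivity(identified_by_crowd, eligible_records)
missed_records = eligible_records - identified_by_crowd

print(f"Sensitivity: {crowd_sensitivity:.1%}; missed records: {missed_records}")
```

The ratio works out at a little under 0.986, close to the 98.5% reported, and corresponds to eight eligible records being missed out of 566.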
COVID Quest forms part of a study identification workflow that is largely based on processes that Cochrane's Centralised Search Service already had in place for identifying studies for the Cochrane Central Register of Controlled Trials (CENTRAL) (9) (Figure 2). Having some of the foundations and technical infrastructure in place facilitated rapid implementation of this end-to-end process.
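Within a workflow of this kind, the agreement step mentioned for COVID Quest might be sketched as follows. The actual agreement algorithm is not described in this case study, so the rule below is an assumption made purely for illustration: a record is treated as agreed once a fixed number of contributors (`required_matches`, a hypothetical parameter) have given identical classifications, and any discordance sends the record to an expert resolver. The function name and status strings are also hypothetical.

```python
from typing import Optional, Sequence, Tuple


def route_record(
    classifications: Sequence[str],
    required_matches: int = 3,  # assumed value, not taken from the case study
) -> Tuple[str, Optional[str]]:
    """Decide what happens to a record given the classifications received so far.

    Returns a (status, label) pair where status is one of:
      "expert_resolution": contributors disagreed, so an expert must decide
      "agreed":            enough identical classifications were received
      "awaiting_more":     no disagreement yet, but too few classifications
    """
    if len(set(classifications)) > 1:
        # Any discordant classification routes the record to an expert.
        return ("expert_resolution", None)
    if len(classifications) >= required_matches:
        return ("agreed", classifications[0])
    return ("awaiting_more", None)


print(route_record(["RCT", "RCT", "RCT"]))
print(route_record(["RCT", "cohort study"]))
print(route_record(["case report"]))
```

Under a rule like this, the share of records ending in "agreed" versus "expert_resolution" is what the 85% and 15% figures above would describe.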

Review input
As already described, Cochrane undertook a programme of COVID-related, rapidly produced reviews. This work presented an opportunity to test the Crowd's ability to identify studies for reviews in a time-sensitive context. Four reviews were used in this methodological work: Quarantine alone or in combination with other public health measures to control COVID-19 (10); Barriers and facilitators to healthcare workers' adherence with infection prevention and control (IPC) guidelines for respiratory infectious diseases (11); Universal screening for Severe Acute Respiratory Syndrome Coronavirus 2 (12); and Convalescent plasma or hyperimmune immunoglobulin for people with COVID-19 (13). We created a corresponding crowdsourced task for each of these reviews in Cochrane Crowd. Crowd contributors were tasked with assessing the search results and making one of two possible classifications on each title-abstract record: Possibly relevant or Not relevant. As with COVID Quest, these new Crowd tasks marked a departure from Crowd tasks focussed on identifying RCTs. This collection of rapidly produced reviews covered a wide range of eligible study types and designs, including mathematical modelling studies, observational studies, interventional studies, and qualitative and mixed study designs. The Crowd had to become familiar with both the topic of each review and the study types eligible for it. They were also given only 48 hours to complete each task. The Crowd performed well, comfortably completing the screening task for three of the four reviews within 48 hours (one review took just over 48 hours to complete). Crowd accuracy levels were high, with recall ranging from 90% to 100% across the four reviews. This methodological work furthered our understanding of crowdsourcing capabilities in topic-based screening tasks under tight time constraints.
The Crowd also contributed directly to the update of the rapid review on quarantine measures, where 65 Crowd contributors screened the 5000 results re…

Weekly screening challenges
In April 2020, we started a series of weekly 3-hour Crowd challenges. Each week we select a task and encourage as many contributors as possible to get online and join in. During the early days of the pandemic, when most of us were in strict lockdown and many were unable to work, this felt like a suitable community engagement activity that also enabled us to keep some of our "business as usual" tasks going. We have now completed over 50 weekly challenges and, in that time, screened approximately 100,000 records, mostly from the RCT Identification task.

COVID-19 machine learning classifier
The final area of Crowd input relates to the development of a machine learning classifier for COVID-19 studies. In July 2020, members of the CCSR team and the COVID EPPI-Centre Map team, based at University College London, set up a series of meetings with the aim of sharing best practice and reducing duplication of effort across the two initiatives. One area of focus was strategies to reduce the screening burden of study identification. The EPPI-Centre Map team had already developed a binary machine learning classifier that reduced screening workload and helped to prioritise screening. Given the differing scope of studies eligible for the CCSR and the EPPI-Centre COVID Map, we decided that a new binary machine learning classifier should be developed specifically for the CCSR workflow. We therefore used high-quality training data generated by both the core Cochrane register team and Cochrane Crowd to train, calibrate and evaluate a COVID-19 study classifier. We followed the same stages of training, calibration and validation as we had for the development of the Cochrane RCT classifier (14). The result is a classifier that helps to accurately identify records that are not eligible for the CCSR. We have been using this classifier since February 2021, reducing screening burden by between 20% and 25%.
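The calibration stage described above can be illustrated with a short sketch. It assumes, hypothetically, that the classifier outputs a score per record and that calibration means choosing the highest score threshold that still retains a target proportion of the known eligible records; records scoring below the threshold would then be excluded from manual screening. The function names, the `target_recall` parameter and the toy data are all illustrative assumptions, and the procedure actually used for the CCSR classifier is not detailed in this case study.

```python
import math
from typing import Sequence


def calibrate_threshold(
    scores: Sequence[float],
    labels: Sequence[bool],
    target_recall: float = 0.99,  # assumed target, not stated in the case study
) -> float:
    """Return the highest threshold that keeps at least target_recall of eligible records."""
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    eligible_scores = sorted(
        (score for score, eligible in zip(scores, labels) if eligible),
        reverse=True,
    )
    if not eligible_scores:
        raise ValueError("calibration set contains no eligible records")
    # Number of eligible records that must score at or above the threshold.
    needed = math.ceil(target_recall * len(eligible_scores))
    return eligible_scores[needed - 1]


def workload_reduction(scores: Sequence[float], threshold: float) -> float:
    """Fraction of records scoring below the threshold, i.e. not sent for screening."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for score in scores if score < threshold) / len(scores)


# Toy calibration set: classifier scores with known eligibility decisions.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05, 0.02]
toy_labels = [True, True, True, True, False, False, False, False, False, False]

toy_threshold = calibrate_threshold(toy_scores, toy_labels, target_recall=1.0)
print(toy_threshold, workload_reduction(toy_scores, toy_threshold))
```

Setting the threshold on a calibration set and then measuring the excluded fraction on separate validation data is one way a figure such as the reduction in screening burden reported above could be estimated.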

Conclusion
COVID-19 presented us with major information retrieval challenges, but also provided important opportunities for research and development on methods, processes, and tools. Our experiences have highlighted the benefit of focussed and collaborative working. Development, testing and full implementation of Cochrane Crowd's most complex task to date took eight weeks instead of the more usual 12-24 months. We were able to use and adapt existing systems (such as the Cochrane Crowd platform), existing processes (such as Cochrane's Centralised Search Service), and expertise across information and data science disciplines. The Cochrane Crowd community itself played an invaluable role in enabling us to keep up, advancing our expectations of crowdsourced capability in evidence synthesis. We are now working on extending the Crowd's role to include PICO extraction for both COVID-19 studies and studies in other health care areas. This will, we hope, significantly improve search precision and support accurate surveillance of the evidence as it emerges. In its early days, the pandemic appeared to be highly disruptive to "business as usual", but in hindsight it has accelerated our work and our understanding of the value of human and machine input in the production of health research. Sharing an overarching mission to help during a global health crisis, organisations at different levels of the evidence ecosystem pulled together to make the emerging evidence base FAIR (findable, accessible, interoperable, and reusable). Duplication of effort still occurred, and enormous challenges remain as the deluge of information around COVID-19 shows little sign of abating, but for the Cochrane Crowd team, the experience and the learning of the last twelve months have been important and lasting.