What settings have been linked to SARS-CoV-2 transmission clusters?

Background: Concern about the health impact of novel coronavirus SARS-CoV-2 has resulted in widespread enforced reductions in people’s movement (“lockdowns”). However, there are increasing concerns about the severe economic and wider societal consequences of these measures. Some countries have begun to lift some of the rules on physical distancing in a stepwise manner, with differences in what these “exit strategies” entail and their timeframes. The aim of this work was to inform such exit strategies by exploring the types of indoor and outdoor settings where transmission of SARS-CoV-2 has been reported to occur and result in clusters of cases. Identifying potential settings that result in transmission clusters allows these to be kept under close surveillance and/or to remain closed as part of strategies that aim to avoid a resurgence in transmission following the lifting of lockdown measures.
Methods: We performed a systematic review of available literature and media reports to find settings reported in peer-reviewed articles and media with these characteristics. These sources are curated and made available in an editable online database.
Results: We found many examples of SARS-CoV-2 clusters linked to a wide range of mostly indoor settings. Few reports came from schools, many from households, and an increasing number were reported in hospitals and elderly care settings across Europe.
Conclusions: We identified possible places that are linked to clusters of COVID-19 cases and could be closely monitored and/or remain closed in the first instance following the progressive removal of lockdown restrictions. However, in part due to the limits in surveillance capacities in many settings, the gathering of information such as cluster sizes and attack rates is limited in several ways: inherent recall bias, biased media reporting and missing data.


Introduction
The novel coronavirus SARS-CoV-2, responsible for coronavirus disease 2019 (COVID-19), was first identified in Wuhan, China at the end of 2019, and has since spread around the world (European Centre for Disease Prevention and Control, 2020). The capacity of the virus for human-to-human transmission, coupled with the lack of immunity in the population due to the novelty of SARS-CoV-2, has led to the implementation of severe reductions in people's movements in an effort to reduce disease impact. These strong measures are broadly described as "lockdowns". Due to the highly restrictive nature of lockdowns, and their impact on people's health, wellbeing and finances, it is likely that such interventions cannot be sustained for prolonged periods of time, and will have to be lifted, at least to some extent, before an effective vaccine becomes available.
To successfully remove these lockdown restrictions while avoiding a resurgence in SARS-CoV-2 transmission, we must better understand in which types of settings the virus is most likely to be transmitted. Determining particular places that are linked to clusters of cases could reveal settings that are responsible for amplifying the heterogeneity in transmission that has been reported: potentially 80% of transmission is being caused by only 10% of infected individuals (Endo et al., 2020). Notably, the difference in transmission risk between households and larger communal settings is unclear, as is the difference between indoor and outdoor transmission.
Quantifying these differences in transmission is facilitated by the fact that, in many countries now under lockdown, intensive contact tracing of imported cases was performed in the early stages of the epidemic, resulting in the detection of clusters of cases. These data, on the first detected clusters in a country, can provide insight into the types of settings that facilitated transmission before intensive social and physical distancing took place.
The aim of our work is therefore to gather information on reported clusters of COVID-19 cases to determine types of settings in which SARS-CoV-2 transmission occurred. This could inform post-lockdown strategies by identifying places which should be kept under close surveillance and/or should still remain closed to avoid a resurgence in transmission.

Outline
We searched for scientific literature and media articles detailing clusters of SARS-CoV-2 transmission (details below) and extracted data into a Google Sheets file (accessible at https://bit.ly/3ar39ky; archived as Underlying data (Leclerc et al., 2020)). We defined "settings" as sites where transmission was recorded resulting in a cluster of cases. We restricted our definition of "cluster" to the first-generation cases that acquired the infection due to transmission in a single specific setting at a specific time. For example, if a person was infected on a cruise ship, and later infected additional people after disembarking, we would not consider that the latter were part of that "cruise ship cluster", since they were not infected on the ship. We recorded the country and further details about the type of setting, the numbers of primary and secondary cases in the cluster, cluster sizes, and attack rates. We defined a case as a person reported to be infected with the SARS-CoV-2 virus, regardless of symptoms.

Search strategy
References were found in four ways. Firstly, we performed a systematic literature review for COVID-19 clusters in PubMed on 30th March 2020 (search term below). A total of 67 papers were found. Two reviewers (GMK and QJL) performed data extraction into the online database. We chose to only search this database and use peer-reviewed articles as a quality threshold. We included data from English abstracts (where possible), but otherwise excluded non-English publications. Secondly, we used the online Google search engine to find media articles detailing settings of SARS-CoV-2 transmission in general. We searched for combinations of either "COVID", "COVID-19", "COVID-2019", "severe acute respiratory syndrome coronavirus 2", "2019-nCoV", "SARS-CoV-2", "2019nCoV" or "coronavirus", and the words "transmission cluster" (e.g. "COVID transmission cluster" or "SARS-CoV-2 transmission cluster"). We only included online articles in English. From the collated list of settings, we then performed a further search for transmission in each of these settings (week beginning 6th April 2020).
Thirdly, we investigated whether information on the settings in which the first 100 "transmission events" in countries with current COVID-19 outbreaks existed by searching for publicly available data sources. As substantial investigation of cases often occurs early in an outbreak, any clusters linked to the first ~100 cases in countries outside China could give information on the transmission of SARS-CoV-2 in the absence of any social distancing measures.
Finally, following the original publication of this article on 01/05/2020, we included a "Suggested updates" tab in our publicly available database (https://bit.ly/3ar39ky). This allows other individuals to suggest new clusters we should include in our analysis. We review these suggestions regularly, and add those with sufficient detail to our "Latest updated results" tab. In this revised version, we have updated our analysis to include suggestions we reviewed up to 26/05/2020.

Amendments from Version 1
This article has been updated in response to reviewer comments, and to include 49 new transmission events which have been added to our online database. We now discuss a total of 201 transmission events (previously 152), classified into 22 setting types (previously 18).
Any further responses from the reviewers can be found at the end of the article.

Cluster characteristics and setting definition
With the above data, we then aimed to estimate both the final (proportion of people in that setting who became infected) and secondary (proportion of contacts of one case who became infected) attack rates in each setting. These were previously identified as key metrics, particularly within households, to estimate whether transmission is driven by a relatively small number of high-risk contacts (Liu et al., 2020).
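As a minimal illustration of the two metrics defined above, the following Python sketch computes a final and a secondary attack rate for a single cluster. The function names and the example household are hypothetical and are not taken from the database; the sketch also shows that neither rate can be derived when the denominator (setting size or number of contacts) is not reported.

```python
from typing import Optional


def final_attack_rate(total_cases: int, people_in_setting: Optional[int]) -> Optional[float]:
    """Proportion of all people in the setting who became infected.

    Returns None when the setting size is missing or zero.
    """
    if not people_in_setting:
        return None
    return total_cases / people_in_setting


def secondary_attack_rate(secondary_cases: int, contacts_of_index: Optional[int]) -> Optional[float]:
    """Proportion of the contacts of one case who became infected.

    Returns None when the number of contacts is missing or zero.
    """
    if not contacts_of_index:
        return None
    return secondary_cases / contacts_of_index


# Hypothetical household of 5 people: 1 index case who infects 2 of the 4 others.
household_far = final_attack_rate(total_cases=3, people_in_setting=5)
household_sar = secondary_attack_rate(secondary_cases=2, contacts_of_index=4)

# Hypothetical report that gives the case count but not the household size.
unknown_far = final_attack_rate(total_cases=3, people_in_setting=None)
```

In this hypothetical household the final attack rate is 3/5 and the secondary attack rate is 2/4, while the report without a household size yields no estimate at all.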
We defined a setting when several reports mentioned clusters linked to spaces with certain characteristics. For example, "Religious" includes churches and mosques, while "Public" here means public communal shared spaces such as markets or welfare centres. Where settings were a mixture of indoor and outdoor spaces, we used a mixed indoor/outdoor classification.

Results
We found evidence of SARS-CoV-2 transmission clusters for 201 events, which we classified into 22 types of settings (Table 1 and Table 2). All the studies with relevant data are compiled in an online database (accessible at https://bit.ly/3ar39ky; see also Underlying data (Leclerc et al., 2020)). Many of the published reports with setting-specific data came from China (47/201) and Singapore (51/201).
The vast majority of these clusters were associated with indoor or indoor/outdoor settings (21/22). Large clusters, such as those linked to churches and ships, were infrequently reported. Almost all clusters involved fewer than 100 cases (181/201), with the outliers being transmission in hospitals, elderly care, worker dormitories, food processing plants, prisons, schools, shopping and ship settings. Religious venues provided a further setting with large cluster sizes: there were separate clusters in South Korea, France, India and Malaysia (Ananthalakshmi & Sipalan, 2020; BBC, 2020; Salaün, 2020; Shin et al., 2020). In addition to these settings with maximum cluster sizes of more than 100 cases per cluster, we identified five further settings with maximum cluster sizes between 50 and 100: sport (65 cases) (Korean Centre for Disease Control & Prevention, 2020), bar (80 cases) (Sim, 2020), wedding (98 cases) (Ministry of Health - New Zealand, 2020), work (97 cases) (Park et al., 2020) and conference (89 cases) (Marcelo & O'Brien, 2020).
We found a notably high number of transmission events reported in worker dormitories (21/201), although all of these were from Singapore. This type of setting had the second highest total cluster size out of all the recorded events we found, with 797 cases reported in the S11 dormitory cluster in Singapore (Data Against COVID19 SG, 2020).
We found only a small number of clusters linked to schools (8/201), and there the SARS-CoV-2 cases reported were most often in teachers or other staff. For example, for two school clusters in Singapore (Ministry of Health - Singapore, 2020), 16/26 and 7/8 cases were staff. Some children were also found to be infected in these clusters, as was the case in the Salanter Akiba Riverdale school in New York, USA (Ailworth & Berzon, 2020), although testing for infection was not always universal. However, in a retrospective closed cohort study in a French high school, 133 children and staff were seropositive for anti-SARS-CoV-2 antibodies, 92 of whom were pupils (Fontanet et al., 2020).
We identified 9 clusters linked to food processing plants in 4 different countries (USA, Germany, Canada, Netherlands). These transmission events have led to large clusters, such as in a meat processing plant in South Dakota where a total of 518 employees were infected by SARS-CoV-2 (Cannon, 2020).
The setting with the greatest number of reported clusters of SARS-CoV-2 transmission was households (36/201). Again, most were from China (25/36) with all cluster sizes being less than 10. However, for 27 out of 36 studies, we were unable to calculate either the secondary or final attack rates due to a lack of information on total household size.
We aimed to estimate secondary and final attack rates in other settings but, as for households, we found that there were substantial missing data. In particular, the number of individuals in a setting was missing, and so we were unable to perform this analysis. Where attack rates could be estimated for individual clusters, these are reported in the online database.
Although information on the index and early cases in a setting was often reported, further information on the subsequently reported 10-100 cases in a country was difficult to extract. Moreover, the index cases were often quarantined and hence not linked to further transmission in most settings.

Discussion
In this review of SARS-CoV-2 transmission events, we found that clusters of cases were reported in many, predominantly indoor, settings. Note that we restrict cluster size to only include individuals infected within a specific setting, and exclude secondary infections which occurred outside the settings. Most clusters involved fewer than 100 cases, with the exceptions being in healthcare (hospitals and elderly care), large religious gatherings, food processing plants, schools, shopping, and large co-habiting settings (worker dormitories, prisons and ships). Other settings with examples of clusters of between 50 and 100 cases were weddings, sport, bars, work and conferences. The majority of our reports are from China and Singapore.

Limitations
The settings collated here are biased due to the nature of our general search for SARS-CoV-2 transmission described above. Although based on a systematic review of published peer-reviewed literature, many of the reports included came from media articles where relevant epidemiological quantities were not always reported, resulting in many missing data. Many of the more detailed studies originated from the early outbreak in China, especially those providing household information. The settings we identified here therefore might not be representative of settings from a global perspective. Bias is present when relying on media coverage: a cluster is more likely to be reported if controversial or if there is an interesting social narrative. This is then compounded by the method search engines use to provide results where priority is given to high traffic stories. Overall, this can lead to some settings being overly represented in our database, which is why the numbers of clusters per setting should be compared cautiously.

Table 1. Summary of gathered reported events as of 20th April 2020. Where only one study for this setting is reported, the minimum, maximum and median number of secondary cases in the cluster and/or total cluster size correspond to this single reported number (if given). Total cluster size accounts for all primary and secondary cases in the cluster. For references see the online database, accessible at https://bit.ly/3ar39ky.

Transmission setting | Definition
Bar | Indoor space such as a bar, club, pub, small live music venue, etc.
Building site | Outdoor space where construction work takes place.
Conference | Indoor professional event with many people interacting and meeting, shaking hands, eating together, team activities, etc.
Elderly care | Care homes for the elderly; includes staff and residents. Transmission can occur between staff and residents but also from visitors.
Food processing plant | Any establishment that processes food for human consumption, such as a meat or vegetable packing plant.
Funeral | Indoor or outdoor burial ceremony; includes close contact with others such as hugging, shaking hands, eating together, singing, praying, etc.
Hospital | Any transmission that occurs within a hospital between patients and/or staff, in a COVID-19 ward or not.
Hotel | Any transmission that occurs within the hotel e.g. hotel rooms, shared spaces, reception desk, etc.
Household | Transmission between individuals in a shared living space.
Meal | When people eat together. Meals included took place in restaurants, hotels, cafes, home, etc. Transmission occurs over a meal by speaking, sharing foods, touching the same surfaces, etc.
Prison | Any transmission that occurs within a prison between prisoners and/or staff.
Public | Where transmission occurs on public property and does not fall into any of the other settings e.g. park, welfare centre, foodbank, etc.
Religious | Transmission occurs at a religious event such as at mass, services, prayer time, choir practice, etc.
School | Childcare or learning environments (schools, nurseries, kindergartens, etc.). Includes staff and children.
Ship | Any ship at sea. Includes crew and/or passengers onboard.
Shipyard | Large indoor or outdoor space where ships are made or repaired. Includes those working on the ship as well as customers.
Shopping | A shop or shopping centre. Includes customers and those working in the shop.
Sport | Participation in a sporting activity indoor or outdoor e.g. gym or running.
Transport | Any means of public transportation, such as bus, plane, metro, etc.
Wedding | Indoor or outdoor wedding celebration.
Work | In the workplace, typically an office.
Worker dormitories | A shared living space for workers.

Similarly, there is a bias in our reports which means that attendance in settings with many individuals is more likely to be linked to a cluster: recall bias (Spencer et al., 2017). The accuracy of memories is influenced by subsequent events and experiences such that special, one-off events may be more likely to be remembered and potentially reported. If multiple single transmission events had occurred whilst walking in a park, for example, these would be less likely to be remembered, and more difficult to detect and hence record. Networks of close contacts also tend to be small, resulting in multiple opportunities for transmission, and hence potentially increase the importance of households or workplace for transmission instead of single outstanding settings of potential transmission. Hence, we cannot determine with any reliability the relative importance of the reported different types of settings beyond the record that clusters have been linked to such places.
Other events, such as large music concerts (Dalling, 2020) and political (Jones, 2020) and sporting (Hope, 2020; Roan, 2020; Wood & Carroll, 2020) gatherings, could potentially have been linked to clusters of COVID-19. However, in the absence of rigorous surveillance systems and widespread testing that would allow countries to link and report the transmissions of such events, such connections remain speculative. An example of this lack of surveillance is the UK, where only 4/201 clusters have been recorded. The exception is Singapore, which appears to investigate clusters systematically and provides a well-designed online dashboard with details of all clusters detected (Data Against COVID19 SG, 2020).
In many settings, only symptomatic cases of disease severe enough to require hospitalization are tested and ultimately reported. This misses those infections that are mildly symptomatic or asymptomatic, although there is mounting evidence that a significant proportion of infections remain asymptomatic (Gudbjartsson et al., 2020; He et al., 2020; Lavezzo et al., 2020). For some of the clusters, primarily households, all contacts were tested for infection; but for most of the data collated here, the number of symptomatic COVID-19 cases was the only information provided. These reported cases are a subset of all infections and in the absence of more comprehensive data, such as could be collated through widespread cluster investigation and community testing, we cannot conclude anything about clusters of infections, nor that we have included all relevant settings in which transmission can occur. We were also unable to estimate attack rates from the available data, meaning that comparison between rates of transmission in settings is impossible to achieve.
Settings associated with large cluster sizes
One type of setting that was associated with large numbers of eventual cases was religious venues. The common features of these meetings are the large number of attendees, confined spaces and physical contact. For example, there were eventually more than 5000 COVID-19 cases linked to transmission at the Shincheonji Church of Jesus in South Korea (Shin et al., 2020). In this particular religious venue, no preventative action was taken despite knowing members were infected with SARS-CoV-2. In other venues, transmission events took place without prior knowledge of any infections and before the WHO declared pandemic status. Other large clusters in this setting type were associated with annual religious events that took place over a few days or weeks (Ananthalakshmi & Sipalan, 2020; BBC, 2020; Salaün, 2020). Attendees returned to their home countries where they continued to transmit. This generated many secondary cases internationally as well as locally. However, it is clear from smaller "first-generation" clusters, which our analysis focuses on, that these settings provide ideal conditions for transmission: we found 7/16 identified religious clusters had 10 cases or fewer, whilst 9/16 had 23 or more (see online database https://bit.ly/3ar39ky and Underlying data (Leclerc et al., 2020) for more information). The number of cases in each cluster is an approximation, and little is known about the number of index cases in these religious meetings to begin with, with the exception of the South Korea cluster. Religious events are well known sources of heightened transmission; there is a focus on vaccination recommendations for attendees to the annual Hajj pilgrimage for example, which is currently being postponed for 2020 (Aljazeera, 2020).
Worker dormitories have been recognised as key places linked to transmission in Singapore, with 893 out of 942 new cases recorded on April 18th being residents in such dormitories (Asia, 2020). We found 21 reported clusters, one of which had the second largest cluster size of all the events we report here: 797 cases, which we believe from the data to be a first-generation cluster. Worker dormitories are similar to households (Dalling, 2020) in the sense that they are places where people live together and come in frequent close contact; however, the number of residents in dormitories is higher than in most households. This probably contributes to the higher cluster sizes seen in this setting. Additionally, hygiene facilities can be limited in worker dormitories (Paul et al., 2020), which could also explain the higher transmission. These points also apply to prisons, another type of large co-habiting setting for which we have identified 4 clusters with a maximum cluster size of 353 cases. It would be beneficial to compare attack rates across households, worker dormitories and prisons, to better understand which factors influence the risk of transmission between people who share a living space. Unfortunately, we were unable to identify the total number of residents in these dormitories and prisons, which prevented us from deriving attack rates and making this comparison.
In addition to religious events and worker dormitories, we also identified clusters of more than 100 cases in elderly care homes, hospitals and ships. These are all known to be at risk of clusters of infectious disease (Blanco et al., 2019; Kak, 2015; Lansbury et al., 2017). Moreover, people in these settings are often older than the general population and hence at greater risk of severe forms of COVID-19 disease (U.S. Centers for Disease Control and Prevention, 2020). The increased mortality and likely dependence on availability of personal protective equipment (PPE) mean that healthcare clusters are more politically sensitive and hence more likely to be reported.
A more unexpected setting type is perhaps food processing plants, in which we identified clusters of up to 518 cases (Cannon, 2020). These plants have been the source of clusters in multiple countries. It is possible that the cold atmosphere in this setting has facilitated the spread of the virus (Molteni, 2020). Other possible explanations include the close proximity of workers for prolonged periods and shared welfare spaces, as well as the need to speak loudly to communicate over the noise of the machines, which could lead to an increased projection of viral particles. Another explanation is that we may not be seeing clusters from other manufacturing settings with similar working environments, as fewer have been in operation due to lockdown guidelines during the pandemic, whereas food production has continued.
We identified seven additional setting types with cluster sizes above 50 or 100 cases (school, sport, bar, shopping, wedding, work and conference), which shared characteristics with the settings described above (see online database for more information https://bit.ly/3ar39ky and Underlying data (Leclerc et al., 2020)). Notably, sport, bars, shopping areas and conferences are predominantly indoor settings, where people are in close proximity. For conferences and work, like religious events, transmission within the cluster is facilitated by the duration of the events over several days, as well as the combination of interactions there (workshops, dinners, etc.). This can also apply to weddings, where transmission is further increased due to the close-proximity interactions between people (kissing, hugging, dancing, etc.). As for bars and shopping areas, these are places with large flows of people, which increases the diversity of contacts. Finally, schools, like religious groups, can sometimes represent tightly knit communities, which facilitates disease transmission amongst individuals, as was the case with the Salanter Akiba Riverdale school in New York, with a cluster size of at least 60 cases (Ailworth & Berzon, 2020).

The first 100 transmission events & under reporting
The pursuit of the first 100 transmission events revealed little on settings of transmission. This reflects the wider issue we found of under reporting and is likely to reflect the fact that many public health surveillance systems were quickly overwhelmed and could not continue outbreak investigations. An example of this is the UK, where only limited information on case follow-up and cluster investigation appears to be available. The impact of such under reporting is that we cannot say with certainty what contribution each setting had to overall transmission: we do not have the denominator information on time and contact in all settings. Nor do we have universal screening for detection of all infections, many of which will be asymptomatic. The importance of such universal testing for infection in interpreting whether transmission has occurred in a setting is highlighted by the difference between the low number of clusters linked to schools and the high level of infection reported in one French high school study (Fontanet et al., 2020).
Further work could pursue data from early investigation of cases where available, to explore the relative importance of different settings to transmission. Importantly, this may counter a bias towards small cluster sizes: with a lack of follow-up only some of the cases actually linked to a setting may be reported and linked. Detailed outbreak investigations should also be explored to get information on the places where transmission is unlikely to have occurred, e.g. if a COVID-19 patient reports 30 contacts at place "A", "B" and "C", but only contacts in "C" subsequently become infected this reflects reduced risk in settings "A" and "B".
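The place "A", "B" and "C" example above can be sketched in Python as a per-setting comparison of the proportion of contacts who became infected. This is only an illustration under stated assumptions: the function name, the even split of the 30 contacts across the three places, and the number of infected contacts in "C" are hypothetical and do not come from any outbreak investigation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def attack_rate_by_setting(contacts: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Proportion of one case's contacts who became infected, per setting.

    `contacts` holds one (setting, became_infected) pair per traced contact.
    """
    totals = defaultdict(int)
    infected = defaultdict(int)
    for setting, became_infected in contacts:
        totals[setting] += 1
        infected[setting] += int(became_infected)
    return {setting: infected[setting] / totals[setting] for setting in totals}


# Hypothetical tracing record: 30 contacts, 10 in each place,
# with infections observed only among the contacts in place "C".
contacts = (
    [("A", False)] * 10
    + [("B", False)] * 10
    + [("C", True)] * 4
    + [("C", False)] * 6
)

rates = attack_rate_by_setting(contacts)
```

With these made-up numbers, places "A" and "B" have a rate of zero and place "C" a rate of 4/10. Such a comparison is only possible when all contacts, including those who were not infected, are recorded for each setting.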

Implications for further work
We found that many clusters of cases were linked to indoor settings, but this may be because early spread in China was during their winter, with people naturally spending more time inside closed spaces. Increasing evidence suggests that transmission of SARS-CoV-2 can occur via airborne droplets (Morawska & Cao, 2020); however, it is likely that outdoor transmission risk is lower (Nishiura et al., 2020). Further work is needed to clarify this. We found only a few clusters in school settings. However, there were many clusters associated with household transmission, and children could be the entry point for the virus into this setting.
It should be noted in this context, however, that the Report of the WHO-China Joint Mission on Coronavirus Disease 2019 (COVID-19) did not find a single instance where people recalled transmission from a child to an adult (WHO-China Joint Mission Members, 2020). More generally, the role of children in widespread transmission of the virus is unclear, and whether reopening schools could trigger increased introductions of the virus into households and further within-household spread will have to be carefully monitored.
Further investigation of settings that facilitate clusters of transmission could provide important information for containment strategies as countries lift some of the current restrictions. Previous work has suggested that there might be considerable heterogeneity in individual transmission, which would imply a disproportionate impact from preventing large transmission events from occurring (Endo et al., 2020). Whilst widespread contact tracing is often considered part of future containment strategies, there is a need for this to be complemented with retrospective investigation of clusters in order to better understand the extent to which certain settings and behaviours are at particular risk of generating clusters of transmission. This could, in turn, inform contact tracing efforts and might be particularly relevant in the context of contact tracing using mobile phone apps, which has recently been suggested in support of more traditional contact tracing (Ferretti et al., 2020). For example, past co-location in certain settings could be a trigger for notification of risk from an app instead of, or in addition to, individual contacts.

Online database of collected reports
The online database (accessible at https://bit.ly/3ar39ky) provides information on all collected reports, references and information on cluster sizes as well as notes about the study. This database will be kept as a static source linked to this report, but with an additional tab for newly reported settings. Readers can submit information in the "Suggested updates" tab and we will aim to update information if evidence for substantial new clusters is found linked to a setting that was not in this study.

Conclusions
In conclusion, we found evidence of SARS-CoV-2 transmission in many types of settings. Our results provide a basis to identify possible places that are linked to clusters of cases and could be closely monitored, for example by linking to app-based contact tracing, and/or remain closed in the first instance following the progressive removal of lockdown restrictions. However, reporting should be improved in the majority of settings, with implementation of systematic reporting on the number of potentially exposed individuals and the number of confirmed and suspected cases from these settings, to allow the estimation of attack rates.

This project contains 'COVID-19 settings of transmissiondatabase.xlsx', which contains the data extracted from the initial search, as well as an updated version of the dataset from 26/05/2020.

Up to date information on all collected reports is provided in an open-access online database (accessible at https://bit.ly/3ar39ky). This database provides references and information on cluster sizes as well as notes about the studies.
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Open Peer Review Current Peer Review Status:
Version 2 30 June 2020 Reviewer Report https://doi.org/10.21956/wellcomeopenres.17583.r39003 © 2020 Scarpino S. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Samuel V. Scarpino
Network Science Institute, Northeastern University, Boston, MA, USA
In this manuscript, the authors conduct a thorough literature review and identify SARS-CoV-2 transmission clusters. After assembling their data set, the authors discuss the possible similarities in settings associated with transmission. As stated, understanding how transmission risk varies across settings is critical for the safe relaxation of measures implemented to control the spread of COVID-19. This paper provides a valuable resource and synthesis of what is currently known. I should note that this article has already been evaluated and I believe the authors have adequately addressed the points raised by the previous reviewer. However, I do have a few additional comments/questions, which I hope the authors find constructive.
While Google Sheets is a convenient tool for entering and sharing small data sets, it is not "permanent" and also has the potential to be corrupted or heavily modified. There is also no easy way for authors to cite the "version" of the sheet used. The authors do provide a Figshare, but that appears to predate the revised version. I would strongly suggest regularly archiving a version of the data set and assigning each update a version number. At a minimum, please provide a DOI for the revised data set.
I am concerned that one reason we don't see more evidence for transmission at schools is that schools were closed early in nearly all locations. To my knowledge, Sweden is not reporting data on whether there has been significant transmission in its schools (not all of which, as the authors know, are open). I believe the authors should provide a strong disclaimer, either in the abstract or early in the discussion, that we really don't have much to go on w.r.t. schools. (Of course this is my opinion and likely subject to debate.)
4. The authors note that they "use peer reviewed articles as a quality threshold," and, while I strongly disagree with the exclusion of pre-prints, I think the authors should at least provide some information on how many studies or clusters were excluded. Given the long (and increasing) lag between pre-print and publication, is this study missing half of all clusters that are currently published or in review? 10%? 95%? Providing information about what has been excluded is standard practice for such reviews and feels critical in this case.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Partly

Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Not applicable

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes

Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Epidemiology of infectious diseases
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Thank you for taking the time to review our article. Please note that we have now updated our analysis to include an additional 49 transmission events (201 events total) and 4 new setting types ("Food processing plant", "Prison", "Transport" and "Wedding"; 22 setting types total). Some of these new elements overlap with your suggestions. Our Discussion section has also been updated to reflect these new results.

I have several major concerns:
Most settings reported herein are not representative of settings from a global perspective; most are from the initial epidemic in Asia (mainly from the Singapore dashboard, and <20% of settings in the manuscript are outside of Asia). This needs to be added to the discussion as a major limitation.

Thank you for raising this point. We already mentioned in the Discussion - Limitations section that many studies originated from the early outbreak in China, but have included an additional sentence there to clarify that this could prevent our results from being directly applicable to other countries. That said, please note that in our updated analysis, 98/201 (50%) events are from China and Singapore, compared to 92/152 (60%) in our original analysis, which improves the coverage of our results. The added sentence is "The settings we identified here therefore might not be representative of settings from a global perspective."

Some important and widely reported outbreaks in particular settings are missing.

and outbreak when searching media reports.
Thank you for suggesting these additional clusters; we have now added the Ruby Princess and the French ski resort events. Our initial analysis was focused on trying to find distinct settings in which transmission had occurred. Hence we were initially trying to prioritise examples of new settings linked to clusters rather than gathering all data on all outbreaks linked to all settings. This has changed somewhat with the open source database and we are happy to act as a gathering point for cluster data. The outbreak in Mulhouse falls into the category of events that we do not include in our analysis. This is because we are interested in understanding transmission only within specific settings; for example, for a cruise ship, the cluster size we report corresponds to the number of people infected on that ship only, not the people that these might have infected after disembarking. If we included people infected by passengers after disembarking, this would not reflect the "cruise ship" setting, as this additional transmission could occur in a variety of other settings (household, meal etc…). We had already highlighted this in the Methods - Outline section, but have now repeated that point at the beginning of the Discussion to hopefully make this distinction clearer ("Note that we restrict cluster size to only include individuals infected within a specific setting, and exclude secondary infections which occurred outside the settings.")

Given that this manuscript is from a team in the UK, it is surprising that only 4 outbreak settings were reported for the UK. The authors need to discuss why they were not able to find more reports from the local and national media outlets in English-speaking countries like the UK, Ireland, and possibly also Australia, Canada and the US.

Our initial search was at the end of March. At that time, the number of confirmed cases in the UK was around 20,000, compared to more than 200,000 now. Therefore, there was little information at the time on clusters in these countries compared with Asia, which is why we were less likely to find media reports on that topic for the UK. For similar reasons, we had little information for English-speaking countries. In addition, because of the lack of widespread testing in the UK and/or follow-up of cases, information on clusters does not appear to be widely available in the UK. As of 26/05/2020, we have now identified 39 transmission events in English-speaking countries (19% of all the transmission events we have identified so far). Therefore, our updated analysis is more geographically balanced.
The authors should discuss reasons for under-reporting: public health surveillance systems in many countries were quickly overwhelmed and unable to investigate transmission settings and chains of transmission. Transmission clusters in elderly care homes and hospitals due to political sensitivity, linked to increased mortality, lack of adequate PPE equipment.

Thank you for this suggestion. In line with your comments on the "first 100 transmission events", we have adapted the paragraph in the discussion to discuss reasons for under-reporting. We have also added a sentence to the paragraph on healthcare clusters in the discussion to reflect the likely increased reporting of clusters linked to these settings due to political sensitivity.
Meat factories and slaughterhouses have recently emerged as a high-risk setting in the US (https://edition.cnn.com/2020/04/08/business/meat-plant-closures-coronavirus/index.html) and Germany (https://www.dw.com/en/coronavirus-breaks-out-in-third-german-slaughterhouse/a-53389860). This setting should be included separately in Table 1.

Thank you for raising this point. Our online database had been updated to reflect this, and we have now added the "Food processing plant" setting type in our analysis, and comment on this in the Results and Discussion sections of our article. This also applies to our new "Prison", "Transport" and "Wedding" setting types.
1. Add wet/cattle markets to the transmission settings list.
2. Split food processing plant into two fractions: meat and non-meat.

Thank you to all participants/contributors to your paper. It is most creditable and worthwhile, and I believe it will prove a most valuable line of research.

This is an important topic. I am concerned about your search. I may have missed it, but I think that, having done this scoping exercise, you should rerun your searches with specific terms (and synonyms) for the settings of interest: schools, churches, weddings, meatworks (lots of synonyms) etc. I am guessing that you will get a lot more hits. I don't think that 'transmission cluster' is a sufficiently sensitive term. I'd also like to see a PRISMA flow diagram.

Version 1
Reader Comment 21 May 2020
María Margarita Ronderos Torres, Independent Consultant in Epidemiology, Colombia

I would like to draw your attention to the football match between Atalanta from Bergamo and Valencia from Spain on the 19th Feb at the San Siro Stadium in Milan. Approximately 40,000 fans from the region attended the match. 35% of the Valencia team delegation tested positive for COVID-19 on returning to Spain. The region only went into lockdown on the 4th of March. This gave ample time (1.5 to 2 incubation periods) for household transmission, with high intergenerational mixing and a known large elderly population. Further study is needed, but this could very well explain the explosion of cases that followed and is in line with your proposed explanation for super-spreading of the virus.

Competing Interests: No competing interests