Healthcare Workers Bioresource: Study outline and baseline characteristics of a prospective healthcare worker cohort to study immune protection and pathogenesis in COVID-19

Background: Most biomedical research has focused on sampling COVID-19 patients presenting to hospital with advanced disease, with less focus on the asymptomatic or paucisymptomatic. We established a bioresource with serial sampling of health care workers (HCWs), designed to obtain samples before and during mainly mild disease, with follow-up sampling to evaluate the quality and duration of immune memory.

Methods: We conducted a prospective observational study on HCWs from three hospital sites in London. Recruitment began at a single centre (just prior to the first peak of community transmission in London) and was extended to multiple sites 3 weeks later (recruitment still ongoing, target n=1,000). Asymptomatic participants attending work complete a health questionnaire and provide a nasal swab (for SARS-CoV-2 RNA by RT-PCR) and blood samples (mononuclear cells, serum, plasma, RNA and DNA are biobanked) at 16 weekly study visits, and at 6 and 12 months.

Results: Preliminary baseline results for the first 731 HCWs (400 single-centre, 331 multicentre extension) are presented. Mean age was 38±11 years; 67% were female, 31% were nurses, 20% were doctors, and 19% worked in intensive care units. COVID-19-associated risk factors were: 37% black, Asian or minority ethnicities; 18% smokers; 13% obesity; 11% asthma; 7% hypertension; and 2% diabetes mellitus. At baseline, 34% of participants (41% of the initial cohort) reported symptoms in the preceding 2 weeks. Preliminary test results are available for the initial cohort (n=400): baseline PCR for SARS-CoV-2 was positive in 28 of 396 (7.1%, 95% CI 4.9-10.0%), and 15 of 385 (3.9%, 95% CI 2.4-6.3%) had circulating IgG antibodies.

Conclusions: This COVID-19 bioresource, established just before the peak of infections in the UK, will provide longitudinal assessments of incident infection and immune responses in HCWs through the natural time course of disease and convalescence. The samples and data from this bioresource are available to academic collaborators by application (https://covid-consortium.com/application-for-samples/).


Introduction
The global pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to more than 6 million infections and 300,000 deaths worldwide at the time of writing 1 . Healthcare workers (HCWs) may be at greater risk of infection than the general population 2-5 . Many infections are asymptomatic 6 ; therefore, surveillance of symptomatic coronavirus disease 2019 (COVID-19) underestimates the infection burden. This has led to calls for regular surveillance of asymptomatic HCWs 7-10 , to ensure that healthcare facilities do not become transmission hot-spots, to protect the workforce and vulnerable patients, and to prevent community reseeding.
Most SARS-CoV-2 studies have focused on severe, hospitalised COVID-19 cases 11-14 . Data are lacking on the host response and biology of asymptomatic or paucisymptomatic infection, as well as on the early (pre-hospitalisation) stages of disease. This gap undermines efforts to understand the determinants of disease severity.
We sought to provide a resource to address these gaps by establishing an observational cohort of HCWs who are well and attending work across selected central London hospitals. We aimed to characterize and quantify the rates of HCW infection (particularly mild or asymptomatic) over the first London COVID-19 pandemic wave, with comprehensive longitudinal sampling at moderate frequency before infection and in the weeks to months afterwards. Accordingly, we established the COVID-consortium (https://covid-consortium.com) and the "COVID-19 Immune Protection and Pathogenesis in Healthcare Worker Bioresource" (NCT04318314). In this manuscript we: (1) provide a description of the study design; (2) present preliminary results of the baseline visit in the first 400 HCWs (single-centre, between 23rd and 31st March 2020) and the subsequent 331 (multicentre, from mid-April 2020), focusing on two different time-points in the epidemiological curve (just before and after the peak of new daily cases in London); and (3) call for research collaborators wishing to access biological samples from participants across the spectrum of COVID-19 to contribute to a range of prespecified objectives planned by the consortium (https://covid-consortium.com/application-for-samples/).

Study approvals
The study was approved by a UK Research Ethics Committee (South Central -Oxford A Research Ethics Committee, reference 20/SC/0149). All participants provided written informed consent.

Study participants
Adult (>18 years) hospital HCWs working in any role across a range of clinical areas, who were fit and well to attend work, were invited to participate via hospital email, posters, staff meetings, training sessions and participant information leaflets (see https://covid-consortium.com). No other inclusion or exclusion criteria were applied.

Study design
The "COVID-19 Immune Protection and Pathogenesis in Healthcare Worker Bioresource" (NCT04318314) uses a prospective observational cohort design (Figure 1). The study consists of questionnaires and biological samples (blood samples, nasal swabs ± saliva) collected at all visits: baseline, weekly follow-ups for 15 weeks (16 weekly visits in total), and visits at 6 and 12 months.
Recruitment was initially at St Bartholomew's Hospital, London, UK, where 400 HCWs were recruited between 23rd and 31st March 2020, just before the peak of new daily cases in London on 2nd April (1,022 new confirmed cases 15 ). St Bartholomew's is a secondary care hospital, part of Barts and the Royal London NHS Trust, serving a local population of 3 million and providing specialist cancer and cardiovascular services to a supra-regional population of 6 million. In response to the pandemic, the hospital expanded ventilated intensive care provision for COVID-19 patients to 122 beds across five units. To improve statistical power for downstream analyses, we expanded the target sample size to n=1,000 and, on 17th April 2020 (after peak transmission in London; recruitment still ongoing), extended recruitment to other local sites: Nightingale Hospital London (a temporary hospital providing intensive care, set up in response to the pandemic) and the Royal Free NHS Hospital Trust (a large teaching hospital with specialist expertise in infectious diseases). Collaborations with Cape Town (South Africa) and Sydney (Australia) are also in place to explore the impact of different surge rates, ethnicity, vitamin D levels and the 6-month seasonal difference; unlike at the UK sites, follow-ups there are performed every fortnight. Our team comprised researchers and volunteers from outside the clinical supply chains.
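As an illustration only, the UK visit timetable described in this section (baseline, 15 weekly follow-ups, then visits at 6 and 12 months) can be generated with a short Python sketch. The function names are hypothetical, "6 and 12 months" are assumed to mean calendar months counted from the baseline date, and neither missed visits nor the fortnightly schedule at the international sites is modelled.

```python
from __future__ import annotations

import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift d by whole calendar months, clamping to the last day of the month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def visit_schedule(baseline: date, weekly_follow_ups: int = 15) -> list[tuple[str, date]]:
    """Planned UK visits: baseline, weekly follow-ups, then 6 and 12 months."""
    visits = [("baseline", baseline)]
    for week in range(1, weekly_follow_ups + 1):
        visits.append((f"week {week}", baseline + timedelta(weeks=week)))
    visits.append(("6 months", add_months(baseline, 6)))
    visits.append(("12 months", add_months(baseline, 12)))
    return visits


if __name__ == "__main__":
    # Hypothetical participant recruited on the first day of recruitment.
    for label, when in visit_schedule(date(2020, 3, 23)):
        print(f"{label:>9}: {when.isoformat()}")
```

For a participant recruited on 23rd March 2020, this gives 16 weekly visits (baseline to 6th July 2020), followed by visits on 23rd September 2020 and 23rd March 2021.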

Baseline visit
Participants complete a baseline questionnaire (  Table 2.
Participants who miss a visit due to shift patterns, redeployment or self-isolation for any reason resume follow-up on return to work. Illness with suspected COVID-19 is self-reported to the study investigators. Following multisite expansion, self-isolating participants could also opt in to a home nasopharyngeal swab and saliva test.

Sample collection
The schedule and quantity of biosample collection are summarised in Figure 2. All study personnel in contact with HCW participants wore appropriate personal protective equipment (PPE) in accordance with Public Health England guidance. Nasopharyngeal swabs in RNA-stabilising medium are collected at each of the 16 weekly visits (baseline plus 15 weekly follow-ups). After appropriate training, participants were asked to self-swab both nostrils to minimise the risk to study staff. This strategy was later shown to be as reliable as swab collection by healthcare workers 19 . Blood samples were collected in Tempus™ tubes for whole blood RNA, clot activator tubes for serum, and EDTA tubes for plasma, peripheral blood mononuclear cells and DNA (Figure 2). Following multisite expansion in mid-April, 2-3 mL of pooled saliva was also collected into a dedicated saliva collection tube.

Initial sample processing
All samples were registered into a Laboratory Inventory Management system onsite and either frozen at -80°C or transferred to a containment level 3 facility. Key samples collected and planned laboratory procedures are described in Table 3.
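A minimal sketch of how the collection containers described above map to the biobanked materials, and of a sample record as it might be registered on arrival, is shown below. The record fields, identifiers and destination labels are hypothetical and do not describe the study's actual Laboratory Inventory Management system. In particular, the text does not say which samples are frozen at -80°C and which go to the containment level 3 facility, so the destination is left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass

# Container -> materials derived from it, as listed in the sample collection text.
TUBE_DERIVATIVES: dict[str, tuple[str, ...]] = {
    "Tempus tube": ("whole blood RNA",),
    "clot activator tube": ("serum",),
    "EDTA tube": ("plasma", "PBMCs", "DNA"),
    "nasal swab": ("swab RNA",),
    "saliva collection tube": ("saliva",),
}

# Hypothetical labels for the two destinations named in the text.
DESTINATIONS = {"-80C freezer", "CL3 facility"}


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical inventory entry for one container from one visit."""

    participant_id: str
    visit: str
    container: str
    destination: str

    def __post_init__(self) -> None:
        # Reject containers or destinations that the protocol text does not mention.
        if self.container not in TUBE_DERIVATIVES:
            raise ValueError(f"unknown container: {self.container!r}")
        if self.destination not in DESTINATIONS:
            raise ValueError(f"unknown destination: {self.destination!r}")

    @property
    def derivatives(self) -> tuple[str, ...]:
        return TUBE_DERIVATIVES[self.container]


def containers_for(material: str) -> list[str]:
    """Return the container(s) a given biobanked material is derived from."""
    return [c for c, mats in TUBE_DERIVATIVES.items() if material in mats]


if __name__ == "__main__":
    rec = SampleRecord("HCW-0001", "baseline", "EDTA tube", "-80C freezer")
    print(rec.derivatives)          # ('plasma', 'PBMCs', 'DNA')
    print(containers_for("serum"))  # ['clot activator tube']
```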

Core analyses
The following experimental approaches will be implemented (Figure 3):
- Reverse transcription polymerase chain reaction (RT-PCR) of nasal swabs using the Roche cobas® SARS-CoV-2 test 20 ;
- Pathogen sequencing, with results reported via the COVID-19 Genomics UK (COG-UK) consortium 21 ;
- Blood RNA extraction focusing on host transcriptomics;
- Peripheral blood mononuclear cells (PBMCs) are a scarce resource, and discussions are ongoing about maximising yield;
- Saliva will be diluted and aliquots made available; further aliquoting will depend on demand;
- Other antibody and antigen tests may also be made available should they emerge;
- Serum and plasma will be aliquoted into 100 µL samples and divided into packs for individual research teams.
Excess RNA (swab and blood) and host DNA may also be made available.

Table 3 (excerpt). Plasma: centrifuged and stored at -80ºC. PBMCs: separated by density gradient centrifugation and cryopreserved. *In the first cohort of 400 healthcare workers, saliva samples were taken at the first opportunity after week 5; subsequent participants had a saliva sample taken at baseline.
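The serum and plasma aliquoting step described above can be sketched as follows. Only the 100 µL aliquot size comes from the text; the input volume, the aliquot identifiers, the number of teams and the round-robin rule for dividing aliquots into packs are hypothetical choices made for illustration.

```python
from __future__ import annotations

ALIQUOT_UL = 100  # aliquot size stated in the text (100 µL)


def aliquot_counts(volume_ul: int, aliquot_ul: int = ALIQUOT_UL) -> tuple[int, int]:
    """Return (number of full aliquots, leftover volume in µL)."""
    if volume_ul < 0 or aliquot_ul <= 0:
        raise ValueError("volume must be non-negative and aliquot size positive")
    return divmod(volume_ul, aliquot_ul)


def make_packs(aliquot_ids: list[str], n_teams: int) -> list[list[str]]:
    """Deal aliquots round-robin into one pack per research team (assumed rule)."""
    if n_teams <= 0:
        raise ValueError("n_teams must be positive")
    packs: list[list[str]] = [[] for _ in range(n_teams)]
    for i, aliquot_id in enumerate(aliquot_ids):
        packs[i % n_teams].append(aliquot_id)
    return packs


if __name__ == "__main__":
    # Hypothetical: 1,250 µL of serum from one visit, shared among 4 teams.
    n, leftover = aliquot_counts(1250)
    ids = [f"SER-{k:02d}" for k in range(1, n + 1)]
    print(n, leftover)                           # 12 50
    print([len(p) for p in make_packs(ids, 4)])  # [3, 3, 3, 3]
```

Working in integer microlitres avoids floating-point rounding when deciding how many full aliquots a sample yields.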

Access procedure
The COVID-19 consortium has developed access systems to facilitate the use of this bioresource by scientists for health-related research of public interest. Research teams can apply to use the bioresource via the study website (https://covid-consortium.com/application-for-samples/). The access principles are those standard to many bioresources: to maximise the yield of timely science; to make results available to other researchers in a reasonable timeframe via a data lake; and to reward researchers with appropriate levels of authorship and, where present, intellectual property in a fair, transparent and swift way. We encourage teams to apply and to link their analysis datasets of hospitalised patients with severe disease. We also encourage applications from commercial entities, provided the core principles above apply.

Statistical analysis
When designing the initial study, we aimed to sample the population prior to exposure. At the time of ethics submission, there were no data to provide precise estimates. The initial n=400 was therefore a pragmatic, conservative estimate, aiming for rapid recruitment without compromising selection criteria, and limited by the logistical challenges of conducting research within a pandemic environment. Following initial recruitment success, a more formal sample size calculation was possible for the study expansion, based on an expected average baseline frequency of SARS-CoV-2 infection of 5% in previously undiagnosed HCWs, as reported in previous studies 5 . Accordingly, the estimated sample size was n=786 for β=0.20 (80% power) and two-sided α=0.05. We targeted a sample size of 1,000 to account for a 20% drop-out rate. However, the specific responses we are seeking are emergent and unknown, and a wider strategy is to link with other studies.
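The text does not state the formula behind n=786, so the sketch below covers only the final step: inflating a required sample size for expected drop-out. It assumes the common adjustment n_enrol = ⌈n / (1 − d)⌉, which gives ⌈786 / 0.8⌉ = 983 for d = 20%, within the target of 1,000. The function name is hypothetical.

```python
def inflate_for_dropout(n_required: int, dropout_pct: int) -> int:
    """Smallest enrolment n such that n * (1 - dropout) >= n_required (assumed rule)."""
    if n_required < 0 or not 0 <= dropout_pct < 100:
        raise ValueError("need n_required >= 0 and 0 <= dropout_pct < 100")
    # Integer ceiling division avoids floating-point error at exact boundaries.
    return -((-n_required * 100) // (100 - dropout_pct))


if __name__ == "__main__":
    n_enrol = inflate_for_dropout(786, 20)
    print(n_enrol)                 # 983
    print(1000 * (100 - 20) // 100)  # 800 expected completers from 1,000 enrolled
```

Under this assumption, enrolling 1,000 participants leaves about 800 completers after 20% drop-out, which exceeds the 786 required.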
This is a preliminary analysis of the key baseline characteristics. We present discrete variables as absolute frequencies with percentages, and continuous normally distributed variables as mean ± standard deviation. Continuous data were checked for normality using the Kolmogorov-Smirnov test and visual assessment of Q-Q plots. Continuous variables were compared between groups using Student's t-test, and categorical variables using Fisher's exact test. Two-sided p-values <0.05 were considered significant. Statistical analysis was performed using SPSS (version 24.0, IBM Corp., Armonk, NY, USA).

Results
Baseline characteristics for the first 400 HCWs (single-centre, recruited just before peak transmission at St Bartholomew's Hospital) and the subsequent 331 participants in the multicentre study expansion (recruited after peak transmission: n=101 at St Bartholomew's Hospital, n=10 at Nightingale Hospital and n=220 at Royal Free Hospital) are presented in Tables 4-6. These reflect all baseline visits between March and May 2020 (total n=731).

Demographics
The mean age of all study participants was 38 ± 11 years (0.7% >65 years); 67% were female and 37% were of black, Asian or minority ethnicity. Demographics are further detailed in Table 4.

Community/social exposure
The proportion of HCWs with a household size of at least three people was 48% (n=348), and a third of participants reported having children at home (Table 6). Only eight participants (1%) had proven contact with a confirmed COVID-19 case at home (Table 6). Overall, 41% (n=299) of HCWs reported having travelled overseas in 2020.

Symptoms, infection and serology
The prevalence of COVID-related symptoms in the two weeks prior to recruitment was 34% (n=249/731), and was significantly higher in the early cohort recruited in March (41% vs 26% in the later cohort, p<0.001). More HCWs from the multisite cohort recruited at the later time point thought that they had already had COVID-19 (24% vs 7% in the initial cohort, p<0.001). Overall, the most prevalent symptom was nasal congestion (13%), followed by odynophagia (11%), dry cough (8%) and fatigue (8%). A recent (<3 months) respiratory tract infection was reported by 20% of participants.

Discussion
This study is establishing a bioresource (COVID-consortium) derived from healthcare workers, with samples taken at the time of pre-symptomatic incident infection, linked to data on clinical outcomes and serology, and with follow-up sampling to evaluate the quality and duration of immune memory to the virus. Here we present preliminary baseline data on the first 731 participants, comprising a single-centre cohort recruited in March 2020, just before the time of peak community transmission in London, and a subsequent expanded multicentre cohort recruited from mid-April 2020. This resource should enable collaborative science: approved investigators can apply for access to samples or to the resultant data lake, to address specific questions or for incorporation into larger COVID-19 datasets.

HCW baseline characteristics
SARS-CoV-2 can spread rapidly to patients and HCWs in hospitals, and HCWs have generally been particularly hard hit, with high reported rates of infection 2,3,5,24 . Our cohort is representative of a multi-ethnic urban UK population of working age and, more specifically, of the NHS workforce across different clinical roles and departments. Confirmed COVID-19 contacts were few in the community (1%) but much more frequent in hospital (43% with patients, 30% with colleagues), particularly in the second cohort (recruited later). All participants self-reported as fit to attend work at all clinical visits, and at baseline most participants had been asymptomatic and did not think that they had been infected. Nevertheless, at the beginning of peak transmission in March, 1 in 10 participants in the initial cohort had SARS-CoV-2 infection at baseline, confirmed by PCR and/or a positive serology test, representing current or previous infection.
Of interest, two different time-points are presented here. As one would expect, the proportion of HCWs who reported prior symptoms was significantly higher in participants recruited just before peak community transmission 25 , while those recruited a month later more often suspected that they had already had COVID-19.

The COVID-19 bioresource
The scientific community has joined forces to tackle this unprecedented pandemic. Since the start of January 2020 (until   29 . The COVID-19 Staff Testing of Antibody Responses Study (CO-STARS) follows a similar design, with serology performed monthly for 6 months and then 6-monthly for a total of 6 years 30 .
The comprehensive serial assessment (questionnaires and biosamples) of asymptomatic participants, starting just before peak community transmission of SARS-CoV-2, makes our bioresource a valuable dataset for the scientific community. Because HCWs are likely to be infected at a higher rate than the general population, we expect data sampled from this cohort to advance understanding of mild disease and subclinical infection more rapidly, allowing comparison with those more severely affected or hospitalised with COVID-19. The COVID-19 consortium (https://covid-consortium.com) and the "COVID-19 Immune Protection and Pathogenesis in Healthcare Worker Bioresource" (NCT04318314) therefore encourage research teams to apply (https://covid-consortium.com/application-for-samples/) and potentially to link their own datasets to ours, with results expected to be returned to the data lake for collaborative science. Fields worth exploring include immune responses during the subclinical phases of infection; properties of immunoglobulins and cellular immune reactivity (correlations between viral RNA PCR and subsequent serology, persistence of neutralizing antibodies, immune decay and longevity of serological responses); host and viral genetic variation; and other environmental or acquired risk factors.

Limitations
The three centres initially included reflect the epidemiological curve of a single city (London). The COVID-19 bioresource started at peak community transmission, with prevalent asymptomatic infection in 7.1% and seropositivity of 3.9% at baseline. In data from the subsequent four weeks, we have already reported that incident asymptomatic infections fell in line with reductions in London-wide incidence 31 . Nationwide data are accruing to assess the generalisability of our findings, and there are also opportunities to expand the geographical coverage of our bioresource through collaborations with other studies that include serial sampling of HCWs. Although our cohort is ethnically diverse (37% non-white), the frequency of comorbidities is relatively low, there are no children, and elderly participants are under-represented. In addition, findings from our cohort of hospital HCWs are unlikely to be generalisable to other institutional settings such as care homes.

Conclusions
Just before the peak of COVID-19 infections in the UK, we established a rich and granular bioresource of healthcare workers with the aim of gathering insights into early disease and asymptomatic SARS-CoV-2 infection. By combining exposure data with multiple qualitative and quantitative assessments, we envisage a more complete picture of the immune response in this context. The samples and data securely curated in this bioresource are now accessible to the wider scientific community by application (https://covid-consortium.com/application-for-samples/).

Data availability
The COVID-19 consortium has developed access systems to facilitate the use of this bioresource and the data underlying this article by scientists for health-related research of public interest. However, although participants are pseudonymised, there are data on home addresses, household characteristics and other details that could potentially lead to identification. Research teams can therefore apply to use the bioresource via the application form, which can be found on the study website (https://covid-consortium.com/application-for-samples/).