Risk factors for SARS-CoV-2 seroprevalence following the first pandemic wave in UK healthcare workers in a large NHS Foundation Trust [version 2; peer review: 2 approved]

Background: We aimed to measure SARS-CoV-2 seroprevalence in a cohort of healthcare workers (HCWs) during the first UK wave of the COVID-19 pandemic, explore risk factors associated with infection, and investigate the impact of antibody titres on assay sensitivity. Methods: HCWs at Sheffield Teaching Hospitals NHS Foundation Trust were prospectively enrolled and sampled at two time points. We developed an in-house ELISA for testing participant serum for SARS-CoV-2 IgG and IgA reactivity against Spike and Nucleoprotein. Data were analysed using three statistical models: a seroprevalence model, an antibody kinetics model, and a heterogeneous sensitivity model. Results: Our in-house IgG assay had a specificity of 99·56%. We found that 24·4% (n=311/1275) of HCWs were seropositive as of June 2020. Assay sensitivity varied with age, with sensitivity estimates of 89% in those over 60 years but 61% in those ≤30 years. Conclusions: HCWs in acute medical units and those working closely with COVID-19 patients were at highest risk of infection, though whether these are infections acquired from patients or other staff is unknown. Current serological assays may underestimate seroprevalence in younger age groups if validated using sera from older and/or more severe COVID-19 cases.


Introduction
Healthcare workers (HCWs) are at increased risk of COVID-19 1,2. The true number of HCWs infected with SARS-CoV-2 to date is unknown, particularly during the early stages of the pandemic. Initial methods to estimate HCW COVID-19 cases included extrapolation from work absenteeism rates, and are unlikely to be reliable 3. Confirmation by molecular testing increased the accuracy of case detection, although access to nucleic acid amplification testing (NAAT) was limited during the early stages of the pandemic in the UK 4. Serological testing can be performed at large scale, and is less affected by symptom-activated testing pathways, so may provide a more accurate estimate of previously infected HCWs and could be used in conjunction with other data to determine their risk factors for exposure 5-8.
To enable the accurate interpretation of seroprevalence readouts, detailed characterisation of antibody evolution relative to the sampling time-frame, immunoglobulin isotype, antigenic target and assay performance is required [9][10][11][12]. Several commercial SARS-CoV-2 antibody assays have been validated using samples from patients with more severe COVID-19, and some studies have suggested those with milder or asymptomatic COVID-19 are less likely to develop detectable antibodies [13][14][15][16][17]. Furthermore, antibody levels to some coronaviruses are known to be higher in older individuals [18][19][20][21][22]. We theorised this may lead to age-specific differences in antibody assay sensitivity, which could be a significant confounder in population seroprevalence studies.
In this study we aimed to investigate SARS-CoV-2 seroprevalence in HCWs at Sheffield Teaching Hospitals (STH), a National Health Service (NHS) Foundation Trust in the United Kingdom (UK), following the first wave of the pandemic in the UK. To achieve this, we sought to measure SARS-CoV-2 antibody titres by developing and using an in-house assay, prior to using statistical modelling to explore risk factors associated with seropositivity, the evolving antibody response, and the impact of age on assay sensitivity.

Methods
Background and setting
STH is an NHS trust offering secondary- and tertiary-level care across four sites in South Yorkshire, UK, with 1,669 inpatient beds and 18,500 employees 23. Patients with a medical reason for admission are typically admitted to STH either through attending the Emergency Department (ED), or through referral to the Acute Medical Unit (AMU) by their General Practitioner (GP). On AMU, patients are given an initial management plan before being triaged to the most appropriate medical specialty ward, e.g. respiratory medicine. The first patient with confirmed COVID-19 was admitted to STH on 23 February 2020; the first wave of the UK pandemic occurred between March 2020 and June 2020. Patients with suspected or confirmed COVID-19 were referred directly to the infectious diseases ward by GPs and other STH admission areas as capacity allowed. When capacity was reached, suspected COVID-19 patients were either placed in side rooms or cohort bays on AMU or other wards, whilst confirmed COVID-19 patients could be moved to cohort wards.
Testing of symptomatic staff for SARS-CoV-2 by NAAT was introduced on 17 March 2020. On the same day, Public Health England (PHE) de-escalated the recommendations for the personal protective equipment (PPE) required by HCWs caring for inpatients with suspected or confirmed COVID-19 from 'Level 3 Airborne' to 'Level 2 Droplet' for routine care 24 . Subsequently, the requirement for universal 'Level 2 Droplet' PPE for all inpatient and outpatient care began on 08 April 2020. Local STH policy was changed on 15 June 2020 to mandate staff use surgical face masks while on hospital premises.

Recruitment and consent
From 13-18 May 2020, all contactable STH staff (n=17,757) were invited to take part in the COVID-19 Humoral ImmunE RespOnses in front-line HCWs (HERO) study by email and intranet alert. To engage staff in areas with limited communications access, additional recruitment posters and face-to-face enrolment sessions were used.
Following an electronic informed consent process, participants provided self-reported data online on age, gender, ethnicity, job role, and pandemic working environment ('COVID-19 zones') 24. Details of any possible or confirmed prior COVID-19 illnesses occurring since 01 February 2020 were also collected. These were categorised as: i) diagnosed with COVID-19 and confirmed by NAAT; ii) clinically diagnosed with COVID-19 but NAAT not performed; and iii) self-reported symptoms only 24. Together, we defined these three groups as "symptomatic", as asymptomatic testing was only introduced after the study recruitment period. Those reporting no illness between 01 February 2020 and the date of recruitment were defined as "asymptomatic". All enrolled participants were emailed times of phlebotomy appointments and were invited to attend on a first come, first served basis for the first visit; they were then invited by email to book a specific appointment slot for their second visit four weeks (+/-7 days) later. An 8.5 ml serum sample for serological testing was taken at each visit to outpatient phlebotomy services.

Amendments from Version 1
Following Reviewer 1's comments, the major changes between V1 and V2 are: 1. Development of our in-house assay has been added as an objective of the work, and the sensitivity/specificity findings of our assay have been moved from the methods to the results section. 2. Further information has been provided in the background section about admissions and wards for readers who are not familiar with the NHS. 3. A section on volunteer bias has been added to the discussion section. 4. A new graph (Figure S1b), showing the difference between Spike- and NCP-specific IgG responses in inpatients vs outpatients from the development of our in-house assay, has been added to the Extended Data. Any further responses from the reviewers can be found at the end of the article.

Assay development
To develop the in-house ELISA, we created an assay validation dataset 25 consisting of serum from 190 SARS-CoV-2 NAAT-confirmed cases (52 hospitalised patients and 138 healthcare workers with mild infections, sampled between 14 and 120 days from NAAT positivity) and 675 patients sampled prior to 2017 (Extended data: Table S1). Thresholds based on the absorbance value at 450nm (A450) for defining reactivity to spike (A450 0·1750) or NCP (A450 0·1905) were set to optimise the sensitivity of each assay. Given the IDSA guidance for ensuring a specificity of ≥99·5% in assays used for SARS-CoV-2 seroprevalence studies, specificity was enhanced by defining a SARS-CoV-2 seropositive sample as one where both spike and NCP were reactive 26.
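The two-antigen decision rule can be sketched in a few lines of illustrative Python (the analysis itself was done in R; the cut-off values are the A450 thresholds quoted above). Requiring both antigens to be reactive means that, where cross-reactivity to the two antigens is independent, the individual false-positive rates multiply, which is what drives the specificity gain:

```python
def seropositive(spike_a450, ncp_a450,
                 spike_cutoff=0.1750, ncp_cutoff=0.1905):
    """A sample is called SARS-CoV-2 seropositive only when BOTH the
    spike and NCP assays are reactive (A450 at or above threshold).
    Requiring both antigens trades a little sensitivity for a large
    gain in specificity: independent false positives must co-occur
    to produce a combined false positive."""
    return spike_a450 >= spike_cutoff and ncp_a450 >= ncp_cutoff

# e.g. a pre-pandemic sample cross-reactive to NCP alone is still negative
seropositive(0.05, 0.40)   # False
seropositive(0.60, 0.55)   # True
```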

SARS-CoV-2 serology
Serum samples from study participants were then tested for IgG and IgA reactivity to two SARS-CoV-2 proteins using our in-house ELISA: the full-length extracellular domain (amino acids 14-1213) of Spike glycoprotein, including replacement of the furin cleavage site R684-R689 by a single alanine residue and replacement of K986-V987 by PP, produced in mammalian cells; and full-length untagged Nucleocapsid protein (NCP) produced in E. coli (Uniprot ID P0DTC9 (NCAP_SARS2)) 27,28. High-binding microtitre plates (Immulon 4HBX; Thermo Scientific, 6405) were coated overnight with proteins diluted in phosphate buffered saline, washed with 0·05% PBS-Tween, and blocked for one hour with 200 µL/well casein buffer. Following optimisation, sample dilutions used were 1:200 for the IgG assay and 1:100 for the IgA assay 24. Plates were emptied and 100 µL/well of sample or control loaded. After two hours' incubation, plates were washed and loaded with goat anti-human IgG-HRP conjugate (Invitrogen, 62-8420) at 1:500, or goat anti-human IgA-HRP conjugate (Invitrogen, 11594230) at 1:1000, for one hour. Plates were washed and developed for 10 minutes with 100 µL/well TMB substrate (KPL, 5120-0074). Development was stopped with 100 µL/well HCl stop solution (KPL, 5150-0021), and absorbance was read at 450 nm. All steps were performed at room temperature.
A calibration curve of sera pooled from convalescent SARS-CoV-2 NAAT-confirmed patients with high antibody titres for both spike and NCP was included on plates to allow quantification of antibody concentrations. The calibration curve was generated by serially diluting in 1·75× steps from a starting concentration of 1:200 for the IgG assay or 1:100 for the serum IgA assay. When the WHO International Standard for anti-SARS-CoV-2 immunoglobulin (NIBSC, 20/136) later became available, the calibration curve was run in parallel for the IgG assay 24 . Data for the IgG assay are therefore given in WHO antibody units, whereas IgA assay data are given in arbitrary antibody units.
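The quantification step against the calibration curve can be sketched as follows. This is a simplified, hypothetical Python illustration with made-up calibrator readings: it linearly interpolates A450 against log(antibody units) between dilution points, whereas the real assay works from a fitted calibration curve on each plate:

```python
import math

def units_from_a450(sample_a450, cal_a450, cal_units):
    """Read a sample's antibody units off a calibration curve by linear
    interpolation of A450 against log(antibody units). Readings outside
    the calibrator range are clamped to the end points (a simplification;
    in practice such samples would be re-run at another dilution)."""
    pairs = sorted(zip(cal_a450, cal_units))          # ascending A450
    xs = [a for a, _ in pairs]
    ys = [math.log(u) for _, u in pairs]
    if sample_a450 <= xs[0]:
        return math.exp(ys[0])
    if sample_a450 >= xs[-1]:
        return math.exp(ys[-1])
    for i in range(1, len(xs)):
        if sample_a450 <= xs[i]:
            t = (sample_a450 - xs[i - 1]) / (xs[i] - xs[i - 1])
            return math.exp(ys[i - 1] + t * (ys[i] - ys[i - 1]))

# Hypothetical calibrator: a 1.75-fold dilution series from 1000 units,
# paired with invented A450 readings for illustration only
cal_units = [1000 / (1.75 ** k) for k in range(6)]
cal_a450 = [2.0, 1.6, 1.2, 0.8, 0.5, 0.3]
units = units_from_a450(1.6, cal_a450, cal_units)   # ≈ 571 units
```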

Sample size
To meet the primary objective of measuring SARS-CoV-2 IgG seroprevalence, we calculated that a sample size of 1,000 HCWs would provide +/-1·4% precision with a two-sided 95% confidence interval, based on an estimate that ~4% of the UK population may have been infected by April 2020 (with n=753, the binomial exact 95% CI was estimated to be 2·7-5·6%) 30.
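As a rough cross-check of this power calculation, the half-width of a two-sided 95% confidence interval for a proportion can be computed directly. This is a sketch only, using the normal approximation; the exact binomial interval underlying the paper's figure is asymmetric and somewhat wider, particularly for proportions near zero:

```python
import math

def ci_half_width(p, n, z=1.96):
    """Normal-approximation half-width of a two-sided 95% CI for a
    proportion p estimated from n samples: z * sqrt(p(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)

# Assumed prevalence 4%, planned sample of 1,000 HCWs
hw = ci_half_width(0.04, 1000)   # ≈ 0.012, i.e. about ±1.2 percentage points
```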

Statistical modelling
We considered three statistical models: i) a seroprevalence model, ii) an antibody kinetics model, and iii) a heterogeneous sensitivity model. For the seroprevalence model, we used the serostatus of all participants at first blood draw in a sensitivity- and specificity-adjusted Bayesian multilevel logistic regression model. Using seropositivity as the binary response variable, we considered three different Bayesian hierarchical regression model subtypes, all with the explanatory demographic variables age, race and gender, and each with a different primary exposure: job location, contact with COVID-19 patients, or job type 24. In addition, we fitted a symptomatic prevalence model, where the data used were seropositive persons only and the binary response variable was asymptomatic or symptomatic infection.
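The Bayesian model propagates uncertainty in assay performance through to the prevalence estimate; its simple point-estimate analogue is the Rogan-Gladen correction, sketched here in Python (the actual analysis used R/Stan). The numbers below are the first-draw seropositive count from this study and the sensitivity/specificity our validation data gave for the IgG assay:

```python
def rogan_gladen(apparent_prev, sens, spec):
    """Point-estimate correction of an apparent (test-positive) prevalence
    for imperfect test performance:
        true = (apparent + spec - 1) / (sens + spec - 1)
    Valid only when sens + spec > 1."""
    return (apparent_prev + spec - 1) / (sens + spec - 1)

# 311/1275 seropositive at first draw; assay sens 99.47%, spec 99.56%
adjusted = rogan_gladen(311 / 1275, 0.9947, 0.9956)   # ≈ 0.242
```

With an assay this accurate the correction barely moves the estimate; it matters far more for less specific assays applied at low prevalence.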
For the antibody kinetics model, we included samples from individuals who were seropositive at both bleeds in a Bayesian multilevel linear regression model in two parts: i) using log2 antibody units (logAU) at the first blood draw as the response variable, and ii) using the change in antibody titre at the follow-up bleed (median 28 days) as the response variable. Age, ethnicity, gender and symptom severity (asymptomatic or symptomatic) were used as covariates, and each model was run separately for four different antibody-antigen combinations: Spike-IgG, NCP-IgG, Spike-IgA, NCP-IgA. The time until seroreversion was calculated for each covariate group and antibody-antigen combination by i) sampling a starting titre value and a rate of decline from the two models, and then ii) calculating the time until the minimum observed antibody value was reached for that antibody-antigen combination, assuming a continuous rate of decrease.
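Step ii) of the seroreversion calculation reduces to simple arithmetic once a starting titre and decline rate are in hand. A sketch with purely illustrative numbers (in the full model, the start value and rate are posterior draws, so the output is a distribution of times rather than a single figure):

```python
def days_to_seroreversion(log2_start, log2_floor, decline_per_day):
    """Time for a log2 antibody titre to fall linearly from its starting
    value to the minimum observed value, assuming a constant daily rate
    of decline. A non-positive rate means no waning in this draw."""
    if decline_per_day <= 0:
        return float("inf")
    return (log2_start - log2_floor) / decline_per_day

# Illustrative only: start at 10 log2 units, floor at 4, -0.05 units/day
t = days_to_seroreversion(10.0, 4.0, 0.05)   # 120 days
```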
In our heterogeneous sensitivity and specificity model, we explored how estimates for the sensitivity and specificity derived from our assay validation dataset generalise to covariate groups, e.g. participant age. To model the generalisability of these performance measures, we compared the seropositivity classification of our study dataset according to our in-house antibody assay with the predicted seropositivity classification from hypothetical assays with an assumed sensitivity and specificity. Our model considers the different distributions of the A450 values in the assay validation and HERO study datasets to model how reliably the sensitivity and specificity quoted by assay manufacturers generalise to specific subpopulations. Using the assay validation dataset, we estimated an A450 cut-off value for every chosen sensitivity value, and then used this A450 cut-off to classify seropositivity in the study dataset. By comparing the seropositivity classification based on the estimated A450 cut-off value with the classification from our in-house assay (which, for ease of comparison, we assume represents the maximum possible sensitivity and specificity, i.e. 100%, in this model), we estimated an "implied" sensitivity on the HERO dataset which would arise if the commercial assay alone had been used to detect seropositivity. This framework allowed us to estimate the hypothetical performance of serological assays reported in the literature on our HERO dataset, along with covariate-specific sensitivity.
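The cut-off-and-reclassify procedure can be sketched with hypothetical A450 values (illustrative Python; the actual model is Bayesian and accounts for uncertainty in both datasets). The example data are chosen to mimic the situation described in the text: validation-set positives reading higher than study positives:

```python
def implied_sensitivity(validation_pos_a450, study_pos_a450, target_sens):
    """For a chosen sensitivity on the validation set's true positives,
    find the highest A450 cut-off that still detects that fraction of
    them, then report the fraction of study samples (positive by the
    reference in-house assay) that the same cut-off would detect."""
    ranked = sorted(validation_pos_a450, reverse=True)
    k = max(1, round(target_sens * len(ranked)))
    cutoff = ranked[k - 1]                 # k-th highest reading
    hits = sum(1 for a in study_pos_a450 if a >= cutoff)
    return hits / len(study_pos_a450), cutoff

# Hypothetical values: validation positives (older, sicker donors) read
# higher than study positives (younger HCWs with milder infections)
validation = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
study = [0.25, 0.35, 0.45, 0.15]
sens, cutoff = implied_sensitivity(validation, study, 0.8)   # 0.5, 0.3
```

Because the study positives sit lower on the A450 scale, a cut-off achieving 80% sensitivity on the validation set detects only half of them, which is exactly the "implied sensitivity" gap the model quantifies.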
All analysis was performed in R version 4.0.2 31 and cmdstanr version 0.2.0 32 . An R package containing all the analysis in this study is available at https://doi.org/10.5281/zenodo.6320552.

Serology assay development
We found our in-house ELISA had a sensitivity of 99·47% (95% confidence interval (CI) 97·10%-99·99%) and a specificity of 99·56% (95% CI 98·71%-99·91%) for the IgG assay (Extended data: Figure S1a). Compared with IgG, we saw more rapid waning of the IgA response following SARS-CoV-2 infection, as well as higher levels of cross-reactivity in pre-pandemic samples. These factors complicated defining seropositivity based on an A450 threshold, as there was no clear separation between titres in these two groups. We therefore opted to use our spike and NCP IgA ELISAs solely to compare IgA titres of individuals classified as seropositive by our IgG assay (Extended data: Figure S2). Antibody units at each given dilution of the calibration curve are shown in Table S2 (Extended data).
Registration and study visits
1478 STH staff consented to take part between 13 May and 5 June 2020 (Extended data: Figure S3). Of these, 1277 attended a first visit (V1) between 15 May 2020 and 12 June 2020. As two samples were contaminated in transit, we obtained a valid serostatus for 1275 samples. 1174 attended a second visit (V2) between 15 June and 10 July 2020 (Extended data: Figures S3 and S4).

Antibody kinetics model
Differences in antibody concentration between samples were calculated for four different antibody-antigen interactions (Spike-IgG, NCP-IgG, Spike-IgA, NCP-IgA). Though there was a positive correlation between Spike-IgG and NCP-IgG across all samples (R² = 0·53), the correlations between serum IgG and IgA were much weaker (R² between 0·17 and 0·30) (Extended data: Figure S6).
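For reference, the R² values quoted are squared Pearson correlations between the titre series; a dependency-free sketch (the published analysis was run in R):

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two titre series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Illustrative: near-proportional series give R² close to 1
r_squared([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```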

Heterogeneous sensitivity model
The heterogeneous sensitivity model demonstrates that using varying A450 cut-offs (corresponding to varying sensitivity values) to categorise seropositivity in the HERO dataset will result in a lower sensitivity than that defined using our assay validation dataset (Figure 3a). The model also shows that there is no difference in implied sensitivity between using spike or NCP as the antigenic target in the ELISA assay.
The relationship between the A450 cut-off value and the sensitivity and specificity for the assay validation datasets for each antigen was plotted with the associated ROC curves (Extended data: Figures S8 and S9). We hypothesised that the higher A450 values seen in older adults suggest that some commercially available serological assays may have a higher sensitivity in detecting COVID-19 antibodies in older age groups compared with younger age groups. We therefore used our model to estimate age-specific implied sensitivity for assays of different sensitivity profiles in estimating seroprevalence in our HERO dataset. We found that the implied sensitivity of a serological assay increases with age due to the higher antibody titres seen in older people, with a clearer trend in an NCP-based assay compared to a spike-based assay (Figure 3b). Assuming a theoretical assay validation set sample sensitivity of 80% for the NCP protein, the resulting median implied sensitivity for age groups <30, 30-39, 40-49, 50-59, and 60+ years was 61%, 77%, 70%, 85%, and 89% respectively.

Discussion
We found a high SARS-CoV-2 seroprevalence in HCWs at a large UK hospital trust compared to national seroprevalence estimates, following the first pandemic wave in the UK 33. In addition, we identified important risk factors associated with occupational exposure to COVID-19, and described a significant association between age and the likelihood of a positive serological result, which has important implications for the validation of SARS-CoV-2 antibody assays and for the interpretation of population-level COVID-19 serology data to date.
Over 20% of HCWs at STH had evidence of SARS-CoV-2 infection within just over 100 days of the first confirmed COVID-19 patient being admitted to our NHS trust. This high proportion over a short space of time is likely representative of the much higher exposure to SARS-CoV-2 infection among certain subpopulations of the workforce that we tested. Although data from other settings and countries suggested infection risk in HCWs is similar to community exposure, this seroprevalence is much higher than the estimated seropositivity in the UK population at a similar time (6·0%, 95% CrI 5·8-6·1, in July 2020) 33,34.
Despite universal PPE and IPC guidelines across STH, our data show that HCWs working in AMUs were at significantly increased risk of seropositivity. Occupational therapists and physiotherapists (OT/PT) had the highest seroprevalence across all of the job roles included in our cohort (45·5%), which is consistent with some other UK studies.

Increasing age was associated with seropositivity, with over a third of our HCWs aged >60 testing seropositive, and with higher antibody titres. We demonstrate that the sensitivity of a serological assay increases with increasing age due to the higher antibody titres seen in older people, and with a clearer trend in NCP- compared to spike-based assays. Our data complement the existing literature, which shows antibody titres against SARS-CoV-2 and other coronaviruses are higher in older individuals; this could be due to a higher risk of exposure to the virus, greater antigenic load, or boosting of antibodies from cumulative seasonal coronavirus infections throughout their lifetime [18][19][20][21][22]. Several of the commercial SARS-CoV-2 antibody assays available (e.g. Roche Elecsys, Abbott SARS-CoV-2 IgG and Wantai ELISA) were validated with patient sera collected from those with more severe disease early on in the pandemic (i.e. those who presented to health services) 13-15. Patients with severe COVID-19 have been shown to have higher antibody titres than those with milder disease (Extended data: Figure S1b), and it would be reasonable to assume these cases were likely to also be older in age 13-17,39,40. Our antibody kinetic modelling data suggest that using such samples from severe COVID-19 cases for the purposes of assay calibration may result in an assay with lower or insufficient sensitivity when applied to less severe or younger (often community) populations. We also found that NCP-IgG is likely to wane more quickly than Spike-IgG.
Depending on the sampling time frame relative to pandemic wave, serological testing based on NCP-IgG alone may further underestimate seroprevalence. With increasing vaccine coverage, use of spike IgG to determine seroprevalence also becomes more problematic when distinguishing whether an individual is seropositive from vaccination or previous infection. Assays which combine antibody responses to membrane protein with NCP antibodies may overcome these challenges 41,42 .
We note the limitations of our study, which include the potential for selection bias because participants self-enrolled rather than being sampled systematically. While we cannot measure the extent of this effect on the measured seroprevalence, we expect any volunteer bias to have been similar across all categories compared, and therefore not to alter the validity of these comparisons. In addition, we recognise that our cohort has relatively low numbers of HCWs from minority ethnic backgrounds (~10%) compared to the Sheffield general population (19%) 42.
With the ongoing global devastation caused by the COVID-19 pandemic and its lasting effect on healthcare services, understanding the risk factors leading to HCW exposure is paramount to ensuring the continuity of effective and safe patient care. Our real-world data suggest that NHS HCWs face high levels of exposure to SARS-CoV-2, and highlight the locations and job roles at greatest risk during the first wave of the pandemic. Population seroprevalence data can help guide decision makers on risk management. Using assays that have been validated using serum samples from a broad population, combined with antibody kinetic modelling and/or age-adjusted cut-offs, could overcome the potential limitations we have highlighted.

Data availability
The project contains the following extended data:
- Table S1. Details of the samples used to set thresholds during assay validation.
- Table S2. Comparison of antibody units in assay calibration curve sera assigned to the assay with WHO international standard antibody units.
- Table S3. Summary of the response variables and the covariates used in the regression model.
- Table S4. Summary of the model parameters used in the regression model.
- Table S5. Summary of the response variables and the covariates used in the regression model.
- Table S6. Summary of the model parameters used in the regression model.
- Table S7. Summary of the model parameters used in the Heterogenous sensitivity model.
- Figure S1a. ROC curves of the spike and NCP assays.
- Figure S1b. Spike- and NCP-specific IgG response in inpatients vs outpatients.
- Figure S2. Comparison of IgA assay A450 based on IgG serostatus.
- Figure S3. Study flow diagram.
- Figure S4. Histogram (overlayed) showing the symptom onset, date of first bleed (all cases and symptomatic cases only), and time at second bleed (all cases and symptomatic cases only).
- Figure S5. Model-predicted proportion of asymptomatic estimates for three different models (A-C), adjusted and unadjusted with covariates gender, age group and ethnicity.
- Figure S6. Correlation between the four different antibody measures for 264 serological samples.

- Figure S7. Rate of decline for the antibody concentrations post-symptom onset for the four antibody measures. The fitted line is from a linear regression, with the 95% CI shown in red.
- Figure S8. Relationship between sensitivity/specificity and the cutoff value for the control dataset.
- Figure S9. ROC curves with the A 450 cut-off value indicated in red for the control dataset. x-axis shows the False Positive Rate, y-axis is the sensitivity.
- Figure S10. ROC curves for different age groups and antigen proteins, with the A 450 cut-off value indicated in various colours for the control dataset.
- Figure S11. (a) Specificity of the control data set against the implied specificity of the HERO dataset for spike and nucleoprotein. (b) Specificity of the control data set against the implied age-specific specificity of the HERO dataset for spike and nucleoprotein.

This is an interesting paper and the authors have done a very thorough job. Obviously, with the progression of the pandemic, vaccination roll-out and the development of subsequent variants, these data may not have the same relevance as during the first wave. I have a few specific points, but nothing too critical to the article.
As both spike and nucleocapsid antibodies were tested for, vaccination is less of an issue, but as the recruitment period in the study overlapped with vaccine trials that were recruiting, it would be worth clarifying that none of the participants were in those early vaccine trials.
What case definition was used for the symptomatic definition? Was there any major difference in the seroprevalence between those who had classical COVID-19 symptoms and those with more viral respiratory type symptoms?
It would be really helpful if there were some data on local community prevalence/seropositivity around the testing dates (I appreciate that the UK figure of 4% is given for April 2020 but local data would be helpful in assessing the numbers).
Was there a difference in PPE use that could explain the difference in seroprevalence between ED and AMU? Many EDs retained the use of higher-level respiratory PPE due to performing aerosol-generating procedures, which would align them with critical care areas.

Mo Yin
1 Mahidol Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
2 National University of Singapore, Singapore, Singapore

Thank you for the extensive effort in addressing my earlier queries. I have no further comments.

Mo Yin
1 Mahidol Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
2 National University of Singapore, Singapore, Singapore

I congratulate the authors for conducting this large and informative study. It is an important finding that SARS-CoV-2 infection was so prevalent amongst healthcare workers so early in the pandemic.
My overall comments are:
1. Figure 1 shows that the association between high seroprevalence and age attenuated after adjustment with zones, job and wards. Could there be confounding relationships? If zones, job and wards were added to the antibody kinetics model, would the association between age and high titre attenuate as well?
2. It appears that a major aim here is to validate an in-house assay. The manuscript could be clearer if this was stated at the beginning. An important caveat for this work is that the dataset was based on this in-house assay.
My complete list of comments is below:

Abstract:
1. "Using seropositivity as the binary response variable, we considered three different model subtypes with varying primary exposures; job location, contact with COVID-19 patients, and job type." It is not clear from this sentence what the actual independent variables used in the model subtypes were. Which variable was given a random effect in the multilevel model? In the GitHub account, where the README says 'The stan code and the raw mcmc outputs is contained in the outputs/ folder.', I could not find the model code.
2. "For the antibody kinetics model, we included samples from seropositive individuals in a Bayesian multilevel linear regression model in two parts:…" Does this refer to the HCWs who were positive in the first blood draw only?
3. "In our heterogeneous sensitivity and specificity model, we explored how estimates derived from our assay validation dataset generalise to covariate groups, e.g. participant age." What is the assay validation dataset? It is probably clearer to move the explanation from the results to the methods section.
4. "In our heterogeneous sensitivity and specificity model, we explored how estimates derived from our assay validation dataset generalise to covariate groups, e.g. participant age. To model the generalisability of these performance measures, we compared the seropositivity classification of our study dataset using our in-house antibody assay, with the predicted seropositivity classification from hypothetical assays with a quoted sensitivity and specificity. Our" These sentences are a little confusing. Firstly, what does 'estimates' refer to in 'estimates derived from our assay validation dataset generalise to covariate groups, e.g. participant age'? Is it referring to sensitivity and specificity? Secondly, "we compared the seropositivity classification of our study dataset using our in-house antibody assay" sounds like you compared the classification using the antibody assay. Thirdly, in "with the predicted seropositivity classification from hypothetical assays with a quoted sensitivity and specificity", what does 'quoted' mean? Does it mean an assumed sensitivity and specificity, since it is a hypothetical assay?
5. Please introduce A450 when it is first mentioned in the manuscript.
6. In "to model how reliably quoted performance measures generalise.", what does 'quoted' refer to?
7. "Using the assay validation dataset, we estimated the A450 cut-off value for a range of chosen sensitivity values, and then used this A450 cut-off to classify seropositivity in the study dataset." How did you choose a single A450 value from a range of sensitivity values?
8. "We then estimated the implied sensitivity on the HERO dataset by comparing seropositivity classification based on the estimated A450 cut-off value, with the seropositivity classification from our in-house assay (which for ease of comparison, we assume represents the maximum possible sensitivity and specificity (i.e. 100%) in this model)." What does 'implied' mean here?

Results:
1. "Rapid waning of IgA responses following SARS-CoV-2 infection complicated defining positive and negative samples based on the convalescent sera we used for assay validation." Why is this so?
2. "1174 attended for a second visit (V2) between 15 June and 10 July 2020" — 'for' is unnecessary in this sentence.
3. Figure 1: Why do the black stars not overlap with the unadjusted mean estimates?
4. Why were zones, job and wards not used in the same model? Was there model selection done?
5. Figure 1: The raw data showed that a higher proportion of HCWs who were >60 years old were seropositive compared to the other age groups. However, there is no longer this association after adjustment. What was/were the confounder(s)?
6. In the antibody kinetics model, none of zones, job and wards were used as an independent variable. As mentioned in the earlier point, would this finding of age and higher titres modulate/disappear if zones, job and/or wards were included in this model as well?
7. Figure 3: Please specify the meaning of 'control' in the x-axis.

Discussion:
1. Were the infection prevention and control measures similar in the ED and AMU? Could the higher prevalence in AMU be due to transmission from asymptomatic/mildly symptomatic patients who were not triaged to be isolated by the ED?
2. "Previous studies clearly demonstrate that patients with more severe COVID-19 have higher antibody titres" — however, was this not found in this study?
3. "Reassuringly, our seroprevalence rates are similar to those seen in other UK based seroprevalence studies" — citation 5 also enrolled volunteer participants (a publicised study using social media), and citation 21 is the HERO extended data?
4. It would be informative to have a more detailed discussion on whether the finding of age affecting serology titres has been reported in other studies.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly

1. Figure 1 shows that the association between high seroprevalence and age attenuated after adjustment with zones, job and wards. Could there be confounding relationships? If zones, job and wards were added to the antibody kinetics model, would the association between age and high titre attenuate as well?

David Hodgson: The reviewer's interpretation of Figure 1 seems correct, but to be crystal clear: Figure 1 shows the (unadjusted) association between high seroprevalence and the covariates age, gender, and race, and the (adjusted) association between high seroprevalence and the covariates age, gender, race, and one primary exposure variable (either zones, jobs or wards). In Figure 1, the similarity between the unadjusted and adjusted seroprevalence estimates suggests that none of zones, jobs or wards were strong confounders of the age-dependent prevalence; they were therefore dropped in the subsequent models for parsimony.
2. It appears that a major aim here was to validate an in-house assay. The manuscript would be clearer if this were stated at the beginning. An important caveat for this work is that the dataset was based on this in-house assay.
Hailey Hornsby: As seen later in this response, the manuscript introduction has been expanded to include the ELISA validation as an objective of the work: "To achieve this, we sought to measure SARS-CoV-2 antibody titres by developing an in-house assay (...) associated with seropositivity, the evolving antibody response, and the impact of age on assay sensitivity"

"...infectious diseases wards by GPs or from STH admission units. When infectious diseases capacity was reached, suspected COVID-19 patients would either be placed in side rooms or cohorted in bays, whilst confirmed COVID-19 patients could be moved to COVID cohort wards"
2. It appears that willing healthcare workers enrolled themselves into the study. Were there any measures to prevent volunteer bias? Could volunteer bias influence the results?

Paul Collini: Volunteer bias could conceivably increase the observed prevalence if those who thought they had been exposed to COVID were more likely to volunteer and really had been more likely to have been exposed and infected. The only measure we had to prevent bias was that we approached domestic services staff through their managers as well as by email, as we anticipated they would not be accessing emails. However, we had no measures to mitigate any volunteer enrolment bias. While we cannot measure the extent of this effect on the measured prevalence of antibody detection, we suspect that it is small: at the time of the study there was huge interest among staff as to whether they had yet been exposed to SARS-CoV-2, which appeared to be irrespective of perceived exposure and was reflected in the rapid enrolment; also, at that time it was still not appreciated how common asymptomatic COVID was, yet a large number of people without any prior symptoms enrolled. Even if volunteer bias were to artificially increase the observed prevalence, we think any volunteer bias would have been equal across all the categories we compared, and so would not alter the validity of these comparisons. We have altered the discussion to better reflect this - see response to 'Discussion point 3'.

3. "These were categorised as: i), diagnosed with COVID-19 and confirmed by NAAT, ii), clinically diagnosed with COVID-19 but NAAT not performed, and iii), self-reported symptoms only21. Together, we defined these three groups as "symptomatic", as asymptomatic testing was only introduced after the study recruitment period." If we expect a proportion of those in groups ii) and iii) are not true COVID-19 cases, how would that affect the results? Were any sensitivity analyses performed to address this uncertainty?
David Hodgson: Just to be clear, case positivity was defined through seropositivity, and this sentence refers to the classification of these seropositive individuals as either a symptomatic or asymptomatic COVID-19 case. For patients in group ii), the COVID-19 case was diagnosed by a clinical professional but a NAAT test was not possible at the time. It is therefore unlikely, especially given the short time-scale between reporting COVID-like symptoms (Feb 2020) and the first bleed of the experiment (Apr 2020), that this is a misdiagnosed case. For people in group iii), it is possible that some were asymptomatic for COVID-19 and were reporting symptoms of another respiratory disease; however, the sample size of this group is small (22/340 (6.4%) of symptomatic cases) and even if such misdiagnoses were common, it is unlikely to impact the results.

Regarding the Stan code and raw MCMC outputs: there is a typo in the README file, and these files are actually contained in the 'include' folder. This has now been updated in the GitHub repository. Thank you for pointing out this error.
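The scale of the worst case described above can be checked with simple arithmetic. A minimal sketch, using only the counts quoted in the response (22 of 340 symptomatic cases in group iii):

```python
# Back-of-envelope check: even if every self-report-only case (group iii)
# were a misdiagnosis, the symptomatic category changes by only a few percent.
symptomatic_total = 340   # seropositive HCWs classed as symptomatic
group_iii = 22            # of these, self-reported symptoms only
share_iii = group_iii / symptomatic_total          # roughly 6% of cases
# Worst case: reclassify every group iii case as asymptomatic.
worst_case_symptomatic = symptomatic_total - group_iii
```

This is why the response argues that misdiagnoses in group iii are unlikely to change the overall results.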

"For the antibody kinetics model, we included samples from seropositive individuals in a
Bayesian multilevel linear regression model in two parts:…" Does this refer to the HCWs who were positive in the first blood draw only? David Hodgson: We included HCWs who were positive at both their first and second bleed as stated in the Extended Data (reference 21 in the manuscript) Manuscript edited as following: "For the antibody kinetics model, we included samples from individuals who were seropositive at both bleeds, in a Bayesian multilevel linear regression model in two parts:" 6. "In our heterogeneous sensitivity and specificity model, we explored how estimates derived from our assay validation dataset generalise to covariate groups, e.g. participant age." What is the assay validation dataset? It is probably clearer to move the explanation from results to the methods section.
Hayley Colton: The validation of the in-house assay has now been added as an objective: "To achieve this, we sought to measure SARS-CoV-2 antibody titres by developing and using an in-house assay, prior to using statistical modelling to explore risk factors associated with seropositivity, the evolving antibody response, and the impact of age on assay sensitivity" The development technique of the in-house assay has been moved to methods, whilst the validation results have been moved to results.
7. "In our heterogeneous sensitivity and specificity model, we explored how estimates derived from our assay validation dataset generalise to covariate groups, e.g. participant age. To model the generalisability of these performance measures, we compared the seropositivity classification of our study dataset using our in-house antibody assay, with the predicted seropositivity classification from hypothetical assays with a quoted sensitivity and specificity. Our" These sentences are a little confusing. Firstly, what does 'estimates' in 'estimates derived from our assay validation dataset generalise to covariate groups, e.g. participant age' refer to? Is it referring to sensitivity and specificity? Secondly, "we compared the seropositivity classification of our study dataset using our in-house antibody assay" sounds like you compared the classification using the antibody assay. Thirdly, in "with the predicted seropositivity classification from hypothetical assays with a quoted sensitivity and specificity", what does 'quoted' mean? Does it mean 'assumed sensitivity and specificity since it is a hypothetical assay? David Hodgson: In answer to the reviewer's first question; yes, "estimates" in this sentence refers to sensitivity and specificity. For the reviewer's second point, we have clarified the sentence to ensure that this confusion does not arise. Finally, for the reviewer's third point, yes, "quoted" in this sentence can be read as "assumed". With these comments in mind, we have changed the text accordingly: "In our heterogeneous sensitivity and specificity model, we explored how estimates for the sensitivity and specificity derived from our assay validation dataset generalise to covariate groups, e.g. participant age. 
To model the generalisability of these performance measures, we compared the seropositivity classification of our study dataset according to our in-house antibody assay, with the predicted seropositivity classification from hypothetical assays with an assumed sensitivity and specificity."

Results:
3. Figure 1: Why do the black stars not overlap with the unadjusted mean estimates?

David Hodgson: We remind the reviewer that here "unadjusted" means relative to the primary exposure (job, zone, wards), and that regularisation from the hierarchical regression analysis means that estimates from this model are likely to differ from the raw point estimates. E.g. the oldest (60+ years) age group had a much smaller sample size than the other age groups, and is therefore more susceptible to regularisation in a regression analysis.
4. Why were zones, job and wards not used in the same model? Was there model selection done?

David Hodgson: There were two issues with using more than one of the primary exposure variables (zones, jobs, and wards) in the model. First, we found multicollinearity between these exposures, making parameter inference difficult to interpret when more than one primary exposure was included in the regression model. Second, the number of categories in these exposures gave rise to data sparsity issues (many groups had 0 entries), again meaning that results were unreliable. These reasons are explained in the Extended Data (reference 21 in the manuscript).
5. Figure 1: The raw data showed that a higher proportion of HCWs who were >60 years old were seropositive compared to the other age groups. However, there's no longer this association after adjustment. What was/were the confounder(s)?

David Hodgson: This is due to regularisation in the hierarchical regression model. The oldest (60+ years) age group had a much smaller sample size than the other age groups, and is therefore more susceptible to regularisation in a regression analysis. In other words, the magnitude of the high seroprevalence observed in the older age group (with such a small sample size) is unlikely to be a true reflection of the seroprevalence in that age group, given the consistency in seroprevalence across the other age groups, which have large sample sizes.
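The regularisation effect described in this response can be illustrated with a small partial-pooling sketch. This is not the authors' Stan model, and all counts below are invented; `kappa` plays the role of a Beta prior's concentration, centred on the overall rate:

```python
# Illustrative partial pooling: each group's raw seroprevalence is shrunk
# toward the overall rate, with small groups shrunk the most.

def pooled_estimate(pos, n, overall_rate, kappa=50):
    """Shrink a group's raw seroprevalence toward the overall rate, as a
    Beta(kappa*overall_rate, kappa*(1-overall_rate)) prior would."""
    return (pos + kappa * overall_rate) / (n + kappa)

# (positives, sample size) per age group -- hypothetical numbers
groups = {"<=30y": (60, 300), "31-60y": (150, 800), ">60y": (12, 30)}
overall = sum(p for p, _ in groups.values()) / sum(n for _, n in groups.values())
shrunk = {name: pooled_estimate(p, n, overall) for name, (p, n) in groups.items()}
# The small >60y group (raw 40%) is pulled well toward the overall rate,
# while the large groups barely move -- analogous to the attenuation of the
# oldest age group's estimate described in the response above.
```

Running this, the >60y estimate lands between the overall rate and its raw 40%, whereas the 31-60y estimate stays within a fraction of a percentage point of its raw value.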
6. In the antibody kinetics model, none of zones, job and wards were used as an independent variable. As mentioned in the earlier point, would this finding of age and higher titres modulate/disappear if zones, job and/or wards were included in this model as well?

David Hodgson: As explained in the response to the reviewer's main comment above, the similarity between the unadjusted and adjusted seroprevalence estimates suggests that none of zones, jobs or wards were strong confounders of the age-dependent prevalence; they were therefore dropped in the subsequent models for parsimony.

7. Figure 3: Please specify the meaning of 'control' in the x-axis.

David Hodgson: Here the control means the "Sensitivity of assay validation dataset." This change has been made to Figure 3.
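Returning to the heterogeneous sensitivity and specificity model discussed in the Methods responses above: the idea of predicting classifications from a hypothetical assay with an assumed sensitivity and specificity can be sketched as a small simulation. All numbers are invented and this is not the authors' model:

```python
# Minimal simulation: the in-house assay's calls are treated as truth, and a
# hypothetical assay with assumed performance classifies each sample at random.
import random

def hypothetical_calls(truth, sensitivity, specificity, rng):
    """Predict classifications from an assay with the given performance."""
    calls = []
    for positive in truth:
        if positive:
            calls.append(rng.random() < sensitivity)       # true/false negative
        else:
            calls.append(rng.random() >= specificity)      # false positive
    return calls

rng = random.Random(1)
truth = [True] * 500 + [False] * 1500                      # 25% true prevalence
calls = hypothetical_calls(truth, sensitivity=0.9, specificity=0.99, rng=rng)
observed_prevalence = sum(calls) / len(calls)
# Imperfect sensitivity pulls the observed prevalence below the true 25%,
# which is how age-varying sensitivity could bias seroprevalence estimates.
```

Comparing such predicted classifications against the in-house calls is, in essence, the comparison the quoted Methods text describes.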

Discussion:
1. Were the infection prevention and control measures similar in the ED and AMU? Could the higher prevalence in AMU be due to transmission from asymptomatic/mildly symptomatic patients who were not triaged to be isolated by the ED?