Advancing Toxoplasma gondii multiplex serology

ABSTRACT Toxoplasma gondii is a highly prevalent pathogen causing zoonotic infections with significant public health implications. Yet, our understanding of long-term consequences, associated risk factors, and the potential role of co-infections is still limited. Seroepidemiological studies are a valuable approach to address open questions and enhance our insights into T. gondii across human populations. Here, we present substantial advancements to our previously developed T. gondii multiplex serology assay, which is based on the immunodominant antigens SAG1 and P22. While our previous bead-based assay quantified antibody levels against multiple targets in a high-throughput fashion requiring only a small sample volume, impaired assay characteristics emerged in sample dilutions beyond 1:100 and when the assay was transferred to magnetic beads. Both are now critical for inclusion in large-scale seroprevalence studies. Using the truncated versions, SAG1D1 and P22trunc, significantly enhanced signal-to-noise ratios were achieved with almost perfect concordance with the gold-standard Sabin–Feldman dye test. In sample dilutions of 1:100, the diagnostic accuracy of SAG1D1 and P22trunc reached sensitivities (true positive rates) of 98% and 94% and specificities (true negative rates) of 93% and 95%, respectively. Importantly, performance metrics were reproducible in a 1:1,000 sample dilution, using both magnetic and nonmagnetic beads. Thresholds for seropositivity were derived from finite mixture models and performed as well as thresholds derived by receiver operating characteristic analysis. Our improved multiplex serology assay is therefore able to generate robust and reproducible performance metrics under various assay conditions. Inclusion of T. gondii antibody measurements, alongside those for other pathogens, in multiplex serology panels will allow for large-scale seroepidemiological research.
IMPORTANCE Toxoplasma gondii is a pathogen of significant public health concern due to its widespread prevalence and zoonotic potential. However, our understanding of key aspects, such as risk factors for infection and disease, potential outcomes, and their trends, remains limited. Seroepidemiological studies in large cohorts are invaluable for addressing these questions but remain scarce. Our revised multiplex serology assay equips researchers with a powerful tool capable of delivering T. gondii serum antibody measurements with high sensitivity and specificity under diverse assay conditions. This advancement paves the way for the integration of T. gondii antibody measurements into multi-pathogen multiplex serology panels, promising valuable insights into public health and pathogen interactions.

Thank you for submitting your paper to Spectrum.

Sincerely, Mark Pandori Editor Microbiology Spectrum
Reviewer #1 (Comments for the Author): I am including a document with all of my specific comments.
Most comments revolve around removing statements about reproducibility and robustness of results, as the authors' own discussion suggests that more testing is needed before conclusions about robustness and reproducibility can be made.
I support all conclusions on the improved assay development.
The other major comment is on clarification of the statistics and modeling done for calculating threshold values. It is unclear if either method used a training set of samples and then tested the model on the remaining set of samples, or if all samples were used to generate the threshold models. I am supportive of reproducibility and robustness claims if the analysis was actually done in this way and was just not clear to me.

Figures are well prepared.
Please note a few comments on background and some grammar suggestions.
Reviewer #2 (Comments for the Author): Dear Author, this is a very good paper, but it needs some changes.
Line 29: P22trunc reached sensitivities of 98% and 94% and specificities of 93% and 95%, respectively.
-"Sensitivities" and "specificities" have not previously been defined and it would be helpful to clearly state what they signify or mean.
Line 30: Importantly, however, these results were reproducible in a 1:1,000 sample dilution, using both
-Delete "however"
-Specify which results were reproducible.
Line 55: The major source of these environmental contaminations is suggested to be domestic cats (4,5).
-Mentioning that infected cats are the definitive host for the parasite and cause this contamination through shedding of oocysts in feces gives the reader a better understanding of how the parasite spreads.
Line 58: In congenitally infected newborns and immunocompromised individuals, e.g. tissue-transplanted or HIV-infected individuals, T. gondii may proliferate, resulting in a range of severe or life-threatening conditions, e.g. encephalitis (1).
-Consider including the understanding that both acute and latent chronic infections can escape immune control and be life-threatening to immunocompromised individuals. This would help the transition to, and give more weight to, the following paragraph.
Line 102: All samples had been stored at -20°C.
-Wonderful. This adds value to the study showing the robustness of the method, allowing multiple freeze/thaw cycles of sample according to previous publications.
Line 96: Serum samples for the validation of the T. gondii multiplex serology assay were kindly provided by the Toxoplasma Reference Unit of Public Health Wales (Prof. Edward Guy)
-More information on these samples would be important. For example, where were the samples collected and is it known what strains or "(types I, II, III)" the infected individuals were infected with? I see that some of this information is included in other papers, but referencing a few of the features of this data set is relevant.
-This would also lead into the needed important background on the expression of each of these antigens being tested in the different T. gondii strains, as prevalence of the different strains varies greatly among locations and populations being tested. I would strongly suggest citing some of John Boothroyd's lab's recent work here for the introduction.
Line 187: Overall, the signal-to-noise ratio was increased for both truncated antigens with MFI values reaching on average 2.9-times higher MFI values for SAG1D1 and 1.6-times for P22trunc compared to the respective full-length versions (Table 2, Figure 1).
-This statement is accurate, but when working with MFI, referencing fold changes instead of raw numerical MFI changes can be misleading. For example, the difference in an MFI of 10 and 30 results in a 3-fold difference but only a 20-unit difference. The difference in an MFI of 50 and 100 results in a 2-fold difference but a 50-unit difference. This exact scenario occurs in the next comment. The larger raw unit difference can be what matters most for being able to differentiate between real signal and noise in most experiments. If this is not the case for the models being run, then an explanation of this should be included.
Line 199: Linear regression revealed that the difference in antibody levels between full-length and truncated antigens was even more prominent than in the 1:100 serum dilution, with MFI values being on average 4.4-times increased for SAG1D1 and 2.3-times for P22trunc compared to the respective full-length versions (Table 2, Figure 1). Correlations were also higher and reached Pearson's r values of 0.86 and 0.95, respectively (Table 2).
-It is okay to point out that the fold difference is greater in the 1:1000 dilution, but the raw difference in MFI is only greater for SAG1D1 and not P22trunc.Neglecting this point can be misleading.
-The same statement stands for fold vs. raw unit increases.

Discussion
Line 267: These results were reproducible using different bead types and methods to determine thresholds for seropositivity with Cohen's kappa being consistently above 0.9.
-It is clear that ROC analysis was done and finite mixture models were created, but previous papers do go into more detail about these methods than this paper does, and so some questions still remain. Was the model to determine the threshold for seropositivity developed using only a subset of the samples and then tested on the remaining samples, or does the model determining the threshold for seropositivity still need to be tested using new samples to measure how accurately it can determine positive from negative samples? Further statements in the discussion suggest that the model has not been tested on other samples, but if that is incorrect then further explanation is needed.
Line 259: As T. gondii usually persists in a latent form, antibody levels decline but do usually not disappear.
-If desired, the authors can support this statement with background on how infection will continually go through small spontaneous reactivations from cysts, making sero-reversion more than exceedingly rare.
Line 362: Although, our results showed that thresholds derived from finite mixture models seem promising for T. gondii multiplex serology, the exploratory approach requires further validation in large-scale studies. Modeling thresholds might also be transferrable to other antigens if both subpopulations (seropositive and seronegative individuals) are large enough. Similarly, validation studies for all antigens are essential, as antibody kinetics might differ and be more complex due to reinfection rates, cross-reactivity, population differences, age-dependence, and others (18,35,36).
-This statement is accurate, but it contradicts the previously and subsequent conclusion that the method generates "robust and reliable results" that could help conduct sero-epidemiological research.Either training the model on a subset of the already collected data and testing it on the other subset, or ideally testing the current threshold models on new samples, is needed to claim that the method can reliably and robustly generate data usable to conduct sero-epidemiological research.
It is also acceptable to remove these claims, or change them and instead focus on the proven improvements on assay sensitivity. This could still include a proposed threshold for sero-positivity that could be tested in future experiments. That is already a significant finding.
-Either delete "and others", or include "variables such as" before "reinfection rates".

Introduction
Toxoplasma gondii (T. gondii) is an obligate intracellular parasite capable of infecting a wide range of host species, including humans (1). According to sero-epidemiological studies, approximately one third of the global human population is infected, with particularly high seroprevalences in parts of Africa, Oceania, South America and Europe (2,3).
Infections are acquired by ingestion of tissue cysts in undercooked meat from infected animals or by ingestion of oocysts from contaminated food or water. The major source of these environmental contaminations is suggested to be domestic cats (4,5). Acute infection with T. gondii is most commonly asymptomatic and transitions into a latent chronic state, which is kept in check by the immune system (6). In congenitally infected newborns and immunocompromised individuals, e.g. tissue-transplanted or HIV-infected individuals, T. gondii may proliferate, resulting in a range of severe or life-threatening conditions, e.g. encephalitis (1).
Given its high prevalence and the potentially severe outcomes in some individuals, the interest in studying long-term effects of T. gondii infections is high, especially as reports about associations with neurobehavioral changes and disorders, neurological conditions and different types of cancer accumulate (7-10).
Focusing on sero-epidemiological research, we previously developed a bead-based multiplex serology assay incorporating the two immunodominant T. gondii antigens, SAG1 (P30) and P22 (SAG2A). It was validated against a reference population tested with the gold-standard Sabin-Feldman dye test and achieved a sensitivity of 92.2% and a specificity of 92.0%.
The multiplex serology assay has been used to simultaneously determine antibody responses against T. gondii and other infectious agents in a high-throughput fashion (11-14).
Unfortunately, we observed a reduced sensitivity at the 1:1,000 sample dilution now being utilized, which potentially weakened epidemiological associations (12). Since 1:1,000 is the most common dilution in multi-pathogen studies, the exclusion of T. gondii from further studies was considered.
Here, however, we present how we addressed and overcame these limitations by exchanging the T. gondii antigens SAG1 and P22 for the truncated versions SAG1D1 and P22trunc, potentially increasing the accessibility of known relevant epitopes (15-17). We determined the diagnostic accuracy in two common sample dilutions (1:100 and 1:1,000) using non-magnetic and magnetic beads. The latter are of special importance in preparing for the automation of large-scale studies, as magnetic beads can be integrated into liquid handling platforms more easily.
Furthermore, we applied ROC analyses to generate thresholds for seropositivity and compared them with thresholds that were derived from finite mixture models. This alternative method to determine thresholds for seropositivity could be helpful in case antibody responses are systematically affected by experiment-specific or study-specific conditions, e.g. bead types, sample dilutions or population differences (18).
Overall, the improved T. gondii multiplex serology assay we present here enables its inclusion into future multi-pathogen disease surveillance studies. These will provide critical sources of evidence to underpin infection risk assessment, and thereby inform future public health policy and decision-making.

Reference population
Serum samples for the validation of the T. gondii multiplex serology assay were kindly provided by the Toxoplasma Reference Unit of Public Health Wales (Prof. Edward Guy). All participants were tested in the Sabin-Feldman dye test applying a cutoff level of 2 IU/ml. The assay was calibrated against the WHO ToxoG International Standard Preparation (19).
From the original 198 serum samples used for the initial validation of SAG1 and P22, a total of 162 samples (121 samples with a seropositive and 41 samples with a seronegative reference status for T. gondii) had sufficient residual volume for the present re-validation (14). All samples had been stored at -20°C.

T. gondii antigens
All T. gondii antigens were expressed as recombinant fusion proteins in E. coli BL21, except for P22, which was expressed in E. coli BL21 Rosetta. SAG1 and P22 derive from the initial multiplex serology assay, while SAG1D1, SAG1C, SAG1Pept and P22trunc were produced for the re-validation presented here (Table 1). Their design is based on structural information deposited in UniProt, as well as published studies on immunogenicity and epitopes (15-17).
Protein sequences were obtained from UniProt and EMBOSS backtranseq was used for backtranslation and codon optimization (20,21). The respective DNA sequences were synthesized and subcloned into the pGEX-4T3tag vector by Eurofins Genomics. This vector encoded an N-terminal GST, as well as a C-terminal peptide comprising eleven amino acids of the large T-antigen of the simian virus 40 (22). To verify DNA and protein integrity, plasmids were sequenced and recombinant proteins were characterized by Western blots and ELISAs. Successful full-length expression was accomplished for all of the fusion proteins; however, SAG1C and SAG1Pept did not show promising results in serological pilot experiments. While SAG1Pept turned out to be non-immunogenic, SAG1C antibody levels were high, but not discriminative for T. gondii infection status (Supplementary Figure 1). Hence, the multiplex serology results for these two antigens were not further included in the results.
In addition to the T. gondii antigens, a series of common control antigens was prepared accordingly, to check for variation across experiments.These were the envelope glycoproteins gE (ORF68) and gI (ORF67) of the Varicella zoster virus and the capsid protein VP1 of the human polyomavirus 6 (23,24).

Multiplex Serology
Multiplex Serology was performed as described previously (14,22). Briefly, recombinant GST-fusion proteins were affinity-purified on color-coded SeroMAP™ (non-magnetic) or MagPlex® (magnetic) beads (Luminex Corp., Austin, TX, USA), which had been derivatized with glutathione-casein. Serum samples were diluted in preincubation buffer, incubated for 1 h under agitation and added to the bead mix, which contained the respective antigens of interest, in a final serum dilution of 1:100 or 1:1,000.
A Luminex 200 Analyzer (Luminex Corp., Austin, TX, USA) was used to determine the bead sort and quantify bound serum antibodies, given as median fluorescence intensity (MFI). For magnetic MagPlex® beads a Luminex FLEXMAP 3D® was used, which generated 1.7-times higher MFI values (according to the manufacturer) due to hardware/photomultiplier differences.
Assigning different antigens to spectrally distinguishable bead sorts allows quantifying antibody responses against all antigens simultaneously in a single reaction. Yet, the full-length versions (SAG1 and P22) were measured separately from the truncated versions (SAG1D1 and P22trunc) to prevent competing antibody responses due to large overlapping regions.
Common control antigens were included in both multiplex serology panels to verify comparability between the measurements (Supplementary Figure 2).

Statistical evaluation
Raw MFI values were corrected for bead-specific background values and serum-specific GST-background signals to obtain net MFI values. They were characterized by medians and interquartile ranges (IQR).
To compare antibody responses between two antigens, linear regression was applied and numerical correlation was assessed by Pearson's r. It was interpreted as follows: r < 0.30, slight correlation; 0.30 < r < 0.50, moderate correlation; 0.50 < r < 0.80, strong correlation; r > 0.80, very strong correlation.
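The interpretation scale above can be sketched in a few lines of Python. This is only an illustration, not the authors' R code: the function names `pearson_r` and `interpret_r` are hypothetical, and treating each lower bound as inclusive is an assumption, since the text leaves the boundary values (exactly 0.30, 0.50, 0.80) unassigned.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def interpret_r(r):
    """Map r to the verbal categories used in the text.

    Assumption: boundary values fall into the higher category.
    """
    if r < 0.30:
        return "slight"
    if r < 0.50:
        return "moderate"
    if r < 0.80:
        return "strong"
    return "very strong"
```

With this mapping, r values of 0.76 and 0.88 fall into the "strong" and "very strong" categories, respectively.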
In order to compare multiplex serology results to the gold-standard Sabin-Feldman dye test, receiver operating characteristic (ROC) analyses were performed and the area under the curve (AUC) was reported to estimate the test accuracy. Paired ROC curves were considered significantly different if the p-value was below 0.05 applying DeLong's test.
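As a rough illustration of the ROC step, the following Python sketch computes an AUC and the cutoff that maximizes Youden's index (sensitivity + specificity - 1) on a small set of made-up MFI values. The analysis in the paper was run with the R package pROC; the function names, the toy data and the convention that a sample is positive when its MFI is at or above the cutoff are assumptions for this example, and DeLong's test is not covered.

```python
def auc(scores, labels):
    """AUC as the probability that a positive sample outranks a negative one.

    Ties between a positive and a negative sample count as 0.5.
    """
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's index.

    Every observed score is tried as a cutoff; score >= cutoff means positive.
    """
    n_pos = sum(1 for l in labels if l)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l and s >= cutoff)
        tn = sum(1 for s, l in zip(scores, labels) if not l and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (cutoff, sens, spec)
    return best


# Made-up net MFI values; 1 = reference-positive, 0 = reference-negative.
toy_mfi = [20, 35, 50, 300, 60, 400, 700, 900, 1500]
toy_ref = [0, 0, 0, 0, 1, 1, 1, 1, 1]
toy_auc = auc(toy_mfi, toy_ref)
toy_cutoff, toy_sens, toy_spec = youden_threshold(toy_mfi, toy_ref)
```

Because the cutoff is chosen on the same samples it is evaluated on, a threshold found this way is optimistic by construction.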
In order to calculate concordance with the reference status, continuous MFI values were dichotomized using antigen-specific thresholds. These thresholds were derived either from maximizing Youden's index (ROC analysis) or from finite mixture modeling, using an expectation-maximization (EM) algorithm to fit a bimodal distribution after log-transforming MFI values (25).
Thresholds corresponded to the local minima of the density curves.
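The mixture-model threshold can be sketched as follows. This is a simplified stand-in and not the mixsmsn implementation used in the paper: it fits only two normal components (no skew-normal option), uses a plain hand-written EM loop, works on simulated log10 MFI values, and locates the density minimum on a grid between the two fitted means. The base of the logarithm, the starting values and all numbers in the simulated data are assumptions.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normals(values, n_iter=200):
    """Fit a two-component normal mixture with a basic EM loop."""
    xs = sorted(values)
    half = len(xs) // 2
    mean_all = sum(xs) / len(xs)
    sd_all = math.sqrt(sum((v - mean_all) ** 2 for v in xs) / len(xs))
    # Starting values: means of the lower and upper half, common SD.
    weights = [0.5, 0.5]
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    sds = [sd_all, sd_all]
    for _ in range(n_iter):
        # E-step: posterior probability of each component per sample.
        resp = []
        for v in xs:
            p = [weights[k] * normal_pdf(v, means[k], sds[k]) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total) if total > 0 else (0.5, 0.5))
        # M-step: update weights, means and SDs from the responsibilities.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * v for r, v in zip(resp, xs)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return weights, means, sds


def density_minimum(weights, means, sds, steps=2000):
    """Grid search for the mixture density minimum between the two means."""
    lo, hi = min(means), max(means)

    def density(v):
        return sum(weights[k] * normal_pdf(v, means[k], sds[k]) for k in (0, 1))

    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=density)


# Simulated log10 net MFI values (not study data): a smaller seronegative
# group and a larger seropositive group.
random.seed(1)
log_mfi = [random.gauss(1.5, 0.3) for _ in range(41)]
log_mfi += [random.gauss(3.2, 0.4) for _ in range(121)]

weights, means, sds = fit_two_normals(log_mfi)
threshold_log10 = density_minimum(weights, means, sds)
threshold_mfi = 10 ** threshold_log10  # back-transformed to the MFI scale
```

No reference status enters this calculation, which is what makes the approach attractive when a gold-standard result is unavailable.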
All analyses were conducted using R 4.1.3 and the packages pROC and mixsmsn (25,26).
Overall, the signal-to-noise ratio was increased for both truncated antigens with MFI values reaching on average 2.9-times higher MFI values for SAG1D1 and 1.6-times for P22trunc compared to the respective full-length versions (Table 2, Figure 1). Yet, numerical correlations between the paired antigens were strong and very strong with Pearson's r values of 0.76 and 0.88, respectively (Supplementary Figure 3).
Linear regression revealed that the difference in antibody levels between full-length and truncated antigens was even more prominent than in the 1:100 serum dilution, with MFI values being on average 4.4-times increased for SAG1D1 and 2.3-times for P22trunc compared to the respective full-length versions (Table 2, Figure 1). Correlations were also higher and reached Pearson's r values of 0.86 and 0.95, respectively (Table 2).
In order to assess if antibody levels can be used to distinguish seropositive from seronegative individuals, ROC analyses were performed. While antibody responses against the full-length antigens SAG1 and P22 reached good AUCs between 0.88 and 0.94, the truncated versions significantly outperformed them with AUCs achieving values of 0.98 and 0.99 and non-overlapping 95% CI (Figure 1).

Non-magnetic vs. magnetic beads
The T. gondii multiplex serology assay was originally validated using non-magnetic SeroMAP™ polystyrene beads. For large sero-epidemiological studies, however, magnetic beads can be advantageous, as they facilitate integration into liquid handling platforms. Hence, we compared antibody levels against the new truncated T. gondii antigens quantified with non-magnetic and magnetic beads in a serum dilution of 1:1,000.

Classical vs. fitted thresholds and definition of T. gondii seropositivity
As demonstrated, numerical antibody levels vary due to different sample dilutions and bead types. Additionally, they can be influenced by population differences, sample handling and readout analyzers. Thus, pre-specifying thresholds for seropositivity can be challenging.
To investigate if thresholds solely based on the overall distribution are concordant with ROC analysis-based thresholds (a), we fitted finite mixture models assuming two underlying subpopulations, which were either normal distributions (b) or skew-normal distributions (c).
They are summarized alongside sensitivities, specificities and agreement with the reference assay in Table 3.
Overall, P22trunc and SAG1D1 show a very good discrimination of seropositive from seronegative individuals in both dilutions, both bead types and any of the thresholds for seropositivity. Concordance with the reference assay is almost perfect with Cohen's kappa values from 0.81 to 0.89 for P22trunc and from 0.90 to 0.93 for SAG1D1. The only exception was the modeled threshold for SAG1D1 in a 1:1,000 dilution presuming two underlying normal distributions (b). Compared to (a) and (c), the threshold of 220 MFI was higher, which led to false negative classifications and a reduced sensitivity of 82%. Apart from this, sensitivities and specificities were consistently high and reached values up to 98%.
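For readers less familiar with these metrics, sensitivity, specificity and Cohen's kappa all follow from the 2x2 table of assay result against reference status. The Python sketch below shows the arithmetic; the function name and the example counts are hypothetical and are not taken from Table 3, although the group sizes mirror the 121 reference-positive and 41 reference-negative samples.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and Cohen's kappa from a 2x2 table.

    tp/fn: reference-positive samples called positive/negative by the assay.
    tn/fp: reference-negative samples called negative/positive by the assay.
    """
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for illustration only.
example_sens, example_spec, example_kappa = diagnostic_metrics(
    tp=118, fn=3, tn=39, fp=2
)
```

Kappa corrects the raw agreement for the agreement expected by chance, so in a reference set dominated by seropositive samples it is lower than the simple proportion of concordant results.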
To test if a combination of SAG1D1 and P22trunc could further improve the overall performance of T. gondii multiplex serology, the respective antibody responses were plotted against each other (Figure 3). With any of the given thresholds for seropositivity, the agreement between the two antigens is very high and only a few samples show a discordant result.
Combining the T. gondii antigens, an overall seropositivity could be defined as either being seropositive for both antigens or at least one antigen.
Applying thresholds derived from ROC analyses, seropositivity to both antigens sets the specificity to 95% in a 1:100 sample dilution and 100% in a 1:1,000 sample dilution using non-magnetic or magnetic beads. Simultaneously, sensitivities reach 94%, 94% and 97%, respectively.
If an overall seropositivity for T. gondii is defined as being seropositive to at least one of the two antigens, sensitivities increase to 98%, 99% and 99%, respectively, while specificities decrease to 93%, 93% and 90%, respectively. Specificity and sensitivity values for modeled thresholds achieve comparable performances.
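The two definitions of overall seropositivity amount to a logical AND or OR over the per-antigen calls, as the minimal Python sketch below illustrates. The function name, the rule labels and the cutoff values are placeholders chosen for this example and are not the thresholds reported in Table 3.

```python
def overall_seropositive(sag1d1_mfi, p22trunc_mfi, cutoffs, rule="both"):
    """Combine two antigen-specific calls into one overall status.

    cutoffs: dict with one MFI threshold per antigen (placeholder values below).
    rule: "both" requires both antigens to be positive (favors specificity),
          "either" requires at least one (favors sensitivity).
    """
    sag1d1_pos = sag1d1_mfi >= cutoffs["SAG1D1"]
    p22trunc_pos = p22trunc_mfi >= cutoffs["P22trunc"]
    if rule == "both":
        return sag1d1_pos and p22trunc_pos
    if rule == "either":
        return sag1d1_pos or p22trunc_pos
    raise ValueError("rule must be 'both' or 'either'")


# Placeholder cutoffs for illustration only.
placeholder_cutoffs = {"SAG1D1": 100, "P22trunc": 150}
discordant_strict = overall_seropositive(250, 90, placeholder_cutoffs, rule="both")
discordant_lenient = overall_seropositive(250, 90, placeholder_cutoffs, rule="either")
```

A sample that is positive for only one antigen is therefore classified differently under the two rules, which is where the trade-off between sensitivity and specificity described above comes from.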

Discussion
Here, we present recent advancements in our T. gondii multiplex serology. By exchanging SAG1 and P22 for their truncated versions SAG1D1 and P22trunc, we substantially increased the magnitude of measured antibody responses by up to 4.4 times. This broadens the range of sample dilutions in which antibodies against T. gondii can be reliably determined without losses in sensitivity or specificity. These results were reproducible using different bead types and methods to determine thresholds for seropositivity with Cohen's kappa being consistently above 0.9.
Our multiplex serology assay is based on the two immunodominant proteins P22 (SAG2A) and SAG1 (P30). Both are glycosylphosphatidylinositol (GPI)-anchored proteins and predominantly found on tachyzoites, the rapidly proliferating life stage of T. gondii (27). They are involved in attachment and invasion and, hence, an early target for the immune system.
Their potential as serological targets was recognized early (28,29). Since then, modern methods have allowed characterization in more detail.
Macêdo et al. investigated the three-dimensional structure of P22 and described an intrinsically unstructured loop at the C-terminal end of the single domain protein. In vitro experiments revealed that this loop actively suppresses pro-inflammatory responses in cells of the immune system (16). Generally, unstructured regions are more flexible and often have multiple interaction partners. This attribute could, however, cause problems in serological assays, e.g. by sterically shielding off epitopes or aggregating beads. Hence, we decided to remove the C-terminus, but keep a known epitope intact which comprises amino acids 137-141 (17).
This truncation significantly increased the amount of bound serum antibodies by 1.6 or 2.3 times in a serum dilution of 1:100 or 1:1,000, respectively. Whether this presumed epitope shielding only occurs in the artificial environment of serological assays or is an actual physiological defense mechanism requires further studies.
The second antigen SAG1 comprises an N-terminal domain D1, a C-terminal domain D2 and an unstructured loop at the C-terminal end (30). Even before these structural domains were described, the N-terminal region was found to harbor most immunodominant epitopes (31).
Furthermore, the D1 domain has fewer polymorphisms than the rest of the protein and other SAG proteins, which makes it a stable serological target (30). Using SAG1D1 as an antigen in multiplex serology increased the bound serum antibodies 2.9 and 4.4 times in dilutions of 1:100 and 1:1,000, respectively. Again, we hypothesize that this was caused by a substantially better accessibility of relevant epitopes and/or removal of non-specific epitopes.
Additionally, we tested the C-terminal end as a potential antigen, as Wang et al. described a highly specific epitope located at amino acids 313-332 (15). In our reference population, however, SAG1C showed no discrimination between sera from individuals with or without a positive reference status for T. gondii. Although the sequence does not show homologies with any other pathogenic species, this might still be caused by cross-reactive antibodies or by high background values due to the nature of the unstructured loop.
Another peptide comprising amino acids 103-116 (SAG1Pept), which comprised an exposed loop between two beta sheets, showed barely any antigenicity in our multiplex serology assay (15). Although the epitope selections were based on the publication by Wang et al., the exact constructs differed slightly, as we tried to account for the predicted 3D structure found in the respective UniProt entries (20). Furthermore, these epitopes were identified using pig serum and might therefore not directly be transferable to humans (15).
An advantage of multiplex serology is that multiple antigens from various pathogens can be combined into a single reaction, which is cost-effective and saves patient material. For a few pathogens, however, this may cause a diminished sensitivity or specificity if the final assay dilution is not ideal. This tradeoff was made for T. gondii in a large sero-epidemiological study comprising approximately 10,000 participants from the UK Biobank (UKB) (12). Despite a reduced sensitivity in a 1:1,000 sample dilution, association analyses were still robust and consistent with the literature, but effect sizes might have been weakened.
Nevertheless, excluding T. gondii from similar multi-pathogen studies, which are currently planned, was discussed. Using the truncated T. gondii antigens SAG1D1 and P22trunc obviates this consideration, as ROC analyses show that AUCs are very high in both common assay dilutions, 1:100 and 1:1,000. Furthermore, both antigens perform equally well on magnetic beads with numerical correlation values of 0.92 and 0.94 for SAG1D1 and P22trunc. The principal difference was a systematic increase of the quantitative antibody response using magnetic beads by a factor of 6.2 or 4.0, respectively. Increased MFI values have been reported for magnetic beads and were additionally magnified here by using a Luminex FLEXMAP 3D as opposed to a Luminex 200 as output device (12,32,33).
Multiplex serology is usually robust against these systematic variations for most pathogens.
As presented in the UKB assay validation, comparing non-magnetic beads to magnetic beads resulted in a median ICC (intraclass correlation coefficient) of 0.94 (12). With an ICC of 0.48, T. gondii multiplex serology was a strong outlier. As antibody responses are in a low to midrange, the threshold for seropositivity is prone to systematic assay variation, e.g. induced by different bead types. Unfortunately, it is not always possible to control or foresee these variations, as they might be caused by sample preparation, storing conditions, population differences, etc. (18).
Considering that thresholds may need adjustments in order to account for sample-specific characteristics, e.g. storage conditions, assay components, non-magnetic vs magnetic beads, or sample dilutions, an automated approach would be of help to generate reproducible and reliable results. Hence, thresholds tailored to the study or experiment are strongly desirable to make sero-epidemiological T. gondii research more robust. For this purpose, we examined generating thresholds by fitting finite mixture models to the overall data distribution and compared them to thresholds classically derived from ROC analyses.
Assuming that the bimodal distribution comprises a mixture of two underlying subpopulations, we applied an EM algorithm to model a mixture of two normal distributions or two skew-normal distributions (34,35).Sensitivity and specificity values that derive from applying these thresholds for seropositivity were very similar to those based on 'ideal' thresholds derived from ROC analyses.The only exception was modeling the antibody response against SAG1D1 in a 1:1,000 dilution assuming two normal distributions.Due to the high threshold, the sensitivity was decreased to 82%, while it reached 93 -99% for all other modeled thresholds.The specificity was consistently high with values between 90% and 98% for both antigens.
While normal distributions are a classical choice for finite mixture models, skew-normal distributions allow some asymmetry across subpopulations, e.g. due to seroconversion or seroreversion. Hence, skew-normal distributions have been proposed as a more suitable option for biological data (34,35).
In the case of antibodies against T. gondii, both models perform well, as no major seroconversion or seroreversion is expected. While seroconversion usually takes place within a few days to weeks after infection, seroreversion typically occurs over a longer time period after an infection is eliminated. As T. gondii usually persists in a latent form, antibody levels decline but usually do not disappear.
Although our results showed that thresholds derived from finite mixture models seem promising for T. gondii multiplex serology, the exploratory approach requires further validation in large-scale studies. Modeling thresholds might also be transferable to other antigens if both subpopulations (seropositive and seronegative individuals) are large enough. Similarly, validation studies for all antigens are essential, as antibody kinetics might differ and be more complex due to reinfection rates, cross-reactivity, population differences, age-dependence, and others (18,35,36).
Overall, we achieved substantial advancements in T. gondii multiplex serology. By truncating recombinant T. gondii antigens, we presumably enhanced the accessibility of relevant epitopes, which led to significantly increased antibody levels and an improved assay performance when evaluated against a gold-standard reference assay. Our advancements allow measuring T. gondii antibodies reliably in large multi-pathogen panels and will help to conduct sero-epidemiological research, which is crucial for disease surveillance, risk assessment, and informing public health interventions.
Tables
Table 1:
Dear Reviewers,
Thank you for carefully reviewing our manuscript and providing us with constructive feedback and insightful comments. We appreciate each suggestion and believe that the quality of the manuscript has significantly improved. Please find a point-by-point reply to the issues raised below.
Sincerely, on behalf of all co-authors,
Dr. Rima Jeske
Review 1
Goal: Sero-epidemiological studies are a valuable approach to address open questions about risk factors, potential outcomes, trends, and co-infections in large cohorts of people.
Method: T. gondii multiplex serology approach, which is based on the immunodominant antigens SAG1 and P22.

Conclusions:
1. Using the truncated versions, SAG1D1 and P22trunc, significantly enhanced signal-to-noise ratios were achieved, and with almost perfect concordance with the gold-standard Sabin-Feldman dye test. (Yes)
2. In sample dilutions of 1:100, the diagnostic accuracy of SAG1D1 and P22trunc reached sensitivities of 98% and 94% and specificities of 93% and 95%, respectively. (Yes)
3. Importantly, however, these results were reproducible in a 1:1,000 sample dilution, using both magnetic and non-magnetic beads, and across multiple methods to determine thresholds for seropositivity, including receiver operating characteristic (ROC) analysis and finite mixture models. (Yes)
4. Our improved multiplex serology assay is therefore able to generate robust and reproducible results under various assay conditions, thus enabling inclusion of T. gondii antibody measurements with other pathogens, in large multiplex serology panels for sero-epidemiological research. (No)
5. Our revised multiplex serology assay equips researchers with a powerful tool capable of delivering reliable and reproducible results under diverse assay conditions. This advancement paves the way for the integration of T. gondii antibody measurements into multi-pathogen multiplex serology panels, promising valuable insights into public health and pathogen interactions. (No)
6. Our advancements allow measuring T. gondii antibodies reliably in large multi-pathogen panels and will help to conduct sero-epidemiological research which is crucial for disease surveillance, risk assessment, and informing public health interventions. (No)
All sections highlighted red are not supported by the results. The authors state in the discussion section that the results "seem promising" but "require further validation". Claims about reproducibility and reliability of results, as well as the ability to include this test in larger studies, should not yet be made until those tests or experiments are run and shown to be reliable and reproducible.
Thank you for carefully going through the manuscript and evaluating the core claims. We appreciate your concerns regarding upscaling, multiplexing, and reproducibility and hope that we can address them accordingly.
Multiplex serology is the core technology of our research, and we have gained experience and routine over the years by analyzing numerous large serological studies. One of the main advantages is that the technique allows combining antigens from different pathogens with each other while maintaining high reproducibility. Nevertheless, we make sure to validate every single antigen thoroughly to exclude cross-reactivity or adverse effects.
In preparation for a large sero-epidemiological study, which will comprise 40,000 human serum samples, the T. gondii reference samples were re-tested (a fourth time) in a 48-plex assay. This means that antibodies against the two T. gondii antigens were quantified alongside 46 antigens from other pathogens. Compared to the monoplex assay (2 T. gondii antigens + 1 control antigen), Pearson's r reached 0.98 for SAG1D1 and 0.97 for P22trunc (Figure below).
Hence, the integration of the two T. gondii antigens into our multiplexed platform works very well. These results also support the claim that assay results are highly reproducible, which we also found for other antigens, as shown in the assay validation for the UK Biobank (reference 12). Nevertheless, we agree that a modification of the manuscript text is necessary, as these results are not included in the present validation.

was changed to:
Our improved multiplex serology assay is therefore able to generate robust and reproducible performance metrics under various assay conditions. Inclusion of T. gondii antibody measurements with other pathogens, in multiplex serology panels will allow for large-scale sero-epidemiological research.

was changed to:
Our revised multiplex serology assay equips researchers with a powerful tool capable of delivering T. gondii serum antibody measurements with high sensitivity and specificity under diverse assay conditions. This advancement paves the way for the integration of T. gondii antibody measurements into multi-pathogen multiplex serology panels, promising valuable insights into public health and pathogen interactions.

was changed to:
Our advancements will allow integrating T. gondii antibody measurements into large multi-pathogen panels, as demonstrated for other pathogens in the UKB and the China Kadoorie Biobank (12,39). This will facilitate sero-epidemiological research which is crucial for disease surveillance, risk assessment, and informing public health interventions.

Comments:
Line 14: Yet, our understanding of risk factors for infection and disease, potential outcomes and their trends, and the potential role of co-infections is still limited.
-Understanding is not limited. Much is understood from previous studies. Focus more on how increased understanding is still valuable and how it could be used.
While there has been substantial progress in the molecular and infection biology field, epidemiological studies (especially on humans) are very scarce, e.g. potential associations with cancer or neurobehavioral changes are still ambiguous, and the studies usually include only few individuals.
To put the focus on the epidemiological side, the sentence was adapted as follows: 'Yet, our understanding of long-term consequences, associated risk factors and the potential role of co-infections is still limited. Sero-epidemiological studies are a valuable approach to address open questions and enhance our insights into T. gondii in human populations.'
Line 29: P22trunc reached sensitivities of 98% and 94% and specificities of 93% and 95%, respectively.
-"Sensitivities" and "specificities" have not previously been defined and it would be helpful to clearly state what they signify or mean.
The text was adapted to: 'In sample dilutions of 1:100, the diagnostic accuracy of SAG1D1 and P22trunc reached sensitivities (true positive rates) of 98% and 94% and specificities (true negative rates) of 93% and 95%, respectively.'
Line 30: Importantly, however, these results were reproducible in a 1:1,000 sample dilution, using both
-Delete "however"
-Specify which results were reproducible.
The sentence was adapted as follows: 'Importantly, performance metrics were reproducible in a 1:1,000 sample dilution, using both magnetic and non-magnetic beads, and across multiple methods to determine thresholds for seropositivity, including receiver operating characteristic (ROC) analysis and finite mixture models.'
Line 55: The major source of these environmental contaminations is suggested to be domestic cats (4,5).
-Mentioning that infected cats are the definitive host for the parasite and cause this contamination through shedding of oocysts in feces gives the reader a better understanding of how the parasite spreads.
We added the information. The sentence is now as follows: 'The major source of these environmental contaminations is suggested to be domestic cats due to the excretion of oocysts in their feces.'
Line 58: In congenitally infected newborns and immunocompromised individuals, e.g. tissue transplanted or HIV-infected individuals, T. gondii may proliferate, resulting in a range of severe or life-threatening conditions, e.g. encephalitis (1).
-Consider including the understanding that both acute and latent chronic infections can escape immune control and be life-threatening to immunocompromised individuals. This would help the transition to, and give more weight to, the following paragraph.
Thank you for pointing this out. We added your suggestion as follows: 'In congenitally infected newborns and immunocompromised individuals, e.g. tissue-transplanted or HIV-infected individuals, acute and chronic T. gondii may escape immune control, resulting in a range of severe or life-threatening conditions, e.g. encephalitis.'
Line 102: All samples had been stored at -20°C.
-Wonderful. This adds value to the study showing the robustness of the method, allowing multiple freeze/thaw cycles of sample according to previous publications.
We try to avoid freeze/thaw cycles whenever possible and especially when dealing with large study cohorts. In some cases, e.g. when samples are retested, this is obviously not possible. It is indeed reassuring to know that the assay is still robust.
Line 96: Serum samples for the validation of the T. gondii multiplex serology assay were kindly provided by the Toxoplasma Reference Unit of Public Health Wales (Prof. Edward Guy)
-More information on these samples would be important. For example, where were the samples collected and is it known what strains or "(types I, II, III)" the infected individuals were infected with? I see that some of this information is included in other papers, but referencing a few of the features of this data set is relevant.
-This would also lead into the needed important background on the expression of each of these antigens being tested in the different T. gondii strains, as prevalence of the different strains varies greatly among locations and populations being tested. I would strongly suggest citing some of John Boothroyd's lab's recent work here for the introduction.
All sera derived from UK residents and were submitted to the Toxoplasma Reference Unit for routine toxoplasma serological testing. They comprise a proportion of individuals who will have acquired toxoplasma infection while living in the UK, together with a number of other EU citizens living in the UK and immigrants from outside the EU with a lifelong immune response against strains of toxoplasma acquired in their country of origin. While T. gondii strains are not routinely genotyped, most of them are presumably of type II, as this is the most common one in Europe/UK. Serological assays are not capable of distinguishing between strains of different types, as the cross-reactivity between antigens is too high. Prof. Boothroyd's lab published an approach based on polymorphic peptides to overcome this; however, it lacks validation in humans.
Nevertheless, our aim is to enable high-throughput testing for T. gondii infection in large patient cohorts, which works universally for all T. gondii types. Hence, we used antigens that are expressed in virtually all strains and also chose highly conserved regions.
We adapted the manuscript as follows:
Methods: 'Serum samples for the validation of the T. gondii multiplex serology assay were obtained through routine testing and kindly provided by the Toxoplasma Reference Unit of Public Health Wales (Prof. Edward Guy).'
Discussion: 'Our multiplex serology assay is based on the two immunodominant proteins P22 (SAG2A) and SAG1 (P30), which are expressed in virtually all T. gondii strains.' [Theisen, Terence C., and John C. Boothroyd. "Transcriptional signatures of clonally derived Toxoplasma tachyzoites reveal novel insights into the expression of a family of surface proteins." PLoS ONE 17.2 (2022): e0262374.]
Line 187: Overall, the signal-to-noise ratio was increased for both truncated antigens with MFI values reaching on average 2.9-times higher MFI values for SAG1D1 and 1.6-times for P22trunc compared to the respective full-length versions (Table 2, Figure 1).
-This statement is accurate, but when working with MFI referencing fold changes instead of raw numerical MFI changes can be misleading. For example, the difference in an MFI of 10 and 30 results in a 3-fold difference but only a 20-unit difference. The difference in an MFI of 50 and 100 results in a 2-fold difference but 50-unit difference. This exact scenario occurs in the next comment. The larger raw unit difference can be what matters most for being able to differentiate between real signal and noise in most experiments. If this is not the case for the models being run, then an explanation of this should be included.
We describe raw numbers in the paragraph before, include them in Table 2, and visualize them in Figure 1. Nevertheless, you raise a valid concern regarding the potential for misleading interpretation when reporting fold change values for net MFI numbers. In order to address the raw unit difference between signal and noise, we decided to report the signal-to-noise ratio increase instead of the MFI value increase and adapted the text and table accordingly. It now reads as follows: 'Compared to the respective full-length antigen, the signal-to-noise ratio was increased 5.1-fold for SAG1D1 and 9.5-fold for P22trunc (Table 2, Figure 1).'
We also made sure to add this information in the method section: 'Signal-to-noise ratios were obtained by dividing the median net MFI of individuals with a positive reference status (signal) by the median net MFI of those with a negative reference status (noise).'
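The signal-to-noise definition quoted above can be written out as a short sketch. The function name and all MFI values below are hypothetical and chosen only so that the arithmetic is easy to follow; they are not taken from Table 2.

```python
import numpy as np

def signal_to_noise(net_mfi, is_positive):
    """Median net MFI of reference-positive sera divided by the
    median net MFI of reference-negative sera."""
    net_mfi = np.asarray(net_mfi, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    return np.median(net_mfi[is_positive]) / np.median(net_mfi[~is_positive])

# Hypothetical net MFI values for six sera (three positive, three negative)
status = [True, True, True, False, False, False]
full_length = [400, 500, 600, 40, 50, 60]
truncated = [1800, 2000, 2200, 30, 40, 50]

snr_full = signal_to_noise(full_length, status)    # 500 / 50
snr_trunc = signal_to_noise(truncated, status)     # 2000 / 40
fold_increase = snr_trunc / snr_full               # reported as "x-fold" increase
```

Because the ratio combines the positive and the negative median, it changes both when the signal rises and when the background falls, which a fold change of MFI values alone would not capture.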
Line 199: Linear regression revealed that the difference in antibody levels between full-length and truncated antigens was even more prominent than in the 1:100 serum dilution, with MFI values being on average 4.4-times increased for SAG1D1 and 2.3-times for P22trunc compared to the respective full-length versions (Table 2, Figure 1). Correlations were also higher and reached Pearson's r values of 0.86 and 0.95, respectively (Table 2).
-It is okay to point out that the fold difference is greater in the 1:1000 dilution, but the raw difference in MFI is only greater for SAG1D1 and not P22trunc. Neglecting this point can be misleading.
As described above, we decided to switch to reporting the signal-to-noise ratio increase. The paragraph was changed as follows: 'Hence, the signal-to-noise ratio was increased 9.1-fold for SAG1D1 and 10.3-fold for P22trunc and even more prominent than in the 1:100 serum dilution (Table 2, Figure 1). Correlations were also high and reached Pearson's r values of 0.86 and 0.95, respectively (Table 2).'
-The same statement stands for fold vs. raw unit increases.
In this paragraph, we want to emphasize that MFI values are systematically increased (for sero-positive and sero-negative individuals) when using magnetic beads. Hence, we report regression line slopes and also the median MFI values reached for both sero-positive and sero-negative individuals. The raw unit increase is only of secondary importance here.

Discussion
Line 267: These results were reproducible using different bead types and methods to determine thresholds for seropositivity with Cohen's kappa being consistently above 0.9.
-It is clear that ROC analysis was done and finite mixture models were created, but previous papers do go into more detail about these methods than this paper does, and so some questions still remain. Was the model to determine threshold for seropositivity developed using only a subset of the samples and then tested on the remaining samples, or does the model determining the threshold for seropositivity still need to be tested using new samples to measure how accurately it can determine positive from negative samples? Further statements in the discussion suggest that the model has not been tested on other samples, but if that is incorrect then further explanation is needed.
With the aim to validate a specific threshold (x MFI), it is common practice to separate samples into a training set to define an optimal threshold and a validation set to estimate how well this threshold performs. In a next step, a secondary validation with external samples can be performed to assess the generalizability of the specified threshold.

However, numerous studies and publications have shown that pre-specified thresholds do not perform equally well across different populations (e.g. Kafatos G et al: Is it appropriate to use fixed assay cut-offs for estimating seroprevalence?). Reasons include differences in background immunity in certain populations, differing sample preparations, or simply different lot numbers across assay chemicals.
Hence, a method which generates thresholds just based on the distribution of the results, without considering external reference samples, could be superior. We test this in our samples by applying the same algorithm while modifying assay parameters (e.g. sample dilution, bead type) and consequently generating different MFI values. The EM algorithm solely considers the data distribution and fits a bimodal density curve, which has two subpopulations and a local minimum in between. Whether these subpopulations actually correspond to sero-positive and sero-negative individuals is determined in a next step by comparing the results to the gold standard and determining sensitivity and specificity.
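The second step, comparing a threshold against the reference status, and the classical ROC alternative can be sketched as follows. This is an illustrative outline only: it assumes that values at or above the threshold are called seropositive, selects the ROC threshold by maximizing Youden's index (sensitivity + specificity - 1) over the observed values, and uses made-up MFI values and a made-up distribution-derived threshold.

```python
import numpy as np

def sensitivity_specificity(values, is_positive, threshold):
    """Sensitivity and specificity of calling values >= threshold seropositive."""
    values = np.asarray(values, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    called_positive = values >= threshold
    sensitivity = np.mean(called_positive[is_positive])
    specificity = np.mean(~called_positive[~is_positive])
    return sensitivity, specificity

def youden_threshold(values, is_positive):
    """Observed value that maximizes Youden's index J = sens + spec - 1."""
    candidates = np.unique(values)
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(values, is_positive, t)) - 1.0,
    )

# Hypothetical MFI values with a gold-standard reference status
mfi = [10, 20, 30, 200, 400, 800]
reference_positive = [False, False, False, True, True, True]

roc_threshold = youden_threshold(mfi, reference_positive)

# A threshold obtained without the reference status (here simply assumed to be 150)
# is evaluated in exactly the same way.
sens, spec = sensitivity_specificity(mfi, reference_positive, 150)
```

The reference status enters only in this evaluation step, so the same check can be repeated for every assay condition without redefining how the distribution-based threshold is derived.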
Here, we show that the resulting thresholds perform as well as ROC-derived thresholds in a given population. These thresholds will differ in other studies due to study-specific settings which cannot be estimated beforehand (especially using cohort samples, as these often have only very limited volume).
Further validation studies are needed to find out if this bimodal distribution can also be seen in other populations. Due to the nature of the immunological response, this is highly likely, as we only expect sero-positives and sero-negatives. Based on the improved signal-to-noise ratio of the new antigens, these two populations will now be more easily distinguishable, even in a 1:1,000 dilution.
We made adaptations to the manuscript text to address this issue:
Methods: 'In order to compare multiplex serology results to the gold-standard Sabin-Feldman dye test, continuous MFI values were dichotomized using antigen-specific thresholds. These thresholds derived from finite mixture models, using an expectation-maximization (EM) algorithm to fit a bimodal distribution after log-transforming MFI values (25,26). Thresholds corresponded to the local minima of the density curves. Sensitivities and specificities were determined in a next step based on the reference status of the sera. To evaluate algorithm performance, the thresholds were compared with those derived from the classical maximization of Youden's index (receiver operating characteristic (ROC) analysis).'
Discussion: 'Our results showed that thresholds derived from finite mixture models seem promising to generate thresholds for T. gondii multiplex serology, which are solely based on the data distribution of the results. Nevertheless, a bimodal data distribution needs to be verified in upcoming seroepidemiological studies.'
Line 259: As T. gondii usually persists in a latent form, antibody levels decline but do usually not disappear.
-If desired, the authors can support this statement with background on how infection will continually go through small spontaneous reactivations from cysts, making sero-reversion more than exceedingly rare.
Thank you for this excellent suggestion. We included the information and cited Rougier et al: Lifelong Persistence of Toxoplasma Cysts: A Questionable Dogma?
Line 362: Although, our results showed that thresholds derived from finite mixture models seem promising for T. gondii multiplex serology, the exploratory approach requires further validation in large-scale studies. Modeling thresholds might also be transferrable to other antigens if both subpopulations (seropositive and seronegative individuals) are large enough. Similarly, validation studies for all antigens are essential, as antibody kinetics might differ and be more complex due to reinfection rates, cross-reactivity, population differences, age-dependence, and others (18,35,36).
-This statement is accurate, but it contradicts the previous and subsequent conclusions that the method generates "robust and reliable results" that could help conduct sero-epidemiological research. Either training the model on a subset of the already collected data and testing it on the other subset, or ideally testing the current threshold models on new samples, is needed to claim that the method can reliably and robustly generate data usable to conduct sero-epidemiological research. It is also acceptable to remove these claims, or change them and instead focus on the proven improvements on assay sensitivity. This could still include a proposed threshold for sero-positivity that could be tested in future experiments. That is already a significant finding.
-Either delete "and others", or include "variables such as" before "reinfection rates".
This question is addressed in the previous answer. We changed the specific paragraph slightly to put the emphasis more on the method than on the thresholds: 'Our results showed that finite mixture models seem promising to generate thresholds for T. gondii multiplex serology, which are solely based on the data distribution of the results. Nevertheless, a bimodal data distribution needs to be verified in upcoming seroepidemiological studies. Modeling thresholds might also be transferable to other antigens and pathogens if both subpopulations (seropositive and seronegative individuals) are large enough. Similarly, validation studies for all antigens are essential, as antibody kinetics might differ and be more complex than for T. gondii due to variables such as reinfection rates, cross-reactivity, population differences, age-dependence, and others (18,26,38).'

Figure 3: Antibody levels against the T. gondii antigens SAG1D1 and P22trunc in individuals with a seropositive (magenta) and seronegative (blue) reference status. Lines represent proposed thresholds for seropositivity, derived from ROC analysis (solid), fitting a mixture of two normal distributions (dashed), or fitting a mixture of two skew-normal distributions (dotted).

Table 2 :
Characterization of T. gondii antigens. Median antibody responses and IQRs against T. gondii antigens in individuals with positive (T. gondii pos.) and negative (T. gondii neg.) reference status determined in two serum dilutions.

Table 3 :
Diagnostic accuracy applying threshold values based on (a) maximizing Youden's index after ROC analysis, (b) the local minimum of two mixed normal distributions and