Discrimination of primary and chronic cytomegalovirus infection based on humoral immune profiles in pregnancy



Introduction:
A member of the herpesvirus family, human cytomegalovirus (CMV) commonly manifests as a mild or asymptomatic infection. However, in very young and immunocompromised individuals, CMV can cause severe disease; it is the leading cause of congenital infection among newborns (1). In the United States alone it is estimated that 40,000 children are born with congenital CMV (cCMV) infection every year (1). This high burden is still likely an underestimate since many cases are asymptomatic at birth. Diagnosed infections are often severe, leading to an estimated 400 deaths and an additional 8,000 cases presenting with permanent disabilities, including speech and language impairment, hearing loss, mental disability, cerebral palsy, and vision impairment, annually (1,2). Fetal infection results from intrauterine transmission and is most likely to occur when a mother experiences primary CMV infection during pregnancy (3)(4)(5). The difference in fetal infection risk between primary and reactivated maternal CMV infection is striking, with approximately one third of primary infections leading to CMV infection of the fetus as compared to < 3.5% estimated to result from CMV reactivation or superinfection (6)(7)(8)(9)(10). New insights into CMV infection during pregnancy that could contribute to identification of pregnancies at greatest risk, efficient testing of new therapeutic interventions, and vaccines that could modify transmission risk are urgently needed (11).
Pregnant women represent a population of subjects who are not immunodeficient but are in a unique state of immune regulation that ensures fetal tolerance, and who can transmit CMV to their fetus during pregnancy. While the differing risk of transmission associated with maternal seropositivity provides strong evidence that pre-conception immunity plays a protective role, studies evaluating the clinical potential of CMV-hyperimmune globulin to improve neonatal outcomes have yielded mixed results (12)(13)(14). While possible contributing factors to these results include low potency and insufficient serum persistence of the CMV-hyperimmune globulin, more data are needed to better understand the clinical potential of this intervention in the context of vertical transmission (15,16). It has been speculated that these differing outcomes may relate to gestational age and the timing of the intervention following maternal infection, and may therefore be impacted by both the reliability of the diagnostic approach and the strictness of the definition of primary infection cases. Because the suitability of these approaches for application during pregnancy can be supported by comparison of responses in pregnant with non-pregnant individuals during primary infection, developing a deeper understanding of humoral immune profiles in this unique immune state may have implications for the clinical development of diverse small molecule and biologic antiviral interventions.
Short of these goals, given the widely diverging congenital infection risks associated with primary and chronic infection, confident discrimination between these two states is crucial for identifying newborns with highest risk of cCMV infection. Complementing virological assessment, discrimination of these two states using serology may allow antiviral treatment in pregnant women and hearing and learning interventions in newborns to be initiated early (17,18). Current diagnostic assays for pregnant women are performed using serology measurements for IgM and IgG avidity, which decrease and increase over time following infection, respectively, while a PCR-based urine test, often following a saliva PCR screening test, is the standard for newborns (17,19,20,21). However, although CMV is a common infection, there is no universal standard diagnostic assay across countries and healthcare centers (22), introducing variability in care and posing challenges to clinical trial design for testing of novel interventions. Additionally, many countries, including the United States, do not routinely screen for CMV in pregnant women, whereas some European countries as well as some individual states (e.g., Minnesota) do perform routine screening, adding another layer of complexity to clinical data regarding the impact of maternal CMV status, antibody responses, and timing of infection as assessed by differing diagnostic measures and across populations. Irrespective of these details, following a primary CMV diagnosis during pregnancy, clinicians typically counsel pregnant women about the future risks to their fetus.
The dichotomy in fetal risk profiles of primary and non-primary CMV infection has important implications for predicting transmission risk for pregnant women. Beyond this classification, the specific timing of CMV infection is also a crucial factor in the potential for severe congenital disease, with a primary infection in the first trimester of pregnancy leading to higher likelihood of severe cCMV infections as compared to a primary infection in the second or third trimester (23). Further insights into relative risk profiles are challenged by imperfect precision in defining and estimating the timing of primary infection, particularly during pregnancy, which has been associated with varying and dynamic immunological state changes (24), including differences in humoral responses to both infection and vaccination that vary with gestational age and impact response magnitude, kinetics, and character (25,26) in ways that have the potential to impact CMV diagnoses based on serological testing. As a result, more accurate and confident diagnosis of primary and non-primary CMV infection, specifically in the context of pregnancy, would benefit the medical community and affected birthing parents by providing clearer and more definitive insights into their CMV infection status, and would better support studies of associated risk to the fetus.
Previous work in a variety of infectious disease settings has shown that antibody responses evolve over time, exhibiting complex patterns in response magnitude and characteristics (27)(28)(29), and that they can also differ in association with pregnancy (29)(30)(31). Pregnant women with primary CMV infection represent a unique intersection of these two complex antibody response scenarios with established clinical significance. Here, leveraging carefully curated cohorts, machine learning, and highly multiplexed assays to capture a wide range of antibody response attributes over time, we evaluate how responses to primary or non-primary CMV infection vary over time and in association with pregnancy. In doing so, this work defines the limited role pregnancy plays in modifying humoral responses to CMV infection and identifies the unusual kinetic profile of IgG3 responses to different viral antigens as a new means of identifying and dating the onset of primary infection, two factors relevant to clinical studies evaluating and reducing risk of cCMV infection for affected pregnancies.

Antibody Profiles Distinguish Primary from Chronic CMV Infection
Antibody responses were profiled among both pregnant and non-pregnant individuals with either primary or chronic CMV infection (Figure 1) (Table 1), with primary infection strictly defined by CMV-specific IgG seroconversion, CMV-specific IgM antibody detection, low IgG avidity index, and/or CMV DNAemia. Conversely, chronic infection was defined by seropositivity in the absence of these diagnostic measures. Antibody responses following primary or chronic CMV infection were profiled in cross-sectional and longitudinal cohorts (Supplemental Figure 1, Supplemental Table 1). Specifically, a cross-sectional cohort included pregnant primary (n=74), non-pregnant primary (n=27), and non-pregnant chronic (n=40) individuals. Additionally, longitudinal pregnant cohort samples included primary (n=57) and chronic (n=36) subjects. To more comprehensively understand how antibody profiles vary among CMV-infected individuals, in addition to naïve subjects (n=9), antibodies specific for CMV tegument protein (32), glycoprotein B (gB), and the pentamer complex were characterized for isotype, subclass, and Fc receptor (FcR) binding capacity (Supplemental Figure 1, Supplemental Table 1). We profiled the antibody response by examining a diverse set of antibody features and CMV antigens, including gB and pentamer complexes from several different sources, with the aim of gaining a more complete view of the humoral immune response to primary or chronic CMV infection. Glycoproteins tested included several that are commercially available (Sino (Sino Biologicals) and NA (Native Antigen)), as well as others that have been characterized in structural studies conducted with an eye toward vaccine development (GSK) (33,34), and by academic groups (UT) (35,36), including a modified form of gB with mutations intended to reduce aggregation and eliminate the furin cleavage site (UT gB), and a form with engineered proline mutations intended to favor the prefusion conformation (3p gB). Tegument antigens included portions of pp150 (UL32) and pp52 (UL44) proteins, including CG1 (pp150/2-pp52/3, comprised of amino acids 495-691 and 862-1048 of pp150) and CG2 (pp150/7-pp150/1, comprised of amino acids 695-864 of pp150 and 297-433 of pp52), which were previously reported to be immunodominant targets for serodiagnosis of primary and chronic CMV infection (37,38).
To define differences in antibody profiles among subject groups that go beyond the measures typically used in clinical diagnosis, we performed UMAP analysis on CMV-specific antibody features after specifically excluding IgM (Figure 2A). This unsupervised analysis revealed that the main aspects of differentiation among subjects related to infection status; striking differences in the antibody response were observed between individuals with primary as opposed to chronic infection. In contrast, limited differences in the global response profiles were observed between pregnant and non-pregnant individuals. While univariate analysis between pregnant and non-pregnant female individuals during primary infection did reveal some nominally statistically significant differences (Supplemental Figure 2), we could not exclude the possibility that these differences resulted from a difference in timing of diagnosis or were impacted by differences in age between cohorts. The former may be likely, as similarly subtle differences were observed between primary subjects in these groups and an independent cohort (Erasme Hospital) of pregnant women with primary CMV infection (n=23) as defined using different assays and criteria by a different medical center, and whose blood was collected over a broader range of days post diagnosis (39) (Supplemental Figure 3). We additionally explored whether biological sex impacted antibody responses as some cohorts included both male and female participants, but found limited differences (Supplemental Figure 4). Given these observations and the lack of robust differences associated with pregnancy status or biological sex, these variables were not considered in subsequent analyses.
We next investigated which individual features of the immune response might contribute to clustering of subjects in primary and chronic infection groups. Visualizing the degree and confidence of differences in antibody response features between primary and chronic infection status groups revealed aspects of the humoral response that consistently differed (Figure 2, B and C). Distinct levels of CMV-specific IgA, IgM, and FcR-binding antibodies were observed between primary and chronic infection groups across multiple CMV antigens. IgM responses were elevated across all antigens for the primary infection group, which was expected as IgM titers were used to define infection status. Interestingly, total CMV-specific IgA and both IgA1 and IgA2 subclasses were elevated in primary infection subjects. Somewhat surprisingly, total IgG had very modest differences between primary and chronic infection groups.
Given the striking differences among groups, we next wanted to further examine individual responses for IgM, IgA, and IgG across the panel of CMV antigens. As expected, IgM responses were strongly and significantly elevated among subjects with primary infection, as were IgA responses across diverse antigen specificities (Figure 2C). Interestingly, and in contrast with total IgG response levels (Figure 2C), which exhibited statistically significant but relatively small differences in response magnitude, FcγR-binding antibodies were both significantly and strongly elevated among subjects with chronic CMV infection (Figure 2B, Supplemental Figure 5), suggesting the presence of qualitative differences in the antibodies present during primary and chronic infection that may impact the effector functions mediated by this class of widely expressed innate immune cell receptors.

IgG Subclasses and CMV Antigens Display Distinct Profiles in Primary and Chronic Infection
The observation that FcγR-binding antibodies but not total IgG were reliably elevated in chronic subjects was intriguing, and suggested that differences in induction of IgG subclasses, which differ dramatically in their FcR binding capacity, may exist. However, with the exception of IgG2 responses to tegument, IgG2 and IgG4 responses to CMV antigens were uncommon, and differences in the levels of these more functionally inert subclasses were not observed. Humoral responses to viral infections are generally dominated by IgG1 and IgG3 antibodies, which both bind well to activating FcγR and have the potential to elicit the potent antiviral activities of the complement cascade and innate immune effector cells (40,41). While statistically significant differences were observed for a subset of antigens tested, IgG1 responses, which are typically dominant following acute viral infections (42,43), were generally similar in primary and chronic infection (Figure 3A). Only one of each of the gB, pentamer, and tegument proteins tested showed a statistically significant difference, and each of these demonstrated only a small increase among the chronic infection group. In contrast, IgG3 responses were more uniformly and strikingly distinct between groups. Perhaps most surprisingly, the direction of these differences varied by antigen-specificity. Antibody responses to tegument proteins exhibited elevated levels in primary infection, but elevated levels of gB and pentamer-specific IgG3 were observed among the chronic infection group (Figure 3A). While all pentamer complexes tested showed this profile, responses to recombinant gB proteins were more variable, with two gB preparations showing elevated levels among chronic subjects, and one preparation showing elevated levels in primary infection. While some distinctions in IgG subclass composition in the context of primary and latent herpesvirus infections have been reported (44)(45)(46)(47)(48), how well they may support discrimination between primary and chronic CMV infection is not known. In sum, while IgG1 responses tended to either persist or increase between primary and latent infection classes, IgG3 responses differed dramatically by antigen-specificity; responses to tegument were higher among subjects with primary infection, but responses to gB and pentamer were instead elevated in chronic infection.
While UT pentamer-specific IgG1 and IgG3 levels were correlated with each other among subjects with primary infection (RP = 0.33, p < 0.0001), they were not well correlated among individuals with chronic infection (RP = 0.20, p = 0.08) (Figure 3B). This difference in degree of correlation between IgG1 and IgG3 responses between infection states was consistent across pentamer proteins evaluated.
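As a minimal sketch of the kind of correlation reported above, the following computes a Pearson correlation coefficient in plain Python. It assumes that RP denotes a Pearson correlation, which the text does not state explicitly; the function name `pearson_r` and the example values are hypothetical and are not study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up signal values for illustration only (assumption, not study data).
igg1 = [1.0, 2.0, 3.0, 4.0, 5.0]
igg3 = [1.2, 1.9, 3.4, 3.8, 5.1]
r_example = pearson_r(igg1, igg3)
```

In practice the coefficient would be computed separately within the primary and chronic groups, so that the two values can be compared as in Figure 3B.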
Additionally, the strength of correlation observed between these IgG subclasses and the FcγRIIIaV-binding capacity of pentamer-specific antibodies changed over time, with stronger correlations between IgG1 and FcγRIIIaV binding observed in primary infection, but between IgG3 and FcγRIIIaV binding in chronic infection (Figure 3C). These patterns were consistent across FcγRs and pentamer proteins tested (Figure 3D), suggesting that the pool of antibodies most capable of eliciting FcγR-dependent effector functions changes in composition over the course of infection. These temporal differences may have important implications both for viral pathogenesis, based on the potential influence of IgG subclass on immune evasion mediated by viral Fc receptors such as gp34, gp68, gp95, and gpRL13 (49)(50)(51)(52), and for host defense, based on their differing capacities to induce robust antibody effector functions directed against free virions or virally-infected cells.

Prediction of CMV infection status using machine learning
Next, because unsupervised analysis of antibody profiles revealed clear distinctions between primary and chronic CMV infection, we applied supervised machine learning to explore the ability of a model to accurately discriminate primary and chronic CMV infection and to identify features that contribute to class differentiation (Figure 4A). Based on its simplicity and interpretability, we employed a logistic regression framework with regularization to classify primary or chronic CMV infection based on antibody profiling data while reducing the risk of overfitting associated with high-dimensional data. Again, IgM responses were excluded because they were used in clinical class assignment of the study groups.
The model was trained on 80% of primary and chronic subject profiles while the remaining 20% was used for testing. Furthermore, repeated five-fold cross-validation was employed so each subject would be part of the test set and representative accuracy across different folds could be defined. Model predictions were highly accurate; across 100 repeated five-fold cross-validation runs, the median model accuracy for predicting primary or chronic CMV infection was 94% (Figure 4B). In contrast, prediction results were essentially random when training and testing were performed after permutation of class labels, which serves as a means to assess model robustness by measuring the potential for overfitting. Misclassifications were not biased toward one or the other class, as shown in a representative confusion matrix for the cross-validation repeat with median accuracy (Figure 4C).
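The evaluation scheme described above can be sketched in plain Python as follows. This is an illustrative toy under stated assumptions, not the study's implementation: the data are synthetic, the L2 penalty, learning rate, and epoch count are arbitrary choices, folds are not stratified, and only 3 repeats are run (rather than 100) to keep the example fast. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logreg(X, y, l2=0.1, lr=0.1, epochs=200):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # Mean log-loss gradient plus the L2 penalty gradient on the weights.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, xi):
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0

def repeated_kfold_accuracy(X, y, k=5, repeats=3, seed=0):
    """Mean test accuracy over `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    accs = []
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        # Each sample lands in exactly one test fold per repeat.
        for test_idx in (idx[f::k] for f in range(k)):
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            model = fit_logreg([X[i] for i in train_idx],
                               [y[i] for i in train_idx])
            hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
            accs.append(hits / len(test_idx))
    return sum(accs) / len(accs)

# Synthetic two-class data (assumption): class 1 stands in for "primary",
# class 0 for "chronic"; four features with shifted means.
rng = random.Random(42)
X, y = [], []
for label in (0, 1):
    for _ in range(40):
        shift = 1.5 if label == 1 else -1.5
        X.append([rng.gauss(shift, 1.0) for _ in range(4)])
        y.append(label)

acc_true = repeated_kfold_accuracy(X, y)

# Permutation control: shuffle labels, then repeat the same procedure.
y_perm = y[:]
rng.shuffle(y_perm)
acc_perm = repeated_kfold_accuracy(X, y_perm)
```

Comparing `acc_true` with `acc_perm` mirrors the permutation control in the text: if a model scores well on shuffled labels, the apparent accuracy likely reflects overfitting rather than real signal.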
Classification calls for actual but not permuted class labels were both typically correct and assigned to respective classes with high probability (Figure 4D). The relatively few incorrect classifications were typically close to the decision boundary. Given this evidence of model accuracy and robustness, a final model was trained on all subjects in the discovery cohort. When applied to an independent longitudinal set of 57 primary infection and 36 chronic infection samples serving as a validation cohort, this model resulted in similarly high confidence and perfect classification accuracy (Figure 4D). Additionally, an independent sample set of primary infection cases from a distinct medical center (Erasme) also yielded excellent accuracy (Figure 4D). Given this validation, the top features employed in the final model were examined (Figure 4E). The features with largest positive coefficients, which serve to identify primary subjects, were primarily IgA-related antibody responses directed to tegument antigens (4/5 features).
Conversely, features with large negative coefficients, useful to identify chronic infection, were typically related to IgG3 and FcγRIII-binding capacity of responses to gB and pentamer (4/5 features). Collectively, these modeling results point to specific antibody response attributes, distinct from traditionally considered parameters such as IgM and IgG avidity, as being excellent candidate markers for distinguishing primary and chronic infection status. Among these, tegument-specific IgA and glycoprotein-specific IgG3 responses stand out as robust contributors that could be easily evaluated.

Modeling Longitudinal Responses to CMV Infection Reveals a Molecular Clock of Antibody Responses
The excellent discrimination of primary and chronic CMV infection in the validation cohort led us to next explore how class predictions related to longitudinal development of humoral immune responses to infection. To this end, subsequent samples were available for the validation cohort over a series of up to four visits that extended out to half a year after initial sampling or infection onset. We explored the longitudinal cohorts in unsupervised fashion across all features by projecting serial timepoints on a UMAP model developed on visit one samples (Supplemental Figure 6). Again, chronic subjects clustered distinctly from primary subjects. Strikingly, subsequent samples from the primary subjects shifted closer to the chronic samples, consistent with the existence of humoral response characteristics that exhibit consistent changes over time following primary CMV infection. Next, the model trained on the initial cohort was used to predict class for longitudinal samples from the validation cohort (Figure 5A). Whereas chronic subjects were consistently classified as such with similarly high confidence at all subsequent visits, subjects with primary infection at the initial timepoint became less confidently classified as primary at subsequent visits (Figure 5B). By visit four, the majority of samples from subjects with primary infection at their initial visit exhibited class probabilities below the lowest observed at the initial sampling. However, because visits were not consistently spaced in time between subjects, these longitudinal profiles can be more meaningfully compared over time post symptom onset (Figure 5C). Despite the imperfect reliability of projected timing of infection, the classification model probabilities presented a clear relationship with time. Samples fell below the midpoint on the classification scale as early as 90 days post infection onset, but failed to reach values typically observed among chronic subjects even out at 250 days post infection, suggesting a relatively prolonged transition to reaching the chronic state profile. As expected, the individual antibody response features making the greatest contributions to the classification model also demonstrated clear changes among subjects with primary, but not chronic infection, over time (Figure 5D).
To this point, machine learning models have only been concerned with predicting the probability of a sample belonging to either the primary or chronic infection class. However, given the clear ability of these binary classification models to provide insight into time since infection, we next sought to evaluate models explicitly trained for this purpose. To that end, longitudinal profiles of the primary infection cases across visits were used to train a model to predict time since symptom onset as a continuous variable (Figure 5E). The resulting linear regression model, which showed good robustness in the context of five-fold cross-validation (Supplemental Figure 7A) and relied primarily on IgG3 features (Supplemental Figure 7B) showing strong time-dependence (Supplemental Figure 7C), was then used to predict days post infection for the cross-sectional primary samples. Predictions of time since symptom onset exhibited excellent accuracy for both this unseen validation cohort and the independent Erasme cohort (Figure 5F). Overall, this analysis demonstrated that antibody profiles in CMV-infected individuals exhibit generalizable temporal patterns during primary infection that can be used to retrospectively date the time of infection.
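The idea of regressing time since onset on antibody features can be illustrated with a deliberately simplified, single-feature least-squares fit. This sketch is an assumption-laden toy: the model in the text uses many features, whereas here one hypothetical feature (`signal`) and made-up, exactly linear training values are used purely to show the mechanics.

```python
def fit_line(x, t):
    """Ordinary least-squares fit of t ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_t = sum(x) / n, sum(t) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxt = sum((xi - mean_x) * (ti - mean_t) for xi, ti in zip(x, t))
    slope = sxt / sxx
    return slope, mean_t - slope * mean_x

def mean_abs_error(pred, actual):
    """Mean absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

# Hypothetical training data (not study data): one antibody feature and
# days since symptom onset for longitudinal primary infection samples.
signal = [1.0, 1.5, 2.0, 2.5, 3.0]
days = [20.0, 50.0, 80.0, 110.0, 140.0]

slope, intercept = fit_line(signal, days)

# Apply the fitted model to a new, unseen sample.
predicted_days = slope * 2.2 + intercept

# Error on the training data (zero here because the toy data are exactly linear).
train_mae = mean_abs_error([slope * s + intercept for s in signal], days)
```

With real data, the error would be estimated on held-out samples, for example within five-fold cross-validation as described above, rather than on the training set.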

Discussion
Presently, discrimination of primary and non-primary CMV infection in the context of pregnancy is used to counsel patients on the risk of congenital CMV infection, given the lower risk associated with non-primary infection, and to guide the prescription of antiviral therapies. Beyond this value, in the absence of an effective vaccine, a deeper understanding of how immune responses differ in association with infection history and transmission risk has the potential to contribute to the development of new interventions. Here, high-dimensional antibody profiles beyond IgM levels and IgG avidity were developed from a commonly used multiplex assay format, and supervised and unsupervised machine learning were used to differentiate primary and chronic infection and pregnancy status.
Whereas limited or no differences in humoral responses were associated with pregnancy status, our study showed clear distinctions in antibody profiles between primary and chronic infection cases. Both composite profiles and individual antibody features related to antigen-specificity and immunoglobulin isotype, subclass, and binding to Fc receptors demonstrated these distinctions. Importantly, these differences extended beyond those previously known to exist and which are presently applied to support clinical diagnosis. While further work is needed to assess the performance of a restricted set of the features identified as useful differentiators in this study for routine clinical laboratory use, IgG3 responses to pentamer, and to a lesser extent gB, along with IgA responses to tegument proteins were identified as good potential candidates. Whereas multiplex assays were employed here, we expect that assessment of these responses could be readily adapted to other assay formats commonly used in clinical testing labs.
The number and longitudinal profiles of the features that distinguish infection history suggested that a sort of humoral clock could be defined in order to time the onset of primary CMV infection. Indeed, supervised machine learning models that reliably captured how antibody responses to CMV infection varied over time were learned and validated on independent samples. Beyond potential clinical utility, defining how responses to primary CMV infection transition over time is relevant to understanding the evolution of the immune response to this member of the notoriously immune-evasive Herpesviridae family (53,54). Current diagnostic methods rely on detection or levels of IgM and on IgG avidity. The presence of IgM alone is insufficient to diagnose a primary CMV infection: poor correlation between commercial tests has been reported (55), and false positive primary status calls can result from both persistence of IgM and boosting in response to reactivation (56,57). Likewise, a positive test for CMV-specific IgG indicates an individual is positive for CMV but provides little information about the time since infection (17,58). The combination of IgM positivity and low IgG avidity is generally considered to be a reliable indicator of primary CMV infection; however, interpretation of IgG avidity tests is confounded by low levels of CMV-specific IgG, intermediate responses raise classification issues, and these indicators are supported by clinical studies of small size.
Coupled with the lack of an international standard serum panel of samples from primary infection, the difficulty inherent to establishing a uniform, accurate, and robust diagnostic method is clear (59). Our data demonstrate that there are other changes in the secreted antibody repertoire that reliably occur over consistent time periods during primary infection and which could provide clinical utility, enhancing confidence in enrolment of participants in both interventional and observational studies.
Prior studies of IgG subclasses of CMV-specific antibodies have consistently reported the IgG1 and IgG3 bias typical of anti-viral antibodies but noted that subclass responses and total IgG titers can be discordant (45,60). Further, they have attributed relatively greater neutralization potency to the minor IgG3 fraction (46). The greater sensitivity of multiplexed assays over classical western blots, as well as their ability to define responses directed toward specific antigens as compared to whole virus or infected cell lysate, offered the possibility to define previously unappreciated aspects of how the humoral response to CMV infection changes over time. To that end, we were surprised to observe increasing levels of IgG3 specific for viral glycoproteins over time. Whereas IgG3 responses are usually associated with acute infections and typically wane over time (43,61,62,63), they were observed to increase across both cross-sectional and longitudinal cohorts in an antigen-specific manner. Namely, whereas IgG3 responses to tegument proteins showed the typical pattern and decreased while IgG1 responses were stable or increased, IgG3 responses to pentamer increased while IgG1 responses remained largely stable over time. Intriguingly, memory B cells (MBC) specific for gB are known to exhibit phenotypic states distinct from those specific for tegument antigens (64). The reduced frequency of MBCs with effector potential specific for glycoprotein as compared to tegument observed in this prior study may relate to the altered kinetics of IgG subclasses observed here. While the mechanistic underpinning and the biological implications of this unusual pattern have yet to be determined, as noted previously, CMV expresses multiple Fc binding proteins, including gp34, gp68, gp95, and gpRL13, some of which are known to antagonize host FcγR (65). Unlike the viral FcR of HSV, these proteins have been reported to bind to all human IgG subclasses (49,66), though affinities and potential differences among the IgG allotypes have not been reported. Additionally, how these two surface proteins contribute to viral pathogenesis is incompletely understood (67).
Limitations of this study include the fact that the majority of samples were sourced from a single geographic region, leaving open the possibility that differences associated with host genetics and environmental and infectious disease exposure history impact these observations. Additionally, the onset of infection could only be estimated rather than defined with precision, which confounds models of this parameter and points to the value of further evaluation in the context of infections for which definitive timing is known in order to provide greater confidence. While sex as a biological factor was investigated, cohorts were skewed toward female representation and had limited power to define sex-based differences. Further, the panel of antigens tested was not exhaustive; for example, no viral FcR were included, and while responses to common vaccine antigens were evaluated, surface glycoprotein coverage was not comprehensive. Among tested antigens, reactivity patterns sometimes differed in association with different sequences, conformational states, presence or absence of glycosylation sites, recombinant protein expression host cell lines, and other factors, the contributions and importance of which have yet to be defined. These and other factors may represent worthwhile directions for future study.
Lastly, while relationships between maternal immune responses and transmission of cCMV are of exceptionally high clinical relevance to both risk management and vaccine research and development, this study focused on CMV infection history. However, while risk is considerably greater during primary infection, there are certainly differences in virologic, innate immune, and other factors that contribute to these differing risk profiles (68)(69)(70)(71). The influence of maternal antibody responses is unclear, particularly given the conflicting results in studies of passive antibody therapy in the context of primary maternal infection (72,73). While the dosing and frequency of hyperimmune globulin also differed in these studies, a positive effect was only observed in women with very early infection, pointing to the potential importance of accuracy in the dating of infection recency. Recent studies have started to explore the role that antibody responses play in CMV transmission in a rhesus macaque model (74,75). However, data in humans are limited and often confounded by the differing risk of infection associated with primary as compared to non-primary infection (76). Indeed, our observations illustrate the extent of this confounding and point to the value of highly curated cohorts targeted to address the critical unmet need for a CMV vaccine. Study within primary and chronic infection groups will be required to unambiguously relate risk of transmission with attributes of the humoral immune response. In the meantime, confident identification of experienced individuals can inform evaluation of the impact of infection history and timing on the immunogenicity of candidate vaccines, and the experimental and analytical pipeline presented here could be deployed on vaccine trial samples to look for relationships between humoral immunity and infection or cCMV transmission risk. Overall, many open questions regarding the role of humoral immunity in the context of CMV infection, transmission, latency, and reactivation remain. Higher resolution and more comprehensive analysis of antibody responses using systems serology approaches has the potential to improve our understanding of the complex virus-host interactions at play. Here, by analyzing highly curated cohorts, we report and validate phenotypic signatures of gB, pentamer, and tegument-specific antibody responses that not only robustly classify primary infection status, but also provide insights into time of infection. It remains to be seen if the atypical dynamic profile of IgG3 responses to envelope glycoproteins elicited by CMV is antigen-intrinsic and might be recapitulated when these antigens are delivered by other means, or if it may represent an evasion strategy dependent on other viral genes or aspects of the innate response to viral infection in the context of this notoriously immunoevasive virus. In the meantime, this work stands to define hallmarks of primary CMV infection and time of infection that may present new opportunities to streamline primary infection diagnosis, potentially impacting current clinical practice and enrolment of pregnant women with primary infection in interventional trials, thereby providing new insights into relative cCMV risk and management strategies.

Sex as a Biological Variable
Our study examined differences between pregnant and non-pregnant individuals following CMV infection. Pregnant individuals were female. Differences associated with biological sex among non-pregnant individuals are presented in Supplementary Figure 4.

Clinical Samples
Serum samples were collected from five cohorts: pregnant individuals with primary infection, pregnant individuals with latent infection, non-pregnant individuals with primary infection, non-pregnant individuals with latent infection, and CMV-negative individuals as a negative control group (Table 1). Human subjects were recruited from Fondazione IRCCS Policlinico San Matteo, Pavia, Italy, and included healthy subjects and subjects with primary or chronic CMV infection, both pregnant and non-pregnant. Diagnosis of primary CMV infection in the Pavia cohorts was based on two or more of the following criteria: CMV-specific IgG seroconversion, CMV-specific IgM antibody detection, low IgG avidity index, and CMV DNAemia. Chronic infection was defined by the presence of CMV-specific IgG, the absence of CMV-specific IgM, and no detection of CMV DNA in blood, saliva, urine, and genital secretions. Primary infection among non-pregnant participants was diagnosed similarly, with the exception that DNAemia was not assessed.
HCMV-specific IgG and IgM were determined by ETI-CYTOK-G and ETI-CYTOK-M (DiaSorin, Saluggia, Italy). IgM results obtained by the commercial assay were confirmed by an in-house developed capture ELISA assay (77). IgG avidity index was determined by an in-house developed ELISA test using HCMV nuclear antigen (59). The avidity index was defined as low if <35%, typically representing a primary infection acquired <12 weeks earlier, with avidity index values <15% indicating an infection acquired <6 weeks earlier.
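As an illustration only, the avidity cut-offs quoted above can be written as a small lookup. This is a minimal sketch: the function name, constant names, and the wording of the returned strings are hypothetical, and the thresholds apply to the in-house assay described here rather than to avidity assays in general.

```python
# Cut-offs taken from the text; names are illustrative, not from the study code.
LOW_AVIDITY_CUTOFF = 35.0       # percent: below this, avidity is "low"
VERY_LOW_AVIDITY_CUTOFF = 15.0  # percent: below this, infection is more recent


def interpret_avidity(avidity_index_pct: float) -> str:
    """Map an IgG avidity index (percent) to the interpretation given in the text."""
    if avidity_index_pct < 0:
        raise ValueError("avidity index cannot be negative")
    if avidity_index_pct < VERY_LOW_AVIDITY_CUTOFF:
        return "low avidity: infection acquired <6 weeks earlier"
    if avidity_index_pct < LOW_AVIDITY_CUTOFF:
        return "low avidity: infection typically acquired <12 weeks earlier"
    # The text gives no recency estimate for values at or above 35%.
    return "not low: no recency estimate from avidity alone"


if __name__ == "__main__":
    for value in (10.0, 20.0, 50.0):
        print(value, "->", interpret_avidity(value))
```

In the study, avidity was only one of several diagnostic criteria, so a rule of this kind would not be used in isolation.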
In 49/74 (66%) pregnant women and 26/27 (96%) non-pregnant subjects, time of onset of primary infection was defined by the appearance of symptoms, while in 19/74 (26%) pregnant women who were asymptomatic, onset of infection was estimated from seroconversion (i.e., as the midpoint between the last IgG-negative and the first IgG-positive test result), occurring within a ≤6-week interval. Finally, in six (8%) asymptomatic pregnant women and one (3%) asymptomatic non-pregnant subject, onset of infection was estimated from the kinetics of CMV-specific IgM and IgG avidity index. For a set of 40 pregnant women and 28 non-pregnant subjects with primary infection, 2-4 sequential serum samples collected until 6 months after onset of infection were available. Longitudinal samples from pregnant women with chronic infection were collected at 10, 20, and 30 weeks of gestation and at delivery. An additional cohort of pregnant women with primary infection was recruited from Erasme Hospital, as previously described (39,78).
Diagnosis of primary CMV infection was made by either documented IgG seroconversion or increased titers of CMV-specific IgM at a subsequent sampling. IgG seroconversion was not observed in all individuals, and not all subjects were symptomatic. As a result, in some cases, primary infection status was inferred and time of infection was estimated from clinical assessments. Unsupervised analysis of data generated in this study indicated the lack of systematic bias associated with samples for which primary infection was diagnosed by inference as compared to those for which IgG seroconversion was observed, lending confidence to these inferences (Supplemental Figure 8).
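For asymptomatic Pavia subjects, onset was estimated as the midpoint between the last IgG-negative and the first IgG-positive test result. A minimal sketch of that calculation follows. The function name is hypothetical, and reading the "≤6-week interval" as a limit on the gap between the two tests is an interpretation of the text, not a confirmed rule.

```python
from datetime import date, timedelta

# Assumption: the <=6-week interval refers to the gap between the two tests.
MAX_INTERVAL = timedelta(weeks=6)


def estimate_onset(last_igg_negative: date, first_igg_positive: date) -> date:
    """Estimate infection onset as the midpoint of the seroconversion window."""
    interval = first_igg_positive - last_igg_negative
    if interval <= timedelta(0):
        raise ValueError("the IgG-positive test must follow the IgG-negative test")
    if interval > MAX_INTERVAL:
        raise ValueError("seroconversion window exceeds six weeks")
    # Adding a timedelta to a date uses whole days only, so odd-length
    # windows are rounded down to the earlier day.
    return last_igg_negative + interval / 2


if __name__ == "__main__":
    print(estimate_onset(date(2020, 1, 1), date(2020, 1, 29)))
```

The uncertainty of such an estimate is at most half the window, that is, up to three weeks under this reading.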
The tegument antigens tested were chosen based on prior studies of the pp150 (UL32) and pp52 (UL44) CMV proteins. CG1 included aa 495-691 and aa 862-1048 of pp150, whereas CG2 included aa 695-864 of pp150 and aa 297-433 of pp52. These two polypeptides were previously described as immunodominant targets for serodiagnosis of primary and chronic CMV infection (37,38). Characterization of antibody profiles was performed using the Fc array assay (80,81). Antigens were covalently coupled to magnetic microspheres (Luminex Corporation) using carbodiimide chemistry.
Serum dilutions used in assays ranged from 1:250 to 1:5000 based on initial pilot experiments and previous experience (Supplemental Table 1). Antigen-specific antibodies were detected using R-phycoerythrin-conjugated secondary reagents specific to human immunoglobulin isotypes and subclasses and using Fc receptor tetramers. Median fluorescent intensity (MFI) data were acquired on a FlexMap 3D array reader (Luminex Corporation). Samples from CMV-naïve individuals were tested to establish the specificity of measurements, and MFI values were not quantitatively compared between antigen specificities or detection reagents. Samples were tested in technical duplicates and results were averaged.

Classification of CMV infection status
A binomial logistic regression model with least absolute shrinkage and selection operator (LASSO) regularization was used to predict infection status. Model training was performed using scikit-learn (version 1.3) in Python (version 3.9) with default options. The regularization parameter was chosen as the value that gave the lowest classification error. The model was trained to minimize the log loss function, and the class boundary was set at a probability value of 0.5. Model accuracy was determined by comparing test set label predictions with true labels. Accuracy was assessed over 100 repetitions of five-fold cross-validation. Permutation testing was done to measure model robustness by performing the same procedure as described above but on data for which class labels had been randomly shuffled.
Feature importance was determined from a final model that included all subjects. While other model architectures were tested and resulted in similar prediction accuracies, logistic regression results were selected for presentation given their simplicity and interpretability.
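The workflow described above (L1-regularized logistic regression, repeated five-fold cross-validation, a label-permutation control, and coefficients from a final model fit on all subjects) can be sketched in scikit-learn as follows. This is an illustrative sketch, not the study code: the data are synthetic, the feature standardization step, the fixed penalty strength `C=0.5`, the liblinear solver, and the use of 10 rather than 100 repetitions are all assumptions made to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the antibody feature matrix (NOT study data):
# 120 subjects x 20 features; only the first three features carry signal.
n_subjects, n_features = 120, 20
y = np.repeat([0, 1], n_subjects // 2)   # 0 = "chronic", 1 = "primary"
X = rng.normal(size=(n_subjects, n_features))
X[:, :3] += 2.0 * y[:, None]


def make_model(C=0.5):
    # penalty="l1" gives LASSO-style shrinkage; C is the inverse penalty
    # strength (fixed here; the study tuned it by classification error).
    return make_pipeline(
        StandardScaler(),  # assumption: scaling is not described in the text
        LogisticRegression(penalty="l1", solver="liblinear", C=C),
    )


def repeated_cv_accuracy(X, y, n_repeats=10, seed=0):
    """Mean accuracy of each repetition of stratified five-fold CV."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=n_repeats, random_state=seed)
    # predict() uses the 0.5 probability boundary, as in the text.
    fold_scores = cross_val_score(make_model(), X, y, cv=cv, scoring="accuracy")
    return fold_scores.reshape(n_repeats, 5).mean(axis=1)


real_acc = repeated_cv_accuracy(X, y)

# Permutation control: same procedure, with labels reshuffled for each run.
perm_acc = np.array([
    repeated_cv_accuracy(X, rng.permutation(y), n_repeats=1, seed=i)[0]
    for i in range(10)
])

# Final model on all subjects; coefficient magnitude ranks the features.
final_model = make_model().fit(X, y)
coefs = final_model[-1].coef_.ravel()
top = np.argsort(-np.abs(coefs))[:5]

print(f"actual labels:   {real_acc.mean():.2f}")
print(f"permuted labels: {perm_acc.mean():.2f}")
print("largest coefficients at feature indices:", top)
```

Accuracy on permuted labels should sit near chance for balanced classes, so a clear gap between the two distributions indicates that performance is not an artifact of the procedure. Features with coefficients shrunk exactly to zero are excluded from the final model.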

Prediction of time since infection
The same machine learning model as described for predicting CMV infection status was used, this time minimizing mean squared error in time since infection. The model was trained on cross-sectional sample data and tested on longitudinal and Erasme samples.
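A regression counterpart can be sketched in the same way. The example below assumes that "the same model" corresponds to L1-regularized linear regression (scikit-learn's `Lasso`, which minimizes squared error plus an L1 penalty). The simulated cohorts, the standardization step, and the penalty strength `alpha=1.0` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)


def simulate_cohort(n):
    """Synthetic cohort (NOT study data): two features drift with time."""
    days = rng.uniform(0, 180, size=n)   # days since infection
    X = rng.normal(size=(n, 15))
    X[:, 0] += days / 30.0               # a feature that rises over time
    X[:, 1] -= days / 45.0               # a feature that wanes over time
    return X, days


# Train on one cohort and evaluate on a separate one, mirroring the
# train/test separation described in the text.
X_train, days_train = simulate_cohort(150)
X_test, days_test = simulate_cohort(60)

model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X_train, days_train)
pred = model.predict(X_test)

rmse = mean_squared_error(days_test, pred) ** 0.5
# Reference point: always predicting the mean of the training cohort.
baseline = np.full_like(days_test, days_train.mean())
baseline_rmse = mean_squared_error(days_test, baseline) ** 0.5

print(f"model RMSE:    {rmse:.1f} days")
print(f"baseline RMSE: {baseline_rmse:.1f} days")
```

Comparing against a constant-prediction baseline gives a simple check that the features carry timing information beyond the cohort average.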

Statistics
Statistical analysis was performed in GraphPad Prism (version 9.7). UMAP plots were generated in Python (version 3.9) using the umap-learn package (version 0.4) (82) and then plotted using Prism. Volcano plots were generated in R (version 4.3) using ggplot2. Statistical tests are described in the relevant figure legends.
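The volcano plot coordinates described in the Figure 2 legend (log2 fold change against -log10 p value from a Mann-Whitney test) can be computed as sketched below. The study generated these plots in R; Python is used here only for consistency with the other examples. The data are synthetic, and taking the fold change as a ratio of group medians is an assumption, since the text does not state how it was calculated.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(2)

# Synthetic MFI-like values (NOT study data) for two antibody features:
# feature 0 differs between groups, feature 1 does not.
primary = rng.lognormal(mean=[8.0, 6.0], sigma=0.5, size=(40, 2))
chronic = rng.lognormal(mean=[6.0, 6.0], sigma=0.5, size=(40, 2))


def volcano_coordinates(group_a, group_b):
    """Return (log2 fold change, -log10 p) for each feature column."""
    # Assumption: fold change is the ratio of group medians.
    log2_fc = np.log2(np.median(group_a, axis=0) / np.median(group_b, axis=0))
    p_values = np.array([
        mannwhitneyu(group_a[:, j], group_b[:, j], alternative="two-sided").pvalue
        for j in range(group_a.shape[1])
    ])
    return log2_fc, -np.log10(p_values)


x, y = volcano_coordinates(primary, chronic)
for j in range(x.size):
    print(f"feature {j}: log2FC = {x[j]:.2f}, -log10(p) = {y[j]:.2f}")
```

Features that are both strongly changed and statistically significant fall in the upper corners of the resulting plot.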

Study approval
The study was approved by the IRBs at Fondazione IRCCS Policlinico San Matteo and Erasme Hospital for sample collection, and Dartmouth College for sample testing and analysis. Each participant gave written informed consent.

Figure 2: Antibody features distinguish primary from chronic CMV infection but not pregnancy status. A. Uniform manifold approximation and projection (UMAP) biplot of antibody features excluding IgM. Distinct clusters of subjects with primary (blue, n=158) and chronic (green, n=76) infection but not pregnancy status (hollow and filled symbols) are observed. B. Volcano plot of each CMV-specific antibody feature assessed. Volcano plot represents the log2 fold change (x-axis) against the -log10 p value (y-axis) (Mann-Whitney test: *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001). Antibody specificities (Antigen) are indicated by shape and Fc characteristics (Detection) are indicated by color. C. IgM, IgA, and IgG binding to CMV antigens (further described in Supplemental Table 1). Data are the mean median fluorescent intensity (MFI) values of technical replicates. Solid red line indicates median.

Figure 4: Machine learning accurately predicts primary or chronic CMV infection status. A. Schematic overview of cross-validated machine learning workflow employing antibody profiling data to discriminate between primary and chronic infection status in discovery and validation cohorts. B. Prediction accuracy for 100 repeated 5-fold cross-validation runs on actual (black) and permuted (gray) class labels. C. Confusion matrix of predicted versus actual class labels in the median model for 5-fold cross-validation. D. Class probabilities of each sample in the discovery set when evaluated as a test sample in the cross-validation run exhibiting median performance (left) and for the cross-sectional (center) and Erasme (right) validation cohorts using the final model. E. The identities and coefficients of the features making the largest positive (n=5) or negative (n=5) contributions to the final model. Solid lines denote group medians; differences between groups or conditions were assessed by Mann-Whitney test (*p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001).

Figure 5: Longitudinal models define a molecular clock of CMV primary infection. A. Analysis overview. The infection status classification model trained on the cross-sectional cohort was applied to longitudinal samples available from the validation cohort. B. Class probabilities of each sample in the longitudinal cohort over sample collection visits for individuals with primary (blue) and chronic (green) infection. C. Scatterplot of class probabilities over time for subjects defined as having primary infection at visit 1. D. Scatterplots over time of the features employed by the classification model to predict infection status in the longitudinal cohort. E. Analysis overview. Primary infection samples from the longitudinal cohort were used to train a regression model to predict time since infection (days post symptom onset) that was applied to the primary samples from the cross-sectional cohort. F. Scatterplot of model predictions of time since infection (days post symptom onset) for primary infection samples. The cross-sectional Pavia (left) and Erasme (right, n=23) samples were used as distinct validation cohorts. Data show the predicted label and its closeness to the true label.