Reference-based comparison of adaptive immune receptor repertoires

Summary B and T cell receptor (immune) repertoires can represent an individual’s immune history. While current repertoire analysis methods aim to discriminate between health and disease states, they are typically based on only a limited number of parameters. Here, we introduce immuneREF: a quantitative multidimensional measure of adaptive immune repertoire (and transcriptome) similarity that allows interpretation of immune repertoire variation by relying on both repertoire features and cross-referencing of simulated and experimental datasets. To quantify immune repertoire similarity landscapes across health and disease, we applied immuneREF to >2,400 datasets from individuals with varying immune states (healthy, [autoimmune] disease, and infection). We discovered, in contrast to the current paradigm, that blood-derived immune repertoires of healthy and diseased individuals are highly similar for certain immune states, suggesting that repertoire changes to immune perturbations are less pronounced than previously thought. In conclusion, immuneREF enables the population-wide study of adaptive immune response similarity across immune states.

INTRODUCTION B and T cell receptor (BCR, TCR) repertoires (also called adaptive immune receptor repertoires, AIRR) are continually shaped throughout the lifetime of an individual in response to environ-mental and pathogenic exposure. As of yet, however, there exists only a limited quantitative conception of how immune receptor repertoires differ across individuals and cell populations (Brown et al., 2019;Miho et al., 2018;Raybould et al., 2021). This is primarily because a method for measuring inter-individual MOTIVATION B and T cell repertoires record past and current immune states. Therefore, the majority of immune repertoire studies aim to measure the impact of the immune state on the immune repertoire because it is widely assumed that repertoires change as a function of the immune state. So far, a method to measure and immunologically interpret differences between immune repertoires has remained unavailable. We have addressed the methodological challenge of immune repertoire comparison by implementing a referencebased multidimensional repertoire similarity measure based on in silico and experimental immunologically interpretable ground truth.
(inter-repertoire) similarity is lacking, thus greatly impeding the understanding of how health and disease shape immune repertoires and how disease contributes to the deviation of an individual's baseline repertoire (Cobey et al., 2015). Although it is generally thought that infection or disease induces measurable repertoire changes (even on the antigen-specific agnostic level), this belief remains unproven and, in fact, is counter to current evidence finding, using statistical learning, that even in systemic infections such as cytomegalovirus (CMV) only a comparatively very small number of TCRs are infection associated (DeWitt et al., 2018;Emerson et al., 2017;Pavlovi c et al., 2021). As opposed to machine learning approaches that aim to detect the most differentiating factors (i.e., subsets of a repertoire) between, for example, two different immune states Pavlovi c et al., 2021;Pertseva et al., 2021;Shemesh et al., 2021;Widrich et al., 2020aWidrich et al., , 2020b, we investigate here a method for quantitatively comparing any two repertoires in an unsupervised fashion. We thus seek to understand to what extent individuals differ with respect to their entire repertoire and not just class-associated subsets.
The need for comparing immune repertoires using a quantitative measure has recently been addressed by approaches based on single sequence-dependent and sequence-independent features, which vary in statistical dependency (mutual information) and immunological interpretability (Chiffelle et al., 2020;Miho et al., 2018;Olson et al., 2019). Sequence-dependent approaches range from the measurement of clonal overlap (Bolen et al., 2017;Greiff et al., 2015a;Miho et al., 2018;Rognes et al., 2022;Yaari and Kleinstein, 2015) to more sophisticated algorithms that identify disease-specific enrichment of sequence clusters by testing against VDJ recombination models  or similarity networks of control datasets Shugay et al., 2015). Sequence-independent approaches are mainly represented by entropy-based diversity indices (Alon et al., 2021;Greiff et al., 2015a;Kaplinsky and Arnaout, 2016;Strauli and Hernandez, 2016), which have lately been augmented with a correction for sequence similarity (Arora et al., 2018;Vujovi c et al., 2021). None of the currently available comparative methods, which are based on single repertoire features, however, represent an integrated multi-feature measure of immune repertoire similarity that takes into account the complexity of information encoded in the ensemble of the existing immune repertoire features (Gupta et al., 2015;Nazarov et al., 2020;Shugay et al., 2015). Such an integrated measure, encoding per-feature similarity in one common mathematical structure, is needed to enable a representation of repertoire similarity.
Here, we introduce immuneREF: a measure for quantifying immune repertoire similarity across multiple immune repertoire features. Our framework, implemented in an R package, measures immune repertoire similarity using a combination of features that are immunologically interpretable (clonal expansion, sequence composition, repertoire architecture, and clonal overlap) and that cover largely distinct dimensions of the immune repertoire spaces. Specifically, to interpret immune repertoire similarity scores, immuneREF establishes a self-augmenting dictionary of simulated and experimental datasets where each new dataset analyzed may be used as a comparative reference for scoring and biologically interpreting inter-individual variation (and thus the deviation) of immune repertoire features (Figure 1). We applied immuneREF to >2,400 immune repertoires from humans with varying immune states (healthy, virus infection, autoimmune disease) and found that the similarity of blood-derived immune repertoires is not consistently a function of the immune state.
Overall, immuneREF enables the quantification of repertoire similarity at population scale while still providing single-individual resolution, and it enables answering fundamental questions such as to what extent immune repertoires are robust to perturbations introduced by immune events.

RESULTS
Reference-based comparison of immune repertoires based on immunological features: Constructing a similarity atlas of immune repertoires To derive a similarity measure for immune repertoires, we devised a framework that calculates a repertoire similarity score based on six features that reflect immune repertoire biology ( Figure 1). These features are (1) germline gene diversity (Greiff et al., 2015a;Yaari and Kleinstein, 2015), (2) clonal diversity (Greiff et al., 2015a;Stern et al., 2014), (3) clonal overlap (Greiff et al., 2015a;Yaari and Kleinstein, 2015), (4) positional amino acid frequencies (Mason et al., 2019), (5) repertoire similarity architecture (Bashford-Rogers et al., 2013;Ben-Hamo and Efroni, 2011;Miho et al., 2019), and (6) k-mer occurrence (Greiff et al., 2017b;Thomas et al., 2014) (see the STAR Methods section for a detailed immunological and mathematical description of these features). A similarity score is calculated for each pair of repertoires and each feature (six n x n symmetric matrices, n = number of repertoires), creating a similarity matrix for each feature. This matrix may be viewed as a weighted network, in which the nodes correspond to repertoires and the edges connecting the nodes are the similarity scores. The resulting six single-feature similarity networks enable insight into per-feature similarity. Finally, a composite network of the six feature similarity networks represents an interpretable multidimensional picture of the repertoire landscape. Briefly, the single features are condensed into a multi-feature composite network by taking the mean of all single-feature similarity values resulting in a single repertoire similarity value (for alternative approaches to computing composite networks, see STAR Methods section). By virtue of representing a similarity matrix as a weighted network repertoire, similarity may be computed on selected levels such as one (repertoire) to many (repertoires), many to one, and many to many ( Figure 1). Interpretability stems from all repertoire features being transformed into a similarity measure on a 0-1 scale allowing for direct quantification of their individual contribution to multidimensional immune repertoire similarity.
immuneREF measures immune repertoire similarity with high sensitivity We sought to quantify the sensitivity by which immuneREF can detect differences between immune repertoires with respect to the six repertoire features. The simulated repertoires, varying in a controlled manner, represent a ground truth reference map that enables a more precise assessment of immuneREF sensitivity. For example, simulated repertoires may be used to guide the evaluation of variation between experimental repertoires with respect to each repertoire feature as well as multi-feature combinations. Simulations were performed using the immune-SIM repertoire simulation suite , which was used to create native-like repertoires that were varied across eight parameters. Native-likeness was demonstrated in Weber et al. (2019). The parameters that were varied across simulated repertoires included clone count distribution, V-, (D-), J-gene frequency noise, insertion, and deletion likelihoods, species (human and mouse), and receptor type (IgH, TRB). We constructed additional simulated repertoires with spiked-in motifs (mimicking antigen-binding motifs; Akbar et al., 2019), excluded hub sequences in the sequence similarity network (simulating network architecture variation; Miho et al., 2019), and replaced nucleotide codons with synonymous codons (simulating biases in the k-mer occurrence that are relevant in detectable immunogenomic patterns of public clones; Greiff et al., 2017b) (see STAR Methods; Table S1 lists the parameter variations used for the simulations and how each of the parameters is expected to influence the six immuneREF features). The parameter combinations were chosen so each simulated repertoire varied only along one parameter dimension at a time, allowing us to determine the sensitivity of each feature to each parameter change.
The mathematical structure of the single-feature similarity matrices enables their merging into a composite network that provides the opportunity for a condensed single-score representation of inter-sample repertoire similarity. The composite immu-neREF network (which combines all six repertoire features) recovers major variation in the repertoires including noise The complexity of AIRRs spans the frequency, motif, and feature space to each, of which distinct repertoire features may be attributed: the immune information stored in AIRRs is multidimensional. A longstanding question in the AIRR field is how to quantitatively measure inter-sample (sample, e.g., individual, immune cell population) AIRR similarity by accounting for AIRR feature multidimensionality in the effort to understand the distribution of inter-sample AIRR similarity across different immune events or immune cell populations. (B) We set out to develop an AIRR similarity measure that is sensitive, captures maximal immune information, and is sufficiently flexible to allow future integration of additional repertoire features (extensibility). (C) Each AIRR is represented as a node in a similarity network. The edges connecting the nodes represent the similarity score between the AIRR based on the six repertoire features. The immuneREF approach establishes interpretability on different levels: (1) from a single-feature perspective, the application of spider plots allows for an interpretable comparative analysis between repertoires, enabling the user to interpret the result observed in the condensed network on a per feature basis.
(2) From the condensed feature network perspective, a major novelty introduced by the immuneREF workflow is the ability to combine established repertoire features into a common coordinate system. This transformation allows the combination of trends across features into a single condensed network that represents pairwise-cross-feature similarities. These pairwise similarities allow for the identification of subsets of more similar or aberrant repertoires. Interpretability on both features means allowing comparison to other repertoires and to simulated ones (of which we know the repertoire structure as ground truth), thus creating similarity equivalence classes. Equivalence classes create sets of reference repertoires, which enable interpreting the repertoire structures of other repertoires solely based on the immuneREF similarity score.  (Figures 2A and 2B). immu-neREF also clearly distinguishes repertoires from different receptors and species based on strongly distinguishing features such as V-, (D-), and J-gene usage while allowing the identification of commonalities in amino acid usage, clonal diversity, and architecture across immune receptors and species. This sensitivity analysis also underlines a major advantage of immuneREF, namely its flexibility to accommodate both BCR and TCR repertoires from different species in one single analysis workflow.
We quantified the sensitivity of immuneREF by detecting significant changes in similarity scores corresponding to the variation in simulation parameters across both the single feature (Figures S1-S3) and composite network ( Figure 2A) and found that each feature had a unique sensitivity profile to changes in the simulation parameters, underscoring the value of perfeature similarity evaluation. For example, a change in the alpha parameter of the Hill function (controlling clone count distribution) solely impacted the immuneREF diversity feature. As the immuneSIM parameter controlling the distribution of clone counts only affects the clone count simulation without impacting simulated sequences, the fact that only the feature targeted by the parameter change is impacted shows that immuneREF is robust to random noise in the simulation that is not introduced through parameter changes. An increase in the V(D)J noise parameter, which modifies the frequencies of the germline genes used in the simulation, led to detectable and significant changes in similarities of the germline gene usage and k-mer occurrence features. Modification of the insertion/deletion patterns (dropout of deletions and or insertions) led to a consistent impact in the amino acid frequency feature and, more importantly, the architecture feature, where a lower diversity due to restricted insertions and deletions led to significant changes in network architecture. Implanting motifs at various frequencies led to a significant similarity change in the k-mer occurrence feature. The deletion of hub sequences led to an impact in the architecture feature and also changed the repertoire overlap similarity, C  Article ll OPEN ACCESS thus underlining the importance of public clones in the network architecture as reported previously (Miho et al., 2019). Finally, we modified the repertoires by introducing synonymous codons at various percentages and found that the k-mer occurrence feature was the only one impacted. Therefore, we conclude that immuneREF features largely react as hypothesized to variation in simulation parameters (Table S1). Taken together, we demonstrated that the immuneREF framework is sensitive to even comparatively small repertoire variations.
Mutual information analysis demonstrates no interdependence to limited inter-dependence of immuneREF features While the examined features were initially chosen based on immunological criteria, we also wished to verify whether each feature provides a sufficiently different measurement of the immune repertoire information space ( Figure 2C). Specifically, having integrated all features into a common coordinate system, we were able to compute cross-feature mutual information and found that features show no dependence to limited dependence (range = 0.01-0.57; Figure 2C) indicating largely non-overlapping and distinct spaces of immune information captured. The highest mutual information was found between the positional and sequential sequence-derived features (i.e., positional amino acid frequency and gapped k-mer occurrence, respectively), whereas the lowest mutual information value was found between the diversity and convergence features ( Figure 2C).
Complementarily, we sought to quantify to what extent the addition of new repertoire features leads to diminishing returns (sufficiency analysis). To this end, we computed the mean change in repertoire similarity values when increasing the number of features from one through six. Thereby, we could show that each additional feature added increasingly less information, as shown by the diminishing change of the mean similarity value with each added feature. The saturation of the mean similarity change curve indicated information saturation independent of the order in which features were arranged (Figures 2D and S3G-S3J). As discussed below, mutual information values behaved similarly for experimental repertoire data. Thus, we demonstrated that the immuneREF framework creates information-laden similarity networks, whose topologies capture the immunological similarity landscape of immune repertoires.
The similarity landscape of simulated repertoires defines reference repertoires By calculating the similarity matrix for each of the six immune repertoire features, we embedded the six different immunological features into a common coordinate system, i.e., a network structure. This network (with nodes representing repertoires and weighted edges representing pairwise similarity) situates each repertoire within a similarity landscape allowing quantification of many-to-many repertoire similarity.
A more fine-grained image of the similarity landscape may be gained by examining the similarity from the perspective of every single repertoire ( Figures 3C and 3D). We define the local similarity of a repertoire to its neighboring repertoires as a scaled node strength (see STAR Methods). This local similarity represents the position of the repertoire with respect to its direct neighbors in its cohort (defined by an application-dependent label, e.g., same species and disease) and allows us to distinguish between well embedded and aberrant repertoires. The local similarity measure further acts as a magnifying glass by elucidating finer differences between repertoires, which are diluted by population averages when examining repertoire similarity across the full similarity network. Using this perspective, repertoires that are most (locally) similar to other repertoires in their cohort can be identified, allowing the extraction of repertoires most representative for a given immune state. Such detailed one-to-one feature comparisons highlight, in the most simple case, which features of the simulated repertoires are receptor specific (amino acid frequency, k-mer occurrence, VDJ usage, and convergence) and which are more general to immune repertoire data showing higher similarity across different species and receptors (diversity, architecture) ( Figures 3E and S4).
Having evaluated the similarity of simulated datasets, these may serve as a reference to interpreting similarity score variation of experimental repertoires ( Figure 3C), thus enabling the creation of equivalence classes of immune repertoires not only as previously performed based on clonal expansion (Greiff et al., 2015b) but based on six repertoire features. Furthermore, any evaluated repertoire, be it of experimental or simulation origin, will become a new node in the similarity network and may serve as a valid reference point (just as any other node in the network). This network of self-augmenting repertoire similarity reference points is another source of interpretability as it allows the linking of the repertoire similarity of any number of repertoires with their underlying features. In the next section, we provide such a repertoire similarity network on experimental datasets.
Validation of immuneREF on experimental data: Detection of differences between cell populations in mouse immunization and human COVID-19 datasets To validate immuneREF sensitivity on experimental data, we used antibody repertoire datasets generated from a mouse antigen immunization study, where differences in the similarity between antigen immunization cohorts are expected (Greiff et al., 2017a(Greiff et al., , 2017bMiho et al., 2019). Notably, we were able to recover clear differences between isotypes and cell populations (both with higher within-cohort and lower across-cohort similarity); additionally, we found that the antigen immunization cohorts have more distinct similarity profiles in the plasma cell populations (IgG) compared with the antigen-inexperienced cell populations ( Figure S5). The overall high similarity scores across the full immunological feature range are in agreement with our previous studies where we observed high similarity between these repertoires on a single feature basis (Greiff et al., 2017a).
Similarly, applying immuneREF to TCR repertoires of patients recovered from mild cases of COVID-19  revealed clusters of increased similarity within patients and cell populations ( Figure S5).
Application of immuneREF to >1,500 experimental blood immune repertoires indicates only small similarity-based differences between health and autoimmune disease Having established the sensitivity of our approach in detecting a wide range of differences between simulated repertoires  Figure S5) with respect to immunologically relevant and interpretable repertoire features, we set out to determine the similarity landscape of large-scale experimental TCR repertoire datasets. We evaluated 1,522 human TCR repertoires derived from peripheral blood mononuclear cells (PBMCs) of patients with varying and diverse immune states (PanImmune Repertoire Database (PIRD) dataset containing samples from healthy, rheumatoid arthritis (RA), and systemic lupus erythematosus (SLE) patients; Table S2). We found an even similarity landscape of overall high similarity scores ( Figure 4A). Similarity score distribution was also even in single features, which despite feature-specific differences, show overall high similarity scores between repertoires. We examined networks at three different similarity cutoffs (an edge is drawn between two repertoire nodes if their similarity is in 25%, 50%, and 75% top weights, respectively), and we found that in all three cases, no immune state-specific grouping could be observed ( Figure 4B).
The range of general and local similarities across all samples as well as within each disease cohort was evaluated using an analogous approach to that used for the simulated datasets ( Figures 4C and 4D). While the similarity scores ranged between $0.5 and 0.8 overall, the within-disease cohort spread varied, with the healthy and RA cohorts showing a more restricted range of similarity scores compared with a broader range for SLE ( Figures 4C and 4D).
To quantify per feature similarity and dissimilarity with respect to a reference dataset, we compared the repertoires identified as the ones best connected (highest local similarity) within their cohort to an immuneSIM reference repertoire (human, TRB, standard parameters; see STAR Methods) ( Figure 4E). The similarity scores of all tested immune states largely overlap with respect to the healthy reference repertoire, with convergence being the feature dimension with the largest dissimilarity, meaning there is almost no convergence between the RA or SLE samples and the reference.
Following our observations of high repertoire similarity within the PIRD dataset, we ran immuneREF on another large publicly available dataset (human, TCR) (Emerson et al., 2017)  Similarity score D Figure 3. The similarity landscape of simulated repertoires defines reference repertoires (A) Baseline similarity between replicates for repertoires simulated using default immuneSIM parameters (see Table S1) is R0.96 for five of six features, with the convergence feature being the exception by definition at %0.09. Bar graphs show mean SEM across replicates. (B) Repertoire similarity distribution in a condensed network across the various evaluated parameter range. Across cohorts, similarity scores have a broad range, whereas within cohorts the range is more restricted.
(C) Workflow to determine representative repertoires per cohort going from many-to-many to a one-to-one comparison.
(D) Local similarity distribution per species/receptor combination enables situating each repertoire based on its connectivity with respect to neighbors in the same cohort.
(E) Comparing repertoires with maximal local similarity in their cohort visualizes the commonalities between receptor types; here the Murine IgH repertoire with maximal local similarity serves as a reference repertoire. The plot visualizes the similarities of each non-reference repertoire to the Murine IgH reference. tients, 351 are from CMV-negative patients, and 26 are from patients with unknown CMV status. This dataset has previously been used to showcase immune state classification with high accuracy via the identification of CMV-associated public TCR sequences (sequences shared between individuals). In a similar fashion, immune state-associated public sequences were used to successfully classify RA and SLE samples from the PIRD dataset (Liu et al., 2019). As with the PIRD dataset, we observed high within and across immune state repertoire similarity ( Figure S5). This is in line with the findings of Emerson and colleagues as they found that only a small subset of clones (CMV-associated ones in Emerson et al., 2017) significantly differed in abundance between immune states (CMV+, CMV-) and that that shared antigen exposure to CMV led to a reduced number of shared TCRb clones, even after controlling for individual human leukocyte antigen (HLA) type, indicating a largely private response to a major viral antigenic exposure (Johnson et al., 2021).
In summary, the results of our analysis of human TCR repertoires strongly support the argument that the signal-to-noise ratios, where signal means repertoire features associated with disease status, are unfavorably tilted toward noise, where noise is defined as technological and immunological information, which cannot yet be linked to a given disease state.

Extensibility of immuneREF: Integration of gene expression with immune repertoire data
The mathematical structure of the composite network obtained from immuneREF allows the extensibility of the immuneREF framework to other features. As proof of principle of this immu-neREF capability, we show here an integrative analysis of immune repertoires and gene expression. This integration is of high interest to RNA-seq experiments that include both receptor and global transcript sequences, or even repertoire experiments paired with transcriptomics (Rubio et al., 2022;Song et al., 2021). Integration of immune repertoire with gene expression is challenging due to the multidimensional nature of both kinds of datasets and the discrepancy in their data structure. Previous attempts of integration are still over-simplistic, such as the calculation of correlation between the number of distinct CDR3 amino acid sequences and gene expression of some marker genes such as CD3, CD4, CD8, HLA class I, and class II genes (Brown et al., 2015).
immuneREF includes the option to evaluate similarity based on a gene expression matrix and add it to the composite   Table S1).
Cell Reports Methods 2, 100269, August 22, 2022 7 Article ll OPEN ACCESS network. Briefly, immuneREF first filters all genes with low variation between experimental conditions and then calculates the pairwise correlation between observations to construct a single gene expression feature (similarity matrix). Once the seven features (six from immune repertoires and one for gene expression) are calculated, they may be condensed into a multi-feature network as described above. Our solution for integrating receptors with gene expression confers immuneREF the advantage of overlaying dual biological information ( Figure S6A).
As an example, we analyzed bulk RNA-seq gene expression of pre-B cell line B3 from the published STATegra project (Gomez-Cabrero et al., 2019). This is a time-course experiment that collects samples at six time points using an inducible Ikaros system where B cell progenitors undergo growth arrest and differentiation ( Figure S6B). Principal-component analysis (PCA) showed clear differences at gene expression level when control and Ikaros groups were compared but also within the Ikaros group across time, being t0 the nearest to controls ( Figure S6B). To generate the single-feature similarity matrix of gene expression that better collects these differences, we tested the three available correlation-based methods implemented in immuneREF ( Figures S6C-S6E). All of them perfectly separated control (blue) and Ikaros (red) groups. Additionally, ''Pearson correlation'' and ''PCA scores'' nearly recovered correctly the time series pattern (purple to yellow degradation), while mutual rank matched perfectly.

DISCUSSION
Combining methods from both immune repertoire and network analysis, we have provided a framework for flexible referencebased quantification of immune repertoire similarity. Using ground truth simulations, we show that immuneREF is sensitive to inter-repertoire differences in all immunological features. Taking advantage of information theory, we showed using both simulated and experimental data that the features selected for immuneREF cover a large extent of immune repertoire biology. We introduced the concepts of full-network repertoire similarity and local similarity, which allow complementary quantification of the impact of the differences in the repertoire similarity landscape. Specifically, while the more general repertoire similarity evaluated on the entire network provides insight into the range of similarity within and across conditions, local similarity shows a particular advantage of the network approach, as the embedding of a repertoire in its neighborhood can markedly differ from what can be expected by its pairwise connections.
immuneREF not only provides a framework for measuring immune repertoire similarity but also for interpreting it. Specifically, it enables the creation of equivalence classes of immune repertoires lacking from existing methods. For example, once the similarity observed within a given set of experimentally obtained immune repertoires has been computed, such repertoires may function as reference points that in turn enable the interpretation of relative similarity in other repertoires (Figures 1 and 3C). Of note, the concept of diversity measures creating equivalence classes has been noted previously for Hill diversity measures (Greiff et al., 2015b) and is here extended to include additional repertoire features immuneREF unifies as single and composite features, frequency-dependent, and sequence-dependent similarity measures into one computational framework. Beyond quantifying the repertoire similarity of experimental immune repertoires, immuneREF also enables the comparison of simulated Marcou et al., 2017;Safonova et al., 2015;Weber et al., 2020;Marcou et al., 2017;Safonova et al., 2015;Weber et al., 2019) and in vitro synthetic immune repertoires used for therapeutic antibody discovery (Mason et al., 2018). Furthermore, immuneREF may be used for data curation purposes in immune repertoire databases such as iReceptor (Corrie et al., 2018), VDJserver (Cowell et al., 2015, PIRD (Zhang et al., 2019), and Observed Antibody Space (Kovaltsuk et al., 2018). Specifically, upon the integration of an immune repertoire into a database, the similarity of the repertoire with all other stored repertoires may be computed. Beyond immunological insight, immuneREF may reveal unexpected technological variation, thus motivating follow-up inspection (Barennes et al., 2021). Since immuneREF has been built to work across species, cell populations, and receptor types and experimental or simulated data (all-in-one comparative framework), it enables rapid distinction of cohort-specific and cohort-unspecific features. This is also important for comparative immunological approaches not centered on health versus disease comparison but, for example, the evolution of adaptive immunity (Pancer and Cooper, 2006).
The ease of use of the immuneREF approach opens new possibilities for large-scale comparative studies as shown on the PIRD dataset, which may yield additional insight into the challenges of predicting immune state based on repertoire profiling. Indeed, we found that the population average quantified by im-muneREF may ''conceal'' relevant immunological phenotype signals, despite the fact that the sensitivity of immuneREF was shown to be high in simulated and experimental data (Figures 2  and S5). Given the lack of large-scale (antigen-specific) data, it remains unclear how the information of the immune state is distributed across immunological features. Specifically, our finding-that repertoire similarity does not differ across immune states-is strictly only valid for unsorted PBMC TCR repertoire data as examined in this study. As known from previous studies (Amoriello et al., 2020(Amoriello et al., , 2021Csepregi et al., 2021;Ghraichy et al., 2021;Greiff et al., 2017b;Li et al., 2020;Ota et al., 2022;Riedel et al., 2020;Rosati et al., 2021), different cell populations (in different lymphoid organs) may behave in a highly different manner ( Figure S5). On the other hand, it did not escape our attention that this broad similarity in human blood samples might suggest the maintenance of lymphocyte homeostasis even in the event of chronic disease.
Our results reinforce the notion that while some diseases may introduce abnormalities into the immune repertoire, others result in a comparatively normal one (Bashford-Rogers et al., 2019), a result that suggests the absence of a signature unique to health. If this is true, then blood-based immune repertoire diagnostics will require even more advanced methodologies to be developed (Arnaout et al., 2021;Dahal-Koirala et al., 2022;Widrich et al., 2020a). For example, for simulated repertoires, motif implants in R10% of sequences were required to affect the amino acid frequency and architecture features, suggesting that even in the case of high clonal expansion, the impact on the repertoire might not be sufficient to significantly change major repertoire features. This is reinforced by results showing that the diseasedriving response in multiple autoimmune diseases is only to a small part antigen specific (Christophersen et al., 2019;Dahal-Koirala et al., 2022). More generally, our paper advances the state of the art of the immune repertoire field by changing the null hypothesis. Specifically, currently, the predominant thinking is that any immune state changes measurably the immune repertoire in a systematic fashion. Our paper challenges this view by finding that, a priori, we should not expect to see differences (Figures 4 and S10), and any substantial change must be proven. This change of perspective is highly valuable to the field as it pushes it toward more sensitive and robust approaches to immune repertoire and machine learning analysis (Arnaout et al., 2021;Kanduri et al., 2021;Slabodkin et al., 2021). Specifically, the usefulness of global features for diagnostics is severely limited, and to detect single-sequence-level differences (Emerson et al., 2017;Kanduri et al., 2021;Widrich et al., 2020a), single-sequence-level statistical and machine learning approaches are needed Schattgen et al., 2021).
In the future, ultra-deep (Briney et al., 2019;Soto et al., 2019Soto et al., , 2020 and population-wide, large-scale immune repertoire projects such as Human Vaccines Project (Crowe and Koff, 2015) may benefit from using immuneREF for identifying immune event-driven aberrations from a baseline repertoire similarity. Furthermore, large-scale database initiatives such as the iReceptor gateway (Corrie et al., 2018) may benefit from immu-neREF functionality for on-the-fly computation of inter-dataset similarity.
Limitations of the study Although we consider the usefulness of the six chosen features to be established (Figures 2 and S3G-S3J), we concede that the asymptotic nature of the sufficiency calculation leaves the door open to the introduction of additional features. The proposed set of immuneREF features denotes in this sense a minimally sufficient set for the analysis of immune repertoire datasets. It ensures sufficient coverage of the major variationintroducing aspects. It is for that reason we devised immu-neREF as inherently modular, allowing single-and multi-feature analysis as well as encouraging the addition of new features relevant for particular problems such as transcriptome analysis (Schneider-Hohendorf et al., 2018; Figure S6), HLA typing for TCR studies (DeWitt et al., 2018;Emerson et al., 2017;Francis et al., 2021), single-cell omics information Setliff et al., 2019;Sturm et al., 2020;Yermanos et al., 2021), gene-specific substitution profiles for somatic hypermutation analysis (Sheng et al., 2017), lineage-specific information (Hoehn et al., 2016(Hoehn et al., , 2021, and antigen-specific and antigenassociated motifs identified by sequence clustering and machine learning (Akbar et al., 2019;Dash et al., 2017;Friedensohn et al., 2020;Glanville et al., 2017;Greiff et al., 2017b;Horst et al., 2021;Mason et al., 2019;Mayer-Blackwell et al., 2021;Meysman et al., 2018;Quiniou et al., 2020;Sidhom et al., 2019;Wong et al., 2020;Yohannes et al., 2021). In particular, a future extension of immuneREF may be a feature that reliably identifies antigen-specific sequences, thus increasing the amount of immune information recovered. More generally, adult repertoires are very complex and contain hidden informa-tion of many antigens at different time points that might have been shared by different individuals. For instance, repertoire fingerprints of influenza infection might be present on most studied individuals and could explain the difficulty to distinguish healthy and diseased individuals. New features including (single-cell-based) antigen specificity patterns may help separate shared infection marks on the immune repertoire.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:  Evenness profiles were calculated as described previously (Greiff et al., 2015b) on the CDR3 nucleotide level. Briefly, we calculated the Hill-diversity for alpha values 0-10 in steps of 0.1 with alpha = 1 being defined as the Shannon evenness. Each entry in the profile varies between z0 and 1, where higher values indicate an increasingly uniform clonal frequency distribution. We determined evenness profiles for each repertoire and evaluated cross-repertoire evenness similarity by Pearson correlation of the repertoires' evenness profiles as described previously (Amoriello et al., 2020;Greiff et al., 2015bGreiff et al., , 2017a. immuneREF feature: Positional amino acid frequencies The positional amino acid frequencies were calculated separately for each CDR3 sequence length. To decrease bias by extraordinarily short or long CDR3 sequences, we limited this analysis to a range of the most common lengths (8-20 amino acids) (Greiff et al., 2017a;Raybould et al., 2019). Briefly, per position amino acid frequencies were calculated for each length. Subsequently, the resulting per length frequency vectors of each repertoire were Pearson-correlated by length and the mean correlation was calculated. Unlike in the case of k-mer occurrences, no positions are excluded, making AA frequency more sensitive to VDJ usage perturbations. Relative frequencies were used for all positional amino acid frequency calculations.
immuneREF feature: Sequence similarity network architecture As previously described (Miho et al., 2019), we constructed a sequence similarity network for each immune repertoire: nodes represent amino acid CDR3 sequences connected by similarity edges if they had a Levenshtein Distance of 1 (LD = 1). The igraph R package was used to calculate networks (v.1.2.4.1, Csardi and Nepusz, 2006), that were analyzed with respect to four measures representing different aspects of network architecture: (i) cumulative degree distribution, (ii) mean hub score (Kleinberg hub centrality score), (iii) fraction of unconnected clusters and nodes and (iv) percent of sequences in the largest connected component. An LD = 1 network was constructed for each repertoire and the similarity between the repertoires' resulting network was evaluated with respect to their differences in the cumulative degree distribution, mean hub-score, outlier sequence occurrence, and largest network components; these metrics have been shown to be defining repertoire characteristics that are robust to subsampling (Miho et al., 2019). The similarity of the architecture between two repertoires A and B was calculated as the mean of four components: (i) the cumulative degree distribution (Pearson correlation between repertoires), (ii) mean hub scores ð1 À jMeanHubScore A À jMeanHubScore B Þ, (iii) the fraction of unconnected components, and (iv) the fraction of sequences in the largest component ð1 À jPercLargestComponents A À PercLargestComponents B jÞ. Unlike many of the other features, the network feature combines multiple single measures, which rendered it difficult to perform Pearson correlation analysis involving all four investigated network measures. Therefore, we adopted the network feature comparison approach described above.
immuneREF feature: Repertoire overlap (convergence) The pairwise repertoire clonal overlap (clones defined based on 100% similarity of CDR3 amino acid sequence), was calculated across repertoires, as previously described (Greiff et al., 2017b): This clonal sequence overlap measure represents the similarity value between repertoires with respect to clonal convergence.
immuneREF feature: Germline gene diversity The relative frequency of germline genes (defined by the ImmunoGenetics Database, IMGT) (Giudicelli et al., 2004) across clones in each repertoire was calculated for each repertoire depending on species and immune receptor class (Ig, TR). The germline gene usage allows insight into deviations from a baseline recombinational likelihood and thereby captures the potential impact of disease, vaccine, or other events on the immune state (Avnir et al., 2016;Greiff et al., 2017a). To determine germline gene usage similarities, we examined the V-and J-gene frequencies across clones for each individual. The Pearson correlation coefficient was determined for each of the frequency vectors (V-, D-, J-gene) with entries of all IMGT variants in a pairwise fashion between samples as described previously (Greiff et al., 2017a;Weber et al., 2019). Specifically, the correlations are calculated per germline gene, leading to separate V_cor, D_cor, J_cor values (and additionally VJ_cor for each V_J combination). The resulting correlation values are combined into a single value by calculating a weighted mean of these components. The weight vector used for the results in the manuscript is c(V = 1,D = 1,J = 1,VJ = 0).
immuneREF feature: Gapped k-mer occurrence For a given k-mer size k and maximal gap length m, the nucleotide-based gapped-pair-k-mer occurrences were counted for all gap sizes % m (Palme et al., 2015). The parameters k and m were chosen based on previous research (Greiff et al., 2017b), where defining parameters k = 3, m % 3 was shown to lead to an encoding sufficient for sequence classification. The counts were normalized by the total number of gapped k-mers found across all gap sizes such that short-gap gapped-k-mers were weighted higher than larger gap sizes. While the amino acid frequency distribution contains positional information, the gapped k-mer occurrence represents shortand long-range sequential information encoded in the repertoire. We counted the occurrence of gapped k-mers (k = 3, m % 3) across Article ll OPEN ACCESS all CDR3 sequences of a repertoire and correlated the resulting distributions between repertoire pairs using Pearson correlation as described previously .
immuneREF feature: Transcriptome integration In order to keep the most informative genes from the genes obtained in a transcriptome experiment, immuneREF firstly applies a low variation filter (Hackstadt and Hess, 2009). Specifically, the standard deviation (SD) is calculated per gene across samples, and all genes above a certain threshold (default, SD > 1) are preserved for subsequent analysis.
To construct the gene expression feature similarity matrix, the Pearson correlation was calculated between samples. Additional approaches for the calculation of the gene expression feature similarity matrix implemented in the immuneREF package (mutual rank, PCA) are described in the package documentation.
Calculating repertoire similarities per feature The calculation of the similarity values between a pair of repertoires was performed in a feature-specific manner as described in the methods section of each feature.

Repertoire similarity -Condensing features into a composite network
The single features are condensed into a multi-feature network by taking the mean of all single-feature similarity values resulting in a single repertoire similarity value. The resulting condensed network represents a weighted composite of the single-feature similarity networks. Additional approaches to obtain a composite network (max similarity, min similarity, SNF (Wang et al., 2014)) are implemented in the R-package as described in the package documentation.

Mutual information
Mutual information is a measure that quantifies to what extent one random variable explains another. Mutual information was defined as IðX; YÞ = HðXÞ À HðXjYÞ = X x;y P XY ðx; yÞlog P XY ðx; yÞ P x ðxÞP y ðyÞ Where, H(X) is the marginal entropy, H(X|Y) the conditional entropy P XY the joint probability distribution of X and Y and P X and P Y the respective marginals. Mutual information was calculated using the R packages entropy (v.1.2.1, Hausser, 2014) and infotheo (v.1.2.0, Meyer, 2014). The values were normalized to the range [0,1] by dividing the mutual information by the sum of the entropies H(X)+H(Y). This normalized mutual information, also known as redundancy, is zero when both are independent and maximal when knowledge of one of the variables becomes redundant given the other.

Quantification of mutual information across ensembles of repertoire features
The mutual information between two features was calculated across all values in the similarity matrix, whereas the similarity matrix represents all pairwise similarity values between repertoires for a given feature. For the V(D)J diversity feature, values were set to zero by definition (i.e., the similarity between repertoires of different species/receptors) and were excluded from this calculation. We ensembled immune information captured by the repertoire features ( Figure 2) as the extent to which repertoire features collectively cover immune repertoire complexity. Specifically, we evaluated the change in mutual information between subsequently added features. Features were added one by one (1-feature network / 2-feature network, 2-feature network / 3-feature network, and so forth, where n-feature means n features combined into a composite network), with the next feature to be chosen randomly (500 permutations of feature combinations per ''n-features / n+1-feature'' step).

Local repertoire similarity
To determine a single value measure for how connected a repertoire is within a subgraph (e.g. the repertoires of healthy human IgH repertoires and the similarity values between them), we defined the local similarity measure. It is calculated by dividing the node strength of each repertoire within a subgraph (sum of all edge weights connecting it to the other nodes in the subgraph) by the sum of all node strengths in the subgraph.
Local similarity gives the ratio of node strength that is connected to each repertoire in a subgraph and thus allows the identification of the most and least representative node of any category (the one most and least strongly connected within that category, respectively, see Figure 2C). The local similarity is dependent on the number of nodes within the subgraph and is therefore only used to compare repertoires within the subgraph. To enable comparison of local similarity values across different subgraphs, local similarity can be scaled by dividing by the number of nodes in the subgraph to correct for varying subgraph sizes in cases where the number of repertoires per subgraph differs.