Relevance of Stereotyped B-Cell Receptors in the Context of the Molecular, Cytogenetic and Clinical Features of Chronic Lymphocytic Leukemia

Highly homologous B-cell receptors, characterized by non-random combinations of immunoglobulin heavy-chain variable (IGHV) genes and heavy-chain complementarity-determining region 3 (HCDR3), are expressed in a recurrent fraction of patients affected by chronic lymphocytic leukemia (CLL). We investigated the IGHV status of 1131 productive IG rearrangements from a panel of 1126 CLL patients from a multicenter Italian study group, and correlated the presence and class of HCDR3 stereotyped subsets with the major cytogenetic alterations evaluated by FISH, molecular prognostic factors, and the time to first treatment (TTFT) of patients with early stage disease (Binet A). Stereotyped HCDR3 sequences were found in 357 cases (31.7%), 231 of which (64.7%) were unmutated. In addition to the previously described subsets, 31 new putative stereotyped subsets were identified. Significant associations were found between different stereotyped HCDR3 sequences and molecular prognostic factors, such as CD38 and ZAP-70 expression, IGHV mutational status and genomic abnormalities. In particular, deletion of 17p13 was significantly over-represented in stereotyped subset #1. Notably, subset #1 was significantly correlated with a substantially reduced TTFT compared to other CLL groups showing unmutated IGHV, ZAP-70 or CD38 positivity and unfavorable cytogenetic lesions including del(17)(p13). Moreover, subset #2 was strongly associated with deletion of 13q14, subsets #8 and #10 with trisomy 12, whereas subset #4 was characterized by the prevalent absence of the common cytogenetic abnormalities. Our data from a large and representative panel of CLL patients indicate that particular stereotyped HCDR3 sequences are associated with specific cytogenetic lesions and a distinct clinical outcome.


Introduction
Chronic lymphocytic leukemia (CLL) is a common disorder characterized by the monoclonal accumulation of B lymphocytes with a distinct phenotype (CD5-positive, CD23-positive, CD22-negative and low level of surface Ig) and a highly variable clinical course [1][2][3]. Different clinical outcomes have been associated with particular cellular and molecular markers and/or specific genomic alterations [4][5][6]. In particular, the mutational status of the immunoglobulin heavy-chain variable (IGHV) gene defines two disease subgroups: one is characterized by the absence of somatic mutations in CLL cells and has the worse clinical course and outcome, whereas the other, with somatic mutations in IGHV genes, has a more benign prognosis and outcome [7,8]. A biased repertoire of IGHV-diversity (D)-joining (J) genes has been reported to characterize the B cell receptor (BCR) in CLL, with a different prevalence of certain genes in the mutated (M) or unmutated (UM) group, respectively [9]. Moreover, more than 20% of CLL patients exhibit closely homologous ("stereotyped") heavy chain complementarity-determining region 3 (HCDR3) sequences and approximately 1% of these also carry virtually identical IGHV amino-acid sequences [10][11][12][13]. These findings have suggested that clones sharing stereotyped BCRs may expand because of stimulation by a restricted set of epitopes and that antigenic drive may play an important role in the pathogenesis of the disease [6][14][15][16].
Although recent data have suggested the existence of specific correlation between stereotyped subsets and common cytogenetic lesions [17,18] or clinical outcome [10,13,19,20], it remains to be defined whether the expression of distinct BCRs in CLL may be relevant to the molecular and cytogenetic profile and/or to the clinical outcome in at least a fraction of patients.
In the present study, we investigated the BCR repertoire in 1126 CLL patients recruited by a multicenter Italian study group. Based on previously reported criteria [10,21] and canonical sequence alignment procedures, we searched for the known stereotyped subsets in three publicly available data sets [10,22,23], as well as for potential novel subsets by performing a pair-wise alignment in the proprietary dataset. The most represented stereotyped subsets were then investigated for their association with the common molecular and cytogenetic features as well as for their impact on clinical outcome of early stage patients (Binet A).

Patient samples
Written informed consent was obtained from all patients in accordance with the Declaration of Helsinki and the study was approved by the local Ethics Review Committee (Comitato Etico Provinciale, Modena, Italy). All patients were diagnosed according to the National Cancer Institute Working Group criteria [24]. Our dataset comprised a total of 1126 CLL patients with productive IGHV-D-J rearrangement, enrolled in retrospective (745 patients) and prospective (381 patients, O-CLL1-GISL protocol) multicenter studies conducted across Italy. In all cases genomic, cytogenetic and molecular analyses were performed on highly purified peripheral mononuclear B-cells from blood samples collected within one year of diagnosis, provided that the patient remained untreated.

Molecular and FISH analyses
CLL IGHV gene usage and mutation were determined as previously described and the 98% homology cut-off value was used to discriminate the M or UM IGHV configuration [9]. ZAP-70 and CD38 expression were investigated by immunophenotypic analysis as previously described [25][26][27]. Specifically, a cut-off of ≥20% or ≥30% positive cells was chosen to discriminate ZAP-70 or CD38 positive from negative patients, respectively. Cytogenetic abnormalities involving deletions at chromosomes 11q23, 13q14 and 17p13 and trisomy of chromosome 12 were investigated by fluorescence in situ hybridization (FISH) as previously described [28]. FISH analyses were performed in all of the patients for whom biological material was available, and no prior selection based on age or disease progression was applied. Time to First Treatment (TTFT) was defined as the time from diagnosis to first-line treatment (event) or to last follow-up (censored observation). Treatment was decided uniformly in all participating centers based on documented progressive and symptomatic disease according to NCI working guidelines [24]. TTFT information was available for 739 patients (661 staged as Binet A, 56 as Binet B and 22 as Binet C); median follow-up was 30 months (range 1-316 months), and 237 (32.1%) patients had received treatment by the end of the follow-up.
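As an illustration only, the marker cut-offs described above can be written as a few small Python functions. This is a minimal sketch, not the authors' pipeline: the function and constant names are hypothetical, the cut-offs are assumed to be inclusive (at least 20% positive cells for ZAP-70, at least 30% for CD38), and sequences with at least 98% identity to germline are assumed to be called UM, following the usual convention.

```python
# Hypothetical helper names; thresholds taken from the Methods text.
IGHV_CUTOFF = 98.0   # % identity to germline (assumed: >= 98% means unmutated)
ZAP70_CUTOFF = 20.0  # % positive cells (assumed inclusive)
CD38_CUTOFF = 30.0   # % positive cells (assumed inclusive)


def ighv_status(percent_identity_to_germline, cutoff=IGHV_CUTOFF):
    """Return 'UM' (unmutated) or 'M' (mutated) for an IGHV rearrangement."""
    return "UM" if percent_identity_to_germline >= cutoff else "M"


def is_marker_positive(percent_positive_cells, cutoff):
    """Return True when the fraction of positive cells reaches the cut-off."""
    return percent_positive_cells >= cutoff


# Example with made-up values for a single patient.
patient = {"ighv_identity": 99.3, "zap70": 35.0, "cd38": 12.0}
summary = {
    "IGHV": ighv_status(patient["ighv_identity"]),
    "ZAP-70 positive": is_marker_positive(patient["zap70"], ZAP70_CUTOFF),
    "CD38 positive": is_marker_positive(patient["cd38"], CD38_CUTOFF),
}
print(summary)
```

Keeping the thresholds as named constants makes it easy to re-run a classification under a different cut-off.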

Identification of stereotyped subsets and statistical analysis
We assigned a stereotyped cluster label to our HCDR3 sequences by means of pair-wise alignment with known stereotyped sequences available from different public databases [10,22,23]. In accordance with previously proposed methods, we applied a primary filter excluding pairs of sequences whose length differed by more than 3 amino acids, and we considered as stereotyped those sequences sharing more than 60% identity on alignments showing fewer than 3 gaps [10,29]. This analysis was performed using a global alignment algorithm [30] with BLOSUM62 as the similarity matrix [31], as implemented in the Biostrings package for Bioconductor. The same approach was applied to discover new potential stereotyped clusters by pair-wise alignment of the sequences from the proprietary database. A supplemental "GX" number was assigned to novel putative subsets not previously included in the Murray et al. and Bomben et al. nomenclature systems [22,23]. All contingency analyses were performed by Fisher's exact test. The competing effect of death on the relationship between TTFT and stereotyped BCRs was modeled by proportional hazards regression for competing risks. Correlation with TTFT was tested between the considered groups in Binet A patients using the crr function of the cmprsk package in R [32]. A P value <0.05 was considered significant for all statistical calculations. All data were statistically analyzed using conventional procedures available in R (www.r-project.org).
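The pair-wise stereotypy filter described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Biostrings-based procedure used in the study: the global alignment uses a simple match/mismatch score in place of BLOSUM62, identity is computed over the alignment length, "gaps" are counted as gapped positions, and the function names and example sequences are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a simple scoring scheme.

    Assumption: match/mismatch scores stand in for BLOSUM62.
    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def is_stereotyped_pair(a, b, max_len_diff=3, min_identity=0.60, max_gaps=3):
    """Apply the length filter, then the identity (>60%) and gap (<3) criteria."""
    if abs(len(a) - len(b)) > max_len_diff:
        return False  # primary filter: lengths differ by more than 3 residues
    al_a, al_b = global_align(a, b)
    gaps = al_a.count("-") + al_b.count("-")
    identical = sum(x == y for x, y in zip(al_a, al_b))
    identity = identical / len(al_a)
    return identity > min_identity and gaps < max_gaps


# Hypothetical HCDR3-like sequences differing by one substitution.
print(is_stereotyped_pair("ARDGYSSGWY", "ARDGYSSAWY"))
```

The thresholds are exposed as keyword arguments so that alternative definitions of stereotypy can be compared on the same set of sequences.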

Identification of stereotyped sequences
To identify stereotyped HCDR3 sequences occurring in our dataset, we performed a global alignment analysis which allowed us (a) to compare each of our cases with publicly available data [10,22,23] and (b) to investigate the occurrence of new putative stereotypes within the proprietary database. Using this approach, stereotyped sequences were found in 357/1126 (31.7%) of the patients, 64.7% (231/357) of which were UM (P<0.0001), further supporting previous evidence [10,22,23].

Correlation between cellular and molecular features with stereotyped BCRs in CLL
We then evaluated the prevalence of known molecular, biological and cytogenetic markers in the most represented and characterized stereotyped subsets of our dataset (Table 2). Subset #1 (IGHV1-5-7/IGHD6-19/IGHJ4) was the most frequent in our cohort. Despite the relatively heterogeneous gene usage, it was associated with UM IGHV genes in all cases (32 patients). Subset #1 patients were more frequently ZAP-70 and CD38 positive compared to all the other patients (P<0.0001) or those utilizing the same IGHV genes without stereotyped HCDR3 (P = 0.0002 and P<0.0001 for ZAP-70 and CD38, respectively). However, no significant association between subset #1 and ZAP-70/CD38 positivity was identified when compared to patients with the UM IGHV configuration (data not shown). Considering only the patients evaluated by FISH (18/32; 56.3%), we observed a higher prevalence of unfavorable deletions (7/18; 38.9%), particularly del(17)(p13) (5/18; 27.8%) (Table 2). Notably, the prevalence of del(17)(p13) in subset #1 patients was significantly higher than that found in all the remaining patients (30/686; 4.3%) (P = 0.0012). To exclude the possibility that the prevalent UM status of subset #1 biased this result, we compared the frequency of del(17)(p13) between subset #1 and all the remaining UM patients (22/244; 9%), which confirmed the association (P = 0.0266). Furthermore, comparing subset #1 patients with those showing the same IGHV gene usage without homologous HCDR3, we observed that del(17)(p13) retained its significant correlation (P = 0.0064). The percentage of malignant cells carrying del(17)(p13) in subset #1 patients (median 78%; range 33.5-99) did not differ significantly from that in the remaining patients with del(17)(p13) (30 patients) (data not shown).
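The contingency comparison of del(17)(p13) between subset #1 (5/18) and all remaining patients (30/686) can be illustrated with a small two-sided Fisher's exact test. The sketch below is a plain-Python implementation written for illustration (the study used R); the function name is hypothetical, and the two-sided P value is assumed to be the sum of the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Probability of observing x in the top-left cell given the margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Counts from the text: del(17)(p13) present/absent in subset #1 (5 vs 13)
# and in all remaining FISH-evaluated patients (30 vs 656).
p_value = fisher_exact_two_sided(5, 13, 30, 656)
print(f"P = {p_value:.4f}")
```

With these counts the computed value should be in line with the reported P = 0.0012.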

Clinical relevance of stereotyped subsets
We investigated whether the stereotype configuration was correlated with disease progression. For this purpose only Binet A patients were considered. Subset #1 (13 patients, see Table 2) exhibited a significantly reduced TTFT when compared to all UM patients (P<0.0001), UM non-stereotyped patients (P<0.0001), or UM patients with the same IGHV gene usage (P<0.0001) (Figure 1A-C). Additionally, subset #1 was associated with an increased risk of earlier treatment compared to the presence of ZAP-70 and CD38 positivity (P<0.0001) (Figure 1D-E). The clinical course of subset #1 patients appeared to be similar to that of patients with del(17p) or del(11q) (Figure 2). A multivariate analysis using a proportional hazard model showed that subset #1 retained independence (P = 0.0132) from del(17p), del(11q), IGHV configuration and ZAP-70 or CD38 positivity. Moreover, considering only UM patients, the prognostic power of subset #1 maintained its significant independence from ZAP-70, del(17p), del(11q) and CD38 (P = 0.0102).
Patients in subset #4 were characterized by a long TTFT. In fact, only 2/19 patients in Binet A had been treated at the time of the study: of these, one was highly positive for ZAP-70 and the other was the only one with UM IGHV configuration.
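Because death before treatment competes with the TTFT event, the quantity underlying such comparisons is the cumulative incidence of treatment. The Python sketch below computes a nonparametric cumulative incidence curve for one group. It is only an illustration of the competing-risks setting: it is not the proportional hazards regression fitted with crr in the study, and the function name, event coding and example data are hypothetical.

```python
def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence for one cause with competing risks.

    Assumed event coding: 0 = censored, 1 = first treatment,
    2 = death before treatment (competing event).
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type so far
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = 0
        j = i
        while j < len(data) and data[j][0] == t:  # group tied times
            if data[j][1] == cause:
                d_cause += 1
            if data[j][1] != 0:
                d_any += 1
            j += 1
        if d_any:
            # Increment uses the event-free probability just before time t.
            cif += event_free * d_cause / n_at_risk
            event_free *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= j - i  # remove events and censored cases from the risk set
        i = j
    return curve


# Made-up follow-up times in months for four hypothetical patients.
example = cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1])
for month, value in example:
    print(month, round(value, 3))
```

Treating deaths as censored observations in an ordinary Kaplan-Meier analysis would overestimate the probability of treatment, which is why the competing event is handled explicitly here.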

Discussion
In order to contribute to the elucidation of HCDR3 stereotyping in CLL, we characterized the BCR repertoire in a comprehensive panel of 1126 patients with the following aims: (a) to investigate whether HCDR3 stereotyped sequences might be correlated with molecular and cytogenetic profiles; and (b) to evaluate the putative clinical relevance in terms of TTFT for the most represented stereotypes.
In our study, we identified a total of 31.7% stereotyped HCDR3 sequences using an amino acid sequence alignment approach according to previously reported criteria [10]. This percentage was slightly higher than those reported to date [10,22,23], a finding in all likelihood related to the assessment of proprietary database stereotypes against previously reported ones. In fact, limiting the analysis only to patients included in our cohort, the percentage decreased to 28.3%, as 31 novel putative stereotyped sequences were identified after comparison with the published registries. Therefore, such a procedure (auto-matching and matching with published data) may represent an optimal and unbiased approach to perform stereotyped BCR characterization in CLL.
Our study revealed that subset #1, known to be the most frequent (9% of all stereotyped cases and 7.9% of all UM patients in our series) and characterized by UM IGHV configurations, was significantly associated with del(17)(p13). Notably, subset #1 exhibited a more unfavorable clinical course than other patients with a UM IGHV configuration, independently of the presence of other adverse prognostic factors, such as del(17)(p13), del(11)(q23), ZAP-70 and CD38 positivity or the usage of IGHV genes. The finding that the clinical outcome of subset #1 is as poor as that of patients carrying the 11q23 or 17p13 deletion suggests that it might represent a reliable marker for high-risk CLL in the early stage of the disease.
As regards subset #2 (IGHV3-21), we confirmed its more unfavorable clinical outcome. However, unlike previously reported data [13], we did not observe a significant difference in the geographical distribution of IGHV3-21 across Italy in our cohort of patients. Moreover, we found del(13)(q14) in virtually all patients tested by FISH (12/13); this finding is in accordance with data recently published by Marincevic et al. [18], suggesting that this association could be considered subset-specific. In contrast, we did not observe the strong association between del(11)(q23) and subset #2 described by the same authors [18]. This discrepancy could be partially explained by the lower number of subset #2 patients analyzed by FISH in our study. However, it should be noted that all but one of the subset #2 cases in our panel were Binet A, whereas 70% of those from Marincevic et al. [18] were either in advanced clinical stages or lacked stage information, thus preventing any definitive comparison.
In our study, we described a recurrent favorable cytogenetic profile and the indolent course in subset #4 patients. This finding is in agreement with data reported by some authors [10,18], but not by others [17], leaving this aspect still controversial. Finally, we confirmed that trisomy 12 was correlated with IGHV4-39 stereotyped HCDR3 subsets #8 and #10 [17,20] showing that these two subsets were particularly associated with higher CD38 expression.
In conclusion, our study indicates that distinct stereotyped HCDR3 regions of BCR in CLL are characterized by specific cytogenetic and/or molecular profiles and clinical course. Further validation in larger and prospective series of patients may help to better clarify distinct biological and clinical features of specific stereotyped subsets.