Single nucleotide polymorphism array‐based signature of low hypodiploidy in acute lymphoblastic leukemia

Abstract Low hypodiploidy (30–39 chromosomes) is one of the most prevalent genetic subtypes among adults with ALL and is associated with a very poor outcome. Low hypodiploid clones can often undergo a chromosomal doubling generating a near‐triploid clone (60–78 chromosomes). When cytogenetic techniques detect a near triploid clone, a diagnostic challenge may ensue in differentiating presumed duplicated low hypodiploidy from good risk high hyperdiploid ALL (51–67 chromosomes). We used single‐nucleotide polymorphism (SNP) arrays to analyze low hypodiploid/near triploid (HoTr) (n = 48) and high hyperdiploid (HeH) (n = 40) cases. In addition to standard analysis, we derived log2 ratios for entire chromosomes enabling us to analyze the cohort using machine‐learning techniques. Low hypodiploid and near triploid cases clustered together and separately from high hyperdiploid samples. Using these approaches, we also identified three cases with 50–60 chromosomes, originally called as HeH, which were, in fact, HoTr and two cases incorrectly called as HoTr. TP53 mutation analysis supported the new classification of all cases tested. Next, we constructed a classification and regression tree model for predicting ploidy status with chromosomes 1, 7, and 14 being the key discriminators. The classifier correctly identified 47/50 (94%) HoTr cases. We validated the classifier using an independent cohort of 44 cases where it correctly called 7/7 (100%) low hypodiploid cases. The results of this study suggest that HoTr is more frequent among older adults with ALL than previously estimated and that SNP array analysis should accompany cytogenetics where possible. The classifier can assist where SNP array patterns are challenging to interpret.


| INTRODUCTION
Acute lymphoblastic leukemia (ALL) is characterized by recurrent chromosomal abnormalities within the leukaemic blasts that are prognostic even in the era of measurable residual disease-adapted treatment protocols. [1][2][3][4] Large non-random ploidy shifts define three distinct primary genetic subtypes of ALL: high hyperdiploidy (51-67 chromosomes), near-haploidy (23-29 chromosomes), and low hypodiploidy (30-39 chromosomes). 5 High hyperdiploidy (HeH) occurs in one-third of childhood cases and is associated with a favorable outcome. 4 In contrast, near-haploidy and low hypodiploidy are rare in childhood ALL (<2% each) and are associated with a very poor outcome. [6][7][8] The frequency of low hypodiploidy increases with age: it occurs in >5% of adult cases and is the second most prevalent chromosomal abnormality (>10%) among older adults (>60 years), whereas near-haploidy is virtually non-existent in adult ALL. 2,[9][10][11] In adults, low hypodiploidy is associated with a very poor outcome even when the patients are treated as high risk. 2,3,10,12 The pattern of chromosomal loss in low hypodiploidy is variable but non-random. Chromosomes 3, 7, 15, 16, and 17 are lost most frequently, while chromosome 21 is always retained. 5 Cases of low hypodiploidy commonly present with a co-existing near-triploid clone with 60-78 chromosomes, 9,12 and the genetic subgroup is therefore termed HoTr hereafter. The pattern of chromosomal loss/gain and the duplication of structurally rearranged chromosomes provide evidence that the two clones are related and that low hypodiploidy is the primary event. 13 The mechanism by which the low hypodiploid clone doubles is thought to be a process of chromosomal endoreduplication without subsequent cytokinesis, thereby creating leukaemic blasts with a near-triploid karyotype of 60-78 chromosomes. Cytogenetic analysis of 115 paediatric HoTr cases from the Children's Oncology Group revealed the duplicated clone to be present in 76 (66%) cases. 7 In some cases, cytogenetic analysis reveals only a near-triploid clone with a pattern of chromosome gain (i.e., frequent tetrasomies and duplicated structural abnormalities) suggestive of a low hypodiploid origin. 7,12 In such cases, distinguishing between HoTr and HeH rests on the modal chromosome number and pattern of chromosome gains, potentially generating a diagnostic dilemma. 5,7,9 A very high proportion (90%) of HoTr cases harbor pathogenic TP53 mutations, which are usually germline in paediatric cases. [14][15][16][17] Although HoTr and near-haploidy share some features (e.g., chromosome loss and clonal doubling), their differing mutational profiles and age distributions indicate that they are distinct subgroups. 5,9,14,18 The rapid and accurate identification of HoTr is crucial in both adult and childhood ALL to assign patients to the optimal therapy.
Historically, cytogenetic and FISH analyses have formed the basis of leukemia genetic testing, but genomic techniques have recently emerged and are used to supplement or replace traditional methods. 19,20 Single-nucleotide polymorphism (SNP) arrays are very useful for detecting large ploidy shifts and loss of heterozygosity (LOH). [19][20][21] LOH is a common finding in neoplastic clones and can be a manifestation of monosomy or multiple copies of the same chromosome. 22,23 The hallmark of HoTr by SNP array is widespread LOH in all chromosomes at the lower copy number state, 13,24 reflecting LOH arising from chromosomal loss. A similar pattern is seen in cases presenting with a near-triploid clone alone, consistent with the prevailing hypothesis that this clone has arisen by endoreduplication. 12 In comparison, HeH ALL typically shows preserved heterozygosity in the majority of chromosomes, with single additional maternal or paternal homologues in most chromosomes at the higher copy number state. 25 LOH can be seen in HeH, but the affected chromosomes have at least the same copy number state as those with preserved heterodisomy, because chromosomal loss has not occurred. 25 Despite the wealth of SNP array data that exists for ALL, few cases of HoTr have been included owing to the bias toward paediatric ALL and the rarity of the subgroup. 19,20 This study combines cytogenetic and SNP array data to highlight the challenge of detecting this clinically relevant subgroup. We report a novel approach to analyzing SNP array patterns from highly aneuploid samples and, in addition, develop and validate a classifier to help distinguish between HoTr and HeH using SNP array patterns when accompanying cytogenetic analysis is not available.

| METHODS
We identified patients and samples from the Leukaemia Research Cytogenetics Group (LRCG) database, as previously described, 26 and from the Northern Genetics Service, Newcastle-upon-Tyne Hospitals NHS Foundation Trust. Patients were enrolled on the UKALL14, UKALL60+, UKALLXII, UKALL2011, or UKALL2003 trials and gave informed written consent for treatment and genetic studies. Cytogenetic and FISH analyses were performed in and reported from regional genetic laboratories across the UK. Karyotypes and surplus material were collected for central review and additional testing. Karyotypes were described according to the International System for Human Cytogenetic Nomenclature (ISCN) and, for consistency and clarity, were always reported relative to the diploid (2n) state. Fixed cells or DNA from pre-treatment diagnostic bone marrow were used for all analyses reported in this study, except where explicitly stated otherwise. SNP arrays were performed using the Illumina CytoSNP 850k (Illumina, San Diego, CA, USA) or Affymetrix Cytoscan HD array (Affymetrix, Santa Clara, CA, USA) in accordance with the manufacturers' protocols. Briefly, oligonucleotide probes were hybridized to regions across the genome, generating log2 ratios of observed to expected probe intensity from internal platform-specific reference datasets, as previously described. 22,24,27,28 Illumina-generated IDAT files were first processed using the

| Creation of whole-chromosome copy number segments
All SNP array analyses were performed using Nexus. Microarray intensities were median-centered, with positive or negative deflections representing relative gains or losses of genetic material, respectively. A standard analysis of SNP array patterns was performed in Nexus by examining log2 ratio and B-allele frequency traces independently of cytogenetics. 21 In isolation, SNP arrays cannot resolve exact copy number states, particularly in samples with mixed clonal populations, as all cellular context is lost. Therefore, each SNP array was assigned a descriptive label of (a) widespread LOH in chromosomes at the lower

| Unsupervised clustering of standardized whole chromosome log2 ratios
To assess whether standardized whole chromosome log2 ratios produced distinct low hypodiploid, near triploid, and high hyperdiploid signatures, unsupervised hierarchical clustering and principal component analysis (PCA) were performed using the R package ComplexHeatmap 30 and the R function prcomp, 31 respectively (code available at https://github.com/tcreasey/ALL_ploidy_classifier.git). The R package FSelector 29 was used to identify the whole chromosome log2 ratios that contributed the most information (information gain) to the separation of the clusters. SNP array findings were then used to resolve any discrepancies between the cytogenetic diagnosis and the clustering analyses, to establish the most plausible ploidy subgroup.
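The notion of information gain used to rank chromosomes can be illustrated with a minimal sketch. It is written in Python rather than the R used for the published analysis, it assumes a single threshold split per chromosome (which is not necessarily how FSelector discretizes continuous values), and the sample values, the names `chr_a` and `chr_b`, and the threshold of 0.0 are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the samples at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Hypothetical toy data: standardized log2 ratios for two chromosomes.
labels = ["HoTr", "HoTr", "HoTr", "HeH", "HeH", "HeH"]
chr_a = [0.9, 0.7, 0.6, -0.2, -0.4, -0.1]  # separates the two groups cleanly
chr_b = [0.3, -0.1, 0.2, 0.25, -0.2, 0.1]  # overlaps between the two groups

gain_a = information_gain(chr_a, labels, threshold=0.0)
gain_b = information_gain(chr_b, labels, threshold=0.0)
```

In this toy example the chromosome whose values separate the two groups yields a larger gain than the one whose values overlap, which is the sense in which a chromosome "contributes information" to the separation of clusters.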

| TP53 sequencing
For additional confirmation where SNP array findings conflicted with cytogenetics, TP53 was sequenced in selected samples. A SureSelect XT2 kit (Agilent, Santa Clara, CA, USA) was used to capture coding regions of genes frequently mutated in ALL (Supplementary Table 1

| External validation of the classifier
The classifier was externally validated using SNP array data from a cohort of 29 childhood ALL samples from the Children's Cancer Research Institute (Vienna, Austria). The cohort comprised near haploidy (n = 8), HoTr (n = 7), HeH (n = 7), ETV6-RUNX1 (n = 2), TCF3-PBX1 (n = 1), KMT2A-AFF1 (n = 1), and B-other ALL (n = 3). SNP arrays were performed and analyzed using the Affymetrix Cytoscan HD array and Chromosome Analysis Suite (ChAS) (Affymetrix, Santa Clara, CA, USA). KN extracted whole chromosome log2 ratios for each chromosome and sent the data blind to TC. TC standardized the data as described above before using the classifier to call each case as HoTr, HeH, or non-ploidy based on standardized chromosomal log2 ratios alone. Results were returned to KN, who un-blinded the data.

| Patient demographics, cytogenetics and SNP array interpretation
Our initial cohort comprised 88 cases identified as HoTr (n = 48) or HeH (n = 40) at initial diagnosis by either cytogenetics/FISH (n = 73) or SNP array (n = 15) by accredited diagnostic cytogenetic laboratories across the UK (Supplementary Table 2). Of those with karyotypes available (n = 57), additional structural chromosomal abnormalities were present in 39% (9/23) of low hypodiploid, 65% (11/17) of near triploid, and 41% (7/17) of high hyperdiploid clones. There were 55 adults and 33 children/adolescents. Although our cohort is selected in favor of HoTr cases, it is noteworthy that patients with HoTr were older than those with HeH within both the adult and paediatric cohorts: mean 54.6 versus 44.6 years (p = 0.004) and 13.9 versus 4.7 years (p < 0.001), respectively, reflecting the disparate age-specific frequencies of the subtypes (Figure 1).    Supplementary Figure 6). The remaining 17 cases had inconclusive SNP array profiles (Figure 3(B)).

| Development and validation of ploidy classifier
FIGURE 1 Patient demographics and cytogenetic characteristics. Patient samples were obtained from patients enrolled in UKALL14 (n = 40), UKALL2011 (n = 11), UKALL60+ (n = 6), UKALLXII (n = 6), and UKALL2003 (n = 3) clinical trials as well as local non-trial cases (n = 22). Number of chromosomes has been divided into 30-39 (low hypodiploidy), 51-59 (high hyperdiploidy), 60-67 (high hyperdiploidy and near triploidy overlap), and 68-78 (near triploidy).
To explore whether whole chromosome log2 ratios could be used to develop a ploidy classifier, we performed a classification and regression tree (CART) analysis with an additional cohort of 72 patient samples spanning genetic subgroups lacking a primary ploidy shift. Prior to running the CART analysis, we re-classified the four confirmed discrepant cases (#26910, #27478, #29491, and #27058) (Table 1) in line with SNP array and TP53 findings. We also re-classified the case with IGH-CRLF2 (#28893) into the non-ploidy subgroup as the underlying primary genetic lesion was clearly distinct from both HoTr and HeH. Thus, the final CART analysis cohort comprised 50 HoTr, 41 HeH (including three with both BCR-ABL1 and HeH), and 69 non-ploidy patients (Supplementary Table 2). A decision tree based on the complete dataset (n = 160) was derived from the CART analysis and identified the log2 ratios of chromosomes 1, 7, and 14 as the key discriminators of the three subgroups (Figure 5). Using these standardized log2 ratios, cases could be delineated into one of four terminal nodes: one each for HoTr and HeH and two for the non-ploidy cases. The majority of HoTr cases (47/50, 94%) were correctly placed into the HoTr group, while three cases were placed into non-ploidy groups. Similarly, the majority of HeH cases (33/41, 80%) were correctly assigned to the HeH node.
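To make the splitting step of a CART analysis concrete, the following minimal Python sketch searches one feature for the threshold that minimizes the weighted Gini impurity of the two resulting groups. It is an illustration of the general technique, not the software or settings used in this study, and the values and labels are hypothetical toy data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # start from the impurity of the unsplit node
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy values for one chromosome's standardized log2 ratio.
values = [0.55, 0.40, 0.35, 0.10, 0.05, -0.20]
labels = ["HoTr", "HoTr", "HoTr", "HeH", "HeH", "non-ploidy"]
threshold, impurity = best_split(values, labels)
```

A full tree repeats this search over every feature and then recurses into each resulting group, which is how a small number of chromosomes can emerge as the key discriminators.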
Importantly, for diagnostic practice, chromosome 1 was a very powerful discriminator between HoTr and HeH ALL, and accurately segregated 97% (88/91) of cases with a ploidy shift. Our data show that if cytogenetic analysis or DNA index identifies a hyperdiploid clone, the standardized log2 ratio of chromosome 1 (above or below 0.28) can reliably discriminate the biologically distinct HoTr and HeH entities (Figure 5). Notably, our dataset included two HeH cases with dup(1q) (#28195 and #M18/968), which is a recognized structural abnormality in HeH ALL. 19 Reassuringly, despite the resulting positive deflection in the standardized log2 ratio of chromosome 1, this remained <0.28, and these cases were therefore not misclassified as HoTr by the decision tree.
FIGURE 4 Unsupervised clustering of cases by standardized whole chromosome log2 ratios. Principal component analysis (A) and unsupervised hierarchical clustering as a heatmap (B) demonstrate clustering of low hypodiploid and near triploid cases separately from high hyperdiploid cases. Information contributed by each chromosome (information gain) displayed as a bar chart underneath (C). Cases within the incorrect cluster based on initial cytogenetic classification are detailed in Table 1.
The classifier was validated using an independent cohort of 29 samples analyzed using the Affymetrix Cytoscan HD platform.
Individual whole chromosome log2 ratios were extracted, standardized and assessed using the classifier with the ploidy status blinded.
The validation cohort included HoTr, HeH, and non-ploidy cases, along with near haploid samples (Supplementary Table 5). The classifier correctly identified 7/7 cases with HoTr (Figure 5). The majority of HeH cases (6/7) and non-ploidy cases (4/5) were also assigned to the correct group. Most of the near haploid cases (5/8) were classified into the HeH group, which is expected given that the discovery cohort did not include this entity and that, in the majority, the duplicated near haploid clone included two copies of chromosome 1 and four copies of chromosome 14. This resulted in standardized log2 ratios <0.28 for chromosome 1 and >0.37 for chromosome 14, and hence an HeH call.
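The two cut-offs quoted above can be written as a short Python sketch. It encodes only what the text states: the chromosome 1 cut-off of 0.28 for samples already known to carry a hyperdiploid clone, and the combination of chromosome 1 <0.28 with chromosome 14 >0.37 that leads to an HeH call. The position and cut-off of the chromosome 7 split are not given in the text and are therefore not reproduced; the function names, the handling of values exactly at a cut-off, and the example values are assumptions made for illustration.

```python
CHR1_CUTOFF = 0.28   # chromosome 1 cut-off quoted in the text
CHR14_CUTOFF = 0.37  # chromosome 14 cut-off quoted in the text


def discriminate_hyperdiploid(chr1):
    """HoTr versus HeH for a sample already known to carry a hyperdiploid clone.

    Assumption: a value exactly equal to the cut-off is treated as HeH;
    the text does not say how ties are handled.
    """
    return "HoTr" if chr1 > CHR1_CUTOFF else "HeH"


def reaches_heh_node(chr1, chr14):
    """True when both conditions quoted for an HeH call are met."""
    return chr1 < CHR1_CUTOFF and chr14 > CHR14_CUTOFF


# Hypothetical standardized log2 ratios, for illustration only.
call_high_chr1 = discriminate_hyperdiploid(0.45)
call_low_chr1 = discriminate_hyperdiploid(0.10)
heh_like = reaches_heh_node(chr1=0.10, chr14=0.50)
```

The inputs must be standardized in the same way as the discovery cohort for the cut-offs to be meaningful; raw log2 ratios cannot be compared against them directly.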

| DISCUSSION
In this study, we present one of the largest SNP array cohorts to date of patients with HoTr ALL. Our observations show that HoTr may present with 50-60 chromosomes (as few as 54 chromosomes in our cohort), approaching the lower limit of the range for HeH. We have identified three cases with <60 chromosomes where the SNP array pattern was indicative of HoTr rather than HeH (Figure 3A and Supplementary Figures 4, 5). Crucially, all three cases harbored a pathogenic TP53 mutation, which is the hallmark of this entity. We acknowledge that we did not show direct cytogenetic evidence of the presence of a low hypodiploid clone and that LOH is also well described in HeH, 25 where it may occur as a result of chromosomal mis-segregation during mitosis. 23 Nonetheless, the LOH observed in these cases was extensive and affected the chromosomes typically lost in low hypodiploidy. Moreover, LOH was consistently seen in chromosomes with the lowest log2 ratios (LOH-LCN), and those with preserved heterozygosity always had higher log2 ratios, suggesting that the chromosomes showing LOH initially became monosomic before duplicating. Interestingly, however, the modal chromosome number in these cases, coupled with the high number of trisomies, does question the prevailing hypothesis regarding the mechanism by which these karyotypes arise. Nonetheless, the presence of a TP53 mutation in these cases supports grouping and treating patients with such clones alongside patients with overt low hypodiploidy.
The samples and cases included in this study were selected on the basis of the availability of DNA and SNP array results, although the age profile of the HoTr group does reflect the underlying epidemiology. Because of this selection, it is not possible to calculate the true proportion or incidence of misclassified cases from this study. However, we know that the frequency of HoTr increases with age, so these findings are particularly relevant in adult ALL and suggest the true frequency of this subgroup is higher than previously estimated. Indeed, we note that all three cases initially misclassified as HeH were adults >40 years old at diagnosis, suggesting HoTr may be even more common than currently appreciated in older patients. In addition, these findings may explain the lack of consensus regarding the prognostic impact of HeH in adults. 36 We used a novel approach to analyze SNP array patterns by deriving whole chromosome log2 ratios for each chromosome and,  HeH ALL samples (Figure 5 and Supplementary Figures 10 and 11) and is the most discriminatory predictor to differentiate these two ploidy subgroups. In the absence of cytogenetics, log2 ratios of key chromosomes (1, 7, and 14) offer valuable information to resolve the genetic ploidy subgroup of a sample, even when visual interpretation of the SNP array is inconclusive. Current SNP array analysis software (e.g., Nexus or Affymetrix Chromosome Analysis Suite) can be used to derive whole chromosome log2 ratios, which can then be standardized as described, to support accurate genetic risk stratification in diagnostic genetic laboratories (Supplementary Figure 12).
The classifier is relatively simple to use and, given the prognostic importance of HoTr, should be used whenever the results of a SNP array are ambiguous.
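For laboratories whose software exports segment-level rather than whole chromosome values, one possible way to obtain a single log2 ratio per chromosome is a length-weighted mean of the segment log2 ratios, sketched below in Python. This is an assumption made for illustration and is not necessarily how Nexus or ChAS compute whole chromosome values; the tuple layout, function name, and example segments are hypothetical, and the standardization step is not reproduced here.

```python
from collections import defaultdict


def whole_chromosome_log2(segments):
    """Length-weighted mean log2 ratio per chromosome.

    `segments` is an iterable of (chromosome, start, end, log2_ratio) tuples,
    each with end > start.
    """
    weighted_sum = defaultdict(float)
    total_length = defaultdict(int)
    for chrom, start, end, log2_ratio in segments:
        length = end - start
        weighted_sum[chrom] += length * log2_ratio
        total_length[chrom] += length
    return {c: weighted_sum[c] / total_length[c] for c in weighted_sum}


# Hypothetical segments (coordinates and values are illustrative only).
segments = [
    ("1", 0, 100, 0.30),
    ("1", 100, 200, 0.50),
    ("7", 0, 150, -0.40),
]
ratios = whole_chromosome_log2(segments)
```

Weighting by segment length keeps short focal gains or losses from dominating the value for an entire chromosome.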
This study highlights the challenges in diagnosing this enigmatic genetic subtype. Ideally, SNP array profiling should be applied to all diagnostic patient samples. However, where this is not possible, the presence of a hyperdiploid clone, and particularly the presence of trisomy 1, should prompt further investigation by SNP profiling and/or TP53 mutation testing. In addition, we have developed and validated a novel ploidy classifier to assist SNP array interpretation, particularly in situations where the pattern is ambiguous. This novel approach is applicable to other cancers where large ploidy shifts define prognostically important subtypes, for example, multiple myeloma. 39 As the majority of ALL treatment protocols assign patients with HoTr to high-risk therapy, the accurate detection of this subgroup should be considered standard-of-care for all patients with ALL.

ACKNOWLEDGMENTS
We would like to thank all the patients who took part in this trial as well as their families.

CONFLICT OF INTEREST
The authors declare no potential conflict of interest.