Molecular characterization of genomic breakpoints of ALK rearrangements in non‐small cell lung cancer

ALK rearrangement is called the 'diamond mutation' in non-small cell lung cancer (NSCLC). Accurately identifying patients who are candidates for ALK inhibitors is a key step in making clinical treatment decisions. In this study, a total of 783 ALK rearrangement-positive NSCLC cases were identified by DNA-based next-generation sequencing (NGS), including 731 patients with EML4-ALK and 52 patients with other ALK rearrangements. Diverse genomic breakpoints of ALK rearrangements were identified. Approximately 94.4% (739/783) of the cases carried ALK rearrangements with genomic breakpoints in the introns of ALK and its partner genes, and 2.8% (21/739) of these cases were predicted to result in frameshift transcripts of ALK. Meanwhile, 5.6% (44/783) of the ALK rearrangement-positive cases had breakpoints in the exons that would be expected to result in abnormal transcripts. RNA-based NGS was performed to analyse the aberrant fusions at the transcript level. Some of these rearranged DNAs were not transcribed, whereas in others the reading frame was restored at the transcript level (for example, by nucleotide insertion, exon skipping or alternative splicing), so that the fusion kinase proteins could be expressed. Altogether, these findings emphasize that, when DNA-based NGS detects an uncommon or frameshift rearrangement, the functional RNA fusion should be confirmed by RNA-based assays.

Introduction
Lung cancer is a major malignancy that threatens human life and health worldwide, with a high incidence and mortality in both male and female patients [1]. Non-small cell lung cancer (NSCLC) is the main type of lung cancer, accounting for approximately 85% of cases, and it includes lung squamous cell carcinoma, lung adenocarcinoma and large cell lung carcinoma [2]. NSCLC is a molecularly heterogeneous disease whose occurrence can be driven by multiple genetic alterations [3]. Approximately 3-7% of NSCLC patients harbour anaplastic lymphoma kinase gene (ALK) rearrangements [4,5]. The wild-type ALK gene encodes a transmembrane protein that is a classic receptor tyrosine kinase located on the cell membrane [6][7][8]. When the tyrosine kinase domain (exon 20 to exon 28) of ALK is retained in ALK-containing fusion proteins, it results in oncogenic tyrosine kinases capable of driving oncogenesis through several downstream signalling pathways, including the RAS/MEK/ERK, PI3K and JAK/STAT pathways [6,9]. Tyrosine kinase inhibitors (TKIs) represent a major milestone in the treatment of ALK rearrangement-positive NSCLC patients, playing a crucial role in combating these oncogenic alterations [10][11][12].
Multiple methods have been developed to detect gene rearrangements/fusions in various clinical diagnostic settings [13,14]. Assays based on DNA-based next-generation sequencing (NGS) have been applied frequently in recent years, and many types of ALK rearrangements have been identified by DNA-based NGS [15]. The most common partner gene for ALK is echinoderm microtubule-associated protein-like 4 (EML4) [16], and other noncanonical partner genes have been identified, such as kinesin family member 5B (KIF5B), kinesin light chain 1 (KLC1) and translocated promoter region (TPR) [17]. Previous studies have reported diverse genomic breakpoints of ALK rearrangements that occur in different regions (introns or exons) in NSCLC, and intronic breakpoint fusions usually result in in-frame chimeric fusion transcripts/proteins [15,18]. Multiple ALK-fusion variants caused by variable genomic breakpoints have been reported with different sensitivities to ALK TKIs, especially in canonical EML4-ALK fusions [19].
In theory, a fusion variant is potentially pathogenic only if the kinase domain component remains in frame in the transcript [20]. However, transcripts predicted from the coding sequence of the DNA may be imprecise for some rearrangement types. The potential unreliability of genomic breakpoints identified by DNA-based NGS in predicting fusion transcripts has been proposed [15]. Therefore, the validation of ALK rearrangements detected at the DNA level, especially the uncommon genomic breakpoints of rearranged genes, needs to be constantly supplemented [21].
In this study, we retrospectively analysed the DNA molecular characteristics of ALK rearrangements in a local NSCLC database, and ALK rearrangements with noncanonical partner genes and uncommon genomic breakpoints were identified. To explore the actual transcripts of these rearrangements, which may result in abnormal transcripts, an RNA-based NGS assay was performed. This study aimed to effectively and accurately determine the actual fusion status of the ALK gene in the context of specific ALK rearrangements.

Patients and samples
From February 2018 to November 2021, a total of 783 lung cancer patient samples (718 tissues and 65 plasma fractions) were collected from the Affiliated Hospital of Qingdao University, the Zhejiang Provincial People's Hospital and the Second Hospital of Shandong University, and these cases were detected as ALK rearrangement-positive by DNA-based NGS. In this study, ALK rearrangements retaining the 3′ ALK kinase domain were included and divided into canonical (EML4-ALK) and noncanonical (other partner genes-ALK) types. The patients' clinical characteristics were collected from their medical records and analysed. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013), and it was approved by the ethics committee of Zhejiang Provincial People's Hospital (No. QT2022218). The experiments were undertaken with the understanding and written consent of each subject.

DNA sample extraction and library construction
The sequencing methods have been described in earlier papers [22] [24], somatic insertions and deletions were retrieved using STRELKA (https://github.com/Illumina/strelka) [25], and structural variations were determined using GENEFUSE version 0.6.1 (https://github.com/OpenGene/GeneFuse) [26]. Variants with a population frequency over 0.1%, based on guidelines by the Exome Aggregation Consortium, were filtered out. The remaining variants were annotated with Oncotator and VEP.
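The population-frequency filter described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the `Variant` record, its field names and the example values are hypothetical, and only the 0.1% threshold is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_POPULATION_FREQ = 0.001  # 0.1%, the threshold stated in the text


@dataclass
class Variant:
    # Hypothetical minimal record; a real pipeline would carry many more fields.
    gene: str
    change: str
    population_freq: Optional[float]  # None if absent from the population database


def filter_rare(variants: List[Variant]) -> List[Variant]:
    """Keep variants whose population frequency does not exceed the threshold.

    Variants absent from the population database (None) are kept, which is an
    assumption of this sketch.
    """
    return [
        v for v in variants
        if v.population_freq is None or v.population_freq <= MAX_POPULATION_FREQ
    ]


# Made-up example values, for illustration only.
example = [
    Variant("GENE_A", "variant_1", 0.25),
    Variant("GENE_B", "variant_2", 0.0004),
    Variant("GENE_C", "variant_3", None),
]
kept = filter_rare(example)
print([v.gene for v in kept])
```

Only the common variant (`GENE_A` in this made-up input) is dropped; the two rare or unobserved variants pass on to annotation.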

RNA-based NGS
A Fusioncapture panel (Genetron Health, Beijing, China), which is a 395-gene RNA panel, was used to identify gene fusions at the transcript level. Total RNA was isolated using the AllPrep DNA/RNA Mini Kit (Qiagen) and then reverse transcribed to cDNA using SuperScript III Reverse Transcriptase (Thermo Fisher Scientific, Waltham, MA, USA). The libraries were constructed with the KAPA HTP Library Preparation Kit (KAPA Biosystems) and subjected to Illumina HiSeq X-Ten for paired-end sequencing. Sequencing reads were mapped to a human reference genome (hg19) using HISAT2-2.0.5 (Johns Hopkins University School of Medicine, Baltimore, MD, USA). Gene fusions were identified using FUSIONMAP [27].

Immunohistochemistry
The immunohistochemistry (IHC) assay has been described in earlier studies [14,28]. The IHC for ALK protein expression was performed on FFPE sections using a VENTANA ALK (Clone D5F3) CDx Kit and BenchMark ULTRA immunostainer (Ventana Medical Systems, Inc., Tucson, AZ, USA; Cell Signaling Technology, Danvers, MA, USA) according to the manufacturer's instructions. The presence of granular cytoplasmic staining in the tumour cells (any percentage of positive tumour cells) was considered positive for ALK, while the absence of granular cytoplasmic staining in the tumour cells was considered negative for ALK.

Statistical analyses
The clinical characteristics of the study population were statistically analysed by the chi-square test and Student's t-test. A P value < 0.05 indicated statistical significance. Analyses and the data presentation were undertaken using IBM SPSS STATISTICS 26.0 (IBM, Armonk, NY, USA) and GRAPHPAD PRISM 8.0.1 (GraphPad, La Jolla, CA, USA). The rearrangements and fusions were illustrated using Integrative Genomics Viewer, IGV 2.11.4 (Broad Institute, Cambridge, MA, USA).
In order for the ALK-related fusion to be pathogenic, the ALK components have to remain in frame within the structure of the detected transcripts (ALK components that are out of frame would not be expected to be oncogenic because of the loss of the kinase domain). We applied this logic to assess the DNA-based NGS data and predicted the chimeric transcripts of these fusion patterns. 'intron to intron' EML4-ALK and 16.3% (7/43) of the 'intron to intron' noncanonical ALK rearrangements were predicted to be frameshifts with respect to the 3′ gene ALK (Fig. 1A,B). This prediction was based on the termination codon that appeared early due to the frameshift of the fusion transcript (Fig. 1B). Thus, we separated the ALK rearrangements into three categories: in-frame, frameshift and exon breakpoints ('exon to intron', 'intron to exon' or 'exon to exon') (Table S1).
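The three-way categorization above can be illustrated with a small sketch. It is not the authors' prediction pipeline: the `Rearrangement` record, its field names and the numbers in the usage lines are hypothetical, and the rule assumes that an 'intron to intron' rearrangement is spliced by joining whole exons. Only the three category names come from the text.

```python
from dataclasses import dataclass


@dataclass
class Rearrangement:
    # All fields are hypothetical inputs for this sketch.
    five_prime_region: str   # "intron" or "exon": location of the partner-gene breakpoint
    three_prime_region: str  # "intron" or "exon": location of the ALK breakpoint
    upstream_cds_len: int    # coding nucleotides contributed by the 5' partner
    alk_start_phase: int     # nucleotides (0-2) of a split codon that precede
                             # the first retained ALK exon in wild-type ALK


def classify(r: Rearrangement) -> str:
    """Assign one of the three categories used in the text."""
    if "exon" in (r.five_prime_region, r.three_prime_region):
        # Conventional splice signals cannot be relied on for these cases.
        return "exon breakpoint"
    # Intron to intron: whole exons are assumed to be joined, so the frame is
    # kept only if the partner leaves exactly the partial codon that the
    # first retained ALK exon expects.
    if r.upstream_cds_len % 3 == r.alk_start_phase:
        return "in-frame"
    return "frameshift"


# Made-up lengths and phases, for illustration only.
print(classify(Rearrangement("intron", "intron", 300, 0)))
print(classify(Rearrangement("intron", "intron", 301, 0)))
print(classify(Rearrangement("exon", "intron", 300, 0)))
```

With these made-up inputs the three calls fall into the in-frame, frameshift and exon breakpoint categories, respectively. A predicted frameshift is only a prediction at the DNA level, which is why the transcript still has to be checked by an RNA-based assay.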

Validation of the ALK frameshift rearrangement pattern by RNA-based NGS
RNA-based NGS was performed on nine stored samples as a frameshift cohort. Two of the FFPE samples could not be tested because their low-quality RNA failed the RNA quality control (QC) process. We ultimately analysed seven qualified samples from available tissue, including four cases (#P2106140203, #P2011280013, #P2008100038 and #P1902170006) with EML4-ALK and three cases (#P2007070051, #P2005010014 and #L-2018-00005429) with noncanonical ALK rearrangements (Table 4; Table S2). Furthermore, ALK-IHC was performed on several available samples as a supplementary validation of the NGS results, although several mechanisms, such as ALK fusions, amplification and alternative transcription initiation of ALK, can drive the overexpression of ALK and result in a positive IHC result [29,30]. In two cases (#P2008100038 and #P1902170006), the EML4-ALK fusions were negative at the transcript level despite positivity in DNA-based NGS, and these results were confirmed by IGV (Fig. S1). We assumed that the rearranged genetic material of these two cases was not transcribed, and the ALK-IHC results of case #P2008100038 showed that ALK protein expression was negative (Fig. S2C).
The other two EML4-ALK-positive cases (#P2106140203 and #P2011280013) were positive for a fusion in both the RNA-based and DNA-based NGS assays. The predicted transcript of case #P2106140203 was exon 17 of EML4 fused to exon 19 of ALK and it would not have been in frame (Fig. 3A). However, the actual transcript detected by the RNA-based NGS assay of case #P2106140203 did not match the predicted transcript. The IGV showed a novel variant composed of a sequence derived from ALK intron 19 (42 adjacent nucleotides, 5′-CCAGGCTGCCAGGCCATGTTGCAGCTGACCACCCACCTGCAG-3′) and a sequence derived from EML4 intron 17 (26 nonadjacent nucleotides, 5′-GAGACAAAAACATGAAGTCAATTTTC-3′) inserted between exon 17 of EML4 and exon 20 of ALK (E17ins26; ins42A20) (Fig. 3B). In addition, the fusion type of case #P2011280013 was intron 19 of EML4 fused to intron 19 of ALK at the genomic level, and the predicted transcript was not in frame (Fig. 3C). However, a novel variant (E19ins1; A20) with a single nucleotide inserted between exon 19 of EML4 and exon 20 of ALK was detected by RNA-based NGS and it did not match the predicted transcript (Fig. 3D). The RNA NGS results of #P2106140203 and #P2011280013 suggested that these fusion types did not follow the conventional splicing signal at the exon-intron boundary but instead formed novel fusion variants. We performed IHC in the case (#P2106140203) with some remaining tissue and verified ALK protein expression positivity (Fig. S2D).
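As a worked check of the variants described above, the sketch below counts the inserted nucleotides and the resulting shift of the downstream reading frame. The two sequences are those reported for case #P2106140203; the helper `frame_offset` is a hypothetical name introduced here. Because the exon phases at the direct junctions are not given in this section, the sketch only quantifies the shift contributed by the insertions and does not by itself prove that the transcripts are in frame.

```python
# Sequences of the two inserted segments reported for case #P2106140203.
ALK_INTRON19_INSERT = "CCAGGCTGCCAGGCCATGTTGCAGCTGACCACCCACCTGCAG"
EML4_INTRON17_INSERT = "GAGACAAAAACATGAAGTCAATTTTC"


def frame_offset(inserted_nt: int) -> int:
    """Positions (0, 1 or 2) by which an insertion shifts the downstream frame."""
    return inserted_nt % 3


total = len(EML4_INTRON17_INSERT) + len(ALK_INTRON19_INSERT)
print(len(EML4_INTRON17_INSERT), len(ALK_INTRON19_INSERT), total)  # 26 42 68
print(frame_offset(total))  # 2 (E17ins26; ins42A20)
print(frame_offset(1))      # 1 (single-nucleotide insertion, E19ins1; A20)
```

Neither insertion is a multiple of three, so each one changes the downstream reading frame. This is consistent with the observation that the directly joined exons were predicted to be out of frame while the transcripts carrying the insertions were expressed.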

Validation of the ALK rearrangement patterns with genomic breakpoints at the exons
In contrast to conventional genomic breakpoints of gene fusions, which are located in introns, some of the breakpoints that led to gene fusions in this study occurred in exons. To explore the actual transcripts produced by the ALK rearrangements with genomic breakpoints located in the exons of ALK or its partner genes, we sequenced the fusion transcripts of 13 available samples by RNA-based NGS. Nine of the 13 samples passed the RNA QC metric, including six cases (#P2006160041, #P2107150201, #P1911070042, #P2010230040, #P2009100057 and #P2008280124) with EML4-ALK and three cases (#P2009120038, #P2004270003 and #P2003200071) with noncanonical ALK rearrangements (Table 5; Table S2).

Discussion
In this study, a total of 731 NSCLC cases with canonical EML4-ALK rearrangements and 52 NSCLC cases with noncanonical ALK rearrangements were identified. Among them, complex genomic breakpoints of ALK rearrangements were detected in the exons or introns of ALK and its partner genes. For rearrangements whose genomic breakpoints are located in exons, their transcripts cannot be inferred from conventional splicing signals. There are also some rearrangements that result in a frameshift transcript that cannot be translated into a fusion protein containing the amino acid sequence of ALK. Therefore, the actual transcripts of these ALK rearrangement types were verified by RNA-based NGS.
Frameshift of the fusion gene caused by chromosomal rearrangement is uncommon, especially in common oncogenic driver fusions [31]. In this study, some canonical and noncanonical ALK rearrangement-positive cases were predicted to be frameshifts based on DNA-based NGS data. For the canonical ALK rearrangements, the actual transcripts were negative in two cases (#P2008100038 and #P1902170006) and positive in the other two cases (#P2106140203 and #P2011280013). To our knowledge, reports on ALK fusion frameshifts are rare, and only one case has been reported in detail. In that case, CMTR1-ALK (intron 2 : intron 19) was determined to be positive by DNA-based NGS, yet the patient did not respond to crizotinib treatment, and the expression of the ALK protein was negative by IHC [32]. Presumably, the two cases in this study with genomic-positive and transcript-negative EML4-ALK rearrangements would also not show a clinical response to ALK-targeted inhibitors. In addition, the insertion of diverse nucleotide sequences between the nearest fusion exons (#P2106140203 and #P2011280013) prevents frameshifts and maintains the functional transcription of EML4-ALK fusions, which is similar to the EML4-ALK variants reported in previous studies [33][34][35][36][37][38][39]. Alternative splicing caused by translocation can explain how nucleotide sequences are inserted or deleted so that the transcript remains compatible with the codon reading frame (a multiple of 3) and a functional protein is produced. In contrast, transcripts of the canonical EML4-ALK fusions were detected in frameshift cases with noncanonical ALK rearrangements (#P2007070051, #P2005010014 and #L-2018-00005429), which are associated with a complex mechanism of chromothripsis, resulting in posttranscriptional removal of other gene sequences that joined between ALK and EML4 [15,40]. Similarly, the transformation of PRR23C to KIF5B in case #P2009120038 may also be related to chromothripsis.
Therefore, the results of complex genomic rearrangement events detected by DNA-based NGS may inaccurately reflect clinically actionable fusions [40]. Although our results showed that predicted frameshift transcripts of ALK rearrangements are rare, further verification of these samples by RNA or protein assays is necessary to accurately identify, at the molecular level, patients who are candidates for targeted drug treatments. Most genomic breakpoints of the rearranged gene occur in intronic sequences rather than in coding sequences [41]. In this study, 5.6% of the rearrangement breakpoints were located in exonic regions of ALK or its partner genes, and transcripts predicted for them according to conventional splicing principles may be inaccurate or out of frame.
Comparing the results from DNA-based NGS and RNA-based NGS, we found that exon skipping existed in 'exon breakpoints' cases carrying canonical or noncanonical ALK fusions. It may be reasoned that the lack of classical 3′ or 5′ acceptor splice sites in the 'exon-intron', 'intron-exon' or 'exon-exon' structures resulted in the removal of the broken exon together with the previous intron to restore the reading frame [18,42]. Notably, although lacking the 5′ acceptor splice site of ALK exon 20, the actual transcript of case #P2008280124 excluded some nucleotides and retained a portion of exon 20 through an alternative splicing signal at the RNA level, rather than skipping the exon, and this resulted in two different variants (E19; del14A20 and E19; del20A20). Further comparative analysis of the transcripts and the amino acid sequences showed that partial retention of exon 20 could ensure an in-frame sequence with an intact ALK kinase domain [43,44]. The presence of multiple EML4-ALK variants in a patient has been associated with a poor prognosis owing to the high heterogeneity in the tumour tissue [45]. Therefore, RNA-based NGS showed an advantage in detecting fusion patterns in which multiple variants coexist, and it gave more precise splicing results at the transcript level.
However, this study has some limitations. Due to its retrospective nature, only a small number of tissue samples were available and met the quality control necessary for RNA sequencing, so only a few samples were verified by RNA sequencing. For some patients with advanced lung cancer, tissue samples could not be obtained, so blood samples were taken for DNA sequencing. Furthermore, the response of some patients with these uncommon ALK rearrangements to ALK inhibitors is unknown, and large-scale validation of the relationship between uncommon ALK rearrangements and treatments is necessary. In addition, the DNA NGS panel covers the intronic regions where ALK rearrangements frequently occur, so it may miss some rare intronic breakpoints. In the future, we will conduct more comprehensive clinical trials to explore the clinical benefit of ALK inhibitor therapy in patients with these specific ALK rearrangements.

Conclusions
In conclusion, by systematically analysing the DNA-based NGS data of ALK rearrangements in lung cancer patients, we identified variable and uncommon genomic breakpoints of ALK and its 5′ partner genes. We further examined these rearrangements by RNA-based NGS and found that the fusions at the transcript level did not always match those predicted from the genomic breakpoints; furthermore, we found that some of the fusions identified at the DNA level may be false positives. The ALK fusion results at the transcript level were better able to explain their functional significance. Therefore, the identification of ALK fusion status in NSCLC patients may need to use orthogonal assays based on multiomics for fusion detection to achieve an accurate molecular diagnosis and ensure the reliability of the targeted drug use indicators.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Fig. S1. Integrative Genomics Viewer (IGV) screenshot of the EML4-ALK fusions (#P2008100038 and #P1902170006) detected by NGS. Fig. S2. IHC analysis of tissues from lung cancer patients. Fig. S3. Integrative Genomics Viewer (IGV) screenshot of noncanonical ALK rearrangements/fusions detected by NGS (DNA-based and RNA-based). Fig. S4. Integrative Genomics Viewer (IGV) screenshot of EML4-ALK rearrangements/fusions (exon breakpoints) detected by NGS (DNA-based and RNA-based). Table S1. DNA-based NGS data of uncommon ALK rearrangements. Table S2. RNA-based NGS data of uncommon ALK fusions.