MicroRNA Profiling of Fresh Lung Adenocarcinoma and Adjacent Normal Tissues from Ten Korean Patients Using miRNA-Seq

Abstract: MicroRNA transcriptomes from fresh tumors and adjacent normal tissues were profiled in 10 Korean patients diagnosed with lung adenocarcinoma using a next-generation sequencing (NGS) technique called miRNA-seq. The sequencing quality was assessed using FastQC, and low-quality or adapter-contaminated portions of the reads were removed using Trim Galore. Quality-assured reads were analyzed using miRDeep2 and Bowtie. The abundance of known miRNAs was estimated using the reads per million (RPM) normalization method. Subsequently, using DESeq2 and Wx, we identified differentially expressed miRNAs and potential miRNA biomarkers, respectively, for lung adenocarcinoma tissues compared to adjacent normal tissues. We defined reliable miRNA biomarkers for lung adenocarcinoma as those detected by both methods. The miRNA-seq data are available in the Gene Expression Omnibus (GEO) database under accession number GSE196633, and all processed data can be accessed via the Mendeley data website.


Summary
MicroRNAs (miRNAs) are small regulatory non-coding RNAs (ncRNAs) of approximately 22 nucleotides in length [1]. They play crucial roles in various cellular processes by acting as post-transcriptional gene regulators. Indeed, miRNAs primarily repress the expression of target mRNAs through base pairing between their seed regions and complementary sites in the target mRNAs [2]. Despite the profound significance of miRNAs in gene regulation, only a limited number of studies have employed high-throughput screening techniques, such as miRNA-seq, to profile miRNAs in both tumor tissues and matched normal tissues of lung adenocarcinoma patients [3–5]. Recently, a study specifically conducted miRNA profiling in Korean patients diagnosed with lung adenocarcinoma and revealed distinct subgroups within this population [3,6]. However, none of these studies has utilized deep learning techniques, which have the potential to provide superior results.
In this study, we aimed to identify novel miRNA biomarkers for lung adenocarcinoma by profiling the miRNA transcriptomes in fresh lung adenocarcinoma and adjacent normal tissues from 10 Korean patients. In contrast to previous studies, we employed two different algorithms, DESeq2 and Wx (a deep learning-based biomarker identification algorithm), to identify miRNA biomarkers. Furthermore, we validated the identified miRNA biomarkers by comparing them with previously reported miRNA transcriptomes from additional Korean lung adenocarcinoma patients [3,6]. This comprehensive list of potential miRNA biomarkers can provide valuable insights into miRNA-driven gene regulation in lung adenocarcinoma and serve as a foundation for further investigation into their roles in disease onset and progression. The miRNA-seq data generated in this study are available in the Gene Expression Omnibus (GEO) database under accession number GSE196633, and all processed data can be accessed via the Mendeley data website (https://data.mendeley.com/datasets/vp977psjcb/2, accessed on 3 March 2023).

Quality Assessment of miRNA-Seq Data
To identify potential miRNA biomarkers for lung adenocarcinoma, we profiled miRNA transcriptomes from fresh lung adenocarcinoma and adjacent normal tissues collected from 10 Korean patients using miRNA-seq. The baseline clinicopathological characteristics of the patients are described in Table 1 and Table S1. The sequencing quality of the samples, including the number of sequenced reads (single-end), is summarized in Table 2. We estimated the abundance of all known miRNAs using miRDeep2 [7] (Table S2) and plotted all samples in a three-dimensional principal component analysis (PCA) plot based on their miRNA expression levels (Figure 1).

Identification of Potential miRNA Biomarkers for Lung Adenocarcinoma
Differentially expressed miRNAs were identified using DESeq2 with an adjusted p-value cutoff of 0.05 [8]. Subsequently, miRNAs exhibiting less than a two-fold change between lung adenocarcinoma and adjacent normal tissue samples were excluded (Figure 2A and Table S3). A total of 224 miRNAs (135 upregulated and 89 downregulated) were identified (Figure 2B). Next, potential biomarkers for lung adenocarcinoma were also identified with a deep learning-based biomarker identification algorithm called Wx [9] (Figure 2A and Table S4). As in the above scheme, miRNAs with a Wx score of zero or less than a two-fold change between the groups were removed. A total of 762 miRNAs (452 upregulated and 310 downregulated) were detected (Figure 2B). Given the relatively small number of samples (n = 10), we also reanalyzed the miRNA-seq data from a previous study comprising 48 Korean patients diagnosed with lung adenocarcinoma [3,6]. Using the DESeq2 approach, a total of 571 miRNAs (412 upregulated and 159 downregulated) were identified in that dataset (Figure 2B and Table S3). The characteristics of these patients are described in Table S1.
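
The two filtering criteria described above (adjusted p-value cutoff and minimum fold change) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name filter_de, the input layout, and the toy values are hypothetical, and the sketch assumes that the cutoff is applied as a strict "adjusted p-value below 0.05" and that miRNAs with exactly a two-fold change are retained.

```python
import math


def filter_de(results, padj_cutoff=0.05, min_fold=2.0):
    """Split differential-expression results into up- and downregulated lists.

    results: iterable of (mirna_name, log2_fold_change, adjusted_p_value);
             the adjusted p-value may be None when none was reported.
    Assumption: tumor vs. normal, so a positive log2 fold change means
    higher expression in tumor tissue.
    """
    min_log2 = math.log2(min_fold)  # a two-fold change equals 1 on the log2 scale
    up, down = [], []
    for name, log2fc, padj in results:
        # Drop entries that are not significant after multiple-testing correction.
        if padj is None or padj >= padj_cutoff:
            continue
        if log2fc >= min_log2:
            up.append(name)
        elif log2fc <= -min_log2:
            down.append(name)
        # Anything else changed by less than two-fold and is excluded.
    return up, down


# Toy values for illustration only; they are not taken from the study.
toy_results = [
    ("miR-A", 2.3, 0.001),   # significant, upregulated
    ("miR-B", -1.5, 0.02),   # significant, downregulated
    ("miR-C", 0.4, 0.0001),  # significant, but below two-fold
    ("miR-D", 3.0, 0.20),    # large change, but not significant
    ("miR-E", -2.0, None),   # no adjusted p-value available
]
up, down = filter_de(toy_results)
print("upregulated:", up)
print("downregulated:", down)
```

In practice, the input tuples would come from a DESeq2 results table exported to a file; the column names and export step are left out here because the text does not specify them.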
To identify reliable miRNA biomarkers, we retrieved the 145 miRNAs (94 upregulated and 51 downregulated) that were detected in common by the above DESeq2 and Wx approaches (Figure 2B and Table S5). Table 3 shows the statistics of the top 10 potential miRNA biomarkers (five upregulated and five downregulated) that can be used to distinguish lung adenocarcinoma from normal tissues.
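
Selecting the miRNAs shared between the two approaches amounts to a set intersection. The following Python sketch shows one way to do this; the function name consensus_biomarkers and the toy calls are hypothetical, and requiring the same direction of change in both lists is an assumption made here for illustration, since the text does not state how direction was handled.

```python
def consensus_biomarkers(deseq2_calls, wx_calls):
    """Return miRNAs reported by both methods with the same direction.

    deseq2_calls, wx_calls: dict mapping miRNA name -> "up" or "down".
    Assumption: a miRNA counts as shared only if both methods agree on
    the direction of change.
    """
    shared = {}
    for name, direction in deseq2_calls.items():
        if wx_calls.get(name) == direction:
            shared[name] = direction
    return shared


# Toy calls for illustration only; they are not taken from the study.
deseq2_calls = {"miR-A": "up", "miR-B": "down", "miR-C": "up"}
wx_calls = {"miR-A": "up", "miR-B": "down", "miR-D": "down"}

shared = consensus_biomarkers(deseq2_calls, wx_calls)
n_up = sum(1 for d in shared.values() if d == "up")
n_down = sum(1 for d in shared.values() if d == "down")
print(sorted(shared), n_up, n_down)
```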

miRNA Extraction
This study included patients with untreated, primary, and non-metastatic lung tumors who underwent lung lobe resection with curative intent and provided informed consent. After surgical resection, paired tumor and normal tissues were isolated and promptly transported to the research facility. The tumor and normal tissues were macroscopically examined to determine tumor positioning. Tissue samples consisting of more than 60% tumor were selected. Ten paired normal and cancer samples from lung adenocarcinoma patients were placed in RNAlater solution (Thermo Scientific, Cat. #AM7020, Waltham, MA, USA) at 4 °C within a few minutes of collection, and left overnight to ensure RNA stability. For further analysis, samples were stored at −20 °C after removing the RNAlater solution.

miRNA Sequencing (miRNA-Seq)
The RNA integrity and quantity were measured using the Agilent Bioanalyzer 2100. Approximately 1 µg of total RNA was used to prepare a small RNA library, using the TruSeq Small RNA Library Prep Kit (Illumina, San Diego, CA, USA), in accordance with the manufacturer's instructions. The libraries were quantified using KAPA Library Quantification kits for Illumina sequencing platforms, in accordance with the qPCR quantification protocol guide (KAPA BIOSYSTEMS, #KK4854, Wilmington, MA, USA). Then, the samples were sequenced (single-end; 51 bp) using the Illumina HiSeq 2500 system (LC Sciences, Houston, TX, USA) from Macrogen Inc. (Seoul, Republic of Korea).

miRNA-Seq Data Analysis
Sequenced reads were trimmed for sequencing quality and/or adapter contamination using Cutadapt [10] with the following parameters: --overlap=6 -f fastq -a TGGAATTCTCGGGTGCCAAGG -m 18 -M 26. The sequencing quality of the trimmed reads was checked using FastQC [11]. Trimmed reads were aligned to the reference human genome using the mapper function (mapper.pl; with parameters -e, -h, -j, -m, and -s) in miRDeep2 [7] in conjunction with Bowtie [12]. Expression levels of all known miRNAs were estimated using the miRDeep2 quantifier function (quantifier.pl; with parameters -t hsa, -g 2, -e 2, and -f 5). A three-dimensional PCA plot was generated using the 581 miRNAs that exhibited expression values greater than 1 read per million (RPM), on average, across all samples. Differentially expressed miRNAs between lung adenocarcinoma and adjacent normal tissues were identified using DESeq2 [8]. Potential miRNA biomarkers were also identified using a deep learning-based biomarker algorithm called Wx [9].
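
The RPM normalization and the "mean RPM greater than 1" filter used to select miRNAs for the PCA plot can be sketched in Python as follows. This is an illustrative sketch, not the miRDeep2 implementation: the function names and toy counts are hypothetical, and the denominator is assumed to be the total number of miRNA-assigned reads in each sample.

```python
def to_rpm(counts):
    """Convert raw read counts for one sample to reads per million (RPM).

    counts: dict mapping miRNA name -> raw read count.
    Assumption: the denominator is the sum of all miRNA-assigned reads
    in the sample.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no miRNA-assigned reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def expressed_mirnas(samples, min_mean_rpm=1.0):
    """Return miRNAs whose mean RPM across all samples exceeds the threshold.

    samples: dict mapping sample name -> dict of raw counts per miRNA.
    A miRNA missing from a sample is treated as having zero reads.
    """
    rpm = {sample: to_rpm(counts) for sample, counts in samples.items()}
    names = set()
    for values in rpm.values():
        names.update(values)
    kept = []
    for name in names:
        mean_rpm = sum(values.get(name, 0.0) for values in rpm.values()) / len(rpm)
        if mean_rpm > min_mean_rpm:
            kept.append(name)
    return sorted(kept)


# Toy counts for illustration only; they are not taken from the study.
toy_samples = {
    "tumor_1": {"miR-A": 900_000, "miR-B": 100_000, "miR-C": 0},
    "normal_1": {"miR-A": 1_999_999, "miR-B": 0, "miR-C": 1},
}
kept = expressed_mirnas(toy_samples)
print(kept)
```

In this toy example, miR-C averages 0.25 RPM across the two samples and is therefore dropped, while the other two miRNAs pass the filter.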