RNA Sequencing Reveals LINC00167 as a Potential Diagnosis Biomarker for Primary Osteoarthritis: A Multi-Stage Study

Objectives Given the roles played by lncRNA in human diseases and the high incidence of OA, this study investigated the pivotal pathways involved in the disease and identified potential biomarkers for OA diagnosis. Methods We first performed an exploration of RNA-sequencing in peripheral blood leukocytes from six subjects (3 OA and 3 healthy controls). Promising candidate lncRNAs were evaluated in first stage validation using a GEO dataset (GSE114007) of 38 subjects (20 OA and 18 healthy controls), followed by a second stage validation using quantitative PCR analysis with 101 subjects (67 OA and 34 controls). The third stage investigated the potential value of validated lncRNA in the early diagnosis of OA in peripheral blood leukocytes from a total of 120 participants (60 cases and 60 controls). Results The dataset identified a total of 1,380 up-regulated and 719 down-regulated mRNAs and 5,743 up-regulated and 7,384 down-regulated lncRNAs. The up-regulated DEGs were mainly enriched in the extracellular matrix, while the down-regulated DEGs were mainly enriched in the IL-17 and wnt signaling pathways. 18 overlapping candidate lncRNAs survived after first-stage validation. 3 hub lncRNAs were selected for the second validation stage and qualified in an external sample, and lncRNA LINC00167 was further confirmed with a similar result (down-expressed in both stages). Receiver operating characteristic analysis showed that LINC00167 can distinguish OA cases from healthy controls with a high area under the curve of 0.879 (95%CI: 0.819, 0.938; P < 0.001), with a sensitivity of 80.7% and specificity of 83.5%. Conclusion The expression profile of OA was identified and critical pathways were elucidated by an integrated approach to RNA-seq from easily accessible blood. LINC00167 may serve as a potential early diagnosis marker for OA in clinical practice. The detailed mechanism of action of this lncRNA requires further elucidation in future studies.


INTRODUCTION
Osteoarthritis (OA) is a common and disabling condition characterized by articular cartilage degradation. It creates substantial and increasing burdens for healthcare with notable implications for elderly individuals (Hermann et al., 2018;Piwowar et al., 2018). The etiology of OA is not fully understood, however, comprehensive studies have suggested that OA is a result of complex interactions between environmental factors (including lifestyle factors) and genetics, whose development is primarily a complicated process covering broad mechanisms and a number of molecules (Jayasuriya et al., 2018;Hunt et al., 2019;Maly et al., 2019). Given the irreversibility of its progress, early prevention and precisive diagnosis of OA is of utmost importance (Kijowski et al., 2019;Kloppenburg and Berenbaum, 2020).
Long non-coding RNAs (lncRNAs), which are conventionally defined as being over 200 nucleotides with an absence of coding ability, are emerging as key regulators of gene expression (Jarroux et al., 2017). Several studies have shown that lncRNAs play important roles in malignancy and chronic diseases, including OA (Peffers et al., 2018;Razmara et al., 2019;Wu et al., 2019). In a study by Fu et al. (2015) there were 3,007 upregulated lncRNAs and 1,707 down-regulated lncRNAs in knee OA cartilage compared to normal samples and some of them were predicted to have target genes that were associated with OA. Numerous research has shown that lncRNAs play a critical role in the regulation of chondrocyte and synovial cell survival (Dou et al., 2017;Zheng et al., 2017;Cao et al., 2018), as well as in mediating inflammation (Pearson and Jones, 2016) and angiogenesis (Cen et al., 2017). A recent study reported that NON-HSAG034351 was the hub lncRNA down-regulated in synovial tissue and could play a central role in the pathological progression of OA (Shui et al., 2020).
Large-scale exploration has been conducted by comparing OA cases and healthy controls based on blood samples (Ramos et al., 2014). Compelling studies have confirmed that some specific lncRNAs such as lncRNA-ATB, lncRNA MIR4435-2HG, and lncRNA DILC dysregulated in OA blood samples and regulated chondrocyte cell proliferation and apoptosis (Dang et al., 2018;Huang et al., 2019;Xiao et al., 2019). These studies have demonstrated the importance of lncRNAs in OA blood. However, the profile of lncRNAs and their potential biological functions in blood samples have not been systematically characterized in OA patients (van Spil and Szilagyi, 2019). Importantly, a study has reported that the expression of exosomal lncRNAs in blood showed no significant difference between OA and non-OA, while for synovial fluid samples, the expression of exosomes in early OA and late-stage OA was markedly higher than that in controls (Zhao and Xu, 2018). Considering the tissue specificity of lncRNAs in OA, we explored hub lncRNAs that are abnormally expressed in the synovial tissues and cartilage, and blood.
This study sequenced a complete transcriptome RNA in peripheral blood leukocytes (PBL) of OA and controls (3 cases and 3 controls). A stepwise screening approach was also applied to identify potential hub lncRNAs. It then validated the expression levels of potential hub lncRNAs (67 cases and 34 controls). A further diagnostic test was conducted to investigate the potential ability to OA diagnosis for validated lncRNAs (60 cases and 60 controls). The study aimed to investigate the pivotal pathways involved in OA pathophysiology and to identify potential biomarkers for OA diagnosis in PBL, which could provide a roadmap to pinpoint candidates for future lncRNAbased diagnostics and therapies.

Sample Collection
This study involved three stages. In the RNA-seq screening stage, three OA cases over 40 years old were recruited from the department of orthopedics of Shanghai Zhoupu hospital in March 2018. Three healthy controls were randomly selected from a pool of more than 100 individuals who participated in routine health surveillance in Zhoupu hospital and matched with OA cases according to age and gender. Methods of diagnosing OA have been described in our previous study (Chu et al., 2017). During the second validation stage, 67 OA patients (knee OA: 54; other OA: 13) and 34 OA-free controls were recruited from the Department of Joint Surgery and Sports Medicine in Shanghai Changzheng hospital, between April and December 2019. In the third stage, another 60 OA cases were recruited from the same hospital from April to June 2020 to investigate the potential value of the validated lncRNA in OA diagnosis. Finally, 60 healthy controls were randomly selected from a pool of more than 500 individuals, who participated in routine health surveillance at the same hospital in June 2020. The healthy control subjects were recruited from a survey and had no signs or symptoms of arthritis or joint diseases. Secondary OA patients such as those with inflammatory arthritis, rheumatoid, bone fracture, and developmental dysplasia were excluded.
All participants were Han Chinese, aged 40 over, and were not related to one another. Each subject completed an interview by a trained investigator using a structured questionnaire to collect information regarding demographic characteristics as well as environmental exposure, including recreational sports activities, previous knee injuries, family history of OA and other diseases, and clinical manifestations of OA.
The study was reviewed and approved by the Ethics Committee of Shanghai University of Medicine and Health Sciences. All participants signed written informed consent.

RNA-Sequencing
Five milliliter whole blood was kept at room temperatures for 2 h, followed by centrifugation at 1,500 g for 20 min to obtain white blood cells. A lymphocyte isolation kit was used to purify lymphocytes. Total RNAs were extracted from the PBL of three cases and three healthy controls using TRIzol (Invitrogen, Carlsbad, CA, United States).
The RNA samples were sent to Gminix Biotechnology Co., Ltd. (Shanghai, China) for RNA-seq, only samples with high quality RNA (RNA Integrity Number ≥ 8.0) were used in the subsequent construction of RNA-seq libraries. Sequencing was performed on an Illumina HiSeq 2,500 sequencing platform with 10 M reads (Illumina, San Diego, CA, United States). Hisat2 (Langmead, 2010) was used to compare the differently expressed mRNAs and lncRNAs and featureCounts (Liao et al., 2014) were adopted to annotate and quantify mRNAs and lncRNAs. A principal component analysis (PCA) of normalized data was performed in the present study using R software. PCA was used for unsupervised multivariate analyses to determine the directions of maximum covariance without referring to class labels (OA/Normal). Then, counts in the two groups were analyzed by DESeq2 (Love et al., 2014), and differently expressed genes (DEGs) were screened. mRNAs and lncRNAs with an absolute value fold changes (FC) ≥ 2 and P < 0.05 were considered as significantly differently expressed.

Enrichment Analysis for DEGs
GO is a tool for the unification of biology, which collects structured, defined, and controlled vocabulary for large scale of gene annotation, including biological processes (BP), cellular component (CC), and molecular function (MF) (Kramarz and Lovering, 2019). KEGG database classifies correlating gene sets into their respective pathways. We undertook enrichment analyses for DEGs of the intersection genes across two datasets to explore the potential roles of co-differently DEGs and hub lncRNAs in the current analysis (Du et al., 2014). P < 0.05 for GO enrichment and P < 0.1 and count > 2 for the KEGG pathway were considered for further analysis. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) (Huang et al., 2007), a comprehensive set of functional annotation tools, was employed to relate the functional terms with gene lists by a clustering algorithm.

First-Stage Validation Based on GEO Dataset
The microarray datasets GSE114007 of articular cartilage were downloaded from the Gene Expression Omnibus (GEO) database 1 , which used GPL11154 Illumina HiSeq 2,000 and GPL18573 Illumina NextSeq 500 as platforms, to identify differential expression (Illumina, Inc.). RNA-seq was performed on 18 normal and 20 OA human knee cartilage tissues. Subsequently, the gene expression data were subjected to identical processing using the Robust Multichip Average function within the limma R package (version 3.24.15). The criteria were as follows: mRNAs and lncRNAs with FC ≥ 2, as well as P < 0.05 were considered to be differently expressed. The candidate lncRNAs of co-expression pairs with an absolute value of the Pearson correlation coefficient of ≥ |0.995| were selected as hub lncRNAs.

Second-Stage Validation Using qRT-PCR
To validate the expression level among OA cases and the healthy control group, 101 samples (67 cases and 34 controls) were collected and used for quantification, and the relative expression level in each group was performed by quantitative reverse transcription polymerase chain reaction (qRT-PCR) (Applied Biosystems, Foster City, CA). After the Pearson correlation 1 http://www.ncbi.nlm.nih.gov/geo/ coefficient screening, we randomly selected 3 hub lncRNAs from candidate lncRNAs to perform validation using qRT-PCR, including the metastasis-associated lung adenocarcinoma transcript 1 (MALAT1), long intergenic non-protein coding RNA167 (LINC00167), and the HLA complex group 9 (HCG9). The result for HCG9 was unavailable because of test failure. The primer designs are as follows: LINC00167-142-F: GGTGGCACTGGACTTAGATGAG; LINC00167-142-R: AAAGGGAGCATTCAAGGGACT; MALAT1-256-F: AGTCCAGGAGCCAGTGCG; MALAT1-256-R: TGCCGACCTCACGGATTT; Total RNA was reverse-transcribed to cDNA using Super-ScriptTM III Reverse Transcriptase (Invitrogen). The qRT-PCR was performed using 2 × SYBR Green PCR Master Mix (Arraystar) on an Applied BiosystemsViiA 7 Real-time PCR System. The final reaction volume was 10 µL and contained 5 µL of SYBR Green PCR Master Mix (2 ×), 0.5 µL of the forward and reverse primers (10 µM), 2 µL of cDNA, and 2 µL of double-distilled water. The qRT-PCR reaction conditions were as follows: denaturation at 95 • C for 10 min, followed by 40 cycles of 95 • C for 10 s and 60 • C for 60 s. GAPDH was used as internal control. Analysis of relative expression was calculated using the 2 − Ct method (Ct, the threshold cycle to detect fluorescence) (Fu et al., 2015). All reactions were run in triplicate.

Third-Stage Validation for the Hub lncRNAs
An independent sample was conducted based on the hub lncRNAs. Models were evaluated by a receiver operating characteristic (ROC) analysis using the relative expression for hub lncRNAs and an area under the curve (AUC) criterion.

Statistical Analysis
Body mass index (BMI) was calculated by dividing the weight (kg) by the squared height (m 2 ), and the participants were divided into four categories according to criteria recommended by the Working Group on Obesity in China (Underweight: BMI < 18.5 kg/m 2 ; Normal: 18.5 ≤ BMI < 24; Overweight: 24 ≤ BMI < 28; and General obesity: BMI ≥ 28) (Zhou and Cooperative Meta-Analysis Group of the Working Group on Obesity in China., 2002). lncRNAs with an absolute value fold changes ≥ 2 and P < 0.05 were considered as significantly differently expressed. For GO and pathway analysis, Fisher's exact test was employed to evaluate the significance of GO terms and Pathway identifiers enrichment in the differentially expressed genes. Pearson correlation coefficient was utilized to test the correlation between co-expressed DEG-lncRNA pairs. The qRT-PCR datasets were presented as a scatter diagram with the means. Analysis of variance (ANOVA) was used to evaluate the expression differences of specific lncRNAs. The qualitative data were tested by the chi-square test. A value of P < 0.05 was considered statistically significant. All analyses were performed with SPSS 22.0 (IBM Corp) and GraphPad Prism version 8.0 (GraphPad Software, lnc.).

RESULTS
A schematic diagram of the study design is displayed in Figure 1.
To estimate false-positives, the procedure was performed using randomized labels for the sample. The assessment results were compared with the original classification based on PCA. PCA revealed obvious differences between the normal controls and OA samples. With an adjusted P< 0.05 and | log2FC| > 1 as the cutoff criteria, a total of 1,380 up-regulated and 719 downregulated mRNAs were identified in our sequencing dataset, together with 5,743 up-regulated and 7,384 down-regulated lncRNAs. After the screening of sequencing samples and the GEO database, the two comparison datasets shared 18 overlapping candidate lncRNAs ( Table 1). The characteristics of the screening RNA-seq stage are shown in Table 2.

Enrichment Analysis of DEGs
The differently expressed mRNAs based on screening samples were performed by GO and KEGG enrichment analyses. GO analysis indicated that the up-regulated genes were primarily enriched in extracellular matrix structural constituent, glycosaminoglycan binding, and collagen binding, etc. in BP; proteinaceous extracellular matrix and extracellular matrix component, etc. in CC; extracellular structure organization and extracellular matrix organization, etc. in MF (Figure 2A). Furthermore, the down-regulated genes were mainly enriched in phosphotransferase activity, the nitrogenous group as acceptor, etc. in BP; transcription factor AP-1 complex, etc. in CC; response to muscle stretch, cardiac muscle cell development, and cardiac cell development, etc. in MF (Figure 2B). These pathways possibly participated in regulating the occurrence and progression of OA.
KEGG pathway analysis revealed that up-regulated genes were primarily enriched in pathways associated with protein digestion and absorption, amoebiasis, taste transduction, and ECM-receptor interaction ( Figure 3A). Down-regulated genes were mainly associated with an IL-17 signaling pathway, wnt signaling pathway, and human T-cell leukemia virus 1 infection ( Figure 3B).

qRT-PCR Validation
To validate the expression level among the OA case group and healthy control group, 101 additional samples were collected and used for quantification, and the relative expression level was performed by qRT-PCR. The characteristics of the participants are also shown in Table 2. OA cases were divided into two groups, including knee OA and other OA. Other OA groups included hip site, shoulder site, and ankle site. The relative expression level of LINC00167 was consistent with the GSE114007 microarray data (screening stage) and presented a similar trend of downregulation ( Figure 4A). Nevertheless, the expression of MALAT1 from the GSE114007 data (up-regulated) was inconsistent with the validation result of qRT-PCR (down-regulated) ( Figure 4B). Further pairwise comparisons analysis showed that the difference of LINC00167 and MALAT1 between OA and non-OA was statistically significant (P < 0.01). There was no significant difference between knee OA and other OA groups (P > 0.05).

Evaluation of Hub-lncRNA for OA Diagnosis
To investigate the potential value in OA diagnosis, the ROC curve was calculated using subjects from the third stage validation, including 60 OA cases and 60 healthy controls. In this analysis, the patients were used as true positive samples and the healthy controls were used as true negative samples. As shown in

Association Between the Serum Levels of LINC0067 and Clinical Characteristics of OA Patients
The patients were divided into the high level group and the low level group according to the median of LINC00167 in the serum. As displayed in Table 3, no significant associations were detected between the two groups for age, gender, BMI, and an individual's common habits, including smoking and drinking status. Most importantly, no significant difference was found between the levels of LINC00167 and the course of OA disease (all P > 0.05).

DISCUSSION
Despite concerted efforts made in the treatment and prevention of OA, the incidence of this disease is expected to increase. It is critical to clarify the hub genes and critical signaling pathways and elucidate the pathogenesis of disease onset and progression. RNA-seq screening and independent validation  studies have consistently suggested that the expression of LINC00167 in PBL was strongly lower in OA cases than in healthy controls in a less invasive and more accessible blood sample. Our study provided obvious evidence for the replication of these screening results from RNA-seq with GEO dataset GSE114007, which may considerably enhance the stability of our findings. Taken together, LINC00167 in PBL may be a potential diagnostic biomarker of OA. These findings need to be elucidated through further exploration and verification, but still demonstrate OA pathophysiology, providing an important step in the diagnosis of this condition and showing therapeutic promise.
In the present study, through three stages of screening and validation research, a total of 18 candidate lncRNAs were obtained and 3 hub lncRNAs were selected to confirm the reliability of the results. It was verified that LINC00167 was down-expressed in the PBL of OA and could be detected in the early stages of OA. Our sequencing results show that the up-regulated DEGs were mainly enriched in the extracellular matrix, while the down-regulated DEGs were enriched in the IL-17 signaling pathway and wnt signaling pathway. This suggests that these dysregulated genes and enrichment pathways may play important roles in the occurrence and progression of OA.
In our study, LINC00167 was verified as being differently down-expressed in both human peripheral blood and the articular cartilage of OA patients. This highlights that the down-regulation of LINC00167 is very likely involved in the pathogenesis and may be an important part of the OA process. Another previous study showed that LINC00167 may serve as a biomarker for gastric cancer and LINC00167 might also be associated with gastric cancer through involvement in four pathways, including "cell adhesion molecules, " "cytokine-cytokine receptor interaction, " "leukocyte transendothelial migration, " and "chemokine signaling pathway" which concern 32 genes, and the gene TNFRSF13B was highly associated with LINC00167 (Hu et al., 2019). Importantly, TNFRSF13B is already known for its pivotal role in pain linked to inflammatory regulation, nociceptive signaling, and protein kinase functions, and differs significantly between participants with chronic pain and healthy controls (Polli et al., 2020). Additionally, LINC00167 functioned as a sponge for microRNA miR-203a-3p, restoring the expression of the suppressor of cytokine signaling 3 (SOCS3), which further inhibited the Janus kinase (JAK)/signal transducer and activator of transcription (STAT) signaling pathway . Other studies have shown that enhanced SOCS3 in the human body may limit both proliferation and inflammation (Gui et al., 2017;Kong et al., 2017), demonstrating that LINC00167 may play a protective role in OA. In addition, MALAT1 was identified as up-regulated in the microarray data, but down-regulated in the validation of qRT-PCR. Compelling studies have shown that MALAT1 is involved in some diseases, such as cardiovascular disease, diabetes-related complications, cancer, and metastasis, including OA (Abdulle et al., 2019;Sun and Ma, 2019;Yan et al., 2019;Zhang et al., 2019). MALAT1 was reported to be up-regulated in OA patients and responsible for cell proliferation, apoptosis, and ECM degradation via the miR-150-5p/AKT3 axis . This still requires further verification and should be explored in further studies.
To fully explore the key genes based on the previous analysis strategy, we predicted target mRNAs of the LINC00167 by starbase online software 2 (data are shown in Supplementary  Table 1). One limitation of this study is the lack of verification of the downstream regulatory mechanisms of LINC00167. More investigations are warranted to better elucidate the regulatory network of LINC00167 in patients in future studies.
This study has systematically characterized the profile of mRNAs and lncRNAs expression and their potential biological functions in blood samples between OA patients and controls. Considering the tissue specificity of lncRNAs, we focused on lncRNAs differently expressed in both cartilage (GEO dataset) and peripheral blood (RNA-seq samples) as the hub lncRNAs and further verified in an external population, which make our results more reliable. More importantly, the acquisition of blood samples is feasible as a means of early screening and diagnosis in clinical practice compared with cartilage samples.
There are still some limitations that need to be considered. Firstly, there was a small sample size involved in the RNA-seq. Further expression profiling studies with a larger sample size are still needed to validate these findings. Secondly, the difficulty of knee cartilage availability, especially donors of healthy control cartilage, limited our ability to identify and verify the different expression in OA cartilage, and normal cartilage. Thirdly, we did not collect social/economic information and medication 2 http://starbase.org information from patients owing to accessibility, which might influence the results. Last but not the least, although different skeletal sites (hip, shoulder, and ankle) have not been separately analyzed in the other OA groups because of limited participants, a non-significant difference was detected between the knee OA group and other OA groups. Further studies are warranted to test the veracity and credibility of this difference.
In conclusion, this study discovered that the expression of LINC00167 in OA cases is lower than healthy controls and its expression in serum is a potentially reliable diagnostic marker for osteoarthritis patients. However, the mechanism of action remains to be further elucidated. The expression of MALAT1 in PBL and cartilage in OA patients need to be validated in future research. Further research is warranted, to confirm the underlying biological mechanisms and illuminate feasibility as a biomarker.

DATA AVAILABILITY STATEMENT
This manuscript contains previously unpublished data. The name of the repository is GEO and accession number is GSE163552.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Shanghai University of Medicine and Health Sciences. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
LJ contributed to interpretation of results and made critical revisions. YZ drafted the protocol and JS wrote the final manuscript. YC, ZM, YY, and MC participated in the data collection. QQ, XZ, and SX also made critical revisions. All authors have reviewed the final version of the manuscript and approved it for publication.

ACKNOWLEDGMENTS
We acknowledge all participants and staff in the collection of data.