Deep Learning Approaches for lncRNA-Mediated Mechanisms: A Comprehensive Review of Recent Developments

This review examines the rapidly evolving convergence of deep learning and long non-coding RNAs (lncRNAs). Deep learning has advanced quickly, and lncRNAs are increasingly recognized as key components of many biological processes, so a systematic look at how the two fields intersect is timely. We survey research published from 2021 to 2023 and describe how deep learning techniques are being used to unravel the intricate roles of lncRNAs. The review is aimed at researchers and practitioners looking to integrate deep learning advancements into their lncRNA studies.


Introduction
In the evolving landscape of machine learning, deep learning has transformed both our understanding and our application of technology, enabling breakthroughs in a wide range of fields [1][2][3][4]. It has given rise to innovative methods such as image generation [5][6][7] and natural language processing [8][9][10][11], each expanding the reach of deep learning. Deep learning has also found applications in molecular biology, particularly in the study of long non-coding RNAs (lncRNAs) [12][13][14].
Our focus in this review is on lncRNAs: RNA transcripts longer than 200 nucleotides that lack protein-coding potential and are transcribed and processed mostly from intergenic regions, introns (with or without some exons), or enhancer regions of the genome [12]. Despite lacking coding potential, lncRNAs have emerged as significant contributors to numerous biological processes. Their intricate and diverse functions have drawn increasing attention from the scientific community, not least because their aberrations have been implicated in various diseases. The fast pace of lncRNA research, in parallel with rapid advances in deep learning, calls for an overview of the intersection of these two areas.
As signal molecules, lncRNAs play a crucial role in the transcription of downstream genes, often exhibiting a high degree of context specificity [15][16][17]. Recent studies have shown lncRNAs to be highly organized in their transcription processes, responding and adapting to different environmental stimuli to influence particular signaling pathways [18][19][20]. This detailed orchestration of transcription by lncRNAs, often in association with specific proteins, such as transcription factors, underscores the integral role they play in transcriptional regulation.
As decoy molecules, lncRNAs interfere with a variety of molecular pathways. After transcription, they bind directly to specific proteins and disrupt their normal function [21,22]. When the bound proteins are transcriptional regulators, this interaction inhibits transcription factor activity and thereby suppresses downstream gene transcription. lncRNAs can likewise impede the ability of proteins to regulate mRNA expression. Beyond these mechanisms, lncRNAs play vital roles in tumor progression, regulating gene expression at the epigenetic, transcriptional, and post-transcriptional levels [23,24]. Some lncRNAs influence gene expression by altering chromatin structure, histone modification status, and DNA methylation status [25,26].
Deep learning now significantly influences a wide range of scientific fields. Its rapid evolution, combined with the growing importance of lncRNAs, makes a critical look at the intersection of the two necessary. We therefore analyze how deep learning is employed in the study of lncRNAs, with the aim of offering fresh insights into this fast-growing area.
This review focuses on the most recent advances, specifically those published from 2021 to 2023. Because deep learning continues to evolve rapidly, staying current with the latest research is essential, and we expect such an overview to be useful to researchers and practitioners seeking to integrate deep learning advances into their lncRNA studies.

Paper Selection Process
The primary aim of the paper selection process was to include high-quality, relevant research within the domain. To achieve this, we used a systematic search built around the academic search engine Web of Science (WOS). Carefully selected keywords centered on core themes, such as deep learning, lncRNA, and neural networks, were used to identify pertinent articles for review.
Although preprints and conference papers exist on this topic, we concentrated solely on peer-reviewed journal articles, for two principal reasons. First, peer review maintains the quality and reliability of the scientific literature by subjecting research to thorough scrutiny by domain experts. Second, peer-reviewed journals are traditionally regarded as trustworthy and credible venues for scientifically sound and influential research. Together, these factors strengthen the reliability and validity of the review.
To keep the focus on original contributions, review articles and perspective pieces were intentionally excluded, so that the review concentrates on primary research studies, in line with its purpose.
The temporal scope was restricted to articles published within the last three years, from 2021 to 2023, to keep the review current and to capture the latest developments and trends in deep learning for lncRNA research. Data collection for 2023 was carried out until May, so that the review reflects the most recent advancements available at the time. The selected studies are summarized in Table 1.
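The inclusion criteria above can be expressed as a simple filter over exported search records. This is a minimal sketch only: the `Record` fields, the keyword rule, and the sample titles are hypothetical stand-ins, not the actual Web of Science export format or the exact query used for this review.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical bibliographic record; field names are illustrative only."""
    title: str
    doc_type: str                # e.g. "article", "review", "perspective"
    year: int
    month: int
    peer_reviewed_journal: bool  # False for preprints and conference papers


EXCLUDED_TYPES = {"review", "perspective"}


def matches_keywords(text: str) -> bool:
    """Hypothetical keyword rule: lncRNA plus a deep learning term."""
    text = text.lower()
    return "lncrna" in text and ("deep learning" in text or "neural network" in text)


def passes_criteria(rec: Record) -> bool:
    """Apply the inclusion criteria described in the text."""
    if not matches_keywords(rec.title):
        return False
    if not rec.peer_reviewed_journal:           # peer-reviewed journal articles only
        return False
    if rec.doc_type.lower() in EXCLUDED_TYPES:  # exclude reviews and perspectives
        return False
    if not 2021 <= rec.year <= 2023:            # 2021 to 2023 window
        return False
    if rec.year == 2023 and rec.month > 5:      # 2023 collection ran until May
        return False
    return True


def select(records):
    return [r for r in records if passes_criteria(r)]


records = [
    Record("Deep learning for lncRNA localization", "article", 2022, 1, True),
    Record("Deep learning in lncRNA biology", "review", 2022, 3, True),
    Record("Neural network lncRNA model", "article", 2023, 7, True),
    Record("Deep learning lncRNA preprint", "article", 2021, 2, False),
    Record("Deep learning lncRNA classifier", "article", 2020, 5, True),
]
print([r.title for r in select(records)])  # ['Deep learning for lncRNA localization']
```

In practice the keyword matching would usually happen inside the database query itself; it is repeated here only to keep the sketch self-contained.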

Research Topics | Deep Learning Approaches
Prediction of lncRNA subcellular localization | DeepLncLoc, a deep learning framework for lncRNA subcellular localization using subsequence embedding [90]; EVlncRNA-Dpred, an improved prediction method of experimentally validated lncRNAs using deep learning [91]; GM-lncLoc, lncRNA subcellular localization prediction based on graph neural network with meta-learning [92]; GraphLncLoc, predicting lncRNA subcellular localization using graph convolutional networks and sequence-to-graph transformation [93]; PlncRNA-HDeep, a plant long non-coding RNA prediction method that utilizes hybrid deep learning with two encoding styles [94].
Comparing the deep learning methods used in lncRNA research before 2021 with the developments from 2021 to 2023 helps place recent work in context, and several representative articles illustrate this earlier work. For example, Baek et al. [102] developed lncRNAnet, a deep learning approach for identifying lncRNAs that uses recurrent neural networks to model RNA sequences and convolutional neural networks to detect stop codons and thereby obtain an open reading frame (ORF) indicator. Fan et al. [103] constructed lncRNA-MFDL, a predictor that identifies lncRNAs by fusing multiple features (the ORF, k-mers, the secondary structure, and the most likely coding domain sequence) with deep learning classification algorithms. In 2019, Liu et al. [104] developed a deep learning model comprising a bidirectional long short-term memory (BiLSTM) layer and a convolutional layer, along with three additional hidden layers, to distinguish lncRNAs from mRNAs. These methods marked significant advances in the field. In addition, the detailed review by Shaath et al. [105] discusses the interplay between lncRNAs and RNA-binding proteins (RBPs) and their involvement in epigenetic regulation via histone modifications, highlighting the potential of RNA-based therapeutics in cancer treatment. The surge of research during 2021 to 2023 builds on these foundational methods and is expected to keep advancing both our understanding of lncRNAs and the use of deep learning to study them.
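Two of the hand-crafted sequence features mentioned above, k-mer frequencies and an ORF-based indicator, are easy to compute directly. The sketch below is illustrative only: it is not the lncRNA-MFDL or lncRNAnet implementation (lncRNAnet learns its ORF indicator with a CNN rather than scanning for codons), and the choice of k = 3 and of a forward-strand, ATG-to-stop ORF definition are assumptions.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k=3):
    """Normalized frequencies of all 4**k k-mers (U is read as T)."""
    seq = seq.upper().replace("U", "T")
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:          # skip k-mers containing N or other symbols
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def longest_orf_length(seq):
    """Length (nt) of the longest ATG...stop ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)   # include the stop codon
                start = None
    return best


rna = "GCAUGAAAUAGCC"
features = list(kmer_frequencies(rna).values()) + [longest_orf_length(rna)]
print(len(features), longest_orf_length(rna))  # 65 9
```

A vector like `features` could then be passed to any classifier; real predictors combine many more feature types.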
During data collection, we recorded the citation count and publication log of each selected article. These details helped us evaluate the scope, impact, and acceptance of the research within the scientific community.
To give a structured overview of the deep learning methods used in lncRNA research, we categorized the selected papers according to the specific objectives of the studies, which are summarized in Table 1. This categorization highlights the diversity of methods employed across the field.

Brief Analysis of Deep Learning Approaches in lncRNA Research
The application of deep learning to the study of lncRNAs has surged in recent years, as reflected by the range of research areas in Table 1. This section gives a brief overview of the progress made in this emerging field and of why a review of it is timely.
Among the categories that have emerged, one of the most extensively researched is lncRNA-disease association. Multiple studies have harnessed the predictive capabilities of various deep learning architectures to infer relationships between lncRNAs and diseases [29,34,48]. The large number of studies in this area reflects the potential implications of these associations for clinical diagnostics and therapeutics.
Another critical domain in lncRNA research pertains to lncRNA-protein interactions. In an attempt to uncover the myriad roles that lncRNAs play in molecular biology, numerous researchers have applied deep learning models to predict potential interactions between lncRNAs and proteins [62,70]. By illuminating these interactions, we can elucidate the complex regulatory networks that underpin biological systems.
Similarly, the prediction of lncRNA-miRNA interactions has also gained substantial interest. These investigations capitalize on the ability of deep learning models to discern patterns and predict interactions [72,75], providing vital insights into the modulation of gene expression by lncRNAs.
A further significant facet of lncRNA research that has reaped the benefits of deep learning involves the identification and prediction of characteristics inherent to lncRNAs [88,100]. By leveraging the ability of deep learning models to learn complex representations from data, researchers have made inroads in understanding the fundamental properties that define lncRNAs.
An interesting development has been the application of deep learning in predicting lncRNA subcellular localization. This growing area of study is vital in comprehending the functional mechanisms of lncRNAs since the location of a lncRNA within a cell often indicates its potential role and function.
The use of deep learning to investigate the role of lncRNAs in immune responses is less explored, yet equally important. Although such studies are currently few, they point to an area ripe for further exploration, given its significant implications for immunology and disease pathology. Identifying lncRNA-protein-coding gene (PCG) associations is another emerging area of lncRNA research to which deep learning has been applied.
The diverse applications of deep learning in lncRNA research signal an exciting convergence of computational and biological sciences. The rapid development and broad applicability of deep learning models have substantially enriched our understanding of lncRNAs, which makes continual assessment of the state of the art necessary. By surveying recent studies and emerging trends, this review aims to serve as a resource for researchers exploring the intersection of deep learning and lncRNA research.

Distributive Analysis of Publications Across Various Journals
Examining where the selected articles were published reveals a clear pattern: they appear in a wide range of journals, which underscores the interdisciplinary nature of the topic, deep learning for lncRNA research.
Measured by the number of published papers, the leading journal is Briefings in Bioinformatics, which published 21 of the 75 papers evaluated, or 28% of the corpus.
BMC Bioinformatics follows with 13 papers, approximately 17% of the selection, and IEEE/ACM Transactions on Computational Biology and Bioinformatics published seven of the papers in our selection.
This distribution suggests that certain journals are particularly receptive to research on deep learning applied to lncRNAs. Figure 1 shows how the papers are distributed across journals.
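The percentages quoted above follow directly from the paper counts. The short calculation below reproduces them; its only inputs are the counts stated in the text (21, 13, and 7 of 75 papers).

```python
TOTAL_PAPERS = 75
counts = {
    "Briefings in Bioinformatics": 21,
    "BMC Bioinformatics": 13,
    "IEEE/ACM Transactions on Computational Biology and Bioinformatics": 7,
}


def shares(counts, total):
    """Percentage of the corpus published in each journal, to one decimal place."""
    return {journal: round(100 * n / total, 1) for journal, n in counts.items()}


for journal, pct in shares(counts, TOTAL_PAPERS).items():
    print(f"{journal}: {pct}%")
# Briefings in Bioinformatics: 28.0%
# BMC Bioinformatics: 17.3%
# IEEE/ACM Transactions on Computational Biology and Bioinformatics: 9.3%
```

Together, the three journals account for 41 of the 75 papers (about 55%), with the remaining 34 spread across other venues.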

Deep Learning Approaches in the Prediction of lncRNA-Disease Associations
LncRNAs have been increasingly recognized as critical components in numerous biological functions and disease processes [106]. The application of deep learning to elucidate lncRNA-disease associations provides an avenue for understanding the complex interactions of lncRNAs with other biological components, including genes and proteins [13]. The richness and complexity of these interactions necessitate computational models that can handle high-dimensional, context-dependent data, a challenge ideally suited to deep learning methodologies.
Notably, trends in lncRNA-disease association research are veering toward the integration of diverse biological data into predictive models. The incorporation of lncRNA sequence data, gene expression data, protein interaction data, and disease phenotype data into deep learning models is enriching the understanding of lncRNA-disease associations [107].
Additionally, attention mechanisms have enhanced these models' ability to focus on the most informative parts of the input data, which is particularly valuable given the context-specific nature of certain lncRNAs and disease phenotypes. The field is also seeing increased adoption of unsupervised and semi-supervised learning techniques that exploit abundant unlabeled data during training. Recent studies are summarized in Table 2.

Table 2. Summary of recent studies regarding the prediction of lncRNA-disease association. Each study used different datasets, cross-validation methods, and simulation settings to assess accuracy, so direct comparisons may be inconclusive. Where a model was assessed on multiple datasets, the best reported accuracy is listed.

Ref. | Methods | Accuracy | Advantages | Disadvantages
[45] | LGDLDA, non-linear feature learning of neural networks and node representation approximation | AUC: 0.926 | Better stability and performance in cancer-related lncRNA prediction, utilized diverse data | Complexity due to diverse data integration
[46] | LR-GNN, GNN based on link representation for predicting molecular associations; GCN encoder for node embedding and layer-wise fusing rule for the output | AUC: 0.9474; AUPR: 0.9497 | Outperforms state-of-the-art methods in molecular association predictions, versatile in different association types | May need an optimal layer-fusing rule design for performance
[47] | MAGCNSE, multi-view attention graph convolutional network and stacking ensemble model | | |
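The attention mechanisms mentioned above typically assign a weight to each input element (for example, each neighbor of a node or each position in a sequence) and combine the elements by a weighted sum. The NumPy sketch below shows a generic attention-pooling form with a single scoring vector; `w` stands in for a learned parameter, and none of the models in Table 2 is claimed to use exactly this form.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_pool(features, w):
    """Score each row of `features` with `w`, normalize, and return the weighted sum.

    features: (n_items, d) array, e.g. neighbor or position embeddings
    w:        (d,) scoring vector (hypothetical; learned in a real model)
    """
    weights = softmax(features @ w)   # one attention weight per item, sums to 1
    pooled = weights @ features       # (d,) context vector
    return pooled, weights


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))
w = rng.normal(size=8)
pooled, weights = attention_pool(features, w)
print(weights.round(3), pooled.shape)
```

Items with higher scores dominate the pooled vector, which is how such models "focus" on the most informative inputs.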

Recent Advances from 2021 to 2023
The field of lncRNA-disease association prediction has seen an influx of models, with a strong recent shift toward deep learning. Through effective non-linear data processing, deep learning can extract complex features and dependencies from lncRNA-disease association data. Typical models use architectures such as deep belief networks (DBNs), convolutional neural networks (CNNs), and attention mechanisms, and report improved performance on metrics including the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPR), and accuracy. Graph-based methods and multi-omics data integration are often employed because they can capture complex lncRNA-disease associations and offer broad insight into the role of lncRNAs in diseases. However, the choice of prediction model depends largely on the requirements of each study, and each option carries distinct merits and challenges.
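Because AUC and AUPR are the main metrics reported throughout this section, a small reference implementation may help when reading the tables. The sketch below computes AUC as the probability that a randomly chosen positive pair is scored above a randomly chosen negative one (ties count one half) and estimates AUPR by average precision. Average precision is one common AUPR estimator; trapezoidal integration of the precision-recall curve can give slightly different values, and individual studies may use either.

```python
def roc_auc(labels, scores):
    """AUC = P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR estimate: mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank   # precision at this recall step
    return ap / n_pos


labels = [1, 0, 1, 0]              # 1 = known association, 0 = negative sample
scores = [0.9, 0.8, 0.7, 0.1]      # model outputs
print(roc_auc(labels, scores))            # 0.75
print(average_precision(labels, scores))  # approximately 0.833
```

For larger evaluations, a tested library implementation is preferable to this quadratic-time version.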
In several studies, autoencoders and CNNs were used to improve prediction performance and explore potential disease associations [27,28,38,42,43,44,45,53]. Despite high accuracy, a shared drawback of these studies is the complexity introduced by autoencoders and CNNs: the models require careful parameter tuning and have difficulty integrating information from diverse data types or topologies. Some studies also noted problems with noisy data; for example, the method in [42] could not effectively remove noisy and irrelevant information.
A second group of studies used graph neural networks (GNNs) to predict lncRNA-disease associations [35,36,38,39,40,41,46,47,50]. These studies showed superior predictive performance and could handle multi-view data and efficiently fuse node features, topological structures, and semantic information. However, they are sensitive to the choice of dataset and can be complex because of their multiple attention mechanisms and multi-layer graph convolutional networks.
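A basic graph convolution layer of the kind that GNN-based predictors build on propagates node features over a normalized adjacency matrix, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The NumPy sketch below applies one such layer to a hypothetical bipartite lncRNA-disease graph and scores pairs with an inner-product decoder. The weights are untrained, the toy associations are invented, and the decoder is an assumption, so the printed scores only illustrate the data flow, not any cited model.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, h, w):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(adj_norm @ h @ w, 0.0)


# Hypothetical known associations: 3 lncRNAs (rows) x 2 diseases (columns).
R = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)
n_lnc, n_dis = R.shape
adj = np.block([[np.zeros((n_lnc, n_lnc)), R],
                [R.T, np.zeros((n_dis, n_dis))]])   # bipartite graph

rng = np.random.default_rng(0)
features = np.eye(n_lnc + n_dis)                     # one-hot node features
W = rng.normal(scale=0.5, size=(n_lnc + n_dis, 4))   # untrained weights

h = gcn_layer(normalized_adjacency(adj), features, W)
lnc_emb, dis_emb = h[:n_lnc], h[n_lnc:]
scores = 1.0 / (1.0 + np.exp(-lnc_emb @ dis_emb.T))  # inner-product decoder
print(scores.round(3))
```

In a real model, W would be trained so that known associations receive high scores, and several layers would usually be stacked.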
The final group of studies used matrix factorization or other machine learning techniques, such as support vector machines (SVMs) and extreme learning machines (ELMs) [33,34,37,49,51,52,54]. These methods handled complex relationships and diverse features and achieved high accuracy, but they were not without challenges: some required optimal parameter selection, and there were concerns about the quality of the negative samples or graph-based information they used.
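Matrix factorization approaches treat the known lncRNA-disease associations as a partially observed matrix and approximate it by the product of two low-rank factor matrices; large reconstructed values for unobserved pairs are read as candidate associations. The sketch below uses plain gradient descent with L2 regularization on a hypothetical 4 x 3 matrix. Like many such methods, it treats unknown pairs as zeros, which is one reason the quality of negative samples matters. It is a generic illustration, not the method of any cited study.

```python
import numpy as np


def factorize(R, rank=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Approximate R (lncRNAs x diseases) as U @ V.T by gradient descent.

    Minimizes 0.5 * ||R - U V^T||_F^2 + 0.5 * reg * (||U||_F^2 + ||V||_F^2).
    Unobserved pairs are stored as 0, i.e. treated as negatives.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(epochs):
        E = R - U @ V.T                 # residuals on every entry
        grad_U = -E @ V + reg * U
        grad_V = -E.T @ U + reg * V     # computed before U is updated
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V


# Hypothetical association matrix: 1 = known association, 0 = unknown.
R = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [1, 0, 1]], dtype=float)
U, V = factorize(R)
scores = U @ V.T    # higher score = more likely association
print(scores.round(2))
```

The rank, learning rate, and regularization weight are exactly the kind of parameters that, as noted above, often need careful selection.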
While all of these studies show promising results in predicting lncRNA-disease associations, each approach has its own advantages and areas for improvement, which future studies should take into account when developing prediction models for lncRNA-disease associations.

Emerging Research Trends in Recent Studies
A major shift in recent studies is the use of diverse network architectures, such as DBNs, CNNs, and attention mechanisms. The aim is to exploit the non-linear processing capacity of deep learning to extract complex features and capture the intricate dependencies in lncRNA-disease association data. Models using these techniques have consistently reported superior performance metrics.
Another trend gaining momentum is the utilization of graph-based methodologies, with notable contributions by Guo et al. [43], Zeng et al. [33], and Zhao et al. [39]. The inherent strength of these methods lies in their capability to effectively leverage the graph structure of lncRNA-disease associations, thereby capturing the complex interrelationships between lncRNAs and diseases with a higher degree of accuracy. This improved representation of data facilitates more accurate and robust prediction models.
Lastly, the incorporation of multi-omics data into lncRNA-disease association prediction, as exemplified by Yuan et al. [45], is a promising direction: integrating multiple data types broadens our understanding of the roles of lncRNAs in diseases and can substantially improve prediction quality. These trends come with trade-offs, however. Their complexity raises computational costs and demands large, high-quality datasets, so their benefits and limitations must be balanced carefully to realize their potential in real-world applications.

Deep Learning Approaches in the Prediction of lncRNA-Protein Interactions
The prediction of lncRNA-protein interactions, a cornerstone in understanding lncRNA functionality, has seen a surge in the use of deep learning models [108]. Predominantly, these models integrate biological features to discern complex patterns. Ensemble learning and hybrid frameworks are often the preferred choices due to their robustness and resilience to overfitting. Moreover, novel methods for quantifying lncRNA gene essentiality are gaining prominence, allowing researchers to further investigate the implications of these interactions in gene functionality. Lastly, the incorporation of cutting-edge techniques, such as serial fusion, capsule networks, and graph autoencoders, signifies the continued evolution of prediction models in lncRNA-protein interaction prediction.
LncRNA-protein interaction (LPI) prediction often starts with sequence-based models, which transform the primary sequences of lncRNAs and proteins into feature vectors. Deep learning has shown great potential here, because models can automatically extract meaningful features from raw sequence data, including the amino acid sequences of proteins. Because such models require inputs of the same shape, proteins of different lengths are handled by zero-padding, in which zeros are appended to each sequence up to a common length [109]; this step is essential when raw amino acid sequences are used as input. To address the complexity of three-dimensional protein structures, various methods have been proposed, including those that integrate sequence and structure features of the lncRNA and protein and those that apply machine learning algorithms to extract features from sequences [110]. Despite the complexity of plant genome structures, these techniques could be instrumental in predicting lncRNA-protein interactions in plants.
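The zero-padding step described above can be sketched as follows. The sketch assumes one-hot encoding over the 20 standard amino acids and truncation of sequences longer than the chosen length; both are illustrative choices rather than the encoding of any particular study, and non-standard residues such as 'X' are simply left as zero rows.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_pad(seqs, max_len=None):
    """One-hot encode protein sequences and zero-pad them to a common length.

    Returns an array of shape (n_seqs, max_len, 20). Padded positions and
    non-standard residues are all-zero rows; longer sequences are truncated
    (an illustrative assumption).
    """
    if max_len is None:
        max_len = max(len(s) for s in seqs)
    out = np.zeros((len(seqs), max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for i, seq in enumerate(seqs):
        for j, aa in enumerate(seq[:max_len]):
            k = AA_INDEX.get(aa.upper())
            if k is not None:
                out[i, j, k] = 1.0
    return out


batch = one_hot_pad(["MKV", "MA"])
print(batch.shape)   # (2, 3, 20)
```

The resulting fixed-shape tensor is what allows proteins of different lengths to be batched into a CNN or RNN.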
Recent studies are summarized in Table 3.

Table 3. Summary of recent studies regarding the prediction of lncRNA-protein interaction. Each study used different datasets, cross-validation methods, and simulation settings to assess accuracy, so direct comparisons may be inconclusive. Where a model was assessed on multiple datasets, the best reported accuracy is listed.

Recent Advances from 2021 to 2023
Recent studies fall roughly into three groups. The first comprises methods that apply GNNs [58,63], which predicted lncRNA-protein interactions with high AUC and AUPR scores. For instance, BiHo-GNN, a GNN-based bipartite graph-embedding method, reported an AUC of 0.950 and an AUPR of 0.899 [58]. iEssLnc, another GNN-based method, applied meta-path-guided random walks on the lncRNA-protein interaction network and attained an AUC of 0.912 and an AUPR of 0.921 [63]. Despite their clear advantages in AUC and AUPR, these methods have limitations; iEssLnc, for example, is specialized for essential lncRNA genes and does not generalize to all lncRNA-protein interactions.
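Meta-path-guided random walks, as used by iEssLnc [63], constrain each step to follow a prescribed sequence of node types. The sketch below samples such walks on a toy lncRNA-protein network under a symmetric lncRNA-protein-lncRNA meta-path. The network, node names, and walk length are hypothetical, and in practice the walks would typically feed a skip-gram style embedding model, which is not shown. It is not the iEssLnc implementation.

```python
import random


def metapath_walk(neighbors, node_type, start, metapath, walk_length, rng):
    """Random walk whose node types follow a symmetric meta-path, e.g. L-P-L."""
    if node_type[start] != metapath[0]:
        raise ValueError("start node must match the first meta-path type")
    period = len(metapath) - 1   # first and last types coincide, so the path repeats
    walk = [start]
    for step in range(1, walk_length):
        wanted = metapath[step % period]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == wanted]
        if not candidates:       # dead end: stop the walk early
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical lncRNA-protein interaction network.
neighbors = {
    "L1": ["P1"], "L2": ["P1", "P2"], "L3": ["P2"],
    "P1": ["L1", "L2"], "P2": ["L2", "L3"],
}
node_type = {n: "lncRNA" if n.startswith("L") else "protein" for n in neighbors}
META = ["lncRNA", "protein", "lncRNA"]

walk = metapath_walk(neighbors, node_type, "L1", META, 5, random.Random(42))
print(walk)
```

Sampling many such walks from every lncRNA node yields the "sentences" from which node embeddings are then learned.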
In the second group, we see models that have integrated multiple features of lncRNAs and proteins to predict interactions, such as capsule-LPI [59], EnANNDeep [62], LGFC-CNN [64], and LPI-CSFFR [65]. These models demonstrated superior performance by integrating multimodal features or combining raw sequence composition, hand-designed, and structure features. For instance, the capsule-LPI's multimodal features and multichannel capsule network framework achieved an AUC of 0.951 and AUPR of 0.932, while LGFC-CNN delivered an exceptional AUC of 0.976 and AUPR of 0.970 [59,64]. However, the main limitation of these methods lies in their increased complexity, especially when fusing diverse features. Additionally, some methods, such as capsule-LPI, lack detailed evaluations for each feature [59].
The third group comprises models that use deep learning frameworks with different architectures or hybrid designs, including DeepLPI [60], DFRPI [61], LPI-deepGBDT [66], LPI-DLDN [67], LPI-HyADBS [68], and RLF-LPI [71]. These models achieved high performance through their distinct designs. For instance, DeepLPI incorporates interactions between lncRNAs and protein isoforms using a hybrid framework of deep neural networks, while LPI-DLDN uses a deep learning framework with a dual-net neural architecture. However, these advanced models often require complex feature integration or feature dimension reduction.

Emerging Research Trends in Recent Studies
The analysis of the recent studies on lncRNA-protein interactions, as summarized in Table 3, reveals several notable trends. Firstly, the integration of deep learning models and various biological features stands out as a common and effective strategy in the field. The studies of Zhou et al. [70], Peng et al. [62,67], and Huang et al. [64] are characteristic of this approach. Their contributions attest to the powerful role deep learning algorithms play in discerning complex patterns, especially when combined with biological features that enhance the models' understanding of lncRNA-protein interactions.
Secondly, ensemble learning and hybrid frameworks are increasingly used, as in the works of Zhou et al. [68] and Song et al. [71]. By combining the strengths of diverse learning models, these studies make the prediction of lncRNA-protein interactions more robust and less prone to overfitting.
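Ensemble outputs can be combined in many ways; a common and simple one is soft voting, in which the interaction probabilities predicted by several base models are averaged, optionally with weights. The sketch below assumes each base model has already produced probabilities for the same lncRNA-protein pairs; it is a generic illustration, not the combination rule used in [68] or [71].

```python
import numpy as np


def soft_vote(prob_matrix, weights=None):
    """Weighted average of base-model probabilities.

    prob_matrix: (n_models, n_pairs) predicted interaction probabilities
    weights:     optional (n_models,) non-negative model weights
    """
    probs = np.asarray(prob_matrix, dtype=float)
    if weights is None:
        weights = np.ones(probs.shape[0])
    weights = np.asarray(weights, dtype=float)
    return weights @ probs / weights.sum()


# Three hypothetical base models scoring two lncRNA-protein pairs.
probs = [[0.9, 0.2],
         [0.7, 0.4],
         [0.8, 0.0]]
print(soft_vote(probs))                     # equal weights
print(soft_vote(probs, weights=[2, 1, 1]))  # trust the first model more
```

Averaging damps the idiosyncratic errors of any single model, which is the intuition behind the robustness claimed for ensembles.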
Moreover, the exploration of novel methods for quantifying lncRNA gene essentiality, such as in the study by Zhang et al. [63], further expands the scope of lncRNA research. This signifies a shift toward more comprehensive studies that not only predict interactions but also investigate the implications of these interactions in gene functionality.
Lastly, the efforts of Huang et al. [65], Li et al. [59], and Zhao et al. [69] epitomize the usage of novel techniques, such as serial fusion, capsule networks, and graph autoencoders. These cutting-edge methods contribute to the evolution of prediction models, further advancing the state-of-the-art in lncRNA-protein interaction prediction.
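Capsule networks, used in capsule-LPI [59], represent features as vectors whose length encodes the probability that an entity is present. Their characteristic non-linearity is the "squash" function from the original capsule network formulation, v = (|s|^2 / (1 + |s|^2)) s / |s|. The NumPy sketch below shows only this function; dynamic routing and the rest of a capsule layer are omitted, and it is not capsule-LPI's implementation.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Capsule 'squash': keep the vector's direction, map its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)   # eps avoids division by zero


v = squash(np.array([3.0, 4.0]))
print(v, np.linalg.norm(v))   # length 25/26, same direction as [3, 4]
```

Long input vectors are squashed to lengths near 1 and short ones toward 0, so the output length can be read as a confidence.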

Deep Learning Approaches in the Prediction of lncRNA-miRNA Interactions
Recent studies on lncRNA-miRNA interactions center on deep learning frameworks because of their superior accuracy and cross-species applicability. These interactions, central to post-transcriptional gene regulation, are effectively predicted by sophisticated models such as BoT-Net [72]. Attention mechanisms and neural networks are increasingly integrated into prediction models, contributing to improved performance, and hybrid feature mining networks are gaining traction because they extract useful feature information. Some studies also explore using lncRNA-miRNA interactions to predict miRNA-disease associations, extending lncRNA-miRNA research into biomedical applications. The choice of model, however, depends largely on the specificity and sensitivity requirements of the study, and each option presents unique advantages and challenges. Recent studies are summarized in Table 4.

Table 4. Summary of recent studies regarding the prediction of the lncRNA-miRNA interaction. Each study used different datasets, cross-validation methods, and simulation settings to assess accuracy, so direct comparisons may be inconclusive. Where a model was assessed on multiple datasets, the best reported accuracy is listed.

Recent Advances from 2021 to 2023
In the domain of predicting lncRNA-miRNA interaction, numerous methodologies have been utilized, each demonstrating varied performance metrics and innovative methodological elements.
Firstly, deep learning models built on recurrent neural network (RNN) structures, combined with other strategies, have shown strong performance in predicting lncRNA-miRNA interactions. The authors of [72] developed BoT-Net, a hybrid model based on an LSTM with DropConnect and feature pooling. Although the study does not report specific disadvantages, it optimized the lncRNA sequence length, which improved specificity. The authors of [77] developed an optimized ensemble deep learning model that combines independently recurrent neural networks (IndRNNs) and CNNs. Its main advantage is improved accuracy through optimal hyperparameter tuning, making it suitable for large-scale data; no disadvantages are reported.
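DropConnect, used alongside the LSTM in BoT-Net [72], regularizes a network by randomly zeroing individual weights rather than activations, as dropout does. The sketch below applies it to a single dense layer. Sampling one mask per call and rescaling the kept weights during training (so that inference simply uses the full weights) is a common simplification and an assumption here, since the original DropConnect formulation uses a different inference approximation. It does not reproduce BoT-Net.

```python
import numpy as np


def dropconnect_linear(x, W, b, p_drop, rng, training=True):
    """Dense layer with DropConnect: individual weights are dropped at random.

    Unlike dropout, which zeroes activations, DropConnect zeroes entries of W.
    One mask is sampled per call and shared by the whole batch (a simplification).
    """
    if training and p_drop > 0.0:
        keep = rng.random(W.shape) >= p_drop     # keep each weight w.p. 1 - p_drop
        W = W * keep / (1.0 - p_drop)            # rescale so the expectation is unchanged
    return x @ W + b


rng = np.random.default_rng(0)
x = np.ones((2, 3))
W = np.ones((3, 4))
b = np.zeros(4)
print(dropconnect_linear(x, W, b, 0.5, rng))                  # noisy training output
print(dropconnect_linear(x, W, b, 0.5, rng, training=False))  # deterministic
```

With all-ones inputs and weights, each training-mode output is 2 times the number of kept weights in its column, while the inference output is 3 everywhere.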
The second group of studies relied on graph-based models that combine deep learning with, in some cases, attention mechanisms. A representative study [73] implemented DWLMI, which applies the DeepWalk algorithm to an lncRNA-miRNA-disease-protein-drug graph and achieved high accuracy, although it did not evaluate the influence of each feature. Similarly, GCNCRF [74] combines a graph convolutional network (GCN) with a conditional random field (CRF) and an attention mechanism; by integrating an lncRNA-miRNA similarity network, it achieved high AUC scores, and no specific disadvantages are reported. The study in [76] introduced ncRNAInter, a graph neural network-based RNA representation technique that showed robust performance and broad applicability across different species.
The third group of studies used hybrid feature mining networks and multi-level information enhancement models to predict lncRNA-miRNA interactions. Pmli-HFM [78] uses a hybrid feature mining network tailored to predicting plant miRNA-lncRNA interactions; it integrates different encodings for miRNA and lncRNA together with ensemble modules, and no disadvantages are reported. PmliPEMG [79] employs an ensemble deep learning model that leverages multi-level information enhancement and a greedy fuzzy decision, incorporating complex fusion features and multi-scale convolutional LSTM networks.
Finally, some studies have developed models based on word vector representations and deep feature mining. Notably, preMLI [80] combines rna2vec pre-training, which produces word vector representations of RNA, with a deep feature mining mechanism, and shows strong cross-species prediction capability.
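Word-vector approaches such as rna2vec treat overlapping k-mers as the "words" of an RNA "sentence". The sketch below assumes fixed-length overlapping 3-mers and a tiny hypothetical embedding table standing in for trained rna2vec vectors. Averaging the vectors is a deliberately simple pooling choice; preMLI instead feeds its representations into a deeper feature mining network.

```python
def to_words(seq, k=3, stride=1):
    """Split an RNA sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def embed(seq, vectors, k=3):
    """Average the word vectors of all k-mers found in the embedding table."""
    words = [w for w in to_words(seq, k) if w in vectors]
    if not words:
        raise ValueError("no known k-mers in sequence")
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][d] for w in words) / len(words) for d in range(dim)]

# Hypothetical 2-dimensional vectors standing in for a trained rna2vec table.
toy_vectors = {"AUG": [1.0, 0.0], "UGC": [0.0, 1.0], "GCA": [0.5, 0.5]}
print(to_words("AUGCA"))            # ['AUG', 'UGC', 'GCA']
print(embed("AUGCA", toy_vectors))  # [0.5, 0.5]
```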

Emerging Research Trends in Recent Studies
As we delve into lncRNA-miRNA associations, a clear trend emerges toward advanced machine learning models for predicting these interactions, as evidenced by the studies summarized in Table 4. A notable focus has been on deep learning frameworks that offer both improved accuracy and applicability across species [75][76][77][80].
BoT-Net, which is built on long short-term memory networks, illustrates the potential of these methods [72]. In addition, many recent studies incorporate attention mechanisms into their neural network models, which has led to higher performance metrics [74,75].
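Attention layers let a model weight the positions of a sequence differently before pooling them. The following sketch shows dot-product attention pooling over per-position hidden states (for example, LSTM outputs). In practice the query vector is learned during training; here it is fixed by hand, and the function names are hypothetical rather than taken from [74] or [75].

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden, query):
    """Dot-product attention pooling of per-position hidden states into one vector."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights

hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical hidden states for 3 positions
pooled, weights = attention_pool(hidden, query=[2.0, 0.0])
print(weights)
```

Positions whose hidden state aligns with the query (the first and third here) receive most of the weight, so the pooled vector is dominated by them.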
There is also an emerging trend toward hybrid feature mining networks, which extract useful feature information effectively and thereby improve predictive accuracy for lncRNA-miRNA interactions [78]. The emphasis on accuracy is also evident in the DeepWalk-based method proposed by Yang et al., which achieves a high average prediction accuracy [73]. Beyond interaction prediction itself, some studies have used these interactions to forecast potential miRNA-disease associations, extending lncRNA-miRNA research into biomedical applications.

Deep Learning Approaches in the Classification and Prediction of lncRNA Characteristics
In the domain of lncRNA characteristics, deep learning has emerged as an effective tool for predicting lncRNA functions, identifying novel lncRNAs, and discovering lncRNA-disease associations. Recent work relies largely on machine learning and deep learning models, such as CNNs and LSTMs, to analyze lncRNA expression profiles, predict lncRNA interactions, and classify RNA types. Concurrently, ensemble methods are growing in popularity because of their superior predictive performance. Investigations into lncRNA stability and the factors that influence it also broaden our understanding of lncRNA biology. Each approach has its own strengths and weaknesses, and the choice often depends on the specific objectives of the study.
Compared with protein-coding genes, lncRNAs are less evolutionarily conserved, which poses considerable challenges for computational models. However, certain properties of lncRNAs favor deep learning. Large lncRNA datasets provide a wealth of training data for deep learning algorithms. In addition, the multi-level regulatory roles of lncRNAs yield multi-modal data (sequence, structure, interactions, and expression levels), enabling comprehensive analysis through multi-modal learning approaches. Importantly, lncRNAs can originate from various genomic regions, including intergenic regions, intronic regions, and antisense transcripts, and each category may carry distinct functional implications. Several recent methods predict lncRNA characteristics and their source genomic regions. For instance, multiple studies proposed CNN architectures to address these challenges [82,83,89], and LSTM architectures have also been employed to predict lncRNA characteristics [88].

Recent Advances from 2021 to 2023
Deep learning has paved the way for significant advancements in understanding lncRNA characteristics, primarily focusing on the prediction of lncRNA functions and lncRNA identification. Numerous studies have developed models to decipher the hidden complexities of lncRNA biology.
In predicting lncRNA functions, Zhang et al. [87] used an ensemble deep learning model, lncIBTP, to predict interactions between lncRNAs and different types of biomolecules. The model proved highly effective, illustrating the potential of deep learning to provide insights into lncRNA functionality. The identification of lncRNAs is another crucial task addressed by deep learning. Lin and Wichadakul [89] proposed Xlnc1DCNN, a one-dimensional convolutional neural network-based tool that distinguishes lncRNAs from protein-coding transcripts and provides explanations for its predictions. This tool outperformed several others in accuracy and F1-score, demonstrating the effectiveness of convolutional neural networks for lncRNA identification. Another notable contribution came from Wang et al. [85], who developed LncDLSM, a deep learning-based framework that differentiates lncRNAs from protein-coding transcripts without requiring prior biological knowledge. This model excelled in lncRNA identification and showed promising potential for transfer learning.
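A one-dimensional CNN such as Xlnc1DCNN slides learned filters along a one-hot encoded sequence and pools the strongest responses. This sketch uses a single hand-set filter that responds to the codon AUG, purely for illustration; real models learn many filters, and nothing here reproduces the Xlnc1DCNN architecture or its explanation method. The function names are hypothetical.

```python
NUC = "ACGU"

def one_hot(seq):
    return [[1.0 if b == n else 0.0 for n in NUC] for b in seq]

def conv1d_maxpool(seq, kernel, bias=0.0):
    """Valid 1D convolution of a one-hot sequence with one filter, ReLU, then global max pooling."""
    x = one_hot(seq)
    width = len(kernel)  # kernel is a width x 4 weight matrix
    activations = []
    for i in range(len(x) - width + 1):
        s = bias + sum(kernel[j][c] * x[i + j][c] for j in range(width) for c in range(4))
        activations.append(max(0.0, s))  # ReLU
    return max(activations) if activations else 0.0

aug_filter = one_hot("AUG")  # hand-set weights that score +1 per matching base
print(conv1d_maxpool("CCAUGCC", aug_filter))  # 3.0
print(conv1d_maxpool("CCCCCC", aug_filter))   # 0.0
```

Global max pooling makes the output independent of where the motif occurs, which is one reason this layout suits variable-position sequence signals.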
Zhang et al. [81] took a distinctive approach, designing a class similarity network to classify coding and long non-coding RNAs. The network explores relationships between each input sample and samples from the same or different classes to obtain high-level features. The method consistently achieved high accuracy, precision, and F1 scores, indicating its proficiency in lncRNA classification. Ritu et al. [83] proposed DeepPlnc, a bimodal CNN-based deep learning system that integrates sequence and structural properties to identify plant lncRNAs. DeepPlnc outperformed other tools even for transcripts with ambiguous boundaries or incomplete sequences, supporting its applicability to genome and transcriptome annotation.
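The class similarity idea can be illustrated by comparing a sample's embedding with the embeddings of samples from each class. The sketch below is a simplified, prototype-based variant: it assumes embeddings have already been produced by a shared (Siamese-style) encoder, averages each class's support embeddings, and picks the class with the highest cosine similarity. The toy vectors and function names are hypothetical and do not reproduce the network of [81].

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify_by_similarity(query, support):
    """Assign the class whose mean embedding (prototype) is most cosine-similar to the query."""
    sims = {label: cosine(query, mean_vector(vecs)) for label, vecs in support.items()}
    return max(sims, key=sims.get), sims

# Hypothetical encoder outputs for labeled reference transcripts.
support = {
    "lncRNA": [[1.0, 0.1], [0.9, 0.0]],
    "mRNA": [[0.1, 1.0], [0.0, 0.8]],
}
label, sims = classify_by_similarity([0.8, 0.2], support)
print(label)  # lncRNA
```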
Progress has also been made in understanding lncRNA stability. Shi et al. [84] performed a genome-wide RNA-seq study in human lung adenocarcinoma cells and used deep learning-based regression to identify a non-linear relationship between lncRNA half-lives and associated factors. This work deepened the understanding of lncRNA stability and illustrates the potential of deep learning for elucidating lncRNA characteristics.
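Half-life estimates of the kind used as regression targets are commonly derived by assuming first-order decay, A(t) = A0 * exp(-k t), so that t1/2 = ln 2 / k. The sketch below fits k by least squares on log-transformed abundances from a synthetic time course. The protocol of Shi et al. [84] is not described here, so treat the decay model and the data as assumptions; the deep learning regression that relates half-lives to other factors would be a separate step.

```python
import math

def half_life(times, abundances):
    """Estimate half-life assuming first-order decay A(t) = A0 * exp(-k t).

    Fits ln A against t by ordinary least squares; k = -slope, t1/2 = ln 2 / k.
    """
    ys = [math.log(a) for a in abundances]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    var = sum((t - t_mean) ** 2 for t in times)
    k = -cov / var  # decay rate constant
    return math.log(2) / k

# Synthetic transcript whose abundance halves every 4 hours.
times = [0, 2, 4, 8]
abund = [100 * 0.5 ** (t / 4) for t in times]
print(round(half_life(times, abund), 6))  # 4.0
```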
Deep learning has also been applied to identifying dual-functional lncRNAs: Liu et al. [86] developed LncReader, a deep learning model with a multi-head self-attention mechanism, which outperformed various classical machine learning methods, further illustrating the strong performance of deep learning in lncRNA research.
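Multi-head self-attention, the mechanism named for LncReader, projects each position into queries, keys, and values, computes scaled dot-product attention separately in several heads, and concatenates the results. The pure-Python sketch below shows that computation under simplifying assumptions: the projection matrices are supplied by the caller instead of being learned, and the final output projection is omitted. It is not LncReader's implementation.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out

def multi_head_self_attention(x, wq, wk, wv, num_heads):
    """Scaled dot-product self-attention split into heads; head outputs are concatenated.

    x: seq_len x d_model; wq, wk, wv: d_model x d_model projection matrices.
    The final output projection W_O is omitted for brevity.
    """
    d_model = len(x[0])
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        qh = [row[cols] for row in q]
        kh = [row[cols] for row in k]
        vh = [row[cols] for row in v]
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(qh, transpose(kh))]
        heads.append(matmul(softmax_rows(scores), vh))
    # Concatenate the head outputs position by position.
    return [sum((head[i] for head in heads), []) for i in range(len(x))]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
out = multi_head_self_attention(tokens, identity, identity, identity, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Splitting into heads lets each head attend using a different subspace of the representation, which is the design motivation for using several heads rather than one.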

Emerging Research Trends in Recent Studies
Recent research trends, summarized in Table 5, show several innovative approaches and methodologies for classifying and predicting lncRNA characteristics. Machine learning and deep learning models are increasingly used to analyze lncRNA expression profiles, predict lncRNA interactions, and classify RNA types. These computational tools enable more accurate, efficient, and scalable analyses of lncRNA data.

Table 5. Summary of recent studies regarding the classification and prediction of lncRNA characteristics. Each study used different datasets, cross-validation methods, and simulation settings to assess accuracy, so direct comparisons may be inconclusive.

Ref. | Methods | Accuracy | Merits | Disadvantages
[81] | Class similarity network, Siamese neural network-inspired model | | |

One notable trend is the use of CNNs and LSTMs in deep learning models for lncRNA prediction and classification [82,83,88,89]. These tools achieve high accuracy and demonstrate robust performance across various datasets. Moreover, they offer an advantage over traditional bioinformatics approaches, which may rely heavily on prior biological knowledge.
Another noteworthy development is the design and application of ensemble methods, which integrate multiple learning algorithms to obtain better predictive performance [87]. Models such as WGAN-psoNN [100] and lncIBTP [87] incorporate advanced components, such as NAS for optimal parameter tuning, together with strategies that alleviate data imbalance.
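Soft voting is one common way for an ensemble to combine its base models: average their predicted probabilities, optionally with weights, and threshold the result. The sketch below is a generic illustration; the base-model names and probabilities are hypothetical, and the combination rule actually used in lncIBTP [87] may differ.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-model positive-class probabilities, sample by sample."""
    n_models = len(prob_lists)
    weights = weights or [1.0 / n_models] * n_models
    total = sum(weights)
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(len(prob_lists[0]))]

# Hypothetical positive-class probabilities from three base models on four samples.
cnn_probs = [0.9, 0.2, 0.6, 0.4]
lstm_probs = [0.8, 0.3, 0.4, 0.5]
mlp_probs = [0.7, 0.1, 0.8, 0.3]
ensemble = soft_vote([cnn_probs, lstm_probs, mlp_probs])
labels = [int(p >= 0.5) for p in ensemble]
print(labels)  # [1, 0, 1, 0]
```

Averaging probabilities rather than hard labels lets a confident model outweigh two uncertain ones, which is often why soft voting outperforms majority voting.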
Furthermore, studies that examine the stability of lncRNAs, and the factors influencing their stability, are gaining traction [84]. These investigations provide a more comprehensive understanding of lncRNA biology and may inform future therapeutic strategies for diseases associated with lncRNA dysfunction.

Other Deep Learning Research Domains and Utilization of lncRNA-Related Data as Deep Learning Inputs
Research on lncRNAs has shifted markedly toward advanced deep learning models that predict lncRNA subcellular localization and distinguish different lncRNA types. Novel frameworks, such as deep learning and graph neural networks, are being employed to handle the complex nature of lncRNA sequences and structures. Deep learning has also found utility in studies of the role of lncRNAs in immune responses and disease processes, particularly cancer: researchers use sophisticated machine learning techniques to predict lncRNA behavior and correlate it with disease states, offering insights into immune responses and disease pathogenesis. A summary of recent studies is given in Table 6.

Table 6. Summary of other recent studies and utilizations of lncRNA-related data as deep learning inputs.

Ref. | Methods | Merits | Disadvantages
[90] | DeepLncLoc: Uses a subsequence embedding method that keeps the order information of lncRNA sequences. Utilizes a text convolutional neural network for feature extraction and prediction. | Effective for lncRNA subcellular localization prediction. Preserves sequence order information. | Depends on the quality of subsequence embedding. Might miss some complex patterns.
[91] | EVlncRNA-Dpred: Uses deep learning algorithms to distinguish experimentally validated lncRNAs from mRNAs and high-throughput lncRNAs. Utilizes a three-layer deep learning neural network with a small convolutional neural network. | Provides a method for prioritizing potentially functional lncRNAs for experimental validation. | Accuracy can be limited by the small dataset sizes.
[92] | GM-lncLoc: Uses a graph neural network with meta-learning to predict lncRNA subcellular localization. Combines the initial sequence information with graph structure information to extract features. | Shows high accuracy; holds the potential to solve the problem of limited samples in lncRNA subcellular localization. | Performance heavily depends on the quality of the graph's structure information.
[93] | GraphLncLoc: Uses graph convolutional networks to predict lncRNA subcellular localization. Transforms lncRNA sequences into de Bruijn graphs. | Can reveal sequence patterns and motifs. Demonstrates robustness against k-mer frequency features. | Transforming sequences into graphs might lead to the loss of certain information.
[94] | PlncRNA-HDeep: Hybrid deep learning model that uses LSTM and CNN trained on RNA sequences encoded by p-nucleotide and one-hot encodings. | Achieves high accuracy on the Zea mays dataset. More effective than traditional machine learning methods and some existing tools. | Model complexity could lead to overfitting.
[95] | Uses lncRNAs to develop a prognostic model that predicts the survival rates of breast cancer (BC) patients. Constructs ceRNA networks correlated with the infiltration of CD8 T cells. | Can help understand the role of lncRNA in BC. Useful for predicting patient prognosis. | Relies on the bioinformatic prediction of CD8 T cell abundances, which might not always be accurate.
[96] | A combined approach using logistic regression and multilayered neural networks to identify lncRNAs related to Bovine Johne's Disease. | Identifies potential lncRNA targets in host immunity against Mycobacterium avium infection. | Not specified
[97] | Multi-view multitask learning method that predicts microRNA-disease associations from lncRNA-microRNA interactions. | Developed the MVMTMDA model for predicting miRNA-disease associations, achieving an average AUC of 0.8410 ± 0.018. | Requires comprehensive lncRNA-miRNA interaction data.
[98] | Multimodal deep learning integrating histopathological and molecular data to evaluate the microsatellite instability of colorectal cancer. | Achieves a high AUC of 0.952 when combining an H&E image with DNA methylation data. | Accuracy decreases when combining an H&E image with all types of molecular data.
[99] | Uses graph convolutional networks with a multichannel attention mechanism to predict miRNA-disease associations based on lncRNA-miRNA interactions. | Achieves average AUC values of 0.8994, 0.9032, and 0.9044 in different cross-validation setups. | Lacks comparative analysis with non-deep learning models.
[100] | WGAN-psoNN: Combines the Wasserstein distance-based generative adversarial network (WGAN) and particle swarm optimization neural network (psoNN) to predict lymph node metastasis events using lncRNA expression profiles. | Reduces the requirement for deep learning data quantity and architecture selection. | Uses the novel neural network architecture search (NAS) method, which is untested in other studies.
[101] | GAE-LGA: Uses graph autoencoders to integrate multiomics data and identify lncRNA-PCG associations. | Shows strong robustness in capturing lncRNA-PCG associations and outperforms other machine learning-based methods. |

Recent Advances from 2021 to 2023
Advanced computational methods have been widely adopted to explore and predict lncRNA properties. Among them, DeepLncLoc is a noteworthy tool that uses a subsequence embedding method to retain the order information of lncRNA sequences and a text convolutional neural network for feature extraction and prediction, proving effective for lncRNA subcellular localization prediction [90]. Along similar lines, GM-lncLoc combines sequence information with graph structure information, using a graph neural network with meta-learning for prediction [92]. This approach achieves high accuracy and offers a promising solution to the limited sample sizes often encountered in lncRNA subcellular localization studies.
Furthermore, GraphLncLoc and PlncRNA-HDeep have shown promising results in lncRNA prediction. GraphLncLoc applies graph convolutional networks to lncRNA sequences transformed into de Bruijn graphs, revealing sequence patterns and motifs [93]. PlncRNA-HDeep implements a hybrid deep learning model using LSTM and CNN trained on RNA sequences encoded by p-nucleotide and one-hot encodings; it achieved high accuracy on the Zea mays dataset and outperformed traditional machine learning methods [94].
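A de Bruijn graph turns a sequence into nodes and edges derived from its k-mers, which a graph convolutional network can then process. This sketch uses the textbook construction: nodes are (k-1)-mers, and each k-mer adds a weighted edge from its prefix to its suffix. GraphLncLoc's exact convention, k value, and node features may differ, so treat these parameters as assumptions.

```python
from collections import defaultdict

def de_bruijn_graph(seq, k=3):
    """Directed de Bruijn graph: nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix.

    Edge weights count how often each k-mer occurs in the sequence.
    """
    edges = defaultdict(int)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        edges[(kmer[:-1], kmer[1:])] += 1
    nodes = sorted({n for edge in edges for n in edge})
    return nodes, dict(edges)

nodes, edges = de_bruijn_graph("AUGAUGC", k=3)
print(nodes)  # ['AU', 'GA', 'GC', 'UG']
print(edges[("AU", "UG")])  # 2, because the 3-mer AUG occurs twice
```

Repeated k-mers collapse onto the same edge, so recurring motifs show up as high-weight edges or cycles that graph convolutions can detect regardless of where they occur in the transcript.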
Turning to the functional roles of lncRNAs in immune response pathways, the study by [95] utilizes lncRNAs to develop a prognostic model, predicting the survival rates of breast cancer (BC) patients. The model constructs ceRNA networks correlated with the infiltration of CD8 T cells, providing valuable insights into the role of lncRNA in BC. Similarly, the authors of [96] employed a combination of logistic regression and multilayered neural networks to identify lncRNAs related to Bovine Johne's Disease, revealing potential lncRNA targets in host immunity against Mycobacterium avium infection.
Deep learning applications using lncRNA input data have contributed significantly to the biomedical field. MVMTMDA, a multi-view multitask learning method, predicts miRNA-disease associations from lncRNA-miRNA interactions with an average AUC of 0.8410 ± 0.018 [97]. Another study employed a multimodal deep learning model integrating histopathological and molecular data to evaluate microsatellite instability in colorectal cancer [98]. This model achieved a high AUC of 0.952 when combining an H&E image with DNA methylation data. Moreover, a model built on graph convolutional networks with a multichannel attention mechanism predicted miRNA-disease associations based on lncRNA-miRNA interactions, achieving high average AUC values in different cross-validation setups [99].
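Many of the studies above report the area under the ROC curve (AUC). The AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half; the small sketch below computes it directly from that definition. It is quadratic in the number of samples, so it is meant for illustration rather than large benchmarks, and because the cited studies used different datasets and cross-validation setups, AUC values are comparable only within a single protocol.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for two known associations (1) and two non-associations (0).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```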
One particularly innovative approach, WGAN-psoNN, combines a WGAN with a particle swarm optimization neural network (psoNN) to predict lymph node metastasis events from lncRNA expression profiles [100]. Finally, for the identification of lncRNA-PCG associations, the GAE-LGA model uses graph autoencoders to integrate multiomics data. This method captures lncRNA-PCG associations robustly and outperforms other machine learning-based approaches [101].

Emerging Research Trends in Recent Studies
The studies summarized in Table 6 show the increasing significance of computational and deep learning models in lncRNA research. Methodological approaches and predictive models have changed markedly, mirroring the swift expansion of artificial intelligence and deep learning into biological and medical research.
Many recent studies focus on the subcellular localization of lncRNAs, reflecting its importance for understanding lncRNA function. Researchers are investing considerable effort in building robust models that predict lncRNA localization with high accuracy.
Novel frameworks, such as deep learning and graph neural networks, are a consistent theme in these investigations. Such models are designed to capture the intricate nature of lncRNA sequences and structures and promise to outperform traditional machine learning techniques.
The studies also reflect growing interest in the role of lncRNAs in disease, which manifests in models that predict lncRNA associations with various cell types and disease conditions, particularly cancer. Such investigations are integral to understanding the role of lncRNAs in disease pathogenesis and to identifying potential therapeutic interventions.

Challenges and Future Prospects
LncRNAs have shown substantial promise as novel biomarkers and therapeutic targets in numerous diseases. Nevertheless, applying deep learning to lncRNA research still poses several significant challenges and offers opportunities yet to be seized.

Challenges
Firstly, the scarcity of large, well-annotated datasets is a significant obstacle [111]. Many existing lncRNA-related databases are either relatively small, lack comprehensive annotation, or focus on a particular category of lncRNAs, which restricts the variety and volume of data available for model training and validation. Given the importance of large datasets in deep learning, this poses a substantial challenge to the development of highly accurate and robust models. Moreover, these databases often contain biases, which can inadvertently be learned by the model, leading to biased predictions [112].
Secondly, the heterogeneous nature of lncRNAs themselves presents a significant challenge. LncRNAs are known for their diversity in terms of their biogenesis, structure, function, and localization, which can complicate the feature extraction process in model development [113]. Current methods may not fully capture the complexity and diversity of lncRNAs, leading to the potential loss of valuable predictive information.
Finally, interpretability is a persistent problem in deep learning. Many deep learning models, particularly those with many hidden layers, are often criticized as "black boxes" due to their complex internal computations. This lack of transparency is particularly problematic in biological and medical applications, where understanding the rationale behind predictions is critical [114]. In lncRNA research, this could mean difficulty in identifying key lncRNA features that drive the model's predictions, impeding the biological understanding of lncRNA functions.

Future Prospects and Directions
Despite these challenges, deep learning holds significant potential for future advances in lncRNA research. Strategies such as data augmentation and transfer learning can help mitigate data scarcity and improve model performance [115]. Furthermore, more advanced feature extraction techniques that better capture the complex characteristics of lncRNAs are likely to enhance model accuracy and robustness.
Regarding interpretability, recent advances such as attention mechanisms and interpretation algorithms offer promising directions for making deep learning models more transparent [116]. In lncRNA research, these tools can help identify important lncRNA features and elucidate their biological significance.
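In silico mutagenesis is one model-agnostic interpretation algorithm: it mutates each position, re-scores the sequence, and records how much the prediction drops. The sketch below uses a hypothetical toy scoring function that counts a motif; in practice `score_fn` would wrap a trained lncRNA model, and gradient-based or attention-based attributions are common alternatives.

```python
def in_silico_mutagenesis(seq, score_fn, alphabet="ACGU"):
    """Per-position importance: the largest drop in model score over all single-base substitutions."""
    base_score = score_fn(seq)
    importance = []
    for i, original in enumerate(seq):
        drops = []
        for alt in alphabet:
            if alt != original:
                mutant = seq[:i] + alt + seq[i + 1:]
                drops.append(base_score - score_fn(mutant))
        importance.append(max(drops))
    return importance

def toy_model(s):
    """Hypothetical stand-in for a trained model: counts occurrences of the motif 'GGA'."""
    return float(sum(s[i:i + 3] == "GGA" for i in range(len(s) - 2)))

print(in_silico_mutagenesis("CGGAC", toy_model))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Only the three positions forming the motif receive nonzero importance, which is the kind of feature-level evidence that can link a prediction back to sequence biology. The cost is one model evaluation per possible substitution, so it grows linearly with sequence length.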
Finally, with the rapid advancement of sequencing technologies and the resulting explosion of genomic data, the potential for deep learning in lncRNA research is enormous. Integrating multiomics data, including transcriptomics, proteomics, and metabolomics, can provide a comprehensive understanding of the complex roles of lncRNAs in biological systems. Ideally, a multiomics-based deep learning model for classification and prediction, especially an advanced multi-attention model encompassing lncRNA, miRNA, mRNA, protein, and their pairwise matrices, would provide a comprehensive view of biological systems. However, developing such a model poses substantial challenges: it requires extensive computational resources; it requires bioinformatics expertise to gather and integrate multiomics data; the high dimensionality and complexity of the data make efficient and effective feature selection difficult; and the model must accommodate the high noise and heterogeneity inherent in biological data. Despite these challenges, this direction holds great promise. For instance, a recent study [117] integrated mRNA and miRNA data in its analysis. Although that study did not involve lncRNAs, deep learning methods that draw on multiple omics sources, like the approach it employed, are expected to become a prevalent direction in the future.
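The simplest way to combine omics layers is early fusion: standardize each modality's features and concatenate them per sample before feeding a model. The sketch below shows only this input-preparation step with hypothetical toy matrices; the multi-attention architecture envisioned above would learn cross-modality interactions rather than rely on plain concatenation.

```python
import statistics

def zscore_columns(matrix):
    """Standardize each feature (column) to mean 0 and population std 1; constant columns become 0."""
    out_cols = []
    for col in zip(*matrix):
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)
        out_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*out_cols)]

def early_fusion(*modalities):
    """Concatenate standardized per-sample features from several omics layers (same sample order)."""
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("all modalities must cover the same samples")
    standardized = [zscore_columns(m) for m in modalities]
    return [sum((m[i] for m in standardized), []) for i in range(n)]

lnc_expr = [[1.0, 10.0], [3.0, 10.0]]  # 2 samples x 2 hypothetical lncRNA features
mirna_expr = [[5.0], [7.0]]            # 2 samples x 1 hypothetical miRNA feature
print(early_fusion(lnc_expr, mirna_expr))  # [[-1.0, 0.0, -1.0], [1.0, 0.0, 1.0]]
```

Standardizing within each modality keeps one omics layer with large raw values from dominating the concatenated input, although it does nothing about the noise and heterogeneity issues noted above.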

Conclusions
This concluding section draws together the threads of discussion on deep learning applications in lncRNA research that have run through the preceding sections of this review.
An unmistakable trend revealed by a careful synthesis of the literature is the considerable number of investigations focused on the association between lncRNAs and diseases. This reflects the research community's recognition of the profound influence of lncRNAs on physiological and pathological states. These studies have used deep learning architectures to decipher the intricate connections between lncRNAs and various diseases.
From this multitude of lncRNA-disease association studies, it is clear that the tools and techniques of deep learning are being increasingly harnessed to delve deeper into the complexities of lncRNA functions and their roles in health and disease. This has undoubtedly added significant depth and breadth to our understanding of lncRNA dynamics.
However, it is important to emphasize that this represents only a fraction of the potential that deep learning holds for the further exploration of lncRNAs. With accelerating advances in deep learning methodologies and the increasing availability of high-throughput lncRNA data, there is much to look forward to in lncRNA research.
While the progress achieved thus far in lncRNA research via deep learning is certainly commendable, the journey has just begun. The horizon is replete with possibilities waiting to be uncovered, and it is our hope that this review has inspired further intellectual curiosity and will act as a catalyst for novel studies in this rapidly evolving field.