Integrating Omics Data and AI for Cancer Diagnosis and Prognosis

Simple Summary
Cancer remains one of the leading causes of death worldwide, which emphasizes the need for its early and accurate diagnosis and prognosis. Our review explores AI's potential in this field, analyzing 89 recent studies from 2020 through 2023. Specifically, these studies included AI applications for the analysis of multi-omics data, radiomics, pathomics, and clinical and laboratory data. Notably, eight studies combined diverse omics data types (genomics, transcriptomics, epigenomics, and proteomics). Integration of AI into the analysis of clinical and omics data represents a significant advancement, and continued study is essential for safe clinical implementation.

Abstract
Cancer is one of the leading causes of death, making timely diagnosis and prognosis very important. Utilization of AI (artificial intelligence) enables providers to organize and process patient data in a way that can lead to better overall outcomes. This review paper aims to look at the varying uses of AI for diagnosis and prognosis and their clinical utility. PubMed and EBSCO databases were utilized to find publications from 1 January 2020 to 22 December 2023. Articles were collected using key search terms such as "artificial intelligence" and "machine learning." Included in the collection were studies of the application of AI in determining cancer diagnosis and prognosis using multi-omics data, radiomics, pathomics, and clinical and laboratory data. The resulting 89 studies were categorized into eight sections based on the type of data utilized and then further subdivided into two subsections focusing on cancer diagnosis and prognosis, respectively. Eight studies integrated more than one form of omics, namely genomics, transcriptomics, epigenomics, and proteomics. Incorporating AI into cancer diagnosis and prognosis alongside omics and clinical data represents a significant advancement. Given the considerable potential of AI in this domain, ongoing prospective studies are essential to enhance algorithm interpretability and to ensure safe clinical integration.


Introduction
In 1950, Alan Turing introduced the concept of a thinking machine, marking the birth of artificial intelligence (AI) [1]. Today, AI has seamlessly integrated into our lives through familiar names like Siri, Alexa, and Google Assistant. The impact of AI is profoundly felt in oncology, where it has revolutionized the approach to complex challenges posed by cancer. AI-driven techniques have notably elevated the precision and efficiency of oncologic research, opening doors to personalized cancer treatments. Its applications span various areas, including cancer image analysis, genomic studies, data mining from medical records, and drug discovery [2][3][4][5].
There are two main subsets of AI: machine learning and deep learning [4]. Machine learning is a branch of AI that concentrates on creating computer software or algorithms capable of learning from data to make predictions autonomously, without the need for explicit programming. In this study, the emphasis lies on exploring the potential of AI in cancer prediction and diagnosis rather than on establishing a direct comparison group or intervention. However, in some studies, a comparison group might involve traditional methods of cancer prediction and diagnosis without AI. ChatGPT was utilized in this study to check spelling and grammar errors.
As a first step, we compiled articles from the past decade to get a better sense of the volume of publications. Among the 638 articles shown in Figure 1, the number of publications on AI and omics from 2018 to 2023 has notably increased, particularly since 2020. However, this query was run before we performed the screening process and thus included articles from predatory journals. To focus on more recent developments and to ensure relevance to current trends in AI, we narrowed the time frame of the studies from 2013-2023 to 2020-2023. It is important to note that the number of publications in 2023 is lower than in 2022 because publishers had not yet reported all 2023 publications at the time of the search. PubMed and EBSCO databases were used to search for eligible publications from 1 January 2020 to 22 December 2023. The query terms were "artificial intelligence", "machine learning", "deep learning", "cancer diagnosis", "cancer prognosis", "multi-omics", "genomics", "epigenomics", "transcriptomics", "proteomics", "metabolomics", "microbiomics", "radiomics", "pathomics", and "clinical data". Relevant clinical studies in English were included. The search criteria on PubMed were filtered to include only results with "full text available". On EBSCO, we utilized the "find all my search terms" option and included the "also search within full text of the articles" expander. We set result limits to include articles that were peer-reviewed, with full text and references available.

From the 638 articles, we selected 212 articles to screen based on the following processes. We used articles from publications with the DOAJ (Directory of Open Access Journals) seal to exclude articles from predatory journals. The DOAJ is a reputable database that indexes high-quality, open-access scholarly journals. We postulated that using articles from journals with the DOAJ seal would add a layer of quality assurance, since the DOAJ employs a stringent review process for journal inclusion. We independently screened the database on 23 December 2023 and reached a consensus under the instructions of a project supervisor (Dr. Anna Blenda). A meta-analysis could not be conducted due to the heterogeneity in the design of these studies. We utilized Zotero 6.0.37, a reference management software, for the systematic screening of articles throughout the review process. In Zotero, articles were screened by title and abstract, and full texts were retrieved. We reviewed the full texts against the inclusion criteria: peer-reviewed, scholarly articles evaluating the use of AI in cancer diagnosis and prognosis using multi-omics data, radiomics, pathomics, and/or clinical and laboratory data. The following were excluded from this study: 1. duplicates, 2. review articles, 3. systematic reviews, 4. absence of AI implementation, 5. study aims not associated with our theme, 6. inappropriate data type, 7. studies that mislabeled the LASSO-Cox method as machine learning, 8. studies with a sample size of less than 100, 9. a study protocol, 10. a study that failed to specify the particular machine learning technique employed, and 11. animal studies.

Results
In the analysis of the 89 studies, we found a broad spectrum of AI applications within cancer research (Figure 2). There were two articles focusing on genomics data, twenty-one articles on transcriptomics data, three articles on epigenomics data, one article each on proteomics and metabolomics data, eight articles on multiomics data, thirty articles on radiomics data, three articles on pathomics data, and twenty articles on clinical data. No article on microbiomics data was found. Among these studies, 35 articles were pertinent to cancer diagnosis, while 54 articles were about cancer prognosis. Figure 3 shows a visual representation of the frequency of the top five AI models employed.
The Random Forest (RF) method was the most prominently employed method; the studies we reviewed with all data types except for pathomics used RF. RF is an ML classifier composed of a collection of tree-structured classifiers {h(x, Θ_k), k = 1, ...}, where the {Θ_k} are independent, identically distributed random vectors and each tree casts a unit vote for the most popular class for a given input [11]. Each decision tree within RF is trained on Θ_k, a random subset of the training data and features, as illustrated in Figure 4. During prediction, the output of each tree is aggregated to produce the final prediction. This integration of multiple decision trees serves to improve the accuracy and robustness of RF.
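To make the bootstrap-and-vote mechanism concrete, the following is a minimal, illustrative sketch using scikit-learn; the dataset is simulated and does not correspond to any study reviewed here.

```python
# A minimal Random Forest sketch; the data are simulated "omics-like"
# features, not drawn from any of the reviewed studies.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 500 samples, 40 features, binary outcome (e.g., malignant vs. benign).
X, y = make_classification(n_samples=500, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each of the 200 trees is trained on a bootstrap sample (a random Θ_k);
# at prediction time, the trees vote and the majority class wins.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
```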
Support Vector Machine (SVM), Logistic Regression (LR), and XGBoost are legacy methods that are very well discussed and utilized in the literature. Here we refer the readers to the existing discussion [12][13][14]. However, since deep neural network approaches such as Convolutional Neural Networks (CNNs) are considered state-of-the-art with the most impact, here we provide a brief discussion of Convolutional Neural Networks. It is also noteworthy that CNNs were the most popular method in radiomics and pathomics data analysis. A CNN is a type of DL model that uses convolutional operations to find important features in input data by overlapping and combining local areas [15]. This helps the network to recognize patterns even when they are not pre-labeled in the training data. The first step is to extract features from the input image. These features are then combined and reduced in size through pooling before being turned into the final network outputs. The last layers of the CNN connect all the neurons together and act as classifiers by sorting the input into different categories. Finally, the output layer gives the final classification or regression result, often using Softmax to calculate class probabilities.
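The pipeline just described (convolution, pooling, fully connected classification, Softmax) can be sketched in a few lines of PyTorch; the input size, channel counts, and class count below are illustrative assumptions, not taken from any reviewed study.

```python
# A minimal CNN sketch: convolution -> pooling -> fully connected -> softmax.
# The 1-channel 64x64 input and 2 classes are illustrative placeholders.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Convolutions extract local features from overlapping image patches.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # pooling halves the spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 64x64 -> 32x32 -> 16x16
        )
        # Fully connected layer acts as the classifier.
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        logits = self.classifier(x)
        # Softmax converts logits into class probabilities.
        return torch.softmax(logits, dim=1)

probs = TinyCNN()(torch.randn(4, 1, 64, 64))  # batch of 4 dummy images
print(probs.shape)  # torch.Size([4, 2])
```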
For clarity and organization, the studies were categorized into eight sections based on the type of data utilized. Each section was further subdivided into two subsections focusing on cancer diagnosis and prognosis, respectively. Within these subsections, pertinent information from the articles was systematically collated into tables, including the title, author and year, study aim, modality of AI employed, and outcome or performance. Each table serves as a discrete subset under either cancer diagnosis or prognosis to facilitate efficient referencing and comparison. Articles under each table were then organized based on their study aim.
In evaluating the performance of AI models, it is important to understand several parameters, including accuracy, sensitivity, specificity, area under the curve (AUC), and concordance index (C-index). Accuracy measures the proportion of predictions that match the true values. Sensitivity evaluates a model's ability to predict true positives, while specificity assesses the model's capacity to predict true negatives. AUC gives a comprehensive measure of performance across various classification thresholds, calculated as the area under the ROC curve. The C-index, like the AUC, assesses the performance of prediction models, particularly in the context of survival analysis; a C-index closer to 1.0 indicates better predictive performance. In addition to these parameters, various AI algorithms or statistical methods were compared to evaluate the performance of AI.
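For concreteness, the following hedged sketch computes these parameters with scikit-learn and lifelines on made-up predictions; all labels, scores, and survival times are illustrative.

```python
# Illustrative computation of accuracy, sensitivity, specificity, AUC,
# and C-index; assumes scikit-learn and lifelines are installed.
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from lifelines.utils import concordance_index

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                  # made-up ground truth
y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]  # made-up model scores
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", accuracy_score(y_true, y_pred))
print("sensitivity:", tp / (tp + fn))   # true-positive rate
print("specificity:", tn / (tn + fp))   # true-negative rate
print("AUC        :", roc_auc_score(y_true, y_score))

# C-index for survival analysis: higher predicted risk should pair with
# shorter observed survival (event = 1 means the event was observed),
# so risk scores are negated for lifelines' convention.
times  = [5, 10, 12, 3, 9, 14, 2, 8]
events = [1, 0, 1, 1, 0, 1, 1, 0]
print("C-index    :", concordance_index(times, [-s for s in y_score], events))
```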

Clinical Applications Based on Genomics
The following studies in Table 1 made notable contributions to the field of genomics by leveraging computational algorithms to predict key genetic patterns and treatment responses in cancer patients. Based on our search and exclusion criteria, there were only two papers on genomics data, for various reasons elaborated upon in the Discussion section. Because of the ability of modern AI to incorporate multimodal data, genomics data are often accompanied by other types of data, and therefore in this manuscript they are discussed in other sections. Typically, genomics data are combined with health records and other patient data. It is also possible (in theory) to combine genomics with some form of imaging; for instance, genomics data can be combined with radiology images or other -omics data.


Clinical Applications Based on Transcriptomics
The following studies in Table 2 advanced the field of transcriptomics by employing machine learning (ML) and deep learning (DL) methods to analyze gene expression data and identify biomarkers associated with cancer. In terms of cancer prognosis, these studies employed ML methods to identify RNA signatures associated with various aspects of cancer prognosis and treatment response.

Clinical Applications Based on Epigenomics
The following studies in Table 3 contributed to epigenomics by employing various ML techniques to analyze epigenetic data and uncover important insights related to cancer prognosis and mutation detection.

Clinical Applications Based on Proteomics and Metabolomics
The following studies in Table 4 employed various ML techniques to analyze proteomics and metabolomics data.

Clinical Applications Based on Multiomics Data
The following studies in Table 5 significantly advanced the field of multiomics by introducing innovative approaches to integrate diverse data types for cancer research. Multiomics data included genomics, transcriptomics, epigenomics, and proteomics. In terms of cancer prognosis, these studies leveraged various omics data and integrated them with clinical features to predict important outcomes in cancer.

Clinical Applications Based on Radiomics
In radiomics, the studies in Table 6 used ML and DL techniques for various tasks, including the classification of malignant versus benign tumors, gene expression prediction, and cancer invasion prediction. In terms of cancer prognosis, these studies achieved several advancements in predictive modeling and prognosis assessment for survival, metastasis prediction, and treatment complications.

Clinical Applications Based on Pathomics
In the field of pathomics, the following studies in Table 7 made notable contributions to cancer diagnosis and treatment response prediction by employing CNN models, highlighting the potential of pathomic analyses in personalized medicine and treatment optimization for cancer patients.

Pathomics-based prediction of treatment responses (excerpt from Table 7):
Study aim: to evaluate a CNN model that diagnoses ovarian cancer and predicts treatment response (Yu et al., 2020 [85]).
Modality of AI: AlexNet, GoogLeNet, and VGGNet.
Outcome: VGGNet had the best predictive ability and was utilized as the backbone model to identify transcriptomic subtypes and predict therapy responses. Neither external nor independent validation was performed.

Clinical Applications Based on Clinical and Laboratory Data
In the field of clinical data analysis, the following studies in Table 8 showcased the integration of diverse data modalities for cancer prediction and classification, highlighting the potential of combining clinical and traditional medical data with ML approaches to enhance cancer diagnosis and prognostication. In terms of cancer prognosis, these studies made significant strides in utilizing ML methods for survival prediction, recurrence prediction, and treatment response assessment across various cancer types. Collectively, they demonstrate the effectiveness of ML approaches in leveraging clinical data to predict cancer prognosis, recurrence risk, and treatment outcomes, thus paving the way for personalized cancer management strategies.

Discussion
In this review, we presented various AI techniques that utilize multi-omics, radiomics, and pathomics, as well as clinical and laboratory data. While some studies focused solely on assessing AI's performance using individual data types, a significant proportion incorporated the integration of diverse data types. This is because of the unique ability of modern ML techniques to integrate heterogeneous modes of data to provide a more informed inference. Combining gene mutations with social and behavioral determinants of health will clearly address some of the long-standing challenges in precision and personalized medicine. It is only through a more holistic evaluation of a patient that the most accurate diagnosis and prognosis can be determined. This is one of the primary reasons why most studies incorporate multiple data modalities in their AI implementations.

In addition, some forms of data are a poor fit for analysis with AI and are therefore explored relatively less frequently. For instance, it is difficult to formulate an AI/ML-based approach to the analysis of purely genomic data. The most comprehensive approach to genomic analysis would require an AI capable of accepting roughly three billion inputs (the number of base pairs in the human genome) to provide an inference. There are many practical limitations in the development of such a network, including computation time, the memory capacity of existing computers, and the curse of high dimensionality of the data, to name a few. A more pragmatic approach is to limit the analysis to a small number of genes and loci selected as relevant indicators based on other data, such as differentially expressed genes, gene methylations, or gene pathway analysis. Even studies that were purportedly limited to a single data type integrated demographic information into their AI models. This integrated approach is advantageous given the complexity of cancer as a biological phenomenon, consequently bolstering diagnostic and prognostic capabilities. With cancer being one of the leading causes of death, improving diagnosis and prognosis is an area of medicine that has caught the attention of many physicians and researchers.
In this manuscript, we have aimed to maintain an agnostic position in reporting a summary of the published work and to remain neutral with regard to recommending the best approach or promoting one work over another. In fact, such a discrimination would be very difficult to accomplish, since the most successful approach will vary based on the data modality. For instance, convolutional neural networks are generally recognized as the most suitable approach for the analysis and segmentation of images. On the other hand, recurrent neural networks such as LSTMs would be most appropriate for temporal (potentially longitudinal) analysis such as demand forecasting (e.g., hospital resource use, medication dosage, etc.). In domains where the interpretability of the mechanism by which machines operate is of critical importance, decision trees or Random Forests may be the most appropriate approaches. Another complication in making naïve comparisons between two methods is the uniformity of the data or other evaluation conditions. For example, two image segmentation methods applied to CTA scans of the abdominal cavity may produce results of 80% and 90%. While it is simple to conclude the latter method's superiority over the former, the question remains whether the quality of the data collected in the two experiments was comparable. What about the sample size or the diversity of the data? Many factors need to be normalized across all studies to draw a meaningful comparison between different approaches. This is the primary impetus for the normalization and standardization of data within the framework of AI/ML evaluation.
We noted that diverse datasets often comprise many features. Some studies have noted overfitting in their models due to the utilization of a larger number of features relative to a smaller sample size [66,96]. This issue is commonly referred to as the 'n << P problem', where 'n' represents the sample size and 'P' denotes the number of features [106]. Dealing with many features can pose challenges when employing AI models, particularly in the context of high dimensionality. One significant challenge associated with high dimensionality is the increased sparsity of data, where information becomes thinly distributed across the feature space. Imagine each piece of data as a dot on a graph: as we add more and more features, the space in which these dots exist grows larger and larger, making the dots more spread out, or "sparse". Consequently, making accurate predictions becomes challenging unless a substantial number of data points are available. This difficulty is particularly pronounced when analyzing medical data, since such data often exhibit considerable variation. Hence, researchers take steps to maximize the number of available samples while minimizing the number of features. We observed that many studies adopted various feature selection and extraction techniques to address this challenge.
Feature selection and extraction can be accomplished by human experts or with computational algorithms. ML methods such as SVM and RF, along with statistical methods including the LASSO-Cox model, were frequently employed for feature selection. Autoencoders, a type of ML algorithm, were a popular method for integrating multi-omics data. DL methods were applied more extensively in radiomics data analysis for feature selection and extraction. This preference for DL, particularly CNNs, stems from their efficiency in handling large volumes of data compared to traditional ML or statistical methods. Additionally, CNNs automate the process of feature extraction and classification by identifying patterns and extracting features from images. A limitation of DL lies in its 'black box problem', whereby it fails to offer interpretations to justify model findings or provide additional clinical insights. Despite this challenge, efforts have been made to demonstrate the importance of features extracted by CNNs. For instance, researchers such as Fujima et al. attempted to validate significant radiomic features extracted using CNNs through statistical analysis [73].
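As an illustration of algorithmic feature selection in an n << P setting, the following minimal sketch applies cross-validated LASSO with scikit-learn to a simulated expression matrix; the dimensions and signal structure are assumptions for demonstration only.

```python
# LASSO-based feature selection sketch for n << P; the "expression matrix"
# is simulated, with only the first 5 features carrying true signal.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_samples, n_features = 100, 2000          # n << P, as discussed above
X = rng.standard_normal((n_samples, n_features))
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(n_samples)

# Cross-validated L1 penalty drives most coefficients exactly to zero.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)     # features with nonzero weights
print(f"{selected.size} of {n_features} features retained:", selected[:10])
```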
Unlike DL methods, statistical methods such as the Cox Proportional Hazards (PH) model offer interpretable outcome values. Shapley values derived from the SHapley Additive exPlanations (SHAP) algorithm can interpret outcomes derived from ML methods [86]. Shapley values offer insights into the contributions of features towards specific outcomes.
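A hedged sketch of how Shapley values can be obtained for a tree-based model with the SHAP package follows; the model and data are illustrative stand-ins, not a reproduction of any reviewed study's pipeline (output shapes can vary across shap versions).

```python
# SHAP-based interpretation sketch for a tree-ensemble classifier;
# assumes the `shap` package is installed. Data and model are illustrative.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles;
# each value quantifies one feature's contribution to one prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # per-sample, per-feature contributions

# Global summary: distribution of Shapley values per feature.
shap.summary_plot(shap_values, X)
```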
Another approach to addressing the 'n << P problem' involves increasing the sample size. Many studies have leveraged data from publicly available datasets such as The Cancer Genome Atlas (TCGA). However, excessive reliance on TCGA data may introduce bias towards the -omics data types present in the TCGA dataset, potentially leading to the overfitting of models and resulting in bias and misrepresentation of the outcome. Therefore, initiatives aimed at providing large-scale, multi-modal datasets to the research community are necessary. Moreover, studies that increased the number of samples encountered challenges related to imbalanced data. To mitigate this issue, Meng et al. and Hu et al. employed the Synthetic Minority Over-sampling Technique (SMOTE) algorithm, which generates synthetic minority-class samples (a minimal sketch follows at the end of this section).

Using AI to diagnose cancer or make prognoses for cancer patients involves several nuanced ethical considerations. Ensuring accuracy and reliability is very important, as errors can lead to significant harm, such as unnecessary treatments or missed opportunities for early intervention. AI's decision-making process needs to be transparent to maintain trust between patients and healthcare providers. Moreover, patient privacy and data security must be safeguarded, given the highly sensitive nature of medical information. AI should augment, not replace, human judgment, ensuring that medical professionals remain central to providing comprehensive and compassionate patient care.
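As a concrete illustration of the SMOTE-based mitigation mentioned above, here is a minimal sketch using the imbalanced-learn package; the class ratio and data are simulated, not drawn from the cited studies.

```python
# SMOTE oversampling sketch; assumes imbalanced-learn is installed.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Simulated 9:1 imbalanced binary dataset.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y))

# SMOTE synthesizes new minority-class samples by interpolating between
# existing minority samples and their nearest neighbors.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after :", Counter(y_res))
```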

Conclusions
In this review, we provide a comprehensive synopsis of some of the most promising AI utilizations and discuss the limitations associated with each method. AI can significantly improve the cost-effectiveness of cancer diagnosis and treatment by enhancing accuracy, personalizing care, and improving operational efficiencies. Interdisciplinary collaboration is essential in advancing AI applications in oncology. Ethicists, legal experts, and policymakers should be included in the interdisciplinary teams to navigate the complex landscape of AI deployment in oncology. By bringing together diverse expertise from various fields, these collaborations ensure the development of robust, clinically relevant, and ethically sound AI tools that can significantly improve cancer diagnosis and treatment. Challenges persist in achieving feature reproducibility and in ensuring model interpretability.
Overall, most AI models examined in this study were focused on tasks such as classification, clustering, or regression. These models have demonstrated promising outcomes and performance; however, they are not currently suitable for use in clinical settings. This limitation arises from various contributing factors, including the lack of standardization of the data, normalization procedures, and evaluation of models by multiple independent investigators. While this list grows the further one examines it, the primary root of the existing limitations is the private nature of medical data. For instance, evaluation of a model by an independent entity may face obvious challenges regarding IRB data sharing requirements. Once the primary impediment of data sharing is resolved, retrospective evaluation by multiple investigators can add confidence to the use of a trained network, which will lead to its translational deployment in clinical settings. Thus, robust prospective studies are necessary to guarantee the safety and efficacy of AI models. Furthermore, concerted efforts to enhance algorithm interpretability and to comprehend human-algorithm interactions will be important for future adoption and safety.

Figure 1. Publication year vs. number of publications.


Figure 2. Flow diagram of the selection of studies to be included in the review.


Figure 3. Frequencies of the top five AI models.


Figure 4. Schematic of Random Forest (RF); this image was adapted and modified from the following study [16]. The copyright of the image has been confirmed and verified.


Author Contributions:
The research was conceptualized and methodology developed by A.V.B. and H.V., with contributions from H.A. Validation was conducted by H.A., H.V., A.V.B. and Y.O., while formal analysis was performed by Y.O., P.B. and H.A. Investigation and data curation were led by Y.O., P.B., A.V.B., H.V. and H.A.; A.V.B. and H.V. provided resources, supervised the project, and administered the project alongside H.A. Writing and editing were collaborative efforts involving all authors.

Table 1. Genomics-based prediction of cancer prognosis.


Table 2. Transcriptomics-based prediction of cancer diagnosis and prognosis.
Excerpts from Table 2:
Transcriptomics-based classification of malignant vs. benign tumors: the root mean square error of ANN-SCGP (0.1587) was the lowest among traditional ML algorithms, including RF, SVM, and ANN; root mean square error assesses the average difference between the values predicted by a model and the actual values. External validation was performed with the authors' tissue microarray data.
Transcriptomics-based survival prediction: the AUC values of all four survival groups were above 90%; the patient groups predicted by the SVM model demonstrated survival outcomes comparable to those clustered by the K-means algorithm. Neither external nor independent validation was performed.

Table 3. Epigenomics-based prediction of cancer diagnosis and prognosis.

Table 4. Proteomics- and metabolomics-based prediction of cancer diagnosis and prognosis.

Table 5. Cancer diagnosis and prognosis based on multiomics data.

Table 6. Radiomics-based prediction of cancer diagnosis and prognosis.

Table 7. Pathomics-based prediction of cancer diagnosis and prognosis.

Table 8. Cancer diagnosis and prognosis based on clinical and laboratory data.