Editorial: Machine Learning Advanced Dynamic Omics Data Analysis for Precision Medicine

1 Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China, 2 Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China, 3 Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences (CAS), Shanghai, China, 4 Department of Computer Science, Aberystwyth University, Aberystwyth, United Kingdom


STUDIES BASED ON INDIVIDUAL TEMPORAL 'OMICS DATA FROM DISEASE COHORTS OR ANIMAL MODELS
Liu, R. et al. proposed a single-sample-based hidden Markov model approach to detect the dynamical differences between a normal state and a pre-disease state, and thereby the imminent critical transition from the pre-disease state. Lee et al. implemented a deep learning-based Python package for multimodal longitudinal data integration, especially for numerical data, including both time-series and non-time-series data. Yu et al. implemented an adjusted individual-specific edge-network analysis (iENA) method for cases where only a limited number of samples from one individual are available, and carried out a proof-of-concept study on individual-specific disease classification based on microbiota compositional dynamics. Ho et al. reviewed polygenic risk scoring and machine learning in complex disease risk prediction with tissue-specific targets, anticipating their potential for managing complex diseases through customized preventive interventions. Li et al. identified target genes at juvenile idiopathic arthritis risk loci in neutrophils with an integrated multi-omics approach, constructing a protein-protein interaction network on the basis of a machine learning approach. Dai et al. applied the mega-analysis of Odds Ratio (MegaOR) method to prioritize candidate genes for Crohn's disease, based on comprehensively collected multi-dimensional data. Wang, C.H. et al. detected differentially expressed lncRNAs and mRNAs in atherosclerosis by analyzing public datasets with weighted gene co-expression network analysis; this bioinformatics study may provide novel therapeutic and prognostic targets for atherosclerosis. Jiang, S. et al. collected and profiled circRNA expression in heart tissues from atrial fibrillation (AF) patients and healthy controls, providing new insights into the roles of circRNAs in AF and into potential interaction mechanisms among circRNAs, microRNAs, and mRNAs.

STUDIES BASED ON MULTIPLE
Gu et al. reused the Surveillance, Epidemiology, and End Results registry database to conduct stratification, univariable, and multivariable analyses, indicating that surgery is an important component of multidisciplinary treatment and that sublobar resection is not inferior to lobectomy for specific patients. Zhang, J. et al. exploited the largest Crohn's disease and ulcerative colitis datasets with a two-step approach, exhaustively searching for epistasis with dense markers and exploiting marker dependencies. Du et al. analyzed genome-wide splicing data in 16 cancer types with normal samples using a network-based and modularized approach, and captured pan-cancer splicing and modularized perturbation, which supports the dominant patterns of cancer-associated splicing. Zhao et al. assessed the prognostic value of apolipoprotein E and explored its potential relationship with tumor progression in colorectal cancer (CRC), by collecting microarray data from the Gene Expression Omnibus and exploring genes with prognostic significance in the TCGA database. Tang et al. proposed an effective data integration framework, HCI (High-order Correlation Integration), for high-dimensional feature extraction, with extensive flexibility and applicability to sample clustering of RNA-seq data at bulk and single-cell levels. Chang et al. identified new susceptibility genes and causal sub-networks in schizophrenia with an integrated network-based approach, and reported that the N-methyl-D-aspartate receptor interactome is highly targeted by multiple types of genetic risk factors. Wang and Liu recognized potential diagnostic biomarkers of Alzheimer's disease by integrating gene expression profiles from six brain regions with a machine learning approach and validating marker genes in multiple cross-validations and functional enrichment analyses. Xu et al. provided an effective way to annotate nuclear non-coding and mitochondrial genes and to identify new steady RNAs, and conducted a pan RNA-seq analysis suggesting the ubiquitous existence of both 5' and 3' end small RNAs.

STUDIES BASED ON THE GUT METAGENOME AND HOST 'OMICS FOR COMPLEX DISEASES DIAGNOSIS AND TREATMENT
Yang et al. presented UltraStrain, a new pathogen detection and strain typing method for Salmonella enterica based on whole genome sequencing data, which includes a noise filtering step, a strain identification step based on statistical learning, and a final refinement step. Tan et al. conducted comprehensive and systematic experiments, including in vitro genetic assessments and an in vivo acute toxicity study, to examine safety issues associated with Bacteroides ovatus ELH-B2. Qiu et al. set up an in silico model of emerging or re-emerging dengue virus (DENV) based on possible antigenicity-dominant positions of the envelope (E) protein, so that DENV serotyping may be reconsidered antigenically rather than genetically. Zhang, B. et al. collected and re-analyzed published fecal 16S rDNA sequencing datasets to identify biomarkers for classifying and predicting colorectal tumors with a random forest method; the trained random forest model achieved good AUC performance for CRC when all samples were combined, although the prediction performed poorly for advanced adenoma and adenoma.  (MR) to test the influence of body mass index (BMI) on the risk of T2DM based on GWAS data, validating the causal effect of high BMI on the risk of T2DM. Feng et al. applied a single analysis procedure of feature selection and classification to both cancer transcriptome and methylome data, suggesting that age should be treated as an essential factor rather than a confounding factor in the training and optimization of disease diagnosis models. Qin et al. developed a new statistical framework for joint gene set analysis, aiming to improve the power of identifying enriched gene sets by integrating multiple similar disease datasets when the sample size is limited. Shi, Q. et al. proposed a new computational framework, "Multi-view Subspace Clustering Analysis," to capture the underlying heterogeneity of samples from multiple data types, by first measuring the local similarities of samples in the same subspace and then extracting the global consensus sample patterns. Jiang, P. et al. developed a new variant mining algorithm for trio-based sequencing data and applied it to a ventricular septal defect (VSD) trio, identifying several genes and lncRNAs highly related to VSD.
Finally, we sincerely thank the reviewers for their great efforts to ensure the high quality of all contributing articles, and we hope this Research Topic will attract wide attention to precision medicine based on machine learning and omics data.

AUTHOR CONTRIBUTIONS
TZ drafted the manuscript. TZ, TH, and CL revised the manuscript.