The International Conference on Intelligent Biology and Medicine (ICIBM) 2018: genomics with bigger data and wider applications

The sixth International Conference on Intelligent Biology and Medicine (ICIBM) took place in Los Angeles, California, USA on June 10–12, 2018. The conference featured eleven regular scientific sessions, four tutorials, one poster session, four keynote talks, and four eminent scholar talks. The scientific program covered a wide range of topics from bench to bedside, including 3D genome organization, reconstruction of the large-scale evolution of genomes and gene functions, artificial intelligence in biological and biomedical fields, and precision medicine. Method development and application in genomic research remained a main component of the conference, including studies on genetic variants, regulation of transcription, genetic-epigenetic interaction at both the single-cell and tissue levels, and artificial intelligence. Here, we summarize the conference and briefly introduce the four high-quality papers selected for publication in BMC Genomics, which cover novel methodology development or innovative data analysis.


Introduction
The 2018 International Conference on Intelligent Biology and Medicine (ICIBM 2018) was held from June 10th to 12th, 2018 in Los Angeles, California, USA. This was the sixth ICIBM conference and the first official conference of The International Association for Intelligent Biology and Medicine (IAIBM). Since its inception in 2012, the ICIBM conference series has aimed to: 1) foster interdisciplinary and multidisciplinary research in bioinformatics, systems biology, intelligent computing/artificial intelligence, bioengineering, and data sciences, and 2) offer an educational program for trainees and young investigators in multiple scientific disciplines to learn about or exchange new methods, tools, and discoveries in these areas and to build a professional network among both established and junior investigators.
We have expanded the ICIBM conference each year since 2012. Summaries of the previous conferences are reported in the earlier introduction articles [1][2][3][4][5]. ICIBM 2018 was carefully prepared, attracted a record number of paper submissions and participants, and was the largest ICIBM conference to date. Specifically, we received 95 original manuscripts and 44 abstracts in the areas of bioinformatics, genomics, systems biology, machine learning, and biomedical informatics. Each paper was carefully reviewed by the program committee members. Among these submissions, we observed more papers from emerging fields such as deep learning, new genome technologies, and big medical data science. Accordingly, our keynote speeches and scientific sessions reflected these trends. In addition, financial support from the National Science Foundation allowed us to support 20 trainees from diverse backgrounds and from across the world to attend and present their work at the conference. As before, the Award Committee carefully reviewed travel award applications based on the quality of the research (paper or abstract), financial need, and the diversity/minority status of the attendees. Below, we briefly summarize the scientific program of ICIBM 2018 and introduce the four research articles selected for this BMC Genomics supplement issue.

Overall scientific program of ICIBM 2018
The ICIBM 2018 scientific program included eleven regular scientific sessions, four tutorials, one poster session, four keynote talks, and four eminent scholar talks. It covered a variety of topics in bioinformatics, systems biology, machine learning, data sciences, and biomedical informatics. The keynote talks included the following. (1) Dr. Denny presented "Huge cohorts, genomics, and clinical data to personalize medicine". In his talk, Dr. Denny demonstrated how electronic health records (EHRs) and genomic data can be combined for phenome-wide association studies, leading to personalized medicine. He also introduced the era of huge international cohorts such as the UK Biobank, the Million Veteran Program, and the newly started All of Us Research Program, which will make dense molecular and phenotypic data available for millions of individuals. (2) Paul D. Thomas, Ph.D., is an Associate Professor in the Preventive Medicine Department and heads the Bioinformatics Division at the University of Southern California Keck School of Medicine. Dr. Thomas is the leader of the Gene Ontology project, which is among the world's largest bioinformatics projects. In his presentation, entitled "Reconstructing the large-scale evolution of genomes and gene functions", he gave an overview of the reconstruction methods and the related findings, introduced his work reconstructing the evolutionary history of over 1 million genes in 15,000 gene families covering all domains of life, and described his work inferring human gene function from experiments in "model organisms" such as the fruit fly and yeast.

Keynote and eminent scholar lectures
In the third keynote speech, Dr. Alexander Hoffmann, a Professor of Microbiology, Immunology, and Molecular Genetics, and Director of the Institute for Quantitative and Computational Biosciences (QCB) at UCLA, presented "Learning how to predict immune responses". He discussed how molecular network dynamics and molecular noise affect immune cell function, and some of the modeling strategies that allow for prediction and insight. The fourth keynote speaker was Dr. Jason Moore, who is the Edward Rose Professor of Informatics and Director of the Penn Institute for Biomedical Informatics. He also serves as Senior Associate Dean for Informatics and Chief of the Division of Informatics in the Department of Biostatistics, Epidemiology, and Informatics. Dr. Moore is a fellow of the American Association for the Advancement of Science (AAAS), the American College of Medical Informatics (ACMI), and the American Statistical Association (ASA), and a Kavli fellow of the National Academy of Sciences. In his presentation, entitled "Accessible artificial intelligence for data science", Dr. Moore introduced the history of artificial intelligence (AI) and then demonstrated PennAI, an accessible, open-source, and user-friendly AI system developed at the University of Pennsylvania. PennAI brings AI and automated machine learning technology to everyone who wants to incorporate this technology into their big data analytics agenda.
ICIBM 2018 also featured four eminent scholar talks. These talks were delivered by four renowned researchers in their specific fields.

Scientific sessions
ICIBM 2018 included eleven concurrent scientific sessions and one poster session. Speakers in the regular sessions were chosen from the top-ranked manuscripts after peer review. The topics in these sessions covered bioinformatics, genomics, systems biology, intelligent computing, data sciences, computational drug discovery, and biomedical informatics. To promote international collaboration, we organized an International PI talk session for the first time. Several principal investigators (PIs) from overseas presented their exciting projects in this session, followed by extensive discussion. In addition, two best papers were selected and honored at the conference.
Since 2012, ICIBM meetings have covered the latest developments across the entire pipeline of genomics research, from low-level data processing to modeling, prediction, and visualization, as well as application to real data sets. The tasks have grown from single-feature to network analysis, and the scale of studies has grown from single data sets to joint analysis of large-scale cohorts or integration of data from multiple types and sources. ICIBM 2018 reflected the trend toward harvesting signal from an increasing amount of data and continuing to develop computational methods that meet the needs of emerging biotechnologies presenting new data types. Below, we summarize the contributions of the four papers included in this supplement issue.
The paper by Du et al. [6] explored the idea of gene embedding, a distributed representation of genes, in the spirit of word embedding. The authors described a machine learning method that uses transcriptome-wide gene co-expression patterns from 984 publicly available data sets to generate a distributed representation of genes. They trained a vector representation (vectors of dimension 200) of all human genes, which captured the functional relatedness of genes in terms of recovering known pathways. The usefulness of the embedded gene vectors was illustrated in tasks such as gene-gene interaction prediction based solely on gene names.
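To illustrate how embedded gene vectors can be compared, the following minimal sketch ranks genes by cosine similarity to a query gene. It is not the authors' training procedure or code: the gene names, the 4-dimensional toy vectors, and the function names `cosine_similarity` and `most_similar` are all hypothetical, and the choice of cosine similarity as the relatedness measure is an assumption made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(gene, embeddings, top_n=2):
    """Rank all other genes by cosine similarity to `gene`, best first."""
    query = embeddings[gene]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != gene
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical toy vectors; the paper reports vectors of dimension 200.
toy_embeddings = {
    "GENE_A": [0.9, 0.1, 0.0, 0.2],
    "GENE_B": [0.8, 0.2, 0.1, 0.1],
    "GENE_C": [0.0, 0.1, 0.9, 0.7],
}

for name, score in most_similar("GENE_A", toy_embeddings):
    print(f"{name}\t{score:.3f}")
```

In this toy example GENE_B ranks above GENE_C for the query GENE_A because its vector points in nearly the same direction; with real embeddings, such rankings would be one way to probe functional relatedness.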
RNA-sequencing (RNA-seq) has now become a routine technique in genomic studies, and its application has expanded beyond quantifying transcription activity. Mohammad et al. [7] introduced a cell line identification method using variants derived from RNA-seq data. They calculated pair-wise correlations based on the frequencies and depth-of-coverage values of variants and conducted a comparative analysis of correlations across cell lines. The results showed substantial differences in variant profiles between different cell lines. On the other hand, identical, synonymous, and derivative cell lines could share high variant identity and were highly correlated. Their identification method, CeL-ID, identifies the cell line with high accuracy when a pure cell line is involved, and can be used to detect cross-contamination with a mixture model.
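The correlation-based matching described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the CeL-ID implementation: it uses only variant allele frequencies (no depth-of-coverage filtering and no mixture model), treats a variant missing from a profile as frequency 0.0, and all names (`pearson`, `profile_correlation`, `best_match`, the site keys, and the cell line labels) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def profile_correlation(query, reference):
    """Correlate allele frequencies over the union of variant sites.

    Assumption: a variant absent from a profile has frequency 0.0.
    """
    sites = sorted(set(query) | set(reference))
    x = [query.get(site, 0.0) for site in sites]
    y = [reference.get(site, 0.0) for site in sites]
    return pearson(x, y)


def best_match(query, references):
    """Return (name, correlation) of the most correlated reference."""
    scores = {
        name: profile_correlation(query, profile)
        for name, profile in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical variant allele frequencies keyed by "chromosome:position".
references = {
    "LINE_X": {"chr1:100": 1.0, "chr2:200": 0.5, "chr3:300": 0.0, "chr4:400": 1.0},
    "LINE_Y": {"chr1:100": 0.0, "chr2:200": 0.0, "chr3:300": 1.0, "chr5:500": 0.5},
}
query = {"chr1:100": 0.95, "chr2:200": 0.45, "chr4:400": 1.0}

name, score = best_match(query, references)
print(name, round(score, 3))
```

A query sample is assigned to the reference whose variant profile it correlates with most strongly; in practice a minimum correlation threshold would also be needed so that a sample with no true match in the reference panel is not force-assigned.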
Emerging technologies that bring in new data types have become a constant phenomenon in the genomics era. The paper by Liu et al. [8] presented a computational method to detect DNA modification from single-molecule sequencing data generated by Nanopore technology. This method, named NanoMod, takes Nanopore sequencing data on paired DNA samples with and without modifications, and identifies modified bases by contrasting the signal distributions in the two samples and adjusting for local neighborhood effects. NanoMod showed improved performance in detecting DNA modification in simulation studies, and its ability to detect de novo modification without training data makes it particularly desirable.
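The idea of contrasting per-position signal distributions between paired samples can be sketched with a two-sample Kolmogorov-Smirnov (KS) statistic. The sketch below is an illustration under assumptions rather than the NanoMod implementation: the KS statistic is used here as one standard way to compare two distributions, the neighborhood adjustment is reduced to a plain moving average of the statistics, and the signal values and function names (`ks_statistic`, `windowed_scores`) are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every observation <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


def windowed_scores(stats, half_window=1):
    """Average each position's statistic with its immediate neighbours.

    Assumption: a crude stand-in for a neighbourhood adjustment.
    """
    smoothed = []
    for k in range(len(stats)):
        lo = max(0, k - half_window)
        hi = min(len(stats), k + half_window + 1)
        smoothed.append(sum(stats[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical per-position signal measurements for the paired samples.
modified = [
    [80.1, 79.8, 80.5, 80.0],
    [95.2, 96.0, 94.8, 95.5],
    [70.3, 69.9, 70.1, 70.6],
]
control = [
    [80.0, 80.2, 79.9, 80.4],
    [88.0, 87.5, 88.3, 87.9],
    [70.0, 70.4, 70.2, 69.8],
]

stats = [ks_statistic(m, c) for m, c in zip(modified, control)]
print(stats)
print(windowed_scores(stats))
```

Because the comparison is made directly between the two samples at each position, no model of what a modified signal looks like has to be trained beforehand, which is the property that makes this style of approach suitable for de novo detection.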
In addition to technology advancement, innovative analysis of existing datasets continues to bring discoveries. Pei et al. [9] investigated multi-trait associations using genome-wide association study (GWAS) datasets. Specifically, the authors applied pathway-based analysis of GWAS summary statistics with an analytical framework for systematic integration of cross-trait associations. They used 25 traits belonging to four phenotype groups. The pathway-based analysis provided increased power to estimate cross-trait associations compared to gene-level analysis. The study revealed that the risk variants for the 25 different traits aggregated in particular biological pathways and that these pathways were frequently shared among traits. The results confirmed known mechanisms and also suggested several novel insights into the etiology of multiple traits.
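One way to see why aggregating at the pathway level can add power over gene-level analysis is to combine modest gene-level p-values into a single pathway-level p-value. The sketch below uses Fisher's combined probability test as a stand-in; it is not the framework used by Pei et al., it assumes the gene-level p-values are independent (which linked genes may violate), and the p-values and the function name `fisher_combined_p` are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has the closed form
        exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0   # (X/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical gene-level p-values for five genes in one pathway.
gene_p_values = [0.04, 0.04, 0.04, 0.04, 0.04]
pathway_p = fisher_combined_p(gene_p_values)
print(f"smallest gene-level p = {min(gene_p_values)}")
print(f"combined pathway-level p = {pathway_p:.2e}")
```

Here no single gene would survive a stringent multiple-testing correction, yet the combined evidence across the pathway is considerably stronger than any individual gene-level result.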

Conference organization
2018 International Conference on Intelligent Biology and Medicine (ICIBM 2018), June 10–12, 2018, Los Angeles, California, USA
We would like to express our sincere gratitude to the members of the Steering, Program, Publication, Workshop/Tutorial, Publicity, Award, Trainee, and Local Organization Committees, as well as to all the reviewers, volunteers, and invited speakers, who spent their valuable time and effort on making ICIBM 2018 a success. The conference's accomplishments are the result of the support and hard work of all these people.

Hosts and sponsors
The International Association for Intelligent Biology and Medicine (IAIBM), The University of Texas Health Science Center at Houston (UTHealth), National Science Foundation, UTHealth Center for Precision Health, and UTHealth Data Science and Informatics Core for Cancer Research.
General Chair Zhongming Zhao (The University of Texas Health Science Center at Houston).
Steering Committee Yidong Chen (The University of Texas Health Science Center at San Antonio), Kun Huang