Application of Systems Biology and Bioinformatics Methods in Biochemistry and Biomedicine 2014

With the explosive growth of high-throughput omics data, it is highly desirable to develop effective computational methods and tools that can mine useful information to support the development of biochemistry, biomedicine, and drug design. Furthermore, systems biology approaches are applied to understand protein-protein, protein-DNA/RNA, and other complex interactions.
 
This collection covers diverse topics and presents many novel methods and intriguing findings.
 
Y. Jiang et al. compared the gene expressions among the colorectal cancer patients in different stages and obtained the early and late stage biomarkers. Then, these two kinds of biomarkers were both mapped onto the protein interaction network, and the signal propagation path from the early stage biomarker to the late one was identified. Their findings may provide useful insights for revealing the mechanism of colorectal cancer progression at the cellular systems biology level. 
 
L. N. Lili et al. investigated the process of stroma activation in human ovarian cancer by molecular analysis of matched sets of cancer and surrounding stroma tissues. They found that functionally significant variability exists among ovarian cancer patients in the ability of the microenvironment to modulate cancer development. 
 
B. Yang et al. constructed a network-based inference framework for identifying cancer genes from gene expression data. Six identified genes associated with breast cancer susceptibility (TSPYL5, CD55, CCNE2, DCK, BBC3, and MUC1) were verified through literature mining, GO analysis, and pathway functional enrichment analysis.
 
Lung cancer is one of the most malignant cancers. B. Q. Li et al. identified 25 NSCLC and 38 SCLC genes with the shortest path approach in PPI networks. These candidate genes included more known cancer genes and shared more functional similarity with cancer genes than the candidates identified from gene expression profiles alone.
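The shortest path idea can be illustrated with a minimal sketch. It assumes an unweighted, undirected PPI network stored as adjacency lists and treats every interior node on a shortest path between two known disease genes as a candidate. The toy network, the gene names, and the helper functions `shortest_path` and `candidate_genes` are hypothetical and are not taken from the paper, which may weight edges or filter candidates differently.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Return one shortest path from source to target via BFS, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def candidate_genes(graph, known_genes):
    """Collect interior nodes of shortest paths between known disease genes."""
    known = set(known_genes)
    candidates = set()
    for a, b in combinations(sorted(known), 2):
        path = shortest_path(graph, a, b)
        if path:
            candidates.update(n for n in path[1:-1] if n not in known)
    return candidates


# Hypothetical toy network; G* stand for known disease genes, X* for others.
toy_ppi = {
    "G1": ["X1"],
    "X1": ["G1", "G2", "X2"],
    "G2": ["X1", "X3"],
    "X2": ["X1"],
    "X3": ["G2", "G3"],
    "G3": ["X3"],
}
result = candidate_genes(toy_ppi, ["G1", "G2", "G3"])
print(sorted(result))
```

In this toy network X2 is never selected because it lies on no shortest path between known genes, which is the intuition behind using path membership as evidence.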
 
A. R. Iskandar et al. evaluated the perturbation of xenobiotic metabolism in response to cigarette smoke exposure in nasal and bronchial tissues. Their observations suggested that the effects of cigarette smoke exposure on the xenobiotic responses in the bronchial and nasal epithelium of smokers were similar to those observed in the respective organotypic models exposed to cigarette smoke, and that nasal tissue could be used as a reliable surrogate to measure the xenobiotic responses in bronchial tissue.
 
E. G. Maiorov et al. identified interconnected markers for T-cell acute lymphoblastic leukemia (T-ALL). Their identified genes may serve as biomarkers, alternative to the traditional ones used for the diagnosis of T-ALL, and help understand the pathogenesis of the disease. 
 
M. Kalita et al. used a multiplex gene expression profiling platform to investigate the perturbations of the innate pathways induced by TGF-β in a primary airway epithelial cell model of epithelial-mesenchymal transition (EMT). Their results indicated that epigenetic changes produced by EMT induce dynamic state changes of the innate signaling pathway.
 
C. Lu et al. studied the functions of microRNAs related to the liver regeneration of the whitespotted bamboo shark, Chiloscyllium plagiosum. Their work deepened the understanding of the mechanisms of liver regeneration and added a significant number of novel miRNA sequences to GenBank.
 
T. Alioto et al. presented a lightweight pipeline for first-pass gene prediction on newly sequenced genomes. The pipeline has two main components: ASPic, a program that derives highly accurate, albeit not necessarily complete, EST-based transcript annotations from EST alignments, and GeneID, a standard gene prediction program that the authors modified to take intron annotations as evidence. The pipeline was successfully tested on the entire C. elegans genome and the 44 ENCODE human pilot regions.
 
J. Zou et al. reviewed advanced systems biology methods in drug discovery and translational biomedicine. Their review provided a framework for addressing disease mechanism and approaching drug discovery. 
 
L. Chen et al. proposed a computational method to predict the side effects of drugs by integrating information on chemical-chemical and protein-chemical interactions. Unlike most previous studies, the proposed method can provide a ranked order of the side effects for any query drug.
 
K. Wang et al. proposed a method for predicting protein-ligand binding sites on the protein surface using SVM and a statistical depth function. The accuracy, sensitivity, and specificity are 77.55%, 56.15%, and 87.96%, respectively, on the training set and 80.36%, 53.53%, and 92.38%, respectively, on the independent test set.
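For readers less familiar with these three measures, the following minimal sketch shows how they are conventionally derived from a binary confusion matrix. The labels are invented for illustration and are not data from the paper, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Fraction of true binding sites that were recovered.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Fraction of non-binding sites that were correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Invented labels: 1 = binding site, 0 = non-binding site.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because accuracy pools both classes, it falls between sensitivity and specificity, as it does in the figures reported above.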
 
K. K. Tseng et al. presented a new system and novel approaches to classify different kinds of sperm images in order to assess their health. In their evaluation, the method reached an accuracy of 87.5% and outperformed the existing approaches to sperm classification.
 
A rapid method is required to mitigate the complexity and computational challenges of high-throughput protein identification. In "Method for Rapid Protein Identification in a Large Database," W. Zhang et al. present an accelerated open method that satisfies this requirement to some extent.
 
Q. Zou et al. proposed a novel method for distinguishing cytokines from other proteins. Identifying cytokines in silico is of vital importance. An ensemble classification strategy was employed to improve the prediction performance, and a user-friendly prediction web server was also developed.
 
Du and Yu introduced a novel method, SubMito-PSPCP, which embeds the PSSM into the pseudoamino acid compositions, to predict protein submitochondrial locations. 
 
T. Gu et al. applied support vector regression and a two-stage feature selection to develop a computational model that maps DPP-IV inhibitors to their activity. They also developed an online server.
 
Based on nonlinear mapping and the Coulomb function, X. Liu et al. applied a 3D kernel approach to predict the four protein tertiary structural classes and five membrane protein types with satisfactory results. It has not escaped our notice that kernel approaches may hold high potential for predicting other protein features.
 
T. H. Zhao et al. proposed a new method to predict protein disordered regions based on sequence features. The accuracy and MCC (Matthews correlation coefficient) of their method are higher than those of three popular disordered region predictors: DISOPRED, DISOclust, and OnD-CRF.
 
M. S. M. Ali et al. studied the structure and function of LipA8, which is able to adapt to extreme temperatures. Simulations show that it is most stable at 0°C and 5°C. At extreme temperatures, the catalytic domain (N-terminus) maintained its stability better than the noncatalytic domain (C-terminus), whereas the noncatalytic domain showed higher flexibility than the catalytic domain.
 
A Boolean network (BN) is widely used as a model of gene regulatory networks. K. Kobayashi et al. proposed a BN model with two types of control inputs and an optimal control method that accounts for the duration of drug effectiveness. The optimal control problem is reduced to an integer programming problem.
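A toy example may help convey what optimal control of a BN means. The sketch below assumes a hypothetical three-gene network with a single binary control input `u` and finds the cheapest control sequence by exhaustive enumeration. This is a deliberate simplification: it does not use integer programming, and it models neither the two types of control inputs nor the duration of drug effectiveness described in the paper. The update rules and function names are invented for illustration.

```python
from itertools import product


def step(state, u):
    """One synchronous update of a hypothetical three-gene Boolean network."""
    x1, x2, x3 = state
    return (
        int(x3 and not u),  # x1 follows x3 unless the control suppresses it
        x1,                 # x2 copies x1
        int(x1 or x2),      # x3 is active if x1 or x2 is active
    )


def simulate(state, controls):
    """Apply a sequence of control inputs and return the final state."""
    for u in controls:
        state = step(state, u)
    return state


def cheapest_control(start, target, horizon):
    """Enumerate all 0/1 control sequences and keep the one that reaches
    the target state with the fewest active control steps."""
    best = None
    for controls in product((0, 1), repeat=horizon):
        if simulate(start, controls) == target:
            cost = sum(controls)
            if best is None or cost < best[1]:
                best = (controls, cost)
    return best


# Without control, (1, 1, 1) is a fixed point of this toy network.
plan = cheapest_control(start=(1, 1, 1), target=(0, 0, 0), horizon=3)
print(plan)
```

Enumeration grows as 2^horizon, which is why a reduction to integer programming, as in the paper, is attractive for larger networks and longer horizons.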
 
J. Zhang et al. studied microRNA-mediated regulation in biological systems with oscillatory behavior. They started with two specific microRNA-mediated regulatory circuits that show the fine-tuning roles of microRNAs in the modulation of periodic behavior and then applied these results to study the effects of miR369-3 regulation of the cell cycle.
 
B. Yan et al. developed a mathematical model to study the mechanisms underlying the size checkpoint in fission yeast. They found that when the spatiotemporal regulation is coupled to the positive feedback loops, the mitosis-promoting factor (MPF) exhibits a bistable steady-state relationship with the cell size. The switch-like response from the positive feedback loops naturally generates the cell size checkpoint. 
 
Detection of potential siRNA off-targets is crucial for High Content Screening (HCS) using small interfering RNAs (siRNAs). S. Das et al. performed a detailed off-target analysis of the three most commonly used kinome siRNA libraries based on the latest RefSeq version and created the SeedSeq database, a new unique format for storing off-target information.
 
L. Zhu et al. systematically investigated the characteristics and evolutionary pattern of the actin gene family in primates. Phylogenetic analysis of 233 actin genes in the human, chimpanzee, gorilla, orangutan, gibbon, rhesus monkey, and marmoset genomes showed that actin genes in the seven species could be divided into two major types of clades: orthologous group versus complex group. Codon usages and gene expression patterns of actin gene copies were highly consistent among the groups because of basic functions needed by the organisms but diverged considerably within species due to functional diversification.
 
J. Ping et al. performed long time-scale molecular dynamics simulations on both the open and closed states of Escherichia coli adenylate kinase (ADK), on the basis of which a conformational selection mechanism was proposed to explain the large-scale domain motion of this enzyme.
 
 
Yudong Cai 
 
Tao Huang 
 
Lei Chen 
 
Bing Niu

In biochemistry and biomedicine, new technologies are being developed at an increasing pace, and the high-throughput data they generate need to be analyzed with more powerful systems biology and bioinformatics methods.
This special issue introduces novel systems biology and bioinformatics methods developed in 2014 and their applications in biochemistry and biomedicine.
Ł. Palkowski et al. presented a SAR study of a novel homologous series of bis-quaternary imidazolium chlorides. With the help of the dominance-based rough set approach (DRSA), some relevant features were extracted, which were deemed to be highly related to antifungal activity of the compounds.
H. Liu et al. developed a new method to predict HIV-1 protease cleavage sites. They used two feature fusion methods, combination fusion and decision fusion, and improved the prediction performance. Their results and analysis provide useful guidance for designing HIV-1 protease inhibitors in the future.
G. P. Monterrubio-López et al. identified novel potential vaccine candidates against tuberculosis based on reverse vaccinology. For each candidate, both a comprehensive literature survey and bioinformatics analysis, such as simulation of the immune response, were conducted. Finally, six novel vaccine candidates, EsxL, PE26, PPE65, PE_PGRS49, PBP1, and Erp, were considered to be useful for tuberculosis vaccine design.
M. Aqil et al. analyzed the transcriptome of human monocytic cells expressing the HIV-1 Nef protein and their exosomes. They identified four key mRNAs, MECP2, HMOX1, AARSD1, and ATF2, which are important for chromatin modification and gene expression. They also identified three key mRNAs, AATK, SLC27A1, and CDKAL, which are important in apoptosis and fatty acid transport.
Y. Liu et al. developed a computational method to predict protein glycation sites by using the support vector machine classifier, maximum relevance minimum redundancy (mRMR), and incremental feature selection (IFS) method. Their prediction accuracy was 85.51% and MCC (Matthews correlation coefficient) was 0.70. They found that the composition of k-spaced amino acid pairs feature contributed the most to glycation site prediction.
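The composition of k-spaced amino acid pairs can be sketched as follows. This minimal version assumes one common definition: for a given spacing k, count every pair of residues separated by exactly k other residues and normalize by the number of such pairs in the peptide. The exact encoding used by the authors, such as the window length, the range of k, or the treatment of gap symbols, may differ. The function name `cksaap` and the example peptide are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(sequence, k):
    """Return the 400-dimensional frequency vector of residue pairs
    separated by exactly k intervening residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    n_windows = len(sequence) - k - 1
    if n_windows <= 0:
        raise ValueError("sequence too short for this spacing")
    for i in range(n_windows):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # silently skip non-standard residues
            counts[pair] += 1
    return [counts[p] / n_windows for p in pairs]


# Hypothetical peptide; with k=1 the pairs are A.A, K.K, and A.A.
features = cksaap("AKAKA", k=1)
print(len(features), features[0])
```

Concatenating the vectors for several values of k gives a high-dimensional representation, which is where feature selection steps such as mRMR and IFS become useful.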
Z. Xu et al. proposed a systems biology approach to quantify biofilm formation of P. aeruginosa in response to changes in the availability of amino acids, ferrous ions, sulfate, and phosphate in the surrounding environment. Some biofilm formation patterns were discovered, which can be validated by existing experimental data.
F. Wang et al. presented a distribution-based approach for gene pair classification by identifying a disease-specific cutoff point that classified the coexpressed gene pairs into strong and weak coexpression structures. They applied their method to analyze the NPM1-associated genes in chronic myelogenous leukemia (CML) and found that genes involved in the ribosomal synthesis and translation process tended to be coexpressed in the CML group.
Y. Cui et al. collected the biochemical examination and tongue image data from 46 case subjects with hyperuricemia and 46 control subjects. Based on the symmetrical Haar-like features which were extracted from tongue images, they built [...]
[...] Chang investigated the effects of electrode geometry in microfluidic devices on the impedance of a single HeLa cell. Their simulations indicated that the circle and parallel electrodes provide higher electric field strength compared to cross and standard electrodes at the same operating voltage. Increasing the operating voltage reduces the impedance magnitude of a single HeLa cell in all electrode shapes, and decreasing the impedance magnitude of the single HeLa cell increases measurement sensitivity.
M. Deng et al. applied gas chromatography-mass spectrometry (GC-MS) in combination with multivariate statistical analysis to explore the metabolic variability in the urine of chronically hydrogen sulfide (H2S)-poisoned rats relative to control rats. Their technique can be employed to decipher the mechanism of chronic H2S poisoning and promote the use of metabolomics in clinical toxicology.
W. Hu et al. evaluated the accuracy of a novel computer-simulated biopsy marking system (CSBMS) developed for endoscopic marking of gastric lesions. Twenty-five patients with a history of gastric intestinal metaplasia received both CSBMS-guided marking and India ink injection at five points in the stomach at index endoscopy. The mean accuracy of CSBMS was 5.3 ± 2.2 mm at the angularis, 5.7 ± 1.4 mm at the antral lesser curvature, 6.1 ± 1.1 mm at the antral greater curvature, 6.9 ± 1.6 mm at the antral anterior wall, and 6.9 ± 1.6 mm at the antral posterior wall. Their results suggested that, on follow-up endoscopy, the CSBMS can identify gastric sites previously marked by endoscopic tattooing to within 1 cm.
G. Huang et al. developed a method to predict S-nitrosylation modification sites based on kernel sparse representation classification and the mRMR algorithm. Their predictor achieves Matthews correlation coefficients of 0.1634 and 0.2919 for the training set and the testing set, respectively, which are better than those of the k-nearest neighbor algorithm, random forest algorithm, and sparse representation classification algorithm. A web server for the prediction of S-nitrosylation sites based on kernel sparse representation classification and the minimum redundancy maximum relevance algorithm is available at http://www.zhni.net/snopred/index.html.

Yudong Cai
Tao Huang
Lei Chen
Bing Niu