BLAT2DOLite: An Online System for Identifying Significant Relationships between Genetic Sequences and Diseases

Liang Cheng; Shuo Zhang; Yang Hu

doi:10.1371/journal.pone.0157274

Abstract

Diseases that are significantly related to a sequence can play an important role in understanding the function of that sequence. In this paper, we introduce BLAT2DOLite, an online system for annotating human genes and diseases and for identifying significant relationships between sequences and diseases. Currently, BLAT2DOLite integrates the Entrez Gene database and Disease Ontology Lite (DOLite), which provide the loci of genes and the relationships between genes and diseases. It uses the hypergeometric test to calculate P-values between genes and the diseases of DOLite. The system can be accessed from: http://123.59.132.21:8080/BLAT2DOLite. The corresponding web service is described in: http://123.59.132.21:8080/BLAT2DOLite/BLAT2DOLiteIDMappingPort?wsdl.

Citation: Cheng L, Zhang S, Hu Y (2016) BLAT2DOLite: An Online System for Identifying Significant Relationships between Genetic Sequences and Diseases. PLoS ONE 11(6): e0157274. https://doi.org/10.1371/journal.pone.0157274

Editor: Quan Zou, Tianjin University, CHINA

Received: April 4, 2016; Accepted: May 26, 2016; Published: June 17, 2016

Copyright: © 2016 Cheng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: 1) http://123.59.132.21:8080/BLAT2DOLite/downloads.jsp 2) https://figshare.com/articles/DO_DOLite_Entrez_Gene_database_hg19_BLAT/3420943

Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 61502125) and Heilongjiang Postdoctoral Fund (NO: LBH-Z15179). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Identifying the diseases significantly related to genes has drawn increasing attention in the interpretation of molecular functions [1–13]. For example, by exploiting the significant relationships between diseases and the genes altered by promyelocytic leukemia protein (PML), based on microarray analysis, Anida et al. identified the role of PML in diseases other than cancers [1]. Jiny et al. exploited the overlap between disease-related genes and inflammatory genes to explore the core transcriptional regulators of inflammatory genes in coronary artery disease [2].

Enrichment analysis is an effective method for identifying significant relationships between diseases and genes. It first requires a disease vocabulary and a data set of associations between diseases and genes. Many databases are suitable for this purpose, among which Online Mendelian Inheritance in Man (OMIM) [14] and Gene References Into Function (GeneRIF) [15] have been the most commonly used. OMIM is a database of genetic disorders and the genes that cause them. In contrast, GeneRIF is more comprehensive; it was initiated by the National Library of Medicine (NLM) to link published data to Entrez Gene entries. Each GeneRIF consists of an Entrez Gene ID, a short text (under 255 characters), and the PubMed identifier (PMID) of the publication that provides evidence for the assertion in that text. Gene-disease relationships were then extracted from the GeneRIF database [16] using the Unified Medical Language System (UMLS) [17] MetaMap Transfer tool (MMTx) [18], with disease terms filtered by Disease Ontology (DO) [19]. Because a simplified vocabulary, obtained by combining and removing fine-grained terms, can help to give an integrated overview of molecular and cellular biology [20,21], a simplified vocabulary derived from DO, called Disease Ontology Lite (DOLite) [22], was constructed for enrichment analysis.

Many tools have been developed to ease access to the significant relationships between diseases and genes, such as DAVID [23], FunDO [22], DOSE [24], DOSim [25], and GeneAnswers [26]. DAVID was an early bioinformatics analytic tool for systematically extracting biological meaning from large gene/protein lists. In contrast, FunDO, DOSE, DOSim, and GeneAnswers can be used to study the significant relationships between diseases and genes. Although these tools can analyse gene symbols or gene IDs, none of the five can process sequence data. With the development of next-generation sequencing technology, a large amount of sequence data has been produced. Meanwhile, sequence alignment tools have been developed to identify the loci of sequences [27,28]. Analysing the relationship between sequence data and diseases is therefore a critical challenge.

In this paper, we present an online tool, BLAT2DOLite, to annotate human genes and diseases and to identify the diseases significantly related to sequences. In BLAT2DOLite, sequences are first mapped to their loci by BLAT, and these loci are then mapped to genes. Based on the associations between the diseases of DOLite and genes, the hypergeometric test is used to calculate the significance of the relationships between sequences and diseases. The system can be accessed from: http://123.59.132.21:8080/BLAT2DOLite. To make it easy to invoke the functions of BLAT2DOLite locally, a web service is also provided, which is described in: http://123.59.132.21:8080/BLAT2DOLite/BLAT2DOLiteIDMappingPort?wsdl.

Materials and Methods

Data Collection

The data sets of BLAT2DOLite come from open-source databases, all of which are listed in Table 1. For example, disease terms and the relationships between these diseases and genes are from DOLite [22]. Currently, DOLite contains 15,016 associations between 560 diseases and 3,966 genes. In addition, the human reference genome (hg19) [29] was obtained from the UCSC Genome Browser [30]. To retrieve mappings from loci to genes, the Entrez Gene database [31] was integrated into our system.


Table 1. Data sources and tools used for identifying significant relationships between sequences and diseases.

https://doi.org/10.1371/journal.pone.0157274.t001

The Process of BLAT2DOLite

With our system, the diseases significantly related to sequences can be identified; the process is shown in Fig 1 and described as follows.


Fig 1. The process of BLAT2DOLite.

https://doi.org/10.1371/journal.pone.0157274.g001

Step 1: Mapping sequence to locus.

Sequences are mapped to the human reference genome (hg19) by BLAT [32], an open-source software tool for finding the loci of sequences. After mapping by BLAT, the location with the longest sequence mapping is selected.
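The selection rule in Step 1 can be illustrated with a minimal Python sketch. It assumes that "longest sequence mapping" means the hit spanning the longest genomic region, and that ties keep the first hit seen; neither detail is stated in the text. The `Hit` record and its field names are hypothetical and are not the BLAT output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment of a query sequence to the genome (hypothetical record)."""
    query: str   # sequence identifier
    chrom: str   # chromosome name, e.g. "chr12"
    start: int   # start position of the aligned region
    end: int     # end position of the aligned region

    @property
    def length(self) -> int:
        return self.end - self.start


def best_hit_per_query(hits):
    """Keep, for each query, the hit covering the longest region.

    Ties keep the first hit encountered (an assumption).
    """
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.length > current.length:
            best[hit.query] = hit
    return best


# Illustrative hits; only the chr12 coordinates are taken from the case study.
hits = [
    Hit("seq1", "chr12", 6898637, 6929976),
    Hit("seq1", "chr3", 1000, 1500),
    Hit("seq2", "chr1", 200, 900),
]
best = best_hit_per_query(hits)
```

Here `best["seq1"]` is the chr12 hit, because it spans a longer region than the chr3 hit.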

Step 2: Annotating locus, gene symbol, or gene ID with diseases.

Sequences from the previous step are related to genes based on their loci. Two types of relevance are used for annotation: 1) Contain: the locus of the gene lies within the locus of the sequence, or the locus of the sequence lies within the locus of the gene; 2) Intersect: the locus of the gene partly overlaps the locus of the sequence. Then, based on the relationships between genes and diseases in DOLite, sequences are annotated with human diseases.
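The two relevance types can be expressed as a small interval test. The following Python sketch assumes closed intervals on the same chromosome; the function name `relation` and the return value `None` for non-overlapping loci are illustrative choices, not part of the system.

```python
def relation(seq_start, seq_end, gene_start, gene_end):
    """Classify a sequence locus against a gene locus.

    Both loci are closed intervals on the same chromosome (an assumption).
    Returns "Contain", "Intersect", or None when they share no position.
    """
    # No shared position at all
    if seq_end < gene_start or gene_end < seq_start:
        return None
    seq_in_gene = gene_start <= seq_start and seq_end <= gene_end
    gene_in_seq = seq_start <= gene_start and gene_end <= seq_end
    if seq_in_gene or gene_in_seq:
        return "Contain"
    # Overlapping, but neither interval lies inside the other
    return "Intersect"
```

For instance, a sequence at 10–20 and a gene at 5–30 give "Contain", while a gene at 15–30 gives "Intersect".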

Method for analyzing the significant relationship between sequences and diseases.

Here, the hypergeometric test was used to analyze the significance of the relationship between sequences and diseases. The P-value is calculated as follows:

$$P = 1 - \sum_{i=0}^{x-1} \frac{\binom{M}{i}\binom{N-M}{k-i}}{\binom{N}{k}} \qquad (1)$$

Taking breast cancer as an example, N is the number of genes related to any disease, M is the number of genes related to breast cancer, k is the number of genes related to the sequences, and x is the number of genes related to both the sequences and breast cancer.
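As an illustration of how N, M, k, and x enter the test, the following Python sketch computes the upper-tail hypergeometric probability P(X ≥ x), the form usually used in enrichment analysis. That the system uses exactly this tail is an assumption, and the function name `hypergeom_pvalue` is hypothetical.

```python
from math import comb


def hypergeom_pvalue(N, M, k, x):
    """Probability of seeing at least x disease genes among k drawn genes.

    N: genes related to any disease (population size)
    M: genes related to the disease of interest
    k: genes related to the submitted sequences
    x: genes shared by the sequences and the disease
    """
    total = comb(N, k)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    tail = sum(comb(M, i) * comb(N - M, k - i)
               for i in range(x, min(k, M) + 1))
    return tail / total
```

With x = 0 the function returns 1.0, since observing at least zero shared genes is certain; larger overlaps x give smaller P-values.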

Implementation

BLAT2DOLite has been implemented on a JavaEE framework and runs on a web server (2-core, 2.26 GHz processors) of UCloud [33]. The four-layer architecture, comprising the DATABASE, ALGORITHM, TOOL, and VIEW layers, is shown in Fig 2 and described in detail below.


Fig 2. System overview of BLAT2DOLite.

https://doi.org/10.1371/journal.pone.0157274.g002

DATABASE layer. This layer stores the loci of genes, disease terms, and associations between human genes and diseases. These data are used by the ALGORITHM and TOOL layers for annotating human genes and diseases and for identifying the significant relationships between human diseases and sequences.
ALGORITHM layer. Hypergeometric analysis is implemented for calculating the significance of the relationships between diseases and sequences.
TOOL layer. The system provides two functions: annotating human genes and diseases, and identifying the significant relationships between sequences and diseases. These functions can also be accessed through our web service [34].
VIEW layer. Web pages are provided for viewing all the results produced by the TOOL layer. For example, the relationships between human diseases and genes can be shown, and the significant relationships between sequences and diseases can be obtained. In addition, the interface specification of our web service can be accessed from the web.

Results

The system can be used for annotating human genes and diseases and for identifying the significant relationships between sequences and diseases. Access to these two functions is described below.

A case for annotating human genes and diseases

Human genes and diseases can be annotated from the web page (http://123.59.132.21:8080/BLAT2DOLite/geneid2diseasename.jsp); an example is shown in Fig 3.


Fig 3. Schematic workflow of annotating human genes and diseases.

https://doi.org/10.1371/journal.pone.0157274.g003

As the figure shows, the system returns diseases after an Entrez Gene ID is submitted. In this case, the input gene ID was ‘28’, and the diseases that could be affected by this gene, such as bladder cancer and squamous cell cancer, were listed on the result page. Similarly, the system returns Entrez Gene IDs after a disease term is submitted. In this case, the input disease term was ‘Abortion’, and the IDs of genes that could induce this disease, such as ‘52’ and ‘153’, were listed on the result page.
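The two lookup directions in this case can be sketched as a pair of indexes built from gene-disease association pairs. The Python sketch below is illustrative only: the function name `build_indexes` is hypothetical, and the four pairs are the ones named in the case above, not the full DOLite data.

```python
from collections import defaultdict


def build_indexes(associations):
    """Build gene -> diseases and disease -> genes lookups from pairs."""
    gene_to_diseases = defaultdict(set)
    disease_to_genes = defaultdict(set)
    for gene_id, disease in associations:
        gene_to_diseases[gene_id].add(disease)
        disease_to_genes[disease].add(gene_id)
    return gene_to_diseases, disease_to_genes


# Pairs mentioned in the case; the real association table is much larger.
associations = [
    ("28", "bladder cancer"),
    ("28", "squamous cell cancer"),
    ("52", "Abortion"),
    ("153", "Abortion"),
]
gene_to_diseases, disease_to_genes = build_indexes(associations)
```

Querying `gene_to_diseases["28"]` then returns the diseases of gene ‘28’, and `disease_to_genes["Abortion"]` returns the gene IDs ‘52’ and ‘153’.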

A case for identifying the significant relationships between sequences and diseases

The diseases significantly related to sequences can be identified from the web page (http://123.59.132.21:8080/BLAT2DOLite/sequence.jsp); an example is shown in Fig 4.


Fig 4. Schematic workflow of identifying significant relationships between sequences and diseases.

https://doi.org/10.1371/journal.pone.0157274.g004

The system accepts DNA sequences in FASTA format, in which nucleotides are represented by single-letter codes, as input. This format originates from the FASTA software package [35] and has since become a standard in bioinformatics.
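For readers unfamiliar with the format, a minimal FASTA reader can be sketched in Python as follows. It assumes that the identifier is the first whitespace-delimited token after ">" and that sequence lines may wrap across several lines; the function name `parse_fasta` and the example records are hypothetical, and this is not the parser used by the system.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping identifier -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: identifier is the first token after '>'
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            # Sequence line: may be one of several wrapped lines
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


example = """>seq1 example record
ACGTAC
GTTA
>seq2
ttgaca
"""
sequences = parse_fasta(example)
```

The wrapped lines of `seq1` are joined into one sequence, and lower-case nucleotides are normalized to upper case.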

The analysis follows the schematic workflow of BLAT2DOLite in Fig 1. First, sequences are mapped to loci in hg19, and this mapping is returned on the result page. Next, the loci of these sequences are mapped to Entrez Gene IDs based on the integrated Entrez Gene database. The corresponding associations between the loci of the sequences and the loci of genes are also shown on the result page. Then, the mapped gene IDs are annotated with diseases by BLAT2DOLite. The annotation result is not shown on this result page, because the annotation function is already provided on the annotation page. Finally, the hypergeometric test is used to calculate P-values between the mapped genes and each disease of DOLite. Diseases with a P-value less than 0.05 are shown on the result page.
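The final step of this workflow, from mapped gene IDs to a list of diseases with P-value below 0.05, can be sketched in Python as follows. Several points are assumptions made for illustration: the background is taken as all genes associated with any disease, mapped genes outside that background are ignored, diseases sharing no gene with the query are skipped, and the upper-tail probability is used. The function name `enrich` and the toy data are hypothetical.

```python
from math import comb


def enrich(mapped_genes, disease_to_genes, alpha=0.05):
    """Return (disease, P-value) pairs with P < alpha, smallest P first.

    disease_to_genes maps each disease name to a set of gene IDs.
    """
    background = set().union(*disease_to_genes.values())  # N genes
    query = set(mapped_genes) & background                # k genes
    N, k = len(background), len(query)
    results = []
    for disease, genes in disease_to_genes.items():
        M = len(genes)
        x = len(query & genes)
        if x == 0:
            continue  # no shared gene, nothing to test
        # Upper-tail hypergeometric probability P(X >= x)
        tail = sum(comb(M, i) * comb(N - M, k - i)
                   for i in range(x, min(k, M) + 1))
        p = tail / comb(N, k)
        if p < alpha:
            results.append((disease, p))
    return sorted(results, key=lambda item: item[1])


# Toy data: 20 background genes split between two hypothetical diseases.
toy = {
    "disease A": {f"g{i}" for i in range(0, 4)},
    "disease B": {f"g{i}" for i in range(4, 20)},
}
significant = enrich(["g0", "g1", "g2", "g3"], toy)
```

In this toy case all four query genes belong to "disease A", so it is the only disease that passes the 0.05 threshold.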

In the case shown in Fig 4, the sequences on the web page were used as input, and a result page containing the ‘Sequence-Locus Mapping’, ‘Locus-Gene ID Mapping’, and ‘Disease P-value’ sections was returned. In the ‘Sequence-Locus Mapping’ section, the identifiers of the mapped sequences are shown in the first column of the table, and the mapped chromosome, start position, and end position of each sequence are listed in the next three columns. For example, sequence gi|224589803:6898638–6929976 was mapped to the locus from 6898637 to 6929976 on chromosome 12. The ‘Locus-Gene ID Mapping’ section gives the relationships between the loci of sequences and Entrez Gene IDs. For example, in the first line of the result table of this section, the locus of gi|224589803:6898638–6929976 was mapped to Entrez Gene ‘920’. In the ‘Disease P-value’ section, the diseases significantly related to these sequences are listed in descending order of significance. In this case, diabetes mellitus was identified as the most significant disease for these sequences, so it was listed at the top of the corresponding result table.

Web service of BLAT2DOLite

All the functions of our system were implemented as a web service through the Java API for XML Web Services (JAX-WS). A detailed description of the web service can be accessed at: http://123.59.132.21:8080/BLAT2DOLite/BLAT2DOLiteIDMappingPort?wsdl. Using the interface of the web service, users can easily invoke the functions of BLAT2DOLite locally.
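A client would normally start by reading the WSDL to discover which operations the service offers. The Python sketch below lists the operation names declared in a WSDL 1.1 document using only the standard library. The operations of the BLAT2DOLite service are not stated in this paper, so the sample document and its operation names (`lookupGene`, `lookupDisease`) are invented placeholders, and `list_operations` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 documents
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"


def list_operations(wsdl_text):
    """Return the operation names declared in the portType elements."""
    root = ET.fromstring(wsdl_text)
    names = []
    for port_type in root.iter(f"{{{WSDL_NS}}}portType"):
        for op in port_type.findall(f"{{{WSDL_NS}}}operation"):
            names.append(op.get("name"))
    return names


# Placeholder WSDL; NOT the actual BLAT2DOLite interface.
sample_wsdl = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Demo">
  <portType name="DemoPort">
    <operation name="lookupGene"/>
    <operation name="lookupDisease"/>
  </portType>
</definitions>"""
operations = list_operations(sample_wsdl)
```

The same function could be applied to the text downloaded from the WSDL address above, provided the server is reachable.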

Conclusion

In this paper, we presented an online system for annotating human genes and diseases and for identifying the significant relationships between sequences and diseases. To identify these relationships, BLAT and the Entrez Gene database were integrated to map sequences to Entrez Gene IDs. The associations between human genes and the diseases of DOLite were then used to calculate the significance of the relationships between sequences and diseases. Furthermore, a web service was provided to make it easy to invoke the functions of BLAT2DOLite locally.

Author Contributions

Conceived and designed the experiments: LC YH. Performed the experiments: LC SZ YH. Analyzed the data: LC SZ YH. Contributed reagents/materials/analysis tools: LC. Wrote the paper: LC.

References

1. Sarajlić A, Janjić V, Stojković N, Radak D, Pržulj N (2013) Network topology reveals key cardiovascular disease genes. PloS one 8: e71537. pmid:23977067
2. Nair J, Ghatge M, Kakkar VV, Shanker J (2014) Network analysis of inflammatory genes and their transcriptional regulators in coronary artery disease. PloS one 9: e94328. pmid:24736319
3. Cheng X, Kao H-Y (2012) Microarray analysis revealing common and distinct functions of promyelocytic leukemia protein (PML) and tumor necrosis factor alpha (TNF α) signaling in endothelial cells. BMC genomics 13: 1.
4. Xiang Y, Payne PR, Huang K (2012) Transactional database transformation and its application in prioritizing human disease genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 9: 294–304.
5. Shashni B, Sakharkar KR, Nagasaki Y, Sakharkar MK (2013) Glycolytic enzymes PGK1 and PKM2 as novel transcriptional targets of PPARγ in breast cancer pathophysiology. Journal of drug targeting 21: 161–174. pmid:23130662
6. Danilov A, Shaposhnikov M, Plyusnina E, Kogan V, Fedichev P, Moskalev A. (2013) Selective anticancer agents suppress aging in Drosophila. Oncotarget 4: 1507–1526. pmid:24096697
7. Janjić V, Pržulj N (2012) The core diseasome. Molecular Biosystems 8: 2614–2625. pmid:22820726
8. Sullivan J, Karra K, Moxon SA, Vallejos A, Motenko H, Wong J, et al. (2013) InterMOD: integrated data and tools for the unification of model organism research. Scientific reports 3.
9. Zhao M, Sun J, Zhao Z (2013) TSGene: a web resource for tumor suppressor genes. Nucleic acids research 41: D970–D976. pmid:23066107
10. M Vazquez-Naya J, Martinez-Romero M, B Porto-Pazos A, Novoa F, Valladares-Ayerbes M, Pereira J, et al. (2010) Ontologies of drug discovery and design for neurology, cardiology and oncology. Current pharmaceutical design 16: 2724–2736. pmid:20642429
11. Liu Y, Zeng X, He Z, Zou Q (2016) Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform.
12. Zou Q, Li J, Hong Q, Lin Z, Wu Y, Shi H, et al. (2015) Prediction of microRNA-disease associations based on social network analysis methods. BioMed research international 2015: 810514. pmid:26273645
13. Zeng X, Liao Y, Zou Q (2016) Prediction and validation of disease genes using HeteSim Scores.
14. Amberger JS, Bocchini CA, Schiettecatte F, Scott AF, Hamosh A (2015) OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic acids research 43: D789–D798. pmid:25428349
15. Lu Z, Cohen KB, Hunter L. GeneRIF quality assurance as summary revision; 2007. NIH Public Access. pp. 269.
16. Osborne JD, Flatow J, Holko M, Lin SM, Kibbe WA, Zhu LJ, et al. (2009) Annotating the human genome with Disease Ontology. BMC genomics 10: S6.
17. Lindberg DA, Humphreys BL, McCray AT (1993) The Unified Medical Language System. Methods of information in medicine 32: 281–291. pmid:8412823
18. Meystre S, Haug PJ (2005) Evaluation of medical problem extraction from electronic clinical documents using MetaMap Transfer (MMTx). Studies in health technology and informatics 116: 823–828. pmid:16160360
19. Schriml LM, Arze C, Nadendla S, Chang Y-WW, Mazaitis M, Felix V, et al. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic acids research 40: D940–D946. pmid:22080554
20. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, et al. (2000) The genome sequence of Drosophila melanogaster. Science 287: 2185–2195. pmid:10731132
21. Shah N, Fedoroff NV (2004) CLENCH: a program for calculating Cluster ENriCHment using the Gene Ontology. Bioinformatics 20: 1196–1197. pmid:14764555
22. Du P, Feng G, Flatow J, Song J, Holko M, Kibbe WA, et al. (2009) From disease ontology to disease-ontology lite: statistical methods to adapt a general-purpose ontology for the test of gene-ontology associations. Bioinformatics 25: i63–68. pmid:19478018
23. Huang DW, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols 4: 44–57. pmid:19131956
24. Yu G, Wang L-G, Yan G-R, He Q-Y (2015) DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31: 608–609. pmid:25677125
25. Li J, Gong B, Chen X, Liu T, Wu C, Zhang F, et al. (2011) DOSim: An R package for similarity between diseases based on Disease Ontology. BMC bioinformatics 12: 1.
26. Feng G, Shaw P, Rosen ST, Lin SM, Kibbe WA (2012) Using the bioconductor GeneAnswers package to interpret gene lists. Next Generation Microarray Bioinformatics: Methods and Protocols: 101–112.
27. McGinnis S, Madden TL (2004) BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic acids research 32: W20–W25. pmid:15215342
28. Zou Q, Hu Q, Guo M, Wang G (2015) HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy. Bioinformatics 31: 2475–2481. pmid:25812743
29. Bioinformatics UG (2011) GRCh37/hg19 assembly.
30. Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, et al. (2013) The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res 41: D64–69. pmid:23155063
31. Maglott D, Ostell J, Pruitt KD, Tatusova T (2011) Entrez Gene: gene-centered information at NCBI. Nucleic acids research 39: D52–D57. pmid:21115458
32. Kent WJ (2002) BLAT—the BLAST-like alignment tool. Genome research 12: 656–664. pmid:11932250
33. Sqalli MH, Al-Saeedi M, Binbeshr F, Siddiqui M. UCloud: A simulated Hybrid Cloud for a university environment; 2012. IEEE. pp. 170–172.
34. Vaughan-Nichols SJ (2002) Web services: Beyond the hype. Computer: 18–21.
35. Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85: 2444–2448. pmid:3162770

