GeneMANIA: Fast gene network construction and function prediction for Cytoscape

Jason Montojo; Khalid Zuberi; Harold Rodriguez; Gary D. Bader; Quaid Morris

doi:10.12688/f1000research.4572.1


Software Tool Article


[version 1; peer review: 2 approved]

Jason Montojo¹, Khalid Zuberi¹, Harold Rodriguez¹, Gary D. Bader¹, Quaid Morris¹


PUBLISHED 01 Jul 2014


¹ Departments of Molecular Genetics and Computer Science, The Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada


This article is included in the Bioinformatics gateway.

This article is included in the Cytoscape gateway.

Abstract

The GeneMANIA Cytoscape app enables users to construct a composite gene-gene functional interaction network from a gene list. The resulting network includes the genes most related to the original list, and functional annotations from Gene Ontology. The edges are annotated with details about the publication or data source the interactions were derived from. The app leverages GeneMANIA’s database of 1800+ networks, containing over 500 million interactions spanning 8 organisms: A. thaliana, C. elegans, D. melanogaster, D. rerio, H. sapiens, M. musculus, R. norvegicus, and S. cerevisiae. Users may also import their own organisms, networks, and expression profiles. The app is compatible with Cytoscape versions 2 and 3.

Corresponding author: Jason Montojo

Competing interests: No competing interests were disclosed.

Grant information: This work was funded by the Ontario Government’s Ministry of Research and Innovation via the Global Leadership Round in Genomics & Life Sciences (GL2) program (assigned to GDB and QM); and the National Resource for Network Biology (U.S. National Institutes of Health, National Center for Research Resources grant number P41 GM103504, assigned to GDB).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2014 Montojo J et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

How to cite: Montojo J, Zuberi K, Rodriguez H et al. GeneMANIA: Fast gene network construction and function prediction for Cytoscape [version 1; peer review: 2 approved]. F1000Research 2014, 3:153 (https://doi.org/10.12688/f1000research.4572.1)

First published: 01 Jul 2014, 3:153 (https://doi.org/10.12688/f1000research.4572.1)

Latest published: 01 Jul 2014, 3:153 (https://doi.org/10.12688/f1000research.4572.1)

Introduction

The GeneMANIA Cytoscape¹ app enables users to construct a weighted composite functional interaction network from a list of genes. Each node represents a gene and its products. The app uses the GeneMANIA algorithm² to find other genes and gene products that are most related to the original list, and shows how they are related.

The app provides access to most of the features of the GeneMANIA prediction server³ while removing limits on gene list length and on the maximum size of the resulting network. The app also allows predictions to be made on user-defined organisms and arbitrarily large custom networks.

Source networks

GeneMANIA uses a database of organism-specific weighted networks to construct the resulting composite network. The database includes over 1800 networks, containing over 500 million interactions for 8 organisms: A. thaliana, C. elegans, D. melanogaster, D. rerio, H. sapiens, M. musculus, R. norvegicus, and S. cerevisiae. The networks are organized into groups such as co-expression, where edges are derived from expression profiles, and shared protein domains, where edges represent genes that encode proteins with similar domains. Users may select any combination of these as the basis of the composite network they construct for their gene list.

Gene scores

Prior to construction, the selected networks are each assigned a weight by the GeneMANIA algorithm. The weight of each edge is multiplied by the weight of the containing network. Next, the union of all edges across the selected networks is taken. Where multiple edges connect the same pair of nodes, they are collapsed into one edge whose weight is the sum of the individual edge weights. The query genes are assigned a label value of 1, while all other genes are assigned 0. Label propagation is then applied to the entire network² and the resulting labels are saved as the score attribute in the node table. This score indicates the relevance of each gene to the original list based on the selected networks. Higher scores indicate genes that are more likely to be functionally related. Users may extend their original gene list by adding these top-ranking genes to their network. They can also choose not to add any other genes so they can visualize how the members of their list are connected.
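The scoring steps above can be illustrated with a minimal sketch. It assumes a toy propagation rule (each score is repeatedly blended with the weighted mean of its neighbours' scores), which is a stand-in chosen for brevity and not necessarily the solver described in reference 2. The network names, gene names, weights, and the `alpha` and `iterations` parameters are all hypothetical.

```python
from collections import defaultdict


def combine_networks(networks, network_weights):
    """Scale each edge by its network's weight, then sum parallel edges."""
    combined = defaultdict(float)
    for name, edges in networks.items():
        for (gene_a, gene_b), edge_weight in edges.items():
            pair = tuple(sorted((gene_a, gene_b)))  # treat edges as undirected
            combined[pair] += network_weights[name] * edge_weight
    return dict(combined)


def propagate_labels(combined, query_genes, alpha=0.5, iterations=50):
    """Toy propagation (an assumption, not the published solver):
    f <- (1 - alpha) * y + alpha * (weighted mean of neighbour scores)."""
    neighbours = defaultdict(dict)
    for (gene_a, gene_b), weight in combined.items():
        neighbours[gene_a][gene_b] = weight
        neighbours[gene_b][gene_a] = weight
    # Query genes start with label 1, every other gene with label 0.
    labels = {gene: (1.0 if gene in query_genes else 0.0) for gene in neighbours}
    scores = dict(labels)
    for _ in range(iterations):
        updated = {}
        for gene, adjacent in neighbours.items():
            degree = sum(adjacent.values())
            spread = (
                sum(w * scores[other] for other, w in adjacent.items()) / degree
                if degree
                else 0.0
            )
            updated[gene] = (1 - alpha) * labels[gene] + alpha * spread
        scores = updated
    return scores


# Hypothetical input: two networks that share the A-B edge.
networks = {
    "coexpression_1": {("A", "B"): 1.0, ("B", "C"): 0.5},
    "physical_1": {("B", "A"): 2.0},
}
network_weights = {"coexpression_1": 0.5, "physical_1": 0.25}

combined = combine_networks(networks, network_weights)
scores = propagate_labels(combined, query_genes={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy input the A-B edge appears in both networks, so its combined weight is the sum of the two scaled weights, and genes closer to the query gene A end up with higher scores.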

Composite network

Instead of providing the user with the composite network used during label propagation, the Cytoscape app displays at most one edge for each type of network that contributed to the gene scores (Figure 1). For example, if five co-expression networks and two physical interaction networks contained an edge between the same pair of genes, the resulting network would contain one co-expression edge and one physical interaction edge for that pair. The edges are annotated with the original edge weights, the source networks from which those weights originate, relevant publications, and details about how the data was collected or processed (Figure 2). The nodes are annotated with Gene Ontology⁴ terms, alternate identifiers and synonyms.

Figure 1. Composite network for BRCA1.

The circles are genes and the diamonds are protein domain attributes. Up to the 20 most related genes and the 20 most related attributes are shown. The red genes are annotated with DNA repair, as indicated in the Functions tab.

Figure 2. Sample of provenance details provided for each edge in the composite network.

The source networks are grouped by type (e.g. co-expression) and list each network weight, as well as the sum of the weights of the networks in each group. Citations and links to relevant publications and data sources are provided where possible.
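The way source edges are collapsed for display, one edge per network type with provenance retained, can be sketched as follows. This is an illustration under assumed data shapes, not the app's own code. The tuple layout, the function name `collapse_for_display`, the placeholder gene `GENE_X`, and the weights are all hypothetical.

```python
def collapse_for_display(source_edges):
    """Group source edges by (gene pair, network group) and keep their provenance."""
    display = {}
    for gene_a, gene_b, group, network, weight in source_edges:
        key = (tuple(sorted((gene_a, gene_b))), group)  # undirected pair + group
        entry = display.setdefault(key, {"networks": [], "weights": []})
        entry["networks"].append(network)
        entry["weights"].append(weight)
    return display


# Five co-expression networks and two physical interaction networks
# all contain an edge between the same pair of genes.
source_edges = [
    ("BRCA1", "GENE_X", "co-expression", f"coexp_{i}", 0.1) for i in range(5)
]
source_edges += [
    ("GENE_X", "BRCA1", "physical interaction", f"phys_{i}", 0.2) for i in range(2)
]

display = collapse_for_display(source_edges)
```

Here the seven source edges collapse into two displayed edges, and each displayed edge still lists the networks and original weights it was built from.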

Implementation

The GeneMANIA app is an update to the GeneMANIA plugin for Cytoscape 2⁵. The app preserves runtime compatibility with older versions of Cytoscape. It is distributed as a universal binary that runs on every release of Cytoscape since version 2.6.3. Figure 3 illustrates how we architected the software to enable the same code to run in multiple environments. The GeneMANIA Engine module, which implements the algorithm, is an independent layer that is also used directly by the GeneMANIA prediction web server. The App Core module includes highly parallelized command line tools for function prediction and cross validation⁶ on multiprocessor clusters and multicore workstations. It also contains an abstraction layer that provides access to a small subset of Cytoscape’s functionality through a high-level Application Programming Interface (API). This alternative API effectively decouples the app implementation from a particular version of Cytoscape, allowing the same code to drive both a Cytoscape 2 plugin and a Cytoscape 3 app.

Figure 3. Architecture diagram of the GeneMANIA app illustrating the inputs and outputs of the system.

The user-provided gene list is used to select the most relevant interactions from the GeneMANIA database. The resulting network is visualized in Cytoscape.
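The decoupling idea behind the abstraction layer can be sketched with an adapter interface. The sketch below is written in Python for brevity and is purely illustrative: the class and method names (`NetworkHost`, `RecordingHost`, `render_result`) are hypothetical and do not correspond to the app's or Cytoscape's actual API.

```python
from abc import ABC, abstractmethod


class NetworkHost(ABC):
    """Hypothetical high-level interface that shared app code would target."""

    @abstractmethod
    def add_node(self, gene, score):
        ...

    @abstractmethod
    def add_edge(self, source, target, group, weight):
        ...


class RecordingHost(NetworkHost):
    """Stand-in adapter; a real one would translate these calls for one host version."""

    def __init__(self, label):
        self.label = label
        self.nodes = {}
        self.edges = []

    def add_node(self, gene, score):
        self.nodes[gene] = score

    def add_edge(self, source, target, group, weight):
        self.edges.append((source, target, group, weight))


def render_result(host, scores, edges):
    """Shared code path: depends only on NetworkHost, never on a specific host version."""
    for gene, score in scores.items():
        host.add_node(gene, score)
    for source, target, group, weight in edges:
        host.add_edge(source, target, group, weight)
    return host


scores = {"A": 0.6, "B": 0.3}
edges = [("A", "B", "co-expression", 0.5)]
host_v2 = render_result(RecordingHost("version 2 adapter"), scores, edges)
host_v3 = render_result(RecordingHost("version 3 adapter"), scores, edges)
```

Because `render_result` only sees the interface, supporting another host version would mean adding one adapter class while leaving the shared code untouched.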

Database

The app provides access to all previous editions of the GeneMANIA database dating back to the initial September 23, 2010 release. New data updates will also be supported as they become available. As of the March 3, 2011 database release, two subsets of the data are available for users with special requirements. The core subset is roughly 20% of the size of the full database and only includes networks that are selected by default³. The open license subset only includes network data with no restrictions on use. For example, networks derived from I2D⁷ and HPRD⁸ are excluded from this subset since their standard licenses prohibit commercial use of their data.

The networks are stored on disk as compact binary sparse matrices, which are used directly by GeneMANIA’s network integrator. This representation allows networks to be loaded quickly and used immediately without transformation into a different data structure. Gene and network metadata, including descriptions and provenance details, are stored in a Lucene index. This allows fast retrieval of metadata and gene name autocompletion as users type in their list.
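To make the idea of a compact binary sparse matrix concrete, the sketch below packs only the non-zero entries of a network as fixed-size records and reads them back. The record layout (an entry count followed by row, column, and 32-bit weight triples) is an assumption made for illustration; the actual on-disk format used by GeneMANIA is not described here.

```python
import struct

# Assumed layout: little-endian entry count, then (row, column, weight) records.
HEADER = struct.Struct("<I")
ENTRY = struct.Struct("<IIf")  # two unsigned 32-bit indices and a float32 weight


def pack_network(entries):
    """Serialize (row, col, weight) triples for the non-zero cells only."""
    data = bytearray(HEADER.pack(len(entries)))
    for row, col, weight in entries:
        data += ENTRY.pack(row, col, weight)
    return bytes(data)


def unpack_network(blob):
    """Read the triples back without any intermediate text parsing."""
    (count,) = HEADER.unpack_from(blob, 0)
    return [
        ENTRY.unpack_from(blob, HEADER.size + i * ENTRY.size) for i in range(count)
    ]


entries = [(0, 1, 0.5), (1, 2, 0.25)]
blob = pack_network(entries)
restored = unpack_network(blob)
```

Each stored interaction costs a fixed 12 bytes in this layout, and the records can be read straight into (row, column, weight) triples.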

User-defined organisms and networks

Unlike the GeneMANIA prediction server, which supports only 8 organisms, the app allows users to perform predictions on their own organisms. To import an organism into a user’s local database, the user needs to provide a tab-delimited file containing the organism’s genome, where each row contains the primary identifier of a gene followed by alternate identifiers and synonyms. From there, users may import tab-delimited network data or expression profiles. Users may also import networks or expression profiles they have loaded into Cytoscape. The app can also be used with non-biological data such as social networks, where the nodes are individuals and edges represent various relationships between them.
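A file in the layout described above (primary identifier first, then alternate identifiers and synonyms, tab-delimited) could be read into a lookup table roughly as follows. This is a sketch under assumptions: the identifiers are made up, and case-insensitive matching is a choice made here, not a documented behaviour of the app.

```python
import csv
import io


def load_identifiers(handle):
    """Map every identifier or synonym on a row to that row's primary identifier."""
    lookup = {}
    for row in csv.reader(handle, delimiter="\t"):
        names = [name.strip() for name in row if name.strip()]
        if not names:
            continue  # skip blank lines
        primary = names[0]
        for name in names:
            # First occurrence wins if a synonym is shared by several genes.
            lookup.setdefault(name.upper(), primary)
    return lookup


# Hypothetical two-gene organism file.
sample = "GENE0001\tABC1\talpha\nGENE0002\tXYZ2\n\n"
lookup = load_identifiers(io.StringIO(sample))
```

With such a table, any name a user types can be resolved to a single primary identifier before network or expression data is imported.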

Results

To demonstrate the steps involved in performing predictions on custom organisms not already provided by GeneMANIA, Ensembl⁹ Gene IDs and their associated gene names for Felis catus were retrieved from BioMart¹⁰ and imported into GeneMANIA as an organism. Data set GSE46431 was downloaded from the Gene Expression Omnibus (GEO)¹¹ and imported directly as expression profile data to yield a co-expression network. On a 2.3 GHz Intel Core i7 3615QM system with 16 GB RAM and SSD storage, it took approximately 5 minutes to import the data. Using this network, the app was used to find and display the 20 genes most related to ASIP, which took 1 second (Figure 4).

Figure 4. The 20 genes most related to Felis catus gene ASIP, based on GEO dataset GSE46431.

The expression profiles from this dataset were converted into a co-expression network using the GeneMANIA app.
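One common way to turn expression profiles into a co-expression network is to connect gene pairs whose profiles are strongly correlated. The sketch below uses Pearson correlation with a fixed cutoff; both the measure and the `threshold` value are assumptions for illustration, since the conversion procedure is not detailed in this section. Gene names and expression values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def coexpression_edges(profiles, threshold=0.8):
    """Keep an edge for each gene pair whose correlation reaches the threshold."""
    edges = {}
    for (gene_a, prof_a), (gene_b, prof_b) in combinations(sorted(profiles.items()), 2):
        r = pearson(prof_a, prof_b)
        if r >= threshold:
            edges[(gene_a, gene_b)] = r
    return edges


# Hypothetical profiles: G1 and G2 rise together, G3 moves the opposite way.
profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
}
edges = coexpression_edges(profiles)
```

Only the G1-G2 pair survives the cutoff in this toy input, because G3 is anti-correlated with both other genes.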

Conclusions

The GeneMANIA app extends the capabilities of the GeneMANIA prediction server by allowing users to quickly construct networks from gene lists for custom organisms and network data. It imposes no limits on the size of the inputs or output and retains the provenance of the source data. The app also allows users to replicate past results by providing access to all publicly released GeneMANIA datasets.

Software availability

Software available from Cytoscape’s App Manager or the App Store: http://apps.cytoscape.org/apps/GeneMania.

Latest source code: https://github.com/GeneMANIA/genemania.

Source code as at the time of publication: https://github.com/F1000Research/genemania/releases/tag/V1.0

Archived source code as at the time of publication: http://www.dx.doi.org/10.5281/zenodo.10523¹²

License: LGPL 2.1: https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html

Author contributions

JM wrote the manuscript. JM and KZ wrote the software. JM, KZ, and HR prepared the prepackaged network data. GDB and QM designed and supervised the project.

Competing interests

No competing interests were disclosed.

Grant information

This work was funded by the Ontario Government’s Ministry of Research and Innovation via the Global Leadership Round in Genomics & Life Sciences (GL2) program (assigned to GDB and QM); and the National Resource for Network Biology (U.S. National Institutes of Health, National Center for Research Resources grant number P41GM103504, assigned to GDB).

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


References

1. Shannon P, Markiel A, Ozier O, et al.: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13(11): 2498–2504.
2. Mostafavi S, Morris Q: Fast integration of heterogeneous data sources for predicting gene function with limited annotation. Bioinformatics. 2010; 26(14): 1759–1765.
3. Zuberi K, Franz M, Rodriguez H, et al.: GeneMANIA prediction server 2013 update. Nucleic Acids Res. 2013; 41(Web Server issue): W115–W122.
4. Harris MA, Clark J, Ireland A, et al.: Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004; 32(Database issue): D258–D261.
5. Montojo J, Zuberi K, Rodriguez H, et al.: GeneMANIA Cytoscape plugin: fast gene function predictions on the desktop. Bioinformatics. 2010; 26(22): 2927–2928.
6. Montojo J, Zuberi K, Shao Q, et al.: Network Assessor: an automated method for quantitative assessment of a network’s potential for gene function prediction. Front Genet. 2014; 5: 123.
7. Brown KR, Jurisica I: Unequal evolutionary conservation of human protein interactions in interologous networks. Genome Biol. 2007; 8(5): R95.
8. Keshava Prasad TS, Goel R, Kandasamy K, et al.: Human Protein Reference Database--2009 update. Nucleic Acids Res. 2009; 37(Database issue): D767–D772.
9. Flicek P, Amode MR, Barrell D, et al.: Ensembl 2014. Nucleic Acids Res. 2014; 42(Database issue): D749–D755.
10. Kinsella RJ, Kähäri A, Haider S, et al.: Ensembl Biomarts: a hub for data retrieval across taxonomic space. Database (Oxford). 2011; 2011: bar030.
11. Barrett T, Wilhite SE, Ledoux P, et al.: NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013; 41(Database issue): D991–D995.
12. Montojo J, Zuberi K, Rodriguez H, et al.: F1000Research/genemania. ZENODO. 2014.


Open Peer Review


Reviewer Report 01 Aug 2014

Rosalba Giugno, Department of Molecular Virology, Immunology and Medical Genetics, The Ohio State University, Columbus, OH, USA

Approved

https://doi.org/10.5256/f1000research.4891.r5567

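The authors present an updated version of an affirmed work in the literature. A few comments follow.

The authors should highlight the upgrading aspects with respect to the previous work. Moreover, it would be useful to report the used database sources, the biggest retrieved network and how they deal with such a network with respect to the ‘unlimited’ list of genes (visualization and running time).

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.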

Reviewer Report 21 Jul 2014

Giovanni Scardoni, Center for Biomedical Computing, University of Verona, Verona, Italy

Approved

https://doi.org/10.5256/f1000research.4891.r5285

In this paper the authors present the GeneMANIA Cytoscape app. It allows users to construct gene networks from a gene list. The paper is well written and complete.

Minor comment:

The database contains over 500 million interactions spanning 8 organisms: A. thaliana, C. elegans, D. melanogaster, D. rerio, H. sapiens, M. musculus, R. norvegicus, and S. cerevisiae. It is not clear how these interactions are obtained (literature, other databases, ...). More details about it would improve the paper.

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.


