Key Points
- Several standards are contributing to the advancement of knowledge in biology; however, most standardization initiatives are still in the investment stage for biologists.
- Developing a complete and self-contained standard in biology involves four steps: conceptual model design, model formalization, development of a data exchange format, and implementation of the supporting tools.
- In the life sciences, standards development is typically done by grass-roots movements, and it is difficult to persuade funding agencies to fund such activities.
- Although it might be faster for a single organization to develop its own standards, a bottom-up community consensus approach is key to the long-term acceptance and usefulness of standards.
- Developing and deploying a standard creates an overhead, which can be expensive. Standards related to a particular technology have a life span that is no longer than that of the technology itself, so there is only a limited period of time in which the overheads can be paid off.
- The body of biological knowledge is incomplete and expanding rapidly; therefore, standards that describe biological knowledge have to be flexible, and a mechanism of change must be a part of the standard.
- To avoid proliferation of standards, common features of existing standards should be re-used wherever possible. Simplicity, but not oversimplification, is the key to success.
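The four development steps listed in the Key Points can be illustrated with a minimal sketch. Everything named here (the `Sample` class, its fields, `FORMAT_VERSION` and the JSON layout) is a hypothetical assumption made for illustration and is not taken from any existing standard.

```python
import json
from dataclasses import dataclass, asdict

# Step 1 (conceptual model): "a sample has a name and comes from an organism".
# Step 2 (formal model): the same idea expressed with strictly defined fields.
@dataclass
class Sample:  # hypothetical class, not part of any real standard
    name: str
    organism: str

# Hypothetical version tag; carrying it in every file gives the format
# a simple mechanism of change.
FORMAT_VERSION = "1.0"

# Step 3 (data exchange format): a fixed JSON layout that software can rely on.
def to_exchange(sample: Sample) -> str:
    doc = {"version": FORMAT_VERSION, "sample": asdict(sample)}
    return json.dumps(doc, sort_keys=True)

# Step 4 (supporting tool): a reader that checks the version before parsing.
def from_exchange(text: str) -> Sample:
    doc = json.loads(text)
    if doc.get("version") != FORMAT_VERSION:
        raise ValueError(f"unsupported format version: {doc.get('version')!r}")
    return Sample(**doc["sample"])
```

A round trip such as `from_exchange(to_exchange(Sample("liver_1", "Homo sapiens")))` should return an equal `Sample`, whereas a file carrying an unknown version is rejected rather than silently misread.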
Abstract
High-throughput technologies are generating large amounts of complex data that have to be stored in databases, communicated to various data analysis tools and interpreted by scientists. Data representation and communication standards are needed to implement these steps efficiently. Here we give a classification of various standards related to systems biology and discuss various aspects of standardization in life sciences in general. Why are some standards more successful than others, what are the prerequisites for a standard to succeed and what are the possible pitfalls?
Acknowledgements
We would like to thank M. Ashburner, C. Brooksbank, H. Hermjakob and N. Le Novere for reading the manuscript and providing valuable comments. The work on this survey was partly funded by the MolPAGE grant from the European Commission and a grant from the US National Human Genome Research Institute and National Institute of Biomedical Imaging and Bioengineering.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Related links
FURTHER INFORMATION
MIAME and MAGE-OM Terms Explained web site
Microarray and Gene Expression
Microarray Gene Expression Data Society web site
MISFISHIE Standard Working Group web page
Proteomics Standards Initiative — Molecular Interactions
Summary Report — W3C Workshop on Semantic Web for Life Sciences
Systems Biology Markup Language
W3C Semantic Web Health Care and Life Sciences Interest Group web site
Glossary
- Domain: A field of study.
- Conceptual model: In information engineering, a model that is meant to facilitate human communication; it does not need to be absolutely precise, as opposed to a formal model that has strictly defined semantics.
- Data exchange format: A file or message format that is formally defined so that software can be built that 'knows' where to find various pieces of information.
- Ontology: A model that describes a domain and can be used to reason about objects and relationships between them.
- Tool: In software engineering, a program or set of programs that enables a certain task or tasks.
- Directed acyclic graph: A graph consisting of nodes and edges, where edges have direction (that is, can be traversed only one way), and it is not possible to find a set of edges that form a closed loop.
- Diagram: A visual representation of concepts and relationships, used in information engineering to facilitate human communication.
- Semantics: The meaning of something; in computer science, it is usually used in opposition to syntax (that is, format).
- Reporting requirements: An agreed set of information items that needs to be provided for meaningful information communication (reporting).
- Metabolomics: The study of metabolite profiles in individual cells and cell types.
- Metabonomics: The study of the systemic response to pathophysiological stimuli and of the regulation of function in the whole organism through analysis of biofluids and tissues.
- Visual language: In computer science and computer engineering, an agreed set of conventions for drawing diagrams that formally describe a model or a program.
- Class: A concept used in ontology engineering and model building for referring to a set of objects with similar properties.
- Graph: A visual representation of information in the form of edges (lines) and nodes (connection points). In biology, graphs can be represented as boxes (nodes) and lines between boxes.
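The glossary definition of a directed acyclic graph can be made concrete with a short check for closed loops. The sketch below uses Kahn's algorithm, which is one standard way to test acyclicity; the function name `is_dag` and the example terms are hypothetical and are not drawn from any particular ontology.

```python
from collections import deque

def is_dag(edges):
    """Return True if the directed graph given as (source, target) pairs has no closed loop."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Any node that could not be removed lies on, or downstream of, a cycle.
    return removed == len(nodes)

# Hypothetical "is a" edges, pointing from a specific term to a more general one.
terms = [
    ("mitosis", "cell cycle process"),
    ("meiosis", "cell cycle process"),
    ("cell cycle process", "biological process"),
]
```

With these edges `is_dag(terms)` holds; adding an edge from "biological process" back to "mitosis" would close a loop and make the check fail.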
Brazma, A., Krestyaninova, M. & Sarkans, U. Standards for systems biology. Nat Rev Genet 7, 593–605 (2006). https://doi.org/10.1038/nrg1922