Quo Vadis, enzymology data?

The post-genomic era is characterized by a gold-rush mood, because many previously separate disciplines, ranging from biology and biochemistry to physics, mathematics and computer science, have grown together and contribute to the generation of enormous amounts of experimental and theoretical data. These data are published in journals and often collected in electronic data repositories. Such resources pose a challenge for intelligent data mining and offer many opportunities to create new knowledge and to gain insights into complex biological systems. One approach, taken for example by systems biologists, is not only to depict cellular metabolic pathways, such as those drawn in the well-known Boehringer poster or the KEGG pathway map, but to enter the third dimension with a higher level of information, as in the E-Cell project (Tomita et al., 1999; Takahashi et al., 2004). Apart from the basic scientific understanding of metabolic networks, these digitized maps can also be useful for simulating the treatment of diseases such as diabetes, which could lead to the development of new “intelligent” drugs (Werner, 2002). However, the way to this scientific goldmine is paved with serious problems. Have you also been faced with the difficulty of comparing the kinetic data from your own experiments with those published in the literature? Have you been interested in the effect of directed mutations within the catalytic domain, or within structure-determining sections of the protein, on structure-function relationships regarding the catalytic properties? Or did you just want to understand the experimental results in the literature and to draw conclusions with reference to the materials and methods described? Or have you tried to construct a computer model on the basis of published data?


The enzymology dilemma
The following brief examples will demonstrate the stumbling blocks on the way to the goldmine.
Imagine you are investigating the functional properties of the enzymes of your particular interest. Appropriate, that is to say published and proven, methodologies are applied and your assays produce apparently reasonable results. Imagine you are working on the characterization of the key enzymes of a well-known metabolic pathway, which could be glycolysis in baker's yeast. Your primary interest could be to understand the interdependences in the metabolic control of this pathway, and thus you intend to supply simulation tools such as JWS Online (Olivier and Snoep, 2004) with your kinetic data. However, before doing the theoretical work you want to consult the primary literature to seek support for your own experimental results. For this purpose the most productive way would be to query enzyme databases such as BRENDA (Schomburg et al., 2013) or SABIO-RK (Wittig et al., 2012) to obtain the appropriate references along with the functional enzyme data, and to enter these data in a spreadsheet. After compiling all relevant data you will make the surprising discovery that the functional data are so fragmented that for particular enzymes there are no published data at all, or that data exist but span an excessively broad range. For example, Km values from the literature (as stored, for example, in BRENDA) may have been measured at pH values from 3 to more than 10, and at temperatures from 0 to more than 100 °C. This is clearly not the fault of the curators of these databases, but arises from the inadequacy of the data in the literature, since the functional data were extracted from publications in primary biochemistry journals.
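The spreadsheet exercise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record type, the function names and all numerical values are hypothetical and invented for the example; they are not taken from BRENDA, SABIO-RK or any publication.

```python
from dataclasses import dataclass


@dataclass
class KmRecord:
    """One literature Km value together with its reported assay conditions."""
    enzyme: str
    km_mM: float
    ph: float
    temp_c: float


def comparable(records, ph_range, temp_range):
    """Keep only records whose pH and temperature lie inside the given windows."""
    (ph_lo, ph_hi), (t_lo, t_hi) = ph_range, temp_range
    return [r for r in records
            if ph_lo <= r.ph <= ph_hi and t_lo <= r.temp_c <= t_hi]


def fold_range(records):
    """Ratio of the largest to the smallest Km in a set of records."""
    values = [r.km_mM for r in records]
    return max(values) / min(values)


# Invented placeholder values (assumption), NOT real literature data.
EXAMPLE = [
    KmRecord("hypothetical kinase", 0.10, 7.0, 30.0),
    KmRecord("hypothetical kinase", 0.12, 7.2, 30.0),
    KmRecord("hypothetical kinase", 2.50, 4.0, 25.0),
    KmRecord("hypothetical kinase", 0.02, 9.5, 60.0),
]

if __name__ == "__main__":
    subset = comparable(EXAMPLE, ph_range=(6.8, 7.4), temp_range=(28.0, 32.0))
    print(f"all records: {len(EXAMPLE)}, fold range {fold_range(EXAMPLE):.0f}")
    print(f"comparable:  {len(subset)}, fold range {fold_range(subset):.1f}")
```

The point of the sketch is that such filtering is only possible when pH and temperature are reported with every value; records lacking these conditions cannot even be assigned to a comparable subset.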
Imagine another researcher who characterizes the ATP-coupled transport of ions across biological membranes. Usually these transporters are ion pumps that couple the hydrolysis of ATP to the transport of, for example, protons against chemo-osmotic gradients across the plasma membrane or the intracellular membranes of compartments such as lysosomes or vacuoles. Among other issues regarding the catalytic properties of this enzyme, the thermodynamic coupling ratio, that is, the relationship of the number of ions transported to the number of ATP molecules hydrolysed, is of particular interest (Rea and Sanders, 1987). This ratio is calculated as a function of ΔG for both the transport of charges and the equilibrium reaction of ATP hydrolysis (see for example Kettner et al., 2003). However, this calculation requires the value of the apparent equilibrium constant of ATP hydrolysis, K_ATP, which depends on a number of parameters such as the pH and the concentrations of Mg2+, K+ and Ca2+ (Alberty, 1968; Rosing and Slater, 1972). When the calculations have been done, our imaginary researcher wants to know whether his coupling ratios are consistent with those previously published for other organisms. However, he fails, despite finding coupling ratios in biochemical or biophysical papers, either because the calculations are not available or because they are insufficiently set out in the Materials and Methods sections of the papers. Thus, he can neither understand the published values nor compare his results with the published ones.
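To make clear which quantities such a calculation needs, here is a minimal sketch of a textbook-style estimate of the maximal (equilibrium) coupling ratio of a proton pump. It is not the procedure of the papers cited above: the function names, the sign conventions and the choice of 25 °C as the default temperature are assumptions made for the example.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def delta_g_atp(k_atp, atp, adp, pi, temp_k=298.15):
    """Free energy of ATP hydrolysis (J/mol) from the apparent equilibrium
    constant k_atp (M) and the actual concentrations (M): RT ln(Q / K)."""
    q = adp * pi / atp
    return R * temp_k * math.log(q / k_atp)


def proton_transfer_energy(delta_psi_v, ph_from, ph_to, temp_k=298.15):
    """Free energy (J/mol) to move protons from one side to the other.
    delta_psi_v is psi(to) - psi(from) in volts (assumed convention)."""
    return F * delta_psi_v + R * temp_k * math.log(10) * (ph_from - ph_to)


def max_coupling_ratio(dg_atp, dmu_h):
    """Protons per ATP at which transport exactly balances hydrolysis."""
    return -dg_atp / dmu_h
```

Every argument of these functions (K_ATP and the conditions under which it applies, the concentrations, the membrane potential, the pH on both sides, the temperature) has to be stated in a paper before a reader can reproduce a published coupling ratio.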
These two examples demonstrate the dilemma of protein functional data: even though there are a few projects that collect and organize functional and kinetic enzyme data, such as the BRENDA database for enzyme functions and properties, SABIO-RK for biochemical reactions within metabolic pathways, and KEGG, BioCyc (Caspi et al., 2010) and BioCarta for the representation of metabolic pathways, the availability of comparable functional enzyme data is limited or sometimes non-existent. But this comparability, based on homogeneous experimental designs, is required when using the kinetic data, for example for understanding metabolic flux control.
A common property of the enzyme data collections in particular is that they are created retrospectively, by extracting functional data from the literature by hand, a very expensive, time-consuming and often error-prone process that is never trivial. The difficulties derive from the fact that the data are widely distributed among journals from different fields. In addition, the results of experimental work need to be interpreted and standardized to create unambiguous data sets for the comprehensive description of the individual enzyme.
The implementation of different experimental designs significantly affects the estimation of kinetic parameters. For example, different wavelengths applied to record NADH oxidation in coupled optical tests may lead to different values of the product concentrations, and thus to different kinetic parameters for the enzyme (see for example Kettner and Hicks, 2005).
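The wavelength dependence can be illustrated with the Beer-Lambert law. The sketch below is not taken from the cited work: the function name is hypothetical, and the molar absorption coefficients are commonly quoted approximate values for NADH that should be treated as assumptions and checked against the instrument and conditions actually used.

```python
# Approximate molar absorption coefficients of NADH in M^-1 cm^-1 (assumed
# textbook values; the exact figures depend on temperature and bandwidth).
EPSILON_NADH = {340: 6220.0, 365: 3400.0}


def rate_from_absorbance(d_abs_per_min, wavelength_nm, path_cm=1.0):
    """Convert an absorbance change per minute into a rate in µM per minute
    using the Beer-Lambert law, A = epsilon * c * d."""
    epsilon = EPSILON_NADH[wavelength_nm]
    return d_abs_per_min / (epsilon * path_cm) * 1e6


if __name__ == "__main__":
    slope = 0.0622  # invented absorbance change per minute
    for wavelength in (340, 365):
        rate = rate_from_absorbance(slope, wavelength)
        print(f"{wavelength} nm: {rate:.1f} µM/min")
```

If a slope recorded at 365 nm is evaluated with the coefficient for 340 nm, the calculated rate is too low by almost a factor of two, which is why the wavelength and the coefficient used belong in every report of such an assay.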
In conclusion, data generated in laboratories that use different methods result in large ranges of method-specific data. Additionally, if the experimental conditions are not clearly and fully stated, the data can, in the worst case, lead to misinterpretations of laboratory findings when they move between researchers whose laboratories employ individual methods. In practice, kinetic data are sometimes extrapolated from the published experimental conditions and results to different assay conditions, leading to "new" data with high uncertainties. In particular, in silico analysis and representations of metabolic systems are certainly impossible under these circumstances (Stelling et al., 2002). Nicolas Le Novère expressed the consequences more drastically: "There is no point to exchanging quantitative data or models if nobody understands the meaning of the data and the content of the models beside their initial generators." (Le Novère et al., 2007). We have nothing to add.
The "computational" community of metabolic network researchers is not the only one that suffers from these problems. There are many other scientific reasons for requiring enzyme data: understanding the contribution of complex biological pathways to human pathophysiology and disease, biotechnology applications, the representation of structure-function relationships, and the generation of a comprehensive enzyme compendium, which in turn supports the interpretation of genome information by means of a systematic and standardized collection of functional enzyme data.
Therefore, successful research in the "omics" disciplines requires functional protein data to be comprehensively available, comparable, valid and reliable, ideally collected under standardized physiological conditions.

How to improve this unsatisfactory situation?
It may seem too idealistic to try to create enzymology data sets of the high quality needed. It may be tempting to take enzyme data that are not truly comparable and to use them for modeling and simulation anyway. If the data were obtained so long ago that adequate tools for data analysis were not available, so that the calculated kinetic parameters do not fit the experimental data properly, or if the experiments were carried out under non-physiological conditions, the data could be corrected by recalculating the reaction mechanisms using thermodynamic criteria. The applicability of this approach was demonstrated long ago by Frieden and Alberty (1955), but it faded into obscurity. Recently Beard et al. (2008) took up this suggestion and reanalysed the kinetics of citrate synthase (EC 2.3.3.1) in an exemplary manner, using data from various sources. However, even though this approach can be successful, it is very time-consuming to collect all relevant data from published sources, and it is doubtful whether the community will really profit from such work by using the rejuvenated data for further investigations. Additionally, correction of calculated and published data can be considered a retrospective method.
What about avoiding these correction requirements and generating prospective comparable data by adopting appropriate recommendations or standards? However, what does standardization mean, what kinds of standards are available?

Uniform standard of practice
The basic idea of standardized assays is to unify the experimental conditions when carrying out the experimental characterization of an identified enzyme. This can be equated with the use of a single, uniform and agreed methodology and would lead to a set of protocols or experimental recipes that might be applicable for the study of enzymes in comparable cellular environments. In molecular biology protocols are not unusual, and are applied, for example, in procedures for heterologous expression of proteins in yeast using vectors made in Escherichia coli, etc.
The hope is to significantly reduce the method-dependent between-laboratory variability of reported enzyme data by applying uniform methodologies for enzyme characterization. In the field of applied enzymology, clinical chemists were also concerned with the difficulty of interpreting enzyme-activity measurements in human serum, owing to the numerous analytical procedures used for enzyme assays. This not only leaves physicians uncertain when diagnosing a disease and deciding on the correct therapy, but also complicates the transfer of clinical laboratory results from the literature to the daily medical treatment of patients. Therefore, in the 1970s the Enzyme Commission of the Netherlands Society for Clinical Chemistry introduced recommended methods for determining the activity of a series of enzymes and subjected these uniform methods to a test under the supervision of the Netherlands National External Quality Control Program. The trial showed a decrease in the between-laboratory variation for the determination of aspartate aminotransferase and alanine aminotransferase from about 50% to about 15% for a group of 40 participating laboratories (Jansen et al., 1977). This proof of increased consistency of laboratory results prompted the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) to continue working on guidelines for standard operating procedures for a number of enzymes. The result is, for instance, that after about 80% of the laboratories in the United Kingdom National External Quality Assessment Schemes (UK NEQAS) had adopted the method for the measurement of creatine kinase activity according to the IFCC guidelines, the interlaboratory variation dropped to a coefficient of variation of less than 10% (Moss, 1997).
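The between-laboratory variation quoted above is expressed as a coefficient of variation, the standard deviation of the laboratories' results divided by their mean. A minimal sketch follows; the function name and the activity values are invented for illustration and do not reproduce the data of the cited trials.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Invented activities (U/L) reported by three laboratories for one sample.
    reported = [90.0, 100.0, 110.0]
    print(f"CV = {coefficient_of_variation(reported):.1f}%")
```

The sample standard deviation (`statistics.stdev`) is used here on the assumption that the participating laboratories are a sample of a larger population; a survey that treats them as the whole population would use `statistics.pstdev` instead, and a report should say which was used.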
In basic research on pathways, the first approaches to the application of uniform methods were demonstrated for the experimental analysis of the enzymes involved in glycolysis in baker's yeast. The strategy was first to evaluate the intracellular conditions for cells in a defined environment and second to study the kinetics of the enzymes involved under these "physiological" conditions in comparison with commercially available enzymes (van Eunen et al., 2010; see also van Eunen and Bakker, 2014). The successful demonstration of a proof-of-principle suggests the application of this protocol to assay all other enzymes in the yeast cytosol. In addition, the strategy demonstrated here could serve as a template for the standardization of experimental conditions in other compartments and organisms. There are some additional success stories worthy of mention: within both the yeast systems biology network (Mustacchi et al., 2006) and the competence network of the systems biology of liver cells (HepatoSys) (Klingmüller et al., 2006), first approaches towards the generation of comparable and reproducible quantitative data under standardized experimental conditions have been presented.
However, the disadvantages of uniform standards of practice should not be concealed. Both analytical methods and laboratory techniques are subject to continual development and improvement. Methods and techniques, once recommended to and agreed by the community, will respond slowly to technological advances. Recommended methods can also become corrupted, either inadvertently, by misinterpretation of the standards, or deliberately, to accommodate the limitations imposed by automated instrumentation. Consequently, acceptance of these recommended methods will decrease, and experimental procedures will no longer comply with a uniform practice, leading to incomparable enzymology data. Furthermore, it is questionable whether standard protocols can be applied to enzymes of unknown function, identity or even cellular localization. Finally, the results of many discussions raise serious doubts about whether the scientific community would be willing to adopt a particular recommendation when one method has no clear advantage over another and users therefore see no incentive to change.

Using reference enzymes as standards
Despite the widespread application of the IFCC guidelines it has become obvious that this approach was reaching its limits because of the disadvantages described above. In particular, it turned out that some aspects of the IFCC procedures, such as the reaction temperature, the need for sample blanks, long reaction times and limited linearity, were impractical to transfer to routine test practice (Panteghini et al., 2001). This observation drove the development of additional components for the standardization of methods, specifically the introduction of validated calibrated enzymes to act as reference systems and to replace the use of theoretical and computational factors, which, in turn, were usually dependent on the analytical system. The use of these standards to normalize the individual laboratory results was rather successful in reducing interlaboratory variations from 50% without a standard to 10% with a standard (Jansen and Jansen, 1983). In brief, the IFCC Working Group on Calibrator in Clinical Enzymology has worked out guidelines for the validation of enzyme calibrators, created a network of reference laboratories where the calibrations are carried out, and set up a global reference system for the measurement of catalytic concentrations (Ferrard et al., 1998). It is anticipated that the combination of validated reference enzymes with the application of standardized procedures will increase the reliability of enzyme data and improve both inter-method and inter-laboratory agreement, leading to valid diagnosis of diseases and therapy assessment.
However, the main disadvantage of the use of calibrated enzymes as a reference system is that standards are available for only a relatively small number of specific enzymes, namely alkaline phosphatase, alanine aminotransferase, α-amylase, aspartate aminotransferase, creatine kinase, γ-glutamyltransferase, and lactate dehydrogenase. Furthermore, these standards are usually restricted to routine tests in human health care, where the relevant enzymes that need to be assayed are known. In contrast, basic enzymology research takes place on a map of metabolic networks with many gaps standing for unknown, unidentified or scientifically uncertain catalytic entities.

Standards for reporting data
The development of an applicable framework of rules for uniform experimental procedures entails a number of advantages and disadvantages, as described above. While such rules are available for applied enzymology, at least one alternative to procedural standards could be to define reporting standards, because both the implementation and the acceptance of such guidelines or recommendations can be realised more rapidly. They could help to increase the value of experimental data by clear and full statements of the assay conditions used and by annotation of the results in relation to the experimental environment.
However, reporting standards should not be considered a straitjacket for the community, but an important tool for assisting researchers in drafting their papers and making the results described reproducible. Indeed, the reality of scientific publication shows that the quality of both the "Materials and Methods" section and the "Results" section ranges from very poor to reasonably useful. As the experimental results should serve as a valid basis for the acceptance of hypotheses, or for the creation of new hypotheses that in turn need to be tested, both the materials and methods applied and the data generated must be reported accurately in ways that do not allow misinterpretation. Moreover, enzymology data should be reported in a standardized way to link protein (structure) to enzyme function datasets and to make them machine-readable for the creation of protein-function databases. Apweiler et al. (2005, 2010) pointed out the importance of standards when protein-function data are reported in journals (see also Tipton et al., 2014).
A framework of criteria that determines a minimum of data reported will help to ensure that data generated can be located by researchers and computers alike, an important pre-requisite for successful in silico analysis and representation of metabolic systems. In recent years scientists from diverse fields in computational and experimental biology have been developing minimum information standards for improving the data quality in publications and databases. The Minimum Information for Biological and Biomedical Investigations (MIBBI) project has devoted great efforts to coordinating the development of data standards and to avoiding redundancy and incompatibility. MIBBI is intended to be a one-stop-shop for minimum-information checklists; it currently provides links to 39 registered checklists in the portal section and assistance for the creation of new, non-redundant guidelines in the foundry section (Taylor et al., 2008). In the best case, authors can access MIBBI to find the most appropriate set of minimum information guidelines when writing their papers. Examination of the publication guidelines of the major biochemistry journals confirms the emerging interest of their editors in high-quality data reporting, as a growing number of these journals have adopted community-based guidelines for data standards. However, the checklist groups need to take into account the constant changes in technology and methodology, as well as modifications of laboratory standard practices that lead to the need for continual revision and periodic updating of their lists.
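How a minimum-information checklist could be applied mechanically to a reported data set can be sketched as follows. The field names are hypothetical and chosen only for illustration; they do not reproduce any checklist registered with MIBBI or any published guideline.

```python
# Hypothetical minimum set of fields (assumption made for this example).
REQUIRED_FIELDS = (
    "enzyme_name",
    "organism",
    "assay_ph",
    "assay_temperature_c",
    "buffer",
    "substrate",
    "km_value",
    "km_unit",
)


def missing_fields(report):
    """Return the required fields that are absent or empty in a report."""
    return [name for name in REQUIRED_FIELDS
            if report.get(name) in (None, "")]


if __name__ == "__main__":
    # Invented, deliberately incomplete report.
    report = {
        "enzyme_name": "hypothetical kinase",
        "organism": "Saccharomyces cerevisiae",
        "km_value": 0.1,
        "km_unit": "mM",
    }
    print("missing:", ", ".join(missing_fields(report)))
```

A check of this kind can only confirm that a value has been stated, not that it is correct, so it complements rather than replaces peer review.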
The advantages of data reporting standards appear to be obvious; potential problems with the standardization of enzyme data in terms of good publication practice are so far unknown.

Does the community like standards?
This is a typical question when rules and recommendations are proposed, on account of suspicions that they may restrict scientific freedom and potentially put researchers in a straitjacket, as previously mentioned. Nobody likes rules that must be obeyed, but everybody claims to want high data quality in the literature. This appears to be an unresolvable problem; however, reality is encouraging.
The answer to the question put in the section title is simply "yes". Surprisingly, the community demands standards: according to a survey carried out by Edda Klipp and colleagues in 2006, 80% of the respondents consider standards necessary, whereas only 20% fear practical difficulties caused by standards (Klipp et al., 2007). However, there is also a general consensus on three points. First, standards that must be applied under all circumstances should not be established: they must be flexible enough to permit alternatives or new technological and methodological developments. Second, standards should be developed by the scientific community itself, in a bottom-up approach instead of top-down, as this kind of procedure has an inherent impact on their perceived legitimacy. Third, the acceptance of standards can only be successful if they are supported by scientific journals, funding agencies and community-based initiatives, as only these institutions can enforce the use of standards.
In particular, the participants in this survey identified a number of future tasks for standardization, amongst others the standardization of experimental procedures and data reporting to support modelers in network simulations and database curators in data import and export. However, setting standards has a number of implications that affect not only technical and scientific aspects but also touch on political issues. Holmes et al. (2010) describe in detail the possible pitfalls, problems and solutions of standard-setting projects, using the examples of the development of minimum information checklists such as Minimum Information About a Microarray Experiment (MIAME) and HUPO-PSI.
There are numerous other examples that indicate that the scientific community does favor standards because there is a general agreement that the current situation of incomparable, to some extent invalid, and insufficiently described enzymology data needs to be revised to provide an incentive for successful data sharing between the biological disciplines.

Why and for whom this special issue?
A great number of authors from many fields within biochemistry, ranging from thermodynamic research to in silico modeling of enzyme reactions and pathway interactions, contributed to this collection to address the issue of data generation and reporting. The development of the nomenclature for enzymes and its attendant difficulties is considered, as well as the IUBMB recommendations on Symbolism and Terminology in Enzyme Kinetics (Nomenclature Committee of IUB, 1982, 1983a, 1983b, 1992). The design and implementation of both enzyme assays and high-throughput assays is addressed in combination with the analysis and interpretation of the experimental results. The description of the impact of uniform and standardized data on database curation, the development of modeling algorithms and interlaboratory data exchange may underline all arguments that support the adoption of standards by the scientific community for implementation in its daily research routine. Examples of standards for basic and applied enzyme research as well as suggestions for quality assessment tools in the publication process complete this collection of articles.
Both editors and authors hope that this collection will help students and teachers to raise awareness of the existence and the advantages of standards for conducting research and reporting data. The adoption and acceptance of standards is a mid-term project, and includes the need to convince a wide range of people concerned that a potential small (even trivial) loss of academic freedom will be outweighed by a substantial gain in the generation of scientific knowledge.
We have tried to cover all of the appropriate topics, but there will probably be some omissions that will need to be dealt with in the future, either because we did not think of them, or because we were unable to persuade suitable authors to participate, and we shall appreciate it if readers will draw our attention to these. Experience with commissions that make recommendations tells us that nothing is ever definitive and there are always revisions to be made.
To avoid giving the impression that we regard some contributions as more important than others, we shall mention the different articles in alphabetical order of their authors. First, therefore, is the treatment of aspects of particular importance for high-throughput screening, described by Michael Acker and Douglas Auld. The requirements for more classical enzyme assays are described by Hans Bisswanger. Athel Cornish-Bowden discusses the analysis of enzyme kinetic data, in particular the statistical analysis of data, and in a separate article describes the IUBMB recommendations on enzyme kinetics (which are now rather old and in some respects in urgent need of updating) together with the IUBMB system for classifying enzyme-catalysed reactions, which, in contrast, is kept continuously up to date. Kevin Francis and Amnon Kohen discuss the analysis of kinetic isotope effects. Robert Goldberg describes the application of standards in thermodynamics to enzyme data. Peter Halling and Munishwar Gupta deal with standards for application to industrial biocatalysis. Masaaki Kotera, Susumu Goto and Minoru Kanehisa describe how databases such as KEGG can be used predictively for genome and metabolome studies. Octavio Monasterio deals with the use of nuclear magnetic resonance for studying enzyme catalysis. Ida Schomburg, Antje Chang and Dietmar Schomburg discuss standardization in enzymology in the context of the BRENDA database. Ulrike Wittig and 10 collaborators describe the problems that need to be considered and resolved in order to construct an enzyme reaction database, specifically SABIO-RK. Finally (out of alphabetical order because it deals with the whole purpose of this collection), Keith Tipton and the members of the Beilstein STRENDA Commission describe the work of this Commission: why it exists and what has been achieved.
We, the guest editors of this collection, would like to thank all authors for contributing their overviews and thoughts about their areas of research, and for making possible this special issue on topics beyond those discussed by the STRENDA Commission.
Robert A. (Bob) Alberty, one of the giants of enzymology of the past half century, had a long life, but, sadly, not long enough to see the completion of this collection. He died on 18th January 2014 at the age of 92. He was a loyal and enthusiastic supporter of the work of STRENDA, and in particular he campaigned for a rigorous treatment of biochemical thermodynamics, as will be evident in particular in Robert Goldberg's article.