The evolution of enzyme function in the isomerases

Highlights
• Isomerases usually evolve to become enzymes from other EC classes.
• Conservation of substrates is more common than conservation of bond changes.
• The relationship between functional similarity and sequence similarity is not linear.


Introduction
Enzymes are life's workforce. They catalyse the biochemical reactions that are the basis of metabolism in all living organisms. The major route for creating new enzyme functions is gene duplication and subsequent evolution of one enzyme to another with a novel, though usually related, function. Under the pressures of survival and reproduction, innovating new functions at the metabolic level allows organisms to adapt to an environment of changing chemical conditions [1]; for example, bacterial resistance to manmade chemicals such as drugs or pesticides.

Previous work
Previous studies analysing enzyme superfamilies [2,3] and directed evolution experiments [4] revealed how enzyme evolution is influenced by the chemistry of enzymes. The overall chemical reaction is often changed by recruiting different catalytic residues within an active site, whilst conserving a few residues required for the catalysis of at least one mechanistic step of the overall reaction [5]. Similarly, binding different substrates is commonly achieved by changing the residues involved in substrate binding and conserving residues involved in the overall reaction [6]. There is substantial evidence supporting changes of the overall chemical reaction [7], as well as results reporting the importance of binding different substrates in the evolution of function in superfamilies [8,9,10]. Commonly, enzyme superfamilies evolve by a combination of these two strategies [11,12]. For instance, phosphate binding sites are often conserved, whilst the rest of the substrate can be changed during evolution [13,14].
Other comprehensive studies on the variation of enzyme sequence and structure [15,16] and plasticity of active sites [17,18] have also been fundamental in understanding how homologous enzymes accommodate alternative chemistries. Similarly, research on the convergent evolution of enzyme mechanisms [19] and active sites [20] presented nature's strategies to evolve different structural solutions for the catalysis of similar reactions [21,22,23]. The widespread interest in understanding the evolution and chemistry of enzymes has led to large scale collaborative projects such as the Enzyme Function Initiative (EFI) [24], which aims to determine enzyme function using both experimental and computational approaches. Starting from a comprehensive alignment of genomic regions, Zhao and co-workers from the EFI have identified the epimerase activity, pathway context and biological role in osmoprotection of a structurally characterised enzyme of unknown function from P. bermudensis using a combination of virtual screening, metabolomics, transcriptomics and biochemical experiments [25].
To explore this area further, we review our current knowledge of the evolution of the isomerase class of reactions, using newly developed computational tools to compare enzyme reactions [26] and their evolution [27]. Isomerases are a specialised class of enzymes that catalyse geometrical and structural rearrangements between isomers. They play a crucial role in the metabolism of terpenoids and polyketides that are important in generating secondary metabolites, especially in plants (Figure 1a).
The relative proportion of genes encoding isomerase activity depends on the species. Whereas isomerases account for 2.6% of the genes encoding enzymatic activity in Homo sapiens, this proportion is higher in bacterial genomes such as Escherichia coli, where they account for 6.2%. These figures correlate with the relative proportion of protein-coding genes encoding enzymatic activity in general: in human, 20% of genes correspond to enzymes, whereas this value increases to 37% in bacteria (Figure 1b).
The Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) maintains the most widely used functional classification of isomerases in the Enzyme Commission (EC) classification system [28]. Isomerases belong to the EC 5 primary class and they are grouped according to the chemistry of the reactions that they catalyse. They are subdivided in three hierarchical levels: 6 subclasses, 17 sub-subclasses and 231 serial numbers (Figure 1c). These serial numbers are associated with almost 300 biochemical reactions; for example, EC 5.1.1.9 describes the racemisation of arginine, lysine or ornithine and is therefore linked to three distinct reactions.
From a practical viewpoint, the total number of isomerase EC numbers (231) is small compared to other EC classes, which makes them attractive for manual analysis. Three of the six isomerase EC subclasses are similar to three EC primary classes: intramolecular oxidoreductases (EC 5.3) are named after oxidoreductases (EC 1), intramolecular transferases (EC 5.4) after transferases (EC 2), and intramolecular lyases (EC 5.5) after lyases (EC 4), but they refer to intramolecular reactions. Lastly, most of the isomerase reactions are unimolecular (one substrate and one product), which makes them relatively easy to compare.
Isomerases are used in many applications. In metabolic engineering, xylose isomerase (EC 5.3.1.5) has traditionally been used to convert glucose to fructose in the syrup industry and has recently been engineered to increase the yield of alcohol-based biofuels in S. cerevisiae [29]. In organic synthesis, several racemases and epimerases (EC 5.1) have been employed to resolve racemic mixtures under mild conditions and for the production of stereochemically pure amino acids [30]. Efforts in enzyme design have also successfully converted racemases and epimerases acting on amino acids and derivatives (EC 5.1.1) into enzymes with lyase activity (EC 4) [31,32]. Finally, some racemases and epimerases acting on amino acids are also targets for the development of antimicrobial drugs and the treatment of neuropathological disorders [33].
Studies linking chemical details of the catalytic reaction with how enzyme sequences evolve considering multiple enzyme superfamilies are scarce. Whereas some studies have focused on analysing only the chemistry [34,35], other studies concentrated on sequence and structure evolution [15,22]. Some literature is however available addressing certain aspects of the chemistry and evolution of specific isomerases. In the 1990s, mandelate racemase (EC 5.1.2.2) and muconate-lactonizing enzyme (EC 5.5.1.1), members of the enolase superfamily, were among the first enzymes reported to be highly structurally similar yet catalysing different overall reactions. Several isomerases belonging to this superfamily have been studied over the last two decades [7]. Successive research efforts focused on ketosteroid isomerase (EC 5.3.3.1) have also been fundamental in understanding basic principles of enzyme catalysis [36]. In addition, general strategies to assign isomerase specificity have been recently presented [37,38,25], as well as comparative genomic techniques to discover new isomerases in bacterial genomes [39].

Methods for analysing sequence, structure and functional relationships
Protein similarity networks have been used very successfully to map biological information to large sets of proteins [44,43]. However, it is also necessary to include associated changes of catalytic function during evolution, preferably in an automated fashion. FunTree is a resource developed to accomplish that goal [27] and it is maintained in collaboration with the CATH classification of protein structures [45]. By combining sequence, structure, phylogenetic, chemical and mechanistic information, it allows one to answer fundamental questions about the link between enzyme activities and their evolutionary history in the context of superfamilies. FunTree uses phylogenetic methods to infer ancestral enzymes in superfamilies and estimate their most likely functions [46].
By traversing the generated phylogenetic tree from ancestor to modern enzymes, explicit changes of function are identified between groups of enzymes belonging to a superfamily. Ultimately, each functional change is represented by two sets of enzymes catalysing two distinct functions so both functions and enzymes are comparatively analysed using functional and all-against-all sequence similarity.
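The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FunTree's implementation: the `Node` class, the `function_changes` helper and the toy tree topology are hypothetical, and ancestral functions are assumed to have been inferred beforehand. The two EC numbers are the enolase superfamily examples mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """Hypothetical tree node; `ec` holds the inferred or observed EC number."""
    name: str
    ec: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def function_changes(root: Node) -> Set[Tuple[str, str]]:
    """Walk from the ancestor towards the leaves and collect the unique
    (parent EC, child EC) pairs on edges where the function differs."""
    changes: Set[Tuple[str, str]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if node.ec and child.ec and node.ec != child.ec:
                changes.add((node.ec, child.ec))
            stack.append(child)
    return changes


# Toy tree with an invented topology (assumption, for illustration only).
toy_tree = Node("ancestor", "5.1.2.2", [
    Node("leaf_a", "5.1.2.2"),
    Node("internal", "5.5.1.1", [
        Node("leaf_b", "5.5.1.1"),
        Node("leaf_c", "5.1.2.2"),
    ]),
])

print(sorted(function_changes(toy_tree)))
```

Each pair returned corresponds to one functional change, whose two sets of enzymes could then be compared by sequence and functional similarity.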
To explore the evolution of the isomerases, we have calculated the functional similarity between enzymes using EC-BLAST [26], a recently developed algorithm to automatically compare biochemical reactions. This approach introduces three measures of functional similarity, namely comparison of bond changes, reaction centres and structure similarity of substrate(s) and product(s), all derived from the biochemical reaction catalysed by any given enzyme (Figure 1d). Bond changes refer to cleavage, formation and order change of chemical bonds and changes in stereochemistry of atoms and bonds. Reaction centres are molecular substructures representing the local environment around the atoms involved in bond changes. Last, the complete two-dimensional structures of substrate(s) and product(s) are also considered in the comparisons. These three measures are then combined with mechanistic data from MACiE [47] and extensive literature searches in order to inform our analyses.
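To make the idea of a bond change similarity concrete, the sketch below scores two reactions by comparing counts of bond change types. It is an assumption-laden illustration rather than the EC-BLAST scoring scheme: the Tanimoto-style (weighted Jaccard) coefficient is a common choice for count-based fingerprints, and the bond change labels and counts are hypothetical.

```python
from collections import Counter


def tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto-style (weighted Jaccard) similarity for count fingerprints:
    sum of per-feature minima divided by sum of per-feature maxima."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    # Two empty fingerprints are treated as identical.
    return shared / total if total else 1.0


# Hypothetical bond change fingerprints (labels and counts are invented).
reaction_a = Counter({"C(R/S)": 1, "C-H": 2})  # stereo change plus C-H changes
reaction_b = Counter({"C(R/S)": 1, "C-O": 1})  # stereo change plus a C-O change

print(tanimoto(reaction_a, reaction_b))  # 1 shared / 4 total = 0.25
```

The same function could be applied to reaction centre or structure fingerprints, provided they are also expressed as feature counts.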

Review of current status and availability of data on isomerase reactions and their sequences
Information related to the nomenclature of enzymes is publicly available in the ENZYME database [48]. It actively follows the recommendations of the NC-IUBMB and the 24-Jul-2013 version contained 231 current 4-digit isomerase EC numbers. Of these, 199 have sequence information in UniProtKB [49] and 32 are orphan isomerase EC numbers, also known as orphan enzymes [50,51], a term given to EC numbers for which no gene has been associated with the reaction and no sequence information is available in protein sequence repositories. Almost half of the isomerase EC numbers with sequence information (96) are present in FunTree [27] and Figure 2a shows the distribution by EC 5 subclass.
Protein structural data are available for 126 isomerase EC numbers, which have at least one entry in the PDB [52].

Observed changes of isomerase function
Change in EC number
Analysis of FunTree data on 58 domain superfamilies identified a total of 145 unique changes of isomerase activity that occurred during evolution. Only one-fifth of the changes occur between isomerases whereas the rest involve changing from isomerases to perform reactions in other EC primary classes (Figure 2b). This is strikingly different from enzymes in other EC classes where changes in lower levels of the EC classification are more common than changes in the primary classification [8].
Among the 26 changes within isomerases, only 3 change the EC subclass and 23 change the EC serial number, indicating a change in substrate (Figure 2c). A previous limited study of 24 pairs of enzymes reported that changes involving isomerases and lyases (EC 5 ↔ EC 4) occur more often than changes to other EC classes [5].
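Changes of function are described above by the highest level of the EC hierarchy at which two EC numbers differ. A minimal sketch of that classification follows; the function name `ec_change_level` is hypothetical, and the EC numbers in the example are those quoted elsewhere in this review.

```python
def ec_change_level(ec_a: str, ec_b: str) -> str:
    """Return the highest EC level at which two four-level EC numbers differ."""
    levels = ["primary class", "subclass", "sub-subclass", "serial number"]
    parts_a, parts_b = ec_a.split("."), ec_b.split(".")
    if len(parts_a) != 4 or len(parts_b) != 4:
        raise ValueError("expected four-level EC numbers such as '5.1.3.2'")
    for level, x, y in zip(levels, parts_a, parts_b):
        if x != y:
            return level
    return "no change"


# Mandelate racemase vs muconate-lactonizing enzyme: differ at the subclass.
print(ec_change_level("5.1.2.2", "5.5.1.1"))
# Xylose isomerase vs ketosteroid isomerase: differ at the sub-subclass.
print(ec_change_level("5.3.1.5", "5.3.3.1"))
```

Tallying the returned labels over all observed pairs would reproduce a breakdown of the kind shown in Figure 2b,c.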
Other analyses provided further evidence of these changes by revealing structural insights into the evolution of an isomerase from a family of lyases, namely N-succinylamino acid racemase (EC 5. (Figure 2d).

Correlation of sequence and function evolution
To gain an overview of the relationship between sequence and functional divergence, Figure 3 presents the sequence and functional similarity between homologous enzymes that perform different catalytic reactions. Most sequences have diverged considerably, with sequence identities below 40%. The three measures of functional similarity (Figure 3a-c) capture different properties of the change in function, but none of the plots shows a linear relationship between sequence and functional divergence. In addition, the distributions for each of these measures look quite different. In Figure 3a, which assesses the overall bond changes, there are two clusters: in the first, changes that maintain the isomerase EC subclass exhibit bond change conservation; in the second, changes at the isomerase EC subclass or EC primary class level do not. This partition is not observed in the comparisons by reaction centres and structures of substrate(s) and product(s), where the similarities tend to be more uniformly spread (Figure 3b,c). One change of note appears in Figure 3a-c. It involves different enzyme sequences from the terpenoid biosynthesis pathway of Arabidopsis thaliana that share high sequence identity (79%) and high reaction similarity (48% bond change, 72% reaction centre and 84% structure similarity). Both enzymes act on (S)-2,3-epoxysqualene as the main substrate but synthesise different products, which explains why the structure similarity is high.
In an attempt to analyse the chemical diversity of the domain superfamilies performing changes of function in isomerases, we divided the functional similarity space in four quadrants as depicted in Figure 3d. Each point represents a superfamily whose changes of isomerase function were averaged according to overall chemistry (as measured by bond change similarity) and structures of the reactants (in line with the similarity of the structures of substrate(s) and product(s)). Half of the superfamilies shared average similarities of reactants higher than 50% (top two quadrants), whereas only about one-fourth exhibited average similarities of overall chemistry higher than 50% (right two quadrants). In particular, there are only three instances where the overall chemistry is similar but the structures of the reactants significantly diverge (bottom right quadrant), highlighting that this is a rare event in the evolution of isomerase function.
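The quadrant assignment can be expressed as a short function. This is a sketch under assumptions: the name `quadrant` is hypothetical, similarities are taken as fractions between 0 and 1, and the 50% threshold follows the text. The example reuses the single Arabidopsis thaliana change quoted above (48% bond change, 84% structure similarity) purely as input values, not as a real superfamily average.

```python
from statistics import mean
from typing import Sequence


def quadrant(bond_sims: Sequence[float],
             structure_sims: Sequence[float],
             threshold: float = 0.5) -> str:
    """Average the per-change similarities of one superfamily and place it
    in a quadrant: x axis = overall chemistry, y axis = reactant structures."""
    chemistry = mean(bond_sims)
    reactants = mean(structure_sims)
    horizontal = "right" if chemistry > threshold else "left"
    vertical = "top" if reactants > threshold else "bottom"
    return f"{vertical} {horizontal}"


# Conserved reactants, changed chemistry: lands in the top left quadrant.
print(quadrant([0.48], [0.84]))
```

Under this scheme the rare case discussed above, similar chemistry with divergent reactants, corresponds to the "bottom right" label.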

An example: a family of SDRs acting on NDP-sugars from the UDP-galactose 4-epimerase superfamily
To explore one set of changes in more detail we have studied eight changes of isomerase function involving a group of nine enzymes catalysing transformations between nucleoside diphosphate sugars (NDP-sugars).
The evolution of enzyme function in the isomerases Martinez Cuesta et al. 127
Figure 4. The evolution of SDRs acting on NDP-sugars. (a) Overview of the EC changes involving isomerases and domain composition of UDP-glucose 4-epimerases (EC 5.1.3.2). Biochemical reactions are represented in boxes. Black arrows inside boxes denote chemical transformations whereas coloured arrows linking boxes represent EC changes. EC numbers with an asterisk indicate reactions for which we found mechanistic evidence in MACiE [47] or in literature searches. Changing substructures are highlighted in red whereas X corresponds to nucleoside diphosphate moieties (ADP, TDP, GDP, CDP, UDP) in which the base may change, but the ribose diphosphate (or sometimes the 2′-deoxy derivatives) is broadly conserved. Three scatterplots illustrate sequence and functional similarity for this superfamily: (b) bond change, (c) reaction centre and (d) structure similarity of substrate(s) and product(s), as in Figure 3.
These metabolites are common in bacterial secondary metabolic pathways and they are necessary in molecular recognition and signalling processes [42]. Several studies have revealed the structural, functional and mechanistic determinants of this group of evolutionarily related enzymes. They are epimerases (EC 5), dehydratases (EC 4), decarboxylases (EC 4) and oxidoreductases (EC 1) belonging to the subfamily of short-chain dehydrogenases/reductases (SDR) acting on NDP-sugars (Figure 4a) [55,56,57,58]. Since the reactivity takes place in the attached sugar moiety, the nucleoside diphosphate substructure (noted as X in Figure 4a) is not disrupted during catalysis and remains conserved in all enzymatic activities of this superfamily.
FunTree catalogues 8 changes of isomerase function within this family of enzymes (Figure 4a). They all share the same domain composition and therefore changes in function result directly from changes in sequence, rather than domain architecture. The analysis of sequence and functional similarities revealed that this family is divergent, with members sharing sequence identities in the 20-40% range. Bond change similarities revealed the already observed bimodal distribution due to the EC classification definitions (Figure 4b). Similarities by reaction centre remain low, not higher than 50% (Figure 4c), whereas overall, this set of functional changes tends to conserve structural similarity, due to the common binding of a conserved nucleoside diphosphate (Figure 4d).

The surprising observation from our study is that isomerases are more likely to evolve new functions in different EC primary classes, rather than evolve to perform different isomerase reactions. This is unlike the other EC classes where more than two-thirds of the exchanges happen within the same EC class. In addition, we note that exchanges between isomerases and lyases (EC 4) are prevalent.
Isomerases change their overall chemistry and conserve the structure of their substrates more often than conserving the chemistry and changing substrates. This is also unlike other types of enzymes and reflects the mechanisms of isomerases, which can often incorporate mechanistic components from different classes to provide a different overall outcome while conserving the substrate binding abilities.
This study is based on exploring the evolution of separate domains. However, many enzymes are multidomain and change their domain composition and function during evolution [61]. Cataloguing the evolution of each one of the composite domains can lead to multiple different evolutionary pathways. Further analysis of multidomain architecture and more experimental data would complement and broaden this analysis.
The chaotic nature of the sequence and function relationship in superfamilies including isomerases is evidenced by the lack of correlation between sequence and functional similarity. Variations in sequence are generally very large, suggesting that the changes happened long ago and emphasizing that evolutionary studies need to be undertaken on a superfamily basis. Here we gave an example of how combining knowledge from the chemistry and evolution of enzymes acting on nucleoside diphosphate sugars may help to characterise related orphan activities.

18. Dessailly BH, Dawson NL, Mizuguchi K, Orengo CA: Functional site plasticity in domain superfamilies. Biochim Biophys Acta 2013, 1834:874-889. Authors discovered that structurally diverse superfamilies are plastic in the location of functional sites such as protein-protein binding, catalytic, ligand binding and nucleic acid binding sites. They found that although there is generally a preferred functional site across almost all members of a superfamily and particularly in diverse superfamilies, members tend to exploit other sites for developing other functions. Catalytic sites are distinctively specific in superfamilies compared to other functional sites and they tend to always occur in the same general location within a superfamily.

22. Elias M, Tawfik DS: Divergence and convergence in enzyme evolution: parallel evolution of paraoxonases from quorum-quenching lactonases. J Biol Chem 2012, 287:11-20. Authors investigated the evolution of active sites, mechanism and promiscuous paraoxonase activity in three structurally different quorum-quenching lactonases. The structural similarity between the reaction intermediates in the native and promiscuous activities identifies the promiscuous activity as a starting point for the evolution of new enzymes that specialise as paraoxonases. This study is key to understanding the mechanisms of bacterial resistance to pesticides.