Structural comparisons of phosphoenolpyruvate carboxykinases reveal the evolutionary trajectories of these phosphodiester energy conversion enzymes

Inorganic pyrophosphate (PPi) consists of two phosphate molecules and can act as an energy and phosphate donor in cellular reactions, similar to ATP. Several kinases use PPi as a substrate, and these kinases have recently been suggested to have evolved from ATP-dependent functional homologs, which have significant amino acid sequence similarity to PPi-utilizing enzymes. In contrast, phosphoenolpyruvate carboxykinase (PEPCK) can be divided into three types according to the phosphate donor (ATP, GTP, or PPi), and the amino acid sequence similarity of these PEPCKs is too low to confirm that they share a common ancestor. Here we solved the crystal structure of a PPi-PEPCK homolog from the bacterium Actinomyces israelii at 2.6 Å resolution and compared it with previously reported structures from ATP- and GTP-specific PEPCKs to assess the degrees of similarities and divergences among these PEPCKs. These comparisons revealed that they share a tertiary structure with significant value and that amino acid residues directly contributing to substrate recognition, except for those that recognize purine moieties, are conserved. Furthermore, the order of secondary structural elements between PPi-, ATP-, and GTP-specific PEPCKs was strictly conserved. The structure-based comparisons of the three PEPCK types provide key insights into the structural basis of PPi specificity and suggest that all of these PEPCKs are derived from a common ancestor.

Inorganic pyrophosphate (PPi) is the simplest compound containing a high-energy phosphate bond (1) between two Pi molecules. PPi can act as an energy and phosphate donor, similar to nucleoside di- or triphosphates, including ATP, in cellular reactions (2-4). Several enzymes selectively utilize PPi over ATP and other nucleotides to catalyze reactions similar to those of their ATP-dependent functional homologs.
PP i -utilizing enzymes are potentially useful for metabolic engineering. In contrast to ATP, reactions involving PP i are reversible in vivo, with only a few exceptions (5-7) because the energy released by cleavage of PP i is smaller than that of ATP or GTP (4,8). In addition, utilization of PP i requires less cellular energy because PP i is generated as a byproduct of many in vivo reactions hydrolyzing nucleotide triphosphate (4), whereas energy is required for ATP synthesis. A number of organisms, particularly anaerobic fermenting microbes, utilize PP i -dependent enzymes instead of ATP-dependent functional homologs for glycolysis and closely related metabolic reactions as a strategy to increase net ATP production (9). Therefore, introducing or substituting PP i -dependent enzymes in place of ATP-dependent functional homologs is an attractive approach to alter metabolic flux or improve cellular energy efficiency. Furthermore, use of PP i -dependent enzymes is an attractive possibility for industrial production of phosphorylated compounds because PP i is 1000 times cheaper than ATP (10).
The functional and structural study of PPi-utilizing enzymes is expected to provide important insights into the evolution of cellular energy currency. Because of its simple structure, PPi has been proposed to be the evolutionary precursor of ATP (3,11). In addition, adoption of PPi-using enzymes by ancestral organisms with poor ATP-producing ability may have been energetically favorable. Therefore, understanding the evolutionary origin and relationship of PPi-dependent enzymes with ATP-dependent functional homologs is of interest for the evolutionary study of metabolism.
The evolutionary relationship between PP i -dependent kinases and ATP-dependent functional homologs has been discussed in several studies. The best-studied enzyme to date is phosphofructokinase (PFK), and ATP-dependent (EC 2.7.1.11) and PP i -dependent PFKs (EC 2.7.1.90) have been shown to have a common ancestor (6,12,13). The specificity of these proteins for ATP or PP i is fundamentally determined by a single amino acid residue in the active site (14). The evolutionary relationship between ATP-and PP i -PFK is controversial (13,15,16); however, the results of the most recent large-scale phylogenetic analysis suggest that PP i -PFK evolved from an ATP-utilizing ancestor and that changes in the phosphodonors have occurred multiple independent times (13). In contrast, all biochemically characterized acetate kinases (ACKs) utilize ATP (EC 2.7.2.1) as the phosphate donor. The only exception is an ACK from the eukaryotic parasite Entamoeba histolytica that strictly recognizes PP i as the substrate (EC 2.7.2.12) (7,17). PP i -ACK has clear homology to ATP-ACKs, and phylogenetic analysis indicated that PP i -ACK has not formed a separate clade from ATP-ACKs (18,19), suggesting that PP i -ACK also arose from an ATP-ACK. In addition, a member of the ribokinase family of proteins, which are considered to be ATP-or ADP-dependent, was recently found to be PP i -dependent (10). In summary, these three PP i -dependent kinases and their functional ATP-utilizing homologs have clear amino acid sequence similarity, and the former appear to have evolved from ATP-dependent ancestors.
The evolution of phosphoenolpyruvate carboxykinase (PEPCK) seems to have followed a different pathway from the kinases described above. Depending on the phosphate donor to oxaloacetate, PEPCK can be divided into three types: GTP-PEPCK (EC 4.1.1.32), ATP-PEPCK (EC 4.1.1.49), and PPi-PEPCK (EC 4.1.1.38). Although both ATP- and GTP-PEPCK show significant amino acid sequence identity within each type, no significant overall sequence homology is observed between the two groups of enzymes (20). In contrast, crystal structure-based studies have revealed that ATP- and GTP-PEPCK have highly similar tertiary structures (21,22). In addition, these enzymes possess "consensus motifs," including a PEPCK-specific domain, which directly associates with PEP or oxaloacetate, and P-binding loop (also called kinase-1a) and kinase-2 motifs, which directly interact with the phosphate moiety of nucleoside triphosphates (20,21). In contrast to ATP- and GTP-PEPCK, PPi-PEPCK has not been structurally characterized at the tertiary level. Although a PPi-PEPCK was biochemically characterized more than 50 years ago (23-27), the amino acid and gene sequences of the enzyme were reported only recently (28). PPi-PEPCK consists of more than 1100 amino acid residues and is approximately twice as long as ATP- and GTP-PEPCK. PPi-PEPCK does not share significant amino acid sequence identity with ATP- or GTP-PEPCK (E-value ≥ 1). It remains to be determined whether PPi-PEPCK has a tertiary structure similar to that of ATP- and GTP-PEPCK.
In this study, we obtained and compared the crystal structure of a PPi-PEPCK homolog from Actinomyces israelii (AiPEPCK) with those of ATP- and GTP-PEPCK to evaluate the degree of homology between PPi-, ATP-, and GTP-PEPCK and to determine the structural basis of the PPi specificity of PPi-PEPCK.

PPi-dependent activity and structure determination of PEPCK
Crystallization was attempted using the biochemically characterized PPi-PEPCKs from Propionibacterium freudenreichii subsp. shermanii (PfPEPCK, WP_013160152.1) and E. histolytica (EhPEPCK1, XP_654765.1) (28) and PPi-PEPCK homologs from several bacteria. Crystals of PfPEPCK and of the PPi-PEPCK homolog from A. israelii (AiPEPCK, WP_043560275.1), which shows 61% and 43% amino acid identity with PfPEPCK and EhPEPCK1, respectively, were obtained and subjected to X-ray analysis. Quality diffraction data were collected only from a crystal of AiPEPCK labeled with selenomethionine. The data collection and refinement statistics are presented in Table 1.
PPi-dependent PEPCK activity was detected (2 × 10⁻² µmol·mg of protein⁻¹·min⁻¹) in purified AiPEPCK, whereas ATP- and GTP-dependent activity was not detected. The activity was not increased when Mn was added to the reaction mixture instead of Co. The PPi-PEPCK activity was two orders of magnitude lower than that of PfPEPCK (28) under the same reaction conditions. The lower PPi-PEPCK activity of AiPEPCK may have been attributable to the purification and experimental conditions; for example, AiPEPCK might have lower oxygen tolerance than PfPEPCK, resulting in reduced activity following purification under aerobic conditions.

Because AiPEPCK has high (61%) amino acid sequence identity to biochemically characterized PfPEPCK, and strictly conserved residues are likely critical for the PEPCK activity of PfPEPCK and EhPEPCK, as described below, we considered that at least the three-dimensional structure of monomeric AiPEPCK is highly similar to that of the PPi-PEPCKs from P. freudenreichii and E. histolytica. The structure of AiPEPCK was therefore used as the representative structure of PPi-PEPCKs.

Overall structure of AiPEPCK
The crystal structure of AiPEPCK was determined at 2.6 Å resolution. Two molecules in an asymmetric unit contained 1117 amino acid residues (A chain: 8-28, 39-431, 435-523, 542-755, and 758-1149) and 1118 residues (B chain: 11-28, 42-523, and 542-1149) (Fig. S1). A Co ion was observed in each molecule, a finding that was not unexpected, as AiPEPCK was crystallized in the presence of Co and Mg ions, which are both required for EhPEPCK activity (26). The crystal structure of AiPEPCK was highly similar between the A and B chains (Cα RMSD of 0.141 Å); therefore, the AiPEPCK structure was described using the A chain, which had a lower B factor (49.0 Å²) than the B chain (61.4 Å²) (Fig. S1).
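The Cα RMSD quoted above is a root-mean-square distance over paired atoms. The following is a minimal sketch of that calculation, assuming the two coordinate sets have already been superposed and paired residue by residue; the function name `rmsd` and the toy coordinates are illustrative and are not taken from the study.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of paired (x, y, z)
    coordinates. Assumes the two sets are already optimally superposed;
    no fitting is performed here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example (hypothetical coordinates, in Å): every atom is shifted by 1 Å.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(chain_a, chain_b)
```

In practice, structure-comparison programs first find the superposition that minimizes this value, so a sketch like this only covers the final scoring step.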

Dimer formation with the contacts between lobes 2 and 3
To evaluate the oligomeric state of AiPEPCK, we carried out size-exclusion chromatography. The result indicated that soluble AiPEPCK exists mainly as a homodimer (Fig. 2A). Dimer formation was further analyzed using the PISA server (29), and each chain was predicted to form the same homodimer with an identical chain generated by symmetry operation (Fig. 2B). The contact surface area between two protomers was 1605 Å² (A-A′ dimer) and 1599 Å² (B-B′ dimer).
In the quaternary structure of AiPEPCK, lobes 2 and 3 are located on the dimer interface (Fig. 2B). These lobes seem to form handclasp-like interactions; lobe 2 contacts lobes 2′ and 3′ of another protomer, and lobe 3 is positioned in close proximity to lobe 2′. The strand β26 of lobe 2 forms three main-chain hydrogen bonds, like those observed in a parallel β-sheet, with the loop connecting 7′ and β25′ in lobe 2′ (Fig. 2C). This dimer interface is further reinforced by two hydrogen bonds between Ser640 and Asn607′ and a van der Waals contact between Trp636 and Gly618′. On the other hand, van der Waals contacts mainly contribute to the dimer interface between lobes 3 and 2′ (Fig. 2D), and Asp731 and Ser736 in lobe 3 also form hydrogen bonds with Gln425′ and Trp467′ in lobe 2′. These residues are located on α17 and α18 of lobe 3 and on the loops between Ser 4′ and Ser 5′ in lobe 2′. There is no interaction among the other lobes and the core structure (Fig. 2B). These structural findings suggest that lobes 2 and 3 are required for dimer formation of AiPEPCK.

Structure comparison of PPi-PEPCK with ATP- and GTP-PEPCK
A structural similarity search using the DALI server (30) revealed that PPi-PEPCK has significant similarity (Z score ≥ 10) only to ATP- and GTP-PEPCK. The top hit was ATP-PEPCK from Escherichia coli (PDB code 1OS1-A; Z score, 21.6; RMSD, 3.6 Å; sequence identity, 13%), and the top hit among GTP-PEPCK was an enzyme from Rattus norvegicus (PDB code 5FH0-A; Z score, 17.5; RMSD, 3.5 Å; sequence identity, 10%). The structural superposition of PPi-, ATP-, and GTP-PEPCK revealed that they share a core structure consisting of the N- and C-terminal domains (Fig. S2), although PPi-PEPCK is ~500 amino acid residues longer than ATP- and GTP-PEPCK because of its lobe structures (lobes 1-4, Fig. 2B). Notably, the order of the secondary structural elements present in the core structure of PPi-, ATP-, and GTP-PEPCK was completely conserved in the primary structures of all enzymes (Fig. S3). A Co ion was located in the deep cleft between the two globular domains of PPi-PEPCK. ATP- and GTP-PEPCK bind Ca and Mn ions at the same position, respectively (Fig. S2). In addition, the residues surrounding the metal ions (Lys331, Lys332, His352, Asp655, and Asp656 in AiPEPCK) are spatially conserved among the three types of PEPCKs. The Mn ion at this position is required for enzymatic activity of ATP- and GTP-PEPCK (22,31), indicating that the cleft likely functions as the active site in PPi-PEPCK.
Figure 2 (partial legend). A Co ion is located in the deep cleft between the N- and C-terminal domains. The notation ′ is added to all labels for the A′ chain. C and D, detailed diagrams of the structures in the dashed boxes in B. The residues for dimer contacts are represented by stick models. Spheres and dashed lines show van der Waals contacts and hydrogen bonds, respectively.
The lobe structures are specific to PPi-PEPCK and are absent from ATP- and GTP-PEPCKs. Amino acid sequence comparisons revealed that lobes 1-4 exist in all PPi-PEPCKs (Fig. S4). We further searched for reported structures similar to the individual lobes by extracting each lobe from the overall structure of AiPEPCK and submitting it to the DALI server. No protein was hit with lobe 1 as the query, whereas the overall structure of lobe 3 and partial structures of lobes 2 and 4 matched structural elements of other proteins.
The DALI results indicated that no other protein shares overall structural similarity with lobe 2. However, the structure of lobe 2 was partially related to those of various types of proteins, such as EssC, a component of the bacterial type VII secretion apparatus; EmbR, a putative transcriptional regulator of arabinosyltransferase; propionyl-CoA synthetase; the serine phosphatase MtX; and the adenylate cyclase-like protein CT664 (Fig. S5). Among the structural elements of lobe 2, the β-sandwich fold of β-sheets S4 and S5 recurs most frequently in the protein structures found in the DALI analysis. The top hit was EssC from Staphylococcus aureus (PDB code 1WV3-A; Z score, 4.3; RMSD, 2.9 Å; sequence identity, 6%). This protein and CT664 have a forkhead-associated (FHA) domain that adopts a β-sandwich fold and functions as a phosphopeptide recognition module (32). Although the loops connecting β-sheets S4 and S5 are used for peptide recognition of lobe 3, similar to the FHA domain (Fig. 2D), lobe 2 shows no sequence similarity to the FHA domain, and lobe 3 has no phosphorylated residue. The structure of lobe 3 was similar to a structural element of NUP155 (1166-1195), a component of the human nuclear pore complex (PDB codes 5IJN-E and 5IJO-E; Z score, 2.1; RMSD, 2.4 Å; sequence identity, 6%) (Fig. S6) and a large protein with 1391 residues. According to the structures determined by cryoelectron tomography (33), this structural element is located outside of the pore ring and seems not to interact with any other subunits. In addition, the residues on the dimer interface of lobe 3 are not conserved in the similar structural element of NUP155. The helices of lobe 2 (497-648) also emerge as a partial structure of a large globular protein, as shown in propionyl-CoA synthetase (Fig. S5).
However, the structural elements of the lobe 2-2′ interaction (Fig. 2C) are not conserved in any proteins, according to the DALI results. These structural findings suggest that lobes 2 and 3 may be specific modules for dimer formation of PPi-PEPCK.
Structures similar to a part of lobe 4 were found in various types of proteins with antiparallel helices consisting of two long α-helices (Fig. S7). The five protein structures with the highest Z scores are as follows: two molecules in the Rad50 dimer, a component of the Mre11 complex for the eukaryotic DNA damage response; PHYL1, a phyllody-inducing effector protein of phytoplasma; seryl-tRNA synthetase; and Sso2, a t-SNARE (target membrane-associated soluble N-ethylmaleimide fusion protein attachment protein SNAP receptor) protein that functions in intracellular membrane fusion. The top hit was one molecule in the Rad50 dimer (PDB code 1GOX-A; Z score, 6.2; RMSD, 5.1 Å; sequence identity, 6%). Their antiparallel helices match helices α31 and α33 of lobe 4, whereas their functions are highly divergent. Rad50 uses the antiparallel helices as an arm for assembly of the Mre11 complex (34). This structural element is required for tetramer formation and folding of a four-helix bundle in PHYL1 and Sso2, respectively (35,36). On the other hand, seryl-tRNA synthetase interacts with tRNA Ser using parallel helices (37). In AiPEPCK, helices α31 and α33 of lobe 4 contact the loops connecting the β-strands of S1 in the N-terminal domain. Therefore, the structural homology search could not identify the functional role and evolutionary origin of lobe 4.

Conservation of residues important for PEPCK activity
Standard multiple sequence alignment algorithms work well for amino acid sequences with high similarity but are not suitable for sequences with low similarity or largely different lengths. Here a structure-based alignment algorithm was employed to perform an amino acid sequence alignment of PPi-, ATP-, and GTP-PEPCK because they share less than 10% amino acid sequence similarity, and PPi-PEPCK is twice as long as ATP- and GTP-PEPCK. The structure-based alignment revealed that the catalytic residues in ATP- and GTP-PEPCK are strictly conserved in AiPEPCK and other PPi-PEPCKs (Fig. 3A and Fig. S3). Most of the conserved residues are located within three motifs found in ATP- and GTP-PEPCK: a PEPCK-specific domain, P-binding loop, and kinase 2 domain, which interact with PEP or oxaloacetate, nucleotide triphosphate, and a divalent cation, respectively (20,38). This finding strongly suggests that PPi-PEPCK also contains these three motifs and utilizes a similar catalytic mechanism.
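Once sequences have been aligned on the basis of structure, a pairwise identity value can be read off the aligned columns. The sketch below illustrates one common convention, counting identical residues over columns in which neither sequence has a gap; the function name `percent_identity` and the short sequences are hypothetical, and alignment programs may use a different denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Only columns where both sequences carry a residue are compared;
    columns with a gap in either sequence are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Hypothetical aligned fragments (not taken from the PEPCK alignment).
identity_no_gaps = percent_identity("ACDEF", "ACDKF")
identity_with_gap = percent_identity("A-CD", "AKCE")
```

Because insertions such as lobes 1-4 appear as long gapped stretches, skipping gapped columns keeps the value focused on the shared core.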

orientation of the side chains was also similar between ATP/GTP-PEPCK and PPi-PEPCK. In addition, all residues coordinating the Co ion (Lys332, His352, and Asp656) in PPi-PEPCK (Fig. 3A, pink circles) were conserved in ATP/GTP-PEPCK.

Structural basis of PPi specificity
PPi-PEPCK does not recognize ADP or GDP as a phosphate acceptor (24). As expected, the amino acid residues in purine-binding regions were not conserved between ATP-PEPCK (residues 449-455, RISIKDT) (20), GTP-PEPCK (residues 516-533, WFRKDKNGKFLWPGFGEN) (22,42), and PPi-PEPCK, with the exception of a lysine residue corresponding to Lys917 of AiPEPCK (Fig. 3A). Superposition of PPi-PEPCK and ATP-PEPCK revealed that the bulky side chain of His920 sterically hinders access of ATP to the active site in PPi-PEPCK (Fig. 3B). His920 in PPi-PEPCK corresponds to Phe530 in GTP-PEPCK, one of the two phenylalanine residues whose side chains sandwich the guanine base (22). However, the structural superposition of PPi-PEPCK and GTP-PEPCK showed that the side chains of His920 and Phe530 are arranged quite differently and that His920 also appears to block access of GTP to the catalytic site (Fig. 3C). In addition, His920 of AiPEPCK is strictly conserved in PPi-PEPCKs from a variety of organisms (Fig. S4). Therefore, this bulky residue of PPi-PEPCK may contribute to the specificity for PPi as the small phosphate donor and phosphate as the acceptor.
The P-binding loop and the purine-recognizing helix (located immediately after the purine-binding region; Fig. 3A) of ATP- and GTP-PEPCK are flexible and close upon substrate binding (43,44) (Fig. 4A). Active-site comparison of AiPEPCK, ATP-binding ATP-PEPCK, and apoATP-PEPCK revealed that the P-binding loop of apoAiPEPCK is in an open conformation, similar to that in apoATP-PEPCK, whereas the purine-recognizing helix of AiPEPCK is even more closed than the closed helix of ATP-binding ATP-PEPCK (Fig. 4B). The purine-recognizing helix of apoPPi-PEPCK may adopt a closed conformation because of the presence of the PPi-PEPCK-specific appendage structures of lobes 2 and 3. Lobe 2 directly contacts and hinders movement of the purine-recognizing helix to an open position. Dimer formation at the surface of lobes 2 and 3 (Fig. 2B) may further inhibit movement of the purine-recognizing helix. Notably, all of the experimentally confirmed GTP-PEPCKs and ATP-PEPCK from E. coli are monomers (21). ATP-PEPCKs from other organisms form homomultimers (21), although this conformation may not inhibit movement of the purine-recognizing helix because ATP-PEPCK from Trypanosoma cruzi (45) forms a homodimer between the N-terminal domains, which does not inhibit movement of the purine-recognizing helix on the C-terminal domain. In addition, His920 of AiPEPCK is located within the N terminus of the purine-recognizing helix. Therefore, inflexibility of the purine-recognizing helix may fix the bulky residue at the active site and sterically inhibit access of large substrates such as ATP.
Figure 3 (partial legend). Red arrows, residues that interact with PEP, oxaloacetate, or analogs; orange circles, residues that interact with the phosphogroup of ATP and/or GTP; pink circles, residues that interact with a divalent cation. His920 of AiPEPCK, which is suggested to contribute to PPi specificity, is highlighted by a yellow box. B and C, comparison of the active site structure between AiPEPCK (B, light blue) and ATP-binding ATP-PEPCK (PDB code 2PXZ, yellow) and AiPEPCK and GTP-binding GTP-PEPCK (C, PDB code 3DT7, light brown). Residues of PPi-PEPCK and ATP- or GTP-PEPCK are indicated by regular letters and letters in parentheses, respectively.
In summary, dimer formation by the PP i -PEPCK-specific appendage structures of lobes 2 and 3 and the bulky residue corresponding to His 920 of AiPEPCK may contribute to the PP i specificity of PP i -PEPCK by conserving the small size of the phosphate donor/acceptor-binding site. Most PP i -dependent kinases characterized to date are homologous to their ATP-dependent counterparts, and large amino acid residues occluding a part of the ATP-binding pocket contribute to the PP i specificity of PP i -dependent kinases (10,14). In contrast, to our knowledge, the existence of a large tertiary structure and/or formation of a quaternary structure that contributes to substrate specificity appears to be unique to PEPCK.

Evolutionary relationship between three types of PEPCKs
ATP-PEPCK and GTP-PEPCK share less than 20% overall amino acid sequence identity but do share several conserved motifs; these two types of PEPCKs have therefore been considered likely to result from convergent evolution (20,46). The amino acid sequence of PPi-PEPCK was also found to lack overall sequence similarity with ATP-PEPCK and GTP-PEPCK and is nearly twice as long as that of ATP-PEPCK and GTP-PEPCK (28), suggesting that PPi-PEPCK also does not share a common ancestor with ATP-PEPCK or GTP-PEPCK. Although an amino acid sequence-based search failed to detect conserved motifs in PPi-PEPCK, crystal structure analyses revealed that PPi-PEPCK conserves motifs common to ATP-PEPCK and GTP-PEPCK (Fig. 3A), and the three-dimensional structures of the three types of PEPCKs share statistically significant similarity. Furthermore, the order of the secondary structural elements was completely conserved in the primary structures, without exception (Fig. S3). These facts strongly suggest that PPi-, ATP-, and GTP-PEPCKs are not the result of convergent evolution but have a shared origin, as in the case of enzymes that share whole structures and the catalytic domains (47-50). If this assumption is true, then the timing of the divergence of the three types of PEPCKs and the function of the common ancestor remain to be determined.
The phylogenetic tree constructed using the structure-based alignment of core structures revealed that each type of PEPCK forms a single clade with significant bootstrap values (Fig. S8), indicating that the phosphate donor change did not occur multiple times, unlike in the case of PFK. It should be noted that the order of division cannot be discerned from this tree alone because outgroup sequences are not available. However, as PPi-PEPCK possesses insertion sequences forming appendage structures (lobes 1-4) that are absent in ATP-PEPCK and GTP-PEPCK, it is most probable that the ancestor of PPi-PEPCK first diverged from the common ancestor of the three types of PEPCKs and that the ancestors of ATP-PEPCK and GTP-PEPCK then diverged from each other (Fig. 5). Both ATP- and GTP-PEPCKs exist in various bacteria and archaea and follow a chimeric distribution. For example, most alpha-, gamma-, and epsilonproteobacteria and approximately half of all deltaproteobacteria have ATP-type PEPCKs, whereas nearly half of all betaproteobacteria and deltaproteobacteria possess GTP-type PEPCKs. Furthermore, in the case of eukaryotes, ATP-PEPCKs exist in yeasts and plants, whereas GTP-dependent PEPCKs are found in higher organisms, including animals and insects. Phylogenetic analyses indicate that most archaeal and bacterial ATP- and GTP-PEPCK sequences are separated with significant bootstrap values (Fig. S8). Taken together, these results suggest that the separation of ATP-PEPCK and GTP-PEPCK occurred before the birth of the last universal common ancestor (LUCA). To sum up these discussions, the ancestor of PPi-PEPCK may also have separated from the common ancestor of the three types of PEPCKs before the LUCA. The three types of PEPCKs may have evolved independently for a sufficiently long period to lose amino acid sequence similarity but still retain the active site, with the exception of the purine-binding region.
It remains uncertain whether the common ancestor of the three types of PEPCKs possessed PPi-PEPCK-specific lobes 1-4 or whether the appendage structures were inserted after the ancestor of PPi-PEPCK diverged from the common ancestor of the three types of PEPCKs. If lobes 1-4 were added to ancestral PEPCK by lateral gene transfer, then sequences and/or structures that were the source of lobes 1-4 might remain in protein sequences or structures of extant organisms. However, BLASTP and PDB searches conducted using lobes 1-4 as queries found no sequences or functionally conserved structures like the core structure with statistically significant similarities. Discovery of enzymes that share the origin with PEPCKs is required to estimate the presence or absence of the appendage structures and substrate specificity of the common ancestor of PEPCKs.
The evolutionary history of the three types of PEPCKs is clearly distinct from that of other kinases. In the case of PFK, the PP i type has high amino acid sequence similarity to ATP types, and PP i -PFK appears to have been derived from ATP-PFK in multiple events that have occurred relatively recently and involved substitution of one or two Gly residues in the active site with bulky ones (13,14). In contrast, separation of PP i -PEPCK and the ATP-and GTP-dependent functional homologs occurred only once, most probably before the LUCA arose. Furthermore, a drastic change of the structure (insertion or deletion of multiple appendage structures) occurred when the ancestor of ATP-and GTP-PEPCK and PP i -PEPCK were separated. In conclusion, the present structure-based analyses of PP i -PEPCK have helped determine the evolutionary history of PEPCKs, which could not have been detected from the amino acid sequence alone.

Plasmid construction
AiPEPCK (WP_043560275.1) was PCR-amplified from genomic DNA (JGD12771) purchased from RIKEN BRC, which is participating in the National Bio-Resource Project of MEXT, Japan, using the primers 5′-TCGAAGGTAGGCATAATGTCCGTAGTCGAACGC-3′ and 5′-ATTCGGATCCCTCGATCAGACGAACCTGGGCTG-3′ and was then cloned into pCold GST plasmids (Takara) cut with NdeI and XhoI using the In-Fusion HD cloning system (Takara).

Overexpression and purification of recombinant PPi-PEPCKs
E. coli BL21 Star (DE3) (Life Technologies) was used for expression of AiPEPCK. Host cells transformed with expression plasmids were inoculated into 400 ml of Luria-Bertani medium in a 1-liter conical flask containing 50 mg liter⁻¹ ampicillin. After cultivating the cells aerobically at 37°C until A600 reached ~0.5, protein expression was induced by cooling the culture on ice for 30 min and adding 0.1 mM isopropyl 1-thio-β-D-galactopyranoside to the medium, followed by overnight cultivation at 15°C. To obtain AiPEPCK labeled with selenomethionine, the host cells were cultivated in 400 ml M9 minimal medium supplemented with 20 mg liter⁻¹ thiamine, biotin, adenosine, guanosine, cytidine, and thymidine and 50 mg liter⁻¹ ampicillin at 37°C. When the A600 of the culture reached ~0.5, 100 mg liter⁻¹ (final concentration) of Lys, Phe, and Thr; 50 mg liter⁻¹ (final concentration) of Ile, Leu, and Val; and 60 mg liter⁻¹ (final concentration) of selenomethionine were added to the medium, and the cells were further cultivated until A600 reached ~0.7. Protein expression was then induced as described above.
Harvested cells (~7 g of wet cells from 2 liters of culture) were disrupted by sonication in GST buffer (20 mM Tris-HCl (pH 8.0), 300 mM NaCl, and 1 mM DTT; 3 ml g⁻¹ of wet cells), and cell debris was removed by centrifugation. The supernatant was then mixed with 0.5 ml of GSH-Sepharose 4B (GE Healthcare) and incubated for 30 min at 4°C. The mixture was then applied to an open column and washed with 30 bed volumes of GST buffer. The beads were suspended with 125 µl of GST buffer, and the N-terminal GST tag was digested by reaction with human rhinovirus 3C protease at 4°C overnight. The cleaved protein was eluted with 2 bed volumes of GST buffer and then diluted with the same volume of 0.2 mM CoCl2 to give a NaCl concentration of 150 mM. The resulting solution was applied to a MonoQ HR 5/5 column (bed volume, 1 ml; GE Healthcare) equilibrated with 10 mM Tris-HCl (pH 8.0) containing 150 mM NaCl and 0.1 mM CoCl2. Proteins were eluted by increasing the NaCl gradient from 150 to 650 mM over 20 column volumes at a flow rate of 1.0 ml min⁻¹. The purified protein was concentrated to ~10 mg ml⁻¹ using a 4-ml Vivaspin concentrator (50-kDa cutoff, Vivascience) in 10 mM Tris-HCl (pH 8.0), 150 mM NaCl, and 0.1 mM CoCl2. Protein concentrations were measured using Bio-Rad DC protein assay dye and bovine γ-globulin as a standard.
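The dilution and gradient figures in this protocol follow from simple mixing arithmetic. The sketch below reproduces them under two stated assumptions that are not spelled out in the text: the GST buffer eluate contains no CoCl2, and the 0.2 mM CoCl2 diluent contains no NaCl. The helper name `mix` is illustrative.

```python
def mix(conc_1, vol_1, conc_2, vol_2):
    """Concentration after combining two solutions (same units for both inputs)."""
    return (conc_1 * vol_1 + conc_2 * vol_2) / (vol_1 + vol_2)

# Equal-volume dilution of the eluate (300 mM NaCl, assumed 0 mM CoCl2)
# with 0.2 mM CoCl2 (assumed 0 mM NaCl). Volumes are relative (1:1).
nacl_mM = mix(300.0, 1.0, 0.0, 1.0)    # NaCl after dilution
cocl2_mM = mix(0.0, 1.0, 0.2, 1.0)     # CoCl2 after dilution

# MonoQ gradient: 150 to 650 mM NaCl over 20 column volumes,
# 1-ml bed volume, 1.0 ml/min flow rate.
column_volume_ml = 1.0
gradient_volume_ml = 20 * column_volume_ml
gradient_time_min = gradient_volume_ml / 1.0
gradient_slope_mM_per_ml = (650.0 - 150.0) / gradient_volume_ml
```

Under these assumptions the diluted sample matches the column equilibration buffer in both NaCl (150 mM) and CoCl2 (0.1 mM), and the gradient rises by 25 mM NaCl per ml over 20 min.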

Gel filtration
To determine the quaternary structure of AiPEPCK, gel filtration was performed using a Superdex 200 (10/300) column equilibrated with 20 mM Tris-HCl (pH 8.0) supplemented with 150 mM NaCl at a flow rate of 1 ml min⁻¹. Gel filtration standard (Bio-Rad, catalog no. 1511901) was used as a standard.

Crystallization, data collection, and preliminary X-ray analysis
Crystallization experiments were performed using commercially available crystallization kits (Crystal Screen HT, Index HT (Hampton Research, Aliso Viejo, CA), and Wizard Screens I and II (Emerald BioSystems, Bainbridge Island, WA)) at 293 K in 96-well VIOLAMO sitting-drop protein crystallization plates (AS ONE Co., Osaka, Japan) with a Gryphon protein crystallization system (Art Robbins Instruments, Sunnyvale, CA). A sitting drop was prepared by mixing 0.2 µl of protein solution and 0.2 µl of reservoir solution and equilibrated against 40 µl of reservoir solution. After optimizing the crystallization conditions, the crystals obtained using a reservoir solution consisting of 100 mM HEPES-NaOH (pH 7.5), 30% (w/v) PEG 400, and 200 mM MgCl2 were used for data collection.
The selenomethionine-modified AiPEPCK crystals were soaked in crystallization solution supplemented with 25% (v/v) ethylene glycol as a cryoprotectant, picked up using Dual Thickness MicroMounts™ (MiTeGen, Ithaca, NY), and cooled in a liquid nitrogen stream. The X-ray diffraction data (1800 images) were collected using a PILATUS 6M detector on beamline BL-17A at the Photon Factory (Tsukuba, Japan) with the following parameters: wavelength, 0.97887 Å (Se peak); oscillation angle, 0.2°; exposure time, 0.5 s; crystal-to-detector distance, 485.7 mm.
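The collection parameters above imply a total rotation range and a total exposure time. A minimal sketch of that arithmetic follows, assuming each of the 1800 images covers one oscillation step and ignoring detector readout overhead; the variable names are illustrative.

```python
# Data-collection parameters as stated in the text.
n_images = 1800
oscillation_deg = 0.2   # rotation per image, in degrees
exposure_s = 0.5        # exposure per image, in seconds

# Total rotation swept and total exposure, assuming contiguous images
# and no overhead between frames.
total_rotation_deg = n_images * oscillation_deg
total_exposure_s = n_images * exposure_s
total_exposure_min = total_exposure_s / 60.0
```

Under these assumptions the data set spans a full 360° rotation, collected in about 15 min of exposure.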

The diffraction data were indexed, integrated, and scaled using XDS (51) and AIMLESS (52). The obtained crystal belonged to the space group P3₁21 with unit cell parameters of a = b = 160.4 Å and c = 200.2 Å. The initial model was solved by single-wavelength anomalous dispersion phasing using AutoSol of the PHENIX program suite (53). Iterative model building and refinement cycles were performed using COOT (54) and PHENIX.REFINE (55). The final model was refined by Refmac5 (56) with twin refinement (twin fraction of 0.062) and local noncrystallographic symmetry restraint to an Rwork of 0.196 and Rfree of 0.241. Data collection and refinement statistics are summarized in Table 1. All structures were depicted using PyMOL viewer (Schrödinger, Tokyo, Japan).
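The unit cell volume can be estimated from the reported cell lengths. The sketch below uses the general triclinic volume formula and assumes the conventional hexagonal-axes setting for a trigonal cell (α = β = 90°, γ = 120°); the cell angles are not stated in the text, and the function name `cell_volume` is illustrative.

```python
import math

def cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit cell volume (in Å^3 when lengths are in Å) from the
    general triclinic formula."""
    ca = math.cos(math.radians(alpha_deg))
    cb = math.cos(math.radians(beta_deg))
    cg = math.cos(math.radians(gamma_deg))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

# Reported cell lengths; the angles are the assumed hexagonal-axes values.
volume_A3 = cell_volume(160.4, 160.4, 200.2, 90.0, 90.0, 120.0)
```

With these inputs the volume comes to roughly 4.46 × 10⁶ Å³, which for this setting reduces to a²c·sin(120°).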

Construction of structure-based amino acid sequence alignment
6K31 (A. israelii PPi-PEPCK), 2PXZ (E. coli ATP-PEPCK), and 3DT7 (rat cytosolic GTP-PEPCK) (57) were used for structure-based amino acid sequence alignment. Because the PDB coordinate files lacked some amino acid residues, structures with complete sequences were reconstructed as follows. Complete sequences of each PDB structure were obtained from GenBank. Self-homology modeling with each complete sequence was performed using SWISS-MODEL (58). The first through seventeenth residues of AiPEPCK and the first through eighth residues of ATP-PEPCK were added manually because the program was unable to determine the topology.
The structure-based alignment of the reconstructed and full-length PEPCKs was performed using the MultiSeq plugin (59) in the VMD program (60) and visualized with ESPript 3.0 (61). Secondary structures were predicted using PyMOL.