In silico characterization of the citrate synthase family in Mycobacterium tuberculosis

Objective: Mycobacterium tuberculosis (MTB) is an obligately aerobic bacterial pathogen. Here, the citrate synthase (CS) family, an important component of aerobic respiration, was investigated in MTB. Methods: The MTB genome was analyzed in silico to identify the members of the CS family. The nucleotide and amino acid sequences were retrieved from the NCBI database and searched for similarity using the NCBI BLAST tool. Sequence alignment and phylogenetic analysis were performed using MEGA6. The physicochemical parameters, cellular localization, HMM profiles, motif structure, 3D models, and interactions of the proteins were analyzed using GPMAW, PSORTb, Pfam and SMART, MEME, Phyre2, and STRING, respectively. Results: The members of the CS family in MTB were identified as CitA, GltA2, and PrpC. CitA and PrpC were found to be phylogenetically closer to each other than to GltA2, and the trees of the three proteins were similar to the tree constructed from 16S rRNA in mycobacteria. CitA contains two CS domains, while a single CS domain is found in GltA2 and PrpC. In addition, the LHGGA and MGFGHRVY motifs are conserved in MTB and various other bacteria. The molecular weights of CitA, GltA2, and PrpC were calculated as 40.1, 47.9, and 42.9 kDa, and their pI values as 5.41, 5.35, and 9.31, respectively. All three proteins were predicted to be cytoplasmic. In the retrieved RNA-seq datasets obtained from the aerobic log phase of MTB H37Rv, gltA2 showed the highest expression, followed by prpC and citA. Conclusion: This comprehensive bioinformatics analysis of the CS family in MTB contributes to the knowledge of the genetics and physiology of this pathogen.


Introduction
Tuberculosis (TB) is a fatal infectious disease caused by the intracellular bacterial pathogen Mycobacterium tuberculosis (MTB) [1]. According to the World Health Organization (WHO) Global Tuberculosis Report 2014 [2], there were 9.0 million incident TB cases worldwide in 2013 and 1.5 million deaths, including 360,000 among people who were HIV-positive. Deciphering the genetics and physiology of the disease agent, MTB, will contribute to reducing the high rate of TB mortality.
The tubercle bacillus is an obligate aerobe in acute infection but can also survive indefinitely under hypoxic conditions, causing latent infection. Latency is associated with limited oxygen in the environment, in which MTB does not replicate or grows very slowly [3,4]. Moreover, respiration is an important component of MTB infection, giving the pathogen the flexibility to adapt to environmental stresses [5][6][7].
Citrate synthase (CS) (EC 2.3.3.1) catalyses the Claisen condensation of oxaloacetate and acetyl-coenzyme A to produce citrate and coenzyme A, the first step in the Krebs tricarboxylic acid (TCA) cycle [8][9][10]. CS enzymes are classified into two types according to their regulatory and structural properties. Type I CSs are dimeric molecules without allosteric regulation and are found in Gram-positive bacteria, archaea, and eukaryotes. Type II CSs, found in Gram-negative bacteria, have a homohexameric structure and are strongly inhibited by NADH [10,11]. There is only a single study on the structural analysis of a member of the CS family from MTB: Ferraris et al. [7] reported the crystal structure of the dimeric GltA2 protein, which has a typical common α-structure.
The increasing number of genome sequences from MTB strains and other mycobacteria enables the comparative bioinformatic analysis of gene families within a genome and among genomes from different sources. Here, the citrate synthase family of MTB, comprising the CitA, GltA2, and PrpC proteins, was characterized in silico in terms of phylogenetic relationships, physicochemical properties, cellular localization, HMM profiles, motif structure, 3D models, and interaction networks. Moreover, the expression profiles of citA, gltA2, and prpC in MTB under aerobic conditions were analyzed using retrieved RNA-seq datasets.

Sequence alignment and phylogenetic analysis
The similarity search for the sequences was performed using the NCBI Basic Local Alignment Search Tool (BLAST; http://blast.ncbi.nlm.nih.gov/Blast.cgi). Multiple alignment of the sequences was conducted with ClustalW, and the phylogenetic relationships were analyzed using the Neighbor-Joining method implemented in MEGA6 software [12] on the basis of uncorrected p-distance. The reliability of the phylogeny was tested with the bootstrap method using 1000 replicates [13]. The malate dehydrogenase protein from M. tuberculosis H37Rv (accession number NP_215756.1) was used as the outgroup.
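As an illustration of the distance measure named above, the following minimal Python sketch computes the uncorrected p-distance (the proportion of differing sites) between pairs of aligned sequences. It is not the MEGA6 implementation: the function names are hypothetical, the treatment of gaps (sites with a gap in either sequence are skipped, i.e. pairwise deletion) is an assumption because the gap option used is not stated here, and the three short sequences are invented toy data.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected p-distance: differing sites / compared sites.

    Assumption: sites with a gap in either sequence are ignored
    (pairwise deletion); the gap option used in the study is not stated.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped sites
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of named sequences."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not taken from the study).
    toy = {"seq1": "LHGGA-K", "seq2": "LHGGASK", "seq3": "MHGGAQK"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

A matrix of this kind is the input that a Neighbor-Joining procedure operates on; bootstrap support would then be obtained by resampling alignment columns with replacement and rebuilding the tree for each replicate.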

In silico analysis of the RNA-seq datasets
The RNA-seq datasets of wild-type M. tuberculosis H37Rv were retrieved from the NCBI Sequence Read Archive (SRA) database (http://www.ncbi.nlm.nih.gov/sra) under the accession numbers SRX727249, SRX727250, and SRX727251. The datasets corresponded to three biological replicates of log phase cultures grown under aerobic conditions. The transcripts representing citA, gltA2, and prpC were detected using the sequences encoding the LHGGA or MGFGHRVY motifs. Selected transcripts were verified by BLAST analysis. The number of transcripts for each CS family member was determined separately in each RNA-seq dataset, and its percentage with respect to the total number of transcripts in that dataset was calculated.
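The counting step described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study, whose software is not specified. It assumes that reads are already available as plain nucleotide strings, that a read is assigned to a gene when it contains either motif-encoding sequence or its reverse complement, and it uses the gene-specific motif-encoding sequences reported in the expression results of this paper. The function names and the example reads are hypothetical.

```python
# Motif-encoding sequences (LHGGA, MGFGHRVY) per gene, as reported in the
# expression results of this paper.
MOTIF_DNA = {
    "citA": ("ctgcatggtggcgcg", "atggggttcgggcaccgggtctac"),
    "gltA2": ("cttcatggcggcgcc", "atgggtttcggtcatcgtgtctac"),
    "prpC": ("ctacacggcggcgcc", "atgggcttcgggcatcgggtgtac"),
}

_COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (returned in lower case)."""
    return seq.lower().translate(_COMPLEMENT)[::-1]


def count_motif_reads(reads):
    """Count reads per gene and express them as % of all reads.

    Assumption: a read is counted for a gene if it contains one of that
    gene's motif-encoding sequences in either orientation.
    """
    probes = {
        gene: [p for m in motifs for p in (m, reverse_complement(m))]
        for gene, motifs in MOTIF_DNA.items()
    }
    counts = {gene: 0 for gene in MOTIF_DNA}
    total = 0
    for read in reads:
        total += 1
        read = read.lower()
        for gene, gene_probes in probes.items():
            if any(p in read for p in gene_probes):
                counts[gene] += 1
    percentages = {
        gene: (100.0 * n / total if total else 0.0)
        for gene, n in counts.items()
    }
    return counts, percentages


if __name__ == "__main__":
    # Hypothetical reads for demonstration only.
    toy_reads = [
        "aaacttcatggcggcgccttt",                 # gltA2, forward strand
        reverse_complement("ctgcatggtggcgcg"),   # citA, reverse strand
        "acgtacgtacgt",                          # no motif
        "atgggcttcgggcatcgggtgtac",              # prpC, forward strand
    ]
    print(count_motif_reads(toy_reads))
```

In practice the reads would be streamed from the FASTQ files of each replicate, and candidate reads would still need the BLAST verification described above, since short probes can match unrelated transcripts.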

Phylogenetic relationships of the MTB citrate synthase family
Aerobic respiration is critical for the pathogenesis of M. tuberculosis (MTB) [7]. Investigation of the MTB H37Rv genome for the citrate synthase (CS) family revealed three members, namely citA, gltA2, and prpC (Table 1). The CitA, GltA2, and PrpC proteins are composed of 373, 431, and 393 amino acids, respectively. The CS family members in MTB play a role in the pathogenesis of this microorganism. Muñoz-Elías et al. [20] showed that PrpC is involved in the replication of MTB in macrophages. Moreover, Baek et al. [21] reported that the overexpression of citA in MTB caused an increase in antibiotic sensitivity. The number studies [...] alignment tools such as MUSCLE, MAFFT or T-Coffee. The 'W' is the abbreviation of 'weights', which provide sensitive and efficient alignment of large sets of sequences even from unrelated organisms. Another advantage of ClustalW is its free and user-friendly graphical interface, which is compatible with diverse operating systems and desktop environments [23,24].
The alignment of the MTB CS proteins exhibited 62 conserved residues, and CitA and PrpC were found to be more closely related to each other phylogenetically than to GltA2 (Fig. 1). The phylogenetic relationships of the CitA (Fig. 2a), GltA2 (Fig. 2b), and PrpC (GltA1) (Fig. 2c) proteins among the mycobacteria were also investigated and compared with the phylogenetic tree constructed from 16S rRNA genes (Fig. 2d). A similar pattern was observed in all four phylogenetic trees. The amino acid sequences of the CS family members from the species of the MTB complex, namely M. tuberculosis, M. bovis, M. africanum, and M. canettii, were grouped together, and those from M. marinum, M. ulcerans, and M. liflandii, which belong to the same mycobacterial complex [25], were shown to be closely related. The alignment (Fig. 3a) and phylogenetic analysis (Fig. 3b) of the CS members from diverse organisms exhibited a pattern similar to their evolutionary relationship. However, Bond et al. [26] reported that the CS sequences in Geobacteraceae have more similarity to eukaryotic ones as observed in 16S rRNA sequences.

Structural properties of the CS family members in MTB
The protein family analysis revealed that CitA contains two hidden Markov model (HMM) CS domains, while a single CS domain was detected in GltA2 and PrpC (Table 2). Additionally, the amino acid sequences of the MTB CS family members were analyzed to determine the conserved motifs, and three motifs covering LHGGA and MGFGHRVY were revealed (Fig. 4a). The motifs are mainly located in the C-terminal region of the proteins (Fig. 4b). The BLAST analysis also showed that the GxIxAxxGxLHGGA and KxMGFGHRVYxxxDxR motifs are conserved in the MTB CS members as well as in those from bacteria such as Corynebacterium, Rhodococcus, Helicobacter, Bacillus, and Streptomyces. The alignment of the CSs from distant organisms (with GltA2 representing MTB) revealed that GxxHGxA and GxGHxxxxxxDPR are conserved phylogenetically (Fig. 3a).
The free, online, and user-friendly portal Protein Homology/analogY Recognition Engine v2.0 (Phyre2) was utilized for the 3D modeling of the CitA (Fig. 5a), GltA2 (Fig. 5b), and PrpC (Fig. 5c) proteins. For CitA and GltA2, the structure of Acetobacter aceti citrate synthase complexed with oxaloacetate and carboxymethyldethia coenzyme A (CMX) was used as the template. For PrpC, the crystal structure of methylcitrate synthase from MTB was used. The tertiary structures of the CS monomers in MTB are mainly composed of α-helices with a few β-sheets. Ferraris et al. [7] reported that GltA2 from MTB is a dimeric protein with an unstructured N-terminal region and consists of eight α-helices and two small antiparallel β-sheets. The CS from Fasciola hepatica, which is 469 amino acids long, also has a predominantly α-helical fold [10].
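The conserved motif patterns reported above (GxIxAxxGxLHGGA and KxMGFGHRVYxxxDxR, where x stands for any residue) can be searched in a protein sequence with regular expressions. The Python sketch below is only an illustration of that idea and is unrelated to the MEME or BLAST tools used in the study. The function names are hypothetical, and the test peptide is an invented sequence built to contain both patterns; it is not a real MTB protein.

```python
import re

# Motif patterns as written in the text; lower-case "x" means any residue.
CS_MOTIFS = ("GxIxAxxGxLHGGA", "KxMGFGHRVYxxxDxR")


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate an 'x'-wildcard motif into a compiled regular expression."""
    return re.compile("".join("." if c == "x" else re.escape(c) for c in motif))


def find_motifs(protein: str, motifs=CS_MOTIFS):
    """Return (motif, 1-based start, matched segment) for each hit.

    Note: re.finditer reports non-overlapping matches only.
    """
    protein = protein.upper()
    hits = []
    for motif in motifs:
        for match in motif_to_regex(motif).finditer(protein):
            hits.append((motif, match.start() + 1, match.group()))
    return hits


if __name__ == "__main__":
    # Hypothetical peptide constructed to carry both motifs.
    toy_protein = "MT" + "GAISALVGTLHGGA" + "NEE" + "KVMGFGHRVYKNYDPR" + "AL"
    for hit in find_motifs(toy_protein):
        print(hit)
```

Reporting the start position of each hit also makes it straightforward to check whether a motif falls in the C-terminal part of a sequence, as described for the MTB proteins.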
Analysis of the physicochemical parameters of the CS family proteins in MTB showed that the predicted molecular weights (MW) of the CitA, GltA2, and PrpC monomers are approximately 40.1, 47.9, and 42.9 kDa, while their isoelectric points (pI) were found to be 5.41, 5.35, and 9.31, respectively (Table 3). The MW and pI of the CS from Geobacter sulfurreducens were predicted as 49.8 kDa and 6.46, respectively [26]. The CS from F. hepatica has a MW of 52 kDa and a pI of 8.1 [10]. Min et al. [27] reported that CitA from Aspergillus nidulans is 474 amino acids long with a MW of 52.2 kDa. Moreover, Cheung et al. [28] characterized an iron-regulated CS (SbnG) from Staphylococcus aureus. The SbnG protein is 259 amino acids long with a MW of 28.7 kDa and a pI of 5.46.
The cellular localization of the proteins was predicted as cytoplasmic, with a score of 9.97 for CitA and GltA2, and 9.95 for PrpC. The interaction networks of the CitA (Fig. 6a), GltA2 (Fig. 6b), and PrpC (Fig. 6c) proteins, which have citrate synthase function, revealed connections with other proteins involved in the TCA cycle.
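For orientation, the molecular weight of a protein can be estimated from its sequence by summing average residue masses and adding one water molecule for the free termini. The Python sketch below illustrates this general calculation; it is not the GPMAW procedure used in the study, the function name is hypothetical, and the mass table contains standard average residue masses (in Da) entered here as an assumption. It is demonstrated on the two conserved motif peptides rather than on the full proteins, whose sequences are not reproduced in this text.

```python
# Average residue masses in Da (amino acid minus water); standard values,
# supplied here as an assumption and not taken from the study.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the N- and C-terminal groups


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    unknown = set(protein) - AVERAGE_RESIDUE_MASS.keys()
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER_MASS


if __name__ == "__main__":
    for peptide in ("LHGGA", "MGFGHRVY"):
        print(peptide, round(molecular_weight(peptide), 2), "Da")
```

As a rough consistency check on the values reported above, dividing each predicted MW by the protein length gives about 108 to 111 Da per residue for CitA, GltA2, and PrpC, which is in the range expected for typical proteins.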

Gene expression profiles of the CS family members in MTB H37Rv
Transcriptome-wide expression analysis of citA, gltA2, and prpC was conducted using the RNA-seq datasets obtained from wild-type MTB H37Rv grown under aerobic conditions. The nucleotide sequences encoding the LHGGA and MGFGHRVY motifs differ among the CS family members: LHGGA is encoded by ctgcatggtggcgcg, cttcatggcggcgcc, and ctacacggcggcgcc, and MGFGHRVY by atggggttcgggcaccgggtctac, atgggtttcggtcatcgtgtctac, and atgggcttcgggcatcgggtgtac in CitA, GltA2, and PrpC, respectively. These sequences and their reverse complements were used to determine the number of transcripts representing citA, gltA2, and prpC in the three RNA-seq datasets from biological replicates. The number of transcripts (Fig. 7a) and their percentages with respect to the total number of transcripts in each dataset (Fig. 7b) are shown. The highest number of transcripts, 458, was detected for gltA2, followed by 143 and 105 transcripts for prpC and citA, respectively. The rank order was the same in all of the datasets.
This comprehensive in silico analysis provides a closer look into the CS family in MTB, which should give better insight into the genetics and physiology of this infectious agent. The expression profiles of the CS genes in MTB under limited oxygen conditions should also be investigated to clarify their role in the latency of the pathogen. In addition, the subcellular localization of the proteins can be visualized by tagging them with a recombinant reporter protein such as GFP. Recently, isocitrate lyase, another TCA cycle enzyme from MTB, was shown to play a role in the antibiotic tolerance of the pathogen [29]. As a future perspective, the involvement of the CSs in the antibiotic tolerance of MTB may be investigated. Moreover, an attenuated strain obtained via inactivation of a CS member in MTB might be used for vaccine development against tuberculosis.