Research Article

A bioinformatics insight to rhizobial globins: gene identification and mapping, polypeptide sequence and phenetic analysis, and protein modeling.

[version 1; peer review: 2 approved]
PUBLISHED 13 May 2015

This article is included in the Oxygen-binding and sensing proteins collection.

Abstract

Globins (Glbs) are proteins widely distributed in organisms. Three evolutionary families have been identified in Glbs: the M, S and T Glb families. The M Glbs include flavohemoglobins (fHbs) and single-domain Glbs (SDgbs); the S Glbs include globin-coupled sensors (GCSs), protoglobins and sensor single domain globins; and the T Glbs include truncated Glbs (tHbs). Structurally, the M and S Glbs exhibit the 3/3-fold whereas the T Glbs exhibit the 2/2-fold. Glbs are widespread in bacteria, including several rhizobia. However, only a few rhizobial Glbs have been characterized. Hence, we characterized Glbs from 62 rhizobial genomes using bioinformatics methods: database mining, sequence alignment, phenogram construction and protein modeling. We also analyzed soluble extracts from Bradyrhizobium japonicum USDA38 and USDA58 by (reduced + carbon monoxide (CO) minus reduced) differential spectroscopy. Database searching showed that only fhb, sdgb, gcs and thb genes exist in the rhizobia analyzed in this work. Promoter analysis revealed that several rhizobial glb genes are apparently not regulated by a -10 promoter but might be regulated by -35 and Fnr (fumarate-nitrate reduction regulator)-like promoters. Mapping analysis revealed that rhizobial fhbs and thbs are flanked by a variety of genes, whereas several rhizobial sdgbs and gcss are flanked by genes coding for proteins involved in nitrate and nitrite metabolism and in chemotaxis, respectively. Phenetic analysis showed that rhizobial Glbs segregate into the M, S and T Glb families, while structural analysis showed that the predicted rhizobial SDgbs, fHbs and GCS globin domains adopt the 3/3-fold and the predicted tHbs adopt the 2/2-fold. Spectra from B. japonicum USDA38 and USDA58 soluble extracts exhibited peaks and troughs characteristic of bacterial and vertebrate Glbs, indicating that putative Glbs are synthesized in B. japonicum USDA38 and USDA58.

Keywords

Burkholderia, Cupriavidus, flavohemoglobin, globin-coupled sensor, Rhizobium, single-domain globin, truncated (2/2) hemoglobin

Introduction

Globins (Glbs) are proteins widely distributed in organisms from the three domains of life, i.e. Archaea, Eubacteria and Eukarya [1]. Structurally, Glbs fold into a tertiary structure known as the globin fold. This fold consists of six to eight α-helices (designated with letters A to H) that form a hydrophobic pocket where a heme prosthetic group is located [2]. Two structural types of the globin fold have been identified in Glbs: the 2/2- and 3/3-fold. In the 2/2-Glbs, helices B and E overlap helices G and H [3], and in the 3/3-Glbs helices A, E and F overlap helices B, G and H [4,5]. Likewise, three evolutionary families have been identified in Glbs [6,7]: the M, S and T Glb families. The M Glbs include flavohemoglobins (fHbs) and single-domain Glbs (SDgbs); the S Glbs include globin-coupled sensors (GCSs), protoglobins and sensor single domain globins; and the T Glbs include truncated Glbs (tHbs), which are further classified into class 1, class 2 and class 3 tHbs. Canonical tHbs are ~20 to 40 amino acids shorter than the globin fold, resulting in an almost absent helix A and a helix F that is reduced to a single turn [8,9]. The M and S Glbs fold into the 3/3-fold whereas the T Glbs fold into the 2/2-fold.

A variety of gaseous ligands bind to the heme Fe of Glbs, most notably O2 and nitric oxide (NO). The reversible binding of O2 is associated with the major function of Glbs in organisms: the transport of O2. Binding of NO by oxygenated Glbs is essential to NO-detoxification via NO-dioxygenase activity [10,11]. Several additional functions have been reported for Glbs, including dehaloperoxidase activity and reaction with free radicals, binding and transport of sulfide and lipids, and O2-sensing (reviewed by Giardina et al. [12] and Vinogradov et al. [13]). This indicates that in vivo, Glbs might be multifunctional proteins.

Glbs are widespread in bacteria. A comprehensive genomic analysis revealed that glb genes belonging to the M, S and T Glb families exist in the genomes of 1185 Eubacteria, including several rhizobial genomes [7]. However, only a few rhizobial glb genes have been characterized. Characterizing rhizobial Glbs is of interest because rhizobia establish symbiotic relationships with leguminous plants. A result of this plant-microbe interaction is the symbiotic fixation of atmospheric N2, which occurs within specialized plant organs called nodules [14]. Symbiotic N2-fixation is modulated by a variety of factors, such as the O2 [15] and NO [16,17] levels in the surrounding environment. Glbs bind O2 and NO and thus may function in some aspects of N2-fixation, e.g. by transporting O2 and detoxifying NO. Modulation of O2 levels in the plant cell cytoplasm from nodules is well characterized [18,19]. A plant Glb, leghemoglobin (Lb), that is synthesized at high (~3 to 5 mM) concentrations in nodules apparently facilitates O2-diffusion to the symbiotic rhizobia and maintains low (submicromolar) concentrations of O2 within nodules. This is essential for sustaining the (micro)aerobic respiration of symbiotic rhizobia and preventing the inactivation by O2 of nitrogenase (which fixes atmospheric N2 into NH4+). The binding and metabolizing of NO by Lb and other Glbs is also well documented [11,20]. Thus, a likely function for Lb in nodules is to detoxify the NO that is generated during plant infection by rhizobia [21]. However, little is known about the properties and functions of Glbs in either symbiotic or free-living rhizobia.

Forty-six years ago Appleby [22] was the first to propose the existence of Glbs in rhizobia. This author detected absorption peaks and troughs that are characteristic of Glbs in differential (dithionite reduced + CO minus dithionite reduced) spectra of soluble extracts from Bradyrhizobium japonicum 505 (Wisconsin). Subsequent spectroscopic analyses suggested the existence of soluble Glbs in Rhizobium leguminosarum bv. viciae [23], B. japonicum NPK63 [24] and R. etli CE3 [25]. The first rhizobial glb gene was identified in the pSymA megaplasmid of Sinorhizobium meliloti 1021 [26]. BLAST analysis revealed that this gene corresponded to an fhb gene and thus it was named smfhb. A bioinformatics analysis showed that smfhb is flanked by nos and fix genes (which code for denitrification enzymes, and for high O2-affinity terminal oxidases and an O2-sensor, respectively) and that it is apparently regulated by an Fnr-like promoter. These observations suggested that smfhb is regulated by the concentration of O2 and that SmfHb functions in some aspects of nitrogen metabolism. A transcriptomic analysis of the S. meliloti response to NO in culture showed that smfhb (also designated as S. meliloti hmp) is upregulated by NO, and a smfhb- mutant exhibited high sensitivity to NO in culture and reduced N2-fixation efficiency in planta. These observations suggested that SmfHb functions in some aspects of NO metabolism, possibly by detoxifying NO [27].

Genomic analysis reported by Vinogradov et al. [7] revealed that Glb sequences exist in several rhizobia. However, in spite of the above reports, knowledge of rhizobial Glbs is quite limited. Hence, in order to obtain information on the properties of rhizobial Glbs, we characterized Glb sequences from selected rhizobial genomes using bioinformatics methods. These included gene characterization, polypeptide sequence and phenetic analysis, and protein modeling. We also analyzed soluble extracts from B. japonicum USDA38 and USDA58 by differential spectroscopy. Our main results showed that only fhb, sdgb, gcs and thb genes exist in the rhizobia analyzed in this work; that several rhizobial glb genes are not regulated by a -10 promoter but might be regulated by -35 and Fnr-like promoters; that rhizobial fhbs and thbs are flanked by a variety of genes, whereas several rhizobial sdgbs and gcss are flanked by genes coding for proteins involved in nitrate and nitrite metabolism and in chemotaxis, respectively; that rhizobial Glbs segregate into the M, S and T Glb families; that the predicted rhizobial SDgbs, fHbs and GCS globin domains adopt the 3/3-fold and the predicted tHbs adopt the 2/2-fold; and that spectra from B. japonicum USDA38 and USDA58 soluble extracts exhibit peaks and troughs characteristic of bacterial and vertebrate Glbs.

Methods

Database search

Putative Glb sequences and Glb domains were identified in databases (Table S1) containing the genomes of rhizobial species and strains. The query sequences were S. meliloti fHb; Vitreoscilla SDgb; Agrobacterium tumefaciens GCS; Methanosarcina acetivorans protoglobin; Methylacidiphilum infernorum sensor single domain globin; Mycobacterium tuberculosis tHb class 1; A. tumefaciens tHb class 2; and M. avium tHb class 3 (Genbank accession numbers AY328026, AAA75506, NP_354049, 2VEB_A, YP_001939425, NP_216058, WP_020813663 and BAN32501, respectively). The SUPERFAMILY database (http://supfam.mrc-lmb.cam.ac.uk) [28] was also searched. Resulting sequences were subjected to a FUGUE analysis (http://tardis.nibio.go.jp/fugue/prfsearch.html) [29] to determine the most similar Glb structure and the presence of a proximal His (H) at the myoglobin-fold position F8. Putative Glbs had to satisfy the following criteria: a length of approximately 100 amino acids or more, a FUGUE Z score higher than 6 (which corresponds to 99% specificity [29]) with known Glb structures, and the presence of a proximal H at position F8.
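The three acceptance criteria above can be expressed as a simple filter. The following is a minimal sketch, not the authors' pipeline: the `Candidate` record, its field names and the example hits are hypothetical, and the "~100 amino acids" criterion is treated as a hard cutoff of 100 for illustration only.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria stated in the text.
MIN_LENGTH = 100     # "~100 amino acids", treated here as a hard cutoff (assumption)
MIN_FUGUE_Z = 6.0    # the FUGUE Z score must be higher than 6


@dataclass
class Candidate:
    """Hypothetical record for one database hit; field names are illustrative."""
    name: str
    sequence: str
    fugue_z: float
    f8_residue: str  # residue aligned to myoglobin-fold position F8


def is_putative_glb(hit: Candidate) -> bool:
    """Return True when a hit meets all three selection criteria."""
    long_enough = len(hit.sequence) >= MIN_LENGTH
    confident_fold = hit.fugue_z > MIN_FUGUE_Z
    proximal_his = hit.f8_residue.upper() == "H"
    return long_enough and confident_fold and proximal_his


if __name__ == "__main__":
    # Synthetic hits, not data from the study.
    hits = [
        Candidate("hit_a", "M" * 140, 12.3, "H"),  # passes all criteria
        Candidate("hit_b", "M" * 140, 4.1, "H"),   # Z score too low
        Candidate("hit_c", "M" * 60, 9.0, "H"),    # too short
        Candidate("hit_d", "M" * 140, 9.0, "E"),   # no proximal His at F8
    ]
    print([h.name for h in hits if is_putative_glb(h)])
```

Keeping each criterion as a named boolean makes it easy to report which test a rejected hit failed.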

Gene mapping and detection of promoter sequences

Scaffolds containing copies of the glb gene were used for mapping glbs. Mapping included the detection of open reading frames (ORFs) ~5 kb up- and downstream of glbs, together with ORF length, transcription direction and localization in the + or - strand. Canonical (-10 and -35) and Fnr [30] promoter sequences and Shine-Dalgarno sequences were searched within 130 nucleotides upstream of the rhizobial glb genes, either by using the search tool of MS Word® or by pairwise sequence alignments using the ClustalX program (http://www.clustal.org/clustal2/) [31].
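A motif search of this kind can also be scripted. The sketch below is an illustration under stated assumptions, not the procedure used in this work: the consensus strings are the textbook Escherichia coli -35 (TTGACA), -10 (TATAAT) and Fnr-box (TTGAT-N4-ATCAA) sequences, which may differ from the reference sequences the authors searched for, and `scan_upstream` and its mismatch tolerance are hypothetical.

```python
UPSTREAM_WINDOW = 130  # nucleotides searched upstream of each glb gene (from the text)

# Assumed textbook E. coli consensus motifs; N matches any base.
MOTIFS = {
    "-35": "TTGACA",
    "-10": "TATAAT",
    "Fnr": "TTGATNNNNATCAA",
}


def mismatches(site: str, motif: str) -> int:
    """Count positions where site differs from motif, ignoring N positions."""
    return sum(1 for s, m in zip(site, motif) if m != "N" and s != m)


def scan_upstream(upstream: str, max_mismatches: int = 1):
    """Return (motif name, offset from gene start, matched site) tuples.

    `upstream` is the sequence immediately 5' of the start codon, written
    5'->3'; only its last UPSTREAM_WINDOW nucleotides are searched.
    """
    region = upstream.upper()[-UPSTREAM_WINDOW:]
    hits = []
    for name, motif in MOTIFS.items():
        for i in range(len(region) - len(motif) + 1):
            site = region[i:i + len(motif)]
            if mismatches(site, motif) <= max_mismatches:
                # Negative offset: position of the site start relative to the start codon.
                hits.append((name, i - len(region), site))
    return hits


if __name__ == "__main__":
    # Synthetic upstream region carrying one exact Fnr-like box.
    demo = "GC" * 20 + "TTGATCCGGATCAA" + "GC" * 10
    print(scan_upstream(demo, max_mismatches=0))
```

Reporting offsets relative to the start codon makes it straightforward to check whether a hit lies at a plausible distance for a promoter element.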

Protein sequence alignments and phenetic analysis

Pairwise and multiple sequence alignments were performed using the ClustalX program [31]. The multiple sequence alignment was manually verified using the procedure described by Kapp et al. [32] based on the myoglobin-fold [33]. A phenogram was constructed from the aligned sequences using the UPGMA method from the ClustalX program. The resulting phenogram was edited using the iTOL program (http://itol.embl.de/) [34].
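For readers unfamiliar with UPGMA, the clustering step can be sketched in a few lines. This is a toy illustration and not the ClustalX implementation: the distance measure (fraction of differing residues over ungapped columns) is an assumption, branch lengths are omitted, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned: dict) -> str:
    """Cluster aligned sequences by UPGMA and return a Newick-style topology."""
    # Each cluster is keyed by its Newick label and stores its member count.
    sizes = {name: 1 for name in aligned}
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance to every other cluster is the size-weighted average.
        new_dist = {}
        for other in sizes:
            if other in pair:
                continue
            da = dist[frozenset((a, other))]
            db = dist[frozenset((b, other))]
            new_dist[frozenset((merged, other))] = (
                (sizes[a] * da + sizes[b] * db) / (sizes[a] + sizes[b])
            )
        # Drop entries involving the joined clusters, then add the merged one.
        dist = {k: v for k, v in dist.items() if not (k & pair)}
        dist.update(new_dist)
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


if __name__ == "__main__":
    # Synthetic aligned sequences, not data from the study.
    print(upgma({"A": "AAAA", "B": "AAAT", "C": "TTTT"}))
```

Because UPGMA groups sequences by overall similarity rather than by inferred ancestry, its output is a phenogram, which matches the phenetic framing used in the text.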

Modeling and analysis of predicted protein tertiary structures

The tertiary structure of rhizobial Glbs was modeled using the automated mode of the I-TASSER server (http://zhanglab.ccmb.med.umich.edu/I-TASSER/) [35-37], which also provided the best structural homologs to the query sequences. Models were edited using the VMD program (http://www.ks.uiuc.edu/Research/vmd/) [38] and Adobe Photoshop® software. Distances and dihedral angles of amino acids at the heme prosthetic group were calculated using the distance and dihedral tools of the SwissPDBViewer program (http://spdbv.vital-it.ch/) as described by Gopalasubramaniam et al. [39] and Sáenz-Rivera et al. [40], respectively.
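The two geometric quantities reported for each residue can be computed directly from atomic coordinates. The sketch below is a generic illustration and not the SwissPDBViewer implementation; it assumes coordinates are given as (x, y, z) tuples in Å, and the function names are hypothetical. For backbone angles, the four atoms would be C(i-1), N, CA, C for phi; N, CA, C, N(i+1) for psi; and CA, C, N(i+1), CA(i+1) for omega.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def distance(p, q):
    """Euclidean distance between two atoms, in the units of the input."""
    return math.dist(p, q)


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four atoms."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = math.sqrt(_dot(b2, b2))
    # atan2 form: y = |b2| * (b1 . n2), x = n1 . n2
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Synthetic coordinates, not atoms from any model in this study.
    print(distance((0, 0, 0), (3, 4, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The `atan2` form is used because it returns a signed angle over the full circle and stays numerically stable near 0° and 180°, where an `acos`-based formula loses precision.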

Bacterial growth, cell rupture and spectral analysis

Bradyrhizobium japonicum USDA38 and USDA58 were kindly provided by Drs. Donald Keister and Douglas Jones (United States Department of Agriculture, USA). All reagents were purchased from Sigma-Aldrich (St. Louis MO, USA). B. japonicum cells were grown in YM (Yeast Mannitol) broth (per 100 ml: KH2PO4, 50 mg; MgSO4, 20 mg; NaCl, 10 mg; mannitol, 1 g; yeast extract, 50 mg; pH 7.0) for 3 to 5 days at 30°C with shaking at 200 rpm. Cells were harvested by centrifugation at 11,000 × g, and the pellets were resuspended in 50 mM Na-phosphate buffer (pH 7.2) containing 1 mM EDTA and 1 mM phenylmethylsulfonyl fluoride (PMSF). Cells were disrupted by sonication at maximum power (three cycles of 1 min each on ice), followed by overnight incubation at 4°C with gentle agitation after the addition of DNAse I (40 U/ml), RNAse A (3 U/ml) and lysozyme (2 mg/ml). The resulting solution was cleared by centrifugation at 22,000 × g for 40 min at 4°C, and the supernatant was fractionated with solid ammonium sulphate between 35 and 65% saturation. The resulting pellet was resuspended in 5 ml of 50 mM Na-phosphate buffer (pH 7.2) containing 1 mM EDTA and 1 mM PMSF and dialyzed for 18 h against the same buffer to remove excess salts. Aliquots of 0.5 to 1 ml of the dialyzed solution were used to obtain the dithionite reduced + CO minus dithionite reduced differential spectra in a Beckman DU6 spectrophotometer. Control spectra were obtained from commercial (Sigma-Aldrich) preparations of sperm whale myoglobin and bovine blood hemoglobin.
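The arithmetic behind a (reduced + CO minus reduced) difference spectrum is a point-by-point subtraction followed by a search for peaks and troughs. The sketch below illustrates this under simple assumptions: each scan is a mapping from wavelength (nm) to absorbance, the function names are hypothetical, and the example values are synthetic rather than measurements from this study.

```python
def difference_spectrum(reduced_co, reduced):
    """Subtract the reduced spectrum from the reduced + CO spectrum.

    Both arguments map wavelength (nm) to absorbance; only wavelengths
    present in both scans are kept.
    """
    shared = sorted(set(reduced_co) & set(reduced))
    return {wl: reduced_co[wl] - reduced[wl] for wl in shared}


def extrema(spectrum):
    """Return (peaks, troughs): wavelengths of local maxima and minima."""
    wls = sorted(spectrum)
    peaks, troughs = [], []
    for prev, cur, nxt in zip(wls, wls[1:], wls[2:]):
        a, b, c = spectrum[prev], spectrum[cur], spectrum[nxt]
        if b > a and b > c:
            peaks.append(cur)
        elif b < a and b < c:
            troughs.append(cur)
    return peaks, troughs


if __name__ == "__main__":
    # Synthetic absorbance values for illustration only.
    reduced_co = {410: 0.30, 420: 0.50, 430: 0.20, 440: 0.10, 450: 0.15}
    reduced = {410: 0.30, 420: 0.20, 430: 0.25, 440: 0.40, 450: 0.15}
    diff = difference_spectrum(reduced_co, reduced)
    print(extrema(diff))
```

Real scans are noisy, so in practice some smoothing or a minimum-amplitude threshold would be needed before calling a local extremum a peak or trough.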

Dataset 1. Globin genes detected in the genomes of rhizobial bacteria.
Globin nomenclature corresponds to the first three binomial (genus and species) letters followed by the strain name, globin type and gene copy number. URLs indicate links to individual glb gene sequences [56].
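The naming rule described above is mechanical enough to express as a small helper. This is an illustrative sketch only: the function `glb_name` and its parameter names are hypothetical, and the copy number is treated as optional because single-copy entries in the listing carry no number.

```python
def glb_name(genus, species, strain, glb_type, copy=None):
    """Build a globin label: first three letters of genus and species,
    then the strain name, the globin type and, if given, the copy number."""
    prefix = genus[:3].capitalize() + species[:3].lower()
    suffix = "" if copy is None else str(copy)
    return f"{prefix}{strain}{glb_type}{suffix}"


if __name__ == "__main__":
    print(glb_name("Sinorhizobium", "meliloti", "1021", "fHb"))
    print(glb_name("Rhizobium", "etli", "CIAT652", "GCS", 1))
```

For example, Sinorhizobium meliloti strain 1021 with an fHb yields the label Sinmel1021fHb, the form used in the data listings.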
Data set 2. Predicted Glb polypeptides detected in the genomes of rhizobial bacteria. Globin nomenclature corresponds to the first three binomial (genus and species) letters followed by the strain name, globin type and globin copy number. URLs indicate links to individual Glb polypeptide sequences.
Glbs
fHbs
α-rhizobia
RhilegUPM1137fHb
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513574459
MPKTLSSETVAAVKATISALDEHGAVITAAMYRRLFEDAEIAALFNQSNQ
KSGTQIHALAAAILAYARNIESLAALGPAVERIAQKHIGYAILPDHYPHV
ATALLGAIEEVLGGAATPDVLTAWGEAYWFLADILKGREAAIRDDLLSKA
GGWTGWRRFVFAERRQESETITSFILRPQDGGRVLRHKPGQYLTFRFDAA
GREGLKRNYSISCAPNDEHYRISVKREPQGDASVYLHDEASAGTVVECTP
PAGDFFLSDPPQRPVVLLSGGVGLTPMVSILEALAEKHAGHPTFYIHGTA
SRATHAFDSHVKILAARQQATSVATFYDQSSDEAEVHSGYISFEWLLANT
PFMEADFYICGPRPFMRFFVSGLTQAGVSADRIHYEFFGPTDEVLAA
Sinmel1021fHb
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=637180064
MLTQKTKDIVKATAPVLAQHGYAIIQHFYKRMFQAHPELKDIFNMAHQER
GEQQQALARAVYAYAANIENPESLSAVLKDIAHKHASIGVRPEQYPIVGE
HLLASIKEVLGDAATDEIISAWAQAYGNLADILAGMESELYGRSEERAGG
WAGWRRFIVREKNPESDVITSFVLEPADGGPVADFEPGQYTSVAVQVPKL
GYQQIRQYSLSDSPNGRSYRISVKREDGGLGTPGYVSSLLHDEINVGDEP
KLAAPYGNFYIDVSATTPIVLISGGVGLTPMVSMLKKALQTPPRKVVFVH
GARNSAVHAMRDRLKEASRTYPDFKLFIFYDEPLPTDIEGRDYDFAGLVD
VENVKDSILLDDADYYICGPVPFMRMQHDKLLGLGITEARIHYEVFGPDL
FAE
β-rhizobia
BurphySTM815fHb
https://img.jgi.doe.gov/cgi-bin/er/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=642595788
MLSAEHRAIVKATVPLLESGGEALTTHFYKVMLSEYPSVRPLFNQAHQQS
GDQPRALANAVLMYARHIEQLEQLGGLVSQIVNKHVALNILPEHYPIVGT
CLLRAIREVLGPEIATDAVIEAWGAAYGQLADLLIGLEEKVYVEKETSKG
GWRGTRPFVVARKVKESDEITSFYLRPADGGDVLEFQPGQYIGLRLIVDG
EEIRRNYSLSAAANGREYRISVKREPNGKGSNYLHDVVKEGDTLDLYAPS
GDFTLEHSDKPLVLISGGVGITPTLAMLNAALQTSRPIHFIHATRHGGVH
AFRDAIDELAARHPQLKRFYVYEKPRQQDDAHHAEGFIDEDRLIEWMPAT
RDVDVYFLGPKPFMKAVKRHLKAIGVPEKQSRFEFFGPAAALD
CupnecN1fHb1
https://img.jgi.doe.gov/cgi-bin/er/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=650995360
MLSAASRPYIDASVPVLREHGLAITTHFYREMFADRPELTQMFNMGNQAN
GSQQQSLASAVFAYAANIDNAAALGPVLERIVHKHAAVGLTPAHYPIVGR
HLLGAISAVLGEAATPPLLAAWDEAYWLLAGELIAAEARLYQRTGVAAGE
LTPVRVVRREAQGDQVVALTLAAADGQPLRAFRPGQYISVEARLDDGQRQ
LRQYSLSAESGLPTWRISVKREAGDRTTPAGAVSNWLHANAQVGTELKVS
APFGEFTPALDGRRPLVLLSAGIGITPMLSVLRTLAAQGSQRQVLFAHAA
RDGRHHAHRADLQWARERLPQLATHISYETPQAGDVAGRDYDHAGTMPVA
ELLRQPDLQRFVDGSFYLCGPLGFMQEQRHALVSAGVPVAHIEREVFGPD
LLDDLL
CupnecN1fHb2
https://img.jgi.doe.gov/cgi-bin/er/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=650996610
MLTQQTKDIVKATAPVLAAHGYDIIKCFYKRMFEAHPELKNVFNMAHQEQ
GQQQQALARAVYAYAENIEDPSSLLAVLKNIANKHASLGVRPEQYPIVGE
HLLAAIKEVLGDAATDDIISAWAQAYGNLADVLMGMESELYERSAEQPGG
WKGWRNFVVREKRPESDVITSFILEPVDGGPLLNFEPGQYTSVAIDVPAL
GLQQIRQYSLSDMPNGRSYRISVKREAGGTQPPGYVSNLLHDHVNVGDEV
RLAAPYGSFHIDVNARTPIVLISGGVGLTPMISMLKNALQEPPRQVVFVH
GARNSAVHAMRDRLREAAKAYENFDLFVFYDQPLSEDVQGRDYDYPGLVD
VKLIEKSILLPDADYYICGPIPFMRMQHDALKKLGVHEGCIHYEVFGPDL
FAE
CupnecHPC(L)fHb
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2563138427
MLSPNTIALVKATVPVLTQHGEAITQHFYRLLLTQHPELKAFFNEAHQVH
GTQARALAGAVLAFASHVDELEALAGALPRIVQKHAALGVQPEHYPIVGG
CLLQAIRDVLGEAATDEIIGAWGEAYGVLAKILIDAEEAVYRDNAAQPGG
WRGTRGLRIARKVQESEIITSFYLEPADGGVLPAFRPGQYLTLLLTIDGA
PTRRHYSLSDAPGKPWYRISVKREPGGRASNWLHDHAAVGDVLQALQPCG
DFVLEPAADERPLVLVTGGVGITPAISMLEAAAPAGRPIQFIHAARHGGV
HAFRERVDAIAANYDNVSVCYVYDTPRDGDNPHAVGFVTRELLASRLPAD
RDVDFYLLGPKAFMRAVHADGRALGIAPERLRFEFFGPLEDLQAA
CupnecJMP134fHb
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=637692645
MLSAASRPYIDASVPVLREHGLAITTHFYREMFAARPELTQLFNMGNQAN
GSQQQSLASAVFAYAANIDNANALAPVVERIVHKHAAVGLKPAHYPIVGR
HLLGAISAVLGEAATPDLIAAWDEAYWLLAGELIAAEARLYQSTGMAAGE
RIAVRVDRREVQSDTVVALTLSAVDGQPLRDFRPGQYVSVEVTLDDGNRQ
QRQYSLSAERGLPTWQISVKREDGDHATPAGAVSNWLHANAQPGTELSVS
APFGDFAPRLDNHRPIVLLSAGIGITPMLSVLRTLAAQGSRREILFAHAA
RDGRHHAHRADVAWARERLPQLRTHISYEQPQAADVAGRDYDHAGTMPVA
ALLDAPDNRLFIDGDFHLCGPLGFMQAQRHALISAGVPVGHIHREVFGPD
LLDDLL
SDgbs
α-rhizobia
AzodoeUFLA1-100SDgb
https://img.jgi.doe.gov/cgi-bin/er/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513592844
MTPSQVELVQSSFAKVAPIADTAAGLFYGRLFETAPEVKPLFKGDIAEQG
RKLMATLAVVVNGLTKLEVIVPAAQTLARRHVAYGVRPEHYAPVGAALLW
TLEQGLGPDFTPETKAAWAEAYTLLSSVMIEAAADAAPVA
BraelkUSDA3254SDgb1
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513917904
MTPSSNPIERSFELAAAACDDLTPSVYRRLFRDHPEAQAMFRTEGSEPVK
GSMLQLTIEAILDFAGERRGHFRLIESEVFSHDAYGTPRELFVAFFAVIA
DSLREILGEQWTAEIDAAWHKLLGDIEAIVLQQKHLVDERP
BraelkUSDA3254SDgb2
https://img.jgi.doe.gov/cgi-bin/er/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513921873
MTPEQVDLIGISFDAMWPIRRDIADLCYSRFVELDPDAKDMFAGDIERRR
MKVLDMITALVASLDERPIFQSLITLSGHKHARLGVQLSHYVAMGEALMW
SLERKLGASFTQELQEAWRTLYATAQTEMLRSAAKT
BraelkUSDA3254SDgb3
https://img.jgi.doe.gov/cgi-bin/er/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513923443
MWPIRRDLADLCYNRFVELAPDARQMFGGDTEKQRMKVLDMITALVASLD
ERPMFQSLIAISGHKHAILGVQPSHFVAMGEALMWSFERKFGASFTPELR
ESWHTLYATAQNEMLRATGRHSSF
BraelkUSDA3254SDgb4
https://img.jgi.doe.gov/cgi-bin/er/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513923416
MNPAQIKLVQDSFGKVAPISEQAAVIFYDRLFEVAPAVKAMFPVDMKEQR
KKLMTTLAVVVNGLSNLDTILPAASALAKRHVGYGAKAEHYPVVGGALLW
TLEKGLGEAWTPDVAAAWTAAYGTLSGYMISEAYGPVQPVE
Braelk587SDgb1
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2550666940
MNPAQIKLVQESFGKVAPISEQAAVIFYDRLFEVAPAVKAMFPADMKEQR
RKLMTTLAVVVNGLSNLDTILPAASALAKRHVNYGARPEHYPVVGGALLW
TLEKGLGPAWTPDVAAAWTAAYGTLSGYMISEAYGGPRAAE
Braelk587SDgb2
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2550660969
MTPEQVDLIRTSFDAMWPIRRDLADLCYNRFVELAPDARSLFGGDAEKQR
MKMLDMIIALVASLDERPMFQSLITLSGHKHARLGVQPSHFVAMGEALMW
SFERKFGAFFTPELRDSWRALYATAQNEMLRAAGRPSSF
Braelk587SDgb3
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2550659533
MFRTEGSEPVKGAMLQLTIEAILDFAGERRGHFRLIESEVFSHDAYGTPR
ELFVAFFAMIADSLRDILGEQWTAEIDAAWHTLLGDIEAIVLQQKHLVDE
RP
BraelkUSDA3259SDgb1
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513661769
MNPAQIKLVQDSFGKVAPISEQAAVIFYDRLFEVAPAVKAMFPVDMKEQR
KKLMTTLAVVVNGLSNLDTILPAASALAKRHVGYGAKAEHYPVVGGALLW
TLEKGLGEAWTPDVAAAWTAAYGTLSGYMISEAYGPVQPVE
BraelkUSDA3259SDgb2
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513659875
MTPEQVDLIGISFDAMWPIRRDIADLCYSRFVELDPDAKDMFAGDIERRR
MKVLDMITALVASLDERPIFQSLITLSGHKHARLGVQLSHYVAMGEALMW
SLERKLGASFTQELQEAWRTLYATAQTEMLRSAAKT
BraelkUSDA3259SDgb3
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513656719
MTPSSNPIERSFELAAAACDDLTPSVYRRLFRDHPEAQAMFRTEGSEPVK
GSMLQLTIEAILDFAGERRGHFRLIESEVFSHDAYGTPRELFVAFFAVIA
DSLREILGEQWTAEIDAAWHKLLGDIEAIVLQQKHLVDERP
BraelkUSDA3259SDgb4
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513661812
MWPIRRDLADLCYNRFVELAPDARQMFGGDTEKQRMKVLDMITALVASLD
ERPMFQSLIAISGHKHAILGVQPSHFVAMGEALMWSFERKFGASFTPELR
ESWHTLYATAQNEMLRATGRHSSF
BraelkUSDA76SDgb1
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2517891202
MNPAQIKLVQESFGKVAPISEQAAVIFYDRLFEVAPAVKAMFPADMKEQR
RKLMTTLAVVVNGLSNLDTILPAASALAKRHVNYGARPEHYPVVGGALLW
TLEKGLGPAWTPDVAAAWTAAYGTLSGYMISEAYGGPRAAE
BraelkUSDA76SDgb2
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2517887587
MTPEQVDLIRTSFDAMWPIRRDLADLCYNRFVELAPDARSLFGGDAEKQR
MKMLDMIIALVASLDERPMFQSLITLSGHKHARLGVQPSHFVAMGEALMW
SFERKFGAFFTPELRDSWRALYATAQNEMLRAAGRPSSF
BraelkUSDA76SDgb3
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2517893050
MTPSSNPIERSFELAAAACDDLTPFVYRRLFREHPETQAMFRTEGSEPVK
GAMLQLTIEAILDFAGERRGHFRLIESEVFSHDAYGTPRELFVAFFAMIA
DSLRDILGEQWTAEIDAAWHTLLGDIEAIVLQQKHLVDERP
BraelkUSDA94SDgb1
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513856978
MNPAQIKLVQESFGKVAPISEQAAVIFYDRLFEVAPAVRAMFPADMKEQR
KKLMTTLAVVVNGLSNLDTILPAASALAKRHVGYGAKPEHYPVVGGALLW
TLEKGLGEAWTPDVAAAWTAAYGTLSGYMISEAYGSAQPAE
BraelkUSDA94SDgb2
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513862009
MKMLDMITALVASLDERPMFQSLITLSGHKHARLGVQPSHFVAMGEALMW
SFERKFGAFFTPELRDSWRTLYATAQNEMLRAAGRPSSF
BraelkUSDA94SDgb3
https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=GeneDetail&page=genePageMainFaa&gene_oid=2513857909
MTPSSNPIERSFELAAAACDDLTPFVYRRLFREHPETQAMFRTEGSEPVK
GSMLQLTIEALLDFAGERRGHFRLIESEVFSHDAYGTPRELFVAFFAVIA
DSLREILGEQWTAEIDAAWHKLLGDIEAIVLQQKHLVDGRP
BraelkWSM1741SDgb1
This is a portion of the data; to view all the data, please download the file.
Dataset 2. Predicted Glb polypeptides detected in the genomes of rhizobial bacteria.
Globin nomenclature corresponds to the first three binomial (genus and species) letters followed by the strain name, globin type and globin copy number. URLs indicate links to individual Glb polypeptide sequences [57].
Data set 3. Distance to the heme Fe and orientation of distal, proximal, B10 and CD1 amino acids in the predicted structure of selected rhizobial Glbs (Table S2). For comparison, the structural homolog of each rhizobial Glb (with its PDB ID), the corresponding amino acids in the homolog and the values for those amino acids are given in parentheses.
| Glb | Amino acid from the predicted structure | Distance to heme iron (Å) | Dihedral angle omega (°) | Dihedral angle phi (°) | Dihedral angle psi (°) |
fHbs
| BurphySTM815fHb | H (H) proximal | 1.73 (2.09) | -178.57 (179.80) | -63.64 (-60.72) | -37.17 (-34.98) |
| (Escherichia coli, | Q (Q) distal | 6.93 (7.05) | -177.03 (-179.94) | -62.85 (-56.93) | -45.65 (-37.76) |
| EsccolfHb, PDB ID: 1GVH) | Y (Y) B10 | 11.76 (7.14) | -173.45 (179.90) | -61.24 (-70.76) | -46.21 (-28.81) |
| | F (F) CD1 | 10.51 (6.08) | -178.43 (177.54) | -120.84 (-79.29) | 153.65 (160.67) |
| CupnecHPC(L)fHb | H (H) proximal | 1.63 (2.09) | -179.86 (179.80) | -64.08 (-60.72) | -39.71 (-34.98) |
| (Escherichia coli, | Q (Q) distal | 6.93 (7.05) | -175.27 (-179.94) | -63.77 (-56.93) | -43.40 (-37.76) |
| EsccolfHb, PDB ID: 1GVH) | Y (Y) B10 | 9.60 (7.14) | -173.45 (179.90) | -61.24 (-70.76) | -46.21 (-28.81) |
| | F (F) CD1 | 12.09 (6.08) | -176.40 (177.54) | -111.99 (-79.29) | 152.04 (160.67) |
| CupnecJMP134fHb | H (H) proximal | 2.10 (2.41) | -179.46 (178.61) | -65.87 (-63.63) | -35.91 (-43.91) |
| (Alcaligenes eutrophus, | Q (Q) distal | 12.73 (14.32) | -175.90 (178.87) | -65.64 (-61.05) | -40.68 (-48.18) |
| AlceutfHb, PDB ID: 1CQX) | Y (Y) B10 | 12.78 (8.48) | -173.72 (179.82) | -63.61 (-58.32) | -41.97 (-47.04) |
| | F (F) CD1 | 9.30 (6.19) | -170.17 (179.56) | -101.58 (-89.50) | 167.58 (153.86) |
| CupnecN1fHb1 | H (H) proximal | 1.80 (2.41) | -177.73 (178.61) | -66.24 (-63.63) | -34.66 (-43.91) |
| (Alcaligenes eutrophus, | Q (Q) distal | 13.24 (14.32) | -175.94 (178.87) | -63.85 (-61.05) | -41.40 (-48.18) |
| AlceutfHb, PDB ID: 1CQX) | Y (Y) B10 | 12.65 (8.48) | -171.43 (179.82) | -62.49 (-58.32) | -43.17 (-47.04) |
| | F (F) CD1 | 9.04 (6.19) | -169.87 (179.56) | -104.69 (-89.50) | 167.08 (153.86) |
| CupnecN1fHb2 | H (H) proximal | 2.47 (2.41) | 179.46 (178.61) | -65.95 (-63.63) | -36.68 (-43.91) |
| (Alcaligenes eutrophus, | Q (Q) distal | 11.46 (14.32) | -176.05 (178.87) | -66.47 (-61.05) | -40.35 (-48.18) |
| AlceutfHb, PDB ID: 1CQX) | Y (Y) B10 | 12.18 (8.48) | -179.87 (179.82) | -63.93 (-58.32) | -42.83 (-47.04) |
| | F (F) CD1 | 9.01 (6.19) | -171.87 (179.56) | -102.69 (-89.50) | 166.66 (153.86) |
| RhilegUPM1137fHb | H (H) proximal | 1.44 (2.09) | -178.72 (179.80) | -65.22 (-60.72) | -34.42 (-34.98) |
| (Escherichia coli, | Q (Q) distal | 7.36 (7.05) | -175.65 (-179.94) | -63.30 (-56.93) | -43.58 (-37.76) |
| EsccolfHb, PDB ID: 1GVH) | Y (Y) B10 | 7.62 (7.14) | 178.20 (179.90) | -63.83 (-70.76) | -40.35 (-28.81) |
| | F (F) CD1 | 5.45 (6.08) | -178.18 (177.54) | -99.47 (-79.29) | 150.45 (160.67) |
| Sinmel1021fHb | H (H) proximal | 2.10 (2.41) | -178.97 (178.61) | -66.97 (-63.63) | -34.97 (-43.91) |
| (Alcaligenes eutrophus, | Q (Q) distal | 11.52 (14.32) | -176.36 (178.87) | -67.88 (-61.05) | -40.68 (-48.18) |
| AlceutfHb, PDB ID: 1CQX) | Y (Y) B10 | 13.24 (8.48) | -173.26 (179.82) | -62.67 (-58.32) | -41.97 (-47.04) |
| | F (F) CD1 | 8.81 (6.19) | -174.43 (179.56) | -87.96 (-89.50) | 166.13 (153.86) |
SDgbs
| AzodoeUFLA1-100SDgb | H (H) proximal | 4.44 (2.10) | -176.39 (179.59) | -65.77 (-62.81) | -38.59 (-36.47) |
| (Methylacidiphilum infernorum, | Q (Q) distal | 5.14 (4.25) | -176.28 (177.10) | -63.30 (-61.05) | -43.96 (-41.66) |
| MetaciSDgb, PDB ID: 3S1I) | Y (Y) B10 | 10.60 (5.29) | -175.00 (-179.52) | -62.39 (-65.92) | -46.65 (-39.33) |
| | F (F) CD1 | 6.91 (5.63) | -173.63 (-176.31) | -82.55 (-88.98) | 108.01 (117.55) |
| BraelkUSDA94SDgb2 | H (H) proximal | 2.88 (1.98) | -179.39 (178.02) | -61.47 (-60.62) | -43.67 (-44.91) |
| (Vitreoscilla stercoraria, | | | | | |
| VitsteSDgb, PDB ID: 2VHB) | | | | | |
| BraelkUSDA3254SDgb1 | H (H) proximal | 2.11 (2.03) | -179.51 (177.31) | -63.16 (-64.03) | -41.44 (-37.89) |
| (Ralstonia eutropha, | K (Q) distal | 7.40 (8.79) | -176.70 (177.55) | -65.53 (-72.20) | -40.35 (-35.50) |
| RaleutfHb, PDB ID: 3OZW) | Y (Y) B10 | 9.80 (7.41) | -171.54 (-177.52) | -59.98 (-64.58) | -48.43 (-38.15) |
| | F (F) CD1 | 6.63 (5.92) | -178.73 (173.89) | -85.36 (-76.98) | 149.51 (156.36) |
| BraelkUSDA3254SDgb2 | H (H) proximal | 3.96 (2.10) | -175.94 (179.59) | -64.12 (-62.81) | -33.35 (-36.47) |
| (Methylacidiphilum infernorum, | R (Q) distal | 5.08 (4.25) | -177.96 (177.10) | -64.28 (-61.05) | -40.91 (-41.66) |
| MetaciSDgb, PDB ID: 3S1I) | Y (Y) B10 | 9.17 (5.29) | -176.20 (-179.52) | -62.28 (-65.92) | -48.26 (-39.33) |
| | F (F) CD1 | 7.98 (5.63) | -176.45 (-176.31) | -82.55 (-88.98) | 112.30 (117.55) |
| BraelkUSDA3259SDgb1 | H (H) proximal | 3.79 (2.10) | -175.79 (179.59) | -65.44 (-62.81) | -35.62 (-36.47) |
| (Methylacidiphilum infernorum, | Q (Q) distal | 5.09 (4.25) | -177.42 (177.10) | -63.89 (-61.05) | -46.08 (-41.66) |
| MetaciSDgb, PDB ID: 3S1I) | Y (Y) B10 | 9.62 (5.29) | -177.17 (-179.52) | -62.71 (-65.92) | -46.56 (-39.33) |
| | F (F) CD1 | 7.45 (5.63) | -177.35 (-176.31) | -87.82 (-88.98) | 115.73 (117.55) |
| BraelkWSM1741SDgb2 | H (H) proximal | 4.37 (2.10) | -175.67 (179.59) | -65.68 (-62.81) | -36.99 (-36.47) |
| (Methylacidiphilum infernorum, | K (Q) distal | 5.63 (4.25) | -176.67 (177.10) | -64.70 (-61.05) | -40.63 (-41.66) |
| MetaciSDgb, PDB ID: 3S1I) | Y (Y) B10 | 7.63 (5.29) | -171.82 (-179.52) | -69.31 (-65.92) | -40.53 (-39.33) |
| | F (F) CD1 | 6.13 (5.63) | 16.38 (-176.31) | -72.70 (-88.98) | -12.34 (117.55) |
| BrajapUSDA38SDgb2 | H (H) proximal | 3.79 (2.10) | -174.73 (179.59) | -63.88 (-62.81) | -40.77 (-36.47) |
| (Methylacidiphilum infernorum, | M (Q) distal | 7.23 (4.25) | -177.85 (177.10) | -63.70 (-61.05) | -43.96 (-41.66) |
| MetaciSDgb, PDB ID: 3S1I) | Y (Y) B10 | 10.09 (5.29) | -178.21 (-179.52) | -64.70 (-65.92) | -41.73 (-39.33) |
| | F (F) CD1 | 6.12 (5.63) | -177.99 (-176.31) | -62.84 (-88.98) | -35.83 (117.55) |
| BrajapUSDA124SDgb1 | H (H) proximal | 2.31 (1.98) | -178.52 (-177.74) | -61.22 (-61.44) | -41.84 (-51.23) |
| (Saccharomyces cerevisiae, | Q (Q) distal | 6.63 (6.67) | -176.70 (172.73) | -65.91 (-152.51) | -40.26 (5.68) |
| SaccerfHb, PDB ID: 4G1B) | Y (Y) B10 | 8.18 (7.35) | 178.65 (-172.97) | -68.13 (-74.10) | -36.03 (-45.12) |
| | F (F) CD1 | 11.33 (5.73) | -176.90 (-173.99) | -107.76 (-79.00) | 92.93 (144.72) |
GCSs
| Brajapin8p8GCS | H (H) proximal | 1.76 (2.01) | -178.74 (-175.91) | -61.67 (-74.33) | -36.94 (-27.48) |
| (Bacillus subtilis, | Q (L) distal | 7.08 (7.20) | -177.44 (177.35) | -63.15 (-71.12) | -43.20 (-35.52) |
| BacsubGCS, PDB ID: 1OR4) | Y (Y) B10 | 8.19 (5.65) | -178.71 (-176.45) | -63.80 (-73.87) | -43.29 (-37.09) |
| | I (I) CD1 | 12.43 (7.33) | 178.61 (-177.29) | -62.57 (-64.94) | -41.16 (-47.43) |
| RhietlCIAT652GCS1 | H (H) proximal | 2.30 (2.01) | -179.78 (-175.91) | -65.35 (-74.33) | -35.74 (-27.48) |
| (Bacillus subtilis, | Q (L) distal | 9.04 (7.20) | -177.39 (177.35) | -62.91 (-71.12) | -42.75 (-35.52) |
| BacsubGCS, PDB ID: 1OR4) | Y (Y) B10 | 7.62 (5.65) | -178.04 (-176.45) | -62.57 (-73.87) | -42.78 (-37.09) |
| | F (I) CD1 | 4.73 (7.33) | 178.16 (-177.29) | -58.00 (-64.94) | -52.92 (-47.43) |
| RhietlCIAT652GCS2 | E (H) proximal | 2.60 (1.88) | -177.62 (-179.46) | -66.87 (-73.99) | -31.13 (-37.72) |
| (Geobacter sulfurreducens, | Q (H) distal | 4.96 (2.09) | -175.85 (-179.91) | -67.88 (-76.90) | -35.44 (-30.34) |
| GeosulGCS, PDB ID: 2W31) | F (Y) B10 | 11.03 (11.06) | 179.24 (-178.96) | -65.10 (-62.85) | -38.73 (-46.43) |
| | F (F) CD1 | 7.98 (9.76) | -178.00 (-172.60) | -59.25 (-71.70) | -51.67 (-10.93) |
| Rhietl8C3GCS | E (H) proximal | 2.60 (1.88) | -177.65 (-179.46) | -66.91 (-73.99) | -31.17 (-37.72) |
| (Geobacter sulfurreducens, | Q (H) distal | 4.97 (2.09) | -175.84 (-179.91) | -67.84 (-76.90) | -35.45 (-30.34) |
| GeosulGCS, PDB ID: 2W31) | F (Y) B10 | 11.05 (11.06) | 179.22 (-178.96) | -65.13 (-62.85) | -38.72 (-46.43) |
| | F (F) CD1 | 8.00 (9.76) | -178.08 (-172.60) | -59.19 (-71.70) | -51.65 (-10.93) |
| RhietlCFN42GCS1 | H (H) proximal | 5.56 (2.01) | -177.13 (-175.91) | -63.16 (-74.33) | -39.86 (-27.48) |
| (Bacillus subtilis, | Q (L) distal | 8.07 (7.20) | -177.54 (177.35) | -62.14 (-71.12) | -44.08 (-35.52) |
| BacsubGCS, PDB ID: 1OR4) | Y (Y) B10 | 7.53 (5.65) | 179.72 (-176.45) | -62.60 (-73.87) | -41.72 (-37.09) |
| | F (I) CD1 | 5.12 (7.33) | 179.24 (-177.29) | -58.70 (-64.94) | -50.30 (-47.43) |
| RhilegGB30GCS1 | H (H) proximal | 2.84 (2.01) | -179.69 (-175.91) | -63.63 (-74.33) | -38.52 (-27.48) |
| (Bacillus subtilis, | Q (L) distal | 7.81 (7.20) | -177.70 (177.35) | -62.27 (-71.12) | -43.10 (-35.52) |
| BacsubGCS, PDB ID: 1OR4) | Y (Y) B10 | 7.56 (5.65) | 177.05 (-176.45) | -63.93 (-73.87) | -42.49 (-37.09) |
| | F (I) CD1 | 5.11 (7.33) | -177.07 (-177.29) | -59.52 (-64.94) | -50.57 (-47.43) |
| RhilegGB30GCS2 | E (H) proximal | 2.52 (1.88) | -178.50 (-179.46) | -65.29 (-73.99) | -31.38 (-37.72) |
| (Geobacter sulfurreducens, | Q (H) distal | 6.93 (2.09) | -174.45 (-179.91) | -63.74 (-76.90) | -38.22 (-30.34) |
| GeosulGCS, PDB ID: 2W31) | F (Y) B10 | 6.72 (11.06) | -176.48 (-178.96) | -67.79 (-62.85) | -26.67 (-46.43) |
| | F (F) CD1 | 5.90 (9.76) | 179.12 (-172.60) | -61.11 (-71.70) | -48.54 (-10.93) |
| SinfreGR64GCS | E (H) proximal | 4.00 (2.01) | -179.57 (-175.91) | -63.89 (-74.33) | -33.98 (-27.48) |
| (Bacillus subtilis, | Q (L) distal | 5.77 (7.20) | -174.52 (177.35) | -63.32 (-71.12) | -40.47 (-35.52) |
| BacsubGCS, PDB ID: 1OR4) | F (Y) B10 | 9.90 (5.65) | -178.12 (-176.45) | -70.29 (-73.87) | -33.10 (-37.09) |
| | F (I) CD1 | 4.52 (7.33) | -178.36 (-177.29) | -61.37 (-64.94) | -53.60 (-47.43) |
| Sinmel1021GCS | E (H) proximal | 4.40 (2.01) | -178.22 (-175.91) | -65.21 (-74.33) | -34.24 (-27.48) |
| (Bacillus subtilis, | Q (L) distal | 6.72 (7.20) | -175.68 (177.35) | -65.08 (-71.12) | -39.35 (-35.52) |
| BacsubGCS, PDB ID: 1OR4) | S (Y) B10 | 10.57 (5.65) | -176.74 (-176.45) | -63.73 (-73.87) | -40.20 (-37.09) |
| | F (I) CD1 | 8.83 (7.33) | -176.32 (-177.29) | -58.76 (-64.94) | -48.53 (-47.43) |
tHbs
| AzodoeUFLA1-100tHb1 | H (H) proximal | 5.88 (2.08) | -174.84 (177.99) | -65.16 (-72.67) | -42.78 (-33.19) |
| (Campylobacter jejuni, | H (H) distal | 6.32 (5.72) | 176.31 (175.81) | -72.04 (-70.37) | -38.34 (-46.43) |
| CamjejtHb, PDB ID: 2IG3) | Y (Y) B10 | 5.96 (5.40) | -175.91 (-175.09) | -62.64 (-74.54) | -43.28 (-19.57) |
| | F (F) CD1 | 4.59 (4.79) | -176.88 (-176.78) | -63.46 (-69.12) | -45.74 (-45.47) |
| AzodoeUFLA1-100tHb2 | H (H) proximal | 4.71 (1.99) | -175.83 (-177.97) | -70.83 (-85.55) | -22.87 (-15.70) |
| (Agrobacterium tumefaciens, | L (F) distal | 6.01 (5.32) | -176.59 (-178.59) | -60.57 (-68.21) | -44.34 (-40.86) |
| AgrtumtHb, PDB ID: 2XYK) | Y (Y) B10 | 7.00 (6.01) | 178.40 (-179.33) | -65.62 (-71.14) | -40.92 (-33.27) |
| | H (H) CD1 | 5.35 (6.28) | 174.53 (177.57) | -76.23 (-85.38) | 141.88 (158.10) |
| BraelkUSDA76tHb2 | H (H) proximal | 7.23 (2.08) | -173.87 (177.99) | -63.86 (-72.67) | -42.19 (-33.19) |
| (Campylobacter jejuni, | H (H) distal | 8.18 (5.72) | 177.95 (175.81) | -68.18 (-70.37) | -37.42 (-46.43) |
| CamjejtHb, PDB ID: 2IG3) | Y (Y) B10 | 7.01 (5.40) | -179.37 (-175.09) | -64.57 (-74.54) | -40.69 (-19.57) |
| | F (F) CD1 | 4.94 (4.79) | -179.53 (-176.78) | -62.41 (-69.12) | -42.53 (-45.47) |
| BraelkUSDA94tHb1 | H (H) proximal | 4.73 (2.01) | -176.61 (-177.52) | -64.06 (-85.92) | -36.96 (-8.56) |
| (Geobacillus stearothermophilus, | H (Q) distal | 5.57 (6.40) | -179.91 (-179.48) | -62.77 (-68.05) | -46.16 (-37.87) |
| GeostetHb, PDB ID: 2BKM) | Y (Y) B10 | 7.53 (5.97) | -176.52 (-175.55) | -62.63 (-72.60) | -47.06 (-25.12) |
| | F (F) CD1 | 5.16 (4.97) | 173.65 (175.39) | -114.06 (-94.32) | 139.02 (157.40) |
| BrajapUSDA38tHb2 | H (H) proximal | 5.04 (1.99) | -175.12 (-177.97) | -70.71 (-85.55) | -21.60 (-15.70) |
| (Agrobacterium tumefaciens, | L (F) distal | 4.75 (5.32) | -176.19 (-178.59) | -63.10 (-68.21) | -37.51 (-40.86) |
| AgrtumtHb, PDB ID: 2XYK) | Y (Y) B10 | 6.66 (6.01) | -177.33 (-179.33) | -66.51 (-71.14) | -38.00 (-33.27) |
| | H (H) CD1 | 7.20 (6.28) | -174.53 (177.57) | -85.28 (-85.38) | 139.31 (158.10) |
| BrajapUSDA123tHb1 | H (H) proximal | 4.76 (2.10) | 178.74 (178.30) | -93.14 (-94.71) | -32.09 (-1.97) |
| (Arabidopsis thaliana, | H (Q) distal | 5.73 (6.21) | -179.70 (-175.91) | -64.46 (-67.26) | -38.96 (-52.95) |
| ArathatHb, PDB ID: 4C0N) | Y (Y) B10 | 4.82 (4.95) | -178.26 (-173.06) | -67.21 (-74.75) | -34.44 (-25.42) |
| | F (F) CD1 | 6.22 (5.87) | 179.48 (-176.67) | -62.92 (-93.83) | -43.45 (9.52) |
| BurphySTM815tHb1 | H (H) proximal | 7.14 (2.08) | -174.40 (177.99) | -64.03 (-72.67) | -44.14 (-33.19) |
| (Campylobacter jejuni, | H (H) distal | 5.60 (5.72) | -176.71 (175.81) | -63.13 (-70.37) | -39.85 (-46.43) |
| CamjejtHb, PDB ID: 2IG3) | Y (Y) B10 | 8.26 (5.40) | -172.62 (-175.09) | -65.67 (-74.54) | -41.84 (-19.57) |
| | F (F) CD1 | 5.39 (4.79) | 179.28 (-176.78) | -63.35 (-69.12) | -43.85 (-45.47) |
| BurphySTM815tHb2 | H (H) proximal | 7.19 (1.99) | -176.23 (-177.97) | -74.29 (-85.55) | -10.86 (-15.70) |
| (Agrobacterium tumefaciens, | L (F) distal | 5.62 (5.32) | -176.54 (-178.59) | -60.34 (-68.21) | -48.51 (-40.86) |
| AgrtumtHb, PDB ID: 2XYK) | Y (Y) B10 | 7.06 (6.01) | -177.13 (-179.33) | -65.04 (-71.14) | -39.10 (-33.27) |
| | H (H) CD1 | 5.24 (6.28) | 175.46 (177.57) | -81.97 (-85.38) | 141.12 (158.10) |
| CupnecN1tHb1 | H (H) proximal | 4.55 (2.06) | -172.38 (-171.43) | -65.11 (-115.96) | -41.69 (17.15) |
| (Tetrahymena pyriformis, | L (Q) distal | 7.24 (9.92) | -177.76 (-176.32) | -62.73 (-69.53) | -40.87 (-35.62) |
| TetpyrtHb, PDB ID: 3AQ5) | F (Y) B10 | 8.68 (5.48) | -178.91 (177.58) | -59.27 (-66.62) | -49.65 (-33.45) |
F (F) CD17.17 (5.09)179.23 (-177.03)-66.91 (-95.88)-15.74 (9.14)
CupnecN1tHb2H (H) proximal4.34 (1.99)-174.71 (-177.97)-72.37 (-85.55)-13.46 (-15.70)
(Agrobacterium tumefaciens,L (F) distal7.89 (5.32)-176.79 (-178.59)-63.06 (-68.21)-40.33 (-40.86)
AgrtumtHb, PDB ID: 2XYK)Y (Y) B106.62 (6.01)-176.67 (-179.33)-66.60 (-71.14)-35.30 (-33.27)
H (H) CD15.11 (6.28)175.61 (177.57)-87.63 (-85.38)138.75 (158.10)
MescicCMG6tHbH (H) proximal5.41 (2.01)-177.93 (-177.52)-71.84 (-85.92)-20.55 (-8.56)
(Geobacillus stearothermophilus,H (Q) distal4.54 (6.40)-173.93 (-179.48)-67.73 (-68.05)-46.66 (-37.87)
GeostetHb, PDB ID: 2BKM)Y (Y) B105.91 (4.97)-178.77 (-175.55)-64.17 (-72.60)-38.09 (-25.12)
F (F) CD15.43 (5.97)-178.37 (175.39)-80.07 (-94.32)161.16 (157.40)
MesloNZP2037tHb2H (H) proximal2.60 (2.08)-173.87 (177.99)-63.86 (-72.67)-42.19 (-33.19)
(Campylobacter jejuni,H (H) distal5.59 (5.72)177.95 (175.81)-68.18 (-70.37)-37.42 (-46.43)
CamjejtHb, PDB ID: 2IG3)Y (Y) B108.28 (5.40)-179.37 (-175.09)-64.57 (-74.54)-40.69 (-19.57)
F (F) CD14.68 (4.79)-179.53 (-176.78)-62.41 (-69.12)-42.53 (-45.47)
RhietlCNPAF512tHbH (H) proximal7.24 (2.08)-173.87 (177.99)-63.86 (-72.67)-42.19 (-33.19)
(Campylobacter jejuni,H (H) distal6.33 (5.72)177.95 (175.81)-68.18 (-70.37)-37.42 (-46.43)
CamjejtHb, PDB ID: 2IG3)Y (Y) B108.28 (5.40)-179.37 (-175.09)-64.57 (-74.54)-40.69 (-19.57)
F (F) CD13.66 (4.79)-179.53 (-176.78)-62.41 (-69.12)-42.53 (-45.47)
RhietlKim5tHbH (H) proximal1.77 (1.99)-179.37 (-177.97)-69.98 (-85.55)-9.25 (-15.70)
(Agrobacterium tumefaciens,F (F) distal5.62 (5.32)-177.24 (-178.59)-64.54 (-68.21)-42.08 (-40.86)
AgrtumtHb, PDB ID: 2XYK)Y (Y) B107.57 (6.01)177.83 (-179.33)-66.46 (-71.14)-37.33 (-33.27)
This is a portion of the data; to view all the data, please download the file.
Dataset 3. Distance to the heme Fe and orientation of distal, proximal, B10 and CD1 amino acids in the predicted structure of selected rhizobial Glbs (Table S2).
Structural homologs (including the PDB ID), the corresponding amino acids in the structural homologs and the values for those amino acids are indicated in parentheses next to those of the individual rhizobial Glbs for comparison58.
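
The "value (homolog value)" layout of Dataset 3 lends itself to a simple programmatic comparison between a predicted structure and its template. The following is a minimal sketch, not the procedure used to build Dataset 3. It assumes that each cell is a string such as `5.88 (2.08)`, that the first cell of a row is the distance to the heme Fe and that the remaining cells are angles in degrees (the column header is not part of this portion of the data, so this reading is an assumption). The function names `parse_cell`, `deviation` and `angular_deviation` are hypothetical. The example cells are the proximal H entries listed for AzodoeUFLA1-100tHb1.

```python
import re
from typing import Tuple

# One table cell: a model value followed by the homolog value in parentheses.
CELL = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\(\s*(-?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_cell(cell: str) -> Tuple[float, float]:
    """Return (model value, structural homolog value) from one cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"not a 'value (homolog value)' cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def deviation(cell: str) -> float:
    """Plain difference model - homolog, suitable for distances."""
    model, homolog = parse_cell(cell)
    return round(model - homolog, 2)


def angular_deviation(cell: str) -> float:
    """Difference wrapped into [-180, 180), assuming the cell holds degrees."""
    model, homolog = parse_cell(cell)
    return round((model - homolog + 180.0) % 360.0 - 180.0, 2)


# Proximal H row of AzodoeUFLA1-100tHb1 (template: CamjejtHb, PDB ID 2IG3).
distance_cell = "5.88 (2.08)"
angle_cells = ["-174.84 (177.99)", "-65.16 (-72.67)", "-42.78 (-33.19)"]

print("distance deviation:", deviation(distance_cell))
for cell in angle_cells:
    print(cell, "->", angular_deviation(cell))
```

The wrapped difference matters for values near ±180: a plain subtraction of -174.84 and 177.99 would suggest a very large change, although the two values lie only a few degrees apart on a circle.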

Results and discussion

Detection of Glb sequences in the genomes of α- and β-rhizobia

Recently, Vinogradov et al.7 reported that Glb sequences exist in the genomes of 96 rhizobia. However, that report did not provide the rhizobial Glb sequences or links to the rhizobial scaffolds containing them. Hence, we searched databases (see the Methods section and Table S1) to obtain rhizobial Glb sequences for analysis. We selected 62 of the 96 rhizobial genomes reported by the above authors; these represent the major rhizobial genera, species and strains and include α- and β-rhizobia (i.e. those classified within the α- and β-proteobacteria, respectively). A total of 197 glb sequences were detected in the 62 rhizobial genomes, corresponding to 7 fhbs, 47 sdgbs, 40 gcss and 103 thbs (4 thbs class 1, 56 thbs class 2 and 43 thbs class 3). Individual Glb nucleotide and polypeptide sequences, together with links to the rhizobial scaffolds containing them, are provided in Dataset 1 and Dataset 2, respectively. All the rhizobial genomes analyzed in this work contained glb sequences, indicating that glbs are widespread in rhizobia. However, protoglobin and sensor single domain globin sequences were not detected in the rhizobial genomes. This observation suggests that only the fhb, sdgb, gcs and thb lineages evolved within rhizobia.

A distribution analysis showed that most (61) of the rhizobial genomes analyzed in this work contain thbs, either alone (13) or in combination with fhbs, sdgbs and/or gcss (48). One further genome contained only a gcs. No genome contained only fhbs or only sdgbs, and none contained only the combinations fhbs + sdgbs, fhbs + gcss or sdgbs + gcss (Figure 1). These observations indicate that, in the rhizobia analyzed in this work, thbs predominate over other glbs and that fhbs, sdgbs and gcss mostly exist in combination with thbs. Analysis of the glb copy number showed that fhbs mostly exist as a single copy (ranging from one to two copies), sdgbs mostly exist as two copies (ranging from one to four copies), gcss exist as either one or two copies, and thbs mostly exist as two copies (ranging from one to three copies), although quite a few thbs exist as a single copy (Table 1). Thus, rhizobial glbs apparently exist mostly as either one or two copies.

Figure 1. Venn diagram illustrating the distribution of glb genes in the rhizobial bacteria analyzed in this work.

Numbers correspond to rhizobial genomes containing glbs.

Table 1. Number of glb copies detected in the rhizobial genomes analyzed in this work.

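| glb | No. of copies | No. of genomes |
| --- | --- | --- |
| fhbs | 1 | 5 |
| | 2 | 1 |
| sdgbs | 1 | 5 |
| | 2 | 11 |
| | 3 | 4 |
| | 4 | 2 |
| gcss | 1 | 12 |
| | 2 | 14 |
| thbs | 1 | 22 |
| | 2 | 36 |
| | 3 | 3 |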
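
As a consistency check, the copy-number counts in Table 1 can be multiplied out and compared with the totals reported in the text (7 fhbs, 47 sdgbs, 40 gcss and 103 thbs, 197 glbs in all, with thbs present in 61 genomes). The sketch below only restates the table arithmetic; the dictionary layout and the name `total_genes` are illustrative choices, and each table entry is read as (number of copies, number of genomes).

```python
# Table 1 as {glb: {copies per genome: number of genomes}}.
TABLE_1 = {
    "fhb": {1: 5, 2: 1},
    "sdgb": {1: 5, 2: 11, 3: 4, 4: 2},
    "gcs": {1: 12, 2: 14},
    "thb": {1: 22, 2: 36, 3: 3},
}


def total_genes(copy_counts):
    """Total number of genes = sum over rows of copies x genomes."""
    return sum(copies * genomes for copies, genomes in copy_counts.items())


totals = {glb: total_genes(counts) for glb, counts in TABLE_1.items()}
genomes_with_thb = sum(TABLE_1["thb"].values())

print(totals)                 # per-class gene totals
print(sum(totals.values()))   # overall number of glb genes
print(genomes_with_thb)       # genomes containing at least one thb
```

With this reading of the table, the per-class totals and the number of thb-containing genomes agree with the values given in the text.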

Mapping of glb genes in the rhizobial genomes

The glb genes detected in this work were mapped within the rhizobial genomes in order to identify flanking genes that could coexpress with glbs. Mapping analysis showed that rhizobial glb copies are located in different scaffolds and that they are not tandemly arrayed. Figure S1A shows that either no ORFs or ORFs coding for hypothetical or non-identified proteins are located near most of the rhizobial fhb genes. However, genes coding for the transcriptional regulator NsrR, 2-nitropropane dioxygenase and NosR, Z, D, F, Y and X are located near cupnecN1fhb1, rhilegUPM1137fhb and sinmel1021fhb, respectively. Figure S1B shows that B. elkanii and B. japonicum sdgbs are mostly flanked by genes coding for proteins that function in nitrate/nitrite metabolism and sugar transport. Figure S1C shows that genes coding for proteins that function in chemotaxis are located near several rhizobial gcss, although genes coding for a peptide deformylase, sugar and nitrate transport proteins and NAD(P)H nitrate reductase are located near some other rhizobial gcss. Figure S1D shows that genes flanking the rhizobial thbs are rather variable. However, B. japonicum thbs are often flanked by genes coding for the transcriptional regulator Rieske Fe-S, shikimate kinase and alcohol dehydrogenase; mesorhizobial thbs are often flanked by genes coding for permeases and tRNA-Trp, and R. leguminosarum thbs are often flanked by genes coding for membrane proteins. Thus, if glb and flanking genes coexpress in rhizobia, and if the proteins coded by these genes function within the same metabolic pathways, the above observations suggest that rhizobial Glbs could play a variety of roles in rhizobial physiology, including nitrate/nitrite metabolism, transport processes, gene regulation and chemotaxis.
Interestingly, with the exception of sinmel1021fhb, which is flanked by nos and fix genes (Figure S1A)26, nif and fix genes coding for proteins that function in N2-fixation were not detected near the rhizobial glb genes. This observation suggests that rhizobial Glbs might not function directly in N2-fixation.

Detection of promoter sequences upstream to the rhizobial glb genes

Identification of promoter sequences is crucial to understanding gene regulation and, ultimately, protein function within the cell's physiology. Hence, we searched for canonical (-10 and -35) promoters and the O2- and NO-regulated Fnr promoter30,41,42 within 130 nucleotides upstream of 44 selected rhizobial glb genes (i.e. those representative of the major rhizobial Glb clades identified in this work (see Figure 2)). We also searched the same region for Shine-Dalgarno sequences, whose presence indicates that Glb transcripts could be translated into proteins. The results showed that, with the exception of burphySTM815thb1, burphySTM815thb2 and rhilupHPC(L)thb1, a -10 promoter is absent upstream of the selected rhizobial glbs. In contrast, with the exception of cupnecN1thb1 and rhilupHPC(L)thb2, a -35 promoter exists upstream of the selected rhizobial glbs. Fnr-like promoters were found upstream of 30 of the 44 selected rhizobial glbs, including fhb, sdgb, gcs and thb genes. A Shine-Dalgarno sequence was detected upstream of most of the selected rhizobial glbs (Table 2). These observations suggest that the -35 promoter is the major canonical promoter regulating most of the rhizobial glbs, that several rhizobial glbs are likely regulated by the levels of O2 and NO through an Fnr mechanism41–44, and that rhizobial Glb transcripts are translated into proteins.
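
The kind of motif search described above can be illustrated with a short sketch. It is not the authors' procedure (the search tool and any match threshold are described in the Methods section, not here): it simply slides a consensus sequence along an upstream region, keeps the best-matching window, writes matching nucleotides in uppercase and mismatches in lowercase (the convention used in Table 2), and reports positions as negative coordinates relative to the start codon. The consensus sequences are those given in the Table 2 header; the function names and the toy upstream region, padded with `N`, are hypothetical.

```python
from typing import Optional, Tuple

# Consensus sequences as listed in the Table 2 header.
CONSENSUS = {
    "-10": "TATAAT",
    "-35": "TTGACA",
    "Fnr": "TTTAAGAGGCCAAT",
    "Shine-Dalgarno": "AGGAGG",
}


def mark_matches(window: str, consensus: str) -> str:
    """Uppercase where window and consensus agree, lowercase elsewhere."""
    return "".join(
        w.upper() if w.upper() == c.upper() else w.lower()
        for w, c in zip(window, consensus)
    )


def best_window(upstream: str, consensus: str) -> Optional[Tuple[int, str, int, int]]:
    """Return (matches, marked sequence, nearest position, farthest position).

    `upstream` ends immediately before the start codon, so its last
    nucleotide is position -1. Ties are resolved in favour of the window
    farthest from the start codon.
    """
    n, k = len(upstream), len(consensus)
    best = None
    for i in range(n - k + 1):
        window = upstream[i:i + k]
        score = sum(w == c for w, c in zip(window.upper(), consensus.upper()))
        if best is None or score > best[0]:
            near = -(n - (i + k) + 1)   # 3' end of the window
            far = -(n - i)              # 5' end of the window
            best = (score, mark_matches(window, consensus), near, far)
    return best


# Hypothetical 125-nt upstream region: filler 'N' around one Fnr-like site.
toy_upstream = "N" * 10 + "TCTAAGCGACTGAT" + "N" * 101
hit = best_window(toy_upstream, CONSENSUS["Fnr"])
print(hit)
```

In this toy case the site is placed so that the output reproduces the notation of the burphySTM815fhb row in Table 2 (TcTAAGcGaCtgAT at -102 to -115). A real analysis would also need an explicit criterion for deciding when a best window counts as a promoter-like sequence.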

Table 2. Position of canonical and Fnr-like promoter sequences and Shine-Dalgarno sequence within 130 nucleotides upstream of selected rhizobial glb genes.

Consensus sequences are indicated in parentheses. Nucleotides in the Fnr-like promoter sequences that are identical or non-identical to the consensus Fnr promoter sequence are indicated with upper- and lowercase letters, respectively. N.D., non-detected.

| glb gene | -10 promoter (TATAAT) | -35 promoter (TTGACA) | Fnr promoter sequence (TTTAAGAGGCCAAT) | Fnr promoter position | Shine-Dalgarno sequence (AGGAGG) |
| --- | --- | --- | --- | --- | --- |
| fhbs | | | | | |
| burphySTM815fhb | N.D. | -36 to -41 | TcTAAGcGaCtgAT | -102 to -115 | -10 to -13 |
| cupnecHPC(L)fhb | N.D. | -43 to -48 | N.D. | | -8 to -12 |
| cupnecJMP134fhb | N.D. | -46 to -54 | TTTAAaAcGgagcc | -5 to -18 | -10 to -15 |
| cupnecN1fhb1 | N.D. | -46 to -52 | N.D. | | -10 to -15 |
| cupnecN1fhb2 | N.D. | -29 to -36 | aTcAAGgcGgCgAg | -64 to -77 | -8 to -12 |
| rhilegUPM1137fhb | N.D. | -32 to -37 | N.D. | | -9 to -12 |
| sinmel1021fhb | N.D. | -48 to -56 | gTcAAGgaGCCAAa | -12 to -25 | -8 to -12 |
| | | | ggTtgGgGtCCAcT | -61 to -74 | |
| sdgbs | | | | | |
| azodoeUFLA1-100sdgb | N.D. | -38 to -42 | gccAgGAGtCCgAT | -2 to -15 | -8 to -12 |
| braelkUSDA94sdgb2 | N.D. | -36 to -43 | TaTAAGgacatcAT | -114 to -127 | -7 to -11 |
| braelkWSM1741sdgb2 | N.D. | -34 to -40 | N.D. | | -7 to -12 |
| braelkUSDA3254sdgb1 | N.D. | -61 to -66 | TTTttGgGGCaAAT | -71 to -84 | N.D. |
| braelkUSDA3254sdgb2 | N.D. | -40 to -44 | TTTAcGAGGCtgcT | -11 to -24 | -16 to -22 |
| braelkUSDA3259sdgb1 | N.D. | -41 to -46 | TTTcAGAactCAtT | -22 to -35 | -8 to -12 |
| | | | cTTcgGttaCCAAT | -56 to -69 | |
| brajapUSDA38sdgb2 | N.D. | -53 to -58 | N.D. | | -6 to -9 |
| brajapUSDA124sdgb1 | N.D. | -60 to -65 | N.D. | | -7 to -11 |
| gcss | | | | | |
| brajapin8p8gcs | N.D. | -55 to -60 | gTTtcGcctCCgAT | -21 to -34 | -6 to -11 |
| rhietlCIAT652gcs1 | N.D. | -44 to -50 | N.D. | | -7 to -9 |
| rhietlCIAT652gcs2 | N.D. | -31 to -36 | gTggAGAGGaCcgT | -91 to -104 | -2 to -5 |
| rhietl8c3gcs | N.D. | -44 to -51 | TTTAAccaGgCAtc | -80 to -93 | -5 to -11 |
| rhietlCFN42gcs1 | N.D. | -65 to -70 | N.D. | | N.D. |
| rhilegGB30gcs1 | N.D. | -51 to -55 | TgatcGAGGCaAgg | -33 to -46 | -8 to -10 |
| rhilegGB30gcs2 | N.D. | -47 to -52 | gTggtGAGGaCcgT | -90 to -103 | -3 to -8 |
| sinfreGR64gcs | N.D. | -23 to -29 | TTcAgcgGGCCAca | -47 to -60 | -6 to -8 |
| sinmel1021gcs | N.D. | -31 to -36 | N.D. | | -6 to -8 |
| thbs | | | | | |
| azodoeUFLA1-100thb1 | N.D. | -54 to -59 | TgctgGAcGCCAAc | -95 to -108 | -5 to -7 |
| azodoeUFLA1-100thb2 | N.D. | -61 to -67 | N.D. | | -7 to -12 |
| braelkUSDA76thb2 | N.D. | -56 to -61 | TTTgAGAtaCCtAT | -15 to -28 | -3 to -5 |
| braelkUSDA94thb1 | N.D. | -34 to -39 | gTTgAGAGcCgcca | -59 to -72 | -2 to -4 |
| brajapUSDA38thb2 | N.D. | -38 to -42 | TaTAtcAGGgCAca | -23 to -36 | -7 to -9 |
| brajapUSDA123thb1 | N.D. | -32 to -37 | N.D. | | -11 to -14 |
| burphySTM815thb1 | -8 to -13 | -31 to -35 | TaTAAacGGtaAcT | -90 to -103 | -2 to -4 |
| burphySTM815thb2 | -23 to -28 | -43 to -47 | cTgAtGcGGCCAgc | -70 to -83 | N.D. |
| cupnecN1thb1 | N.D. | N.D. | TcgctaAGGCCgcT | -47 to -60 | -7 to -11 |
| cupnecN1thb2 | N.D. | -43 to -47 | N.D. | | -7 to -9 |
| mescicWSM1271thb | N.D. | -61 to -66 | TTgtAGtGGgCgAc | -95 to -108 | -8 to -12 |
| meslotNZP2037thb2 | N.D. | -39 to -43 | TgcAAGccGCCAtc | -47 to -60 | -9 to -12 |
| rhietlCNPAF512thb | N.D. | -32 to -37 | TaTAtGAGGagcgg | -28 to -41 | N.D. |
| rhietlKIM5thb | N.D. | -59 to -64 | N.D. | | -8 to -10 |
| rhilegGB30thb2 | N.D. | -54 to -58 | TTggAatGGaCAAT | -58 to -71 | -6 to -10 |
| rhilegVc2thb1 | N.D. | -31 to -35 | TTcgAcAtGCaAAT | -90 to -103 | N.D. |
| rhilupHPC(L)thb1 | -17 to -22 | -40 to -45 | N.D. | | -4 to -7 |
| rhilupHPC(L)thb2 | N.D. | N.D. | N.D. | | -7 to -11 |
| sinfreHH103thb | N.D. | -42 to -49 | TTTgtcAaGCCctg | -102 to -115 | -3 to -6 and -8 to -11 |
| sinmel1021thb2 | N.D. | -57 to -63 | cTTgtcgGGCagAT | -87 to -100 | -5 to -7 |

Sequence alignments and phenetic analysis of rhizobial Glbs

Pairwise sequence alignments showed that the rhizobial fHbs, SDgbs, GCSs and tHbs analyzed in this work are 34.6 to 85.4%, 6.7 to 100%, 10.9 to 100% and 3.5 to 100% identical, respectively. This indicates that variability among the rhizobial Glb sequences is high. Moreover, identity values for the fHbs globin and flavin domains were 39.1 to 93.7% and 26.5 to 81.1%, respectively, and identity values for the GCSs globin and transmitter domains were 17.5 to 100% and 5.9 to 100%, respectively. Thus, apparently in the rhizobial fHbs and GCSs analyzed in this work the globin domain is more conserved than the flavin and transmitter domains.
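
For readers who want to reproduce identity values of this kind, the calculation can be sketched as follows. This is an illustration under stated assumptions rather than the method used in this work: the alignment tool and the denominator convention are not given in this section, so the sketch assumes two already aligned sequences with `-` as the gap character and divides the number of identical positions by the number of columns in which both sequences have a residue. The function name `percent_identity` and the two short sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments (not taken from Dataset 2).
seq_a = "MKT-LLVA"
seq_b = "MRTALL-A"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Other conventions (for example dividing by the full alignment length or by the length of the shorter sequence) give different percentages, which is worth keeping in mind when comparing very low identity values such as those reported for the tHbs.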

The average length and molecular mass for the rhizobial fHbs, SDgbs, GCSs and tHbs analyzed in this work are 400 amino acids and 44 kDa, 141 amino acids and 15 kDa, 510 amino acids and 55 kDa and 149 amino acids and 17 kDa, respectively. However, sequence analysis revealed that the globin domain from BraelkUSDA76tHb1, BraelkUSDA94tHb1 and Braelk587tHb2 contains 119 to 237 extra amino acids at the N-terminus and 131 extra amino acids at the C-terminus, and that the globin domain from BrajapUSDA123tHb1, BrajapUSDA135tHb1, BraelkWSM1741SDgb2, RhietlCFN42GCS1, BrajapUSDA4tHb2 and BrajapWSM2793tHb3 contains 27 to 73 extra amino acids at the N-terminus. In contrast, a large deletion comprising helices A and B, the CD loop and part of helix E was detected in the BraelkUSDA94SDgb2 sequence, so that BraelkUSDA94SDgb2 is only 89 amino acids in length (Figure S2).

Multiple sequence alignment showed that, with the exception of 21 GCSs, the proximal amino acid to the heme Fe (F8, located at position 322/323 in Figure S2) is H in the rhizobial Glbs analyzed in this work. In those 21 rhizobial GCSs, F8 appears to be E. Amino acids other than H occupying the F8 position in bacterial Glbs were previously reported by Vinogradov et al.7. However, because H F8 is absolutely conserved in Glbs (i.e. from bacteria to mammals)1,32,45–47, the assignment of E F8 to rhizobial (and other bacterial) GCSs should be taken with caution, as it might result from a sequence alignment artifact. Ideally, F8 from rhizobial GCSs should be identified by experimental methods, such as x-ray crystallography. Multiple sequence alignment also showed that, in the rhizobial Glbs analyzed in this work, the distal amino acid to the heme Fe (E7, located at position 285/289/290 in Figure S2) is Q in fHbs, Q/R/K/M/L in SDgbs, Q in GCSs and H/F/L/V/R in tHbs. This indicates that distal Q is conserved in rhizobial fHbs and GCSs, whereas the amino acids occupying the distal position in rhizobial SDgbs and tHbs are variable. The B10 and CD1 amino acids (located at positions 257 and 270/271/273 in Figure S2, respectively), which also participate in the binding of ligands to the heme Fe48–50, are Y and F, respectively, in most of the rhizobial Glbs analyzed in this work, followed (in order of abundance) by F, S and V at B10 and by H, I, S and Y at CD1.

Figure 2. Phenetic relationships among Glbs detected in the genomes of rhizobial bacteria.

Phenogram was obtained from the Glbs sequence alignment shown in Figure S2. The fHb, SDgb, GCS, tHb class 1, tHb class 2 and tHb class 3 clusters are indicated with light blue, dark blue, red, light green, bright green and dark green, respectively. Stars indicate Glbs selected for the detection of promoter sequences upstream to the glb genes and Glb protein modeling.

A phenogram was constructed from the above multiple sequence alignment. Figure 2 shows that the rhizobial Glbs analyzed in this work segregate into two main lineages: one containing fHbs, SDgbs and GCSs, and the other containing tHbs (the fHb/SDgb/GCS and tHb lineages, respectively). This is consistent with the main evolutionary lineages identified in bacterial Glbs1,51,52 thus indicating that major evolutionary patterns for rhizobial Glbs were identical to those for other bacterial Glbs. Rhizobial fHbs and GCSs cluster with rhizobial SDgbs within the fHb/SDgb/GCS lineage owing to the similarity between the fHb and GCS globin domains and SDgbs. This has been postulated to be the result of an early divergence from a common ancestor to the bacterial fHb and GCS globin domains and SDgbs1,6. The tHb lineage segregates into rhizobial tHbs class 1, tHbs class 2 and tHbs class 3. Within this lineage the rhizobial tHbs class 3 segregate in ancestral position to the rhizobial tHbs class 1 and tHbs class 2. Also, the bradyrhizobial, azorhizobial, mesorhizobial, rhizobial and burkholderial tHbs class 3 segregate from each other; the segregation within rhizobial, sinorhizobial, mesorhizobial and β-rhizobial tHbs class 2 is rather conserved, and bradyrhizobial tHbs class 2 and class 3 segregate into the B. elkanii and B. japonicum tHb sublineages. These observations indicate that rhizobial tHbs evolved similarly to other bacterial tHbs7,8,52 and that evolution of rhizobial tHb sublineages was rather conserved.

Modeling and analysis of the predicted rhizobial Glbs tertiary structure

Structure elucidation is essential to a full understanding of a protein's function within the cell's physiology. The structure of a considerable number of bacterial and non-bacterial Glbs has been elucidated by x-ray crystallography. However, with the exception of a S. meliloti fHb whose tertiary structure was predicted using bioinformatics methods26, the structure of rhizobial Glbs is not known. Hence, we used bioinformatics methods to predict and analyze the tertiary structure of 44 selected rhizobial Glbs (i.e. those representative of the major rhizobial Glb clades identified in this work (see Figure 2 and Table S2)) using the best structural homologs as templates (Dataset 3).

The predicted structures of the selected rhizobial SDgbs and of the fHb and GCS globin domains adopt the 3/3-globin fold, whereas those of the selected rhizobial tHbs adopt the 2/2-globin fold (Figure 3 to Figure 8). Figure 3 shows that the structures of the predicted rhizobial fHbs are highly similar. Yet major differences were detected in the BurphySTM815fHb, CupnecHPC(L)fHb and RhilegUPM1137fHb flavin domains, which exhibited two additional helices. Dataset 3 shows that, among the globin domains from predicted rhizobial fHbs, the distance of the proximal H and distal Q to the heme Fe is 1.44 to 2.47 Å and 6.71 to 15.35 Å, respectively. This observation suggests that the heme Fe in rhizobial fHbs is pentacoordinate, the distal Q apparently being too distant to coordinate the Fe.

Figure 3. Predicted structure of rhizobial fHbs (blue) overlapped to structural homologues (green).

Structural homologues are indicated in Dataset 3. Distal and proximal amino acids to the heme Fe and amino acids that interact with the FAD cofactor are shown in brown. Heme and FAD are shown in red and yellow, respectively. Helices within the globin domain are indicated with letters A to H. All structures are displayed in the same orientation.

Figure 4. Predicted structure of selected rhizobial SDgbs (blue) overlapped to structural homologues (green).

Structural homologues are indicated in Dataset 3. Distal and proximal amino acids to the heme Fe are shown in brown. Heme is shown in red. Helices are indicated with letters A to H. All structures are displayed in the same orientation.

Figure 5. Predicted structure of selected rhizobial GCSs globin domain (blue) overlapped to structural homologues (green).

Structural homologues are indicated in Dataset 3. Distal and proximal amino acids to the heme Fe are shown in brown. Heme is shown in red. Helices are indicated with letters A to H. All structures are displayed in the same orientation.

Figure 6. Predicted structure of class 1 CupnecN1tHb1 (blue) overlapped to the structural homologue Tetrahymena pyriformis tHb (PDB ID 3AQ5) (green).

Distal and proximal amino acids to the heme Fe are shown in brown; only potential distal E11 is shown in the CupnecN1tHb1 structure. Heme is shown in red. Helices are indicated with letters A to H.

Figure 7. Predicted structure of selected rhizobial tHbs class 2 (blue) overlapped to structural homologues (green).

Structural homologues are indicated in Dataset 3. Distal and proximal amino acids to the heme Fe are shown in brown; only potential distal E11 is shown in the tHbs structure. Heme is shown in red. Helices are indicated with letters A to H. Pre-helix F is indicated with the Greek letter φ. All structures are displayed in the same orientation.

Figure 8. Predicted structure of selected rhizobial tHbs class 3 (blue) overlapped to structural homologues (green).

Structural homologues are indicated in Dataset 3. Distal and proximal amino acids to the heme Fe are shown in brown; only potential distal E11 is shown in the tHbs structure. Heme is shown in red. Helices are indicated with letters A to H. All structures are displayed in the same orientation.

Figure 4 shows that 3/3-globin folding is highly conserved in the predicted structure of the rhizobial SDgbs AzodoeUFLA1-100SDgb, BraelkUSDA3254SDgb2, BraelkUSDA3259SDgb1 and BrajapUSDA38SDgb2. Major variations to 3/3-globin folding from predicted rhizobial SDgbs consisted of the existence of an unusually short helix E in BraelkUSDA94SDgb2, a long helix H in BraelkUSDA3254SDgb1 and BrajapUSDA124SDgb1, and the existence of a pre-helix A followed by a long loop at the N-terminal of BraelkWSM1741SDgb2. Dataset 3 shows that among the predicted rhizobial SDgbs the distance of proximal H and distal Q/R/K/M to the heme Fe is 2.11 to 4.44 Å and 5.08 to 6.63 Å, respectively. This observation suggests that the heme Fe in rhizobial SDgbs is either penta- or hexacoordinate.

Only the globin domain from bacterial GCSs has been crystallized and analyzed by x-ray crystallography53,54 (Dataset 3); the crystal structure of the bacterial GCS transmitter domain has not been elucidated. Hence, we predicted and analyzed the tertiary structure of the globin domain only from the selected rhizobial GCSs. Figure 5 shows that the predicted rhizobial GCS globin domain exhibits a 1.5- to 3-turn pre-helix A, that (with the exception of SinfreGR64GCS) no loop exists between helices A and B, and that helix H is unusually long in Rhietl8C3GCS, RhietlCIAT652GCS2 and RhilegGB30GCS2. Dataset 3 shows that, among the predicted rhizobial GCS globin domains, the distance of proximal H/E and distal Q to the heme Fe is 1.77 to 5.56 Å and 4.09 to 9.04 Å, respectively. This observation suggests that the heme Fe in the rhizobial GCS globin domain is either penta- or hexacoordinate.

Figure 6 to Figure 8 show that 2/2-globin folding is highly conserved in the predicted rhizobial tHbs class 1, class 2 and class 3. Major variations to 2/2-globin folding from predicted rhizobial tHbs consisted of the existence of a 2.5-turn pre-helix A followed by a long loop at the N-terminal of (class 1) CupnecN1tHb1 (Figure 6); the existence of a one-turn pre-helix F (designated as φ in Figure 78) in the rhizobial tHbs class 2; the existence of a long and extended C-terminal region in (class 2) BraelkUSDA94tHb1 (Figure 7), and the substitution of helix A by a long loop that connects to helix B through a 1- to 2.5-turn pre-helix B in (class 3) BraelkUSDA76tHb2, BrajapUSDA123tHb1, BurphySTM815tHb1, MeslotNZP2037tHb2 and Sinmel1021tHb2 (Figure 8). Dataset 3 shows that among the predicted rhizobial tHbs, the distance of proximal H and distal H/L/F to the heme Fe is 1.77 to 7.51 Å and 4.09 to 8.25 Å, respectively. This observation suggests that the heme Fe in the rhizobial tHbs is either penta- or hexacoordinate.

The above observations suggest that in spite of sequence variability (see the Sequence alignments and phenetic analysis of rhizobial Glbs subsection) the structure of rhizobial Glbs is similar to the canonical 3/3- or 2/2-globin folding of bacterial and non-bacterial Glbs. However, a number of predicted rhizobial Glbs exhibited variations at the N- and C-terminal regions suggesting that their structural properties could be different to those of canonical Glbs.

The data also show that (with few exceptions), in addition to the proximal and distal amino acids, the distance of the B10 and CD1 amino acids to the heme Fe and the orientation of the proximal, distal, B10 and CD1 amino acids are similar within and among the predicted rhizobial SDgbs, the fHb and GCS globin domains and the tHbs. These amino acids participate in the binding of ligands to the heme Fe. Thus, these observations suggest that the mechanisms and chemistry of ligand binding are similar among the rhizobial Glbs.

Spectroscopic identification of putative Glbs in soluble extracts from Bradyrhizobium japonicum USDA38 and USDA58

The prerequisites for inferring a protein's function are isolating and characterizing either the native or the recombinant protein and detecting protein synthesis in vivo. No rhizobial Glb has been isolated and characterized thus far. However, spectroscopic evidence indicates that putative Glbs exist in soluble extracts from B. japonicum 505 (Wisconsin), R. leguminosarum bv. viciae, B. japonicum NPK63 and R. etli CE3 (see the Introduction section). In order to extend these analyses to other rhizobia, we analyzed soluble extracts from B. japonicum USDA38 and USDA58 by (dithionite reduced + CO minus dithionite reduced) differential spectroscopy, using sperm whale myoglobin and bovine blood hemoglobin as controls. Table 3 shows that the absorption peaks and troughs in the Soret and Q regions for the B. japonicum USDA38 and USDA58, B. japonicum 505 (Wisconsin), R. leguminosarum bv. viciae, B. japonicum NPK63 and R. etli CE3 soluble extracts, Vitreoscilla VHb, E. coli K12 Hmp, sperm whale myoglobin and bovine blood hemoglobin are nearly identical. This preliminary evidence indicates that putative soluble Glbs are synthesized in B. japonicum USDA38 and USDA58. Interestingly, genes coding for SDgbs (brajapUSDA38SDgb1 and brajapUSDA38SDgb2) and tHbs (brajapUSDA38tHb1 and brajapUSDA38tHb2) were identified in the B. japonicum USDA38 genome (Dataset 1). Thus, it is likely that the putative B. japonicum USDA38 Glbs correspond to a combination of SDgbs and tHbs. Inferences from the preliminary results reported here should be confirmed by Glb detection, isolation and unequivocal identification after protein sequencing. This may open the possibility of carrying out further experimental analyses on rhizobial Glbs.
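
The arithmetic behind a (dithionite reduced + CO minus dithionite reduced) difference spectrum can be sketched in a few lines: the absorbance of the reduced sample is subtracted from that of the reduced, CO-treated sample at each wavelength, and the peak and trough are read off within a chosen region. The sketch below is illustrative only. The absorbance values are invented for the example and are not measurements from this work, the 400 to 450 nm window used for the Soret region is an assumption, and the function names are hypothetical.

```python
def difference_spectrum(reduced_co, reduced):
    """Absorbance(reduced + CO) minus absorbance(reduced), per wavelength."""
    return {wl: reduced_co[wl] - reduced[wl] for wl in reduced_co if wl in reduced}


def peak_and_trough(diff, low_nm, high_nm):
    """Wavelengths of the maximum and minimum within [low_nm, high_nm]."""
    window = {wl: a for wl, a in diff.items() if low_nm <= wl <= high_nm}
    if not window:
        raise ValueError("no data points in the requested region")
    peak = max(window, key=window.get)
    trough = min(window, key=window.get)
    return peak, trough


# Hypothetical absorbance readings (wavelength in nm -> absorbance).
reduced_co = {410: 0.30, 415: 0.55, 420: 0.80, 425: 0.60, 430: 0.35, 435: 0.20, 440: 0.10}
reduced = {410: 0.20, 415: 0.30, 420: 0.45, 425: 0.60, 430: 0.75, 435: 0.65, 440: 0.30}

diff = difference_spectrum(reduced_co, reduced)
soret_peak, soret_trough = peak_and_trough(diff, 400, 450)
print("Soret peak (nm):", soret_peak, "trough (nm):", soret_trough)
```

The same two functions could be applied to a second window for the Q region. With real spectra recorded at finer wavelength steps, some smoothing or baseline correction would normally be needed before the extrema are read.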

Table 3. Absorption peaks and troughs in the Soret and Q regions from the (dithionite reduced + CO minus dithionite reduced) differential spectra of rhizobial soluble extracts and other bacterial and vertebrate Glbs.

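| Rhizobial soluble extract/Glb | Soret peak (nm) | Soret trough (nm) | Q peaks (nm) | Q troughs (nm) | Reference |
| --- | --- | --- | --- | --- | --- |
| Rhizobial soluble extracts | | | | | |
| B. japonicum USDA38 | 425 | 448 | 535, 573 | 549, 600 | This work |
| B. japonicum USDA58 | 416 | 437 | 535, 573 | 554, 601 | This work |
| B. japonicum NPK63 | 422 | 443 | 529, 574 | 558, 598 | 24 |
| B. japonicum 505 (Wisconsin) | 417 | 434 | 540, 569 | 556, n.i. | 22 |
| R. etli CE3 | 421 | 439 | 539, 563 | 547, 590 | 25 |
| R. leguminosarum bv. viciae 96 | 424 | 443 | 535, 574 | 555, n.i. | 23 |
| Bacterial Glbs | | | | | |
| Vitreoscilla VHb | 418 | 436 | 534, 567 | 551, 590 | 25 |
| E. coli K12 Hmp | 420 | 437 | 530, 570 | 555, 592 | 55 |
| Vertebrate Glbs | | | | | |
| Sperm whale myoglobin | 419 | 436 | 538, 578 | 558, 596 | This work |
| Bovine blood hemoglobin | 417 | 432 | 533, 570 | 554, 588 | This work |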

n.i., non-identified

Conclusions

Rhizobial Glbs have been poorly studied. The results reported in this work provide molecular and biochemical data, obtained from a bioinformatics perspective, that contribute to a better understanding of these proteins. For example, the distribution of glb genes among rhizobia and an outline of the evolution of glb genes and Glb proteins were clarified, genes that could coexpress with the rhizobial glbs were identified and the tertiary structure of rhizobial Glbs was predicted. Also, spectroscopic analysis suggested that soluble Glbs are synthesized in free-living B. japonicum USDA38 and USDA58. This information will be useful in designing future experimental work focused on clarifying Glb functions within the physiology of free-living and symbiotic rhizobia.

Data availability

F1000Research: Dataset 1. Globin genes detected in the genomes of rhizobial bacteria. 10.5256/f1000research.6392.d46189

F1000Research: Dataset 2. Predicted Glb polypeptides detected in the genomes of rhizobial bacteria. 10.5256/f1000research.6392.d46190

F1000Research: Dataset 3. Distance to the heme Fe and orientation of distal, proximal, B10 and CD1 amino acids in the predicted structure of selected rhizobial Glbs (Table S2). 10.5256/f1000research.6392.d46191

Gesto-Borroto R, Sánchez-Sánchez M and Arredondo-Peter R. A bioinformatics insight to rhizobial globins: gene identification and mapping, polypeptide sequence and phenetic analysis, and protein modeling. [version 1; peer review: 2 approved] F1000Research 2015, 4:117 (https://doi.org/10.12688/f1000research.6392.1)

Open Peer Review

Current Reviewer Status: ?
Key to Reviewer Statuses VIEW
ApprovedThe paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approvedFundamental flaws in the paper seriously undermine the findings and conclusions
Reviewer Report 08 Jul 2015
Paul Twigg, Department of Biology, University of Nebraska at Kearney, Kearney, NE, USA 
Approved
I find this paper by Gesto-Borroto et al. to be well written and analyzed. This paper fills a gap in the knowledge base for what is known about bacterial or more specifically rhizobial globins. The authors seem to have taken ...
HOW TO CITE THIS REPORT
Twigg P. Reviewer Report For: A bioinformatics insight to rhizobial globins: gene identification and mapping, polypeptide sequence and phenetic analysis, and protein modeling. [version 1; peer review: 2 approved]. F1000Research 2015, 4:117 (https://doi.org/10.5256/f1000research.6858.r9023)
  • Author Response 08 Jul 2015
    Raul Arredondo-Peter, Laboratorio de Biofísica y Biología Molecular, Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Colonia Chamilpa, 62210, Mexico
    We thank Dr. Twigg for evaluating this article and constructive comments.
    Competing Interests: No competing interests were disclosed.
Reviewer Report 26 May 2015
Manuel Becana, Department of Plant Nutrition, Spanish National Research Council, Zaragoza, Spain 
Approved
This is a well-written paper on a subject of great interest. There is very little information of rhizobial globins and the authors have done a good job by systematically analyzing the composition of globin genes of 62 genomes in various ...
HOW TO CITE THIS REPORT
Becana M. Reviewer Report For: A bioinformatics insight to rhizobial globins: gene identification and mapping, polypeptide sequence and phenetic analysis, and protein modeling. [version 1; peer review: 2 approved]. F1000Research 2015, 4:117 (https://doi.org/10.5256/f1000research.6858.r8649)
  • Author Response 26 May 2015
    Raul Arredondo-Peter, Laboratorio de Biofísica y Biología Molecular, Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Colonia Chamilpa, 62210, Mexico
    We thank Dr. Becana for evaluating this article and constructive comments.
    Competing Interests: No competing interests were disclosed.
