Comparison of Envelope-Related Genes in Unicellular and Filamentous Cyanobacteria

To elucidate the evolution of cyanobacterial envelopes and the relation between gene content and environmental adaptation, the cell envelope structures and components of unicellular and filamentous cyanobacteria were analyzed by comparative genomics. Hundreds of envelope biogenesis genes were divided into 5 major groups and annotated according to their conserved domains and phylogenetic profiles. Compared with unicellular species, filamentous cyanobacteria have more envelope genes as a consequence of genome enlargement, but only a few gene families were amplified disproportionately, such as those encoding waaG and glycosyl transferase 2. Comparison of envelope genes among species suggests that the marked variation in certain envelope biogenesis genes likely reflects environmental adaptation and may also be related to the emergence of filamentous forms with new functions.


INTRODUCTION
As the oldest oxygenic phototrophs on Earth, cyanobacteria originated 2.8 to 3.5 billion years ago [1]. In the traditional classification of prokaryotic envelopes, cyanobacteria are usually considered gram-negative [2] because they have an outer membrane and lack teichoic acid in their cell walls. However, their envelopes have many unusual features. Cyanobacteria have a thick (15 to 35 nm or more), highly cross-linked peptidoglycan layer, similar to that of gram-positive bacteria [3]. Components rarely found in gram-negative walls, such as carotenoids [4] and β-hydroxypalmitic acid [5], have been detected in the lipopolysaccharide (LPS) of cyanobacteria. These ancient organisms also contain cellulose, a polymer characteristic of vascular plants [6].
The phylum Cyanobacteria has diversified extensively during evolution. Some cyanobacteria evolved a multicellular filamentous form, while others remained unicellular. Filamentous cyanobacteria are the oldest known multicellular organisms [7], and this divergence is a landmark in biological evolution. The transition from unicellular to filamentous cyanobacteria was a significant evolutionary event, because it equipped the organisms with an advantageous internal nutrition system able to respond to ambient factors [8].
The rise of genomics has greatly advanced biological research, and comparative genomics has become an effective tool for exploring differences among species. So far, 25 cyanobacterial genomes, both unicellular and filamentous, have been sequenced, ranging from 1.6 to 9.1 Mb [9]. Cell envelopes differ considerably between unicellular and filamentous species, yet few comparative analyses of the structure and function of their envelopes have been made. Therefore, to understand the diversity of cyanobacterial envelopes, this paper presents a comparative genomic analysis of envelope biogenesis genes in unicellular and filamentous species. Because each species occupies its own ecological niche, genome content, envelope structure, and environmental adaptability were considered together to infer the selective pressures behind multicellularity in cyanobacteria.
2 Comparative and Functional Genomics

The information management system
At the time of this study, 25 sequenced cyanobacterial genomes (21 unicellular and 4 filamentous) were publicly available online through the Integrated Microbial Genomes (IMG) system provided by the Joint Genome Institute (JGI) (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi) [10]. The unicellular Prochlorococcus marinus MED4 and Synechocystis sp. PCC 6803 and the filamentous Trichodesmium erythraeum IMS101 and Anabaena sp. PCC 7120 (also called Nostoc sp. PCC 7120) were chosen for this research. In each species, more than 60% of the genes had already been assigned to the database of Clusters of Orthologous Groups (COGs) [11], which is based on the concept of orthology [12]. From the COG functional category "Cell wall/membrane/envelope biogenesis," the protein sequences were selected, exported in FASTA amino acid format, and downloaded in November 2006 (because IMG is updated frequently, the data may have changed since).
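The selection step above can be sketched in Python. This is a minimal illustration, not the IMG workflow: it assumes the protein sequences have already been exported as FASTA and the COG assignments as a plain mapping from gene ID to COG category letters (a hypothetical layout). It keeps proteins whose category string contains "M", the one-letter COG code for "Cell wall/membrane/envelope biogenesis". The gene IDs and sequences are made-up toy values.

```python
import io

ENVELOPE_CATEGORY = "M"  # COG functional category "Cell wall/membrane/envelope biogenesis"


def parse_fasta(handle):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_envelope_genes(fasta_handle, cog_categories):
    """Return {gene_id: sequence} for proteins assigned to COG category M.

    cog_categories maps gene ID -> category letters ("M", "MU", ...).
    The gene ID is assumed to be the first token of each FASTA header.
    """
    selected = {}
    for header, seq in parse_fasta(fasta_handle):
        gene_id = header.split()[0]
        if ENVELOPE_CATEGORY in cog_categories.get(gene_id, ""):
            selected[gene_id] = seq
    return selected


# Hypothetical toy input: three proteins, two of them in category M.
fasta = io.StringIO(">g1 putative glycosyltransferase\nMKT\nLLA\n>g2\nMSS\n>g3\nMAA\n")
cats = {"g1": "M", "g2": "J", "g3": "MU"}
envelope = select_envelope_genes(fasta, cats)
print(sorted(envelope))  # ['g1', 'g3']
```

Multi-letter categories such as "MU" are kept because a protein can belong to more than one COG functional category.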

Gene retrieval and annotation
Many genes available online have only an accession number or gene ID and lack a complete description, so their roles in cyanobacterial envelope biogenesis were hard to determine. To address this problem, we first used the online InterProScan service of the EMBL European Bioinformatics Institute (EBI) (http://www.ebi.ac.uk/InterProScan) [13]. This alone, however, did not provide enough information, such as the family to which a gene belongs and its role in envelope biogenesis. Therefore, two online NCBI tools, protein-protein BLAST (blastp) (http://www.ncbi.nlm.nih.gov/BLAST) [14] and reverse position-specific BLAST (RPS-BLAST) (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) [15], were used as complementary tools.
For genes lacking a detailed description, putative conserved domains were detected and the sequences were aligned against known genes; a match was generally accepted only with a bit score > 80 and an E-value < 1e-10. Finally, by combining these results with published reports on the roles of particular domains or gene families in bacterial envelope biogenesis, the unclear genes were retrieved and annotated.
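The acceptance thresholds can be expressed as a simple filter. The sketch below assumes hits are available in the default 12-column tabular format of command-line BLAST+ (`-outfmt 6`); the original analysis used the NCBI web tools, so this illustrates the cut-off rather than the authors' pipeline. Query and subject IDs and all statistics in the toy table are placeholders.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def accepted_hits(handle, min_bits=80.0, max_evalue=1e-10):
    """Return (query, subject, bits, evalue) for hits passing both cut-offs."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        bits, evalue = float(rec["bitscore"]), float(rec["evalue"])
        if bits > min_bits and evalue < max_evalue:
            hits.append((rec["qseqid"], rec["sseqid"], bits, evalue))
    return hits


def best_hit_per_query(hits):
    """Keep only the highest-scoring accepted hit for each query."""
    best = {}
    for query, subject, bits, evalue in hits:
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits, evalue)
    return best


# Hypothetical tabular output; IDs and statistics are placeholders.
table = io.StringIO(
    "q1\ts1\t35.2\t240\t150\t4\t1\t240\t3\t242\t2e-40\t160\n"
    "q1\ts2\t30.1\t220\t150\t6\t5\t225\t1\t215\t5e-12\t85.5\n"
    "q2\ts3\t28.0\t100\t70\t3\t1\t100\t1\t98\t1e-05\t45.1\n"
)
hits = accepted_hits(table)
print(best_hit_per_query(hits))  # {'q1': ('s1', 160.0, 2e-40)}
```

Query q2 is rejected on both criteria, so it receives no annotation and would be a candidate for the "other unknown" group.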

Sequence alignment and phylogenetic analysis
Sequences with similar domains were input and fully aligned using ClustalX 1.8. The resulting "*.aln" files were opened in BioEdit with the "Graphic View" option, in which identical or similar residues are shaded black or dark gray. In this paper, only the most conserved regions of the sequences are shown in the figures.
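Shading in BioEdit depends on its own settings; as a simplified, hypothetical stand-in, the conservation of each alignment column can be scored by residue identity alone. The function below marks a column `*` when every sequence carries the same residue and `.` when the most common residue reaches an assumed threshold of 60% of the sequences; it ignores residue similarity, which BioEdit can also shade. The aligned sequences are toy values.

```python
from collections import Counter


def column_conservation(aligned, threshold=0.6):
    """Classify each alignment column by residue identity.

    Returns one character per column: '*' if every sequence has the same
    residue (no gaps), '.' if the most common residue is found in at least
    `threshold` of the sequences, and ' ' otherwise. Gaps ('-') never count.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n = len(aligned)
    marks = []
    for column in zip(*aligned):
        counts = Counter(r for r in column if r != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        if top == n:
            marks.append("*")
        elif top / n >= threshold:
            marks.append(".")
        else:
            marks.append(" ")
    return "".join(marks)


aln = ["MKV-LG", "MKIALG", "MRVALG"]
profile = column_conservation(aln)
print(profile)             # *...**
print(profile.count("*"))  # 3 fully conserved positions
```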
In addition to the genes from Trichodesmium erythraeum IMS101 and Anabaena sp. PCC 7120, 15 FAS1-containing genes from other cyanobacteria, archaebacteria, eubacteria, yeast, filamentous fungi, and higher plants were retrieved from NCBI. Alignments of genes predicted to belong to the same family were used as input files for the MEGA3 program [16]. A phylogenetic tree was constructed by the neighbor-joining (NJ) method and evaluated with 1000 bootstrap replicates [17,18].
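For readers unfamiliar with neighbor joining, the following is a minimal textbook implementation of the algorithm (Saitou and Nei) operating on a precomputed distance matrix. It is not MEGA3's code, it omits the 1000-replicate bootstrap, and the four-taxon distance matrix is a toy example built from an additive tree, so NJ should recover that tree exactly.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining; return it as a Newick string.

    names: taxon labels (at least 3); dist: symmetric distance matrix
    given as a list of lists in the same order as names.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least 3 taxa")
    # Distances keyed by node label; internal nodes are labelled by their Newick subtree.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * d[a][b] - r[a] - r[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node u.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4g},{b}:{lb:.4g})"
        # Distances from u to every other remaining node.
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Three nodes remain: attach them to one central node.
    x, y, z = nodes
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    ly = (d[x][y] + d[y][z] - d[x][z]) / 2
    lz = (d[x][z] + d[y][z] - d[x][y]) / 2
    return f"({x}:{lx:.4g},{y}:{ly:.4g},{z}:{lz:.4g});"


# Toy distances from the additive tree ((A:1,B:2):1,(C:1,D:3)).
names = ["A", "B", "C", "D"]
dist = [[0, 3, 3, 5],
        [3, 0, 4, 6],
        [3, 4, 0, 4],
        [5, 6, 4, 0]]
tree = neighbor_joining(names, dist)
print(tree)  # (C:1,D:3,(A:1,B:2):1);
```

Because the toy distances are additive, the output reproduces the generating tree. A bootstrap analysis would repeat the same procedure on distance matrices computed from alignments whose columns are resampled with replacement, counting how often each grouping reappears.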

RESULTS
One hundred envelope biogenesis genes were obtained from Prochlorococcus marinus MED4, 186 from Synechocystis sp. PCC 6803, 266 from Trichodesmium erythraeum IMS101, and 294 from Anabaena sp. PCC 7120; these numbers are shown in the "total" column of Table 1. The known constituents of cyanobacterial cell walls, including peptidoglycan, lipopolysaccharide (LPS), exopolysaccharide (EPS), and outer membrane proteins, are each synthesized under the control of different genes. We therefore divided the 846 envelope biogenesis genes into 5 major types: peptidoglycan biosynthesis-related (PBR) genes, lipopolysaccharide biosynthesis-related (LBR) genes, exopolysaccharide biosynthesis-related (EBR) genes, outer membrane protein (OMP) coding genes, and other unknown (OU) genes. The OU genes were retrieved from the COG category "Cell wall/membrane/envelope biogenesis," but not enough information was available to annotate them with the methods described in the section "Gene retrieval and annotation." Table 1 shows the absolute and relative numbers of classified genes in the unicellular and filamentous species. The emergence of filaments was naturally accompanied by larger genomes and more genes; however, the share of each type in the total also changed. For example, the percentage of EBR genes increased in filamentous species (18.0% in Trichodesmium erythraeum IMS101 and 21.2% in Anabaena sp. PCC 7120, compared with 15.0% in Prochlorococcus marinus MED4 and 15.1% in Synechocystis sp. PCC 6803). The percentages of the other types changed as well, as discussed in detail below.
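The five-way classification can be pictured as an ordered set of rules applied to each gene's annotation. The keyword lists below are hypothetical and far cruder than the domain- and literature-based annotation actually used; they only illustrate how a first-match rule set yields per-type percentages like those in Table 1, with unmatched genes falling into OU. LBR is checked before OMP, so an outer-membrane protein involved in LPS synthesis stays in LBR, consistent with how OMP genes are defined in this paper.

```python
from collections import Counter

TYPES = ("PBR", "LBR", "EBR", "OMP", "OU")

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("PBR", ("penicillin-binding", "peptidoglycan")),
    ("LBR", ("waag", "lipopolysaccharide", "lipid a")),
    ("EBR", ("glycosyl transferase 2", "fasciclin", "exopolysaccharide", "cellulose")),
    ("OMP", ("outer membrane",)),
]


def classify(annotation):
    """Assign one of the five envelope gene types to an annotation string."""
    text = annotation.lower()
    for label, keywords in RULES:
        if any(k in text for k in keywords):
            return label
    return "OU"  # nothing matched: other/unknown


def type_percentages(annotations):
    """Percentage of genes in each type, rounded to one decimal place."""
    counts = Counter(classify(a) for a in annotations)
    total = len(annotations)
    return {t: round(100 * counts[t] / total, 1) for t in TYPES}


# Hypothetical annotations, one per type.
annotations = [
    "Class A penicillin-binding protein",
    "waaG-like glycosyltransferase",
    "Glycosyl transferase 2 family protein",
    "Outer membrane efflux protein",
    "Conserved hypothetical protein",
]
print(type_percentages(annotations))
# {'PBR': 20.0, 'LBR': 20.0, 'EBR': 20.0, 'OMP': 20.0, 'OU': 20.0}
```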

Percentage variation of peptidoglycan biosynthesis-related (PBR) genes
Peptidoglycan, an important component of the cyanobacterial envelope, forms a covalently closed, net-like layer that protects cells against harmful environmental influences, allows them to maintain a high internal osmotic pressure, and sometimes acts as a barrier to transenvelope transport [19]. Although the number of envelope biogenesis genes increased from Prochlorococcus marinus MED4 to Anabaena sp. PCC 7120, the increase in PBR genes was confined to a single gene family, which encodes class A high-molecular-weight penicillin-binding proteins [20]. As a result, the percentage of PBR genes actually decreased. In filamentous cyanobacteria, envelope components and structures other than peptidoglycan, such as exopolysaccharide and filamentous sheaths, can also protect the cells, so relatively fewer peptidoglycan genes may be needed.
Table 2.

Uneven increase of LBR genes in filamentous cyanobacteria
Because LPS also has a protective function, the percentage of LBR genes in the total decreased from unicellular to filamentous cyanobacteria, as it did for PBR genes. This trend is clear among Prochlorococcus marinus MED4, Synechocystis sp. PCC 6803, and Trichodesmium erythraeum IMS101. However, Anabaena sp. PCC 7120 does not follow it: it encodes relatively more LBR genes than Trichodesmium erythraeum IMS101, probably because some of its cells differentiate into heterocysts, specialized N₂-fixing cells within O₂-producing filaments [20,21]. To fix nitrogen, heterocysts need extracellular LPS layers that protect against oxygen invasion [22]. In absolute terms, Anabaena sp. PCC 7120 had the most LBR genes. Interestingly, most of the additional genes shared the conserved domain waaG (formerly RfaG). There were 43 waaG-containing genes in Anabaena sp. PCC 7120, compared with only 5 in Prochlorococcus marinus MED4, 17 in Synechocystis sp. PCC 6803, and 24 in Trichodesmium erythraeum IMS101. The 43 genes are listed in Table 2, and their multiple alignment over the shared domain is shown in Figure 1; about 20 residues are common to all 43 sequences (black-shaded areas). These residues may form characteristic spatial structures that could define the active site of the waaG domain.
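Whether a family expanded disproportionately can be checked with simple arithmetic on the counts reported above: divide the family's fold increase by the fold increase of the whole envelope gene set. A ratio of 1 means the family merely kept pace with genome enlargement. The sketch below uses only the waaG and total envelope gene counts given in this paper; the ratio itself is an illustrative measure, not one used by the authors.

```python
# Counts reported in the text: waaG-containing genes and all envelope
# biogenesis genes retrieved for each genome.
WAAG = {"P. marinus MED4": 5, "Synechocystis PCC 6803": 17,
        "T. erythraeum IMS101": 24, "Anabaena PCC 7120": 43}
TOTAL = {"P. marinus MED4": 100, "Synechocystis PCC 6803": 186,
         "T. erythraeum IMS101": 266, "Anabaena PCC 7120": 294}


def family_share(family, totals):
    """Family size as a percentage of all envelope biogenesis genes."""
    return {sp: 100 * family[sp] / totals[sp] for sp in family}


def relative_expansion(family, totals, ref):
    """Fold increase of the family divided by fold increase of the whole set.

    1 means the family grew in proportion to the envelope gene set;
    values above 1 indicate disproportionate amplification.
    """
    return {sp: (family[sp] / family[ref]) / (totals[sp] / totals[ref])
            for sp in family}


share = family_share(WAAG, TOTAL)
expansion = relative_expansion(WAAG, TOTAL, ref="P. marinus MED4")
for sp in WAAG:
    print(f"{sp}: {share[sp]:.1f}% of envelope genes, "
          f"relative expansion {expansion[sp]:.2f}")
```

With P. marinus MED4 as the reference, waaG genes make up 14.6% of the envelope genes of Anabaena sp. PCC 7120 versus 5.0% in MED4, and the relative expansion is about 2.93: the family grew nearly three times faster than the envelope gene set as a whole.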
The waa family includes many members, such as waaP, waaY, waaA, waaT, waaO, waaQ, and waaC, which help synthesize the LPS core oligosaccharide. At present, it is known only that the waaG product is a glucosyltransferase and that waaG mutations truncate the LPS, affect its phosphorylation, and destabilize the outer membrane [23]. The waaG gene may therefore provide a selective advantage to Anabaena sp. PCC 7120.

Analysis of EBR
During the transition from unicellular to filamentous cyanobacteria, the percentage of EBR genes increased clearly, but the increase was concentrated in a few particular genes. Most of the extra genes in filamentous species belong to the family encoding glycosyl transferase 2, which is involved in many metabolic processes, mainly cellulose biosynthesis [24]. The common … Tables 3 and 4 and Figure 2, whereas it occurred only 8 times in Prochlorococcus marinus MED4 and 14 times in Synechocystis sp. PCC 6803. We suggest that certain members of the glycosyl transferase 2 family are key enzymes for cellulose synthesis in filamentous cyanobacteria.
The fasciclin-like (FAS1) family has been identified as a hemicellulose synthase in fungi and higher plants [25] and is involved in secondary wall biosynthesis [26]. Homologues of this conserved domain, which is closely related to the formation of filaments and extracellular polysaccharides, have been found in archaebacteria, eubacteria, actinomycetes, yeast, filamentous fungi, and vascular plants. It … Table 5. A phylogenetic tree of all 23 FAS1-containing genes from these species was constructed (Figure 3). The tree shows that the genes of Trichodesmium erythraeum IMS101 and Anabaena sp. PCC 7120 are distant from those of other cyanobacteria (Synechocystis, Synechococcus, Crocosphaera, and Nostoc), and that all the cyanobacterial genes are separated from those of fungi and plants. The FAS1-containing genes are paralogous within the phylum Cyanobacteria.

General descriptions of 5 types of genes
Table 1 shows marked changes from the top to the bottom rows, especially in the PBR, EBR, and OMP columns. This is readily understood: to adapt better to the external environment and improve their own nutrient status, cyanobacteria must modify their envelopes. Adding outer membrane proteins is one option, as in Synechocystis sp. PCC 6803. From unicellular to filamentous cyanobacteria, the number of envelope biogenesis genes increased, but unevenly: gene duplication was concentrated in very few families. This suggests that during evolution only a few gene families expanded disproportionately, and these may have been involved in generating novel structures and functions.

Role of waaG in filamentous cyanobacterial regulation
LPS, a characteristic component of gram-negative bacteria, consists of 3 covalently linked domains: hydrophobic lipid A, the core oligosaccharide, and the hydrophilic O-antigen [27]. Structurally, the phosphorylated core oligosaccharide can be subdivided into inner and outer cores [28]. During LPS biosynthesis, the waaG product transfers a glucosyl residue, D-glucose I (Glc I), to L-glycero-D-manno-heptose II (Hep II). Mutation of waaG destabilizes the LPS layer by interfering with core phosphorylation [23], and a stable LPS layer is necessary to stabilize the external layers of heterocysts [22]. Unlike the marine filamentous Trichodesmium erythraeum IMS101, Anabaena sp. PCC 7120 usually lives in freshwater or wetlands, which are considered less stable than marine ecosystems, with drastic changes in temperature and light, abundant but inconstant nutrient resources, and more potential hazards. Anabaena sp. PCC 7120 can also produce heterocysts to fix N₂ and actively adapt to its environment, which makes it more variable than species living in the ocean. The many waaG homologues could help stabilize the heterocysts and thereby improve N₂ fixation in Anabaena sp. PCC 7120.

Relation between EBR and cyanobacterial evolution
Cyanobacterial filaments are made up mainly of diverse polysaccharides, including cellulose and matrix polysaccharides. Most of the genes involved belong to the glycosyl transferase 2 (GT2) family. In the model plant Arabidopsis thaliana, over 10 members of this family, which belong to the group of genes encoding the catalytic subunit of cellulose synthase (CESA), catalyze glucan-chain elongation during cellulose synthesis [29]. Since cellulose and other EPS are also the main components of the cyanobacterial filamentous sheath, the GT2 family may play a vital role in filament formation. These results also support the view that cyanobacterial cellulose is at least one of the earliest origins of the most abundant biopolymer on Earth today [30]. At present, little is known about matrix polysaccharides (hemicellulose, pectin, and so on) in cyanobacteria. Surprisingly, several matrix polysaccharide biogenesis genes or their homologues were discovered in this study. The phylogenetic tree (Figure 3) shows that the FAS1 genes were duplicated during evolution among different cyanobacteria, suggesting that the FAS1 family arose after cyanobacteria branched off from other archaic species but before different cyanobacteria diverged. The family is very rare in oceanic unicellular cyanobacteria but substantial in filamentous Anabaena, Nostoc, and Trichodesmium. This large difference between unicellular and filamentous cyanobacteria implies that the family contributes to filament formation and provides a clue to the evolution of cyanobacteria. In addition to the major components of typical gram-negative bacteria, the presence of EPS (mainly cellulose and hemicellulose) in cyanobacteria is significant. Therefore, peptidoglycan, LPS, EPS, and outer membrane proteins can be regarded as the 4 major components of cyanobacterial envelopes.
Table 5.

Species selection and gene classification
More than 93% of the envelope biogenesis genes of each cyanobacterium were assigned to a defined type, leaving fewer than 7% as other unknown genes, which shows that the classification is scientifically acceptable and practical. However, errors or misplacements cannot be completely eliminated until all cyanobacterial genes are correctly annotated. For instance, some proteins encoded by LBR genes are localized in the outer membrane, so these LBR genes could also be considered OMP genes. Therefore, the OMP genes defined in this paper are mainly those whose products are located in the outer membrane and carry out functions other than the biosynthesis of peptidoglycan, LPS, and EPS.
Moreover, previous reports concluded that cyanobacterial cell walls do not contain teichoic acid [3], but the gene alr4011 in Anabaena sp. PCC 7120 calls this into question. The amino acid sequence of alr4011 contains the conserved domain DltE, a short-chain dehydrogenase involved in teichoic acid synthesis [31], and alr4011 shows high similarity to the dltE gene of the gram-positive Bacillus subtilis (146 bits, E-value = 7e-34). No DltE-containing gene was found in Prochlorococcus marinus MED4, Synechocystis sp. PCC 6803, or Trichodesmium erythraeum IMS101. One possible explanation is that alr4011 was transferred horizontally from gram-positive bacteria; another is that the gene acts through a distinct pathway to produce an envelope constituent other than teichoic acid. Whether teichoic acid exists in cyanobacterial envelopes remains an open question that requires further research and experiments.