Small proteins in cyanobacteria provide a paradigm for the functional analysis of the bacterial micro-proteome

Background Despite their versatile functions in multimeric protein complexes, in the modification of enzymatic activities, intercellular communication or regulatory processes, proteins shorter than 80 amino acids (μ-proteins) are a systematically underestimated class of gene products in bacteria. Photosynthetic cyanobacteria provide a paradigm for small protein functions due to extensive work on the photosynthetic apparatus that led to the functional characterization of 19 small proteins of fewer than 50 amino acids. In analogy, previously unstudied small ORFs with similar degrees of conservation might encode small proteins of high relevance also in other functional contexts. Results Here we used comparative transcriptomic information available for two model cyanobacteria, Synechocystis sp. PCC 6803 and Synechocystis sp. PCC 6714, for the prediction of small ORFs. We found 293 transcriptional units containing candidate small ORFs ≤80 codons in Synechocystis sp. PCC 6803, also including the known mRNAs encoding small proteins of the photosynthetic apparatus. Of these transcriptional units, 146 are shared between the two strains, 42 are shared with the higher plant Arabidopsis thaliana and 25 with E. coli. To verify the existence of the respective μ-proteins in vivo, we selected five genes as examples, added a FLAG tag sequence to each and re-introduced them into Synechocystis sp. PCC 6803. These were the previously annotated gene ssr1169, two newly defined genes norf1 and norf4, as well as nsiR6 (nitrogen stress-induced RNA 6) and hliR1 (high light-inducible RNA 1), which originally were considered non-coding. Upon activation of expression via the Cu2+-responsive petE promoter or from the native promoters, all five proteins were detected in Western blot experiments.
Conclusions The distribution and conservation of these five genes as well as their regulation of expression and the physico-chemical properties of the encoded proteins underline the likely great bandwidth of small protein functions in bacteria and make them attractive candidates for functional studies.


Background
Proteins with fewer than 80 amino acids in prokaryotes or 100 amino acids in eukaryotes are defined as short proteins (μ-proteins). During standard genome annotation these short protein-coding genes are frequently neglected, and proteomics-based analyses routinely fail to detect this class of peptides. As a result, μ-proteins are a systematically underestimated class of gene products.
In strong contrast is the finding that small ORFs constitute the most frequent essential genomic component in bacteria, even more than conventional ORFs [1]. Indeed, the functional characterization of selected examples of μ-proteins has revealed their critical involvement in processes such as quorum sensing or interspecies communication [2], regulatory functions [3][4][5][6] and in the formation of multi-subunit protein complexes. An increasing number of μ-proteins is being discovered also in eukaryotes [7][8][9][10], and archaea [11], indicating their ubiquity in all three domains of life. Nevertheless, the likely diverse functions of short proteins are largely unknown, even for simple unicellular bacteria.
We chose 5 examples from Synechocystis 6803 for experimental analysis. These were norf1 and norf4 (for novel orf 1 and 4, [22]), nsiR6 and hliR1 (for nitrogen stress-induced RNA 6 and high light inducible RNA 1), the latter two transcripts originally considered noncoding [33] as well as the short gene ssr1169, which was predicted as protein-coding in the current version of the genome sequence [NCBI reference NC_000911]. All five proteins could be detected after FLAG tagging in vivo. Their modes of regulation, conservation and physicochemical properties make these five μ-proteins interesting candidates for functional studies.

Strains and growth conditions
Synechocystis 6803, substrain "PCC-M" [34], served as WT and was grown in Cu2+-free, TES-buffered (20 mM, pH 8.0) liquid BG11 medium [35] with gentle agitation or on agar-solidified (0.9% [w/v] Kobe I agar, Roth, Germany) BG11 supplemented with 0.3% (w/v) sodium thiosulfate at 30°C under continuous illumination with white light of ~40 μmol photons m−2 s−1. To induce expression of FLAG-tagged μ-proteins from the Cu2+-responsive petE promoter [36], 2 μM CuSO4 was added to exponentially growing cells. Different environmental conditions were applied for induction of gene expression under control of native promoters: (i) high light, 300 μmol photons m−2 s−1; (ii) dark, flasks wrapped with aluminium foil; (iii) nitrogen deficiency, cells were pelleted by centrifugation, washed once and resuspended in NO3−-free BG11. Samples for protein extraction were taken just before and 6 h (Norf1, HliR1) or 24 h (NsiR6, Norf4) after induction of gene expression. Ssr1169 was expected to be most highly expressed in exponential growth phase, hence samples were taken from exponentially growing cells on two consecutive days. Synechocystis 6803 strain pUR-PpetJ-3xFlag-sfGFP [37] was used as positive control for the detection of FLAG-tagged proteins by Western blots. E. coli strains TOP10F' and J53/RP4 were used for generating Synechocystis 6803 mutant strains by conjugation. In liquid BG11 medium 5 μg ml−1 gentamicin or 50 μg ml−1 kanamycin and 5 μg ml−1 gentamicin were used to maintain recombinant strains (see below).
For examination of gene expression by Northern blot analysis, exponentially growing WT cells were transferred to the different environmental conditions described above. Cultivation under high light was followed by a shift back to standard light conditions (40 μmol photons m−2 s−1). Cultures grown in the dark as well as nitrogen-deprived cultures were additionally aerated with ambient air through a glass tube and a sterile filter for constant and fast growth.

Computational methods
Small ORFs and their orthologs were identified and annotated in Synechocystis 6803 and 6714 in three steps.

1. BlastN searches returning hits with E values ≤1e−2 were performed against the NCBI nt database [38] for all intergenic regions covered by TUs [20,21]. From the blast results, multiple alignments were created with ClustalW [39] and analyzed for their coding potential with RNAcode [40]. The significant (p ≤0.05) small ORF candidates were manually curated.
2. To annotate candidate small ORFs, blastP queries with E values ≤1e−2 were done against the NCBI nr database [38].
3. Orthologs of existing and newly detected small ORFs were identified in Synechocystis 6803 and 6714 via a reciprocal best hit approach using blastP with an E value cutoff of ≤1e−2 and allowing a difference in length of ≤20% and a maximum length of 80 amino acids in both strains.
Genes of small ORFs that were covered by a predicted TU were considered to be expressed. Transmembrane helices were predicted with TMHMM Server v. 2.0 [41].
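The reciprocal best hit criterion of step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: blastP results are assumed to be already parsed into (query, subject, E value, bit score) tuples, the best hit is taken as the one with the highest bit score, and the ≤20% length difference is computed relative to the longer protein. All function and variable names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (query_id, subject_id, e_value, bit_score); this tuple layout is an assumption
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-2) -> Dict[str, str]:
    """Map each query to its highest-scoring subject among hits passing the E value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, score in hits:
        if evalue > max_evalue:
            continue  # hit does not pass the E value cutoff
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit],
    b_vs_a: Iterable[Hit],
    len_a: Dict[str, int],
    len_b: Dict[str, int],
    max_len: int = 80,
    max_len_diff: float = 0.20,
) -> List[Tuple[str, str]]:
    """Return (a, b) pairs that are mutual best hits and satisfy both length filters."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    pairs = []
    for a, b in sorted(forward.items()):
        if reverse.get(b) != a:
            continue  # not reciprocal
        la, lb = len_a[a], len_b[b]
        if la > max_len or lb > max_len:
            continue  # at least one protein exceeds 80 amino acids
        if abs(la - lb) / max(la, lb) > max_len_diff:
            continue  # lengths differ by more than 20%
        pairs.append((a, b))
    return pairs
```

Taking the best hit by bit score rather than by E value is a design choice of this sketch; for very short proteins both orderings usually agree, but ties in E value are then broken deterministically.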

RNA extraction and analysis
Synechocystis 6803 cells were harvested by vacuum filtration on hydrophilic polyethersulfone filters (Pall Supor®-800, 0.8 μm), immediately immersed in 1 ml PGTX [44] and frozen in liquid nitrogen. RNA extraction was performed by 15 min incubation at 65°C followed by chloroform washing and isopropanol precipitation as previously described [45]. Northern hybridization with 32P-labelled, single-stranded transcript probes was carried out as described [46]. Oligonucleotide sequences for PCR amplification of probe templates used for in vitro transcription are listed in Table 1.

Protein purification and immunodetection
Cells for protein extraction were collected by centrifugation (4000 × g, 10 min, 4°C), resuspended in PBS buffer (137 mM sodium chloride, 2.7 mM potassium chloride, 10 mM disodium phosphate, 1.8 mM potassium dihydrogen phosphate, pH 7.4) in the presence of protease inhibitor cocktail (cOmplete, Roche) and immediately frozen in liquid nitrogen. Cells were mechanically disrupted by using glass beads (diameter 0.1-0.25 mm) and a Precellys® 24 homogenizer (Bertin Technologies) at 6000 rpm and 4°C applying six cycles of 3 × 10 s homogenization. Glass beads were removed by centrifugation (1000 × g, 1 min, 4°C). To solubilize membrane proteins, samples were heated for 30 min at 50°C with 2% SDS (w/v) followed by determination of the protein concentration using Direct Detect Spectrometer (Merck Millipore).
For immunoblot analysis, separated proteins were transferred to nitrocellulose membranes (Hybond™-ECL, GE Healthcare). Membranes were blocked overnight at 4°C with 5% low fat milk powder in TBS-T and subsequently probed with monoclonal ANTI-FLAG® M2-Peroxidase (HRP) antibody raised in mouse (Sigma-Aldrich) in TBS-T for 1 h at room temperature in the dark. All washing steps were performed with gentle agitation in TBS-T (20 mM Tris pH 7.6, 150 mM NaCl, 0.1% (v/v) Tween-20) at room temperature. Signals were detected with ECL™ start Western blotting detection reagent (GE Healthcare) on a chemiluminescence imager system (Fusion SL, Vilber Lourmat) and subsequently visualized using FUSION-CAP (Vilber Lourmat) and Quantity One software (BIO-RAD).

Reporter gene assays
To measure promoter activity as bioluminescence, the putative norf1 promoter sequence and its 5′ UTR (−328 to +137, TSS at +1) was fused to promoterless luxAB reporter genes by PCR, followed by cloning into the promoter probe vector pILA as described [47]. The resulting pILA derivative was used for transformation of a Synechocystis 6803 strain expressing the luxCDE genes encoding enzymes for the synthesis of decanal, the luciferase substrate, under control of the strong promoter of the ncRNA Yfr2a [48].
Cells were grown in the presence of 10 mM glucose to provide energy for the luciferase reaction also in darkness. Bioluminescence was measured in vivo at different time points after inducing dark conditions as described [47].

Comparative transcriptomics for the identification of μ-proteins in Synechocystis
The extensive comparative transcriptome and genome information for the model cyanobacterium Synechocystis 6803 [21,22] and the closely related strain Synechocystis 6714 [20,32] was utilized for the prediction of possible μ-ORFs. In our previous studies [20,21] transcriptional units (TUs) had been defined, combining information on the transcriptional start sites, the lengths of transcribed UTRs, operons, coding and non-coding regions.
Here we evaluated all possible non-coding transcripts for their protein-coding potential with the program RNAcode [40]. RNAcode detects protein-coding regions in any given sequence on the basis of multiple sequence alignments and the evolutionary signatures that are associated with a coding sequence [40]. After combination with the pre-existing annotation, this analysis led to the prediction of 293 potential small proteins with a maximum of 80 amino acids in Synechocystis 6803 and possibly 773 in Synechocystis 6714 (Fig. 1).
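The size criterion alone (ORFs of at most 80 codons) can be illustrated with a plain ORF scan. Note that this is a different and much simpler technique than RNAcode, which scores evolutionary signatures in alignments and is not reproduced here. The sketch assumes forward-strand scanning only, ATG as the default start codon, codon counts that exclude the stop codon, and one ORF per stop codon (starting at the first in-frame start); the function name and parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def small_orfs(seq, max_codons=80, min_codons=1, starts=("ATG",)):
    """Return sorted (start, end, n_codons) tuples for forward-strand ORFs.

    start is 0-based, end is exclusive and includes the stop codon,
    n_codons counts sense codons only (stop codon excluded).
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] not in starts:
                i += 3
                continue
            # extend codon by codon until an in-frame stop or the sequence end
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop downstream; nothing more in this frame
            n_codons = (j - i) // 3
            if min_codons <= n_codons <= max_codons:
                found.append((i, j + 3, n_codons))
            i = j + 3  # continue scanning after the stop codon
    return sorted(found)
```

In practice alternative start codons (for example GTG or TTG) and the reverse strand would also have to be considered, and an open reading frame by itself is no evidence of coding potential, which is why an alignment-based method was used in this work.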
The resulting sets of candidate μ-proteins were compared against the predicted proteome of the respective other Synechocystis strain, against E. coli and the higher plant Arabidopsis thaliana as reference organisms for proteins possibly conserved among bacteria or among photosynthetic organisms. This procedure led to the identification of 146 μ-proteins shared between the two Synechocystis strains, as well as 42 and 25 μ-proteins which are shared between Synechocystis 6803 and A. thaliana or E. coli, respectively. Interestingly, we found the 42 proteins shared with higher plants to be identical in both Synechocystis strains. In contrast to observations in other bacteria, a relatively high number of the predicted proteins in the smallest fraction (≤50 amino acids) had assigned functions (e.g., in photosynthesis) and a matching protein in the higher plant Arabidopsis thaliana or in E. coli (Table 2).

In vivo tagging and detection of cyanobacterial μ-proteins
We chose 5 examples for closer analysis: Norf1, NsiR6, HliR1, Ssr1169 and Norf4. Norf1 and Norf4 were previously defined based on transcriptomic evidence [22]. The protein Ssr1169 was previously modelled as part of the existing annotation, but neither its possible function nor its actual existence had been demonstrated thus far. NsiR6 and HliR1 are not annotated in the genome but were found by transcriptomics [21,33]. Although these RNAs harbor potential open reading frames, they were initially classified as non-coding. After FLAG-tagging and inducing their expression in Synechocystis 6803, all five proteins were detected by Western blotting (Fig. 2). HliR1 and Ssr1169 showed a tendency for aggregation, even under the denaturing conditions used, possibly related to their hydrophobic character (Table 3).
Fig. 1 (caption, beginning missing) ... [20][21][22] and their coding potential evaluated with RNAcode [40]. This information was merged with the pre-existing annotation [32]. Orthologs between the two small ORF populations were detected when they were identified as reciprocal best hits (RBH) by blastP with E ≤ 1e−2

The NsiR6 transcript is highly induced upon nitrogen deprivation
NsiR6 was not previously known as a protein-coding gene. Its mRNA originates from a TSS at position 729645f in the chromosome of Synechocystis 6803 (Fig. 4a, data extracted from reference [21]). Previously, we introduced the UEF (unique expression factor) to identify genes whose expression was enhanced under a single one of ten tested environmental conditions [21]. This factor gives the ratio of the transcriptome read counts for the condition with the highest and the one with the second highest expression for a single TU. Thus, TUs with a high UEF respond strongly to a particular stimulus. For NsiR6, the UEF was 9.65, ranking fourth among the most strongly induced genes, both in Synechocystis 6803 and in strain 6714 [20,21], when the cells were deprived of sources of combined nitrogen (Fig. 3). This induction was confirmed by independently performed Northern blots, indicating a rapid induction of expression, reaching a peak at 6 h with an about 10-fold higher transcript accumulation, followed by a declining abundance which remained higher than at the beginning of the experiment (Fig. 4b and c). The nitrogen-stress-dependent induction is likely mediated via a conserved NtcA binding site 5′-GTAacatttgtGAC-3′, centered 42 nt upstream of the transcription initiation site in both strains (Fig. 4a). NtcA-binding sites frequently overlap the −35 promoter region and are centered close to position −41.5 with respect to the TSS when they mediate activation [23,49]. Homologs of NsiR6 are widely conserved throughout the cyanobacterial phylum and in the Paulinella chromatophora chromatophore genome, consistent with its occurrence in the genomes of α-cyanobacteria, but are not found in any other bacteria or plants. The alignment of these homologs shows two pairs of conserved cysteine residues which might be involved in redox control, protein-protein interactions or structure formation (Fig. 4d).
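The UEF defined above, the ratio between the highest and the second highest read count of a TU across conditions, can be written down directly. The following is a minimal sketch; the function name, the dictionary input and the handling of a zero second-highest count are assumptions, and the read counts in the usage note are invented for illustration only.

```python
def unique_expression_factor(read_counts):
    """Return (condition with highest expression, UEF) for a single TU.

    read_counts maps condition name -> normalized read count.
    UEF = highest read count / second highest read count.
    """
    if len(read_counts) < 2:
        raise ValueError("at least two conditions are required")
    ranked = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    (top_condition, top), (_, second) = ranked[0], ranked[1]
    if second == 0:
        # expressed under exactly one condition; the ratio is unbounded
        return top_condition, float("inf")
    return top_condition, top / second
```

For example, hypothetical counts of 965 under nitrogen depletion, 100 in darkness and 50 under high light would give a UEF of 9.65 for nitrogen depletion.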
Table legend: The start and end positions according to the chromosomal or plasmid sequences in Genbank files (accessions NC_000911, AP004311 and AP004310), the location (S) on the forward (+) or reverse strand (−) and respective length (L; in amino acids) are given, followed by the locus tag ID, gene name and product if assigned. Location on chromosome or one of the plasmids is prefixed by "Chr" or the name of the plasmid. The existence of homologs in Synechocystis 6714, A. thaliana and E. coli is indicated by "Y" for yes or "N" for no. Homologs tagged and detected in this study are highlighted in boldface letters. Names of genes tested in this work are in boldface
Figure caption (fragment, beginning missing): ... Table 3. Two gels were run in parallel. a Proteins (30 μg) were separated on a 15% (w/v) SDS polyacrylamide gel and subjected to colloidal Coomassie G-250 staining as a loading control. b Immunoblot with the same loading order probed with specific ANTI-FLAG® M2-Peroxidase (HRP) antibody
Fig. 4 The NsiR6 peptide. a Transcriptomic datasets indicated high read coverage in a region without annotation in Synechocystis 6803 [21], which contains the here defined nsiR6 gene. The homolog in Synechocystis 6714 is D082_18940 [20]. Shown is the read coverage (grey) resulting from previous transcriptome analysis, including the respective transcriptional units (TU) defined in that work [20,21]
Two pairs of cysteine residues occur also in another short protein, the 70 amino acid CP12 protein, which mediates the formation of a complex between glyceraldehyde-3-phosphate dehydrogenase and phosphoribulokinase in response to changes in light intensity, characterizing it as a thioredoxin-mediated metabolic switch [50]. In CP12, the cysteine pairs confer the redox input via post-translational thiol-disulfide bridge conversion. The arrangement 'CPVC' of the first cysteine pair (Fig. 4d) matches the C-(X)2-C motif, which frequently is involved in metal-binding [51]. Hence, the putative cysteine pairs in NsiR6 may confer redox control or metal binding.
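Occurrences of the C-(X)2-C motif discussed above, such as 'CPVC', can be located in a protein sequence with a short regular expression. This is only a sketch: the function name is hypothetical, X is taken to be any letter, and the sequence in the usage note is an invented string, not the NsiR6 sequence.

```python
import re

# Lookahead so that overlapping occurrences (e.g. in "CAACAAC") are all reported
CXXC = re.compile(r"(?=(C[A-Z]{2}C))")


def find_cxxc(protein_seq):
    """Return (0-based position, matched tetrapeptide) for every C-X-X-C occurrence."""
    return [(m.start(1), m.group(1)) for m in CXXC.finditer(protein_seq.upper())]
```

Applied to the invented string "MSCPVCAAKCC", the function reports a single hit, 'CPVC' at position 2; the two cysteines at the end are adjacent and therefore do not form a C-(X)2-C pair.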

Norf1 is highly induced upon dark incubation
Norf1 is specific to cyanobacteria but widely conserved throughout this phylum. It is present in 138 (68%) of 202 cyanobacterial genomes available in the JGI database [52] (blastP + tblastN, E value ≤1e−5). Homologs are lacking in early-branching cyanobacteria such as Gloeobacteria and thermophilic Synechococcus JA-2-3B'a(2-13) and JA-3-3Ab and also in marine picocyanobacteria. An alignment of representative homologs is shown in Fig. 5a. Strong accumulation of the norf1 mRNA was observed in response to darkness (Fig. 5b). The UEF for this condition was 2.66 in Synechocystis 6803, but the gene was expressed also under the other tested conditions (Fig. 3) [21]. To examine whether the dark-related expression of norf1 is under transcriptional control, we conducted reporter gene assays. The upstream sequence of Synechocystis 6803 norf1 was fused to luxAB reporter genes encoding luciferase, and expression was monitored as bioluminescence in vivo. Indeed, the promoter activity showed a positive response after transfer into darkness, as seen for the mRNA accumulation (Fig. 5b and c). We conclude that the observed induction of norf1 in response to shifts from light exposure to darkness is under transcriptional control.
The high expression of the norf1 gene in darkness sets it apart from the vast majority of genes. Among the previously tested 10 different growth conditions, in Synechocystis 6803 only 70 out of 4091 TUs and in Synechocystis 6714 only 57 out of 4292 TUs defined in total had their maximum expression after dark incubation [20,21].
Fig. 5 (caption, beginning missing) ... Synechocystis 6803 is strongly upregulated after transfer to darkness. c Bioluminescence of a Synechocystis 6803 reporter strain harboring a transcriptional fusion of Pnorf1 (−328 to +137, TSS at +1) and luxAB genes in response to transfer to darkness. Representative bioluminescence dataset indicating means ± SD of measurements for two biological replicates (= independent transformants). A strain carrying a promoterless luxAB was used as a negative control (measured in two independent cultures each)

The Norf4 μ-protein is highly conserved and its mRNA overlaps the gap1 gene
Norf4 is encoded within a TU much longer than is needed to encode the 31 amino acids: TU1188 in Synechocystis 6803 is 704 nt and TU3474 in Synechocystis 6714 is 534 nt (Fig. 6a). These TUs partially overlap the gap1 gene encoding glyceraldehyde 3-phosphate dehydrogenase 1 on the complementary DNA strand. As a result, these TUs overlap the gap1 mRNA by 702 and 373 nt, respectively. Transcriptomic evidence suggested that the gap1 and the norf4 mRNAs were co-regulated, with a mild upregulation upon the removal of nitrogen (Fig. 3). Thus, the norf4 transcript is unlikely to function as an antisense RNA with a co-degradation function, which was observed previously for other pairs of overlapping transcripts in Synechocystis 6803 [53,54]. However, co-regulation between an asRNA and its cognate mRNA was previously observed for the psbA asRNA protecting its 5′ leader from RNase E-mediated degradation [55]. The expression of norf4 was stimulated upon removal of nitrogen, but its expression was detectable under most of the previously tested conditions, although at a lower level (especially low in darkness and after heat stress; Fig. 3). Dual-function RNAs are transcripts which assume a regulatory function as sRNA and additionally act as short protein-coding mRNA. Exploring this possibility for norf4, we checked the accumulation of norf4 transcripts during the removal of combined nitrogen. Northern blot analysis showed the existence of a prominent transcript of ~200 nt which declined initially (Fig. 6b). Due to the localization of the RNA probe used in the detection of norf4 transcripts, this prominent transcript corresponds to the coding part of TU1188. However, with increasing duration of the nitrogen stress, we noticed the overaccumulation of a longer, more diffuse transcript of about 600-800 nt (Fig. 6b). Quantitative analysis of transcript accumulation showed that this longer norf4 transcript accumulated only transiently, with a peak at the 24 h time point (Fig. 6c).
With very few amino acid substitutions, Norf4 is extremely conserved, including a predicted transmembrane region (Fig. 6d). Homologs can be detected in 51 cyanobacterial genome sequences from all 5 morphological subsections, comprising free-living unicellular as well as multicellular strains, marine and freshwater isolates, thermophiles and symbionts. The presence of norf4 in the two available genome sequences of Candidatus Atelocyanobacterium thalassa suggests positive selection for this gene in these highly streamlined genomes [56,57]. However, homologs are lacking in α-cyanobacteria, which are mainly marine Synechococcus and Prochlorococcus. The homologs from the two Synechocystis strains used here are identical, except for a possible N-terminal extension by 13 amino acids in Synechocystis 6714 (Fig. 6d). However, such extensions appear questionable also in other strains, because the start codon corresponding to the Synechocystis 6803 ORF is 100% conserved. Moreover, the homologs in 12 Microcystis genomes are identical to each other, as are the homologs in five Crocosphaera watsonii and in two Fischerella genome sequences.
Fig. 6 The Norf4 peptide. a Datasets from the previously performed primary transcriptome analysis showed that Norf4 expression responded positively to nitrogen depletion in both Synechocystis 6803 [21] and Synechocystis 6714 [20], when 10 different growth conditions were tested. The mRNAs of norf4 and gap1 are co-regulated and overlap by several hundred nt. The previously mapped transcriptional start sites are labelled by black arrows. b Northern blot analysis of norf4 mRNA accumulation in a time course experiment up to 72 h after the removal of nitrogen. The same RNA samples were used as in Fig. 4. c The signals obtained from the Northern blots (panel b) were evaluated densitometrically after normalization by the level of 5S rRNA. The relative norf4 expression is shown with respect to the maximum expression after transfer to nitrogen-free conditions (24 h = 100%). The bands at 200 nt (filled circles) and at 800 nt (empty circles) were analyzed separately from each other. d Multiple sequence alignment of 35 homologs from 51 different cyanobacterial genome sequences (homologs are identical among 12 Microcystis, five Crocosphaera watsonii and two Fischerella genome sequences)
Our data suggest that Norf4 is a previously unknown membrane-bound μ-protein and that the norf4 transcript may play a dual role, with a mainly coding function during nitrogen-sufficient conditions and a possibly RNA-mediated regulatory function on the gap1 mRNA during nitrogen stress.
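The densitometric evaluation described for the Northern blots (band signal normalized to the 5S rRNA level, then expressed relative to the maximum, which is set to 100%) amounts to the following arithmetic. This is a minimal sketch with a hypothetical function name; the numbers in the usage note are invented and are not measurements from this study.

```python
def relative_expression(band_signals, loading_signals):
    """Normalize band signals to a loading control and scale the maximum to 100%.

    band_signals:    densitometric signal of the transcript per time point
    loading_signals: 5S rRNA signal for the same lanes (same order)
    """
    if len(band_signals) != len(loading_signals):
        raise ValueError("both series must cover the same lanes")
    normalized = [band / load for band, load in zip(band_signals, loading_signals)]
    peak = max(normalized)
    return [100.0 * value / peak for value in normalized]
```

With invented band signals of 10, 40 and 30 and 5S rRNA signals of 1.0, 2.0 and 1.0, the normalized values are 10, 20 and 30, so the third time point is set to 100% and the first two correspond to about 33% and 67%.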

HliR1 and Ssr1169
HliR1 was chosen because of its very high induction under high light (UEF of 5.47) and the gene location upstream of sodB encoding superoxide dismutase. Whereas the homologs from the two Synechocystis strains are conserved in length, sequence (2 substitutions over 35 amino acids) and the likely presence of a transmembrane region (Fig. 7a), no possible homologs were detected beyond the genus Synechocystis. The location upstream of sodB and the shape of the read coverage in transcriptome analysis (Fig. 7b) suggested a possible link between the two genes. Indeed, Northern analysis confirmed the inducibility by high light (Fig. 7c and d) and in addition showed the presence of two major transcripts, ~450 and 1400 nt in length. The longer form should encompass also the complete sodB gene. Thus, transcription from the upstream located hliR1 promoter is expected to lead by read-through to enhanced sodB gene expression under high light. Hence, it is tempting to speculate that HliR1 is a membrane-bound peptide with a regulatory function on the superoxide dismutase.
The previously annotated short gene ssr1169 was chosen because of its expression under several different conditions (Fig. 3) and its physicochemical characterization as a hydrophobic protein. Features of all 5 investigated μ-proteins are summarized in Table 3.
Homologs of Ssr1169 are frequently encoded by a small gene family and exist in plants (best homolog in A. thaliana: Low temperature and salt responsive protein, gi|15223610|ref|NP_176067.1|, E value 3e−11; Table 2; Fig. 8), in E. coli (gi|446430313|ref|WP_000508168.1|, E value 3e−8; Table 2) and in many other bacteria and other eukaryotic organisms, including yeast and C. elegans. Expression of the homologs RCI2A and RCI2B in A. thaliana became induced upon exposure to low temperature, dehydration, salt stress, or abscisic acid [58]. Ssr1169 homologs possess two transmembrane helices (Fig. 8) that form a Pmp3 domain and might act as stress-induced proteolipid membrane modulators.
Fig. 7 The HliR1 peptide in Synechocystis 6803. a Pairwise sequence alignment of the HliR1 peptides from Synechocystis 6803 and Synechocystis 6714. A predicted transmembrane region is boxed. b Data replotted from the primary transcriptome analysis of Synechocystis 6803 suggest that HliR1 expression is induced by high light and that transcripts may extend into the subsequent TU1649 covering the sodB gene [21]. c Northern analysis of hliR1 mRNA accumulation upon transfer to high light (HL) or normal light (NL). d Quantification of the hliR1 mRNA accumulation shown in panel c after normalization to the 5S rRNA level. Relative values refer to the maximum level at 0.5 h after HL shift (=100%)

All five μ-proteins can be expressed from their native promoters in a regulated fashion
In the previous sections we verified the transcription of the five selected μ-protein encoding genes (Figs. 3, 4, 5, 6 and 7) as well as their translation from an mRNA harboring the regulatory sequence elements (e.g. ribosome binding site) of the petE gene (Fig. 2). However, although this approach verified a stable accumulation of the translated proteins, it leaves open the possibility that any RNA would be translated in this context as long as it contains an open reading frame. To exclude this possibility, we repeated the experiment from Fig. 2 but placed all five FLAG-tagged μ-ORFs under control of their own native promoters and 5′ UTRs. After introduction of these constructs into Synechocystis 6803 we subjected the resulting cultures to an inducing condition according to the transcriptome analysis. Samples from cultures grown at standard conditions or the inducing conditions were taken and analyzed by Western blot experiments (Fig. 9). The results showed unambiguously the expression of all five μ-proteins when placed under control of their own promoters and 5′ UTRs, i.e., their expression was not artificially induced by the ectopic fusion of their ORFs to the petE promoter and 5′ UTR.
We noticed a strong upregulation of NsiR6 accumulation 24 h after transfer to nitrogen starvation and of HliR1 accumulation 6 h after exposure to high light as well as a mild upregulation of Norf4 accumulation 24 h after transfer to nitrogen starvation (Fig. 9). The accumulation of Norf1 increased somewhat 6 h after the shift to darkness. These data show that the observed regulation of gene expression at RNA level has a strong effect on the amounts of three of the respective proteins and a milder effect on one of the other two.

Discussion
For Synechocystis 6803 alone, more than 50 independent proteomic studies identified a total of 2967 proteins at least once (reviewed by Gao et al., [59]), representing 80.8% of the entire predicted proteome. However, the percentage of identified proteins was only 34.4% for small proteins (<100 aa) of high hydrophobicity [59]. In addition, as we show in this study, very short protein-coding genes might not even be modelled and annotated at all. Thus, due to the challenges in their identification and biochemical detection, μ-proteins were in the past either not detected or were ignored. However, systematic genome-wide approaches have recently reported an increasing number of μ-proteins in pro- and eukaryotes [8,10,11,19,60]. Besides the short ORFs within 5′ leader and 3′ trailer sequences of mRNAs, known for a long time [61][62][63][64][65], μ-peptides were recently also described to originate from long ncRNAs, i.e. transcripts which were previously assumed to be non-coding [60,66].
In E. coli approximately 60 genes encoding μ-proteins have previously been reported [67]. Expression profiling showed that many μ-proteins accumulate under specific growth conditions or are induced by stress [68]. A particular group of small proteins are toxic due to their integration into the cell membrane as peptide component of a type I toxin-antitoxin system [69][70][71]. In the cyanobacterium Synechococcus elongatus, four small secreted proteins have been suggested to be involved in biofilm development [72]. Small proteins of the type II toxin-antitoxin category in Synechocystis 6803 have been catalogued separately [73] but the majority of them are somewhat larger than the μ-proteins considered here.
Here, we found 293 candidate genes for small proteins ≤80 amino acids in the model cyanobacterium Synechocystis 6803 and demonstrated the synthesis of five examples by C-terminal FLAG-tagging and immunodetection. Three of these five small proteins are predicted to contain one or two transmembrane helices (Table 3), placing them in the category of proteins that are particularly challenging to verify by proteomic approaches [59]. Hence, our list of predicted proteins provides a solid basis for functional studies.
Regulated expression suggests involvement in stress adaptation for some of the small proteins investigated here. This applies especially to HliR1, NsiR6 and Norf1, whose expression is activated in response to high light, nitrogen stress or transfer into darkness, respectively (Figs. 3, 4, 5 and 9).
The fact that some of the small ORFs described here are part of TUs much longer than needed points to the possibility that some of these transcripts could constitute dual-function RNAs. Such dual-function RNAs, which in addition to their role as a regulatory RNA molecule also encode a functional peptide, have been identified in bacteria. A prominent example of a dual-function RNA is the E. coli SgrS transcript: its 5′ region encodes the 43 amino acid peptide SgrT, which regulates the glucose transporter PtsG at the protein level, whilst the SgrS 3′ region contains a regulatory domain that targets the ptsG mRNA by base-pairing [74].
Fig. 8 Sequence alignment of Ssr1169 homologs from cyanobacteria with those from Arabidopsis thaliana, Desulfococcus oleovorans Hxd3, and identical proteins in 4 strains of Rhodospirillum rubrum. Putative transmembrane domains were predicted using TMHMM v. 2.0 [41] and are boxed
In Bacillus subtilis, SR1 is a highly conserved dual-function sRNA that acts as a base-pairing regulatory RNA on the ahrC mRNA (encoding AhrC, the transcriptional activator of arginine catabolic operons) and in addition encodes the 39 amino acid peptide SR1P. Interestingly, this peptide binds GapA (glyceraldehyde-3-phosphate dehydrogenase), thereby stabilizing the gapA operon mRNA [75,76]. In analogy, it is interesting to note that the mRNA of the cyanobacterial Norf4 μ-protein described here overlaps the gap1 mRNA and appears to be co-regulated with it.
The high total numbers of predicted μ-ORFs, together with the distribution, conservation, regulation of gene expression and the physicochemical properties of the five examples studied here in more detail, underline the likely great bandwidth of small protein functions in bacteria and make them attractive candidates for functional studies.

Conclusions
Synechocystis 6803 is a widely used model cyanobacterium that, with 44 genes encoding small proteins ≤50 amino acids and potentially 293 proteins ≤80 amino acids, possesses a high number of such μ-ORFs. These numbers are unlikely to be an overestimation: due to the previous extensive work to elucidate all subunits of the photosynthetic apparatus, 52% of the small proteins ≤50 amino acids have a known function. This sets the small proteome of cyanobacteria apart from that of other bacteria: in addition to the 19 photosynthesis-related small proteins, only five others in the size category ≤50 amino acids are functionally annotated (NdhP, NdhQ, RpL34, Rpl36 and a VapC toxin homolog). Hence, about half of the predicted small proteins are uncharacterized. When analysing small proteins up to 80 aa, we found 235 of the 293 predicted small proteins (80%) without annotation. The experimental results and expression data for the five proteins selected here (three ≤50 aa and another two larger, but ≤70 aa) underline that it is worthwhile to study small protein functions directly in cyanobacteria. The data and strains provided here will be useful for such studies in a systematic way.