Specific interaction of zinc finger protein Com with RNA and the crystal structure of a self-complementary RNA duplex recognized by Com

The bacteriophage Mu Com is a small zinc finger protein that binds to its cognate mom mRNA and activates its translation. The Mom protein, in turn, elicits a chemical modification (momification) of the bacteriophage genome. This renders the DNA resistant to cleavage by bacterial restriction endonucleases and protects it from host defense mechanisms. We examined the basis of specificity in Com–RNA interactions by in vitro selection and RNA structure probing. We demonstrated that Com recognizes a sequence motif within the hairpin loop of its target RNA. Our data support a model of Com interaction with mom mRNA in which Com binds to a short hairpin proximal to the so-called translation inhibition structure. We also observed that Com binds its target motif weakly when the motif is within an RNA duplex. These results suggest that RNA structure, in addition to sequence, is crucial for target recognition by Com. They also suggest that RNA conformational changes may constitute another level of Mom regulation. We determined the crystal structure of a Com binding site variant designed to preferentially form an RNA duplex. In the crystal, this RNA forms a 19-mer self-complementary double helix composed of canonical and non-canonical base pairs. The helical parameters of the crystallized RNA suggest why Com may bind it more weakly than the monomeric hairpin form.


Introduction
RNA-binding proteins play important roles at every stage of the RNA life cycle: transcription, splicing, editing, export, degradation and regulation of translation [1]. Many proteins bind RNA molecules using RNA-binding domains (RBDs), which exhibit a wide variety of structural forms and of substrate recognition and binding mechanisms. One of the most abundant RBD types is the zinc finger (ZnF) [2][3][4]. ZnFs belong to a large class of protein domains that stabilize their structures with tightly bound zinc ions [2]. Among them are the classical ZnFs, which were first discovered as DNA-binding domains [5]. It was later demonstrated that ZnFs also act as RNA or protein binders. ZnF-containing proteins can use a single ZnF domain to recognize and bind their substrate, e.g., GAGA-DBD [6] and SUP [7], or they can employ a whole array of ZnF domains, as in TFIIIA [5] and WT1 [8]. ZnF domains recognize and bind RNA substrates in different ways. Binding usually involves stacking interactions and hydrogen bonds made by backbone or side-chain functional groups of the ZnF [2]. RNA binding by ZnFs can be either sequence-specific or non-sequence-specific, and proteins containing ZnF arrays can combine both modes. Sometimes the structure of the RNA substrate is important for binding. For instance, ZRANB2 ZnFs are single-stranded RNA-binding domains [9], and JAZ preferentially binds double-stranded RNA or RNA/DNA hybrids [10,11]. In contrast, some ZnF modules of transcription factor TFIIIA recognize RNA bases in 'flipped-out' conformations [2,12]. The bacteriophage Mu Com-RNA complex can be considered a model system for studying the interaction of a ZnF domain with RNA. The Com protein consists of 62 amino acids. These include an N-terminal CCCC zinc finger module, in which four cysteine residues coordinate the zinc ion, and a C-terminal intrinsically disordered segment [13,14].
Com regulates the expression of the Mom system (Fig 1). This system chemically modifies the phage DNA and protects the phage genome against a wide variety of bacterial restriction endonucleases [15]. It has been proposed that Com targets an RNA hairpin-loop structure upstream of the translation start site of its cognate Mom mRNA. Com would thereby contribute to changes in the secondary structure of the so-called translation inhibition structure (TIS), and consequently to the exposure of the translation start signals [15][16][17].
In this work, we aimed to establish the preferred RNA sequence recognized by Com and to determine the structural basis of Com-RNA interactions. We attempted to co-crystallize Com with its natural RNA target. We also attempted co-crystallization with variants designed computationally to form either a monomeric hairpin-loop structure or a homo-duplex. So far, we have obtained crystals and solved the structure of an RNA homo-duplex. The high-resolution structure of the Com-RNA complex remains to be determined. Meanwhile, we propose a low-resolution structural model of Com-RNA interactions based on the available experimental data.

Cloning, expression, and purification of Com
The Haemophilus sputorum Mu-like prophage Com gene sequence (GI:400376712) was optimized for Escherichia coli expression and synthesized by GeneArt Gene Synthesis (Thermo Fisher Scientific). The gene was subcloned into the prokaryotic expression vector pGEX4T (GE Healthcare Life Sciences). E. coli BL21(DE3) (New England Biolabs) was used to overexpress glutathione S-transferase (GST)-tagged Com, with a two-residue linker (S-H) between the GST tag and Com. Expression was carried out in LB medium at 37 °C with shaking at 200 rpm for 4 hours, after induction with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at an OD600 of 0.6. The LB medium was supplemented with 100 μM ZnSO4 (Sigma-Aldrich) upon induction. The fusion protein was purified by GST affinity chromatography on Glutathione-Agarose beads (Sigma-Aldrich) according to the manufacturer's protocol and stored at 4 °C.

SELEX
Systematic Evolution of Ligands by Exponential enrichment (SELEX) was used to determine the specificity of Com binding to RNA. SELEX was carried out as described previously by Skrisovska et al. [18] and Cavaloc et al. [19]. The starting DNA oligonucleotides were 5'-GCGTCTCTGCAGTAGTTA(N20)AGTCGGCATCTTGGTACCCTATAGTGAGTCGTATTACC-3' (where N20 denotes a random 20-nucleotide sequence) and 5'-GGTAATACGACTCACTATAGGGTACCAAGATGCCGACT-3' (Metabion). After the fifth selection cycle, the RT-PCR products were subjected to next-generation sequencing on the MiSeq (Illumina) platform (Oligo.pl, Warsaw, Poland).
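The two oligonucleotides can be checked for consistency with a short script. This is a minimal sketch. We assume, from our own reading of the sequences, that the second oligonucleotide anneals to the 3' constant region of the template. The text does not state the primer roles. Under that assumption, its reverse complement should equal the 38-nt 3' flank, and it appears to carry the T7 promoter consensus TAATACGACTCACTATAG. The function names are illustrative only.

```python
import random

# 5' and 3' constant regions of the SELEX DNA template, as given in the text.
FLANK_5 = "GCGTCTCTGCAGTAGTTA"
FLANK_3 = "AGTCGGCATCTTGGTACCCTATAGTGAGTCGTATTACC"
# Second oligonucleotide listed in the text (assumed to act as a primer).
PRIMER = "GGTAATACGACTCACTATAGGGTACCAAGATGCCGACT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def random_template(n: int = 20, seed: int = 0) -> str:
    """Build one example library member: 5' flank + random N20 core + 3' flank."""
    rng = random.Random(seed)
    core = "".join(rng.choice("ACGT") for _ in range(n))
    return FLANK_5 + core + FLANK_3


template = random_template()
print(len(template))                          # 18 + 20 + 38 nucleotides
print(reverse_complement(PRIMER) == FLANK_3)  # primer is complementary to the 3' flank
print("TAATACGACTCACTATAG" in PRIMER)         # T7 promoter consensus present
```

Each library member is therefore 76 nt. Only the central 20 positions differ between members.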
The RNA sequence consensus motif was generated with the motif discovery tool MEME (Multiple Em for Motif Elicitation) [20], using default parameters and a maximum motif width of seven nucleotides. The gapped RNA motif comprising two repeats of the binding site was generated with the GLAM2 (Gapped Local Alignment of Motifs 2) method [21]. It was defined as NGAGNNCC(N)2-3GAGNNCCNN, where N refers to any nucleotide.
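MEME and GLAM2 report position-specific motif models. Their consensus strings can, however, be approximated by plain regular expressions for a quick scan of sequences. The sketch below is a simplification that ignores position weights. The "reads" are stand-ins taken from the RNA design section, not SELEX output. The last line only restates the reported SELEX counts as fractions.

```python
import re

# Simplified, unweighted regex versions of the consensus motifs (N = A/C/G/U).
SINGLE = re.compile(r"GAG[ACGU]{2}CC")  # 5'-GAG(N)2CC-3'
BIPARTITE = re.compile(                 # NGAGNNCC(N)2-3GAGNNCCNN
    r"[ACGU]GAG[ACGU]{2}CC[ACGU]{2,3}GAG[ACGU]{2}CC[ACGU]{2}"
)


def motif_fraction(reads, pattern):
    """Count reads containing the pattern at least once; return (hits, fraction)."""
    hits = sum(1 for r in reads if pattern.search(r))
    return hits, hits / len(reads)


# Stand-in reads: designed RNA I and RNA II and the native 19-mer Mom RNA.
reads = [
    "CGAGAACCAGAGAGUUCCGG",  # RNA I
    "AGAGAACCCGGAGUUCCCU",   # RNA II
    "GAAUGCCUGCGAGCAUCCC",   # 19-mer Mom RNA
]
print(motif_fraction(reads, SINGLE))     # RNA I and RNA II match
print(motif_fraction(reads, BIPARTITE))  # RNA I and RNA II match
# Reported SELEX counts (637 and 868 of 999 sequences) as fractions:
print(round(637 / 999, 3), round(868 / 999, 3))
```

A strict consensus regex misses sequences that a weighted motif model would still score highly. Real read counts from such a scan would therefore be lower bounds.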

RNA design
The sequence of the native Com target folds preferentially into a hairpin but, depending on conditions, can dimerize to form a largely helical duplex [22]. Based on the RNA sequence motif obtained from SELEX, we computationally designed RNA molecules that fold preferentially into either the monomeric or the dimeric form. The design used DesiRNA (G.L. and J.M.B., unpublished; software available for download at http://iimcb.genesilico.pl/desirna/ and https://github.com/GrzegorzLach/DesiRNA). The optimized parameter was the difference between the free energy of the dimer and that of the separated strands folded into monomeric hairpin-loop structures. This difference is directly related to the equilibrium constant of dimer formation. It was either minimized or maximized to produce sequences that contain the conserved motif and strongly favor either a dimer or a monomer. Free energies were computed with the McCaskill algorithm [23] and Turner parameters, as implemented in the RNAlib library of the ViennaRNA package (version 2.1.2) [24]. The designed monomeric RNAs were RNA I, 5'-CGAGAACCAGAGAGUUCCGG-3'; RNA IA, 5'-CUGCAACCAGAGAGUUGCGG-3'; and RNA IB, 5'-CGCGUACAAGUGAGUACCGG-3'. The designed dimeric RNA II was 5'-AGAGAACCCGGAGUUCCCU-3'. The native 19-mer Mom RNA sequence was 5'-GAAUGCCUGCGAGCAUCCC-3'. All oligonucleotides were chemically synthesized by FutureSynthesis (Poznan, Poland).
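The link between the optimized free-energy difference and dimer formation can be made explicit for the two-state equilibrium 2M ⇌ D. Take ΔG = G(dimer) − 2·G(monomer) and a 1 M standard state. Then K = [D]/[M]² = exp(−ΔG/RT), and the mass balance C_total = [M] + 2[D] gives a quadratic in [M]. This sketch assumes ideal two-state behavior and 37 °C, ViennaRNA's default folding temperature. The ΔG values are hypothetical and are not DesiRNA outputs.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant, kcal/(mol*K)


def dimer_fraction(delta_g: float, c_total: float, temp_c: float = 37.0) -> float:
    """Fraction of strands in the dimer for the equilibrium 2M <=> D.

    delta_g: G(dimer) - 2*G(monomer) in kcal/mol (hypothetical inputs here).
    c_total: total strand concentration in mol/L.
    Assumes an ideal two-state equilibrium with a 1 M standard state.
    """
    k = math.exp(-delta_g / (R_KCAL * (temp_c + 273.15)))  # K = [D]/[M]^2, in 1/M
    # Mass balance c_total = [M] + 2K[M]^2; the positive root is written in a
    # numerically stable form: [M] = 2c / (1 + sqrt(1 + 8Kc)).
    m = 2.0 * c_total / (1.0 + math.sqrt(1.0 + 8.0 * k * c_total))
    return 1.0 - m / c_total  # equals 2[D]/c_total


# More negative delta_g favours the dimer at the 1 mM concentration used in the text.
for dg in (2.0, -5.0, -10.0):
    print(dg, round(dimer_fraction(dg, 1e-3), 3))
```

Minimizing ΔG pushes a design toward the dimer and maximizing it toward the monomer. The dimer fraction also rises with concentration, consistent with hairpins forming duplexes at high RNA concentration.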

EMSA
RNA-protein interactions were analyzed by the electrophoretic mobility shift assay (EMSA), using affinity-purified GST-fused Com and [32P]-labeled synthetic RNAs. RNAs were labeled with [γ-33P]ATP (Hartmann Analytic) using T4 polynucleotide kinase (PNK; Thermo Fisher Scientific). They were then extracted with phenol/chloroform, precipitated, desalted, and separated from unincorporated label on MicroSpin G-25 columns (GE Healthcare Life Sciences). RNA samples were annealed in buffer containing 50

Probing of RNA structure by SHAPE
Chemical probing of RNA molecules by SHAPE was carried out in triplicate, as described previously by Wilkinson et al. [25]. The genetic constructs contained DNA sequences encoding RNA I, RNA IA, RNA II and the 19-mer Mom RNA (described in the RNA design section). Each sequence was embedded within a SHAPE cassette whose 5' and 3' flanking sequences contained unique primer-binding sites. The SHAPE cassette did not interfere with folding of the embedded RNA. The constructs were synthesized by GeneArt Gene Synthesis (Thermo Fisher Scientific). The RNAs were transcribed in vitro using the FlashScribe kit (Invitrogen) according to the manufacturer's protocol.
SHAPE probing reactions were carried out for each RNA alone and for RNAs crosslinked with the Com protein. For footprinting of Com binding sites, an excess of Com was added to the RNA after the annealing step (250 pmol of Com protein per 2 pmol of RNA). The mixed samples were incubated for 30 min at room temperature. Com was then crosslinked to the RNA with 254 nm light in an Ultraviolet Crosslinker (UVP, LLC) for 30 min. During crosslinking the samples were kept on ice, 15 cm from the light source. In parallel, the annealed RNA for the RNA-only probing reactions (without Com) was kept at room temperature for one hour. All samples were probed with 9 mM NMIA (N-methylisatoic anhydride) (Thermo Fisher Scientific) for 30 min, and DMSO was used in control reactions. RNA was reverse-transcribed with SuperScript III (Invitrogen) in the presence of fluorescently labeled primers (VIC and NED, Thermo Fisher Scientific). The resulting cDNA was run on a capillary sequencer (Oligo.pl, Warsaw, Poland), and the SHAPE data were analyzed with the QuShape software [26].

RNA structure prediction and visualization
RNA secondary structure was predicted with RNAstructure [27], using SHAPE reactivities as pseudo-free-energy constraints. Secondary structures were visualized with VARNA [28].
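In SHAPE-directed folding, each reactivity is turned into a pseudo-free-energy term of the form ΔG_SHAPE(i) = m·ln[SHAPE(i) + 1] + b. This term is applied to nucleotides in base-pair stacks. The slope m and intercept b used in this study are not given in the text. The values below (m = 2.6 and b = −0.8 kcal/mol) are one commonly cited pair and should be treated as an assumption. The handling of missing and negative values is also a hypothetical, illustrative choice, not necessarily RNAstructure's exact behavior.

```python
import math
from typing import List, Optional, Sequence


def shape_pseudo_energy(reactivities: Sequence[Optional[float]],
                        slope: float = 2.6,
                        intercept: float = -0.8) -> List[float]:
    """Per-nucleotide SHAPE pseudo-free-energy terms in kcal/mol.

    dG(i) = slope * ln(reactivity(i) + 1) + intercept
    Assumptions (illustrative): None means no data and contributes 0.0;
    negative reactivities are clamped to 0 before taking the logarithm.
    """
    terms = []
    for r in reactivities:
        if r is None:
            terms.append(0.0)
        else:
            terms.append(slope * math.log(max(r, 0.0) + 1.0) + intercept)
    return terms


# Unreactive nucleotides get a bonus for pairing (negative term);
# highly reactive ones get a penalty (positive term).
print([round(x, 2) for x in shape_pseudo_energy([0.0, 0.4, 0.85, 2.0, None])])
```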

Crystallization and X-ray data collection
Chemically synthesized oligoribonucleotides (RNA I, RNA II and the 19-mer Mom RNA; 60 nmol, 80 μg) were dissolved in 5 μl of buffer containing 50 mM HEPES pH 7, 100 mM NaCl and 2.5 mM MgCl2. They were annealed by heating at 80 °C for 15 min and cooling at 1 °C per min to room temperature. The RNA was added to 100 μl of affinity-purified GST-Com fusion protein concentrated to 8 mg/ml in crystallization buffer (50 mM HEPES pH 7, 100 mM NaCl, 1 mM DTT). A Phoenix nano-dispensing robot was used to set 0.2 μl crystallization drops with Index Screen and Crystal Screen (Hampton Research) in 96-well crystallization plates (Hampton Research). The first rock-like crystals appeared after 2–4 days in Crystal Screen condition D11. No further optimization beyond the initial screening was carried out. Crystals were cryoprotected for 10 s in reservoir solution supplemented with 20% PEG400, flash-frozen, and stored in liquid nitrogen. A crystal from the Crystal Screen D11 condition (0.1 M sodium acetate trihydrate pH 4.6, 2 M ammonium sulfate) was used for X-ray data collection at beamline 14.2 of the BESSY synchrotron (Berlin, Germany) (Gerlach, Mueller & Weiss, 2016). The data were indexed and scaled with XDS [29] to 2.27 Å resolution.

Structure determination and refinement
Estimation of the number of molecules in the asymmetric unit [30] indicated the presence of a 19-mer RNA duplex, with a solvent content of 56.1%. The search model for molecular replacement was the crystal structure of an 8-mer RNA duplex (PDB code 3GLP) [31]. Phases were determined with Phaser [32]. The rotation/translation search gave a Z score of 8.1 and a final log-likelihood gain (LLG) of 350. The solution consisted of two sequentially stacked 8-mer search models separated by a single base-pair gap. The initial model was improved by several rounds of model building in Coot [33] and refinement. Refinement was initially carried out with Refmac5 [34] from the CCP4 program suite [35] and then continued with PHENIX [36]. Solvent molecules were modeled manually in Coot after the RNA duplex had been fully built and refined. Intramolecular interactions, including canonical and non-canonical base pairs and stacking interactions, were analyzed with ClaRNA [37]. Atomic coordinates of the crystallographic model have been deposited in the Protein Data Bank (accession code 6IA2).
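Asymmetric-unit estimates of this kind rest on the Matthews relation. For Z copies of a molecule of mass M (Da) in a unit cell of volume V, V_M = V/(Z·M). The solvent fraction is then 1 − 1.6605·v̄/V_M, where v̄ is the partial specific volume in cm³/g. With the usual protein value v̄ = 0.74, this reduces to the familiar 1 − 1.23/V_M. The cell volume, mass, Z and the RNA v̄ below are hypothetical placeholders (Table 1 is not reproduced in this section), so the output is not the reported 56.1%. The denser packing of RNA (lower v̄) leaves more of a given cell to solvent.

```python
AVOGADRO_FACTOR = 1.6605  # 1e24 / 6.022e23: converts Da * (cm^3/g) into cubic angstroms


def matthews_coefficient(cell_volume_a3: float, z: int, mass_da: float) -> float:
    """V_M in A^3/Da for z molecules of mass mass_da in the unit cell."""
    return cell_volume_a3 / (z * mass_da)


def solvent_fraction(v_m: float, partial_specific_volume: float) -> float:
    """Fraction of the unit cell occupied by solvent."""
    return 1.0 - AVOGADRO_FACTOR * partial_specific_volume / v_m


# Hypothetical example: a ~12 kDa RNA duplex, 8 copies per cell.
v_m = matthews_coefficient(cell_volume_a3=300_000.0, z=8, mass_da=12_000.0)
print(round(v_m, 3))
print(round(solvent_fraction(v_m, 0.74), 3))  # protein-like partial specific volume
print(round(solvent_fraction(v_m, 0.55), 3))  # assumed RNA-like value
```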

Identification of Com RNA targets by SELEX
To characterize the RNA substrate preference of the Com protein, we carried out an in vitro selection experiment using SELEX (see Methods for details). As the protein bait, we used the zinc finger domain of H. sputorum Com. After five cycles of SELEX, we found two very similar RNA motifs, both with the core sequence 5'-GAG(N)2CC-3', where N refers to any nucleotide (Fig 2A). The first motif was present in all 999 sequences analyzed and the second in 637 of them. GLAM2, which can perform gapped motif discovery, showed that most selected RNA sequences had a bipartite motif with two instances of the 5'-GAG(N)2CC-3' sequence (Fig 2A). This bipartite motif was present in 868 of the 999 sequences analyzed.
Secondary structure predictions indicated that the two core sequences were symmetrically localized in loops at both sides of a 3-4 nucleotide long stem. The stem consisted of variable sequence, but the pair closing the loop was invariably C-G.

Com binding to selected RNA is sequence-and structure-specific
RNA sequences found during in vitro selection were predicted to form a hairpin. However, at high concentration RNA hairpins can also form a self-complementary duplex [22,38]. We therefore examined which of the two possible RNA structures (a monomeric stem-loop or a dimeric stem) is preferred by Com. To this end, we designed two RNA molecules that fold preferentially into one or the other form and tested them for Com binding. As a control, we used the 19-mer Mom RNA fragment, which preferentially forms a hairpin [22] (Fig 2B and 2C). One designed molecule, RNA I, was expected to fold into a hairpin at 1 mM concentration (Fig 2C). The other, RNA II, was expected to form a self-complementary duplex at the same concentration (Fig 2C). All RNAs contained the consensus motif recognized by Com. Non-denaturing PAGE confirmed the monomeric forms of RNA I and the 19-mer Mom RNA and the dimeric form of RNA II, with the RNA duplex band migrating more slowly (Fig 3A). The EMSA gels also showed that some RNA II molecules adopted a monomeric hairpin form. In contrast, no dimeric form of RNA I or of the 19-mer Mom RNA was detected at the same time and under the same conditions.
Fig 2. (A) Two single motifs were generated by MEME [20]; the bipartite motif was created by GLAM2 [21] and WebLogo [39]. (B) Mom regulatory region containing the TIS structure (green) and the Com binding region, the 19-mer Mom RNA fragment (blue). (C) RNA molecules used in the presented studies. Secondary structure was predicted with the RNAstructure Web Server.
We observed binding of Com to both designed RNAs, as well as to the 19-mer Mom RNA control, indicated by shifted bands on the gel after EMSA (Fig 3A). We observed that Com preferentially bound RNA hairpins (i.e., the monomeric version of the RNA substrate). The self-complementary duplex was targeted only after all RNA hairpin molecules were bound, as demonstrated by a super-shift in mobility of the RNA duplex ( Fig 3A).
To establish the Com binding site, we determined which single motif within the bipartite motif of the RNA I hairpin (Fig 2C) is actually bound by Com. To this end, we modified the sequence of RNA I, while preserving its hairpin structure, to eliminate one or the other part of the bipartite motif. RNA IA contained only the 5'-CC(N)2-3GAG-3' motif present in the loop. RNA IB contained only the 5'-GAG(N)2CC-3' motif present in the 3' half of the stem-loop (Fig 2C). We observed binding of Com to RNA IA and very little binding to RNA IB (Fig 3B). However, Com bound RNA IA (with just one motif) less efficiently than RNA I (with two motifs).
Fig 3. RNA secondary structures predicted with RNAstructure [27], using reactivity from the SHAPE experiment as pseudo-free-energy constraints, for each RNA molecule alone (first molecule of each pair) and for the same RNA in the presence of Com (second molecule of each pair). Residue symbols are color-coded according to SHAPE reactivity: red, high reactivity (≥ 0.85); orange, moderate reactivity (≥ 0.4 and < 0.85); black, weak reactivity (< 0.4). GA residues with decreased reactivity upon Com binding are shaded gray. https://doi.org/10.1371/journal.pone.0214481.g003
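Which half of the bipartite motif each designed hairpin retains can be checked directly from the sequences in the RNA design section. In this sketch the loop half is written as CC(N)2-3GAG and the stem half as GAG(N)2CC. Both are simple regular expressions, an approximation of the motif models rather than the SELEX-derived weight matrices.

```python
import re

LOOP_HALF = re.compile(r"CC[ACGU]{2,3}GAG")  # 5'-CC(N)2-3GAG-3'
STEM_HALF = re.compile(r"GAG[ACGU]{2}CC")    # 5'-GAG(N)2CC-3'

DESIGNED = {
    "RNA I": "CGAGAACCAGAGAGUUCCGG",
    "RNA IA": "CUGCAACCAGAGAGUUGCGG",
    "RNA IB": "CGCGUACAAGUGAGUACCGG",
}

for name, seq in DESIGNED.items():
    loop = LOOP_HALF.search(seq)
    stem = STEM_HALF.search(seq)
    # Report 1-based start positions, or None when a half-motif is absent.
    print(name,
          loop.start() + 1 if loop else None,
          stem.start() + 1 if stem else None)
```

Run as written, this prints both halves for RNA I. RNA IA keeps only the loop half, and RNA IB keeps only the stem half, which starts at position 12 in the 3' half of the molecule.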
Next, we probed the binding of Com to its RNA targets: the 19-mer Mom RNA, RNA I, RNA IA and RNA II. RNA IB was excluded because of its lack of binding to Com (Fig 3B). To examine the secondary structure of the Com binding sites, we carried out structure probing by SHAPE. We first probed each RNA alone and then used SHAPE to footprint each RNA in complex with Com. The secondary structure models derived from our SHAPE data (Fig 3C) agreed with our in silico predictions. They also agreed with the earlier models proposed by Hattman et al. [13] and Wulczyn & Kahmann [16]. The 19-mer Mom RNA, RNA I and RNA IA were predicted to form short hairpins with six-nucleotide loops. RNA II was predicted to form a duplex with unpaired ends and wobble adenines. In the earlier models of the Mom regulatory region, the predicted Com binding site was located in the loop of a short hairpin proximal to the TIS. The TIS also adopts a hairpin form (Fig 2B), and its stem includes the Mom GUG start codon. The footprinting results indicated that the GA dinucleotide of the consensus motif is involved in Com binding. SHAPE reactivity of this dinucleotide decreased substantially in the Com-RNA crosslinked samples compared with the free RNA samples (Fig 3C). We also observed decreased reactivity of the residue preceding GA in all Com-RNA crosslinked samples. The same was true for the cytosine at the first position of the loop in all RNAs in hairpin form.
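The footprinting comparison amounts to binning each nucleotide's reactivity with and without Com and flagging nucleotides whose bin drops. This sketch uses the thresholds from the Fig 3 legend: ≥ 0.85 high, 0.4 to 0.85 moderate, < 0.4 weak. The reactivity values are hypothetical. The "drop" criterion (moving to a lower bin) is also an assumption and is not the statistical test used in the study.

```python
from typing import List, Tuple

RANK = {"weak": 0, "moderate": 1, "high": 2}


def reactivity_bin(r: float) -> str:
    """Bin a normalized SHAPE reactivity using the Fig 3 legend thresholds."""
    if r >= 0.85:
        return "high"
    if r >= 0.4:
        return "moderate"
    return "weak"


def footprint(seq: str, free: List[float], bound: List[float]) -> List[Tuple[int, str, str, str]]:
    """Return (1-based position, base, free bin, bound bin) where the bin drops with Com."""
    hits = []
    for i, (base, rf, rb) in enumerate(zip(seq, free, bound), start=1):
        bf, bb = reactivity_bin(rf), reactivity_bin(rb)
        if RANK[bb] < RANK[bf]:
            hits.append((i, base, bf, bb))
    return hits


# Hypothetical reactivities for a hypothetical six-nucleotide loop (illustration only).
loop = "CAGAGA"
free = [0.9, 0.5, 1.2, 1.1, 0.3, 0.6]
bound = [0.5, 0.5, 0.3, 0.35, 0.3, 0.6]
print(footprint(loop, free, bound))
```

With these made-up numbers, the flagged positions are the first C and the G and A at positions 3 and 4. This mirrors the kind of pattern described for the GA dinucleotide.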
In earlier chemical and enzymatic footprinting studies by Hattman et al. [13] and Wulczyn & Kahmann [16], the A residue of the GA dinucleotide was consistently identified as important for Com binding. However, the G residue remained sensitive to RNase T1 cleavage in the presence of Com. Because this RNase recognizes only unpaired G residues, the involvement of G in Com binding remained inconclusive [16]. SHAPE and RNase T1 probe different structural features. SHAPE reports nucleotide flexibility, whereas T1 cleaves the phosphodiester bond 3' of single-stranded guanosines. Thus, the G residue, although single-stranded, could become more rigid in the presence of Com.
In earlier studies, the CC dinucleotides both preceding and following the SHAPE-reactive GA were sensitive to cleavage by the double-strand-specific nuclease CV1, but only in the absence of Com. They were unreactive or weakly reactive to chemical probes regardless of the presence of Com [15,16]. This suggested that they are double-stranded and lie close to Com upon binding. In our study, only the first C of the CC dinucleotide remained weakly reactive in the hairpin RNAs upon Com binding. This suggests that it is rigid and most probably base-paired, although a lack of reactivity due to Com proximity cannot be excluded. The second C of the CC dinucleotide could interact with the G of the GA dinucleotide in the loop. It would thereby acquire some properties of a paired nucleotide in the free RNA. However, once GA became engaged in Com binding, this C in the loop became more susceptible to the SHAPE reagent.
The 19-mer Mom RNA, RNA I and RNA II analyzed by SHAPE contain more than one GA dinucleotide in the context of the bipartite consensus motif obtained in our SELEX experiment. Interestingly, SHAPE reactivity decreased for both GA dinucleotides in the Com-19-mer Mom RNA, Com-RNA I and Com-RNA II crosslinked samples (Fig 3C). This may indicate that more than one Com binding site is present and used in the Mom regulatory region. It may also indicate that RNA structural rearrangements triggered by Com binding at one position expose other binding sites, which Com then occupies in the next step of Mom regulation. Alternatively, Com occupancy at the second GA motif in our SHAPE experiment may reflect dimerization of the GST tag of the GST-Com fusion protein used in the study. Another possibility is the high molar excess of protein (125:1) used for crosslinking with RNA.

Crystal structure of RNA II duplex
To better understand the interactions of the Com ZnF with its RNA target, we attempted to co-crystallize the GST-Com protein with three RNAs. These were the hairpin-forming RNA I and 19-mer Mom RNA fragment and the dimeric RNA II. We could obtain crystals and collect and process X-ray data only for samples in which GST-Com was supplemented with RNA II. Data collection statistics are summarized in Table 1. However, the initial estimation of the asymmetric unit content showed that a putative GST-Com:RNA II complex could not pack in the cell. The crystallized macromolecule(s) therefore had to be smaller. Computational analysis of the structure factors with the RIBER/DIBER server [40] indicated, with 94% probability, that the crystal contained only RNA and no protein. The molecular replacement solution obtained with Phaser [32] then confirmed the presence of an RNA homo-duplex in the crystal. The top molecular replacement solution had good statistics and yielded electron density maps of good quality. The maps also showed density for the first two nucleotides at the 5' end of the RNA II sequence, which are not present in the search model. Density was likewise visible in the clear gap between the two 8-mer model duplexes (Fig 4A). Thus, the molecular replacement solution provided continuous density for the 19-mer homo-duplex of RNA II. The initial R/Rfree values (40.98/43.10) were improved to final values of 21.19/24.60. This was achieved by adding the missing nucleotides, correcting the sequence and adding solvent molecules. A double conformation was also introduced for the central G•G pair and, consequently, for G11 in chain B.
The 19-mer self-complementary RNA II duplex folds into an A-form double helix (A-RNA). In the crystal lattice, the duplexes stack end to end along the 2₁ screw axis parallel to c, forming a pseudo-continuous helix (S1 Fig). The RNA duplex consists of 14 canonical Watson-Crick base pairs (A-U, C-G) and 5 non-canonical base pairs: four A•C pairs and one G•G pair in the middle of the duplex (Fig 4B). All non-canonical base pairs form two hydrogen bonds.
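The base-pair inventory of RNA II follows from its sequence alone. In a blunt-ended antiparallel homo-duplex, residue i of one strand faces residue n + 1 − i of the other. The sketch below enumerates these pairs and classifies each as canonical Watson-Crick, G•U wobble, or other. It reproduces the 14 canonical pairs, the four A•C pairs and the single central G•G pair. It says nothing about pair geometry, such as syn/anti orientation or hydrogen-bond patterns, which only the crystal structure provides.

```python
from collections import Counter

RNA_II = "AGAGAACCCGGAGUUCCCU"
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def homoduplex_pairs(seq: str):
    """Pairs (position i, base i, partner base) for a blunt-ended antiparallel self-duplex."""
    n = len(seq)
    return [(i + 1, seq[i], seq[n - 1 - i]) for i in range(n)]


def classify(b1: str, b2: str) -> str:
    """Label a pair as Watson-Crick, G-U wobble, or a non-canonical combination."""
    if (b1, b2) in WATSON_CRICK:
        return "WC"
    if (b1, b2) in WOBBLE:
        return "GU"
    return "•".join(sorted((b1, b2)))  # e.g. "A•C" or "G•G"


counts = Counter(classify(b1, b2) for _, b1, b2 in homoduplex_pairs(RNA_II))
print(counts)
```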
The central G•G base pair shows static disorder and breaks the chemical and crystallographic symmetry of the helix. This non-canonical pair was modeled in two alternative conformations, one with syn-anti and the other with anti-syn orientation of the bases, at occupancies of 0.7 and 0.3, respectively. The two conformations are related by a ~180° rotation of each guanine ring about its C1'-N9 bond. No flipping is observed for the next residue, G11 of chain B, which is also modeled in a double conformation. According to the ClaRNA classifier [37], the interacting guanosines form a cis Watson-Crick–Hoogsteen pair. The two hydrogen bonds are formed between the N1 and O6 atoms and between the N2 and N7 atoms of the guanines, with distances of 2.5 to 3.0 Å. An additional hydrogen bond between the exocyclic amino group of the syn guanosine and its own phosphate oxygen atom (2.86-3.23 Å) further stabilizes the conformation of the central G•G pair.
All other base pairs in RNA II adopt canonical cis Watson-Crick conformations. The distance between the C1' atoms of paired residues is clearly different for the non-canonical G•G pair (S2 Fig). The C1' atoms of the two guanosines are separated by 11.0 or 11.6 Å. This is ~0.5 or 1.1 Å longer than for the canonical Watson-Crick pairs, whose average distance is 10.5 Å. For the A•C pairs, the C1'–C1' distances are slightly shorter (average 10.2 Å). The average rise between neighboring residues is 2.8 Å, with a standard deviation of 0.35 Å. Helical rise was calculated in w3DNA as the projection of the vector connecting consecutive C1'–C1' midpoints onto the helix axis [41]. The largest rise (3.4 Å) is observed between residues A12 and G13 of chain A, and the smallest (2.2 Å) between A1 and G2 of chain A (S2 Fig). The non-canonical base pairs loosen the helix packing, which increases the local tilt, roll and twist of the G•G base pair and the neighboring bases. The λ angle is measured between the N-glycosidic bond and the C1'–C1' vector of paired residues. It ranges within ~50-60° for the typical Watson-Crick base pairs, whereas in the G•G pair it lies between 25° and 66°. Notably, the λ angles of the non-canonical A•C pairs also decrease by 5° for the adenines and increase by ~10° for the cytosines, independently of the chain (S2 Fig). Changes in helical twist are also associated with the non-canonical base pairs. The helical twist is 23.5° and 28.5° for the AG/CC steps and 34.9° and 37.7° for the GA/CC steps. The GG/CG step has the lowest value, 22.3°. Local unwinding and overwinding are observed, yet the average helical twist of 31° remains typical of A-RNA.
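The geometric quantities quoted above are simple vector operations on atomic coordinates. This sketch computes three of them. The first is a C1'–C1' distance. The second is a rise, taken as the projection onto a given helix axis of the vector joining consecutive C1'–C1' midpoints, as in the definition cited above. The third is a λ angle between a glycosidic bond and the C1'–C1' vector. The coordinates and the helix axis are made-up inputs. w3DNA fits the helix axis itself, which this sketch does not attempt.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def sub(a: Vec, b: Vec) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def c1_distance(c1_a: Vec, c1_b: Vec) -> float:
    """C1'-C1' distance of a base pair, in angstroms."""
    return norm(sub(c1_a, c1_b))


def rise(pair1: Tuple[Vec, Vec], pair2: Tuple[Vec, Vec], axis: Vec) -> float:
    """Projection of the vector between C1'-C1' midpoints of two pairs onto the axis.

    The absolute value is returned, so the axis direction does not matter.
    """
    mid1 = [(x + y) / 2 for x, y in zip(*pair1)]
    mid2 = [(x + y) / 2 for x, y in zip(*pair2)]
    return abs(dot(sub(mid2, mid1), axis)) / norm(axis)


def lambda_angle(c1_self: Vec, n_glyc: Vec, c1_partner: Vec) -> float:
    """Angle in degrees between the glycosidic bond C1'->N and the C1'->C1'(partner) vector."""
    u, v = sub(n_glyc, c1_self), sub(c1_partner, c1_self)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


# Made-up coordinates: two stacked pairs 2.8 A apart along z, glycosidic bond at 55 degrees.
pair1 = ([0.0, 0.0, 0.0], [10.5, 0.0, 0.0])
pair2 = ([0.0, 1.0, 2.8], [10.5, 1.0, 2.8])
axis = [0.0, 0.0, 1.0]
glyc = [1.47 * math.cos(math.radians(55)), 1.47 * math.sin(math.radians(55)), 0.0]
print(c1_distance(*pair1), rise(pair1, pair2, axis), lambda_angle(pair1[0], glyc, pair1[1]))
```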
The solvent molecules in the crystal comprise a chloride ion, a sulfate ion and 10 water molecules. The sulfate ion was modeled at 0.7 occupancy and all other solvent molecules at full occupancy. The sulfate ion is found in the major groove, where it interacts with G10 of chain B, namely with the Watson-Crick edge of the syn guanosine, as observed previously in similar circumstances [42]. The chloride ion is located on the opposite side of the RNA helix, in the minor groove, bound to G11 of chain A.
Overall, the non-canonical G•G and C•A base pairs do not distort the A-form of RNA II. The structure is stable, with the characteristic 11 base pairs per turn, C3'-endo sugar puckers and an axial hole. The syn-anti arrangement of the central G•G pair minimizes the effect of the bulky guanines and avoids a steric clash between them. The non-canonical A•C pairs adopt a G•U wobble-like conformation. The two hydrogen bonds of the A•C pair are likely due to protonation of the adenine at N1, which the acidic crystallization condition (pH 4.6) can promote. However, we cannot rule out that one of the bases is present as an imino tautomer, which would also allow two hydrogen bonds to form. Weaker interactions are therefore expected at higher pH. This may explain why some RNA II molecules were observed in the monomeric form in our EMSA experiment.
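The pH argument can be made semi-quantitative with the Henderson–Hasselbalch relation. The fraction of adenines protonated at N1 is 1/(1 + 10^(pH − pKa)). The pKa of the adenine N1 within the A•C pairs of RNA II is not reported here. Free adenosine has an N1 pKa of about 3.5, and an upward shift in a wobble-pair context is an assumption. The values below are therefore illustrative only. The sketch shows the trend between the crystallization pH and pH 7 (the HEPES pH used for annealing), not the actual populations in RNA II.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of sites protonated (here, adenine N1)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative pKa values only; neither was measured for RNA II.
for pka in (3.5, 6.0):
    for ph in (4.6, 7.0):  # crystallization pH vs. pH 7 annealing buffer
        print(f"pKa {pka}, pH {ph}: {fraction_protonated(ph, pka):.3f}")
```

Whatever the true pKa, each unit increase in pH lowers the protonated fraction toward a tenth of its value once pH exceeds pKa. This is why the A•C hydrogen bonding is expected to weaken at neutral pH.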

Discussion
Com is a small zinc finger protein that regulates translation of bacteriophage Mu Mom RNA. Regulation occurs through binding of Mom RNA and structural rearrangements that expose the translation start site. Here we demonstrated that the Com zinc finger is sufficient for sequence- and structure-specific RNA binding.

Overview of Com interaction with RNA
Taking into account our SELEX and SHAPE probing data, we propose that the Com protein recognizes the 5'-CC(N)2-3GAG-3' motif in the loop of an RNA hairpin that can be formed by its physiological target. An accessible GA dinucleotide is crucial for this recognition.
Multiple sequence motifs similar to our bipartite SELEX-derived motif are also present in the bacteriophage Mu Mom translation initiation region. This region was previously predicted [15], and is confirmed in this study, to be a Com target. The motifs are located in the region proximal to the TIS structure and partially overlap the TIS (5'-GAAUGCCUGCGAGCAUCCCACGGAG-3'). Moreover, our results indicate that Com recognition and binding of RNA require a defined sequence motif and, most probably, also its positioning within a specific secondary structure. These results support the model of Mom regulation by Com and the definition of the Com target structure determined earlier by enzymatic and chemical probing [15]. In this model, Com specifically binds the putative RNA target region, presumably contained in the stem-loop, and its binding destabilizes and melts the TIS structure [15]. We did not observe the transition stage of hairpin melting after Com binding in our SHAPE footprinting experiment. This may be explained by the absence of the C-terminal intrinsically disordered segment in the truncated version of Com used in our studies. This part of Com is dispensable for RNA binding (Fig 3A and 3B), but it may be necessary for further structural rearrangement of the RNA and for regulation of Mom.
Lima and coworkers reported Com binding results that conflict with ours. In their EMSA experiments, Com preferentially bound a 19-mer self-complementary RNA duplex, observed at about 1.55 mM concentration, and produced no mobility shift with the RNA hairpin form [22]. The lack of hairpin binding reported by Lima et al. is difficult to reconcile with our data. Their strong duplex binding, however, might be explained by structural differences between the two duplexes: the 19-mer Mom RNA fragment [22] and the 19-mer RNA II used in our studies.

Comparison of RNA II and Mom RNA duplex structures
The helical structure of RNA II presented in this work is similar to the structure of the 19-mer Mom RNA fragment published by Lima and colleagues [22]. Nonetheless, base stacking in our RNA II homo-duplex is slightly looser. The largest rise in the Mom RNA structure (PDB ID 1KFO) is 3.1 Å, which is 0.3 Å smaller than the maximum rise observed for RNA II. The average rise for the Mom RNA structure (2.7 Å), however, is only 0.1 Å smaller than the average observed in our study. The distance between the most distant phosphates is 49.9 and 50.7 Å (chains A and B, respectively) for RNA II and 48.3 Å for the 19-mer Mom RNA fragment. The two duplexes superimpose with an RMSD of 2.07 Å over all atoms and 1.85 Å over backbone atoms only. The 19-mer Mom RNA duplex analyzed earlier has a single-nucleotide 3' overhang, whereas RNA II has blunt ends. This explains why the largest differences are observed at both ends of the compared structures (Fig 5). Both RNA II and the 19-mer Mom RNA duplex contain the internal sequence motif recognized by Com, though with different spacing between the CC and GAG sequences (2 and 3 nucleotides, respectively). Both also have a GA dinucleotide at the beginning of each strand. In both, the internal GA dinucleotide, which is crucial for recognition by Com, lies next to a non-canonical A•C base pair. Lima et al. proposed that an A•C base pair can introduce structural flexibility through adenine N1 protonation. In their view, it could act as a key element of a conformational switch triggered by environmental factors [22].
The internal GA dinucleotide of the 19-mer Mom RNA duplex crystallized by Lima and colleagues [22] is involved in tandem A•C/G•U wobble pairs. This arrangement could further weaken the RNA helix and make it prone to conformational rearrangements. In contrast, RNA II contains a G•G base pair, stabilized by three hydrogen bonds (including one internal hydrogen bond), adjacent to the internal GA dinucleotide. In RNA II, the G residue of the GA dinucleotide is involved in canonical base pairing (S1 Fig). This may reduce the accessibility of the RNA II sequence motif for interaction with Com, which prefers the GA dinucleotide to be available in a flexible, single-stranded form (as indicated by the high SHAPE reactivity of the GA dinucleotides at the beginning of each strand) (Fig 3C). Better accessibility of the GA dinucleotide could also explain why Lima's RNA duplex appears to be a better substrate for Com binding than our RNA II. However, a full understanding of the Com-RNA interaction will require experimental determination of the complex structure.

Comparison of Com-RNA with the Tat-TAR system
As noted earlier by Hattman [15], the Com binding region in Mom RNA shares sequence and structural similarity with the TAR RNA (trans-activating region RNA) hairpin, which interacts with the regulatory protein Tat (trans-activator of transcription) of HIV (human immunodeficiency virus). We noticed that the 5′-CC(N)₂₋₃GAG-3′ sequence motif recognized by Com is also present in the hexaloop of the TAR RNA hairpin. Additionally, in both RNAs the closing base pair is C-G, and the stems have similar lengths (for TAR, measured up to its bulge region) [15].
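To make the shared motif concrete, the sketch below scans an RNA sequence for 5′-CC(N)₂₋₃GAG-3′, reporting both spacer lengths when both fit. The function name `find_com_motifs` is hypothetical and introduced only for illustration. The example input CCUGGGAG is the TAR apical loop with its closing C29-G36 pair, assembled from the residue correspondence given in the next paragraph (C30, U31, G32, G33, G34, A35 mapping to C2, N3, N4, N5, G6, A7 of the 8-mer motif). Since the data discussed here indicate that Com recognition also depends on RNA structure, a sequence hit alone should not be read as a predicted binding site.

```python
def find_com_motifs(rna: str, spacers=(2, 3)):
    """Return (start, spacer_length, match) for every 5'-CC(N)nGAG-3' occurrence.

    Hypothetical helper: purely sequence-based, it ignores secondary structure.
    Positions are 0-based offsets into the input string.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for start in range(len(seq)):
        if seq[start:start + 2] != "CC":
            continue
        # Try each allowed spacer length; both may match from the same start.
        for n in spacers:
            end = start + 2 + n + 3
            if end <= len(seq) and seq[start + 2 + n:end] == "GAG":
                hits.append((start, n, seq[start:end]))
    return hits


# TAR apical loop with its closing C-G pair (C29..G36).
print(find_com_motifs("CCUGGGAG"))  # [(0, 3, 'CCUGGGAG')]
```

The explicit loop, rather than a single regular expression, keeps overlapping hits and lets one start position yield both a 2- and a 3-nucleotide spacer match.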
TAR interaction with Tat is critical for efficient HIV transcription, gene expression and pathogenesis [44,45]. The TAR hairpin is positioned immediately downstream of the transcription start site and stalls viral transcription by RNA polymerase II (Pol II). By binding to TAR, Tat recruits the host super elongation complex (SEC) to the promoter and restores transcription [46,47]. The SEC consists of positive transcription elongation factor b (P-TEFb), composed of CDK9 and Cyclin T1 (CycT1), other transcription factors (ELL2 and ENL/AF9) and scaffold proteins (AFF1 and/or AFF4) [48][49][50]. Studies of the Tat-TAR regulatory system revealed that Tat binds directly to a 3-nucleotide bulge in the major groove of the TAR stem through its arginine-rich motif (ARM), and to the loop of the TAR hairpin through its cysteine-rich domain [50,51]. Concurrently, Tat binds to CycT1 of the SEC, forming a positively charged TAR-interacting surface composed of the helical Tat-TAR recognition motif (TRM) of CycT1 and the Zn²⁺-coordinating loop of Tat [49,50]. In the crystal structure of the complex determined by Schulze-Gahmen et al. [50], the TAR loop is stabilized by a cross-loop hydrogen bond between C30 and G34 and by additional contacts with G33 (corresponding to residues C2, G6 and N5 of the 8-mer Com-binding motif 5′-CCNNNGAG-3′), whereas the remaining loop bases U31, G32 and A35 (corresponding to residues N3, N4 and A7 of the 8-mer Com-binding motif) project outward from the loop. The protein complex contacts the G32 and G33 bases directly and makes extensive contacts with the sugar-phosphate backbone, suggesting that TAR recognition by the SEC is predominantly based on RNA structure [50]. The importance of the TAR loop structure is further supported by the observation that mutations of C30 or G34 greatly reduce CycT1 binding, and that binding can be rescued by a compensatory mutation restoring the hydrogen bonding [52].
Without an experimental structure of the Com-Mom complex, it is hard to say whether the Com-Mom and Tat-TAR systems share a common mode of RNA binding. However, our data and the strong similarity between the Com target and TAR RNA suggest that Com binding is also RNA structure-dependent. Given the nearly identical loop sequences of the two hairpins, the Mom loop could be stabilized by a cross-loop C-G hydrogen bond, and the A nucleotide of the GA dinucleotide could be flipped out as in the TAR loop. In that case, flexibility of the GA dinucleotide in the unbound form of Mom would be critical for Com binding.
In the structure of the Tat-TAR complex (PDB code 6CYT) [50], the key protein residues interacting with the TAR loop are Tyr26 of Tat and Trp258, Arg251, Arg254 and Arg259 of CycT1. Searching for a similar combination of amino acids in Com models [53], we found a positively charged region comprising Arg8, Asn13, Lys14 and Arg31, located close to the four cysteine residues involved in zinc ion coordination. This region could potentially serve as an RNA-binding surface in Com. An open question remains whether, by analogy to the Tat-TAR system, Com requires other factors binding simultaneously to the Mom regulatory element in order to stimulate Mom translation.
Another aspect of Tat-TAR regulation is the recently proposed Chaperna (RNA that provides chaperone function to proteins) activity of TAR RNA [54]. The Tat protein is intrinsically disordered, and its interaction with TAR is cooperative [54]. Tat itself exhibits nucleic acid chaperone activity [55], and TAR RNA binding, in turn, prevents Tat from misfolding and degradation [54]. It has been proposed that TAR RNA may dictate the folding status of Tat, and therefore its interactions with other factors and successful HIV replication in host cells [54]. Similar cooperation could also exist in the Com-Mom regulatory system. The Com protein has a C-terminal intrinsically disordered region that is dispensable for RNA binding but could fold upon RNA binding and participate in regulation by unwinding RNA structure or interacting with other factors. In turn, free, unbound Com could be prone to misfolding and degradation (as we observed for GST-fused Com after cleavage of the GST tag).
The Com-Mom and Tat-TAR systems are evolutionarily unrelated. However, both systems are essential for infection and virus propagation in their respective host cells [15,44,45]. Post-transcriptional regulation of Mom enables momification and provides protection against bacterial restriction endonucleases, whereas TAR regulation is necessary for efficient HIV transcription. Recognition of both the Mom regulatory region and TAR is sequence- and structure-dependent. If the high similarity of the two RNA hairpins is accompanied by similar modes of protein binding, inhibitors that interfere with Tat-TAR binding might also disrupt the Com-Mom system, and vice versa. It has been shown previously that methylated oligoribonucleotides complementary to the TAR stem-loop [56], or LNA/2′-O-methyl nucleoside analogue aptamers complementary to the TAR loop [57], can block Tat binding and inhibit TAR-dependent transcription. In line with these results, we demonstrated that the double-stranded, self-complementary version of the Com RNA target is not sufficient for effective Com binding. Whether this has antiviral implications remains to be assessed in further biochemical and structural studies.