DUF3055 from Staphylococcus aureus adopts unique strategy for structural distinctiveness

Staphylococcus aureus remains a public health threat, with the WHO classifying the pathogen as a high priority for the development of new antimicrobial agents. Whole genome sequencing has revealed a number of conserved genes that may be essential for cell viability and infection. Characterising the structure and function of these proteins will aid the development of new antimicrobials. Therefore, this study elucidated the structure of hypothetical protein DUF3055 from S. aureus strain Mu50. The protein possesses an as yet undefined function and a unique fold. The size of DUF3055 made it an ideal candidate for NMR characterisation, which in conjunction with circular dichroism revealed the protein to be folded. Crystallisation and structure solution found that the overall dimer fold has a negatively charged surface formed by a β-bulge and tightly crossed α-helices, with a size complementary to a single turn of DNA. Our structural observations suggest that hypothetical protein DUF3055 from S. aureus has a role in DNA binding and gene regulation.


Introduction
Proteins perform essential functions that facilitate physiological processes; these functions are tied to their structure, with surface shape playing a pivotal role. As such, investigations into protein structure can aid us in understanding the function of unknown proteins. Currently, 180,000 protein structures are available in the RCSB PDB database, representing less than 0.1% of proteins identified through genome sequencing [1]. Computational methods can assist in the structural analysis of unknown proteins, but predictive models produced by PROSITE, SWISS-MODEL, and AlphaFold can be spurious in the case of novel folds or unique structures, or can fail to consider quaternary structure. Therefore, it is crucial to experimentally determine protein structure in order to confidently understand biological systems.
Staphylococcus aureus is a pathogenic bacterium considered to be of high priority by the World Health Organisation (WHO). The pathogen caused over 1 million deaths in 2019, primarily from lower respiratory and bloodstream infections [2]. S. aureus can acquire a wide range of antibiotic resistances and currently 99% of S. aureus strains are resistant to penicillin [3]. In healthcare settings both vancomycin-resistant (VRSA) and methicillin-resistant (MRSA) strains are frequently observed, with some hospitals reporting that > 50% of strains exhibit resistance to methicillin [4].
The genome of S. aureus strain Mu50 is made up of 2,878,529 nucleotides encoding ~ 2700 genes whose protein products are organised into ~ 2400 families [5,6]. Presently, > 1500 proteins from S. aureus have been structurally solved, with proteins from the Mu50 strain accounting for fewer than 200. About 20% of assigned families also include domains of unknown function (DUFs), which categorize functionally uncharacterized proteins [7]. Although functional information is limited, evolutionary conservation suggests that many DUFs are essential. By cross-referencing the database of essential genes with the Pfam DUF database, over 300 essential proteins from 16 different bacterial species are found to contain 238 DUFs, most of which represent single-domain structures [8,9]. Proteins possessing the DUF3055 domain are highly conserved in Alkalihalophilus, Bacillus, Macrococcus, and Staphylococcus. All DUF3055 domain-containing proteins are single-domain proteins of fewer than 100 amino acids. This implies that DUF3055 is biologically essential in bacteria. In the present study, the crystal structure of the DUF3055 domain-containing protein from S. aureus (SA DUF3055) is determined. The unique fold possesses an abundance of glycine residues in both its α-helices and β-strands, a residue that typically disfavours secondary structure. Our studies highlight that DUF3055 most likely functions as a DNA mimic and may form one component of a toxin-antitoxin pair.

Cloning, expression and purification
The SAV0927 gene coding for SA DUF3055 was amplified from the genomic DNA of S. aureus Mu50 by polymerase chain reaction using 5′-GAA CAT ATG ATT GAT ATG TAT TTA TAT GAT-3′ and 5′-GAT CTC GAG ATG AAT GAC TTC ATT TAA ATA-3′ as the forward and reverse primers, respectively. The gene for SA DUF3055 was cloned into the NdeI and XhoI restriction sites of the pET21a(+) vector. The resulting construct encodes additional residues (LEHHHHHH) that form a C-terminal histidine tag. The sequence of the cloned vector was confirmed by DNA sequencing (data not shown). The recombinant plasmid was transformed into E. coli BL21 (DE3). Cells were grown with ampicillin (50 μg/ml) in LB medium for circular dichroism and crystallography, and in M9 medium supplemented with 15N-NH4Cl for NMR spectroscopy. The cultures were incubated at 37 °C with shaking at 180 rpm until the OD600 reached 0.5, and expression was induced with 0.5 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG). Cells were harvested by centrifugation at 4500 × g at 4 °C after 4 h of additional growth. For purification, the cell pellet was resuspended in buffer (50 mM Tris-HCl, pH 7.5, and 500 mM NaCl) and disrupted using an ultrasonic processor (Cole-Parmer, U.S.A.) at 4 °C. After centrifugation at 20,000 × g for 1 h at 4 °C, the supernatant was loaded onto a His-Trap affinity column (Cytiva, England) and the bound protein was eluted with elution buffer containing 300 mM imidazole. The purified protein was judged to be > 99% pure by SDS-PAGE and was concentrated to 10 mg/ml by ultrafiltration in spin columns with a 3,000 Da molecular-mass cutoff (Millipore, U.S.A.).
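As a quick consistency check on the cloning design, the two primers can be scanned for the NdeI (CATATG) and XhoI (CTCGAG) recognition sequences. The following is a minimal sketch rather than part of the published protocol; the helper names (`reverse_complement`, `site_positions`) are illustrative, and the only inputs are the primer sequences quoted above with the spaces removed.

```python
# Primer sequences copied from the text, with the codon spacing removed.
FORWARD = "GAACATATGATTGATATGTATTTATATGAT"
REVERSE = "GATCTCGAGATGAATGACTTCATTTAAATA"

# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def site_positions(primer):
    """Map each enzyme name to the 0-based index of its site (-1 if absent)."""
    return {name: primer.find(site) for name, site in SITES.items()}


forward_sites = site_positions(FORWARD)
reverse_sites = site_positions(REVERSE)

# Gene-specific part of the reverse primer (everything after the XhoI site),
# shown on the coding strand so it can be compared with the 3' end of the gene.
xho_end = reverse_sites["XhoI"] + len(SITES["XhoI"])
reverse_annealing_sense = reverse_complement(REVERSE[xho_end:])

print(forward_sites, reverse_sites, reverse_annealing_sense)
```

In this sketch each primer carries exactly one of the two sites, three bases in from its 5′ end, which is consistent with directional cloning between NdeI and XhoI.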

Circular dichroism and NMR spectroscopy
Circular dichroism experiments were carried out on a J815 circular dichroism spectropolarimeter (JASCO, Japan). SA DUF3055 was diluted to 20 μM in 50 mM sodium phosphate (pH 7), 100 mM NaCl at 20 °C. A quartz cuvette with a 1 mm path length was used for measurements. The far-UV CD spectra were recorded from 195 to 260 nm with a scan speed of 2 cm/min. The spectra were plotted as mean ellipticity θ (deg cm² dmol⁻¹) versus wavelength λ (nm). Thermal unfolding scans were conducted by increasing the temperature from 20 °C to 95 °C in steps of 0.2 °C.
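For readers reproducing the plots, a raw instrument reading in millidegrees is commonly normalised to mean residue ellipticity in deg cm² dmol⁻¹. The sketch below shows that standard conversion under stated assumptions: the text does not say which normalisation was applied, the function name is illustrative, and the raw reading and residue count used in the example are placeholders rather than measured values. Only the 20 μM concentration and the 1 mm path length come from the text.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert an observed ellipticity (mdeg) to deg cm^2 dmol^-1 per residue.

    Uses the common relation [theta] = theta_obs / (10 * c * l * n), with the
    concentration c in mol/L, the path length l in cm, and n the number of
    residues (some authors use the number of peptide bonds instead).
    """
    if conc_molar <= 0 or path_cm <= 0 or n_residues <= 0:
        raise ValueError("concentration, path length and residue count must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


# Values from the text: 20 uM protein in a 1 mm (0.1 cm) cuvette.
CONC_MOLAR = 20e-6
PATH_CM = 0.1

# Placeholder inputs for illustration only: a -10 mdeg reading, 90 residues.
example = mean_residue_ellipticity(-10.0, CONC_MOLAR, PATH_CM, n_residues=90)
print(round(example, 1))
```

Normalising per residue makes spectra comparable between proteins of different length, which matters when overlaying reference α-helix, β-sheet, and random-coil curves.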
For NMR spectroscopy, 0.5 mM 15N-labeled protein was prepared in 20 mM sodium phosphate (pH 6), 100 mM NaCl, 1 mM dithiothreitol, in 90% (v/v) H2O and 10% (v/v) D2O buffer. The NMR spectrum was recorded at 298 K on a JEOL 600 MHz NMR spectrometer equipped with a cryoprobe. The 2D [1H-15N] HSQC spectrum was acquired for uniformly 15N-labeled SA DUF3055. The spectrum was processed with NMRPipe and analyzed using NMRviewJ [10,11].

Crystallization and data collection
SA DUF3055 samples for crystallization were diluted to 10 mg/ml with 50 mM Tris-HCl (pH 7.5), 150 mM NaCl. Initial crystallization screening was set up in 96-well sitting-drop plates (MiTeGen, U.S.A.) at 20 °C. SA DUF3055 crystallized in a variety of conditions, which were further improved by the hanging-drop vapor diffusion method. For optimal growth of SA DUF3055 crystals, 1 μl of precipitant solution (1 M (NH4)2SO4, 1% (w/v) PEG3350, 100 mM Bis-Tris pH 5.5) and 1 μl of protein solution were mixed on a siliconized cover slip and equilibrated against a 1 ml reservoir of precipitant solution. This condition yielded block-shaped crystals in 5 days, which were cryo-protected in artificial mother liquor supplemented with 20% (w/v) glycerol. Diffraction data were collected at 100 K to a resolution of 3.0 Å on beamline 5C at the Pohang Light Source, Korea. Data were processed and scaled with the HKL2000 program suite [12]. Further structural analysis was performed using the CCP4 suite [13]. The SA DUF3055 crystals belong to space group P6₅22 and contain 6 monomers (3 dimers) per asymmetric unit.
The structure was determined by the molecular replacement technique using Phaser, with the coordinates of a model predicted by AlphaFold (Google DeepMind) as the starting model [14]. Refinement was continued with Refmac5 and the model was built using Coot [15,16]. 5% of the data were set aside for the calculation of Rfree, and the final crystallographic statistics are summarized in Table 1.
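The R-factor defined in the footnote to Table 1, and the idea of holding back 5% of reflections as a test set, can be sketched as follows. This is an illustration only and not the Refmac5 implementation: the function names are hypothetical, the amplitudes are invented toy numbers, and the random selection shown here is just one simple way to flag a test set.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(|Fobs - Fcalc|) / sum(Fobs) over a set of reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def split_test_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the Rfree test set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_test = round(n_reflections * fraction)
    test = sorted(indices[:n_test])
    work = sorted(indices[n_test:])
    return work, test


# Toy amplitudes (invented for illustration, not taken from the data set).
toy_obs = [100.0, 50.0, 25.0]
toy_calc = [90.0, 55.0, 25.0]
toy_r = r_factor(toy_obs, toy_calc)

work_idx, test_idx = split_test_set(100)
print(round(toy_r, 4), len(work_idx), len(test_idx))
```

Rwork would be evaluated over the working reflections and Rfree over the test reflections, which are never used to drive refinement; a large gap between the two is a common warning sign of overfitting.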

Accession number
The coordinates of SA DUF3055 were deposited in the PDB database as entry 8XFU.

Physical stability of SA DUF3055
Initial characterisation of SA DUF3055 focused on thermodynamic stability and secondary structure analysis using CD [17]. Our far-UV spectra showed two distinguishable peaks, with the lowest negative peak near 220 nm and an adjacent minor peak at 212 nm, which indicated the protein had a predominantly β-sheet secondary structure. However, the spectrum was slightly shifted to lower wavelengths compared to a theoretical β-sheet, indicative of a smaller proportion of α-helix. Conversely, secondary structure prediction using K2D2 and BeStSel suggested that SA DUF3055 had a greater α-helix content of 60% [18,19]. Given the lack of signal for random coil, it is unlikely that the protein contains unstructured regions. Temperature scans at 220 nm show that SA DUF3055 is remarkably stable at all temperatures below 70 °C, exhibiting a melting transition (Tm) at 72 °C (Fig. 1A). To study protein conformation and to attempt structure determination, protein NMR spectroscopy was conducted. Attempts to produce 1H-15N-13C triple-labeled SA DUF3055 were not successful because cell growth was insufficient when 13C-glucose was used as the carbon source. The 1H-15N 2D HSQC spectrum showed around 50 dispersed, broad peaks. This confirmed the protein was folded and supports our CD observations. However, the uneven intensity distribution indicates that many residues undergo dynamics on different timescales (Fig. 1B). As such, crystallography was performed for structural elucidation.
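One simple way to read a melting temperature off a thermal scan is to normalise the signal between its folded and unfolded plateaus and locate the half-way crossing. The sketch below illustrates that idea under loud assumptions: it is not the analysis used for Fig. 1A, the curve is synthetic (generated from a two-state sigmoid with an arbitrary transition width), the signal is assumed to rise monotonically on unfolding, and all function names are illustrative. Only the 20 to 95 °C range, the 0.2 °C step, and the 72 °C midpoint echo values from the text.

```python
import math


def synthetic_melt(tm=72.0, width=2.0, start=20.0, stop=95.0, step=0.2):
    """Generate a made-up two-state melting curve (signal versus temperature)."""
    n_points = int(round((stop - start) / step)) + 1
    temps = [start + i * step for i in range(n_points)]
    signal = [1.0 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]
    return temps, signal


def midpoint_temperature(temps, signal):
    """Return the temperature at which the normalised signal crosses 0.5."""
    low, high = min(signal), max(signal)
    if high == low:
        raise ValueError("signal is flat; no transition to locate")
    frac = [(s - low) / (high - low) for s in signal]
    for i in range(1, len(frac)):
        if frac[i - 1] < 0.5 <= frac[i]:
            # Linear interpolation between the two bracketing data points.
            t0, t1 = temps[i - 1], temps[i]
            f0, f1 = frac[i - 1], frac[i]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("signal never crosses the half-way point")


temps, signal = synthetic_melt()
estimated_tm = midpoint_temperature(temps, signal)
print(round(estimated_tm, 2))
```

For real data, sloping baselines before and after the transition usually need to be fitted explicitly, so a full two-state fit is preferable to this midpoint shortcut.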

Structural analysis of DUF3055
The crystal structure of SA DUF3055 was determined to 3.0 Å resolution with clear density from residue 2 to the C-terminal histidine tag (at residue 92). The monomer shows an antiparallel β-sheet and 3 consecutive α-helices with β1-β2-β3-β4-α1-α2-α3 topology (Fig. 2A and B). The SA DUF3055 crystal belongs to the P6₅22 space group with unit cell dimensions of 106, 106, 454 Å, and harbors 3 dimers in the asymmetric unit. In contrast to our previously characterised crystal system, which adopted a linear polymer, this new crystal form has molecules packed in a ball-like state with substantial solvent volume (43%) [20]. Although dimerization depends mainly on β-sheet backbone hydrogen bonds, some interactions are also observed between side chains. 25 out of 87 residues (30%) participate in hydrogen bonding or salt bridge interactions with the neighbouring monomer, with a large 1900 Å² contact surface formed by the long β1 strand.
A small repeating sequence, FVGFVG, in the strand supports dimer formation through a π-π stack between Phe20 and Tyr5 and a π-cation interaction between Phe17 and Arg36. We surmise that the stability observed in our CD experiments is largely granted by the tight bonding between the two monomers (Fig. 2C).
Additionally, the β1 strands in both monomers are distorted and irregular. When standard β-strands are exposed to solvent, they become aggregation-prone and tend to form insoluble fibrils. Therefore, natural proteins adopt alternative β-strand architectures rather than the ordinary shape to protect the β-strand on the edge (β-edge) [20,21]. In contrast to a typical β-strand, the β1 strand of SA DUF3055 shows a loss in planarity, featuring a ~ 115° bend with Glu12 serving as the apex of the protrusion. The bulged β-strand also exhibits twisted edges, incorporating glycine residues (Gly19 and Gly52), which are atypical for β-strands [22]. Sheltered beneath this canopy, three consecutive α-helices occupy the pocket area, establishing secure interactions with the bulged β-strands. In addition to the β-strands, the α-helices exhibit distinctive characteristics. Although glycine is one of the most destabilizing residues for an α-helix due to its wider range of backbone dihedral angles, it is frequently observed in membrane proteins, especially in the region of helix crossings [23-25]. This allows helix packing, providing favorable hydrophobic interactions in the confined space between adjacent helices. In the structure of SA DUF3055, Gly76 on α3 is in close proximity to α2, which enables helix crossing (Fig. 2B). Through this arrangement, the three α-helices are tightly packed into a narrow space with the charged side chains of the helices oriented toward the solvent region. Structural analysis by the DALI server revealed a very low Z-score (less than 4.3 for the top-scored structure), indicating that SA DUF3055 has a unique fold [26]. Our previous investigation asserted that SA DUF3055 may form a long chain polymer as part of its function, owing to observations from the crystal packing. However, the crystallographic oligomer does not correlate with the solution state. The altered packing observed in our new crystal form suggests SA DUF3055 most likely functions as a dimer. Indeed, analysis using PISA indicates that the likely solution state is a dimer, with calculated ΔG values of −23.5, −21.2, and −23.3 kcal/mol for monomer pairs AB, CD, and EF, respectively.
Analysis of the surface shows a prominent negative charge due to solvent-exposed aspartate and glutamate residues. In total, all 17 negatively charged residues (7 aspartates and 10 glutamates) have side chains displayed on the protein's exterior. Comparing the sequence of SA DUF3055 to homologs in other bacteria shows that these negatively charged residues are well conserved and aligned (Fig. 3A). Moreover, the residues occur as consecutive pairs in the amino acid sequence or are localised together within the 3D structure. These characteristics are often observed in proteins with nucleic acid-related functions [27]. This observation, together with the curved shape afforded by the β-bulge, would allow the protein to mimic the DNA major groove, showing 34 Å of width (Fig. 3B). As the dimer imparts a symmetry operator onto the protein, known DNA-binding motifs such as a zinc finger and helix-turn-helix would be accommodated by SA DUF3055. Using ClusPro, a reasonable binding model between SA DUF3055 and a helix-turn-helix transcription factor can be generated [32]. Substantial hydrogen bonds and ionic interactions are found in the proposed model, suggesting a potential protein-protein interaction (Fig. 4).

Discussion
SA DUF3055 is a well-conserved hypothetical protein present throughout pathogenic staphylococci with an as yet uncharacterised role. Our previous study elucidated the structure with an alternative packing state that suggested the protein formed a linear polymer [28]. However, the packing adopted by our new structure strongly suggests the native form of the protein is a dimer. Our structural analysis shows SA DUF3055 has a highly negative electrostatic surface which is conserved in closely related homologs. From its shape, size, and charge, we surmise that SA DUF3055 interacts with DNA-binding proteins. SA DUF3055 may act as a DNA mimetic, with the length of the major β-strand (~ 34 Å) mirroring that of a single turn of B-DNA. Further studies, such as pull-down assays to find partner molecules, DNA dissociation activity tests to compare binding affinity, or RNA sequencing to understand its network, are expected to confirm the function of SA DUF3055.
Staphylococci represent a threat to public health, with species such as S. aureus, S. epidermidis, and S. haemolyticus gaining multi-drug resistance [29]. SA DUF3055 is highly conserved in many species of staphylococci, implying that it may perform a key role in their pathogenicity. Whilst the structure has suggested a likely function, a subsequent microbiological investigation would be required to determine the protein's impact on virulence phenotypes. This study presents a structural analysis that can widen our understanding of bacterial physiology.

Fig. 1 Protein stability of SA DUF3055. A (Left) Circular dichroism spectrum plotted as molecular ellipticity against wavelength (nm) for SA DUF3055 (colored in black). The spectrum was measured from 195 to 260 nm in Tris-HCl buffer, pH 6.0 at 20 °C. Standard CD spectra are drawn as references (blue, α-helix; red, β-sheet; green, random coil). Values are the mean of three separate determinations. (Right) Thermal stability of SA DUF3055 was analyzed using circular dichroism at 222 nm. B NMR contour plot for the SA DUF3055 1H-15N HSQC experiment conducted at 600 MHz

Fig. 2 Structural overview and secondary structure topology of SA DUF3055. A Secondary-structure diagram of the SA DUF3055 dimer. The two monomers are colored light and dark cyan. B Crystal structure of the SA DUF3055 dimer in ribbon representation. The glycine residues that are involved in secondary structure elements are indicated in red and shown in an enlarged inset. The two α-helices, α2 and α3, are separated by a short distance of 4.5 Å. C Visualization using LIGPLOT depicting the amino acid residues at the dimer interface, with green dashed lines indicating hydrogen bonds

Fig. 3 Sequence alignment and protein electrostatics. A Sequence alignment of DUF3055 from different bacterial species. Identical residues are colored white on a red background and similar residues are red on a white background. Secondary structure elements (springs, α-helices; arrows, β-strands) are shown above. Negatively charged residues in SA DUF3055 are denoted by red dots. Sequence alignments were performed using ClustalX and visualized with ESPript [30, 31]. B Side chains of the residues denoted by red dots in A are represented as sticks. All side chains shown point toward the solvent. The corresponding electrostatic potential at the molecular surface of SA DUF3055 is also shown. The negatively charged surface is shown in front and flipped (180° rotation) views

Fig. 4 Proposed function of SA DUF3055. A Surface representation of SA DUF3055. The negative charges are distributed on the β-bulge region. B An alignment of typical B-DNA with the SA DUF3055 β1 strand. C Refined ClusPro docking simulation with a common transcription factor from our previous study [32]. The favored docking model interacts directly with SA DUF3055

Table 1 Crystallographic data collection and refinement statistics
a Numbers in parentheses indicate the statistics for the last resolution shell
b Rwork = Σ|Fobs − Fcalc| / ΣFobs, where Fobs = observed structure factor amplitude and Fcalc = structure factor calculated from the model. Rfree is computed in the same manner as Rwork, but from a test set containing 5% of data excluded from the refinement calculation