Extent of N-Terminus Folding of Semenogelin 1 Cleavage Product Determines Tendency to Amyloid Formation

Four peptide fragments of Semenogelin 1 (SEM1), the predominant protein in human semen (SEM1(86–107), SEM1(68–107), SEM1(49–107) and SEM1(45–107)), are known to be involved in fertilization and amyloid formation. In this work, the structure and dynamic behavior of the SEM1(45–107) and SEM1(49–107) peptides and their N-domains are described. ThT fluorescence spectroscopy showed that the amyloid formation of SEM1(45–107) starts immediately after purification, which is not observed for SEM1(49–107). Because the amino acid sequence of SEM1(45–107) differs from that of SEM1(49–107) only by four additional residues in the N-domain, these domains of both peptides were obtained via solid-phase synthesis, and the differences in their dynamics and structure were investigated. SEM1(45–67) and SEM1(49–67) showed no principal difference in dynamic behavior in water solution, and both structures were mostly disordered. However, SEM1(45–67) contains a helix (E58-K60) and a helix-like (S49-Q51) fragment. These helical fragments may rearrange into β-strands during the amyloid formation process. Thus, the difference in the amyloid-forming behavior of the full-length peptides (SEM1(45–107) and SEM1(49–107)) may be explained by the presence of a structured helix at the SEM1(45–107) N-terminus, which contributes to an increased rate of amyloid formation.


Introduction
Human immunodeficiency virus (HIV) was first identified in 1981 [1]. It is one of the most dangerous pathogens because it targets cells of the human immune system. Over time, HIV causes acquired immunodeficiency syndrome (AIDS), which may lead to susceptibility to various infections and tumors and eventually to death [2]. Since the main route of transmission of the virus is unprotected intercourse, seminal fluid is considered the main factor in increasing HIV activity [3].
Human semen forms a coagulum immediately after ejaculation. Semenogelin 1 (SEM1) and Semenogelin 2 (SEM2) are the predominant proteins of the seminal vesicle secretion, which accounts for approximately 60% of the ejaculate volume [4][5][6][7][8]. Both proteins originate from the glandular epithelium of the seminal vesicles, which secrete them in high concentrations (60 g/L) [9]. Semenogelins 1 and 2 are the predominant structural proteins of the loose gel formed in freshly ejaculated human semen. The concentration of SEM1 in semen is ten times higher than that of SEM2. Semenogelins perform many biochemical functions, such as semen liquefaction, antibacterial activity, protection and other functions of semen [9]. The semenogelins participate in the formation of a noncovalently linked gel-like structure that ensnares the spermatozoa. Within the first 20 min, the gel-like structure is liquefied by serine proteases, primarily prostate-specific antigen (PSA), which cleave the full-sized proteins into small fragments [10].
In previous works, it was demonstrated that certain cationic polypeptides in semen and seminal plasma could form amyloid fibrils that amplify HIV infection [1. Semenogelin cleavage products found in semen in high concentration could be responsible for an increase in the activity of HIV virions in seminal fluid. The analysis of semen amyloid compounds showed the presence of SEM1 fragments (SEM1 , SEM1 and SEM1(86-107)) (Figure 1) [12,13].
It is believed that semen amyloid fibrils enhance the adhesion of HIV virions to cell membranes by decreasing electrostatic repulsion between membranes of the virus and target cell [14,15]. Additionally, studies have shown that amyloid fibrils from Semenogelin 1 play a physiological role in the process of fertilization [16]. The most obvious similarity between these processes is their same physiological circumstances [17]. The fusion of semen and oocyte membranes, as well as HIV and target cell membranes, is an energetically unfavorable process that requires many different cooperative protein-protein interactions [18][19][20].
Nuclear magnetic resonance (NMR) spectroscopy is an informative method for determining the spatial structure of proteins and peptides in solution [21][22][23][24][25][26]. Pulsed-field gradient NMR spectroscopy (PFG-NMR) and dynamic light scattering (DLS) spectroscopy are effective methods for studying the translational mobility of a protein in solution [27][28][29][30][31][32]. Circular dichroism (CD) was applied to verify the protein secondary structure [33]. In our discussion, we describe the difference in the amyloid behavior of SEM1 cleavage products based on an individual study of the N-domains (SEM1(49-67) and SEM1(45-67)) of SEM1(49-107) and SEM1(45-107). Based on structural studies of the N-domains and the molecular dynamics (MD) simulation of the full-sized peptides, we suppose that helical fragments in the N-terminus of SEM1(45-107) are responsible for its fast amyloid formation. In a previous article, Lui et al. (2010) [34] reported the rearrangement of monomer helices into amyloid β-strand structures during the formation of amyloid fibrils. Thus, the presence of helix-like structures in SEM1(45-107) may lead to an increase in the amyloid-forming rate in comparison with SEM1(49-107).
Figure 2A depicts the ThT fluorescence spectra of the SEM1(45-107) and SEM1(49-107) peptides and of a control pure ThT solution (black line). The red and green lines show the ThT fluorescence spectra of the SEM1(49-107) and SEM1(45-107) solutions, respectively. The red and black lines are almost identical, pointing to the absence of amyloid fibril formation by SEM1(49-107) in water solution. In the case of SEM1(45-107), the ThT fluorescence intensity (green line) increases 4-5-fold, indicating the presence of amyloid fibrils in the peptide solution.
Only fresh peptide solutions (up to 10 min after synthesis) were used for ThT fluorescence measurements. Therefore, the fibril formation of SEM1(45-107), unlike that of SEM1(49-107), starts immediately after purification. However, the increased ThT fluorescence intensity measured 30 min after purification showed amyloid formation by both the SEM1(49-107) and SEM1(45-107) peptides (see Supplementary Material, Figure S6).
The time dependence of the ThT fluorescence intensity showed that amyloid formation takes less time for SEM1(45-107) than for SEM1(49-107) (Figure 2B). We used transmission electron microscopy (TEM) to examine the fibril formation process in more detail. TEM images of SEM1(45-107) and SEM1(49-107) fibrils were recorded an hour after preparation (Figure 3). The TEM images show that SEM1(45-107) fibrils are longer than SEM1(49-107) fibrils. Thus, the TEM data are in good agreement with the ThT data, which showed more intensive fibril formation by SEM1(45-107) than by SEM1(49-107).

DLS Spectroscopy of SEM1(45-67) and SEM1(49-67)
The oligomerization of SEM1(45-67) and SEM1(49-67) was further monitored via dynamic light scattering (DLS), which enables the observation of NMR-invisible diffusive species, since the scattering intensity depends strongly on the particle mass/size [39]. The evolution of the size distribution with time is shown in Figure 9.
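The strong size dependence of the scattering intensity can be illustrated with a minimal sketch. It assumes the Rayleigh regime (particles much smaller than the laser wavelength), in which the scattered intensity scales with the sixth power of the particle radius. The radii and number fractions below are hypothetical and are not taken from Figure 9.

```python
def intensity_fractions(populations):
    """Return the share of scattered intensity contributed by each population.

    populations: list of (radius_nm, number_fraction) tuples.
    ASSUMPTION: Rayleigh scattering, so intensity ~ number * radius**6.
    """
    weights = [number * radius ** 6 for radius, number in populations]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical mixture: 99.9% small species (1 nm) and 0.1% oligomers (10 nm).
mixture = [(1.0, 0.999), (10.0, 0.001)]
fractions = intensity_fractions(mixture)

for (radius, number), share in zip(mixture, fractions):
    print(f"r = {radius:5.1f} nm, number fraction = {number:.3f}, "
          f"intensity share = {share:.4f}")
```

Under these assumptions, a population that makes up only 0.1% of the particles dominates the scattered signal, which is why DLS can reveal oligomers too scarce to detect by NMR.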
The fast aggregation of SEM1(45-107) and SEM1(49-107) makes it impossible to perform NMR structural studies of these peptides. Therefore, we performed MD simulations of SEM1(45-107) and SEM1(49-107) (Figures 10 and 11) to evaluate the influence of the peptide fragment G45-Y48 on the C-terminus (D86-L107). The analysis of the MD simulations did not show a principal difference between the spatial structures of the C-termini (D86-L107) of the SEM1(45-107) and SEM1(49-107) peptides. The helical fragments were present in all configurations of SEM1(45-107), whereas all configurations of SEM1(49-107) were characterized by a disordered structure.
The above results raise the question of what breaks the helix of SEM1(49-107) and how this is connected to the role of the first four residues in the amyloid formation of SEM amyloidogenic peptides. Residue H47 is positively charged and may form an electrostatic contact with residue E58 to stabilize the helix motif of SEM1(45-107). Residue 63 is a hydrophobic phenylalanine that may provide a hydrophobic contact with the I65 residue side chain in the helix motif that is not observed in SEM1(49-107). Hence, the four extra residues stabilize the helix motif of SEM1(45-107), which contributes to forming a stabilized hydrophobic region. Moreover, for other amyloid peptides, it has previously been shown that hydrophobic regions enhance intermolecular cohesion during amyloid formation [47,48]. The higher hydrophobicity of SEM1(45-107) increases the propensity of the peptides to form attractive interactions, leading to a fast aggregation rate.
We carried out a CD analysis of SEM1(45-107) fibrils. The recorded CD spectrum (Figure 12) is typical for the β-sheet of amyloids [49]. We propose that the so-called discordant helix in the N-terminus of SEM1(45-107) may convert to a β-sheet during fibril formation [50]. A more detailed and complete study of this phenomenon is expected in further studies.

Protein Expression and Purification
The expression vectors were obtained by cloning the SEM1(49-107) and SEM1(45-107) peptide fragments of the H. sapiens Semenogelin 1 gene, fused with a 6xHistidine-tagged GB1 partner protein, into the pET28a plasmid [51]. The histidine-tagged partner protein was linked to the semenogelin fragments via a TEV-protease cleavage site for subsequent separation. The expression and purification protocols for the SEM1(49-107) and SEM1(45-107) peptide fragments (SEM1 fragments) are the same and are based on a previously described protocol with minor modifications [52]. Protein expression was carried out in the E. coli BL21 (DE3) pLysS strain (Novagen, Darmstadt, Germany). Cells were grown in LB-rich nutrient medium supplemented with 50 µg/mL kanamycin and 25 µg/mL chloramphenicol at 37 °C with shaking at 180 rpm until an optical density (OD600) of 0.6-0.8 was reached. Expression of the SEM1 fragment was induced by adding 1 mM isopropyl 1-thio-β-D-galactopyranoside (IPTG), and the culture was grown for a further 4 h under the same conditions. Then, the cells were harvested via centrifugation (5000 rpm, 15 min, 4 °C), frozen and stored at −20 °C.
Gel filtration was performed using an NGC Discover chromatographic system and an Enrich SEC75 column (BioRad, Hercules, CA, USA) in buffer 3 (50 mM Tris-HCl pH 8.5, 0.5 M NaCl) at a flow rate of 1 mL/min. Peak fractions were pooled, and the fusion protein was digested with homemade His-tagged TEV protease [53] at a TEV:GB1-SEM1 fragment ratio of 1:100 (w/w). Overnight digestion was carried out in the presence of DTT (1 mM), PMSF (1 mM) and EDTA (0.5 mM) at 4 °C [54]. Then, the reaction mix was loaded onto Ni-NTA resin again to trap GB1 and the TEV protease. The SEM1 fragment in the flow-through fraction was concentrated to ~2 mM using Amicon Ultra-0.5 (3K) spin concentrators (Merck, Burlington, MA, USA). The purity of the samples at each purification step was evaluated via polyacrylamide gel electrophoresis under denaturing conditions (SDS-PAGE) in pH 8.3 Tris-glycine buffer [55]. Finally, samples of the SEM1(49-107) and SEM1(45-107) peptide fragments with purities of more than 95% were obtained.

ThT Fluorescence
The fibril formation of SEM1(45-107) and SEM1(49-107) was studied using ThT fluorescence intensity measurements. Thioflavin T (ThT) dye fluorescence is regularly used to quantify in vitro amyloid fibril formation. Upon binding to amyloid fibrils, ThT gives a strong fluorescence signal at approximately 482 nm [56]. ThT fluorescence was measured at 37 °C in a 96-well microplate on a Thermo Scientific Varioskan LUX multimode microplate reader (Waltham, MA, USA) via FluorEssence (v. 6.1) software. ThT fluorescence assays were prepared by mixing 17 µL of the peptide solution (2 mM peptide in 50 mM Tris-HCl pH 8.5, 0.5 M NaCl) with 3 µL of 500 µM ThT in 50 mM Tris-HCl pH 8.5 and 0.5 M NaCl (total volume of 20 µL) to maintain the fluorescence signal within the linear range of the instrument. The samples were excited at 440 nm, and the fluorescence emission intensity was collected at 482 nm for 90 s and averaged. Fresh peptide solutions of SEM1(45-107) and SEM1(49-107) were prepared for the ThT fluorescence measurements. The time interval between the end of synthesis and the registration of fluorescence spectra was less than 10 min. The fluorescence intensity was corrected for lamp intensity fluctuations by dividing the observed fluorescence signal by the lamp intensity. The concentration of the SEM1(45-107) and SEM1(49-107) peptides was controlled using a NanoDrop One C (Thermo Fisher Scientific, Waltham, MA, USA).
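The mixing volumes above fix the final concentrations in the well. The short sketch below works through that arithmetic and the lamp-intensity correction. The function names are illustrative, and the raw signal and lamp readings are hypothetical numbers, not measured data.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after dilution (C1 * V1 = C2 * V2), in the units of stock_conc."""
    return stock_conc * stock_volume_ul / total_volume_ul


def lamp_corrected(signal, lamp_intensity):
    """Normalize a fluorescence reading by the lamp intensity recorded with it."""
    return signal / lamp_intensity


# Values stated in the protocol.
PEPTIDE_STOCK_MM = 2.0    # mM peptide stock
PEPTIDE_VOLUME_UL = 17.0  # µL of peptide solution
THT_STOCK_UM = 500.0      # µM ThT stock
THT_VOLUME_UL = 3.0       # µL of ThT solution
TOTAL_UL = PEPTIDE_VOLUME_UL + THT_VOLUME_UL  # 20 µL

peptide_final_mm = final_concentration(PEPTIDE_STOCK_MM, PEPTIDE_VOLUME_UL, TOTAL_UL)
tht_final_um = final_concentration(THT_STOCK_UM, THT_VOLUME_UL, TOTAL_UL)
print(f"peptide: {peptide_final_mm:.2f} mM, ThT: {tht_final_um:.1f} µM")

# HYPOTHETICAL readings, only to show the correction step.
corrected = lamp_corrected(signal=1200.0, lamp_intensity=0.96)
print(f"lamp-corrected signal: {corrected:.1f}")
```

With the stated volumes, the well contains 1.7 mM peptide and 75 µM ThT.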

PFG-NMR Spectroscopy
Diffusion experiments were carried out using a 700 MHz NMR spectrometer (AVANCE III-HD, Bruker, Billerica, MA, USA) equipped with a quadruple resonance CryoProbe (¹H, ¹³C, ¹⁵N and ³¹P) with a standard z-gradient (maximum strength of 55.7 G cm⁻¹). Diffusion decays were obtained using the stimulated-echo pulse sequence with water suppression (STEBPGP1S19), which contains two field gradient pulses of amplitude g and duration δ separated by the interval ∆ [57]. Diffusion decays were fitted by:
$$A(g) = A(0)\exp\left(-\gamma^{2} g^{2} \delta^{2} \left(\Delta - \frac{\delta}{3}\right) D_{s}\right)$$
where A(0) is the spin echo amplitude without gradient pulses, D_s is the self-diffusion coefficient of the molecules and γ is the proton gyromagnetic ratio. For all experiments, the amplitude of the field gradient pulses (g) was varied from 2 to 95% of its maximum at a constant diffusion time (∆ = 100 ms) and gradient pulse duration (δ = 3.6 ms). All experiments were performed at 298 K. Data processing and analysis were carried out with Bruker Topspin (v. 3.6) software.
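As an illustration of how a self-diffusion coefficient can be extracted from such a decay, the sketch below fits synthetic data. It assumes the standard Stejskal-Tanner form, ln A = ln A(0) − γ²g²δ²(∆ − δ/3)D_s. Bipolar-gradient sequences can use a slightly different effective diffusion time, so this is not necessarily the exact expression used for the published fits. The gradient strength, δ and ∆ are taken from the text. The value of D_TRUE and the 16-point gradient ramp are hypothetical.

```python
import math

GAMMA_H = 2.675222e8   # proton gyromagnetic ratio, rad s^-1 T^-1
G_MAX = 0.557          # 55.7 G/cm expressed in T/m
DELTA_SMALL = 3.6e-3   # gradient pulse duration δ, s
DELTA_BIG = 100e-3     # diffusion time ∆, s


def b_factor(g_fraction):
    """Stejskal-Tanner factor b = (γ g δ)^2 (∆ - δ/3) in s/m^2 (ASSUMED form)."""
    g = g_fraction * G_MAX
    return (GAMMA_H * g * DELTA_SMALL) ** 2 * (DELTA_BIG - DELTA_SMALL / 3.0)


def fit_diffusion(g_fractions, amplitudes):
    """Least-squares line through ln(A) versus b; returns (D_s, A(0))."""
    xs = [b_factor(f) for f in g_fractions]
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic, noise-free decay: gradient ramped from 2% to 95% in 16 steps.
D_TRUE = 2.0e-10  # m^2/s, HYPOTHETICAL value chosen for the demonstration
fractions = [0.02 + i * (0.95 - 0.02) / 15 for i in range(16)]
amplitudes = [math.exp(-b_factor(f) * D_TRUE) for f in fractions]

d_fit, a0_fit = fit_diffusion(fractions, amplitudes)
print(f"fitted D_s = {d_fit:.3e} m^2/s, A(0) = {a0_fit:.3f}")
```

A log-linear fit is used here only because it needs no external libraries. With noisy experimental data, a nonlinear fit of the exponential itself is usually preferable.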
Data processing was carried out using Bruker Topspin (v. 3.6) software. All spectra were analyzed with the help of the CCPNMR (v. 2.5) program [64].
The peptide spatial structures were visualized using UCSF Chimera (v.

MD Simulation
To generate the structural ensembles of the SEM1(45-107) and SEM1(49-107) peptides, we used an integrative approach that incorporates NMR distance constraints into molecular dynamics simulations, following the method of Sinelnikova and Spoel (2021) [74]. The all-atom MD simulation of the peptides was performed using Gromacs (v. 2022) software [41]. The Charmm36 [75] and TIP3P [76] models were used for the protein and water molecules, respectively. The initial conformations of SEM1(45-107) and SEM1(49-107) were prepared using the XPLOR-NIH (v. 3.6) program, from which the lowest-energy peptide structures were selected. These structures were solvated in a rhombic dodecahedron box with an initial volume of 282 nm³ containing 9287 water molecules for SEM1(45-107) and 255 nm³ containing 8371 water molecules for SEM1(49-107). The systems were neutralized by adding 3 Cl⁻ ions. The resulting systems were minimized using the steepest descent algorithm with a target maximum force of 1000 kJ mol⁻¹ nm⁻¹. In the next step, equilibration was performed in the canonical NVT (constant number of particles, volume and temperature) ensemble for 100 ps at 300 K using the Berendsen thermostat [77] and then for 100 ps at 300 K and 1 bar in the isothermal-isobaric NPT (constant number of particles, pressure and temperature) ensemble using a Parrinello-Rahman barostat [78]. Production simulations were then carried out for 100 ns at the same pressure (1 bar) and temperature (300 K) as in the equilibration. NMR distance constraints were used as input parameters during the MD simulation [74]. After the MD simulation, the structural ensembles of SEM1(45-107) and SEM1(49-107) and their MD trajectories were obtained. In the final stage, to characterize the convergence of the structural ensembles, we performed a cluster analysis of the MD trajectories (Figures S7 and S8) via the GROMOS clustering algorithm [79].
GROMOS clustering was performed with a cut-off of 0.5 nm on the RMSD of the Cα-Cα atom-pair distances for two structures to be considered neighbors.
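The idea behind GROMOS clustering can be sketched in a few lines: count the neighbors of every structure within the cut-off, take the structure with the most neighbors as a cluster center, remove that cluster from the pool and repeat. The code below is a minimal illustration on a hypothetical 5 × 5 RMSD matrix, not the Gromacs implementation. The tie-breaking rule (lowest index wins) and the use of "less than or equal to" for the cut-off are assumptions made for this example.

```python
def gromos_cluster(dist, cutoff):
    """Cluster structures from a symmetric pairwise RMSD matrix.

    dist: list of lists with dist[i][j] = RMSD between structures i and j (nm).
    Returns a list of (center_index, member_indices) in order of extraction.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        pool = sorted(remaining)
        best_center, best_members = None, []
        for i in pool:
            # Neighbors of i among structures not yet assigned (includes i itself).
            members = [j for j in pool if dist[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters


# HYPOTHETICAL RMSD matrix (nm): structures 0-2 are similar, as are 3-4.
rmsd = [
    [0.00, 0.30, 0.40, 0.90, 1.00],
    [0.30, 0.00, 0.35, 0.80, 0.90],
    [0.40, 0.35, 0.00, 0.70, 0.90],
    [0.90, 0.80, 0.70, 0.00, 0.20],
    [1.00, 0.90, 0.90, 0.20, 0.00],
]

clusters = gromos_cluster(rmsd, cutoff=0.5)
for center, members in clusters:
    print(f"center {center}: members {members}")
```

A trajectory whose structures fall mostly into one or a few large clusters is commonly read as a sign of a converged ensemble, which is the purpose the cluster analysis serves in the text above.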

Transmission Electron Microscopy
The presence of fibrils was shown by transmission electron microscopy (TEM) using a Hitachi HT7700 Exalens electron microscope (Tokyo, Japan). Solutions (10 µL) of the SEM1(45-107) and SEM1(49-107) peptides in Tris buffer (10⁻³ M) were placed on a 3 mm formvar/carbon-coated copper grid and dried at room temperature. The analysis was carried out at an accelerating voltage of 100 kV in TEM mode.