Prediction and characterisation of lantibiotic structures with molecular modelling and molecular dynamics simulations

Lantibiotics are lanthionine-containing bactericidal peptides produced by gram-positive bacteria as a defence mechanism against other bacterial species. Lantipeptides disrupt the integrity of target cells by forming pores in their cell membranes, or by preventing cell wall biosynthesis, which subsequently results in cell death. Lantibiotics are of immense importance to the food preservation and pharmaceutical industries. The rise in multidrug resistance demands the discovery of novel antimicrobials, and several authors advocate that lantibiotics hold the future of antimicrobial drug discovery. Owing to their amenability to structural modifications, novel lantibiotics with higher efficacy and antimicrobial activity can be constructed by bioengineering and nanoengineering strategies, and are considered to hold immense therapeutic potential for combating the rise in multidrug resistance. Understanding the structure and dynamics of lantibiotics is therefore crucial for the development of novel lantipeptides, and this study aimed to characterise the structural properties and dynamics of 37 lantibiotics using computational strategies. The structures of these 37 lantibiotics were constructed by homology modelling, and their structural stability and compactness were analysed by molecular dynamics simulations. The phylogenetic relationships, physicochemical properties, disordered regions, pockets, intramolecular bonds and interactions, and structural diversity of the 37 lantipeptides were studied. The structures of the 37 lantipeptides constructed herein remained stable throughout the simulations. The study revealed that the structural diversity of lantibiotics is not significantly correlated with sequence diversity, and this property could be exploited for designing novel lantipeptides with higher efficacy.

the view that they can serve as feasible alternatives to antibiotics in the future 15. Efforts are being made to employ bioengineering strategies for the development of optimised lantipeptides and nanoengineering approaches for broadening the antibacterial spectrum of lantibiotics 16,17. With the exception of cinnamycin, all the lantibiotics selected herein are lanthionine-containing peptide antibiotics that are able to depolarise the energised bacterial membrane and subsequently destabilise membrane integrity. Additionally, the 37 lantipeptides, barring cinnamycin, are capable of creating aqueous transmembrane pores 17. Although these 36 lantibiotics are functionally similar, their structures are diverse, especially with respect to post-translational modifications, the presence of unusual amino acids, including dehydrated and unsaturated amino acids with variable linkage patterns, and the methyllanthionine bridges that are crucial to structural stability and function 10,18. The tertiary structures, structural conformation, important amino acid residues, conserved domains, and intramolecular chemical bonds need to be understood in further detail for designing engineered lantipeptides with enhanced stability and bioactivity 19.
In this study, we constructed the structures of 37 lantibiotics from over 25 organisms using molecular modelling approaches, and studied their structural and sequence diversity, in addition to analysing their structural dynamics using molecular dynamics simulations. The lantibiotic sequences selected in this study had reviewed, manually annotated information in UniProtKB, and the existence and function of the 37 lantipeptides were experimentally proven.
The 37 lantibiotics [Table S1] belonged to five protein families (InterPro accession IDs: IPR007682, IPR006079, IPR029243, IPR027632, and IPR012519), containing five Pfam detailed signatures (Pfam accession IDs: PF04604, PF02052, PF14867, PF16934, and PF08130). Based on the composition of the conserved domains, the lantibiotics were found to belong to six superfamilies, namely, lantibiotic type A, gallidermin, lantibiotic A, TOMM pelo, mersacidin, and antimicrobial 18. The physico-chemical properties, including the molecular weight, isoelectric point, aliphatic indices, sequence length distribution, extinction coefficients, hydropathy indices, antigenicity, and presence of disordered regions, were determined [Supplementary Figs. S1-S6, Supplementary Table S3].
Phylogenetic analysis. The multiple sequence alignment (MSA) revealed that the 37 lantibiotic sequences shared a reasonable degree of sequence similarity [Fig. 1]. The Neighbour-Joining phylogenetic tree demonstrated that the sequences belonged to three distinct evolutionarily related clusters. The nisins (A, Z, and U) were clustered in the same group as epidermin, gallidermin, the mutacins, subtilin, streptin, and Pep5 [Fig. 2]. The duramycins and epilancins were grouped along with mersacidin, lacticin, actagardine, cinnamycin, ancovenin, and paenibacillin. The third group comprised the ruminococcins, mutacin 2, lichenicidins, salivaricin, streptococcin, nukacins, and cypemycin [Fig. 2]. This third group could be further divided into two subgroups, with salivaricin A, cypemycin, lacticin 3147-A1, and the lichenicidins in one subgroup, and lacticin 481, mutacin 2, the nukacins, streptococcins, and ruminococcins in the other.
MD simulation. The lantipeptides demonstrated structural consistency throughout the simulation, as indicated by the RMSD and radius of gyration 20 [Figs. 7 and 8]. The lantipeptides with a higher content of turns and coils, including ancovenin, duramycin B, actagardine, mutacin B-Ny266, and lantibiotic 107891, had the lowest radii of gyration among the 37 lantipeptides. Since the radius of gyration is a measure of structural compactness, the structures of ancovenin, duramycin B, actagardine, mutacin B-Ny266, and lantibiotic 107891 were the most compact, while the structures of gallidermin, epilancin, lacticin 3147-A2, lacticin 481, mutacin 2, and lichenicidin VK21-A2 were the least compact among the 37 lantipeptides [Fig. 8 and Supplementary Fig. S7]. The RMSF of the peptide backbone was used to determine the most flexible region of the peptide backbone [Fig. 9]. While the backbone RMSDs of most of the lantibiotics remained consistent throughout the simulation, the backbone RMSDs of lichenicidin VK21-A2, mutacin 2, lacticin 3147-A2, epilancin, gallidermin, and lichenicidin VK21-A1 were higher than the rest [Fig. 7 and Supplementary Fig. S8]. Analyses of cluster density, cluster size, and average cluster RMSD revealed that the representative structure from cluster 1 was the best conformation in each case. The representative structures were superimposed with the cluster members to compute the relation between the average RMSD and the global distance test score (GDT_TS) [Supplementary Fig. S9].
www.nature.com/scientificreports

Discussion
Lantibiotics are bactericidal peptides characterised by the presence of unusual amino acids, namely the thioether-containing polycyclic lanthionines and unsaturated amino acids 1. They are produced by gram-positive bacteria for targeting other bacterial species, either by forming pores in the target membrane that disrupt cellular integrity or by inhibiting cell wall biosynthesis 9. Lantibiotics are widely used in the food preservation and pharmaceutical industries 7. In the present global scenario, the surge in drug-resistant strains demands the development of novel drugs and antimicrobials for combating the emerging drug resistance. The high in vitro potency, combined with the variety of strategies employed for effectively targeting bacterial cells, makes [...] construct the structures of 37 lantipeptides having reviewed and annotated sequence information in UniProtKB using homology modelling, and to evaluate the diversity, compactness, and stability of the structures of the 37 lantipeptides.
Analysis of the MSA revealed that the lantibiotic sequences shared a high degree of conservation, in marked contrast to the diversity of their structures. The structural diversity of the 37 lantipeptides was determined from the RMSD values. The correlation coefficient between the sequence diversity and structural diversity of the 37 lantipeptides was 0.189, indicating that the structural diversity of the 37 lantibiotics is not significantly correlated with the diversity of their sequences. This further indicates that the sequence-structure relationship of the lantibiotics selected herein is flexible, which not only allows room for human tailoring, but also suggests that the natural post-translational engineering is probably not an accident. Lacticin 3147-A1, lacticin 3147-A2, and cypemycin were found to contain disordered residues that are capable of binding proteins, and some of these residues were also found to form part of the pockets in the lantipeptide structures. Protein-protein interactions involving a disordered protein are generally mediated by a transition from disorder to order upon protein binding 23. Since protein-protein interactions are often mediated by small flexible pockets at the protein-protein interface, these disordered residues might be responsible for lantibiotic-protein interactions, and could undergo similar structural transitions upon binding.
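The text does not state how the correlation coefficient between sequence diversity and structural diversity was computed. As a minimal sketch, assuming that both quantities are available as symmetric pairwise matrices over the same peptides (an assumption), a Pearson coefficient can be taken over the unique pairs. The function names and the toy 4 x 4 matrices below are hypothetical and are not the study's data.

```python
import math


def upper_triangle(matrix):
    """Return the values above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy pairwise matrices (illustrative values only, NOT the study's data).
seq_dist = [[0.0, 0.2, 0.5, 0.7],
            [0.2, 0.0, 0.4, 0.6],
            [0.5, 0.4, 0.0, 0.3],
            [0.7, 0.6, 0.3, 0.0]]
rmsd = [[0.0, 3.1, 1.2, 2.5],
        [3.1, 0.0, 2.8, 1.9],
        [1.2, 2.8, 0.0, 3.4],
        [2.5, 1.9, 3.4, 0.0]]

r = pearson(upper_triangle(seq_dist), upper_triangle(rmsd))
print(f"Pearson r = {r:.3f}")
```

Because the entries of a pairwise matrix are not independent of one another, the significance of such a coefficient is usually assessed with a permutation scheme (for example a Mantel test) rather than with the standard t-test.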

Methods
Lantibiotic sequences. The existence and biological functions of the 37 lantibiotics selected in this study have been established by experimental studies, and the sequences had reviewed and manually annotated information in the UniProtKB/Swiss-Prot non-redundant sequence database 24.
Information from primary data. The domains, repeats, superfamilies, and conserved patterns of the 37 lantibiotics were identified using InterProScan and the batch CD-Search tool 25,26. The transmembrane regions and the hydropathy indices of the lantibiotics were determined using the CLC Genomics Workbench v 8.5. The Kyte-Doolittle and Eisenberg scales were used for determining the local hydropathy plots. Lantibiotic antigenicity was analysed by the semi-empirical method of Kolaskar and Tongaonkar. Information pertaining to the physico-chemical properties, such as molecular weight, isoelectric pH, aliphatic index, hydrophobicity, hydrophilicity, and amino acid composition, was also computed. The disordered regions were identified with the DISOPRED3 algorithm 27.
Phylogenetic analyses. An MSA of the 37 lantibiotic sequences was generated using the MUSCLE algorithm. The phylogenetic tree was constructed using the Neighbour-Joining algorithm with 1000 bootstrap replicates. The CLC Genomics Workbench v 8.5 was used for the phylogenetic analyses.
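The hydropathy and aliphatic-index calculations mentioned under "Information from primary data" can be illustrated with a short sketch. It is not the CLC Genomics Workbench implementation: it uses the published Kyte-Doolittle scale and Ikai's aliphatic index formula, handles only the 20 standard residues of an unmodified sequence, and the window size of 7 and the example peptide are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Sliding-window mean hydropathy, one value per full window."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value of the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai's aliphatic index from the mole percentages of Ala, Val, Ile, Leu."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical peptide used only to exercise the functions.
peptide = "MKAVILGCSTLLKDE"
print([round(v, 2) for v in hydropathy_profile(peptide)])
print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 1))
```

Post-translationally modified residues such as lanthionine have no entry in the Kyte-Doolittle scale, so a profile of this kind describes the unmodified precursor sequence only.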
Homology modelling, validation, and analysis. The complete structures of the 37 lantipeptides were constructed by homology modelling, using Modeller v 9.11 28,29. A structure BLAST was performed against the Protein Data Bank (PDB) to identify templates for comparative modelling 30,31. Templates were also identified by the threading-based fold recognition method employed by the PSIPRED server (http://bioinf.cs.ucl.ac.uk/psipred/) 32. The backbone torsions of the models were assessed by analysing their Ramachandran plots, while the improper geometries and clashes were evaluated by checking their stereochemistry, using PROCHECK 33. The quality of the constructed models was additionally estimated using different servers, including the ProSA II, Verify3D, and PSVS servers 34-36. The intramolecular bonds and interactions of the 37 structures generated herein were determined using the RING-2.0 web server (http://protein.bio.unipd.it/ring/) 37.
Identification of pockets and determination of structural diversity. The secondary structure composition of the lantipeptides was determined with STRIDE (http://webclu.bio.wzw.tum.de/cgi-bin/stride/stridecgi.py) 38. The pockets were identified using CASTp (http://sts.bioe.uic.edu/castp/), with a probe of radius 1.4 Å 39. The structural diversity of the lantipeptides was analysed by calculating the RMSD values following structural superimposition of the 37 lantibiotic structures. Each lantipeptide structure was individually superimposed and the intra-RMSD value was computed using CLC Genomics Workbench v 8.5. In order to understand the structural correlation among the 37 lantipeptides with respect to their intra-RMSD values, a data matrix [Supplementary SF2] of all 37 lantibiotics was prepared and standardised prior to the PCA. The PCA was performed with the ClustVis tool 40, where vector scaling is applied to the rows and SVD with imputation is used to calculate the principal components of N = 37 data points.
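The two numerical steps described above, an RMSD after structural superimposition and a PCA of the resulting matrix by SVD, can be sketched as follows. This is an illustration under stated assumptions, not the CLC Genomics Workbench or ClustVis procedure: the superposition uses the Kabsch algorithm on equal-length, pre-matched coordinate sets, a plain unit-variance standardisation stands in for the scaling that ClustVis applies, and no imputation is performed. The function names and the random toy structures are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (n_atoms, 3)")
    P0 = P - P.mean(axis=0)          # remove translation
    Q0 = Q - Q.mean(axis=0)
    H = P0.T @ Q0                    # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # rotation mapping P0 onto Q0
    diff = P0 @ R.T - Q0
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def pca_svd(X, n_components=2):
    """PCA of a (samples, features) matrix via SVD of the standardised data."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    std = Z.std(axis=0, ddof=1)
    std[std == 0] = 1.0              # leave constant columns unscaled
    Z = Z / std
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    total = (S ** 2).sum()
    explained = S ** 2 / total if total > 0 else np.zeros_like(S)
    return scores, explained[:n_components]


# Toy data: five hypothetical 12-atom structures with random coordinates.
rng = np.random.default_rng(0)
structures = [rng.normal(size=(12, 3)) for _ in range(5)]
n = len(structures)
rmsd_matrix = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        rmsd_matrix[i, j] = rmsd_matrix[j, i] = kabsch_rmsd(structures[i], structures[j])

scores, explained = pca_svd(rmsd_matrix)
print(scores.shape, explained.round(3))
```

Peptides of different lengths would first need a residue correspondence (for example from a structural alignment) before a function such as `kabsch_rmsd` could be applied to them.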
Molecular dynamics simulation and trajectory analyses. The structural stability, compactness, backbone flexibility, and per-residue fluctuations were characterised by performing coarse-grained molecular dynamics (MD) simulations of the lantibiotic structures in explicit water. The simulations were performed by combining the four most widely used force fields, namely, Amber, Gromos, OPLS, and CHARMM, in the CABS simulation procedure, run on a high-performance computing server (http://biocomp.chem.uw.edu.pl/CABSflex/) 41,42. The CABS protein representation was reduced to up to four pseudo-atoms per residue, and the sampling was realised by the Monte Carlo method 43. The simulation length was optimised to obtain the best possible convergence within 10 ns. The trajectories were analysed with VMD and VEGA ZZ 44. The mean-square fluctuation, $(\Delta R)^2$, was calculated using the following equation:
$$(\Delta R_i)^2 = \frac{1}{N}\sum_{j=1}^{N}\left(x_i(j) - \langle x_i \rangle\right)^2$$
where $\langle\,\rangle$ denotes the average across the entire trajectory, $x_i(j)$ represents the position of particle $i$ in frame $j$, and $N$ represents the total number of frames in the trajectory 41,44. The trajectories were clustered using the k-means clustering method in such a way that structurally closer models belonged to the same cluster. The best conformation of each lantibiotic was selected after screening the trajectories. Each cluster was superimposed for identifying the best conformation using the Theseus application. The RMSD and radius of gyration (RoG) of the lantipeptides were determined across the simulation time frame. The root mean square fluctuation (RMSF) was determined for estimating the per-residue fluctuations, and the most flexible regions were identified from the RMSF graphs. The stability of the system and the fluctuations across the trajectories were analysed with XMGRACE 45.
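The trajectory quantities named above (mean-square fluctuation, RMSF, RMSD, and radius of gyration) can be sketched with NumPy as follows. This is an illustration rather than the VMD, VEGA ZZ, or CABS-flex implementation: it assumes the trajectory is an array of shape (frames, particles, 3) whose frames have already been superimposed on a common reference, and the function names and the synthetic trajectory are hypothetical.

```python
import numpy as np


def mean_square_fluctuation(traj):
    """Per-particle mean-square fluctuation for a (frames, particles, 3) array."""
    traj = np.asarray(traj, dtype=float)
    mean_pos = traj.mean(axis=0)                    # average position of each particle
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared deviation in every frame
    return sq_dev.mean(axis=0)                      # average over the N frames


def rmsf(traj):
    """Root mean square fluctuation: square root of the mean-square fluctuation."""
    return np.sqrt(mean_square_fluctuation(traj))


def rmsd_to_reference(traj, reference):
    """Per-frame RMSD to a reference structure (no superposition is performed)."""
    traj = np.asarray(traj, dtype=float)
    reference = np.asarray(reference, dtype=float)
    sq = ((traj - reference) ** 2).sum(axis=2)
    return np.sqrt(sq.mean(axis=1))


def radius_of_gyration(frame, masses=None):
    """Mass-weighted radius of gyration of one (particles, 3) frame."""
    frame = np.asarray(frame, dtype=float)
    if masses is None:
        masses = np.ones(len(frame))                # equal masses by default
    masses = np.asarray(masses, dtype=float)
    centre = (frame * masses[:, None]).sum(axis=0) / masses.sum()
    sq = ((frame - centre) ** 2).sum(axis=1)
    return float(np.sqrt((masses * sq).sum() / masses.sum()))


# Synthetic trajectory: 50 frames of a 20-particle chain with small random noise.
rng = np.random.default_rng(0)
reference = rng.normal(scale=5.0, size=(20, 3))
traj = reference + rng.normal(scale=0.3, size=(50, 20, 3))

print(rmsf(traj).round(2))
print(rmsd_to_reference(traj, reference)[:5].round(2))
print(round(radius_of_gyration(traj[0]), 2))
```

A lower radius of gyration corresponds to a more compact structure, and peaks in the per-particle RMSF mark the most flexible regions of the chain.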