Maquiberry Cystatins: Recombinant Expression, Characterization, and Use to Protect Tooth Dentin and Enamel

Phytocystatins are proteinaceous competitive inhibitors of cysteine peptidases with physiological and defensive roles in plants. Their application as potential therapeutics for human disorders has been suggested, which makes the search for novel cystatin variants in different plants, such as maqui (Aristotelia chilensis), pertinent. Because maqui is an understudied species, the biotechnological potential of its proteins is poorly understood. In the present study, we constructed a transcriptome of maqui plantlets using next-generation sequencing and found six cystatin sequences in it. Five of them were cloned and recombinantly expressed. Inhibition assays were performed against papain and human cathepsins B and L. Maquicystatins inhibited the proteases with Ki values in the nanomolar range, except MaquiCPIs 4 and 5, which inhibited cathepsin B in the micromolar range. This suggests that maquicystatins could potentially be used to treat human diseases. In addition, since we previously demonstrated the efficacy of a sugarcane-derived cystatin in protecting dental enamel, we tested the ability of MaquiCPI-3 to protect both dentin and enamel. Both were protected by this protein (one-way ANOVA and Tukey's multiple comparisons test, p < 0.05), suggesting its potential use in dental products.


Introduction
The cystatin superfamily comprises reversible inhibitors of cysteine proteinases, mainly of family C1A. They act by a competitive mechanism, as pseudo-substrates [1][2][3]. Their high inhibitory capacity results from tight binding mediated by three conserved contact points between the inhibitor and the protease: an N-terminal region containing a glycine residue, a central β-hairpin loop with a Gln-X-Val-X-Gly motif and a C-terminal β-hairpin loop featuring a tryptophan residue [4][5][6]. The cystatin superfamily is subdivided into four families. Three of them (the stefins, the cystatins and the kininogens) are of animal origin, and the fourth, the phytocystatins, is found in plants [1,6,7].
Compared with animal cystatins, phytocystatins show high homology with members of the cystatin family, although, like stefins, they lack disulfide bonds. They also present an additional conserved motif of unknown function in an N-terminal α-helix (the LARFAV-like motif). Some phytocystatins also have a C-terminal extension containing an SNSL motif, which gives them the capacity to inhibit C13 cysteine proteases (legumains) [8].
Plant cystatins are thought to play several physiological roles, such as regulating endogenous cysteine peptidases during seed development and germination [9] and senescence [10], and in responses to abiotic stresses such as drought [11], cold [12] and high salinity [13]. Defensive roles of these inhibitors against insects [14], fungi [15] and nematodes [16] have also been reported. This evidence suggests that cystatins could be important tools for developing biotechnological approaches in agriculture [17].
Phytocystatins have also been described as having potential biotechnological applications in medicine, largely owing to their capacity to inhibit human cathepsins [18,19]. These inhibitors have been reported to modulate distinct pathologies, showing anticancer, anti-inflammatory and osteogenic effects [20][21][22]. They have also been associated with the inhibition of diverse pathogens, such as the malaria parasite Plasmodium falciparum [23] and the yeast Candida spp. [24]. Recently, plant cystatins have shown potential applications in dentistry. Studies with a sugarcane cystatin (CaneCPI-5) revealed that this protein decreased initial erosion of dental enamel and also reduced biofilm activity and mineral loss related to caries progression [25][26][27].
Maqui (Aristotelia chilensis (Mol.) Stuntz) is a native plant of the Elaeocarpaceae family. As a shrubby and perennial species, it grows wild in the central and southern regions of Chile and Argentina. It produces small purple-colored berries, which are typically consumed fresh or as jams and beverages [28][29][30]. These fruits, as well as maqui leaves, have been part of traditional medicine as treatments for sore throats, kidney pain, digestive ailments, fever and scarring injuries [31,32].
Maqui has recently attracted attention for its phytochemicals, mainly anthocyanins, which are present in high quantities. This draws attention not only to its use as a natural pigment but also to its potential as a health-promoting agent. The fruit's high anthocyanin content has been described as responsible for its strong antioxidant properties, which have led maqui to be considered a superfruit [33,34]. In this context, maqui anthocyanins have been reported to have cardioprotective [35], antidiabetic [36] and anti-inflammatory [37] effects. Currently, several studies focus on adding plant polyphenols to dental materials, as well as on applying them directly to the tooth surface [38]. In the context of erosive tooth wear, proteins, polyphenols and natural extracts have been shown to protect the tooth structure when applied directly to its surface or through modification of the acquired pellicle [25,39,40]. Although maqui has been studied in terms of its polyphenols, little is known about its genes and proteins.
In this study, we report the identification and phylogenetic analysis of Aristotelia chilensis cystatins, as well as the cloning, recombinant expression and inhibitory profile of five distinct proteins. In vitro assays of the maquicystatins against papain and human cathepsins L and B showed considerable inhibition. Our results help to describe these inhibitors and their interaction with relevant cysteine proteases, reflecting their potential uses in agriculture and medicine. Furthermore, considering the applicability of cystatins in dentistry, MaquiCPI-3, the maquicystatin with the highest production yield, was evaluated for its capacity to protect dental enamel and dentin.
The following null hypothesis was tested: solutions containing different concentrations of MaquiCPI-3 do not protect enamel and dentin against initial erosion in vitro.

RNA Extraction
Wild maqui plants (Aristotelia chilensis (Mol.) Stuntz) were selected from the surroundings of Vilches (35°35′59″ S; 71°11′6″ W), Maule Region, Chile. Plantlets were cultivated in Temporary Immersion Bioreactors (TIB) as previously described [41]. Bioreactors were maintained at 25 ± 2 °C under natural sunlight and cool-white fluorescent tubes at a light intensity of 100 µmol m⁻² s⁻¹. The culture medium (200 mL) consisted of MS basal salts with 1 mg/L TDZ, pH 5.6. Air was enriched with 0.4 MPa CO₂, and the immersion frequency was 4 min every 6 h. After three weeks of plant multiplication, RNA was extracted from about 1 g of biomass of 15-day-old plants using Trizol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. RNA integrity was assessed from the rRNA pattern on a 1% agarose gel, and purity was checked with a NanoDrop 1000 (Thermo Scientific Inc., Wilmington, DE, USA).

Expression and Purification of Recombinant Cystatins
Total RNA was used as template for cDNA synthesis with the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Carlsbad, CA, USA), following the manufacturer's instructions. The coding regions of the mature A. chilensis cystatins (MaquiCPIs 1 to 5) were amplified from the cDNA template using the primers described in Table S1. Forward and reverse primers contained NdeI and SalI restriction sites, respectively, for directional cloning into pET28a (Novagen) previously digested with the same enzymes. Briefly, PCRs were performed in a 25 µL reaction containing 200 µM dNTPs (Invitrogen), 1× reaction buffer, 12.5 pmol of each primer, 2.5 U of Taq High Fidelity Pol (Cellco Biotec, Brazil) and the template cDNA. The cycling program was 95 °C for 3 min, followed by 35 cycles of 95 °C for 1 min, 45–55 °C for 1 min and 72 °C for 1 min, with a final extension at 72 °C for 7 min. The PCR products were digested with NdeI and SalI and cloned into pET28a (Novagen) in frame with the 5′ His-tag coding sequence.
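The cycling program can be written down as data, which makes it easy to check or to transfer to another thermocycler. This Python sketch is illustrative only. The `Step` class and the function names are hypothetical. The annealing temperature is a placeholder, because the text gives only a 45–55 °C range. The computed duration also ignores temperature ramping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold step of a thermocycling program (hypothetical helper type)."""
    name: str
    temp_c: float
    seconds: int


def pcr_program(annealing_c: float = 50.0, cycles: int = 35) -> list[tuple[Step, int]]:
    """Return (step, repetitions) pairs for the cycling program described above.

    annealing_c is a placeholder: only a 45-55 degC range is reported, so the
    primer-specific value has to be supplied by the user.
    """
    if not 45.0 <= annealing_c <= 55.0:
        raise ValueError("annealing temperature outside the 45-55 degC range")
    return [
        (Step("initial denaturation", 95.0, 180), 1),
        (Step("denaturation", 95.0, 60), cycles),
        (Step("annealing", annealing_c, 60), cycles),
        (Step("extension", 72.0, 60), cycles),
        (Step("final extension", 72.0, 420), 1),
    ]


def hold_time_minutes(program: list[tuple[Step, int]]) -> float:
    """Total programmed hold time in minutes (ramping between temperatures is ignored)."""
    return sum(step.seconds * reps for step, reps in program) / 60
```

For example, `hold_time_minutes(pcr_program(annealing_c=52.0))` gives 115.0 minutes of programmed hold time (3 min + 35 × 3 min + 7 min).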
Recombinant expression and purification of maquicystatins were performed as previously described [55]. In short, the recombinant plasmids were transformed into E. coli Rosetta™ (DE3) competent cells (Novagen), and bacterial cultures were grown at 37 °C and 250 rpm until they reached OD600 = 0.5. Expression was induced by adding IPTG (isopropyl-β-D-thiogalactopyranoside) to a final concentration of 0.4 mM and was carried out for 4 h. Because of the His-tag, the recombinant proteins were purified using Ni-NTA Superflow Resin (Qiagen, Valencia, CA, USA) following the manufacturer's instructions. All protein production steps were analyzed by 15% SDS-PAGE as described elsewhere [56]. Purified fractions were dialyzed against PBS (137 mM NaCl, 2.7 mM KCl, 10 mM Na₂HPO₄, 1.8 mM KH₂PO₄) at pH 7.4 (MaquiCPIs 1, 2 and 5) or pH 8.0 (MaquiCPIs 3 and 4). Total protein was quantified using the Pierce BCA Protein Assay Kit (Thermo Scientific, Rockford, IL, USA).
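The IPTG induction step involves a simple dilution calculation. In the Python sketch below, only the 0.4 mM final concentration comes from the protocol. The 1 M stock and the 1 L culture volume are hypothetical values chosen for illustration, and the function name is hypothetical as well.

```python
def stock_volume_ml(culture_ml: float, final_mm: float, stock_mm: float) -> float:
    """Volume of stock (mL) to add to culture_ml so that the mixture reaches final_mm.

    Solves stock_mm * v = final_mm * (culture_ml + v) for v, i.e. it accounts
    for the small extra volume contributed by the stock itself.
    """
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * culture_ml / (stock_mm - final_mm)


# Hypothetical example: 1 L culture, 1 M (1000 mM) IPTG stock, 0.4 mM target.
v = stock_volume_ml(culture_ml=1000.0, final_mm=0.4, stock_mm=1000.0)
print(f"add {v:.3f} mL of stock")  # prints "add 0.400 mL of stock"
```

This version solves exactly for the volume that includes the added stock. The simpler approximation V = C_final × V_culture / C_stock makes almost no difference here, because the stock is 2500-fold more concentrated than the target.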

Enzyme Inhibition Assays
The inhibitory potential of MaquiCPIs was evaluated against papain (10 nM) and human cathepsins B (1.9 nM) and L (3.5 nM) (Calbiochem, San Diego, CA, USA). Enzymes were preincubated for 5 min at 37 °C in activation buffer (0.1 M sodium acetate, pH 5.5, containing 2.5 mM dithiothreitol (USB); final volume 500 µL). The fluorogenic substrate Z-Phe-Arg-AMC (Calbiochem; 2.5 µM, 8 µM and 20 µM for papain, cathepsin L and cathepsin B, respectively) was used to determine the catalytic activity of the cysteine proteases. Cystatins were added to the reactions at increasing concentrations, and fluorescence changes were monitored continuously on a Hitachi F-2500 spectrofluorometer (Hitachi) at λex = 380 nm and λem = 460 nm. The inhibitory potential of maquicystatins was determined from the residual enzymatic activity of the cysteine proteases after inhibitor addition. Slope values were obtained with the FL Solutions 2.0 program, and apparent inhibition constants (Ki app) were calculated with the equation described in [57], in which V0 and Vi are the velocities of substrate hydrolysis in the absence and presence of different inhibitor concentrations [I], respectively. The assays were performed in triplicate, and Ki values were then derived from Ki app according to [57], correcting for substrate competition with the following Km values: 23 µM for cathepsin B, 2 µM for cathepsin L and 10 µM for papain [58].
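The path from measured rates to Ki can be sketched in Python. The sketch assumes the textbook relations for a classical competitive inhibitor: V0/Vi = 1 + [I]/Ki app and Ki = Ki app/(1 + [S]/Km). These are assumptions, not a reproduction of the equations in [57]. A tight-binding (Morrison-type) analysis would also need the enzyme concentration. Only the [S] and Km values come from the assay description, and the function names are hypothetical.

```python
# Substrate concentrations and Km values (µM) as stated in the assay description.
ASSAYS = {
    "papain": {"s_um": 2.5, "km_um": 10.0},
    "cathepsin L": {"s_um": 8.0, "km_um": 2.0},
    "cathepsin B": {"s_um": 20.0, "km_um": 23.0},
}


def substrate_correction_factor(s_um: float, km_um: float) -> float:
    """1 + [S]/Km: how much a competitive inhibitor's apparent Ki exceeds its true Ki."""
    return 1.0 + s_um / km_um


def ki_app_from_rates(inhibitor_nm: list[float], v0: float, vi: list[float]) -> float:
    """Estimate Ki app (nM), assuming V0/Vi = 1 + [I]/Ki app (non-tight-binding).

    Fits y = V0/Vi - 1 against x = [I] by least squares through the origin;
    the fitted slope equals 1/Ki app, so Ki app = sum(x*x) / sum(x*y).
    """
    y = [v0 / v - 1.0 for v in vi]
    sxy = sum(x * yy for x, yy in zip(inhibitor_nm, y))
    sxx = sum(x * x for x in inhibitor_nm)
    return sxx / sxy


def ki_from_ki_app(ki_app_nm: float, enzyme: str) -> float:
    """Correct Ki app for substrate competition: Ki = Ki app / (1 + [S]/Km)."""
    p = ASSAYS[enzyme]
    return ki_app_nm / substrate_correction_factor(p["s_um"], p["km_um"])
```

With these [S] and Km values, the correction factors are 1.25 for papain, 5 for cathepsin L and about 1.87 for cathepsin B. Under this model, the substrate correction therefore changes Ki most for cathepsin L.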

Preparation of Bovine Enamel and Dentin Samples
One hundred and forty samples of bovine enamel and root dentin (4 mm × 4 mm × 3 mm) were prepared using a cutting machine (ISOMET Low-Speed Saw, Buehler, Lake Bluff, IL, USA). All samples were ground with 300-, 600- and 1200-grit silicon carbide papers (Extec Corp. Papers, Buehler, Lake Bluff, IL, USA) and then polished on a felt cloth (Polishing cloth, Buehler, Lake Bluff, IL, USA) with diamond solution (Extec Corp., Buehler, Lake Bluff, IL, USA). In total, 120 µm of tooth structure (enamel or dentin) was removed. The samples were cleaned in an ultrasonic bath for 5 min after each grit and stored at 4 °C until the beginning of the experimental procedure [59].

Human Saliva Acquisition
Nine adults (four men and five women) aged approximately 30 years were selected according to the following general health criteria: non-pregnant women, non-smokers, no prolonged medication use and no systemic diseases. They also had to meet the following oral health criteria: no active caries or periodontal disease, and normal salivary flow (stimulated saliva >1.0 mL/min and unstimulated saliva >0.3 mL/min) [27]. Stimulated saliva was collected in the morning (9:00 to 9:30) using paraffin wax for 10 min. All saliva was then pooled, and the supernatant was separated by centrifugation (14,000 × g, 20 min, 4 °C). Finally, the saliva was aliquoted and stored at −80 °C [25].

Treatment Groups
The samples (enamel and dentin) were randomly distributed to 7 groups (n = 20/group):

Treatment, Acquired Pellicle Formation and Erosive Process
First, the surfaces of the enamel and dentin samples were individually treated with 250 µL of the solution of their assigned group (see the groups described above) for 2 h at 37 °C under agitation (300 rpm). The samples were then washed (10 s) and dried (5 s). Next, the acquired pellicle was formed individually by exposure to 250 µL of human saliva for 2 h at 37 °C under agitation (300 rpm), followed by washing (10 s) and drying (5 s). Finally, erosion was performed individually by incubating each sample in 1 mL of 1% citric acid (pH 3.6) for 1 min at 25 °C under agitation (300 rpm), again followed by washing (10 s) and drying (5 s). These procedures were repeated on 3 consecutive days. Between experimental days, the samples were stored at 4 °C under humidity control [25].

Percentage of Surface Microhardness Change (%SMC)
Surface microhardness was measured with a microhardness tester fitted with a Knoop diamond indenter (SMH-HMV-2000, Shimadzu, Kyoto, Japan). Six indentations per sample, spaced 25 µm apart, were made on a defect-free surface area at the beginning of the experiment (SM_baseline) and after the experimental procedure (SM_final). A load of 50 g and a dwell time of 15 s were used for the enamel samples [25], and a load of 10 g and a dwell time of 15 s for the dentin samples [60]. The data were expressed as the percentage of surface microhardness change (%SMC) according to the following equation [25]:

$$\%\mathrm{SMC} = 100 \times \frac{SM_{\mathrm{final}} - SM_{\mathrm{baseline}}}{SM_{\mathrm{baseline}}} \qquad (3)$$
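A minimal Python sketch of the %SMC calculation follows. It assumes the usual percentage-change definition, %SMC = 100 × (SM_final − SM_baseline)/SM_baseline. It also assumes that each SM value is the mean of the six indentations made per sample. Averaging before computing the change is our assumption and is not stated in the protocol. The function name is hypothetical.

```python
from statistics import mean


def percent_smc(baseline_knoop: list[float], final_knoop: list[float]) -> float:
    """%SMC from replicate Knoop readings taken before and after the protocol.

    Assumes SM_baseline and SM_final are the means of the indentations
    (six per sample in this study) and that
    %SMC = 100 * (SM_final - SM_baseline) / SM_baseline.
    Negative values indicate surface softening (mineral loss).
    """
    if not baseline_knoop or not final_knoop:
        raise ValueError("need at least one indentation per time point")
    sm_baseline = mean(baseline_knoop)
    sm_final = mean(final_knoop)
    return 100.0 * (sm_final - sm_baseline) / sm_baseline
```

For example, `percent_smc([300.0] * 6, [240.0] * 6)` returns -20.0. This corresponds to a 20% loss of surface hardness after the erosive challenge.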

Statistical Analysis
GraphPad Prism software (version 6.0 for Windows, GraphPad Software Inc., La Jolla, CA, USA) was used for all analyses. Data were checked for normality (Kolmogorov–Smirnov test) and homogeneity of variances (Bartlett test). Enamel and dentin data were analyzed using one-way ANOVA followed by Tukey's multiple comparisons test. The significance level was set at 0.05.
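The core of the one-way ANOVA can be sketched in plain Python. The sketch only illustrates how the F statistic is built from between-group and within-group sums of squares. It does not replace the GraphPad Prism analysis, the function name is hypothetical, and it computes neither the p-value nor the Tukey comparisons.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total - k < 1:
        raise ValueError("need at least two groups and some within-group replication")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual samples around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

If a tissue is analyzed with 7 groups of n = 20, the degrees of freedom are (6, 133). In practice, the p-value and the post hoc comparisons would come from a statistics package, for example `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` (the latter available in SciPy 1.8 and later). The normality and variance checks could use `scipy.stats.kstest` and `scipy.stats.bartlett`.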

In Silico Analysis of Cystatin Sequences
We identified six putative non-redundant cystatin sequences (named MaquiCPI-1 to 6) in an Aristotelia chilensis transcriptome of plantlets cultivated in TIBs (Table S2). Figure 1 shows the multiple alignment of the MaquiCPI amino acid sequences, highlighting the cystatin inhibitory motifs and the characteristic N-terminal plant cystatin motif. All of them present the typical cystatin motifs (GG, QxVxG and W), while MaquiCPI-5 and MaquiCPI-6 have a C-terminal extension, described as a cystatin-like domain, that includes an additional SNSL legumain inhibitory motif. All proteins except MaquiCPI-4 and MaquiCPI-5 possess a putative signal peptide of about 19–28 amino acids. The proteins range from 101 to 224 amino acid residues, with molecular weights from 11 to 25 kDa.

In all protein sequences, the GG dipeptide motif is conserved in the N-terminal region. The typical phytocystatin motif in A. chilensis is present in all groups as [LI]-[…]. The central loop motif (QxVxG) diverges between proteins with and without the C-terminal extension, displaying the QVVAG sequence in the former and QVVSG in the latter. The dipeptide motif of the C-terminal loop (PW) is conserved in most cystatins, although MaquiCPI-2 has an alanine instead of a proline preceding the tryptophan.
A phylogenetic tree was built including the six identified cystatins from A. chilensis (Figure 2). The amino acid sequences clustered into three groups. Group A is composed of MaquiCPI-4 and an inner clade formed by the extended cystatins MaquiCPI-5 and MaquiCPI-6. The proteins of this group share a highly conserved N-terminal α-helix motif: LARFAV-[DEQ]-EHN. MaquiCPI-1 is the only member of group B and is characterized by an extension of eight amino acid residues starting at position 59 of the alignment (Figure 1). Finally, group C comprises MaquiCPI-2 and MaquiCPI-3, which lack any extension and whose N-terminal α-helix displays the IGEFAVD-[EA]-YN pattern. When a phylogenetic tree (Figure 3) was constructed using cystatins from maqui and Oryza sativa ssp. japonica, three clusters were also observed, and the MaquiCPIs were distributed among them following the same pattern described in Figure 2.
Figure 3. Groups A, B and C correspond to different phylogenetic groups. Evolutionary analysis was performed by the Maximum Likelihood method using the Whelan and Goldman model [54]. A discrete Gamma distribution was used to model evolutionary rate differences among sites (2 categories). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Node support values are shown next to the branches. This analysis involved 18 amino acid sequences, and only positions with 75% site coverage or higher were kept.

Protein Expression and Purification
The ORFs of five of the six cystatins (MaquiCPI-1 to MaquiCPI-5) were successfully amplified by PCR using specific primers and subcloned into the pET28a vector in frame with an N-terminal His-tag coding sequence. Expression was induced in E. coli Rosetta (DE3), and purification was performed in a single affinity chromatography step. SDS-PAGE analysis (Figure 4) revealed that, after sonication, the recombinant proteins were mostly present in the soluble fraction and could be purified directly. The yields of purified MaquiCPIs 1 to 5 were 17.5, 23.6, 120.0, 18.5 and 44.6 mg per liter of cell culture, respectively, which were sufficient for the activity assays.

Inhibitory Activity Assay
MaquiCPIs were tested against papain (Figure S1) and human cathepsins B (Figure S2) and L (Figure S3) to assess their inhibition profiles. The Ki values of each A. chilensis cystatin against these cysteine proteases are presented in Table 1.

Discussion
In this work, we identified six different maquicystatin sequences in a transcriptome of maqui plantlets cultivated in a Temporary Immersion Bioreactor (TIB). Five of them were cloned, recombinantly expressed in E. coli Rosetta (DE3) and tested against different cysteine proteases. Additionally, one of them was tested for its ability to protect tooth dentin and enamel, a possible application of this protein.
All deduced proteins share the three conserved motifs that allow cystatins to interact with the active-site cleft of cysteine proteases: a conserved glycine in the N-terminal region, the highly conserved QxVxG motif in a central loop and a tryptophan residue in a second loop close to the C-terminus [6,49]. They also carry the plant-specific LARFAV-like motif in the N-terminal region. MaquiCPI-5 and MaquiCPI-6 have a C-terminal extension containing a four-residue motif, SNSL, which is related to the ability to inhibit legumains [8].
The presence of putative signal peptides was also investigated. MaquiCPI-1, MaquiCPI-3 and MaquiCPI-6 clearly presented a classical secretion signal, suggesting that they are extracellular. In contrast, MaquiCPI-2 did not present a typical secretion signal. However, when the entire MaquiCPI-2 coding sequence was cloned, the protein could not be expressed. This led us to examine the first residues of the MaquiCPI-2 N-terminal sequence in more detail. An additional analysis with the prediction software iPSORT [61] indicated, with high probability, that residues 1 to 30 target the cystatin to the chloroplast. Madureira et al. (2006) identified, by immunolocalization, traces of a tomato multicystatin naturally occurring in chloroplasts [62]. Prins et al. (2008) described the effects of overexpressing the rice phytocystatin OC-I in transgenic tobacco on leaf senescence. They observed that the cystatin was present not only in vacuoles and cytosol but also in chloroplasts, and that protein content and turnover differed from those of the wild-type plant, culminating in delayed senescence [63]. Alomrani et al. (2021) investigated a transgenic Arabidopsis thaliana event expressing OC-I targeted to the chloroplast. They evaluated its effects on photosynthesis and observed a clear accumulation of leaf pigments, suggesting delayed senescence. These findings indicate that cystatin targets in chloroplasts might be related to pigment degradation and/or biosynthesis [64]. These reports are consistent with recent research revealing an active cysteine protease (HvPAP14) in barley chloroplasts. This protein is activated in the thylakoid lumen due to its acidic pH, and its active form is found in the thylakoid membranes, where it performs a proteolytic role. HvPAP14 was also observed to contribute to the degradation of the large subunit of Rubisco [65].
A prior report also described cysteine proteases in the thylakoid lumen of spinach leaves [66]. Therefore, MaquiCPI-2 might be involved in maintaining tissue homeostasis, and its function and localization should be investigated in detail in the future.
Signal peptides are known to often hinder recombinant protein expression [55]. When the predicted MaquiCPI-2 signal sequence was removed, the protein was successfully expressed. This supports the possibility that this N-terminal sequence is involved in the localization of this cystatin in the plant cell. To define the size of the peptide to remove, we compared the N-terminal regions of MaquiCPI-2 and MaquiCPI-3. The predicted signal peptide cleavage site of MaquiCPI-3 lies between residues 22 and 23 (AASA-RI). The alignment shows a similar region in MaquiCPI-2 (AISA-WK) between residues 29 and 30. Six residues further on, there is a block conserved in both proteins (GGWT). Accordingly, given these similarities and to maintain the probable distances, the first 29 amino acids were discarded in the plasmid construction, keeping the tryptophan residue that, in the iPSORT prediction, is part of the signal peptide.
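Converting the 1-based residue numbers used above into a truncated construct is a common source of off-by-one errors. The Python sketch below illustrates the indexing with a hypothetical placeholder sequence, not the real MaquiCPI-2 sequence. The placeholder is built so that the AISA-WK boundary falls between residues 29 and 30, as described for MaquiCPI-2. The function names are hypothetical.

```python
def drop_n_terminal(seq: str, n_removed: int) -> str:
    """Drop residues 1..n_removed (1-based, inclusive) from a protein sequence.

    In 0-based Python slicing, residue n_removed + 1 sits at index n_removed,
    so the truncated construct is simply seq[n_removed:].
    """
    if not 0 <= n_removed < len(seq):
        raise ValueError("n_removed must leave at least one residue")
    return seq[n_removed:]


def residue(seq: str, position: int) -> str:
    """Return the residue at a 1-based position."""
    return seq[position - 1]


# Hypothetical placeholder, NOT the real MaquiCPI-2 sequence: 25 filler residues
# followed by AISA put the AISA-WK boundary between residues 29 and 30.
toy = "X" * 25 + "AISA" + "WK" + "X" * 10
construct = drop_n_terminal(toy, 29)
print(residue(toy, 29), residue(toy, 30), construct[:2])
```

Running it prints `A W WK`. Removing residues 1 to 29 leaves a construct whose first residue is the tryptophan at position 30.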
A phylogenetic analysis among MaquiCPIs was carried out, and three groups were identified. They share characteristics related to amino acid sequence extensions and the plant cystatin exclusive N-terminus alpha helix motif. The analysis was complemented with amino acid sequences from 12 rice cystatins, which are well characterized. It was observed that, despite the evolutionary distance among the species, only three groups were formed. This supports the evidence of three groups formed by MaquiCPIs, which is what happens in other species such as citrus, barley and turnip, as well as rice [18,[67][68][69]. According to Balbinott and Margis (2022), the diversification in clusters occurred due to the evolution of an ancestral cystatin gene from the most recent common ancestor (MRCA) of Viridiplantae [70]. This gene was subjected to an in tandem duplication, resulting in a carboxy-extended form, which is present only in plants. This extension underwent a process of neofunctionalization, culminating in the ability to inhibit legumain-like proteases (from the C13 family). In A. chilensis, these cystatins are represented by MaquiCPI-5 and MaquiCPI-6. Parallelly, the evolution of the remaining single-domain phytocystatin resulted in part of the current phytocystatins with introns. Genes with the same characteristics also originated from recent losses of the second domain of carboxy-extended phytocystatins. MaquiCPI-4 is an example that presents these attributes. Finally, a singledomain cystatin from the MRCA of flowering plants was retroduplicated. It resulted in intron loss and, consequently, in the emergence of intronless phytocystatins. Duplication and diversification events culminated in two distinct clusters. In maqui plants, one group is represented by MaquiCPI-1, and the other by MaquiCPI-2 and MaquiCPI-3.
Five of the six identified phytocystatins were cloned and recombinantly expressed with satisfactory yields, mostly in the soluble fraction. MaquiCPI-3 stands out among phytocystatins, with more than 100 mg produced per liter of E. coli culture. Production methods differ between studies (e.g., culture medium, IPTG concentration and E. coli strain). Even so, the amount of MaquiCPI-3 obtained exceeds that reported for most phytocystatins produced in this expression system (Table S3). The yield is also greater than that reported for human cystatins of biotechnological interest, such as stefin B and cystatin C. This high production level therefore suggests considerable potential for pharmaceutical, agricultural and biotechnological applications in industry.
SDS-PAGE analysis of the MaquiCPIs revealed additional bands at approximately twice their expected molecular weight. These bands probably correspond to homodimers generated by domain swapping, as observed in other phytocystatins [71].
All maquicystatins inhibited papain efficiently, with Ki values of 7.13, 1.42, 3.29, 2.99 and 5.05 nM for MaquiCPI-1 to MaquiCPI-5, respectively. This inhibitory capacity against a papain-family peptidase indicates that the recombinant MaquiCPIs were produced in an active conformation. It also confirms that they are effective cysteine peptidase inhibitors.
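This section does not state how the Ki values were derived. Cystatins are tight-binding inhibitors, so one common approach, assumed here and not confirmed by the source, is to fit residual activity to the Morrison equation. That fit gives an apparent Ki, which is then corrected for substrate competition. The Python sketch below shows both steps. The function names are hypothetical, and the enzyme, substrate and Km values used in a real fit are assay-specific and not given here.

```python
import math


def morrison_fractional_velocity(e_total: float, i_total: float, ki_app: float) -> float:
    """Residual activity v_i / v_0 for a tight-binding inhibitor (Morrison equation).

    All arguments must share one concentration unit (e.g. nM). Unlike the
    classical treatment, inhibitor depletion by binding to the enzyme is taken
    into account, which matters when Ki is close to the enzyme concentration.
    """
    b = e_total + i_total + ki_app
    # [EI] is the smaller root of [EI]^2 - b[EI] + [E]t[I]t = 0.
    discriminant = max(0.0, b * b - 4.0 * e_total * i_total)
    ei = (b - math.sqrt(discriminant)) / 2.0
    return 1.0 - ei / e_total


def ki_from_apparent(ki_app: float, substrate: float, km: float) -> float:
    """Competitive correction: Ki = Ki_app / (1 + [S]/Km)."""
    return ki_app / (1.0 + substrate / km)
```

In a fit, ki_app would be the adjustable parameter, with e_total held at the active enzyme concentration and i_total varied across the titration. ki_from_apparent is only appropriate if the inhibition is competitive, as described for cystatins in the Introduction.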
Only MaquiCPIs 1, 2 and 3 inhibited cathepsin B in the nanomolar range, with Ki values of 35.74, 20.97 and 21.94 nM, respectively. MaquiCPIs 4 and 5 showed weaker inhibition, with Ki values of 0.876 µM and 5.47 µM. Maquicystatins 4 and 5 belong to the same phylogenetic group (Figures 2 and 3). This group is formed by cystatins, with or without C-terminal extensions, whose genes contain introns [70]. By amplifying the genomic DNA, we observed that the sequence associated with MaquiCPI-4 contains an intron (data not shown). There is evidence that phytocystatins encoded by intron-containing genes are less likely to inhibit cathepsin B [70], showing higher Ki values or no detectable inhibition. In sugarcane, cystatins encoded by intron-containing genes include CaneCPI-1, CaneCPI-2, CaneCPI-3 and CaneCPI-6. CaneCPI-1 and CaneCPI-2 lack a C-terminal extension and have Ki values above 100 nM. The C-terminally extended member has Ki values above 1.5 µM [19]. Other cystatins encoded by intron-containing genes show no detectable cathepsin B inhibition. These include cystatins from clementine (CclemCPI-3), sweet orange (CsinCPI-2) [18], amaranth (AhCPI) [72] and barley (HvCPI-4) [73].
Cathepsin B is difficult to inhibit because the enzyme has an occluding loop that blocks the catalytic cleft and hampers inhibitor binding [74]. When cystatins do inhibit cathepsin B, a two-step process can be observed. First, the occluding loop undergoes a conformational change that unblocks the active site. This then allows inhibition to proceed [75,76]. This mechanism is associated with the N-terminal region of phytocystatins [77,78]. The ability of phytocystatins to inhibit cathepsin B suggests their potential to control health issues such as cancer and inflammatory diseases [20,22].
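A simple pre-equilibrium model shows why an occluding loop raises the measured Ki. This model is assumed here for illustration and is not claimed to reproduce the kinetic schemes of the cited studies. In the model, only the "open" enzyme binds the inhibitor, and the closed and open forms interconvert rapidly. The total free enzyme is then [E_open](1 + K_occ), where K_occ = [E_closed]/[E_open]. As a result, the apparent inhibition constant becomes Ki,app = Ki,open (1 + K_occ). The function and parameter names below are hypothetical.

```python
def apparent_ki_occluded(ki_open: float, k_occ: float) -> float:
    """Apparent Ki when only the open (unoccluded) enzyme can bind the inhibitor.

    Assumes rapid equilibrium E_closed <=> E_open with k_occ = [E_closed]/[E_open]
    and E_open + I <=> EI with dissociation constant ki_open. Since the free
    enzyme pool is [E_open] * (1 + k_occ), Ki_app = ki_open * (1 + k_occ).
    """
    if ki_open <= 0.0 or k_occ < 0.0:
        raise ValueError("ki_open must be positive and k_occ non-negative")
    return ki_open * (1.0 + k_occ)


# Illustrative, hypothetical numbers: a loop that is closed 99% of the time
# (k_occ = 99) raises a 1 nM intrinsic Ki to a 100 nM apparent Ki.
print(apparent_ki_occluded(1.0, 99.0))  # 100.0
```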
All A. chilensis cystatins inhibited cathepsin L, with Ki values of 0.34, 0.33, 0.38, 0.57 and 1.25 nM for MaquiCPI-1 to MaquiCPI-5, respectively. These values indicate that all MaquiCPIs could potentially be used to control pathologies resulting from deregulated activity or abnormal cellular levels of this protease. Cathepsin L is associated with neurological disorders such as Parkinson's disease, urological conditions such as proteinuria, and cancers [79][80][81]. It would therefore be relevant to study the effect of maquicystatins in these disorders. Recently, cathepsin L inhibition has also attracted attention as a strategy to control the SARS-CoV-2 life cycle, because the protease is essential for one of the viral entry pathways into the cell [82].
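The Ki values reported in this section can be collected in one table to compare them across targets. From that table, selectivity ratios such as Ki(cathepsin B)/Ki(cathepsin L) follow directly. This Python sketch only re-expresses the numbers given above, converted to nM. The helper name is hypothetical. Ratios of constants measured in different assays should be read as rough comparisons.

```python
# Ki values (nM) as reported in this section; micromolar values converted to nM.
KI_NM = {
    "MaquiCPI-1": {"papain": 7.13, "cathepsin B": 35.74, "cathepsin L": 0.34},
    "MaquiCPI-2": {"papain": 1.42, "cathepsin B": 20.97, "cathepsin L": 0.33},
    "MaquiCPI-3": {"papain": 3.29, "cathepsin B": 21.94, "cathepsin L": 0.38},
    "MaquiCPI-4": {"papain": 2.99, "cathepsin B": 876.0, "cathepsin L": 0.57},
    "MaquiCPI-5": {"papain": 5.05, "cathepsin B": 5470.0, "cathepsin L": 1.25},
}


def selectivity_ratio(inhibitor: str, target_a: str, target_b: str) -> float:
    """Return Ki(target_a) / Ki(target_b); values > 1 mean tighter binding to target_b."""
    ki = KI_NM[inhibitor]
    return ki[target_a] / ki[target_b]


for name in KI_NM:
    ratio = selectivity_ratio(name, "cathepsin B", "cathepsin L")
    print(f"{name}: Ki(cathepsin B) / Ki(cathepsin L) = {ratio:.0f}")
```

All five inhibitors bind cathepsin L more tightly than cathepsin B. The gap is largest for MaquiCPIs 4 and 5, in line with their weaker cathepsin B inhibition.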
This study provides important results on the use of MaquiCPI-3 in dentistry. Our in vitro protocol was designed to evaluate the protective effect and the best concentration of MaquiCPI-3 against initial dental erosion. We used bovine enamel and dentin samples, which have a structure similar to that of human teeth [83]. The erosive challenge was performed with citric acid to simulate extrinsic erosion, similar to the consumption of citrus juices [84]. For the sequence of application, we followed our previous "acquired pellicle engineering" protocols, in which the treatment is applied before acquired pellicle formation [25,85]. Two treatments were used as positive controls: Elmex® and CaneCPI-5. The first is a highly effective commercial solution for controlling erosive tooth wear that contains fluoride and tin [86]. The second contains a sugarcane-derived phytocystatin and has shown protection against enamel erosion similar to that of Elmex® in different protocols [87][88][89][90]. The protocol employed in this in vitro study may have been suitable for answering the questions posed, but some limitations should be acknowledged. The long treatment time (2 h) does not reproduce the clinical condition, since mouthwashes are typically used for 1 to 2 min. In addition, the long incubation time used for acquired pellicle formation may have allowed the denaturation of proteins present in human saliva. Finally, the pellicle was formed undisturbed, which does not happen in the oral cavity because of salivary flow.
Another phytocystatin, CaneCPI-5, has been shown to protect enamel against erosion in vitro [25,87,88,89], in situ [27,59] and in vivo [85,91]. However, this is the first study to evaluate the protective potential of a maqui-berry-derived cystatin. Our results show that all MaquiCPI-3 concentrations tested (0.1 to 1.0 mg/mL) protected enamel against initial erosion. Our group has shown that CaneCPI-5 binds strongly to hydroxyapatite. Similarly to human cystatin B [91], CaneCPI-5 resists removal by citric and lactic acids when present in the acquired pellicle (AP) [25]. We therefore suggest that MaquiCPI-3 may likewise bind preferentially to the enamel surface through a strong affinity for hydroxyapatite [25]. Once bound, it may remodel the architecture of the whole AP, increasing the amount of acid-resistant proteins within this integument [85]. Notably, 0.5 mg/mL MaquiCPI-3 provided significantly greater protection than the other concentrations and the positive controls (Elmex® and CaneCPI-5). This is an important finding, since Elmex® is, so far, the commercial product with the best results against erosion. Elmex® combines three active components (SnCl2/NaF/AmF) [92,93], whereas our MaquiCPI-3 solution is based on a single organic component. This performance may reflect an optimal amount of MaquiCPI-3 for the available enamel binding sites, with neither a shortage nor an excess of the treatment on enamel. In addition, 1.0 mg/mL MaquiCPI-3 provided intermediate protection, similar to 0.1 and 0.25 mg/mL, suggesting that there is no need to test higher concentrations.
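The abstract reports that group differences were assessed by one-way ANOVA followed by Tukey's multiple comparisons test. The sketch below derives the F statistic of the first step from per-group measurements, using only the Python standard library. It is not the authors' analysis script and contains no data from this study. Obtaining p-values and Tukey post hoc comparisons additionally requires the F and studentized-range distributions. In SciPy, for example, these are provided by scipy.stats.f_oneway and scipy.stats.tukey_hsd.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-group measurement lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when all groups have zero variance")
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F indicates that group means differ more than within-group scatter would explain. Tukey's test then identifies which pairs, such as 0.5 mg/mL versus the other treatments, differ.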
MaquiCPI-3 produced a different pattern of results against dentin erosion. This might be explained by the distinct composition of dentin. Unlike enamel, dentin has a large organic content (mainly collagen) that, once exposed by demineralization, slows the progression of erosion. However, this organic matrix can be degraded by matrix metalloproteinases (MMPs) and cysteine cathepsins (CCs) [94], allowing erosion to progress [95]. As shown in this study, MaquiCPI-3 inhibits CCs, similarly to CaneCPI-5 [25]. Thus, one mechanism by which MaquiCPI-3 (at 0.1 to 0.5 mg/mL) protects against dentin erosion might be the inhibition of CCs. Another mechanism might involve the binding of MaquiCPI-3 to hydroxyapatite on the dentin surface, thereby modifying the AP architecture, as proposed for enamel [25,85]. The highest MaquiCPI-3 concentration (1.0 mg/mL) did not protect dentin. One possible reason is protein dimerization through domain swapping, with consequent inactivation, which has been reported for other phytocystatins [71]. An unexpected finding of this study was that Elmex® did not protect against initial dentin erosion. This commercial product is effective against enamel and dentin erosion, but its protection is usually evidenced in protocols with more severe erosive and abrasive challenges [87,89,96]. Moreover, dentin has a lower mineral content than enamel, and the stannous ion reacts strongly with hydroxyapatite [97]. This, together with the short erosive challenge (3 min), might not have allowed the protective action of Elmex® to manifest.
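The concentration argument for dimerization can be illustrated with a reversible monomer-dimer equilibrium, 2M ⇌ D with Kd = [M]²/[D]. Solving the mass balance c_total = [M] + 2[D] shows that the monomer fraction falls as total protein concentration rises. This is a qualitative sketch under a stated assumption. Domain-swapped cystatin dimers may form slowly and fail to re-equilibrate, and no Kd for MaquiCPI-3 is reported here, so the kd value used below is hypothetical.

```python
import math


def monomer_fraction(c_total: float, kd: float) -> float:
    """Fraction of protomers that are monomeric for 2M <=> D at equilibrium.

    c_total is the total protomer concentration and kd = [M]^2 / [D], in the
    same units. From c_total = [M] + 2[M]^2 / kd, the positive root is
    [M] = (-kd + sqrt(kd^2 + 8 * kd * c_total)) / 4.
    """
    if c_total <= 0.0 or kd <= 0.0:
        raise ValueError("c_total and kd must be positive")
    m = (-kd + math.sqrt(kd * kd + 8.0 * kd * c_total)) / 4.0
    return m / c_total


# Hypothetical kd = 1 (arbitrary units): compare increasing total concentrations.
for c in (0.1, 1.0, 10.0):
    print(c, round(monomer_fraction(c, 1.0), 3))
```

With kd = 1, the monomer fraction drops from about 0.85 at c_total = 0.1 to 0.2 at c_total = 10. Under this assumption, a higher dose could contain proportionally less monomeric, active inhibitor.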

Conclusions
In summary, we identified six cystatins in a transcriptome of Aristotelia chilensis plantlets cultivated in Temporary Immersion Bioreactors. They present the three motifs that form the tripartite wedge, as well as the alpha-helix motif exclusive to phytocystatins. Phylogenetically, they fall into three distinct groups, following the same pattern as rice and other species. The recombinantly expressed maquicystatins inhibited papain and human cathepsins B and L in the nanomolar range, except for cathepsin B inhibition by MaquiCPIs 4 and 5. Within the limitations of an in vitro design of initial erosion, MaquiCPI-3 appears to be a promising agent for inclusion in dental products. It could protect against enamel erosion (at 0.5 mg/mL) and dentin erosion (at 0.1 mg/mL). Future studies are needed using protocols that more closely resemble clinical conditions, such as in situ designs, shorter treatment times, and prolonged erosive challenges combined with abrasive ones. Such studies would pave the way for the use of MaquiCPI-3 in preventive dentistry.