Food proteins are a potential resource for mining cathepsin L inhibitory drugs to combat SARS-CoV-2

The entry of SARS-CoV-2 into host cells proceeds by proteolysis, which involves the lysosomal peptidase cathepsin L. Inhibition of cathepsin L is therefore considered an effective way to decrease virus internalization. A structure-function analysis indicates that cathepsin L inhibitory proteins/peptides found in food share specific features: multiple disulfide crosslinks (buried in the protein core), absent or low contents of (small) α-helices, and high surface hydrophobicity. Lactoferrin can inhibit cathepsin L, but not cathepsins B and H. This selective inhibition might be useful for fine targeting of cathepsin L. Molecular docking indicated that only the carboxyl-terminal lobe of lactoferrin interacts with cathepsin L and that the active-site cleft of cathepsin L is extensively covered by lactoferrin. A controlled proteolysis process might yield lactoferrin-derived peptides that strongly inhibit cathepsin L.


Introduction
The ongoing coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The coronavirus binds through its spike glycoprotein (S) to human angiotensin-converting enzyme 2 (hACE2) on the host cell membrane. Proteolysis of protein S into its subunits S1 and S2 by transmembrane protease serine 2 (TMPRSS2) enables fusion of the viral envelope with the host cell membrane, allowing the virus to enter the cell (Fig. 1A) (Hoffmann et al., 2020). However, it is now known that SARS-CoV-2 also (and mainly) enters cells through endocytosis. Phosphatidylinositol 3-phosphate 5-kinase and cathepsin L are critical for this endocytosis (Ou et al., 2020). As with SARS-CoV, after endocytosis of SARS-CoV-2 the S protein is cleaved by cathepsin L (Fig. 1A), which allows fusion of the viral membrane with the endosomal membrane and release of the viral genome into the host cell. Therefore, cathepsin L inhibition might help reduce infection by SARS-CoV (Adedeji et al., 2013) and SARS-CoV-2 (Ou et al., 2020). Indeed, a clinically proven drug that inhibits TMPRSS2 was shown to inhibit SARS-CoV-2 infection only incompletely. Full inhibition was achieved by combining two drugs, one inhibiting TMPRSS2 and the other inhibiting cathepsin L/B (Hoffmann et al., 2020).
Cathepsin L (220 amino acid residues) (Fujishima et al., 1997) is a lysosomal cysteine peptidase and has a two-chain form (L and R). The L domain contains three α-helices (one of which is the long central helix) and the R domain is a β-barrel, which is closed at the bottom by an α-helix (Fig. 1B). The active-site cleft is composed of histidine (His163), located at the top of the barrel, and cysteine (Cys25), located at the N-terminus of the central helix of the L domain (Turk et al., 2012). Cathepsin L contributes to protein turnover and apoptosis in cells. The overexpression of cathepsins L and H in cancer cells, in contrast to cathepsins B, C, S, and X/Z, has made cathepsin L a target for anticancer strategies (Lankelma et al., 2010). The enzyme is also implicated in other pathologies such as osteoporosis and periodontal diseases (Fujishima et al., 1997). Consequently, a plethora of cathepsin L inhibitory drugs is available for assessment. Some of these drugs, e.g. E-64 (Fujishima et al., 1997), inhibit several cathepsins including cathepsin L, whereas a few, such as CLIK-148 (an epoxysuccinyl derivative) and iCL (an aldehyde derivative), exclusively inhibit cathepsin L (Lankelma et al., 2010). Many of the cathepsin L inhibitory drugs used for SARS-CoV suppression, such as dipeptide epoxyketones, are essentially peptidomimetic (Adedeji et al., 2013).

Food protein and cathepsin inhibition
Given that food contains naturally occurring bioactive peptides and proteins, we recently outlined how food protein-derived peptides (small fragments, usually <3 kDa) that influence the renin-angiotensin system might have implications for SARS-CoV-2 endocytosis and pulmonary function in COVID-19 patients. The presumed mechanisms of action were over-expression of hACE2 and the receptor Mas, as well as blockage of the angiotensin II receptor type I on the cell surface (Goudarzi et al., 2020). However, food proteins/peptides afford further opportunities for inhibition of SARS-CoV-2. Certain foods contain protein/peptide-based cathepsin L inhibitory compounds. This characteristic opens a new avenue for the exploration and implementation of biomolecules that can inhibit SARS-CoV-2 entry into host cells.
Whereas some naturally occurring bioactive peptides in food, such as carnosine (226 Da, β-alanyl-L-histidine, in meat and chicken), do not affect cathepsin activity (Bonner et al., 1995), several cathepsin inhibitory molecules have been identified in plant and animal foods. Oryzacystatin-I (11.4 kDa, 102 residues) in rice is known to inhibit cathepsin L (inhibition constant, Ki: 7.3 × 10⁻¹⁰ M), but also cathepsin B (Ki: 7.9 × 10⁻⁸ M) and cathepsin H (Ki: 1.0 × 10⁻⁶ M) (Hellinger and Gruber, 2019). A hydrophobic cluster between the α-helix and the five-stranded antiparallel β-sheet (Fig. 1C) stabilizes the helix architecture of oryzacystatin-I (Nagata et al., 2000), and the N-terminal 21 amino acid residues are most probably not essential for the cathepsin inhibitory activity of oryzacystatin-I (Abe et al., 1988). Hydrophobic interactions are known to be crucial for the inhibition of cathepsins, likely because of the hydrophobic wedge-shaped structure of their active-site cleft (Turk et al., 2012). Corn cystatin-I also inhibits cathepsin L (Ki: 1.7 × 10⁻⁸ M) and cathepsin H (Ki: 5.7 × 10⁻⁹ M) (Hellinger and Gruber, 2019).
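The selectivity implied by these inhibition constants can be made explicit by dividing each Ki by the Ki for cathepsin L. The following is a minimal sketch using only the values quoted above; the function name `selectivity_for_l` is illustrative, and the unit of the oryzacystatin-I value for cathepsin H is assumed to be M, like the others.

```python
# Ki values in mol/L as quoted in the text (Hellinger and Gruber, 2019).
# Assumption: the cathepsin H value for oryzacystatin-I is in M, like the others.
KI_M = {
    "oryzacystatin-I": {
        "cathepsin L": 7.3e-10,
        "cathepsin B": 7.9e-8,
        "cathepsin H": 1.0e-6,
    },
    "corn cystatin-I": {
        "cathepsin L": 1.7e-8,
        "cathepsin H": 5.7e-9,
    },
}


def selectivity_for_l(ki_by_enzyme):
    """Return Ki(other enzyme) / Ki(cathepsin L) for each other enzyme.

    A ratio above 1 means tighter binding to cathepsin L than to the
    other enzyme; a ratio below 1 means the opposite.
    """
    ki_l = ki_by_enzyme["cathepsin L"]
    return {
        enzyme: ki / ki_l
        for enzyme, ki in ki_by_enzyme.items()
        if enzyme != "cathepsin L"
    }


if __name__ == "__main__":
    for inhibitor, kis in KI_M.items():
        for enzyme, ratio in selectivity_for_l(kis).items():
            print(f"{inhibitor}: Ki({enzyme}) / Ki(cathepsin L) = {ratio:.3g}")
```

With these numbers, oryzacystatin-I binds cathepsin L roughly 108 times more tightly than cathepsin B and roughly 1370 times more tightly than cathepsin H, whereas corn cystatin-I gives a ratio of about 0.34 for cathepsin H, i.e. it binds cathepsin H more tightly than cathepsin L.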
In contrast to cystatin, bromelain inhibitor VI [BI-VI; 5.89 kDa; heavy chain (H, 41 amino acid residues) and light chain (L, 11 amino acid residues)] (Fig. 1D), a peptide present in pineapple stem, selectively inhibits cathepsin L (Ki: 0.2 × 10⁻⁶ M) and, to a lesser extent, trypsin (Polya, 2003). The primary and secondary structures of BI-VI are remarkably distinct from those of cystatin. Hen egg white cystatin consists of two α-helices and a five-stranded antiparallel β-sheet, but the secondary structure of BI-VI lacks α-helices. BI-VI is composed of two domains, A and B, each of which is formed by a three-stranded antiparallel β-sheet (Hatano et al., 1995). BI-VI contains ten cysteine residues, and five intra/inter-chain disulfide bonds crosslink the protein (chain given in parentheses): Cys3(L)-Cys7(H), Cys6(L)-Cys39(H), Cys8(L)-Cys5(H), Cys14(H)-Cys21(H), and Cys18(H)-Cys30(H). The disulfide bonds form the protein core, which is unusual, as a protein core is commonly occupied by bulky hydrophobic side chains. This arrangement is homologous to that of the Bowman-Birk trypsin/chymotrypsin inhibitor from soybean (BBI-I), a typical serine protease inhibitor (Hatano et al., 1996). The heavily S-S crosslinked double-chain conformation might be important for the inhibition selectivity towards cathepsin L. This feature is shared with lactoferrin, which is discussed later in this communication. It is noteworthy that trypsin, like cathepsin L though not strictly required, can efficiently activate the SARS-CoV-2 S protein, enabling syncytium formation (Ou et al., 2020). Therefore, the co-inhibition of trypsin and cathepsin L by BI-VI is advantageous.
Like plant sources, foods of animal origin contain cathepsin L inhibitors. Although the proteins and peptides with cathepsin L inhibitory properties in foods of animal origin have scarcely been investigated, some evidence is available in the literature. Mammalian milk contains significant amounts of cysteine protease inhibitors, such as lactoferrin, β-casein and β-lactoglobulin. An inhibitory effect of β-lactoglobulin B, and to a lesser extent of β-lactoglobulin A, on cathepsins K and L has been observed (Ogawa et al., 2009). Beta-lactoglobulin is a member of the lipocalin family and binds hydrophobic molecules. In addition to a single cysteine residue buried in the protein and shielded by the α-helix, β-lactoglobulin has two internal disulfide linkages that stabilize the protein structure. At physiological pH, β-lactoglobulin exists as a supramolecular dimer. The secondary structure of β-lactoglobulin consists of nine antiparallel β-sheets, three helical turns and only one short α-helix (Ragona et al., 2000). Beta-lactoglobulin A has an additional negative charge compared to β-lactoglobulin B. A feature shared by BI-VI and β-lactoglobulin B is the existence of internal disulfide bonds.
For β-casein, an allosteric-type inhibition mechanism is believed to cause the cathepsin inhibitory action (Sano et al., 2005). Beta-casein (23.98 kDa) is a single-chain polypeptide and the most hydrophobic casein. Essentially all of the net charge and α-helix structure of β-casein is located in the N-terminal portion of the molecule (Kumosinski et al., 1993). It lacks cysteine. Considerable hydrophobicity and a low content of α-helix structure (<20%) are features shared by β-casein and β-lactoglobulin B. Taking hydrophobicity into account, I speculate that, as with oryzacystatin-I (Abe et al., 1988), the hydrophilic N-terminal portion of β-casein may not be essential for cathepsin inhibitory activity. Rather, the hydrophobic C-terminal portion, which is also poor in α-helix structure, probably causes the inhibition. In fact, inhibition of cathepsin L may occur as a consequence of the interaction between the carbonyl (carboxyl) group of an inhibitor and the electrophilic oxyanion hole of cathepsin L, which consists of the side chains of Gln16, Trp189 and His163 and the main chain of Cys25 (Fujishima et al., 1997).
Lactoferrin (~80 kDa) is present at much higher contents in human milk than in cow's milk. It strongly inhibits cathepsin L. The IC50 of lactoferrin against cathepsin L was 10⁻⁷ M, while that of a synthetic peptide which targets the cathepsin L active site was 10⁻⁵ M. Notably, lactoferrin does not inhibit cathepsin B or cathepsin H (Sano et al., 2005). This may enable fine targeting of cathepsin L to obstruct SARS-CoV-2 internalization while avoiding possible harm to cells. Bovine lactoferrin is a single polypeptide consisting of 689 residues. It is folded into two lobes (N-terminal and C-terminal halves) joined by a small three-turn helix (Moore et al., 1997) (Fig. 1E). The crystal structures of lactoferrin and cathepsin L were taken from the RCSB Protein Data Bank. Molecular docking using MDsrv (Tiemann et al., 2017) indicates that the C-lobe of lactoferrin interacts with cathepsin L (Fig. 1F). It also shows that the C-lobe superposes the active-site cleft of cathepsin L (Fig. 1F). The α-helix content of bovine lactoferrin varies between 7% and 16% depending on the pH (Sreedhara et al., 2010). This protein contains 17 disulfide bonds (Wang et al., 2019), and the interactions between the two lobes are mostly hydrophobic, arising from the packing of nonpolar surfaces on the lobes (Moore et al., 1997). Low α-helix content, small helices, and hydrophobic surfaces are features shared by lactoferrin, β-lactoglobulin and β-casein. It is worth noting that both lactoferrin and BI-VI, which show high selectivity towards cathepsin L, include several disulfide bonds positioned in the protein core. As mentioned earlier, cathepsin L inhibition proceeds by thiol chemistry.
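To give a feel for what the two IC50 values reported by Sano et al. (2005) mean in practice, the sketch below compares the potencies and estimates the fraction of enzyme activity inhibited at a given inhibitor concentration. It assumes a simple one-site dose-response model with a Hill coefficient of 1, which is an illustrative simplification and not something reported in the cited work; the function names and the example concentration are likewise hypothetical.

```python
# IC50 values in mol/L as quoted in the text (Sano et al., 2005).
IC50_M = {
    "lactoferrin": 1e-7,
    "synthetic active-site peptide": 1e-5,
}


def fold_difference(ic50_weaker_m, ic50_stronger_m):
    """How many times lower the stronger inhibitor's IC50 is."""
    return ic50_weaker_m / ic50_stronger_m


def fraction_inhibited(conc_m, ic50_m):
    """Fraction of activity inhibited at concentration conc_m.

    Assumption: one-site dose-response curve with Hill coefficient 1,
    so inhibition is exactly 50% when conc_m equals ic50_m.
    """
    if conc_m < 0 or ic50_m <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    return conc_m / (conc_m + ic50_m)


if __name__ == "__main__":
    ratio = fold_difference(
        IC50_M["synthetic active-site peptide"], IC50_M["lactoferrin"]
    )
    print(f"IC50 ratio (peptide / lactoferrin): {ratio:.0f}")

    example_conc_m = 1e-6  # hypothetical concentration, for illustration only
    for name, ic50 in IC50_M.items():
        frac = fraction_inhibited(example_conc_m, ic50)
        print(f"{name}: {100 * frac:.0f}% inhibited at {example_conc_m:.0e} M")
```

Under this assumed model, the 100-fold difference in IC50 translates into roughly 91% versus 9% inhibition at 10⁻⁶ M; real dose-response curves may be steeper or shallower.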
In addition to the proteins and peptides with cathepsin L inhibitory activity that are naturally present in foods, some peptides encrypted in food proteins might be able to inhibit cathepsin L and hence assist COVID-19 prevention. Food proteins release diverse biologically active peptides once they are hydrolyzed by enzymes present in the gastrointestinal tract or produced by microbes. At present, evidence of cathepsin L inhibitory activity by food protein-derived peptides is scarce. Two cathepsin B inhibitory peptides derived from β-casein were identified in a pancreatic digest of casein (Lee and Lee, 2000), and peptides obtained by digesting β-lactoglobulin B with an endopeptidase could inhibit cathepsin K. It remains to be explored whether food protein-derived peptides can inhibit cathepsin L. Isolation of peptides, for instance from lactoferrin, that exclusively (or much more preferentially) inhibit cathepsin L rather than other cathepsins could be highly advantageous.
The structural and biofunctional characteristics of many food proteins have been extensively studied. This is especially true for milk proteins, but also for egg white proteins. For example, it is known that hen egg white riboflavin-binding protein is highly crosslinked by nine disulfide bonds. Moreover, food proteins are generally recognized as safe (GRAS) and inexpensive. Hundreds of biologically active peptide sequences derived from food proteins are reported in the literature. The hydrolysis conditions and the separation and purification procedures for these peptides are established. In silico examination of the cathepsin L inhibitory properties of the known peptide sequences can accelerate drug discovery.
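As a very rough first pass before any docking study, known peptide sequences could be pre-ranked by the two sequence-level features highlighted in this communication: hydrophobicity and cysteine content (a prerequisite for disulfide bonds). The sketch below uses the Kyte-Doolittle hydropathy scale to compute a mean hydropathy (GRAVY) score. This is only an illustrative heuristic and not a method used in this work; it does not predict cathepsin L inhibition, and the two sequences in the usage example are arbitrary placeholder strings, not real food-derived peptides.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy of a one-letter peptide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def cysteine_count(sequence):
    """Number of cysteine residues (upper bound on disulfide partners)."""
    return sequence.upper().count("C")


def rank_candidates(sequences):
    """Sort sequences from most to least hydrophobic by GRAVY score."""
    return sorted(sequences, key=gravy, reverse=True)


if __name__ == "__main__":
    # Arbitrary placeholder strings for illustration only.
    placeholders = ["KKDDEE", "LLVVFF"]
    for seq in rank_candidates(placeholders):
        print(f"{seq}: GRAVY = {gravy(seq):.2f}, Cys = {cysteine_count(seq)}")
```

A ranking of this kind could at most help decide which sequences to submit first to docking or to in vitro assays; any candidate would still need experimental confirmation.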

Declaration of competing interest
The author declares no conflict of interest.