Structural Basis of SARS-CoV-2 Spike Protein Priming by TMPRSS2

Entry of SARS-CoV-2, the etiological agent of COVID-19, into the host cell is driven by the interaction of its spike protein with the human ACE2 receptor and a serine protease, TMPRSS2. Although the complex between the SARS-CoV-2 spike protein and ACE2 has been structurally resolved, the molecular details of the complex between the SARS-CoV-2 spike protein and TMPRSS2 remain elusive. TMPRSS2 is responsible for priming the viral spike protein, which entails cleavage of the spike protein at two potential sites, Arg685/Ser686 and Arg815/Ser816. The present study aims to investigate the conformational details of the complex between TMPRSS2 and the SARS-CoV-2 spike protein, in order to discern the finer details of the priming of the viral spike and to point to candidate drug targets. Briefly, a full-length structural model of TMPRSS2 was developed and docked against the resolved structure of the SARS-CoV-2 spike protein with directional restraints for both cleavage sites. The docking simulations showed that TMPRSS2 interacts with two different loops of the SARS-CoV-2 spike protein, each containing a different cleavage site. Key functional residues of TMPRSS2 (His296, Ser441 and Ser460) were found to interact with the residues immediately flanking the cleavage sites of the SARS-CoV-2 spike protein. Compared to the region interacting with the N-terminal cleavage site (Arg685/Ser686), the TMPRSS2 region that interacts with the C-terminal cleavage site (Arg815/Ser816) of the SARS-CoV-2 spike protein was predicted to be relatively more druggable. In summary, the present study provides structural characteristics of the molecular complex between human TMPRSS2 and the SARS-CoV-2 spike protein and points to candidate drug targets that could be further exploited to direct structure-based drug design.


Introduction
The recent pandemic of COVID-19 is the third outbreak of disease caused by a betacoronavirus in humans, following Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS) (Cui et al., 2019). By 13th April 2020, over 1.7 million people worldwide had been infected, with a mortality rate of 21% in closed cases (WHO COVID-19 situation report-84). Genetically, SARS-CoV-2, the etiological agent of COVID-19, is more closely related to SARS-CoV than to MERS-CoV (Wu et al., 2020). As for SARS-CoV, Angiotensin Converting Enzyme-2 (ACE2) has been identified as the primary receptor for the SARS-CoV-2 spike protein (Li, 2015; Lan et al., 2020), whereas the MERS-CoV spike protein interacts with DiPeptidyl Peptidase 4 (DPP4) as the first site of attachment to the host cell (Li, 2015). The spike protein of SARS-CoV-2 is a 1273 amino acid long protein with two functionally distinct regions, S1 and S2, involved in the attachment and entry of the virus, respectively. SARS-CoV-2 entry into the host cell is mediated by proteolytic cleavage of its spike protein, a process dubbed priming. Recently, human Transmembrane Protease Serine 2 (TMPRSS2) has been shown to carry out the priming of the SARS-CoV-2 spike protein by generating two distinct fragments of the viral spike protein, S1/S2 and S2' (Hoffman et al., 2020).
Recently, the co-crystal structure of the SARS-CoV-2 spike protein complexed with the ACE2 receptor was resolved, unraveling the finer details of the intermolecular interactions (Lan et al., 2020). However, no complex structure of the SARS-CoV-2 spike protein with TMPRSS2 has been resolved to date. Moreover, the molecular structure of the human TMPRSS2 protein is also not known.
As a result, the structural details of the intermolecular interactions between SARS-CoV-2 and TMPRSS2 are largely unknown, although, like many other protease inhibitors (Zhang et al., 2020), a TMPRSS2 inhibitor has been suggested and/or shown to antagonize the entry of the virus into host cells (Hoffman et al., 2020). This study aims to investigate the interaction points between TMPRSS2 and the SARS-CoV-2 spike protein using an array of bioinformatic tools. The findings not only provide a structure-function relationship for human TMPRSS2 but also predict the sites of interaction between TMPRSS2 and the SARS-CoV-2 spike protein. This could lead to the development and/or directed screening of disruptor and/or inhibitor molecules.

Data mining for structures
Protein sequence of human TMPRSS2 (O15393) was retrieved from UniProt and subjected to

Molecular model of TMPRSS2
Human TMPRSS2 is a 492 amino acid long protein with three functional domains: an N-terminal LDL-receptor class A domain (113-148), followed by an SRCR domain (153-246) and, at the C-terminus, a peptidase S1 domain spanning amino acids 256 to 487 (Figure 1A). To date, the molecular structure of the protein has not been resolved, and our XtalPred analysis indicated a very low likelihood of crystallization for this molecule, potentially due to its high percentage of coiled structure, isoelectric point and surface hydrophobicity (Figure 1B). Therefore, we used multiple approaches to develop the full-length molecular model of TMPRSS2. The finally selected refined model of TMPRSS2 has 96.32% of its residues within the allowed regions of the Ramachandran plot, which is acceptable considering that the N-terminal portion of the protein was predicted to be intrinsically disordered. In addition, it has frequently been observed that many resolved protein structures, such as USP7 (PDB id: 2F1Z), have more than 20% of their residues outside the allowed regions of the Ramachandran plot.
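The domain layout described above can be expressed as a small lookup, which makes it easy to check in which domain a residue of interest falls. The following is a minimal sketch and not part of the modeling pipeline used in this study; the names TMPRSS2_DOMAINS and domain_of are hypothetical, and the boundaries are simply those quoted in the text.

```python
from typing import Optional

# Domain boundaries as quoted in the text (1-based, inclusive residue numbers).
# The dictionary and function names are illustrative assumptions.
TMPRSS2_DOMAINS = {
    "LDL-receptor class A": (113, 148),
    "SRCR": (153, 246),
    "Peptidase S1": (256, 487),
}


def domain_of(residue_number: int) -> Optional[str]:
    """Return the domain containing the residue, or None if it lies outside all three."""
    for name, (start, end) in TMPRSS2_DOMAINS.items():
        if start <= residue_number <= end:
            return name
    return None


if __name__ == "__main__":
    # Key functional residues named in the abstract.
    for label, position in [("His296", 296), ("Ser441", 441), ("Ser460", 460)]:
        print(label, "->", domain_of(position))
```

Under these boundaries, the three key functional residues named in the abstract (His296, Ser441 and Ser460) all fall within the peptidase S1 domain, whereas residues before position 113 or in the short linkers between domains return no domain.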

Molecular structure of TMPRSS2
The full-length molecular model of TMPRSS2 has considerable structural homology with the template molecule (PDB: 1Z8G); the deviation between the Cα backbones of the model and the template was 0.33 Å (Figure 1E, F). All three domains, LDL-receptor class A, SRCR and peptidase S1, formed distinct structural units in the molecular model. The N-terminal region and the LDL-receptor class A domain of TMPRSS2 were found to be more or less unstructured (Figure 1G, H).
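A backbone deviation of this kind is commonly reported as a root-mean-square deviation (RMSD) over matched Cα atoms. The sketch below illustrates the calculation only; it assumes that the two coordinate lists are already superimposed and paired residue by residue, it does not perform the structural alignment itself, and the function name ca_rmsd is hypothetical. The software used in this study to obtain the reported value is not specified here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coord], template: Sequence[Coord]) -> float:
    """RMSD (same units as the input, e.g. angstroms) between paired C-alpha atoms.

    Assumes both structures are already superimposed and that model[i]
    corresponds to template[i]; no fitting is done here.
    """
    if len(model) != len(template):
        raise ValueError("coordinate lists must have the same length")
    if not model:
        raise ValueError("at least one atom pair is required")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, template):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(model))


if __name__ == "__main__":
    # Toy coordinates for illustration only, not taken from any structure.
    toy_model = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    toy_template = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(round(ca_rmsd(toy_model, toy_template), 3))
```

In practice the superposition step matters as much as the formula, since the RMSD of unaligned coordinates mostly reflects the arbitrary placement of the two structures in space.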

Interaction of TMPRSS2 with SARS-CoV-2 Spike protein
The entry of SARS-CoV-2 into the host cell is driven by the proteolytic cleavage of its spike protein, resulting in the formation of two fragments, S1/S2 and S2' (Hoffman et al., 2020). The precise positions of the proteolytic cleavage sites have been mapped by sequence comparison and found to be at the junctions of Arg685/Ser686 and Arg815/Ser816. The cleavage at the latter site results in the production of S1/S2 and S2' fragments, which is necessary for viral entry (Table 1; Figure 2D).
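To inspect the residues flanking a cleavage site such as Arg685/Ser686 or Arg815/Ser816, the sequence on either side of the scissile bond can be extracted by position. The following is a minimal sketch under stated assumptions: the function name cleavage_window is hypothetical, positions are 1-based, and the demonstration uses a toy sequence rather than the actual spike protein sequence, which is not reproduced here.

```python
from typing import Tuple


def cleavage_window(sequence: str, p1_position: int, flank: int = 4) -> Tuple[str, str]:
    """Return the residues on the N- and C-terminal sides of a scissile bond.

    p1_position is the 1-based number of the residue immediately before the
    cleaved bond (685 for Arg685/Ser686, for example).
    """
    if flank < 1:
        raise ValueError("flank must be at least 1")
    if not 1 <= p1_position < len(sequence):
        raise ValueError("p1_position must leave at least one residue after the bond")
    n_side = sequence[max(0, p1_position - flank):p1_position]
    c_side = sequence[p1_position:p1_position + flank]
    return n_side, c_side


def is_arg_ser_junction(sequence: str, p1_position: int) -> bool:
    """Check that the bond lies between an arginine (R) and a serine (S)."""
    n_side, c_side = cleavage_window(sequence, p1_position, flank=1)
    return n_side == "R" and c_side == "S"


if __name__ == "__main__":
    toy = "AAAARSVV"  # toy sequence for illustration, not the spike protein
    print(cleavage_window(toy, 5, flank=2))
    print(is_arg_ser_junction(toy, 5))
```

With a full-length spike sequence loaded from a sequence database, the same call with p1_position set to 685 or 815 would return the loop residues around each site, provided that the numbering of the loaded sequence matches the numbering used in the text.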
At the second cleavage site (Arg815/Ser816), in contrast, two of the three residues of the catalytic triad, His296 and Ser441, established hydrogen bond interactions with Pro809, Lys814 and Ser810 of the SARS-CoV-2 spike protein (Table 1; Figure 3D). Ser810 also formed a hydrogen bond with Ser460, the substrate binding site, and a hydrophobic interaction with His296, the catalytic site of TMPRSS2 (Table 1; Figure 3D). Since the functionally important residues of  (Table 1). In order to explore druggability, the interacting residues of TMPRSS2 for each cleavage site of the SARS-CoV-2 spike protein were assessed for volume, area and drug score. Consistent with the molecular docking simulations, the intermolecular interactions between TMPRSS2 and Arg815/Ser816 of the SARS-CoV-2 spike protein appeared to be a suitable target for drug design and development (Figure 2E-F; 3E-F).

Discussion
Human TMPRSS2 is a 70 kDa protein, a member of the large superfamily of serine proteases, mainly expressed in the prostate, colon, stomach and salivary gland (Vaarala et al., 2001). In the prostate gland, its expression is regulated by androgens, and it is overexpressed in prostate carcinoma (Afar et al., 2001). Physiologically, the protein is important for epithelial sodium transport (Donaldson et al., 2002) and angiogenesis (Aimes et al., 2003). Therefore, the present study could also be extended to explore the effect of natural polymorphisms found in human TMPRSS2 on the priming of the SARS-CoV-2 spike protein.
The authors declare that they have no conflict of interest.