The Role of Host Cell Glycans on Virus Infectivity: The SARS‐CoV‐2 Case

Abstract
Glycans are ubiquitously expressed sugars, coating the cell and protein surfaces. They are found on many proteins as either short and branched chains or long chains sticking out from special membrane proteins, known as proteoglycans. This sugar cushion, the glycocalyx, modulates specific interactions and protects the cell. Here it is shown that both the expression of proteoglycans and the glycans expressed on the surface of both the host and virus proteins have a critical role in modulating viral attachment to the cell. A mathematical model using SARS‐CoV‐2 as an archetypical virus to study the glycan role during infection is proposed. It is shown that this occurs via a tug‐of‐war of forces. On one side is the multivalent molecular recognition that viral proteins have toward specific host glycans and receptors. On the other side is the glycan steric repulsion that a virus must overcome to approach such specific receptors. By balancing both interactions, viral tropism can be predicted. In other words, the authors can map out the cells susceptible to virus infection in terms of receptor and proteoglycan compositions.


Theoretical Model
As discussed in the main text, binding between coronaviruses and the host cells proceeds in a two-step process. First, the virus binds to the glycosaminoglycans (GAGs; here heparan sulfate, HS), the long-chain sugars that coat the cell surface. After binding to the glycosaminoglycans, the virus can then form bonds with the entry receptors on the surface, particularly ACE2 and TMPRSS2 in the case of SARS-CoV-2.
Following the derivation of a simple analytical model for the adsorption of multivalent particles on a surface [1], we define the glycocalyx surface coverage, $\theta_G$, as the probability that a virus is bound to a GAG chain at a given site, and the cell-surface coverage, $\theta_R$, as the probability that a virus is bound to a receptor at a given site. As pointed out in the original work of Martinez-Veracoechea et al. [1], the predicted Langmuir dependence of the surface coverage ($\theta$) on the bulk concentration of the multivalent guest particles has been observed experimentally for functionalized colloids [2], nanoparticle-based drug delivery systems [3,4,5,6] and multivalent host-guest interaction interfaces [7]. In equation 1 in the main text, we give the general form of the surface coverage, which we specialise to each case. In the resulting expressions, the binding volume, $v_B$, is determined from the viral radius, $R$, and the spike length, $d$; $\rho$ is the viral bulk concentration; and $\rho_G$ is the viral concentration in the glycocalyx. The cell-surface coverage can be linked to the glycocalyx surface coverage through $\rho_G = \theta_G / v_B$; however, to limit the complexity of our calculation, we treat $\rho_G$ as a parameter, effectively decoupling the system.
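As a minimal sketch of the Langmuir-type dependence mentioned above, the snippet below assumes the coverage takes the standard form $\theta = x/(1+x)$ with $x = \rho\, v_B\, Q$. The exact expression of equation 1 in the main text is not reproduced here and may differ, and the function name `langmuir_coverage` and the numerical values are purely illustrative.

```python
def langmuir_coverage(rho, v_b, q):
    """Langmuir-type coverage, ASSUMED form theta = x / (1 + x), x = rho * v_b * q.

    rho : viral concentration (bulk, or in the glycocalyx for the receptor case)
    v_b : binding volume
    q   : bound-state partition function (Q_G or Q_R)
    """
    if rho < 0 or v_b < 0 or q < 0:
        raise ValueError("rho, v_b and q must be non-negative")
    x = rho * v_b * q
    return x / (1.0 + x)


# Illustrative, dimensionless numbers (not taken from the paper).
theta_g = langmuir_coverage(rho=1e-3, v_b=1.0, q=1e3)
print(theta_g)
```

With this form the coverage rises linearly at low concentration and saturates at 1, which is the qualitative behaviour the text attributes to multivalent adsorption.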
The partition functions, $Q_G$ and $Q_R$, contain information about the possible states of the system when the virus is bound to the GAG chains or to the surface receptors, respectively. We can calculate each partition function by taking the product of the individual contributions to it.
For $Q_G$, we have a contribution from the GAG binding, $q^{(-)}_{GAG}$, as well as a contribution from the steric repulsion arising from insertion into the chain, $q^{(+)}_{GAG}$. Likewise, for $Q_R$ we have a contribution from the receptor binding, $q^{(-)}_{R}$, and again a steric repulsion term, $q^{(+)}_{GAG}$. It should be noted that $q^{(+)}_{GAG}$ has a slightly different form in each equation, due to differences in the geometry of binding in the two cases. The individual partition functions will be discussed in more detail in the following subsections.

GAG Binding
Each GAG chain contains multiple sites to which the spike can bind. The number of these binding sites, $N_L$, is some fraction of the number of units, $N_p$. The number of units can be calculated from the average GAG chain length, $d_{GAG}$, and the GAG Kuhn length, $b_{GAG}$, as $N_p = (d_{GAG}/b_{GAG})^2$. Only a single bond can form between a given GAG chain and the virus. This is likely due to the mechanical properties of the GAG chain, suggesting that the entropic cost of forming an additional bond is too high. We can, therefore, state the partition function for a single GAG chain binding to the virus, $q^{chain}_{GAG}$, in terms of the number of spikes available for binding, $N_S$, and the GAG-spike binding energy in solution, $\epsilon^{motif}_{GAG}$ [8]. Then, we can calculate the total partition function for GAG-spike binding, $q^{(-)}_{GAG}$, from the binding energy of a given chain, $\epsilon^{chain}_{GAG} = -\ln(q^{chain}_{GAG})$, and the number of chains [1].

Figure S1: Virion landing onto the host cell membrane. Description of the geometrical parameters of the model.
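The relation between chain length, Kuhn length and number of units can be illustrated with a short sketch. Only $N_p = (d_{GAG}/b_{GAG})^2$ comes from the text; the helper names and the parameter `site_fraction` (the unspecified fraction relating $N_L$ to $N_p$) are hypothetical, as are the numerical values.

```python
def n_units(d_gag, b_gag):
    """Number of units, N_p = (d_GAG / b_GAG)**2, as stated in the text."""
    if d_gag <= 0 or b_gag <= 0:
        raise ValueError("d_gag and b_gag must be positive")
    return (d_gag / b_gag) ** 2


def n_binding_sites(n_p, site_fraction):
    """N_L as a fraction of N_p; site_fraction is a hypothetical input."""
    if not 0.0 <= site_fraction <= 1.0:
        raise ValueError("site_fraction must lie in [0, 1]")
    return site_fraction * n_p


# Illustrative values only (same length unit for both arguments).
n_p = n_units(d_gag=50.0, b_gag=5.0)
n_l = n_binding_sites(n_p, site_fraction=0.1)
print(n_p, n_l)
```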

GAG Repulsion
As multiple GAG chains will bind to a single proteoglycan, we expect the HS chains to be non-uniformly distributed, with areas of locally high density around each proteoglycan. This means that when the virus is bound only to GAG chains, it will effectively experience a higher local density than when it is bound to the receptors. We, therefore, have to consider two cases for the steric repulsion: first, when the virus is bound only to GAG chains, and second, when the virus is bound to the receptors. When the virus is bound to receptors, the steric repulsion depends on the global GAG density, $\rho_{HS}$, i.e. the average density over the entire cell. However, when the virus is bound only to GAG chains, we use the local GAG density to account for the clustering together of GAG chains.
We can then calculate the energy of insertion in terms of the density of chains, the inserted volume, $V_\mu(z)$, and the insertion parameter, $\delta$ [4]. From this insertion energy, we can calculate the partition function $q^{(+)}_{GAG}$. The inserted volume can be calculated geometrically (Figure S1) by considering the distance from the surface, $z$.

Insertion Parameter
The insertion parameter quantifies how far into the brush the virus has penetrated, and is given as the ratio between the brush length and the distance between the virus and the surface. It can be expressed in terms of the GAG chain length, $d_{GAG}$, the distance between the virus and the cell surface, $z$, and the receptor tether length, $d_r$. In the presence of receptor-spike binding, the cell-virus distance will be equal to the receptor tether length, $d_r$ (Fig. ??). However, in the absence of receptor-spike binding, the situation is more complex.
We will number the binding sites of a GAG chain as $n = 1, \ldots, N_L$, where $n = 1$ is the site farthest from the cell surface. We can then define the distance between two sites, $n$ and $n + 1$, as $d_n$. It is not possible to identify each distance between sites, so we will assume that the distances can be approximated by the average distance, an assumption that holds for long chains with a roughly equal distribution of sites. The virus-cell distance can then be expressed as a function of the closest bound site to the surface, $n$; for receptor-spike binding, $z = d_r$.

Mean Field Approximation
As we cannot know the distance of each virus from the cell surface, we will apply a mean field approximation, treating the repulsion as the repulsion at the average distance.
To do this, we must find the probability that a given site, $n$, is the closest bound site to the surface on a given chain. This can be expressed in terms of the probability that no bonds are formed on any GAG chain at a given site, $p_\kappa = (1 - \kappa)^{N_{GAG}}$, where $\kappa$ is the probability that a given site is bound (which can be calculated from the binding energy of a given site as described in equation (16)). The expectation value of the cell-virus distance is then given by equation (12).
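One way to picture the mean-field average is the sketch below. It is not the paper's equation (12): it assumes that sites bind independently, that site $n$ is the closest bound site when it carries at least one bond and every site nearer the surface ($n+1, \ldots, N_L$) carries none, and that the distribution is conditioned on at least one site being bound. The site distances passed to `mean_distance` are hypothetical inputs, since the expression for $z(n)$ is not reproduced here.

```python
def p_no_bond(kappa, n_gag):
    """p_kappa = (1 - kappa)**N_GAG: no chain is bound at a given site."""
    return (1.0 - kappa) ** n_gag


def closest_site_distribution(kappa, n_gag, n_sites):
    """Probability that site n (1 = farthest from the surface) is the bound
    site closest to the surface, under the independence ASSUMPTION above."""
    p0 = p_no_bond(kappa, n_gag)
    # Site n is bound (1 - p0) and the n_sites - n sites nearer the surface are empty.
    weights = [(1.0 - p0) * p0 ** (n_sites - n) for n in range(1, n_sites + 1)]
    total = sum(weights)  # equals 1 - p0**n_sites
    if total == 0.0:
        raise ValueError("no site can be bound (kappa = 0)")
    return [w / total for w in weights]


def mean_distance(probabilities, distances):
    """Expectation value of the cell-virus distance for given site distances."""
    return sum(p * z for p, z in zip(probabilities, distances))


probs = closest_site_distribution(kappa=0.5, n_gag=1, n_sites=2)
print(probs, mean_distance(probs, distances=[2.0, 1.0]))
```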

Receptor Interaction
As there are multiple possible receptors that the spike can bind to, we need to take a different but equivalent approach for calculating the receptor-spike partition function. First, we define the partition function $q^{(-)}_{R}$ [9] in terms of $\epsilon_R$, the binding energy of a single receptor and spike RBD domain, and $u_R$, the steric repulsion arising from the receptor glycans trapped in the binding site (Fig. S2). Then, we can define the energy of binding in terms of the probability that a given site is unbound, $p_i$.
The probability of a receptor and the spike RBD domain being unbound can be calculated by closure (i.e. all probabilities must sum to 1) [10]. Defining $p_{ij}$ as the probability that receptor $i$ is bound to the RBD domain $j$, we have $p_i + \sum_j p_{ij} = 1$. Finally, $p_{ij}$ can be defined as the product of the probability that both $i$ and $j$ are unbound and the probability that a bond will form, which is given by the Boltzmann factor, i.e. the negative exponential of the binding energy. This gives the expression $p_{ij} = p_i\, p_j\, e^{-\epsilon_{ij}}$, where $\epsilon_{ij}$ is the binding energy of the pair in units of $k_B T$. A full derivation can be found in the original work by Varilly et al. [10].
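The closure condition and the product form for $p_{ij}$ together give a set of self-consistent equations, $p_i = 1/(1 + \sum_j p_j e^{-\epsilon_{ij}})$ and the analogous expression for $p_j$. A minimal sketch of solving them by fixed-point iteration follows. The function name, the energy matrix `eps` (in units of $k_B T$) and the iteration scheme are illustrative choices, not the authors' implementation.

```python
import math


def solve_unbound_probabilities(eps, tol=1e-12, max_iter=10000):
    """Solve p_i = 1 / (1 + sum_j p_j * exp(-eps[i][j])) and the analogous
    equation for p_j by alternating fixed-point sweeps.

    eps[i][j] : binding energy (k_B T) between receptor i and RBD domain j.
    Returns (p_receptors, p_rbds), the unbound probabilities.
    """
    n_r, n_l = len(eps), len(eps[0])
    k = [[math.exp(-e) for e in row] for row in eps]  # Boltzmann factors
    p_r, p_l = [1.0] * n_r, [1.0] * n_l
    for _ in range(max_iter):
        new_r = [1.0 / (1.0 + sum(k[i][j] * p_l[j] for j in range(n_l)))
                 for i in range(n_r)]
        new_l = [1.0 / (1.0 + sum(k[i][j] * new_r[i] for i in range(n_r)))
                 for j in range(n_l)]
        change = max(max(abs(a - b) for a, b in zip(new_r, p_r)),
                     max(abs(a - b) for a, b in zip(new_l, p_l)))
        p_r, p_l = new_r, new_l
        if change < tol:
            return p_r, p_l
    raise RuntimeError("fixed-point iteration did not converge")


# One receptor and one RBD with zero binding energy (illustrative).
p_r, p_l = solve_unbound_probabilities([[0.0]])
p_bound = p_r[0] * p_l[0] * math.exp(-0.0)
print(p_r[0], p_r[0] + p_bound)
```

In the usage example, the second printed value checks closure: the unbound probability plus the bound probability should equal 1 up to the solver tolerance.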

Trapped receptor N-glycans generated steric repulsion
The binding energy between the spike and a given receptor has both an attractive component, arising from the binding interaction, and a repulsive component, arising from the steric repulsion caused by trapped glycans, as discussed in the main body. TMPRSS2 is not glycosylated and has only one binding site, therefore $u_R^{TMPRSS2} = 0$. The ACE2 monomer has one glycosylated binding site. Here we derive the corresponding steric potential arising from the glycans trapped in the ACE2 binding site.
We start by considering the Kuhn length of the glycans, $b_G$: the unit size at which each unit can move freely and orient in any direction. All details of chemical arrangements, bond rotation constraints, monomer-monomer interactions and monomer interactions with the local environment are described by this parameter.
The mean squared end-to-end distance of a single glycan is given by $\langle r^2 \rangle = N_G b_G^2$, while the length of the fully extended conformation is given by $r_{max} = N_G b_G$, where $N_G$ is the number of monomers of the glycan. The probability that the end-to-end distance of a glycan takes a given value, $r$, can be determined from a Gaussian distribution (equation (18)). Each branch can explore a hemispherical volume of radius $r$. Therefore, integrating equation (18) over this volume gives the partition function for a given branch. Hence the energy loss associated with the compression of a glycan with $B$ branches upon binding is determined by the change in $r$ from the average end-to-end distance, $r_0$, to the binding distance, $r_B$. The total free energy of repulsion can now be found by summing over each glycan. Assuming that each glycan contains a similar number of units, we can approximate $u_R = N_{Glycan} U_{glycans}$, where $N_{Glycan}$ is the number of glycans.

Figure S2: Entry receptor binding and glycans steric repulsion.
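The Gaussian chain statistics used above can be checked numerically. The sketch below assumes the textbook ideal-chain density $P(r) = \left(3 / (2\pi N_G b_G^2)\right)^{3/2} \exp\left(-3r^2 / (2 N_G b_G^2)\right)$, which may differ from the paper's equation (18) in normalisation. It integrates over the full sphere to verify normalisation and $\langle r^2 \rangle = N_G b_G^2$, not over the hemispherical volume used for the branch partition function. All names and values are illustrative.

```python
import math


def mean_square_end_to_end(n_g, b_g):
    """<r^2> = N_G * b_G**2 for an ideal chain of N_G units of length b_G."""
    return n_g * b_g ** 2


def gaussian_end_to_end_density(r, n_g, b_g):
    """ASSUMED textbook Gaussian density of the end-to-end vector (per volume)."""
    r2 = mean_square_end_to_end(n_g, b_g)
    return (3.0 / (2.0 * math.pi * r2)) ** 1.5 * math.exp(-3.0 * r * r / (2.0 * r2))


def radial_moment(n_g, b_g, power, cutoff_factor=10.0, steps=20000):
    """Midpoint-rule integral of 4*pi*r^2 * P(r) * r**power from 0 to
    cutoff_factor times the root-mean-square end-to-end distance."""
    upper = cutoff_factor * math.sqrt(mean_square_end_to_end(n_g, b_g))
    dr = upper / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * dr
        total += 4.0 * math.pi * r * r * gaussian_end_to_end_density(r, n_g, b_g) * r ** power * dr
    return total


# Illustrative glycan: 10 units of unit Kuhn length.
print(radial_moment(10, 1.0, 0))  # normalisation, close to 1
print(radial_moment(10, 1.0, 2))  # <r^2>, close to N_G * b_G**2
```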