Molecular analysis of a public cross-neutralizing antibody response to SARS-CoV-2

Summary
As severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of concern (VOCs) continue to emerge, cross-neutralizing antibody responses become key to the design of a next-generation, more universal COVID-19 vaccine. By analyzing published data from the literature, we report here that the germline gene combination IGHV2-5/IGLV2-14 represents a public antibody response to the receptor-binding domain (RBD) that potently cross-neutralizes a broad range of VOCs, including Omicron and its sub-lineages. Detailed molecular analysis shows that the complementarity-determining region H3 (CDR H3) sequences of IGHV2-5/IGLV2-14-encoded RBD antibodies have a preferred length of 11 amino acids and a conserved HxIxxI motif. In addition, these antibodies show a strong allelic preference due to an allelic polymorphism at amino acid residue 54 of IGHV2-5, which is located in the paratope. These findings have important implications for understanding cross-neutralizing antibody responses to SARS-CoV-2 and their heterogeneity at the population level, as well as for the development of a universal COVID-19 vaccine.


Correspondence nicwu@illinois.edu
In brief
Through literature mining, Yuan et al. find that many IGHV2-5/IGLV2-14 RBD antibodies can broadly neutralize SARS-CoV-2 VOCs. Their CDR H3 sequences have a length preference and a conserved HxIxxI motif. These antibodies also have a strong allelic preference toward IGHV2-5*02 due to an allelic polymorphism at residue 54.

INTRODUCTION
The effectiveness of COVID-19 vaccines has been challenged by the evolution of diverse severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants over the past 2 years. The recent emergence of Omicron and its sub-lineages BA.2, BA.2.12.1, BA.4, and BA.5 further highlights the urgent need for a more broadly protective vaccine. An ideal COVID-19 vaccine should elicit high titers of neutralizing antibodies that are potent against antigenically distinct variants. However, many potent neutralizing antibodies have only limited cross-reactivity with variants other than the immunizing strain. For example, a major class of antibodies to the receptor-binding domain (RBD), encoded by IGHV3-53/3-66, is highly potent against the ancestral Hu-1 strain, but most of these antibodies lose activity against many of the variants. 1,2 Similarly, Beta-specific antibodies can be elicited that lack cross-neutralizing activity against the ancestral strain or other variants. 3 On the other hand, antibodies to S2 are typically broadly reactive but have weak neutralizing activity. 4-6 Nevertheless, a few RBD antibodies exhibit marked neutralization potency and breadth, as exemplified by those targeting the RBS-D epitope. 2 One representative RBS-D antibody is LY-CoV1404 (also known as bebtelovimab), a therapeutic monoclonal antibody from Eli Lilly. LY-CoV1404 is encoded by IGHV2-5/IGLV2-14 and cross-neutralizes the ancestral Hu-1 strain as well as all known variants of concern (VOCs), including Omicron and circulating sub-lineages. 7,8 In fact, the binding mode of LY-CoV1404 is identical to that of the cross-neutralizing antibody 2-7, which is also encoded by IGHV2-5/IGLV2-14. 9 More recently, Veesler and colleagues reported another potently cross-neutralizing antibody with sequences and a binding mode similar to those of LY-CoV1404. 10 As IGHV2-5 has been shown to be an important contributor to the cross-neutralizing antibody response, 11,12 these observations motivated a systematic analysis of IGHV2-5/IGLV2-14-encoded RBD antibodies to SARS-CoV-2.
Germline-encoded residues in IGHV2-5/IGLV2-14 are important for RBD binding
Next, we performed a structural analysis to uncover the sequence determinants of IGHV2-5/IGLV2-14-encoded antibodies for RBD engagement. For antibody residues, the Kabat numbering scheme is used unless otherwise stated. All four IGHV2-5/IGLV2-14-encoded RBD antibodies with available structural information exhibit the same binding mode to the RBD (Figure 1B). As observed for LY-CoV1404, most amino acid side chains in the paratope are germline encoded and form key interactions with the RBD (Figure 1C). For example, while most CDR H1 contacts are mediated by the main chain, the side chain of germline-encoded VH S32 in CDR H1 fits into a polar pocket in the RBD. In addition, germline-encoded VH Y52, D54, D56, and R58 in CDR H2 form an extensive network of H-bonds and electrostatic interactions with the RBD. Furthermore, two key paratope residues in the light chain, VL Y32 and Y91, are also germline encoded. VL Y32 H-bonds with RBD N439, whereas VL Y91 stacks with RBD P499. Both N439 and P499 are conserved among all VOCs to date. These observations demonstrate that the RBD-binding determinants are encoded in the germline sequences of IGHV2-5 and IGLV2-14. Consistently, several IGHV2-5/IGLV2-14-encoded RBD antibodies have very few somatic hypermutations (SHMs) (Table S1). For example, S24-223 has only one SHM, and COV2-2268 and 2-7 have only four each. Of note, none of their SHMs overlap.
Figure 1 (partial caption): … XG005, PDB: 7V26. 21 ACE2-binding site 42 is in green. Epitope for each antibody is indicated by the black outline in the bottom panels. Buried surface area was calculated by PISA 43 and is shown at the bottom right. (C) Key interactions between LY-CoV1404 and RBD are shown. Hydrogen bonds and salt bridges are represented by black dashed lines. All germline-encoded residues are underlined. HC is in orange, LC in yellow, and RBD is in white. See also Figure S1 and Table S1.
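The SHM comparison described above (counting mutations per antibody and checking whether any are shared) can be sketched as follows. This is a minimal illustration, assuming each antibody's V-region amino acid sequence has already been aligned to its germline (for example, from IgBLAST output) so that both strings have equal length and no gaps. The function names and the toy sequences are hypothetical placeholders, not real antibody sequences and not the authors' scripts.

```python
from __future__ import annotations


def find_shms(germline: str, antibody: str) -> set[tuple[int, str, str]]:
    """Return SHMs as (1-based position, germline residue, mutated residue).

    Assumes both sequences are pre-aligned to equal length without gaps.
    """
    if len(germline) != len(antibody):
        raise ValueError("sequences must be pre-aligned to equal length")
    return {
        (pos, g, a)
        for pos, (g, a) in enumerate(zip(germline, antibody), start=1)
        if g != a
    }


def shared_positions(shm_sets: dict[str, set[tuple[int, str, str]]]) -> set[int]:
    """Positions that are mutated in more than one antibody."""
    seen: dict[int, int] = {}
    for shms in shm_sets.values():
        for pos in {p for p, _, _ in shms}:
            seen[pos] = seen.get(pos, 0) + 1
    return {pos for pos, n in seen.items() if n > 1}


if __name__ == "__main__":
    germline = "ACDEFGHIKL"  # toy placeholder, not a real germline sequence
    antibodies = {"ab1": "ACDEYGHIKL", "ab2": "SCDEFGHIKV"}
    shm_sets = {name: find_shms(germline, seq) for name, seq in antibodies.items()}
    for name, shms in shm_sets.items():
        print(name, len(shms), sorted(shms))
    print("overlapping positions:", shared_positions(shm_sets))
```

Comparing by position rather than by exact substitution is a deliberate choice here: two antibodies mutating the same germline residue to different amino acids would still count as overlapping.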
The lack of IGHV2-5*01 among IGHV2-5/IGLV2-14-encoded RBD antibodies is likely due to an allelic polymorphism at residue 54: IGHV2-5*01 and IGHV2-5*02 have Asn and Asp, respectively, at this position. VH D54 in IGHV2-5/IGLV2-14-encoded RBD antibodies plays an important role in RBD binding through a salt bridge with RBD K444 and an H-bond with RBD N450 (Figure 1C). Structural modeling using Rosetta 25-27 showed that replacing the Asp at VH residue 54 with Asn would convert the salt bridge with RBD K444 into an H-bond, weakening the binding energy by 3.2 Rosetta energy units (Figure 2B). This observation was further validated experimentally: the D54N mutation weakened the RBD binding affinity (KD) of LY-CoV1404 and 2-7, both of which are IGHV2-5/IGLV2-14-encoded RBD antibodies (Figures 1A and 1B; Table S1), by at least 100-fold (Figure 2C). Consistently, eight of the nine IGHV2-5/IGLV2-14-encoded RBD antibodies with available sequence information have an Asp at VH residue 54, whereas the remaining one has a Glu (Table S1). These findings provide a mechanistic basis for the allelic preference against IGHV2-5*01 despite its prevalence in the human population. Notably, a nearly identical observation has been made for an IGHV2-5-encoded HIV antibody, in which VH D54 results in much stronger binding than VH N54. 28
Sequence features of CDR H3 in IGHV2-5/IGLV2-14-encoded RBD antibodies
Lastly, we analyzed the CDR H3 sequences of the IGHV2-5/IGLV2-14-encoded RBD antibodies. Among 10 IGHV2-5/IGLV2-14-encoded RBD antibodies with CDR H3 sequence information available, eight had a CDR H3 length of 11 amino acids (IMGT numbering) and came from at least five patients (Figure 3A). The CDR H3 sequences of these eight antibodies shared an HxIxxI motif or conservative variations of it, including HxIxxL and HxVxxI (Figures 3A and 3B).
The HxIxxI motif consists of VH H95, I97, and I100 (Kabat numbering) and is uncommon among the CDR H3 sequences of IGHV2-5-encoded antibodies in the human antibody repertoire (Figure 3B). VH H95, I97, and I100 in the HxIxxI motif play critical roles in stabilizing the loop conformation as well as in RBD binding (Figure 3C). VH H95 forms two intramolecular H-bonds to stabilize the CDR H3 loop. The first H-bond involves the side chain of VH Y52, which in turn H-bonds with the RBD V445 amide nitrogen. The second H-bond involves the backbone carbonyl of VH I100. In addition, VH H95 forms van der Waals interactions with RBD V445. VH I97, at the tip of the CDR H3 loop, inserts into a hydrophobic pocket formed by RBD V445 and P499, as well as the aliphatic portion of RBD N440. VH I100 helps position VL Y91 to interact with RBD V445 and P499. As shown by IgBLAST analysis, 29 the HxIxxI motif is largely encoded by N-nucleotide addition, although VH I97 is sometimes encoded by an IGHD gene (Figure 3D). Of note, although the CDR H3 of XG005 has 12 amino acids (Figure 3A), it adopts a conformation similar to those with 11 amino acids (Figure S2). Overall, IGHV2-5/IGLV2-14-encoded RBD antibodies with a CDR H3 length of 11 amino acids have convergent CDR H3 sequences and can thus be classified as a public clonotype.
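The CDR H3 features discussed above (a preferred length of 11 residues plus the HxIxxI motif or its HxIxxL and HxVxxI variants) can be screened with a regular expression. This is a minimal sketch, assuming the inputs are already IMGT-defined CDR H3 amino acid sequences (for example, as reported by IgBLAST); the constant and function names and the toy sequences are hypothetical. The sketch only detects the motif anywhere in the loop and does not check that it falls at Kabat positions H95, I97, and I100.

```python
import re

# Motif variants reported in the text; "x" stands for any residue.
MOTIF_VARIANTS = ("HxIxxI", "HxIxxL", "HxVxxI")
_ALTERNATION = "|".join(v.replace("x", ".") for v in MOTIF_VARIANTS)
# Zero-width lookahead so that overlapping occurrences are all reported.
MOTIF_RE = re.compile(f"(?=({_ALTERNATION}))")
PREFERRED_LENGTH = 11  # IMGT-defined CDR H3 length reported in the text


def motif_hits(cdrh3: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched substring) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF_RE.finditer(cdrh3.upper())]


def summarize(cdrh3: str) -> dict:
    """Length check plus motif hits for one CDR H3 sequence."""
    return {
        "length": len(cdrh3),
        "preferred_length": len(cdrh3) == PREFERRED_LENGTH,
        "motif_hits": motif_hits(cdrh3),
    }


if __name__ == "__main__":
    # Toy, clearly artificial sequences for illustration only.
    for seq in ("GGHAIGGIGGG", "GHGVGGIGGGG", "GGGGGGGGGGGG"):
        print(seq, summarize(seq))
```

Using a lookahead keeps every occurrence even when two candidate motifs overlap, which a plain `finditer` over the alternation would miss.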
Nevertheless, some individuals may have difficulty generating an IGHV2-5/IGLV2-14-encoded RBD antibody response because of the IGHV2-5 alleles they carry (Figure 2). Since no copy-number variation is known for IGHV2-5, 38 each person should carry two copies of IGHV2-5 in the genome. If both copies are the IGHV2-5*01 allele, the person may lack a suitable B cell germline clone to produce an IGHV2-5/IGLV2-14-encoded RBD antibody response. In fact, donor 112 among the 13 healthy donors analyzed in this study is very likely IGHV2-5*01 homozygous, since 94% of their IGHV2-5-encoded antibodies were assigned to IGHV2-5*01 (Table S2). Moreover, the conserved HxIxxI motif in the CDR H3 of IGHV2-5/IGLV2-14-encoded RBD antibodies is mostly encoded by random N-nucleotide addition. As a result, B cell germline clones that can produce IGHV2-5/IGLV2-14-encoded RBD antibodies may be relatively rare. These results suggest that the ability to generate an IGHV2-5/IGLV2-14-encoded cross-neutralizing antibody response is heterogeneous among individuals. While allelic preference has previously been described for neutralizing antibodies to other viruses, 28,39-41 its clinical implications for COVID-19 remain to be fully explored.
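The reasoning about donor 112 (a very high fraction of IGHV2-5 sequences assigned to *01 suggests *01 homozygosity) can be sketched as a per-donor allele-fraction check. This is a minimal sketch, assuming allele calls have already been made (for example, by IgBLAST) and are available as plain allele-name strings. The 0.9 cutoff and the helper names are hypothetical choices for illustration, not the criterion used in this study, and the example counts are toy values rather than data from Table S2.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Hypothetical cutoff: fraction of IGHV2-5 sequences assigned to *01 at or
# above which a donor is flagged as probably *01 homozygous. Staying below
# 100% tolerates occasional allele misassignment.
HOMOZYGOUS_CUTOFF = 0.9


def allele_fractions(calls: Iterable[str]) -> dict[str, float]:
    """Fraction of sequences assigned to each allele for one donor."""
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def likely_01_homozygous(calls: Iterable[str], cutoff: float = HOMOZYGOUS_CUTOFF) -> bool:
    """Flag a donor whose IGHV2-5 sequences are overwhelmingly IGHV2-5*01."""
    return allele_fractions(calls).get("IGHV2-5*01", 0.0) >= cutoff


if __name__ == "__main__":
    # Toy counts only, not real donor data.
    toy_donor = ["IGHV2-5*01"] * 94 + ["IGHV2-5*02"] * 6
    print(allele_fractions(toy_donor))
    print(likely_01_homozygous(toy_donor))
```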

STAR★METHODS
Detailed methods are provided in the online version of this paper and include the following:

AUTHOR CONTRIBUTIONS
M.Y. and N.C.W. conceived and designed the study. All authors performed data analysis. M.Y. and N.C.W. wrote the paper, and all authors reviewed and/or edited the paper.

Materials availability
All plasmids generated in this study are available from the Lead Contact without restriction.
Data and code availability
• The assembled dataset of IGHV2-5/IGLV2-14-encoded RBD antibodies is provided in Table S1.
• Custom Python scripts for all analyses have been deposited to https://doi.org/10.5281/zenodo.7196474.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODELS AND SUBJECT DETAILS
ExpiCHO cells (Chinese hamster ovary cells, female) and Expi293F cells (human embryonic kidney cells, female) were maintained in ExpiCHO expression medium and Expi293 expression medium, respectively, at 37 °C with 8% CO2 according to the manufacturer's instructions (Thermo Fisher Scientific).

Collection of antibody information
The information on antibodies S24-223, P2B-1E4, 2-7, LY-CoV1404, XG005, XG031, and COV2-2268 was compiled in our previous study, 13 whereas the information on XGv042, XGv264, XGv265, and XGv266 was obtained from CoV-AbDab. 45 Neutralization data for each monoclonal antibody were collected from the original papers (Table S1). Somatic hypermutations were identified by IgBLAST. 29
Allele assignment of IGHV2-5/IGLV2-14-encoded RBD antibodies
For antibodies P2B-1E4, XG005, and XG031, the allele information was obtained from the original publications. 15,17 For the other antibodies, IgBLAST was used to assign the allele of each antibody. 29 The nucleotide sequence, if available, was used as input for IgBLAST; otherwise, the protein sequence was used. If an antibody was equally likely to be encoded by two or more alleles, the allele assignment was classified as "ambiguous". All "ambiguous" allele assignments in this study came from antibodies without available nucleotide sequence information, namely XGv264, XGv265, and XGv266. Of note, although IgBLAST showed that XGv266 was equally likely to be encoded by IGHV2-5*01 and IGHV2-5*02, we postulated that XGv266 should be assigned to IGHV2-5*02 at the nucleotide level. Specifically, XGv266 has a Glu at VH residue 54, which is one nucleotide change away from the Asp codon of IGHV2-5*02 but two nucleotide changes away from the Asn codon of IGHV2-5*01. However, IgBLAST does not use codon information for allele assignment when an amino acid sequence is used as input.
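The codon argument for XGv266 can be checked with a short script over the standard genetic code. This is a sketch that assumes only the standard codon table (GAT/GAC for Asp, GAA/GAG for Glu, AAT/AAC for Asn); because the specific germline codons are not given here, it reports, for every codon of the starting residue, the fewest changes needed to reach any codon of the target residue. The helper names are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code: codons for the three residues discussed in the text.
CODONS = {
    "Asp": ("GAT", "GAC"),  # residue 54 in IGHV2-5*02
    "Asn": ("AAT", "AAC"),  # residue 54 in IGHV2-5*01
    "Glu": ("GAA", "GAG"),  # residue 54 in XGv266
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_changes_from_each(aa_from: str, aa_to: str) -> dict[str, int]:
    """For each codon of aa_from, the fewest nucleotide changes to any codon of aa_to."""
    return {
        src: min(hamming(src, dst) for dst in CODONS[aa_to])
        for src in CODONS[aa_from]
    }


def min_nt_changes(aa_from: str, aa_to: str) -> int:
    """Fewest nucleotide changes between any codon pair of the two residues."""
    return min(hamming(a, b) for a, b in product(CODONS[aa_from], CODONS[aa_to]))


if __name__ == "__main__":
    print(min_changes_from_each("Asp", "Glu"))
    print(min_changes_from_each("Asn", "Glu"))
```

Because every Asp codon is one change from a Glu codon and every Asn codon is two changes away, the argument holds regardless of which synonymous codon the germline actually uses.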
Analysis of allele usage in published antibody repertoire
Published antibody repertoire sequencing datasets from 13 healthy donors 23,24 were downloaded from cAb-Rep. 46 Putative germline gene alleles for each antibody sequence in these datasets were identified by IgBLAST. 29
Analysis of CDR H3 sequences
Sequence alignment was performed using MAFFT. 47 Antibody sequences in the human antibody repertoire were downloaded from the Observed Antibody Space. 44 IGHV2-5-encoded antibodies and their CDR H3 sequences were identified using IgBLAST. 29 Sequence logos were generated by WebLogo. 48 Putative germline sequences and segments in the V-D-J junctions were identified by IgBLAST. 29
Expression and purification of Fabs
The heavy and light chains were cloned into phCMV3. The plasmids were transiently co-transfected into ExpiCHO cells at a ratio of 2:1 (heavy chain:light chain) using ExpiFectamine CHO Reagent (Thermo Fisher Scientific) according to the manufacturer's instructions. The supernatant was collected 10 days post-transfection. The IgGs and Fabs were purified with CaptureSelect CH1-XL Affinity Matrix (Thermo Fisher Scientific) followed by size exclusion chromatography.

Expression and purification of RBD
The receptor-binding domain (RBD) (residues 319-541) of the SARS-CoV-2 spike (S) protein (GenBank: QHD43416.1) was previously cloned into the phCMV3 vector and fused with a C-terminal His6 tag. 49 The plasmids were transiently transfected into Expi293F cells using ExpiFectamine 293 reagent (Thermo Fisher Scientific) according to the manufacturer's instructions. The supernatant was collected 7 days post-transfection. The His6-tagged proteins were then purified with Ni Sepharose Excel protein purification resin (Cytiva) followed by size exclusion chromatography.

ΔΔG calculation
Our ΔΔG calculation was based on the structure of SARS-CoV-2 S bound to LY-CoV1404 (PDB ID: 7MMO). 7 Pyroglutamic acid (PCA) and 2-acetamido-2-deoxy-β-D-glucopyranose (NAG) were removed using PyMOL (Schrödinger). Coordinates for one heavy chain and the RBD that interacted with it (amino acid residues 334-527) were extracted using PyMOL. The resulting PDB file was then renumbered using the "pdb_reres.py" script in pdb-tools. 51 The ΔΔG between the WT and D54N mutant heavy chains was calculated using Rosetta (Rosetta Commons). A constraint file was first generated, and the global structure was relaxed using the "relax" application. 25 Of the 30 relaxed poses, the pose with the lowest score was used for the ΔΔG calculation. 26,27 One hundred poses were generated for the ΔΔG calculation, and the ΔG values of the WT and mutant antibodies were obtained from the lowest-scoring pose. Here,
$$\Delta G = \Delta G_{\text{antibody/RBD complex}} - \left(\Delta G_{\text{antibody}} + \Delta G_{\text{RBD}}\right)$$
and ΔΔG was calculated as
$$\Delta\Delta G = \Delta G_{\text{mutant}} - \Delta G_{\text{WT}}$$
A positive ΔΔG value indicates that the binding energy of the mutant antibody/RBD complex is higher (i.e., less favorable) than that of the WT antibody/RBD complex. The lowest-scoring pose for the D54N mutant is shown in Figure 2B.
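The ΔG and ΔΔG bookkeeping above can be expressed as a short sketch. It assumes the per-pose total scores for the complex and for the two separated partners have already been extracted from Rosetta output into plain numbers (parsing Rosetta files is not shown), and it interprets "lowest-scoring pose" as the pose with the lowest complex score. `PoseScores` and the other names are hypothetical, and the numbers in the example are toy values, not results from this study.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PoseScores:
    """Total scores (Rosetta energy units) for one pose, supplied by the user."""
    complex_score: float   # antibody/RBD complex
    antibody_score: float  # antibody alone
    rbd_score: float       # RBD alone

    @property
    def dg(self) -> float:
        # dG = dG(complex) - (dG(antibody) + dG(RBD))
        return self.complex_score - (self.antibody_score + self.rbd_score)


def dg_from_lowest_pose(poses: Sequence[PoseScores]) -> float:
    """dG of the pose with the lowest complex score (assumed selection criterion)."""
    best = min(poses, key=lambda p: p.complex_score)
    return best.dg


def ddg(mutant_poses: Sequence[PoseScores], wt_poses: Sequence[PoseScores]) -> float:
    """ddG = dG(mutant) - dG(WT); a positive value means the mutant binds less favorably."""
    return dg_from_lowest_pose(mutant_poses) - dg_from_lowest_pose(wt_poses)


if __name__ == "__main__":
    # Toy scores only.
    wt = [PoseScores(-100.0, -40.0, -50.0), PoseScores(-90.0, -40.0, -45.0)]
    d54n = [PoseScores(-95.0, -40.0, -48.0), PoseScores(-85.0, -39.0, -47.0)]
    print(ddg(d54n, wt))
```

If ΔG were instead taken as the minimum over all poses, only `dg_from_lowest_pose` would need to change (for example, `min(p.dg for p in poses)`).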

QUANTIFICATION AND STATISTICAL ANALYSIS
Standard deviations for KD estimates were computed by Octet analysis software 9.0.