Sequencing and Modeling of Anti-DNA Immunoglobulin Fv Domains: Comparison with Crystal Structures*

Models for the three-dimensional structures of the combining regions of six DNA-binding antibodies have been derived from the sequence data for their Fv domains presented here. Using the amino acid sequences and the canonical structure classes described by Chothia and Lesk (Chothia, C., and Lesk, A. M. (1987) J. Mol. Biol. 196, 901), model loops were selected from immunoglobulin domains of known structure for five of the six antibody hypervariable regions. Models for the third complementarity-determining region of the heavy chain were constructed from known immunoglobulin loops of similar length and sequence. Comparison of three of the models with the respective crystal structure indicates that this procedure can generate a working model of the antibody combining region that provides useful information on the nature of the interactions between antibodies and nucleic acids. As part of our continuing investigation into the structural basis of antibody-DNA recognition, the observed and predicted models for the combining regions of nucleic acid-binding antibodies have been examined. In general, single strand-specific antibodies have deep clefts where the antigen might bind, whereas duplex-specific antibodies present a relatively flat surface. In addition, on the basis of both sequence and structure, there is little to distinguish autoimmune antibodies from those produced by immunization. Testable hypotheses for how these antibodies might interact with single- and double-stranded nucleic acids are presented.

known crystal structures of immunoglobulin Fab fragments. The resulting model Fv structures were then subjected to a limited energy minimization (30) to relieve unfavorable van der Waals contacts. The electrostatic surface potentials of crystal structures and of the modeled Fv fragments were calculated using the program DELPHI implemented within version 2.1.0 of INSIGHT II (Biosym Technologies, Inc.) using only the protein formal charges. For these calculations, the protein dielectric was set to 2.0, and the solvent dielectric to 80.0. The radius of the solvent probe was 1.8 Å, and the ionic strength was set to 0.145 M.
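To give a feel for what the ionic strength setting implies, the following minimal sketch estimates the Debye screening length for a 1:1 electrolyte at 0.145 M in a solvent of dielectric 80.0. It is not part of the DELPHI calculation itself. The temperature of 298.15 K and the function name `debye_length_angstrom` are assumptions introduced here for illustration; the text does not state a temperature.

```python
import math

# Physical constants in SI units
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K
NA = 6.02214076e23       # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19  # elementary charge, C


def debye_length_angstrom(ionic_strength_molar,
                          solvent_dielectric=80.0,
                          temperature_k=298.15):  # temperature is an assumption
    """Debye screening length (in angstroms) for a 1:1 electrolyte."""
    # Convert mol/L to mol/m^3
    ionic_strength_si = ionic_strength_molar * 1000.0
    # kappa^2 = 2 * NA * e^2 * I / (eps0 * eps_r * kB * T)
    kappa_sq = (2.0 * NA * E_CHARGE ** 2 * ionic_strength_si) / (
        EPS0 * solvent_dielectric * KB * temperature_k
    )
    # Convert metres to angstroms
    return 1e10 / math.sqrt(kappa_sq)


if __name__ == "__main__":
    print(f"{debye_length_angstrom(0.145):.2f} A")
```

Under these assumptions the screening length comes out at roughly 8 Å, which suggests that charged groups more than a few such lengths apart contribute little to the computed surface potential.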

generates loops ab initio based on energetic constraints, whereas the knowledge- or template-based approach uses loops from known crystal structures as models for the new loops. Several groups have reported some success predicting protein loop conformations de novo (2-4), whereas others have proven equally successful using the template-based approach (5-8), and still others have employed a combination of the two approaches (9). As the data base of high resolution Fab structures grows, it becomes increasingly reasonable to model the combining sites of other antibodies based on these known structures using a template-based approach.
The observed conformations of the antibody CDR loops may be influenced by interactions with framework residues and the other hypervariable regions of the molecule. One potential disadvantage of the template-based prediction of the conformation of antibody CDRs is that variation in the association of the VL and VH domains (10) can alter the observed conformation. However, given the estimated coordinate error in the observed structures, the very simple template-based approach using canonical structures (1) provides an adequate description of the antibody CDRs and the general properties of the combining regions of DNA-binding antibodies. In particular, the distribution of positively charged residues within the combining region may indicate areas of the antibody that are likely to interact with the negatively charged sugar phosphate backbone of the DNA. If the orientation of the DNA backbone with respect to the combining region can be fixed, then antibody residues that are in position to interact with the DNA can be identified. The ability to predict the conformation of the CDR loops from amino acid sequence data makes it possible to derive plausible models for specific antibody-DNA interactions that can be tested experimentally and improved upon even in the absence of crystallographic data for the Fab and Fab-DNA complexes.
To extend our studies of DNA-binding antibodies, we have determined the sequences of the Fv domains of six murine anti-nucleic acid antibodies and used the canonical structure model (1, 5) to predict the loop conformations. Three of these antibodies were derived from autoimmune mice, whereas the others were prepared by immunization of mice with the respective nucleic acid polymer. These results allow us to compare autoimmune and induced antibodies on the basis of both sequence and structure to determine whether there are any obvious differences between the antibodies obtained from these two sources.
The crystal structures of three of the Fab fragments have been determined and are compared with the modeled structures of their variable domains. The structure of Fab Hed 10 (11) has been determined to 2.4-Å resolution, whereas the structures of Fab Jel72 and Fab Jel318 have been determined to resolutions of 2.7 (12) and 2.8 (13) Å, respectively. The pre-
Hybridoma supernatants were screened for antibody production and the retention of their original specificity with the aid of a solid-phase radioimmune assay (18).
Preparation of RNA-Total cellular RNA was prepared from hybridomas grown in tissue culture by a modification of the procedure of Chomczynski and Sacchi (20). Briefly, cells were denatured by vigorous vortexing in 5 ml of 4 M guanidinium isothiocyanate (Sigma), 25 mM Tris-HCl, pH 8.0, 0.5% N-lauroylsarcosine, and 0.1 M 2-mercaptoethanol. Following this, 0.5 ml of 2 M sodium acetate, pH 4.0, 5.0 ml of water-saturated phenol, and 1 ml of chloroform were added sequentially and mixed thoroughly after the addition of each reagent. The suspension was then cooled on ice for 15 min, followed by centrifugation at 10,000 × g for 20 min. The aqueous phase was transferred, and the RNA was precipitated at -20 °C in 1 volume of isopropyl alcohol. Total RNA was sedimented at 10,000 × g for 20 min, and the resulting pellet was dissolved in 0.7 ml of the denaturing solution. This RNA was once again precipitated using 1 volume of isopropyl alcohol, and the pellet was washed three times in ice-cold 70% ethanol. The final pellet was dried under a stream of nitrogen gas and dissolved in 500 mM NaCl and 10 mM Tris-HCl, pH 7.6, for subsequent isolation of the poly(A+) RNA by oligo(dT)-cellulose chromatography (21). Total RNA was applied to an oligo(dT)-cellulose chromatography column (Sigma) pre-equilibrated with 500 mM NaCl and 10 mM Tris-HCl, pH 7.6. The RNA sample was collected and reapplied to the column five times. The column was then washed with equilibration buffer, and poly(A+) RNA was eluted using 10 mM NaCl and 10 mM Tris-HCl, pH 7.6, at 50 °C. The resulting poly(A+) RNA was precipitated, washed three times in ice-cold 70% ethanol, and dried under a stream of nitrogen gas.
Oligonucleotide Primers-Since a portion of both the heavy and light chain mRNAs is conserved, synthetic oligonucleotide primers have been constructed that are complementary to the constant region of each chain. The light chain primer, C,-(17) (d(TGGATGGTGGGAAGATG)), hybridizes 26 nucleotides into the constant region, whereas the heavy chain primer, C,-(15) (d(GGCCAGTGGATAGAC)), hybridizes 21 nucleotides into the constant region (22). Additional synthetic primers used to complete the sequencing were as follows: lO . The nomenclature is such that 318-L2, for example, refers to the second light chain primer for Jel318. All synthetic oligonucleotide primers were produced by the Regional DNA Synthesis Laboratory, Department of Medical Biochemistry, University of Calgary (Calgary, Alberta, Canada).
RNA Sequencing-Dideoxynucleotide sequencing was performed essentially as reported by Kaartinen et al. (22). Initially, 5 µg of poly(A+) RNA was annealed at 60 °C for 10 min to 10 ng of the appropriate primer in a 10-µl volume containing 75 mM Tris-HCl, pH 8.3, 72 mM NaCl, 15 mM MgCl2, 1.5 mM dithiothreitol, and 1.5 mM EDTA. The reaction was cooled to room temperature, and 10 µCi of >1000 Ci/mmol α-35S-dCTP (DuPont NEN) was added to the annealing mixture. Following this, 20 units of avian myeloblastosis virus reverse transcriptase (Life Technologies, Inc.) was added, and an equal amount of the annealing mixture was dispensed to four reaction tubes. All four reactions contained 50 µM each dATP, dGTP, and dTTP (Pharmacia LKB Biotechnology Inc.), but differed in the amount of ddNTP. The ddATP-specific reaction contained 30 µM ddATP, the ddGTP-specific reaction contained 10 µM ddGTP, the ddTTP-specific reaction contained 30 µM ddTTP, and the ddCTP-specific reaction contained 2.5 µM ddCTP. The reactions were incubated for 25 min at 45 °C, followed by the addition of 2 µl of chase solution containing 0.5 µM each dATP, dCTP, dGTP, and dTTP. The reactions were continued for 20 min, after which NaOH was added to 0.1 M, and the reactions were incubated at 60 °C for 30 min to degrade the mRNA. The reactions were stopped by the addition of 2 µl of 95% deionized formamide (Life Technologies, Inc.) and 10 mM EDTA, followed by heating at 90 °C for 4 min. All sequences were confirmed, and any ambiguities were resolved by performing additional sequencing using α-35S-dATP (>1000 Ci/mmol; DuPont NEN) under the following conditions. The ddATP-specific reaction consisted of 1 µM ddATP and 60 µM each dCTP, dGTP, and dTTP. The ddCTP reaction tube contained 9 µM ddCTP, 4 µM dCTP, and 80 µM each dGTP and dTTP. The ddGTP reaction tube contained 4 µM ddGTP, 4 µM dGTP, and 80 µM each dCTP and dTTP. Finally, the ddTTP-specific reaction contained 7 µM ddTTP, 4 µM dTTP, and 80 µM each dCTP and dGTP.
The remainder of the sequencing protocol was identical to the α-35S-dCTP protocol. Jel274 was sequenced from the cDNA by a polymerase chain reaction technique that will be published elsewhere.²
Model Building-An examination of the amino acid sequences of the anti-DNA antibodies revealed that many of the sequences possessed significant homology to antibodies with known crystal structures, particularly in their VL domains (see Fig. 1). Additionally, in most of the sequences examined, the hypervariable loops belonged to one of the canonical structures described by Chothia and Lesk (1). For this reason, we decided to attempt to model their combining sites using the loop designations and procedures described by them. The sequences of the hypervariable loops are presented in Table II. Also shown are those key residues that determine the loop conformation, the canonical structure class to which each loop was assigned, and the name of the known structure from which the model was derived.
Parent structures for the VH and VL framework regions were selected, and those amino acids that differed were replaced with the amino acids corresponding to the anti-DNA antibodies in such a way as to maximize side chain overlap. Model loops were grafted onto these framework structures by superimposing the backbone atoms (nitrogen, carbon, and α-carbon) of the 3 residues preceding and the 3 residues following the loop onto the corresponding atoms of the parent domains.
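The grafting step described above can be sketched as a least-squares superposition of the flanking backbone atoms. The example below uses the Kabsch algorithm, which is one common way to do this and is an assumption here; the text does not say which superposition routine was used. The function names `kabsch_superpose` and `graft_loop` and the coordinate arrays are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Return (R, t) such that mobile @ R.T + t best fits target (least squares)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape:
        raise ValueError("anchor atom sets must have the same shape")
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    # 3x3 covariance matrix of the centered coordinate sets
    covariance = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - rotation @ mobile_center
    return rotation, translation


def graft_loop(loop_coords, loop_anchor_coords, framework_anchor_coords):
    """Move a template loop onto a framework using its flanking anchor atoms.

    loop_anchor_coords / framework_anchor_coords: N, CA and C atoms of the
    3 residues before and the 3 residues after the loop (18 atoms each).
    """
    rotation, translation = kabsch_superpose(loop_anchor_coords,
                                             framework_anchor_coords)
    return np.asarray(loop_coords, dtype=float) @ rotation.T + translation
```

Fitting only on the flanking residues keeps the internal conformation of the template loop unchanged; only its placement relative to the framework is determined by the fit.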
For the special case of the H3 loop, models were selected based on their length and amino acid sequence homology. If an existing H3 loop of the same size was not available, then the nearest length loop was selected; the residue(s) that were deemed to be at the apex of the loop were added or excised; and the loop was closed by rotating the two newly exposed end residues about their φ and ψ torsion angles to bring the respective nitrogen and carbonyl carbon atoms to within approximate peptide bonding distance. A primary consideration in selecting a model H3 loop was whether both the modeled and existing loops contained arginine at position 94 and aspartic acid at position 101. In the McPC603 Fab structure, these 2 residues form a salt link at the base of the H3 loop (24).
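The H3 template choice described above can be illustrated with a small ranking sketch. The ordering of the criteria (length difference first, then agreement at positions 94 and 101, then sequence identity) is an assumption made for this example, as are the class name `H3Loop`, the ungapped identity measure, and the loop sequences, which are invented and do not come from the antibodies studied here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class H3Loop:
    name: str
    sequence: str   # one-letter codes of the loop residues (hypothetical)
    has_r94: bool   # arginine at position 94
    has_d101: bool  # aspartic acid at position 101


def percent_identity(a, b):
    """Crude ungapped identity over the shorter of the two sequences."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / n


def rank_templates(target, candidates):
    """Sort candidate loops from most to least suitable for the target."""
    def key(candidate):
        length_gap = abs(len(candidate.sequence) - len(target.sequence))
        salt_link_mismatch = not (candidate.has_r94 == target.has_r94
                                  and candidate.has_d101 == target.has_d101)
        identity = percent_identity(candidate.sequence, target.sequence)
        # Smaller tuples sort first: short gap, matching 94/101, high identity
        return (length_gap, salt_link_mismatch, -identity)
    return sorted(candidates, key=key)


if __name__ == "__main__":
    target = H3Loop("target", "GYYGSSY", True, True)
    candidates = [
        H3Loop("tmplA", "GYYGSSYAM", True, True),
        H3Loop("tmplB", "GFYGNSY", True, True),
        H3Loop("tmplC", "GYYGSSW", False, True),
    ]
    print([c.name for c in rank_templates(target, candidates)])
```

A loop of the wrong length would still need residues added or excised at its apex and re-closure, which this sketch does not attempt.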
After the six loops had been assembled, the VH and VL domains of the model structures were superimposed onto the VH and VL domains of

RESULTS AND DISCUSSION
Sequences of Anti-DNA Antibodies-The sequences of all six antibodies are shown in Fig. 1. It is of considerable interest to compare the sequences, gene usage, and homologies within this group since they all bind to a similar antigen. In addition, three of the antibodies (Hed 10, Jel274, and Jel242) were produced from autoimmune mice, and the origin of this type of antibody is obscure. A summary of gene usage among these antibodies is shown in Table III.
The antibodies were found to be from three different VH gene families. This result is consistent with earlier studies that have found anti-nucleic acid antibodies mainly in the J558 family, but also in the 7183, S107, and VH10 families (36-41). Thus, there appears to be little restriction on the usage of VH genes to generate DNA binding potential. Moreover, in contrast to some reports, there is no bias in these autoantibodies for the utilization of genes from the most 3'-gene families since VH J606 and J558 are at the 5'-end of the genome (42, 43). In other words, these antibodies did not likely arise from rearrangements that occurred early in B-cell ontogeny.
The question also arises as to whether autoimmune antibodies and experimentally induced antibodies are derived from related or different gene origins. Brigido and Stollar (41) reported that two anti-Z-DNA antibodies use the VH10 gene family and that the sequences are very closely related to two mouse IgG anti-DNA autoantibodies. In contrast, our results suggest that there is no obvious preference in VH gene usage for either autoimmune or experimentally induced antibodies. In addition, as judged from the percent homology to known VH genes, there are similar degrees of somatic mutation in this group of antibodies, suggesting that the autoimmune antibodies have also arisen by antigen stimulation. Overall, it is difficult to distinguish the autoimmune from the induced antibodies on the basis of sequence or gene usage. Hed 10, Jel274, and Jel242 appear to be typical DNA-binding antibodies. The only common theme is that neither of the single strand-specific antibodies uses the VH J558 gene family.

Residue numbering corresponds to that of Kabat et al. (25). The positions of the antibody CDRs are indicated above the sequences by a solid bar. A, VH domains; B, VL domains.
Although not shown in Table III Table IV. Since the modeled structures of the H3 hypervariable loops are the most likely to be in error, their conformations are compared with the crystallographically determined conformations in Fig. 2. The positions of the backbone atoms of all of the modeled loops, with the exception of H3, are within 1.0-Å deviation of the crystallographically observed loop conformations. The backbone atoms of the H3 loops deviate by only slightly more than 1.0 Å. When all of the side chain atoms are included, the r.m.s. deviation from alignment for most of the loops increases to between 1.0 and 2.0 Å. The large deviations of the side chain atoms for the H3 loops and the H2 loop of Jel 318 can be attributed to radically different orientations of a few residues with large side chains (Lys, Arg, and Tyr). In a free Fab fragment, these problematic residues would, for the most part, be exposed to solvent, and the side chains of these residues may exist in multiple conformations. Additionally, crystal packing interactions influence the conformations of some of these side chains in the observed structures. The variable domains of both Hed 10 and Jel318 pack closely against constant domains of symmetry-related molecules.
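The observation that a few large side chains account for most of the all-atom deviation can be checked with a per-residue breakdown of the r.m.s. deviation. The sketch below assumes the model and crystal coordinates have already been superimposed and that atoms are listed in the same order for each residue; the function names, residue labels, and the 2.0-Å cutoff are assumptions chosen for illustration.

```python
import numpy as np


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched (n_atoms, 3) arrays."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def per_residue_rmsd(model, crystal):
    """Map residue label -> RMSD for residues present in both structures.

    model, crystal: dicts of residue label -> (n_atoms, 3) coordinates,
    already superimposed and with atoms in matching order.
    """
    return {res: rmsd(model[res], crystal[res])
            for res in model if res in crystal}


def flag_outliers(deviations, cutoff=2.0):
    """Residues whose deviation exceeds the cutoff (angstroms, assumed)."""
    return sorted(res for res, value in deviations.items() if value > cutoff)
```

Listing the flagged residues alongside their solvent exposure or crystal contacts would show whether a large loop deviation reflects a modeling error or a side chain that is free to adopt several conformations.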
The crystal packing of Jel318 involves contacts made by the H2 loop with the COOH terminus of a symmetry-related constant light chain domain (12). The conformation of lysine H53 is influenced by these interactions, with a possible hydrogen bond existing between the terminal nitrogen on the lysine side chain and the carbonyl oxygen of serine L208. The conformation of the side chain of this lysine differs markedly in the modeled loop, and it is this large difference in overall side chain orientation that contributes to the unusually large r.m.s. deviation for the atoms in this loop. Despite some similarity in the crystal packing of Hed 10 and Jel 318, different residues make contacts. Among the crystal contacts in Hed 10 are some made by residues of the L1 hypervariable loop of the VL domain interacting with a symmetry-related constant light chain domain. It is particularly interesting that the conformation of the L1 loop in Hed 10 was modeled satisfactorily, but the disposition of this loop with respect to the conserved framework residues of the VL domain was different. When the modeled VH and VL domains are superimposed onto the crystal structures by their conserved framework residues, the differences in the positions of the α-carbon atoms of the loops are indicative of whether the entire loop is shifted in its position (data not shown). The L1 loop of Hed 10 is moved inward, toward the combining region, by nearly 3.5 Å in the crystal structure relative to the modeled structure. This positioning of the L1 loop in the crystal allows it to make several crystal contacts and potential hydrogen bonds.
Immunoglobulin CDR loops are on the surface of the protein, as they must be to carry out their biological function. Because of this location, the CDR loops are also very likely to take part in crystal packing interactions. This places a significant constraint on the apparent accuracy one can hope to attain in predicting the structure of an antibody Fv domain. Both the CDR loop templates and the target structures are derived from crystal structures; therefore, crystal packing interactions can affect both the quality of the predicted model and our evaluation of its quality.
Another means to assess the quality of a model is to determine the degree to which two independent crystallographic structures are more similar to each other than to the model.
The crystals of Fab Jel72 contained two independent Fab molecules in the asymmetric unit, and these two molecules were refined independently. Comparison of the r.m.s. deviations between the backbone atoms of the CDR loops in these two copies of Jel 72 with those between the model and one of the Fab fragments (Table V)   loops, except H3. Considering the simplicity of the template-based modeling, even the H3 loop results are not discouraging.
In evaluating the Fv domain model structures or drawing conclusions from them, a couple of provisos must be kept in mind. One is that the CDR loops, especially H3, tend to have high temperature factors, indicating that motion or alternative conformations are possible. A second is that there may be conformational changes upon binding antigen. In the case of anti-DNA Fab BV04-01, the temperature factors for the H3 loop are reduced in Fab complexes with the trinucleotide d(pT3), and there is a significant alteration in the structure of the loop (31). The longer glycine-containing H3 loops, such as that of Jel 72, may be relatively flexible and assume a conformation that maximizes interaction with the DNA during binding. Nonetheless, a good approximation of the antibody combining region can be an advantage in designing further experiments.

DNA Binding Models-An examination of the relative disposition of the antibody CDRs and the electrostatic potential of the crystallographically determined structures has aided us in deriving testable hypotheses for how these antibodies may interact with their respective nucleic acid antigens. Descriptions of how these antibodies might interact with nucleic acids and suggested mutations to test their validity are presented in more detail elsewhere for Jel 318 (12) and Jel 72 (13). We present here models for the interaction with nucleic acids of the Fv domains whose structures we have derived from our sequence data.
It has been proposed that antibody combining sites are enriched in relatively solvent-exposed aromatic residues and that the burying of these large rigid side chains by antigen contributes to higher binding constants (33). DNA-binding antibodies are also rich in aromatic amino acids, some of which no doubt interact with DNA. A comparison of the model and crystal structures shows that the solvent accessibility of the individual residues of each loop, including H3, is similar in both structures (data not shown). Aromatic amino acid residues pertinent to the binding of antigen could be discerned on the basis of their solvent accessibility.
An examination of the electrostatic surface potential of antibody combining sites can be useful in determining where charged groups on the antigen may bind. For DNA-binding antibodies, the primary concern is where the negatively charged phosphate groups of the DNA backbone will interact. Electrostatic surface potentials of the model Fv domains were calculated and are compared with the surface potentials of the crystal structures, where applicable, in Fig. 3. Although there are some differences in the potential surfaces, the general features of the combining regions of the models do correspond with the calculated potentials of the observed structures.
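As a rough, sequence-level companion to the surface potentials discussed above, the net formal charge of each hypervariable loop can be tallied directly from its sequence. This is only a counting sketch and not a substitute for the DELPHI calculation: it assumes Lys and Arg carry +1, Asp and Glu carry -1, and histidine and the chain termini are neutral. The function names and the loop sequences in the usage note are hypothetical.

```python
# Formal charges assumed per residue type; all other residues count as 0
FORMAL_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_formal_charge(sequence):
    """Sum of side chain formal charges for a one-letter-code sequence."""
    return sum(FORMAL_CHARGE.get(residue, 0) for residue in sequence.upper())


def charge_by_loop(loops):
    """Map loop name -> net formal charge, given loop name -> sequence."""
    return {name: net_formal_charge(seq) for name, seq in loops.items()}


if __name__ == "__main__":
    # Hypothetical loop sequences, for illustration only
    example = {"L1": "KASQDVS", "H2": "EIRLKSNNYA"}
    print(charge_by_loop(example))
```

Loops with a net positive tally are candidates for contact with the phosphate backbone, although the three-dimensional arrangement of the charges, which only the structural model provides, decides where a ridge of positive potential actually forms.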
Single Strand-specific Nucleic Acid-binding Antibodies-Of the six antibodies that we have examined, two are specific for single-stranded nucleic acids: Hed 10, which binds poly(dT), and Jel 201, which is specific for poly(ADP-ribose). The combining site of Hed 10 forms a long cleft roughly parallel to the VH-VL domain interface. The interior of this cleft is particularly enriched in tyrosines, mostly from the H3 loop, and is bordered on the VL side by a large ridge of positive electrostatic potential created by lysine and arginine residues in CDR-L1 and CDR-H2 (Fig. 3A). The current model places the sugar phosphate backbone of the DNA along this positively charged ridge while the thymine bases penetrate into the cleft and stack with the side chains of the tyrosines in the H3 loop.
The VL domain of Jel201 is significantly homologous to Hed 10 (>90% amino acid identity), but its VH domain is different. Specifically, it has a shorter H2 loop and a much longer H3 loop, resulting in the modeled combining region forming a shallower cleft than in Hed 10. The H3 loop of Jel201 was modeled from the H3 loop of McPC603, despite the fact that it lacks the aspartic acid residue at position H101 necessary to form the salt link with arginine H94. Although the model for this loop may not be correct in detail, the general features of the combining region suggest that adenine bases of the nucleic acid antigen may stack between the tyrosine residues and form specific hydrogen bonds with the side chains of the asparagine residues in CDR-H3. The ribose moieties would then interact with the polar residues in CDR-L1 and CDR-L3. This would place the phosphate groups of the nucleic acid in the region of positive potential centered on the H3 and L2 hypervariable loops (Fig. 3G).
Double- and Triple-stranded DNA-binding Antibodies-The combining regions of the double strand-specific (Jel72, Jel274, and Jel 242) and triple strand-specific (Jel 318) anti-DNA antibodies share several features that distinguish them from the single-stranded DNA-specific antibodies. The combining regions of Jel 72, Jel274, Jel242, and Jel318 possess relatively flat surfaces that are bordered by knob-like ridges composed of residues in the H2 loop on the VH side or residues in the L1 or H3 loop on the VL side of the combining region. These projecting knobs often possess strongly positive electrostatic potential, whereas there is commonly an area of negative potential near the L3 side of the combining region (Fig. 3, C-F). If the DNA were to bind with its helix axis parallel to the VH-VL domain interface, then it would be bracketed by the positively charged knobs, but the sugar phosphate backbone of the DNA would have to pass through the area of negative electrostatic potential. The possibility exists that this region serves as a cation-binding site that could act as a bridge between the negatively charged side chains and the DNA. In a heavy atom derivative of Jel 72, the cation UO₂²⁺ binds with high occupancy in this area of the combining site around glutamic acid H50 at the base of CDR-H2. If cation binding is required for DNA binding, then neutralizing the electronegative areas by substituting glutamines or asparagines for glutamic or aspartic acid residues should result in lowered affinities for DNA, whereas a glutamic acid to arginine mutation may increase affinity for DNA.
Another possibility is that the DNA binds with its helix axis inclined to the VH-VL domain interface. The protruding knobs would then be in position to penetrate the major and minor grooves of the DNA and make specific contacts with the DNA bases, an idea first suggested by Stollar (34). In this orientation, the DNA helix would be mostly localized near the H3 side of the combining region, away from the negatively charged areas, and the mutations suggested above should have little effect on DNA binding. A consequence of this model is that it places the residues of the H3 loop within the major groove of the DNA. For Jel 242, substitutions can be made at aspartic acid L28 and glutamic acid H50 to discriminate between the two orientations. To increase the potential for specific hydrogen bonding with the DNA bases, valine H99 could be changed to threonine, and tyrosine H98 substituted with asparagine or glutamine to increase specificity for DNA containing adenines or to arginine for greater guanine specificity. In the crystal structure of a zinc finger protein bound to DNA, arginines were observed to interact quite specifically with guanine bases (35).
A similar type of interaction may be occurring in anti-DNA antibodies. The H3 loop of Jel72, which binds the right-handed duplex of poly(dG).poly(dC), is particularly arginine-rich, whereas the H3 loop of Jel242, which binds B-DNA with little sequence specificity, contains only 1 arginine residue.
Jel274 also binds B-DNA, but has a sequence preference for DNA that is GC-rich (16). The amino acid sequence of the VH domain of Jel274 is significantly homologous to the VH domain of Jel 72 (>84% identity), whereas their VL domains are less similar. This observation, coupled with the fact that both antibodies bind very similar nucleic acids (Table I), suggests that their VH domains are major determinants of the binding affinity for double-stranded DNA that contains guanine and cytosine bases. Both antibodies possess a long L1 and a short H2 hypervariable loop. The distribution of positive charge in the combining region (Fig. 3I) suggests that Jel274 binds double-stranded DNA at an angle, with the long axis of the DNA inclined to the VH-VL domain interface. In this orientation, the H3 loop is positioned within the major groove, whereas residues of the H2 loop are in position to interact with the minor groove of the DNA, similar to the model for Jel 72 binding A-type poly(dG).poly(dC) (13). The arginine residue at position H99 could thus be a major determinant of sequence specificity by interacting specifically with the guanine bases of the DNA.
In summary, the sequences, gene usage, and model or crystal structures of six DNA-binding antibodies have been examined. The antibodies were obtained either from autoimmune sera or by immunization with nucleic acids. Overall, it is difficult to distinguish between these two sets of antibodies based on sequence or gene usage. Whereas no present modeling protocol can precisely predict Fv domain structure, the shape and overall charge distribution of the modeled combining regions generally correspond to those of the crystallographically determined structures. The nature of the antigens for the antibodies discussed here is such that electrostatic forces are likely to play a significant role in the interaction with antibody. Therefore, identifying those regions of the antibody that are likely to interact with the negatively charged groups of the DNA backbone is an important first step in understanding antibody-DNA in-

Sequences and Models of Anti-DNA Fv Domains
potential mechanisms for recognition of nucleic acids. These models may aid in designing antibodies with enhanced or altered specificity for DNA by indicating key residues for substitution. If the affinity for specific DNA sequences can be increased, these antibodies may prove more efficacious for the crystallization of Fab-DNA complexes. The resulting detailed structures of Fab-DNA complexes will assist in the design of antibodies that not only bind to specific DNA sequences, but also catalyze cleavage of the DNA.