Ricin B Chain Is a Product of Gene Duplication*

The cytotoxin ricin is a heterodimer composed of an A chain, which enzymatically inhibits protein synthesis on eukaryotic ribosomes, and a lectin B chain, which binds to cell surfaces and triggers uptake of the toxin. A low resolution (4 Å) x-ray structure revealed that the B chain is a bilobal structure and that each domain binds a galactose sugar. These apparent structural similarities suggested that the protein might have internal symmetry and might have arisen by gene duplication. A subsequent search of the amino acid sequence provided strong evidence for homology between the NH2- and COOH-terminal halves of the B chain and suggested that they may have arisen from a common ancestor. There is no strong relationship between the halves of the A chain and little, if any, significant sequence homology between the A and B chains of ricin.

Ricin is a toxic glycoprotein found in the seeds of Ricinus communis, the castor bean plant. Its extreme toxicity in eukaryotic cells has been attributed to its ability to inhibit protein synthesis. Since Lin et al. (1) reported that ricin showed antitumor activity, the biochemical and biological properties of this protein have been extensively studied by several investigators (2-6). The toxin is composed of two subunits linked together by a single disulfide bond. The A chain (M_r = 30,600) has been shown to inactivate the 60 S ribosomal subunit catalytically, such that it has a greatly reduced interaction with the elongation factors (7-9). The B chain (M_r = 31,400) is a lectin with a high affinity for galactose (10). It is also known to bind cell surface receptors, presumably via oligosaccharide recognition, and thereby to initiate entry into the cell (11-13). Recently, the complete amino acid sequences of both subunits have been determined (14, 15).
We have previously reported the crystallization of ricin (16), and we have now constructed a 4-Å resolution model of the protein which shows the general molecular structure of the two chains. Using difference Fourier methods, we have been able to locate two lactose binding sites on the B subunit. Analysis of the model indicated that the B chain has two lobes, and subsequent sequence comparisons within the B chain amino acid sequence confirm the notion of a two-domain structure for the B subunit. In this report, we provide evidence supporting our conclusion that the B chain of ricin is a product of gene duplication.

* This investigation was supported by United States Public Health Service Grants CA-24059 and AI-13584 from the National Cancer Institute and the National Institute of Allergy and Infectious Diseases. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

EXPERIMENTAL PROCEDURES
X-ray diffraction data were collected by oscillation photography to a resolution of 2.8 Å for native ricin and four isomorphous heavy metal derivatives. Phasing and heavy metal parameter refinement were carried out by conventional crystallographic methods, and electron density maps at 4-Å and 2.8-Å resolution were calculated. Details of the crystallographic analysis will be published later, when our high resolution analysis is complete.
The 4-Å three-dimensional electron density map was traced onto Plexiglas and displayed in a mounting box. Solvent regions were easily discernible through most of the map, facilitating the assignment of molecular boundaries. A model was constructed from %inch balsa wood using the electron density map as a guide and the corresponding density contour paper tracings as templates. Positions of the lactose binding sites were obtained from difference Fourier calculations, using the hk0 and h0l projections of native ricin and of a ricin·lactose complex, formed by soaking the crystals in 2 mM sugar. The sugar peaks were 6 and 4 times the root mean square difference density and could be positioned with confidence.
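The way peak heights are quoted above, as multiples of the root mean square difference density, can be illustrated with a short sketch. It is not the procedure used in this work: the function names, the flat list standing in for a sampled difference map, and the default cutoff of 4 are all assumptions made for illustration.

```python
import math


def rms(values):
    """Root mean square of a sequence of difference-density samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def peaks_in_rms_units(values, threshold=4.0):
    """Return (index, height / rms) for samples at or above `threshold` rms units.

    `values` is a hypothetical flat list of difference-density samples;
    a real map would be a two- or three-dimensional grid.
    """
    sigma = rms(values)
    if sigma == 0:
        return []  # a featureless map has no peaks to report
    return [(i, v / sigma) for i, v in enumerate(values) if v / sigma >= threshold]


# Toy map: 99 flat samples and one strong positive feature.
toy_map = [0.0] * 99 + [10.0]
for index, height in peaks_in_rms_units(toy_map):
    print(f"sample {index}: {height:.1f} x rms")
```

Expressing a peak relative to the rms of the whole map gives a scale-free measure of how far it stands above the background of the difference synthesis.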
Amino acid sequence alignments were analyzed by the computer program KOFO provided by Dr. J. L. Fox. The theory and operation of this program have been discussed extensively (17) and originate from ideas proposed by W. M. Fitch (18). In brief, the program scans peptides of a given length for all possible alignments of the compared sequences, then lists only those within a given range of statistical probability based on the minimum base change per codon (MBC/C) per scanned peptide.
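The scanning step described above can be sketched as follows. This is a minimal illustration and not the KOFO program itself. It assumes the standard genetic code, scores each residue pair by the fewest nucleotide substitutions separating any of their codons, and sums that score over every pair of fixed-length windows. The function names and the `max_mbc_per_codon` cutoff are hypothetical stand-ins; the probability estimate that KOFO uses to filter its output (17) is not reproduced.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each one-letter amino acid code to the list of codons that encode it.
CODONS = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(bases))


def min_base_changes(aa1, aa2):
    """Fewest nucleotide substitutions converting any codon of aa1 into one of aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )


def scan_windows(seq_a, seq_b, length, max_mbc_per_codon):
    """List (start_a, start_b, total_changes) for window pairs under the cutoff.

    Positions are 0-based. The cutoff on MBC/C is an assumed simplification
    of the probability range used by the original program.
    """
    hits = []
    for i in range(len(seq_a) - length + 1):
        for j in range(len(seq_b) - length + 1):
            total = sum(
                min_base_changes(a, b)
                for a, b in zip(seq_a[i:i + length], seq_b[j:j + length])
            )
            if total / length <= max_mbc_per_codon:
                hits.append((i, j, total))
    return hits
```

For a protein compared with itself, as in a search for internal repeats, `seq_a` and `seq_b` would be the two halves of the same chain written in one-letter code.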

RESULTS AND DISCUSSION
The 4-Å ricin model is shown in Fig. 1. The molecule appears to be a compact structure with the two elongated subunits oriented in a roughly parallel fashion. The A chain (the lighter subunit in Fig. 1) has an asymmetrical wedge shape and is more tightly associated with the B chain at its larger end (top in Fig. 1). Very likely, the interchain disulfide is situated in this region, as there is evidence that the bond is partially buried in the interior of the molecule (19). We are presently unable to determine the disulfide position unambiguously from the electron density map, however. The B chain appears to be composed of two similarly shaped domains, each containing a lactose binding site. The lactose sites are shown as white blobs on the darker subunit in Fig. 1. The difference Fourier indicated that one of the lactose sites (upper site) was more highly occupied than the other. Two binding sites for the B chain have previously been suggested (20, 21), which is consistent with the observation that ricin has lectin activity. A prominent groove in the region along the lactose sites indicates that an extended site may exist for oligosaccharide binding. This conclusion is supported by the observation that higher binding constants are obtained for oligosaccharides and glycopeptides than for simple sugars and disaccharides (23). Extended sites have already been proposed for several other lectins (24-26). Although CD measurements indicate that ricin undergoes a small conformational change upon lactose binding (22), no significant conformational difference between the native and the ricin·lactose complex has been detected in our maps.
The similarity in shape of the domains of the B chain and the nearly symmetrical distribution of sugar binding sites suggested that the protein might have arisen by gene duplication, and an analysis of the B chain sequence was therefore undertaken. This search disclosed a striking similarity in the loops between the disulfides. Fig. 2 shows a diagrammatic representation of the B chain sequence emphasizing the repeat unit. Alignment of the sequence using the disulfides as common features revealed several areas of amino acid homology between halves of the protein. Fig. 3 shows the best alignment of B chain, matching sequences 1-132 and 133-260. Strongly homologous areas between the two halves are enclosed by boxes. A statistical analysis using the program KOFO supports our choice of alignments. For example, the 14-residue sequences 92-105 and 218-231 differ by only six base changes, a homology with an estimated probability of 5 X that such a coincidence could arise by chance (17). Also, sequences of residues 12-21 and 141-150 gave a minimum base change of 4 for the decapeptide with an estimated probability of only 4 X 10". The overall minimum base change per codon (27) for the entire alignment as shown in Fig. 3, not considering deletions or insertions, is 1.16.
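For orientation, the base changes quoted above for the two best-matching segments can be put on the same per-codon scale as the overall figure. The short calculation below uses only numbers stated in the text; the helper name is illustrative.

```python
def mbc_per_codon(base_changes, residues):
    """Minimum base changes divided by the number of aligned residues."""
    return base_changes / residues


# Residues 92-105 vs 218-231: six base changes over 14 residues.
print(f"{mbc_per_codon(6, 14):.2f}")  # 0.43
# Residues 12-21 vs 141-150: four base changes over 10 residues.
print(f"{mbc_per_codon(4, 10):.2f}")  # 0.40
```

Both segments fall well below the value of 1.16 obtained for the alignment as a whole.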
Other methods for assessing protein relatedness from amino acid sequence have been developed (28). Although we lack the facilities to perform these other tests, the analysis we have done gives such strong support to our contention that the two halves of ricin B chain are related that we assume other standard tests will also support it.
Similarities in amino acid composition and tryptic maps between the A and B chains have previously been noted (2). However, their significance is not clear since the two subunits show no immunological cross-reactivity (29). KOFO was used to search for homologies between the A and B chain sequences and within the A chain itself. No strong agreement was found. It is interesting to note, however, that the strongest agreement (probability = 2 x was a match between A chain sequence 11-30 and B chain sequence 216-235, which differed by 17 base changes over the 20 residues. Since this is apparently a highly conserved part of the B chain sequence, it may indicate that the chains had a common ancestor in the very distant past, although we stress that the A to B correlation is very much lower than that between the two halves of the B chain.
The evidence provided here presents a strong case for gene duplication in the B subunit of ricin. That is, the protein shows two distinct folding domains of similar size and shape. The linear sequence shows a striking translational repetition of disulfide bonds, loop sizes, and amino acid identities. The B chain also shows two sugar binding sites, which may indicate that each half of the sequence contains residues for one site. If this is so, the domains are oriented in space so as to bring these sites near to one another, perhaps via a pseudo-2-fold relationship. It is possible that the region has developed into a single binding cavity after gene duplication.
The toxins abrin and modeccin also have A and B subunits with almost identical physical and biological properties to those of ricin (2, 30-32). Structural comparisons between these toxins should therefore prove interesting. Areas of high homology in sequence comparisons of the A chains could indicate those regions which participate in A chain-ribosome interactions, while B chain comparisons could pinpoint those areas involved with oligosaccharide binding.
The 2.8-Å resolution analysis now in progress should reveal to what extent the B chain's internal homology persists in the secondary and tertiary structure and may provide a more powerful tool than sequence comparison for assessing the relatedness of the A and B chains.