Nuclear magnetic resonance and molecular modeling studies on O-β-D-galactopyranosyl-(1→4)-O-β-D-xylopyranosyl-(1→O)-L-serine, a carbohydrate-protein linkage region fragment from connective tissue proteoglycans.

The solution conformation of O-β-D-galactopyranosyl-(1→4)-O-β-D-xylopyranosyl-(1→O)-L-serine (GXS), a carbohydrate-protein linkage region fragment from connective tissue proteoglycans, was investigated by two-dimensional NMR spectroscopy and molecular modeling calculations. Specifically, the 1H and 13C resonances were assigned by 2D-COSY and by 1H-13C heteronuclear correlation spectroscopy methods. 2D-NOESY was used to generate distance constraints between the galactose and xylose and between the xylose and serine residues. The 1H vicinal coupling constants for the sugars and the serine were also determined. A general molecular modeling methodology suitable for complex carbohydrates was developed. This methodology employed molecular dynamics and energy minimization procedures together with the application of inter-residue spatial constraints across the linkages derived from 2D-NOESY. The first step in this methodology is the generation of a wide variety of starting conformations that span the (φ, ψ) space for each linkage. In the present study, nine such conformations were constructed for each linkage using the torsion angles φ and ψ corresponding to the gauche+, gauche-, and trans configurations across each of the two bonds constituting the linkage. These conformations were subjected to a combined molecular dynamics/energy minimization refinement using the NOESY-derived constraints as pseudoenergy functions. Families of conformations for the whole molecule were then constructed from the structures derived for each linkage. Characterization of GXS using this methodology identified a single family of conformations that are consistent with the solution-phase NMR data on this molecule.

[...] their importance for the maintenance of normal tissue architecture has long been recognized. More recently, it has also become evident that the glycosaminoglycans and their parent proteoglycans serve important functions in intracellular events as well as in the interaction of the cells with their environment (Höök et al., 1984). Some of the known functions of the proteoglycans and hyaluronan may be viewed as an expression of the general physicochemical properties of these substances, among which their large size and polyanionic character are most prominent (Comper and Laurent, 1978). Other functions, however, are the result of specific interactions involving distinct segments of the complex carbohydrate molecules, as is strikingly illustrated by the anticoagulant activity of heparin, which is due to the binding of antithrombin to a unique pentasaccharide segment in the polysaccharide (Lindahl et al., 1980). A special group of interactions is the recognition of substrate structures by the proteoglycan-synthesizing enzymes (Rodén, 1980), but knowledge in this area is still rudimentary.
In our work, 2D-NMR spectroscopy was employed to deduce information about the conformations of the individual sugars and to determine interresidue distance constraints.
Molecular modeling using those constraints in a combined energy minimization (EM) and molecular dynamics (MD) approach, commonly called simulated annealing, was then used to find a set of models consistent with the experimental data. While the methodology can be adapted to any complex carbohydrate, unbranched oligosaccharides particularly lend themselves to a simplified analysis by this methodology, since the conformation for the whole oligosaccharide can be constructed from the conformations of the individual modules composed of 2 residues linked together (a disaccharide or a monosaccharide linked to an amino acid). Implicit in this modular analysis approach is the assumption that interactions between non-neighboring residues are negligible. Such a situation is commonly realized at the oligosaccharide level in connective tissue proteoglycans, where 1→3 and 1→4 linkages are abundant. Further, this situation can be easily ascertained by comparing the NMR data for di- and higher oligosaccharides.
The absence of NOESY contacts between non-neighboring residues in an unbranched complex carbohydrate such as in GXS is also compatible with this assumption. This methodology was used to characterize the conformational properties of GXS in solution.

MATERIALS AND METHODS
GXS was synthesized as described (Ekborg et al., 1987). The NMR experiments were performed at 25 °C on solutions of GXS in D2O (pD 6.8). The 1H chemical shifts were referenced to the residual HDO signal (4.8 ppm). The 13C chemical shifts were referenced to internal dioxane (67.4 ppm). The NMR measurements were performed at 400 MHz on a Bruker WH-400 spectrometer equipped with an Aspect-2000 computer and a Diablo-31 Drive. All the 2D-NMR experiments (2D-COSY (Aue et al., 1976), 2D-J resolved spectroscopy (Nagayama et al., 1977), 2D-NOESY (Kumar et al., 1980), and the 1H-13C correlation spectroscopy) were performed with 256 t1 and 1K t2 data points. The spectra were plotted in the absolute value mode. The 2D- [...]
In the initial rate regime, the cross-peak intensities (I_AB) are proportional to 1/r_AB^6, where r_AB is the interproton separation (Ernst et al., 1987). The combined use of MD and EM with constraints derived from NOEs to refine model structures for biological molecules is now well established (Clore et al., 1985; Folkers et al., 1989; Kaptein et al., 1985; Levitt, 1983; McCammon and Harvey, 1987; Scarsdale et al., 1988). This procedure is one way to build models that are consistent with the experimental results. It should be emphasized that the purpose of the potential energy function in these calculations is to guarantee reasonable stereochemistry (acceptable bond lengths and bond angles, no unacceptable atomic overlaps, and so on). In this sense, the approach is similar to restrained refinement of x-ray crystallographic models (Konnert and Hendrickson, 1980). Small differences in the relative energies of the various models are not significant; any model that satisfies the observed distance constraints is an acceptable one, while those models that do not do so are not acceptable.
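The initial-rate relation between cross-peak intensity and interproton separation can be turned into a distance estimate once one cross-peak with a known separation is available for calibration. The Python sketch below illustrates only that arithmetic. The calibration step, the function name `distance_from_noe`, and the numbers in the usage example are illustrative assumptions and are not taken from the present work.

```python
def distance_from_noe(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance from a NOESY cross-peak intensity.

    Assumes the initial rate regime, where intensity is proportional to
    r**-6, so that r = r_ref * (I_ref / I) ** (1/6).
    All three arguments are hypothetical inputs chosen by the user.
    """
    if intensity <= 0 or ref_intensity <= 0 or ref_distance <= 0:
        raise ValueError("intensities and reference distance must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical calibration pair: a reference cross-peak at 2.5 A.
    r = distance_from_noe(intensity=1.0, ref_intensity=64.0, ref_distance=2.5)
    print(f"estimated distance: {r:.2f} A")
```

Because of the sixth-root dependence, even a large error in a measured intensity translates into a modest error in the estimated distance, which is one reason a conservative upper-bound treatment of NOE contacts is workable.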
A preliminary report of the protocol used in our methodology has been presented elsewhere (Krishna et al., 1990) and is briefly described in the following. Because of the lack of a general solution to treat multiple minima in the (φ, ψ) space (Ha et al., 1988; Scheraga, 1983), it is essential to construct several starting structures so that they can span the available conformational space in a reasonable manner (Ha et al., 1988). Hence, Stage 1 of the protocol involves the construction of several starting structures.
Our initial attempts to generate such structures by subjecting an arbitrary conformation of GXS to a high temperature (1000 K) MD simulation (a 50-ps MD simulation without constraints, where random atomic velocities corresponding to 1000 K were repeatedly assigned every 5 ps) showed that this MD simulation could not overcome the barriers across the inter-residue linkages. As a result, the conformations tended to localize around three points in the (φ, ψ) space rather than sample the available space in a reasonable fashion. Ha et al. (1988) have chosen, in their simulation of maltose, starting conformations defined on a 20° grid in the (φ, ψ) space, and these were subsequently energy-minimized.
In our approach, we chose a total of nine starting conformations over the (φ, ψ) space for each inter-residue linkage (i.e. corresponding to 81 starting conformations for GXS, and in general 9^n starting conformations for an oligosaccharide with n linkages) and subjected each of these to high temperature MD evolution (1000 K, 5 ps [...]
[...] on GXS were the same as those used in an earlier study of cyclodextrin (Prabhakaran and Harvey, 1987), which were derived from GROMOS (Åqvist et al., 1985; Van Gunsteren et al., 1983) with slight modification, as follows. The partial charges for a single sugar unit including the hydrogens were calculated (Prabhakaran and Harvey, 1987) using the Gaussian 80 (UCSF) program with minimal basis set and without geometrical optimization. The atomic charges for the serine residue were those of GROMOS (Van Gunsteren et al., 1983). The amino group of serine was assumed to be deprotonated and the carboxyl was assumed to be protonated to remove the positive and negative charges so as to make the charge distribution compatible with that in the native core protein. Table I contains a list of partial charges for the GXS molecule used in the computer modeling studies. A distance-dependent dielectric constant was used in the calculations (McCammon and Harvey, 1987). In addition to the usual terms for the covalent and noncovalent interactions, a constraint energy penalty term was added to the potential function for some of the MD/EM runs in the protocol described above. A semiharmonic potential of the following form was used for this term (Kaptein et al., 1985): [...] where K is the force constant (3 kcal/mol Å^2 in the present calculations), r is the interproton distance, and r0 is the NOE cutoff distance. A value of 3.5 Å has been used as a conservative estimate for r0 in our calculations. This constraint energy term was used for all proton pairs that showed NOE contacts.
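The behavior of a semiharmonic NOE penalty term can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the expression used in the calculations: the text fixes only K = 3 kcal/mol Å^2, r0 = 3.5 Å, and the fact that distances below r0 carry no penalty. The prefactor convention, K(r - r0)^2 here rather than (K/2)(r - r0)^2, and the function names are assumptions.

```python
def noe_penalty(r, r0=3.5, k=3.0):
    """Semiharmonic distance-constraint penalty in kcal/mol.

    Zero for r <= r0, quadratic in the violation for r > r0.
    The prefactor k (rather than k/2) is an assumed convention.
    """
    violation = max(0.0, r - r0)
    return k * violation ** 2


def total_noe_energy(distances, r0=3.5, k=3.0):
    """Sum the penalty over all proton pairs that show NOE contacts."""
    return sum(noe_penalty(r, r0, k) for r in distances)


if __name__ == "__main__":
    # Hypothetical interproton distances (in A) for three constrained pairs.
    print(total_noe_energy([3.0, 4.5, 5.5]))
```

With this form, every distance shorter than the cutoff contributes zero energy, so the constraint term does not pull satisfied proton pairs toward any particular separation.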

RESULTS
NMR Spectroscopy-The assignment of the 1H resonances from the Gal, Xyl, and Ser residues based on the sequential connectivities of cross-peaks in the 2D-COSY spectrum is straightforward, and some of the assignments are indicated in Fig. 1. The 13C resonances were assigned from the 1H-13C correlation spectrum. The 1H and 13C chemical shifts for GXS based on these experiments are listed in Table II. Our assign- [...] We have, however, revised their assignment for the serine residue and have assigned the multiplet structure at 4.22 ppm to one of the β-hydrogens, with the multiplet structure centered at 3.97 ppm assigned to the other β-hydrogen. The multiplet structure at 3.92 ppm was assigned to the α-hydrogen.
These proton assignments for serine are consistent with the vicinal coupling constant data in Table III.
A rotamer population analysis (Bystrov, 1976) of the data for the serine residue indicated that it exists predominantly in the "c rotamer" configuration with χ = 60°. The coupling constant data for the two sugars are in excellent agreement with the predicted values for the 4C1 conformations of β-D-pyranosides (Altona and Haasnoot, 1980).
NOESY cross-peaks at two different mixing times (200 ms and 400 ms) are shown. For brevity, a simplified notation which is self-explanatory is used to identify the cross-peaks.
The 2D-NOESY spectrum in Fig. 1 shows several interesting cross-peaks, and some of these have been identified in the figure. Some cross-peaks are intraresidue in origin while others are inter-residue (Gal-H1', Xyl-H4'; Gal-H1', Xyl-H5'; Xyl-H1', Ser-Hβ; etc.). Many of these cross-peaks are too weak to observe at 200 ms mixing time, but they begin to build up in intensity at 400 ms.

Generation of Starting Conformations (Stage 1)-On the basis of the 1H vicinal coupling constant analysis (Ekborg et al., 1987; Van Halbeek et al., 1982), both the sugars were assumed to adopt the 4C1 chair conformations as starting structures in Stage 1. No sugar repuckering was observed in any of the simulations (including the 50-ps simulation mentioned earlier). The exocyclic torsion angle θ (O5'-C5'-C6'-O6') for the galactose residue has been assigned a starting value of 178.2°, as observed in the crystal structure for β-D-galactose (Sheldrick, 1976). Since the side chain of the serine residue exists predominantly in the c rotamer configuration, a value of χ = 60° was used for all the Stage 1 conformations. In defining the Stage 1 conformations, the torsion angles for each linkage (i.e. φ1, ψ1 for X-S and φ2, ψ2 for G-X) were given values that corresponded to the gauche+, gauche-, and trans configurations. This procedure generated 9 conformations for each linkage (see Tables IV and V).
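The Stage 1 enumeration can be sketched as a Cartesian product over the three staggered settings of each torsion angle. In the Python sketch below, the nominal values of +60°, -60°, and 180° for gauche+, gauche-, and trans are assumed idealized angles (the actual starting values are those of Tables IV and V), and the function names are hypothetical.

```python
from itertools import product

# Assumed idealized staggered torsion angles in degrees.
STAGGERED = {"gauche+": 60.0, "gauche-": -60.0, "trans": 180.0}


def linkage_starts():
    """Return the 3 x 3 = 9 (phi, psi) starting pairs for one linkage."""
    return list(product(STAGGERED.values(), repeat=2))


def molecule_starts(n_linkages):
    """Return all 9**n combinations for a molecule with n linkages."""
    return list(product(linkage_starts(), repeat=n_linkages))


if __name__ == "__main__":
    print(len(linkage_starts()))    # starting points per linkage
    print(len(molecule_starts(2)))  # full set for two linkages, as in GXS
```

Under the modular assumption that non-neighboring residues do not interact, only the 9 structures per linkage need to be refined separately, rather than the full 9^n set for the whole molecule.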

MD/EM Calculations-Under the assumption that the interactions between non-neighboring residues are negligible (i.e. between galactose and serine in the case of GXS), it is permissible to subject each linkage in an oligosaccharide separately to the protocol described above. Thus, the inter-residue linkages for the disaccharides X-S and G-X were subjected separately to the protocol, and the resulting variations in the torsion angles due to refinement at different stages are shown in Tables IV and V, respectively. While working with G-X, the linkage oxygen connecting to the serine residue was replaced by an OH group. In a similar fashion, while performing calculations on X-S, the linkage oxygen connecting to the galactose was also replaced by an OH group. The side chain orientation of serine defined by the torsion angle χ remained relatively invariant at the various stages of the refinement (see Table IV). The exocyclic torsion angle θ (O5'-C5'-C6'-O6') for the galactose residue grouped into two values centered around 177.2° and 60.7° in the final stage (not shown). On the other hand, the linkage torsion angles φ1, ψ1, φ2, and ψ2 experienced considerable variations at different stages of the MD/EM refinement. These variations are also plotted in Figs. 2 and 3 to emphasize the convergence of nine starting conformations into distinct families. It is noted that for X-S, the conformations converged into three distinct families, A1, B1, and C1. For the other linkage, G-X, the conformations converged into two distinct families, A2 and B2. Table VI shows a comparison of the average torsion angles and the inter-residue proton distances obtained for the different families. In each case, there was only one set of conformations in the final stage that correctly reproduced the observed NOESY contacts. These are the A1 conformations for X-S and the A2 conformations for G-X.
The other conformations (B1, C1, and B2) represent models that are trapped in local minima of the potential energy function. Interestingly, the calculated potential energies of these other structures are 0.5-5.0 kcal/mol higher than those of conformations A1 and A2. We emphasize that these differences are not an indication that the A1 and A2 conformations are "better" than the other conformations in a structural sense. The purpose of our methodology is to build models that are in agreement with the NMR data; it is the experimental results that determine what "good" models are. Thus, the fact that the acceptable models also happen to have the lowest energies is simply an indication that our potential energy function is a reasonably good one.

DISCUSSION
Since the other conformations (B1, C1, and B2) do not satisfy the NOESY contacts, they are ignored in the remainder of the analysis.
From these results, the conformation for the GXS molecule was constructed as A2A1. Here the average linkage torsion angles for the A2 (G-X) and A1 (X-S) families were used. Since the exocyclic torsion angle assumes two separate values in the final stage, the A2 family for the G-X linkage is composed of two subfamilies corresponding to these values. For the purpose of discussion, a value of 177.2° was selected for the exocyclic torsion angle of galactose in the A2 conformations. To account for any long range effects on charge distribution and to relieve any steric conflicts that might arise due to the modular construction of the conformation of the whole molecule, the family of conformations A2A1 was subjected to additional energy minimization without NOE constraints. This step resulted in a slight improvement in the energy of the molecules (by 0.2 kcal/mol).
The resulting family, A2'A1', is representative of the conformations of GXS compatible with the experimental data. Table VII shows a comparison of the computed distances for the A2'A1' family with distances derived, under the assumption of a single preferred conformation, from the NOESY intensities in the initial rate regime. Fig. 4 shows a typical example from the final A2'A1' set of structures for GXS.
Even though the conformations of the A2'A1' family are compatible with the observed NOE contacts in GXS, we have also examined the possibility that a conformational exchange between different families might account for the observed inter-residue NOE contacts. This is an important consideration since the neglect of such an exchange in the interpretation of NMR data could result in a virtual conformation for the oligosaccharide that is either unrealistic or present in the solution phase only in small populations (Cumming and Carver, 1987). Table VI shows a comparison of the calculated inter-residue proton distances for the families A1, B1, C1, A2, and B2, together with the average torsion angles. The B1 family predicts that both the β' and β'' protons will experience a significant NOE contact with the Xyl-H1' proton, whereas the C1 family predicts an NOE contact between Xyl-H1' and Ser-Hβ''. These contacts were not observed in the experiment. The B2 family for the G-X linkage predicts that both the H5' and H5'' protons of xylose will experience strong NOE contacts with the Gal-H1' proton. This is at variance with the experiment.
Thus, a conformational exchange with families such as A2'B1', B2'A1', B2'C1', etc. would have resulted in NOE contacts that were not observed in the experiment on GXS. The populations of these latter conformations therefore appear to be negligible. However, for other types of oligosaccharides, such a conformational exchange should also be considered in the modeling calculations.
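The sensitivity of NOE contacts to minor conformers can be illustrated with a population-weighted average over conformational families. The Python sketch below assumes simple r^-6 averaging of the interproton distance across exchanging families. That averaging model, the function name `effective_distance`, and the numbers in the usage example are assumptions for illustration and were not part of the calculations reported here.

```python
def effective_distance(populations, distances):
    """NOE-effective distance for exchanging conformers.

    Assumes r**-6 averaging: r_eff = (sum_i p_i * r_i**-6) ** (-1/6).
    populations must be non-negative and sum to 1; distances are in A.
    """
    if len(populations) != len(distances) or not populations:
        raise ValueError("need one population per distance")
    if any(p < 0 for p in populations) or abs(sum(populations) - 1.0) > 1e-9:
        raise ValueError("populations must be non-negative and sum to 1")
    if any(r <= 0 for r in distances):
        raise ValueError("distances must be positive")
    mean_inv6 = sum(p * r ** -6 for p, r in zip(populations, distances))
    return mean_inv6 ** (-1.0 / 6.0)


if __name__ == "__main__":
    # Hypothetical proton pair: 5.0 A in a major family, 2.5 A in a minor one.
    for minor in (0.0, 0.1, 0.5):
        r_eff = effective_distance([1.0 - minor, minor], [5.0, 2.5])
        print(f"minor population {minor:.1f}: r_eff = {r_eff:.2f} A")
```

Because short distances dominate the average, even a modest population of a family with a close proton pair would be expected to produce a detectable cross-peak, which is why unobserved contacts argue against appreciable populations of the other families.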
Our approach differs from that of Scarsdale et al. (1988) in some essential details. First, these authors selected three starting conformations for the whole molecule versus nine starting conformations for each linkage in our approach. In general, it is desirable to have as many starting structures as possible so that the conformational space is adequately sampled during the molecular mechanics calculations.
Second, the form of the pseudoenergy function chosen in our work to represent NOE constraints is similar to that used by Kaptein et al. (1985) in their protein modeling studies, but differs from that of Scarsdale et al. (1988). The latter authors have used a potential function that has a negative minimum at r = r0, approaches zero for r > r0, is positive for r < r0, and tends to infinity for r → 0. This function has the advantage of leading to rapid convergence, but the distance r0 has maximum weight because of the form of the function.
The semiharmonic potential chosen in this work allows us to set an upper limit, r0, for the distance between any two protons between which an NOE was observed. We have selected a conservative value of 3.5 Å for r0. Further, Equation 1 gives equal weight to all distances less than r0, and hence the computed distances will be relatively free of any bias induced by the pseudoenergy function as long as the NOE distance constraint is satisfied. A third difference involves our final step in the protocol, in which the NOE constraints were lifted and the conformations were further subjected to EM to relax any unreasonable steric and bond angle distortions resulting from the previous steps (MD/EM) that employed the distance constraint penalty term (E(NOE)).
This final step becomes especially important in those calculations where the use of strong force constants for E(NOE) can result in unacceptable bond length and bond angle distortions. It is interesting that the galactose anomeric proton, Gal-H1', shows NOE contacts not only to the H4' proton across the glycosidic linkage but also to the H5' proton of xylose. It is conceivable that in other, more complex oligosaccharides, the anomeric proton of one sugar may show NOE contacts only to protons not directly involved in the glycosidic linkage (Dua et al., 1986; Lemieux et al., 1980). Our observations and the earlier findings of other researchers (Dua et al., 1986; Lemieux et al., 1980) suggest that one should be cautious in the use of proton NOE contacts to establish glycosidic linkages in deducing the primary structure of a complex carbohydrate. Establishment of glycosidic linkages through the detection of long range 1H-13C correlation across the linkage in a heteronuclear multiple bond correlation experiment (Byrd et al., 1987; Kessler et al., 1986; Lerner and Bax, 1987) is more reliable.
In conclusion, we have developed a general methodology suitable for modeling of complex carbohydrates. Unbranched oligosaccharides lend themselves to a simplified modular analysis by this procedure. This methodology is based on MD/EM calculations with NMR-derived constraints introduced into the calculations to generate conformations compatible with the experimental data.
The conformations of the individual monosaccharide units in the connective tissue proteoglycans and the spatial relationships between adjacent units are important determinants in the intramolecular interactions as well as in the interactions of the polysaccharide chains with other molecules. We are now beginning to understand the structural basis of some of these interactions, particularly those which have readily measurable functional consequences. The majority of the interactions examined to date involve the repeating disaccharide segments of the polysaccharide chains, e.g. the specific binding of a decasaccharide segment in hyaluronan to the core protein of the chondroitin sulfate proteoglycan of cartilage, which may be added to the examples already mentioned above (Rodén, 1980). It is presently not known whether the carbohydrate-protein linkage region of the xylose/serine-linked proteoglycans plays a role in the interaction of these macromolecules with their environment. Indeed, it is possible that the existence of the specific linkage region is a reflection of the need for metabolic regulation during the intracellular assembly of the proteoglycans. The biosynthetic pathways and various mechanisms of regulation of the assembly process have been described elsewhere (Rodén, 1980) and will not be discussed here. In closing, it should be pointed out, however, that the disaccharide component of GXS represents a unique structure in the mammalian complex carbohydrates and that both galactose and xylose, in β1,4-linkage, are required for recognition by galactosyltransferase II, which catalyzes the third step in the assembly of the polysaccharide chains. Although there is an abundance of galactose-containing glycoconjugates in animal tissues, the substrate specificity of the enzyme thus ensures that it operates with a high degree of fidelity and does not effect transfer to galactose residues other than the xylose-linked residue of the growing connective tissue polysaccharide chain. The basic data on the conformation of GXS reported here will aid in the understanding of the mechanism of action of galactosyltransferase II when the enzyme has been purified and studies of its three-dimensional structure become possible.