Proteoglycans and Their Heterogeneous Glycosaminoglycans at the Atomic Scale

: Proteoglycan spatiotemporal organization underpins extracellular matrix biology, but atomic scale glimpses of this microarchitecture are obscured by glycosaminoglycan size and complexity. To overcome this, multi-microsecond aqueous simulations of chondroitin and dermatan sulfates were abstracted into a prior coarse-grained model, which was extended to heterogeneous glycosaminoglycans and small leucine-rich proteoglycans. Exploration of relationships between sequence and shape led to hypotheses that proteoglycan size is dependent on glycosaminoglycan unit composition but independent of sequence permutation. Uronic acid conformational equilibria were modulated by adjacent hexosamine sulfonation and iduronic acid increased glycosaminoglycan chain volume and rigidity, while glucuronic acid imparted chain plasticity. Consequently, block copolymeric glycosaminoglycans contained microarchitectures capable of multivalent binding to growth factors and collagen, with potential for interactional synergy at greater chain number. The described atomic scale views of proteoglycans and heterogeneous glycosaminoglycans provide structural routes to understanding their fundamental signaling and mechanical biological roles and development of new biomaterials.


■ INTRODUCTION
Proteoglycans (PGs) are vital constituents of cellular membranes and extracellular matrices, where they confer tissue mechanical properties, facilitate constitutive cell−cell interactions, and are implicated in neurodegeneration, arthritis, and cancer. 1−4 Human PGs, such as bikunin and decorin, 5,6 comprise a core protein plus one or more covalently linked glycosaminoglycan (GAG) polymers, which possess functionally important patterns of postsynthetic modification. In particular, domains rich in sulfate and iduronic acid maintain connective tissue integrity and mediate signal transduction via multivalent interactions with collagen and growth factors. 7−10 However, the molecular basis for such bioactivity remains enigmatic, in part, because current representations of PGs and attendant GAGs lack atomic scale conformational detail.
Obtaining high resolution 3D data for PGs is hampered by GAG sequence heterogeneity, conformational flexibility, and molecular size. Electron microscopy and small-angle X-ray scattering have provided low resolution insights. 11,12 However, atomic resolution crystallographic studies have been limited to either the protein core (e.g., for small leucine-rich repeat PGs 13−15 ) or isolated GAG components (e.g., fiber X-ray scattering data for chondroitin-4-sulfate 16 ). Modeling offers an avenue to progress, but accurate fine-grained simulations (i.e., including all atoms and water solvent) have been restricted to oligosaccharides and short sub-μs time scales (notwithstanding approaches that distort biologically important kinetics 17 ). This limitation has, to some extent, been recently overcome by the use of application-specific integrated circuits and graphics processing units, which have enabled unbiased multi-μs computational studies. 18,19 However, application of fine-grained simulations to even small PGs will remain out of reach for the foreseeable future.
Less computationally demanding coarse-grained (CG) carbohydrate models have been successfully developed, 20,21 but in these, only the polymer glycosidic linkage degrees of freedom remain and functionally important pyranose ring flexing (or puckering) is neglected. Recently, we developed a CG approach that includes both carbohydrate linkage and ring motions. 22 In this method, linkage torsional and ring puckering free energy surfaces are calculated from microsecond time scale fine-grained simulations (containing implicit information on sugar substituents) and used to parametrize a simplified CG representation, which can be used to rapidly predict the motions of polymers using numerical integration. Although promising, the method's ability to model heterogeneous GAG sequences or glycoconjugates, such as PGs, has not been established. Therefore, at present, neither experimental nor theoretical techniques have been shown capable of characterizing heterogeneous GAGs or PGs at high resolution. Consequently, the relationships between GAG postsynthetic modification, domain organization, and PG dynamic shape, and their role in multivalent interactions with proteins, remain enigmatic at the atomic scale.
Here, we overcome this inability by broadening the applicability of our CG approach to characterize heterogeneous GAGs and PGs both accurately and at the atomic scale. Unbiased microsecond fine-grained simulations of hyaluronan and six key sequence units from chondroitin and dermatan sulfates ( Figure 1) were performed to investigate the effect of postsynthetic modifications on molecular motions. Using these data to further parametrize our CG model enabled simulations of GAGs to be performed on complex heterogeneous sequences, in general, and physiologically relevant block copolymeric structures, in particular. In further key steps, the CG approach was extended to PGs, and by exploiting GAG sequencing information and X-ray diffraction data, realistic atomic models of small leucine-rich repeat PGs were constructed. Where possible, the derived models were validated by comparison with hydrodynamic measurements. The results represent the first accurate atomistic views of heterogeneous GAGs and PGs, allowing relationships between sequence, atomic scale shape, and dynamics to be explored in these complex extracellular molecules. The reported models transcend simple schematic representations of PGs and advance understanding of their assembly and protein interactions toward clarifying the roles of PGs in health and disease and furthering their use as therapeutics. . Heteronuclear experiments were gradient selected and sensitivity enhanced; chemical shifts were referenced relative to internal DSS. Spectra were processed using Topspin software (Bruker Corp., Billerica, U.S.A.); 1D experiments were apodized by an exponential function (0.3 Hz decay constant) and zero-filled to twice the acquired points; 2D spectra were linear predicted in the indirect dimension to twice the acquired points and apodized with a 90°sine-bell function in both dimensions. Assignments for two sulfated disaccharides (see Materials) were performed with reference to prior data for similar compounds. 23−27 Glycosaminoglycan polysaccharide 1 H and 13 C assignments were aided by comparison with the spectra and assignments from these disaccharides. In polysaccharide [ 1 H, 13 C]-HSQC spectra, overlap of GalNAc C4 and C6 peaks hindered integration and precluded estimation of 6-O-sulfate; in this case, C5 peak integrals were used to the estimate relative levels of sulfate.

■ MATERIALS AND METHODS
Laser Light Scattering. Glycosaminoglycan polysaccharides (4 mg) were dissolved in 0.5 mL of ammonium bicarbonate (100 mM) for 24 h at 4°C and then centrifuged at 12000g for 10 min prior to application to a Superose 6 10/300GL gel filtration column (GE Healthcare, Little Chalfont, U.K.) running at a flow rate of 0.75 mL/ min in the same eluent. Samples eluting from the column at room temperature passed through an in-line DAWN HELEOS-II laser photometer with a 658 nm laser (Wyatt Technology Corp., Santa Barbara, U.S.A.), WyattQELS detector model number Q-03 and an Optilab rEX refractometer; a refractive index increment (dn/dc) of 0.16 mL g −1 was used throughout. 28 Light scattering and concentration during elution were recorded using ASTRA software (v5. 3.4.13). Multiangle laser light scattering was used to determine weight-averaged molecular masses (M) and radius of gyration (R g ) using analysis modules within the ASTRA software. For molecules too small to accurately determine R g , quasi-elastic light scattering was used to measure hydrodynamic radii (R h ), again using analysis modules within the ASTRA software. Conversion of calculated R g to R h was achieved using the formula R g = 1.5R h , derived previously for a random coil. 29 Fine-Grained Simulations. Fine-grained molecular modeling of seven GAG decasaccharides and the GAG-PG linker trisaccharide, Galβ(1 → 3)Galβ(1 → 4)Xyl, were performed using the GLYCAM06 force-field topologies and parameters for carbohydrates. 30 Each was built using the AMBER12 31 tool "tleap" and then added to cubic boxes of TIP3P water, 32 resulting in side lengths in the range 6.2−7.2 nm (≈12000 water molecules). Solute atoms were positioned at least 1.2 nm from the solvent box edge and charge neutral assemblies were achieved by adding the appropriate number of explicit Na + ions. Following initial conjugate-gradient energy minimization (1000 steps), , and (C) the dermatan iO-unit (R 1 = R 2 = OH), iA-unit (R 1 = OSO 3 − each assembly was heated to 298 K and equilibrated in the NPT ensemble for 20 ns prior to 10.3 μs unbiased NVT production molecular dynamics, performed using ACEMD software 33 and GTX TITAN graphics processing unit hardware (Nvidia Corp., Santa Clara, U.S.A.), as described previously. 19,34 The initial 300 ns was discarded and for the remaining 10 μs, atomic coordinate data were saved at 10 ps intervals for analyses. Molecular properties and axial helical rises (per disaccharide) were computed from complete 10 μs trajectories (i.e., 1000000 data sets) using standard equations described previously. 35 The distance r (graphed in Figure 2) was calculated over the four internal disaccharides within decasaccharides (i.e., between reducing-end and non-reducing-end glycosidic linkage oxygen atoms). Predominant decasaccharide conformers (illustrated in Figure  2) were derived using the AMBER12 31 tool "ptraj" and a hierarchical 3D-clustering algorithm with a sampling frequency of 100 frames (10000 in total). Glycosidic torsion angles ϕ and ψ in (1 → n) linkages (where n = 3 or 4) were computed using the IUPAC definitions: ϕ defined as O 5 -C 1 -O-C n and ψ defined as C 1 -O-C n -C (n−1) . A charge neutral 1 μs simulation of bikunin with a GAG-PG linker trisaccharide attached at serine residue 10 (Ser-10) was also performed using the methods detailed above; in this case both the GLYCAM06 and AMBER FF03 36 force-fields were used and the proteoglycan was solvated in a cubic box of TIP3P water with a side length of 7.3 nm.
Coarse-Grained Simulations. The coarse-grained (CG) model, utilizing empirical energy functions for glycosidic linkage and ring pucker, has been described previously. 22 Two-dimensional free energy surfaces for glycosidic linkage torsions and ring azimuthal and meridian angles were derived from fine-grained 10 μs simulations and parametrization of the CG empirical energy functions was achieved by fitting to these surfaces, as described previously. 22 For validation purposes only, all linkages and all rings present in decasaccharides were parametrized from their respective fine-grained simulation equivalent. For simulations of polymers, β(1 → 3)-linked disaccharide units were parametrized from data for the central two linkages and two rings of respective fine-grained decasaccharides. These units were stitched together via their β(1 → 4) linkages to produce model chains, that is, the constituent disaccharide [β(1 → 4)Uβ(1 → 3)N] is a representative unit with two linkages and two rings parametrized from a single fine-grained simulation (here, U and N represent uronic acid and hexosamine, respectively). If the ends of the chain did not naturally fit within this disaccharide register, then individual linkages and rings (with identical parameters) were appended (N to the nonreducing end and β(1 → 4)U to the reducing end). This allowed the construction of heterogeneous polymers of arbitrary length and sequence. All CG simulations were performed for 10 μs, accomplished by integrating the Langevin equation of motion independently for the linkages and rings, using the Leapfrog algorithm and a time-step of 2.5 fs, described previously. 22 During the simulations, only the positions of the glycosidic oxygen atom and pyranose ring atoms were calculated (i.e., ring substituents were not explicitly modeled) and coordinates were saved at regular intervals for analyses. No steric interactions were calculated during CG simulations (other than those implicit in the parameters). Therefore, when it was deemed necessary, and after completion of each simulation, coordinate sets with steric clashes were identified by detecting those in which CG atoms had approached within 1 nm (excluding those in the same or neighboring residues) and then removed from the simulated ensemble. Radius of gyration (R g ) calculations on polymers were performed without adding any substituents to the carbohydrate rings (test calculations on polymers showed this had a negligible effect on the outcome).
Proteoglycan Simulations. Models were constructed using Cartesian coordinates for small leucine-rich repeat PG core proteins (bikunin, decorin and biglycan); these were obtained from the Protein Data Bank (identification codes: 1BIK, 13 1XKU, 15 and 2FT3, 14 respectively). The decorin dimer was built by applying a transformation matrix (the PDB file BIOMT record). Terminal amino acids missing from these experimental data (residues 1−24 in bikunin and biglycan, 1−21 in decorin) were constructed using a knowledge-based algorithm with default settings in the "Loop Modeler" feature of MOE software (Chemical Computing Group Inc., Montreal, Canada). In each case the lowest energy loop conformer was selected. In the absence of experimental coordinates, this approach provides an energetically favorable N-terminal loop conformer that is based on experimental 3D-structures from the Protein Data Bank. Subsequently, all experimentally refined nonprotein atoms (i.e., ions, waters, carbohydrate glycosylation fragments) and all amino acid substituents were removed, leaving only Cartesian coordinates of the protein backbone atoms.
Polymer chains representing GAGs with specific sequences (as described above) were attached to a linker trisaccharide and onto previously identified Ser residues (within the terminal loop regions) to form the final modeled sequence: Parameters for the CG model of the linker were derived from a fine grained simulation of the trisaccharide (see above) and the parameters for the CG model of the GAG portion were as described above. The glycoprotein linkages between Xyl and Ser were fixed using the following torsional angles which are consistent with previous NMR measurements 37 and a 1 μs fine-grained simulation of bikunin with an attached linker trisaccharide. The PG core protein was also kept in a fixed geometry while 10 μs CG simulations were performed on the carbohydrate chains. Radii of gyration (R g ) calculations on PGs were performed without adding substituents to the GAG pyranose rings or side chains to the PG core protein amino acids. Additional post-translational modifications beyond the GAG chains, such as N-linked glycans, were assumed to be a minor contributor to R g and were not considered in CG simulations for practical reasons. Conversion of calculated R g to R h was achieved using the formula R g = 0.77R h , for a constant density sphere.

■ RESULTS AND DISCUSSION
Compositional Analyses and Nomenclature. The sulfate, D-glucuronic acid (GlcA), and L-iduronic acid (IdoA) content of shark cartilage and bovine trachea chondroitin sulfate (CS) and porcine intestinal dermatan sulfate (DS) were assayed using 13 C NMR spectra, Figure S1. The GAG compositional nomenclature in this work follows prior descriptions, 38    Dynamics of Subunits from Heterogeneous GAGs. Decasaccharides comprising five repeats of subunits from HA, CS, and DS (H, O, A, C, iO, iA, and iC) were subjected to unbiased 10 μs explicit molecular dynamics simulations, including all atoms (GAG, water solvent and counterions), referred to here as fine-grained. For each of the simulated molecules, the distance r between the two endmost glycosidic oxygen atoms (across the internal four disaccharides and thereby excluding possible end effects 40 ) was computed. Based on time series plots of r (Figure 2), each decasaccharide had a characteristic dynamic range and frequency of motion. In all molecules, transitions occurred between numerous conformations, but those containing iduronic acid were observed to spend more time in extended geometries than their glucuronic acid equivalents (predominant shapes are illustrated in Figure  2). Further inspection and analysis revealed extremes of maximally extended chains and compact turns, with exchange being correlated to μs time scale linkage and ring flexing (exemplified in Figure 2B), as found previously in amylose. 34 It was confirmed that, in every simulation, arithmetic mean rvalues for 1 μs subsets had a standard deviation of less than 0.1 nm from the mean for the entire 10 μs data set, substantiating that motions in the model GAG decasaccharides had approached equilibrium.
In the H-unit decasaccharide simulation, 67% of conformers were 2-to 4-fold left-handed helices and 81% had an axial rise per disaccharide (h) of >0.82 nm. Prior X-ray fiber diffraction observations for HA include 2-, 3-, and 4-fold left-handed helices with h in the range 0.84−0.98 nm. 41 In the A-unit simulation, 43% of conformers were 2-to 3-fold left-handed helices, with 42% having h > 0.94 nm. Experimentally, chondroitin-4-sulfate has been observed as 2-to 3-fold lefthanded helices with h in the range 0.96−0.98 nm. 42 The C-unit decasaccharide simulation contained 65% of conformers between right-handed 8-fold helices and left-handed 3-fold helices, with 52% having h > 0.94 nm, consistent with previously described chondroitin-6-sulfate extended left-handed 3-fold and right-handed 8-fold helices with h in the range 0.96− 0.98 nm. 43 The iA-unit decasaccharide simulation had 58% of conformers with h in the range 0.90−0.96 nm, 44% were between right-handed 8-fold helices and left-handed 3-fold helices. Dermatan-4-sulfate has been observed to form 2-, 3-, and 8-fold helices with h in the range 0.92−0.94 nm. 44 Considering that these prior experiments were performed in the solid state and, hence, only a fraction of solution phase conformers could satisfy the packing constraints, it was concluded that the predicted conformer populations had a good overlap with published X-ray fiber diffraction refinements.
Molecular Mechanism for Encoding GAG Shape via Sequence. Puckering occurred in all decasaccharide pyranose rings and converged to equilibria over 2−5 μs (Table S3), consistent with previous simulations. 19 Based on an average of the four internal hexosamines in each model, all were principally 4 C 1 chairs (>97%) and motions were invariant of C4-position stereochemistry (GlcNAc or GalNAc) and Osulfonation (at the C4-or C6-position). While all GlcA pyranoses were predominantly 4 C 1 -chairs, they also underwent 4 C 1 ↔ 1 C 4 exchange (e.g., Figure 3M). This behavior is qualitatively similar to prior experimentally consistent GlcA monosaccharide simulations, 47 Table S4. 46,48 The DS decasaccharide IdoA puckers were predicted to be predominantly 1 C 4 chairs (e.g., Figure 3N) with populations of 82% (iO-unit), 90% (iA-unit), and 56% (iC-unit). All IdoA rings underwent 4 C 1 ↔ 1 C 4 exchange and the mix of IdoA ring conformers produced a qualitative agreement with measured low 3 J H,H values (<5 Hz) for iA regions of porcine DS. 49 The deviation between predicted and experimentally measured 3 J H,H values were as described and discussed previously. 47 Although 2 S O was the predominant nonchair conformer, it was predicted to be a minor ensemble component (<7%) in the iO-, iA-, and iC-unit decasaccharides. This skew-boat is reportedly key for protein binding 50,51 but has been difficult to observe in free crystals 52 and was infrequently populated in prior IdoA monosaccharide simulations (<10%). 47 The fine-grained simulations performed here represent the most extensive for GAG oligosaccharides to date and suggest that CS and DS uronic acid ring puckering equilibria are influenced across β(1 → 3) glycosidic linkages by neighboring hexosamine sulfate groups. In particular, GalNAc 4-Osulfonation biases GlcA and IdoA toward 4 C 1 and 1 C 4 , respectively. To investigate, conformers with both uronic acid chair puckers were extracted from A-and iA-unit decasaccharide simulations. Inspection of the A-unit at the chain center suggested that repulsion of the GalNAc 4-O-sulfate and GlcA carboxyl groups disfavored the 1 C 4 chair, which was not present in the favored 4 C 1 pucker ( Figure 4A). For the iA-unit, the IdoA 1 C 4 pucker was stabilized by interaction of its C3-position hydroxyl substituent with the adjacent GalNAc 4-O-sulfo group ( Figure 4B). This allowed water to be excluded from the IdoA ring face, which was not the case when IdoA was in the less favored 4 C 1 chair conformation. It is hypothesized that, within the fundamental β(1 → 3)-linked GAG unit, networks of through-space interactions between uronic acid, adjacent hexosamine sulfate groups, and water establish characteristic uronic acid puckering equilibria that contribute to polymer hydrodynamic properties. This represents a mechanism for encoding shape via sequence and may be a key factor in both specific GAG-protein interactions and PG organization.
Validating Simulations of Homogeneous and Heterogeneous GAG Polymers. Computed glycosidic linkage and pyranose ring free energy surfaces were used to parametrize a coarse-grained (CG) model, described previously. 22 For validation purposes, the decasaccharides studied above were resimulated for 10 μs using the CG model. Comparing CG with reference fine-grained simulations, equivalent calculated mean r values (see above) were all within 0.3 nm and linkage and ring motions were faithfully reproduced ( Figure 5).
Radius of gyration (R g ) and molecular mass (M) were measured simultaneously, using multiangle laser light scattering (MALLS), on a sample of medical grade HA. Model polymers in the range [H] 100 to [H] 400 (reflecting the range of measured molecular weights) were simulated for 10 μs using the CG method. Experimental data and simulated mean R g are graphed in Figure 6A (see "CG all"). Within these simulations, unrealistic atomic overlap occurred due to the absence of steric interactions in the CG model. 22 Correcting for this by removing those computed structures containing steric clashes and recalculating R g achieved excellent agreement with experiment (see Figure 6A "CG no clash"). Finally, recalculation of R g (after removing steric clashes) from identical CG simulations but with all constituent monosaccharides fixed in rigid 4 C 1 chair puckers resulted in predictions that disagreed with experimental measurements (see Figure 6A "CG no clash, rigid rings").
In the cases of CS and DS materials from shark, bovine, and porcine sources, the small polymer sizes (≈10 nm) precluded reliable R g measurements using MALLS. Therefore, hydrodynamic radii (R h ) at defined M were derived from quasi-elastic light scattering and MALLS ( Figure 6B,C and Figure S5). Model CG polymers of these materials were built from blocks of five disaccharides (with compositions reflecting NMR analyses reported above) and simulations were performed within the experimentally observed mass ranges (again following removal of steric clashes). Simulated R g values for representative shark and bovine CS polymers (in the range 50− 200 disaccharides) concurred with the measurements after conversion to equivalent R h ( Figure 6B). Here, the experimental trend for shark CS to have slightly larger R h than bovine CS at equivalent mass was reproduced in the CG simulations. For model porcine DS polymers of 35−55 disaccharides in length, simulated R g values (converted to R h ) were also close to measured equivalents ( Figure 6C). Marginal deviations in experimental and calculated data for sulfonated polymers may be attributed to sequence differences in the biological materials and models, the assumed refractive index increment value (determined from bovine nasal cartilage PG fractions 28 ) or the conformationally dependent R g to R h conversion factor.
This validation demonstrates the applicability of the CG approach to all human GAGs except keratan sulfate (the model's predictive ability for heparan sulfate was shown previously 22 ). It also reiterates that inclusion of both glycosidic linkage and pyranose ring μs time scale motions are necessary to produce accurate GAG polymer models. Furthermore, the CG method enables rapid and accurate predictions using  Effects of Combining and Permuting GAG Chain Units. The CG method is particularly suited to studying the relationship between chain dynamics and GAG sequence. To effectuate a study of this kind, polymer chains with 100 disaccharide units were constructed by repeating blocks of either 5 or 10 units (idiosyncratic patterns were chosen to explore the effect of both hexosamine sulfonation and uronic acid type). Performing 10 simulations of [A] 100 resulted in a calculated R g standard deviation of 0.03 nm, confirming the utility of mean R g as a sensitive indicator of chain conformation. Chains were simulated for 10 μs using the CG approach; steric clashes occurred infrequently at this degree of polymerization and removal was not necessary for accurate predictions. Computed R g values for each polymer are shown in Figure 7.
Of any CS sequence considered, chains with [A] 100 and [C] 100 resulted in the smallest and largest R g values, respectively ( Figure 7A). Successive substitution of A-with C-units (or vice versa) resulted in a systematic increase (or decrease) in polymer size. Chains with only O-units were significantly longer than those with only A-units, but shorter than polymers comprising only C-units. Thus, inclusion of O-units increased the average hydrodynamic size of A-unit polymers and decreased that of C-unit polymers. In model DS chains, equivalent sulfonation patterns (to CS) resulted in consistently larger R g values ( Figure 7B), a trend that reflected observations from fine-grained simulations ( Figure 2). Here, the 4-O-sulfo [iA] 100 chain had a smaller calculated R g than either the [iO] 100 or [iC] 100 polymers. In contrast to CS, successive inclusion of iO-into iC-unit polymers marginally increased calculated radii. Polymers comprising exclusively IdoA-units had a relatively restricted range of R g values; all were within 1.4 nm of each other (cf. 2.0 nm in GlcA-unit polymers). Progressive addition of A-units into an iA-based polymer decreased the overall size, and maintaining the A/iA stoichiometry while increasing the iA-block size had a negligible effect on calculated hydrodynamic radii ( Figure 7C). Therefore, it was hypothesized that permutation of units (while maintaining the combination) has little effect on R g and the combinations of uronic acid type and hexosamine sulfonation contribute to the final size of R g in  the following order (smallest to largest): A < O < C < iA < iO < iC. This suggests that combining these units in heterogeneous polymers provides a mechanism for tuning the hydrodynamic properties of GAG chains.
Block Copolymer Microarchitectures as a Basis for Multivalency. Supporting these hypotheses, permutations of a sequence combination mimicking shark CS were predicted to have identical R g values, and the same was true for bovine CS ( Figure 7D). Shark CS polymers were predicted to be slightly larger than their bovine equivalents, but the porcine DS models were, as expected, the most extended model biological sequences considered. As a consequence of this it would be expected that PGs comprising DS would have a larger hydrodynamic domain than their CS equivalents. However, in DS block copolymers, overall polymer R g did not provide the full picture. For example, inspection of conformational snapshots from a simulation of the {[A] 4 [iA] 6 } 10 polymer revealed flexibility in the GlcA-containing regions, resulting in tight turns reminiscent of protein β-hairpins, and comparative rigidity in the IdoA-containing domains (Figure 8). These predicted microarchitectures could be a molecular basis for multivalent interactions with proteins, such as fibroblast growth factors. 7,53 The differential dynamics of these copolymeric sequences can thus be expected to have significance in signal transduction.
Given the documented ability of IdoA to be an equilibrium of chair conformations, prior work had logically assumed IdoA to impart GAG chain flexibility and GlcA to confer rigidity. 24 Our current studies, based on 10 μs fineand coarse-grained simulations, have come to an alternative set of conclusions: (a) pyranose rings of both IdoA and GlcA are flexible, (b) their conformational equilibria are driven by neighboring hexosamine 4-O-sulfonation toward either 1 C 4 or 4 C 1 , (c) equivalent chains with IdoA are more extended than those with GlcA (irrespective of sulfonation), and (d) IdoA does not substantially populate nonchair boat or skew-boat puckers.
Accurate Proteoglycan Models: Bikunin and Decorin. In a 10 μs fine-grained simulation of the GAG-PG interface trisaccharide, Galβ(1 → 3)Gal-β(1 → 4)-Xyl, both galactose (Gal) pyranose rings were predicted to be stable 4 C 1 puckers (>99%), while xylose (Xyl) was an equilibrium comprising mostly 4 C 1 chairs (70%) with a minor 1 C 4 component (26%). The Gal-Gal and Gal-Xyl glycosidic linkages were centered on (ϕ,ψ) ≈ (−60°,−120°) and ≈(−60°,+120°), respectively, which are consistent with prior NMR 37 and a 1 μs fine-grained simulation of bikunin glycosylated with the GAG-PG trisaccharide. Comparing linkage and ring dynamics in these two simulations ( Figure S6) suggested that motions in this region do not differ substantially when free in water and when linked to the core protein. Models of PGs were constructed by parametrizing a fully dynamic CG model of the linker from these fine grained simulations and elaborating with specific GAG chain sequences; these polymers were then attached to previously identified PG core protein serine (Ser) residues. The PG core proteins were kept rigid while the dynamics of GAG chains were predicted in 10 μs CG simulations. Conformers that contained steric clashes were removed from resulting ensembles prior to calculation of R g (as above).
A human bikunin PG model was built from an X-ray refinement of its core protein, the linker (attached to Ser-10), and a single CS chain. Several human bikunin GAG glycoforms have been sequenced, with 27−43 monosaccharides. 5  -GlcA-. In this simulation, the GAG folded and transiently enveloped the core protein. The distance between the core protein and terminal monosaccharide geometric centers was on average 9.1 nm and thus significantly larger than the protein core ( Figure  9A). This distance could range between 2.0 and 14.8 nm, highlighting the extent of sampled GAG conformations. Representative 3D structures from this human bikunin simulation are illustrated in Figure 9B and an R g of 3.4 nm was predicted (repeated calculations of R g had a standard deviation below the number of significant figures reported, see Table S7). Furthermore, performing this simulation with an alternative extended N-terminal loop conformer, derived from a 1 μs fine-grained simulation of the bikunin core protein (see Figure S8), resulted in an indistinguishable R g value, suggesting that the N-terminal loop conformation has very little effect on intact PG hydrodynamic properties. Prefixing the sequence described above (at its nonreducing end) with [O] 5 resulted in the longest CS chain identified previously. 5 In an ancillary simulation of this glycoform, the computed R g increased by almost a quarter, to a value of 4.1 nm. For comparison, an experimental R h value for rat bikunin (thought to have similar hydrodynamic properties to human bikunin) was measured   54 This is within the predicted range of R g values for the two human bikunin glycoforms considered here.
Crystallographic data for the bovine skin decorin core protein (for modeling purposes assumed here to be a homomeric dimer) was used as the basis for a second PG model. In the absence of sequence data for bovine decorin GAG chains, they were initially based on recent sequence analysis of porcine decorin GAGs 55 and the average molecular weight of bovine decorin GAG chains (≈24 kDa). 56  -GlcA-(≈27 kDa) resulted in a concomitant increase in R g to 9.4 nm. These predictions are consistent with an experimentally measured R g of 9.3 ± 0.1 for intact bovine decorin, 56 underscoring the CG method's ability to reproduce hydrodynamic data for intact PGs. The predictions also provide evidence for a dependence of PG size on the length and sequence combinations of the attendant GAGs.
In the absence of an X-ray diffraction refinement for human decorin core protein, the homologous bovine core protein was again used as a start point for building a human decorin PG model. Recent sequencing work has resulted in a human decorin GAG glycoform map, comprising multiple domains of variable length, sulfonation, and IdoA content. 6 Based on this, a presumed consensus sequence was constructed: [C] 6 4 -GlcA-. Two of these chains, one per subunit, were appended to the bovine decorin PG model at Ser-5 (a homomeric dimer was assumed). Representative conformations from the subsequent CG simulation are illustrated in Figure 10A. The predicted overall R g value was 8.0 nm, and thus, compared to a bovine skin decorin model, the hydrodynamic size of human skin decorin is predicted to be smaller, in part due to reduced IdoA composition. The dynamics of the two GAG chains featured bending and wrapping motions, with the average distance between the [iA] 4 regions of each chain predicted to be 23 nm. As alluded to above, multivalent interactions are theorized to be possible between a protein and a single GAG chain (see Figure  8). In this PG model, [iA] 4 regions of separate chains  Representative conformers derived from a 10 μs coarse-grained simulation of decorin with a single consensus human dermatan sulfate polymer attached to each core protein subunit (see text for the sequence). The 30 conformations reveal the dynamic range of polymer motion and their significant contribution to PG size; mean chain extension from the protein geometric center is depicted as a sphere of radius 21 nm. (B) Conformers of decorin and biglycan (two chains per subunit) with consensus human dermatan sulfate polymers (see text for sequences and Supporting Information for model coordinates). In decorin, GlcA and IdoA-containing blocks are blue and sand colored, respectively. A schematic collagen fibril with Dspacing of ≈70 nm and growth factor proteins (Protein Data Bank entries: 1FQ9, 1QQK, and 1NUN) are drawn to scale, these (colored cyan) are positioned to illustrate the potential for multivalent binding.
approached with a proximity of less than 5 nm, the approximate size of an FGF protein, confirming the feasibility of increased multivalent binding in a PG system with two GAG chains ( Figure 10B).
Substantiating Functional Roles for GAG Chain Length and Density: Biglycan. Two human biglycan models were constructed using crystallographic data for the bovine core protein (again assumed to be a homomeric dimer). In each, both subunits were appended (via linkers at Ser-4 and Ser-11) with two GAG chains containing an average number of units found in human mitral valve tissue associated with either tensile ({[A][iA] 4 } 12 -GlcA-) or compressive ({[A][C] 2 [iA] 2 } 17 -GlcA-) regions. 57 In these dynamic models, the respective predicted R g values were 13.0 and 15.4 nm. The larger hydrodynamic radii of PGs in compressive tissue, as reproduced here, are thought to be important in resisting stress. 57 Based on the biglycan simulations, it is simply concluded that the GAG chains with longer sequences found in compressive tissue are primarily responsible for the enhanced hydrodynamic radii. While the exact sequences have some effect, this is predicted to be a secondary factor. Indeed, the simulations imply that at equivalent GAG length the tension associated sequence, with a higher IdoA content, would be larger in size than the sequence found in compressive mitral valve tissue. If so, the specific sequences of IdoA and sulfonation are likely to be responsible for other functions such as protein recruitment. In the tensile tissue biglycan model, the average approach distances of the [iA] 4 blocks at the chain centers was 16 nm, significantly closer than in the human decorin model (23 nm), implying that the synergistic effect of four GAG chains increased the possibility of multivalent interactions with other proteins (Figure 10B), as recently evidenced. 8 These simulations also highlight the vast contribution of GAG chains to PG size. The computed R g for the homodimeric biglycan core protein alone was around 3 nm, much less than the 13−15 nm found here for fully glycoconjugated biglycan dimer models (with a total mass of ≈200 kDa). Moreover, based on standard bovine serum albumin, a folded globular protein would require a mass of approximately 3 MDa to achieve this physical size.

■ CONCLUSIONS
It has been shown that accurate spatiotemporal PG models, including complex heterogeneous GAG sequences, can be constructed by abstracting glycosidic linkage and pyranose ring atomic motions from multi-μs fine-grained simulations. The current CG method implementation was successful for the cases of small PGs and isolated GAGs, resulting in predictions that supersede schematic "bottle brush" PG representations. These atomic scale dynamic perspectives are essential to uncover how PGs collectively perform constitutive signal transduction and mechanical functions, and extension to GAG sequences not considered here should be straightforward.
The described CG technique can address PG assemblies comprising millions of atoms, which remain beyond the capability of even hardware accelerated fine-grained modeling. Therefore, emerging GAG sequence and core protein 3D data, in concert with methodological developments enabling molecular interactions, will allow larger cell-surface and aggregating PGs, plus assemblies thereof, to be studied routinely at the atomic scale. In time, the approach could inform how GAG synthetic enzyme deficiencies lead to connective tissue diseases, 58 provide mode-of-action insights for biopharmaceutical PGs, and facilitate the development of new biologically inspired medicines and materials.
■ ASSOCIATED CONTENT * S Supporting Information