The Length Dependence of the PolyQ-mediated Protein Aggregation

Polyglutamine (polyQ) repeat disorders are caused by the expansion of CAG tracts in certain genes, resulting in transcription of proteins with abnormally long polyQ inserts. When these inserts expand beyond 35–45 glutamines, affected proteins form toxic aggregates, leading to neuron death. Chymotrypsin inhibitor 2 (CI2) with an inserted glutamine repeat has previously been used to model polyQ-mediated aggregation in vitro. However, polyQ insertion lengths in these studies have been kept below the pathogenic threshold. We perform molecular dynamics simulations to study monomer folding dynamics and dimer formation in CI2-polyQ chimeras with insertion lengths of up to 80 glutamines. Our model recapitulates the experimental results of previous studies of chimeric CI2 proteins, showing high folding cooperativity of monomers as well as protein association via domain swapping. Surprisingly, for chimeras with insertion lengths above the pathogenic threshold, monomer folding cooperativity decreases and the dominant mode for dimer formation becomes interglutamine hydrogen bonding. These results support a mechanism for pathogenic polyQ-mediated aggregation, in which expanded polyQ tracts destabilize affected proteins and promote the formation of partially unfolded intermediates. These unfolded intermediates form aggregates through associations by interglutamine interactions.

The polyglutamine (polyQ) repeat disorders are a family of inherited disorders characterized by progressive neurodegeneration, as well as the formation of intracellular protein aggregates. These include Huntington's disease (HD), spinocerebellar ataxias (SCA) 1-6, dentatorubral-pallidoluysian atrophy (DRPLA), and spinal and bulbar muscular atrophy (SBMA) (1). Each disease is caused by the expansion of a tract of repeated CAG triplets in a distinct gene, causing transcription of proteins with lengthened polyQ repeats. Although the genes (and proteins) involved in different polyQ diseases have little homology, these disorders share a striking relationship between their clinical manifestation and their genetic underpinning. Symptoms appear only if the length of the expanded tract exceeds a critical value of 35-45 glutamines, with a greater number of repeats leading to earlier disease onset and more severe manifestations (2). Accordingly, it is hypothesized that the neuronal toxicity present in polyQ diseases is derived from the polyQ expansion itself, which induces pathogenesis by a common molecular mechanism in these disorders (3, 4). A nucleation-elongation mechanism, in which the formation of a high energy nucleus is the rate-limiting step in the formation of a toxic protein aggregate, has been proposed to account for the timeframe of disease onset (3, 5). The β-helix, proposed as such a nucleus for polyQ aggregation, has recently been found to be stable only in tracts of 37 or more repeated glutamines (6). A β-helix is a protein conformation in which the backbone dihedral angles of each amino acid correspond to the β-region in the Ramachandran plot, and the peptide forms a helical structure with a period much larger than that of the α-helix (3.6 residues per turn).
Although intracellular aggregates rich in polyQ-containing peptides are the most visible cytological manifestations of polyQ disorders (7), there is increasing evidence that soluble oligomers, rather than these insoluble aggregates, are toxic to cells (4, 8-11). However, the role of polyQ tracts in the formation of these oligomeric species, as well as the nature of their pathogenic effect, remains unclear. Inspired by the hypothesis of a common molecular mechanism of polyQ-mediated protein aggregation, chimeric proteins with inserted glutamine repeats have been used as an in vitro model system to study polyQ-mediated oligomerization (12-14). Stott et al. (14) first engineered chimeric CI2-polyQ with inserted tracts of 4-10 glutamines. While these mutants were found to form soluble aggregates, association was later found not to occur by the hypothesized glutamine-mediated interactions. Instead, the oligomers preserved native-like structures and formed through a domain-swapping mechanism (12). In a domain-swapped aggregate, certain regions of a protein associate with complementary regions of a like protein, mimicking native interactions in a monomer. This type of association is made possible by a flexible hinge region, which changes conformation between monomeric and domain-swapped forms. The crystal structure of domain-swapped CI2-polyQ dimers has been solved by x-ray crystallography (12), but electron density for the glutamine insert could not be determined. The authors hypothesized a random coil conformation for the insert. Gordon-Smith et al. (13) performed NMR studies to examine the possibility of interglutamine interactions in the dimer. They found that any interglutamine interactions, if present, did not appreciably slow the motion of backbone amide groups.
While no experimental evidence has been found of interglutamine interactions in these CI2 chimeras, we believe that this is due to the small insert length of 4-10 glutamines in previous studies, far below the pathogenic threshold (35-45 glutamines). We hypothesize that once the insertion length exceeds the pathogenic threshold, the CI2 model system will aggregate by a different pathway, which might be relevant to in vivo toxicity.
Because the CI2-polyQ model system has been explored by previous experimental (3, 4, 12, 13, 15-18) and theoretical (19) studies, we systematically studied the conformational dynamics of chimeric CI2-polyQ proteins with a variety of insert lengths (4-80 glutamines), including glutamine tracts longer than the pathogenic threshold. In chimeric CI2 monomers, we observed a decrease in folding cooperativity as a function of increasing polyQ insertion length. Using a coarse-grained protein model, molecular dynamics simulations also fully recapitulated the known crystal structure of CI2 domain-swapped dimers. We observed that the dominant mode of dimer formation in CI2 chimeras changes from domain swapping to interglutamine interactions as polyQ insert length increases beyond the pathogenic threshold. The observation of length-dependent folding/aggregation behavior in the polyQ-inserted CI2 protein, a non-amyloidogenic protein, supports a mechanism for polyQ-mediated aggregation in which insert length determines the degree of intra- versus interchain interactions: at small insert lengths, the destabilization induced by the introduction of polyglutamine leads to domain swapping, whereas at large insert lengths (greater than the pathogenic threshold) interchain glutamine-glutamine interactions dominate and lead to aggregation.

MATERIALS AND METHODS
Discrete Molecular Dynamics-Discrete molecular dynamics (DMD) is employed to study protein conformational dynamics. A detailed description of the DMD algorithm can be found in Refs. 20-22. In DMD, interatomic interactions are governed by square-well potential functions. Neighboring interactions, such as bonds, bond angles, and dihedrals, are modeled by infinitely high square-well potentials. During a simulation, the velocity of each atom remains constant until a potential step is encountered, where it changes instantaneously according to the conservation of energy, momentum, and angular momentum. Simulations proceed as a series of such collisions, with a rapid sorting algorithm employed at each step to determine the next collision.
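The event-driven update described above can be illustrated with a minimal two-bead sketch. It is not the DMD code of Refs. 20-22; the function names `time_to_separation` and `resolve_step`, the vector representation, and the step-height argument `dU` are assumptions made here for illustration. The first function predicts when two beads moving at constant velocity reach a given separation (a potential step), and the second applies an impulse along the line of centers so that energy, momentum, and angular momentum are conserved.

```python
import math

def time_to_separation(r, v, d):
    """Earliest t > 0 at which two beads with relative position r and
    relative velocity v reach separation d; None if they never do."""
    a = sum(x * x for x in v)
    b = sum(x * y for x, y in zip(r, v))      # r . v
    c = sum(x * x for x in r) - d * d
    if a == 0.0:
        return None                            # no relative motion
    disc = b * b - a * c
    if disc < 0.0:
        return None                            # trajectories miss the step
    root = math.sqrt(disc)
    for t in sorted(((-b - root) / a, (-b + root) / a)):
        if t > 1e-12:
            return t
    return None

def resolve_step(r, v1, v2, m1, m2, dU):
    """Velocities after the pair meets a potential step of height dU.
    r = r2 - r1 at the event; dU is the potential-energy change on crossing
    (float('inf') models a hard core or a bond-length limit)."""
    dist = math.sqrt(sum(x * x for x in r))
    n = [x / dist for x in r]                  # unit vector along the line of centers
    mu = m1 * m2 / (m1 + m2)                   # reduced mass
    vn = sum((b - a) * k for a, b, k in zip(v1, v2, n))  # normal relative velocity
    if 0.5 * mu * vn * vn > dU:
        # Enough kinetic energy along n: cross the step, trading kinetic for potential energy.
        vn_new = math.copysign(math.sqrt(vn * vn - 2.0 * dU / mu), vn)
    else:
        # Otherwise the pair is reflected elastically.
        vn_new = -vn
    dv = vn_new - vn
    new_v1 = [a - (mu / m1) * dv * k for a, k in zip(v1, n)]
    new_v2 = [b + (mu / m2) * dv * k for b, k in zip(v2, n)]
    return new_v1, new_v2
```

In a full simulation, the predicted event times for all interacting pairs would be kept in a sorted queue so that the earliest event is always processed next; that bookkeeping is omitted here.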
CI2-PolyQ Model-Proteins are modeled using a "beads-on-a-string" approach, in which four pseudo-atoms represent the N, Cα, C′, and O atoms of the backbone, with optional beads representing the side chain depending on the residue type (6). The ab initio folding of CI2 protein in silico remains a challenge in computational biology. To ensure the proper folding of CI2 protein, we use a Gō-like potential, which favors native contacts and ensures that the native structure is the energetic ground state. For simplicity, all non-glycine residues of the CI2 host protein are modeled as alanines, with Cβ atoms as the interaction centers. Two side-chain beads are used to model the inserted glutamines: one bead (the β-bead) for the Cβ and Cγ atoms, and another bead (the γ-bead) for the carboxylamine group (6). For the polyQ insertion, we use a physical interaction model as in Refs. 23-25, including hydrophobic interactions and hydrogen bonds. The parameters used to model the topology of each amino acid and the peptide bonds between consecutive residues have been published previously (6, 26). In addition to the existing sidechain-backbone and backbone-backbone hydrogen bond interactions of this previous model, we introduce sidechain-sidechain hydrogen bonds between the glutamine γ-beads. We specify that a given glutamine residue can engage in a maximum of four hydrogen bonds, because a carboxylamine group can donate two protons and accept two protons.
To model the aggregation of polyQ-inserted proteins, we use the symmetrical Gō potential (23-25), where native-like interactions between residues from different molecules are energetically favored. Such a model has been successfully used to model protein aggregation (22, 27). The physical interaction model, describing the glutamine interaction, comprises hydrophobic interactions and hydrogen bonds, including backbone-backbone, sidechain-backbone, and sidechain-sidechain interactions. To tune the relative contributions of the structure-based and physics-based models, we use the simple criterion that the folding-transition temperature of CI2 (6) and the melting temperature of isolated polyQ peptides (15) should be comparable. Accordingly, we assign the strength of the Gō potential as 1ε, and the strengths of hydrophobic interactions as well as backbone-backbone, sidechain-backbone, and sidechain-sidechain hydrogen-bonding interactions as 0.7ε, 5ε, 2.4ε, and 5ε, respectively. We tested the relative interaction contributions by performing folding simulations of chimeric CI2 with polyQ insertions of 4 to 10 glutamines, which are experimentally shown to have similar stability to the wild type (15). We found that small perturbations of the relative ratio between the structure-based potential and the physical potential do not affect the relative folding transition temperature (T_m, a measure of the folding stability) of chimeric CI2. Temperature is expressed in units of ε/k_B, where k_B is the Boltzmann constant.
Replica-Exchange DMD Simulations-Efficient exploration of the potential energy landscape of molecular systems is the central theme of most molecular modeling applications. Sampling efficiency at a given temperature is governed by the ruggedness of the landscape and by its slope toward the energy minimum. Although passage out of local minima is accelerated at higher temperatures, the free energy landscape is altered due to larger entropic contributions. To efficiently overcome energy barriers while maintaining conformational sampling corresponding to a relevant free energy surface, we utilize the replica exchange sampling scheme (28-32). In replica exchange computing, multiple simulations, or replicas, of the same system are performed in parallel at different temperatures. Individual simulations are coupled through Monte Carlo-based exchanges of simulation temperatures between replicas at periodic time intervals. Temperatures are exchanged between two replicas, i and j, maintained at temperatures T_i and T_j and with energies E_i and E_j, according to the canonical Metropolis criterion (33).
Monomer simulations were performed over a set of temperatures ranging from 0. 7
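The temperature exchange step can be sketched as follows. Because the acceptance expression is not reproduced in the text, the sketch assumes the standard replica-exchange form of the Metropolis criterion, p = min(1, exp(-Δ)) with Δ = (1/k_B T_i - 1/k_B T_j)(E_j - E_i); the function names and the default k_B = 1 (reduced units) are likewise assumptions for illustration.

```python
import math
import random

def exchange_probability(E_i, E_j, T_i, T_j, k_B=1.0):
    """Acceptance probability for swapping the temperatures of replicas i and j
    (standard Metropolis form, assumed here)."""
    delta = (1.0 / (k_B * T_i) - 1.0 / (k_B * T_j)) * (E_j - E_i)
    return 1.0 if delta <= 0.0 else math.exp(-delta)

def attempt_exchange(replicas, i, j, rng=random):
    """replicas: list of dicts with keys 'T' and 'E'.
    Swaps the two temperatures in place if the move is accepted."""
    p = exchange_probability(replicas[i]["E"], replicas[j]["E"],
                             replicas[i]["T"], replicas[j]["T"])
    if rng.random() < p:
        replicas[i]["T"], replicas[j]["T"] = replicas[j]["T"], replicas[i]["T"]
        return True
    return False
```

With this form, a swap that moves the lower-energy conformation to the lower temperature is always accepted, while the reverse swap is accepted with a Boltzmann-weighted probability.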

RESULTS
As the Insert Length Increases, Folding Cooperativity Decreases Without Significantly Altering Thermodynamic Stability-All monomer simulations display an apparent two-state folding transition similar to that for wild-type CI2. In Fig. 1a, we present a typical folding transition for CI2 with various lengths of glutamine insertion. At temperatures below the folding transition temperature (T_F ~ 1.05 ε/k_B), the protein remains in a folded conformation with low potential energies. A sharp transition of potential energy is observed at T_F, and the protein remains fully unfolded with high energies at temperatures higher than T_F. Hence, T_F corresponds to the peak in the specific heat plot (Fig. 1b). The folding transition temperature T_F corresponds to the midpoint temperature T_m in thermal denaturation experiments, so the value of T_F can serve as a measure of protein thermodynamic folding stability. We find that all CI2 chimeras with different insert lengths have similar values of T_F, suggesting that insertion of glutamines in the loop region of CI2 does not significantly alter the stability of the protein. This is in agreement with previous experimental results (15).
The width of the transition peak in a specific heat plot (Fig. 1b) is an indicator of folding cooperativity for a protein, which is in turn related to the excitation barrier of folding/unfolding events. We define the peak width of the specific heat plot as the temperature range with specific heat values larger than half of the peak value (Table 1 and Fig. 1b). For insert lengths of 4-10 glutamines, the specific heat peak width does not vary significantly from that of the wild-type protein. This is in agreement with the conclusions of Ladurner and Fersht (15) that the folding rate (excitation barrier of free energy) of CI2 chimeras is not affected by short polyQ inserts within experimental precision. In CI2 with insert lengths of 30 or more glutamines, we observe a significant broadening of the peak width, suggesting a decrease in the folding cooperativity of the monomers. We conclude that as the insertion length increases toward the pathogenic threshold and beyond, the transition barrier between the folded and unfolded states decreases significantly. As a result, the proteins with long inserts spend more time in an unfolded or partially unfolded state.

FIGURE 1. The folding thermodynamics of CI2 chimeras with different polyQ insertion lengths. a, average potential energy as a function of temperature for wild-type CI2 and its mutants with different glutamine insertions. The folding transitions take place at the transition temperature T_F ~ 1.05 ε/k_B. b, specific heat plots for different CI2 chimeras. The specific heat is computed using the standard deviation of the potential energy in the simulations, δE (C_v = (δE)²/T²). As the insertion length increases, the peak width increases.
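The specific heat and peak-width definitions used for Fig. 1b and Table 1 can be sketched as follows, in reduced units with k_B = 1. The function names are hypothetical, and the linear interpolation used to locate the half-maximum crossings between sampled temperatures is an assumption of this sketch, not a detail given in the text.

```python
from statistics import pvariance

def specific_heat(energies, T):
    """C_v = (dE)^2 / T^2, with dE the standard deviation of the potential energy."""
    return pvariance(energies) / T ** 2

def peak_and_half_width(temps, cv):
    """Return (T_F, dT_half): the temperature of the largest specific heat and
    the width of the temperature range where C_v exceeds half its peak value."""
    i_max = max(range(len(cv)), key=cv.__getitem__)
    half = cv[i_max] / 2.0

    def crossing(step):
        i = i_max
        # Walk away from the peak while the neighbouring point is still above half maximum.
        while 0 <= i + step < len(cv) and cv[i + step] > half:
            i += step
        j = i + step
        if not 0 <= j < len(cv):
            return temps[i]  # peak not bracketed on this side of the sampled range
        # Interpolate linearly between the last point above and the first point at or below half.
        frac = (cv[i] - half) / (cv[i] - cv[j])
        return temps[i] + frac * (temps[j] - temps[i])

    return temps[i_max], crossing(+1) - crossing(-1)
```

A broader half-maximum width for the same T_F is what the text interprets as reduced folding cooperativity.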

TABLE 1
The estimated T_F and width of the specific heat peak, dT_half
The T_F is defined as the temperature with the largest specific heat.
Insertion length    T_F (ε/k_B)    dT_half (ε/k_B)

CI2 Chimeras with Small Insertion Lengths Associate via Domain-swapping-It has been proposed that domain-swapping association is governed by the monomer topology, and that the location of the flexible hinge region within a monomer determines the structure of domain-swapped oligomers (24, 25). To determine these hinge regions, we use a recently developed H-Predictor (24) to compute each residue's propensity to participate in the domain-swapping hinge region. A low H-Predictor score (ΔE/ΔL) for a given residue indicates a high propensity to be involved in the hinge region in a domain-swapped structure. For CI2-Q4, the H-Predictor identifies a hinge region (marked in red in Fig. 2a) that encompasses the glutamine insert, and similar results are obtained for other insertion lengths (data not shown). In agreement with experimental findings that CI2-QN (4 ≤ N ≤ 10) associate by domain swapping (12), we observe that DMD simulations of CI2-Q4 and CI2-Q10 show frequent formation of domain-swapped dimers (Fig. 2b). The hinge regions of the modeled and experimentally determined structures agree with those identified by the H-Predictor.
The crystal structure of the CI2-Q4 dimer has been determined by x-ray crystallography (PDB ID 1CQ4) (12) and provides an invaluable benchmark to validate our models. Because of high B-factors, the crystal structure of the flexible glutamine inserts was not experimentally determined (Fig. 2c). Although the relative orientation of the two folded CI2 domains (pseudomonomers) within the domain-swapped dimer is variable, the centroid dimer conformation from simulations (Fig. 2b) agrees with the location of the hinge region and the orientation of dimers within the crystal lattice given pictorially in Chen et al. (12). Here, the centroid conformation is defined as the dimer conformation with the smallest average R.M.S.D. to all other dimer conformations from the simulations. The relative R.M.S.D. between the two structures is ≈4 Å.
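The centroid definition above amounts to a simple selection over a pairwise R.M.S.D. matrix. The sketch below assumes that matrix has already been computed after structural superposition (not shown); the function name `centroid_index` is hypothetical.

```python
def centroid_index(rmsd):
    """Index of the conformation with the smallest average R.M.S.D. to all others.
    rmsd is a symmetric square matrix (list of lists) with at least two conformations."""
    n = len(rmsd)

    def mean_to_others(i):
        return sum(rmsd[i][j] for j in range(n) if j != i) / (n - 1)

    return min(range(n), key=mean_to_others)
```

For example, for three conformations with pairwise distances 2, 2, and 4 Å, the conformation that lies 2 Å from each of the other two is selected.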
We next studied the dynamic features of the glutamine inserts in the CI2-Q4 and CI2-Q10 domain-swapped dimers. We compute frequency maps, defined as matrices of contact frequencies for each pair of residues, for each dimer along its simulation trajectory. We find that persistent native-like contacts, definitive of association by domain swapping, form between monomers. The frequency of interglutamine contacts (circled in Fig. 2d) is lower than that of the interdomain contacts, suggesting that the polyQ tract is flexible and does not form persistent structures in the domain-swapped dimer. This observation explains the high B-factor in the x-ray crystal structure, and also agrees with NMR studies of these dimers, in which permanent interglutamine interactions were not found (13).
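A frequency map of this kind can be accumulated from per-frame contact lists, as in the minimal sketch below. The input format (one collection of (i, j) residue-index pairs per trajectory frame, with i indexing one chain and j the other) and the function name `frequency_map` are assumptions for illustration.

```python
from collections import Counter

def frequency_map(frames, n_a, n_b):
    """Fraction of frames in which residue i of chain A contacts residue j of chain B.
    frames: non-empty list of collections of (i, j) pairs, one collection per frame."""
    counts = Counter()
    for contacts in frames:
        counts.update(set(contacts))  # count each residue pair at most once per frame
    n_frames = len(frames)
    return [[counts[(i, j)] / n_frames for j in range(n_b)] for i in range(n_a)]
```

Entries near 1 mark persistent contacts, such as the native-like interdomain contacts, while low entries mark transient ones.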
Our simulation studies show that CI2 chimeras with small insertion length (N < 20) associate by a domain-swapping mechanism, where the driving force is native-like CI2 interactions. These aggregates contain insertion lengths far below the pathogenic threshold, and possess few non-native structural features.

FIGURE 2. Formation of domain-swapped dimers. a, domain-swapping hinge region propensity, ΔE/ΔL, is presented as a function of residue index for CI2-Q4. A low score over a given region indicates a high propensity for this region to serve as the hinge region in a domain-swapped aggregate. The red-shaded region, which includes the polyQ insert, is predicted as the hinge region. b, representative structure of the CI2-Q4 dimer derived from simulations. The red region corresponds to the predicted hinge region. c, structure of the CI2-Q4 domain-swapped dimer determined by x-ray crystallography. d, frequency map of this dimer. Circled regions correspond to interglutamine contacts.
CI2 Chimeras with Insertion Lengths Close to or Beyond Pathogenic Threshold Associate by Interglutamine Contacts-To determine the association mechanism of CI2 chimeras with longer inserts, we computed the average number of contacts per residue for interdomain and interglutamine interactions (Fig. 3a). Here, interdomain contacts of CI2 residues are defined as interactions between the first 43 residues of one protein and the last 25 residues of the opposite protein. The interglutamine interactions include all interprotein contacts between glutamine residues. Data are collected for all simulations in which the centers of mass of the two proteins fall within a cutoff distance of 30 Å along a specific simulation trajectory, indicating that a putative dimer is successfully formed. Two residues are counted as being in contact when their Cβ atoms (Cα for Gly) lie within a cutoff of 7.5 Å.
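The contact bookkeeping described above can be sketched as follows. The 30 Å center-of-mass cutoff and the 7.5 Å contact cutoff are taken from the text; everything else is an assumption for illustration, including the function names, the per-residue labels ("N" for the first 43 CI2 residues, "C" for the last 25, "Q" for inserted glutamines), the use of an unweighted centroid in place of a mass-weighted center of mass, and returning raw counts rather than per-residue averages.

```python
import math

def centroid(coords):
    """Unweighted mean position (equal bead masses assumed)."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))

def is_putative_dimer(coords_a, coords_b, com_cutoff=30.0):
    """True if the two chains' centroids lie within com_cutoff (in Å)."""
    return math.dist(centroid(coords_a), centroid(coords_b)) <= com_cutoff

def count_interchain_contacts(coords_a, labels_a, coords_b, labels_b, cutoff=7.5):
    """Count (interdomain, interglutamine) contacts between two chains.
    coords_*: interaction-center positions (C-beta, or C-alpha for Gly), in Å.
    labels_*: 'N', 'C', or 'Q' for each residue (hypothetical labeling)."""
    interdomain = interglutamine = 0
    for pa, la in zip(coords_a, labels_a):
        for pb, lb in zip(coords_b, labels_b):
            if math.dist(pa, pb) > cutoff:
                continue
            if la == "Q" and lb == "Q":
                interglutamine += 1      # glutamine-glutamine contact across chains
            elif {la, lb} == {"N", "C"}:
                interdomain += 1         # N-terminal part of one chain with C-terminal part of the other
    return interdomain, interglutamine
```

Comparing the two counts, averaged over all frames that pass the dimer test, gives the kind of interdomain versus interglutamine comparison plotted against insert length in Fig. 3a.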
We observe that the dominant mode of association of CI2-QN with N < 20 is domain swapping (Fig. 3a). This result agrees with the experiments of Stott et al. (14), in which short insert mutants (CI2-Q4, CI2-Q10) were found to associate exclusively by domain swapping. However, we find that the frequency of interglutamine interactions surpasses that of interdomain contacts in CI2 with more than 35-40 inserted glutamines. Interestingly, this transition coincides with the pathogenic threshold for polyQ disorders. For insertion lengths longer than 30 glutamines, we also find that the average rate of protein association increases along with the length of the insert (data not shown). Our results suggest that any protein, even such a "benign" protein as CI2, can be driven to aggregate by a glutamine-mediated mechanism when the polyQ insertion exceeds a certain critical length.

DISCUSSION
Chimeric CI2 with an inserted polyQ tract has been experimentally employed as an in vitro model system to study polyQ repeat disease. Many experimental studies of polyQ-containing peptides and proteins are available to serve as computational benchmarks for in silico studies of polyQ aggregation (3, 4, 12, 13, 15-18). Therefore, we have constructed a model of chimeric CI2 proteins with polyQ insertions of varying lengths. Simulations of proteins with short insertions (N < 20) recapitulate experimentally determined thermodynamic and kinetic properties of aggregation: 1) We find that the folding transition temperature T_F does not change with increasing insertion length. Because T_F corresponds to T_m, a measure of stability, our results agree with the conclusion of Ladurner and Fersht (15) that short polyQ inserts do not alter the stability of CI2 chimeras. 2) We find that chimeras with short inserts associate by a domain-swapping mechanism. The simulation-derived structure for the CI2-Q4 domain-swapped dimer is in full agreement with the crystal structure (12). 3) The polyQ tract within short insert domain-swapped dimers is highly dynamic, which is consistent with the corresponding high B-factor in the crystal structure (12) and the NMR study of Gordon-Smith et al. (13). These benchmark tests validate our computational model.
Our simulation studies provide a possible scenario for polyQ-mediated aggregation. Although polyQ insertion does not alter the folding stability of the protein, it reduces the excitation barrier for the unfolding process, as indicated by reduced folding cooperativity in proteins with longer polyQ inserts. As a result, even under native-like conditions, polyQ-expanded protein spends more time in a partially folded conformation than wild-type protein. Such intermediates have been implicated in the aggregation of human lysozyme variants (34).
For shorter glutamine inserts, the unfolded state is rescued by the formation of domain-swapped aggregates, because the energy benefit of forming native interactions dominates over that of the polyQ-mediated interactions. However, as the insertion length exceeds the pathogenic threshold (35-40 glutamines), the glutamine-mediated interactions, mainly hydrogen bonds, dominate over the native contacts. The increased propensity for inter-polyQ interactions as the insertion length increases is compatible with the discovery by Wetzel and co-workers (35, 36) that nucleation takes place in the monomer, because the rate-limiting step of aggregation can be the formation of a nucleus in each monomer. This scenario is also consistent with the theoretical and experimental results from Pappu and co-workers (36, 37) that monomeric polyQ peptides form collapsed states. Their studies support the association of polyQ peptides in the same way as the collapse of isolated polyQ chains.
To find potential structural features of polyQ aggregates, we searched for persistent conformations present in CI2 monomers and dimers over a range of insertion lengths. We do not find persistent regular secondary structure elements within the polyQ region, such as β-strands or β-helices, postulated to be the building blocks of the amyloid aggregate. However, within the time scale of our simulations, we observe the transient formation of β-helices within CI2 monomers with an insertion length of 40 (Fig. 3b). This observation suggests that the formation of persistent, ordered structural elements composing an aggregate occurs on a time scale much larger than that sampled by our simulations, and requires large structural rearrangements within multiple (more than two) chains of the protein. We believe that our simulations capture the early steps in the process of aggregate formation.