Revealing Well-Defined Soluble States during Amyloid Fibril Formation by Multilinear Analysis of NMR Diffusion Data.

Amyloid fibril formation is a hallmark of neurodegenerative disease caused by protein aggregation. Oligomeric protein states that arise during the process of fibril formation often coexist with mature fibrils and are known to cause cell death in disease model systems. Progress in this field depends critically on development of analytical methods that can provide information about the mechanisms and species involved in oligomerization and fibril formation. Here, we demonstrate how the powerful combination of diffusion NMR and multilinear data analysis can efficiently disentangle the number of involved species, their kinetic rates of formation or disappearance, spectral contributions, and diffusion coefficients, even without prior knowledge of the time evolution of the process or chemical shift assignments of the various species. Using this method we identify oligomeric species that form transiently during aggregation of human superoxide dismutase 1 (SOD1), which is known to form misfolded aggregates in patients with amyotrophic lateral sclerosis. Specifically, over a time course of 42 days, during which SOD1 fibrils form, we detect the disappearance of the native monomeric species, formation of a partially unfolded intermediate in the dimer to tetramer size range, subsequent formation of a distinct similarly sized species that dominates the final spectrum detected by solution NMR, and concomitant appearance of small peptide fragments.

To obtain insight into the behavior of biological macromolecules it is imperative to establish efficient procedures for exploring their kinetic and equilibrium behavior. A particularly challenging case is the oligomerization and fibril formation of proteins associated with neurodegenerative diseases and other protein aggregation-related disorders. Oligomeric states appear during fibril formation and have been identified as cytotoxic components in many diseases. 1,2 Oligomers are often present only transiently and in low amounts, 3,4 thereby making their detection particularly challenging.
Nuclear magnetic resonance (NMR) spectroscopy can provide detailed chemical information from individual signals for virtually every atom in a molecule and hence makes it possible to follow the evolution of molecular species. For simple mixtures, DOSY (diffusion-ordered spectroscopy) can resolve the individual NMR signals by distinguishing different species through their different rates of translational diffusion. 5,6 By contrast, complex samples containing interconverting species pose a formidable challenge to the interpretation of NMR data by conventional means. However, the additional complication posed by an evolving mixture can be turned into an advantage, because the time and diffusion dependences can be encoded as independent dimensions, enabling the use of multilinear data analysis methods such as parallel factor analysis (PARAFAC). 7−10 PARAFAC extends two-way principal component analysis (PCA) to higher-order arrays. PCA suffers from rotational ambiguity, so its fitted components are often linear combinations of different processes and describe deviations from the mean rather than actual amplitudes. A successful PARAFAC decomposition, by contrast, directly provides the variation of each species as physically meaningful amplitudes, which greatly improves interpretability.
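To make the trilinear model concrete, the sketch below fits a three-way array X[i, j, k] ≈ Σ_f A[i, f]·B[j, f]·C[k, f] (here: peaks × gradient steps × time points) by alternating least squares, the textbook way to fit a PARAFAC model. It is a minimal illustration on synthetic, noise-free data with random loadings; the function names are hypothetical, it is not the software used in this work, and real data would need convergence checks, several random starts and possibly constraints. As with any PARAFAC solution, the recovered components are unique only up to permutation and scaling.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: row j*K + k holds U[j, f] * V[k, f]."""
    J, F = U.shape
    K = V.shape[0]
    return (U[:, None, :] * V[None, :, :]).reshape(J * K, F)


def parafac_als(X, rank, n_iter=1000, seed=0):
    """Fit X[i, j, k] ~ sum_f A[i, f] * B[j, f] * C[k, f] by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    # Matrix unfoldings whose column order matches khatri_rao's row order.
    X1 = X.reshape(I, J * K)                     # columns indexed by (j, k)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # columns indexed by (i, k)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # columns indexed by (i, j)
    for _ in range(n_iter):
        # Each update is an ordinary least-squares solve with the other two modes fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C


def reconstruct(A, B, C):
    """Rebuild the three-way array from its PARAFAC loadings."""
    return np.einsum("if,jf,kf->ijk", A, B, C)


# Synthetic, noise-free example: 40 peaks x 10 gradient steps x 10 time points, 3 species.
rng = np.random.default_rng(1)
A_true = rng.standard_normal((40, 3))
B_true = rng.standard_normal((10, 3))
C_true = rng.standard_normal((10, 3))
X = reconstruct(A_true, B_true, C_true)

A, B, C = parafac_als(X, rank=3)
rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.2e}")
```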
Here, we show that the combination of DOSY and PARAFAC 5,6,11 provides powerful information on an elusive oligomerization process underlying the formation of amyloid fibrils. We employed three-dimensional ¹H−¹⁵N-DOSY-HSQC to acquire a series of spectra during fibril formation by SOD1 (153 amino acid residues), a protein associated with the neurodegenerative disease amyotrophic lateral sclerosis. The present investigation involves three independent dimensions: the NMR resonance frequencies, the signal decays caused by diffusion, and time. PARAFAC translates these dimensions into the information sought: the number of (NMR-detectable) species and their time evolution profiles, sizes, and spectra. This is achieved without any constraints or presumptions about kinetic models, hydrodynamic properties, or spectral characteristics of the species.
We used the pseudo-wild-type SOD1 variant C6A/F50E/G51E/C57S/C111A/C146S (pwtSOD1ΔC), which mimics the monomeric, metal-free, and disulfide-reduced form of SOD1 3,12,13 that is known to be a key precursor in the formation of aberrant oligomers and fibrils. 14−17 We performed ¹H−¹⁵N-DOSY-HSQC experiments based on the X-STE pulse sequence 18 with a diffusion time of 500 ms and 10 different gradient strengths, giving an experimental time of 18.6 h per data set. This experiment was repeated 10 times on the same sample over a period of 6 weeks, during which the sample temperature was maintained at 37 °C, to cover the full course of fibril formation by pwtSOD1ΔC (Supporting Information (SI) methods). Over this period the ¹H−¹⁵N-HSQC spectrum of pwtSOD1ΔC becomes crowded as peaks from partially unfolded species build up (Figure S1). For the present analysis we selected a total of 223 peaks, some of which may contain contributions from overlapping resonances.
The data were analyzed using PARAFAC to identify the minimum number of species (factors) needed to describe the variance of the data set (see SI methods for details). We computed PARAFAC models with up to 5 factors. Three weak histidine signals, affected by exchange processes at this pH, showed anomalous behavior and were removed from further analysis. Based on quality assessment tools such as residuals and core consistency diagnostics 19 (Table S1), the analysis identified four factors. Because factor 4 comprises only three signals, we excluded these signals and computed a separate set of PARAFAC models, which gave improved descriptors for factors 1−3.
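The core consistency diagnostic can be sketched in a few lines: for fixed loadings, compute the least-squares Tucker core and measure how far it deviates from the superdiagonal core that an ideal PARAFAC model implies. The helper name and the toy data below are hypothetical; the example only shows that appending a spurious, signal-free component to otherwise exact loadings drops the diagnostic from 100% to 75%. In practice, the fitted loadings of each candidate model would be passed in instead.

```python
import numpy as np


def core_consistency(X, A, B, C):
    """Core consistency (CORCONDIA) of a PARAFAC model, in percent."""
    F = A.shape[1]
    # Least-squares Tucker core for the fixed loadings: X x1 pinv(A) x2 pinv(B) x3 pinv(C).
    G = np.einsum("fi,gj,hk,ijk->fgh",
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C), X)
    # Ideal PARAFAC core: ones on the superdiagonal, zeros elsewhere.
    T = np.zeros((F, F, F))
    idx = np.arange(F)
    T[idx, idx, idx] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)


rng = np.random.default_rng(2)
A = rng.standard_normal((40, 3))
B = rng.standard_normal((10, 3))
C = rng.standard_normal((10, 3))
X = np.einsum("if,jf,kf->ijk", A, B, C)  # exactly trilinear, 3 components

# Correct loadings give a superdiagonal core.
cc_3 = core_consistency(X, A, B, C)
# Append a spurious fourth component that carries no signal.
extra = [np.hstack([M, rng.standard_normal((M.shape[0], 1))]) for M in (A, B, C)]
cc_4 = core_consistency(X, *extra)
print(f"3 factors: {cc_3:.1f}%, 4 factors: {cc_4:.1f}%")
```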
The resulting factors are characterized by unique combinations of time evolution, size, and spectral contribution (Figure 1) that make physical and chemical sense: the amplitudes of individual factors change smoothly with time (Figure 1A−D) and decay exponentially with the square of the gradient strength, indicating that each factor represents a single, monodisperse species (Figure 1E−H). Because of extensive overlap, individual peaks may represent more than one molecular species, but the PARAFAC decomposition resolves these contributions. Most peaks include contributions from more than one factor (Figure 1I−L), indicating that species I, II, and III contain different levels of similarly folded and unfolded segments.
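Converting the diffusion-mode loadings of each factor (Figure 1E−H) into a diffusion coefficient amounts to fitting an exponential in the squared gradient strength. The sketch below assumes the standard Stejskal–Tanner attenuation for rectangular gradient pulses, S = S0·exp(−D·b) with b = (γgδ)²(Δ − δ/3), and the ¹H gyromagnetic ratio. Only Δ = 500 ms comes from the text; the pulse length δ and the gradient values are hypothetical, and the exact attenuation factor for the X-STE sequence may include sequence-specific terms not modeled here.

```python
import numpy as np

GAMMA_H = 2.6752e8   # 1H gyromagnetic ratio, rad s^-1 T^-1 (assumed encoding nucleus)
BIG_DELTA = 0.500    # diffusion time from the text, s
SMALL_DELTA = 2e-3   # hypothetical gradient pulse length, s


def b_value(g):
    """Stejskal-Tanner b-factor (s m^-2) for rectangular gradient pulses of strength g (T m^-1)."""
    return (GAMMA_H * g * SMALL_DELTA) ** 2 * (BIG_DELTA - SMALL_DELTA / 3.0)


def fit_diffusion(g, amplitude):
    """Fit ln(amplitude) = ln(S0) - D * b by linear least squares; returns D in m^2 s^-1."""
    slope, _ln_s0 = np.polyfit(b_value(g), np.log(amplitude), 1)
    return -slope


g = np.linspace(0.02, 0.30, 10)                  # 10 hypothetical gradient strengths, T m^-1
D_true = 1.40e-10                                # state I value from the text
amplitude = 5.0 * np.exp(-D_true * b_value(g))   # noise-free synthetic loading
print(f"fitted D = {fit_diffusion(g, amplitude):.3e} m^2 s^-1")
```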
At the starting point of the reaction the dominant species (state I, factor 1) is the folded, monomeric apo form of pwtSOD1ΔC, as identified by the chemical shifts. 3 The diffusion coefficient of this species, D = (1.40 ± 0.02) × 10⁻¹⁰ m² s⁻¹, corresponds to a Stokes radius r_s = 23 Å (eq S2), in very good agreement with the previously determined value of r_s = 22.5 Å for monomeric apo disulfide-reduced SOD1. 20 Factor 2 describes an intermediate (state II) that builds up rapidly from the start, displays the same short lag phase as the monomer curve, peaks around days 8−15, and declines over days 15−42 (Figure 1B). The time evolution of factor 2 is similar to that of prefibrillar intermediates of Aβ42 and transthyretin, associated with Alzheimer's disease and transthyretin amyloidosis, respectively. 21,22 State II has a lower diffusion coefficient than state I, D = (0.86 ± 0.01) × 10⁻¹⁰ m² s⁻¹, corresponding to r_s = 38 Å. If states I and II behave like globular species (despite the presence of disordered segments), the ratio of their r_s values suggests that state II is a tetramer.
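The conversion from D to r_s presumably follows the Stokes–Einstein relation, r_s = k_B·T/(6πηD). Equation S2 is not reproduced here, so the temperature (37 °C) and solvent viscosity (pure water, about 0.692 mPa s) in this sketch are assumptions. With these inputs, the computed radii for states I, II and III round to the 23, 38 and 37 Å quoted in the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J K^-1
T_K = 310.15        # assumed temperature: 37 °C in kelvin
ETA = 0.692e-3      # assumed viscosity of water at 37 °C, Pa s


def stokes_radius(D, temperature=T_K, viscosity=ETA):
    """Stokes-Einstein radius r_s = k_B T / (6 pi eta D), in metres, for D in m^2 s^-1."""
    return K_B * temperature / (6.0 * math.pi * viscosity * D)


diffusion = {"I": 1.40e-10, "II": 0.86e-10, "III": 0.89e-10, "IV": 5.90e-10}
for state, D in diffusion.items():
    print(f"state {state}: r_s = {stokes_radius(D) * 1e10:.1f} Å")
```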
For comparison, the native Cu,Zn-SOD1 dimer has r_s = 30.3 Å, 20 further supporting the conclusion that state II is tri- or tetrameric (SI). A trimeric SOD1 species with partially unfolded regions has previously been implicated. 23 Factor 3 describes the final visible state III, which has essentially the same diffusion coefficient as state II, D = (0.89 ± 0.01) × 10⁻¹⁰ m² s⁻¹, corresponding to r_s = 37 Å. The similar diffusion coefficients and the large number of peaks with similar contributions from factors 2 and 3 (green in Figure 1J,K) imply that states II and III have similar structures.
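The oligomer-size argument rests on the assumption that compact, globular particles have a volume proportional to r_s³, so the number of subunits scales as (r_s/r_ref)³. A minimal sketch using only radii quoted in the text (the function name is hypothetical): relative to state I (23 Å), the 38 Å radius of state II gives about 4.5 subunits, and relative to the native dimer (30.3 Å) about 3.9. Because disordered segments inflate r_s, such estimates are only indicative, consistent with the cautious tri- or tetrameric assignment.

```python
def aggregation_number(r_s, r_ref, n_ref=1):
    """Approximate subunit count, assuming compact particles whose volume scales as r_s**3."""
    return n_ref * (r_s / r_ref) ** 3


r_state_II = 38.0                                           # Å, state II (from the text)
n_vs_monomer = aggregation_number(r_state_II, 23.0)         # reference: state I monomer
n_vs_dimer = aggregation_number(r_state_II, 30.3, n_ref=2)  # reference: native Cu,Zn dimer
print(f"{n_vs_monomer:.1f} subunits (vs monomer), {n_vs_dimer:.1f} subunits (vs dimer)")
```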
Factor 4 describes state IV, which has a much higher diffusion coefficient than the other three states, D = (5.90 ± 0.2) × 10⁻¹⁰ m² s⁻¹, yielding r_s ≈ 5 Å. State IV involves only three signals and appears to build up simultaneously with state III (Figure 1C,D), suggesting that the two events are linked; most likely, state IV represents a short peptide that is cleaved off from the protein.
Figure 1 (partial caption). The signals in (I−L) are ordered such that the contribution from factor 1 decreases with increasing signal number and the contribution from factor 3 increases. The colored areas indicate signals dominated by factor 1 in gray, factor 2 in pink, factors 2 + 3 in green, factor 3 in blue, and factor 4 in red. The individual curves in panels A−H represent 10 models created using 50% randomly chosen signals.

Journal of the American Chemical Society
Communication

Variations in intensity and diffusion rate for the different types of signals are shown in Figure 2. Note that the data in Figure 2A describe individual peaks, many of which contain contributions from more than one PARAFAC factor. The evolution profiles and diffusion coefficients of the three main states suggest the following linear reaction model: state I (monomer) → state II (partly unfolded oligomer) → state III (partly unfolded and cleaved oligomer) + state IV (peptide). The observed buildup of state III to a plateau value (Figure 1C) could result from an equilibrium with fibrils, could reflect depletion of monomeric pwtSOD1ΔC, which might be necessary for the turnover of state III (I + III → fibrils), or could indicate that state III is a dead-end product of proteolysis. Note that the end-stage fibrils are too large to be detected and studied by solution NMR, although flexible parts of fibrils could potentially be observable and mistakenly identified as unbound species. 24 However, the diffusion coefficients of states II and III are not low enough to correspond to flexible parts of a large fibrillar state.
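For intuition about the proposed linear scheme, the sketch below solves its simplest version analytically: two consecutive first-order steps, I → II → III (+ IV), with state IV released together with state III. The rate constants are hypothetical, chosen only for illustration and not fitted to the data, and the model deliberately omits the lag phase, the fibril sink, and the alternative plateau mechanisms discussed above. It reproduces the qualitative rise and fall of state II, whose maximum lies at t = ln(k2/k1)/(k2 − k1).

```python
import numpy as np


def sequential_first_order(t, k1, k2, a0=1.0):
    """Populations of I -> II -> III (+ IV) for first-order steps, assuming k1 != k2."""
    state_I = a0 * np.exp(-k1 * t)
    state_II = a0 * k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    state_III = a0 - state_I - state_II  # mass balance; IV is formed 1:1 with III
    return state_I, state_II, state_III


k1, k2 = 0.20, 0.08                # hypothetical rate constants, day^-1
t = np.linspace(0.0, 42.0, 42001)  # 0-42 days in steps of 0.001 day
state_I, state_II, state_III = sequential_first_order(t, k1, k2)

t_peak_numeric = t[np.argmax(state_II)]
t_peak_analytic = np.log(k2 / k1) / (k2 - k1)
print(f"state II peaks at day {t_peak_numeric:.2f} (analytic: {t_peak_analytic:.2f})")
```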
The ¹H chemical shifts of the resonances contributing to factor 2 or 3 cluster around 7.0−8.5 ppm, as is typical for disordered protein segments. The absence of signals with chemical shifts typical of folded proteins is expected, because signals from structured parts of large oligomers are often broadened beyond detection. We note that an alternative interpretation of factors 2 and 3 could be that states II and III are fully unfolded monomers (which are also expected to have increased r_s), rather than partly unfolded oligomers. However, several observations argue against this interpretation. First, the unfolding/folding equilibrium is shifted toward the folded state under the present experimental conditions, and the unfolding/folding dynamics are much faster than the observed buildup of state II. 25 Second, we validated our interpretation by small-angle X-ray scattering, which unequivocally shows that pwtSOD1ΔC is monomeric at day 0 but polydisperse, with larger oligomeric species, by day 19 (SI Figure S2). Both samples are dominated by globular species with disordered regions, but the disordered contribution is only slightly increased at the later time point (SI Figure S2B), implying that states II and III are far from fully unfolded. The buildup of oligomers was further confirmed by chemical cross-linking of lysine residues, followed by MALDI TOF/TOF mass spectrometry of cross-linked, intact pwtSOD1ΔC samples obtained at day 0 and day 19 (SI Figure S3). We also used cryo-transmission electron microscopy to verify that fibrils formed during the course of the 6-week experiment (SI Figure S4). It should be noted that our results do not exclude the possibility that the formation of the oligomeric state II takes place via the unfolded monomer, even though this state is weakly populated and the equilibrium unfolding dynamics are not rate limiting. Future experiments will reveal whether the identified states are on-path precursors to fibril formation.
We have shown that ¹H−¹⁵N-DOSY-HSQC spectroscopy in combination with PARAFAC analysis allows straightforward identification of four NMR-detectable species of pwtSOD1ΔC present during fibril formation. PARAFAC yields the time evolution profiles and diffusion coefficients of these species and identifies their chemical shifts. The approach thus enables powerful characterization of molecular processes that occur alongside fibril formation, or of other macromolecular transitions occurring on similar time scales.
