Introduction

Huntington’s disease (HD) is a neurodegenerative disease due to a CAG trinucleotide repeat expansion1, ranging from 36 to 250 repeats2 and resulting in an extended polyglutamine (polyQ) tract within huntingtin (HTT) protein. Age at disease onset, usually between 30 and 55 years, is strongly and inversely correlated with the size of the expanded CAG repeat3 but only explains ~60–70% of the variance in age at onset3,4,5. An early onset of symptoms, before age 20 years, is considered to be the juvenile form of the disease (JHD). JHD patients account for 5–10% of individuals with HD and usually have more than 60 CAG repeats6. The inherited expanded CAG repeat is unstable and undergoes a progressive increase in length over time in somatic cells7,8,9. Quantification of somatic CAG repeat instability by PCR in several HD knock-in (KI) mouse models has revealed an initial CAG repeat size, age and tissue dependency of this phenomenon9,10,11. Strikingly, genomic DNA (gDNA) from postmortem brain samples from two HD individuals, who died of other causes and with no microscopic evidence of pathological cell loss in the striatum (inherited CAG repeat length of 41 and 51 and an age at death of 40 and 27 years respectively), showed dramatic mutation length increases in striatum (up to >1,000 CAG repeats) and in the cortex, though to a lesser extent11. These observations suggest that somatic instability could precede and influence the onset of symptoms. Small-pool PCR (SP-PCR) analysis of gDNA from postmortem cortical brain tissue in patients with HD also suggested that somatic CAG repeat instability influences age of disease onset, with larger gains in repeat length associated with earlier disease onset12. Several candidate genes involved in DNA mismatch repair were identified to drive somatic instability in a mouse model of HD13,14. 
Most notably, GWAS found that the length of the uninterrupted CAG tract drives HD onset in humans, and that polymorphic variation in a region containing DNA repair genes was associated with disease onset or progression of HD, together consistent with somatic CAG expansion as a driver of HD pathogenesis15,16,17,18.

Technologies allowing quantification of mutant HTT (mHTT) protein are of prime interest, not only for pharmacodynamics, but also for biomarkers of disease evolution. Indeed, using micro bead-based IP-flow cytometry and Single Molecule Counting (SMC) assays, mHTT in cerebrospinal fluid (CSF) was shown to correlate with HD progression19,20. Furthermore, using the SMC assay, dose-dependent reductions of mHTT protein in CSF taken from patients receiving an anti-sense oligonucleotide to lower HTT were reported21. Other sandwich ELISA-based assays, such as those based on Meso Scale Discovery (MSD) technology, were also developed for mHTT detection22. These assays are similar and differ only in the signal read-out, which improves assay sensitivity and significantly extends the linear dynamic range beyond that achievable with traditional sandwich ELISA read-outs.

Sandwich ELISA-based methods for mHTT quantification use polyQ targeting detection Ab MW119,20,22. While MW1 and other polyQ targeting Abs (1C2 and 3B5H10) were initially proposed to recognize a specific mutant conformation23,24 or a specific toxic monomeric conformation25, recent studies have contradicted these hypotheses, suggesting a linear lattice model26,27,28,29,30. These Abs bind a small polyQ epitope in similar linear and extended conformations, with a higher avidity for expanded polyQ tracts due to the Ab’s bivalence. Thus, these polyQ-binding Abs do not specifically, but preferentially recognize mHTT.

Although the intensity of the signal of sandwich ELISA-based assays for mHTT was reported to be dependent on the polyQ length19,20,22,31, no study has accurately quantified this phenomenon. Even when the effects of polyQ length on mHTT quantification were considered, protein concentration was proposed to be the overwhelming contributor in the polyQ range seen in 93% of patients20,32. However, this hypothesis was based on results obtained at a single concentration with a different assay, the time-resolved Förster resonance energy transfer (TR-FRET) immunoassay33. Moreover, assessment of confounding variables for mHTT quantification in HD patients has revealed an association with inherited CAG repeat length19,33,34,35.

In this study, we assessed the effect of polyQ length on mHTT detection using MSD assay and polyQ targeting Abs. We observed that the signal detected can be as much as double for a variation of only 7 glutamines in the polyQ length range seen in the adult HD patient population32. We have taken advantage of this bias to design and validate a novel method to assess mean CAG repeat length at the protein level. This method could become a new benchmark to complement the PCR method, for detection of somatic expansion of unstable CAG repeats at the protein level.

Results

PolyQ length in mHTT affects its quantification by MSD assay using polyQ targeting Abs

The effect of polyQ length on the detection of mHTT by MSD assay was evaluated with a series of purified GST-FLAG-HTTexon1 fusion proteins containing polyQ lengths from Q19 to Q72 (Supplementary Fig. S1a). MSD is a method similar to ELISA except that electrochemiluminescence is used as the detection readout: electricity is applied to the plate electrodes, leading to light emission by electrochemiluminescent labels that are conjugated to detection antibodies. The monoclonal rabbit capture Ab EPR5526 was paired with different mouse monoclonal polyQ targeting detection Abs MW1, 1C2 and 3B5H10 for mHTT assays (Fig. 1a). A rigorous protocol was developed to achieve the most accurate protein concentrations of the GST-FLAG-HTTexon1 proteins used in the assay (see Methods, Supplementary Figs. S1, S2 and Supplementary Table S1). Results showed that the intensity of MSD signal obtained with MW1 detection Ab increased with increasing polyQ length (Fig. 1b), confirming previously published results. In contrast, the MSD signal intensity seen with the mouse monoclonal MAB5492 detection Ab, a non-polyQ targeting Ab36 (Fig. 1a), was solely dependent on protein concentration (Fig. 1c). When used with biological samples, the Ab pair EPR5526-MAB5492 allows detection of total HTT (WT and mutant). When the slopes of the standard curves in the linear dynamic range obtained with MW1 were normalized by the slopes of the standard curves in the linear dynamic range obtained with MAB5492 (see Supplementary Data Set 1 for method details), corresponding to an mHTT/Total HTT assay, a strong polyQ length correlation was observed (R2 = 0.9971; Fig. 1d). Similar correlations were obtained with 1C2 and 3B5H10 detection Abs (R2 > 0.98; Supplementary Fig. S3).
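The slope normalization described above can be illustrated with a minimal Python sketch. It assumes ordinary least-squares slopes fitted over the linear dynamic range; the function names are hypothetical and the numbers are synthetic placeholders, not data from this study (the actual procedure is given in Supplementary Data Set 1).

```python
from statistics import mean

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def slope_ratio(conc, signal_polyq_ab, signal_total_ab):
    """Slope of the polyQ-targeting assay divided by slope of the total HTT assay."""
    return ols_slope(conc, signal_polyq_ab) / ols_slope(conc, signal_total_ab)

# Synthetic standard curves for one polyQ length (illustrative values only).
conc = [1.0, 2.0, 4.0, 8.0]               # protein concentration, arbitrary units
mw1_signal = [30.0, 60.0, 120.0, 240.0]   # polyQ-targeting detection Ab
mab_signal = [10.0, 20.0, 40.0, 80.0]     # non-polyQ-targeting detection Ab

ratio = slope_ratio(conc, mw1_signal, mab_signal)
print(ratio)
```

Repeating this calculation for each polyQ length in the series gives one ratio per protein, which can then be plotted against polyQ length as in Fig. 1d.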

Figure 1

PolyQ length affects GST-FLAG-HTTexon1 quantification by MSD assay using polyQ targeting detection Ab. (a) Diagram shows antibody epitopes in human HTT protein (NCBI reference sequence: NP_002102.4). Calibration curve performance for GST-FLAG-HTTexon1 protein using MW1 (b) and MAB5492 (c) detection Abs. Curves were fitted with a four-parameter logistic regression model with 1/Y² weighting. Mean values ± SD (1 σ) of duplicates of a single experiment are shown. (d) Plot of ratio of the slopes determined from standard curves in the linear dynamic range for mHTT assay by total HTT assay as a function of polyQ length exhibits a strong correlation. Mean values ± propagated SD (1 σ) of duplicates of a single experiment are shown. (e) Using the polyQ length-dependent correlations shown in (d), MSD signal fold increase as a function of polyQ length at constant amount of mHTT protein was extrapolated for mHTT assay. mHTT signal predicted for GST-FLAG-HTTexon1 proteins from Q38 to Q62 was normalized by the MSD signal for GST-FLAG-HTTexon1-Q38. PolyQ lengths ranging from Q38 to Q62 correspond to the polyQ length range seen in adult HD patients. GST: glutathione S-transferase; N17: HTT first 17 aa; PRD: proline-rich domain.

To quantify the polyQ length-dependent change in signal observed with MW1 for polyQ lengths in the range of adult HD patients, we extrapolated, from the correlation in Fig. 1d, the mHTT signal fold increase for each additional glutamine residue in GST-FLAG-HTTexon1 protein at constant protein concentration. To this end, the mHTT signal predicted for GST-FLAG-HTTexon1 proteins from Q38 to Q62 was normalized by the MSD signal for GST-FLAG-HTTexon1-Q38. Results showed that the predicted mHTT signal with MW1 doubled with the addition of only 7 glutamine residues (Fig. 1e). These results suggest that the polyQ length-dependent bias has a significant effect on mHTT detection, even for CAG repeats in the HTT gene in the pathological range of most HD patients. The other polyQ targeting Abs, 1C2 and 3B5H10, also exhibited a polyQ length-dependent bias but to a much lower extent than MW1 (Supplementary Fig. S3).
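The extrapolation of fold increase relative to Q38 can be sketched as follows. This is only an illustration: it assumes, hypothetically, that the slope ratio is a linear function of polyQ length, and the coefficients below are placeholders chosen so that the predicted signal doubles between Q38 and Q45, in line with the doubling over 7 glutamines reported above. They are not the fitted values from Fig. 1d.

```python
def fold_increase(q, slope, intercept, q_ref=38):
    """Predicted signal at polyQ length q relative to the reference length q_ref,
    at constant protein concentration, under an assumed linear calibration."""
    return (slope * q + intercept) / (slope * q_ref + intercept)

# Hypothetical calibration coefficients (illustrative only).
SLOPE, INTERCEPT = 1.0, -31.0

# Fold increase across the adult-onset range used in the text (Q38 to Q62).
folds = {q: fold_increase(q, SLOPE, INTERCEPT) for q in range(38, 63)}
print(folds[45])
```

Under a linear calibration the fold increase per added glutamine is not constant, so a statement such as "doubling with 7 glutamines" applies to a specific starting length (here Q38) rather than to every 7-residue step.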

We next tested whether the polyQ length-dependent bias with MW1 detection Ab could be observed with the full length endogenous HTT protein using homogenates from striatum of 6-month-old heterozygous HD-KI mice bearing different CAG repeat lengths in the HTT gene. Initially, MSD signal for mHTT was not observed to be polyQ length-dependent (Supplementary Fig. S4a). However, analysis of samples by western blot (WB) revealed a decreased amount of mHTT with increased polyQ length at a constant amount of total protein (Supplementary Fig. S4b). Normalization of MSD signal by the amount of mHTT quantified by WB confirmed the polyQ length-dependent correlation with MW1 detection Ab and full length endogenous HTT (R2 > 0.99; Fig. 2). It is remarkable that this correlation is so similar to the one seen with purified GST-FLAG-HTTexon1, which was obtained using another method of normalization, demonstrating the robustness of our finding. A similar polyQ length correlation was observed independently of the capture Ab used (monoclonal rabbit EPR5526, targeting N-terminus of endogenous HTT protein or monoclonal rabbit D7F7, targeting middle region; Fig. 1a), confirming that only the avidity of MW1 detection Ab is involved (Fig. 2). Most strikingly, polyQ length-dependent bias for full length endogenous HTT was observed for a very large polyQ length range (from Q44 to Q188). Altogether, these observations show an inherent bias in mHTT detection by sandwich ELISA-based assays, which can be quantified and thus corrected.

Figure 2

PolyQ length-dependent effect on mHTT detection is also observed with full length mHTT from HD-KI mice. Homogenates from striatum of 6-month-old HD-KI mice with 50, 80, 111, 140 and 175 CAG repeats were analyzed for detection of mHTT with two different capture Abs (EPR5526 and D7F7) and MW1 detection Ab. MSD signals were normalized by the amount of mHTT quantified by WB as shown in Supplementary Fig. S4. Mean values ± SD (1 σ) of n = 3 mice per group are shown.

A novel method to evaluate polyQ length expansion in mHTT containing tissues using MSD assay

We hypothesized that we could take advantage of the polyQ length-dependent bias observed in mHTT detection by MSD assay to design a novel method for quantification of average polyQ length in a biological sample, such as tissue lysates or human biofluids (Fig. 3). In essence, we asked whether CAG repeat instability could be assessed at the protein level. The premises were 1) that HTT protein exhibits a mosaicism of polyQ lengths in biological tissue prone to CAG repeat instability37,38,39 and 2) that a population of HTT proteins with different polyQ lengths results in a detected signal similar to that of a single HTT protein with a polyQ length corresponding to the average polyQ length of the population. Briefly, the sample is analyzed twice by MSD assay: first, with a non-polyQ targeting detection Ab such as MAB5492 that allows quantification of total HTT (WT and mutant form; Fig. 3a,b), then with a polyQ targeting detection Ab that allows quantification of mHTT (Fig. 3c). The signal obtained in the linear dynamic range with the polyQ targeting detection Ab for a determined HTT concentration can be used to estimate the average polyQ length by a mathematical model (Fig. 3d and Methods). Even if polyQ-targeting Abs preferentially bind expanded polyQ tracts, they also interact, to a lower extent, with WT HTT. Similarly, Abs that do not target the polyQ tract interact with both WT and mHTT. Thus, our method, which relies on quantification of both WT and mHTT, provides information on the average polyQ length in total HTT proteins.
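The second premise and the inversion step can be made concrete with a small Python sketch. It assumes, for illustration only, a linear calibration between the signal ratio (polyQ-targeting assay over total HTT assay) and polyQ length; the coefficients, the function names and the mixture composition are hypothetical, and the model actually used is the one described in Methods.

```python
def mixture_ratio(fractions_by_q, slope, intercept):
    """Signal ratio of a mixture, taken as the fraction-weighted mean of the
    ratios of its components under the assumed linear calibration."""
    total = sum(fractions_by_q.values())
    weighted = sum(f * (slope * q + intercept) for q, f in fractions_by_q.items())
    return weighted / total

def q_from_ratio(ratio, slope, intercept):
    """Invert the assumed linear calibration to recover an average polyQ length."""
    return (ratio - intercept) / slope

# Hypothetical calibration coefficients and an equimolar Q38/Q58 mixture.
SLOPE, INTERCEPT = 0.05, -1.0
mix = {38: 0.5, 58: 0.5}

estimated_q = q_from_ratio(mixture_ratio(mix, SLOPE, INTERCEPT), SLOPE, INTERCEPT)
print(estimated_q)
```

In this sketch the recovered value equals the arithmetic mean polyQ length of the mixture only because the assumed calibration is linear; with a curved calibration the same inversion would return a biased average.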

Figure 3

Method for HTT polyQ length quantification. HTT proteins exhibit a mosaicism of polyQ lengths in biological tissue prone to CAG repeat instability. To quantify average polyQ length in HTT proteins, the biological sample is quantified twice by sandwich ELISA-based assay with two pairs of Abs: one that includes a detection Ab that does not target the polyQ tract (a) to quantify total HTT (b) and another one that has a polyQ targeting detection Ab (c). This information is used in a mathematical model to determine the average polyQ length in HTT proteins (d) when samples are tested in the linear dynamic range.

To mimic in vitro polyQ length mosaicism in HTT protein from biological tissue prone to CAG instability, different amounts of GST-FLAG-HTTexon1 proteins with variable polyQ lengths were mixed (Tables 1–3). Using polyQ targeting detection Ab MW1 and a mix of GST-FLAG-HTTexon1 proteins with an average polyQ length of 48 residues (avgQ48a), there was a similar MSD dose response to the standard curve obtained with pure GST-FLAG-HTTexon1-Q48 (Fig. 4a), indicating that the same average polyQ length could be determined at different concentrations of total GST-FLAG-HTTexon1 protein. The graph in Fig. 4a also displays results obtained with pure GST-FLAG-HTTexon1-Q38 and -Q72 for comparison. The relative error of the experimentally determined average polyQ lengths did not exceed 13%, with the highest relative error at the lowest concentration tested (Table 1). We then generated the same average polyQ length by different protein mixings of GST-FLAG-HTTexon1 (avgQ48a, avgQ48b and avgQ48c; Table 2). The average polyQ length experimentally determined at a single concentration was constant for a similar average polyQ length obtained by using different protein mixings (Fig. 4b and Table 2), highlighting the robustness of our method. Finally, we generated 9 different average polyQ lengths from avgQ38 to avgQ58 with 2.5Q increments (Table 3). The different average polyQ lengths determined experimentally at a single concentration exhibit a strong linear correlation with theoretical average polyQ lengths (R2 = 0.9829; Fig. 4c and Table 3). Intra-batch accuracy and precision for average polyQ length quantification were within 13% relative error and 4% coefficient of variation, respectively, for all conditions tested (Tables 1–3). Results obtained with other polyQ targeting Abs 1C2 and 3B5H10 were similar but with a lower accuracy (Supplementary Fig. S5 and Supplementary Tables S2–7). Altogether, these data validate the ability of our method to estimate the average polyQ length in a mix of HTT proteins with variable polyQ lengths, with the Ab pairs EPR5526-MW1 (for mHTT) and EPR5526-MAB5492 (for total HTT) being superior in accuracy.
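The accuracy and precision metrics quoted above (relative error and coefficient of variation) can be computed as in the following sketch. It assumes the usual definitions, with the sample standard deviation used for the coefficient of variation; the replicate values are synthetic placeholders rather than entries from Tables 1–3, and the function names are hypothetical.

```python
from statistics import mean, stdev

def relative_error_pct(measured, theoretical):
    """Accuracy: absolute deviation from the theoretical value, in percent."""
    return 100.0 * abs(measured - theoretical) / theoretical

def cv_pct(replicates):
    """Precision: sample standard deviation over the mean, in percent."""
    return 100.0 * stdev(replicates) / mean(replicates)

# Synthetic duplicate determinations of an avgQ48 mix (illustrative only).
theoretical_q = 48.0
replicates = [47.0, 49.0]

accuracy = relative_error_pct(mean(replicates), theoretical_q)
precision = cv_pct(replicates)
print(accuracy, precision)
```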

Table 1 Quantification of the same average polyQ length (avgQ48a) at different protein concentrations with MW1 detection Ab.
Table 2 Quantification of the same average polyQ length obtained by different protein mixings with MW1 detection Ab.
Table 3 Quantification of different average polyQ lengths with MW1 detection Ab.
Figure 4

Pre-validation of method for average polyQ length quantification using MW1 and MAB5492 detection Abs: parallelism, dilution linearity, accuracy and robustness evaluation. (a) Serial dilution of a mix of GST-FLAG-HTTexon1 proteins with an average polyQ length of 48 residues (avgQ48a) gives similar results to a pure GST-FLAG-HTTexon1-Q48 using MW1 detection Ab. Serial dilution of pure GST-FLAG-HTTexon1-Q38 and -Q72 are also displayed for comparison. Mean values ± SD (1 σ) of duplicates of a single experiment are shown. (b) Average polyQ length experimentally determined for a similar average polyQ length of 48 residues done by different protein mixings. Dashed gray line corresponds to the theoretical average polyQ length. (c) Different average polyQ lengths experimentally determined are plotted as a function of theoretical average polyQ length. Dashed gray line corresponds to the perfect correlation between experimental and theoretical average polyQ lengths. Mean values ± propagated SD (1 σ) of duplicates of a single experiment are shown.

Average polyQ length at the protein level correlates with average CAG repeat length at the DNA level in postmortem brain of HD mouse and HD patients

To establish whether our assay is suitable to measure the average polyQ length in endogenous HTT proteins from brain tissue, we examined striatum from homozygous HdhQ140 KI mice from different litters and of different ages (from 3.3 to 13 mo). Since this mouse model was previously shown to exhibit intergenerational CAG repeat changes40, we expected to detect a variation in average polyQ length in HTT between animals. To test this idea, data obtained by MSD assay were compared with the extent of CAG repeat instability measured in gDNA from the contralateral striatum using a PCR method adapted from Lee et al.41 (see Methods for details). The MSD signal was normalized using the MSD signal ratio (MSD of mHTT/MSD of Total HTT; plotted on the y-axis in the figure). The MSD signal for total HTT is solely dependent on protein concentration and does not depend on polyQ length. Results showed that MSD signal ratios (EPR5526-MW1/EPR5526-MAB5492) obtained from striatum of HD mice exhibited a strong correlation with average CAG repeat length determined by PCR (R2 = 0.7929; Fig. 5a). Of note, the average CAG repeat length was determined from the contralateral striatum, which may have introduced some variation and could explain, at least in part, some outliers. Unfortunately, we could not interpolate the average polyQ length from these data because 1) the recombinant GST-FLAG-HTTexon1 proteins used as standards do not bear sufficiently long polyQ tracts and 2) it was reported that the same concentrations of full length and truncated HTT proteins with similar polyQ lengths are detected with widely different intensities31. Even though we showed a polyQ length correlation with full length endogenous mHTT (Fig. 2), the anchor points of this correlation are probably different from those obtained with GST-FLAG-HTTexon1.
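The comparison between MSD signal ratios and PCR-derived CAG repeat lengths amounts to computing a per-sample ratio and a coefficient of determination. A minimal sketch follows, assuming R2 is the squared Pearson correlation of a simple linear regression; the per-animal values are synthetic and perfectly linear by construction, so they say nothing about the real data in Fig. 5a.

```python
from statistics import mean

def signal_ratio(mhtt_signal, total_htt_signal):
    """MSD signal ratio: polyQ-targeting assay over total HTT assay."""
    return mhtt_signal / total_htt_signal

def r_squared(x, y):
    """Squared Pearson correlation (equals R2 of a simple linear regression)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

# Synthetic per-animal values (illustrative only).
avg_cag = [100.0, 110.0, 120.0, 130.0]     # average CAG length by PCR
mhtt = [200.0, 300.0, 400.0, 500.0]        # signal with polyQ-targeting Ab
total = [100.0, 100.0, 100.0, 100.0]       # signal with non-polyQ-targeting Ab

ratios = [signal_ratio(m, t) for m, t in zip(mhtt, total)]
r2 = r_squared(avg_cag, ratios)
print(ratios, r2)
```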

Figure 5

MSD signal ratio for mHTT by total HTT, corresponding to the average polyQ length, correlates with average CAG repeat length. (a) Striatal homogenates from 14 homozygous HdhQ140 KI mice of different ages were analyzed by MSD assay for average polyQ length quantification (MSD signal ratio MW1/MAB5492 corresponding to mHTT/Total HTT). Results were plotted as a function of average CAG repeat length determined by PCR method in DNA extracted from the contralateral striatum of each animal (see Methods, Quantification of average CAG repeat length). It is unclear why there is more variability (larger SDs) in raw MSD signals for samples between ~108 and 124 CAG repeats than for other samples. All samples were processed at the same time and in the same manner, so it is likely that variation may be from pipetting. (b) Homogenates prepared from postmortem cortex of HD patients were analyzed by MSD assay for average polyQ length quantification (MSD signal ratio for mHTT by total HTT). Results were plotted as a function of average CAG repeat length determined by PCR method from the same sample lysates (see Methods, Quantification of average CAG repeat length). Light blue sample was below the level of detection (background + 3 SD) for total HTT assay and was not used for correlation. Mean values ± propagated SD (1 σ) of duplicates of a single experiment are shown. Please note that the MSD signal is normalized using the following MSD signal ratio (MSD of mHTT/MSD of Total HTT; plotted on y-axis in the figure). The MSD signal for total HTT is solely dependent on protein concentration and does not depend on polyQ length.

Having established that MSD signal ratios (EPR5526-MW1/EPR5526-MAB5492) for endogenous HTT could be correlated with CAG repeat length in HD mice, we next focused on analysis of human postmortem HD brain. We analyzed lysates from postmortem cortex of 2 adult and 5 juvenile HD (JHD) patients. Protein and DNA analyses were performed on the same lysate for all samples. As it was shown that exon 1 of HTT is produced via incomplete splicing of the HTT pre-mRNA in HD patient tissue42, we used an additional Ab pair (D7F7-MAB2166) for total HTT quantification (WT + mutant form) that does not recognize the truncated form of HTT. The MSD signal ratios (mHTT/Total HTT) displayed a high correlation with average CAG repeat length determined by PCR for both Ab pairs used for normalization (R2 > 0.9; Fig. 5b) and a strong parallelism between them. Among the samples tested for total HTT quantification with EPR5526-MAB5492, two non-affected individuals were not used for correlation because their signals were below the level of detection (data not shown).
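The exclusion criterion used here follows the level of detection given in the Fig. 5 legend (background + 3 SD). A short sketch of that rule is shown below, assuming the background is estimated from blank wells and that the sample standard deviation is used; both are assumptions, and the readings are synthetic.

```python
from statistics import mean, stdev

def level_of_detection(blank_signals):
    """Background mean plus three standard deviations of the blanks."""
    return mean(blank_signals) + 3.0 * stdev(blank_signals)

def is_detectable(sample_signal, blank_signals):
    """True when the sample signal exceeds the level of detection."""
    return sample_signal > level_of_detection(blank_signals)

# Synthetic blank-well readings and two sample signals (illustrative only).
blanks = [100.0, 110.0, 90.0]
lod = level_of_detection(blanks)

print(lod, is_detectable(125.0, blanks), is_detectable(400.0, blanks))
```

Samples failing this check for the total HTT assay cannot contribute a meaningful signal ratio, since the denominator is indistinguishable from background.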

Discussion

Currently, lowering mHTT is a major therapeutic strategy under investigation in many laboratories and in clinical trials for HD patients43,44, therefore accurate quantification using ultra-sensitive immunodetection methods is vital. mHTT can be preferentially distinguished from WT by polyQ targeting Abs23,24,25 sensitive to expanded polyQ repeats containing more epitopes than normal polyQ tracts. The increased avidity of such Abs for longer polyQ tracts was recognized as a potential bias in mHTT quantification19,20,22,31. However, even if the levels of mHTT were associated with inherited CAG repeat length19,33,34,35, polyQ length was considered as a minor contributor compared to mHTT protein concentration20. Previously, a series of purified truncated HTT proteins with different polyQ lengths was detected by TR-FRET assay and MW1 Ab33. The authors reported a 10- to 20-fold higher sensitivity for mHTT than WT HTT. They did not mention that with an increase of 7 glutamines, corresponding to the range of polyQs of HD patients tested in their study, they had a signal increase of ~40% for the same HTT protein concentration (see their Supplementary Fig. S1b). The analysis of large polyQ length series was not evaluated with sandwich-ELISA based methods currently used in clinical trial20,31. Here, we show that MSD signal detected with polyQ targeting Ab MW1 increases and, most of all, strongly correlates with polyQ length in purified N-terminal HTT fragments (Fig. 1d) and in endogenous full length HTT obtained from HD KI mice and human cortex (Figs. 2 and 5). Remarkably, this polyQ length dependent bias is evident for polyQ tracts that are in the range of adult onset HD patients as well as very large polyQ tracts (up to Q188). Our data suggest that even small polyQ length variations could lead to a large inaccuracy in mHTT quantification (Fig. 1e). 
When considering that somatic CAG repeat expansion occurs in HD brain11,12, the inaccuracy of mHTT quantification may be even greater.

Our findings raise questions about the reported increase in mHTT in CSF with disease progression using micro bead-based IP-flow cytometry and SMC assays19,20,35: is it solely due to mHTT concentration or might there be a contribution of CAG repeat instability? This is especially important if we consider that mHTT detected in CSF could preferentially come from dying cells exhibiting a very high level of instability. This issue is further complicated by findings that mHTT increases with disease progression in peripheral blood mononuclear cells (PBMC) but without significant difference in total HTT33,34. Initially, CAG repeat instability was proposed as a possible explanation for progressive increase in mHTT levels with no concomitant differences in total HTT level, but another likely explanation was a progressive accumulation of N-terminal fragments33. The latter explanation is challenged by a recent study showing no variation in N-terminal HTT level at different disease stages in PBMC34. The presence of CAG repeat instability is unlikely to influence the relative quantification of mHTT in current therapeutic silencing studies where a reduction in mHTT is measured as a change from baseline before treatment21, normalizing potential bias due to polyQ length difference between patients. Only CAG instability over the course of the longitudinal study could affect results.

In this study, we exploited the biasing effect of polyQ sensitive Abs in mHTT detection to design a novel method to assess the average polyQ length in HTT in samples where there is a population of HTT proteins with different polyQ lengths as might be expected under conditions of CAG repeat instability (Fig. 3). Our method relies on the normalization of MSD signal detected with polyQ targeting Ab MW1 by the amount of total HTT, corresponding to the MSD signal detected with non-polyQ targeting Ab MAB5492. The method proved to be sensitive, accurate and robust when tested using purified GST-FLAG-HTTexon1 (Fig. 4). Moreover, polyQ length assessment at the protein level strongly correlated with CAG repeat length at the DNA level in postmortem brain lysates from HD mice and patients (Fig. 5). It should be noted that for comparison of MSD signal ratio (mHTT/Total HTT), all samples were tested under the same conditions (same amount of total protein) and the detected signals were in the linear dynamic range of detection. Signal detected in 2 non-affected individuals was in the background signal and could not be used for MSD signal ratio and comparison with samples from HD individuals.

Studies have shown that the level of CAG repeat instability is higher in cortex than in cerebellum10,11. We confirmed this observation at the DNA level with our human sample set (data not shown). However, we were unable to obtain detectable signals for total HTT from cerebellar lysates, preventing the calculation of the MSD signal ratios. HTT protein was previously detected at lower levels in human cerebellum than in cortex of the same HD postmortem brains37,45. In our study, samples of human cerebellum and cortex were taken from the same brains and were processed in the same way and at the same time. Thus, we think the amount of total HTT present in cerebellar tissue is below the level of detection in our assay rather than an issue of the quality of the postmortem tissues or protein lysates.

Our method of determining average polyQ length relies on the correlation between MSD signal ratio (EPR5526-MW1/EPR5526-MAB5492) and polyQ length in HTT proteins. Using a different immuno-assay (AlphaLISA), different polyQ Ab (3B5H10) and non-polyQ (MAB2166 and D7F7) Abs and different materials (cell culture lysate and purified full length HTT), Baldo et al.46 reported that the ratio of mHTT/Total HTT signals increased with polyQ length. However, they did not perform further analysis. Our review of their data showed that similar to our findings, the ratio of their mHTT/Total HTT values shows a strong correlation with polyQ length (from their Fig. 4 and Fig. 5c; data not shown; R2 > 0.99).

Gold standard methods for determining CAG repeat instability involve PCR amplification from “bulk” or multiple small pools of genomic DNA. The negative correlation between CAG repeat length and PCR amplification efficiency represents a significant pitfall for accurate quantification47,48. However, despite a likely underestimation of CAG instability, especially for the bulk method that cannot detect the rare large expansions, the results obtained with these two methods exhibit a strong correlation41. Data obtained by bulk PCR in our studies exhibited a strong correlation with detected average polyQ length in HTT. Thus, we present a new complementary method to PCR for evaluating average instability at the protein level. Though less informative than PCR because it provides only average polyQ lengths without size distribution, it may allow an evaluation of expansion in tissues where HTT proteins, but not HTT gene, can be detected (e.g. CSF).

Our method of predicting HTT protein average instability relies on quantification of both WT and mHTT; both alleles must be expressed equally to correlate with CAG repeat length. We have observed that the level of mHTT decreases with polyQ length in HD-KI mouse models (Supplementary Fig. S4b). However, in humans, some western blot studies have observed an increased level of mHTT compared to WT in both adult and JHD brains37 or a lower amount of mHTT than WT solely in JHD brains38 and fibroblasts38,49. These inconsistent results could be due to a variety of factors including small sample size, the type of sample (brain or cell lines), the extent of separation of WT and mHTT or broader migration of mHTT in SDS-PAGE, probably due to CAG repeat instability and polyQ length mosaicism. Recently, a novel mass spectrometry-based method was developed to quantify allele specific HTT protein levels using polymorphic variants50. In the 28 adult HD subject-derived lymphoblast cell lines tested, levels of mHTT protein were highly associated with levels of WT HTT and were not correlated with the expanded CAG repeat size. These results argue against the idea that there is a potential effect of CAG repeat length on HTT protein level, at least in the adult onset range. Although the impact of CAG repeat size on HTT expression levels in human brains remains largely unresolved, especially for JHD, our data, showing that polyQ length quantification significantly correlates with CAG repeat size, argue that WT and mHTT levels are equal.

Our method relies on immunodetection of HTT proteins and therefore is subject to technical issues common to this approach, such as matrix influences or interfering substances. A fragment of HTT protein, corresponding to the 1–573 N-terminal aa, was reported to produce a higher signal than the full length HTT protein at comparable concentrations31. Fodale et al. consider results as a best estimate rather than absolute for mHTT quantification31. We were unable to obtain a series of stable purified full length HTT proteins with increasing polyQ lengths to compare to the results obtained with GST-FLAG-HTTexon1. The presence of HTT fragments has been reported in HD brain51,52,53. Additionally, flanking regions of the polyQ tract, which were sites used for total HTT detection in our assay, may be affected by polyQ length as described by others54,55 and may introduce a bias when determining Total HTT. It is noteworthy that the MSD signal ratios (mHTT/Total HTT) obtained from human cortical lysates with 2 different Ab pairs (targeting flanking polyQ regions and more C-terminal domains in HTT) displayed a high correlation with average CAG repeat length and a strong parallelism between them (Fig. 5b), suggesting that the contribution of truncated forms of HTT and the impact of polyQ length on flanking regions, if any, are negligible.

We have shown that the MSD signal ratio (EPR5526-MW1/EPR5526-MAB5492) followed a simple polyQ length correlation in the linear dynamic range of our assay (Fig. 1d). We analyzed HD brain samples in this range of detection. The constraint of the linear dynamic range could be a problem for polyQ assessment in samples with very low concentrations of HTT. However, results obtained with GST-FLAG-HTTexon1 showed that the parameters of the four-parameter logistic regression are either constant (Bottom and HillSlope) or strongly polyQ length dependent (Top and EC50) (Supplementary Fig. S6), allowing us to predict a regression curve for any polyQ length as illustrated in Fig. 3. Such an improved model for polyQ length assessment should overcome the limitation of our current study.
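The idea of predicting a full regression curve for any polyQ length can be sketched as follows. The four-parameter logistic function is written in its common form; Bottom and HillSlope are held constant while Top and EC50 are supplied as functions of polyQ length. Those two functions and all numerical values are hypothetical placeholders, since the fitted dependencies are only reported in Supplementary Fig. S6.

```python
def four_pl(x, bottom, top, ec50, hill):
    """Four-parameter logistic response at concentration x (x > 0)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)

def predicted_curve(q, concentrations, bottom, hill, top_of_q, ec50_of_q):
    """Predicted signal at each concentration for a given polyQ length q,
    with Top and EC50 derived from q and Bottom/HillSlope kept constant."""
    top, ec50 = top_of_q(q), ec50_of_q(q)
    return [four_pl(c, bottom, top, ec50, hill) for c in concentrations]

# Hypothetical polyQ dependencies (illustrative only, not fitted values).
def top_of_q(q):
    return 1000.0 * q

def ec50_of_q(q):
    return 500.0 / q

curve = predicted_curve(50, [1.0, 10.0, 100.0], bottom=100.0, hill=1.0,
                        top_of_q=top_of_q, ec50_of_q=ec50_of_q)
print(curve)
```

With such a model, a measured signal outside the linear dynamic range could in principle be compared against the family of predicted curves rather than against a single linear calibration.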

Genome-wide association studies identified potential genetic modifiers involved in CAG repeat instability15,17, opening an area for future therapeutic intervention. Our study represents a proof of principle for CAG repeat quantification at the protein level and paves the way for further studies. Our method relies on the detection of mHTT and Total HTT, both of which have been detected and quantified in patient CSF using the SMC assay19,20; our assay thus potentially represents a way to study indirectly the extent of CAG repeat instability in vivo in the patient’s central nervous system. The lower limit of quantification of our MSD sandwich ELISA-based assay for mHTT (picomolar range) is too high for quantification of mHTT in clinical CSF samples from HD patients; the SMC assay is required to reach femtomolar sensitivity20,31,35. Quantification of average CAG instability by our method, once adapted to the SMC assay, could more accurately predict age of disease onset12 and be used in future clinical approaches that aim to reduce CAG repeat instability14,56,57.

Methods

Cloning

Plasmid vectors pGEX-6P-1 coding for GST-FLAG-HTTexon1 proteins with Q32, Q44 and Q55 were kindly provided by Erich Wanker58. DNA fragment coding for HTTexon 1 proteins with Q19 was available in-house59. DNA fragment coding for HTTexon 1 proteins with Q38 was a gift from Pamela Bjorkman26 (Addgene plasmid #11514). DNA fragments coding for HTTexon 1 proteins with Q25 and Q72 were kindly provided by Boxun Lu60. DNA fragment coding for HTTexon 1 proteins with Q48 was obtained by PCR from HD cell line (National Institute of Neurological Disorders and Stroke Repository at the Coriell Institute for Medical Research; catalog no. ND38551). DNA fragments coding for HTTexon1 were subcloned into the EcoRI and EagI sites of pGEX-6P-1 vector (GE Healthcare Life Sciences), in frame with the GST-FLAG sequence. The coding regions of all vectors were verified by DNA sequencing.

Protein production

GST-FLAG-HTTexon1 proteins were produced in E. coli BL21(DE3)pLysS competent cells (Thermo Fisher Scientific) grown at 16 °C in Lenox L broth base (Thermo Fisher Scientific) supplemented with ampicillin (100 μg/mL). For all proteins, production was performed in 2 L flasks containing 400 mL of culture medium under constant agitation (220 rpm). Protein production was induced by adding IPTG (200 μM) when the optical density at 600 nm reached ~0.7. Bacteria were cultured overnight post-induction (~20 h), harvested by centrifugation and kept frozen (−20 °C).

Protein purification

All purification steps were done on ice or at 4 °C. Lysis Buffer = Tris (10 mM; pH 8), NaCl (50 mM), KCl (50 mM), glycerol (10%). Elution Buffer = Tris (10 mM; pH 8), NaCl (150 mM), reduced glutathione (50 mM). Dialysis Buffer = phosphate (50 mM; pH 7.4). Bacterial pellets were thawed and suspended in 15 mL of lysis buffer containing lysozyme (10 mg/L), DTT (1 mM) and complete EDTA-free protease inhibitor cocktail (Roche). Bacteria were lysed by sonication for 2.5 min as follows: 3 s “on”, 10 s “off” using a Sonic dismembrator model 500 set at 40% with a 1/8” probe (Thermo Fisher Scientific). After centrifugation at 14,000 g for 1 h, the soluble bacterial extract was loaded by gravity flow on 400 μL of Glutathione Sepharose 4B affinity chromatography resin (GE Healthcare Life Sciences) in a Poly-Prep chromatography column (Bio-Rad). The resin was then washed with 10 volumes of lysis buffer, the first 5 volumes containing Triton detergent (0.5%) to improve the release of nonspecifically bound bacterial material. Finally, GST-FLAG-HTTexon1 proteins were sequentially eluted once with 100 μL then 5 times with 200 μL of elution buffer. Protein-containing eluates (usually eluates 2 to 4) were diafiltered through 5 wash steps with dialysis buffer using an Amicon Ultra-0.5 Centrifugal Filter Unit with Ultracel-3 membrane (MilliporeSigma). To avoid unnecessary losses upon freezing/thawing, protein stock concentrations were adjusted by dilution in dialysis buffer and stocks were stored at concentrations ranging from ~65 to ~100 μM. Comparison of concentrations before and after freezing/thawing showed negligible losses (<6%). To remove potential aggregates generated by the freezing/thawing process, thawed protein samples were centrifuged at 16,000 g and 4 °C for 5 min and the supernatant was collected. This centrifugation and supernatant collection step was performed twice. When proteins were used below 10 μM, bovine serum albumin (MilliporeSigma; #A2153) was added at 2 mg/mL to limit protein adsorption on pipette tips.

Determination of purified protein concentration

Protein concentration was measured using its specific molar attenuation coefficient, after absorption spectrum scanning between 220 and 350 nm with a DS-11 spectrophotometer (Denovix). The molar attenuation coefficient was computed with the ProtParam tool on the ExPASy bioinformatics resource portal61. Purity of full-length GST-FLAG-HTTexon1 proteins ranged from 67 to 90% depending on protein batch: a protein of the same size (~28 kDa) copurified with all proteins produced (Supplementary Fig. S1a), in a proportion that depends on pure CAG repeat length (Supplementary Fig. S1b). This product corresponded to the molecular mass of GST-FLAG and was detected with EPR5526 (anti-hHTT aa 1–100) but not with MW1 Ab by western blotting (Supplementary Fig. S1c). Altogether, these data suggest that 1) EPR5526 Ab targets the first 17 aa of HTT (N17), located N-terminally to the polyQ tract; 2) the 28 kDa species is composed of GST-FLAG-N17 and 3) only GST-FLAG-HTTexon1 protein can be detected by the Ab pairs used for the MSD assay. Quantification of protein concentration by absorbance at 280 nm, which measures absorbance of both GST-FLAG-HTTexon1 and GST-FLAG-N17 in solution, gave different results from Coomassie blue staining, which allowed a relative quantification of GST-FLAG-HTTexon1 (Supplementary Fig. S1d). To adjust the protein concentration of GST-FLAG-HTTexon1 estimated by absorbance at 280 nm, correction factors for each batch of GST-FLAG-HTTexon1 protein were estimated based on relative quantification after SDS-PAGE (Supplementary Table S1). The same amount of GST-FLAG-HTTexon1 protein (according to absorbance at 280 nm) was fluorescently labeled with Amersham QuickStain Protein Labeling Kit (GE Healthcare) prior to SDS-PAGE. For each protein sample, the fluorescent signal corresponding to GST-FLAG-HTTexon1 was quantified and normalized by the average fluorescent signal of all GST-FLAG-HTTexon1 proteins with different polyQ lengths.
To validate these correction factors experimentally, we used the MSD platform for detection of GST-FLAG-HTTexon1 with EPR5526 capture Ab and MAB5492 detection Ab (Supplementary Fig. S2). Without correction of the protein concentrations estimated by absorbance at 280 nm, the slopes of the standard curves varied (Supplementary Fig. S2a), whereas after correction all standard curves overlapped, as expected (Supplementary Fig. S2b).
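
The normalization described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used to build Supplementary Table S1: the function names and the band intensities are hypothetical, and applying the factor multiplicatively to the concentration estimated at 280 nm is an assumption.

```python
from statistics import mean


def correction_factors(band_signals):
    """Normalize each construct's band signal by the mean signal of all constructs."""
    average = mean(band_signals.values())
    return {name: signal / average for name, signal in band_signals.items()}


def corrected_concentration(a280_concentration, factor):
    # Assumption: the factor is applied multiplicatively to the A280 estimate.
    return a280_concentration * factor


# Hypothetical in-gel fluorescence values (arbitrary units), not data from this study.
signals = {"Q19": 900.0, "Q44": 1000.0, "Q72": 1100.0}
factors = correction_factors(signals)

# Hypothetical stock estimated at 80 µM by absorbance at 280 nm.
corrected_q19 = corrected_concentration(80.0, factors["Q19"])
```

By construction the factors average to 1, so the correction redistributes concentration between batches without changing their overall mean.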

Mouse and HD patient-derived material

Mice with human exon1 KI within the endogenous mouse HTT gene—HdhQ5062, HdhQ8063, HdhQ11164, HdhQ14065 and zQ17566—with the same strain background of C57BL/6 were obtained from The Jackson Laboratory and HdhQ140 were bred and maintained at the MGH animal facility. Although named “Qn”, the average CAG repeat length of these mice is slightly different due to instability at the locus. Mice were anaesthetized with CO2 followed by cervical dislocation. The brain was rapidly removed, snap frozen using dry ice and stored at −80 °C for further use. After brain thawing to 4 °C, striatal tissues were dissected on ice, rapidly frozen using dry ice then stored at −80 °C for further use.

Human brain tissue was obtained from the Brain Tissue Resource Center (Belmont, MA), the University of Massachusetts, Department of Neuropathology and the Massachusetts General Hospital Neuropharmacology Laboratory Brain Bank. All tissue was quickly frozen and stored at −80 °C until further analysis. The time between death and brain dissection was variable but was always between 4 and 48 h. The dissections of neocortex and cerebellar cortex were performed to exclude the underlying white matter as much as possible.

Striatal and cerebellar samples were homogenized in buffer composed of HEPES (10 mM pH 7.4), sucrose (250 mM), EDTA (1 mM), complete EDTA-free protease inhibitor cocktail (Roche) and for some samples NaF (1 mM), Na3VO4 (1 mM) and were sonicated 10 s using Sonic dismembrator model 500 set at 20% and 1/8” probe (Thermo Fisher Scientific). Samples were then centrifuged at 16,000 g at 4 °C for 15 min. Supernatant was collected, aliquoted and stored at −80 °C for further use.

Bradford assay

Protein Assay dye reagent (Bio-Rad) was diluted with 4 volumes of distilled, deionized water. One volume of biological sample (diluted 10- and 20-fold) or bovine serum albumin protein standard was mixed with 50 volumes of diluted dye reagent and 200 µL was loaded into a 96-well plate (Thermo Fisher Scientific, #269620). Optical density at 600 nm was recorded with a Victor2 Multilabel plate reader (PerkinElmer). Total protein concentration was interpolated from a bovine serum albumin standard curve made of 11 points obtained by 2-fold serial dilution starting from 100 mg/mL. Protein concentration was expressed as mg/mL.
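
As an illustration of the standard curve layout and of the read-back of a sample concentration, a minimal Python sketch is given below. It assumes that the 100 mg/mL standard is the first of the 11 points and that interpolation is piecewise linear between neighboring standards; neither detail is specified above, and the function names and the optical densities used in the example are hypothetical.

```python
def dilution_series(top, fold, points):
    """Concentrations of a serial dilution, starting at `top` (assumed first point)."""
    return [top / fold ** i for i in range(points)]


def interpolate_concentration(od, standards):
    """Piecewise-linear interpolation on (concentration, optical density) pairs."""
    pts = sorted(standards, key=lambda p: p[1])
    for (c0, y0), (c1, y1) in zip(pts, pts[1:]):
        if y0 <= od <= y1:
            if y1 == y0:
                return c0
            return c0 + (od - y0) * (c1 - c0) / (y1 - y0)
    raise ValueError("optical density outside the standard curve")


bsa_standards = dilution_series(100.0, 2, 11)  # mg/mL

# Hypothetical three-point curve used only to show the read-back.
toy_curve = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
diluted = interpolate_concentration(0.75, toy_curve)
undiluted = diluted * 10  # sample measured after a 10-fold dilution
```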

SDS-PAGE electrophoresis

Three volumes of protein samples were mixed with one volume of NuPAGE™ LDS Sample Buffer 4 x (Thermo Fisher Scientific) and denatured by heating at 70 °C for 10 min. After a brief centrifugation step (1 min at 17,000 g), the same amount of denatured proteins was loaded on NuPAGE gel (Thermo Fisher Scientific) and separated by electrophoresis. Protein amounts were adjusted to optimize the different readouts: 90 pmol of GST-FLAG-HTTexon1 proteins for Coomassie blue staining; 3.75 pmol of GST-FLAG-HTTexon1 proteins for fluorescent detection; 15.6 pmol of GST-FLAG-HTTexon1 proteins for WB; 10 μg of total protein of mouse derived material. Purified proteins were migrated through 4–12% Bis-Tris gels with MES running buffer (Thermo Fisher Scientific) at 200 V constant and mouse brain lysates through 3–8% Tris-Acetate gels with Tris-Acetate running buffer (Thermo Fisher Scientific) at 120 V constant until suitable separation. Coomassie blue staining was done with PhastGel Blue R-350 (GE Healthcare Life Sciences) as recommended by the manufacturer and detected using white transillumination light of a FluorChem SP Imager (Alpha Innotech). Fluorescent labeling was done with Amersham QuickStain Protein Labeling kit (GE Healthcare Life Sciences) as follows: 1 μM of purified proteins were labeled with 0.25 μL of Cy5 dye reagent in a final volume of 12 μL of phosphate buffer 50 mM pH7.4 and incubated 30 min at RT. In-gel fluorescent detection was done using Odyssey Imaging System (LI-COR). Quantification was done using ImageJ software67.

Antibodies

All antibodies used in this study are commercially available: EPR5526, rabbit monoclonal anti-HTT (Abcam Cat# ab109115, RRID:AB_10863082); MW1, mouse monoclonal anti-polyQ (DSHB Cat# mw1, RRID:AB_528290); 1C2, mouse monoclonal anti-polyQ (Millipore Cat# MAB1574, RRID:AB_94263); 3B5H10, mouse monoclonal anti-polyQ (Sigma-Aldrich Cat# P1874, RRID:AB_532270); MAB5492, mouse monoclonal anti-HTT (Millipore Cat# MAB5492, RRID:AB_347723); D7F7, rabbit monoclonal anti-HTT (Cell Signaling Technology Cat# 5656, RRID:AB_10827977); MAB2166, mouse monoclonal anti-HTT (Millipore Cat# MAB2166, RRID:AB_2123255); peroxidase-conjugated AffiniPure, donkey polyclonal anti-rabbit IgG (Jackson ImmunoResearch Labs Cat# 711–035–152, RRID:AB_10015282); peroxidase-conjugated AffiniPure, donkey polyclonal anti-mouse IgG (Jackson ImmunoResearch Labs Cat# 715–035–150, RRID:AB_2340770) and Sulfo-Tag labeled, goat polyclonal anti-mouse IgG (Meso Scale Discovery, Cat# R32AC, RRID:AB_2783819).

Western blotting

Transfer to a nitrocellulose membrane was done using Trans-Blot® Turbo™ Transfer System according to manufacturer’s instructions. Membrane was then blocked with PBS-Tween 0.1% + 5% of Blotting-Grade Blocker (Bio-Rad) for 1 h on a rocking shaker. Membrane was then incubated overnight at 4 °C on a rocking shaker with 10 mL of EPR5526 (133.7 ng/mL; 1/10,000); MW1 (27.5 ng/mL; 1/10,000) or D7F7 Ab (1/2,000) in PBS-Tween 0.1% + 5% of Blotting-Grade Blocker. Membrane was then washed 3 × 10 min with 10 mL of PBS-Tween 0.1%. Membrane was then incubated for 1 h at RT on an orbital shaker with 10 mL of Peroxidase-conjugated AffiniPure Donkey anti-Mouse (1/5,000) or Peroxidase-conjugated AffiniPure Donkey anti-Rabbit IgG (1/2,500 or 1/5,000) in PBS-Tween 0.1% + 5% of Blotting-Grade Blocker. Membrane was then washed 3 × 10 min with 10 mL of PBS-Tween 0.1%. Acquisition was done using SuperSignal West Pico Chemiluminescent Substrate (Thermo Fisher Scientific) according to manufacturer’s instructions and FluorChem SP Imager (Alpha Innotech).

MSD assay

Multi-Array 96-well standard plates (MSD) were coated overnight at 4 °C on a flat surface with 30 µL of D7F7 or EPR5526 capture Ab (2 µg/mL) in PBS pH 7.4 (Thermo Fisher Scientific). Plates were emptied and blocked with 150 µL of 3% bovine serum albumin (BSA) in PBS-Tween 0.05% pH 7.4 for 2 h at room temperature and 1,000 rpm on an orbital microplate shaker (Scientific Industries). After 3 washes with 150 µL of washing buffer (PBS-Tween 0.05% pH 7.4), 30 µL of diluted samples were distributed into plates and incubated 1 h (for purified proteins) or 2 h (for biological samples) at room temperature and 1,000 rpm. The amount of biological material tested was adjusted for each pair of Abs to obtain signal in the linear dynamic range of detection: ~10 μg of total protein of mouse derived material (for D7F7-MW1 and EPR5526-MW1 mHTT assays); ~140 μg of total protein of mouse derived material (for EPR5526-MAB5492 Total HTT assay); 6 μg of total protein of human derived material for EPR5526-MW1 mHTT assays and 50 or 80 μg of total protein of human derived material for D7F7-MAB2166 or EPR5526-MAB5492 Total HTT assays respectively. Plates were then washed 3 times with washing buffer and incubated with 30 µL of detection antibody for 1 h at room temperature and 1,000 rpm. Depending on the type of sample, different concentrations of detection Abs were used for optimal signal-to-noise ratio: MW1 (2 µg/mL); 3B5H10 (2 µg/mL); 1C2 (1:1,000 or 1:2,500); MAB5492 (1:5,000 or 1:20,000) and MAB2166 (1:10,000). After 3 washes with 150 µL of washing buffer, 30 µL of goat anti-mouse SulfoTag secondary Ab (2 µg/mL) were distributed into plates and incubated 1 h at room temperature and 1,000 rpm. After 3 washes with 150 µL of washing buffer, 150 µL of 2X Read Buffer T with surfactant (MSD) were distributed into plates before reading on a QuickPlex SQ120 instrument (MSD) according to manufacturer’s instructions.

Regression analysis of MSD data

Calibration curves of purified proteins were fitted with a four-parameter logistic regression model by the least-squares method with 1/Y² weighting, using Solver, a Microsoft Excel (Office 365) add-in. The four-parameter logistic regression model is:

$$MSD\,Signal=Bottom+\frac{{x}^{HillSlope}\times (Top-Bottom)}{{x}^{HillSlope}+EC{50}^{HillSlope}}$$
(1)

where Bottom and Top are the plateaus of the MSD signal; x is the protein concentration; EC50 is the protein concentration that gives an MSD signal halfway between Bottom and Top; and HillSlope is a factor representing the steepness of the standard curve. Slopes of standard curves in the linear dynamic range were determined as shown in Supplementary Data Set 1.
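
Eq. (1) and the weighted objective minimized during fitting can be written compactly in Python. This is a minimal sketch, not the Solver workbook used in this study: the function names are hypothetical, the parameter values in the example are arbitrary, and weighting each residual by the observed signal (rather than the predicted one) is an assumption.

```python
def four_pl(x, bottom, top, ec50, hill_slope):
    """Four-parameter logistic model of Eq. (1)."""
    xh = x ** hill_slope
    return bottom + xh * (top - bottom) / (xh + ec50 ** hill_slope)


def weighted_sse(params, concentrations, signals):
    """Sum of squared residuals with 1/Y^2 weighting (Y = observed signal, assumed)."""
    bottom, top, ec50, hill_slope = params
    return sum(
        ((y - four_pl(x, bottom, top, ec50, hill_slope)) / y) ** 2
        for x, y in zip(concentrations, signals)
    )


# Arbitrary illustrative parameters: Bottom, Top, EC50, HillSlope.
params = (100.0, 1100.0, 10.0, 1.0)
half_way = four_pl(10.0, *params)  # signal at x = EC50

concentrations = [1.0, 10.0, 100.0]
perfect_signals = [four_pl(x, *params) for x in concentrations]
objective = weighted_sse(params, concentrations, perfect_signals)
```

A minimizer such as Solver would adjust the four parameters to make `weighted_sse` as small as possible; the 1/Y² weighting makes the objective a sum of squared relative errors, so low-signal points count as much as high-signal ones.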

For other regression analyses, different models (linear, exponential, power and logarithmic) were tested with the least-squares method using Microsoft Excel (Office 365). The regression model with the highest R-squared value was selected.
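
Model selection by R-squared can be sketched as follows. The sketch assumes that each candidate model is fitted by ordinary least squares after the transformation that makes it linear (log of x, log of y, or both) and that R-squared is computed on the transformed data; this is an assumption about how such fits are typically implemented, not a description of the spreadsheet used here. Function names and example data are hypothetical, and all x and y values must be positive for the logarithms to be defined.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


def identity(v):
    return v


# Each model is linear after transforming x and/or y: (transform of x, transform of y).
MODELS = {
    "linear": (identity, identity),
    "logarithmic": (math.log, identity),
    "exponential": (identity, math.log),
    "power": (math.log, math.log),
}


def best_model(xs, ys):
    """Return the model name with the highest R-squared and all scores."""
    scores = {}
    for name, (fx, fy) in MODELS.items():
        _, _, r2 = linear_fit([fx(x) for x in xs], [fy(y) for y in ys])
        scores[name] = r2
    return max(scores, key=scores.get), scores


# Synthetic data following y = 3 * x^2, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x ** 2 for x in xs]
selected, scores = best_model(xs, ys)
```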

Quantification of average polyQ length

When the amount of HTT is assessed in the linear dynamic range of our MSD assays, then:

$$MSD\,Signal=Concentration\,\times \,Slope$$
(2)

and we showed in Fig. 1d that:

$$polyQ\,Length=f(\frac{Slope\,(MW1)}{Slope\,(MAB5492)})$$
(3)

Combination of Eqs. (2) and (3) leads to:

$$polyQ\,Length=f(\frac{MSD\,Signal\,(MW1)}{MSD\,Signal(MAB5492)})$$
(4)
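
The intermediate step can be written explicitly. Assuming that both assays are run on the same sample within their linear dynamic range, so that the same HTT concentration enters both signals, the concentration term of Eq. (2) cancels in the ratio:

$$\frac{MSD\,Signal\,(MW1)}{MSD\,Signal\,(MAB5492)}=\frac{Concentration\,\times \,Slope\,(MW1)}{Concentration\,\times \,Slope\,(MAB5492)}=\frac{Slope\,(MW1)}{Slope\,(MAB5492)}$$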

Either the ratio of the MSD signal obtained with MW1 to the MSD signal obtained with MAB5492, or the ratio of the corresponding slopes, could therefore be used to quantify average polyQ length in biological samples. For MSD signal ratios or ratios of slopes, propagation of error was calculated by the equation:

$$Propagated\,S{D}_{\frac{A}{B}}=\frac{A}{B}\times \sqrt{({(\frac{S{D}_{A}}{A})}^{2}+{(\frac{S{D}_{B}}{B})}^{2})}$$
(5)
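
Eq. (5) translates directly into code. The following Python sketch uses a hypothetical function name and made-up signal values chosen so that the arithmetic is easy to follow; like Eq. (5), it treats the errors on A and B as independent.

```python
import math


def propagated_sd_ratio(a, sd_a, b, sd_b):
    """Standard deviation of the ratio A/B according to Eq. (5)."""
    return (a / b) * math.sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)


# Hypothetical signals: A = 100 with SD 3 (3%), B = 50 with SD 2 (4%).
ratio = 100.0 / 50.0
ratio_sd = propagated_sd_ratio(100.0, 3.0, 50.0, 2.0)
```

With relative errors of 3% and 4%, the relative error of the ratio is 5%, giving a propagated SD of 0.1 on a ratio of 2.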

PolyQ length was extrapolated from the standard curve obtained by testing different concentrations of GST-FLAG-HTTexon1 proteins with different polyQ lengths.

PCR assay

Genomic DNA was isolated from tissues for somatic instability analysis using the DNeasy Blood & Tissue Kit (Qiagen). The size of the HTT CAG repeat was determined using a PCR assay that amplifies the HTT CAG repeat. The forward primer was fluorescently labeled with 6-FAM (Applied Biosystems) and products were resolved using the ABI 3730xl DNA analyzer (Applied Biosystems) with GeneScan 500 LIZ as internal size standard (Applied Biosystems).

Quantification of average CAG repeat length

PCR amplification of trinucleotide repeats from tissue prone to CAG instability generates multiple PCR products, viewed using GeneMapper software as a cluster of peaks differing by a single CAG repeat unit41. The following steps were used to determine the average CAG repeat length: 1) WT and mutant huntingtin alleles were analyzed individually; 2) 5% (threshold factor) of the height of the highest peak was set as a relative peak height threshold (peaks with heights lower than this threshold were excluded from quantification); 3) peak heights were normalized by dividing the height of each peak by the sum of the heights of all signal peaks; 4) the normalized peak heights were multiplied by their related CAG repeat lengths; 5) values from step 4 were summed to obtain the average CAG repeat length for each allele; 6) the average CAG repeat lengths of the two alleles were averaged.
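
Steps 1 to 6 can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the analysis script used in this study: the function name is hypothetical, the peak heights are invented, and the normalization in step 3 is assumed to use only the peaks that passed the threshold.

```python
def average_cag_length(peaks, threshold_factor=0.05):
    """Height-weighted mean CAG length for one allele (steps 2 to 5).

    `peaks` maps CAG repeat length to peak height.
    """
    cutoff = threshold_factor * max(peaks.values())             # step 2
    kept = {n: h for n, h in peaks.items() if h >= cutoff}
    total = sum(kept.values())                                  # step 3
    return sum(n * h / total for n, h in kept.items())          # steps 4 and 5


# Hypothetical peak clusters, analyzed one allele at a time (step 1).
wild_type = {18: 100.0}
mutant = {42: 100.0, 43: 50.0, 44: 50.0, 45: 4.0}  # the 45-repeat peak is below 5%

wt_average = average_cag_length(wild_type)
mutant_average = average_cag_length(mutant)
sample_average = (wt_average + mutant_average) / 2              # step 6
```

In this example the mutant allele averages 42.75 repeats, the WT allele 18, and the sample average is 30.375.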

Study approval

The animal protocol was approved by the MGH Subcommittee on Research Animal Care - Office of Laboratory Animal Welfare #2004N000248. All procedures conform to the USDA Animal Welfare Act, the Institute for Laboratory Animal Research “Guide for the Care and Use of Laboratory Animals” and the Public Health Service Policy on Humane Care and Use of Laboratory Animals.