Analysis of Brain and Cerebrospinal Fluid from Mouse Models of the Three Major Forms of Neuronal Ceroid Lipofuscinosis Reveals Changes in the Lysosomal Proteome

Neuronal ceroid lipofuscinoses are fatal neurodegenerative lysosomal diseases. Protein biomarkers that reflect ongoing disease processes could provide a valuable readout to help optimize dose and dosing regimen and determine response to treatment. A quantitative proteomic analysis of brain and cerebrospinal fluid from mouse models of three NCL diseases and controls was conducted at time points corresponding to early and late stage disease. Resulting data were cross-referenced with previous studies in mouse and humans, identifying promising biomarker candidates for further study.

Highlights
Proteomic analysis of brain and cerebrospinal fluid from mouse NCL models.
Disease-associated changes in lysosomal proteins and markers of neuroinflammation.
Biomarker candidates identified for further evaluation.

Treatments are emerging for the neuronal ceroid lipofuscinoses (NCLs), a group of similar but genetically distinct lysosomal storage diseases. Clinical rating scales measure long-term disease progression and response to treatment, but clinically useful biomarkers have yet to be identified in these diseases. We have conducted proteomic analyses of brain and cerebrospinal fluid (CSF) from mouse models of the most frequently diagnosed NCL diseases: CLN1 (infantile NCL), CLN2 (classical late infantile NCL) and CLN3 (juvenile NCL). Samples were obtained at different stages of disease progression and proteins quantified using isobaric labeling. In total, 8303 and 4905 proteins were identified from brain and CSF, respectively. We also conducted label-free analyses of brain proteins that contained the mannose 6-phosphate lysosomal targeting modification. In general, we detect few changes at presymptomatic timepoints, but later in disease we detect multiple proteins whose expression is significantly altered in both brain and CSF of CLN1 and CLN2 animals. Many of these proteins are lysosomal in origin or are markers of neuroinflammation, potentially providing clues to underlying pathogenesis and providing promising candidates for further validation.


The neuronal ceroid lipofuscinoses (NCLs) are a group of clinically related but genetically distinct lysosomal diseases that typically affect children, resulting in premature death (1). Clinical trials have been conducted or are planned for the three most frequently encountered forms of NCL: infantile (INCL, CLN1 disease), late-infantile (LINCL, CLN2 disease) and juvenile (JNCL, CLN3 disease), referred to here as CLN1, CLN2 and CLN3, respectively. In addition, enzyme replacement therapy has been approved for CLN2 (2). Clinical rating systems (3,4) are essential for evaluating the long-term efficacy of potential NCL treatments, but biomarkers that allow measurement of short-term response to treatment would be extremely useful. Such biomarkers could help optimize dose and dosing regimen and provide clinical end point surrogates to accelerate approval of effective treatments and allow trials of ineffective treatments to be terminated in a timely manner.
The goal of this study was to extend earlier studies to identify proteins that are up- or down-regulated in CLN1, CLN2 and CLN3 and that could serve as biomarkers for disease progression. Previously, we analyzed NCL cases using human autopsy brain samples and matching CSF samples from the same individuals (5). That study resulted in an extensive database of protein expression changes in these three NCL diseases, but there are caveats in the interpretation of these data. First, because the NCLs are predominantly pediatric in nature, it was not possible to obtain age-matched brain and CSF samples for controls; thus, age-related changes could not be discounted. Second, because these were autopsy samples, there were concerns regarding sample collection, specifically autolysis times ranging from 3 to 54 h and variable amounts of blood contamination of CSF. Third, samples could only be collected at the conclusion of the disease process; thus, longitudinal data or response to treatment could not be evaluated.
There are various biological samples in which changes in protein expression could be detectable. Given the extensive neuropathology and changes in cell type associated with NCL disease progression, protein changes are detectable in brain, but brain tissue is not clinically accessible, so we have considered biological fluids that might reflect changes in protein expression in brain. Blood-based tests would have the widest clinical applicability but, given the complexity of the blood proteome, detecting changes in brain-derived proteins may prove to be difficult. We have therefore focused on CSF as a biological fluid that is proximal to the brain and is also relatively accessible, especially in LINCL patients undergoing CSF-mediated enzyme replacement therapy (2). A proportion of newly synthesized lysosomal proteins escape the lysosomal targeting pathway and are secreted, providing a route by which these intracellular proteins may be detected in the CSF. However, it is also possible that neuronal death results in leakage of intracellular proteins into the CSF.
The use of mouse models should provide complementary data to that obtained with human autopsy samples while circumventing many of the problems encountered with the latter. In this study, we have conducted a proteomic analysis of brain and CSF from mouse models of CLN1, CLN2 and CLN3. These analyses have provided an extensive catalogue of proteomic changes in the NCL mouse models, extending and corroborating findings from earlier studies and providing a discovery basis for future studies to identify clinically useful biomarkers.

EXPERIMENTAL PROCEDURES
Animals-Mice were maintained and used following protocols approved by the Rutgers University and Robert Wood Johnson Medical School Institutional Animal Care and Use Committee ("Preclinical evaluation of therapy in an animal model for LINCL," protocol I09-0274-4). All mouse models were in a C57BL/6 background. Tpp1−/− (CLN2) mice were as described previously (6,7) and were analyzed at 4 and ~19 weeks of age. Ppt1−/− (CLN1) mice (8) were obtained from Dr. Mark Sands (Washington University School of Medicine, St Louis, MO) and were analyzed at 4 and ~26 weeks of age. Cln3−/− (CLN3) mice (9) were obtained from Dr. David Pearce (Sanford Research, Sioux Falls) and were analyzed at 4 and 52 weeks of age. Wild-type animals were obtained from the Jackson Laboratory and were analyzed at 4 weeks (early stage control for all NCLs), 19-25 weeks (late-stage control for CLN1 and CLN2) and 44 weeks (late-stage control for CLN3) of age. All groups contained approximately equal numbers of males and females. CSF was collected from the cisterna magna (10) after deep anesthesia induced by intraperitoneal administration of 0.1-0.2 ml of a mixture of sodium pentobarbital and sodium phenytoin (Euthasol, Delmarva Laboratories, Milton, DE) and animals were euthanized by exsanguination following transcardiac perfusion with saline. CSF was visually inspected for signs of blood contamination (red or pink color) and, if uncontaminated, centrifuged and frozen on dry ice. Brains were dissected and frozen on dry ice. Animal details are given in supplemental Table S1. Pathology of disease has been well characterized in CLN1, CLN2, and CLN3 mouse models (reviewed in (11,12)).
Experimental Design and Statistical Rationale-This study consists of several analyses of three NCL mouse models and wild-type controls at early and late-stage disease. Proteomic analysis of brain samples was conducted using isobaric labeling with n = 3 animals per experimental group, chosen to facilitate analysis of a total of 27 animals in three labeling experiments using a pooled bridge sample. Samples were analyzed using reporter ion measurement at both the MS2 and MS3 level. For CSF, we determined point estimates using pooled samples from 10 animals per experimental group because of limitations of protein availability. Mannose 6-phosphate (M6P) glycoprotein analysis was conducted using 5-6 animals per experimental group. Statistical approaches employed for each study are described in the respective sections below.
Mass Spectrometry-The protein concentration of purified M6P glycoprotein samples was determined and 5 µg per animal were used for in-gel trypsin (specificity, carboxyl side of K and R) digests (16) and analyzed in duplicate by LC-MS/MS (~0.5 µg/run) using an LTQ Orbitrap Velos mass spectrometer (Thermo Fisher) as described (17). Six or five animals per genotype and time point were analyzed for CLN1/3 and CLN2, respectively. Whole brain extracts were prepared by homogenization of frozen brain in PBS/0.2% Tween (1:10 w/v), and cleared supernatants prepared by centrifugation at 14,000 × g at 4°C for 30 min. Protein concentrations were determined by Bradford protein assay (18) (Bio-Rad, Hercules, CA) and tryptic digests were generated by sequential trypsin and endoprotease LysC (specificity, carboxyl side of K) digestion using the FASP method (19). CSF samples (2.5 µl) from 10-15 animals per experimental group were pooled, protein concentration determined and 10 µg of each pool digested with trypsin in-gel (16). TMT10-plex (Thermo Fisher Scientific, Waltham, MA) labeling of in-gel tryptic digests was conducted using the manufacturer's methods. Pooled brain or CSF samples were fractionated by alkaline RP-HPLC and ~20 fractions analyzed by nanospray LC-MS using column and gradient conditions as described (17). Labeled CSF samples were analyzed using a Thermo Q Exactive HF. Each cycle consisted of an MS1 scan (300-2000 m/z, 120K resolution) followed by data-dependent MS/MS on the 20 most intense peaks (normalized collision energy of 38%, 110-2000 m/z, 30K resolution). Labeled brain samples were analyzed using the Q Exactive HF as described above and on a Thermo Fusion Lumos Tribrid instrument using multinotch MS3 (20). One full MS scan was acquired in the Orbitrap (380-1600 m/z, 120K resolution). For MS2, parent ions were isolated in the quadrupole (1.3 m/z isolation width), fragmented and then scanned in the ion trap (collision energy of 35%, 400-1200 m/z). For MS3, parent ions were isolated in the quadrupole, and five MS2 fragment ions generated and isolated in the ion trap were fragmented by HCD (collision energy of 65%) and scanned in the Orbitrap detector (100-500 m/z, 50K resolution).
Generation of Peak Lists-Peak lists were generated using Proteome Discoverer version 1.4 (Orbitrap Velos) or 2.2 (Q Exactive HF and Lumos) with no constraints on retention time, charge state or peak count, a minimum signal/noise of 1.5, and minimum and maximum precursor masses of 350 Da and 10,000 Da, respectively.
Database Search-A local implementation of the Global Proteome Machine version 2.2.1 (21,22) (GPM) Cyclone XE (Beavis Informatics Ltd., Winnipeg, Canada) with X!Tandem version ALANINE (2017.02.01) was used to search mass spectrometry data against a unique mouse genome assembly based on GRCm38.79 (Jan 2012; 41,214 total genes, 22,592 protein coding genes) and a custom database containing laboratory contaminants and other non-murine proteins. GPM peptide and protein assignments are included in MassIVE repository submissions (see below).
The following parameters were used to search data acquired using the Orbitrap Velos: fragment mass error, 0.4 Da; parent mass error, 10 ppm; maximum charge, +4; minimum 15 peaks assigned; one missed cleavage allowed; cysteine carbamidomethylation was a constant modification and methionine oxidation was a variable modification throughout the search; tryptophan oxidation, asparagine deamidation, and glutamine deamidation were variable modifications during model refinement. Peptide false positive rate (FPR) generated as GPM output was 0.65%. Models are presented using peptide log(e) of −1.
The following parameters were used to search data acquired using the Q Exactive HF: fragment mass error, 20 ppm; parent mass error, 10 ppm; maximum charge, +4; minimum 15 peaks assigned; one missed cleavage allowed; cysteine carbamidomethylation and TMT10 at lysine and N terminus were constant modifications; methionine oxidation was a variable modification throughout the search. Tryptophan oxidation, asparagine deamidation, glutamine deamidation, TMT10 at tyrosine and minus TMT10 at lysine and N terminus were variable modifications during model refinement. MS data acquired using the Lumos were analyzed using identical parameters except that the MS2 fragment mass error was 0.4 Da. Models are presented using peptide log(e) of −1. Peptide FPR generated as GPM output was 0.08% and 0.69% for brain and CSF samples, respectively, analyzed on the Q Exactive HF and 1.02% for brain samples analyzed on the Lumos. Protein identification data for all experiments including percentage coverage and number of unique and total peptides are shown in supplemental Tables S2-S5.
Data Normalization and Filtering-Where possible, protein accession numbers were mapped to associated gene names using Ensembl Biomart (23) lookup tables; if no corresponding gene name could be identified, the Ensembl mouse protein identifier (ENSMUSP) was retained. Isobaric reporter ion intensities were extracted and corrected for isotopic purity with custom in-house scripts (https://github.com/cgermain/IDEAA) using vendor-supplied channel crossover data, then merged with the X!Tandem output. Data for protein quantitation were obtained by filtering the raw data in the following sequential steps: (1) Gene products were filtered for protein assignments with at least two unique assigned peptides; (2) Data were normalized to total reporter ion intensity per channel to account for differences in protein quantity or efficiency of labeling; (3) Spectra were filtered for a minimum acceptable total label intensity (10,000 for Lumos, 20,000 for QE-HF); (4) Peptide-spectrum matches were filtered for fully tryptic cleavage with no missed sites, complete isobaric labeling of lysines and N termini, and a lack of posttranslational modifications deemed to increase variability (asparagine or glutamine deamidation, methionine dioxidation, tryptophan mono- and dioxidation, and isobaric labeling of tyrosine at positions other than the N terminus). Analysis of brain samples from 27 individual animals using TMT-10plex was conducted in three separate experiments (details are provided in supplemental Table S1). Each experiment comprised 9 animals, reserving the final channel for a pooled bridging sample derived from all 27 individual samples. Thus, after normalization to the total reporter ion signal per channel for each experiment (Step 2 above), spectra were filtered for a minimum intensity of 0.05 in the normalized reference channel and then normalized to the bridging sample for each experiment.
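The per-channel and bridge-sample normalization described above can be illustrated with a minimal sketch. This is not the IDEAA code: the data layout (one dictionary of reporter intensities per spectrum), the function names, and the `scale` factor applied to the normalized values are assumptions made for illustration, and the source does not state the scale on which the 0.05 threshold is applied.

```python
from typing import Dict, List

Spectrum = Dict[str, float]  # hypothetical layout: channel name -> reporter ion intensity


def channel_totals(spectra: List[Spectrum]) -> Dict[str, float]:
    """Sum reporter ion intensity per channel across all spectra of one experiment."""
    totals: Dict[str, float] = {}
    for spectrum in spectra:
        for channel, value in spectrum.items():
            totals[channel] = totals.get(channel, 0.0) + value
    return totals


def normalize_to_bridge(spectra: List[Spectrum], bridge: str,
                        min_bridge: float = 0.0, scale: float = 1.0) -> List[Spectrum]:
    """Normalize each channel to its total signal, drop spectra whose normalized
    bridge-channel value is below min_bridge, then express every sample channel
    as a ratio to the bridge channel of the same spectrum."""
    totals = channel_totals(spectra)
    result: List[Spectrum] = []
    for spectrum in spectra:
        norm = {ch: scale * v / totals[ch] for ch, v in spectrum.items()}
        if norm[bridge] < min_bridge or norm[bridge] == 0.0:
            continue  # bridge signal too weak to serve as a reference
        result.append({ch: v / norm[bridge] for ch, v in norm.items() if ch != bridge})
    return result
```

Because every experiment shares the same pooled bridge sample, ratios computed this way are comparable across the three labeling experiments.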
Statistics-For spectral count analysis, proteins analyzed were filtered for a minimum average spectral count of 5 per biological replicate. Data were analyzed using an in-house R program that modifies the package QuasiSeq (24) for analysis of mass spectrometry data. Instead of independently estimating a variance term for each protein as before (5), this method (QuasiSpectral version 0.22, https://github.com/mooredf22/quasispectral) uses a Bayesian procedure to shrink these variance term estimates toward the common mean. This improves the estimate of the variance term by drawing on information from the estimates of all the proteins. Output is provided in terms of false discovery rate-based q-values calculated using the original method of Benjamini and Hochberg (25).
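For readers unfamiliar with the final step, the conversion of p-values to Benjamini-Hochberg q-values can be sketched as follows. This is a generic illustration of the published procedure, not code from QuasiSpectral; the function name and the list-based interface are assumptions, and the variance-shrinkage step that produces the p-values is not shown.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values: for each p-value, the smallest FDR level
    at which the corresponding hypothesis would be rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues
```

A protein is then called significant when its q-value falls below the chosen false discovery rate threshold.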
Relative protein expression in NCL brain and CSF compared with control animals was measured by isobaric labeling. For brain, data were filtered as described above and the mean normalized intensity of all spectra assigned to a given protein was used to obtain a point estimate for the relative abundance of that protein in each individual animal. Statistical analyses were conducted in GraphPad Prism Version 8 with multiple t-tests using the two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli (26) to calculate significance using indicated false discovery rates as thresholds. Note that missing values were present when a given protein was not found in all three experiments, thus t-tests could only be conducted when spectra corresponding to given proteins were identified in at least two of both control and NCL samples. For CSF, statistical analyses were conducted as described for brain except normalized data from all spectra assigned to each protein were used to calculate significance of point estimates.
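The two-stage linear step-up procedure referred to above can be sketched as follows. This follows the published description of the method (a first Benjamini-Hochberg pass at level q/(1+q) to estimate the number of true null hypotheses, then a second pass at an adjusted level); it is not GraphPad Prism's implementation, and the function names and boolean output are assumptions for illustration.

```python
def bh_reject_count(sorted_p, level):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule
    at the given level; sorted_p must be in ascending order."""
    m = len(sorted_p)
    k = 0
    for rank, p in enumerate(sorted_p, start=1):
        if p <= rank * level / m:
            k = rank
    return k


def bky_two_stage(pvalues, q=0.05):
    """Two-stage step-up procedure; returns one boolean per input p-value
    (True = discovery at false discovery rate q)."""
    m = len(pvalues)
    if m == 0:
        return []
    order = sorted(range(m), key=lambda i: pvalues[i])
    sorted_p = [pvalues[i] for i in order]
    q1 = q / (1.0 + q)
    r1 = bh_reject_count(sorted_p, q1)      # stage 1
    if r1 == 0 or r1 == m:
        k = r1                              # reject none, or reject all
    else:
        m0_hat = m - r1                     # estimated number of true nulls
        k = bh_reject_count(sorted_p, q1 * m / m0_hat)  # stage 2
    rejected = [False] * m
    for rank in range(k):
        rejected[order[rank]] = True
    return rejected
```

When many proteins change, the second stage uses a more generous threshold than a single Benjamini-Hochberg pass, which is why the procedure can yield additional discoveries at the same nominal false discovery rate.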
Human Data-A goal of this study was to cross-validate data obtained with mouse NCL models with data from human patients, with the rationale that similar changes in both species would add confidence to conclusions. Data for isobaric label analysis of human brain and CSF were published previously (5) but new statistical analyses were conducted as described above.

RESULTS
Quantitative Proteomic Analysis of Mouse Brain-To identify differentially expressed proteins, we conducted untargeted analysis of total mouse brain extracts from NCL models and controls. Although brain is not an accessible sample, our rationale was that changes in brain-derived proteins might also be detected in CSF or plasma. NCL animals and controls were analyzed at a presymptomatic early time point (4 weeks) and at late-stage disease (CLN1, 26 weeks; CLN2, 19 weeks; and CLN3, 52 weeks). The phenotype of the CLN3 mice is subtle but animals at 52 weeks of age are reported to have behavioral deficits and detectable accumulation of storage material (27). Protein quantitation was conducted by both MS2 and MS3 measurement of reporter ions using Thermo Scientific Q Exactive HF or Lumos instruments, respectively (supplemental Table S6) and protein identification data are summarized in Table I. MS3 analysis using the Lumos provided ~15% more confident protein assignments (see Experimental Procedures) than the MS2 analysis on the Q Exactive HF, with ~11% more quantifiable proteins.
We initially compared wild-type groups to determine whether these could be combined to increase statistical power. Preliminary analyses (supplemental Fig. S1) indicated significant changes when data obtained using MS3 measurement of reporter ions from 4-week animals were compared with the 19-25- and 44-week groups. However, there were no significant differences between the two older groups of wild-type animals; therefore, these were combined when analyzing data from the late-stage NCL animals. Thus, for the early time point, we compared 4-week NCL animals with age-matched wild-type controls (n = 3 per group). No significant changes were detected for any NCL (supplemental Fig. S2). For the late time points, we compared NCL animals (n = 3) with the combined wild-type group of 16-25- and 44-week animals (n = 6).
Disease-associated changes in protein expression are visualized as volcano plots in Figs. 1 and 2, with isobaric reporter ions measured at the MS2 and MS3 level, respectively. When reporter ion quantitation methods are compared, we find more significant changes and greater effect sizes with MS3 analysis, consistent with a reduction in the background signal and subsequent ratio compression associated with measurement of MS2 reporter ions (28). We compared fold-changes obtained by MS2 and MS3 reporter ion measurements for proteins in late stage CLN1 compared with wild-type controls, given that the most dramatic changes in protein expression were found in this group. Significant changes detected by MS2 were a subset of those detected by MS3, but data were generally well correlated for proteins that were found to be significantly altered with, as expected, the MS3 approach having a greater dynamic range (Fig. 3). Significant changes detected in brain using MS3 reporter ion measurement in late-stage disease for the three NCLs are discussed below.
Numerous (445) significant changes in protein expression were detected in CLN1 brain at late-stage disease. Most proteins were elevated, although palmitoyl protein thioesterase 1 (PPT1) was decreased as expected: mutations in the gene encoding this lysosomal protein underlie CLN1 (29) and the Ppt1 gene is targeted in the CLN1 mouse model (8).
Fewer proteins (91) were significantly altered in late-stage CLN2, although tripeptidyl peptidase 1 (TPP1) was decreased as expected: mutations in the gene encoding this lysosomal protease underlie CLN2 (30) and the Tpp1 gene is targeted in the CLN2 mouse model (6). In CLN3, only 4 proteins were significantly elevated. In Fig. 4, we determined the subcellular distribution of proteins found to be significantly elevated in each NCL disease using the predominant compartment assigned to each protein in Expt A of the PROLOCATE database (http://prolocate.cabm.rutgers.edu/index) of rat liver protein subcellular localizations (31). When all proteins are considered, those that are assigned to the lysosome are a relatively minor fraction of the total assigned proteins (~5%). However, when considering proteins elevated in NCL, lysosomal proteins are overrepresented in CLN1 (~17% lysosomal, p = 1.3 × 10⁻¹⁵), CLN2 (~31%, p = 1.3 × 10⁻¹¹) and CLN3 (75%, p = 5.6 × 10⁻⁴). Lysosomal storage diseases are frequently associated with multiple changes in lysosomal enzyme activities that are secondary to the primary defect. The causes of such changes are generally unclear but they may be compensatory cellular responses to the accumulation of multiple substrates within the lysosome. There are examples of such secondary alterations in the NCLs. For example, PSAP is a major component of the storage material in INCL (32), β-glucuronidase (GUSB) is elevated in LINCL brain (33), and TPP1 (34), lysosomal acid phosphatase (35) and other lysosomal enzymes (33) are reported to be elevated in JNCL brain. Given that correction of the primary lysosomal defect by enzyme replacement or gene therapy is likely to diminish such responses, secondary lysosomal changes may represent useful biomarkers. In CLN1, CD63 protein, TPP1, CTSD and PSAP were elevated ~4-fold or higher and 14 other lysosomal proteins were significantly elevated more than 2-fold, including scavenger receptor class B member 2 (SCARB2) and cathepsins S and Z (CTSS and CTSZ). In CLN2, lysosomal proteins CTSZ, CTSA, beta-hexosaminidase subunit beta (HEXB), serine carboxypeptidase 1 (SCPEP1) and SCARB2 were modestly (<2.2-fold) elevated. In CLN3, only 4 proteins were significantly elevated and these included lysosomal proteins SCARB2, HEXB and TPP1.
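The overrepresentation of lysosomal proteins among elevated proteins can be framed as a draw-without-replacement problem. The text does not state which test produced the reported p-values, so the sketch below uses a one-sided hypergeometric test purely as an assumption, and the background counts in the usage note are hypothetical rather than taken from the study.

```python
from math import comb


def hypergeom_upper_tail(total, lysosomal, elevated, elevated_lysosomal):
    """P(X >= elevated_lysosomal) when `elevated` proteins are drawn without
    replacement from `total` proteins, of which `lysosomal` are lysosomal."""
    denominator = comb(total, elevated)
    upper = min(lysosomal, elevated)
    numerator = sum(
        comb(lysosomal, i) * comb(total - lysosomal, elevated - i)
        for i in range(elevated_lysosomal, upper + 1)
    )
    return numerator / denominator
```

For example, with a hypothetical background of 2000 localized proteins of which 100 (5%) are lysosomal, `hypergeom_upper_tail(2000, 100, 4, 3)` gives the probability that at least 3 of 4 elevated proteins are lysosomal by chance alone.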
Nicotinamide nucleotide transhydrogenase (NNT) was elevated >4-fold in CLN2. This is not a reflection of TPP1 deficiency: investigation subsequent to this analysis revealed that elevated NNT results from a difference in substrain between the CLN2 mice and wild-type controls. Although both are in a C57BL/6 genetic background, the controls were of the C57BL/6J substrain, which has reduced NNT levels because of mutations within the Nnt gene (36), but we anticipate that the effect of NNT on CLN2 phenotype should be minimal. However, it is worth noting that the detection of elevated NNT in the CLN2 mouse, as well as decreased PPT1 and TPP1 in the CLN1 and CLN2 mice respectively, helps validate the quantitative mass spectrometry approach. A comparison of TMT-MS3 data obtained in this study for specific proteins that were reported previously to be altered in NCL samples is shown in supplemental Table S7. Table II lists 71 proteins that are significantly elevated in both CLN1 and CLN2 and two striking observations can be made from this comparison. As noted above, a significant proportion of these proteins are of lysosomal origin. In addition, 32 out of 71 proteins significantly elevated in both CLN1 and CLN2 brain play roles in immune system function including inflammatory response and glial activation. Alterations in expression of proteins involved in inflammatory response have been implicated in CLN1, 2 and 3 (37)(38)(39)(40).
In our earlier study (5), we conducted quantitative mass spectrometry analysis of extracts from brain autopsy samples from patients with NCL diseases and here, we compare data obtained from animal models with NCL patient autopsy samples. Although there will be species-dependent differences and post-mortem sample collection will likely influence data obtained from human samples, our rationale is that conservation of proteomic changes between human and mouse in a disease-specific manner will provide additional confidence in potential biomarker candidates. For both CLN1 and CLN2, we detect good correlation between changes in protein expression in human and mouse brain (Fig. 5). Correlations between human and mouse are strongest when lysosomal proteins are considered alone (r² = 0.5957 and 0.2226 for CLN1 and CLN2, respectively).
Visualizing M6P Glycoproteins in Mouse NCL Models-Given the changes in expression of lysosomal proteins detected by data-dependent analysis of whole brain extracts in the NCL mouse models, we conducted a focused analysis of this group of proteins. M6P is a carbohydrate modification that is primarily associated with newly-synthesized lumenal lysosomal proteins (also referred to as soluble or matrix lysosomal proteins) that is recognized by two receptors that direct intracellular targeting from the Golgi to an acidified, prelysosomal compartment (41). Note that M6P is not found on lysosomal membrane proteins. Although this modification is rapidly removed in most cell types by acid phosphatase 5 (42), it is retained in neurons (13,43). Purified M6P receptors (MPRs) can be used for affinity isolation (44,45) and visualization (14) of lysosomal proteins. In human NCL cases (5,46), we found characteristic changes in the expression of M6P glycoproteins in brain samples, presumably reflecting lysosomal perturbation. A radiolabeled MPR derivative was used to broadly visualize M6P glycoproteins in brain extracts from NCL mice and controls (Fig. 6) to determine whether there were any gross changes in the expression of lumenal lysosomal proteins. In CLN1, there was a generalized increase in M6P glycoproteins at late-stage disease whereas an M6P glycoprotein of ~45-50 kDa was highly elevated at both early and late stage disease. This indicates that even at the presymptomatic 4-week time point, lysosomal changes due to PPT1 deficiency were already present in CLN1. Levels of the corresponding ~45-50 kDa protein band were reduced in the CLN2 samples, suggesting that it likely corresponds to TPP1. As with CLN1, there was a generalized increase in M6P glycoproteins detected in late-stage CLN2. M6P glycoprotein expression in early-stage CLN3 appeared like wild-type whereas at late-stage disease, the ~45-50 kDa band was highly elevated, as were bands corresponding to other M6P glycoproteins.
Quantitative Analysis of M6P Glycoproteins-M6P glycoproteins were purified from the brains of CLN1 (26-week old), CLN2 (19-week old) and CLN3 (52-week old) mice and age-matched wild-type controls (n = 5-6 animals per experimental group). Extracts were applied to columns of immobilized MPR and M6P glycoproteins specifically eluted by free M6P sugar prior to analysis by data-dependent mass spectrometry. Preparations of purified M6P glycoproteins primarily comprise well-characterized lumenal lysosomal proteins. However, there are other proteins not assigned to the lysosome; these can represent previously unrecognized lysosomal proteins, non-lysosomal proteins that interact with and copurify with lysosomal M6P glycoproteins, or contaminants that are not fully removed during column washing (13,17). The lysosomal M6P glycoproteome has been extensively characterized (reviewed in (44)) and the presence of M6P directly demonstrated for most lysosomal proteins (47,48).
Individual proteins were quantified by spectral counting. Note that equal amounts of each sample were analyzed by LC-MS/MS; thus, this analysis does not account for changes in overall levels of M6P glycoproteins in vivo, providing information about relative rather than absolute expression levels. NCL samples were compared with controls and volcano plots representing the expression of lysosomal and other proteins in each NCL at early and late stage disease are shown in Fig. 7 and spectral count analyses are presented in supplemental Tables S8-S10 (mouse) and supplemental Tables S11-S13 (human). Consistent with blotting experiments (Fig. 6), there were numerous disease-associated alterations in the relative expression of M6P glycoproteins in the late-stage (26-week) CLN1 samples. M6P-containing forms of TPP1, acid lipase (LIPA), CTSF and CTSD were elevated ~2-fold (Fig. 7). A low level of peptides corresponding to PPT1 was detected in the CLN1 brain extracts, possibly reflecting expression of some inactive protein from the targeted gene. Note that a protein of unknown function that has not been assigned to the lysosome, Von Willebrand factor A domain-containing protein 5A (VWA5A), was highly elevated (~20-fold) in the M6P glycoprotein preparation from CLN1 brain. This protein was significantly elevated in CLN1 and CLN2 whole-brain extracts (Table II) and its presence in the purified M6P glycoprotein preparation may indicate association with other M6P glycoproteins during the purification process.
In CLN2, TPP1 was not detected whereas cathepsins H (CTSH) and C (CTSC) were both elevated ~2-fold. Few significant changes were observed in CLN3 although TPP1 was elevated modestly (~1.4-fold). This is consistent with the elevated levels in CLN3 of a 45-50 kDa glycoprotein that presumably corresponds to TPP1 (Fig. 6) and earlier studies reporting an increase in TPP1 activity in juvenile neuronal ceroid lipofuscinosis (34). Sphingomyelin phosphodiesterase 1 (SMPD1) was significantly decreased in CLN3.
Expression of mouse brain M6P glycoproteins at the late-stage time point for each NCL was compared with the human brain samples (5) and we identified a number of conserved changes between both species (Fig. 8). For CLN1, there was correlation between relative changes in lysosomal M6P glycoprotein levels between human and mouse (r² = 0.3391 with PPT1 censored in linear regression analysis). Several lysosomal proteases (CTSC, CTSH, CTSS, and TPP1) were elevated as well as proteins involved in lipid metabolism (LIPA and NPC2, a small cholesterol-binding protein involved in lysosomal cholesterol transport (49)). Protease inhibitor cystatin B (CSTB), which is not lysosomal and likely copurifies with cathepsins, was also elevated. Relative expression of an inhibitor of serine proteases (SERPINB8) was decreased; this protein is not lysosomal and it is not clear whether it contains M6P (a related protein, neuroserpin, has been experimentally shown to do so (47)). GDP-fucose protein O-fucosyltransferase 2 (POFUT2), an M6P glycoprotein that likely localizes to the endoplasmic reticulum (47), was also decreased in mouse and human brain M6P glycoprotein preparations. For CLN2 and CLN3, correlation between human and mouse M6P glycoprotein expression was less marked, likely reflecting the relatively modest magnitude of changes measured in both species. However, the M6P form of GUSB was relatively elevated in mouse and human brain in both diseases. SMPD1 was consistently lower in both mouse and human CLN3 samples.
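The human-mouse comparisons summarized above as r² values amount to a linear regression of log-transformed disease/control ratios for proteins measured in both species, optionally censoring a protein such as PPT1 whose change is dictated by the genetic defect itself. A minimal sketch follows; the function names and the dictionary inputs keyed by shared gene symbol are assumptions, the use of log2 ratios is an assumption, and the values in the usage note are invented for illustration.

```python
import math


def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def cross_species_r2(mouse_ratio, human_ratio, censored=()):
    """r-squared of log2(NCL/control) ratios for proteins present in both
    dictionaries, skipping any gene symbols listed in `censored`."""
    shared = sorted(g for g in mouse_ratio if g in human_ratio and g not in censored)
    x = [math.log2(mouse_ratio[g]) for g in shared]
    y = [math.log2(human_ratio[g]) for g in shared]
    return r_squared(x, y)
```

For instance, `cross_species_r2({"TPP1": 2.0, "CTSD": 4.0, "PPT1": 0.01}, {"TPP1": 1.5, "CTSD": 3.0, "PPT1": 0.9}, censored=("PPT1",))` regresses only TPP1 and CTSD; censoring prevents one extreme, genotype-driven point from dominating the fit.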
In Fig. 9, we compare levels of lysosomal proteins, relative to wild-type controls, in purified M6P glycoprotein preparations and in whole brain extracts. In general, they are well correlated, indicating that the levels of lysosomal proteins containing M6P reflect overall levels.
FIG. 6. Brain M6P glycoproteins in NCL mouse models. Indicated brain extracts were fractionated by SDS-PAGE, transferred to nitrocellulose and M6P glycoproteins detected using a radioiodinated MPR derivative. Early disease stage is 4 weeks for all genotypes whereas late stage corresponds to 26 weeks for CLN1, 19 weeks for CLN2 and 52 weeks for CLN3. All wild-type controls were age-matched to corresponding mutant animals.
Quantitative Proteomic Analysis of Mouse Cerebrospinal Fluid (CSF)-In order to identify potential biomarkers in CSF, deep proteome analysis using two-dimensional LC fractionation was required given that proteins of interest (e.g. lysosomal proteins) are extremely low-abundance components of the proteome. As typically only ~5 µl of CSF (containing <1 µg protein) could be collected from an individual animal, we used a pooling strategy to obtain enough material for the analysis (see Experimental Procedures). Although not affording estimates of animal-to-animal variability, this allowed measurement of point estimates to cross-reference against data obtained from mouse brain and human samples (5) to help corroborate possible biomarkers.
In terms of proteome coverage, this approach was successful, with 4688 mouse CSF proteins (Table I) assigned on the basis of 2 or more unique peptides. When compared with mouse brain analyzed using MS3 (supplemental Table S14), 3814/3312 proteins were identified/quantified in both sources, 3973/3862 were identified/quantified in brain alone whereas 874/699 proteins were identified/quantified in CSF alone.
FIG. 7. Relative expression of purified brain M6P glycoproteins in late-stage NCL mouse models. Axes are truncated at -log10 q-value of 10 and log2 NCL/control ratio of -4 to 4 and analyses were restricted to proteins with a minimum sum of 50 spectral counts. Significance (q-values) was calculated by unpaired t tests using the False Discovery Rate (FDR) two-stage step-up method of Benjamini, Krieger and Yekutieli. Dashed lines indicate an FDR of 2% (y axis) and fold-change of 2-fold (x axis). Names of select proteins of interest are shown in red (lysosomal proteins) or black (non-lysosomal proteins). Animals were aged 26 weeks (CLN1 and age-matched controls, n = 6 per group), 19 weeks (CLN2 and age-matched controls, n = 5 per group) and 52 weeks (CLN3 and age-matched controls, n = 6 per group).
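For readers unfamiliar with the two-stage step-up method named in the Fig. 7 legend, the following is a minimal sketch of its rejection rule as generally described for the Benjamini, Krieger and Yekutieli procedure. It is not the software used in this study: the function names are hypothetical, the input is assumed to be a plain list of per-protein p-values, and only the accept/reject decision at a target FDR q is computed, not adjusted q-values.

```python
def bh_reject_count(sorted_p, level):
    """Number of rejections from the Benjamini-Hochberg step-up rule.

    sorted_p must be in ascending order.
    """
    m = len(sorted_p)
    count = 0
    for rank, p in enumerate(sorted_p, start=1):
        if p <= rank * level / m:
            count = rank  # step-up: keep the largest rank that passes
    return count

def two_stage_bky(p_values, q=0.02):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    if m == 0:
        return []
    sorted_p = sorted(p_values)

    # Stage 1: Benjamini-Hochberg at the reduced level q' = q / (1 + q).
    q1 = q / (1.0 + q)
    r1 = bh_reject_count(sorted_p, q1)
    if r1 == 0:
        return [False] * m
    if r1 == m:
        return [True] * m

    # Stage 2: estimate the number of true nulls, then rerun BH at q' * m / m0.
    m0_hat = m - r1
    q2 = q1 * m / m0_hat
    r2 = bh_reject_count(sorted_p, q2)
    cutoff = sorted_p[r2 - 1]  # r2 >= r1 >= 1 because q2 > q1
    return [p <= cutoff for p in p_values]

# Hypothetical p-values, not data from this study.
decisions = two_stage_bky([0.5, 0.0001, 0.9, 0.0002], q=0.02)
```

The second stage is what distinguishes this procedure from a single Benjamini-Hochberg pass: when many hypotheses are rejected in stage 1, the estimated number of true nulls falls and the stage 2 threshold becomes more permissive.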
FIG. 8. Comparison of relative changes in expression of human and mouse purified brain M6P glycoproteins. Red symbols, known lysosomal proteins; black symbols, other proteins that were specifically purified on immobilized M6P receptor in a previous study (17). Filled symbols, proteins that are significantly elevated (FDR 2%) in human and/or mouse; open symbols, proteins that are not significantly elevated in either species. PPT1 and TPP1 were excluded from linear regression analyses of CLN1 and CLN2, respectively.
Expression of CSF proteins at different stages of disease was visualized by volcano plots (Fig. 10) and results summarized in supplemental Tables S14 and S15. Note that in this case, significance of change was calculated by multiple t-tests at the peptide level for each protein point estimate, a similar approach to that used with human samples (5). There are individual proteins that can be used to benchmark the point estimates. For example, in both early and late-stage CLN1, PPT1 was significantly reduced. In early and late-stage CLN2, TPP1 was reduced and although not achieving significance, it was the most decreased lysosomal protein. NNT was increased by ~8- and 3.5-fold in early and late CLN2, respectively.
In CLN1, there were numerous changes in protein expression identified in CSF. At early-stage disease, there was a generalized but modest (<2-fold) increase in levels of lysosomal proteins. In late-stage CLN1 CSF, several lysosomal proteins were elevated 1.6- to 2.5-fold including HEXA, CTSZ, CTSD and CTSS. Fewer changes were observed in early-stage CLN2 although several proteins were significantly elevated by more than 2-fold, including GFAP. At late-stage CLN2, numerous proteins were elevated compared with wild-type including ~20 lysosomal proteins, e.g. CTSZ, CTSS, and CTSD. In supplemental Table S15, we tabulated the numbers of proteins that were significantly increased or reduced in the NCL CSF samples. It is worth noting that in CLN1 and CLN2 CSF, more non-lysosomal proteins were significantly decreased than increased in the NCL samples compared with controls. In contrast, lysosomal proteins were elevated in the NCL samples (except for PPT1 and TPP1 in CLN1 and CLN2, respectively). Overall, this correlates well with brain, where lysosomal proteins tended to be increased in the disease samples. In CLN3, very few changes were detected at either early or late-stage disease in CSF.
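The kind of tally summarized in supplemental Table S15 can be sketched as follows. This is an illustration only: the record layout, the field names and the inclusive 2% FDR cutoff are assumptions, and the example entries are hypothetical rather than values from the study.

```python
from collections import Counter

def tally_changes(proteins, fdr_cutoff=0.02):
    """Count significant changes by protein class and direction.

    Each record is a dict with keys:
      "lysosomal"  - bool, whether the protein is annotated as lysosomal
      "log2_ratio" - log2(NCL/control) point estimate
      "q"          - FDR-adjusted significance value
    """
    counts = Counter()
    for p in proteins:
        # Skip proteins that miss the FDR cutoff or show no change at all.
        if p["q"] > fdr_cutoff or p["log2_ratio"] == 0:
            continue
        klass = "lysosomal" if p["lysosomal"] else "non-lysosomal"
        direction = "increased" if p["log2_ratio"] > 0 else "decreased"
        counts[(klass, direction)] += 1
    return counts

# Hypothetical records, not data from this study.
records = [
    {"lysosomal": True, "log2_ratio": 1.2, "q": 0.001},
    {"lysosomal": True, "log2_ratio": -3.0, "q": 0.004},
    {"lysosomal": False, "log2_ratio": -0.8, "q": 0.010},
    {"lysosomal": False, "log2_ratio": -1.1, "q": 0.015},
    {"lysosomal": False, "log2_ratio": 2.0, "q": 0.300},  # not significant
]
summary = tally_changes(records)
```

Splitting the counts by class and direction makes the asymmetry described above easy to read off: lysosomal proteins mostly increased, non-lysosomal proteins more often decreased.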
CSF Biomarker Candidates-Our overall strategy was to cross-reference results obtained from brain and CSF from the mouse NCL models. Our rationale was that proteins that show significant changes in CSF that reflect similar changes in brain would potentially provide the most informative biomarkers for brain disease. Several promising candidates emerging from this analysis that were altered in both brain and CSF are discussed below. Proteins that are significantly altered in late-stage CLN1 and CLN2 are shown in Tables III and IV, respectively. No proteins were significantly altered in both CLN3 brain and CSF. In Fig. 11, NCL-associated changes in protein expression in brain and CSF are compared. Overall, we do detect some proteins with changes in expression that are well correlated between brain and CSF, e.g. SERPINA3N, LYZ2, C4B and CTSZ are elevated in both brain and CSF. However, we also detect several proteins that are highly elevated in CSF (e.g. NEFH, NEFM, CHIL1 and CALB1) but unchanged in brain. It is possible that expression of these proteins is altered in a disease-specific manner in localized regions of the brain that contribute significantly to the CSF proteome (e.g. cells of the choroid plexus) but not in the brain overall. It is worth noting that the most striking changes in CSF were detected in both CLN1 and CLN2. Conservation of these changes between CLN1 and CLN2 suggests that they may represent nonspecific responses to neurodegeneration (or common responses to lysosomal dysfunction) but also adds confidence in their selection as potential candidates.
DISCUSSION
The goal of this study was to conduct quantitative analyses of the brain and CSF proteomes of murine NCL models to provide potential biomarker candidates for further validation. A major rationale was that analysis of genetically homogeneous NCL mouse models would reduce some of the inherent data variability encountered in using human autopsy samples.
The study focused on total proteins from brain and CSF analyzed at a presymptomatic time point and in late-stage disease; we also conducted targeted analysis of lysosomal proteins in brain. For global surveys, the use of isobaric labeling with extensive prefractionation allowed us to achieve extremely broad proteome coverage. For example, two previous extensive proteomic studies of mouse CSF (50,51) identified 566 and 715 proteins, respectively, whereas in this study, we confidently (see EXPERIMENTAL PROCEDURES) identified ~4700 proteins in CSF samples, ~4000 of which were quantifiable. Reflecting similar observations in human brain and CSF samples (5), a wider range of proteomic changes was found in the samples derived from CLN1 mice compared with CLN2 and changes in expression tended to be of higher magnitude. In mice, this does not correlate with disease severity in terms of lifespan: median survival of the CLN2 mouse (~4 months (6)) is shorter than that of the CLN1 mouse (~7 months (8)). It is possible that pathology is more widely distributed throughout the CLN1 mouse brain compared with the CLN2 brain, accounting for the broader proteomic changes that are detected when whole brain is analyzed. In contrast, the shorter lifespan of the CLN2 mouse may be a consequence of localized but highly damaging pathology. In the human study, changes in CLN2 and CLN3 were similar. This discrepancy with the mouse likely reflects that the human samples were obtained at autopsy from severely affected patients whereas the CLN3 mice have highly attenuated disease. Lifespan of the CLN3 mouse in our colony is essentially normal (data not shown) and there is no overt end-stage phenotype.
Several other studies have conducted proteomic analyses of NCL mouse models. One study analyzed brain membrane proteins from newborn CLN1 animals that were enriched for acylated proteins, with the rationale that palmitoylation would be increased in the absence of PPT1 (52). This study found that the acylated forms of 88 proteins were significantly altered in CLN1 compared with wild-type controls. Of these, we identified 83 in our analysis of whole brain homogenates. Except for TLN1 and ANXA2, which were significantly but modestly elevated in late-stage CLN1, we detected no significant changes in aggregate expression levels (i.e. both acylated and not acylated) of these proteins in this study. In another study (53), thalamus of CLN1 mice was analyzed using label-free methods, identifying 139 proteins that were significantly altered at one or more time points (presymptomatic, symptomatic and late-stage). We identified 106 of these proteins in this study, 7 of which were significantly (FDR ≤ 2%) altered at late-stage CLN1, including APOE and GFAP. A detailed comparison of this dataset with the current study is shown in supplemental Table S16. In a recent study (54), proteomic analysis of lysosomal fractions from CLN3 mouse cerebellar cells revealed several significant lysosomal changes. We detected few significant changes associated with CLN3 in lysosomal or other proteins in whole brain extracts or M6P glycoprotein preparations from whole brain. This may reflect cell-type specific responses that are diluted when whole brain is analyzed. However, elevated TPP1 is a noteworthy and consistent observation in CLN3 in the current and other studies (34,54).
Several proteins were highly elevated in both CLN1 and CLN2 (Table II). CD63 is a transmembrane protein located on the plasma membrane and intracellularly within the lysosomal-endosomal system. Elevated ~4- and 2-fold in CLN1 and CLN2, respectively, CD63 may play a role in leukocyte motility (55) and activation of mast cells and basophils (56). Glycoprotein non-metastatic melanoma B (GPNMB) was elevated 15- and 3-fold in CLN1 and CLN2, respectively, and is thought to attenuate inflammatory responses in microglia and astrocytes via interaction with CD44 (57), which was also significantly elevated in CLN1 and CLN2. GPNMB is also elevated in CSF from patients with the lysosomal disorder Gaucher disease (58) and has been implicated in Parkinson's disease. Lysozyme 2 (LYZ2) was significantly elevated ~7-fold and ~2-fold in CLN1 and CLN2, respectively. LYZ2 expression is highly elevated in neurons from specific regions of the brain in a model of lysosomal storage disease Sanfilippo syndrome type B (59) and cortex from mouse models of mucopolysaccharidoses I and IIIB (60). Serpin family A3 (SERPINA3), a neuroprotective inhibitor of granzyme B (61), was elevated ~11-fold and ~5-fold in CLN1 and CLN2, respectively. Other proteins elevated in CLN1 and CLN2 associated with neuroinflammation and reactive gliosis include GFAP, C4B, C1QA, C1QB, C1QC, CD44, and S100A6. Phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit gamma (PIK3CG) was significantly decreased in both CLN1 and CLN2 brain. PIK3CG has been implicated in attenuating neuroinflammatory responses in ischemia (62). Transglutaminase 1 (TGM1) is not directly implicated in immune function but was elevated ~19- and 5-fold in CLN1 and CLN2, respectively. TGM1 mutations cause rare hereditary skin disorders in humans (63) but this protein has not previously been implicated in neurodegenerative disease. 
However, the related protein transglutaminase 2 (TGM2) has been proposed to play a role in neurodegeneration, although whether it promotes or protects against neuronal death is not clear (64).
Most of the proteomic changes detected were only identified at late-stage disease, suggesting that they were a consequence of progressive neurodegeneration or other pathological processes rather than an immediate response to the metabolic defect caused by mutation of the disease gene. To examine this further, we investigated expression of these proteins in related diseases. For example, in a microarray analysis of cortex from mouse models of mucopolysaccharidoses I and IIIB (60), which are also lysosomal storage diseases characterized by severe neurodegeneration, transcripts corresponding to various proteins were found to be elevated, including 13 that were increased in both MPSI and IIIB mouse models. Most of this common set are proteins that are expressed by and associated with macrophage function. Nine proteins of the MPSI/IIIB common set were detected in the NCL models (Table V). Remarkably, nearly all were significantly elevated in CLN1 and CLN2 brain at late-stage disease and most were significantly elevated in CLN1 and CLN2 CSF (Table III). There was less concordance between the NCLs and proteins encoded by transcripts that were elevated in MPSIIIB alone although there was some agreement: PSMB8 and SERPINA3N were both elevated in late-stage CLN1 and CLN2 brain whereas LAG3 and SERPING1 were elevated in late-stage CLN2 CSF. None of the MPSI/IIIB common set were significantly elevated in CLN3.
FIG. 11. Proteins that are significantly elevated (FDR < 2%) in both brain and CSF at late-stage time points. Red symbols, lysosomal proteins; black symbols, non-lysosomal proteins. Filled symbols, proteins that are significantly elevated in both brain and CSF; open symbols, proteins that are significantly elevated in brain or CSF.
Another transcriptomic study has examined gene expression in the fore-midbrain regions of the CLN2 mouse model at 12 weeks (65) and in Fig. 12, we compare protein expression changes measured by transcriptomic and proteomic analyses for the intersecting dataset of 437 proteins. Overall, correlation was not high, reflecting differences frequently seen when expression is measured at the protein and transcript levels, although it is interesting to note that changes in cerebellum more closely correlated with proteomic measurements in whole brain (r² = 0.140) than changes in mid/forebrain (r² = 0.031). More importantly, there is extremely good concordance between transcriptomic and proteomic measurements in brain of several proteins that were markedly altered in the current proteomic study including LYZ2, TGM1, GPNMB, C4B, SERPINA3N, GFAP, and C1QC.
FIG. 12. Comparison of transcriptomic and proteomic changes in the CLN2 mouse model. Global brain transcriptome analyses were conducted using isolated fore/midbrain samples and cerebellum from 16-week CLN2 mice (65) and compared with corresponding changes measured by TMT-MS3 in whole brain extracts at 19 weeks. Log2 transformed ratios for CLN2/control are truncated at 4 and -4.
The overall goal of the study was to identify proteomic changes in brain that are reflected by parallel responses in CSF and several promising candidates have emerged from this study including CTSZ, SERPINA3N, LYZ2, TGM1, and C4B. In addition, we detected changes in expression of neurofilaments NEFH, NEFL, and NEFM, which may also be promising candidates as biomarkers for NCL diseases given that elevated levels of these proteins have been well established in CSF and other samples from patients with various other neurodegenerative diseases (66). Many of these protein changes were found in both CLN1 and CLN2 and in other LSDs, suggesting that they may reflect neurodegeneration in general, but there were some changes that may be disease-specific. For example, envoplakin (EVPL), which is suggested to interact with TGM1, is highly elevated in CLN2 CSF. The origin of these potential biomarkers in CSF remains unclear but one possibility is that these intracellular proteins are released into the CSF because of neurodegeneration. If this were the case, we would predict that abundant intracellular proteins would be elevated in CSF in an NCL-dependent manner. We investigated this possibility by specifically examining levels of abundant intracellular proteins in NCL CSF. In supplemental Fig. S3, we compared protein abundance in brain extracts as determined by GPM log intensity scores (i.e. log base 10 of the sum of intensities of fragment ions, supplemental Table S2), with the corresponding ratio between NCL samples and control in CSF. 
There is no clear correlation between abundance in brain and elevated levels in NCL CSF samples, suggesting that biomarkers of interest are not present in CSF simply because of generalized release of the contents of brain cells following cell death. Finally, it should be borne in mind that although CSF biomarkers for NCL diseases could potentially be extremely useful, responsive biomarkers may not actually exist. Thus, subsequent studies are needed to validate candidates identified in this study, and use of mouse models will facilitate long-term studies to evaluate the response of biomarkers to disease progression. In addition, therapeutic approaches have now been developed that extend the lifespan of the CLN1 and CLN2 mouse models (67-71); thus, the response of potential biomarkers to effective treatments could also be evaluated.

Acknowledgments-We thank Drs. Mark Sands and David Pearce for generously providing CLN1 and CLN3 mouse models, respectively.

DATA AVAILABILITY
Raw files, mgf files, GPM result files, Excel workbooks listing protein and peptide assignments, and keys for data interpretation are archived in the MassIVE (http://massive.ucsd.edu) and ProteomeXchange (http://www.proteomexchange.org/) repositories: MSV000083797 (analysis of NCL mouse brain extracts using the Q Exactive HF), MSV000083827 (analysis of NCL mouse brain extracts using the Lumos), MSV000083619 (analysis of purified mouse brain M6P glycoproteins on the Q Exactive HF) and MSV000083828 (analysis of mouse CSF on the Q Exactive HF).