Urine proteome of autosomal dominant polycystic kidney disease patients

Autosomal dominant polycystic kidney disease (ADPKD) is responsible for 10% of cases of end stage renal disease. Early diagnosis, especially of potential fast progressors, would benefit the efficient planning of therapy. The urinary proteome has become a promising field in the search for marker patterns of renal diseases, including ADPKD. Up to now, however, only the low molecular weight fraction of the ADPKD proteomic fingerprint has been studied. The aim of our study was to characterize the higher molecular weight fraction of the urinary proteome of an ADPKD population in comparison to healthy controls, as part of a general effort towards exhaustive characterization of the human urine proteome in health and disease, which must precede the establishment of a clinically useful disease marker panel. We analyzed the protein composition of urine retentate (>10 kDa cutoff) from 30 ADPKD patients and an appropriate healthy control group by gel-free relative quantitation of a set of more than 1400 proteins. We identified an ADPKD-characteristic footprint of 155 proteins significantly up- or downrepresented in the urine of ADPKD patients. We found changes in proteins of the complement system, apolipoproteins, serpins and several growth factors, in addition to known collagens and extracellular matrix components. For a subset of these proteins we confirmed the results using an alternative analytical technique. The results provide a basis for further characterization of the pathomechanisms underlying the observed differences and for establishing a proteomic prognostic marker panel.


Background
Autosomal dominant polycystic kidney disease (ADPKD) is an inherited disorder affecting 1 in 1000 people and responsible for 10% of cases of end stage renal disease (ESRD). Apart from renal manifestations, changes in other organs may be present, including, among others, liver cysts and intracranial aneurysms. The disease is divided into 2 types based on the mutated gene: PKD1 in type 1 (85% of cases) and PKD2 in type 2. The type of mutation has prognostic significance, as the average age at onset of ESRD depends on the type of the disease: 53 years in type 1 and 69 years in type 2 [1].
As potential therapies for ADPKD are being extensively tested in clinical trials [2][3][4][5], there is a need for tools that enable early diagnosis and monitoring of therapy, especially non-invasive tests that could substitute for kidney biopsy. Evaluation of changes in the peptidome and/or proteome may provide the required information of pathophysiologic and clinical significance and may allow future diagnostic or prognostic tools to be established [6]. Urine, as an easily accessible compartment, seems to be an ideal material in the search for non-invasive prognostic and therapy monitoring tests for renal diseases. However, before urine proteome or peptidome markers become clinically useful, the urine proteome itself must be thoroughly characterized in an intense multi-stage research process comparing different sample processing and analysis settings. The aim of our research was to apply an in-depth bottom-up proteomic methodology to characterize the urinary proteome of an ADPKD population in comparison to healthy controls.
Literature data concerning descriptive proteomics in ADPKD patients are limited. Mason et al. reported the proteomic analysis of four samples of cyst fluid obtained postoperatively from excised kidneys of patients with ESRD due to ADPKD [7]. Kistler et al. [8] were the first to attempt to identify the urinary biomarker profile of ADPKD, focusing on the low molecular weight (<15 kDa) proteome fraction. It was thus of interest to explore other sections of the proteome in search of differences between ADPKD and control samples. In the present study we analyzed the proteome of the retentate obtained by filtering urine on 10 kDa filters. Urine samples were collected from 30 patients and 30 carefully matched healthy controls, selected without introducing any age or sex bias, in order to draw conclusions as general as possible at the present stage of the research. To obtain the best possible coverage of the urine proteome for relative quantitation, the MS analysis was preceded by a two-dimensional separation of tryptic peptides, the first dimension being isoelectrofocusing (IEF) and the second reversed phase liquid chromatography (LC). As a result, a list of more than 1400 proteins, each represented by at least two peptides, was established and subjected to comparative analysis, yielding a set of 155 proteins whose levels differed between ADPKD and control samples.

Results
The urine proteome of 30 ADPKD patients and 30 healthy subjects was compared using combined IEF-LC-MS-MS/MS relative quantitation of iTRAQ-labeled tryptic peptides. From each sample, an equal amount of total protein from the urine retentate obtained after filtration on 10 kDa cutoff filters was used for the analysis. This normalised the sample set with respect to the different degrees of proteome dilution in each sample and allowed the proteome composition to be compared. After tryptic digestion, peptides were subjected to iTRAQ labeling and IEF separation yielding 26 fractions, each analysed in a separate LC-MS-MS/MS run. IEF separation substantially increases the final protein coverage. However, analysing the 60 samples separately with an IEF step would require more than 1500 LC-MS-MS/MS runs, which is not practical. To overcome this difficulty while retaining an in-depth view of the urine proteome, we used a partial pooling strategy. The set of 30 ADPKD samples was divided into 3 subsets of 10 samples each, which were pooled into three Disease Pooled Samples (DPS I, II and III). Similarly, the control set was divided into three Control Pooled Samples (CPS I, II and III), retaining age and sex matching within the subsets. In addition, two technical replicates of each DPS and CPS were prepared, denoted A and B, to assess intragroup technical variability.
As a result, 4-plex iTRAQ-labeled peptides from three pooled control and three pooled ADPKD samples, each represented by two technical replicates, were analyzed in the IEF-LC-MS-MS/MS analysis of three IEF strips, as described in the Methods section and Figure 1. For each IEF strip, two control pooled samples (for instance CPS IA and CPS IIA) and two disease pooled samples (for instance DPS IA and DPS IIA) were mixed and subjected to IEF separation. IEF strips were cut into ca. 26 sections. Labeled peptides were eluted from each IEF strip section and subjected to separate LC-MS-MS/MS runs. Qualitative analysis (peptide and protein identification) of the three IEF-LC-MS-MS/MS experiments identified 1327, 1353 and 1582 proteins, respectively, each represented by at least two peptides, as shown in Table 1 and Figure 2. One-peptide hits were not taken into account in further quantitative analysis.
Qualitative results (protein lists) from the three IEF-LC-MS-MS/MS experiments were combined, resulting in a dataset of 1700 proteins identified by at least two peptides. Within this dataset, protein identifications based on identical peptide sets were again grouped, and each group was treated as a single protein cluster in further processing. Quantitative analysis was performed, as described in the Methods section, for proteins represented by two or more peptides for which a protein ratio could be calculated in at least one of the IEF-LC-MS-MS/MS experiments. The final combined protein list accepted for quantitation contained 1413 proteins, 1090 of which are common to all replicates of the experiment.
Figure 1 The study design. Combining 12 pooled samples into 3 IEF strips analysed in three LC-MS/MS experiments.
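The grouping rule described here and in the Methods (identical peptide sets form one cluster, proteins explained by a strict subset of another protein's peptides are dropped, and at least two peptides are required) can be sketched in a few lines of Python. This is a minimal illustration under those stated rules, not the pipeline actually used; the function name `cluster_proteins` and the toy protein and peptide identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_proteins(protein_peptides, min_peptides=2):
    """Cluster proteins by identical peptide sets and apply simple parsimony.

    protein_peptides: dict mapping a protein ID to an iterable of peptide
    sequences assigned to it. Returns a list of
    (tuple_of_protein_ids, frozenset_of_peptides), sorted by protein IDs.
    """
    # Proteins explained by exactly the same peptides form one cluster.
    by_set = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        by_set[frozenset(peptides)].append(protein)

    clusters = []
    for pep_set, proteins in by_set.items():
        # Drop proteins whose peptides are a strict subset of another set.
        if any(pep_set < other for other in by_set):
            continue
        # Require the minimum number of distinct peptides (no one-peptide hits).
        if len(pep_set) < min_peptides:
            continue
        clusters.append((tuple(sorted(proteins)), pep_set))
    return sorted(clusters, key=lambda c: c[0])


# Toy example with hypothetical identifiers.
toy = {
    "P1": {"AAK", "BBR", "CCK"},
    "P2": {"AAK", "BBR", "CCK"},  # same peptides as P1 -> same cluster
    "P3": {"AAK", "BBR"},         # subset of P1/P2 -> excluded
    "P4": {"DDK"},                # one-peptide hit -> excluded
    "P5": {"EEK", "FFR"},
}
for proteins, peptides in cluster_proteins(toy):
    print(proteins, sorted(peptides))
```

In the toy data, P1 and P2 collapse into one cluster, P3 is subsumed by it, and P4 is discarded as a one-peptide hit, leaving two clusters for quantitation.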
The statistical analysis of the quantitative results of the three IEF-LC-MS-MS/MS experiments revealed 155 proteins that were differentially represented (q < 0.05) in the urine of ADPKD patients compared to healthy controls. 148 of them were identified in all three IEF-LC-MS-MS/MS experiments and 7 in two of them. The Differential Protein List (DPL) is presented in Table 2. The differences in protein levels (protein ratios) can be substantial, exceeding 5-fold in some cases. Among the differential proteins, 103 were downregulated and 52 were upregulated in ADPKD. Principal Component Analysis of the results of this experiment (Figure 3) shows a very good separation of the two study groups along the first component axis.
The DPL was obtained from a pooling experiment, and this approach allowed an in-depth (>1000 proteins) quantitative analysis of the urine proteome. However, pooling averages protein levels, so the information on the variability of protein amounts among individual samples is lost. Therefore, to test the results of the pooling experiment with an alternative analytical approach, multiple reaction monitoring (MRM), we analyzed individual ADPKD and control samples for a subset of proteins from the DPL. For this purpose a new set of samples (27 ADPKD vs. 25 healthy controls) was collected. Initially, a subset of 17 DPL proteins, those represented by the largest number of peptides, was selected for MRM analysis; the number of proteins in an MRM experiment is limited by the number of peptides that can be analysed in parallel in a single experiment. For these proteins, the natural-abundance peptides were first sought in control urine samples. Satisfactory results were obtained for 9 of the 17 proteins (represented by 14 peptides); sensitivity was insufficient for the remaining 8. Next, 14 stable isotope-labeled peptides (SIS internal standards) were synthesized and used to optimise the MS parameters of the MRM experiment for each peptide. Table 3 compares the MRM quantitation results with the iTRAQ pooling experiment results for these 9 proteins. For 8 proteins, the upregulation in ADPKD agreed with the results of the pooling experiment; however, for one of them (Cystatin-M) the q-value (0.13) exceeded the threshold of 0.05, making this result insignificant. For the remaining protein (Proactivator polypeptide), MRM results for the representative peptide EIVDSYLPVILDIIK indicate a lower level in ADPKD, whereas in the pooling experiment the level averaged over 19 peptides was higher in ADPKD.
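Table 3 reports MRM ratios, but this section does not spell out how they were calculated. Assuming each SIS peptide is spiked at a known amount into every individual sample (an assumption, as are the spike amount and the use of group means rather than medians), the per-sample amount follows the usual stable isotope dilution relation, and a group ratio can be formed as sketched below. Function names and peak areas are hypothetical. The ratio is expressed as ADPKD/control, the inverse of the CPS/DPS convention used for iTRAQ peptide ratios in the Methods.

```python
from statistics import mean


def endogenous_amount(light_area, heavy_area, sis_spiked_fmol):
    """Stable isotope dilution: amount of the natural (light) peptide equals
    the light/heavy peak-area ratio times the known spiked SIS amount."""
    if heavy_area <= 0:
        raise ValueError("SIS (heavy) peak area must be positive")
    return light_area / heavy_area * sis_spiked_fmol


def fold_change(disease_areas, control_areas, sis_spiked_fmol):
    """Disease/control ratio of mean per-sample amounts for one peptide.

    Each argument is a list of (light_area, heavy_area) pairs, one per
    individual (non-pooled) sample.
    """
    disease = [endogenous_amount(l, h, sis_spiked_fmol) for l, h in disease_areas]
    control = [endogenous_amount(l, h, sis_spiked_fmol) for l, h in control_areas]
    return mean(disease) / mean(control)


# Hypothetical peak areas for two ADPKD and two control samples,
# each spiked with 50 fmol of the SIS peptide.
adpkd = [(200.0, 100.0), (400.0, 100.0)]
controls = [(100.0, 100.0), (100.0, 100.0)]
print(fold_change(adpkd, controls, 50.0))  # 3.0
```

Because every sample carries its own internal standard, the per-sample values keep the spread between individuals that pooling averages away; a significance test on those values would be a separate step.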
This result is difficult to explain, since the same peptide EIVDSYLPVILDIIK shows an increased level in ADPKD in the iTRAQ pooling experiment; for this protein, therefore, MRM does not confirm the pooling results. However, for 8 out of 9 proteins the results of both approaches are in full qualitative agreement. Quantitatively, the agreement between the two methods is good for the majority of proteins; only for 2 proteins are the ratio differences larger (for retinol binding protein (RBP), the ratio is 4.6 by MRM and 2.65 by iTRAQ). It has to be taken into account that the two methods calculate ratios from different sets of peptides, usually a much larger set for iTRAQ. These peptides may represent different regions of the protein sequence, and some of them may originate from proteolytic protein fragments rather than intact proteins, which is quite probable in the urine proteome and may explain the observed quantitative differences. An alternative explanation in the case of RBP is the higher variability of this protein within the ADPKD group, as illustrated in Figure 4. The figure shows that the higher average RBP level in ADPKD originates from a subset of ADPKD samples (6 out of 27) in which the protein level is much higher (by up to a factor of 25) than in the remaining ADPKD samples, whose levels are similar to controls. Thus the average value in the pooling experiment might easily be shifted by a single sample with an exceptionally large RBP content. Interestingly, RBP levels correlate strongly with the progressor status of the patient, as indicated by asterisks in Figure 4. This effect, however, requires further study.
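A short calculation shows how one sample can dominate a pool. It assumes, as a simplification, that a protein's level in an equal-protein pool is the plain average of its levels in the constituent samples and that the matching control pool sits at 1.0. The values are hypothetical, chosen only to mirror the up-to-25-fold RBP excess described above.

```python
from statistics import median


def pooled_level(levels):
    """Relative level of a protein in an equal-protein pool.

    Every sample contributes the same amount of total protein, so the level
    in the pool is the arithmetic mean of the individual levels.
    """
    return sum(levels) / len(levels)


# Hypothetical pool of 10 samples: nine at a baseline level of 1.0 and one
# sample with a 25-fold higher level of the protein.
pool = [1.0] * 9 + [25.0]
print(pooled_level(pool))  # 3.4
print(median(pool))        # 1.0
```

A single outlier turns a typical (median) level of 1.0 into an apparent 3.4-fold increase for the whole pool, which is why only measurements on individual samples reveal whether a change is general or driven by a few patients.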

Discussion
The urine proteome is thought to contain renal disease fingerprints, but pathology-related urine proteomics is still in its infancy. For ADPKD, one study has been published [8], in which a low molecular weight proteome fraction was studied and a set of potential disease markers was proposed. However, the most successful approach, global proteomic analysis of the total proteome combining multiple separation steps before quantitative mass spectrometry, has not yet been applied to ADPKD urine samples. To fill this gap, we combined iTRAQ-based quantitation with peptide isoelectrofocusing and reversed phase separation coupled with MS to obtain in-depth urine proteome coverage in the quantitative analysis of an ADPKD vs. control sample set. Combined qualitative analysis of the peptide identifications from the three IEF-LC-MS-MS/MS experiments yielded a list of 14429 peptides assigned to proteins, corresponding to 1700 proteins, each identified by at least two peptides (Additional file 1). The median number of peptides per protein was 9.34. This list compares well with other attempts at qualitative characterization of the human urine proteome, in which the overall number of proteins depends strongly on the number of peptide/protein prefractionation steps used. 808 proteins were detected when LC preceding MS was the only separation step [9]. Adding a 1D SDS-PAGE separation step increased this number to 1102 [10] or 1543 [11] proteins represented by at least two peptides. A multidimensional separation strategy was shown to yield 2362 proteins [12], whereas another group reported only 991 proteins [13]. Pairwise comparison with our dataset yields 972 proteins in common with Adachi et al. [11] and 975 in common with the 1823 proteins (including one-peptide hits) found by Li et al. [13].
The number of proteins common to three publications [10,11,13] was compared in Figure 2 of Marimuthu et al. [10], yielding 658 common proteins, of which 582 were detected in our work. This number agrees well with the 587 proteins termed "core urinary proteins", commonly detected in a large set of urine samples [9]. In conclusion, our dataset represents the core urinary proteins very well; however, the number of unique proteins found in this work is also high, indicating that the complexity of the urine proteome is far from being explored in depth.
In the quantitative analysis, a list of proteins (DPL) differentiating ADPKD from healthy control samples has been established. The partial pooling experiment indicated 155 proteins with different levels in the urine of ADPKD patients compared to healthy subjects. We found alterations in the complement system, apolipoproteins, serine protease inhibitors, several growth factors, collagen chains, extracellular matrix components, transmembrane proteins and many others. Many of them have never been linked to ADPKD in previous studies. Additionally, our results confirm alterations observed in animal models, for example in apolipoproteins [14]. Some proteins included in the DPL have previously been linked to the progression of cystic kidney disease, for example the CD14 molecule [15].
In our study, pre-separation of peptides by IEF and analysis of 26 fractions of each IEF strip greatly increased the number of proteins that could be quantified. However, each IEF-LC-MS-MS/MS experiment required 26 LC-MS-MS/MS runs, corresponding to 78 hours of spectrometer time, so it could not be carried out separately for each of the 60 samples: the analysis would have required over 4500 hours, nearly 200 days of spectrometer time. This justified the pooling approach, which combined the information contained in all samples and allowed its in-depth analysis in a reasonable time. However, when protein ratios are compared after pooling, the information on the scatter of protein levels among the individual samples in a pool is lost, and the statistical validity of the observed differences cannot be properly assessed. For that reason we used the MRM technique for a subset of nine DPL proteins; it confirmed the results of the pooling experiment except for one protein, for which the confirmatory analysis was not successful. In general, the differential list obtained from a pooling experiment is thus a candidate list: each protein of interest from the list has to be measured in individual samples in a separate experiment by an independent method.
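The instrument-time argument is simple arithmetic. The sketch below derives 3 h per LC-MS-MS/MS run from the 78 h quoted for 26 runs and compares the pooled design with analysing every sample on its own IEF strip. It ignores calibration runs, blanks and instrument downtime, so real totals would be somewhat higher; the constant and function names are illustrative.

```python
FRACTIONS_PER_STRIP = 26   # IEF strip sections, one LC-MS-MS/MS run each
HOURS_PER_EXPERIMENT = 78  # spectrometer time for one 26-run experiment
HOURS_PER_RUN = HOURS_PER_EXPERIMENT / FRACTIONS_PER_STRIP  # 3.0 h


def instrument_time(n_strips):
    """Return (LC-MS runs, hours, days) needed to analyse n IEF strips."""
    runs = n_strips * FRACTIONS_PER_STRIP
    hours = runs * HOURS_PER_RUN
    return runs, hours, hours / 24


# Pooled 4-plex design: 12 pooled samples on 3 IEF strips.
print(instrument_time(3))   # (78, 234.0, 9.75)
# Each of the 60 individual samples on its own strip.
print(instrument_time(60))  # (1560, 4680.0, 195.0)
```

The 60-sample case gives 1560 runs and about 195 days, consistent with the "more than 1500" runs and "nearly 200 days" cited in the text, against under ten days for the pooled design.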
Only a few proteomic analyses of ADPKD tissue samples can be found in the literature. Mason et al. reported the proteomic analysis of four samples of cyst fluid obtained postoperatively from excised kidneys of patients with ESRD due to ADPKD [7]. The authors identified 44 proteins that were found in at least two cysts and might be of mechanistic or diagnostic interest in ADPKD. Similarly to our results, the list of these proteins included complement factors, apolipoprotein A-I, pigment epithelium-derived factor (PEDF) and others. However, the potential diagnostic utility of cyst fluid proteomics is highly limited, because cyst fluid can only be obtained invasively; in our opinion, it is urine that may become the diagnostic material in clinical practice.
Kistler et al. were the first to attempt to identify the urinary biomarker profile of ADPKD [8]. Because they used CE-MS technology, the range of molecular masses studied was limited to less than 15 kDa, whereas in our work proteins with masses larger than 10 kDa were studied. This explains the differences between the lists of differentiating proteins, which in the case of Kistler et al. were limited mainly to collagen fragments and uromodulin peptides. Therefore, our DPL may be regarded as a complementary list of ADPKD-specific urinary proteins, independent of kidney function.
Our results provide the first step of the analysis; specific DPL proteins of interest should now be verified by targeted analysis of non-pooled samples in much larger sample sets. Moreover, the specificity of these results should be determined in studies including patients with chronic kidney disease of other origins. Additionally, it should be determined whether the type of mutation (PKD1 or PKD2) affects the proteome. Finally, methods of sample collection and preparation, laboratory procedures and data analysis must be optimized. After verification, our results may serve as a basis for mechanistic studies and may therefore ultimately lead to the discovery of new therapeutic targets in ADPKD. Additionally, the set of urinary biomarkers may be used in the future for early diagnosis of ADPKD.

Conclusions
The urine proteome of ADPKD patients differs significantly from that of healthy subjects and may become a clinical tool for early diagnosis of ADPKD. The pathophysiological information obtained in the present study may become a basis for the development of new therapies.

Methods

Urine samples
Thirty ADPKD patients diagnosed by abdominal ultrasound [16] were enrolled in the study group. The control group consisted of 30 healthy volunteers matched for sex and age. The demographic data of both groups are summarized in Table 4. The inclusion criteria for the study group were a diagnosis of ADPKD and age ≥18 years. The inclusion criteria for the control group were absence of ADPKD, age ≥18 years, and body mass index (BMI) between 21 and 26. The exclusion criteria for both groups included, in particular, current urinary tract infection, macroscopic hematuria, diabetes mellitus, malignancy of the urinary tract or generalized malignancy of another system, and a history of organ transplantation.
The study protocol was approved by the local ethics committee. Informed consent was obtained from all participants. The study was performed in accordance with the Declaration of Helsinki Principles.

Urine collection
Samples were collected from 30 patients and 30 healthy donors using a uniform protocol. The second or third morning mid-stream urine was collected from all participants between 1 and 3 hours after the previous micturition. Sterile urine containers were used for sample collection. The pH of the samples was stabilized at 7.2 by adding 1/10th volume of 1 M HEPES pH 7.2 immediately after collection. Further sample preparation steps were carried out within 1 hour after collection, during which the sample was kept at room temperature. Samples were vortexed for 2 minutes, centrifuged (3000×g, room temperature) for 10 minutes to clear debris, filtered through a 0.4 μm filter (Rotilabo-Spritzenfilter, P819.1, Roth) and divided into 1 ml aliquots to avoid freeze/thaw cycles in repeated experiments on the same sample. Sample aliquots were stored at −80°C until use. The protocol follows the recommendations for urine proteomic sample collection [17].
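Reading "1/10th vol." as one tenth of the urine volume added on top of the sample (an interpretation; a 1:10 final dilution would instead give exactly 100 mM), the buffer ends up at roughly 91 mM HEPES regardless of the sample volume, as this sketch shows. The 10 ml urine volume and the function name are hypothetical.

```python
def final_concentration(stock_molar, added_volume, sample_volume):
    """Molar concentration after mixing a stock into a sample (same volume units)."""
    return stock_molar * added_volume / (sample_volume + added_volume)


urine_ml = 10.0            # hypothetical urine volume
hepes_ml = urine_ml / 10   # 1/10th volume of 1 M HEPES
c_molar = final_concentration(1.0, hepes_ml, urine_ml)
print(f"{c_molar * 1000:.1f} mM HEPES")  # 90.9 mM HEPES
```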

Sample filtration
10 kDa cutoff membrane filters (Amicon Ultra-0.5, UFC501096, Millipore) were washed twice with MilliQ water prior to use. Urine was centrifuged through the membrane at 14000×g for 15 minutes. Next, 500 μl MilliQ water was added to the retentate and the centrifugation step was repeated. To recover the concentrated and desalted sample, the filter was placed upside down in a clean microcentrifuge tube and centrifuged for 2 minutes at 1000×g. Protein concentration was measured by the Bradford method. Sample aliquots were stored at −80°C.
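Bradford readings are converted to protein concentrations through a standard curve. A minimal sketch, assuming a linear least-squares fit to BSA standards within the assay's linear range (the Bradford response flattens at higher concentrations, so a fit is only valid inside the calibrated range); the standards, absorbances and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_concentration(a595, slope, intercept, dilution=1.0):
    """Invert the standard curve; `dilution` rescales to the undiluted sample."""
    return (a595 - intercept) / slope * dilution


# Hypothetical BSA standards (mg/ml) and blank-corrected A595 readings.
bsa = [0.0, 0.25, 0.5, 0.75, 1.0]
a595 = [0.00, 0.10, 0.20, 0.30, 0.40]
slope, intercept = fit_line(bsa, a595)
print(round(protein_concentration(0.30, slope, intercept), 3))  # 0.75
```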

Sample pooling and study design for iTRAQ-labelled samples
When indicated, aliquots corresponding to 10 μg of protein from each of 10 urine samples were pooled. Only samples from a single study group (disease or control) were pooled together. The 30 control (healthy) samples were divided into three control pooled samples (CPS I, II and III) and, similarly, the 30 ADPKD samples were divided into three disease pooled samples (DPS I, II and III). Age and sex matching was preserved within the three pairs of pooled sample groups. The three CPSs and three DPSs were each prepared in two technical replicates (marked A and B), making a set of 12 pooled samples to be compared after iTRAQ labeling. As 4-plex iTRAQ was used, 2 CPS and 2 DPS samples were compared in one LC-MS/MS experiment, so the 12 samples were analysed in a set of 3 independent LC-MS/MS experiments. The study design is illustrated in Figure 1.
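Equal-protein pooling means that each retentate contributes a different volume. The sketch converts Bradford concentrations into the volume carrying 10 μg of protein and checks the protein budget: 10 samples per pool give 100 μg, and four pools per 4-plex experiment give 400 μg, which matches the load per IPG strip stated in the iTRAQ labeling subsection. It assumes no losses during digestion and labeling; sample IDs and concentrations are hypothetical.

```python
PROTEIN_PER_SAMPLE_UG = 10.0  # protein taken from each sample (from the text)
SAMPLES_PER_POOL = 10
POOLS_PER_STRIP = 4           # 2 CPS + 2 DPS per 4-plex iTRAQ experiment


def pooling_volumes(conc_ug_per_ul):
    """Volume (ul) of each retentate that contains 10 ug of protein."""
    return {sample: PROTEIN_PER_SAMPLE_UG / c for sample, c in conc_ug_per_ul.items()}


# Hypothetical Bradford results (ug/ul) for three retentates.
print(pooling_volumes({"S01": 2.0, "S02": 0.5, "S03": 1.25}))
# {'S01': 5.0, 'S02': 20.0, 'S03': 8.0}

protein_per_pool = PROTEIN_PER_SAMPLE_UG * SAMPLES_PER_POOL  # 100 ug
protein_per_strip = protein_per_pool * POOLS_PER_STRIP       # 400 ug
print(protein_per_pool, protein_per_strip)  # 100.0 400.0
```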

iTRAQ labeling
Before labeling, protein aliquots were evaporated to dryness in a SpeedVac, dissolved in 20 μl Dissolution Buffer with 0.1% SDS, reduced with TCEP, cysteine-blocked with MMTS (reagents were provided with the iTRAQ kit from Applied Biosystems), and digested overnight with trypsin (Promega). The CPS and DPS samples were differentially labeled with one of the four iTRAQ tags (114 and 115 for CPS samples, 116 and 117 for DPS samples) for 1 h according to the manufacturer's protocol. The reaction was then quenched by adding 100 μl H2O.
For each of the three LC-MS/MS experiments, 2 CPS and 2 DPS iTRAQ-labeled samples were combined and 340 μl of buffer was added [8 M urea, 0.2% IPG buffer pH 3-11 NL (GE Healthcare), 0.002% bromophenol blue in 50 mM Tris-HCl, pH 8.0]. The solution was applied to an 18 cm IPG strip with a 3-11 NL pH gradient (GE Healthcare) for isoelectrofocusing (IEF): 340 μl of sample per strip, corresponding to 400 μg protein. The IPG strip was rehydrated overnight in an IPG box (GE Healthcare). The next day, the strips were isoelectrofocused using an Ettan IPGphor 3 electrophoresis system (GE Healthcare) in two steps. The first step was a 5 h pre-run at 500 V, during which the conductivity decreases as salts and other highly conductive compounds migrate towards the electrodes. The second step was a long gradient focusing program: 1 h at 500 V, 9 h at 1000 V and 30 h at 8000 V (the final current was 5 μA).
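Writing the focusing program down as data makes its total length and dose explicit. The sketch below encodes the four steps listed above and computes the run time and the nominal volt-hours. The nominal value is an upper bound that treats each step as a constant-voltage hold; whether the steps were run as ramps is not stated, so the volt-hours actually delivered may be lower. The names `IEF_PROGRAM` and `program_summary` are illustrative.

```python
# (step, duration in h, set voltage in V) as listed in the text.
IEF_PROGRAM = [
    ("pre-run", 5, 500),
    ("focusing 1", 1, 500),
    ("focusing 2", 9, 1000),
    ("focusing 3", 30, 8000),
]


def program_summary(program):
    """Total duration (h) and nominal volt-hours (V*h) of an IEF program.

    Nominal Vh assume every step holds its set voltage for its full duration;
    ramped steps or current limiting deliver fewer volt-hours in practice.
    """
    hours = sum(duration for _, duration, _ in program)
    volt_hours = sum(duration * volts for _, duration, volts in program)
    return hours, volt_hours


print(program_summary(IEF_PROGRAM))  # (45, 252000)
```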
After focusing, the strip was removed from the tray and the overlay oil was blotted with a paper tissue. The strip was wrapped in Parafilm and stored at −80°C. It was then placed on a tray cooled with dry ice and cut into sections of ca. 7 mm. The sections were transferred into individual 1.5 ml siliconized Eppendorf tubes. In all,

Mass spectrometry: qualitative MS/MS data processing
The acquired MS/MS data were pre-processed with Mascot Distiller (version 2.3.2.0, Matrix Science, London, UK). The database search with the MASCOT search engine was carried out in a three-step procedure (described elsewhere [18] and briefly in Additional file 2) to calculate MS and MS/MS measurement errors and to recalibrate the data for the repeated MASCOT search. The initial search parameters were: enzyme, semi-trypsin; fixed modifications, cysteine modification by MMTS and iTRAQ labeling of peptide N-termini and lysine side chains; variable modification, oxidation (M); maximum missed cleavages, 1; database, Swiss-Prot with taxonomy restricted to Homo sapiens (20273 sequences). For the repeated search, the recalibrated data from all gel sections were merged into one input file and searched with MASCOT against the Swiss-Prot database supplemented with a decoy database, to obtain a statistical assessment of each peptide identification by a joint target/decoy database search strategy [19]. This procedure provided q-value estimates for each peptide spectrum match (PSM) in the dataset. All PSMs with q-values > 0.01 were removed from further analysis. A protein was regarded as confidently identified if at least two of its peptides were found. Proteins identified by a subset of the peptides of another protein were excluded from analysis. Proteins that matched exactly the same set of peptides were clustered into one group. MS/MS spectra of peptides meeting the above acceptance criteria were subjected to the quantitative analysis step to obtain a list (Differential Protein List) of proteins differentially represented between the three CPSs and the three DPSs.
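The PSM filter at q ≤ 0.01 relies on target/decoy q-values. A generic version of that estimate is sketched below, assuming a combined target/decoy search in which each PSM carries a score and a decoy flag, and using the decoy/target count ratio as the FDR estimate. Mascot's implementation and the procedure in [19] may differ in detail (other FDR estimators, handling of tied scores); the function name and scores are hypothetical.

```python
def target_decoy_qvalues(psms):
    """Estimate PSM q-values with a simple target/decoy approach.

    psms: list of (score, is_decoy) tuples; higher scores are better.
    Returns a list aligned with the input: a q-value for each target PSM and
    None for decoy PSMs. Tied scores are not treated specially.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:  # walk from the best score downwards
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)  # estimated FDR at this threshold

    # q-value: lowest FDR of any threshold that still accepts the PSM.
    qvals = [0.0] * len(psms)
    best = float("inf")
    for i in reversed(order):
        best = min(best, fdr[i])
        qvals[i] = best
    return [None if is_decoy else q for (_, is_decoy), q in zip(psms, qvals)]


# Hypothetical search scores; True marks a decoy hit.
psms = [(90, False), (80, False), (70, True), (60, False), (50, False)]
q = target_decoy_qvalues(psms)
print(q)  # [0.0, 0.0, None, 0.25, 0.25]
kept = [p for p, qv in zip(psms, q) if qv is not None and qv <= 0.01]
print(kept)  # [(90, False), (80, False)]
```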

iTRAQ quantitative analysis
For protein quantitation, only unique peptides (i.e. peptides belonging to only one protein/cluster) were included. In the first step, iTRAQ reporter ion peaks were detected in the preprocessed MS/MS spectra using Mascot Distiller; their intensities were then corrected for isotope impurity using the information provided by the reagent manufacturer. For each spectrum, the geometric mean of the two reporter ion intensities belonging to each study group (CPS or DPS) was calculated separately. The ratio of these means (CPS mean divided by DPS mean) was reported as the peptide ratio. If more than one spectrum was obtained for a peptide in a single LC-MS/MS experiment, the median peptide ratio over all spectra was used. Prior to protein ratio calculation, peptide ratios were median-normalized to remove systematic bias. Protein ratios were calculated as the median of their peptide ratios. The statistical significance of each protein ratio was assessed with the in-house program Diffprot [20]. In this program, the statistical validity of the regulation status of a protein, represented by its calculated protein ratio, is based solely on statistical analysis of the set of all MS/MS data from a given experiment, without assumptions about the distribution of peptide ratios in the dataset (e.g. its normality). In brief, the probability of obtaining a given protein ratio by random selection from the dataset is tested by repeated protein ratio calculation for a large number of permuted decoy datasets in which the peptide-protein assignment has been scrambled. The resulting p-values were adjusted for multiple testing using an FDR-controlling procedure, yielding the protein ratio q-values reported in Table 2.
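The quantitation chain above can be sketched as follows. This is not Diffprot [20]; it is a minimal illustration of the same ideas under stated assumptions: geometric means of the 114/115 and 116/117 reporters, medians over spectra and peptides, median normalisation, a permutation test that draws random peptide sets (scrambled peptide-protein assignment), and Benjamini-Hochberg as one possible FDR-controlling procedure, since the text does not name the one used. Reporter intensities are assumed to be already isotope-corrected; function names, the number of permutations and the toy data are hypothetical.

```python
import math
import random
from statistics import median

CPS_TAGS, DPS_TAGS = (114, 115), (116, 117)  # tag assignment from the text


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def spectrum_ratio(reporters):
    """CPS/DPS ratio for one spectrum (dict: tag -> corrected intensity)."""
    cps = geometric_mean([reporters[t] for t in CPS_TAGS])
    dps = geometric_mean([reporters[t] for t in DPS_TAGS])
    return cps / dps


def peptide_ratios(spectra_by_peptide):
    """Median ratio over each peptide's spectra, then median normalisation."""
    raw = {pep: median(spectrum_ratio(s) for s in spectra)
           for pep, spectra in spectra_by_peptide.items()}
    centre = median(raw.values())
    return {pep: r / centre for pep, r in raw.items()}


def protein_ratio(peptides, ratios):
    return median(ratios[p] for p in peptides)


def permutation_pvalues(protein_peptides, ratios, n_perm=2000, seed=0):
    """Empirical p-value per protein: how often a random draw of the same
    number of peptide ratios has a median at least as far from 1."""
    rng = random.Random(seed)
    pool = list(ratios.values())
    pvals = {}
    for prot, peps in protein_peptides.items():
        observed = abs(math.log(protein_ratio(peps, ratios)))
        hits = sum(abs(math.log(median(rng.sample(pool, len(peps))))) >= observed
                   for _ in range(n_perm))
        pvals[prot] = (hits + 1) / (n_perm + 1)
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted q-values (step-up, monotone)."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m, running, qvals = len(ranked), 1.0, {}
    for rank in range(m, 0, -1):
        prot, p = ranked[rank - 1]
        running = min(running, p * m / rank)
        qvals[prot] = running
    return qvals


# Toy data: protein X has three peptides at ratio 8, protein Y three at
# ratio 1, plus 17 background peptides at ratio 1 (all hypothetical).
ratios = {f"x{i}": 8.0 for i in range(3)}
ratios.update({f"y{i}": 1.0 for i in range(3)})
ratios.update({f"bg{i}": 1.0 for i in range(17)})
proteins = {"X": ["x0", "x1", "x2"], "Y": ["y0", "y1", "y2"]}
p = permutation_pvalues(proteins, ratios)
print(p, benjamini_hochberg(p))
```

In the toy data, protein X receives a small p-value because random three-peptide draws rarely reach a median of 8, whereas protein Y, whose ratio equals the dataset centre, gets p = 1. No normality assumption is needed, matching the distribution-free intent described above.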
Quantitative analysis of selected proteins using multiple reaction monitoring
We selected a subset of proteins from the Differential Protein List (Table 2) for further analysis of non-pooled, individual samples using the multiple reaction monitoring (MRM) technique in conjunction with stable isotope-labeled peptide standards (SIS). The presence of natural MRM transitions for peptides from 17 proteins was first checked in urine samples collected separately from healthy volunteers. Only for nine proteins did the natural transitions of the selected peptides yield satisfactory results, and SIS peptides were generated for these. The transitions for peptides corresponding to the remaining eight