C-terminomics Screen for Natural Substrates of Cytosolic Carboxypeptidase 1 Reveals Processing of Acidic Protein C termini*

Cytosolic carboxypeptidases (CCPs) constitute a new subfamily of M14 metallocarboxypeptidases associated with axonal regeneration and neuronal degeneration, among other processes. CCPs are deglutamylating enzymes, able to catalyze the shortening of polyglutamate side-chains and of the gene-encoded C termini of tubulin, telokin, and myosin light chain kinase. The functions of these enzymes are not entirely understood, in part because of the lack of information about C-terminal protein processing in the cell and its functional implications. By means of C-terminal COFRADIC, a positional proteomics approach, we searched for cellular substrates of CCP1, the most relevant member of this family. We here identified seven new putative CCP1 protein substrates, including ribosomal proteins, translation factors, and high mobility group proteins. Furthermore, we showed for the first time that CCP1 processes C-terminal aspartates as well as glutamates. The implication of these C termini in molecular interactions furthermore suggests that CCP1-mediated shortening of acidic protein tails might regulate protein–protein and protein–DNA interactions.

associated with neurodegeneration (1-4). Human and mouse genomes encode six CCP members (CCP1-6), all related to Nna1 (nervous system nuclear protein induced by axotomy), the first reported and best characterized member of the CCP subfamily. Nna1, afterward renamed CCP1, was identified in a screen for genes up-regulated during axonal regeneration and found to be homologous to members of the MCP family (5). This finding suggested a role of CCPs in axonal regeneration, which was further substantiated by studies in C. elegans showing that CCP6 is required for axon regrowth after axotomy (6). Loss-of-function mutations in the Ccp1/Nna1 gene lead to an ataxic phenotype in Purkinje cell degeneration (pcd) mice (7), characterized by a late-onset degeneration of cerebellar Purkinje cells (3-5 weeks of age) (8). In addition to displaying reproductive abnormalities, these mice also exhibit a selective degeneration of other cell types in the nervous system, namely thalamic neurons, retinal photoreceptors, and olfactory bulb mitral cells (9-11). These findings put CCPs forward as an unexpected molecular link between neuronal degeneration and regeneration. Although the molecular pathways through which the loss of CCP1 function leads to neuronal degeneration in pcd mice are not completely understood, different mechanisms have been proposed, such as endoplasmic reticulum stress (12), dysfunctional mitochondria (13), dysregulation of microtubule stability by abnormal levels of microtubule-associated proteins (MAPs) (14), increased levels of tubulin polyglutamylation (15), and a progressive transcriptional silencing caused by the accumulation of DNA lesions (16-18).
More recently, a primary function for CCPs in tubulin processing was shown. Four members of the CCP subfamily, CCPs 1, 4, 5, and 6, specifically remove the gene-encoded C-terminal glutamate residue of detyrosinated tubulin (15). Moreover, CCPs 1, 4, and 6 shorten post-translationally added polyglutamate side-chains on tubulin (15). CCP5, on the other hand, preferentially removes the branching point of a polyglutamylation event (15,19,20). It was suggested that abnormally high polyglutamylation levels, caused by the lack of functional CCP1, are responsible for the neuronal degeneration in pcd mice (15). A recent study further supports the link between CCPs and microtubules, as the taxonomic distribution of CCPs suggests that their primary function is associated with cilia and basal bodies (2). Likewise, the C. elegans CCP1 ortholog, CCPP-1, was described to regulate the structural integrity of microtubules in sensory cilia and transport along them (21). CCP1 was also found to C-terminally process telokin and myosin light chain kinase 1 (MLCK1) (15), regulators of myosin function. As such, it was postulated that CCP1 might cleave additional substrates, and that identifying these might help to elucidate its function and to understand some of the anomalies observed in pcd mice. For instance, a nuclear localization for CCP1 was described, but its possible function in nuclear processes remains unclear (5). In this respect, Baltanas et al. observed extensive chromatin reorganization in pcd mice, leading to the accumulation of unrepaired DNA and ultimately to Purkinje cell death (16).
Here, we performed a proteome-wide screen for CCP1 substrates in HEK293T cells using C-terminal COFRADIC, a recently developed C-terminomics approach (22). Overall, we identified seven new putative CCP1 substrates, all of them harboring acidic amino acids in their gene-encoded C terminus. Validation of two of these substrates confirmed the processing events identified by C-terminomics and showed that these result from the direct action of CCP1 on these proteins. Analysis of the role of these acidic tails in proteins such as the high mobility group B proteins (HMGBs) led us to speculate that CCP1, by shortening these tails, might regulate protein-protein and protein-DNA interactions.
Molecular Cloning and Transfection-The full-length cDNA sequence of human CCP1 (Uniprot Q9UPW5-1) was cloned into the pOPINFS vector (23). Both a Strep-tag II (24) and a hemagglutinin (HA) epitope were introduced at the C terminus of CCP1. The CCP1 E270Q mutant (using the numbering system of bovine CPA; E1102Q in human CCP1) was generated by QuikChange site-directed mutagenesis according to the manufacturer's protocol (Stratagene, La Jolla, CA). The full-length cDNA sequences of TRAF-type zinc finger domain-containing protein (TRAD1, Uniprot O14545) and human high mobility group protein B3 (HMGB3, Uniprot O15347) were cloned into the pTriEx-6 vector (EMD Millipore, Billerica, MA); as a result, both a Strep-tag and an HA-tag were introduced at their N termini. Full-length coding sequences of human high mobility group protein B1 (HMGB1, Uniprot P09429) and B2 (HMGB2, Uniprot P26583) were subcloned into the MBU-I-8856 vector (kindly provided by Prof. Sven Eyckerman, VIB-Ghent University, Belgium); this SRα promoter-driven eukaryotic expression vector introduces a Myc-tag at the N termini of these proteins. Supplemental Fig. S1 illustrates the constructs generated for this study.
DNA transfections were carried out using linear 25-kDa polyethylenimine (PEI) (PolySciences, Warrington, PA). Briefly, DNA was mixed with PEI in a ratio of 1:3 and incubated in the presence of serum-free medium for 15 min at room temperature. At 40-50% confluency, HEK293T cells were exposed to the transfection complex at 1.4 μg DNA per ml of culture for 60 h. HEK293F cells, at 10⁶ cells per ml, were exposed to the transfection complex at 1 μg DNA per ml of culture for 60 h.
Expression and Purification of Recombinant Proteins-CCP1, HMGB3, and TRAD1 were expressed in mammalian cells and affinity purified with a Strep-Tactin affinity column (IBA GmbH, Göttingen, Germany). Briefly, HEK293F cells were transfected as described above with the constructs encoding the Strep-tagged version of these proteins. After 60 h, cells were washed and resuspended at 2 × 10⁷ cells per ml of binding buffer (100 mM Tris-HCl, pH 8.0, 150 mM NaCl, supplemented with EDTA-free protease inhibitor mixture Set III (EMD Millipore)). Cells were then subjected to three rounds of freeze-thaw lysis and cell lysates were cleared by centrifugation at 16,000 × g for 5 min at 4°C. The cleared lysate was loaded on a Strep-Tactin column equilibrated with binding buffer. Following a washing step with seven column volumes of binding buffer, Strep-tagged proteins were eluted and fractionated using elution buffer (binding buffer supplemented with 2.5 mM desthiobiotin). The purest eluted fractions, as analyzed by SDS-PAGE, were pooled.
In Vitro Substrate Validation Assay-Affinity purified CCP1 was incubated at 5 ng/μl for 2 h or 16 h at 37°C with affinity purified TRAD1 or HMGB3 at 25 ng/μl in CCP1 buffer (80 mM PIPES, pH 6.8, 1 mM MgCl₂, complemented with EDTA-free protease inhibitor mixture Set III (EMD Millipore)). In the control experiment, 10 mM o-phenanthroline, a metalloprotease inhibitor, was added to the reaction (15,25).
Protein analysis of intact and processed HMGB3 was performed by nano LC-MS on a Waters nano-Acquity HPLC (Waters Corporation, Milford, MA) in-line coupled to an ESI-quadrupole (Q)-time-of-flight (TOF) (Waters Corporation) mass spectrometer. Briefly, 10 μl of HMGB3 solution at 10 ng of protein/μl was first loaded on a trapping column (made in-house, 100 μm internal diameter (I.D.) × 20 mm length, 5 μm Reprosil-Pur Basic-C4-HD beads, Dr. Maisch, Ammerbuch-Entringen, Germany) at a flow rate of 10 μl/min with 0.1% TFA in 2% acetonitrile. After 4 min, the trapping column was placed in-line with the analytical column (made in-house, 75 μm I.D. × 170 mm length, 3 μm Reprosil-Pur Basic-C4-HD beads, Dr. Maisch). The proteins were eluted with a linear gradient of 1.7% acetonitrile increase per min at a flow rate of 300 nl/min. MS spectra were calibrated using the continuous lock mass correction with Leu-enkephalin, obtaining an accuracy of 500 ppm. The MaxEnt 1 deconvolution algorithm of the MassLynx software (Waters Corporation) was used to transform the multiply charged ions into a MaxEnt spectrum on a real-mass scale.
Peptide mass fingerprinting of purified recombinant HMGB3 was performed by in-gel tryptic digestion followed by MALDI-TOF-MS analysis using an UltrafleXtreme MALDI-TOF mass spectrometer (Bruker Daltonics, Bremen, Germany) as previously described (26). The Flexanalysis software (version 3.3, Bruker Daltonics) was used to create peak lists and the Biotools software package (version 3.2, Bruker Daltonics) was used for interpretation of the MS spectra (Matrix Science, London, UK) and for the identification of tryptic maps. Sequence Editor 3.2 (Biotools 3.2, Bruker Daltonics) was used to compare the experimental tryptic peptide map with the peptide map of an in silico trypsin digest, in search of unmatched peaks hinting at post-translational modifications (PTMs). The following criteria were used for the in silico enzymatic digestion: up to three missed tryptic cleavages and only singly charged peptides were allowed; 150 ppm was set as precursor mass tolerance; and acetylation of the protein N terminus, oxidation of methionines, and carbamidomethylation of cysteine residues were set as variable modifications.
Immunoblot-For immunoblotting, transfected cells were resuspended in phosphate-buffered saline, supplemented with 0.1% or 1% Nonidet P-40 and Complete Protease Inhibitor Mixture (Roche Diagnostics GmbH, Mannheim, Germany). The suspension was subjected to three rounds of freeze-thaw lysis and cell lysates were cleared by centrifugation at 16,000 × g for 7 min at 4°C. Cleared extracts were mixed with 4× SDS-PAGE loading buffer and boiled for 10 min at 95°C. The pellet after centrifugation was boiled in 1× SDS-PAGE loading buffer to generate the insoluble fraction samples. For the whole cell lysate samples, 4× SDS-PAGE loading buffer was added to the cell lysate without previous centrifugation. Equal amounts of each lysate or fraction were analyzed by SDS-PAGE and the separated proteins were electroblotted onto PVDF membranes (EMD Millipore), which were subsequently blocked with 5% skimmed milk for 2 h. The PVDF membranes were then incubated with primary antibody for 16 h at 4°C. The following primary antibodies were used: anti-Δ2-tubulin [...]. For immunoblots visualized using an Odyssey infrared imager (LI-COR Biosciences, Lincoln, NE), a similar protocol was performed, except that membranes were blocked with Odyssey blocking buffer (LI-COR Biosciences) and the secondary antibodies used were goat anti-mouse-IRDye 680 and goat anti-rabbit-IRDye 800CW (LI-COR Biosciences).
Isolation of N-terminal and C-terminal Peptides-HEK293T cells were washed in PBS and resuspended at 2 × 10⁷ cells per ml in 50 mM sodium phosphate buffer, pH 8.0, 100 mM NaCl, and 0.5 mM EDTA supplemented with Complete Protease Inhibitor Mixture (Roche Diagnostics GmbH). Cells were subjected to three rounds of freeze-thaw lysis and cleared by centrifugation at 16,000 × g for 5 min at 4°C. Protein concentrations were determined with Bio-Rad's Protein Assay according to the manufacturer's instructions, and 1 ml of cleared cell lysate containing 2.5 mg of proteins was processed as follows. Guanidinium hydrochloride was added to the cell lysates to a final concentration of 4 M to denature proteins. Primary amines were acetylated for 2 h at 30°C using sulfo-N-hydroxysuccinimide D₃-acetate at a final concentration of 10 mM. O-acetylation was reversed by adding hydroxylamine to a final concentration of 40 mM and incubating for 10 min at 30°C. The acetylation reagent was further quenched by the addition of 20 mM glycine for 10 min at 30°C. Excess reagents were removed by buffer exchange to 1.5 ml of 10 mM NH₄HCO₃, pH 7.9, using 1 ml Illustra NAP™-10 columns (GE Healthcare). After heating the proteins for 5 min at 95°C followed by placing them on ice for 5 min, N-acetylated proteins were digested overnight at 37°C with sequencing-grade, modified trypsin (Promega, Madison, WI), at an enzyme/substrate ratio of 1/100 (w/w). Subsequent steps of the C-terminal COFRADIC protocol were performed as described previously (22). As a result of the RP-HPLC separation, fractionation, and pooling, a total of 96 samples were generated. Samples were dried and redissolved in 20 μl of 2 mM TCEP and 2% acetonitrile for subsequent LC-MS/MS analysis.
LC-MS/MS Analysis-LC-MS/MS analysis of 2 μl of the sample mixtures was performed using an Ultimate 3000 RSLC nano LC system (Dionex, Amsterdam, The Netherlands) in-line connected to an LTQ Orbitrap Velos (Thermo Fisher, Bremen, Germany). LC-MS/MS analysis and generation of MS/MS peak lists were performed as previously described (27). Mascot generic files (mgf) were created using the Mascot Distiller software (version 2.2.1.0, Matrix Science). The generated MS/MS peak lists were then searched with Mascot using the Mascot Daemon interface (version 2.3, Matrix Science). These searches were performed in the Swiss-Prot database with taxonomy set to human (2011_05 UniProtKB/Swiss-Prot database containing 20,286 human entries). The following search parameters were set: spectra were searched with semi-ArgC/P enzyme settings, allowing no missed cleavages. Mass tolerance on the precursor ion was set to 10 ppm and on fragment ions to 0.5 Da. In addition, Mascot's C13 setting was set to 1. The peptide charge was set to 1+, 2+, or 3+, and instrument setting was put on ESI-TRAP. For the identification of butyrylated peptides, variable modifications were set to pyroglutamate formation of N-terminal glutamine and fixed modifications included D₃-acetylation at lysines, methionine oxidation to methionine-sulfoxide, and butyrylation (¹²C₄ or ¹³C₄) of peptide N termini. Additionally, we also searched for peptides containing D₃-acetylation at lysines and methionine oxidation as fixed modifications, and pyroglutamate formation of N-terminal glutamine and acetylation and D₃-acetylation of N termini as variable modifications. Only peptides that were ranked one and scored above the threshold score, set at 95% confidence, were retained. This confidence level was chosen because we have generally found lower Mascot scores for butyrylated C-terminal peptides (22).
The false discovery rate, estimated by searching decoy databases (a reversed version of the 2011_05 human UniProtKB/Swiss-Prot database made by the DBToolkit algorithm (28)), was found to be 3% at the spectrum level (29). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://www.proteomexchange.org) via the PRIDE partner repository (30,31) with the data set identifier PXD000834 and DOI 10.6019/PXD000834. Peptide quantification (¹²C₄- versus ¹³C₄-butyrylated) was carried out using the Mascot Distiller Quantitation toolbox (version 2.2.1). The quantification method details were as follows: constrain search, yes; protein ratio type, average; report detail, yes; minimum peptides, 1; protocol, precursor; allow mass time match, yes; allow elution shift, no; all charge states, yes; and fixed modifications, mass values. Ratios for the proteins were calculated by comparing the extracted ion chromatogram peak areas of all matched light peptides with those of the heavy peptides. To identify significantly altered C termini, robust statistics (32) was applied to the base-2 logarithm values of the calculated ratios of all the identified (¹²C₄-butyrylated/¹³C₄-butyrylated) peptide doublets being set as TRUE. Then, the R software package was used for statistical computing to calculate the probability distributions of log2-transformed peptide ratios. We considered as neo-C-terminal peptides those processed C termini (i.e. peptides lacking up to 20 amino acids from the database-annotated C terminus) that were present at much higher levels (according to the statistical analysis and using a stringent p value threshold of 0.0001) in the proteome of HEK293T cells overexpressing CCP1. Ratios being set as FALSE were all verified by individual inspection.
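The outlier test on log2-transformed ratios can be illustrated with a minimal Python sketch. It is not the authors' R workflow: it assumes that "robust statistics" means centering on the median and scaling by the median absolute deviation (MAD), which may differ from the estimator in reference 32, and the ratios and function names are hypothetical.

```python
import math
from statistics import median


def robust_z_scores(values):
    """Center on the median and scale by the MAD (assumed estimator)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a normal standard deviation
    if scale == 0:
        raise ValueError("MAD is zero; cannot scale the ratios")
    return [(v - center) / scale for v in values]


def lower_tail_p(z):
    """One-sided p value P(Z <= z) for a standard normal deviate."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def enriched_in_heavy(light_over_heavy, alpha=1e-4):
    """Indices of doublets whose light/heavy ratio is significantly low,
    i.e. peptides far more abundant in the 13C4-labeled (CCP1) sample."""
    log2_ratios = [math.log2(r) for r in light_over_heavy]
    z_scores = robust_z_scores(log2_ratios)
    return [i for i, z in enumerate(z_scores) if lower_tail_p(z) < alpha]


# Hypothetical light/heavy ratios; only the last doublet is strongly shifted.
ratios = [1.0, 1.1, 0.9, 1.2, 0.8, 1.05, 0.95, 1.0, 1.15, 0.85, 0.1]
print(enriched_in_heavy(ratios))
```

Because the ratios are light over heavy, a neo-C terminus generated by CCP1 appears in the lower tail of the distribution, which is why the sketch tests that tail.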
Analysis of the Occurrence of Swiss-Prot Protein Entries with Acidic C termini-An in silico search for putative human CCP1 substrates was performed using ScanProsite (33) with the pattern [ED](n)>, where n represents the number of consecutive acidic residues at a protein's C terminus, which was allowed to vary from 1 to 32. The pattern search was performed in the Homo sapiens taxon of the 2014_03 UniProtKB/Swiss-Prot database, using the default settings (except that no splice variants were allowed).
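The same kind of search can be sketched locally with a regular expression. This is not ScanProsite itself: the pattern `[ED]+$` measures the run of Glu/Asp residues anchored at the C terminus, and the protein names and sequences below are made up for illustration.

```python
import re

# Run of acidic residues anchored at the end of the sequence (the C terminus).
ACIDIC_TAIL = re.compile(r"[ED]+$")


def acidic_tail_length(sequence):
    """Number of consecutive Glu/Asp residues at the C terminus (0 if none)."""
    match = ACIDIC_TAIL.search(sequence.upper())
    return len(match.group(0)) if match else 0


def proteins_with_acidic_tail(proteins, min_len=1, max_len=32):
    """Map protein name -> tail length for tails within [min_len, max_len]."""
    hits = {}
    for name, sequence in proteins.items():
        n = acidic_tail_length(sequence)
        if min_len <= n <= max_len:
            hits[name] = n
    return hits


# Hypothetical toy sequences, not real Swiss-Prot entries.
toy_proteome = {
    "protA": "MKTAY" + "E" * 6,  # six C-terminal glutamates
    "protB": "MKTAYIAKD",        # single C-terminal aspartate
    "protC": "MEDKTAYIAK",       # acidic residues, but not at the C terminus
}
print(proteins_with_acidic_tail(toy_proteome))
```

Raising `min_len` restricts the hits to longer tails, which mirrors varying n in the ScanProsite pattern.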
To estimate the probability of finding proteins with acidic C-terminal tails, we assumed a random model of sequences in which the different amino acids occur independently in protein sequences (34). As a consequence, the probability (p) of finding a motif $n_1 n_2 n_3 \ldots n_i$ is

$$p(n_1 n_2 n_3 \ldots n_i) = \prod_{j=1}^{i} p(n_j) \qquad \text{(Eq. 1)}$$

where $n_j \in Z$ and Z is the set of naturally occurring amino acids, Z = (A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V); $p(n_j)$ is the natural occurrence of the amino acid $n_j$ in the proteome, and these numbers were obtained from the average occurrences calculated by the iceLogo program for the human Swiss-Prot database (35). When applied to the C terminus, j = 1 corresponds to the C-terminal position, j = 2 to the penultimate position, j = 3 to the antepenultimate position, and so on. For the search of acidic motifs, we are not interested in a unique amino acid, but rather in the presence of Glu or Asp. Thus, we are interested in the probability of finding either of these two amino acids, i.e. p(E or D) = p(E) + p(D). In consequence, based on (Eq. 1) the probability of finding, in the whole proteome, proteins with C termini of i or more consecutive acidic residues was calculated as follows

$$p_i = \left[ p(\mathrm{E}) + p(\mathrm{D}) \right]^{i} \qquad \text{(Eq. 2)}$$
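Under this independence model, the probability that the last i positions of a protein are all Glu or Asp is [p(E) + p(D)] raised to the power i. The following Python sketch evaluates that expression and the corresponding expected number of proteins. The amino acid frequencies and the proteome size are placeholder assumptions, not the iceLogo values used in the study.

```python
# Placeholder frequencies (assumptions, NOT the iceLogo-derived values).
P_GLU = 0.07
P_ASP = 0.05


def p_acidic_tail(i, p_glu=P_GLU, p_asp=P_ASP):
    """Probability that the last i residues are all Glu or Asp, assuming
    residues occur independently with fixed frequencies."""
    if i < 1:
        raise ValueError("i must be at least 1")
    return (p_glu + p_asp) ** i


def expected_proteins(i, n_proteins, p_glu=P_GLU, p_asp=P_ASP):
    """Expected number of proteins with an acidic tail of i or more residues."""
    return n_proteins * p_acidic_tail(i, p_glu, p_asp)


# Hypothetical proteome size of 20,000 entries.
for tail_length in (1, 2, 4, 6):
    print(tail_length, expected_proteins(tail_length, 20000))
```

Comparing such expected counts with the number of proteins actually matching the pattern indicates whether acidic C-terminal tails are over-represented relative to chance.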

Study Design and Sample Preparation-It has previously been shown that the carboxypeptidases CCPs 1, 4, and 6 not only trim post-translationally added polyglutamate side chains of tubulin, but also act on the gene-encoded C-terminal glutamate stretches of two important regulators of myosin function: MLCK1 and telokin (15). It was then suggested that additional proteins carrying multiple glutamate residues at their C termini might also be CCP substrates. In this context, we performed an unbiased search for natural substrates of CCP1 using the C-terminal COmbined FRActional DIagonal Chromatography (C-terminal COFRADIC) positional proteomics approach (22). C-terminal COFRADIC is a quantitative approach that enriches for C-terminal peptides, thereby pointing to peptides reporting C-terminal protein processing events (i.e. neo-C termini) and thus facilitating the discovery of natural substrates of carboxypeptidases (22).
We chose HEK293T cells for this study, since it was previously shown that this cell line constitutes a simple and appropriate cellular system to study CCP1 function (36). We transiently transfected full-length CCP1 in HEK293T cells and analyzed its effect on different tubulin post-translational modifications (PTMs). After 72 h of transfection, Western blot analyses showed that overexpression of CCP1 generated increased levels of Δ2-tubulin (Fig. 1A and 1B), in line with previous reports (15,36). Of note, besides the main 150-kDa band of overexpressed CCP1, CCP1 fragments of ~120-130 kDa and 80-90 kDa can be observed (Fig. 1A). Similar patterns were reported for native and recombinant CCP1, both for the mouse (37) and human (2) orthologs.
C-terminal COFRADIC Analysis to Identify CCP1 Substrates-Knowing that expressed CCP1 is active in HEK293T cells, we proceeded with C-terminal COFRADIC (22) to identify CCP1-cleaved substrates on a proteome-wide level. Extracts from control HEK293T cells (i.e. mock-transfected) and HEK293T cells overexpressing CCP1 were processed in parallel, as described (22). In C-terminal COFRADIC, both N-terminal and C-terminal peptides are enriched by strong cation exchange (SCX) from internal peptides present in whole-proteome trypsin digests. Subsequent postmetabolic and chemical stable isotope-labeling of free amines with isotopic variants of N-hydroxysuccinimide (NHS)-butyrate (i.e. NHS ester of ¹²C₄-butyrate for the control sample and ¹³C₄-butyrate for the sample with CCP1 overexpression) allows for segregation of N-terminal and C-terminal peptides, and for MS-based relative quantification of isolated C-terminal peptides per proteome. Following LC-MS/MS analysis and database searching, we expected different readouts for the protein C termini identified and quantified. The majority of protein C termini were expected not to be processed by CCP1 (Fig. 2A), and, as a consequence, their corresponding C-terminal peptides should be present at similar concentrations in both proteomes (Fig. 2C).
Similarly, proteins C-terminally processed by a different carboxypeptidase (or other protease) were expected to give such a readout. CCP1 substrates, however, should have been processed to higher extents in the proteome in which CCP1 is overexpressed (Fig. 2B). As a result, the corresponding intact C-terminal peptide (or database-annotated protein C-terminal peptide) should be found at higher levels in the control sample (Fig. 2D). Additionally, the generated neo-C-terminal peptide (C-terminal peptide generated by proteolytic cleavage by CCP1) is expected to be present at much higher levels in the proteome where CCP1 is overexpressed (Fig. 2E).

FIG. 2. Identification of C-terminal proteolytic processing events in CCP1-transfected cells by means of C-terminal COFRADIC analysis. A, Representation of a protein that is not processed by CCP1. Scissors indicate trypsin cleavage sites. N-terminal tryptic peptides are colored in red. C-terminal tryptic peptides are colored in green/blue. Internal tryptic peptides are colored in yellow. B, Representation of a protein that is C-terminally processed by CCP1. C, Mass spectrum representation of a C-terminal peptide that is not processed in either proteome (Peptide A). This mass spectrum would equally represent a C-terminal peptide showing basal degradation in both proteomes by a protease different from CCP1. D, Mass spectrum representation of the intact C-terminal peptide of a protein that is processed by CCP1 (Peptide B). E, Mass spectrum representation of the neo-C-terminal peptide of a protein that is processed by CCP1 (Peptide C). F, Identification of α-tubulin 1A/1B as CCP1 substrate. Mass spectrum of the triply charged neo-C terminus of α-tubulin 1A/1B, But¹³C₄-EDMAALEKDYEEVGVDSVEGEGEEEG (residues 423-448; But¹³C₄, α-amino group modified by ¹³C₄-butyric acid).
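The light/heavy readouts described in this section rely on the mass offset between ¹²C₄- and ¹³C₄-butyrylated forms of the same peptide. The Python sketch below computes the expected m/z spacing of such a doublet. It assumes one butyryl group per peptide (on the α-amine, lysines having been D₃-acetylated beforehand); the function name is illustrative.

```python
C13_MINUS_C12 = 1.0033548  # Da, mass difference between 13C and 12C
LABEL_CARBONS = 4          # carbons in the butyryl group


def doublet_spacing_mz(charge, n_labels=1):
    """Expected m/z distance between the light (12C4) and heavy (13C4)
    forms of a butyrylated peptide at the given charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return n_labels * LABEL_CARBONS * C13_MINUS_C12 / charge


for z in (1, 2, 3):
    print(z, round(doublet_spacing_mz(z), 4))
```

For a triply charged ion, such as the α-tubulin neo-C terminus in Fig. 2F, the two isotopic envelopes would therefore sit roughly 1.34 m/z units apart under these assumptions.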
In total, we identified 1259 unique database-annotated N termini, 886 unique database-annotated protein C termini, and 75 processed protein C termini (i.e. only considering peptides lacking up to 20 amino acids from the database-annotated C terminus) (supplemental Table S1). Most of the latter were found at similar levels in both proteomes, and thus point to proteins processed by proteases other than CCP1 or to artifacts (i.e. C-terminal ragging) generated during sample preparation. Interestingly, we identified nine processed C termini that were present at much higher levels (over sevenfold up, p < 0.0001) in the proteome of HEK293T cells overexpressing CCP1 (Table I). Annotated MS/MS spectra of the neo-C termini identified in Table I are provided in supplemental Fig. S2. We focused our analysis on these neo-C termini, because they directly point to putative CCP1 substrates. In particular, two of these neo-C termini corresponded to processed C termini of α-tubulin, a protein known to be processed by CCP1 (Fig. 1B and references (15, 36, 37)). C-terminal COFRADIC identified the processed C terminus of α-tubulin 1A/1B (EDMAALEKDYEEVGVDSVEGEGEEEG; residues 423-448) as almost uniquely present in the proteome of HEK293T cells overexpressing CCP1 (Fig. 2F). Moreover, we also identified a neo-C terminus (EDMAALEKDYEEVGADSADGEDEG; residues 423-446) of a different α-tubulin isotype (α-tubulin 1C) as a singleton in the CCP1 overexpression sample. Both peptides correspond to the C terminus of Δ3-tubulin, a tubulin form that lacks the last three residues of α-tubulin. Consistent with previous reports (36), our data confirm the ability of CCP1 to generate Δ2-tubulin (Fig. 1B) or Δ3-tubulin (Table I) by removing one or two Glu residues from the C terminus of detyrosinated α-tubulin, considering that another enzyme, different from CCP1, is responsible for releasing the C-terminal Tyr from α-tubulin (15,38).
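The distinction between database-annotated and processed C termini (peptides lacking up to 20 residues) can be sketched as a simple sequence comparison. This Python example is illustrative only. The α-tubulin 1A/1B tail used here is inferred from the text (the reported peptide followed by Glu-Glu-Tyr, since Δ3-tubulin lacks the last three residues), and the function names are hypothetical.

```python
def missing_c_terminal_residues(protein_seq, peptide_seq, max_missing=20):
    """Return how many residues the peptide lacks from the protein C terminus
    (0 = annotated C terminus), or None if it is not C-terminal within the limit."""
    for missing in range(max_missing + 1):
        end = len(protein_seq) - missing
        if end < len(peptide_seq):
            break
        if protein_seq[:end].endswith(peptide_seq):
            return missing
    return None


def classify_c_terminus(protein_seq, peptide_seq, max_missing=20):
    """Label a peptide as annotated, processed, or not C-terminal."""
    missing = missing_c_terminal_residues(protein_seq, peptide_seq, max_missing)
    if missing is None:
        return "not C-terminal"
    return "annotated C terminus" if missing == 0 else "processed C terminus"


# C-terminal stretch of alpha-tubulin 1A/1B as inferred from the text (assumption).
TUBULIN_TAIL = "EDMAALEKDYEEVGVDSVEGEGEEEGEEY"
NEO_C_TERMINUS = "EDMAALEKDYEEVGVDSVEGEGEEEG"

print(missing_c_terminal_residues(TUBULIN_TAIL, NEO_C_TERMINUS))
print(classify_c_terminus(TUBULIN_TAIL, NEO_C_TERMINUS))
```

A peptide flagged as processed by this check is only a candidate neo-C terminus; the quantitative comparison between the two proteomes decides whether it points to CCP1.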
Besides α-tubulin, we identified seven neo-C termini belonging to five new putative CCP1 substrates (Table I). Interestingly, in all cases the amino acids released by CCP1 are acidic amino acids, a substrate specificity that partially agrees with previous reports. Two of these peptides indicate CCP1-mediated removal of a single amino acid from the protein's C terminus: release of glutamate from the C terminus in the case of the eukaryotic translation initiation factor 4H (eIF4H), and of aspartate in the case of stathmin. Previous reports demonstrated the action of CCP1 on C-terminal glutamates (15,36,37), and it was suggested that CCP1 does not recognize aspartate (15,37). Our observed trimming of the C terminus of stathmin indicates that CCP1 is indeed able to release C-terminal aspartates. Moreover, two peptides pointed to C-terminal processing of the 40S ribosomal protein S9 (RPS9). Here, CCP1 removed three and four amino acids, respectively, from the C terminus of RPS9 (i.e. the C-terminal Asp and two or three subsequent Glu residues were removed, Table I). Given that M14 MCPs release only one amino acid at a time, it is expected that CCP1 sequentially releases these amino acids (37). The TRAF-type zinc finger domain-containing protein (TRAD1) was also identified as a putative CCP1 substrate. TRAD1 contains a C-terminal stretch of six glutamates and CCP1 seems to be capable of releasing all of them. Remarkably, we also identified two neo-C termini of the high mobility group protein B3 (HMGB3), which show that up to 15 or 16 amino acids are removed by CCP1.
Interestingly, when identified, and except for α-tubulin 1C, the levels of the corresponding intact C termini of these putative CCP1 substrates are not significantly altered (supplemental Table S2). This might indicate that the extent of processing of these putative CCP1 substrates in cells is generally low. The intact C terminus of α-tubulin 1C, whose level is significantly altered, does not match the expected readout for a substrate (Fig. 2C), but rather seems to be at fivefold higher concentration in the sample with CCP1 overexpression. Western blot analysis using an antibody recognizing different tubulin isotypes, however, showed equal levels of total Tyr-tubulin in both samples (Fig. 1B).
Validation of Putative CCP1 Substrates-From the five new putative CCP1 substrates identified, two were selected for further validation: TRAD1 and HMGB3. TRAD1 contains a C-terminal stretch of six glutamate residues and CCP1 is able to sequentially remove all of them (Fig. 3A). We therefore used the polyE antibody that was designed to recognize a polyglutamate side-chain of at least three consecutive glutamate residues in tubulin (39), but additionally recognizes gene-encoded C termini of three or more glutamates (15). N-terminally Strep- and HA-tagged TRAD1 was overexpressed in HEK293F cells together with CCP1 or an inactive form of CCP1 (CCP1-E270Q). The latter contains a mutation of a key catalytic residue of M14 metallocarboxypeptidases (E270Q using the numbering system of bovine carboxypeptidase A, E1102Q in human CCP1) that polarizes the metal-bound water molecule responsible for attack of the scissile bond (37,40). Because of the low sensitivity of the polyE antibody, we enriched TRAD1 by means of its N-terminal Strep-tag. Subsequent Western blot analysis of TRAD1 with antibodies directed to its N-terminal HA-tag (Fig. 3B) shows that TRAD1 is enriched from both cell lines with similar efficiency. Note that CCP1 is copurified because it also holds a Strep-tag at its C terminus (Fig. 3B). Interestingly, the polyE antibody detects TRAD1's C terminus when co-expressed with inactive CCP1, but not when co-expressed with active CCP1 (Fig. 3B). In addition, we directly incubated purified recombinant TRAD1 with purified recombinant CCP1 and found that TRAD1 was no longer recognized by the polyE antibody (Fig. 3C). However, in the presence of o-phenanthroline (32), a specific inhibitor of metalloproteases, the polyE antibody was still capable of recognizing affinity purified TRAD1, indicative of CCP1-dependent removal of the acidic tail of TRAD1. Hence, our data validate TRAD1 as a novel and direct CCP1 substrate.

a In the case of α-tubulin, we considered that CCP1 uses as substrate the pool of detyrosinated tubulin naturally present in the cell (15,38).
b The C terminus of these proteins is displayed, although no CCP1 cleavage sites have been identified.
c This substrate was not identified in our screen, but the orthologous mouse substrate was previously reported by Rogowski et al. (15).

FIG. 3. TRAD1 and HMGB3 are direct CCP1 substrates. A, CCP1 is capable of removing all acidic C-terminal amino acids of TRAD1. The epitope recognized by the polyE antibody is indicated. B, The C terminus of TRAD1 is processed by active CCP1. Strep/HA-tagged TRAD1 was cotransfected in HEK293F cells either with active CCP1 or inactive (E270Q) CCP1. TRAD1 was enriched from protein extracts using Strep-Tactin columns. Equal amounts of protein eluate were analyzed by Western blot using an anti-Strep antibody. CCP1 is also enriched by means of its C-terminal Strep-tag. The polyE antibody shows the integrity of the TRAD1 C terminus when co-expressed with inactive CCP1, but not when co-expressed with the active enzyme. C, In vitro validation of TRAD1 as a direct CCP1 substrate. Purified recombinant TRAD1 was incubated with CCP1 for 2 h at 37°C, in the presence or absence of the MCP inhibitor o-phenanthroline (o-Phen), and analyzed by Western blot using the polyE antibody. D, HMGB3 is composed of two central DNA binding domains (HMG-boxes) and a long acidic C-terminal tail. CCP1 is able to sequentially release up to 16 acidic amino acids from its C terminus. The sizes of the HMG boxes and the acidic tail are merely illustrative and do not represent their real sizes. E, HMGB3 is processed by CCP1. Strep-tagged HMGB3 was cotransfected with active or inactive (E270Q) CCP1. Equal amounts of each cell extract were analyzed by Western blot using an HA-tag antibody, to demonstrate equal expression levels of both CCP1 variants. Western blotting using an anti-Strep antibody shows the appearance of a degraded form of HMGB3 when co-expressed with active CCP1. F, In vitro validation of HMGB3 as a direct CCP1 substrate. Purified recombinant HMGB3 was incubated with CCP1 overnight (ON) at 37°C, in the presence or absence of 10 mM o-Phen, and analyzed by SDS-PAGE.
As for HMGB3, CCP1 seems to remove quite a long stretch of acidic amino acids from its C terminus (Fig. 3D). Accordingly, we expected to observe a significant change in the MW of the HMGB3 precursor versus its processed counterpart(s) by means of SDS-PAGE. Therefore, N-terminally Strep-tagged HMGB3 was co-expressed in HEK293T cells with either active or inactive CCP1. Western blot analysis shows that in addition to the intact HMGB3 band, a proteolytic fragment of HMGB3 is found when HMGB3 is co-expressed with active CCP1 (Fig. 3E). The appearance of a HMGB3 fragment, with different electrophoretic mobility, is consistent with the CCP1-mediated processing identified by C-terminal COFRADIC. By assaying recombinantly produced and purified proteins, we could also show direct processing of HMGB3 by CCP1, but only in the absence of o-phenanthroline (Fig. 3F). Finally, we used protein mass spectrometry to measure the masses of the purified HMGB3 protein and its fragments. Q-TOF analyses determined the molecular weight of intact HMGB3 purified from HEK293F cells to be 25,300 ± 12.65 Da (supplemental Fig. S3A), which indicates that this protein lacks its initiator methionine, is N-terminally acetylated, and lacks a C-terminal glutamate (predicted MW 25303.3 Da). Peptide mass fingerprint analysis confirms the presence of these PTMs (supplemental Fig. S3B and supplemental Tables S3 and S4). The mass of the HMGB3 fragment generated by CCP1 is 23,377 ± 11.67 Da (supplemental Fig. S3C), which corresponds to the release of 15 extra C-terminal residues of HMGB3 by CCP1 (predicted MW 23380.6 Da). Hence, the in vitro cleavage product fully correlates with one of the HMGB3 neo-C termini identified by C-terminal COFRADIC, and all of these results validate HMGB3 as a direct CCP1 substrate.
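The mass arithmetic behind the HMGB3 assignment can be checked with a few lines of Python. The sketch below is illustrative only: it takes the two predicted masses quoted above and asks which Glu/Asp composition of a 15-residue tail best matches their difference. The average residue masses are standard textbook values rather than numbers from this study, and the function name `tail_compositions` is hypothetical.

```python
# Average residue masses (Da) of Glu and Asp inside a peptide chain.
# Assumption: standard textbook values, not taken from this study.
GLU_RESIDUE = 129.1155
ASP_RESIDUE = 115.0886


def tail_compositions(n_residues, mass_loss):
    """Rank every Glu/Asp composition of an n-residue tail by mass error."""
    ranked = []
    for n_asp in range(n_residues + 1):
        n_glu = n_residues - n_asp
        tail_mass = n_glu * GLU_RESIDUE + n_asp * ASP_RESIDUE
        ranked.append((abs(tail_mass - mass_loss), n_glu, n_asp, tail_mass))
    return sorted(ranked)


# Predicted masses quoted in the text (Da).
predicted_intact = 25303.3
predicted_fragment = 23380.6
predicted_loss = predicted_intact - predicted_fragment

error, n_glu, n_asp, tail_mass = tail_compositions(15, predicted_loss)[0]
print(f"mass loss: {predicted_loss:.1f} Da")
print(f"closest 15-residue tail: {n_glu} Glu + {n_asp} Asp ({tail_mass:.1f} Da)")
```

With these inputs the predicted loss of about 1922.7 Da is matched most closely by a tail of 14 glutamates and one aspartate. Note that the measured masses carry uncertainties of roughly 12 Da each, so the measurement by itself cannot distinguish a Glu from an Asp (a 14 Da difference); the comparison is only meaningful against the predicted values.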
Prediction of Additional CCP1 Substrates-Our findings show that CCP1's substrate specificity is restricted to Glu and Asp residues. We exploited this characteristic to perform an in silico scan for other potential CCP1 targets using the ScanProsite tool (33). Our search (parameters indicated in the Experimental Procedures section) returned lists of 16, 36, and 111 human proteins containing, respectively, 7, 5, and 3 or more consecutive acidic amino acids at their C terminus. Table II shows the protein hits with five or more consecutive acidic C-terminal residues. Supplemental Table S5 displays hits containing C termini with three or more acidic residues. The number of proteins in Table II is 76 times higher than expected by chance alone; among the 20,257 human proteins in the 2014_03 UniProtKB/Swiss-Prot database, fewer than one protein (or 1 out of 43,159 proteins) is expected to have five or more consecutive acidic C-terminal residues (considering that the natural frequencies of glutamate and aspartate for human proteins in Swiss-Prot are 7.1% and 4.7%, respectively). Moreover, finding proteins with 10 or more consecutive C-terminal acidic residues is highly unlikely (1 in 1.86 × 10^9 proteins); however, nine human proteins have acidic stretches of at least 10 residues.
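The chance expectation quoted above can be approximated with a simple calculation. The following sketch assumes that residues occur independently at each position, so that the probability of the last n residues all being acidic is the n-th power of the combined Glu and Asp frequency. Because it uses the rounded frequencies given in the text (7.1% and 4.7%), it reproduces the reported figures only approximately. The function name `p_acidic_tail` is hypothetical.

```python
# Frequencies and database size as quoted in the text.
FREQ_GLU = 0.071
FREQ_ASP = 0.047
N_PROTEINS = 20257


def p_acidic_tail(length, p_acidic=FREQ_GLU + FREQ_ASP):
    """Probability that the last `length` residues are all Glu or Asp,
    assuming independent positions (a simplifying assumption)."""
    return p_acidic ** length


p5 = p_acidic_tail(5)
expected_5 = N_PROTEINS * p5   # expected number of hits in the database
observed_5 = 36                # hits with five or more acidic residues
enrichment = observed_5 / expected_5
p10 = p_acidic_tail(10)

print(f"five or more: 1 in {1 / p5:,.0f} proteins, expected {expected_5:.2f}")
print(f"enrichment over chance: {enrichment:.0f}-fold")
print(f"ten or more: 1 in {1 / p10:.2e} proteins")
```

Under these assumptions the odds come out near 1 in 44,000 for five acidic residues and near 1 in 1.9 × 10^9 for ten, with an enrichment of roughly 78-fold, close to the reported 1 in 43,159, 1 in 1.86 × 10^9, and 76-fold. The small differences are consistent with rounding of the input frequencies.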
Our ScanProsite analysis pointed to different high mobility group proteins as potential CCP1 targets. In particular, the high mobility group proteins B1 (HMGB1) and B2 (HMGB2) contain an unusually high number of acidic residues at their C termini (30 and 22 residues, respectively). To test whether, like HMGB3, these could be CCP1 substrates, N-terminally Myc-tagged versions of both proteins were co-expressed in HEK293T cells with either active or inactive CCP1. Western blot analyses show that CCP1 indeed processes the C terminus of HMGB2 (Fig. 4B). A second cleavage product of HMGB2 with a lower molecular weight can also be observed, although at lower levels (Fig. 4B). CCP1 is also capable of processing the C terminus of HMGB1 (Fig. 4A), but to a lesser extent than HMGB3 or HMGB2. In this case, processing leads to several protein fragments.
CCP1 partially localizes in the nucleus, as was shown by Rodriguez de la Vega et al. by confocal microscopy analysis in HeLa cells, revealing a granular distribution for CCP1 both in the nucleus and cytoplasm (2). Furthermore, a nuclear export signal (NES) in the N-terminal region of CCP1 was recently identified and characterized, showing that CCP1 is involved in a nuclear-to-cytoplasmic relocalization mediated by CRM1-dependent nuclear export (41). Because the function of CCP1 in nuclear processes remains unclear, the identification of possible nuclear substrates is of interest. HMGB1 and HMGB2 localize in the nucleus, but shuttle continuously to the cytoplasm by active transport mechanisms, with the equilibrium shifted toward nuclear accumulation (42-45). To exclude the possibility that cleavage of HMGB proteins occurs as a result of an aberrant localization of HMGB proteins (i.e. generated by their overexpression), we used an anti-HMGB2 antibody to analyze endogenous HMGB2 in HEK293T cells overexpressing CCP1 or CCP1-E270Q. Western blot analysis showed the appearance of C-terminally processed forms of endogenous HMGB2 in cells overexpressing active CCP1, which were absent when the inactive CCP1 mutant was overexpressed (supplemental Fig. S4, lanes 1-2). Thus, we confirm the ability of CCP1 to process the acidic C-terminal tail of HMGB2, in this case by analyzing the endogenous HMGB2 protein. Given that HMGB2 partially associates with chromatin, we fractionated whole cell protein lysates into soluble fractions (containing cytosolic and nuclear soluble proteins) and chromatin-bound insoluble fractions, and found that most of the processed HMGB2 is present in the chromatin(-associated) fraction (supplemental Fig. S4, lanes 3-6). This could indicate that chromatin-bound HMGB2 is more susceptible to CCP1 cleavage or, alternatively, that processed HMGB2 displays a higher affinity for chromatin (44,46).
Further research is, however, needed to clarify this observation and to elucidate the functional implications of these and other proteolytic processing events observed in this work. Studies with the pcd knock-out might offer further validation of the putative substrates identified here.

DISCUSSION

Identification of carboxypeptidase substrates in complex proteomes has been challenging until the recent development of C-terminal positional proteomics techniques (22,47-50). Previously, C-TAILS was used to investigate constitutive C-terminal proteolysis in E. coli (47), oxazolone chemistry enabled the study of natural neo-C termini in Thermoanaerobacter tengcongensis (49), and C-terminal COFRADIC was used to screen for substrates of human carboxypeptidase A4 when added to human PC3 cell lysates (22). In this work, we present the first proteomic study aimed at identifying protein substrates of carboxypeptidases in a cellular context. Here, C-terminal COFRADIC analyses resulted in the identification of nine processed C termini directly generated by human CCP1 in HEK293T cells. The putative natural substrates identified here, together with the previously reported substrate MLCK1/telokin (Table I), support a role for CCP1 in the proteolytic processing of gene-encoded acidic C termini.
Notably, among the putative substrates identified by C-terminal COFRADIC, we found α-tubulin, a known CCP substrate that has already been extensively studied (15,36,37). In addition, the observed preference for acidic residues is in agreement with previous reports (15,20,36,37). All putative CCP1 substrates identified here are located either in the cytoplasm or the nucleus, which is consistent with the subcellular localization of CCP1 (2,3,5). We further validated TRAD1 and HMGB3 as CCP1 substrates by cotransfecting cells with TRAD1 or HMGB3 and with CCP1. This allowed us to verify that, in a cellular context, C-terminal trimming of both proteins occurs in the presence of active CCP1. In addition, in vitro experiments making use of purified substrates confirmed that the identified neo-C termini are the result of the direct action of CCP1 on these proteins.
It was previously hypothesized that CCPs would generally recognize and process proteins with gene-encoded C termini ending with glutamate stretches (15). The CCP1 substrates identified here show for the first time that CCP1 processes both glutamate and aspartate C-terminal residues. We used an iceLogo representation (35) to visualize the derived substrate specificity for CCP1, which as expected is shown to be restricted to acidic residues, with a marked preference for glutamates over aspartates (Fig. 5). The iceLogo displays, in addition to the requirement of an extended substrate specificity profile, a critical role of nonprimed binding subsites in substrate selection. This result agrees with a recent report describing that biotin-3E is a better CCP1 substrate than biotin-2E, and therefore shows the preference of this enzyme for longer acidic tails (37). Previous reports stated that CCP1 was not capable of releasing C-terminal aspartate residues (15,37). Given our data, we hypothesize that CCP1 does cleave off aspartates, but less efficiently than glutamates. Indeed, CCP1 cleaves aspartates from RPS9 and HMGB3, but their identified final products also suggest that CCP1 slows down or stalls its proteolytic activity when it encounters an aspartate residue in P1 or P1′ (Table I). Also, the higher proportion of aspartate residues residing at the C terminus of HMGB1, when compared with HMGB2 or HMGB3, might explain why the extent of cleavage is lower for this protein (Figs. 3E and 4). In vitro assays showed processing of the HMGB3 tail (Fig. 3F) and Biot-EEE, but failed to detect CCP1 cleavage of the Biot-EED peptide in a colorimetric assay (data not shown) (37). Cleavage of Asp might be favored in the context of long acidic tails like those found in HMGB proteins, where nonprime positions are occupied by favorable acidic residues.
To further support a role of CCP1 in gene-encoded C-terminal processing, we searched for reports describing the natural occurrence of the C-terminal modifications shown in Table I. In the case of α-tubulin, a pool of Δ2-tubulin is known to be naturally present in the cell (51-53). Richter-Cook et al. described the purification of rabbit eIF4H from rabbit reticulocyte lysates and identified a form of the protein with a more basic pI and postulated that this protein form might result from proteolysis of its C-terminal glutamate (54). Rusconi et al. analyzed by MALDI-TOF-MS the C terminus of chicken telokin and found forms of the protein from which one to six C-terminal glutamate residues had been removed (55). Furthermore, we searched for evidence of CCP1 being the responsible carboxypeptidase in vivo for some of these C-terminal processing events. Berezniuk

FIG. 4. HMGB2 and HMGB1 are processed by CCP1. CCP1 is able to cleave the C terminus of A, HMGB1 and B, HMGB2. N-terminally Myc-tagged HMGB1 or HMGB2 were cotransfected with active or inactive CCP1. Equal amounts of each cell extract were analyzed by Western blot using a HA-tag antibody, to show equal expression levels of both CCP1 variants. Western blotting using an anti-Myc antibody shows the appearance of C-terminal processed forms of the HMGB proteins when coexpressed with active CCP1.
HMGB proteins can provide clues to the functional implications of CCP1-mediated C-terminal processing, given that extensive research has been done on how the lack of their acidic C-terminal tails affects their function. Most of these studies were performed with HMGB1 or HMGB2, and have not yet been performed on more recently discovered mammalian HMGB proteins such as HMGB3. HMGBs are nonhistone proteins that are involved in various nuclear chromatin-associated processes such as transcription, replication, recombination, DNA repair, and genomic stability (44,56,57). Mammalian HMGBs are composed of two central DNA binding domains (HMG-boxes A and B) and a negatively charged C-terminal region, which is highly conserved among different species (44). The acidic tail of HMGB1 and HMGB2 negatively affects binding to both linear and supercoiled DNA (44,58), an effect mediated by intramolecular interactions of the acidic tail with the HMG boxes that shield these domains from other interactions. However, these acidic tails seem to modulate the biological functions of HMGBs in many other ways. For instance, they are involved in the nucleocytoplasmic transport of HMGB1 and HMGB2, and affect their localization (59,60). In addition, the C-terminal tail of HMGB1 is crucial for transcription stimulation (61,62), essentially by its interaction with histones H1 and H3 (61,63). Furthermore, the HMGB1 C-terminal tail was reported to play a major role in DNA repair, chromatin remodeling, and DNA replication (64-66). Marintcheva et al. built a model to explain the role of the acidic tail of the ssDNA-binding protein of bacteriophage T7, which can be extended to other proteins such as ribosomal proteins, translation factors, and HMG proteins (67). According to this model, the acidic tail of these proteins would regulate the affinity for their binding partners (DNA or other proteins) by binding and thereby shielding their basic binding clefts.
In this context, CCP1, by shortening these acidic tails, would regulate (the avidity of) protein-protein and DNA-protein interactions. In fact, this is in line with the proposed role for PTMs affecting the C terminus of tubulin: modulation of the binding of motor proteins and MAPs to the C-terminal region of tubulin (52). Similarly, Rusconi et al. proposed that the observed heterogeneity of telokin's C terminus might regulate its interaction with other proteins (55).
The functions of the putative CCP1 substrates identified here are quite diverse (Table I), including microtubule-related proteins and proteins involved in transcription and chromatin remodeling. The latter could help to understand, for instance, the progressive transcriptional silencing or the large-scale reorganization of chromatin in pcd mice. However, further work is needed to fully validate these proteins as in vivo CCP1 substrates. Studies in the pcd knockout mouse might provide the ultimate validation of these substrates. Future research should additionally assess the possible implications of C-terminal processing of these proteins on their function. Finally, further findings from C-terminomics studies will aid in the understanding of post-translational modifications that affect protein C termini and their role in the regulation of biological processes.

FIG. 5. Substrate specificity profile derived from putative CCP1 substrates. IceLogo (35) was used to show the enriched residues present at the different identified CCP1 putative substrate positions as compared with the reference set (i.e. C termini of human proteins in the Swiss-Prot database). For the iceLogo preparation we took into consideration that MCPs sequentially release amino acids. As a result, when a few amino acids are released to get to the final identified product, intermediate products of digestion act as new substrates and are further digested by CCP1. Consequently, intermediate products of digestion were considered as different individual substrates when preparing the iceLogo and, as a result, 35 CCP1 substrates were considered. Only those substrates identified by proteomics were considered here. In the representation the substrate residues are depicted according to the Schechter & Berger nomenclature. The frequency of the amino acid occurrence at each position in the sequence set was compared with the occurrence in the reference set. Only statistically significant residues with a p value ≤ 0.05 are plotted. Amino acid height shows the degree of difference in the positional amino acid frequencies in the experimental set as compared with the reference set. Residues that are statistically over- or underrepresented in the experimental set are shown in the upper or lower part of the iceLogo, respectively.
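The treatment of intermediate digestion products described in the Fig. 5 legend can be illustrated with a short sketch. It assumes a carboxypeptidase that removes one residue at a time from the C terminus, so that each intermediate is itself a substrate, with the residue being released in P1′ and the preceding residue in P1. The function name `intermediate_substrates` and the example sequence are hypothetical and do not correspond to any protein or software used in this study.

```python
def intermediate_substrates(c_terminus, n_released):
    """List every substrate seen during sequential C-terminal trimming.

    c_terminus: C-terminal sequence of the intact protein (one-letter code).
    n_released: number of residues removed to reach the final product.
    """
    if not 0 < n_released < len(c_terminus):
        raise ValueError("n_released must leave at least one residue")
    substrates = []
    for k in range(n_released):
        # Sequence remaining after k residues have already been released.
        seq = c_terminus[: len(c_terminus) - k]
        substrates.append({"substrate": seq, "P1": seq[-2], "P1'": seq[-1]})
    return substrates


# Hypothetical tail ending in six glutamates, all of which are removed.
for entry in intermediate_substrates("AKEEEEEE", 6):
    print(entry["substrate"], entry["P1"], entry["P1'"])
```

In this toy case one identified final product contributes six individual substrate entries, which is how a small number of processed proteins can expand into a larger substrate set for the specificity profile.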