Predicting the targeting of tail-anchored proteins to subcellular compartments in mammalian cells

ABSTRACT Tail-anchored (TA) proteins contain a single transmembrane domain (TMD) at the C-terminus that anchors them to the membranes of organelles where they mediate critical cellular processes. Accordingly, mutations in genes encoding TA proteins have been identified in a number of severe inherited disorders. Despite the importance of correctly targeting a TA protein to its appropriate membrane, the mechanisms and signals involved are not fully understood. In this study, we identify additional peroxisomal TA proteins, discover more proteins that are present on multiple organelles, and reveal that a combination of TMD hydrophobicity and tail charge determines targeting to distinct organelle locations in mammals. Specifically, an increase in tail charge can override a hydrophobic TMD signal and re-direct a protein from the ER to peroxisomes or mitochondria and vice versa. We show that subtle changes in those parameters can shift TA proteins between organelles, explaining why peroxisomes and mitochondria have many of the same TA proteins. This enabled us to associate characteristic physicochemical parameters in TA proteins with particular organelle groups. Using this classification allowed successful prediction of the location of uncharacterized TA proteins for the first time. Summary: Characteristic physicochemical features of tail-anchored membrane proteins, based on the interplay between transmembrane domain hydrophobicity and tail charge, allow prediction of their subcellular localization.


INTRODUCTION
Tail-anchored (TA) proteins possess a single transmembrane domain (TMD) close to their C-terminus, which anchors them to cellular membranes and exposes their N-terminal domain to the cytosol. They play key roles in processes requiring membrane anchorage such as organelle division, apoptosis, vesicle targeting/fusion and lipid trafficking (Borgese and Fasana, 2011). Their correct targeting and localization is therefore of fundamental importance for cellular function and viability of the organism. As the TMD of TA proteins emerges from the ribosome only after termination of translation, sorting and insertion require posttranslational mechanisms (Borgese and Fasana, 2011; Kutay et al., 1993). TA proteins gain entry to the cellular membrane systems at three subcellular sites: the endoplasmic reticulum (ER), mitochondria and peroxisomes. In yeasts and mammals, the orthologous GET and TRC40 complexes, respectively, are involved in the delivery and insertion of TA proteins into the ER (Mariappan et al., 2010; Mateja et al., 2015; Schuldiner et al., 2008). In mammals, initial binding of nascent TA proteins is mediated by SGTA and the BAG6 complex, constituting a quality control step in the pathway (Hessa et al., 2011; Leznicki and High, 2012; Leznicki et al., 2013; Mariappan et al., 2010; Mock et al., 2015). Following successful transit through the SGTA/BAG6 checkpoint, TA proteins are delivered to the ER transit factor TRC40 (GET3 in yeast). Two additional proteins, WRB (Vilardi and Lorenz, 2011) and CAML (also known as CAMLG) (Yamamoto and Sakisaka, 2012), then act as receptors for TRC40-bound TA proteins on the ER membrane. For some ER TA proteins, alternative pathways exist which may utilize the signal recognition particle (SRP) or HSC70-HSP40 systems (Abell et al., 2004, 2007; Daniele et al., 2016; Vogl et al., 2016). The molecular mechanisms for sorting and insertion to peroxisomes and mitochondria are less clearly understood.
The factors for targeting of mitochondrial TA proteins have not yet been identified, although the involvement of HSC70 (also known as HSPA8) has been suggested (Borgese and Fasana, 2011;Rabu et al., 2008). An alternative possibility is via unassisted insertion, with the composition of the mitochondrial membrane contributing to targeting specificity (Kemper et al., 2008). For peroxisomal TA proteins, targeting is generally considered to be mediated by PEX19, an import receptor for peroxisomal membrane proteins (PMPs), and PEX3, the receptor for PEX19-bound PMPs at the peroxisomal membrane (Chen et al., 2014b;Yagita et al., 2013).
Owing to the few peroxisomal TA proteins identified to date, these studies are based on mammalian PEX26 (PEX15p in yeast) (Buentzel et al., 2015;Halbach et al., 2006;Yagita et al., 2013) and FIS1, which is present on both peroxisomes and mitochondria (hereafter denoted as shared) (Delille and Schrader, 2008;Koch et al., 2005). Besides these primary targeting systems, subcellular localization can be further controlled after membrane entry by processes such as membrane extraction and TA protein degradation (Chen et al., 2014a;Okreglak and Walter, 2014).
The targeting information for TA proteins is contained within the C-terminus, and it is established that hydrophobicity of the TMD and the presence of charged residues are important factors in membrane selection (Borgese et al., 2007). Generally, ER-targeted TA proteins tend to have more hydrophobic TMDs than those targeted to mitochondria (Wang et al., 2010) with non-charged regions surrounding the TMD (Horie et al., 2002;Kuroda et al., 1998). Yagita and colleagues (2013) demonstrated that, for PEX26, charged residues in the tail were also important for peroxisomal targeting. Despite this general knowledge about factors influencing targeting, it remains to be determined how these two properties ensure proper targeting to mitochondria, peroxisomes and the ER.
Furthermore, TA proteins can be targeted to both peroxisomes and mitochondria in mammalian cells (Dixit et al., 2010;Gandre-Babbe and van der Bliek, 2008;Huber et al., 2013;Koch et al., 2005), revealing close organelle interplay and novel peroxisomal functions (Schrader et al., 2015). Moreover, disorders with combined defects in peroxisomal and mitochondrial fission, caused by mutations in MFF and GDAP1, TA proteins shared by both organelles, have been discovered (Huber et al., 2013;Koch et al., 2016;Shamseldin et al., 2012). As peroxisomes fulfill important metabolic functions in lipid and reactive oxygen species (ROS) metabolism, and influence neuronal development and aging (Fransen et al., 2012), there is great interest in the identification of additional peroxisomal TA proteins and those shared by peroxisomes and mitochondria.
Although bioinformatic studies have previously identified potential TA proteins in yeast, plants and humans (Beilharz et al., 2003;Kalbfleisch et al., 2007;Kriechbaumer et al., 2009;Shigemitsu et al., 2016), wider, integrated studies focusing on how targeting is coordinated to control organelle selection in mammals are currently lacking. Here, we expand the repertoire of peroxisomal TA proteins, reveal additional proteins shared by multiple organelles, and show that a combination of TMD hydrophobicity and tail charge determines targeting to distinct organelles in mammals. We demonstrate that tail charge and TMD hydrophobicity act as directly opposing signaling parameters. A sufficient increase in one can override the other, re-directing a protein from the ER to peroxisomes or mitochondria and vice versa. Mechanistically, changes in these physicochemical parameters correlated with the ability of either PEX19 or GET3 to bind and prevent aggregation of individual TA proteins. We show that subtle alterations in TMD hydrophobicity and tail charge can shift targeting between organelles, explaining why peroxisomes and mitochondria share many TA proteins. Our analyses allow, for the first time, successful prediction of the location of uncharacterized TA proteins.

Identification of new peroxisomal and shared peroxisome and mitochondria TA proteins
Peroxisomes and mitochondria cooperate in lipid and ROS metabolism and share membrane proteins involved in organelle division and anti-viral signaling (Koch et al., 2005;Delille and Schrader, 2008;Gandre-Babbe and van der Bliek, 2008;Dixit et al., 2010). Remarkably, all these dually localized proteins are TA proteins. To assess how extensive sharing of TA proteins between organelles is, and to identify additional peroxisomal proteins, we examined a number of TA proteins for localization and targeting ( Fig. 1; Fig. S1). Expression of tagged TA proteins in COS-7 cells and colocalization with organelle markers revealed a subset of mitochondrial TA proteins that were able to target both mitochondria and peroxisomes. These included the anti-apoptotic proteins BCL-XL (encoded by BCL2L1) and BCL2, the motor adaptors MIRO1 and MIRO2 (also known as RHOT1 and RHOT2, respectively), and OMP25 (also known as SYNJ2BP) ( Fig. 1; Fig. S1). BCL2 and MIRO2 were additionally targeted to the ER, which has already been reported for BCL2 (Krajewski et al., 1993). Peroxisomal localization of MIRO1 was confirmed by detection of the endogenous protein in organelle subfractions (Fig. S2).
In contrast, the pro-apoptotic TA proteins BAK (also known as BAK1) and BAX were targeted to mitochondria as was monoamine oxidase A (MAOA) ( Fig. 1; Fig. S1). Expression of ER TA proteins SEC61β, VAPB and FALDH isoform 2 (denoted FALDH-ER in this study; FALDH is also known as ALDH3A2) (Ashibe et al.,  2007) resulted in ER staining ( Fig. 1; Fig. S1); for FALDH-ER, localization was confirmed by assessing organelle subfractions (Fig. S2B). FALDH-PO, a splice variant of FALDH which only differs from FALDH-ER in its C-terminal tail (Fig. 1B) (Ashibe et al., 2007) was confirmed as a TA protein that exclusively targets peroxisomes ( Fig. 1; Fig. S2E). ACBD5 is another potential TA protein recently detected at peroxisomes (Islinger et al., 2007;Nazarko et al., 2014;Wiese et al., 2007). Overexpressed and endogenous ACBD5 showed peroxisomal localization in COS-7 cells ( Fig. 1A; Fig. S2A). Furthermore, ACBD5 localized to peroxisomal fractions separated by density gradient centrifugation and was found in the integral membrane protein fraction after carbonate treatment; differential permeabilization experiments also showed that its N-terminus faces the cytosol (Fig. S2C-E). Overall, these observations suggest that, in addition to TA proteins targeting either mitochondria or peroxisomes, a subset of mitochondrial TA proteins share overlapping targeting properties with peroxisomal TA proteins. In contrast, all tested ER-specific TA proteins showed no detectable peroxisomal localization.
High TMD hydrophobicity is not unique to ER TA proteins in mammals, but peroxisomal TA proteins contain a highly charged tail
Targeting information responsible for sorting of TA proteins to the ER and mitochondria is generally located within their C-termini (Borgese et al., 2003). Targeting signals are supposed to consist of general physicochemical parameters such as TMD hydrophobicity and tail charge. Compared to ER TA proteins, mitochondrial TA proteins are generally thought to possess less hydrophobic TMDs (Borgese and Fasana, 2011). This is the case for yeast, where ER TA proteins clearly differ from those targeted to mitochondria or peroxisomes by a more hydrophobic TMD (GRAVY>1.75) (Fig. 2A) (Beilharz et al., 2003). To identify organelle-specific targeting information for mammalian TA proteins, we analyzed the C-terminal sequences of 51 proteins whose localization had been characterized (including this study) and compared their physicochemical parameters (Dataset S1 available at https://doi.org/10.6084/m9.figshare.4758532). Whereas in yeast, a clear distinction between ER and mitochondrial TMD hydrophobicity is observed, this does not universally apply to mammalian TA proteins. Here, TMD hydrophobicity is more randomly distributed and not significantly different when compared to peroxisomal TA proteins (Fig. 2A-C). However, our analysis revealed a significantly higher positive net charge of the tail region in peroxisomal TA proteins (mean 6.03±1.03) compared to those routed to both peroxisomes and mitochondria (mean 2.5±0.43), to mitochondria only (mean 1.12±0.41) or to ER (mean 0.21±0.3) (mean±s.e.m., n=51; Fig. 2D). Significant differences in charge or hydrophobicity were not observed in regions preceding the TMD (Fig. S3). When tail length was assessed, peroxisomal TA proteins appeared to possess a significantly longer tail (Fig. S3A), but this did not appear to be a requirement for peroxisomal targeting, as ACBD5 contains a short tail comparable to the average tail length of the other groups.
We conclude that a highly positive net charge in the tail is a general property of all identified peroxisomal TA proteins in mammals (as shown for PEX26 by Yagita et al., 2013), which distinguishes them from mitochondrial and ER TA proteins. We further determined that there was a significantly higher TMD hydrophobicity in ER compared to mitochondrial TA proteins, indicating that a hydrophobic TMD and low tail charge support ER targeting.
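The two parameters compared above can be illustrated with a minimal Python sketch. It assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and, as a deliberate simplification, counts each Lys/Arg as +1 and each Asp/Glu as -1 for the tail charge. The non-integer charges reported in the text suggest that a pH-dependent calculation was used, so this count will not reproduce those values exactly. The function names and the toy sequences are hypothetical and are not taken from Dataset S1.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value of the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def net_charge(seq):
    """Simplified net charge (assumption): +1 per K/R, -1 per D/E.

    Histidine, the C-terminal carboxyl group and pH are ignored.
    """
    seq = seq.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


# Hypothetical toy sequences, for illustration only.
toy_tmd = "LLLLAAAA"
toy_tail = "KRKDE"
print("TMD GRAVY:", round(gravy(toy_tmd), 2))
print("Tail charge:", net_charge(toy_tail))
```

Applied to a real TA protein, the TMD and tail boundaries would first have to be defined, which this sketch leaves to the user.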
Alterations in tail charge and TMD hydrophobicity distribute TA proteins between peroxisomes, mitochondria and ER
To verify the bioinformatics results, we first analyzed a selection of ACBD5 mutants (Fig. 3A). The GFP-tagged ACBD5 TMD and C-terminal tail (GFP-ACBD5 TMD-T) fusion protein was targeted to peroxisomes, indicating that the TMD and tail region is sufficient for peroxisomal targeting (Fig. 3B). Mutations in the tail region (GFP-ACBD5 TMD-T MUT1), reducing tail charge from +4.9 to +2.9, resulted in targeting to mitochondria (Fig. 3B). Some dual localization to peroxisomes and mitochondria was observed for this mutant ('shared' in Fig. 3E), but all cells showed mitochondrial targeting (Fig. 3E; Fig. S4A). Further reducing tail charge to +0.9 (GFP-ACBD5 TMD-T MUT2) predominantly resulted in ER staining, with some cells showing targeting to both ER and mitochondria (Fig. 3B,E; Fig. S4A). This demonstrates that subtle changes in tail charge can route a peroxisomal TA protein to mitochondria, whereas further reduction in charge leads to ER targeting.
Fig. 2. The localization of predicted TA proteins in humans (Kalbfleisch et al., 2007) and yeast (Beilharz et al., 2003) was assessed via database and literature searches; TMD GRAVY and net tail charge were calculated for each. (A,B) Scatter plots depicting TMD GRAVY for each TA protein in yeast (A) and humans (B). (C,D) Box-and-whisker plots of TMD GRAVY (C) and tail charge (D) for human TA proteins. The box represents the 25-75th percentiles, and the median is indicated. The whiskers show the sample range. **P<0.01; ***P<0.001; ns, not significant compared to indicated group (unpaired t-test). Mitochondria (MITO) TA proteins (blue); peroxisome (PO) TA proteins (green); shared TA proteins (PO and MITO, white); ER TA proteins (salmon pink).
Our analysis showed that some ER TA proteins possess a positively charged tail, but unlike peroxisomal TA proteins this is generally combined with a highly hydrophobic TMD. To investigate whether an increase in TMD hydrophobicity can direct ACBD5 to the ER and 'override' the positively charged tail, we expressed a version of GFP-ACBD5 TMD-T (MUT3) with increased TMD hydrophobicity (Fig. 3A). MUT3 was directed to the ER and showed only minor peroxisomal targeting (Fig. 3B,E). This is in line with our data on FALDH, which possesses a highly hydrophobic TMD (GRAVY 2.4). This property (and the negative charge in the tail of -1.1) routes the major isoform (FALDH-ER) to the ER (Fig. 1A). Targeting a TA protein with a highly hydrophobic TMD to peroxisomes appears to require a highly positive net charge in the tail. Indeed, the tail of FALDH-PO is highly charged (charge +9.1), and overrides TMD hydrophobicity. To investigate whether increased tail charge can improve targeting to peroxisomes, we expressed a mutant version of FIS1 with increased tail charge, denoted FIS1-SR (Onoue et al., 2013) (Fig. 3C). Wild-type FIS1, as described previously (Koch et al., 2005), distributes to both mitochondria and peroxisomes, whereas FIS1-SR is predominantly targeted to peroxisomes (Fig. 3D,E; Fig S4B).
To more definitively establish correlations between TA protein sequence and localization, we utilized another model TA protein, GDAP1, which is predominantly mitochondrial but also localizes and functions at peroxisomes (Huber et al., 2013). We generated a systematic set of mutants with alterations in both TMD hydrophobicity and tail charge and assessed their localization (Fig. 4). This is shown graphically for each individual mutant in Fig. 4B-I as the percentage of cells displaying the indicated subcellular localization. For example, a GFP-GDAP1 TMD-T fusion protein with wild-type TMD and tail sequence was targeted to mitochondria alone in ∼28% of cells whereas ∼72% of cells showed dual mitochondrial and peroxisomal targeting ( Fig. 4B,C). In line with our other observations, increasing tail charge increased peroxisomal targeting of GDAP1 (while not completely abolishing mitochondrial targeting) whereas reducing tail charge resulted in ER targeting (Fig. 4D,E). Increasing TMD hydrophobicity was able to override tail charge, resulting in predominantly ER targeting (Fig. 4F) whereas reducing the hydrophobicity caused a shift to mitochondria (Fig. 4G,H). Finally, removing the tail altogether resulted in ER targeting (Fig. 4I). Overall, our data suggest an interplay between tail charge and TMD hydrophobicity in organelle targeting. We conclude that a highly charged tail in combination with a moderately hydrophobic TMD directs TA proteins to peroxisomes. Subtle changes can alter protein distribution: a reduction in tail charge or TMD hydrophobicity enables targeting of peroxisomal TA proteins to mitochondria whereas low charges in combination with a highly hydrophobic TMD favor transport to the ER; an increase in tail charge increases peroxisomal targeting by directly opposing the hydrophobic ER signal in the TMD. Our analysis also reveals that an increase in TMD hydrophobicity can 'override' tail charge and route peroxisomal TA proteins to the ER. 
Peroxisomal TA proteins interact with the peroxisomal import receptor PEX19
Targeting of membrane proteins to peroxisomes involves the import receptor PEX19 (Sacksteder et al., 2000). Owing to the restricted number of known peroxisomal TA proteins, studies on PEX19 interaction have focused on PEX26 (yeast PEX15p) (Chen et al., 2014b; Halbach et al., 2006; Yagita et al., 2013). For the dually targeted TA proteins FIS1 and GDAP1, interaction with PEX19 has been demonstrated (Delille and Schrader, 2008; Huber et al., 2013). Immunoprecipitation experiments revealed that the peroxisomal TA proteins FALDH-PO and ACBD5 interact with PEX19, whereas no interaction was observed for FALDH-ER (Fig. 5A). As the FALDH isoforms only differ in the tail sequence, this points to a role for the tail in PEX19 binding. Interaction with PEX19 was also demonstrated for FIS1 and FIS1-SR (Fig. S3G). With the GFP-ACBD5 TMD-T fusions, interaction was observed for the wild type (WT) but not for mutants 1-3, suggesting a requirement of high charge and moderate TMD hydrophobicity for PEX19 binding (Fig. 5B). This was confirmed in vitro by testing binding of fluorescently labeled peptides matching the TMD and tail region of ACBD5 to recombinant PEX19 by using fluorescence anisotropy (Fig. 5C). Whereas binding of the WT and MUT2 peptides to PEX19 was significantly different (Kd=0.9 µM and 7.7 µM, respectively), binding of peptide MUT1 to PEX19 was only slightly altered compared to WT (MUT1, Kd=1.9 µM). Binding of the fluorescent peptides to the control peroxisomal protein SurE was not observed, indicating specificity for PEX19 (Fig. S3H). The discrepancy between binding of MUT1 to PEX19 in vitro compared with the lack of interaction observed in the immunoprecipitation experiments may reflect the presence of competing factors in vivo (see Discussion).
We also investigated the targeting of peroxisomal TA proteins in PEX19-deficient fibroblasts. In control cells, FALDH-PO, ACBD5 and FIS1-SR were targeted to peroxisomes (Fig. 5D), whereas all three proteins were routed to mitochondria in PEX19-deficient cells (Fig. 5D), supporting a general role for PEX19 in receptor-mediated targeting of peroxisomal TA proteins. Importantly, all three proteins showed no observable ER localization in PEX19-deficient cells (Fig. S3), further confirming the overlap between peroxisomal and mitochondrial-targeting properties. Mechanistic insights into the biochemical activity of Neurospora crassa (Nc) PEX19 and the ER-targeting factor GET3 have recently been revealed using a cell-free assay (Chen et al., 2014b). By using this assay, the authors demonstrated that NcPEX19 but not NcGET3 was sufficient to prevent aggregation of NcPEX26. To gain further insight into the mechanisms controlling selective organelle targeting, we utilized this assay to test the activity of PEX19 and GET3 on our ACBD5 TMD-T constructs. ACBD5 TMD-T constructs were in vitro translated in HeLa extracts in the presence of recombinant NcPEX19 or NcGET3. NcPEX26 TMD-T and NcSEC61β TMD-T served as controls. In the absence of PEX19 and GET3, TA proteins form large aggregates seen as punctate structures in fluorescence microscopy images (Fig. 5E). Translation in the presence of PEX19 largely prevented aggregation of ACBD5 TMD-T WT and ACBD5 TMD-T MUT2, but had almost no effect on ACBD5 TMD-T MUT3 (comparable effect to that seen with SEC61β). By contrast, GET3 prevented aggregation of MUT3, but had significantly less impact on MUT2 and WT. Artificial aggregation by saturation of the chaperoning machinery was excluded by the use of excess NcPEX19 and NcGET3. For ACBD5 TMD-T MUT3, this correlates with our localization and PEX19-binding data, suggesting that upon increasing TMD hydrophobicity PEX19 activity is reduced whereas for GET3 it is apparently increased.
However, a charged tail does not exclude GET3 from having some activity on ACBD5 TMD-T WT in vitro, in line with previous observations (Yagita et al., 2013). Nevertheless, the positive charge in the tail of ACBD5 TMD-T WT increases the binding affinity to PEX19 when compared to an uncharged tail sequence, as shown by the fluorescence anisotropy assay (Fig. 5C).
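To give a feel for what the reported dissociation constants imply, the following minimal Python sketch evaluates a simple 1:1 binding isotherm. It is only an illustration under stated assumptions: the source does not specify the model used to fit the anisotropy data, the labeled peptide is assumed to be present at a concentration far below Kd so that free PEX19 approximates total PEX19, and the PEX19 concentration of 0.9 µM is chosen arbitrarily. The function name is hypothetical; the Kd values are those given in the text.

```python
def fraction_bound(receptor_um, kd_um):
    """Fraction of peptide bound for 1:1 binding (assumed model).

    Assumes the peptide concentration is much lower than Kd, so the
    free receptor concentration equals the total receptor concentration.
    """
    if receptor_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return receptor_um / (kd_um + receptor_um)


# Kd values (in µM) reported in the text for the ACBD5 TMD-T peptides.
KD_UM = {"WT": 0.9, "MUT1": 1.9, "MUT2": 7.7}

# Illustrative PEX19 concentration (assumption, not from the source).
PEX19_UM = 0.9

for peptide, kd in KD_UM.items():
    print(peptide, round(fraction_bound(PEX19_UM, kd), 2))
```

Under these assumptions, half of the WT peptide would be bound at 0.9 µM PEX19, about a third of MUT1 and about a tenth of MUT2, which is consistent with MUT1 binding being only slightly altered while MUT2 binding is clearly weakened.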

Prediction of TA protein localization in mammalian cells
Finally, we exploited the compiled data to predict targeting of uncharacterized TA proteins (Fig. 6). We trained an SVM classifier using the TMD GRAVY, tail charge and cellular location of 43 proteins from our dataset (Fig. 1 and Dataset S1 available at https://doi.org/10.6084/m9.figshare.4758532). This classifier builds a statistical model able to predict the probability that a protein is targeted to each organelle (Fig. 6A). Peroxisomal, mitochondrial and ER TA proteins can clearly be separated into regions of high class probability or clusters, with very few exceptions (e.g. mitochondrial TOMM22 clusters with ER). When using the highest probability class, the SVM misclassifies 9 of the 43 data points (21%) when used in an in-sample fashion. A more rigorous leave-one-out cross validation misclassifies 14 of the 43 data points (33%). To assess the predictive power of our classifier, we analyzed a published list of predicted human TA proteins (Kalbfleisch et al., 2007) and generated probabilities for peroxisomal, mitochondrial and ER targeting (Dataset S2 available at https://figshare.com/s/07df2992d588a2f0c98d). The predicted localization of three proteins was experimentally verified (Fig. 6B). In agreement with our prediction, the candidate TA protein ACBD4 localized to peroxisomes (Fig. 6B). ACBD4 shares 58% sequence identity with ACBD5, mainly due to similarities in the N-terminal acyl-CoA-binding domain, but the amino acid sequence in the C-terminus is significantly different. ATP5J2, which was shown to be a minor component of the mitochondrial ATP synthase complex (Aggeler et al., 2002), was also predicted to be targeted to peroxisomes. Expression of Myc-ATP5J2 revealed dual targeting to mitochondria and peroxisomes (Fig. 6B). This is in accordance with proteomics studies reporting other ATP synthase subunits in peroxisomal fractions (Wiese et al., 2007), but how these proteins might function at peroxisomes is unclear.
Finally, we analyzed the targeting of PPP1R3F, a potential regulatory subunit of protein phosphatase type 1 complexes (Kelsall et al., 2011). Predicted targeting to the ER was confirmed by expression of Myc-PPP1R3F in COS-7 cells (Fig. 6B).
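The difference between the in-sample and leave-one-out error rates quoted above can be illustrated with a small, dependency-free Python sketch. Note that this is not the SVM used in the study: a nearest-centroid classifier is swapped in as a stand-in so that the example stays self-contained, and the (TMD GRAVY, tail charge) points are hypothetical toy values, not entries from Dataset S1. All function names are assumptions made for illustration.

```python
from collections import defaultdict


def centroids(points):
    """Mean (gravy, charge) per organelle label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for gravy, charge, label in points:
        acc = sums[label]
        acc[0] += gravy
        acc[1] += charge
        acc[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}


def predict(cents, gravy, charge):
    """Label of the nearest centroid (squared Euclidean distance)."""
    return min(
        cents,
        key=lambda lab: (cents[lab][0] - gravy) ** 2 + (cents[lab][1] - charge) ** 2,
    )


def in_sample_errors(points):
    """Train on all points, then count misclassified training points."""
    cents = centroids(points)
    return sum(predict(cents, g, c) != lab for g, c, lab in points)


def leave_one_out_errors(points):
    """Hold out each point in turn, train on the rest, test on the held-out point."""
    errors = 0
    for i, (g, c, lab) in enumerate(points):
        training = points[:i] + points[i + 1:]
        errors += predict(centroids(training), g, c) != lab
    return errors


# Hypothetical toy data: (TMD GRAVY, tail charge, organelle).
TOY = [
    (1.5, 6.0, "PO"), (2.0, 5.0, "PO"), (1.8, 7.0, "PO"),
    (1.2, 2.0, "MITO"), (1.5, 1.0, "MITO"), (1.0, 1.5, "MITO"),
    (2.6, 0.2, "MITO"),  # mitochondrial outlier that sits in the ER cluster
    (2.5, 0.0, "ER"), (2.8, 0.5, "ER"), (2.4, -1.0, "ER"),
    (2.0, 0.5, "ER"),    # borderline ER point
]

print("in-sample errors:", in_sample_errors(TOY), "of", len(TOY))
print("leave-one-out errors:", leave_one_out_errors(TOY), "of", len(TOY))
```

In this toy set the borderline ER point is classified correctly only when it contributes to its own class centroid, so the leave-one-out count is higher than the in-sample count. The same effect explains why held-out evaluation gives the more conservative estimate of predictive power.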

DISCUSSION
Hundreds of TA proteins have been predicted bioinformatically in a wide range of organisms (Beilharz et al., 2003; Kalbfleisch et al., 2007; Kriechbaumer et al., 2009). Several have been associated with human disorders, but many are still of unknown function or localization. A better understanding of the mechanisms that determine targeting and localization is of great value for the study of TA proteins and organelle function, in particular in humans where mistargeting may cause hitherto undetected disorders.
In the present study, we characterize the physicochemical parameters of a large number of TA proteins in mammals and increase the number of bona fide peroxisomal TA proteins significantly, allowing us to identify targeting information and bioinformatically predict targeting.
Recent studies determining targeting properties for mitochondrial and peroxisomal TA proteins demonstrated that targeting to both organelles requires a positively charged C-terminal tail sequence (Horie et al., 2002;Isenmann et al., 1998;Kuroda et al., 1998;Yagita et al., 2013). Our data clearly demonstrate that a highly positive net charge in the tail region is a general property of all identified peroxisomal TA proteins in mammals, which distinguishes them significantly from mitochondrial and ER TA proteins. As shown for ACBD5, a step-wise reduction in the tail charge results first in mitochondrial and subsequently in ER mistargeting. In line with this, an increase in tail charge can direct TA proteins from mitochondria or the ER to peroxisomes, as exemplified by GDAP1 MUT1, FIS1-SR and FALDH-PO. These data fit a model where a highly charged tail promotes interaction with the peroxisome import receptor PEX19 (Fig. 7). We provide evidence that ACBD5, FALDH-PO and FIS1 interact with PEX19, whereas mutants with a reduction in tail charge lose this ability and are mistargeted. This is also reflected in our in vitro binding assay using C-terminal peptides. Binding to PEX19 has also been demonstrated for other TA proteins (PEX26, FIS1, GDAP1 and FAR1) (Delille and Schrader, 2008;Halbach et al., 2006;Honsho et al., 2013;Huber et al., 2013). Overall, these findings support a general role for PEX19 in the direct receptor-mediated targeting of peroxisomal TA proteins in mammals (Fig. 7). Nevertheless, additional proteins at the organelle membranes may prevent insertion or induce excision of TA proteins missorted by the cytosolic shuttle systems (Chen et al., 2014a;Okreglak and Walter, 2014).
Whereas in yeast a clear distinction between ER and mitochondrial TMD hydrophobicity is observed, this property does not universally apply to mammalian TA proteins. Instead, our data reveal an interplay between tail charge and TMD hydrophobicity. This is exemplified by FALDH-PO and FALDH-ER, which share a highly hydrophobic TMD, suggesting ER targeting of both. Instead, the highly charged tail routes FALDH-PO to peroxisomes. Our analysis also reveals that an increase in TMD hydrophobicity can 'override' tail charge and route peroxisomal TA proteins to the ER and vice versa.
We further determined a significantly higher TMD hydrophobicity in ER TA proteins than in those targeted to mitochondria, indicating that a hydrophobic TMD and a low tail charge support ER targeting in mammals. While this study was under review, Rao et al. (2016) proposed helical content of the TMD (based on the AGADIR helix propensity scale) as an additional factor relevant for ER targeting in yeast. Interestingly, our GDAP1 TMD-T model proteins exhibit similar variations in AGADIR values as the model proteins used by Rao and colleagues (GDAP1 TMD-T MUT3=1.05, MUT4=0.73, WT=0.57, MUT5=0.48) and similarly show a shift in subcellular localization from the ER to mitochondria. Thus, helical content of the TMD may also be considered as a parameter relevant for TA protein targeting in mammals.
Mitochondrial TA proteins have been proposed to be targeted by a default route allowing either unassisted insertion of TA proteins (Krumpe et al., 2012; Setoguchi et al., 2006) or using as yet uncharacterized targeting factors for mitochondrial TA proteins (Kemper et al., 2008) (Fig. 7). Our data support the idea that highly hydrophilic TMDs are preferentially inserted into mitochondrial outer membranes (Fig. 3). Interestingly, in the absence of PEX19, a hydrophobic TMD (FALDH-PO) combined with a highly charged tail does not prevent TA protein targeting to mitochondria, and in these conditions no ER localization is observed. This supports a model where mitochondrial TA proteins are targeted by a positive selection mechanism (Fig. 7), suggesting the existence of an as-yet-undetected shuttle protein, or a default mitochondrial pathway able to insert charged TA proteins with a higher kinetic efficiency than the GET pathway even when they possess highly hydrophobic TMDs. Interestingly, hydrophobic TMD versions of PEX26 and ACBD5 are protected from aggregation by NcGET3 despite their highly charged tails (Chen et al., 2014b; Fig. 5, this study). In their recent publication, Rao and colleagues elegantly showed that tail charge and TMD hydrophobicity influence the yeast GET pathway at three distinct steps: (1) capture by SGT2, (2) transfer from SGT2 to GET3, and (3) targeting and insertion into the ER membrane (Rao et al., 2016). During any of these steps TA proteins may be rejected and would then be available for other organelle-targeting machinery (e.g. PEX19). In the first step, binding to SGT2 depends on the properties of the TMD with no dependence on tail charge. TA proteins with TMDs containing highly hydrophobic or highly helical content form more stable complexes with SGT2 and more readily enter the ER pathway. In the second step, GET3 appears to have similar substrate preferences to SGT2.
In the final step, GET3-bound substrates that are highly hydrophobic are more likely to be maintained in a stable complex long enough to reach the ER membrane for insertion. At this stage, any positive charge in the tail region drastically reduces import into the ER membrane. Although this model is based on yeast proteins, it may provide an explanation for many of the observations we make here. For example, the TMD of ACBD5 has relatively low hydrophobicity/helical content. Thus, based on its TMD, wild-type ACBD5 is a suboptimal substrate for SGT2/GET3. In addition, its charged tail (+4.9, which is higher than the charge observed in any known ER-resident TA protein) would be highly inefficient at inserting into the ER membrane. This would allow PEX19 multiple opportunities to interact with ACBD5 and facilitate its delivery to peroxisomes. When the tail charge is reduced, as in ACBD5 TMD-T MUT1, SGT2/GET3 affinity is unchanged, and the tail charge (+2.9) is still suboptimal for ER insertion, but PEX19 affinity is slightly reduced (Kd=1.9 µM vs 0.9 µM in wild-type); potentially the affinity for either a mitochondrial chaperone or the mitochondrial membrane itself is optimal. ACBD5 TMD-T MUT2, like MUT1, is able to initially bind to SGT2/GET3 to enter the ER pathway (Fig. 5G) but unlike MUT1 it now has an uncharged tail and so can successfully pass the ER membrane checkpoint.
Our findings also explain why peroxisomes and mitochondria share a significant number of TA proteins such as FIS1, MFF, MAVS and GDAP1 (Koch et al., 2005; Gandre-Babbe and van der Bliek, 2008; Dixit et al., 2010; Huber et al., 2013). Although our results from overexpression experiments cannot definitively prove the in vivo localization of all the mitochondrial TA proteins investigated in this study, they still underline the overlap in targeting information for both organelles. As subtle changes in the tail charge can shift TA targeting between peroxisomes and mitochondria, it is likely that some exchange occurred through mutations during coevolution of both organelles (Martin, 2010). Binding to PEX19 may have been the selective force allowing the development of new functions for peroxisomes. Based on our findings, those shared functions may also include regulation of organelle motility and apoptosis, but these await further confirmation in vivo. Very recently peroxisome permeability was reported to be influenced by pro-apoptotic proteins (Hosoi et al., 2017). Thus, anti-apoptotic proteins at peroxisomal membranes could protect the organelles from excessive matrix protein release into the cytosol. Exchange of TA proteins between peroxisomes and the ER appears to be more difficult to achieve, requiring more significant sequence changes. This is exemplified by FALDH, which exploits alternative splicing to allow targeting to peroxisomes or the ER. Here, we demonstrate that the characteristic physicochemical features of the TMD and tail region allow prediction of TA protein localization. Correlating data from the classifier analysis with our experimental approaches confirms that ER TA proteins are primarily sorted according to their high TMD hydrophobicity, which is required for efficient GET3/TRC40 chaperone activity. Peroxisomal TA proteins possess some tolerance in TMD hydrophobicity, but a highly positive tail charge appears to be the primary selective force for PEX19 binding. In mitochondrial TA proteins, low TMD hydrophobicity seems to be favorable for efficient membrane insertion, whereas tail charge appears to shield from selection for effective ER-membrane insertion (Rao et al., 2016). Importantly, both parameters, TMD hydrophobicity and tail charge, exhibit competitive effects on organellar targeting; thus, proteins with a comparatively low or high TMD hydrophobicity, which on its own would favor mitochondrial and ER targeting, respectively, can still be targeted to peroxisomes if the TMD is followed by a highly charged tail. Alternatively, TA proteins with charged tails can be routed to the ER or mitochondria, if they exhibit appropriately hydrophobic or hydrophilic TMDs.
Fig. 7. Schematic model for TA protein targeting to ER, mitochondria and peroxisomes in mammalian cells. Specific targeting of TA proteins to ER, mitochondria and peroxisomes in mammalian cells is mediated by a combination of TMD hydrophobicity and tail charge. Targeting of TA proteins to the ER involves the GET (guided entry of TA proteins) pathway. ER TA proteins interact with a cytosolic sorting complex (composed of BAG6, TRC35/GET4 and Ubl4a/GET5) and are delivered and inserted into the ER membrane by TRC40 (GET3) and WRB (GET1). A WRB/CAML dimeric membrane receptor (functional homolog to GET1/2) accepts the TA protein from TRC40 at the ER. A hydrophobic TMD and low tail charge support ER targeting in mammals. Targeting of TA proteins to peroxisomes is mediated by PEX19 and PEX3. Peroxisomal TA proteins are characterized by a highly charged tail that promotes PEX19 interaction. TA proteins with a hydrophobic TMD require increased tail charge to be targeted to peroxisomes. It is currently unknown whether delivery and insertion of TA proteins into mitochondria involves specific targeting factors or is primarily unassisted. Mitochondrial TA proteins generally possess a less hydrophobic TMD than ER TA proteins and a less charged tail compared to peroxisomal TA proteins. This scheme is based on the steady state distribution of TA proteins, but other processes such as membrane extraction and TA protein degradation may also influence the subcellular localization. (Please note that the illustration of the GET pathway has been simplified.) BAG6, BCL2-associated athanogene cochaperone 6; TRC, transmembrane domain recognition complex; Ubl4a, ubiquitin-like 4a; WRB, tryptophan-rich basic protein.
However, besides these general features, other properties may influence organelle-specific targeting, for example additional signals within the N-terminus or accessibility of the tail region. An example may be GDAP1L1, which, when expressed, is cytosolic and can only be targeted to mitochondria upon specific stimulation (Niemann et al., 2014). Additional parameters influencing targeting (e.g. position of hydrophobic and charged residues in the tail, or helical propensity of the TMD) could add another dimension to the classifier, improving its predictive power.
It should be noted that several of the shared peroxisome-mitochondria or peroxisomal TA proteins are of medical importance and have been linked to human disorders (Abu-Safieh et al., 2013; Ferdinandusse et al., 2016; Huber et al., 2013; Keller et al., 2014; Koch et al., 2016; Shamseldin et al., 2012). Our predictor has allowed us to determine potentially new membrane-associated functions for peroxisomes and other organelles. It will be a great challenge for future studies to verify the localization of the endogenous TA proteins, their cell type- or organ-specific expression and to elucidate their cellular functions and importance for organelle biology and human health.

Plasmids and antibodies
For initial cloning of human genes, total RNA was extracted from HepG2 cells by using TRIzol reagent, and was then reverse transcribed into cDNA and used as a PCR template. Gene synthesis was performed by Genscript (Genscript, Piscataway) or Eurofins (Eurofins Genomics, Ebersberg, Germany). See Table S1 for details of plasmids generated in this study, Table S2 for details of primers used and Table S3 for other plasmids. Site-directed mutagenesis was achieved by means of the QuikChange Kit (Agilent). Details on antibodies can be found in Table S4.

Immunofluorescence and microscopy
Cells were processed for immunofluorescence at 24 or 48 h after transfection. Cells grown on glass coverslips were fixed with 4% paraformaldehyde (PFA) in PBS (pH 7.4), permeabilized with 0.2% Triton X-100 and incubated with antibodies as described previously (Bonekamp et al., 2013). For differential permeabilization, cells were either permeabilized with 0.2% Triton X-100 or 2.5 μg/ml digitonin. Cell imaging was performed using an Olympus IX81 microscope equipped with an UPlanSApo 100×1.40 NA Oil objective (Olympus Optical, Hamburg, Germany). Digital images were taken with a CoolSNAP HQ2 CCD camera and adjusted for contrast and brightness using the Olympus Soft Imaging Viewer software (Olympus Soft Imaging Solutions GmbH) and MetaMorph 7 (Molecular Devices). Confocal images were obtained using a Zeiss LSM 510 META inverted microscope equipped with a Plan Apochromat 63×1.4 NA (oil/dic) objective (Carl Zeiss, Oberkochen, Germany), using the Ar 488 nm and He 543 nm laser lines. Digital images were adjusted for contrast and brightness using the Zeiss LSM Image Browser software (Carl Zeiss MicroImaging GmbH).

Subcellular fractionation
Peroxisome purification from rat liver was performed as described previously (Islinger et al., 2012). In brief, liver tissue was homogenized in homogenization buffer (HB; 250 mM sucrose, 5 mM MOPS, 1 mM EDTA, 2 mM PMSF, 1 mM DTT, 1 mM ɛ-aminocaproic acid and 0.1% ethanol, pH 7.4) using a Potter-Elvehjem tissue grinder (1 stroke/120 s). The homogenate was clarified in an initial centrifugation step at 600 g for 10 min. The resulting pellet was re-homogenized and re-centrifuged applying the same conditions; both supernatants were pooled and comprise the post-nuclear supernatant (PNS). Subsequently, PNS was centrifuged at 1900 g for 15 min to yield the pellet of heavy mitochondria (HM). The resulting supernatant was centrifuged at 25,500 g for 20 min resulting in the light mitochondrial pellet (LM). The corresponding supernatant was again centrifuged at 100,000 g for 30 min to separate the microsomal pellet (MIC) from cytosol (CYT). To increase the purity of the fractions, each pellet recovered was washed in 5 ml HB/g liver tissue and centrifuged using the same parameters. Highly purified peroxisomes were obtained from the LM pellet by applying a sigmoidal Optiprep gradient from 1.26-1.12 g/ml in a vertical type rotor at an integrated force of 1256×10⁶ g min. Here, peroxisomes form a distinct band at 1.20 g/ml.
Subcellular separation of homogenates from HepG2 cells was performed in a modified procedure. Harvested cells were homogenized in HB using a syringe (needle 27G, 7 strokes). The differential centrifugation series was performed at 500 g (PNS), 2000 g (HM), 20,000 g (LM), 100,000 g (MIC and CYT). The LM fraction was subsequently separated on a linear Nycodenz gradient from 1.26-1.12 g/ml at 100,000 g for 3 h. The gradient was eluted in 12 equal-sized fractions for further analysis.
Integral membrane proteins were prepared from the peroxisome-enriched fraction LM using the carbonate-stripping method (Fujiki et al., 1982). An aliquot of LM was pelleted at 25,500 g and suspended in a hypo-osmotic TVBE buffer for organelle rupture (1 mM NaHCO3, 1 mM EDTA, 0.01% Triton X-100, pH 7.6). After 30 min incubation on ice the organelle suspension was centrifuged at 100,000 g to yield a soluble matrix fraction and crude membrane pellet. The membrane pellet was subsequently resuspended in 0.1 M Na2CO3 and incubated on ice for 30 min to remove peripherally attached membrane proteins. The integral membrane pellet was prepared by centrifugation at 100,000 g and washed in TVBE buffer applying the same centrifugation parameters. Samples (equal amounts of protein) were subsequently analyzed by immunoblotting.

Immunoprecipitation
For immunoprecipitation experiments, GFP- or FLAG-tagged TA proteins and HA-tagged PEX19 were expressed in COS-7 cells. After 48 h, cells were washed in PBS and then incubated with 1 mM DSP followed by quenching with 100 mM Tris-HCl pH 7.4. Cells were lysed in ice-cold lysis buffer (25 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.5% Triton X-100, 1 mM PMSF and protease inhibitor cocktail), undissolved material was pelleted by centrifugation at 15,000 g and lysates were mixed with GFP-TRAP (ChromoTek) or FLAG-antibody-coupled agarose beads and incubated for 2 h at 4°C.
Beads were subsequently washed extensively with lysis buffer and bound proteins eluted with either Laemmli buffer (GFP-TRAP) or 50 mM NaOH (FLAG beads). Immunoprecipitates and total lysates were subsequently analyzed by immunoblotting.

Expression and purification of PEX19 and SurE
Full-length human PEX19 was cloned into vector pETM11. For PEX19 expression, E. coli BL21(DE3)RIL cells were incubated in autoinduction medium (Studier, 2005) at 20°C for 16 h. Subsequently, cells were pelleted, re-suspended in lysis buffer (50 mM Hepes pH 7.5, 200 mM NaCl, 20 mM imidazole), lysed by sonication, loaded onto Ni-NTA resin and eluted with elution buffer (50 mM Hepes pH 7.5, 200 mM NaCl, 300 mM imidazole). The eluted protein was dialyzed overnight into dialysis buffer (50 mM Tris-HCl pH 7.5, 250 mM NaCl, 0.5 mM TCEP) and simultaneously digested with TEV protease (1:50 molar ratio). The protease, affinity tag and undigested protein were removed via a second affinity chromatography step and the cleaved protein was concentrated and purified via size-exclusion chromatography (HiLoad 16/600 Superdex 75 pg, GE Healthcare). SurE (http://www.uniprot.org/uniprot/Q8LAM2) was expressed as an N-terminal poly-histidine fusion in E. coli strain BL21(DE3)RIL using auto-induction medium at 37°C for 4 h and 21°C overnight. Purification was as for PEX19 but following elution the eluted protein was further purified by using a 16/600 Superdex 200 pg column.

Fluorescence anisotropy
Fluorescently labeled peptides ACBD5 WT (FITC-SPGVLTFAIIWPFIAQWLVYLYYQRRRRKL), MUT1 (FITC-SPGVLTFAIIWPFIAQWLVYLYYQRARAKL) and MUT2 (FITC-SPGVLTFAIIWPFIAQWLVYLYYQAAAAKL) (Genscript) were used in the assay at a final concentration of 6.7 nM. Note that the C-terminal asparagine residue was removed to facilitate peptide synthesis. Assays were performed in black 96-well plates (Greiner) with an Infinite M1000 plate reader (TECAN) regulated at 25°C, with excitation and detection at 470 and 530 nm, respectively. The experiment was performed in dialysis buffer with 0.67 mg/ml BSA to prevent unspecific binding on the surface of the plastic well. The protein concentration series was obtained by successive dilution by a factor of 1.5 and each point was measured in triplicate. Being highly hydrophobic, the peptides have a tendency to aggregate, resulting in an unusual decrease in anisotropy upon protein titration. Addition of detergents in the buffer prevented aggregation, but interfered with the interaction, and therefore we chose to perform the experiment in a detergent-free buffer, which provided reproducible data. Three independent measurements were performed and binding data were analyzed using Prism (GraphPad software). Binding profiles were fitted using a simple model (hyperbolic equation) assuming 1:1 stoichiometry.
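The dilution scheme and the hyperbolic 1:1 binding model described above can be illustrated with a minimal Python sketch. It is not the Prism analysis itself: the top protein concentration (50 µM), the anisotropy values of free and bound peptide, and the grid search used in place of nonlinear regression are all assumptions made for illustration. The model also assumes that the protein is in large excess over the 6.7 nM peptide and that fluorescence intensity does not change on binding.

```python
# Illustrative sketch only. The top concentration and the r_free/r_bound values
# are hypothetical; the grid search stands in for nonlinear regression in Prism.

def dilution_series(top_conc_um, n_points, factor=1.5):
    """Concentrations obtained by successive dilution by `factor`."""
    return [top_conc_um / factor ** i for i in range(n_points)]

def fraction_bound(protein_um, kd_um):
    """Hyperbolic 1:1 binding isotherm (protein in excess over peptide)."""
    return protein_um / (kd_um + protein_um)

def anisotropy(protein_um, kd_um, r_free, r_bound):
    """Observed anisotropy as a weighted mix of free and bound peptide."""
    return r_free + (r_bound - r_free) * fraction_bound(protein_um, kd_um)

def fit_kd(concs, observed, r_free, r_bound, kd_grid):
    """Return the Kd from `kd_grid` that minimises the sum of squared residuals."""
    def sse(kd):
        return sum((anisotropy(c, kd, r_free, r_bound) - o) ** 2
                   for c, o in zip(concs, observed))
    return min(kd_grid, key=sse)

concs = dilution_series(top_conc_um=50.0, n_points=12)   # hypothetical 50 µM top
data = [anisotropy(c, 0.9, 0.05, 0.20) for c in concs]   # noiseless synthetic data
kd_grid = [i / 100 for i in range(1, 1001)]              # 0.01 to 10.00 µM
best_kd = fit_kd(concs, data, 0.05, 0.20, kd_grid)
print(best_kd)
```

With real triplicate measurements, the free and bound anisotropy values would normally be fitted together with Kd rather than fixed as they are here.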

Cell-free chaperone assay
mRNA was generated and purified following the manufacturer's instructions (mMessage mMachine T7 Transcription kit and MEGAclear kit; Ambion). Translation reactions were conducted as previously described (Chen et al., 2014b). In short, mRNAs were translated in HeLa cell translation extracts using the 1-Step Human Coupled IVT Kit for DNA (Thermo Scientific) according to the manufacturer's instructions. Reactions were incubated for 2.5 h using 5 μM of chaperone proteins, and western blotting was used to control for levels of protein expression. Images of aggregates were taken with an epi-fluorescence microscope (BX51; Olympus) equipped with a 100×1.4 NA oil immersion objective and a GFP filter cube. The excitation wavelength is between 457 and 487 nm, the emission wavelength is between 502 and 538 nm, and the dichroic cut-off wavelength is 495 nm. A total of ten separate images were used to generate a maximum projection image in Fiji software. A magnified representative area is shown in Fig. 5. Aggregates were quantified with the 'Analyze Particle' function in Fiji. Three independent experiments were conducted and analyzed with an unpaired t-test. Purification of NcPEX19 and NcGET3 was performed as described previously (Chen et al., 2014b). Briefly, NcPEX19 and NcGET3 were expressed from the pET15b (Novagen) vector in E. coli BL21 (DE3, Stratagene) and purified with Ni-NTA resin following the manufacturer's instructions (Qiagen). Eluted proteins were further purified by size-exclusion chromatography using a Hi-load 16/60 Superdex 200 prep grade column (GE Healthcare) equilibrated in Buffer H (20 mM HEPES, pH 6.8, 50 mM KOAc, 200 mM sorbitol and 1 mM MgCl2).

Sequence and bioinformatics analysis
Data on human TA proteins was sourced from the literature (references in Dataset S1 available at https://figshare.com/s/07df2992d588a2f0c98d). SNARE proteins were omitted as they have been previously shown to differ significantly from other ER TA proteins (Kalbfleisch et al., 2007). Protein sequences were obtained from the NCBI database (http://www.ncbi.nlm.nih.gov/), all isoforms were analyzed and those that lacked a C-terminal TMD were removed. Yeast TA proteins were sourced from literature and by homology with human proteins. For the detection of the membrane-spanning helices in the TA proteins, the TMHMM server v. 2.0 (Krogh et al., 2001) was used. When no TMD was predicted but the protein had been characterized as a TA protein, the TMPred server from ExPASy was used, with a threshold score of 1500 (Hofmann and Stoffel, 1993). As a measure for hydrophobicity, the Grand Average of Hydropathicity (GRAVY) of membrane-spanning helices was calculated (Kyte and Doolittle, 1982), using the ProtParam server from ExPASy (Gasteiger et al., 2005). The charge of the tail sequence was calculated using the Protein Calculator v3.4 (http://protcalc.sourceforge.net). Box-and-whisker plots were created with GraphPad Prism 5 (GraphPad Software) with whiskers representing the smallest and largest value in the sample. PEX19-binding sites were analyzed using the BLOCKS algorithm from the PeroxisomeDB 2.0 database (Schlüter et al., 2007).
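For readers who wish to compute the two descriptors locally, the following Python sketch shows how GRAVY and an approximate tail charge could be derived from a sequence. The hydropathy values are those of Kyte and Doolittle (1982). The charge function is only an approximation: it uses the Henderson-Hasselbalch equation with textbook side-chain pKa values chosen for illustration, ignores the termini, and is not guaranteed to reproduce the output of Protein Calculator v3.4. The example TMD and tail sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = seq.upper()
    return sum(KD_SCALE[aa] for aa in seq) / len(seq)

# ASSUMPTION: approximate textbook side-chain pKa values, used for illustration
# only; they are not taken from Protein Calculator v3.4.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}

def side_chain_charge(seq, ph=7.0):
    """Net side-chain charge from Henderson-Hasselbalch; termini are ignored."""
    charge = 0.0
    for aa in seq.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge

tmd = "LLIVAAFGLLIVAAFGLLIV"   # hypothetical TMD, not from the dataset
tail = "RRKL"                   # hypothetical tail sequence
print(round(gravy(tmd), 2), round(side_chain_charge(tail), 2))
```

Because the pKa set and the treatment of the termini differ between tools, charges computed this way should be compared only with values obtained by the same procedure.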
We trained a support vector machine (SVM) classifier (Cortes and Vapnik, 1995) on the [protein data] using the SVM application in package e1071 (Meyer et al., 2014) of the R statistical programming environment (R Core Team, 2014; http://www.R-project.org/), which utilizes the LIBSVM library of Chang and Lin (Chang and Lin, 2011). The SVM takes the training set of [Tail Charge, GRAVY and location in cell], and builds a statistical model to predict the probability of [location in cell], given any combination of [Tail Charge, GRAVY]. Initially, we restricted the training data to three unique classes, corresponding to [location in cell] of mitochondria (MITO), peroxisomes (PO) and endoplasmic reticulum (ER).

Statistical analyses
Analyses of GRAVY, charge, tail length and PEX19 binding were performed using GraphPad Prism 5 software. A two-tailed unpaired t-test was used to determine statistical differences against the indicated group (*P<0.05, **P<0.01, ***P<0.001). For qualitative analyses of organelle-specific targeting of TA proteins, a minimum of 300 cells were examined per condition, and organelle localization was microscopically assessed in at least three independent experiments. Data are presented as mean±s.e.m.
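As a rough illustration of the summary statistics named above, the Python sketch below computes mean±s.e.m., the t statistic of an unpaired t-test and the asterisk notation for a given P value. It assumes the equal-variance (Student) form of the test, which may differ from the settings used in Prism, and the function names are hypothetical. Converting the t statistic to a two-tailed P value requires the t distribution, which is not part of the Python standard library and is therefore left out.

```python
import math
import statistics

def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem

def unpaired_t(a, b):
    """Student's t statistic and degrees of freedom (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

def stars(p):
    """Map a P value to the asterisk notation (*P<0.05, **P<0.01, ***P<0.001)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# Hypothetical replicate values, for demonstration only.
control = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]
t_stat, dof = unpaired_t(control, treated)
print(mean_sem(control), t_stat, dof)
```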