Identification of Cargo Proteins Specific for the Nucleocytoplasmic Transport Carrier Transportin by Combination of an in Vitro Transport System and Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC)-based Quantitative Proteomics

The human importin-β family consists of 21 nucleocytoplasmic transport carrier proteins that carry proteins and RNAs across the nuclear envelope through nuclear pores in specific directions. These transport carriers are responsible for the nucleocytoplasmic transport of thousands of proteins, but the cargo allocation of each carrier, which is necessary for understanding the physiological context of transport, is poorly characterized. To address this issue, we developed a high-throughput method to identify the cargoes of transport carriers by combining stable isotope labeling by amino acids in cell culture with an in vitro transport system. Our method can be outlined in three steps. (1) Cells are cultured in a medium containing stable isotope-labeled amino acids. (2) The plasma membranes of the labeled cells are permeabilized, and proteins extracted from unlabeled cells are transported into the nuclei of the permeabilized cells. In this step, the reaction system is first depleted of all importin-β family carriers and then supplemented with a particular importin-β family carrier of interest. (3) Proteins in the nuclei are extracted and analyzed quantitatively by LC-MS/MS. As an important test case, we used this method to identify cargo proteins of transportin, a representative member of the importin-β family. As expected, the identified candidate cargo proteins included previously reported transportin cargoes as well as new potential cargoes, which we corroborated with in vitro binding assays. The identified cargoes are predominantly RNA-interacting proteins, supporting the notion that cargoes allotted to the same carrier share functional characteristics. Finally, we found that the transportin cargoes possessed at least two classes of signal sequences: the well-characterized PY-nuclear localization signals specific for transportin, and Lys/Arg-rich segments capable of binding to both transportin and importin-β.
Thus, our method will be useful for linking a carrier to features shared among its cargoes and to specific nuclear localization signals.

Most genetic processes, including chromosome replication and transcription, occur in the nucleus; thus, the selection of proteins that enter the nucleus and act there is crucial for cellular function (1). During interphase, all nuclear proteins cross the nuclear envelope via nuclear pores, channels that constitute a selective permeability barrier for macromolecules. Only a fraction of proteins traverse these channels by free diffusion; most nuclear proteins are imported into or exported out of the nucleus with the assistance of importin-β (Imp-β) family proteins (Imp-βs) (2,3). Imp-βs interact with the nuclear pore complex in a way that allows them to travel in and out of the nucleus. To drive cargo transport, they utilize the GTPase cycle of the small GTPase Ran. Import carriers bind to their cargoes in the cytoplasm and then travel to the nucleus, where they release their cargoes upon binding to RanGTP. The RanGTP-bound carriers recycle back to the cytoplasm, where RanGTP is converted to RanGDP. Export carriers form trimeric complexes with their cargoes and RanGTP in the nucleus, and these export complexes disassemble in the cytoplasm (2,3).
The human genome encodes 21 Imp-βs, of which 12 are nuclear import carriers, 7 are export carriers, and 2 are bidirectional carriers (4,5). Among the Imp-βs, Imp-β itself is special in that it often uses Imp-α family proteins as adapters to bind to cargoes, although the transport activity is still essentially attributable to Imp-β (6).
Imp-βs import thousands of cargo proteins into the nucleus and export a large number of others. Although some cargoes are transported by multiple carriers, most are expected to be transported via specific interactions, and the group of cargoes transported by a given carrier might share specific functions. Therefore, to reveal the role of each carrier in various cellular processes, it is imperative to define which carriers transport which cargoes. Unfortunately, only a limited number of cargo-carrier interactions have been characterized, and few or no cargoes have been linked to some Imp-β family carriers.
Systematic searches for cargo proteins have been performed, especially for those that depend on Imp-α to associate with Imp-β, using Far Western-based cellular proteomics (7) and the screening of in vitro virus libraries (8) and oriented peptide libraries (9). The experiments using synthetic libraries helped to refine the classical nuclear localization signal (cNLS) sequences to which Imp-α binds, but they did not detect natural cargo proteins directly. Cargo proteins for Imp-β family carriers have mainly been identified using affinity methods (4, 10-14) or yeast two-hybrid screening (15). These methods rely on Imp-cargo binding. However, recent studies have reported that the functions of Imp-βs are not limited to nucleocytoplasmic transport but extend to diverse cellular processes (16-18), so binding to a carrier does not by itself show that a protein is transported by it. In this study, we designed a transport-based method for cargo identification by combining stable isotope labeling by amino acids in cell culture (SILAC) (19), an in vitro transport system (20,21), and MS. Our use of SILAC-based quantitative MS differs from the typical applications reported so far, which compare cellular protein levels under different conditions. We used SILAC to prepare labeled, permeabilized recipient cells for an in vitro transport system and incubated them with unlabeled proteins as the source of import substrates, so that proteins imported into the nuclei could be distinguished from the proteins of the recipient cells.
Historically, the first Trn-binding site identified was the "M9 sequence" (25), which we now understand to be an instance of the PY-NLS (24), a Trn-recognition signal that almost all known Trn cargoes appear to contain. In terms of sequence constraints, the PY-NLS is a loosely defined motif, usually ending with a proline-tyrosine (Pro-Tyr) dipeptide, that can be further subdivided into hydrophobic and basic types depending on the nature of its N-terminal region (24). Previous work also suggests the possible existence of another NLS for Trn that is distinct from the PY-NLS. In 1998, Jäkel and Görlich reported that a basic region resembling a cNLS within residues 32-74 of RL23A (rpL23a) can serve as an NLS for Trn as well as for the import carriers Imp-β, importin 5 (RanBP5), and importin 7 (RanBP7) (10). RL23A is the only clear instance of this so-called β-like import receptor binding (BIB) domain, but circumstantial evidence hints that it might be a more general signal: (1) the ribosomal proteins rpS7 and rpL5 contain similar sequences and are imported by several Imp-βs, including Trn, although their binding to Trn could not be confirmed in vitro (10); (2) the ribosomal proteins rpS6 (26) and rpL7a (27) contain basic regions that serve as NLSs, although their carriers have not yet been identified; and (3) the C-terminal domain of the nucleoporin ELYS contains a basic segment and interacts with both Trn and Imp-β (28).
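Because the PY-NLS is loosely defined, candidate sites are easiest to flag by pattern matching on the C-terminal part of the motif. The sketch below assumes the commonly cited C-terminal consensus [R/K/H]-X(2-5)-P-Y and deliberately ignores the N-terminal hydrophobic or basic region, so it is only a coarse pre-filter that over-predicts; the function name `find_py_epitopes` and the toy sequence are hypothetical and not part of the study.

```python
import re

# Assumed C-terminal PY-NLS consensus: R, K or H, then 2-5 residues of any kind,
# then a Pro-Tyr dipeptide. The N-terminal hydrophobic/basic region is NOT
# checked, so this over-predicts. The zero-width lookahead lets overlapping
# candidates (different start positions) all be reported.
PY_EPITOPE = re.compile(r"(?=([RKH].{2,5}PY))")


def find_py_epitopes(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each candidate."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in PY_EPITOPE.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    print(find_py_epitopes("GGRSSGPYGG"))  # [(3, 'RSSGPY')]
```

A hit from a scan like this only nominates a segment for experimental testing, such as a binding assay; it does not by itself establish a PY-NLS.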
The present study was performed with two goals: (1) to establish our transport assay as a valuable new tool that can complement conventional techniques for import carrier cargo identification, and (2) to discover novel Trn cargoes and gain insight into the range of sequences recognized by Trn.

EXPERIMENTAL PROCEDURES
The buffers, ATP regeneration system, proteins, antibodies, cytosolic and nuclear extracts, evaluation of the transport system, and identification of Ran-regulated Trn-interacting proteins via a pulldown method are described in the supplemental data.
In Vitro Transport. A transport system modified from a previous study (20,21) was used. Adherent HeLa-S3 cells were labeled with u-13C6 Lys and u-13C6 Arg via SILAC and seeded onto a 30-mm glass plate in a dish, on which the in vitro transport was performed. (Materials were kept on ice thereafter unless otherwise stated.) The cells were rinsed twice in transport buffer (TB) (20 mM HEPES-KOH (pH 7.3), 110 mM KOAc, 2 mM MgOAc, 5 mM NaOAc, 0.5 mM EGTA, 2 mM DTT, and 1 μg/ml each of aprotinin, pepstatin A, and leupeptin), permeabilized by incubation with 40 μg/ml digitonin in TB for 4 min, and then rinsed twice and incubated in TB for 5 min. To remove residual Imp-βs before the transport reaction, the permeabilized cells were pretreated with TB containing 4 μM RanGDP and an ATP regeneration system for 20 min at 30°C. After two rinses with TB, the transport reaction was carried out by incubating the cells in 250 μl of transport mixture (50% Imp-depleted cytosolic extract, 10% Imp- and RCC1-depleted nuclear extract, 1 μM p10/NTF2, and ATP regeneration system in TB) with (+Trn) or without (control) 0.3 μM Trn for 20 min at 30°C. Reactions with and without Trn were always performed simultaneously. After the reaction, the cells were rinsed twice with TB and then incubated in 250 μl of extract mixture (50% Imp-depleted cytosolic extract and ATP regeneration system in TB) for 20 min at 30°C to remove proteins bound nonspecifically to the permeabilized cells. The cells were rinsed three times with NaCl-TB (TB with 110 mM NaCl in place of KOAc) and suspended in 100 μl of nuclear buffer (20 mM Tris-HCl (pH 8.0, 4°C), 420 mM NaCl, 1.5 mM MgCl2, 0.2 mM EDTA, 2 mM DTT, and 1 μg/ml each of aprotinin, pepstatin A, and leupeptin). Proteins were extracted from the permeabilized cells by sonication and clarified by centrifugation for 15 min at 20,000 × g. Evaluation of the transport system is shown in Fig. 1B and supplemental Figs. S1-S3.
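The percentages and final concentrations of the transport mixture translate into pipetting volumes through C_stock × V_stock = C_final × V_total. In the minimal sketch below, only the 250 μl volume, the 50% and 10% extract fractions, and the 1 μM p10/NTF2 and 0.3 μM Trn final concentrations come from the protocol; the stock concentrations (50 μM p10/NTF2, 10 μM Trn, a 10× ATP regeneration system) and the names `RECIPE` and `build_reaction` are hypothetical placeholders.

```python
TOTAL_UL = 250.0  # transport-mixture volume per reaction (from the protocol)

# name -> (final, stock) in matching units. Extracts are given as a fraction of
# the final volume with "stock" = 1.0 (undiluted extract). Protein and ATP-system
# stock concentrations are HYPOTHETICAL placeholders, not values from the text.
RECIPE = {
    "cytosolic extract (Imp-depleted)": (0.50, 1.0),
    "nuclear extract (Imp-/RCC1-depleted)": (0.10, 1.0),
    "p10/NTF2 (uM)": (1.0, 50.0),                # assumed 50 uM stock
    "ATP regeneration system (x)": (1.0, 10.0),  # assumed 10x stock
}
TRN = ("Trn (uM)", (0.3, 10.0))                  # assumed 10 uM stock


def stock_volume(final: float, stock: float, total: float = TOTAL_UL) -> float:
    """Solve C_stock * V_stock = C_final * V_total for V_stock (ul)."""
    return final * total / stock


def build_reaction(with_trn: bool) -> dict[str, float]:
    """Pipetting volumes (ul) for one reaction; TB fills up to TOTAL_UL."""
    items = dict(RECIPE)
    if with_trn:
        name, conc = TRN
        items[name] = conc
    volumes = {name: stock_volume(final, stock) for name, (final, stock) in items.items()}
    volumes["TB"] = TOTAL_UL - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for label, with_trn in (("+Trn", True), ("control", False)):
        print(label, build_reaction(with_trn))
```

The control reaction is built from the same recipe with Trn replaced by buffer, which mirrors the paired control/+Trn design in which the two reactions differ only in the added carrier.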
Protein Identification and Quantification via LC-MS/MS. Two hundred micrograms of protein collected from five to six transport reactions were separated by SDS-PAGE in one lane of a 7.5%-12.5% gradient polyacrylamide gel (height = 16 cm, thickness = 1 mm). The lane was sliced into 36 pieces, which were incubated in 100 mM ammonium bicarbonate (pH 8.0) containing 10 mM DTT for 1 h at 60°C. After alkylation with 55 mM iodoacetamide in 100 mM ammonium bicarbonate for 30 min at room temperature, the proteins were digested in-gel with 1 pmol of trypsin (Promega, Madison, Wisconsin) in 50 μl of the same buffer for 16 h at 37°C. Half of the peptides recovered from each gel piece were analyzed separately using a nanoflow LC linear ion-trap time-of-flight (TOF) mass spectrometer (NanoFrontier LD, Hitachi, Tokyo, Japan). Liquid chromatography was performed on a PicoFrit C18 column (75 μm × 100 mm, particle size = 3 μm; New Objective, Woburn, Massachusetts) with elution at 200 nl/min using a 2%-49% acetonitrile gradient over 45 min in the presence of 0.3% formic acid. The mass spectrometer was operated in the data-dependent MS/MS mode: precursor ions were measured by the TOF mass analyzer and fragmented in the linear ion trap. For protein identification, peaks in the spectra were extracted by Mascot Distiller (version 2.3, Matrix Science, London, UK) with the correlation threshold set at 0.7, and the resulting peak lists were queried against the Swiss-Prot database (release 2010_04) using Mascot (version 2.2, Matrix Science), with the taxonomy set to Homo sapiens (20,350 sequences). The search parameters were as follows: enzyme cleavage specificity, trypsin; missed cleavages permitted, ≤2; fixed modification, Cys carbamidomethylation; variable modification, Met oxidation; quantitation, SILAC K+6 R+6 [MD]; mass tolerance for precursor and fragment ions, ±0.5 Da.
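With the SILAC K+6 R+6 setting, each u-13C6 Lys or Arg carries six 13C atoms, so a heavy peptide is offset from its light partner by about 6.02 Da per labeled residue, which appears on the m/z axis divided by the charge. A shift of about 3 m/z units, as between the light (m/z 848-850) and heavy (m/z 851-853) envelopes described for Fig. 1C, would be consistent with, for example, a doubly charged peptide carrying one label. The sketch assumes complete label incorporation; the peptide sequence and m/z value in the example are hypothetical.

```python
C13_C12_DIFF = 1.0033548        # Da, mass difference between 13C and 12C
LABEL_SHIFT = 6 * C13_C12_DIFF  # u-13C6 Lys or Arg: about +6.0201 Da


def heavy_mz(light_mz: float, peptide: str, charge: int) -> float:
    """Expected m/z of the heavy (13C6 Lys/Arg) partner of a light peptide ion.

    Assumes complete labeling, so every K and R in the peptide is heavy
    (missed cleavages can therefore add more than one label).
    """
    n_labels = sum(peptide.count(aa) for aa in "KR")
    return light_mz + n_labels * LABEL_SHIFT / charge


def lh_ratio(light_area: float, heavy_area: float) -> float:
    """Light/heavy ratio from the integrated peak areas of the pair."""
    return light_area / heavy_area


if __name__ == "__main__":
    # Hypothetical doubly charged tryptic peptide with one C-terminal Lys.
    print(round(heavy_mz(848.0, "LTDEVSGK", 2), 4))  # 851.0101
```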
Quantitative analysis of the light/heavy (L/H) ratios of precursor ions was performed using Mascot Distiller: protein hits from the Mascot search were filtered at a significance threshold of p < 0.05 (the Mascot Distiller default), and the L/H ratio of each precursor peptide ion was calculated from the peak areas in the MS spectra. To ensure reliable quantification, only high-quality peaks (intensity > 300, charge ≥ +2, standard error < 0.1, fraction > 0.5, and correlation > 0.8, as defined by Mascot Distiller) were considered for further analysis; proteins with no peptides meeting these criteria in either the control or +Trn sample were excluded. For each parent protein, arithmetic mean L/H ratios were calculated separately for peptides from the control and +Trn samples (L/H_Ctl and L/H_Trn, respectively). The T/C value, calculated as (L/H_Trn)/(L/H_Ctl), was defined as an index of the cargo potentiality of each protein. Proteins and peptides that satisfied the above criteria are listed in descending order of T/C value in supplemental Table S1 and presented graphically in supplemental Fig. S4. To select proteins with highly reliable T/C values, we also applied two arbitrary thresholds to the L/H ratios. To discount spectra from proteins that bound nonspecifically to the permeabilized cells (see supplemental Fig. S2), L/H_Ctl was required to be < 1; to avoid inaccuracies near the lower limit of relative quantification, both L/H_Ctl and L/H_Trn were required to be > 0.03. The proteins that passed both tests are listed in Table I. For each quantified protein, one representative annotated MS/MS spectrum was displayed with the Peptide View of the Mascot search program, saved in the Internet Explorer mht file format (Microsoft, Redmond, Washington) using a homemade Visual Basic program, and provided in the "spectra" folder of the supplemental data.
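The peptide filtering, per-protein averaging, T/C calculation, and the two extra L/H thresholds described above can be sketched as follows. The `PeptideRatio` record and its field names are hypothetical stand-ins for values exported from Mascot Distiller, the p < 0.05 protein significance filter is assumed to have been applied upstream, and keeping only proteins with at least one passing peptide in both samples is our reading of the exclusion rule; the numeric thresholds are those stated in the text.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PeptideRatio:
    """One quantified precursor ion (hypothetical export format)."""
    protein: str
    sample: str          # "ctl" (control) or "trn" (+Trn)
    lh: float            # light/heavy ratio from MS peak areas
    intensity: float
    charge: int
    std_err: float
    fraction: float
    correlation: float


def passes_quality(p: PeptideRatio) -> bool:
    """High-quality peak criteria quoted in Experimental Procedures."""
    return (p.intensity > 300 and p.charge >= 2 and p.std_err < 0.1
            and p.fraction > 0.5 and p.correlation > 0.8)


def tc_table(peptides: list[PeptideRatio]) -> list[tuple[str, float, float, float]]:
    """Return (protein, L/H_Ctl, L/H_Trn, T/C) rows sorted by descending T/C."""
    groups: dict[str, dict[str, list[float]]] = {}
    for p in peptides:
        if passes_quality(p):
            groups.setdefault(p.protein, {"ctl": [], "trn": []})[p.sample].append(p.lh)
    rows = []
    for protein, ratios in groups.items():
        if not ratios["ctl"] or not ratios["trn"]:
            continue  # a T/C value needs valid peptides in both samples
        lh_ctl = mean(ratios["ctl"])  # arithmetic mean per sample
        lh_trn = mean(ratios["trn"])
        rows.append((protein, lh_ctl, lh_trn, lh_trn / lh_ctl))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def reliable(rows: list[tuple[str, float, float, float]]) -> list[tuple[str, float, float, float]]:
    """Apply the two extra thresholds used to build Table I."""
    return [r for r in rows if 0.03 < r[1] < 1 and r[2] > 0.03]
```

The arithmetic mean follows the text; a median would be more robust to outlier peptides but is not what was described.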
To further demonstrate the validity of our protein identification, the Mascot search results are presented in the peptide summary format with a cutoff score of 20 for accepting individual MS/MS spectra (supplemental Table S2A). False discovery rates determined by decoy searches were less than 10%. Because a few proteins, especially members of large protein families, are missing from this table owing to the different attribution of shared peptides among family members, the results of a query with a cutoff score of 10 are also summarized (supplemental Table S2B), showing that protein identification at this cutoff was consistent with that in supplemental Table S1.
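The text does not state exactly how the decoy-based false discovery rate was computed. A common estimator for separate target and decoy searches, sketched below under that assumption, divides the number of decoy matches by the number of target matches at or above the score cutoff; the score lists are made-up illustrative values, and treating a score equal to the cutoff as accepted is also an assumption.

```python
def decoy_fdr(target_scores: list[float], decoy_scores: list[float], cutoff: float) -> float:
    """Estimate FDR at a score cutoff as (#decoy matches) / (#target matches).

    Assumes separate target and decoy searches and that a match is accepted
    when its score is >= cutoff.
    """
    n_target = sum(1 for s in target_scores if s >= cutoff)
    n_decoy = sum(1 for s in decoy_scores if s >= cutoff)
    if n_target == 0:
        return 0.0  # nothing accepted, so there are no discoveries to be false
    return n_decoy / n_target


if __name__ == "__main__":
    targets = [45, 33, 28, 22, 19, 15, 12]  # illustrative ion scores, not study data
    decoys = [24, 18, 11, 9]
    print(decoy_fdr(targets, decoys, 20))            # 0.25
    print(round(decoy_fdr(targets, decoys, 10), 3))  # 0.429
```

In this toy example, lowering the cutoff from 20 to 10 admits more matches and raises the estimated FDR, which is why results at the lower cutoff are best treated as a consistency check rather than as the primary identification list.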

RESULTS
Transport-based Cargo Identification System. Screening for candidate cargo proteins relies on an in vitro reconstituted transport system combined with MS-based relative quantification of stable isotope-labeled "heavy" and unlabeled "light" peptides (Fig. 1). HeLa cells were labeled with u-13C6 Lys and u-13C6 Arg via SILAC, and their plasma membranes were permeabilized with digitonin. Residual Imp-βs associated with the nuclei of the permeabilized cells were removed by incubation with Ran and ATP. Nuclear and cytosolic extracts (the former being the source of cargo proteins) were prepared from unlabeled HeLa cells and depleted of Imp-βs using phenyl-Sepharose (31) (supplemental Figs. S1A-S1C); this method removes Imp-βs almost completely from the extracts (32). The nuclear extract was further depleted of RCC1 (a guanine nucleotide exchange factor for Ran) using a Ran affinity method (supplemental Fig. S1D), because cytoplasmic RCC1 impairs the import reaction. The transport reaction was carried out with these materials in the absence (control) or presence (+Trn) of recombinant Trn. Unlabeled proteins from the extracts that bound nonspecifically to structural components of the permeabilized cells were removed after the transport reaction by treatment with ATP and Imp-β-depleted cytosolic extract, followed by washing with a salt solution (supplemental Fig. S3). Proteins were then extracted from the permeabilized cells, and their tryptic peptides were analyzed by LC-MS/MS to identify the proteins and quantify their light and heavy peptides (Fig. 1C). The performance of the transport system was evaluated with two reporter cargoes: GST-NLS-GFP, specific for Imp-β with Imp-α, and GST-M9-GFP, specific for Trn (Fig. 1B). From a mixture of the two reporters, Imp-β with Imp-α and Trn each selectively imported its specific cargo. Likewise, biotin-labeled natural cargo proteins in the nuclear extract were imported selectively by these carriers (supplemental Fig. S2).
Thus, our transport system provides import specificity sufficient for cargo identification.
Identification of Trn-specific Cargo Proteins. The MS/MS spectra were queried against the Swiss-Prot database, and the L/H ratio of each precursor peptide ion was calculated from the MS spectra using Mascot Distiller. More than 650 proteins were identified in each of the control and Trn-reacted permeabilized cell extracts, with about 480 proteins detected in both (supplemental Table S2A). Applying quality thresholds for quantification to the spectral profiles reduced the number of proteins with valid L/H ratios to 298. The arithmetic mean L/H ratios of the peptides derived from each protein were calculated separately for the control (L/H_Ctl) and +Trn (L/H_Trn) samples, and the T/C ratio, (L/H_Trn)/(L/H_Ctl), of each protein was used as an index of cargo potentiality. The 298 proteins are listed in descending order of T/C value in supplemental Table S1 and presented graphically in supplemental Fig. S4. Highly ranked proteins with outstanding T/C values are expected to be Trn cargoes, but some T/C values might be inaccurate owing to experimental limitations. We therefore restricted the list to proteins with more reliable T/C values by excluding proteins that might have bound nonspecifically to the permeabilized cells or that had extremely low L/H ratios (see "Experimental Procedures"). Ultimately, 90 proteins were selected; they are listed in descending order of T/C value in Table I and presented in Fig. 2E. Hereafter, proteins are referred to by their numbers in Table I. Proteins 1-8, which were assigned very high T/C values, include four previously reported Trn cargo proteins. Proteins 9-18, with lower but still high T/C values, include one predicted and one experimentally defined cargo protein. We postulate that these highly ranked proteins are potential Trn cargoes, with two clear exceptions.
RA1L3 (hnRNP A1-like) is encoded by a pseudogene and was mistakenly identified because of the misassignment of peptides derived from ROA1 (hnRNP A1), and the high T/C value of IMB1 (Imp-β) resulted from export inhibition of residual Imp-β (supplemental Fig. S5). In some cases, proteins may be identified as different members of the same or related families that share common amino acid sequences, depending on the database search parameters (supplemental Tables S2A and S2B). Among the highly ranked proteins, one hnRNP H family protein was identified as […] (33). We adopted HNRPH1 as the representative of the family for further analysis. Most of the 16 cargo candidates that we selected were not identified by a previously reported pull-down method (14), in which 15 proteins that bound to Trn in the absence but not in the presence of Ran were isolated from HeLa cell extract. Only a few of the cargo candidates identified in this study (proteins 3, 7, and 8) overlap with those 15 proteins.

FIG. 1. Transport-based cargo identification. A, method outline. Cells containing heavy proteins (magenta) were generated via SILAC and permeabilized with digitonin. Light proteins (cyan) in unlabeled nuclear extract were imported into the nuclei by Trn. Nuclear proteins were then extracted and analyzed by LC-MS/MS, and the light/heavy peptide ratios were measured concurrently with protein identification. Experiments with (+Trn) and without (control) Trn were performed simultaneously, and their results were compared. B, evaluation of the import assay. The reporter cargoes GST-NLS-GFP and GST-M9-GFP, which are specific for Imp-β plus Imp-α and for Trn, respectively, were mixed and added to the transport reaction without carriers (a), with 0.6 μM Imp-α and 0.3 μM Imp-β (b), or with 0.3 μM Trn (c). The cells were observed by fluorescence microscopy. (d) After transport reactions with the indicated concentrations of carriers, proteins were extracted, and the cargoes were examined by Western blotting with a GST-specific antibody. The transport of proteins in the nuclear extract is evaluated in supplemental Fig. S2. C, typical mass spectra for a peptide derived from a Trn cargo (ROA1). The peaks corresponding to the light (m/z 848-850) and heavy (m/z 851-853) peptides were detected. The relative levels of the light peaks were compared between the control and +Trn spectra as T/C = (L/H_Trn)/(L/H_Ctl). Red, actual measurement; black, predicted spectrum.
To further clarify the relative strengths of our transport assay vis-à-vis conventional pull-down assays, we carried out a pull-down assay designed to detect Ran-regulated Trn-cargo interactions, using the same depleted nuclear extract as in the transport-based method. Trn-binding proteins were precipitated from the extract by GST-Trn on GSH-Sepharose beads, and after washing, proteins were eluted with the GTP-fixed mutant Q69L-Ran or with control buffer (see supplemental "Experimental Procedures" and supplemental Fig. S6), mimicking the protein-protein interactions in Trn-mediated protein import. Proteins enriched in the Q69L-Ran eluate were identified by LC-MS/MS. Many proteins were identified by LC-MS/MS in each SDS-PAGE band, and the ten proteins with the highest scores are listed in supplemental Table S4A. Of the 16 cargo candidates obtained by the transport-based method, eight (proteins 6, 8, 10-14, and 17) are absent from supplemental Table S4A. Thus, the cargo candidates identified by the transport-based method only partly overlap with those obtained by pull-down methods, indicating that our transport-based method can complement conventional pull-down methods.

Table I footnotes: c, light/heavy ratio of the +Trn reaction. d, light/heavy ratio of the +Trn reaction divided by that of the control reaction. e, results of the bead halo assay: ++ and +, positive; −, negative; ±, unclear; blank, not done; (++) and (+), positive for Trn binding when GFP-Lys/Arg-rich segment fusions were assayed (Fig. 3D). f,g, the assays were not done or the results were obscure because the proteins were degraded extensively (f) or not expressed (g) in E. coli. h, potential NLS for Trn binding: PY, PY-NLS; BIB, BIB-domain-like NLS; -, undetermined; blank, not examined. i, three peptides were matched in the Mascot search, but their sequences also exist in ROA1. j, predicted in this study or in the indicated reference. k, result for Xenopus L5.
Interaction between Identified Cargoes and Trn-Next, we confirmed the physical interactions between the identified candidate cargoes and Trn through a bead halo assay (29). The highly ranked proteins and randomly selected lower ranked proteins in Table I were expressed as GFP fusion proteins in Escherichia coli (supplemental Table S3). The bacterial extracts were mixed with GST-Trn fusion proteincoated GSH-Sepharose and then observed via fluorescence microscopy (Fig. 2). The GFP fusion proteins were grouped according to expression level, and the bacterial extracts containing them were normalized within each group (see supplemental "Experimental Procedures " and supplemental Fig. S7A). GFP fusions of proteins 2, 3, 5-7, 10, 11, and 18 displayed brighter halos, which disappeared after the addition of Q69L-Ran. Neither the combination of GFP alone mixed with GST-Trn nor GFP fusion proteins mixed with GST alone produced halos. Therefore, the observed binding was specific. Halos for GFP fusions of proteins 4, 12, 17, and 31 were evident in the enhanced images and disappeared after the addition of Q69L-Ran. In contrast, GFP fusions of proteins 21,

FIG. 2. Interaction of candidate cargoes with Trn.
A-D, interactions examined via bead halo assay. Bacterial extracts containing GFP-fusion proteins were mixed with GSH-Sepharose beads coated with GST or GST-Trn fusion proteins and then observed via fluorescence microscopy. Q69L-Ran, which inhibits the Trn-cargo specific interaction, was added as indicated. The protein numbering is identical to that in Table I. The images within each panel are comparable. PC, phase contrast; FL, fluorescence micrograph. A, B, the assay and microscopy conditions were appropriate for proteins expressed at high levels in E. coli. B, images taken with the same exposure time as in A were enhanced. C, D, proteins expressed at lower levels. The GFP moiety concentrations were half (C) and one-fifth (D) of those in A and B. Accordingly, the exposure time was longer. E, T/C values predict Trn cargoes. Proteins quantified via MS were ranked in descending T/C value order (Table I). Horizontal, protein rank; vertical, T/C value. Magenta, protein (or segment, Figs. 3C and 3D) that bound to Trn in the bead halo assays; blue, unclear; cyan, not bound; orange, not examined here but reported to be Trn cargo; green, IMB1; black, others. RA1L3 (sequence from pseudogene) is not shown.  (8) were searched in the identified cargoes not carrying PY-NLS sequences. We expanded the definition of bipartite cNLS to derive more candidate sequences. KϩRՆ3/5, three or more Lys or Arg within five residues; [ ], any one of the amino acids within the brackets is allowed; [ ∧ DE], any residue other than Asp or Glu is allowed; green, Lys; cyan, Arg. C, segments fused to GFP. Nineteen cNLS motif matches (underscored) were found in the cargoes, and segments consisting of 50 residues centered on them were expressed as GFP fusion proteins. Some segments were shorter than 50 residues, because the cNLS motif match was near the N-or C-terminus of the protein. 
The segments are shown in five groups ordered, from the top, as follows: (1) the BIB domain of RL23A reported to bind to both Trn and Imp-␤ (not assayed here), (2) the four fragments that bound to both GST-Trn and GST-Imp-␤, (3) RU2A-I, which bound only to GST-Trn (weakly), (4) the fragments 38, 53, 63, 66, 69, and 88 produced faint or undetectable halos. GFP fusion of protein 8, which was heavily degraded in E. coli (supplemental Fig. S7A), gave ambiguous results. GFP fusions of proteins 9 and 13-15 were either not expressed or severely degraded in E. coli extracts. In summary, proteins with higher T/C values bound to Trn, whereas those with values less than 0.85 did not. Consequently, we believe that 16 proteins (proteins 2-15, 17, and 18) are potential Trn cargoes. Protein 21 and lower-ranked proteins did not bind to Trn, with the exception of protein 31, HNRPU (hnRNP U). Given that many hnRNPs are reported to be Trn cargoes, HNRPU is most likely a specific cargo. As this example illustrates, in the intermediate T/C value range, some disagreement between the T/C value and cargo potentiality might be unavoidable.
BIB Domain-like Lys/Arg-rich NLS-Eight highly ranked proteins (proteins 2-5, 7, 8, 12, and 18) contained sequences identical or similar to the well-established Trn-binding sequence PY-NLS (24) (Fig. 3A), which was originally identified in ROA1 as the M9 sequence (25). Other highly ranked proteins (proteins 6, 9 -11, 13-15, and 17) do not contain similar sequences, though they bound to Trn in the bead halo assays. Another type of Trn-binding sequence, the BIB domain in RL23A, which is enriched in basic residues and binds to other carriers as well, was reported previously (10). We examined the possibility that the highly ranked proteins contain a BIBdomain-like NLS instead of the PY-NLS that is responsible for Trn binding. The defining features of the BIB domain are not clear, except that it overlaps cNLS motifs, in which Lys and Arg are the key elements. We therefore hypothesized that the presence of a cNLS motif might be a prerequisite for this type of NLS and searched for the six classes of cNLS motifs (8) shown in Fig. 3B. Among the eight highly ranked proteins lacking the PY-NLS, six contained one or more matches to the cNLS motifs (19 in total) generally found in Lys/Arg-rich regions. Because the BIB domain of RL23A is longer than the cNLS motifs, we selected nineteen 50-residue segments with the cNLS motifs placed at the center (Fig. 3C). Each segment was fused to GFP (supplemental Fig. S7B) and examined for Trn binding using a bead halo assay (Fig. 3D). The GFP fusion to one segment from RL27 (RL27-I) was heavily degraded and could not be measured. Of the remaining 18 segments, 5 bound to Trn, including the first Lys/Arg-rich segment of SF3B2 (SF3B2-I), for which Trn binding could not be measured for the full-length protein (Table I). Four out of the five segments were found to bind directly to Imp-␤ (Fig. 3D), a feature shared with the reported BIB domain. 
Interestingly, the segments that bound to both Trn and Imp-␤ differed in amino acid composition from the other 13 segments, showing a high ratio of Arg to AspϩGlu content (Fig. 3E).
Next, we analyzed Imp-␤ binding of the full-length proteins containing the BIB-domain-like NLSs and PY-NLSs (Fig. 4). All of the proteins with BIB-domain-like NLSs bound to Imp-␤, including RU2A, whose Lys/Arg-rich segment (RU2A-I) failed to bind to Imp-␤ (Fig. 3D). The extensive degradation of GFP-RU2A-I (supplemental Fig. S7B) might explain this discrepancy. In contrast, the PY-NLS-containing proteins did not bind to Imp-␤, with the exception of SNRPA, which bound only weakly. These results confirm that PY-NLSs generally bind to Trn exclusively, whereas BIB-domain-like NLSs bind to both Trn and Imp-␤ (Fig. 4D).
To be certain that the BIB-domain-like segments are truly distinct from PY-NLSs, we conducted an additional analysis based on a very broadly defined PY-NLS motif (supplemental Fig. S8A) that covers all PY-NLSs summarized by Süel et al. (34), as well as many spurious matches (supplemental Fig. S8B). Even with this broad definition, SF3B2-I did not overlap with any PY-NLS motif match, and RL3-I and RL3-II overlapped the N-terminal tails of PY-NLS motif matches by only one or two residues. We therefore conclude that the Trn binding shown by these segments is not attributable to a PY-NLS. RL27-II partially overlapped a basic PY-NLS motif match with an atypical Pro-Leu instead of Pro-Tyr, but the N-terminal basic epitope of this match lies outside the segment tested in the fusion protein (whose sequence does not fully match either type of PY-NLS). Thus, the Trn binding of these four segments cannot be explained by PY-NLSs. In contrast, the weakly Trn-binding segment RU2A-I overlaps entirely with a PY-NLS motif match, so without further experimental analysis we cannot rule out the possibility that the Trn binding of this segment is due to a PY-NLS. Nevertheless, our results suggest that BIB-domain-like NLSs, which had previously been demonstrated only in RL23A, represent a more general class of NLS found in several proteins.

FIG. 3 (legend, continued). … that bound to neither GST-Trn nor GST-Imp-β, and (5) the heavily degraded RL27-I. Green, Lys; cyan, Arg; magenta, Asp or Glu. The numbers displayed on the left and right of each segment indicate amino acid positions in the protein sequences listed in supplemental Table S3 and denote the coordinates of that segment in its UniProt protein sequence. The one exception is PARP1-II (marked with an asterisk): the sequence of the PARP1 isoform that we cloned (supplemental Table S3) diverges from the one in UniProt in the last half of this segment, and DNA for the segment was cloned separately from a HeLa cDNA library. D, interactions between the Lys/Arg-rich segments and Trn or Imp-β. The segments fused to GFP were analyzed with GST-Trn or GST-Imp-β in the bead halo assay. Q69L-Ran was added as indicated. The images in the right-hand panel were recorded with a longer exposure. E, the Asp+Glu content of the segments in groups (1)-(4) in C plotted against their Arg content: (1) cyan diamond, (2) magenta diamond, (3) magenta square, and (4) black diamond.

DISCUSSION

We developed a transport-based system for identifying cargo proteins specific for individual Imp-βs and used it for large-scale screening of Trn-specific cargoes. As a result, 298 proteins produced adequate MS spectra for calculating T/C values, the index of cargo potentiality (supplemental Table S1 and supplemental Fig. S4). Because the L/H ratios of some proteins that bind non-specifically to the permeabilized cells might not reflect transport efficiency, we removed proteins with high L/H Ctl ratios (>1) and also ruled out proteins with low L/H Trn or L/H Ctl ratios (<0.03) to ensure accurate quantification. Notably, only a few of the proteins with high T/C values failed to pass the L/H ratio criteria (compare Fig. 2E and supplemental Fig. S4), so the yield of cargo candidates was not greatly affected by this selection. Ultimately, we obtained 90 proteins with highly reliable T/C values, listed in Table I; among them, proteins 1-18, except for IMB1 and RA1L3, are possible Trn cargoes. Of those 16 proteins, 12 bound to Trn in the bead halo assay (Figs. 2 and 3), 8 are reported or predicted Trn cargoes (Table I), and 8 contain PY-NLSs or analogous sequences (Fig. 3A). These facts support the likelihood that other proteins with similar T/C values, including proteins 13-15, which lack additional corroborating evidence, are also bona fide Trn cargoes.
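To make the selection rule concrete, here is a minimal Python sketch of the two L/H thresholds followed by ranking by T/C. It assumes that T/C is L/H Trn divided by L/H Ctl; this section calls T/C the index of cargo potentiality but does not give its formula, so that definition is an assumption. The class, function names, protein names, and ratios are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

CTL_MAX = 1.0   # L/H Ctl above this: treated as non-specific adsorption
LH_MIN = 0.03   # L/H Ctl or L/H Trn below this: too low to quantify reliably


@dataclass(frozen=True)
class Quantified:
    name: str
    lh_ctl: float  # light/heavy ratio, control reaction (no carrier added)
    lh_trn: float  # light/heavy ratio, reaction supplemented with Trn


def passes_filter(p: Quantified) -> bool:
    """Apply the two L/H criteria described in the text."""
    if p.lh_ctl > CTL_MAX:
        return False
    return p.lh_ctl >= LH_MIN and p.lh_trn >= LH_MIN


def t_over_c(p: Quantified) -> float:
    """ASSUMED definition of T/C: L/H Trn divided by L/H Ctl."""
    return p.lh_trn / p.lh_ctl


def rank_candidates(proteins: list[Quantified]) -> list[Quantified]:
    """Keep proteins that pass the filter and sort them by T/C, highest first."""
    kept = [p for p in proteins if passes_filter(p)]
    return sorted(kept, key=t_over_c, reverse=True)


# Hypothetical proteins, for illustration only.
proteins = [
    Quantified("cand_B", lh_ctl=0.40, lh_trn=0.44),
    Quantified("sticky", lh_ctl=1.40, lh_trn=1.50),  # removed: L/H Ctl > 1
    Quantified("faint", lh_ctl=0.02, lh_trn=0.30),   # removed: L/H Ctl < 0.03
    Quantified("cand_A", lh_ctl=0.10, lh_trn=0.50),
]
for p in rank_candidates(proteins):
    print(p.name, round(t_over_c(p), 2))
```

Filtering before ranking mirrors the order in the text: the thresholds decide which ratios are trustworthy, and only then is T/C used to order the survivors.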
Although we successfully identified a considerable number of known and novel Trn cargoes, we also missed many known cargoes. To elucidate the reason for this, we traced the progression through our transport assay of 44 previously reported (either demonstrated or predicted) Trn cargoes (supplemental Table S5A). Among the 44 proteins, 21 are included in the 650 proteins identified in one or both of the control and +Trn cell extracts (supplemental Table S2), but only 13 proteins are included in the 298 proteins quantified adequately enough in both extracts to calculate T/C values (supplemental Table S1). Thus, many reported cargoes, as well as those predicted by Lee et al. (24) (supplemental Table S5B), were not identified or quantified in the present LC-MS/MS analysis. We speculate that the absence of these defined or predicted cargoes from our LC-MS/MS results is attributable to current performance limitations of the instrument rather than to the sample preparation. Western blotting (supplemental Fig. S9A) revealed that both the control and +Trn extracts applied to LC-MS/MS analysis contained the well-characterized Trn cargoes EWS, TAP/NXF1, FUS, SAM68, and HuR, but these proteins were not detected by LC-MS/MS (see supplemental Table S5). The mass spectrometer was operated in data-dependent acquisition mode. Although this is currently a standard operation mode for attaining accurate, high-throughput quantification, it often fails to identify proteins or peptides that are not separated from abundant ones by SDS-PAGE or LC. Because LC-MS/MS analysis is a generic technique that can be replaced by improved methods, the numbers of proteins that we identified (650) and quantified (298), and consequently the number of cargoes that can be assigned T/C values, are expected to increase significantly with the advanced MS techniques that are rapidly developing.
Thus, we believe our method is promising for higher-throughput identification of cargoes, although the identification of those with very low copy numbers or extensive post-translational modification might remain difficult without an alternative approach.

FIG. 4. Interaction between Imp-β and full-length Trn cargoes. A-C, Imp-β binding of the full-length proteins. The Trn cargoes carrying BIB-domain-like NLSs or PY-NLSs were analyzed for binding to Imp-β in the bead halo assay. Proteins are grouped as in Fig. 2, and the images are comparable within each panel. The results of PPIB and PCNA are presented as negative controls. The class of NLS (PY-NLS or BIB-domain-like NLS) is indicated. D, Trn NLSs and carriers. The BIB-domain-like NLS binds to both Trn and Imp-β directly, whereas PY-NLS binds to Trn exclusively. Undiscovered NLSs differing from these two might also exist.
In Western blotting (supplemental Fig. S9A), the cargoes were detected at almost the same level in the control and +Trn extracts. This does not necessarily indicate low transport activity, because the signal intensity represents the sum of the imported (light) and endogenous (heavy) proteins: each band is proportional to H(1 + L/H), and the endogenous amount H is the same in both reactions. For example, the expected signal intensity ratio for protein 10, RU2A, with L/H Ctl = 0.714 and L/H Trn = 0.806, is +Trn/control = (1 + 0.806)/(1 + 0.714) ≈ 1.05, and this 5% difference would hardly be apparent in Western blotting. Nonetheless, Trn efficiently imported recombinant GFP-RU2A in our transport system (supplemental Fig. S9B). Likewise, Trn imports GFP fusion proteins of the reported Trn cargoes TAP/NXF1, SAM68, HuR, and PABP2 in our system (supplemental Fig. S9B), although these proteins were not detected in the LC-MS/MS analysis. Given that many, albeit not all, of the known cargo proteins were present in the depleted nuclear extract, and because Trn imports the ones we tested, it is likely that many more imported cargoes could be identified with a more advanced instrument.
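The Western-blot arithmetic in the preceding paragraph can be reproduced in a few lines. This sketch assumes, as the text implies, that the antibody detects the light (imported) and heavy (endogenous) forms equally and that the endogenous amount is the same in both reactions, so each band scales with 1 + L/H. The function name is ours, and the comparison with the light-only ratio is added for illustration.

```python
def expected_blot_ratio(lh_ctl: float, lh_trn: float) -> float:
    """Expected +Trn/control band-intensity ratio on a Western blot.

    Assumes equal antibody detection of light (imported) and heavy
    (endogenous) forms and an identical heavy amount H in both reactions,
    so each band is proportional to H * (1 + L/H).
    """
    return (1.0 + lh_trn) / (1.0 + lh_ctl)


def imported_ratio(lh_ctl: float, lh_trn: float) -> float:
    """Ratio of imported (light) protein alone, under the same equal-H assumption."""
    return lh_trn / lh_ctl


# RU2A values quoted in the text.
print(f"blot:  {expected_blot_ratio(0.714, 0.806):.2f}")  # blot:  1.05
print(f"light: {imported_ratio(0.714, 0.806):.2f}")       # light: 1.13
```

The contrast between the two numbers shows why a difference that SILAC quantification resolves can disappear on a blot: the large endogenous pool adds the same constant to both bands.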
From the 298 proteins successfully quantified, 198 with L/H Ctl > 1 and 10 with L/H Ctl or L/H Trn < 0.03 were removed from consideration, leaving the 90 proteins in Table I. In this step, the number of previously reported cargoes decreased from 13 to 8 (supplemental Table S5A). However, the proportion of previously reported cargoes increased from 13/298 (4.4%) to 8/90 (8.9%). Thus, despite the extensive elimination of proteins from consideration, our method has the potential to identify cargoes at a practical level.
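The bookkeeping in the preceding paragraph can be checked directly. The sketch below uses only the counts given in the text (298 quantified proteins, 198 and 10 removed, 13 and 8 previously reported cargoes before and after filtering) and computes the fraction of previously reported cargoes before and after the filter; the variable names are ours.

```python
from fractions import Fraction

# Counts taken from the text.
QUANTIFIED = 298          # proteins with T/C values
REMOVED_HIGH_CTL = 198    # L/H Ctl > 1
REMOVED_LOW_LH = 10       # L/H Ctl or L/H Trn < 0.03
KNOWN_BEFORE = 13         # previously reported cargoes among the 298
KNOWN_AFTER = 8           # previously reported cargoes among those kept

kept = QUANTIFIED - REMOVED_HIGH_CTL - REMOVED_LOW_LH
frac_before = Fraction(KNOWN_BEFORE, QUANTIFIED)
frac_after = Fraction(KNOWN_AFTER, kept)
enrichment = frac_after / frac_before

print(f"kept: {kept}")
print(f"known cargoes before: {KNOWN_BEFORE}/{QUANTIFIED} = {float(frac_before):.1%}")
print(f"known cargoes after:  {KNOWN_AFTER}/{kept} = {float(frac_after):.1%}")
print(f"enrichment: {float(enrichment):.1f}-fold")
```

Working with `Fraction` keeps the ratios exact until they are printed; the result is roughly a twofold enrichment of previously reported cargoes among the retained proteins.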
Anecdotal analysis justifies our elimination criteria in many cases. For example, we eliminated the heat shock proteins Hsp70 and Hsp90, which interact with a variety of proteins, are likely to adhere to the permeabilized cells, and often exhibit non-specific binding in conventional protein-protein interaction assays. The inability to provide clear results for heat shock proteins is thus a common problem for many assay systems, not just ours. One known Trn cargo, histone H2B, was discarded on the basis of its low L/H ratio, but this is likely an exceptional case: the L/H ratios of histones are expected to be low even if they are imported in the transport system, because endogenous histones in the permeabilized cells are exceptionally abundant. The effect of the L/H threshold of 0.03 on the final number of cargoes is negligible, because only ten proteins (e.g. VIME and MDHM, supplemental Table S1) were eliminated from the total protein list on this basis. Moreover, only three known cargoes were dismissed because of the L/H Ctl threshold of 1.0, so non-specific binding influences cargo identification less severely than might be expected.
Significantly, our transport-based method identified a set of cargoes different from that identified by the conventional pull-down method carried out in this and other studies (14). The major merits of our transport-based method are that it fully involves the cellular transport system and accumulates cargoes in the nuclei. The pull-down experiment carried out in this study also involves the biochemical function of the Trn-Ran interaction, mimicking cargo release from Trn within the nucleus. In our hands, however, specific cargoes were not effectively separated from the background of non-specific proteins. Many proteins were readily released into both the control and Q69L-Ran elution fractions (supplemental Fig. S6), probably because cargo dissociation from Trn is rapid. A washing condition that removes non-specific proteins but preserves specific cargoes on Trn is difficult to establish, and specific cargoes dissociate from Trn even in the control buffer during elution. Although our pull-down method leaves room for improvement, the transport-based method clearly has the advantage of accumulating cargoes in the nuclei even though their interaction with their carriers is short-lived. Thus, the transport-based method can complement pull-down methods by identifying a different subset of cargoes.
The cargoes that we identified are mainly RNA metabolism factors, including ribosomal proteins. This observation agrees with the features of Trn cargoes reported to date and supports the view that RNA interaction is a general attribute of many Trn cargoes. We expect that cargoes transported by the same carrier share specific functions, and our method should be useful in elucidating the common properties of cargoes transported by other Imp-β family carriers.
Our results are consistent with the view of PY-NLS as a Trn-specific signal. None of the PY-NLS-carrying cargoes tested here bound to Imp-β (Fig. 4). Some PY-NLS-carrying Trn cargoes, such as TAP/NXF1 and SNRPA, are known to bind to other import carriers, but this binding is not necessarily attributable to their PY-NLSs. The N-terminal 100-residue region of TAP/NXF1 binds to Trn and some other Imp-βs. However, a divergent cNLS and a hydrophobic PY-NLS can be assigned separately within this region: amino acid substitutions in the former primarily impair binding to Imp-β, whereas substitutions in the latter affect binding to Trn (35). SNRPA carries a PY-NLS (Fig. 3A) and binds to Trn (Fig. 2B), but it also carries a cNLS and binds to Imp-α (36).
Eight of the Trn cargo proteins identified here contain PY-NLSs, supporting the view that this well-established signal is the major Trn NLS. However, our results also suggest that some Trn cargoes may be recognized by a distinct NLS. We identified five segments in four Trn cargoes that match cNLS motifs and bind to both Trn and Imp-β. As these characteristics are reminiscent of the BIB domain (10), we refer to these segments as BIB-domain-like. The number of instances of this potential new NLS (our five segments plus the original BIB domain) is still too limited for rigorous statistical inference, but we note that the ratio of Arg to Asp+Glu is higher for these segments than for other cNLS-like segments that did not bind Trn (Fig. 3E).
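The composition comparison behind Fig. 3E can be sketched as follows. Whether Fig. 3E plots residue counts or fractions is not stated in this section, so this sketch assumes fractions of segment length. The function name and example sequence are hypothetical (the sequence is not one of the tested segments), and no classification threshold is implied, since none is given in the text.

```python
def composition(segment: str) -> dict[str, float]:
    """Arg and Asp+Glu fractions of a peptide segment (one-letter codes).

    Fractions of segment length are an ASSUMPTION; Fig. 3E may use counts.
    """
    seq = segment.upper()
    if not seq:
        raise ValueError("empty segment")
    n = len(seq)
    arg = seq.count("R") / n
    acidic = (seq.count("D") + seq.count("E")) / n
    # Arg : (Asp+Glu) ratio; infinite when the segment has no acidic residues.
    ratio = arg / acidic if acidic else float("inf")
    return {"arg": arg, "asp_glu": acidic, "arg_to_acidic": ratio}


# Hypothetical segment, for illustration only.
print(composition("RRKDERKR"))
# {'arg': 0.5, 'asp_glu': 0.25, 'arg_to_acidic': 2.0}
```

Lys is deliberately not counted in the numerator, because the comparison in the text concerns Arg specifically rather than total basic residues.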
As described above, our study has shed new light on Trn cargo recognition, potentially breathing new life into the decade-old notion of the BIB domain as a general NLS. Unfortunately, for many Imp-βs, few if any cargoes are known, and no specific NLSs have even been proposed. We expect our method will be useful for identifying novel NLSs through the identification of carrier-specific cargoes.