The Alliance for Cellular Signaling Plasmid Collection

Cellular responses to inputs that vary both temporally and spatially are determined by complex relationships between the components of cell signaling networks. Analysis of these relationships requires access to a wide range of experimental reagents and techniques, including the ability to express the protein components of the model cells in a variety of contexts. As part of the Alliance for Cellular Signaling, we developed a robust method for cloning large numbers of signaling ORFs into Gateway® entry vectors, and we created a wide range of compatible expression platforms for proteomics applications. To date, we have generated over 3000 plasmids that are available to the scientific community via the American Type Culture Collection. We have established a website at www.signaling-gateway.org/data/plasmid/ that allows users to browse, search, and blast Alliance for Cellular Signaling plasmids. The collection primarily contains murine signaling ORFs with an emphasis on kinases and G protein signaling genes. Here we describe the cloning, databasing, and application of this proteomics resource for large scale subcellular localization screens in mammalian cell lines.

Analysis of cross-talk between signaling pathways when mammalian cells are challenged with multiple ligand stimuli and the development of molecular models that describe signal integration and processing provide key insight into cellular signaling mechanisms and the regulation of cellular function (1,2). Two strategic requirements in projects of this nature are the identification of the protein components in a given model system (the so-called "parts list") and the ability to modulate the expression and function of these protein components. At the outset of the Alliance for Cellular Signaling (AfCS) project (1), reagent availability was a central issue, and in the case of cDNA clones, there was no publicly accessible repository of validated mouse sequences. Moreover, the nature of the project demanded the establishment of standardized methodology for the isolation of cDNAs and their expression in a variety of contexts. The emergence of recombination-based cloning technologies was timely and allowed the development of a robust cloning platform. We adopted Invitrogen's Gateway® system, which allows facile transfer of DNA segments into multiple expression platforms while maintaining orientation and reading frame register (3-5).
Several reports have described efforts to generate genome-wide collections of cDNAs (6,7) and the production of "ORFeome" sets of sequence-validated open reading frames (8-10). The generation of "ORF-only" clones in these latter efforts is key for downstream proteomics applications that require expression of proteins with N- or C-terminal fusion tags (11-14). The AfCS cloning effort has focused on genes involved in cell signaling and therefore does not approach the scale of these genome-wide projects. However, as the described ORFeome cloning projects have concentrated on model organisms other than mouse, the AfCS-generated plasmids remain, to our knowledge, the largest collection of mouse ORF-only clones available through a non-commercial source.
Using a set of mouse serine-threonine kinase (STK) genes as an example, we describe the development of protocols and vectors contained in the AfCS plasmid database. This repository has grown into a significant resource of several thousand plasmids that have been made available to the scientific community through the ATCC. Certain gene families with key roles in cell signaling such as kinases and heterotrimeric G proteins are highly represented in the database. We have created expression vectors permitting facile recombination-based generation of fluorescent protein fusions for imaging applications and affinity tag fusions for immunoprecipitation of protein complexes. Using the former expression platform, we have carried out large scale subcellular localization studies in murine cell lines to create the AfCS image database. We present selected subcellular localization data to demonstrate the functional utility of this resource.
EXPERIMENTAL PROCEDURES

Detailed protocols developed by the AfCS are available online at www.signaling-gateway.org/data/ProtocolLinks.html. Brief summaries of key procedures are described below.

Design of Scripts
Cloning Primers-Cloning primers of a defined minimum length, starting at the termini of each ORF and with a melting temperature above a minimum desired value (Tm = 70°C), were designed using an in-house Perl script. The calculations in the script were performed as described previously (15,16) based on 50 mM salt and 50 nM DNA concentrations. The attB Gateway recombination sites were added to the primer sequence after the Tm calculations (Fig. 1 and Supplemental Table 1).
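The primer extension logic can be illustrated with a minimal Python sketch. It is not the in-house Perl script. Several parts are assumptions made for illustration: the melting temperature uses a simple salt-adjusted GC-content formula as a stand-in for the calculations of Refs. 15 and 16, the attB1 and attB2 adapters are the commonly used Gateway adapter sequences (the primers in Fig. 1 may carry additional bases), and the minimum length of 18 nucleotides and all function names are hypothetical.

```python
import math

# Commonly used Gateway adapter sequences (assumed; see Fig. 1 for the actual design).
ATTB1 = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"
ATTB2 = "GGGGACCACTTTGTACAAGAAAGCTGGGT"


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def approx_tm(seq, salt_molar=0.05):
    """Salt-adjusted GC-content estimate of Tm in degrees C.

    Stand-in for the published calculations; 0.05 M matches the 50 mM salt
    stated in the text. DNA concentration is not used by this formula.
    """
    gc = seq.count("G") + seq.count("C")
    n = len(seq)
    return 81.5 + 16.6 * math.log10(salt_molar) + 41.0 * gc / n - 675.0 / n


def extend_to_tm(template, min_len=18, target_tm=70.0):
    """Shortest prefix of template with at least min_len bases and Tm >= target_tm."""
    for n in range(min_len, len(template) + 1):
        if approx_tm(template[:n]) >= target_tm:
            return template[:n]
    return template  # target never reached; fall back to the whole template


def design_cloning_primers(orf):
    """Design I5, T3 and NT3 primers for an ORF that ends with its stop codon."""
    i5 = extend_to_tm(orf)                 # forward primer from the start codon
    t3 = extend_to_tm(revcomp(orf))        # reverse primer including the stop codon
    nt3 = extend_to_tm(revcomp(orf[:-3]))  # reverse primer with the stop codon omitted
    # attB sites are added only after the Tm-based extension
    return {"I5": ATTB1 + i5, "T3": ATTB2 + t3, "NT3": ATTB2 + nt3}
```

Because the attB adapters are appended after the extension step, the reported Tm applies to the gene-specific portion only, in line with the description in Fig. 1.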
Sequence Primers-We utilized Primer3 (17) to generate sequencing primers at approximately equal intervals along a sequence (Fig. 3b and Supplemental Table 2). Assuming we need n pairs of primers based on clone length L and interval length l (e.g. 500 bp), we then ran Primer3 n + 1 times. Generally we targeted the interval between the coordinates we wanted for the mth primer on the positive strand (minus a window, e.g. 50 bp) and for the (n − m)th primer on the negative strand (plus a window, e.g. 50 bp). The coordinates refer to the positive strand position (i.e. position −y on the reverse strand is actually L − y on the forward strand). For the first reverse primer, we defined one excluded area up to the end of the clone and ran Primer3 to design a pair of primers. We then took only the reverse primer. Primer3 accepts different values for product size, primer GC content, primer size, and primer melting point. For each m > 1 (up to the last one), we used those coordinates to define two excluded zones (from those coordinates to the closest clone ends) and generated a pair of primers using the product size parameter. For the last forward primer, we proceeded similarly to the first reverse primer, with one excluded zone, but only kept the forward primer.
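The coordinate bookkeeping can be sketched as follows. This Python fragment only computes evenly spaced target positions on both strands and converts reverse-strand offsets to forward-strand coordinates (L − y); it does not call Primer3 and does not reproduce the excluded-zone handling described above. The rule n = ceil(L / l) and the function name are assumptions made for illustration.

```python
import math


def sequencing_targets(clone_len, interval=500, window=50):
    """Approximate target coordinates for sequencing primers on both strands.

    All coordinates are expressed on the forward strand. A reverse-strand
    offset y corresponds to forward-strand position clone_len - y.
    """
    n = math.ceil(clone_len / interval)  # assumed rule for the number of primer pairs
    targets = []
    for m in range(n):
        # forward primer: wanted coordinate minus a window, clipped to the clone start
        forward = max(0, m * interval - window)
        # reverse primer: offset m * interval from the 3' end, plus a window,
        # clipped to the clone end
        reverse = min(clone_len, clone_len - m * interval + window)
        targets.append({"forward_near": forward, "reverse_near": reverse})
    return targets
```

For a 1600-bp clone with the default settings, this yields four target pairs spaced 500 bp apart on each strand.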

RNA Source and cDNA Preparation
1 µg of mouse brain, testis, or spleen poly(A)+ RNA (Clontech) was used for first strand synthesis using the cDNA Synthesis System (Invitrogen), subaliquoted, and stored at −20°C.

Amplification of attB PCR Products for Target Genes
Primers were purchased resuspended to 100 µM (Invitrogen). 38 ng of brain cDNA and 38 ng of testis cDNA were combined for amplification of target genes with ProofStart DNA polymerase (Qiagen) including Q-solution in a total reaction volume of 50 µl. Products were amplified using the following cycling program: 95°C for 5 min, 94°C for 30 s, 72°C for 6 min (five cycles), 94°C for 30 s, 70°C for 6 min (five cycles), 94°C for 30 s, 68°C for 6 min (25 cycles), 4°C hold. Amplification products were visualized on a 1% agarose, ethidium bromide gel. The correct bands were excised and purified using the QIAquick Gel Extraction kit (Qiagen) and eluted in 30 µl EB buffer (Qiagen).

BP Reactions and Entry Clone Diagnostics
150 ng of pDONR207 was combined with 50-150 ng of purified attB PCR product, 2 µl of BP Clonase enzyme, and 2 µl of BP Reaction Buffer (Invitrogen), and the volume was adjusted to 10 µl/reaction using Milli-Q purified water. After incubation at room temperature for 1 h, 1 µl of Proteinase K was added, and reactions were incubated at 37°C for 10 min. 5 µl of the reaction mixture was used to transform TOP10 cells (Invitrogen), and recombinants were selected on gentamicin plates (10 µg/ml). Entry clone candidates were identified by digestion with the BanII restriction enzyme (New England Biolabs).

Sequence Reactions, Assembly, and Analysis
Sequence reactions were submitted to Macrogen Inc. (Seoul, South Korea) or Elim Biopharmaceuticals, Inc. (Hayward, CA) in 96-well plate format. 5 µl of miniprep DNA and 5 µl of sequencing primer at 2 µM (Invitrogen) were dispensed per well in separate plates. Sequencing output traces were input into Paracel Genome Assembler (PGA) to generate one contig per sequenced ORF. Contigs were imported into OMIGA 2.0 (Oxford Molecular) for alignment against target ORFs and for further analysis.

LR Reactions and Expression Clone Diagnostics
100 ng of pEN was combined with 100 ng of pDS, 2 µl of LR Clonase enzyme, and 2 µl of LR Reaction Buffer (Invitrogen), and the volume was adjusted to 10 µl/reaction using Milli-Q purified water. After reactions were incubated at room temperature for 1 h, 1 µl of Proteinase K was added, and reactions were incubated at 37°C for 10 min. 5 µl of the reaction mixture was transformed in TOP10 cells (Invitrogen), and recombinants were selected on kanamycin (10 µg/ml) or ampicillin (50 µg/ml) plates. Expression clone candidates were validated by restriction digest, usually with the NcoI restriction enzyme (New England Biolabs).

Plasmid Database
The AfCS plasmid database structure is broadly composed of three parts. The first part is the vector information that consists of the "parent_vector," "target_gene," "cloned_gene," "construct," and "misc_plasmid" tables used for storing sequences, features, and related information about vectors. Plasmid map diagrams are generated from this information using CGView (18) and cached in the corresponding "map" tables. For AfCS users, certain functions have been implemented for generating collections of records, e.g. a parent vector may be selected followed by a search for cloned genes to create a set of construct records in batch format. In this case, the construct sequences are generated from the parent vector template and the cloned gene sequence, whereas the construct feature offsets are adjusted from the parent vector template features. The second part is laboratory storage information that uses the "plasmid_prep" and "prep_storage_list" tables to track samples that have been created in the laboratory in terms of their location (freezer, box, and position). The third part is the batch system that allows collections of parent_vector, cloned_gene, construct, and plasmid_prep records to be grouped together. Various functions have been created to operate on batches to simplify routine processing for AfCS laboratory personnel. For example, storage location records may be generated for samples by automatically or manually assigning the next sequentially available box and position; barcode files for label printing may be generated from the batch records; data files may be created for exporting construct information to ATCC. This flexible approach allows additional batch-oriented functions to be easily added to the programs.
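The generation of a construct record from a parent vector template and a cloned gene can be sketched in Python as follows. This is an illustration of the offset adjustment only, not the AfCS Perl code: the Feature class, the function name, and the representation of the recombination cassette as a pair of coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def build_construct(parent_seq, parent_features, cassette_start, cassette_end,
                    insert_seq, insert_name):
    """Replace parent_seq[cassette_start:cassette_end] with insert_seq.

    Returns the construct sequence and a feature list in which parent
    features downstream of the cassette are shifted by the length change.
    """
    construct_seq = parent_seq[:cassette_start] + insert_seq + parent_seq[cassette_end:]
    shift = len(insert_seq) - (cassette_end - cassette_start)
    features = [Feature(insert_name, cassette_start, cassette_start + len(insert_seq))]
    for f in parent_features:
        if f.end <= cassette_start:
            # upstream of the cassette: coordinates unchanged
            features.append(Feature(f.name, f.start, f.end))
        elif f.start >= cassette_end:
            # downstream of the cassette: shift by the length difference
            features.append(Feature(f.name, f.start + shift, f.end + shift))
        # features overlapping the replaced cassette are dropped
    return construct_seq, features
```

Storing the cloned gene once and deriving each construct in this way means that a correction to a cloned sequence needs to be entered only once.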

Subcellular Localization in RAW264.7 Cells
A short summary of the steps taken is presented along with the AfCS protocol identification number referenced in parentheses. RAW 264.7 cells were grown in Dulbecco's modified Eagle's medium supplemented with 10% heat inactivated fetal bovine serum at 37°C with 5% CO2 (PP00000159). Cells were transiently transfected with 0.5 µg of each DNA and 1 µl of Lipofectamine 2000 (Invitrogen). The protocol was modified from the manufacturer's usual method so that the cells were simultaneously transfected and plated onto an 8-well coverglass chamber (PP00000182). Live cell confocal images were collected 20-30 h after transfection. Images were collected on an automated Zeiss Axiovert with 100× oil objective using a Nipkow spinning disk. The automated (PP00000143) microscope and camera (Photometrics CoolSnap HQ) were controlled using MetaMorph software (Molecular Devices Corp.).

Primer Design and Amplification of attB PCR Products-To develop a methodological platform for efficient cloning of large numbers of signaling ORFs for the AfCS research effort, we chose the STK gene family (19,20) as a pilot project as it was well represented in the public sequence databases at the outset of the AfCS program. We queried the simple modular architecture research tool (SMART) database (21,22) to generate a comprehensive, yet unique, list of ORFs and identified 217 distinct murine STK ORFs based on the information publicly available in 2001. As different applications might require the ability to express either N- or C-terminal fusions of the cloned cDNAs, we generated two forms of each ORF: T and N. The former includes the stop codon allowing for N-terminal tagging, whereas the latter lacks this codon to permit C-terminal fusions. To allow facile subcloning of cDNAs to multiple expression platforms, we used the Gateway recombination-based cloning technology (4).
It has been demonstrated that PCR isolation of cDNAs from complex mixtures can be facilitated by amplification protocols that use combined annealing/extension steps that decrease stepwise from high temperatures (so-called "touchdown PCR" (23)). Because these protocols require high initial primer annealing temperatures, we designed a Perl script (see "Experimental Procedures") that processes input ORF sequences (in FASTA format) to generate cloning primers with a Tm of 70°C (Supplemental Table 1). The schematic in Fig. 1 shows the design and nomenclature of the cloning primers for amplification of the T and N forms of each ORF. We initially tested several mRNA sources and found a mixture of mouse brain and testes to be optimal for amplification of most signaling target mRNAs, whereas hematopoietic-specific target mRNAs were more often isolated from a mouse spleen source (data not shown). We used the hot start-activated ProofStart DNA polymerase enzyme for amplification as its proprietary chemical modification ensures that primers remain intact during setup to prevent mispriming, and it offers efficient exonuclease activity for high fidelity (10 times greater than Taq DNA polymerase). Fig. 2 shows a typical example of attB PCR amplification of a subset of our targeted STKs from brain/testes cDNA. Of the 434 PCRs attempted for the initial STK target set (T and N forms of 217 ORFs), we amplified 375 products based on expected size (data not shown).
Recombinational Cloning and Sequence Analysis-Purified PCR products were recombined with pDONR207 in a BP reaction to generate entry vectors for each target mRNA (4). pDONR207 was favored over other commercially available donor vectors as it produces gentamicin-resistant entry clones compatible for recombination to either ampicillin- or kanamycin-based expression vectors.
FIG. 1. Design of attB gene-specific PCR amplification primers. Forward primer (I5) contains attB1 sequence, and reverse primers (T3 and NT3) contain attB2 sequence. I5 and T3 primer pair amplifies the T form of target gene, whereas I5 and NT3 primer pair amplifies the N form. Note that the bracketed sequence complementary to the stop codon is included in the T3 primer but is omitted from NT3 primer. All primers extend to a Tm of 70°C exclusive of the attB sequence.

In moderate to high throughput cloning protocols, it is helpful to have a simple diagnostic method to reliably identify clone candidates prior to sequencing. The redundancy in the BanII recognition site (GPuGCPyC where Pu is purine and Py is pyrimidine) leads to a higher likelihood of internal restriction sites being present in most target genes, increasing variation in expected banding patterns from ORFs of similar length and allowing for more reliable identification of correct clones. Furthermore, the entry vector backbone derived from pDONR207 contains only two BanII sites that conveniently flank the ORF. Thus, the parent vector backbone contributes a single 3-kb band in screens for pEN candidates, thereby minimizing interference when analyzing sizes of gene-specific bands. We developed a simple Perl script to output expected BanII digest patterns for any number of ORFs input in FASTA format. An example of a diagnostic digest of 16 entry clones is shown in Fig. 3a. Using this screening method, we identified sequence candidates for 95% of the STK ORFs that went through BP recombination in our initial clone set.
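A prediction of BanII banding patterns along these lines can be sketched in Python. This is an illustrative reimplementation, not the Perl script itself; it assumes that the full circular plasmid sequence (backbone plus ORF) is supplied, and the function name is hypothetical. Fragment sizes are computed from recognition site start positions, which gives the same lengths as the true cut positions because the cut offset within each site is constant.

```python
import re

# BanII recognition site GPuGCPyC; the lookahead also finds overlapping sites.
BANII = re.compile(r"(?=G[AG]GC[CT]C)")


def banii_fragments(plasmid_seq):
    """Expected fragment sizes (bp, largest first) for a BanII digest of a circular plasmid."""
    seq = plasmid_seq.upper()
    n = len(seq)
    # append the first 5 bases so that a site spanning the origin is detected
    sites = sorted({m.start() % n for m in BANII.finditer(seq + seq[:5])})
    if len(sites) <= 1:
        # uncut or singly cut plasmid: a single species of full length
        return [n]
    fragments = [b - a for a, b in zip(sites, sites[1:])]
    fragments.append(n - sites[-1] + sites[0])  # fragment that wraps around the origin
    return sorted(fragments, reverse=True)
```

Running such a function over every candidate sequence in a batch gives the expected band sizes to compare against a gel such as the one in Fig. 3a.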
The Primer3 PCR primer program (17) was customized to autogenerate sequence primers along both DNA strands of each target ORF spaced ~500 bases apart (see "Experimental Procedures" and Supplemental Table 2). Two entry vector-specific flanking primers were also used for each ORF to provide extensive coverage of both strands of each sequenced clone (Fig. 3b). We used PGA software to assemble the output from each set of sequencing reactions into a single contig. PGA utilizes an algorithm that allows comparison of base calls in the raw sequencing data to evaluate and select the most accurate reading. We found that the combination of generous overlap between individual contigs from the 500-nucleotide spacing of primers and the efficient analysis of the PGA software provided a robust protocol for generation of high quality sequence data. More importantly, the generation of N and T forms for each ORF, from separate amplification reactions, allowed us to better differentiate between PCR-generated mutations and genuine variants due to splice differences or polymorphisms. We found many cases where our amplified sequence did not precisely match the reference sequence in GenBank™. This disagreement could be caused by randomly generated PCR-based mutations. However, if the same "mutation" occurred in the independently amplified T and N forms of a given gene, the chance that this difference was introduced randomly by amplification was considered negligible, and the clones in question were databased as valid. Details of any such differences between our target and cloned cDNAs (target ORF versus AfCS ORF) were recorded in the AfCS plasmid database (described below).
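The reasoning used to separate PCR-generated mutations from genuine variants can be expressed as a short Python sketch. It assumes that the reference and both clone sequences have already been aligned and trimmed to the same coding region (without the stop codon) and that only substitutions are compared; handling of insertions, deletions, and splice variants is omitted. The function names are hypothetical.

```python
def differences(reference, clone):
    """Map position -> (reference base, clone base) for two aligned, equal-length sequences."""
    if len(reference) != len(clone):
        raise ValueError("sequences must be aligned to the same length")
    return {i: (r, c) for i, (r, c) in enumerate(zip(reference, clone)) if r != c}


def classify_differences(reference, t_form, n_form):
    """Split differences from the reference into shared and clone-specific sets.

    A substitution present in both independently amplified forms is treated
    as a likely genuine variant; one seen in a single form is a candidate
    PCR-generated mutation.
    """
    t_diff = differences(reference, t_form)
    n_diff = differences(reference, n_form)
    shared = {p: v for p, v in t_diff.items() if n_diff.get(p) == v}
    t_only = {p: v for p, v in t_diff.items() if p not in shared}
    n_only = {p: v for p, v in n_diff.items() if p not in shared}
    return {"shared_variants": shared, "t_only": t_only, "n_only": n_only}
```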
Design, Structure, and Content of the AfCS Plasmid Database-One advantage of the Gateway cloning platform is that it consists of standard backbones for each parent vector into which cDNAs are incorporated at invariant recombination sites. In this context, the union of a "parent" vector backbone and a sequence-validated "cloned gene" generates a construct. With regard to standard Gateway nomenclature in this system, both entry vectors (pEN) and expression vectors (pEX) containing a cloned cDNA are constructs, whereas destination vectors (pDS) are parents (Fig. 4). We designed the AfCS plasmid database to build and curate these various vector types. This database combines cloned gene and parent vector template information, and it uses custom Perl scripts to automatically generate detailed construct records and maps (Supplemental Fig. 1, a and b). To facilitate tracking of these vectors, we created a barcode system for ready identification of relevant components and properties (Fig. 4). This database structure supports the generation of detailed records for the large number of constructs generated by the AfCS consortium. The details for a given cloned cDNA only require one initial entry to the database; all subsequent construct records created using that sequence are autogenerated. We found this particularly valuable for 96-well-based applications where a plate of 96 entry clones could be subcloned by LR recombination to a set of four CFP and YFP fusion expression vectors for microscopy studies (Fig. 5). The database was designed to permit effortless combination of the 96 "cloned cDNAs" with each of the four parent vector sequence "templates" to create 384 construct records containing full sequence and recommended diagnostic digests to validate recombinants. 
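The batch combination of cloned genes with parent vector templates amounts to a Cartesian product, as the following Python sketch shows. The identifiers are placeholders and do not follow the AfCS barcode format; a real record would also carry the generated sequence and the recommended diagnostic digest.

```python
from itertools import product


def batch_construct_records(cloned_genes, parent_templates):
    """One construct record for every (cloned gene, parent vector) combination."""
    records = []
    for gene, parent in product(cloned_genes, parent_templates):
        records.append({
            "cloned_gene": gene,
            "parent_vector": parent,
            "construct": f"{parent}/{gene}",  # placeholder identifier
        })
    return records


# Hypothetical plate of 96 entry clones and four fluorescent protein fusion vectors
genes = [f"gene_{i:02d}" for i in range(1, 97)]
parents = ["CFP_Nterm", "YFP_Nterm", "CFP_Cterm", "YFP_Cterm"]
records = batch_construct_records(genes, parents)
print(len(records))  # 96 x 4 = 384
```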
The plasmid database software is a standard web-based database application that consists of a collection of Perl common gateway interface (CGI) programs for display and data entry and uses an Oracle database for persistence (Supplemental Fig. 2 and "Experimental Procedures").
The publicly accessible version of the database contains constructs that have been made available to the research community through the ATCC (see their molecular genomics clone search online). Users can view, browse, and search the list of available vectors at www.signaling-gateway.org/data/plasmid/ where barcodes link to the detailed construct records including full sequence and plasmid maps (Supplemental Fig. 1, a and b). The database also contains details on a number of available AfCS-developed parent vectors that can be used to create expression constructs for a range of applications from tagging of cDNAs with GFP derivatives to affinity tags for proteomics applications (Supplemental Table 3). Although the majority of AfCS "expression-ready" constructs available from the ATCC are CFP or YFP fusions (1898 of a total 3076 constructs containing cDNA), we have also deposited all of the cloned ORFs as entry vectors (1178 of the 3076) that can be used to move the ORFs to any of the parent vectors described in Supplemental Table 3 or to any other Gateway-ready vector.
Subcellular Localization of cDNAs in RAW264.7-As described above, a large proportion of the vectors generated for AfCS studies were CFP and YFP fusion constructs for microscopy studies. In an effort to generate a database describing the subcellular localization of signaling genes in AfCS model cell types, we utilized multiple constructs for each cloned gene. We carried out a large scale assessment of subcellular localization of the STKs in both the RAW264.7 murine macrophage cell line and the WEHI231 murine B cell line. This data set is part of the freely available image database found on the AfCS data center (www.signaling-gateway.org/data/Data.html). In this analysis, we co-expressed the same protein N- or C-terminally fused with C/YFP in the following fashion: protein-CFP + YFP-protein and CFP-protein + protein-YFP. We visualized the expression patterns of the STKs using live cell confocal microscopy, allowing us to make a subjective determination of the subcellular localization of the STKs. Importantly, screening for localization by this method allowed us to determine whether the N- and C-terminally tagged versions co-localized and if there were differences in localization patterns across different cells.
Here we highlight three examples in RAW264.7 cells to emphasize the relevance and utility of a large scale cloning and subcellular localization screen. The p21-activated kinases (Paks) bind to and may be stimulated by activated forms of the small GTPases Cdc42 and Rac. The Pak5 and Pak6 isoforms belong to the recently recognized Group II Paks (24). In the RAW264.7 line, we observed Pak5 localization as a punctate pattern along the cellular membrane (Fig. 6a), which differs from previously published reports showing association with mitochondria (25). The same pattern of Pak5 localization is observed with both N- and C-terminal CFP/YFP fusions. We noted differential Pak6 localization depending on the terminus at which CFP/YFP is expressed. With N-terminal tagging, we observed the disperse cytoplasmic localization of Pak6 described previously in CV-1 cells (26). However, with C-terminal CFP or YFP, we detected a more distinct localization of Pak6 to the plasma membrane (Fig. 6b). This finding may provide insight into the cellular mechanisms that regulate Pak6 localization.
Cyclin-dependent kinases (Cdks) have been shown to be key players in the control of cell cycle progression. Their activity is regulated by interaction with specific subunits known as cyclins, phosphorylation by other protein kinases, and dephosphorylation by phosphatases (27). Based on sequence homology, Pctaire2 is a Cdk-related gene and is likely to play a role in cell cycle progression. Here we show the first example of punctate localization of Pctaire2 potentially at the centrosome (Fig. 6c). In most cells, N- or C-terminal tagging does not seem to disrupt this localization (see Fig. 6c, upper panels). However, in some cells, we find that C-terminal tagging does disrupt Pctaire2 localization from the centrosome in the same cell in which an N-terminally tagged version still localizes to this region (see Fig. 6c, lower panels). Overall these data demonstrate that large scale subcellular localization studies can not only confirm previous findings but also reveal alternative or novel data that provide the basis for new hypotheses about protein function.

FIG. 4. AfCS plasmid database vector types and barcodes. The ORF from the entry vector recombines with the parent vector backbone to generate an expression construct. In the AfCS plasmid database, entry vectors and expression vectors, which both contain a cloned ORF, are classified as constructs, whereas parent vectors remain in a separate category. The tables to the right describe the components of the AfCS barcode system for each vector type.

FIG. 5. Schematics of ORF transfer by recombinational cloning to generate expression clones for subcellular localization studies. ORF-T form contains the stop codon, whereas ORF-N form lacks the stop codon. ORF sequence is transferred to expression vectors via LR recombination. The T form is CFP- or YFP-tagged at the N terminus; the N form is CFP- or YFP-tagged at the C terminus.

DISCUSSION

The cloning of large numbers of cDNA ORFs is a challenging task that requires capable software systems and robust technical methodology as well as efficient, facile, and accessible data curation. We describe the generation of a publicly available cDNA collection by the AfCS consortium with emphasis on the STK gene family. Although reference is made to an initial set of 217 STK target genes (of which 124 were successfully cloned and databased), this group was limited by the lack of a publicly available mouse genome sequence at the outset of the project. Our scope was expanded with the completion of the mouse genome (28), and the entire set of murine kinase genes (the mouse "kinome") is now proposed to number 540 genes (19) of which the AfCS has cloned and distributed >260. Fig. 7 shows a phylogenetic tree derived from the human kinome (20) that highlights the murine protein kinase orthologs cloned and distributed as part of the AfCS effort.
In addition to wild type kinase cDNAs, we also created mutant cDNAs encoding kinases where the essential lysine residue in the kinase domain was mutated to arginine. This "Lys to Arg" mutation creates a potential dominant negative for almost 200 kinases. These mutants have also been distributed to the ATCC.
We managed the curation of our cloned cDNAs by creating the AfCS plasmid database (www.signaling-gateway.org/data/plasmid/). This database is a dynamic and flexible resource that represents a long term and ongoing repository of not only STK-containing but all publicly available AfCS vectors. Users can also perform BLAST searches of the AfCS collection at this site.
In addition to the kinase cDNA set, many additional genes with key roles in cell signaling were cloned as part of the AfCS effort. Other gene families strongly represented in this group include heterotrimeric G proteins, G protein-coupled receptors, small G proteins, guanine nucleotide exchange factors, and phospholipases. A comprehensive set of pleckstrin homology domains was also cloned and distributed. The total number of unique full-length ORFs cloned and fully sequenced in Gateway entry vectors, and currently available at the ATCC, is 449 (Supplemental Table 4). The majority of these are available in two forms (± a termination codon) to permit expression as either N- or C-terminal fusions with various tags (see Supplemental Table 3 for list of expression options). Although Supplemental Table 4 shows only one representative clone for each unique gene, splice variants were cloned for several genes. For example, a keyword search with the Mek7 gene name at www.signaling-gateway.org/data/plasmid/ finds 44 constructs containing six different variants of Mek7.
A primary application for these cDNAs in the AfCS program was the creation of a subcellular localization database generated through expression of cDNAs as CFP and YFP fusion proteins in the RAW264.7 macrophage cell line and the WEHI231 B cell line (www.signaling-gateway.org/data/Data.html) (29). Thus, the majority of "ready-to-use" expression constructs available through the ATCC are CFP or YFP fusion constructs for mammalian expression. An additional resource that was created as part of the imaging project was a comprehensive set of localization markers for various organelles (34). These have been made available by the ATCC in a 96-well plate to provide the entire 64 plasmid set in an accessible format (ATCC ID MBA-91 and Supplemental Table 5).
We describe localization data in RAW264.7 macrophages for the kinases Pak5, Pak6, and Pctaire2. Pak5 and Pak6 are highly homologous and belong to the recently recognized Group II Paks (Paks 4, 5, and 6) that retain 40-50% identity to the kinase domains of the Group I Paks (Paks 1, 2, and 3) (24,30). We observed a distinct punctate pattern for Pak5 localization along the cellular membrane that does not seem to be disrupted by N- or C-terminal tagging (Fig. 6a). Because previous studies observed Pak5 localization in mitochondria of Chinese hamster ovary cells, it is possible that Pak5 localization is dynamic and varies between cell types. Expression of the N-terminal fusion of Pak6 exhibits the previously observed disperse cytoplasmic localization (26). However, the C-terminal fusion reveals a more distinct plasma membrane location (Fig. 6b). Pak6 contains a PAK box domain (PBD), characteristic of the Pak kinases (31). It is possible that C-terminal tagging disrupts the function of the PBD domain and, by extension, the cellular compartmentalization of Pak6. On the other hand, the specific membrane localization of the C-terminal CFP/YFP fusion may provide insight into how engagement of the PBD domain could dynamically regulate Pak6 localization.
Based on sequence homology, Pctaire2 is a Cdk-related gene and likely plays a role in cell cycle progression (27,32). Previous localization studies of Pctaire1 show abundant cytoplasmic localization (32). Here we show the first example of distinct localization of Pctaire2 that, in most cells, appears to be at the centrosome with both N- and C-terminal tagging (Fig. 6c). It is noteworthy that we find a small proportion of cells where C-terminally tagged Pctaire2 is delocalized from the centrosome while the N-terminally tagged form remains localized (Fig. 6c, lower panels). This observation may provide insight into a dynamic role for the Pctaire2 C terminus during certain stages of the cell cycle. These data demonstrate how expression constructs from the AfCS collection can provide informative data on the localization of signaling genes that may lead to important insights into protein function. In addition, the CFP/YFP fluorescent tags were chosen for their compatibility in fluorescence resonance energy transfer studies, which can provide data on the real time interaction kinetics of dynamically associated proteins (e.g. heterotrimeric G protein subunits). Indeed, all of the constructs available in the AfCS plasmid database can provide the foundation for further analyses working toward a better understanding of complex cellular signaling networks.
In general, the cDNA cloning protocol we describe proved to be robust for the isolation of a large number of murine signaling genes. As a standard source material, a mixture of brain and testes cDNA appears to provide the most comprehensive coverage of genes involved in the major cell signaling pathways. Analysis of the genes that we failed to amplify shows a bias toward large cDNAs. However, we were still able to clone 17 ORFs of >3 kb using the ProofStart polymerase. We found that this enzyme provided the best balance of robustness of amplification versus fidelity. Furthermore, we could readily identify randomly generated PCR mutations by running separate PCRs for cDNAs with and without the termination codon (Fig. 1). In most cases, we expect that the differences we observed between our cloned sequences and the GenBank reference were caused by either splice variation between the originally sequenced ORF and ours, genuine polymorphisms, or sequence inaccuracies from more dated sequencing technologies. For example, we cloned the heterotrimeric G protein Gαi2 from the sequence in GI: 6680036 and observed five differences in the nucleotide sequence. Comparison with the current RefSeq record for Gαi2 (GI: 41054805) shows a perfect match between the AfCS and RefSeq sequences at these five locations. Interestingly, there are three presumed polymorphisms between the AfCS and RefSeq sequences at other locations, but these three nucleotide differences are "silent" with respect to amino acid sequence (data not shown).
Our adoption of the Gateway cloning technology at the outset of this effort greatly facilitated the cloning process (4). The resource described here is specific to cDNA cloning, but we have also used Gateway extensively in the cloning of short hairpin RNAs for RNA interference, initially as RNA polymerase III promoter-driven stem loops (www.signaling-gateway.org/data/plasmid/RNAi_vector_guide.pdf) and more recently as conditionally expressed microRNA-like short hairpin RNAs (33). The recombination reactions used for both cloning PCR products and subcloning to expression platforms for functional analysis have proven to be highly robust and scalable to 96-well plate formats. Moreover, our development of custom software scripts allowed us to automate the generation of plasmid database records, an approach that was particularly useful for batch calculations of diagnostic restriction maps for validating constructs after recombination.
In summary, the constructs generated and distributed to the ATCC by the AfCS consortium represent a valuable experimental resource for the cell signaling research community. We present only a small subset of the localization data available at the AfCS image database (www.signaling-gateway.org/data/Data.html), so this data resource can provide significantly more insight and experimental leads than what we are able to highlight in this report. Moreover, these localization screens are only one of many applications for this Gateway-compatible clone set. The AfCS ORFs described here are also available from the ATCC in Gateway entry vectors. Thus, subcloning to other Gateway-compatible expression platforms (such as those detailed in Supplemental Table 3 and others available commercially) would allow many additional proteomics applications to further investigate the molecular function of the signaling genes in the AfCS collection.