A High Efficiency Strategy for Binding Property Characterization of Peptide-binding Domains

A large proportion of protein-protein interactions is mediated by families of peptide-binding domains. Comprehensive characterization of each of these domains is critical for understanding the mechanisms and networks of protein interaction at the domain level. However, existing methods all rely on a large scale screening for each domain and are therefore too inefficient to handle the hundreds of members in major domain families. We developed a systematic strategy for efficient binding property characterization of peptide-binding domains based on high throughput validation screening of a specialized candidate ligand library using a yeast two-hybrid mating array. Its outstanding feature is that the overall efficiency is dramatically improved compared with that of traditional screening and increases further as the system cycles. The PDZ domain family was used first to test the strategy. Five PDZ domains were rapidly characterized. Broader binding properties were identified than with other methods, including novel recognition specificities that provided the basis for a major revision of the conventional PDZ classification. Several novel interactions were discovered, serving as significant clues for further functional investigation. This strategy can be easily extended to a variety of peptide-binding domains as a powerful tool for comprehensive analysis of domain binding properties on a proteomic scale.

A substantial proportion of protein-protein interactions is mediated by families of protein interaction domains that recognize short peptide motifs in their binding partners, such as PDZ, SH3, WW, GYF, etc. (1). These peptide-binding domains are involved in a variety of functions such as subcellular localization, enzymatic activity, substrate specificity of regulatory proteins, and the assembly of multiprotein complexes (2,3). Comprehensively investigating the binding properties of each peptide-binding domain family is critical for understanding the mechanisms and networks of protein interactions at the domain level (4,5).
Several powerful methods, including oriented peptide library, SPOT synthesis, phage display, and yeast two-hybrid (Y2H) methods, have been successfully used for characterization of the peptide recognition specificities of individual domain families. The oriented peptide library approach was the initial foray into investigation of the consensus-binding sequences of domains (6,7). The domain of interest is immobilized and incubated with soluble oriented peptides, and the adsorbed peptides are sequenced as a mixture to deconvolute the consensus motif in a statistical manner. This method is powerful for revealing preferences of the domain for certain amino acids at a given position; however, it can neither determine whether certain residues are forbidden at a particular position nor isolate actual sequences of binding peptides (6,7). As an alternative strategy, the oriented peptide array library integrates the oriented peptide library and array technologies (8). Hundreds of individual pools, each consisting of an oriented peptide library, are synthesized on solid supports, and the preferred amino acids at every position are read directly from arrays without protein sequencing. A disadvantage of this method is that the binding peptides are analyzed in a pool, making it impossible to obtain actual sequences of positive peptides or to compare quantitatively the affinities of defined peptides. SPOT synthesis is based on chemical synthesis of peptides on cellulose membranes and independent tests of the binding of these peptides to the domain of interest (9). In this approach, actual binding sequences can be obtained directly, but unexpected sequences will not be detected because the synthesized sequences depend on a priori information, and the throughput is limited by the number of peptides that can be synthesized on a membrane of reasonable size in a realistic time (~10^4 peptides) (10,11).
Phage display is a high throughput approach in which libraries of 10^9-10^10 random peptides can be displayed and screened (5,12). High affinity ligands can be identified much more efficiently than low affinity ones (13). All the above methods are in vitro and require purified protein(s) as well as artificial conditions for incubation and elution. As an in vivo approach, Y2H assays are extensively used in a high throughput manner to find ligands for domains of interest (14-16). The advantages are that affinity competition can be avoided, transient interactions can be captured, and actual binding sequences can be isolated (14-16). However, cDNA libraries have traditionally been screened in Y2H assays, with the result that low abundance ligands are easily overwhelmed by high abundance ligands (17). For comprehensive analysis of the binding properties of a given domain, multiple libraries must be screened repeatedly. Alternatively, a random peptide library allows the identification of specific peptides to deduce the consensus-binding sequences in a single round of screening (18). However, these methods are all based on a large scale screening for each bait domain and are relatively labor-intensive and time-consuming. Hundreds of domains have been predicted for each of the major domain families in the human proteome (smart.embl-heidelberg.de/), making it tremendously difficult to do an individual large scale screening for each member of a domain family. Direct validation of preferred ligands from a special library that consists of highly diversified candidate ligands can dramatically improve the overall efficiency. It is very common in most domain families that one domain can bind many ligands (1) and that one ligand can be recognized by multiple domains (19,20). In a given domain family, the ligands selected by some domain members are more likely to bind other members than are unrelated random sequences.
To characterize the binding properties of peptide-binding domains more efficiently, we present here a systematic strategy based on high throughput validation screening of a specialized candidate ligand library using a yeast two-hybrid mating array. The library was constructed mainly by collecting the ligand clones isolated from yeast two-hybrid screenings of random peptide libraries (RPY2Hs) with representative members of the selected domain family.
We first focused on one particular type of peptide-binding domain, the PDZ (named after PSD-95/SAP90, DLG, and ZO-1) domain, and used it as an example for explanation and evaluation of the strategy. The PDZ domain is one of the most abundant protein interaction modules and is involved in a variety of important cellular functions, such as intracellular routing of proteins, regulation of neurotransmitter transporters, and formation of multiprotein complexes (21,22). Typically a PDZ domain is specialized for binding the extreme carboxyl termini of its target proteins (7,23). Based on the last four residues, PDZ binding motifs have been grouped conventionally into four classes: Class I, -(S/T)XΦ*; Class II, -ΦXΦ*; Class III, -(D/E/K/R)XΦ*; and Class IV, -XΨ(D/E)* where X is any amino acid, Φ is a hydrophobic residue, Ψ is an aromatic residue, and * represents the stop codon of the peptide sequence (21). Some PDZ domains can create further functional diversity by other interaction modes, such as dimerization of PDZ-PDZ domains (24,25) and recognition of internal sequences (26,27).
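The conventional four-class scheme described above can be expressed as a small pattern matcher. The following is a minimal sketch, not part of the original work: the residue sets chosen for "hydrophobic" and "aromatic" and the function name classify_pdz_ligand are assumptions made for illustration, because the text does not enumerate the residues belonging to each group.

```python
import re

# Assumed residue sets: the text only says "hydrophobic" and "aromatic".
HYDROPHOBIC = "AVLIMFWY"
AROMATIC = "FWY"

# Each pattern is anchored at the carboxyl terminus (the "*" of the motif).
CLASS_PATTERNS = [
    ("Class I", re.compile(rf"[ST].[{HYDROPHOBIC}]$")),
    ("Class II", re.compile(rf"[{HYDROPHOBIC}].[{HYDROPHOBIC}]$")),
    ("Class III", re.compile(rf"[DEKR].[{HYDROPHOBIC}]$")),
    ("Class IV", re.compile(rf".[{AROMATIC}][DE]$")),
]


def classify_pdz_ligand(peptide: str) -> str:
    """Return the conventional class of a carboxyl-terminal peptide."""
    tail = peptide.strip().upper()
    for name, pattern in CLASS_PATTERNS:
        if pattern.search(tail):
            return name
    return "unclassified"
```

Under these assumed residue sets, a peptide ending in Cys falls outside all four classes, which corresponds to the "unclassified" ligands discussed later in the text.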

MATERIALS AND METHODS
Preparation of Bait Plasmids-Yeast two-hybrid bait plasmids, human ZO-1 PDZ1-pMBa (Swiss-Prot accession number Q07157, amino acids 1-105) and PDZ3-pMBa (Swiss-Prot accession number Q07157, amino acids 401-500) plasmids, were kindly provided by Dr. Ben Giepmans (The Netherlands Cancer Institute). cDNA of Erbin PDZ domain (Swiss-Prot accession number Q96RT1, amino acids 1273-1371) was amplified by PCR with sense primer 5′-GAATTCGGCCATGAACTGGCAAAACAAG-3′ and antisense primer 5′-GAATTCTTATGAGGAAACTTCTCGTAC-3′ from a human T cell cDNA library. The PCR product was digested with EcoRI and cloned into the EcoRI site of the GAL4 BD vector, pBridge, which carries Trp1 as a selection marker. cDNA of HtrA2 PDZ domain (Swiss-Prot accession number O43464, amino acids 343-454) was obtained by PCR from a human bone marrow cDNA library with the sense primer 5′-CGCGGATCCATCGTGGGGAAAAGAAGAATT-3′ and antisense primer 5′-GGCGGATCCAGGGGTCACATATAAGGTCAG-3′. The PCR product was digested with BamHI and ligated into the BamHI site of pBridge. cDNA of LNX1 PDZ2 domain (Swiss-Prot accession number Q8TBB1, amino acids 371-473) was amplified by PCR with sense primer 5′-CCGGAATTCGATGCCTACAGACCCCGAGAT-3′ and antisense primer 5′-CGCGGATCCCTACTGAAAGATGTCAGGGCTCCG-3′ using IMAGE 5164034 clone (Invitrogen) as template. The PCR product was digested with EcoRI and BamHI and cloned into the GAL4 BD vector, pAS2-1, which carries Trp1 as a selection marker. All the constructed bait plasmids were confirmed by DNA sequencing.
Construction and Characterization of the Y2H Random Peptide Library-A high diversity random peptide library was constructed by an improved version of our previously described method (28). We chose pGADT7, which carries the selection marker Leu2, as the GAL4 AD vector. We modified its reading frames as follows. 1) The reading frame at the BamHI site was modified. Plasmid pGADT7 was digested by EcoRI, and the four protruding nucleotides at the 5′-end were filled in by T4 DNA polymerase or cut by mung bean nuclease, and then the two blunt ends were ligated by T4 DNA ligase. Two new GAL4 AD vectors, pGADT7(+B) in which the reading frame at the BamHI site was shifted one nucleotide backward and pGADT7(−B) in which the reading frame at the BamHI site was shifted one nucleotide forward, were obtained. 2) The reading frame at the EcoRI site was modified. Oligonucleotides 5′-AATTAAGCTTGGG-3′ and 5′-AATTAAGCTTG-3′ and their respective antisense strands were synthesized. Each double-stranded DNA fragment was obtained by annealing an oligonucleotide with its antisense strand and was then subcloned individually into the EcoRI site of pGADT7. Another two new GAL4 AD vectors, pGADT7(+E) in which the reading frame at the EcoRI site was shifted one nucleotide backward and pGADT7(−E) in which the reading frame at the EcoRI site was shifted one nucleotide forward, were obtained.
Human genomic DNA was digested by Tsp509I overnight at 65°C, and the fragments were cloned into the EcoRI site of pGADT7(+E) and pGADT7(−E), respectively. Human genomic DNA was digested by DpnII for 5 h at 37°C, and the fragments were cloned into the BamHI site of pGADT7(+B) and pGADT7(−B), respectively. The same procedure was used with tobacco genomic DNA with the additional cloning into the original pGADT7 vector. The constructed library plasmids were transformed into Escherichia coli DH10B, and the number of transformed clones was estimated before amplification. Finally we obtained 10 libraries (Supplemental Table 1). For characterization of the library, several single clones were selected randomly from each library, and the inserted fragments were analyzed by PCR and restriction enzyme digestion. A final random peptide library was obtained by pooling all 10 libraries.
Y2H Screening of Random Peptide Library-The GAL4 BD-PDZ fusion bait plasmid was transformed into the yeast strain CG1945 using the lithium acetate protocol. The transformants were grown on SD/−Trp plates and spread on SD/−His−Trp plates for self-activation estimation. The transformants that had no background growth, or had background growth that could be inhibited by 3-amino-1,2,4-triazole, and were also negative for the LacZ assay were selected for the subsequent screening. The random peptide library was screened following the MATCHMAKER Two-Hybrid System protocol (Clontech). Approximately 10^7 Trp+ Leu+ transformants were selected on plates with SD/−His/−Leu/−Trp medium in the primary screening and then tested by the improved LacZ assay in the second screening. After rescue, the potential positive plasmids were isolated and retransformed into the yeast strain CG1945 containing the corresponding bait plasmid. Only the clones that were positive for all the reporter assays and confirmed by at least two independent tests were selected as specific interactions and sequenced.
Semiquantitative Binding Assay-The semiquantitative binding affinity of the interaction between a positive clone and the corresponding PDZ bait was determined as β-galactosidase units using a liquid culture β-galactosidase assay with o-nitrophenyl β-D-galactopyranoside (ONPG) as substrate. The assay protocol and the calculation of β-galactosidase units were based on the "MATCHMAKER Random Peptide Library User Manual" from Clontech. In brief, yeasts in appropriate selective medium were cultured overnight, and A600 was recorded. Cells from 500 μl of culture were resuspended in 100 μl of Z buffer and frozen in liquid nitrogen. Then 700 μl of Z buffer plus 0.27% β-mercaptoethanol and 160 μl of ONPG (4 mg/ml in Z buffer) were added. Once a yellow color had developed, the reaction was terminated by adding 0.4 ml of 1 M Na2CO3. The incubation time and A420 were recorded. Results were expressed as Miller units: β-galactosidase units = 1000 × A420/(A600 × T × V), where T is the incubation time in min and V is the volume of cell culture in ml. To reduce variability, three separate transformants were assayed, each in triplicate. Values represent the means ± S.D. of β-galactosidase units. The results were analyzed by the statistical software SPSS11.0.
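The unit calculation stated above can be written out directly. The following is a minimal sketch of that formula and of the mean and S.D. summary; the function and parameter names are illustrative and are not taken from the Clontech manual or from SPSS.

```python
from statistics import mean, stdev


def beta_gal_units(a420: float, a600: float, minutes: float, culture_ml: float) -> float:
    """Units = 1000 x A420 / (A600 x T x V), with T in min and V in ml."""
    if a600 <= 0 or minutes <= 0 or culture_ml <= 0:
        raise ValueError("A600, incubation time, and culture volume must be positive")
    return 1000.0 * a420 / (a600 * minutes * culture_ml)


def summarize(units: list[float]) -> tuple[float, float]:
    """Return the mean and the sample standard deviation of replicate values."""
    return mean(units), stdev(units)
```

For instance, a reading of A420 = 0.5 from 0.5 ml of a culture at A600 = 1.0 after 10 min gives 1000 × 0.5/(1.0 × 10 × 0.5) = 100 units.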
Construction of PDZ Ligand Library and High Throughput Yeast Mating Array-The PDZ ligand library was constructed by individually introducing all the non-redundant positive plasmids isolated from RPY2Hs, all the GAL4 AD-PDZ fusion plasmids, and other plasmids (as described under "Discussion") into the yeast strain Y187 (MATα). The library yeast clones were stored at −80°C. The PDZ bait of interest was introduced into the yeast strain CG1945 (MATa). For the interaction mating array, 10-μl liquid cultures of library MATα yeast strains were pooled in 96-well plates, grown, and mixed with 60-μl liquid cultures of bait MATa yeast strains. The mating mixtures were transferred onto yeast-peptone-dextrose agar plates and incubated for 32 h at 30°C. After mating, the diploid clones were picked from the plates and transferred onto 96-well plates with SD/−Leu/−Trp selective agar medium for examining the mating efficiency and onto 96-well plates with SD/−His/−Leu/−Trp selective agar medium for subsequently examining the positive interactions. After 3-7 days of incubation at 30°C, the clones positive for growth were selected for further LacZ assay.
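Because each well holds a library clone of known sequence, the array readout reduces to a lookup. The following is a hypothetical sketch of how such a readout might be tabulated; the well naming (A1 to H12), the data structures, and the decision to report wells without diploid growth as "untested" are assumptions for illustration and are not described in the protocol.

```python
from string import ascii_uppercase

ROWS, COLS = 8, 12  # standard 96-well layout


def well_name(index: int) -> str:
    """Map a zero-based index (row-major order) to a well label such as 'A1'."""
    if not 0 <= index < ROWS * COLS:
        raise ValueError("index outside a 96-well plate")
    row, col = divmod(index, COLS)
    return f"{ascii_uppercase[row]}{col + 1}"


def read_array(plate_map: dict[str, str], mated: set[str],
               his_growth: set[str], lacz_blue: set[str]) -> dict[str, list[str]]:
    """Sort library ligands into positive, negative, and untested groups.

    plate_map maps a well label to the ligand sequence placed in that well.
    A ligand is called positive only if the diploid grew on the mating
    control medium, grew on the His-selective medium, and stained blue.
    """
    calls = {"positive": [], "negative": [], "untested": []}
    for well, ligand in plate_map.items():
        if well not in mated:
            calls["untested"].append(ligand)
        elif well in his_growth and well in lacz_blue:
            calls["positive"].append(ligand)
        else:
            calls["negative"].append(ligand)
    return calls
```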
Analysis of the Consensus-binding Sequences and Prediction of Candidate Ligands-For each PDZ, consensus-binding sequences were deduced from sequence alignment of positive clones and comparative analysis of both positive and negative sequences. Tailfit software (29) was used to search the Swiss-Prot database or NR human database to retrieve the potential ligand proteins whose carboxyl termini matched the consensus-binding sequences. The NR human database was composed of the proteins with keyword "human" in their descriptions by using Bioworks 3.1 SR1 (Thermo Finnigan, San Jose, CA) to single out proteins from the NR database. The most promising candidate ligand proteins were selected manually with biological information such as subcellular localization and known function.
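The database search step amounts to matching a consensus motif against the carboxyl terminus of each protein. The sketch below illustrates the idea with a plain regular expression; it is not the Tailfit program, whose interface is not described here. The residue sets for hydrophobic (Φ) and aromatic (Ψ) positions and all function names are assumptions.

```python
import re

HYDROPHOBIC = "AVLIMFWY"  # assumed membership of the hydrophobic set
AROMATIC = "FWY"          # assumed membership of the aromatic set


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a motif such as '-X(W/F)V*' into a C-terminally anchored regex."""
    body = consensus.strip().lstrip("-").rstrip("*")
    parts = []
    for token in re.findall(r"\([^)]*\)|.", body):
        if token.isspace():
            continue
        if token.startswith("("):
            # '(S/T)' lists the alternatives allowed at one position
            parts.append("[" + re.sub(r"[/\s]", "", token[1:-1]) + "]")
        elif token == "X":
            parts.append(".")
        elif token == "Φ":
            parts.append(f"[{HYDROPHOBIC}]")
        elif token == "Ψ":
            parts.append(f"[{AROMATIC}]")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts) + "$")


def find_candidates(proteins: dict[str, str], consensus: str) -> list[str]:
    """Return the names of proteins whose carboxyl terminus matches the motif."""
    pattern = consensus_to_regex(consensus)
    return [name for name, seq in proteins.items() if pattern.search(seq.upper())]
```

A search of this kind returns every sequence that fits the motif, which is why the subsequent manual filtering by subcellular localization and known function is needed.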

Confirmation of Candidate PDZ Interactions-The GAL4 AD plasmids expressing the carboxyl-terminal 10 amino acid residues of the candidate ligands of ZO-1 PDZ1 were first constructed. In one example, for construction of Claudin-17, the oligonucleotides 5′-AATTCATGCTTAGTAAGACCTCCACCAGTTATGTCTAAG-3′ and 5′-GATCCTTAGACATAACTGGTGGAGGTCTTACTAAGCATG-3′ were synthesized. The two fragments were phosphorylated by T4 polynucleotide kinase and annealed at room temperature. The obtained fragment of double-stranded DNA was cloned into the EcoRI/BamHI sites of the GAL4 AD vector pGADT7 followed by DNA sequencing. Each constructed plasmid was cotransformed with the ZO-1 PDZ1 bait plasmid into yeast strain CG1945. The positive interactions were selected as described above for yeast two-hybrid screening. The same procedure was used for the candidate ligands of Erbin PDZ, HtrA2 PDZ, and LNX1 PDZ2. The procedure was slightly different for ZO-1 PDZ3: the oligonucleotides encoding the carboxyl-terminal 16 amino acid residues of each candidate ligand were cloned.
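The design of the Claudin-17 oligonucleotide pair can be checked computationally. The sketch below, which is an illustration and not part of the original procedure, verifies that the two oligonucleotides (written here without the line-break hyphens) are complementary apart from 4-nucleotide 5′ overhangs and translates the insert. The overhang length of four and the reading frame starting immediately after the leading AATTC are assumptions of this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

SENSE = "AATTCATGCTTAGTAAGACCTCCACCAGTTATGTCTAAG"
ANTISENSE = "GATCCTTAGACATAACTGGTGGAGGTCTTACTAAGCATG"


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def anneals_with_overhangs(sense: str, antisense: str, overhang: int = 4) -> bool:
    """True if the strands pair perfectly once each 5' overhang is set aside."""
    return reverse_complement(sense[overhang:]) == antisense[overhang:]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))
```

Here `SENSE[:4]` and `ANTISENSE[:4]` are the AATT and GATC overhangs that fit EcoRI- and BamHI-cut vector ends, and `translate(SENSE[5:38])` gives the 10 encoded residues followed by the stop codon.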
Analysis of the Novel Protein-Protein Interactions-A human protein-protein interaction reference data set was obtained from the Human Protein Reference Database (HPRD) (www.hprd.org, status September 13, 2005). The shortest path length was calculated with TopNet (30). The network was visualized using the Osprey Network Visualization System (reference manual for Version 1.2.0) (31).
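The shortest path length between two proteins in an interaction network is the minimum number of interactions that must be traversed to connect them. The following breadth-first search is a generic sketch of that calculation on an undirected, unweighted network; it is not the TopNet implementation, and the edge-list input format is an assumption.

```python
from collections import deque
from typing import Iterable, Optional


def shortest_path_length(edges: Iterable[tuple[str, str]],
                         source: str, target: str) -> Optional[int]:
    """Return the number of links on the shortest path, or None if unlinked."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None
```

Protein pairs that lie in different connected components of the reference network return None, which corresponds to pairs that cannot be linked.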

The High Efficiency Strategy for Binding Property Characterization of PDZ Domain Family
The newly developed strategy involves the following seven steps as illustrated in Fig. 1A.
Step 1 is RPY2H. Individual RPY2Hs with representative PDZ domains as bait were performed to isolate a series of positive PDZ binding clones (a).
Step 2 is construction of the ligand library. A specialized PDZ ligand library was constructed mainly by collecting all the PDZ binding clones isolated from all the RPY2Hs and all the PDZ domain clones (b) along with addition of known or potential PDZ binding clones such as the cDNA clones of PDZ ligand proteins and the synthesized clones of potential PDZ binding peptides (l).
Step 3 is validation screening. A PDZ domain of interest was analyzed by validation screening of the PDZ ligand library using high throughput Y2H mating array (c). Both positive and negative binding sequences were read directly from arrays (d).
Step 4 is supplemental RPY2H. PDZ domains that did not interact with a sufficient number of ligands from the ligand library were used as bait for de novo RPY2Hs in the traditional way. The selected clones were added to the PDZ ligand library to increase its diversity for subsequent studies of PDZ domains with similar preferences.
Step 5 is characterization of binding properties. The precise binding properties of each PDZ were characterized by comparative analysis of both positive and negative sequences isolated from RPY2H and/or validation screening (f and g).
Step 6 is prediction of candidate ligand proteins. The candidate ligand proteins were predicted by protein database searches with consensus-binding sequences followed by manually filtering with biological information such as subcellular localization and known functions (h).
Step 7 is confirmation of potential interactions. The clones expressing carboxyl termini of the predicted candidate PDZ ligand proteins were constructed individually by synthesis of oligonucleotides and cotransformed with corresponding PDZ bait in one-to-one Y2H assays to confirm the interactions (i and k). The synthesized clones were then added to the PDZ ligand library (j).
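The cyclic character of steps 3 and 4 can be summarized in a few lines. The sketch below is purely schematic: the callables binds and de_novo_screen stand in for the wet-lab validation screening and RPY2H, and the threshold min_hits is a hypothetical parameter, because the text does not state how many ligands count as sufficient.

```python
from typing import Callable


def characterize_domain(domain: str,
                        library: set[str],
                        binds: Callable[[str, str], bool],
                        de_novo_screen: Callable[[str], set[str]],
                        min_hits: int = 5) -> tuple[set[str], set[str], set[str]]:
    """Run one cycle and return (positives, negatives, expanded library)."""
    # Step 3: validation screening against every ligand already in the library.
    positives = {ligand for ligand in library if binds(domain, ligand)}
    negatives = library - positives
    expanded = set(library)
    # Step 4: fall back to a de novo screen only when too few ligands bind.
    if len(positives) < min_hits:
        new_ligands = de_novo_screen(domain)
        positives |= new_ligands
        expanded |= new_ligands
    return positives, negatives, expanded
```

Each pass returns a library at least as large as the one it started with, which is the sense in which later domains can be characterized with less de novo screening.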
This strategy is notably different from other approaches in several features. First, the overall efficiency is dramatically improved by validation screening instead of traditional screening. It can be further improved as the candidate ligand library expands. Second, we can achieve high throughput with the yeast two-hybrid mating array approach. Multiple bait domains can be tested in parallel, and thousands of candidate ligands can be screened on arrays simultaneously. Third, because the library consists of ligands of known sequences, actual positive and negative binding sequences, which are both very important for precisely defining binding properties, can be directly read from arrays without resequencing. Fourth, the clones of PDZ domains are also included in the PDZ ligand library, enabling study of PDZ-PDZ domain interactions at the same time. Fifth, the ligands having the highest affinity and the highly specific ligands capable of binding to one particular target PDZ domain but not to others will be ultimately identified. These ligands can provide the basis for the synthesis of mimetics to be used as research tools or therapeutic agents (18,32).

Investigation of PDZ Domains Using the Newly Developed Strategy
Construction of a PDZ Ligand Library by RPY2Hs with Representative PDZ Domains-We first constructed 10 random peptide libraries individually (Supplemental Table 1) by subcloning the digested genomic DNA fragments into the yeast two-hybrid AD vectors with different reading frames (28). Because a PDZ domain typically recognizes four residues at the extreme carboxyl terminus of its target, 1.6 × 10^5 (20^4) independent random clones are necessary for studying its recognition specificity. Each library contained approximately 1 × 10^7 transformed clones and was sufficient for expressing all the possibilities of the carboxyl-terminal random four residues. Taking into account that more than four residues are possibly involved in PDZ recognition, we generated a larger random peptide library by pooling the 10 libraries together. The final library contained 1.5 × 10^8 transformed clones and was sufficient to cover all the possibilities of the carboxyl-terminal random six amino acids (6.4 × 10^7, 20^6).
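The library size argument rests on comparing the number of clones with the number of possible carboxyl-terminal sequences, 20^n for n residues. The sketch below reproduces that arithmetic and adds a rough coverage estimate. The estimate assumes that every clone expresses an independent, uniformly distributed random terminus, which is an idealization introduced here (genomic fragments are not uniformly random) and not a claim of the original text.

```python
import math


def sequence_space(n_residues: int) -> int:
    """Number of distinct peptides of the given length over 20 amino acids."""
    return 20 ** n_residues


def expected_coverage(n_clones: float, n_residues: int) -> float:
    """Expected fraction of sequences sampled at least once (Poisson model)."""
    return 1.0 - math.exp(-n_clones / sequence_space(n_residues))
```

Under this idealized model, `expected_coverage(1e7, 4)` is effectively complete, whereas `expected_coverage(1.5e8, 6)` falls somewhat short of 1, so "sufficient to cover" is best read as the library size exceeding the size of the sequence space.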
We chose three human PDZ domains, the PDZ1 and PDZ3 domains of ZO-1 (Swiss-Prot accession number Q07157) and the PDZ domain of Erbin (Swiss-Prot accession number Q96RT1), to screen the final random peptide library individually using the Y2H approach. The specificities of ZO-1 PDZ1 and PDZ3 have not been characterized by large scale screening. As shown by a limited number of known target proteins, ZO-1 PDZ1 binds ligands that conform to Class I (33,34), Class II (35,36), and Class III (35) motifs, and PDZ3 binds ligands conforming to the Class II motif (35). The Erbin PDZ domain, as determined by phage display experiments, binds peptides that belong to the Class I motif (37). We reasoned that the binding sequences selected by these three PDZ domains could cover the major classes of PDZ motifs.
From RPY2Hs, we isolated 42, 37, and 35 non-redundant positive binding clones for ZO-1 PDZ1, ZO-1 PDZ3, and Erbin PDZ, respectively (Supplemental Tables 2-4). All the specific clones were positive for both reporter genes HIS3 and LacZ and confirmed by three independent cotransformation experiments of the candidate plasmids with their corresponding bait.
Next we generated the initial PDZ ligand library by collecting all the positive PDZ binding clones isolated from RPY2Hs and all the PDZ domain clones in our laboratory. The library was composed of 143 non-redundant clones (Supplemental Table 12). To examine whether the linear ligands in the library were representative of potential PDZ binding sequences, the class and length distributions were analyzed (Fig. 1, B and C). In addition to all four classes of traditional ligands, the library notably included some unclassified ligands with Cys or hydrophilic amino acid residues at the extreme carboxyl termini, enabling discovery of atypical PDZ motifs. In detail, the numbers of Class I, Class II, Class III, Class IV, and unclassified ligands in the library were 72 (56.25%), 35 (27.34%), 6 (4.70%), 1 (0.80%), and 14 (10.90%), respectively. It has been reported that the majority of PDZ motifs are Class I and Class II, whereas Class III and Class IV are the minority (7,21). The distribution of these traditional classes of ligands in the library was consistent with the general preference of PDZ domains. The lengths of linear ligands in the library varied from six to 91 amino acid residues with most of them in the range of 6-20, longer than the four residues required for typical PDZ recognition. Both analyses indicated that this PDZ ligand library was suitable for use in validation screenings.
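The class distribution is a simple proportion of each class count over the total number of linear ligands. The sketch below recomputes the percentages from the counts given above; the dictionary layout and function name are illustrative. Values computed this way can differ from the reported figures in the second decimal place, presumably because of rounding in the original.

```python
CLASS_COUNTS = {
    "Class I": 72,
    "Class II": 35,
    "Class III": 6,
    "Class IV": 1,
    "unclassified": 14,
}


def class_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each class as a percentage of all ligands, to two decimals."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}
```

The counts sum to 128 linear ligands, which is smaller than the 143 clones in the library; the difference presumably corresponds to the PDZ domain clones, which are not linear ligands.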
High Throughput Validation Screening of PDZ Ligand Library Using Y2H Mating Array-We used the Y2H mating array approach for validation screening of the PDZ ligand library in which each bait PDZ was mated with every candidate ligand in a 96-well format. Positive interactions were identified first by the growth of colonies on selective medium followed by the appearance of blue stains in LacZ assays. From the mating array both positive and negative binding sequences could be directly read without resequencing.
To assess the validation screening method, we first chose to study HtrA2 PDZ, which binds Class II motifs as determined by oriented peptide library (38). As a result, 38 positive binding sequences were rapidly identified (Supplemental Table 5). We also performed validation screenings with ZO-1 PDZ1, ZO-1 PDZ3, and Erbin PDZ, which had been analyzed by RPY2H. Sixteen, zero, and three new positive sequences, in addition to those found in RPY2H, were discovered for these domains, respectively (Supplemental Tables 2-4). Remarkably some negative sequences were quite similar to the positive ones. For example, -ETWV* was selected as positive by ZO-1 PDZ1 in RPY2H, whereas -DTWV* was identified as negative by mating array. Such a slight difference may provide meaningful information for differentiating between binding and nonbinding sequences, suggesting that negative sequences are as important as positive sequences for precisely characterizing the PDZ binding properties.
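The comparison of closely related positive and negative sequences can be made explicit by listing the positions at which they differ, numbered from the carboxyl terminus (P0, P-1, and so on). The following is a minimal sketch of that comparison; the function name and the default window of four residues are assumptions.

```python
def differing_positions(positive: str, negative: str,
                        window: int = 4) -> list[tuple[str, str, str]]:
    """List (position, residue in positive, residue in negative) differences.

    Only the last `window` residues are compared; P0 is the carboxyl terminus.
    """
    pos_tail, neg_tail = positive[-window:], negative[-window:]
    if len(pos_tail) != len(neg_tail):
        raise ValueError("sequences must share the compared tail length")
    n = len(pos_tail)
    diffs = []
    for i, (p, q) in enumerate(zip(pos_tail, neg_tail)):
        if p != q:
            offset = n - 1 - i
            label = "P0" if offset == 0 else f"P-{offset}"
            diffs.append((label, p, q))
    return diffs
```

Applied to the example in the text, -ETWV* and -DTWV* differ only at P-3 (Glu in the positive sequence, Asp in the negative one).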
New results were obtained for ZO-1 PDZ1, ZO-1 PDZ3, and Erbin PDZ from validation screenings, including negative and new positive sequences. They were useful complements to the RPY2H results. As the PDZ ligand library expands, each studied PDZ can be taken into a new cycle of validation screening for more comprehensive characterization of the binding properties.

Analysis of the Consensus-binding Sequences and Prediction of Candidate Ligand Proteins-We aligned the positive sequences for each PDZ. The results indicated strong consensus located at the carboxyl-terminal three to five amino acid residues (Supplemental Tables 2-5, shaded). Furthermore we comparatively analyzed both positive and negative sequences to deduce more precise consensus-binding sequences (Table I).
Next we searched the Swiss-Prot and/or NR human database with the consensus-binding sequences to predict all the potential ligands in the human proteome. For each PDZ studied, many native proteins were predicted to be its ligands, including the previously identified binding partners such as p0071, δ-catenin, and armadillo repeat protein deleted in velo-cardio-facial syndrome for Erbin PDZ (37,39) as well as novel ones (Supplemental Tables 7-10). Then we used biological information such as subcellular localization and known functions to select the most promising candidate ligands (Table I).

FIG. 1. ... binding sequences. g, characterization of binding properties by comparative analysis of both positive and negative sequences. h, prediction of candidate ligands by protein database searches with consensus-binding sequences followed by manually filtering with biological information such as subcellular localization and known functions. i, confirmation of candidate interactions. The plasmids expressing carboxyl termini of predicted candidate PDZ ligands are constructed individually by synthesis of oligonucleotides and cotransformed with corresponding PDZ bait in one-to-one Y2H assay to confirm the interactions. j, the synthesized clones are added to the PDZ ligand library to expand its diversity. k, discovery of novel PDZ ligands. l, supplement of PDZ ligand library by addition of known or potential PDZ ligands. n PDZ, new PDZ domain; Syn, synthesized clones of PDZ ligands; cDNA, cDNA clones of PDZ ligands. B, class distribution of PDZ ligand library. Inner rings, ligands in the initial PDZ ligand library generated by collecting all the positive PDZ binding clones isolated from RPY2Hs by ZO-1 PDZ1, ZO-1 PDZ3, Erbin PDZ, and GAIP C terminus-interacting protein PDZ. The library contained 128 non-redundant linear PDZ ligands in total. There were 72 (56.25%), 35 (27.34%), 6 (4.70%), 1 (0.80%), and 14 (10.90%) for Class I, Class II, Class III, Class IV, and other ligands, respectively. Outer rings, ligands in the expanded PDZ ligand library, which was expanded by addition of clones constructed for confirmation, cDNA clones of known PDZ ligands, and other constructed clones of potential PDZ ligands. Up to the submission of this manuscript, the library contained 242 non-redundant linear PDZ ligands in total. There were 124 (51.24%), 71 (29.34%), 17 (7.02%), 2 (0.80%), and 28 (11.57%) for Class I, Class II, Class III, Class IV, and other ligands, respectively. Four classes of ligands are designated: Class I, -(S/T)XΦ*; Class II, -ΦXΦ*; Class III, -(D/E/K/R)XΦ*; and Class IV, -XΨ(D/E)* where X is any amino acid, Φ is a hydrophobic residue, and Ψ is an aromatic residue. Other unclassified ligands prefer the basic amino acid Lys and polar amino acids Asn, Ser, and Cys at the carboxyl termini. C, length distribution of PDZ ligand library. White bars, ligands in the initial PDZ ligand library. Black bars, ligands in the expanded ligand library. The length of linear library ligands mainly ranged from six to 20 amino acid residues, longer than four residues for typical PDZ domain recognition. AA, amino acids.

Characterization of Domains by Novel Screening Method
Confirmation of Candidate PDZ Interactions-We used one-to-one Y2H assay to confirm the potential interactions. The carboxyl termini of the most promising candidate ligand proteins were cloned and cotransformed into yeast cells with the corresponding PDZ bait. As shown in Table I, several novel ligand proteins were confirmed for each PDZ of interest. Then we added all the clones constructed here to the PDZ ligand library to expand its diversity.
A New Cycle: Investigation of LNX1 PDZ2 by Validation Screening of the Expanded PDZ Ligand Library-The initial PDZ ligand library was expanded by addition of the synthesized clones of native proteins constructed for confirmations. By validation screening of the expanded library, we rapidly studied another PDZ domain, LNX1 PDZ2. Its binding properties have not been clarified, and it has been reported to bind only one Class I ligand protein (40). Here we identified 83 positive clones (Supplemental Table 6), including 14 native protein clones (Table I). The direct isolation of native protein clones greatly improved the efficiency of discovering novel ligand proteins. We consequently deduced the consensus-binding sequences (Table I), predicted the potential native ligands (Supplemental Table 11), and confirmed the most promising interactions following the processes of the strategy. Finally we confirmed 10 more novel ligand proteins selected from the protein database searches (Table I).
Characterization of the Binding Properties of PDZ Domain-For each PDZ of interest, a series of positive and negative sequences were successfully identified from RPY2Hs and/or validation screenings of the PDZ ligand library (Supplemental Tables 2-6). The occurrence of each amino acid type at each position of the positive sequences was calculated (Fig. 2, A, B, and C). A semiquantitative binding assay (ONPG) was used to assess the relative binding affinity of positive interactions (Supplemental Fig. 1, A and B).
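The occurrence of each amino acid at each position can be tabulated by aligning the positive sequences at their carboxyl termini and counting residues column by column. The following is a minimal sketch of such a tabulation; the function name, the default depth of six positions, and the toy sequences in the usage note are assumptions and are not data from the study.

```python
from collections import Counter


def position_frequencies(sequences: list[str], depth: int = 6) -> dict[str, dict[str, float]]:
    """Percentage of each residue at P0, P-1, ... across C-terminally aligned peptides."""
    table: dict[str, dict[str, float]] = {}
    for offset in range(depth):
        # Peptides shorter than the current offset simply do not contribute.
        residues = [seq[-1 - offset] for seq in sequences if len(seq) > offset]
        label = "P0" if offset == 0 else f"P-{offset}"
        counts = Counter(residues)
        table[label] = {
            aa: 100.0 * n / len(residues) for aa, n in counts.items()
        } if residues else {}
    return table
```

For a toy input such as `["ETWV", "DTWV", "ESWL", "ETFV"]`, the P0 column would contain Val in three of four sequences and Leu in one.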
The binding properties of ZO-1 PDZ1 and Erbin PDZ are similar. Both PDZ domains showed a predominant preference for hydrophobic amino acids at the extreme carboxyl terminus (P0), especially Val (94.7%) for Erbin and Val (38.1%), Leu (38.1%), or Ile (14.3%) for ZO-1 PDZ1; Cys and Thr were also selected by PDZ1, but more rarely. At P−1, aromatic amino acids were overwhelmingly preferred, especially Trp (68.4%) for Erbin PDZ; however, Asp and Glu were also tolerated to a lesser extent by Erbin PDZ, and Arg was tolerated to a lesser extent by PDZ1. At P−2, Ser (23.8% for PDZ1; 23.7% for Erbin) and Thr (69.4% for PDZ1; 55.3% for Erbin) were the dominant selections, while acidic and hydrophobic residues such as Glu, Val, Ile, and Ala were also observed to a lesser extent. At P−3, Glu (60.5%) and Asp (18.4%) were mostly selected for Erbin, whereas for PDZ1 a more variable consensus was found, but Asp could not be tolerated (derived from negative sequences). At P−4, for Erbin PDZ, hydrophobic amino acids contributed to higher affinity than did other amino acids when the last four residues were the same (Supplemental Fig. 1B); for PDZ1, no specificity was observed beyond the four carboxyl-terminal residues.
ZO-1 PDZ3 showed unique properties quite different from those of ZO-1 PDZ1 and Erbin PDZ. At P0, the hydrophobic amino acids Val (29.7%), Leu (27%), Ile (13.5%), and Phe (13.5%) were preferred, along with a rare occurrence of the aromatic amino acids Tyr and Trp and the hydrophilic amino acids Lys, Asp, and Gln. At P−1, aromatic amino acids were predominantly preferred, in particular Phe (48.6%) and Trp (43.2%). At P−2, no dominant preference was observed among the positive sequences, although hydrophobic residues were slightly favored over hydrophilic ones; however, the negative sequences revealed a distinct intolerance for Ser and Thr. Notably, PDZ3 showed a specific preference for hydrophobic amino acid residues at P−5, especially Leu (56.8%), which is a previously unreported PDZ binding property.
HtrA2 PDZ recognized Class I and Class III sequences in addition to the reported Class II sequences (38). Three consensus-binding sequences, -X(S/T)ΨΦ*, -XΦΨΦ*, and -X(D/E)ΨΦ* (where X denotes any amino acid except under the conditions described below), were identified. In contrast to traditional PDZ recognition specificities, HtrA2 PDZ exhibited restricted variability at P−3; namely, the P−3 residue was selected according to the composition of the last three residues. Glu−3 was only present in the sequences of -E(T/V)(W/F)V*. (S/T)−3 were overwhelmingly preferred when hydrophobic or acidic residues were at P−2. Hydrophobic residues were selected with the consensus of -(S/T)HΦ*, and basic or large hydrophobic residues were necessary with -(S/T)YΦ*.
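One way to test a carboxyl-terminal sequence against the three consensus-binding sequences above is with regular expressions. The following is a minimal sketch under stated assumptions: Ψ is taken to mean aromatic residues (F, W, Y) and Φ hydrophobic residues (A, F, I, L, M, V, W, Y), residue sets that this section does not spell out; the P−3 dependencies described above are not modelled; and all names and example peptides are hypothetical.

```python
import re

# Assumed residue sets (not defined in this section of the text).
AROMATIC = "FWY"
HYDROPHOBIC = "AFILMVWY"

# One pattern per consensus, keyed by the residue type required at P-2.
# "." stands for X at P-3; "$" anchors the match at the carboxyl terminus.
PATTERNS = {
    "S/T at P-2": re.compile(rf".[ST][{AROMATIC}][{HYDROPHOBIC}]$"),
    "hydrophobic at P-2": re.compile(
        rf".[{HYDROPHOBIC}][{AROMATIC}][{HYDROPHOBIC}]$"
    ),
    "D/E at P-2": re.compile(rf".[DE][{AROMATIC}][{HYDROPHOBIC}]$"),
}


def matching_consensus(peptide):
    """Return the labels of all consensus patterns the C terminus matches."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(peptide)]


# Hypothetical peptides, for illustration only.
for pep in ["GETWV", "GEVFV", "GSDWV", "GETKV"]:
    print(pep, matching_consensus(pep))
```

Because the three residue sets allowed at P−2 are disjoint, a sequence matches at most one pattern in this sketch; a sequence such as the hypothetical "GETKV", which lacks an aromatic residue at P−1, matches none.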
Unique binding properties were observed at P0 of LNX1 PDZ2, for which two different kinds of residues, Val and Cys, were both predominantly preferred. When Cys was present at P0, the consensus sequence could be defined as -(S/T)XC*. When Val was at P0, the aromatic amino acids Trp and Phe were the overwhelming preference at P−1, and no specificity was observed at P−2; the consensus sequence could be deduced as -X(W/F)V*. This suggested the traditional classification of PDZ [...] (Table I and Fig. 3).
As cellular functions are carried out by stably or transiently associated groups of proteins (16), we reasoned that protein pairs that are closer within the protein interaction network (41,42) are of higher biological relevance than others. Therefore, to evaluate the confidence of the novel interactions, we calculated the shortest path length between the two proteins of each interaction (30) that could be found in the HPRD. Of the 40 novel interaction pairs, 22 can be linked (Supplemental Table 13 and Supplemental Fig. 2). For 21 pairs, the shortest path is three to six links; the longest path, eight links, occurs for only one protein pair. The mean shortest path length is 4.82, which means that the two proteins of the novel interactions are highly clustered (16,43), increasing the possibility that they interact directly with each other.
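The shortest path length between two proteins in an interaction network can be computed with a breadth-first search. The sketch below is illustrative only: it assumes an undirected, unweighted network supplied as a list of interacting pairs, the protein names are placeholders rather than HPRD entries, and it is not necessarily the implementation used for the analysis above.

```python
from collections import deque


def build_network(pairs):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = {}
    for a, b in pairs:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def shortest_path_length(network, source, target):
    """Return the number of links on the shortest path, or None if unlinked."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in network.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # the two proteins are in different components


# Placeholder network: P1-P2-P3-P4 form a chain; P5-P6 are a separate pair.
toy = build_network([("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")])
print(shortest_path_length(toy, "P1", "P4"))
print(shortest_path_length(toy, "P1", "P5"))
```

Pairs for which the function returns `None` correspond to interactions that cannot be linked within the reference network; the mean would be taken over the linked pairs only.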
Some of the novel interactions identified in this study may reveal significant clues for further functional investigation. As one example, we found a novel interaction between the PDZ of Erbin, which mediates the localization of its target proteins such as ErbB2 (44), and PDZ domain binding kinase, which is recruited to MAPK14 through interaction with human discs large PDZ and activates MAPK14 signaling (45,46). We speculate that Erbin is probably critical for the subcellular localization and function of PDZ domain binding kinase and plays roles in the MAPK signal transduction pathway. Another example was the discovery of several novel interactions for the PDZ2 of LNX1, the first described PDZ domain-containing member of the E3 ubiquitin ligase family (47), which may provide important clues for further clarifying the roles of LNX1 in the regulation of protein ubiquitin modification.
DISCUSSION
We developed a systematic strategy for characterization of the binding properties of PDZ domains based on high throughput validation screening of a specialized PDZ ligand library. The key to this integrated system is the PDZ ligand library. Besides collecting the binding clones isolated from RPY2Hs and the predicted ligand clones constructed for confirmations, the library can be further expanded by adding clones of other known or potential PDZ ligands. At the time of submission of this manuscript, our PDZ ligand library had been expanded to a total of 257 non-redundant PDZ ligands (Supplemental Table 12). Both the class and length distributions of linear ligands in the library were analyzed (Fig. 1, B and C). The current library consists of four kinds of ligands: PDZ binding peptides, carboxyl termini of native proteins, full-length native PDZ ligand proteins, and PDZ domains. The first three kinds are appropriate for studying the carboxyl-terminal binding properties of PDZ, and the last one enables study of the dimerization of PDZ domains.
The native proteins in the library enable direct discovery of novel PDZ ligand proteins from a single round of validation screening. However, internal PDZ binding sequences are not yet covered.
The high efficiency of the strategy is accounted for by the following features. 1) It is time-saving. At least 1 month is required for one cycle of traditional Y2H screening, whereas only 1 week is sufficient for one validation screening. 2) It is labor-saving. Current procedures of traditional screening are tedious, including large scale library transformation, isolation of candidate plasmids, cotransformation, and sequencing of positive clones, whereas only one step of mating assay is required for validation screening. Therefore, high throughput can be achieved. 3) There is a high success rate of prediction. By considering the negative binding sequences, more precise consensus-binding sequences can be deduced, thereby enhancing the success of prediction. Taking HtrA2 PDZ as an example, the comparative analysis of all the positive and negative sequences uncovered the unique preference at P−3. As a result, seven of 10 predicted candidate ligand proteins were confirmed.
The confidence of our results can be justified in several aspects. First, clear consensus-binding sequences are present at the carboxyl termini of the positive sequences selected by each PDZ of interest. Second, 10 of 20 reported PDZ ligand proteins were recapitulated in our system (50%); this is much higher than currently expected overlap rates between data sets (15,16). Third, each protein pair of 32 interactions identified here can be mapped into a small network of protein interactions extracted from the HPRD reference data, suggesting that the two binding partners are biologically relevant (41,42). Links for the remaining 18 interactions were not found in the HPRD network, probably because the current HPRD interaction data are far from complete.
The binding properties of five PDZ domains were characterized in this study. All of the recognition specificities previously determined by large scale approaches were reconfirmed, and some novel binding properties were also discovered. The specificity of Erbin PDZ had previously been studied by phage-displayed random peptide library, and a carboxyl-terminal -(D/E)(S/T)WV* binding motif was identified (37). In this study, we identified more diverse consensus-binding sequences and observed a novel phenomenon: Asp and Glu could be selected at P−1 in addition to Trp. The preferred binding sequences of HtrA2 PDZ have been determined previously by the oriented peptide library approach, and the Class II motif was identified as the most preferred (38). We found that HtrA2 PDZ also permitted Class I and Class III ligands. The broader binding properties characterized here can probably be attributed to the effective avoidance of both affinity competition and abundance suppression in our system. ZO-1 PDZ1 and PDZ3 had not been studied by large scale screening before. Several proteins, including Claudins (35), Connexin36 (36), NEPH1 (33), and TRPC4 (34), have been reported to bind ZO-1 PDZ1 through their carboxyl termini. Here all the reported native ligand proteins were retrieved except the Claudins with the sequences of -KNYV* and -SQYV*. For ZO-1 PDZ3, only one interactor, junction adhesion molecule (JAM), has been reported (48). However, the interaction was not confirmed in our system, even in a one-to-one Y2H assay between PDZ3 and the carboxyl terminus of JAM. The reasons are not clear. Notably, the reported ligand proteins we failed to identify carry carboxyl termini that are not consistent with the consensus-binding sequences. For LNX1 PDZ2, the current study is the first to comprehensively describe the binding properties of this domain. The results pave the way for functional characterization of the LNX1 protein.
To date, PDZ domains have been conventionally grouped into four classes based on binding motifs (21). However, this simple classification may not be sufficient to differentiate all PDZ domains. For example, ZO-1 PDZ3 exhibited a unique specific selection at P−5, where hydrophobic amino acids, in particular Leu, were predominantly preferred; yet P−5 was not previously considered to contribute to PDZ recognition specificity. A predominant preference for the aromatic amino acids Trp, Phe, and Tyr at P−1 was observed for all five PDZ domains studied, whereas previously P−1 was not considered conserved and was regarded as being of only minor importance for PDZ binding specificity (49). At P0, besides typical hydrophobic amino acids, Cys was significantly selected by LNX1 PDZ2, and other polar or hydrophilic residues such as Lys, Asn, and Thr were also selected by different PDZ domains. In addition, all the PDZ domains of interest bind more than one conventional class of ligands as well as unclassified PDZ ligands. As an increasing number of novel types of PDZ-ligand interactions are reported (50,51), the traditional classification may need major revision.
We demonstrated the feasibility of the newly developed strategy by using it to investigate PDZ domain interactions. It can be easily extended to a variety of domain families that recognize short peptides, as long as an individual member of a given family can engage different ligands and one ligand can be recognized by different domain members. Several peptide-binding domain families are amenable to this strategy, including the SH3 domain that recognizes the PXXP motif (52,53), the WW domain that binds the PPXY motif (54), the GYF domain that favors the PPG(F/I/L/M/V) motif (3), the EVH1 domain that prefers the FPPPPX(D/E) motif (55), and the VPS-27, Hrs, and STAM domain that selects the (D/E)XXLL motif (56). For the domain families that recognize modified peptides, such as the SH2 and phosphotyrosine-binding domains that bind phosphorylated tyrosine (57), the strategy is not applicable because the Y2H system lacks the corresponding post-translational modifications. This limitation may be overcome by applying the yeast three-hybrid system, in which an exogenous kinase can be used to phosphorylate the Tyr residues on the ligands.