Toward an understanding of the protein interaction network of the human liver

Proteome-scale protein interaction maps are available for many organisms, ranging from bacteria, yeast, worms and flies to humans. These maps provide substantial new insights into systems biology, disease research and drug discovery. However, only a small fraction of the total number of human protein–protein interactions has been identified. In this study, we map the interactions of an unbiased selection of 5026 human liver expression proteins by yeast two-hybrid technology and establish a human liver protein interaction network (HLPN) composed of 3484 interactions among 2582 proteins. The data set has a validation rate of over 72% as determined by three independent biochemical or cellular assays. The network includes metabolic enzymes and liver-specific, liver-phenotype and liver-disease proteins that are individually critical for the maintenance of liver functions. The liver enriched proteins had significantly different topological properties and increased our understanding of the functional relationships among proteins in a liver-specific manner. Our data represent the first comprehensive description of a HLPN, which could be a valuable tool for understanding the functioning of the protein interaction network of the human liver.

network hubs. These experiments, if representative, demonstrate the usefulness of the Y2H data set as a resource. More broadly, the authors use a bioinformatic scoring system (PRINCESS, MCP 2008) to define a set of 1005 PPIs as higher confidence.
The network is then analyzed and annotated. The major part of the network analysis focuses on four categories of liver proteins: ME, LS, LP and LD. The interacting groups of ME, LS, LP and LD proteins show distinct topological properties. LP and LD proteins and their interaction partners tend to be central in the network. LS proteins, although not central in the big (somatic?) network, may be important for liver development. Interesting connections of metabolic enzymes to transcriptions are identified. There is a detailed discussion of subnetworks implicating candidate proteins in e.g. redox processes, pathways etc. From the interaction pattern of Git2, a possible involvement in NF-κB signaling is inferred. Git2 seems to increase deubiquitylation of NEMO, leading to a decrease of NF-κB transcriptional activity.
Systematic human interactome mapping is a vital goal in the scientific community and this manuscript is a solid, significant contribution, large in terms of data size. The analysis presented is probably not very innovative as such, but a good example of how to make sense of the data. Connections between metabolism and signaling/transcription seem interesting and may be particularly relevant for a liver interactome map. In more general terms: Numerous excellent network analyses have been published and represent major advances in network biology. These studies are always based on the same, few, good data sets and at the same time always suffer from the limited amount of systematically generated PPI data. This study provides new data input to the field.
Specific points:
1) It would be nice to see some representative plates e.g. from the deconvolution step, to demonstrate how the phenotypic evaluation of the PPIs is actually performed.
2) Very surprisingly it seems that the PPIs are at least not underrepresented in membrane proteins (1b). Since this is the known major bias in Y2H screens, this needs further investigation, going beyond the statement "the GO categories distribution of HLPN proteins was consistent with the liver proteome".
3) Figure 1d does not show that liver proteins are enriched in the HLPN as the enrichment ratio is the same among interacting proteins and screened proteins.
4) With respect to interesting connections of metabolic enzymes to TFs: Figure S4 is lacking an appropriate description of the experiment and e.g. the axis labeling.
5) Figure 3, Demonstration of Git2 effects on Nemo deubiquitylation. b and c: it seems that ubiquitylation stabilizes Nemo and thus it is not very clear whether it is ubiquitylation that is increased or the protein levels. Of course it can be both, but this needs clarification/interpretation. b lane four: does it really contain Git2? Because it is not seen on the Myc blot and the effect is probably due to increased A20. I am not convinced about this 3b figure.
6) The authors in general use available tools, such as PRINCESS, network analyser (not quoted in the main text!), etc. This is ok as the methods are well defined; however my concern is that generic methods may not always best suit such an original data set with its particularities. It would also help the reader to at least get to know the underlying principles of the methods used, without having to look up the method somewhere else.
7) The English needs to be improved.
Reviewer #2 (Remarks to the Author): Wang et al. report the use of yeast two-hybrid to characterize a protein interaction network relevant to human liver. For this purpose, they selected a series of more than 5,000 human proteins relevant to liver biology. The authors used two different yeast two-hybrid screening strategies: a ~5,000 x ~5,000 matrix and screening against a human liver cDNA library. Overall, they identified 3,484 interactions, of which 1,105 have high confidence scores. The authors also provide an impressive and extensive validation using orthogonal assays such as GST pull-downs (47 pairs), co-IP (94 pairs) and co-localization (117 pairs). Finally, for six cases they also used an in vivo gene reporter assay to prove the physiological relevance of the charted interactions. The dataset is certainly of great interest to the community and deserves publication. Several issues, mainly relating to the analysis, need to be addressed before the manuscript can be accepted:
General points:
1) The initial selection of 5,026, from the 18,000 expressed in liver, is very important to the entire story and not sufficiently described. The selection is apparently based on a proteomics work performed by the same consortium, using mass spectrometry. A figure or table dedicated to this point would help address this important point (see also point 5: comments below concerning circular arguments). Also the criteria for the selection of the 1,428 baits used for the cDNA library screens are not clear. These are important points, as later on the authors claim some special behavior for a set of proteins (see point 3); it is important to prove that this was not introduced from the initial bait/prey selection.
2) The overall estimation of the dataset quality is currently insufficient and not complete without a measure of false positives and false negatives. This should be relatively easy. The authors could use the now well established methods from Vidal and many others for this purpose.
3) Statistics and significance: The authors use the term significant all over the manuscript, but rarely provide p-values and information on the statistical tests used. This should be addressed for the entire manuscript; only a few examples are listed below:
* page 3 "..., the selected proteins are significantly enriched with metabolic enzymes (ME), liver-specific proteins (LS), liver phenotype proteins (LP) ..." The authors need to provide p-values for all cases.
* page 4 "First, the HLPN is remarkably enriched for proteins that have been shown to be specific expressed in liver or required individually for controlling liver functions." Again no statistics.
* page 5 "But the degree and the betweenness centrality of their partners were significantly greater than the expected values."
4) The authors claim at multiple places (abstract, page 4 paragraph 2 and page 8 paragraph 2 on pathways involved in liver function) that the network is enriched for proteins expressed in the liver and required for liver function. This sounds like a circular argument as the analysis is biased for proteins relevant to liver. This should certainly be clarified.
5) The authors found that the betweenness and centrality values for LP and LD proteins are higher than for other HLPN proteins, which is a very interesting and intriguing observation that deserves a deeper analysis. The authors should here again perform some statistics (give p-values) and also prove that these differences cannot be attributed to technical issues inherent to the screening and bait/prey selection (see point 1). Indeed some proteins have been used often, as prey and as bait and also for both the matrix analysis and human liver library screen, whereas some were only used once. Could it be that the higher betweenness and centrality values observed for some proteins simply point to more studied proteins?
6) Page 7: the authors can use the network to predict LP and LD proteins, which is again a very interesting observation. Currently this is a bit redundant with the previous story concerning betweenness and centrality, and maybe the two stories could be merged. Also, could they predict a new LD or LP, and maybe validate it?
Other points:
* The legends of the Supplementary tables should be given directly in the Excel file (with abbreviations, etc.) so that one does not need to open several files to get the relevant information.
* Table S4 "Potential cross-talk between metabolic pathways": some entries are a bit confusing. For example, in line 56 CAT and CAT is the same enzyme in the same pathway. Does that represent a cross-talk? The same is true for ECHS1, UGP2, GLUL, ASL, etc.; this needs clarification.
* The manuscript contains many abbreviations whose meanings are not always clear. For example, HCC first appears on page 8 without explanation, and the meaning is hidden on page 18 in the figure legend. If possible, abbreviations should be used sparingly, as they make the manuscript difficult to read.
* Table S6: Under "Type of cross-talk interaction" the classification between different and same is counter-intuitive. By definition a cross-talk is across pathways; maybe a better nomenclature here would help.
* Similarly, in Table S7 the "Crosstalk proteins between different KEGG signaling pathways" contains mainly proteins annotated NA (not assigned to a pathway, if I got it properly). Strictly speaking, they are not really formal cases of cross-talk between pathways. This again should be clarified.
Reviewer #3 (Remarks to the Author): J. Wang et al. have produced a useful protein-protein interaction map for human liver cells using their set of ~5,000 cloned, liver-enriched protein-coding genes. The use of a matrix yeast two-hybrid (Y2H), a 4,788 x 4,740 array, and a cDNA library Y2H screen, 1,428 baits x liver cDNA prey library, provides a valuable dataset of binary interactions that encompasses a large fraction of liver-enriched functions. The data quality has been established using a combination of orthogonal assays on a subset of interaction pairs, and their previously published bioinformatics strategy provides a measure of confidence in the overall dataset.
While the overall screening results and the investigation of some biological features of specific interactions render this an attractive study, there are a number of specific concerns with the current version of the manuscript that need to be addressed.
Several minor concerns with the text are: -there is no description of how the 1,428 baits used for the cDNA library screen were chosen.
-while the overlap between the interaction pairs found in the array screen and those in the library screen is very small (albeit statistically significant), the authors should indicate which genes were found as prey in both screens.
-the text states that "After detecting more than 2.27x10^7 protein pairs,...." This cannot be correct. Their matrix screen of 4,788 x 4,740 would test 2.26x10^7 combinations (covering 1.13x10^7 unique pairs) but would not identify that many interaction pairs based on their criteria for selecting Y2H positives.
-there is some inconsistency in the use of gene symbols. For example, IKBKG is the official symbol and is used in Fig 1F but NEMO, an alias, is used in Fig 2. Similarly for TNFAIP3 and its alias A20. The authors should use official gene symbols whenever possible and not rely on either an alias or mixed naming.
-in Fig 2D, all the nodes appear red. Do all the nodes correspond to proteins with a liver phenotype?
-in Fig 2E, what are the grey nodes?
-the various IP/westerns need better labeling. The text and legends suggest that the pairs were tested in reciprocal fashion (Myc-X + Flag-Y and Myc-Y + Flag-X) but only one configuration is shown. In Fig S6, both A20 (TNFAIP3) and GIT2 are shown as Myc-fusions. If that is the case, then one cannot conclude anything about the GIT2-NEMO (IKBKG) association, especially since the individual immunoblot panels are not labeled with respect to what fusion protein(s) is detected.
-in the Y2H experiments, were diploids directly spotted or was replica-plating done? The authors indicate that interactions from two independent screens were considered "true positives". While these would be Y2H positives, were such "true positives" also retested using fresh glycerol stocks and, more importantly, was any sequencing done to confirm clone identity? For the cDNA library screen, what sequencing criteria were used to confirm ID and reading frame, and were any baits sequenced to confirm identity of both partners?
-were any pairs tested by co-IP also tested by GST-affinity capture and vice versa?
-there also did not appear to be any mention of the positive and negative controls used for the orthogonal assays; an empty vector control is not sufficient as a negative control.
The major concern with the current manuscript is that this network is one highly enriched in only the most highly expressed liver gene products because the baits used and the prey found correspond to highly expressed liver genes. While all the interaction pairs have a high degree of confidence, any conclusions regarding the topological or statistical nature of the resulting networks are overstated. Basically, there is a certain amount of circularity since the starting point is a biased set of liver-enriched genes. As others have shown, inferences on network topology need to take into account sampling issues and biases in the search space of genes tested.
Calling the network a human liver-specific protein interaction network (HLPN) and emphasizing specific interactions as liver-specific is also somewhat misleading since many of the highlighted nodes in the network are ubiquitously expressed proteins or play key roles in other tissues/cells.
Overall, the authors have produced a useful protein-interaction map for genes expressed in liver, albeit not a liver-specific map. The data is high quality as a result of multiple orthogonal validations (bioinformatics and experimental), but the analysis of network properties is premature given the biased nature of the baits and preys chosen for experimental investigation.
Comments from the editor: Thank you again for submitting your work to Molecular Systems Biology. We have now heard back from the three referees who agreed to evaluate your manuscript. As you will see from the reports below, the referees find the topic of your study of potential interest. They, however, raise substantial concerns, which, I am afraid to say, preclude its publication in its present form.
In general, the reviewers recognized that this work provides a large and potentially useful dataset relevant to interactions among proteins expressed in the human liver. Nonetheless, they raise a series of issues which they felt would need to be addressed convincingly before this work would be appropriate for publication. I highlight here two issues of particular importance:
Based on the above speculation, we think the false positive rate of HLPN is far less than 58.9%. A more convincing strategy for measuring the confidence of a dataset is to validate it by independent experiments. So in this paper, we validated randomly selected interactions by performing independent biochemical or cellular assays; at least 72% were confirmed (Supplementary Table S2). Thus, the false positive rate might be less than 28%, which is similar to previous reports of human interactome datasets (Cell. 2005, 122(6):957-68; Nature. 2005, 437(7062):1173).
The experimental controls of the co-IP and GST pull-down assays are explained in the following point-by-point responses to the comments of the reviewers. As a standard procedure, we used cells transfected with only one of the proteins as negative controls in the co-IP assays. As a positive control, we used the same antibody both to immunoprecipitate and to detect the protein, to show that the IP worked. For the GST pull-down assays, similar to most published articles, we used the GST vector as a negative control. The pull-down of GST and GST fusion proteins served as a positive control to show that the experimental system worked.
Comments: 2. The reviewers felt that the initial selection of the bait proteins had not been sufficiently described, and they worried that the bias introduced by this targeted set of liver-expressed proteins needed to be rigorously accounted for when assessing the functional enrichments and topological properties of the liver interaction network. More broadly, they felt that the bioinformatic analysis of the network and its properties required much more rigorous statistical analysis.
Answer: Based on our understanding of the characteristics of the human liver proteome, we analyzed the functional and regulatory proteins that play important roles in liver development, regeneration, metabolism, biosynthesis and diseases (J Proteome Res. 2010, 9(1):79-94). From human liver protein and mRNA expression datasets, which were collected through shot-gun proteomics and microarray expression analyses (Supplemental Data), a total of 5,026 proteins were selected for interaction screening.
Because the objective of our work is to generate a protein-protein interaction network of the human liver, the proteins selected for Y2H screening are significantly enriched with liver proteins (ME, LS, LP and LD). In the subsequent bioinformatics analysis of the HLPN, these proteins were highly enriched. But the special behavior of these proteins is not introduced by the initial selection of baits and preys. First, although ME, LS, LP and LD are all highly enriched, the topological properties of the subnetworks of these proteins differ. The degree centrality and the betweenness centrality values of LP and LD are significantly higher than those of other HLPN proteins, whereas ME and LS do not have vital network positions in HLPN. The different properties of the LP, LD, ME and LS subnetworks reflect that proteins of a certain category have special behavior, which may not be due to the initial selection of baits or preys. Second, similar to previous reports, certain kinds of proteins, such as phenotype and disease proteins, tend to form clusters. The phenotype proteins of yeast have a strong tendency to interact with each other (Proc Natl Acad Sci U S A. 2004, 101(52):18006-11). Disease proteins have the same properties as phenotype proteins in the network (Proc Natl Acad Sci U S A. 2007, 104(21):8685-90). Third, to exclude the possibility that array screening with the Y2H method introduces bias from the bait/prey selection, we analyzed the subnetwork obtained with the library screening method. We found that LP, LD, ME and LS proteins are also enriched in the preys, which further confirmed the former conclusions: the special properties of the LP, LD, ME and LS subnetworks are not due to the bait or prey selection. Fourth, HLPN contains only part of the entire human liver network, which might lead to some sampling bias. To address this concern, we compiled a virtual liver protein-protein interaction network with data from the HPRD database (http://www.hprd.org).
In the compiled liver protein interaction network, all topological conclusions regarding ME, LS, LP and LD remain true (Supplementary Table S4). Moreover, we randomly added or removed 5-20% of the edges and found that all the conclusions still hold, which suggests that the topological features of ME, LS, LP and LD in HLPN are not artifacts of the biased datasets. All of these results indicate that the special features of ME, LS, LP and LD proteins are not introduced by the initial selection of baits and preys or by sampling bias.
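The edge-perturbation check described above can be sketched as follows. This is only a minimal illustration on a toy network, not the analysis performed on the HLPN: the node names, the use of mean degree as the centrality measure, the number of trials and the helper functions are all assumptions made for the example.

```python
import random
from collections import Counter


def mean_degree(edges, group):
    """Average number of edges per node in `group`."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree[n] for n in group) / len(group)


def perturb_edges(edges, nodes, fraction, mode, rng):
    """Randomly remove or add `fraction` of the edges (mode: 'remove' or 'add')."""
    edge_set = {tuple(sorted(e)) for e in edges}
    k = int(len(edge_set) * fraction)
    if mode == "remove":
        return edge_set - set(rng.sample(sorted(edge_set), k))
    if mode == "add":
        pool = sorted(nodes)
        target = len(edge_set) + k
        result = set(edge_set)
        while len(result) < target:
            u, v = rng.sample(pool, 2)  # two distinct nodes, no self-loops
            result.add(tuple(sorted((u, v))))
        return result
    raise ValueError(f"unknown mode: {mode}")


def group_stays_central(edges, nodes, group,
                        fractions=(0.05, 0.10, 0.20), trials=100, seed=0):
    """True if `group` keeps a higher mean degree than the rest in every trial."""
    rng = random.Random(seed)
    others = [n for n in nodes if n not in group]
    for fraction in fractions:
        for mode in ("remove", "add"):
            for _ in range(trials):
                perturbed = perturb_edges(edges, nodes, fraction, mode, rng)
                if mean_degree(perturbed, group) <= mean_degree(perturbed, others):
                    return False
    return True


# Hypothetical toy network: two hub-like "LP" nodes plus a sparse chain.
group = ["LP1", "LP2"]
nodes = group + [f"X{i}" for i in range(20)]
edges = (
    [("LP1", f"X{i}") for i in range(10)]
    + [("LP2", f"X{i}") for i in range(5, 15)]
    + [(f"X{i}", f"X{i + 1}") for i in range(19)]
)
print(group_stays_central(edges, nodes, group))
```

For a real network, betweenness centrality and an explicit statistical test would replace the simple mean-degree comparison used here.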

More bioinformatics and statistical analysis has been included in the revised manuscript.
Comments: In addition, the reviewers noted several cases where this manuscript required more explicit detail, particularly in the Figure descriptions. On this note, the Editor found it somewhat difficult to determine exactly what the error bars represent in Fig. 1 and 3. In each case, the figure legend should clearly state how many replicates were performed, whether the replicates were from independent biological samples, and what the error bars actually represent (standard error, standard deviation, etc.).
Answer: Thanks for the comments. All of the error bars in Fig. 1 and 3 represent standard deviations. Data are presented as means ± S.D. (n=3). The results are representative of three independent experiments. The corresponding description has been added to the figure legends.
Comments: When you do submit your revised work we ask that you also address the following format and data issues:
Comments: If you feel you can satisfactorily deal with these points and those listed by the referees, you may wish to submit a revised version of your manuscript. Please attach a covering letter giving details of the way in which you have handled each of the points raised by the referees. A revised manuscript will be once again subject to review and you probably understand that we can give you no guarantee at this stage that the eventual outcome will be favorable.

Answer: The point-by-point responses to the comments have been included in the cover letter.

Comments: Wang et al. report the generation and analysis of a large Y2H-based protein-protein interaction network that focuses on liver proteins. A combination of matrix and library Y2H screening in a space of approx. 5000x5000 proteins enriched in "liver annotations" reveals 3,484 PPIs. Technically, the screening is performed at the state of the art, single pass data with retest. Importantly also in the library approach, preys were isolated and freshly retransformed for a second assay. [...] transient [...] and co-localization (117) as well as a SMAD3 reporter assay (6/9); SMAD3 is with 45 PPIs one of the network hubs. These experiments, if representative, demonstrate the usefulness of the Y2H data set as a resource. More broadly, the authors use a bioinformatic scoring system (PRINCESS, MCP 2008) to define a set of 1005 PPIs as higher confidence.

The network is then analyzed and annotated. The major part of the network analysis focuses on four categories of liver proteins: ME, LS, LP and LD. The interacting groups of ME, LS, LP and LD proteins show distinct topological properties. LP and LD proteins and their interaction partners tend to be central in the network. LS proteins, although not central in the big (somatic?) network, may be important for liver development. Interesting connections of metabolic enzymes to transcriptions are identified.
There is a detailed discussion of subnetworks implicating candidate proteins in e.g. redox processes, pathways etc. From the interaction pattern of Git2, a possible involvement in NF-κB signaling is inferred. Git2 seems to increase deubiquitylation of NEMO, leading to a decrease of NF-κB transcriptional activity.
Systematic human interactome mapping is a vital goal in the scientific community and this manuscript is a solid, significant contribution, large in terms of data size. The analysis presented is probably not very innovative as such, but a good example of how to make sense of the data. Connections between metabolism and signaling/transcription seem interesting and may be particularly relevant for a liver interactome map.
In more general terms: Numerous excellent network analyses have been published and represent major advances in network biology. These studies are always based on the same, few, good data sets and at the same time always suffer from the limited amount of systematically generated PPI data. This study provides new data input to the field.
Answer: Thanks for the positive comments.

Specific points:
Comments: 1) It would be nice to see some representative plates e.g. from the deconvolution step, to demonstrate how the phenotypic evaluation of the PPIs is actually performed.
Answer: Thanks for the useful advice. In the liver phenotype protein analysis, 166 proteins were found to interact with two or more known liver phenotype proteins, among which 31 proteins have been identified as liver-phenotype proteins in the MGI database (Table S6). This represents a 7-fold enrichment compared with all proteins encoded by the human genome. For example, EP300 is an acetyltransferase that interacts with seven liver-phenotype proteins. EP300 is highly expressed in human liver. Knockout of EP300 causes defects of the heart, lung and small intestine, and death at midgestation. It was recently reported that acetylation of liver metabolic enzymes is ubiquitous and important for their function (Science. 2010, 327(5968):1000-4). So it is reasonable that EP300 might contribute to liver-specific functions and change the phenotype of the liver.
In the liver disease protein analysis, we found that 27 proteins were connected to more than two HCC-related proteins and 6 of them were reported to be related to HCC (Supplementary Table S7). The chance of these 27 proteins being HCC proteins was 3.4-fold higher than that of randomly selected proteins from HLPN. For example, the immune response and inflammation signaling pathway is closely related to liver cancer (Cell Res. 2011, 21(1):159-68). A few members of the NF-κB signaling pathway were identified as potential HCC candidates, such as IKBKG, MYD88 and NFKB1, which might play certain roles during the pathogenesis of HCC.
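The fold enrichments quoted above compare the fraction of annotated proteins in a candidate set with the fraction in a background set, and a hypergeometric upper-tail probability is one common way to attach a P-value to such a comparison. The sketch below uses small made-up counts only; the function names and the toy numbers are assumptions for illustration and do not reproduce the values behind Tables S6 and S7.

```python
from math import comb


def fold_enrichment(hits, sample_size, background_hits, background_size):
    """Ratio of the annotated fraction in the sample to that in the background."""
    return (hits / sample_size) / (background_hits / background_size)


def hypergeom_upper_tail(hits, sample_size, background_hits, background_size):
    """P(X >= hits) when `sample_size` items are drawn without replacement
    from a background containing `background_hits` annotated items."""
    total = comb(background_size, sample_size)
    upper = min(sample_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy example (hypothetical counts): 3 of 4 candidates are annotated,
# against a background of 20 proteins of which 5 are annotated.
print(fold_enrichment(3, 4, 5, 20))
print(hypergeom_upper_tail(3, 4, 5, 20))
```

Using exact integer binomial coefficients keeps the sketch dependency-free; for genome-scale backgrounds a statistics library would normally be used instead.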
Comments: 2) Very surprisingly it seems that the PPIs are at least not underrepresented in membrane proteins (1b). Since this is the known major bias in Y2H screens, this needs further investigation, going beyond the statement "the GO categories distribution of HLPN proteins was consistent with the liver proteome".
Answer: Thanks for pointing this out. We are sorry for the wrong description in the legends of Figure 1B and 1C. From inside to outside, the rings represent all human proteins, human liver proteins, Y2H matrix proteins and HLPN proteins. Although we selected a large number of membrane proteins (25%) for constructing the HLPN, the PPIs of membrane proteins are underrepresented (23%), which might be due to the technical bias of Y2H systems. The results are consistent with the large-scale human Y2H dataset (19% membrane proteins in the Y2H matrix vs. 17% membrane proteins in the network [Cell. 2005, 122(6):957-68]). On the other hand, we used two complementary Y2H methods to map the HLPN. Nearly half of the interactions in the dataset were from Y2H library screening, which might capture more membrane protein interactions because different forms (full length and truncations) of membrane proteins exist in the liver cDNA library. Even so, the membrane proteins are still significantly depleted in HLPN when compared with Y2H matrix proteins.

Comments: 3) Figure 1d does not show that liver proteins are enriched in the HLPN as the enrichment ratio is the same among interacting proteins and screened proteins.
Answer: This might be due to the misleading description of Figure 1D. To construct the HLPN, we selected 5026 proteins that are expressed in human liver. Compared with all proteins encoded by the human genome, the selected proteins are significantly enriched with liver proteins (ME, LS, LP and LD; the enrichment ratios are all greater than two, all P-values <<0.01), as shown in Figure 1D (red bars). In the HLPN, the enrichment of liver proteins is also extremely significant when compared with all proteins encoded by the human genome (all P-values <<0.01, Figure 1D, blue bars). We did not compare the selected Y2H proteins with the HLPN proteins. To better show these results, we modified the legend of Figure 1D and presented the P-values from the hypergeometric distribution test.

Comments: 4) With respect to interesting connections of metabolic enzymes to TFs: Figure S4 is lacking an appropriate description of the experiment and e.g. the axis labeling.
Answer: The figure legend and the axis labeling have been added to Figure S4 in the revised manuscript.

Comments: 5) Figure 3, Demonstration of Git2 effects on Nemo deubiquitylation. b and c: it seems that ubiquitylation stabilizes Nemo and thus it is not very clear whether it is ubiquitylation that is increased or the protein levels. Of course it can be both, but this needs clarification/interpretation. b lane four: does it really contain Git2? Because it is not seen on the Myc blot and the effect is probably due to increased A20. I am not convinced about this 3b figure.
Answer: It has been reported that NEMO is ubiquitinated with K63-linked ubiquitin chains in vivo or under stimulation with cytokines, such as TNF-α or IL6 (J Biol Chem. 2003, 278(39):37297-305; Trends Immunol. 2006, 27(9):395-7; Nature. 2004, 427(6970):167-71). The K63-linked ubiquitination of NEMO primarily plays a regulatory role, rather than affecting its stability. Similar to previous reports, our results showed that ubiquitination of NEMO did not affect its stability (Figure 3B, lane 2). The ubiquitinated NEMO is decreased when TNFAIP3 (A20) is added (Figure 3B, lanes 3 and 4), which slightly affects the expression level of NEMO. GIT2 inhibits the ubiquitination of NEMO (Figure 3B, lane 5). The ubiquitinated NEMO is much reduced when GIT2 is co-expressed with A20 (Figure 3B, lane 6). But the total amount of NEMO is also reduced (Figure 3B, lane 6), which might be due to unknown mechanisms. Thus, GIT2 might simultaneously inhibit the ubiquitination of NEMO and affect its stability.
Lane 4 in Figure 3B was wrongly labeled; it did not contain GIT2. The repressive effect is due to the increased expression of A20. But when GIT2 is co-expressed with A20, the repression of NEMO ubiquitination is enhanced, which suggests that GIT2 indeed inhibits NEMO ubiquitination (Figure 3B, lane 6). Figure 3B has been modified in the revised manuscript.
Comments: 6) The authors in general use available tools, such as PRINCESS, network analyser (not quoted in the main text!), etc. This is ok as the methods are well defined; however my concern is that generic methods may not always best suit such an original data set with its particularities. It would also help the reader to at least get to know the underlying principles of the methods used, without having to look up the method somewhere else.
[...] mainly relating to the analysis, need to be addressed before the manuscript can be accepted:
Answer: Thanks for the positive comments.

General points:
Comments: 1) The initial selection of 5,026, from the 18,000 expressed in liver, is very important to the entire story and not sufficiently described. The selection is apparently based on a proteomics work performed by the same consortium, using mass spectrometry. A figure or table dedicated to this point would help address this important point (see also point 5: comments below concerning circular arguments). Also the criteria for the selection of the 1,428 baits used for the cDNA library screens are not clear. These are important points, as later on the authors claim some special behavior for a set of proteins (see point 3); it is important to prove that this was not introduced from the initial bait/prey selection.
Answer: Based on our understanding of the characteristics of the human liver proteome, we analyzed the functional and regulatory proteins that play important roles in liver development, regeneration, metabolism, biosynthesis and diseases (J Proteome Res. 2010, 9(1):79-94). From human liver protein and mRNA expression datasets, which were collected through shotgun proteomics and microarray expression analyses (Supplemental Data), a total of 5,026 proteins were selected for interaction screening. As shown in Figure 1B, 1C and Supplementary Fig. 1, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses indicate that these proteins represent the human liver proteome in an unbiased manner.
The criteria for the selection of the 1,428 baits are the same as those for the 5,026 liver proteins. The aim of our study is to uncover characteristics of the liver by network analysis. However, the Y2H array we constructed represented only about 34% of the genes expressed in the human liver, and the available interaction information might be limited. Therefore, we screened an adult liver cDNA library using 1,428 bait vectors constructed in a stringent Y2H system designed to express the fusion proteins at low levels.
Because our objective is to generate a protein-protein interaction network of the human liver, the proteins selected for Y2H screening are significantly enriched with liver proteins (ME, LS, LP and LD). In the subsequent bioinformatics analysis of the HLPN, these proteins were highly enriched. However, the special behavior of these proteins is not introduced by the initial selection of baits and preys. First, although ME, LS, LP and LD are all highly enriched, the topological properties of their subnetworks differ. The degree centrality and betweenness centrality values of LP and LD are significantly higher than those of other HLPN proteins, whereas ME and LS do not occupy vital network positions in the HLPN. The different properties of the LP, LD, ME and LS subnetworks indicate that proteins of a given category behave in a specific way, which may not be due to the initial selection of baits or preys.
Second, similar to previous reports, certain kinds of proteins, such as phenotype and disease proteins, tend to form clusters. The phenotype proteins of yeast have a strong tendency to interact with each other (Proc Natl Acad Sci U S A. 2004, 101(52):18006-11). Disease proteins have the same properties as phenotype proteins in the network (Proc Natl Acad Sci U S A. 2007, 104(21):8685-90).
Third, to exclude the possibility that the Y2H array screening introduced bias through the bait/prey selection, we analyzed the subnetwork obtained with the library screening method. We found that LP, LD, ME and LS proteins are also enriched in the preys, which further confirmed the former conclusions. The special properties of the LP, LD, ME and LS subnetworks are not due to the selection of baits or preys.
Fourth, HLPN contains only part of the entire human liver network, which might lead to some sampling bias. To address this concern, we compiled a virtual liver protein-protein interaction network with data from the HPRD database (http://www.hprd.org). In the compiled liver protein interaction network, all topological conclusions for ME, LS, LP and LD remain true (Supplementary Table S4). Moreover, we randomly added or removed 5-20% of the edges and found that all the conclusions still hold, which suggests that the topological features of ME, LS, LP and LD in HLPN are not artifacts of biased datasets. All of these results indicate that the special features of ME, LS, LP and LD proteins are not introduced by the initial selection of baits and preys or by sampling bias.
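The edge-perturbation check described in the fourth point can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the toy network, the protein names, the helper `perturb_edges` and the choice of mean degree as the summary statistic are all hypothetical.

```python
import random
from itertools import combinations

def mean_degree(edges, group):
    """Mean number of interactions per protein in `group`."""
    return sum(sum(1 for e in edges if p in e) for p in group) / len(group)

def perturb_edges(edges, nodes, fraction, mode, rng):
    """Return a copy of `edges` with `fraction` of the edges removed or added at random."""
    edges = set(edges)
    k = max(1, round(fraction * len(edges)))
    if mode == "remove":
        # Sort first so that sampling is reproducible for a given seed.
        victims = rng.sample(sorted(edges, key=sorted), k)
        edges.difference_update(victims)
    elif mode == "add":
        non_edges = [frozenset(pair) for pair in combinations(sorted(nodes), 2)
                     if frozenset(pair) not in edges]
        edges.update(rng.sample(non_edges, k))
    else:
        raise ValueError("mode must be 'add' or 'remove'")
    return edges

# Hypothetical toy network: two hub-like "LP" proteins and eight other proteins.
lp = ["LP1", "LP2"]
others = [f"P{i}" for i in range(1, 9)]
nodes = lp + others
edges = {frozenset((hub, p)) for hub in lp for p in others} | {frozenset(lp)}

rng = random.Random(0)
trials = holds = 0
for fraction in (0.05, 0.10, 0.20):
    for mode in ("add", "remove"):
        for _ in range(50):
            perturbed = perturb_edges(edges, nodes, fraction, mode, rng)
            trials += 1
            # Does the LP group still have the higher mean degree?
            holds += mean_degree(perturbed, lp) > mean_degree(perturbed, others)
print(f"conclusion held in {holds}/{trials} perturbed networks")
```

In a real analysis the same loop would be run on the full interaction list and would recompute every statistic on which a conclusion rests (for example betweenness centrality), not only the mean degree.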

Comments:
2) The overall estimation of the dataset quality is currently insufficient and not complete without a measure of false positives and false negatives. This should be relatively easy.
The authors could use the now well established methods from Vidal and many others for this purpose.
Answer: We evaluated the false positives and false negatives of our interaction dataset using the methods suggested by the reviewer. To estimate the false positives of our dataset, we adopted a method developed by Dr. Marc Vidal (Science. 2008, 322(5898):104-10). A gold standard negative dataset (GSN) was constructed by selecting proteins with different cellular locations (Nat Biotechnol. 2005, 23(8):951-9). There are 455 nuclear proteins and 242 membrane proteins in HLPN. After manual analysis, we found that 68 interactions between nuclear and membrane proteins have a relatively high probability of being negative (We don't mean that any of them are false positive here It seems that there is a large proportion of false positives in HLPN, which might be due to the following reasons. It is difficult to define a GSN for humans. Only less than 8% or 0. Based on the above speculation, we think the false positive rate of HLPN is far less than 58.9%. A more convincing strategy to measure the confidence of a dataset is to validate it by independent experiments. So in this paper, we validated randomly selected interactions by performing independent biochemical or cellular assays (Supplementary Table S2). At least 72% were confirmed by these assays (Supplementary Table S2). Thus, the false positive rate might be less than 28%, which is similar to previous reports of human interactome datasets (Cell. 2005, 122 (6) In addition, we also evaluated the false negative ratio of HLPN. Based on a previous estimate of the human interactome size (Proc Natl Acad Sci U S A. 2008, 105(19):6959-64), we estimated the false negative ratio of HLPN as 1 - 3484×(1-20%) / [650000 × (2582×(2582-1)/2) / (25000×(25000-1)/2)] ≈ 60%. This false negative ratio might be due to the technical limitations of any given large-scale method. It is estimated that less than 20% of interactions can be identified by Y2H technology (Nat Methods. 2009, 6(1):83-90).
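The false-negative estimate quoted above can be reproduced step by step. The sketch below is an illustration only; the function names are hypothetical, and it assumes that 650,000 is the estimated size of the human interactome, 25,000 the number of human proteins, 2,582 the number of proteins in HLPN, and 20% the fraction of the 3,484 observed interactions treated as false positives.

```python
from math import comb

def expected_interactions(interactome_size, n_screened, n_proteome):
    """Interactions expected among the screened proteins, scaling the
    interactome size by the fraction of all protein pairs that were covered."""
    return interactome_size * comb(n_screened, 2) / comb(n_proteome, 2)

def false_negative_ratio(observed, false_positive_rate, expected):
    """1 minus the share of expected interactions recovered as true positives."""
    return 1 - observed * (1 - false_positive_rate) / expected

expected = expected_interactions(650_000, 2582, 25_000)
fn = false_negative_ratio(3484, 0.20, expected)
print(f"expected interactions among HLPN proteins: {expected:.0f}")
print(f"estimated false-negative ratio: {fn:.1%}")
```

Under these assumptions the estimate comes out at roughly 60%, in line with the value given in the answer.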

Comments: 3) Statistic and significance:
The authors use the term significant all over the manuscript, but rarely provide p-values and information on the statistical tests used. This should be addressed for the entire manuscript and only a few examples are listed below: * page 3 "..., the selected proteins are significantly enriched with metabolic enzymes (ME), liver-specific proteins (LS), liver phenotype proteins (LP) ..." the authors need to provide with p-value for all cases. * page 4 "First, the HLPN is remarkably enriched for proteins that have been shown to be specific expressed in liver or required individually for controlling liver functions." Again no statistic. * page 5 "But the degree and the betweenness centrality of their partners were significantly greater than the expected values."
Answer: All the P-values and the related statistical methods have been provided in the revised manuscript (in text or figure legends).

Comments: 4)
The authors claim at multiple places (abstract, page 4 paragraph 2 and page 8 paragraph 2 on pathways involved in liver function) that the network is enriched for proteins expressed in the liver and required for liver function. This sounds like a circular argument as the analysis is biased for proteins relevant to liver. This should certainly be clarified.
Answer: Based on our understanding of the characteristics of the human liver proteome (J Proteome Res. 2010, 9(1):79-94) and transcriptome (Supplemental Data), we selected the functional and regulatory proteins that play important roles in liver development, regeneration, metabolism, biosynthesis and diseases. According to GO and KEGG analysis, these proteins represent the human liver proteome in an unbiased manner. The protein-protein interaction network should therefore reflect the properties of the human liver. In fact, the proteins selected for Y2H screening are significantly enriched with liver proteins (ME, LS, LP and LD). In the subsequent bioinformatics analysis of the HLPN, these proteins were highly enriched. However, the special behavior of these proteins is not introduced by the initial selection of baits and preys. The specific properties of the ME, LS, LP and LD subnetworks appropriately reflect the functions of these proteins in liver physiology and pathology, and might not be due to analysis bias.

Comments: 5) The authors found that the betweenness and centrality values for LP and LD proteins are higher than for other HLPN proteins, which is a very interesting and intriguing observation that deserves a deeper analysis. The authors should here again perform some statistics (give p-values) and also prove that these differences cannot be attributed to technical issues inherent to the screening and bait/prey selection (see point 1). Indeed some proteins have been used often, as prey and as bait and also for both the matrix analysis and human liver library screen, whereas some were only used once. Could it be that the higher betweenness and centrality values observed for some proteins simply point to more studied proteins?
Answer: Thanks for the comments. We showed that the special properties of ME, LS, LP and LD proteins are not introduced by the initial selection of baits and preys. Briefly, although ME, LS, LP and LD are all highly enriched, their network properties differ. Consistent with previous reports, phenotype and disease proteins tend to form clusters. Technical bias can be ruled out by analyzing the subnetwork obtained from the Y2H library screening. The special properties of the LP, LD, ME and LS subnetworks are not due to the selection of baits or preys. All P-values are provided in the revised manuscript. For the Y2H array screening, most proteins (>94.3%) were used as both bait and prey, so the technical bias can be neglected. For the Y2H library screening, 1,428 baits were used, which might increase the degree centrality values of these proteins while having a limited effect on their betweenness centrality values. The library screening results should have limited effects on the conclusions. First, more than half of the proteins (54%) with betweenness centrality values are not baits from the Y2H library screening (Supplementary Table S1 and S3), which might be because half of the interactions are from matrix screening. The other 46% of proteins with betweenness centrality values were investigated by both matrix and library screening. Because these proteins are highly enriched for important functions, such as liver phenotype and liver disease, their degree and betweenness centrality values tend to be higher than those of randomly selected proteins. This might be due to the inherent nature of LP and LD proteins. Second, HLPN contains only part of the entire human liver network, which might lead to some sampling bias. To address this concern, we compiled a virtual liver protein-protein interaction network with data from the HPRD database (http://www.hprd.org). In the compiled liver protein interaction network, all topological conclusions for ME, LS, LP and LD remain true (Supplementary Table S4). Moreover, we randomly added or removed 5-20% of the edges and found that all the conclusions still hold, which suggests that the topological features of ME, LS, LP and LD in HLPN are not artifacts of biased datasets. All of these results indicate that the special features of ME, LS, LP and LD proteins are not introduced by the initial selection of baits and preys or by sampling bias.
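The argument that extra bait usage raises degree more directly than betweenness depends on the two measures capturing different things. The sketch below illustrates this with Brandes' algorithm for unweighted, undirected graphs; it is not the tool the authors used, and the toy network and protein names are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                              # nodes in order of increasing distance
        preds = {v: [] for v in adj}            # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)           # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):               # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network: two hubs (H1, H2) joined through a bridge protein B.
edge_list = [("H1", "a"), ("H1", "b"), ("H1", "c"), ("H1", "B"),
             ("B", "H2"), ("H2", "d"), ("H2", "e"), ("H2", "f")]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

degree = {v: len(nbrs) for v, nbrs in adj.items()}
bc = betweenness(adj)
for v in ("H1", "B", "a"):
    print(v, "degree:", degree[v], "betweenness:", bc[v])
```

In this toy case the bridge protein B has only two partners but a betweenness close to that of the hub H1, which shows that a protein's betweenness is not determined by how many partners were found for it.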

Comments: 6) Page 7 the authors can use the network to predict LP and LD proteins, which is again a very interesting observation. Currently this is a bit redundant with the previous story concerning betweenness and centrality and maybe the two stories could be merged. Also could they predict a new LD or LP? And... maybe validate?
Answer: We agree with this comment. Most potential LP and LD proteins tend to have higher degree and betweenness centrality values, but not all of them do. Conversely, proteins with higher degree and betweenness centrality values are not always identified as potential LP and LD proteins. We think these proteins might have other crucial functions, such as housekeeping, without contributing to LP or LD. It is difficult to validate, by experiments or bioinformatics, whether these proteins are related to LP or LD. The prediction of potential LP and LD proteins is based on the observation that these proteins tend to form clusters with known LP and LD proteins. Thus, we prefer to describe the two sections separately, even if they are somewhat redundant.

Other points: Comments: * The legend of the Supplementary tables should be given directly in the excel file (with abbreviation, etc) so that one does not need to open several files to get the relevant information.
Answer: All the legends of the supplementary tables have been added to make the file more readable.

Comments: * Table S4 "Potential cross-talk between metabolic pathways": some entries are a bit confusing. For example, in line 56, CAT and CAT is the same enzyme in the same pathway. Does that represent a cross-talk? The same is true for ECHS1, UGP2, GLUL, ASL, etc.: this needs clarification.
Answer: Sorry for the mistake in the title of Table S4 (Table S5 in the revised manuscript). Table S5 lists 74 interactions between two MEs, including not only 42 interactions of potential cross-talk between KEGG metabolic pathways, but also 32 interactions within the same metabolic pathway. The title of Table S5 has been corrected to "List of interactions between metabolism enzymes in HLPN".

Comments: * The manuscript contains many abbreviations, which meanings are not always clear. For example, HCC appears first page 8 without explanation, and the meaning is hidden page 18 in the figure legend. If possible, abbreviations should be used with parsimony as they make the manuscript difficult to read.
Answer: We now define each abbreviation at its first use in the manuscript to make it more readable. Some abbreviations have been removed from the article.

Comments: * Table S6 Under "Type of cross-talk interaction" the classification between different and same is counter-intuitive. Per definition a cross talk is across pathway; may be a better nomenclature here would help.
Answer: Thanks for the comment. In the revision, we changed the title of Table S10 (Table S6 in the former manuscript) to "The crosstalk interactions that link two different signal transduction pathways in HLPN". Here, only an interaction that bridges proteins in different pathways is regarded as crosstalk.

Comments: * Similarly in Table S7, the "Crosstalk proteins between different KEGG signaling pathways" contains mainly proteins annotated NA (not assigned to a pathway, if I got it properly). Strictly speaking, they are not really formal cases of cross-talk between pathways. This again should be clarified.
Answer: To better clarify the meaning of crosstalk proteins, more description has been added to the revised manuscript. In our work, we found two types of crosstalk proteins (Supplementary Table S9 in the revised manuscript). Proteins of the first type link two different pathways while belonging to a third KEGG pathway. Proteins of the second type also connect two different pathways but are not assigned to any KEGG pathway (NA); they might be useful for investigating their roles in different signal transduction pathways.

Comments: * Figures 1e-f are unclear -what are the myc-, GST, and flag-fusions?
Answer: The Myc, GST and Flag-fusions mean the fusion proteins with Myc, GST or Flag tags. The corresponding descriptions have been modified in the revised manuscript.

Reviewer #3 (Remarks to the Author):
Comments: J. Wang, et al, have produced a useful protein-protein interaction map for human liver cells using their set of ~5,000 cloned, liver-enriched protein-coding genes. The use of a matrix yeast two-hybrid (Y2H), a 4,788 x 4,740 array, and a cDNA library Y2H screen, 1,428 baits x liver cDNA prey library, provides a valuable dataset of binary interactions that encompasses a large fraction of liver-enriched functions. That data quality has been established using a combination of orthogonal assays on a subset of interaction pairs and their previously published bioinformatics strategy provides a measure of confidence in the overall dataset.
While the overall screening results and the investigation of some biological features of specific interactions render this an attractive study, there are a number of specific concerns with the current version of the manuscript that need to be addressed.

Answer: Thanks for the positive comments.
Several minor concerns with the text are:
Comments: -there is no description of how the 1,428 baits used for the cDNA library screen were chosen.

Answer: The criteria for the selection of the 1,428 baits are the same as those for the 5,026 liver proteins. The aim of our study was to uncover characteristics of the liver by network analysis. However, the Y2H array we constructed represented only about 34% of the genes expressed in the human liver, and the available interaction information might be limited. Therefore, we screened an adult liver cDNA library using 1,428 bait vectors constructed in a stringent Y2H system designed to express the fusion proteins at low levels. The functional classifications of the bait proteins by GO mainly include signal transduction molecules, metabolic enzymes, and molecules closely related to liver development, regeneration and apoptosis.

Comments: -[...] Fig 1F but NEMO, an alias, is used in Fig 2. Similarly for TNFAIP3 and its alias A20 -the authors should use official gene symbols whenever possible and not rely on either an alias or mixed naming.

Answer: We have unified the gene names to official gene symbols.

Comments: -in Fig 2D, all the nodes appear red -do all the nodes correspond to proteins with a liver phenotype?

Answer: Yes, all the nodes are liver phenotype proteins. We have modified the description to avoid confusion.

Comments: -in Fig 2E, what are the grey nodes?

Answer: The grey nodes are proteins without a human liver disease annotation in the LOMA database. We have added the description in the revised manuscript.

Comments: -the various IP/westerns need better labeling. The text and legends suggest that the pairs were tested in reciprocal fashion (Myc-X + Flag-Y and Myc-Y + Flag-X) but only one configuration is shown. In fig S6,

Answer: For some IP results, reciprocal experiments were performed. For most of them, however, only a one-way co-IP assay was performed to show the reliability of the dataset. We modified the corresponding description in the manuscript. In Figure S6, the expression of GIT2 slightly enhanced the association of A20 and NEMO, which might also be due to the binding of GIT2 and NEMO. So we have deleted this figure from the supplementary files.

Comments: -in the Y2H experiments, were diploids directly spotted or was replica-plating done.
The authors indicate that interactions from two independent screens were considered "true positives" -while these would be Y2H positives, were such "true positives" also retested using fresh glycerol stocks and, more importantly, was any sequencing done to confirm clone identity. For the cDNA library screen, what sequencing criteria was used to confirm ID and reading frame and were any baits sequenced to confirm identity of both partners?
Answer: In the Y2H assays, the diploids were first directly spotted after mating yeast strains of different mating types. The diploids were then replica-plated to analyze the reporter genes. The positives were retested using fresh glycerol stocks. Not all clones were sent for DNA sequencing. Because the identities of the clones were initially confirmed by gene sequencing, the random error is very low. For the cDNA library screen, to avoid mistakes arising from the larger amount of manual work, all baits and preys were sequenced to confirm their identities, and only those in frame with Gal4 were retained for further testing.

Comments: -were any pairs tested by co-IP also tested by GST-affinity capture and vice versa?
There also did not appear to be any mention of the positive and negative controls used for the orthogonal assays -an empty vector control is not sufficient as a negative control.
Answer: Only a few protein pairs were tested by both co-IP and GST pull-down assays. As a standard procedure, we used cells transfected with only one of the proteins as negative controls in the co-IP assays. In addition, as a positive control, we used the antibody to immunoprecipitate a protein and then used the same antibody to detect it, to show that the IP worked. For the GST pull-down assays, similar to most published articles, we used the GST vector as a negative control. The pull-down of GST and GST fusion proteins serves as a positive control to show that the experimental system worked.

Comments:
The major concern with the current manuscript is that this network is one highly enriched in only the most highly expressed liver gene products because the baits used and the prey found correspond to highly expressed liver genes. While all the interaction pairs have a high degree of confidence, any conclusions regarding the topological or statistical nature of the resulting networks are overstated. Basically, there is a certain amount of circularity since the starting point is a biased set of liver-enriched genes. As others have shown, inferences on network topology need to take into account sampling issues and biases in the search space of genes tested.
Answer: Based on our understanding of the characteristics of the human liver proteome, we analyzed the functional and regulatory proteins that play important roles in liver development, regeneration, metabolism, biosynthesis and diseases (J Proteome Res. 2010, 9(1):79-94). From human liver protein and mRNA expression datasets, which were collected through shotgun proteomics and microarray expression analyses (Supplemental Data), a total of 5,026 proteins were selected for interaction screening. The liver proteins were selected on the basis of their functions, not their expression levels. Many of them, such as liver phenotype proteins, liver disease proteins and signal transduction proteins, are expressed at low levels in the liver. In the Y2H library screening, highly expressed preys may be identified more easily, as indicated by the number of hits. For the Y2H array screening, however, this bias does not occur.
Because the objective of our work is to generate a protein-protein interaction network of the human liver, the proteins selected for Y2H screening are significantly enriched with liver proteins (ME, LS, LP and LD). In the subsequent bioinformatics analysis of the HLPN, these proteins were highly enriched. However, the special behavior of these proteins is not introduced by the initial selection of baits and preys.
First, although ME, LS, LP and LD are all highly enriched, the topological properties of their subnetworks differ. The degree centrality and betweenness centrality values of LP and LD are significantly higher than those of other HLPN proteins, whereas ME and LS do not occupy vital network positions in the HLPN. The different properties of the LP, LD, ME and LS subnetworks indicate that proteins of a given category behave in a specific way, which may not be due to the initial selection of baits or preys.
Second, similar to previous reports, certain kinds of proteins, such as phenotype and disease proteins, tend to form clusters. The phenotype proteins of yeast have a strong tendency to interact with each other (Proc Natl Acad Sci U S A. 2004, 101(52):18006-11). Disease proteins have the same properties as phenotype proteins in the network (Proc Natl Acad Sci U S A. 2007, 104(21):8685-90).
Third, to exclude the possibility that the Y2H array screening introduced bias through the bait/prey selection, we analyzed the subnetwork obtained with the library screening method. We found that LP, LD, ME and LS proteins are also enriched in the preys, which further confirmed the former conclusions. The special properties of the LP, LD, ME and LS subnetworks are not due to the selection of baits or preys.
Fourth, HLPN contains only part of the entire human liver network, which might lead to some sampling bias. To address this concern, we compiled a virtual liver protein-protein interaction network with data from the HPRD database (http://www.hprd.org). In the compiled liver protein interaction network, all topological conclusions for ME, LS, LP and LD remain true (Supplementary Table S4). Moreover, we randomly added or removed 5-20% of the edges and found that all the conclusions still hold, which suggests that the topological features of ME, LS, LP and LD in HLPN are not artifacts of biased datasets. All of these results indicate that the special features of ME, LS, LP and LD proteins are not introduced by the initial selection of baits and preys or by sampling bias.
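A claim such as "the degree centrality of LP and LD proteins is significantly higher than that of other HLPN proteins" can be given a p-value with a simple label-permutation test. The sketch below is one possible way to do this and is not taken from the manuscript; the degree values, the group size and the function name are hypothetical.

```python
import random

def permutation_p_value(values, in_group, n_perm=10_000, seed=0):
    """One-sided p-value: how often does a random group of the same size
    reach a mean at least as high as the observed group's mean?"""
    rng = random.Random(seed)
    group = [v for v, flag in zip(values, in_group) if flag]
    k = len(group)
    observed = sum(group) / k
    hits = 0
    for _ in range(n_perm):
        # Drawing k values without replacement is equivalent to shuffling labels.
        if sum(rng.sample(values, k)) / k >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical degree values: four "LP" proteins followed by sixteen other proteins.
degrees = [12, 10, 9, 11] + [1, 2, 1, 3, 2, 1, 2, 1, 2, 3, 1, 2, 2, 1, 1, 2]
is_lp = [True] * 4 + [False] * 16

p = permutation_p_value(degrees, is_lp)
print(f"one-sided permutation p-value: {p:.4f}")
```

The permutation only answers whether the category differs from random groups drawn from the same network. If bait usage differs between categories, the random groups would need to be matched on bait usage as well.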

Comments:
Calling the network a human liver-specific protein interaction network (HLPN) and emphasizing specific interactions as liver-specific is also somewhat misleading since many of the highlighted nodes in the network are ubiquitous
Answer: We agree with the comments. The corresponding description has been modified in the revised manuscript. Most of the proteins are not specifically expressed in human liver. To our knowledge, no protein is expressed exclusively in a given tissue, organ or cell. However, the combination of proteins might determine the specific biological features of a tissue, organ or cell. Currently, the molecular mechanisms underlying the specific properties of a tissue, organ or cell are too complicated to be resolved. We aim to analyze the network of proteins that are expressed in human liver. Because all of the proteins are expressed in human liver and the further analysis focused on liver-related functions, we named our network a human liver protein interaction network. We simply correlate and integrate the protein-protein interactions with other functional information on human liver proteins. The interaction dataset of this manuscript could also be used for the network analysis of other cells, tissues or organs. Our bioinformatics analysis suggested further useful information, which can be used to direct further studies of liver functions. As the reviewer comments, however, the dataset can be applied to other research and is not limited to liver-related studies.

Comments: Overall, the authors have produced a useful protein-interaction map for genes expressed in liver, albeit not a liver-specific map. The data is high quality as a result of multiple orthogonal validations (bioinformatics and experimental), but the analysis of network properties is premature given the biased nature of the baits and preys chosen for experimental investigation.
Answer: We agree that the dataset can be used not only to direct liver-related research but also to benefit other research fields, such as systems biology and molecular biology. The initial selection bias of baits and preys might have limited effects on the network properties, as described above. Thanks for the positive comments.

Thank you again for submitting your work to Molecular Systems Biology. We have now heard back from the two referees who agreed to evaluate this revised study. As you will see, the first reviewer is now supportive, but the second indicated clearly that they felt very important issues raised during the first round of review had not been clearly addressed. The editor agrees that these issues remain substantial, but feels it may be possible to address them with some additional clarifications. As such, we would ask you to carefully address these points in a final revision of the present work.
Here, I highlight some of the key issues that the editor feels must be conclusively addressed:
1. False Positive Rate: While an adequate discussion of the possible false positive rate, calculated with the Vidal method, is included in the rebuttal letter, these calculations should be incorporated into the main manuscript. Naturally, you may also include a brief discussion as to why you feel these estimates may be overly high.
2. Bait Selection. The selection criteria for the bait proteins have still not been adequately described. In particular, it is not clear how the genes actually used in the Y2H array and library were selected from the initial 5,026 liver genes. This should be described in detail, and if additional functional criteria were used to select these genes this should be clearly stated.
3. Functional Properties of the HLPN. The editor agrees with reviewer #2 that it is inappropriate to compare the functional enrichment in the HLPN to all genes in the genome (Fig. 1D, and claims of enrichment in the manuscript). Any test for functional enrichment of the HLPN should be made versus an appropriate gene background that rigorously reflects the initial gene selection bias (i.e. the 5,026 human liver genes). Ultimately, the HLPN appears to have similar functional composition to the set of human liver genes that were initially selected for the screen, which is not surprising since these genes are largely a subset of the initial liver-related genes. Similarly, claims that the prey genes are enriched for liver-related genes are not surprising given that they were identified either from the Y2H array or a liver cDNA library. As such, claims that the HLPN is enriched for liver-related genes should be removed or rephrased, and instead it should be clearly stated that the HLPN shows similar functional properties to the initial set of bait genes. If you believe that the HLPN is indeed enriched for certain functional categories over the bait genes then the significance of these enrichments should be rigorously tested using appropriate comparisons, and p-values should be reported (e.g. liver-specific genes do appear slightly more common in the HLPN).
4. Topological properties of the HLPN. The HPRD analysis does support the idea that the topological properties observed are a property of the liver-related interaction network, and not an artifact of the bait selection process. Once again, though, it is essential that you clearly describe how liver-related genes were selected from HPRD to build the HPRD liver PPI network.
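The enrichment test requested in point 3, made against the screened genes rather than the whole genome, could for instance be a one-sided hypergeometric test. The sketch below is only an illustration: the background size (5,026) and network size (2,582) come from the text, while the category counts (400 and 230) and the function name are hypothetical, and it assumes for simplicity that all network proteins belong to the screened set.

```python
from math import comb

def hypergeom_enrichment_p(background_size, background_hits, sample_size, sample_hits):
    """P(X >= sample_hits) when `sample_size` genes are drawn without replacement
    from a background containing `background_hits` genes of the category."""
    upper = min(background_hits, sample_size)
    total = comb(background_size, sample_size)
    tail = sum(
        comb(background_hits, k) * comb(background_size - background_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total

# Background: the 5,026 screened liver proteins; sample: the 2,582 HLPN proteins.
# The two category counts below are invented purely for illustration.
p = hypergeom_enrichment_p(background_size=5026, background_hits=400,
                           sample_size=2582, sample_hits=230)
print(f"enrichment p-value against the screened background: {p:.3g}")
```

Using the whole genome as `background_size` instead would answer a different question, namely whether the screened set itself is liver-biased, which is the circularity the referees point out.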
Please also address the important figure labeling concerns raised by reviewer #2.
Given the importance of these concerns, we reserve the right to send any revised work back to reviewer #2 to confirm that any revisions are sufficiently convincing.
In addition, when preparing your revision please address the following format issues:

After carefully re-reading the revised version of the manuscript, I want to renew my view about the manuscript. This is a significant and valuable contribution to human interactome mapping worth being published. With respect to my earlier concerns it seems that they (actually almost all) were due to wrong labeling and other formal mistakes in the first version. This has been corrected and the study has gained consistency. Thus I would like to support the publication of this study.

Reviewer #2 (Remarks to the Author):
Many of the main issues raised by the reviewer were serious and have been only remotely and insufficiently addressed -these points were important:
1) The criteria for the bait and prey selections remain unclear:
-page 4 first paragraph we read: 5,026 proteins were selected from previous knowledge on liver proteome and then second paragraph we read that only 4,788x4,740 were used in the matrix screen and only 1,428 in the library screen. How the authors went from 5,026 to 4,788 and then 4,740 and then to 1,428 is unclear from the manuscript. The authors claim in their answer to the reviewers that the criteria are "the same"; then how can it be that they get 3 different numbers? This important point is addressed neither in the revised version nor in the answer to the reviewer. This should be addressed and clearly phrased.
-page 3 the authors claim the network is enriched for liver specific, liver phenotype and liver disease proteins, but their initial selection also was. It is not the network that is enriched but the bait-prey selection, as admitted by the authors in the answer to Reviewer 1 point 3. Similarly page 6 they claim " ... the HLPN is remarkably enriched for proteins (Fig. 1D) that have been shown to be specific expressed in liver and...". Given the points raised above there is nothing remarkable in this, this simply arises from the bait-prey selection.
2) Overall clarity in legends and figures: - Supplementary Figure 4 has no x-axis legend and it is simply impossible to assess what is being displayed.
- Figure 1 panel E and F is identical to the one in the previous version and no effort has been made to clarify the labeling as requested by Reviewers 2 and 3.
3) Data quality: -If the false negative rate has been mentioned, there is currently still no mention of false positive rate.
Also if I got it properly, the request from Reviewer 1 comment 1 to add some representative plates has not been addressed.
2nd Revision - authors' response 21 July 2011

Thank you and the reviewers for the further review of our manuscript (MSB-11-2854R). The additional modifications and clarifications are as follows. We hope that the manuscript now fulfills the requirements for publication in Molecular Systems Biology.

Comments from the editor:
Thank you again for submitting your work to Molecular Systems Biology. We have now heard back from the two referees who agreed to evaluate this revised study. As you will see, the first reviewer is now supportive, but the second indicated clearly that they felt very important issues raised during the first round of review had not been clearly addressed. The editor agrees that these issues remain substantial, but feels it may be possible to address them with some additional clarifications. As such, we would ask you to carefully address these points in a final revision of the present work.
Here, I highlight some of the key issues that the editor feels must be conclusively addressed:

Comments: 1. False Positive Rate: While an adequate discussion of the possible false positive rate, calculated with the Vidal method, is included in the rebuttal letter, these calculations should be incorporated into the main manuscript. Naturally, you may also include a brief discussion as to why you feel these estimates may be overly high.

Response:
The description of the false positive rate estimated by Dr. Vidal's method has been incorporated into the manuscript (Page 6, first paragraph).
Comments: 2. Bait Selection. The selection criteria for the bait proteins have still not been adequately described. In particular, it is not clear how the genes actually used in the Y2H array and library were selected from the initial 5,026 liver genes. This should be described in detail, and if additional functional criteria were used to select these genes this should be clearly stated.

Response:
We modified the description of bait selection in the manuscript (Page 4, first and second paragraphs). Brief descriptions of the bait selection are as follows:

Selection of 5,026 liver genes: A total of 5,026 proteins were selected for interaction screening from human liver protein and mRNA expression datasets (J Proteome Res 9: 79-94). They include functional and regulatory proteins that play important roles in liver development, regeneration, metabolism, biosynthesis and disease. The dataset includes 684 metabolic enzymes, 201 liver-specific proteins, 337 liver-phenotype proteins (proteins homologous to mouse proteins whose knockouts cause liver phenotypes) and 488 liver disease-related proteins. According to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, these proteins represent the human liver proteome without bias (Fig. 1B-C, Supplementary Fig. 1). They are involved in 84 of the 85 KEGG metabolic pathways, including those of carbohydrates, lipids, nucleotides, amino acids, vitamins, hormones, bile acid and drugs (Supplementary Fig. 1A). The selected proteins cover all 114 human regulatory pathways in KEGG, such as the ErbB, MAPK and TGF-β signaling pathways, which have been shown to play key roles in the regulation of liver function (Supplementary Fig. 1B). Compared with all proteins encoded by the human genome, the selected proteins are significantly enriched for ME, LS, LP and LD proteins (Fig. 1D).

Construction of the Y2H matrix: The 5,026 selected genes were cloned into Y2H vectors. In the end, a matrix of 4,788×4,740 unique genes was successfully constructed for array screening.

Selection of 1,428 baits for Y2H library screening: To obtain a more comprehensive map of the liver, 1,428 baits were selected from the 5,026 proteins for Y2H library screening. The functional classification of the 1,428 baits is consistent with that of the 5,026 proteins; they are mainly involved in liver metabolism, apoptosis, cell proliferation, transcription, signal transduction, transport and biosynthesis.
Comments: 3. Functional Properties of the HLPN. The editor agrees with reviewer #2 that it is inappropriate to compare the functional enrichment in the HLPN to all genes in the genome (Fig. 1D, and claims of enrichment in the manuscript). Any test for functional enrichment of the HLPN should be made versus an appropriate gene background that rigorously reflects the initial gene selection bias (i.e. the 5,026 human liver genes). Ultimately, the HLPN appears to have similar functional composition to the set of human liver genes that were initially selected for the screen, which is not surprising since these genes are largely a subset of the initial liver-related genes. Similarly, the claim that the prey genes are enriched for liver-related genes is not surprising given that they were identified either from the Y2H array or a liver cDNA library. As such, claims that the HLPN is enriched for liver-related genes should be removed or rephrased, and instead it should be clearly stated that the HLPN shows similar functional properties to the initial set of bait genes. If you believe that the HLPN is indeed enriched for certain functional categories over the bait genes then the significance of these enrichments should be rigorously tested using appropriate comparisons, and p-values should be reported (e.g. liver-specific genes do appear slightly more common in the HLPN).
Response: Although the editor and reviewer #2 consider the functional enrichment analysis of the HLPN inappropriate, we still argue that comparing the HLPN with all genes of the human genome is appropriate for our analysis. At present, it is difficult to select another gene background for the comparison. First, many mRNA expression profiles exist for various human tissues and organs, and it is difficult to choose a gold-standard dataset among them for comparison with the HLPN data. Second, the combined human liver datasets were collected through shotgun proteomics and microarray expression analyses. To our knowledge, comparable proteomics datasets are still lacking for other human tissues or organs. The expression information available for any single tissue or organ chosen as the gene background might therefore be limited.
We acknowledge that it is not surprising that the HLPN is enriched for liver-related genes, and we state this clearly in the modified manuscript (Page 7, second paragraph). However, the value and significance of our work lie in presenting a human liver protein interaction network for these enriched liver-related proteins, which can be used to guide functional research on the liver. That more than 90% of the HLPN interactions are reported for the first time is a particularly valuable aspect of our work.
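The comparison requested in comment 3 (testing a functional category in the network against the initially selected genes rather than the whole genome) can be sketched as a one-sided hypergeometric test. This is only an illustrative sketch, not the analysis used in the manuscript: the function name `enrichment_p_value` is hypothetical, and the network counts in the example are placeholders rather than values from the study.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: size of the background (here, the initially selected gene set)
    K: background genes that belong to the category of interest
    n: background genes that also appear in the network
    k: network genes that belong to the category
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("counts are inconsistent with the background")
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., min(K, n) category members.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Background size and category size are taken from the response above
# (5,026 selected proteins, 201 of them liver-specific). The network
# counts below are HYPOTHETICAL placeholders, not values from the study.
N_background, K_liver_specific = 5026, 201
n_in_network, k_liver_specific_in_network = 1000, 55

p = enrichment_p_value(k_liver_specific_in_network, n_in_network,
                       K_liver_specific, N_background)
print(f"one-sided hypergeometric p-value: {p:.3g}")
```

Using the selected gene set as the background means that a category is only reported as enriched if it is over-represented beyond what the initial selection already explains. With several categories (ME, LS, LP, LD) a multiple-testing correction would also be needed.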
Comments: 4. Topological properties of the HLPN. The HPRD analysis does support the idea that the topological properties observed are a property of the liver-related interaction network, and not an artifact of the bait selection process. Once again, though, it is essential that you clearly describe how liver-related genes were selected from HPRD to build the HPRD liver PPI network.
Response: Thanks for your suggestion. We compiled the virtual liver protein-protein interaction network ("HPRD liver PPI network") with the data from HPRD databases by a reported method (Molecular Systems Biology 5:260). We have added this reference and the related detail in the revision (Page 8, first paragraph).
Comments: Please also address the important figure labeling concerns raised by reviewer #2.
Response: The figure labeling has been modified in the revised manuscript.
Given the importance of these concerns, we reserve the right to send any revised work back to reviewer #2 to confirm that any revisions are sufficiently convincing.
In addition, when preparing your revision please address the following format issues:

Comments: - The Supplementary Experimental Procedures should be included in the main manuscript as a Materials and Methods section.
Response: The Supplementary Experimental Procedures have been included in the main manuscript as a Materials and Methods section (Page 13, paragraph 3).

Response:
A list of all supplementary materials has been added to the supplementary information file.

Revised version Wang et al. "Towards an understanding of the protein interaction network of human liver"
Comments: After carefully re-reading the revised version of the manuscript, I want to renew my view about the manuscript. This is a significant and valuable contribution to human interactome mapping worth being published. With respect to my earlier concerns it seems that they (actually almost all) were due to wrong labeling and other formal mistakes in the first version. This has been corrected and the study has gained consistency. Thus I would like to support the publication of this study.
Response: Thank you very much for the valuable comments and the support.

Reviewer #2 (Remarks to the Author):
Many of the main issues raised by the reviewer were serious and have been only remotely and insufficiently addressed. These points were important:

Comments: 1) The criteria for the bait and prey selections remain unclear:
- On page 4, first paragraph, we read that 5,026 proteins were selected from previous knowledge of the liver proteome; in the second paragraph we then read that only 4,788×4,740 were used in the matrix screen and only 1,428 in the library screen. How the authors went from 5,026 to 4,788, then to 4,740 and then to 1,428 is unclear from the manuscript. The authors claim in their answer to the reviewers that the criteria are "the same"; how, then, can they arrive at three different numbers? This important point is addressed neither in the revised version nor in the answer to the reviewer. It should be addressed and clearly phrased.

Response:
We modified the description of bait selection in the manuscript (Page 4, first and second paragraphs). Brief descriptions of the bait selection are as follows:

Selection of 5,026 liver genes: A total of 5,026 proteins were selected for interaction screening from human liver protein and mRNA expression datasets (J Proteome Res 9: 79-94). They include functional and regulatory proteins that play important roles in liver development, regeneration, metabolism, biosynthesis and disease. The dataset includes 684 metabolic enzymes, 201 liver-specific proteins, 337 liver-phenotype proteins (proteins homologous to mouse proteins whose knockouts cause liver phenotypes) and 488 liver disease-related proteins. According to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, these proteins represent the human liver proteome without bias (Fig. 1B-C, Supplementary Fig. 1). They are involved in 84 of the 85 KEGG metabolic pathways, including those of carbohydrates, lipids, nucleotides, amino acids, vitamins, hormones, bile acid and drugs (Supplementary Fig. 1A). The selected proteins cover all 114 human regulatory pathways in KEGG, such as the ErbB, MAPK and TGF-β signaling pathways, which have been shown to play key roles in the regulation of liver function (Supplementary Fig. 1B). Compared with all proteins encoded by the human genome, the selected proteins are significantly enriched for ME, LS, LP and LD proteins (Fig. 1D).

Construction of the Y2H matrix: The 5,026 selected genes were cloned into Y2H vectors. In the end, a matrix of 4,788×4,740 unique genes was successfully constructed for array screening.

Selection of 1,428 baits for Y2H library screening: To obtain a more comprehensive map of the liver, 1,428 baits were selected from the 5,026 proteins for Y2H library screening. The functional classification of the 1,428 baits is consistent with that of the 5,026 proteins; they are mainly involved in liver metabolism, apoptosis, cell proliferation, transcription, signal transduction, transport and biosynthesis.
Comments: - On page 3 the authors claim the network is enriched for liver-specific, liver-phenotype and liver-disease proteins, but their initial selection also was. It is not the network that is enriched but the bait-prey selection, as admitted by the authors in the answer to Reviewer 1 point 3. Similarly, on page 6 they claim " ... the HLPN is remarkably enriched for proteins (Fig. 1D) that have been shown to be specific expressed in liver and...". Given the points raised above there is nothing remarkable in this; it simply arises from the bait-prey selection.

Response:
We acknowledge that it is not surprising that the HLPN is enriched for liver-related genes, and we state this clearly in the modified manuscript (Page 7, second paragraph). However, the value and significance of our work lie in presenting a human liver protein interaction network for these enriched liver-related proteins, which can be used to guide functional research on the liver. That more than 90% of the HLPN interactions are reported for the first time is a particularly valuable aspect of our work.

Comments: 2) Overall clarity in legends and figures:
- Supplementary Figure 4 has no x-axis legend and it is simply impossible to assess what is being displayed.
- Figure 1 panel E and F is identical to the one in the previous version and no effort has been made to clarify the labeling as requested by Reviewers 2 and 3.

Response:
The x-axis legend has been added to Supplementary Figure 4 (Page 5). Further description of Figure 1 panels E and F has been added to the figure legends (Page 25, paragraph 2). The gene name NEMO has been changed to IKBKG in the revised manuscript (Figure 1F).

Comments: 3) Data quality:
- If the false negative rate has been mentioned, there is currently still no mention of false positive rate.
Response: The description of the false positive rate has been added to the revised manuscript (Page 6, first paragraph).
Comments: Also if I got it properly, the request from Reviewer 1 comment 1 to add some representative plates has not been addressed.
Response: The description of representative plates has been added to the revised manuscript (Page 10, second paragraph, "For example, EP300 is an acetyltransferase that interacts with seven liver-phenotype proteins…").

Thank you again for submitting your revised work to Molecular Systems Biology.
Editors of the journal have now had time to consider and discuss your revised manuscript. Unfortunately, we feel that your revision does not sufficiently address the important concerns raised by the reviewers and we regret to inform you that we cannot publish this manuscript in its current form.
Please note that it is the general policy of Molecular Systems Biology to only allow manuscripts a single round of revision. Since this will now be the third round of revision, our next decision will be final. Please understand that if the issues listed below are not convincingly addressed in this last revision, we will have no choice but to reject publication of this work.
The major issues remaining from the previous round of review are the following:
1. It is still unclear how the 1,428 baits were selected. The revised manuscript indicates that "functional classification of 1,428 baits is consistent to the 5,026" liver-related proteins. Unfortunately, this does not explain how this subset of bait proteins was selected: were baits selected randomly, by functional characteristics, arbitrarily, or by any other criteria?
2. The fundamental concerns regarding the way functional enrichment was calculated for the HLPN have not been rigorously addressed. The claims regarding "remarkable enrichment" remain in the manuscript, and the statistical analyses presented in this work (e.g. in Fig. 1D) still make comparisons to genome-wide gene sets. The reviewers clearly indicated that this was inappropriate. We ask you to simply remove these claims from the text and remove figure 1D.
3. Please make sure that the text in any revised manuscript is thoroughly evaluated by a native English speaker.
Yours sincerely,
Editor
Molecular Systems Biology