Topology-driven protein-protein interaction network analysis detects genetic sub-networks regulating reproductive capacity

Understanding the genetic regulation of organ structure is a fundamental problem in developmental biology. Here, we use egg-producing structures of insect ovaries, called ovarioles, to deduce systems-level gene regulatory relationships from quantitative functional genetic analysis. We previously showed that Hippo signalling, a conserved regulator of animal organ size, regulates ovariole number in Drosophila melanogaster. To comprehensively determine how Hippo signalling interacts with other pathways in this regulation, we screened all known signalling pathway genes, and identified Hpo-dependent and Hpo-independent signalling requirements. Network analysis of known protein-protein interactions among screen results identified independent gene regulatory sub-networks regulating one or both of ovariole number and egg laying. These sub-networks predict involvement of previously uncharacterised genes with higher accuracy than the original candidate screen. This shows that network analysis combining functional genetic and large-scale interaction data can predict function of novel genes regulating development.


Introduction
The final shape and size of an organ is critical to organismal function and viability. Defects in human organ morphology cause a multitude of pathologies, including cancers, organ hypertrophies and atrophies (e.g. Yang and Xu, 2011). It is thus critical to understand the regulatory mechanisms underlying the stereotypic shape and size of organs. To this end, assessing the genetic regulation of size is significantly facilitated by using quantifiable changes in organ size and shape.
The Drosophila melanogaster female reproductive system is a useful paradigm to study quantitative anatomical traits. In these organs, the effects of multiple genes and the environment combine to produce a quantitative phenotype: a species-specific average number of egg-producing ovarian tubes called ovarioles. Fruit fly ovaries can contain as few as one and as many as 50 ovarioles per ovary, depending on the species (Kambysellis and Heed, 1971;King, 1970;Markow et al., 2009;Sarikaya et al., 2019), with each ovariole capable of producing eggs. Ovariole number, therefore, may affect the reproductive fitness of Drosophila species by determining the potential of an adult female to produce eggs (Klepsatel et al., 2013b;R' kha et al., 1997). While ovariole number within a species can vary across temperatures (Azevedo et al., 1996), altitudinal and latitudinal clines (Capy et al., 1994;David and Bocquet, 1975), under constant environmental conditions ovariole number is highly stereotypic (Capy et al., 1993;Klepsatel et al., 2013a;R'Kha et al., 1991;R' kha et al., 1997). The reproducibility of ovariole number thus indicates a strong genetic component (Sarikaya et al., 2019). Genome wide association studies and quantitative trait locus mapping have demonstrated that the ovariole number is a highly polygenic trait (Bergland et al., 2008;Lobell et al., 2017;Orgogozo et al., 2006;Wayne et al., 2001;Wayne et al., 1997;Wayne and McIntyre, 2002). In contrast, functional genetic studies have identified only a small number of genes whose activity regulates ovariole number (discussed below). Thus, the complexity of the genetic regulation of this important trait remains largely unknown.
The determination of ovariole number in D. melanogaster occurs during late larval and pupal development (King et al., 1968). Each ovariole in the adult fly arises from a single primordial structure called a terminal filament (TF), which forms in the late third instar larval ovary  by convergent extension (Keller, 2006) of the terminal filament cells (TFCs) Sahut-Barnola et al., 1996). TFCs are first specified from an anterior population of somatic cells in the larval ovary by the expression of transcription factors including Bric-à -brac 1/2 (bric-àbrac 1/2; bab1/2) and Engrailed (engrailed; en) Sahut-Barnola et al., 1995). Initially a loosely arranged group in the anterior of the larval ovary, TFCs undergo morphogenetic movements to give rise to the ordered columns of cells that are TFs. Cell intercalation during convergent extension is dependent on the actin regulators Cofilin (twinstar; tsr) and the large Maf factor Traffic Jam (traffic jam; tj), and on E-cadherin dependent adhesion (Chen et al., 2001;Godt and Laski, 1995). Regulation of ovariole number is thus largely dependent on the specification of the TFCs and their rearrangement into TFs (Sarikaya and Extavour, 2015).
We previously showed that the regulation of both TFC and TF number is dependent on the Hippo signalling pathway (Sarikaya and Extavour, 2015), a pan-metazoan regulator of organ and tissue size (Hilman and Gat, 2011;Sebé-Pedró s et al., 2012). At the core of the Hippo kinase cascade are two protein kinases, Hippo (hippo; hpo) and Warts (warts; wts), which prevent the nuclear localisation of the transcriptional co-activator Yorkie (yorkie; yki). Yki and the transcription factor Scalloped (scalloped; sd) together initiate the transcription of multiple target genes, including those that promote cell proliferation and survival. In the D. melanogaster larval ovary, loss of Hpo in the somatic cells causes an increase in nuclear Yki, leading to an increase in TFCs, TFs, ovariole number and egg laying in adults (Sarikaya and Extavour, 2015).
Production of fertile eggs from a stereotypic number of ovarioles requires a spatially and temporally coordinated interplay of signalling between the somatic and germ line cells of the ovary. Thus, signalling amongst somatic and germ line cells in the larval ovary is crucial to all stages of ovarian development (Ables and Drummond-Barbosa, 2017;Gilboa, 2015;Green et al., 2011;LaFever and Drummond-Barbosa, 2005;LaFever et al., 2010;Sarikaya and Extavour, 2015). For instance, disruptions in insulin or Tor signalling affect both somatic and germ line cell proliferation (Gancz and Gilboa, 2013;Green and Extavour, 2012;Hsu and Drummond-Barbosa, 2009;LaFever and Drummond-Barbosa, 2005;LaFever et al., 2010;Sarikaya et al., 2012). Similarly, ecdysone pulses from the prothoracic gland regulate the timely differentiation of the primordial germ cells (PGCs) and the somatic TFCs (Gancz et al., 2011;Hodin and Riddiford, 1998;Hodin and Riddiford, 2000b). Both Hpo and ecdysone signalling also control the proportion of germ line to somatic cells by differentially regulating proliferation of both cell types (Gancz et al., 2011;Sarikaya and Extavour, 2015).
Although it is clear that genes function together in regulatory networks (Gonzalez and Kann, 2012), determining how the few genes functionally verified as required for ovariole development and function work together to coordinate ovariole number and ovarian function more generally is a challenge, because most genes or pathways have been considered individually. An alternative approach that is less often applied to animal developmental genetics, is a systems biology representation of complex biological systems as networks (Barabási and Albert, 1999;Watts and Strogatz, 1998). Protein-protein interaction networks (PINs) are such an example (Albert and Barabási, 2002). The availability of high-throughput molecular biology datasets from, for example, yeast twohybrid, protein CHiP and microarrays has allowed for the emergence of large scale interaction networks representing both functional and physical molecular interactions ( Barabási and Oltvai, 2004;Berger et al., 2007;Giot et al., 2003;Gonzalez and Kann, 2012).
With ample evidence that signalling in the ovary can affect ovarian development, but few genes functionally verified to date, we aimed to identify novel regulators of ovariole development by functionally testing all known members of all characterized D. melanogaster signalling pathways. We used tissue-specific RNAi to systematically knock down 463 genes in the larval ovary and looked for modifiers of the hpo loss of function egg laying and ovariole number phenotypes. To analyse the results of this phenotypic analysis, we used topology-driven network analysis to identify genetic networks regulating these phenotypes, thus generating hypotheses about the relationships between these networks. With this systems biology approach, we identify not only signalling pathway genes, but also previously untested genes that affect these reproductive traits. Functional testing showed that these novel genes affect ovariole number and/or egg laying, providing us with a novel in silico method to identify target genes that affect ovarian development and function. We use these findings to propose putative developmental regulatory networks underlying one or both of ovariole formation and egg laying.

Results
An RNAi modifier screen for signalling pathway involvement in ovariole number To systematically ascertain the function of signalling pathway genes and their interactions with Hippo signalling in the development of the D. melanogaster ovary, we first curated a list of all known and predicted signalling genes (Gramates et al., 2017;Kanehisa et al., 2010;Mbodj et al., 2013). We identified 475 genes belonging to the 14 developmental signalling pathways characterised in D. melanogaster ( Table 1; Supplementary file 1), and obtained UAS: RNAi lines for 463 of these genes from the Vienna Drosophila RNAi centre (VDRC) or the TRiP collections at the Bloomington Drosophila Stock centre (BDSC) (all D. melanogaster genetic lines used are listed in Materials and methods).
We previously showed that reducing the levels of hpo in the somatic cells of the larval ovary using traffic jam Gal4 (tj:Gal4) driving hpo [RNAi] increased both ovariole number and egg laying of adult female flies (Sarikaya and Extavour, 2015). To identify genes that modify these phenotypes, we used tj:Gal4 to drive simultaneous hpo [RNAi] and RNAi against a signalling candidate gene, and quantified the phenotypic change (Figure 1a-d). We observed that on driving two copies of hpo [RNAi] using tj:Gal4, we obtained a further increase in both egg laying and ovariole number (Figure 1e). This indicates that ovaries have further potential to increase ovariole number and egg laying beyond the increase induced by tj:Gal4 driving one copy of hpo [RNAi], and that tj:Gal4 can Table 1. Number of candidate genes tested in each signalling pathway. Candidate genes are grouped by their reported roles in one or more signalling pathways based on published literature. Genes in this list are not necessarily unique to a single pathway, but rather may function in more than one signalling pathway. The list of specific genes per pathway that were included in the screen for functional analysis ( Figure 1) is found in the Supplementary file 1.  Counted number of ovarioles  Table 2). (e) Total eggs laid by three female flies over five days (left panel) and ovariole number (right panel) of Oregon R (top row), tj:Gal4 driving one copy of UAS:hpo [RNAi] (middle row), and tj:Gal4 driving two copies of UAS:hpo [RNAi] (bottom row), showing that, the previously reported tj:Gal4 >hpo [RNAi] ovariole number and egg laying phenotypes (Sarikaya and Extavour, 2015) can be modified by further UAS:RNAi-mediated gene knockdown. Distribution of egg laying and ovariole number of controls in each screen batch is illustrated in Figure 1  drive the expression of two RNAi constructs, indicating that our screen could identify both enhancers and suppressors of the tj:Gal4 >hpo [RNAi] phenotype.
We proceeded to identify modifiers of the tj:Gal4 >hpo [RNAi] phenotype by crossing males of each of the 463 candidate genes RNAis individually with tj:Gal4 >hpo [RNAi] females, and performing three phenotypic screens on the offspring. In the first screen (Figure 1a), we measured egg laying of three F1 female offspring (tj:Gal4 >hpo [RNAi], signalling candidate [RNAi]) over five days. To address batch variation (Figure 1-figure supplement 1), we standardised egg laying measurements by calculating the Z scores (Z gene = number of standard deviations from the mean) for each candidate line relative to its batch controls. 190 genes had an egg laying |Z gene | below 1. Previous studies have shown that the egg laying of newly eclosed adult mated females correlates with ovariole number during the first five days (Klepsatel et al., 2013b). We therefore eliminated these 190 genes from subsequent screening, because the change in egg laying was so modest that we considered these candidates were unlikely to show changes in ovariole number when compared to controls.
In the second screen (Figure 1b), we measured egg laying in a wild-type background (tj >signalling candidate [RNAi]) for the 273 remaining candidate genes. For the third screen (Figure 1c), we quantified the ovariole number of tj:Gal4 >hpo [RNAi], signalling candidate[RNAi] F1 adult females for the same 273 candidate genes. To choose candidates from the second and third screens for further study, we wished to account for the fact that the two screens had different effective numbers of data points. This was because egg laying data were obtained from individual vials of three females over five days, while ovariole numbers were obtained from 20 ovaries from ten females (see Materials and methods). We therefore selected the 67 genes with a |Z gene | above two for ovariole number (Figure 1c and d; Table 2), and the 49 genes with a more conservative |Z gene | above five for egg laying (Figure 1a, b and d; Table 2), for a total of 116 positive candidates for subsequent analyses.

Ovariole number is weakly correlated with egg laying
Standardisation of the results from the three screens using Z scores allowed us to compare the effects of individual genes on one or both of egg laying and ovariole number. We performed a pairwise comparison of the Z gene values for all combinations of screens, and considered genes with |Z gene | values that were above the thresholds set for the phenotype in each screen (above two for ovariole number, above five for egg laying; green dots in Figure 2a-c). Across all three screens, loss of function of our positive candidates yielded reductions in ovariole number and egg laying more commonly than increases (Figure 2a-c). Comparing the |Z gene | values of egg laying and ovariole number of tj:Gal4 >hpo [RNAi], signalling candidate [RNAi] adult females revealed that genes that caused a change in egg laying did not always similarly affect ovariole number, and vice versa ( Figure 2a). We therefore hypothesise that egg laying and ovariole number may be regulated by Table 2. Results of the three functional genetic screens. Number of genes tested in each screen and cumulative results. 'Negative effect' corresponds to a reduction in eggs laid or number of ovarioles below the Z score (Z gene ) threshold for each phenotype. 'Positive effect' indicates an increase above the set Z gene thresholds. Z gene thresholds for each category in each screen are indicated in brackets. The primary filter of |Z gene | < 1 was applied only to the hpo [RNAi] Egg Laying screen shown in Figure 1a. The list of specific genes that exceeded our chosen Z gene thresholds for each scored phenotype ( Figure 1) and were therefore considered to have a positive or negative effect on the phenotype, is found in the Supplementary file 1. The 12 genes for which RNAi stocks were unavailable at the time of testing are listed in Table 3.  genetically separable mechanisms. This hypothesis notwithstanding, we observed a weak but statistically significant correlation between egg laying and ovariole number (p=1e10 À5 ; Figure 2d), and this correlation was most significant in adult females that had a drastic reduction in both phenotypes ( Figure 2a).

No single signalling pathway dominates regulation of ovariole number or egg laying
We found that at least some genes from all tested signalling pathways could affect both egg laying and ovariole number (Figure 3). To determine if some pathway(s) appeared to play a more important role than others in these processes, we asked whether any of our screens were enriched for genes from a specific signalling pathway. To measure enrichment, we compared the distribution of individual pathway genes among the positive candidates in each screen, to a randomly sampled null distribution of pathway genes among a group of the same number of genes randomly selected from our curated list of 463 signalling genes ( Figure 3a). Involvement of a pathway in the regulation of a phenotype would be reflected in a difference between the representation of pathway genes in an experimentally derived list and a randomly selected group of signalling genes. We found that rather than only one or a few pathways showing functional evidence of regulating ovariole number or egg laying, nearly all pathways affected both phenotypes ( Figure 3a). We further tested this result by calculating the hypergeometric p-value for the enrichment of each signalling pathway, in each of the three groups of genes. Consistent with the results of the random sampling approach (Figure 3a), we found that most pathway members were not significantly enriched for egg laying or ovariole number phenotypes ( Figure 3b). The absence of significant enrichment of any specific pathway is not simply attributable to the pool of genes that were screened, because our experimental manipulations of ovariole number and egg laying did cause a change in the distribution of signalling pathway members ( Figure 3-figure supplement 1). Instead, both phenotypes appeared to be regulated by members of most or all signalling pathways ( Figure 3). The only two exceptions to this trend were a greater than twofold enrichment of (1) genes from the Notch signalling pathway in the regulation of ovariole number (p-value<0.05, pink bar in Figure 3a,b), and (2) members of the Hedgehog (Hh) signalling pathway in the regulation of Hippo-dependent egg laying (p-value<0.05, brown bar in Figure 3a and b; Figure 3-figure supplement 2). In summation, our analyses of the enrichment of signalling pathways within the different screens indicated that both ovariole number and egg laying are regulated by genes from nearly all described animal signalling pathways (Figure 3a), rather than being dominated by any single pathway.
Comparing the results of the Egg Laying screens performed in a wild type background (Figure 1b) or in a hpo [RNAi] background (Figure 1a), revealed that most of the genes that met a threshold of |Z gene | > 5 in one screen, did not meet that threshold in the other screen ( Figure 2c). This result suggests the existence of both Hippo-dependent and Hippo-independent mechanisms of regulation of egg laying. The interpretations of separable Hippo-dependent and -independent regulation of egg laying, and of the separable regulation of ovariole number and egg laying, were supported by the results of the network analysis described in the following section.
Centrality of genes in the ovarian protein-protein interaction networks can predict the likelihood of loss of function phenotypic effects The finding that these reproductive traits were regulated by the genes of all signalling pathways led us to consider the broader topology of putative gene regulatory networks in the analysis of our data. Previously characterised genes in the ovary are often pleiotropic and can regulate both ovariole number and egg laying (Gilboa, 2015;Sarikaya and Extavour, 2015). As with proteins in a linear pathway, proteins in a protein-protein interaction network (PIN) are more likely to function in conjunction with genes that are connected to them within the network (e.g. Ideker and Sharan, 2008;Jeong et al., 2001). Centrality is one measure of the connectedness of a gene in the PIN and can be used to identify the most important functional centres within a protein network (Hahn and Kern, 2005;Ma'ayan, 2011). Most centrality measures use path length, which is a measure of the number of other proteins required to link any two proteins in the network. Here, we used four commonly used metrics to quantify gene centrality, each measuring slightly different properties (Jalili et al., 2016;Koschützki and Schreiber, 2008).
(1) Degree centrality is proportional to the number of proteins that a given protein directly interacts with.
(2) Betweenness centrality measures the number of shortest paths amongst all the shortest paths between all pairs of proteins that require passing through a particular protein.
(3) Closeness centrality measures the average shortest path that connects a given protein to all other proteins in the network. (4) Eigenvector centrality is a measure of the closeness of a given protein to other highly connected proteins within the network. We hypothesised that if the candidate genes we identified in our screen as playing roles in ovarian function worked together as a PIN, then the degree of centrality of a gene might be an indicator of function. To test this hypothesis, we first compiled a PIN consisting of all described interactions between D. melanogaster proteins, from the combination of publicly available protein-protein interaction (PPI) studies in the DroID database (see Materials and methods). We then calculated the four centrality measures described above for all genes within the D. melanogaster PIN (Supplementary file 1). We rank ordered only the genes tested in each screen by their score for each centrality measure, and asked whether their rank order correlated with the results of the screen, plotting these results as a receiver operating characteristic (ROC) curve. Positive correlations between centrality (a continuous variable) and phenotype (a binary variable: above or below the |Z gene | threshold) are reflected in an area under the curve (AUC) of more than 0.5. We found that the higher the centrality score, the greater the likelihood that a gene had |Z gene | values above our threshold for effects on ovariole number and egg laying (Figure 4a; Figure 4-source data 1). This supports the premise that the positive candidates identified in our screen function together as a network in the regulation of either ovariole number or egg laying. Interestingly, while the centrality of genes did predict whether a gene would affect our phenotypes of interest, it could only weakly predict the strength of that effect (p-value<0.05 in Figure 4-figure supplement 1).
Genes regulating egg laying and ovariole number regulation form nonrandom gene interaction networks The centrality analyses above suggested that the genes implicated in ovariole number and egg laying displayed characteristics of a functional network. PINs can often be further sorted into a collection of sub-networks. A sub-network is a smaller selection of proteins from the PIN. Examples and c, bar graphs on the top and right sides of each panel show the distribution of genes in each axis of the adjacent scatter plots. Green dots = genes that meet the Z gene threshold for the indicated phenotype. Grey dots = genes that do not meet the Z gene threshold for the indicated phenotype. Dark grey dotted lines = thresholds for each phenotype: |Z gene | > 5 for Egg Laying and |Z gene | > 2 for Ovariole Number. In a and c, the white vertical bar removes all genes in the tj >hpo [RNAi], candidate [RNAi] with a |Z gene | < 1 for egg laying. These genes were not measured in the other two conditions and are therefore not represented in the scatter plots.  Figure 3. Enrichment of genes of individual signalling pathways among the experimentally obtained positive candidates of each screen. (a) Enrichment/ depletion analysis to identify over-or under-represented members of individual signalling pathways among positive candidates of each screen. Positive Z scores represent an enrichment, and negative Z scores represent depletion, of genes of a pathway among those genes that experimentally affected the phenotype enrichment and depletion are defined relative to a null distribution of the expected number of members of a signalling pathway among Figure 3 continued on next page of such sub-networks could be proteins within the same subcellular organelle (Foster et al., 2006) or genes that are expressed at the same time (Spellman et al., 1998), thus making them likely to function together (Srinivasan et al., 2007). A putative module is a sub-network that can perform regulatory functions as a unit, independent of other sub-networks, and has key measurable features (Barabási and Oltvai, 2004;Hartwell et al., 1999;Ravasz et al., 2002;Yook et al., 2004). Genes and interactions between genes are not mutually exclusive to such putative modules and can be shared between putative modules. We therefore asked if our sub-networks, consisting of genes that showed similar mutant phenotypes, might display features of modularity. To determine whether genes that were implicated in regulation of ovariole number and egg laying interacted with each other in specific groups more than would be expected by chance, we created four lists of genes, called 'seed' lists, based on their individual phenotypic effects based on our screen results: (1) the core seed list, including genes positive in all three screens ( Figure 4b); (2) the egg laying seed list, including genes positive in the wild type background egg laying screen ( Figure 4c). Interestingly, the core seed list, comprising genes that affected all three measured phenotypes, only consisted of genes that caused a reduction in both ovariole number and egg laying ( Figure 4b).
We then asked whether the members of these four seed lists were more connected than would be expected by chance. In other words, we formally tested them for modularity as defined above. Meeting our criteria for modularity would suggest that the genes in these phenotypically separated seed lists might operate together as putative functional modules within the Drosophila PIN. We performed our modularity test using four commonly measured network metrics: (1) Largest Connected Component (LCC) (the number of proteins or nodes connected together by at least one interaction), (2) network density (the relative number of edges as compared to the theoretical maximum), (3) total number of edges, and (4) average shortest path (average of the minimum distances connecting any two proteins). We considered a sub-network to show modular features if they showed most of the following properties: higher LCC, higher network density, more edges, and shorter average shortest path length, when compared to a similarly sized, randomly sampled selection of genes from the PIN.
To determine whether these criteria would correctly identify signalling genes, which are known to function together as a module, we measured these four parameters in the original set of genes (all signalling genes) used in this study (Table 1). We found that the signalling genes display features of modularity when compared both to a randomly selected set of genes, as well as to a degree-controlled list of genes ( Figure 4-figure supplement 2a). We then used this approach to test the modularity of the four phenotypic sub-networks, when compared to two different 'control sub-networks' consisting of a group of the same number of genes as contained in the sub-network, one chosen randomly from among the candidate genes from our initial screen list (Table 1), and the second chosen from a degree-controlled list of genes selected from the entire PIN (see Materials and methods: Building degree-controlled randomised networks). We found that the four predicted phenotypic sub-networks showed higher LCC, higher network density, and more edges (Figure 4-figure supplement 2b), compared to both 'control sub-networks'. This result suggests that these sub-networks display many features of modularity (although their average shortest path length is higher than Figure 3 continued a group containing the same number of randomly selected signalling genes. (b) Fold enrichment and hypergeometric p-value calculation to identify over-or under-representation of the genes of a pathway in each screen. Significantly enriched pathways (coloured bars: brown = Hedgehog; pink = Notch) are defined by having a hypergeometric p-value less than 0.05. Enrichment/depletion analysis of the 273 signalling pathway genes above the threshold |Z gene | > 1 (Figure 1a) before screening is illustrated in Figure 3-figure supplement 1. Figure 3-figure supplement 2 compares the Z gene of egg laying of adult females of tj >hpo [RNAi],candidate [RNAi] plotted against Z gene of egg laying of tj >candidate [RNAi] adult females displayed by pathway. The online version of this article includes the following figure supplement(s) for figure 3:   Genes whose |Z gene | value was above the threshold (green dots; Table 2) in all three screens were assigned to the Core seed list. (c) Genes whose |Z gene | value was above the threshold (green dots; Table 2) in each screen were assigned to the corresponding seed list. Figure 4-source data 1 Tabulates the AUC values for the ROC curves for each centrality measure for the three screens (a).    controls, rather than lower) and may function as putative modules within the PIN to regulate one or both of ovariole number or egg laying.
Based on published molecular interactions, in addition to the four criteria described above, further evidence for putative functional modules of genes can also be obtained by applying algorithms that use either the shortest path method (Bromberg et al., 2008) or the Steiner Tree approach (Huang and Fraenkel, 2009). Such methods identify and predict functional connections between the seed proteins, as well as additional nodes (proteins or genes) that have not been experimentally tested within the given parameters, but are known to interact with the seed genes in the PIN (Albert and Albert, 2004;Yu et al., 2006). This process can provide evidence for or against the existence of a predicted functional module, and subsequent experimental testing of this predicted module can confirm its functionality. Given its recent success in predicting gene modules, we applied the previously published Seed Connector Algorithm (SCA), a member of the Steiner Tree algorithm family Wang and Loscalzo, 2018), to the groups of genes that had similar phenotypic effects in our screens (seed genes; Figure 4b and c). The SCA connects seed genes and previously untested novel genes (connectors) to each other using a known PIN, producing the largest possible connected putative module given the data. Using the PIN and the aforementioned four lists of seed genes, we applied a custom python implementation of the SCA (Materials and methods: 04_Seed-Connector.ipynb) to build and extract the largest possible (given our PIN) connected putative modules that regulate egg laying and ovariole number.
This SCA method yielded four putative modules, one for each seed list, which we initially referred to as the Core Module (Figure 5b Each of these four putative modules contained seed genes, which had been functionally evaluated in our screens (green and red circles in Figure 5), as well as connector genes, which were genes newly predicted as regulators of these phenotypes (green and red triangles in Figure 5). Of the four putative modules generated by the SCA, we found that the Core module had higher centrality measures than the other three putative modules (Figure 5-figure supplement 4). We interpret this to mean that the genes regulating these 'Core phenotypes' are more strongly connected to each other.
We found, however, that these four groups of genes produced by the SCA did not have increased LCC values, increased network density, more edges nor decreased average shortest path (Figure 5-figure supplement 5), compared to our 'control sub-networks'. This result shows that the SCA in this instance does not provide evidence for putative functional modules from the four phenotypic sub-networks in our system, above and beyond the evidence provided by the application of the four network metrics discussed above. To be conservative in our description of these results, we therefore henceforth refer to these four groups of genes united by phenotype and with strong predicted interactions, as sub-networks rather than as modules. We noted that each of these four sub-networks contains genes from most, if not all, known signalling pathways, rather than only genes from a single pathway (Figure 5-figure supplement 6).
Low edge densities between sub-networks suggest genetically separable mechanisms of ovariole number and egg laying Our network analysis identified four highly connected sub-networks of genes that regulate two distinct developmental processes, together with or independently of Hippo signalling activity: ovariole number determination, which occurs primarily during larval development, and egg laying, which takes place in adult life ( Figure 5). We wished to assess the degree to which there might be any shared genetic components between these four phenotypic sub-networks, and whether the addition of connector genes by the SCA had any impact on this. To understand potential interactions between the phenotypic sub-networks in the regulation of both ovariole number and egg laying, we constructed a composite network of all genes in each of the four phenotypic sub-networks , which we refer to as the 'meta network' (Figure 6a). We then grouped the genes of the meta network into seven bins based on their phenotypic effects as measured in the three screens, resulting in sub-groups I through VII shown in Figure 6a. To ask whether the genes in these phenotypic groupings showed any notable interaction patterns, we compared the connectivity between genes assigned to the same phenotypic group, to the connectivity of a group of the same size

Core Module
Zgene >t t>Z gene >-t Z gene <-t Seed Connector b Figure 5. Representation of Seed Connector Algorithm (SCA) and output. (a) Schematic representation of the SCA. The algorithm initialises by creating a sub-network of seed genes from the PIN and computes the Largest Connected Component (LCC) and coverage (number of genes from the seed set in the LCC). At each iteration, genes in the direct neighbourhood of the LCC (distance = 1) are added one at a time to the seed set, and the coverage and LCC are recomputed. This process is repeated for each gene in the direct neighbourhood, each time restarting from the seed set of the preceding Figure 5 continued on next page randomly assembled from the genes of the meta network ( Figure 6-figure supplement 1). As a measure of connectivity, we used an edge density map, which reflects the number of interactions between the genes within a group and between groups. We quantified the deviation between the edge density of each of groups I through VII, and their corresponding randomly assigned groups of the same size, by computing their respective Z score. When we included only seed genes in each of groups I through VII, we found that the edge density values of these groups were somewhat lower (Z score <À1.5) than those of the randomised groups ( Figure 6-figure supplement 1a). The single exception to this was group IV, whose members shared more edges with each other than did the members of its randomised comparison group of the same size ( Figure 6-figure supplement 1a).
In other words, these groups of phenotypically binned seed genes were not notably more connected to each other then we would expect by chance. In contrast, expanding each of the seven phenotypic sub-groups to include both seed genes and the connector genes predicted by the SCA changed the edge densities of these groups relative to their randomised control groups. Specifically, edge densities were much lower between groups I, II and III (Z score < À3), and much higher within group IV (Z score >3) (Figure 6-figure supplement 1). This shows that applying the SCA to these phenotypically binned groups increased the non-random differences in connectivity between them that were already present within the seed genes ( Figure 6-figure supplement 1a), thus clarifying the internal structure of the meta network.
We then asked if these seven sub-groups were as connected to each other, as were the genes within each of the sub-groups, again using the edge density assessment as described above (Figure 6b). This analysis yielded three principal findings. First, edge densities between the three groups corresponding to the three scored phenotypes (I, II and III in Figure 6a) were very low ( Figure 6b).This implies that the genes in each of the groups that regulate only one phenotype (I, II and III in Figure 6a) share more interactions with themselves than with genes in the other two groups, suggesting that each of these initially scored phenotype can be largely regulated by an independent, non-interacting set of genes. Second, the core group (IV in Figure 6a) displayed a higher edge density with the other three groups (I, II and III in Figure 6a) than any of those three groups did with each other (Figure 6b). Consistent with the definition of core genes as regulating all three scored reproductive phenotypes, this result suggests that the core genes, in contrast to those from the other three groups, may share substantial functional interactions with genes of the other groups. Finally, three small additional groups emerged from this analysis (V, VI and VII in Figure 6a), iteration. If any gene outside the seed set but in the direct neighbourhood is found to maximise coverage while minimising the LCC, it is added to the seed set as a connector gene. Black arrows indicate the path taken by the algorithm for which the criteria of maximal coverage and minimal LCC are met; such a path would be used to proceed to the subsequent iteration. Grey arrows indicate paths that fail to meet these criteria; such paths would be disregarded. The iteration repeats until the coverage cannot be increased; in this schematic example, this state is achieved in iteration 3. (b) The Core sub-network generated by the SCA based on the results of the genetic screens (Figure 1a-c). The size and colour of the shapes indicate the relative Z gene score of ovariole number of adult tj >hpo [RNAi], candidate [RNAi] females. Circles indicate seed genes (functionally tested in the screen; Table 2; Supplementary file 1) while triangles are connector genes (novel predicted genes; Supplementary file 1, Figure           suggesting small networks of genes that might work together to regulate two of the three scored phenotypes. In sum, this meta network analysis supports the hypothesis of three potentially largely non-interacting genetic networks that regulate Hippo-dependent ovariole number, Hippo-dependent egg laying, and Hippo-independent egg laying, respectively. The presence of smaller sub-networks (V, VI and VII in Figure 6a) that interact with each other further supports the observation that the putative modules predicted by the SCA -which we refer to as sub-networks -could include genes that function within more than one such sub-network ( Figure 5-figure supplement 4). Moreover, each of these genetically separable sub-networks included genes in multiple signalling pathways ( Figure 6c). The meta network is represented as a Venn diagram, in which each grey dotted outline represents the screen in which a given gene was identified as affecting the scored phenotype. Within each sub-network, grey circles indicate seed genes, and blue circles indicate connector genes. Solid grey lines indicate interactions between genes in the meta network from the PIN. (b) Edge densities between the seven sub-networks of the meta network. (c) Relative enrichment of screened members of the 14 tested developmental signalling pathways within the seven sub-networks of the meta network. Figure 6-figure supplement 1 compares the edge density of the seven sub-networks of the meta network to the edge densities of a random assignment of the positive candidates in the screen to a seven similarly sized sub-networks. The online version of this article includes the following figure supplement(s) for figure 6: Figure supplement 1. Comparison of edge densities between the seven sub-networks of the meta network, and to a randomly assigned grouping of genes in the meta network.

Network analysis predicts novel genes involved in egg laying and ovariole number
The four predicted phenotypic sub-networks produced by the SCA approach included connector genes that were not included in our original screen, and thus had not been tested for possible effects on our phenotypes of interest (triangles in Figure 5b; Figure 5-figure supplements 1-3). Given that prior work in human disease models showed that predicted disease modules can correctly identify gene involvement in the relevant diseases (Chen et al., 2006;Gonzalez et al., 2007;Wang et al., 2017;Wang and Loscalzo, 2018), we asked whether our deployment of the SCA had likewise successfully predicted novel, functionally important genes in each sub-network. To do this, we used UAS:RNAi lines for each connector, driven by tj:Gal4 to measure the effects of knocking down each of the connector genes (triangles in Figure 5b and Of the ten predicted novel connectors within the Core sub-network, loss of function of several of these had significant effects on at least one of the three scored phenotypes. Five affected ovariole number, two affected Hpo-dependent egg laying and one affected Hpo-independent egg laying. However, only one of them significantly altered all three scored phenotypes (Figure 7a; Figure 7source data 1).
The predicted connector genes from two of the other three phenotypic sub-networks showed high positive prediction rates for novel genes within the sub-networks. RNAi against seven out of 18 of the hpo[RNAi] Egg Laying connectors, three out of 11 of the hpo[RNAi] Ovariole Number connectors, and none of the 11 Egg Laying connectors, significantly affected the sub-network phenotype (Figure 7-source data 1). Thus, although the Egg Laying connectors failed to impact this phenotype in our assay, 41.1% and 27.2% of the connectors from the other two sub-networks were correctly predicted (Figure 7b; Figure 7-source data 1).
In sum, taken across all sub-networks, this methodology correctly identified genes regulating at least one of the scored reproductive phenotypes, at significantly higher rates than those obtained in the original screen of 463 members of all known signalling pathways (Figure 7c; Figure 7-source data 2). By this measure, testing network-predicted novel genes derived from experimentally obtained data was even more successful than testing signalling pathways as a means of identifying novel genes that regulate ovariole number and egg laying.

Discussion
In this study, we have identified many novel genes that regulate one or both of egg laying and ovariole number. Though the development of the insect ovary has been studied for over 100 years, our understanding of the genetic mechanisms that regulate the development of the ovary is sparse. The female reproductive system and its ability to produce eggs are one of the key determinants for the survival of a species in an ecological niche. The genes we have uncovered here are possible targets for the regulation of the construction and function of the reproductive system in D. melanogaster, and potentially in other species of insects as well. Understanding the gene regulatory networks that regulate egg laying and ovariole development could provide a framework to understand the key regulatory steps during this process that may be modified over evolutionary time, to yield the wide diversity of ovariole numbers and fecundities displayed by extant insects. We suggest that, given our success in applying a network approach to the results of a traditional forward genetic screen, the field of developmental genetics should find it fruitful to apply network analyses to the interpretation of large scale transcriptomic and proteomic data.

Identification of regulatory sub-networks for ovariole development and egg laying
The D. melanogaster ovary is a commonly studied model for organogenesis (Chen et al., 2001;Godt and Laski, 1995;Lobell et al., 2017;Sarikaya and Extavour, 2015), stem cell maintenance (Gilboa, 2015) and interactions of development and ecology (Cohet and David, 1978;Hodin and Riddiford, 2000a;Klepsatel et al., 2013a;Sarikaya et al., 2019). Nevertheless, our understanding  . Positive prediction rates of the connector genes in each of the four sub-networks. (a) Proportion of Core sub-network connector genes with |Z gene | above the threshold in each of the three screens. The 'All phenotypes' category includes the genes with |Z gene | above the threshold in all three screens. (b) Proportion of tested connector genes in each of the three sub-networks with |Z gene | above the threshold within their corresponding screen.
(c) Proportion of all unique connector genes (dark grey bars) predicted by all four sub-networks compared to the proportion of signalling candidate genes (light grey bars) with |Z gene | above the respective threshold in any of the three phenotypic screens (Figure 1a, b and c). Positive connector and signalling candidates that were above the |Z gene | threshold in all three phenotypic screens (Figure 1a, b and c) are indicated in an 'All phenotypes' column. Statistical significance was computed using the binomial test, comparing the probability of a positive candidate amongst the connectors to the probability of a positive candidate amongst the signalling candidates (p-value is found below each bar). Figure 7-source data 1 tabulates the distribution of seed genes and connectors in each sub-network used in Figure 7a and b. Figure 7-source data 2 tabulates the unique connector and signalling genes above |Z gene | threshold for the three phenotypic measurements plotted in Figure 7c.The connector genes are listed in the Figure 7 continued on next page of the genetic mechanisms that regulate these processes remains fragmentary. In this paper, we have identified four distinct protein interaction sub-networks that regulate ovariole number and egg laying in the D. melanogaster ovary. These sub-networks consist of both novel and previously characterised genes that regulate either ovariole number or egg laying or both, thus enhancing our understanding of the genetic underpinnings of this reproductive system. Of the four sub-networks, the Core sub-network affects both ovariole number and egg laying. The Core sub-network contains numerous housekeeping genes, including regulators of transcription, translation and cell division such as polo (Llamazares et al., 1991), cyclin K (Edwards et al., 1998), nucleosome assembly protein 1 (Ito et al., 1996) and eukaryotic translation release factor 1 (Chao et al., 2003). While polo and eukaryotic translation release factor one are members of signalling pathways, cyclin K and nucleosome assembly protein one are genes predicted by the SCA. Given that the Core sub-network largely consists of genes whose loss of function decreases both of these parameters, we hypothesise that these are essential genes for the basic structure and function of the ovaries. Essential genes are more interconnected in a PIN with higher centrality measures (Jeong et al., 2001; but see Yu et al., 2008) and interestingly, we find that the genes in the Core sub-network also have higher connectivity than those in the other three sub-networks ( In addition to genes that regulate basic cellular processes, the Core sub-network is enriched for the central components of the Hh signalling cascade, namely patched (ptc), smoothened (smo) and costa (cos) (Lee et al., 2016). However, we find that the loss of Hh ligand, which is expressed in the TF cells in the developing larval ovary (Lai et al., 2017), does not significantly affect either ovariole number or egg laying. Though surprising, ligand-independent activation of Hedgehog signalling has been observed before. For example, in the Drosophila eye, loss of either ptc or cos in clones leads to non-cell autonomous proliferation in wild type cells, as well as growth disadvantages in the mutant tissue (Christiansen et al., 2012). In another example, sufficient intracellular smo levels can also activate downstream transcription of Hh pathway targets, showing that Hh itself is not always required to activate the cascade (Jiang et al., 2018).

Development of the larval ovary
The hpo[RNAi] Ovariole Number sub-network is composed of genes that affect the Hippo signalling activity-dependent determination of ovariole number during development. Establishment of ovariole number occurs largely during the third instar stage of larval development in D. melanogaster Hodin and Riddiford, 1998;King et al., 1968;Sahut-Barnola et al., 1996). During this period, the TFCs are specified in the anterior of the ovary and undergo rearrangement into stacks of cells called TFs, each of which gives rise to an ovariole Sahut-Barnola et al., 1995). TF specification requires the expression of engrailed (En) (Bolívar et al., 2006) and the transcription factors Bab1 and Bab2, encoded by the bric-à-brac locus (Couderc et al., 2002;Godt et al., 1993). A third transcription factor, Lmx1a, was recently found to be necessary for the specification of the TFCs (Allbee et al., 2018). Our hpo [RNAi] Ovariole Number sub-network identifies numerous additional novel transcription factors including bunched (bun) and retinoblastoma-family protein (rbf), which we hypothesise could also be involved in the specification of ovariole number. bun and rbf have been implicated in the migration (Dobens et al., 2005) and endoreplication (Cayirlioglu et al., 2003) of the follicle cells during oogenesis, but have not, to our knowledge, been previously identified as playing a role in the context of larval ovary development.

Figure 7 continued
Supplementary file 1 and can be identified under the column header [SubNetworkName]_Connector. The raw data for egg laying and ovariole number for each of the connector genes can be found within the Supplementary file 1. The online version of this article includes the following source data for figure 7: Source data 1. Distribution of seed genes and connectors in each sub-network. Source data 2. Number of unique connector and signalling genes above |Z gene | threshold for the three phenotypes measured in this screen (Figure 1a, b and c).
The TFCs specified in the larval ovary undergo a process of convergent extension to form TFs. This process of convergent extension requires cell intercalation, and the actin depolymerising factor Cofilin, encoded by the gene twinstar, is essential to this process (Chen et al., 2001). During intercalation, the cells also dynamically modify their actin cytoskeleton and their expression of E-cadherin . Our hpo [RNAi] Ovariole Number sub-network further identifies Rho1 (Barrett et al., 1997) and Rho kinase (Rok) (Mizuno et al., 1999) as necessary for correct ovariole number. During the extension of the D. melanogaster embryonic germ band, a commonly studied model of convergent extension, the localised activation of the actin-myosin network facilitated by Rho1 and Rok is necessary for cell intercalation (Kasza et al., 2014). Given the known roles of Rho1 and Rok as regulators of the actin cytoskeleton (Ridley, 2006), we propose that TF assembly in the ovary requires both these proteins for correct cell intercalation. A third actin cytoskeleton regulator, misshapen (msn), was also identified by our hpo [RNAi] Ovariole Number sub-network. msn encodes a MAP kinase previously shown to regulate the polarisation of the actin cytoskeleton during oogenesis (Lewellyn et al., 2013), but has not, to our knowledge, been studied to date in the context of larval ovarian development.
We propose that the polarity of the somatic cells in the ovary is also necessary for correct larval ovary development, given the presence of the lateral membrane proteins discs large 1 (dlg1) and prickle (pk) in the ovariole sub-network. During the maturation of the TFs during larval development, the TFCs undergo significant cell shape changes, coincident with localised expression of beta-catenin and actin to the lateral edges of the TFCs . Restriction of the E-cadherin domain in epithelia requires establishment of the basolateral domain (Harris and Peifer, 2004) and we propose that testing a similar requirement for dlg1 and pk in the larval ovary would be a fruitful avenue for future studies.

Network analysis as a tool in developmental biology
Using a systems biology approach to analyse RNAi screening data has proven fruitful, providing us with new insights into the development and function of the D. melanogaster ovary by identifying novel and previously understudied genes that regulate this process. Systematic analysis of the function of single genes in development has been a historical convention and has provided valuable and precise genetic interaction information (Jansen et al., 2002;von Mering et al., 2002). With the advent of genome-wide analysis, however, we can use data from a larger number of genes to predict the identity of additional functionally significant genes with relative ease (Yu et al., 2006). We note that the novel gene prediction rate within each individual sub-network ranged from as high as 41.1% from the hpo[RNAi] Ovariole Number sub-network to as low as 0% from the Egg Laying subnetwork (Figure 7b; Figure 7-source data 1). We suggest that this may be due to multiple factors. Firstly, the possible incompleteness of the PIN is expected to lead to some areas of the network being sparse or non-existent (von Mering et al., 2002). If the sub-network of interest happens to fall in such regions of the PIN, prediction algorithms will fail. Secondly, the initial restriction of tested genes to signalling pathway members might have provided a seed list too sparse to usefully predict functional connectors. Finally, it could be the case that 'Egg Laying' is such a complex phenotype that its gene regulation cannot be adequately captured within a highly connected network of the type suited for identification by the analyses we have used here.
Ovariole number in D. melanogaster is the outcome of a discrete developmental process with a clear beginning and end, comprising a specific series of cellular behaviours that take place in the confines of one organ Hodin and Riddiford, 2000a;Sahut-Barnola et al., 1996). Once established during larval life, ovariole number in Drosophila remains unaltered through to and during adulthood, even if oogenesis within those ovarioles suffers congenital or age-related defects (King, 1970). Because previous work suggested that ovariole number in Drosophila could have at least some predictive relation to egg laying (Cohet and David, 1978;Klepsatel et al., 2013b;Sarikaya and Extavour, 2015), we reasoned that scoring the latter phenotype in a primary screen (Figure 1a) could be an effective way to uncover ovariole number regulators (Figure 1c). While our results showed that this was true in many cases, it was also clear that these two traits can vary independently (Figure 2), highlighting the fact that ovariole number is not the only determinant of egg laying. Egg-laying dynamics, even during the limited five-day assay used in our study, are likely influenced not just by a single anatomical parameter such as ovariole number, but rather by many biological, biomechanical, hormonal and behavioural processes. Consequently, the sub-network we were able to extract from the results of this screen (Figure 1a and b) might be too coarse to extract novel genes that participate in the potentially complex gene interactions regulating egg laying. Furthermore, genes predicted within each of the sub-networks are unlikely to function exclusively within just one sub-network. This conclusion is supported by our observation that genes predicted to function in any of the sub-networks, also function in at least one of the four sub-networks at a higher rate than genes selected for screening by their presence in a signalling pathway (Figure 7c). We also observe that although substantial regions of the meta network do not share interactions with genes in the other sub-networks (Figure 6b), we do find smaller sub-networks where there is some overlap, further indicating pleiotropy of some genes in both egg laying and ovariole number regulation.
The predictive rates of the approach we have used here, although encouraging, are likely limited by the degree of noise in the high-throughput data used to generate the PIN (Li et al., 2010), the sparseness of the PIN, and the degree of misidentification of protein interactions (Zhang et al., 2017). Addressing one or more of these parameters could improve the outcomes of future network predictions from developmental genetics data. For example, the problem of sparseness, which is a paucity of high confidence detectable interactions relative to all biologically relevant interactions, has been addressed in other studies by using an 'Interolog PIN' (Matthews et al., 2001) in place of an organism-specific PIN. An Interolog PIN combines known interactions from multiple organisms and has been used successfully to identify, for example, gene modules relevant in squamous carcinoma, based on a starting dataset of microarray data on differentially expressed genes between cancer cells and the surrounding tissue (Wachi et al., 2005). Future studies applying such an Interolog PIN to the outcomes of genetic screens for developmental processes of interest could potentially overcome the problem of sparseness, as well as the biases towards proteins that are more heavily studied and thus better represented in organism-specific PINs.

Experimental model and subject details
Wild type and mutant lines of Drosophila melanogaster were obtained from publicly accessible stock centers and maintained as described in 'Fly Stocks' below. Genotypes and provenance are provided in the Key Resource Table. Candidate genes were randomly assigned to batches for screening (see the Supplementary file 1 for which genes were in each batch). F1 animals from the same cross were randomly assigned to experimental groups for phenotyping in all screens.

Method details Fly stocks
Flies were reared at 25˚C at 60% humidity with standard Drosophila food (Sarikaya et al., 2012) containing yeast and in uncrowded conditions as previously defined (Sarikaya and Extavour, 2015). RNAi lines were obtained from the TRiP RNAi collection at the Bloomington Drosophila Stock Centre (BDSC) and from the Vienna Drosophila Resource Centre (VDRC). See Key Resources Table for complete list of stocks used in this study. Oregon R was used as a wild-type strain. The genotype of the traffic jam:Gal4 line used in the screen was y w; P{w [+mW.hs]

Egg and ovariole number counts
Adult egg laying was quantified by crossing three virgin females of the desired genotype (see 'Screen design' below) with two males in a vial containing standard food and yeast granules (day one) and then transferring them into a fresh food vial without yeast granules for a 24 hr period. Eggs from vials were then counted by visual inspection of the surface of the food in the vial. Males and females were transferred to fresh food vials without yeast granules, every day thereafter until day six. All egg -laying measurements reported and analysed in the paper are the sum of the eggs laid by three adult female flies over the five days of this assay (days two through six without yeast granules). Data from any vial in which either a female or male died, during the course of the experiment, were not included in the analysis.
Ovariole number was quantified by mating ten virgin adult females with five virgin adult Oregon R males for three days post-eclosion in vials with yeast at 25˚C and 60% humidity. After this threeday mating period, all 20 adult ovaries from the mated females were dissected in 1X PBS with 0.1% Triton-X-100 and stained with 1 mg/ml Hoechst 33321 (1:10,000 of a 10 mg/ml stock solution). Ovarioles were separated from each other with No. 5 forceps (Fine Science Tools) and counted by counting the number of germaria under a ZEISS Stemi 305 compact stereo microscope with a NIGHTSEA stereo microscope UV Fluorescence adaptor.

Screen design
In the primary screen ( Figure 1a: hpo[RNAi] Egg Laying), 463 candidate genes (Supplementary file 1) were screened for the effect of an RNAi-induced loss of gene function in a hpo [RNAi] background on the number of eggs laid in the first five days of mating (see 'Egg and ovariole number counts' above) by adult females. These females were the F1 offspring of UAS:candidate gene RNAi males crossed to P{w [+mW.hs] (Figure 1a: hpo[RNAi] Egg Laying). All genes that yielded an egg laying count with a |Z gene | > 1 (see 'Gene selection based on Z score and batch standardisation' below) were selected to undergo two secondary screenings (n = 273, Table 2; Figure 1d). First, these genes were screened for effects on the egg laying of mated adult female offspring from a cross of UAS:candidate gene[RNAi] males and tj:Gal4 virgin adult females (Figure 1b: Egg Laying). Secondly, these genes were screened for effects on ovariole number in a hpo [RNAi] background. All 20 ovaries from ten adult female F1 offspring of a cross between UAS:candidate gene [RNAi]  Gene selection based on Z score and batch standardisation Candidate genes were screened in batches with an average size of 50 genes. For each batch, control flies were the female F1 offspring of Oregon R males crossed to P{w [+mW.hs]=GawB}NP1624;P{y [+t7.7] v[+t1.8]=TRiP.HMS00006}attP2 (tj:Gal4; UAS:hpo [RNAi]) virgin adult females. Because the control group in each batch had slightly different distributions of egg laying and ovariole number values (Figure 1-figure supplement 1), it was inappropriate to compare absolute mean values between genes that were scored in different batches. Instead, comparisons of the Z score of each candidate (Z gene ) to its batch control group was used as a discriminant. This approach standardises for batch effects and allows the comparison of all genotypes within and across the primary and secondary screens with a single metric (Z gene ).
Firstly, the mean and standard deviation of the eggs laid by the control genotype for a batch were calculated as m b and s b respectively. Then, using the number of eggs laid by adult females of a candidate gene RNAi (x gene ) of the same batch, the Z score for the egg laying count of that gene (Z gene ) was calculated as Z gene ¼ xgeneÀb sb . The same standardisation protocol was applied to both egg laying and ovariole number counts of every gene and its corresponding batch control.
Ovariole numbers were derived from counts of the number of ovarioles per ovary for 20 ovaries per candidate gene, and a threshold of |Z gene | > 2 was applied for ovariole number phenotype. Egg laying counts were derived from measurements of three females in a single vial per gene. We therefore chose to be more conservative in our Z score comparisons for the egg laying phenotype, than for ovariole number phenotype, and applied a stringent threshold of |Z gene | > 5 to select genes of interest. All genes with |Z gene | values above these thresholds are referred to throughout the study as 'positive candidates'. (See Ipython notebooks 02_Z_score_calculation.ipynb and 02.2_Z_score_calcu-lation_prediction.ipynb for code implementation and calculation of Z scores, and 06_Screen Analysis.ipynb for batch effects.)

Signalling pathway enrichment analysis
To study the enrichment of a particular signalling pathway in a group of candidate genes that had similar phenotypic effects revealed by the screen, custom scripts (see 07_Signaling_pathway_analysis.ipynb for code implementation) were generated to implement two different methods (Figure 3a  The first method is a numerical method that uses random sampling to calculate the null distribution of the number of members (M) of a signalling pathway (S) that would be expected at random in a set of genes of size (N). The script randomly sampled N genes from among the 463 tested D. melanogaster signalling genes 10,000 times, and counted the number of genes (M) that were members of the signalling pathway S. Positive candidates in each of the three screens were sorted by their presence in signalling pathways and counted. The Z score was then calculated by comparing the experimentally observed number of positive candidates in each signalling pathway against the randomly sampled null distribution.
The second method used the hypergeometric p-value to calculate the probability of M members of a signalling pathway being in a group of N genes, given a starting population of 463 tested D. melanogaster signalling genes, and the known attribution to a pathway S of each gene.

Protein-Protein Interaction Network (PIN) building
There is no standard complete Protein-Protein Interaction network (PIN) available for Drosophila melanogaster. However, there exist many smaller networks from different screens, as well as literature extractions. We therefore combined data from these sources and then created a PIN for use in the present study, as follows: Step 1 Several screens assessing protein-protein interactions have been centralised in a database called DroID: http://www.droidb.org. The version DroID_v2018_08 was used. All available datasets were first downloaded from that database using this link: http://www.droidb.org/Downloads.jsp. The description of all of these datasets can be found here: http://www.droidb.org/DBdescription.jsp.

Building degree-controlled randomised networks
We assessed the modularity of the networks by comparing the network metrics of each sub-network to a degree-controlled randomly sampled network. To generate this degree controlled random network, we applied a previously developed method (Guney et al., 2016). In short, nodes in the PPI are binned by degree with the minimum size of each bin being set at 100 nodes. Bins are constructed iteratively from the lowest degree to the highest degree in the network. To sample a set of nodes, the sub-network degree distribution is computed, using the bin cut-off, from the PPI. Then, nodes are randomly selected from each bin to match this degree distribution (see 05.2_Degree_-Controlled_Testing.ipynb for code implementation).
Assessing the utility of the Seed Connector Algorithm in building network modules Network modules were assessed using the previously published Seed Connector Algorithm (SCA) Wang and Loscalzo, 2018), implemented here in python (see 04_Seed-Connector.ipynb) and illustrated in Figure 5a. Creating a module using the SCA requires a list of seed genes and a PIN. From each of the three screens, we selected the genes whose |Z gene | value was above the threshold and created three seed lists, respectively (Figure 4c: Egg laying, hpo[RNAi] egg laying and hpo [RNAi] ovariole 'seed' list). A fourth list consisting of the intersection of the aforementioned seed lists was also collated and called the core 'seed' list ( Figure 4b). Genes were assigned in the core list if they passed the Z threshold in all three screens. The SCA was then executed on each of these seed lists using the PIN. Not all genes in the four seed lists were found in the PIN (specifically, CG12147 in the hpo[RNAi] Egg Laying seed list and CG6104 in the hpo [RNAi] Ovariole number seed list were absent from the PIN) and were therefore eliminated from further network analysis. The removal of these two genes accounts for the variation in the number of positive candidates in Table 2 and the number of seed genes in the module. Modules were obtained for each seed list (Figure 5b; Figure 5-figure supplements 1-3) consisting of the seed genes (circles in Figure 5b and Figure 5-figure supplements 1-3) and previously untested genes added by the SCA (squares in Figure 5b and Figure 5-figure supplements 1-3) to increase the LCC size that we refer to as connector genes (see 04_Seed-Connector.ipynb). The results of the algorithm are summarised in the Supplementary file 1.
The modularity of the sub-networks was then assessed using four network metrics namely Largest Connected Component (LCC), number of edges, network density and average shortest path in the LCC. Each metric for each module was assessed using distance of the network metric to a null distribution. Initially, the null distribution was calculated by taking 1000 samples of 463 genes randomly selected from the PIN and calculating the above metrics. We found that the 463 genes selected in the signalling screen were already more connected than the null distribution of sets of 463 genes randomly selected from the PIN ( Figure 5-figure supplement 4a). Therefore, to avoid a false positive detection of modularity, the four experimentally obtained sub-networks were compared to null distributions obtained by randomly sampling an equal number of genes from the 463 signalling candidate genes selected for our screen. For each of the four modules, comparison of the metrics was performed on the seed lists and the sub-network after the SCA. Most metrics were enriched in the seed group when compared to the null distribution with the exception of the Average shortest path (

Meta network
To build the meta network, the genes from all four sub-networks were concatenated into one network. This network was then visually sorted in an approach akin to projecting the network onto a Venn Diagram. The meta network was sorted by which of the three screens the gene was positive in. The intersections were genes whose |Z gene | value was above the threshold in more than one and possibly all three of the screening paradigms. For example, if a gene was found in the hpo[RNAi] Ovariole Number and Egg Laying sub-networks it is then assigned to the dual positive group hpo [RNAi] Ovariole Number/Egg Laying (Figure 6a, sub-network VI). After applying this grouping strategy, the connectivity across the groups was studied by calculating the edge density between all groups (density ¼ Edges1;2 Nodes1ÃNodes2 ). Finally, the proportion of each signalling candidate in each of those groups was calculated by taking the number of members of a signalling pathway divided by the total members of a group (see Ipython notebook 08_MetaModule_Analysis.ipynb). A single gene, sloppy paired 1, was a seed in the Egg Laying sub-network and also a connector in the hpo[RNAi] Egg Laying sub-network; it fell within sub-network VII in the meta network, and is marked as a seed (grey) in Figure 6a.

Quantification and statistical analysis Number of samples
The number of samples across the different screens were as follows:

Hpo[RNAi] Egg Laying and Egg Laying screens
Controls: five vials of three females and two males Sample: one vial of three females and two males Hpo [RNAi] Ovariole number screen Controls: 20 flies, two ovaries per fly considered as independent measurements Sample: 10 flies, two ovaries per fly considered as independent measurements

Correction of batch effect
Despite best efforts to maintain the exact same condition between each experiment, some variation was measured between the batches. Control flies showed variations in both measured phenotypes, ovariole number and egg laying (Figure 1-figure supplement 1). In order to compare the values measured across different batches, each sample was standardised by calculating its Z score (Z gene ) to the control distribution. For each batch, the measurements for controls were pooled into a distribution, and the mean and standard deviation was computed. Then each sample was compared to its respective batch and its Z score computed (see 'Gene selection based on Z score and batch standardisation' for formula).

Statistical analysis
All statistical analyses were performed using the scipy stats module (https://www.scipy.org/) and scikit-learn (https://scikit-learn.org/) python package. Statistical tests and p-values are reported in the figure legends. All statistical tests can be found in the Ipython notebooks mentioned below.

Data and code availability
This study generated a series of python3 Ipython notebook files that perform the entire analysis presented in this study. All the results presented in this paper, including the figures with the exception of the network visualisations, which were created using Cytoscape3 (https://cytoscape.org/) can be reproduced by running the aforementioned python3 code. The raw data, calculations made with these data, and code used for calculations and analyses (Ipython notebooks) are available as supplementary information. For ease of access, legibility and reproducibility, the code and datasets have been deposited in a GitHub repository available at https://github.com/extavourlab/hpo_ova_eggL_ screen. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.