Experimental validation of predicted cancer genes using FRET

Genome-wide experiments designed to investigate diseases with complex genetic causes generate huge amounts of data. Follow-up of all potential leads produced by such experiments is currently cost-prohibitive and time-consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently, a gene prioritization tool called MaxLink was shown to outperform other widely used state-of-the-art prioritization tools in a large-scale in silico benchmark. However, an experimental validation of predictions made by MaxLink has been lacking. In this study we used Fluorescence Resonance Energy Transfer, an established experimental technique for detection of protein-protein interactions, to validate potential cancer genes predicted by MaxLink. Our results provide confidence in the use of MaxLink for selection of new targets in the battle against polygenic diseases.


Introduction
Many diseases that involve complex interplay between multiple genes and the environment, such as cancer, diabetes and Alzheimer's disease, have huge implications for public health. To combat these diseases, advances in high-throughput experimental methods, such as Genome Wide Association Studies and RNA interference screens, have lately been used to speed up the discovery of genes/proteins implicated in diseases with complex genetic causes. Hundreds (or thousands) of potential candidate genes involved in polygenic disorders have been uncovered by these experimental advances (Tranchevent et al 2011, Doncheva et al 2012). Follow-up of all these leads with rigorous target validation is currently cost-prohibitive and time-consuming. Prioritization of candidate disease genes is therefore necessary in order to maximize the information gained from high-throughput experiments, by focusing further experimental efforts on the most promising candidates. Current prioritization tools can usually provide a manageable list of genes to focus on, and often rank the genes by their estimated importance for future investigations. A prioritization tool is usually not expected to provide a list that is 100% accurate, but it is important that the prioritized genes have a biologically meaningful relationship to the studied pathology (Piro and Di Cunto 2012).
MaxLink (Guala et al 2014) is a gene prioritization method that, when provided with a set of genes known to be involved in a particular phenotype, e.g. cancer, produces a ranked list of genes/proteins tightly associated with the input set, i.e. it predicts novel genes/proteins potentially involved in the phenotype of interest. MaxLink works on a gene/protein association network such as FunCoup (Schmitt et al 2014) in order to make its predictions. MaxLink has recently been shown to outperform other state-of-the-art gene prioritization algorithms on the largest benchmark for gene prioritization tools to date (Guala and Sonnhammer 2017). Initial validations of MaxLink were performed using Gene Ontology (Ashburner et al 2000) and datasets of differential expression in cancer versus normal tissue (Östlund et al 2010). MaxLink has also been tested on a set of rare disease genes from Orphanet (Aymé 2003). However, a direct validation of predicted interactions is still lacking and would add credibility to this promising prioritization method.
There is a wealth of biochemical techniques to study protein-protein interactions (PPIs), but they usually require extraction of the studied proteins from their natural environment and are too laborious when many PPIs need to be tested. Fluorescence Resonance Energy Transfer (FRET) (Kenworthy 2001) is a widely used technique that allows the study of PPIs directly in fixed or live cells using fluorescence microscopy. FRET relies on transfer of energy between two fluorophores in close proximity (typically <10 nm). The interaction partners of a PPI can be labeled with fluorophore-conjugated antibodies that exhibit FRET when the labeled proteins are close enough for a possible interaction.
In this study we used FRET to experimentally validate a list of genes potentially implicated in cancer, predicted by MaxLink. We attempted to test 18 predicted interactions between the known oncogene Janus kinase 2 (JAK2) and other proteins, and could confirm 15 of them experimentally. This substantially expands the list of verified JAK2 interactions that can be used to guide future functional experiments. Furthermore, the study demonstrates the high reliability of MaxLink predictions, and that a FRET-based method is efficient for large-scale probing of protein-protein interactions.

Materials and methods
Functional association network
Network prioritization methods need a source of protein interactions in order to make their predictions. We used FunCoup 3.0 (Schmitt et al 2014) as the underlying protein interaction network, since the method for gene prioritization is seamlessly integrated into this network. FunCoup is one of the most comprehensive sources of gene/protein associations. It combines various types of gene/protein interaction data, e.g. protein-protein interaction, protein domain interaction, protein co-expression, mRNA co-expression, subcellular co-localization, co-miRNA regulation by shared miRNA targeting, shared transcription factor binding, phylogenetic profile similarity and genetic interaction. These data are collected from 11 species, including Homo sapiens and 10 model organisms, and transferred via orthologous associations in order to increase the amount of evidence and to make the inference of interactions more robust. FunCoup can be used with different confidence thresholds (0.1-1) marking the lowest allowed confidence of interaction between proteins. A confidence threshold of 0.75 was used in this study, resulting in 1 123 873 links between 12 391 genes.

Method for prediction of cancer genes
MaxLink (Guala et al 2014) was used for prediction of potential candidate genes. This gene prioritization method exploits the notion of guilt-by-association (Altshuler et al 2000, Oliver 2000), i.e. the fact that proteins involved in the same or similar diseases tend to interact with each other (Zhu et al 2014). MaxLink uses the topology of an underlying protein association network, e.g. FunCoup, to identify genes/proteins that interact with a set of query genes/proteins involved in a phenotype of interest, e.g. cancer. The mechanism by which MaxLink prioritizes candidate genes is depicted below (figure 1). MaxLink accepts a set of genes known to be involved in a phenotype as the query, and uses FunCoup to find all their direct neighbours. It then applies a connectivity filter, corrected for multiple hypothesis testing, to select genes enriched in connections to the query. This step filters out some of the hub genes that spuriously interact with the query due to their high number of connections. In the last step, MaxLink returns a list of potential candidates sorted by the number of connections to the query, called the MaxLink score (ML).
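The three steps described above can be illustrated with a minimal Python sketch. It is not the MaxLink implementation: the function name `maxlink_sketch`, the toy network and the `min_fraction` parameter are hypothetical, and the filter used here is a plain fraction threshold that stands in for the statistically corrected connectivity filter.

```python
from collections import Counter


def maxlink_sketch(network, query, min_fraction=0.5):
    """Rank non-query genes by their number of links to the query set.

    network: dict mapping each gene to the set of its direct neighbours.
    query: genes known to be involved in the phenotype.
    min_fraction: hypothetical stand-in for the connectivity filter; a
        candidate passes only if more than this fraction of its links
        go to query genes.
    """
    query = set(query)
    # Step 1: count links from the query to each direct, non-query neighbour.
    links_to_query = Counter(
        neighbour
        for gene in query
        for neighbour in network.get(gene, ())
        if neighbour not in query
    )
    # Step 2: simplified connectivity filter that removes hub-like genes.
    ranked = []
    for candidate, score in links_to_query.items():
        degree = len(network.get(candidate, ()))
        if degree and score / degree > min_fraction:
            ranked.append((candidate, score))
    # Step 3: sort by the number of query links (the ML score), highest first.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy, made-up network: HUB touches two query genes but has many other links.
toy_network = {
    "Q1": {"A", "B", "HUB"},
    "Q2": {"A", "HUB"},
    "Q3": {"A", "B"},
    "A": {"Q1", "Q2", "Q3"},
    "B": {"Q1", "Q3", "X"},
    "HUB": {"Q1", "Q2", "X", "Y", "Z", "W"},
    "X": {"B", "HUB"},
    "Y": {"HUB"},
    "Z": {"HUB"},
    "W": {"HUB"},
}
ranking = maxlink_sketch(toy_network, {"Q1", "Q2", "Q3"})
print(ranking)
```

In this toy case the hub gene is discarded because only a third of its links go to the query, while A and B are kept and ranked by their number of query links.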

Validation pipeline
The overview of the validation pipeline is depicted below (figure 2). Its individual steps are described in detail in the following sections.

Selection of query genes
The query of genes known to be involved in cancer was compiled from the Cancer Gene Census (CGC) (Forbes et al 2011) and UniProt (The UniProt Consortium 2014). CGC genes (downloaded on 2012-MAR-15) and genes annotated with keywords related to cancer (Supp. table 1 is available online at stacks.iop.org/MAF/6/035007/mmedia) in the CC field of UniProt (downloaded on 2013-NOV-15) were combined (Östlund et al 2010), and used as the query to MaxLink.

Prediction of cancer genes
We ran MaxLink with a connectivity filter cutoff of 0.5, meaning that in order to pass the filter a candidate gene needed to have more connections to the query genes than to the other genes in the network. FunCoup was used with a confidence threshold of 0.75. One of the query cancer genes with the most predicted candidates, JAK2, was selected for experimental evaluation together with its predicted associations (Supp. table 2). The final selection of candidates for validation was further restricted to those that had antibodies in the Human Protein Atlas (HPA) (Uhlen et al 2010) and were expressed in sufficient quantity (Fragments per

Table 1. Tested FRET interactions. The list of predicted interactions intended for testing. The information supplied is the MaxLink score (ML), whether a direct interaction with JAK2 has been demonstrated before (Known) and the database or publication the evidence comes from: IntAct (i), BioGrid (b), iRefIndex (w), or publication, respectively. The FPKM in the chosen cell line and the main HPA location of the protein are also presented.

Fluorescence resonance energy transfer (FRET)
FRET is a fluorescence-based method that can be used to detect the close proximity (possible interaction) between two proteins. FRET is based on a nonradiative energy transfer mechanism between a donor and an acceptor fluorophore. The efficiency of the energy transfer falls off with the sixth power of the distance between the fluorophores, so transfer typically occurs only if the distance is on the order of <10 nm (figure 3). The experimental setup includes labeling the potentially interacting pair of proteins with different fluorophores and measuring fluorescence in wavelength bands corresponding to donor and acceptor fluorescence. FRET efficiency can then be calculated from the measured shift in emission from the donor to the acceptor wavelengths.
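The steepness of this distance dependence can be illustrated with a short Python sketch of the standard Förster relation for a single donor-acceptor pair, E = 1 / (1 + (R/R0)^6). The function name and the Förster radius of 7 nm are illustrative assumptions, not values taken from this study.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Forster efficiency E = 1 / (1 + (R / R0)^6) for one donor-acceptor pair."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and the Forster radius > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


R0_NM = 7.0  # illustrative assumption, not a value reported in the study

for r in (3.5, 7.0, 10.0, 14.0):
    print(f"R = {r:4.1f} nm -> E = {fret_efficiency(r, R0_NM):.3f}")
```

By construction the efficiency is exactly 0.5 when the distance equals the Förster radius, and it drops to 1/65 at twice that distance, which is why a measurable signal is only expected for fluorophores within roughly 10 nm of each other.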

Experimental setup
In the experimental setup JAK2 was labeled with a commercially available primary antibody (mouse) and a secondary anti-mouse antibody carrying the donor fluorophore. Each candidate protein was labeled with an appropriate primary antibody (rabbit) from the HPA and a secondary anti-rabbit antibody carrying the acceptor fluorophore (figure 4(a)). Each interaction was studied in two wells of a 96-well plate by means of fluorescence imaging. Images were acquired in three different fluorescence channels, i.e. donor, acceptor and FRET, in 16 symmetrically allocated fields of view. The signal means of the FRET channel were compared to the signal means of the corresponding donor channel in order to establish the presence of FRET. In order to establish the maximum value of FRET in our experimental setup we used a positive control construct with one primary antibody for the target protein (e.g. mouse monoclonal anti-JAK2) and two different secondary antibodies (e.g. goat monoclonal anti-mouse and rabbit monoclonal anti-mouse) conjugated to donor and acceptor fluorophores, respectively (figure 4(b)). This guaranteed that the donor and acceptor fluorophores were sufficiently close to each other to produce a FRET signal detectable by fluorescence microscopy.
An absolute biological negative control can be difficult to establish in an experimental setup. If two proteins are expressed in different compartments of the cell (e.g. mitochondria and nucleoplasm), then there should be no measurable FRET. However, if the two proteins are in the same compartment, it is not possible to rule out that they could be physically or functionally interacting. We therefore used both an experimental and a theoretical negative control. As the experimental negative control we used the mitochondrial inner membrane protein Mic60 as the donor and the plasma membrane protein Na,K-ATPase α1 (NKA) as the acceptor. The theoretical negative control was based on calculation of random FRET in a volume with varying concentrations of donor and acceptor molecules. To this end we use the equation for FRET efficiency according to Förster (Förster et al 1939, Schaufele et al 2005):

$$E = \frac{1}{1 + (R_{DA}/R_0)^6}$$

where $E$ is the FRET efficiency between the donor and acceptor fluorophores at distance $R_{DA}$, and $R_0$ is the Förster radius. In our study we use the dyes Alexa
For full details of the experimental setup, including plate design, antibody details and concentrations, please see below.
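A calculation of random FRET of this kind could be sketched as a small Monte Carlo simulation. The sketch below is an assumption-laden illustration, not the simulation used in the study: molecules are placed uniformly at random in a cubic box without boundary corrections, orientation effects are ignored, transfer rates to multiple acceptors are taken to add, and the box size, molecule counts and Förster radius are arbitrary illustrative values.

```python
import math
import random


def mean_random_fret(n_donors, n_acceptors, box_nm, r0_nm, seed=0):
    """Mean FRET efficiency for donors and acceptors placed uniformly at random.

    Each donor can transfer to every acceptor. Transfer rates are assumed to
    add, so E = S / (1 + S) with S = sum((R0 / r_i)^6) over all acceptors.
    """
    rng = random.Random(seed)

    def random_point():
        return (
            rng.uniform(0.0, box_nm),
            rng.uniform(0.0, box_nm),
            rng.uniform(0.0, box_nm),
        )

    acceptors = [random_point() for _ in range(n_acceptors)]
    total = 0.0
    for _ in range(n_donors):
        donor = random_point()
        rate_sum = 0.0
        for acceptor in acceptors:
            distance = math.dist(donor, acceptor)
            # Guard against a vanishing distance between two random points.
            rate_sum += (r0_nm / max(distance, 1e-9)) ** 6
        total += rate_sum / (1.0 + rate_sum)
    return total / n_donors


# Illustrative values only: a 200 nm box and a Forster radius of 7 nm.
sparse = mean_random_fret(200, 10, box_nm=200.0, r0_nm=7.0)
dense = mean_random_fret(200, 1000, box_nm=200.0, r0_nm=7.0)
print(f"sparse acceptors: {sparse:.4f}, dense acceptors: {dense:.4f}")
```

Raising the acceptor concentration raises the chance that an acceptor falls within a few nanometres of a donor purely by coincidence, which is the effect a simulated lower bound of this kind is meant to capture.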

Cell cultivation
The U-2 OS human osteosarcoma cell line (ATCC-LGC) used throughout this study was grown at 37°C (humidified air, 5% CO2) in McCoy's 5A medium (Sigma-Aldrich) supplemented with 10% fetal bovine serum (Sigma-Aldrich). Cells were seeded onto fibronectin-coated 96-well glass-bottomed plates (Greiner Sensoplate Plus, Cat# 655892, Greiner Bio-One, Germany) at a concentration of 15 000 cells/well and cultivated for 24 h to a confluency of 60%-70% before immunostaining was carried out.
Immunostaining
PBS-washed cells were fixed in 4% paraformaldehyde (PFA) in growth media supplemented with 10% FBS for 15 min, followed by permeabilization with 0.1% Triton X-100 in PBS for 3×5 min. After a washing step with PBS, cells were incubated with the primary antibodies overnight at 4°C. The antibody targeting JAK2 (ab37226, Abcam) was diluted to 1 μg/ml and rabbit polyclonal HPA antibodies targeting the proposed interaction partners were diluted to 2-4 μg/ml in blocking buffer (PBS with 4% fetal bovine serum). On the next day, after 4×10 min washes with PBS, the cells were incubated for 90 min at room temperature with the following secondary antibodies (all from ThermoFisher Scientific) diluted to 1 μg/ml in blocking buffer: goat anti-mouse AlexaFluor 488 (ThermoFisher A11001/A11029) and goat anti-rabbit AlexaFluor 555 (ThermoFisher A21428/A21429). Cells were subsequently counterstained with DAPI for 10 min. After washing with PBS, the wells were completely filled with PBS containing 78% glycerol and sealed. The entire immunostaining procedure was carried out using Tecan Freedom EVO pipetting robotic systems.

Figure 3. FRET overview. The sample is illuminated at a wavelength X nm that excites the donor fluorophore. If the acceptor fluorophore is in close proximity, energy can be transferred and emitted by the acceptor as light of a longer wavelength, Y nm.

Microscopy
Fluorescent images were acquired with a Leica SP5 confocal microscope (DM6000CS) equipped with a 40×/0.85NA objective (Leica Microsystems, Mannheim, Germany). The image acquisition was performed automatically using the MatrixScreener M3 in LAS AF software (Leica Microsystems, Mannheim, Germany). FRET experiments were performed by excitation at 488 nm (donor excitation) and 543 nm (acceptor excitation) with detection in wavelength bands 495-540 nm for the donor channel and 550-625 nm for the acceptor (FRET) channel. The settings for each image were as follows: pinhole 1 Airy unit, 600 Hz scan speed, 16 bit acquisition, and an image size of 2048×2048 pixels with a pixel size of 0.1 μm. In order to reduce potential artifacts from Poisson and shot noise (Nagy et al 2014) the images were down-sampled to 1024×1024 pixels by 2×2 averaging. In order to reduce the effect of variability in experimental factors, such as cell density, protein expression level, and staining efficiency, we captured 16 symmetrically allocated fields of view from each well and used triplicates for each studied interaction.
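Down-sampling by 2×2 averaging replaces each non-overlapping block of four pixels with its mean, which halves each image dimension and reduces pixel noise. A minimal pure-Python sketch, with a hypothetical function name and a made-up 4×4 tile in place of a real 2048×2048 image, could look as follows.

```python
def downsample_2x2(image):
    """Average non-overlapping 2x2 pixel blocks of a 2D image (list of rows)."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Made-up 4x4 tile; a real image would be 2048x2048 before binning.
tile = [
    [10, 12, 0, 4],
    [14, 16, 8, 4],
    [1, 1, 100, 104],
    [1, 1, 96, 100],
]
binned = downsample_2x2(tile)
print(binned)
```

For full-size images an array library would be the more practical choice; the nested lists are used here only to keep the sketch self-contained.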

Analysis of FRET signal
In the analysis of FRET, compensation for spectral crosstalk between the donor and acceptor channels needs to be performed. To achieve this, each experimental group consisted of a triplicate, labeled with either donor, acceptor or both fluorophores. The samples with both donor and acceptor fluorophores were used to measure FRET. The background-corrected intensities in the donor ($I_D$) and acceptor ($I_A$) channels were recorded using the donor excitation wavelength, followed by recording the intensity in the acceptor channel ($I'_A$) using the acceptor excitation wavelength. The compensation factors B and D were used to calculate, on a pixel-by-pixel basis, the corrected FRET intensity ($FRET_{corr}$) and the FRET ratio ($FRET_{ratio}$), which was reported as a mean value for each analyzed cell and experimental group.
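The exact expressions for the corrected FRET intensity and the FRET ratio are not reproduced in the text above. As an illustration only, the sketch below assumes one common form of sensitized-emission correction, FRET_corr = I_A - B·I_D - D·I'_A, with B estimated from a donor-only sample and D from an acceptor-only sample, and it assumes that the ratio is FRET_corr divided by the donor intensity. The function names, these formulas and all numbers are hypothetical.

```python
def compensation_factor(bleed_channel, reference_channel):
    """Crosstalk factor from a single-label sample: bleed-through / reference."""
    return sum(bleed_channel) / sum(reference_channel)


def corrected_fret(i_d, i_a, i_a_prime, b, d):
    """Per-pixel corrected FRET and FRET ratio under the assumed correction.

    i_d, i_a: donor and acceptor channel intensities at donor excitation.
    i_a_prime: acceptor channel intensity at acceptor excitation.
    b, d: compensation factors for donor bleed-through and direct
        acceptor excitation, respectively.
    """
    fret_corr = [
        max(a - b * dn - d * ap, 0.0)  # clip negative values caused by noise
        for dn, a, ap in zip(i_d, i_a, i_a_prime)
    ]
    # Assumed definition of the ratio: corrected FRET relative to the donor.
    fret_ratio = [fc / dn if dn > 0 else 0.0 for fc, dn in zip(fret_corr, i_d)]
    return fret_corr, fret_ratio


# Made-up background-corrected pixel intensities.
B = compensation_factor([10, 20], [100, 200])   # donor-only sample
D = compensation_factor([5, 15], [100, 300])    # acceptor-only sample
fret_corr, fret_ratio = corrected_fret(
    i_d=[100, 200], i_a=[40, 50], i_a_prime=[200, 400], b=B, d=D
)
mean_ratio = sum(fret_ratio) / len(fret_ratio)
print(fret_corr, fret_ratio, mean_ratio)
```

Estimating B and D from single-label samples is what makes the three labeling variants in each experimental group necessary: without them the bleed-through terms cannot be separated from genuine energy transfer.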

Results
The query submitted to MaxLink consisted of 787 unique genes (Supp. table 3) and yielded 706 candidates (Supp. table 2). We selected one of the cancer genes that had the most predicted candidates, JAK2, which is a non-receptor tyrosine kinase implicated in eosinophilic, lymphoblastic and myeloid leukemia. Experimental evaluation was performed together with 18 (figure 5, table 1) of its predicted interaction partners that remained after the initial 40 were restricted by availability of antibodies in the Human Protein Atlas (HPA) and sufficient intracellular expression of the protein in question. Three of the studied interactions (JAK2 with PTPN12, TNKS and CASP8, respectively) could not be tested because the primary antibodies failed to bind their respective targets, as evident from the lack of staining for donor and/or acceptor antibodies. The remaining 16 interactions (the positive control and 15 predicted interactions) showed various levels of signal in the three studied channels, as depicted below (figure 6).
Pixelwise analysis of the FRET signal in the positive control was able to clearly distinguish the background FRET intensities from the true signal (figure 7).
When FRET intensities were corrected for the intensities in the donor channel and presented as FRET ratio, i.e. the ratio between the intensities in the donor channel and the FRET channel, the positive control set the upper bound with a mean FRET ratio of 0.619 (figure 7, Supp. table 4). The distribution of FRET ratio for the positive control had a relatively narrow spread (standard deviation ≈0.015) (figure 8, Supp. table 4).
The predicted interactors exhibited mean FRET ratio values between 0.055 and 0.086 (figure 8, Supp. table 4). The distributions for the predicted interactors were even tighter than that for the positive control, with standard deviations between 0.006 and 0.014. The level of FRET ratio seen for the predicted interactions thus corresponded to between 8% and 14% of the maximum theoretically possible FRET ratio. Mean FRET ratio values were also much higher than both the experimental and the simulated lower bounds, which had mean FRET ratios of 0.02 and 0.002, amounting to 3% and 0.3% of the theoretical maximum, respectively (figure 8, Supp. table 4).
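The percentages quoted here follow from dividing each mean FRET ratio by the positive-control mean of 0.619. The short Python sketch below reproduces that arithmetic from the rounded means given in the text; the variable names are illustrative, and because the inputs are rounded the results may differ slightly from figures computed on the unrounded data.

```python
POSITIVE_CONTROL_MEAN = 0.619  # mean FRET ratio of the positive control

# Rounded mean FRET ratios as reported in the text.
reported_means = {
    "predicted, lowest": 0.055,
    "predicted, highest": 0.086,
    "experimental negative control": 0.02,
    "simulated negative control": 0.002,
}

percent_of_max = {
    label: 100.0 * value / POSITIVE_CONTROL_MEAN
    for label, value in reported_means.items()
}

for label, pct in percent_of_max.items():
    print(f"{label}: {pct:.1f}% of the positive-control mean")
```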
Some of the interactions predicted by MaxLink already have support in the literature, in that they are known associations in FunCoup (table 1). We included these interactions in our validation, both for completeness in terms of MaxLink's predictions and to see whether we could reproduce the previous findings using our experimental method of choice, i.e. FRET. However, we were particularly interested in interactions predicted by MaxLink which did not have any previous direct PPI evidence in the literature or in any of the major protein interaction databases, including FunCoup, UniProt, IntAct, BioGrid (Chatr-Aryamontri et al 2015) and iRefIndex (Razick et al 2008). One such example is the direct interaction of JAK2 and RFWD2 (table 1). RFWD2 is an E3 ubiquitin-protein ligase whose main function involves tagging of proteins with ubiquitin, which leads to subsequent degradation of the tagged protein in the proteasome (Uljon et al 2016). One of RFWD2's targets is the prominent protein p53, which is involved in many different types of cancer (Muller and Vousden 2013). Another of RFWD2's targets is COP1 (Uljon et al 2016), which plays an important role in Chronic Lymphocytic Leukemia (CLL) (Fu et al 2015). There is also evidence of JAK2 involvement in CLL (Todisco et al 2016). JAK2 localizes to both the cytoplasm and the nucleoplasm, as well as some other cell compartments (table 1), according to HPA, UniProt, GO and our FRET experiments (figures 6 and 9). The primary location of RFWD2 in the cell is the nucleoplasm according to HPA, which is also where we see most of the FRET signal (figure 9).
Another predicted interaction without previous direct PPI evidence is JAK2-PRKCD (table 1). Both proteins are kinases important for intracellular signaling (Aaronson 2002, Paul et al 2015) and, most importantly, both have been shown to have key roles in the pathogenesis of breast cancer and to be potential treatment targets (Urtreger et al 2012, Paul et al 2015, Wang et al 2015). PRKCD is overexpressed in malignant mammary tissue, where it acts by promoting cell growth, as a poor survival factor, and by enhancing resistance to apoptosis. JAK2 activation enhances proliferation and survival of aberrant mammary cells through its downstream targets (Rodriguez-Barrueco et al 2015), and targeting it with microRNA has been shown to increase breast cancer survival (Wang et al 2015). Of the tested interactions, this pair exhibited one of the strongest FRET signals. Both JAK2 and PRKCD have nucleoplasm and cytoplasm as their primary locations according to both HPA and UniProt.

Discussion
Gene prioritization tools used for leveraging the wealth of information stemming from high-throughput attempts to map polygenic disorders require experimental validation in addition to in silico assessment of their performance. The gene prioritization tool MaxLink was recently shown to outperform other state-of-the-art prioritization tools. In this study, we have demonstrated that interactions predicted by MaxLink could be validated using FRET, a sensitive technique for detection of protein interactions on the nanometer scale. In our experimental setup we were able to confirm some of the previously identified interaction partners of JAK2. We were also able to experimentally validate a number of new interactions predicted by MaxLink. These novel interactions may have implications for understanding Chronic Lymphocytic Leukemia and breast cancer pathogenesis, and support their investigation as potential therapeutic targets.
The level of FRET measured in the tested interactions is subject to several factors, such as variation in the level of expression of both JAK2 and the interacting protein in the cell line used, the binding efficiency of secondary antibodies (labeled with donor and acceptor fluorophores, respectively) to the primary antibodies, and the steric conformation of antibody binding. The expression level of JAK2 (the donor) can be considered to be constant in the experiments. However, the expression level of the acceptor proteins varies by an order of magnitude for the different predicted proteins (table 1, Supp. table 2). A low expression of the acceptor protein results in a reduced FRET efficiency. The steric conformation, i.e. how the antibodies bind and organize spatially in relation to the proteins, has a direct effect on the FRET efficiency, since FRET efficiency falls off with the sixth power of the distance between donor and acceptor fluorophore. In the most unfavorable situation the antibodies point outwards from the interaction and thereby reduce the FRET efficiency significantly.
Figure 9. FRET of JAK2 and RFWD2. FRET images of JAK2 and RFWD2 in Donor (green), Acceptor (red), and FRET (yellow) channels.
These factors reduce the efficiency of FRET, i.e. the FRET ratio for the predicted interactions. In our positive control, most of these factors are not applicable and the FRET efficiency is higher than for the validations. Experimental conditions, such as unspecific background fluorescence and imperfections in the microscope and analysis setup, can contribute to a baseline of background FRET noise. We estimated this baseline both by simulations and experimentally and used it as a negative control. The mean FRET ratios measured for the predicted interactions were roughly three to four times higher than the experimental negative control and more than one order of magnitude higher than the simulated negative control. Thus, despite all factors that reduce FRET efficiency, we were able to verify the predicted interactions with high confidence.
In conclusion, we have experimentally verified predictions of PPIs made with a state-of-the-art gene prioritization method, MaxLink. In the process, we have validated novel interactions with potential implications for carcinogenesis and drug targeting. This should provide confidence to users of gene prioritization methods in general, and to users of MaxLink in particular, in the identification and selection of new potential targets of relevance for e.g. polygenic diseases.