Zbrowse: an interactive GWAS results browser

Abstract

The growing number of genotyped populations, the advent of high-throughput phenotyping techniques and the development of GWAS analysis software have rapidly increased the number of GWAS experimental results. Candidate gene discovery from these results files is often tedious, involving many manual steps searching for genes in windows around a significant SNP. This problem rapidly becomes more complex when an analyst wishes to compare multiple GWAS studies to identify pleiotropic or treatment/environment-specific effects. To address this problem, we have developed a fast and intuitive interactive browser for viewing GWAS results, with a focus on comparing results across multiple traits or experiments. The software can easily be run on a desktop computer with open access software that bioinformaticians are likely already familiar with. Additionally, the software can be hosted or embedded on a server for easy access by anyone with a modern web browser.
[Figure caption, partially recovered] A comparison of GWAS results from three phenotypes measured in three separate experiments and one aggregate experiment (12 GWAS experiments total). The legend displays the color of the points that correspond to each trait. Clicking the points in the legend allows a user to easily show or hide points from that trait. The title of the plot is automatically generated from the filename of the [...] provided by [...]. This makes it easy to determine which GWAS experiment is being plotted. The popup is [...] by hovering the [...] over [...]

Identifying the peaks of interest usually involves sifting through the results table for the range of coordinates under the peak of interest and then using those coordinates to filter a large gene annotation file. The extra steps involved in exploring the data in this way make it more likely that interesting associations will be missed, either because of (1) mistakes made while mining the large results files or (2) the dataset not being mined deeply enough, given the difficulty of looking for genes under less significant peaks. Additionally, this method quickly becomes tedious when analyzing multiple phenotypes or relatively complex traits.
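The manual workflow described above can be sketched in a few lines. This is a minimal illustration, not the browser's implementation: the record layouts (`SNP`, `Gene`), the 50 kb window and the p-value threshold are all hypothetical choices made for the example.

```python
from collections import namedtuple

# Hypothetical record layouts; real results and annotation files will differ.
SNP = namedtuple("SNP", "chrom pos pvalue")
Gene = namedtuple("Gene", "name chrom start end")


def genes_near_snp(snp, genes, window=50_000):
    """Return genes on the SNP's chromosome that overlap pos +/- window."""
    lo, hi = snp.pos - window, snp.pos + window
    return [
        g for g in genes
        if g.chrom == snp.chrom and g.start <= hi and g.end >= lo
    ]


def candidate_genes(snps, genes, p_threshold=1e-5, window=50_000):
    """Map each significant SNP to the genes found in its window."""
    return {
        snp: genes_near_snp(snp, genes, window)
        for snp in snps
        if snp.pvalue <= p_threshold
    }


# Toy data, invented for illustration only.
snps = [
    SNP("1", 1_000_000, 1e-8),
    SNP("1", 5_000_000, 0.2),
    SNP("2", 300_000, 1e-6),
]
genes = [
    Gene("geneA", "1", 960_000, 965_000),
    Gene("geneB", "1", 2_000_000, 2_004_000),
    Gene("geneC", "2", 340_000, 352_000),
    Gene("geneD", "2", 900_000, 905_000),
]

hits = candidate_genes(snps, genes)
for snp, nearby in hits.items():
    print(snp.chrom, snp.pos, [g.name for g in nearby])
```

Even in this toy form, the result depends on the chosen window and threshold, and the whole procedure must be repeated for every phenotype, which is where the tedium described above comes from.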

Some web applications provide tools for viewing Manhattan plots (Table 1), but they are all either specific to a single species or do not allow interactive browsing of results. These resources also do not allow for easy viewing and comparison of GWAS results across phenotypes and studies, a situation that frequently arises with structured populations.

The first tab in the list, and the landing page when the application is first loaded, is the Manage tab (Figure 1). This tab allows a new GWAS dataset to be uploaded into the application or a pre-loaded dataset to be selected from a dropdown menu. Data can be uploaded as a flat file delimited with either commas or tabs, or as an RData object. These flexible file formats allow tabular results of nearly any kind to be loaded into the browser.

In Figure 1, we have loaded the results from the sorghum ionomics experiment and selected the appropriate columns for plotting the results. The results file was generated by taking the most significant SNP hits from each of the 80 GWAS experiments performed (20 phenotypes, each measured in 3 locations plus an experiment combining the location data). We added a column describing which experiment (e.g. one of the three locations) and which phenotype each SNP was found in.
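A combined results file of this kind could be assembled along the following lines. This is a sketch under stated assumptions rather than the procedure actually used: the column names (`chrom`, `pos`, `pvalue`, `phenotype`, `location`), the use of two separate label columns, the `top_n` cutoff and the toy values are all hypothetical.

```python
import csv
import io


def combine_results(results_by_experiment, top_n=2):
    """Keep the top_n most significant rows per experiment and label them.

    results_by_experiment maps (phenotype, location) to a list of row dicts
    that each carry a 'pvalue' key (a hypothetical layout).
    """
    combined = []
    for (phenotype, location), rows in results_by_experiment.items():
        best = sorted(rows, key=lambda r: r["pvalue"])[:top_n]
        for row in best:
            # Add the label columns so every row records where it came from.
            combined.append({**row, "phenotype": phenotype, "location": location})
    return combined


def to_csv(rows):
    """Serialize the labelled rows as comma-delimited text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy input: two experiments with invented values.
results = {
    ("Cd", "loc1"): [
        {"chrom": "1", "pos": 100, "pvalue": 1e-6},
        {"chrom": "2", "pos": 200, "pvalue": 1e-9},
        {"chrom": "3", "pos": 300, "pvalue": 0.01},
    ],
    ("Zn", "loc2"): [
        {"chrom": "4", "pos": 400, "pvalue": 1e-7},
    ],
}

combined = combine_results(results, top_n=2)
csv_text = to_csv(combined)
print(csv_text)
```

Keeping the labels as ordinary columns means the same flat file can be filtered by phenotype, by location, or by both.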

Once uploaded, a preview of the first ten rows of the dataset appears in the main panel. Below this table is a series of selection boxes that allow the user to specify which columns in the file to use for plotting. This selection method removes the need for the input file to have columns with specific names or in a specific order.

[...] axis scale. By default, the software automatically scales the y-axis based on the range of the selected data. The browser will display only 5000 points in total (see Limitations section). If there are more than 5000 points in the subset of tracks being plotted, the browser will use the y-axis column to rank the SNPs and take only the top 5000.
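The point cap and the default axis scaling can be illustrated with a short sketch. It assumes that larger values in the y-axis column are more significant (for example, a -log10 p-value), which the text does not state explicitly; the function names and the `neg_log10_p` column are hypothetical and do not come from the software itself.

```python
import heapq


def top_points(rows, y_column, max_points=5000):
    """Keep at most max_points rows, ranked by y_column (largest first).

    Assumes larger y values are the more interesting ones.
    """
    if len(rows) <= max_points:
        return list(rows)
    return heapq.nlargest(max_points, rows, key=lambda row: row[y_column])


def y_axis_range(rows, y_column):
    """Default axis limits: the min and max of the values being plotted."""
    values = [row[y_column] for row in rows]
    return min(values), max(values)


# Toy data: 20 points with invented scores, capped at 5 for readability.
rows = [{"snp": f"s{i}", "neg_log10_p": float(i % 7)} for i in range(20)]
kept = top_points(rows, "neg_log10_p", max_points=5)
print(len(kept), y_axis_range(kept, "neg_log10_p"))
```

Because the y-axis column is chosen by the user, the ranking key is passed in by name rather than hard-coded.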

After the user has selected the appropriate parameters, clicking the submit button will trigger a tab change to the Whole Genome View visualization tab (Figure 2). Conveniently, once submitted, the software will remember the selected settings for this dataset on future visits and automatically [...]

Scrolling over these tracks displays a tooltip with a description of the gene, and clicking a gene in the track opens a separate browser window displaying information about the gene from an external database. The displayed gene, a heavy metal transporter, is a likely candidate for affecting cadmium accumulation in sorghum germplasm.