Introduction
InterMine [1] is a platform for building data warehouses which includes specialisations for the life-sciences. As part of the InterMOD [2] project, a number of InterMine data-warehouses have been developed and released to the public containing high-quality integrated data curated by the major model organism database (MOD) organisations. In addition, the InterMine platform is widely used by other projects, such as the modENCODE project [3], as well as a range of other resources including metabolicMine [4], TargetMine [5], FlyTFMine [6], and MitoMiner [7]. This means that reliable integrated data sets exist for use by researchers working in a wide range of fields in the life-sciences, which can be accessed by a common interface.
One of the features of the InterMine system is the ability to store named sets of entities, called lists, and refer to them in queries and other analyses. This allows a user, for example, to save a list of genes and reuse this saved collection easily. The InterMine system also allows specialised analyses to be performed that take advantage of the integrated nature of the data warehouse. For example, the system can run queries that aggregate information about relationships between data types, and indicate the level of statistical significance of the results (enrichment queries).
Until recently, the output of these list analysis tools was only accessible through the web-application built into the InterMine system. Recent work on the InterMine web services has enabled this functionality to be externalised into the list-widgets [8] project: separate JavaScript-based components that can be used in third party websites. These developments have already been incorporated into the standard InterMine web-application configuration, meaning that users of the tools described here have access to the same query and display mechanisms in their own sites that are available through the standard InterMine web-application.
InterMine supports the aims of the BioJS [9] initiative to provide well-designed, robust website components to application developers in order to foster code reuse and minimise duplicated effort. We therefore contribute this set of components for running list analysis tools and displaying their output to the BioJS project, so that they may be widely distributed and interoperate with tools from other developers.
Installation
As JavaScript web components, these tools are designed to run within the JavaScript virtual machines provided by modern browsers, and to render to HTML pages. Installation means indicating to the remote client (the user) which resources to load as dependencies, and where these are located. Typically this is done by adding references to these resources in the head section of a page using script elements (see Listing 1). Recent practice suggests that loading these resources at the end of the body improves page load time. The dependencies that must be loaded to use these tools are listed in Supplementary materials A.
The BioJS InterMine list analysis library needs to be downloaded from the BioJS registry [10] and hosted in an accessible location.
Listing 1. Loading the list analysis tools library.
<script src="Biojs.InterMine.ListAnalysis.js"></script>
Usage
Once the BioJS component and its dependencies are loaded, the component itself may be instantiated, which creates a new list analysis displayer, inserts it into the document, and populates it with the appropriate data by calling to the InterMine web-services. This requires that an element exists within the document (see code listing 2) into which the component can be inserted.
Listing 2. The target document element
<div id="list-analysis-example"></div>
The JavaScript code to instantiate the component refers to this element as the target, and provides the other arguments required to specify which list we wish to analyse, the URL of the service where that list is to be found, and which specific analysis tool we wish to run. The example below uses a list of genes encoding putative Drosophila melanogaster transcription factors made available as a public list at FlyMine [11] and runs the pathway enrichment statistical analysis tool. The full list of available lists (which each user can extend by creating personal lists) and analysis tools can be accessed from the InterMine service being used.
Relationship enrichment
One category of tools is the enrichment tools, which run queries that attempt to find relationships that are statistically significant for the set of entities as a whole. For example, FlyMine [11] contains both genes, loaded from sources such as FlyBase [12], and biochemical pathways, loaded from sources such as KEGG [13] and Reactome [14]. The pathways enrichment tool lists pathways of which genes in the list are members, ordered by the degree of significance for the list of genes as a whole.
For example, if one gene in a list is in a particular pathway, but none of the others are, it would be considered less significant than a pathway that all or most genes in a list belonged to. Similarly, the background probability that a particular relationship exists for an item is taken into account. This means, for example, that a publication that lists many or even all genes for an organism, such as Clark 2007 [15], would not be considered as significant as a publication that mentions fewer genes, but with most of them being in the list of interest.
The p-values used as measures of statistical significance are calculated by modelling the relationships as a hypergeometric distribution (as in Rivals 2007 [16] and Beissbarth 2004 [17]), which determines the probability that a relationship between two entities would be selected at random given the set of items to choose from. Let n be the number of items in the list, N be the size of the reference population, k be the number of items in the list which are involved in the given relationship (are mentioned in the publication, for example, or belong to a particular biochemical pathway), and M be the number of items in the reference population which share that same relationship. Then for each relationship the hypergeometric probability is
$$P = \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}$$
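To make the calculation concrete, the following is a minimal sketch of the hypergeometric probability for the quantities n, N, k and M defined above. The function names are illustrative and this is not the code used by InterMine. It assumes that the probability of observing exactly k matching items is wanted, and it works with logarithms so that larger inputs do not overflow.

```javascript
// Natural log of n!, computed by direct summation.
// Adequate for an illustration; a production version might cache values.
function logFactorial(n) {
  var sum = 0;
  for (var i = 2; i <= n; i++) {
    sum += Math.log(i);
  }
  return sum;
}

// Natural log of the binomial coefficient "n choose k".
// Returns -Infinity when the selection is impossible.
function logChoose(n, k) {
  if (k < 0 || k > n) {
    return -Infinity;
  }
  return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
}

// Probability of finding exactly k related items in a list of n items,
// when M of the N items in the reference population share the relationship.
function hypergeometricProbability(k, n, M, N) {
  var logP = logChoose(M, k) + logChoose(N - M, n - k) - logChoose(N, n);
  return Math.exp(logP);
}

// Example: a list of 3 items from a population of 10, of which 4 share
// the relationship, with 2 of the listed items sharing it.
var example = hypergeometricProbability(2, 3, 4, 10); // approximately 0.3
```

Summing this function over every possible value of k for fixed n, M and N gives 1, which is a quick sanity check on any implementation.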
The options made available for multiple test correction include the Bonferroni, Holm-Bonferroni, and Benjamini-Hochberg [18] algorithms.
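For readers unfamiliar with these corrections, the sketch below shows the textbook form of each adjustment applied to an array of raw p-values. It is offered as an illustration only: the function names are hypothetical and the code is not taken from InterMine, whose implementation may differ in detail.

```javascript
// Bonferroni: multiply each p-value by the number of tests, capped at 1.
function bonferroni(pValues) {
  var m = pValues.length;
  return pValues.map(function (p) {
    return Math.min(1, p * m);
  });
}

// Indices of the p-values, ordered from smallest to largest p-value.
function ascendingOrder(pValues) {
  return pValues
    .map(function (p, i) { return i; })
    .sort(function (a, b) { return pValues[a] - pValues[b]; });
}

// Holm-Bonferroni: step-down procedure. The i-th smallest p-value is
// multiplied by (m - i), and adjusted values may never decrease.
function holmBonferroni(pValues) {
  var m = pValues.length;
  var adjusted = new Array(m);
  var running = 0;
  ascendingOrder(pValues).forEach(function (index, rank) {
    running = Math.max(running, Math.min(1, (m - rank) * pValues[index]));
    adjusted[index] = running;
  });
  return adjusted;
}

// Benjamini-Hochberg: step-up procedure controlling the false discovery
// rate. The i-th smallest p-value (1-based) is multiplied by m / i, and
// adjusted values may never exceed those of larger p-values.
function benjaminiHochberg(pValues) {
  var m = pValues.length;
  var order = ascendingOrder(pValues);
  var adjusted = new Array(m);
  var running = 1;
  for (var rank = m - 1; rank >= 0; rank--) {
    var index = order[rank];
    running = Math.min(running, pValues[index] * m / (rank + 1));
    adjusted[index] = running;
  }
  return adjusted;
}

// Example input: four raw p-values, returned in their original order.
var raw = [0.01, 0.04, 0.03, 0.005];
var corrected = {
  bonferroni: bonferroni(raw),
  holm: holmBonferroni(raw),
  benjaminiHochberg: benjaminiHochberg(raw)
};
```

Bonferroni is the most conservative of the three, and Benjamini-Hochberg the least, so the same list may yield more significant results under the latter.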
The tools in this category are all prefixed with enrichment:, and can be loaded as follows:
Listing 3. Loading an enrichment list analysis tool.
var ListAnalysis = Biojs.InterMine.ListAnalysis;
var analysis = new ListAnalysis({
    target: "list-analysis-example",
    url: "http://www.flymine.org/query",
    list: "PL FlyTF_putativeTFs",
    tool: "enrichment:pathway_enrichment"
});
Once run, the component should be inserted into the document (see Figure 1). The component allows the user to adjust the parameters of the analysis, including the multiple test correction method used, the p-value threshold and the background population.
The component also allows the user to interact with the results in a number of ways, specifically: by clicking on an individual item that was matched; by clicking on a button to show a set of matches; and by clicking on a button to request that the selected items be saved to some location. All these actions cause the component to emit events, which can be listened for and handled by the host JavaScript application. For example, to alert a string such as Gene - FGBN0123 when a user clicks on the corresponding element, one might attach an event listener to capture the onClickMatch event (see code listing 4).
Listing 4. Listening for a click event.
analysis.onClickMatch(function (ident, type) {
    alert(type + " - " + ident);
});
This enables the behaviour of the component to be integrated into the hosting application. The full listing of events and their arguments is included in the BioJS API documentation [19].
The canonical example for the use of statistical enrichment in bioinformatics is enrichment of Gene Ontology (GO) terms for sequence annotations (Rivals 2007 [16]). This functionality is supported as one of the statistical analysis tools (see Figure 2), within this more generic enrichment analysis framework. The GO enrichment tool merits some further notes, however, as it supports some of the more advanced parameters.
The GO enrichment tool demonstrates the use of optional filter parameters to limit the results in some way. In the GO tool, the filter allows the user to select the sub-ontology they are interested in. The user can also choose to normalise the results of this tool, in this case by transcript length.
Charts
The other main category of analysis tools is the chart tools. These run aggregate queries over the items in a list, and present the information graphically in interactive charts. The InterMine system supports both numerical and categorical charting, reflected in the supported chart formats: bar charts, line charts, pie charts and scatterplots.
Loading a chart analysis tool is identical to loading a statistical enrichment tool - only the name of the tool need differ (see code listing 5).
Listing 5. Loading a chart list analysis tool.
var chart = new Biojs.InterMine.ListAnalysis({
    target: "list-analysis-example",
    url: "http://www.flymine.org/query",
    list: "PL FlyTF_putativeTFs",
    tool: "chart:flyfish"
});
This code will request data for the particular tool (flyfish), as run against the given input list (PL FlyTF_putativeTFs), and then display the results in the appropriate chart format (Figure 3). The chart tools have fewer parameters; they may take a single parameter, as detailed in the tool description available from the relevant service (e.g. http://www.flymine.org/query/service/widgets).
In most cases the chart tools do not provide mechanisms for the user to change the results displayed. They do, however, provide several mechanisms for the user to interact with those results. The user can click on the groupings or data-points represented on the chart (see Figure 4). Doing so triggers the same events that are available to enrichment tools, and these can be captured in the same way (see code listing 4).
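The tool names used in the listings above combine a category prefix with a tool name, as in enrichment:pathway_enrichment and chart:flyfish. As a hedged illustration, the sketch below builds such identifiers from a parsed tool listing of the kind a service might return. The field names widgets, name and widgetType, the function name, and the sample data are all assumptions made for illustration, and should be checked against the actual response of the service being used.

```javascript
// Build tool identifiers such as "chart:flyfish" from a parsed tool listing.
// ASSUMPTION: the listing is an object with a `widgets` array whose entries
// carry `name` and `widgetType` fields. Verify this against the real service.
function toolIdentifiers(listing) {
  return (listing.widgets || []).map(function (widget) {
    return widget.widgetType + ":" + widget.name;
  });
}

// Hypothetical sample data, not a real service response.
var sampleListing = {
  widgets: [
    { name: "pathway_enrichment", widgetType: "enrichment" },
    { name: "flyfish", widgetType: "chart" }
  ]
};

// Each identifier could then be passed as the `tool` option of the component.
var identifiers = toolIdentifiers(sampleListing);
```

A listing with no tools yields an empty array, so the result can be iterated over without a separate check.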