To the Editor:

The ability to translate large-scale genetics and genomics data into biological knowledge has not kept pace with our ability to generate these data sets. As a consequence, a major bottleneck in biomedical research has become access to data within a computational workspace that allows for robust, collaborative analyses. One innovative solution is to bring together scientific data, code, tools and disease models into an open commons or workspace, for example, the Synapse platform of Sage Bionetworks1. This environment allows for real-time sharing of large genomic data sets, continuous peer review and rapid learning within a system constructed to provide data access in a manner aligned with the informed consent provided by patients and research participants.

This crowdsourcing approach has been used to predict breast cancer survival from clinical and omics data2 and was suggested as a way to find new drugs3 by soliciting contributions from a large online community collaborating or competing to answer an inherently difficult but important question4. Researchers initiating an open challenge invite solutions and also incentivize participation by offering new data, so that the participants' methods can be assessed by testing their predictions against previously unseen data sets. This year, Sage and DREAM (Dialogue for Reverse Engineering Assessments and Methods) are running four open challenges (http://www.sagebase.org/challenges-overview/2013-dream-challenges/).

Here we announce the challenge to develop genetic predictors of response to immunosuppressive therapy in a common autoimmune disease, rheumatoid arthritis (RA). Disease-modifying antirheumatic drugs such as those that block the inflammatory cytokine tumor necrosis factor-α (known as anti-TNF therapy) are not effective in all patients with RA, with up to one-third of such patients failing to enter clinical remission after a standard course of therapy5. Moreover, the biological mechanisms underlying this failure are unknown, limiting the development of clinical biomarkers to guide either this therapy or the development of new drugs to target refractory cases.

The Rheumatoid Arthritis Responder Challenge is for teams to build the best genetic predictor of response to anti-TNF therapy. There are two phases to the challenge: discovery and validation (Fig. 1). In the discovery phase, teams will use genomic data sets (several of which will be generated for the purposes of this challenge) and a variety of analytical methods to build predictive polygenic models of treatment response. We recently published a genome-wide association study (GWAS) in 2,700 patients with RA treated with anti-TNF therapy6. Our GWAS data indicate that the genetic architecture of the anti-TNF response is probably highly polygenic, similar to what has been observed for other complex traits, such as risk of RA7. Importantly, our challenge will incorporate a new GWAS data set for the validation phase, in which models built in the discovery phase are tested. The data set of 1,100 patients with RA treated with anti-TNF therapy will be made available through a public-private partnership between the Consortium of Rheumatology Researchers of North America, Inc. (CORRONA) and the Pharmacogenomics Research Network (PGRN) sponsored by the National Institute of General Medical Sciences (NIGMS) and the US National Institutes of Health (NIH).

Figure 1: Overview of the Rheumatoid Arthritis Responder Challenge.

There are two phases to the challenge. In phase 1 (discovery), analysts build genetic models of response to anti-TNF therapy using SNP data from a GWAS of 2,700 patients with RA. To facilitate model building, additional genomic data will be made available. In a model of open collaboration, participants will use Synapse to post code, share insights and engage in rapid learning prepublication. In phase 2 (validation), models will be posted, tested and scored in an independent GWAS data set of 1,100 patients with RA treated with anti-TNF therapy. To complement challenge-assisted peer review (which occurs in both the discovery and validation phases), conventional peer reviewers will have access to Synapse to follow the iterative process of model building. Synapse will allow study investigators to respond to peer-review critiques and resubmit versions of their models and studies.

A unique component of our Rheumatoid Arthritis Responder Challenge is the diversity of participation across a number of groups from academic institutions, private foundations and for-profit companies. In addition to support from CORRONA and PGRN, we received funding from pharmaceutical companies (see complete list on our website; link below) and a private foundation (the Arthritis Foundation) to support the public commons. We also received support from the Arthritis Internet Registry (AIR) and the Broad Institute to generate new genomic data sets, as well as in-kind support from a large number of academic collaborators from across the world to make GWAS data available in the discovery phase. We anticipate that a winning classifier could enable a follow-on prospective clinical trial within the group of appropriately consented patients in AIR.

Through Synapse, analysts who are inclined to establish collaborations will have the opportunity to see in real time the models that others are using so that each team can learn from the others (Fig. 1). A leaderboard will show the relative performance ranking of the different teams on the basis of a crossvalidation strategy designed to minimize overfitting. During the discovery phase, teams that choose to collaborate with each other will have the opportunity to check each other's algorithms for readability, speed and reproducibility. Then, during the validation phase, each team will submit computer code, which the Sage-DREAM team (http://www.sagebase.org/) will test in Synapse to establish whether it runs as expected to predict if a subject is an anti-TNF therapy responder or nonresponder on the basis of the GWAS data. Predefined performance metrics will be used to objectively determine the accuracy of the predictions, their statistical significance and the final performance ranking of the participating teams. The team that develops the most highly predictive model will be deemed the 'winner', with precise attribution of contributor roles going to all members of teams that contributed to building the final consensus model.
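As a rough illustration of how a crossvalidated leaderboard of this kind might be computed, the following Python sketch ranks candidate models by the area under the ROC curve (AUC) of their out-of-fold predictions. Everything in it is an assumption made for illustration: the choice of AUC as the metric, the five-fold split, the function names and the toy additive allele-count model are hypothetical and are not the challenge's scoring code, whose metrics and rules are defined on Synapse.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one nonresponder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k, seed=0):
    """Shuffle subject indices once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(fit, X, y, k=5, seed=0):
    """Score each subject with a model that never saw that subject in training.

    `fit(train_X, train_y)` must return a function mapping one genotype
    vector to a numeric score (higher means more likely responder).
    """
    out_of_fold = [None] * len(y)
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            out_of_fold[i] = predict(X[i])
    return auc(y, out_of_fold)


def leaderboard(teams, X, y, k=5):
    """Return (team, crossvalidated AUC) pairs, best first."""
    results = {name: cross_validated_auc(fit, X, y, k) for name, fit in teams.items()}
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


def fit_additive(train_X, train_y):
    """Hypothetical toy model: weight each SNP by the difference in mean
    allele count between responders (1) and nonresponders (0)."""
    responders = [x for x, label in zip(train_X, train_y) if label == 1]
    nonresponders = [x for x, label in zip(train_X, train_y) if label == 0]
    weights = [
        mean([x[j] for x in responders]) - mean([x[j] for x in nonresponders])
        for j in range(len(train_X[0]))
    ]
    return lambda x: sum(w * v for w, v in zip(weights, x))


def fit_constant(train_X, train_y):
    """Uninformative baseline that gives every subject the same score."""
    return lambda x: 0.5
```

Scoring pooled out-of-fold predictions, rather than averaging per-fold values, is one simple way to keep the estimate defined when a fold happens to contain only responders or only nonresponders. Calling `leaderboard({"additive": fit_additive, "constant": fit_constant}, X, y)` on a matrix `X` of allele counts and a list `y` of 0/1 response labels returns the teams in ranked order.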

The best-performing models, therefore, will have passed a test of performance that is outside the realm of, and complements, traditional peer review. Indeed, this stringent test of method performance can be used as an enhanced way of publication vetting, what we call 'challenge-assisted peer review'. Traditional peer review is essential for ensuring the clarity, originality, contextualization and logical thread of a discrete set of work that is ready to be used by researchers in the form of a published article. However, the complexity of working with omics data—entailing multiple analytical decisions, computational simulations and statistical calculations—means that referees are challenged to follow and check the components of even a traditional research paper. In our Rheumatoid Arthritis Responder Challenge, we will explore the feasibility of enhancing the reliability and transparency of conventional peer review in partnership with Nature Genetics. This can be achieved if the referees and authors of the paper reporting on the best-performing methods in the challenge are willing to leave their comments openly (yet anonymously) on the Synapse platform (Fig. 1). We anticipate that the challenge-based assessment of accuracy will provide an objective metric of performance and a comparison with state-of-the-art analytical methodologies that will greatly enhance the task of refereeing a body of work with more quality control than is currently provided by conventional peer review.

In conclusion, we believe that the Rheumatoid Arthritis Responder Challenge is an apt use of crowdsourcing in human genetics to gain insight into clinical prediction and disease biology. Details of the challenge, including the rules by which the models will be judged, can be found at https://synapse.prod.sagebase.org/#!Synapse:syn1734172.