Open Data Partnerships between Firms and Universities: The Role of Boundary Organizations

Science-intensive firms are experimenting with ‘open data’ initiatives, involving collaboration with academic scientists whereby all results are published with no restriction. Firms seeking to benefit from open data face two key challenges: revealing R&D problems may leak valuable information to competitors, and academic scientists may lack motivation to address problems posed by firms. We explore how firms overcome these challenges through an inductive study of the Structural Genomics Consortium. We find that the operation of the consortium as a boundary organization provided two core mechanisms to address the above challenges. First, through mediated revealing, the boundary organization allowed firms to disclose R&D problems while minimizing adverse competitive consequences. Second, by enabling multiple goals the boundary organization increased the attractiveness of industry-informed agendas for academic scientists. We work our results into a grounded model of boundary organizations as a vehicle for open data initiatives. Our study contributes to research on public-private research partnerships, knowledge revealing and boundary organizations.

'All human genomic sequence information (…) should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society' (Human Genome Project, 1996).

Introduction
The above quote expresses the 'open data' rule that constituted a cornerstone of the Human Genome Project. The disclosure regime of this large-scale research programme was built on the principle of free, unrestricted and timely access to research findings for all interested parties (Murray-Rust, 2008; Molloy, 2011). In the Human Genome Project, public science was pitted against for-profit entities with competing projects based on proprietary intellectual property (Williams, 2010). Yet increasingly firms themselves participate in and even instigate open data initiatives, either by releasing data to academic communities with no restriction or by supporting the generation of open data. Partnerships sponsored by pharmaceutical companies, such as the SNP¹ consortium and the Genetic Association Information Network (GAIN), made their data publicly available (Cook-Deegan, 2007; Pincock, 2007; Allarakhia and Walsh, 2011).
Partnerships with universities, aided by public or charity grants, are natural territory for open data practices, given the prominence that public knowledge creation has in the norms and traditions of academic science (Dasgupta and David, 1994). The propagators of open data in corporate R&D argue that by integrating their R&D programmes more closely with those of open academic communities, firms may reap significant benefits for both the quality and the volume of their innovation activity (Melese et al., 2009).
Nevertheless, participation in open data partnerships with universities is likely to complicate firms' attempts to capture value from research. A first challenge is that firms may fear that proprietary information about their R&D agendas and technologies is publicly disclosed (Alexy et al., 2013), given that open data initiatives operate with minimum intellectual property protection and disclose all research results with no restriction. The second challenge, from a firm's viewpoint, is to motivate outsiders to work on problems that are valuable to the firm, without being able to offer IP-related incentives (von Hippel and von Krogh, 2006; Levine and Prietula, 2014). In other words, in open data initiatives which, unlike traditional firm-sponsored contract research, are strongly aligned with academic conventions, firms may struggle to persuade scientists to work on firm-defined priorities rather than their own personal research agendas.
¹ SNPs are 'single nucleotide polymorphisms'. They indicate possible mutations of a gene, and can be used as disease markers.
Extant research provides limited insight into how firms can address these challenges. The literature on research partnerships between firms and universities is largely focused on contexts with traditional, IP-centred appropriation mechanisms in place (Link and Scott, 2005; Bercovitz and Feldman, 2007) but says little about how open data partnerships ought to be structured and governed.² In this paper, we therefore address the following research question: What partnership characteristics enable firms to benefit from open data collaboration with academic researchers?
To explore how firms overcome the challenges of open data initiatives, we examined the structures and practices of an international life sciences partnership. We present an inductive study of the Structural Genomics Consortium (SGC) which led an open data programme involving firms and academic scientists. Supported by charity, government and industry funding, the SGC brought together pharmaceutical firms including GlaxoSmithKline, Novartis and Merck, with the Universities of Toronto and Oxford, and the Karolinska Institutet (Stockholm). The SGC's mandate was to determine the three-dimensional shape of proteins and release this knowledge into the public domain without restriction. This information is seen as vital to the discovery of new drugs to combat common human diseases, including cancer, diabetes and inflammation.
We draw on our empirical analysis to develop a grounded model of open data in university-industry partnerships. We propose that open data university-industry partnerships structured as boundary organizations (O'Mahony and Bechky, 2008) are particularly adept at generating productive outcomes while mitigating firms' challenges. Boundary organizations accomplish this via two core mechanisms: mediated revealing and the enabling of multiple goals. The former allows firms to reveal their research problems to external problem solvers in a way that reduces the threat of unintended knowledge disclosure and simultaneously allows them to shape the collective research agenda. The latter (in this case the concurrent pursuit of both industrial and academic goals) broadens the objectives and activities of the partnership so that they align with the ambitions and professional practices of academic researchers, which in turn helps to ensure their participation.
Our findings contribute to previous work by considering the implications of open data for both the rationales underpinning research partnerships between firms and universities and questions of organization design. In particular, we demonstrate the role that boundary organizations can play in orchestrating industry-informed, large scale scientific work that has the potential to advance and transform the knowledge commons from which science-based sectors draw.

Open data in university-industry partnerships
Open data partnerships provide universal and free access to research outputs including results, data and sometimes materials (Murray-Rust, 2008; Molloy, 2011). The open data approach contrasts not only with the commercial emphasis on intellectual property rights, but also with classic open science, in which only the final outputs are shared (Boudreau and Lakhani, 2015; Franzoni and Sauermann, 2013). Various scientific communities have recently adopted increasing openness, including the free sharing of data on which outputs are based (Reichman et al., 2011).
This development was partly spurred by the increasingly widespread use of computer code and large datasets, which makes the large-scale sharing of data both feasible and economical (Boulton et al., 2011). The same technological affordance has facilitated 'crowd science' experiments where problem solving is pursued by a large number of dispersed contributors (Franzoni and Sauermann, 2013). Particularly in the life sciences, a further driver of open data is the trend towards larger scale initiatives designed to address the complex, interconnected nature of biological systems, which has tested the limits of the traditional small-scale approach in biology, centred around individual investigators (Swierstra et al., 2013). The Human Genome Project (HGP) absorbed $3 billion of funding and used an open data approach to facilitate coordination across thousands of researchers around the world, as well as the subsequent exploitation of the generated knowledge (Wellcome Trust, 2003). Similarly, the Census of Marine Life project resulted in the Ocean Biogeographic Information System (OBIS) database, the world's largest open access repository of marine life data.
The sharing of data in areas such as genetics, clinical trials and climate science is supported by various types of stakeholders, including research funding organizations, patient groups, interest groups and, not least, academic scientists themselves. They argue that open data enables scientific communities to validate and substantiate the results of previous research and thereby enhance its quality, particularly in areas where conflicts of interest are at play, such as pharmaceutical research (Washburn, 2008).
Below, we first contrast the new open data approaches with traditional approaches in university-industry collaboration and then outline the specific challenges that open data collaborative initiatives create for for-profit firms.

Research partnerships between firms and universities
Research partnerships are innovation-based relationships focusing on joint research and development (R&D) activities (Hagedoorn et al., 2000). Firms engage in research partnerships because they allow investments in the creation of new knowledge to be shared across multiple participants. They also provide firms with access to complementary knowledge, broaden the scope of their R&D, and create new investment options in high-risk contexts (Hagedoorn et al., 2000; Perkmann et al., 2011). Especially in science-intensive sectors such as chemicals and pharmaceuticals, universities represent important partners and sources of innovation for firms (Mansfield, 1991; Cohen et al., 2002). Firms tend to view university research as complementary (rather than substitutive) to internal R&D (Rosenberg and Nelson, 1994; Hall et al., 2001). Access to key personnel represents an additional important motive for firms to work with academia, resulting both in 'information gifts' from highly specialized academics and in opportunities for hiring students and staff (Hagedoorn et al., 2000).
Partnerships are not without challenges. Chief amongst these is the concern that a firm may struggle in appropriating the knowledge outputs generated in the partnership (Teece, 1986). Compared to inter-firm partnerships, such concerns are even more pronounced in university-industry partnerships (Hagedoorn et al., 2000). There are two aspects to this problem. First, firms' efforts to appropriate knowledge arising from partnerships may be misaligned with open science practice. Academics may prefer generating publishable research output and contest the formal requirements involved in creating protected knowledge assets (Murray, 2010). At the very least, this may lead to an uneasy co-existence of open publishing and intellectual property protection (Gittelman and Kogut, 2003). Second, partnerships involving universities often attract grants from government or charities. This means that the universities will in most jurisdictions make ownership claims over intellectual property generated (Kenney and Patton, 2009). Regulations such as the Bayh-Dole Act in the United States stipulate that universities can claim IP ownership over outcomes from government funded research (Mowery et al., 2001). In such a context, the higher the share of public funding in a partnership, the more pronounced firms' concerns about appropriation will become (Hall et al., 2001).
Firms respond in several ways to the appropriability challenges pertaining to partnerships with universities (Panagopoulos, 2003). First, firms tend to prefer larger collaborations when public partners are involved. In this case, appropriability has already been diminished because the presence of a larger number of private and public partners not only entails the shared ownership of intellectual property, but also increases the risk of unintended knowledge spill-overs (Link and Scott, 2005). Also, participation in larger partnerships carries a reduced cost, and hence implies an improved balance between risks and rewards (Saez et al., 2002). Second, firms are often given the first right of refusal for licensing intellectual property arising from a partnership (Perkmann and West, forthcoming). Firms can then access the results from joint research under conditions that were determined ex ante, which reduces uncertainty relating to the appropriation of partnership outputs. Third, firms may choose partnerships in 'pre-competitive' areas where intellectual property appropriation is less important than alternative benefits. For instance, partnerships with universities can help firms develop new areas of expertise from which they may subsequently benefit (Powell et al., 1996).
Extant research on university-industry partnerships has mostly focused on the question of "primary" appropriability, that is the control and ownership of intellectual property created within the partnership (Ahuja et al., 2013). This focus is mirrored in universities' efforts to assert ownership of the outputs from research collaborations (Kenney and Patton, 2009).
Against this background, we lack insight into the benefits accruing to firms from partnerships that entirely relinquish intellectual property in the first place. For firms, open data policies pose a conundrum: On the one hand, the absence of intellectual property rights makes it difficult to gain returns to investments, yet on the other hand the sheer scale of these collaborative efforts makes them too important to ignore, particularly if large numbers of scientists are potentially available to work on topics of interest to firms. Next, we discuss the considerations relevant for firms with respect to participation in open data initiatives.

Challenges facing firms in open data research partnerships
Compared to conventional research partnerships, open data partnerships pose two significant problems which may temper firms' motivation to engage in such initiatives. The first is that of revealing: the more a firm attempts to align the efforts in an open data research programme with its R&D priorities, the more it will have to reveal about the problems it is addressing within its proprietary R&D. Revealing has both advantages and disadvantages for firms. They may benefit from revealing problems or solutions as this may allow them to shape the collaborative behaviour of others, and thereby enhance their competitive position (Alexy et al., 2013). The reaping of such benefits has been documented for various contexts, including mining during the industrial revolution, 19th century iron production, and contemporary embedded Linux software (Allen, 1983; Nuvolari, 2004; Henkel, 2006).
Revealing information about their technologies may also discourage others from competing in the same technology areas (Clarkson and Toh, 2010). Yet, by guiding the academic community to address specific scientific problems, a firm discloses information about its active R&D areas to its competitors (Arrow, 1971; Cohen et al., 2000). Overall, in an open data scenario, while an excessive degree of 'problem revealing' (Alexy et al., 2013) […] (Murray and O'Mahony, 2007). Since academic scientists are embedded in an academic status hierarchy and career system that differ considerably from private sector R&D, monetary incentives are unlikely to be effective. The primary objective for many participating researchers will be to improve their standing and position in their chosen academic community, even at the expense of pursuing commercially valuable opportunities or personal monetary gains (D'Este and ). An open data initiative will have to provide suitable incentives that are aligned with academic scientists' desire to be rewarded for their work within their respective communities.
Having outlined the challenges for firms arising in open data partnerships, in this study we will explore how they should be organized to enable firms to address the challenges while garnering benefits from the partnership.

Site: The Structural Genomics Consortium
We studied the Structural Genomics Consortium, a major initiative with laboratories at the Universities of Oxford and Toronto, and the Karolinska Institutet (Stockholm), established in 2004. The consortium was funded by the Wellcome Trust, pharmaceutical companies GlaxoSmithKline, Novartis and Merck, government organizations, and several smaller foundations. The Wellcome Trust is one of the world's largest medical research foundations, and the participating firms were in first, third, and sixth position respectively, for global market share of prescription drugs in 2012.³ The SGC's objective was to identify the three-dimensional shape of thousands of human proteins with potential relevance for drug discovery. The physical shape of proteins affects how they interact with other molecules in the human body. Thus, knowledge of proteins' structural characteristics can aid the discovery of new drugs and exploration of the molecular mechanisms that underpin them.
The pharmaceutical industry provides an ideal setting to study how firms implement open data initiatives and overcome their challenges. Because in this sector proprietary intellectual property has traditionally played a strong role, potential tensions arising from open data were likely to be particularly accentuated.

Data collection
We used an inductive, qualitative approach to study the SGC which is suitable for the in-depth exploration of phenomena that are not well understood (Eisenhardt, 1989). Data collection involved studying archival documents, interviewing and observation. Our archival documents were drawn from the official minutes of 16 meetings of key SGC bodies, including the Board of Directors, held between 2005 and 2007 (see Appendix A). The minutes provide records of the organization's activities and decisions but also allowed the inference of more subjective agendas and interests of various participants. We also perused additional SGC documents including the Memoranda on Articles of Association, the Funding Agreement, annual reports, press communications and presentations. The total word length of all documents is approximately 100,000.
We further conducted 22 semi-structured interviews with SGC staff, members of the board and the scientific committee, senior management, and scientists. The interviews covered more than half of the individuals involved in the governance and management of the SGC, a sample of the researchers, and an external observer. We asked informants to provide us with their version of the SGC's origins and history as well as their own motives, objectives and role within the consortium. We also requested they describe key organizational processes, specifically those relating to aspects of revealing and motivation. All but four interviews were recorded and transcribed verbatim (see Appendix B). Triangulation with archival meeting minutes allowed us to control for potential self-reporting and retrospective bias in the interview evidence.
A third set of data was based on observations and informal conversations in London and Toronto, and by phone, between 2007 and 2011. The informal discussions were with SGC managers, sponsors' representatives, and external observers, including some critics. The first author attended three SGC workshops held in 2007 and 2011, and had numerous informal conversations with participants as well as outsiders. After each interview or conversation, we created a memo, summarizing insights and exploring avenues for theorizing. We sought to obtain external validity by triangulating information across multiple sources, spanning insiders and outsiders.

Data analysis
Our inductive analysis proceeded in several steps. We first generated a case narrative, depicting the SGC's operating context, the organization's development, and its structures and practices. We used this account to generate a 3200-word report which we sent to all interviewees. Two respondents provided detailed feedback and corrected factual mistakes while others provided cursory feedback.
Using the qualitative data analysis software NVivo, we conducted an initial round of first-order (open) coding (Corbin and Strauss, 2008) on all archival documents, interview transcripts and memos. Guided by our research question, we coded the activities of the SGC with respect to how these addressed the key challenges associated with open data. Examples of first-order codes included 'pharma members' ability to nominate targets' and 'making allowance for scientific curiosity' (see sample extracts in Table 1). We validated codes by ensuring they emerged from multiple instances; otherwise we discarded them.
We next moved to second-order (axial) coding and established relationships between the open codes by searching for connections between them. For instance, we grouped the first-order codes 'pharma members' ability to nominate targets' and 'pharma members shape SGC strategy to focus on targets relevant for drug discovery' to form the second-order category of 'enabling firms' influence on research agenda'. Throughout, we constantly moved backwards and forwards between our evidence and the emerging categories, helping us to render our results as robust as possible. Our final step was to work our second-order codes, which were still fairly close to the phenomenon, into a grounded theory model that abstracts from the specificities of our case and posits theoretical mechanisms potentially applicable to a wider range of empirical situations.
Below, we first present our raw findings, reflecting the results of our first-order and second-order coding exercise, before presenting our grounded model in the subsequent section.

History and features of the Structural Genomics Consortium
During the 2000s, there was an increasing recognition in the pharmaceutical industry that its research productivity was slowing. Despite escalating R&D expenditure, the number of novel drugs failed to rise proportionately (Paul et al., 2010). 'Big Pharma' responded by reducing R&D expenditure and engaging in external collaboration (Garnier, 2008; Schuhmacher et al., 2013). In particular, public-private partnerships appeared attractive as many industry insiders believed that by relying on public science they could reduce the high failure rates in drug development (Munos, 2009). […] (Edwards, 2008). The funders of the consortium believed that this approach would accelerate the collective creation of knowledge underpinning the discovery of new drugs. Edwards warned that 'the predominant methods of drug research are too patent heavy, leading to duplicated effort and lost opportunities for significant productivity (…) Intellectual property is killing the process of drug discovery' (SGC press communication). Whilst close interaction with academia had been common in the pharmaceutical industry for many years (Cockburn et al., 1999), the SGC model differed from traditional collaboration: participants were committed to relinquishing intellectual property ownership; collaboration among competing pharmaceutical firms was emphasized; and research funds were sourced internationally via an organization that maintained flexible ties with universities.
In Table 2, we provide a summary of the interests held by each type of consortium participant, and their perceived benefits. While the table illustrates that the various parties' agendas partly diverged, it also shows that each derived benefits from participation.
We use the term 'participants' to refer to individuals involved in the SGC, including its lead scientists (CEO and chief scientists), members of the board of directors (the 'directors'), members of the scientific committee and the scientists working for the SGC. When we speak of the SGC as an organizational actor, we refer to the collective actions of its principal officers (the lead scientists), the directors and the scientific committee members.

Revealing and confidentiality
The Wellcome Trust was the largest contributor of funding, and the pharmaceutical firms contributed approximately 5% each.
As part of our investigation, we explored how the pharmaceutical companies aligned SGC's research with their own proprietary R&D activities and the implications this had for their revealing behaviour. The challenge was to steer the SGC's work towards areas of importance to a firm but simultaneously avoid detailed problem revealing to the extent that it would benefit competitors (Alexy et al., 2013). Within the SGC, this challenge was addressed in two ways. First, the SGC created procedures through which the participating firms could influence its work programme. Second, the SGC designed these procedures in a way that restricted information spill-overs.
Enabling firms' influence on the research agenda. The consortium designed a decision making process that enabled the pharmaceutical companies, like the other sponsors, to shape the organization's research programme. The process entailed compiling a list of proteins to be resolved by the SGC scientists. Every sponsor was allowed to submit a 'wish list' of 200 targets of which 20 could be designated as priorities. The nominations were examined by the scientific committee, which produced a master list of proposed targets for final approval by the board. At the time of our study, the list included a few thousand proteins.
For the pharma companies, the ability to shape the research agenda was a critical objective. They were keen to focus the SGC's work on proteins that were likely to be relevant for human health, rather than those that were of more general scientific interest. The focus on such 'human targets' was attractive because of their potential for informing the development of new drugs.
Maintaining confidentiality. The wish lists proposed by the SGC's pharma members contained those proteins regarded as important for their R&D activities. Revealing these lists openly may have allowed competitors to infer a company's R&D priorities. So, despite its insistence on openness in many other respects, the SGC kept these lists confidential even from the board of directors and scientific committee; in this way, a sponsor's interest in a particular protein was never revealed to another sponsor. The office of the CEO combined the individual wish lists into an anonymous 'master list' that was circulated among the management, board of directors and the scientific committee.
As an additional safeguard, the master list was not publicly disclosed. Only when a target protein was resolved was its structural information openly deposited; the identity of proteins that the consortium failed to resolve was never disclosed. The confidentiality requirements also extended to parties collaborating with SGC scientists: external collaborators were required to sign a confidentiality agreement. The confidentiality formula was regarded as a decisive benefit by the private-sector sponsors, particularly with respect to a small number of high-priority targets. The presence of this rule meant that pharma members were prepared to entrust the consortium with more of their sensitive high-priority targets than they would have been had the wish lists or the master list been publicly disclosed.
The confidentiality formula was maintained even though it was criticized by some academic outsiders. Because it was not known which proteins the SGC was working on, these individuals feared that they may be expending parallel effort on resolving the same proteins and that the SGC would likely succeed in doing so first because of its economies of scale.
Even in the face of such criticism, the SGC leadership maintained that this trade-off was necessary in order to maintain the partnership with corporate sponsors. Exceptions to the confidentiality formula were only made when the public interest overrode confidentiality concerns, as for instance when targets related to malaria were prioritised for global health impact, or when the information was an essential part of journal articles to be imminently published.
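The wish-list procedure described above can be illustrated with a minimal sketch. It is not a description of the SGC's actual systems: the class and function names, the alphabetical ordering of the master list and the omission of priority flags from the merged list are assumptions made purely for illustration. Only the limits of 200 targets and 20 priorities per sponsor, the anonymous merging, and the release of resolved structures alone are taken from the account above.

```python
from dataclasses import dataclass, field

MAX_TARGETS = 200     # wish-list limit per sponsor, as reported above
MAX_PRIORITIES = 20   # priority limit per sponsor, as reported above


@dataclass
class WishList:
    """A sponsor's confidential nominations (hypothetical structure)."""
    sponsor: str                       # seen only by the intermediary
    targets: list[str]
    priorities: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if len(self.targets) > MAX_TARGETS:
            raise ValueError("wish list exceeds target limit")
        if len(self.priorities) > MAX_PRIORITIES:
            raise ValueError("wish list exceeds priority limit")
        if not self.priorities <= set(self.targets):
            raise ValueError("priorities must come from the wish list")


def build_master_list(wish_lists: list[WishList]) -> list[str]:
    """Merge all nominations into one list that carries no sponsor names.

    Sorting is an illustrative assumption: it removes any ordering that
    could hint at which sponsor nominated which target.
    """
    merged: set[str] = set()
    for wish_list in wish_lists:
        merged.update(wish_list.targets)
    return sorted(merged)


def public_release(master_list: list[str], resolved: set[str]) -> list[str]:
    """Disclose only resolved targets; unresolved ones stay confidential."""
    return [target for target in master_list if target in resolved]


# Usage with placeholder sponsors and target names
lists = [
    WishList("Firm A", ["P1", "P2", "P3"], {"P1"}),
    WishList("Firm B", ["P2", "P4"], {"P4"}),
]
master = build_master_list(lists)
print(master)                          # ['P1', 'P2', 'P3', 'P4']
print(public_release(master, {"P2"}))  # ['P2']
```

In this sketch, a target nominated by several sponsors appears only once in the master list, so neither the scientists nor the other sponsors can infer who requested it or how many sponsors did so.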

Motivating academic researchers
The SGC pursued several strategies for attracting academic scientists to participate in its endeavour. Some of these were aimed at the SGC-internal scientist community while others were tailored for engagement with external scientists.
Promoting academic goals. The SGC encouraged its academic workforce to engage in research beyond mapping the proteins on the master list. Such activities were not always aligned with sponsoring firms' interests, but facilitated the career progression of participants and increased the prominence of SGC as a research institution.
First, SGC scientists were encouraged to pursue 'follow-on' research on the characterized proteins and publish results in peer-reviewed articles. This meant studying how they linked to and reacted with other molecules, such as inhibitors. The investigation of these mechanisms was seen in the academic community as more demanding and interesting than resolving the protein structures per se. On one occasion, the SGC leadership reduced the number of proteins to be resolved by 15% each quarter. This measure was enacted to 'enable [the SGC staff] to utilize their intellects fully' by increasing the time at their disposal to pursue personal research programmes, and thus achieve 'higher overall scientific impact' of the SGC (d4). This was thought to improve staff retention because it enabled the researchers to publish more high impact articles and thereby improve their career prospects in academia.
Second, the SGC allowed the scientists to tackle proteins that were not on the master list, even though the SGC had solicited suggestions from the academic community when it compiled the list. The freedom to explore structures outside the master list was granted when it promised to add 'significantly to scientific understanding' with a prospect of academic publications. At one site, 17 of its 114 resolved structures were outside the master list and had been chosen by the researchers themselves. These structures were of less immediate interest to drug discovery but of more general scientific relevance. In effect, the SGC provided its scientists with organizational slack so they could pursue their scientific curiosity, leading not only to publications, but also to increased knowledge and capabilities.
Adopting academic practices. The SGC sought to emulate core features of academic environments even though strictly speaking it was autonomous from academia. The SGC did not employ researchers directly but disbursed funds to universities so they could in turn employ the researchers using terms and conditions familiar in academic contexts. The SGC also sponsored academic activities, such as seminars and visiting scholarships, and it actively encouraged collaboration between its scientists and other researchers at its own university sites and in other universities (i16). In September 2009, SGC Oxford had 70 collaborating researchers and SGC Toronto had 37 active collaborations (d23). The SGC also ensured its staff could obtain honorary appointments within the respective departments at the universities hosting the SGC laboratories. The academic outlook of the organization was reinforced by the fact that while industrial sponsors shaped the overall work agenda through their boardroom representation, they had minimal impact on the day-to-day pursuit of the research.
Overall, the design of the SGC as an autonomous organization meant that it was able to provide a work context that differed little from traditional academic settings. By using these practices, the SGC could attract high calibre researchers who used their employment to underpin a career in mainstream academia. The important overall insight is that the SGC achieved this by broadening the agenda from a pure industrial focus to accommodate both industrial and academic goals.

A model of open data partnerships between firms and universities
The Structural Genomics Consortium shares many features with organizations characterized in previous literature as boundary organizations. These stand between parties with divergent interests and allow them to collaborate (Guston, 2001;O'Mahony and Bechky, 2008;Miller, 2001;Howells, 2006).

Mechanisms enabled by the boundary organization
Generalising our findings, we identify two key mechanisms by which boundary organizations enable open data partnerships (see Figure 1). The first is 'mediated revealing', in which an intermediary aggregates and anonymizes information before it is passed on to a different party. The SGC aggregated firms' wish lists of proteins, and compiled a master list for the scientists that did not disclose which target was nominated by whom. Through this mechanism the firms fundamentally shaped the direction of work pursued by the academic researchers in the consortium and their external collaborators, without revealing which proteins specifically they were interested in. Previous work has pointed out that firms use 'selective revealing' (Henkel, 2006;Alexy et al., 2013) as a means of balancing the benefits of disclosing information to externals against the risk of this information being used adversely by their competitors. While in selective revealing firms' information is directly exposed to rivals, mediated revealing establishes an additional safeguard by inserting a boundary organization between firms and externals. Under this arrangement, information that may be too sensitive to be disclosed directly to externals can instead be disclosed to an intermediary. This in turn increases the potential benefits from revealing as firms are prepared to share their more important problems.
Mediated revealing. Mediated revealing requires that the interacting parties trust the boundary organization. Trust refers to the confidence the involved parties have that an actor will adhere to mutually agreed lines of action (Nooteboom, 1996). The concept of 'trusted intermediary' (Rai et al., 2008) aptly encapsulates the specific role of the SGC. In information systems, trusted intermediaries allow system owners to ensure that certain types of information are separated from others (Pavlou and Gefen, 2004). The SGC played an equivalent role by brokering information between firms and academic researchers. Like many trusted intermediaries, its mission was confined to pursuing a specific objective: managing an open data research agenda. By confining its activities to this focused objective, and keeping its distance from the organizations involved, the SGC succeeded in acquiring trust.
The kind of boundary organization characterized above shares some features with specialized innovation intermediaries that 'crowd-source' problem solutions via broadcast search and innovation contests (Jeppesen and Lakhani, 2010) or orchestrate the trading of knowledge (Dushnitsky and Klueter, 2011). Like the SGC, these intermediaries may anonymise the identity of the problem owner and they are trusted by the revealing parties.
However, they do not practice open data and often require the problem solvers to assign their rights to the problem owner for a reward. Unlike these entities, boundary organizations engaged in open data initiatives face the additional challenge of having to engage potential innovators by offering non-monetary incentives. This consideration connects with issues of motivation addressed below.
Enabling multiple goals. The second key mechanism consists in enabling multiple goals. In an open data scenario, the benefits firms derive from openly generated scientific knowledge will depend on whether they can pique the interest of academic researchers in the topics they propose. We suggest that boundary organizations can accomplish this by enabling multiple goals to be pursued in the context of the collaboration. While for many of the SGC's activities the interests of academia and industry were aligned because the scientifically interesting proteins were also those likely to inform drug discovery, the SGC allowed some activities to be of purely academic interest. In other words, while goals sometimes overlapped, this was not always and not necessarily the case.
The SGC resolved goal conflicts by allowing multiple goals to co-exist instead of optimizing its activities and costs around either purely industrial or purely academic goals.
The SGC pursued the sponsor firms' primary goal of mapping the protein structures but also encouraged the pursuit of goals concomitant with academic science: research driven by curiosity and academic publishing (Owen-Smith, 2003). Goal co-existence allowed the SGC to attract high-calibre academic scientists and, as a second-order benefit, helped it connect with academic collaborators further afield. While the concurrent pursuit of multiple partially incompatible goals may be seen as ineffective (Simon, 1964), it allows a boundary organization to garner more financial resources and access a wider spectrum of human capital.
Boundary organizations are well suited to enable the parallel pursuit and alignment of separate goals because their interstitial position allows them to manage operations by maintaining the social boundaries between the different participants, thereby avoiding potential conflicts that could arise from direct coordination efforts (Lamont and Molnar, 2002). A further benefit of maintaining social boundaries is that a boundary organization has a relative prerogative over how resources are allocated and how production is controlled (O'Mahony and Bechky, 2008), allowing for resources to be earmarked for the pursuit of different goals. Finally, boundary organizations are likely to be familiar with the norms and practices prevalent in the different domains in which their stakeholders operate. In our case, this allowed the SGC to build organizational procedures that created a context familiar to academic researchers and simultaneously remain aware of the industrially led priorities advocated by the firms. While these capabilities of boundary organizations have been characterised by previous research, the novel insight emerging from our study is that boundary organizations can also act as trusted information brokers that aggregate and selectively distribute information. In the case of the SGC, this function, which underpinned mediated revealing, was a crucial ingredient of the organization's role in enabling open data with industry involvement.

Implications for university-industry partnerships
By coordinating mediated revealing and enabling multiple goals, boundary organizations such as the SGC provide an organizational solution to the challenges that firms face when seeking to instigate and shape large-scale open access initiatives. They help to attract high-calibre academic scientists to contribute to scientific grand challenges because they can reach a larger 'workforce' of academic researchers than conventional, un-mediated partnerships, and thereby achieve enhanced economies of scale. By using the potential of open data to draw on larger 'crowds' of scientists (Franzoni and Sauermann, 2013), firms may participate in shaping the knowledge commons of industries or sectors in a way not achievable by conventional research collaborations.
Previous research has found that participating in the development of open science can be an important motive for firms to engage in collaboration with universities (Powell et al., 1996;Cohen et al., 2002;Murray, 2002;Simeth and Raffo, 2013). Our study contributes to this body of work in two ways. First, we provide a framework for understanding the conditions under which firms would participate in open data, which, compared to open science, represents a more radical arrangement in terms of publicly sharing information and results.
Second, we emphasize the nexus between the firms' ability to appropriate benefits from public science and their ability to shape its course. In the face of the generally inverse relationship between the extent to which firms shape a research programme and their willingness to share the results publicly, our model suggests how boundary organizations can moderate this relationship in ways that allow firms greater influence over public research programmes without compromising their commercial interests.
There are, however, boundary conditions that will limit the applicability of open data partnerships. First, the described benefits will apply particularly to open data initiatives involving multiple firms. A single firm wishing to attract academic scientists can ensure confidentiality by setting up suitable contracts with specific universities or groups of universities. When several firms are involved, however, they face the challenge of having to disclose potentially sensitive information to each other. Here a boundary organization provides an organizational solution that protects firms' sensitive information from being disclosed to each other. The more competitive the research context is, the more relevant boundary organizations and mediated revealing will become for accomplishing joint open data initiatives.
In contrast with proprietary research collaboration, the ability to benefit from open data is particularly contingent upon time-based competition and superior complementary assets (Teece, 1986). [...] impact of public science certainly within its own boundaries (Rhoten and Powell, 2008;Rai et al., 2008;Kenney and Patton, 2009). In this way, they can mitigate the delay or reduction of follow-on academic research and product innovation caused by the presence of intellectual property protection (Murray and Stern, 2007;Williams, 2010).
Our study demonstrates that the benefit of enhanced cumulative innovation provided by open data can be achieved even when for-profit firms are involved. Furthermore, one may plausibly argue that the involvement of industrial partners increases the relevance of the science produced for tackling societal challenges such as the development of new drugs.
While industry often fails to cover areas of societal need, as exemplified by the failure in orphan drugs, the involvement of science users in shaping the agenda for scientific challenges will promote the creation of knowledge potentially instrumental for developing future innovations. Simultaneously, the risk of publicly funded research being captured by particular industry interests is reduced because no intellectual property is produced and no privileged access to research results is provided. In this sense, the SGC exemplifies a way in which large-scale, decentralised and open scientific collaborations can be made more impactful by involving industry participants.