Simulating the Social Processes of Science

An introduction to the Special Issue of the JASSS Forum. Science is the result of a substantially social process. That is, science relies on many inter-personal processes, including: selection and communication of research findings, discussion of method, checking and judgement of others' research, development of norms of scientific behaviour, organisation of the application of specialist skills/tools, and the organisation of each field (e.g. allocation of funding). An isolated individual, however clever and well resourced, would not produce science as we know it today. Furthermore, science is full of the social phenomena that are observed elsewhere: fashions, concern with status and reputation, group-identification, collective judgements, social norms, competitive and defensive actions, to name a few. Science is centrally important to most societies in the world, not only in technical, military and economic ways, but also in the cultural impacts it has, providing ways of thinking about ourselves, our society and our environment. If we believe the following: simulation is a useful tool for understanding social phenomena, science is substantially a social phenomenon, and it is important to understand how science operates, then it follows that we should be attempting to build simulation models of the social aspects of science. This JASSS forum presents a collection of position papers by philosophers, sociologists and others describing the features and issues the authors would like to see in social simulations of the many processes and aspects that we lump together as "science". It is intended that this collection will inform and motivate substantial simulation work as described in the last section of this introduction.

behaviour under different conditions of Hirsch's h-index for measuring research output and impact based on citations.
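For reference, Hirsch's h-index is the largest number h such that an author has h papers with at least h citations each. A minimal sketch of that definition follows; the function name `h_index` and the plain list-of-citation-counts input are illustrative assumptions, not taken from any of the models discussed here.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    `citations` is an iterable of per-paper citation counts (an assumed,
    illustrative input format).
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # later papers are cited even less, so h cannot grow
    return h
```

A simulation that tracks citations per paper for each simulated author could call such a function at every time step to study how the index behaves under different conditions.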

2.3
Whereas Simon's (1957) urn model simply generated a frequency distribution for papers per author, Gilbert (1997) represented individual academic papers with references to past papers and their content. Using two continuous variables to represent paper topics, his model depicts an academic field as a two-dimensional plane. Subfields appear within this model as clusters of points. The TARL model ('Topics, Aging and Recursive Linking') of Börner et al. (2004) represents both authors and papers, including references and 'topics' for papers, and generates network data. The behaviour of scientists publishing within academic fields has been compared to heuristic search (Bruckner et al. 1990; Scharnhorst and Ebeling 2005; Chen et al. 2009). Weisberg and Muldoon (2009) also employ landscape search as a model for science.
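To make the contrast concrete, an urn-style process of the kind associated with Simon can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a reproduction of Simon's (1957) specification: each new paper is written by a new author with a fixed probability (`p_new_author`, a hypothetical parameter) and is otherwise assigned to an existing author with probability proportional to that author's current output. All function and parameter names are illustrative.

```python
import random
from collections import Counter


def simulate_simon_urn(n_papers, p_new_author=0.3, seed=0):
    """Simulate a simplified Simon-style urn process.

    Returns a Counter mapping author id -> number of papers.
    Assumes n_papers >= 1; p_new_author is an illustrative parameter.
    """
    rng = random.Random(seed)
    paper_authors = [0]   # author id of each paper; the first paper is by author 0
    n_authors = 1
    for _ in range(n_papers - 1):
        if rng.random() < p_new_author:
            # A new author enters the field with a first paper.
            author = n_authors
            n_authors += 1
        else:
            # Picking a past paper uniformly at random selects an existing
            # author with probability proportional to their paper count.
            author = rng.choice(paper_authors)
        paper_authors.append(author)
    return Counter(paper_authors)


def papers_per_author_distribution(counts):
    """Return a Counter mapping k -> number of authors with exactly k papers."""
    return Counter(counts.values())
```

Calling `papers_per_author_distribution(simulate_simon_urn(10000))` yields the kind of frequency distribution such a model produces. Note that the only state is a count per author; papers have no content or references, which is what models such as Gilbert's add.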

2.4
While fairly simple, Gilbert's model inspired models in different areas (e.g. Boudourides and Antypas 2002) and using different approaches. Sun and Naveh (2009) extended the model by giving the scientist-agents a learning mechanism and the ability to select which areas they choose to work in. Watts and Gilbert (forthcoming) added a representation of the influence of academic journals and their referees to the mix.

2.5
Edmonds (2007) developed a model in which scientists are represented as theorem provers, generating new theorems by inference from existing premises. In this model, as in those of Ahrweiler (1998), Ahrweiler and Wolkenhauer (1998), Weisberg and Muldoon (2009) and Grim (2009), there is an attempt to model an explicit epistemic landscape in which some locations are harder to discover than others (see also Watts and Gilbert, forthcoming).

The current state of the art

3.1
Mathematical models of science that could form the basis for a simulation appear occasionally at conferences such as the annual meeting of the Society for Social Studies of Science (4S) or the conference of the International Society for Scientometrics and Informetrics (ISSI). They can also be found at conferences on network science, statistical physics [1], sociology or even computational philosophy [2]. Few of these come with systematic simulation-based experiments: the community of quantitative science research has embraced simulation as an independent research method even less than it has embraced dynamic mathematical models. It is therefore not surprising that simulations of the science system tend to appear at the interfaces between communities that do use simulation as a method, such as sociology and (somewhat unexpectedly) philosophy. Given that current science studies, science of science, and science and technology studies are scattered among very different schools of thought and historic traditions, it is also unsurprising that simulation research carried out in one school has not been taken up in others. The result is the present situation, in which there is relatively little simulation work and what exists is isolated.

3.2
A recent book, Models of science dynamics (Scharnhorst et al., forthcoming), collects review articles about different formal modelling techniques, including epidemic and opinion dynamics models of idea diffusion, evolutionary game theory on complex networks, and network analysis of co-authorship and citation networks. One chapter of this book (Lucio-Arias and Scharnhorst 2011) conducts an algorithmic historiography (historical research by means of bibliometrics) of mathematical models of science. It shows that, despite a growth in publishing activity in this area, the recent threads of (mathematical) models do not refer to each other but remain isolated; this is also the case for existing simulations of the social processes of science. Analytic mathematical approaches predominate in the book. Simulation appears, if at all, on the sidelines, and is not discussed as an independent research method.

3.3
However, there is one chapter (Payette, forthcoming) that surveys agent-based models of science (including some of those mentioned above) and proposes a new one based on Hull (1988). The chapter also describes some general properties of agent-based models, focussing on those listed in Epstein (2006). Payette quotes Epstein: "The main desideratum is that the notion of 'local' be well posed." (2006, p. 6) and "If you didn't grow it, you didn't explain it." (2006, p. 51). These two principles can be seen as encapsulating the goal of explaining (growing) the macro from the micro (the local). By comparison, there has been quite a lot of interest in either the micro (the individual scientist) or the macro taken separately.

3.4
On the macro side there is a growing demand for mathematical models of science to infer the initial and boundary conditions of "good" scientific activity, as well as to forecast the broad development of the science system. However, the need is not for an abstract mathematical foundation but for very practical, empirically grounded scenario development that can inform policymaking. Such efforts could include a broad range of techniques, including visual experiments (for instance laying out evolving networks or overlapping knowledge diffusion processes to form global science maps as done, e.g., by Rafols et al. 2010), empirical validations and the simulation of theoretical assumptions. There is now a stream of research that seeks to capture aggregate patterns in the traces left by science (citation analysis, co-authorship patterns, and various other distributions observable in terms of publishing or patents). These have the potential to aid the validation of simulation models, as demonstrated in Gilbert (1997) and Sun and Naveh (2009).

3.5
On the micro side there was a stream of research on the borders of artificial intelligence and philosophy of science that modelled how a single scientist might reason and induce new hypotheses (e.g. Holland et al. 1989, Thagard 1993). These models did not consider social aspects of cognition or behaviour, and since then the modelling of individual behaviour has lost out to the macro side. Recently, however, there has been a renewed focus on the individual. This "return of the actor" has been due to new ways to trace scientific authors in bibliographic databases and on the web (e.g. Thomson Reuters' ResearcherID). It shadows a trend in social network analysis to elaborate the role and content of nodes, allowing them more active behaviour, with the dynamics of the interactions along network links becoming more evident, as well as a new focus on how networks themselves might be changing.

3.6
The only technique currently available to link these micro and macro sides that does so in a precise and replicable manner, open to detailed critique and step-wise improvement, is agent-based simulation. The necessity for this approach can be seen as a result of the social embeddedness of interaction in science (Granovetter 1985): if we reduce our models of science to only the macro (essentially reducing the interaction to some global relationships between factors plus noise) or only the individual (ignoring social effects), then we shall miss substantial parts of the story.

3.7
This turn back to the actor, combined with the turn towards time, dynamics and complexity in science and philosophy of science studies, along with the development of social simulation, provides a fertile academic background for this initiative. The possibilities of the semantic web to provide a systematic empirical basis for scholarly communication, and the increasingly sophisticated analyses of networks, are providing more ways to validate simulations. Agent-based simulations of social processes are able to incorporate lessons from qualitative social science studies of what scientists actually do on a day-to-day level as well as insights from the more naturalistic philosophers of science.

3.8
To summarise, there are relatively few existing simulations of the social processes that occur in science, but the ground is now ripe for these. We are aware of the start of a steady stream of papers on such simulations, including upcoming work by the following (in no particular order):

The contributions

4.1
Answering the call for position papers, the following sixteen contributions (in alphabetical order) were submitted:
To assist scientific discourse, Ahrweiler opts for a combined language- and behaviour-based framework for modelling theory networks in science, which looks at theories as competing and cooperating agents working on scientific domains.
Balzer and Manhart emphasise the difference between scientific processes and processes in science, and explain how the incorporation of scientific theories in social simulations could lead to more united structural approaches.
Barreteau and Le Page outline the complex dynamics, especially micro dynamics, involved in participatory research methodologies, and show how social simulation can help to address these issues.
Chattoe-Brown identifies two challenges for simulating science: firstly to develop a "dynamic concept network" representation of scientific knowledge on which learning systems intended to model the scientific process can be compared; and secondly to develop an effective approach to providing data for a simulation of the scientific process.
Collins starts from the demarcation problem, asking what science actually is, which leads to a range of difficulties for simulation, and puts forward three recommendations about how to deal with the issue.
Doran suggests a generic long-term science model where science is a set of processes by which a community of individuals uses reliable methods to obtain reliable understanding (scientific knowledge) of itself and its environment over time.
Edmonds surveys the observations and conclusions of some philosophers of science that might be relevant to a social simulation of science, observing that philosophers of science have not focussed much on the dynamic, social and complex aspects of science, which illustrates the need for simulations.
Taking the example of robotics as a domain, Francisco, Milojevic and Sabanovic model conferences as venues in which social, cognitive, and institutional practices of science are performed and which provide a basis for analysis bridging local and system-level features of science.
Meyer addresses the question of how to design good social simulation models of science building on stylised facts of science derived from bibliometric studies.
Moelders, Fink and Weyer combine a Luhmannian systems perspective with a model of decision making of individual actors embedded in a socio-political context ("new public management of science") to reconstruct and analyse how the science system works.
Parinov and Neylon discuss how virtual research environments influence the social processes of science and how, building on social simulation insights, these systems could be designed to be more efficient and effective in supporting scientific communities.
Payette conceptualises an agent-based model of the social processes of science that contains researchers who are organised in heterogeneous networks and who work on different domains communicating directly or through publications.
Squazzoni and Takács argue for social simulation of the scientific peer review system, which is under increasing strain due to exploding demand, is under-investigated compared to its importance, and is in need of revision and innovation itself.
Thorngate applies a fundamental observation to the science field, namely that psychological factors such as competition for attention influence the social processes involved in the evolution of science such as the review process for journal papers.
Yilmaz addresses general issues of workforce dynamics and applies them to science introducing various models while asking what produces successful scientists, and what identifies areas for additional research.
Zollman points out that it is unknown how the imperfections of individual researchers impact upon the overall efficacy of science. He poses five key questions that have real and substantial bearing on the management and understanding of science, each of which could be the goal of a modelling programme.

5.1
The aim of this collection of position papers is to motivate and challenge those in the social simulation community to attempt simulation models of the social aspects of science. The issues raised and the directions indicated in these papers should help inform and guide these attempts. We hope that any models developed in response will:
Bridge the micro-macro gap in some way, that is, establish explanations that link macro-level outcomes to the micro-level behaviour of individuals, and vice versa.
Be motivated, in their conception and design, by this collection of papers.
Include some indication of how they might be checked and/or validated.

5.2
After a suitable time, we (the authors of this introduction) will organise a workshop for the discussion of papers that respond to this collection. Responses which present credible simulations will be centre stage at this event, but others will also be involved. The idea is that it should be a forum to present and discuss these simulations in an extended manner, and thus motivate the production of more and better simulations in the future. We hope to eventually publish a set of papers that describe these.

5.3
Thus we call for contributions to this project from all fields, but especially from social simulation and science studies, and look forward to the workshop in one to two years' time.