Scientific software development viewed as knowledge acquisition: Towards understanding the development of risk-averse scientific software
Introduction
Scientific software development has been characterized as end-user programming (Segal, 2004), considered a candidate for Agile iterative development (e.g., Ackroyd et al., 2008), and regulated with waterfall-style software quality development standards (Canadian Standards Association). Scientists themselves characterize their development approach as “a-methodical” (Truex et al., 2000). This confusion of views of scientific software development hampers the creation of useful and usable tools, quality standards, and software development paradigms for scientists. Our aim in this paper is to (1) describe the common characteristics of the scientific software developers that we encountered in our studies, (2) argue that these scientists do not fall under the definition of “end-user programmers”, (3) offer, based on our studies, a different model of what drives software development by these scientific software developers, and (4) provide a new understanding of the software engineering research that would benefit this type of scientific software development and use.
For this paper, we define scientific software as application software that includes a large component of knowledge from the scientific application domain and is used to increase the knowledge of science for the purpose of solving real-world problems. We use the word “scientific” to include engineering applications.
Scientific software, by our definition, includes examples such as software to model loading on bridges, study safe operation of nuclear plants, track paths of hurricanes, locate satellites in telescope images, check mine shafts for rock faults, model medical procedures for cancer treatment, model dispersion patterns for toxic particulates, and study ocean currents for ecological impact.
The term “scientific software” has been used for a wide variety of software types that do not share the same quality requirements or the same management priorities. Software written to become a commodity product, for example, is managed to meet delivery dates and budget constraints. Software written to verify the safety of a radiation procedure has to be correct, to the exclusion of all else.
We also exclude from our definition software whose primary purpose is to control equipment. As explained in Kelly (2008), the quality goals of software that controls potentially dangerous equipment, such as avionics and nuclear reactor shut-down software, are different from the quality goals of scientific software that computes models of physical phenomena, such as tracking the path of a hurricane. If there is a failure in avionics software, the preference is that the software degrades as gracefully as possible. If there is a failure in software tracking severe weather, the preference is that it crashes and makes the problematic calculation as obvious as possible. One consequence is that software quality standards targeted at control software are inappropriate for scientific software.
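The contrast between the two failure preferences can be illustrated with a minimal sketch. It is not taken from the studies discussed here; the function names, the position-update formula, and the choice of `FloatingPointError` are all assumptions made for illustration.

```python
import math


def next_position_control_style(x: float, velocity: float, dt: float) -> float:
    """Hypothetical control-style update: degrade gracefully.

    If the new value is not finite, hold the last good value so the
    system keeps running.
    """
    candidate = x + velocity * dt
    return candidate if math.isfinite(candidate) else x


def next_position_scientific_style(x: float, velocity: float, dt: float) -> float:
    """Hypothetical scientific-style update: fail loudly.

    If the new value is not finite, stop immediately and report the
    inputs so the problematic calculation is obvious.
    """
    candidate = x + velocity * dt
    if not math.isfinite(candidate):
        raise FloatingPointError(
            f"non-finite position from x={x}, velocity={velocity}, dt={dt}"
        )
    return candidate
```

In the control-style version a bad input is silently absorbed, which is the desired behaviour when continued operation matters most. In the scientific-style version the same input halts the run, which protects the trustworthiness of any result that is eventually reported.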
We also exclude generalized tools from our definition, even if the tools are primarily intended for use by scientists. Examples are mathematical libraries, software layered to hide the complexity of high performance computing environments, and fourth generation languages intended for scientific computation. We include, instead, the applications built “on top” of these tools, which are aimed at solving a particular scientific problem.
To clarify, we further characterize scientific software with the following:
- (a) a scientific domain specialist is necessarily involved in the process of developing the software;
- (b) the user of this software has some minimum knowledge of the associated scientific domain, to allow correct interpretation of the output data;
- (c) the user is the recipient of all output from the software, meaning the software's purpose is not to control equipment;
- (d) the software's primary purpose is to provide data for understanding specific real world problems, meaning that the scientists we study do not develop generalized tools and libraries to support computational computing;
- (e) the overriding software quality is correctness – or more accurately, trustworthiness – and if trustworthiness fails, then all other software qualities are irrelevant.
This paper is organized as follows:
The next section describes the set of studies of scientific software developers that we carried out from 2004 to 2014. This body of work provides the background for our understanding of the scientific software developer, and a basis for our discussions in the balance of this paper.
Next, we contrast our findings with the commonly held characterization of scientists as “end-user programmers”. Ko et al. (2011) provide a definition and detailed discussion of the characteristics of the “end-user programmer”. We argue that this term does not accurately characterize the type of scientist we are studying. Hence the body of research on end-user programmers cannot be applied indiscriminately to this type of scientist and their software.
In Section 4 of this paper, as an alternative to the view of the scientific software developer as end-user programmer, we offer a model of scientific software development based on the scientist acquiring knowledge from five knowledge domains. We are not proposing a new process theory, but an empirical example of an alternative to “methods”. We use our knowledge domain model to explain how approaches based on methods assume a fragmentation of knowledge that is detrimental to the development of scientific software.
In Section 5, we discuss, from our studies, the activities scientists engage in to advance their knowledge and to maintain trust in their scientific software, while not engaging in methods.
Finally, Section 6 concludes with a summary of the contributions of this paper.
2.1. Overview of a synthesis of research
From 2004 to 2014, we carried out a variety of studies looking at different aspects of scientific software development. In this section, we give a brief description of each study, a list of references that provide further details, and explain the findings salient to the discussion in this paper. The discussions in this paper are a synthesis of this work.
The studies took different formats, ranging from open-ended interviews of a group of scientists to working with and observing one scientist. Our
Scientists as professional end-user programmer
The most common characterization of scientists who develop software is as end-user programmers. This allows the software engineering community to slot scientists into a body of research to help understand and recommend software engineering approaches to improve the scientists' software work. The most obvious reason to characterize scientists as end-user programmers is that they do not consider themselves to be in the software business. Segal (2004) refined the label to “professional
A model of knowledge acquisition as a driver for the development of scientific software
In order to fully understand how scientists view and develop their software, we need to change from a method-based view of software development where the product is the software, to a non-method view of software development where the product is the scientist's knowledge.
At least since 1990 (e.g., Guindon, 1990), researchers have considered how knowledge is acquired and expressed in any type of software. Earlier, researchers (e.g., Curtis et al., 1979) were aware that human understanding played
How scientists develop software outside the “methods” approach
Our observations (e.g., Kelly, 2013; Sanders, 2008; Sanders and Kelly, 2008), and those of others (e.g., Sletholt et al., 2012), are that scientists engage in software development outside the methods paradigm. But, because methods are so dominant in the software engineering literature (Ralph, 2012) and assumed by many to be the only valid approach to software development, scientists have been criticized for not following methods (e.g., Merali, 2010) and have been offered a
Summary and conclusions
None of the current software engineering views on how scientists do – or should – develop software is universally applicable to the wide and varied range of what is termed scientific software.
The most common view is that scientists are end-user programmers. This is based on observations that scientists do not self-identify as professional programmers, that they do not produce software as end products, that their user base is small, and that scientists are not using “systematic and
Acknowledgment
This work is funded by the Natural Sciences and Engineering Research Council of Canada (NSERC). Many thanks go to the scientists and engineers who offered their time and enthusiasm for these studies. Several talks given by the author based on these interviews were funded by IEEE.
References (37)
- Knowledge exploited by experts during software system design. Int. J. Man-Mach. Stud. (1990)
- et al., Examining random and designed tests to detect code mistakes in scientific software. J. Comput. Sci. (2011)
- Determining factors that affect long-term evolution in scientific application software. J. Syst. Softw. (2009)
- et al., Amethodical systems development: the deferred meaning of systems development methods. Account. Manag. Inf. Technol. (2000)
- et al., Scientific software development at a research facility. IEEE Softw. (2008)
- et al., Scientific software development is not an oxymoron. PLoS Comput. Biol. (2006)
- et al., Balancing Agility and Discipline (2005)
- Canadian Standards Association (1999) CSA N286.7-99 (R2012) – Quality Assurance of Analytical, Scientific and Design...
- An Exploration of a Testing Strategy to Support Refactoring (2005)
- An empirical characterization of scientific software development projects according to the Boehm and Turner model: a progress report
- Measuring the psychological complexity of software maintenance tasks with the Halstead and McCabe metrics. IEEE Trans. Softw. Eng.
- Simplicity research in information and communication technology. IEEE Comput.
- Investigating Test Selection Techniques for Scientific Software
- Using Code Mutation to Study Code Faults in Scientific Software
- Mutation sensitivity testing. IEEE Comput. Sci. Eng.
- Problem Frames
- Software Testing A Craftsman's Approach
Diane Kelly is an Associate Professor in the Department of Mathematics and Computer Science at the Royal Military College (RMC) of Canada. She is cross-appointed to RMC's Department of Electrical and Computer Engineering and to the School of Computing at Queen's University. Diane has a Ph.D. and MEng in Software Engineering both from RMC. Her B.Sc. in Pure Mathematics and B.Ed. in Mathematics and Computer Science are both from the University of Toronto. Diane worked in industry for over 20 years as a scientific software developer, technical trainer, and QA advisor. She is a senior member of IEEE.