SPI implementation at large and small scale is studied, and a multitude of publications report on experiences from academia and practice.

Software process improvement (SPI) has been around for decades: frameworks are proposed, success factors are studied, and experiences have been reported. However, the sheer mass of concepts, approaches, and standards published over the years overwhelms practitioners as well as researchers. What is out there? Are there new trends and emerging approaches? What are open issues? Still, we struggle to answer these questions about the current state of SPI and related research. In this article, we present results from an updated systematic mapping study to shed light on the field of SPI, to develop a big picture of the state of the art, and to draw conclusions for future research directions. An analysis of 769 publications draws a big picture of SPI-related research of the past quarter-century. Our study shows a high number of solution proposals, experience reports, and secondary studies, but only a few theories and models on SPI in general. In particular, standard SPI models like CMMI and ISO/IEC 15504 are analyzed, enhanced, and evaluated for applicability in practice, but these standards are also critically discussed, e.g., from the perspective of SPI in small-to-medium-sized companies, which leads to new specialized frameworks. New and specialized frameworks account for the majority of the contributions found (approx. 38%). Furthermore, we find a growing interest in success factors (approx. 16%) to aid companies in conducting SPI and in adapting agile principles and practices for SPI (approx. 10%). Beyond these specific topics, the study results also show an increasing interest in secondary studies with the purpose of aggregating and structuring SPI-related knowledge. Finally, the present study helps direct future research by identifying under-researched topics awaiting further investigation.

Subjects: Software Engineering


Software process improvement (SPI; according to Humphrey, 1989) aims to improve software processes and comprises a variety of tasks, such as scoping, assessment, design and realization, and continuous improvement, e.g., Münch et al. (2012). In this field, a number of SPI models compete for the companies' [...] to adopt CMMI for improvement programs, while Müller et al. (2010) study SPI in general from the perspective of organizational change. All these representatively selected studies address specific topics; yet, they do not contribute to a more general perspective on SPI. Such general studies are scarce. For instance, Rainer and Hall (2001) analyze some 'core' studies on SPI to work out the addressed topics and gaps in the domain. However, they select a few studies that they assume to be good representatives, thus providing a limited picture only. In terms of analyzing the entire domain and providing new (generalizable) knowledge, Unterkalmsteiner et al. (2012) contribute a systematic review on the state of the art of evaluation and measurement in SPI. They conduct a systematic review for the purpose of synthesizing a list of evaluation and measurement approaches, which they also analyze for their practical application. The study at hand does not aim at generating generalizable knowledge for one or more SPI-related topics in the first place. The purpose of the present study is to draw a big picture of the current state of the art of SPI in general. That is, as there is no comparable study available, this article closes a gap in the literature by providing a comprehensive picture of the development of the field of SPI over time and by summarizing the current state of the art. Other than, e.g., Rainer and Hall (2001), [...] and to present our results.
Therefore, our study does not address one specific aspect/topic, but aims to draw a general picture from a "bird's-eye perspective" to pave the way for further topic-specific and more detailed studies.

In this section, we present the overall study design. After describing the selected research method, we introduce the research questions and describe the different instruments used for data collection and analysis, as well as the validity procedures.

In this study, we ground the overall research approach in the procedures implemented for our previously published initial study. In Kuhrmann et al. (2015), we followed an approach in which we applied different methods from systematic literature reviews (SLR) according to Kitchenham and Charters (2007) and

Manuscript to be reviewed
systematic mapping studies (SMS) as presented by Petersen et al. (2008). While carrying out the study update, we used and improved the methods applied, which was necessary to develop a strategy that allows for continuous study updates. Figure 1 shows the overall research approach, for which we provide details in subsequent sections.

Initial Study
The initial study was designed as a breadth-first search to cover the SPI domain as completely as possible. In February 2013, we performed the study preparation, conducted a series of test runs, and refined the search queries iteratively. At the end of April 2013, we conducted the main search, which resulted in about 85,000 hits. As we expected this large number of results, and in order to support the dataset cleaning, we defined filter questions, which we applied to the initial result set. When the initial result set was cleaned, we performed a voting procedure to select the relevant publications from the result set. Based on this selection, we developed the classification schemas (by manual sampling as well as tool-supported) and harmonized the dataset (e.g., completion of keyword lists).

Study Update Procedure
As one of the goals was to develop an instrument to provide a "heartbeat" of the whole field, having a strategy available to continuously update and refine the study was imperative. Therefore, after having conducted and analyzed the initial study, we collected lessons learned and [...]

Our objective is to capture the domain of Software Process Improvement (SPI), to provide a continuously updated snapshot of the available publication pool, and to investigate research trends. Therefore, we [...]

RQ 2: What is the contribution population? Based on the found publications, we are interested in the addressed topics and major contributions (e.g., SPI models, theories, secondary studies, and lessons learned) to work out the SPI topics to which research has contributed so far.

RQ 3: What trends in SPI and SPI-related research can be observed? The third research question aims at investigating the focus points addressed by SPI research so far, and at working out gaps as well as trends. This research question shall pave the way to direct future research on SPI.

As mentioned in Section 3.1, due to lessons learned in the initial study and in order to provide a feasible strategy for study updates, the research approach had to be improved. The most significant changes regarding the data collection procedure are described in Appendix B. In the following, we describe the actual data collection procedure applied to the present study.

Query Construction
The basic queries were already developed in the initial study (Appendix B.1.1).

After the initial result set analysis, the query strings were critically reviewed and updated (Figure 1). However, no new search terms were added; only the structure of the queries required some updates to address the new data source that serves as the main input. In a nutshell, due to the change of the search engine, the main search strings S1-S8 were integrated with the context and filter queries, which were required in the initial study to help querying the different literature databases. The full new search queries can be depicted from [...]. We looked for more efficient ways to fetch papers for the update and eventually opted for Scopus [1] as the new [...] that structures the data and contains the attributes shown in Table 1. The data structure shown in Table 1 follows the structure used in the initial study.

[1] Scopus is available from: http://www.scopus.com. Before we made this decision, we tested Scopus: we took some initial search queries (Table 10), queried Scopus, and compared the obtained data with the original datasets. We then iteratively [...]

We describe the analysis preparation as well as the steps conducted to answer the research questions.

We performed an automated search that required us to filter and prepare the result set. The data analysis is prepared by harmonizing the data, performing a two-staged voting process, and integrating the initial and the update data sets to prepare the result set analysis.

Harmonization
To make the selection of the contributions more efficient, we first integrated and cleaned the result set. We removed the duplicates, which we identified by title, year, and author list. The main instrument used was the Microsoft Excel feature to identify and remove duplicates (cf. Appendix B.2.2).

This procedure was performed on the integrated result set.
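The duplicate removal described above can be sketched programmatically; the following is a minimal Python sketch, not the study's actual tooling (the authors used Excel's duplicate-removal feature), and the field names `title`, `year`, and `authors` are illustrative assumptions. Normalizing case and whitespace before comparison avoids missing near-identical entries.

```python
def dedup_key(record):
    """Build a normalized (title, year, authors) key for duplicate detection."""
    title = " ".join(record["title"].lower().split())
    authors = tuple(a.strip().lower() for a in record["authors"])
    return (title, record["year"], authors)

def remove_duplicates(records):
    """Keep the first occurrence of each (title, year, author list) combination."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Note that the same paper published in different years (e.g., a conference and an extended journal version) is deliberately kept, since the key includes the year.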
Table 2. Inclusion and exclusion criteria applied to the study.

IC 1: Title, keyword list, and abstract make explicit that the paper is related to SPI.
IC 2: Paper presents SPI-related topics, e.g., SPI models, assessments, experiences in adopting and deploying software processes, and reports on improving specific methods/practices.
EC [...]: Paper is not in the field of software engineering or computer science in general.
EC 3: Paper is a tutorial or workshop summary only.
EC 4: Paper occurred multiple times.
EC 5: Paper full text is not available for download.

Voting
We applied the voting procedures as described in Kuhrmann et al. (2015). That is, we performed a multi-staged voting process to classify the papers as relevant or irrelevant and to build a set of publications [...]

[1, continued] ...enhanced the Scopus search strings and, eventually, defined the following quality requirement for the search: given the trends in publication frequency and classification obtained in the initial study, we expect a similar frequency and classification for the Scopus-based search (see also Section 4.1).
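The mechanics of such a multi-staged vote can be illustrated with a small aggregation sketch. This is an assumed interpretation (the paper defers the details to Kuhrmann et al., 2015): each reviewer votes a paper relevant or irrelevant, unanimous papers are decided in the first stage, and papers with conflicting votes are escalated to a second stage, e.g., a discussion among the reviewers.

```python
def first_stage(votes):
    """votes: list of booleans (True = relevant).
    Returns 'in', 'out', or 'conflict' for the first voting stage."""
    if all(votes):
        return "in"
    if not any(votes):
        return "out"
    return "conflict"

def classify(papers_votes):
    """Split papers into accepted, rejected, and those needing a second stage."""
    accepted, rejected, second_stage = [], [], []
    for paper, votes in papers_votes.items():
        outcome = first_stage(votes)
        if outcome == "in":
            accepted.append(paper)
        elif outcome == "out":
            rejected.append(paper)
        else:
            second_stage.append(paper)
    return accepted, rejected, second_stage
```

The point of the staging is efficiency: only the (usually small) conflict set requires the more expensive joint discussion.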

Criteria: Description

Evaluation research: implemented in practice, evaluation of the implementation conducted; requires more than just one demonstrating case study.
Solution proposal: a solution for a problem is proposed; benefits/application are demonstrated by example, experiments, or student labs; also includes proposals complemented by one demonstrating case study for which no long-term evaluation/dissemination plan is obvious.
Philosophical paper: a new way of thinking; structuring a field in the form of a taxonomy or a framework; secondary studies such as SLRs or SMSs.
Opinion paper: personal opinion, not grounded in related work and research methodology.
Experience paper: personal experience; how things are done in practice.

Contribution Type Facets
In order to analyze what and how publications contribute to the body of knowledge, we adopted the contribution type facets as proposed by Shaw (2003). In the initial study (Kuhrmann et al., 2015), the focus type facets were found inadequate for this study stage, e.g., due to the variety of the topics addressed and the limitations in defining proper topic clusters, or the need to have multiple assignments for many papers.

Figure 2. Overview of the collected metadata in the study analysis phase, including publication vehicles and 40 study-specific attributes and their grouping in topic clusters (dimensions).
[...] in which the lessons learned from the initial study were taken into account. During the metadata collection, reviewers had the option to propose and add further attributes, i.e., the list of metadata was extended and then the result set was revisited (see also Figure 1).
Figure 2 provides a structured overview of the metadata.

To get an overview of the harvested papers, we performed a categorization to define the research type facets and contribution type facets (Table 3 and Table 4). To analyze the respective trends, Figure 5 provides an integrated picture that shows the papers in the different categories and over time.

Figure 6. Systematic map over research and contribution type facets.

In this section, we provide a more detailed perspective on the result set using the collected metadata as illustrated in Figure 2. While classifying the result set, we collected metadata for the three dimensions Study Type and Method, Process (incl. sub-categories), and Context (incl. sub-categories). In addition to the publication vehicle, we defined 40 attributes, and each paper could be assigned none or many of these attributes (Section 3.4.2). In total, for the 769 studied papers, we assigned 2,408 attribute values.

All metadata assignments are summarized in Figure 7 and discussed in the following. [...] of such standards. Therefore, a trend analysis cannot yet be meaningfully conducted.
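Because each paper may carry zero or many of the 40 attributes, figures such as the 2,408 assignments over 769 papers are multi-label tallies. A small sketch of how such a tally can be computed (the attribute labels used here are made up for illustration):

```python
from collections import Counter

def tally_attributes(assignments):
    """assignments: mapping of paper id -> set of attribute labels.
    Returns the total number of assignments and a per-attribute frequency count."""
    counts = Counter()
    for labels in assignments.values():
        counts.update(labels)
    return sum(counts.values()), counts
```

Note that the per-attribute frequencies sum to more than the number of papers whenever multiple assignments occur, which is why attribute counts in such maps should not be read as paper counts.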

[...] few mentions (fewer than 20 each).

The company sizes and scales addressed in the papers show a trend towards very small entities (VSE) and small-to-medium-sized enterprises (SME). In the result set, 116 papers deal with companies of this sort, while 75 papers address companies of other scales, i.e., large companies and global players.

In Section 4.4.3, we investigate this attribute group in more detail. Furthermore, global distribution of software development is addressed by 37 papers, whereas this is a cross-cutting concern that is addressed [...] what their scientific maturity is. Figure 9 shows a systematic map that illustrates two aspects: in the lower part, the research maturity and the contribution of papers addressing standard maturity models are shown.

In total, 225 out of 769 papers address CMMI, ISO/IEC 15504, or both. The classification according to the research and contribution type facets shows that, for standards and standard-related SPI research, many lessons learned are reported and that some evaluation research is available. [...] analysis. Figure 12 also shows that only 27 papers (21 multi-case/longitudinal studies, 2 replication studies, and 4 multi-method) go beyond "one-time research", i.e., these papers study success factors over time, from different angles, and/or apply them and learn from the application.

The third trend observed in the initial study was an increasing interest in SPI for small-to-medium-sized enterprises (SME). Figure 13 provides an overview of the share of papers explicitly addressing SPI in SMEs (and other company sizes if mentioned in titles, keywords, or abstracts).

[...] Engineering (GSE), i.e., whether SPI takes place in a global setting. 37 papers address GSE-related questions.

In the following, we provide some insights regarding the topics SPI for SMEs addresses, and we also provide an overview of the respective application domains and covered life cycle phases.
Figure 14 provides a systematic map of the papers that explicitly mention the company context. [...] To get more insights, we filtered the metadata for the company size. The results are illustrated in Tables 6, 7, and 8. Finally, Figure 15 visualizes the fourth trend found in the initial study: although perceived as a contradiction, combining agility and SPI has received some attention in recent years, e.g., in the form of agile maturity models. In total, the result set contains 73 papers (9.5%) that address agility in the context of SPI, and Figure 15 shows


In this section, we discuss the findings obtained so far. Beyond the discussion of the trends already identified in our initial study, we also broaden our perspective and discuss further trends that can be found in the updated result set. [...] of testing-related papers (compared to the implementation-related papers) motivates the question of why this rather "late" phase is more emphasized, especially in times of agile/lean software development. Does testing address implementation as well? Is testing subject to improvement because of the effort spent on this activity? However, these are questions that cannot be answered at the current stage of the study and thus remain subject to future work (see also Section 5).

What is the state of SPI after all? Our data shows a diverse picture and, furthermore, shows SPI [...] is the lack of theorizing approaches, which are often performed for specific domains (e.g., SMEs) or grounded in secondary studies only. In summary, although SPI has been around for decades, we still miss a sound theory of SPI. We have a number of standardized and specific SPI models and frameworks.

However, we still lack evidence.

In this section, we evaluate our findings and critically review our study regarding its threats to validity. [...] to have captured all metadata). Furthermore, the metadata collected so far needs to be considered initial,

as there are potentially more attributes of interest. That is, since we rely on the mapping study instrument in the first place, some metadata might not yet be captured, as this would require a more in-depth analysis, e.g., using the systematic review instrument. Furthermore, as we introduced 40 metadata attributes, the risk of misclassification increases, e.g., due to misunderstandings regarding the criteria to be applied or due to confusing/misleading use of terminology in the respective papers. [...] Furthermore, as already mentioned in the discussion on internal validity, generalizability is also affected by potential white spots in the metadata attributes, which, however, require further investigation. Such an (independently conducted) investigation will (i) contribute to the internal validity by increasing dataset completeness, but (ii) will also improve the external validity by incrementally improving the quality of the dataset used to draw general conclusions. [...] In the initial study, based on the data collection procedures (described in Appendix B.1) and the study selection procedures (described in Section 3), we obtained the result set described in [...].

Table 9. Data collection and filtering results (tentative result sets during selection and final result set).

Step | IEEE | ACM | Springer | Elsevier | Wiley | IET | Total
[...] imperative, which mainly affects the data collection procedures. Therefore, in this appendix, we give an integrated and detailed view of the data collection procedure as executed in the initial study, and we detail the update procedure used for compiling the report at hand.

The initial study, inter alia, aimed at creating the baseline to study SPI. Therefore, the initial study was carried out with considerable "manpower" that, however, is too costly for a continuous update. In this section, with the purpose of increasing transparency and reproducibility, we present the details of the initial data collection procedure (see also Kuhrmann et al., 2015), before presenting the implemented (and recommended) approach to conduct the study updates in Appendix B.2.

In a series of workshops, we defined the keywords that we were interested in and defined the general search strings in Table 10, which were then validated in several test runs before being used in an automated [...]

The initial data collection was an automated full-text search in several literature databases. As main data sources, we relied on established literature databases, which we consider most appropriate for such a search. In particular, we selected the following databases: ACM Digital Library, SpringerLink, IEEE Digital Library (Xplore), Wiley, Elsevier (Science Direct), and IET Software. If a paper was listed in one of those databases but only referenced there, we counted it for the database that generated the item, regardless of the actual publication location.

We performed an automated search that required us to filter and prepare the result set. The data analysis is prepared by harmonizing the data and performing a two-staged voting process.

Harmonization
Due to the query construction, we found a vast number of multiple occurrences in the result set, and we also found a number of publications that are not in software engineering or computer science. To make the selection of the contributions more efficient, we first cleaned the initial result set (cf. Table 9 for the results per phase). In the first step, we removed the duplicates, which we identified by title, year, and author list. In the second step, we applied the filter queries to sort out those publications not [...]

[3] We used the word clouds to visually inspect the result set for "intruders", e.g., medicine, chemistry, and cancer therapy. Terms not matching our search criteria were collected and used to identify and remove the misselected papers from the result set.
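The word-cloud inspection described in the footnote can be approximated programmatically: count term frequencies over the titles (the data behind the word cloud) and flag papers containing known off-topic terms. A rough sketch; the off-topic term list is illustrative, not the study's actual filter vocabulary:

```python
from collections import Counter

# Illustrative off-topic terms; the study derived such terms from word-cloud inspection.
OFF_TOPIC = {"medicine", "chemistry", "cancer"}

def term_frequencies(titles):
    """Count lower-cased word frequencies across all titles (the 'word cloud' data)."""
    counts = Counter()
    for title in titles:
        counts.update(title.lower().split())
    return counts

def flag_intruders(titles):
    """Return the titles that contain at least one off-topic term."""
    return [t for t in titles if OFF_TOPIC & set(t.lower().split())]
```

In practice, one would inspect `term_frequencies` output manually first, exactly as the authors did visually, before fixing the off-topic list used by `flag_intruders`.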

Voting the Papers
The final selection of whether or not a paper was included in the result set was made using a multi-staged voting procedure. This procedure was also applied in the study update and, therefore, is described in detail in Section 3.4.1.

In this section, we present the details of the recommended data collection procedure to be implemented for study updates.

Table 11. Final search strings used for the automatic database search in the study update procedure.
S1: ((life-cycle or lifecycle or "life cycle") and (management or administration or development or description or authoring or deployment)) and (("software process" and ("software development model" or "process model")) or (SPI or "software process improvement"))
S2: (modeling or modelling or model-based or approach or variant) and (("software process" and ("software development model" or "process model")) or (SPI or "software process improvement"))
S3: (optimization or optimisation or customization or customisation or tailoring) and (("software process" and ("software development model" or "process model")) or (SPI or "software process improvement"))
S4: ("reference model" or "quality management" or evaluation or (assessment or audit) or (CMMI or "Capability Maturity Model Integration")) and (("software process" and ("software development model" or "process model")) or (SPI or "software process improvement"))
S5: ((feasibility or experience) and (study or report)) and (SPI or "software process improvement")
S6: ((life-cycle or lifecycle or "life cycle") and (design or modeling or modelling or analysis or training)) and (("software process" and ("software development model" or "process model")) or (SPI or "software process improvement"))
S7: (measurement or evaluation or approach or variant or improvement) and (("software process" and ("software development model" or "process model")) or (SPI or "software process improvement"))
S8: ((SCAMPI or "Standard CMMI Appraisal Method for Process Improvement") or (SPICE or "ISO/IEC 15504") or (PSP or "Personal Software Process") or (TSP or "Team Software Process")) and (("software process" and ("software development model" or "process model")) or (SPI or "software process improvement"))
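All eight strings share a common core clause: S1-S4 and S6-S8 append the full software-process core, while S5 uses only the shorter SPI part. This composition can be sketched as follows; the observation about the shared core comes from the table itself, but the helper names are ours:

```python
# Core shared by all strings; S5 uses only this SPI part.
SPI_CORE = '(SPI or "software process improvement")'

# Full core appended to S1-S4 and S6-S8.
FULL_CORE = ('(("software process" and ("software development model" '
             'or "process model")) or ' + SPI_CORE + ')')

def build_query(prefix, core=FULL_CORE):
    """Combine a topic-specific prefix with the shared core clause."""
    return f"{prefix} and {core}"

# Example: reconstructing S3 from its topic-specific prefix.
s3 = build_query("(optimization or optimisation or customization "
                 "or customisation or tailoring)")
```

Generating the strings this way keeps the shared core in one place, so a change to the core (as happened in the switch to Scopus) propagates to all eight queries consistently.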

The major update in the search procedure is the search engine utilized for the search. Instead of repeating the search with the individual databases (cf. Appendix B.1.2), we switched to Scopus, as Scopus, being a meta-search engine, covers most of the relevant software engineering venues (journals as well as conferences).

This, however, changes the general search procedure; notably, the search strings need to be updated accordingly. The adapted search strings are summarized in Table 11. Comparing the new search queries to the initial study's queries from [...] to the paper title (double-check and confirm by also checking authors and abstract). [...] Table 2 following the procedure description in Section 3.4.