Protocol adaptations to conduct systematic literature reviews in software engineering: A chronological study

Systematic literature reviews (SLRs) have reached a considerable level of adoption in Software Engineering (SE). However, the adaptations of the protocol for conducting them have been addressed only tangentially. This work provides a chronological study of the use and adaptation of the SLR protocol, including its current status. A systematic literature search was performed using digital data sources recognized by the SE community. Twelve articles published between 2004 and 2013 were selected in accordance with the inclusion and exclusion criteria and then reviewed. The results identify the main proposals for adapting the protocol to conduct SLRs in SE. They also indicate areas where both the quantity and the quality of research need to increase.


Introduction
Research in software engineering (SE) aims to produce knowledge based on the scientific method. This has become one of the main challenges in strengthening the foundations of SE as a discipline on its path to full maturity (Rodriguez 2005). Different types of experimental studies can be used in SE (Wohlin et al. 2006), and proposals to support conducting these studies can be found in Wohlin et al. (2000).
Researchers have conducted primary studies to improve knowledge of SE (Basili et al. 1999) and to support the processes related to SE technologies, mainly those related to appraising technology (Shull et al. 2001). Secondary studies are designed to enable comparisons between individual investigations that are systematically selected from a series of primary studies. These comparisons can support the creation of an evidence-based body of knowledge (Kitchenham et al. 2009).
Evidence-based Software Engineering (EBSE) aims to provide the means to obtain the best current evidence and to integrate it with practical experience and human values in decision-making for software development and maintenance (Dybå et al. 2005). EBSE involves five steps (Sackett et al. 1996): (i) convert the need for information into answerable questions, (ii) identify the best evidence to answer these questions, (iii) critically appraise the evidence (validity and usefulness), (iv) apply the results of this appraisal to SE practice and (v) evaluate the outcome of this application.
The preferred method for steps (ii) and (iii) is the systematic literature review (SLR) (Da-Silva et al. 2011). Unlike a conventional literature review, an SLR follows a rigorous methodology. It aims to gather all the existing evidence on a research question and to support the development of evidence-based guidelines for practitioners (Kitchenham et al. 2007).
Kitchenham (2004) adapted the protocol for conducting an SLR from medicine to SE. The protocol was later updated using concepts from the social sciences (Kitchenham et al. 2007). SLRs also require extra effort that must be planned before execution, and the entire process must be documented (Biolchini et al. 2005). Effort must therefore go into planning and executing the review, and researchers need guidance in carrying it out. For this reason, adaptations of the SLR protocol to SE must be considered. In a study by Kitchenham (2013) on the use of SLRs in SE, she concludes that the three most significant problems are: (i) digital libraries are not well suited to complex automated searches, (ii) the time and effort needed for SLRs and (iii) the quality assessment of papers based on different research methods.
The aim of this work is to account for the adaptations made to the protocol used to conduct SLRs in SE, providing a chronological study that includes its current status. This article may interest researchers planning further studies. It may also interest practitioners and new researchers who wish to approach SLRs as a relevant source of information in SE.
The rest of the article is structured as follows. Section 2 presents the main steps of the methodology. Section 3 reviews the selected works in detail. Section 4 presents the results and discussion. Finally, Section 5 presents the main conclusions of this work.

Methodology
A systematic literature search was conducted to compile background on proposals for changing the protocol used to conduct SLRs in SE. We speak of a systematic search rather than an SLR as defined by Kitchenham (2007). This is because we did not strictly follow all the steps the protocol defines (i.e., we did not perform a quality assessment or a classification of the works).
Research Questions (RQs): The RQs to be answered in this work are presented in Table 1.
RQ1: What changes are proposed to the original protocol defined by B. Kitchenham?
We want to give an account of the changes made to the original SLR protocol in SE. This would show how the community has adapted the protocol in practice.
RQ2: Which sections of the SLR protocol have been modified most, and what did these modifications consider?
We want to know which protocol sections have been modified and what kinds of changes have been incorporated. This would show which protocol sections are most controversial for the SE community and how they have been adapted.
RQ3: What has happened to proposals for changes to the SLR protocol over time?
We want to know when these proposals originated. This would show the temporal evolution of the SLR protocol.
Searching for works: To answer the RQs, the systematic search aimed to identify adaptations made to the protocol for conducting SLRs in SE. The search covers the period from 2004, when Kitchenham (2004) adapted the protocol used in medicine, to 2013.
We used some of the sources most frequently used by the SE community (Brereton et al. 2007): IEEE Xplore, ACM Digital Library and ScienceDirect. The search string used was: ("systematic literature review" OR "systematic review") AND ("software engineering") AND ("guidelines" OR "protocols" OR "lessons" OR "study" OR "proposals").
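The search string above is a conjunction of OR-groups of synonyms. As an illustration only, the following Python sketch assembles it from term groups. The helper name `build_search_string` and the split into three groups are our own assumptions. Each digital library (IEEE Xplore, ACM Digital Library, ScienceDirect) has its own query syntax, so the output would still need adapting per source.

```python
def build_search_string(term_groups: list[list[str]]) -> str:
    """Join synonyms with OR inside each group and the groups with AND (hypothetical helper)."""
    clauses = []
    for group in term_groups:
        # Quote every term so that multi-word phrases are matched as phrases.
        quoted = " OR ".join(f'"{term}"' for term in group)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Term groups taken from the search string reported in this study.
TERM_GROUPS = [
    ["systematic literature review", "systematic review"],
    ["software engineering"],
    ["guidelines", "protocols", "lessons", "study", "proposals"],
]

if __name__ == "__main__":
    print(build_search_string(TERM_GROUPS))
```

Keeping the groups as data rather than as a hand-typed string makes it easy to try out variants, for example by adding a synonym to one group, and to compare the results.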
Selecting works: Once the data sources and search string were defined, all works reporting changes to the protocol for conducting SLRs were reviewed. This included reading the methodology used, the steps carried out and the results obtained.

Inclusion and exclusion criteria:
The following criteria allow us to determine the relevance of the works collected.
a. Inclusion criteria: all works on SLRs in SE that specifically address modifications to the protocol for carrying out an SLR, i.e., how to conduct an SLR and the stages/activities this entails.
b. Exclusion criteria: all works that address SLR topics but do not propose how to carry out or modify the defined protocol for developing SLRs in SE.
Initially, 31 works were compiled, and 12 were finally selected. These are analyzed and described in detail in the next section.
Inclusion and exclusion were decided by reading and interpreting the text, which is potentially ambiguous. For this reason, inter-reviewer reliability was calculated using Cohen's kappa statistic (Gwet 2002). After two unsuccessful attempts, the result was satisfactory (κ = 0.851). According to the scale presented in Clark et al. (2004), this value indicates that the criteria are clear enough not to induce significant divergences among reviewers. When the reviewers were unsure whether to include a work, it was first reviewed individually, and a decision was then reached by group consensus.
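Cohen's kappa compares the observed agreement p_o between two reviewers with the agreement p_e expected by chance from each reviewer's marginal proportions: κ = (p_o - p_e) / (1 - p_e). The Python sketch below computes it for include/exclude decisions. The decision lists are hypothetical and do not reproduce the data behind κ = 0.851.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of works")
    n = len(rater_a)
    # Observed agreement: share of works on which both reviewers made the same decision.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of both reviewers' marginal proportions, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_e = sum((count_a[label] / n) * (count_b[label] / n) for label in labels)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical decisions (I = include, E = exclude) for ten candidate works.
    reviewer_1 = ["I", "I", "I", "E", "E", "E", "E", "E", "E", "E"]
    reviewer_2 = ["I", "I", "E", "E", "E", "E", "E", "E", "E", "E"]
    print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.737
```

Here the reviewers agree on 9 of 10 works (p_o = 0.9), and chance agreement is p_e = 0.3 × 0.2 + 0.7 × 0.8 = 0.62, so κ = 0.28 / 0.38 ≈ 0.737. With the study's data, each list would hold one reviewer's decisions for the 31 compiled works.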

Data extraction and synthesis:
For the works that propose changes to the SLR protocol in SE, a previous work of ours established which activities of each protocol stage should be examined for changes (Sepúlveda and Cravero 2013).

Proposals for changes to the SLR protocol
This section analyzes the selected works. Table 2 shows the three stages of the protocol, their activities and an identifier for each activity. We chose these activities by reviewing literature evidence of changes to the SLR protocol.

Selected Works
Twelve of the selected works present proposals for modifying the protocol to conduct SLRs in SE. Table 3 shows the #Id, title, authors and year of each selected work.

Stages of a SLR-SE analysis
To analyze each stage and its activities in detail according to the protocol, a table was designed for each activity reviewed. Due to space constraints, only one of these tables is shown here as an example (Table 4). All the tables can be found in Sepúlveda and Cravero (2013).
Table 4. Proposals for the "Report of the results (A7)" activity.

Changes - Proposals | Code | Id
Proposes formats to publish the results (structure and contents of a report). | KitA7 | T1
Results of the final protocol must be reported, including reviews/changes regarding the process and an explanation of the nature of the changes to the original protocol. | StaA7 | T9
Make a detailed record of decisions taken during the process. Establish a mechanism that allows SLR results to be published (more extensive than traditional papers) or use appendices in electronic repositories. | BreA7 | T7
a. Planning stage and activities reviewed
For the planning stage, we reviewed the activity "Definition of the RQ (A1)". Seven works were selected for this stage and activity A1 (#Id. T1, T3, T5, T6, T7, T9, T12). The proposals suggest: (i) guidelines to help define better RQs, (ii) guidelines to check that the defined RQs are indeed the most appropriate and (iii) that RQs not be defined a priori, but rather defined progressively as greater knowledge of the subject is gained.

b. Implementation stage and activities reviewed
For the implementation stage, we reviewed activities A2 to A6, according to Table 2.
• Identification of relevant works (A2): Six works were selected for A2 (#Id. T1, T2, T5, T6, T7, T11). The proposals suggest: (i) identification and selection of relevant data sources, (ii) definition and justification of a systematic search strategy according to the defined RQs and (iii) identification of categories for classifying the works identified (Nabi and Mullins 2011).
• Selection of relevant works (A3): Eight works were selected for A3 (#Id. T1, T2, T3, T5, T6, T7, T9, T10). The proposals suggest: (i) guidelines to establish the inclusion/exclusion criteria, (ii) guidelines to resolve disagreements between reviewers when selecting works, (iii) use of peer review to avoid bias when selecting a work and (iv) review of other elements of the paper, such as the conclusions, because abstracts are usually of low quality.
• Quality evaluation of selected works (A4): Six works were selected for A4 (#Id. T1, T5, T6, T8, T9, T10). The proposals suggest: (i) guidelines and frameworks to evaluate the quality of the selected works, (ii) use of checklists with defined factors to evaluate the quality of each work and (iii) participation of multiple evaluators and discussion rounds to reach a consensus on criteria.
• Data extraction (A5): Eight works were selected for A5 (#Id. T1, T2, T3, T4, T5, T6, T7, T9). The proposals suggest: (i) design and use of forms to record data, (ii) use of software tools to support the documentation of data, (iii) use of peer review and (iv) recording the section of the article where the selected data is found.
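The A5 proposals can be combined in a simple record type: forms to record data, peer review, and recording the section where the data was found. The Python sketch below is one possible form, and it rests on assumptions. The field names, the `unmatched_entries` helper and the section values are hypothetical, while the two proposal texts come from Table 4. Comparing two reviewers' forms shows which entries must be discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionRecord:
    """One row of a data extraction form (field names are hypothetical)."""

    work_id: str    # #Id of the work, e.g. "T7"
    activity: str   # protocol activity the proposal refers to, e.g. "A7"
    proposal: str   # the proposed change, as summarized by the extractor
    section: str    # section of the article where the data was found
    extractor: str  # reviewer who filled in the record


def unmatched_entries(form_a: list[ExtractionRecord],
                      form_b: list[ExtractionRecord]) -> set[tuple[str, str]]:
    """(work, activity) pairs recorded by only one of two extractors; these go to discussion."""
    keys_a = {(r.work_id, r.activity) for r in form_a}
    keys_b = {(r.work_id, r.activity) for r in form_b}
    return keys_a ^ keys_b  # symmetric difference


# Proposals taken from Table 4; the section names are made up for illustration.
FORM_1 = [
    ExtractionRecord("T1", "A7", "Formats to publish the results", "Sec. 7", "reviewer 1"),
    ExtractionRecord("T7", "A7", "Detailed record of decisions taken", "Sec. 5", "reviewer 1"),
]
FORM_2 = [
    ExtractionRecord("T1", "A7", "Formats to publish the results", "Sec. 7", "reviewer 2"),
]

if __name__ == "__main__":
    print(unmatched_entries(FORM_1, FORM_2))  # {('T7', 'A7')}
```

The record is frozen, so a form entry cannot be edited silently after the fact. Any correction has to be made by creating a new record, which keeps the extraction traceable.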

c. Documentation stage and reviewed activities
For the documentation stage, we reviewed the activity "Report of the results (A7)".
• Report of the results (A7): Three works were selected for A7 (#Id. T1, T7, T9). The proposals suggest: (i) formats and guidelines to publish results and (ii) that the reviews and decisions made during the process be reported.
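The work identifiers listed for each activity can be tallied to make the counts above easier to compare, which bears on RQ2 (which protocol sections have been modified most). The Python sketch below encodes the lists exactly as given in this section. A6 is absent because no list of works is given for it, and the helper names are hypothetical.

```python
from collections import Counter

# Work identifiers (#Id) per protocol activity, transcribed from this section.
WORKS_BY_ACTIVITY: dict[str, list[str]] = {
    "A1": ["T1", "T3", "T5", "T6", "T7", "T9", "T12"],
    "A2": ["T1", "T2", "T5", "T6", "T7", "T11"],
    "A3": ["T1", "T2", "T3", "T5", "T6", "T7", "T9", "T10"],
    "A4": ["T1", "T5", "T6", "T8", "T9", "T10"],
    "A5": ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T9"],
    "A7": ["T1", "T7", "T9"],
}


def works_per_activity(mapping: dict[str, list[str]]) -> dict[str, int]:
    """Number of distinct works proposing changes to each activity."""
    return {activity: len(set(works)) for activity, works in mapping.items()}


def most_modified(mapping: dict[str, list[str]]) -> list[str]:
    """Activities tied for the highest number of proposing works."""
    counts = works_per_activity(mapping)
    top = max(counts.values())
    return sorted(activity for activity, n in counts.items() if n == top)


def activities_per_work(mapping: dict[str, list[str]]) -> Counter:
    """How many of the listed activities each work addresses."""
    return Counter(work for works in mapping.values() for work in set(works))


if __name__ == "__main__":
    print(works_per_activity(WORKS_BY_ACTIVITY))
    print(most_modified(WORKS_BY_ACTIVITY))  # ['A3', 'A5']
    print(activities_per_work(WORKS_BY_ACTIVITY).most_common(1))  # [('T1', 6)]
```

Running it gives 7, 6, 8, 6, 8 and 3 works for A1, A2, A3, A4, A5 and A7 respectively. A3 and A5 are tied as the most frequently modified activities, and T1 appears in all six listed activities.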

d. Comments to the stages and activities reviewed
Having reviewed the three stages and the proposals identified for each one, we can say that the proposals focus essentially on defining guidelines for: (i) supporting the definition of RQs, (ii) defining inclusion/exclusion criteria, (iii) synthesizing data and (iv) publishing the results. The details are presented in Sepúlveda and Cravero (2013).

Results and Discussion
This section discusses the results and findings and presents the main threats to the validity of this study.
The final selection included 12 works published between 2004 and 2013. We believe the specificity of the topic explains the small sample. For the same reason, we consider that the review provides a reliable overall view of the state of research in this area.

Answering the RQs
RQ1: What changes are proposed to the original protocol defined by B. Kitchenham?
The original protocol for conducting SLRs in SE was defined in #Id T1. Later works proposed changes to one or more of its activities across the three stages.
In general, the proposals for changes to the SLR protocol in SE focus essentially on defining guidelines for:
(i) supporting the definition of the RQs;
(ii) identifying and selecting relevant data sources, defining a search strategy aligned with the RQs and classifying the identified works by category;
(iii) defining the inclusion/exclusion criteria, resolving disagreements between reviewers when selecting works and avoiding reliance on abstracts alone, given their low quality;
(iv) evaluating the quality of the selected works, involving several evaluators and reaching a consensus on the criteria;
(v) synthesizing the data, obtaining statistical results from quantitative and qualitative data and using tables and databases to facilitate the analysis; and finally
(vi) publishing the results, reporting the reviews and decisions taken in the process.

RQ3: What has happened to proposals for changes to the SLR protocol over time?
A timeline was prepared for each stage of the protocol (planning, implementation and documentation), based on the stages and activities identified and on the changes proposed for each activity. The previously defined acronyms are used for each work reviewed.
Planning stage and proposed changes: As shown in Figure 1, the seven proposals for activity A1 span 2004 to 2013, and three of them are from 2007.
From Figures 1-3 we observe the following:
(i) as expected, everything was initially based on the proposal by Kitchenham (2004);
(ii) most changes are concentrated in 2007, with a total of 25 proposals;
(iii) the activities with the most proposed changes are A3 and A5, both in the implementation stage, with eight proposals each;
(iv) the most stable stage appears to be documentation/report, since only three change proposals were recorded between 2004 and 2011; and
(v) the changes after 2008 suggest a stronger focus on improving and controlling quality aspects of SLRs.
Figure 4 shows the number of proposals for changes to the SLR protocol per year of publication, confirming that most proposals appeared in 2007.
The collected evidence shows that most proposals regarding the original SLR protocol in SE appeared in 2007. This coincides with a considerable increase in the number of SLRs published that year, a growth that has been sustained to the present day. By contrast, the number of proposed changes to the SLR protocol has declined sharply since then.

Meaning of the findings and results
From the collected data, we can state that SLR is a subject that has gained relevance in the SE community. This is reflected in an increasing number of articles in specialized journals and conferences and an increasing number of experiences of application/adoption in industry. Nevertheless, we detected relevant aspects with a considerable lack of both theoretical and empirical contributions, and areas where new contributions would benefit the community. These include:
(i) tertiary studies that reveal the actual quality of the SLRs conducted in SE;
(ii) studies that verify whether the protocol for conducting SLRs has indeed stabilized;
(iii) empirical studies establishing how this protocol is used and adapted; and
(iv) studies establishing the level of adoption and adaptation of SLRs in industry.
The collected data shows a significant increase in protocol proposals in 2007-2008, after which their number fell drastically. This suggests that the protocol for conducting SLRs in SE has generally attained a certain acceptance and stability within the SE community. It also suggests that the community's emphasis may be shifting toward improving the quality of primary studies. We lack the evidence to verify these hypotheses, and doing so is beyond the scope of this work. They may, however, give rise to new research on SLRs in SE concerning the quality of primary studies and the need for more tertiary studies that review the quality of secondary studies. Examples are Kitchenham et al. (2012) and Zhang and Ali-Babar (2013). These studies report on issues, activities, the characterization of working groups and the quality of the SLRs and primary studies collected.
Looking at the authors and co-authors of the twelve selected works, a group of six researchers is involved in 50% of them. This suggests that a specific group is concerned with improving the processes and performance of SLRs in SE. The case of B. Kitchenham stands out: besides having adapted the SLR protocol to SE, she is an author of four of the twelve works and the main author of three of them. She and her research group can thus be considered to lead SLR research in SE.
Finally, beyond the protocol changes identified, some authors make a set of recommendations to improve SLRs in SE. A summary of these recommendations follows, organized by item, with the authors and proposals for each.
Abstract: Abstracts are a key element in selecting works, yet they are often of low quality (Staples and Niazi 2007). Jedlitschka and Pfahl (2005) recommend using structured abstracts as an important source of information. They emphasize that the abstract is the only section of a publication that is accessible free of charge. For more recommendations on using structured abstracts, see Budgen et al. (2008).
Search: Searching for relevant works in the digital sources used by the SE community requires trying out different search strings and evaluating the results (Chen et al. 2009, Kitchenham et al. 2007). The search engines do not adequately support the search strings needed to conduct SLRs (Staples and Niazi 2007). A glossary of terms from evidence-based medicine can help those starting to conduct SLRs (Kitchenham et al. 2010). Staples and Niazi (2007) also recommend a centralized index of SLRs in SE, similar to the Cochrane Collaboration initiative, to provide a unified source.
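One way to try out and evaluate candidate search strings is to check how many papers from a small set already known to be relevant each string retrieves. Such a set is sometimes called a quasi-gold standard. This procedure is our assumption and is not prescribed by the cited works. The Python sketch below uses made-up paper identifiers and result sets. In practice, the retrieved sets would come from running each string in every digital library.

```python
def recall(retrieved: set[str], known_relevant: set[str]) -> float:
    """Share of the known-relevant papers that the search string retrieved."""
    if not known_relevant:
        raise ValueError("the set of known-relevant papers must not be empty")
    return len(retrieved & known_relevant) / len(known_relevant)


def precision(retrieved: set[str], known_relevant: set[str]) -> float:
    """Share of retrieved papers known to be relevant; a lower bound,
    because relevant papers outside the known set may also have been retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & known_relevant) / len(retrieved)


def rank_strings(results: dict[str, set[str]],
                 known_relevant: set[str]) -> list[tuple[str, float, float]]:
    """Rank candidate strings by recall, breaking ties by precision."""
    rows = [(name, recall(hits, known_relevant), precision(hits, known_relevant))
            for name, hits in results.items()]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


# Hypothetical identifiers: four papers known to be relevant, two candidate strings.
KNOWN_RELEVANT = {"p1", "p2", "p3", "p4"}
RESULTS = {
    "string A": {"p1", "p2", "p9"},
    "string B": {"p1", "p2", "p3", "p7", "p8", "p9"},
}

if __name__ == "__main__":
    for name, rec, prec in rank_strings(RESULTS, KNOWN_RELEVANT):
        print(f"{name}: recall={rec:.2f}, precision={prec:.2f}")
```

String B ranks first (recall 0.75, precision 0.50) ahead of string A (recall 0.50, precision 0.67). Recall is favoured in this sketch because an SLR aims to find all the existing evidence on a question, even at the cost of screening more irrelevant results.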
Quality: According to Cruzes and Dybå (2011), a better understanding of the challenges of synthesizing SE research can positively influence the quality of the SLRs conducted. However, despite the focus placed on SLRs, this aspect has received limited attention. It needs to become a central part of the SLR to increase its importance and utility in both research and practice. Staples and Niazi (2007) suggest simplifying Kitchenham's original criteria for evaluating the quality of each work. In the future, instruments should be developed to support conducting and controlling an SLR, similar to the PRISMA proposal (Moher et al. 2010).
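The checklist-based quality evaluation discussed here, which also appears among the A4 proposals, can be illustrated with a small Python sketch. The three-level scale (Y = 1, P = 0.5, N = 0), the criterion names and the answers are all assumptions. They are not the criteria of Kitchenham's protocol or of Staples and Niazi (2007). The sketch scores one work per evaluator and lists the criteria on which the evaluators disagree, which would then go to a discussion round.

```python
# Assumed three-level answer scale: yes / partly / no.
ANSWER_SCORES = {"Y": 1.0, "P": 0.5, "N": 0.0}


def quality_score(answers: dict[str, str]) -> float:
    """Sum of the scores one evaluator gave a work across all checklist criteria."""
    return sum(ANSWER_SCORES[answer] for answer in answers.values())


def criteria_to_discuss(evaluations: dict[str, dict[str, str]]) -> list[str]:
    """Criteria on which the evaluators gave different answers (input to a discussion round)."""
    criteria = set().union(*(answers.keys() for answers in evaluations.values()))
    return sorted(
        criterion
        for criterion in criteria
        if len({answers.get(criterion) for answers in evaluations.values()}) > 1
    )


# Hypothetical answers of two evaluators for one work (C1-C3 are placeholder criteria).
EVALUATIONS = {
    "evaluator 1": {"C1": "Y", "C2": "P", "C3": "N"},
    "evaluator 2": {"C1": "Y", "C2": "Y", "C3": "N"},
}

if __name__ == "__main__":
    for name, answers in EVALUATIONS.items():
        print(name, quality_score(answers))  # 1.5 for evaluator 1, 2.0 for evaluator 2
    print(criteria_to_discuss(EVALUATIONS))  # ['C2']
```

A criterion that an evaluator left unanswered counts as a disagreement. As a result, incomplete checklists are also surfaced for discussion instead of being silently ignored.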
Protocol and stages: Regarding the original SLR protocol, Staples and Niazi (2007) point out the lack of clarity in the guidelines for synthesizing data. Although they agree on the importance of running a pilot study, they criticize Kitchenham for not clarifying when a pilot must be run or when to stop it. Brereton et al. (2007) collect improvements regarding how to conduct an SLR, together with a set of lessons learned.
Templates: Biolchini et al. (2006) recommend using templates to conduct SLRs and defining an ontology that describes the knowledge of experimental studies. An application of this template can be seen in Biolchini et al. (2005). For guidelines on reporting results in EBSE, including SLRs, see Jedlitschka and Pfahl (2005).
Tool support: Conducting SLRs requires considerable effort and time. Moreover, many stages and tasks are carried out manually, so tools that support this process are very important. In recent years, various software tools have been proposed to support different tasks in conducting SLRs (Bowes et al. 2012, Felizardo et al. 2011).

Threats to validity
We are aware of some threats that may affect the validity of the findings and results. Among them:
(i) Possible bias in selecting works. We used data sources that are well recognized within the SE community (IEEE Xplore, ACM Digital Library, ScienceDirect). We did not consider other relevant sources, mainly for reasons of scope and time. Although twelve works may seem a very small number, we hope to validate the results in future work by expanding the sources considered and refining the RQs.
(ii) Limitations of the search engines used to search the electronic data sources (Dybå 2008). We tried to mitigate these threats through individual selection followed by joint validation of the works, thereby reducing individual bias. To avoid leaving works out of the study, we reviewed all versions of a work, whether journal articles, conference papers or technical reports.
(iii) Limitations of the search string. The search string was not validated by domain experts, nor was a structured criterion such as PICOC (Petticrew and Roberts 2008) used to build it. This weakness limits the generalizability of this study and must be addressed in future work.

Conclusions
This work covers the adaptations of the SLR protocol as a research methodology in SE and provides a chronological study that includes its current status. In addition, the answers and evidence for the RQs have been reviewed. The collected evidence may interest practitioners and new researchers who wish to approach the SLR as a relevant source of information. It may also interest researchers planning further studies on SLRs and SE.
Other works present observations and criticisms of SLRs in SE. To our knowledge, however, no work specifically reports on the adaptations of the protocol for conducting SLRs as a research methodology applied in SE. This work can therefore be seen as complementing those that review the evolution of SLRs in SE. We believe more tertiary studies are needed to examine this area in greater detail.
As future work, we plan to add and refine the RQs and data sources to test the robustness of the ideas put forward here. A quality evaluation of the collected data is also needed to assess the strength of the evidence.

Figure 1. Proposals for changes to the SLR protocol for the planning stage.

Figure 2. Proposals for changes to the SLR protocol for the implementation stage.

Figure 3. Proposals for changes to the SLR protocol for the documentation/report stage.

Figure 4. Annual number of proposals for changes to the SLR protocol.

Table 1. RQs to be answered.

Table 2. Stages and activities reviewed.