Abstract
If empirical software engineering is to grow as a valid scientific endeavor, the ability to acquire, use, share, and compare data collected from a variety of sources must be encouraged. This is necessary to validate the formal models being developed within computer science. Within the empirical software engineering community, however, this has not been easily accomplished. This paper analyses experiences from a number of projects and identifies the key issues: (1) How should data, testbeds, and artifacts be shared? (2) What limits should be placed on who can use them and how, and how does one limit potential misuse? (3) What is the appropriate way to credit the organization and individual that spent the effort collecting the data, developing the testbed, and building the artifact? (4) Once shared, who owns the evolved asset? As a solution to these issues, the paper proposes a framework for an empirical software engineering artifact agreement. Such an agreement is intended to address the needs of both the creator and the user of such artifacts and should foster a market in making such artifacts available and using them. If this framework for sharing software engineering artifacts is commonly accepted, it should encourage artifact owners to make their artifacts accessible to others, since gaining credit becomes more likely and misuse less likely. It should also be easier for other researchers to request artifacts, since there will be a well-defined protocol for handling the relevant matters.
Notes
South Korea: Scientist Admits Faking Stem Cell Data, New York Times, July 5, 2006.
Please send comments to the second author, Zelkowitz.
Acknowledgement
This paper is an outgrowth of sessions at the International Software Engineering Research Network (ISERN) meetings in Los Angeles (2004), Noosa Heads, Australia (2005), and Rio de Janeiro (2006). ISERN is a worldwide organization of about 50 organizations dedicated to fostering empirical software engineering research. The authors acknowledge the contributions of the 75 attendees of those meetings.
Appendix
1.1 Potential data sharing agreement for NASA SEL data
The attributes that most closely reflect what was done with the NASA SEL data in the 1990s are given below.
| Attribute | Option | Sample agreement wording |
|---|---|---|
| Lifetime | Unlimited | Signer of the agreement may use the artifact as long as needed. |
| Area | Unlimited | Signer of the agreement may use the artifact on any project. |
| Data | Proprietary | Artifacts contain information that uniquely identifies specific projects and personnel. Any report developed from this data must remove all such personal identifications. |
| Transfer | Yes | Signer of the agreement is free to transfer data to any other organization. |
| Publication | None | Signer of the agreement is free to publish all results using these artifacts. |
| Help | Limited | Owner of data will give limited help in using this data. |
| Costs | None | There is no cost to obtain a tape of this data. |
| Derivatives | None | Any derivative work is owned by the signer of the agreement. |
If those responsible for this project were writing such an agreement today, in light of the experience described above, it would probably be more restrictive, as follows:
| Attribute | Option | Sample agreement wording |
|---|---|---|
| Publication | Acknowledge | Signer of the agreement must acknowledge the owner of data as the source of the NASA SEL data in any resulting publications. (Perhaps also a clause that the signer will give the owner prior results to allow for checking whether data is misused.) |
| Derivatives | Open-source | Any derivative work is covered by the same conditions as this agreement. A copy of the derived result shall be transmitted to the owner of data. |
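The agreement attributes above form a small, fixed vocabulary, so such an agreement can also be captured in machine-readable form. The following sketch is not part of the paper; it is a minimal illustration, with hypothetical field and value names, of how the 1990s SEL terms and the more restrictive present-day variant could be encoded and compared.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ArtifactAgreement:
    """One option chosen per agreement attribute (names are illustrative)."""
    lifetime: str
    area: str
    data: str
    transfer: str
    publication: str
    help_level: str      # "help" is a Python builtin, so renamed here
    costs: str
    derivatives: str

# Terms as applied to the NASA SEL data in the 1990s
sel_1990s = ArtifactAgreement(
    lifetime="unlimited", area="unlimited", data="proprietary",
    transfer="yes", publication="none", help_level="limited",
    costs="none", derivatives="none",
)

# The more restrictive agreement suggested by later experience:
# only Publication and Derivatives change, the rest carries over.
sel_today = replace(sel_1990s, publication="acknowledge",
                    derivatives="open-source")
```

Encoding agreements this way makes the differences between two sets of terms explicit and checkable, rather than buried in prose.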
Basili, V.R., Zelkowitz, M.V., Sjøberg, D.I.K. et al. Protocols in the use of empirical software engineering artifacts. Empir Software Eng 12, 107–119 (2007). https://doi.org/10.1007/s10664-006-9030-4