Why and how developers fork what from whom in GitHub

Empirical Software Engineering

Abstract

Forking is the creation of a new software repository by copying another repository. Although forking is controversial in the traditional open source software (OSS) community, GitHub encourages it and provides it as a built-in feature. Developers freely fork repositories, use the code as their own, and make changes. A deep understanding of repository forking can provide important insights for the OSS community and GitHub. In this paper, we explore why and how developers fork what from whom in GitHub. We collect a dataset containing 236,344 developers and 1,841,324 forks. We conduct surveys and analyze the programming languages and owners of forked repositories. Our main observations are: (1) Developers fork repositories to submit pull requests, fix bugs, add new features, keep copies, etc. Developers find repositories to fork from various sources: search engines, external sites (e.g., Twitter, Reddit), social relationships, etc. More than 42% of the developers we surveyed agree that an automated recommendation tool would be useful to help them pick repositories to fork, while more than 44.4% of developers do not value a recommendation tool. Developers care about repository owners when they fork repositories. (2) A repository written in a developer’s preferred programming language is more likely to be forked. (3) Developers mostly fork repositories from creators. Compared with unattractive repository owners, attractive repository owners include a higher percentage of organizations, have more followers, and registered on GitHub earlier. Our results show that forking is mainly used to contribute to the original repositories and that it is beneficial for the OSS community. Moreover, our results show the value of recommendation and provide important insights for GitHub to recommend repositories.


Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant No. 61300006, the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2015ZX-24, and the Beijing Natural Science Foundation under Grant No. 4163074.

Author information

Corresponding authors

Correspondence to Jing Jiang or Li Zhang.

Additional information

Communicated by: Massimiliano Di Penta

Li Zhang is the first corresponding author

About this article

Cite this article

Jiang, J., Lo, D., He, J. et al. Why and how developers fork what from whom in GitHub. Empir Software Eng 22, 547–578 (2017). https://doi.org/10.1007/s10664-016-9436-6

  • DOI: https://doi.org/10.1007/s10664-016-9436-6
