A Holistic, Proactive and Novel Approach for Pre, During and Post Migration Validation from Subversion to Git

Computers, Materials & Continua, Tech Science Press. Article. DOI: 10.32604/cmc.2021.013272
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract: Software development is transitioning from centralized version control systems (CVCSs) such as Subversion to decentralized version control systems (DVCSs) such as Git, because the former are less efficient in terms of branching, fusion, time, space, merging, offline commits and builds, and repository handling. Git holds a 77% share of the VCS market, followed by Subversion with 13.5%. Most software organizations are migrating from Subversion to Git, yet only a few migration tools are available, and these lack several features: identification of empty directories as a pre-migration check, failover capability when migration is interrupted by a network failure or a disk space issue, and detailed report generation as a post-migration step. In this work, a holistic, proactive and novel approach is presented for pre-, during- and post-migration validation from Subversion to Git. Many scripts have been developed and executed at run time over various projects to overcome the limitations of existing Subversion to Git migration tools. During pre-migration, none of the available migration tools can fetch the empty directories of Subversion, which results in an incomplete migration. Scripts have been developed and executed for pre-migration validation and migration preparation, which overcome this problem. Experimentation was conducted in the SRLC Software Research Lab, Chicago, USA. During migration, if the process stops or breaks because of a lost network connection or for any other reason, the available tools cannot resume from the point where they stopped. Scripts have been developed and executed to keep the migration revision history in a cache (elastic cache) so that migration resumes from the point where it stopped after a connection failure.
During post-migration, none of the available version control migration tools generates a detailed report giving information about the total size of the source Subversion repositories, the total volume of data migrated to the destination repositories in Git, the total number of repositories migrated, the time taken for migration, the number of Subversion users with email notification, etc.

Introduction
For almost all software projects, source code is like the crown jewels, a precious asset whose value must be protected. For most software teams, the source code is a repository of the invaluable knowledge and understanding of the problem domain that the developers have collected and refined through careful effort. Version control protects source code from both catastrophe and the casual degradation caused by human error and unintended consequences. Software developers working in teams are continually writing new source code and changing existing source code [1]. The code for a project, app, or software component is typically organized in a folder structure or file tree. One developer in the team may be working on a new feature while another fixes an unrelated bug, and each developer may make changes in several parts of the file tree. While it is possible to develop software without any version control, doing so exposes the project to a considerable risk that no professional team would be advised to accept.
There are various version control systems in use, such as Subversion (SVN), Mercurial, CVS and Git. By far the most widely used modern version control system is Git. Git is a mature, actively maintained open-source project initially developed in 2005 by Linus Torvalds, developer of the Linux operating system kernel. A staggering number of software projects, commercial as well as open-source, rely on Git for version control. Git offers a variety of advanced features, including its distributed nature, offline commits, fast processing and a staging area. Developers who have worked with Git are well represented in the pool of available software development talent, and Git works well on a wide range of operating systems and IDEs (Integrated Development Environments). In recent years, Git has become the first choice for version control across the IT industry, from small to very large organizations, owing to these features [2].
Git holds a 77% share of the VCS market, followed by Subversion with 13.5% [3]. Most software organizations are migrating from Subversion to Git, yet only a few migration tools are available, and these lack several features: identification of empty directories as a pre-migration check, failover capability when migration is interrupted by a network failure or a disk space issue, and detailed report generation as a post-migration step. In this work, a holistic, proactive and novel approach is presented for pre-, during- and post-migration validation from Subversion to Git. Many scripts have been developed and executed at run time over various projects to overcome the limitations of existing Subversion to Git migration tools. Experimentation was conducted in the SRLC Software Research Lab, Chicago, USA.
The rest of the paper is organized as follows. Section 2 briefly reviews the major work done by earlier researchers on migration from Subversion to Git. Pre-migration validation and migration preparation are described in Section 3. Section 4 deals with the migration process. Section 5 presents post-migration validation, report generation and notification of version control migration. Finally, the work is concluded in Section 6.

Related Work
For Subversion, many earlier studies focus on the commit size distribution, which gives the probability that a given commit is of a particular size. Commit size, measured as the number of files committed in a revision, can follow power laws [4], the Pareto model [5], the Petro-distribution [6], etc. Commits have been categorized based on various features, mainly size and comment [7]. The dynamics of open-source software developers' commit behavior have been investigated for Subversion in Chen et al. [8], using four open-source software projects from Apache.org. There is evidence that, within the life cycle and within each release of a project, the collective commit intervals at both project level and file level follow a power-law distribution. An empirical study of inter-commit times in Subversion has been performed in Wang et al. [9] on two projects written in Java. It is observed that the commit-interval distributions of both POI and Tomcat follow power laws. Further, very long periods of inactivity are attributed to the behavior of active committers, such as long vacations.
For a centralized VCS, a commit signing mechanism has been proposed in Kumar et al. [10]. The proposed mechanism allows clients to work on disjoint sets of files without retrieving each other's changes, and it also allows working with a subset of the repository. Collaborative vocabulary development has been the focus of many earlier works [11]. The applicability of Git to collaborative vocabulary development has been investigated in Vaidya et al. [12]. Git4Voc has been proposed, which gives guidelines on how Git can be adopted for vocabulary development. It is shown that, by using vocabulary-specific features, Git hooks can be implemented to go beyond plain Git functionality. The LHCb migration from Subversion to Git has been presented in Halilaj et al. [13]. Issues related to the specific requirements of LHCb have been addressed, along with the technical details of migrating large non-standard SVN repositories. It has been claimed that this migration resulted in increased productivity, in terms of new projects and number of contributions, and in improved code quality, in terms of testing and reviews. A mechanism for the classification and extraction of changes is proposed in Diane et al. [14]. A tool named Git Change Classifier (GCC) is proposed, which uses text mining to determine the type of change and regular expressions to extract the changes. GCC reports the year-wise number of changes for a file, with changes classified as bug repairing changes (BRC), feature introducing changes (FIC) and global changes (GC). Various challenges and sources of confusion in learning version control with Git have been reported in Li et al. [15], followed by recommendations for dealing with them. General concepts of centralized and distributed VCSs, and how these concepts are implemented by Subversion and Git, have been discussed in Tanwar et al. [16][17][18][19][20].

Pre-Migration Validation and Migration Preparation
No software solution exists for identifying and validating a complete project structure before migration and for placing a version control placeholder file, which Git users create so that a Git repository preserves an otherwise empty project directory. Subversion supports empty directories, but Git does not track them, even though they may be an essential part of the project structure required for software development, such as the lib (libraries), dist and target folders. If the source code repository has a few empty folders [21] in Subversion and a Subversion to Git migration is executed, the repository is never completely imported into Git. These empty directories may be an essential part of the project development structure, for example the lib or dist folders into which Maven-based dependencies are downloaded at compile time or run time. In practice, since the available migration tools never fetch empty directories [22], the Subversion to Git migration ends up incomplete. The project build then fails, and with it the whole delivery pipeline; manual effort is needed to correct the project structure in the Git repositories, and the Subversion to Git migration has to be started over. This wastes manual effort, delays industry timelines and causes financial losses. Migration tools should therefore have a pre-migration capability to scan and analyze the entire Subversion repository and to create a Git placeholder during Git clone. In this work, many shell scripts have been developed to identify all directories and subdirectories that are empty and need a placeholder for a complete migration.

Pre-Requisites
Pre-requisites for pre-migration validation and migration preparation are as follows:
(1) Install Oracle JRE 1.8 or newer (LTS release).
(2) Install the SVN Mirror add-on migration tool in Bitbucket.
(3) Ensure that sufficient space is available on the Bitbucket server for the size of the SVN repositories.
(4) Ensure that the SVN repository URLs are accessible from the Bitbucket server.
Dependencies for pre-migration validation and migration preparation are as follows:
(1) License procurement for the SVN Mirror add-on migration tool from the SubGit organization.
(2) Count of SVN users to be migrated to Git.
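The disk space and reachability pre-requisites can be checked automatically before a migration is attempted. The following is a minimal sketch under stated assumptions, not the authors' script: the function names, the `headroom` safety factor and the default port table are hypothetical choices made for illustration.

```python
import shutil
import socket
from urllib.parse import urlparse


def has_enough_space(path: str, required_bytes: int, headroom: float = 1.5) -> bool:
    """Return True if the filesystem holding `path` can take the repositories.

    `headroom` is an assumed safety factor, not a figure from the paper.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes * headroom


def svn_host_reachable(svn_url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the SVN server's host and port succeeds."""
    parsed = urlparse(svn_url)
    # Assumed default ports for the usual Subversion access schemes.
    default_ports = {"http": 80, "https": 443, "svn": 3690}
    port = parsed.port or default_ports.get(parsed.scheme)
    if not parsed.hostname or port is None:
        return False
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A TCP connection only shows that the server is reachable from the machine running the check; it does not verify credentials or that the repository path exists.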

Migration Workflow
The migration workflow is given in Fig. 1. As the first step of pre-migration validation and migration preparation, the project structure in Subversion must be validated to identify any empty directory in each repository. A script has been developed for this purpose and is shown in Fig. 2; the Subversion repositories are validated for empty project structures using this script.
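The empty-directory check can be illustrated with a short Python sketch. It is not the script of Fig. 2; it assumes the Subversion repository has already been checked out to a local working copy, and the function names are hypothetical. Note that `.gitkeep` is a naming convention rather than a Git feature: any file in the directory makes Git track it.

```python
import os
from pathlib import Path


def find_empty_dirs(root: str) -> list[Path]:
    """Return directories under `root` that hold no files or subdirectories.

    Subversion's `.svn` administrative directories are ignored.
    """
    empty = []
    for current, dirs, files in os.walk(root):
        # Prune SVN metadata so it is neither counted nor descended into.
        dirs[:] = [d for d in dirs if d != ".svn"]
        if not dirs and not files:
            empty.append(Path(current))
    return sorted(empty)


def add_placeholders(root: str, name: str = ".gitkeep") -> list[Path]:
    """Create an empty placeholder file in every empty directory.

    Returns the directories that were fixed.
    """
    fixed = find_empty_dirs(root)
    for directory in fixed:
        (directory / name).touch()
    return fixed
```

A directory that contains only empty subdirectories is not reported itself; placing a placeholder in each leaf directory is enough for Git to preserve the whole chain.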

Results
The script sends an email notification and also prints the results on the command line.
In total, 3 folders were fixed by adding a placeholder (.gitkeep).

Installation Steps for SVN Mirror Add on
First, a fresh Git project needs to be created in Bitbucket. Once the project is created, it must be checked whether it contains any empty repository. The SVN Mirror add-on tool is then installed to migrate the project. For this purpose, the following command is executed:
subgit.bat configure --layout auto --trunk TRUNK SVN_URL GIT_REPO
Fig. 3 shows the script for installation and configuration of the SVN Mirror (SubGit) tool.

Configure SVN Mirror Settings
Next, the mirror Git repository needs to be configured. The following command configures the Git repository to reflect the SVN project:
$ subgit configure --layout auto --trunk trunk SVN_PROJECT_URL repos.git
This command detects the branch layout of the SVN project and creates an empty Git repository that is ready to mirror the SVN project.

Author Mappings, Adjust Translation & Initial Translation
Here, the names of the repositories that the SubGit tool will copy from SVN are provided. This input is required because it may not be necessary to migrate all repositories from SVN. In the same way, the authors, that is, the SVN users, are filtered according to the repositories chosen in repos.git.
Finally, the repositories are imported by executing an import command, which starts the initial translation and imports all repositories from Subversion to Git:
$ edit repos.git/subgit/config
$ edit repos.git/subgit/authors.txt
$ subgit install repos.git
$ subgit import repos.git
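Editing authors.txt by hand is error-prone when many SVN users are involved, so a first draft of the file can be generated from the Subversion history. The sketch below is an illustration, not part of the paper: it assumes the input is the output of `svn log --xml`, that each mapping line has the form `svn_user = Git Name <email>`, and that email addresses can be derived from a single hypothetical domain. The generated lines would still need manual review.

```python
import xml.etree.ElementTree as ET


def svn_authors(log_xml: str) -> list[str]:
    """Return the sorted, distinct author names found in `svn log --xml` output."""
    root = ET.fromstring(log_xml)
    names = {e.text.strip() for e in root.iter("author") if e.text and e.text.strip()}
    return sorted(names)


def authors_mapping(authors: list[str], email_domain: str) -> str:
    """Render one `svn_user = Git Name <email>` line per author.

    The Git name and email are derived from the SVN user name, which is an
    assumption made only to produce a draft for manual correction.
    """
    return "".join(f"{name} = {name} <{name}@{email_domain}>\n" for name in authors)
```

For example, `authors_mapping(svn_authors(xml_text), "example.com")` yields text that can be written to repos.git/subgit/authors.txt and then corrected by hand.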

During Migration
With the available Subversion to Git migration tools, if migration fails or is interrupted because of a network failure or for any other reason, there is no way to alert the project administrator. Some repositories are very old and hold gigabytes or terabytes of data. Further, if migration stops or breaks because of a lost network connection or for any other reason, the available tools cannot resume from the point where they stopped. A mechanism is therefore needed to alert migration administrators by email. Version control migration tools should be able to notify users and administrators about a network connection failure between source and destination. The tool should also resume precisely from the revision at which the connection broke. To do so, it must keep the migration revision history in a cache (elastic cache) so that migration starts again from the point where it stopped after the connection loss.
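The resume requirement amounts to recording the last revision that was migrated successfully and reading it back after a failure. The sketch below illustrates the idea with a local JSON file standing in for the cache; the paper keeps this history in an elastic cache, whose interface is not described, so the storage backend and all function names here are assumptions.

```python
import json
import os
import tempfile


def load_checkpoints(path: str) -> dict:
    """Return the stored {repository: last migrated revision} mapping."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def save_checkpoint(path: str, repo: str, revision: int) -> None:
    """Record the last fully migrated revision of `repo`."""
    state = load_checkpoints(path)
    state[repo] = revision
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    # Atomic rename: a crash never leaves a half-written checkpoint file.
    os.replace(tmp, path)


def resume_from(path: str, repo: str) -> int:
    """Return the first revision still to migrate (1 if nothing is recorded)."""
    return load_checkpoints(path).get(repo, 0) + 1
```

Writing to a temporary file and renaming it matters here because the checkpoint is updated exactly when failures are most likely, in the middle of a long-running transfer.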

Figure 3: Results of SVN Mirror (SubGit) tool installation and configuration
Subversion repositories and projects are usually large; a size of 1-2 TB is typical. Migrating this much data can take hours, depending on network bandwidth. This motivated the development of automation scripts, incorporated in Jenkins, that continuously check the status of the Git repositories during migration and restart the migration if it has stopped because of a network connection failure. The status checks include whether the size of the repository is increasing, the network ping status between the Git and Subversion servers, whether the commit count is increasing, etc. Fig. 4 shows the script developed to automate these tasks; it is scheduled through Jenkins and restarts the migration if it fails for any reason.
During migration, if the Git commit count is not increasing, the most probable cause is a network failure. In this situation, the migration process should restart from the last commit point. To accomplish this, a script has been developed in Python to check whether the Git commit count is increasing and to restart the migration process in case of failure. It is shown in Fig. 5. Results and validation of the scripts for checking network connectivity are shown in Fig. 6. If the commit count is not increasing, an email notification must be sent to the administrator. Fig. 7 shows the script developed for this purpose, and Fig. 8 shows the email notification about migration failure received from the Python scripts.
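The commit-count check and the failure notice can be sketched as follows. This is an illustrative outline rather than the scripts of Figs. 5 and 7: the class and function names, the number of idle polls tolerated and the wording of the message are assumptions. The commit count is read with `git rev-list --all --count`.

```python
import subprocess
from email.message import EmailMessage


def git_commit_count(git_dir: str) -> int:
    """Count the commits reachable from any ref in the repository at `git_dir`."""
    result = subprocess.run(
        ["git", f"--git-dir={git_dir}", "rev-list", "--all", "--count"],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


class StallDetector:
    """Flag a migration as stalled after `max_idle_polls` polls without new commits."""

    def __init__(self, max_idle_polls: int = 3):
        self.max_idle_polls = max_idle_polls
        self.last_count = None
        self.idle_polls = 0

    def observe(self, count: int) -> bool:
        """Record one poll result; return True once the migration looks stalled."""
        if self.last_count is not None and count <= self.last_count:
            self.idle_polls += 1
        else:
            self.idle_polls = 0
        self.last_count = count
        return self.idle_polls >= self.max_idle_polls


def failure_notice(repo: str, count: int, sender: str, recipient: str) -> EmailMessage:
    """Build the email that tells the administrator the migration has stalled."""
    msg = EmailMessage()
    msg["Subject"] = f"Migration stalled: {repo}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Commit count for {repo} has stayed at {count}; restart required.")
    return msg
```

A scheduler such as Jenkins could call `observe(git_commit_count(...))` at a fixed interval, hand the message to `smtplib.SMTP.send_message` when it returns True, and then rerun the import. Requiring several idle polls avoids false alarms while a single large revision is being transferred.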

Post-Migration Validation-Report Generation & Notification of Version Control Migration
After migration has completed successfully, none of the available version control migration tools generates a detailed report giving the total size of the source Subversion repositories, the total size of the data migrated to the destination repositories in Git, the total number of repositories migrated, the time taken for migration, the number of Subversion users with email notification, etc. A version control migration tool should be able to send an email notification to the version control administrator, software configuration manager or release engineering team, together with a detailed report covering revision and commit history, user mapping and branch mapping, including tags. The tool should provide this information after a successful migration. For these purposes, a shell script has been developed to generate a detailed post-migration report; it is shown in Fig. 9. Fig. 10 shows the algorithm for printing the detailed report after migration. Fig. 11 shows the script developed for comparing the Git users with the Subversion users, and Fig. 12 shows the corresponding algorithm. A detailed report generated on migration success is shown in Fig. 13.
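The content of such a report and the user comparison can be outlined in a few lines of Python. The sketch below is not the shell script of Fig. 9 or Fig. 11; the field names, the report layout and the case-insensitive matching of user names are assumptions made for illustration, and the figures passed in would come from the actual migration run.

```python
from dataclasses import dataclass


@dataclass
class MigrationStats:
    repositories: int   # number of repositories migrated
    svn_bytes: int      # total size of the source Subversion repositories
    git_bytes: int      # total size of the data migrated to Git
    seconds: float      # time taken for the migration


def missing_users(svn_users, git_users):
    """Return the SVN users with no matching Git author (case-insensitive)."""
    git_lower = {user.lower() for user in git_users}
    return sorted(user for user in set(svn_users) if user.lower() not in git_lower)


def render_report(stats: MigrationStats, svn_users, git_users) -> str:
    """Format the post-migration summary as plain text."""
    hours, rest = divmod(int(stats.seconds), 3600)
    missing = missing_users(svn_users, git_users)
    lines = [
        "Subversion to Git migration report",
        f"Repositories migrated: {stats.repositories}",
        f"Source size (SVN): {stats.svn_bytes / 2**30:.2f} GiB",
        f"Migrated size (Git): {stats.git_bytes / 2**30:.2f} GiB",
        f"Time taken: {hours} h {rest // 60} min",
        f"SVN users: {len(set(svn_users))}",
        f"Users missing in Git: {', '.join(missing) or 'none'}",
    ]
    return "\n".join(lines)
```

Listing the users that are missing on the Git side, rather than only the counts, makes a faulty author mapping visible directly in the report.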

Figure 12: Algorithm for comparing the Git users with Subversion users
Figure 13: Detailed report generation about migration success

Conclusion
Software development is transitioning from centralized version control systems such as Subversion to decentralized version control systems such as Git, because the former are less efficient in terms of branching, fusion, time, space, merging, offline commits and builds, and repository handling. Only a few migration tools are available in the software industry, and these lack several features: identification of empty directories as a pre-migration check, failover capability when migration is interrupted by a network failure or a disk space issue, and detailed report generation as a post-migration step. In this work, a holistic, proactive and novel approach has been presented for pre-, during- and post-migration validation from Subversion to Git. Many scripts have been developed and executed at run time over various projects to overcome the limitations of existing Subversion to Git migration tools. During pre-migration, none of the available migration tools can fetch the empty directories of Subversion, which results in an incomplete migration. In this work, scripts have been developed and executed for pre-migration validation and migration preparation, which overcome this problem. During migration, if the process stops or breaks because of a lost network connection or for any other reason, the available tools cannot resume from the point where they stopped. Scripts have been developed and executed to keep the migration revision history in a cache (elastic cache) so that migration resumes from the point where it stopped after a connection failure.
During post-migration, none of the available version control migration tools generates a detailed report giving the total size of the source Subversion repositories, the total volume of data migrated to the destination repositories in Git, the total number of repositories migrated, the time taken for migration, the number of Subversion users with email notification, etc. Scripts have been developed and executed for this purpose during the post-migration process.
Based on the experimentation carried out in the SRLC Software Research Lab, Chicago, USA, it can be stated that any version control migration tool can feasibly be made more capable and dynamic by adopting a holistic approach to the migration of all repositories from Subversion to Git. With the help of automation and scripts, it can be ensured that the Subversion repositories have a structure compatible with Git before the actual migration starts, and that proper notifications are sent to all stakeholders during migration. The post-migration validation tasks have been automated using Python and shell scripts.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.