Concerns-Based Reverse Engineering for Partial Software Architecture Visualization

Abstract: Reverse engineering (RE) has recently become one of the essential engineering trends for software evolution and maintenance. RE is used to support the process of analyzing and recapturing the design information in legacy systems or complex systems during the maintenance phase. The major problem stakeholders face in understanding the architecture of an existing software system is that architectural knowledge is difficult to obtain because of the size of the system, and the existing architecture documentation is often missing or does not match the current implementation of the source code. Therefore, considerable effort and time are needed from multiple stakeholders, such as developers, maintainers and architects, to obtain, re-document and visualize the architecture of a target system from its source code files. Current work is mainly focused on the developer viewpoint. In this paper, we present an RE methodology for visualizing architectural information for multiple stakeholders and viewpoints, based on applying the RE process to specific parts of the source code. The process is driven by eliciting stakeholders' concerns on specific architectural viewpoints in order to obtain and visualize the architectural information related to these concerns. Our contributions are threefold: (1) the RE methodology is based on the IEEE 1471 standard for architectural description and supports the concerns of stakeholders including the end-user and the maintainer; (2) it supports the visualization of a particular part of the target system by providing a visual model of the architectural representation which highlights the main components needed to execute specific functionality of the target system; and (3) the methodology uses architecture styles to organize the visual architecture information. We illustrate the methodology using a case study of a legacy web application system.


I. INTRODUCTION
Nowadays, reverse engineering (RE) is becoming one of the essential engineering trends for software evolution and maintenance. Generally, RE is defined as the process of analysing an existing software system to identify its current components and the dependencies between them, in order to recover design information and create new forms of system representation [1]-[4]. The core of RE consists of extracting information from the available software artifacts (such as source code) and representing it in visual models that stakeholders can understand [3], [5]. The main objectives of RE are to generate alternative views of a system's architecture, recapture design information, re-document the software system, facilitate its reuse, and represent the software system at higher levels of abstraction (by putting the system's users in the maintenance loop so that they can give feedback on the information related to the target system). Furthermore, RE is used to support recapturing the design information in order to restructure the architecture into a more maintainable one [3], [5]. Hence, most companies rely on reengineering the legacy systems that are important for their business processes in order to keep them in operation [3].
Moreover, software documentation is essential for the system's stakeholders (such as developers, end-users, testers, maintainers, architects and system administrators) when deciding on activities to evolve and maintain the software system. For example, the source code is considered the detailed documentation of the software system's implementation, and in most cases it is the only source of information that is up to date and available for legacy software systems. Accordingly, the IEEE 1219 standard recommends RE as a key supporting technology for dealing with source code as the "reliable representation" of software systems [3], [5].
Recovering and documenting software architectures (either fully or partially) has been an area of active research where programmers, architects, maintainers, testers and software engineers spend a lot of time using their expertise in resolving such problems of mapping existing source code of a target system into architecture components and for supporting the understand-ability and maintainability of software systems.

INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION
VOL 4 (2020) NO 2 e-ISSN : 2549-9904 ISSN : 2549-9610
Previous research has made great progress in overcoming the problems of documenting and recovering software architectures so that they reflect the system's changes at the code level. However, to deal with complex legacy systems, there is a significant need to develop new RE approaches or methods that document only part of the architecture, in order to simplify and visualize the available information of complex architectures. This should be based on stakeholders' concerns and their decisions about the architecture of the target system. Hence, it is important to determine what to look for and what to focus on when obtaining specific information about the architecture of the implemented software system. This paper presents an RE methodology for extracting particular architectural information, based on applying the RE process to specific parts of the implemented source code, to support the understandability and maintainability of particular parts of the software system. The rest of the paper is organized as follows: Section 2 presents the proposed RE methodology and the detailed design of its phases. Section 3 describes how to apply the phases of the RE methodology to a case study. Section 4 compares the proposed methodology with related work. Finally, Section 5 concludes with the main contributions and highlights future research.

II. THE PROPOSED REVERSE ENGINEERING METHODOLOGY
This section presents an overview of the proposed RE methodology. We discuss the principles of the proposed methodology and describe the detailed design of its main phases.

A. Overview of the Proposed RE Methodology
The main goal of the proposed methodology is to define an RE process for extracting particular architectural information based on stakeholders' viewpoints and concerns related to the target software system.
The RE methodology is based on three main concepts defined in the IEEE 1471 standard for architectural description: stakeholder, viewpoint and concern. The main idea is to elicit a stakeholder's concern on a specific architectural viewpoint of the target software system. We then apply the RE process to extract and document particular architectural information about the target software system, driven by the elicited concern.
The extraction process of the RE methodology is driven by the specific concerns of the stakeholder(s), so that only partial architectural information is extracted. Therefore, it does not address the RE of the whole architecture of the target software system. The general overview of the RE methodology is shown in Figure 1. The inputs are the source code and documentation as well as the stakeholders' concerns regarding the software system. The output is a model of particular architectural information based on the specific concerns.

B. RE Methodology Principles
The principles of the RE methodology are summarized as follows:
• The RE methodology is based on three concepts defined in the IEEE 1471 standard for architectural description, as shown in Figure 2. These concepts are described as follows [6]-[8]:
  - Stakeholder: a person, group or entity with an interest in the realization of the architecture.
  - Concern: a requirement, an objective, an intention or an aspiration which a stakeholder has for the software system, related to specific functional or non-functional requirements of the software system.
  - Viewpoint: defines the perspective from which a view is taken; each viewpoint covers a set of concerns related to one or more stakeholder(s).
• Our RE methodology extends to additional stakeholders such as the end-user, maintainer, analyst, architect and tester.
• The methodology supports the understandability and maintainability of legacy software systems using partial architecture visualization.

C. RE Methodology Phases
The methodology consists of four phases, described as follows:
• Phase(1): Define stakeholders' concerns based on one of the architectural viewpoints.
• Phase(2): Elicit a specific stakeholder concern.
• Phase(3): Extract the related requirement information based on the elicited stakeholder's concern.
• Phase(4): Apply the RE process for extracting the particular architectural information driven by the extracted requirement information.

Fig.3 The RE Methodology's Phases

As shown in Figure 3, the phases of the RE methodology are described using a process modelling language. The following paragraphs elaborate on the detailed design of each phase:

1) Define stakeholders concerns based on architectural viewpoint:
This phase is based on the definitions of "stakeholder" and "concern" in the IEEE 1471 standard for architectural description. We follow the classification of architectural viewpoints presented in the literature. The activities in this phase include the following two steps:
• Select a viewpoint from a given catalog which describes a specific architectural viewpoint for the target software system.
• Categorize the common stakeholders related to the selected viewpoint.

The viewpoints catalog is shown in Figure 4. Each one of these viewpoints defines a set of concerns related to one or more stakeholder(s).

Fig.4 The Viewpoints Catalog [10,17]

To summarize the viewpoints catalog in Figure 4: the first three viewpoints (Functional, Information and Concurrency) characterize the fundamental organization of the software system. The Development viewpoint exists to support the system's construction. The Deployment and Operational viewpoints characterize the system's runtime environment [10], [17]. The last three viewpoints mainly cover the concerns of the developer and maintainer stakeholders. The methodology we present in this paper focuses on the "Functional viewpoint" from the catalog of Nick et al. The justification for selecting the "Functional viewpoint" is that it is applicable to all types of software systems and reflects the essential architectural information for most stakeholders (such as the maintainer, end-user, developer, system administrator, tester, acquirer, assessor and communicator). Furthermore, the functional viewpoint includes a set of general stakeholders' concerns which reflect the essential and basic architectural information about the software system.
This information includes the internal structure, which determines the main elements of the software system, the responsibilities of each element and the primary interactions between elements; the functional capabilities, which define the specific action(s) that the system should take in a given situation; and the functional design philosophy, which reflects how the system will work step by step from the user's perspective, as represented in Table 1.

Functional viewpoint
Description: Describes the system's runtime functional elements and their responsibilities, interfaces, and primary interactions between these elements.
General Concerns:
• Internal structure
• Functional capabilities
• Functional design philosophy
• The external interfaces

1.2) Categorize common stakeholders' concerns related to the selected viewpoint:
This step categorizes the common stakeholders and their architectural concerns based on the selected viewpoint catalog. The main idea is to address the following points: who the stakeholders of the target software system are, and which concerns they have according to the selected viewpoint.
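The categorization step can be pictured as a small data model. The following is a minimal sketch only, assuming a `Viewpoint` class of our own naming; the general concerns are those listed for the functional viewpoint, while the assignment of concerns to the end-user and maintainer is hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Viewpoint:
    """An architectural viewpoint and the general concerns it covers."""
    name: str
    general_concerns: list[str]
    # stakeholder name -> the general concerns that stakeholder holds
    stakeholder_concerns: dict[str, list[str]] = field(default_factory=dict)

    def categorize(self, stakeholder: str, concerns: list[str]) -> None:
        """Record which of the viewpoint's concerns a stakeholder has."""
        unknown = [c for c in concerns if c not in self.general_concerns]
        if unknown:
            raise ValueError(f"not covered by {self.name} viewpoint: {unknown}")
        self.stakeholder_concerns[stakeholder] = list(concerns)

    def stakeholders_for(self, concern: str) -> list[str]:
        """Answer 'who cares about this concern?' in alphabetical order."""
        return sorted(
            s for s, cs in self.stakeholder_concerns.items() if concern in cs
        )


functional = Viewpoint(
    name="Functional",
    general_concerns=[
        "Internal structure",
        "Functional capabilities",
        "Functional design philosophy",
        "External interfaces",
    ],
)

# Hypothetical assignments, for illustration only.
functional.categorize(
    "End-user", ["Functional capabilities", "Functional design philosophy"]
)
functional.categorize(
    "Maintainer", ["Internal structure", "Functional capabilities"]
)

print(functional.stakeholders_for("Functional capabilities"))
```

Keeping the concerns attached to the viewpoint, rather than to each stakeholder separately, mirrors the idea that a viewpoint covers a set of concerns shared by one or more stakeholders.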

2) Elicit specific stakeholder concern:
In Phase(2), we define an elicitation process to clearly describe a specific concern derived from the general architectural concerns defined in Phase(1).
The specific concern of the stakeholder is defined as a specific question that can be used to query the functional requirements document of the target software system. For example, it can be used to select related use cases defined in the use case diagram of the requirements model. Accordingly, each elicited concern should be formulated as a question and has two elements:
• CIDn: the concern ID (where n is an integer), which is written in a dotted diamond box.
• Question: the concern elicited from the functional scenario of the target software system, which is written in a dotted rectangular box.
As shown in Figure 5, the association between a functional requirement (FR) and an elicited concern appears as dotted lines in the use case diagram of the target system. Moreover, it is possible to have multiple elicited concerns for one FR, which are numbered CID1, CID2, ..., CIDn.

3) Extract related requirement information based on elicited stakeholder's concern
Phase(3) describes how to extract the requirement information related to the elicited functional concern produced in Phase(2). The stakeholder's functional concern should be focused on the functionality offered by the target software system.
To support the activities of this phase, we developed a prototype tool with a graphical user interface (GUI).
The tool allows stakeholders to enter a specific concern in the form of a "query". The specific concern is elicited from the functional requirements repository, which is assumed to be available for the target software system.
The tool extracts a set of related requirement information based on the elicited concern, and creates a trace link between the elicited concern and its relevant information. Figure 6 shows screenshots of the tool, described as follows:

3.1) Extraction of related requirement information:
The extraction process starts by accessing the requirement repository and filtering all relevant information related to the specified concern. The extraction is achieved using the full-text indexing and searching technique described in [19,20], which supports keyword-based filtering and sorting through several search modes. The search is performed in natural language search mode, which interprets the specific functional concern (in the form of a user query) and then filters and ranks the relevant information related to the specified concern. The main results are displayed in a dropdown menu and sorted into three categories:
• High weight: appears in green and represents requirement information that is highly relevant to the specified functional concern.
• Medium weight: appears in yellow and represents requirement information of medium relevance to the specified functional concern.
• Low weight: appears in red and represents requirement information of low relevance.
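To make the filtering and ranking idea concrete, the following is a minimal sketch that stands in for the full-text search engine. It is not the technique of [19,20]: relevance is approximated here by the fraction of query terms that occur in a requirement, and the thresholds `high` and `low`, the requirement IDs and the requirement texts are all hypothetical.

```python
import re

# Display colors for the three weight categories, as described in the text.
COLORS = {"High": "green", "Medium": "yellow", "Low": "red"}


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_requirements(query, requirements, high=0.6, low=0.3):
    """Return (req_id, score, weight) tuples, most relevant first.

    score is the share of query terms found in the requirement text;
    requirements sharing no term with the query are filtered out.
    """
    query_terms = tokenize(query)
    if not query_terms:
        return []
    results = []
    for req_id, text in requirements.items():
        overlap = len(query_terms & tokenize(text))
        if overlap == 0:
            continue
        score = overlap / len(query_terms)
        if score >= high:
            weight = "High"
        elif score >= low:
            weight = "Medium"
        else:
            weight = "Low"
        results.append((req_id, round(score, 2), weight))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


# Hypothetical requirement repository, for illustration only.
requirements = {
    "FR-1": "The teacher views the weekly lecture schedule report",
    "FR-2": "The admin prints a report of room usage",
    "FR-3": "The student logs in to the system",
    "FR-4": "The teacher updates a lecture time slot",
}

for req_id, score, weight in rank_requirements(
    "teacher lecture schedule report", requirements
):
    print(req_id, score, weight, COLORS[weight])
```

A production tool would delegate scoring to the search engine's own relevance measure; the bucketing into three weights is the part this sketch is meant to illustrate.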
3.2) Traceability between a specific concern and its related requirement information:
The traceability process is performed after the extraction process. The main idea is to create a trace link between the extracted concern and its relevant information using the tool, as shown in Figure 6.
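A trace link can be represented as a simple record that pairs a concern ID with a requirement. The sketch below is an assumption about one possible representation, not the prototype tool's data model; the `TraceLink` class, the `keep` parameter and the requirement IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceLink:
    """Links one elicited concern to one piece of requirement information."""
    concern_id: str
    requirement_id: str
    weight: str


def create_trace_links(concern_id, ranked, keep=("High", "Medium", "Low")):
    """Create one link per (requirement_id, weight) pair whose weight is kept."""
    return [
        TraceLink(concern_id, requirement_id, weight)
        for requirement_id, weight in ranked
        if weight in keep
    ]


# Hypothetical extraction result for concern CID1, for illustration only.
ranked = [("FR-1", "High"), ("FR-4", "Medium"), ("FR-2", "Low")]

for link in create_trace_links("CID1", ranked):
    print(f"{link.concern_id} -> {link.requirement_id} ({link.weight})")
```

Passing a narrower `keep`, for example `("High",)`, restricts the links to the most relevant requirements; whether the tool applies such a filter is not stated in the text.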

4) RE for extracting particular architectural information
The final phase, Phase(4), uses the extracted requirement information produced by the previous phase. This phase includes two key activities:
• RE process for extracting specific source code files.
• Representation of the particular architectural information based on the extracted code files.

4.1) RE process for extracting specific source code files:
This RE process is achieved by applying a code analyzer process, which performs static analysis on the source code files to determine and trace which set of code files is used to implement the specific functionality reflected by the requirement information extracted in Phase(3). The code analyzer process includes three key steps, as shown in Figure 7:
• Select the starting point for tracking the execution of the specific functionality represented by the extracted requirement information, for example a page file, class, method or function among the code elements. Notably, the starting point can be selected by using references from existing documents such as the user manual or the software testing document.
• Track the execution of the selected starting code element, analyze its contents and gather all related code elements.
• Extract the related code elements in the form of a main code element and its related elements. The relation between code elements can be described as one of the following:
  - Require relation: describes the relations between code files and shows the dependencies of these files within the software system.
  - Contain relation: describes that a code file contains a set of functions that are used to execute specific functionality of the system.
  - Call relation: describes the relation between code elements and how different functions interact with each other.
In summary, the whole code analyzer process is achieved by using a static analysis tool called Doxygen [21]. Doxygen is used to extract the code structure from the existing source code files and to visualize the relations between the various code elements, according to the type of source code of the target software system, in the form of function call graphs, dependency graphs, inheritance diagrams or collaboration diagrams, all of which are generated automatically by the tool [21].

4.2) Representation of the particular architectural information:
Generally, the representation process includes two key steps: mapping the extracted code elements into a component model, and visualizing the architectural information using architecture styles. The following paragraphs describe the details of these steps:
... internet applications, service applications and mobile applications, as summarized in Table 4. Besides these archetypes, the Microsoft guide also contains details of some specialized application types such as hosted and cloud services and office business applications. The architecture of each archetype can be defined using architecture styles. For example, the guide in [22] describes a layered architecture style for web applications. The visualization process we adopt is performed using these architectural styles. For example, related components in web applications are grouped into a three-layered architecture which consists of a presentation layer, a business layer and a data layer, as shown in Figure 15. Each layer should include specific components, described as follows:
• Presentation Layer: responsible for managing user interaction with the software system; it generally consists of components that provide a common bridge into the core business logic encapsulated in the business layer.
• Business Layer: implements the core functionality of the software system and encapsulates the relevant business logic. It generally consists of components, some of which may expose service interfaces that other callers can use.
• Data Access Layer: provides access to data hosted within the system and to data exposed by other networked systems, perhaps accessed through services.
To summarize, Phase(4) includes two key steps. The first step organizes the extracted code elements into a component model to make an explicit mapping between the system's architecture and the code elements. The second step uses archetypes and architecture styles to visualize the architecture model. We give the example of a layered architectural style for web applications. The visual model represents the extracted partial architectural information in the form of a logical model. This architectural information helps stakeholders to answer their architectural concerns about a target system. The next section describes how to apply the methodology phases to a practical case study.
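The two steps can be sketched as two lookups: code element to component, then component to layer. This is a minimal illustration under stated assumptions, not the procedure used to produce the figures; the function names, file names and component names below are hypothetical, and the layer names follow the three-layer style described above.

```python
LAYERS = ("Presentation", "Business", "Data Access")


def build_layered_model(element_to_component, component_to_layer):
    """Group code elements by component, and components by layer."""
    model = {layer: {} for layer in LAYERS}
    for element, component in element_to_component.items():
        layer = component_to_layer.get(component)
        if layer not in model:
            raise ValueError(f"component {component!r} has no valid layer")
        model[layer].setdefault(component, []).append(element)
    return model


def render(model):
    """Return a plain-text view of the layered model, top layer first."""
    lines = []
    for layer in LAYERS:
        lines.append(f"[{layer} Layer]")
        for component in sorted(model[layer]):
            elements = ", ".join(sorted(model[layer][component]))
            lines.append(f"  {component}: {elements}")
    return "\n".join(lines)


# Hypothetical code elements and component names, for illustration only.
element_to_component = {
    "report_form.php": "Reporting Form",
    "build_report()": "Report Builder",
    "load_slots()": "Report Builder",
    "db_connect.php": "Database Connection",
}
component_to_layer = {
    "Reporting Form": "Presentation",
    "Report Builder": "Business",
    "Database Connection": "Data Access",
}

model = build_layered_model(element_to_component, component_to_layer)
print(render(model))
```

Keeping the two mappings separate makes the explicit link between code and architecture visible: changing the architecture style only requires a different `component_to_layer` table.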

III. APPLY RE METHODOLOGY TO A CASE STUDY
The following sections describe how to implement the RE methodology phases using a legacy web application as a practical case study. The section starts by giving an overview of the selected software system, and describes the main reasons for selecting this system. Then we describe the details of applying each phase of the methodology to the case study.

A. Selecting Software System for a Case Study
The selected case study is a web application called the Timetable Management System (TMS). TMS was developed by the Computer Center at Sudan University of Science and Technology (SUST) in 2008. TMS is a web-based open source system built for Sudanese universities using a MySQL database and the PHP language, with an Arabic interface. It provides highly flexible features for managing and controlling the scheduling of lecture times for students at Sudanese universities. Moreover, TMS is flexible enough to accept changes that occur in the schedules of all colleges at the university during the academic year without overlaps in the specified time slots between these colleges. We chose this system for the following reasons:
• TMS is a heterogeneous software system implemented as a combination of front-end PHP, JavaScript and HTML code plus a back-end MySQL database. It is an example of an application with multiple components implemented with different technologies.
• TMS is considered a legacy system, implemented in 2008 with technologies that are more than 10 years old.
• The documentation of the TMS architecture is missing, and the system documentation needs to reflect the current architectural representation in order for the system to be reengineered with new technologies.
• Recovering the particular architectural information of the system is essential to support the system's understandability and maintainability.
Table 5 gives a general description of the contents of the TMS source code.

System Name: Timetable Management System (TMS)
Description: The core of the source code is mainly PHP webpage source files (written in a procedural, non-object-oriented PHP code style).
PHP Source Files: 110
Total LOC: 30364
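Figures such as the file count and total LOC above can be gathered with a short script. The sketch below is one possible way to do so and is not how the reported numbers were obtained; in particular, it assumes that LOC means physical lines (blank lines and comments included) and that source files carry the `.php` extension. The function name is hypothetical.

```python
from pathlib import Path


def count_php_sources(root):
    """Return (number of .php files, total physical lines) under root."""
    files = sorted(Path(root).rglob("*.php"))
    total_lines = 0
    for path in files:
        # errors="replace" keeps the count going on mixed-encoding legacy files
        with path.open(encoding="utf-8", errors="replace") as handle:
            total_lines += sum(1 for _ in handle)
    return len(files), total_lines
```

Calling `count_php_sources` on the root directory of a checked-out source tree returns the two numbers as a tuple; a different definition of LOC (for example, excluding blank lines) would give a different total.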

B. Applying RE Methodology Phases to the Case Study
The following paragraphs elaborate on the details of applying each phase of RE methodology:

1) Define a set of Stakeholders Concerns:
We define a set of stakeholders' concerns based on the "Functional viewpoint" of the TMS system. The primary TMS stakeholders are:
• End-User: who defines the system's functionality and ultimately makes use of it. TMS has three types of end-users (College Admin, Teachers and Students).
• Maintainer: who manages the reengineering and improvement of the system.

2) Elicit a specific Stakeholders Concern:
The elicitation process focuses on selecting a particular functional concern related to a use case or a major functionality offered by the system to different types of users. The main idea is to elicit a specific concern such as "CID1", shown in Figure 9 below.

3) Extract related requirements information based on the elicited stakeholder's concern:
TMS has 34 functional requirements; this phase assumes that all of the TMS functional requirements already exist in a "requirement repository". The extraction process starts by accessing the requirement repository and filtering all relevant information based on the elicited concern. A trace link is then created between the elicited concern and its relevant information. The phase is achieved by using the tool as described in section 3.1 and section 3.2.
Using the tool, we obtain the results shown in Figure 10. The search returns ten items of requirement information, displayed in a dropdown menu and ranked into three categories as follows: High weight (2) appears in green, Medium weight (7) appears in yellow, and Low weight (1) appears in red.

Fig.10 Extraction of Related Requirements Information
Additionally, a trace link is created in order to link the elicited concern with the relevant information produced by the extraction process, as shown in Figure 11.

4) Extracting architectural information:
This final phase is achieved by applying the RE process at the code level to perform the following key steps:

4.1) Extracting specific source code files:
The code extraction process is performed using a static code analyzer, as described in section 4.1. Using the existing TMS source code files, we determine which set of source code files is used to implement the specific functionality of the system specified in the previous steps.
Notably, the starting point for the extraction process is selected by consulting the TMS user manual in order to locate the starting point of the execution of "TMS_Req2.20". The main output of this process is the call graph, which is used to obtain and visualize the dependencies between the function elements that execute the specific functionality in the system, as shown in Figure 12 and Figure 13.
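The tracking step amounts to a graph traversal from the chosen starting point. The sketch below illustrates that idea only; it assumes the require, contain and call relations have already been exported from the static analyzer as (source, relation, target) triples, which is our assumption rather than a documented output format, and all file and function names are hypothetical.

```python
from collections import deque

RELATION_KINDS = {"require", "contain", "call"}


def collect_related(start, relations):
    """Breadth-first walk over relation triples from a starting code element.

    Returns (elements, edges): the reachable code elements in visit order,
    and the relation triples that were followed to reach them.
    """
    adjacency = {}
    for source, kind, target in relations:
        if kind not in RELATION_KINDS:
            raise ValueError(f"unknown relation kind: {kind!r}")
        adjacency.setdefault(source, []).append((kind, target))

    seen = {start}
    elements = [start]
    edges = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for kind, target in adjacency.get(node, []):
            edges.append((node, kind, target))
            if target not in seen:
                seen.add(target)
                elements.append(target)
                queue.append(target)
    return elements, edges


# Hypothetical relations, for illustration only.
relations = [
    ("teacher_report.php", "require", "db_connect.php"),
    ("teacher_report.php", "contain", "print_report()"),
    ("print_report()", "call", "get_slots()"),
    ("unrelated.php", "require", "db_connect.php"),
]

elements, edges = collect_related("teacher_report.php", relations)
print(elements)
```

Elements that are not reachable from the starting point, such as `unrelated.php` here, are left out, which is what keeps the recovered view partial rather than covering the whole system.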

4.2) Representation and Visualization of architectural information:
This process includes two steps. The first step deals with mapping the extracted code elements into architectural components. The selected code elements in Figure 14 (webpages and functions) are mapped into an architecture of thirteen components. The second step is visualizing and representing the particular architectural information using a layered architecture style for web applications. The selection of the architecture type is based on the Web Application Archetype, which is applicable to the TMS system. The core of the web application is the server-side logic, which is visualized as a three-layer architecture. Figure 15 shows the main components in each layer that are used to describe and represent the "TMS_Req2.20" functionality, as follows:
• Presentation layer: includes three components (TMS Main Menu, Reporting Form and Page Layout). These components are responsible for managing the end-user's interaction with the TMS system.
• Business layer: includes nine components which implement the core functionality of the TMS system. The first four components (Preparation of Teacher Report, College Timeslots, Report Detail and DeptBackground Theme) are concerned with the retrieval, processing, transformation and management of TMS data, business rules and policies. The other five components, called "business entities", encapsulate the business logic and data necessary to represent the real-world elements within the TMS system (Academic Class Group, Lecture Room, Teacher, Subject and Department).
• Data access layer: consists of the database connection component, which provides access to the data hosted within the TMS system.
To summarize, the layered architecture model is used to visualize and represent the extracted architectural information as a graphical model for stakeholders, which helps to answer their architectural concerns about specific functionality of the TMS system. Moreover, this architectural model provides an abstract level of architectural representation which highlights the set of components needed to execute specific functionality of the system, in this case the mechanism for managing the scheduling of teachers' lectures, as shown in Figure 16.

... architecting is a specific type of reverse engineering, and stated that the RE process should consist of three phases. It starts with an extraction phase, in which information is extracted from the source code, documentation and documented system history. The process also includes an abstraction phase, which abstracts the extracted information based on the objectives of the RE activity and reduces it to a manageable amount of information. Finally, a presentation phase represents the abstracted data in a way suitable for the stakeholders [23].
Software architecture consists of the description of components and their relationships and interactions, both statically and behaviourally as described in . Recent approaches and methods have discussed the need for alternative solutions that extend to additional stakeholders. These solutions should focus on communicating the stored architectural information by applying scenario-based documentation through stakeholders' scenarios, and on managing the architecture documentation of the software system. However, such solutions should simplify and classify the architectural information by identifying stakeholders' concerns and viewpoints about the target system, and visualize the architectural information at a proper level of abstraction based on these concerns. In this paper we present an RE methodology for visualizing architectural information for multiple stakeholders and viewpoints, based on applying RE to specific parts of the source code. The process is driven by eliciting stakeholders' concerns on specific architectural viewpoints in order to obtain and visualize the architectural information related to these concerns. The main idea of the methodology is to integrate RE technology with the representation of software architectural information. The extraction process of the RE methodology is driven by the specific concern of the stakeholder(s), so that only partial architectural information is extracted. Therefore, it does not address the RE of the whole architecture of a target system. Moreover, the representation process includes two key steps: mapping the extracted code elements into a component model, and visualizing the architectural information using architecture styles. The visualized architectural information describes the architecture of a particular part of the software system, which supports the understandability and maintainability of the legacy software system.
Compared with the related work summarized in Table 6, our main contributions are threefold: (1) the RE methodology is based on the IEEE 1471 standard for architectural description and supports the concerns of stakeholders including the end-user and the maintainer; (2) the RE methodology supports the visualization of a particular part of the target system by providing a visual model of the architectural representation which highlights the main components needed to execute specific functionality of the target system; and (3) the methodology uses architecture styles to organize the visual architecture information. We illustrate the methodology using a case study of a legacy web application system. As a result of these contributions, the visualization of a particular part of the target system highlights the main components needed to execute specific functionality, which can be used to support the understandability and maintainability of the legacy software system (by putting the stakeholder in the maintenance loop, so that the stakeholder can give feedback on the information related to the target system).

V. CONCLUSION AND FUTURE WORK
The main contributions of the proposed RE methodology are as follows. Firstly, a new RE methodology that follows the IEEE 1471 standard for architectural description and supports the concerns of stakeholders including the end-user and the maintainer. Secondly, a GUI prototype tool that supports the steps of the methodology. It supports the visualization of a particular part of the target system by providing a visual model of the architectural representation which highlights the main components needed to execute specific functionality of the target system. Finally, the verification of the methodology using a legacy web application system. Furthermore, the extracted architectural representation helps stakeholders (especially the maintainer, end-user, architect, tester and developer) to obtain the as-built architecture from the implemented source code elements, and supports the understandability and maintainability of the target system. For example, the architectural representation can be used by the maintainer to understand a particular part of the system by tracing the related requirement information through its implemented code elements and highlighting which components are needed to realize specific functionality of the target system, as shown in Figure 16.
Moreover, when improving or re-engineering the legacy software system with new technology (such as an object-oriented system or a cloud-based application system), the architectural representation helps the maintainer to identify which set of components implements the core functionality of the legacy system and encapsulates the relevant business logic, or to decide how to manage and migrate the executable components into a cloud-based environment.
Additionally, the extracted architectural information can be used by the end-user to understand a particular part of the system, by providing an architectural diagram at a proper level of abstraction that highlights which components are needed to describe specific functionality. This is important because it puts the end-user in the maintenance loop, so that the end-user can give feedback on the information related to the target system, or decide on adding new features when specific functionality of the legacy software system is re-engineered.
The main recommendations for future work are as follows. There is a need to extend the RE methodology to support additional architectural viewpoints besides the "Functional viewpoint", based on a given classification of viewpoints (such as the information viewpoint, the deployment viewpoint and the operational viewpoint). An automated tool needs to be developed to support all phases of the RE methodology. Finally, the RE methodology should be applied in different application domains, such as robotics systems and smart object systems, to support the understandability and maintainability of particular parts of these systems.