ON AUTOMATIC IDENTIFICATION OF MONITORING CONCERNS IMPLEMENTATION

Automatic identification of crosscutting concerns implementation is still a challenging task in software engineering. The approaches proposed so far for crosscutting concerns identification are all bottom-up approaches: starting from the source code of a software system they try to discover all the crosscutting concerns that exist in the system. In this paper we present a top-down approach that we developed based on the observations gathered after analyzing how monitoring crosscutting concerns are implemented in different open source object oriented software systems. The approach aims to identify only one type of crosscutting concern, namely monitoring. It tries to automatically identify the type of the logger used for monitoring crosscutting concerns implementation by analyzing the attributes defined in Java-based software systems. We also present and discuss the results obtained by applying this approach to different open source Java software systems.


INTRODUCTION
The ever increasing complexity of software systems makes designing and implementing them a difficult task. Software systems are usually composed of many different concerns. A concern is a specific requirement or consideration that must be addressed in order to satisfy the overall system. The concerns are classified into core concerns and crosscutting concerns. The core concerns capture the central functionality of a module, while crosscutting concerns capture system-level, peripheral requirements that cross multiple modules. Paradigms like procedural or object oriented programming provide good solutions for the design and implementation of core concerns, but they cannot deal properly with crosscutting concerns. Many different approaches have been proposed for the design and implementation of crosscutting concerns: subject oriented programming [1], composition filters [2], adaptive programming [3], generative programming [4], and aspect oriented programming (AOP) [5]. Of these approaches, aspect oriented programming has had the greatest success both in industry and academia.
For almost two decades researchers have tried to develop techniques and tools to (automatically) identify crosscutting concerns in software systems that were already developed without using AOP. This area of research is called Aspect Mining. The goal is to identify the crosscutting concerns, and then to refactor them to aspects, in order to obtain a system that can be easily understood, maintained and modified. In order to identify crosscutting concerns, the existing techniques try to discover one or both symptoms that appear when designing and implementing crosscutting concerns using the existing paradigms: code scattering and code tangling. Code scattering means that the code that implements a crosscutting concern is spread across the system, and code tangling means that the code that implements some concern is mixed with code from other (crosscutting) concerns.
The main contribution of this paper is to propose the first top-down aspect mining approach that tries to identify the implementation of two kinds of monitoring crosscutting concerns: logging and tracing. The approach does not aim to identify all the crosscutting concerns that exist in a software system; it focuses only on these two kinds of monitoring crosscutting concerns. We also present and discuss the results obtained by applying this approach to eight open source Java-based software systems. The rest of the paper is structured as follows. Section 2 presents an overview of the aspect mining techniques proposed so far. Section 3 describes the two types of crosscutting concerns we are interested in, and the proposed approach for their automatic identification. In Section 4 we present the software systems on which we have applied our approach and the obtained results. Conclusions and further work are given in Section 5 and Section 6, respectively.

ASPECT MINING TECHNIQUES
The first approaches in aspect mining were query-based search techniques. The developer had to introduce a so-called seed (e.g., a word, the name of a method or of a field) and the associated tool showed all the places where the seed was found. Very soon, researchers discovered that this approach to aspect mining has some important disadvantages: the user of the tool had to have an in-depth knowledge of the analyzed system, as he/she had to figure out the seed(s) to be introduced, and a large amount of time was needed to filter the displayed results. Many query-based aspect mining tools were proposed, like Aspect Browser [6], The Aspect Mining Tool (AMT) [7], and Feature Exploration and Analysis Tool (FEAT) [8]. All these tools perform the search in the source code of the mined system.
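A query-based search of this kind can be sketched in a few lines. The example below is only an illustration of the idea and is not the code of any of the cited tools; the class name SeedSearch, the method find and the sample source text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a query-based (seed) search; not the code of any cited tool. */
final class SeedSearch {

    /** Returns one "file:line: text" entry for every source line that contains the seed. */
    static List<String> find(String fileName, String source, String seed) {
        List<String> matches = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains(seed)) {
                // Line numbers are reported 1-based, as an editor would show them.
                matches.add(fileName + ":" + (i + 1) + ": " + lines[i].trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "class Account {",
                "    void withdraw(int amount) {",
                "        log.info(\"withdraw \" + amount);",
                "        balance -= amount;",
                "    }",
                "}");
        // The user has to guess a useful seed, here the (assumed) logger name.
        for (String match : find("Account.java", source, "log.")) {
            System.out.println(match);
        }
    }
}
```

The sketch also makes the disadvantages visible: the seed "log." has to be known in advance, and every hit still has to be inspected by hand.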
All the presently proposed automated aspect mining techniques try to discover all the crosscutting concerns that exist in the mined software system. The obtained results have shown that it is not an easy task to develop an approach that can be used for discovering different types of crosscutting concerns. Consequently, the obtained results are not very accurate, and only some types of crosscutting concerns are discovered. While the techniques proposed in the beginning used very different approaches, the more recent ones are mostly improvements of previously proposed techniques. Even so, the results obtained by the new aspect mining techniques did not improve significantly. They obtained better results, but not much better. Also, practice has shown that not all crosscutting concerns can be refactored to aspects.
Mens et al. have conducted an analysis of the problems the proposed aspect mining techniques were encountering [23]. The main identified problems were: poor precision, poor recall, subjectivity, scalability, and lack of empirical validation. The study was conducted in 2008 and since then the results obtained by the proposed aspect mining techniques did not improve much.

A TOP-DOWN APPROACH
In this section we describe our approach for identifying logging and tracing monitoring crosscutting concerns.

Monitoring Crosscutting Concerns
Monitoring concerns record the behaviour of a software system during development, testing and execution in its own environment. The most commonly used are logging, tracing and performance monitoring:
• Logging produces messages specific to the logic carried by a piece of code.
• Tracing produces messages for lower-level events such as: the entry or exit of a method, exception handling or object construction, and state modification.
• Performance monitoring measures the time taken by specific parts of the system to execute and/or the number of times a particular method is invoked.
It is well known that tracing and performance monitoring are better implemented using AOP. The AOP-based solution is clearly separated from the rest of the system, can be easily understood and maintained, and can be easily plugged into or out of the system. As for logging, it is not yet clear whether an AOP-based solution can be designed, and whether it is better than the non-AOP one.
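The separation that makes tracing a good candidate for AOP can be illustrated without an AOP language. The sketch below is a stand-in that uses a plain Java dynamic proxy (java.lang.reflect.Proxy) instead of an aspect, so it is not an AOP implementation; all names in it (TracingProxyDemo, Calculator, SimpleCalculator, traced) are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: tracing kept outside the core code by means of a dynamic proxy. */
final class TracingProxyDemo {

    /** Core concern: the interface and its implementation contain no tracing code. */
    interface Calculator {
        int add(int a, int b);
    }

    static final class SimpleCalculator implements Calculator {
        @Override
        public int add(int a, int b) {
            return a + b;
        }
    }

    /** Wraps target so that every interface call records an entry and an exit message. */
    @SuppressWarnings("unchecked")
    static <T> T traced(final T target, Class<T> iface, final List<String> trace) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                trace.add("enter " + method.getName());
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the exception raised by the target method
                } finally {
                    trace.add("exit " + method.getName());
                }
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        Calculator calc = traced(new SimpleCalculator(), Calculator.class, trace);
        System.out.println(calc.add(2, 3));
        System.out.println(trace);
    }
}
```

Removing the call to traced removes the tracing without touching SimpleCalculator, which mirrors the plug-in and plug-out property described above. Logging messages that depend on the logic inside a method body have no equally obvious interception point.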

Our Approach
The proposed approach is based on the results obtained in previous studies where we have manually analyzed object oriented software systems to determine if a pattern (or patterns) can be extracted for monitoring crosscutting concerns implementation [24][25][26]. The obtained results have shown that many different patterns are used for monitoring concerns implementation, but the most used one is the declaration of an attribute corresponding to the object used for recording the produced messages, often called a logger, and then calling different methods on it. This attribute is in most cases a static and/or final one [25,26].
Listing 1 shows a fragment from the source code of the AjpMessage class from Tomcat v9 [28] (one of the analyzed software systems). The fragment includes the declaration of the logger object (named log) as a static and final attribute. The object is later used in some of the methods to record the corresponding messages. This pattern was identified in most of the classes that recorded messages. Still, other patterns were also identified, like the declaration of the logger object in a base class, with the methods from subclasses using it without declaring a new attribute, or the declaration of the logger object as a local variable in the methods that needed to record messages.
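The three patterns mentioned above can be contrasted in a small sketch. It uses java.util.logging.Logger purely for illustration (the analyzed systems use their own logger types), and the class names OrderService, BaseDao, UserDao and ReportJob, as well as the helper declaresLoggerAttribute, are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.logging.Logger;

/** Hypothetical sketch of the three logger declaration patterns. */
final class LoggerPatterns {

    /** Pattern 1: the logger is a static final attribute of the class that uses it. */
    static class OrderService {
        private static final Logger log = Logger.getLogger(OrderService.class.getName());

        void place(String item) {
            log.info("placing order for " + item);
        }
    }

    /** Pattern 2: the logger is declared once in a base class and used by subclasses. */
    abstract static class BaseDao {
        protected final Logger log = Logger.getLogger(getClass().getName());
    }

    static class UserDao extends BaseDao {
        void save(String user) {
            log.fine("saving " + user); // no new attribute is declared here
        }
    }

    /** Pattern 3: the logger is a local variable of the method that records messages. */
    static class ReportJob {
        void run() {
            Logger log = Logger.getLogger(ReportJob.class.getName());
            log.info("report started");
        }
    }

    /** True if the class itself declares an attribute of the logger type. */
    static boolean declaresLoggerAttribute(Class<?> c) {
        for (Field f : c.getDeclaredFields()) {
            if (f.getType() == Logger.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("OrderService: " + declaresLoggerAttribute(OrderService.class));
        System.out.println("BaseDao:      " + declaresLoggerAttribute(BaseDao.class));
        System.out.println("UserDao:      " + declaresLoggerAttribute(UserDao.class));
        System.out.println("ReportJob:    " + declaresLoggerAttribute(ReportJob.class));
    }
}
```

Only OrderService and BaseDao declare an attribute of the logger type. UserDao and ReportJob record messages without declaring one, so an analysis that looks only at declared attributes does not see them directly.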
Based on these results we have developed a top-down approach that tries to identify logging and tracing monitoring crosscutting concerns by analyzing the static or final attributes defined in a Java-based software system. The approach consists of first automatically identifying the type of the logger object, and then automatically identifying the affected classes (the classes in which the monitoring concerns are implemented). Having identified the classes, we can then (automatically) analyze them to determine whether the concerns can be (automatically) refactored to aspects.
Our approach for identifying the type of the logger object consists of the following steps:
1. Instrumentation. In order to determine the static or final attributes defined in a Java-based software system we need to automatically analyze the source code (.java files) or the bytecode (.class files) of the system. The existing libraries and frameworks that allow us to analyze them (like Soot [29] or Spoon [30]) require and/or use a classpath variable that must be properly set in order to be able to analyze the input (source code or bytecode). During this step we determine all the dependencies (usually other .jar files) of the system. This is the most time consuming step, as large software systems may depend on many different libraries that must be identified by the user if there is no additional information present (such as a Gradle [31] or a Maven [32] build file). According to Sulir and Poruban [27], even if a build file exists, it is still a difficult and time consuming task to successfully build a complex software system, with many third-party dependencies, from its source code.
2. Analysis. After the completion of the first step, we automatically identify all the static or final attributes defined in the analyzed software system. During this step we gather the following information: the type of the attribute, the number of times this type was used for declaring a static or final attribute, and the number of distinct classes defining this kind of attributes. We consider that the number of distinct classes in which static or final attributes of the same type were defined is important, as in the same class many different static or final attributes of the same type may be defined.
3. Filtering. From the results obtained at the previous step we remove the Java primitive types, the types defined in the java.util or java.lang packages (but not in their subpackages), and any arrays of these types. During this step we also remove the types that were used for declaring static or final attributes in fewer than 3 classes. We consider that if a static or final attribute of the same type is defined in at least 3 classes then it can be considered as crosscutting; otherwise it can be considered as coupling between the corresponding types (the type of the attribute(s) and the classes in which the static or final attribute(s) was (were) defined).
4. Ranking. The remaining static or final attributes' types are sorted in descending order by the number of declaring classes. The first n results will be presented to the user as possible results for the logger object's type. From our observations of the manually analyzed software systems, the type should be among the first ranked results. The value of n can be decided by the user (or it could have a default value).
After identifying the type of the logger object, we consider that logging and/or tracing monitoring crosscutting concerns are implemented in all the classes having an attribute of this type. These classes are determined during the analysis of all the attributes defined in the software system, so no additional computation is needed.
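The Analysis, Filtering and Ranking steps, followed by the selection of the affected classes, can be sketched as below. This is a minimal illustration under stated assumptions and not the implementation used in the study: it inspects already loaded classes through Java reflection instead of analyzing bytecode with Soot, it skips compiler-generated fields, and it reads the threshold as "at least 3 declaring classes". The names LoggerTypeFinder, TypeStats, analyze, isCommonType, rank and affectedClasses, and the four demo classes, are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

/** Hypothetical sketch of the Analysis, Filtering and Ranking steps (reflection, not Soot). */
final class LoggerTypeFinder {

    /** Usage data gathered for one attribute type. */
    static final class TypeStats {
        final Class<?> type;
        int declaredAttributes;                                        // DA
        final Set<Class<?>> declaringClasses = new LinkedHashSet<>();  // DC
        TypeStats(Class<?> type) { this.type = type; }
    }

    /** Analysis: collect the types of all static or final attributes. */
    static Map<Class<?>, TypeStats> analyze(List<Class<?>> classes) {
        Map<Class<?>, TypeStats> stats = new LinkedHashMap<>();
        for (Class<?> c : classes) {
            for (Field f : c.getDeclaredFields()) {
                int m = f.getModifiers();
                // Compiler-generated fields are skipped (an assumption of this sketch).
                if (f.isSynthetic() || !(Modifier.isStatic(m) || Modifier.isFinal(m))) continue;
                TypeStats s = stats.get(f.getType());
                if (s == null) {
                    s = new TypeStats(f.getType());
                    stats.put(f.getType(), s);
                }
                s.declaredAttributes++;
                s.declaringClasses.add(c);
            }
        }
        return stats;
    }

    /** Filtering, part 1: primitives, java.lang / java.util types, and arrays of either. */
    static boolean isCommonType(Class<?> type) {
        Class<?> t = type;
        while (t.isArray()) t = t.getComponentType();
        if (t.isPrimitive()) return true;
        String name = t.getName();
        int dot = name.lastIndexOf('.');
        String pkg = dot < 0 ? "" : name.substring(0, dot);
        return pkg.equals("java.lang") || pkg.equals("java.util"); // subpackages are kept
    }

    /** Filtering, part 2, and Ranking: drop rare types, sort by DC, return the first n. */
    static List<TypeStats> rank(Map<Class<?>, TypeStats> stats, int minClasses, int n) {
        List<TypeStats> candidates = new ArrayList<>();
        for (TypeStats s : stats.values()) {
            if (!isCommonType(s.type) && s.declaringClasses.size() >= minClasses) {
                candidates.add(s);
            }
        }
        candidates.sort((a, b) -> Integer.compare(b.declaringClasses.size(), a.declaringClasses.size()));
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    /** Affected classes: every class with an attribute of the chosen type, whatever its modifiers. */
    static List<Class<?>> affectedClasses(List<Class<?>> classes, Class<?> loggerType) {
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> c : classes) {
            for (Field f : c.getDeclaredFields()) {
                if (f.getType() == loggerType) {
                    result.add(c);
                    break;
                }
            }
        }
        return result;
    }

    // Tiny made-up system, used only to exercise the sketch.
    static class A { static final Logger LOG = Logger.getLogger("a"); static final AtomicInteger HITS = new AtomicInteger(); }
    static class B { static final Logger LOG = Logger.getLogger("b"); static final AtomicInteger HITS = new AtomicInteger(); }
    static class C { private final Logger log = Logger.getLogger("c"); static final String NAME = "c"; }
    static class D { Logger log = Logger.getLogger("d"); int[] buffer = new int[4]; }

    public static void main(String[] args) {
        List<Class<?>> system = Arrays.<Class<?>>asList(A.class, B.class, C.class, D.class);
        for (TypeStats s : rank(analyze(system), 3, 5)) {
            System.out.println(s.type.getName() + " DA=" + s.declaredAttributes
                    + " DC=" + s.declaringClasses.size());
        }
        System.out.println("affected: " + affectedClasses(system, Logger.class).size());
    }
}
```

In the made-up system, java.util.logging.Logger is the only type that survives filtering (DA=3, DC=3): String is removed as a java.lang type and AtomicInteger because it is declared in only 2 classes. Four classes are reported as affected, because class D has a logger attribute that is neither static nor final.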

STUDY
In this section we present the software systems used for our approach assessment and the obtained results.

Case Studies
In order to verify the applicability of the described approach we have used eight different open source Java-based software systems as case studies. Four of them, namely Spoon, Tomcat v9, Spring Framework, and ArgoUML, were previously manually analyzed in order to be able to also compute the accuracy of our approach. Four systems, namely Mars simulator, JGAP, Neuroph and JEdit, are new systems that we did not analyze before.
• Spoon is an open-source library that enables transformation and analysis of Java source code. It provides a metamodel where any kind of program element such as a class, a method, a field, a statement, etc. can be accessed for reading and/or modification. The code used for our analysis was downloaded from [33].
• ArgoUML is an open source UML modeling tool that includes support for all standard UML 1.4 diagrams. It runs on any Java platform. We have used version 0.34 for our analysis, and the source was downloaded from [34].
• Apache Tomcat is an open-source web container for Java Servlet, JavaServer Pages, Java Expression Language and Java WebSocket technologies [35]. We analyzed version 9, the source code being downloaded from [36].
• Spring Framework is a modular framework that helps developing Java enterprise applications by providing a comprehensive programming and configuration model [37]. The developers focus on the application-level business logic, and the framework helps putting together the final system. The source code that we have used for our analysis was downloaded from [38].
• Mars simulator is a Java-based open source project that simulates the activities of the first generation of settlers on Mars. The bytecode that we have analyzed was downloaded from [39].
• JGAP is a Genetic Algorithms and Genetic Programming package written in Java. The analyzed bytecode was downloaded from [40].
• Neuroph is a lightweight Java neural network framework for developing common neural network architectures. The bytecode that we have analyzed was downloaded from [41].
• JEdit is a text editor for programmers with support for many different programming languages. The analyzed bytecode was downloaded from [42].
Table 1 presents the number of .class files analyzed for each case study. For the automatic analysis step we have used Soot [29], a Java optimization framework that provides various representations for analyzing and transforming Java bytecode. In order to obtain the bytecode of the systems for which we only had the source code, we have first built the systems by following the instructions described on the corresponding websites.

Results
Table 2 presents the results obtained after executing the Analysis step of our approach. As the results show, the number of static or final attributes defined in a large Java-based software system is large. In almost all case studies (with the exception of Neuroph), the number is greater than the number of classes and interfaces analyzed, and for ArgoUML and Tomcat it is more than twice the number of classes and interfaces. However, if we consider only the attributes' types, their number decreases significantly. In all cases, the number of distinct types (DT) is less than 25% of the number of static or final attributes. In six of the cases (ArgoUML, Tomcat v9, Spring, Mars, JGAP and Neuroph) the percentage of distinct types over the number of static or final attributes is less than 16%, and in the case of ArgoUML and JGAP it is less than 9%, meaning that only a small part of this information is necessary for the identification of the logger object's type.
Table 3 gives the results obtained after executing the Filtering step. These results show that after this step more than 80% of the types are removed for almost all case studies, meaning that they are either types which are commonly used for the business logic of a system (like the primitive types or the types from the java.util or java.lang packages) or they are not crosscutting (they were used in at most 2 different classes). For six case studies (Spoon, ArgoUML, Tomcat, Spring, JGAP, and Neuroph) the number of possible types for the logger object is less than 20% of the total number of types used for declaring static or final attributes, and for all case studies it is less than 4% of the total number of static or final attributes defined in the system.
Table 4 presents a subset of the results obtained after executing the Ranking step for each case study. Column DA represents the number of times a static or final attribute of the corresponding type was declared, DC represents the number of classes that declare a static or final attribute of this type, TDC represents the number of classes that declare an attribute of this type (independently of the modifier used: static, final or none), and TDC/CI represents the ratio of the number of declaring classes over the total number of classes and interfaces from the system. For five case studies (ArgoUML, Tomcat v9, Spring, JGAP and Neuroph) the type ranked at the first position is the type of the logger object, showing that this approach could be used for automatic identification of logging and tracing monitoring crosscutting concerns. For these case studies (except for Neuroph) the TDC/CI ratio of the type ranked at the first position is also significantly greater than that of the type ranked at the second position. The results also show that the type used for the logger object is different for almost every case study, meaning that it is dependent on the software system.
In the case of the Spoon case study the top 3 ranked types do not include the type of the logger object. The most used type for declaring static or final attributes is ReplacementVisitor from the spoon.support.visitor.replace package, which was used in more than 10% of the classes from the system. In this case the type of the logger object was ranked only at position 15, being declared as a static or final attribute in only 4 classes, meaning less than 1% of the total number of classes. The manual analysis has determined that the type was used in 5 classes, but in one class the logger object was declared neither as a static nor as a final attribute.
For the Mars and JEdit case studies the filtered list of types does not include any type that could be used for recording monitoring messages. For these systems no logger object could be determined.
The studies presented in [24][25][26] have also shown that there are software systems which use more than one type for the logger object. The manual analysis of two of the systems used in this study has shown that they use more than one type for the logger object: Tomcat v9 uses 2 types, and Spring Framework uses 3 types. For the Tomcat case study the results obtained by our approach actually included both types, but the second type was ranked at position 96 (out of 136 possible positions), as shown in Table 4. The second type was used only in 4 classes, and the logger object was declared as a static or final attribute in only 3 of the classes. In the case of Spring Framework, only one type was given among the possible results. The other two types were not included because they were used in at most 4 classes, and in some of these classes the logger object was not defined as a static or final attribute.
Table 5 presents the accuracy of our approach for the software systems which were manually analyzed. TDC represents the number of classes where an attribute of the logger type was declared, CCC represents the number of classes from the software system in which the concerns are implemented (the number was determined during the manual analysis of the system), and ACC represents the accuracy of our approach. The accuracy is considered to be the percentage of classes from the software system that are part of the crosscutting concerns implementation and were identified as such by our approach. For this study the accuracy is computed as the percentage of TDC/CCC.
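Written as a formula, with the symbols defined above, the accuracy used in this study is:

```latex
\mathrm{ACC} = \frac{\mathrm{TDC}}{\mathrm{CCC}} \times 100\%
```

As an illustration, the text reports that the Spoon logger type is used in 5 classes and that the accuracy for Spoon is 100%, which is consistent with TDC = CCC = 5. The value of CCC in this example is an inference from those two statements, not a figure quoted from Table 5.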
As shown in Table 5, the accuracy of our approach is higher than 78% for the three larger case studies, and for two of them it is even higher than 90%. Even for the Spoon case study, where the type of the logger object is not among the top 3 ranked possible types, the accuracy of our approach is 100% once the user chooses the correct type. JGAP and Neuroph were not manually analyzed before applying our approach, so we could not compute their accuracy.

CONCLUSIONS
The conclusions that can be drawn from the results obtained in Section 4 are:
• The proposed approach can be used for the automatic identification of logging and tracing monitoring crosscutting concerns. The results obtained for the three larger case studies have shown that the analysis of static or final attributes of a software system is a good starting point for the identification of the type of the logger object used for these crosscutting concerns implementation. The set of classes in which an attribute of this type is defined is also a good starting point for determining all the affected parts of the software systems. On these classes we can perform a more in-depth analysis in order to determine, for example, the methods in which the attribute is used and to also determine if the implementation can be refactored to aspects. As Table 3 shows, the search space is significantly reduced: less than 15% of the total number of classes and interfaces need to be considered for the in-depth analysis.
Acta Electrotechnica et Informatica, Vol. 18, No. 3, 2018, ISSN 1335-8243 (print)
• This approach is scalable. As the results of the study have shown, the number of types considered as possible results is less than 7.5% of the total number of classes and interfaces. Even for large or very large software systems, the possible results for the logger object's type are reduced significantly. The time needed to obtain all the possible types is also small. It takes less than 3 seconds to obtain the possible types for any of our case studies. Identifying all the other parts of the concerns implementation may take longer, but it should still be an acceptable amount of time.
• The majority of the already proposed aspect mining techniques try to identify crosscutting concerns by analyzing the methods defined in a software system. However, the obtained results show that for logging and tracing monitoring crosscutting concerns, a different granularity provides more accurate results.
• The results of this approach can be considered as input for automatically refactoring the implementation of these monitoring crosscutting concerns into aspects. This approach can also be used to determine if the implementation can actually be refactored into aspects. In the analysis described in [24] we have determined that at least 25% of the messages recorded using the logger object are constructed using local variables. This kind of monitoring messages cannot be refactored to aspects.
• The Mars and JEdit case studies revealed a disadvantage of our approach: it cannot automatically determine whether monitoring concerns are implemented in the analyzed system or not. The user has to decide, after analyzing the obtained results, if there is a type (or more types) used as a logger.
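The remark in the conclusions above about messages constructed using local variables can be made concrete with a small example. The sketch is hypothetical (the class LocalVariableMessages and its methods are not taken from any analyzed system) and a plain list stands in for the logger object. It relies on the fact that advice attached to a method execution can see the arguments, the return value and thrown exceptions, but not the local variables of the method body.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: which monitoring messages depend on local variables. */
final class LocalVariableMessages {

    /** Stands in for the logger object; records the produced messages. */
    static final List<String> LOG = new ArrayList<>();

    /** The message uses only the parameter, which is visible at method entry. */
    static int parsePort(String text) {
        LOG.add("parsing port from '" + text + "'");
        return Integer.parseInt(text.trim());
    }

    /** The message uses a local variable that exists only inside the method body. */
    static int sumDigits(String text) {
        int skipped = 0; // local state, not visible from outside the method
        int sum = 0;
        for (char ch : text.toCharArray()) {
            if (Character.isDigit(ch)) {
                sum += ch - '0';
            } else {
                skipped++;
            }
        }
        LOG.add("skipped " + skipped + " non-digit characters");
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(parsePort(" 8080 "));
        System.out.println(sumDigits("a1b2"));
        System.out.println(LOG);
    }
}
```

The message in parsePort could in principle be produced from outside the method, since it needs only the argument. The message in sumDigits needs the value of skipped, so it would have to stay in the method body unless the method is restructured first.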
We did not include in this paper a comparison of our approach with other already proposed approaches, as it is difficult to compare them due to their different granularities. Also, they do not automatically separate the results based on the kind of crosscutting concerns, letting the user decide which results belong to which crosscutting concerns.

FURTHER WORK
In this paper we have presented a top-down approach for automatic identification of logging and tracing monitoring crosscutting concerns implementation. The approach analyzes the attributes defined in a Java-based software system in order to determine the logger object's type, and then determines all the affected classes. We have used the proposed approach on eight open source Java-based software systems.
Further work will be done in the following directions:
• To apply the proposed approach on other (larger) open source case studies.
• To determine if the proposed approach can be used for software systems developed using other programming languages like C# or C++.
• To develop a plugin for a popular IDE like IntelliJ or Eclipse that will allow the automatic identification of logging and tracing monitoring crosscutting concerns.
• To determine if refactoring is possible for the analyzed software systems.
• To evaluate if the structure of the system would improve if refactoring to aspects is possible.

Listing 1
    package org.apache.coyote.ajp;

    import org.apache.juli.logging.*;
    import org.apache.tomcat.util.res.StringManager;

    public class AjpMessage {
        // The logger object
        private static final Log log = LogFactory.getLog(AjpMessage.class);

        // The string manager for this package.
        protected static final StringManager sm = StringManager.getManager(AjpMessage.class);

        // Write a MessageBytes out at the
        // current write position.
        public void appendBytes(MessageBytes mb) {
            if (mb == null) {
                log.error(sm.getString("ajpmessage.null"), new NullPointerException());
                appendInt(0);
                appendByte(0);
                return;
            }
            // other business logic code
            appendByteChunk(mb.getByteChunk());
        }

        // Write a ByteChunk out at the
        // current write position.
        public void appendByteChunk(ByteChunk bc) {
            if (bc == null) {
                log.error(sm.getString("ajpmessage.null"), new NullPointerException());
                appendInt(0);
                appendByte(0);
                return;
            }
            appendBytes(bc.getBytes(), bc.getStart(), bc.getLength());
        }

        // other attributes and methods
    }

Fragment from the AjpMessage class from Tomcat v9 [28].

3. Filtering. From the results obtained at the previous step we remove the following types: all Java primitive types (byte, short, int, float, double, char), any arrays of a primitive type (like byte[] or int[][]), all the types defined in the java.util or java.lang packages (such as java.lang.String, java.util.ArrayList) but not in their subpackages (types like java.lang.reflect.Method will not be removed), and any arrays of a type defined in these two packages (e.g. java.lang.String[]).

ISSN 1335-8243 (print) © 2018 FEI TUKE

Table 1
Case Studies

Table 2
Case Studies Results - Analysis Step

Table 3
Case Studies Results - Filtering Step

Table 4
Case Studies Results - Ranking Step

Table 5
Case Studies Accuracy