Improving Analysis and Visualizing of JVM Profiling Logs Using Process Mining

The growing size and complexity of modern software applications increase the demand for information systems that are self-configuring, self-optimizing, and flexible in architecture. Although managed languages have eliminated or minimized many low-level software errors, many other sources of error persist. The Java Virtual Machine (JVM), as a managed-language runtime, applies many adaptive optimization techniques, which call for tools that analyze program behavior and determine where the application spends most of its time. In this paper, a new approach is introduced that uses process-mining techniques to represent the analysis and visualization phases of JVM profilers. These techniques are flexible enough to cover many perspectives in several ways, and can thus form a unified layer for analysis and visualization across profiling perspectives.


Introduction
Modern software applications are complex enough to increase the demand for automating the management of software environments, so that developers can identify performance bottlenecks with minimum effort.
Likewise, the Java Virtual Machine (JVM), as an interpreter-based managed runtime, requires more processing for execution. There are several approaches to enhancing JVM performance, such as Just-In-Time (JIT) compilation, interpretation directly in hardware by a specialized architecture, and improving performance by understanding the behavior of Java-based applications (Bowers & Kaeli, 1998). Optimizing compilers and software applications by understanding their dynamic behavior is an effective approach (Driesen et al., 2003).
The automatic collection and presentation of data representing the dynamic behavior of a program is called profiling (Dmitriev, 2004). After a profiler collects and analyzes the data, the results can either be fed back automatically to the compiler or presented to developers. Each case places different requirements on the design of the profiler (Liang & Viswanathan, 1999). For example, feedback profilers should avoid the "observer effect", in which the act of profiling changes the program's behavior (Snyder et al., 2011), whereas this problem is less critical if the profiler only presents results to developers.
From another perspective, process mining techniques aim to extract non-trivial information from event logs recorded by information systems. Because they assist in understanding and (re)designing complex processes by extracting a workflow model that represents the behavior of the information system, process mining techniques have received notable attention and offer a promising vision (Van Der Aalst & Weijters, 2004). The ProM framework is a pluggable environment for process mining. This framework is flexible with respect to input and output formats, and is also open enough to allow easy reuse of code during the implementation of new process mining ideas (De Medeiros et al., 2005).
This paper is mainly concerned with profilers that provide information about a Java program or the JVM (HotSpot™); these profilers mainly have three phases: collecting data, analyzing data, and visualizing results. In this paper, process mining techniques and the ProM tool are used to represent the analysis and visualization phases. They are flexible enough to cover many perspectives in several ways, and can thus form a unified layer for analysis and visualization across profiling perspectives. Process mining was applied to two different kinds of profiling data for Java programs and the JVM, and the results were compared with the original profiling tools. The profiling data were mapped to a process-based form, and each different mapping yielded a new analysis perspective. The DaCapo benchmark suite (Garner et al., 2006) was used to apply the profiling perspectives to some of its components. In this context, the java-event-logs tool is provided to map Java profiling data to process mining event logs.
This paper is organized as follows. Section 2 provides background information about profiling data and process mining. Section 3 describes related work. Section 4 describes the architecture of the java-event-logs tool. Section 5 provides a detailed description of how to use process mining during profiling and presents the experimental results. Finally, Section 6 concludes and suggests directions for future work.

Literature Review
In this section, the profiling data are examined and the existing tools for their analysis and visualization are studied; then an overview of process mining perspectives and the event log format is given.

JVM Profilers
Two kinds of profiling data with different perspectives were selected for study. They are described below:

Dependence Graph
The Java HotSpot™ server compiler uses a program dependence graph as its intermediate data structure when compiling Java bytecodes to machine code. When the compiler is used in debug mode, it provides a textual output of the graph (Ottenstein et al., 1987; Vick et al., 2001; Wimmer et al., 2008). The Ideal Graph Visualizer (IGV) tool is used to analyze the compiler by providing a graphical representation of the program dependence graph. During the compilation process, the IGV tool captures snapshots of the graph and uses them to create a visual presentation that reconstructs the transformations applied to the graph by compiler optimizations. Figure 1 shows the interaction between the visualization tool and the server compiler (Würthinger, 2007; Wimmer et al., 2008). In IGV, the data transferred from the server compiler to the visualization tool is represented in XML. Figure 2 shows the XML elements and their relations (Würthinger, 2007).
The JVM HotSpot™ developers also provide internal diagnostic options in the JVM itself. The diagnostic option "-XX:+LogCompilation" emits a structured XML log of compilation-related activity during a run of the virtual machine. By default the log ends up in the standard "hotspot.log" file, though this can be changed using the "-XX:LogFile=" option. Note that both of these are considered diagnostic options and have to be enabled using "-XX:+UnlockDiagnosticVMOptions" (Snyder et al., 2011).
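As a minimal sketch of how such a compilation log might be requested, the snippet below assembles a java command line from the three options named above. The jar name dacapo.jar, the benchmark argument fop, the log file name, and the helper log_compilation_command are assumptions made for illustration; they are not taken from the original experimental setup.

```python
from typing import List


def log_compilation_command(jar: str, benchmark: str,
                            log_file: str = "hotspot.log") -> List[str]:
    """Build a java command line that enables the LogCompilation output.

    The jar and benchmark names are placeholders (assumptions); only the
    three -XX options come from the text above.
    """
    return [
        "java",
        "-XX:+UnlockDiagnosticVMOptions",  # listed before the options it unlocks
        "-XX:+LogCompilation",             # emit the structured XML compilation log
        f"-XX:LogFile={log_file}",         # write the log somewhere other than the default
        "-jar", jar, benchmark,
    ]


command = log_compilation_command("dacapo.jar", "fop", "fop-compilation.log")
print(" ".join(command))
```

The resulting list can be passed to a process launcher as is; keeping the options in a list avoids shell quoting issues.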
Figure 4 shows a very rough overview of the LogCompilation output XML, and Table 2 describes the main elements in this XML.
In (Sewe et al., 2012), the JP2 tool was designed to extract the valuable calling context tree, without addressing analysis or visualization. In (Krinke, 2004; Balmas, 2001; Lee & Sim, 2015), specially programmed tools were provided to display the program dependence graph. In (Driesen et al., 2003; Hendren et al., 2003), a new tool was developed that uses the internal JVM profiling APIs to gather information about the program and then computes and presents the results from the standpoint of dynamic metrics. The NetBeans/JFluid Profiler (Dmitriev, 2004; Schulz et al., 2015) depends on dynamic bytecode instrumentation and code hotswapping to turn profiling on and off dynamically. However, this tool needs a customized JVM and is therefore only available for a limited set of environments. The Spy framework (Banados et al., 2012) builds profilers and visualizes profiling information for the Pharo Smalltalk programming language. However, the limitations of the language are reflected in the profiler. There is a wide range of related work in the area of profiling perspectives and tools, but what they all have in common is that there is no unified data model: each tool designs its own analysis and visualization technique, which makes the tools hard to integrate.

Process Mining
The main goal of process mining is to extract information from system logs and represent it as a workflow model that reconstructs the order of activities in the form of a graphical model. The basic idea of process mining is to learn from observed executions of a process (Van der Aalst & Weijters, 2004; Van Dongen et al., 2007). This can be used to: discover new models (e.g., constructing a Petri net that is able to reproduce the observed behavior), check the conformance of a model by checking whether the modeled behavior matches the observed behavior, and extend an existing model by projecting information extracted from the log onto it.
The basic perspective of process mining is the so-called control-flow (process) perspective, which focuses on the control flow, i.e., the ordering of activities. In addition, one can also consider the organizational perspective, which focuses on which performers are involved and how they are related, and the case perspective, which focuses on the properties of cases (Van der Aalst & Weijters, 2004).
Event logs can be very different in nature; for example, an event log could show the events that occur in a specific machine that produces computer chips, or it could show the different departments visited by a patient in a hospital. However, all event logs have one thing in common: they show occurrences of events at specific moments in time, where each event refers to a specific process and an instance thereof, i.e., a case (Van Der Aalst & De Medeiros, 2005).
The ProM framework (De Medeiros et al., 2005) is a pluggable environment for process mining. Since each system has its own format for output log files, the ProM framework works with generic XML formats such as MXML and XES (Van Der Aalst & Van Der Aalst, 2011). Regardless of the element names used in a given file format, there are main elements of each process that should be represented in any format; these elements are listed in Table 3.
Plug-ins in the ProM framework can be divided into mining plug-ins, which implement algorithms that mine models from event logs; analysis plug-ins, which typically implement some property analysis on a mining result; and other plug-ins related to input and output file formats. Moreover, ProM offers extensive support for filtering event logs and computing general statistics about them.
Table 3. The main elements in the event log

Element | Required | Description
Case | Mandatory | Each case has a unique ID and includes related actions.
Activity | Mandatory | The name of the action.
Timestamp | Optional | The time of the action.
Originator | Optional | The name of the action performer.
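To make the four elements of Table 3 concrete, the following sketch models an event as a small record and serializes a list of events into an MXML-style document. It is an illustration under stated assumptions, not the serializer of the java-event-logs tool: the Event class, the to_mxml helper, and the sample values are hypothetical, the fixed "complete" event type is a simplification, and the element names follow the commonly documented MXML layout, which should be checked against the schema in use.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    case_id: str                      # mandatory: groups related actions
    activity: str                     # mandatory: name of the action
    timestamp: Optional[str] = None   # optional: time of the action
    originator: Optional[str] = None  # optional: performer of the action


def to_mxml(process_id: str, events: List[Event]) -> str:
    """Serialize events into a simplified MXML-style document."""
    root = ET.Element("WorkflowLog")
    process = ET.SubElement(root, "Process", id=process_id)
    instances: Dict[str, ET.Element] = {}
    for event in events:
        # One ProcessInstance per case, created the first time the case is seen.
        if event.case_id not in instances:
            instances[event.case_id] = ET.SubElement(
                process, "ProcessInstance", id=event.case_id)
        entry = ET.SubElement(instances[event.case_id], "AuditTrailEntry")
        ET.SubElement(entry, "WorkflowModelElement").text = event.activity
        ET.SubElement(entry, "EventType").text = "complete"  # simplification
        if event.timestamp is not None:
            ET.SubElement(entry, "Timestamp").text = event.timestamp
        if event.originator is not None:
            ET.SubElement(entry, "Originator").text = event.originator
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, for illustration only.
events = [
    Event("case-1", "parse", "2015-01-01T10:00:00", "Foo.bar"),
    Event("case-1", "optimize"),
    Event("case-2", "parse"),
]
mxml = to_mxml("demo", events)
print(mxml)
```

Optional elements are simply omitted when absent, which mirrors the Mandatory and Optional column of the table.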

Proposed Tool Architecture
In this section, the architecture of the java-event-logs tool and its usage are described in detail. XML is the common format of both the input files and the output files. The tool therefore uses the XMLBeans library, which provides access to XML by binding it to Java types: an XML schema is compiled to generate Java types that represent the schema types. The XML schemas that describe the three types of input data are included in the tool.
Figure 6 shows the class diagram for the java-event-logs tool. "MainMiner" is the main class; it receives the user options and delegates them to the right miner. "LogMiner" is the abstract parent class of the miners and applies the factory method pattern. "IGVLogMiner" and "LogCompilationLogMiner" are the miners responsible for extracting the event log patterns from the dependence graph and the compilation logs, respectively. Finally, the "MXMLLogBuilder" class is responsible for the event log output format.
The java-event-logs tool has two execution options, "-igv" and "-logc", for the dependence graph and the compilation logs respectively. As shown in Figure 7 and Figure 8, the tool applies a specific algorithm to each kind of log input.
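The delegation described above can be sketched as follows. This is only an illustration of the factory method idea in Python, not the tool's Java source: the class names and the two options come from the text, while the mine method, the MINERS table, create_miner, and the placeholder return values are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class LogMiner(ABC):
    """Abstract parent of the concrete miners."""

    @abstractmethod
    def mine(self, input_path: str) -> List[str]:
        """Return event descriptions extracted from the input file."""


class IGVLogMiner(LogMiner):
    def mine(self, input_path: str) -> List[str]:
        # Placeholder: real code would parse the IGV dependence graph XML.
        return [f"igv-event from {input_path}"]


class LogCompilationLogMiner(LogMiner):
    def mine(self, input_path: str) -> List[str]:
        # Placeholder: real code would parse the LogCompilation XML.
        return [f"logc-event from {input_path}"]


# Command-line option -> miner class; the options are those named in the text.
MINERS: Dict[str, Type[LogMiner]] = {
    "-igv": IGVLogMiner,
    "-logc": LogCompilationLogMiner,
}


def create_miner(option: str) -> LogMiner:
    """Factory method: pick the miner that matches the user's option."""
    try:
        return MINERS[option]()
    except KeyError:
        raise ValueError(f"unknown option: {option}") from None


miner = create_miner("-logc")
print(type(miner).__name__, miner.mine("hotspot.log"))
```

Keeping the option table separate from the miners means a further input format would need only a new subclass and one extra table entry.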

Process-Based Profiling in Action
In this section, process mining techniques are applied to the profiling data described earlier: the data are mapped using the java-event-logs tool, and the various analysis perspectives supported by ProM are presented. The data were extracted by profiling the fop application in the DaCapo benchmark suite. As shown in Table 3, the time of the action is one of the elements that process mining uses during analysis; although it is optional, some important analysis techniques rely on it. Therefore, a time attribute was added to the profiling data by modifying the profiler agent.
The Heuristics Miner (HM) algorithm (Weijters & Van Der Aalst, 2003; Van Der Aalst et al., 2006; Burattin, 2015) focuses on the control-flow perspective and generates a process model in the form of a Heuristics Net for the underlying event log. HM is also a practically applicable mining algorithm that can deal with noise and can be used to express the main behavior of the event log. For these reasons, the HM algorithm was selected for the control-flow perspective of the cases that follow. HM provides a list of parameters that control the level of detail in the extracted model.
Figure 7. The algorithm steps to convert dependency graph to event log

HM Algorithm Steps: The HM algorithm is a three-step algorithm: (1) construct a dependency graph on the basis of the event log; (2) for each task in the event log, establish the input-output expressions in the form of the type of dependencies between activities; (3) discover the long-distance dependency relations.
Mining of the dependency graph: The starting point of the Heuristics Miner is the construction of a so-called dependency graph. A frequency-based metric is used to indicate how certain it is that there is truly a dependency relation between two events a and b (notation a ⇒_W b).
Let W be an event log over T, and a, b ∈ T. Then |a >_W b| is the number of times a >_W b occurs in W (that is, the number of times a is directly followed by b), and

$$a \Rightarrow_W b = \frac{|a >_W b| - |b >_W a|}{|a >_W b| + |b >_W a| + 1}$$

Equation 1. Dependency measure between a and b
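The dependency measure can be computed directly from direct-succession counts. The sketch below assumes the commonly cited Heuristics Miner form of the measure, (|a > b| - |b > a|) / (|a > b| + |b > a| + 1) for a ≠ b; the function names and the toy log with the activities parse, optimize and emit are hypothetical.

```python
from collections import Counter
from typing import List


def direct_succession_counts(log: List[List[str]]) -> Counter:
    """Count |a >_W b|: how often b directly follows a in any trace."""
    counts: Counter = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency_measure(counts: Counter, a: str, b: str) -> float:
    """Dependency measure between two distinct activities a and b."""
    ab = counts[(a, b)]
    ba = counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Toy log: four traces with an optimize step and one without (made-up data).
log = [["parse", "optimize", "emit"]] * 4 + [["parse", "emit"]]
counts = direct_succession_counts(log)
print(dependency_measure(counts, "parse", "optimize"))
print(dependency_measure(counts, "parse", "emit"))
```

The +1 in the denominator keeps rarely observed successions from reaching a value of 1, so a relation seen four times (4/5) scores higher than one seen once (1/2), which is how the measure tolerates noise.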

JVM Dependence Graph Implementation
Each different mapping of the data from the dependence graph to an event log yields a different analysis perspective.
A timestamp attribute was added to each graph element. Two different mappings are listed below:

Method Snapshots
This pattern provides a graphical representation of each method snapshot of the program dependence graph. It is equivalent to what the IGV tool provides. Each process represents the states of a single method, and each case represents two nodes connected by one edge. Every case consists of two actions: the first is the source node and the second is the destination node. The filtering functionality is used to select the specific state to work on. The profiling data are mapped to the event log as in Table 4. Figure 9 shows the control flow graph that represents a method state, extracted using the HM algorithm after filtering with the instance name filter and the regular expression value "^(?!After|Parsing).*$" to model the "After Parsing" state only. Further analysis techniques can also be applied directly, such as the LTL and SCIFF checkers (Lamma et al., 2009), which use a logic-based approach to mining declarative models, and the DWS clustering algorithm (Guzzo et al., 2008), which addresses the over-fitting problem that appears with complex methods and is not handled by the IGV tool.
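The edge-to-case mapping described above can be sketched as follows. This is a rough illustration rather than the tool's algorithm: the snapshot_to_cases helper, the case identifier scheme, and the sample node names are assumptions, and the actual field mapping is the one given in Table 4.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source node, destination node)


def snapshot_to_cases(state: str, edges: List[Edge]) -> List[Dict[str, object]]:
    """Map every edge of one graph snapshot to a case with two actions."""
    cases = []
    for index, (source, destination) in enumerate(edges):
        cases.append({
            # Hypothetical naming scheme: snapshot state plus edge index.
            "case_id": f"{state}#{index}",
            # First action is the source node, second the destination node.
            "activities": [source, destination],
        })
    return cases


# Illustrative edges of a small graph snapshot (made-up example).
edges = [("Start", "Parm"), ("Parm", "AddI"), ("AddI", "Return")]
cases = snapshot_to_cases("After Parsing", edges)
for case in cases:
    print(case["case_id"], "->", case["activities"])
```

Because every case holds exactly one edge, a control-flow miner that links directly following activities rebuilds the graph structure of the snapshot from the set of cases.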

Compilation Workflow
This pattern provides very detailed information about compiler behavior while compiling the monitored code; the compilation process changes based on method structure and complexity. The event log has only one process. Each case represents the compilation steps of one method, and each action includes the step title; the event time is the starting time of the step, and the originator is the full name of the method itself.
The profiling data are mapped to the event log as in Table 5. Starting with the control flow model for the compilation process, Figure 10 shows part of the HM model, which explains which compiler states were triggered and in which sequence and frequency. This allows understanding the code complexity and the corresponding compiler behavior: for example, how many "Phase Ideal Loop" states the compiler has to call, or whether the compiler called the "eliminating allocations and locks" state and, if so, how many times. Applying the LTL and SCIFF checkers allows defining which compilation patterns to check, and DWS allows clustering over-fitted patterns and simplifying them.
Some further analysis perspectives can be extracted directly. Basic statistics give the occurrences of the compilation states, as in Figure 11. Basic performance analysis makes it easy to identify the average time each state takes, as in Figure 12, in order to find the costly states, or the overall time each method takes during compilation. The "Originator by Task Matrix" identifies which methods trigger a specific state with high frequency, as in Figure 13.
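The first and last of these views reduce to simple counting over the event log. The following sketch imitates the state occurrence statistics and an originator-by-task matrix on a tiny made-up log; it is not ProM's implementation, and the method and state names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (originator, activity) pairs; all names are made up for illustration.
events: List[Tuple[str, str]] = [
    ("Foo.run", "Parsing"),
    ("Foo.run", "PhaseIdealLoop"),
    ("Foo.run", "PhaseIdealLoop"),
    ("Bar.init", "Parsing"),
    ("Bar.init", "PhaseIdealLoop"),
]

# Basic statistics: how often each compilation state occurs in the log.
occurrences = Counter(activity for _, activity in events)

# Originator-by-task matrix: rows are methods, columns are states.
matrix: Dict[str, Counter] = defaultdict(Counter)
for originator, activity in events:
    matrix[originator][activity] += 1

print(dict(occurrences))
for originator, row in sorted(matrix.items()):
    print(originator, dict(row))
```

Reading a row of the matrix shows which states a given method triggers most often, which is the question the "Originator by Task Matrix" view answers.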
To study the compilation patterns, "Sequence Diagram Analysis" can be used to list the paths that make up the control flow and to identify the most frequent path taken during compilation, as in Figure 14. In this case there are 9 unique compilation paths covering 38 cases, and the most frequent path occurred 21 times. To study how compiler behavior changes from one method to another, "Trace Diff Analysis" can be used to compare the compilation steps of two methods; Figure 15 shows the steps that two methods have in common, in order, and where the differences start and end.
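Both analyses can be approximated in a few lines, which may help clarify what they compute. In the sketch below, unique paths are counted by treating each trace as a tuple, and the comparison of two traces uses Python's difflib.SequenceMatcher as a stand-in for ProM's "Trace Diff Analysis" (the plug-in's actual algorithm is not described here). The traces and step names are invented.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List

# One trace of compilation steps per method; invented data.
traces: Dict[str, List[str]] = {
    "Foo.run": ["Parsing", "IterGVN", "IdealLoop", "Final"],
    "Foo.stop": ["Parsing", "IterGVN", "Final"],
    "Bar.init": ["Parsing", "IterGVN", "Final"],
}

# Unique compilation paths and the most frequent one.
variants = Counter(tuple(trace) for trace in traces.values())
most_common_path, frequency = variants.most_common(1)[0]


def common_steps(first: List[str], second: List[str]) -> List[str]:
    """Steps shared by two traces, in order, found by sequence matching."""
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    return [
        first[block.a + offset]
        for block in matcher.get_matching_blocks()
        for offset in range(block.size)
    ]


print(len(variants), most_common_path, frequency)
print(common_steps(traces["Foo.run"], traces["Foo.stop"]))
```

Steps of the first trace that are missing from the shared list mark where the two compilations diverge.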

Compilation Logs Implementation
The compilation logs provided by the JVM HotSpot developers record the compilation process. They focus on the compilation of each method without describing the method's structure. Two patterns are listed below: the first concerns the relationships between classes according to compilation, and the second is the compilation workflow, which describes compiler behavior during compilation.

Classes Relationship Based on Compilation
In JVM HotSpot, a method is compiled only under certain optimization conditions. For this pattern, extracting a valid class relationship based on compilation requires that all methods be compiled. Therefore, the "-Xcomp" option was used to force compilation of all methods. Each process represents a single sequence of classes; each case represents the relationship between two classes according to the order in which their methods are compiled, as in Table 6.
Figure 16 shows part of the HM workflow model that describes the class relationships based on the compilation process. The dependencies between classes are clearly visible in such a model; by filtering the model to cover specific classes with a predefined relation, we can verify that what was designed is what was actually applied. Some basic statistics can also be extracted directly from this pattern, such as the time consumed by each method, the frequency of method compilations in each class, and which class has the highest compilation frequency.
The methods can also be listed by compilation order, as the LogCompilation tool does, directly from the log inspector, as in Figure 17.

Compilation Workflow
This pattern describes compiler behavior during compilation as organized in the JVM compilation logs. The event log has only one process. Each case represents the compilation process of one method, and each action includes the compilation phase; the event time is the starting time of the phase, and the originator is the full name of the class that contains the method. The profiling data are mapped to the event log as in Table 7. Figure 18 shows the HM model. All the analysis patterns mentioned for the dependency graph can also be extracted from the JVM compilation logs.
Table 7. Data mapping to extract compilation workflow from JVM compilation logs

Event Log | Profiling Data | Description
Case | Task | Each case represents one method compilation process.
Activity | Phase | Each activity represents one phase of the compilation process.
Timestamp | Phase time | The starting time of this phase.
Originator | Task: Class Name | Putting originator as method full name for use in analysis.
Figure 18. Workflow diagram that represents the compilation processes of some methods in the JVM according to the compilation logs
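The mapping of Table 7 can be sketched as a small XML transformation. The input below is a hand-written, simplified fragment rather than real LogCompilation output: the element and attribute names (task, phase, compile_id, method, name, stamp) are assumed to mirror the log format and should be checked against an actual log, and the compilation_events helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Simplified, hand-written fragment (an assumption, not real tool output).
SAMPLE_LOG = """
<hotspot_log>
  <task compile_id='1' method='java/lang/String hashCode ()I' stamp='0.100'>
    <phase name='parse' stamp='0.101'/>
    <phase name='optimizer' stamp='0.105'/>
  </task>
  <task compile_id='2' method='java/util/ArrayList size ()I' stamp='0.200'>
    <phase name='parse' stamp='0.201'/>
  </task>
</hotspot_log>
"""


def compilation_events(xml_text: str) -> List[Dict[str, str]]:
    """Map tasks to cases and phases to activities, as in Table 7."""
    events = []
    root = ET.fromstring(xml_text.strip())
    for task in root.iter("task"):
        # The class name is taken as the first token of the method attribute.
        class_name = task.get("method", "").split(" ")[0].replace("/", ".")
        for phase in task.iter("phase"):
            events.append({
                "case": task.get("compile_id", ""),    # Case: task
                "activity": phase.get("name", ""),     # Activity: phase
                "timestamp": phase.get("stamp", ""),   # Timestamp: phase time
                "originator": class_name,              # Originator: class name
            })
    return events


events = compilation_events(SAMPLE_LOG)
for event in events:
    print(event)
```

Using iter rather than direct child lookup means nested phase elements, if present, are picked up as well.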

Conclusions
In this paper, a new approach has been introduced that uses process-mining techniques to represent the analysis and visualization phases of JVM profilers. These techniques are flexible enough to cover many perspectives in several ways, and can thus form a unified layer for analysis and visualization across profiling perspectives.
To this end, a new tool, java-event-logs, has been introduced to implement the approach. The java-event-logs tool has two execution options, "-igv" and "-logc", for the dependence graph and the compilation logs respectively, and its outputs provide new perspectives for profiling data analysis.
Applying this new approach to JVM profilers provides information about a Java program or the JVM (HotSpot™) and gives JVM developers a new aspect of analysis with each process mining perspective. In future work, we will investigate how to use process mining to provide an interactive approach that helps analyze Java bytecode programs and provides feedback to the JVM to improve execution time and memory management.

Figure 1. Interaction between the compiler and the visualization tool

Figure 2. IGV XML elements and their relations

Figure 3. Dependence Graph as represented in IGV tool

Figure 4. Very rough overview of the LogCompilation XML

Figure 5. Sample for the output of the LogCompilation tool

Figure 6. Class diagram for java-event-logs tool

Figure 8. The algorithm steps to convert Compilation logs to event log

Figure 9. Dependence Graph state as represented in ProM tool

Figure 10. Workflow diagram that represents the compilation processes of some methods in the JVM according to the dependency graph

Figure 16. HM workflow model that represents class relationships based on compilation

Table 1. Description of the main elements in IGV XML

Table 4. Data mapping to extract the method snapshot

Table 5. Data mapping to extract compilation workflow from dependency graph

Event Log | Profiling Data | Description
Activity | Graph: state title | Each activity represents one state.
Timestamp | Graph: state time | The starting time of this state.
Originator | method | Putting originator as method full name to use in analysis.