JAVA BYTECODE INSTRUCTION USAGE COUNTING WITH ALGATOR

Development of algorithms for solving various kinds of computer-related problems consists of several consecutive and possibly repetitive phases. The final and very important step in this process is to implement the developed algorithm in a selected programming language, to test its behaviour on real-world test cases, and to compare the results with those of other algorithms. This evaluation can be done by comparing different execution indicators, among which time consumption is usually considered the most relevant. On the other hand, timing algorithms in practice is very difficult, since it is hard to ensure a fair and reproducible environment in which algorithm implementations can be compared. To overcome this barrier, we introduce a system called ALGATOR that was developed to facilitate the algorithm evaluation process. Besides the time complexity and the project-specific indicators, ALGATOR also measures counters of Java code and Java bytecode usage. The former is measured by means of special tags that are inserted in the appropriate lines of Java code, while the latter is enabled by an adapted Java virtual machine, which counts the Java bytecode usage and reports the statistics. Using these counters, new timing-independent criteria for algorithm assessment can be derived. In this paper we present some basic concepts of the ALGATOR system and give some examples of how to use the system in practice. We show the distribution of the usage of Java bytecode instructions for the sorting problem and the use of the Java bytecode indicators in time-complexity prediction for the matrix multiplication algorithm. The examples presented in this article show how the classic time measurement methods can be replaced by measuring other, more reliable indicators, and how these measurements can help to assess the quality of our algorithms.


INTRODUCTION
The results obtained by the last two phases of the algorithm design process (i.e. the proof of correctness and the estimation of complexity) are based on the assumptions of the selected computational model, which (more or less successfully) imitates the real environment in which the algorithm will be implemented. From a theoretical point of view, these results are interesting and completely legitimate, since they make it possible (theoretically) to compare and classify algorithms. When applying these results in practice, however, problems may arise, as it often turns out that the real environment differs from the theoretical assumptions. For example, theoretical models usually do not include assumptions about the concrete implementation of memory management (and more specifically, the influence of the cache), which in practice greatly affects the speed of the implementation. To choose the best algorithm for solving a given problem, theoretical results may help in the first round of selection, where the algorithms with the best asymptotic bounds are selected. A real distinction between the selected algorithms with the same theoretical bounds can only be made by comparing their behaviour in the real environment [3,4]. Timing algorithms in practice is very difficult, since it is hard to ensure a fair and reproducible environment in which algorithms can be compared. The results of the measurements are influenced by various factors, some of which are more or less random. In order to better assess the practical time complexity we therefore need additional tools to measure independent indicators of the implementation of algorithms.
In this paper we present a novel approach for measuring and predicting the complexity of algorithm implementations by counting the Java bytecode instructions [2]. In Section II we present a tool called ALGATOR [1], which was designed to facilitate the algorithm comparison process by measuring different indicators. We also present the three types of measurements that are supported by ALGATOR. In Section III we focus on the simple problem of matrix multiplication and present different approaches to producing useful performance predictions based on Java bytecode instruction usage. We conclude the paper with final remarks in Section IV.

THE ALGATOR
The ALGATOR is a computer application that was developed to support and facilitate the algorithm design and evaluation process. The main entity within the system is the so-called 'project', which is defined by a set of definitions for the problem, the algorithms and the test sets. The system was designed to be as general as possible and therefore applicable in a wide range of problem domains. All the entities (projects, algorithms, test cases) are primarily defined on an abstract level, and the system is designed to execute abstract algorithms on abstract test cases. After selecting a real problem, the user concretizes the abstract parts of the project, which makes the project 'alive' and ready to execute real algorithms on real data. This abstraction is integrated in several parts of the ALGATOR system. The algorithm is defined as a block of code (e.g. a class in Java) with a predefined hook used to pass the parameters and start the execution (e.g. a method signature). The test case and the results of the execution are defined as arbitrary data structures implemented in a selected programming language. A test set, which represents a minimal execution unit, is composed of several test cases and is iterated through during the algorithm execution process. To collect and present the results of the algorithm execution, the ALGATOR uses the so-called result sets (i.e. the sets of parameters and indicators of the execution) and the presenters (the definition files in which the type and the range of the presentation are provided). All these abstractions make the system flexible and usable in many fields of computer science.

The ALGATOR project
To define an ALGATOR project, the user must provide both the configuration files and the source code in a selected programming language (Java, C or C++). The configuration files define administrative data (the name and the author of the project, the number of supported algorithms, the time limit for algorithm execution, ...), while the source code provides the logic for executing the algorithms and for evaluating the results of the execution. The configuration files use the JSON format and have predefined names and positions in the folder hierarchy of the project. For example, the configuration file for the project P is named P.atp and is placed in the subfolder proj of the project folder PROJ-P. Besides the basic information about the project, the configuration also includes information about the algorithms (the name and the author of each algorithm, the programming language of the algorithm, ...), about the test sets (the number of test cases in each test set, the sizes of the test cases, time limits for the execution of a test case, ...) and about the results (the number and the type of the indicators of execution).
The source code of the project is provided in the following classes (in the case of the Java programming language; for C/C++ the logic is similar): TestCase (a class with predefined data structures needed to represent the input and the output of the algorithms), AbsAlgorithm (a class that defines an abstract method which, once implemented, serves as the heart of the algorithm) and TestSetIterator (a bridge between the definition files and Java data structures; in this class the test set configuration file is read and a Java test case is generated).
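The division of labour between the three classes can be illustrated with a minimal sketch. Only the class names TestCase, AbsAlgorithm and TestSetIterator come from the text; every field, method signature, the comma-separated test set format, and the helper classes ProjectSketch and InsertionSort are assumptions made for illustration and do not reproduce ALGATOR's actual interfaces.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical driver: runs an algorithm on every test case of a test set.
class ProjectSketch {
    // Counts the test cases for which the algorithm returns the expected output.
    static int countPassed(AbsAlgorithm algorithm, List<String> testSet) {
        int passed = 0;
        TestSetIterator it = new TestSetIterator(testSet);
        while (it.hasNext()) {
            TestCase tc = it.next();
            if (Arrays.equals(algorithm.execute(tc.input), tc.expected)) {
                passed++;
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        List<String> testSet = Arrays.asList("3,1,2", "5,4", "7");
        int passed = countPassed(new InsertionSort(), testSet);
        System.out.println(passed + " of " + testSet.size() + " test cases passed");
    }
}

// Assumed shape of a test case for a sorting project: input and expected output.
class TestCase {
    final int[] input;
    final int[] expected;

    TestCase(int[] input, int[] expected) {
        this.input = input;
        this.expected = expected;
    }
}

// The abstract method is the hook that each concrete algorithm implements.
abstract class AbsAlgorithm {
    abstract int[] execute(int[] data);
}

// Bridge between a (hypothetical) textual test set definition and TestCase objects.
class TestSetIterator implements Iterator<TestCase> {
    private final Iterator<String> lines;

    TestSetIterator(List<String> definitionLines) {
        this.lines = definitionLines.iterator();
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    @Override
    public TestCase next() {
        // Assumed line format: comma-separated integers, e.g. "3,1,2".
        String[] parts = lines.next().split(",");
        int[] input = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            input[i] = Integer.parseInt(parts[i].trim());
        }
        int[] expected = input.clone();
        Arrays.sort(expected);
        return new TestCase(input, expected);
    }
}

// One concrete algorithm: insertion sort performed on a copy of the input.
class InsertionSort extends AbsAlgorithm {
    @Override
    int[] execute(int[] data) {
        int[] a = data.clone();
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        return a;
    }
}
```

In this sketch a new algorithm is added by subclassing AbsAlgorithm only; the test set handling and the correctness check stay untouched, which is the kind of separation the abstract project definition is meant to provide.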

Types of the ALGATOR engines and users
The ALGATOR was developed to be used as a standalone and/or as a server application. The standalone application is used to develop, test and evaluate algorithms in a separate domain where the results are used only by a limited and typically small group of people. This application can be installed and used on any personal computer. The main drawback of the standalone version is that the results of the execution cannot be fairly compared with the results of other research groups. The server application, on the other hand, offers the possibility to run algorithms provided by researchers from different groups on a single computer. The results obtained in this way are accurate and comparable. The server version is usually installed on an internet server and accessed through web interfaces. ALGATOR supports four different user roles: the system administrator (installs and manages the whole system and has access to all the resources of the system), the project administrator (defines the project and has access to all the project resources), the researcher (defines an algorithm for the selected project, runs predefined tests and compares the results with the results of other algorithms) and the guest (observes all public projects, algorithms and test results). The logic of user rights and roles is equally supported in both versions of the ALGATOR, but it is somewhat relaxed in the standalone version, since all the roles are usually played by a single user.

The measurements
For each problem there are several different measurements that ensure the correctness of the algorithms and by which one can assess their efficiency. These measurements include the indicators of time consumption and of the quality of the result, counters for the usage frequency of parts of the program code, and counters of the usage of the basic execution operations (i.e. the machine instructions). In the ALGATOR system all these kinds of measurements are supported and are grouped into three categories: the EM, CNT and JVM indicators.
EM indicators. The EM indicators are used to measure the time and other project-specific metrics. All measurements of the time indicators are performed automatically. To provide time indicators that are as accurate as possible, the ALGATOR tries to reduce the influence of uncontrolled computer activities (e.g. a sudden increase in system resource usage) by running each algorithm several times. The system measures the first, the best, the worst and the average time of the execution. The project administrator only needs to specify the phases of algorithm execution (e.g. the pre-processing phase, the main phase, the post-processing phase, ...) and to select which of the time indicators are to be presented as the result of execution. The project-specific indicators are defined by the project administrator. They can be presented as a string or as a number. For example, for exact algorithms, the value of an indicator could be "OK" (if the algorithm produced the correct result) or "NOK" (if the result of the algorithm is not correct). For approximation algorithms the value of an indicator could be the quality of the result (i.e. the quotient of the correct result and the result of the algorithm).
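The four time indicators mentioned above (first, best, worst and average) can be obtained with a simple repetition loop. The following is a minimal sketch under stated assumptions: the class TimingSketch and its method are hypothetical and are not part of ALGATOR, and timing with System.nanoTime() is an illustrative choice rather than the system's documented mechanism.

```java
// Hypothetical helper that repeats a run and reports four time indicators.
class TimingSketch {
    // Returns {first, best, worst, average} execution time in nanoseconds.
    static long[] measure(Runnable algorithm, int repetitions) {
        if (repetitions <= 0) {
            throw new IllegalArgumentException("repetitions must be positive");
        }
        long first = 0;
        long best = Long.MAX_VALUE;
        long worst = Long.MIN_VALUE;
        long sum = 0;
        for (int r = 0; r < repetitions; r++) {
            long start = System.nanoTime();
            algorithm.run();
            long elapsed = System.nanoTime() - start;
            if (r == 0) {
                first = elapsed;          // the first run often includes warm-up effects
            }
            best = Math.min(best, elapsed);
            worst = Math.max(worst, elapsed);
            sum += elapsed;
        }
        return new long[] { first, best, worst, sum / repetitions };
    }

    public static void main(String[] args) {
        int[] data = new int[100_000];
        long[] t = measure(() -> java.util.Arrays.fill(data, 1), 20);
        System.out.println("first=" + t[0] + " best=" + t[1] + " worst=" + t[2] + " avg=" + t[3]);
    }
}
```

Reporting the best time next to the average makes the disturbance caused by other system activity visible: the larger the gap between the two, the noisier the measurement.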
The values of the EM indicators are generated by ALGATOR performing the following steps: 1. load the test case and create its project-specific representation, 2. load the algorithm (if the algorithm is implemented in Java, for example, the ALGATOR uses the Java reflection capabilities to create the algorithm instance); steps 3 to 7 are listed at the end of the text.
CNT indicators. The CNT indicators (the so-called counters) are used to count the usage of parts of the program code. This option is used to analyse the usage of a certain system resource or to count the usage of a selected type of command on the programming language level. In this way one can, for example, measure how many times the memory allocation functions were executed during the algorithm execution and the amount of memory allocated by these calls. One can also use CNT indicators to detect which part of the algorithm is used most frequently. For example, if the problem in question is data sorting, the CNT indicators can be used to count the number of comparisons, the number of element swaps and the number of recursive function calls (which are the measures that can predict the algorithm execution behaviour [9]). To use the CNT indicators in a project, the project administrator has to define the names and the meaning of the counters, and the researchers have to tag the appropriate places in their code. Everything else is done automatically by the ALGATOR.
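The sorting example above (counting comparisons, swaps and recursive calls) can be sketched as follows. ALGATOR's actual tagging syntax is not shown in the text, so explicit increments of a hypothetical SortCounters object stand in for the tags; the sorting routine is a generic quicksort that takes the first element as the pivot and is not claimed to be any particular implementation used in the paper.

```java
import java.util.Arrays;

// Quicksort (first element as pivot) with hand-placed counters at the tagged places.
class CountingQuickSort {
    static void sort(int[] a, int lo, int hi, SortCounters c) {
        c.calls++;                               // tagged place: (recursive) call
        if (lo >= hi) {
            return;
        }
        int pivot = a[lo];
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (less(a[i], pivot, c)) i++;    // skip elements smaller than the pivot
            while (less(pivot, a[j], c)) j--;    // skip elements greater than the pivot
            if (i <= j) {
                int tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                c.swaps++;                       // tagged place: swap of two elements
                i++;
                j--;
            }
        }
        sort(a, lo, j, c);
        sort(a, i, hi, c);
    }

    private static boolean less(int x, int y, SortCounters c) {
        c.comparisons++;                         // tagged place: element comparison
        return x < y;
    }

    public static void main(String[] args) {
        int[] data = { 5, 2, 9, 1, 5, 6 };
        SortCounters c = new SortCounters();
        sort(data, 0, data.length - 1, c);
        System.out.println(Arrays.toString(data));
        System.out.println("comparisons=" + c.comparisons + " swaps=" + c.swaps + " calls=" + c.calls);
    }
}

// Hypothetical counter container; the counter names would be defined by the project administrator.
class SortCounters {
    long comparisons;
    long swaps;
    long calls;
}
```

Because the counters depend only on the algorithm and the input, two runs on the same test case produce identical values, which is what makes such indicators independent of timing noise.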
JVM indicators. Before the execution of the algorithm, the algorithm code has to be compiled to lower-level code. For C and C++ projects this means that the algorithms are compiled to the machine code of the architecture used by the system, while for Java projects the algorithms are compiled to Java bytecode. The performance of the algorithm depends on the number and the type of the low-level instructions used during the execution [5]. The ALGATOR enables the analysis of low-level instruction usage for algorithms written in the Java programming language. During the execution of the algorithm, ALGATOR counts the bytecode instructions that were used and at the end prints out the statistics for each instruction (the so-called JVM indicators). To enable this facility, a special VMep library [7,8] was developed and integrated into the open-source Java Virtual Machine JamVM [6]. VMep enables bytecode counting and makes ALGATOR a very powerful tool for deep analysis of the algorithms' behaviour. In the rest of this paper we first present some details about the implementation of VMep and then give two examples of how the JVM indicators can be used in practice. We also show some methods for predicting the time consumption based on the analysis of the JVM indicators.
Note that the EM and the CNT indicators are provided for programs written in Java and C++, while the JVM indicators are (for obvious reasons) available only for programs written in Java. Currently, an extension of the system is being developed that will enable machine instruction counting for programs running in a non-virtual environment.

USING JAMVM AND VMEP LIBRARY
JamVM [6] is an open-source implementation of the Java Virtual Machine designed by Robert Lougher. It is a minimal, fully operational implementation of the JVM written in C and can be compiled on several platforms. In order to facilitate Java bytecode counting in JamVM and to simplify the usage of the solution, a special Java library called VMep (Virtual Memory entry point) [7] was developed. To link the system-independent Java world with the system-dependent Java virtual machine JamVM, VMep was implemented as a collection of native methods. The main VMep class that supports the initiation and finalisation of the observation is called Monitor. It offers methods like start(), stop() and addRuntimeFilter(). The first two methods are used to start and to stop (or to pause) the observation, while the last one is used to add filters to the observation (i.e. to limit its scope) and thus to speed up the execution of the program. The Monitor class has two important subclasses, namely InstructionMonitor and MemoryMonitor. The first one provides information about instruction usage, while the second covers memory usage. A usage of the VMep library is presented in the listing in Figure 2. Here we count the instructions used by the factorial() method (using this method the program multiplies the first 100 integers). The program VMepTest from Figure 2 prints statistics about the instruction usage, as presented in Figure 3.
We observe that only 16 different Java bytecode instructions (out of 202) were used to calculate the result. The overall number of used instructions was 904, or around 9 instructions per loop iteration (note that the calculation of [...]).
Due to additional tasks that are performed by the VMep library during the execution of algorithms (i.e. collecting the information on instruction and memory usage), the running times of the algorithms executed by JamVM are significantly longer than those obtained by running the same algorithms on a standard VM. The slowdown factor tends to be constant, around 3.6 on average.
The VMep library is integrated into ALGATOR in such a way that an end user can use it without knowing its implementation details. By calling the appropriate ALGATOR module, the user gets Java bytecode statistics ready to be analysed by the analytics module. More precisely, by executing, for example, java algator.Execute Sorting -m jvm, ALGATOR will run all algorithms in the Sorting project and save the results to output files. By calling java algator.Analyse Sorting the user will be able to analyse the results and produce tables, charts and graphs as depicted in Figure 1.
In this example we were looking for a relation between the minimal execution time (Tmin) and the three most frequently used Java bytecode instructions (i.e. ILOAD, ALOAD_1 and IALOAD) in the JWirth algorithm. We found that Tmin (the blue line in the graph) correlates very closely with X (the red line in the graph), where X was defined to be [...]

JVM INDICATORS IN PRACTICE
To explore the measuring capabilities of the ALGATOR we chose a simple matrix multiplication problem: given two square matrices A and B, each containing $n^2$ elements ($a_{ij}$ and $b_{ij}$ for $i, j = 0, \ldots, n-1$), calculate the elements of a square matrix C by
$$c_{ij} = \sum_{k=0}^{n-1} a_{ik}\, b_{kj}.$$
Since the number of operations in this formula is cubic in n, we can reasonably expect that any algorithm implementing this formula has time complexity $\Theta(n^3)$. A simple implementation of this formula is presented in the listing in Figure 4.

void MUL(int[][] A, int[][] B, int[][] C) {
  for (int i = 0; i < A.length; i++) {
    for (int j = 0; j < A.length; j++) {
      for (int k = 0; k < A.length; k++) {
        C[i][j] += A[i][k] * B[k][j];
      }
    }
  }
}
Fig. 4 The Java code for the MUL algorithm
We named this implementation the MUL algorithm since its main (and most time-consuming) operation is multiplication. Using the ALGATOR's time indicators we measured the time needed to execute this algorithm on a set of test cases with dimensions n ranging from 200 to 500. We ran all the tests described in this paper on a personal computer with an Intel(R) Core(TM) i7-6700 CPU running at 3.40 GHz with 32 GB of memory. The matrix multiplication took 6,000 microseconds for smaller inputs (n = 200) and 140,000 microseconds for larger matrices (n = 500). To eliminate the impact of the real environment we executed each test (i.e. we calculated each product) 500 times and took the minimal time over all executions (this is the time in which the execution can be performed when environmental influences are as small as possible). The execution time for the matrix multiplication problem is depicted in Figure 6 with a blue line. The same graph also shows a simple prediction of the execution time, calculated by a simple method called Calc1. In this method we calculated a multiplication factor $c = \mathrm{avg}(\mathit{time}_i / n^3)$. The red dots in Figure 6 represent the graph of the function $c n^3$. The red dots follow the same shape as the blue line (because our algorithm has $\Theta(n^3)$ time complexity), but the prediction is not very accurate. The average error (i.e. the difference between the measured (blue) and the calculated (red) time divided by the measured time) is 11.25%. This error is somewhat smaller (7.1%) if we take only the bigger input dimensions (from 300 to 500), but it is still relatively large. Therefore Calc1 cannot be considered a very successful method.
Table 1 The Java bytecode instruction usages in the matrix multiplication algorithm.
Fig. 6 The time complexity of the MUL algorithm (blue line) and the performance prediction calculated by a simple Calc1 method (red dots).
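The Calc1 method described above amounts to averaging time/n³ over the measurements and comparing c·n³ with the measured times. A minimal sketch follows; the class and method names are hypothetical, and the main method uses only the two measurements quoted in the text (n = 200 and n = 500), whereas the paper's factor was computed from the full series, so the printed values are not expected to match the reported errors.

```java
// Hypothetical sketch of the Calc1 prediction: time(n) is approximated by c * n^3.
class Calc1Sketch {
    private static double cube(int n) {
        return (double) n * n * n;               // computed in double to avoid int overflow
    }

    // c = avg(time_i / n_i^3) over all measurements.
    static double factor(int[] n, double[] timeMicros) {
        double sum = 0.0;
        for (int i = 0; i < n.length; i++) {
            sum += timeMicros[i] / cube(n[i]);
        }
        return sum / n.length;
    }

    // Average of |measured - predicted| / measured over all measurements.
    static double averageRelativeError(int[] n, double[] timeMicros, double c) {
        double sum = 0.0;
        for (int i = 0; i < n.length; i++) {
            double predicted = c * cube(n[i]);
            sum += Math.abs(timeMicros[i] - predicted) / timeMicros[i];
        }
        return sum / n.length;
    }

    public static void main(String[] args) {
        int[] n = { 200, 500 };                  // the two sizes quoted in the text
        double[] timeMicros = { 6000.0, 140000.0 };
        double c = factor(n, timeMicros);
        System.out.println("c = " + c);
        System.out.println("average relative error = " + averageRelativeError(n, timeMicros, c));
    }
}
```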
In order to find a better performance prediction for this algorithm we used the ALGATOR's capability for measuring the usage of Java bytecode instructions (the bytecode for the algorithm MUL is listed in Figure 5). The results show that only 16 (out of 202) Java bytecode instructions are used during the execution of this algorithm: 10 instructions for stack manipulation (ICONST_0, ILOAD, ALOAD_1, ALOAD_2, ALOAD_3, IALOAD, AALOAD, ISTORE, IASTORE, DUP2), two instructions that control the flow of the program (IF_ICMPGE, GOTO), the ARRAYLENGTH instruction used to determine the size of an array, and three arithmetic instructions (IADD, IMUL, IINC). The usage frequencies of these instructions for matrices of sizes from 10 to 50 are presented in Table 1. As the data in the table clearly show, the usage of most of the instructions in the matrix multiplication algorithm is of the order $\Theta(n^3)$. The only exceptions are the instructions ICONST_0 and ISTORE, with the order $\Theta(n^2)$. From the data presented in Table 1 we calculated the overall number of instructions INST(n) used in the MUL algorithm:
$$\mathrm{INST}(n) = 25n^3 + 12n^2 + 12n + 6.$$
This means that in the case of n = 500, for example, the JVM performs $25 \times 500^3 + 12 \times 500^2 + 12 \times 500 + 6 = 3{,}128{,}006{,}006$ bytecode instructions to execute the MUL algorithm. Since this execution requires approximately 140,000 microseconds, the average time to execute one Java bytecode instruction is 0.044 ns. Analysing the results presented in Table 1 (extended with measurements for n = 60, ..., 500), a natural question arises: can we calculate the average time (over all the measurements) used to execute one bytecode instruction and use this average to predict the behaviour (i.e. the time consumption) of the MUL algorithm for a given n? To answer this question, we propose the following method Calc2: calculate the average time $I_n$ used for one bytecode instruction while performing MUL on a matrix of size n (e.g. $I_{500} = 0.044$) and calculate $I$ as the average of the $I_n$. Then use $I$ to estimate the execution time of MUL by $T(n) = I \cdot \mathrm{INST}(n)$. Using this method we calculated $I = 0.039$ ns (note that we used only the measurements for n = 300, ..., 500, since we assume that the measured times are much more accurate for bigger inputs). Surprisingly, the Calc2 method gives results very similar to those of Calc1: the average difference between the two methods is 0.03% for n = 300, ..., 500. In other words, calculating a uniform average time per bytecode instruction yields another method that is of little use for estimating time consumption.
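The instruction count for n = 500 quoted above (25·500³ + 12·500² + 12·500 + 6) and the resulting time per instruction can be checked with a few lines of code. The sketch below generalises that expression to arbitrary n; the class and method names are hypothetical, and the only measurement used is the 140,000 microseconds reported for n = 500.

```java
// Hypothetical sketch of the Calc2 building blocks for the MUL algorithm.
class Calc2Sketch {
    // Number of bytecode instructions executed by MUL on n x n matrices.
    static long inst(long n) {
        return 25 * n * n * n + 12 * n * n + 12 * n + 6;
    }

    // I_n: average time per bytecode instruction in nanoseconds.
    static double nanosPerInstruction(long n, double timeMicros) {
        return timeMicros * 1000.0 / inst(n);
    }

    // T(n) = I * INST(n), converted back to microseconds.
    static double predictMicros(long n, double avgNanosPerInstruction) {
        return avgNanosPerInstruction * inst(n) / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println("INST(500) = " + inst(500));
        System.out.println("I_500 = " + nanosPerInstruction(500, 140000.0) + " ns");
        // I = 0.039 ns is the average reported in the text for n = 300, ..., 500.
        System.out.println("T(500) = " + predictMicros(500, 0.039) + " microseconds");
    }
}
```

Since INST(n) is dominated by its cubic term, multiplying it by a single constant can only rescale a cubic curve, which is consistent with Calc2 behaving almost exactly like Calc1.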
The main reason for the bad results is that some bytecode instructions are much more expensive than others. For example, we can reasonably assume that the IMUL instruction takes much more time to execute than the ILOAD instruction (the first multiplies two integers, while the second loads an integer onto the stack). The question is how many different types of instructions (where instructions of the same type take approximately the same time to execute) are included in the MUL algorithm. To answer this question we implemented two algorithms, both of them very similar to MUL. The first one, the ADD algorithm, is an exact copy of MUL with the only difference in line 5, where we use addition instead of multiplication. The execution of this algorithm uses exactly the same Java bytecode instructions, except that instead of IMUL the ADD algorithm uses IADD (which is to be expected, and which we also verified by inspecting the ALGATOR's JVM indicators). As a consequence, in MUL we have $n^3$ IADDs and $n^3$ IMULs, while in ADD we have only $2n^3$ IADDs. The number of all the other instructions is equal in both algorithms. In the second algorithm, SET, we deleted line 5 of MUL and replaced it with 4 lines as shown in the listing in Figure 7. The resulting SET algorithm compiles into a Java bytecode program with exactly the same number of instructions as MUL, which means that for executing SET on matrices of size n the JVM also performs INST(n) bytecode instructions. The only difference is that SET does not use the IADD and IMUL instructions.
The algorithm SET uses only the following twelve instructions: ICONST_0, ILOAD, ALOAD_1, ALOAD_2, ALOAD_3, IALOAD, AALOAD, ISTORE, IINC, IF_ICMPGE, GOTO, ARRAYLENGTH. We assumed that these instructions are all equally time-consuming, named them "simple instructions", and used the Calc2 method to calculate their average execution time I. Using the resulting I = 0.0248 ns (for further reference we denote it by $I_S$) and the formula $T_{SET}(n) = I_S \cdot \mathrm{INST}(n)$, we found that the Calc2 method in this case yields an almost perfect estimation. Figure 8 shows the measured time of the SET algorithm (blue line) and its estimation provided by the Calc2 method. The average error of this method (for n = 200, ..., 500) is 0.4%. This means that the calculated $I_S = 0.0248$ nanoseconds is a reasonably good estimation of the execution time of every simple Java bytecode instruction on this computer. Using the measured times of MUL and SET and averaging for n = 300, ..., 500 we obtain $\lambda = 0.22$ and $I_A = 0.24$ nanoseconds. This means that the average cost of the arithmetic operations IADD and IMUL is about 9.7 times the average cost of a simple instruction.
To estimate the execution time of the MUL algorithm we use the following Calc3 method: given the factors $I_S$ and $I_A$, calculate the estimation of the time complexity of the MUL algorithm by [...]. This estimation fits the MUL algorithm much better than the previous ones. The graph in Figure 10 shows the time complexity of MUL with a blue line and the Calc3 estimation with red dots. The average error of this estimation (for n = 300, ..., 500) is 2.3%.
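The Calc3 formula itself is missing from the text above. One reading that is consistent with the instruction counts given earlier is to charge the n³ IADD and n³ IMUL instructions at the arithmetic cost $I_A$ and all remaining INST(n) − 2n³ instructions at the simple cost $I_S$. The sketch below implements this reading; it is an assumption rather than the authors' confirmed formula (in particular, it treats DUP2 and IASTORE as simple instructions), and the class and method names are hypothetical.

```java
// Hypothetical two-cost estimate for MUL: simple instructions at I_S, IADD/IMUL at I_A.
class Calc3Sketch {
    static final double I_S = 0.0248;            // ns per simple instruction (from the text)
    static final double I_A = 0.24;              // ns per IADD/IMUL instruction (from the text)

    // Number of bytecode instructions executed by MUL on n x n matrices.
    static long inst(long n) {
        return 25 * n * n * n + 12 * n * n + 12 * n + 6;
    }

    // Assumed form: T(n) = I_S * (INST(n) - 2n^3) + I_A * 2n^3, returned in microseconds.
    static double estimateMicros(long n) {
        long arithmetic = 2 * n * n * n;         // n^3 IADDs plus n^3 IMULs
        long simple = inst(n) - arithmetic;      // everything else
        return (I_S * simple + I_A * arithmetic) / 1000.0;
    }

    public static void main(String[] args) {
        for (long n = 300; n <= 500; n += 100) {
            System.out.println("n = " + n + ": " + estimateMicros(n) + " microseconds");
        }
    }
}
```

Under this reading the arithmetic instructions make up only 2 of the 25 instructions per innermost iteration, yet account for a large share of the estimated time, which illustrates why a single uniform cost per instruction fits poorly.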
For the next example of the use of JVM indicators let us consider the data-sorting problem. Here an algorithm aims to sort (in a prescribed order) the input array of numbers. It is well known that fast sorting algorithms can perform this task in $O(n \log n)$ time. In our experiment we used the so-called Wirth's algorithm, which is a special case of the QuickSort [9] sorting algorithm. It uses one pivot (the first element of the input array) to split the array into two sub-arrays (one with numbers that are less than (or equal to) the pivot and the other with numbers that are greater than (or equal to) the pivot) and then sorts these sub-arrays recursively (see the algorithm's code in Figure 12). Running this algorithm in ALGATOR reveals that the Java bytecode of this algorithm uses only 17 different instructions, namely IFGT, ISUB, ALOAD_0, INVOKEVIRTUAL, RETURN, ILOAD_3, ILOAD_2, IF_ICMPGT, ISTORE, IASTORE, IF_ICMPGE, IF_ICMPLE, GOTO, IINC, IALOAD, ALOAD_1 and ILOAD. It is interesting (but, given that the Java virtual machine is a stack-oriented machine, not very surprising) that the majority of the work is done by only three instructions: IALOAD, ALOAD_1 and ILOAD. The number of all instructions used by Wirth's algorithm when sorting arrays of sizes from 100,000 to 500,000 is presented in Table 2. In this table the number of the three LOAD instructions is presented in the first row, the number of all instructions in the second, and the quotient of the first and the second value in the third. For the test cases used in this experiment all the quotients were about 0.60, which means that the three LOAD instructions perform about 60% of all the work. This fact can also be observed in the graph in Figure 11, where the number of uses of each instruction is depicted. We can see that most instructions are used very rarely, some are used moderately and a few are used very frequently. Using this observation and the concrete numbers obtained in the experiment, one could derive a formula to predict the execution time that depends only on the number of the three LOAD instructions used while executing the algorithm (see Formula 1).

CONCLUSIONS
In this paper we described the ALGATOR, a system for testing and analysing algorithms. We showed how to use the ALGATOR's ability to count the usage of Java bytecode instructions. Using three algorithms (MUL, ADD and SET) we presented different (more or less efficient) methods to produce a performance prediction for the algorithms based on the number of Java bytecode instructions used. We showed that the "simple" instructions (e.g. ILOAD_0, IALOAD, ISTORE, ...) are equally time-consuming and that they take on average 0.0248 nanoseconds to execute (on our computer). We also showed that the arithmetic instructions (IADD and IMUL) are much more time-consuming: on our computer these instructions take 0.24 nanoseconds (which is almost 10 times slower than the simple instructions). Using this information about bytecode instructions and the formula for the total bytecode instruction usage (which was also derived from the results of the ALGATOR's execution) we presented a method for predicting the execution time of the selected matrix multiplication algorithm. The results of this method were much better than the results of a basic (naive) method which estimates the time complexity with a simple cubic function.
The ALGATOR's capability to count the Java bytecode usage helps us to better understand the behaviour of algorithms. The test case presented in this paper is very instructive, but it is not general because of the nature of the selected algorithm (the behaviour of the algorithm is completely deterministic and does not depend on the input data; the algorithm always uses the same instructions regardless of the content of the input matrices). To prove the general usability of the JVM indicators, other problems and algorithms should be considered.

3. read the values of the test case specific parameters, 4. run the algorithm and measure the time consumption, 5. read and store the values of the time indicators, 6. determine and store the values of the project-specific indicators, 7. write the stored indicators into the output as prescribed in the result description configuration files.

Fig. 2 Using the VMep library to count the instruction usage while multiplying the first 100 integers

Fig. 1 Using ALGATOR's analytics module to analyse the Java bytecode usage in the Sorting project

Fig. 3 A result produced by the VMepTest class

Fig. 7 The Java code for the SET algorithm

Fig. 9 The differences in the time complexities of the MUL and ADD algorithms.
To make a good estimation for the MUL algorithm we now only have to determine the estimation for the time complexities of the IMUL and IADD instructions. First we compare the execution time of the algorithms MUL and ADD [...]

Fig. 10 The time complexity of the MUL algorithm (blue line) and the performance prediction calculated by the Calc3 method (red dots).
Fig. 12 The Java code for Wirth's QuickSort algorithm

Fig. 11 Statistics of the JVM instruction usage

Table 2 Usage of Java bytecode instructions for Wirth's algorithm.