A Framework for Automatic and Parameterizable Memoization

Improving execution time and energy efficiency is important for many applications and usually requires sophisticated code transformations and compiler optimizations. One such optimization technique is memoization, which saves the results of computations so that future computations with the same inputs can be avoided. In this article we present a framework that automatically applies memoization techniques to C/C++ applications. The framework is based on automatic code transformations performed by a source-to-source compiler and on a memoization library. With the framework, users can select the functions to memoize, as long as those functions obey certain restrictions imposed by our current memoization library. We show the use of the framework and the associated memoization technique, and their impact on reducing the execution time and energy consumption of four representative benchmarks.

We define memoizable functions [3, 4] as pure functions, i.e., deterministic and without side effects, with three additional constraints imposed by our memoization library.

In applications with memoizable functions in critical code sections, memoization may provide important execution time reductions and energy consumption savings. Our memoization framework relies on internal tables that store the results of previous computations and replace future calls with table lookups. The elements of the internal tables are indexed with a hash calculated from the call arguments of the memoized function.
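The indexing scheme described above can be sketched in plain C. The mixing function and table size below are our own illustrative choices, not necessarily the ones used by the framework's library:

```c
#include <stdint.h>
#include <string.h>

/* Sketch: index a memoization table by hashing the raw bits of the call
 * arguments, here for a two-argument function. */
#define TABLE_SIZE 4096u

static uint32_t mix(uint32_t h, uint32_t v) {
    /* hash_combine-style mixing step */
    return h ^ (v + 0x9e3779b9u + (h << 6) + (h >> 2));
}

uint32_t hash_args(float a, float b) {
    uint32_t ba, bb;
    memcpy(&ba, &a, sizeof ba);   /* reinterpret the floats as raw bits */
    memcpy(&bb, &b, sizeof bb);
    return mix(mix(0u, ba), bb) % TABLE_SIZE;  /* index into the table */
}
```

Hashing the bit patterns (rather than the float values) keeps the index computation cheap and deterministic, which matters because the lookup must cost far less than the memoized call itself.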

A memoization approach can start by profiling the application and by identifying the contribution of memoizable functions to the overall execution of the application. A profiling step may also provide values to set up an initial version of the internal table of the memoization technique. Our framework allows loading internal tables before application runs (e.g., with profiling data) and/or updating them at runtime (allowing adaptation to execution contexts not considered during the profiling phase). This solution may enable savings in multiple kinds of applications and allows users to write runtime adaptivity strategies to make applications more resilient to context changes and able to achieve predetermined execution thresholds.

The memoization technique can be integrated into any C or C++ application that has memoizable functions or methods. Based on the selected memoizable functions, our framework generates a new version of the application, enhanced with memoization support, by relying on the Clava source-to-source compiler. The transformations are described in LARA, a DSL with which users specify actions on the code, e.g., where and how to apply the memoization technique in each specific case. The main modular unit in LARA code is the aspect, which can be considered a function, with inputs and outputs, but with the capability of interacting with the internal representation of the application source code. When several aspects are developed to achieve a specific goal, we have a LARA program, which we often refer to as a strategy.

Our memoization framework is based on the memoization approach presented in [4], which uses a dynamic library and applies memoization at the binary level. Our framework has been generalized for C++, adds extensions that allow application flexibility, adds options regarding table loading and table runtime updating, and is applied at the level of the application source code.
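The table preloading and runtime updating described above can be sketched as follows. This is a minimal illustration of the idea, not the framework's actual API; the function names and the single-`float` signature are our assumptions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A direct-mapped table that can be preloaded with profiling data
 * and still updated at runtime. */
#define N_SLOTS 512u

typedef struct { float in, out; bool valid; } slot;
static slot memo[N_SLOTS];

static uint32_t hash_arg(float v) {
    uint32_t b;
    memcpy(&b, &v, sizeof b);          /* reinterpret float as bits */
    return (b ^ (b >> 16)) % N_SLOTS;
}

/* Preload entries gathered offline, e.g. during a profiling run. */
void memo_preload(const float *ins, const float *outs, size_t n) {
    for (size_t i = 0; i < n; ++i)
        memo[hash_arg(ins[i])] = (slot){ ins[i], outs[i], true };
}

/* Runtime update: insert or overwrite an entry as new contexts appear. */
void memo_update(float in, float out) {
    memo[hash_arg(in)] = (slot){ in, out, true };
}

/* Lookup returns true on a hit and writes the cached result to *out. */
bool memo_lookup(float in, float *out) {
    uint32_t h = hash_arg(in);
    if (memo[h].valid && memo[h].in == in) { *out = memo[h].out; return true; }
    return false;
}
```

A profiling run would fill the `ins`/`outs` arrays with the hottest argument values, so the first real execution already starts with a warm table; `memo_update` then lets the table adapt to inputs the profile never saw.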
The software solution presented in this paper relies on two different components, which are combined to easily enhance an application with memoization support. The first component is Clava, a source-to-source compiler able to automatically generate a new version of the application code with the necessary changes to support memoization. This is performed through a series of LARA aspects that the user can call and parameterize, all distributed as a library of LARA aspects.

Consider the memoizable C function foo shown in Figure 2. This is a pure function which takes a single float parameter and returns data of the same type. The source-to-source step consists of two phases. The first phase adds another function to the program: this new function wraps the target function and includes the memoization logic that interfaces with the memoization library. This wrapper is also illustrated in Figure 2. The second phase replaces all calls to the original function, foo, with calls to its corresponding wrapper, foo_wrapper (it is also possible to specify a LARA aspect that applies memoization to only certain call sites). This technique works for C and C++ functions as well as for C++ methods, and takes into account name mangling, function overloading, and references to objects.
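Since Figure 2 is not reproduced in this excerpt, the following is a minimal sketch of the two-phase transformation: the body of `foo` is a stand-in for the figure's function, and the small inline table stands in for the calls the framework would emit into its memoization library:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Original pure function: one float in, one float out (as in Figure 2). */
float foo(float v) { return v * v + 1.0f; }

/* Phase 1: a generated wrapper containing the memoization logic.
 * In the real framework this logic lives in the generated library;
 * here a tiny direct-mapped table stands in for it. */
#define SLOTS 256u
static struct { float in, out; bool valid; } foo_table[SLOTS];

static uint32_t foo_hash(float v) {
    uint32_t b;
    memcpy(&b, &v, sizeof b);          /* hash the argument's raw bits */
    return (b ^ (b >> 16)) % SLOTS;
}

float foo_wrapper(float v) {
    uint32_t i = foo_hash(v);
    if (foo_table[i].valid && foo_table[i].in == v)
        return foo_table[i].out;       /* table hit: skip the call      */
    float r = foo(v);                  /* miss: fall back to original   */
    foo_table[i].in = v;
    foo_table[i].out = r;
    foo_table[i].valid = true;
    return r;
}
```

Phase 2 is then a purely mechanical rewrite of call sites: an occurrence such as `y = foo(x);` becomes `y = foo_wrapper(x);`, leaving `foo` itself untouched so the wrapper can still call it on a miss.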

The user can, through LARA aspects or manually, further change the function definitions file; the LARA strategy for memoization, besides changing the source of the application, generates this file. In this article, we propose the use of memoization through Clava, by relying on aspects programmed with the LARA DSL. However, it is also possible to use the standalone library without relying on LARA and the Clava compiler. To do so, the user must manually write the function identification in the definitions file. The advantage of using LARA strategies is that the memoization library is integrated into the application without manual modifications of the source code. The code generated by Clava is then compiled and linked with the associated generated memoization library.

In order to enhance an application with memoization through LARA strategies, users can call the library aspects and define a number of parameters.

When a table lookup finds a filled slot whose stored inputs are equal to the current inputs, we have a hit, and the wrapper will then return this value. If the inputs are not equal, we have a collision, meaning that two different inputs were hashed into the same slot.

In this case, the lookup function returns false. Once again, the return value is computed by calling the original function, and that result is eventually returned.
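The hit/miss/collision behavior described above can be sketched as follows. The interface (a boolean `lookup` plus a wrapper that recomputes on `false`) follows the text; the hash and the memoized `square` function are our illustrative stand-ins:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 128u

typedef struct { float in, out; bool valid; } entry;
static entry cache[SLOTS];

static uint32_t idx(float v) {
    uint32_t b;
    memcpy(&b, &v, sizeof b);
    return (b ^ (b >> 16)) % SLOTS;
}

/* Returns true only on a hit: the slot is filled AND its stored input
 * equals the current input. An empty slot is a miss; a filled slot with
 * a different input is a collision. Both return false. */
bool lookup(float in, float *out) {
    entry *e = &cache[idx(in)];
    if (!e->valid) return false;       /* miss               */
    if (e->in != in) return false;     /* collision          */
    *out = e->out;                     /* hit: cached result */
    return true;
}

/* Wrapper pattern: whenever lookup returns false, compute with the
 * original function and store, so the returned value is always correct. */
float square_memo(float x) {
    float r;
    if (lookup(x, &r)) return r;
    r = x * x;                         /* original computation */
    cache[idx(x)] = (entry){ x, r, true };
    return r;
}
```

Note that with this toy hash some distinct inputs do share a slot (for example, 2.0f and 512.0f both map to slot 0), so memoizing one evicts the other; the wrapper still returns correct results for both, simply recomputing on the collision.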

The correct value is always returned, whether there was a hit, a miss, or a collision.

Clava is a source-to-source compiler, written in Java, that uses Clang as a front-end to parse C and C++ source code. The compiler maintains an internal representation of the application, which is analyzed and transformed according to the strategies defined in the input LARA code. Clava includes multiple LARA libraries that can be used for different purposes, including the memoization framework presented in this article.

The Library Generator is a sed script that takes the function definitions file (*.def) and several code templates to generate the final customized code of the memoization library. This code can then be compiled into a library.

This section presents an example of a LARA aspect that illustrates the interface between the user and the memoization library described in the previous section.

Consider an example of a C application that uses the mathematical functions cos, acos, and sqrt. Moreover, the profiling of the application shows that there is a large number of calls to a user function called myfunc. Figure 5 shows a LARA aspect, named Launcher, that can be used to apply memoization. First, we import the class that contains the memoization aspects.

The Memoize Function aspect is a helper aspect that automatically sets some default parameters when calling other, more general aspects. For instance, line 4 of Figure 6 exposes some of the parameters that can be specified on those more general aspects. It specifies that the target is a C user function (value 2), followed by the name of the function (myfunc) and its associated wrapper.

• equake is an application extracted from SPEC OMP. It has calls to the mathematical functions sin, cos, and sqrt in its critical region.

• fft is a Fast Fourier Transform implementation extracted from the BenchFFT benchmark suite. It calls the functions sin and cos.

• rgb2hsi is a benchmarking kernel that converts images from the RGB model to the HSI model. It calls cos, acos, sqrt, and a pure user function.

The tests have been performed on an Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, and the C/C++ codes were compiled using GCC with -O3. Some of the benchmarks were tested with a single input due to the lack of available workloads.

Table 1 presents the speedups obtained with the four tested memoization configurations over the original version (i.e., without memoization). Table 2 presents, for the same configurations, the energy consumption improvements over the original. Energy was measured using a utility tool that relies on Intel RAPL counters. Values below 1 represent a reduction in energy consumption.

For instance, for the atmi benchmark with the 65536-nu configuration, the application consumes only 71% of the original energy (0.71 in the table).

Overall, the use of our memoization framework allows us to achieve con-