Time-energy measured data on modern multicore systems running shared-memory applications

This article presents execution time and energy data collected from modern multicore systems running shared-memory applications, analyzed using our analytic models. While the full data sets and source code are available on GitHub, this data-in-brief article includes sample data and describes the experimental setup.

used. Table 7 presents measured data on heterogeneous multicores with dynamic OpenMP scheduling when all cores are used. Table 8 shows the model's output per system and application, while Table 9 summarizes model accuracy per system for all applications and speedup laws used, with respect to the sequential fraction and energy savings. The data in Table 8, corresponding to Amdahl's law [3], is plotted in Fig. 2. The corresponding data derived with Gustafson's law [4] is plotted in Fig. 3.

Setup
The experimental setup is depicted in Fig. 1. To collect power and energy, we use a Yokogawa WT210 power meter connected to the 240 V AC power line. A controller system is used to start the experiments and collect execution time and energy data from the target system. The power and energy samples are collected once per second. Table 1 summarizes the characteristics of the target systems used in our measurements. Table 2 summarizes the shared-memory applications with their input parameters, as used for collecting the measurements. These applications are selected from well-known benchmarking suites, such as NPB [5], Rodinia [6], Parsec [7] and Mantevo [8]. In addition to the first seven applications presented in our research work [1,2], we provide data for the CloverLeaf (CL), miniFE (FE) and miniGhost (GH) benchmarks from the Mantevo suite [8], running on Xeon, i7 and Pi3.
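To illustrate how fixed-rate power samples relate to energy, the following minimal sketch sums power readings over the sampling interval (a rectangle-rule approximation). It is an illustration only: the function name and sample values are hypothetical, and it is not necessarily how the power meter or the controller system computes the reported energy.

```python
def energy_from_power(samples_w, interval_s=1.0):
    """Approximate energy in joules from power samples taken at a fixed interval.

    Each sample (in watts) is assumed to hold for one full interval,
    so energy = sum(power) * interval.
    """
    return sum(samples_w) * interval_s


# Placeholder power readings in watts, one per second (not measured data).
samples = [50.0, 60.0, 55.0, 45.0]
energy_j = energy_from_power(samples)
```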

[Table 1 row fragment: HMP (Denver + ARM Cortex-A57), 6 (2 + 4), 2.04, 8]

Specifications
Execution time and energy data were collected while the hardware system was running only the target shared-memory application and the operating system. The measured data includes noise from the operating system. There is no pretreatment of samples or data.

Experimental features
- Power and energy data were collected with a Yokogawa WT210 at a rate of one sample per second
- Execution time represents wall clock time and is measured in Linux using /usr/bin/time

Data source location
Singapore

Data accessibility
The data and source code associated with this paper are available on GitHub: https://github.com/dloghin/multicores-time-energy

Value of the Data
This set of data includes execution time and energy measurements of up to ten shared-memory applications covering multiple domains on a wide range of modern multicore systems. These systems include both high-performance and low-power, homogeneous and heterogeneous designs, and are representative of the server, desktop and mobile domains. The data can be used to understand the time and energy performance of modern shared-memory multicore systems. It can serve as a reference for other researchers in the domain. The source code implements the models described in our work [1,2] and serves as a starting point for researchers, developers and system designers.

Measured data
Measured time-energy data consists of seven columns, as shown in Table 3 for EP execution on Xeon. Each row represents the execution of the given application on the given system using a given number of cores. The columns represent the number of nodes, the number of cores per node, the core clock frequency, the execution time in seconds (s), the energy in watt-hours (Wh) and joules (J), and the average power consumption in watts (W). The number of nodes is always one because these experiments are run on single-node shared-memory multicore systems. To apply our models [1,2], the key columns to consider are Cores, Time and Energy. For heterogeneous systems, such as XU3 and TX2, we provide four measured data sets per application, as exemplified in Tables 4-7 for EP on XU3. The first two data sets represent the execution with OpenMP static scheduling on big and little cores, respectively. The last two data sets represent the execution on all cores using static and dynamic OpenMP scheduling, respectively.
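The relationships among the measured columns can be sketched as follows: energy in joules is energy in watt-hours times 3600, average power is energy divided by execution time, and speedup is the single-core time divided by the n-core time. The `Row` class, its field names and the sample values are hypothetical placeholders for illustration; they are not the column headers or measurements of the published data set.

```python
from dataclasses import dataclass

WH_TO_J = 3600.0  # 1 Wh = 3600 J


@dataclass
class Row:
    """One measured row; field names are hypothetical, not the data set's headers."""
    cores: int
    time_s: float
    energy_wh: float

    @property
    def energy_j(self) -> float:
        # Convert watt-hours to joules.
        return self.energy_wh * WH_TO_J

    @property
    def avg_power_w(self) -> float:
        # Average power is total energy divided by execution time.
        return self.energy_j / self.time_s


def speedups(rows):
    """Speedup of each row relative to the single-core row (T1 / Tn)."""
    base = next(r for r in rows if r.cores == 1)
    return {r.cores: base.time_s / r.time_s for r in rows}


# Placeholder values for illustration only, not taken from the data set.
rows = [Row(1, 100.0, 2.0), Row(2, 50.0, 1.5), Row(4, 25.0, 1.0)]
```

Only the Cores, Time and Energy columns are needed for this kind of derivation, which matches the key columns named above.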

Model output data
Our analytic models [1,2] are implemented in Python and can be run on a Linux system using the provided bash scripts. There are two wrapper scripts corresponding to homogeneous and heterogeneous systems, respectively. Besides speedup and energy data, these scripts take as parameters the number of cores, the active power fraction (APF) [1,2] and the idle power of the system. By tweaking these parameters, users can explore new system designs and estimate their time-energy efficiency.
Model output data consists of nine columns, as shown in Table 8 for EP running on Xeon when Amdahl's law [3] for speedup is used. The first column represents the number of cores used for execution, while the other eight columns represent measured and predicted speedup, energy savings, execution time and energy, respectively.
In addition, the source code implementing the model reports the sequential fraction and the Root-Mean-Square Deviation (RMSD) between measured and predicted values across all core counts. A summary consisting of the sequential fraction (f), the RMSD of the sequential fraction (RMSD(f)) and the RMSD of energy savings (RMSD(es)) for each workload and for both Amdahl's and Gustafson's laws is written to a stats.csv file for each system. Table 9 exemplifies such data for the Xeon system.
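As a rough sketch of what a sequential fraction and an RMSD look like in practice, the snippet below inverts the textbook form of Amdahl's law to estimate f from a speedup at each core count, averages those estimates, and computes the RMSD between measured and predicted speedups. The averaging step and all function names are assumptions made for illustration; the procedure actually used by the published source code is described in [1,2] and may differ. The "measured" speedups here are synthetic, generated from f = 0.1.

```python
import math


def amdahl_speedup(f, n):
    """Textbook Amdahl's law: speedup on n cores with sequential fraction f."""
    return 1.0 / (f + (1.0 - f) / n)


def amdahl_fraction(speedup, n):
    """Invert Amdahl's law for f, given a speedup on n > 1 cores."""
    return (n / speedup - 1.0) / (n - 1.0)


def rmsd(measured, predicted):
    """Root-mean-square deviation between two equal-length sequences."""
    if len(measured) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


# Synthetic speedups generated from f = 0.1 (not measured data).
cores = [2, 4, 8]
measured = [amdahl_speedup(0.1, n) for n in cores]

# Assumption: estimate f per core count, then average the estimates.
f_per_n = [amdahl_fraction(s, n) for s, n in zip(measured, cores)]
f_mean = sum(f_per_n) / len(f_per_n)
predicted = [amdahl_speedup(f_mean, n) for n in cores]
error = rmsd(measured, predicted)
```

Because the synthetic input follows Amdahl's law exactly, the recovered fraction matches the generating value and the RMSD is close to zero; real measurements would show a nonzero deviation.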
The speedup values in Table 8 correspond to Amdahl's law [3] and are used to plot Fig. 2. Fig. 3 shows the same measurements, but with the predicted speedup determined using Gustafson's law [4]. The results for other systems are presented in our research papers [1,2].
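For reference, the two speedup laws in their common textbook forms are given below, with f the sequential fraction and n the number of cores. These are the standard statements of the laws; the exact formulation and notation used in the models [1,2] may differ.

```latex
% f: sequential fraction, n: number of cores
\[
S_{\mathrm{Amdahl}}(n) = \frac{1}{f + \dfrac{1 - f}{n}},
\qquad
S_{\mathrm{Gustafson}}(n) = f + (1 - f)\,n = n - f\,(n - 1)
\]
```

For example, with f = 0.1 and n = 8, Amdahl's law gives a speedup of about 4.71, while Gustafson's law gives 7.3, which is why the predicted curves in the two figures can diverge for the same measurements.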