Experimental data from the simulation of on-chip communication architectures using RedScarf simulation environment

This article presents data from an extensive set of simulation-based experiments that compare the performance of on-chip communication architectures. These experiments were performed using the RedScarf simulation environment [1], which is described in the article entitled ‘RedScarf: an open-source multi-platform simulation environment for performance evaluation of Networks-on-Chip’ [2]. In the experiments presented here, several intra-chip communication architectures were compared under different traffic patterns, and latency, jitter, and throughput metrics were collected. The data are useful for researchers investigating on-chip communication architectures who need baseline data for comparison.


Data
Tables 1–3 present data obtained from experiments carried out using RedScarf [1,2] to evaluate the performance of 64-node on-chip communication architectures subjected to six traffic patterns based on different spatial distributions. The tables present the average latency, the jitter, and the throughput measured while varying the operating frequency. The first metric is the average latency of all the packets delivered, while the second metric measures the dispersion of the latency experienced by these packets (i.e., the standard deviation). The third metric is the traffic that the network accepted given the offered traffic, which varies with the operating frequency because the injection rate is constant (320 Mbps). Tables 4–6 show data from the simulation of a 4 × 4 2D Mesh topology running five different routing algorithms, including one deterministic algorithm and four partially-adaptive routing algorithms based on the Turn Model [3]. Tables 7–9 present data collected from experiments that evaluate the impact of four arbitration policies on the average latency, jitter, and throughput of a 4 × 4 Torus topology. Finally, Tables 10–12 show the impact of the buffer depth and the use of output buffers on the three performance metrics of a 4 × 2 × 2 Mesh topology. Seven different memory schemes are employed.

Experimental design, materials, and methods
The experiments used synthetic traffic generators to inject packets into the network. Traffic generation and analysis employed the model proposed in Ref. [4], discarding the first packets delivered to avoid systematic bias in the simulation, following the method presented in Ref. [5].
Specifications Table
Subject: Hardware and Architecture
Specific subject area: Simulation-based performance evaluation of on-chip communication architectures
Type of data: Tables
How data were acquired: Computational simulation
Data format: Raw and analyzed
Parameters for data collection: Experiments are based on traffic patterns composed of communication flows configured to inject 128-bit packets into 32-bit data links at a constant injection rate of 320 Mbps.
Description of data collection: Data were obtained using modules which collect information about each packet delivered by the communication architecture. These modules collect the necessary [...]

Static 10,012.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9
Rotative 10,012.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9
Random 10,012.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9
Round-Robin 10,012.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9 13.9 13.

The simulations applied six spatial traffic distributions: Uniform, Bit-reversal, Perfect Shuffle, Butterfly, Transpose, and Complement [6]. In spatial distributions in which a source node can generate multiple communication flows to different destinations (e.g., Uniform traffic), the packets of these flows were generated and injected in random order using the random number generator of the C++ Standard Template Library (STL), which relies on a uniform discrete distribution. This generator uses a seed computed in the front-end and derived from the current simulation start time. In permutation-based distributions, when the destination node of a communication flow was the source node itself, the traffic generator did not generate the corresponding packets. Regarding the temporal [...]

As shown in the tables presented above, the experiments evaluated the performance of six on-chip communication architectures, including Bus, Crossbar, and four NoC topologies (Chordal Ring, 2D Mesh, 3D Mesh, and 2D Torus).
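The permutation-based spatial distributions named above can be illustrated with a short sketch. It assumes the commonly used bit-permutation definitions on 6-bit addresses (64 nodes); the exact definitions adopted in Ref. [6] and in RedScarf may differ. All function names are hypothetical, the sketch is in Python rather than the simulator's C++, and a fixed seed replaces the start-time seed so that the output is reproducible. Excluding the source node from its own Uniform destinations is also an assumption.

```python
import random

NODE_BITS = 6  # 64 nodes -> 6-bit node addresses (assumption of this sketch)


def complement(src, bits=NODE_BITS):
    """Invert every address bit."""
    return ~src & ((1 << bits) - 1)


def bit_reversal(src, bits=NODE_BITS):
    """Reverse the order of the address bits."""
    return int(format(src, f"0{bits}b")[::-1], 2)


def perfect_shuffle(src, bits=NODE_BITS):
    """Rotate the address bits left by one position."""
    mask = (1 << bits) - 1
    return ((src << 1) | (src >> (bits - 1))) & mask


def butterfly(src, bits=NODE_BITS):
    """Swap the most and least significant address bits."""
    msb = (src >> (bits - 1)) & 1
    lsb = src & 1
    if msb == lsb:
        return src
    return src ^ ((1 << (bits - 1)) | 1)


def transpose(src, bits=NODE_BITS):
    """Swap the upper and lower halves of the address."""
    half = bits // 2
    low = src & ((1 << half) - 1)
    high = src >> half
    return (low << half) | high


def permutation_flows(pattern, nodes=64):
    """List (source, destination) flows, skipping self-addressed ones."""
    bits = nodes.bit_length() - 1
    flows = []
    for src in range(nodes):
        dst = pattern(src, bits)
        if dst != src:  # no packets when a node would address itself
            flows.append((src, dst))
    return flows


def uniform_flows(src, nodes=64, seed=0):
    """Flows from src to every other node, in seeded random order."""
    rng = random.Random(seed)  # fixed seed here; the source uses start time
    dests = [d for d in range(nodes) if d != src]
    rng.shuffle(dests)
    return [(src, d) for d in dests]
```

Under these definitions, patterns with fixed points (for example, palindromic addresses under Bit-reversal) produce fewer than 64 flows, which is consistent with the rule that self-addressed flows generate no packets.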
The experiments did not consider all the architectures for each spatial distribution. We did not evaluate the permutation-based distributions on the Bus, as traffic on the Bus does not depend on the destination addresses. We also did not consider NoC topologies in which a deadlock condition was reached because the routing algorithm was not able to avoid it. In addition to the topologies, the experiments also evaluated five routing alternatives in a 2D Mesh, four arbitration policies in a 2D Torus, and seven memory schemes in a 3D Mesh.
We configured the simulations to run until the delivery of 100,000 packets. The first 40,000 packets (40%) were discarded from the analysis to reduce the sampling bias introduced by the network warm-up period. It was not necessary to discard packets at the end of the simulation (drain period), as the simulator did not stop generating packets until the stop condition was reached.
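The warm-up filtering described above amounts to dropping a fixed prefix of the delivery-ordered samples. The following is a minimal sketch, assuming the per-packet records are held in a Python list in delivery order; the names `steady_state`, `TOTAL_PACKETS`, and `WARMUP_PACKETS` are hypothetical and are not part of RedScarf.

```python
TOTAL_PACKETS = 100_000   # stop condition stated in the text
WARMUP_PACKETS = 40_000   # first 40% discarded from the analysis


def steady_state(samples, warmup=WARMUP_PACKETS):
    """Return the samples that remain after dropping the warm-up prefix.

    `samples` must be ordered by delivery time, so that the first
    `warmup` entries are the earliest packets delivered.
    """
    if warmup < 0:
        raise ValueError("warmup must be non-negative")
    return samples[warmup:]
```

With 100,000 delivered packets, this leaves 60,000 samples for the computation of the metrics.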
The experiments were performed under different operating frequencies. As the data link width equals 32 bits and the injection rate is constant (320 Mbps), each operating frequency (e.g., 50 MHz) corresponds to a specific offered traffic (e.g., 0.20), as shown in the header of each table above.
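The relation between operating frequency and offered traffic can be checked with a small calculation. The sketch below assumes that offered traffic is the injection rate normalized by the link capacity, with the link transferring one 32-bit word per clock cycle; this reproduces the 50 MHz and 0.20 example in the text, but the function names are hypothetical.

```python
LINK_WIDTH_BITS = 32        # data link width stated in the text
INJECTION_RATE_MBPS = 320   # constant injection rate stated in the text


def offered_traffic(freq_mhz, rate_mbps=INJECTION_RATE_MBPS,
                    width_bits=LINK_WIDTH_BITS):
    """Injection rate as a fraction of link capacity at freq_mhz."""
    capacity_mbps = width_bits * freq_mhz  # one word per cycle (assumption)
    return rate_mbps / capacity_mbps


def frequency_for(offered, rate_mbps=INJECTION_RATE_MBPS,
                  width_bits=LINK_WIDTH_BITS):
    """Operating frequency (MHz) that yields the given offered traffic."""
    return rate_mbps / (width_bits * offered)
```

Under this assumption, lowering the operating frequency raises the offered traffic, since the same 320 Mbps occupies a larger share of a slower link.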
The metrics used in the experiments were the average packet latency, the jitter (defined as the standard deviation of packet latencies), and the throughput (which expresses the accepted traffic). These data are presented in the tables above and can be used as a reference for research on on-chip communication architectures.
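The three metrics can be sketched as follows. This is an illustration under stated assumptions, not the RedScarf implementation: latencies are given in clock cycles, jitter uses the population standard deviation (the source does not say whether a population or sample estimate is used), and accepted traffic is normalized by the capacity of one 32-bit link over the observation window. The function name `summarize` and its parameters are hypothetical.

```python
import statistics


def summarize(latencies, delivered_bits, window_cycles, link_width_bits=32):
    """Compute average latency, jitter, and normalized accepted traffic.

    latencies       -- per-packet latencies after warm-up removal
    delivered_bits  -- bits delivered during the observation window
    window_cycles   -- length of the observation window in clock cycles
    """
    if not latencies:
        raise ValueError("at least one latency sample is required")
    return {
        "avg_latency": statistics.fmean(latencies),
        # population standard deviation (assumption of this sketch)
        "jitter": statistics.pstdev(latencies),
        # fraction of one link's capacity that was actually delivered
        "accepted_traffic": delivered_bits / (window_cycles * link_width_bits),
    }
```

For example, ten 128-bit packets delivered over a 200-cycle window correspond to an accepted traffic of 0.2 under this normalization.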