1 Introduction

System-on-Chip (SoC) design integrates multiple components of an electronic system onto a single chip, allowing more components to be placed in the same area. The resulting growth in the number of components called for a novel communication technology between the electronic components, named Network-on-Chip (NoC), since traditional communication methods had become a bottleneck for the whole system [1]. NoC technology replaced traditional bus-based and point-to-point communication, making communication faster and more energy-efficient. However, as chips scale, some cores end up far apart, and multi-hop communication between such cores is inefficient in terms of latency and energy consumption. This inherent property of NoCs makes design decisions such as topology selection, routing, and application mapping critical when designing a chip [2].

One solution to shorten the hop count between remote components is to let them communicate through a wireless connection. Previous studies suggested different antenna types for on-chip wireless communication, such as millimeter-wave and carbon nanotube antennas [3] and ultra-wideband antennas [4], which led to Wireless Network-on-Chip (WiNoC) architectures. The aforementioned NoC design decisions are also quite important in designing a WiNoC system, especially in hybrid topologies, where wireless and wired routers are used in the same topology [5, 6]. Although WiNoC enables higher scalability and bandwidth along with lower communication latency and energy consumption than traditional wired NoC by reducing the communication distance between remote points to a single-hop wireless interconnect, it still introduces its own challenges related to the integration of wireless routers, such as hardware complexity and power overhead [7].

WiNoCs can be designed using either a purely wireless topology or a hybrid topology (see Fig. 1). In a purely wireless topology, the components are connected only through wireless links. In a hybrid topology, there are wires between components that are close to each other, while components that are far apart use wireless links [4]. Hybrid designs can be categorized into “2D mesh-based hybrid topology”, “multiple tiers hybrid topology”, and “small-world-based topology”, with small-world-based designs being the popular choice [4, 5]. The hybrid topology is widely accepted to be better than the purely wireless topology, since the latter suffers from heavier channel arbitration, poor scalability, and area overhead [4, 5]. Therefore, we focus on the design challenges of hybrid WiNoCs in this paper.

Fig. 1 Purely wireless and hybrid 2D mesh WiNoC topologies

One of the important design challenges in NoC design is mapping the tasks of a given application onto the cores of the NoC topology under given constraints and objective functions. Application mapping onto multiple cores is known to be NP-hard [8]. Although there are several successful mapping algorithms for NoCs using methods such as integer linear programming (ILP) [9], metaheuristics [10], and heuristics [11,12,13], the literature lacks optimal mapping techniques for hybrid WiNoCs. In this study, we aim to fill this gap and present two novel application mapping methods for WiNoCs, based on quadratic programming (QP) and simulated annealing (SA). Our methods take as inputs the application graph and the hybrid 2D WiNoC mesh topology, in which some routers communicate through wireless links. The objective of our methods is to minimize the communication energy consumption of the application.

We can list the main contributions of this paper as follows:

  • We introduce a QP-based model for application mapping in 2D mesh-based hybrid WiNoCs, aimed at minimizing energy consumption. While optimal for smaller applications, its long execution times limit its use for larger node counts. This model’s results serve as a benchmark for evaluating our metaheuristic approach.

  • We propose a metaheuristic SA-based method addressing the same mapping challenge, achieving optimal or near-optimal solutions efficiently across various problem sizes. This demonstrates a significant improvement in runtime performance.

  • We conduct comparisons between our QP and SA methods under various wireless router placements, illustrating the impact of router placement on communication cost and system design. This includes an analysis of how the number of routers and tasks influences performance, with the SA method consistently providing rapid, effective solutions compared to the slower QP model.

  • We compare the SA-based method against existing heuristic mapping approaches in the literature, showcasing its superiority in achieving better performance across all benchmarks used in our study.

The rest of the paper is organized as follows. We present the related work in the next section. In Sect. 3, we give the details about the energy model and the formal problem definition. In Sect. 4, we present our optimal application mapping method along with the QP formulations, as well as our SA-based metaheuristic application mapping method that tackles the same problem. We show and discuss the experimental results in Sect. 5. Finally, we conclude this paper with future directions in Sect. 6.

2 Related work

There have been several studies in the literature proposing application mapping solutions for wired NoCs, which employ methods including optimal schemes such as integer linear programming (ILP) [9, 14], metaheuristic methods such as simulated annealing (SA) [14, 15], genetic algorithm (GA) [14, 16,17,18,19,20], bio-inspired optimization algorithms (membrane computing) [21], particle swarm optimization [22], ant colony optimization [23, 24], spiral optimization [20], branch and bound [25, 26], and various heuristic methods [2, 11,12,13, 27,28,29,30,31,32].

However, only limited studies tackle the application mapping problem for WiNoCs in the literature; hence, there is a need for novel optimized solutions to this problem. In [33], the authors focused on the congestion control issue in application mapping algorithms for hybrid WiNoCs. They proposed a dynamic application mapping algorithm (DAMA), which introduced three key additions to the existing solutions: optimal selection of the first node where the mapping of tasks should start, finding the task with the most edges to be mapped first in order to lower the probability of internal congestion, and determining an area of adjacent available nodes onto which the tasks should be mapped to avoid external congestion.

In [34], the authors proposed a round rotary mapping (RRM) for hybrid WiNoCs that also aims to avoid congestion in wireless links and balance heat distribution in the system. The proposed algorithm prevents congestion by employing the concept of minimum hop-count contiguity, mapping tasks as closely together as possible to avoid communication over long distances, while even heat distribution is achieved by mapping the input applications onto different regions in a round-robin manner. The experimental results showed that the algorithm significantly improves congestion and temperature management while also achieving lower latency and total execution time.

Sacanamboy-Franco et al. presented a multi-objective GA in [35], which aims to improve speedup, power consumption, and network bandwidth over a hierarchical hybrid WiNoC topology in which a set of subnets using a wired mesh topology resides on the first level and a star topology with wireless links resides on the second level. Although the main objective of the optimization is performance, the other metrics are important as well, since they can cause bottlenecks in the system, such as congestion due to limited bandwidth or power concentrations resulting from poor power consumption of nodes. The experimental results showed that the proposed algorithm was able to optimize multiple metrics simultaneously within a few dozen iterations.

Chen et al. also presented a multi-objective GA-based mapping approach for different topologies in [36]. The experimental results showed that, even when relying solely on the mapping approach, their algorithm achieved a performance gain; co-design yielded even greater performance and power consumption gains.

In [37], the authors proposed a WiNoC mapping approach that adapts the population-based incremental learning (PBIL) algorithm proposed in [32] and improves its convergence time over the Shannon formulation by employing two Rényi entropies to adjust the learning rate. The proposed algorithm also avoids task mappings that would cause overlapping. They evaluated the method on hierarchical WiNoC architectures and demonstrated improvements over existing approaches with respect to speed and accuracy.

In [38], Morales et al. presented a simulation-based approach for evaluating application mapping methods designed for WiNoCs, with the goal of enabling evaluation based on relevant metrics other than communication network performance, a noted shortcoming of existing WiNoC simulators. They tested their evaluation strategy on a mapping algorithm for NoC-based systems proposed in [31], applied to a WiNoC architecture, and showed improved latency results.

Dehghani et al. introduced a novel deadline-aware and energy-efficient dynamic task mapping and scheduling approach for hybrid WiNoC-based multicore systems in [39]. The approach addresses the challenges of mapping and scheduling dynamic application workloads, particularly for real-time applications, by leveraging wireless links and considering both core utilization and task laxity. Through simulations, the method demonstrated superior performance in reducing communication energy consumption, deadline violation rates, communication latency, and runtime overhead compared to existing approaches.

In [40], Sacanamboy proposed a population-based incremental learning (PBIL) algorithm, incorporating neural networks and genetic algorithms, to optimize task mapping for IoT applications on WiNoC architectures. Aiming at enhancements in bandwidth, speedup, power consumption, and communication cost, the PBIL algorithm showed significant improvements in both synthetic and real-world applications, notably outperforming existing algorithms in terms of power efficiency and communication cost reduction, with substantial gains observed across different topologies.

In Table 1, we present a summary of the related work that tackled the application mapping problem in WiNoCs. We note the optimization targets along with methodology and target topology.

Table 1 A summary of related hybrid WiNoC mapping studies

In this study, the proposed QP-based model determines optimum solutions for application mapping on 2D mesh-based hybrid WiNoCs. To the best of our knowledge, no previous study uses mathematical programming to solve the problem we tackle here optimally. Furthermore, our study is the first to utilize SA as a metaheuristic for the same problem, which is necessary since the QP-based model is unsuitable for problems with many variables. We chose SA for its flexibility and faster search in large optimization spaces compared to other metaheuristics. Additionally, SA converges to the global optimum if certain conditions on the cooling schedule and the probability of accepting worse solutions are met.

3 Problem definition

Our application mapping problem for 2D mesh-based hybrid WiNoC (illustrated in Fig. 2) involves an input application that can be represented as a weighted communication task graph (WCTG).

Definition 1

(Application as a weighted communication task graph) We denote the input application’s weighted communication task graph (WCTG) representation as \(G_{A} = (V_{A}, E_{A})\), where each vertex \(v_{i} \in V_{A}\) corresponds to a task in the application, while each edge \(e_{i,j} \in E_{A}\) represents a dependency between tasks i and j, with a weight \(w_{i,j}\) measured in bits per second (bps) for the amount of data transfer required between these tasks.

We give an example input application WCTG with eight tasks at the bottom left corner of Fig. 2.

Fig. 2 An illustration of the application mapping problem in hybrid WiNoC. WR in the TG denotes tiles with extra wireless links, while the other tiles are connected to the network only via wired links

Definition 2

(Target Architecture as a Topology Graph) Similarly, the target architecture is represented as a topology graph (TG), denoted as \(G_{T} = (V_{T}, E_{T})\). Each node \(n_{i} \in V_{T}\) corresponds to the router of a tile in the topology, while each edge \(l_{i,j} \in E_{T}\) denotes a physical link between nodes i and j with a maximum link capacity \(c_{i,j}\), i.e., the maximum supported data transfer in bps.

We give an example 3 × 3 2D mesh-type TG in Fig. 2. As the example TG illustrates, we enumerate the routers from the top-left router to the bottom-right one, in left-to-right and top-to-bottom order. In this TG, we label the nodes with extra wireless links as WR, indicating wireless routers. These routers also have wired connections to their immediate neighbors. In this study, we select the locations of the wireless routers manually; in doing so, we try to minimize the distance between remote nodes and follow the selection methods of previous studies, as discussed in the experimental results section. Selecting the number of wireless routers and their locations on the mesh is itself an optimization problem, which we leave as future work.

Definition 3

(Application Mapping Problem) In this study, we address the application mapping problem in WiNoCs, which involves identifying a one-to-one mapping function F that maps each vertex in the WCTG to the nodes of the TG, with the objective to minimize the energy consumed by communication among the nodes. We give the formulation of the problem in (1).

$$\begin{aligned} {F:V_{A} \rightarrow V_{T}} \quad s.t. \quad {F(v_{i}) = n_{j}, \forall v_{i} \in V_{A}, \exists n_{j} \in V_{T}, |V_{A}| \le |V_{T}|} \end{aligned}$$
(1)

We limit the mapping function as a one-to-one function to simplify the overall mapping problem. It is possible that this type of mapping may lead to imbalanced loads among nodes; however, there are ways to address this issue. One such solution involves grouping nodes to achieve load balancing before the mapping phase. This approach can be particularly useful for applications such as deep neural networks (DNNs), which consist of multiple layers with varying numbers of neurons [41]. Directly mapping each neuron to a node can be quite complex; however, clustering neurons and then mapping them to the same number of nodes can significantly simplify the mapping process.

We show an example mapping result on the right side of Fig. 2 for the given WCTG onto the 3 × 3 TG. Our goal is to propose an optimization method that solves our mapping problem, as shown in the same figure. In the subsequent section, we describe the constraints and objective function of our mathematical model, which provides a comprehensive overview of our mapping problem.

4 Proposed application mapping methods

This section will introduce our two application mapping methods, a quadratic programming-based mathematical model and a simulated annealing-based metaheuristic method. Prior to delving into their specifics, we will outline our preprocessing step, which is necessary to identify the inputs required for our methods.

4.1 Preprocessing algorithms

A 2D mesh topology can have multiple routes between tiles, and adding wireless connections increases the number of alternative routes, as shown in the example TG in Fig. 2. Wired and wireless connections have different communication costs: depending on their cost ratio and the total distance, a purely wired path may be cheaper than a wireless or hybrid one, or vice versa. Thus, the least costly communication path must be determined for each pair of tiles on the topology. This involves determining the communication cost weights for both wireless and wired connections and then computing the shortest paths using these weights.

4.1.1 Weight calculation

The first step of our method is to calculate the communication cost weight between each pair of directly connected tiles on the given TG, since wired and wireless connections have different communication costs. Each calculated weight represents the communication cost between two directly connected tiles i and j. We represent the given TG, illustrated in Fig. 2 as the top-left target architecture, with an adjacency matrix \(M_{TG}\). In this matrix, wired connections are represented by the letter c and wireless connections by the letter w. We assume a wired connection costs 1 unit (i.e., \(c=1\)). If there is no direct connection between two tiles, the corresponding cell in the matrix is \(\infty\), while the diagonal cells, representing a tile's connection to itself, are zero. We give an example \(M_{TG}\) in (2), which describes the connections of the TG given in Fig. 2.

$$\begin{aligned} M_{TG}= \begin{bmatrix} {0} &{} {c} &{} {\infty } &{} {c} &{} {\infty } &{} {\infty } &{} {\infty } &{} {\infty } &{} {w} \\ {c} &{} {0} &{} {c} &{} {\infty } &{} {c} &{} {\infty } &{} {\infty } &{} {\infty } &{} {\infty } \\ {\infty } &{} {c} &{} {0} &{} {\infty } &{} {\infty } &{} {c} &{} {\infty } &{} {\infty } &{} {\infty } \\ {c} &{} {\infty } &{} {\infty } &{} {0} &{} {c} &{} {\infty } &{} {c} &{} {\infty } &{} {\infty } \\ {\infty } &{} {c} &{} {\infty } &{} {c} &{} {0} &{} {c} &{} {\infty } &{} {c} &{} {\infty } \\ {\infty } &{} {\infty } &{} {c} &{} {\infty } &{} {c} &{} {0} &{} {\infty } &{} {\infty } &{} {c} \\ {\infty } &{} {\infty } &{} {\infty } &{} {c} &{} {\infty } &{} {\infty } &{} {0} &{} {c} &{} {\infty } \\ {\infty } &{} {\infty } &{} {\infty } &{} {\infty } &{} {c} &{} {\infty } &{} {c} &{} {0} &{} {c} \\ {w} &{} {\infty } &{} {\infty } &{} {\infty } &{} {\infty } &{} {c} &{} {\infty } &{} {c} &{} {0} \\ \end{bmatrix} \end{aligned}$$
(2)

We give the pseudocode of the algorithm we use to determine the weights of wired and wireless connections in Algorithm 1. When calculating the communication cost weights of the wireless connections, we assume a fixed ratio between the wireless and wired communication costs for transferring data over the same distance, represented by the constant \(\varrho\). In our experiments, we used several values of \(\varrho\) to show its effect on our results. We assume the distance (i.e., \(d_{\rm tiles}\)) between routers with a wired connection is the same everywhere, and the weight of each wired connection edge (i, j) is determined by multiplying this distance by the wired weight of 1 unit (Line 7 in Algorithm 1). However, the distance between wirelessly connected routers is longer than between wired ones. To determine their weights, we first calculate the Euclidean distance between the two routers (Line 9). We then multiply this value by the base distance between tiles (i.e., \(d_{\rm tiles}\)) to obtain the total distance (Line 10). Finally, we multiply the total distance by our cost ratio \(\varrho\).

Algorithm 1 Edge weight calculation algorithm
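Algorithm 1 is given as pseudocode above; the following minimal Python sketch reflects our reading of it, under the stated assumptions (wired links cost 1 unit per tile distance, wireless links cost \(\varrho\) times the Euclidean distance in tile units). The function name and the mesh-coordinate convention are ours, not the paper's.

```python
import math

INF = math.inf

def edge_weights(adj, n_cols, rho, d_tiles=1.0):
    """Turn a symbolic adjacency matrix ('c' wired, 'w' wireless, INF none)
    into a numeric communication-cost weight matrix."""
    n = len(adj)
    W = [[INF] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.0                                 # a tile to itself costs nothing
        for j in range(n):
            if adj[i][j] == 'c':                      # wired: 1 unit per tile distance
                W[i][j] = 1.0 * d_tiles
            elif adj[i][j] == 'w':                    # wireless: rho * Euclidean distance
                xi, yi = i % n_cols, i // n_cols      # tile coordinates in the mesh
                xj, yj = j % n_cols, j // n_cols
                W[i][j] = rho * math.hypot(xi - xj, yi - yj) * d_tiles
    return W
```

For the 3 × 3 TG of (2), the single wireless link between tiles 1 and 9 gets weight \(\varrho \sqrt{8}\), which for \(\varrho = 0.3\) is well below the four wired hops it replaces.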

4.1.2 Least communication cost calculation

After determining the communication cost weights for each connected pair of tiles, we adopt the Floyd–Warshall shortest-path algorithm [42] to determine the least costly path between all pairs of tiles in the given TG. We give the pseudocode of the adopted algorithm in Algorithm 2. We selected Floyd–Warshall because it is a versatile algorithm that solves the all-pairs shortest-path problem directly.

Algorithm 2 Floyd–Warshall-based least communication cost algorithm
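A textbook Floyd–Warshall pass over the weight matrix produces the all-pairs least-cost matrix (the SPG used in the next sections). This sketch is our own rendering, not a transcription of the paper's exact Algorithm 2.

```python
def least_cost_paths(W):
    """All-pairs least communication cost via Floyd-Warshall.
    W[i][j] is the direct-link cost (inf if no direct link)."""
    n = len(W)
    spg = [row[:] for row in W]          # work on a copy of the weight matrix
    for k in range(n):                   # allow tile k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if spg[i][k] + spg[k][j] < spg[i][j]:
                    spg[i][j] = spg[i][k] + spg[k][j]
    return spg
```

On a 1D chain of wired tiles, adding a single cheap "wireless" shortcut between the endpoints lowers not only their pairwise cost but also the costs of paths that route through the shortcut.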

4.2 QP-based method

We give the notations used in our QP formulations in Table 2.

Table 2 Notations used in the mathematical model

\(\mu _{i,j}\) is a binary variable that is true if task i is mapped to tile j, and false otherwise, as formally shown in (3) and (4).

$$\begin{aligned}{} & {} \forall (i \in Tasks, j \in Tiles): \mu _{i,j} \quad is\_binary \end{aligned}$$
(3)
$$\begin{aligned}{} & {} \mu _{i,j} = {\left\{ \begin{array}{ll}1 &{} \text {if } Task_{i} \text { is assigned to } Tile_{j} \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(4)

Each task must be mapped to exactly one tile. This is formulated in Eq. (5).

$$\begin{aligned} \forall i \in Tasks: {\sum _{j \in Tiles} \mu _{i,j} = 1} \end{aligned}$$
(5)

On the other hand, each tile can be assigned at most one task. This is formulated in (6).

$$\begin{aligned} \forall j \in Tiles: {\sum _{i \in Tasks} \mu _{i,j} \le 1} \end{aligned}$$
(6)

\(\kappa _{i,j,r,s}\) is a binary variable that is true if task i is mapped to tile r and task j is mapped to tile s, and false otherwise. This is formally shown in (7) and (8).

$$\begin{aligned}{} & {} \forall (i,j \in Tasks, r,s \in Tiles): \kappa _{i,j,r,s} \quad is\_binary \end{aligned}$$
(7)
$$\begin{aligned}{} & {} \kappa _{i,j,r,s} = {\left\{ \begin{array}{ll}1 &{} \text {if } Task_{i} \text { is assigned to } Tile_{r} \\ &{} \text {and } Task_{j} \text { is assigned to } Tile_{s} \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(8)

\(\kappa _{i,j,r,s}\) is calculated by multiplying (i.e., logical AND operation) \(\mu _{i,r}\) and \(\mu _{j,s}\), as formulated in Eq. (9) to obtain a QP-based model.

$$\begin{aligned} {\forall (i,j \in Tasks, r,s \in Tiles): \kappa _{i,j,r,s} = \mu _{i,r} \times \mu _{j,s}} \end{aligned}$$
(9)

The corresponding linear programming (LP) formulation for Eq. (9) can easily be obtained using Inequalities (10). However, we do not use the linear formulation in our model, since we observed through experiments that the QP model determines solutions faster than the LP model. This can be attributed to how the optimizer handles the two problem types: the Gurobi optimizer is tuned for various problem structures and selects algorithms based on the problem's nature, and for QP problems it employs efficient convex optimization techniques that often converge more quickly to a solution. This efficiency makes the QP model the preferred approach in our study despite the applicability of LP formulations. We present both models here as guidance for future studies.

$$\begin{aligned} \forall (i,j \in Tasks, r,s \in Tiles): \kappa _{i,j,r,s} \le \mu _{i,r} \end{aligned}$$
(10a)
$$\begin{aligned} \forall (i,j \in Tasks, r,s \in Tiles): \kappa _{i,j,r,s} \le \mu _{j,s} \end{aligned}$$
(10b)
$$\begin{aligned} {\forall (i,j \in Tasks, r,s \in Tiles):} {\kappa _{i,j,r,s} \ge \mu _{i,r} + \mu _{j,s} - 1} \end{aligned}$$
(10c)
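That Inequalities (10a)–(10c) pin each binary \(\kappa _{i,j,r,s}\) to the product \(\mu _{i,r} \times \mu _{j,s}\) of Eq. (9) can be verified exhaustively over the four binary cases; the small check below (our own, not part of the paper's model) does exactly that.

```python
from itertools import product

def and_linearization_ok():
    """For every binary (mu_ir, mu_js), the only kappa in {0, 1} satisfying
    kappa <= mu_ir, kappa <= mu_js, kappa >= mu_ir + mu_js - 1
    is the product mu_ir * mu_js."""
    for mu_ir, mu_js in product((0, 1), repeat=2):
        feasible = [k for k in (0, 1)
                    if k <= mu_ir and k <= mu_js and k >= mu_ir + mu_js - 1]
        assert feasible == [mu_ir * mu_js]
    return True
```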

\(\mathcal {E}_{Total_{r,s}}\) represents the total communication energy cost between two tiles r and s, as formulated in Eq. (11). In this equation, for each pair of communicating tasks i and j in the WCTG that are assigned to tiles r and s, we add the communication cost from the corresponding entry of the SPG matrix.

$$\begin{aligned} {\forall (r,s \in Tiles):} \quad {\mathcal {E}_{Total_{r,s}} = \sum _{i,j \in Tasks} SPG_{r,s} \times \kappa _{i,j,r,s} \times WCTG_{i,j}} \end{aligned}$$
(11)

Finally, \(\mathcal {E}_{WiNoC}\) represents the total communication energy cost of the WiNoC system, and it is formulated in Eq. (12).

$$\begin{aligned} \mathcal {E}_{WiNoC} = \sum _{r,s \in Tiles} \mathcal {E}_{Total_{r,s}} \end{aligned}$$
(12)

Our objective function is to minimize the total communication energy of the WiNoC system.
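On small instances, the objective of Eqs. (11)–(12) can be checked by brute force: enumerate all one-to-one task-to-tile assignments and score each with \(\sum SPG_{r,s} \times WCTG_{i,j}\). This exhaustive sketch is our own sanity check, not the Gurobi QP model, and is only feasible for tiny graphs, but it returns the same optimum the QP model should report.

```python
from itertools import permutations

def exhaustive_best_mapping(wctg, spg):
    """wctg[i][j]: traffic from task i to task j (0 if none);
    spg[r][s]: least communication cost between tiles r and s.
    Returns (best_energy, mapping) where mapping[i] is the tile of task i."""
    n_tasks, n_tiles = len(wctg), len(spg)
    best_energy, best_map = float('inf'), None
    for tiles in permutations(range(n_tiles), n_tasks):   # one-to-one assignments
        energy = sum(wctg[i][j] * spg[tiles[i]][tiles[j]]
                     for i in range(n_tasks) for j in range(n_tasks))
        if energy < best_energy:
            best_energy, best_map = energy, list(tiles)
    return best_energy, best_map
```

For example, two tasks exchanging 5 bps on a three-tile chain are optimally placed on adjacent tiles.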

4.3 SA-based method

Simulated annealing (SA) is a metaheuristic method that prevents the search from getting stuck in local minima [43]. The pseudocode in Algorithm 3 shows the basic steps of the SA process. A random initial state is chosen at the beginning as the current solution, and a good initial temperature is selected to ensure the desired acceptance probability at the start (e.g., 80%). Then, the annealing proceeds as follows. As long as a stopping condition (e.g., low temperature, a desired number of iterations) is not satisfied, a candidate neighbor solution is selected. Its objective function cost is compared to the cost of the current solution; if it is better, it is immediately accepted. Even if it is not better, it can still be accepted with an acceptance probability that depends on the current temperature of the system. As the temperature cools, the probability of accepting a worse solution decreases. Implementation decisions, such as the selection of the starting state, initial temperature, neighbor states, and an appropriate cooling schedule, must be made for each specific problem.

Algorithm 3 A general simulated annealing pseudocode

In the following sections, we explain the steps of our SA-based method, which is configured to our application mapping problem.

4.3.1 Determining the initial state

Instead of generating a random mapping solution, we employ the following heuristic approach to obtain the initial state. We iterate over each edge in the SPG, selecting it as the start edge \(e_s\). At each iteration, we first identify the highest-weight edge \(e_{\rm max}\) in the WCTG that is not already mapped and map the tasks associated with \(e_{\rm max}\) onto the cores corresponding to the vertices of \(e_s\). Next, we identify the next highest-weight unmapped edge \(e_{\rm max}\) in the WCTG and the next lowest-weight edge \(e_s\) in the SPG, and map the tasks associated with \(e_{\rm max}\) onto the cores corresponding to the vertices of \(e_s\). The algorithm updates edge weights at each mapping step to avoid reselecting them. This step is repeated until all tasks in the WCTG are mapped, after which we assess the energy of the obtained mapping configuration. If the generated solution is the minimum-energy solution obtained so far, the algorithm saves it as the current minimum-energy mapping. This process is repeated until every SPG edge has been considered as the starting edge. Our experimental results show that the heuristically generated initial solution helps our SA-based method converge to optimal solutions faster than randomly generated initial solutions do.

4.3.2 Determining the initial configuration and the cooling schedule

The initial configuration of the annealing process consists of the following constants: the initial temperature \(T_\textrm{initial}\), the final temperature \(T_\textrm{final}\), the number of tries, and the parameter \(\alpha\) used to determine the next temperature in the cooling schedule. In this study, we automatically identify the initial and final temperatures using the algorithm suggested in [44], which sets them such that the acceptance probability is 98% at the beginning and the improvement rate is 0% at the end. In our experiments, we set the number of tries to the constant value of 12, since this is the maximum number of simultaneous threads supported by our six-core processor. The \(\alpha\) parameter is determined with respect to the number of tasks, as formulated in Eq. (13), where N is the number of tasks in the application task graph WCTG.

$$\begin{aligned} \alpha = {\left\{ \begin{array}{ll} 0.001 &{} \text {if } N< 16 \\ 0.0001 &{} \text {if } 16 \le N < 25 \\ 0.00005 &{} \text {if } N \ge 25 \end{array}\right. } \end{aligned}$$
(13)

In our method, we use an exponential multiplicative cooling schedule [45], expressed in Eq. (14), where step is the current number of the annealing steps.

$$\begin{aligned} \small T_\textrm{step} = T_\textrm{initial} \times (1 - \alpha ) ^\textrm{step} \end{aligned}$$
(14)

The annealing process continues until the current temperature \(T_\textrm{step}\) becomes less than or equal to the final temperature \(T_\textrm{final}\).
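The number of annealing steps implied by Eq. (14) follows directly from \(T_\textrm{final} = T_\textrm{initial}(1-\alpha )^\textrm{steps}\), i.e., \(\textrm{steps} = \lceil \ln (T_\textrm{final}/T_\textrm{initial}) / \ln (1-\alpha ) \rceil\). The sketch below illustrates this; the temperature values used in the example are illustrative, not the paper's.

```python
import math

def steps_to_final(t_initial, t_final, alpha):
    """Number of steps until the exponential multiplicative schedule
    T_step = T_initial * (1 - alpha)**step first drops to T_final."""
    return math.ceil(math.log(t_final / t_initial) / math.log(1.0 - alpha))

# e.g., with alpha = 0.001 (the N < 16 case of Eq. (13)):
# steps_to_final(100.0, 0.1, 0.001) gives a run of a few thousand steps
```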

4.3.3 Identification of the neighbor solutions

We define a mapping solution state as \(S = [S_1, S_2, \ldots , S_R]\), a one-dimensional array of length R, where R is the number of tiles in the topology. We show an example state array for the MPEG-4 benchmark (shown in Fig. 5) in Eq. (15), representing a mapping solution. If a tile is empty (i.e., no task is mapped onto it), its entry in the array is -1. Otherwise, the entry holds the number of the task mapped onto that tile.

$$\begin{aligned} \small S = [10, 11, -1, -1, 7, 12, -1, 2, 8, -1, 9, 1, 6, 3, 4, 5] \end{aligned}$$
(15)

The corresponding mapping configuration of the example mapping solution for the MPEG-4 benchmark given in Eq. (15) is illustrated in Fig. 3. As evident from the first row in the figure, node 10 (\(n_{10}\)) of our example WCTG is assigned to tile \(t_1\) and node 11 (\(n_{11}\)) is assigned to tile \(t_2\). On the other hand, tiles \(t_3\) and \(t_4\) are empty (i.e., no tasks are assigned to these tiles). This is also evident from the first four entries of Eq. (15) in the order of tile numbers.

Fig. 3 Illustration of an example mapping configuration

To generate a neighbor solution, we randomly select two entries of the current solution array and swap them. We then calculate the energy cost of the new solution. If it is better than the current best solution, we accept it; otherwise, we accept or reject it based on the acceptance probability criteria given in Algorithm 3. This process continues until the stopping criterion is met, after which our SA method returns the best solution at hand as the final solution.
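The swap move and acceptance rule above can be sketched as a minimal SA loop. This is our own illustrative version: it uses a random initial state rather than the heuristic of the initial-state subsection, follows the general structure of Algorithm 3, and uses the schedule of Eq. (14); all names and default parameter values are assumptions.

```python
import math
import random

def sa_map(n_tasks, n_tiles, energy, t_initial=100.0, t_final=0.1,
           alpha=0.001, seed=1):
    """Minimal SA loop; state[t] holds the task mapped onto tile t, or -1."""
    rng = random.Random(seed)
    state = list(range(n_tasks)) + [-1] * (n_tiles - n_tasks)
    rng.shuffle(state)                                  # random initial mapping
    cur, cur_e = state, energy(state)
    best, best_e = cur[:], cur_e
    t, step = t_initial, 0
    while t > t_final:
        a, b = rng.sample(range(n_tiles), 2)            # neighbor: swap two tiles
        cand = cur[:]
        cand[a], cand[b] = cand[b], cand[a]
        cand_e = energy(cand)
        # always accept improvements; accept worse moves with prob. exp(-dE/T)
        if cand_e < cur_e or rng.random() < math.exp((cur_e - cand_e) / t):
            cur, cur_e = cand, cand_e
            if cur_e < best_e:
                best, best_e = cur[:], cur_e
        step += 1
        t = t_initial * (1.0 - alpha) ** step           # cooling, Eq. (14)
    return best, best_e
```

With a toy energy that rewards placing two heavily communicating tasks on adjacent tiles, the loop quickly settles on an adjacent placement.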

5 Experimental results

In this section, we present our experimental setup in Sect. 5.1, discuss the results of the experiments in Sect. 5.2, compare our SA-based method to other related (meta)heuristic approaches in Sect. 5.3, and finally analyze the complexity and execution times in Sect. 5.4.

5.1 Experimental setup

We implemented the proposed methods in the C++ programming language and used the Gurobi optimizer framework [46] to run our QP-based model. The Gurobi optimizer uses parallel computing to speed up optimization by running multiple optimization algorithms on different processor cores. For large test cases, we limited the maximum running time of our solver to eight hours per test and accepted the best solution at hand.

For the experiments performed on hybrid WiNoC topologies, we assume the wired and wireless energy costs to be 1 and 0.3, respectively, as in [47]. Throughout this section, we assume a wired-to-wireless communication energy cost ratio of \(\varrho = 0.3\). Further details on the rationale behind this choice are provided in Sect. 5.5. We ran the algorithms on a desktop computer with a 3.40 GHz six-core, 12-thread CPU and 16 GB of RAM.

In our experiments, we used 33 topology graphs of varying sizes, six of which do not have any wireless routers. We placed wireless routers on the topologies either by directly copying placements from previous studies or by slightly modifying them. Due to page limitations, we give only two examples of wireless router placement in Fig. 4.

Fig. 4

Topology graphs without (a) and with (b) wireless routers

Table 3 describes the naming convention used in topology graph names. In this convention, WP is the abbreviation of wireless placement. The number after the letter N represents the size of the mesh, while the number after the letter R represents how many wireless routers the mesh contains. Since we have different manual placement configurations of the wireless routers in the mesh, we enumerate them as M1, M2, and so on. Table 4 lists the topology graphs used and their sources.
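The convention of Table 3 can be decoded mechanically. The parser below is a hypothetical helper added for illustration, not part of the original toolchain:

```python
import re

def parse_topology_name(name):
    """Parse names such as 'WPN4R3M1': WP = wireless placement,
    N<k> = k x k mesh, R<r> = number of wireless routers,
    M<m> = manual placement variant."""
    m = re.fullmatch(r"WPN(\d+)R(\d+)M(\d+)", name)
    if m is None:
        raise ValueError(f"not a WP topology name: {name}")
    n, r, p = map(int, m.groups())
    return {"mesh_size": n, "wireless_routers": r, "placement": p}
```

For instance, `WPN4R3M1` denotes a 4 × 4 mesh with three wireless routers in manual placement variant 1.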

Table 3 Naming convention used in topology graph names
Table 4 Topology graphs used in the experiments

We used a total of 32 application graphs (i.e., WCTGs) of varying sizes in the experiments. We generated 25 task graphs similar to the benchmarks used in the literature. We also used six benchmarks taken from previous studies: VODP [12], MWD [51], MPEG-4 [12], 263 Encoder-MP3 Decoder [52], 263 Decoder-MP3 Decoder [52], and MP3 Encoder [52]. Furthermore, we utilized a custom-generated benchmark with 25 nodes named G25. Two of the multimedia benchmark task graphs are illustrated in Fig. 5.

Fig. 5

WCTGs for MPEG-4 and MWD

When generating task graphs, we analyzed the out-degrees of vertices and the weights of edges in the given benchmarks. Based on this analysis, we used a custom Directed Acyclic Graph (DAG) generator [53], slightly altering the software to accept predefined weight and out-degree values during generation. We used the algorithm described in [54] to generate random values that mimic the desired distributions. Figure 6 shows the benchmark G25 and the randomly generated task graph RV5DAG1 inspired by G25. Table 5 describes the naming convention used in random WCTG names.
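As a rough illustration of this kind of generation (the actual generator [53] and the distribution-matching algorithm [54] differ in detail), a random weighted DAG with controlled out-degrees and edge weights might be produced as follows; `out_degrees` and `weights` stand for pools sampled from the analyzed benchmark distributions:

```python
import random

def random_dag(n, out_degrees, weights, rng=random):
    """Generate a random weighted DAG on nodes 0..n-1. Each node only gets
    edges to higher-numbered nodes, so the node order is a topological
    order and the graph is acyclic by construction. Out-degrees and edge
    weights are drawn from the supplied pools."""
    edges = []
    for u in range(n - 1):
        k = min(rng.choice(out_degrees), n - 1 - u)  # cap by remaining nodes
        for v in rng.sample(range(u + 1, n), k):
            edges.append((u, v, rng.choice(weights)))
    return edges
```

Restricting edges to higher-numbered targets is the standard trick that guarantees acyclicity without an explicit cycle check.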

Table 5 Naming convention used in random WCTG names
Fig. 6

WCTGs for G25 and RV5DAG1

5.2 Comparing QP model with SA-based method

In our first set of experiments, we compare our SA-based application mapping method to the mathematical QP model. Table 6 presents the test results (communication cost) for all benchmarks with respect to the number of wireless routers in the topology.

Table 6 Test results (communication cost) with respect to the number of wireless routers in the topology

In the first column of the table, we give the size of the mesh topology. The second and third columns show the number of wireless routers and their placement in the mesh, respectively. The topology is purely wired if the number of wireless routers is 0. In this set of experiments, we used only manual placement 1 (i.e., M1). The fourth and fifth columns give the number of tasks in the mapped application and the application's WCTG, respectively. The following two columns, QP and SA, present the resulting communication cost for each test case. Finally, the last column indicates the percentage change of the SA-based mapping results relative to the QP results; the smaller this percentage increase, the better the SA performance.
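The percentage-change column corresponds to the standard relative-change formula, sketched below for clarity:

```python
def percentage_change(sa_cost, qp_cost):
    """Relative change of the SA result with respect to the QP baseline,
    in percent. Positive values mean SA is worse (higher cost) than QP;
    0 means SA matched the QP result."""
    return 100.0 * (sa_cost - qp_cost) / qp_cost
```

For example, an SA cost of 104.3 against a QP optimum of 100.0 yields a 4.3% increase.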

From the results comparing SA to QP, given in the column denoted SA/QP, we observe that the SA approach obtains optimal mappings for most cases and near-optimal results for the remaining test cases. The largest deviation from the optimal result occurs when mapping the benchmark MWD onto the 4 × 4 mesh with the WPN4R5M1 wireless router placement, for which SA obtains a 4.30% higher communication cost. Nevertheless, the average percentage change of the SA results from the optimal QP solutions is only 0.46% and is thus negligible.

Fig. 7

Benchmark test results with respect to the number of wireless routers

Furthermore, if we look at the trendlines for each benchmark shown in Fig. 7, we can observe that the number of wireless routers in the topology significantly affects the total communication energy cost. As the number of wireless routers increases, the total energy consumption for communication decreases for each benchmark, demonstrating the advantage of hybrid WiNoC architectures over pure NoC.

In Table 7 and Fig. 8, we present the results for all benchmarks for the tests performed with respect to the different manual placement of wireless routers in the topology.

Table 7 Test results (communication cost) with respect to the placement of wireless routers in the topology
Fig. 8

Benchmark test results with respect to the placement of wireless routers

We observe that the mapping solutions obtained with SA are optimal or near-optimal for all test cases. Again, the average percentage increase of the SA results over the optimal solutions is only 0.70%, which is negligible.

Furthermore, considering the results of the various experiments on wireless router placement, two specific configurations, namely WPN4R3M5 and WPN4R3M6, empirically demonstrate superior performance. These configurations are illustrated in Fig. 9.

Fig. 9

Best placement results for the wireless routers

The better performance of the WPN4R3M5 and WPN4R3M6 topologies can be attributed to the strategic placement of wireless routers within the given 4 × 4 2D mesh NoC architecture, which balances connectivity and coverage across the network. In these configurations, three wireless routers are manually positioned to maximize the efficiency of wireless communication while minimizing communication latency. Specifically, a configuration in which two wireless routers are placed at two corners and one in the center (but not too close to the other wireless nodes) facilitates efficient direct communication across the mesh. This arrangement ensures that most nodes are within one hop of a wireless router, significantly reducing the number of hops needed for communication between distant nodes. The central router acts as a pivot for cross-network communication, enhancing the network's overall data throughput and reducing communication energy consumption. In contrast, placing all wireless routers at the corners, or too close to each other (as in the less effective configurations), limits the coverage of the wireless network; certain areas of the mesh then rely more heavily on wired communication, increasing latency and energy consumption due to longer wired paths.
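One simple way to quantify this coverage intuition, using our own illustrative metric (wireless-equipped tiles assumed to share single-hop wireless links of cost `rho`, wired hops costing 1), is the average cheapest-path cost over all tile pairs of the mesh:

```python
from itertools import combinations, product

def avg_pair_cost(n, wireless_tiles, rho=0.3):
    """Average cheapest-path cost over all tile pairs of an n x n mesh in
    which every pair of wireless-equipped tiles also shares a single-hop
    wireless link of cost rho (wired mesh hops cost 1)."""
    tiles = n * n
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(tiles)] for i in range(tiles)]
    for r, c in product(range(n), repeat=2):
        u = r * n + c
        if c + 1 < n:
            d[u][u + 1] = d[u + 1][u] = 1.0  # horizontal wired link
        if r + 1 < n:
            d[u][u + n] = d[u + n][u] = 1.0  # vertical wired link
    for a, b in combinations(wireless_tiles, 2):
        d[a][b] = d[b][a] = min(d[a][b], rho)  # wireless shortcut
    for k, i, j in product(range(tiles), repeat=3):
        d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    pairs = [(i, j) for i in range(tiles) for j in range(i + 1, tiles)]
    return sum(d[i][j] for i, j in pairs) / len(pairs)
```

For a 4 × 4 mesh, comparing a corners-only placement against one that also uses a central tile makes the coverage difference of the two strategies directly measurable.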

Finally, in Fig. 10, we present the test results for randomly generated benchmarks with respect to the size of the topology onto which the benchmark is mapped. The topology sizes used in this set of experiments are 5 × 5, 6 × 6, 7 × 7, and 8 × 8. For some tests, the QP results are slightly worse than the SA results, which should never occur under normal circumstances: in those cases, QP did not finish execution before the eight-hour deadline expired, and the best solution found so far was accepted. This set of experiments also shows that mathematical optimization methods are not practical for large problem sizes, as they cannot generate optimal results in realistic running times.

Fig. 10

Test results for randomly generated benchmarks with respect to the topology size (in log scale)

The results demonstrate how the proposed SA-based method can generate even better solutions than the QP method in some cases for large-sized problems. We observe an overall reduction in communication cost of about 3% on average when the QP method cannot converge to the optimal solution within the allowed time limit.

5.3 Comparing SA-based method with previous related work

In this set of experiments, we compare our SA-based mapping method to other existing heuristic mapping methods proposed in the literature, namely CastNet [2], MOCA [13], and NMAP [11]. For a fair comparison, we use the experimental setup from [2]: six benchmarks are mapped onto a 4 × 4 mesh-based NoC topology, i.e., the number of wireless routers is zero, as these methods were not proposed for WiNoC topologies.

The results are presented in Table 8. Columns 2, 3, 4, and 5 present the total communication energy consumption results of the SA, CastNet, MOCA, and NMAP methods, respectively. Columns 6, 7, and 8 show the percentage decrease in the SA results compared to the other methods; the higher this percentage decrease, the better SA performs relative to the other methods. Table 8 shows that the proposed SA-based method outperforms all compared methods on all benchmarks, with improvements in overall energy consumption ranging from 2.47% to 18.91%.

Table 8 Communication energy cost results of SA, CastNet, MOCA, and NMAP

The observed variability in performance advantages can largely be attributed to the inherent characteristics of the Simulated Annealing-based method we employed. The SA method’s strength lies in its flexibility and capability to explore a vast solution space through probabilistic transitions, avoiding local optima that other allocation schemes might prematurely converge to. This global optimization approach allows the SA method to identify more efficient task placements by evaluating a broader range of potential configurations.

5.4 Execution time analysis

In this section, we discuss the complexity and execution time of our proposed methods. Table 9 presents the test execution times in seconds for all benchmarks with respect to the number of wireless routers in the topology.

Table 9 Execution times in seconds with respect to the number of wireless routers in the topology

From the execution time results given in the table, we observe that the proposed QP model generates optimal results very quickly for small problem sizes, making it the method of choice in such cases. However, as the mesh size increases, the execution time of the QP model starts growing exponentially. In the last column, we give the percentage decrease in execution time obtained by the SA-based method over the QP model. Even for these relatively small problem sizes, the SA-based method achieves about 50% faster execution times on average, which, coupled with the optimality of the generated solutions, makes the proposed metaheuristic method very efficient.

Table 10 presents the test execution times in seconds for all benchmarks with respect to the placement of four wireless routers in the topology.

Table 10 Benchmark test execution times with respect to the placement of wireless routers in the topology

From the execution time results given in the table, we observe a similar trend. The SA-based method outperformed the QP model in almost all test cases. On average, the SA method generated solutions more than 50% faster than QP for these relatively small problem sizes, for which QP obtained the optimal results within the given execution time limit of eight hours.

Furthermore, if we take N to represent the number of tasks in an application graph and R the number of tiles in the topology, then searching the entire solution space would require exploring \(R!/(R-N)!\) possible mappings (which reduces to \(N!\) when \(N = R\)). Hence, the advantage of metaheuristic methods over mathematical optimization-based methods grows significantly with the problem size. This trend is also evident in our experiments, as the largest topology for which the QP model was able to finish execution is the 5 × 5 mesh; other large-size experiments had to be stopped after the time limit expired or due to insufficient computer memory. In Fig. 11, we demonstrate how the QP model's execution time grows exponentially as the topology size increases, while the SA-based method's running time grows linearly.

Fig. 11

The execution times of the proposed methods as the mesh size increases

A mathematical programming-based model in general, and the proposed QP-based model in particular, is not a practically applicable solution because of its long execution times and high memory requirements, especially for large graphs. While the QP-based model offers accurate solutions quickly for small problem sizes, its scalability is constrained by the computational complexity of solving QP problems as the number of variables grows: the quadratic terms produce a more complex solution space that requires significantly more computational resources to navigate. As the problem size increases, this results in longer solution times and higher memory demands, making the QP-based approach infeasible for very large-scale applications. In contrast, the proposed SA-based method provides near-optimal results with significantly lower time and memory requirements, making it a more practical option for the problem at hand.

5.5 Experimenting with different wired-to-wireless communication cost ratios

In this section, we explore the impact of varying the communication cost ratios between wired and wireless links on the performance of hybrid WiNoC topologies. The experiments are based on a specific 4 × 4 mesh topology, as depicted in Fig. 4.

Table 11 compares the communication costs of different application task graphs (App. TG) on the fully wired topology WPN4R0M1, depicted in Fig. 4a, and on the hybrid wireless topology WPN4R3M1, which includes three manually placed wireless routers as depicted in Fig. 4b, under differing wired-to-wireless cost ratios (\(\varrho\)). These ratios range from 0.3 to 2, providing a comprehensive overview of how the cost ratio affects the system's overall communication energy.

Table 11 Communication cost results with varying wired-to-wireless communication cost ratios

The results demonstrate that within the specified topology (WPN4R3M1), incorporating wireless links at a cost ratio of \(\varrho = 0.3\) leads to a notable decrease in communication costs across all tested application task graphs, highlighting the energy efficiency benefits of integrating wireless links. However, as the cost ratio increases beyond 0.3, up to 2, substituting wired links with wireless ones ceases to offer any advantage for the applications analyzed. This shift indicates a preference for wired links over wireless ones as the cost of wireless communication rises, diminishing the energy efficiency benefits gained from wireless links at lower cost ratios.
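The break-even behavior can be seen from a per-pair comparison: a single-hop wireless shortcut of cost \(\varrho\) pays off only when it replaces a wired path of more than \(\varrho\) unit-cost hops. A minimal sketch of this comparison (our own simplification, ignoring routing and contention effects):

```python
def prefer_wireless(wired_hops, rho):
    """A single-hop wireless link of cost rho beats a wired path of
    wired_hops unit-cost hops only while rho < wired_hops."""
    return rho < wired_hops

def effective_cost(wired_hops, rho):
    """Cheapest of the wired path and the wireless shortcut for one pair."""
    return min(wired_hops, rho)
```

At \(\varrho = 0.3\) the shortcut wins for any pair more than one wired hop apart, whereas at \(\varrho = 2\) it only helps pairs at least three hops apart, consistent with the trend in Table 11.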

This observation underscores the critical balance between the cost-efficiency of wireless links and their strategic deployment within a WiNoC system. While the integration of wireless links can offer significant energy savings and performance benefits at lower cost ratios, the effectiveness of such an approach is heavily dependent on maintaining a cost ratio that justifies their use over traditional wired connections.

6 Conclusion

Network-on-Chip (NoC) offers a more efficient alternative to traditional communication methods by reducing latency and energy consumption. Wireless NoC (WiNoC) further addresses these challenges through single-hop wireless connections, enhancing scalability and energy efficiency. However, designing WiNoCs involves complexities such as application mapping and wireless router integration, which are NP-hard problems. This study introduces application mapping methods for hybrid WiNoC mesh topologies using quadratic programming (QP) and simulated annealing (SA). The QP-based approach achieves optimal solutions for smaller benchmarks, while the SA-based method provides optimal or near-optimal results for larger problem sizes, making it feasible for practical applications. Our methods take as inputs the application graph and the hybrid 2D WiNoC mesh topology, in which some routers communicate through wireless links, and generate an application mapping that minimizes the communication energy consumption of the application.

Our findings highlight the effectiveness of the SA-based method in handling larger problem sizes where the QP model is constrained by computational and memory limitations. Nonetheless, the results obtained from the QP model are crucial for testing the optimality of the solutions obtained by other (meta)heuristic methods. The results demonstrate the potential of metaheuristic approaches for application mapping in hybrid WiNoC architectures. Future research directions include optimizing application mapping across various WiNoC topologies, improving wireless router placement for enhanced communication efficiency, and exploring different strategies to enhance the practical applicability of mathematical optimization models. Additionally, we plan to investigate incorporating empirical data on wireless channel behavior and dynamic energy consumption to refine our energy models, further bridging the gap between theoretical models and practical applications.