Area and Energy Optimized QCA Based Shuffle-Exchange Network with Multicast and Broadcast Configuration

In large-scale processing systems, fast interconnection networks are employed between the processing modules and embedded systems. This study deals with the optimized design and implementation of a Switching Element (SE) that operates in four modes, accepting two inputs and delivering two outputs. The Shuffle-Exchange Network (SEN) can be used as a single-stage or a multi-stage network. The SEN is an interconnection architecture with unique input-output paths and a simple design. The SE acts as the building block of the Multi-stage Shuffle-Exchange Network (M-SEN), which can perform unicast and multicast operations on its inputs. An 8 × 8 M-SEN is also implemented, which works in three communication modes, termed "One-to-One", "One-to-Many" and "One-to-All". All QCA circuits have been implemented and simulated using the CAD tool QCADesigner. Compared with the reference M-SEN architecture, the proposed QCA-based M-SEN reduces the occupied area by 14.63 %, the average energy dissipation by 22.75 % and the cell count by 84 cells. Because the reduced cell count and area lead to lower energy dissipation, the design can be used in future-generation complex networks and communication systems.


Introduction
Quantum-dot Cellular Automata (QCA) is a pioneering technology that promises digital integrated circuits with high packing density, minimal energy dissipation and low delay. QCA offers numerous advantages, such as compact device area, high switching speed and minimal power dissipation. Owing to these advantages, nanoscale digital logic design with QCA is a growing research trend [1].
Following Moore's Law, the rapid scaling down of Complementary Metal-Oxide-Semiconductor (CMOS) devices over the coming decades will lead to various issues, because CMOS devices do not scale easily [2]. Transistor-based devices face difficulties such as thermal power dissipation, tunnelling and leakage current [3]. Leakage current causes power dissipation: as the supply voltage is scaled down, the threshold voltage must be lowered as well, which increases leakage. QCA, by contrast, offers smaller area and lower thermal energy consumption. Computing systems are built from numerous modules, specifically processing elements, that must communicate with each other.
The rapid evolution of new-age communication technology has made it possible to design and implement highly complex interconnection networks [4]. Among the components required to build a system, interconnection networks provide an attractive and cost-efficient solution for communication and interconnection.
As the number of operative modules in large-scale systems grows, a highly efficient communication technique is needed to support parallel and collaborative computing. This technique must integrate a robust on-chip communication architecture so that multiple tasks run efficiently, at maximum speed and at nominal cost. The focus of this work is to design a compact and energy-efficient Shuffle-Exchange Network with multicast and broadcast configurations.
Network architectures can be implemented in two ways. The first uses all the switches in a single stage and traverses the links repeatedly until the data reach the final destination; this is called a single-stage or recirculating structure [5]. The second sends the data from the source node to the destination node through a series of switching stages and inter-stage links; this is called a multi-stage network. Driven by progress in parallel computing and the growing requirements of high-performance computing, Multistage Interconnection Networks (MINs) are preferred.
In multiprocessor systems, MINs provide a reasonable solution for programmable data routes between functional modules. A MIN with simple routing and a unique path between each input and output is the Shuffle-Exchange Network (SEN).
The objective of this work is to propose and design an area- and energy-efficient 8 × 8 M-SEN. The M-SEN can transmit from one input port to multiple output ports, which is called multicast mode, and from one input port to all output ports simultaneously, which is called broadcast mode. Reducing the cell count to optimize the network also minimizes its area and energy dissipation. The CAD tool QCADesigner is used to design and simulate the proposed circuits.
The rest of the paper is organized as follows. Sec. 2 reviews the relevant literature.
Sec. 3 presents an overview of QCA technology and the SEN, the design theory of the proposed basic QCA switching module, the Switching Element (SE), and the implementation of the proposed QCA SE and M-SEN. The results of the implementation are presented in Sec. 4 and discussed in Sec. 5. Sec. 6 concludes the work.

Literature Review
Over the last few decades, CMOS has been the go-to technology for designing and implementing Very-Large-Scale Integration (VLSI) circuits. QCA is one of the candidate replacements for CMOS because its advantages address the constraints of CMOS technology. The low energy and high speed of QCA designs at the nanoscale give the technology wide scope in the near future.
MINs are preferred over crossbar networks for connecting processors and memory modules because of their low switching cost [4]. MINs also provide high bandwidth to the various units and low-latency access to memory modules. In [4], a QCA technique is developed to build a common Delta MIN architecture, and various implementations, namely the Baseline, Omega, Generalized Cube and Butterfly networks, are designed and analysed using QCADesigner. The results show that these structures can be conveniently realized with QCA cells and outperform other nanotechnology-based implementations.
Interconnection networks between on-chip embedded modules and processing elements provide the fast and reliable communication needed to perform the complex core functions of large-scale systems. The SEN is widely used as an interconnection network because of its simple configuration and low latency. In [5], the authors propose a 2 × 2 four-function SE in QCA nanotechnology. This SE is used to implement a nanoscale M-SEN with multicast and broadcast capability, making One-to-One, One-to-Many and One-to-All communication possible between the various units of a computing system. The architecture was evaluated on latency, cell count, cost function and occupied area, and was found to outperform other existing architectures.
QCA is a transistor-less nanotechnology in which electrons carry digital information and perform logical computations, and no current source is needed for these computations. The Banyan network is a multistage communication architecture in which the input signal is routed according to the receiver's port address. In [6], a novel single-layer Banyan network is proposed in QCA, in which the port address is identified as a string. The proposed 4 × 4 and 8 × 8 Banyan networks are built from a new single-layer 2 × 2 crossbar switch and are characterized by their fault tolerance. The simulation results show that both the crossbar switch and the Banyan network are notably efficient in area and latency.
The results in [7] point to the need for MINs in supercomputers as the number of hubs increases. Improving the reliability of MINs is essential for building ultra-reliable, fault-tolerant, cost-effective and dynamically reroutable substructures in supercomputers, and the SEN, a type of MIN, has characteristics that satisfy these constraints. A classification of SEN MINs is presented and compared with the regular SEN MIN on parameters such as path length, cost-effectiveness and fault tolerance. The study shows that using fewer stages and including non-SE components such as multiplexers and demultiplexers lowers the cost of the entire network, since these components do not introduce additional latency. Furthermore, a suitable SEN can be chosen according to the application and its requirements, keeping cost and accessibility in mind.
SENs are used to build interconnection structures because of their simple configuration and the size of their switching elements. A novel framework called the Replicated Enhanced Augmented SEN is presented in [8], in which an Enhanced Augmented SEN (E-ASEN) is replicated to create redundant routes and improve reliability. Terminal reliability analysis is used to measure the reliability of the SENs. The results show that increasing the number of routes through a replicated network gives the highest reliability, so the replicated E-ASEN outperforms the other SENs.
Distributed communication networks use a circuit-switched network to deliver the input signal to various users. In [9], the authors introduce a novel fault-tolerant circuit-switched module in QCA. To build this network, a new single-layer crossbar switch is demonstrated. The circuit-switched structure consists of a crossbar switch, a multiplexer, a demultiplexer and logic gates, and the latency and area of the architectures are analysed.
An analysis of the energy dissipated by the layouts shows that the proposed QCA schematics have the lowest energy dissipation. The network is shown to be fault-tolerant by analysing the communication that remains possible under stuck-at faults on the control signals. The results show that the layouts provide fast computation at the nanoscale.
QCA, also referred to as Quantum-Dot Cellular Automata (QDCA), is a cellular alternative to CMOS design. A novel five-input majority gate is proposed in [10]. The proposed design uses no rotated cells, has a single-layer structure and is robust. It reduces the number of cells by 7.69 % while maintaining efficiency comparable to the best similar designs published in the literature.
Quantum-dot cellular automata are an excellent and feasible alternative to CMOS technology. Latches and flip-flops are the basic components of digital circuits. The study in [11] presents a compact, low-energy JK flip-flop built in QCADesigner. Its experimental results show a reduction in cell count, which lowers circuit complexity, together with reductions in energy and area.
Reducing circuit area is the main design target in molecular quantum-dot cellular automata. When multiple functions can be performed by the same hardware, a single circuit that performs all of them is preferred [12]. Studies recommend quantum-dot cellular automata for nanoscale circuit construction using clock zones and multilayer crossovers [13].
Despite hurdles such as leakage current at the nanoscale, CMOS technology is already strained by the demand for nanoscale devices. An exclusive-OR gate with fewer cells has been proposed as a way to control the inversion of inputs [14]. The failure of individual cells in a quantum-dot cellular automata circuit can substantially affect the overall performance of the circuit, and QCA circuits may also suffer from manufacturing defects [15]. The designs in [16] introduce and calculate design parameters that can be used to assess the fault tolerance of QCA circuits. The crossbar switch is also the elementary component of certain multi-stage interconnection networks; using the crossbar switch proposed in [21], the baseline network was optimized to 1713 cells, with a latency of 20 ps and a QCA cost of 1156.
CMOS has been the go-to technology for VLSI circuit design and implementation since the 1960s, and owing to its advantages, QCA is one solution to the issues faced by CMOS technology. To connect processors and memory modules, MINs are generally used instead of crossbar networks because of their lower switching cost. Increasing the reliability of MINs is critical for ultra-reliable, fault-tolerant, cost-effective and dynamically reroutable substructures in supercomputers. The proposed work presents an efficient SE with multicast and broadcast capability for improved interconnection between on-chip embedded modules and processing elements.

The inverter and the Majority Voter (MV), also called the majority gate, are the two primary gates used in QCA. A majority gate can have three or five inputs, and by fixing some of its inputs it can be configured as an AND or an OR gate. The 3-input MV, presented in Fig. 2 [1], requires five cells, while the 5-input MV occupies ten cells. The 3-input MV gate is expressed as

$$M(P, Q, R) = PQ + QR + RP,$$

where P, Q and R are the inputs to the gate and M denotes the majority function of the three input variables.
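As a quick check of the majority expression, the following Python sketch evaluates M(P, Q, R) = PQ + QR + RP over all eight input combinations and confirms that it outputs 1 exactly when at least two inputs are 1. It works on logic values 0/1 only; the function name `majority` is illustrative.

```python
from itertools import product


def majority(p: int, q: int, r: int) -> int:
    """3-input majority M(P, Q, R) = PQ + QR + RP on logic values 0/1."""
    return (p & q) | (q & r) | (r & p)


# Truth table: the output is 1 exactly when at least two inputs are 1.
for p, q, r in product((0, 1), repeat=3):
    m = majority(p, q, r)
    assert m == int(p + q + r >= 2)
    print(p, q, r, "->", m)
```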

Methodology
The inputs can be modified to change the operation of the gate. If one input is set to logic '0', the 3-input majority gate acts as a 2-input AND gate; if one input is set to logic '1', it acts as a 2-input OR gate. To build complex devices, both cell placement and the synchronization of information are essential for producing correct outputs. A signal must not propagate through a logic gate before the other inputs of that gate have arrived, so the flow of data must be managed carefully to obtain the desired outputs. QCA technology manages this through clocking with four clock phases: switch, hold, release and relax. Each clock signal lags the preceding clock by 90 degrees [20].
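The AND/OR reduction can be checked exhaustively. In this minimal sketch, `and2` and `or2` are hypothetical helper names that simply call a majority function with one input fixed to logic '0' or '1', as described above.

```python
from itertools import product


def majority(p: int, q: int, r: int) -> int:
    """3-input majority gate on logic values 0/1."""
    return (p & q) | (q & r) | (r & p)


def and2(a: int, b: int) -> int:
    """2-input AND: majority gate with one input fixed to logic '0'."""
    return majority(a, b, 0)


def or2(a: int, b: int) -> int:
    """2-input OR: majority gate with one input fixed to logic '1'."""
    return majority(a, b, 1)


for a, b in product((0, 1), repeat=2):
    assert and2(a, b) == (a & b)
    assert or2(a, b) == (a | b)
    print(a, b, "AND:", and2(a, b), "OR:", or2(a, b))
```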
The clock phases used in QCA technology are illustrated in [17]. In the switch phase, the inter-dot barriers are raised and the actual computation takes place: the cell acquires a definite polarization under the influence of the polarization of its neighbouring cells. In the hold phase, the barriers stay high and the polarization remains unchanged. In the release phase, the barriers are lowered and the cells lose their polarization. In the relax phase, the cells remain unpolarized.
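The four-phase scheme can be pictured as a discrete schedule. The sketch below is a timing abstraction, not a physical simulation: it assumes clock zone k (numbered 0 to 3 as in QCADesigner) lags zone 0 by k quarter periods, and `phase_of` is a hypothetical helper. It checks that whenever one zone is in the hold phase, the next zone is switching, which is how data hands off from zone to zone.

```python
PHASES = ("Switch", "Hold", "Release", "Relax")


def phase_of(zone: int, quarter: int) -> str:
    """Phase of clock zone `zone` (0-3) at time `quarter`, in units of T/4.

    Each zone lags the preceding zone by 90 degrees (one quarter period).
    """
    return PHASES[(quarter - zone) % 4]


for t in range(4):
    print(f"t = {t}T/4:", [phase_of(z, t) for z in range(4)])

# Whenever zone k is holding its value, zone k+1 is switching,
# so zone k+1 is polarized by the data that zone k holds.
for t in range(8):
    for z in range(4):
        if phase_of(z, t) == "Hold":
            assert phase_of((z + 1) % 4, t) == "Switch"
```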

QCA Logic
Coulombic interactions occur between the electrons of cells placed in neighbouring positions. These Coulombic forces couple adjacent cells, so the polarization of one cell influences, and can change, the polarization of its neighbours. Using cells as building blocks, any elementary logic gate can be designed with ease. A QCA wire consists of cells arranged in a line, as depicted in [17], which propagates the same information from one end to the other. Because of the electron repulsion between neighbouring cells, the polarization of one cell dominates that of the adjoining cell and forces it into the same state, so the logic value travels along the wire. Cells can be joined at an angle of 90 degrees, with edges adjoining, or at 45 degrees, with vertices adjoining.
An inverter is built by positioning cells so that they couple through their vertices rather than their edges; the polarization is inverted by the Coulombic interaction between these diagonally coupled cells. Two kinds of inverter can be realised in QCA: the corner-cell inverter, which uses four QCA cells, and the robust inverter, which requires seven cells [17]. The displaced cell receives the inverted value of the preceding cell because of the Coulombic repulsion between electrons: when cells are placed at 45° to each other, the neighbouring cell intrinsically receives the inverted value of the data.
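The wire and inverter behaviour can be summarised at the logic level. This sketch is an abstraction, not a physical simulation: each cell is reduced to its polarization, +1 (logic 1) or −1 (logic 0); an edge-to-edge ("aligned") coupling copies the polarization and a diagonal ("offset") coupling inverts it. The function and link names are hypothetical.

```python
def propagate(p_in: int, links: list[str]) -> int:
    """Polarization after a chain of cell-to-cell couplings.

    'aligned': next cell placed edge to edge, so it copies the polarization
    'offset':  next cell displaced diagonally (inverter coupling), so it
               takes the opposite polarization
    """
    if p_in not in (-1, 1):
        raise ValueError("polarization must be +1 or -1")
    p = p_in
    for link in links:
        if link == "offset":
            p = -p
        elif link != "aligned":
            raise ValueError(f"unknown link type: {link}")
    return p


print(propagate(+1, ["aligned"] * 5))                   # QCA wire
print(propagate(+1, ["aligned", "offset", "aligned"]))  # inverter
print(propagate(-1, ["offset", "aligned", "offset"]))   # two inversions
```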

Shuffle-Exchange Network
The SEN is one of the most prominent and efficient interconnection networks. It is built from two routing functions, the shuffle and exchange permutations, which permute the port address of the transmitter, T, to obtain the address of the receiver, R. If the network has M input and M output ports, its size is given by M = 2^n, where each port address is a unique n-bit binary number. The perfect-shuffle permutation obtains the receiver address by rotating the sender address bits one place to the left. The sender address, receiver address and shuffle permutation in binary form are given in Eq. (1) and Eq. (2):

$$T = t_{n-1} t_{n-2} \cdots t_1 t_0, \qquad R = r_{n-1} r_{n-2} \cdots r_1 r_0 \tag{1}$$

$$R = \mathrm{Shuffle}(T) = t_{n-2} t_{n-3} \cdots t_0 t_{n-1} \tag{2}$$

The second routing function is the exchange permutation, which complements the least significant bit of the sender address T to obtain the receiver address R. It therefore connects pairs of network ports whose binary addresses differ only in the rightmost bit. The exchange permutation is expressed by Eq. (3) and Eq. (4):

$$R = \mathrm{Exchange}(T) = t_{n-1} t_{n-2} \cdots t_1 \bar{t}_0 \tag{3}$$

$$r_i = t_i \;\; (1 \le i \le n-1), \qquad r_0 = \bar{t}_0 \tag{4}$$

For M = 8, Fig. 3 depicts the perfect shuffle with dashed lines and the exchange routing with solid lines. The perfect-shuffle and exchange combinations can be used in a single-stage or a multi-stage structure to communicate between the various modules of a processing system [5]. The 8 × 8 single-stage configuration (S-SEN) is shown in Fig. 4. The S-SEN consists of one stage of four switching modules followed by a perfect-shuffle pattern of interconnecting links [5], and its outputs must be fed back to the corresponding inputs to complete the network.
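The two routing functions are easy to state as bit operations. This minimal sketch, for the M = 8 (n = 3) case of Fig. 3, implements the perfect shuffle as a one-bit left rotation and the exchange as complementing the least significant bit; the names `shuffle`, `exchange` and `N_BITS` are illustrative.

```python
N_BITS = 3
M = 1 << N_BITS  # M = 2**n ports


def shuffle(addr: int, n: int = N_BITS) -> int:
    """Perfect shuffle: rotate the n-bit address left by one position."""
    msb = (addr >> (n - 1)) & 1
    return ((addr << 1) & ((1 << n) - 1)) | msb


def exchange(addr: int) -> int:
    """Exchange: complement the least significant address bit."""
    return addr ^ 1


for t in range(M):
    print(f"T = {t:03b}  shuffle -> {shuffle(t):03b}  exchange -> {exchange(t):03b}")
```

Both functions are permutations of the eight port addresses, so every input is connected to exactly one output.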
Data are routed through the shuffle and exchange operations until they can be delivered to the desired network port. Because the performance of single-stage structures is limited, a sufficient number of S-SEN replicas are cascaded to form a feed-forward interconnection network. The result is the M-SEN, presented in Fig. 5.
In the M-SEN, the data no longer need to recirculate through the whole network; instead, they pass through the stages along a unique path from the input node to the destination node, determined by the settings of the switches. Therefore, optimizing the S-SEN and replicating it to build the M-SEN optimizes the larger circuit as a whole. An M × M M-SEN uses n = log₂ M columns (stages) of SEs to interconnect the M ports. Each stage consists of M/2 switching modules and is connected to the next stage through a perfect-shuffle pattern of links.
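The stage and switch counts follow directly from n = log₂ M and M/2 SEs per stage. A small sketch (the helper name `msen_dimensions` is hypothetical) gives 3 stages of 4 SEs, 12 SEs in total, for the 8 × 8 case, matching the switch labels S11 to S34 used in the Results.

```python
def msen_dimensions(m_ports: int) -> tuple[int, int, int]:
    """Return (stages, SEs per stage, total SEs) for an M x M M-SEN."""
    # M must be a power of two, M = 2**n
    if m_ports < 2 or m_ports & (m_ports - 1):
        raise ValueError("M must be a power of two and at least 2")
    n = m_ports.bit_length() - 1  # n = log2(M) stages (columns)
    per_stage = m_ports // 2      # each 2x2 SE serves two ports
    return n, per_stage, n * per_stage


for m in (4, 8, 16):
    stages, per_stage, total = msen_dimensions(m)
    print(f"{m}x{m}: {stages} stages x {per_stage} SEs = {total} SEs, "
          f"{stages - 1} perfect-shuffle link sets between stages")
```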

Design Theory -Switching Element
The Switching Element (SE) is a 2 × 2 switch that can interchange data between its inputs and outputs, with the two possible 2 × 2 settings depicted in Fig. 6 [5]. It accepts two inputs, produces two outputs and has a switching control through which the input signals are passed straight or exchanged under internal control. The two-function switch has two modes of operation, selected by the value of the internal control line: unicast-straight and unicast-exchange. The required setting is selected by the value of the control line C when M = '0'. The SE can also operate in broadcast-top and broadcast-bottom settings.
The other two possible 1 × 2 configurations of the SE are depicted in Fig. 6(c) and Fig. 6(d). These configurations give the SE broadcast capability, which distinguishes it from other SEs. An additional broadcast control line, M, selects between the 2 × 2 and 1 × 2 configurations. When the unicast-straight or unicast-exchange setting is used for interconnection, the unicast 'One-to-One' mode is enabled; when the broadcast-top or broadcast-bottom setting is used, the broadcast 'One-to-All' communication mode is enabled. The proposed SE can therefore transmit the inputs directly to the outputs, exchange them before transmission, or send one input to all output ports. The SE is realised using an XOR gate for mode selection and two multiplexers that perform the routing function. The circuit diagram of the basic SE is shown in Fig. 7. The input ports are I0 and I1, the output ports are O0 and O1, and the control signals are M and C. The proposed SE can function in four configurations, and all four cases are specified in [5].
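One plausible gate-level reading of "an XOR gate for mode selection and two multiplexers" is sketched below. It is a hypothetical model, not the authors' netlist: the assignment of C = 0 to broadcast-top and C = 1 to broadcast-bottom, and the wiring of the MUX inputs, are assumptions; Table 1 and Fig. 7 remain authoritative.

```python
def mux(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: returns a when sel == 0 and b when sel == 1."""
    return b if sel else a


def switching_element(i0: int, i1: int, m: int, c: int) -> tuple[int, int]:
    """Hypothetical SE model: one XOR gate and two 2:1 MUXes.

    m = 0 -> unicast   (c = 0 straight, c = 1 exchange)
    m = 1 -> broadcast (c = 0 top: I0 to both outputs,
                        c = 1 bottom: I1 to both outputs)
    """
    s0 = c          # select line of the MUX driving O0
    s1 = c ^ m      # select line of the MUX driving O1, from the XOR gate
    o0 = mux(s0, i0, i1)
    o1 = mux(s1, i1, i0)
    return o0, o1


MODES = {(0, 0): "unicast-straight", (0, 1): "unicast-exchange",
         (1, 0): "broadcast-top", (1, 1): "broadcast-bottom"}
for (m, c), name in MODES.items():
    # Distinct input values make the routing visible: I0 = 0, I1 = 1.
    print(f"M={m} C={c} {name:18s} -> (O0, O1) = {switching_element(0, 1, m, c)}")
```

With this wiring the XOR gate is what lets the two MUX select lines differ, which is exactly what the two broadcast settings require.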

Implementation of Proposed QCA Based SEN Architecture
The SE performs four different functions as the control signals M and C take their four possible combinations. The proposed SE consists of 76 cells in a single layer and occupies a layout area of 0.066 µm². The single-layer SE is shown in Fig. 8. The SE is a combinational circuit built from XOR and multiplexer logic. The XOR logic is designed using 6 QCA cells, and the MUX is designed using 6 QCA cells. The select input of the MUX is driven with the polarization values +1 and −1 to choose which input cell value is passed to the output. As the internal controls M and C are varied, the output follows the selected communication mode. Table 1 gives the routing table for all possible settings.

The circuits were built in QCADesigner 2.0.3 and simulated with both of its simulation engines. QCA is a computational technique that encodes binary data using cell-to-cell interactions and bistable charge configurations [18]. Throughout the simulation, the radius of effect is held constant at 65 nm, and the other parameters are likewise held constant. Table 2 lists the QCADesigner design parameters.

Table 2. QCADesigner simulation parameters
Parameter | Bistable approximation | Coherence vector
Cell width | 18 nm | 18 nm
Cell height | 18 nm | 18 nm
Dot diameter | 5 nm | 5 nm
Time step | 1 × 10⁻¹⁶ s | 1 × 10⁻¹⁶ s
Total simulation time | 7 × 10⁻¹¹ s | 7 × 10⁻¹¹ s
Clock high | 9.8 × 10⁻²² J | 9.8 × 10⁻²² J
Clock low | 3.8 × 10⁻²³ J | 3.8 × 10⁻²³ J
Clock amplitude factor | 2 | 2
Radius of effect | 65 nm | 65 nm
Layer separation | 11.5 nm | 11.5 nm
Maximum iterations per sample | 100 | NA

The QCA schematic of the proposed 8 × 8 M-SEN is shown in Fig. 9. The first stage, consisting of four SEs, accepts the inputs from external devices, and the remaining SEs are cascaded behind it. The input cells are placed in clock zone-0 and the output cells in clock zone-2, to account for the delay of the input in reaching the destination cell.
This M-SEN structure requires 1220 QCA cells and occupies an area of 1.178 µm².

Results
The switching element and the 8 × 8 M-SEN module were simulated with the CAD tool QCADesigner 2.0.3 to verify their operation. The SE consists of 76 QCA cells and occupies a total layout area of 0.066 µm² and a cell area of 0.0246 µm². The total layout area is the area actually occupied by the entire circuit, including the spacing between cells, whereas the cell area is the nominal area of the cells alone. The circuit uses four clock zones: clock zone-0, clock zone-1, clock zone-2 and clock zone-3. In QCADesigner, latency is measured as the delay of the output with respect to the input, expressed in clock cycles.
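The nominal cell area can be reproduced by simple arithmetic. Assuming square 18 nm × 18 nm cells (the cell dimensions used in the simulation setup), the cell area is the cell count times 324 nm²; this reproduces the 0.0246 µm² reported for the 76-cell SE and the 0.3953 µm² reported for the 1220-cell M-SEN, which suggests the reported cell areas are computed this way. The helper name is hypothetical.

```python
CELL_SIDE_NM = 18.0  # assumed QCA cell width and height in nm


def cell_area_um2(cell_count: int, side_nm: float = CELL_SIDE_NM) -> float:
    """Nominal cell area: number of cells times the area of one square cell."""
    return cell_count * side_nm * side_nm / 1e6  # 1 um^2 = 1e6 nm^2


print(f"SE (76 cells):      {cell_area_um2(76):.4f} um^2")
print(f"M-SEN (1220 cells): {cell_area_um2(1220):.4f} um^2")
```

The layout area is larger than this nominal figure because it also counts the spacing between cells.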
QCA Cost combines latency and area into an overall estimate of circuit performance. It is expressed in Eq. (5):

$$\text{QCA Cost} = \text{Area} \times \text{Latency}^2 \tag{5}$$

The design results of the four-mode switching element are given in Tab. 3. As Tab. 3 shows, the proposed SE has a lower cell count, which reduces its energy dissipation. The total and average energy dissipation of the SE are estimated with the CAD tool QCADesigner-E. Compared with the architectures in [4], [9] and [5], the proposed SE reduces the cell count by 51.59 %, 38.21 % and 6.17 %, and the cell area by 90.16 %, 82.04 % and 6.11 %, respectively.

The 76-cell SE is the basic building block of the entire 8 × 8 M-SEN structure, which also uses four clock zones. When all switches are set to the unicast-straight configuration, the data applied to input ports I0 to I7 appear at O0, O4, O1, O5, O2, O6, O3 and O7, respectively, demonstrating the 'One-to-One' mode of communication. When all switches are set to the unicast-exchange configuration, the data applied to I0 to I7 appear at O7, O3, O6, O2, O5, O1, O4 and O0, respectively, which is also 'One-to-One' communication. When switch S11 operates in unicast-exchange mode and switches S22, S33 and S34 operate in broadcast-top mode, input I0 appears at output ports O4, O5, O6 and O7, demonstrating the 'One-to-Many' mode of communication.
When switches S11, S21, S22, S31, S32, S33 and S34 operate in broadcast-top mode, input I0 appears at all output ports O0 to O7, demonstrating the 'One-to-All' mode of communication. The 'unicast-exchange', 'broadcast-top' and 'broadcast-bottom' configurations have also been simulated and verified. The output arrives after 4.5 clock cycles, so the latency of the circuit is 4.5. The 8 × 8 M-SEN requires 1220 QCA cells and occupies a layout area of 1.178 µm² and a cell area of 0.3953 µm².
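The port mappings above can be reproduced with a small behavioural model of the 8 × 8 M-SEN. This is a sketch under stated assumptions, not the authors' layout: switch Sxy is taken to be the y-th SE of stage x, serving ports 2(y−1) and 2(y−1)+1; perfect-shuffle links are placed only between consecutive stages; and "broadcast-top" is taken to copy the upper input to both outputs. These choices are assumptions made because they reproduce all four mappings reported in this section; Fig. 5 and Fig. 9 remain the authoritative description. All names are hypothetical.

```python
N_BITS = 3       # n = log2(M)
M = 1 << N_BITS  # 8 ports, so 3 stages of M // 2 = 4 SEs


def shuffle(port: int) -> int:
    """Perfect shuffle: rotate the n-bit port address left by one bit."""
    return ((port << 1) & (M - 1)) | (port >> (N_BITS - 1))


def se_outputs(port: int, mode: str) -> list[int]:
    """Output ports reached from input `port` of a 2x2 SE set to `mode`."""
    upper, lower = port & ~1, port | 1
    is_upper = port == upper
    if mode == "straight":
        return [port]
    if mode == "exchange":
        return [port ^ 1]
    if mode == "top":     # upper input copied to both outputs
        return [upper, lower] if is_upper else []
    if mode == "bottom":  # lower input copied to both outputs
        return [upper, lower] if not is_upper else []
    raise ValueError(f"unknown mode: {mode}")


def route(source: int, modes: list[list[str]]) -> list[int]:
    """Outputs reached by `source`; modes[s][k] is the mode of switch S(s+1)(k+1)."""
    ports = {source}
    for stage, stage_modes in enumerate(modes):
        ports = {q for p in ports for q in se_outputs(p, stage_modes[p >> 1])}
        if stage < len(modes) - 1:  # shuffle links only between stages
            ports = {shuffle(p) for p in ports}
    return sorted(ports)


def uniform(mode: str) -> list[list[str]]:
    """Every switch of every stage set to the same mode."""
    return [[mode] * (M // 2) for _ in range(N_BITS)]


straight = [route(i, uniform("straight"))[0] for i in range(M)]
exchange = [route(i, uniform("exchange"))[0] for i in range(M)]

one_to_many = uniform("straight")
one_to_many[0][0] = "exchange"                                      # S11
one_to_many[1][1] = one_to_many[2][2] = one_to_many[2][3] = "top"  # S22, S33, S34

one_to_all = uniform("straight")
for s, k in [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (2, 3)]:
    one_to_all[s][k] = "top"                                        # S11, S21, S22, S31-S34

print("straight:", straight)
print("exchange:", exchange)
print("I0, One-to-Many:", route(0, one_to_many))
print("I0, One-to-All:", route(0, one_to_all))
```

Under these assumptions, I0 takes three stages and two shuffle hops to reach any output, consistent with n = log₂ 8 = 3 stages.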

Discussion
The design results of the M-SEN configuration, analysed with QCADesigner 2.0.3, are listed in Tab. 4. Compared with the architectures in [4], [6] and [5], the proposed M-SEN reduces the cell count by 53.38 %, 21.23 % and 6.44 %, respectively. The proposed architecture also requires less total area, with reductions of 70.11 %, 38.96 % and 14.63 % compared with the architectures in [4], [6] and [5], respectively, and its QCA cost is lower by 70.11 %, 80.69 % and 14.66 %, respectively. The reference architectures were also implemented in QCADesigner and their energy values were calculated with QCADesigner-E, which reports the total and average energy dissipation of a circuit [19].
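The percentage reductions are relative to the reference design. As a worked check, the abstract states that the proposed M-SEN uses 84 fewer cells than the reference M-SEN [5]; with 1220 cells for the proposed design, the implied reference count is 1304 cells (derived here, not quoted from Tab. 4), giving the 6.44 % cell-count reduction listed above. The helper name is hypothetical.

```python
def percent_reduction(reference: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return (reference - proposed) / reference * 100.0


proposed_cells = 1220                  # proposed 8x8 M-SEN
reference_cells = proposed_cells + 84  # reference M-SEN [5]: 84 cells more
print(f"reference: {reference_cells} cells")
print(f"cell-count reduction: {percent_reduction(reference_cells, proposed_cells):.2f} %")
```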
The total and average energy dissipation of the SE are listed in Tab. 5, measured in eV. Compared with the SE in [5], the proposed SE reduces total energy dissipation by 9.93 % and average energy dissipation by 9.48 %. Because the SE is replicated throughout the M-SEN, this saving carries over to the full network. The total and average energy dissipation of the M-SEN are listed in Tab. 6: compared with [5], the proposed 8 × 8 M-SEN reduces total energy dissipation by 23.17 % and average energy dissipation by 22.75 %. The proposed 8 × 8 M-SEN architecture is therefore more efficient in terms of cell count, area, cost, total energy dissipation and average energy dissipation.
The proposed 8 × 8 M-SEN uses a single-layer design, which can be considered a limitation of the network. Multi-layer designs are a subject of future research and might yield better results.

Conclusion and Future Research
This paper demonstrates the implementation of a single-stage and an 8 × 8 multi-stage perfect shuffle-exchange network, simulated with the CAD tool QCADesigner 2.0.3. The proposed architecture is built from an SE with four operational modes, and SEs connected by perfect-shuffle links are cascaded to form the multi-stage network. Optimizing the basic SE, which is replicated to construct the M-SEN, further optimizes the larger circuit.
The multi-stage network performs unicast ('One-to-One'), multicast ('One-to-Many') and broadcast ('One-to-All') communication. Circuit parameters such as cell area, cell count, latency, clock zones and QCA cost are measured using QCADesigner. The proposed switching element uses fewer cells than the other architectures. Compared with the reference architecture, the proposed 8 × 8 M-SEN configuration reduces area by 14.63 %, cost by 14.66 %, total energy dissipation by 23.17 % and average energy dissipation by 22.75 %.
The SE and M-SEN may be further optimized using multilayer concepts, which might yield better results. The proposed optimized switching modules, in single-stage and multi-stage form, can act as functional blocks in future-generation architectures and complex communication systems, thereby improving their performance.

Author Contributions
B.S.P. conceptualized the idea, methodology, design and analysis. S.H.M. and K.J.N. carried out the design methodology and implementation. All authors contributed to the analysis of the design and results and provided inputs to the manuscript. All authors were involved in analysing, drafting, editing and revising the paper.