PCOI: Packet Classification-Based Optical Interconnect for Data Centre Networks

To support cloud services, Data Centre Networks (DCNs) are built with many servers and network devices, which increases their routing complexity and energy consumption. The introduction of optical technology in DCNs brings several benefits related to routing control and energy efficiency. This paper presents a novel Packet Classification-based Optical Interconnect (PCOI) architecture for DCNs, which simplifies the routing process by classifying packets at the sender rack and reduces energy consumption by using passive optical components. The architecture brings some key benefits to optical interconnects in DCNs, which include (i) routing simplicity, (ii) reduced energy consumption, (iii) scalability to large port count, (iv) packet loss avoidance, and (v) all-to-one communication support. The packets are classified based on destination rack and are arranged in the input queues. This paper presents the input and output queuing analysis of the PCOI architecture in terms of mathematical analysis, a TCP simulation in NS2, and a physical layer analysis conducted by simulation in OptiSystem. Packet loss in the PCOI is avoided by adopting the input and output queuing model. The output queue of the PCOI architecture represents an M/D/32 queue. The simulation results show that PCOI achieves a significant improvement in throughput and a low end-to-end delay. The eye-diagram results show that a good quality optical signal is received at the output, with a very low Bit Error Rate (BER).


Introduction
DCNs are indispensable entities that enable many of today's services, such as social networking, search engines, and e-mail. DCNs should be able to satisfy the Quality of Service (QoS) requirements of a huge number of customers in companies like Microsoft, Google, Yahoo, eBay, and IBM. These companies have data centres with at least 50,000 nodes in a single data centre. Such a massive scale infrastructure needs energy to work. In 2010, the energy consumed by data centres was 1.5 percent of global power consumption. In 2012, the energy consumed by data centres was 120 billion kilowatts [1]. Apart from energy consumption, the annual network traffic of DCNs is also growing. According to Cisco, DCN traffic is growing at a compound rate of 25 percent up to 2019, reaching 10 zettabytes (ZB) per year [2]. The data centre traffic is now measured in ZB, and by 2021 more than 95 percent of the total traffic will originate or terminate in DCNs. The major concern in designing a DCN is the reduction of its power consumption, which impacts the operational expenditure (OPEX). Optical switch technologies are the choice of the future [3].
Such challenges of growing traffic demand and energy consumption can be addressed by the introduction of optical interconnects. The optical interconnects in DCNs are able to provide ultrahigh transmission bandwidth in an energy- and cost-efficient way. There are some obvious benefits of using optical interconnects in DCNs, such as large port count, long reach communication, ability to reconfigure, large extinction ratio, and dealing with traffic heterogeneity. Optical DCNs have many communicating nodes, so they have large arrays of similar optical components. Optical components used in optical data centres follow fixed physical laws related to light. Interaction between optical components depends upon the changes in the properties of light brought about by the optical components. Optical switching is a key component of today's high-performance communication networks. An optical interconnect is a box with N inputs and N outputs. At any time, the internal interconnection of optical components establishes paths from the inputs to the outputs. Congestion can occur when the number of source nodes trying to access a destination node exceeds the capacity of the destination node, which makes queuing unavoidable. Queuing can be done on the input side, the output side, or both [4]. The signal degradation due to optical components can increase the Bit Error Rate (BER) in optical interconnects. The signal degradation can be measured by analysing the eye diagram of the signal. The eye diagram is created by superimposing the zeros and ones of the optical digital data stream. The eye diagram of the optical signal can give information about jitter, noise, signal amplitude, duty cycle distortion, fall time, and bit period. Different types of signal degradation include reduced amplitude, changed shape, introduction of noise, and change in the fall time or rise time of the pulse. Different types of optical components are responsible for such signal degradations.
Optical components used in optical data centres include couplers, Directly Modulated Lasers (DML), Vertical Cavity Surface Emitting Lasers (VCSEL), Fiber Delay Lines (FDL), Fiber Bragg Gratings (FBG), and Semiconductor Optical Amplifiers (SOA). Couplers cause a reduction in amplitude, DMLs and VCSELs cause a change in the shape of the signal, FDLs cause delay in bits, FBGs cause addition of noise, and SOAs introduce spikes in the optical signal. The signal degradation can be compensated using various methods. Reduction in amplitude is compensated by using amplifiers or by increasing the amplitude of the transmitted signal. Optical amplifiers can reduce the BER by boosting the amplitude of a weak optical signal [5]. Use of a DML reduces the cost of the communication system, as it eliminates the need for a separate modulator. However, DMLs introduce nonlinear changes in the shape of the optical signal, which need to be compensated. These changes can be compensated by using predistortion circuits [6]. Changes in the shape of the optical signal are also introduced by VCSELs. Feedforward equalization is used to compensate for these changes [7].
When SOAs are used in optical DCNs, they cause spikes at the start of a bit. These spikes gradually reduce with time, depending upon the transition time of the SOA. Such distortion can be compensated by increasing the length of the bit period, such that it is greater than the transition time of the SOA. This results in a reduction of the supported bit rate of the system. The timing jitter caused by FDL can be mitigated by an equalizer adaptation algorithm, which works bitwise to detect the amount of time shift [8]. Uniform noise introduced by FBG can be mitigated by either amplifying the optical signal or increasing the transmitter power. The objective of this paper is to propose a novel architecture which presents various benefits over existing architectures and to investigate its performance on the basis of the following techniques:
(1) Mathematical analysis of the input and output queuing
(2) Simulation of the TCP protocol
(3) Eye-diagram analysis to measure the signal degradation and BER at the physical layer
In the rest of the paper, Section 2 describes related work on various optical architectures, Section 3 describes the PCOI architecture, Section 4 describes the system model for input and output queue analysis, Section 5 describes the input and output queuing analysis of the PCOI architecture, Section 6 describes the TCP simulation of the PCOI architecture, Section 7 presents the physical layer analysis in terms of eye diagram and Bit Error Rate (BER), and Section 8 presents the conclusions and future work.

Related Work
Various architectures have been proposed in the literature which address various problems of optical networks and exploit the benefits of optical components. These architectures can be classified into the following categories: (a) reconfigurable architectures, (b) low latency architectures, (c) low blocking probability architectures, (d) low power consumption architectures, (e) scaling link bandwidth architectures, and (f) high radix architectures.

Reconfigurable Architectures.
These architectures have the ability to change the network topology in response to changes in traffic patterns. Reconfigurable architectures in the literature include [9][10][11], as explained subsequently.
In [9], the architecture presented is based on Wavelength Selective Switch (WSS) and MEMS switch. Every server has a unique wavelength, which is multiplexed together according to destination rack. WSS groups the wavelengths based on destination racks and sends groups of wavelengths to MEMS ports. The topology manager is responsible for configuring the WSS and MEMS switch for proper function.
In [10], a reconfigurable architecture based on Arrayed Waveguide Grating Router (AWGR) and Tunable Wavelength Converter (TWC) is presented. The header is extracted from each packet and sent to the control plane. The payload of the packet waits in the FDL until the control unit decides which wavelength the TWC should set for the packet. Based on the change in wavelength, the AWGR routes the packet to the proper destination port. The architecture in [11] is made up of MEMS switches. The control unit is responsible for the configuration of the MEMS switches. As MEMS switches are slow to configure, this architecture is more suitable for circuit-switched applications than for packet-switched applications. The architecture in [12] uses Software Defined Network (SDN) with optical switching. The building block is a pod, which hosts several racks. The ToRs are connected using a star topology. The switching within the same pod is performed passively using optical filtering. The network can be scaled by using pods in a ring topology. Each ring has WDM traffic, and add/drop multiplexing to and from the ring is performed on a per-wavelength basis. The data plane operates in TDMA, where time slots are accessed for rack-to-rack communication.

Low Latency Architectures.
Low latency architectures are those with distributed control; they include [10,13].
In [10], a broadcast-select Spanke-type architecture is presented which minimizes the control decisions in the network to reduce the latency. The packets from source nodes are broadcast to all destination nodes. At the destination node, Wavelength Selector (WS) selects the appropriate destination node. This architecture is scalable without affecting the latency.
In [13], the network is flattened into three stages. The first stage is called input module (IM), the second stage is called central module (CM), and the third stage is called output module (OM). Each module is made up of AWGR and a scheduler. The scheduler at each stage configures the wavelength of flows to be directed to the appropriate output port of the AWGR. This is also scalable without affecting the latency of the system.

Low Blocking Probability Architecture.
These are architectures in which a high proportion of the nodes transmitting at the same time do so successfully. These architectures include [14][15][16]. The architecture in [14] is a three-stage network. The first and third stages are based on AWGR, whereas the second stage is a time buffer. Collisions are avoided by using the time buffers. The packets wait in the second stage for contention to be resolved in the third stage.
In [15], contention is avoided by using space-wavelength multiplexing and a broadcast-select scheme. The select unit has two functions: first to select the correct spatial group and second to select the correct wavelength. The select units are made up of SOAs. The central scheduler controls the SOA gates. There are two receivers on each destination node, which further reduces the blocking probability.
In [16], the Reflective SOA (RSOA) behaves as the mutex element. In case of contention, when multiple input ports are trying to access the same destination port, the RSOA grants access to only one input port to transmit. The sender node only starts to transmit data if a positive acknowledgment is given by the RSOA. The acknowledgment of the RSOA is made up of reflected optical power. When multiple sender nodes are trying to send to the same output port, the reflected power of the RSOA drops.

Low Power Consumption Architectures.
The architectures with low energy consumption are [17,18]. The architecture in [17] has highly distributed control, which makes it scalable to large port counts. An electronic buffer is implemented on each node, which has multiple queues, and each queue has a distinct wavelength. Multiple flows can arrive at a single output port, and the resulting contention can be partially resolved by the Wavelength Selector (WS). When packets from different queues of the same node go to a single destination node, the contention cannot be resolved and only one flow is forwarded. However, when multiple flows from different nodes arrive at the destination, such contention is resolved by the WS by changing its wavelength.
In [19], the architecture is made up of clusters, boards, and nodes. Each board has multiple nodes and multiple boards are connected using optical wavelength multiplexing. Vertical Cavity Surface Emitting Lasers (VCSELs) are used to eliminate the need for external modulator, which conserves energy.
In [18], the two main optical communicating devices are AWGR and Microring Resonators (MRR). AWGRs are passive optical devices, which do not need any external energy input. MRRs are also very low power and high bandwidth devices. This architecture is entirely made up of low energy consumption devices, which save energy.
In [20], an architecture is presented which minimizes energy consumption by using a combination of optical cross-connects and WDM rings. Optical circuit switches provide dedicated nonblocking circuits. WDM rings are interrack switching elements. The architecture in [21] uses space and wavelength multiplexing. The architecture is divided into cards and each card has multiple nodes. Each node has a distinct wavelength. It uses multiple level addressing. The card is selected in the space domain using couplers, and the port is selected in the wavelength domain by its wavelength.

Scaling Link Bandwidth Architectures.
These architectures include [9,19,22]. In [19], Vertical Cavity Surface Emitting Lasers (VCSELs) are used, which behave as both the source of light and the modulator. VCSELs eliminate the need for external modulators in the transmitters. The unused VCSELs are shared among transmitters to increase their transmitting bandwidth, which makes this architecture a scaling bandwidth architecture.
In [23], the architecture uses the ring topology and wavelength multiplexing to interconnect the nodes. The add/drop multiplexer and Wavelength Selective Switch (WSS) are used in the ring to select the wavelength for the destination node and to add the wavelength on which the sender node sends data. This architecture is not very scalable.
In [24], an architecture is presented in which AWGRs are used but the interconnection between racks is based on passive optical components. A mixed integer linear programming model is used for wavelength assignment, which ensures that a single wavelength is assigned between two server groups. The directionality of the AWGR is ensured so that flows are always directed from input ports to output ports. The work in [25] presents an architecture which uses optics and commodity switches. The backplane is a switchless core made up of bus-based fiber rings. The architecture is divided into sectors and each sector is made up of ToR switches and interconnection pods. Within a sector, electronic switching is performed. The absence of switches implies full bisection bandwidth for single-hop communication.

High Radix Architectures.
High port count in the architectures is achieved if the network performance is not affected by increasing the number of nodes in the network.
In [26], a distributed and scalable optical packet switch architecture is presented. It is based on Arrayed Waveguide Grating Router (AWGR) and SOA. A single AWGR can connect multiple ToRs. It has a modular structure and nonblocking nature of AWGR makes it scalable. It shows lower latency and low blocking probability.
In [9], an architecture is presented which can achieve dynamic configuration of link bandwidth. For overall connectivity, a Microelectromechanical System (MEMS) switch is used, which uses micromirrors to deflect the light to the output port. Wavelengths are grouped using a Wavelength Selective Switch and directed to the respective destination racks. The architecture in [10] presents a reconfigurable architecture, which is based on AWGR. A wavelength conversion mechanism is used at the input port to route the light paths to the respective destination ports. Tunable Wavelength Converters (TWC) are used for this purpose to change the wavelength of the light signal. The labels are processed to extract the destination address, which is used to change the wavelength of the input signal. The architecture in [13] flattens the network, which reduces the number of hops in the network. It is a multistage architecture, which is achieved using three modules: the input module, central module, and output module. Each module is composed of TWC and AWGR. Wavelength switching is used in each module to direct the light path to the appropriate input port of the next module. The PCOI architecture is proposed to minimize the routing complexity by exploiting packet classification and queuing. It further reduces energy consumption by using passive optical components. The next section describes the PCOI architecture briefly.

The PCOI Architecture
The proposed architecture, which avoids contention under various traffic patterns, is shown in Figure 1. The nodes are arranged in the form of racks. Each rack is assigned a unique wavelength. This architecture exploits the benefits of input and output queuing, wavelength multiplexing, and space multiplexing. Each rack has a classify module. This module classifies the packets on the basis of destination rack and puts them in the respective queues. The packets from the queues are converted into the optical domain by using Electrical-to-Optical (E/O) conversion. The optical flows from a single rack have a unique wavelength, thus requiring as many wavelengths as the number of nodes. The flows to a single destination rack from multiple sender racks have different wavelengths, which allows data to be received simultaneously from multiple sender nodes, making all-to-one communication possible. An optical power combiner at the receiver rack collects the flows of various wavelengths, which are later demultiplexed and given to the 1 × N switch at the receiver rack. The optical power combiner is used because of its passive nature and low cost; it can combine optical signals of various frequencies without the need for an external source. The packets of each source rack are kept separate using different wavelengths. The demultiplexed wavelengths contain the packets of the respective source racks. Out-of-band signalling is used to reduce the header extraction time. At the destination rack, the packets of each wavelength go through a 1 × N switch, which is shown in Figure 2. The switch is designed so that, after header detection, a packet goes to its destination node, where it is collected by the output queue in case of contention. The 1 × N switch in Figure 2 is a high port count optical switch. The header information and payload of the packet are separated. The header is sent to the control unit, whereas the payload is sent to the SOA after some delay, which is equal to the processing time of the control unit. The delay is provided by the FDL, which is placed before the SOA.
The payload keeps traveling in the FDL until the control unit decides to turn on the respective SOA of the destination port. The rest of the SOAs are turned off by the control unit. As a result, the payload only travels to the destination port of the switch, as determined by the destination address in the header.
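The classify step described in this section can be illustrated with a minimal sketch. It assumes one FIFO queue per destination rack and one fixed wavelength per sender rack; the names `Packet`, `RackClassifier`, and `transmit_heads`, and the wavelength value in the usage note, are hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Packet:
    src_rack: int
    dst_rack: int
    dst_node: int
    payload: bytes = b""


class RackClassifier:
    """Hypothetical classify module of one sender rack (illustration only)."""

    def __init__(self, rack_id, wavelength_nm):
        self.rack_id = rack_id
        # Assumption: every flow leaving this rack uses the rack's own wavelength.
        self.wavelength_nm = wavelength_nm
        # One FIFO input queue per destination rack.
        self.queues = defaultdict(deque)

    def classify(self, packet):
        """Place the packet in the queue of its destination rack."""
        self.queues[packet.dst_rack].append(packet)

    def transmit_heads(self):
        """Pop the head packet of every non-empty queue.

        Returns (dst_rack, wavelength_nm, packet) tuples, ordered by
        destination rack, standing in for the E/O conversion step.
        """
        sent = []
        for dst_rack in sorted(self.queues):
            queue = self.queues[dst_rack]
            if queue:
                sent.append((dst_rack, self.wavelength_nm, queue.popleft()))
        return sent
```

For example, a classifier created with `RackClassifier(rack_id=0, wavelength_nm=1550.0)` that receives two packets for rack 2 and one for rack 3 holds two queues, and one call to `transmit_heads()` releases one packet per destination rack, all on the same wavelength.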

System Model
The PCOI architecture represents a nonblocking switch, as it involves a self-routing mechanism in which a packet finds its path to the desired output port. This is shown in Figure 3. The transmitter and receiver nodes represent the input and output queue system. The Head-of-Line (HOL) blocking problem arises when multiple source nodes try to send to the same destination node, which stops the transmission of packets from the head of the input queues that are sending to that destination. This is shown in Figure 4. N input and output ports are assumed, and transmission is synchronous, which means that packets are sent from the input ports to different output ports at the same time. When the buffer of the output port is full, it sends a backpressure signal to the respective input queue. After receiving the backpressure signal, the input queue stops transmitting packets to that output port. The packets of different input queues that are destined for the same output buffer are made to wait in the input queue. The backpressure mechanism prevents overflow in the output buffer.
The effect of backpressure on the performance of the optical interconnect for different buffer sizes is analysed. The number of ports is assumed to be large, that is, N ⟶ ∞. The PCOI architecture achieves speed-up at the output port by receiving multiple flows using wavelength, code, and space multiplexing. This results in significant improvements in delay reduction and throughput enhancement. The throughput enhancement is evident from Figure 4, where each destination port has 32 servers in the queuing model. It can process 32 streams at the same time, which increases the throughput and reduces the queuing delay.
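The backpressure rule of the system model can be sketched as a slotted simulation step. This is a simplified illustration under stated assumptions, not the paper's model: each input forwards at most its head-of-line packet per slot, an output buffer holds at most `b` packets, and `served_per_slot` (a hypothetical parameter) is the number of packets an output drains per slot.

```python
from collections import deque


def run_slot(input_queues, output_buffers, b, served_per_slot=1):
    """Advance the switch by one time slot and return the number of blocked inputs.

    input_queues: list of deques holding destination port indices (head at left).
    output_buffers: list of ints, current occupancy of each output buffer.
    b: output buffer size; a full buffer asserts backpressure.
    """
    blocked = 0
    for queue in input_queues:
        if not queue:
            continue
        dst = queue[0]
        if output_buffers[dst] < b:
            # Room available: the head-of-line packet crosses the switch.
            queue.popleft()
            output_buffers[dst] += 1
        else:
            # Backpressure: the packet waits at the head of its input queue.
            blocked += 1
    # Each output port serves up to served_per_slot packets in this slot.
    for port in range(len(output_buffers)):
        output_buffers[port] = max(0, output_buffers[port] - served_per_slot)
    return blocked
```

Because a packet is admitted only when the occupancy is below `b`, the output buffer never overflows and no packet is dropped; excess packets are delayed in the input queues instead.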

The Input and Output Queue Analysis of PCOI Architecture
The PCOI architecture behaves as an N × N switch. The packets are assumed to be of fixed length and fixed transmission time. The traffic is randomly distributed, which means that a packet from an input port can be sent to any of the N output ports with equal probability, which is 1/N. The output queue operates on a First Come First Serve (FCFS) basis. If the output port is idle, the packets go through the PCOI architecture directly. The performance of the PCOI architecture is affected by the output buffer size "b." The arrival process is Bernoulli, as the probability of arrival in a time slot defines the load. The efficiency of the PCOI architecture is measured based on maximum throughput and average delay. The packet delay is made up of three components: (i) waiting time in the input queue until the packet reaches the head, (ii) waiting time at the head of the input queue due to HOL blocking, and (iii) waiting time at the output queue due to contention for the output port. The traffic of an input port is on average "p" packets per unit time. Packets of the input ports are independent of each other.
Mathematical Problems in Engineering
There is a possibility that more than "b" packets are trying to access the same output port. The output port of the PCOI architecture can receive 32 multiplexed streams of packets. The waiting time at the input buffer until it begins the transmission to the output port represents an M/D/32 system. The M/D/32 queue can be treated as an M/D/1 queue by scaling the basic unit of time; the p of the M/D/1 queue becomes 32p for the M/D/32 queue [27]. The analysis is based on [28], which represents an M/D/1 queue. It is modified for the PCOI architecture by replacing p with 32p. Closed form expressions for the average delay and maximum throughput are derived. Thus, the total average delay, which is shown in Figure 5, is given by
[equation missing]
where Q and Q^2 are the first and second moments of the number of waiting customers of the M/D/c queue and p_j are the steady-state probabilities given by
[equation missing]
The corresponding p.g.f. for the steady-state probabilities is
[equation missing]
The probability that a packet arrives at the head of the input queue and experiences delay due to backpressure, which is shown in Figure 6, is given by
[equation missing]
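For orientation only, the textbook mean waiting time of an M/D/1 queue is recalled below. It is the standard Pollaczek-Khinchine result for deterministic service, not the closed-form expression derived from [28]. The symbols \lambda (arrival rate), D (fixed service time), \rho (utilisation), and W_q (mean time spent waiting in the queue) are notation introduced here and are not the paper's.

```latex
% Mean waiting time in queue for M/D/1 (Pollaczek-Khinchine, deterministic service)
\rho = \lambda D, \qquad
W_q = \frac{\rho D}{2\,(1-\rho)}, \qquad 0 \le \rho < 1 .
% With the slot length as the time unit (D = 1):
W_q = \frac{\rho}{2\,(1-\rho)} .
```

The expression shows the general behaviour expected of the output queue: the waiting time is close to zero at light load and grows without bound as the utilisation approaches 1.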

Simulation
The performance of the PCOI architecture is measured for different output queue sizes using TCP. The simulation setup consists of a 512 × 512 network. The network is implemented in NS2, an open-source discrete event simulator developed for research and educational use [29]. For the many-to-many communication pattern, each node transmits to a randomly chosen destination node. The random destination node address is generated according to a uniform random variable. Figure 7 shows the number of sender nodes trying to access the destination nodes. This number fluctuates between 1 and 5. The midpoint of this range is 3, so for the many-to-many communication pattern, it is assumed that 3 nodes are trying to access a destination node.
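The many-to-many pattern can be reproduced outside NS2 with a short sketch that draws one uniformly random destination per sender and counts how many senders target each destination. The function name, the seed, and the choice to allow a node to pick itself are assumptions made for illustration; the paper does not specify them.

```python
import random
from collections import Counter


def senders_per_destination(num_nodes=512, seed=0):
    """Count senders per destination when every node picks a uniform random target.

    Assumption: a node may draw its own address; the source does not say
    whether self-addressed traffic is excluded.
    """
    rng = random.Random(seed)
    destinations = [rng.randrange(num_nodes) for _ in range(num_nodes)]
    return Counter(destinations)
```

Calling `max(senders_per_destination().values())` gives the largest number of senders contending for one destination in a single draw, which can be compared with the range reported for Figure 7.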
TCP is used as the transmission protocol. The traffic has a constant bit rate. The load is varied from 1 Mbps to 10 Mbps, which corresponds to a normalized load from 0.1 to 1 in the figures in the following sections. The link delay is assumed to be 10 ms. TCP is responsible for 90 percent of Internet traffic. TCP can adapt the transmission of packets according to bandwidth; it avoids congestion and retransmits lost packets. By keeping the output queue buffer size constant, the network initially shows a higher average delay due to the slow-start phase of TCP. TCP has a slow-start phase because all the nodes try to transmit at the same time and congestion causes packets of input nodes to wait to avoid packet loss. The performance of TCP is analysed by changing the buffer size of the output queue in NS2. A Drop-Tail queue is implemented, in which the last arrived packet is dropped if the queue is full. The overall throughput of TCP increases with the buffer size, whereas the average delay of the packets decreases as the buffer size of the output queue increases. A constant bit rate (CBR) application is used to generate traffic on the nodes. TCP provides reliable delivery with congestion control: the destination generates acknowledgments so that the sender knows whether packets have been received. Lost packets are interpreted as a congestion signal. Initially, the average delay is high, but as the load increases, the average delay decreases. TCP uses a dynamic congestion window, which grows rapidly at first and then increases slowly once it reaches a threshold. When congestion is detected, the window drops rapidly. The output of the NS2 simulation is in the form of a trace file. The required delay and throughput information is obtained from the trace file by text processing with a Perl script. Small Perl programs are used as filters to extract the required information from the text.
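The Drop-Tail discipline mentioned above can be sketched in a few lines. This is a generic illustration of the policy, not the NS2 implementation; the class name `DropTailQueue` and its `dropped` counter are hypothetical.

```python
from collections import deque


class DropTailQueue:
    """FIFO buffer that discards an arriving packet when it is already full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0  # number of arrivals discarded so far

    def enqueue(self, packet):
        """Return True if the packet was stored, False if it was dropped."""
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def dequeue(self):
        """Remove and return the oldest packet, or None if the queue is empty."""
        return self.buffer.popleft() if self.buffer else None
```

With a small `capacity`, bursts overflow the buffer sooner, so TCP sees losses and shrinks its congestion window at a lower load, which is the effect the buffer-size experiments examine.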
The trace file has 12 fields: the first field is the event type, the second field is the time at which the event occurs, the third field is the input node of the link at which the event occurs, the fourth field gives the output node of the link, the fifth field is the packet type (TCP, UDP, AGT, etc.), the sixth field gives the packet size in bytes, the seventh field includes flags, the eighth field gives the flow id, the ninth field gives the source address in the form "node.port," the tenth field gives the destination address in the same form, the eleventh field gives the packet sequence number of the network layer, and the twelfth field shows the unique id of the packet. Figure 8 shows the TCP delay for different output buffer sizes. Initially, the delay is large due to the slow-start phase of TCP, and it reduces later. Figure 9 shows the throughput performance of TCP for different buffer sizes. It is seen that for a lower buffer size there is a sudden drop in throughput at higher load because of a sudden change in the congestion window size. For a lower output buffer size, congestion occurs earlier as the load increases. Figure 10 shows the maximum achievable throughput as a function of the output queue buffer size. The all-to-one communication pattern is the worst traffic encountered in any network. Figures 11 and 12 show that the PCOI architecture provides a performance benefit for the all-to-one and many-to-many communication patterns, respectively.
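The twelve-field layout described above can be parsed with a short script. The paper uses Perl filters; the Python sketch below is a stand-in for illustration, and the names `TraceEvent`, `parse_trace_line`, and `throughput_bps` are hypothetical. It assumes whitespace-separated fields and the event code `r` for a receive event.

```python
from typing import NamedTuple


class TraceEvent(NamedTuple):
    event: str        # field 1: event type
    time: float       # field 2: time of the event
    from_node: int    # field 3: input node of the link
    to_node: int      # field 4: output node of the link
    pkt_type: str     # field 5: packet type
    size_bytes: int   # field 6: packet size in bytes
    flags: str        # field 7: flags
    flow_id: int      # field 8: flow id
    src_addr: str     # field 9: source address as "node.port"
    dst_addr: str     # field 10: destination address as "node.port"
    seq_num: int      # field 11: sequence number
    pkt_id: int       # field 12: unique packet id


def parse_trace_line(line):
    """Split one trace line into its 12 fields."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}")
    return TraceEvent(f[0], float(f[1]), int(f[2]), int(f[3]), f[4],
                      int(f[5]), f[6], int(f[7]), f[8], f[9],
                      int(f[10]), int(f[11]))


def throughput_bps(lines, dst_node, pkt_type="tcp"):
    """Bits received at dst_node divided by the span between first and last receive."""
    total_bits = 0
    first = last = None
    for line in lines:
        ev = parse_trace_line(line)
        if ev.event == "r" and ev.to_node == dst_node and ev.pkt_type == pkt_type:
            total_bits += 8 * ev.size_bytes
            if first is None:
                first = ev.time
            last = ev.time
    if first is None or last == first:
        return 0.0
    return total_bits / (last - first)
```

Delay can be extracted in the same way by matching the send and receive events of a packet through its unique id (field 12) and subtracting the two timestamps.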

Physical Layer Analysis
The physical layer of the PCOI architecture is simulated in OptiSystem [30] to measure the BER and signal degradation. The signal degradation can be analysed from the eye diagram, which is shown in Figure 13. The simulation in OptiSystem is carried out by analysing the optical path of each architecture. The optical signal consists of a pseudorandom bit sequence. For bit generation, a non-return-to-zero (NRZ) pulse is used. NRZ is used because its pulses occupy the full bit period and therefore carry more energy than return-to-zero pulses, which gives a bigger margin between the two logic levels. A Mach-Zehnder (MZ) modulator generates the optical signal which corresponds to the bitstream. The MZ modulator is popular for its low power, compact size, and monolithic integration. The transmit power is varied from −12 dBm to 5 dBm. The eye diagram of the PCOI architecture in Figure 13 is obtained at a transmit power of 0 dBm. It shows a very good eye opening, which indicates that the received signal in the PCOI architecture has enough quality to be detected. The eye opening also shows that the signal has low jitter. A log(BER) of −10 is achieved for this eye diagram.
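To put the reported values in context, the sketch below converts transmit power from dBm to milliwatts and estimates BER from an eye-diagram Q-factor using the common Gaussian-noise approximation BER = 0.5 · erfc(Q / √2). This approximation is a standard textbook relation and is an assumption here; it is not claimed to be the method used by OptiSystem or by the authors, and the function names are hypothetical.

```python
import math


def dbm_to_mw(power_dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def ber_from_q(q_factor):
    """Estimate BER from the Q-factor, assuming Gaussian noise on both levels."""
    return 0.5 * math.erfc(q_factor / math.sqrt(2))


if __name__ == "__main__":
    for q in (6.0, 6.36, 7.0):
        print(f"Q = {q:.2f}  log10(BER) = {math.log10(ber_from_q(q)):.2f}")
    print(f"0 dBm = {dbm_to_mw(0):.1f} mW")
```

Under this approximation, a log(BER) of −10 corresponds to a Q-factor of roughly 6.4, and a transmit power of 0 dBm corresponds to 1 mW.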

Conclusions and Future Work
The modelling of optical data centres is very important, as it helps in making important decisions about their performance. It is very important to consider at design time which optical components to use in an optical data centre. The choice of optical components directly affects the quality of the received signal. If the quality of the received signal is bad, it can adversely affect the Bit Error Rate of the system. Ambiguity in logic levels makes it difficult for the receiver to distinguish between bits. However, if an optical component is used that degrades the signal quality, then mitigation techniques should be used to counter its effect.
There are two ways to improve the performance of optical data centres: either reducing the signal degradation or designing the optical data centre to reduce contention. The main advantages of the PCOI architecture are routing simplicity, reduced energy consumption, scalability to large port count, packet loss avoidance, and all-to-one communication support. Routing simplicity is achieved in PCOI by using the packet classifier at the sender side, which classifies the packets based on destination rack and puts them in a queue. The reduced energy consumption is achieved in PCOI by the use of passive optical components. Passive optical components are those which do not need any external power source to work. The PCOI architecture is scalable to large port count due to the lack of a central controller. The PCOI architecture can avoid packet loss in the worst communication patterns by exploiting the redundancy of optical components and using queues to temporarily store colliding packets. The all-to-one communication pattern is the worst communication pattern in a communication system, in which all the sender nodes try to access a destination node or rack at the same time. The PCOI architecture supports the all-to-one communication pattern by using queues and passive optical components. The general problems seen in the PCOI architecture are the use of a large number of optical components and the signal degradation.
There are two main types of optical signal degradation in PCOI: the first is the reduction in optical power as the optical signal passes through a passive optical component, and the second is the change in the shape of the optical signal as it goes through the SOA. Due to signal degradation, it is not possible to improve the BER beyond a certain limit. Future work will study the effect of new modulation formats on the performance of PCOI and analyse the signal degradation imposed by these formats.

Data Availability
No data were used to support this study. The authors have conducted the simulations to evaluate the performance of the proposed protocol. However, any query about the research conducted in this paper is highly appreciated and can be answered by the principal authors Rab Nawaz Jadoon (rabnawaz@cuiatd.edu.pk) and Mohsin Fayyaz (mohsinf@cuiatd.edu.pk) upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.