Embedded Parallelism Enabling Ultralow-Power Zigbee Voice Communications

Short-range wireless technologies are known to transmit voice, audio, image, and video messages in real time. Energy consumption and transmission reach are critical in such networks, especially for portable and power autonomous devices. The purpose of the Voice over Zigbee technology is to provide a competitive oﬀering that excels in these performance aspects. Due to the CSMA-CA mechanism implemented in the 802.15.4 layer, a well-designed strategy must be considered in Zigbee to create a robust, reliable, and full-duplex conversation. In past eﬀorts, we proved that the radio channel of Zigbee has enough bandwidth to support a full-duplex conversation with narrow-band voice codecs. Our embedded implementation of the Speex voice codec targeted the development of a low-cost, ultralow-power, long-range wireless headset using Zigbee technology to transmit voice in full-duplex mode for use with leading PC VoIP programs. Furthermore, we presented the real environment performance evaluation and power consumption tests involving the developed headset prototype. Talk time is comparable to Bluetooth including at the power budget, the codec processing, and analogue audio interface, but its deep-sleep lifetime more than doubles the Bluetooth performance. This was one of the very few successful eﬀorts to port a voice codec on an ultralow-power DSP for use with power sensitive Zigbee applications, which is highly cited in the literature, proving additionally that using an open-source codec can deliver similar voice quality, reducing the total system cost. The current paper elaborates on the embedded parallelism of the Speex implementation and the exploitation of the DSP architectural parallelism which critically enabled the Voice over Zigbee application on the ultralow-power DSP platform. Another signiﬁcant contribution of this work is towards understanding and resolving the challenges faced when trying to achieve good quality transmission of media over constrained devices and networks. The way to new ultralow-power voice-related Zigbee and constrained network applications is open.


Introduction
Zigbee is a wireless standards-based technology that addresses the needs of sensory network applications, enabling broad-based deployment of complex wireless networks with low-cost and low-power solutions that run for years on inexpensive primary batteries.
e raw data rate of this technology at 2.4 GHz, 250 kbps is low compared with other wireless technologies, but sufficient to transmit voice using narrow-band voice codes.
Prior to our work, few approaches can be found in the literature that examine the possibility of 802.15.4/Zigbee networks supporting voice transmission, either through simulation or based on already deployed networks (e.g., [1][2][3][4]).e general conclusion drawn from previous work is that voice can be carried over Zigbee; however, there exist several restrictions in terms of available bandwidth and network topology.
e Eurostars Z-Phone project goal was to develop a low-cost, ultralow-power, long-range, ergonomic wireless headset using Zigbee Technology to transmit voice in fullduplex mode for use with leading PC VoIP programs (e.g., Skype and Messenger), in addition to a USB-Zigbee bridge module and a PC driver [5].e initial objectives were to achieve 2x communication autonomy and 4x distance compared with Bluetooth technology.DECT and WiFi consume much more than Bluetooth and cannot be easily integrated into small devices. is increase in autonomy has also an impact on the lifetime of the rechargeable battery and consequently on the lifetime of the headset, since batteries in wireless headsets are not replaceable.
Due to the limited available bandwidth, voice must be compressed before being transmitted.e selection of the most suitable voice codec and its implementation is challenging, considering the requirements of Z-Phone.e paper summarizes the results of our previous work regarding the design, implementation, and validation of the Z-Phone architecture and the headset system and presents the details of the embedded parallelism which made the application feasible in the constrained environment.Important future work items are presented in the conclusions section.

VoZ Technology
Various wireless communication methods are known to transmit voice, audio, and/or video messages in real time.
Examples of these methods are used in the Bluetooth technologies based on the IEEE 802.15.1 protocol, WiFi based on the IEEE 802.11 a/b/g protocol, and DECT.e purpose of the VoZ (Voice over Zigbee) technology is to overcome the competitive technologies regarding energy consumption, a critical issue for portable and/or power autonomous devices, and transmission reach.Due to the CSMA-CA mechanism implemented in the 802.15.4 layer [6], several strategies must be considered in Zigbee to create a robust, reliable, and "full-duplex" conversation.Such strategies include setting up a partially asynchronous communication network and not using confirmation messages in the communication transaction.Robust and good quality conversations have been implemented over Zigbee testing several voice codecs such as G.729A (8 kbps) or G.723.1 (5.3/6.3 kbps).As shown below, the radio channel of Zigbee has enough bandwidth to support a full-duplex conversation with narrow-band voice codecs.
A deeper study of the Zigbee link necessitates the determination of the Zigbee stack time.Stack time is the time that the Zigbee stack firmware running on the microprocessor takes from capturing a Zigbee packet in the air to delivering it to the application layer.A set of transmissions were carried out between Zigbee nodes, with one node sending a message and another resending it instantaneously to calculate the time required for the Zigbee stack to process a packet (Figure 1).Measurements resolution was 1 ms.
e difference between actual time and time stamp (ticks) can be calculated by the following expression: To simplify the calculation, we can suppose that e final expression would be A Zigbee sniffer was used to find out the real header length transmitted.In our trials, using the stack from EmberZNet PRO 3.11 with security features disabled over the xip EM250 [7], the header length resulted to be 30 bytes (4 PHY, 10 MAC, 8 NWK, and 8 APL) [8].e PHY header length was variable between 3 and 5 bytes.Table 1 shows the test results obtained in function of the bytes of information or payload.e tick value has been obtained as the mean from one thousand measurements.
Table 1 depicts the communication test results including the minimum time between voice frames required to create a full-duplex conversation in function of the payload.To extract the stack time, the Zigbee 2.4 GHz data rate of 250 kbps is taken into account in the calculation of the propagation time. is time depends on the total length of the packet to be sent.In all cases, the measurements follow a normal distribution as shown in Table 2. Figure 2 shows the similarity between both shapes-ticks measurements distribution and normal distribution-in this diagram for 80 bytes.
According to Table 2 and the normal distribution probability formula, we can send and receive an 80-byte packet payload (plus headers) every 22 ms with a probability error near to zero, as shown in the following calculation.
at is to say, we could create a conversation using a voice codec of 29 kbps (80 * 8 bits/0.022ms), in this case:

Progress beyond e State-of-e-Art.
is section presents a comparison of our work in the Z-Phone project with previous relevant work.e most similar research to the Z-Phone project was the one named "Voice Communications over Zigbee Networks," performed by AT&T Labs Research [1].In the Zigbee voice communication transmission, the main issue to solve is to avoid the collisions between packets.If two packets are transmitted at the same time, the CSMA/CA used in Zigbee adds a random delay to both packets before resending them. is way the packets are sent too late for a real-time voice communication.e difference between our solution and the one proposed by AT&T is the method used to avoid these collisions.We use a ping-pong method that only allows a node to transmit a packet when the other nodes are not sending any 2 Journal of Computer Networks and Communications information (similar to a synchronization method used in the 1-channel serial data buses), while the AT&T system uses the solution proposed in the VoIP protocol, which includes a TDMA mechanism access to the channel.At the end, the work presented in [1] will achieve similar features, with limited 1-2 hops voice communication and 8 meters free obstacle transmission distance.In our own work, we have used the G.729a codec to test and implement our ping-pong method, as it is the most standard for voice compression in embedded systems.AT&Tsystem has used the same codec in their work.In the Z-Phone project, we moved further and proved that, using an open-source codec can deliver similar voice quality, reducing the total cost of the system.Moreover, we have presented engineering details of e cient codec implementation and validation in a restricted ultralowpower-embedded environment [7,9].e works presented in [3,4] have a di erent approximation as per voice transmission.Both share the voice transmission with various data transmission, which increases signi cantly the packet collisions and applies ine ectively a solution which is similar to our ping-pong method.In these cases, they use a time-slotted communication in the MAC layer to avoid the collisions, but a strict synchronization method (beaconed frames or AM radio synchronization packets) needs to be integrated in the proposed mechanism to work correctly.e codecs used in these works are similar to G.729a or Speex, with a bit rate of 4.4 to 8 kbps in one case and 8 to 16 kbps in the other, respectively.Furthermore, the work presented in [2] is only a simulation to test if a Zigbee network could have enough throughputs to manage a 16 kbps voice codec, but the authors do not provide any information about how to solve the synchronization and collision problem.
In conclusion, the focus of our work is on the challenges faced when trying to achieve high-quality voice transmission over constrained devices and networks, which is also what sets apart this work from the previous publications referenced.e aim is to present in detail the above procedure using Z-Phone's requirements as a case study for this problem.Furthermore, our porting of the Speex open-source voice codec on an ultralow-power DSP for use with power sensitive Zigbee and constrained network applications is one of the very few successful e orts in this area.Chang et al. [14] designed and implemented the Speex voice codec at 8, 11, and 15 Kbps bit rates on an embedded system with Xbee WSN nodes using multihopping Zigbee communication, presenting a performance analysis in terms of distance-signal strength dependence, network delay, and PESQ-MOS voice quality.Meiqin et al. [15] achieved wireless real-time voice communication based on TI-Zstack protocol and AMBE-1000 and CC2530 modules.e proposed wireless voice communication system consists of ve nodes, of which one is coordinator, one is end device, and the other three are routers.
e system can be used in underground coal-mine equipment, earthquake disaster relief, re ght rescue, etc.
e longest end-to-end communication distance is 15 m.When using power ampli er (increasing the energy consumption) and routers, distance can be longer.e max hop is two due to each router adding noise to the signal.Yang et al. [16] provide a voice communication system which adopts CML's CMX618 for voice data collection and forward error correction coding, TI's CC2530 for Zigbee network formation, and voice data transmission.e authors assess the system for Point-to-Point (P2P) and End-to-End (E2E), using a network of one voice base station and eight router nodes, con gurations.
e results, stressing the voice quality in mesh Zigbee with multiple hops (for mine monitoring), show that the proposed system ensures voice quality in complex Zigbee network with higher integration and lower power consumption.Fu et al. [17] present the detailed development of a proprietary low-power WSN platform with star topology and 3 voice communication modes: P2P, Peer-Central-Peer, and voice conference, using the CVSD 15.625 Kbps voice codec.Zigbee communications are based on the 8-bit RF SOC CC2430/CC2530.Extensive analysis demonstrates that the proposed system is a low-power, low-speed, and highperformance WSN platform.It consists of up to 16 audio sensor nodes, 64 typical parameter monitoring nodes, and 1 central node for network establishment and management.
e audio channel capacity is 3 real-time two-way voice communications or audio conference including all audio nodes at the same time, and the voice delay is less than 40 ms.
e communication distance between audio nodes is longer than 70 meters indoors and 120 meters outdoors.e authors suggest possible applications in emergency voice communication, audio/sound sensor networks, and health monitoring systems.
Chang et al. [18] develop an end-to-end rescue communication voice gateway to provide a stable voice transmission over Bluetooth and Zigbee networks for mountain climbers.To implement the device, it adopts Speex with submode 4 and 11 kbps data rate to provide the best tradeo between speech quality and computational complexity, based on our work.e performance analysis, in terms of end-to-end throughput, packet loss rate, jitter, and delay, shows that the proposed implementation can e ciently support voice transmission over wireless sensor networks.A mine wireless voice communication system based on Zigbee is presented in [19] adopting CC2530 as RF sendingreceiving unit of voice communication node and AMBE voice codec (selected as an amateur radio speech codec) to realize voice message two-way wireless communication over the Zigbee protocol.Zigbee is also examined in the framework of space mission operations [20].WPANs are used in space to convey information over relatively short distances among the participant receivers.Unlike WLANs, connections effected via WPANs involve little or no infrastructure.is allows small, power efficient, inexpensive solutions to be implemented for a wide range of devices that can be duty-cycled aggressively to a low-power sleep state.
Report [20] acknowledges that Zigbee has not been as widely adopted as expected, due in large part to the difficulty of the 802.15.4 MAC in enabling reliable transport in the face of difficult networking environments.In this framework, the authors of the present work were contacted by engineers of the Institute of Space Techniques and Technologies of the Republic of Kazakhstan, who were working in the field of transmitting voice over Zigbee technology, to discuss important details of our VoZ speech codec implementation described in [9].e European EAR-IT project addresses real-life experimentations of intelligent acoustic for supporting high societal value applications in a large-scale smart environment.For instance, a city emergency center can request ondemand acoustic data samples for surveillance purposes and management of emergencies.Pham et al. [21] and Pham and Cousin [22] present experimentations on streaming encoded acoustic on the SmartSantander largescale test-bed comprising more than 2000 IoT nodes (ATmega1281 microcontroller-based WaspMote nodes and TelosB CM5000 and CM3000 motes).An audio board was built around a 16-bit Microchip dsPIC33EP512 microcontroller offering enough processing power to encode the audio data in real-time to produce an optimized 8 kbps encoded Speex audio stream (Speex encoding library is provided by Microchip).Audio boards used the TelosB nodes as host boards due to inability of multihop transmission in WaspMotes.Multihop transmission used both WaspMote and TelosB motes as relay nodes.ese works further highlight the main sources of delays and show how multihop streaming of acoustic data can be achieved by carefully taking into account these performance limitations with appropriate audio aggregation techniques.
Koucheryavy et al. [23] acknowledge the positive experience of transfer of voice data over the Zigbee protocol and examines research issues of Public Flying Ubiquitous Sensor Networks (FUSN-P) and Flying Ad Hoc Networks (FANETs) involving terrestrial segments and aerial segments composed by Unmanned Aerial Vehicles (UAVs) and drones.It presents a model network for full-scale experiment and solutions for the Internet of ings.e voice and video transmission from terrestrial segment to UAV-P using different protocols (Zigbee, 6LoWPAN, and RPL) is a critical investigation task because often it may be the only chance to pass the necessary information to the area of terrestrial sensor fields.Other important tasks involve the development of clustering algorithms for terrestrial and flying segments, route optimization for data collection, as well as the consideration of UAV-P's as a queue system and FUSN-P as a queue network, respectively.Kirichek et al. [24] examine the efficiency of using Zigbee in Flying Ubiquitous Sensor Networks for transferring voice and image data.e authors conclude that Zigbee networks, praised for their low cost, high autonomy, simple creation, and survivability, are useful for transmitting multimedia information only where requirements to the quality of voice and image transmission rate are low.

Z-Phone Architecture
e software/firmware architecture of Z-Phone used with a PC is illustrated in Figure 3.It consists of a wireless headset and a USB dongle providing Zigbee connectivity to the PC.In addition, the Z-Phone driver installed on the PC provides a software interface to any hosted VoIP application.e Z-Phone driver and the headset's DSP contain one instance of a voice codec.Voice packets arriving from the Internet are handled by the VoIP application (e.g., Skype).e decoded voice data are buffered for the Z-Phone driver to re-encode them using the voice encoder.e encoded voice data are passed through the USB interface to the dongle where they are handled by the Zigbee stack and are transmitted over the air to the Z-Phone headset.After passing the Zigbee stack at that end, the data are transferred to the headset's DSP engine where they are decoded and reproduced as voice through the speaker.e opposite path is followed for the upstream voice data from the headset's microphone to the Z-Phone driver and then from the VoIP Application to the Internet.
Several factors affect the useful bandwidth that is used for carrying the encoded voice frames.Most importantly the fact that voice is transmitted in full-duplex mode reduces the available bandwidth in half, since the two nodes do not transmit data concurrently.In addition, it was decided not to include many voice frames in one packet in order to reduce latency and also reduce the impact of lost packets on voice quality.
e tradeoff for this decision is the considerable overhead induced by the headers.Since small packets are regularly transmitted, stack processing and transmission times also affect the useful bandwidth.In Section 2, we proved that, in the worst-case scenario (maximum transmission times), the respective data rates should not exceed 29 Kbps.Several measurements under these conditions show that data rates up to 38 Kbps are feasible.

Codec Selection.
In Z-Phone, we defined a set of requirements to provide a guideline in the selection of the most appropriate codec.A fixed-point implementation had to be considered since the DSP platform of the headset was a low-end, low-power DSP without FPU support.Bandwidth, processing, and memory limitations of the end system favored the selection of a low bit-rate codec.However, voice quality is an important factor for Z-Phone; hence, the tradeo for choosing a low bit-rate codec should be carefully evaluated.Furthermore, voice quality is a ected by the algorithmic delay of a speech codec (time required to gather the samples that form one speech frame), being a signi cant part of the transmission delay between the sender and the receiver in a VoIP call (see [9] for a comprehensive analysis on algorithmic delay).Another aspect of speech codecs that a ects voice quality in network conditions with increased packet loss rates is their packet loss concealment mechanism.
Most of the voice codecs evaluated had a xed mode of operation in terms of bit rate and consequently voice quality.
e only codecs that o er exibility in this matter, which would allow ne tuning when integrated in the Z-Phone system, are AMR, SILK, and Speex with the rst being royalty-based.A royalty-based codec, although used often for proof-of-concept work (e.g., [1]), clearly hinders the market perspective of the product envisioned by Z-Phone, by increasing its cost.In addition, the Z-Phone system does not require the use of speci c codecs for issues such as interoperability.For that reason, the standard G.729 choice was abandoned.SILK and iLBC presented many attractive features but at the time of performing this work they were not available as open source.erefore, the Speex codec was selected for this project, since it can o er superior voice quality to the vast majority of equivalent codecs when operating at 15 or 18 kbps [9].

Codec Implementation.
Available Speex source code ports to popular DSP platforms, such as ARMv4, ARMv5E, the Black n architecture, the TI C5xx family, Freescale's DSP56852, and the Microchip dsPIC family are not suitable for Z-Phone that targeted an ultralow-power current consumption for both the DSP and the audio interface support circuitry.
e DSP selected for Z-Phone was ONSemi's BelaSigna 300 [25].A very important advantage of BelaSigna 300 over other similar DSPs (e.g., microchip's dsPIC family) is the integrated analogue audio interface that minimizes the need for external components, which could prove to be a huge bene t in housing and its associated costs.It o ers high-performance audio processing capabilities and exible programming while satisfying form-factor (size of a rice seed) and low-power constraints.
e main CFX DSP core is user programmable using 24-bit xed-point, dual-MAC, and dual-Harvard architecture.It is able to perform two MACs, two memory operations, and two pointer upgrades in one clock cycle.BelaSigna 300 further includes a con gurable accelerator that is controlled by the main DSP core.
Codec implementation involved the porting of the subset of Speex encoder and decoder functionality that was necessary for our application.e question that was raised was whether a very small IC that operates at a low clock frequency which is expected to present performance and memory constraints such as BelaSigna 300 would be able to handle the processing power and memory requirements of Speex.In [26], a port of Speex operating at 8 kbps on an ARM-based microcontroller working at 72 MHz and the resulting resource requirements are presented; however, the target platform was not directly comparable to BelaSigna 300.On the other hand, the results of the porting of Speex on dsPIC [27], using only narrowband encoder and decoder and only one mode of operation (8 kbps), indicated that BelaSigna could support at least part of Speex functionality.
We have presented detailed information regarding the codec implementation in [7,9], which we do not reiterate here.We decided to implement submodes 3 and 4 (8 and 11 kbps) as these seem to present the best tradeo between speech quality and performance requirements.
ey also share the same functions for LSP quantization and adaptive codebook search meaning that supporting both submodes does not require signi cant additional e ort.
During the codec implementation, it was found that it was not always possible to achieve bit-exact results (exact same output with a reference implementation for a speci c input).e two factors that a ected bit-exactness were the di erences in arithmetic and in rounding between BelaSigna 300 and Speex source code.e memory requirements of the partial Speex port to BelaSigna 300 are presented in Table 3.

Speech Quality.
Speex delivers good quality both in 8 kbps and in 11 kbps mode [28].In order to evaluate the outcome of the voice codec implementation, a series of tests on the speech quality was performed using ITU-T Perceptual Evaluation of Speech Quality test methodology for automated assessment of the speech quality as experienced by a user of a telephony system [29].PESQ takes as input the original signal which in this case was a wav le containing 51 seconds of di erent male speakers and the degraded signal which was a wav le that resulted from encoding and decoding the original wav le.
Figure 4 depicts the PESQ-MOS obtained for the two Speex submodes that are of interest for Z-Phone, for different complexities.Each submode was tested in three 6 Journal of Computer Networks and Communications scenarios, including encoding and decoding on the PC using the original Speex library as well as Belasigna transmit encoding and receive decoding.It must be noted that this test is not used as an evaluation of the speech quality that Speex is able to deliver, but instead as an indication for the development process of the correctness of the porting.It also gives an indication of the impact of the non-bit-exact implementation.As it can be seen on the graph, the Speex port to BelaSigna 300 follows closely the original Speex implementation in terms of speech quality with no signi cant deviations observed.It can also be seen that the value of the "complexity" con guration parameter does not have a signi cant impact on voice quality.Considering the tradeo between performance requirements and speech quality, a value of 1 to 2 for the "complexity" parameter of Speex seems to be su cient.

Performance.
Speex uses a 20 ms speech frame.erefore, it must be ensured that one instance of the encoder, one instance of the decoder, and the remaining tasks, such as data transfer and mixing, are executed in less than 20 ms.
e decoder execution time does not change signi cantly for di erent submodes and values of complexity in the encoding process, and it was measured to be around 1.5 ms. e remaining tasks require roughly 0.5 ms, mostly due to data transfers.Figure 5 depicts the execution time of all the DSP's tasks for the two submodes and for di erent values of the complexity parameter.e execution time exceeds the 20 ms limit for values of complexity higher than 2 for 11 kbps and higher than 4 for 8 kbps.However, since the speech quality is not signi cantly improved with higher values of complexity, the performance of the implementation is acceptable for this application.

Transmission Tests.
ree di erent distances were applied between the Z-Phone headset and the USB dongle to check the transmission capability of the system.To test the link, 256 packets were sent to the USB dongle, with 60 msec delay after each package.
e transmission power was 3 dBm.
e Link Quality Index and the received signal strength were recorded for every received packet (Figures 6  and 7).255 packets arrived from distances 1.5 m and 5 m (99.61%).Another packet was lost when distance between the devices increased to 8 m (successful reception 99.22%).As expected, the results show that as the distance between the Headset and the Dongle increased, the quality of the link decreased.However, there was no big di erence detected between the link quality results even if it measured at 5 m or 8 m distances.

Power Consumption.
is section summarizes the power consumption performance of the Z-Phone system and a comparison with a Bluetooth reference system which is based on the In neon PMB 8753 BlueMoon UniCellular single-chip Bluetooth v2.0 + EDR solution [30].e major energy consumption elements of the Z-Phone system are the Belasigna 300 audio DSP and the EM250 single-chip Zigbee solution with an integrated Zigbee transceiver.e system further comprises (i) a small power management unit targeted for low-power consumer handheld end equipment, which contains the battery charger and a step-down DCDC converter and (ii) a programmable low-power clock generator (10 μA max in power-down mode), which provides the external clock required by Belasigna 300 WLCSP package at 40 MHz working frequency for optimized codec performance on the restricted platform.
e Bluetooth chipset is clocked through the internal crystal oscillator   Z-Phone), while the deep-sleep lifetime (similar to the "almost ready" function available in some Bluetooth headset products) can reach 1144 h (48 days).A methodology for measurement of average current consumption as a useful tool to select a chipset/module and help the engineer understand a system/product early in the development phase is presented in [31].We further measured the efficiency of our Bluetooth reference system for carrying voice over packetoriented ACL links, which has been shown to achieve better TCP connection performance at a slight increase of voice delay in a piconet topology sharing bandwidth between voice and data links [32].Even if Bluetooth headsets do not use ACL links for voice communication, in fact this comparison with Z-Phone over a better comparable network sublayer demonstrates the power efficiency of our Zigbee implementation as opposed to Bluetooth in terms of bandwidth use.Zigbee demonstrates more than 2x deep-sleep lifetime against Bluetooth between successive battery charges, consequently extending battery life.Excluding the almost zero current drawn by the Zigbee transceiver and the power management unit and analogue input and output circuitry which can be considered common between the two systems, it turns out that Z-Phone's Belasigna 300 DSP and lowpower clock generator in total require approximately half of the ultralow-power mode current of the Bluetooth chipset.Current consumption values are taken with: LEDs and external EEPROMs disconnected, 150 Ohm speaker impedance, no RF retransmissions, RF TX power set to 0 dBm (class 3 devices), and microphone bias set to minimum current level.More details regarding the Z-Phone DSP and clock generator current consumption can be found in [30].

Z-Phone Headset Unit.
e headset unit plays an important role in the Z-Phone system: it not only serves as a voice transcoder combined with a Zigbee transceiver but also provides a basic optical and interacting interface for the users and battery charging options via standard microUSB socket.
e audio DSP is wired to the EM250 Zigbee transceiver via I2C communication bus and eases the interfacing of the microphone and the speaker by providing a built-in analogue front-end and Class D output stage.Mixing the different call tones/status with voice is also the task of the DSP.e unit is powered by a standard 3.7 V-150 mAh Li-Ion coin battery.Charging it and supplying the different voltage levels for the DSP and the Zigbee chip is the role of a multioutput DC/DC converter and battery charger unit.e battery can be easily charged by plugging the unit to any USB host device.e user can read the battery status from the optical indicators located on the headset.To eliminate glitches, artefacts in the voice and to handle the demand of the two different voltage levels and the two different frequencies of the clock signals, a serially configurable lowpower 2-channel PLL clock generator that offers low jitter, individually settable power supply and frequency for both clock outputs is also incorporated in the design.
A standard user interface was created for the user to learn how to use the buttons of the headset: power on/off, hook on/hang up, pairing, and volume up/down.e same applies to the optical indicator.e dedicated LEDs indicate different battery and call statuses.e housing of the headset is made with injection moulding using a nonconducting thermoplastic material, ABS, which is impact resistant and mechanically robust.

Z-Phone Driver.
e USB dongle supports USB Audio and USB HID standards, so it is compatible with the major operating systems.e memory resident application running under Windows environment provides a bridge between selected applications (e.g., Skype and X-Lite) and the headset by handling call events, such as picking up an incoming call or terminating a call.It also contains features such as call logging and an address book including photos, etc.Without this application, the headset can be used in a similar way to the ordinary headphones.

Belasigna 300 System Architecture.
e BelaSigna 300 system is an asymmetric dual-core architecture, mixedsignal system-on-chip designed specifically for audio processing.e BelaSigna 300 system is centered around two processing cores: the CFX DSP and the HEAR Configurable accelerator.
e CFX DSP Core is used to configure the system and coordinate the flow of signal data progressing through the system.
e CFX DSP can also be used for custom signal processing applications that cannot be handled by the HEAR Core.e HEAR Core is a microcode configurable signal processing engine that works with the CFX DSP Core to execute a variety of signal processing functions.
e CFX and HEAR cores have a few local and shared memories available to them.
e two cores process data brought into system memory by the Input/Output Controller (IOC).e IOC can handle inputs from either an analogue input stage or PCM interface input and can handle outputs to a digital direct-drive output stage or PCM interface output.e FIFO controller simplifies access to data by the CFX DSP, the HEAR, and the IOC by providing configurable hardware-based FIFO buffers.Figure 8 illustrates the Belasigna 300 system architecture.

Use of System-Level Parallelism on Speex Implementation.
Although BelaSigna 300 includes a powerful coprocessor (HEAR), we decided to refrain from extensive use of it due to the following two factors: Journal of Computer Networks and Communications (i) e differences of its rounding mechanism compared to the rounding needed by Speex and the fact that Speex uses operations between operands with different accuracy would possibly require adaptation of the algorithm (e.g., recalculating filter coefficients with different accuracies).(ii) Its dedicated memory for input and output data would require frequent data transfers.is would have a negative impact on performance and development time since this approach could prove to be error-prone.
Considering the development time requirements of the project, we decided to use the HEAR coprocessor only for a few basic vector operations.
All processing is performed within 20 ms, which is the frame duration used by Speex.At any given time, 4 frames are being processed: one frame being captured by the input stage, one frame being encoded, one frame being decoded, and one decoded frame being reproduced by the output stage.Encoded frames are exchanged between the DSP engine and the Zigbee interface module through the I2C controller.On system level, the following tasks are performed in parallel within a 20 ms time slot: (1) Analogue-to-digital conversion and storing of the samples on input FIFO buffer by the IOC input side.(2) Reading frame from input FIFO, frame encoding, frame decoding, and possible tone generation by the DSP. is is followed by mixing the two outputs of the previous step and storing the final output on the output FIFO buffer, using the HEAR coprocessor.(3) Digital-to-analogue conversion of the previously decoded frame that is stored in the output FIFO by the IOC output side.(4) Encoded frame exchange with the Zigbee interface module through the I2C interface.

Instruction-Level Parallelism.
e Speex encoder and decoder are fully implemented on the CFX DSP.Extensive instruction-level parallelism is possible, in a very compact format.e DSP can execute up to four computation operations in parallel with two data transfers [33].
is is achieved by the different execution units that utilize the multiple bus and memory architecture and the dual data path (X and Y) as it can be seen in Figure 9. e execution units are as follows: On instruction level, CFX has three types of instructions.

Long Instructions.
ese instructions perform only one operation in one clock cycle.

Arithmetic-Move Instructions.
ese instructions contain two parts: an arithmetic part that is executed by the DCU and a data movement part that is executed by the DMU and AGUs.Syntactically, the two parts are separated by "|||" indicating that they are executed in parallel, so an abstract Arithmetic-Move instruction is written like this, <arithmetic part> ||| <move part> e arithmetic and move parts can be further separated into dual arithmetic and dual move instructions operating on the X and Y data path, respectively, which means that an arithmetic-move instruction can execute up to 4   providing instruction memory space optimization.e two operations are not performed in parallel but serially.eir syntax is like this, <first move part> >>> <second move part> e Z-Phone Belasigna 300 Speex implementation contains 4975 CFX assembly instructions.e distribution among the different types is shown in Table 5.
Arithmetic-move instructions can perform from 1 up to 6 operations per cycle.Practically, it is very challenging if not impossible to use this feature at its full potential.Our Speex implementation has used arithmetic-move instructions varying from 1 up to 5 operations per cycle.Obviously, the performance gain is more significant for instructions (and their operations) inside a loop.Table 6 presents the distribution of the arithmetic-move instructions used in the Speex implementation, based on their loop level and also their operations per cycle count.
As mentioned before, the move-move instructions consist of two move instructions that execute in 2 clock cycles.However, each of these move instructions may perform one additional pointer register arithmetic operation, improving this way their efficiency.e distribution of move-move instructions based on their loop level and the number of operations per 2 clock cycles is shown in Table 7.
Because only of the implemented instruction-level parallelism the codec execution time (measured in clock cycles) is reduced by one-third on average in all test cases using several different test speech samples and profiling data, as opposed to the total number of individual Speex Belasigna 300 operations and equal number of clock cycles in a serial execution.is benefit is added to the speed-up achieved already through system-level parallelism.Performance benefit through instruction-level parallelism can increase considerably, if our Speex Belasigna CFX assembly code is carefully re-engineered to exploit better the CFX parallelism potential.e adoption of parallel programming principles for task decomposition involving the identification of tasks that can be done concurrently (linear, iterative, and recursive task decomposition), data structures (input/output/ intermediate) or parts of them that can be managed at the same time, and the dependencies that impose ordering constrains (synchronization) and data sharing can help maximize the speed-up of code execution through maximizing concurrency and minimizing parallelization overheads.Illustrating the importance of instruction-level parallelism, if it had not been implemented in the codec port, the execution time plots of Figure 5 would have raised along the y-axis above the threshold line (marginally for complexity value 1 in the 8 kbps mode), jeopardizing the feasibility of the application.Besides the gains through instruction-level parallelism, the codec execution time can be further reduced through effective use of the HEAR coprocessor/accelerator.

Conclusions and Future Work
e Zigbee standard is designed to enable the deployment of low-cost, low-power wireless sensor and control networks based on the IEEE 802.15.4 physical radio standard.Despite the low data rates of Zigbee, its use for transmission of voice is feasible.In this paper, we prove that robust and good quality conversations can be implemented over Zigbee.In fact, the radio channel of Zigbee has enough bandwidth to support a full-duplex conversation with narrow-band voice codecs.
We have discussed various aspects of the selection process for a voice codec for high-quality voice transmission over Zigbee.e initial "obvious" choice for a good quality codec around 8 kbps was a G.729 flavour.When we realized the cost implications of such a choice on a commercial product, we looked into the open-source community and discovered a wide selection of high-quality solutions.e results of the porting effort are presented showing that it is feasible to use a narrow-band codec in a real-environment embedded application with strict lowpower requirements and bandwidth limitations such as a wireless headset based on Zigbee.We managed to achieve real ultralow-power operation, while including all the necessary analogue interfaces which proved to be also a huge benefit.Finally, we briefly presented the Z-Phone headset unit and the results of the transmission range and power consumption tests involving the developed embedded system.
is is one of the very few expert engineering efforts known in the literature achieving a successful porting of a (royalty-free) speech codec on an ultralow-power DSP enabling voice communications in restricted ultralow-power devices and networks.
Future work will focus on significant project extensions concerning response time and robustness in front of interference regarding the use of the system as a headset for voice communications, as well as towards the migration of the developed technology in other applications, such as Voice Directed Warehousing (VDW) and in the promising and challenging domain of Wireless Sensor Network (WSN) applications.
Regarding the first point, response time, we intend to demonstrate that voice communication over Zigbee can be held using a repeater node. is would provide a coverage range significantly bigger when compared with Bluetooth headsets and double the current achievement.
Regarding the second point, robustness in front of interferences, we intend to take into account that many users will be using the system, even simultaneously, and the environment (e.g., an office) will be changing.e objective of this task will be to develop a method for effective use of the available channels and ability to react to interferences that might appear during a voice conversation while ensuring that the number of simultaneous conversations that can be held in the same space (office) is high enough to be used in a realistic environment.
e number of available Zigbee channels, 11, will be sufficient for supporting a sufficient number of users.Regarding the third point, the integration of the Z-Phone system in VDW makes it necessary considering a different use scenario in which, although the core of the headset for VoIP system is valid, some additional characteristics must be taken into account, such as audio quality, dynamic routing procedures, conversation profile, channel sharing to increase number of users, and delay tolerated.In the broader WSN/ VoSN (Voice over Sensor Network) scenario, voice streaming will need to address the constraints in terms of communication bandwidth, computational power, and energy budget that severely affects the actual streaming capabilities of lowpower wireless sensor devices.Several metrics of multihop communication such as throughput, jitter, latency, and packet loss will have to be measured and analyzed.
Last but not least, a variable bit-rate codec implementation will be tested, using the highly-scalable novel open codec Opus (bit rates from 6 Kbps to 510 Kbps, sampling rates from 8 KHz to 48 KHz and frame sizes between 2.5 ms and 20 ms).Opus combines technology from Speex and Skype's SILK and is the first open-source codec which has become an IETF standard, unlike other open-source codecs, such as Speex targeting speech and Vorbis, which is aimed at comparing music and audio in general.Towards this end, the objective will be to perform a deep study of the different possible configurations of Opus, to implement it in a DSP system and to effectively improve the current MOS score for the platform developed in the Z-Phone project.

Data Availability
Reproducing the findings presented in the paper by a third party would require the on-hand availability of the Zigbee system code, besides the target platform, including the implementation of the VoZ transmission mechanism as well as the Speex assembly code implementation on the Belasigna 300 DSP. e private organisations involved in the Z-Phone project, namely, Ateknea Solutions and inAccess Networks, would not allow disclosing the system code and depositing it in a publicly available data repository.

Figure 2 :
Figure 2: Graphical representation of ticks measurements distribution for 80 bytes payload.

Table 1 :
Communication test results in function of payload.

Table 2 :
Distribution of 10000 tick results in function of payload.

Table 4
summarizes the power consumption gures.e typical Z-Phone talk time is approximately 7.51 hours.In deep-sleep mode Z-Phone can demonstrate an impressive lifetime reaching up to 2362 h (98.42 days) before battery charging is necessary.As expected, Z-Phone takes full advantage of the exceptionally ultralow-power consumption of the Zigbee transceiver in deep-sleep mode (1 μA).e Bluetooth reference system talk time is reaching 7.88 h over an eSCO link carrying EV3 voice packets and 8.23 h over a SCO link; however, at almost the same power budget, Z-Phone integrates the DSP voice encoding/decoding and analogue microphone/audio interface.In our low-power optimized Bluetooth reference system design, the standby time can reach up to 175.91 h (mode not available in

Table 4 :
Power consumption: Z-Phone vs Bluetooth.

Table 6 :
Distribution of Speex arithmetic-move instructions based on loop level and number of operations per cycle (1-5).

Table 7 :
Distribution of Speex move-move instructions based on loop level and number of operations per 2 clock cycles.