Using Imperfect Transmission in MEC Offloading to Improve Service Reliability of Time-Critical Computer Vision Applications

Emerging time-critical Internet-of-Things (IoT) use cases, e.g., augmented reality (AR), virtual reality (VR), and autonomous vehicles, involve computation-intensive computer vision (CV) components, and their computation tasks must be completed within stringent latency constraints; otherwise, service reliability deteriorates. State-of-the-art work has shown that this challenge can be tackled by offloading the computation tasks to edge servers (ESs) using mobile edge computing (MEC). However, offloading tasks from local IoT devices to remote ESs can cause communication errors, resulting in transmission failure or even service timeout. Existing work mainly requires perfect transmission during task offloading at the physical or transport layer. In fact, CV algorithms for, e.g., image classification and recognition, are able to tolerate a certain level of distortion in the input image while maintaining the required inference accuracy. In this paper, we focus on service reliability at the application layer and study how feasible it is to improve the service reliability of time-critical CV services in an MEC system by allowing imperfect transmission. The service reliability is modeled by the transmission failure probability, service timeout probability, and inference accuracy. The optimization goal is to maximize the service reliability subject to the latency constraint. Due to the non-convexity of the problem, we solve it with a semi-definite relaxation based algorithm for a multi-user scenario. We evaluate the algorithm in practical scenarios, i.e., object detection with SSD and YOLOv2. The proposed algorithm achieves performance close to that of the exhaustive method but at a much lower complexity.


I. INTRODUCTION
A variety of time-critical Internet-of-Things (IoT) use cases, e.g., augmented reality (AR), virtual reality (VR), autonomous vehicles, and remote surgery, have attracted widespread interest in recent years. These use cases face the challenges of stringent latency and high reliability requirements. For example, the cooperative collision avoidance service in autonomous driving requires a maximal latency of 10 ms and a reliability of 99.99% [1]. However, due to the limited computation capability of user equipments (UEs), it is challenging
for UEs to complete computation-intensive tasks within the delay deadline by local processing alone [2]. Mobile edge computing (MEC) is a promising paradigm to shorten service latency [3], [4] by offloading computational tasks from UEs to powerful computation units in the proximity of UEs, i.e., edge servers (ESs). However, communication errors during the offloading process can deteriorate transmission reliability and latency, thereby decreasing the overall service reliability.
Generally speaking, service reliability is affected by several factors. For use cases like AR, VR, and autonomous vehicles, object detection and classification [5], [6] are among the most popular computer vision (CV) computation tasks. The overall service reliability depends on the inference accuracy of the CV algorithms, the service timeout probability, and the transmission failure probability. Even when inference accuracy is high, service latency and transmission reliability have a significant impact on the user experience (UoE) of time-critical CV services in an MEC system. To improve the UoE of CV services, the state-of-the-art work mainly requires perfect communication, i.e., zero bit errors or packet errors during task offloading [7]-[9]. When a transmission error occurs, a retransmission scheme is used to guarantee reliable data delivery between UE and ES, which can exacerbate the communication delay. For example, when an AR headset transmits a captured image to an ES, the transmission control protocol (TCP) is used to ensure that all packets are delivered to the ES. Due to the overall latency constraint of time-critical IoT applications, increasing communication delay means reducing the computation time budget. As a result, it can increase the service timeout probability.
In fact, CV algorithms for, e.g., image classification and recognition, are able to tolerate a certain level of distortion in the input image while keeping the required inference accuracy [10]. For instance, the faces in the Extended Yale B Face Database can still be recognized if 13.3% or fewer pixels of an image are corrupted [11]. Allowing a small amount of image distortion can thus relax the transmission reliability requirements. In other words, a few errors are acceptable in a task transmission, e.g., an image sent by a UE, which helps to reduce the service timeout probability while maintaining the service accuracy.
The main contribution of this paper is a study of how feasible it is to improve the overall reliability of time-critical CV services in an MEC system by allowing imperfect transmission. An MEC system with multiple UEs and ESs is considered. We systematically model the latency and reliability of time-critical CV services under image distortion, and formulate an optimization problem to maximize the overall service reliability subject to the latency constraint. Due to its non-convexity, a semi-definite relaxation (SDR) based algorithm is designed to achieve a sub-optimal solution for the UE-ES association as well as the communication and computation resource allocation. We evaluate the performance of the proposed algorithm using practical object detection methods (SSD [5] and YOLOv2 [6]) in multiple scenarios. The results show that the SDR-based algorithm achieves performance close to that of the exhaustive method at a much lower complexity.
The rest of the paper is organized as follows. Section II elaborates on the related work. Section III describes the system model and problem formulation. Section IV explains how the SDR-based optimization algorithm solves the formulated problem. Section V presents the numerical results. Finally, Section VI concludes the paper.

II. RELATED WORK
Many researchers have studied the reliability of computation offloading in MEC from the perspective of different layers of the communication protocol stack. The work in [12], [13] studies reliability at the application layer. Merluzzi et al. [12] define an out-of-service probability as the probability that the overall latency to complete a task exceeds a threshold. They propose an algorithm to strike a balance between energy consumption and task delay for computation offloading, subject to a reliability constraint. Liu et al. [13] model the reliability of computation offloading as the probability that the queue lengths at both the ES and the UE do not exceed their thresholds. The work in [14]-[17] considers the reliability of wireless transmission at the PHY layer. Azimi et al. [14] study the transmission failure probability using superposition coding on a wireless fading channel, and optimize energy consumption under latency and reliability constraints. Liu and Zhang [15] jointly minimize the transmission failure probability and latency by allowing multiple ESs to cooperatively complete a task. Zhou et al. [16] maximize the successful transmission probability within latency constraints by optimizing the transmitted data bits in each time slot. Wang et al. [17] consider the successful transmission probability under worst-case channel conditions and leverage a conditional value-at-risk approach to ensure reliable transmission for computation offloading. Moreover, other work in [3], [18], [19] considers the computation reliability of the ES, i.e., the probability that the hardware or software breaks down. Haber et al. [18] introduce multiple Unmanned Aerial Vehicle mounted cloudlets as ESs and leverage redundancy to meet the reliability requirements of computation offloading. The work in [3], [19] proposes to jointly improve both the communication and computation reliability of offloading.
Liu and Zhang [3] model a task as a directed acyclic graph and schedule the allocation of sub-tasks to minimize the transmission and computation failure probability. Liu and Zhang [19] propose an online algorithm to maximize the communication and computation reliability by dynamically selecting transmission rates and ESs.
For CV services, the inference accuracy of the CV algorithms can be regarded as the computation reliability. Offloading CV tasks to MEC allows advanced CV algorithms to be applied with reduced service latency while guaranteeing the computation reliability. Various offloading strategies have been proposed in the literature, including input rescaling [7], DNN partitioning [9], [20], [21], and early-exit points in DNNs for fast inference [8], [22]. Liu et al. [7] propose to dynamically rescale the input images to adapt to the degradation of the wireless channel, thereby making a tradeoff between the task runtime and the inference accuracy. Instead of offloading the input images to the ES directly, the work in [9], [20] partitions the DNN computation between the UE and the ES at the granularity of NN layers. Kang et al. [20] study the optimal partitioning points of 8 different DNN models to minimize the overall latency and energy consumption. As the size of the intermediate layer data in a DNN is large, to reduce the transmission delay on the wireless channel, Eshratifar et al. [9] utilize a JPEG compressor to encode the intermediate data and introduce a fine-tuning method to reduce the inference accuracy loss caused by the lossy compression. Instead of partitioning the DNN between UE and ES as in [9], [20], Zhang et al. [21] allocate the DNN task among multiple ESs and propose an algorithm to jointly optimize the task latency and resource utilization in the MEC system. Furthermore, the work in [8], [22] leverages early-exit points in DNNs to reduce the runtime of CV services. Teerapittayanon et al. [22] design a new DNN architecture, BranchyNet, where several early-exit points are set in the DNN and the inference can be terminated at an early-exit point once high confidence is reached. Li et al. [8] combine early-exit points with DNN partitioning in an MEC system. They partition the computation of a shallow NN obtained by early exiting the test DNN.
To minimize latency, the early-exit point and partitioning point are dynamically optimized according to the channel data rate and the runtime of each DNN layer profiled at the offline stage.
Most of the previous work relies on reliable transmission schemes to offload CV services to MEC, which guarantees the inference accuracy but can incur retransmission delay and control overhead. In contrast, Liu and Zhang [23] conduct a series of experiments to investigate how image distortion caused by unreliable transmission affects the inference accuracy and service reliability of CV tasks in an MEC system in a real-life environment. Compared to the work in [23], in this paper we analyze the overall service reliability of CV services in an MEC system with multiple UEs and multiple ESs, and systematically model the communication reliability at the PHY layer, the service timeout probability, and the computation reliability, i.e., the inference accuracy of the CV services.

III. SYSTEM MODEL AND PROBLEM FORMULATION

A. SYSTEM MODEL
As illustrated in Fig. 1, we consider an MEC system with multiple UEs and ESs. To manage the multiple ESs, a software-defined networking (SDN) architecture can be introduced into the MEC system [24], [25]. The SDN controller, referred to as an edge orchestrator, maintains up-to-date global status information of the system, e.g., the channel state information (CSI) between UEs and ESs, the available computing capacity of the ESs, and the service requirements, and optimizes the resource allocation and offloading strategy for the UEs. UEs are granted communication and computation resources through the control plane after issuing service requests. The CV tasks are subsequently offloaded to the ESs in the data plane. In the SDN architecture, the physical communication channel of the control plane is independent of that of the data plane. In this paper, we mainly focus on the latency and reliability of computation offloading in the data plane.
The sets of UEs and ESs are denoted by M and N, respectively. The computing capacity of ES n is f_n^max (in cycles/s). The computation task of UE m is denoted by a tuple (b_m, α_m), where b_m represents the task size (in bits) and α_m is the computation intensity of the task, i.e., the number of CPU cycles required to compute one bit of the task (in cycles/bit). We use a vector x_m to denote the association between UE m and the ESs: if UE m is associated with ES n, the element x_{m,n} in x_m is 1; otherwise, x_{m,n} = 0. Note that each UE is associated with exactly one ES, i.e., Σ_{n∈N} x_{m,n} = 1 for all m ∈ M. The wireless channels of different ESs are orthogonal. UEs offload tasks to their associated ESs with an orthogonal frequency-division multiple access (OFDMA) scheme. k_n^max denotes the total number of resource blocks (RBs) on the channel of ES n, which are shared by the associated UEs.
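As a quick sanity check of the notation above, the association and RB-sharing constraints can be encoded directly. The sketch below uses illustrative sizes (4 UEs, 2 ESs) and allocations that are assumptions for demonstration, not values from the paper.

```python
import numpy as np

M, N = 4, 2                        # illustrative numbers of UEs and ESs (assumed)
k_max = np.array([50, 50])         # RBs available on each ES channel (assumed)

# x[m, n] = 1 iff UE m is associated with ES n; each UE picks exactly one ES
x = np.zeros((M, N), dtype=int)
x[np.arange(M), [0, 0, 1, 1]] = 1
assert (x.sum(axis=1) == 1).all()      # sum_n x_{m,n} = 1 for every UE m

# k[m, n]: RBs assigned to UE m by ES n; only the associated ES allocates RBs
k = x * 25
assert (k.sum(axis=0) <= k_max).all()  # sum_m k_{m,n} <= k_n^max for each ES
```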
Since the data size of a task output is normally much smaller than that of the input, the latency caused by downlink transmission is negligible [7], [13], [15].

B. LATENCY AND RELIABILITY MODEL
The latency to complete a task is composed of the communication and computation latency. Due to the fluctuation of the wireless channel, proper modulation and coding schemes (MCSs) are required to ensure a constant transport block error rate (BLER). In practice, the proper MCS for the task transmission can be determined based on the estimated CSI. How to optimally select the MCS based on CSI for different communication channels is a classical link adaptation or rate adaptation problem in wireless and mobile communication, which has been well studied in the literature [26]-[28]. Note that the selected MCS has a fixed transmission rate on each RB. Therefore, in this article, we mainly focus on the RB and computation allocation among UEs. For a given MCS, γ_{m,n} denotes the transmission rate of one RB from UE m to ES n (in bits/s). k_{m,n} denotes the number of RBs assigned to UE m by ES n, which satisfies Σ_{m∈M} k_{m,n} ≤ k_n^max. Thus, the latency to transmit the task from UE m to its associated ES is

T_m^comm = Σ_{n∈N} x_{m,n} b_m / (k_{m,n} γ_{m,n}).

We assume that ES n assigns computing capacity f_{m,n} to UE m, where Σ_{m∈M} f_{m,n} ≤ f_n^max. As a result, the task runtime of UE m at the associated ES is

T_m^comp = Σ_{n∈N} x_{m,n} α_m b_m / f_{m,n}.

Therefore, the latency to complete the task of UE m is T_m = T_m^comm + T_m^comp. The overall service reliability is modeled as a quality of inference (QoI), which includes the transmission failure probability, the timeout probability, and the inference accuracy. For a given MCS, the number of transport blocks (TBs) to offload a task of UE m, N_m^TB, can be calculated by

N_m^TB = ⌈T_m^comm / τ⌉,

where τ is the duration of one TB (in seconds) and ⌈·⌉ is the ceiling function. Given a target BLER of η, the conventional way to define the transmission failure probability of UE m is 1 − (1 − η)^{N_m^TB} [14]-[16]. However, when a small distortion of an image is allowed, a transmission can be regarded as successful as long as the number of lost TBs does not exceed a threshold.
Thus, considering that TBs are independent of each other, the transmission failure probability can be modeled as

F_m^tr = 1 − Σ_{i=0}^{⌊θ N_m^TB⌋} C(N_m^TB, i) η^i (1 − η)^{N_m^TB − i},

where ⌊·⌋ is the floor function, C(·, ·) is the binomial coefficient, and θ denotes the maximal allowed percentage of distorted pixels (0 ≤ θ ≤ 1). Moreover, due to the latency constraint, the timeout probability of UE m is F_m^to = P{T_m > δ_m}, where δ_m is the latency threshold for UE m. In this way, the offloading failure probability (OFP) of UE m is defined as the probability that the offloading fails due to either transmission failure or timeout. Note that, for the case of completing one task only, the OFP is equivalent to

F_m^of = 1 − (1 − F_m^tr)(1 − F_m^to).

We assume the inference accuracy of UE m, A_m^Infer, can be guaranteed if the distortion in an image is not more than θ. Hence, the QoI of a CV service is defined as

QoI_m = A_m^Infer (1 − F_m^of).
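The latency and reliability model above can be sketched end-to-end: the binomial form of the transmission failure probability follows from independent TB losses, and the QoI is taken as the inference accuracy times the offloading success probability. All numeric values below (task size, rates, capacity, TB duration, BLER) are illustrative assumptions.

```python
import math

def offload_latency(b, alpha, k, gamma, f, tau):
    """Latency pieces for a b-bit task: uplink time, ES runtime, and TB count.

    alpha: cycles/bit, k: assigned RBs, gamma: bits/s per RB,
    f: assigned computing capacity (cycles/s), tau: TB duration (s)."""
    t_comm = b / (k * gamma)          # transmission latency
    t_comp = alpha * b / f            # task runtime at the ES
    n_tb = math.ceil(t_comm / tau)    # number of transport blocks
    return t_comm + t_comp, n_tb

def tx_failure_prob(n_tb, eta, theta):
    """Fails only if more than floor(theta*n_tb) TBs are lost (i.i.d. BLER eta)."""
    max_lost = math.floor(theta * n_tb)
    ok = sum(math.comb(n_tb, i) * eta ** i * (1 - eta) ** (n_tb - i)
             for i in range(max_lost + 1))
    return 1.0 - ok

def qoi(a_infer, f_tr, f_to):
    """QoI = accuracy x success probability, with OFP = 1-(1-F_tr)(1-F_to)."""
    return a_infer * (1 - f_tr) * (1 - f_to)

# illustrative: 0.52 Mbit image, 238 cycles/bit, 10 RBs at 0.5 Mbit/s, 5 GHz ES
total, n_tb = offload_latency(0.52e6, 238, 10, 0.5e6, 5e9, 1e-3)
strict = 1 - (1 - 1e-3) ** n_tb              # conventional zero-error requirement
relaxed = tx_failure_prob(n_tb, 1e-3, 0.13)  # tolerate up to 13% lost TBs
assert relaxed < strict                      # distortion tolerance lowers failure
```

Tolerating even a small fraction of lost TBs moves the failure event far into the tail of the binomial distribution, which is why the relaxed probability is orders of magnitude below the strict one.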

C. PROBLEM FORMULATION
We formulate an optimization problem to maximize the average QoI of the multiple UEs while allowing partial distortion in a task, subject to the latency constraint:

P1: max_{x, k, f} (1/|M|) Σ_{m∈M} QoI_m
s.t. T_m ≤ δ_m, ∀m ∈ M,
Σ_{n∈N} x_{m,n} = 1, x_{m,n} ∈ {0, 1}, ∀m ∈ M, ∀n ∈ N,
Σ_{m∈M} k_{m,n} ≤ k_n^max, Σ_{m∈M} f_{m,n} ≤ f_n^max, ∀n ∈ N.

IV. SDR-BASED OPTIMIZATION ALGORITHM
In this section, we propose an algorithm based on SDR to solve P1. The key idea is to relax the problem into a quadratically constrained quadratic programming (QCQP) problem. The transformed problem is then homogenized and solved by dropping the rank-one constraint.
P2 is a mixed-integer nonlinear programming problem. After relaxing the binary association variables and stacking the optimization variables of all UEs into a vector y of dimension Q = M(N + 2), the problem is transformed into the QCQP form P4. To homogenize P4, a new variable Z = [y; 1][y; 1]^T is introduced, which is a rank-one symmetric positive semidefinite matrix. The (Q + 1, Q + 1)-th element of Z equals 1, i.e., Z_{Q+1,Q+1} = 1. Therefore, the equivalent formulation of P4 is presented as P5. It is noticed that, except for the rank-one constraint Rank(Z) = 1 in (13e), the objective function and the other constraints in P5 are convex. To solve the problem, we relax P5 by dropping the rank-one constraint. The relaxed problem can be solved by a convex programming solver, e.g., CVX [29]; the resulting solution is denoted by Ẑ. If Ẑ is a rank-one matrix, it is obviously the solution of P5. Otherwise, if Rank(Ẑ) > 1, it is necessary to convert Ẑ into a feasible solution of the problem.
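The homogenization step above can be checked numerically: stacking y with a trailing 1 and taking the outer product yields a rank-one positive semidefinite matrix whose bottom-right entry is 1. The dimension Q below is an arbitrary illustration, not a value from the paper.

```python
import numpy as np

Q = 6                                   # stands in for Q = M(N + 2) (assumed)
rng = np.random.default_rng(0)
y = rng.standard_normal(Q)

y1 = np.append(y, 1.0)                  # [y; 1]
Z = np.outer(y1, y1)                    # Z = [y; 1][y; 1]^T

assert np.linalg.matrix_rank(Z) == 1    # rank-one by construction
assert np.isclose(Z[Q, Q], 1.0)         # Z_{Q+1,Q+1} = 1
eigs = np.linalg.eigvalsh(Z)
assert eigs.min() > -1e-9               # positive semidefinite
```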
Based on [29], Gaussian randomization is utilized to obtain a feasible solution. We first extract Z_S, the upper-left Q × Q sub-matrix of Ẑ. Then a random column vector ξ is generated, following a Gaussian distribution with zero mean and covariance Z_S, i.e., ξ ∼ N(0_{Q×1}, Z_S). Here ξ is a random sample in the solution space of P4; however, it may not meet the constraints of the problem. According to (11a), ξ is rescaled to approximate a feasible solution ξ̂. We subsequently utilize a sigmoid function sig(x) = 1/(1 + e^{−x}) to round the association variables toward binary values. This procedure is repeated for L randomly generated samples (l ≤ L), and among the L sets of results, the one that maximizes the objective function of P1 is taken as the final solution. The proposed method is summarized in Algorithm 1.

Algorithm 1 SDR-Based Algorithm
1: solve the relaxed problem of P5 to obtain Ẑ;
2: if Rank(Ẑ) = 1 then
3: extract the first Q elements from the diagonal of Ẑ;
4: calculate square roots of the extracted diagonal elements;
5: obtain optimal x̂_m, k̂_m and f̂_m based on the square roots;
6: else
7: extract the upper-left Q × Q sub-matrix of Ẑ as Z_S;
8: for l = 1 to L do
9: generate ξ ∼ N(0_{Q×1}, Z_S), rescale it to ξ̂, and round the association variables with sig(·);
10: end for
11: select the result among the L samples that maximizes the objective of P1;
12: end if

We provide the computational complexity analysis of the proposed algorithm as follows. The worst-case complexity of solving a semi-definite programming problem is O(max{g, h}^4 h^{1/2} log(1/ε)) [30], where g denotes the number of constraints, h denotes the dimension of the symmetric matrix Z, and ε is the solution accuracy (ε > 0). In our case, the number of constraints is 2M + 2N + 1, while the dimension of Z is MN + 2M + 1. Moreover, the maximal number of loops to obtain the feasible solution of P5 is 2LMN. Therefore, for the MEC system with multiple UEs, the overall worst-case complexity of the proposed algorithm is O((MN + 2M + 1)^{4.5} log(1/ε) + 2LMN).
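The Gaussian randomization step of Algorithm 1 can be sketched as below. Here `objective` and `make_feasible` are hypothetical placeholders for P4's objective and its rescaling/rounding toward feasibility, which are problem-specific and not reproduced in full.

```python
import numpy as np

def randomize(Z_hat, L, objective, make_feasible):
    """Gaussian randomization for a relaxed SDR solution with Rank(Z_hat) > 1.

    Z_hat: (Q+1)x(Q+1) relaxed solution.
    objective / make_feasible: problem-specific callables (placeholders).
    """
    Q = Z_hat.shape[0] - 1
    Z_s = Z_hat[:Q, :Q]                      # upper-left Q x Q sub-matrix
    rng = np.random.default_rng(0)
    best, best_val = None, -np.inf
    for _ in range(L):
        xi = rng.multivariate_normal(np.zeros(Q), Z_s)  # xi ~ N(0, Z_S)
        cand = make_feasible(xi)             # rescale/round toward feasibility
        val = objective(cand)
        if val > best_val:                   # keep the best of the L samples
            best, best_val = cand, val
    return best
```

A toy usage, with clipping standing in for the rescaling step: `randomize(np.eye(4), 20, lambda y: -float(((y - 0.5) ** 2).sum()), lambda y: np.clip(y, 0.0, 1.0))` returns the sample closest to 0.5 after clipping to [0, 1].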

V. NUMERICAL RESULTS
In this section, we evaluate the proposed SDR-based algorithm in different scenarios. The number of ESs in the system is set to 3. The channel bandwidth of each ES is 10 MHz with 50 RBs. The signal-to-noise ratio (SNR) between the UEs and the ESs follows a uniform distribution. Based on the SNRs, the UEs select the proper MCSs to guarantee that the BLER does not exceed 10^{-7}. The SNR-MCS mapping is obtained from the link abstraction model of LTE networks [28]; our algorithm still works with an SNR-MCS mapping from a different wireless communication system. Various MCSs are available by combining different modulations (QPSK, 16QAM, 64QAM) and coding rates (from 1/9 to 9/10). We use object detection as an example of the CV algorithms used in IoT applications. The maximal tolerable image distortion θ is set to 13% [11]. Table 2 summarizes the default simulation parameters.
We compare the SDR-based algorithm with three schemes: 1) the Exhaustive scheme; 2) the Distance-based algorithm; and 3) the scheme with a perfect input image, i.e., the case θ = 0%. In the Distance-based algorithm, a UE is always associated with the closest ES; given the associations, the RBs and computing capacity are allocated by the interior-point method. We evaluate the QoI and OFP of the algorithms. The numerical results are averaged over 5000 Monte Carlo simulations. Note that timeouts can be caused both by the resource limits in the system and by infeasible solutions of the algorithms.
In Fig. 2 and Fig. 3, we present performance comparisons for different task sizes, i.e., input image sizes for object detection. The tick labels on the x-axis denote the dimensions of the images, e.g., 300x denotes a 300 × 300 image. The ticks 300x, 375x, 416x, 512x and 600x correspond to task sizes of 0.27 Mbits, 0.42 Mbits, 0.52 Mbits, 0.78 Mbits and 1.08 Mbits (8 bits/pixel in gray scale), respectively. The number of UEs is set to 5. Fig. 2 shows that the OFP of the proposed algorithm is close to that of the Exhaustive scheme but with much lower computational complexity. The Distance-based algorithm has a much higher OFP than the SDR-based algorithm, which is mainly caused by a higher timeout probability. Compared with the third scheme, i.e., using a perfect input image, the proposed algorithm allowing 13% distortion improves the OFP by one order of magnitude when the task size is not larger than 416x. For the case δ_m = 33 ms (the frame rate of a camera is normally 30 frames per second), when offloading images of 512x and 600x, the SDR-based algorithms have similar OFP regardless of whether distortion is allowed. The reason is that the timeout probability dominates the OFP due to the limited resources for processing tasks of large size. When δ_m is relaxed to 40 ms, allowing 13% distortion reduces the OFP by nearly one order of magnitude compared with the scheme with a perfect input image. Fig. 3 illustrates how the task size impacts the QoI of the schemes. The inference accuracy of different image sizes is modeled by A^Infer = 1 − 1.578 e^{−6.5×10^{−3} d_img} [7], where d_img denotes the dimension of the image. Note that A^Infer increases with increasing d_img. The figure shows that the SDR-based algorithm has a similar QoI to the Exhaustive scheme. It achieves the maximal QoI when the image dimension is 512x, at which point it strikes a balance between the OFP and the inference accuracy. The proposed algorithm clearly outperforms the Distance-based scheme.
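The accuracy model A^Infer = 1 − 1.578 e^{−6.5×10^{−3} d_img} quoted above can be evaluated directly for the image dimensions used in the figures; the check below only verifies the monotone trend stated in the text.

```python
import math

def a_infer(d_img):
    """Inference accuracy vs. image dimension: 1 - 1.578 * exp(-6.5e-3 * d)."""
    return 1 - 1.578 * math.exp(-6.5e-3 * d_img)

# the image dimensions used on the x-axis of Fig. 2 and Fig. 3
accs = {d: a_infer(d) for d in (300, 375, 416, 512, 600)}
assert accs[300] < accs[416] < accs[512] < accs[600]   # larger input, higher accuracy
```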
In addition, we only show the performance of the SDR-based algorithm allowing 13% distortion in Fig. 3, as the algorithm with a perfect input image does not improve the QoI performance. Note that the 13% distortion here is the upper bound that the CV algorithms can tolerate. On the one hand, once the distortion of the input image exceeds 13%, the accuracy of object detection declines rapidly, thereby significantly decreasing the QoI. On the other hand, the SDR-based algorithm allowing less than 13% distortion achieves OFP and QoI similar to the case θ = 13%, since the service latency is the same and the inference accuracy is very close. Fig. 4 and Fig. 5 show how the computational intensity of a task, α_m, affects the performance of the algorithms. Here, the number of UEs is 5, and the image size is 512x. The values on the x-axis, i.e., 167 cycles/bit, 238 cycles/bit, 500 cycles/bit and 825 cycles/bit, are obtained by testing MobileNetV1-based SSD, MobileNetV2-based SSD, YOLOv2, and VGG16-based SSD, respectively. In Fig. 4, the OFP of the SDR-based algorithm is lower than that of the other schemes. For α_m of 167 and 238 cycles/bit, the proposed algorithm achieves one order of magnitude lower OFP than the zero-distortion case when δ_m = 66 ms. When processing tasks with larger computation intensity α_m and a smaller latency threshold δ_m, the OFP of the algorithms naturally increases and is gradually dominated by the timeout probability. Fig. 5 illustrates the QoI with varying computation intensity. Note that, without the time constraint, the maximum inference accuracy of MobileNetV1-based SSD, MobileNetV2-based SSD, YOLOv2, and VGG16-based SSD is 0.725, 0.736, 0.786 and 0.799, respectively, as tested on the Pascal VOC2007 dataset. The figure shows that the proposed algorithm is able to run advanced object detection algorithms and achieve a high QoI under the different time constraints.
The reason is that the proposed algorithm optimizes the radio resource allocation and shortens the communication latency, thereby obtaining more time budget to carry out the computation. For example, when δ_m = 40 ms, although both tested schemes can run MobileNetV2-based SSD (α_m = 238 cycles/bit) to detect objects in the image, the proposed scheme achieves a QoI of 0.736, which is the maximum inference accuracy of MobileNetV2-based SSD, while the Distance-based scheme does better to fall back to MobileNetV1-based SSD (α_m = 167 cycles/bit), achieving a QoI of 0.69. When δ_m is relaxed to 66 ms, the proposed algorithm is capable of executing YOLOv2 (α_m = 500 cycles/bit) to achieve the maximal QoI of 0.786.
In Fig. 6, we compare the OFP of the tested algorithms for different numbers of UEs. The task size is set to 0.78 Mbits. The OFP of the proposed SDR-based algorithm is much lower than that of the Distance-based algorithm. When δ_m = 33 ms, the OFP of the SDR-based algorithms increases monotonically with the number of UEs, regardless of whether distortion is allowed; in this case, the timeout probability dominates the OFP. However, when δ_m is relaxed to 40 ms, allowing distortion achieves a lower OFP. Furthermore, the OFP gap between the proposed algorithms and the Distance-based scheme widens significantly.

VI. CONCLUSION
CV algorithms for, e.g., object detection and classification, are able to tolerate a certain level of image distortion while keeping the required inference accuracy. Motivated by this feature, we investigated the impact of image distortion on the overall service reliability of time-critical IoT services using edge intelligence. We systematically modeled the reliability as QoI, and formulated an optimization problem to maximize the average QoI for multi-UE scenarios. The problem is solved by the SDR-based algorithm, which sub-optimally allocates communication and computation resources in the MEC system. The numerical results show that the proposed algorithm achieves performance similar to the Exhaustive scheme but at lower complexity. By allowing a small distortion in the image, the OFP can be reduced by one order of magnitude without any compromise in the inference accuracy. The proposed algorithm can also determine the most suitable deep learning method for a time-critical IoT service by striking a balance between the OFP and the inference accuracy.
In the current work, we assume that all pixels have equal importance for pattern recognition or object detection tasks. In practice, this might not be the case. Therefore, it is interesting to study resource allocation schemes and offloading strategies that jointly take image encoding into account.