Online data poisoning attack against edge AI paradigm for IoT-enabled smart city

: The deep integration of edge computing and Artificial Intelligence (AI) in IoT (Internet of Things)-enabled smart cities has given rise to new edge AI paradigms that are more vulnerable to attacks such as data and model poisoning and evasion of attacks. This work proposes an online poisoning attack framework based on the edge AI environment of IoT-enabled smart cities, which takes into account the limited storage space and proposes a rehearsal-based buffer mechanism to manipulate the model by incrementally polluting the sample data stream that arrives at the appropriately sized cache. A maximum-gradient-based sample selection strategy is presented, which converts the operation of traversing historical sample gradients into an online iterative computation method to overcome the problem of periodic overwriting of the sample data cache after training. Additionally, a maximum-loss-based sample pollution strategy is proposed to solve the problem of each poisoning sample being updated only once in basic online attacks, transforming the bi-level optimization problem from offline mode to online mode. Finally, the proposed online gray-box poisoning attack algorithms are implemented and evaluated on edge devices of IoT-enabled smart cities using an online data stream simulated with offline open-grid datasets. The results show that the proposed method outperforms the existing baseline methods in both attack effectiveness and overhead.


Introduction
Artificial Intelligence (AI) services have been widely adopted in various fields of smart city such as industrial manufacturing, enterprise services and daily consumption.These services, including unmanned driving, e-commerce, smart homes, and smart finance, have profoundly transformed people's lifestyles and enhanced production efficiency [1,2].Edge computing has become popular due to its advantages of ultra-low latency, energy efficiency, and strong scalability, which allows it to share the computing resources and service pressure of the cloud center and optimize the computing architecture of AI services.This in turn creates favorable conditions for pushing the AI frontier to the IoT (Internet of Things)-enabled edge, which resides at the last mile of the Internet [3].The continuous convergence of edge computing and artificial intelligence has led to the emergence of a new paradigm called edge intelligence (edge AI paradigm) [4][5][6].The edge AI paradigm enables end entities in the networks to make decisions based on local data instead of sending it to the remote cloud [7].The deployment of AI models on edge nodes enables AI training and inference and provides AI services to terminal devices.However, the edge AI paradigm is more vulnerable to attacks due to less potent security protocols on the resource-constrained edge hardware [8].In the edge AI environment, attackers can easily masquerade as legitimate user terminals to generate malicious data online and attack the edge AI model.Therefore, it is imperative to evaluate potential attacks that can target AI models at the edge, especially in the context of smart cities.
Among the potential attacks, the most destructive attack is data poisoning attacks (DPA).Current offline DPA are not suitable for the online learning process used in the edge AI paradigm, where most learning tasks involve predicting continuous data rather than classification, unlike the image processing or classification scenarios that data poisoning attacks primarily focus on.Although some studies have investigated online DPA based on resource-rich environments, these methods are not applicable in resource-constrained IoT-enabled smart cities environments, where the problem of periodic overwriting of training samples cannot be handled.Moreover, existing online attack methods use randomly selected sample points for attacks, which are not ideal for expensive bi-level optimization attack strategies.Therefore, existing research on DPA is not suitable and there is a need to optimize existing online attacks to adapt to resource-constrained environments, while enhancing the efficiency of attacks under online mode.Therefore, the main contributions of this work are as follows: • It proposes an online poisoning attack framework based on the edge AI environment of IoTenabled smart city for the first time.The framework takes into account the limited storage space in the AI edge environment and proposes a rehearsal-based buffer mechanism to manipulate the model by incrementally polluting the sample data stream that arrives at the appropriately sized cache to optimize the efficiency of the attack.
• It proposes a maximum-gradient-based sample selection strategy that overcomes the problem of periodic overwriting of the sample data cache after training.This strategy converts the operation of traversing historical sample gradients into an online iterative computation method.
• It proposes a maximum-loss-based sample pollution strategy that solves the problem of each poisoning sample being updated only once in the gradient ascent direction in basic online DPA.This strategy transforms the bi-level optimization problem from the offline mode to the online mode.
• It implements online gray-box poisoning attack algorithms with the framework and strategies mentioned above.It evaluates the effectiveness and overhead of the proposed attack on edge devices of IoT-enabled smart city using an online data stream simulated with offline open-grid datasets.
The rest of this paper is organized as follows.Section 2 presents the related works on data poisoning attack.Section 3 describes the basic settings, symbol notations and related issues of the five elements relevant to offline and online DPA.Section 4 presents an online incremental poisoning attack framework in the edge AI environment of IoT-enabled smart cities and provides a detailed description of the proposed sample selection and pollution strategies.Section 5 introduces online algorithms for gray-box poisoning attack with maximum-gradient-based sample selection strategy and maximumloss-based sample pollution strategy.Section 6 presents the experiment and result analysis.Finally, Section 7 concludes this paper.

Related works
Offline DPA has received extensive attention in the research community, mainly focusing on interfering with the training process of offline or batch learning algorithms.In this setting, attackers repeatedly poison randomly selected samples in the direction of maximum gradient and construct a poisoned sample set that maximizes the loss.At the end of attacks, the constructed poisoned sample set is inserted into the end of the legitimate sample set one-time.Since the pioneering work of the Biggio team [9], DPA has undergone significant development, and a large amount of research has been carried out based on their work.Among them, the Mei team [10] formalized the poisoning problem as a bi-level optimization problem.To improve efficiency, some teams have proposed label flipping [11] and statistically-based [12] poisoning methods, which do not require model fitting and reduce the algorithmic complexity.Although these two methods have a low algorithmic complexity, they are easily detected and discarded by human examiners or automated detectors.For most machine learning or artificial intelligence models [13], the gradient ascent method is the most computationally expensive method, but it is the most effective and confidential [14].
Online DPA has drawn increasing attention in recent years.In the setting of online DPA, attackers contaminate the arriving samples in a specific order to achieve the attack objective of accumulating loss.There are four main challenges brought to offline DPA in online environment.First, due to the inability to obtain the entire sample set, the baseline clean dataset for constructing poison samples can only be built from the current cache or historical sample set.Second, the order in which samples arrive is also a factor to be considered in poisoning attacks.Third, offline DPA can poison any position in the sample set, while in online mode, only the current cache samples can be poisoned.Fourth, high-cost attack methods in offline mode may become inappropriate.To solve the problem of unknown sample sets, Burkard and Lagesse [15] proposed heuristic attacks against support vector machines (SVM) learning from data streams.This method is more like fake online attacks (with full knowledge of future samples, referred to as the clairvoyant online DPA [16]), which obviously does not conform to the premise assumption of online mode.Zhang et al. [16] and Margiotta et al. [17] used the Markov decision process method to model the online DPA problem, which is also based on the premise of knowing the probability distribution of the samples.Although they also propose to build an increasingly accurate empirical distribution from historical sample data, it cannot solve the problem of high cost of model predictive control and sample distribution bias.Some papers [18][19][20] have studied the calculation and optimization methods of sample influence, but unfortunately, these methods are based on the hat matrix or Hessian matrix constructed from the entire sample set in offline mode and are not applicable to online mode.The work closest to ours is Wang and Chaudhuri [21], applied gradient-based offline methods to online DPA and proposed a sample selection method based on maximum recursive gradient.Moreover, in edge AI environments where historical samples are periodically overwritten, the absence of some historical samples makes it impossible to compute the recursive gradient.Therefore, our approach differs from theirs in that we adopt a rehearsal buffer-based method for calculating recursive gradient incrementally, which addresses the issue of periodic overwriting of the sample data cache after training in edge AI environment.

Preliminary
AI models typically contain five elements [22]: feature space, learning type (e.g., regression or SVM), learning algorithm, learning hyperparameters and training datasets.Based on attackers' degree of knowledge over these five elements and the type of elements involved, DPA can be classified into different types.This section describes the basic settings, symbol notation and related issues of the five elements relevant to offline and online DPA.The definitions of the symbols used in this paper are shown in Table 1.The samples that have been trained in the past time slices

𝛾
The poisoning rate

𝑛
The number of poisoned samples in time slice  _ Feature vector of the sample with the highest gradient

𝑝
The size of rehearsal buffer

Basic setting and notation
For the feature space ℝ , the total number of feature vectors and the dimension of each feature  (1).Equation (2) gives the expression of the objective function, where   ,  represents the loss function,   and  represent the regularization term and their corresponding regularization factor.Equation (3) gives the iterative process for solving the objective function using a learning algorithm (taking gradient descent algorithm as an example), where  represents the learning rate of the iteration,  and  represents the model parameter before and after one iteration of learning and     ,  represents the gradient of the objective function with respect to the model parameter.The learning algorithm terminates and obtains the optimal parameter when meeting the convergence condition  in Eq (4), which is usually set to 1 10 .
Learning algorithms can be divided into two types: offline and online.The former mainly includes algorithms such as stochastic gradient descent (SGD), mini-batch gradient descent (MBGD) and batch gradient descent (BGD).The latter primarily consists of algorithms like online gradient descent (OGD) and online mini-batch gradient descent (OMBGD).The biggest difference between offline and online learning algorithms is the way in which the training sample set is obtained and used [23].The sample set used by offline learning algorithms is known and fixed (can be trained repeatedly), while the sample set used by online learning algorithms is gradually obtained over time (each sample is only trained once) and future samples are unknown.Therefore, DPA is also divided into offline DPA and online DPA based on the different learning algorithms.

Offline DPA
For the above AI models, the basic offline DPA attack can be formalized as a bi-level optimization problem as shown in Eq (5), where  represents the clean test sample set,  represents the poisoned sample set,  * represents the poisoned parameter learned by the model and  is the baseline clean dataset used for constructing  .According to the definition of Eq (1), we can know that the inner optimization  * ∈    ∪,  in Eq (5) represents the usual minimization of the model loss during the fitting of a model on both the clean training dataset  and the poisoned dataset  and that the outer optimization    ,  * represents the maximization of the prediction loss under the influence of the poisoned parameter  * .Equation ( 6    ,  * .. * ∈     ∪  , Figure 1 shows the schematic diagram of offline DPA.In the figure, rectangles represent training samples, where green rectangles represent normal samples and red rectangles represent poisoned samples.Rounded squares represent model parameters, with red rounded squares representing poisoned parameters after the poisoning attack is completed.Diamonds represent decision conditions.Black solid arrows indicate the normal training process, which is demonstrated using the MBGD algorithm as an example in the figure, where the batch size is  ( 1 for SGD algorithm and   for BGD algorithm).The algorithm fits the model and computes parameters once using Eq (3) for each batch of samples until convergence is reached.Red dashed arrows represent the inner optimization loop mentioned in Eq (5), i.e., obtaining new convergent parameters through gradient descent after the poisoned sample is added to the training set.Red solid arrows represent the outer optimization loop mentioned in Eq (5), i.e., updating and maximizing the loss on the training sample set using Eq (6), obtaining the optimal poisoned parameter  * finally.To maintain the generalization of the model, the sample set is randomly reordered after each traversal and  is also selected from the training set randomly, which demonstrates that offline DPA does not consider any order of samples.,  up to time slice .The minimized loss is equivalent to the optimal loss obtained by offline training using all historical samples up to time slice .In the normal training process, the model parameters are updated using the samples in each time slice according to Eq (9).When the convergence condition in Eq (10) is reached, the optimal parameter  * are obtained.By substituting Eq (7) into Eq (10), it can be concluded that the convergence condition is equivalent to judging whether the loss of samples on the current time slice is sufficiently small.Due to the inability to use training samples for fitting the model iteratively as in the offline mode, it is necessary to determine whether the learning algorithm has reached the convergence condition after each parameter update.The red arrows show the process of online DPA.In this process, the attacker first selects an attack sample point in the current time slice, such as choosing the sample point   ,  from time slice ℎ to start the attack.According to Eq (11), the sample is poisoned,  and  represent the samples before and after poisoning.Then, the poisoned sample point is trained together with other normal sample points.In the same way, points are selected, polluted and the model is trained in the subsequent time slices ℎ to ℎ .Finally, at time slice , the convergence condition is reached and the optimal poisoned parameter  * is obtained.

Attack model
We define our attack model following the framework proposed in [24], which involves identifying attacker's goals and describing their knowledge and capabilities.This information is then utilized to define attack strategies.To simplify the problem description and express the proposed method clearly, from this section onward, we assume that the target model for the online DPA attack is linear regression model  ℎ   , with the regularization term set to zero and the batch size as .This means that the learning algorithm employs the OMBGD algorithm and the objective function is defined as the mean squared error loss function in Eq (12).

Attacker's goal
Consistent with offline DPA and basic online DPA objectives, the goal is to poison specific samples to cause mis-predictions selectively, while the availability attack aims to indiscriminately corrupt learning models by poisoning training samples.Unlike the basic online DPA goal in section 4.1, we attempt to maximize the attack effect for each poisoned sample point.

Attacker's knowledge
Based on understanding of the five elements mentioned in section 3, attacks can be divided into white-box, black-box and gray-box types [22].Since online DPA is unknown for future sample streams, attackers can only be aware of some training samples and cannot possess the knowledge of a whitebox attack.In reality, a completely black-box attack is also infeasible, as attackers need to understand partial training samples at least.Therefore, we consider a gray-box attack method for online DPA, where the attackers are assumed to have knowledge of the learning type (e.g., regression), learning algorithm and partial training samples but do not know the trained parameters.Another difference from basic online DPA is that it assumes that the sample data stream will be permanently stored on the AI service's device after training is completed.However, in the edge AI environment, data streams will be periodically overwritten, meaning that attackers can only be aware of the samples in the time slice stored in the buffer and they are unaware of both future samples and some historical samples.

Attacker's capability
The attacker's capability is limited to manipulating the training sample data; that is, altering the training process is not allowed.In basic online DPA, attackers have full control over samples in current or historical time slices.However, in this paper, the defined attacking capability is limited to having full control over the samples stored in the buffer of current time slice.The attack is constrained within a certain range, that is, the poisoning rate up to the current time slice  cannot exceed a certain limit , as it would expose the attack.We define the poisoning rate as  ∑  /Tb, where  is the number of poisoned samples in time slice .The attacker can choose to attack same number of samples in each time slice or vary the number of poisoned samples.In this paper, we assume that the number of poisoned samples in each time slice is same, making it easy to prove that   /b.

Attack strategy
Online DPA can be divided into four stages [24]: sample monitoring, attack point selection, data polluting and stream poisoning.The primary focus of the core strategy setting is on the attack point selection and data polluting stages.These two stages are used to construct the poisoned sample set.Then, the training samples containing the poisoned sample set are replayed to the target model in the final stage.After the attack point selection is complete, pollution of the sample points can involve polluting the feature vectors of the training sample stream, polluting the labels or polluting both simultaneously.Extensive literature has demonstrated that polluting the feature vectors yields the most optimal results.Therefore, the pollution strategy in this section will also focus on polluting the feature vectors.This section will emphasize the description of attack point selection and pollution strategies.

Maximum-gradient-based sample selection strategy
In online DPA, modifying training points at certain positions in the stream may yield high benefits.This strategy could be potentially exploited by a successful attack to reduce the search space.Equation ( 13) presents a gradient-based selection strategy, where at time slice , the target function calculates the gradient for all samples prior to  and the sample with the highest gradient is chosen as the poisoning sample.The rationale behind this strategy is that the gradient is an indicative measure of the target function's variation.A higher gradient at a node implies that the target function changes rapidly at that point.By polluting the sample point in the direction of the gradient ascent, the target function will increase rapidly, thus achieving the desired attack effect.
To compute the gradient of the target function with respect to the samples, we use the recursive gradient given by Eq (14).
By applying the chain rule and substituting Eq (12) into ( 14) and to simplify the expression, we set b 1 in Eq (12), which leads to Eq (15).
From Eq (15), it is clear that to compute the gradient of the current target function with respect to each sample point at time slice , one needs to know the feature vectors of all historical sample points.However, in edge AI environments, due to storage limitation, the sample data cache is periodically overwritten after training.This leads to a situation we define as cache strategy with forgetting.We define a cache strategy similar to that of the sliding window.Figure 3   Based on this storage strategy, the forgotten samples cannot be used to calculate the gradient of the target function with respect to that point using Eq (15), rendering the selection strategy inapplicable.To address this issue, we propose a gradient calculation algorithm based on rehearsal.The main principle of this algorithm is that after each maximum gradient sample point is selected at current time slice, both the point and its corresponding model parameters are stored in a dedicated buffer called the rehearsal buffer.The rehearsal buffer can remember partial historical information when samples are overwritten, and its size is less than or equal to the total number of poisoning sample points.When the next time slice arrives, the sample points in the rehearsal buffer are combined with those in the newly arrived time slice to calculate the new maximum gradient point, as shown in Eq (16).To simplify the expression, we still set the cache size b 1.Meanwhile, we assume the size of rehearsal buffer is , in which the samples are denoted as  ,  ∈  ,  ,  ,  , . .,  ,  .
4.4.2.Maximum-loss-based sample pollution strategy Section 3.3 presents the basic online DPA, in which each poisoning sample is only updated once in the gradient ascent direction and cannot be updated iteratively in the same direction as in offline mode to maximize attack effectiveness.This may result in a prolonged attack process.Online DPA is highly time-sensitive, so that if the attack is not completed within a limited time, it can be easily detected by defense algorithms.Therefore, this paper still attempts to update each poisoning point in the gradient ascent direction to maximize the attack effect.Equations ( 17) and ( 18) provide a bi-level optimization formula for the online mode, which is called the rehearsal-based poisoning strategy.In the inner optimization of Eq (18), the model is trained using the samples from the current time slice and the poisoned samples from the rehearsal buffer to obtain the inner layer parameter  * .In the outer optimization of Eq (17), the model generates the maximum prediction loss under the influence of poisoned parameter  * using the samples of newly arrived time slice.
. . * ∈     ∪   ehearsal buffer,   Algorithm 1 presents the overall implementation process of the rehearsal-based online DPA algorithm.Lines 1 to 6 initialize the attack.Starting from line 7, the main loop of the attack process has begun.Constant  is used to prevent oscillation caused by inappropriate parameter settings.Lines 8 and 9 record the number of current time slice and the total number of samples.Lines 10-12 perform the point selection operation, with the details described in Algorithm 2. After the point selection operation, lines 13-14 pollute each poisoned sample point recorded in the rehearsal buffer one by one, i.e., updating each poisoned sample point in the direction of the maximum gradient.The projection operator  in line 14 projects the polluted sample values into the feasible domain of the feature space.Line 14 adjusts the pollution speed by controlling the learning rate  and decay factor . Line 15 completes the inner optimization process corresponding to Eq (18).Lines 16-18 complete the convergence condition judgment process, wherein the attack ends when the current model parameter achieve the minimum loss in time slice  and the maximum loss in time slice  MSE_min ←   ,  24: end if 25:end while 26:Insert each poison sample from the rehearsal buffer into each time slice of the target data stream one by one Algorithm 2 presents the implementation process of the point selection operation.Line 1 combines the sample points of the current time slice and the sample points in the rehearsal buffer.Line 2 calculates the gradient for each sample in the merged sample set  according to Eq (16) and finds the sample point with the maximum gradient.Lines 3-11 complete the addition and replacement operations for the samples in the rehearsal buffer.Line 3 first determines whether the newly generated sample point  _ is already included in the rehearsal buffer.If it is not, the following steps are performed.Line 4 checks if the current size of the rehearsal buffer has reached the upper limit of the number of poisoned samples.If it has reached and the current sample is not included in the rehearsal buffer, lines 5-6 replace the sample point with the smallest gradient in the rehearsal buffer.If the current size of the rehearsal buffer has not yet reached the upper limit of the number of poisoned samples and the current sample is not included in the rehearsal buffer, line 9 directly inserts the sample point

Theoretical analysis of complexity
According to Eq (14), the complexity of calculating the gradient for each sample point is O  , where  denotes the dimension of the feature vector.Based on lines 2 and 5, the complexity of Algorithm 2 is O  2_  .According to lines 11 and 14 in Algorithm 1, the complexity of Algorithm 1 is   2 3_  .Assuming that the total number of samples in Algorithm 1 converges to , then  , and at the same time, _  ≪ .Therefore, the complexity of Algorithm 1 is   approximately.The complexity of the offline DPA with the same number of samples is  _ *  *  , where _ in the offline mode refers to the rounds of updating poisoned sample features according to the gradient ascent direction.Although the basic online DPA also has a complexity of   , it requires full storage of samples, resulting in a storage space of   , while Algorithm  Thus, we can obtain   2_   .In summary, in terms of time and space complexity, Algorithm 1 has advantages compared to offline DPA and basic online DPA.

Experiment
This section evaluates the effectiveness of the rehearsal-based online DPA (RB-ODPA) with maximum-gradient-based point selection strategy and maximum-loss-based pollution strategy when applied to edge devices.The definitions of abbreviations used in this paper are shown Table 2. IA-ODPA Incremental attack proposed by [21] RB-ODPA The rehearsal based online DPA proposed of this paper SVM Support vector machine Experimental setup.In order to simulate the edge computing environment of IoT-enabled smart city, the attack algorithm was run in Linux OS in edge-embedded boards, which were mainly configured with a main chip with a cortex-A7 core, 1.2 GHz, 256 MB RAM and 512 MB ROM.The evaluation metrics are same as our previous work [24], which mainly include MSE loss, the running time of attack and the LOT.This paper adopts the OptP method proposed in [12] as the baseline algorithm of offline DPA and the incremental attack method (abbreviated as IA-ODPA) proposed in [21] as the baseline algorithm of basic online DPA.
Data set.To validate the application scenario of IoT-enabled smart cities, we selected data from intelligent power systems, which contains 9568 data samples.The features include the average temperature of the environment per hour, the average pressure of the environment per hour, the average relative humidity of the environment per hour, the exhaust vacuum per hour and the predicted label, which is the net energy output per hour.To simulate online data streams, we input these samples in batches in accordance with the strategy shown in Figure 3.We performed normalization on all sample values, resulting in the feasible range of features and labels being [0,1].This normalization process ensured consistency in the range of values for both features and labels.
Basic parameters settings.In experiments, we set poisoning rate at 5, 10, 15 and 20%.The termination condition () for algorithm convergence was set to 1e-8.The decay parameter ) and learning rate () for polluting feature values of the poisoned sample points in the direction of gradient ascent was set to 0.05 and 0.01 respectively.

The relationship between cache size setting and point selection
In this section, we investigate the differences between offline and online modes of sample selection and study the impact of different cache sizes on sample selection in the online mode.We compare the time cost and selection accuracy of the basic online sample selection, offline sample selection and the sample selection method based on the rehearsal buffer.The experimental results are presented in Tables 3 and 4. The offline sample selection is performed by our proposed method with a given poisoning rate to select poisoning points once from the entire sample set, i.e., the execution results of Algorithm 2 when the cache size b = 9568.Table 3 shows the poisoning rate in the first column, followed by the number of selected poisoning sample points in offline and basic online modes in the second column.The third to fifth columns illustrate the count of same sample points selected by our proposed selection method and offline mode under different cache size settings.From the results presented in Table 3, we can observe that the similarity between the selected samples by our proposed method and those selected in offline mode can reach over 95% by adjusting the size of the cache for different poisoning rates.Figure 5 provides a more intuitive representation of the results in Table 3, where our proposed selection method can select sample points that are relatively close to those selected in offline mode when the cache sizes are 2300, 3100 and 4700.
Table 4 compares the time cost of sample selection methods between offline and online modes.The first column shows the poisoning rate, the second column shows the time cost of sample selection in offline mode, the third column shows the time cost of basic online DPA for sample selection and the fourth to sixth columns show the time cost of our proposed sample selection method under different cache size settings.From the results presented in Table 4, we can observe that our proposed sample selection method has a lower time cost compared to the first two methods, which is consistent with the results of the algorithm complexity analysis presented in Section 5. Figure 6 presents the time cost of our proposed sample selection method under different cache size settings for a poisoning rate of 0.05.We can observe that as the cache size increases, the overall execution time of the algorithm increases.However, when the cache size is set to 2300, 3100 and 4700, the algorithm can select the maximum number of sample points that are the same as those selected in offline mode with a relatively small time cost.

Effectiveness comparison of proposed attacks and baseline attacks
In this section, we conducted a detailed analysis of the MSE, time and LOT of the three algorithms, based on the experimental settings described in the previous section.The results are presented in Tables 5  and 6 and the comparative values of LOT are in Table 7.
It is observed that all three attacks can mislead the predictive performance and the change in MSE is also linear and upward with the increase in poisoning rates.The red line in Figure 7, representing the RB-ODPA algorithm, shows the best performance with the highest loss, which demonstrates the effectiveness of our method.Table 7 indicates that the selection of attack algorithms should not be limited to the degree of improvement in MSE alone, but should consider LOT comprehensively.It clearly shows the performance comparison of the three algorithms in the LOT index, in other words, RB-ODPA achieves the maximum MSE loss with the minimum time cost.

Optimal setting cache size
In this section, we investigate the issue of cache size setting and evaluate the attack effectiveness of our proposed method under different cache size settings using the LOT metric.The experimental results are presented in Table 6 and Figure 8.According to the experimental results, our proposed attack method outperforms other methods in terms of the LOT metric.Our proposed method also exhibits different LOT results under different cache size settings, with a cache size setting of b = 2300 yielding the optimal attack effectiveness.Therefore, we set this as the optimal cache size setting for our proposed attack method.

Conclusions
In conclusion, this paper proposes an online poisoning attack framework for edge AI environments in IoT-enabled smart cities, which takes into account the limited storage space in the AI edge environment and proposes a rehearsal-based buffer mechanism to incrementally pollute the sample data stream that arrives at the appropriately sized cache to optimize the efficiency of the attack.A maximum-gradient-based sample selection strategy is proposed to overcome the periodic overwriting of the sample data cache after training, while a maximum-loss-based sample pollution strategy solves the problem of each selected sample being polluted only once in the gradient ascent direction in basic online DPA.The proposed online gray-box poisoning attack algorithms are implemented and evaluated on edge devices of IoT-enabled smart cities using an online data stream simulated with offline open-grid datasets.The experimental results demonstrate the effectiveness and overhead of the proposed attack framework and strategies, which can provide a reference for researchers to design defenses against such attacks.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
) uses gradient ascent to contaminate data points  , making their sample values  contaminated as  , where    ,  represents the gradient of the objective function at the data point  ,  represents the learning rate of iteration and  is the number of iterations. represents the projection operator, which projects the contaminated sample values into the feasible domain of the feature space.

3. 3 .
Figure 2 illustrates the schematic diagram of basic online DPA.In the figure, rectangles still represent samples, with red ones indicating poisoned samples and green ones representing normal samples.Rounded squares denote model parameters and red rounded squares indicate the poisoned parameters after the poisoning attack is completed.White hollow arrow represents training order of sample data stream.The black arrows demonstrate the normal training process using OMBGD as an example.Here, ℎ ~ℎ represent the samples that have been trained in the past time slices (each time slice containssamples, defined as ,  and since each sample can only be used once, the objective function is defined as  in Eq(7)).The online training goal is to minimize  as in Eq(8).From the definition of the formula,  reflects the gap between the cumulative loss ∑   ,  and the minimized loss

Figure 2 .
Figure 2. Schematic diagram of basic online DPA.
illustrates the strategy of storing the online training sample stream arriving at the edge node, the solid and dashed boxes represent the samples cached at time slice  and  1, respectively.The capacity of the cache is .According to this strategy, at time  , the sample   ,  .The new samples  will completely overwrite the historical samples  at time slice  1.

Figure 4
Figure 4 presents the schematic diagram of the rehearsal-based online DPA method.The

Figure 4 .
Figure 4. Schematic diagram of rehearsal based online DPA.

Figure 5 .
Figure 5.The variation of the same samples number with cache size.

Figure 6 .
Figure 6.The variation of execution time of sample selection with cache size (poisoning rate = 0.05).

Figure 9
Figure 9 illustrates the attack effectiveness of offline DPA and online DPA on a real system, with Figure 9(a) showing the offline attack effectiveness and Figure 9(b) showing the online attack effectiveness.In Figure 9(a), the offline attack method places the poisoned samples at the beginning (or end) of the normal samples, resulting in continuous abnormal values at the beginning, which is easily exposed.In contrast, in Figure 9(b), the blue dots represent normal sample values, while the red dots represent the values of poisoned samples after attack.Since the poisoned samples are mixed with normal samples, the offline attack is more covert.

Figure 9 .
Figure 9.Comparison of offline and online DPA attack effects.
and  represent the corresponding label vectors.Given the learning model  ℎ  and objective function    ,  , where  represents the learning parameter of the model, the normal training goal is to calculate the optimal parameter  * for the minimum objective function shown in Eq simultaneously.Lines 19-24 update the maximum and minimum loss values of the model during the attack process.
Samples in the current time slice, ,Samples in the current rehearsal buffer,  _ , current model parameter, ,objective function, , the number of poisoned samples at present,_ _ 10: end if 11: end if 12:return  _ 1 only requires storage space of   2_ .

Table 2 .
List of abbreviations.

Table 3 .
Comparison of sample selection.

Table 4 .
Comparison of sample selection time.

Table 5 .
MSE comparison of attacks.

Table 6 .
Time cost and LOT of sample selection under different cache size.

Table 7 .
Comparison of attack time and LOT (cache size b = 4700).