Traffic Pattern Prediction and Spectrum Allocation with Multiple Channel Width in Cognitive Cellular Networks

This paper investigates traffic pattern prediction based on seasonal deviation and spectrum reallocation with multiple channel widths in cognitive cellular networks. Compared to existing approaches based on time series or classical statistical methods, the binary exponential deviation offset prediction proposed in this paper focuses on the increment or decrement at every sampling point during an exponential offset period. The deviations are then corrected at different levels in the next prediction process. The proposed approach is validated with real end-user data from a WiFi network and with simulation experiments. Based on this prediction, we allocate channels with different bandwidths to end-users according to their diverse quality-of-service (QoS) requirements, which increases both the system's profit and the actual spectrum utilization. Channel division is formulated as a multidimensional bounded knapsack problem, for which the proposed balance between value density and request probability strategy yields an approximate solution. The simulation results show good performance in both utility and spectrum utilization of the base-stations, especially when resources are scarce.


Introduction
Cellular networks have been successful over the last twenty years, but they are being tested by rapidly increasing data traffic. Smart phones, which are not so much phones as hand-held computers, have become popular and integrate several multimedia information services. An open market report from Ericsson states that the data traffic of cellular networks doubled from the third quarter of 2011 to the third quarter of 2012, and that mobile data traffic, driven mainly by video, is expected to grow with a CAGR (compound annual growth rate) of around 50 percent in the 2012-2018 time frame, which entails growth of around 12 times by the end of 2018 [1]. A study report from Juniper Research states that about 40% of mobile data traffic will still rely on cellular networks by 2017 [2].
Since the objective of cognitive radio (CR) technology is to reallocate spectrum resources reasonably and efficiently, combining it with cellular networks brings a promising prospect [3]. Idle spectrum will be put back into circulation, and cellular networks will obtain more resources than before.
The spectrum sharing in cognitive cellular (CGC) networks is a two-phase procedure. In the first phase, the CGC network gets spectrum resources from the primary users (PUs) through sensing [4] or borrowing [5] or through auctions [6][7][8]. In the second phase, the base-station (BS) of every CGC network cell allocates the spectrum to end-users.
In the allocation process, the BS should understand the traffic pattern of end-users, such as the arrival rate, in order to plan spectrum allocation in its cell properly. In previous research, PUs, as the spectrum owners, have higher priority to use the spectrum, and secondary users (SUs) are not affected if they know when and for how long the PUs will use it. Traffic pattern prediction is therefore an important topic in the CR research field. A time series analysis model is adopted in [9], in which user arrivals are regarded as a time series. The authors of [10] estimate PUs' actions according to experimental statistics. In [11], the authors propose a prediction algorithm which exploits the periodicity of the traffic process. These studies focus on estimating the PUs' traffic pattern and have achieved notable results. However, compared to PUs, which are often organizations or institutions, end-users as individuals are more susceptible to weather, festivals, or other events, and their traffic fluctuates more. In such situations, the traffic presents an obvious deviation. Taking these deviations into account makes the traffic prediction for end-users more accurate.
International Journal of Distributed Sensor Networks
After estimating the traffic behavior of end-users, the BS allocates cognitive spectrum to them. At present, there are few studies specifically on spectrum allocation in CGC networks, but some studies on CR networks exist. A distributed resource allocation method is proposed in [12], whose purpose is to balance the queue length and promote fairness. Its limitation here is that the architecture of a CGC network is centralized, and the spectrum allocation from a BS to end-users is planned by the BS rather than by the end-users themselves. The resource allocation criterion in [13] is to maximize the coverage of the CR network through a power control method. Since the cell range in a CGC network is fixed, expanding the coverage is of little interest. Moreover, previous work concentrates mainly on the technical properties of CR networks. CGC networks, unlike other common CR networks, are commercial and therefore need market-based operation. Since economic benefits encourage a CGC network to value end-users' requirements and to improve QoS, the economic aspect is as important as the technical one. A better user experience is always one of the most significant goals of cellular networks. The absence of this economic perspective is a shortcoming of current studies on spectrum allocation in CGC networks.
Resource allocation in a CR network is formulated as a multidimensional knapsack problem in [14], where a greedy algorithm is introduced. This formulation is instructive for spectrum allocation in CGC networks. However, the allocation in a CGC network reduces to a knapsack problem which is not only bounded but also multiple, so the mathematical model has to be defined again. Furthermore, the density of value greedy algorithm considers only the reward and cost of the BS and ignores the volatility of end-user demands. An improvement is therefore necessary.
Our contributions are as follows.
(i) A binary exponential deviation offset (BEDO) method is proposed to predict the arrival rate of end-users. The increment or decrement from the predicted value to the sampled data is regarded as a deviation. Certain external factors cause the deviation and continue to affect the next sample points. A binary exponential factor is introduced to measure the duration of this influence. In every prediction process, the past deviations are offset at different levels.
(ii) The BS's utility in the second phase of two-phase spectrum sharing in a CGC network, which has been ignored in previous studies, is discussed. Appropriate economic incentives encourage the CGC network to serve end-users more attentively. The link between profit and spectrum also promotes spectrum saving in resource reallocation.
(iii) For spectrum reallocation, a balance between value density and request probability (BVDRP) algorithm is proposed. The existing heuristic algorithms cannot be applied directly to the multidimensional bounded knapsack problem. BVDRP takes into account the pursuit of benefit, the cost to the system, and the irregularity of end-user requests.
The rest of the paper is organized as follows. The system model is introduced in Section 2. Section 3 analyzes the model theoretically and gives the method to calculate its parameters. In Section 4, the system is evaluated with real network data and simulation experiments. Finally, Section 5 summarizes our conclusions.

Spectrum Access Method.
A cellular network is distributed over land areas called cells, each served by at least one fixed-location transceiver, known as a BS. A BS is the center of its local communication cell. For a clear and simple description, this paper discusses a BS's spectrum plan within its local cell.
Figure 1 shows that the right to use idle spectrum segments is transferred from PUs to a BS. At the present stage of CGC research, spectrum sharing takes place mainly among communication networks using different technologies such as GSM, CDMA, UMTS, or WiMAX [15]. As communication technologies and equipment progress, the types of cognitive spectrum resources can be expected to expand.
In a CGC network cell, an end-user who wants to communicate with another has to get an available channel from the BS, as shown in Figure 2. The BS allocates different channels to different users. A complete graph can be used to describe the conflicts among a BS's users [16]. The communication range of a BS is smaller than its interference range, so the spectrum channels of a BS must be different from those of adjacent BSs. In this paper, the spectrum allocation of a single BS is discussed.

Model
Framework. The proposed traffic prediction and spectrum allocation approach, illustrated in Figure 3, is composed of three modules.
The function of the traffic prediction module is to predict end-user traffic as accurately as possible. Its inputs are the end-user arrival rates $\lambda_{(i,0)}, \lambda_{(i,1)}, \ldots, \lambda_{(i,u+1)}$ for required bandwidth $w_i$, where the past sampling period runs from 0 to $u+1$. Its other input is $\Delta T$, the forthcoming prediction duration. This module outputs the predicted channel quantity $N_i$ for bandwidth $w_i$. The process is explained in more detail in Section 3.1.
The allocation strategy module builds the spectrum allocation strategy in accordance with the operation goals of the system. Its inputs include the weighting factors $\alpha$ and $\beta$, the price series $\{V_i\}$, and the channel quantity $N_i$ for required bandwidth $w_i$, which is the output of the traffic prediction module. The operation of this module is presented in detail in Section 3.2.
The last part, the spectrum allocation module, builds on the outputs of the other two modules. The channel quantity series $\{N_i\}$ gives the predicted desired channel quantities for the different bandwidths. The allocation strategy provides the distribution priority series $\{c_i\}$. The channel quantity with the larger $c_i$ value is given priority.

Proposed Algorithm
The spectrum resource that the BS acquires from PUs is denoted by $S$. The system obtains spectrum from PUs or in secondary markets through auction or rental, so the spectrum is not continuous but a series of spectrum blocks $\{s_k\}$, which can be written as follows:
$$S = \sum_{k=1}^{m} s_k,$$
where $k = 1, 2, \ldots, m$.
The distinct channel widths the system provides to end-users are denoted by $w_i$, where $i = 1, 2, \ldots, n$. The actual value of each $w_i$ could be decided by advance surveys or experimental data collection. In [17], the authors report statistical results on YouTube videos, whose bit rates are nearly the same across the website. Such research shows that it is possible to fix the required bandwidth of similar applications in advance. So, a suitable series of channel widths $\{w_i\}$ could be determined according to several major kinds of applications.
Suppose the number of demands for channel width $w_i$ is $N_i$, for which the income is $V_i$ per unit of time. The pricing strategy is a marketing decision.
In such an economic procedure, two technical problems have to be solved: how to predict the demand quantity $N_i$ for channel width $w_i$ and how to divide the spectrum blocks $\{s_k\}$ into available channels.

To Predict Demanded Quantities.
Predicting demanded quantities is a traffic pattern prediction procedure. By the nature of a CGC network, the system occupies spectrum resources only for a specific time, after which they have to be returned to the PUs. At that time, the system rents some new blocks of spectrum. The spectrum resources are thus updated regularly, and the update period is called a spectrum period. The key to dividing spectrum blocks suitably is to predict end-user arrivals and calculate the demanded quantities accurately before the following spectrum period.
Denote the spectrum period by $\Delta T$; the number of channel requests for bandwidth $w_i$ is
$$N_i = \int_{\Delta T} f_i(t)\,dt,$$
where $f_i(t)$ is the density function of the end-user request arrival rate for bandwidth $w_i$. In CR network studies, the arrival of spectrum requests is usually described with a Poisson process [18,19], which can be defined as
$$P\{X = x\} = \frac{\lambda^{x}}{x!}\,e^{-\lambda}, \quad x = 0, 1, 2, \ldots,$$
where $X$ is the number of arrivals in a unit of time and $\lambda$ denotes the average arrival rate. During the estimation of end-user actions, the parameter $\lambda$ is considered constant within a time period $\Delta t$, and a new parameter is used in the next period. Such a period is a natural period, which is different from the spectrum period. The arrival rates for bandwidth $w_i$ are denoted by the series $\{\lambda_{(i,j)}\}$. Considering the spectrum trading process, $\Delta T$ is regarded as not less than $\Delta t$ and is expressed as
$$\Delta T = \Delta t_0 + \sum_{j=1}^{u} \Delta t_j + \Delta t_{u+1},$$
where $\Delta t_0$ is the time earlier than the natural period $\Delta t_1$ and $\Delta t_{u+1}$ is the time later than $\Delta t_u$. Since $\{\lambda_{(i,j)}\}$ are the average arrival rates, we get
$$N_i = \sum_{j=0}^{u+1} \lambda_{(i,j)}\,\Delta t_j.$$
Hence the estimation of $N_i$ is translated into estimating the average arrival rates $\{\lambda_{(i,j)}\}$.

Since similar rules govern what humans do in the same time period, the end-user arrival rate is assumed to be periodic with a 24-hour period (one day), which is denoted by a time series $\{\lambda_{(i,1)}, \lambda_{(i,2)}, \ldots, \lambda_{(i,24)}\}$. The historical data provide the prediction baseline for $\lambda_{(i,j)}$, which is
$$\bar{\lambda}_{(i,j)} = \frac{1}{D} \sum_{d=1}^{D} \lambda_{(i,j)}^{(d)},$$
where $D$ is the number of sampling days and $\lambda_{(i,j)}^{(d)}$ is the arrival rate sampled on day $d$. The predicted arrival rate $\hat{\lambda}_{(i,j)}$ is expressed as follows:
$$\hat{\lambda}_{(i,j)} = \bar{\lambda}_{(i,j)} - \hat{\delta}_{(i,j)},$$
in which $\hat{\delta}_{(i,j)}$ is the estimated value of the deviation.

Let us consider the deviation $\delta_{(i,j)}$. Weather, holidays, or other emergencies can make the end-user arrival rate fluctuate, and most of these influences do not disappear soon. Although different factors act for different lengths of time, it is reasonable to treat recent emergencies as still active. Namely, more recent emergencies are reflected more strongly in the next deviation. A binary exponential offset factor is introduced to estimate the deviation as follows:
$$\hat{\delta}_{(i,j)} = \sum_{k=1}^{K} \frac{1}{2^{k}}\,\delta_{(i,j-k)}, \quad (8)$$
in which $K$ decides the offset duration. The deviations in past sampling intervals are thus reflected with an exponential decay.

The deviation $\delta_{(i,j)}$ at a sampling point is
$$\delta_{(i,j)} = \hat{\lambda}_{(i,j)} - \lambda_{(i,j)}. \quad (9)$$
So, rearranging (8) and (9), the predicted arrival rate $\hat{\lambda}_{(i,j)}$ is rewritten as follows:
$$\hat{\lambda}_{(i,j)} = \bar{\lambda}_{(i,j)} - \sum_{k=1}^{K} \frac{1}{2^{k}} \left( \hat{\lambda}_{(i,j-k)} - \lambda_{(i,j-k)} \right).$$
Intuitively, the deviation of the actual arrival rate from the statistical baseline is caused by external factors, whose influence is unlikely to disappear within a short period. When the arrival rate $\hat{\lambda}_{(i,j)}$ is estimated, the factors which brought about the deviations $(\hat{\lambda}_{(i,j-k)} - \lambda_{(i,j-k)})$ are considered still active at different levels. So, they are included in the calculation of the deviation instead of $(\hat{\lambda}_{(i,j)} - \lambda_{(i,j)})$, which cannot be calculated in time.
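The prediction step described above can be sketched in Python under stated assumptions: the baseline is the per-slot mean over the sampled days, and the deviation observed k intervals earlier is weighted by 1/2^k. The function names, the `depth` parameter (standing in for the offset duration), and the sign convention (predicted minus actual, subtracted from the baseline) are illustrative choices and are not taken from the paper.

```python
from statistics import mean


def baseline(history, period=24):
    """Mean arrival rate of each slot of the day over all complete sampled days.

    `history` is a flat list of arrival rates, one per slot, oldest first.
    """
    days = len(history) // period
    return [mean(history[d * period + h] for d in range(days)) for h in range(period)]


def bedo_predict(base, actual, predicted, j, depth=3):
    """Predict slot j as the baseline corrected by recent prediction errors.

    The error made k slots ago is weighted by 1 / 2**k (assumed weighting),
    so older errors fade out exponentially. `actual` and `predicted` hold the
    values already known for slots 0 .. j-1.
    """
    correction = 0.0
    for k in range(1, depth + 1):
        if j - k < 0:
            break  # no earlier slots to learn from
        correction += (predicted[j - k] - actual[j - k]) / 2 ** k
    return base[j % len(base)] - correction
```

With a flat baseline of 10 and actual arrivals that stay at 14, the successive predictions are 10, 12, 12: the first slot has no history, and later slots move toward the observed level as past errors are offset.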
Note that the predicted number of channels for each bandwidth is the number that the system expects to provide. Usually it cannot be fully realized because of the limitations of spectrum resources, the uncertainties of market behavior, and the irregularities of spectrum blocks. So the system has to obtain as many resources as it can and then consider how to make full use of the acquired spectrum and maximize its utility. This leads to the next question: how to divide spectrum blocks into channels.

To Divide Channels.
Dividing discontinuous spectrum blocks into different channels is a knapsack problem that is not only bounded but also multiple. So we call it a multiple bounded knapsack problem (MBKP).
The decision form of such a knapsack problem is NP-complete [20]; thus no algorithm is expected to be both correct and fast (polynomial-time) in all cases. Our interest is in fast, low-complexity solutions, so the greedy method is considered. It makes the best choice based on the current situation regardless of the overall conditions. Compared with an optimal solution obtained at the expense of a lot of time and space, an easy and satisfactory one is enough. A heuristic strategy does not necessarily find the optimal solution but achieves the desired objective quickly [14]. It is suitable for our system, where the channel division is updated periodically and fast computation is more important.
The key to such a greedy algorithm is the choice of the greedy strategy. The density of value greedy strategy sorts the items in decreasing order of value per unit of weight, $V_i / w_i$. In our scenario, the value stands for the system's profit from end-users, while the weight represents the provided bandwidth. So, the density of value is the ratio of an item's reward to its cost. This is the policy adopted in [14]. Since the amount of spectrum resources the system owns is limited and the BS works in a commercial mode, it wants to gain more income with less expenditure. So the density of value greedy strategy is more suitable for channel division in a CGC network than other existing ones.
However, it has to be considered that the request probabilities of the required bandwidths differ between spectrum periods. Though some channels' value densities are larger, they appear much less often in end-users' requests. Such idle channels not only bring little income to the system but also seriously waste scarce spectrum resources.
On account of such considerations, an improved density of value greedy strategy for channel division is proposed. It strikes a balance between value density and request probability (BVDRP).
The probability $p_i$ that bandwidth $w_i$ appears in requests takes values in the interval (0, 1). In a CGC network, if the selling price of a channel is many times the buying price, the value density will be much larger than the request probability and the comparison is uneven. So, the first step in formulating the BVDRP strategy is to normalize the value density so that it is comparable to the request probability; the normalized value density of bandwidth $w_i$ is denoted by $d_i$, where $i = 1, 2, \ldots, n$. Then, the weighting factors $\alpha$ and $\beta$ are introduced into the improved greedy algorithm as follows:
$$c_i = \alpha d_i + \beta p_i.$$
The determination coefficient $c_i$ of channel division decides the allocation order. The bandwidth with the greater coefficient is allocated with priority. The values $\alpha$ and $\beta$ show the preference of the system between value density and bandwidth request probability. Most often, they both lie in the interval (0, 1). Two extreme situations are as follows. When $\alpha = 1$ and $\beta = 0$, the system divides the channels following the density of value greedy strategy completely, regardless of request probability. This means pursuing maximum benefit while accepting the risk of wasted channels, so it is a radical channel division strategy. On the contrary, when $\alpha = 0$ and $\beta = 1$, the system cares about the full use of resources and tries to maximize the channel request probabilities, with little expectation of income. Such a strategy is conservative.
In practical applications, the values of $\alpha$ and $\beta$ are selected according to the system load, the trade-off between income and resource utilization, and other demands.
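As an illustration of how the two weighting factors trade value density against request probability, the following Python sketch computes a priority coefficient for each channel width. The normalization used here (dividing each value density by the sum of all densities so that it is on the same scale as a probability) is an assumption made for the example, and the names `alpha`, `beta`, `bvdrp_coefficients`, and `bvdrp_order` are hypothetical.

```python
def bvdrp_coefficients(widths, prices, probs, alpha=0.5, beta=0.5):
    """Weighted mix of normalized value density and request probability.

    widths[i]: channel bandwidth, prices[i]: income per channel,
    probs[i]: probability that this bandwidth is requested.
    """
    densities = [v / w for v, w in zip(prices, widths)]
    total = sum(densities)  # assumed normalization: densities sum to 1
    return [alpha * d / total + beta * p for d, p in zip(densities, probs)]


def bvdrp_order(widths, prices, probs, alpha=0.5, beta=0.5):
    """Indices of the channel widths, highest coefficient first."""
    coeffs = bvdrp_coefficients(widths, prices, probs, alpha, beta)
    return sorted(range(len(coeffs)), key=coeffs.__getitem__, reverse=True)
```

With `alpha=1, beta=0` the ordering reduces to the density of value greedy order, and with `alpha=0, beta=1` it follows the request probabilities alone, matching the radical and conservative extremes described above.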
The mathematical model is to select proper channel bandwidths from $\{w_i\}$, with which the system divides the spectrum blocks to maximize its utility. If $x_{k,i}$ denotes the number of channels with bandwidth $w_i$ which are divided from the spectrum block $s_k$, such a problem can be described as follows:
$$\max \sum_{k=1}^{m} \sum_{i=1}^{n} V_i\,x_{k,i}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} w_i\,x_{k,i} \le s_k, \quad k = 1, 2, \ldots, m,$$
$$\sum_{k=1}^{m} x_{k,i} \le N_i, \quad i = 1, 2, \ldots, n,$$
$$x_{k,i} \in \{0, 1, 2, \ldots\}.$$
The algorithm describing the BVDRP strategy, which takes into account both value density and request probability, is shown in Algorithm 1. The space complexity of the sort operation in steps 5 and 8 is $O(n)$ and the time complexity is $O(n \log_2 n)$. Since the discontinuous spectrum blocks are divided one by one, the space complexity of the whole algorithm is $m \cdot O(n)$ and the time complexity is $m \cdot O(n \log_2 n)$.
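Algorithm 1 is not reproduced here, so the following Python sketch shows only one plausible greedy division consistent with the description: each spectrum block is filled with channels in a given priority order, limited by the block size and by the predicted demand for each width. The function name and arguments are hypothetical, and the priority order is passed in as a fixed list for brevity instead of being re-sorted for every block.

```python
def divide_channels(blocks, widths, demand, order):
    """Greedy division of discontinuous spectrum blocks into channels.

    blocks[k]: size of spectrum block k, widths[i]: channel width i,
    demand[i]: predicted number of channels of width i,
    order: width indices, highest priority first.
    Use integer units (for example kHz) so that floor division is exact.
    Returns plan[k][i], the number of width-i channels cut from block k.
    """
    remaining = list(demand)
    plan = []
    for size in blocks:
        row = [0] * len(widths)
        free = size
        for i in order:
            # As many channels as both the free spectrum and the demand allow.
            count = min(remaining[i], free // widths[i])
            row[i] = count
            remaining[i] -= count
            free -= count * widths[i]
        plan.append(row)
    return plan
```

For blocks of 1000 and 500 units, widths of 200 and 300, demands of 3 and 2, and priority given to the 300-unit width, the first block yields two channels of each width and the second block yields the one remaining 200-unit channel.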

Experiment and Evaluation
Firstly, the proposed BEDO prediction method is compared with three other ones which are as follows.
(i) Classical Statistical Method. This method takes the mean value of arrival rates at the same time point in sampling periods as the prediction result. It is the most easily understandable method and simple to calculate.
(iii) WCNC Method. The prediction method in [11] was presented at WCNC 2008. It takes the arrival increment divided by the elapsed time as a slope to estimate the prospective arrival rate and shows good prediction ability.
Although a WiFi network is very different from a CGC one, both kinds of networks provide network access service. End-users who visit a website will not notice any difference, so end-user behaviors are similar in these networks. Some real end-user usage data from CRAWDAD [21] are adopted to verify the BEDO prediction method proposed in this paper. The first 240 hours are taken as training time, and then the arrival rates in the next 120 hours are predicted and compared to the real arrival data. The comparison results are shown in Figure 4.
Considering that the arrival rate is easily affected by external factors, we carry out further simulation experiments to examine the prediction performance when the arrival rate changes severely. The performances of the different methods are compared in Figure 5.
In the above prediction comparisons, when the arrival rate is stationary, as in Figure 4, the performances of the four prediction methods are similar. They all reflect the periodic trend and the variances are acceptable. Although some variances of the seasonal ARIMA model $(2, 0, 4) \times (1, 1, 1)_{24}$ are higher, its overall prediction performance is close to that of the WCNC method. The lower variances indicate that BEDO and the statistical prediction are slightly better than the other two methods, but the advantage is not obvious. When the arrival rates fluctuate severely, as in the simulation experiments shown in Figure 5, the BEDO method performs much better than the other predictions, and its variance stays at a stable and low level. The cumulative variances listed in Table 1 support this observation.
So, the BEDO method is validated to predict the end-user arrival rate accurately. This is the prerequisite for the following spectrum allocation.
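The text does not state how the cumulative variance of a prediction method is computed. Assuming it is the sum of squared differences between predicted and real arrival rates over the test window, a comparison between methods could be scored with a helper like the following; the function name is hypothetical.

```python
def cumulative_squared_error(predicted, actual):
    """Sum of squared prediction errors over the evaluated slots (assumed metric)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))
```

Under this assumption, a lower score over the 120 predicted hours means the method tracks the real arrivals more closely.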
Subsequently, simulation experiments are used to validate the channel division approach put forward in this paper. The parameters are shown in Table 2. Suppose end-user requests arrive as a Poisson process with parameter $\lambda = 100$, the service time follows an exponential distribution with $\mu = 1$, and the bandwidth probability follows a Gaussian distribution with $\mu = 350$ and $\sigma^2 = 100$. Since the experiments aim to compare the effects of different channel allocation strategies, the parameters are preassigned. In a real SSP's working process, they should be predicted by one of the estimation methods, such as BEDO. The weighting factors $\alpha$ and $\beta$ are both 0.5, which means the system weights value density and request probability equally.
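A workload with these characteristics could be generated as in the Python sketch below, which uses only the standard `random` module. It is an illustration under assumptions: the Gaussian is read as the distribution of the requested bandwidth, its unit is left unspecified as in the text, and the function name, the `duration` argument, and the fixed seed are hypothetical.

```python
import random


def generate_requests(duration, rate=100.0, service_rate=1.0,
                      bw_mean=350.0, bw_var=100.0, seed=0):
    """Simulate end-user requests as (arrival_time, service_time, bandwidth).

    Arrivals form a Poisson process, so the gaps between them are
    exponentially distributed with mean 1 / rate.
    """
    rng = random.Random(seed)
    t = 0.0
    requests = []
    while True:
        t += rng.expovariate(rate)  # time until the next arrival
        if t >= duration:
            return requests
        service = rng.expovariate(service_rate)       # exponential holding time
        bandwidth = rng.gauss(bw_mean, bw_var ** 0.5)  # gauss takes the std deviation
        requests.append((t, service, bandwidth))
```

Calling `generate_requests(1.0)` produces on average about 100 requests for one unit of time, and the fixed seed makes a run repeatable when strategies are compared on the same workload.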
In the simulations, the spectrum blocks are generated randomly, with a total amount increasing from 2 MHz to 50 MHz. The channels are divided using five different allocation strategies, and the allocation efficiency is calculated as the allocated bandwidth divided by the total spectrum bandwidth.
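The allocation efficiency defined above is a simple ratio. As a small sketch, assuming a division plan is stored as a matrix whose entry `plan[k][i]` is the number of channels of width `widths[i]` cut from block `k` (a hypothetical representation), it can be computed as follows.

```python
def allocation_efficiency(plan, widths, blocks):
    """Allocated bandwidth divided by the total spectrum bandwidth."""
    allocated = sum(n * w for row in plan for n, w in zip(row, widths))
    return allocated / sum(blocks)
```

For example, cutting two 200-unit and two 300-unit channels from a 1000-unit block and one 200-unit channel from a 500-unit block allocates 1200 of 1500 units, an efficiency of 0.8.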
BVDRP is compared with four other strategies: density of value greedy, value greedy, channel width from small to large, and channel width from large to small. The reasons for selecting them are as follows.
(i) Density of Value Greedy. This strategy is introduced in [14] as a desirable heuristic strategy. Compared with the other previous ones, it is a practical method regarding not only reward but also cost.
(ii) Value Greedy. It simply seeks profit, which is superficially the economic target of spectrum allocation in CGC networks.
(iii) Channel Width from Small to Large. This strategy divides as many channels as possible and accords with the objective of [4] to enlarge the network capacity.
(iv) Channel Width from Large to Small. It is the opposite of the third strategy and is included for reference.
Three aspects affected by the above five channel division strategies are compared: allocation efficiency, system utility, and spectrum utilization.
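The four reference strategies differ only in the order in which channel widths are considered, so they can be expressed as sort keys. The Python sketch below is an illustration with hypothetical names; `widths[i]` is a channel bandwidth and `prices[i]` the income for one channel of that width.

```python
def strategy_orders(widths, prices):
    """Priority order of channel-width indices under each reference strategy."""
    idx = range(len(widths))
    return {
        # Highest income per unit of bandwidth first.
        "density_of_value": sorted(idx, key=lambda i: prices[i] / widths[i], reverse=True),
        # Highest income per channel first, ignoring the bandwidth spent.
        "value_greedy": sorted(idx, key=lambda i: prices[i], reverse=True),
        # Narrowest channels first, which yields the most channels.
        "width_small_to_large": sorted(idx, key=lambda i: widths[i]),
        # Widest channels first.
        "width_large_to_small": sorted(idx, key=lambda i: widths[i], reverse=True),
    }
```

Each resulting order can be fed to the same greedy division routine, so that the strategies are compared on identical spectrum blocks and demands.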
As shown in Figure 6, the five division methods do not differ obviously in allocation efficiency. When the total spectrum resources reach 40 MHz, the five curves begin to overlap. This illustrates that 40 MHz of resources satisfies all the system's demands, beyond which the allocation results are the same regardless of the division strategy. The allocation curves decrease gradually above 40 MHz, because the resources beyond the spectrum demand are wasted.
According to these simulation results, the different division strategies do not differ if the spectrum resources acquired from primary users are sufficient. However, BSs in CGC networks usually cannot obtain plenty of spectrum because resources are unevenly distributed. So, we select the spectrum scenarios of 10 MHz, 20 MHz, and 30 MHz to compare system utilities and spectrum utilizations. These scenarios represent spectrum resources from extremely scarce to relatively sufficient. Figure 7 shows that, with different distributable resources, the system utilities increase linearly with time. The incomes of the BVDRP, density of value, and bandwidth from small to large strategies are close to each other, while those of value greedy and bandwidth from large to small are lower.
The value greedy strategy pursues value maximization too much and ignores that more value means more bandwidth. Although it appears to seek income, this division approach brings less utility to the system when resources are limited. The bandwidth from large to small strategy divides the channels needing larger bandwidth first, until no more can be allocated. Then the remaining spectrum is divided into smaller channels. Though a higher price is charged for a larger channel, more spectrum is spent and fewer channels are available. As a result, the total utility of the system is lower.
Among the three channel division strategies producing more income, density of value greedy exchanges less bandwidth for more profit. Bandwidth from small to large tries to divide more available channels when spectrum resources are fixed, and in these experiments the smaller channels happen to have the higher density of value. So, both obtain more utility for the system. The BVDRP strategy proposed in this paper takes into account both value density and request probability. The available channels are rarely vacant, which brings a good return to the system. When the lack of spectrum resources is serious ($S = 10$ MHz) in Figure 7(a), the system obtains the most profit with this strategy.
Figure 8 shows the spectrum utilizations using the five different channel division strategies. When $S = 10$ MHz, spectrum resources are scarce. The spectrum utilization rates are stable at around 63% with the value greedy and bandwidth from large to small strategies. The usage of the channels divided by the density of value and bandwidth from small to large strategies depends on the channel requests, and their spectrum utilizations fluctuate strongly between 70% and 95%. Our BVDRP strategy takes full account of end-user requests, so it achieves a better spectrum utilization rate, which stays above approximately 96%. When $S = 20$ MHz and $S = 30$ MHz, the spectrum resources tend to be abundant and more channels are available. So, the end-user queue is eased and there are sometimes vacant channels, which reduces the spectrum utilization rates to some extent. They fluctuate from 68% to 88% with the value greedy and bandwidth from large to small strategies and from 76% to 97% with the density of value and bandwidth from small to large ones. The BVDRP approach presents the highest utilization of the five strategies, from about 76% to 99%.
On the whole, BVDRP shows advantages in both system utility and spectrum utilization, especially when cognitive spectrum is scarce.

Conclusion
In this paper, we propose a traffic pattern prediction and channel division approach for the wireless service provider (the base-station) in a CGC network. The end-user arrival rate and bandwidth probability are estimated with a binary exponential deviation offset (BEDO) method. Then, we adopt the balance between value density and request probability (BVDRP) strategy to solve the multiple bounded knapsack problem and divide discontinuous spectrum blocks into channels. Finally, the methods are evaluated with real data and simulation experiments. The results show that in a CGC network our approach not only brings more profit to the service provider but also makes better use of spectrum resources. The advantage is more obvious when resources are scarcer.