A Review on Spectrum Sensing for Cognitive Radio: Challenges and Solutions
EURASIP Journal on Advances in Signal Processing

Cognitive radio is widely expected to be the next Big Bang in wireless communications. Spectrum sensing, that is, detecting the presence of the primary users in a licensed spectrum, is a fundamental problem for cognitive radio. As a result, spectrum sensing has been reborn as a very active research area in recent years despite its long history. In this paper, spectrum sensing techniques ranging from the optimal likelihood ratio test to energy detection, matched filtering detection, cyclostationary detection, eigenvalue-based sensing, joint space-time sensing, and robust sensing methods are reviewed. Cooperative spectrum sensing with multiple receivers is also discussed. Special attention is paid to sensing methods that need little prior information on the source signal and the propagation channel. Practical challenges such as noise power uncertainty are discussed and possible solutions are provided. Theoretical analysis of the test statistic distribution and threshold setting is also reviewed.


Introduction
A recent report [1] by the USA Federal Communications Commission (FCC) showed that the conventional fixed spectrum allocation rules have resulted in low spectrum usage efficiency in almost all currently deployed frequency bands. Measurements in other countries have shown similar results [2]. Cognitive radio, first proposed in [3], is a promising technology to fully exploit the under-utilized spectrum, and consequently it is now widely expected to be the next Big Bang in wireless communications. There has been a tremendous amount of academic research on cognitive radio, for example, [4,5], as well as application initiatives, such as the IEEE 802.22 standard on wireless regional area networks (WRAN) [6,7] and the Wireless Innovation Alliance [8], whose members include Google and Microsoft and which advocates unlocking the potential of the so-called "White Spaces" in the television (TV) spectrum. The basic idea of cognitive radio is spectral reuse or spectrum sharing, which allows the secondary networks/users to communicate over the spectrum allocated/licensed to the primary users when the primary users are not fully utilizing it. To do so, the secondary users are required to frequently perform spectrum sensing, that is, to detect the presence of the primary users. Whenever the primary users become active, the secondary users have to detect their presence with high probability and vacate the channel or reduce transmit power within a certain amount of time. For example, the upcoming IEEE 802.22 standard requires the secondary users to detect TV and wireless microphone signals and vacate the channel within two seconds once they become active. Furthermore, for TV signal detection, it is required to achieve 90% probability of detection and 10% probability of false alarm at a signal-to-noise ratio (SNR) as low as −20 dB.
There are several factors that make spectrum sensing practically challenging. First, the required SNR for detection may be very low. For example, even if a primary transmitter is near a secondary user (the detection node), the transmitted signal of the primary user can be in such a deep fade that the primary signal's SNR at the secondary receiver is well below −20 dB. However, the secondary user still needs to detect the primary user and avoid using the channel, because it may strongly interfere with the primary receiver if it transmits. A practical scenario of this is a wireless microphone operating in TV bands, which transmits with a power of less than 50 mW and a bandwidth of less than 200 kHz. If a secondary user is several hundred meters away from the microphone device, the received SNR may be well below −20 dB. Secondly, multipath fading and time dispersion of the wireless channels complicate the sensing problem. Multipath fading may cause the signal power to fluctuate by as much as 30 dB. On the other hand, unknown time dispersion in wireless channels may make coherent detection unreliable. Thirdly, the noise/interference level may change with time and location, which yields the noise power uncertainty issue for detection [9][10][11][12].
Facing these challenges, spectrum sensing has been reborn as a very active research area over recent years despite its long history. Quite a few sensing methods have been proposed, including the classic likelihood ratio test (LRT) [13], energy detection (ED) [9,10,13,14], matched filtering (MF) detection [10,13,15], cyclostationary detection (CSD) [16][17][18][19], and some newly emerging methods such as eigenvalue-based sensing [6][20][21][22][23][24][25], wavelet-based sensing [26], covariance-based sensing [6,27,28], and blindly combined energy detection [29]. These methods have different requirements for implementation and accordingly can be classified into three general categories: (A) methods requiring both source signal and noise power information, (B) methods requiring only noise power information (semiblind detection), and (C) methods requiring no information on source signal or noise power (totally blind detection). For example, LRT, MF, and CSD belong to category A; ED and wavelet-based sensing methods belong to category B; eigenvalue-based sensing, covariance-based sensing, and blindly combined energy detection belong to category C. In this paper, we focus on methods in categories B and C, although some methods in category A are also discussed for the sake of completeness. Multi-antenna/receiver systems have been widely deployed to increase the channel capacity or improve the transmission reliability in wireless communications. In addition, multiple antennas/receivers are commonly used to form an array radar [30,31] or a multiple-input multiple-output (MIMO) radar [32,33] to enhance the performance of range, direction, and/or velocity estimation. Consequently, MIMO techniques can also be applied to improve the performance of spectrum sensing. Therefore, in this paper we assume a multi-antenna system model in general, while the single-antenna system is treated as a special case.
When there are multiple secondary users/receivers distributed at different locations, it is possible for them to cooperate to achieve higher sensing reliability. There are various sensing cooperation schemes in the current literature [34][35][36][37][38][39][40][41][42][43][44]. In general, these schemes can be classified into two categories: (A) data fusion: each user sends its raw data or processed data to a specific user, which processes the data collected and then makes the final decision; (B) decision fusion: multiple users process their data independently and send their decisions to a specific user, which then makes the final decision.
In this paper, we review various spectrum sensing methods, from the optimal LRT to practical joint space-time sensing, robust sensing, and cooperative sensing, and discuss their advantages and disadvantages. We pay special attention to sensing methods with potential for practical application. The focus of this paper is on practical sensing algorithm design; for other aspects of spectrum sensing in cognitive radio, interested readers may refer to other resources such as [45][46][47][48][49][50][51][52].
The rest of this paper is organized as follows. The system model for the general setup with multiple receivers for sensing is given in Section 2. The optimal LRT-based sensing due to the Neyman-Pearson theorem is reviewed in Section 3. Under some special conditions, it is shown that the LRT becomes equivalent to the estimator-correlator detection, energy detection, or matched filtering detection. The Bayesian method and the generalized LRT for sensing are discussed in Section 4. Detection methods based on the spatial correlations among multiple received signals are discussed in Section 5, where optimally combined energy detection and blindly combined energy detection are shown to be optimal under certain conditions. Detection methods combining both spatial and time correlations are reviewed in Section 6, where the eigenvalue-based and covariance-based detections are discussed in particular. The cyclostationary detection, which exploits the statistical features of the primary signals, is reviewed in Section 7. Cooperative sensing is discussed in Section 8. The impacts of noise uncertainty and noise power estimation on the sensing performance are analyzed in Section 9. The test statistic distribution and threshold setting for sensing are reviewed in Section 10, where it is shown that random matrix theory is very useful for the related study. Robust spectrum sensing, which deals with uncertainties in the source signal and/or noise power knowledge, is reviewed in Section 11, with special emphasis on the robust versions of the LRT and matched filtering detection methods. Practical challenges and future research directions for spectrum sensing are discussed in Section 12. Finally, Section 13 concludes the paper.

System Model
We assume that there are $M \ge 1$ antennas at the receiver. These antennas can be sufficiently close to each other to form an antenna array or well separated from each other. We assume that a centralized unit is available to process the signals from all the antennas. The model under consideration is also applicable to multinode cooperative sensing [34][35][36][37][38][39][40][41][42][43][44][53], if all nodes are able to send their observed signals to a central node for processing. There are two hypotheses: $H_0$, signal absent, and $H_1$, signal present. The received signal at antenna/receiver $i$ is given by

$$H_0: x_i(n) = \eta_i(n), \qquad H_1: x_i(n) = s_i(n) + \eta_i(n), \qquad i = 1, \ldots, M. \tag{1}$$

In hypothesis $H_1$, $s_i(n)$ is the received source signal at antenna/receiver $i$, which may include the channel multipath and fading effects. In general, $s_i(n)$ can be expressed as

$$s_i(n) = \sum_{k=1}^{K} \sum_{l=0}^{q_{ik}} h_{ik}(l)\, s_k(n-l), \tag{2}$$

where $K$ denotes the number of primary user/antenna signals, $s_k(n)$ denotes the transmitted signal from primary user/antenna $k$, $h_{ik}(l)$ denotes the propagation channel coefficient from the $k$th primary user/antenna to the $i$th receiver antenna, and $q_{ik}$ denotes the channel order for $h_{ik}$. It is assumed that the noise samples $\eta_i(n)$ are independent and identically distributed (i.i.d.) over both $n$ and $i$. For simplicity, we assume that the signal, noise, and channel coefficients are all real numbers. The objective of spectrum sensing is to make a decision on the binary hypothesis test (choose $H_0$ or $H_1$) based on the received signal. If the decision is $H_1$, further information such as the signal waveform and modulation scheme may be classified for some applications. However, in this paper, we focus on the basic binary hypothesis testing problem.
The performance of a sensing algorithm is generally indicated by two metrics: the probability of detection, $P_d$, which is the probability, under hypothesis $H_1$, that the algorithm correctly detects the presence of the primary signal; and the probability of false alarm, $P_{fa}$, which is the probability, under hypothesis $H_0$, that the algorithm mistakenly declares the presence of the primary signal. A sensing algorithm is called "optimal" if it achieves the highest $P_d$ for a given $P_{fa}$ with a fixed number of samples, though there could be other criteria to evaluate the performance of a sensing algorithm.
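As a small illustration of the two metrics, the sketch below estimates $P_{fa}$ and $P_d$ empirically from test statistics collected under each hypothesis. The function name `empirical_pfa_pd` and the toy numbers are hypothetical and serve only to show the counting involved.

```python
import numpy as np

def empirical_pfa_pd(stats_h0, stats_h1, threshold):
    """Estimate (P_fa, P_d) from test statistics observed under H0 and H1.

    A trial declares "signal present" when its statistic exceeds the threshold.
    """
    stats_h0 = np.asarray(stats_h0, dtype=float)
    stats_h1 = np.asarray(stats_h1, dtype=float)
    p_fa = float(np.mean(stats_h0 > threshold))  # false alarms among H0 trials
    p_d = float(np.mean(stats_h1 > threshold))   # detections among H1 trials
    return p_fa, p_d

# Toy statistics (illustrative values only, not measured data).
p_fa, p_d = empirical_pfa_pd([0.1, 0.2, 0.3, 0.9], [0.4, 0.8, 1.0, 1.2], threshold=0.5)
print(p_fa, p_d)
```

Sweeping the threshold and plotting the resulting pairs traces out an empirical ROC curve.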
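Stacking the signals from the $M$ antennas/receivers yields the following $M \times 1$ vectors:

$$x(n) = [x_1(n) \;\cdots\; x_M(n)]^T, \quad s(n) = [s_1(n) \;\cdots\; s_M(n)]^T, \quad \eta(n) = [\eta_1(n) \;\cdots\; \eta_M(n)]^T. \tag{3}$$

The hypothesis testing problem based on $N$ signal samples is then obtained as

$$H_0: x(n) = \eta(n), \qquad H_1: x(n) = s(n) + \eta(n), \qquad n = 0, \ldots, N-1. \tag{4}$$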
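The stacked model can be simulated with a few lines of NumPy. The sketch below is a minimal example under assumptions that are not part of the text: a single primary signal ($K = 1$), a flat-fading channel drawn once at random, an i.i.d. Gaussian source, and unit noise power. The helper name `simulate_snapshots` is hypothetical.

```python
import numpy as np

def simulate_snapshots(N, M, snr_db, signal_present, rng):
    """Return an (M, N) array whose columns are the snapshots x(n), n = 0..N-1.

    Assumptions (illustrative only): K = 1, flat fading (channel order 0),
    i.i.d. Gaussian source, real-valued samples, noise power 1.
    """
    noise = rng.standard_normal((M, N))           # eta(n), i.i.d. over n and i
    if not signal_present:
        return noise                              # H0: x(n) = eta(n)
    h = rng.standard_normal(M)                    # flat-fading channel gains
    h *= np.sqrt(M) / np.linalg.norm(h)           # average per-antenna power gain of 1
    s = 10 ** (snr_db / 20) * rng.standard_normal(N)
    return np.outer(h, s) + noise                 # H1: x(n) = s(n) + eta(n)

rng = np.random.default_rng(0)
x_h0 = simulate_snapshots(1000, 4, 0.0, False, rng)
x_h1 = simulate_snapshots(1000, 4, 0.0, True, rng)
print(x_h0.shape, np.mean(x_h0 ** 2), np.mean(x_h1 ** 2))
```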

Neyman-Pearson Theorem
The Neyman-Pearson (NP) theorem [13,54,55] states that, for a given probability of false alarm, the test statistic that maximizes the probability of detection is the likelihood ratio test (LRT) defined as

$$T_{LRT}(x) = \frac{p(x \mid H_1)}{p(x \mid H_0)}, \tag{5}$$

where $p(\cdot)$ denotes the probability density function (PDF), and $x$ denotes the received signal vector that is the aggregation of $x(n)$, $n = 0, 1, \ldots, N-1$. Such a likelihood ratio test decides $H_1$ when $T_{LRT}(x)$ exceeds a threshold $\gamma$, and $H_0$ otherwise. The major difficulty in using the LRT is that it requires the exact distributions in (5). Obviously, the distribution of the random vector $x$ under $H_1$ is related to the source signal distribution, the wireless channels, and the noise distribution, while the distribution of $x$ under $H_0$ is related to the noise distribution. In order to use the LRT, we need knowledge of the channels as well as the signal and noise distributions, which is difficult to obtain in practice.
If we assume that the channels are flat-fading, and the received source signal samples $s_i(n)$ are independent over $n$, the PDFs in the LRT are decoupled as

$$p(x \mid H_1) = \prod_{n=0}^{N-1} p(x(n) \mid H_1), \qquad p(x \mid H_0) = \prod_{n=0}^{N-1} p(x(n) \mid H_0). \tag{6}$$

If we further assume that noise and signal samples are both Gaussian distributed, that is, $\eta(n) \sim \mathcal{N}(0, \sigma_\eta^2 I)$ and $s(n) \sim \mathcal{N}(0, R_s)$, the LRT becomes the estimator-correlator (EC) [13] detector, for which the test statistic is given by

$$T_{EC}(x) = \sum_{n=0}^{N-1} x^T(n)\, R_s \left(R_s + \sigma_\eta^2 I\right)^{-1} x(n). \tag{7}$$

From (4), we see that $R_s (R_s + \sigma_\eta^2 I)^{-1} x(n)$ is actually the minimum-mean-squared-error (MMSE) estimate of the source signal $s(n)$. Thus, $T_{EC}(x)$ in (7) can be seen as the correlation of the observed signal $x(n)$ with the MMSE estimate of $s(n)$.
The EC detector needs to know the source signal covariance matrix $R_s$ and the noise power $\sigma_\eta^2$. While the presence of the signal is still unknown, it is unrealistic to require the source signal covariance matrix (which is related to the unknown channels) for detection. Thus, if we further assume that $R_s = \sigma_s^2 I$, the EC detector in (7) reduces to the well-known energy detector (ED) [9,14], for which the test statistic is given as follows (by discarding irrelevant constant terms):

$$T_{ED}(x) = \sum_{n=0}^{N-1} x^T(n)\, x(n). \tag{8}$$

Note that for the multi-antenna/receiver case, $T_{ED}$ is actually the summation of the signal energies from all antennas, which is a straightforward cooperative sensing scheme [41,56,57]. In general, the ED is not optimal if $R_s$ is non-diagonal. If we assume that the noise is Gaussian distributed and the source signal $s(n)$ is deterministic and known to the receiver, which is the case for radar signal processing [32,33,58], it is easy to show that the LRT in this case becomes the matched filtering-based detector, for which the test statistic is

$$T_{MF}(x) = \sum_{n=0}^{N-1} s^T(n)\, x(n). \tag{9}$$
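The three special cases of the LRT described in this section (estimator-correlator, energy detection, and matched filtering) can be written compactly in NumPy. This is a minimal sketch for real-valued data stored as an `(M, N)` array; the function names `t_ec`, `t_ed`, and `t_mf` are hypothetical. The check at the end illustrates that, when $R_s = \sigma_s^2 I$, the EC statistic is just a constant multiple of the energy statistic.

```python
import numpy as np

def t_ed(x):
    """Energy detector: sum over n of x(n)^T x(n)."""
    return float(np.sum(x * x))

def t_ec(x, R_s, noise_var):
    """Estimator-correlator: sum over n of x(n)^T R_s (R_s + noise_var I)^{-1} x(n)."""
    M = x.shape[0]
    W = R_s @ np.linalg.inv(R_s + noise_var * np.eye(M))
    return float(np.einsum("in,ij,jn->", x, W, x))

def t_mf(x, s):
    """Matched filter: sum over n of s(n)^T x(n), with s known at the receiver."""
    return float(np.sum(s * x))

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 200))
sigma_s2, sigma_n2 = 0.5, 1.0

# With R_s = sigma_s2 * I the EC statistic equals sigma_s2 / (sigma_s2 + sigma_n2) * T_ED.
ratio = t_ec(x, sigma_s2 * np.eye(3), sigma_n2) / t_ed(x)
print(ratio, sigma_s2 / (sigma_s2 + sigma_n2))
```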

Bayesian Method and the Generalized Likelihood Ratio Test
In most practical scenarios, it is impossible to know the likelihood functions exactly, because of uncertainty about one or more parameters in these functions. For instance, we may not know the noise power $\sigma_\eta^2$ and/or the source signal covariance $R_s$. Hypothesis testing in the presence of uncertain parameters is known as "composite" hypothesis testing. In classic detection theory, there are two main approaches to tackle this problem: the Bayesian method and the generalized likelihood ratio test (GLRT).
In the Bayesian method [13], the objective is to evaluate the likelihood functions needed in the LRT through marginalization, that is,

$$p(x \mid H_0) = \int p(x \mid H_0, \Theta_0)\, p(\Theta_0 \mid H_0)\, d\Theta_0, \tag{10}$$

where $\Theta_0$ represents all the unknowns when $H_0$ is true. Note that the integration in (10) should be replaced with a summation if the elements in $\Theta_0$ are drawn from a discrete sample space. Critically, we have to assign a prior distribution $p(\Theta_0 \mid H_0)$ to the unknown parameters. In other words, we need to treat these unknowns as random variables and use their known distributions to express our belief in their values. Similarly, $p(x \mid H_1)$ can be defined.
The main drawbacks of the Bayesian approach are listed as follows.
(1) The marginalization operation in (10) is often not tractable except for very simple cases.
(2) The choice of prior distributions affects the detection performance dramatically and thus it is not a trivial task to choose them.
To make the LRT applicable, we may estimate the unknown parameters first and then use the estimated parameters in the LRT. Known estimation techniques could be used for this purpose [59]. However, there is one major difference from the conventional estimation problem: there we know that the signal is present, whereas in spectrum sensing we are not sure whether there is a source signal or not (the first priority here is the detection of signal presence). The unknown parameters also differ between the two hypotheses ($H_0$ or $H_1$).
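The GLRT is one efficient method [13,55] to resolve the above problem, and it has been used in many applications, for example, radar and sonar signal processing. For this method, the maximum likelihood (ML) estimates of the unknown parameters under $H_0$ and $H_1$ are first obtained as

$$\hat{\Theta}_0 = \arg\max_{\Theta_0}\, p(x \mid H_0, \Theta_0), \qquad \hat{\Theta}_1 = \arg\max_{\Theta_1}\, p(x \mid H_1, \Theta_1), \tag{11}$$

where $\Theta_0$ and $\Theta_1$ are the sets of unknown parameters under $H_0$ and $H_1$, respectively. Then, the GLRT statistic is formed as

$$T_{GLRT}(x) = \frac{p(x \mid \hat{\Theta}_1, H_1)}{p(x \mid \hat{\Theta}_0, H_0)}. \tag{12}$$

Finally, the GLRT decides $H_1$ if $T_{GLRT}(x) > \gamma$, where $\gamma$ is a threshold, and $H_0$ otherwise.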
The GLRT is not guaranteed to be optimal, nor to approach optimality as the sample size goes to infinity. Since the unknown parameters in $\Theta_0$ and $\Theta_1$ depend strongly on the statistical models of the noise and signal, their estimates can be vulnerable to modeling errors. Under the assumption of Gaussian distributed source signals and noise, and flat-fading channels, some efficient spectrum sensing methods based on the GLRT can be found in [60].

Exploiting Spatial Correlation of Multiple Received Signals
The received signal samples at different antennas/receivers are usually correlated, because all $s_i(n)$ are generated from the same source signals $s_k(n)$. As mentioned previously, the energy detection defined in (8) is not optimal for this case. Furthermore, it is difficult to realize the LRT in practice. Hence, we consider suboptimal sensing methods as follows.
If $M > 1$, $K = 1$, and the propagation channels are flat-fading ($q_{ik} = 0$, $\forall i, k$) and known to the receiver, the energy at different antennas can be coherently combined to obtain a nearly optimal detection [41,43,57]. This is also called maximum ratio combining (MRC). However, in practice, the channel coefficients are unknown at the receiver. As a result, coherent combining may not be applicable and equal gain combining (EGC) is used in practice [41,57], which is the same as the energy detection defined in (8).
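In general, we can choose a matrix $B$ with $M$ rows to combine the signals from all antennas as

$$z(n) = B^T x(n), \qquad n = 0, 1, \ldots, N-1. \tag{13}$$

The combining matrix should be chosen such that the resultant signal has the largest SNR. It is obvious that the SNR after combining is

$$\Gamma(B) = \frac{E\left(\|B^T s(n)\|^2\right)}{E\left(\|B^T \eta(n)\|^2\right)}, \tag{14}$$

where $E(\cdot)$ denotes the mathematical expectation. Hence, the optimal combining matrix should maximize the value of the function $\Gamma(B)$. Let $R_s = E[s(n) s^T(n)]$ be the statistical covariance matrix of the primary signals. It can be verified that

$$\Gamma(B) = \frac{\mathrm{Tr}\left(B^T R_s B\right)}{\sigma_\eta^2\, \mathrm{Tr}\left(B^T B\right)} \le \frac{\lambda_{\max}}{\sigma_\eta^2}, \tag{15}$$

where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, $\lambda_{\max}$ is the maximum eigenvalue of $R_s$, and $\beta_1$ is the corresponding eigenvector. It can be proved that the optimal combining matrix reduces to the vector $\beta_1$ [29]. Upon substituting $\beta_1$ into (13), the test statistic for the energy detection becomes

$$T_{OCED}(x) = \frac{1}{N} \sum_{n=0}^{N-1} \left|\beta_1^T x(n)\right|^2. \tag{16}$$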

The resulting detection method is called optimally combined energy detection (OCED) [29]. It is easy to show that this test statistic is better than $T_{ED}(x)$ in terms of SNR. The OCED needs an eigenvector of the received source signal covariance matrix, which is usually unknown. To overcome this difficulty, we provide a method to estimate the eigenvector using the received signal samples only. Considering the statistical covariance matrix of the received signal defined as

$$R_x = E\left[x(n) x^T(n)\right], \tag{17}$$

we can verify that

$$R_x = R_s + \sigma_\eta^2 I. \tag{18}$$

Since $R_x$ and $R_s$ have the same eigenvectors, the vector $\beta_1$ is also the eigenvector of $R_x$ corresponding to its maximum eigenvalue. However, in practice, we do not know the statistical covariance matrix $R_x$ either, and therefore we cannot obtain the exact vector $\beta_1$. An approximation of the statistical covariance matrix is the sample covariance matrix defined as

$$\hat{R}_x(N) = \frac{1}{N} \sum_{n=0}^{N-1} x(n) x^T(n). \tag{19}$$

Let $\hat{\beta}_1$ (normalized to $\|\hat{\beta}_1\|^2 = 1$) be the eigenvector of the sample covariance matrix corresponding to its maximum eigenvalue. We can replace the combining vector $\beta_1$ by $\hat{\beta}_1$, that is,

$$\hat{z}(n) = \hat{\beta}_1^T x(n). \tag{20}$$

Then, the test statistic for the resulting blindly combined energy detection (BCED) [29] becomes

$$T_{BCED}(x) = \frac{1}{N} \sum_{n=0}^{N-1} \left|\hat{\beta}_1^T x(n)\right|^2. \tag{21}$$

It can be verified that

$$T_{BCED}(x) = \hat{\beta}_1^T \hat{R}_x(N)\, \hat{\beta}_1 = \hat{\lambda}_{\max}(N), \tag{22}$$

where $\hat{\lambda}_{\max}(N)$ is the maximum eigenvalue of $\hat{R}_x(N)$. Thus, $T_{BCED}(x)$ can be taken as the maximum eigenvalue of the sample covariance matrix. Note that this test is a special case of the eigenvalue-based detection (EBD) [20][21][22][23][24][25].
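The identity between the blindly combined energy and the maximum eigenvalue of the sample covariance matrix is easy to check numerically. The following is a minimal sketch, assuming real-valued snapshots stored as an `(M, N)` array; the function name `t_bced` and the simulated rank-one signal are hypothetical choices for illustration.

```python
import numpy as np

def t_bced(x):
    """Return (combined energy, largest eigenvalue) for an (M, N) snapshot array."""
    N = x.shape[1]
    R_hat = x @ x.T / N                         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(R_hat)    # eigenvalues in ascending order
    beta_hat = eigvecs[:, -1]                   # unit-norm eigenvector of the largest eigenvalue
    combined_energy = float(np.mean((beta_hat @ x) ** 2))
    return combined_energy, float(eigvals[-1])

rng = np.random.default_rng(0)
M, N = 4, 2000
h = rng.standard_normal(M)                      # illustrative flat-fading channel
x = np.outer(h, rng.standard_normal(N)) + rng.standard_normal((M, N))

energy, lam_max = t_bced(x)
print(energy, lam_max)                          # the two values agree up to rounding
```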

Combining Space and Time Correlation
In addition to being spatially correlated, the received signal samples are usually correlated in time due to the following reasons.
(1) The received signal is oversampled. Let $\Delta_0$ be the Nyquist sampling period of the continuous-time signal $s_c(t)$ and let $s_c(n\Delta_0)$ be the signal sampled at the Nyquist rate. According to the Nyquist sampling theorem, the signal $s_c(t)$ can be expressed as

$$s_c(t) = \sum_{n=-\infty}^{\infty} s_c(n\Delta_0)\, g(t - n\Delta_0), \tag{23}$$

where $g(t)$ is an interpolation function. Hence, the signal samples $s(n) = s_c(n\Delta_s)$ are only related to $s_c(n\Delta_0)$, where $\Delta_s$ is the actual sampling period. If the sampling rate at the receiver satisfies $1/\Delta_s > 1/\Delta_0$, that is, $\Delta_s < \Delta_0$, then $s(n) = s_c(n\Delta_s)$ must be correlated over $n$. An example of this is the wireless microphone signal specified in the IEEE 802.22 standard [6,7], which occupies about 200 kHz in a 6-MHz TV band. In this example, if we sample the received signal with a sampling rate no lower than 6 MHz, the wireless microphone signal is actually oversampled and the resulting signal samples are highly correlated in time.
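(2) The propagation channel is time-dispersive. In this case, the received signal can be expressed as

$$s_c(t) = \int_{-\infty}^{\infty} h(\tau)\, s_0(t - \tau)\, d\tau, \tag{24}$$

where $s_0(t)$ is the transmitted signal and $h(t)$ is the response of the time-dispersive channel. Since the sampling period $\Delta_s$ is usually very small, the integration (24) can be approximated as

$$s_c(t) \approx \Delta_s \sum_{k=-\infty}^{\infty} h(k\Delta_s)\, s_0(t - k\Delta_s). \tag{25}$$

Hence,

$$s_c(n\Delta_s) \approx \Delta_s \sum_{k=J_0}^{J_1} h(k\Delta_s)\, s_0((n-k)\Delta_s), \tag{26}$$

where $[J_0\Delta_s, J_1\Delta_s]$ is the support of the channel response $h(t)$. For time-dispersive channels, $J_1 > J_0$, and thus even if the original signal samples $s_0(n\Delta_s)$ are i.i.d., the received signal samples $s_c(n\Delta_s)$ are correlated.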
(3) The transmitted signal is correlated in time. In this case, even if the channel is flat-fading and there is no oversampling at the receiver, the received signal samples are correlated.
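The second cause in the list above can be illustrated numerically: passing i.i.d. samples through a multi-tap channel produces correlated output samples. The sketch below is only an illustration; the 3-tap channel `h` and the helper `lag1_correlation` are hypothetical choices, not taken from the text.

```python
import numpy as np

def lag1_correlation(x):
    """Normalized sample autocorrelation of x at lag 1."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(2)
s0 = rng.standard_normal(50_000)          # i.i.d. transmitted samples
h = np.array([1.0, 0.8, 0.5])             # hypothetical time-dispersive channel taps
sc = np.convolve(s0, h, mode="valid")     # received samples after the channel

print(lag1_correlation(s0))               # close to 0 for the i.i.d. input
print(lag1_correlation(sc))               # clearly positive after the channel
```

For these taps the theoretical lag-1 correlation of the output is $(1 \cdot 0.8 + 0.8 \cdot 0.5)/(1 + 0.64 + 0.25) \approx 0.63$.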
The above discussions suggest that the assumption of independent (in time) received signal samples may be invalid in practice, such that the detection methods relying on this assumption may not perform optimally. However, additional correlation in time is not necessarily harmful for signal detection; the question is how to exploit it. For the multi-antenna/receiver case, the received signal samples are also correlated in space. Thus, to use both the space and time correlations, we may stack the signals from the $M$ antennas over $L$ sampling periods all together and define the corresponding $ML \times 1$ signal/noise vectors:

$$x_L(n) = \left[x^T(n) \;\; x^T(n-1) \;\cdots\; x^T(n-L+1)\right]^T,$$
$$s_L(n) = \left[s^T(n) \;\; s^T(n-1) \;\cdots\; s^T(n-L+1)\right]^T,$$
$$\eta_L(n) = \left[\eta^T(n) \;\; \eta^T(n-1) \;\cdots\; \eta^T(n-L+1)\right]^T. \tag{27}$$

Then, by replacing $x(n)$ by $x_L(n)$, we can directly extend the previously introduced OCED and BCED methods to incorporate joint space-time processing. Similarly, the eigenvalue-based detection methods [21][22][23][24] can also be modified to work for signals correlated in both time and space. Another approach to make use of space-time signal correlation is the covariance-based detection [27,28,61], briefly described as follows. Defining the space-time statistical covariance matrices of the received signal and the source signal as

$$R_{L,x} = E\left[x_L(n) x_L^T(n)\right], \qquad R_{L,s} = E\left[s_L(n) s_L^T(n)\right], \tag{28}$$

respectively, we can verify that

$$R_{L,x} = R_{L,s} + \sigma_\eta^2 I_{ML}. \tag{29}$$

If the signal is not present, $R_{L,s} = 0$, and thus the off-diagonal elements in $R_{L,x}$ are all zeros. If there is a signal and the signal samples are correlated, $R_{L,s}$ is not a diagonal matrix. Hence, the nonzero off-diagonal elements of $R_{L,x}$ can be used for signal detection.
In practice, the statistical covariance matrix can only be computed using a limited number of signal samples, where $R_{L,x}$ can be approximated by the sample covariance matrix defined as

$$\hat{R}_{L,x}(N) = \frac{1}{N} \sum_{n=L-1}^{N-1} x_L(n) x_L^T(n). \tag{30}$$

Based on the sample covariance matrix, we could develop the covariance absolute value (CAV) test [27,28] defined as

$$T_{CAV}(x) = \frac{\sum_{n=1}^{ML} \sum_{m=1}^{ML} |r_{nm}(N)|}{\sum_{n=1}^{ML} |r_{nn}(N)|}, \tag{31}$$

where $r_{nm}(N)$ denotes the $(n, m)$th element of the sample covariance matrix $\hat{R}_{L,x}(N)$. There are other ways to utilize the elements in the sample covariance matrix, for example, the maximum value of the nondiagonal elements, to form different test statistics.
In particular, when we have some prior information on the source signal correlation, we may choose a corresponding subset of the elements in the sample covariance matrix to form a more efficient test.
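The space-time stacking and the CAV statistic can be sketched as follows. This is a minimal illustration under stated assumptions: real-valued snapshots in an `(M, N)` array, a hypothetical 4-tap moving-average source to create time correlation, and normalization of the sample covariance by the number of stacked snapshots (which does not change the ratio). The names `stack_space_time` and `t_cav` are hypothetical.

```python
import numpy as np

def stack_space_time(x, L):
    """Build the (M*L, N-L+1) array whose columns are x_L(n), n = L-1..N-1."""
    M, N = x.shape
    # Block l holds x(n - l) for every valid n, so each column is
    # [x(n); x(n-1); ...; x(n-L+1)].
    return np.vstack([x[:, L - 1 - l : N - l] for l in range(L)])

def t_cav(x, L):
    """Covariance absolute value statistic: sum of |r_nm| over sum of |r_nn|."""
    xl = stack_space_time(x, L)
    R_hat = xl @ xl.T / xl.shape[1]
    return float(np.sum(np.abs(R_hat)) / np.sum(np.abs(np.diag(R_hat))))

rng = np.random.default_rng(3)
N, M, L = 20_000, 2, 4
noise_only = rng.standard_normal((M, N))

# Hypothetical time-correlated source: white noise through a 4-tap moving average.
w = rng.standard_normal(N + 3)
s = np.convolve(w, np.ones(4) / 2.0, mode="valid")      # length N, unit variance
with_signal = np.vstack([s, s]) + rng.standard_normal((M, N))

print(t_cav(noise_only, L))    # close to 1: off-diagonal entries are nearly zero
print(t_cav(with_signal, L))   # well above 1: correlation shows up off the diagonal
```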

Cyclostationary Detection
Practical communication signals may have special statistical features. For example, digitally modulated signals have nonrandom components such as double sidedness due to the sinewave carrier and the keying rate due to the symbol period. Such signals have a special statistical feature called cyclostationarity, that is, their statistical parameters vary periodically in time. This cyclostationarity can be extracted by the spectral-correlation density (SCD) function [16][17][18]. For a cyclostationary signal, the SCD function takes nonzero values at some nonzero cyclic frequencies. On the other hand, noise does not have any cyclostationarity at all; that is, its SCD function is zero at all nonzero cyclic frequencies.
Hence, we can distinguish signal from noise by analyzing the SCD function. Furthermore, it is possible to distinguish the signal type because different signals may have different nonzero cyclic frequencies.
In the following, we list cyclic frequencies for some signals of practical interest [17,18].
(1) Analog TV signal: it has cyclic frequencies at multiples of the TV-signal horizontal line-scan rate (15.75 kHz in USA, 15.625 kHz in Europe).
(3) PM and FM signal: $x(t) = \cos(2\pi f_c t + \varphi(t))$. It usually has cyclic frequencies at $\pm 2 f_c$. The characteristics of the SCD function at cyclic frequency $\pm 2 f_c$ depend on $\varphi(t)$.
When the source signal $x(t)$ passes through a wireless channel, the received signal is impaired by the unknown propagation channel. In general, the received signal can be written as

$$y(t) = x(t) \otimes h(t), \tag{32}$$

where $\otimes$ denotes the convolution, and $h(t)$ denotes the channel response. It can be shown that the SCD function of $y(t)$ is

$$S_y(f) = H\!\left(f + \frac{\alpha}{2}\right) H^*\!\left(f - \frac{\alpha}{2}\right) S_x(f), \tag{33}$$

where $*$ denotes the conjugate, $\alpha$ denotes the cyclic frequency for $x(t)$, $H(f)$ is the Fourier transform of the channel $h(t)$, and $S_x(f)$ is the SCD function of $x(t)$. Thus, the unknown channel could have major impacts on the strength of the SCD at certain cyclic frequencies.
Although cyclostationary detection has certain advantages (e.g., robustness to uncertainty in noise power and propagation channel), it also has some disadvantages: (1) it needs a very high sampling rate; (2) the computation of the SCD function requires a large number of samples and thus has high computational complexity; (3) the strength of the SCD could be affected by the unknown channel; (4) the sampling time error and frequency offset could affect the cyclic frequencies.
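A full SCD estimator is beyond a short example, but the underlying idea can be sketched with the cyclic autocorrelation at lag zero, whose Fourier transform over the lag gives the SCD. The sketch below is an illustration under assumptions that are not from the text: an amplitude-modulated carrier with an i.i.d. Gaussian envelope, a normalized carrier frequency `fc` in cycles per sample, and the hypothetical helper `cyclic_autocorr_lag0`. The modulated carrier shows a clear component at cyclic frequency $2 f_c$, while noise of the same power does not.

```python
import numpy as np

def cyclic_autocorr_lag0(x, alpha):
    """Sample cyclic autocorrelation at lag 0 and cyclic frequency alpha (cycles/sample)."""
    n = np.arange(x.size)
    return np.mean(x * x * np.exp(-2j * np.pi * alpha * n))

rng = np.random.default_rng(4)
N, fc = 20_000, 0.1
n = np.arange(N)

am = rng.standard_normal(N) * np.cos(2 * np.pi * fc * n)   # modulated carrier, power 0.5
noise = np.sqrt(0.5) * rng.standard_normal(N)              # stationary noise, power 0.5

peak_signal = abs(cyclic_autocorr_lag0(am, 2 * fc))
peak_noise = abs(cyclic_autocorr_lag0(noise, 2 * fc))
print(peak_signal, peak_noise)
```

With a unit-power envelope, the value at $\alpha = 2 f_c$ is about $1/4$ for the modulated carrier and shrinks toward zero for noise as $N$ grows.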

Cooperative Sensing
When there are multiple users/receivers distributed in different locations, it is possible for them to cooperate to achieve higher sensing reliability, thus resulting in various cooperative sensing schemes [34-44, 53, 62]. Generally speaking, if each user sends its observed data or processed data to a specific user, which jointly processes the collected data and makes a final decision, this cooperative sensing scheme is called data fusion. Alternatively, if multiple receivers process their observed data independently and send their decisions to a specific user, which then makes a final decision, it is called decision fusion.

Data Fusion.
If the raw data from all receivers are sent to a central processor, the previously discussed methods for multi-antenna sensing can be directly applied. However, communication of raw data may be very expensive for practical applications. Hence, in many cases, users only send processed/compressed data to the central processor.
A simple cooperative sensing scheme based on the energy detection is the combined energy detection. For this scheme, each user computes the energy of its received signal (including the noise) as $T_{ED,i} = (1/N) \sum_{n=0}^{N-1} |x_i(n)|^2$ and sends it to the central processor, which sums the collected energy values using a linear combination (LC) to obtain the following test statistic:

$$T_{LC}(x) = \sum_{i=1}^{M} g_i\, T_{ED,i}, \tag{34}$$

where $g_i$ is the combining coefficient, with $g_i \ge 0$ and $\sum_{i=1}^{M} g_i = 1$. If there is no information on the source signal power received by each user, the EGC can be used, that is, $g_i = 1/M$ for all $i$. If the source signal power received by each user is known, the optimal combining coefficients can be found [38,43]. For the low-SNR case, it can be shown [43] that the optimal combining coefficients are given by

$$g_i = \frac{\sigma_i^2}{\sum_{k=1}^{M} \sigma_k^2}, \qquad i = 1, \ldots, M, \tag{35}$$

where $\sigma_i^2$ is the received source signal (excluding the noise) power of user $i$.
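The linear combination at the central processor amounts to a weighted sum of the per-user energies. The sketch below shows EGC weights and, as an assumption about the low-SNR result, weights proportional to each user's received signal power and normalized to sum to one. All function names are hypothetical.

```python
import numpy as np

def local_energy(x):
    """Per-user average energy (1/N) * sum_n |x_i(n)|^2 for an (M, N) array."""
    return np.mean(np.abs(x) ** 2, axis=1)

def egc_weights(M):
    """Equal gain combining: g_i = 1/M."""
    return np.full(M, 1.0 / M)

def power_proportional_weights(signal_powers):
    """Assumed low-SNR weights: proportional to signal power, summing to one."""
    p = np.asarray(signal_powers, dtype=float)
    return p / p.sum()

def linear_combination(energies, g):
    """Weighted sum of the reported energies with nonnegative weights summing to one."""
    g = np.asarray(g, dtype=float)
    if np.any(g < 0) or not np.isclose(g.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to one")
    return float(g @ np.asarray(energies, dtype=float))

energies = np.array([1.0, 2.0, 3.0])                 # illustrative reported energies
t_egc = linear_combination(energies, egc_weights(3))
t_weighted = linear_combination(energies, power_proportional_weights([1.0, 1.0, 2.0]))
print(t_egc, t_weighted)
```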
A fusion scheme based on the CAV is given in [53], which has the capability to mitigate interference and noise uncertainty.

Decision Fusion.
In decision fusion, each user sends its one-bit or multiple-bit decision to a central processor, which deploys a fusion rule to make the final decision. Specifically, if each user only sends one-bit decision ("1" for signal present and "0" for signal absent) and no other information is available at the central processor, some commonly adopted decision fusion rules are described as follows [42].
(1) "Logical-OR (LO)" Rule: If one of the decisions is "1," the final decision is "1." Assuming that all decisions are independent, the probability of detection and the probability of false alarm of the final decision are

$$P_d = 1 - \prod_{i=1}^{M} \left(1 - P_{d,i}\right), \qquad P_{fa} = 1 - \prod_{i=1}^{M} \left(1 - P_{fa,i}\right), \tag{36}$$

respectively, where $P_{d,i}$ and $P_{fa,i}$ are the probability of detection and probability of false alarm for user $i$, respectively.
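The Logical-OR rule and its probabilities under independent decisions can be sketched in a few lines. The function names are hypothetical; the same product formula applies to both the detection and the false alarm probabilities.

```python
import numpy as np

def or_rule(decisions):
    """Final one-bit decision: 1 if any user reports 1, else 0."""
    return int(np.any(np.asarray(decisions) == 1))

def or_rule_probability(per_user_probabilities):
    """P = 1 - prod_i (1 - P_i), valid when the users' decisions are independent."""
    p = np.asarray(per_user_probabilities, dtype=float)
    return float(1.0 - np.prod(1.0 - p))

final_decision = or_rule([0, 0, 1])
p_fa = or_rule_probability([0.1, 0.1, 0.1])   # 1 - 0.9**3
p_d = or_rule_probability([0.6, 0.7, 0.8])    # 1 - 0.4 * 0.3 * 0.2
print(final_decision, p_fa, p_d)
```

The example shows the trade-off of this rule: the fused detection probability rises, but so does the fused false alarm probability.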
Alternatively, each user can send a multiple-bit decision such that the central processor gets more information to make a more reliable decision. A fusion scheme based on multiple-bit decisions is shown in [41]. In general, there is a tradeoff between the number of decision bits and the fusion reliability. There are also other fusion rules that may require additional information [34,63].
Although cooperative sensing can achieve better performance, there are some issues associated with it. First, reliable information exchange among the cooperating users must be guaranteed. In an ad hoc network, this is by no means a simple task. Second, most data fusion methods in the literature are based on simple energy detection and a flat-fading channel model, while more advanced data fusion algorithms, such as cyclostationary detection, space-time combining, and eigenvalue-based detection over more practical propagation channels, need to be further investigated. Third, existing decision fusion schemes have mostly assumed that the decisions of different users are independent, which may not be true because all users actually receive signals from some common sources. Finally, practical fusion algorithms should be robust to data errors due to channel impairment, interference, and noise.

Noise Power Uncertainty and Estimation
For many detection methods, the receiver noise power is assumed to be known a priori, in order to form the test statistic and/or set the test threshold. However, the noise power level may change over time, thus yielding the so-called noise uncertainty problem. There are two types of noise uncertainty: receiver device noise uncertainty and environment noise uncertainty. The receiver device noise uncertainty comes from [9][10][11]: (a) nonlinearity of receiver components and (b) time-varying thermal noise in these components. The environment noise uncertainty is caused by transmissions of other users, either unintentional or intentional. Because of the noise uncertainty, it is very difficult in practice to obtain an accurate noise power.
Let the estimated noise power be $\hat{\sigma}_\eta^2 = \alpha \sigma_\eta^2$, where $\alpha$ is called the noise uncertainty factor. The upper bound on $\alpha$ (in dB scale) is then defined as

$$B = \sup\{10 \log_{10} \alpha\}, \tag{39}$$

where $B$ is called the noise uncertainty bound. It is usually assumed that $\alpha$ in dB scale, that is, $10 \log_{10} \alpha$, is uniformly distributed in the interval $[-B, B]$ [10]. In practice, the noise uncertainty bound of a receiving device is normally below 2 dB [10,64], while the environment/interference noise uncertainty can be much larger [10]. When there is noise uncertainty, it is known that the energy detection is not effective [9][10][11][64].
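The effect of the uncertainty factor on energy detection can be seen in a small Monte Carlo sketch. The setup is an assumption made for illustration: unit true noise power, $N = 5000$ real Gaussian samples per trial, a threshold calibrated empirically for a 10% false alarm rate with known noise power, and the same threshold rescaled by $\alpha$ when only the estimated noise power is available. The helper `draw_uncertainty_factor` is hypothetical.

```python
import numpy as np

def draw_uncertainty_factor(B_db, size, rng):
    """Draw alpha with 10*log10(alpha) uniformly distributed in [-B_db, B_db]."""
    return 10 ** (rng.uniform(-B_db, B_db, size) / 10)

rng = np.random.default_rng(0)
N, trials, target_pfa, B_db = 5000, 2000, 0.1, 1.0

# Average energy of N unit-variance Gaussian noise samples, one value per trial (H0 only).
energy = rng.chisquare(N, trials) / N

gamma0 = np.quantile(energy, 1 - target_pfa)     # threshold for a known noise power of 1
alpha = draw_uncertainty_factor(B_db, trials, rng)

pfa_known = float(np.mean(energy > gamma0))
pfa_uncertain = float(np.mean(energy > alpha * gamma0))   # threshold scaled by the estimate
print(pfa_known, pfa_uncertain)
```

Even a 1 dB bound moves the false alarm rate far from its 10% target in this setting, because the spread of the energy statistic at $N = 5000$ is much smaller than the spread of the threshold.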
To resolve the noise uncertainty problem, we need to estimate the noise power in real time. For the multi-antenna case, if we know that the number of active primary signals, $K$, is smaller than $M$, the minimum eigenvalue of the sample covariance matrix can be a reasonable estimate of the noise power. If we further assume that the difference $M - K$ is known, the average of the $M - K$ smallest eigenvalues can be used as a better estimate of the noise power. Accordingly, instead of comparing the test statistics with an assumed noise power, we can compare them with the noise power estimated from the sample covariance matrix. For example, we can compare $T_{\mathrm{BCED}}$ and $T_{\mathrm{ED}}$ with the minimum eigenvalue of the sample covariance matrix, resulting in the maximum to minimum eigenvalue (MME) detection and energy to minimum eigenvalue (EME) detection, respectively [21,22]. These methods can also be used for the single-antenna case if signal samples are time-correlated [22].
Figures 1 and 2 show the Receiver Operating Characteristics (ROC) curves ($P_d$ versus $P_{fa}$) at SNR = −15 dB, $N = 5000$, $M = 4$, and $K = 1$. In Figure 1, the source signal is i.i.d. and the flat-fading channel is assumed, while in Figure 2, the source signal is the wireless microphone signal [61,65] and the multipath fading channel (with eight independent taps of equal power) is assumed. For Figure 2, in order to exploit the correlation of signal samples in both space and time, the received signal samples are stacked as in (27). In both figures, "ED-x dB" means the energy detection with x-dB noise uncertainty. Note that both BCED and ED use the true noise power to set the test threshold, while MME and EME use only the noise power estimated as the minimum eigenvalue of the sample covariance matrix. It is observed that for both the i.i.d. source (Figure 1) and the correlated source (Figure 2), BCED performs better than ED, and MME performs better than EME.
Comparing Figures 1 and 2, we see that BCED and MME work better for correlated source signals, while the reverse is true for ED and EME. It is also observed that the performance of ED degrades dramatically when there is noise power uncertainty.
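The idea of replacing the assumed noise power by the minimum eigenvalue can be sketched for the simplest case of $M = 2$ antennas, where the eigenvalues of the sample covariance matrix have a closed form. The sketch below is only an illustration under assumptions that are not in the original: real-valued samples, unit-power white Gaussian noise, a single Gaussian source received with equal gain on both antennas, and an EME statistic formed as the per-antenna average energy divided by the minimum eigenvalue. All function names are hypothetical.

```python
import math
import random


def sample_covariance_2x2(x1, x2):
    """Sample covariance entries (a, b, c) of two real, zero-mean sequences."""
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    c = sum(v * v for v in x2) / n
    return a, b, c


def eigenvalues_2x2(a, b, c):
    """Closed-form eigenvalues (max, min) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean + radius, mean - radius


def mme_and_eme(x1, x2):
    """Return (MME ratio, EME ratio, noise power estimate) for M = 2 antennas."""
    a, b, c = sample_covariance_2x2(x1, x2)
    lam_max, lam_min = eigenvalues_2x2(a, b, c)
    average_energy = 0.5 * (a + c)      # per-antenna average received power
    return lam_max / lam_min, average_energy / lam_min, lam_min


def simulate(n, signal_amplitude, rng):
    """Unit-power white noise on both antennas plus a common Gaussian source."""
    x1, x2 = [], []
    for _ in range(n):
        s = signal_amplitude * rng.gauss(0.0, 1.0)
        x1.append(s + rng.gauss(0.0, 1.0))
        x2.append(s + rng.gauss(0.0, 1.0))
    return x1, x2


rng = random.Random(1)
mme_h0, eme_h0, noise_h0 = mme_and_eme(*simulate(4000, 0.0, rng))
mme_h1, eme_h1, noise_h1 = mme_and_eme(*simulate(4000, 1.0, rng))
print(f"noise only : MME={mme_h0:.3f} EME={eme_h0:.3f} noise estimate={noise_h0:.3f}")
print(f"with signal: MME={mme_h1:.3f} EME={eme_h1:.3f} noise estimate={noise_h1:.3f}")
```

In both runs the minimum eigenvalue stays close to the true noise power, because a single source occupies only one eigen-direction ($K < M$); this is what allows the ratios to be compared with a threshold that does not depend on the noise power.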

Detection Threshold and Test Statistic Distribution
To decide whether a signal is present, we need to set a threshold $\gamma$ for each proposed test statistic, such that a certain $P_d$ and/or $P_{fa}$ can be achieved. For a fixed sample size $N$, we cannot set the threshold to meet targets for arbitrarily high $P_d$ and low $P_{fa}$ at the same time, as they conflict with each other. Since we have little or no prior information on the signal (indeed, we do not even know whether there is a signal or not), it is difficult to set the threshold based on $P_d$. Hence, a common practice is to choose the threshold based on $P_{fa}$ under hypothesis $H_0$. Without loss of generality, the test threshold can be decomposed into the following form: $\gamma = \gamma_1 T_0(x)$, where $\gamma_1$ is related to the sample size $N$ and the target $P_{fa}$, and $T_0(x)$ is a statistic related to the noise distribution under $H_0$. For example, for the energy detection with known noise power, we have $T_0(x) = \sigma_\eta^2$. For the matched-filtering detection with known noise power, we have $T_0(x) = \sigma_\eta$. For the EME/MME detection with no knowledge of the noise power, we have $T_0(x) = \lambda_{\min}(N)$, where $\lambda_{\min}(N)$ is the minimum eigenvalue of the sample covariance matrix. For the CAV detection, $T_0(x)$ can likewise be set from the sample covariance matrix. In practice, the parameter $\gamma_1$ can be set either empirically, based on observations over a period of time when the signal is known to be absent, or analytically, based on the distribution of the test statistic under $H_0$. In general, such distributions are difficult to find, while some known results are given as follows.
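The empirical route to $\gamma_1$ can be sketched as follows: collect the test statistic while the signal is known to be absent, and take the $(1 - P_{fa})$ quantile of the normalized values. The snippet is a minimal illustration, assuming real-valued white Gaussian noise with known variance and an energy statistic defined as the average sample energy; these choices and all names are assumptions made for the example.

```python
import random


def energy_statistic(samples):
    """Average energy of the received samples (assumed form of the ED statistic)."""
    return sum(v * v for v in samples) / len(samples)


def noise_only_statistics(trials, n, sigma, rng):
    """Energy statistics collected while the signal is known to be absent."""
    return [energy_statistic([rng.gauss(0.0, sigma) for _ in range(n)])
            for _ in range(trials)]


def empirical_gamma1(h0_statistics, t0, target_pfa):
    """Pick gamma_1 so that a fraction target_pfa of H0 statistics exceeds gamma_1 * t0."""
    ordered = sorted(h0_statistics)
    index = min(len(ordered) - 1, int((1.0 - target_pfa) * len(ordered)))
    return ordered[index] / t0


rng = random.Random(2)
SIGMA = 1.0                       # assumed known noise standard deviation
N, TARGET_PFA = 200, 0.1
calibration = noise_only_statistics(2000, N, SIGMA, rng)
gamma1 = empirical_gamma1(calibration, SIGMA ** 2, TARGET_PFA)

# Check the threshold on fresh noise-only data.
fresh = noise_only_statistics(2000, N, SIGMA, rng)
false_alarm_rate = sum(t > gamma1 * SIGMA ** 2 for t in fresh) / len(fresh)
print(f"gamma_1 = {gamma1:.4f}, measured P_fa = {false_alarm_rate:.3f}")
```

The same procedure applies to any test statistic for which noise-only observations are available; only `energy_statistic` and the normalizer $T_0(x)$ would change.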
For energy detection defined in (8), it can be shown that, for a sufficiently large $N$, the test statistic under $H_0$ is well approximated by a Gaussian distribution. Accordingly, for given $P_{fa}$ and $N$, the corresponding $\gamma_1$ can be found in closed form from the inverse of the Gaussian tail function. For the matched-filtering detection defined in (9), a large-$N$ approximation of the test statistic is likewise available, from which $\gamma_1$ follows for given $P_{fa}$ and $N$. For the GLRT-based detection, it can be shown that the asymptotic (as $N \to \infty$) log-likelihood ratio is central chi-square distributed [13]. More precisely, twice the log-likelihood ratio follows a chi-square distribution with $r$ degrees of freedom under $H_0$, where $r$ is the number of independent scalar unknowns under $H_0$ and $H_1$. For instance, if $\sigma_\eta^2$ is known while $R_s$ is not, $r$ will be equal to the number of independent real-valued scalar variables in $R_s$. However, there is no explicit expression for $\gamma_1$ in this case.
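For the energy-detection case, the step from the Gaussian approximation to $\gamma_1$ can be illustrated with a short sketch. It assumes a single antenna, real-valued i.i.d. Gaussian noise, and a statistic normalized as $\frac{1}{N}\sum_n x^2(n)$, so that under $H_0$ the statistic is approximately $\mathcal{N}(\sigma_\eta^2, 2\sigma_\eta^4/N)$ and $\gamma_1 = 1 + \sqrt{2/N}\,Q^{-1}(P_{fa})$. The normalization used in (8) may differ, in which case the constants change accordingly.

```python
import math
from statistics import NormalDist


def q_inverse(p):
    """Inverse of the Gaussian tail function Q(t) = P(Z > t) for standard normal Z."""
    return NormalDist().inv_cdf(1.0 - p)


def ed_gamma1(n, target_pfa):
    """gamma_1 for the assumed statistic (1/N) * sum x(n)^2 with real Gaussian noise.

    Under H0 the statistic is approximately N(sigma^2, 2 * sigma^4 / N), so the
    threshold gamma_1 * sigma^2 gives the target P_fa when
    gamma_1 = 1 + sqrt(2 / N) * Q^{-1}(P_fa).
    """
    return 1.0 + math.sqrt(2.0 / n) * q_inverse(target_pfa)


def ed_pfa(n, gamma1):
    """Gaussian-approximation P_fa achieved by a given gamma_1 (inverse of the above)."""
    return 1.0 - NormalDist().cdf((gamma1 - 1.0) / math.sqrt(2.0 / n))


gamma1 = ed_gamma1(5000, 0.1)
print(f"gamma_1 = {gamma1:.5f}, implied P_fa = {ed_pfa(5000, gamma1):.3f}")
```

Under these assumptions the threshold at $N = 5000$ sits less than 3% above the noise power, whereas a 1 dB error in the assumed noise power shifts the threshold by roughly 26%. This gives a sense of why energy detection degrades so sharply under noise uncertainty.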
Random matrix theory (RMT) is useful for determining the test statistic distribution and the parameter $\gamma_1$ for the class of eigenvalue-based detection methods. In the following, we provide an example for the BCED detection method with known noise power, that is, $T_0(x) = \sigma_\eta^2$. For this method, we actually compare the ratio of the maximum eigenvalue of the sample covariance matrix $R_x(N)$ to the noise power $\sigma_\eta^2$ with a threshold $\gamma_1$. To set the value for $\gamma_1$, we need to know the distribution of $\lambda_{\max}(N)/\sigma_\eta^2$ for any finite $N$. With a finite $N$, $R_x(N)$ may be very different from the actual covariance matrix $R_x$ due to the noise. In fact, characterizing the eigenvalue distributions for $R_x(N)$ is a very complicated problem [66–69], which also makes the choice of $\gamma_1$ difficult in general.
When there is no signal, $R_x(N)$ reduces to $R_\eta(N)$, which is the sample covariance matrix of the noise only. It is known that $R_\eta(N)$ is a Wishart random matrix [66]. The study of the eigenvalue distributions for random matrices has been a very active research topic in recent years in mathematics, communications engineering, and physics. The joint PDF of the ordered eigenvalues of a Wishart random matrix has been known for many years [66]. However, since the expression of the joint PDF is very complicated, no simple closed-form expressions have been found for the marginal PDFs of the ordered eigenvalues, although some computable expressions have been found in [70]. Recently, Johnstone and Johansson have found the distribution of the largest eigenvalue of a Wishart random matrix [67,68], as described in the following theorem.
The Tracy-Widom distribution provides the limiting law for the largest eigenvalue of certain random matrices [71,72]. Let $F_1$ be the cumulative distribution function (CDF) of the Tracy-Widom distribution of order 1. We have
$$F_1(t) = \exp\left(-\frac{1}{2}\int_t^{\infty}\left(q(u) + (u - t)\,q^2(u)\right)du\right),$$
where $q(u)$ is the solution of the nonlinear Painlevé II differential equation given by
$$q''(u) = u\,q(u) + 2q^3(u).$$
Accordingly, numerical solutions can be found for the function $F_1(t)$ at different values of $t$. Tables of values of $F_1(t)$ [67] and Matlab codes to compute them [73] are also available. Based on the above results, the probability of false alarm for the BCED detection can be expressed through $F_1$, and this relation can be inverted for the threshold. From the definitions of $\mu$ and $\nu$ in Theorem 1, we finally obtain a closed-form value for $\gamma_1$ in terms of $F_1^{-1}(1 - P_{fa})$. Note that $\gamma_1$ depends only on $N$ and $P_{fa}$. A similar approach can be used for the case of MME detection, as shown in [21,22]. Figure 3 shows the expected (theoretical) and actual (by simulation) probability of false alarm values based on the theoretical threshold in (55) for $N = 5000$, $M = 8$, and $K = 1$. It is observed that the differences between these two sets of values are reasonably small, suggesting that the choice of the theoretical threshold is quite accurate.
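A rough sketch of this threshold computation is given below. It rests on several assumptions that are not spelled out in the text above: the noise is real-valued, Theorem 1 uses Johnstone's centering $\mu = (\sqrt{N-1} + \sqrt{M})^2$ and scaling $\nu = (\sqrt{N-1} + \sqrt{M})(1/\sqrt{N-1} + 1/\sqrt{M})^{1/3}$ for the largest eigenvalue of $(N/\sigma_\eta^2) R_\eta(N)$, and the Tracy-Widom quantiles are taken from commonly tabulated values rather than computed. Under these assumptions $\gamma_1 \approx (\mu + \nu F_1^{-1}(1 - P_{fa}))/N$.

```python
import math

# Commonly tabulated quantiles t with F_1(t) = p for the Tracy-Widom law of order 1.
# They are quoted here as an assumption; recompute them numerically if precision matters.
TW1_QUANTILES = {0.90: 0.4501, 0.95: 0.9793, 0.99: 2.0234}


def johnstone_constants(n, m):
    """Centering mu and scaling nu for the largest eigenvalue of a real Wishart matrix."""
    root_n, root_m = math.sqrt(n - 1.0), math.sqrt(m)
    mu = (root_n + root_m) ** 2
    nu = (root_n + root_m) * (1.0 / root_n + 1.0 / root_m) ** (1.0 / 3.0)
    return mu, nu


def bced_gamma1(n, m, target_pfa):
    """Threshold factor gamma_1 with P(lambda_max / sigma^2 > gamma_1) ~ target_pfa."""
    quantile = TW1_QUANTILES[round(1.0 - target_pfa, 2)]
    mu, nu = johnstone_constants(n, m)
    return (mu + nu * quantile) / n


for pfa in (0.10, 0.05, 0.01):
    print(f"P_fa = {pfa:.2f}: gamma_1 = {bced_gamma1(5000, 8, pfa):.4f}")
```

The lookup table covers only three false-alarm targets; any other target requires evaluating $F_1^{-1}$ numerically, for example by integrating the Painlevé II equation.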

Robust Spectrum Sensing
In many detection applications, the knowledge of signal and/or noise is limited, incomplete, or imprecise. This is especially true in cognitive radio systems, where the primary users usually do not cooperate with the secondary users and, as a result, the wireless propagation channels between the primary and secondary users are hard to predict or estimate. Moreover, intentional or unintentional interference is very common in wireless communications, so that the resulting noise distribution becomes unpredictable. Suppose that a detector is designed for specific signal and noise distributions. A pertinent question is then as follows: how sensitive is the performance of the detector to errors in the signal and/or noise distributions? In many situations, a detector designed on the nominal assumptions may suffer a drastic degradation in performance even with small deviations from the assumptions. Consequently, the search for robust detection methods has been of great interest in the field of signal processing and many others [74–77]. A very useful paradigm for designing robust detectors is the maxmin approach, which maximizes the worst-case detection performance. Among others, two techniques are very useful for robust cognitive radio spectrum sensing: robust hypothesis testing [75] and robust matched filtering [76,77]. In the following, we give a brief overview of them; for other robust detection techniques, interested readers may refer to the excellent survey paper [78] and references therein.

Robust Hypothesis Testing.
Let the PDF of a received signal sample be $f_1$ under hypothesis $H_1$ and $f_0$ under hypothesis $H_0$. If we know these two functions, the LRT-based detection described in Section 2 is optimal. However, in practice, due to channel impairment, noise uncertainty, and interference, it is very hard, if not impossible, to obtain these two functions exactly. One possible situation is when we only know that $f_1$ and $f_0$ belong to certain classes. One such class is called the $\epsilon$-contamination class, given by
$$\mathcal{F}_j = \left\{ f_j \mid f_j = (1 - \epsilon_j) f_j^0 + \epsilon_j g_j \right\}, \quad j = 0, 1,$$
where $f_j^0$ ($j = 0, 1$) is the nominal PDF under hypothesis $H_j$, $\epsilon_j \in [0, 1]$ is the maximum degree of contamination, and $g_j$ is an arbitrary density function. Assume that we only know $f_j^0$ and $\epsilon_j$ (an upper bound for contamination), $j = 0, 1$. The problem is then to design a detector $\Psi$ that minimizes the worst-case probability of error (e.g., probability of false alarm plus probability of misdetection) over these classes. Huber [75] proved that the optimal test statistic is a "censored" version of the LRT, in which the nominal likelihood ratio of each sample is replaced by
$$r(t) = \begin{cases} c_1, & c_1 \le f_1^0(t)/f_0^0(t), \\ f_1^0(t)/f_0^0(t), & c_0 < f_1^0(t)/f_0^0(t) < c_1, \\ c_0, & f_1^0(t)/f_0^0(t) \le c_0, \end{cases}$$
and $c_0$, $c_1$ are nonnegative numbers related to $\epsilon_0$, $\epsilon_1$, $f_0^0$, and $f_1^0$ [75,78]. Note that if we choose $c_0 = 0$ and $c_1 = +\infty$, the test is the conventional LRT with respect to the nominal PDFs $f_0^0$ and $f_1^0$.
After this seminal work, there has been considerable research in this area [78]. For example, similar minimax solutions have been found for some other uncertainty models [78].
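The effect of censoring can be illustrated with a small sketch that clips the per-sample nominal likelihood ratio to $[c_0, c_1]$ before accumulating it in the log domain. The nominal densities (a Gaussian mean shift), the clipping levels, and the sample values are hypothetical choices made only for illustration; in Huber's result, $c_0$ and $c_1$ are determined by the contamination levels and the nominal PDFs rather than chosen freely.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def censored_log_lrt(samples, f0, f1, c0, c1):
    """Sum of log likelihood ratios after clipping each ratio to [c0, c1]."""
    total = 0.0
    for x in samples:
        ratio = f1(x) / f0(x)
        total += math.log(min(c1, max(c0, ratio)))
    return total


# Hypothetical nominal densities: noise only versus a small positive mean shift.
def f0(x):
    return gaussian_pdf(x, 0.0, 1.0)


def f1(x):
    return gaussian_pdf(x, 0.5, 1.0)


clean = [0.1, -0.3, 0.2, 0.0, -0.1]
contaminated = clean + [12.0]      # one gross outlier, e.g. an interference spike

plain = censored_log_lrt(contaminated, f0, f1, 0.0, math.inf)
robust = censored_log_lrt(contaminated, f0, f1, 0.5, 2.0)
print(f"plain log-LRT = {plain:.3f}, censored log-LRT = {robust:.3f}")
```

With $c_0 = 0$ and $c_1 = +\infty$ the function reduces to the conventional log-LRT, where the single outlier dominates the statistic; with finite clipping levels its contribution is bounded by $\ln c_1$.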

Robust Matched Filtering.
We turn the model (4) into a vector form as $x = s + \eta$, where $s$ is the signal vector and $\eta$ is the noise vector. Suppose that $s$ is known. In general, a matched-filtering detection is $T_{\mathrm{MF}} = g^T x$. Let the covariance matrix of the noise be $R_\eta = E(\eta\eta^T)$. If $R_\eta = \sigma_\eta^2 I$, it is known that choosing $g = s$ is optimal. In general, it is easy to verify that the optimal $g$ to maximize the SNR is $g = R_\eta^{-1} s$. In practice, the signal vector $s$ may not be known exactly. For example, $s$ may be known only to lie around a nominal vector $s_0$, with some error whose Euclidean norm is bounded in terms of $\Delta$. In this case, we are interested in finding a proper value for $g$ such that the worst-case SNR over all admissible $s$ is maximized. It was proved in [76,77] that the optimal solution for this maxmin problem is $g = (R_\eta + \delta I)^{-1} s_0$, where $\delta$ is a nonnegative number such that $\delta^2 \|g\|^2 = \Delta$. There is also research on robust matched-filtering detection when the signal has other types of uncertainty [78]. Moreover, if the noise has uncertainties, that is, $R_\eta$ is not known exactly, or both noise and signal have uncertainties, the optimal robust matched-filtering detector has also been found for some specific uncertainty models in [78].
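The robust filter can be sketched numerically as a diagonally loaded matched filter, $g = (R_\eta + \delta I)^{-1} s_0$, with $\delta$ found by bisection from $\delta^2 \|g\|^2 = \Delta$. The sketch makes two simplifying assumptions that are not in the text: the noise covariance is diagonal (so no matrix solver is needed), and the signal error class is $\|s - s_0\|^2 \le \Delta$, which is the convention under which the stated condition on $\delta$ is consistent. The numerical values are hypothetical.

```python
import math


def loaded_filter(s0, noise_var, delta):
    """g = (R + delta * I)^{-1} s0 for a diagonal noise covariance R."""
    return [s / (r + delta) for s, r in zip(s0, noise_var)]


def solve_delta(s0, noise_var, big_delta, iterations=200):
    """Bisection for delta >= 0 with delta^2 * ||g(delta)||^2 = big_delta."""
    def residual(delta):
        g = loaded_filter(s0, noise_var, delta)
        return delta * delta * sum(v * v for v in g) - big_delta

    if big_delta >= sum(v * v for v in s0):
        raise ValueError("uncertainty is too large: s = 0 is admissible")
    low, high = 0.0, 1.0
    while residual(high) < 0.0:        # grow the bracket until the sign changes
        high *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if residual(mid) < 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def worst_case_snr(g, s0, noise_var, big_delta):
    """Smallest SNR over all s with ||s - s0||^2 <= big_delta (assumed error class)."""
    norm_g = math.sqrt(sum(v * v for v in g))
    signal = abs(sum(a * b for a, b in zip(g, s0))) - math.sqrt(big_delta) * norm_g
    noise = sum(v * v * r for v, r in zip(g, noise_var))
    return max(signal, 0.0) ** 2 / noise


S0 = [1.0, 2.0]                 # hypothetical nominal signal vector
NOISE_VAR = [1.0, 4.0]          # diagonal of the assumed noise covariance
BIG_DELTA = 0.25                # assumed bound on the squared error norm

delta = solve_delta(S0, NOISE_VAR, BIG_DELTA)
g_robust = loaded_filter(S0, NOISE_VAR, delta)
g_nominal = loaded_filter(S0, NOISE_VAR, 0.0)
snr_nominal = worst_case_snr(g_nominal, S0, NOISE_VAR, BIG_DELTA)
snr_robust = worst_case_snr(g_robust, S0, NOISE_VAR, BIG_DELTA)
print(f"delta = {delta:.4f}")
print(f"worst-case SNR: nominal = {snr_nominal:.4f}, robust = {snr_robust:.4f}")
```

Setting `delta` to zero recovers the nominal filter $R_\eta^{-1} s_0$, so the comparison shows what the loading term buys in the worst case.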

Practical Considerations and Future Developments
Although there have been quite a few methods proposed for spectrum sensing, their realization and performance in practical cognitive radio applications need to be tested [50][51][52]. To build a practical sensing device, many factors should be considered. Some of them are discussed as follows.
(1) Narrowband noise. One or more narrowband filters may be used to extract the signal from a specific band. These filters can be analog or digital. The discrete noise samples can be i.i.d. only if the filter is ideally designed and the signal is critically sampled (the sampling rate equals the bandwidth of the filter). In a practical device, however, the noise samples are usually correlated. This makes many sensing methods unworkable, because they usually assume that the noise samples are i.i.d. For some methods, a noise prewhitening process can be used to make the noise samples i.i.d. prior to the signal detection. For example, this method has been deployed in [22] to enable the eigenvalue-based detection methods. A similar method can be used for covariance-based detection methods, for example, the CAV.
(2) Spurious signal and interference. The received signal may contain not only the desired signal and white noise but also some spurious signal and interference. The spurious signal may be generated by analog-to-digital converters (ADCs) due to their nonlinearity [79] or by other intentional/unintentional transmitters. If the sensing antenna is near some electronic devices, the spurious signal generated by the devices can be strong in the received signal. For some sensing methods, such unwanted signals will be detected as signals rather than noise, which increases the probability of false alarm. There are methods to mitigate the spurious signal at the device level [79]. Alternatively, signal processing techniques can be used to eliminate the impact of spurious signal/interference [53]. It is very difficult, if not impossible, to estimate the interference waveform or distribution because of its variation with time and location. Depending on the situation, the interference power could be lower or higher than the noise power. If the interference power is much higher than the noise power, it is possible to estimate the interference first and subtract it from the received signal. However, since we usually intend to detect signals at very low SNR, the error of the interference estimation could be large enough (say, larger than the primary signal) that detection with the residual signal after the interference subtraction is still unreliable. If the interference power is low, it is hard to estimate it anyway. Hence, in general we cannot rely on interference estimation and subtraction, especially for very low-power signal detection.
(3) Fixed point realization. Many hardware realizations use fixed point rather than floating point computation. This will limit the accuracy of detection methods due to the signal truncation when it is saturated. A detection method should be robust to such unpredictable errors.
(4) Wideband sensing. A cognitive radio device may need to monitor a very large contiguous or noncontiguous frequency range to find the best available band(s) for transmission. The aggregate bandwidth could be as large as several GHz. Such wideband sensing requires an ultra-wideband RF front end and very fast signal processing devices. To sense a very large frequency range, a correspondingly large sampling rate is typically required, which is very challenging for practical implementation. Fortunately, if a large part of the frequency range is vacant, that is, the signal is sparse in the frequency domain, we can use the recently developed compressed sampling (also called compressed sensing) to reduce the sampling rate by a large margin [80–82]. Although there have been studies on wideband sensing algorithms [26,83–87], more research is needed, especially when the center frequencies and bandwidths of the primary signals are unknown within the frequency range of interest.
(5) Complexity. This is of course one of the major factors affecting the implementation of a sensing method. Simple but effective methods are always preferable.
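The fixed-point issue raised in item (3) can be made concrete with a toy experiment that quantizes unit-power Gaussian noise to a signed fixed-point format and compares the measured energy with the floating-point value. The two 8-bit formats and the helper names are hypothetical; the sketch only shows how saturation biases an energy-based statistic and does not model any particular hardware.

```python
import random


def to_fixed_point(value, fractional_bits, total_bits):
    """Quantize to a signed fixed-point grid and saturate at the representable range."""
    scale = 1 << fractional_bits
    max_code = (1 << (total_bits - 1)) - 1
    min_code = -(1 << (total_bits - 1))
    code = round(value * scale)
    return max(min_code, min(max_code, code)) / scale


def average_energy(samples):
    """Mean squared value of the samples."""
    return sum(v * v for v in samples) / len(samples)


rng = random.Random(3)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Hypothetical 8-bit formats: the first leaves headroom, the second clips at about +/-1.
roomy = [to_fixed_point(v, fractional_bits=4, total_bits=8) for v in noise]
tight = [to_fixed_point(v, fractional_bits=7, total_bits=8) for v in noise]

print(f"floating point energy : {average_energy(noise):.3f}")
print(f"8-bit, 4 fractional   : {average_energy(roomy):.3f}")
print(f"8-bit, 7 fractional   : {average_energy(tight):.3f}")
```

With enough headroom the quantization error is negligible, while the saturating format underestimates the energy substantially, which would shift any threshold calibrated in floating point.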
Detecting a desired signal at very low SNR and in a harsh environment is by no means a simple task. In this paper, major attention is paid to the statistical detection methods. The major advantage of such methods is their low dependency on signal/channel knowledge as well as their relative ease of realization. However, their disadvantage is also obvious: they are in general vulnerable to undesired interference. How the statistical detection can be effectively combined with known signal features is not yet well understood. This might be a promising research direction. Furthermore, most existing spectrum sensing methods are passive in the sense that they neglect the interactions between the primary and secondary networks via their mutual interference. If the reaction of the primary user (e.g., power control) upon receiving the secondary interference is exploited, some active spectrum sensing methods can be designed, which could significantly outperform the conventional passive sensing methods [88,89]. Finally, detecting the presence of a signal is only the basic task of sensing. For a radio with a high level of cognition, further information such as signal waveform and modulation schemes may be exploited. Therefore, signal identification becomes an advanced task of sensing. If we could find an effective method for this advanced task, it could in turn help the basic sensing task.

Conclusion
In this paper, various spectrum sensing techniques have been reviewed. Special attention has been paid to blind sensing methods that do not need information on the source signals and the propagation channels. It has been shown that space-time joint signal processing not only improves the sensing performance but also solves the noise uncertainty problem to some extent. Theoretical analysis on test statistic distribution and threshold setting has also been investigated.