EURASIP Journal on Applied Signal Processing 2005:3, 439–454 © 2005 Hindawi Publishing Corporation

Timing Acquisition with Noisy Template for Ultra-Wideband Communications in Dense Multipath

Timing acquisition is critical to enabling the potential of ultra-wideband (UWB) radios in high-speed, short-range indoor wireless networking. An effective timing acquisition method should not only operate at a low sampling rate to reduce implementation complexity and synchronization time, but also be able to collect sufficient signal energy in order to operate in a reasonable transmit SNR regime. Energy capture for time-hopping impulse-radio transmissions in dense multipath is particularly challenging during the synchronization phase, in the absence of reliable channel and timing information. In this paper, we develop an efficient sampling strategy for correlation-based receivers to accomplish adequate energy capture at a low cost, using a noisy correlation template constructed directly from the received waveform. Merging our sampling operation based on noisy template with low-complexity timing acquisition schemes, we derive enhanced cyclostationarity-based blind synchronizers, as well as data-aided maximum likelihood timing offset estimators, all operating at a low frame rate. Both analysis and simulations confirm evident improvement in timing performance when using our noisy template, which makes our low-complexity timing acquisition algorithms attractive for practical UWB systems operating in dense multipath.


INTRODUCTION
Ultra-wideband (UWB) communications have raised increasing interest in commercial use, with the release of the UWB spectral masks by the US Federal Communications Commission (FCC) in 2002 [1]. Bearing information over repeated ultrashort (nanosecond scale) pulses with low transmission power density, UWB impulse radios come with many beneficial features [2,3]: the ability to exploit the pronounced diversity gain inherent in highly frequency-selective dense-multipath indoor propagation environments; potentiality for very high data rates and large user capacity offered by the enormous bandwidth; noiselike interference to other systems operating on the same band; and so on.
However, the unique features of UWB signaling also impose a challenging task of timing offset estimation (TOE), whose accuracy and complexity directly affect the synchronization speed and overall system performance. In a carrierless UWB impulse radio [2], every symbol is repeatedly transmitted at a low duty cycle over a large number of frames with one pulse per frame, in order to gather adequate symbol energy while maintaining low power density. Such a transmission structure entails a twofold TOE task [4]: one is the coarse frame-level acquisition to identify when the first frame in each symbol starts, and the other is the fine pulse-level tracking to find where a pulse is located within a frame.
UWB radios operating in diverse propagation environments encounter different timing tolerances to acquisition and tracking errors. In traditional radar and ranging applications, UWB signals typically propagate through direct-path or sparse-multipath channels. Tracking accuracy is critical under such scenarios, in order to capture those ultrashort pulses arriving at the receiver with a low duty cycle. It has been shown that even a slight timing jitter (tracking error) of a tenth of the pulse duration can decrease the system throughput to zero [5]. Conventional synchronization based on peak-picking the correlation output between the received signal and the transmit waveform template not only has prohibitive complexity due to the exhaustive search over thousands of bins, but is also subject to mistiming caused by spurious multipath returns. Recent efforts in the quest for fast tracking include a coarse bin search for direct-path AWGN channels [6]; a bit reversal search for the noiseless case [7]; a special beacon code that increases the search intervals in conjunction with a bank of correlators that operate in the absence of multipath [8]; and a subspace-based spectral estimator for synchronization in outdoor AWGN or sparse-multipath channels [9].
Emerging interest in commercial applications of UWB radios focuses on dense-multipath indoor wireless communications [10]. In a dense-multipath channel, the received signal no longer has a low duty cycle, as each transmitted pulse is spread out in the channel by a large number of closely spaced paths to occupy almost the entire frame duration. As a result, a RAKE-type receiver is relatively robust to fine-scale tracking errors [4], and frame-level acquisition becomes the more critical timing task that governs the system performance, especially when a frame-dependent time-hopping (TH) code is employed for smoothing the transmit spectrum and for enabling multiple access [3]. Dense-multipath propagation also entails a great challenge in signal energy capture [11], which is exacerbated during the synchronization phase since an optimum maximum ratio combiner (MRC) is no longer applicable due to the lack of channel and timing information. Effective timing acquisition thus faces two primary challenges: (i) to reduce the acquisition time and complexity, which calls for digital synchronizers that operate at a low sampling rate, preferably at the frame rate; and (ii) to provide the desired acquisition accuracy at a reasonable transmission power, which can be achieved only when a "timing-blind" synchronizer is able to capture sufficient symbol energy in the presence of dense multipath.
Towards the first challenge, low-complexity timing acquisition algorithms based on both non-data-aided (NDA) (aka blind) estimation and data-aided (DA) estimation have been developed recently [12,13,14]. In the NDA mode, cyclostationarity (CS) that is naturally present in the frame repetition pattern of UWB impulse radio signaling is exploited to acquire the timing offset using frame-rate samples [12,13]. In the DA mode, timing acquisition based on frame-rate samples is achieved using a generalized likelihood ratio test (GLRT), where the effective channel amplitudes are regarded as nuisance parameters [14]. Both the CS and GLRT methods avoid high pulse-rate sampling, and can therefore markedly reduce the implementation complexity and improve the synchronization speed. On the other hand, since they both rely on conventional sliding correlation with the transmit waveform template to generate digital samples, only a small portion of the transmit power can be captured in the presence of dense multipath. For performance considerations, joint UWB channel and timing estimation based on the ML criterion has been investigated in [15]. Such an approach strives to resolve all the strong paths in the channel in order to capture most of the signal energy, but it requires a very high sampling rate that is even faster than the pulse rate. Recently a DA rapid synchronizer was developed using frame-rate cross-correlation samples of neighboring noisy received waveforms [16], relying on a special pilot symbol pattern. It is worth noting that the second timing challenge of energy capture has not been a concern in narrowband communications, because of the full-duty-cycle transmission structure used therein. The performance of any estimator is directly affected by the effective receive SNR. Energy capture emerges as a unique challenge in UWB timing, since the receive SNR is not simply related to the transmit energy, but is also largely determined by the percentage of energy retained at the receiver after multipath fading. It is essential that both timing challenges are properly treated, in order for a UWB synchronizer to operate effectively in dense-multipath environments.
This paper endeavors to develop accurate and low-complexity timing acquisition under dense-multipath propagation settings. Based on the CS and GLRT frameworks, our goal is to enhance them with adequate energy capture capability in the absence of channel estimation and at the same time retain their salient low-complexity features. To this end, we propose a noisy template approach in which a frame-rate correlator selects a proper segment of the received noisy waveform to serve as its correlation template. Notwithstanding its noise contamination, such a template inherently records the pulse distortion experienced during propagation; therefore the correlation output samples are able to collect the signal energy spread out over the entire unknown channel. Building upon this distinct correlation structure with noisy template, we then derive the resultant noisy-CS blind algorithms and noisy-GLRT DA algorithms for accurate timing acquisition. Our new timing recovery algorithms exemplify the merging of low-complexity digital acquisition strategies with effective energy capture sampling that generates high-quality discrete-time samples. Together, low-complexity, high-performance synchronizers emerge, which are suitable for practical UWB applications operating in dense multipath.
The rest of the paper is organized as follows. A general UWB signal model with mistiming is described in Section 2. The key noisy template concept and its implementation in both the NDA and DA modes are elaborated in Section 3. Section 4 develops novel noisy-CS NDA timing algorithms based on results of conventional-CS acquisition, followed by analytic evaluation of our noisy template in the context of timing recovery capability. Section 5 describes merging of the noisy template with the GLRT rule to accomplish asymptotically optimal frame-rate DA acquisition. Performance evaluation of the proposed NDA and DA schemes is illustrated via simulations in Section 6, followed by a summary in Section 7. Focusing on the energy capture and complexity aspects of UWB synchronizer designs, this paper confines its treatment to peer-to-peer wireless links, leaving the multiple-access scenario for future work.
Notation. Throughout the paper, we denote by $\lfloor x \rfloor$ the integer floor of $x$, and by $[x]_y := x - \lfloor x/y \rfloor y$ the modulo of $x$ with base $y$.

SIGNAL MODEL
In a UWB impulse radio, every symbol is composed of $N_f$ repeated pulses, with one pulse per frame of frame duration $T_f$, so that the symbol duration is $T_s = N_f T_f$. Each pulse $p(t)$ is of ultrashort duration $T_p$ ($T_p \ll T_f$) at the nanosecond scale, occupying an ultrawide bandwidth. Each frame contains $N_c$ chips, each of duration $T_c$. The equivalent symbol signature waveform is

$$p_s(t) := \sum_{j=0}^{N_f-1} p\big(t - jT_f - c(j)T_c\big),$$

where $\{c(j) : c(j)T_c < T_f\}$, written $c_j$ for short, represents the user-specific pseudorandom TH code employed for user separation [3]. The hopping pattern may take on different forms, such as fast hopping in which the TH code changes on a frame-by-frame basis but repeats its pattern for all symbols, or slow hopping in which the TH code hops on a symbol-by-symbol basis but remains invariant for all frames within a symbol. We adopt the widely used fast TH pattern unless otherwise specified. Let $s(k) \in \{\pm 1\}$ be independent and identically distributed (i.i.d.) binary data symbols with energy $E_s$ spread over $N_f$ frames. Focusing on pulse amplitude modulation (PAM), we express the transmitted PAM-UWB waveform as (cf. Figure 1a)

$$u(t) = \sqrt{E_s} \sum_{k} s(k)\, p_s(t - kT_s).$$

The fading channel can be described by a tapped-delay-line (TDL) structure with an impulse response $h(t) := \sum_{l=0}^{L-1} \alpha_l \delta(t - \tau_l)$, where the tap attenuation $\alpha_l$ and tap delay $\tau_l$ can be modeled statistically according to [17]. With $\tau_0 \le \tau_1 \le \cdots \le \tau_{L-1}$, the timing information of interest refers to the first arrival time $\tau_0$, which has to be estimated prior to coherent symbol detection. To isolate $\tau_0$, define $\tau_{l,0} := \tau_l - \tau_0$ as the relative time delay of each tap, where $\tau_{L-1,0} \le T_f - 2T_p$ is the channel delay spread. The aggregate receive pulse per frame is then given by $p_r(t) := \sum_{l=0}^{L-1} \alpha_l p(t - \tau_{l,0})$; see the pulses in dashed lines in Figure 1b. After $u(t)$ propagates through the fading channel, the received signal at the output of the receiver antenna is given by

$$r(t) = \sqrt{E_s} \sum_{k} s(k)\, p_{rs}(t - kT_s - \tau_0) + w(t),$$

where $p_{rs}(t) := \sum_{l=0}^{L-1} \alpha_l p_s(t - \tau_{l,0})$ denotes the aggregate receive waveform of each symbol with duration $T_s$; see the composite symbol waveform in solid lines in Figure 1b.
The term $w(t)$ accounts for both ambient noise and multiple access interference, whose composite effect is approximated as a white Gaussian process with zero mean and power spectral density (PSD) $\sigma_w^2$. Based on different time scales, the timing offset parameter $\tau_0$ can be expressed as $\tau_0 := n_s T_s + n_\epsilon T_f + \epsilon$, where $n_s := \lfloor \tau_0/T_s \rfloor \ge 0$ denotes the symbol-level timing offset, $n_\epsilon := \lfloor (\tau_0 - n_s T_s)/T_f \rfloor \in [0, N_f - 1]$ denotes the frame-level offset, and $\epsilon := [\tau_0]_{T_f} \in [0, T_f)$ denotes the pulse-level offset. These different levels of timing offset are illustrated in Figure 1 via an example with $\tau_0 = (n_s = 0) \cdot T_s + (n_\epsilon = 1) \cdot T_f + (\epsilon = T_f/3)$ and $N_f = 3$. Since a synchronizer operating in a blind mode cannot distinguish two time delays that are separated by multiple symbol intervals, we assume $n_s = 0$ without loss of generality. In a dense-multipath channel, taps are closely spaced such that $\tau_{l+1} - \tau_l < 2T_p$ for all $l$. Under this condition, a frame-rate correlator can almost always catch some pulse return(s) in every frame duration, regardless of the time alignment. Thus system performance is relatively robust to the pulse-level timing offset, but is very sensitive to the frame-level offset [4]. As a result, we focus on rapid UWB timing acquisition in dense multipath, which amounts to estimating $n_\epsilon$ in the NDA mode and $(n_s, n_\epsilon)$ in the DA mode.
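The three-level decomposition of the timing offset can be checked numerically. The following is a minimal sketch, assuming time is measured in arbitrary units and using the hypothetical names `decompose_timing_offset`, `n_eps`, and `eps` for the frame-level and pulse-level offsets; it reproduces the Figure 1 example with $N_f = 3$.

```python
import math

def decompose_timing_offset(tau0, T_f, N_f):
    """Split tau0 into symbol-, frame-, and pulse-level offsets.

    Returns (n_s, n_eps, eps) such that tau0 = n_s*T_s + n_eps*T_f + eps,
    with T_s = N_f*T_f, 0 <= n_eps <= N_f - 1 and 0 <= eps < T_f.
    """
    T_s = N_f * T_f                                  # symbol duration
    n_s = math.floor(tau0 / T_s)                     # symbol-level offset
    n_eps = math.floor((tau0 - n_s * T_s) / T_f)     # frame-level offset
    eps = tau0 - n_s * T_s - n_eps * T_f             # pulse-level offset
    return n_s, n_eps, eps

# Example from the text: N_f = 3 and tau0 = 0*T_s + 1*T_f + T_f/3.
# T_f = 3.0 is an arbitrary choice that keeps the arithmetic exact.
n_s, n_eps, eps = decompose_timing_offset(tau0=4.0, T_f=3.0, N_f=3)
```

In the blind mode only `n_eps` is of interest, since `n_s` cannot be resolved and `eps` has limited impact in dense multipath.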

NOISY TEMPLATES
For digital timing recovery, we adopt a general correlationbased synchronization structure shown in Figure 2. A correlator sampler operates at the frame rate to generate digital samples {x(n)}, followed by all-digital timing acquisition using either NDA or DA methods which we will describe, respectively, in Sections 4 and 5. In a frequency-selective multipath fading channel, the accuracy of digital synchronization relies on not only the algorithm used to recover the timing information, but also the sampling mechanism used to capture the symbol energy scattered in the channel. Properly selecting the correlation template will yield high receive SNR in the sampled data, which will be critical to attaining accurate timing acquisition. Unfortunately, the conventional template p s (t) cannot effectively capture the received signal energy in the presence of dense multipath, whereas the optimal correlation template relies on the channel and timing information that is not available to the receiver during the acquisition phase. In this section, we propose a simple and practical noisy template that is a noise-contaminated version of, but asymptotically close to, the unknown optimal template.

Ideal correlation template under mistiming
To understand the noisy template concept, we first inspect the ideal symbol-by-symbol correlation template that maximizes the receive SNR per symbol under mistiming. Let $v(t)$ represent any unit-energy correlation template that is defined within a symbol period $t \in [0, T_s]$, that is, $\int_0^{T_s} v^2(t)\,dt = 1$. It is evident from the matched filtering theory that the optimal template $v_o(t)$ is proportional to $p_{rs}([t - \tau_0]_{T_s})$ for $t \in [0, T_s)$, which is in essence the aggregate receive waveform $p_{rs}(t)$ after being circularly shifted by $\tau_0$. Apparently, $v_o(t)$ entails not only the channel information embedded in $p_{rs}(t)$ but also the timing offset $\tau_0$, both of which are unknown during the synchronization phase. An effective correlation template should match $v_o(t)$ to a large extent, subject to constraints imposed by practical feasibility.

Construction of symbol-rate noisy template
Our approach of approximating the unknown $v_o(t)$ is rooted in the fact that the received signal $r(t)$ inherently contains the ideal template $v_o(t)$ subject to noise contamination. To illustrate this, we partition $r(t)$ into consecutive $T_s$-long segments with reference to the receiver's clock; see the dashed vertical lines in Figure 1b at $t = kT_s$, for all $k$. For a symbol-by-symbol detector, the $k$th segment of $r(t)$ yields the decision statistic for the $k$th symbol $s(k)$. When $\tau_0 = 0$, the segment boundaries coincide with the received symbol boundaries, and the noise-free version of each segment is exactly the symbol waveform $p_{rs}(t)$ scaled by the corresponding symbol sign and amplitude. Under mistiming $\tau_0 \neq 0$, in contrast, each of the $T_s$-long observation windows encompasses two consecutive symbols $s(k)$ and $s(k-1)$, as illustrated in Figure 1b. The symbol waveform $p_{rs}(t)$ cannot be directly identified from the received signal segments because of the unknown $\tau_0$; see the vertical solid lines in Figure 1b representing the actual received symbol boundaries. Interestingly, when two consecutive symbols have the same sign, that is, $s(k) = s(k-1)$, the corresponding received waveform segment reflects the channel effect, but not the symbol transition effect. Therefore, the noise-free version of this waveform segment is exactly the ideal template $v_o(t)$ under mistiming; see the $T_s$-long ideal template waveform in the shaded area in Figure 1c, which matches the unknown channel in Figure 1b. This observation suggests that a noisy version of the ideal template can be acquired from some properly selected received signal segments in which the two contributing symbols have the same sign.
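The observation above can be illustrated in discrete time. The sketch below assumes a toy sampled symbol waveform `p_rs`, an integer sample delay `tau0`, unit symbol amplitude, and no noise; all names are hypothetical. It shows that a receiver-clock window spanning two same-sign symbols equals the circularly shifted symbol waveform, whereas a window spanning opposite-sign symbols does not.

```python
def received_noise_free(symbols, p_rs, tau0):
    """Noise-free r[t] = sum_k s(k) * p_rs[t - k*Ns - tau0], in samples."""
    Ns = len(p_rs)                          # samples per symbol (T_s)
    r = [0.0] * (Ns * (len(symbols) + 1))   # room for the delayed last symbol
    for k, s in enumerate(symbols):
        for i, v in enumerate(p_rs):
            r[k * Ns + tau0 + i] += s * v
    return r

def circular_shift(x, d):
    """Delay x circularly by d samples."""
    n = len(x)
    return [x[(i - d) % n] for i in range(n)]

p_rs = [1.0, 0.5, -0.25, 0.125, 0.3, -0.2]  # toy symbol waveform (assumed)
tau0 = 2                                     # mistiming in samples, 0 <= tau0 < Ns
Ns = len(p_rs)
r = received_noise_free([1, 1, -1], p_rs, tau0)

same_sign_window = r[1 * Ns:2 * Ns]   # spans s(0) = s(1) = +1
opposite_window = r[2 * Ns:3 * Ns]    # spans s(1) = +1 and s(2) = -1
ideal = circular_shift(p_rs, tau0)    # unnormalized ideal template
```

Here `same_sign_window` coincides with `ideal`, while `opposite_window` carries a sign flip at the symbol boundary and matches neither `ideal` nor its negative.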
To do so, we simply intercept such a $T_s$-long segment of $r(t)$ based on the receiver's clock. We denote this waveform segment as $p_{ts}(t)$, $t \in [0, T_s)$. After scaling by $1/\sqrt{E_s}$, it is a noisy copy of the ideal template, and is related to $p_{rs}(t)$ by a circular shift of $\tau_0$ plus an additive noise term. When constructing $p_{ts}(t)$, it is required that the selected segment of $r(t)$ contains two consecutive symbols of the same sign. To satisfy this condition, we propose two different selection strategies for the NDA and DA modes, respectively.

Figure 2: Correlator-based frame-rate digital synchronizer structure.

DA mode
In the DA mode, it is convenient to insert same-sign symbol pairs into the training sequence. Knowing the training sequence pattern, the receiver only needs to acquire $n_s$ in order to identify those $T_s$-long segments containing two consecutive symbols of the same sign. Acquiring $n_s$ pertains to signal detection, which can typically be done effectively via energy detection [18]. In [14], a coarse estimator of the symbol-level offset $n_s$ is designed based on the correlator output samples generated by sliding $r(t)$ over a simple TH-independent transmit template $\tilde{p}_s(t)$. When multiple segments of $r(t)$ are eligible as the noisy template, we recommend that these segments be averaged to smooth out the noise effect, which will in turn improve the overall timing estimation accuracy. With a sufficient number of eligible segments, the averaged noisy template asymptotically approaches the ideal template $v_o(t)$ that optimizes the receive SNR.
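The averaging step recommended above can be sketched as follows. This is an illustrative fragment only: it assumes the eligible segments have already been identified from the training pattern, sampled at a common rate, and sign-aligned to positive same-sign pairs; `average_segments` is a hypothetical helper name.

```python
def average_segments(segments):
    """Sample-wise average of equal-length noisy template segments."""
    count = len(segments)
    return [sum(samples) / count for samples in zip(*segments)]

# Two noisy copies of the same (assumed) ideal template [1.0, -1.0, 2.0].
segments = [
    [1.1, -0.9, 2.2],
    [0.9, -1.1, 1.8],
]
averaged = average_segments(segments)
```

With independent zero-mean noise across segments, averaging $K$ segments reduces the noise variance in the template by a factor of $K$.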

NDA mode
In the NDA mode, the receiver needs to identify the information sequence pattern in order to intercept a segment of $r(t)$ with contributing symbols of the same sign. To this end, we adopt a maximum energy detection rule to decide which segment should be selected as the noisy template. Suppose $r(t)$ is observed over a span of $M$ symbol periods. Associated with each observation window of duration $T_s$, we define its instantaneous sample energy $E_n$, computed using the known TH-independent transmit waveform $\tilde{p}_s(t)$. In the absence of noise, a segment containing same-sign symbols will yield a higher instantaneous energy level than one containing opposite-sign symbols. To minimize the noise effect in practice, we select the desired noisy template from the $T_s$-long observation window that has the largest instantaneous energy. Such a template selection rule is illustrated by an example in Figure 3, in which we use $M = 4$. According to the information symbol pattern, both $E_1$ and $E_3$ have larger values than $E_2$ in the absence of noise. With a noisy $r(t)$, we choose whichever segment corresponds to the largest energy level among $\{E_n\}_{n=0}^{3}$. In this particular case, $E_1$ has the largest instantaneous value in the presence of noise, which prompts us to choose the waveform segment in the observation window $t \in [T_s, 2T_s]$ in Figure 3a as our noisy template.
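The maximum energy selection rule can be sketched in discrete time. Since the exact expression for $E_n$ is not reproduced above, the sketch assumes that $E_n$ is the squared correlation between the $n$th window and a sampled TH-independent transmit template; the toy single-path channel, the delay of one frame, and the helper names are likewise assumptions made for illustration.

```python
def build_received(symbols, template, tau0):
    """Noise-free toy signal: each symbol is a signed copy of `template`."""
    Ns = len(template)
    r = [0.0] * (Ns * (len(symbols) + 1))
    for k, s in enumerate(symbols):
        for i, v in enumerate(template):
            r[k * Ns + tau0 + i] += s * v
    return r

def select_noisy_template(r, template, M):
    """Pick the T_s-long window with the largest assumed energy E_n."""
    Ns = len(template)
    energies = []
    for n in range(M):
        window = r[n * Ns:(n + 1) * Ns]
        corr = sum(a * b for a, b in zip(window, template))
        energies.append(corr * corr)        # assumed form of E_n
    best = max(range(M), key=energies.__getitem__)
    return best, energies, r[best * Ns:(best + 1) * Ns]

# N_f = 3 frames of 2 samples each, one unit pulse per frame, no TH.
tx_template = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
r = build_received([1, -1, 1, 1], tx_template, tau0=2)
best, energies, noisy_template = select_noisy_template(r, tx_template, M=4)
```

In this noise-free toy case the only window spanning two same-sign symbols is the last one, and it is the one selected. With noise added to `r`, the same rule applies but the selection becomes a statistical decision.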

Construction of frame-rate noisy templates
Having explained the optimal correlation template $v_o(t)$ and its noisy version $p_{ts}(t)$ for a symbol-by-symbol correlator, questions arise pertaining to our digital synchronizer structure in Figure 2: what is the proper correlation template in a frame-rate correlator, and how should it be used to generate frame-rate samples? A common thought would be to directly correlate $r(t)$ with a copy of $p_{ts}(t)$ that slides every $T_f$ seconds. However, doing so does not preserve proper matched filtering, and thus is ineffective in energy capture. Shifting the noisy template in Figure 1c by $T_f$, we can easily witness a mismatch between the shifted template and the received waveform in Figure 1b. In fact, the noisy template should always be shifted by integer multiples of $T_s$ seconds in order to match $r(t)$, regardless of the sampling rate used in the correlator. Based on this observation, we now describe how to obtain frame-rate samples using $p_{ts}(t)$.
To generate frame-rate samples, we suppose that some frame-long templates are used to correlate with $r(t)$. As shown in Figure 1, when a frame-dependent TH code is used, the received signal waveform within each frame period differs from frame to frame due to TH, but the pattern repeats for every $N_f$ consecutive frames comprising a symbol. This indicates that an ideal correlator must use $N_f$ different template waveforms for optimal matched filtering in all frames. To do so, we divide the noisy template $p_{ts}(t)$ into $N_f$ consecutive segments of duration $T_f$, resulting in $N_f$ different frame-long templates, which we term frame templates. We denote these frame templates by

$$p_{tf}(t; i) := p_{ts}(t + iT_f), \quad t \in [0, T_f), \quad i = 0, 1, \ldots, N_f - 1.$$

Using these $T_f$-long noisy templates, the frame-rate samples $x(n)$ are generated as follows:

$$x(n) = \int_0^{T_f} r(t + nT_f)\, p_{tf}\big(t; [n]_{N_f}\big)\, dt, \quad (6)$$

where the sample index $n$ is linked to a proper index of the frame templates via a modulo operation on $n$ with base $N_f$. Equation (6) indicates that each frame template is used only once every $N_f$ frames in order to match the TH pattern. Although conceptually $N_f$ distinct frame templates are needed for optimal matched filtering in the presence of frame-dependent TH, it is not necessary to individually construct these frame templates during the sampling process.
To implement (6), the receiver can equivalently use the $T_s$-long noisy template $p_{ts}(t)$ and slide it symbol by symbol. In this way, $p_{ts}(t)$ is periodically extended over the entire time range to multiply with $r(t)$, as depicted in Figure 1c. Correlating integration is then performed within a $T_f$-long observation window that is moved frame by frame to generate $x(n)$. In this sampling process, the noisy template $p_{ts}(t)$ is shifted at the symbol rate, while the integration window is shifted at the frame rate, both with reference to the receiver's clock.

Next, we derive the mathematical expressions for the frame templates $\{p_{tf}(t; i)\}$ in order to facilitate ensuing algorithm development. Knowing that $p_{tf}(t; i)$ is constructed from a properly selected segment of $r(t)$, we first investigate the general form of a frame-long segment $r_n(t) := r(t + nT_f)$, $t \in [0, T_f)$:

$$r_n(t) = \sqrt{E_s} \sum_{k} \sum_{j=0}^{N_f-1} \sum_{l=0}^{L-1} s(k)\, \alpha_l\, p\big(\Lambda_{k,j,l}(t + nT_f)\big) + w_n(t), \quad (7)$$

where $w_n(t) := w(t + nT_f)$ and the time-shift term $\Lambda_{k,j,l}(t)$ is given by $\Lambda_{k,j,l}(t) := t - kT_s - jT_f - c_j T_c - \tau_{l,0} - \tau_0$. Due to the narrow nonzero support of $p(t)$, only a few triplets $(k, j, l)$ contribute to nonzero summands in (7), for a given $n$. Those effective triplets should satisfy the condition $\Lambda_{k,j,l}(t + nT_f) \in (0, T_p)$, which means that all the large-scale frame-level components inside $\Lambda_{k,j,l}(t + nT_f)$ should add up to zero. For this reason, we dissect $\Lambda_{k,j,l}(t + nT_f)$ into frame-level and pulse-level portions based on the different time scale of each component, as described below for an integer $q$ and with $n_s = 0$:

$$\Lambda_{k,j,l}(t + nT_f) = \underbrace{(n - n_\epsilon - 1 - q - kN_f - j)\,T_f}_{\text{frame level}} + \underbrace{t - c_j T_c - \tau_{l,0} - \epsilon + qT_f + T_f}_{\text{pulse level}}. \quad (8)$$

At the pulse level, since $t - c_j T_c - \tau_{l,0} - \epsilon + T_f$ could span over $(-2T_f + 3T_p, 2T_f)$, for any $q > 1$ or $q < -1$ the pulse-level value will fall outside $(0, T_p)$. As a result, there are only three possible integer values for $q$, i.e., $q = -1, 0, 1$, that may result in an effective triplet. Equation (8) indicates that each frame-long observation window (indexed by $n$ for $r_n(t)$) could span over up to three consecutive signal frames (indexed by $kN_f + j$ and the $q$'s).
This result can be intuitively understood by noting that at most two previous frame-long aggregate pulses $p_r(t)$ can affect the current observing frame, due to the TH code $c_j T_c$, the relative multipath delay of the channel $\tau_{l,0}$, and the unresolved timing uncertainty $\epsilon$, the combined effect of which is $c_j T_c + \tau_{l,0} + \epsilon < 3T_f - 3T_p$, that is, less than three frames. We define $s_k := s(\lfloor k/N_f \rfloor)$ as the input symbol at the $k$th frame. For the effective triplets $(k, j, l)$ that are indexed by $q \in [-1, 1]$ in (8), we note that $s(k) = s_{kN_f+j} = s_{n-n_\epsilon-1-q}$ and $c_j = c_{kN_f+j} = c_{n-n_\epsilon-1-q}$. Thus (7) can be simplified to

$$r_n(t) = \sqrt{E_s} \sum_{q=-1}^{1} s_{n-n_\epsilon-1-q}\, A_{tq}(t; n) + w_n(t), \quad (9)$$

where $w_n(t)$ is the noise in the $n$th frame-long window and $A_{tq}(t; n) := \sum_{l=0}^{L-1} \alpha_l\, p(t - c_{n-q} T_c - \tau_{l,0} - \epsilon + qT_f + T_f)$ has a time span of up to $T_f$. Observe from the transmit-receive model (9) that $A_{tq}(t; n)$ represents the continuous-time frame-rate (indexed by $n$) effective channel, which changes from frame to frame due to the fast TH code.
Accordingly, an optimum frame-long correlation template should also change from frame to frame within each symbol duration, in order to match the received equivalent channel $A_{tq}(t; n)$. To reach the noisy frame templates from $r_n(t)$, we simply need to set $s_{n-n_\epsilon-q-1} = 1$ in (9) to reflect the fact that a noisy template is selected as a segment of $r(t)$ inside which the contributing symbols have the same (positive) sign. As a result, our $T_f$-long noisy templates $p_{tf}(t; i)$ can be expressed by

$$p_{tf}(t; i) = \sum_{q=-1}^{1} A_{tq}(t; i) + w_i(t), \quad i = 0, 1, \ldots, N_f - 1, \quad (10)$$

where $w_i(t)$, $t \in [0, T_f)$, corresponds to the scaled noise term in the selected segment of $r(t)$.
So far, we have constructed noisy templates for a frame-rate correlator with the goal of capturing sufficient signal energy in the presence of dense multipath. Using the frame-rate samples $x(n)$ in (6), we now proceed to develop digital synchronizers to acquire the frame-level time offset $n_\epsilon$.
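The frame-rate sampling with a symbol-long noisy template can be sketched in discrete time as follows. The sketch assumes sampled waveforms, with the integral over each frame replaced by a sum and the template length an integer multiple of $N_f$; `frame_rate_samples` and the toy values are hypothetical.

```python
def frame_rate_samples(r, p_ts, N_f, num_samples):
    """x(n): correlate the n-th frame of r with frame template [n] mod N_f.

    p_ts holds one symbol-long template; its i-th frame-long slice plays
    the role of the frame template with index i.
    """
    T_f = len(p_ts) // N_f                   # samples per frame
    x = []
    for n in range(num_samples):
        i = n % N_f                          # frame template index
        x.append(sum(r[n * T_f + t] * p_ts[i * T_f + t] for t in range(T_f)))
    return x

# Toy example: N_f = 3 frames of 2 samples; r holds a +1 window then a -1 window.
p_ts = [0.3, -0.2, 1.0, 0.5, -0.25, 0.125]
r = p_ts + [-v for v in p_ts]
x = frame_rate_samples(r, p_ts, N_f=3, num_samples=6)
template_energy = sum(v * v for v in p_ts)
```

Summing $N_f$ consecutive samples of `x` that fall within one matched window recovers the full template energy (with the symbol sign), which is the energy capture property the noisy template is designed for.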

NDA TIMING ACQUISITION
In a blind mode, timing acquisition is possible only by exploiting some distinct features of the received digital signals. In the context of narrowband receivers, possible blind solutions include CS-based TOE and subspace-based search. CS typically arises in narrowband systems with multiple antennas or oversampling at a rate higher than the Nyquist rate [19], both of which should be avoided in low-complexity UWB radios. On the other hand, since every information-bearing pulse is repeatedly transmitted over $N_f$ frames in UWB signaling, sampling at a low frame rate will yield $N_f$ copies of each symbol, which is equivalent to oversampling a $T_s$-long symbol at a rate of $1/T_f = N_f/T_s$, thus giving rise to timing-dependent cyclic statistics [12]. The recently developed conventional-CS approach for UWB timing builds upon simple sliding correlation, and is thus encumbered by its ineffective energy capture in the absence of channel knowledge [12]. In this section, we develop low-complexity CS timing algorithms based on our new noisy templates, in order to attain sufficient energy capture and enhanced timing accuracy. We start with a brief review of the CS blind acquisition principle established in [12,13].

Conventional-CS blind acquisition
The CS-based acquisition methods in [12] operate on the frame-rate samples $x(n)$ that are obtained by frame-by-frame sliding correlation using $p(t)$ as the correlation template, that is, the template in (6) is replaced by $p(t)$ regardless of the frame index $n$. Denoting the pulse autocorrelation function as $R_p(\tau) := \int p(t)p(t - \tau)\,dt$, and recalling $s_k := s(\lfloor k/N_f \rfloor)$, we obtain from (1) that

$$x(n) = \sqrt{E_s} \sum_{k} \sum_{l=0}^{L-1} s_k\, \alpha_l\, R_p(kT_f + c_k T_c + \tau_l - nT_f) + w(n), \quad (11)$$

where $w(n)$ is the frame-rate noise sample, and slow TH is considered here, that is, $c_k := c(\lfloor k/N_f \rfloor)$. Based on the fact that $R_p(\tau)$ has a narrow nonzero support over $(-T_p, T_p)$, only a few $(k, l)$ pairs contribute to the nonzero summands in (11), which simplifies $x(n)$ to [12]

$$x(n) = \sqrt{E_s} \sum_{q=-1}^{1} s_{n-n_\epsilon-1-q}\, g_q(n) + w(n). \quad (12)$$

Here $g_q(n) := \sum_{l=0}^{L-1} \alpha_l R_p(-qT_f + c_{n-q}T_c + \epsilon + \tau_{l,0} - T_f)$ denotes the $q$-dependent discrete-time equivalent channel gain that combines the effects of the transmit filter, multipath propagation, and the receive correlator. This model can be intuitively understood by noting that the $n$th frame-long observation window generating $x(n)$ may encompass up to three input frames $s_k$ due to the unknown pulse-level offset $\epsilon$ as well as the interframe interference induced by TH, whereas the input frame locations $k$ are decided by the frame-level offset $n_\epsilon$ adjusted by multiple $q$'s. It has been established in [12] that the sequence $x(n)$ is a cyclostationary process, which means that the autocorrelation of $x(n)$, defined as $r_x(n; \nu) := E\{x(n)x(n + \nu)\}$, is periodic in $n$ with period $N_f$. Being periodic in $n$, $r_x(n; \nu)$ accepts a Fourier series (FS) expansion, which yields the cyclic correlation (CC) coefficients $R_x(l; \nu) := (1/N_f) \sum_{n=0}^{N_f-1} r_x(n; \nu)\, e^{-j 2\pi l n/N_f}$. Based on the CS property of $x(n)$, the detailed derivation in [12] leads to the following result.

Result 1 (CS acquisition via CC). The CC of the frame-rate samples $x(n)$ has an $n_\epsilon$-dependent phase, which also involves an integer $q_0$ with $|q_0| \le 1$. Consequently, the timing information $n_\epsilon$ can be acquired by phase retrieval on $R_x(l; \nu)$, with a timing ambiguity of up to one frame in acquiring $n_\epsilon$.
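The cyclic correlation coefficients and the idea of phase retrieval can be sketched as follows. The closed-form phase expression of Result 1 is not used here; instead, the sketch assumes a hypothetical time-varying correlation that is a single spike at frame index 3, for which the phase of the first coefficient encodes the spike position. The finite-sample estimator `tv_correlation` is also an assumed, straightforward form.

```python
import cmath

def tv_correlation(x, N_f, nu, M):
    """Sample estimate of r_x(n; nu), n = 0..N_f-1, averaged over M periods."""
    return [sum(x[k * N_f + n] * x[k * N_f + n + nu] for k in range(M)) / M
            for n in range(N_f)]

def cyclic_correlation(r_x, l):
    """R_x(l; nu) = (1/N_f) * sum_n r_x(n; nu) * exp(-j*2*pi*l*n/N_f)."""
    N_f = len(r_x)
    return sum(r_x[n] * cmath.exp(-2j * cmath.pi * l * n / N_f)
               for n in range(N_f)) / N_f

# Hypothetical periodic correlation: a spike at n = 3 within each period.
N_f = 8
r_x = [0.0] * N_f
r_x[3] = 1.0
R1 = cyclic_correlation(r_x, 1)

# Phase retrieval with l = 1: the phase is -2*pi*3/N_f, so the shift is 3.
shift = round(-cmath.phase(R1) * N_f / (2 * cmath.pi)) % N_f
```

Using $l = \pm 1$ keeps the phase within one $2\pi$ interval over the whole range of shifts, which is why higher cycle indices are avoided.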
CS also exists in other forms, among which we will explore another set of frame-rate samples $y(n)$ generated by summing up every $N_f$ consecutive frame-rate samples $x(n)$, that is, $y(n) := \sum_{k=0}^{N_f-1} x(n + k)$, whose noise component $\bar{w}(n) := \sum_{k=0}^{N_f-1} w(n + k)$ is the composite noise. The time-varying (TV) correlation of $y(n)$, defined by $r_y(n; \nu) := E\{y(n)y(n + \nu)\}$, is also periodic in $n$ with period $N_f$. Because $y(n)$ collects a symbol-long portion of $r(t)$ at every consecutive frame (indexed by $n$), $r_y(n; \nu)$ turns out to have an $n_\epsilon$-dependent amplitude [12]. Consequently, we can retrieve $n_\epsilon$ by peak-picking $r_y(n; \nu)$ as follows.
Result 2 (CS acquisition via TV correlation). The TV correlation $r_y(n; \nu)$ of the frame-rate samples $y(n)$ has peaks at $\nu_{\max}(q_0) = n_\epsilon + 1 + q_0 - [n]_{N_f} \pm N_f$, for any $n$ and $q_0 \in [-1, 1]$. Peak-picking $r_y(n; \nu)$ followed by a modulo $N_f$ operation on $\nu_{\max} + [n]_{N_f} - 1$ will thus yield $n_\epsilon + q_0$, where $q_0 = -1, 0, 1$ implies an ambiguity of up to one frame in acquiring $n_\epsilon$.

Noisy-CS blind acquisition
The conventional-CS-based methods that employ $p(t)$ as the correlation template cannot collect the ample multipath diversity. Motivated by the need for effective energy capture, we will employ a channel-bearing noisy template to accomplish the sampling operation. As discussed in Section 3, the noise-free version of our frame templates given by (10) is nothing but the desired TH-dependent aggregate channel. Next we will demonstrate that the frame-rate samples generated by our noisy template not only preserve the timing-dependent CS property, but also attain adequate energy capture in dense multipath.
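Using noisy templates, the frame-rate samples $x(n)$ in (6) can be deduced from (7) and (10) as

$$x(n) = \sqrt{E_s} \sum_{q=-1}^{1} s_{n-n_\epsilon-1-q}\, A_q(n) + w_c(n), \quad (14)$$

where the effective channel gain $A_q(n)$, obtained by correlating $A_{tq}(t; n)$ with the noise-free part of the frame template over one frame, depends on the TH code, and $w_c(n)$ represents the noise sample, which is affected not only by the ambient noise, but also by the noise term in the noisy template.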
The discrete-time input-output model in (14) readily leads to a family of enhanced CS-based blind timing acquisition algorithms using the frame-rate samples generated via noisy templates. This is because (14) shares the same form as (12), except that the effective channel gain $A_q(n)$ has enhanced energy capture capability in dense-multipath environments. The CS property of $x(n)$ in (14) is evident in the floor operation used in its symbol component $s_{n-n_\epsilon-1-q} = s(\lfloor (n - n_\epsilon - 1 - q)/N_f \rfloor)$, which is cyclic with period $N_f$ and depends on the timing acquisition parameter $n_\epsilon$. Thus the CS-based acquisition algorithms in Section 4.1 can be directly applied to (14). Based on Results 1 and 2, the following propositions on blind noisy-CS-based timing acquisition are in order.

Proposition 1 (noisy-CS acquisition via TV correlation).
Timing can be acquired from the TV correlation of $y(n)$, the sum of $x(n)$ over $N_f$ consecutive frames, where $x(n)$ is generated by correlating the received signal $r_n(t)$ with a noisy frame template $p_{tf}(t; [n]_{N_f})$ described by (10) in the presence of fast TH. An estimate of $n_\epsilon$ can be obtained within an ambiguity of up to one frame, via peak-picking $r_y(n;\nu)$ with respect to $\nu$ for each $n \in [0, N_f - 1]$, followed by averaging across $n$.

Proposition 2 (noisy CS via CC).
Timing can be acquired from the CC of $x(n)$ that is generated by frame-rate sampling using noisy templates. An estimate of $n_\epsilon$ can be obtained within an ambiguity of up to one frame, via retrieving the phase of $R_x(l;\nu)$ (setting $l = \pm 1$ to avoid phase wrapping).

In both propositions, the timing ambiguity of up to one frame hardly affects the overall performance, since typically $N_f > 20$. In practice, the cyclic statistics $r_y(n;\nu)$ and $R_x(\pm 1;\nu)$ are replaced by their finite-sample estimates obtained over $M$ symbol periods, that is, $\hat{r}_y(n;\nu) = (1/M)\sum_{k=0}^{M-1} y(kN_f+n)\,y(kN_f+n+\nu)$, with $\hat{R}_x(\pm 1;\nu)$ estimated analogously from $x(n)$.
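The finite-sample TV correlation estimate and the peak-picking step of Result 2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names are hypothetical, the samples `y` are assumed to be already available as a list, the lag search is restricted to one period $\nu \in [1, N_f]$ (the modulo operation removes the $\pm N_f$ shift), and the peak is taken as the largest value of the estimate.

```python
def tv_correlation_estimate(y, n, nu, n_f, m):
    """Finite-sample estimate (1/M) * sum_k y(k*N_f + n) * y(k*N_f + n + nu)."""
    total = 0.0
    for k in range(m):
        total += y[k * n_f + n] * y[k * n_f + n + nu]
    return total / m


def peak_pick_offset(y, n, n_f, m):
    """Peak-pick over nu, then apply (nu_max + [n]_{N_f} - 1) mod N_f.

    Assumption: lags are searched over one period, nu = 1, ..., N_f.
    The result equals the frame-level offset up to the one-frame
    ambiguity q0 described in Result 2.
    """
    lags = range(1, n_f + 1)
    nu_max = max(lags, key=lambda nu: tv_correlation_estimate(y, n, nu, n_f, m))
    return (nu_max + (n % n_f) - 1) % n_f
```

In a full estimator the per-$n$ results would then be averaged across $n \in [0, N_f - 1]$, as stated in Proposition 1; the input must contain at least $(M-1)N_f + n + N_f + 1$ samples for the indices used here to exist.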

Evaluation of noisy template
Although the frame-rate blind estimators in Propositions 1 and 2 resemble the conventional CS in [12,13], the use of noisy templates leads to quite distinct properties for timing acquisition under dense multipath. Evaluation of these properties is in order.

Energy capture capability
It has been emphasized in Section 3 that our noisy correlation template is able to capture sufficient symbol energy via proper matched filtering. To demonstrate how this energy capture capability helps to improve the timing acquisition accuracy, take the noisy-CS TV correlation algorithm in Proposition 1 as an example. Here we utilize frame-rate samples $y(n)$ that are the sums of $x(n)$ in (6) over $N_f$ consecutive frames. As depicted in Figure 1c, the signal component $y_s(n)$ of $y(n)$ can be equivalently expressed as in (17). When $s(\lfloor (n-n_\epsilon-1)/N_f \rfloor) = s(\lfloor (n-n_\epsilon-1)/N_f \rfloor + 1)$, (17) becomes $y_s(n) = E_s \int_0^{T_s} p_{rs}^2(t)\,dt$, which collects the entire symbol energy scattered in the channel $p_{rs}(t)$. In the noise-free case, (17) shows that the energy capture capability of our noisy template is the same as using the unknown $p_{rs}(t)$ to perform maximum ratio combining under perfect timing, even in the presence of TH. Because the parameter estimation error is generally inversely proportional to the receive SNR collected, the performance of the NDA timing acquisition algorithms in Propositions 1 and 2 is considerably improved when the digital samples $x(n)$ and $y(n)$ attain near-optimum energy capture.

Timing ambiguity
Compared with the conventional CS, the use of noisy templates is subject to stronger impact from the undesired timing acquisition ambiguity incurred by the unknown $\epsilon$. For clarity, we illustrate this timing ambiguity effect in the absence of TH (cf. Figure 4), which corresponds to $p_{rs}(t) = \sum_{i=0}^{N_f-1} p_r(t - iT_f)$. Accordingly, the signal component $y_s(n)$ in (17) reduces to (18), where $P_h := \int_0^{T_f} p_r^2(t)\,dt$ is the receive signal energy per frame and $\delta_h(\epsilon)$, defined by an integral expression, is a real-valued scalar confined in $[0, 1]$. From (18), it is evident that the tracking error $\epsilon$ incurs additional timing ambiguity that is manifested in $\delta_h(\epsilon)$. Without resolving $\epsilon$, the receiver cannot distinguish the true pair $\{n_\epsilon, \delta_h(\epsilon)\}$ from another pair $\{n'_\epsilon, \delta_h(\epsilon) + n_\epsilon - n'_\epsilon\}$ when only $y_s(n)$ is available. Depending on the timing algorithms used to recover $n_\epsilon$, the estimate of $n_\epsilon$ derived from $y_s(n)$ may in fact be $n_\epsilon + \delta_h(\epsilon)$. Fortunately, since $\delta_h(\epsilon)$ is bounded in $[0, 1]$, the timing ambiguity is confined to within one frame, as stated in Propositions 1 and 2.
The use of a simple correlation template $p(t)$, on the other hand, only incurs a very small $\delta_h(\epsilon)$ when the channel delay spread is less than but close to $T_f - T_p$. Unfortunately, since $p(t)$ has a very narrow nonzero support of $T_p$, it cannot capture much signal energy when sampling at the frame rate in dense multipath. For any other frame-rate correlation template that has a nonzero support longer than $T_p$, whether it is the optimum template or not, the ambiguity term $\delta_h(\epsilon)$ always shows up when $\epsilon \neq 0$. Thus this extra acquisition ambiguity is inherent to adequate energy capture prior to tracking.

Noise effect
Because our noisy correlation template is contaminated by noise, the attainable SNR is worse than the optimum (but infeasible) case with an ideal (but unknown) template. Noise enhancement arises from two causes: one is the doubling in the noise variance of each frame-rate sample, resulting from both the received signal noise and the template noise; the other is the product of the noise component in $r(t)$ and the noise in the template [20]. A detailed analysis of the composite noise variance is given in [20]. Despite the noise enhancement, the advantage of using noisy templates is still pronounced in highly frequency-selective fading channels, since conventional blind sampling techniques are unable to benefit from the large multipath diversity gain.
In summary, the timing performance improvement enabled by near-optimum energy capture via our noisy template considerably outweighs its downsides. Both the bounded timing ambiguity and the noise enhancement are small prices to pay, considering that the desired optimum template $v_o(t)$ in (4) entails not only the unknown TH-induced multipath channel $p_{rs}(t)$, but also the unknown timing $\tau_0$.
We conclude this section with some remarks on related work. The noisy template concept has been explored in different contexts. In [21], a transmit reference (TR) scheme pairs each information-bearing pulse with a pilot pulse, where each pilot's receive waveform is used as a noisy template to decode the information-bearing pulse under perfect timing. The template used in TR is time aligned to be exactly a noisy copy of $p_{rs}(t)$, whereas the time location of our correlation template is chosen blindly for synchronization purposes, without necessarily resorting to pilot pulses. In [16], a so-called dirty template scheme is developed for rapid DA timing acquisition. This method correlates the received waveform segments with their noisy neighbors to directly generate correlation samples, and relies on a special pilot symbol pattern to derive its associated timing recovery algorithm. No explicit correlation template is constructed, nor are sampled data resorted to. In contrast, our approach aims at constructing proper noisy templates for correlation-based digital sampling. Based on the digital samples generated via our noisy templates, a variety of digital acquisition algorithms are possible, among which we discuss some NDA and DA examples in this paper.
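The two noise-enhancement causes named above can be made explicit with a short worked expansion. This is only an illustrative sketch, not the analysis of [20]: the symbols $s(t)$ (noise-free waveform over one frame), $w(t)$ (noise in the received signal), $\tilde{w}(t)$ (noise in the template) and $\tilde{p}(t)$ (noisy frame template) are assumed here for illustration, and amplitude and sign factors are omitted.

```latex
\begin{align}
\int_0^{T_f} r(t)\,\tilde{p}(t)\,dt
  &= \int_0^{T_f} \bigl[s(t) + w(t)\bigr]\bigl[s(t) + \tilde{w}(t)\bigr]\,dt \\
  &= \underbrace{\int_0^{T_f} s^2(t)\,dt}_{\text{signal}}
   + \underbrace{\int_0^{T_f} s(t)\,\tilde{w}(t)\,dt
               + \int_0^{T_f} s(t)\,w(t)\,dt}_{\text{two signal-noise terms}}
   + \underbrace{\int_0^{T_f} w(t)\,\tilde{w}(t)\,dt}_{\text{noise-noise product}}
\end{align}
```

With an ideal noise-free template only one signal-noise term would remain; the second signal-noise term corresponds to the doubling of the noise variance, and the last term to the noise product.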

DA TIMING ACQUISITION
The noisy templates given by (10) can also be used for timing recovery in the DA mode, for which we develop low-complexity acquisition algorithms based on the ML criterion. Unlike [15], we do not attempt to find the ML estimates (MLEs) of the entire set of channel parameters $\{\alpha_l, \tau_l\}_{l=0}^{L-1}$, which requires sampling at a rate higher than the pulse rate with complexity related to $L$. Rather, we aim at recovering the frame-level timing information, treating the equivalent discrete-time channel amplitudes as nuisance parameters. The energy capture that would otherwise require the MLEs of the channel is instead attained via noisy templates in our algorithms, at a low sampling rate of one sample per frame. We first present our ML DA timing algorithms for a generic RAKE correlation receiver, followed by algorithm improvement via our noisy templates.

Formulation of frame-rate DA timing acquisition
A general frame-rate RAKE receiver generates its output samples $z(n)$ through a bank of $L'$ RAKE fingers with different delays $\{\tilde{\tau}_{l',0}\}$ and weights $\{w_{l'}\}$. Choosing different parameter sets $(L', \{w_{l'}\}, \{\tilde{\tau}_{l',0}\})$ leads to various RAKE receivers, whose energy capture capability directly affects the timing estimation accuracy and symbol detection performance. For example, setting $(L' = 1, \tilde{\tau}_{l',0} = 0)$ corresponds to a conventional sliding correlator (a.k.a. RAKE-1), which only collects a small portion of the signal energy through a single tap. Choosing $(L' = L, \{\tilde{\tau}_{l,0} = \tau_{l,0}\}, \{w_l = \alpha_l\})$ results in an optimum full-RAKE MRC under perfect timing.
Analogous to (12), and taking into account $n_s \neq 0$, the discrete-time RAKE output $z(n)$ is given by (19), where $w_z(n)$ is a white Gaussian noise sample with variance $\sigma_n^2$, and the frame-rate effective channel amplitude is given by a sum over fingers and channel taps of the terms $\alpha_l w_{l'} R_p(-qT_f + c_{n-q}T_c + \epsilon - \tilde{\tau}_{l',0} + \tau_{l,0} - T_f)$. Since the nonzero support of $p_R(t)$ could span over one frame period, the ambiguity factor $q$ in (19) takes three possible values due to the unknown timing offset $\epsilon$, the channel delay spread, and the fast TH code.

Reiterating the definition $s_k := s(\lfloor k/N_f \rfloor)$, we opt to replace $s_{n-n_sN_f-n_\epsilon-q-1}$ in (19) by $s_{n-n_sN_f-n_\epsilon} = s(\lfloor (n-n_\epsilon)/N_f \rfloor - n_s)$, regardless of $q$. This is exact for $(1-2/N_f)\cdot 100\%$ of the $z(n)$'s, and is an approximation only for the remaining small percentage of $z(n)$'s, since $N_f \gg 1$ [14]. The small level of approximation reflects the impact of the inherent timing ambiguity.

With this approximation, the frame-rate output $z(n)$ can be rewritten compactly, leading to the LLR expression in (25). By treating the other unknown parameters as deterministic and maximizing (25) with respect to $a$, we obtain the MLE of $a$ in (26). Having the estimate $\hat{a}$, the LLR expression in (25) reduces to (27), $J(n_\epsilon; n_s) := J(z_s; \hat{a}, n_\epsilon, n_s) = z_s^T P_s(n_\epsilon) z_s$, where $P_s(n_\epsilon) := S_{n_\epsilon}(S_{n_\epsilon}^T S_{n_\epsilon})^{-1} S_{n_\epsilon}^T$ is the projection matrix of $S_{n_\epsilon}$. Treating $J(n_\epsilon; n_s)$ in (27) as the objective function to maximize, timing acquisition can be achieved by a grid search over possible $(n_\epsilon, n_s)$ pairs for $n_\epsilon \in [0, N_f - 1]$ and $n_s \in [0, N - M]$. Such a procedure yields the MLEs of all the unknown parameters $a$, $n_\epsilon$, and $n_s$, for the general case of TH transmissions with any RAKE type.
It is worth noting that the training sequence should avoid the case of $s(n)$ having the same sign for all $n$. If so, the received signal will not reveal any timing information, which is inherent in symbol sign transitions. Mathematically, the gain matrix $S_{n_\epsilon}$ would be rank deficient with all its entries being the same. As a result, it would be impossible to estimate the amplitudes $a$ via (26), and hence to resolve the timing offset parameters $n_\epsilon$ and $n_s$.
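The grid search over the projection-based objective can be illustrated with a deliberately simplified Python sketch. The assumptions here are not from the paper: a single scalar amplitude (as would apply without TH), $n_s = 0$, and a known training sign sequence, so that the gain matrix collapses to one column $s$ and $z^T P z$ becomes $(s^T z)^2 / (s^T s)$. All function names are hypothetical.

```python
def hypothesised_signs(training, n_eps, n_f, length):
    """Symbol sign seen in frame n under offset n_eps: s(floor((n - n_eps)/N_f)).

    Frames that fall outside the training sequence are set to 0.
    """
    signs = []
    for n in range(length):
        k = (n - n_eps) // n_f  # floor division, also correct for n < n_eps
        signs.append(float(training[k]) if 0 <= k < len(training) else 0.0)
    return signs


def glrt_metric(z, s):
    """z^T P z with P = s (s^T s)^{-1} s^T for a single column s."""
    energy = sum(v * v for v in s)
    if energy == 0.0:
        return 0.0
    corr = sum(a * b for a, b in zip(z, s))
    return corr * corr / energy


def acquire_frame_offset(z, training, n_f):
    """Grid search over candidate frame offsets 0, ..., N_f - 1."""
    return max(
        range(n_f),
        key=lambda c: glrt_metric(z, hypothesised_signs(training, c, n_f, len(z))),
    )
```

For a noise-free input the metric is maximized only by the true offset, provided the training sequence contains sign transitions; extending the sketch to a vector of amplitudes and to the search over $n_s$ would require the full matrix form in (27).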

Noisy-ML DA acquisition
Although the GLRT timing acquisition rule applies to any RAKE type, it is generally difficult for a RAKE receiver to properly select its finger parameters $(L', \{w_{l'}\}, \{\tilde{\tau}_{l',0}\})$ to ensure adequate energy capture during the synchronization phase. On the other hand, the aggregate channel itself, $p_{rs}(t)$ under perfect timing or $v_o(t)$ under mistiming, consists of a bank of delayed taps corresponding to the multipath returns. It is thus possible to regard a correlator using our noisy template $p_{ts}(t)$ (or equivalently $\{p_{tf}(t; i)\}_{i=0}^{N_f-1}$) as a RAKE receiver with noisy fingers. Such a noisy frame-rate RAKE receiver can collect the ample multipath diversity gain in a practical manner, without resorting to cumbersome channel estimation at the pulse level.
To merge the noisy template sampling with ML DA acquisition, we borrow the same frame-rate correlation sampling process as detailed in Section 3.3 and used in the noisy-CS scheme. While conceptually equivalent to a noisy RAKE receiver, our noisy-template-based correlator does not need to implement the actual RAKE fingers for coherent combining. Replacing $x(n)$ by $z(n)$ in (11) for notational consistency, the frame-rate output samples $z(n)$ have a form similar to (19), except that $A_{zq}(n)$ and $w_{zc}(n)$ correspond to the equivalent channel amplitude and the noise sample when our noisy frame templates are used. Because $w_{zc}(n)$ can be well approximated as white Gaussian noise [20], the LLR expression in (25) also applies to the sampled data generated by the noisy template. Therefore, the ML DA acquisition approach based on the GLRT rule is applicable even when using our noisy template, which we term the noisy-ML DA timing acquisition method. As in the NDA timing acquisition case, the noisy-template-based ML DA approach enjoys adequate energy capture and thus enhanced timing accuracy with a low implementation cost, at the expense of small noise enhancement and extra but bounded timing ambiguity, the negative effects of which become prominent only in the very high SNR region.

SIMULATIONS
To test the proposed timing algorithms, we perform computer simulations using the following transmitter and channel settings. The second derivative of the Gaussian function is used as the UWB pulse shaper, where we set $\tau = 0.43$ nanosecond to yield a pulse width of $T_p = 1$ nanosecond. The generation of random multipath channels follows the channel model in [17], where rays arrive in several clusters within an observation window, and the amplitude of each arriving ray is a Rayleigh distributed random variable. The parameters of this channel model are set as cluster arrival rate $\Lambda = 0.5$ ns$^{-1}$, ray arrival rate $\lambda = 2$ ns$^{-1}$, cluster arrival decay time constant $\Gamma = 30$ nanoseconds, and ray arrival decay time constant $\gamma = 5$ nanoseconds. The multipath channel power profile is cut off to make the maximum delay spread $\tau_{L-1,0} = 99$ nanoseconds. Each symbol duration contains $N_f = 25$ frames and each frame duration is chosen to be $T_f = 100$ nanoseconds. The chip duration is chosen to be $T_c = 1.0$ nanosecond, and the TH code is chosen randomly over the range $[0, 90]$. Whenever applicable, the timing offset parameters $n_\epsilon$ and $\epsilon$ are uniformly distributed over $[0, N_f - 1]$ and $[0, T_f)$, respectively.
Both the DA and NDA timing acquisition algorithms are tested under various operating conditions, with some comparisons between these two families of estimation techniques. In particular, we focus on evaluating the performance enhancement induced by noisy templates in CS-based NDA acquisition, while similar effects can be expected in ML-based DA acquisition. The performance metrics used include the normalized mean square error (MSE) $E\{|\hat{n}_\epsilon - n_\epsilon|^2 / N_f^2\}$ of the $n_\epsilon$ estimates, as well as the system-level BER performance when an optimum matched filter is used for symbol detection using the estimated timing $\hat{\tau}_0 = \hat{n}_s T_s + \hat{n}_\epsilon T_f$. The optimal detector is assumed to have accurate channel knowledge, in order to separate the effect of timing acquisition from imperfect channel estimation. Both MSE and BER performances are plotted versus the symbol signal-to-noise ratio (SNR) defined by $E_s/\sigma_w^2$. In each test case, a set of $M$ symbols is transmitted in either the NDA or the DA mode, with $M = 6$ and $M = 30$ representing different lengths of the acquisition time.
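As an illustration of the pulse shaper, the sketch below evaluates a second-derivative Gaussian pulse in Python. The closed form used, $p(t) = [1 - 4\pi (t/\tau)^2]\exp(-2\pi (t/\tau)^2)$, is an assumption: it is a commonly used normalized form of this pulse family and may differ in scaling or time centering from the expression used in the simulations. The function name is hypothetical.

```python
import math

TAU_NS = 0.43  # pulse-shaping parameter from the simulation settings, in ns


def gaussian_second_derivative_pulse(t_ns, tau_ns=TAU_NS):
    """Assumed normalized form: [1 - 4*pi*(t/tau)^2] * exp(-2*pi*(t/tau)^2).

    The pulse is centered at t = 0 and has unit peak amplitude.
    """
    x = (t_ns / tau_ns) ** 2
    return (1.0 - 4.0 * math.pi * x) * math.exp(-2.0 * math.pi * x)
```

Under this assumed form the pulse crosses zero at $t = \pm\tau/(2\sqrt{\pi})$ and has decayed to a negligible level by $|t| = 0.5$ ns, which is consistent with a pulse width of about 1 nanosecond for $\tau = 0.43$ nanosecond.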
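The two quantities defined above, the normalized MSE of the frame-level estimates and the timing estimate formed from the symbol-level and frame-level offsets, are simple to compute. The following Python sketch uses hypothetical function names and replaces the expectation by an average over Monte Carlo trials.

```python
def normalized_mse(estimates, truths, n_f):
    """Sample average of |n_hat - n|^2 / N_f^2 over the given trials."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return sum(squared) / (len(squared) * n_f ** 2)


def timing_estimate(n_s_hat, n_eps_hat, t_s, t_f):
    """tau_0 estimate: n_s_hat * T_s + n_eps_hat * T_f (same time unit as T_s, T_f)."""
    return n_s_hat * t_s + n_eps_hat * t_f
```

With the settings above, $T_s = N_f T_f = 2500$ nanoseconds, so an estimate of one symbol and three frames corresponds to `timing_estimate(1, 3, 2500.0, 100.0)`, that is, 2800 nanoseconds.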
Test A (energy capture capability). Both TH and unresolved tracking errors may affect the energy capture capability of the noisy template. For a close examination, we test the effect of each of these two factors separately.
Case 1 (MSE without TH). In the absence of TH, the noisy templates $p_{tf}(t; i)$ with different indices $i$ in fact have the same waveform. Thus, a single frame-long noisy template is sufficient to collect the signal energy scattered in the entire equivalent channel. Figure 5a shows the notable improvement of noisy CS compared with the conventional CS in the absence of TH, whether with a short or long acquisition length $M$.
Case 2 (MSE with TH). In this case, the conventional-CS algorithm uses a symbol-dependent slow TH code to retain CS in the samples generated by a simple sliding correlator [12]. In contrast, noisy-CS acquisition can afford to use frame-dependent fast TH while still retaining the CS property, thanks to the $N_f$ TH-dependent frame-long noisy templates constructed. The attractive energy capture capability of noisy CS is manifested in the MSE performance comparison illustrated in Figure 5b. When the SNR reaches a very high value, for example, 50 dB, the MSE curves of the CS and noisy-CS schemes start to merge, due to the extra timing ambiguity that is caused by large tracking errors when using noisy templates. Nevertheless, noisy CS is favored for NDA acquisition, because of the noticeable performance advantage in a practical SNR range (e.g., from 10 dB to 40 dB).
Case 3 (effect of tracking errors). As explained in (17), acquiring $n_\epsilon$ in the presence of the tracking error $\epsilon$ is in fact estimating $n_\epsilon + \delta_h(\epsilon)$ in the noisy-CS algorithms, which leads to extra yet small timing ambiguity. This effect of tracking errors is investigated on all CS-based algorithms in Figure 6, with Figure 5b as the reference for comparison. The performance gain exhibited in the small-$\epsilon$ case suggests that further improvement in acquisition accuracy can be achieved by compensating for the tracking error, via simultaneously or iteratively performing small-scale timing tracking as well.
Test B (NDA acquisition). All the CS-based NDA acquisition algorithms are compared under the general case of TH transmissions in the presence of random tracking errors.
Case 1 (MSE performance). The MSE curves of timing acquisition by both TV correlation and CC are depicted in Figure 7a, for both the conventional-CS methods in Results 1 and 2, and the new noisy-CS methods in Propositions 1 and 2. The TV-based algorithms outperform the CC-based algorithms for both CS and noisy CS. Meanwhile, sampling using our noisy template exhibits the expected performance gain in the practical SNR regime.
Comparison between Figure 6 and Figure 7a further reveals the effect of tracking errors on timing acquisition using noisy templates. Under random tracking errors, the MSEs of noisy-CS algorithms start to flatten out after 30 dB. In contrast, the MSEs of noisy CS under no tracking errors drop monotonically as the SNR increases, even though the slopes become shallower after 40 dB of SNR. The same trend also occurs in CS algorithms due to the inherent ambiguity represented by $q$, but the $\epsilon$-induced estimation bias $\delta_h(\epsilon)$ hardly exists in CS-based acquisition.
Case 2 (BER performance). Using an optimal matched filter with perfect channel knowledge, the BER performance subject to imperfect timing acquisition is depicted in Figure 7b. The legend "ideal CS" represents the CS acquisition using the ideal template $v_o(t)$. There is only a very small gap between our noisy-CS performance and the ideal case marked by ideal CS, which indicates that the noise effect of our noisy template is small. On the other hand, there do not seem to be significant performance gaps in BER among the CS and noisy-CS algorithms, either. This is because most of the BER curves reach $10^{-6}$ at a low SNR between 20 and 25 dB, in which region the timing MSEs of all algorithms do not differ much. Since the BER performance is not only affected by the timing acquisition performance, but also directly determined by the receiver structure, this phenomenon is only pertinent to the optimal detector assumed. Note that the optimal detector we use here is quite an optimistic design: it not only assumes perfect knowledge of the multipath channel $p_{rs}(t)$, but also assumes accurate knowledge of the exact channel (the noise-free version of $p_{ts}(t)$) under mistiming. This requires perfect channel estimation under mistiming, which may not be attainable in practice; thus the BER curves only serve as benchmarks to illustrate the impact of timing acquisition alone, assuming that the ensuing receiver components can be designed ideally. For other more practical receivers, the BER performance is expected to be more sensitive to the acquisition MSE. Nevertheless, Figure 7b indicates that our low-complexity timing algorithms are very effective in improving the system-level performance, compared with the case without any timing recovery. At a low frame rate, our algorithms can attain BER performance that is just several dB away from the ideal case with perfect timing information.
Test C (DA acquisition). ML DA algorithms proposed in Section 5 are evaluated in Figure 8, in the presence of fast TH and random tracking errors. The frame-rate ML synchronizers based on a conventional RAKE-1 correlation sampler and a noisy-template sampling are compared.
Case 1 (MSE performance). As expected, the performance improvement by the noisy-ML solution over RAKE-1 is quite evident, making noisy ML an attractive choice for UWB timing in a practical low SNR region.
Case 2 (BER performance). Figure 8b provides the BER performance of ML DA algorithms, with two additional curves obtained under perfect timing and no timing for reference. The BER curves are ordered consistently with the MSE results. It is obvious that noisy ML outperforms RAKE-1. The BER performance is quite close to that of an ML synchronizer with an ideal template (marked by ideal ML) and that of the perfect-timing case.
Since the MSE is a direct way to assess the acquisition algorithms, it is useful to compare the MSE performance between the NDA mode (Figure 7) and the DA mode (Figure 8). At the same frame rate, ML DA algorithms offer better MSE performance in the lower SNR region (less than 15 dB). As the SNR increases, noisy-ML DA acquisition exhibits a flat estimation error floor, and eventually merges to the same MSE level as noisy-CS(TV) ($M = 30$) NDA acquisition. With the help of our noisy template, NDA timing could offer performance comparable to that of DA timing, without a training overhead. It can also be observed by comparison that the timing performance benefits from the energy capture capability more noticeably in the DA mode than in the NDA mode.

SUMMARY
Capitalizing on judiciously designed noisy templates, we present in this paper several low-complexity, high-performance timing acquisition algorithms for UWB TH transmissions in dense-multipath environments. In our correlation template design, $N_f$ frame templates are constructed from the received waveform itself, in order to match the TH-dependent aggregate channel that changes from frame to frame. Under such a distinct correlation structure, we are able to not only extract timing-dependent cyclic statistics for devising blind synchronizers, but also demonstrate the performance advantage of our solutions enabled by collecting ample multipath diversity gain even during the blind synchronization phase. We also develop DA timing solutions based on the ML criterion, which operate at a low frame rate to reduce the implementation complexity and synchronization time. In both the DA and NDA modes, the salient multipath energy capture capability of our synchronizers proves to be critical in order for low-complexity timing methods to operate well in a practical transmit SNR region, in the presence of dense multipath.
The noisy template approach applies not only to TH UWB, but also to other modulation and multiple access schemes such as direct-sequence UWB. In any case, the symbol-long noisy template should slide at the symbol rate in order to match the aggregate channel that is repeated for every symbol, while the integration window for correlation can be shifted frame by frame to generate frame-rate samples. This sampling process is different from the traditional sliding correlation, in which the correlation template and the integration window are shifted at the same rate. Our noisy-template approach also works in low-scattering environments to conveniently capture scattered multipath returns prior to timing. However, since the inherent multipath diversity provided by the channel is relatively small in this case, the advantage of using noisy templates for collecting the diversity gain at a low cost has to be weighed against the noise enhancement effect incurred by the lack of (computationally expensive) channel estimation.
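The sampling process described above, with a symbol-long template that repeats every symbol and an integration window that advances frame by frame, can be sketched in discrete time as follows. This is an illustrative Python sketch under assumptions not made in the paper: the waveforms are represented by uniformly spaced samples, the template is supplied as a list covering exactly one symbol, and the function and parameter names are hypothetical.

```python
def frame_rate_samples(r, template, samples_per_frame, n_f):
    """Correlate each frame of r with the matching frame segment of the template.

    The template spans one symbol (n_f frames) and is reused for every symbol,
    while the integration window moves one frame at a time, so frame n is
    correlated with template segment [n]_{N_f} = n mod n_f.
    """
    n_frames = len(r) // samples_per_frame
    out = []
    for n in range(n_frames):
        start = n * samples_per_frame            # integration window in r
        seg = (n % n_f) * samples_per_frame      # matching template segment
        out.append(
            sum(r[start + i] * template[seg + i] for i in range(samples_per_frame))
        )
    return out
```

When the received samples are a noise-free repetition of the template, each output equals the energy of the corresponding frame segment, so summing $N_f$ consecutive outputs collects the energy of the whole symbol-long waveform.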