Roles of equalization in radar imaging: modeling for superesolution in 3D reconstruction

In radar imaging, resolution is generally dictated by its corresponding system point spread function, the response to a point source as a result of an external excitation. This notion of resolution turns out to be rather questionable, as the interpretation of echoes received from a range of continuous targets according to a linear model allows one to cast the imaging problem as a communication system that maps the target reflectivity function onto measurements, which in turn suggests that by virtue of sampling and equalization, one can achieve unlimited spatial resolution. This article reviews the fundamental problem inherent to pulse compression in a multistatic multi-input-multi-output (MIMO) scenario, from a communications viewpoint, in both focused and un-focused scenarios. We generalize the notion of 1D range compression and replace it by a more general 4D pulse compression. The process of focusing and scanning over a 3D object can be interpreted as a MIMO 4D convolution between a reflectivity tensor and a space-varying system, which naturally induces a 4D MIMO channel convolution model. This implies that several well-established block and linear equalization methods can be easily extended to a 3D scenario with the purpose of achieving exact reconstruction of a given reflectivity volume. That is, assuming that no multiple scattering occurs, resolution is only limited in range by the sampling device in the unfocused case, while unlimited in case of focusing at multiple depths. Exact reconstruction under a zero-forcing or least-squares criterion depends solely on the amount of diversity induced by sampling in both space (via scanning rate) and time (via sampling rate), which further allows for a tradeoff between range and cross-range resolution. For instance, the fastest scanning rate is achieved by steering non overlapping beams, in which case portions of the object can be reconstructed independently from each other.


Introduction
Pulse compression is a term widely used within the radar systems community, and was originally achieved by transmitting a Chirp waveform, which, when convolved with its matched version, allows the target information to be recovered without the need of transmitting a narrow pulse, in many cases unfeasible in a practical implementation [1]. The basic idea to restore resolution relies on the assertion that the equivalent of two successively received narrow pulses, whose widths are determined by the chirp bandwidth, are not allowed to interfere. After sampling the received signal, since either a matched a or mismatched receiver is FIR, the overall response can at best only approximate an impulse, therefore limiting resolution. Thus, for a long time, there has been a common belief, especially in military and geophysics applications, that target information recovery for transmissions beyond the limits imposed by the system ambiguity function is not possible, the main reason stemming from the fact that, apparently, accuracy in detection has always been linked to a bandwidth issue.
The notion of resolution in imaging systems turns out to be rather questionable as raised by several authors, be it in the electromagnetic domain [2] for radar applications, or in the acoustic context for ultrasound imaging [3] and geophysics applications [4]. The interpretation of echoes received from a range of targets according to a linear model is by no means new [5,6] and even though the idea of deconvolution is well-established in related fields as seismic exploration [4], a more precise formulation and connections of the imaging problem to signal processing and communications jargons in a 3D scenario is in order.
As noted in [6], the idea of a resolution criterion dates back to Lord Rayleigh (1879) in the context of light intensity distribution, who proposed that two components of equal intensity should be considered resolved, when the first maximum of one component sits at the first minimum of the other. Even though Rayleigh resolution has been considered a bound on resolution, the so-called super-resolution algorithms became known as way to surpass this limit. These include minimum variance unbiased estimate (MVUE) and minimum norm solutions, which are very well established estimation techniques [7]. In pulse radar systems, we can interpret such two components as two pulses received from a straight line of range, which can still be resolved even in cases where the Rayleigh resolution limit is exceeded. Of course, this is the definition of resolution in one dimension.
More generally speaking, in an imaging system, resolution is defined independently in range and cross-range (azimuth and elevation). While the former is dictated by the pulse width, or, by the system ambiguity function (which usually assumes a matched filter structure for the receiver) [1], cross-range resolution is achieved by increasing the antenna aperture, or the transmitted signal frequency; In this scenario, the term superresolution refers to additional processing in which resolution beyond what is determined by the antenna features, transmitted pulse, and the usual matched filter can be achieved, by making use of additional diversity and/or signal processing methods.
Synthetic aperture radars (SAR) are one common example where a virtual array can be generated via antenna motion [1,8]. There is a mathematically rich literature on SARs which exploits spatial diversity of transmitters and receivers in order to sharpen the system point-spread-function [8]. Most of these derivations assume weak scattering, although schemes that exploit multiple scattering are also possible. This can be developed in a monostatic, bistatic, or more generally, in multistatic and wideband configurations. Readers should also refer to [9] for a group-theoretic based approach on radar imaging, assuming a free-space environment. In all these references, a common implication is that the synthesis of an ideal ambiguity function, i.e., an exact Dirac-delta, is not possible, due to strong constraints in its mathematical form.
Thus, while mathematicians have extensively studied the imaging formation from an inverse operator perspective, the signal processing community has tackled the same issue using estimation and information theoretic tools, which have been widely established in digital communications [10]. In particular, the subject of pulse compression has evolved into a more general design of waveforms in multi-input-multi-output (MIMO) radars, capable of transmitting different wave-forms from each antenna. They can be classified into bistatic MIMO radar and colocated MIMO radar. In the former, the transmitting antennas are widely separated, so that each antenna sees a different radar cross sections (RCS). In colocated MIMO radar, the antennas are closely positioned, so that they share the same RCS. The latter provides interference rejection capabilities, improved parameter identification and flexibility in the design of a beam pattern. Each specific scheme is determined by the antennas and imaging area positions in space, which, in turn, are defined by the application at hand. For instance, beamformers provide a flexible way of steering and focusing a microwave or ultrasound beam by time-delaying the corresponding signals a teach antenna [11]. For narrow-band transmitted signals, the MIMO radar is thus approximated by a single-inputmultiple-output (SIMO) system, where scaled versions of a single waveform are transmitted (also known as phased-array radar).
Regardless of the application, the final goal continues to be the optimization of the corresponding MIMO ambiguity function, either by optimizing the waveform (aiming range compression) or its covariance matrix (which affects resolution in cross-range). In the latter, the MIMO ambiguity function is optimized by minimizing the side lobes of the autocorrelations and cross correlations of the waveforms. While most of these references consider point targets, the waveforms can be optimized considering extended targets as well [10]. In particular, [10] considers waveform and receiver optimization assuming that the target and clutter impulse responses are known, by maximizing the SINR in the colocated MIMO radar scheme. One important issue that results from the study of waveform design is whether, from an estimation point of view, one is really required to compress, as raised in [12] considering both Cramér-Rao bound (CRB) and sufficient statistics for parameter estimation.
Unfortunately, be it for imaging point targets or extended targets, the resolution issue has been somewhat overlooked, in the sense that it is still considered bounded by the system ambiguity function (even in the MIMO case). A situation where this becomes of paramount importance arises frequently in radar-based medical imaging systems [13]. For example, detecting tumors or imaging microcalcifications in the human body often requires sub-millimeter resolution, in which case the method should also cope with the practical aspects that worsen the resolution limits normally encountered in long-distance communications applications-see [14] and the references in the same issue. One of the main differences is in the choice of the carrier frequency [15]. Note that the higher the carrier frequency, the better the cross-range resolution (which justifies a 1D model, considering that an entire area can be covered via A-scan lines). On the other hand, in order not to incur into Rayleigh scattering effects [16], one must pick f c >v/Δ min , where v is the wave speed inside the medium, and Δ min the desired minimum range resolution. Now, the higher the carrier, the more the signal is attenuated by the medium, which constitutes one of the fundamental tradeoffs in radar systems, namely, Cross-range Resolution × Depth of penetration. For instance, in X-ray imaging, the exposure to ionizing radiation inherent to X-rays can be damaging to tissues, and, in general, harmful to the human body, since their effects are cumulative. These two factors are the main reasons why microwave imaging systems (MIS) have emerged as an attractive alternative in breast imaging for tumor detection [17][18][19]. Note that lower frequencies also limit the size of the antenna employed, and there should be a fair compromise among all the above factors.
Highly focused beams can also be achieved without necessarily increasing the carrier frequency, with an idea known as Time Reversal [20], another super-resolution technique that generates a virtual aperture by timereversing the signals received from a point emitting source via multiple paths, exploiting the non-homogeneous nature of a medium by re-transmitting them in order to achieve high focusing at the source. The timereversal concept is identical to a matched filtering process, whose principle can be shown analytically, for instance, in [21]. It had been pro-posed recently in conjunction with a binary hypothesis test of detection in the presence or absence of a target in a highly cluttered environment considering a single antenna [22], and further in a beamforming configuration for target detection imaging [23], the latter further applied to a breast cancer scenario. It has also been combined with Spotlight SAR in [24].
An interesting fact with all current radar-based approaches is that resolution in range seems to be treated quite differently from resolution in cross-range. That is, while the former requires "pulse compression", the latter is solved via "beam compression". Note that because in general we have a collection of signals transmitted and received from several directions, the pulse is in reality a 3D function whose shape is given by the combined effect of the transmitted pulse and the MIMO medium's Green function. This motivates us to think of compression as an operation that is performed in all three dimensions in space, in a way that the term "pulse compression" is more properly applied to achieve volumetric, and not only range resolution. Moreover, methods for recovering resolution in radar imaging systems have always assumed a continuous model for the reflectivity information. This is perhaps the major reason why the solution to the inverse problem could only be approximated, normally done so via matched filtering. The fact is, in theory, reflectivity is discrete information whose granularity is dictated by the amount of resolution one is interested in achieving, for both range and cross-range. In this article, this will be accomplished by recasting the imaging problem into one that maps a reflectivity volume into measurements collected by illuminating a target object with a generally unfocused beam. As a result, the commonalities related to minimum range resolution for target detection can be considered irrelevant, once the pulse transmission problem is restated as a standard, albeit, 3D communication problem.
This presentation proposes to formulate the (static) imaging problem in both unfocused and focused cases as a MIMO multirate multidimensional transmission problem. We show how steering at several depths in the focused case can be interpreted as an induced 4D MIMO convolution model, subject to standard linear estimation/equalization problems. In the unfocused case, the amount of induced space and time diversity will be the determinant factor for the accuracy in reflectivity estimation. For the focused scenario, the central idea is to perform estimation of the entire object in one shot, once data from all points inside the object have been collected. As an important special case, we consider the single-antenna, free-space, narrowband, far-field scenario. This results in a Toeplitz structured transmission matrices so that when the beam pattern function is separable, efficient superfast estimation receivers based on the known transmitted pulse exist. This has significant impact on all radar-based applications requiring sub-millimeter resolution, which call for efficient receivers in order to cope with the often massive amount of data involved.
Notation: We denote by T the transpose and * the transpose complex conjugate operator, while we use * to denote convolution. We use vec(·) to stack the columns of a matrix into a vector, and resh M (·) to reshape a matrix with M rows into a column vector.

Background on related work
To contextualize the problem, we shall refer to [25], where an isotropic point-source model p(t, x) = p(t)δ(|x x 0 |), is considered in a wideband distributed aperture. We assume that the transmitted signal wavelength is smaller than the minimum diameter of a target. Otherwise, Rayleigh scattering theory must apply, and the received signal will also depend on the wavelength and target diameter [16]. Hence, under the so-called distorted wave Born approximation, the scattered field measured at point x and time t from a continuous distribution of targets is given by is the incident field, s(z) is the reflectivity function, and g(x, z, t) is the Green's function associated with the background medium [25]. Defining we can write In the development of [25], the scattering operator ℋ (s) represents a linear mapping from the input p(t) to the measurements, in a distributed array pattern. For simplicity, consider a single source of waveform. The operator ℋ is then seen as a linear mapping from the reflectivity variable s(·) to the scattering operator ℋ(s). In [25], the main contribution is towards inverting ℋ(s), or, more generally speaking, estimating s. That is, given the Green's function of the medium, an inverse scattering operator ℋ -1 or a least-squares estimator of the form is considered (or a MMSE approach assuming knowledge of noise statistics); For such, the adjoint ℋ* is computed based on the underlying Green's function. Then, assuming a finite number of transmit vectors, the spectral decomposition of (ℋ*ℋ) is pre-computed, a fact that allows for an offline computation of the action of the estimator on the received signal.
The points below raise important questions with regards to a practical and thus more precise modeling of the imaging system, which shall reveal the extent to which the information on the reflectivity function s(z) can be recovered: 1. The above widebeam, wideband formulation does not convey any information on the geometry of the problem, and does not tell us what limits on resolution can be achieved for a particular antenna configuration. A general question that remains is, how do we quantify resolution by focusing (or unfocusing) when illuminating a target? How does the geometry of a beam and the way it is steered over an object affect its overall reconstruction?
2. For a given carrier frequency, in which case we assume that we have sufficient penetration in range, the minimum range resolution Δ min obtained from the function s(z) is translated into time resolution by virtue of sampling. This immediately raises some fundamental questions regarding feasibility and complexity of implementation of the corresponding receiver. That is, is it possible to sample the received signal fast enough? What kind of transmit/receive structure must be employed to achieve a certain resolution level? What is the impact of sampling in the amount of complexity required for a specific application?
3. In a practical imaging system subject to sampling, and considering finite extent signals in the freespace, what should be the counterpart of (5)? Is there an efficient way of reconstruction that exploits structure, rather than performing the spectral factorization of (ℋ*ℋ)?
In the sequel, we connect fundamental aspects of radar imaging with modeling and equalization in digital communications as a starting point to tackle the above issues.

Motivation
Assume that the transmitter is able to produce a highly collimated beam, so that it can be approximately confined to a straight line, as illustrated in Figure 1.
In this case, our problem reduces to that of imaging a line, defined by the reflectivity function s(z). This is the central idea in radar imaging; to recover s(z) from measurements y(t, x, z), by converting time delay into range z.
An important case of study arises when the background medium is the freespace, where . s(τ ). Figure 1 One dimension imaging inside a volume, assuming a very narrow beam.
the solution of the freespace wave equation. Let p (t) = a (t) e jω 0 t be the transmitted pulse from an antenna at point x, where ω c = 2πf c is the carrier frequency, with f c given in Hertz. Considering that a(t) is slowly varying, we have where v(t) is an additive noise term, in order to model an additional uncertainty. Let τ = 2|rx 0 |/c, and definẽ Substituting it into (7), we get Figure 2 shows the equivalent baseband system, obtained after demodulation of the received signal.
Given a predetermined target resolution, in general two received pulses will overlap, resulting in what is known in the communications jargon as inter symbol interference (ISI). Thus, a natural question that follows is whether such waveform can be designed in order to avoid ISI. The answer to this question is well known in communications theory; for such, the transmitted pulse must have a Nyquist property, meaning that its response should have zero crossings at multiples of the sampling instant T s . In theory, this can be easily achieved via, for instance, a raised-cosine pulse, which naturally conforms with the Rayleigh resolution criterion previously discussed. Two implementation issues arise in such design: in practice, sampling is performed at delayed instants kT s + t 0 , which again results in ISI; Also, for high-resolution sampling, one needs to design a very wide bandwidth pulse, which implies that we should be able to squeeze a narrow, high-energy lobe in the interval [-T s , T s ]. This may not be feasible in many applications (as medical ones), so that, in general, the pulse width will span a wider range. This will result in ISI as well.
All of the above assume no processing at the receiver. When the latter is employed, several techniques for ISI removal can be considered, so that the main question now is how to design both transmitter (pulse) and receiver according to some design criteria, in a way that ISI be completely removed or minimized. For instance, when the received pulses are sufficiently separated in time (yielding a certain spacial resolution), so that they do not overlap, MMSE equalization implies that the receive filter must be matched to a(t), in which case the output SNR is maximized. This is in fact the standard form of pulse compression used in radar systems, where a(t) is normally chosen as a chirp waveform. The chirp waveform is already by itself a form of coding, where the so-called time bandwidth product quantifies the amount of redundancy introduced [13]. Bark codes are also used as a pulse design technique, and in general, the idea is to look for good codes whose orthogonality properties can contribute to undistorted transmission [1,26]. On the other hand, if one is concerned with the minimization of false alarms due to erroneous detection of side-lobes, a mismatched filter design must be pursued instead [26,27]. Note that, without considering noise, a mismatched, possibly infinite impulse response (IIR) filter provides much higher resolution compared to a matched one (if the receiver is allowed to be IIR, the overall transmission can be an exact impulse). Moreover, even in the case where the received pulses do overlap, a general MMSE or least-squares equalizer can be considered instead. Compensating for the inherent ISI in (8) in the MSE sense is the central idea behind most communications systems, even though several strategies of design and transmit/receive schemes can be envisioned.
In general, how fast one can sample is determined by the state-of-the-art of analog-to-digital converters (ADC). For instance, at the speed of light, sub-millimeter resolution implies that two received pulses are separated in time by T s = 2R min /c = 2 · 10 -3 /(3 · 10 8 ) = 6 · 10 -12 s. Since 20 MHz ADCs are the most easily available, achieving such sampling rate becomes a formidable challenge in practice. On the other hand, in the acoustic context these issues are no longer at stake, since ultrasound waves propagate with much lower velocity (average 1540 m/s inside the body). However, the low velocity of ultrasound waves limits the frame rate in which the image is produced, especially for narrow beams and multiple focal points. These facts motivate us to look into general widebeam solutions, where backscatter from several points is received at the same point, or, possibly, at several receiving locations at the same time. Nevertheless, regardless of the nature of the transmissions (electromagnetic or acoustic), in this article we shall first assume that one can sample as fast as necessary, and will examine a few well-known structures into which the radar problem can be straightforwardly cast. The range resolution limit is then tackled by considering Transmitted pulse

Multidimensional multirate model
Equation (4) can be seen as a mapping from the transmitted waveform to measurements, through an operator that depends on the reflectivity function. A more insightful view of this problem, however, is to express the system as one that maps reflectivity in space onto measurements at the receiving antenna, through the combination of a known waveform and the medium's Green function. In this case, different transmission and reception architectures can be envisioned, depending on the level of focusing employed. This is intimately related to (1) the geometry of the antennas and object in space; (2) how many antenna elements are activated for transmission and reception in a certain scheme; and (3) on the nature of the transmitted signals.

Unfocused imaging: MIMO radar
In the MIMO case the transmitting and receiving beams are unfocused, and each antenna element transmits a different waveform. This is illustrated in Figure 3, where a particular patch from an antenna transmits omnidirectionally from each element illuminating the target completely. Thus, for a given point z inside the volume, the received signal at an arbitrary point y on the antenna, given the transmitted pulse p(t, x) at point x only, can be compactly written as [1] where p"(t) denotes the second derivative with respect to t, and J s (x) and J r (y) are the time derivatives of the current density at the transmitting and receiving points on the antenna.
Our goal is to reconstruct the entire object volume, or, equivalently, to recover all the cross sections that form a 3D image. Here is where the geometry of the problem comes into picture. First, it is assumed that the aspect angle of all antennas elements is such that they see the same RCS. Second, we would like to preserve the geometry of the volume we want to reconstruct, in the sense that each cross section is "unrolled" to form a 3D tensor representing the object image. That is, we shall map each point of the reflectivity function s(z) onto the corresponding point of the unrolled object, which we define as continuous tensor defined by S(r), where r = [r 1 r 2 r] T . Its discrete-time version is denoted by the R 1 × R 2 × R 3 tensor S, which is obtained by sampling S at [n 0 T 0 n 1 T 1 kT r ] T . The quantities {R 1 , R 2 } define the target resolution in cross range, while R 3 defines resolution in range. We shall denote by S(k), k = 0, 1, . . . , R 3 -1 the R 1 × R 2 matrix containing a rectangular lattice of points within a cross section of the object at range k, and by S(n)a vector of reflectivities within the tensor S, n = n 0 , n 1 , . . . , n R 1 R 2 , along direction n. Moreover, we define by s k the reflectivity at point We further define R = R 1 R 2 . Let us spatially sample the antenna surface, so that the transmitting patch contains Q t = Q 1 Q 2 antenna elements, which transmit Q t waveforms, i.e., p ℓ (t), ℓ = 0, 1, . . . , Q t -1. These in turn are received at Q r receiving elements, at y i (t), i = 0, 1, . . . , Q r -1. The complete model is illustrated in Figure 4a, which comprises the forward and backward MIMO channels denoted by G f (t) and G b (t), respectively. Let u k (t), k = 0, 1, . . . ,R be the kth output of G f (t), at the corresponding point r k , and note that we can instead express the problem as a mapping from reflectivities to measurements as shown in Figure 4b.
We thus see that our main problem is how to recover the tensor S, given the measurements y i (t), at the desired resolution {R 1 , R 2 , R 3 }. Before discussing the possible solutions to this problem, we first analyze what happens in the more general case of a focused (transmitted) beam.

Focused imaging
Focusing is achieved by concentrating energy in a small volume within the object, so that only echoes received from that particular volume become significant. This is achieved by integration of delayed copies of the transmitted signal, weighted by the time derivative of the current density over the antenna, giving rise to a beam pattern, as illustrated in Figure 5.
Assuming that the antenna elements are closely located, and that p(t) is narrowband, the transmitter becomes a bidimensional phased array, modeled as a SIMO system represented by a steering vector. On the other hand, the receiving beam can be unfocused in general, so that the received signals are collected at all antenna elements and processed by a MIMO system.
Let the mth entry of the underlying SIMO system be given by e jω c t m , where t m is the delay applied to the signal point m relative to the signal at the center of the antenna x o , with respect to the focal point, which we denote by r f . We shall omit the dependence on r f for simplicity of notation. Let u k (t) be the signal at r k ,   defined within the depth of field corresponding to r f . Equation (10) in this case can be written as where the term J(ℓ) consists of a 2D window that provides apodization, in order to shape the transmitted beam pattern. Let a ℓ = x ℓx o be a vector from the center of the antenna to an arbitrary point on the antenna, and define the wave number = ω o /c o . Thus, expressing 4π |r−x | , and using the Fraunhofer condition at the focal plane, we have where w( r k − x o , t) is the 2D discrete time Fourier transform of the time varying tapering function J( )e jω c t ḡ k (x , t) at r k at time t, with denoting the corresponding unit-norm vector. We shall define b k (r f , t) as the combined effect of the beam pattern and the transmitted pulse at time t, with f denoting the focal point. The complete system is illustrated by the spatial multirate structure of Figure 6a.
The interpretation of this scheme is as follows. Assume we have the antenna beam focused at r f . Because the beam is narrow at the focal plane, and due to a finite (extent) depth of field, the illuminated portion of the object can be represented by a K 1 × K 2 × K 3 parallelepipedsr j , which contains the corresponding most significant reflectivity samples (i.e., the ones shaped by the transmitted pulse, lateral beam function and the depth of field). Their positions relative to the point r f are set by the spatial shifts The columns of the downsampling matrix B define the geometry in which these spatial blocks are processed, as the object is scanned at a given rate. That is, lets r a ands r b be two parallelepipeds (tensors), possibly intersecting, at focal points r a and r b , respectively. For instance, when K = | det B | (i.e., the volume of the fundamental parallelepiped defined by B), it implies that nonoverlapping blocks are being processed each time. This is actually the lowest spatial rate for which the object can be processed without loss of useful information.
On the other hand, when choosing B = I the object is being scanned at the highest possible rate, which is given by the desired spatial resolution. For simplicity, we shall assume rectangular parallelepipeds, so that the beam region of support coincides with a rectangular parallelepiped as well. Each tensor block is then (continuous-time) convolved with the More specifically, assume J r (x i ) = 1, and let F r j ,q (t) , q = 0, 1, . . . , Q r -1, be the K 1 × K 2 rectangular matrices that constitute F r f (t) at the focal point r j , j = 0, 1, . . . , D − 1, where D is number of foci. The Q r × 1 received vector is defined as y r j (t) = [y 0 (t) y 1 (t) . . . y Q r −1 ] T and expressed as y r j (t) = F r j (t) s r j ( ) + v r j (t), j = 0, 1, . . . , D − 1, (15) where F r j (t) is Q r ×K given by . . .
Here is the crucial point in understanding the imaging process in general. First, it is important to note the difference between the range resolution defined by the spatial sampling T r , and the time separation between two consecutive received pulses (here, more generally we speak of MIMO pulses), which leads to the sampling rate T s = 2Δ min /c, where Δ min is the minimum target separation required for a certain application, in the colocated radar. This coincides with the spatial sampling T r in the far field scenario, since in this case we are allowed only one focal point in range. Hence, T r = Δ min , i.e., we are limited by the achievable time sampling. On the other hand, focusing allows sampling at the desired spatial resolution, so that Δ min is only defined within the incoming tensors of reflectivities. That is, sampling the received vectors at the rate nT s yields the discrete model where F r j (0) . . . F r j (N − 1) is the sampled version of F r j (t) at T s , while s r j (V ) corresponds to the downsampled version of s r j ( ) , with V = Δ min /T r . Figure 6b illustrates the difference between T r and Δ min .  Figure 6 Overall System model as (a) a mapping from reflectivity to measurements; (b) MIMO system representing the imaging problem.

Modern structures for reflectivity estimation
Suppose we are to estimate a tensor arriving from one particular focal point. Essentially, tensor imaging can be cast like any other MIMO estimation problem, formulated as either a Bayesian or classical estimation minimum variance unbiased estimator (MVUE) problem. Due to the difficulty of finding such nonlinear estimators in practice, it is common to restrict the receiver to be linear. Now, assume that the output signal is sampled at T s . In formulating the convolution model for the received echoes, two common structures can be induced, depending on the receiver design:

Block processing
Let L be the received signal window length, and define the vectorized tensor of reflectivities and its corresponding received vector around r j , respectively: where B' = N + L-1, for k = 0, 1,...,B-1. This choice of L induces a block-by-block transmission scheme with interblock interference (IBI), where each block has size L. That is, let D be the number of desired foci set to cover the object space. Then, y r j (k) = H r j x r j (k) +v r j (k), j = 0, 1, . . . , D where with dimensions Q r L ×K(N + L − 1) , andvr j (k) are the vectorized noise samples. This can be the case in a monostatic scenario illustrated in Figure 7a, where at the time the radar starts listening, part of the signal has already returned to the radar site.
On the other hand, if the observation window is sufficiently long so that all incoming MIMO pulses can be completely observed, transmission becomes memoryless, meaning that at the time the radar starts listening, no information of near targets is present (Figure 7b). In this case, we replace (19) and (20) by where H r j has the form When H r j is a tall full-rank matrix, a solution is found rather independently for every focal point. Two important solutions can be envisioned: (i) Matrix Inversion. Here, the solution is given bŷ where The optimal delay δ opt ∈ [0, NQ r ] is chosen such that it minimizes the output noise power, by minimizing the matrix norm (I T δ H r j ) −1 (see [28], for the case of Toeplitz H r j ).
(ii) Minimum variance (least-squares). This is given bŷ In Equation (22), the dimension of H r j is Q r L ×K(N + L − 1) , so that all we need is Q r L ≥K(N + L − 1) . The same goes for Equation (24), where the condition is (N + M − 1)Q r > BK .

T s -spaced space-time linear equalization
In linear equalization, the final goal is to approximate the overall response as much as possible to an impulse (aside from minimizing the noise effect), for which a zero forcing equalizer is in general IIR. Now, if one is restricted to an FIR receiver, several solutions exist, all of which depending on the desired cost function, aiming to minimize the effect of sidelobes and noise at the output.
Referring to Figure 2, if the received signal is sampled at T s > T p , where T p is the pulse width, the standard pulse compression technique attempts to recover the reflectivity information inherent to s(k) via matched filtering. That is the case if one is primarily interested in maximizing the SNR. Alternatively, the requirement on the SNR can be relaxed, and a mismatched filter can be considered instead. In [27,29], significant reduction on the sidelobes of the overall impulse response can be obtained, if different L p -norm minimization criteria are considered. Observe that regardless of the optimization criteria employed, it is not possible in general to reconstruct the input perfectly, unless an IIR receiver is used, which brings up noise amplification issues in practice.
Given the structure of (22), one cannot find a receive MIMO filter W r j (n) , so that when convolved with F r j (n) yields an impulse, unless H r j is a tall full-rank matrix.
This condition is satisfied if Q r L ≥K(N + L − 1) and the subchannels of the MIMO channel do not share common zeros. Still, a solution to this problem can be found in the MVUE sense, given bŷ where h (δQ r L+1:Q r L(δ+1),:) corresponds to the (δ + 1)th block column of H r j , and δ is the optimal delay for this estimation.
Observe that either in the unfocused scenario, or in case each tensor block s r j is to be recovered independently from other blocks, its 3D resolution can only rely on the feasibility of the sampling rate device. That is, while lateral resolution in these cases depends on the spatial grid density, resolution in range will be limited to Δ min . The point is thus how accurately one can estimate s r j by using (27) or (28), specially when H r j is not a tall full-rank matrix.
The fact is that, in addition to the 1D time convolution model for each focal point, scanning at an arbitrary resolution rate yields overlap of successive blocks, which naturally characterizes a volumetric convolution between the object (reflectivity tensor), and the spacevarying tensor defined by the beam pattern for different focal points. The spatial model for this operation will depend on the number of directions and depths the beam is moved to, whereby, similarly to block processing in time, motion can induce a particular processing structure in space (e.g., block processing, with or without overlap). The overall process is in general a 4D MIMO space (time)-varying convolution model.
For instance, choosing D = (R 1 + K 1 − 1) (R 2 + K 2 − 1) (R 3 + K 3 − 1) implies scanning at the spatial resolution {T 0 , T 1 , T r }, and corresponds to a complete 4D MIMO (space-varying) convolution. On the other hand, for the fastest scanning, we can pick D = R 1 R 2 R 3 /| det B| foci, and the whole space is scanned with nonoverlapping blocks. This tells us that unlike the unfocused scenario, or in the case where information received from around the focal point is not sufficient for independent estimations of each block, focusing allows for further diversity; in this sense, resolution is unlimited, and ultimately dictated by the desired spatial sampling of S, and not only by the sampling device. As a result, estimation must be accomplished jointly, after data received from around all foci have been acquired. Of course, the type of receiver will depend heavily on the nature of the signals involved. For instance, the ultrasound wave speed is much slower compared to the one of a microwave, so that quicker estimates must be produced for every focal point in the former case. As we have mentioned, the beam foci and their positions relative to the antenna will define the convolution model employed. For this reason, we shall consider the beam motion in two separate steps, first with respect to lateral motion, covering all azimuths and elevations, which is then followed by depth motion. a) Lateral Motion Consider the signals in (18), and assume that the object is scanned at a fixed depth r = d k across all possible azimuths and elevations such that the object is still within the beam. This corresponds to picking r j = r l,k = r 1 r 2 d k , where {r 1 , r 2 } are varied in order to cover D 1 = (R 1 + K 1 − 1) (R 2 + K 2 − 1) focal points, for l = 0, 1, . . . , D 1 − 1 . We recognize the sequence {y r j (n) = F r j (n − )s r j (V )} as the output of a MIMO 2D space varying convolution between the constituting matrices F r j ,q (n) defined in (16), and the tensor slice at range Vℓ corresponding to depth d k , which we denote by S d k (V ) . This convolution at depth d k results in Q r images, or equivalently, a matrix with vector entries, each one having Q r coefficients, which we denote by Y d k ,n . Note that similarly to the definition of a time window, which sets the number of received images from a focal point, we can also determine a spatial window, with dimension smaller than D 1 . However, in order to exploit full space and time diversity, we shall continue to assume a full spatial convolution model as well. We thus have ([y r0,k (n) y r1,k (n) . . . y rD 1 −1,k (n)]), (29) so we can replace (18) by a more compact form as where H d k (n − ) is a two-level block banded matrix of size Q r (R 1 +K 1 -1)(R 2 +K 2 -1) × R 1 R 2 , representing a MIMO 2D convolution. Defining and denoting where is a Q r (N + B -1) (R 1 + K 1 -1) (R 2 + K 2 -1) × R 1 R 2 B block toeplitz matrix, with block banded blocks.
The 3D model (33) fully describes the far-field transmission scenario for a single depth at d 0 = ∞; we see that while resolution is unlimited in cross-range, we are still limited to estimate B out of K 3 images within the tensor S.

b) Depth Motion
The lateral scanning can be performed at several depths, giving rise to a full 4D convolution model. In other words, (30) can be defined for k = 0, 1, . . . , D 2 , where D 2 = (R 3 + K 3 − 1), so that overall r j covers focal points. There is, however, a subtle difference in the way convolution is performed in the axial direction. While the 2D convolution sum can be computed at the minimum resolution step {T 1 , T 2 }, the volumetric convolution is obtained by combining images at ranges Vℓ, ℓ = 0, 1, . . . , B -1, and at times 0, 1, . . . , N + B -1, for each depth d k . Let C d k ,c , with c = 0, 1, . . . , B -1 be the cth block column of H d k and define Then, denoting y = vec(Y) and x = vec(X) allows as to write where H is now a 4-level block banded convolution matrix, with dimensions Q r (N + B -1) (R 1 + K 1 -1) (R 2 + K 2 -1) (R 3 + K 3 -1) × BR 1 R 2 R 3 given by Equation (38). Again, we can proceed in finding an estimate of the reflectivity tensor S in (37) via matrix inversion or LS, similarly to (25), with H r j replaced by H . Note that the tall full-rank condition imposed for separate blocks is much more relaxed in this case due to the redundancy of reflectivity in adjacent blocks. b

Fractionally-spaced equalization
Due to the massive amount of data generated in radar imaging, a MIMO FIR receiver that reconstructs an image at a fast rate can be highly desirable. As we have mentioned in Section 5.2, for T s -spaced equalization, this is possible only if H r j is tall full-rank. If not, exact reconstruction based on an FIR receiver is still possible by inducing some form of redundancy in the equivalent discrete-time system. As we have mentioned, coding is one form of redundancy, which leads to bandwidth expansion at the transmitter. One way to induce redundancy without physically introducing it at the transmitter, and more importantly, without incurring on bandwidth expansion, is by oversampling the received signal in time. Oversampling was originally performed by integer factors, giving rise to the so-called fractionally spaced equalizers. Recently, it has been shown that sampling the output at fractional rates, say nT S L M , where L >M, also allows for exact FIR receivers, in which case we further assume that L and M are coprime.
After equalization, a rate reduction step restores the original sampling rate (that means resolution in our context). Fractional sampling allows for significant reduction in overhead when compared to the more common T s /2-spaced FS equalizer, which corresponds to the minimum integer sampling rate. Figure 8a illustrates the fractionally-spaced receiver, while Figure 8b shows the equivalent discrete-time model with fractional sampling. The filters F r j (z) and W r j (z) are known as fractional biorthonormal partners [30] if in the absence of noise, the overall transmission is an exact delay (δ).
Oversampling the output is what makes exact reconstruction possible. In fact, using well-known multirate noble identities [31], this scalar system can be further expressed via its block transmission equivalent, according to Figure 9. The polynomials E(z) and R(z) correspond to polyphase matrices, which are uniquely determined by polyphase components F r j (z) and W r j (z) respectively, via their order L type-2 and type-1 decompositions (we drop the notation on r j in their factors for simplicity)-see [30]: Define the functions, for 0 ≤ k ≤ L -1, with m and l such that ℓL + mM = 1. Since L and M are coprime, the smallest integers m and ℓ are obtained by the Euclid's algorithm. Defining the order M polyphase decompositions the entries of the polyphase matrices E(z) and R(z) are simply {E k, j (z), R i, k (z)}.
In [30], the authors develop a zero forcing solution for R (z) based on the Smith form of E(z). It is shown that there exists an FIR right vector fractional biorthogonal partner of F i (z) if and only if Q r L >KM , and the greatest common divisor (gcd) of all theK M ×KM minors of E(z) is a delay. If theses conditions are satisfied, exact reconstruction of the input can be achieved. The additional degree of freedom offered by the resulting receiver is then exploited in order to minimize the noise power at the output. As mentioned in [30], the advantage of the method relies on the fact that neither the input nor the noise autocorrelation matrices are needed when the noise is uncorrelated, in comparison with an MMSE solution. Note that the ZF solution of [30] is not the only one independent of statistics when the noise is uncorrelated; a pure minimum-variance (LS) based receiver also minimizes the output noise power. Also, in the solution proposed in [30], their ZF method is compared to the MMSE solution considering a zero decision delay in the filter design. This could deteriorate the MMSE performance in comparison with the ZF approach. In this sense, a MVUE approach is also a good candidate for equalization (moreover, computing the Smith form is not a systematic task, and represents an ill-conditioned factorization problem, similar to the Jordan form).
Let N E and N R be the lengths of the polynomial matrices E(z) and R(z), respectively, and define their corresponding matrix coefficients Define the QLN R × KM(N E + N R -1) block convolution matrix   Hence, as long as we have Q r LN R ≥KM(N E + N R − 1)and T is full-rank, if the noise is uncorrelated, similarly to (28) the MVUE of s j (k) is given bŷ 6 Special case: single antenna, narrowband, freespace, far-field A special case of general interest arises when focusing is performed in both transmission and reception, giving rise to two distinct antenna beam patterns. Let β k (r f ) = w 1 ( r k − x 0 )w 2 ( r k − x 0 ) be their product, corresponding to point r k within the reflectivity tensor at r f . In a freespace scenario, (9) simplifies to where β k (r f ) β k (r f )/32π 2 |x 0 − r k | , and τ ≜ |x 0r k |/c. We have also dropped the dependency on x 0 , since it is fixed in our development. This corresponds to Q r = 1, so that we now deal with a simple multipleinput-single-output (MISO) model, as illustrated in Figure 10.
We observe that the 2D filter with coefficients given by the antenna beam pattern defined by b k (r f ) = b k (r) is now space-invariant in cross-range, while filtering along range is accomplished through the transmitted pulse only, which is a time-invariant scalar function p(t). That is, F r j (n − ) in (18) becomes where (30) is replaced by At the far-field we focus on a single depth, d 0 , so that H d k (n − ) = p(n − )H , where H becomes a (R 1 +K 1 -1)(R 2 +K 2 -1) × K 1 K 2 two-level block Toeplitz matrix with Toeplitz blocks. Hence, (33) simplifies tō or, equivalently, in terms of the reshaped model, where T is tall, Toeplitz-like, similar to (24). Assume that the beam pattern function b k (r) is separable, and consider two (discrete) sequences b 0 (n) and b 1 (n), which define two Toeplitz matrices A 0 and A 1 , respectively, at a certain range, similarly to the structure of H r 0 in (24), with the F r j (n) replaced by b 0 (n) or b 1 (n). In this case, we write H as H = A 1 ⊗ A 0 , where A 0 is (R 1 + K 1 -1) × R 1 , and A 1 is (R 2 + K 2 -1) × R 2 . Therefore, referring to (29), we have where [·] :, N + B-2-n denotes the (N + B -2n)th column of the argument, and Y d 0 ,n is defined in (29). This implies that for n = 0, 1, . . . , N + B -2, where we use Y d 0 ,n to define the output that includes the noise term V d 0 (n) , unlike Y d 0 ,n in (29). This model suggests, in light of (25), two main procedures for estimating the volumetric information S d 0 : (i) Matrix inverse. Given the separable model above, we can estimate S d 0 aŝ where δ 0 , δ 1 , and δ 2 are chosen optimally. The restriction matrices I δ 0 , I δ 1 , I δ 2 are defined similarly to (26), with δ 0 [0, R 1 ], δ 1 [0, R 2 ], and δ 2 [0, B]. In [28], it is shown that for the 1D case, when the delay δ is equal . . . to the number of zeros of the corresponding polynomial outside the unit circle, say e.g., p(n), the matrix norm is minimized. Here, because {b 0 (n), b 1 (n), p(n)} are known a priori, {δ 0 , δ 1 , δ 2 } can be found offline, be minimized as well.
(ii) Least-squares. Using well known properties of Kronecker products, the least-squares estimate of S d 0 is given bŷ or, reshapingŜ d 0 andȳ d 0 according to (32) and (32), If H is separable, we obtain We highlight that in this configuration, p(n), b o (n), and b 1 (n) play the role of a known channel in a communication system, which can be optimally chosen in order to meet certain design specification.
Remark: Regarding implementation issues, as we have mentioned, the structure of the receiver is paramount for feasibility of the solution, specially when a significant amount of data is involved.
Exploiting the structure induced by the Green's function of a specific medium has thus a powerful impact on the actual implementable solution of the corresponding inverse. While in [25] an inverse or least-squares operator is applied to the received signal (the latter by considering the adjoint ℋ* and precomputing the spectral decomposition of (ℋ*ℋ)), by virtue of sampling in a 3D freespace scenario, ℋ is replaced by (T ⊗ H) , so that the LS estimator is computed via (56). This allows us to replace a spectral factorization by a superfast receiver whose defining parameters are computed offline. This subject is closely connected with the field of fast equalization receivers in communications [7,32], where an efficient estimation of the transmitted sequence is available.

Preliminary experiments
We perform some preliminary simulations considering the matrix inversion and least-squares receivers for an exact modeling, 3D scenario in freespace aforementioned. We also plot the corresponding 1D equalization receivers based on A-scan lines. The behavior of these equalizers is well known, and the challenge in their implementations lies in cases where the conditioning of T , A 0 , and A 1 becomes too high. Good numerical properties and minimum MSE occur when the columns of these matrices are orthogonal, so that the LS solution becomes a matched filter. This is the case when the transmitted pulses do not overlap, or can be approximated by well designed codes. Two fundamental questions must be taken into consideration when computing the above solutions: (1) What sort of algorithm or method must be employed when implementing (52) or (56). (2) How does the designed pulse affect the numerical properties of the underlying solution, and more importantly, how does it affect the implementation of the method chosen in (1). These issues commonly arise in several different areas where LS type solutions are deployed, and are currently a (hot) topic of considerable activity in the field of fast matrix equalizers [32,33]. In hostile wireless communications environments, long delay spreads and deep fades are responsible for the ill-conditioning of the corresponding convolution matrix, a fact that can be similarly observed in {T , A 0 , A 1 } depending on their design criteria. Fortunately, these matrices are known quantities, and can be designed in order to avoid numerical difficulties. Several practical aspects must be taken into consideration, including ease of implementation of the pulse, antenna dimensions, etc.
We assume that the beam pattern is generated by a 1 m × 1 m antenna with uniform tapering function and current density, producing two separable sinc functions b 0 (n) and b 1 (n). We estimate a 10 × 10 × 10 tensor, with resolution T 1 = T 2 = Δ min = 0.1. The pulse length is T p = 10 -7 , and the carrier frequency is set to f c = 4 GHz. In A-scan imaging, the beam is swept over the object. For each received signal, the estimator is computed based solely on the transmitted pulse, while the beam pattern is usually designed in order to minimize the blurring effect in this direction. In the 3D modeling, however, whatever the beam pattern in cross range is, we make use of its shape when estimating along this direction as well. Observe that a tall matrix can always be induced by moving the beam over the object in order to perform a complete spatial convolution. Hence, improved results may be expected compared to the case of A-scans. Figure 11a shows the MSE decay normalized by the Frobenius norm of S , for a 10 MHz chirp pulse p(t) = cos(2πf c t + πt 2 ), where is the chirp rate. For the matrix inversion approach, in theory the optimal delay δ 2 should be equal to the number of zeros outside the unit circle, in this example given by δ 2 = 99. However, the condition number of (I T δ 2 T ) in this case is 7 · 10 13 , and produces no meaningful result. The optimal delay was verified empirically to be δ 2 = 0 instead. The corresponding condition number of (I T δ 2 T ) is 7.5, and its performance is close to the one of the LS receiver. For the latter, the condition number of T*T is equal to 784. Figure 11b shows the same scenario for a random pulse whose taps are drawn from zero-mean, unit-variance gaussian variables. The condition number of T*T is 1.97, while for (I T δ 2 T ), 13.2. This is to illustrate that conditioning is fundamental, and is not necessarily achieved when the transmitted signal is designed as a Chirp, also aiming implementation issues.
We consider an 80 × 80 real scene image Figure 12a for simplicity at a fixed range and an SNR of 10 dB, for  a randomly generated pulse with Gaussian samples, and additive white Gaussian noise. The optimal minimumnorm matrix inverse and 3D-LS estimation are illustrated in Figure 12b, c, respectively, which can be compared to the standard matched-filter receiver in Figure  12d. The corresponding A-scan schemes are shown in Figure 13a, b. We further compare these recovered scenes with the ones obtained with a Chirp transmission, in Figure 14a, b. We clearly see the effect of conditioning in the matrix inverse solutions, and superiority of the full convolution information-based receivers.
In this article, we have presented preliminary simulations considering the effect of the transmitted signal in A-scan and 3D estimation scenarios. A thorough comparison with other methods in a fair equivalent scenario is still to be pursued in a separate future study.

Conclusions
We have approached the 3D radar imaging reconstruction problem as a tensorial estimation, according to a multidimensional multirate model. This includes both unfocused and focused scenarios. Exact reconstruction of the reflectivity tensor is possible through standard equalization techniques, such as linear equalizers and block estimation. The conditions and accuracy in reconstruction depend on the amount of space and/or time diversity introduced, and can be induced by oversampling or focusing, where in the latter redundancy is achieved via overlapping tensor blocks. The entire process can be seen as a 4D space and time convolution model, whose knowledge allows for enhanced resolution compared to standard matched (or mismatched) filter approaches. In the unfocused scenario, or when scanning is performed via non-overlapping blocks, oversampling through biorthogonal partners is one way of achieving exact reconstruction. Range resolution in these cases is limited by state-of-the-art ADCs, and for focused beams, in theory there is no limit on achievable resolution. An important special case arises in the farfield, narrowband, freespace scenario, which results in a 3D separable convolution model, described via Toeplitzlike linear systems in all 3 dimensions. Because the transmitted pulse and beam pattern functions are known, this yields a superfast equalizer receiver, which (a) can be pre-computed via either matrix inversion or a LS solution. The complexity and numerical properties of the solution are one of the key factors for feasibility of implementation.
Endnotes a Matched Filtering is also called migration in the geophysics jargon, and backprojection in X-ray tomography terminology. b Decision-Directed Receiver. In situations where the range to be imaged is constituted by isolated targets, with a well defined threshold for deciding upon their existence or not, one can rely on decision-directed estimates in order to improve the imaging of the targets. This can be achieved by MIMO DFE, or approximately, by first obtaining a rough estimate of the reflectivity, and then rewriting the linear model in order to focus only on the range where the targets exist. This opens up several possibilities for structures based on decisiondirected feedback, and makes the estimation more accurate by focusing only at the ranges where the target is located.