Design for the system of high-speed data transmission and data pre-processing for synthetic aperture radar imaging based on the system-on-a-programmable-chip

For the chirp scaling algorithm of synthetic aperture radar (SAR) imaging, efficient transmission of a large volume of data is indispensable. Prior to imaging, the echo signal must be pre-processed by digital down conversion (DDC). The DDC module has to remove the carrier and then apply appropriate filtering and down-sampling. Whatever imaging mode is chosen (stripmap, spotlight, or sliding spotlight), the needs of the whole imaging system are met by configuring this pre-processing module and the transmission module. A system-on-a-programmable-chip (SoPC) built from an Advanced RISC Machine (ARM) processor and a field programmable gate array (FPGA) is a suitable experimental platform for testing the performance of this system. The pre-processing algorithms that proved most feasible for this project in Matlab were ported to the FPGA using the VHSIC Hardware Description Language (VHDL) for functional verification. Finally, the processing results from Matlab were compared with those of this system to quantify the difference. In addition, the time elapsed from the 2 GB of original data entering the system until the processed results were completely returned to the PC was measured.


Introduction
As an active microwave sensor, synthetic aperture radar (SAR) offers all-weather observation that is not affected by lighting or climate, and it can penetrate the surface or vegetation to obtain information about covered objects [1,2]. Real-time SAR imaging has to process a large amount of data with high precision and involves complex calculations [3]. Hardware implementation of SAR imaging has been a focus of SAR research both at home and abroad [4]. However, existing work has targeted an efficient hardware implementation of the SAR imaging algorithm itself and has neglected the effect that data transmission and raw-data pre-processing may have on the overall implementation. An efficient subsystem may improve the efficiency of the entire hardware system. This design is based on the system-on-a-programmable-chip (SoPC), which not only has the advantages of a system-on-chip (SoC), such as good performance and a high degree of integration, but also has the flexibility and low cost of a programmable logic (PL) device [5]. To support the multiple modes of the imaging system, this paper designs a SAR pre-processing system that delivers test data meeting the system requirements while keeping transmission errors within the SoPC to a minimum.
Appropriately reducing the sampling rate of the echo can greatly reduce the overall data rate of the radar system, which helps to meet the requirements of real-time SAR imaging. Direct down-sampling can cause aliasing of the power spectrum and seriously degrade the imaging quality. Therefore, the SAR raw data must pass through a linear-phase finite impulse response (FIR) low-pass filter before down-sampling. In this paper, the sampling rate of the FIR filter is over 1 GHz, which is a very demanding requirement for a hardware implementation. The maximum operating speed of the field programmable gate array (FPGA) devices offered by the two major suppliers (Xilinx and Altera) has exceeded 500 MHz [6]. Some methods reduce the complexity of the FIR filter and improve its speed by decomposing the filter coefficients into multiples of 2 [7,8]. These methods still adopt a serial structure, and the maximum running speed of a serial FIR filter depends on the speed of each filter. Consequently, serial processing in the FPGA still cannot achieve the >1 GHz sampling rate required of the FIR filter.
Consequently, the structure of the filter implementation must be improved. We adopt four-way parallel processing to lower the speed required of each computing element to a level that can be implemented in the FPGA. Only the echo signal that has been processed by the analogue-to-digital (AD) converter is supplied to the system. The pre-processing module and the main imaging module are placed on different FPGA boards so that the two parts interact only through data exchange. The pre-processing module is designed on the Zynq-7020 platform, whereas the main module is designed on the NetFPGA platform. This design greatly increases the flexibility of the entire processing system, and the pre-processing module can also be reused in other processing systems. The Zynq-7020 platform has two Advanced RISC Machine (ARM) Cortex-A9 processors, which we use to control the pre-processing system and to set its operating parameters and modes.
A Linux operating system runs on the Zynq-7020 embedded platform, which greatly enriches the functions of the system and provides a convenient means of high-speed data transmission between the PC and the Zynq-7020 platform. Over Ethernet with the Transmission Control Protocol/Internet Protocol (TCP/IP), the external data transfer rate can exceed 100 Mbps. The data interaction between the pre-processing module (Zynq-7020) and the main module (NetFPGA) is implemented through the Advanced eXtensible Interface (AXI) direct memory access (DMA) channel and the FPGA Mezzanine Card (FMC) [9]. The design of the data transmission is shown in Fig. 1.
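As a rough illustration of what the 100 Mbps figure implies for the 2 GB test data set used later, the sketch below computes an idealised one-way transfer time. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bit/s) and ignores protocol overhead. The function name is illustrative only and the result is not a measured value.

```python
def transfer_time_s(num_bytes, rate_bps):
    """Idealised one-way transfer time in seconds, ignoring protocol overhead."""
    return num_bytes * 8 / rate_bps

# Assumption: decimal units, so 2 GB = 2e9 bytes and 100 Mbps = 1e8 bit/s.
t = transfer_time_s(2 * 10**9, 100 * 10**6)
print(f"{t:.0f} s")  # 160 s
```

Because the text states that the rate can exceed 100 Mbps, this figure is best read as an upper bound on the one-way link time under these assumptions.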

Pre-processing design
Before the raw data enters the imaging system, it must pass through the pre-processing module, which acts as a buffer stage and reduces the data rate. Digital down conversion (DDC) converts the high-speed, high-frequency digital signal generated by the A/D converter into a zero-intermediate-frequency digital signal, that is, the baseband signal. DDC therefore generally operates at a high rate, and it has become one of the key components that constrain the performance of digital receivers. DDC consists of four parts: the local numerically controlled oscillator (NCO), the digital mixer, the low-pass filter, and the decimator. The overall design of the pre-processing is shown in Fig. 2. This process is implemented on the Zynq-7020 platform. The pre-processing parameters are listed in Table 1.
The sampling frequency of the ADC is 1 GHz, and the centre frequency of the sampled data is 750 MHz. To reduce the system-processing rate, the ADC acquisition chip is set to the Demux double-edge sampling mode, which yields four parallel signals for each ADC. The DDC therefore adopts a four-way parallel processing design.
A digital mixer is a time-domain multiplier that multiplies two input sequences sample by sample. The NCO produces two single-frequency orthogonal signals. The cosine signal is multiplied by the original signal sequence in the mixer to obtain signal I, and the sine signal is multiplied by the original signal sequence in the mixer to obtain signal Q. The two mixer outputs then pass through the FIR anti-aliasing filter, which removes the out-of-band signals.
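The NCO and mixer stage described above can be sketched in a few lines of Python. This is a floating-point illustration only, not the fixed-point VHDL design. The function names and the test length are hypothetical, while the 1 GHz sampling frequency and the 750 MHz centre frequency are taken from the text.

```python
import math

def nco(freq_hz, fs_hz, n_samples):
    """Return the cosine and sine sequences of a numerically controlled oscillator."""
    step = 2.0 * math.pi * freq_hz / fs_hz  # phase increment per sample
    cos_seq = [math.cos(step * n) for n in range(n_samples)]
    sin_seq = [math.sin(step * n) for n in range(n_samples)]
    return cos_seq, sin_seq

def mix(x, freq_hz, fs_hz):
    """Multiply x sample by sample with the NCO outputs to form I and Q."""
    cos_seq, sin_seq = nco(freq_hz, fs_hz, len(x))
    i_sig = [a * c for a, c in zip(x, cos_seq)]
    q_sig = [a * s for a, s in zip(x, sin_seq)]
    return i_sig, q_sig

FS = 1e9    # ADC sampling frequency from the text
FC = 750e6  # centre frequency from the text
N = 64      # hypothetical test length (a multiple of 4)

# A pure carrier at FC is used as a stand-in for the echo signal.
carrier = [math.cos(2.0 * math.pi * FC / FS * n) for n in range(N)]
i_sig, q_sig = mix(carrier, FC, FS)
mean_i = sum(i_sig) / N  # DC term left in the I branch
mean_q = sum(q_sig) / N  # DC term left in the Q branch
```

For this pure-carrier input the I branch averages to 0.5 and the Q branch to zero, which shows the carrier being moved to zero frequency. The remaining double-frequency term is what the subsequent low-pass filter has to remove.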
During DDC and decimation, the sampling rate changes, so the signal must be protected from aliasing, which requires a suitable filter to limit the signal bandwidth to an appropriate range. This filter is the anti-aliasing filter. Because of its equal-ripple behaviour and its steeper transition band, the elliptic filter is selected as the anti-aliasing filter in this system.
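The need for band-limiting before the rate change can be illustrated with the standard folding rule for a real tone: after sampling at rate fs, a tone at frequency f appears at its distance from the nearest integer multiple of fs. The helper below is a minimal sketch with an illustrative name. The second example uses a hypothetical 300 MHz tone and a hypothetical decimation ratio of 4, neither of which is taken from Table 2.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a real tone at f_hz after sampling at fs_hz."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

# Values from the text: a 750 MHz centre frequency sampled at 1 GHz.
print(alias_frequency(750e6, 1e9) / 1e6)    # 250.0
# Hypothetical case: a 300 MHz tone after keeping every 4th sample (new rate 250 MHz).
print(alias_frequency(300e6, 250e6) / 1e6)  # 50.0
```

In the second case the out-of-band tone folds to 50 MHz and would overlap the wanted band unless it is filtered out before the samples are discarded.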
The magnitude response and phase response of the filter are shown in Fig. 3.
The filter is implemented with the FIR IP core. However, on the Zynq-7020 platform, the maximum clock frequency of this IP core is only ∼300 MHz. Although this is far below the sampling frequency, the four-way parallel processing method solves the problem. The filter computation is rearranged in this design so that the filter becomes a parallel-processing structure. Starting from the direct-form FIR structure, we design a filter bank consisting of four filters, whose four parallel outputs take the place of the serial output. The filter bank is illustrated in Fig. 4. The four filters are named CORE0, CORE1, CORE2, and CORE3, and each has its own set of parameters. The data flow of the filter bank is shown in Fig. 5.
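One common way to obtain such a four-way structure is a polyphase decomposition, in which the coefficient set is split into four sub-filters and the input is split into four lanes, so that each lane runs at a quarter of the sample rate. The sketch below assumes this decomposition. The text does not state how the coefficients of CORE0 to CORE3 are derived, so the mapping here is illustrative rather than a description of the actual cores, and all names are hypothetical.

```python
def fir_serial(x, h):
    """Direct-form FIR: y[n] = sum_j h[j] * x[n - j], with x[n] = 0 for n < 0."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]

def fir_parallel4(x, h, lanes=4):
    """Four-lane polyphase FIR that produces four output samples per block."""
    assert len(x) % lanes == 0, "input length must be a multiple of the lane count"
    blocks = len(x) // lanes
    x_lane = [x[r::lanes] for r in range(lanes)]  # demultiplexed input lanes
    h_sub = [h[p::lanes] for p in range(lanes)]   # sub-filter p holds h[4m + p]
    y_lane = [[0] * blocks for _ in range(lanes)]
    for i in range(lanes):              # output lane, y[4k + i]
        for p in range(lanes):          # coefficient phase
            r = (i - p) % lanes         # input lane that feeds this term
            delay = 1 if i < p else 0   # wrapped lanes come from the previous block
            for k in range(blocks):
                for m, coeff in enumerate(h_sub[p]):
                    idx = k - m - delay
                    if idx >= 0:
                        y_lane[i][k] += coeff * x_lane[r][idx]
    # Re-interleave the four lanes into the serial sample order.
    return [y_lane[n % lanes][n // lanes] for n in range(len(x))]

# Hypothetical integer test data so that the comparison is exact.
x = list(range(1, 17))
h = [1, -2, 3, 4, -5, 6, 7]
assert fir_parallel4(x, h) == fir_serial(x, h)
```

The final assertion checks that the four interleaved lane outputs reproduce the serial filter output sample for sample, which is the property that lets four slower cores replace one filter running at the full sample rate.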
For the decimation (data-extraction) design, the correspondence between each working mode and its extraction ratio is given in Table 2.

Interaction between PC and Zynq-7020 platform
A socket handles the data interaction between the PC and the Zynq-7020 platform and operates in request/response (client/server) mode. TCP/IP is used to control the data exchange between the two platforms because of its stability and reliability [10].
After receiving data from the client (PC), the server (Zynq-7020) feeds the data into the pre-processing module to remove the carrier and reduce the sample rate. After processing, the data are returned to the PC from the Xilinx NetFPGA. The size of each data packet sent or received is fixed at 4096 bytes, a value that can be changed to suit the needs of the system. Fixed-size data packets simplify the transfer and processing of data and reduce the likelihood of errors. The receive buffer size passed to recv() should be set to 4096 bytes, and the receive mode should be set to MSG_WAITALL, so that the data are taken out of the cache only once the buffer is full, as shown in Fig. 6.
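The fixed-size receive described above can be sketched with Python's standard socket module, whose recv() accepts the same MSG_WAITALL flag as the C call. This is a minimal illustration and not the code running on the Zynq-7020: the helper name is hypothetical, and a local socket pair stands in for the PC and the board.

```python
import socket

PACKET_SIZE = 4096  # fixed packet size from the text

def recv_packet(sock, size=PACKET_SIZE):
    """Block until `size` bytes arrive; return fewer only if the peer closes."""
    chunks = []
    remaining = size
    while remaining > 0:
        # MSG_WAITALL asks the kernel to wait for the full request, but the call
        # can still return early (signal, closed peer), so loop as a safeguard.
        chunk = sock.recv(remaining, socket.MSG_WAITALL)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# Local demonstration: a connected socket pair stands in for PC and Zynq-7020.
pc_side, zynq_side = socket.socketpair()
pc_side.sendall(b"\x01" * 1000)  # one packet may arrive in several pieces
pc_side.sendall(b"\x02" * 3096)
packet = recv_packet(zynq_side)
pc_side.close()
zynq_side.close()
```

Even though the sender writes the packet in two pieces, the receiver hands a single complete 4096-byte packet to the next stage, which is the behaviour the fixed packet size relies on.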

Data transmission in Zynq-7020
The Zynq-7020 platform from Xilinx is divided into the processing system (PS) and the PL. Most of the interfaces between the PL and the PS follow the AXI bus protocol, and the PL module design follows it as well. Through these interface channels, the PL can communicate with the PS side, for example to access the Double Data Rate SDRAM (DDR), as Fig. 7 shows. The block diagram illustrates the design that we create. The processor and the DDR memory controller are contained within the Zynq PS. The AXI DMA and the AXI Data FIFO (first-in, first-out buffer) are implemented in the Zynq PL. The AXI-Lite bus allows the processor to communicate with the AXI DMA to set up, initiate, and monitor data transfers. AXI_MM2S and AXI_S2MM are memory-mapped AXI4 buses that give the DMA access to the DDR memory. AXIS_MM2S and AXIS_S2MM are AXI4-Stream buses, which source and sink a continuous stream of data without addresses.

Interaction between NetFPGA and Zynq-7020
The Zynq-7020 platform connects to the NetFPGA platform through the FMC. The FMC offers high data throughput and low latency, which matches the requirements well. NetFPGA, the last part of the whole imaging system, joins the data link via the FMC. After data processing, NetFPGA returns the data to the PC through the Zynq-7020.

Analysis of results
This experiment sends 2 GB of data through the Zynq-7020 platform to the NetFPGA platform. Since the Zynq-7020 platform does not have an AD converter, we use Matlab to perform the AD conversion on the PC in order to verify the performance of the system. The converted data file is then fed into the Zynq-7020 through the network port, and the processed data are returned to the PC. The hardware architecture of this system is shown in Fig. 8.
In the time-domain analysis, the original input signal has an intermediate frequency of 750 MHz and a bandwidth of 300 MHz. It is processed on two platforms, Matlab and the SoPC, which yields two sets of processed data; the output data are not decimated. The time-domain results of the two data sets are compared numerically, where A is the data processed in Matlab and B is the data pre-processed by this system. The difference matrix of the two data sets is shown in Fig. 9. The two sets of data are essentially the same, with large differences at only a few points. We measure the elapsed time 10 times and compute the average of these 10 runs. The test results are shown in Table 3.
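A numerical comparison of this kind can be sketched as follows. The exact formula used for the comparison is not reproduced in the text, so the element-wise difference A − B below is an assumption, and the sample values are hypothetical rather than measured data.

```python
def difference(a, b):
    """Element-wise difference D = A - B (assumed form of the comparison)."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same length")
    return [x - y for x, y in zip(a, b)]

def max_abs_error(a, b):
    """Largest absolute deviation between the two data sets."""
    return max(abs(d) for d in difference(a, b))

# Hypothetical sample values, not measured data.
A = [0.50, -0.25, 0.125, 0.0]  # stands in for the Matlab output
B = [0.50, -0.25, 0.250, 0.0]  # stands in for the SoPC output
D = difference(A, B)
worst = max_abs_error(A, B)
```

Reporting the largest absolute deviation alongside the difference data makes it easy to see whether the few points with large differences come from fixed-point rounding or from a functional mismatch.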

Conclusion
In general, the pre-processing system achieves the function of DDC. The comparison between the results of the pre-processing system and those generated by the Matlab simulation shows that the pre-processing system can meet the application requirements of the SAR imaging system. It provides high-quality data for the main imaging-algorithm module that follows, while reducing the burden on the entire system. A high-speed data-transmission system is also designed and implemented. The input data are sent from, and the output data are returned to, a PC running Ubuntu. The ARM running the Ubuntu system controls the direction of data transmission, which reduces the design difficulty of the data link. Other modules can also be mounted on the data link in future designs without affecting the normal operation of the existing modules. Systematic testing verifies that an efficient data-transmission capability and a pre-processing subsystem can be achieved for real-time SAR imaging design based on the SoPC.