A New Multichannel Parallel Real-time FFT Algorithm for a Solar Radio Observation System Based on FPGA

The real-time fast Fourier transform (FFT) is the essential algorithm for signal processing in a solar radio receiver. However, field-programmable gate array (FPGA) computation resources have become the limitation of real-time processing of signals with increasing time and spectral resolutions. It is necessary to design a real-time parallel FFT algorithm with reduced resource occupation in the development of future receiving systems. In this paper, we developed a multichannel parallel FFT algorithm named the multichannel parallel real-time fast Fourier transform (MPR-FFT), which can greatly reduce FPGA resource occupation while increasing the real-time processing speed. In this algorithm, the 4L simultaneous N-point FFTs are first converted into L simultaneous 4N-point FFTs. Fusion processing is then performed to obtain the 4 ∗ L ∗ N-point spectrum. This method has been used in developing a solar radio spectrometer, which works in the frequency range of 0.5–15 GHz in the Chashan Observatory. In this spectrometer, 16 channel MPR-FFT with 8k-point data is realized in a Xilinx UltraScale KU115 FPGA. The MPR-FFT algorithm reduced the computational resources to a large extent compared to the Cooley-Tukey-based parallel FFT method; for instance, the Look-Up-Table, Look-Up-Table RAM, Flip-Flop, and Digital Signal Process slices were reduced by 37%, 50%, 17%, and 2.48%, respectively. Although the MPR-FFT consumes 14 block RAM resources more than the Cooley-Tukey-based parallel FFT, the MPR-FFT algorithm presents an overall reduction in resource usage.


Introduction
During solar eruptions, i.e., flares, coronal mass ejections (CMEs), etc., magnetic energy stored in the coronal magnetic fields is released and transferred into radiation energy, kinetic energy of energetic particles, and plasma thermal energy. These eruptions may result in severe space weather catastrophes in solar-terrestrial space (Zhao et al. 2004;Hwangbo et al. 2015;Casini et al. 2017;Geng et al. 2018), and the research, monitoring and forecasting of solar eruptions are thus important in the protection of humankind aviation activities.
Radio observations, as one of the two ground-based observation windows, can provide unique information on their source region based on their emission mechanisms, such as plasma emissions at metric wavelengths and gyrosynchrotron emissions at millimetric wavelengths, and are valuable in the study of solar eruptions (Yan et al. 2006;Liu et al. 2011;Casini et al. 2017).
Dynamic spectra are usually used as solar radio observations in the decametric to decimetric wavelength regime (Yan et al. 2021). The signal is captured by the observing system by an antenna and transmitted to the analog front-end (AFE) circuit for amplification, filtering, and mixing. The intermediate frequency (IF) signal is then processed by a digital receiver (spectrometer), in which the analog signal is converted into digital signals and yields the realtime dynamic spectrum. The recent development of a fast analogto-digital converter (ADC) has provided the chance to directly acquire high-frequency signals (several GHz) and could eliminate the mixing process in the AFE circuit (Yan et al. 2020). The application of these high-performance devices could not only improve the time and spectral resolutions of real-time solar radio observations but also reduce the volume, weight and cost of the observation system. However, we should note that fast devices also challenge the design of the digital signal processing module in a receiving system (Luo & Zhang 2020).
The fast Fourier transform (FFT) has been widely used in signal processing in solar radio digital receivers (Morales 2011;Nakahara et al. 2012;Finger et al. 2013;Nakahara et al. 2016;Iwai et al. 2017;Vaate et al. 2017;Liu et al. 2019). The digital signal converted by ADC is processed by the real-time FFT algorithm to generate the final dynamic spectrum. The serial radix-2 FFT is the commonly used FFT algorithm, in which symmetry and periodicity of discrete Fourier transform (DFT) operations are used. The N-point DFT is first decomposed into N/2 2-point DFTs. The results of these DFTs are then used to yield the FFT value of the initial N-point sequence (Baas 1999).
The serial radix-2 FFT algorithm cannot realize real-time observations with increasing observation data. To solve this problem, many researchers have made improvements to the FFT algorithm, such as the "six-step FFT" algorithm proposed by Bailey (1990) and the Cooley Tukey method (twodimensional FFT algorithm) (Cooley & Tukey 1965;Deng et al. 2006). In these algorithms, first, perform L simultaneous M-point FFTs on the input data considered an L * M matrix. The resulting data are multiplied by rotation factors and then transposed into an M * L matrix. Finally, M simultaneous Lpoint FFTs are performed to obtain L * M-point spectrum data (Deng et al. 2006). Zou et al. (2012) optimized the complex multiplier and rotation factors during the implementation of the parallel FFT algorithm. Nakahara et al. (2015) used two techniques to reduce the number of look-up tables (LUTs) in an FPGA. Zhang et al. (2016) developed the Cooley Tukey method and applied it to the broadband and reconfiguration radio observation system for the Five-hundred-meter Aperture Spherical radio Telescope.
However, with increasing observation data, the current parallel FFT algorithm cannot realize real-time observations with limited computational resources. Solar radio bursts could appear in the frequency range of several MHz to tens of GHz. For example, microwave bursts are usually present as continua in the centimetric wavelength regime (Wu et al. 2016(Wu et al. , 2019. In this case, observations should be carried out in a frequency range as large as possible, which raises the demands of spectrometers with wider bandwidth together with higher sensitivity. The limited field-programmable gate array (FPGA) computation resource thus restricts the real-time processing of signals in the receiving system, and developing a real-time parallel FFT algorithm with reduced hardware resource occupation is therefore urgent in future radio observations. Based on the solar radio spectrometer under development (0.5-15 GHz, dual-channel, 14 bit, 3 Giga Samples per second (Gsps)), this paper presents a newly developed multichannel parallel FFT algorithm named the multichannel parallel realtime fast Fourier transform (MPR-FFT). In this algorithm, the 4L channels of data generated from ADC are combined into 2L channels of complex sequences, so 2L IP cores are used to perform simultaneous N-point FFT to generate 2L channels of complex FFT results. The above results are input to the next level to obtain 2L channels of 2N-point real FFTs and then obtain L channels of 4N-point real FFTs. Finally, fusion processing is performed to obtain the 4 * L * N-point spectrum.
(L is a positive integer, according to our actual needs, L = 16, N = 512). We tested the MPR-FFT algorithm in a 0.5-15 GHz solar radio digital receiver to realize real-time dynamic spectrum observations with time and spectral resolutions of 1 ms and 366 kHz, respectively. We find that this method can reduce the hardware resource occupation in FPGA to a large extent while increasing the processing speed. The speed improvement refers to the comparison of the time taken by the MPR-FFT algorithm and the Cooley-Tukey-based parallel FFT method to process each frame of data with a working clock of 187.5 MHz. The MPR-FFT algorithm is also suited for navigation, radar, communication, and other fields (Wang & Zhao 2005;Hua et al. 2019). The next section presents the FFT algorithms for parallel processing based on FPGA. Section 3 illustrates the theoretical derivation and implementation in the FPGA for the MPR-FFT algorithm. Section 4 illustrates the experimental results and analysis for the MPR-FFT algorithm, and the discussion and summary are given in the last section.

Parallel Data Processing Method Based on a Single FFT IP Core
FFT IP core is a functional module provided by FPGA manufacturers in the development platform that can be used for FFT calculations. However, due to the strong limitations of FFT IP core, it cannot meet the current needs of astronomical observations. For example, when FFT IP core is used for realtime FFT calculation of pipeline structure, it only supports realtime calculation of single-channel data, which is far from meeting the needs of real-time FFT processing of multichannel parallel data in solar radio telescopes.
Generally, when conducting FFT processing for multichannel data in a single IP core, two operations should be carried out. The parallel-to-serial conversion and clock signal frequency multiplication are shown in Figure 1. As seen from this figure, the signals (S 0 --S N 1 1 ) output from the high-speed ADC are first converted into a serial signal (S data ) and then processed by an FFT IP core to yield the spectrum. The data rate of the serial output signal (S out ) should be N 1 times those of input signals, i.e., signal S 0 --S N 1 1 , to avoid data loss. Therefore, the processing speed of the IP core should also be N 1 times the transmitting speeds of the input signals to realize real-time processing. We note that N 1 cannot exceed 4 in practice, as the IP core has limitations on both processing speed and resource occupation (Deng et al. 2006). This method is not suitable for real-time FFT processing of multichannel parallel data.

Traditional FFT Algorithm for Parallel Data
At present, parallel FFTs are usually operated based on the "six-step FFT" algorithm proposed by Bailey (1990) and the Cooley Tukey algorithm (butterfly algorithm). We present in Figure 2 the diagram of the "six-step FFT" algorithm and the Cooley-Tukey-based FFT for a sequence of N points, which can be considered a matrix of * N N 1 2 . FFT IP cores are used to perform N 1 simultaneous N 2 -point pipeline FFTs. Complex multiplications are then carried out for these results and N 1 different sets of rotation factors. The full-parallel pipelined FFT is finally processed for the generation of the spectrum.
Thus far, FFT algorithms should balance the processing speed against the occupation of hardware resources in the FPGA. A large amount of resources is occupied by FFTs in each IP core and the full parallel FFT of the N 1 points. Realization of real-time processing comes at the cost of hardware resources, such as Look-Up- Table (LUT), Look-Up-Table RAM (LUTRAM), Flip Flop (FF), and Digital Signal Process (DSP) slices. Therefore, it is necessary to design a realtime parallel FFT algorithm with reduced hardware resource occupation in the development of a receiving system.

Theoretical Derivation of MPR-FFT Algorithm
To realize real-time FFT processing of multichannel parallel data while saving computing resources, we improve and develop a new multichannel parallel real-time FFT algorithm based on the property of odd-even separation in FFT and the principle of the Cooley-Tukey-based parallel FFT algorithm as well as the "six-step FFT" algorithm. In this algorithm, the 4L channels of data generated from ADC are combined into 2L channels of complex sequences, so 2L IP cores are used to perform simultaneous N-point FFT to generate 2L channels of complex FFT results. The above results are input to the next level to obtain 2L channels of 2N-point real FFTs and then obtain L channels of 4N-point real FFTs. Finally, fusion processing is performed to obtain the 4 * L * N-point spectrum (L is a positive integer). We will present the derivation of the developed algorithm in the following.
The traditional FFT algorithm (Luo & Zhang 2020) is illustrated as is the length of the FFT, and n can be written as = + ( ) n n N n , 2 1 2 2 where n 1 = 0, 1, K, N 1 − 1; n 2 = 0, 1, K, N 2 − 1. That is, the sequence of x (n) is decomposed into N 1 sequences, and  each sequence has N 2 data, and k can then be expressed as where k 1 = 0, 1, K, N 1 − 1; k 2 = 0, 1, K, N 2 − 1. According to the above two equations, we can rewrite Equation (1) as We assume a complex sequence as where c(n) and y(n) are real sequences. According to the linear property of FFT, the FFT expression for Equation (5) can be written as is the FFT of z(n) and C[k] and Y[k] are the real part and imaginary part of Z[k], respectively. We can get Equation (7) shows that the FFT results of two N-point real sequences can be obtained by the FFT calculation of the Npoint complex sequence. The supposed sequence x(n), which is a 2N-point real sequence, is decomposed into two N-point real sequences h(n) and g(n): A complex sequence f(n) can then be expressed as and its FFT calculation can be written as By analyzing the algorithm of complex FFT, it is known that the N + 1th element of the calculation is the difference between odd and even sequences in the real sequence, and the rest of the calculations are conjugate symmetric to the N + 1th element. The supposed sequence s(n), which is a 4N-point real sequence, is decomposed into sx(n) and sy(n) according to the parity, sx(n) is decomposed into s11(n) and s12(n), and sy(n) is decomposed into s21(n) and s22(n). Complex sequences s1(n) and s2(n) are constructed and illustrated as According to Formulas 10, 11 and 12 as well as the properties of complex sequence FFT, FFT calculations of sequences sx(n) and sy(n)can be obtained (Chen et al. 2009). According to the linear property of FFT, the FFT calculation of sequence x sxy (n) is obtained and then illustrated as where X s1r and X s1i are the real part and imaginary part of the FFT calculation of sx(n), respectively, X s2r and X s2i are the real part and imaginary part of the FFT calculation of sy(n). It follows that, x sxy (n) = sx(n) + j * sy(n), n = 0 ∼ 2N − 1, and k = 0 ∼ 2N − 1. According to Formulas 11 and 12, the real part X sr (k) and imaginary part X si (k) of the first 2Nth elements (FFT calculation of sequence s(n)) are obtained and then illustrated as , and assuming X r12 (k) = X s1r (k) − X s2i (k), X i12 (k) = X s1i (k) + X s2r (k). According to Formulas 16 and 17, the FFT calculation of the 4N-point sequence s(n) is obtained and then illustrated as According to Formulas 4, 18, 19 and 20, calculation of the N-

multiplication of rotation factors, and
N 4 1 -point full-parallel pipelined FFT calculation (fusion calculation). Diagram of algorithm is shown in Figure 3.
Compared with the "six-step FFT" algorithm and the Cooley-Tukey-based parallel FFT, the MPR-FFT algorithm has a remarkable advantage in the processing of large data. The MPR-FFT algorithm can greatly reduce the resource usage in the multichannel simultaneous FFT calculation step and fusion processing step, while the processing speed of the algorithm is improved slightly.

Implementation of the MPR-FFT Algorithm in the FPGA
With the development of ADCs, fast devices have provided the opportunity for multichannel parallel FFT in the field of radio astronomy, communication, radar, etc. FPGA has been widely used in processing parallel digital signals generated by fast ADCs because of its inherent parallelism, flexibility, and integration. In the field of radio astronomy, the FPGA has made it possible to process real-time observations in broad bandwidth. The implementation of multichannel parallel FFT based on FPGA is therefore, worth, researching.
The digital receiver of the 0.5-15 GHz solar radio telescope comprises an ADC sampling module and an FPGA processing module. The board includes two ADC signal acquisition chips, a clock chip, and an FPGA chip. The sampling resolution of ADC is 14 bits, and the sampling rate can be set flexibly from 2.5 to 3.1 Gsps. The clock chip provides high-precision and low-jitter clocks for ADC and FPGA. The FPGA chip is the main controller and performs real-time FFT processing for digital signals. The FPGA device used has dozens of GTH interfaces, which is conducive to signal processing and data transmission. It can provide 1.16 million logical units, 5520 optimized DSP slices, and 16.3 Gbps backplane transceivers.
We present the diagram of signal processing in the receiving system in Figure 4. The analog signal is converted into a digital signal by ADC CORE (the sampling rate is 3 Giga samples per second (Gsps)). The digital signal is packaged on the JESD204B   transmission interface on the ADC side and transmitted to the FPGA through the 8 links of the JESD204B LINK (the transmission rate of each link is 7.5 Gbps). The data transmitted by the ADC are received by the JESD204B receiving interface on the FPGA side and unpacked in the FPGA. The data parsed in FPGA are 16 simultaneous parallel channel data sets, so the clock frequency is 187.5 MHz. Here, the calculation method 187.5 MHz * 16 = 3 GHz shows no data loss in this process. Therefore, this fixed clock frequency of 187.5 MHz is the clock frequency of the real-time FFT processing algorithm in the solar radio digital receiver. The 16 parallel sampling data are processed by the real-time FFT algorithm in FPGA with a working clock of 187.5 MHz, and the FFT results are finally transmitted to the host computer through a 10 Gigabit optical fiber.
The MPR-FFT method was carried out in the FPGA for 16 parallel channels with a total data volume of 8k points, and the diagram is shown in Figure 5. The input parallel data (×[0] to ×[15]) converted by the high-speed ADC are first processed into 8 complex sequences, and complex FFTs of 512-point data are then conducted for these sequences in eight separate FFT IP cores. The above results are used to calculate 2k-point real FFTs (the N-2N module and 2N-4N module in Figure 5), according to which 4 parallel channels obtain 2k points real FFT per channel. Complex multiplications are further carried out for these results and rotation factors. The four-point fusion calculation steps and sorting output modules are finally processed to generate the dynamic spectrum.

Realization of 2N-point real FFT Using the N-point
Complex FFT The 2N-point real FFTs are realized according to N-point complex FFTs by means of calculation-caching calculations. The time sequence diagram is shown in Figure 6. Data_0 and data_1 are the real and imaginary parts of the complex sequence, respectively, which are then processed with a complex FFT of 512-point data. The 2N-point real FFTs are further carried out for these results. We note that the above calculation requires symmetrical operation for the comp_fft of each frame (512 points). Because of the pipelined outputs, the first half-frame signal will be lost when the second half is output and, therefore, should be cached. In the developed method, the cached data are read from the First Input First Output (FIFO) data buffer once the second half frame arrives. As the calculations of symmetric operation for two data points are obtained in one clock cycle, the results of both the first half-frame (out_l) and the second halfframe (out_h) are output in parallel. We should also note that the conversion of results from a complex FFT to a real FFT will result in a time delay of several clock cycles, which would not affect the real-time data processing.

Fusion Operation
The results of the 2N-point real FFT are further processed by a fusion process for the final generation of the spectrum. A diagram of four-point fusion operations is shown in Figure 7. There are four inputs and four reverse outputs in each clock period, and the four outputs of each stage flow directly to the next stage. The input data (b 1 , b 2 , b 3 , b 4 ) are 2k-point real FFTs from the FFT conversion module. Two steps are included in this operation. In the first step, complex multiplication is carried out for the signal of b 2 and the rotation factor of the first butterfly operation, the results of which are then added and subtracted with b 1 to obtain the signals of c 1 and c 2 . The above operations are also applied to signals b 3 , b 4 , c 3 , and c 4 . In the second step, complex multiplications are carried out for signals of c 3 and c 4 and rotation factors of the second butterfly operation, the results of which are then added and subtracted with c 1 and c 2 to obtain the signals of d 1 and d 3 and d 2 and d 4 . The register transfer level (RTL) diagram of a butterfly cell is shown in Figure 8, which is one of the four butterfly units in Figure 7.

Results
We tested the correctness of the MPR-FFT algorithm and implemented two parallel FFT algorithms (the MPR-FFT algorithm and the Cooley-Tukey-based parallel FFT) in FPGA. Then, the performances of these two algorithms are compared. In our tests, the input data are the sampling data of a singlefrequency (2.7 GHz) signal, which are 16 parallel channel data from ADC in the digital receiver with a sampling rate of 3 Gsps. These parallel input data are processed into one channel sequence, and the FFT function is then conducted for this one channel sequence in MATLAB (Figure 9(a)). We also realized a 16 channel MPR-FFT algorithm with 8k points in MATLAB (Figure 9(b)). The results of the regular FFT method (Figure 9(a)) and MPR-FFT algorithm realized in MATLAB (Figure 9(b)) are first compared, and the 16 channel MPR-FFT algorithm is further implemented in FPGA with a working clock of 187.5 MHz (Figure 9(c)). We compared the resource usage and processing speed of the MPR-FFT algorithm and Cooley-Tukey-based parallel FFT. The test results are presented in the following. Figure 9 presents the FFT results of the regular FFT method (a), MPR-FFT algorithm realized in MATLAB (b), and MPR-FFT algorithm implemented in FPGA (c). We can see that the results of the MPR-FFT method (Figure 9(b)) are identical to those of the regular FFT method (Figure 9(a)). We further implemented the MPR-FFT in FPGA (Figure 9(c)) and found no significant difference among these tests, indicating that the MPR-FFT algorithm can be implemented correctly in FPGA. Figure 10 presents the simulation results of the MPR-FFT algorithm in FPGA. The pipelined processing ensures real-time processing of multichannel parallel data. The signal power_out displayed as an analog waveform and marked in the white box is the power spectrum of a single-frequency signal of 2.7 GHz, indicating the MPR-FFT algorithm is implemented correctly in FPGA and the real-time solar radio observation is possible.
We carried out tests of Cooley-Tukey-based parallel FFT and the MPR-FFT algorithm in FPGA for 8k-point data, and the corresponding computation resource occupations are shown in Table 1 and Figure 11. We can see that compared with the Cooley-Tukey-based parallel FFT, the MPR-FFT algorithm reduces the LUT, LUTRAM, FF, and Digital Signal Process (DSP) resource occupation of up to 37%, 50%, 17%, and 2.48%, respectively. Although the block RAM (BRAM) resources used in the MPR-FFT are higher than those of the Cooley-Tukey-based parallel FFT, the MPR-FFT algorithm presents an overall reduction in resource usage, and the calculation method will be presented in the following.
In Xilinx UltraScale KU115 FPGA, each slice provides eight 6 input LUTs and sixteen flip-flops. There are two types of slices in the UltraScale architecture. SLICEL (logic) has all the LUT and storage element resources, along with the carry logic and wide multiplexers. A SLICEM (memory) can also use the LUTs as distributed 64 bit RAM. Therefore, the capacity of a LUT is 64 bits. From the eighth row of Table 2, we can see that the amount of LUT for the MPR-FFT algorithm is 8666 less than that of the Cooley-Tukey-based parallel FFT method,    which is equivalent to reducing 541.625 kb of resources. The MPR-FFT consumes 14 BRAM (the capacity of each BRAM is 36 kb) resources more than the Cooley-Tukey-based parallel FFT, which is equivalent to an increase of 504 kb of resources. From the reduced 541.625 kb resources and the increased 504 kb resources, it can be concluded that the MPR-FFT algorithm presents an overall reduction in resource usage compared to the Cooley-Tukey-based parallel FFT method.
In addition to the large reduction in LUT resources consumed, the amount of FF and DSP used in the MPR-FFT algorithm has also been reduced to a certain extent. Depending on the length of the FFT algorithm, the number of LUTs will become a bottleneck; however, that of the BRAMs will not be a bottleneck (Nakahara et al. 2015). Therefore, the significant reduction in LUT resources consumed and an overall reduction in resource usage in the MPR-FFT algorithm are significant improvements. Table 2 compares resources occupied by each step of the FFT algorithm based on the MPR-FFT and Cooley-Tukey-based parallel FFT methods. From the resource consumption of all steps of the algorithm and the resource consumption of each step of the algorithm in Table 2, various hardware resources consumed by the multichannel simultaneous FFT calculation processing step in the Cooley-Tukey-based parallel FFT method account for the vast majority of the entire algorithm. In the MPR-FFT algorithm, because 16 simultaneous real FFT calculation results are obtained through 8 complex FFT calculations, the multichannel simultaneous FFT calculation processing step reduces the resource consumption by half, which is one of the main reasons for the reduction of resource consumption of the MPR-FFT algorithm. In addition, the sixth row in Table 2 shows that the MPR-FFT algorithm significantly reduces the consumption of various resources compared to the Cooley-Tukey-based parallel FFT algorithm in the fusion processing step. Because the MPR-FFT algorithm can effectively reduce the parallelism of the algorithm, the complexity of the subsequent fusion processing step is greatly reduced, thereby significantly reducing the consumption of various resources, which is also the main reason for the reduction of resource consumption of the MPR-FFT algorithm.   Figure 11. Proportion of the resources saved in the MPR-FFT compared to the Cooley-Tukey-based parallel FFT algorithm. Table 3 presents a comparison of the processing speeds of the different FFT methods. The processing speed (starting with the first raw data entering and ending with the last transformation data output) is expressed by the number of clock cycles, and the total time is calculated by a working clock of 187.5 MHz. Table 3 shows that the processing speed of the MPR-FFT algorithm is 10 clock cycles faster than that of the Cooley-Tukey-based parallel FFT. The speed improvement in the paper refers to the comparison of the time taken by the MPR-FFT algorithm and the Cooley-Tukey-based parallel FFT method to process each frame of data with a working clock of 187.5 MHz, and the comparison of the time to process each frame of data is shown in Figure 12. Figure 12 shows that the time for the Cooley-Tukey-based parallel FFT method to process each frame of data (the clock cycles from the beginning of each frame of data to the end of the calculation of one frame of data) is 1688 clock cycles, while the time taken by the MPR-FFT algorithm is 1678 clock cycles. The MPR-FFT algorithm reduces the processing time of each frame of data by 10 clock cycles compared with the Cooley-Tukey-based parallel FFT method, which is equivalent to a 0.5% speed improvement. Because the MPR-FFT algorithm can reduce the complexity of the fusion processing, the processing speed of the MPR-FFT method is improved slightly.
According to the above tests, we can see that there are two significant advancements achieved by the MPR-FFT algorithm: (1) The MPR-FFT algorithm presents an overall reduction in resource usage. The computational resources can be reduced to a large extent, especially the resources of LUT, LUTRAM, and FF, which can effectively solve the resource bottleneck problem.
(2) The real-time processing speed of the algorithm is also improved slightly, which guarantees real-time solar radio observations with high time and frequency resolutions in a large bandwidth.

Discussion and Summary
The real-time FFT is the essential algorithm for signal processing in a solar radio digital receiver. However, the FPGA computation resource has become the limitation of the real-time processing of signals with increasing time and spectral resolutions in solar radio observation systems. Therefore, it is necessary to design a real-time parallel FFT algorithm with reduced hardware resource occupation in the development of a future receiving system.
We developed a multichannel parallel FFT algorithm named the multichannel parallel real-time fast Fourier transform (MPR-FFT), which could greatly reduce the FPGA resource occupation while ensuring the real-time solar radio observations with high time and frequency resolutions in a large bandwidth. In this algorithm, the 4L channels of data generated from ADC are combined into 2L channels of complex sequences, so 2L IP cores are used to perform simultaneous N-point FFT to generate 2L channels of complex FFT results. The above results are input to the next level to obtain 2L channels of 2N-point real FFTs and then obtain L channels of 4N-point real FFTs. Finally, fusion processing is performed to obtain the 4 * L * N-point spectrum (L is a positive integer). This method would be used in the developing solar radio spectrometer (dual-channel, 14 bit, 3 Giga Samples per second (Gsps)), which would work in the frequency range of 0.5-15 GHz in the Chashan Observatory (CSO).
In addition to the common I-V types of radio bursts, typical radio bursts also include fine-structure bursts such as zebra patterns and spike bursts. In the decimeter-centimeter band observed at 0.5-15 GHz, in addition to the cyclotron synchrotron radiation covering a wider band, highly structured zebra patterns and rapidly changing spike bursts are the main burst structures. Previous studies have shown that the duration of spike bursts is only 10-60 ms (Feng et al. 2018;Tan et al. 2019), and the zebra pattern presents a multilevel harmonic structure in the range of tens to Figure 12. Comparison of the time to process each frame of data (each frame of data is 8k data, and 16 simultaneous parallel channels are used. Therefore, each frame of data occupies 512 clock cycles). (a) The time for the Cooley-Tukey-based parallel FFT method to process each frame of data is 1688 clock cycles. (b) The time for the MPR-FFT algorithm to process each frame of data is 1678 clock cycles. Figure 13. Solar radio spectrogram from the state of the antenna pointing to the sky to that of the antenna pointing to the Sun.
hundreds of MHz (Feng et al. 2018;Tan et al. 2019). Therefore, the solar radio telescope needs to have high time resolution and frequency resolution to observe the fine structure of this band. According to the observation requirements of the 0.5-15 GHz solar radio telescope system, the time resolution and frequency resolution of the system are set as 1 ms and 366 kHz, respectively. In the 0.5-15 GHz solar radio observation system, the signal is divided into multiple analog signal channels, and four digital receivers are used to sample and process these signals in realtime. The sampling rate of two digital receivers is 3 Gsps and that of the other two digital receivers is 2.6 Gsps. When the sampling rate of the digital receiver is 3 Gsps, the instantaneous bandwidth of each acquisition channel is 1.3 GHz. When the sampling rate is 2.6 Gsps, the instantaneous bandwidth of each acquisition channel is 200 MHz. The switching between the multiple analog signal channels is controlled by digital receivers to ensure that solar radio signals of 0.5-15 GHz are all sampled within a period of the time resolution.
In the spectrometer for this observation system, 16 channel MPR-FFT with 8k-point data is realized in a Xilinx UltraScale KU115 FPGA, and significant progress has been achieved in our tests. The computational resources can be reduced to a large extent; for instance, the Look-Up- Table (LUT), Look-Up-Table RAM (LUTRAM), Flip Flop (FF), and Digital Signal Process (DSP) slices are reduced by 37%, 50%, 17%, and 2.48%, respectively. In addition, the real-time processing time to process each frame of data can be reduced by 10 clock cycles working at 187.5 MHz.
Because the 0.5-15 GHz solar radio telescope is still under development, we built a simple platform to verify the performance of the MPR-FFT algorithm running on the digital receiver. In the temporary platform, the existing 2.4 m antenna and turntable in the laboratory are used to receive the solar radio signal of 7.6-8.9 GHz, which is the signal of one analog channel of the 0.5-15 GHz solar radio telescope. In the experiment, an antenna 2.4 m in diameter is used to receive solar radio signals. Amplifiers and filters working in the frequency band of 7.6-8.9 GHz are used to amplify and filter solar signals from the antenna. The processed analog signal is then sent to the digital receiver for digitization and real-time FFT algorithm processing. In the experiment, we recorded the data processed by the digital receiver when the antenna was pointing to the Sun and pointing away from the Sun. Figure 13 is the spectrum image we processed using the recorded data. The left part of Figure 13 is the result of the antenna pointing to the sky minus its own value, and the right part of Figure 13 is the result of the antenna pointing to the Sun minus the antenna pointing to the sky. Obviously, in the frequency of 7.6-8.9 GHz, the value when the antenna points to the Sun is approximately 1 dB higher than the value when the antenna points to the sky. The digital receiver with the MPR-FFT algorithm can observe the solar radio signal in the experiment, indicating that the MPR-FFT algorithm is suitable for solar radio telescopes.
The 35-40 GHz solar radio telescope that has been built is our very mature observation system. We applied the MPR-FFT algorithm to the digital receiver of the 35-40 GHz solar radio telescope to further verify the performance of the algorithm (Yan et al. 2020;Yan et al. 2021;Shang et al. 2022). The existing 35-40 GHz solar radio telescope has been put into observation for more than 1 yr (Yan et al. 2020;Yan et al. 2021;Shang et al. 2022). At present, its instantaneous bandwidth is 500 MHz, and the output signal of the intermediate frequency bandwidth of the analog receiver is 687.5-1187.5. In the experiment, the 35-35.5 GHz solar signal, which is the signal of an analog channel of the 35-40 GHz solar observation system, was transformed to 687.5-1187.5 MHz by down-conversion processing. Then, the signal was connected to the digital receiver with the MPR-FFT algorithm for observation. As shown in Figure 14, compared with the background values (5°deviate from the Sun), slight and significant enhancements appear when the antenna points to the Sun. We note a phenomenon in the middle of the picture in which the intensity gradually increases every 1.75 s. This results from the rotation of the antenna pointing every 1°, and after rotating the antenna five times, a stable intensity is obtained. It can be concluded that the MPR-FFT algorithm is also suitable for 35-40 GHz solar radio telescopes and has wide applications.