Satellite Aerosol Optical Depth Retrieval Based on Fully Connected Neural Network (FCNN) and a Combine Algorithm of Simplified Aerosol Retrieval Algorithm and Simplified and Robust Surface Reflectance Estimation (SREMARA)

Aerosol satellite retrieval can provide detailed aerosol information on a large scale, which becomes one of the main ways of global aerosol research. However, rapid and accrue aerosol retrieval by satellite is challenging, typically requiring radiation transfer models (RTMs) and surface reflectance (SR). An aerosol retrieval algorithm (SEMARA) combining the simplified aerosol retrieval algorithm and simplified and robust surface reflectance estimation can obtain local high-precision aerosol optical depth (AOD) without RTMs and SR datasets, while the method cannot perform large-scale and long-term aerosol retrieval. Hereby, a machine learning (ML) method based on the fully connected neural network (FCNN) and SEMARA was proposed. The new method optimizes the traditional sample construction of the ML and can achieve aerosol retrieval at a larger spatial and temporal scale. Moderate resolution imaging spectroradiometer data were applied to AOD retrieval on four typical regions globally. The AOD retrievals were validated using aerosol robotic network measurements in comparison to MOD04_3K AOD and the SEMARA. The accuracy validation indicators of the new method, in which the root-mean-square error (RMSE) was 0.109, mean absolute error (MAE) was 0.072, Pearson correlation coefficient (R) was 0.8983, and approximately 79.69% of the retrievals fell within the expected error (EE), performed better than MOD04_3K (RMSE = 0.1972 MAE = 0.1403, R = 0.7692 and Within EE = 55.24%) and the SEMARA method (RMSE = 0.2465 MAE = 0.1106, R = 0.0.5968 and Within EE = 72.85%) in all study regions, and the AOD retrievals can better reflect the spatial variation of AOD with better spatial continuity and coverage.


I. INTRODUCTION
A EROSOLS, the mixed system of solids and liquids suspended in the atmosphere, are mostly originated from human and natural activities (industrial activities, biomass burning, and dust particles) and the soot, pollen, microorganisms, and rain, snow, and fog particles in the atmosphere can all be considered different types of aerosols [1], [2]. Due to inhomogeneous global distribution, short variation period, complex chemical composition, and potential impact on climate and human health, atmospheric aerosols have become a research hotspot in many disciplines, especially in the field of atmospheric science [3], [4], [5], [6], [7]. Aerosol optical depth (AOD), which represents the effect that atmospheric aerosols on the absorption and scattering of solar radiation, is one of the most significant parameters for satellite atmospheric aerosols research [8], [9], [10].
The satellite remote sensing aerosol retrieval can provide aerosol detailed information in spatial variation and has longterm and large-scale aerosol detection, which is important for global aerosol research. The difficulties of satellite aerosol high-precision retrieval are calibration, cloud screening, aerosol models, and surface reflectance [11], [12], where the surface reflectance is widely studied and generally regarded as the key to separating the contribution of surface and atmosphere to solar radiation [13], [14], [15]. A study based on RTM indicated that the 0.01 error of the surface reflectance could cause about 0.2 absolute error of AOD when the AOD value is 1 and surface reflectance is 0.08 [16], [17]. Many aerosol retrieval algorithms had been developed based on different principles for satellite remote sensing sensors, such as dark target (DT) algorithm [18], [19], deep blue (DB) algorithm [16], [20], [21], multiangle implementation of atmospheric correction (MAIAC) algorithm [22], [23], AATSR dual view algorithm [22], [24], [25], and structure function method algorithm [1], [26], [27], where the DT and DB were the first two algorithms proposed with extensive development. The DT algorithm is based on those dark targets (water and vegetation) that have low surface reflectance in the visible and statistical relationships between visible and near-infrared wavelengths to determine surface reflectance in the visible wavelengths [28], [29], [30]. The precision of estimated surface reflectance is high in areas with low surface reflectance in the visible wavelengths (< 0.1), while the reflectance was not determined on most land surfaces without a dark target, so the DT algorithm was mainly used to perform AOD retrieval over the oceans with low-light surfaces [31], [32]. To achieve AOD retrieval in a greater range of surface types, high-reflectance bright areas, the DB algorithm was proposed, which is based on the fact that most surface types have lower reflectance in the blue band compared with other bands and good recognition of aerosols [16], [20], [21]. The algorithm used the surface reflectance datasets generated from existing satellite data in the blue spectral region (412 nm, 443 nm, and 490 nm) to realize the separation of surface contributions from satellite signals, which was widely used to retrieve AOD in different types of surfaces [16], [20], [21], [33]. However, the construction of surface reflectance datasets is time-consuming and suffers from varying surface reflectance [34]. The AOD retrievals of the DB algorithm are also affected by the matching and conversion process on account of using different types of data. Based on the DT algorithm and DB algorithm, MOD04_3K, the Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol product over oceans and continents globally with a higher spatial resolution, is now available [30], [35] and was updated to C6.1 by National Aeronautics and Space Administration (NASA) in 2017, which were widely used for global aerosol monitoring and research [31], [36], [37], [38]. All of these algorithms make use of aerosol-retrieval look-up tables (LUTs), which are calculated by radiative transfer models (RTMs) [39], [40] in advance and include necessary input parameters (e.g., solar and view angles, AOD, and aerosol models). The LUT is restricted to the precision of RTM and input parameters about aerosol properties, which significantly affect final AOD-retrieval accuracy [41]. To simplify the aerosol retrieval algorithms that use RTM to calculate LUTs in advance, a simplified aerosol retrieval algorithm (SARA) was proposed, which can achieve local high-precision AOD retrieval without LUTs [11], [42]. The high-precision algorithm is based on the atmospheric radiative transfer equation [Equation (3.1)] and ground-measured AOD to determine local aerosol conditions and types (asymmetry factor and single scattering albedo), which is different from those algorithms assuming aerosol models by earlier observations. However, the surface reflectance datasets also need to be constructed for the algorithm to separate atmospheric and surface contributions to solar radiation [43]. To avoid errors due to the construction of surface reflectance datasets and reduce time consumption, the aerosol retrieval algorithm [44], an integration of the simplified and robust surface reflectance estimation (SREM) [13] and SARA methods (SEMARA), was proposed. The algorithm uses the SREM method to determine real-time surface reflectance, which need not surface reflectance datasets and was not affected by the change of surface reflectance, while the aerosol retrieval algorithm with supporting of ground-measurement AOD is restricted by SREM and SARA (see Section III-A), which is not suitable for aerosol monitoring over a large scale.
With the development of machine learning (ML) algorithms, many satellite aerosol retrieval algorithms based on ML had been proposed and developed, such as aerosol retrieval based on deep neural networks, random forest and support vector machine [45], [46], [47], [48], which have the strong nonlinear fitting ability for a large amount of data. These ML models can be used as a statistical method to extract remote sensing information for quantitative parameters-retrieval problems, where the clear physical meanings of these parameters are difficult to be accurately described. The neural network models (NNMs) have obvious advantages over other ML models in remote sensing parameter-retrieval efficiency and precision, which had been widely used and validated in different regions around the world [49], [50], [51]. Xing Feng et al. [50] proposed a neural network aerosol retrieval framework based on the fully connected neural network (FCNN) for the geostationary satellite, which was applied to the Advanced Himawari Imager on Himawari-8 and successfully obtained aerosol parameters in China region. The accuracy and reliability of FCNN-retrieval parameters were all better than that of the official aerosol product. Zbizika et al. [52] used deep neural network (DNN) to estimate Svalbard's AODs utilizing auxiliary data (temperature, air mass, water vapor, and wind speed) and obtained results close to the ground measurements. Lu et al. [53] used global mid-low latitude ground AOD measurements to train the DNN model, which achieved high-precision AOD retrieval and the results had a high correlation with ground measurements. However, these aerosol retrieval algorithms based on ML are hard to be given a reasonable interpretation of their behaviors and suffer from representative sample construction and effective training [54], [55], [56]. Chen et al. [34] used a sample dataset constructed by RTM to train deep belief network, which was applied to AOD retrieval in typical regions around the world. The article simplified ML's sample construction and obtained AODs having better precision and spatial coverage compared with the aerosol product based on MAIAC. Although the ML method has the advantage in satellite aerosol retrieval, there are some limitations: The results are affected by the training sample constructed by RTM; the fine sample construction requires a large amount of computation; The NNM is not suitable for newer satellite sensors not contained in RTM; The NNM only uses spectral data without consideration of the effect of auxiliary data on aerosol.
To solve the problem that the SEMASA algorithm needs ground-measured data and the problem of traditional ML's sample construction and application, the aerosol retrieval method based on FCNN and SEMASA (abbreviated as FCNN_SEMASA) is proposed, which uses the AOD retrieval of SEMASA to train FCNN to obtain the optimal NNM. The SEMASA method can use single scene image to obtain local high-precision AOD datasets, which can be used as training samples of the FCNN. Hereby, the trained optimal FCNN model can achieve high-precision AOD retrieval using single-scene satellite images and auxiliary data and has certain spatial and temporal adaptability. Compared with other ML methods, the new method Simplifies sample construction and considers the correlation between auxiliary data and aerosol. The FCNN with a simple structure is easier to be trained, which has a lower cost for the rapid monitoring of atmospheric aerosols and can be conveniently used for different satellite sensors compared to other mature traditional algorithms (e.g., MAIAC and DB algorithm). The Beijing region of China mainly includes urban areas with less vegetation, where the fuel combustion, industrial waste gas and traffic exhaust emissions had led to high aerosol concentration in the air due to the developed economy and dense population [57].

A. Study Regions
The land part of the West and Central Africa study region contains parts of Niger and Nigeria countries. Nigeria has a large number of exhaust emissions s from the burning of agricultural crop residues and waste material, which has high aerosol load. On the contrary, most areas of Niger are located in the Sahara Desert with low vegetation coverage, dry climate and strong winds, and are frequently attacked by dust storms. Thus, West and Central Africa in the article have poor air quality and different aerosol sources [58].
The selected aerosol robotic network (AERONET) sites of South Asia are influenced by different aerosol types (dust, smoke, urban aerosols) and features of geography and climate (arid areas, coastal zones, elevated land, and vegetation type). South Asia is one of the most densely populated regions in the world and suffers from serious air pollution, particularly during postmonsoon and winter [59]. The prevalence of different sources and varying meteorological conditions lead to complex and diverse aerosol composition and morphology of South Asia. The region is highly affected by mineral dust transported from the north-western deserts and dry western regions of South Asia during premonsoon and is affected by smog (smoke and fog) and exhaust emissions during postmonsoon and winter [60].
The Southeast United States located on the Atlantic coast mainly includes three states, namely, Kentucky, Tennessee, and North Carolina, which have vast tracts of farmland growing tobacco, cotton, and other crops in inland areas, as well as industrial areas close to the coast [61]. The aerosol load here is low all year round, and the aerosol sources are mainly from human activities and sea breeze.
Overall, these four study regions, on the one hand, contain different land types and vegetation coverage, on the other hand, they have various climate and air pollution, which have included the geographical and climatic conditions of most population gathering areas around the world. Hereby, the validation of the proposed AOD-retrieval algorithm in these areas has certain representativeness and typicality, and can effectively verify the adaptability of the algorithm in these different typical cases. Table I shows the details of the four study regions.
B. Data Sources 1) AERONET Data: AERONET is a global ground-based aerosol observation network, which uses the CIMEL CE318 automatic sun photometer as the basic observation instrument to provide aerosol ground-based observations at more than 1500 sites in major regions of the world. The AERONET AOD measurements with high accuracy and low uncertainty are generally used as reference truth values for satellite aerosol retrieval results, which are distinguished as three quality levels: Level 1.0 (unscreened); Level 1.5 (cloud-screened and quality controlled); and Level 2.0 (cloud-screened and quality-assured) [62], [63].
Higher levels represent higher data quality and lower uncertainty, so the highest level (Level 2.0) AOD measurements were selected in most sites, and Level 1.5 and Level 1.0 was selected in the sites without Level 2.0 data (https://aeronet.gsfc.nasa.gov/, accessed on November 11, 2022). Table I shows detailed information about the selected sites of the four study rejoins.
AERONET can provide AOD at wavelengths of 440 -, 675 -, 870 -, and 1020-nm with a time resolution of 15 min. To validate the AOD retrieved from the FCNN_SEMASA and SARA method at 550 nm, the AOD measured from AERONET sites needs to be converted using the followingÅngström equations [64]: where τ α (λ) is the AOD measurement at the wavelength of λ (550 nm), β is the turbidity coefficient; a is the wavelength index; and λ 1 and λ 2 are the wavelengths of the two selected channels in the AERONET measurements.
2) Auxiliary Data: The total columnar water vapor (kg m−2), total columnar ozone (kg m−2), surface temperature (K), and surface pressure (pa) from the ERA5 (European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5) hourly data (https://cds.climate.copernicus.eu/, accessed on November 11, 2022) were used as they affect the visible band TOA reflectance or are directly related to AOD. The ERA5, the fifth generation ECMWF atmospheric reanalysis of the global climate, can provide estimates of atmosphere variables for each The MOD02HKM data is a 500-m resolution MODIS 1Blevel dataset, which contains the radiation values of seven discrete bands in the range of electromagnetic wavelength from 0.45 to 2.20 μm and the detailed information about the seven bands is shown in Table II. These data are scanned by MODIS 1A-level original radiation, and preliminary calibration and geolocation are performed. According to these studies [2], [16], [42], usually, the longer the band, the less aerosol information is contained, so the short-wave B1, B2, B3, and B4 bands of the MOD02HKM data were applied to the SEMARA method. The MOD021KM data is similar to MOD02HKM, but MOD02HKM has a lower resolution (1 km) and contains more bands (36 bands). In the article, MOD021KM data was applied to obtain the spatial AOD distribution over the four study regions. The MOD04_3K data is a mature aerosol product of MODIS, which was created by the DT algorithm with certain restrictions on land and can provide the ambient aerosol optical properties (e.g., optical depth and size distribution) over the oceans and continents globally [30]. The AOD of MOD04_3K in the land was used to quantitatively and qualitatively compare with the AOD of the FCNN_SEMASA method to reveal the advantages of the new method since the DT algorithm had been widely verified and had high accuracy in vegetation areas [19], [65], [66]. The MOD03 and MOD35 data were used to provide observational geometry and cloud and water masking data.
In the article, the 13 predictors (spectral data: the TOA reflectance in B1, B2, B3, and B4, surface reflectance of SREM in B4; geometric data: solar zenith angle, satellite zenith angle, relative azimuth, and auxiliary data: total columnar water vapor, total columnar ozone, surface temperature, surface pressure, and DEM) from above auxiliary data and MODIS data were used as the input and the AOD retrieval of SEMARA in B4 band was used as the output target. A total of 579 340 recorded samples were obtained through the SEMARA method, with 80% of the samples being randomly selected as the training dataset for the FCNN training and the remaining 20% being used as the testing dataset for model verification to obtain an optimal model.

A. SEMARA
The SEMARA method, a combination of SREM and SARA methods, uses surface reflectance estimated by SREM and ground AOD measurements to retrieve AOD within a specific region [67]. The method possesses simple, fast and high-precision advantages and is especially suitable for small-scale satellite aerosol retrieval. The AOD-retrieval principle of SEMARA can be explained by the following equations [11], [13]: where τ a = aerosol optical depth. μ s = cosine of the solar zenith angle. μ v = cosine of the sensor zenith angle. ω 0 = single scattering albedo. P a = aerosol phase function. ρ TOA = reflectance received by satellite at the top of the atmosphere (TOA).
ρ R = atmospheric reflectance due to Rayleigh scattering. ρ s = surface reflectance. T s = atmospheric transmittance along the sun-surface path (downward).
T v = atmospheric transmittance along the surface-sensor path (upward).
T s0 = same as T s but ignoring aerosol information. T v0 = same as T v but ignoring aerosol information. S atm = atmospheric backscattering ratio. S atm0 = same as S atm but ignoring aerosol information. The reflectance received by satellite at the top of the atmosphere, which is a function of measured spectral radiance (L TOA ), cosine of the solar zenith angle (μ s ), earth-sun distance (d) in the astronomical unit, and mean solar exoatmospheric radiation (ESUN), can be calculated using [68] To solve (2) for τ a , the aerosol phase function (P a , (5a)) [69], the scattering angle (Θ, (5b)) [28], the atmospheric reflectance due to Rayleigh scattering (ρ R , (6a)) [2], [70], the Rayleigh scattering phase function (P R , (6b)), the Rayleigh optical depth (τ R , (6c)) [71], the atmospheric transmittance (T s and T v , (7a) and (7b)) and the atmospheric backscattering ratio (S atm , (8)) [72] were calculated where g = symmetry factor. θ v , θ S and ϕ = solar zenith angle, sensor zenith angle and relative azimuth angle, respectively. λ = wavelength in µm The surface reflection (ρ s ) for the SREM method approximated as (3), is different from the TOA reflectance and the real surface reflection obtained by strict atmospheric correction or ground measurement, which corrects the Rayleigh scattering contribution from the atmosphere and retains certain aerosol information. In (3), the atmospheric transmittance of sun-surface path (T s0 ), atmospheric transmittance of surfacesensor path (T v0 ) and atmospheric backscattering ratio (S atm0 ) in an aerosol-free atmosphere can be calculated, respectively, by [13] Overall, τ a be determined by g, ω 0 , ρ TOA and ρ s , where ρ TOA can be obtained from MODIS images and ρ s is available using SREM by (3). The asymmetry factor (g) indicates the relative dominance of forwarding/backscattering and it remains constant for most of the aerosol models and the single scattering albedo (ω 0 ) reveals the relative magnitude of radiation scattering and attenuation caused by aerosol [72]. These two parameters (g and ω 0 ) require support from ground AOD measurement in the region and are solved iteratively by (2), then, the AOD of this region can be retrieved using the fixed point iteration method [73].
Due to the limitation of the SARA method, only AOD can be obtained over a small space range, which does not exploit the spatial-resolution advantages of satellites to the full. Otherwise, the SREM method will fail when the surface reflectance is low due to fixed Rayleigh scattering contribution [74] and the SARA method will fail in the regions with significant surface variance for the estimated aerosol parameter (g and ω 0 ) in (2) are affected by surface reflectance.

B. Fully Connected Neural Network
FCNN models constructed by the three main layers (input layer, hidden layer, and output layer) have a strong nonlinear fitting ability and significant advantages for large amounts of data compared with other ML models (i.e., XGBoost and Random Forest) [49], [50]. The hidden layer included three layers, which had 256, 512, and 512 neurons, respectively. The layers were connected with each other through activation functions combined with weight and bias. The ReLU nonlinear activation function was used in each layer except for the last output layer, which used a linear activation function to ensure meaningful AOD results [48]. The relationship between the input and output of the entire network can be represented by w 1 , w 2 , . . . , w n , b 1 , b 2 , . . . , b n ) (11) where τ represents the AOD outputted by the output layer, F represents the mathematical manipulation function consisting of a series of linear and ReLU nonlinear functions, x 1 , x 2 , x 3 , x 4 , x 5 represent the five predictors of the input layer, w 1 , w 2 , . . . , w n represent weights and b 1 , b 2 , . . . , b n represent biases combined with the activation function among different layers.
The loss function of the NNM can be defined by (11) and the training of the network is actually to continuously adjust the weights and bias to minimize the function [75] where m is the number of the training sample, y represents the real target value (AERONET AOD measurement), x represents the predictors, and w and b represent the weights and biases, respectively. In the article, the mean squared error (MSE), which represented the deviation between true and predicted values, was used as a loss function during the training. The stochastic gradient Descent method (SGD) was used as an optimizer with batch size set as 250 epochs to ensure a stable and robust solution [76]. The dropout layer was used to avoid overfitting by randomly deleting neurons in the hidden layer with a given 0.3 probability [77]. The weight and bias of layers were initialized randomly and the learning rate was initialized as 0.001 [54]. The detailed network parameters are described in Table III. AOD retrieval experiments were performed based on FCNN-SEMARA and SARA methods using MODIS data. To simplify the expression of this article, the AOD retrievals from FCNN-SEMARA, SARA, and MOD04_3K products were abbreviated as "AOD_FCNN," "AOD_SARA," and "AOD_MOD04," respectively, in the following article.

1) Validation and Comparison With AERONET Ground Measurements:
To quantitatively verify the AOD_FCNN, AOD_SARA, and AOD_MOD04 products, the AOD measurements from AERONET Ground sites were used as reference true values. The AERONET sites and data selected had been introduced in Section II. To reasonably select data for validation, it is necessary to match the AOD retrievals with AERONET AOD measurements in time and space. The specific matching strategy in the article is given as follows. The average AOD measured by AERONET ground sites at the MODIS satellite overpass time within ± 30 min was taken as the true value, and the AOD must be obtained by at least two sets of data from the same AERONET ground site, otherwise, it was not involved in the validation. The AOD values of images within 7 × 7-pixel sampling windows (3 × 3 for AOD_MOD04), which were centered on each AERONET site location, were obtained to implement validation experiments and the number of missing values cannot exceed 30% of the total in the sampling windows. Based on the time-and space-matching strategy, the precision validation of AOD_FCNN, AOD_SARA, and AOD_MOD04 in four study regions was performed. It is necessary to note that all data involved in the validation experiments had not been used to train the FCNN model and sites with the most data records in each region provided AODs for the SARA method.
To verify the accuracy and reliability of AOD retrievals, multiple accuracy validation indexes were used, namely, the Pearson correlation coefficient (R), mean absolute error (MAE), root-mean-square error (RMSE), and expected error (EE). To simplify the expression, AOD retrievals falling within, exceeding, and below the EE range are abbreviated as "Within EE," "Above EE," and "Below EE," respectively. The values of these indexes can be calculated by Table IV shows the statistics of selected accuracy validation indexes of the AOD_FCNN, AOD_SARA, and AOD_MOD04 products in the four typical regions over the world, where N represents the number of groups of AOD validation point pairs obtained. Fig. 2 shows the scatterplots and accuracy validation indexes of the three AOD products versus AERONET AOD measurements at 550 nm.
For the BJ region with relatively heterogeneous surface and serious industrial pollution, the AOD_FCNN had the most AOD pairs (626), AOD_MOD04 obtain 406 AOD pairs and AOD_SARA had the least AOD pairs (368). As discussed in Section I, the DT method is limited by surface characteristics and the SARA method needs the support of the ground measured, so fewer AOD points pairs were collected by the two products at the same time and this kind of circumstance also occurred in the other three regions. Table IV and Fig. 2(a) showed that the SARA methods had notable advantages and better accuracy validation indexes (R = 0.9761, MAE = 0.0362 and RMSE = 0.051) compared to AOD_FCNN (R = 0.9007, MAE = 0.0576, and RMSE = 0.0868) and AOD_MOD04 (R = 0.8527), MAE = 0.246 and RMSE = 0.2776) products for the validation of the selected data in BJ region with AERONET sites closed to each other. As explained in Section III, AOD_SARA had high accuracy on a small spatial scale as the SARA method uses an AOD value of AERONET site to retrieve spatial AODs around the site, and the characteristic of SARA also ensures the accuracy and reliability of trained samples of FCNN-SEMARA method. The accuracy and consistency with AERONET measurements of AOD_FCNN were similar to AOD_SARA and were relatively excellent in the urban region with complex aerosol sources. However, the AOD_MOD04 had poor performance with only 17.24% Within EE and had obvious overestimation. The RMSE and MAE were all greater than 2 and the Pearson correlation coefficient (0.8527) of AOD_MDO04 was smaller than that of the AOD_SARA (0.9761) and AOD_FCNN (0.9007). On the contrary, the AOD_SARA and AOD_FCNN had 91.03% and 84.98% AOD retrievals falling within the expected error range, respectively, and the RMSE and MAE were all less than 1. As such, the AOD_FCNN and AOD_SARA retrievals had a   smaller error, higher accuracy, and greater consistency with the AERONET data than the AOD_MOD04 product.
The number of effective AOD points pairs of the AOD_FCNN (N = 406) in the NA region, like the BJ region, was more than that of the AOD_SARA (62) and AOD_MOD04 (134). The AOD_MOD04 had more missing values in the NA region with sparse vegetation and a high-light surface. It is noteworthy that the AOD_SARA had large errors (MAE = 0.5876 and RMSE = 0.7463), consistency with the AERONET data (R = −0.0032) and reliability (Within EE = 4.84%) for a big spatial scale of NA region, which was different from BJ region where the AOD_SARA had excellent performance. The MAE (0.118) and RMSE (0.1611) of AOD_FCNN were smaller compared to that of the AOD_MOD04 (MAE = 0.2145 and RMSE = 0.2547) but the R (0.9141) of AOD_MOD04 is higher than that of the AOD_FCNN (R = 0.84), which indicated that the AOD_MOD04 had better consistency with AERONET data but a worse precision compared to AOD_FCNN. In addition, Fig. 2(b) showed that the percentage of retrievals within the EE of AOD_FCNN (65.27%) was significantly higher than AOD_MOD04 (28.36%), which indicated that most of the FCNN-SEMARA retrievals were closed to the AERONET AOD measurements as observed in the region.
AERONET sites of the SA and SEUS similar to those of the NA region are separating, which are not located in the appropriate space of the SARA method. The AOD_SARA performed poorly compared to the other two AOD products in the two study regions for the limitations of the SARA method, so only the performance of AOD_FCNN and AOD_MOD04 were compared in the following article. In the SA, Table IV and Fig. 2(c) showed that the retrieval results of the AOD_FCNN and AOD_MOD04 were generally similar and the Pearson correlation coefficients validated of these two products were identical (R = 0.7877). The difference of MAE and RMSE between the AOD_FCNN (MAE = 0.1064 and RMSE = 0.1363) and AOD_MDO04 (MAE = 0.1013 and RMSE = 0.1497) were less than 0.02, which indicated that these products had similar high accuracy and smaller errors. The percentage of retrievals within the EE of the AOD_FCNN (72.05%) and AOD_MOD04 (79.21%) were all more than 70%, which proved that the stability of the retrievals and the AODs distribution of these two products was not much different. As shown in Fig. 2(d), the MAE (0.0356) and RMSE (0.0512) of the AOD_FCNN, like the SA region, differed from those of the AOD_MOD04 (MAE = 0.0359, RMSE = 0.0508) by less than 0.02. The difference of the Within EE of AOD_FCNN (88.38%) and AOD_MOD04 (87.67%) was less than 1%, indicating these two products had approximately similar degrees and small differences in the region. However, the Pearson correlation coefficient of the AOD_FCNN (0.7664) was significantly lower compared to that of the AOD_MOD04 (0.8857), which indicated AOD retrievals of FCNN-SEMARA For all four regions combined, the scatterplots and accuracy validation indexes of the three AOD products against AERONET AOD measurements at 550 nm were shown in Fig. 3. It can be seen from the figure that most of the AOD values of the AOD_FCNN were close to the 1:1 line and had centralized distribution [see Fig. 3(b)], and significant overestimation and discrete distribution were observed in the AOD_MOD04 [see Fig. 3(a)] and AOD_SARA [see Fig. 3(c)], which proved that the AOD_FCNN had better stability of the retrievals than the other AOD products in this article. The accuracy validation indexes of the AOD_FCNN were all the best among these three products, which can be intuitively seen in Fig. 3(d). Table IV can also prove the conclusion of Fig. 3(d) that the percentage of retrievals within the EE of the AOD_FCNN increased by 9% to 44%, the RMSE decreased by 45% to 56%, the MAE decreased by 35% to 49% and the R increased by 17% to 51% compared to the other AOD products.
Overall, the validation results proved that the SARA method was not suitable for large-scale AOD retrieval and the requirement of ground measurements limits its further application. The AOD_MOD04 had excellent performance in the regions with dense vegetation and relatively dark surface (e.g., SEUS), but there was a clear overestimation or underestimation in the urban and arid regions (e.g., BJ and NA). On the contrary, the FCNN-SEMARA method had excellent performance in every region and was suitable for all four-study regions as every accuracy validation index of the AOD_FCNN was better than the AOD_MOD04 and AOD_SARA in all study regions.
As discussed previously, the SARA method cannot well represent the spatial variation of AOD on a big spatial scale, so the spatial distribution of AOD_SARA would not be shown and discussed in this section.
To compare better in space and save computing costs, MOD021KM with 1 km resolution was used to implement the FCNN-SEMARA method rather than MOD02HKM with 500 km, which was used for quantitative validation. Spatially representative AOD images with low cloud cover were selected for analysis and comparison of AOD_FCNN and AOD_MOD04. Fig. 4 showed the AOD spatial distribution of AOD_FCNN and AOD_MOD04, which contained four study regions on May 18, January 5, January 20, and January 31, 2018, respectively. From these figures, AOD_FCNN and AOD_MOD04 had similar spatial variation trends, which were following the actual circumstance in the four regions. For example, the MOD021KM image (g) of the North of South Asia was fuzzy compared to its surrounding areas, which probably had high aerosol loading and the corresponding high AOD values were observed in these two AOD products images [(h) and (i)]. The AOD_FCNN provided more details about the spatial distribution of AOD than AOD_MOD04 because of its higher spatial resolution, and AOD_FCNN had better spatial coverage and continuity while AOD_MOD04 showed a large number of missing values and data gaps, particularly in the areas of arid and urban of West and Central Africa and Eastern Asia. The missing AOD retrievals of AOD_MOD04 were due to the inevitable limitation of the DT method, which required surface reflectance and too bright a surface would invalidate the algorithm. The AOD_FCNN could be obtained using the FCCN model and single scene satellite image, which was not limited by surface reflectance and theoretically suitable for aerosol retrieval over various cloud-free surfaces. In addition, higher AOD values were occasionally observed in the AOD_MOD04 compared to AOD_FCNN, and the high values (>1.5) of AOD_FCNN in the Southeastern US were probably caused by cloud contamination. As shown in Figs. 2 and 4, The AOD values in Eastern Asia, West and Central Africa and South Asia were usually large on account of the serious air pollution and sand-dust transport and were low in the Southeastern US owing to the clean atmosphere and high vegetation coverage, which were also confirmed in the previous validation experiments.

B. Importance of Auxiliary Data to FCNN-SEMARA
In this article, auxiliary data (total columnar water vapor, total columnar ozone, surface pressure, temperature, and DEM) related to aerosol was used to participate in sample construction of the FCNN model and the trained network model has an excellent performance in AOD retrieval. However, Most AOD retrieval methods based on physical mechanisms only use spectral information provided by satellite images without the auxiliary data to retrieve AOD. To examine whether the FCNN model utilizing auxiliary data has better accuracy, a model without auxiliary data (only spectral and geometric data) was obtained to retrieve AOD and compared with the FCNN model utilizing auxiliary data in the article. It should be noted that the network structure and parameters of these two models were the same except for the predictors of auxiliary data and the accuracy validation strategy was the same as that described in Section IV-A. Their training error variation with the number of training iterations, and the density scatterplots and accuracy validation indexes versus AERONET AOD measurements at 550 nm in all four study regions were shown in Fig. 5(a)-(d), respectively.
From Fig. 5, the training error decreased with an increase in the number of training iterations, especially for less than 50 iterations. The convergence rate of the model without auxiliary data was slightly faster than that of the model utilizing auxiliary data but the oscillation was more significant at the end of the iteration for fewer predictors [see Fig. 5(b)]. The train loss and valid loss of the model without auxiliary data were 0.010357 and 0.009983, respectively, and higher than those of the model utilizing auxiliary data (train loss = 0.008004 and valid loss = 0.005667), which proved that auxiliary data can make network training more perfect but increase the number of iterations. From Fig. 5(c) and (d), the difference in AOD point-pairs number in the two models (auxiliary data: 1748 and no auxiliary data: 1756) was due to the difference in predictors and the difference in the number of 2 occurred in the SEUS. The Pearson correlation coefficient and the Within EE of the model utilizing auxiliary data were 0.8983 and 79.69, respectively, and higher than those of another model (R = 0.842 and Within EE = 73.08), and no significant overestimation (10.07) or underestimation (10.24) was observed in the model utilizing auxiliary data compared to another model with significant underestimation (Below EE = 18.96), which indicated that the AOD retrievals of the model utilizing auxiliary data had better consistency with the AERONET data and most of them were closed to the ground measurements. The other two accuracy validation indexes of the model utilizing auxiliary data, MAE (0.072) and RMSE (0.109), were lower than those of another model (MAE = 0.0879 and RMSE = 0.1399), proving that the model utilizing auxiliary data had better accuracy and smaller bias as expected. Consequently, the auxiliary data played an important role in the AOD retrieval of the FCNN model and should be used as predictors in the MLM account for its improvement of accuracy.

C. Influence of Different Training Parameters on FCNN-SEMARA
To verify the stability of the new method with different training parameters, the different number of hidden layers (256, 512, 512, 512, and 512), optimizer (Adam), and batch size (1024) compared to the abovementioned model was used to train different models. These models were applied to retrieve AOD in all four study regions like Section IV-A. Their retrievals were validated by the AERONET measurements and the results were shown in Fig. 6.
The selected model with training parameters of the number of hidden layers (256, 512, and 512), optimizer (SGD) and batch size (512) performed well with the best accuracy validation indexes compared to other models with different training parameters. For increasing the number of hidden layers [see Fig. 6(b)], the retrievals of the model with a more complex structure showed a slightly larger error (MAE = 0.0775 and RMSE = 0.1198) and less fell into the EE range (77.48%). However, the correlation coefficient (0.8925) and effective retrievals (1745) of the complex model were similar to the model with fewer hidden layers. Compared to the SGD, the Adam optimizer was less suitable for the AOD-retrieval FCNN model because the model using this optimizer performs worse with lower accuracy (MAE = 0.0982, RMSE = 0.147 and R = 0.8063) and fewer retrievals falling in the EE range (62.81%) but with same retrieval number (1748). Increasing the training batch size from 512 to 1024, the training was accelerated and the retrievals of the model with the larger batch size had a smaller bias (MAE = 0.0682 and RMSE = 0.105), similar R (0.8849) and number falling into the EE range (79.35%) but had significantly fewer effective retrievals (1482).
In general, the models with different training parameters (the number of hidden layers, optimizer and batch size) showed different performances. The model selected in the article is the optimal model for AOD retrieval and the other models were similar to the optimal model, which indicated that the FCNN model with the selected parameters in the proposed method is stable and reliable.

V. CONCLUSION
The FCNN-SEMARA method, based on the FCNN model and SEMARA method, is proposed for large-scale and longterm aerosol retrieval over different regions. The study includes aerosol retrieval in small local spatial and temporal scales using the SEMARA algorithm and MODIS data, which provided input train samples for the FCNN model. To ensure the representativeness of the sample, four typical regions around the world were selected for sample construction. Spectral data, Geometric data and auxiliary data were used as predictors, and the target value of output was AOD from SEMARA at 550 nm. Selecting representative results of the SEMARA method in each region of one year as train-sample sources, as several satellite images and sites could supply enough samples due to the large number of pixels in a single-scene image. The FCNN model was trained by the sample data to gain the optimal weighting and bias of each network layer for high-precision aerosol retrieval.
To verify the AOD retrieval accuracy of the FCNN-SEMARA method, the trained FCNN model and SARA method were used to perform AOD retrieval experiments using MODIS data in four typical regions around the world, and the AODs from AOD_MOD04, AOD_SARA, and AOD_MOD04 were, respectively, compared with AERONET AOD measurements. The results showed that the AOD_FCNN with low values of the MAE (0.072) and RMSE (0.109) had higher accuracy and reliability than AOD_MOD04 (MAE = 0.1403 and RMSE = 0.1972) and AOD_SARA (MAE = 0.1106 and RMSE = 0.2465) against the AERONET AOD, and a larger percentage of AOD retrievals fell within the expected error range (Within EE = 79.69%) compared with AOD_MOD04 (Within EE = 55.24%) and AOD_SARA (Within EE = 72.85%). In addition, the AOD_FCNN also had better consistency with AERONET AOD(R = 0.8983) compared to AOD_MOD04 (R = 0.7692) and AOD_SARA (R = 0.5968). Moreover, an intuitive spatial comparison of the AOD_FCNN and AOD_MOD04 indicated the AOD_MOD04 had better spatial continuity and coverage, especially in the areas with highlight surfaces and high aerosol loading. Overall, the AOD_FCNN had obvious advantages over the AOD_MOD04 and AOD_SARA in AOD-retrieval accuracy, reliability, quantity, and spatial continuity; From the performance of AOD_FCNN in the four study regions, the MAE values of AOD_FCNN in the four regions (BJ, WCA, SA, and SEUS) were 0.0579, 0.118, 0.1064, and 0.0356, respectively, RMSE values were 0.0868, 0.1611, 0.1363, and 0.0512, respectively, Within EE values were 84.98%, 65.27%, 72.05%, and 88.5%, respectively, and R were 0.9007, 0.84, 0.78774, and 0.7664, respectively. Especially, a relatively large RMSE value was observed in the WCA with bright surfaces and high aerosol loading, where the sand and dust weather in the dry season are frequent. Similarly, there was also the same problem of low precision in the SA, where the surface structure was heterogeneous and aerosol sources were complex. The possible reason was that the underestimation of the reflectance of the surface of these two regions by SREM in the SEMARA method leads to the overestimation of AOD [13], which led to a decrease in the quality of the training sample and affected the performance of the neural network model. This is a limitation occurring in the SEMARA method, which can be reduced by shrinking the spatial scope of the retrieval of the SEMASA method when constructing a sample for the FCCN model.
The representativeness of the sample is very important for current ML algorithms and the ML algorithms should not only use spectral information from satellite images but also consider auxiliary data, which is highly correlated with AOD. The addition of auxiliary data improved the AOD-retrieval accuracy and reliability of the FCNN-SEMARA algorithm in the article. In comparison with the SARA algorithm, the new algorithm can achieve aerosol retrieval at a larger spatial and temporal scale without the support of ground-measured AOD data. Compared with the general ML algorithms, the new algorithm greatly simplifies the sample construction method and uses the auxiliary data, and has high accuracy and efficiency. Furthermore, it can provide high-precision, reliable and stable spatial distribution of AOD in large-scale regions by utilization of only a single scene satellite image and auxiliary data. So, as FCNN-SEMARA has an important significance for real-time monitoring of atmospheric aerosols and has some potential to be easily applied to other satellite sensors.
In order to further improve the accuracy and adaptability of the algorithm, here are some works that should be done to improve the algorithm in the future.
1) For MODIS data, the comparison of this algorithm with more mature algorithms (e.g., MAIAC and DB) will be performed in order to fully discover the shortcomings and advantages of the algorithm. 2) The performance and uncertainty of the algorithm in regions with complex surface types and aerosol sources on a global scale should be analyzed, especially in East Asia, South Asia, and North Africa. 3) Verifying and comparing the performance of the algorithm when applied to other satellites, especially geostationary satellites with a high temporal resolution, which should be the advantage of this new algorithm compared to other algorithms. 4) Like other aerosol retrieval algorithms, the cloud also has a great impact on the performance of the algorithm. Therefore, cloud detection algorithms with higher accuracy for different sensors need to be proposed and applied.