Model-Based Sensitivity Analysis on Aerosol Optical Thickness Prediction

Prediction of aerosol optical thickness (AOT) is important for studying worldwide climate change. Researchers have built multiple AOT prediction models, but few studies have focused on validating the input attributes for AOT regression. In this paper, we propose a support vector regression (SVR) model-based sensitivity analysis approach to order 35 MODIS input attributes according to their sensitivity to prediction outputs. Next, the attribute sensitivity orders are used for feature selection in the context of regression, either by removing insensitive attributes one at a time or by removing all attributes whose sensitivity orders are larger than a number k. Experimental results based on data collocated between MODIS and AERONET from 2009 to 2011 show that the 10 most insensitive attributes can be screened out to speed up prediction model computation with very little loss of accuracy. The results also suggest that the most sensitive attributes are the most informative ones, requiring the highest precision for accurate AOT prediction. Our approach will therefore be valuable for remote sensing and atmospheric scientists seeking to optimize the design precision of the most sensitive attributes in scanning equipment like MODIS and thereby improve AOT retrieval accuracy.


Introduction
Aerosols are small solid or liquid particles produced by natural or man-made sources. Research on atmospheric aerosols is very useful for revealing the mechanisms of the earth's solar radiation budget, water cycle balance, and climate change dynamics [1,2]. Aerosol optical thickness (AOT) is one of the most important aerosol properties. AOT has been computed by both ground-based and satellite-based methods for years, and many aerosol retrieval theories and algorithms have been proposed. Although ground-based measurements, such as AERONET (Aerosol Robotic Network), have proved effective and highly accurate for AOT retrieval, they are correspondingly costly and confined to a small number of sporadic land observation sites. Satellite-based measurements based on domain models, such as MODIS (Moderate Resolution Imaging Spectroradiometer), have global coverage at low cost, but their accuracy is limited by the complex nature of the chemical and physical processes affecting aerosols.
However, few studies have focused on the validation of input attributes, that is, feature selection, for regression models. Although feature selection has been studied in classification tasks [14][15][16][17], few works have addressed regression tasks. To some degree, too many input attributes can be redundant or noisy for accurate model prediction. Particularly for some types of data, such as images, multiple feature extraction approaches can be applied to obtain many features [18,19], but only some of them are informative for regression models. It is therefore very useful to study the feature selection problem in the context of regression.
In this paper, we propose a novel model-based sensitivity analysis approach for feature selection in AOT regression. The approach combines a support vector regression (SVR) model with a sensitivity analysis (SA) method to validate the usefulness of model input attributes. SA is a useful tool for ascertaining how the output of a given model depends on its input attributes. It has been developed in a growing number of fields, such as domain modeling in hydrology, ecosystems, structural engineering, and so forth, where computational models are used to simulate the real world [20][21][22][23][24][25][26]. SA helps the modeler understand the model better, especially when the model is complex and unknown correlations exist among input attributes.
Our approach consists of three steps. First, we build an optimized support vector regression (SVR) model to predict AOT, since the SVR model has been demonstrated to achieve better prediction accuracy than other effective machine learning approaches for AOT prediction [3]. Next, we propose a model-based sensitivity analysis approach to order the input attributes by their sensitivity effects on the prediction model outputs. Finally, we compare the prediction accuracy of the model with full inputs to that of models obtained by removing insensitive attributes one at a time or by removing all attributes whose sensitivity orders are larger than a number k. The experimental results based on data collocated between MODIS and AERONET from 2009 to 2011 show that the 10 most insensitive attributes can be screened out to speed up prediction model computation with very little loss of accuracy. The results also suggest that the most sensitive attributes are the most informative ones, requiring the highest precision for accurate AOT prediction. Thereby, our model-based sensitivity analysis method will be valuable for remote sensing or atmospheric scientists to optimize the design precision of the most sensitive attributes in scanning equipment like MODIS and thereby improve AOT retrieval accuracy.
The contribution of this paper is twofold. On the one hand, feature selection is normally studied in data classification tasks, and few researchers study the problem in the context of regression prediction; in this paper, we propose a novel model-based sensitivity analysis method for this purpose. On the other hand, experimental results show that our method not only refines the inputs to the AOT prediction model but also provides valuable insights for remote sensing or atmospheric scientists to optimize observation equipment design and thereby improve their geophysical parameter retrieval algorithms.

Related Work
In recent years, several data-driven retrieval approaches or machine learning methods have been proposed for AOT prediction and validation [3][4][5][6][7][8][9][10][11][12][13]. Radosavljevic et al. used MODIS radiance observations as inputs and predicted AERONET AOT by neural networks [4]. Further, they applied five measures to evaluate the AOT retrieval accuracy [5]. Both sets of experimental results showed that the proposed ensemble of neural networks was significantly more accurate than domain-based AOT retrievals for all measures. To make models more accurate, Radosavljevic et al. proposed that the predictors should be customized according to different spatiotemporal partitions [6]. They also explored reducing the number of AERONET sites, selecting only the most informative neighboring sites to improve accuracy [7]. Another study proposed a technique for regression and uncertainty estimation [8]. Das et al. argued that the AOT predictor could be enhanced by combining an active learning method with neural networks [9]. Han et al. applied a statistical approach to predict AOT as a complement to the domain algorithm [10]. In their research, two statistical approaches, spatial interpolation and neural network predictors, were explored. The results showed that a statistical approach could serve as a useful complement to traditional deterministic methods with reduced computational effort. Albayrak et al. used a neural network algorithm with one hidden layer to build a global bias adjustment model to improve MODIS AOT retrieval accuracy [11]. Besides neural network methods, support vector regression (SVR) was used for AOT prediction by Nguyen et al. [12]. They used an instance data set and an aggregate data set, respectively, to build two SVR models for AOT prediction and achieved more accurate results than neural network predictions. In addition, Djuric et al.
proposed a semisupervised approach to integrate AOD estimations from multiple satellite sensors and make more accurate estimations [13]. However, none of the aforementioned work used a data-driven approach to validate the input attributes and check whether the inputs really make significant contributions to AOT regression models.

Method
In this section, we explain the proposed model-based sensitivity analysis approach for feature selection in AOT regression. The approach consists of three steps: building an SVR prediction model, SVR model-based sensitivity analysis, and feature selection for regression. The processing steps are illustrated in Figure 1. We explain each step in detail as follows.
3.1. Prediction Model: SVR. SVR is a machine learning method for regression prediction. It was built on the basis of the Vapnik-Chervonenkis dimension theory and the structural risk minimization principle in statistical learning. SVR maps the model's input attributes from a lower-dimensional nonlinear space to a higher-dimensional feature space and tries to find the best regression hyperplane. The most commonly used SVR model is epsilon-SVR. Suppose the training dataset is composed of points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $x_i = \langle a_{i1}, a_{i2}, \ldots, a_{im} \rangle$ is a feature vector of $m$ attributes for sample point $i$, $a_{ij}$ is attribute $j$, $y_i$ is the regression target for $x_i$, and $n$ is the number of points in the dataset. In epsilon-SVR, we aim to find a regression hyperplane with an epsilon-insensitive loss function as follows:

$$f(x) = \langle w, \phi(x) \rangle + b, \quad (1)$$

where $w$ and $b$ are the weight vector and offset, respectively, and $\phi$ is a nonlinear mapping function from the input attribute space to a high-dimensional feature space. $\langle \cdot, \cdot \rangle$ represents the inner product of the involved parameters.
For the function $f$ to be epsilon-insensitive and also as flat as possible, we have the following objective function and constraints for SVR:

$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)$$
$$\text{s.t.} \quad y_i - \langle w, \phi(x_i) \rangle - b \le \epsilon + \xi_i, \quad \langle w, \phi(x_i) \rangle + b - y_i \le \epsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0. \quad (2)$$

The constant parameter $C > 0$ determines the trade-off between the flatness of $f$ and the training error. $\xi_i$ and $\xi_i^*$ are slack variables measuring the amount by which a prediction exceeds the target value by more than epsilon or falls below it by more than epsilon.
By introducing the Lagrangian and performing optimization, we obtain the regression hyperplane in the following dual representation:

$$f(x) = \sum_{i=1}^{n} \left( \alpha_i - \alpha_i^* \right) K(x_i, x) + b. \quad (3)$$

Here, $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers, and $K(x_i, x)$ is the kernel function, which represents the inner product $\langle \phi(x_i), \phi(x) \rangle$.
There are four major types of kernel functions: linear kernel, polynomial kernel, radial basis function kernel, and sigmoid kernel. In this study, we use epsilon-SVR with the radial basis function kernel provided by libsvm [27] for AOT prediction. Formula (4) shows the Gaussian kernel, the radial basis function kernel used by libsvm:

$$K(x_i, x_j) = \exp\left( -\gamma \|x_i - x_j\|^2 \right). \quad (4)$$

Here $x_i$ and $x_j$ are two samples in the dataset, represented as vectors in the input attribute space, $\|x_i - x_j\|^2$ is the squared Euclidean distance between them, and $\gamma$ is a free parameter.
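As a concrete illustration, an epsilon-SVR model with the Gaussian kernel of formula (4) can be sketched in Python with scikit-learn, whose SVR class wraps libsvm. The data below are synthetic stand-ins for the MODIS attributes, and the parameter values are arbitrary placeholders rather than the optimized settings reported later.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for n samples of m = 35 normalized attributes.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 35))
y = 0.5 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.05, 200)

# epsilon-SVR with the RBF (Gaussian) kernel of formula (4):
# K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).
model = SVR(kernel="rbf", C=1.0, gamma=0.1, epsilon=0.01)
model.fit(X, y)
pred = model.predict(X)
```

In practice the epsilon-insensitive tube width, cost, and kernel width must be tuned, which is what the grid search in Section 4 does.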

Model-Based Sensitivity Analysis.
To apply feature selection, we first need to order the attributes according to their contribution to the regression model, such that the most informative attributes are at the top and the noisy or useless attributes are at the bottom. In this paper, we combine the SVR regression model with the SA method to propose a model-based sensitivity analysis approach for deciding the attribute order. SA is a powerful method widely used in studying the uncertainty of model inputs. It orders input attributes by the degree to which they influence the model outputs: the most sensitive attributes, with the biggest impact on the outputs, are ranked at the top, and the insensitive attributes, with little or no impact on the outputs, are ranked at the bottom.
In our approach, using the kernel function shown in formula (4), we build the epsilon-SVR regression model.
As explained before, $x_i$ is a vector of multiple attributes; that is, $x_i = \langle a_1, a_2, \ldots, a_m \rangle$. To analyze the influence of a small offset $\Delta$ of attribute variable $a_j$ on the dependent variable $y_i$, we use the following formula:

$$S_j(\Delta) = \left| f(a_1, \ldots, a_j + \Delta, \ldots, a_m) - f(a_1, \ldots, a_j, \ldots, a_m) \right|. \quad (5)$$

The larger $S_j(\Delta)$ is, the more influence $a_j$ has on $y_i$. Therefore, we change the value of each attribute slightly, one at a time, over $m$ rotation cycles, obtaining $m$ changes of the SVR model outputs. Sorting the input attributes by their influences on the SVR outputs gives the attribute sensitivity order.
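The perturbation-based ranking of formula (5) can be sketched as follows, assuming a fitted scikit-learn-style model with a `predict` method. The synthetic data are constructed so that attribute 0 dominates the target, which the ranking should recover.

```python
import numpy as np
from sklearn.svm import SVR

def sensitivity_order(model, X, delta=0.01):
    """Rank attributes by the mean absolute change in model output
    when each attribute is perturbed by delta, as in formula (5)."""
    base = model.predict(X)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += delta                     # offset one attribute at a time
        scores[j] = np.mean(np.abs(model.predict(Xp) - base))
    return np.argsort(-scores), scores        # most sensitive attribute first

# Synthetic data in which attribute 0 dominates the target by construction.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 5))
y = 3.0 * X[:, 0] + 0.1 * X[:, 1]
model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.01).fit(X, y)
order, scores = sensitivity_order(model, X, delta=0.01)
```

Averaging the output change over all samples, rather than evaluating a single point, is one reasonable way to turn the per-sample quantity of formula (5) into a per-attribute score.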

Feature Selection by Filtering out Insensitive Attributes.
After we obtained the sensitivity order of input attributes to regression outputs, we can apply feature selection by filtering out insensitive attributes.
We have designed two types of feature selection. One is univariate filtering: according to the reversed sensitivity order, we remove one insensitive attribute from the inputs to the SVR regression model at a time. Through experiments, we can pragmatically determine which attributes can be filtered out from the inputs with little or no loss of regression accuracy, or even with improved accuracy. The other type is multivariate filtering, in which we leave out all attributes whose sensitivity orders are larger than a number k, where k is optimized by experiments. The SVR regression model with the remaining attributes should achieve accuracy similar to that of the model with the full attributes.
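The two filtering modes reduce to index manipulations over a sensitivity order array. In this sketch, `order` is a hypothetical ranking listing attribute indices from most to least sensitive; the small matrix is illustrative only.

```python
import numpy as np

def univariate_filter(X, order, i):
    """Univariate filtering: drop the i-th least sensitive attribute
    (i = 0 drops the most insensitive one), keeping all others."""
    drop = order[len(order) - 1 - i]
    return np.delete(X, drop, axis=1)

def multivariate_filter(X, order, k):
    """Multivariate filtering: keep only the k most sensitive attributes;
    all attributes with sensitivity order larger than k are left out."""
    return X[:, order[:k]]

X = np.arange(20.0).reshape(4, 5)           # 4 samples, 5 attributes
order = np.array([2, 0, 4, 1, 3])           # hypothetical sensitivity ranking
X_uni = univariate_filter(X, order, 0)      # drops attribute 3 (least sensitive)
X_multi = multivariate_filter(X, order, 2)  # keeps attributes 2 and 0
```

After each removal, the SVR model is retrained on the reduced attribute set and re-evaluated, as described in the experiments below.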

Measurements of Prediction Accuracy.
Our sensitivity analysis is built on the SVR regression model. Each input attribute is evaluated by the degree to which its sensitivity impacts the SVR prediction accuracy. To judge this impact fully, we select multiple regression accuracy measures widely used in AOT retrievals.
The simplest and plainest measurement is the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,$$

where $y_i$ is the AERONET AOT serving as ground truth and $\hat{y}_i$ is the corresponding AOT prediction obtained from the SVR model.
Another commonly used measure is the coefficient of determination ($R^2$):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}.$$

Here, $\bar{y}$ is the mean value of $y$ across all $n$ collected samples. The highest $R^2$ accuracy is 1; the closer it is to 1, the more accurate the model output is.
The correlation coefficient (CORR) indicates the degree of correlation between the truth and prediction variables; it is often used to measure regression accuracy and is defined as

$$\mathrm{CORR} = \frac{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)\left( \hat{y}_i - \bar{\hat{y}} \right)}{\sqrt{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \sqrt{\sum_{i=1}^{n} \left( \hat{y}_i - \bar{\hat{y}} \right)^2}}.$$

According to domain scientists, there are inherent measurement errors in MODIS AOT retrievals [2]. The expected boundary is defined as

$$\left| \hat{y}_i - y_i \right| \le 0.05 + 0.15\, y_i.$$

Based on this boundary, two domain-specific measurements of AOT retrieval accuracy, mean square relative error (MSRE) and fraction of successful predictions (FRAC), were proposed in [4]. We also use these two measurements to evaluate our model.
MSRE is defined as

$$\mathrm{MSRE} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{0.05 + 0.15\, y_i} \right)^2.$$

The closer it is to 0, the more accurate the AOT predictor is.
FRAC is defined as

$$\mathrm{FRAC} = \frac{n_c}{n},$$

where $n_c$ is the number of predictions that fall within the expected boundary. The closer it is to 1, the more accurate the AOT regression model is.
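All five measures can be computed together. Note that the $0.05 + 0.15\,y$ form of the expected boundary is the commonly cited MODIS over-land expected error and is an assumption here; the code labels it as such.

```python
import numpy as np

def aot_measures(y_true, y_pred):
    """Compute the five accuracy measures used in this paper.
    The expected boundary 0.05 + 0.15 * y is the commonly cited
    MODIS over-land form (an assumption, not taken from this paper)."""
    mse = np.mean((y_true - y_pred) ** 2)
    r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    corr = np.corrcoef(y_true, y_pred)[0, 1]
    bound = 0.05 + 0.15 * y_true               # expected error boundary
    msre = np.mean(((y_true - y_pred) / bound) ** 2)
    frac = np.mean(np.abs(y_pred - y_true) <= bound)
    return {"MSE": mse, "R2": r2, "CORR": corr, "MSRE": msre, "FRAC": frac}
```

A perfect predictor yields MSE = MSRE = 0 and $R^2$ = CORR = FRAC = 1, which gives a quick sanity check on the implementation.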

Experimental Results
4.1. Data Set. Like many previous studies, we use AERONET retrievals as ground truth for validating our prediction outputs. We randomly picked 24 AERONET sites among the 40 whose longitude is between 70°E and 140°E and latitude between 20°N and 50°N (shown in Figure 2), and collected their level 2.0 cloud-screened and quality-assured AERONET data from 2009 to 2011. Since AERONET does not provide AOT retrievals at a wavelength of 550 nm, we interpolated it from the measurements at 440 nm and 675 nm using the Ångström power law:

$$\tau_{550} = \tau_{440} \left( \frac{550}{440} \right)^{-\alpha}, \quad \alpha = \frac{\ln\left( \tau_{440} / \tau_{675} \right)}{\ln\left( 675 / 440 \right)}.$$

As for MODIS, we chose three aerosol-related products from the MODIS instrument aboard Terra: MOD02SSH level 1B radiance data with a spatial resolution of 5 km, MOD04 level 2 aerosol retrievals, and the MOD35 level 2 cloud mask product with a resolution of 1 km. To collocate MODIS data with AERONET data spatially, we applied a region box of ±0.15 degrees in latitude and longitude around the corresponding AERONET site when considering MODIS data. MODIS information in the region box was synchronized with the temporal mean of the AERONET AOT observations taken within ±30 minutes of the MODIS overpass. In this way, we derived 35 attributes from MODIS products, represented as $x = \langle a_1, a_2, \ldots, a_{35} \rangle$.
$a_1$ is the number of validated gflags in the region; $a_2$ is the AOT at 550 nm for both ocean (best) and land (corrected) with best quality data; $a_3$-$a_5$ are the corrected optical thickness over land at 470, 550, and 660 nm from the MODIS deterministic algorithm; $a_6$-$a_{19}$ are the average and minimum MODIS radiances over cloud-free pixels for seven wavelengths between 0.47 and 2.1 μm; $a_{20}$-$a_{26}$ are the average MODIS radiance uncertainties for the seven wavelengths over cloud-free pixels; $a_{27}$-$a_{31}$ are the numbers of cloud-free pixels over water, coastal, desert, and land in the region; and $a_{32}$-$a_{35}$ are mode analysis results of the first four bytes in the MODIS quality assurance (QA) flags of land. MODIS provides five bytes in the QA flags of land in total, but we observed that the fifth byte is constant 0 throughout the collected samples, so it was omitted and we use only the first four bytes.
A piece of MODIS data was taken into account only when the MODIS data within the region described above contained at least one noncloud pixel and at least one AERONET AOT retrieval was available within ±30 minutes of the MODIS overpass. In total, we obtained 1080 spatially and temporally collocated data samples.
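The 550 nm interpolation described above can be sketched in a few lines; the exact formula is assumed to be the standard Ångström power law derived from the two neighboring AERONET wavelengths.

```python
import numpy as np

def aot_550(tau_440, tau_675):
    """Interpolate AOT to 550 nm using the Angstrom exponent derived
    from the 440 nm and 675 nm retrievals (assumed power-law form)."""
    alpha = np.log(tau_440 / tau_675) / np.log(675.0 / 440.0)  # Angstrom exponent
    return tau_440 * (550.0 / 440.0) ** (-alpha)
```

For equal optical thickness at both wavelengths the exponent is zero and the 550 nm value equals the input; otherwise the interpolated value lies between the two measurements for a monotonic spectrum.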

SVR Model Optimization.
To build an SVR prediction model, we use the tool libsvm [27]. It is an efficient and widely used SVM tool package that can be applied to many classification and regression problems and provides four types of common kernel functions. We first format the experimental data into libsvm's form, in which each sample is a target value followed by indexed attribute values:

$$y_i \quad 1{:}a_1 \quad 2{:}a_2 \quad \cdots \quad 35{:}a_{35}$$

The AERONET AOT retrieval result is the label $y_i$ above, followed by the 35 MODIS-derived attributes.
Then we normalize the input attributes of the formatted samples: all attribute values are scaled into the range [−1, 1]. After that, 5 rounds of random selection were applied to the 1080 samples, producing 5 groups of training and test sets. In each group, 540 samples were selected as the training set and the remaining samples as the test set. The SVR model was trained on each training set and applied to each corresponding test set. $R^2$ was used to evaluate the model.
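The normalization and the five random splits can be sketched as follows; synthetic values stand in for the 1080 collocated samples.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 100.0, size=(1080, 35))  # stand-in for the collocated samples
y = rng.uniform(0.0, 1.0, size=1080)

# Scale every attribute into [-1, 1] before SVR training.
lo, hi = X.min(axis=0), X.max(axis=0)
X_scaled = 2.0 * (X - lo) / (hi - lo) - 1.0

# 5 rounds of random 540/540 splits into training and test sets.
splits = []
for _ in range(5):
    idx = rng.permutation(len(X_scaled))
    splits.append((idx[:540], idx[540:]))
```

When applying the model to new data, the same per-attribute minima and maxima from the training data should be reused so that test samples are scaled consistently.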
In order to obtain an optimal model, the parameters of each SVR model are optimized using the grid search provided by libsvm. We choose epsilon-SVR and the radial basis kernel function. The optimized parameters are $C$, $\gamma$, and $\epsilon$, representing the cost in epsilon-SVR, the gamma in the kernel function, and the epsilon in the loss function of epsilon-SVR, respectively. Other parameters are set to their default values.
First, the searching ranges of the three parameters were all set to $[2^{-15}, 2^{15}]$, and the search step increment was set to $2^2$.
For each parameter setting, the model was trained on the 5 training sets and tested on the corresponding test sets. The 5 rounds of resulting $R^2$ values are reported in Table 2. From the results, we see that through all 5 rounds of modeling and testing, the optimized parameters are very close to each other and the $R^2$ values are around 0.81. This shows that the SVR model stably achieves optimization with these settings.
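A grid search in the same spirit can be sketched with scikit-learn's GridSearchCV, which also wraps libsvm's epsilon-SVR. The exponent ranges are shrunk from the paper's $[2^{-15}, 2^{15}]$ so the sketch runs quickly, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, size=(120, 6))
y = X[:, 0] + 0.2 * X[:, 1] ** 2              # simple noise-free target

# Coarse power-of-two grid over cost, kernel width, and tube width;
# the paper searched [2^-15, 2^15] with a step of 2^2 over each.
grid = {
    "C":       [2.0 ** p for p in range(-3, 5, 2)],
    "gamma":   [2.0 ** p for p in range(-5, 1, 2)],
    "epsilon": [2.0 ** p for p in range(-7, -2, 2)],
}
search = GridSearchCV(SVR(kernel="rbf"), grid, scoring="r2", cv=3)
search.fit(X, y)
best = search.best_params_
```

Cross-validated $R^2$ plays the role of the per-split $R^2$ evaluation described above; the best parameters are then used to retrain on the full training set.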

Sensitivity Analysis Experiments.
To explore the impact of each input attribute on the model prediction output, we change the value of each attribute by a small offset, one at a time, and then calculate the prediction output difference as described in formula (5).
By sorting $S_j(\Delta)$ in descending order with a constant $\Delta$, we obtain the sensitivity order of all input attributes to the model. To get a statistically robust result, we ran the experiment 1000 times; each time, the training set and test set were randomly selected and $\Delta$ was set to 0.01. We obtain a $1000 \times 35$ matrix $M$, where element $M(i, j)$ represents the sensitivity order of attribute $j$ in experiment $i$. The ascending order of $\sum_{i=1}^{1000} M(i, j)$ gives the overall statistical sensitivity result for all 35 attributes. The detailed experimental results are reported in Table 3. In the table, the sensitivity order of the 35 attributes can be divided into three groups: the first 10 attributes are in the sensitive group, the last 10 are in the insensitive group, and the rest are in the medium-sensitive group.
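The rank aggregation over repeated runs can be sketched with a stand-in matrix; random permutations replace the real per-run sensitivity orders here, so only the aggregation mechanics are shown.

```python
import numpy as np

rng = np.random.default_rng(3)
n_runs, n_attr = 1000, 35

# M[i, j]: sensitivity order (rank) of attribute j in run i.
# Random permutations stand in for the orders from real experiments.
M = np.array([rng.permutation(n_attr) + 1 for _ in range(n_runs)])

# Overall order: attributes sorted by ascending rank sum over all runs,
# so the attribute that is most often ranked sensitive comes first.
rank_sums = M.sum(axis=0)
overall_order = np.argsort(rank_sums)
```

With real runs, the attributes at the head of `overall_order` form the sensitive group and those at the tail form the insensitive group described above.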
To further validate the sensitivity order of the 35 attributes, we repeat the experiments while changing parameters from three perspectives: the number of independent experiment repetitions, the value of $\Delta$, and the model parameter setting.
For the first aspect, we simply repeat the experiments 2000 times. For the second aspect, we try $\Delta \in [-0.3, 0.3]$ with a step of 0.01 and obtain 60 statistical results with different $\Delta$. For the third aspect, we jitter the SVR model parameters with a small offset, aiming to see whether the sensitivity orders of the attributes remain stable under jittering.
By analyzing the experimental results, we find that the sorting order of attributes in the sensitive group and the insensitive group is stable through all experiments under the various conditions in the three aspects. The medium-sensitive group shows no fixed pattern, but it is rare for attributes in one group to move to another group when conditions change.
These experimental results show that the sensitivity orders of attributes reported in Table 3 remain stable under various conditions.

Feature Selection Evaluation.
In this step, we aim to explore feature selection using the obtained sensitivity order results. We implement two groups of experiments. In the first group, we remove one attribute from the model inputs at a time in reversed sensitivity order. In the second group, we remove all input attributes whose sensitivity orders are larger than a number k. In both groups, attributes are removed in the reversed sensitivity order reported in Table 3. That is, in the first group, we rotationally remove the 35th, 23rd, 31st, ..., until the 3rd attribute. In the second group, we first model regression by removing the least sensitive attribute, then the two least sensitive attributes, and so forth. Every time we remove an attribute or multiple attributes with sensitivity orders larger than k, the model training process is reoptimized and the prediction test is redone. We use the five measures, $R^2$, MSE, CORR, MSRE, and FRAC, to evaluate the prediction accuracy. The changes of these five measures as we remove one attribute from the model inputs in the first group of feature selection experiments are shown in Figure 3. From Figure 3, we can see that the five measures change little when we remove any of the 25 insensitive and medium-sensitive attributes. In such conditions, models with one fewer attribute achieve accuracy around $R^2 = 0.805$, which is very close to the accuracy of the model with the full 35 input attributes. This suggests that any insensitive or medium-sensitive attribute can be screened out to speed up the prediction model with almost no loss of accuracy.
In particular, Figure 3 shows that the attribute at sensitivity order 3, that is, attribute $a_7$, the average MODIS radiance over cloud-free pixels at wavelength 0.55 μm, plays an important role in achieving accurate prediction. Thereby, the design of high-precision detection at this wavelength on MODIS will be very useful for producing more accurate AOT retrievals.
The exceptions among the insensitive attributes are those at reversed sensitivity orders 5 and 7. The 5th most insensitive attribute is $a_{32}$, representing the first byte of the QA flag for MODIS retrieval on land. The 7th, $a_{33}$, represents the second byte of the QA flag. When $a_{32}$ is removed, the values of $R^2$ and CORR decrease and the value of MSE increases visibly. This indicates that the information in the first byte of the MODIS QA flag for land has an important effect on model prediction even though $a_{32}$ is an insensitive attribute. Leaving out $a_{33}$ shows the opposite effect: the model prediction accuracy is even better when we remove it, suggesting that we can safely leave out $a_{33}$.
In this way, by implementing the experiments in practice, we found that sensitive attributes affect the prediction accuracy greatly while insensitive ones do not. In addition, we observe that there are some correlations among the input attributes that affect the SVR model prediction accuracy. So if a sensitive attribute is left out but the information it contains is covered by other remaining attributes, the prediction accuracy will not change much. This phenomenon partially explains why removing any attribute in the range of reversed sensitivity orders 10-25 has little impact on regression accuracy.
The changes of the five measures as we remove all input attributes whose sensitivity orders are larger than a number k are shown in Figure 4. From Figure 4, we see that the five measures do not change much ($R^2$ stays around 0.80) even if we leave out the 10 most insensitive attributes. But when the most sensitive attributes are removed, the prediction accuracy drops sharply. This suggests that the sensitivity order of attributes is effective for feature selection and useful for building a fast prediction model with fewer input attributes.

Conclusions
In this paper, we proposed an SVR model-based sensitivity analysis approach for feature selection in the context of AOT regression. Specifically, we first use our method to order 35 MODIS input attributes according to their sensitivity to prediction outputs. Next, the attribute sensitivity orders are used to carry out feature selection by removing insensitive attributes one at a time or by removing all attributes whose sensitivity orders are larger than a number k. The experimental results of regression based on data collocated between MODIS and AERONET from 2009 to 2011 showed that the 10 most insensitive attributes can be screened out to speed up prediction model computation with very little loss of accuracy. The results also suggested that the sensitivity order is helpful for identifying the most informative attributes as well as the attributes requiring the highest precision for AOT prediction. Thereby, our approach will be valuable for remote sensing or atmospheric scientists to optimize the design precision of some observation attributes in scanning equipment, like MODIS, and further improve AOT retrieval accuracy.

Future Work
Next, we will experimentally combine sensitivity analysis with other data mining models, such as neural networks and decision trees, to test their model-based sensitivity analysis ability.
In addition, we will apply the proposed model-based sensitivity analysis method to other satellite sensors in the A-Train satellite constellation and analyze their informative observing attributes for AOT regression. We then aim to fuse the selected attributes from multiple sensors together to further improve AOT regression accuracy.
Our future work also includes extending the proposed method to big data cases in Internet of Things or astronomy contexts [28][29][30]. In these cases, we will collect information from multiple sources with different scales of uncertainty. For example, to estimate astronomical photometric redshift, we need to combine multiple bands from multiple sky surveys, such as SDSS, UKIDSS, and WISE. It is useful to first apply our model-based sensitivity analysis approach to pick out the attributes that make significant contributions to the regression model before fusing information from multiple sources.

Figure 2 :
Figure 2: AERONET sites used in the experiment.

Figure 3: Changes of $R^2$, MSE, CORR, MSRE, and FRAC when we remove an attribute in reversed sensitivity order one at a time.

Figure 4: Changes of $R^2$, MSE, CORR, MSRE, and FRAC when we remove all attributes whose sensitivity orders are larger than a number k.

Table 1 :
35 attributes derived from three MODIS products.

Table 2 :
Results of SVR model parameter optimization.

Table 3 :
Sensitivity order of attributes over 1000 runs of experiments.