Development of a Near-Sea-Level Calibration Algorithm for Aerosol Optical Depth Measurement Using a Ground-Based Spectrometer

Aerosol optical depth (AOD) is a measurement that represents the total attenuation of solar terrestrial radiation caused by aerosols. Measurement of AOD is often performed using ground-based spectrometers, because this approach has the highest accuracy, as well as high spectral and temporal resolutions. However, frequent calibration of a ground-based spectrometer is often difficult for both absolute laboratories and the conventional Langley method. This is because the former are usually not readily available for most users, whereas the latter is always complicated by possible temporal drifts in the atmosphere. In this paper, a new Langley calibration algorithm was developed to allow frequent calibration, even at near-sea-level sites. The proposed algorithm uses a combination of clear-sky detection, the Perez-Du Mortier (PDM) model and a statistical filter to constrain the extrapolation to get the closest possible extraterrestrial constant over a wide range of the light spectrum. A high degree of linearity was observed between the near-sea-level irradiance predicted by the proposed algorithm and the reference value simulated by the SMARTS model. Overall, the results indicate that Langley calibration at low altitude is feasible provided that strict data screening is imposed.


INTRODUCTION
Aerosol Optical Depth (AOD) in the vertical column atmosphere is an important climate forcing agent in many aspects of environment and human health. It is needed for the studies of aerosol radiative forcings (Dani et al., 2003), air quality condition (Sharma et al., 2012), aerosol transport mechanism (Jia et al., 2012), climate change (Che et al., 2013), aerosol effects on photovoltaic application (Gueymard, 2008) as well as surface reflectance correction for remotelysensed data . One of the important governing factors in AOD retrieval is the maintenance of calibration constant of the spectrometer, especially for long-term monitoring stations (Arai and Liang, 2011).
Conventionally, radiometric calibration is performed using standard laboratory lamps. It determines the absolute response of a spectrometer for a given spectral irradiance incident on the instrument. These lamps typically have inconsistent uncertainty ranging from 1 to 4% in the wavelength from 400 to 1070 nm (Kiedron et al., 1999). They are also prohibitive with necessary power supplies, fragile and have a limited lifetime of about 50 hours (Slusser et al., 2000). In contrast, an a lternative to absolute calibration known as Langley method is a passive calibration method that uses solar radiation as the light source. It is performed using solar disc irradiances to determine the instrument output at top-of-atmosphere and subsequently divide this output by spectrally extraterrestrial irradiances. Detail of the principle is discussed in Section 2.
Though Langley method is economical, it is often complicated by the possible temporal drifts in the atmospheric condition during the calibration period (Shaw, 1983). Therefore, in AERONET, for example, one of the most established aerosol monitoring networks, the reference instruments are typically recalibrated on a basis of 2-3 months cycle in high altitude (3400 m) condition at Mauna Loa Observatory (MLO), for clear and aerosol-stable atmosphere. However, regular access to high altitude is not efficient in accessibility and economical prospects. Consequently, most instruments are calibrated against a reference instrument with a MLO-derived extrapolated value (Saeed and Al-Dashti, 2010) but these secondary calibrated instruments typically have larger uncertainties than the reference instrument uncertainty (Holben et al., 1998).
In this paper, a new algorithm is developed for Langley calibration at near-sea-level for AOD measurement. The proposed algorithm uses the combination of clear-sky detection model and statistical filter to constrain the Langley extrapolation to get as close as possible to the extraterrestrial constant over a wide range of wavelengths. It attempts to produce a data set that is close to the high altitude atmospheric condition. To ensure this, only data that exhibit clear-sky condition according to the model will be selected for regression. The resulting regression is then filtered using the statistical filter to obtain the calibration constant needed for the AOD retrieval. It is believed that the proposed method is also applicable to other ground-based spectrometers that have the similar working principle as the currently used in this study. Particularly, it is useful for spectrometers that used for long-term monitoring purpose which require periodical calibration. The feasibility of the proposed method and validation using radiative transfer code SMARTS are presented in this paper.

CONVENTIONAL LANGLEY CALIBRATION METHOD
This section briefly discusses the theory of the conventional Langley calibration, which is usually performed at high altitude for clear and stable atmosphere. The Langley method uses the progress of the sun's apparent motion that changes the observed path length through the atmosphere to compute an optical depth (Harrison and Michalsky, 1994). It measures the sun's irradiances using ground-based instruments operated at a given location as the sun moves across the sky for significant changes of air mass.
The working principle lies on the basis that as the solar radiation transmits through the atmosphere, it experiences a series of attenuations either by absorption or scattering due to the air molecules or solid particles suspended in the atmosphere. Mathematically, the attenuation of the Sun's direct-beam of particular wavelength passing through the Earth's atmosphere is described by the Beer-Lambert law as (Thomason et al., 1983) where I λ is the direct normal irradiance at the ground (or near-sea-level) at wavelength λ, R is the Earth-to-Sun distance in astronomical units (AU), I o,λ is the extraterrestrial irradiance at the top of atmosphere (TOA), τ λ,i is the total optical depth of the ith scatterer or absorber, and m i is the air mass of the i-th scatterer or absorber through the atmosphere. Taking the natural logarithm of both sides, Eq. (1) can be written as 2 , , ln ln The total optical depth, τ λ,i in Eq.
(2) is contributed by Rayleigh, ozone, aerosol, and trace gases, which can be written as (3) The Rayleigh contribution is approximated using the relationship (Knobelspiesse et al., 2004;Djamila et al., 2011) , , where k Ray(λ) is the Rayleigh scattering coefficient, p is the site's atmospheric pressure, p o is the mean atmospheric pressure at sea-level and H is the altitude from sea-level in meter. Meanwhile, ozone optical depth is calculated using satellite observation of ozone in Dobson unit (DU), which is computed by (Knobelspiesse et al., 2004) τ o,λ,I = Zk oz(λ) x2.69e16mol/cm 2 , where Z is ozone concentration in DU, and k oz(λ) is ozone absorption cross section. Substituting Eqs. (3-5) into Eq.
(2), the uncalibrated pixels, P measured by the spectrometer can be written as 2 , , , , ln ln where n is the number of Langley plots available for calibration. The calibration factor k is obtained by dividing the averaged extrapolated values with the extraterrestrial spectrum which is Subsequently, the calibrated global irradiance measured by the spectrometer is determined by multiplying the global pixels measured at the near-sea-level P λ with the calibration factor k as (Slusser et al., 2000) ,

DEVELOPMENT OF NEW LANGLEY CALIBRATION ALGORITHM
The previous section discussed the principle of the conventional Langley calibration method. In this section, a new radiometric calibration algorithm is developed for near-sea-level calibration purpose. The proposed method introduces an objective algorithm to constrain the Langley extrapolation using the combination of clear-sky detection model with statistical filter. The former is to ascertain only cloudless and clear sky data is selected for the regression, and the latter is to filter the residual regression data for improved instrument's response. Detail of the proposed algorithm is discussed in the following.

Clear-Sky Detection Model
It is hypostasized that the clear sky conditions at high altitude can be accurately approximated at near-sea-level if there is a method to select such a data. In this paper, Perez-Du Mortier (PDM) model is proposed for clear-sky detection method for the selection of clear sky data. This model is selected because it had been proven to be appropriate in classifying the sky type (Zain-Ahmed et al., 2002;Djamila et al., 2011).
The Perez model is usually considered to be the most acknowledged and precise model and it is often used for predicting the daylighting in building design. In Perez's classification, the sky type is classified into three type namely clear sky, intermediate blue and cloudy of overcast using clearness index as indicator. The Perez's index for irradiance can be computed using the relationship between the diffuse I ed , and global I eg , horizontal irradiance as shown by where I dir is the direct irradiance and H  is the solar zenith angle in radian. In Du Mortier model, the sky is classified into five types using Nebulosity index (NI) as indicator of sky type, which can be computed by where I d is the diffuse irradiance, I is the global irradiance. CR in Eq. (11) where I d,cl represents the clear sky illuminance and α represents the solar altitude. I d,cl is calculated using equation where Ar is the Rayleigh scattering coefficient written as with m as the optical air mass and α is the solar altitude. The Perez and Du Mortier Model are used together in this paper to form a combined sky classification as shown in Table 1.

Statistical Filter
The statistical filter used in the current study adopted the similar approach suggested by Harrison and Michalsky (1994), where a successful Langley plot should remain a minimum of one-third points with a standard deviation around the regression line of less than 2 σ. That is, pixels P λ values with residual standard deviation > ± 2 σ from the mean will be removed in the subsequent analysis. The purpose of the statistical filtration is to reduce the field-ofview effects associated to the uncertainties caused by the diffuser used to overfill the image of the sun from the radiometer to retrieve the direct solar beam. Therefore, this filter is basically used as a residual filter to eliminate possible outliers and instabilities due to the instrument responses. In addition, this error estimator is a ratio of intensities, and hence is independent of both the evolution of air mass as well as the absolute calibration of the detector (Harrison and Michalsky, 1994).

DATA AND MEASUREMENT
In the present work, a portable spectrometer ASEQ LR-1 was used. The spectrometer measures radiation ranges from approximately 300 to 1000 nm but only nominal wavelengths 470, 500, 550 and 660 nm were selected for analysis. They were selected according to the spectral band that regularly used in AOD retrieval (Weihs et al., 1995;Blumthaler et al., 1997).
The measurement was conducted at an open space near Tun Mustapha Tower, Kota Kinabalu, Sabah or formerly known as Yayasan Sabah (116°E, 6°N). This site was selected due to its location, which is near sea level with an altitude of 7.844 m, permitting not only the investigation of the feasibility of Langley calibration at low altitude but also the solar spectrum is obstruct-free from irrelevant objects such as trees or artificial buildings.
The range of air mass factors at a given wavelength suitable for Langley plots is governed by the sum of the products of the optical depth and the individual air mass. In this paper, air mass range of 2 to 6 (SZA 60.0° to 80.4°) was used for the visible part of the spectrum. Lower air  (Chang et al., 2013a). On the other hand, higher air masses are avoided due to greater uncertainty in air mass caused by refraction that are increasingly sensitive to atmospheric temperature profiles. This range of air mass (2 to 6) is typically associated to the morning hours just after sunrise from 0640 to 0815 hours.
In the solar geometry processing, air mass m is computed based on geometrical solar zenith angle, which is calculated using Solar Position Calculator, provided by Institute of Applied Physics of the Academy of Science of Moldova (ARG, 2012). The measurements were made every 3 minutes averages within the air mass range of interest. This particular interval was selected to reduce error in P o,λ as using the same range of air mass with a shorter time midpoint of 3 min averages in this study would result in an even smaller error of P o,λ . Measurements were made as many as possible during the observation period unless it was raining or the apparent sun was fully covered by clouds. A full depiction of the methodology scheme is presented in Fig. 1.

Determination of Extraterrestrial Constant from Langley-Plot
A 2-month period of data measurements from April to May of 2012 was used to investigate the feasibility of the proposed calibration algorithm. Within this period, a total of 730 data (hereinafter denoted as D n , where n represents the number of observations) have been collected. However, not all data can be used for Langley regression as the measurement may contain cloudy and overcast data. Therefore, the clear-sky detection model, PDM is used to identify these points and filter them as previously discussed.
For each data D n , the corresponding clearness index (ε) and nebulosity index (NI) was computed using Eq. (10) and Eq. (11), respectively, and they were listed as follows The raw data of clearness and nebulosity index in Eq. (15) is here-forth be known as clearness-nebulosity index (CNI).
The CNI values were inserted into a repetitive regression algorithm for data filtration using permutated criteria C p,q in the range of 1.23 ≤ p ≤ 1.89 and 0.70 ≤ q ≤ 0.99, where p and q represent the criteria index. The permutated criteria C p,q was repeated for other value of ε and NI at a step of 0.01. As a result, there were 2010 criteria generated which is illustrated in Eq. (16).
The criteria in Eq. (16) have their corresponding conditional value given as As an example for implementation purpose, for criterion C 1,1 , only data D n with value of ε 1 ≥ 1.23 and NI 1 ≥ 0.70 will be used for the regression of Langley plot. As a result, some of the corresponding intensity data of D n (intensity of the D n data that does not fulfill the corresponding condition in Eq. (17)) will be filtered out and the remaining are plotted against air mass to get the corresponding Langley plot. The corresponding correlation value of the Langley regression plot is then obtained. This step is repeated for each criterion given in Eq. (16) and (17) which is also as a result generates 2010 of respective Langley correlation values.
A full depiction of the implementation of the repetitive regression algorithm is illustrated in Fig. 2. Among these criteria, the highest correlation of R 2 = 0.77 is obtained at NI ≥ 0.92 and ε ≥ 1.55. By original definition, both indexes coincidently represent a confidence region that free of cloud contamination (refer to Table 1). Therefore, only the intensity data of D n with NI and ε of more than and equal to 0.92 and 1.55, respectively, are used for calibration purpose. In addition to this, a statistical filtration step, described in Section 3.2, is further imposed to the Langley plot to eliminate other measurement errors and uncertainties.
To further illustrate the proposed calibration method and procedure, the result of the filtration steps in the Langley plot is shown Fig. 3 for wavelength at 470 nm. Fig. 3(a) shows the Langley plot of unfiltered data that consists of 730 points (also refer to Table 2). After the repetitive regression algorithm is implemented, the data point reduced to 272 and the corresponding Langley plot is shown in Fig. 3(b). Finally, the number of data further reduced to 200 after the statistical filtration is implemented, and the corresponding Langley plot is shown in Fig. 3(c). The Langley plot of the completely filtered data for the wavelength of interest is then used to determine the calibration factor. Fig. 4 shows the Langley plot for completely filtered data, using the repetitive regression algorithm and statistical filter for the wavelength of 500 nm, 550 nm, and 660 nm. The corresponding information of the Langley plot for each wavelength is shown in Table 2 for clarity. One can also be assured of the adequacy of data collection from this table. It is obvious that the final filtration product yields almost two-third of the data for all wavelengths which complies with the strict fixed error criteria described in Section 3.2 where a minimum of 1/3 of the pre-filtered data points should remain after the statistical filtration.
From Fig. 4, together with Fig. 3(c), the corresponding calibration factor is tabulated in Table 3 for each wavelength. The calibration factor k for each wavelength is computed by dividing the corresponding extrapolated value P o with extraterrestrial constant (Table 3) at TOA obtained from ASTM G173-03 Reference Spectra. Multiplication of pixels measured at near-sea-level with this factor converts the measurements into physical unit in W/m 2 /nm. It can be seen that for increasing wavelengths, the extrapolated value has a decreasing pattern which follows the extraterrestrial constant obtained from reference spectrum. Highest ratio is obtained at 865 nm with calibration factor, k = 6.35E-4 and lowest at 470 nm with k = 9.40E-5.

Performance Analysis
This section presents the performance analysis of the proposed calibration algorithm. The analysis is separated into two parts: assessment using Linke Turbidity factor and validation using SMARTS radiative transfer code. The first part is to assess the proposed PDM model whether it has successfully identified a data set that representative of cloudless and clear sky condition. The second part is to validate the Langley calibration constant obtained from the proposed algorithm in reference to SMARTS radiative transfer code.

Linke Turbidity Factor
To assess the performance of the proposed PDM model, the extrapolated value from Eq. (6) was inserted to the Lambert-Beers exponential attenuation law to classify the post-filtering points in terms of Linke turbidity factor T L given as (Chaâbane et al., 2004) where I N is the direct normal irradiance over the whole    solar spectrum at the earth's surface, I o E sc is the solar constant corrected by the eccentricity factor due to the variation in the sun-earth distance, δ R is the spectrally integrated optical thickness of the clean-dry atmosphere due to Rayleigh scattering and absorption by gaseous, and m is the relative optical air mass. Linke turbidity factor usually refers to the whole spectrum that is spectrally integrated attenuation, which includes presence of water vapor and aerosols. However, this factor can also be calculated monochromatically from solar spectrum data as (Alnaser and Awadalla, 1995) , , 1 ln( ), where the empirical equation of Rayleigh optical thickness is defined as The Linke turbidity scale suggests that factor equals to 1.0 should be obtained for pure atmosphere. However, for a non-ideal atmosphere, factor of less than 6 is considered acceptable in our screening procedure. Fig. 5 illustrates the classification of the data point in terms of Linke turbidity factor after the PDM filtration together with its scale factor at the bottom of the figure. The T L frequency distribution (which is the occurrence in Fig. 5) for all wavelengths nearly follows the Gaussian distribution pattern where the maximum occurrence bins is close to the mean T L . In this figure, it is also observed that the mean T L is wavelength dependent and tends to decrease as the wavelength increase. This is because the radiation loss due to pollutant aerosol is strongly wavelength dependent, where shorter wavelengths are much more affected than longer wavelengths. In other words, under clear-sky condition and within the visible band where gaseous absorption is minimal, aerosols and water vapor attenuate (e.g., scatter and/or absorb) the terrestrial solar radiation more dominantly at shorter wavelengths.
Most importantly, all the post-filtered data points (Fig. 5) exhibit spectral T L in the range of 1.0 to 4.5 which is a characteristic of clean and clear-moist warm air atmosphere. This verifies that the PDM filtration has successfully established a set of clear-sky data from a continuous time series of measurement within the observation. Such a basis increases the confidence that cloudy data within the raw measurement have been conditionally identified and removed prior to determining the calibration factor.

SMARTS Radiative Transfer Code
To further validate the proposed calibration procedure, simulated irradiance value obtained from radiative transfer code SMARTS version 2.9.5 is adopted as truth value. The simulation run is repeated at distinct air mass to obtain a series of truth values correspond to the measurement so that direct point-by-point comparison is feasible at each observation. This model is documented in FSEC Report PF-270-95 and its accuracy is discussed in elsewhere (Gueymard, 2008;Kaskaoutis and Kambezidis, 2008). By taking advantage of the high spectral sensitivity of the model, it could become a powerful tool to assess the performance of the calibration algorithm when certain meteorological parameters are known, which in this case, they are all obtained from local meteorological department and satellite observation. Fig. 6 shows the SMARTS simulated irradiance values plotted against the predicted irradiance values for all wavelengths under three different turbidity days. The dotted line in the figure represents the perfect match between the two values. Deviation from this line symbolizes the error of prediction where points deviate to left represent overestimation and underestimate points deviate to the right. From this figure, it is obvious that the underestimation occurs consistently on low turbidity day for all wavelengths (refer to triangle points in Fig. 6) while overestimation is likely to happen over moderate-to-high turbidity condition (refer to diamond and star points in the figure).
The effect of circumsolar radiation is believed to be responsible for the overestimation. Circumsolar radiation is the scattering effect of direct sunlight through small angle by atmospheric constituents. Under high-turbid condition, scattering of sunlight is more intense and thereby it causes the circumsolar effect becomes significant. A spectrometer of typical design has a field-of-view of less than 5°, as compared to 0.5° angle subtended by the solar disc. As a consequence, the measurement of a spectrometer includes not only the direct beam irradiance but also a substantial fraction of the circumsolar radiation. Therefore, the overestimation in Fig. 6, particularly under moderate-tohigh turbidity condition, is likely caused by this effect.
On the other hand, underestimation between simulatedpredicted shortwave (470, 500 and 550 nm) solar radiation on low turbidity day in Figs. 6(a) to 6(c) is obvious in the lower air mass range (solar irradiance increases as decreasing air mass). This discrepancy is likely due to the underestimated aerosol and water vapor absorption within this range of air mass (Chang et al., 2013b). The lack of aerosol absorption, underestimation of water vapor amount and deficiencies in the clear-sky radiative transfer calculation in high spatial and temporal resolution may be attributed to the underestimation of shortwave solar radiation in the tropical atmosphere particularly at low altitudes.  Despite the observed discrepancies, the high determination of correlation, R 2 > ~0.90 for all wavelengths and turbidity condition confirms the predicted values agree well with the simulated data using SMARTS model. On the basis of the correlation analysis, the proposed algorithm is proven to be acceptable at all wavelengths and atmospheric conditions with small susceptible errors. To further justify the proposed calibration algorithm, three statistical measures Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Normalize Mean Square Error (NMSE) were tested on the obtained results in Table 4. The overall inspection suggests that the uncertainty of the proposed method lies  within the range of error ~1-3% depending on the wavelength of interest. In comparative with most other related studies, the obtained result is compared with other normally calibrated spectrometers in Table 5. The uncertainty range found in this paper is comparable to most of the study reported in the literatures and also well within the acceptable range of error. As an overall validation, the near-sea-level Langley calibration algorithm as proposed in this paper is proven to be acceptable for reliable calibration of ground-based spectrometers for AOD measurement.

CONCLUSIONS
This paper introduced a near-sea-level Langley calibration algorithm for AOD measurement using combination of clear-sky detection model PDM and statistical filter to allow frequent calibration feasible at low altitudes. By the definition from Linke turbidity classification, the proposed algorithm had established a set of data that exhibits homogeneous and stable atmospheric condition through multiple permutated criteria until the best linearity in Langley plot is obtained. It was found that good agreement was obtained between the predicted near-sea-level irradiance using the proposed method and the modeled reference value (SMARTS model) with high correlation for all wavelengths. Also, with comparison to other published papers, the obtained uncertainty is comparable to most of the study reported in the literature and lies within an acceptable range of error. The advantage of using only clear-sky data defined by the proposed algorithm is that noise is reduced and a confident extrapolation to zero air mass by simple regression is feasible. Instead of using a single day for Langley plot, the proposed method collects several days of clear-sky Langley plots over a period of time. In this way, it produces a pool of extrapolated values so that a reliable mean can be computed. Since the proposed algorithm depends on the highest correlation plotted from the pool of screened data, it also increases the confidence that the mean extrapolated value is stable and free of any effects that promote changes in thin cirrus cloud and aerosol loading.