Evaluation of ionospheric models for Central and South Americas

This work shows a 20-month statistical evaluation of different Total Electron Content (TEC) estimators for the Central and South America regions. The TEC provided by the International GNSS Service (IGS) in the area covered around the monitoring GNSS stations are used as reference values, and they are compared to TEC estimates from the physics-based (Sheffield University Plasmasphere Ionosphere Model—PIM) and the empirical (Neustrelitz TEC Model-Global—NTCM-GL) models. The mean TEC values show strong dependence on both solar activity and seasonal variation. A clear response was noticed for a period close to 27 days due to the mean solar rotation, as seen in the solar flux measurements. Consistently, the mean TEC values present an annual variation with maxima during December solstices for southern stations with geographic latitudes greater than 25 S. Semi-annual dependence has been observed in TEC for the sector between 25 of geographical latitude but with modulations caused by fluctuation in the solar radiation. We observed a high correlation between solar radio flux F10.7 and NTCM-GL outputs. The fast increases in F10.7 index have caused significant differences between IGS data and NTCM-GL results mainly for equatorial and low latitudes. For the initial months of the evaluated period (January–April, 2016), the errors of the physics-based model were considerably larger, mainly near the equatorial ionization anomaly. The discrepancies observed in SUPIM results are mainly due to inputs of solar EUV flux. The EUVACmodel has underestimated EUV flux between January and April, 2016, when the solar activity was moderated and Solar2000 model has overestimated such flux during low solar cycle period between May and August, 2017. In relation to IGS data, the two assessed models presented smaller differences during the June solstice season of 2016. 2019 COSPAR. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Ionospheric modeling that provides accurate ionospheric delay has a direct impact on the Global Navigation Satellite Systems (GNSS) positioning accuracy (Subirana et al., 2013) enabling several applications for users of a single frequency, as for example, that ones of Satellite-Based Augmentation System (Rovira-Garcia et al., 2016). It can also lead to a better understanding of ionospheric conditions but requires intensive data processing and a large amount of observational data in order to validate the results, mainly related to solar activity dependence, seasonal variations, and the global and regional morphology of the ionosphere. Different approaches have produced models that consider chemical and physical processes (Bailey and Balan, 1996;Codrescu et al., 2012) or empirical models relying on equations for ionospheric behavior emulation (Klobuchar, 1987;Bilitza, 2001;Radicella, 2009) which could also be supported by statistical methods applied to observational data (Llewellyn and Bent, 1973;Jakowski et al., 2011a). Besides that, the increase in ground and space remote sensing, which rely on several instruments and sounding techniques (e.g., ionosondes, incoherent scatter radars (ISR), ground-based GNSS receivers and radio occultation missions), improved the coverage of the Earth's upper atmosphere and the observation availability (Yue et al., 2012).
This work presents a 20-month statistical analysis using different ionospheric Total Electron Content (TEC) estimates for the Central-South American region, ranging from À90 W to À25 W longitude and from À65 S to 20 N latitude. In fact, the main purpose of this work is to validate the TEC estimates of two models for equatorial and low latitude covering Central and South Americas using TEC provided by the International GNSS Service (IGS) which is supported by GNSS ground stations data. The spatial and seasonal variations of the TEC as well as the dependencies caused by solar activity between January 2016 and August 2017 are also analyzed.

TEC data
The reference Vertical TEC (VTEC) maps used are provided by the IGS (Hernández-Pajares et al., 2009), which are obtained from the combination of several independently computed VTEC maps. These independent VTEC maps are sorted by ranks and are combined with associated weights to compose an IGS VTEC map. The raw data is acquired from the IGS GNSS ground station network, i.e., around 504 stations, covering different parts of the globe; 39 of these stations are located in Central-South American sector. The VTEC maps are in the Ionosphere Map Exchange Format (IONEX) with temporal resolution of 2 h and spatial resolution of 5 in longitude and 2.5 in latitude (Schaer et al., 1998). After combination, the IGS maps are passed through a validation process using dual frequency altimeters on board TOPEX, JASON, and ENVISAT satellites (Hernández-Pajares et al., 2009). It is also worth highlighting that a large part of information provided by the VTEC maps is based on interpolation, whose techniques are detailed in Hernández-Pajares et al. (2009), Feltens (2007, Mannucci et al. (1998), Hernández-Pajares et al. (1999), Schaer (1999, since there are only a few GNSS ground stations available mainly in the oceanic regions. Once finalized, each IGS VTEC map is stored on the IGS distribution server. Since the GNSS ground station network of IGS in Central and South Americas is quite sparse (see Fig. 1 and Table 1), the observation density is not as high as desired for highly reliable estimates for all parts of the South American region. Therefore, our analysis in this work is only performed within a range of 2.5 around each IGS GNSS station.

SUPIM
In this work an adapted version of the Sheffield University Plasmasphere-Ionosphere Model -SUPIM (Bailey and Balan, 1996) is considered. In this model, coupled timedependent equations of continuity, momentum, and energy balance are solved along closed magnetic field lines to calculate the concentrations, field-aligned fluxes, and temperatures of the electrons as well as of the ionic species O + , H + , He + , N þ 2 , O þ 2 and NO + . Relevant inputs to SUPIM are the neutral atmosphere, as given by NRLMSISE-00 (Picone et al., 2002), the thermospheric neutral wind from HWM07 (Drob et al., 2008), the vertical drift from the Scherliess and Fejer (1999) empirical model, and the solar EUV flux at different wavelength bands obtained from Solar2000's empirical solar irradiance model and forecast tool (Tobiska et al., 2000), or from the EUV flux model for aeronomic calculations (EUVAC) (Richards et al., 1994). The Solar2000 model is licensed to run remotely, so in the event of its unavailability, the EUVAC option is used.
An operational system was developed by Petry et al. (2014) to simulate the ionosphere behavior for a given region of interest. This system runs several instances of a modified parallel version of the SUPIM at different locations, and the outputs are post-processed. After the coordinate transformation from the magnetic to the geographic coordinates, a 3-dimensional data interpolation based on Inverse Distance Weighting (IDW) is applied to a time varying homogeneous grid. Although the system is capable of assimilating observational information using the Newtonian relaxation method, this procedure was not applied in this work. The system outputs can be compared to the TEC reference without any bias, and SUPIM VTEC values are computed by integrating electron density in height. The homogeneous grid output was adjusted to estimate TEC considering altitudes ranging from 90 to 10,000 km. VTEC maps in one-hour temporal resolution are generated  covering the Central-South American region. This is done using a spatial resolution of 1 in latitude per 1 in longitude.

NTCM-GL
The Global Neustrelitz TEC Model (NTCM-GL) (Jakowski et al., 2011a,b) is an empirical climatological model that describes ionospheric behavior through a function of fixed parameters and 12 coefficients. The parameters are combined into the model equations and are defined by location, local time and solar activity. The latter is determined by the solar radio flux index (F10.7 index), which represents the solar impact on the ionosphere within the model. The 12 coefficients of the model are calculated by an iterative non-linear least squares technique. To estimate both parameters and coefficients, NTCM-GL employs TEC maps from the Center for Orbit Determination in Europe (CODE), which uses data from IGS stations for its map generation, whose methodology is detailed in Schaer (1999). Resolution used for NTCM-GL TEC maps was 5 in longitude and 2.5 in latitude, every 2 h. The NTCM-GL goal is to find the best fitting coefficients for its data to minimize the sum of squares of residual errors in relation to CODE TEC maps.

Methodology
In the following, we present a statistical evaluation considering a period of 20 months, from January 2016 until August 2017. It is important to mention SUPIM used EUV fluxes from EUVAC model during 2 intervals in 2016: from January 20 to October 3, and from October 21 to November 20. As each source of TEC maps (IGS, NTCM-GL and SUPIM) has its own temporal and spatial resolution, the constraints for the comparisons are different. Fig. 2 shows the TEC maps coverage for July 27, 2017 at 16:00 UT: (A) for the simulations of the physicsbased ionospheric model, (B) for NTCM-GL, and (C) for IGS VTEC estimates. It is clear that a comparison between them should consider only those TEC values that have a geographic correspondence. A limitation of the analysis is the restriction to locations that are close to where IGS maintains its GNSS stations. This reduces the amount of useful data significantly. Considering the position of every IGS station, a tolerance of 2.5 in latitude and longitude was applied to select the data to be processed, although the complete TEC maps are available. This tolerance value considered IGS maps' lower geographic resolution (2.5 in latitude and 5 in longitude), so the range covered by all IGS stations has geographic matching values with the physics-based model maps. Lower values of tolerance would result in several stations' covered range presenting no geographic match between IGS and physics-based TEC estimates.
A time constraint had to be considered as well, since temporal resolution for physics-based maps follows a 1-h time interval with a total of 24 maps per day, while IGS and NTCM-GL provide VTEC maps with a 2-h time interval corresponding to 12 maps per day. So when we compare them, only half of the physics-based TEC maps are actually used -the ones that match the UT of IGS maps.
Comparisons between different VTEC maps of the same date and time consider only matching locations, and differences can be accumulated in terms of the root-mean-square error (RMSE) (Chai and Draxler, 2014), defined as where error e is defined as the set of differences (e i , i = 1,2,. . .,n) between VTEC reference values from IGS data and the models, and n is the total number of values considered.
The use of the RMSE allows multiple analyses of results using time series plots or intensity maps, which are also more sensitive to outliers (Hyndman and Koehler, 2006). Data manipulation was done using R language (Venables and Smith, 2019). The TEC data is loaded into memory through R data frames using IONEX files for both IGS VTEC maps and NTCM-GL and Gridded Datasets format for SUPIM. The data for all periods under analysis is kept in memory during processing, and therefore a powerful computer node had to be used. It was composed of an Intel Xeon CPU E5-2609, 2.40 GHz with 72 Gb of RAM memory, running CentOS 6.8 platform.

Results and discussion
Figs. 3 (top panel) and 4 present the daily variation of TEC, estimated as the mean value during each 24 h period, considering data from IGS and the corresponding results from the models SUPIM and NTCM-GL for the Central-South American region. The top panel of Fig. 3 shows the mean TEC values around IGS stations, while Fig. 4 presents the mean values for each station separately. The shaded areas in Fig. 3 highlight the periods when SUPIM results were calculated using EUV flux from the EUVAC model instead of the Solar2000, as mentioned in Section 3. The two blue dots near the analyzed period extremities (top panel) are examples of TEC values obtained with SUPIM considering the solar EUV flux given by Solar2000 (left dot) and EUVAC (right dot) models. The solar flux with a wavelength of 10.7 cm (F10.7), defined as an indicator of solar activity, is shown in the bottom panel of Fig. 3. This indicator shows a decreasing tendency in the course of the 20 months but includes a modulation over a period of nearly 27 days, mainly between April and November 2016. The modulations may be attributed to the solar rotation, which also has a period of 27 days. Both IGS data and the results from the models respond to solar flux variations. The mean TEC, considering all station as presented in Fig. 3, has also shown seasonal dependence with maxima near the equinox months. An accurate understanding of the mean TEC seasonal variations can be gained from the results presented in Fig. 4, which are in decreasing latitudinal order from 18.5 to À53.8 (see the corresponding latitudinal sequence in Table 1). An annual variation in the mean values of TEC, with a maximum around December solstice, has been observed at all southern locations with geographical latitudes greater than 25 S. Since the highest photoionization rates occur when the solar zenith angle reaches its lowest values, which happens during the summer, such behavior was expected. Fig. 5 presents the absolute value of the solar zenith angle (v) at noon and over four locations with different latitudes. The black curve represents the station AGGO (À34 ), magenta stands for SCRZ (À18 ), red represents QUI3 (0.2 ) and the blue curve RDSD (18 ). The seasonal variation of the solar zenith angle directly explains the seasonal dependence of TEC for the equatorial sector and for the region between the latitudes of À53 and À25 , i.e., when there is a minimum in v we have maximum in TEC. On the other hand, such direct anticorrelation is not noticeable for the sectors near AE18 . In Fig. 5, the bottom panel shows the resulting overlay of the solar flux and the zenith angle. In fact, both solar flux and zenith angle were normalized by their maximum values to balance the effects, that were calculated with (F 10:7=F 10:7 max : Þ þ ðÀv=v max :). The results show maxima and minima as also presented in IGS data and by the models confirming the same analysis discussed above for the equatorial sector and for latitudes lower than À25 (see red and black curves in Fig. 5) characterizing a strong seasonal control in the mean TEC variation that can be seen in Fig. 4. An unexpected variation in the mean TEC can be seen at the stations near AE18 , for example over RDSD and SCRZ stations, with respective minima during the local summers pointed by the arrows in Fig. 4. It is interesting to note in Fig. 5 that during each central summer period (near June 21 for northern hemisphere and December 21 for southern hemisphere), the resulting overlay shows an accentuated decrease due to solar flux variation, as also highlighted by the blue and magenta arrows along the respective RDSD and SCRZ curves. Thus, the unexpected minima in TEC for the locations near AE18 must be caused by the overlay effect of the solar radiation. This conclusion is also supported by the fact that such minima in the overlay effects are weak for the highest southern latitudinal sector (see black curve during December in Fig. 5), consequently, the mean TEC values do not present the corresponding minima at this localization (see Fig. 4. Same as Fig. 3, but considering each IGS station separately. all panel for latitude lower than 34 in Fig. 4). In addition, as the overlay effect is relatively stronger during June than in December solstice, the mean TEC values respond with a decreasing tendency directly at all locations, except for highest latitudes, as mentioned above. For example, at the equatorial region where the solar zenith effects are similar during the apex of both June and December solstice seasons, the TEC minimum is also relatively deeper in June than in December (see the model results and data for stations QUI3-4 in Fig. 4). Once again, this analysis confirms that, considering this scale frame (daily mean TEC), the superposition of two variables, solar zenith angle and solar flux, is very important to explain the ionosphere variability.
A comparison of the mean TEC from IGS data and NTCM-GL shows that the TEC values are close during the entire period, but on some occasions, e.g., April 2016 and April 2017, they show spikes that are substantially higher (see Fig. 3). It can be observed that the TEC obtained by NTCM-GL is very sensitive to a fast increase in F10.7 index mainly for equatorial and low latitudes, as presented in Fig. 4. The results of the model have not shown such sensitivity for latitudes higher than 40 S. Since NTCM-GL is an empirical model, the response to the F10.7 variation is determined by its coefficients, obviously, the ones that represent the solar cycle dependence. Our analysis reveals the need of a NTCM-GL coefficient update to improve the results for equatorial and low latitude regions. On the other hand, the NTCM-GL model has not included the average of centered 81 days of F10.7 (F10.7A) in the model calculation, as it has been done by most published empirical models. Certainly, TEC values would have smoother results, especially for rapid changes in solar activity, if F10.7A were considered. Fig. 6 shows the RMSE for both NTCM-GL (red triangles) and SUPIM (blue dots), when the IGS TEC is defined as the reference. As expected, the periods of larger differences from IGS TEC are the ones that show higher errors. From February 2016 until April 2016, the physics-based model errors are considerably larger. From May 2016 on, in general, the variability of the RMSE is similar for both models, but from June 2017 to August 2017 SUPIM errors are slightly higher. The discrepancies between the IGS TEC and the SUPIM results are mainly due to inputs of solar EUV fluxes in the model. As mentioned in Section 3, the EUV fluxes used in SUPIM may come from different models (Solar2000 or EUVAC), depending on Solar2000 data availability. Fig. 7   30, 2017. The results are pointed by the blue dots in Fig. 3 and they are clearly much closer to the observations. In summary, it seems that the EUVAC model is underestimating the EUV fluxes for moderate solar activity (first disagreement period), while Solar2000 overestimates them during a low solar cycle (last disagreement period). In fact, further investigation is necessary for a better understanding of the Solar2000 model results if it has overestimated its results only after a prolonged period of near constant F10.7A, as it seems to be between April-August 2017.
Differences between VTEC maps produced by IGS and SUPIM were evaluated for the whole 20-month period. Fig. 8 shows the resulting RMSE map for each month separately. The first three months of 2016 present the highest difference, mainly near the double crest structures (10 -15 geomagnetic North and South) known as equatorial ionization anomaly (EIA) (Takahashi et al., 2016). The same analysis comparing VTEC maps produced by IGS and NTCM-GL (Fig. 9) did not show this anomaly. In Figs. 8 and 9 the lack of RMSE data over a large region, which   coincides with the Amazon rainforest area, is caused by the absence of IGS stations.
Spatial and temporal resolution in NTCM-GL and IGS VTEC maps are the same: 5 in longitude and 2.5 in latitude, every 2 h. Therefore, the set of matching data to compute Fig. 9 includes all TEC values in both VTEC maps that are filtered by the position of each IGS station using a tolerance of 2.5 in latitude and longitude. The SUPIM, on the other hand, has a spatial resolution of 1 in latitude and longitude, which reduces the matching data by half (only integer values of latitude in IGS VTEC maps are used). Thus, the RMSE shown in Fig. 8 has a lower resolution compared to the results shown in Fig. 9.
Considering the daily variation of solar irradiance in the ionosphere, we expect a difference in errors between diurnal and nocturnal periods. Fig. 10 shows the RMSE of the difference between IGS VTEC maps and SUPIM (blue dots) as well as IGS VTEC maps and NTCM-GL (red triangles)-both over the 20-month period and grouped by the hour of the day, in Coordinated Universal Time (UTC). The Central-South American sector local time extends from Greenwich Mean Time (GMT)-3 to GMT- Fig. 10. RMSE of the difference between SUPIM (blue dots) or NTCM-GL (red triangles) and IGS VTEC maps, integrated over the 20-month period and grouped by the hour of the day. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Fig. 11. RMSE of the difference between IGS and SUPIM (top panel) as well as IGS and NTCM-GL (bottom panel), grouped by the hour of the day for each month separately. 5. Considering UTC, a shift in diurnal and nocturnal periods can be observed. In Fig. 10 one can see that both NTCM-GL and SUPIM perform better mostly between late evening and early morning periods (local time), and SUPIM errors are lower between 10 and 12 UTC. Fig. 11 shows the RMSE grouped by the hour of the day for each month separately, both for SUPIM (top panel) and NTCM-GL (bottom panel) VTEC maps. Like in the previous figures' assessment, it is possible to verify that the models show less difference to IGS VTEC maps between late evening and early morning (local time) and during winter season in the southern hemisphere. Also, the spikes in TEC values observed in April 2016 and April 2017 for NTCM-GL (Fig. 3) seem to be correlated to diurnal period, since RMSE (Fig. 11) is higher between 12 and 22 UTC.

Conclusion
A comparison of different approaches to model the ionosphere TEC on the Central and South Americas over a period of 20 months is presented. As a reference for TEC values the IGS VTEC maps were used, limited to a range covered by IGS stations available in the region. It is relevant to mention that besides the comparison discussion, the VTEC behavior was also analyzed.
It has been noticed the VTEC measurements respond clearly to the F10.7 modulation of 27-day period attributed to the solar rotation period. Also, seasonal variation played an important role for daily mean VTEC, with annual variations observed at southern stations located at latitudes greater than $25 S. The VTEC seasonal dependence with a semi-annual variation was observed for stations at latitudes between AE25 , except for the sectors near AE18 (see Fig. 4). The mean VTEC variation with an unexpected minimum during local summer was explained by the overlay effect of solar flux. In general, all these features presented by the VTEC observations were reproduced by the two models (NTCM-GL and SUPIM).
The empirical model (NTCM-GL) achieved an overall good agreement to IGS VTEC maps. However, a sensitive correlation to F10.7 was observed, which led to a significant divergence from the IGS VTEC model during fast F10.7 peak events. Since NTCM-GL is overestimating VTEC values for such cases, it was suggested to include the F10.7A dependence in its empirical coefficients. During the initial period of analysis the SUPIM results diverged more significantly from IGS VTEC maps, especially near EIA, and at the end of the period smaller discrepancies were also observed. They are mainly related to different sources of solar EUV fluxes used in the model. SUPIM produced underestimated mean VTEC values before April 2016, when the EUV fluxes from EUVAC model were used. Also, overestimated VTEC values were found after June 2017, with the use of EUV fluxes given by Solar2000 model. Our experiments exposed the need of EUV flux val-idations for both EUVAC and Solar2000 models, which could be explored in future work.
Our statistical analysis using the RMSE has shown that in general NTCM-GL model produces better results than SUPIM, as it can be seen in Fig. 11. In addition, it also confirms the discrepancies in the SUPIM results at the extremities of the analyzed period. Since the RMSE values for both NTCM-GL and SUPIM results decrease with the decreasing period of solar activity, mainly during the first southern winter season, we can conclude the models tend to produce better results with the decreasing level of solar radiation.