A Community Ionosphere‐Thermosphere Observing System Simulation Experiment (OSSE) Tool: Geospace Dynamics Constellation Example



Introduction
Data Assimilation (DA) is a statistical estimation method that aims to combine observation and model information optimally. For geospace applications, DA helps integrate neutral and/or plasma observations measured by various ground-based and/or space-based instrumentation into a numerical model of the thermosphere, ionosphere, and magnetosphere. This process ensures that the dynamics and physics of a numerical model are better constrained by observations, and that spatial and temporal gaps in observing systems are filled with information from a model with global coverage. As in tropospheric numerical weather prediction, DA has been recognized as a promising approach for space weather reanalysis, nowcasts, and forecasts, even though the infrastructure to enable such research in the geospace community is not currently state-of-the-art (Vourlidas et al., 2023).
Through the extensive use of DA and numerical forecasting tools, Observing System Simulation Experiments (OSSEs) offer a cost-efficient and objective approach to quantitatively assess the potential influence of new observational systems and alternative configurations for existing systems (e.g., Atlas, 1997; Hoffman & Atlas, 2016; Zeng et al., 2020). In an OSSE, synthetically generated observations (instead of real measurements) are assimilated into the model. The synthetic observational data are usually generated by sampling modeled environmental states from a "nature run" simulation and by adding measurement errors to the sampled model output data. The nature run simulation should be conducted carefully (e.g., European Centre for Medium-Range Weather Forecasts & National Centers for Environmental Prediction/National Weather Service/NOAA/U.S. Department of Commerce, 2007). The generation of synthetic data should ideally be conducted with instrument simulators that mimic realistic error characteristics of measurements according to a realistic space-time sampling of a proposed observing system. When analyzing OSSE results, the nature run serves as the "truth" state. Because the nature run and the process of generating synthetic data are entirely controlled, OSSEs provide a systematic approach to qualify the potential impact of assimilating observations from present or future observing systems and to examine issues with DA and numerical forecasting tools (Hoffman & Atlas, 2016). OSSEs can be applied to a single observing system, such as a satellite mission, or to a combination of several observing systems.
Despite the proven benefits of OSSEs, their application to geospace observing systems is rather limited due to the lack of well-supported DA and numerical forecasting tools and other challenging aspects of designing and executing OSSEs. In comparison to the tropospheric numerical weather prediction community, there is a considerable shortage of accessible software infrastructure and workforce development opportunities within the geospace community to meet the demands of implementing and carrying out OSSEs.
Over the past decade, OSSEs have been applied to several ground-based and space-based geospace observing systems. For example, Matsuo and Araujo-Pradere (2011) demonstrate the capability of DA for global ionospheric specification using an OSSE of ionosonde electron density. Hsu et al. (2014, 2018) and Dietrich et al. (2022) focus on OSSEs of Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) I and II Electron Density Profile (EDP) data, while Yue et al. (2014) and Pedatella et al. (2020) assimilate synthetic data from both ground-based and space-based GNSS observing systems into coupled models of the thermosphere and ionosphere. Recently, the potential DA impact of observations from NASA missions, such as Global-scale Observations of the Limb and Disk (GOLD) and Ionospheric Connection Explorer (ICON), has also been investigated in OSSEs presented by Hsu and Pedatella (2021), He et al. (2021), and Laskar et al. (2021). Many of these studies are enabled by the DA Research Testbed/Thermosphere-Ionosphere-Electrodynamics General Circulation Model (DART/TIEGCM), which is currently the most widely used ionosphere-thermosphere (IT) DA system built on well-documented, robust open-source community tools developed, tested, and maintained by the NSF NCAR over the past decade. Moreover, the NSF NCAR DART/TIEGCM tool has recently been updated to be compatible with the DART DA and diagnostic tools and has gone through the onboarding process with the NASA/NSF Community Coordinated Modeling Center (CCMC). In the near future, DART/TIEGCM will be publicly available via CCMC's Run on Request service to serve ground-based and space-based observing teams in the geospace community.
In this study, we simulate an observing system that will likely be provided by the planned NASA Geospace Dynamics Constellation (GDC) mission (Geospace Dynamics Constellation Science and Technology Definition Team, 2019; Rowland et al., 2022) to demonstrate how the DART/TIEGCM community tool can be used to evaluate the observational impacts of a proposed observing system. By making multi-parameter measurements (multiple types of both thermospheric and ionospheric environmental state variable measurements), the GDC aims to investigate the response of the ionosphere and thermosphere to the solar wind/magnetosphere forcing and the effect of internal processes on the redistribution of mass, momentum, and energy in the ionosphere and thermosphere on local, regional, and global spatial scales (Geospace Dynamics Constellation Science and Technology Definition Team, 2019). The GDC is formulated as a 3-year constellation mission of six satellites in high-inclination near-circular orbits in the 350-400 km altitude range, with each satellite equipped with an identical in situ science payload suite and with four different science operation phases (Rowland et al., 2022).
This study uses the GDC Phase 3 orbital configuration from the pre-formulated GDC orbital ephemeris to sample synthetic data for OSSEs. In this configuration, six satellites are placed into evenly distributed longitudinal planes for global-scale investigation. Synthetic observations of neutral temperature, neutral wind, neutral composition, atomic oxygen ion density, and ion and electron temperature likely provided by the Modular Spectrometer for Atmosphere and Ionosphere Characterization (MoSAIC), Thermal Plasma Sensor (TPS), and Atmospheric Electrodynamics probe for THERmal plasma (AETHER) are considered, as summarized in Table 1. See Section 2 for a discussion of the suitability of DART/TIEGCM for OSSEs with certain types of GDC measurements, and Section 5 for possibilities of OSSEs with types of GDC measurements beyond those considered in this paper.
The nature run is set to simulate the response of the IT system to the St. Patrick's Day geomagnetic storm on 17 March 2013 (see Figure 3). Synthetic GDC observational data sampled from this nature run are assimilated into TIEGCM using one of the ensemble filter methods implemented in DART. Following Hsu et al. (2014), wherein a set of OSSEs for the COSMIC observing system are conducted with different OSSE configurations, five sets of OSSEs are conducted to compare the effects of assimilating various combinations of GDC synthetic observations listed in Table 1 during this storm period. The root-mean-square errors in assimilation analyses obtained from these OSSEs are analyzed using the nature run as the truth state.
The rest of the paper is structured as follows. In Section 2, we briefly introduce the DART/TIEGCM tool as well as the OSSE workflow. Section 3 describes specific aspects of the OSSE design for the St. Patrick's Day geomagnetic storm period, including the nature run, TIEGCM ensemble simulation initialization, and synthetic GDC observations used in this study. Section 4 presents the OSSE results. Sections 5 and 6 provide discussion and summary as well as suggestions for some future work.

Ionosphere-Thermosphere OSSE Tool
The DA and forecasting system used to execute OSSEs in this study is DART/TIEGCM, which is built on two community open-source research tools developed and supported by the NSF NCAR. TIEGCM is a numerical dynamical (forecasting) model of the thermosphere and ionosphere system developed at the NSF NCAR High Altitude Observatory (see Section 2.1). As described in Section 2.2, DART is a DA tool developed by the NSF NCAR Computational and Information Systems Lab (CISL). The combination of DART and TIEGCM leads to a unique coupled IT DA capability with a number of benefits to OSSEs of multi-parameter IT observations. Sections 2.3 and 2.4 detail the advantages of DART/TIEGCM to enable so-called coupled IT DA. Section 2.5 describes OSSE workflow steps for DART/TIEGCM.

TIEGCM
TIEGCM is a general circulation model of the thermosphere and ionosphere with self-consistent ionospheric electrodynamics (Qian et al., 2014; Richmond et al., 1992). Non-linear dynamic coupling between the thermosphere and the ionosphere is solved self-consistently so that changes in ionospheric state variables affect the evolution of thermospheric variables and vice versa. The present study employs a low-resolution version of TIEGCM v2.0 with a vertical resolution of two grid points per scale height and a horizontal resolution of 5° × 5° in geographic longitude and latitude. In the default setting, magnetospheric forcing is specified by empirical models of the high-latitude plasma convection (e.g., Heelis et al., 1982; Weimer, 2005) and an idealized auroral model (Emery et al., 2012; Roble & Ridley, 1987). We use the Weimer model for this study. The Weimer plasma convection and auroral models are parameterized with respect to solar wind velocity, solar wind density, and Interplanetary Magnetic Field (IMF) By and Bz components.

DART
DART provides software tools for implementing a variety of ensemble DA techniques for different types of dynamical models, together with diagnostic tools and OSSE tools (Anderson et al., 2009; NSF NCAR Data Assimilation Research Section, 2024). Ensemble DA uses a flow-dependent covariance estimated from ensemble model simulations to account for the model uncertainty (Evensen, 1994). An ensemble of model simulations that simulate the range of possible environmental states is generated using a Monte-Carlo procedure, and is used to represent the evolution of the state and covariance. Ensemble samples are also used to represent the non-linear relationship between observations and model states. When applied to TIEGCM, the flow-dependent ensembles can describe the dynamical evolution of multi-variable covariance between thermospheric and ionospheric states, which is an enabling factor of coupled IT DA as described in Section 2.4. The Manhattan version of DART provides users with more flexible ensemble DA capabilities, which can now be used with TIEGCM.

DART/TIEGCM
The capability of DART/TIEGCM to specify and forecast the IT system in support of scientific studies and space weather applications has been demonstrated by many studies over the past decade (e.g., Chen et al., 2017; Chen, Lin, Matsuo, & Chen, 2016; Chen, Lin, Matsuo, Chen, Lee, et al., 2016; Dietrich et al., 2022; Hsu et al., 2014, 2018; Lee et al., 2013; Matsuo & Araujo-Pradere, 2011; Matsuo & Hsu, 2021). DART/TIEGCM (with the Manhattan version of DART and TIEGCM v2.0) with GDC OSSE function has been released recently. The updated workflow script allows the use of job arrays, which offer a mechanism for submitting and managing collections of similar jobs quickly and easily, for executing TIEGCM ensemble simulations. This makes the forecast step faster than in the previous DART/TIEGCM version due to more efficient execution of ensemble simulations by TIEGCM. The functionality of the Manhattan version of DART/TIEGCM is fully tested in this study.
An ensemble DA experiment is implemented as recursive applications of so-called DA cycles, as illustrated schematically in Figure 1. One cycle includes an analysis step and a forecast step. Figure 1 depicts how synthetic observational data (gray dot) sampled from the nature run simulation (green line) are assimilated through the cycling of these steps. Ensemble model states are first advanced by TIEGCM from $t_0$ to $t_1$ over the first forecast step (orange lines). At the transition from a forecast step to an analysis step, arrays of TIEGCM's thermospheric and ionospheric model state variables defined on the 3-dimensional model grid are vectorized into a single 1-dimensional DART state vector, $x = [x_1, x_2, \ldots, x_S]$, where $S$ is the size of the state vector. As described in Section 3.4, each OSSE has a design choice as to which TIEGCM model state variables are included in the DART state vector. In the analysis step, an $M$-member ensemble of DART state vectors, $\{x^{(1)}, x^{(2)}, \ldots, x^{(M)}\}$, is updated based on the observation sequence vector, $y = [y_1, y_2, \ldots, y_N]$, where $N$ is the size of the observation vector. Which type(s) of synthetic observations are assimilated is also a design choice of each OSSE. When transitioning from an analysis step to a forecast step, the DART state vector is reconstructed back into arrays of TIEGCM model state variables. The vectors $x$ and $y$ do not have to be limited to either thermospheric or ionospheric variables and observations. Therefore, an ionospheric observation can help update a DART state vector constructed from TIEGCM thermospheric model state variables and vice versa. This is the mechanism by which ionosphere-thermosphere coupling is incorporated in update steps. In the next forecast step, the updated ensemble model states are further advanced by TIEGCM from $t_1$ to $t_2$, starting from the initial conditions updated by DART using the observations. When running an ensemble DA experiment, the analysis and forecast steps are carried out alternately. The time interval between two consecutive analysis steps is the DA window, which is 1 hr in this study. When advancing the state variables using TIEGCM, the updated ensemble model states of the thermosphere or ionosphere can affect other non-updated ensemble model states. The time evolution of updated and non-updated thermospheric and ionospheric model states is solved self-consistently over forecast periods. As addressed in detail in Section 2.4, if the coupling between the thermosphere and ionosphere is accounted for during both update and forecast steps, it constitutes a strongly coupled IT DA. If coupling between the thermosphere and ionosphere is considered only during forecast steps, it constitutes a weakly coupled IT DA.
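The vectorization and reconstruction at the forecast/analysis transitions can be sketched as follows. This is a minimal illustration, not the DART/TIEGCM implementation: the field names (`TN`, `OP`, `UN`), the tiny grid, and both helper functions are hypothetical.

```python
import numpy as np

def to_state_vector(fields, names):
    """Flatten the selected 3-D fields, in the given order, into one 1-D vector."""
    return np.concatenate([fields[n].ravel() for n in names])

def from_state_vector(x, fields, names):
    """Write a 1-D state vector back into copies of the 3-D fields.

    Fields that are not listed in `names` are returned unchanged.
    """
    out = {k: v.copy() for k, v in fields.items()}
    start = 0
    for n in names:
        size = fields[n].size
        out[n] = x[start:start + size].reshape(fields[n].shape)
        start += size
    return out

# Hypothetical toy grid: 4 levels x 3 latitudes x 5 longitudes.
rng = np.random.default_rng(0)
shape = (4, 3, 5)
fields = {
    "TN": rng.normal(900.0, 50.0, shape),   # neutral temperature (K)
    "OP": rng.uniform(1e5, 1e6, shape),     # atomic oxygen ion density
    "UN": rng.normal(0.0, 50.0, shape),     # zonal neutral wind, not in the state vector
}
names = ["TN", "OP"]                        # design choice: which variables enter x

x = to_state_vector(fields, names)          # S = 2 * 4 * 3 * 5 = 120
restored = from_state_vector(x, fields, names)
```

Only the variables listed in `names` would be touched by an analysis step; everything else passes through to the next forecast unchanged.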
When constructing TIEGCM model state variables, it is important to note that only model state variables that are part of TIEGCM prognostic variables in initial conditions (e.g., horizontal neutral wind, neutral temperature, compositions of major neutral species, densities of primary ion species, ion temperature, and electron temperature) will carry over the impact of DA through analysis-forecast cycles. For instance, in TIEGCM electron density is computed as the sum of densities of all ion species, which are dominated by atomic oxygen ions in the altitude range of 350-400 km, rather than being computed directly. Updating TIEGCM electron density during the analysis step thus has no effect in the subsequent forecast step. It is, therefore, important to include TIEGCM model state variables of atomic oxygen ion density in the DART state vector when assimilating plasma density observations (Hsu et al., 2014).
The resolution of TIEGCM used in this study is 5° × 5° in longitude and latitude. The model is not suited for the assimilation of sub-grid scale observations. In addition, observations related to TIEGCM drivers, such as high-latitude plasma convection and auroral particle precipitation, cannot be directly assimilated into TIEGCM since these are specified by external empirical or numerical models rather than solved in TIEGCM itself. Given the global nature of TIEGCM, the tool is best suited to assimilate globally distributed observations. This is the reason why this OSSE study focuses on the GDC Phase 3 orbital configuration to generate synthetic observations.

Strongly Coupled Ionosphere-Thermosphere Data Assimilation
One of the keys to improving the DA performance for the IT system is the incorporation of neutral-ion coupling in both forecast and analysis steps of DA cycles (e.g., Dietrich et al., 2022; Hsu et al., 2014; Matsuo & Hsu, 2021). The combination of DART and TIEGCM leads to a strongly coupled IT DA capability whereby neutral-ion coupling is incorporated throughout analysis and forecast cycles.
For example, in situ observations of neutral temperature along satellite orbits can be used to update TIEGCM model state variables of atomic oxygen ion density based on the sample correlation between thermospheric and ionospheric variables estimated by the model ensemble. In this case, y is composed of observations of neutral temperature at a given orbital location. The DART state vector, x, needs to include TIEGCM model state variables of both thermospheric neutral temperature and ionospheric atomic oxygen ion density defined on the model grid.
In this way, ion-neutral coupling is taken into account in the analysis step. In the forecast step, through ion-neutral coupling physics solved in the TIEGCM model, updated ionospheric states continue to affect thermospheric states and vice versa. If x is constructed only with the thermospheric neutral temperature, only that state is updated by observations in the analysis step, and ion-neutral coupling is not taken into account in the analysis step. In this case, DART/TIEGCM's capability is restricted to weakly coupled IT DA.
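The difference between the two analysis-step behaviors can be illustrated with a scalar ensemble-adjustment update followed by linear regression onto a second variable. This is a textbook-style sketch under stated assumptions, not DART code: the function names, the synthetic ensemble, the observation value, and the error variance are all hypothetical.

```python
import numpy as np

def eakf_obs_increments(hx, y_obs, obs_var):
    """Ensemble-adjustment increments for one scalar observation.

    hx is the prior ensemble of predicted observations.
    """
    prior_mean = hx.mean()
    prior_var = hx.var(ddof=1)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + y_obs / obs_var)
    # Shift the mean and contract the spread deterministically.
    updated = np.sqrt(post_var / prior_var) * (hx - prior_mean) + post_mean
    return updated - hx

def regress_increments(x, hx, dy):
    """Update a state variable by regressing observation increments onto it."""
    cov = np.cov(x, hx, ddof=1)[0, 1]
    return x + cov / hx.var(ddof=1) * dy

rng = np.random.default_rng(1)
M = 90                                     # ensemble size used in this study
tn = rng.normal(1000.0, 80.0, M)           # prior neutral temperature at the obs location (K)
# Hypothetical O+ density that co-varies with temperature across the ensemble.
op = 5.0e5 + 2.0e3 * (tn - 1000.0) + rng.normal(0.0, 2.0e4, M)

y_obs, obs_var = 950.0, 20.0**2            # hypothetical observation and error variance
dy = eakf_obs_increments(tn, y_obs, obs_var)

tn_post = tn + dy                          # observed variable is updated in both cases
op_strong = regress_increments(op, tn, dy) # strongly coupled: O+ is in x and is updated
op_weak = op.copy()                        # weakly coupled: O+ is not in x, left as is
```

In the weakly coupled case, the O+ ensemble would only respond later, through the model physics during the forecast step.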
By including TIEGCM model state variables that are not directly observed, we can update unobserved model state variables in the analysis step. However, past DART/TIEGCM studies (e.g., Hsu et al., 2014; Matsuo & Hsu, 2021) suggest that the effectiveness of assimilating observations whose relationship to model state variables is highly non-linear is limited. This is partly due to sampling errors, and the limitation of the linear regression used in DART's Ensemble Adjustment Kalman Filter (EAKF) variant to estimate the relationship between x and y from ensemble samples. Including more TIEGCM model state variables in the DART state vector does not guarantee an improvement in DART/TIEGCM performance. In fact, it can result in an ineffective and inefficient experiment configuration.
With the help of the strongly coupled IT capability of DART/TIEGCM, the impact of multi-parameter IT measurements, such as those expected from the GDC mission, can be maximized. DART/TIEGCM performance depends on how the y vector is constructed with GDC measurements, which TIEGCM model state variables are included in the x vector, as well as other factors that control TIEGCM forecast performance and DART DA analysis performance. The intent of the five OSSEs designed in Section 3.4 is to compare strongly and weakly coupled DA approaches. Details of the experiment setup are given in Section 3.

DART/TIEGCM OSSE Workflow
Figure 2 illustrates the OSSE workflow using DART/TIEGCM. There are three essential steps to set up an experiment. In Step 1, a nature run simulation representing the true states of the upper atmosphere needs to be executed, as indicated by the green shaded circle in Figure 2. In Step 2, based on the observing system information, including locations/times and errors of measurements, synthetic observations need to be generated from the nature run simulation using DART's synthetic data generation tool with the assumption of Gaussian-distributed observation errors. This step is indicated by the black box in Figure 2. In Step 3, using the DART ensemble DA tool, the synthetic observations are then assimilated into a set of model ensembles that differ from the nature run, as shown in the red box in Figure 2. The assimilation analysis results are compared with the nature run, as indicated by the purple arrow in Figure 2. Note that model ensemble simulations need to be initialized before Step 3 (see Section 3.3).
Earth and Space Science

OSSE Design and Configuration
This section describes how each step of the OSSE workflow described in Section 2.5 is specifically implemented using DART/TIEGCM for the pre-formulated GDC observing system during the St. Patrick's Day storm on 17 March 2013. Sections 3.1 and 3.2 describe how the nature run simulation is executed and how synthetic observational data are generated, respectively. Section 3.2 also addresses the limited scope of synthetic GDC observations generated using the readily available DART/TIEGCM tool in this study, which is sufficient to demonstrate the OSSE tool. For example, the synthetic observational data generated here do not reflect realistic GDC measurement characteristics (e.g., resolution, quality, and uncertainty). Section 3.3 describes how TIEGCM model ensemble simulations are initialized. Section 3.4 details how the DART state vector, x, and the observation sequence vector, y, are configured for each OSSE.

Nature Run
To carry out OSSEs under geophysical conditions during the 2013 St. Patrick's Day geomagnetic storm, the nature run is executed to simulate the response of the thermosphere and ionosphere to this geomagnetic storm using TIEGCM. A strong geomagnetic storm occurred on 17 March 2013. At about 0600 UT on 17 March, the IMF Bz turned southward (as shown in Figure 3), while the Dst rose rapidly to 15 nT at 0700 UT (the storm's sudden commencement) and dropped to around −130 nT at about 2100 UT on the same day. The study uses real IMF data, F10.7 index, and solar wind velocity data to drive TIEGCM on 17 March 2013. A 2-week TIEGCM simulation, which started on 03 March 2013 and continued until reaching a quasi-steady state under geomagnetically quiet conditions before the storm period, precedes the 24-hr nature run simulation from 0 UT on 17 March to 0 UT on 18 March.

GDC Synthetic Observational Data
We use the Phase 3 pre-formulated GDC orbital ephemeris data provided by NASA to sample the nature run simulation to generate synthetic observations of neutral temperature, neutral wind, neutral composition, atomic oxygen ion density, and ion and electron temperature as listed in Table 1. For observation errors, uncorrelated Gaussian random noise is added to these sampled data. We use the requirement values listed in the GDC announcement of opportunity (Rowland et al., 2022) to specify the standard deviation of Gaussian noise for each type of synthetic GDC observation, and these values are listed in Table 2. The sampling rate of these observations is assumed to be about 30 s for all types. The number of observations of each type amounts to 2,825 per day.
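The sampling-plus-noise procedure can be sketched as below. This is an illustrative stand-in for DART's synthetic data generation tool, not its actual code: the analytic "nature run" field, the random sample locations (in place of the GDC ephemeris), the bilinear interpolation helper, and the noise level `sigma` (not a Table 2 value) are all assumptions.

```python
import numpy as np

def bilinear_sample(field, lats, lons, lat_pts, lon_pts):
    """Bilinearly interpolate field[lat, lon] on a regular grid to given points.

    Longitude wrap-around is ignored here for brevity.
    """
    fi = (lat_pts - lats[0]) / (lats[1] - lats[0])
    fj = (lon_pts - lons[0]) / (lons[1] - lons[0])
    i0 = np.clip(np.floor(fi).astype(int), 0, len(lats) - 2)
    j0 = np.clip(np.floor(fj).astype(int), 0, len(lons) - 2)
    wi, wj = fi - i0, fj - j0
    return ((1 - wi) * (1 - wj) * field[i0, j0]
            + wi * (1 - wj) * field[i0 + 1, j0]
            + (1 - wi) * wj * field[i0, j0 + 1]
            + wi * wj * field[i0 + 1, j0 + 1])

# Hypothetical 5 x 5 degree grid and an analytic stand-in for a nature-run field (K).
lats = np.arange(-87.5, 90.0, 5.0)
lons = np.arange(-180.0, 180.0, 5.0)
LAT, LON = np.meshgrid(lats, lons, indexing="ij")
nature = 900.0 + 2.0 * LAT + 0.5 * LON

# Hypothetical sample locations standing in for satellite positions.
rng = np.random.default_rng(2)
n_obs = 2000
lat_pts = rng.uniform(-80.0, 80.0, n_obs)
lon_pts = rng.uniform(-180.0, 170.0, n_obs)

sigma = 10.0                                  # assumed observation error standard deviation (K)
truth_at_obs = bilinear_sample(nature, lats, lons, lat_pts, lon_pts)
synthetic = truth_at_obs + rng.normal(0.0, sigma, n_obs)   # uncorrelated Gaussian noise
```

Because the noise is drawn independently for each sample, the errors are uncorrelated in space and time, matching the assumption stated above.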

TIEGCM Ensemble Initialization
The total number of ensemble members is 90 (M = 90). TIEGCM ensemble members are generated by perturbing the solar wind velocity, solar wind density, IMF magnitude and directions, and the F10.7 index with Gaussian noise whose standard deviation is 10% of the real data. The spin-up period is 2 weeks. After the 2-week spin-up period, OSSEs are launched at UT 00 on March 17. The gray lines in Figure 3 show the perturbed solar wind conditions, IMF, and F10.7 index used in TIEGCM ensemble members starting 2 weeks before the experiment, while the red lines show the ensemble mean. The uncertainty in these forcings, as indicated by the spread of gray lines in Figure 3, is passed on to the thermosphere and ionosphere response.
The blue and red lines of solar wind conditions and IMF in Figure 3 are close to each other, implying these perturbations are unbiased. On the other hand, we add a 10 SFU bias to the F10.7 index perturbation. The red line of the F10.7 index is higher than the blue line of the F10.7 index in Figure 3, implying a biased scenario. With this scenario, the simulated mean response of the thermosphere and ionosphere to external forcings in TIEGCM is biased to be higher in terms of F10.7 index.
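One way to generate such driver perturbations is sketched below. The text does not specify whether the noise is redrawn at each time step, so this sketch assumes a single multiplicative factor per ensemble member held fixed in time; the function name, the constant driver time series, and that choice are all hypothetical.

```python
import numpy as np

def perturb_driver(values, n_members, rel_std=0.10, bias=0.0, rng=None):
    """Return an (n_members, n_times) array of perturbed driver time series.

    Assumption: each member scales the real data by one Gaussian factor with
    mean 1 and standard deviation rel_std, then adds an optional constant bias.
    """
    rng = np.random.default_rng() if rng is None else rng
    factors = rng.normal(1.0, rel_std, size=(n_members, 1))
    return values[None, :] * factors + bias

rng = np.random.default_rng(3)
n_members = 90

# Hypothetical hourly driver values for one day (placeholders, not real data).
f107 = np.full(24, 125.0)        # F10.7 index (SFU)
v_sw = np.full(24, 450.0)        # solar wind speed (km/s)

f107_ens = perturb_driver(f107, n_members, bias=10.0, rng=rng)  # biased by 10 SFU
v_sw_ens = perturb_driver(v_sw, n_members, rng=rng)             # unbiased

f107_offset = f107_ens.mean(axis=0)[0] - f107[0]   # close to the imposed 10 SFU bias
v_sw_offset = v_sw_ens.mean(axis=0)[0] - v_sw[0]   # close to zero
```

The ensemble mean of the biased driver sits above the real data, while the unbiased driver's ensemble mean stays close to it, which is the distinction drawn between the red and blue lines in Figure 3.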
Although a biased scenario is used in this study, with the forcing perturbation shown in Figure 3, the prior model spread is large enough to assimilate most synthetic data. For example, the global-averaged prior model spreads of neutral temperature, ion temperature, and electron temperature are 85, 81, and 155 K, respectively, at UT 00 of March 17.
Additional TIEGCM ensemble simulations, with the same driver perturbation used for the ensemble initialization but without DA, are also performed as a benchmark, which is referred to as the control (ensemble) experiment.

OSSE Configuration
As described in Section 2.3, we can make certain design choices for each OSSE as to which TIEGCM model state variables are included in the DART state vector, $x$, and which type(s) of synthetic GDC observations are added to the DART observation vector, $y$. When making these design choices for $x$ and $y$, we have taken a few factors into consideration, most importantly the efficiency and effectiveness of the experiment configuration, as elaborated in Section 2.4. As listed in Table 1, GDC synthetic observations considered in this study include neutral temperature ($y_{Tn}$), neutral zonal and meridional winds ($y_{Un}$ and $y_{Vn}$), neutral atomic oxygen density ($y_{O}$), molecular oxygen density ($y_{O_2}$), molecular nitrogen density ($y_{N_2}$), ion and electron temperature ($y_{Ti}$ and $y_{Te}$), and atomic oxygen ion density ($y_{O^+}$). These observation types all directly correspond to TIEGCM model state variables that are parts of TIEGCM initial conditions, including the zonal neutral wind ($x_{Un}$), meridional neutral wind ($x_{Vn}$), neutral temperature ($x_{Tn}$), atomic oxygen mixing ratio ($x_{rO}$), molecular oxygen mixing ratio ($x_{rO_2}$), ion temperature ($x_{Ti}$), electron temperature ($x_{Te}$), and atomic oxygen ion density ($x_{O^+}$). To assimilate synthetic GDC observations of neutral temperature ($y_{Tn}$), we include $y_{Tn}$ in $y$ and include the TIEGCM neutral temperature state variable ($x_{Tn}$) in $x$. Note that DART software provides the flexibility to use model state variables that are not part of the DART state vector to compute the predicted observations. In other words, the model state variables used to compute the predicted observations do not have to match those included in $x$ and updated by DA.
Five sets of OSSEs are designed with different configurations of the DART state vector, x, and observation sequence vector, y, as listed in Table 3. In OSSE1, we focus on adjusting neutral temperature using observations of neutral temperature. In OSSE2, all types of thermospheric observations listed in Table 1 (neutral temperature, neutral zonal and meridional winds, neutral atomic oxygen density, molecular oxygen density, and molecular nitrogen density) are assimilated. Here we include in x the major neutral composition states of atomic oxygen mixing ratio and molecular oxygen mixing ratio, as well as zonal neutral wind, meridional neutral wind, and neutral temperature. The three major neutral species in the thermosphere are atomic oxygen, molecular oxygen, and molecular nitrogen. They are represented by mixing ratios in TIEGCM, and the sum of the atomic oxygen, molecular oxygen, and molecular nitrogen mixing ratios is set to one. Therefore, there is no need to include the molecular nitrogen mixing ratio in x, even though molecular nitrogen density is included in y. In OSSE3, we update atomic oxygen ion density in the model using observations of atomic oxygen ion density. The atomic oxygen ion is the dominant ion species in the altitude range of 350-400 km. In OSSE4, we update atomic oxygen ion density along with two other ionospheric states, ion temperature and electron temperature. OSSE1 and OSSE2 focus on thermospheric DA, while OSSE3 and OSSE4 focus on ionospheric DA. These OSSEs are thus designed not to take advantage of the strongly coupled IT DA capability offered by DART/TIEGCM. In OSSE5, we update both thermospheric and ionospheric variables using all GDC synthetic observations listed in Table 1. This OSSE is designed to take full advantage of the strongly coupled IT DA capability.
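Because the three mixing ratios sum to one, the molecular nitrogen mixing ratio is implied by the other two. The sketch below derives it and, as one possible safeguard, proportionally rescales the atomic and molecular oxygen mixing ratios when an update pushes their sum above one. The proportional scheme and both function names are assumptions; the text does not specify how the rescaling is done in DART/TIEGCM.

```python
import numpy as np

def n2_mixing_ratio(r_o, r_o2):
    """Molecular nitrogen mixing ratio implied by the unit-sum constraint."""
    return 1.0 - r_o - r_o2

def rescale_if_needed(r_o, r_o2, max_sum=1.0):
    """Scale O and O2 mixing ratios proportionally wherever their sum exceeds max_sum.

    Assumes the sums are strictly positive.
    """
    total = r_o + r_o2
    scale = np.where(total > max_sum, max_sum / total, 1.0)
    return r_o * scale, r_o2 * scale

# Hypothetical updated mixing ratios at two grid points; the second violates the constraint.
r_o = np.array([0.7, 0.8])
r_o2 = np.array([0.1, 0.3])

r_o_fixed, r_o2_fixed = rescale_if_needed(r_o, r_o2)
r_n2 = n2_mixing_ratio(r_o_fixed, r_o2_fixed)
```

Grid points that already satisfy the constraint are left untouched, so the rescaling only acts where the analysis update produced a nonphysical composition.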
The assimilation window is 1 hr to track hourly changes in the thermospheric and ionospheric response to the geomagnetic storm. To prevent the sum of updated mixing ratios of neutral species from becoming larger than one due to a spurious ensemble DA update effect, the mixing ratio of each neutral species is rescaled before entering the subsequent forecast step. The rest of the OSSE setting is the same as in Hsu et al. (2014). The DA method is the EAKF (Anderson, 2001), the localization function is the Gaspari-Cohn function (Gaspari & Cohn, 1999) with a horizontal localization length scale of 0.2 Earth radius and without vertical localization, and the inflation method is Gaussian spatially-varying state space inflation (Anderson & Collins, 2007; Anderson et al., 2009).
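For reference, the fifth-order piecewise rational function of Gaspari and Cohn (1999) can be written as below. Treating the quoted 0.2 length scale as the half-width of the function is an assumption made for illustration, and the function name and sample distances are hypothetical.

```python
import numpy as np

def gaspari_cohn(dist, half_width):
    """Compactly supported localization weight.

    Equals 1 at zero separation and 0 beyond twice the half-width.
    """
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / half_width
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    w[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                - (5.0 / 3.0) * ri**2 + 1.0)
    ro = r[outer]
    w[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w

half_width = 0.2                               # assumed interpretation of the length scale
dist = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
weights = gaspari_cohn(dist, half_width)
```

The weights multiply the ensemble regression coefficients, so an observation has no influence on state variables separated from it by more than twice the half-width.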

Results
This section presents analyses of the five OSSEs conducted in this study. A general depiction of data impact is first provided through a comparison of results from OSSE5 with the control experiment. Quantitative comparisons of results from OSSE1-5 are then shown to contrast the weakly coupled DA approaches, wherein observations of either thermospheric parameters (e.g., OSSE1 and OSSE2) or ionospheric parameters (e.g., OSSE3 and OSSE4) are assimilated, with the strongly coupled DA approach wherein multi-parameter IT observations are assimilated. The post-analysis of OSSE results helps assess the overall impact of assimilating different types of GDC synthetic observations on the global thermospheric and ionospheric specification.
Figures 4 and 5 provide snapshot views of OSSE5 results after one forecast-analysis cycle in terms of the global distribution of atomic oxygen ion density and neutral temperature at about 350 km altitude at UT 01 on 17 March 2013. The colored contour plot in Figure 4a shows the true distribution of atomic oxygen ion density from the nature run simulation, and white dots indicate the locations of GDC synthetic observations from six satellite tracks.
Figure 4b shows the mean atomic oxygen ion density distribution from the control ensemble simulation with no DA. Note that the temperature in TIEGCM ensemble simulations used in DART DA and control experiments is higher than the temperature in the truth simulated by the nature run since the F10.7 index is biased. Figures 4c and 4d are the difference maps of the ensemble mean of OSSE5 from the truth before and after DA, respectively. Figure 5 is similar to Figure 4, but for neutral temperature. By comparing the pair of (c)-(d) plots in Figures 4 and 5, the impact of DA is clear in the EIA region, especially between 180° and 60° longitude. The regions of data impact match the locations of GDC observations. OSSE5 results shown in Figures 4 and 5 suggest that the GDC OSSE is implemented successfully using the Manhattan version of DART/TIEGCM. Figures 4c, 4d, 5c, and 5d are useful as they provide a visual inspection of the error reduction distribution due to assimilation of GDC synthetic observations in OSSE5 in terms of difference maps from the truth at a particular time and altitude.

10.1029/2024EA003684
To quantify the impact of assimilating GDC synthetic observations over the entire experiment period, the root-mean-square error (RMSE) of OSSE results from the truth is next computed over the entire TIEGCM model domain. Figure 6 displays neutral temperature RMSEs for the mean of the ensemble control experiment, the prior ensemble mean from OSSE5, and the posterior ensemble mean from OSSE5. The posterior RMSE represents the impact of assimilating data up to the current time through forecast-analysis cycles, and the prior RMSE reflects the effect of assimilating data up to the previous analysis step and the current forecast step. The control experiment has the largest RMSEs since no data are assimilated. The posterior RMSEs are the smallest since they represent the state right after DA updates. The prior RMSEs are larger than the posterior RMSEs due to the forecast model error growth during the forecast step. Note that the driver and other model parameters in TIEGCM ensemble simulations are held unchanged during the whole experiment period. Both the prior and posterior RMSEs from OSSE5 are significantly smaller than the RMSEs of the control experiment, indicating a positive impact of assimilating GDC synthetic observations to improve the thermosphere and ionosphere specification under geomagnetically disturbed conditions. The time-averaged RMSEs of the control experiment, prior state, and posterior state are 266.1, 68.6, and 39.2 K, respectively.
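The RMSE diagnostic can be sketched as follows. The unweighted average over all grid points is an assumption (the text does not state whether any area or volume weighting is applied), and the synthetic truth and ensembles are hypothetical placeholders with arbitrary error levels.

```python
import numpy as np

def rmse(estimate, truth):
    """Unweighted root-mean-square error over all grid points."""
    diff = np.asarray(estimate) - np.asarray(truth)
    return float(np.sqrt(np.mean(diff**2)))

rng = np.random.default_rng(4)
n_times, n_members = 6, 90
grid = (4, 3, 5)                                   # hypothetical (level, lat, lon) grid

# Hypothetical truth and ensembles with dimensions (member, time, level, lat, lon).
truth = rng.normal(1000.0, 100.0, size=(n_times,) + grid)
prior_ens = truth[None] + 30.0 + rng.normal(0.0, 40.0, size=(n_members, n_times) + grid)
post_ens = truth[None] + 10.0 + rng.normal(0.0, 20.0, size=(n_members, n_times) + grid)

# RMSE of the ensemble mean at each analysis time, then the time average.
prior_rmse = np.array([rmse(prior_ens[:, t].mean(axis=0), truth[t]) for t in range(n_times)])
post_rmse = np.array([rmse(post_ens[:, t].mean(axis=0), truth[t]) for t in range(n_times)])
prior_avg, post_avg = prior_rmse.mean(), post_rmse.mean()
```

Computing the error of the ensemble mean, rather than averaging member-wise errors, matches the quantities described above (control mean, prior mean, and posterior mean).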
To contrast the outcome of different OSSE design choices, the results of OSSE1-OSSE5 are compared in terms of prior RMSEs.To provide representative perspectives on the impact of assimilation of GDC synthetic observations on the thermosphere and ionosphere state estimation, RMSEs of neutral temperature and atomic oxygen ion density are shown and discussed in this section.
The prior RMSEs of the neutral temperature from OSSE1-5 and from the control experiment are displayed in Figure 7. Since only synthetic observations of thermospheric parameters are assimilated in OSSE1 and OSSE2, the OSSE1-2 RMSEs represent the impact of assimilating thermospheric observations on the global thermospheric temperature estimation, which should be contrasted with the RMSEs of OSSE3 and OSSE4, wherein only ionospheric observations are assimilated. OSSE5 represents a case in which both thermospheric and ionospheric observations are assimilated. We notice that the RMSEs of OSSE3 and OSSE4 are smaller than that of the control experiment even though none of the thermospheric state variables are updated, indicating that a hidden, unobserved state can be adjusted via the forecast steps, wherein the coupling between the thermosphere and ionosphere is self-consistently solved by TIEGCM. This result is consistent with Hsu et al. (2014) and Dietrich et al. (2022).
The time-averaged RMSEs of the control experiment, OSSE3, and OSSE4 are 74.6, 52.9, and 52.4 K, meaning that the errors in thermospheric temperature in OSSE3 and OSSE4 are reduced to ∼70.9% and ∼70.2% of the control experiment, respectively. The RMSEs of OSSE1, OSSE2, and OSSE5 suggest that a significant error reduction in thermospheric temperature can be achieved by assimilating thermospheric observations. Note that synthetic observations of thermospheric temperature are assimilated in OSSE1, OSSE2, and OSSE5. The time-averaged RMSEs of OSSE1, OSSE2, and OSSE5 are 20.0, 20.1, and 18.4 K, which are ∼26.9%, ∼27%, and ∼24.6% of the control experiment time-averaged RMSE. Though the OSSE1 and OSSE2 RMSEs mostly overlap, the OSSE2 RMSE is sometimes larger, so the time-averaged RMSE of OSSE2 is slightly larger than that of OSSE1. This suggests that assimilating synthetic GDC observations of neutral winds and temperature and updating the corresponding model state variables can increase the temperature error. Two possible mechanisms may cause this increase in error. First, the nonphysical, dynamically imbalanced increments produced by DA during update steps induce error growth during forecast steps. Second, weakly correlated observation types impact the temperature due to sampling error. This result further reinforces the importance of OSSE design choices in selecting what types of observations and which model state variables are included when constructing x and y.
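The second mechanism, sampling error, can be illustrated with a small Monte Carlo sketch. It draws pairs of truly independent Gaussian variables with an ensemble size of 90 (the size used in these OSSEs) and measures how large the spurious sample correlations typically are. The function names, trial count, and seed are illustrative assumptions; this is not how DART diagnoses sampling error.

```python
import math
import random
import statistics

def sample_correlation(x, y):
    """Pearson correlation of two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spurious_correlation_spread(ensemble_size, trials, seed=0):
    """Std. dev. of sample correlations between truly independent variables."""
    rng = random.Random(seed)
    corrs = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(ensemble_size)]
        y = [rng.gauss(0.0, 1.0) for _ in range(ensemble_size)]
        corrs.append(sample_correlation(x, y))
    return statistics.pstdev(corrs)

spread = spurious_correlation_spread(ensemble_size=90, trials=2000)
print(f"spread of spurious correlations: {spread:.3f}")
print(f"1/sqrt(N-1) reference:           {1 / math.sqrt(89):.3f}")
```

For independent Gaussian variables the sample correlation has a standard deviation of 1/sqrt(N-1), roughly 0.1 for N = 90, so an observation type that is only weakly related to temperature can still produce nonzero, noise-driven increments.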
The prior RMSEs of atomic oxygen ion density from OSSE1-5 and from the control experiment are displayed in Figure 8. Similar to the comparison shown in Figure 7, the intent of Figure 8 is to contrast the RMSEs of plasma density from OSSE1-2, wherein only synthetic observations of thermospheric parameters are assimilated, with those of OSSE3-4, wherein only ionospheric observations are assimilated. The OSSE1 and OSSE2 RMSEs are both smaller than the control experiment RMSE, and the OSSE2 RMSE is significantly smaller than the OSSE1 RMSE. This indicates that updating neutral composition by including it in the DART state vector, x, is important for estimating atomic oxygen ion density in DART/TIEGCM. The neutral composition is known to affect the plasma production and loss processes. This is aligned with past work with DART/TIEGCM (Hsu et al., 2014). The OSSE3 and OSSE4 RMSEs show that assimilating synthetic GDC observations of atomic oxygen ion density and updating the corresponding ionospheric model state variables reduce the error in atomic oxygen ion density. The comparison of RMSEs from OSSE3-4 suggests that the impact of assimilating ion and electron temperature observations is not very significant in this case. Updating both thermospheric and ionospheric state variables with assimilation of all types of thermospheric and ionospheric synthetic observations leads to the best outcome. The time-averaged RMSEs of OSSE1-5 are 1.18 × 10^5, 0.95 × 10^5, 0.66 × 10^5, 0.66 × 10^5, and 0.57 × 10^5, while the time-averaged RMSE of the control experiment is 1.31 × 10^5. The RMSEs in OSSE1-5 are 89.8%, 72.2%, 50.4%, 50.2%, and 43.3% of that of the control experiment.
The analysis of OSSE1-5 shown in Figures 7 and 8 demonstrates the benefit of the strongly coupled IT DA approach implemented in DART/TIEGCM. Updating the IT system state variables using only thermospheric observations (OSSE1 and OSSE2) or only ionospheric observations (OSSE3 and OSSE4) leads to different DA analysis performance and different levels of improvement in the thermospheric and/or ionospheric state specification and forecasting. Assimilating multi-parameter IT observations simultaneously (OSSE5) results in the most robust performance and the greatest improvement. However, as indicated by the sizes of the state vector and observation sequence vector shown in Table 3, OSSE5 needs more computing resources than the other experiments. The strongly coupled IT DA plays a key role in maximizing the value of the multi-parameter IT observations expected to be made available by the planned GDC mission for space weather research.
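The idea behind a strongly coupled update, in which an observation of one component adjusts a state variable of the other through the ensemble cross-covariance, can be sketched as follows. This is a minimal two-step scalar update in the spirit of an ensemble adjustment Kalman filter, written for illustration only; it is not DART code, and the variable names, the linear temperature-density relation, and all numerical values are assumptions.

```python
import random
import statistics

def covariance(a, b):
    """Unbiased sample covariance of two equal-length ensembles."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def observation_increments(y_prior, y_obs, obs_err_var):
    """Deterministic (ensemble-adjustment) increments for a scalar observation."""
    prior_mean = statistics.fmean(y_prior)
    prior_var = statistics.variance(y_prior)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_err_var)
    post_mean = post_var * (prior_mean / prior_var + y_obs / obs_err_var)
    shrink = (post_var / prior_var) ** 0.5
    return [post_mean + shrink * (y - prior_mean) - y for y in y_prior]

def update_state(x_prior, y_prior, increments):
    """Regress observation increments onto a (possibly unobserved) state variable."""
    gain = covariance(x_prior, y_prior) / statistics.variance(y_prior)
    return [x + gain * dy for x, dy in zip(x_prior, increments)]

# Hypothetical 90-member ensemble: neutral temperature (K, unobserved) and a
# correlated ion-density-like quantity (arbitrary units, observed).
rng = random.Random(1)
temp = [1000.0 + 50.0 * rng.gauss(0.0, 1.0) for _ in range(90)]
dens = [5.0 + 0.01 * (t - 1000.0) + 0.1 * rng.gauss(0.0, 1.0) for t in temp]

# Assumed observation: density 4.5, consistent with a "true" temperature of 950 K.
incr = observation_increments(dens, y_obs=4.5, obs_err_var=0.01)
temp_post = update_state(temp, dens, incr)  # cross-variable update
dens_post = update_state(dens, dens, incr)  # gain is 1: dens plus increments

print(statistics.fmean(temp), "->", statistics.fmean(temp_post))
print(statistics.fmean(dens), "->", statistics.fmean(dens_post))
```

In this toy setup the temperature ensemble mean moves toward the assumed truth even though temperature itself is never observed, which is the behavior a strongly coupled state vector is meant to enable.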

Discussion
The OSSEs presented here are appropriately scoped to meet the goal of this study, the OSSE tool demonstration.
In the future, some limitations should be overcome before OSSEs using the DART/TIEGCM tool can be used to quantitatively assess the impact of GDC data on the IT state specification and forecasting as well as GDC science objectives.
Synthetic observational data used in the OSSEs are generated based on the GDC Phase 3 orbital configuration. We assume that the observation errors of each type are normally distributed and uncorrelated with each other, with the standard deviation values listed in Table 2. The measurement sampling interval is assumed to be 30 s. These assumptions likely do not reflect the characteristics of the GDC measurements, which are still being formulated and will be implemented in the future. In other words, the synthetic data used in the current study do not reflect realistic GDC measurement characteristics (e.g., resolution, quality, and uncertainty).
The ensemble size is set to 90 in the current OSSEs. Sampling errors and rank-deficiency issues, often encountered when a Monte Carlo method is used to estimate covariance, need to be carefully mitigated using covariance localization and inflation methods. Such needs have been demonstrated for ionospheric DA (e.g., Hsu et al., 2018). DART provides tools, such as covariance inflation (e.g., Anderson, 2007) and localization, to ameliorate the potential negative impact. Most of the DART covariance localization and inflation settings used in the experiments in this study are defaults. The localization function is the Gaspari-Cohn function, and the inflation method is Gaussian spatially-varying state space inflation. A dedicated study to tune and calibrate the parameters of these DART auxiliary methods will be needed in the future to maximize DART/TIEGCM performance.
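For reference, the Gaspari-Cohn localization weight mentioned above is a fifth-order piecewise rational function that equals 1 at zero separation and falls to exactly 0 at twice a chosen half-width. The sketch below evaluates it for a scalar distance. The function name, argument names, and the half-width of 1.0 are illustrative assumptions and do not represent the settings used in these experiments.

```python
def gaspari_cohn(distance, half_width):
    """Gaspari-Cohn fifth-order compactly supported correlation function."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    r = abs(distance) / half_width
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r <= 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
                - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0  # no impact beyond twice the half-width

# Weights applied to the ensemble regression gain at increasing separation.
for d in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(d, round(gaspari_cohn(d, half_width=1.0), 4))
```

Multiplying the ensemble-estimated gain by this weight damps the influence of distant observations, where sample correlations from a 90-member ensemble are most likely to be noise.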
Even with the limited scope, this study considers GDC synthetic observation types that are most suitable for assimilation with DART/TIEGCM, except for EDPs expected from the Probe for Radio Occultation of Ionospheric LayErs (PROFILE). Hsu et al. (2014, 2018), Pedatella et al. (2020), and Dietrich et al. (2022) show that radio occultation EDPs can be assimilated into DART/TIEGCM and can help improve the ionospheric and thermospheric nowcasts and forecasts. Although not considered in this study, we expect EDPs from PROFILE to help improve the IT state specification and forecasting.
As mentioned before, observations related to external forcings cannot be directly assimilated into TIEGCM. The planned GDC mission includes payloads such as the Comprehensive Auroral Precipitation Experiment (CAPE) for measuring high-energy charged particles and the Near Earth Magnetometer Instrument in a Small Integrated System (NEMISIS) for measuring magnetic field perturbations due to ionospheric currents. These measurements need to be assimilated first using a different DA tool, such as Assimilative Mapping of Geospace Observations (AMGeO) or Assimilative Mapping of Ionospheric Electrodynamics (AMIE). DART/TIEGCM can be easily configured to incorporate the DA analyses of auroral particle precipitation and ionospheric convection patterns from AMGeO and AMIE.

Summary
This paper illustrates the utility of DART/TIEGCM as a community Ionosphere-Thermosphere OSSE tool using the currently planned GDC observing system for a realistic geomagnetic storm scenario. DART/TIEGCM is an open-source community ensemble DA tool that facilitates the ingestion of IT observations into a comprehensive first-principles community model of the IT system. DART/TIEGCM draws on the strengths of both DART and TIEGCM, and implements a strongly coupled IT DA approach wherein the coupling between the neutral and plasma parts of the upper atmosphere can be accounted for in both analysis and forecast steps (see Figures 1 and 2). The Manhattan version of DART/TIEGCM has been released, and its functionality is tested in the current study. A set of OSSEs is carried out under the 2013 St. Patrick's Day storm conditions, utilizing the GDC Phase 3 orbital configuration, wherein six satellites are placed into evenly distributed longitudinal planes for global-scale investigation. Synthetic GDC observations of neutral temperature, neutral winds, neutral composition, atomic oxygen ion density, ion temperature, and electron temperature are considered. The OSSEs generally show a positive impact of assimilating synthetic GDC observations into TIEGCM on reducing forecasting errors of thermospheric and ionospheric states such as neutral temperature and atomic oxygen ion density. In spite of the limited scope of the OSSEs presented in this study, the coupled IT DA approaches implemented in DART/TIEGCM can help maximize the impact of multi-parameter observations, such as those from GDC.
Our OSSEs show that updating the system using thermospheric or ionospheric measurements can improve the thermospheric and/or ionospheric weather nowcasts and forecasts, while assimilating both thermospheric and ionospheric measurements yields the most significant improvement. This indicates the importance of multi-parameter IT observations for IT weather nowcasts, forecasts, and reanalysis in DART/TIEGCM. Furthermore, as discussed in Section 5, a few concrete steps can be taken to mature the GDC OSSE study with DART/TIEGCM and broaden community participation in the GDC mission development. In spite of the well-recognized benefits of OSSEs, the lack of accessible and easy-to-use OSSE tools hinders scientists in the geospace community from using OSSEs for research purposes. The availability of DART/TIEGCM as a community IT OSSE tool, together with additional future user support and training made available through the CCMC, can change this reality.

Figure 1. Schematic illustration of the ensemble data assimilation forecast-analysis cycles. The green line indicates the nature run (true) state trajectory, the gray dot indicates synthetic observational data, and the orange lines indicate trajectories of the ensemble model states. In the first forecast step, the ensemble model states advance from t0 to t1. At t1, the synthetic data are assimilated into the model ensemble to bring the model state closer to the true state. The updated ensemble model states are then used as the new initial condition to advance the model ensemble from t1 to t2 in the second forecast step.

Figure 2. Flowchart of the DART/TIEGCM workflow, from Step 1 (green circle), wherein a nature run is executed, through Step 2 (black box), wherein synthetic observational data are generated, to Step 3 (red box), wherein synthetic observations are assimilated. The blue quadrilateral indicates DART tools for generating synthetic data and performing ensemble data assimilation.

Figure 3. Interplanetary Magnetic Field Bx, By, Bz, solar wind velocity, and F10.7 index from DOY 62 to 77 of 2013. DOY 76 is 17 March. The blue lines are real values of these solar wind and F10.7 parameters used to drive the TIEGCM nature run simulation, the gray lines are perturbed parameter values used to initialize the 90-member TIEGCM ensemble simulation, and the red lines are the ensemble mean values. The red line of the F10.7 index is higher than the blue line, indicating that the nature run is biased to be lower than the model ensemble.

Figure 4. The longitude-latitude distribution of atomic oxygen ion density at pressure level 19, which corresponds to about 350 km altitude, at 01 UT on 17 March 2013 from OSSE5. (a) True distribution from the nature run overlaid with Geospace Dynamics Constellation observation locations indicated by white dots. (b) Mean distribution from the control ensemble simulation with no Data Assimilation (DA). (c) Differences between the OSSE5 ensemble mean and the truth before DA. (d) Differences between the OSSE5 ensemble mean and the truth after DA.

Figure 5. Similar to Figure 4, but for neutral temperature.

Figure 6. Prior (blue) and posterior (red) root-mean-square error (RMSE) of neutral temperature from OSSE5, and RMSE of neutral temperature of the control experiment (yellow) on 17 March 2013.

Figure 7. Prior RMSEs of neutral temperature from OSSE1-5 and from the control experiment on 17 March 2013.

Figure 8. Similar to Figure 7, but for atomic oxygen ion density.

Table 1
Synthetic Observations Utilized in the Observing System Simulation Experiments model of the auroral particle precipitation

Table 2
Synthetic Observations Utilized in the Observing System Simulation Experiments

Table 3
Configuration of DART State Vector, x, and Observation Sequence Vector, y, and Their Respective Sizes, S and N, for Each Observing System Simulation Experiment (OSSE)
Experiment | x | S | y | N
[earlier rows not recoverable] | ... +, x_Ti, x_Te | 225,504 | y_O+, y_Ti, y_Te | 354
OSSE5 | x_Tn, x_Un, x_V, x_rO, x_rO2, x_O+, x_Ti, x_Te | 375,840 | y_Tn, y_Un, y_Vn, y_O, y_O2, y_N2, y_O+, y_Ti, y_Te | 1,062

The process of generating synthetic observations should, in the future, incorporate instrument simulators that mimic realistic error characteristics of measurements according to a more realistic space-time sampling of the GDC mission. The TIEGCM nature run simulation, from which synthetic observations are sampled, is performed with the same TIEGCM model used in the ensemble DA. In reality, models tend to be biased relative to observations due to inadequate specifications of external forcing and other model parameters as well as missing physics. Therefore, the nature run simulation should be conducted carefully (e.g., European Centre for Medium-Range Weather Forecasts & National Centers for Environmental Prediction/National Weather Service/NOAA/U.S. Department of Commerce, 2007). Nature runs with a different model setting, such as high-resolution TIEGCM, or with different models, such as the Whole Atmosphere Community Climate Model with thermosphere and ionosphere extension (WACCM-X), can provide a more measured assessment of the DA impact.
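The simplified synthetic-observation assumptions used in this study (independent, normally distributed errors and a fixed 30 s sampling interval) can be sketched as follows. The function names, the placeholder along-track temperatures, and the standard deviation of 10 K are illustrative assumptions; the values in Table 2 are not reproduced here, and a realistic instrument simulator would replace the noise model.

```python
import random

def sample_times(start_s, end_s, cadence_s=30):
    """Observation times (s) at a fixed cadence, end point excluded."""
    return list(range(start_s, end_s, cadence_s))

def synthesize_observations(truth_along_track, sigma, seed=0):
    """Add independent zero-mean Gaussian noise to sampled nature-run values."""
    rng = random.Random(seed)
    return [value + rng.gauss(0.0, sigma) for value in truth_along_track]

times = sample_times(0, 300)  # ten samples over five minutes
# Placeholder nature-run neutral temperatures (K) sampled along the orbit track.
truth_tn = [900.0 + 0.1 * t for t in times]
# sigma is a placeholder, not a Table 2 value.
obs_tn = synthesize_observations(truth_tn, sigma=10.0)

for t, truth_value, obs_value in zip(times, truth_tn, obs_tn):
    print(t, truth_value, round(obs_value, 1))
```

Fixing the seed makes the synthetic data set reproducible, which helps when the same observations must be reused across several OSSE configurations.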