Real-data assimilation experiment with a joint data assimilation system: assimilating carbon dioxide mole fraction measurements from the Greenhouse gases Observing Satellite

Abstract The performance of a joint data assimilation system (Tan-Tracker), which is based on the PODEn4DVar assimilation method, in assimilating Greenhouse gases Observing SATellite (GOSAT) carbon dioxide (CO2) data was evaluated. Atmospheric 3D CO2 concentrations and CO2 surface fluxes (CFs) for 2010 were simulated using a global chemistry transport model (GEOS-Chem). The Tan-Tracker system used the simulated CO2 concentrations and fluxes as a background field and assimilated the GOSAT column-average dry-air mole fraction of CO2 (XCO2) data to optimize CO2 concentrations and CFs in the same assimilation window. Monthly simulated (XCO2,Sim) and assimilated (XCO2,TT) data retrieved at different satellite scan positions were compared with GOSAT-observed (XCO2,Obs) data. The average RMSE between the monthly XCO2,TT and XCO2,Obs data was significantly (30%) lower than the average RMSE between XCO2,Sim and XCO2,Obs. Specifically, reductions in error were found over northern Africa (the Sahara), the Indian peninsula, southern Africa, southern North America, and western Australia. The correlation coefficients between XCO2,TT and XCO2,Obs differed only slightly from those between XCO2,Sim and XCO2,Obs. In general, the Tan-Tracker system performed very well after assimilating the GOSAT data.


Introduction
Atmospheric carbon dioxide (CO2) is one of the most important greenhouse gases (GHGs), and its increase since pre-industrial times has directly led to global warming (IPCC 2013). The increase in atmospheric CO2 is mainly caused by anthropogenic emissions. Moreover, as a long-lived GHG, atmospheric CO2 can only be partially removed by natural processes, which in turn also contributes to atmospheric CO2 levels. Natural processes, such as the carbon cycle, are sophisticated interactions forming a complicated feedback loop. The key to understanding these interactions is to determine the spatiotemporal distribution of atmospheric CO2 concentrations and land CO2 fluxes (CFs) (Protocol 1997).
There are several methods used to study the spatial and temporal distribution of CO2 concentrations. Of these, surface observation is one of the most important. In the 1980s, the WMO's Global Atmosphere Watch program was established to monitor the long-lived GHGs that are directly affected by anthropogenic activities. However, there are currently fewer than 200 monitoring stations around the world, and the records are on average shorter than 30 years. Due to these limitations, it is difficult to analyze spatiotemporal distributions of CO2 using only surface observations (Reuter et al. 2011). Recently, satellite data have become more widely used by researchers due to the increased spatial and temporal resolution of satellite observations from the Greenhouse gases Observing SATellite (GOSAT) and the Orbiting Carbon Observatory 2 (Yokota et al. 2009; Nassar et al. 2010; Cogan et al. 2012). Previous studies have used global or regional Chemistry Transport Models (CTMs) to obtain continuous simulated spatiotemporal features of global atmospheric CO2 concentrations and CFs (Cogan et al. 2012; Chen, Zhu, and Zeng 2013). However, there is still an essential and urgent need to improve model simulation efficiency and accuracy.
Carbon cycle data assimilation systems are promising new tools for precisely simulating atmospheric CO2 concentrations and CFs. These systems tend to yield estimates of CO2 surface flux by combining information from both CTM simulations and atmospheric CO2 observations (Peters et al. 2005; Tian et al. 2014). Peters et al. (2005) developed the Carbon-Tracker data assimilation system, which was coupled to the Tracer Model, version 5, using an ensemble square root filter to assimilate surface CO2 concentration observations. The CF inversion results from the Carbon-Tracker system were consistent with the majority of carbon inventories reported by the first North American State of the Carbon Cycle Report (Peters et al. 2005). Zhang et al. (2014) assimilated CONTRAIL data with the Carbon-Tracker system and dramatically improved the accuracy of simulated concentrations while reducing the uncertainty of CFs in Asia by 20%. However, the accuracy of this simulation was largely dependent on the algorithm, the dynamical model, and surface observations. Due to the limitations of the algorithm, the lack of a suitable dynamical model, and insufficient surface observations, constructing an efficient and accurate assimilation system remains a challenging task. Tian et al. (2014) reported a new CF data assimilation system (Tan-Tracker) that was developed by incorporating a joint PODEn4DVar (Tian, Xie, and Sun 2011) assimilation framework into the GEOS-Chem model (V9-01-03; Suntharalingam et al. 2004; Nassar et al. 2010). In their study, an identity operator was chosen as the CF dynamical model to describe CF evolution, and this CF dynamical model was then combined with the GEOS-Chem atmospheric transport model to create an augmented dynamical model. Therefore, the large-scale state vector of both CFs and CO2 concentrations is set as the prognostic variable, which is simultaneously constrained by assimilation of atmospheric CO2 concentration observations. Using this approach, the Tan-Tracker data assimilation system is expected to be powerful and efficient.
In this study, we assimilated column-average dry-air mole fraction of CO2 (XCO2) GOSAT observations from January 2010 to October 2010 into the GEOS-Chem model using the Tan-Tracker system, and compared the results with GOSAT observations. Details of the Tan-Tracker data assimilation system are presented next, in Section 2, followed by a description of the experiment and results in Section 3. Section 4 summarizes the conclusions that can be drawn from this study.

Tan-Tracker data assimilation system
The Tan-Tracker data assimilation system is a joint assimilation system capable of optimizing model states and parameters simultaneously from noisy measurements through the PODEn4DVar approach developed by Tian et al. (2014).
We initialized Tan-Tracker by running GEOS-Chem twice to generate the assimilation inputs. The background run started from the previous best analysis, $C_a$, and was forced by the prior CFs, $F_b^a$, to determine the background CO2 concentration fields, $C_b$. This was used to prepare the rth background joint vector $(\lambda_b, C_b)^T$, where $\lambda_b$ is the scaling factor between the prior CFs and the updated CFs. The sampling run followed the background run but with a different run length, and adopted a 4D moving sampling strategy (Wang et al. 2010) to produce the joint vector ensemble $(\lambda_m, C_m)^T$. This is explained in detail in Tian et al. (2014).
The CF dynamical sub-model, together with the CTM (GEOS-Chem), forms the Tan-Tracker dynamical model. Here, the flux persistence-forecasting model, $M_{CF} = I$ (where $I$ is the identity matrix), is chosen as the CF evolution dynamical sub-model (based on Peters et al. 2005).
The Tan-Tracker system assimilates XCO2 directly using the following observation operator (Equation (2)), which links the observational variable XCO2 to the GEOS-Chem simulated 3D CO2 concentrations (Feng et al. 2009; Tian et al. 2014):
$$X_{\mathrm{CO_2}} = X_{\mathrm{CO_2},a} + h^{T} A \left(C_m - C_a\right), \tag{2}$$
where $h$ is a pressure weighting function, $A$ is the full averaging kernel matrix, $C_a$ is the prior CO2 profile, $X_{\mathrm{CO_2},a}$ is the associated column mole fraction, and $C_m$ is the model-calculated CO2 profile. We generated ensemble simulated observations, $C_m^o$, and background simulated observations, $C_b^o$, by applying Equation (2) to the ensemble CO2 concentration, $C_m$, and background CO2 concentration, $C_b$, respectively.
At this point, the background joint vector $(\lambda_b, C_b)^T$, the joint vector ensemble $(\lambda_m, C_m)^T$, the background simulated observations ($C_b^o$), the ensemble simulated observations ($C_m^o$), and the real CO2 concentrations ($C_{obs}$) have been determined and can be input into the PODEn4DVar assimilation processor to yield the best analysis joint vector $(\lambda_a, C_a)^T$ and optimized CFs ($F_a = \lambda_a F^*$). The best analysis is the initial condition for the next assimilation cycle.
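As a rough illustration of the observation operator in Equation (2), the following Python sketch applies it to a single sounding. It assumes that the model profile has already been mapped onto the retrieval levels. The function name `simulate_xco2` and all numerical values are hypothetical and chosen only for illustration; they are not taken from Tan-Tracker or from the ACOS product.

```python
import numpy as np


def simulate_xco2(xco2_prior, h, A, c_model, c_prior):
    """Apply Equation (2): XCO2 = XCO2,a + h^T A (C_m - C_a).

    xco2_prior : prior column mole fraction XCO2,a (ppm)
    h          : pressure weighting function, shape (n_levels,)
    A          : full averaging kernel matrix, shape (n_levels, n_levels)
    c_model    : model CO2 profile C_m on the retrieval levels (ppm)
    c_prior    : prior CO2 profile C_a on the retrieval levels (ppm)
    """
    h = np.asarray(h, dtype=float)
    A = np.asarray(A, dtype=float)
    diff = np.asarray(c_model, dtype=float) - np.asarray(c_prior, dtype=float)
    return float(xco2_prior + h @ A @ diff)


# Hypothetical 4-level example (illustrative numbers only).
h = np.array([0.25, 0.25, 0.25, 0.25])            # weights sum to 1
A = np.eye(4)                                     # idealized averaging kernel
c_prior = np.array([390.0, 389.0, 388.0, 387.0])  # prior profile C_a
c_model = c_prior + 2.0                           # model is 2 ppm above the prior
xco2_prior = float(h @ c_prior)                   # prior column consistent with C_a

xco2_model = simulate_xco2(xco2_prior, h, A, c_model, c_prior)
```

With an identity averaging kernel, the operator reduces to a pressure-weighted column mean of the model profile. Under the same assumptions, calling the function once per ensemble member and once for the background profile would give the ensemble and background simulated observations.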

Real-data assimilation experiment with retrieved satellite data (XCO2)
The efficiency and accuracy of the Tan-Tracker data assimilation system were evaluated using a real-data assimilation experiment applied to data from 2010. The setup of the experiment and the results are described below.

Experimental setup
The global 3D chemical transport model GEOS-Chem (version 9-01-03) was used to simulate the atmospheric CO2 concentration. This version of GEOS-Chem has a horizontal resolution of 2° × 2.5° (latitude × longitude) and 47 hybrid eta levels up to 0.01 hPa, and is driven by GEOS-5 meteorological fields. These are assimilated meteorological data from the Goddard Earth Observing System (GEOS) of the NASA Global Modeling and Assimilation Office.
The original GEOS-Chem CO2 simulation was designed by Suntharalingam et al. (2004) and updated by Nassar et al. (2010). The prescribed CO2 fluxes used in the model include monthly fossil fuel burning and cement production CO2 emissions, monthly biomass burning, climatological biofuel burning, monthly ocean exchange, three-hourly biospheric fluxes, annual climatological terrestrial biosphere exchange, chemical production of CO2 from the atmospheric oxidation of other carbon species, and monthly CO2 emissions from shipping and aviation. A detailed description of the basic model input data can be found in Tian et al. (2014).
The experiment began with a two-year spin-up from 1 January 2008, with a globally uniform 3D CO2 concentration field of 383.76 ppm. This value was used because the annual mean CO2 concentration in 2007 was 383.76 ppm at Mauna Loa, which is the marine surface site of the NOAA-ESRL (http://www.esrl.noaa.gov/gmd/obop/mlo/). This two-year spin-up is a key step in maintaining mass and energy balance, while also allowing model transport, sources, and sinks to develop global spatial patterns. After the spin-up, the acquired CO2 concentration field was used to drive the real-data assimilation experiment.
The space-borne observations used in our experiment were retrieved from GOSAT, which was launched in 2009. We chose the GOSAT Level 2 data, that is, the XCO2, from the Atmospheric CO2 Observations from Space (ACOS) data product, version 3.3. We applied the recommended data screening criteria and bias correction technique based on the 'ACOS level 2 Standard Product Data User's Guide' (http://disc.sci.gsfc.nasa.gov/acdisc/documentation/ACOS_v3.3_DataUsersGuide.pdf).
To guarantee high-quality assimilated data, we only retained XCO2 data with standard errors of less than 1.0 ppm. We first performed CO2 simulations from 1 January 2010 to 31 October 2010 without assimilation (referred to as 'Sim'), and then performed the CO2 simulation with the Tan-Tracker assimilation system (referred to as 'TT') from 1 January 2010 to 31 October 2010. Due to the five-week lag window within the assimilation window, 13 months' worth of data from November 2009 to November 2010 were used to ensure continuity.
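The 1.0 ppm screening step can be sketched as a simple mask over the soundings. The function and array names below are hypothetical, the values are illustrative only, and the additional screening criteria and bias correction from the ACOS user's guide are not reproduced here.

```python
import numpy as np

MAX_XCO2_ERROR_PPM = 1.0  # threshold stated in the text


def screen_soundings(xco2, xco2_error, max_error=MAX_XCO2_ERROR_PPM):
    """Keep only soundings whose reported error is below the threshold."""
    xco2 = np.asarray(xco2, dtype=float)
    xco2_error = np.asarray(xco2_error, dtype=float)
    keep = xco2_error < max_error  # strict "less than", as in the text
    return xco2[keep], keep


# Hypothetical soundings (ppm); values are illustrative only.
xco2 = np.array([388.2, 390.1, 386.7, 389.4])
xco2_error = np.array([0.6, 1.3, 1.0, 0.9])

kept, mask = screen_soundings(xco2, xco2_error)
```

Returning the boolean mask alongside the retained values makes it possible to apply the same selection to the matching coordinates, averaging kernels, and prior profiles.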

Results and discussion
The global distribution of GOSAT-retrieved satellite data, XCO2, after quality control, demonstrated obvious spatiotemporal distribution patterns, such as lower concentrations of CO2 at high latitudes in contrast with higher concentrations in equatorial regions. High concentrations of CO2 (approximately 388-391 ppm) were found in the NH, while CO2 concentrations in the SH were lower (approximately 385-388 ppm). The temporal distribution of CO2 concentrations in the NH clearly followed a seasonal cycle, increasing with time and reaching a peak concentration of 388 ppm at the end of April or beginning of May prior to the maturation period that began in June, and then decreasing with time to approximately 382 ppm in late autumn before rising again.
To examine the performance of Tan-Tracker, the XCO2 derived from Sim (referred to as XCO2,Sim) and TT (referred to as XCO2,TT) was compared with GOSAT observations. The CO2 concentrations obtained from Sim and TT were converted into XCO2 by first calculating CO2 profiles on the same levels as the GOSAT data profiles, and then calculating XCO2 using Equation (2). The distributions of XCO2,Sim and XCO2,TT at GOSAT satellite geographic coordinates are illustrated in Figures 1 and 2, respectively, which display data from January, April, July, and October 2010. Both distributions had spatiotemporal patterns that were comparable to the observed data. As shown in Figure 3, large deviations between the observations, XCO2,Obs, and the simulations, XCO2,Sim, within the range of 2-6 ppm, can be seen in northern Africa (Sahara), the Indian peninsula, southern Africa, southern North America, and western Australia. The high levels of uncertainty at these places were predominantly caused by the deficiency of observations and insufficient knowledge of biophysical and physical processes. The biases between GEOS-Chem and GOSAT found in this study are similar to those reported by Cogan et al. (2012). Relative to the differences shown in Figure 3, the biases between XCO2,Obs and XCO2,TT decrease dramatically to about 1 to 4 ppm, especially in Africa (Sahara) and the Indian peninsula (Figure 4), indicating that assimilating GOSAT observations can effectively reduce the spatial and temporal discrepancies between the model and observations. The RMSE and correlation coefficients (CCs) between monthly time series of XCO2,model (XCO2,Sim and XCO2,TT) and XCO2,Obs are shown in Figure 5. The RMSE between XCO2,Sim and XCO2,Obs ranged from 1.75 to 3.0 ppm and the CC ranged from 0.25 (November) to 0.85 (April), both having an annual cycle. While the CC increased only slightly, the RMSE between XCO2,TT and XCO2,Obs decreased sharply, by 20% to 40%, to about 1.25 to 1.75 ppm. These results illustrate that assimilating GOSAT satellite observations through the Tan-Tracker system tangibly improves the performance of model simulations by reducing the spatial error and RMSE and improving the CC, yielding more accurate 3D CO2 concentrations and CFs.
Figure 1. Monthly distributions of GEOS-Chem simulated XCO2 data (XCO2,Sim) at GOSAT satellite geographic coordinates over January, April, July, and October 2010.
Figure 3. Monthly distributions of the difference between retrieved GOSAT satellite XCO2 data (observation; XCO2,Obs) and the GEOS-Chem simulated XCO2 data (Sim; XCO2,Sim), calculated as XCO2,Obs − XCO2,Sim, at GOSAT satellite geographic coordinates over January, April, July, and October 2010.
Figure 4. Monthly distributions of the difference between retrieved GOSAT satellite XCO2 data (observation; XCO2,Obs) and the Tan-Tracker assimilated XCO2 data (TT; XCO2,TT), calculated as XCO2,Obs − XCO2,TT, at GOSAT satellite geographic coordinates over January, April, July, and October 2010.
Figure 5. Taylor diagram of Tan-Tracker assimilated XCO2 data (XCO2,TT; red circles) and GEOS-Chem simulated data (XCO2,Sim; blue points) compared with the observational data (XCO2,Obs) from January 2010 to December 2010. The y-coordinates are the RMSE and the polar coordinates denote the correlation (CC).

Conclusions
This study assessed a Chinese carbon cycle data assimilation system (Tan-Tracker) by evaluating its performance during a real-data assimilation experiment. The Tan-Tracker system was initially developed based on an advanced hybrid assimilation approach (PODEn4DVar), as part of the preparation for the launch of the Chinese CO2 observation satellite, TanSat (Liu et al. 2012; Tian et al. 2014). The spatiotemporal distributions of the simulated data, XCO2,model, with and without assimilation (XCO2,TT and XCO2,Sim, respectively), were compared with GOSAT XCO2 data (XCO2,Obs) at each satellite scan position. The results demonstrate that assimilation markedly reduces the RMSE and slightly improves the CC. Overall, our real-data assimilation experiment demonstrated that the Tan-Tracker system performs well when assimilating GOSAT data.
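For reference, the RMSE and correlation coefficient (CC) statistics used in the evaluation can be computed along the following lines. This is a minimal sketch assuming that the model and observed XCO2 values are already collocated at the GOSAT scan positions. The function names and the numbers are hypothetical and are not results from this study.

```python
import numpy as np


def rmse(model, obs):
    """Root-mean-square error between collocated model and observed values."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((model - obs) ** 2)))


def correlation(model, obs):
    """Pearson correlation coefficient (CC) between model and observations."""
    return float(np.corrcoef(model, obs)[0, 1])


def relative_reduction(rmse_before, rmse_after):
    """Fractional RMSE reduction achieved by assimilation."""
    return (rmse_before - rmse_after) / rmse_before


# Hypothetical collocated XCO2 values (ppm); illustrative only.
obs = np.array([388.0, 389.0, 390.0, 387.0])
sim = np.array([390.0, 391.0, 388.0, 389.0])  # without assimilation
tt = np.array([389.0, 390.0, 389.0, 388.0])   # with assimilation

rmse_sim = rmse(sim, obs)
rmse_tt = rmse(tt, obs)
reduction = relative_reduction(rmse_sim, rmse_tt)
cc_tt = correlation(tt, obs)
```

In this toy case the RMSE halves while the CC stays moderate, which shows that the two metrics can respond quite differently to the same change: RMSE is sensitive to the size of the mismatch, whereas CC only measures how well the variations line up.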