Solar Wind Data Assimilation in an Operational Context: Use of Near‐Real‐Time Data and the Forecast Value of an L5 Monitor

For accurate and timely space weather forecasting, advanced knowledge of the ambient solar wind is required, both for its direct impact on the magnetosphere and for accurately forecasting the propagation of coronal mass ejections to Earth. Data assimilation (DA) combines model output and observations to form an optimum estimation of reality. Initial experiments with assimilation of in situ solar wind speed observations suggest the potential for significant improvement in the forecast skill of near‐Earth solar wind conditions. However, these experiments have assimilated science‐quality observations, rather than near‐real‐time (NRT) data that would be available to an operational forecast scheme. Here, we assimilate both NRT and science observations from the Solar Terrestrial Relations Observatory (STEREO) and near‐Earth observations from the Advanced Composition Explorer and Deep Space Climate Observatory spacecraft. We show that solar wind speed forecasts using NRT data are comparable to those based on science‐level data. This suggests that an operational solar wind DA scheme would provide significant forecast improvement, with reduction in the mean absolute error of solar wind speed around 46% over forecasts without DA. With a proposed space weather monitor planned for the L5 Lagrange point, we also quantify the solar wind forecast gain expected from L5 observations alongside existing observations from L1. This is achieved using configurations of the STEREO and L1 spacecraft. There is a 15% improvement for forecast lead times of less than 5 days when observations from L5 are assimilated alongside those from L1, compared to assimilation of L1 observations alone.

• Solar wind data assimilation needs to perform well with near-real-time (NRT) data for it to be used operationally for space weather forecasting
• Despite lower data quality, solar wind speed forecasts based on NRT data are comparable to those based on science-level data
• Assimilation of L1 and L5 data gives forecast error improvement of 15% for lead times up to 5 days over assimilation of only L1 data

Correspondence to: H. Turner, h.turner3@pgr.reading.ac.uk

Citation: Turner, H., Lang, M., Owens, M., Smith, A., Riley, P., Marsh, M., & Gonzi, S. (2023). Solar wind data assimilation in an operational context: Use of near-real-time data and the forecast value of an L5 monitor. Space Weather, 21, e2023SW003457. https://doi.org/10.1029/2023SW003457

storms, is driven by coronal mass ejections (CMEs), which are huge eruptions of coronal material and magnetic field from the Sun (Webb & Howard, 2012). These propagate through the background solar wind, meaning ambient conditions can impact the CME speed and arrival time at Earth (Cargill, 2004; Case et al., 2008; Riley & Ben-Nun, 2021). Although severe space weather causes the largest impacts, mild and moderate space weather also has a considerable economic impact, with estimates of effects on the power grid over the EU and US costing USD1.3-2.1 trillion over a century (Schrijver, 2015). With extreme space weather relying less on the background solar wind conditions, the largest improvements in forecasting are expected for mild to moderate space weather events.
Forecasting near-Earth solar wind conditions can be achieved using simple in situ observation-based methods, such as corotation (e.g., M. J. Owens et al., 2013; Thomas et al., 2018; Turner et al., 2022), or data-driven methods. These approaches generally do not capture transient solar wind structures, such as CMEs, and only estimate the solar wind at a single point in space. Global solar wind conditions can be forecast on the basis of remote solar observations. Photospheric magnetic field observations are used to constrain semi-empirical (e.g., Wang-Sheeley-Arge, Arge et al., 2003) and more physics-based (e.g., Magnetohydrodynamics Around a Sphere, Linker et al., 1999) models of the corona. The solar wind conditions at the top of the corona can then be propagated to Earth (and beyond) using solar wind models. This is typically achieved with numerical magnetohydrodynamic (MHD) models (e.g., Merkin et al., 2016; Odstrcil, 2003; Riley et al., 2001; Tóth et al., 2005), though reduced-physics approximations can provide a complementary, computationally efficient approach (Heliospheric Upwind eXtrapolation (HUX), Riley and Lionello (2011); M. J. Owens and Riley (2017), and HUXt, M. Owens (2020)). CME-like disturbances can be introduced at the lower boundary of the solar wind model based on the CME characteristics observed in coronagraph observations (Odstrcil et al., 2004; Zhao et al., 2002). Once ambient and CME inner boundary conditions are supplied to the solar wind models, there are no further observational constraints on the model evolution.
Data assimilation (DA) combines model output and observations to form an optimum estimation of reality. It has led to huge improvements in terrestrial weather forecasting (Migliorini & Candy, 2019), but has not yet been fully utilized for solar wind forecasting. The Burger Radius Variational Data Assimilation (BRaVDA) scheme (Lang & Owens, 2019) makes use of in situ observations from spacecraft both in near-Earth space and at other locations within the heliosphere. It has been shown to significantly improve the model representation of the ambient solar wind, which is expected to translate to similar forecast gains (Lang et al., 2021). However, all experiments using BRaVDA so far have been carried out using "science-level" data, which has been processed on the ground and is often not made available for weeks or months after the observation date. For solar wind DA to be used operationally to produce timely space weather forecasts, it must be able to perform well with near-real-time (NRT) data. NRT data often includes erroneous values, data gaps, and sometimes systematic biases, many of which are corrected in the subsequent data processing stage. Figure 1 shows 1 month of NRT and science-level solar wind speed data from 2012/04/01 to 2012/05/01 for the Advanced Composition Explorer (ACE, Stone et al., 1998) and the Solar Terrestrial Relations Observatory (STEREO, Kaiser et al., 2008) Ahead (STEREO-A) and Behind (STEREO-B) spacecraft. Similarly, Figure 2 shows 1 month of data from the Deep Space Climate Observatory (DSCOVR, Burt & Smith, 2012) spacecraft from 2017/07/01 to 2017/08/01. Numerous features show the differing quality of the NRT and science-level data: for example, the step changes in the ACE NRT data, increased noise in the STEREO-B NRT data, and large spikes and data gaps in the DSCOVR NRT data (Smith et al., 2022).
In this study, we assess the performance of the BRaVDA scheme using archived NRT data for three time periods: 2009/08/01 to 2011/02/01, 2012/04/01 to 2013/10/01, and 2017/07/01 to 2019/01/01. The first interval covers the 18 months up to the effective boundary between solar minimum and solar maximum, whereas the second interval is during solar maximum. These two were selected for their solar cycle location, whereas the final interval was an arbitrary 18-month period once the DSCOVR spacecraft was operational.
Future deployment of an operational DA scheme would aim to exploit observations from Vigil (Luntama et al., 2020), a planned space weather monitoring mission at the L5 Lagrange point, approximately 60° behind Earth in heliospheric longitude. Alongside data from a monitor at L1, for example, DSCOVR, this could form a framework for solar wind speed forecasting using DA. Using configurations of observations from STEREO and from near-Earth, we can approximate the future pairing of L5 and L1 monitors. Here, we test the performance of BRaVDA using NRT and science-level observations from spacecraft that are separated by approximately 60° in longitude to simulate an operational L5 solar wind monitor. We can then assess what forecast advantage we can expect from a future mission pairing.

The data used in this work are described in Section 2 and the methods in Section 3. The results and discussion are in Section 4 and the conclusions in Section 5.

Data
All data (NRT and science-level) are averaged to an hourly resolution using a boxcar technique with no minimum requirement for the number of data points. This is a good approximation for solar wind speed due to its high autocorrelation (Lockwood et al., 2019).

STEREO Data
The STEREO mission was designed to provide a unique viewpoint of ejecta from the Sun and comprises two spacecraft: STEREO ahead (STEREO-A) and STEREO behind (STEREO-B) (Kaiser et al., 2008). The NRT (beacon) data are available from https://stereo-ssc.nascom.nasa.gov/data/beacon/ and science-level data from https://cdaweb.gsfc.nasa.gov/. Solar wind speed is measured using the Plasma and Suprathermal Ion Composition instrument, which provides in situ solar wind and ion observations (Galvin et al., 2008). The science data is level 2 processed data. The beacon data is provided in a continuous broadcast mode, at 1-min resolution. For use in BRaVDA, this must be lightly processed so that any unphysical values are removed and the data is on the correct time step. As the input data used in BRaVDA is at an hourly cadence, the NRT data is averaged accordingly. This essentially interpolates over any data gaps that are less than an hour long; if there is a single 1-min value in an hour interval, it is taken as representative of that hour. Although this technique would not be suitable for other parameters, such as magnetic field direction, it is expected to be an adequate solution for solar wind speed, which has a long autocorrelation time (Lockwood et al., 2019). The NRT data has a typical latency of less than 10 min (Biesecker et al., 2008), which would not cause issues for operational use, as the DA makes use of hourly averages.
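The light processing described above can be sketched as follows. This is a minimal illustration rather than the processing code used for BRaVDA: the function name `hourly_boxcar` and the plausibility limits `V_MIN` and `V_MAX` are assumptions introduced here, and the input values are invented for the example.

```python
from collections import defaultdict

# Hypothetical plausibility limits for solar wind speed (km/s).
# These thresholds are assumptions for illustration, not values from the paper.
V_MIN, V_MAX = 100.0, 3000.0


def hourly_boxcar(minutes, speeds, v_min=V_MIN, v_max=V_MAX):
    """Average 1-min speed samples into hourly bins.

    minutes: minutes elapsed since an arbitrary start time, one per sample
    speeds:  speed in km/s, or None for a missing sample
    Returns {hour_index: mean speed}; hours with no valid sample are absent.
    """
    bins = defaultdict(list)
    for minute, v in zip(minutes, speeds):
        if v is None or not (v_min <= v <= v_max):
            continue  # drop data gaps and unphysical values
        bins[minute // 60].append(v)
    # No minimum count is required: one valid sample represents the whole hour.
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(bins.items())}


# Invented example: hour 0 has two valid samples and one fill value,
# hour 1 has a single valid sample, hour 2 has no valid samples.
minutes = [0, 1, 2, 61, 125, 126]
speeds = [400.0, 410.0, -9999.0, 450.0, None, 99999.0]
hourly = hourly_boxcar(minutes, speeds)
print(hourly)
```

Hours that contain no valid sample are simply left out, so a downstream step can treat them as gaps.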
The bottom two panels of Figure 1 show an example of 1 month of data from STEREO-A and STEREO-B. The middle panel shows the STEREO-A data, with NRT in red and science data in black, and in general there is a very good agreement between the two time series. However, the STEREO-B NRT data in the bottom panel shows much greater variability in time compared to the science data. The data plotted is at an averaged hourly resolution, meaning that a large amount of noise must have already been filtered out through this averaging.
The greater variability is also demonstrated in Figure 3, with the STEREO data in the bottom two rows. Here we have 2D histograms of NRT against science observations, with the color representing the density of observations on a log scale. The three time intervals used in this study are shown: 2009/08/01 to 2011/02/01, 2012/04/01 to 2013/10/01, and 2017/07/01 to 2019/01/01, the choice of which is described in Section 4. The STEREO-A NRT data showed periods of low solar wind speed, as shown in the left-hand panel of the middle row in Figure 3. These data are from October 2009 to January 2010, as shown in more detail in Figure 4. There is a gradual worsening of the relationship between the NRT and science-level observations, before this is resolved and the relationship returns to lie approximately along y = x. Although the cause of this is unknown, it provides a useful test for the DA to see how data quality affects the resulting forecasts. The later two time periods show a good relationship between NRT and science data.
The greater variability in the STEREO-B NRT data shown in Figure 1 can also be seen in the greater spread about the y = x line in the bottom row of Figure 3. For the intervals shown, the average standard deviation of the difference between the science and NRT observations is 29.1 km s⁻¹, compared to 13.0 and 23.3 km s⁻¹ for ACE and STEREO-A, respectively. This is due to a known issue with the detector and is present for the whole operational lifetime of STEREO-B. This issue is resolved in the processing of the data on the ground that produces the science-level data.

ACE Data
ACE was launched in August 1997, with the mission aiming to investigate the composition of solar wind plasma at the L1 Lagrange point. The spacecraft carries a suite of instruments, including the Solar Wind Electron, Proton and Alpha Monitor (SWEPAM) and the Real Time Solar Wind monitoring system (RTSW) (Stone et al., 1998). SWEPAM characterizes the bulk flow of the solar wind through measurement of electron and ion distribution functions in three dimensions (McComas et al., 1998). This is then available as 1-hr science level 2 data through CDAWeb at https://cdaweb.gsfc.nasa.gov/. The RTSW experiment also continually transmits a feed of NRT data that can provide a warning of solar wind conditions to arrive at Earth up to 1 hr later (Stone et al., 1998). This data is available from NASA's Community Coordinated Modeling Centre at https://ccmc.gsfc.nasa.gov/requests/GetInput/get_ace_K.php.
The NRT and science-level data from ACE agree very well. As the top panel in Figure 1 shows, there are some features where the NRT data is constant and then steps back down to the science data. The cause of this is unknown; however, as Figure 3 shows, the observations mostly lie close to the y = x line and so overall there is good agreement.
The NRT data has a typical latency of less than 5 min, which is not expected to cause any problems for an operational DA scheme.

DSCOVR Data
DSCOVR was launched in February 2015 to the L1 Lagrange point. The mission was launched to succeed ACE and to aid the National Oceanic and Atmospheric Administration in real-time monitoring of space weather. For this study, data from the PlasMag instrument was used, which comprises a magnetometer, a Faraday cup and a top-hat electron electrostatic analyzer. Here, we make use of the observations from the Faraday cup, which measures the solar wind velocity, density and temperature. Both the NRT and science-level (level 2) data are available through the DSCOVR Space Weather Data Portal at https://www.ngdc.noaa.gov/dscovr/portal/index.html#/. As Figure 2 shows, the NRT data shows erroneous spikes in solar wind speed. This is due to periods of very low solar wind density, meaning that the Faraday cup cannot accurately measure the solar wind speed (Loto'aniu et al., 2022). There is mostly a good agreement between the NRT and science data (Figure 5).
Similarly to ACE, the NRT data latency for DSCOVR is not expected to cause any problems for an operational DA scheme.

BRaVDA and Forecast Generation
A complete description of the BRaVDA methodology can be found in Lang and Owens (2019) and the code is available at https://zenodo.org/record/7892408#.ZFJ8o3bMK3A. Here, we provide a brief overview of the scheme. BRaVDA combines in situ solar wind speed observations with the steady-state "HUX" model, based on Riley and Lionello (2011). BRaVDA maps information contained within in situ observations, typically at 1 AU, back to the model's inner boundary at 30 solar radii (R_S), where it is combined with the prior inner boundary condition. This prior is defined using output from the HelioMAS model (Riley et al., 2001) at 30 R_S. These model data are available at https://www.predsci.com/portal/home.php. The information is merged through the minimization of a cost function, which aims to find the optimum compromise between the prior information and the observations, accounting for the uncertainties in both. Once the inner boundary at 30 R_S is updated, this can then be propagated back out to 1 AU (and beyond) through the use of any solar wind model. For efficiency, HUX is used again for this stage. This produces an estimate of the solar wind over the two-dimensional domain from 30 R_S to the outer boundary, which here is set to 245 R_S, to fully include the orbital radii of all spacecraft considered. The 2D plane considered here is the radius/longitude plane, located at the solar equator.
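To illustrate what minimizing such a cost function means, the sketch below uses a deliberately simplified scalar case: a single speed value, Gaussian uncertainties, and observations that measure the state directly. It is not the BRaVDA cost function, which involves the HUX model mapping between the inner boundary and the observation locations; the function names and all numerical values are assumptions chosen for illustration.

```python
def cost(x, x_prior, sigma_prior, obs, sigma_obs):
    """Scalar variational cost: prior misfit plus summed observation misfits,
    each weighted by the inverse of its error variance."""
    j_prior = (x - x_prior) ** 2 / (2.0 * sigma_prior ** 2)
    j_obs = sum((y - x) ** 2 for y in obs) / (2.0 * sigma_obs ** 2)
    return j_prior + j_obs


def analysis(x_prior, sigma_prior, obs, sigma_obs):
    """Value of x that minimizes cost(): for this quadratic toy problem it is
    the precision-weighted mean of the prior and the observations."""
    w_prior = 1.0 / sigma_prior ** 2
    w_obs = 1.0 / sigma_obs ** 2
    return (w_prior * x_prior + w_obs * sum(obs)) / (w_prior + w_obs * len(obs))


# Invented numbers: an uncertain prior of 400 km/s and two more certain
# observations near 500 km/s.
x_a = analysis(x_prior=400.0, sigma_prior=100.0, obs=[500.0, 520.0], sigma_obs=50.0)
print(round(x_a, 1))
```

Because the observations here are assigned a smaller uncertainty than the prior, the result lies much closer to the observations than to the prior, which is the "optimum compromise" behavior described above.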
Note that previous work using BRaVDA (e.g., Lang et al., 2021; Turner et al., 2022) has made the implicit assumption that the observations made from the STEREO spacecraft were taken from 215 R_S (1 AU) and the L1 observations are at 213 R_S. In reality, this is not the case. As shown in Figure 6, Earth varies from 210 to 219 R_S over the year, STEREO-A varies from 206 to 208 R_S and STEREO-B varies from 215 to 234 R_S. These variations are now included in BRaVDA, ensuring that the observations are assimilated at the correct orbital radius. Due to the highly correlated nature of the solar wind, this radial variation did not have a significant impact on the accuracy of the forecasts; however, it is important to be as representative of the system as possible.
Forecasts are generated using the output from BRaVDA in the same way as Turner et al. (2022). (As archived data are used for this work, what we refer to here as forecasts are actually hindcasts. However, as these hindcasts are used to inform the performance we would expect from forecasts, we retain the use of the word "forecast" for simplicity.) In summary, BRaVDA is run on a daily cadence, assimilating observations from the previous 27 days to produce a DA solution. Assuming steady-state conditions, this can be corotated to produce a forecast for the subsequent 27 days. Here, forecasts are produced from assimilation of NRT and science-level observations, and both are verified against the science-level observations to assess their accuracy.

L5 Experiments
Future deployment of an operational solar wind DA scheme could make use of both observations from near-Earth space (e.g., from DSCOVR) and from the planned Vigil mission to L5. To test the performance of such a combination, we can use observations from pairs of spacecraft (STEREO-A, STEREO-B and ACE) that are approximately 60° apart in longitude. By using intervals of time where the spacecraft separation is between 50 and 70°, we produce four "L1-L5" analysis periods. These periods are shown in Table 1 and schematically in Figure 7. The spacecraft lagging with respect to solar rotation acts as the effective L5 monitor and the spacecraft leading with respect to solar rotation is the effective near-Earth, or L1, monitor. We can then assess the forecast performance at the leading spacecraft, as this would represent a forecast at Earth.
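The interval selection can be sketched as a simple filter on longitudinal separation. The example assumes that longitudes are given in degrees in a frame where the leading spacecraft has the larger longitude; the function names and the daily longitudes are hypothetical and are not taken from Table 1.

```python
def separation_deg(lon_lead, lon_lag):
    """Angle in [0, 360) by which the leading spacecraft is ahead of the
    lagging one, handling wrap-around at 360 degrees."""
    return (lon_lead - lon_lag) % 360.0


def in_l5_window(lon_lead, lon_lag, lo=50.0, hi=70.0):
    """True when the pair is separated by 50 to 70 degrees, so that the
    lagging spacecraft can stand in for an L5 monitor."""
    return lo <= separation_deg(lon_lead, lon_lag) <= hi


# Invented daily longitudes (degrees) for an effective L1 (lead) and
# effective L5 (lag) spacecraft.
days = [0, 1, 2, 3]
lon_lead = [10.0, 5.0, 0.0, 355.0]
lon_lag = [330.0, 310.0, 295.0, 280.0]

selected = [d for d, a, b in zip(days, lon_lead, lon_lag) if in_l5_window(a, b)]
print(selected)
```

Consecutive selected days would then be grouped into an analysis period.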

Results and Discussion
Here we conduct a number of experiments to investigate the impact of using NRT data on forecasts produced using DA. The science-level observations act as a verification time series for the forecasts to be compared against. The science-level data is also used to produce corotation forecasts, whereby observations are lagged depending on their longitudinal separation from the forecast location. Throughout, we assess the performance of forecasts using mean absolute error (MAE) as a function of forecast lead time. As a standard metric, MAE allows for easy comparison of the performance of different forecasts. However, caution must be taken with such "point-by-point" metrics, as they can be misleading with forecasts of markedly different quality, typically over-penalizing forecasts with small timing errors and under-penalizing forecasts with very low variance (M. J. Owens et al., 2005). In this study, the difference between NRT and science forecasts is generally expected to be a small quantitative change, rather than leading to a qualitatively different time series. For this reason, MAE is found to generally agree with the assessment gained by visual inspection. However, Section 4.1 highlights a case where MAE is inadequate to characterize the forecast performance in isolation.
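The caveat about point-by-point metrics can be made concrete with a small example. The sketch below computes MAE for two invented forecasts of the same invented observation series: one that reproduces a high-speed stream but one time step late, and one that is completely flat. The series and names are assumptions for illustration only.

```python
def mean_absolute_error(forecast, observed):
    """MAE over time steps where both forecast and observation exist."""
    pairs = [(f, o) for f, o in zip(forecast, observed)
             if f is not None and o is not None]
    if not pairs:
        return None
    return sum(abs(f - o) for f, o in pairs) / len(pairs)


# Invented series (km/s): a single narrow high-speed stream.
observed = [400.0, 400.0, 400.0, 700.0, 400.0, 400.0, 400.0, 400.0]
# Correct structure, arriving one step late.
shifted = [400.0, 400.0, 400.0, 400.0, 700.0, 400.0, 400.0, 400.0]
# No structure at all.
flat = [400.0] * 8

mae_shifted = mean_absolute_error(shifted, observed)
mae_flat = mean_absolute_error(flat, observed)
print(mae_shifted, mae_flat)
```

The shifted forecast is penalized twice (once for the missed stream and once for the false one), so it scores a higher MAE than the flat forecast even though it contains the correct structure.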

Assimilation of Single and Multiple Spacecraft Observations
We first assimilate observations from a single spacecraft. We have observations from four sources: ACE, STEREO-A, STEREO-B and DSCOVR. Each assimilation experiment is used to produce a forecast at Earth (black lines), a forecast at STEREO-A (red lines) and a forecast at STEREO-B where available (blue lines). Forecasts are verified against the science-level observations at the respective location. Here, and throughout the text, where Earth is used as a forecast verification, this is at the L1 point and so is using data from either ACE or DSCOVR, depending on the respective time period. Forecasts produced using science-level data are shown with a solid line and those using NRT data with a dashed line.
As Figure 8 shows, in general there is little difference between the real time and science forecasts produced using ACE and DSCOVR data. This means that assimilating these data in an operational setting would still produce forecasts of a similar skill to forecasts produced with science-level data.
There is a greater difference between forecasts based on NRT and science-level data when assimilating only STEREO data. Due to the issues with the STEREO-A beacon data described in Section 2.1 producing a systematic error in the NRT observations, we see a larger difference between the dashed and solid lines for all fore-

For the 2009-to-2011 and 2012-to-2013 time intervals, the forecasts assimilating ACE and STEREO data show the impact of the age of observations, whereby there is a large increase in forecast error when the forecast lead time exceeds the corotation time between the assimilated spacecraft and the forecast location. This is described in more detail in Turner et al. (2022).

Figure 9 shows the simultaneous assimilation of ACE, STEREO-A and STEREO-B science-level (solid lines) and NRT (dashed lines) data, used for forecasts verified at Earth, STEREO-A and STEREO-B (black, red and blue, respectively). Also included in this plot are the prior forecast, shown as the dotted line, and the L1 corotation forecast using science-level observations verified at Earth, shown as the light gray shaded region. The prior forecast is the forecast produced from previously available information, before the DA is performed. In this case, the prior forecast is the HelioMAS solution from the photospheric magnetic field that is propagated radially outwards.

First, as Figure 9 shows, especially for Earth, it is clear that assimilating either NRT or science-level observations offers a significant improvement in forecast skill over the prior state. Second, using L1 corotation (also known as recurrence or persistence) as a baseline forecast, whereby we lag the observations by 27 days and use them as a forecast, we also find an improvement over all lead times using DA. For the 2009-to-2011 and 2012-to-2013 intervals for Earth, L1 corotation gives MAEs of 68.9 km s⁻¹ and 79.8 km s⁻¹, respectively.
Using DA also offers improvement over a forecast produced from L1 corotation as it reconstructs the whole domain between the Sun and Earth's orbital radius and provides an updated inner boundary condition that can be used in MHD models. This allows for the propagation of CMEs through the improved background solar wind, something which cannot be achieved through a simple corotation forecast. With CMEs being the main driver of severe space weather, this offers the opportunity to improve their forecasted speed and arrival time.
It can be seen that there is no major difference between the NRT and science forecasts for the earlier interval. Particularly for the 2009-2011 interval, it could be expected that the lowest MAE would be seen for forecasts at STEREO-A, due to the other observations being closer in longitude behind the spacecraft (with respect to solar rotation). However, the lowest MAE is seen for forecasts at Earth. The trends for both Earth and STEREO-A are similar, but there is a systematic offset due to different structures being encountered at the spacecraft over a limited time period. For this reason, the difference is likely not meaningful.
For the 2012-to-2013 interval, from a forecast lead time of approximately 10 days, the forecasts produced using NRT observations appear to perform better than those produced with the science-level observations. As demonstrated below, this improvement comes about because the NRT-based forecasts produce a "flatter" solar wind speed time series that does not contain the full variability of the observations. Thus, if timing errors are present in both the science-level and NRT-based forecasts, the science forecast would suffer a greater penalty when assessed by MAE (e.g., Figure 1 of M. J. Owens, 2018). This is demonstrated in Figure 10, where the number of high-speed events in the forecast time series using the science-level observations (black line) is greater than in those using NRT observations (red line) for all lead times. Here, we define a high-speed event as having a solar wind speed greater than 500 km s⁻¹. This encapsulates both CMEs and fast solar wind streams. Both science- and NRT-based forecasts underestimate the number of high-speed events compared with observations, as expected, because high-speed CMEs are not captured by the steady-state DA.
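A minimal sketch of this kind of threshold count is given below. It assumes that every hourly value above 500 km s⁻¹ is counted individually, which may differ from how events are counted for Figure 10; the function name and the three short series are invented for illustration.

```python
HIGH_SPEED_THRESHOLD = 500.0  # km/s, the threshold quoted in the text


def count_high_speed(speeds, threshold=HIGH_SPEED_THRESHOLD):
    """Number of valid samples exceeding the high-speed threshold.
    Counting individual samples (not contiguous events) is an assumption."""
    return sum(1 for v in speeds if v is not None and v > threshold)


# Invented series (km/s): the NRT-based forecast is "flatter" than the
# science-based one, and both are flatter than the observations.
observed = [380.0, 420.0, 560.0, 640.0, 510.0, 430.0]
science_forecast = [390.0, 410.0, 520.0, 590.0, 480.0, 420.0]
nrt_forecast = [400.0, 420.0, 470.0, 510.0, 460.0, 430.0]

counts = {
    "observed": count_high_speed(observed),
    "science": count_high_speed(science_forecast),
    "nrt": count_high_speed(nrt_forecast),
}
print(counts)
```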
The forecast characteristics can be displayed using a Taylor diagram, as shown in Figure 11, which summarizes the forecast MAE and linear correlation coefficient with the verification data, as well as the standard deviation of the forecasts. As forecasts improve, they move closer to the observation location, shown as a black star. It can be seen that the NRT and science forecasts group into two areas of roughly equal distance from the ideal forecast, but with the science forecasts having a standard deviation more representative of the observations. We can also see that there is an evolution of forecast MAE as the lead time increases, with the longer lead times producing forecasts with a higher MAE.
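The three quantities summarized on such a diagram can be computed as in the sketch below, which uses population standard deviations and the Pearson correlation coefficient. The function name and the example series are assumptions; the example is constructed so that the forecast has the correct timing but only half the observed amplitude.

```python
import math


def taylor_stats(forecast, observed):
    """Standard deviations, linear correlation and MAE for paired series."""
    n = len(observed)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed)) / n
    return {
        "std_forecast": std_f,
        "std_observed": std_o,
        "correlation": cov / (std_f * std_o),
        "mae": sum(abs(f - o) for f, o in zip(forecast, observed)) / n,
    }


# Invented series (km/s): same timing, half the amplitude.
observed = [400.0, 500.0, 600.0, 500.0]
forecast = [450.0, 500.0, 550.0, 500.0]
stats = taylor_stats(forecast, observed)
print(stats)
```

In this example the correlation is perfect while the forecast standard deviation is half that of the observations, which is why the diagram reports variability alongside correlation and error.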

L5 Experiments
The future Vigil mission offers a chance for an operational DA scheme to make routine use of simultaneous L5 and L1 data. To test this scenario, we can use combinations of STEREO and ACE data during specific intervals to mimic such a pairing. The forecast at the effective L1 position can then be assessed, as that would be Earth in an operational setting. Four intervals (Table 1) were identified where the spacecraft longitudinal separation was between 50 and 70°, and BRaVDA was run with both NRT and science-level observations. Two sets of experiments were run: assimilating both effective L1 and L5 data, and assimilating the effective L1 data only. This allows the forecast gains from the L5 mission to be assessed. Figure 12 shows the forecast MAE variation with forecast lead time. The prior is shown as the solid black line, the L1-only assimilation in red and the L1 and L5 assimilation in blue. The assimilated science data is shown as solid colored lines and the NRT data as dashed lines. Also shown on this plot are the L1 corotation forecast errors verified at the effective L1 spacecraft, in the gray shaded region. These forecasts are made using the science-level observations. The forecasts produced using DA show similar forecast errors to L1 corotation, except in panel (d), where the error from corotation is similar to that of the prior. The time interval covered in panel (d), 2013/10/25 to 2014/02/09, is at approximately solar maximum, whereas the intervals in panels (a-c) are in solar minimum. This means that there are likely more CMEs observed during this time, which cannot be captured in corotation forecasts and would therefore lead to a larger forecast error.
We also compare the DA forecasts to those produced using corotation from L5. Due to the separation of the spacecraft, it takes approximately 5 days for the solar wind to corotate round from the effective L5 spacecraft to the effective L1 spacecraft. As a consequence, the forecast lead time is approximately 5 days, thus giving an L5 corotation forecast. This forecast produces a lower MAE than L1 corotation, due to the shorter amount of time through which the solar wind can evolve whilst the Sun rotates from observation point to forecast point. The darker gray shaded region in Figure 12 shows the MAE from L5 corotation for each associated time interval. For panel (a), the DA outperforms L5 corotation in both instances. For panels (b) and (d), assimilation of L1 and L5 gives the lowest error, whereas L5 corotation outperforms assimilation of L1 only. In panel (c), L5 corotation gives the lowest MAE. Although DA offers no significant improvement over L5 corotation purely in terms of MAE, its advantages come from the reconstruction of the whole domain and updating of the inner boundary condition. This means that it can be used to inform and improve MHD models and also allows for CMEs to be propagated through an updated background solar wind. This could lead to improved CME arrival and speed predictions.
The NRT and science-level observations have very similar forecast errors, with no major difference between the solid and the dashed lines. There is one exception: assimilating only STEREO-A NRT as the effective L1. This forecast shows a larger MAE, by approximately 10 km s⁻¹, as this interval contains the period of time where there are much lower solar wind speeds in the NRT data when compared with the science-level data, as shown in Figure 4.
In general, it can be seen that the assimilation of both L5 and L1 does not offer a large forecast gain for forecast lead times greater than 4-5 days. However, for lead times of less than 5 days, the MAE from assimilating L1 and L5 is 9.0 ± 1.1 km s⁻¹ lower. This is because the corotation time associated with 60 degrees of separation is 4.5 days. Thus the effective age of observations increases significantly after around 4 days, as discussed in Turner et al. (2022).
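The 4.5-day figure follows from simple proportionality between longitudinal separation and rotation period. The sketch below assumes the 27-day rotation period used elsewhere in the text for corotation; the function name is introduced here for illustration.

```python
ROTATION_PERIOD_DAYS = 27.0  # rotation period assumed for corotation


def corotation_time_days(separation_deg, rotation_days=ROTATION_PERIOD_DAYS):
    """Time for the Sun to rotate through the given longitudinal separation."""
    return separation_deg * rotation_days / 360.0


for sep in (50.0, 60.0, 70.0):
    print(sep, corotation_time_days(sep))
```

Under the same assumption, the 50 to 70° separation window used to select the analysis periods corresponds to corotation times of 3.75 to 5.25 days.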
To further summarize these results, we average the four panels in Figure 12 to give Figure 13, which shows the improvement in the first 5 days of forecast lead time more clearly. Comparing the assimilation of only L1 and of both L1 and L5 against the forecast using the prior information, we can see significant improvements, with a percentage decrease (absolute difference) in MAE, averaged over all lead times, of 42.7 ± 3.3% (44.5 ± 3.5 km s⁻¹) and 46.3 ± 3.3% (48.2 ± 3.4 km s⁻¹), respectively. Over all lead times, inclusion of L5 in the assimilation provides a 6.2 ± 1.7% decrease (3.7 ± 1.0 km s⁻¹) in MAE compared with assimilating only L1. However, in the first 5 days of forecast lead time, there is a 15.1 ± 1.8% (9.0 ± 1.1 km s⁻¹) decrease when including L5 data. This compares to a 4.1 ± 1.6% (2.5 ± 1.0 km s⁻¹) decrease for lead times greater than 5 days.
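As a rough consistency check on these numbers, each quoted pair of percentage and absolute decrease implies a baseline MAE. The sketch below performs that back-of-envelope division. The implied baselines are not values quoted in the text, and because the percentages are averaged over lead times, the division is only approximate; the function name is an assumption.

```python
def implied_baseline(abs_decrease, pct_decrease):
    """Baseline MAE implied by an absolute decrease and a percentage decrease.
    Approximate only: the quoted values are averages over lead times."""
    return abs_decrease / (pct_decrease / 100.0)


# Prior MAE implied by the L1-only and the L1 + L5 improvements (km/s).
prior_from_l1 = implied_baseline(44.5, 42.7)
prior_from_l1_l5 = implied_baseline(48.2, 46.3)

# L1-only MAE implied two ways: from the L5 gain, and from prior minus gain.
l1_only_from_l5_gain = implied_baseline(3.7, 6.2)
l1_only_from_prior = prior_from_l1 - 44.5

print(round(prior_from_l1, 1), round(prior_from_l1_l5, 1))
print(round(l1_only_from_l5_gain, 1), round(l1_only_from_prior, 1))
```

Both routes give a prior MAE of roughly 104 km s⁻¹ and an L1-only MAE of roughly 60 km s⁻¹, so the quoted percentages and absolute differences are mutually consistent to within their stated uncertainties.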
As Figure 13 shows, assimilation of both the science and NRT observations, for both L1 only and L1 and L5, performs better than corotation from L1. Only assimilation of L1 and L5 together performs better than corotation from L5. However, as discussed above, the DA offers improvements over simple corotation because it updates the whole domain and allows the propagation of CMEs through its output.

Figure 14 summarizes the prior and NRT forecast metrics in a Taylor diagram. The forecasts from the prior information are shown in black, assimilation of L1 and L5 NRT data in blue and only L1 NRT in red. Three lead times are shown: 3 days represented with a circle, 10 days with a square and 15 days with a triangle. The observation metrics are shown with a black star. L1 corotation is shown with the cyan star and L5 corotation with the cyan plus.
We can see that assimilating L1 and L5 reduces the variability (standard deviation, blue axis) compared to just L1, so there is not much of an improved forecast for lead times greater than 5 days, despite the lower MAE (purple axis). However, for lead times less than 5 days (the blue circle), despite the correlation and standard deviation remaining similar to the other forecasts, there is a genuine improvement in the MAE when including L5 data.

Conclusions
In this study we have assessed the performance of the BRaVDA scheme with NRT observations from the STEREO, ACE, and DSCOVR missions. Previous work has been based on the pre-processed, science-level data, but for a solar wind DA scheme to be used operationally it must perform well with NRT data. The forecasts using NRT observations were verified against the science observations, as they are assumed to best represent reality.
The NRT STEREO observations were found to be more problematic. In the NRT STEREO-A observations, a period of approximately 3 months at the end of 2009 had anomalously low NRT values compared to the science-level data. This problem gradually worsened over the 3 months before the NRT values returned close to the science-level observations in January 2010. The effect was seen in the comparison between the DA forecasts produced using the NRT and science observations, with the NRT forecasts having an MAE greater by approximately 10 km s⁻¹. This problem does not occur in the later two periods, showing that the quality of the observations needs to be continually assessed so that issues can be addressed in a timely manner. From a straight comparison between NRT and science data, it is not obvious what will cause a problem in the assimilation. It is therefore important to periodically assess the forecast quality by checking previous NRT forecasts against newly made forecasts using science-level data once they are available.
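One way such a periodic check could be organised is sketched below in Python: once science-level data arrive, the MAE of the archived NRT-driven forecast is compared with that of a science-driven reforecast over fixed windows, and windows where the NRT forecast is worse by more than a tolerance are flagged. The function names, the window length, and the 10 km/s tolerance are hypothetical choices for illustration and are not part of BRaVDA.

```python
from statistics import fmean


def mae(forecast, observed):
    """Mean absolute error between two equal-length speed series (km/s)."""
    return fmean(abs(f - o) for f, o in zip(forecast, observed))


def flag_degraded_windows(nrt_forecast, sci_forecast, observed,
                          window, threshold_kms):
    """Return start indices of non-overlapping windows in which the NRT-driven
    forecast MAE exceeds the science-driven forecast MAE by more than
    threshold_kms."""
    if not (len(nrt_forecast) == len(sci_forecast) == len(observed)):
        raise ValueError("all series must have the same length")
    flagged = []
    for start in range(0, len(observed) - window + 1, window):
        part = slice(start, start + window)
        excess = (mae(nrt_forecast[part], observed[part])
                  - mae(sci_forecast[part], observed[part]))
        if excess > threshold_kms:
            flagged.append(start)
    return flagged


# Synthetic series in km/s, invented for illustration only.
observed = [400.0] * 4 + [500.0] * 4
sci_forecast = [o + 20.0 for o in observed]        # MAE of 20 in both windows
nrt_forecast = ([o + 25.0 for o in observed[:4]]   # slightly worse: excess 5
                + [o - 40.0 for o in observed[4:]])  # much worse: excess 20

flagged = flag_degraded_windows(nrt_forecast, sci_forecast, observed,
                                window=4, threshold_kms=10.0)
print("Windows needing inspection start at indices:", flagged)
```

A check on forecast error, rather than on the raw NRT and science observations, matches the point made above that a direct data comparison does not make clear what will cause a problem in the assimilation.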
The STEREO-B NRT observations contain a large amount of noise (i.e., high-frequency variations) at roughly the hour timescale compared to the science-level observations. As a result, the STEREO-B NRT data produce an inferior forecast, in terms of MAE, at the position of STEREO-B itself. At other spacecraft locations, however, there is little difference between NRT and science-level forecasts. The reasons for this difference are not obvious, but it may be due to the specific solar wind conditions during these relatively short intervals. Nevertheless, as the STEREO-B example shows, despite a slight worsening of the forecast error in one instance, the DA copes well with random errors. In the case of STEREO-B, these were large, of the order of 50 km s⁻¹ on an hour timescale. By contrast, the systematic error seen in a few months of the STEREO-A data produces a systematic error in the forecast. This is because the DA framework in general is formulated under the assumption of an unbiased prior and unbiased observations. This issue is well known and accounted for in numerical weather prediction (D. P. Dee, 2006; D. Dee & Uppala, 2008), where bias correction methodologies have been developed. Bias correction is an area of active research, which this study falls within, and this will be improved in future versions. DA can be used to identify and correct biases in input data, whereas corotation cannot.
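The different behaviour of random and systematic observation errors can be illustrated with a deliberately simplified scalar analysis update, written here in Python. This is not the variational scheme used in BRaVDA; it is a textbook one-variable blend of a prior and an observation, and every number in it (truth, prior, error variances, bias) is an assumption chosen for illustration.

```python
import random
from statistics import fmean


def scalar_analysis(prior, obs, prior_var, obs_var):
    """Blend a prior and an observation, weighting by their error variances."""
    gain = prior_var / (prior_var + obs_var)
    return prior + gain * (obs - prior)


# Illustrative values in km/s (assumptions, not taken from the study).
TRUTH = 450.0
PRIOR = 400.0
PRIOR_VAR = 50.0 ** 2
OBS_VAR = 50.0 ** 2
BIAS = -40.0           # constant offset, mimicking anomalously low NRT values
GAIN = PRIOR_VAR / (PRIOR_VAR + OBS_VAR)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Analysis obtained from a perfect observation of the truth.
reference = scalar_analysis(PRIOR, TRUTH, PRIOR_VAR, OBS_VAR)

# Zero-mean noise of 50 km/s: individual analyses scatter, but their mean
# stays close to the reference analysis.
noisy_obs = [TRUTH + rng.gauss(0.0, 50.0) for _ in range(10_000)]
noisy_mean = fmean(scalar_analysis(PRIOR, y, PRIOR_VAR, OBS_VAR)
                   for y in noisy_obs)

# A constant bias does not average out: it shifts the analysis by GAIN * BIAS.
biased = scalar_analysis(PRIOR, TRUTH + BIAS, PRIOR_VAR, OBS_VAR)

print(f"reference analysis:        {reference:.1f}")
print(f"mean analysis, noisy obs:  {noisy_mean:.1f}")
print(f"analysis, biased obs:      {biased:.1f}")
```

In this toy setting the update assumes unbiased inputs, so a constant offset in the observations passes straight into the analysis, scaled by the gain, however many observations are used.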
BRaVDA was also tested with assimilation of multiple spacecraft observations from ACE, STEREO-A and STEREO-B, for both science and NRT data. It was found that assimilation of both science and NRT observations performed better than the prior forecasts (i.e., without DA). Comparing these against a benchmark forecast of L1 corotation, we also see an improvement when using DA. As DA updates the entire domain, rather than producing a single-point forecast as corotation does, its solution can be used to initialize MHD models and allows for the propagation of CMEs through its output. This is not possible using corotation, and thus the DA forecast model framework adds significant value to solar wind forecasting.
Figure 14. Taylor diagram of selected lead times for the prior forecasts (black), L1-only near-real-time (NRT) forecasts (red) and L1 and L5 NRT forecasts (blue). The 3-day lead time is shown with a circle, 10-day with a square and 15-day with a triangle. The cyan star shows the L1 corotation forecast and the cyan plus shows the L5 corotation forecast, both averaged over the four intervals. The observation metrics are shown with a black star. Note that the red circle is overlaid by the red square.
The future mission to the L5 Lagrange point, Vigil, offers the possibility of an operational DA scheme utilising routine NRT data from two vantage points. It is hoped that this will lead to large improvements in solar wind forecasting, but this has not yet been tested from a DA perspective. For this purpose, we used BRaVDA with pairs of the STEREO spacecraft and ACE when they were separated in longitude by between 50° and 70°. The forecast was assessed at the effective L1 spacecraft (i.e., the one 50-70° ahead with respect to solar rotation) to mimic a forecast at Earth. It was found that the NRT observations produce forecasts that are not significantly different from those created with the science-level observations. When the four intervals are averaged together, there is very little difference between the NRT and science forecasts. However, there is a significant improvement when compared to the prior forecast: an average reduction in MAE of 46.3 ± 3.3%, showing that DA could offer large improvements to solar wind speed forecasting.
The assimilation of effective L1 and L5 observations was compared against assimilation of effective L1 only. Both configurations improve on L1 corotation, and assimilating L1 and L5 together gives a forecast error similar to that of L5 corotation at comparable lead times. As stated above, the DA offers value over corotation as it allows for the whole domain to be updated and for the propagation of CMEs. Although including the L5 observations did not provide a large improvement over L1 only for forecast lead times of more than 5 days, it did offer a 15.1 ± 1.8% decrease in forecast MAE for lead times of less than 5 days. This lead time is of great interest for space weather forecasting, and so the future mission to L5 could be a step forward for solar wind forecasting capability, if solar wind DA is used operationally to exploit these observations.
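The significance of the roughly 5-day lead time can be related to simple corotation geometry: under the assumption of a steady solar wind structure that rotates rigidly with the Sun, solar wind seen by a spacecraft trailing Earth in longitude reaches Earth's longitude after a delay proportional to the separation. The Python sketch below computes that delay. The 27.27-day synodic rotation period is a commonly used approximate value and the function name is hypothetical; differences in radial distance and spacecraft motion are ignored.

```python
SYNODIC_ROTATION_DAYS = 27.27  # approximate solar rotation period seen from Earth


def corotation_lag_days(separation_deg, rotation_days=SYNODIC_ROTATION_DAYS):
    """Time for a corotating structure to sweep through separation_deg of
    longitude, assuming steady rigid rotation."""
    if not 0.0 <= separation_deg <= 360.0:
        raise ValueError("separation must be between 0 and 360 degrees")
    return separation_deg / 360.0 * rotation_days


# Separations spanning the 50-70 degree range used for the effective L5 pairs.
lags = {sep: corotation_lag_days(sep) for sep in (50.0, 60.0, 70.0)}

for sep, lag in lags.items():
    print(f"{sep:.0f} deg behind Earth -> about {lag:.1f} days of lead time")
```

For separations of 50°, 60° and 70°, the delay is about 3.8, 4.5 and 5.3 days respectively, which is consistent with observations from an L5-like position being most informative at lead times of up to about 5 days.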

Data Availability Statement
STEREO science data were downloaded from the CDAWeb Data Explorer portal at https://cdaweb.gsfc.nasa.gov/ and STEREO NRT data from https://stereo-ssc.nascom.nasa.gov/data/beacon/. ACE science data were also downloaded from CDAWeb and the NRT data from NASA's Community Coordinated Modelling Centre at https://ccmc.gsfc.nasa.gov/requests/GetInput/get_ace_K.php. Both DSCOVR science and NRT data were downloaded from the DSCOVR Space Weather Data Portal at https://www.ngdc.noaa.gov/dscovr/portal/index.html#/. The code for BRaVDA is available at https://zenodo.org/record/7892408#.ZFJ8o3bMK3A. HelioMAS output can be found on the Predictive Science website at https://www.predsci.com/portal/home.php.