The effective use of anchor observations in variational bias correction in the presence of model bias

In numerical weather prediction, satellite radiance observations have a significant impact on forecast skill. However, radiance observations must usually be bias‐corrected for the satellite data to have a positive impact. Many operational centres use variational bias correction (VarBC) to correct the observation biases, but VarBC assumes that there is no model bias within the system. As model biases are often non‐negligible, unbiased observations (anchor observations) are known to play an important role in VarBC to reduce the contamination of model bias. However, more work is needed to understand what properties the network of anchor observations needs to have to reduce most contamination of model bias. We derive analytical expressions to show the sensitivity of the bias correction to the anchor observations and the expected value of the error in the analysed bias‐correction coefficients. We find that the precision and location of the anchor observations are important in reducing the contamination of model bias in the estimate of observation bias. Anchor observations work best at reducing the effect of model bias when they observe the same state variables as the bias‐corrected observations. When this is not the case, strong background‐error correlations become more important, as they allow more information about the model bias to be passed from the anchor observations to the bias‐corrected observations. The model bias observed by both the biased and anchor observations must be similar, otherwise the anchor observations cannot reduce the contamination of model bias in the observation‐bias correction. These results show that, in operational systems, regions with sparse anchor observations could be more susceptible to model biases within the radiance observation‐bias corrections. We demonstrate these results in a series of idealised numerical experiments that use the Lorenz 96 model as a simplified model of the atmosphere.


INTRODUCTION
In numerical weather prediction (NWP), the models that describe the evolution of the atmosphere are chaotic, which means that errors in the initial state will lead to larger errors in the forecast. To minimise errors in the initial state, it is important to incorporate observations, so that we have a more complete picture of the full system. Data assimilation (DA) is used to combine measurements from observations and previous forecasts (which we call the background state) to find the best estimate of the state of the atmosphere at the start of the forecast. Observations from satellites are very important for NWP, as they have a large impact on improving NWP forecasts (English et al., 2013;Eyre et al., 2022). In particular, microwave and infrared hyperspectral radiances have a higher impact on NWP forecasts than other observation types (Candy et al., 2021). However, most radiance observations need to be bias-corrected (Dee and Uppala, 2009), and this is now done in most operational centres such as the Met Office (Cameron and Bell, 2018) and European Centre for Medium-Range Weather Forecasts (ECMWF: (Han and Bormann, 2016) by variational bias correction (VarBC), a system that calculates the bias correction adaptively and can update the bias-correction coefficient at each cycle (Auligné et al., 2007).
In addition to observation biases, biases can also be present as systematic errors in the model, into which the observations are assimilated. Model biases come from systematic errors in the formulation of the models, which approximate the physics and dynamics of the (real) atmosphere. Some operational centres-for example, ECMWF (Laloyaux et al., 2020a)-use weak-constraint four-dimensional variational assimilation (WC4DVar) to reduce model biases in the stratosphere. This has recently had more success, due to the separation of scales between model bias (large-scale) and initial condition errors (smaller scale), as well as a large number of radio occultation observations available in the stratosphere, which can be assumed to be unbiased. WC4DVar works by including a forcing parameter within the model trajectory, which represents the model bias. The cost function is then minimised as a function of both the state and the new model bias parameter.
WC4DVar is not used in all NWP centres, due to its technical difficulty and the resources needed to implement it. Therefore model bias is often not accounted for explicitly within VarBC. If model biases are left within VarBC, then the observations will be corrected towards the model bias, as the system struggles to separate sources of bias (Talagrand, 1998). Therefore, as with WC4DVar, unbiased observations need to be used to anchor the system so that the magnitude of the analysis bias is smaller than that of the forecast model. We call these observations that are not bias-corrected "anchor observations" (Eyre, 2016).
Currently, several types of observations are used as anchor observations within NWP systems. Radiosondes have been used as observations in NWP for over seven decades, measuring upper-air temperature, pressure, humidity, and wind. They are mainly located over land, but some are launched from ships to get some coverage over oceans. Most radiosondes have a bias due to solar and/or infrared radiation, but these are corrected prior to data assimilation so that the data in an assimilation system are considered to be unbiased (Sun et al., 2013). Radio occultation (RO) measurements provide information about temperature in the stratosphere and upper troposphere with good spatial coverage across the Earth. They do not need to be bias-corrected, as any biases occurring in the bending angles are small in comparison with other biases within the DA system, hence RO observations work well as anchor observations (Healy and Thépaut, 2006;Cucurull et al., 2014). Some radiance observations are not bias-corrected using VarBC, so that they can be used as anchor observations, as it is assumed that the model bias is much bigger than the observation bias in the variables that they observe, although these observations may still be bias-corrected using a static bias-correction prior to use in VarBC. For example, AMSU-A channel 14 is used to anchor the upper stratosphere (Di Tomaso and Bormann, 2011) in addition to RO; selected hyperspectral infrared window channels are used to anchor skin temperature at the Met Office; and selected ozone channels from hyperspectral infrared instruments are used to anchor the upper tropospheric/lower stratospheric ozone analysis at ECMWF (Han and McNally, 2010). However, the assumption that observation biases are small for these observations can be wrong: for example, biases in the observation operator may be large (Han and Bormann, 2016).
Over the past 30 years there has been a significant increase in the number of satellite observations, due to their large impact on NWP (Eyre et al., 2020;Saunders, 2021;Eyre et al., 2022). As a result, it is becoming more important to use unbiased observations more effectively in reducing the contamination of model bias in VarBC. Previous work from Eyre (2016) shows the importance of anchor observations in the presence of model bias for a scalar system with no explicit random error in the observations. Eyre (2016) shows that if there is model bias present in a VarBC system then, as the number of anchor observations reduces, the observation-bias correction is more affected by model bias, so the analysis bias will tend towards the model bias. In this study, we extend the work of Eyre (2016) by looking at a multivariate system with explicit random error to look at the role of the locations of anchor observations and their uncertainty characteristics in reducing the effect of model bias on observation-bias correction.
The article has the following structure. In Section 2, the general method of VarBC is explained. In Section 3, we extend current VarBC theory to include two types of observation to demonstrate the role of anchor observations: in Section 3.1 we derive analysis equations for the state and bias coefficients when there are both observations that are bias-corrected and observations that are used as anchor observations; in Section 3.2 we look at the importance of the location of the anchor observations relative to the bias-corrected observations; in Section 3.3 we look at the importance of the location of the anchor observations relative to the locations of model bias; and in Section 3.4 we show how bias in the analysed bias-correction coefficients filters into the state in subsequent cycles. In Sections 4 and 5 we test the theory using an idealised 40-variable model of the atmosphere (Lorenz, 1996): Section 4 gives details about the setup of the toy data assimilation system used in the experiments and Section 5 presents idealised experiments to show the following: how the observation bias coefficients are affected by model bias, depending on the precision of the anchor observations; the effect of anchor and bias-corrected observations observing different parts of the state; and how the model bias contaminates both the observation-bias correction and the state estimation when VarBC is cycled. Finally, we present our conclusions and further discussion in Section 6. Auligné et al. (2007) describe three types of bias-correction techniques: static, "offline", and variational (VarBC). In the static scheme, a vector to describe the observation biases is calculated via the mean difference between the observations and the background state and this term is taken to be fixed until the bias correction is updated. As a result, this scheme cannot take into account any variation in the bias over different DA cycles since the last update. In the "offline" scheme, the bias-correction coefficients are calculated at every new cycle based on the background state from the analysis of the previous cycle. The bias-correction coefficients are calculated by minimising the cost function with respect to the bias-correction coefficients. Although this scheme can adapt more easily than the static scheme to changing biases, the bias coefficients are calculated via the background state, so the "offline" scheme will be more affected by model bias than VarBC would be (Eyre, 2016). VarBC, on the other hand, is an adaptive scheme that updates the bias coefficients within the data assimilation system, taking advantage of the latest unbiased anchor observations. This is the method primarily used operationally, for example at the Met Office (Cameron and Bell, 2018) and ECMWF (Han and Bormann, 2016). In this article, we will only look at VarBC.

BIAS CORRECTION OF OBSERVATIONS
The bias correction for the kth observation (e.g., a measurement taken by an individual channel of a particular satellite) is given by where s k is a constant; p k,i are the r k predictors used for the kth observation; x ∈ R n is the NWP model state; and k,i are the corresponding bias-correction coefficients that are updated at each cycle for each instrument that is bias-corrected. We define the vector ∈ R r to hold all values of k,i , which for simplicity we will call the bias coefficient, where r = ∑ k r k . We define the vector function c(x, ) ∈ R m 1 to be the bias correction for all observations that are bias-corrected, where m 1 is the number of biased observations.
The Met Office uses the following predictors: air-mass thicknesses, which are the differences in height between different pressure levels; Legendre polynomials to describe the spatial variation of bias across a scan; Fourier series to describe orbital bias (only used for SSMIS and MWRI); and a constant predictor that is not dependent on location (Cameron and Bell, 2018), which is given by p k,0 = 1 (Dee and Uppala, 2009). p k,0 differs from the static bias correction s k in Equation 1, as it is multiplied by the bias coefficient, k,0 , so is updated within VarBC.
In VarBC the cost function, J, for three-dimensional variational assimilation (3DVar) is now dependent on where v b is the background of the state and bias coefficients; B v is the background-error covariance matrix for the state and bias coefficients; R is the observation-error covariance matrix; y is the vector that contains all observations; h v (v) is the observation operator given by a vector function that is the sum of the nonlinear observation operator transforming the state to the observation space, h(x), and the bias correction, c(x, ) (Auligné et al., 2007). h v (v) will vary for each observation: for example, observations that are not bias-corrected will have c(x, ) = 0. In this article we refer to y − h v (v) as the innovation vector. The cost function can be generalised to 4DVar by summing over the innovation vectors to include all observations in the assimilation window and incorporating the model into the observation operator. In the following we have looked only at 3DVar for simplicity. Assuming the observation operator can be linearised around the background state v b , then the cost function in Equation 2 is minimised by the analysis equation, where the superscripts a and b denote the analysis and background, respectively, and K v is known as the Kalman gain matrix for v, given by where H v is the Jacobian of h v (v b ) and m is the total number of combined anchor and bias-corrected observations. We will refer to K v as the sensitivity of the control vector to the observations.

THE IMPORTANCE OF ANCHOR OBSERVATIONS IN VARBC: THEORETICAL RESULTS
Through experience at operational centres, anchor observations are known to be essential in VarBC. Their importance for constraining the system from diverging to the model bias was highlighted in the theoretical work of Eyre (2016) in a scalar system. In this section we derive general insight for VarBC that demonstrates the role of the anchor observations in limiting the contamination of model biases. We use this theory to understand how the role of anchor observations is sensitive to their error characteristics and their location within the spatial domain. We then look at the effect that model bias has on the bias coefficients and how this is translated into the state in subsequent cycles.

Setup and derivation of theory for two observation types
In order to separate the roles of the bias-corrected and anchor observations, we write Equation 3 in terms of the state analysis and bias-coefficient analysis, by assuming that there are no correlations between the errors in the state and the bias coefficients, consistent with operational VarBC implementation (Dee, 2004). By also assuming that there are no correlations between the errors in the bias-corrected and anchor observations, we can further extend Equation 3 to include two observation types, where y (1) ∈ R m 1 are bias-corrected within the assimilation (bias-corrected observations) and y (2) ∈ R m 2 are not bias-corrected (anchor observations): where We use K xy (1) and K xy (2) to denote the sensitivities of the state analysis to the bias-corrected and anchor observations, respectively, and K y (1) and K y (2) to denote the sensitivities of the bias-coefficient analysis to the bias-corrected and anchor observations, respectively. These four Kalman gain matrices are stored in the matrix K v : where H v is given by H (1) is the Jacobian of the bias-corrected observation operator h (1) (x); H (2) is the Jacobian of the anchor observation operator h (2) (x); C x is the Jacobian of c(x, ) with respect to x; and C is the Jacobian of c(x, ) with respect to . Operationally, the bias correction would be a function of the background state such that c(x b , ), hence C x would be zero, as c is dependent on the model state at the time of the background, not the variable x. Therefore in the remainder of this article we will set C x to be zero. B v is the background-error covariance matrix for the state and the bias coefficient, given by R is the observation-error covariance matrix for the bias-corrected and anchor observations, given by ) ∈ R (m 1 +m 2 )×(m 1 +m 2 ) .
From Equation 6, the sensitivities of the biascoefficient analysis to the bias-corrected observations, anchor observations, state background, and biascoefficient background respectively are given by a y (1) = K y (1) , where we have used the convention that a partial derivative is a row vector. We will use Equation 14 to study the dependence of a on the state background, which is the mechanism for model bias to be passed into the bias-coefficient analysis.
In order to understand the sensitivity of a to x b via Equation 14, we need to know what K y (1) and K y (2) are dependent on. Upon calculating K v in terms of K xy (1) , K xy (2) , K y (1) , and K y (2) (see Appendix A ), we can rewrite K y (2) in terms of K y (1) : Therefore this shows that the sensitivity of the bias-coefficient analysis to the anchor observations is dependent on the sensitivity of the bias-coefficient analysis to the bias-corrected observations and vice versa (K y (1) is also dependent on K y (2) ). As K y (2) can be written in terms of K y (1) , we can rewrite the sensitivity of the bias-coefficient analysis to the state background, Equation 14, so that it is written in terms of K y (1) : where D is given by In order to understand how biases in the model are passed into the bias-coefficient analysis, we define the errors in the analysis, background, and observations for both the state and the bias coefficients as follows: where the "t" superscript denotes the true value of the state, bias coefficient, or control vector, respectively. The true bias coefficient is the theoretical vector that would correct the observations in the bias correction perfectly if the biased observations had no random error. Taking the expected value of the analysis errors in the state and bias coefficient (see Appendix B), we find where , see Equation 9. If the right-hand side of Equation 22 is nonzero then the state analysis is biased. If the right-hand side of Equation 23 is nonzero then the bias-coefficient analysis has a bias. Biases in a will filter into ⟨ a x ⟩ when a becomes b in the next cycle, as will be shown in more depth in Section 3.4.
Assuming that c(x t , t ) is the true bias correction, then ⟨ o 1 ⟩ = 0, as o (1) is not dependent on the state background. We assume the anchor observations are unbiased (⟨ o 2 ⟩ = 0). To isolate the effect of model bias on the bias-coefficient analysis, we make the theoretical assumption that the bias-coefficient background has no bias (⟨ b ⟩ = 0). In subsequent cycles, ⟨ b ⟩ will be propagated from the previous cycle, so this assumption can only be possible for the first cycle. We will demonstrate the impact that a biased background bias coefficient has on the state and bias coefficient analyses in Sections 3.4 and 5.2.1. Using Appendix C, the expected value of the bias-coefficient analysis error, Equation 23, simplifies to where D is as defined in Equation 18 and we denote ⟨ a ⟩ with the above assumptions as ⟨ a ⟩ t=0 to highlight that this is only valid for the initial cycle. In the next few sections we will study Equation 24 to see how ⟨ a ⟩ t=0 is affected by model bias when we vary the anchor observation parameters.
The effect of the model bias on a will be small if at least one term in the product K y (1) H (1) (I − D) is also small. If K y (1) is small, then the sensitivity of the bias-coefficient analysis to the bias-corrected observations, Equation 12, would be small. If H (1) is small, then the bias-corrected observations would not be used to determine the state analysis. Therefore, as these are both uninteresting cases for determining the bias-coefficient analysis, in this study we will focus instead on when the magnitude of I − D is small depending on the parameters given in the system.
Note that ⟨ a ⟩ t=0 can also be rewritten in terms of the sensitivity of the bias-coefficient analysis to the state background, Equation 17, such that Equation 24 becomes Therefore if the sensitivity of the bias-coefficient analysis to the state background reduces, less model bias will be able to contaminate the analysed bias coefficient.
In subsequent sections, we look at the importance of the location of the anchor observations relative to the bias-corrected observations for reducing I − D and thus reducing the contamination of model bias in the observation-bias correction.

Position of anchor observations relative to bias-corrected observations
In this section, we want to understand how anchor observations can reduce the contamination of model bias in the bias-coefficient analysis depending on whether the bias-corrected and anchor observations observe the same state variables.

3.2.1
Anchor observations fully observe the domain In order to understand the role of anchor observations in reducing the contamination of model bias in VarBC, we first consider an almost perfect case where we have anchor observations everywhere in the domain. If H (1) and H (2) are both equal to the identity, that is, the state is fully observed directly by both bias-corrected and anchor observations, then D, Equation 18, is given by Equation 24 is then given by Equation 27 shows that, even with complete observation coverage, ⟨ a ⟩ t=0,D I is still nonzero, as it is still a function of the state model bias. However, if D I tends to the identity, then the right-hand side of Equation 27 would tend to zero, such that the observation-bias correction would no longer be contaminated by the model bias. D I would tend to the identity when the elements of R (2) are smaller than the elements of B x . This would occur when the anchor observations are more precise than the state backgrounds that they observe.
Overall, Equation 26 shows that, even with anchor observations at every model grid point, we still have model bias contaminating the bias-coefficient analysis. This can only be reduced by having anchor observations that are more precise than the state backgrounds that they observe.

3.2.2
Anchor observations partially observe the domain Anchor observations and observations to be bias-corrected could observe different parts of the state. Therefore in this section we derive equations for the expected value of the bias-coefficient analysis error and the sensitivity of the bias-coefficient analysis to the state background when the anchor observations do not observe the whole domain.
Let the state x be separated into two parts: x and x , such that Let the anchor observations only observe a subset of the state, such that they only observe variables in x . Bias-corrected observations could observe variables in x and x . Then the linearised bias-corrected and anchor observation operators will be given by where H (1) is related to observations of x and H (1) and H (2) are related to observations of x . We have denoted H (2)p to be the Jacobian of the anchor observation operator when the state is only partially observed by anchor observations, that is, anchor observations only observe variables in x . The background-error covariance matrix that describes the relationships between x b and x b is The magnitude of B x determines how much information about the observations will be shared between the state variables (Bannister, 2008). For example, if the elements in B x are small then there are weak correlations between the errors in x b and x b . We are interested in how the value of D, Equation 18, affects the sensitivity of the bias-coefficient analysis to the state background. When the anchor observations only observe a subset of the state, we denote D by .
The blocks of D p are given by (see Appendix D) Then the sensitivity of a to x b (Equation 17), when ) .
The first block element of the matrix in Equation 35 (i.e., −K y (1) | | |H (2) =H (2)p H (1) ) gives the sensitivity of a to x b and the second block element of the matrix in Equation 35 gives the sensitivity of a to x b . D p and D p are zero (from Equations 31 and 33) so do not appear in this equation, but their role would be to vary the sensitivity of a on x b . Therefore, as D p and D p are zero, the sensitivity of a to x b is not explicitly dependent on anchor observations, when anchor observations do not observe state variables in x . However, there is some implicit dependence of anchor observations in K y (1) . If D p and D p change magnitude, then this will vary the sensitivity of a to x b . Therefore, as D p and D p are explicitly dependent on the anchor observations, the sensitivity of a to x is also explicitly dependent on the anchor observations, as in this case the anchor observations observe state variables in x . Putting D p into Equation 24 gives ⟨ a ⟩ t=0 when the anchor observations observe part of the state: where ⟨ b x ⟩ and ⟨ b x ⟩ are the biases in the background state variables x b and x b respectively. We will use this equation further in Sections 3.2.3 and 3.3 to show how the sensitivity of the bias in a varies depending on the location of the model bias in relation to the anchor observations.

3.2.3
Special case: anchor observations and bias-corrected observations observe different parts of the state Next we look at the case where anchor and bias-corrected observations observe different parts of the state to understand the importance of the background-error covariance matrix. Let the bias-corrected observations only observe x and anchor observations only observe x . Then the observation operators of both observation types will be given by where we have denoted H (1)p and H (2)p to be the Jacobians of the observation operators when both bias-corrected and anchor observations observe the state only partially.

Substituting Equation 37 into Equation 36
gives the expected value of the bias-coefficient analysis when anchor and bias-corrected observations observe different parts of the state, x ⟩ tends towards zero. D p , from Equation 32, is linearly dependent on B x . Therefore, the strength of the background-error covariances between x b and x b will influence how small the difference term in Equation 38 is. In the case in which the model biases ⟨ b x ⟩ and ⟨ b x ⟩ have similar sign and magnitude, if D p tends to the identity, the difference term will reduce and the expected value of the bias-coefficient analysis error will tend to zero. D p will tend to the identity if the background-error correlations between the states are large. In practice, the model biases may be similar for different parts of the state and the background-error correlations between these two states could be large when restricted to specific atmospheric layers higher in the atmosphere. However, this will not be the case when the bias-corrected and anchor observations observe states vertically far apart: for example, biases in stratospheric temperatures tend to be much bigger than biases in tropospheric temperatures.
To understand the importance of B x when the anchor and bias-corrected observations observe different parts of the state, let us look at the case when B x = 0. In this case we can simplify K v from Equation 8, as the off-diagonal blocks in (H and Putting Equations 39 and 40 into where K y (1) and K y (2) can be simplified from Equations A11 and A12 to give The derivation of K y (1) and K y (2) for the general case case be found in Appendix A.
from Equation 13 the sensitivity of the bias-coefficient analysis is independent of the anchor observations when there are no background-error cross-correlations between the states observed by the anchor observations and the states observed by the bias-corrected observations. This means anchor observations do not play a role in determining the bias coefficients if anchor observations do not observe the same state variables as the bias-corrected observations and information is not passed between x and x via B x . Therefore, if bias-corrected and anchor observations observe different state variables, nonzero background-error covariances allow the anchor observations to be used to determine the bias coefficients and therefore reduce the effect of the model bias in a . We will use this result in Section 3.3. So far, we have not considered that the model bias may change magnitude depending on location. In the next section, we will investigate scenarios where the location of the model bias differs relative to the location of the anchor observations.

Position of anchor observations relative to model bias
In reality, biases in the model will not be uniformly distributed throughout the domain. For example, a version of the ECMWF Integrated Forecast System model has a cold bias between 100 and 10 hPa and a warm bias above 10 hPa (Laloyaux et al., 2020b). In this section, we explore the case in which the model bias varies across the domain. For simplicity, we assume that some parts of the domain are biased and others not. We continue to assume that the anchor observations observe only a subset of the state, but assume that the bias-corrected observations could observe the whole state. Figure 1 is a schematic diagram that depicts the different possibilities of the locations of the model bias in relation to the observations. The presence of model bias is shown by the patterned background; the circles with H (1) show the parts of the state observed by the bias-corrected observations; and the circles with H (2) show the parts of the state observed by anchor observations. In Figure 1 a the model bias is in x , which is observed by anchor observations and some bias-corrected observations, in Figure 1 b the model bias is in x , which is only observed by bias-corrected observations, and in Figure 1 c the model bias is in both x and x , such that the whole state that is observed has model bias. In the next sections we assume that the model biases in x and x (when present) have the same sign and magnitude. These scenarios are discussed below.

3.3.1
Model bias in state variables observed by anchor observations, but not in all state variables observed by bias-corrected observations , there is no model bias in x b , the state variables are observed only by the bias-corrected observations) but ⟨ b x ⟩ ≠ 0 (i.e., there is model bias in x b , the state variables are observed by both the bias-corrected and anchor observations), then the expected value of the bias-coefficient analysis, Equation 36, becomes , this would occur if R (2) is smaller than B x , or, in other words, if the anchor observations were more precise than the state backgrounds that they observe, as we saw for a simplified case in Section 3.2.1. If the bias-corrected observations observe state variables in x (so they observe different parts of the state from the anchor observations) such that H (1) is nonzero, then H (1) D p would tend to zero if D p tends to zero. Using Equation 32, D p would tend to zero if the background-error covariances B x are small, which would mean less information is passed between x b and x b than when the background-error covariances are larger.
Therefore, if anchor and bias-corrected observations observe the same state, the anchor observations can reduce the effect of the model bias in the state when the anchor observations are more precise than the state backgrounds they observe. If there are bias-corrected observations that observe a different part of the state from the anchor observations, which do not have model bias, then smaller background-error covariances between the two parts of the state will limit the amount of model bias able to contaminate the estimate of the bias coefficient, as the anchor observations cannot share information about the model bias with the bias-corrected observations. This is in contrast to Section 3.2.3, in which we showed that if anchor and bias-corrected observations observe different states, but the model bias observed was the same, then they would need nonzero background-error cross-correlations for the anchor observations to pass information about the model bias to the bias-corrected observations.

Model bias not in state variables observed by anchor observations
If there is no model bias in x b (i.e., there is no model bias in state variables observed by anchor observations), such that ⟨ b x ⟩ = 0, but there is model bias in x b (i.e., there is model bias in state variables only observed by bias-corrected observations), then the expected value of the bias-coefficient analysis, Equation 36, reduces to In this case, the anchor observations cannot reduce the effect of the model bias in x b via D p , as ⟨ a ⟩ t=0,D p is no longer dependent on D p and no other variables in Equation 45 are explicitly dependent on the anchor observations. Therefore, if anchor observations do not observe the parts of the state that have model bias, they cannot explicitly reduce the effect of model bias on a . Anchor observations will only have an effect on model bias implicitly through K y (1) , as K y (1) is implicitly dependent on H (2) and R (2) (see Equation A12). However, as was discussed at the end of Section 3.1, a small K y (1) would mean the bias-coefficient analysis is independent of the bias-corrected observations, so we are not interested in this case.

3.3.3
Model bias in state variables observed by both anchor and bias-corrected observations If we have model bias in both x b and x b , then we come back to Equation 36, which gives the same results as in Section 3.2 : that is, giving a higher weighting to the anchor observations will reduce the contamination of model bias in the observation bias coefficient. If there are similar model biases in state variables observed by either bias-corrected or anchor observations, background-error covariances between state variables become more important in sharing information about the model bias.
In Sections 3.2 and 3.3, we have shown that, in order for the anchor observations to have the biggest impact on reducing the effect of model bias on a , both anchor and bias-corrected observations need to observe model bias. If both anchor and bias-corrected observations observe the same parts of the state, the effect of the model bias is smallest when D p tends towards the identity, as was shown in Section 3.2.1. If both anchor and bias-corrected observations observe model bias, but do not observe the same state variables, background-error correlations with large magnitudes will have a bigger impact in reducing the effect of model bias on a , as was shown in Section 3.2.3, as D p is linearly dependent on B x (Equation 32).

The effect of a biased bias-coefficient analysis on the state analysis in further cycles
So far we have only looked at the contamination of model bias in a , not in x a . However, any bias in the bias-coefficient analysis will filter into the bias correction and therefore the state analysis in the next cycle. Within this section we extend the theory developed so far to understand the impact of model bias on the state analysis via the implementation of VarBC.
Cycle 1 (theory so far): At the initial time, we assume that b is unbiased. We assume the mean anchor observation errors are zero by definition. When the bias correction function is dependent on the true state and bias coefficient, c(x t , t ), we assume it is perfect, such that the expected value of the bias-corrected observation errors are zero. Then the mean values of the errors in the observations and bias-coefficient background at the first cycle are denoted by If we assume that there is a bias in the state background, which arises from a model bias, then from Equations 22 and 23 we have that the expected value of the analysis errors at the first cycle for the state and bias coefficients respectively are Note that Equation 49 is equivalent to Equation 24. Cycle 2: We assume the bias coefficients are approximately constant between cycles, such that ⟨ b ⟩ t=1 = ⟨ a ⟩ t=0 . We assume that the expected values of the errors in both bias-corrected and unbiased observations are still zero, as the observation errors in Equation 47 are not dependent on the background state and background bias coefficient. The state background is the previous state analysis evolved forward via the linearised model M, plus a bias increment Δt. The expected values of the background and observation errors at the second cycle are therefore given by Then, substituting Equations 50 -52 into Equations 22 and 23, we have that the expected values of the analysis errors for the state and bias coefficients at the second cycle are The expected value of the state analysis is now dependent on both ⟨ b x ⟩ t=1 and ⟨ a ⟩ t=0 . This shows that, if the bias-coefficient analysis is biased at time n, then this will contaminate the estimate of both state and bias-coefficient analysis at time n + 1. As ⟨ b ⟩ t=1 = ⟨ a ⟩ t=0 , we see that in future cycles we cannot assume ⟨ b ⟩ = 0, as any bias in the bias-coefficient analysis will become the bias in the bias-coefficient background. Therefore, this shows that it is important to reduce the contamination of model bias in the observation bias coefficient, as ⟨ a ⟩ feeds into future cycles and contaminates the state analysis.

EXPERIMENTAL DESIGN
In this section, we demonstrate our general theory for a linear system by considering a simplified experiment that uses the nonlinear Lorenz 96 model (Lorenz, 1996), which we will denote as L96, as the basis for our system. L96 has been used as a way to study predictability, particularly in weather and other atmospheric systems, and is often used today as a test problem for numerical weather prediction and in data assimilation (e.g. Ott et al., 2004;Fertig et al., 2007;Brajard et al., 2020). We use L96 as it allows us to choose certain parameters that control how quickly the system displays chaotic behaviour (Lorenz, 1996;Kerin and Engler, 2020). Model bias is added to the equations over more than one cycle by changing the forcing term in the model that propagates the state analysis forward, but not in the model that calculates the true state, which is used to generate the observations. The L96 model is a system of coupled ordinary differential equations which describes the evolution of a quantity via advection, dissipation, and external forcing. The system contains N variables x 1 , ..., x N , and is determined by N equations, where the kth equation is given by where F is independent of k (Lorenz, 1996). This is solved numerically using the fourth-order Runge-Kutta scheme.
We assume that we are on a periodic circular domain such that x 0 = x N and x 1 = x N+1 . The first two terms of Equation 55 are the advection terms, which conserve the total energy: they simulate flow out of and into the grid point k, respectively. The third term is the internal dissipation, where a fraction of the quantity present is destroyed or dissipated, and the fourth term, F, shows how much of the current quantity x k is added to each state (Kerin and Engler, 2020). As in many toy data assimilation experiments utilising the L96 model, we choose the number of spatial variables N to be 40 and the forcing term in the true model, F, to be 8 (e.g. Ott et al., 2004;Fertig et al., 2007;Brajard et al., 2020). We set the distance between grid points ΔL k = 1 for all k, such that the length of the domain is 40 (equal to the number of spatial variables). We initialise the L96 model from a sine curve and run it for 10 5 time steps of Δt = 0.0125, in order to allow the system to relax into its dynamics. The time step Δt = 0.0125 is equivalent to approximately 45 min in the real atmosphere (Ott et al., 2004;Brajard et al., 2020), based on the assumption that the error-doubling time of the atmosphere is approximately 1 day (Simmons and Hollingsworth, 2002). We then use the final time step as the initial condition for the true trajectory in our numerical experiments.
In our data assimilation setup, we set the true model to be the L96 model with forcing F = 8. We define the biased observations to be the truth plus an error consisting of random and systematic error, where x t is the true state; e o (1) is the uncorrelated random error in the biased observations, calculated from a Gaussian distribution with zero mean and error variance 2 o(1) ; and b o is the observation bias. We define the bias to be constant over all observed variables, and set it to 2 (∶= t ). The unbiased observations are given by the truth plus a random error, where e o (2) is the random error in the unbiased observations, calculated from a Gaussian distribution with zero mean and error variance 2 o(2) . These will be our anchor observations. The data assimilation method used for our experiments with the L96 model is 1DVar, which is analogous to the theory presented on 3DVar in Section 3, as we do not use a time dimension. The number of observations in space varies depending on the experiment. For each experiment there is at least one observation (anchor and/or bias-corrected observations) observing every state variable. Although such a high frequency of observations is unrealistic operationally, this is necessary when using the L96 model, as variables that are spatially close in L96 are not highly correlated (Lorenz, 2005), which implies that their errors are not highly correlated. We attempted similar experiments with sparser observations, but it produced background-error covariance matrices with unstable values.
We simulate direct observations, hence H (1) ∈ R m 1 ×40 and H (2) ∈ R m 2 ×40 have values of one at the locations of observations and zero at all other locations. As the biases added to the observations are constant, we set c(x, ) = such that C x = 0 and C = 1. This means that we only use one predictor, which is equal to 1, that is, p k,0 = 1 in Equation 1. R (1) and R (2) are set to be the identity, unless stated otherwise as in experiment 1 (Section 5.1.1). The matrix B v (the matrix that combines B x and B : see Equation 10) is given by two different matrices in the following experiments. In experiments 1 and 3, B v is calculated from an initial ensemble run, so that it is the optimal background-error covariance matrix for the system. The calculation of B v for these experiments is explained in more detail in Appendix E. In experiment 2 (Section 5.1.2), in which we do not cycle the system, the correlations in B x are given by the second-order autoregressive (SOAR) correlation function, which has previously been used to model the background-error correlations of the atmosphere (Ingleby, 2001;Simonin et al., 2014), allowing us to control the length-scales of the error correlations. The SOAR correlation function is given by where k is the index from 1 to N∕2; L b is the length-scale; and e is the exponential function. For values greater than N∕2, s k is repeated in opposite order such that the array s is palindromic. To calculate B x , s is multiplied by 2 bx to create a circulant matrix. The background-error covariances are varied by varying the length-scale of the SOAR function. B is given by the scalar 2 b .

NUMERICAL RESULTS
In this section, we demonstrate the theoretical results from Section 3 using the L96 model and the metrics we have introduced in Section 4. We want to demonstrate the ability of anchor observations to reduce the contamination of model bias acting on the analysed observation bias coefficient. In each experiment we compute the bias-coefficient analysis over a given number of realisations, with random error in the observations and initial background values. The number of realisations varies in each experiment, due to the computational cost and the error variances chosen. From these realisations we obtain the mean and standard deviation of the bias-coefficient analysis. We can illustrate the bias in the analysed bias coefficient with the ratio where | ⋅ | is the absolute value; a is the mean a over all realisations; t is the true bias; and a is the standard deviation of a from all realisations. This shows the bias associated with the mean value of a in relation to the error variance. We will refer to this ratio as the bias ratio. If the bias ratio is large, then the bias can be considered significant in comparison with the random noise, but if the bias ratio is small, then the bias in the bias-coefficient analysis will be lost within the random error and so will be insignificant. The ratio at 0.1 is plotted for reference as a dotted line in Figures 2, 3, and 4 and is referred to as the reference ratio.

One-cycle experiments (no model evolution)
In the first two experiments (Sections 5.1.1 and 5.1.2) we use 1DVar to calculate the analysis of the state and bias coefficients at the initial time step. To simulate a model bias, we set the state background to have a bias, to represent any bias that has accumulated from running a previous cycle forward using the model. The background state is given by the truth plus an error, where e b x is the random error in the state background calculated from a Gaussian distribution with zero mean and from the error covariance matrix B x ; b b is the bias in the initial state background; and e b is the random error in the bias-coefficient background calculated from a Gaussian distribution with zero mean and error variance 2 b . In experiment 1 this is set from the climatological B v (as described in Appendix E) and in experiment 2 this is set to be 1, so that it is equal to the other error variances.

5.1.1
The effect of varying the anchor observation-error variance First we present an experiment to illustrate the results found in Section 3.2.1. The theoretical results showed that, in a system observed fully by anchor observations, small anchor observation-error variances reduce the contamination of model bias in a . Therefore, in this experiment, we have observed all spatial variables with both bias-corrected and anchor observations, such that every spatial variable is observed twice, once by the bias-corrected observations and once by the anchor observations.
In Figure 2, we plot the bias ratio as in Equation 59 from 3000 realisations after one cycle and vary the anchor observation-error standard deviation, to test the importance of the weighting given to anchor observations in reducing the effect of model bias in the analysed bias coefficient. The bias-corrected observation-error standard deviations ( o(1) ) are set to 1, so that they are approximately 10% of the variability of the state. B v is calculated via an ensemble, as described in Appendix E, and is kept constant for the different values of o(2) . Although this gives a suboptimal B v for the system, it isolates the effect of varying the anchor observation-error variance. A bias of 0.15 has been added to the initial conditions, as in Equation 60, which represents our model bias. This is approximately 1.5% of the variability of the state and was chosen so that the system could still control the model bias. o(2) has been varied from one-tenth of o(1) to ten times o(1) (i.e., between 0.1 and 10). The crosses are the results from the 3000 realisations and the solid line is the analytic result given by Equation 24, to give the "true" bias ratio for the system.
In Figure 2, the bias in the observation bias coefficient becomes more significant as the anchor observation-error standard deviation is increased. There is some variation in the bias ratio when calculated from the realisations, which is due to the random error in the observations and background, but they follow the shape of the analytic solution. The bias ratio increasing with larger anchor observation-error variance is in line with the results found from Equation 27, as we showed that if the anchor observation-error variance is small, then the effect of the model bias on a would be small. This is because the ratio D in Equation 18 will tend to the identity as o(2) reduces. The ratio increases as o(2) increases, which shows that, as less trust is put into the anchor observations, the bias in the analysed bias coefficient increases. This occurs as model bias dilutes the analysis of the bias coefficient and pulls the VarBC system away from the truth. In the limit where the anchor observation-error variance is large, the anchor observations receive insignificant weight in the analysis and so the biased observations are bias-corrected towards the model bias, instead of to the truth.

5.1.2
The location of the anchor observations relative to the model bias Next we present experiments to demonstrate the results found in Sections 3.2.3 and 3.3, such that we have anchor and biased observations observing different state variables that do and do not have model biases. In Section 3.3, we showed how the role of the anchor observations in reducing the effect of model bias changes depending on whether or not anchor observations observe state variables that have model bias. When anchor and biased observations observe different parts of the state, as in Section 3.2.3, then the influence of the anchor observations is dependent on the background-error covariances between state variables.
In Figure 3, we have plotted the ratio in Equation 59 from 1000 realisations after one cycle. The observations are spaced evenly at every other spatial variable, such that the biased observations observe the even state variables (x 0 , x 2 , x 4 , etc.) and anchor observations observe all odd state variables (x 1 , x 3 , x 5 , etc.). The observation-error standard deviations are equal to 1. B x is given by the SOAR correlation function, with 2 bx = 1. B is scalar, also equal to 1. To see the effect of different background-error covariances, we have varied B x by varying the length-scale, L b , which is varied in Figure 3 along the x-axis. When L b is small, the background-error covariances between state variables are small, and when L b is increased, so are the background-error covariances between state variables. Larger length-scales will mean more background information is shared between parts of the state that are spatially further away from each other. The dotted line is 0.1, below which any biases are considered to be insignificant. We have considered three cases where a bias of 0.3 has been added to the background state in three different locations: in state variables that are observed by biased observations; in state variables that are observed by anchor observations; and in state variables that are observed by both types of observations. Model bias only in state variables observed by anchor observations (dotted line): In Figure 3, when the model bias is only in state variables observed by anchor observations (dotted line), the bias in the observation bias coefficient is insignificant when the error covariances of B x are small. As the length-scale of B x is increased, the ratio of the bias also increases, which means a has a more significant bias in comparison with the random error. This reflects the results found from Equation 44, as we found that, in order to reduce the contamination of model bias in state variables observed by the anchor observations in ⟨ a ⟩ t=0 , less information had to be passed between the state variables observed by the anchor and bias-corrected observations: that is, smaller background-error correlations. However, in Section 3.2.3, we showed that, if the anchor and bias-corrected observations observe different parts of the state and B x tends to zero, then the anchor observations would not be used in the VarBC system. Hence this scenario would not be ideal, as it would mean that either a is affected by model bias or the anchor observations are not used within the VarBC system. Figure 3, when model bias is only in state variables that are observed by the biased observations (solid line), the bias in ⟨ a ⟩ t=0 remains significant regardless of the magnitude of the covariances in B x . This reflects the theoretical results, as Equation 45 is independent of D, which means that the anchor observations cannot reduce the effect of model bias in state variables directly if they do not observe the parts of the state that have model bias. Figure 3, if model bias is in both state variables observed by biased observations and state variables observed by anchor observations (dash-dotted line), then the bias in ⟨ a ⟩ t=0 is initially significant compared with the random error when the length-scales of B x are small and becomes less significant as more information between background state variables is shared. As was shown in Equation 38, Figure 3 shows that varying B x has a large effect in reducing the effect of the model bias from parts of the state that both are and are not observed by the anchor observations. If model bias is in the system, then having strong error correlations between bias-corrected and anchor observations that observe state variables with similar model biases is the best possibility, as the anchor observations have the biggest effect in reducing the contamination of model bias acting on the estimate of the observation bias coefficient, whilst still being used within VarBC.

Cycled experiment
In the third experiment (Section 5.2.1) the system is run over 50 cycles to demonstrate how the bias in the bias coefficient will accumulate and be passed into the state analysis. On the first cycle, the background state and bias coefficient are set up as in Equations 60 and 61, but with zero added background bias, so that b b = 0. This means the first few cycles will act as a spin-up period to allow the background bias to settle into an equilibrium. In future cycles, the background values at cycle T are given by the analysis values at cycle T − 1 after they are evolved forwards in time. The bias-coefficient background is evolved by the identity, that is, the previous bias-coefficient analysis is taken to be the bias-coefficient background. The state background is evolved forward via the model from Equation 55, but replacing F with F biased = 8.8 to add a bias to the model by changing the forcing term. We set the assimilation window length to be 10 time steps (which represents approximately 7.5 hr in the real atmosphere, as discussed at the beginning of Section 4) to allow the background errors to grow sufficiently within each window.

5.2.1
How bias in the state analysis and bias-coefficient analysis accumulates when the system is cycled Finally, we have an experiment to demonstrate the results from Section 3.4, which show the effect of having a bias in a on the state analysis in the next cycle. These show how the contamination of the model bias in a leads to a biased b , which in turn contributes to the bias in the state analysis.
We run an experiment that has 100 realisations over 200 cycles. In Figure 4, we plot the ratio from Equation 59 for the bias-coefficient analysis (bottom panel), but have included a similar ratio for the norm of the state analysis bias, given by where x a i is the average state analysis across all realisations for the spatial variable x i ; x t i is the true value of the spatial variable x i ; 40 is the number of state variables; and ax is the mean value of the standard deviations of the state analysis from 100 realisations, where the mean is calculated over all state variables. We have both anchor and biased observations at every location; the observation-error standard deviations are 1; and B v is calculated via an ensemble, see Appendix E. We have added a model bias by multiplying the forcing term in the model used to evolve the analysis by 1.1, such that F biased is 8.8, but have not added a model bias via the initial conditions, as explained in Section 5.2. Multiplying the forcing term by 1.1 only changes the state values by 0.01 over one cycle, allowing the model bias still to be constrained. We plot the ratios over 200 cycles to show the evolution of the bias in the state and the bias coefficient.
In Figure 4, the ratios for the state and the bias coefficient start at approximately zero and then the state ratio increases towards 0.1 and the bias coefficient ratio increases towards 0.12. When the ratios are above 0.1, we consider the biases to be significant in comparison with the random error. Note that the magnitude of the ratios would be larger if a larger model bias were chosen, fewer state variables were observed, or less trust were put into the observations, which we do not show here. Therefore the 0.1 line has been included in the bias-coefficient bias ratio to show general trends, not the exact value of parameters that causes the bias to be significant. The bias in the first cycle is zero for both the state and the bias coefficient, because no model bias has been added to the initial conditions. There is only a model bias from the second cycle, as model bias is added via the forcing term F biased , which evolves the state analysis forward, so that the state analysis of the first cycle becomes the state background of the second cycle and so on. We would expect the bias in both the state and the bias coefficient to grow with each cycle, as we showed in Equations 53 and 54; the biases in x a and a are the accumulation of the previous state and bias coefficient biases. In Figure 4, the error in the state appears to reach saturation after approximately 20 cycles and the error in the bias coefficient reaches saturation after approximately 90 cycles. As the model bias, i , is constant in time, the state and bias coefficient analyses reach an equilibrium between the background bias and the truth from the anchor observations. We would expect an equilibrium to exist, as VarBC relies on the background and the unbiased observations as sources of the truth, so if they are different then the analysis will be pulled between the two until it reaches an equilibrium. However, it is not straightforward to evaluate analytically what the equilibrium should be.

CONCLUSIONS AND DISCUSSION
In order to study how model bias can contaminate observation-bias correction, we have looked at the role of unbiased (anchor) observations in variational bias correction (VarBC). The conclusions are based on general optimal estimation theory and we have demonstrated the results with idealised experiments.
In this study, we have focused on the importance of the anchor observations in reducing the contamination of model bias in bias-coefficient analysis, as opposed to reducing the effect on the state analysis. This is because if the bias-coefficient analysis has systematic error, then observations will be wrongly bias-corrected and the systematic error will filter into the state analysis, as was shown when the system was cycled in Section 3.4 and Figure 4.
In a theoretical world where we could have full coverage of anchor observations, we showed in Equation 27 that the model bias can still contaminate the bias-coefficient analysis. The only way that model bias could not affect the bias-coefficient analysis at all would be to have zero random error in the anchor observations. However, as this is only possible theoretically, we can only look at reducing the contamination of model bias in the bias-coefficient analysis, not removing it completely. We showed in Equation 26 and Figure 2 that the effect of model bias is reduced when the anchor observation-error variance is smaller than the state backgrounds that the anchor observations observe. Operationally, the anchor observation-error variance needs to reflect the uncertainty associated with the observations. However, it is possible to change the weighting given to the anchor observations by using more anchor observations within the system, if they are available. Therefore, although we have shown that more precise anchor observations will reduce the contamination of model bias in the observation-bias correction, this result also extends to the importance of a higher spatial frequency of anchor observations, to increase the weighting given to the anchor observations within the system and thus have the biggest impact in reducing the contamination of model bias.
In Equations 36, 44, and 45 and Figure 3, we showed that the contamination of model bias in the bias-coefficient analysis can only be reduced by the anchor observations if the anchor observations also observe state variables that have similar model bias. Anchor observations have typically been from radiosondes, which mostly have coverage over land, but with the increased use of radio occultation (RO) the spatial and temporal distribution of anchor observations in the upper troposphere and stratosphere has increased. Radiosondes and RO give a good coverage of anchor observations for temperature in the troposphere and stratosphere, but variables such as humidity and wind speed are less well observed by unbiased observations. This means that, although the spatial coverage of anchor observations is increasing, there are still significant parts of the model domain where model bias could still contaminate the bias-coefficient analysis.
We showed in Equation 44 and Figure 3 that, if model bias only exists in state variables observed by the anchor observations, such that bias-corrected observations do not observe state variables with model bias, then the anchor observations will pass the model bias into the bias-coefficient analysis if there are strong error correlations between the background state variables. This could occur, for example, in the lower troposphere, as radiosonde measurements are mostly taken over land, with little coverage over sea. If a temperature bias exists over the land but not the sea and there are strong horizontal error correlations between the land and the sea, then the model bias could be passed into the bias coefficients via radiosondes.
In Equation 45 and Figure 3, we showed that, if there is no model bias in the state variables that are observed by the anchor observations but there is model bias in state variables observed by the bias-corrected observations, then the anchor observations cannot explicitly reduce the effect of model bias in other state variables. Therefore, areas that are most at risk of contamination of model bias within the observation-bias correction will be locations with fewer unbiased observations, such as humidity in the upper troposphere, which is only sparsely observed by radiosondes but is known to have model biases.
We showed in Equation 36 and at the end of Section 3.3 that, if both biased and anchor observations observe state variables that have model bias, then anchor observations can reduce the contamination of model bias in the bias-coefficient analysis. We saw in Figure 3 that, if the anchor and biased observations observe different parts of the state that have similar model bias characteristics, the background-error covariances become more important: larger background-error correlations will transfer more information about the model bias between state variables and so will reduce the contamination of the model bias in the bias-coefficient analysis. In an operational system, the length-scales of B x will be locationally dependent: for example, the length-scales of B x will be larger at higher altitudes (Ingleby, 2001). There will also be an element of flow dependence, such that the background-error covariance matrix will deform with the flow (Bannister, 2008). This article shows that anchor observations that observe different variables/locations from the bias-corrected observations, when both variables/locations have similar model biases, will have the largest impact in reducing the contamination of model bias when they are in systems with larger background-error covariances, such as in the upper atmosphere. When similar model biases are present in systems where background-error covariances are small between variables observed by bias-corrected and anchor observations, such as between locations of observations across a front, then anchor observations will only have a small impact in reducing the contamination of model bias in the observation-bias correction.
This article has aimed to derive new insight into the role of anchor observations for mitigating the impact of model bias in VarBC. Our theoretical findings have been tested in a toy system. The next steps for this work should be to extend the theory to 4D variational data assimilation and to look at how the assumptions made hold in an operational system: for example, how the bias predictors used operationally allow the state and bias coefficient domains to be separated more clearly.
and the observation-error variance for both types of observations is given by ) ∈ R (m 1 +m 2 )×(m 1 +m 2 ) .

APPENDIX B. BIAS-COEFFICIENT ANALYSIS ERROR
To calculate the error in the bias-coefficient analysis, we subtract the true bias from both sides of the bias-coefficient analysis Equation 6 and add and subtract h v (1) (v t ) and h v (2) (v t ) in the respective innovation vectors, to find Using Equations 19, 20, and 21, we can simplify this using the error equations, plus rewriting v t as v b − b v and expanding h v around v b ; we find that the bias-coefficient analysis error is given by where H v (1) = ( H (1) , C ) and H v (2) = ( H (2) , 0 ) (see Equation 9). To find the error in the state, we would go through the same procedure, but subtracting the true state from both sides of the state analysis Equation 5 initially, which gives (full derivation not shown).

APPENDIX C. SIMPLIFYING THE EXPECTED VALUE OF A AFTER THE ASSUMPTIONS THAT THERE IS ONLY BIAS IN THE MODEL
If we assume that ⟨ b ⟩, ⟨ o 1 ⟩, and ⟨ o 2 ⟩ are equal to zero, then Equation 23 becomes As we can rewrite K y (2) in terms of K y (1) , this simplifies to where D is as defined in Equation 18 and we denote ⟨ a ⟩ at the first cycle to be ⟨ a ⟩ t=0 .

APPENDIX D. SIMPLIFYING D WHEN THE ANCHOR OBSERVATIONS PARTIALLY OBSERVE THE SYSTEM
When the anchor observations only observe a subset of the state, then D is given by ) ( ( 0, H (2) ) ( ) + R (2) )−1 × ( 0, H (2) ) , sample covariances of all state background errors b x t,i and bias coefficient errors b where b x t,i and b xk t,i are the background errors for the th and kth variables of the state at time t, ensemble i; and x and xk are the mean background errors from the 10,500 samples at state variables and k, respectively. The error covariances between the background state and bias coefficient were set to 0 as before.
The process described above was repeated in another iteration, but B x used within the assimilation was given by the state background-error covariance matrix that had just been calculated. To remove noise caused by the limited sample estimate of B x , we set the covariances between state variables further than two grid points apart to 0. 2 b was given by the estimate of the bias-coefficient background-error variance, which had also just been calculated. Using the new B v , a new background-error covariance matrix was calculated from Equations E2 -E5, again using 700 cycles and 15 realisations. This was taken to be the optimal background-error covariance matrix in the numerical experiments in Sections 5.1.1 and 5.2.1. Figure E1 is the error covariance matrix from the final iteration, which was used in our numerical experiments. As the state variables are on a circular domain, x 0 and x 39 have a correlation to each other, as they would be next to each other in the domain.