Krylov Methods for Adjoint-Free Singular Vector Based Perturbations in Dynamical Systems

The estimation of weather forecast uncertainty with ensemble systems requires a careful selection of perturbations to establish a reliable sampling of the error growth potential in the phase space of the model. Usually, the singular vectors of the tangent linear model propagator are used to identify the fastest growing modes (classical singular vector perturbation (SV) method). In this paper we present an efficient matrix-free block Krylov method for generating fast growing perturbations in high dimensional dynamical systems. A specific matrix containing the non-linear evolution of perturbations is introduced, which we call Evolved Increment Matrix (EIM). Instead of solving an equivalent eigenvalue problem, we use the Arnoldi method for a direct approximation of the leading singular vectors of this matrix, which however is never computed explicitly. This avoids linear and adjoint models but requires forecasts with the full non-linear system. The performance of the approximated perturbations is compared with singular vectors of a full EIM (not with the classical SV method). We show promising results for the Lorenz96 differential equations and a shallow water model, where we obtain good approximations of the fastest growing perturbations by using only a small number of Arnoldi iterations.


Introduction
Weather forecasting is a computationally challenging initial value problem. It involves the estimation of the current atmospheric state and running a complex numerical weather prediction (NWP) model. Forecast errors have two major sources: model error and errors in initial conditions. While the former leads to deviations from the true atmospheric behaviour, the latter may amplify quickly at any time during the forecast due to non-linear dynamical processes in the model. These instabilities may not exactly correspond to the sensitivities in the real atmosphere, but it would be quite revealing to understand which fraction of forecast uncertainty can be attributed to the errors in initial conditions.
For the estimation of forecast uncertainty [Leith, 1974] introduced a Monte Carlo ansatz computing not just one but several randomly perturbed forecasts in order to estimate a probability density function. Given the fact that usually only a small number of forecast integrations with a complex model is possible, the strategy of sampling the initial uncertainty in high dimensional phase spaces by random perturbations is quite inefficient. A similar argument holds for perturbations from existing ensemble data assimilation (EDA) schemes. For example, the 4DVar based EDA at ECMWF ([Isaksen et al., 2010]) selects valid model trajectories consistent with sets of perturbed observations, but does not specifically look for fast separating trajectories. Ensemble Kalman filters (EnKF) use various types of covariance inflation techniques, where most are either non-flow-dependent or of random nature. [Hamill and Whitaker, 2011] discuss the constraints on spread growth in forecasts initialized from EnKF. Of course, analysis error covariances are the framework of any useful perturbation strategy. They provide an estimate of the initial uncertainty given the amount and quality of local observations. But the same analysis error covariances could be generated from a large number of different sets of perturbations. EDA schemes do not systematically analyse the spread growth in the analysis cycle, simply because this is not required for successfully running assimilation schemes. However, when generating a forecast, reliable growth of ensemble spread demands a detailed analysis of the model instabilities.
A popular technique to obtain at least some of this information is the empirical breeding method ([Toth, 1993, Kalnay and Toth, 1997]). Bred vectors result from an iterative process, which alternates the evolution of a sphere of finite-amplitude perturbations with the full non-linear equations along a reference trajectory and their regular rescaling. During cycling the sphere of perturbations is distorted along the direction of the fastest growth and must be re-inflated to avoid its degeneration (e.g. [Annan, 2004]). The re-orthogonalisation can be done using the Gram-Schmidt process (e.g. [Benettin et al., 1980]) or singular value decomposition (SVD) (e.g. [Keller et al., 2010, Frolov and Bishop, 2016]). Alternatively, logarithmic breeding ([Primo et al., 2008]) re-inflates the bred vectors by amplifying the small-scale variability in space and time. [Wolf et al., 1985] showed that the long-term evolution of the axes of such a directed sphere can be used to estimate the spectrum of the Lyapunov exponents. Since the re-orthogonalisation does not change the first/largest direction of the sphere (bred vector), the corresponding Lyapunov exponent measures the long-term growth of two nearby trajectories, which stay within a certain subspace of the full phase space, called the system attractor ([Kalnay, 2003]). Therefore, the bred vectors usually do not point into the direction of maximal local short-term growth, which might be caused by a perturbation outside that subspace.
For a more systematic exploration of the phase space the classical singular vector perturbation (SV) technique has been introduced in NWP by [Mureau et al., 1993] and [Molteni et al., 1996]. The linear propagator of the model, the tangent linear model (TLM), is iterated for a specific optimization time and the fastest growing perturbations are given by the leading (right) singular vectors of the propagator matrix. At the European Centre for Medium-Range Weather Forecasts (ECMWF) the singular vector computations are based on a 48 hour time window (see [Leutbecher and Palmer, 2008], [Buizza et al., 2000] and [Magnusson et al., 2008]). They estimate the leading singular vectors by solving an equivalent eigenvalue problem using the Lanczos algorithm ([Lanczos, 1950]), which in addition requires the approximation of the adjoint propagator. [Leutbecher and Lang, 2014] stated that "the raison d'être for singular vector perturbations would vanish, if the EDA, together with the representation of model uncertainties, generated enough variance in the space spanned by the leading singular vectors", but this is still not the case. Computing SVs is indeed a great effort, but no adequate substitute is available yet. For a detailed review of past findings on SVs in atmospheric sciences we refer to [Diaconescu and Laprise, 2012].
Recently, [Frolov and Bishop, 2016] created empirical tangent linear models based on ensembles (ETLM). The main drawback is the rank deficiency of the tangent space spanned by the ensemble perturbations, because the number of ensemble members is generally much smaller than the dynamical dimension. They reduce the rank deficiency problem by localisation assuming that dynamical relations are limited to small finite length scales. This approximation can be justified for data assimilation cycles and short range weather forecasting but does not hold for longer range predictions.
When using the tangent linear approximation in the classical SV computations, the algorithm may miss important non-linear developments. [Gilmore and Smith, 1997] showed that an ensemble based on these perturbations is limited with respect to the quality of its spread-skill relation (see also [Anderson, 1997]). To overcome this problem [Mu, 2000] introduced nonlinear singular vectors (NLSV) and the conditional non-linear optimal perturbations (CNOP) ([Mu et al., 2003]). Both generalise the SV technique, and further research in this field followed ([Mu and Jiang, 2008a, Mu and Jiang, 2008b, Mu and Jiang, 2009]). [Rivière et al., 2008] investigated differences between NLSV and SV as a function of the perturbation amplitude, and [Duan and Huo, 2016] showed that CNOP is superior to SV in the Lorenz96 system (L96). But for NLSV and CNOP a non-linear optimisation problem has to be solved, making these approaches very expensive.
In this paper we introduce another approach for estimating growing perturbations in dynamical systems which is related to the classical SV technique. We approximate the singular vectors of a specific matrix, which consists of non-linearly evolved perturbations of system states, using a block version of the Arnoldi method ([Arnoldi, 1951]). This algorithm is matrix-free and avoids maintaining linearised or adjoint model versions. The Arnoldi algorithm has also been used by [Wei and Frederiksen, 2005] in a study on error growth. But instead of computing singular vectors they aim for the leading finite time normal modes (FTNM) of linear propagators.
We introduce the methodology in section 2 and describe the dynamical systems for tests in section 3. The scores for comparing the growth of different perturbations are given in section 4, the results are presented in section 5, followed by a summary and conclusions (section 6).

Validity of the Arnoldi approximation
For the generation of strongly growing perturbations we make use of the Arnoldi method. In this section we give the basic definitions and describe the assumptions made. Moreover, we discuss the validity of the approximations and provide an estimate of their accuracy. We denote the development of a (continuous) dynamical system by the following function:

ϕ_T : C^n → C^n, x ↦ ϕ_T(x).    (1)

Here, T ∈ R^+ is an arbitrary period of time and x ∈ C^n denotes an initial state of the system. The number of dimensions of the dynamical system is denoted by n ∈ N^+ and the phase space is C^n. Hence, ϕ_T(x) denotes the evolved state of x after a time T. Based on this, we define the evolved increment function (EIF)

f_{T,x_0,h}(v) := ϕ_T(x_0 + h v) − ϕ_T(x_0),    (2)

where h ∈ R^+ is the amplitude of a perturbation in direction v ∈ S^n and the n-sphere is denoted by S^n = {x ∈ C^n, ||x|| = 1}. The EIF contains the information about the growth of an initial perturbation in x_0 after the optimization time interval T. A perturbation of the state x_0 with amplitude h having maximal growth within the optimization interval can be defined as follows:

v̂_{T,x_0,h} := argmax_{v ∈ S^n} ||f_{T,x_0,h}(v)||_N.    (3)

Here, ||·||_N is a norm which measures the perturbation growth. In this paper the Euclidean norm is used for L96 and the Total Energy "Norm" for the SWM (see section 3.1). Equation (3) is known in the literature as the conditional non-linear optimal perturbation (CNOP, [Mu et al., 2003]). We do not aim for a full computation of the CNOP but look for a suitable approximation of growing directions, which allows an efficient estimation of the local expanding subspaces in the model's phase space. We introduce the evolved increment matrix (EIM), which consists of (non-linear) evolved perturbations:

Y_{T,h}(x_0) := (1/h) ( f_{T,x_0,h}(e_1), . . . , f_{T,x_0,h}(e_n) ) ∈ C^{n×n},    (4)

where x_0 ∈ C^n is an (unperturbed) state of the dynamical system and e_i is the i-th unit vector. One could use a linear model for obtaining evolved increments as well, but we want to keep as much as possible of the non-linear dynamics.
An EIM is not a tangent linear propagator of the system, it is rather a "collection" of non-linear (or linear) evolved states, merged in a matrix. By using an EIM, one has the ability to work with Krylov subspace methods instead of non-linear optimization solvers. But at the same time, it is not limited to linear growth.
In particular, if the optimal perturbation v̂_{T,x_0,h} (Equation (3)) is equal to an arbitrary unit vector, the leading singular vector of the corresponding EIM is equal to this optimal perturbation. Otherwise, a linear combination of non-linearly evolved unit vector perturbations can be found. The larger the share of a single unit vector in the linear combination representing v̂_{T,x_0,h}, the smaller the difference between the EIM's first singular vector and the CNOP becomes.
Assuming a sufficiently smooth trajectory, the error depends on the curvature of the trajectories and on the size of h, which can be shown using a Taylor approximation. Hence, for sufficiently small quantities the following approximation holds:

Y_{T,h}(x_0) · v = (1/h) Σ_{i=1}^n v_i f_{T,x_0,h}(e_i) ≈ (1/h) f_{T,x_0,h}(v).    (5)

Here, v_i denotes the i-th component of v.
With this approximation it follows that these matrix-vector products are approximately equal to an EIF. This justifies the changeover from the fully non-linear problem to the EIM in order to enable the use of Krylov subspace methods, such as the Arnoldi algorithm. In addition, the approximation is needed for the upcoming matrix-vector products Y_{T,h}(x_0) · v within the Arnoldi algorithm. The latter is the major reason for introducing the EIM; a recovery of the CNOP is not the main goal here. It is, however, an important side effect that the method can also partially cover non-linear effects.
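The matrix-free application of the EIM can be sketched in a few lines of Python. This is a minimal illustration of approximation (5), assuming `phi_T` is a callable performing one full non-linear forecast over the optimization time T (the function name and interface are ours, not from the paper):

```python
import numpy as np

def eim_matvec(phi_T, x0, v, h):
    """Approximate the EIM matrix-vector product Y_{T,h}(x0) @ v.

    Instead of forming the n x n matrix of evolved unit-vector
    increments, evolve the single combined perturbation h*v with the
    full non-linear model and difference it against the reference
    forecast, as in approximation (5).
    """
    v = np.asarray(v, dtype=float)
    return (phi_T(x0 + h * v) - phi_T(x0)) / h
```

For a linear model the approximation is exact; for a non-linear model the error grows with the curvature of the trajectories and with h.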
To obtain the growing directions in the phase space the leading singular vectors of EIM have to be approximated. In this paper we concentrate on the approximation of the first right singular vector of EIM (EIM-SV).
For the computation of singular triplets (singular values and their singular vectors), the Krylov algorithm of Lanczos is widely used. The Lanczos method is applied to solve the equivalent eigenvalue problem with the matrix Y^*Y, which in addition requires the adjoint Y^*. In contrast to this, we introduce an approach which uses the Arnoldi method without the adjoint of the EIM. Hence, the approximation with the Arnoldi method is rather direct and avoids the detour through the equivalent eigenvalue problem to generate the Krylov subspaces. Adjoint-free (i.e. transpose-free) approaches for singular triplet approximation are rare. [Berry and Varadhan, 1995] discussed a block Arnoldi method for singular value approximation as well, but without providing a theoretical background. We provide a theoretical justification below, in order to legitimate generating perturbations this way.
Applied to some matrix B ∈ C^{n×n} the block Arnoldi method described in the algorithm below starts with a (small) number ℓ ∈ N of orthonormal vectors q_1, . . . , q_ℓ ∈ C^n and computes an extended orthogonal basis matrix Q_r = (q_1, . . . , q_r) ∈ C^{n×r}, r > ℓ, as well as a square matrix H_r ∈ C^{r×r} satisfying

B Q_r = Q_r H_r + G_r E^T,    (6)

where G_r E^T is a residual of low rank ([Saad, 1996]). In detail, E ∈ R^{r×ℓ} contains the last ℓ columns of the identity matrix and G_r ∈ C^{n×ℓ} is orthogonal to Q_r, i.e. Q_r^* G_r = 0. The last property shows that H_r = Q_r^* B Q_r is the projection of B onto the subspace range(Q_r). For ℓ = 1 one obtains the original Arnoldi method.
The basic idea here is to use the singular values of H_r as approximations of those of B. In the ideal case with zero residual, G_r = 0, the identity (6) shows that Q_r is the basis of an invariant subspace of B and the eigenvalues of H_r are also eigenvalues of B. However, this equality no longer holds for the singular values of H_r and B. Hence, the proposed approximation makes sense especially for problems where there is a certain correlation between eigen- and singular values. In order to understand accuracy issues, the difference between these two sets of singular values will be investigated.
Our estimates are based on a well-known error estimate (e.g. [Parlett, 1980]) for Hermitian matrices A ∈ C^{n×n}. It states that if (x̂, λ̂) is an approximate eigen-pair of A having a small residual ||A x̂ − λ̂ x̂||_2 ≤ ε, ε ≥ 0, then there exists an eigenvalue λ of A within this distance to λ̂:

|λ − λ̂| ≤ ε.    (7)

This error estimate will be applied to the hermitian matrix

A = ( 0  B ; B^*  0 ) ∈ C^{2n×2n},    (8)

which has the singular values of B as eigenvalues. The error will depend on parts of the matrix B not contained in the projection Q^* B Q. Therefore, an orthogonal extension W_r ∈ C^{n×(n−r)} of Q_r is required such that (Q_r, W_r) is unitary.
Then, for each i ∈ {1, . . . , r} there exists a singular value σ̂ of B satisfying the following estimate:

|σ_i(H_r) − σ̂| ≤ ||G_r||_2 + ||Q_r^* B W_r||_2,    (9)

where σ_i(H_r), i ∈ {1, . . . , r}, are the singular values of H_r; the corresponding singular vectors of H_r yield approximations to eigenvectors of A from (8).
With the decomposition I = Q_r Q_r^* + W_r W_r^* it holds due to (6) that

B Q_r = Q_r H_r + G_r E^T and B^* Q_r = Q_r H_r^* + W_r (Q_r^* B W_r)^*,

from which (9) follows by applying (7) to approximate eigen-pairs of A built from the singular vectors of H_r. Some discussion is required on the meaning of estimate (9). The residual ||G_r||_2 is available during execution of the Arnoldi algorithm and its size may be used as a stopping criterion for choosing the dimension r of the basis. So, it may be quite small and for the moment we neglect it in order to concentrate on the meaning of the norm ||Q_r^* B W_r||_2. This norm is related to the non-normality of the matrix B. Normal matrices commute with their adjoint, BB^* = B^*B, and the strictly triangular matrix N_B in a Schur decomposition B = U(Λ_B + N_B)U^* vanishes for them. Hence, if BB^* − B^*B or N_B becomes small relative to ||B||_2, the errors become small. A convenient way to describe non-normality is its departure ([Henrici, 1962])

dep_F(B) := ||N_B||_F = ( ||B||_F^2 − Σ_{i=1}^n |λ_i(B)|^2 )^{1/2}.    (10)

Using the departure gives the possibility to derive another, more concrete variant of the estimate. In these discussions the unitarily invariant Frobenius matrix norm ||B||_F^2 := Σ_{i,j=1}^n |b_{ij}|^2 is often used in the literature. Unfortunately, the Frobenius norm may lead to weaker estimates in many cases.
Lemma 1. Let Q ∈ C^{n×r} be orthonormal and W a basis extension such that (Q, W) is unitary. If BQ = QH, then

||Q^* B W||_F ≤ dep_F(B).    (11)

In the basis (Q, W) the matrix B has block triangular form with diagonal blocks H and W^* B W and off-diagonal block Q^* B W, so that ||B||_F^2 = ||H||_F^2 + ||Q^* B W||_F^2 + ||W^* B W||_F^2. Now, since Q defines an invariant subspace of B, the eigenvalues in Λ_B consist of those from the diagonal part Λ_H of H and the diagonal part Λ̃ of W^* B W. Hence, rearranging the terms gives

||Q^* B W||_F^2 = dep_F(B)^2 − dep_F(H)^2 − dep_F(W^* B W)^2,

from which (11) follows.
Hence, with (9) and for G_r = 0 the error estimate for the singular values becomes

|σ_i(H_r) − σ̂| ≤ ||Q_r^* B W_r||_2 ≤ ||Q_r^* B W_r||_F ≤ dep_F(B).    (12)

This shows that the error will be small if B is not too far from normality and/or if the Krylov subspace spanned by Q_r is large enough to collect a sufficient amount of information about B in the projection H_r = Q_r^* B Q_r. In time-dependent partial differential equations the highest-order part is often linear and its discretisation leads to a hermitian or skew-hermitian matrix, depending on the order. Lower-order terms correspond to compact perturbations only. In these cases B may be nearly normal according to estimates due to [Lee, 1995], which use the hermitian part H(B) := ½(B + B^*) and the skew-hermitian part S(B) := (1/(2i))(B − B^*) of the matrix B and bound the departure from normality in terms of the smaller of the two. So, near-normality of B is given if its skew-hermitian or its hermitian part is small compared to B, leading to small errors in singular values. The ratio ||BB^* − B^*B||_2 / ||B||_2^2 is favourable in both systems (< 0.05 in L96 and < 0.1 in SWM). Unfortunately, Formula (11) provides only a weak estimate for both systems. Anyway, computations showed that, e.g., twelve-dimensional subspaces for L96 (with 50 dimensions) achieve a ratio of ||H_r||_2 / ||B||_2 > 0.8. Almost the same holds for 40-dimensional subspaces in the SWM (1587 dimensions).
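Lemma 1 can be checked numerically. The sketch below (our construction, not from the paper) builds a matrix B with an exactly invariant subspace spanned by Q and verifies the Frobenius-norm bound ||Q^*BW||_F ≤ dep_F(B):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 8

# B = U [[H, M], [0, K]] U*, so span(U[:, :r]) is exactly invariant
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(Z)                       # random unitary basis
H = rng.standard_normal((r, r))
M = rng.standard_normal((r, n - r))
K = rng.standard_normal((n - r, n - r))
Tb = np.block([[H, M], [np.zeros((n - r, r)), K]])
B = U @ Tb @ U.conj().T

Q, W = U[:, :r], U[:, r:]
assert np.allclose(B @ Q, Q @ H)             # BQ = QH holds exactly

lhs = np.linalg.norm(Q.conj().T @ B @ W, 'fro')             # ||Q* B W||_F
dep = np.sqrt(np.linalg.norm(B, 'fro') ** 2
              - np.sum(np.abs(np.linalg.eigvals(B)) ** 2))  # dep_F(B)
assert lhs <= dep
```

The slack between the two sides is exactly dep_F(H)^2 + dep_F(W^*BW)^2, which is why the estimate weakens for strongly non-normal diagonal blocks.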

Block Arnoldi Algorithm
The Arnoldi method is matrix-free and does not require knowledge of the whole EIM. As stated above, the ability to approximate matrix-vector products Y_{T,h}(x_0) · w of the EIM with arbitrary vectors is all that is needed. This can be done using approximation (5). Hence, although the EIM is the central matrix for the singular vector approximation, it is never formed explicitly in the algorithm described below.
The Arnoldi method simultaneously generates a sum of Krylov subspaces with an orthonormal basis Q r and also a generalized Hessenberg matrix H r , which is the best representation of B within that subspace.
In this article a block extension of the classical Arnoldi algorithm is used ([Sadkane, 1993]). The block method allows starting the iteration with ℓ ∈ N^+ linearly independent vectors. The block size within the iteration process is implicitly specified by the number of starting vectors, so these two values are identical. The dimension of the generated subspace equals the number of iteration loops times the block size. We call the perturbations generated this way block Arnoldi perturbations (BAP). The procedure to compute BAP is described in Algorithm BAP below.
The block in the i-th row and the j-th column of the block Hessenberg matrix H is denoted by H^(i,j) ∈ C^{ℓ×ℓ}, q_i ∈ C^n is a column vector which denotes the i-th column of the corresponding matrix, and Q = (Q^(1), . . . , Q^(m)) ∈ C^{n×ℓm} contains the orthonormal basis of the generated Krylov subspace.

Algorithm BAP
Input: T ∈ R^+ optimization time, h ∈ R^+ amplitude of perturbation, x_0 ∈ C^n unperturbed state of the system, Q^(1) ∈ C^{n×ℓ} initial vectors merged in a matrix, m ∈ N number of loops within the Arnoldi algorithm, k ≤ ℓm ∈ N number of BAP to be returned.

1: procedure BAP(T, h, x_0, Q^(1), m, k)
2:   Q^(1) ← orth(Q^(1))  ▹ orthonormalization
3:   for j = 1 : m do
4:     for i = 1 : ℓ do
5:       w_i ← (ϕ_T(x_0 + h q_{(j−1)ℓ+i}) − ϕ_T(x_0)) / h  ▹ approximation (5)
6:     end for
7:     W ← (w_1, . . . , w_ℓ)
8:     for i = 1 : j do
9:       H^(i,j) ← (Q^(i))^* W
10:      W ← W − Q^(i) H^(i,j)
11:    end for
12:    if j < m then (Q^(j+1), H^(j+1,j)) ← qr(W)  ▹ QR factorisation extends the basis
13:  end for
14:  (U, Σ, V) ← svd(H_r)
15:  P ← first k columns of Q_r V
16:  return P
17: end procedure

The first r vectors of the basis matrix Q are denoted by Q_r (analogously for H_r and H). These are computed in the algorithm up to line 13 and satisfy Equation (6). The final transformation with H_r = U Σ V^* leads to the approximate singular vector relation

Y_{T,h}(x_0) (Q_r V) ≈ (Q_r U) Σ.

The matrix P ∈ C^{n×k} contains the approximations of the first k right singular vectors of Y_{T,h}(x_0), which are the first k columns of Q_r V. For this paper always k = 1 is used. Hence, BAP refers here to the approximation of the first singular vector only.
This routine allows the generation of initial perturbations with strong (non-linear) growth. Neither an expensive classical optimization solver nor the detour through an equivalent eigenvalue problem is needed, and the approximations are computed without linear or adjoint models of the system.
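A compact NumPy sketch of Algorithm BAP might look as follows; `phi_T` again denotes one full non-linear forecast over the optimization time, and the QR-based block Gram-Schmidt details are one possible realisation, not necessarily the authors' exact implementation:

```python
import numpy as np

def bap(phi_T, x0, h, Q1, m, k=1):
    """Block Arnoldi perturbations (sketch, real arithmetic).

    Q1 : n x l matrix of initial vectors (block size l);
    m  : number of Arnoldi loops, subspace dimension r = m*l.
    Returns approximations of the first k right singular vectors
    of the (never formed) evolved increment matrix.
    """
    ref = phi_T(x0)                                    # reference forecast
    matvec = lambda v: (phi_T(x0 + h * v) - ref) / h   # approximation (5)

    n, l = Q1.shape
    Q0, _ = np.linalg.qr(Q1)              # orthonormalize the start block
    blocks = [Q0]
    Hr = np.zeros((m * l, m * l))
    for j in range(m):
        # apply the EIM to every basis vector of block j
        W = np.column_stack([matvec(blocks[j][:, i]) for i in range(l)])
        # block Gram-Schmidt against all previous blocks
        for i in range(j + 1):
            Hij = blocks[i].T @ W
            Hr[i*l:(i+1)*l, j*l:(j+1)*l] = Hij
            W = W - blocks[i] @ Hij
        if j + 1 < m:
            Qn, R = np.linalg.qr(W)       # QR extends the basis
            Hr[(j+1)*l:(j+2)*l, j*l:(j+1)*l] = R
            blocks.append(Qn)
    Qr = np.hstack(blocks)                # n x r orthonormal basis
    _, _, Vt = np.linalg.svd(Hr)          # SVD of the projected matrix H_r
    return Qr @ Vt[:k].T                  # first k columns of Q_r V
```

For a linear model and a Krylov subspace of full dimension the routine reproduces the exact right singular vectors, which mirrors the full-recovery behaviour reported for L96 below.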
The costs for BAP (in complex systems) are mainly determined by the number of required model integrations: ℓm perturbed forecasts plus one reference forecast over the optimization time, i.e.

cost = (ℓm + 1) · ⌈T/∆t⌉

discrete model steps, where ∆t denotes the discrete time step of the model and ⌈·⌉ denotes the ceiling function. It may be of great advantage that each Arnoldi loop can be divided into ℓ parallel computations.
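Under the reading that each Arnoldi matrix-vector product costs one perturbed forecast and the reference forecast is computed once (our assumption), the integration count can be tallied as:

```python
import math

def bap_model_steps(m, l, T, dt):
    """Discrete model steps for BAP: m*l perturbed forecasts plus one
    reference forecast, each of ceil(T/dt) steps (a sketch)."""
    return (m * l + 1) * math.ceil(T / dt)
```

With the paper's settings T = 0.2 and ∆t = 0.01 every forecast takes 20 steps, and the ℓ forecasts within one loop can run in parallel.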

Starting Arnoldi
The choice of the initial vectors for the block Arnoldi iteration is relevant for a good performance. We test two different strategies of choosing initial vectors: random initialisation and chord vectors as differences between states on the past trajectory (see Figure 1). Random vector starts provide a benchmark to compare with. The choice of chord vectors may be varied by choosing the length of and the distance between two consecutive chords. For the experiments in L96 and SWM we used two discrete time steps from start to end of one chord and skipped 13 steps between two of them. We assume that directions obtained from chord vectors are more closely related to the local dynamics and, because they are derived from past states, the chords represent (almost) balanced directions. For shorter periods of time trajectories may move in a relatively small subspace of the complete phase space. Hence, a span of chord vectors should lead to a richer and balanced subspace which is more relevant for the local dynamics and may provide a better start for the Arnoldi iteration than a random selection of perturbations. Nevertheless, the first chord vector alone will probably not be a good initial choice because it is almost tangent to the trajectory.
Differences between states separated by a relatively long period of time have already been used to generate perturbations in dynamical systems. For example, [Schubert and Suarez, 1989] as well as [Magnusson et al., 2009] used differences between weather patterns from the past as "Random Field" perturbations to initialize ensemble weather forecasts.
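One possible reading of the chord construction (two time steps per chord, 13 skipped steps between consecutive chords, as stated above; the function name and exact indexing are ours) is:

```python
import numpy as np

def chord_vectors(trajectory, l, chord_len=2, skip=13):
    """Build l orthonormalized chord start vectors from past states.

    trajectory : array of shape (steps, n) of past states, newest last
    chord_len  : discrete time steps from start to end of one chord
    skip       : steps skipped between two consecutive chords
    """
    chords = []
    end = trajectory.shape[0] - 1
    for _ in range(l):
        chords.append(trajectory[end] - trajectory[end - chord_len])
        end -= chord_len + skip
    Q, _ = np.linalg.qr(np.column_stack(chords))
    return Q
```

The returned orthonormal matrix can serve directly as the start block Q^(1) of the Arnoldi iteration.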

Dynamical systems
We test the BAP technique for two different dynamical systems: L96 ([Lorenz, 1995]) and a shallow water model (SWM, e.g. [Pedlosky, 1986]) on a two-dimensional domain. L96 is derived from fluid convection and is based on a system of ordinary differential equations with a scalable number of degrees of freedom. It is defined by

dy_i/dt = (y_{i+1} − y_{i−2}) y_{i−1} − y_i + F,

where y_i is the i-th component of y = y(t) ∈ R^K, i ∈ {1, ..., K}, and y_{−1} := y_{K−1}, y_0 := y_K and y_{K+1} := y_1. The number of dimensions is given by K ∈ N, K > 3, and F ∈ R^+ is a constant value, the so-called "Forcing Term". In this paper F = 8 and K = 50 is used, which is similar to the choice Lorenz originally discussed (F = 8, K = 36), and we start the integrations perturbing the fixed point y = (F, . . . , F). Under the described conditions L96 shows chaotic behaviour. For the computation the standard fourth-order Runge-Kutta method is used with a discrete time step of 0.01.

In contrast to L96 the SWM is based on hyperbolic partial differential equations. It describes the evolution of waves under the condition that the horizontal length scales are much larger than the vertical one. The basic form of this model is given by

∂h/∂t + ∂(hu)/∂z_1 + ∂(hv)/∂z_2 = 0,
∂u/∂t + u ∂u/∂z_1 + v ∂u/∂z_2 + g ∂h/∂z_1 = 0,
∂v/∂t + u ∂v/∂z_1 + v ∂v/∂z_2 + g ∂h/∂z_2 = 0.

Here, t ∈ [0, ∞) is the time variable, z = (z_1, z_2)^T ∈ Ω the two-dimensional, horizontal space variable and g = 9.81 is the acceleration due to gravity. The functions h = h(t, z), u = u(t, z), v = v(t, z) describe the height of the fluid above ground and the speed of the fluid in directions z_1 and z_2, respectively. The first equation ensures mass conservation, the other two momentum conservation. The boundary conditions are of Dirichlet type, i.e. u(t, z) = 0, v(t, z) = 0 for all z ∈ ∂Ω, which causes reflectivity at the domain's boundary. In this paper the SWM is solved on the two-dimensional rectangular domain Ω = [0, 22]^2 ⊂ R^2. We use a discrete time step of 0.01 and a mesh size for the domain of 1, which leads to 23^2 = 529 grid points (441 inner plus 88 boundary points).
Together with the three physical variables (height of the fluid, velocity in both horizontal directions) at each grid point, we obtain a dynamical system with 3 · 529 = 1587 degrees of freedom in total. The hyperbolic initial-boundary value problem is solved with the finite difference scheme of Lax-Wendroff ([Lax and Wendroff, 1960]). [Saiduzzaman and Ray, 2013] showed that the Lax-Wendroff scheme is adequate for solving the SWM.
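The L96 system with the settings used here (F = 8, fourth-order Runge-Kutta, ∆t = 0.01) can be sketched as:

```python
import numpy as np

def l96_tendency(y, F=8.0):
    """Lorenz96 right-hand side with cyclic boundary conditions."""
    return (np.roll(y, -1) - np.roll(y, 2)) * np.roll(y, 1) - y + F

def rk4_step(f, y, dt=0.01):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

At the fixed point y = (F, . . . , F) the tendency vanishes, which is why the integrations are started from a perturbation of this state.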

Total energy in SWM
The SWM is a model of a physical system. The chosen norm for perturbation generation should take this into account, and a measurement based on physical properties is more suitable here than the Euclidean norm. The total energy of a state x = (h, u, v) in SWM can be computed as

E(x) = Σ_z [ ½ h(z) (u(z)² + v(z)²) + ½ g h(z)² ],

where the sum runs over all grid points (see [Koh and Wan, 2015]). Obviously, E(x) is not a true norm since it lacks homogeneity. However, the name "Total Energy Norm" can be found quite often. In order to remove the problem that E is not a true norm, transformed states x̃ = (h̃, ũ, ṽ) are introduced:

h̃ = √(g/2) h, ũ = √(h/2) u, ṽ = √(h/2) v.

Then the equality ||x̃||₂² = E(x) holds. With these modifications, the Euclidean norm can be used for energy measurement and also the standard scalar product is available.
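A sketch of the energy measurement, assuming a total energy of the form E = Σ [½h(u² + v²) + ½gh²] over the grid points and a transformation satisfying ||x̃||₂² = E(x) (the exact constants are our reconstruction):

```python
import numpy as np

def total_energy(h, u, v, g=9.81):
    """Total energy summed over all grid points (assumed form)."""
    return np.sum(0.5 * h * (u ** 2 + v ** 2) + 0.5 * g * h ** 2)

def energy_transform(h, u, v, g=9.81):
    """Transformed state (h~, u~, v~) whose squared Euclidean norm
    equals the total energy of the original state."""
    return np.sqrt(g / 2.0) * h, np.sqrt(h / 2.0) * u, np.sqrt(h / 2.0) * v
```

Note that the transformation is state-dependent (the velocity scaling contains h itself), which is precisely why E fails to be homogeneous.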
We also ran the SWM with the Euclidean norm and obtained qualitatively similar results, but do not discuss these here.

Measuring perturbation growth
The comparison of the different perturbation methods with respect to their ability to generate growth is mainly done by computing short-term growth rates within a limited period of time. The logarithms of the growth rates are used, since we work with exponential growth rates (EGR), which are defined as follows:

EGR_PT(t, x_0) = (1/∆t) ln( ||ϕ_{t+∆t}(x_0 + h v) − ϕ_{t+∆t}(x_0)||_N / ||ϕ_t(x_0 + h v) − ϕ_t(x_0)||_N ).

Here, x_0 is an unperturbed state of the system and v ∈ C^n is a perturbation defined by the perturbation technique PT. The discrete time step ∆t is set equal to the discrete model time step of 0.01. The amplitude h (see Equation (2)) of perturbations is h = 0.015 in L96 and h = 0.035 in SWM throughout this paper. The EGR of a specific perturbation technique will vary at different states of the system. Hence, we work with the arithmetic mean of several EGR in order to obtain a mean exponential growth rate (MEGR). The presented MEGR for different perturbation techniques are always computed from a mean of 100 EGR, which are obtained at randomly chosen states along the reference trajectory. In order to ensure that all transients have decayed, at least 1500 time steps were computed before the perturbation experiments start. Besides the MEGR, an integral of the MEGR is used as a reference value for comparing the growth of different techniques. To obtain a single scalar value, we approximate the integral within the optimization time and relate it to the corresponding value of EIM-SV. EIM-SV perturbations are the basis for BAP, thus a comparison of the two is natural and reasonable. This leads us to the definition of the relative exponential growth integral (REGI):

REGI(PT) = ∫₀ᵀ MEGR_PT(t) dt / ∫₀ᵀ MEGR_EIM-SV(t) dt,

where the growth rates are aggregated within the optimization time T = 0.2. Please note that the construction of EIM-SV and BAP perturbations is only based on states which occur at the end of the optimization period. The exact behaviour before and the whole behaviour after this period are not taken into account. In contrast to this, the MEGR and REGI are based on momentary growth rates.
This is beneficial because it provides additional information, but it does not exactly match the way the optimization of perturbation growth is done.
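The momentary growth rates can be sketched as follows; `step` performs one discrete model step and the interface is our assumption:

```python
import numpy as np

def egr_series(step, x0, v, h, n_steps, dt=0.01):
    """Momentary exponential growth rates EGR(t) of a perturbation.

    step : one discrete model step x -> x(t + dt)
    Returns ln(||d(t+dt)|| / ||d(t)||) / dt for n_steps steps, where
    d(t) is the difference between the perturbed and the unperturbed
    trajectory (Euclidean norm used here).
    """
    xr = np.asarray(x0, dtype=float)
    xp = xr + h * np.asarray(v, dtype=float)
    rates = []
    for _ in range(n_steps):
        d0 = np.linalg.norm(xp - xr)
        xr, xp = step(xr), step(xp)
        rates.append(np.log(np.linalg.norm(xp - xr) / d0) / dt)
    return np.array(rates)
```

Averaging such series over many states along the reference trajectory yields the MEGR, and the ratio of time-integrated MEGR curves gives the REGI.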
For the SWM the computational costs of BAP and the full EIM-SV are also compared in relative terms. A detailed comparison of the costs is of less interest for L96 due to its smaller size.

Lorenz 96
In L96 EIM-SV perturbations lead to growth curves with a strong exponential growth in the beginning, which decays relatively fast later on. Figure 2 compares the mean exponential growth rates of BAP with those from the full EIM-SV, where the BAP approximations use different numbers of iterations. Figure 2, (a)-(c) present the BAP experiments with randomly selected initial vectors (BAP-R), while Figure 2, (d)-(f) shows the SV approximations starting from chord vectors (BAP-C). The REGI measurements for both BAP-R and BAP-C can be found in Table 1. The short-term growth rates peak within the optimisation time and converge to a certain level of average error growth for longer lead times. This convergence level is an intrinsic property of the dynamical system and is determined by the largest Lyapunov exponent (see [Lyapunov, 1892, Lyapunov, 1992]).
BAP obtained from one iteration loop (m = 1 in Algorithm BAP) do not involve a (real) Arnoldi step. In this case the algorithm simply generates the best directions within the subspace spanned by the initial vectors. Consequently, in Figure 2 (a) it can be seen that one random perturbation does not show positive growth but a shrinking behaviour in the beginning. Because of its construction, the first chord vector is almost tangential to the trajectory. Hence, perturbations pointing in this direction show almost no growth in the beginning (Figure 2 (d)). This disadvantage is a motivation for using more chord vectors, and it indeed vanishes if this is done (Figure 2, (e), (f)). BAP performance in L96 is very similar for chord and random initialisation (Table 1).
One can see that the EIM-SV perturbations are fully recovered if and only if the size of the Krylov space reaches the size of the system. Krylov spaces of smaller size should cover growing directions above average, but one would expect that at least a small part of the EIM-SV is not covered therein. It is remarkable that large parts of the growth are already covered in relatively small subspaces, as is the case here.
The computational cost advantage of BAP relative to EIM-SV is relatively small for this problem, because Krylov subspace methods are much more efficient in larger systems; the dimension of L96 is still small for Krylov methods. Using subspaces with e.g. 25 dimensions leads to a REGI of 0.85 at a relative effort of 0.5 compared to the full computation. For the full recovery of EIM-SV with BAP, through the usage of Krylov spaces with dimension 50, the same computational effort is required (even a bit more), as is to be expected a priori. The choice of initial vectors is irrelevant for the computational costs, since these are determined by the size of the subspaces only. The L96 results are thus less interesting with respect to computational costs, but in this system one can observe that BAP works in principle. To investigate the behaviour of BAP in larger systems the SWM is considered.

Shallow Water Model
In the SWM the ratio of the Krylov subspace size to the system dimension (1587) can be much smaller. Due to the larger dimension of the SWM, we extended the range for testing up to a Krylov subspace size of 1000.
One should note that the absolute growth rates depend on the chosen norms and intrinsic properties of the used model. Hence, these should not be compared directly.
Analogously to L96, Figure 4 shows the MEGR for the SWM. The growth curves of EIM-SV and BAP are qualitatively similar in L96 and the SWM, including the relatively fast decay of growth rates, the shape of the curves, and the behaviour of single chord or random vectors.
In the SWM, EIM-SV produces damped oscillations of the MEGR after the optimisation period. With increasing block size and a larger number of iterations, BAP can reproduce this oscillating character of EIM-SV. Again, the EIM does not cover the behaviour after the optimisation period, but these oscillations are interesting with respect to intrinsic properties of the SWM.
In fact, in the SWM it is evident that the quality of the BAP approximation is determined not only by the dimension of the Krylov subspace but also by the number and choice of initial vectors.
It can be seen in Figure 4 and Table 2 that the increase in growth rates is in many cases smaller when random initial vectors are chosen; chord vectors often enable a better approximation. Perturbations obtained from, e.g., a 40-dimensional Krylov subspace already suffice to reproduce almost accurately even the oscillating character of the leading EIM-SV in Figure 4. However, the chord choice of initial vectors shows a curious behaviour in the SWM (with the total energy "norm"): sometimes the REGI measurements shrink even though the size of the subspaces increases (Table 2).
As stated in Section 4, this unwanted but noticeable behaviour is possible due to the construction of the perturbations and the definition of the growth scores. Since it does not occur when random initial vectors are chosen, unfavourable properties of the chosen chords in this system may be the cause, but the exact reasons remain unclear.
Nevertheless, chord vectors remain an interesting and promising option for initial vectors here. At the least, the results show that the choice of initial vectors is a relevant part of the method.
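A chord-vector initialisation of the kind discussed above can be sketched as follows. The pairing of states and the normalisation are hypothetical choices made for illustration; the paper does not fix these details here.

```python
import numpy as np

def chord_vectors(trajectory, p, normalise=True):
    """Build p chord initial vectors as differences of past states
    from the reference trajectory (current state minus successively
    earlier states). `trajectory` has shape (n_states, dim), oldest
    state first. Pairing and normalisation are illustrative choices."""
    x_now = trajectory[-1]
    chords = np.stack([x_now - trajectory[-2 - i] for i in range(p)])
    if normalise:
        chords /= np.linalg.norm(chords, axis=1, keepdims=True)
    return chords.T  # shape (dim, p): one initial vector per column

# Tiny example: two chords from a three-state trajectory in 2-D.
traj = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
C = chord_vectors(traj, 2)
```

Note that the first chord (the difference to the most recent past state) is, by construction, close to the local direction of the trajectory, which is exactly the near-tangential property discussed for Figure 2 (d).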
Besides the growth behaviour, Figure 3 shows the spatial structure of an EIM-SV and two approximations thereof. The weight of the EIM-SV (Figure 3, first row) is concentrated at the inner grid points of the top left corner.
Perturbing the corners is reasonable, because this causes interfering waves after reflection at the boundaries; accordingly, we observed that SWM EIM-SV perturbations occur at the four corners of the domain. This structure is recovered quite well by a BAP (started with chord vectors) obtained from a subspace of dimension 1000 (Figure 3, third row). The BAP computed in a subspace of size 40 (Figure 3, second row) also carries relevant weight at the top left corner, although the overall picture is, of course, less clear. However, considering that this subspace covers less than three percent of the whole space, this can be classified as a good result. Here, 4 (20) initial vectors and 10 (50) iteration loops are used for the approximation. The first (second / third) column shows the h̃ (ũ / ṽ) part (see Eq. (19)) of each perturbation, respectively. In contrast to L96, the version of the SWM used here is large enough to gain significant benefits from BAP with respect to computational cost. Table 3 shows the low cost of BAP relative to EIM-SV: for a BAP with block size four and ten iteration loops, the computational cost is only 0.024 relative to the full EIM-SV computation, while the corresponding REGI reaches 0.75.
We point out that the full computation of EIM-SV is very expensive and cannot be compared with the classical SV approximation using a linearised model and the Lanczos algorithm; BAP, however, is several orders of magnitude cheaper than the full EIM-SV computation. We are convinced that BAP has large potential for generating fast growing perturbations in high dimensional dynamical systems, although it is not yet clear exactly how BAP compares with classical SV.

Summary and Conclusions
In this paper we have introduced a new, parallelisable method for generating fast growing perturbations in complex dynamical systems, in particular for estimating uncertainty with ensemble prediction systems (EPS) in numerical weather forecasting.
It is based on the matrix-free approximation of the singular vectors of an evolved increment matrix (EIM) with a block Arnoldi method. This concept differs from the classical approach of computing singular vector perturbations (SV) for ensemble forecasting, implemented e.g. at ECMWF [Leutbecher and Palmer, 2008], which estimates the leading singular vectors of a tangent linear propagator. In contrast to classical SV, the block Arnoldi perturbation (BAP) method requires only short forecasts with the full non-linear model, partly covers non-linear effects, and saves the cost of setting up and maintaining linear and adjoint model versions.
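The adjoint-free ingredient can be illustrated by how a single column action of the EIM might be evaluated: the evolution of a small increment is approximated by the difference of two full non-linear forecasts, so no tangent linear or adjoint code is needed. This is a hedged sketch of one plausible finite-difference realisation; the forecast model and the increment scaling `eps` are placeholders, not the paper's exact formulation.

```python
import numpy as np

def eim_matvec(forecast, x0, v, eps=1e-4):
    """Approximate action of the evolved increment matrix on a
    perturbation v using only two non-linear forecasts from x0.
    `forecast` maps a model state to its forecast state; `eps` is an
    illustrative increment amplitude."""
    return (forecast(x0 + eps * v) - forecast(x0)) / eps

# Toy non-linear "forecast" (componentwise square) to exercise the
# finite-difference action; near x0 it behaves like its linearisation.
x0 = np.array([1.0, 2.0])
v = np.array([1.0, 0.0])
out = eim_matvec(lambda x: x ** 2, x0, v)
```

For a smooth model this difference quotient approaches the tangent linear action as `eps` shrinks, while a finite `eps` retains part of the non-linear response, consistent with the remark that BAP partly covers non-linear effects.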
We observed promising results for the Lorenz96 (L96) system with 50 degrees of freedom and for a shallow water model (SWM) with 1587 degrees of freedom. BAP was tested with two different kinds of initial vectors, namely random vectors and differences of past states from the reference trajectory (chord vectors). In the more complex SWM, the convergence of the block Arnoldi approximation improves, especially for smaller subspaces, when the algorithm is started with chord vectors instead of random initial vectors. The results show that the choice of initial vectors is a relevant part of the method.
We regard the approximation of strongly growing perturbations as an important subgoal in the generation of reliable ensemble prediction systems. Further research is needed on how exactly to build an ensemble prediction system using BAP and how such a system performs for NWP, especially in direct comparison with classical SV.
We are convinced that the BAP approach has the potential to become a relatively cheap and easily applicable method for identifying locally expanding subspaces in complex dynamical systems.