A mathematical framework for inverse wave problems in heterogeneous media

This paper provides a theoretical foundation for some common formulations of inverse problems in wave propagation, based on hyperbolic systems of linear integro-differential equations with bounded and measurable coefficients. The coefficients of these time-dependent partial differential equations respresent parametrically the spatially varying mechanical properties of materials. Rocks, manufactured materials, and other wave propagation environments often exhibit spatial heterogeneity in mechanical properties at a wide variety of scales, and coefficient functions representing these properties must mimic this heterogeneity. We show how to choose domains (classes of nonsmooth coefficient functions) and data definitions (traces of weak solutions) so that optimization formulations of inverse wave problems satisfy some of the prerequisites for application of Newton's method and its relatives. These results follow from the properties of a class of abstract first-order evolution systems, of which various physical wave systems appear as concrete instances. Finite speed of propagation for linear waves with bounded, measurable mechanical parameter fields is one of the by-products of this theory.


Introduction
Inverse problems for waves in heterogeneous materials occur in seismology, ultrasonic nondestructive evaluation and biomedical imaging, some electromagnetic imaging technologies, and elsewhere. The data of such problems are often idealized as traces of Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
wave fields on space-time hypersurfaces (or surfaces of even lower dimension), and nonlinear least-squares (or other data-fitting) methods developed for their numerical solution, that is, for finding the coefficients in the systems of partial differential equations chosen to model the wave physics. Newton's method (or one of its relatives) is a natural choice of solution method for these optimization formulations, as its rapid convergence compensates to some extent for the very large computational size of reasonable discretizations-see for example Epanomeritakis et al (2009).
Natural and man-made materials may be so heterogeneous that few limits can be placed on the spatial regularity of coefficients representing material parameters (for the case of sedimentary rocks, see for instance (Walden and Hosken 1986, Bourbie et al 1987, White et al 1990). However, traces are well defined only for fields possessing some space-time regularity, and optimal convergence of Newton-like methods requires that objective functions and constraints be regular functions of the model parameters, in some sense. Since nonsmoothness of the material parameter (coefficient) fields implies nonsmoothness of dynamical (solution) fields, it is a priori unclear that the required traces exist, or that the fields depend sufficiently smoothly on the material parameter (coefficient) functions to permit Newton-like optimization methods to be applied. While discretized models may a fortiori produce nominally well-defined data simulations and smooth objective (data misfit) functions, accurate approximation of the continuum fields implies that the continuum limit will determine the behavior of solution algorithms for the corresponding discretized inverse problems. Also, numerical solutions to discretized inverse problems respect the underlying continuum physics only insofar as they converge under refinement of the discretization (at least in principle), and such convergence generally requires that the continuum problems have well-behaved solutions.
The theory presented here provides a mathematical foundation for some common idealizations of inverse problems in wave propagation and approaches to their solution, for material models of bounded and measurable dependence on space variables. Measurability means roughly the existence of averages over arbitrary volumes, which seems a reasonable requirement for physical parameter fields. Most such fields are bounded as a matter of principle (positivity of density) or observation (p-wave velocity range in metals, . . . ).
The dynamical laws considered in this paper may be cast as symmetric hyperbolic systems, either differential or integro-differential, for vector-valued fields u defined on a domain ⊂ R d (typically d = 1, 2 or 3). Denoting the dimension of the dynamical state vector u by k ∈ N, the systems treated in this paper take the form a ∂u ∂t + p(∇ )u + bu + q * u = f in × R; u = 0 for t < 0, in which a, b are k × k matrix-valued functions on , q is a k × k matrix-valued function on × R with q(x, t ) = 0 for t < 0, and p(∇ ) is a k × k matrix of constant-coefficient first-order differential operators in the space variables: Such a system is symmetric hyperbolic if a(x), q(x, t ) and p j are symmetric for all x ∈ , t ∈ R, respectively j = 1, . . . , d, and a(x) is uniformly positive-definite: C * , C * ∈ R exist so that 0 < C * C * , and C * I a(x) C * I, a.e. x ∈ in which I is the d × d identity matrix, and the inequality sign signifies that the matrix on the right differs from the matrix on the left by a positive definite symmetric matrix.
This paper has three central objectives. First, we describe constraints on data (coefficients a, b, p, q and right-hand side f ) and solution under which (1) is well posed, even when the coefficient functions a, b, q are permitted to be quite irregular in their dependence on x ∈in fact, merely bounded and measurable. Second, we determine a sense in which the solution of (1) is regular (C k , k 0) as a function of its coefficients. Third, we characterize conditions under which traces of solutions on time-like hypersurfaces are well defined, and regular as functions of the coefficients.
The complete description of the system (1) requires the specification of domains and ranges for all of the operators appearing in it. We will require that these be chosen so that the expression p(∇ ) defines a skew-adjoint operator.
We will describe two important examples (acoustics and viscoelasticity) in detail below, as symmetric hyperbolic problems of the form (1). The coefficient matrices a, b and q encapsulate spatially varying relations between space and time rates of change of the dynamical state u. A particularly important example, a common component of all linear continuum mechanical systems, is the linearized Newton's law, in which the product of the material density and the time rate of change of velocity balances the sum of internal and external forces acting on the material. The internal forces are expressed as the divergence of a stress tensor. The material density may vary with position and appears somewhere in a, the velocity and stress fields form part of u, and the (spatially invariant) divergence is part of p. These relations may be expressed in algebraically equivalent form via other constitutive laws governing particular examples, but are always present in some form. The time convolution kernel q expresses dissipative and dispersive mechanisms built into the mechanical laws. These mechanisms may originate in upscaling of microscopic physics (for example, poroelasticity), or express generic dissipative mechanisms (viscoelasticity), or be absent (elasticity, acoustics), in particular examples of (1). The right-hand side vector f is a k-vector valued function on × R, vanishing for t < 0 and representing energy input to the system, in the form of external forces, defects in the constitutive laws, and so on.
As in the similar study of Stolk (2000) for second-order hyperbolic systems, we represent (1) as an instance of a class of abstract autonomous integro-differential systems. We study these abstract systems within a framework modeled after the duality approach introduced by Lions and his collaborators in the 1960s (Lions 1971, Lions andMagenes 1972). A natural notion of weak solution is central to this approach, and complements that of strong (pointwise) solution. The class of first-order systems studied here possesses a remarkable property: smoothing solutions in time converts weak solutions to strong (pointwise in t) solutions. Specialized to systems such as acoustics or elastodynamics, this property implies that appropriate traces of solutions are well defined, provided that the right-hand-side or source term is minimally smooth in time (only!). Existence, uniqueness, and regularity of the solution as function of problem data-both right-hand side and (operator) coefficients-also follows from this fact.
The abstract theory accomplishes even more than that, as it applies to a much wider range of dynamics than those commonly occurring in continuum mechanics, accommodating firstorder systems with operator coefficients. This added generality creates no additional difficulty for the basic theory. It is in fact very useful: as will be explained in the Discussion section, it justifies certain infeasible-model methods for inverse problems in wave propagation.
Another useful by-product of the theory is the finite speed of propagation property for hyperbolic systems with bounded, measurable coefficients, a result which so far as we can tell is new. This follows from the similar property for systems with smooth coefficients via the continuous dependence of solutions on the coefficients in the sense of convergence in measure.
The next section of this paper states and discusses the main results to be established for symmetric hyperbolic systems (1). In the third section, we show how acoustics may be construed as an example of our framework: we formulate the physics of linear acoustics in the form of a symmetric hyperbolic system, and show how to define the operators involved so that our main results apply to this example, in particular so that the differential operator p(∇ ) is skew-adjoint. In the following section, we do the same for viscoelastodynamics. Following these extended examples, the fifth section of this paper introduces a class of abstract evolution systems encompassing all of the concrete systems mentioned earlier, and lists the main results to be established concerning it. Having explained our main results and some important examples to which they apply, we devote the sixth section to a discussion of the related literature and various implications for inverse problems in wave propagation, including the importance of the general (operator coefficient) case of the abstract theory in formulating certain inversion algorithms, representation of energy sources, and several other matters not addressed in the remainder of this paper. The succeeding sections develop proofs of the main results for the abstract evolution class described in the fifth section. The final section of this paper specializes these abstract results to provide proofs of the theorems stated in the next section. Two appendices discuss skew-adjointness of the acoustic grad-div operator, and the case of systems with no memory term (q = 0 in (1)), for which initial value problems make sense.

Main results for symmetric hyperbolic systems
Our results hinge on an assumption concerning the skew-symmetric differential operator p(∇ ), which to begin with may be viewed as a densely-defined operator on L 2 ( ) k , whose domain is a subset of C ∞ ( int ) k that contains C ∞ 0 ( int ) k . We assume that p(∇ ) extends to a skew-adjoint operator P with dense domain V ⊂ L 2 ( ) k , and that V is metrized with the graph norm of p(∇ ) or its equivalent, which we denote by · V .
The first of our two major sets of results on symmetric hyperbolic systems addresses well posedness.
Theorem 1. Suppose that, in addition to the hypotheses explained above, the right-hand side f has a square-integrable t-derivative: f ∈ H 1 loc (R, L 2 ( ) k ), and is causal, that is, f (t ) = 0 for t < 0. Then there exists a unique causal u ∈ C 1 (R, L 2 ( ) k ) ∩ C 0 (R, V ) satisfying (1) a.e. in for each t ∈ R. Moreover, there exists an increasing function C : R + → R + , depending on C * , C * , and bounds for b, q, so that for every t ∈ R, The second main result addresses the existence of traces and their regular dependence on the coefficients. Suppose that ⊂ is a smoothly embedded m-dimensional submanifold, , for any T > 0. Abusing notation, call this induced map S[m] as well.
Having parametrized the various possible trace maps, we define the data prediction, or forward, operator, F f ,m , by Theorem 2. In addition to the hypotheses of theorem 1, suppose that S[m] extends continuously to V . Then for any T > 0, A stronger result concerning continuity per se, involving a weaker topology on the coefficient set M, is also important. Recall that a sequence of real-valued measurable functions {g m : m ∈ N} ⊂ L ∞ ( ) converges to zero in measure iff for any > 0, Denote by u m the (strong) solution of (1) provided by theorem 1, with coefficients a m , b m , and q m , likewise by u the solution with coefficients a, b, and q, with common causal right-hand side f ∈ H 1 loc (R, L 2 ( ) k ). Then for any T > 0, lim m→∞ u m − u L ∞ ([0,T ],L 2 ( )) = 0.
One way to generate sequences converging in measure is by mollification. We exploit this observation to use theorem 3 in conjunction with a well-known fact about symmetric hyperbolic systems with smooth coefficients, to prove Theorem 4. Suppose that (a, b, q) ∈ M, > 0, and that τ ∈ R + satisfies for almost every x ∈ and every ξ ∈ R d for which |ξ | = 1. Suppose ω ⊂ , T > 0, and that Remark. Note that theorem 2 expresses 'differentiability with a loss of one derivative', in two senses. First, taking k = 2, even though f ∈ H 2 loc (R, L 2 ( ) k ), the map F f ,m is only once differentiable. Second, even though F f ,m a priori takes values in C 1 ([0, T ], L 2 ( ) l ), it is of class C 1 only as a map with range C 0 ([0, T ], L 2 ( ) l ).
This 'loss of derivative' phenomenon is a fundamental feature of the coefficient-to-solution map for symmetric hyperbolic PDEs, and accounts for much of the ornery nature of the related inverse problems. A simple explicit example illustrates the second sense of derivative loss, and shows that the hypotheses of theorems 1 and 2 cannot be significantly weakened.
The linear advection equation 1 c for waves moving to the left with speed c > 0, conforms to the requirements of theorem 1, with = R, with the choices and A = multiplication by 1/c. A proof that p(∇ ) is self-adjoint with domain V follows from use of lemma 6 from appendix A, together with the Fourier transform characterization of H 1 (R).
The trace manifold in this example is = {x = 0}, and k = l = 1; the map m is simply the identity, and is suppressed from the notation. Thus traces take values in L 2 ( ) = R.
The causal solution of (6) is formally That is, this expression gives the solution guaranteed by theorem 1 when f ∈ C ∞ 0 (R 2 ). It is easy to check that for causal f ∈ L 2 loc (R 2 ), the integral defines a locally squareintegrable distribution with well-defined trace on x = 0-in fact, a weak solution of (6) in the sense explained in the section 6. Weak solutions do not generally possess well-defined traces; the reasons for this exception are mentioned briefly below, and at more length in the Discussion section. In this particular example, the trace on x = 0 defines a continuous map : (H being the Heaviside function), one finds that for c = 1, (m is suppressed, as it is simply the identity in this case). Thus with f ∈ L 2 loc (R, L 2 (R)), we cannot conclude that F f takes values in C 0 (R, R).
For c > 1, as c → 1 in the mean-square sense, but not in the sense of C 0 (R, R). Thus the condition f ∈ H 1 loc (R, L 2 (R)) cannot be replaced by f ∈ L 2 loc (R, L 2 (R)) in theorem 2, part (i).
To illuminate the connection between right-hand side regularity and regularity of F (theorem 2, part (ii)), select w ∈ L 2 loc (R) and multiply by the characteristic function of an interval [a, b], 0 < a < b, to generate a right-hand side: [a,b] .
A brief computation shows that The Lebesgue differentiation theorem implies that for any T > 0, F g : R+ → C 0 ([0, T ], R) is continuous, but not differentiable: indeed, if w is discontinuous, then the Newton quotient converges point-wise a.e. to a discontinuous limit.
If w ∈ H 1 loc (R) is causal, then F g is well defined as a map R + → C 1 (R, R). The formal computation of the derivative, actually expresses the derivative of F g : R + → C 0 (R, R), but F g : R + → C 1 (R, R) is not differentiable-if it were, the value of the derivative would necessarily be the same distribution as with respect to the weaker metric, but under the assumption on w one can only conclude that DF g [c]δc ∈ C 0 (R, R).
In general, if w ∈ H s loc (R) is causal, then F g is well defined and continuous R + → C s (R, R) but differentiable only as a map R + → C s−1 (R, R). This example exhibits the property 'differentiable with loss of derivative' in the second sense mentioned after the statement of theorem 2.
This example does not illustrate first type of derivative loss described in the theorem, that is, the further drop of regularity in t required in general to define the trace. The reason for this is related to the existence of traces even for merely square-integrable right-hand sides, flowing in turn from the propagation of singularity principle for the system (6) (and indeed solution via the method of characteristics, an even more special property). Propagation of singularities along bicharacteristics holds for symmetric or strictly hyperbolic systems with smooth coefficients (see for example Taylor (1981)), and to some limited extent for systems with less regular coefficients (Beals and Reed 1982, 1984, Symes 1986, Lewis and Symes 1991, Bao and Symes 1996. Anomalous trace regularity follows for such systems: the system (6) is a particularly simple example. Some other consequences of this phenomenon will be mentioned in the Discussion section.

Example: acoustics
Linear acoustics provides an important example of the theoretical framework described in the last section. Acoustic wave propagation does not include the memory effect modeled by the convolutional term in (1), but illustrates several other features of the class of problems studied in this paper.
For this example, d = 3, and we suppose that the domain ⊂ R 3 is smoothly bounded (the smoothness requirement can be relaxed to some extent, as discussed briefly in appendix A). The momentum balance and constitutive laws of linear acoustics relate the excess pressure p(t, x) and velocity fluctuations v(t, x) = (v 1 (x, t ), v 2 (x, t ), v 3 (x, t )) T , x ∈ R 3 , to mass density ρ(x), bulk modulus κ(x), constitutive law defect g(t, x), and body force density f(t, x) by In this example, k = 4, and Densities and bulk moduli of sedimentary rocks in the upper crust, for example, lie between definite bounds. To express these natural bound conditions in a sufficiently flexible manner for present purposes, choose 0 < C * C * , and define M ac (C * , C * ) = {(κ, ρ) ∈ L ∞ ( ) 2 : C * 1/κ, ρ C * a.e. in }.
The notation is chosen so that if (κ, ρ) ∈ M ac (C * , C * ), the matrix a satisfies the bounds (3).

Remark.
It is assumed that the fields appearing in equations (7), particularly κ and ρ, have been rendered dimensionless by appropriate scaling. Similar assumptions will be made for other physical systems mentioned in this paper, without comment.
The right-hand side of (7) involves the matrix partial differential operator This matrix operator defines a skew-adjoint operator with domain For the reader's convenience, we present a proof that p(∇ ) : V → L 2 ( ) 4 is skew-adjoint, in appendix A.) Finally, the source vector f is defined by With these conventions, the acoustics system (7) is symmetric hyperbolic and satisfies the conditions of theorem 1. Note that if κ and/or ρ are discontinuous, then the form of the equations (7) immediately implies that no solutions of class C 1 may exist, even if the right-hand side f (that is, the body force density f) is smooth. Physically reasonable fluid configurations thus exist for which solutions in the classical sense cannot be defined, for example piecewise homogeneous mixtures with jump discontinuities of density and/or bulk modulus across smooth interfaces. However, theorem 1 provides a solution with physical sense, as the strain energy is well defined for any such solution, and even of class C 1 in t. The equivalence of the quadratic form defining E and the (square of the) norm in L 2 ( ) 4 follows from the assumption that (κ, ρ) ∈ M ac (C * , C * ), as already noted. Theorem 1 gives u ∈ C 0 (R, V ), provided that f ∈ H 1 loc (R, L 2 ( ) 4 ). To obtain differentiable dependence on the coefficients, it is necessary to sacrifice one more time derivative (theorem 2). This sacrifice seems minor, as modeled energy sources (represented by the right-hand side in (1)) are in many cases quite smooth in time, reflecting the finite bandwidth of energy generation and recording equipment and the loss of high frequencies to dissipative mechanisms during propagation. Membership in V = H 1 0 ( ) × H 1 div ( ) implies that some (but not all!) traces of u are well defined. In the notation of theorem 2, given a smoothly embedded hypersurface , choose l = 1 and m = m D or m = m N , where and n = (n 1 , n 2 , n 3 ) T is a smooth normal field on . Both S[m D ] and S[m N ] extend continuously to V . Note that traces of tangential components of v do not so extend. The end result is that both F f ,m D (pressure data) or F f ,m N (normal velocity data) are well defined and differentiable as functions of the coefficients, of class C s−1 if f ∈ H s loc (R, L 2 ( ) 4 ) (theorem 2). The domain of either version of F in this case is the open set M ac ⊂ L ∞ ( ) 2 , defined by The relation (5) in theorem 4 states that τ is larger than the largest generalized eigenvalue of the matrix p i ξ i with 'mass' matrix a, at a.e. x ∈ . There are two nonzero generalized eigenvalues for each x ∈ , namely ± √ κ(x)/ρ(x), as is easily verified. Thus under the conditions of theorem 4, the solution u vanishes at in ω up to time T if at every intermediate time t, the support of the right-hand side lies at least distance τ ( e. x ∈ , this conclusion is just what one would expect: waves move with speed at most

Example: viscoelasticity
Viscoelasticity provides an example of the symmetric hyperbolic dynamics with dissipation and frequency dispersion of wave energy, unlike acoustics.
in which v is the particle velocity field, σ the stress tensor, f a body force density, ρ the mass density, and S the inverse Hooke operator, or compliance tensor (Christensen 1983, Pipkin 1986). Viscoelasticity generalizes elasticity in that the inverse Hooke operator is a causal convolution operator (in time), rather than a temporally local multiplication operator. Of course, elasticity is a special case of viscoelasticity, in which S is proportional to δ(t ).
We will assume instantaneous elastic response: a nonzero strain rate arises immediately from a stress impulse. Under this assumption, the kernel S can be decomposed as in which S e is the elastic inverse Hooke tensor (inverse of the unrelaxed modulus), and s is a causal kernel. Both the elastic kernel S e and the memory kernel s(t ) act as spatiallyvariable, symmetry-preserving linear operators on symmetric tensor fields. The conventional representation of such things by 4-index tensors, and similarly for s. To avoid technical complications, assume that the viscoelastic material occupies all of R 3 . We require that, for some 0 < g * g * , (1) S e is elliptic: for any symmetric σ ∈ R 3×3 , For the 'state space' H of the viscoelastic system we choose The assumptions 1-3 above and the symmetries (10) imply that to be the subspace of L 2 ( , R 3×3 symm ) consisting of square-integrable symmetric matrix valued functions, each column of which has a square-integrable divergence; An argument similar to that explained in appendix A shows that p(∇ ) has a self-adjoint extension P : V → H, and that the natural norm on V is equivalent to the graph norm of P. Let With these definitions, the system (9) is formally equivalent to the evolution problem (13). The theory developed here thus assures the existence of weak solutions of (9), in material models including discontinuities of densities and/or elastic moduli and/or relaxation moduli.
An immediate consequence of theorem 4 is Corollary 1. Denote by c p the maximum quasi-p-wave velocity of the viscoelastic system (9), defined as for every (x, t ) ∈ supp f. Then the causal weak solution (σ, v) of (9) vanishes in a neighborhood of (x 0 , t 0 ).

A class of abstract first order evolution equations and their properties
Much of the theory developed in this paper applies equally well to a type of abstract evolution equation, of which (1) and its specializations are concrete realizations. Both to clarify the logical structure of the arguments, and because these systems have intrinsic interest, we formulate the basic facts about symmetric hyperbolic systems in terms of this abstraction. Let H be a separable real Hilbert space, with inner product ·, · and norm · . We will denote by V ⊂ H a dense subspace, itself a separable Hilbert space with inner product ·, · V and norm · V , defining a stronger topology on V than that induced by H. B(H ) is the Banach space of bounded linear operators on H, with the operator (uniform) norm. B symm (H ) is the subspace of bounded self-adjoint operators, and B + symm (H ) is the cone of bounded self-adjoint coercive operators. We suppose that The kernel Q defines a continuous linear operator R : The components described above combine to yield the formal evolution problem: find an H-valued function of t, say u, which solves, in a suitable sense, in which the right-hand side f is also an H-valued function of t. We follow (Lions 1971, Lions and Magenes 1972, Stolk 2000 in defining weak solutions u ∈ L 2 loc (R, H ) of the formal evolution problem (13) by integration against smooth test functions. Inspection of (13) suggests that a similar constraint must also be placed on the right-hand side: henceforth, we assume f ∈ L 2 loc (R, H ). Because the operator kernel Q may have unbounded support, we must constrain the growth of candidate members of L 2 loc (R, H ) on the negative half-axis. Accordingly, a weak solution of the formal evolution problem (13) is Note that since R * [φ] is supported in the half-axis (−∞, sup supp φ], and is squareintegrable, assumption 1 implies that the last term on the left-hand side of (14) is well defined.
We will reserve the term strong solution to designate a function u ∈ (13) pointwise. Clearly a strong solution is a weak solution. Existence of a strong solution in this sense also implies that f ∈ C 0 (R, H ).
Because of the causal assumption on the convolution kernel Q, existence of a causal weak solution vanishing for t < T 0 implies that the right-hand side f must also be causal, in fact vanish for t < T 0 .
Concerning the fundamental questions of uniqueness and existence, we prove Theorem 5. Assume that H, V , A, B, P and Q have the properties listed above, and that Regularity in time of the right-hand side implies that the weak solution is strong: Theorem 6. In addition to the assumptions of theorem 5, suppose that f ∈ L 2 Regularity of weak solutions u as functions of the coefficients A, B, Q is expressed through estimates which are uniform over certain sets of problems (that is, certain sets of coefficients), which are important enough to merit a definition: Strong convergence of the coefficient operators, in an appropriate sense, leads to uniform convergence of the corresponding weak solutions on compact sets in R. For simplicity, and because all of the examples we have in mind satisfy this restriction, we assume that all of the systems appearing the formulation of this continuity result share the same 'spatial' operator P. It is possible to weaken this assumption, that is, to approximate P as well; we leave the formulation and proof of this (slightly) stronger result to the reader. H ). Let u m , respectively u, be causal weak solutions of the differential equation (13) With additional regularity of the right-hand side, solutions of (13) have directional (Gâteaux) derivatives as functions of the coefficient operators: the causal strong solution of (13) with these choices of coefficients and right-hand side. Assume that δA, δB in which R (δR) is the convolution operator with kernel Q (δQ), as usual. Then for any choice With the correct choice of topology, the solution is (Fréchet) differentiable as a function of the coefficients, assuming as in theorem 8 that the right-hand side is somewhat regular in t. To express this fact in a form most useful for application to inverse problems, introduce another Hilbert space W (of 'measurements'), and a 'sampling' map S : V → W , assumed continuous, Of course S induces a continuous map : C k (R, V ) → C k (R, W ) for any k ∈ N; we abuse notation by writing S for this induced map also.
is the causal (strong) solution of (13) with coefficients A, P, B, Q and right-hand side f . It follows from theorem 6 that F f is well defined.

Discussion
The theory developed in this paper established a number of properties of the relation between coefficients and solutions of symmetric hyperbolic integro-differential systems, such as continuity, differentiability, finite domain of influence, and so on. These properties are assumed (usually without comment) in the applied literature on inverse problems for wave propagation in highly heterogeneous materials. The minimal hypotheses required by our theory allow for a degree of irregularity in the spatial dependence of coefficients which seems adequate to model the heterogeneity observed in real materials. The methods of proof date to the middle of the last century, yet so far as we are aware these results have not previously been explicitly formulated or collected in one place.
It remains to place our work in the context of related literature, and describe consequences for the solutions of inverse problems. This section begins with a review of related work on hyperbolic systems with non-smooth coefficients. Following this review, we discuss several by-products of the theory with implications for inverse problems in wave propagation: systems with operator coefficients, recently arisen as a key element of convexifying relaxations for seismic waveform inversion; computing descent directions for functions on non-reflexive Banach spaces; trace regularity under additional constraints and implications for the definition of 'Dirichlet-to-Neumann' maps; and various other possible extensions of the theory and open questions.

Prior art
We begin with a brief overview of prior work on the solution of (1). The well posedness of symmetric hyperbolic systems with regular (smooth) coefficients has been well-understood for decades (Courant andHilbert 1962, Lax 2006). Some effort has also been devoted to describing solutions for first-or second-order hyperbolic systems with less-than-smooth (but still continuous) coefficients, mostly focused on the propagation of regularity along bicharacteristics, or the existence of traces (for example, Beals and Reed (1982), Symes (1983), Beals and Reed (1984), Lasiecka (1986), Lasiecka and Trigianni (1989), Symes (1991), ,1993), Smith (1998). Those works treating the continuity or differentiability of the solution as a function of the coefficients have mostly required more smoothness of the coefficients than is allowed in this work, or dealt only with one-dimensional problems (Bamberger et al 1979, Symes 1986, Lewis and Symes 1991, Bao and Symes 1996, Salo 2006, Stefanof and Uhlmann 2009). An exception is Fernandez-Berdaguer et al (1996), in which (results which imply) a special case of theorem 2 is established.
A large number of articles have appeared on abstract first-or second-order hyperbolic equations and related inverse problems, for example Lavrentiev et al (1986), Choulli (1991), Lorenzi and Ramm (2001), Ramm and Koshkin (2001), Awawdeh (2010), Orlovsky et al (2010), mostly using the theory of strongly continuous semigroups (or cosine operators, in the second order case) to obtain a hold on well posedness. Of these, the closest in spirit to this work is that of Orlovsky et al (2010), which treats second order systems via the cosine operator approach but observes that smoothness in time of data implies that weak solutions are strong solutions, analogous to a crucial intermediate result in our work. Choulli (1991) established that a first-order abstract integro-differential initial value problem, resembling those of our abstract framework to be detailed in the next section, is well posed, and also proved uniqueness for the solution of a related inverse problem. The form of the integral term is actually a generalization of our convolutional memory operator, and the requirement on the 'spatial' operator is also a generalization (generates a strongly continuous semigroup-in the problem considered here, this operator is skew-adjoint, and generates a unitary group). However Choulli formulates an inverse problem which involves only determining lower-order terms, rather than the principal part, the inverse problem data does not involve time-like traces, and Choulli does not investigate the regularity of any analogue of F.
The immediate predecessor of our work is chapter 2 of the second-named author's PhD thesis (Stolk 2000), which treated second-order hyperbolic systems with nonsmooth coefficients, including the displacement formulation of elastodynamics, beginning with the techniques of Lions and his collaborators (Lions 1971, Lions andMagenes 1972) but going beyond well posedness to study the dependence of solutions on coefficients. The results on strong convergence and Gâteaux differentiability (theorems 7 and 8) are direct translations of arguments from this source, as is the observation that convergence in measure of L ∞ multipliers implies strong convergence of the corresponding multiplication operators on L 2 (lemma 4).

Descent directions
We have shown how to formulate the 'forward map' (F f ,m in the notation of theorem 2) as a continuously differentiable map on an open subset of a suitable Banach space, with values in a (subset of a) suitable Hilbert space. Thus least squares objective functions for such problems are continuously differentiable in a well-defined sense. However, the domain metric specified by the theory is some version of L ∞ , in the case of concrete problems of form (1), or the operator norm in the case of abstract problems (theorem 9). Thus the ambient Banach space is in all cases nonreflexive, and does not have the properties required for a sensible definition of gradient, as explained for instance by Kaltenbacher et al (2008). Thus differentiability does not necessarily bring with it natural access to descent directions.
Additional information about the derivative is available, however, via the well-known adjoint state method. For brevity, we explain this construction in the case of the abstract evolution problem (13) in the differential case and without lower-order term, that is, B, R = 0, and we proceed formally. With notation as in the statement of theorem 9, we define , u(t ) = 0, t < 0 guaranteed by the theory. If k 2, then F f ,m is of class C 1 , and its derivative is given by Suppose that we are provided data d ∈ L 2 ([0, T ], W ). The adjoint state method (Plessix 2006) represents the derivative of the least-squares function Then for any δA ∈ B symm (H ), It is possible to extract a gradient, that is, a direction of fastest ascent for J f ,m , from this representation of the derivative. Write (for u, v ∈ H) u ⊗ v for the rank 1 member of B(H ) defined by u ⊗ v(w) = v, w u. Such rank-1 operators are very special instances of operators of trace class. We refer to Conway (1990), pp 267-8 and 275 for the properties of this subspace of B(H ) cited here, in particular these: the product of a bounded operator and a trace-class operator is of trace class; a real-valued linear trace function tr is defined on the trace class, generalizing the trace of matrices. The adjoint-state expression (22) is equivalent to well defined since the integral inside the parenthesis defines a trace-class operator. This trace of the product defines a duality pairing, expressing the trace class as the 'predual' of B(H ).
Assuming that f , d are sufficiently smooth in t, the integrand in (23) is well-approximated by quadrature: Linear functionals on B(H ) of the form given by the right-hand side of (24) form precisely the subset of the dual B(H ) * consisting of functionals continuous in the weak operator topology.
Since the unit ball of B(H ) is compact with respect to weak convergence, the functional defined by the right-hand side of (24) has a maximizer over the ball, which is a candidate for an approximate gradient of J f ,m at A. This observation remains correct if A and δA are restricted to subspaces of B(H ), for example L ∞ matrix-valued functions as in the definition of symmetric hyperbolic systems (1).
The program outlined in the preceding paragraphs leaves a number of details to be filled in. For example, actually finding the maximizer of the linear functional in (24) is in some sense a finite dimensional problem, but not a particularly simple one. Also, the precise relation between right-hand sides of (23) and (24) remains to be established. Finally, the adjoint state evolution problem (21) involves a right-hand side (S[m] * (d − F m, f [A]) ∈ L 2 loc (R, V * ) outside of the class for which solutions, strong or weak, have been shown to exist. We will address this latter problem in the next subsection.
We also note that the gradient, defined as a direction of fastest ascent, is not smooth, or even necessarily a function, even under hypotheses which guarantee that J f ,m has several derivatives. This observation leads to various developments in convex optimization, beyond the scope of this paper. We note merely that Bamberger et al (1979), a landmark early paper on inverse problems in wave propagation, addressed this point explicitly.

Sharpness of trace regularity and source definition
Theorem 2 gives sufficient conditions for the trace of a solution of the symmetric hyperbolic system to be well defined and depend smoothly on the coefficients and right-hand side in (1). However these conditions are only sufficient, not necessary. Under some conditions, the trace of even a weak solution may be well defined and depend continuously on the coefficients and right-hand side. We saw an example of such anomalously well-behaved traces already in the discussion of the advection equation (6) in section 2, however the phenomenon is much more general.
For example, suppose that the coefficients of (1) are smooth in a neighborhood ω of the boundary ∂ and containing the measurement surface , that the right-hand side is supported outside of this neighborhood, and that no rays of geometric optics originating outside this neighborhood are tangent to the boundary. Then microlocal propagation of regularity shows that the map Symes (1983), Lasiecka (1986).
By duality, sources supported on also give rise to well-defined solutions with estimates similar to those proven in the body of this paper. Combining these two observations leads to definition of variants of the Dirichlet-to-Neumann map (Sylvester and Uhlmann 1990) which forms the natural abstraction of 'data' for many inverse problems. These mappings, from suitable (source) data supported on to other data supported on (or on another similar surface), thus inherit a definition and regularity properties for minimally regular coefficient classes as described above.

Nonphysical extension via operator coefficients
Perhaps surprisingly, the results on systems with operator coefficients (theorems 5-9) seem likely to be useful in themselves. This utility originates in the resistance of inverse problems for hyperbolic systems to the least-squares (or least-error) approaches that have successfully treated many other types of inverse problems in science and engineering. Natural objective functions change rapidly in some directions, not in others, and appear to possess many stationary points far from any useful model estimate, a nearly fatal feature for variants of Newton's method, often the only feasible approach to model estimation (Gauthier et al 1986, Santosa andSymes 1989).
The last-named author has suggested that use of operator coefficients as an extension of normal continuum physics could conceivably convexify the normal least-squares objective (Symes 2008). Far from an arbitrary introduction of additional degrees of freedom, this reformulation of the inverse problem is closely related to well-tested ideas in seismic data processing, and has already a partial theoretical justification (Stolk et al 2009, Shen andSymes 2008). Very recently, implementations of extended least squares inversion using operator coefficients have produced very promising early results (Biondi and Almomin 2012) which tend to support the convexification hypothesis.

And so on . . .
The fundamental model problem (1) does not include static differential constraints, so the results established above do not apply directly to Maxwell's equations, for example. We suspect that the theory can be extended to accommodate such problems.
Imposition of additional regularity requirements on coefficient matrices, beyond boundedness and measurability, leads to existence of well-behaved solutions with more regularity, and (by duality) less regularity as well. Stolk (2000) offers some results concerning second-order problems with additional regularity of both coefficients and data.
Both physical heuristics and numerical evidence for inverse problems in seismology (Virieux and Operto 2009) suggest strongly that the mapping F is more linear when the righthand side is smoother in time. Estimates similar to those developed in the proof of theorem 9 show that for right-hand sides of the form f ( t ), the Newton quotient remainder is O(h ) relative to the directional derivative, for any (δA, δB, δQ) not in the null space of DF. This observation can be developed into a theoretical justification of the frequency continuation strategy employed in every successful contemporary algorithm for least squares inversion of seismic data.

The energy inequality
Define the energy E(t ) of a weak solution u of (13) by It follows from the definition of weak solution that E is well defined almost everywhere, and locally integrable. Because A is positive-definite, hold for almost all t ∈ R, for suitable C * C * > 0.
Remark. In the linear acoustics and viscoelasticity examples described earlier in this paper, E(t ) is precisely the mechanical (kinetic plus potential) energy of the dynamical field u at time t.
In this and the following sections, we will use C to denote a generic nonnegative constant depending on C * , C * , B B(H ) , and Q L 1 (R,B(H )) , and possibly on other quantities as noted.
We shall repeatedly use the following property of weak solutions: Proposition 1. Suppose that u ∈ L 2 loc (R, H ) is a weak solution of (13). Then for any η ∈ C ∞ 0 (R), the mollification η * u is a strong solution of (13) with right-hand side η * f , and in fact Remark. In applications such as the acoustic and viscoelastic systems described earlier, H and V are function spaces, and membership in V entails additional regularity beyond that required for membership in H. In such contexts, the content of this theorem is that smoothing in time also 'smooths in space', in the sense that the values of the smoothed weak solution are confined to the subspace V .
Proof. Choose the test function φ in (14) to have the special form φ(s) = η(t − s)w, where t ∈ R, w ∈ V , and η ∈ C ∞ 0 (R). Then where the third equality is simply a rearrangement of (14) with the special choice of test function φ(s) = η(t − s)w mentioned above, and the last holds amongst other reasons because R is also convolutional. The right-hand side of (28) is bounded by a w-independent multiple of w H , therefore so is the left. Therefore η * u takes values in the domain D(P * ) of the adjoint P * for any η ∈ C ∞ 0 (R). But P is skew-adjoint, so D(P * ) = D(P) = V . Therefore we can shift P to the left-hand side of the inner product on left-hand side of (28), to obtain As (29) holds for every w ∈ V and V ⊂ H is dense, it follows that that is, (13) is satisfied pointwise in t. Since the right-hand side of (30) is of class C ∞ (R, H ), so is the left-hand side. Denote by T δt ∈ B(L 2 loc (R, H )) the translation operator by δt, defined by T δt u(t ) = u(t + δt ). For each k ∈ N, let k,δt = δt −k m k j=−m k a k, j T j δt be a finite difference operator for which u ∈ C 0 (R, H ) has k derivatives if and only if i,δt u converges pointwise in t for each i = 1, . . . , k to u (i) as δt → 0. From (30) it follows that for each k ∈ N, Since the right-hand side of (31) converges for each t, so does the left-hand side, both versions, as δt → 0. Thus k,δt (η * u)(t ) converges in the graph norm of P, hence in the sense of · V , as δt → 0 for each t ∈ R, k ∈ N. Thus η * u ∈ C ∞ (R, V ).

• after modification on a set of measure zero, E is continuous;
• if in addition f is causal, supp f ⊂ [T 0 , ∞), then for any T ∈ R there exists C T 0 ,T 0 so that for t ∈ (−∞, T ], Proof. Let {η n ∈ C ∞ c (R)} be a sequence of mollifiers approximating the Dirac delta distribution: explicitly, η n (t ) = nη(nt ), where Define Since η n * u → u in L 2 loc (R, H ), E n → E in L 1 loc (R). Thanks to proposition 1, η n * u ∈ C ∞ (R, V ) is a strong solution of (13) for each n ∈ N, with f replaced by η n * f . For each n, E n is smooth; differentiating E n , obtain for any s, t ∈ R thanks to the skew-symmetry of P.
Since convolution with η commutes with the convolution operator R, and with the actions of the other operators appearing in (13), the identity (35) implies that Since u, R[u], and f are locally square-integrable, for each t ∈ R and > 0, there exist t(t, ) > 0 and an N(t, ) ∈ N so that for |s − t| < t(t, ) and n > N(t, ), Continuity of E n implies existence of t(t, ) > 0 so that for |s − t| < t(t, ) and n N, |E n (t ) − E n (s)| < C . Thus the inequality (37) holds for all n ∈ N if s satisfies |s − t| < min( t(t, ), t(t, )). Since t ∈ R, > 0 are arbitrary, we have shown that the sequence {E n } ⊂ C 0 (R) is equicontinuous.
Choose T 0 T ∈ R: it follows from the definition (34) and Young's inequality that Apply Young's inequality to the convolutions with η n appearing in the inequality (36) to conclude that for T 0 s, t T , Taken together, (38)-(40) imply that {E n } is a bounded subset of C 0 ([T 0 , T ]). According to Ascoli's theorem, {E n } is precompact in C 0 ([T 0 , T ]), hence a subsequence converges uniformly to a continuous limit. Since the subsequence is necessarily also L 1 -convergent, and T ∈ R is arbitrary, the first assertion of the theorem is established.
In view of the continuity of E, we may take the limit n → ∞ on both sides of the inequality (36) along the uniformly convergent subsequence whose existence we have just established. Since η n * u → u in L 2 ([T 0 , T ], H ), the right hand side converges, and we obtain (41) We have assumed u to be causal, but this assumption has not appeared in the reasoning up to now. It allows us to take s → −∞ in (41). In view of the equivalence of √ E and the norm · (inequalities (26)), the inequality (41) implies that Gronwall's inequality then yields the second conclusion.

Corollary 2.
The energy E of a weak solution u of (13), as defined above, satisfies for any s, t ∈ R Proof. Continuity of E and convergence of η n * u to u in L 2 loc (R, H ) allows us to take limits on both sides of (35).
Proof. The conclusion follows immediately from the energy inequality (32), applied to the difference u = u 1 − u 2 , which is a weak solution with f ≡ 0.
Proof. For δt ∈ R, denote by u δt the member of L 2 loc (R, H ) defined by u δt (t ) = u(t + δt ). Then u δt is a causal weak solution (the only one, thanks to corollary 3) of (13) with f replaced by f δt ∈ L 2 (R, H ), defined by f δt (t ) = f (t +δt ). The translation group acts strongly continuously on L 2 , i.e. f δt − f L 2 (R,H ) → 0 as δt → 0. Since the difference u δt − u is a causal solution of (13) with right-hand side f δt − f , it follows immediately from (32) that u δt (t ) − u(t ) H → 0 as δt → 0 for any t ∈ R, that is, u ∈ C 0 (R).

Corollary 5. Suppose that (1) K ⊂ B(H ) is a bounded set; (2) L ⊂ B(V, H ) is a bounded set of skew-adjoint operators on H with (common) domain V , whose graph norms are all equivalent (to each other and to the norm in V ); (3) M ⊂ B(H ) is a bounded set of self-adjoint, uniformly positive definite operators: there
exist constants 0 < C * C * so that for all A ∈ M, Let the set P ⊂ M × L × K × Q parametrize a family of formal evolution problems of for (13), with coefficients A ∈ M, P ∈ L, B ∈ K, and Q ∈ Q, with common right-hand side f ∈ L 2 (R, H ), and let U ⊂ L 2 loc (R, H ) be a corresponding family of causal weak solutions. H ) is the content of the last corollary. It follows from the proof of the basic energy estimate (32) that the constant C appearing in its right-hand side may be chosen uniform over P -indeed, the bounds defining the sets listed in conditions 1-4 above are precisely those on which our constants, canonically notated C, depend. Therefore (32) implies that for u ∈ U, from which a uniform modulus of continuity follows.
Additional regularity in time of the right-hand side f translates into additional regularity in time of the solution. H ) is a weak solution of (13). Then u ∈ C k (R, H ); moreover, for j = 1, . . . k, u ( j) is the weak solution of (14) with right-hand side f ( j) .
Proof. Corollary 4 is the case k = 0.
The case k = 1 provides the induction step, so we discuss it first. Since f ∈ H 1 loc (R, H ), using the notation of the proof of corollary 4, in mean square. The energy estimate (32) implies that for each t, has a limit as δt → 0, whence u is differentiable and u satisfies (14) with right-hand-side f . The previous corollary shows that u is continuous. This argument also serves as the induction step to establish the assertion of the corollary for k > 1.
Smoothing the solution in time leads to a different sort of 'regularity': B B(H ) , and Q L 1 (R,B(H )) , so that for any t T ,

T )), and let u denote a causal weak solution of (14). Then for every T
Remark. Note that this inequality is a pointwise bound on η * u in the norm of V . In applications, V is compactly embedded in H, so this is potentially a much stronger statement than the obvious H-norm bound which follows directly from (32).
Proof. The left hand side makes sense, thanks to proposition 1. The identity (28) may be re-written as thanks to the energy estimate (32) and by-now familiar use of Young's inequality and the assumptions on the various operators and quantities in the problem formulation. Now choose w = P(η * u)(t ) to obtain a bound on its H-norm. In combination with (32) and Young's inequality, this estimate implies a bound of the required form on (η * u)(t ) in the graph norm of P, that is, in the norm of V .

Existence of weak solutions: proofs of theorems 5 and 6
The proof of existence follows the pattern laid out by Lions (1971), which in turn echoes Cauchy's proof of the fundamental theorem of ordinary differential equations. We define a Galerkin method, show that it converges, and finally that the limit is a weak solution. Note that no rate of convergence follows from this argument; in fact it is easy to see that none can be expected. Of course, proposition 2 has already assured that the solution so constructed is the only solution.
Proof of theorem 5. In view of the energy estimate (proposition 2), at most one such solution exists, and any sequence of weak solutions, corresponding to an L 2 (R, H )-convergent sequence of right hand sides f , must itself be L 2 (R, H )-convergent. Therefore it suffices to establish existence of solutions for a L 2 (R, H )-dense set of right hand sides. In particular, we may assume that f ∈ C 0 (R, H ), without loss of generality. Since V is separable (with respect to the graph norm of P) and V ⊂ H is dense, countable linearly independent subsets {w k } ∞ k=1 ⊂ V exist for which finite linear combinations are dense in V , hence in H. Without loss of generality, assume that {w k } ∞ k=1 is (H-) orthonormal: w k , w l = δ kl , k, l ∈ N.
Define m × m matrices A m (symmetric positive definite), P m and B m by for 1 k, l m, and the operator R m on L 2 loc (R) m defined analogously to (12) by A minor modification of a standard contraction mapping argument (see for example Coddington and Levinson (1955)) shows that for each m ∈ N, the initial value problem has a unique solution U m ∈ C 1 (R, R m ).
Then the system (49) satisfied by U m , together with the H-orthonormality of {w k }, implies that u m is the weak solution of the evolution equation (13) with right-hand side f m . The energy estimate (32) shows that the sequence {u m } is bounded in L 2 loc (R, H ), hence weakly precompact in L 2 loc (R, H ) thanks to the Tychonoff-Alaoglu theorem and a diagonal process argument. Denote by {u m(l) } a weakly convergent subsequence, and by u its weak limit. Since u m (t ) = 0 for t < T 0 and all m ∈ N, the same is true for u.
To see that the limit u is a weak solution of (13), introduce for each m 0 ∈ N test functions ψ of the form For l sufficiently large that m(l) Letting l → ∞, it follows that u satisfies (14) for all test functions ψ of the form given in equation (50). Since linear combinations of w m 's are dense in V , the set of functions of the form (50) is dense in C 1 0 (R, V ), whence u is a weak solution of (13).
Proof of theorem 6. We give the proof for k = 1; the general case follows by a straightforward induction argument. Choose a Dirac sequence {η n : n ∈ N} ⊂ C ∞ 0 (R) as in the proof of proposition 2. According to corollary 7, η n * u ∈ C k (R, V ) for each n ∈ N. According to corollary 6, u ∈ C 1 (R, H ), so the first term on the right-hand side of (28) may be integrated by parts to yield (η n * u), Pw = η * u, Pw for any w ∈ V . Both sides are continuous in t, so the limit as n → ∞ is valid pointwise. Thus for any w ∈ V , t ∈ R. Pointwise bounds on u(t ) (proposition 2), u (t ) (corollary 6) and the standing assumptions on the various operators imply that for all w ∈ V , t ∈ (−∞, T ] (and any T , but the constant C T 0 ,T depends on T 0 and T ), which shows that u(t ) ∈ D(P * ) = D(P) = V for each t ∈ R. Thus for all w ∈ H, t ∈ (−∞, T ]; taking w = Pu(t ), obtain the pointwise bound (16) for k = 1.
That u ∈ C 0 (R, V ) follows exactly as in the proof of corollary 4, via the strong continuity of the translation group on H 1 (R, H ).

Continuous dependence on parameters: proofs of theorems 7-9
The proof of theorem 7 divides into three steps. In the first step (lemma 1), we show that the assumptions of the theorem imply L 2 loc (R, H )-weak convergence of the sequence of solutions. The second step leverages this result to show that the sequence converges H-weakly, pointwise (lemma 2). The last step combines these two observation with strong convergence of the coefficients to obtain convergence in norm of the solution sequence.

Lemma 1.
Under the conditions of theroem 7, u m converges weakly to u in L 2 loc (R, H ). Remark. In order that R m converge to R pointwise, as assumed in the statement of the preceding theorem, it is sufficient that Q m → Q uniformly in R + .
Proof. The bounds implied by the energy estimate (proposition 2) are uniform over bounded sets of coefficients as described in the statement of the theorem. Therefore {u m } is bounded in L 2 loc (R, H ), hence has an L 2 loc (R, H )-weakly convergent subsequence u m(l) , with limit u ∈ L 2 loc (R, H ). Note that L 2 loc (R, H )-weak convergence implies convergence in the sense The second term vanishes in the limit l → ∞ because of the weak convergence of u m(l) toū. The coefficients A m , . . . range over bounded sets of operators, so we may replace φ and φ in the third term with simple V -valued functions, taking finitely many values, at the price of an arbitrarily small perturbation in this term, uniformly in l. However the strong convergence of the coefficient operators assumed in the statement of the theorem then implies that the resulting integrals become arbitrarily small as l → ∞. Thusū is a weak solution of the problem (13), and must therefore be the same as the (unique) weak solution u constructed in the preceding section. Thus no other weak accumulation point of the bounded sequence {u m } may exist, hence u m u in L 2 loc (R, H ) as claimed.

Lemma 2. Under the conditions of theorem 7, u m converges to u weakly, pointwise in t ∈ R and uniformly on compact sets. That is, u m (t ) u(t ) for all t ∈ R, and for any w ∈ H,
Proof. According to corollary 5, the conditions described in the statement of theorem 7 imply that {u m : m ∈ N} is equicontinuous, hence uniformly equicontinuous on compact sets. Choose However, according to lemma 1, for any w ∈ H, Therefore, assuming without loss of generality that w = 1, for m sufficiently large, t ∈ [T 0 , T ]. Since T 0 , T, and > 0 are arbitrary, the proof is complete.
We require one more fact about the memory operators R: H ) converging strongly to zero. Let K ⊂ L 2 + (R, H ) be bounded, and assume that all u ∈ K are causal, with common support supp u ⊂ [T 0 , ∞). Then for any sequence {u m : m ∈ N} ⊂ K, and any v ∈ L 2 loc (R, H ), any T ∈ R, Note that we have used the symmetry of Q m , and it is for this reason that the assumption was introduced. According to the hypothesis, is bounded, whence the conclusion follows via the Cauchy-Schwarz inequality.
Proof of theorem 7. We will use repeatedly the algebraic identity: for K, If K and L are symmetric, this identity simplifies to The right-hand side f in the formal evolution equation (13) for both u m and u vanishes for sufficiently large negative t, else u could not be causal, but then u m and u must vanish on a common (negative) half-axis, thanks to corollary 3. The energy identity (42) implies that Application of the algebraic identities stated above shows that the right-hand side is Identities (59) and (60) combine to yield in which g m ∈ C 0 (R) is defined by Since the B m 's are uniformly bounded operators on H and the R m 's are uniformly bounded operators on L 2 ((−∞, t], H ) for every t ∈ R (with norm independent of t), (61) implies that in which C means something different each time it occurs, as before, but depends on the quantities indicated the second section.
Application of Gronwall's inequality to (63) yields, for T 0 t T and C depending on T along with everything else, in which C T −T 0 depends on T − T 0 in addition to the standard dependences.
It remains to see that g m (t ) → 0 uniformly in t ∈ [T 0 , T ], as m → ∞. The first term in (62) tends to zero pointwise (in t) thanks to the assumption that A m → A strongly, and to the energy estimate (theorem 5) which assures that u m − u is bounded uniformly in m ∈ N and t ∈ [T 0 , T ]. On the other hand, corollary 5 and the uniform bounds on {A m : m ∈ N} implied by the assumption that {A m , B m , Q m ) : m ∈ N} ⊂ P (C * , C * , C B , C Q ) in turn imply that the first term in (62) defines an equicontinuous sequence of continuous functions on [T 0 , T ]. Since any convergent subsequence converges pointwise to the zero function, so does the entire sequence, and uniformly. The second term in (62) tends to zero, uniformly on [T 0 , T ], thanks to lemma 2, the third because B m → B strongly, the fourth, seventh, eighth and ninth because of lemma 1 and theorem 5, the fifth because R m → R strongly in L 2 + (R, H ), the sixth because of lemma 3.
Remark. This result is sharp, in the sense that nothing stronger than continuity can be expected without additional constraints on the various components of the formal evolution problem (13). In particular, the modulus of continuity cannot be uniform in the right-hand side ( f ∈ L 2 (R, H )), even locally.
For example, the 1D linear advection problem 1 c conforms to the setting described above, with H = L 2 (R). The operator coefficients are: A = multiplication by the positive constant 1/c, in which we have explicitly indicated the dependence of the weak solution on the coefficient 1/c and the right-hand side f . Suppose χ ∈ C ∞ 0 (R), supp χ ⊂ [−1, 1], and χ = 1.
for c = 1. Thus the modulus of continuity of (c, f ) → u[c, f ](t, ·) ∈ H (for t > 1) cannot be uniform over the bounded set [a, b] (a, b), and in particular this map is not locally uniformly continuous. The heuristic reason is that c is the wave speed, so changing c changes the position of arriving waves. This position shift has unboundedly large impact if the frequency of oscillation in the solution is allowed to become arbitrarily large. Oscillations in the data in time translate into oscillations in the solution in space, of similar frequency, because the problem is hyperbolic.
On the other hand, additional regularity in time of the right-hand side entails more regular behavior of the solution, as one might expect since such regularity damps temporal frequencies.
Corollary 8. In addition to the hypotheses of theorem 7, assume that f ∈ H k loc (R, H ), k 1.

Then for any choice of T
Proof. Follows directly from theorems 6 and 7.
Proof of theorem 8. The meaning of (18) is that for any φ ∈ C ∞ 0 (R, V ), On the other hand, both u and u h , h > 0, satisfy (14) with the same right-hand side, so Subtracting (66) from (67) and rearranging, obtain In view of equation (68), the Newton quotient remainder is the weak solution of (13) with right-hand side H ). In view of corollary 8 and the energy estimate (theorem 5) imply that as h → 0 for any T ∈ R. The conclusion then follows from anther use of theorem 5.
Proof of theorem 9. We present the case k = 2. The general case follows by an induction argument, which we omit.
As usual, denote by T 0 ∈ R a lower bound for supp f . In the notation of the proof of theorem 8, u ∈ C 2 (R, H ) ∩ C 1 (R, V ), thanks to theorem 6. Therefore the right-hand side of (18) has a locally square-integrable derivative, whence the causal weak solution δu actually belongs to the class C 1 (R, H ) ∩ C 0 (R, V ). Applying theorems 5 and 6 repeatedly, one sees that δu satisfies for any  (18)) that δu m − δu is the (strong) solution of Theorem 6 implies that {δu m : m ∈ N} is a bounded set in C 1 ((−∞, It follows that the first three terms on the right-hand side of (69) tend to zero in H 1 ([T 0 , T 1 ], H ). Applying theorem 7 to u m − u , the difference of solutions of equations of the form (13) with the same causal right-hand side f ∈ H 1 loc (R, H ), one sees that u m − u → 0 in H 1 ([T 0 , T 1 ], H ) also, whence the second three terms also tend to zero in this sense. Thus the entire right-hand side of (69) tends to zero as m → ∞ in the sense of H 1 ([T 0 , T 1 ], H ). Now it follows from theorem 6 that is continuous, and the theorem is proved in the case k = 2.

Symmetric hyperbolic systems: continuous dependence, proofs of theorems 1-4
For convenience, we repeat the key definitions from the second section.
Recall that k × k symmetric hyperbolic systems take the form in which the coefficient matrices a, b, and q are k × k, and the k × k matrix differential operator in the 'space' variables x ∈ has symmetric and constant coefficient matrices.
M corresponds to a collection of problems of the form (70), with common p(∇ ). The Hilbert space of states is H = L 2 ( ) k . Provide p(∇ ) with a dense domain V ⊂ H, and assume the p(∇ ) : V → H is skew-adjoint, and that the norm in V is equivalent to the graph norm of p(∇ ).
The hypotheses of theorem 5 are all satisfied, so Corollary 9. Suppose that (a, b, q) ∈ M, and f ∈ L 2 loc (R, H ) is causal. Then there exists a unique weak solution of (70), satisfying for every φ ∈ H 1 (R, There exists an increasing function C : R + → R + , depending on C * , C * , and bounds for b, q, so that for every t ∈ R, Theorems 1 and 2 now follow directly from theorems 8 and 9 respectively. Multiplication by a member of L ∞ ( ) defines a continuous map from L ∞ ( ) to B(L 2 ( )), so convergence of the coefficients a, b, q in L ∞ ( ) is sufficient to induce uniform, hence strong, convergence of the corresponding operators, hence convergence of the solutions per theorem 7. However, convergence in a weaker sense is also sufficient to induce strong operator convergence. The key observation is the following result, identical to lemma 2.8.5 in Stolk (2000): F ∈ R + for all m ∈ N. Suppose that r m → 0 in μ-measure. Then for any g ∈ L 2 ( , μ), Proof. Suppose on the contrary that such sequences {r m }, { f m } and square-integrable g exist, also an η > 0, for which the left-hand side of (73) remains η along a common subsequence. Without loss of generality, renumber the subsequence so that Convergence in measure of {r m } means that for any > 0, Choose so that F g L 2 ( ,μ) < η/2. From this definition and the Cauchy-Schwarz inequality, one sees that By passing if necessary to a further subsequence, we may assume that Thus the characteristic functions of the sets (r m ) are almost everywhere convergent to zero as m → ∞. Since |g| 2 ∈ L 1 ( , μ), it follows from the Lebesgue dominated convergence theorem that for large enough m, Thus the left-hand side of (74) can be made smaller than η, a contradiction.
Then A m → A strongly. The same is true for similar sequences of operators on L 2 (R n ) p defined by sequences of p × p matrix-valued functions whose components converge in measure.
Proof. In fact, the operators so defined are self-adjoint, and as follows from lemma 4, taking a m − a for r m , (a m − a)u for f m , and u for g in the notation of that lemma.
Proof of theorem 3. Follows immediately from lemma 5 and theorem 7.
One way to obtain convergence in measure is via mollification. Let {η m : m ∈ N} be a Dirac sequence of mollifiers, as in the proof of proposition 2. Since a ∈ L p loc (R d ) for any p, η m * a → a pointwise almost everywhere, hence in measure. An application of this observation is the finite speed of propagation property: for solutions of (70), support expands at finite speed. This property is well-understood for hyperbolic systems with smooth coefficients: for example, Lax (2006), chapter 4, presents a proof of: Proposition 3. In addition to the hypotheses of theorem 1, suppose that the coefficients in (70) are smooth (of class C ∞ (¯ )). Then Corollary 10. In the setting of proposition 3, suppose that τ ∈ R satisfies then u(x 0 , t 0 ) = 0.
Note that the argument given in Lax (2006) does not quite encompass proposition 3 and corollary 10: it does not apply to systems like (70) with memory terms. However the extension is straightforward.
Proof of theorem 4. Using a sequence of mollifiers as before, obtain a sequence {(a m , b m , q m ) : m ∈ N} of smooth coefficients converging in measure to (a, b, q). Then the corresponding operators converge strongly to those induced by (a, b, q). Since (5) holds almost everywhere, it follows that a m satisfies the spectral inequality (75) in R n for sufficiently large m, whence corollary 10 implies that u m vanishes in ω. Since, u m → u in L 2 ( ) k the conclusion follows.

Acknowledgments
KDB acknowledges the support of his work by the Rice University VIGRE program, funded by the National Science Foundation. The work of KDB and WWS was supported in part by the sponsors of The Rice Inversion Project. The research of CS is supported by the Netherlands Organisation for Scientific Research, through VIDI grant 639.032.509. The authors are grateful to Yin Huang for a very careful reading of a preliminary version and for several helpful suggestions. WWS acknowledges the Isaac Newton Institute for Mathematical Sciences at Cambridge University (programme on Inverse Problems, Fall 2011) for the opportunity to take advantage of its superbly productive environment during the final stages of our work on this project.

Appendix A. The skew-adjoint property of the acoustic grad-div operator
We establish skew-adjointness of the operator D, defined formally in (8), by appealing to an auxiliary result, similar to necessary and sufficient conditions for self-adjointness found in many texts on functional analysis (for example, Conway (1990)).

Lemma 6.
Suppose that H is a Hilbert space, V ⊂ H a dense subspace, and L : V → H a skew-symmetric linear operator. Then L is skew-adjoint if and only if for any c ∈ R \ {0}, L + cI is surjective.
Proof. Suppose first that L is skew-symmetric and L + cI is surjective for every c ∈ R \ {0}. Since the domain of L is dense, for any such c, ker(L * + cI) = rng(L + cI) ⊥ = {0}.
Conversely, if L is skew-adjoint, there can be no nonzero solutions to (L + cI)x = 0 with nonzero c, as follows from skew-symmetry. Hence there are no nonzero solutions to (L * + cI)x = 0 with nonzero c, since L = L * . That is, the kernel of L * + cI is trivial for nonzero c, whence the range of L + cI is dense for any nonzero c. However L is closed (see for instance (Conway 1990), proposition X.1.6, p 305), so its range is closed (this is proved just as is a similar fact for closed symmetric operators, see Conway (1990), proposition X.2.5, p 310) thus rng(L + cI) = H. Proposition 4. The differential operator p(∇ ) with domain C ∞ 0 ( ) 4 , defined formally by (8), extends to a skew-adjoint operator P on H with dense domain Remark. The Hilbert space H 1 div ( ) is the dense subspace of L 2 ( ) 3 obtained by completing C 1 ( ) 3 in the graph norm of the divergence operator. Equivalently, v ∈ L 2 ( ) 3 belongs to H 1 div ( ) if and only if there exists C 0 so that for every φ ∈ H 1 0 ( ), | ∇φ, v L 2 ( ) 3 | C φ L 2 ( ) .
Thus (A.1) implies that − ∇φ, ∇ p − c 2 φ, p = ∇φ, w − c φ, q . (A. 2) The left-hand side of (A.2) defines a bounded and coercive (negative-definite) form on H 1 0 ( ) for any nonzero c, and the right hand side defines a continuous linear form on the same Hilbert space. The Lax-Milgram theorem (Yosida (1996), III.7) implies the existence of a unique p ∈ H 1 0 ( ) for which (A.2) holds for every φ ∈ H 1 0 ( ). With this choice of p, we are required to solve the second of the two conditions (A.1). In fact, a solution v 0 ∈ H 1 ( ) 3 ⊂ H 1 div ( ) exists satisfying v 0 H 1 ( ) 3 C q − cp L 2 ( ) with a constant C depending only on . For a proof, see Brenner and Scott (2007), lemma 11.2.3, who also give references to other results for domains for polygonal, rather than smooth, boundaries.
Thus we have constructed a solution of (A.1) in V = H 1 0 ( )×H 1 div ( ) for any c ∈ R\{0}, whence we conclude that P is skew-adjoint.

Appendix B. The differential case
If the memory term (convolution operator R) is absent, then initial data determine solutions uniquely. In this section, we sketch the theory, parallel to that for causal solutions, which holds in this differential case. We assume throughout this section that Q ≡ 0.
Corollary 12. Suppose that u is a weak solution of (13). Then for any s t ∈ R, Proof. The energy identity (42) applies to weak solutions, causal or not. Take into account R = 0, and use Gronwall's inequality, the boundedness of B, and the equivalence (26) of the energy with the norm in H.
Corollary 13. If u 1 and u 2 are weak solutions of (13) (with the same right-hand side f ∈ L 2 (R, H )), and u 1 (s) = u 2 (s) for some s ∈ R, then u 1 ≡ u 2 .
Proof. Set u = u 1 − u 2 : u is a weak solution with right-hand side f = 0, and u(s) = 0. The result follows immediately from corollary 12.
Theorem 10. Suppose that T 0 ∈ R and u 0 ∈ H. Then there exists a unique weak solution of (13) for which u(T 0 ) = u 0 .

Proof.
A proof of this result is precisely analogous to the proof of theorem 5: the solution is approximated by a Galerkin procedure and the solution of systems of ordinary differential equations, and the energy in the error estimated (in this instance, via corollary 12).
Results precisely analogous to those established in the last section hold concerning regular dependence on the coefficient operators for weak solutions with specified initial data and righthand side. We leave the reader to formulate these results, whose proofs are minor variants of those given above.