A Minimization Approach to Conservation Laws With Random Initial Conditions and Non-smooth, Non-strictly Convex Flux

We obtain solutions to conservation laws under any random initial conditions that are described by Gaussian stochastic processes (in some cases discretized). We analyze the generalization of Burgers' equation for a smooth flux function $H\left( p\right) =\left\vert p\right\vert ^{j}$ for $j\geq2$ under random initial data. We then consider a piecewise linear, non-smooth and non-convex flux function paired with general discretized Gaussian stochastic process initial data. By partitioning the real line into a finite number of points, we obtain an exact expression for the solution of this problem. From this we can also find exact and approximate formulae for the density of shocks in the solution profile at a given time $t$ and spatial coordinate $x$. We discuss the simplification of these results in specific cases, including Brownian motion and Brownian bridge, for which the inverse covariance matrix and corresponding eigenvalue spectrum have some special properties. We calculate the transition probabilities between various cases and examine the variance of the solution $w\left(x,t\right)$ in both $x$ and $t$. We also describe how results may be obtained for a non-discretized version of a Gaussian stochastic process by taking the continuum limit as the partition becomes more fine.

In particular, this is observed in the prototypical case of Burgers' equation, for which H (w) = w 2 /2. The work of Dafermos [8] was instrumental in constructing an approximation for this case and more generally, locally Lipschitz H by using a series of piecewise linear flux functions to construct the solution for deterministic initial conditions. Using our methods for random initial conditions, one also can construct the continuum limit for the deterministic problem as a special case.
In this paper, we consider the introduction of randomness into the initial conditions for the conservation law above. The classical approach for solving the conservation law (1.1) entails using variational methods to derive the Lax-Oleinik formula [11,17,21,22] w (x, t) : requiring assumptions on smoothness, superlinearity, and strict convexity of the flux function. In a previous paper [5], we showed that (1.3) will still hold when these conditions are relaxed, with the only other assumption being that of Lipschitz continuity on the initial condition g (x). These methods thus enable us to derive a number of results for the case of non-deterministic initial data. When dealing with randomness, an important notion is that of a stochastic process X (t). The process is a set of random variables linked together by a law of probability with the set of possible paths given by some state space Ω. In general this is far too broad a class to attack analytically, so attempts to derive exact solutions are often restricted to one specific stochastic process, for example Brownian motion [12]. In this work, we derive results for an entire family of stochastic processes. In Section 2 we provide an introduction and references to some of the key properties of such a family known as Gaussian stochastic processes.
The motivation for this class of process is not only that it has sufficient structural properties to obtain closed-form results but also has the breadth to analyze a number of distinct cases. We also utilize this approach in calculating transition probabilities between different states and examining the variance in space as time increases. Consider the basic problem w t + |w| x = 0 subject to continuous initial conditions w(x, 0) = g (x). Application of our results gives g(x − t) as the solution if the minimum of g(y) is at the left endpoint of the interval, g(x + t) if the minimum is on the right endpoint, and 0 for an interior minimum. Thus for g that is given by Brownian motion, one minimizes integrated Brownian motion over a finite interval and obtains the shock statistics.
This example can provide a pathway to considering the behavior of solutions to more complicated flux functions under random initial data, for example when one has not one but a number of affine segments linked together. This sort of construction is denoted as a polygonal flux function and is thus particularly suited to analysis under the Hopf-Lax minimization approach [17,21,22] discussed below. Together with the results proven by Dafermos [8], this methodology can be the basis for a general class of flux and random initial data.
Notably, both Brownian motion and Brownian bridge initial conditions for g (x) will satisfy the requisite regularity properties, as well the additional property of being Gaussian stochastic processes. Furthermore, when integrated to obtain integrated Brownian motion and integrated Brownian bridge, they remain Gaussian stochastic process, leading to useful properties on both g and g that aid in the analysis and computation of minimizers as specified in (1.3). As is shown in [5], we have the key relationship for the solution w given by w (x, t) = g (y * (x, t)) (1.4) (when g (y * (x, t)) exists) expressed as a function of the greatest minimizer, . (1.5) For the case of Burgers' equation, one uses the relation g (y * (x, t)) = x−y * (x,t) t to obtain a special case of (1.4) given by When one introduces Brownian motion initial data, (1.4) leads to the following result for the variance of the solution expressed in terms of the variance of the minimizer in (1.5), for a flux function given by H (p) = 1 j |p| j Var (w (x, t)) = t − 2 j−1 Var sgn (x − y * (x, t)) |x − y * (x, t)| Var (w (x, t)) = 1 t 2 Var (x − y * (x, t)) . (1.8) Later on, we specialize to the case x = 0. In Section 3 we discuss extensions of (1.8) to more general flux functions. For example, with the flux H (p) = p 2n , we have a generalization of (1.8), providing an understanding of the propagation of variance over the sample space Ω. We also consider a related case of H (p) = |p|, in light of its Legendre transform L (q) = 0 +∞ |q| ≤ 1 |q| > 1 (1.9) for which the solution has a very simple probabilistic interpretation. In particular, one can show formally that the frequency of jumps in w (x, t) decreases in time.
The minimum defined on the right-hand side of (1.5) is of course unique under the classical assumptions that H is smooth, superlinear, and uniformly convex [11]. However, the analog of the result (1.4) remarkably continues to hold even when these assumptions are relaxed, i.e. when the flux function has a lesser degree of regularity, the convexity of H is not uniform or even strict, and when it is no longer superlinear. Even if the initial condition g is also discontinuous, one can prove a modified form of (1.4) by accounting for the vertices of g.
A particularly interesting and yet broad range of cases results from studying a Gaussian stochastic process, X (r), a random process completely determined by its second order statistics. In other words, such a process at arbitrary times {r i } n i=1 is completely determined by the means E {X (r i )} and the corresponding covariance matrix defined by ( ) i j = Cov X (r i ) , X r j . We denote the inverse covariance matrix by A := −1 .
In Section 4, we consider a Gaussian stochastic process X (r) and partition the real line into sets ζ i with measure ν (ζ i ) > ε > 0, and take a finite number of points, among which we compute the minimum. We pair this initial condition with a piecewise linear flux function. The Legendre transform L of this flux function only takes a finite number of slopes, each on an interval of finite measure. By taking the intersection of our partition with this interval, we obtain a finite covering for it, and thus a finite number of points at which to analyze the quantity minimized in (1.5). We want to calculate the solution E {w (x, t)}, which we have shown in [5] to depend only on the derivative of L or g at the point of the minimizer. By considering first the local minima of this quantity for each of the finite number of segments of L, and then taking the minimum over those segments, we obtain a remarkable closed-form expression for the solution. Among the results we can obtain from this is the expectation of the solution at a point (x, t).
are expressed in terms of integrals of nested error functions, and the sets {R i } refer to the event that a minimum occurs at a vertex of L (discussed further in (5.6)). The quantity A i in the first term of (1.11) here denotes the inverse covariance matrix corresponding to the ith affine segment of L, and A for the same quantity over the whole domain. The vector µ is given by the discrete points of µ = E g (y) + tL x−y t , where the second term is deterministic. The quantityμ refers to the analogous quantity in the ith segment. The integers n i correspond to technical details of partitioning the intervals for which L has different slopes, as described in Section 5.
An important aspect of this evolution is that for Burgers' equation with random initial conditions, one has an increasing variance in the probabilistic sense for the solution w (x, t) as a function of t for a given x, but formal calculations suggest that it has decreasing variance in the sense of T V (w (x, t)) as a function of time. In this manner, it is almost paradoxical that conservation laws which lead to shocks from smooth solutions can also act in such a way as to smooth out random initial conditions.
A key issue in kinetic theory is the distribution and formation of discontinuities in the solution, known as shocks. In particular, for a Gaussian stochastic process paired with a piecewise linear flux function, we can expand on the results of Section 5, including (1.11). In doing so, we derive expressions (both exact and approximate) for the distribution of these shocks based on the minimizer switching from a point on one segment of L to another. From this result, we also compute the total density of shocks.
The expressions obtained in Sections 5 and 6 hold for any Gaussian stochastic process. Some cases of particular interest include that of Brownian motion or Brownian bridge initial conditions, as well as Ornstein-Uhlenbeck ( [26], p. 439). In Section 6, we apply our results to special cases and observe the simplifications for these specific processes. Since many of our results are exact and in terms of error functions, this also provides a pathway to further numerical computation. By taking the limit of a finer partition of the real line, we also explain how one may be able to pass to a limit and obtain results for the continuum problem in addition to the discretized one.

Stochastic processes
In considering randomness for the initial conditions along the entire real line R, it is important to have a well-defined construction. A stochastic process X (s) is a collection of random variables on a probability space (Ω, F , P). The space Ω can be thought of as the set of all possible outcomes for the random variable, F as a set of reasonable sets of outcomes (i.e. measurable sets) and P is a measure with total mass 1 assigning the probabilities for which an outcome or set of outcomes in F can occur. The expectation E {X} of a random variable X is defined as E {X} = Ω X (ω) dP (ω). We also require that the measure P be countably additive, i.e. P ( Example 2.1. Consider a collection of random variables {X α } α∈I for some index set I such that X α ∼ N (0, 1). That is, take identically distributed random variables that are normally distributed with mean 0 and variance 1. Define The process S (n) is known as a random walk and is a stochastic process on the probability space (Ω, F , P) where Ω = N, F is the set of Borel sets on R, and P is the usual probability measure.

Gaussian stochastic processes
Let {R i } m i=1 be a collection of independent, identically distributed random variables with standard normal distribution, i.e. R i ∼ N (0, 1) as in Example 2.1. If a set of random variables {X i } n i=1 satisfy for constants a i j , {µ i }, we say that (X 1 , ..., X n ) form a multivariate normal distribution. We also define the covariance of two random variables below.
Definition 2.2. Let X i and X j be random variables. We define the covariance of the random variables by This is the basis of the definition for a Gaussian stochastic process. In particular, one has the following [27]: Theorem 2.3. If (X 1 , ..., X n ) have a multivariate normal distribution with density given by where − → µ is the mean, and Σ the covariance matrix, then the joint moment generating function is given This is easily proven using the definition and algebraic manipulation and properties of characteristic functions. We can now define what we mean by a Gaussian stochastic process: be arbitrary and increasing without loss of generality. We call a stochastic process {X (t)} a Gaussian stochastic process if (X (t 1 ) , ..., X (t n )) has a multivariate normal distribution.
Definition 2.4 implies that a stochastic process that is Gaussian is completely characterized by the properties of its mean and covariance matrix. A natural question to ask is what sort of processes might satisfy this requirement, and how one proves this property. We next give several examples relating to cases of particular interest in the broader literature. More details can be found in [27].
is almost surely continuous (iii) W (t) has independent increments with a normal distribution, i.e.
Definition 2.6. We call a Brownian motion conditioned on W (T ) = 0, denoted {W b (t)} 0≤t≤T a Brownian bridge. An equivalent definition is to set In particular, we note that W b (0) = W b (T ) = 0.
Definition 2.7. We call a process W o (t) stationary Ornstein-Uhlenbeck if it is defined by The stationary Ornstein-Uhlenbeck process has the important property that its associated covariance matrix is invariant under translation in time. A basic result is the following. Part (b) is proven in the Appendix. A basic result that follows from the standard theory is given below.

Theorem 2.9. (Covariance Matrices for Brownian Motion and Brownian Bridge). Let T and {t
(a) Let {S (t)} t>0 be an integrated Brownian motion with piecewise linear drift. Then the covariance at times 0 ≤ s ≤ t is given by For s and t outside of [0, T ], the variance of Z (t) is constant but nonzero.
An outline of the proof of Theorem 2.9 is provided in the Appendix.

Generalizations of Burgers' equation with random initial data
We consider a generalization of Burgers' equation by considering the flux function for j ≥ 2 as follows: In this way, we obtain the following generalization of (1.8).
Lemma 3.1. With H defined by (3.1), consider the conservation law The solution is then .
The solution w (x, t) to the conservation law (1.1) can then be expressed as Remark 3.2. For Burgers' equation, j = 2 and the relation (3.3) reduces to i.e., the result (1.6). Then the solution is given by w (x, t) = g (y * (x, t)) at points where g is continuous, where y * is the value of the minimizer as before.
Proof. The Legendre transform of H is simply The result then follows immediately from the solution formula (1.3).
This result is particularly of interest as it shows, for the special case of the flux function H (p) = |p|, that the solution of the conservation law reduces to finding the minimum of the integral of the initial data, i.e. the minimum of g. For example, if g is given by a Brownian motion, the solution at given x and t is given by solving the problem of finding the minimum of an integrated Brownian motion on a finite interval (whose endpoints are dependent on x and t). The integrated Brownian motion is a Gaussian stochastic process (Lemma 2.8 (b)), so the minimizer y * can easily be approximated by using the mean and covariance matrix (Theorem 2.9 (a)).
For example, if the initial condition g consists of ±1 with probabilities p and 1 − p, then one has w (x, t) ∈ {0, g (x − t) , g (x + t)}. The value taken by w depends on whether the minimum is attained on the interior, left endpoint, or right endpoint of the interval [x − t, x + t].
Remark 3.4. Due to the Legendre transform L taking on infinite value outside of |q| > 1, for a given (x, t), the minimizer y * (x, t) takes its value in the domain This is a manifestation of the finite domain of influence strictly enforced, i.e. within a cone y = x ± t.
Theorem 3.5. Consider the conservation law (1.1) with flux function given by (3.1), i.e. H (p) = p j j , j ≥ 2. For Brownian motion initial conditions one has, with y * depending on (x, t),

9)
noting that y * is short for y * (x, t).

AIMS Mathematics Volume 3, Issue 1, 148-182
Proof. For a flux function of the form (3.1), a minimum is attained when One then writes Remark 3.6. To obtain the result for Burgers' equation, one sets j = 2, and observes that Theorem 3.5 then yields the result (1.8): Further, for the case of Brownian motion and for an arbitrary time t, Var (g (y * (x, t))) is increasing in x since y * (x, t) is increasing in x (see [5]), and thus the variance of w (x, t) also increases.
Remark 3.7. From (3.9) for Burgers' equation at x = 0, using Theorem 3.5 we have Using the solution formula, we also have Combining (3.15) and (3.16) yields Since g is Brownian motion, one has Var g (z) = z for any z > 0. Thus, one can formally write

Combining (3.17) and (3.18) then yields
Further, if we are willing to make the conjecture that for some power p ∈ N, then we can deduce Then the result (3.15) can be further extended to obtain which seems to formally suggest that the variance of the solution on x = 0 increases as time increases. One can also perform computations using a finite difference scheme through a numerical study, e.g. from [30].

Spatial density of shocks
We now present a formal argument that the density of shocks in the solution w (x, t) in x decreases as t increases for the case H (p) = |p| with any random initial data. A shock can only form when there is a transition from Region I to II or III, or from Region II to III (see Figure 1). As noted in 3.4, the range of minimization problem is restricted to the interval |x − y| ≤ t, so that we have (3.23) The minimum of g (y) is either on the interior or attained at one of the endpoints, x ± t. If it occurs on the interior, we must have g (y * ) = 0. However, in the event of a minimum at x + t, one may have g (y * ) = g (x + t) < 0. The case g (x + t) = 0 will occur on a set of measure zero, and thus we have w (x, t) < 0 for this case. Similarly, if the minimum is attained at y * = x − t, we will have w (x, t) = g (x − t) > 0. We want to examine probabilities and consider how minima can change with a small change in the spatial coordinate. Let time t be fixed and consider a small change in x from x to x + ∆x. If the minimizer does not change, then the value of w (x, t) also does not change by the argument above. If it does change, then so does g , and consequently, w (x, t). We calculate the probability that there is a transition in w, i.e. the probability of a lower minimum on (x + ∆x − t, x + ∆x + t) compared to that on (x − t, x + t). This is given by In order for (3.24) to be nonzero, there must be a minimum outside the region of overlap, i.e. one of two events. In the first, a new minimum exists on the segment (x + t, x + ∆x + t), which we term region III, whose value is below the original minimum. In the second case, the original minimum fell in the range (x − t, x + ∆x − t) , (region I), which is excluded when moving from x to x + δx as illustrated in Figure 1. In the latter case, the minimum attained on this segment fails to be in the new interval when x is increased by ∆x, so the new minimum is a consequence of the original no longer being contained in the domain of consideration.
In the former case (with the new minimum attained on the right), the minimum moves lower, so w (x, t) = ∂ x min {g (y)} jumps from zero to a negative value. We express this probability as In the latter case, the minimum on the left gives rise to a larger minimum in the central region, so the jump is to a minimum value. The inequalities above are value for any x and ∆x. In articular, we could write a set of inequalities for a partition We now perform the same computation for any x but with time t + ∆t to obtain the following: When moving from (x, t) to (x + ∆x, t), the interval of minimization is shifted ∆x to the right. To have different minimizers at points x and x + ∆x, at least one of two events must occur: (i) the minimizer was attained in the leftmost segment (x − t, x − t + ∆x) which is omitted for the minimization problem at x + ∆x and hence lost; (ii) a new, lower minimum is introduced by considering the segment (x + t, x + t + ∆x) on the right and hence gained.
We can now evaluate the probability in (3.26) to obtain the probability of a transition for (x − ∆t, t + ∆t) , i.e. obtain We now compare (3.27) with (3.25) noting that the second terms are identical. One has the inequality In other words, in the notation of uniformly spaced points {x i } i∈Z , we have and By combining (3.29) and (3.30) and then summing over all i, one sees that the density of jumps is decreasing in time, i.e. i P jump between x i and x i+1 (3.31) Further, if the process is stationary, one has a stronger result in that the local variation at a given x i is also nonincreasing. In the limit, one may also write Thus, we can bound the probabilities that the right and left-hand limits of the solution at a particular point x in space change as time is increased slightly for a transition of the type from Region I or II to III.
In addition to this case, there are potential transitions from Region I to II. In order for such a transition to occur, we would need Considering the same probabilities at x + ∆x and t + ∆t, we have i.e., Then if a sample Brownian motion path ω satisfies 3.37, it will satisfy 3.35, so one has and similarly, one shows that the density of shocks arising from these transitions is also decreasing in time.

Application to a prototype case
As discussed earlier, an important application of our methodology of involves the case where the flux function is given simply as H (p) = |p|. We will first consider this case for (1.1) to illustrate the main ideas and a convergence proof before proceeding to the more general case in the next section. As we shall see, though the formulation of this problem is simple, it raises a number of interesting points and the solution under our methods presents a clear geometric interpretation. Even for an initial condition given by a stochastic process such as Brownian motion, the essence of computing the solution profile is reduced to calculating in which of three regions Brownian motion has a minimum. To calculate the solution w(x, t), one needs to determine whether the minimum of the Brownian motion is at The more general case is based on the same principles but involves more calculation.
To ease notation, we will present the following prescription as to how one determines the value of the solution profile w (x, t) for a particular realization of one Brownian path and for one specific point (x, t) ∈ R×(0, ∞). Implicit will be the existence of an underlying, appropriate partition of the real line, the technical details of which are omitted here and are to be outlined in the next section.
First note that the Legendre transform of the flux function H given above will be 0 inside a compact interval and +∞ elsewhere. More specifically, we have .
To this end, we consider the quantity One then has the remarkable simplification that minimizing the expression (4.2) over the entire real line is equivalent to computing the minimum of g (y) in the interval [x − t, x + t]. Although finding the minimum of g (y) for a stochastic process, say integrated Brownian motion, may not be so simple, for our purposes it will suffice to determine whether one of three distinct cases can occur. Indeed, either the minimum of g (y) will occur at one of the endpoints or somewhere in the interior.
In our setup, we discretize Brownian motion in the following way. Without loss of generality (see Remark 2.10), consider the interval [0, 1] and let M ∈ N be given. Divide the interval into 2 M pieces, resulting in a partition r (M) n , and choose random variables so that the discretized Brownian motion is constant on each subinterval, with its values prescribed by a random walk. Our goal is to show that the solution with initial conditions given by the discretized version of the integrated Brownian motion indeed converges to integrated Brownian motion as we let M → ∞. We define our setup of the discretization more formally in Definition 5.2 before introducting our main results.
We will denote our discretized Brownian motion by W M , and the integrated process as I M = W M . Note that our procedure superficially resembles the classical construction of Brownian motion as a limit of interpolated (and thus continuous) random variables. However, using the interpolated random walk would not be as useful in our case, as the minima of its integral will not general be at the specified n points (vertices of W M and L).
We define We then integrate and define where (4.4) defines our discretized integrated Brownian motion. The initial value problem we want to consider then becomes (using superscripts for M to ease notation) We construct a solution to this discretized problem. A key result, derived in Section 3.1, is that (4.6) The events that the minimum occurs in these different cases (i.e., either somewhere in the interior of [x − t, x + t] or at one of the endpoints) can be described as Gaussian integrals. In other words, the value of the solution is completely determined by the value of the initial condition depending on whether the minimum is attained at the left endpoint, in the interior, or at the right endpoint. The details are presented in the next section for the more general case, but take the form of a series of nested Gaussian integrals. A second set of questions involve whether the solution converges, and if so in what sense to solutions to the limiting problem, i.e. w t + |w| x = 0, w (x, 0) = g (x) (4.7) where g is given by a Brownian motion. Indeed, one can show that we have convergence in probability of w M to w in the following sense. Given A = B+C, where P (|C| > α) will be small, we will want to bound the probabilities of {A > 0} and {A < 0} in terms of sets only involving B. One may write (4.8) We use Billingsley Theorem 37.9, p. 534 [3]. Let n ∈ M be fixed and let D be the dyadic rationals Then we have for Brownian motion, W, with δ := 2 −M , the bound for any α > 0 P sup where K is a positive real number. By continuity of Brownian motion, we can extend this inequality to all t (not just dyadic). We consider for simplicity the interval (s) (4.14) Thus, we have A = B + C, and C has the property Using the above, one can then write and recalling that δ = 1 Q by definition, we have Then, by choosing α and N appropriately, one has for a pure number K independent of N. We use these inequalities in conjunction with the convergence theorems below.

Properties of the Brownian motion discretization
In particular, we want to prove that I N defined by (4.4) is a Cauchy sequence in probability. Given an ε > 0 we choose J such that ). This means that for all j, k > J we have, for any particular ω in the Polish space, the inequality This means that for any ω and any real number α, one has (4.20) A similar bound is obtained for the integrals of the discretized Brownian motion. Using the definitions above we write, for t ∈ [0, 1] , and any particular ω so that W N j and W N k are continuous, bounded functions, so that for any ω and α ∈ R we have Hence we have (4.23) Next, we claim that if s and r are in their respective closed, bounded intervals, and one has for some C ∈ R and any fixed ω the inequality then one also has (4.25) To prove this note that for each ω the Brownian motion W (s; ω) is continuous, so its difference |W (s + r) − W (s)| is also continuous in (r, s) and will have a maximum and minimum on 0, ∆r (J) × [0, 1]. Let (r * , s * ) be a point of maximum. Then there is a sequence, (r n , s n ) converging to (r * , s * ) . We have thus This implies convergence in probability, which implies convergence in distribution, i.e., for any α such that P [R * = α] = 0, which in our case is all α by continuity, one has lim n→∞ P [R n > α] = P [R > α] , (4.28) i.e., one has Hence, the assumption that P [|W (s n + r n ) − W (s n )| > α] ≤ C implies the conclusion (4.25) , proving the claim. We use these in conjunction with a basic bound on Brownian motion to prove the following Lemma. We apply this to X = W (t) − W (s) with k := 4. Note that X is a Gaussian with mean 0 and variance σ 2 = |t − s| . The fourth moment of this Gaussian is then 3σ 4 = 3 |t − s| 2 . Thus, we can write, for r ∈ [0, δ] the inequality From this inequality, one obtains Combining this with the inequality (4.23) , for j, k > J one has for any t ∈ [0, 1] Our next lemma entails passing to a subsequence to achieve the convergence in the desired sense. We state this without proof as it follows easily from analysis.

Lemma 4.2.
There is a measurable function I * (ω, s) such that I N j → I * in probability and a subsequence converges a.u. in ω and uniformly in s. By Lemma 4.2 we know that there is a subsequence of I N j converging a.u., although the convergence in measure should be adequate. We consider only the subsequence that converges almost uniformly in ω and also in measure in ω uniformly in the s variable. We will just refer to this subsequence as I j for brevity rather than I N j . By containment of sets in conjuction with (4.36) for j ∈ N : j −2/5 < ε, one has the inequalities Define, analogous to B M above, the quantity Since Lemma 4.1 says that I N j (ω, s) is fundamental in measure uniformly in s, we know that there is an I * to which I N j converges in measure. Then we have using I j in place of I N j ,

Using Lemma 4.2 we then have that
Convergence in probability implies convergence in distribution, so that we have P B j < α → P {B * < α} . Now using this in (4.37) we take limits and obtain, P {B * < −ε} ≤ P {A ≤ 0} ≤ P {B * < ε} . (4.38) If we consider only the subsequence that converges almost uniformly, then we have that for any δ > 0 we have a set F such that P (F) < δ and on Ω \ F one has uniform convergence of continuous functions B j so that B * is continuous. Hence we can take the limit as ε → 0 and obtain from (4.38) that Thus we have the following theorem.
In other words, we have convergence in probability of the solution to the discrete problem to that of the continuous problem (1.1).
Recalling that A < 0 is equivalent to I (s) having its minimum at the left endpoint, i.e. (4.14), we see that the probability that the minimum it attained at the left endpoint is a limit of probabilities involving the discretizations I M .
While we have developed these ideas in the context of H (p) = |p|, the same methodology can be implemented for general polygonal flux, as the latter simply intrdouces deterministic drift, and the bounds on Brownian motion remain valid. . When evaluated at the argument x−y t , the function exhibits limiting behavior under small or large time leading to interesting results in these limits for the solution of the minimization problem. For small time t, the function L is infinite outside a small interval, making it more likely that a minimum is obtained at a vertex of L. For large t, L can be approximated as a single linear segment. In this case the minimum is then likely to be at a vertex of g rather than at a vertex of L. In numerical studies, this can be illustrated by comparing the variance in the spatial variable x at given fixed times t 1 , t 2 , etc. As is apparent in Figure 2 computed using [30], the total variation tends to decrease as a function of t. This augments the formal calculations shown in Section 3. This is despite the fact that the probabilistic variance at a point x increases in time.

Analysis of the piecewise linear flux function with Gaussian stochastic process initial data
In addition to the intuitive interpretation of the asymptotic limiting behavior of the solutions, we prove a rigorous result for a broad class of initial conditions. We consider a discretized approximation to a Gaussian stochastic process as an initial condition paired with the piecewise linear flux function. The virtue of this approach is that the result holds for any stochastic process in this class, and is exact without relying on the particular structure of say Brownian motion or Brownian bridge. We recall a result from [5].
Theorem 5.1. Let L be polygonal convex, as described above, and g be piecewise constant. Then w (x, t) = g (y * (x, t)) when the minimizer y * is at a vertex of L = L x − y * (x, t) t when the minimizer y * is at a vertex of g (5.1) Figure 2. By performing numerical simulations using a finite difference scheme on Burgers' equation for Brownian motion initial conditions, one can see that the longer time behavior exhibits less variance in x and appears more smooth, in accordance with the analytic results.
a.e. in x (for fixed t > 0) is a solution to (1.1). Should a minimum occur at a point where both g and L have a vertex, one has a discontinuity in the solution with w (x−, t) taking the value of the left limit g (x, t) and w (x, t) having the value g (x+, t). Note that for a fixed t > 0 and x a.e., one of the cases in (5.1) occurs.
This theorem will be needed as the final step in proving our main result of this section. Before stating the theorem, we clarify the choice of discrete points at which we will sample the stochastic process. This partition will depend on the value of x and t. Given x and t, the Legendre transform L of the flux function will only be finite on a finite interval, and therein L , the quantity of interest, will take N different values. We partition each of these intervals into pieces at which we sample the values of L and g . In addition, we will need to consider the value at each of the vertices of L. One is then left with a finite number of points, and computing the minimum amongst these is a far more manageable task than calculating the minimum along a continuum. Indeed, our main result presents an explicit formula for computing this minimum, and thus an exact expression for the expectation value of the solution w (x, t). One can then consider refining such partitions and taking a limit to the continuum. In a computational context, it may be necessary to first fix a partition, and restrict oneself to that same grid, but we leave the technical details to future works.
x − m 1 t] we set grid points for a fixed x and t as follows. Note that L is divided into N segments on which it takes finite values, and has N + 1 associated vertices. Starting from the leftmost segment, we label these vertices as r 1,1 , ..., r 1,n 1 , r 2,1 , ..., r i,1 , ..., r N+1,1 . We number the n i points that lie between each vertex sequentially using the second index, pairing them with the first index representing the vertex most immediately to its left, i.e. r 1,2 , ..., r 1,n 1 for points between the first and second vertex. We denote the total number of points (vertices and points between) by n, and define the intervals I x,t i := (r i,1 , r i+1,1 ). We illustrate the definition of these partitions in Figure 3.
. Let x and t be given and choose partitions of I x,t i as described in Definition 5.2. Let the number of such points be denoted by n i , and define i n i =: n. Define A i, j as the inverse of the covariance matrix of X (s) with respect to these points s i, j in the subpartition r j,2 , ..., r j,n j and suppress the second index of A, s, etc. Denote the eigenvalues of A i asλ i , and defineμ i as the means of each random variable, i.e.
The matrix A i can be diagonalized by Similarly, define A, {λ i } , {µ i } etc. as the covariance matrix, eigenvalues, means of the random variables for the entire process within the range (x − t, x + t).
Define also Q i as follows: i.e. the minimum on the ith segment of the flux function L.
and partition each of these N intervals into n i points. The problem of finding the minimum over the entire interval I x,t is then simplified to finding the minimum over these N i=1 n i = n points and N + 1 vertices of L, which are the endpoints of I x,t i .
Theorem 5.4. Let the flux function H in the conservation law (1.1) be piecewise linear and convex, with N segments and N + 1 vertices. Let the initial condition g N (s) be given by a discretized Gaussian stochastic process, as outlined in Definition 5.2. The probability that the minimum occurs on the ith segment, i.e.
is given by The probability of such a minimum occuring at a vertex point of L is given by an expression similar to (5.6) with the integral over the jth vertex as the outermost integral (treating the vertex as a segment with only one point).
Finally, define the event that the minimum occurs at the jth vertex of L by R j and set (for notational convenience) p N+i = P {R i }. Then the expected value of the solution is given by a.e., that is, an average of the slopes of L weighted by the probability of the minimum occurring on the ith segment of L, plus terms at vertices of L.
Remark 5.5. In Theorem 5.4, we present closed-form probability of the solution taking on one of a number of values for a given x and t, as well as the expectation of the solution at this given value. In fact, we can also write an expression for the distribution of the local minimum on segment i as a function of its value s by instead taking the last integral in (5.6) up to s instead of ∞. We may also write expressions for the cdf of the value of g + L on the vertex, and for the cdf of the minimum value obtained on just one segment. Consider first, for given i ∈ [1, N], the domain I x,t i (with parameters x, t) where L is a purely affine function. Then the cumulative probability distribution (cdf) for the minimum on this ith segment, i.e. the quantity Q i is given by wheres = (s, s, ..., s) andμ the vector given by E g (ȳ) + tL x−ȳ t evaluated at the discrete set of points.
Furthermore, the cdf of a potential minimum that occurs at the jth of the N + 1 vertices of L is given by Remark 5.6. The case of the Gaussian stochastic process includes the deterministic case, i.e. when p i = 1 for some i dependent on x and t and p j = 0 for j i.
Proof of Theorem 5.4. We consider an initial condition given by the discretized Gaussian stochastic process g Ñ (x). The key feature is that, using our methods, this problem of finding the minimum over the entire interval (x − t, x + t) is reduced to finding a minimum over n points, n i of which are on each segment, and N + 1 points that are on the vertices of L. Given an x and t, these n points are fixed.
Since we have a joint probability density for the values of this function at any of these points, it is a matter of integrating over the appropriate region(s) to obtain the desired probabilities. From there we can calculate not only the expectation value Ew(x, t) but the entire distribution of the solution w. To this end, let Y j n j=1 be a set of random variables with a multivariate probability density given by where A i := −1 i and i is the covariance matrix of the random variables in the subpartition ζ i , and x,μ ∈ R n i are vectors. Here the components of µ and x are related by µ i = tL x i −y i t + Eg (y i ). Thus, f contains all the information we will need to compute the minimum along the ith segment of L.
For simplicity, we write out the distribution over the points n i,2 , ..., n i,n i to have values less than or equal to s i . This is expressed as (5.11) Note we have supressed the first index in Y i, j (which would be i) in these quantities over segment i for ease of notation. To find the probability that a minimum on one segment is below a value s, we can use the result (5.11) and must all sum over all the (mutually exclusive) events that each random variable Y i is a minimum as follows where we have (n i − 1)−fold integrals from −∞ to s on the right-hand side of (5.12). A similar construction then yields the result (5.6). We now want to use equation (5.1) from Theorem 5.1, which states the solution w (x, t) takes the value of L at any point where the minimum is achieved at a vertex of g, and the value g when achieved at a vertex of L. By accounting for both the possibility of a minimum at one of the N segments or at one of the vertices of L, we can write not only the expectation value of the solution at a given point, but the entire distribution. The expectation value is consequently i.e., it is given by the average over these possibilities. The event of a minimum at a vertex of L may be more than a set of measure zero at the discrete level, but one might expect convergence to zero as we take the limitÑ → ∞. The situation in regards to these two cases is illustrated in Figure 4. Here we display integrated Brownian motion, but these arguments hold for any stochastic process: (a) In most cases, none of the points z j i coincide with the vertices of the function L, so the minimum on each segment is well-defined and one computes E {w} as described below; (b) Occasionally the points z j i may intersect {x − m k t} for some i, j, k but this is a set of measure zero and can be ignored.
For a specific realization of the stochastic process gÑ (x), the values of the solution w are within the range {c i }. The solution w takes the value c N+1−i when the minimum is achieved along the ith segment. If the minimum is achieved at a vertex of L, the expectation of the solution is zero by assumption, so the second set of terms drops out. Remark 5.7. As an aside, we note that one can transform the integrals to a different coordinate system and simplify the exponentials to only square terms. However, the domain of integration is transformed in such a way that the integration is nontrivial and results in a nested sequence of integrated error functions. By construction, the matrix A is symmetric and real, so the eigenvalues are real there exists a matrix U i such that U T AU = D, where D is the diagonal matrix consisting of the eigenvalues λ n i . Any repeated eigenvalues are assigned to {λ i } as needed, i.e. λ k = λ k+1 is permitted. Furthermore, we know U −1 = U T , so we can write for a unitary matrix U so that |U| 2 = 1. Hence, given any vector µ and a real, symmetric matrix A, we can write Thus, one has f z j , λ j = (2π) −n/2 |Σ| − 1 over the transformed domain Ω (rotated and scaled from the original region [−∞, s] n i ).

Exact and approximate results for the density of shocks
We now consider the density of shocks in the solution w (x, t), i.e. a spatial coordinate x where This was discussed in the special case H (p) = |p| in Section 3. Recall from Section 4 the notation I x,t i = (r i,1 , r i+1,1 ) to track the intervals along which L has slope d i . From the arguments above, it is clear that at point x where a shock occurs, the minimizer y * (x, t) jumps from one segment of L to another. By continuity, we can assume that for sufficiently small ε > 0, y * (x − ε, t) is located within the same segment in the limit. In other words, one writes Let ∆x > 0 and consider the following. Suppose that y 1 := y * (x − ∆x, t) is located where L has slope d 1 and y 2 := y * (x, t) where L has slope d 2 . When we shift the spatial coordinate by amount ∆x, the value of the quantity to be minimized changes by R 1 = −d 1 ∆x at y 1 and by R 2 = −d 2 ∆x at y 2 . Thus, the net change is a decrease (d 2 − d 1 ) ∆x at y 2 relative to y 1 . This means a shock from this limit can only occur when d 2 > d 1 . This argument is illustrated in Figure 5. If we let ∆x be negative, i.e. we consider the points x and x + ∆x instead of x − ∆x and x, we can have a shock for the case d 1 > d 2 . We now want to find an expression for the probability density of such shocks.
To be more precise, we let a := ∆x (d 2 − d 1 ) . We want to compute the probability that the minimum shifts from one segment to another. First consider the simpler problem with only two segments, and the probability of the minimum shifting from a particular point on the first segment to a particular point on the second segment. To illustrate this in a simple way, consider the points r 1,2 and r 2,2 among the n 1 + n 2 + 1 points in the system. We also use the notation that the random variable Y i, j corresponds to the point r i, j , supressing the subscript r. We compute the probability that the variable Y 1,2 associated with the point r 1,2 is at least s, Y 2,2 is in the range between s and s + a, and Y 1,i , Y 2,i , and Y 3,1 are all at least s + a for all i 2 as follows P Y 1,2 = s 1,2 , Y 2,2 ∈ s 1,2 , s 1,2 + a , Y j,i ≥ s 1,2 + a for all i 2, 1 ≤ j ≤ 3 where f is the joint distribution for the variables s 1,1 , ..., s 3,1 at the n 1 + n 2 + 1 points, and we have fixed s 1,2 . Now, we want to divide by a and take the limit. When we differentiate with respect to x, the integral with respect to s 2,2 (in the (n 1 + 1)−th argument of f ) drops out due to the Fundamental Theorem of Calculus, assuming sufficient continuity properties. The probability of transitioning from a minimum from one segment to the other can be written explictly as follows (1,2) at (x, t) and min at (2,2) at (x + ∆x, t)} (2,2) ds u,v i.e. the s 2,2 entry is evaluated at s 1,2 in f . This can be generalized to computing the probability of transition from one segment to another, which will entail n i n j terms. These terms correspond to a jump from a minimum at s attained at one of the n i possible points from segment i to one of the n j possible points on segment j. Now we extend this computation from one point to another, to all points in a segment i to another segment j. Let that is, the probability that the minimum on segment i equals a given value s, that the minimum on a different segment j falls slightly above this same value s, and that the minimum on all of the other segments excluding i and j is above the value s + a. Then, as x is changed an infinitesimal amount, the value of the solution w (x, t) will flip from d 1 to d 2 . In the limit a ↓ 0 (or equivalently, ∆x ↓ 0), the density of shocks is obtained. One may write an exact expressiom and take this limit to obtain ds m,n f s 1,1 , ..., s N+1,1 | s i,k =s j,k =s . (6.6) for the density of any transitions of a minimum between segments i and j. The joint probability distribution function f is as outlined in (5.8).

Figure 5.
We are interested in conditions under which the global minimizer y * (x, t) changes from one segment to another, as this is the only case for which the solution profile w (x, t) = L (y * (x, t)) will change, resulting in a shock. Consider for ease of notation the case for which L has only two slopes and take an initial condition g that has local minima at y 1 and y 2 and rapidly increases elsewhere. We sketch L x−∆x−y t and L x−y t , and want the minimum to change in the limit ∆x → 0. For simplicity, in this illustration we consider the case N = 2, i.e. where L has only two slopes. In order for the minimum to switch from y 1 to y 2 , the relative change of the argument at y 2 compared to y 1 , given by (d 2 − d 1 ) ∆x, must be negative.
One can diagonalize this matrix. Indeed, denoting the eigenvalues of A by {λ m } n m=1 , A can thus be diagonalized by U T A U = D, where the diagonal matrix D is given by D jk = δ jk λ j . We may then rewrite the integrand f (·) in (6.6) as n−2 l=1 e −λ l z 2 l and have, similar to the proof of Theorem 5.4: where the transformed domains of integration depends upon k and k . The density of transitions, in a range (s, s + ds) can be expressed using these results. The integration over the domain Ω may appear complicated due to the high dimension of the space and the nested nature of the integrals, but one can make some simplifications. For example, as is further illustrated in the next section, the eigenvalue spectrum of A falls off rapidly. Thus, only a (smallest) few of the eigenvalues are relevant and many of the integrals can be well-approximated as δ−functions about y l = 0.

Transitions between vertex and non-vertex points
Above, we have considered the question of transition from an overall minimum from a segment i to a different segment j. There is also the issue of a minimum obtained at one of the N segments of L replaced by a minimum at a vertex, and vice versa. We will quantify the probability of such an occurence.
First, we comment that without discretization, we would have a transition from w( g ( j, 1)). Now, with the discrete model, we will evaluate exactly the probability that if Y i,k = s, then Y j,1 ∈ (s, s − ∆x(d i + g ( j, 1))) and vice versa. We assume for the first case that d i + g ( j, 1) < 0.
Recall that the grid we place is for a fixed x and t. In this section, we will consider changing x (or similarly, t) by such a small amount ∆x that all of the original grid points within (x − t, x + t) remain, and no new grid points are introduced into the interval (x + ∆x − t, x + ∆x + t). Thus, the points r k,l remain unchanged. We also assume that the minimum attained at (x, t) occurs at the vertex r j,1 and make the approximation that L still nearly has a vertex at r j,1 despite the ∆x shift.
The first step is to identify the condition for local minima at vertices r j,1 to overtake Y i,k , the absolute minimum over R located at some kth point on the ith segment. We use Y new i,k to denote the new local minimum when L is shifted by ∆x. We calculate In other words, (6.8) is the requirement for a transition from a minimum at the vertex of g at r i,k to the vertex of L originally at r j,1 that has now shifted. Of course, we also need the condition The condition (6.8) is equivalent to Once we have established conditions (6.9−6.10), we can now calculate that probability using the discrete setup in which we deal with only the points r i,1 , r i,2 , ..., i.e. perform the calculation using Gaussian integrals as follows. Let P ∆x i, j be defined as the probability of a transition from segment i of L to the vertex point r j,1 of L, so that The calculation for the opposite transition, which can only occur when d i + g (r ( j, 1)) is positive, is

Applications to selected examples and open problems
The results of the preceding sections apply in broad generality for any kind of randomness that falls under the umbrella of Gaussian stochastic processes. In addition to the general results proven for an arbitrary Gaussian stochastic process, it is also interesting to analyze the behavior of the solution to the conservation law under a few specific cases. In particular, we will illustrate the results for the cases of Brownian motion and Brownian bridge. As these processes correspond to the initial condition g , we must also note that g (x) will be a Gaussian stochastic process for both of these cases, termed integrated Brownian motion and integrated Brownian bridge, respectively. For each of these cases, the covariance matrix takes a special form as was described in Section 2.

Brownian motion and Brownian bridge as initial conditions
In cases such as Brownian motion, Brownian bridge, or Ornstein-Uhlenbeck, one observes specific structure in the matrices and integrals entailed in potential applications of Theorem 5.4, which may make it somewhat more tractable to numeric computation than an arbitrary Gaussian stochastic process. For example, in some cases the relevant inverses of covariance matrices can be neglected aside from a narrow band of elements close to the diagonal. For example, consider an integrated Brownian motion S (t) and let {t i } n i=1 be given. Observe that the covariance matrix for integrated Brownian motion from (2.9) can be written as [27] i j so it is symmetric with respect to permuting i and j. One of the main quantities of interest in our results is of course the inverse A of this matrix. Upon taking N large, one observes an interesting property in that a substantial fraction of the diagonal elements of the matrix A are concentrated at one value. Numerical computation indicates that this number is close to 14.354... independent of N. Furthermore, the smallest of the eigenvalues start off at one value, with many concentrated near it, and then increase sharply, as illustrated in Figure 6. In addition, the numerically significant parts of the columns of the matrix A are repetitive. We denote the smallest eigenvalue by λ 1 . Due to the nature of the integrands taking the form e −λ n z 2 n dz n , there will be a rapid falloff as λ n increases. As a first-order approximation, one may assume the first N eigenvalues have the value λ 1 exactly, and ignore the rest as they will not make a meaningful contribution to the solution. From the minimization perspective, we are simply neglecting the terms corresponding to points at which the probability of attaining the minimum is very small. This will lead to a simplification of the form where one can set λ i = λ 1 for a number n of the eigenvalues, and ignore the subsequent ones. For the shock densities, one observes a similar phenomenon, and expects a transition probability of the form whereμ and A now correspond to the mean and inverse covariance matrix on segments i and i together. In Section 2, the covariance matrix for Brownian bridge in the range where it is defined is very similar to that of Brownian motion, despite some key differences in the processes. Outside of [0, T ] (under linear transformation), the variance of the integrated Brownian motion is constant. In applying this initial condition, one must set g (x) = g (x) = 0 outside of a finite range [0, T ]. In fact, one observes the same qualitative behavior for the eigenvalues of its inverse covariance matrix. This will lead to a result similar to (5.6) for the solution to the conservation law with Brownian bridge initial data. Another interesting application is that of a stationary Ornstein-Uhlenbeck process, i.e. with a covariance matrix invariant under translation, which may facilitate computations for further results. The advantage of this process is that it has constant variance, making it advantageous for some physical models where the randomness is fairly homogenous, rather than Brownian motion where the process is fixed at 0 and variance grows with time. Figure 6. When N is large, the diagonal elements of the inverse covariance matrix A are concentrated very close to a particular value which can be calculated as approximately 14.3... The eigenvalues are also concentrated near one value, and sharply increase. Larger eigenvalues can be neglected due to the asymptotic behavior of the error function.

Approximation to smooth flux
A classical result by Dafermos [8] utilizing shock techniques describes how solutions to conservation laws with smooth flux can be approximated by those of polygonal flux. Our theorems can be combined with Dafermos' work in order to use polygonal flux as a building block for an arbitrary smooth flux function with randomness of the form of any Gaussian stochastic process.
One may also take the limits of N → ∞ and make the partitions r i, j increasingly fine in our solution (5.6). In this case, one should obtain the result in [12] for which Burgers' equation is considered with Brownian initial conditions and exact solutions are computed. Our approach thus provides an extension of exact, closed-form results without relying on the exact properties of a particular flux function.

Appendix: results for Gaussian stochastic processes
Here we present sketches of the proof of several of the results we cited earlier in Section 2.
S (t n ) = a n1 R 1 + ... + a nn R m + µ n (8.1) We can write the integral as an approximate sum with interpolation points (r 1 , ..., r n ) given as r i := j Nt n , so that By definition, then i.e. the only difference in the identities in (8.3) is where one terminates the sum. Since W (t) is Gaussian stochastic by the part (a), we may write By substituting (8.4) into (8.3) and defining new constants we obtain the result that integrated Brownian motion is also a Gaussian stochastic process by using the Central Limit Theorem for functions ( [26], p. 399).
We do not prove (c) and (d) here as the ideas are similar. In particular, since a Brownian bridge is a linear combination of W (t) and W (T ), it can easily be expressed in terms of independent, identically distributed normal random variables.
Proof of Theorem 2.9. (a) We first consider the covariance matrix for times s and t. Since piecewise drift does not alter the covariance matrix, we assume without loss of generality that this term is zero. Letting W (t) be a Brownian motion, one observes The cross term, E t 0 W (r) dr E t 0 W (r ) dr , vanishes since E {W (t)} = 0 for Brownian motion. Using a discretization argument as in [27], one may also construct (8.5) from first principles. (b) Let Z (t) be an integrated Brownian bridge constructed as in Definition 2.6. As in part (a) we consider the sum approximation to the Brownian bridge assuming no drift, and consider first only two points 0 ≤ s ≤ t ≤ T . We make note of (2.9) and also note Cov W (T ) , One obtains similar results by (stochastically) reflecting the process to consider it on the interval [−T, T ] for symmetry. One can also obtain these results by discretizing the process and writing the approximation of the integral, then limiting to the continuum using the Central Limit Theorem for functions. We also note that although the process Z (s) is constant outside the interval [0, T ] (or [−T, T ] depending on the definition), Z (s) does not vanish for s > T with probability 1.