Deconvolving breath alcohol concentration from biosensor measured transdermal alcohol level under uncertainty: a Bayesian approach

The posterior distribution (PD) of random parameters in a distributed parameter-based population model for biosensor measured transdermal alcohol is estimated. The output of the model is transdermal alcohol concentration (TAC), which, via linear semigroup theory can be expressed as the convolution of blood or breath alcohol concentration (BAC or BrAC) with a filter that depends on the individual participant or subject, the biosensor hardware itself, and environmental conditions, all of which can be considered to be random under the presented framework. The distribution of the input to the model, the BAC or BrAC, is also sequentially estimated. A Bayesian approach is used to estimate the PD of the parameters conditioned on the population sample’s measured BrAC and TAC. We then use the PD for the parameters together with a weak form of the forward random diffusion model to deconvolve an individual subject’s BrAC conditioned on their measured TAC. Priors for the model are obtained from simultaneous temporal population observations of BrAC and TAC via deterministic or statistical methods. The requisite computations require finite dimensional approximation of the underlying state equation, which is achieved through standard finite element (i.e., Galerkin) techniques. The posteriors yield credible regions, which remove the need to calibrate the model to every individual, every sensor, and various environmental conditions. Consistency of the Bayesian estimators and convergence in distribution of the PDs computed based on the finite element model to those based on the underlying infinite dimensional model are established. Results of human subject data-based numerical studies demonstrating the efficacy of the approach are presented and discussed.


Introduction
Historically, researchers and clinicians interested in tracking alcohol consumption and metabolism in the field would require data from either a drinker's self-report or from having them use a breath alcohol analyzer. Because both methods require active participation by the subject, the data they produce are often plagued by inaccuracies. Self-report often leads to misrepresentation as (1) subjects may deviate from naturalistic behaviors due to the reporting requirement seeming unnatural, and (2) alcohol directly impairs subjects' ability to be an active participant [1]. Using a breath alcohol analyzer correctly requires specialized training and can produce erroneous measurements due to mouth alcohol and/or a reading based on a shallow breath by the subject. Dating back to the 1930's, ethanol, the type of alcohol in alcoholic beverages, has been known to be excreted from the human body through the skin [2][3][4][5]. This is due to the fact that water and ethanol are highly miscible [6] and the ethanol finds its way into all of the water in the body. More recently, this observation paved the way for the development of a device to measure the amount of alcohol excreted transdermally through the skin [7][8][9]. The benefits derived from such a device include the availability of near continuous measurements and the ability to collect them passively (i.e., without the active participation of the subject). This gives researchers and clinicians the potential to continuously observe naturalistic drinking behavior and patterns. There is also the possibility of making these devices available on the consumer market (e.g., wearable body system monitoring technology like Fitbits, Apple watches, etc.). In addition, the ideas we discuss here may also be applicable to the monitoring of other substances once the appropriate sensor hardware has been developed.
The challenge in using transdermal alcohol sensors is that they provide transdermal alcohol concentration (TAC), whereas alcohol researchers and clinicians have always based their studies and treatments on measurements of breath alcohol concentration (BrAC) and blood alcohol concentration (BAC). Thus, a means to reliably and accurately convert TAC to BrAC or BAC would be desirable. At levels up to approximately 0.08 (see, for example, [13,14]), BrAC correlates well with BAC via a simple linear relationship based on an empirical law known as Henry's law [10,11]: $\text{BAC} = \rho_{B/Br} \times \text{BrAC}$, where the constant $\rho_{B/Br}$ is known as the partition coefficient of ethanol in blood and breath.
More generally, according to Henry's law, when a liquid is in contact with a gas, the concentrations, $C_L$ and $C_G$, of a compound present in both the liquid and the gas will come to equilibrium according to the linear relationship $C_L = \rho_{L/G} C_G$, where the empirically determined constant $\rho_{L/G}$ is known as the partition coefficient for that compound in that liquid and gas. Not surprisingly, the partition coefficient, $\rho_{L/G}$, is temperature dependent, and its actual value will vary depending on the choice of units for $C_L$ and $C_G$. It has been shown (see, for example, [12]) that at 34°C the partition coefficient for ethanol in blood and air is $\rho_{B/A} = 2157 \pm 9.6$ for men and $\rho_{B/A} = 2195 \pm 10.9$ for women, whereas at 37°C it is $\rho_{B/A} = 1783 \pm 8.1$ for men and $\rho_{B/A} = 1830 \pm 7.8$ for women. Using a regression model, Jones [12] found that at 37°C the partition coefficient for ethanol in water and air is $\rho_{W/A} = 2133$, in blood and air is $\rho_{B/A} = 1756$, and between plasma (all of the components of blood with the exception of the oxygen carrying red blood cells) and air is $\rho_{P/A} = 2022$. All of these values are for the case when the concentration of ethanol in air is given in units of grams per liter, and in water, blood, or plasma in units of grams per deciliter. We note that it is generally accepted that a BrAC reading of 0.08 percent alcohol corresponds to 0.08 grams of ethanol per 210 liters of breath and a BAC of 0.08 grams of ethanol per 100 milliliters (equal to 1 deciliter (dL) or 0.1 liters (L)) of blood.
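As a small worked illustration of the unit conventions above, the following sketch computes the blood-to-breath concentration ratio implied by equating a BrAC of 0.08 grams per 210 liters of breath with a BAC of 0.08 grams per deciliter of blood, and applies the linear relationship $\text{BAC} = \rho \times \text{BrAC}$. The function names are hypothetical and chosen only for this example.

```python
def implied_blood_breath_ratio(brac_g_per_210l: float, bac_g_per_dl: float) -> float:
    """Ratio of blood to breath ethanol concentration, both converted to g/L."""
    breath_g_per_l = brac_g_per_210l / 210.0  # grams per liter of breath
    blood_g_per_l = bac_g_per_dl * 10.0       # 1 dL = 0.1 L
    return blood_g_per_l / breath_g_per_l


def bac_from_brac(brac: float, rho: float) -> float:
    """Linear Henry's-law-type conversion BAC = rho * BrAC.

    rho must be expressed in units consistent with those of brac and the
    desired BAC; it is an input here, not a recommended value.
    """
    return rho * brac


# A reading of 0.08 on both scales implies a ratio of roughly 2100 to 1.
ratio = implied_blood_breath_ratio(0.08, 0.08)
print(f"implied blood:breath ratio is about {ratio:.0f}")
```

The ratio depends only on the two unit conventions, which is why the same numerical reading can be used on both scales.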
The correlation between TAC and BrAC/BAC, on the other hand, can vary due to a number of confounding factors. These factors include, but are not limited to, stable features of the skin like its thickness, tortuosity, and porosity, particularly as they apply to the epidermal layer of the skin, which does not have an active blood supply. Environmental factors such as ambient temperature and humidity can also affect both perspiration and vasodilation, and can thus alter skin conductance, blood flow, the amount of alcohol passing below the skin in the blood, and the amount and rate of alcohol diffusing through the skin. One would also expect there to be manufacturing and operational variations among different TAC sensors.
Earlier attempts to investigate the relationship between TAC and BrAC/BAC have used deterministic models [15][16][17][18][19][20][21]. Some utilized regression-based models [16], whereas others utilized first principles physics-based models that on occasion included modeling the transport of alcohol all the way from ingestion to excretion through the skin [22,23]. In our group's initial efforts, we modeled the transport of alcohol from the blood in the dermal layer through the epidermal layer and its eventual measurement by the sensor using a one-dimensional diffusion equation [15,21]. The parameters in the diffusion equation model then had to be fit or tuned (i.e., calibrated) to each individual subject, the environmental conditions, and the device through the use of simultaneous BrAC/TAC training data collected in the laboratory or clinic through a procedure known as an alcohol challenge. Once the model was fit, it could then be used to deconvolve BrAC from TAC collected in the field. This two-pass approach and the related studies were relatively successful [15, 19-21, 24, 25]. However, this calibration procedure is quite burdensome to researchers, clinicians, subjects and patients, and because the models were fit to a single uni-modal drinking episode, unaccounted for variation and uncertainty in the relationship between BrAC and TAC frequently arose, making it difficult to accurately convert TAC collected in the field to BrAC [26,27].
More recently, to eliminate the need for calibration, deconvolution of BrAC from TAC was effected using population models fit to BrAC/TAC training data from drinking episodes across a cohort of subjects, devices, and environmental conditions [24,25,28]. These population models took the form of the deterministic transport models but now the parameters appearing in the model equations were considered to be random. Then in fitting the models, instead of estimating the actual values of these parameters, it was their joint distributions that were estimated. Once the models were fit, they could be used to deconvolve an estimate of the BrAC input, and by making use of the distribution of the population parameters, conservative error bands could also be generated which quantified the uncertainty in the estimated BrAC [24,25]. The results in these studies were based on a naive pooled data statistical model and a non-linear least squares estimator.
In this paper, we seek to build on the approach described in the previous paragraph by now using a Bayesian approach to account for the underlying uncertainty and variation in the alcohol diffusion and measurement process. We obtain posterior distributions for the transport model parameters conditioned on the training BrAC/TAC data and regularized by prior distributions based on deterministic fits. Being Bayesian based, our approach yields credible sets for the estimated parameters and what we shall refer to as conservative credible or error bands for the deconvolved estimated BrAC. What is meant by the term conservative credible band will be made precise later.
An outline of the remainder of the paper is as follows. In the next section of the paper we provide a description of our method including a derivation of a new abstract parabolic hybrid PDE/ODE model for the transdermal transport of alcohol through the epidermal layer of the skin and its capture in the vapor collection bay of the sensor. Then using linear semigroup theory we obtain an input/output model in the form of a discrete time convolution. A discussion of finite dimensional approximation and convergence issues related to the use of our model to carry out the requisite computations is also included. Then in the results section of the paper we first construct our Bayesian estimator and present two theoretical results related to it: convergence of the finite dimensional approximation and consistency. We then show how our population model based on Bayesian estimates for the random parameters can be used as part of a deconvolution scheme that yields estimated BrAC curves and conservative credible or error bands from a biosensor provided TAC signal. In this section we also present and discuss a sample of our numerical findings demonstrating the efficacy of our approach. Our numerical studies were based on human subject data collected in the Luczak laboratory in the Department of Psychology at USC. A final section contains some discussion of our theoretical and numerical results along with a few concluding remarks and avenues for possible future research.

A distributed parameter model for the transdermal transport of alcohol
As in [21] and [24], and making use of an idea recently introduced in [28], we model the alcohol biosensor problem described in Section 1 using a one dimensional diffusion equation to describe alcohol transport through the epidermal layer of the skin coupled with an inflow/outflow compartment model to describe the perspiration vapor collection chamber of the TAC biosensor.
The epidermal layer of the skin sits atop the dermal layer. The dermal layer has an active blood supply while the epidermal layer does not. The latter consists of both dead (the stratum corneum layer which is closest to the surface) and living (the deeper layers closer to the dermal layer) cells surrounded by interstitial fluid. Not having an active blood supply, the cells in the epidermal layer obtain nourishment primarily from O 2 that diffuses in from the environment beyond the skin.
The SCRAM TAC biosensor (see fig. 1 in section 3.4 below) has a perspiration vapor collection chamber on the bottom of the sensor that sits atop, and is in direct contact with, the stratum corneum layer of the skin's epidermal layer. Perspiration in vapor form collects in the chamber. A small pump extracts a sample of the vapor from the collection chamber approximately once every 30 minutes. This sample is then electro-chemically analyzed based on an oxidation-reduction (redox) reaction in much the same way that a fuel cell produces a current (and heat and water) from hydrogen and oxygen. In the TAC sensor, ethanol molecules in the sample are oxidized producing electrons in the form of an electrical current. This current is converted into the TAC measurement based on an a priori bench calibration.
To make this more precise, we let $\Lambda$ denote the thickness of the epidermal layer (units: cm) of the skin at the location of the sensor and let $\eta$ denote the depth in the epidermal layer (units: cm), $0 \le \eta \le \Lambda$, with $\eta = 0$ denoting the skin surface and $\eta = \Lambda$ denoting the boundary between the epidermal and dermal layers. Let $t$ denote time (units: hrs) and let $x(t, \eta)$ denote the concentration of ethanol at time $t$ and depth $\eta$ in the epidermal layer (units: grams per milliliter of interstitial fluid). Let $w(t)$ denote the concentration of ethanol in the TAC sensor collection chamber at time $t$ (units: grams per milliliter of air), and let $u(t)$ denote the BrAC at time $t$ (units: grams per milliliter of air). Let $y(t)$ denote the TAC at time $t$ (units: grams per milliliter of air), and let $w_0$ (units: grams per milliliter of air) and $\varphi_0$ (units: grams per milliliter of interstitial fluid) denote the initial conditions for $w$ and $x$, respectively. We will assume that there is no ethanol in either the epidermal layer or the TAC biosensor collection chamber at time $t = 0$, so $w_0 = 0$ and $\varphi_0 = 0$. Let $T$ denote the duration of the drinking episode (units: hrs). With these definitions, our model takes the form

$$
\begin{aligned}
\frac{\partial x}{\partial t}(t, \eta) &= \alpha \frac{\partial^2 x}{\partial \eta^2}(t, \eta), && 0 < \eta < \Lambda,\ 0 < t < T, \\
\frac{dw}{dt}(t) &= \gamma \alpha \frac{\partial x}{\partial \eta}(t, 0) - \delta w(t), && 0 < t < T, \\
x(t, 0) &= \rho_{P/A}\, w(t), && 0 < t < T, \\
\alpha \frac{\partial x}{\partial \eta}(t, \Lambda) &= \beta \rho_{P/A}\, u(t), && 0 < t < T, \\
w(0) &= w_0, \quad x(0, \eta) = \varphi_0(\eta), && 0 < \eta < \Lambda, \\
y(t) &= \theta w(t), && 0 < t < T,
\end{aligned}
\tag{2.1}
$$

where $\alpha > 0$ denotes the effective diffusivity of ethanol in the interstitial fluid in the epidermal layer (units: cm$^2$/hr), $\beta > 0$ denotes the effective linear flow rate at which capillary blood plasma from the dermal layer replenishes the interstitial fluid in the epidermal layer (units: cm/hr), and $\rho_{P/A}$ denotes the partition coefficient for ethanol in plasma and air with respect to the concentration units of grams per milliliter of plasma and grams per milliliter of air at 37°C (normal body temperature).
In modeling the TAC collection chamber, we assume that the inflow of ethanol is proportional to the flux out of (i.e., from right to left) the epidermal layer at the surface of the skin (i.e., at $\eta = 0$), $\alpha \frac{\partial x}{\partial \eta}(t, 0)$, with constant of proportionality $\gamma$ (units: cm$^{-1}$), and the outflow is simply proportional to the concentration of ethanol in the collection chamber (i.e., a simple linear model) with constant of proportionality $\delta$ (units: hr$^{-1}$). Finally, the output gain, $\theta$, represents the bench calibration factor for the TAC sensor that converts the concentration of ethanol in the collection chamber, $w(t)$, into the TAC measurement, $y(t)$.

Since the thickness of the epidermal layer, $\Lambda$, is in general difficult to measure and difficult to estimate computationally because it determines the spatial domain of the diffusion equation, it is desirable to transform the system eq. (2.1) to a domain of fixed length, $\Lambda = 1$. We make the standard change of variable $\eta \to \eta/\Lambda$, thus rendering $\eta$ dimensionless. For $t \ge 0$, we also set $w(t) \to \rho_{P/A}\, w(t)$. Then, recalling our assumption of zero initial conditions, the following hybrid ordinary-partial differential equation input/output system results

$$
\begin{aligned}
\frac{\partial x}{\partial t}(t, \eta) &= q_1 \frac{\partial^2 x}{\partial \eta^2}(t, \eta), && 0 < \eta < 1,\ t > 0, \\
\frac{dw}{dt}(t) &= q_3 \frac{\partial x}{\partial \eta}(t, 0) - q_4 w(t), && t > 0, \\
x(t, 0) &= w(t), && t > 0, \\
q_1 \frac{\partial x}{\partial \eta}(t, 1) &= q_2 u(t), && t > 0, \\
w(0) &= 0, \quad x(0, \eta) = 0, && 0 < \eta < 1, \\
y(t) &= w(t), && t \ge 0,
\end{aligned}
\tag{2.2}
$$

where $q_1 = \alpha/\Lambda^2$, $q_2 = \theta \beta \rho_{P/A}/\Lambda$, $q_3 = \gamma \alpha/\Lambda$, and $q_4 = \delta$. We note that since the only observable and observed quantities are BrAC, $u$, and TAC, $y$, the physiological interpretations of the variables and parameters in between that define our model in the form of an input/output map from BrAC to TAC are of little interest to us. Although we have relied on first principles modeling to derive the system of equations given in eq. (2.2), our motivation was not to gain a deeper understanding of the transdermal transport of ethanol.
Rather, it was to be able to keep the dimension of the space of unknown parameters as low as possible by capturing the underlying physics and physiology of the transport process, albeit in a greatly simplified form. Indeed, our primary objective here is to first fit the parameters (or, more precisely, their distributions) in the model to observed input/output BrAC/TAC training pairs and to then use the resulting population model to obtain an estimate for the BrAC and associated error bars corresponding to a given TAC signal collected in the field from a member of the cohort or population that provided the data which was used to train the model.
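The parameter reduction from the physical quantities in eq. (2.1) to the four lumped parameters of eq. (2.2) can be sketched as follows. This is only an illustration of the stated formulas for $q_1$ through $q_4$; the function and class names are hypothetical, and any numerical values passed in are placeholders rather than measured quantities.

```python
from typing import NamedTuple


class ModelParameters(NamedTuple):
    """Lumped parameters of the rescaled system (2.2)."""
    q1: float
    q2: float
    q3: float
    q4: float


def to_dimensionless(alpha: float, beta: float, gamma: float, delta: float,
                     theta: float, rho_pa: float, thickness: float) -> ModelParameters:
    """Map the physical parameters of (2.1) to (q1, q2, q3, q4).

    thickness plays the role of Lambda, the epidermal layer thickness (cm).
    """
    if alpha <= 0 or beta <= 0 or thickness <= 0:
        raise ValueError("alpha, beta and thickness must be positive")
    return ModelParameters(
        q1=alpha / thickness**2,
        q2=theta * beta * rho_pa / thickness,
        q3=gamma * alpha / thickness,
        q4=delta,
    )
```

Because $\Lambda$ enters only through these combinations, it cannot be identified separately from $\alpha$, $\beta$, and $\gamma$ using BrAC/TAC data alone, which is consistent with treating the $q_i$ as the quantities to be estimated.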
Let q = [q 1 , q 2 ] T denote the unknown, un-measurable, and, in general, subject-dependent physiological parameters. The parameters q 3 and q 4 are device (i.e., hardware) dependent parameters and as such, we consider them to be bench-measurable empirically in the lab. We do note however, with simple changes of variable, the theory and methods we develop below apply, and their distributions could also be estimated along with those of q 1 and q 2 with the same techniques we use here to estimate the distributions of q 1 and q 2 . In addition, q 3 and q 4 could also be estimated using a deterministic scheme such as a regularized nonlinear least squares approach. For clarity and ease of exposition, we will focus our attention here on the development of a population model for a cohort of subjects by estimating the distribution of the un-measurable physiological parameters q 1 and q 2 .
Note that the Sobolev Embedding Theorem [29] yields that the norm induced by the $V$ inner product is equivalent to the standard $H^1$ norm on $V$. It is not difficult to show that $V$ is densely and continuously embedded in $H_q$ and that we have the Gelfand triple of dense and continuous embeddings $V \hookrightarrow H_q \hookrightarrow V^*$.

Boundedness
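There exists a constant $\alpha_0 > 0$ such that $|a(q, \psi_1, \psi_2)| \le \alpha_0 \|\psi_1\|_V \|\psi_2\|_V$ for all $\psi_1, \psi_2 \in V$ and $q \in Q$.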

3. Continuity. For all $\psi_1, \psi_2 \in V$, we have that $q \mapsto a(q, \psi_1, \psi_2)$ is a continuous mapping from $Q$ into $\mathbb{R}$.
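Let $P_q^n : H_q \to V^n$ denote the orthogonal projection of $H_q$ onto $V^n$ along $(V^n)^\perp$. Standard arguments from the theory of splines (see, for example, [33]) can be used to argue that $\|P_q^n(\theta, \psi) - (\theta, \psi)\|_q \to 0$ as $n \to \infty$ for all $(\theta, \psi) \in H_q$, and that $\|P_q^n \varphi - \varphi\|_V \to 0$ as $n \to \infty$ for all $\varphi \in V$.

For $n = 1, 2, \dots$ and $k = 0, 1, 2, \dots$ we set $x_k^n(\eta) = \sum_{i=0}^{n} X_i^{n,k} \varphi_i^n(\eta)$, and we approximate the operator $A(q)$ using a Galerkin approach. That is, we define the operator $A^n(q) \in \mathcal{L}(V^n, V^n)$ by restricting the form $a(q, \cdot, \cdot)$ to $V^n \times V^n$. We then set the discrete-time operators

$$
\hat{A}^n(q) = e^{A^n(q)\tau}, \qquad \hat{B}^n(q) = \left(I^n - \hat{A}^n(q)\right)\left\{(0, \xi) - A^n(q)^{-1} P_q^n \left(\tfrac{q_3 q_2}{q_1}, 0\right)\right\}, \tag{2.9}
$$

where $\hat{B}^n(q) \in \mathcal{L}(\mathbb{R}, V^n) = V^n$. The matrix representations for these operators with respect to the basis $\{\varphi_i^n\}_{i=0}^{n}$ are then given in terms of the Galerkin matrix $M^n(q)$. Letting $C^n = [1, 0, 0, \dots, 0] \in \mathbb{R}^{1 \times (n+1)}$, we consider the discrete time dynamical system in $V^n$ given by

$$
x_{k+1}^n = \hat{A}^n(q)\, x_k^n + \hat{B}^n(q)\, u_k, \quad k = 0, 1, 2, \dots, \qquad x_0^n = 0, \tag{2.10}
$$

or equivalently in $\mathbb{R}^{n+1}$ given by the system $X^{n,k+1} = \hat{A}^n(q) X^{n,k} + \hat{B}^n(q) u_k$, $y_k^n = C^n X^{n,k}$, with $X^{n,0} = [0, 0, \dots, 0]^T \in \mathbb{R}^{n+1}$. From this we obtain that

$$
y_k^n = \sum_{j=0}^{k-1} C^n \hat{A}^n(q)^{k-j-1} \hat{B}^n(q)\, u_j, \quad k = 0, 1, 2, \dots
$$

Using linear semigroup theory (see, for example, [21,34,35]) and in particular the Trotter-Kato semigroup approximation theorem (see, for example, [36] and [31]), the following results can be established (for proof, see [32]). With $\hat{A}^n(q)$ and $\hat{B}^n(q)$ as in eq. (2.9), we have that $\|\hat{A}^n(q) P_q^n(\theta, \psi) - \hat{A}(q)(\theta, \psi)\|_q \to 0$ as $n \to \infty$ for all $(\theta, \psi) \in H_q$, that $\|\hat{A}^n(q) P_q^n \varphi - \hat{A}(q) \varphi\|_V \to 0$ as $n \to \infty$ for all $\varphi \in V$, and that $\|\hat{B}^n(q) - \hat{B}(q)\|_V \to 0$ as $n \to \infty$, with the convergence in all cases uniform in $q$ for $q \in Q$.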
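Finally, we will assume that we have training data, $\{(u_k^i, y_k^i)\}_{k=0}^{K}$, $i = 1, \dots, R$, from $R$ participants or subjects, where without loss of generality (i.e., by padding with zeros) we have assumed that all training input/output datasets have the same number, $K$, of observations. In this case, for $i = 1, \dots, R$ we have

$$
y_k^i = \sum_{j=0}^{k-1} h_{k-j-1}(q)\, u_j^i, \qquad y_k^{n,i} = \sum_{j=0}^{k-1} h_{k-j-1}^n(q)\, u_j^i, \qquad k = 0, 1, \dots, K, \tag{2.12}
$$

where $h_j(q) = C \hat{A}(q)^j \hat{B}(q) \in \mathbb{R}$ and $h_j^n(q) = C^n \hat{A}^n(q)^j \hat{B}^n(q) \in \mathbb{R}$, for $j = 0, 1, \dots, K - 1$.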
This formulation facilitates the estimation of the population parameters q. If one wishes to find the parameters q for a specific individual, the methods outlined in Section 2.2 can still be applied by letting the indices i = 1, . . . , R refer to different measured BrAC/TAC events each with k = 0, . . . , K denoting the measurement times for the desired individual subject.
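The passage from the hybrid system (2.2) to a discrete-time convolution kernel can be illustrated numerically. The sketch below is not the linear-spline Galerkin scheme used in the paper: as a simpler stand-in it semi-discretizes (2.2) with finite differences on a uniform grid, applies a zero-order hold with sampling time `tau`, and evaluates the kernel $h_j = C \hat{A}^j \hat{B}$. All function names and the parameter values in the usage lines are assumptions made for this example.

```python
import numpy as np
from scipy.linalg import expm


def semidiscrete_system(q, n):
    """Finite-difference semi-discretization of (2.2) on n + 1 nodes.

    State X = [x(t, eta_0), ..., x(t, eta_n)], eta_i = i / n, and X[0] = w(t)
    because x(t, 0) = w(t).
    """
    q1, q2, q3, q4 = q
    h = 1.0 / n
    A = np.zeros((n + 1, n + 1))
    B = np.zeros(n + 1)
    # Collection chamber: dw/dt = q3 * dx/deta(t, 0) - q4 * w (one-sided difference).
    A[0, 0] = -q3 / h - q4
    A[0, 1] = q3 / h
    # Interior nodes: dx/dt = q1 * d2x/deta2 (central differences).
    for i in range(1, n):
        A[i, i - 1] = q1 / h**2
        A[i, i] = -2.0 * q1 / h**2
        A[i, i + 1] = q1 / h**2
    # Dermal boundary: q1 * dx/deta(t, 1) = q2 * u, imposed through a ghost node.
    A[n, n - 1] = 2.0 * q1 / h**2
    A[n, n] = -2.0 * q1 / h**2
    B[n] = 2.0 * q2 / h
    C = np.zeros(n + 1)
    C[0] = 1.0  # y(t) = w(t)
    return A, B, C


def zero_order_hold(A, B, tau):
    """Exact sampling of dX/dt = A X + B u for inputs held constant over tau."""
    Ad = expm(A * tau)
    Bd = np.linalg.solve(A, (Ad - np.eye(A.shape[0])) @ B)
    return Ad, Bd


def kernel(q, n, tau, K):
    """Convolution kernel h_j = C Ad^j Bd for j = 0, ..., K - 1."""
    A, B, C = semidiscrete_system(q, n)
    Ad, Bd = zero_order_hold(A, B, tau)
    h = np.empty(K)
    v = Bd.copy()
    for j in range(K):
        h[j] = C @ v
        v = Ad @ v
    return h


def convolve_tac(h, u):
    """y_k = sum_{j=0}^{k-1} h_{k-j-1} u_j with zero initial state."""
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        y[k] = np.dot(h[:k][::-1], u[:k])
    return y


# Hypothetical parameter values, for illustration only.
q_demo = (1.0, 0.5, 2.0, 1.5)
h_demo = kernel(q_demo, n=16, tau=0.25, K=40)
```

For a constant input the steady state of this scheme gives $w = q_3 q_2/(q_1 q_4)$, the same combination $q_3 q_2/q_1$ that appears in eq. (2.9), which provides a convenient sanity check on an implementation.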

Bayesian estimation of dynamical system parameters
In this section we develop a Bayesian framework to estimate the unknown parameters $q = [q_1, q_2]^T$ in the system eq. (2.2). To illustrate our approach, for simplicity but without loss of generality, we have assumed that the sensor parameters $q_3$ and $q_4$ have been bench-measured and are therefore known, and we concentrate our effort on estimating the physiological subject-dependent parameters $q_1$ and $q_2$. All of what follows below can easily be extended to estimating all four of the parameters in the model. Our underlying statistical model incorporating noise is based on the observation of $y_j^i$ as in eq. (2.12) and is given by

$$
V_j^i = y_j^i(q) + \varepsilon_j^i, \quad j = 0, 1, \dots, K, \quad i = 1, \dots, R, \tag{2.13}
$$

where the $V_j^i$ are our measured TAC values, and the $\varepsilon_j^i$ are the i.i.d. noise terms corresponding to person $i$ at time $j\tau$ with $\sigma > 0$, $\tau > 0$. Commonly, as we will assume in Section 3.2 and beyond, $\varepsilon_j^i \sim N(0, \sigma^2)$. In order to be able to carry out the requisite computations, we consider the approximating statistical model based on eq. (2.12)

$$
V_j^i = y_j^{n,i}(q) + \varepsilon_j^i, \quad j = 0, 1, \dots, K, \quad i = 1, \dots, R, \tag{2.14}
$$

where once again the $V_j^i$'s are assumed to be the measured TAC values. We consider $q$ to be a random vector on some probability space $\{\Omega, \Sigma, P\}$ with support $Q$ and assume that the prior distribution of $q$ is given by the push forward measure $\pi_0$. That is, for $A \subset Q$, $\pi_0(A) = P(\{\omega \in \Omega : q(\omega) \in A\})$. We assume independence across both $i$ (individuals) and $j$ (sampling times for each individual); for each $i$ and $j$ we have $V_j^i - y_j^i = \varepsilon_j^i$ (commonly distributed $N(0, \sigma^2)$) and similarly, $V_j^i - y_j^{n,i} = \varepsilon_j^i$ (again commonly distributed $N(0, \sigma^2)$).
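Letting $\varphi$ denote the density of the $\varepsilon_j^i$'s, for $q \in Q$ the likelihood and the approximating likelihood functions are given respectively by (see, for example, [37][38][39][40])

$$
L(q \mid \{V_j^i\}) = \prod_{i=1}^{R} \prod_{j=0}^{K} \varphi\left(V_j^i - y_j^i(q)\right), \qquad L^n(q \mid \{V_j^i\}) = \prod_{i=1}^{R} \prod_{j=0}^{K} \varphi\left(V_j^i - y_j^{n,i}(q)\right).
$$

An application of Bayes' Theorem (see, for example, Theorem 1.31 in [41]) yields that the posterior distribution of $q$, or the conditional distribution of $q$ conditioned on the data, $\{V_j^i\}$, is a push forward measure $\pi = \pi(\cdot \mid \{V_j^i\})$ that is absolutely continuous with respect to $\pi_0$ and whose Radon-Nikodym derivative, or density, for $q \in Q$ is given by

$$
\frac{d\pi}{d\pi_0}(q) = \frac{1}{Z} L(q \mid \{V_j^i\}), \qquad Z = \int_Q L(q \mid \{V_j^i\})\, d\pi_0(q). \tag{2.16}
$$

In this way, for $A \subset Q$, we have $P(q \in A \mid \{V_j^i\}) = \pi(A \mid \{V_j^i\}) = \frac{1}{Z} \int_A L(q \mid \{V_j^i\})\, d\pi_0(q)$. If, in addition, we have $\pi_0 \ll \lambda$ with density $\frac{d\pi_0}{d\lambda} = f_0$, where $\lambda$ denotes Lebesgue measure on $Q$, then $\pi \ll \lambda$ with conditional density $f$ given by

$$
f(q \mid \{V_j^i\}) = \frac{1}{Z} L(q \mid \{V_j^i\})\, f_0(q).
$$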
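When the prior has a Lebesgue density, the posterior density for $q = [q_1, q_2]^T$ can be evaluated directly on a grid, since $Q$ is only two-dimensional. The following sketch does this for Gaussian noise and a uniform prior on a rectangle. Several things here are assumptions made purely for illustration: the uniform prior, the synthetic data, and in particular `toy_kernel`, a decaying exponential that stands in for the kernel $h_j(q)$ and is not the diffusion model of eq. (2.2).

```python
import numpy as np


def toy_kernel(q, tau, K):
    """Hypothetical stand-in for h_j(q); NOT the transdermal diffusion kernel."""
    q1, q2 = q
    return q2 * np.exp(-q1 * tau * (np.arange(K) + 1.0))


def forward(q, u, tau):
    """y_k = sum_{j<k} h_{k-j-1}(q) u_j, the convolution form of eq. (2.12)."""
    K = len(u)
    h = toy_kernel(q, tau, K)
    y = np.zeros(K)
    for k in range(1, K):
        y[k] = np.dot(h[:k][::-1], u[:k])
    return y


def grid_posterior(q1_grid, q2_grid, inputs, data, tau, sigma):
    """Posterior density of q on a rectangular grid under a uniform prior."""
    log_post = np.zeros((len(q1_grid), len(q2_grid)))
    for a, q1 in enumerate(q1_grid):
        for b, q2 in enumerate(q2_grid):
            for u, v in zip(inputs, data):
                resid = v - forward((q1, q2), u, tau)
                # Gaussian log-likelihood up to a constant that cancels in Z.
                log_post[a, b] += -0.5 * np.sum(resid**2) / sigma**2
    log_post -= log_post.max()  # avoid underflow before exponentiating
    post = np.exp(log_post)
    cell = (q1_grid[1] - q1_grid[0]) * (q2_grid[1] - q2_grid[0])
    return post / (post.sum() * cell)  # Riemann-sum normalisation (the role of Z)


# Synthetic example: three "subjects" sharing the same true parameters.
rng = np.random.default_rng(0)
tau, sigma, q_true = 0.25, 0.05, (1.5, 1.0)
t = tau * np.arange(40)
inputs = [s * t * np.exp(-t) for s in (1.0, 2.0, 3.0)]
data = [forward(q_true, u, tau) + sigma * rng.normal(size=u.size) for u in inputs]

q1_grid = np.linspace(0.5, 3.0, 41)
q2_grid = np.linspace(0.5, 3.0, 41)
post = grid_posterior(q1_grid, q2_grid, inputs, data, tau, sigma)
a, b = np.unravel_index(post.argmax(), post.shape)
q_map = (q1_grid[a], q2_grid[b])
```

Credible regions can then be read off such a grid by accumulating the cells of highest density until the desired posterior mass is reached. Replacing `toy_kernel` with a kernel computed from a discretization of eq. (2.2) would give the approximating posterior based on $L^n$.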

Convergence in distribution
Consider the random variable $q$ with posterior distribution described by the measure $\pi$ given in eq. (2.15) and eq. (2.16), and let $q^n$ denote the random variable $q$ but with posterior distribution $\pi^n$ given by eq. (2.20) and eq. (2.21). In this section we establish that $q^n \xrightarrow{\text{Dist}} q$ as $n \to \infty$; that is, that $q^n$ converges in distribution to $q$. Recall that due to the physical constraints based on our model for the alcohol biosensor problem, eq. (2.2), we require that the parameters $q$ lie in the interior of the positive orthant of $\mathbb{R}^2$.
Theorem 3.1. For $Q$ a compact set in the interior of the positive orthant of $\mathbb{R}^2$, a prior $\pi_0$ with compact support $Q$ and a density that is continuous on $Q$, and a noise distribution with bounded density function $\varphi$ and support on $\mathbb{R}$, the random variable $q^n$ with posterior distribution $\pi^n$ given by eq. (2.20) and eq. (2.21) converges in distribution to the random variable $q$ with posterior distribution $\pi$ given by eq. (2.15) and eq. (2.16).
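Proof. For $S$ a subset of $Q$, the triangle inequality yields a bound, eq. (3.1), on $|\pi(S) - \pi^n(S)|$ in terms of $|1/Z - 1/Z^n|$ and $\int_S |L(q \mid \{V_j^i\}) - L^n(q \mid \{V_j^i\})|\, d\pi_0(q)$, where $\varphi$ is the normal density describing the distribution of the noise term in eq. (2.13), and $Z$ and $Z^n$ are as in eq. (2.16) and eq. (2.21), respectively.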
Focusing first on the limit of $|1/Z - 1/Z^n|$ as $n \to \infty$, by Lemma 2.1 we have that the $y_j^i(q)$ are continuous in $q$ for $q \in Q$, $i \in \{1, \dots, R\}$, and $j \in \{0, 1, \dots, K\}$. Since $Q$ is compact, the $y_j^i(q)$ are bounded and thus $0 < Z < \infty$. By Theorem 2.2, since $y_j^{n,i}(q) \to y_j^i(q)$ uniformly in $q$ for $q \in Q$ as $n \to \infty$, $0 < Z^n < \infty$ for $n$ large enough. Again by Theorem 2.2 it follows from eq. (2.16), eq. (2.21), and the Bounded Convergence Theorem that $Z^n \to Z$ as $n \to \infty$ and therefore that $|1/Z - 1/Z^n| \to 0$ as $n \to \infty$. Then, essentially the same arguments yield that $\int_S |L(q \mid \{V_j^i\}) - L^n(q \mid \{V_j^i\})|\, d\pi_0(q) \to 0$, from which it then immediately follows that $\int_S L^n(q \mid \{V_j^i\})\, d\pi_0(q) \to \int_S L(q \mid \{V_j^i\})\, d\pi_0(q)$, and therefore from eq. (3.1) that $q^n \xrightarrow{\text{Dist}} q$ as $n \to \infty$, and the theorem has been proved. □

For the push forward measures $\pi$ and $\pi^n$ from eq. (2.15) and eq. (2.20), respectively, we are commonly interested in their respective expected values and their convergence. Since $Q$ is compact, a direct invocation of the Portmanteau theorem (see [41]) establishes the following corollary, which guarantees the convergence of all moments described by $\pi^n$ to those of $\pi$.

Consistency
In this section we demonstrate the strong consistency of the posterior distribution with respect to the parameters, q, by imposing stronger assumptions on the distribution of the noise terms ε j i in eq. (2.13) and on the prior, π 0 , by restricting Q to a rectangle in the positive orthant of ℝ 2 , and by applying the framework summarized in [42].
As in [42], we show consistency of the posterior distribution $\pi$ as in eq. (2.17), rather than consistency of a point estimator based on the posterior distribution. As such, for prior $\pi_0$ over $Q$, posterior $\pi(\cdot \mid \{V_j^i\})$ as in eq. (2.15), and i.i.d. noise $\varepsilon_j^i \sim N(0, \sigma^2)$ for $\sigma > 0$, we also consider that for $q \in Q$ assumed known we have random variables $V_j^i \sim N(y_j^i(q), \sigma^2)$ as determined by eq. (2.13) for $i = 1, 2, \dots, R$ and $j = 1, 2, \dots, K$. Further, we have that the $\{V_j^i\}_{i,j}$ are independent in $i$ and $j$, but are non-identically distributed (i.n.i.d.).
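For clarity and brevity we consider the random vectors in $\mathbb{R}^{K+1}$ with independent entries derived from the matrix equivalents of eq. (2.13) and eq. (2.14), namely $V^i = y^i + \varepsilon^i = H u^i + \varepsilon^i$ and $V^i = y^{n,i} + \varepsilon^i = H^n u^i + \varepsilon^i$, with noise vectors $\varepsilon^i = [\varepsilon_0^i, \dots, \varepsilon_K^i]^T$. With our reframing, for $A \subset Q$, by independence in $i$ we may rewrite eq. (2.15) as

$$
\pi(A \mid \{V^i\}) = \frac{\int_A \prod_{i=1}^{R} f_{i,q}(V^i)\, d\pi_0(q)}{\int_Q \prod_{i=1}^{R} f_{i,q}(V^i)\, d\pi_0(q)} = \frac{\int_A \prod_{i=1}^{R} \frac{f_{i,q}(V^i)}{f_{i,q_0}(V^i)}\, d\pi_0(q)}{\int_Q \prod_{i=1}^{R} \frac{f_{i,q}(V^i)}{f_{i,q_0}(V^i)}\, d\pi_0(q)} = \frac{J_A(\{V^i\})}{J(\{V^i\})}, \tag{3.2}
$$

where for all $i$ we have data vectors $V^i$, $f_{i,q}$ denotes the density of $V^i$ with $q$ assumed known, and for our purposes we will be interested in the equivalent form on the right-hand side of the equation above, where $q_0 \in Q$ is the true value of our parameters $[q_1, q_2]^T$.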
We first formalize the results discussed in Section 7 of [42] that handle the i.n.i.d. case of posterior consistency. As such, we say that our posterior distributions $\{\pi(\cdot \mid \{V^i\})\}$ as in eq. (3.2) are strongly consistent at $q_0$ if $\pi(U \mid \{V^i\}) \to 1$ a.s. $P_{q_0}^\infty$ for every neighborhood $U$ of $q_0$ as $R \to \infty$, where $P_{q_0}^\infty = \prod_{i=1}^{\infty} P_{i,q_0}$ with $P_{i,q_0}$ the probability distribution generated by $f_{i,q_0}$. For this we show that for sets $A \subset Q$ with $q_0 \notin A$, $J_A(\{V^i\}) \to 0$ and $J(\{V^i\}) \to \infty$ as $R \to \infty$ in some appropriate manner to be made precise below. For $J_A(\{V^i\}) \to 0$ we take the same approach as expressed in [42] and thus state the following definition below without motivation, where we note that for any two densities $f, g$ on some space $X$ their affinity, denoted $\text{Aff}(f, g)$, is given by $\text{Aff}(f, g) = \int_X \sqrt{f(x) g(x)}\, dx$. For some $\delta > 0$, all $A_i$'s are strongly $\delta$ separated from $q_0$ for the model $q \mapsto f_{i,q}$, and
To show $J(\{V^i\}) \to \infty$ as $R \to \infty$ we utilize the approach outlined in the proof of Theorem 1 in Appendix A.2 of [44] (specifically the proof of (8) in Appendix A.2). For a direct proof see [32]. For a similar approach see [45].
Before moving on to our main theorem we apply the following theorem and subsequent corollary to prove a lemma that will be of use to us later. For proof of the following see Theorem 5.3 of [46]. (Affine) The map q ↦ σ(q) is affine, in the sense that for any u, v ∈ V, σ(q, u, v) = σ 0 (u, v) + σ 1 (q, u, v) where σ 0 is independent of q and the map q ↦ σ 1 (q, ·, ·) is linear, and
Moreover, for $t \in (0, T]$ and parameters in $Q$, the estimate eq. (3.4) holds. Further, under zero-order hold the differentiability and Lipschitz properties of $\hat{A}(q) = e^{A(q)\tau} = T(\tau, q)$ remain. Considering $\hat{B}(q)$ from Section 2.1, $\hat{B}(q) = (I - \hat{A}(q))\left\{(0, \xi) - A(q)^{-1}\left(\tfrac{q_3 q_2}{q_1}, 0\right)\right\}$, we find that it is a sum and product of $q$-differentiable and $q$-Lipschitz terms and thus is differentiable and Lipschitz in $q$. Since $h$ and $y$ as in eq. (2.12) are a composition and sum of $q$-differentiable terms, they remain differentiable. Further, using eq. (3.4) we have a Lipschitz bound, eq. (3.5), for all $j \in \{0, 1, \dots, K\}$ and $q, \tilde{q}$ in the interior of $Q$, with $C_1$ the max of the operator norms for $\hat{A}(q)$, $\hat{B}(q)$, $C$ over $Q$, and $C_A$, $B_A$ the max Lipschitz constants of $\hat{A}(q)$ and $\hat{B}(q)$ over all $k$ and $Q$. The final inequality in eq. (3.5) follows from eq. (3.4) by noticing that for all $\varphi \in H_q$, $\|\hat{A}(q)\varphi - \hat{A}(\tilde{q})\varphi\|_{H_q} \le C_V \|\hat{A}(q)\varphi - \hat{A}(\tilde{q})\varphi\|_V$ and that by identification the supremum over $V^*$ is larger than that over $H_q$. For $\hat{A}^n(q)$ and $\hat{B}^n(q)$ as in eq. (2.9), by a repetition of the above arguments we maintain differentiability in $q$, and thus $h^n$ and $y^n$ as in eq. (2.12) are differentiable and Lipschitz in $q$.
Lastly, for all $i = 1, \dots, R$ and $j = 1, \dots, K$ we have, for $q, \tilde{q} \in Q$, a bound, eq. (3.6), on $|y_j^i(q) - y_j^i(\tilde{q})|$, where the $u_j^i$ are BrAC values bounded by definition to be in $[0, 1]$, $M$ is the Lipschitz constant from eq. (3.5), and $K + 1$ is the fixed upper bound on the number of temporal observations, $j$. Hence, the Lipschitz constant for $y_j^i$ is independent of $(i, j)$. By noticing that the previous statement holds for $y_j^{n,i}$ with a repetition of the work leading to eq. (3.6), our lemma has been proved. □

A direct consequence of Lemma 3.1 is that for all $i = 1, 2, \dots, R$, with $K_i(q, \tilde{q})$ and $\Lambda_i(q, \tilde{q})$ as in the statement of Theorem 3.3, the bound eq. (3.7) holds with a constant $\ell$, where $\sigma > 0$ is the standard deviation of the $N(0, \sigma^2)$ noise density, and $M$ is the Lipschitz constant from eq. (3.6) that is independent of $i$ (and $j$). Thus, for any $\delta^* > 0$, $i \in \{1, \dots, R\}$, and $q, \tilde{q} \in \{q \in Q : \|q_0 - q\|_1 < \delta^*\}$, we have that $\|f_{i,q} - f_{i,\tilde{q}}\|_{L^1} \le 2 K_i(q, \tilde{q})^{1/2} \le 2 (\ell \|q - \tilde{q}\|_1)^{1/2} < 2 (\ell \delta^*)^{1/2}$ by the relationship between total variation and Kullback-Leibler distances, for $\ell$ as in eq. (3.7). If we let $q^* \in Q$ be such that $\|f_{i,q_0} - f_{i,q^*}\|_{L^1} > \delta^*$ and consider the set $G = \{q \in Q : \|q^* - q\|_1 < (\delta^*)^2/(16\ell)\}$, then $G$ is strongly separated from $q_0$ (see Definition 3.1). This follows from the relationship between affinity and total variation distance (via the Hellinger distance) as well as by noticing that for any density $\nu$ on $G$, the marginal density of $V^1$ lies within $L^1$ distance $\delta^*/2$ of $f_{1,q^*}$ (for a full example, see [32] or Example 3.5 in [42]). If this holds for all $i$ then $G$ and $q_0$ are strongly $\delta$ separated with $\delta$ independent of $i$.
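The relationship between total variation and Kullback-Leibler distances invoked above can be checked numerically in the simplest case of two scalar Gaussian densities with common variance, for which both quantities have closed forms. The sketch below is an illustration only (the densities $f_{i,q}$ in the text are multivariate, and the function names are hypothetical): it verifies Pinsker's inequality, $\|f - g\|_{L^1} \le \sqrt{2K}$, which in turn implies the weaker bound $\|f - g\|_{L^1} \le 2K^{1/2}$ used in the argument.

```python
import math


def kl_gaussians(a: float, b: float, sigma: float) -> float:
    """Kullback-Leibler divergence between N(a, sigma^2) and N(b, sigma^2)."""
    return (a - b) ** 2 / (2.0 * sigma**2)


def l1_distance_gaussians(a: float, b: float, sigma: float) -> float:
    """Exact L1 distance between the N(a, sigma^2) and N(b, sigma^2) densities.

    Equals 2 * (2 * Phi(|a - b| / (2 sigma)) - 1), with Phi the standard normal
    CDF, and 2 * Phi(z) - 1 = erf(z / sqrt(2)).
    """
    z = abs(a - b) / (2.0 * sigma)
    return 2.0 * math.erf(z / math.sqrt(2.0))


for gap in (0.0, 0.01, 0.1, 0.5, 1.0, 3.0):
    kl = kl_gaussians(0.0, gap, sigma=0.5)
    l1 = l1_distance_gaussians(0.0, gap, sigma=0.5)
    pinsker = math.sqrt(2.0 * kl)
    print(f"gap={gap:4.2f}  L1={l1:.4f}  sqrt(2K)={pinsker:.4f}  2*sqrt(K)={2.0 * math.sqrt(kl):.4f}")
```

Since the means here play the role of $y_j^i(q)$ and $y_j^i(\tilde{q})$, a Lipschitz bound on $q \mapsto y_j^i(q)$ translates into a bound on the $L^1$ distance between the corresponding data densities.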
With this example in mind we note that items 1 and 2 of Theorem 3.2 are satisfied if the following special condition is met:

1. For every $\delta^* > 0$, there exist sets $A_1, A_2, \dots$ with $L^1$ diameter less than $\delta^*$ that satisfy a convergent sum condition with respect to $\pi_0$, where $\pi_0$ is the prior over $Q$.

This follows from the fact that if special item 1 holds then we may take an $\varepsilon^*$-neighborhood of $q_0$, $U = \{q \in Q : \|f_{i,q_0} - f_{i,q}\|_{L^1} < \varepsilon^*\ \forall i\}$. As discussed above, since the bound on $\|f_{i,q} - f_{i,\tilde{q}}\|_{L^1}$ is independent of $i$, $U$ is non-empty and contains the set $\{q \in Q : \|q_0 - q\|_1 < (\varepsilon^*)^2/(4\ell)\}$. Now set $\delta^* = (\varepsilon^*)^2/(16\ell)$, and by compactness cover $Q$ with a finite number of disjoint sets $A_i$ determined by small $\|\cdot\|_1$ balls centered at points $q_i \in Q$. Specifically, the strong separation condition is satisfied as per the discussion leading up to special item 1 by noticing that on each $A_i$ we have $\|f_{i,q_0} - f_{i,q_i}\|_1 > \varepsilon^*$, and the convergent sum condition is satisfied by the fact that the $A_i$'s can be considered (made) mutually exclusive with union contained in $Q$. We now state and prove our main theorem.

Proof. For any set $A \subset Q$ with $q_0 \notin A$ we will use the form of $\pi(A \mid \{V^i\})$ as in eq. (3.2) and handle the numerator, $J_A$, and denominator, $J$, separately.
First, as Q is compact, for any δ > 0 we may cover Q by a finite number of sets A i , i = 1, 2, . . . , γ where each A i is a subset of an L 1 ball in Q. That is, for every i and q, q ∈ A i we have that ‖q − q‖ 1 < δ. For R large enough, if on each A i we consider the model q ↦ f i,q for i ∈ {1, . . . , R} and f i,q the density of the random variable V i with q assumed known, then special item 1 is satisfied for prior π 0 . Hence, by Theorem 3.2 we have that for some β 0 > 0, Now for Λ i , K i , and S i as in the statement of Theorem 3.3, for i = 1, 2, . . . , R and q ∈ Q, we have K i q 0 , q ≤ ℓ ‖q 0 − q‖ 1 and the bounds above we have that for every ε > 0 and i, q: K i q 0 , q < ε is non-empty and our choice in such q does not depend on i. Hence the set q: K i q 0 , q < ε ∀i is non-empty.
Thus, for $B = Q$ we satisfy the assumptions of Theorem 3.3 and therefore find that, for all $\beta > 0$, $e^{R\beta} J(\{V_i\}) \to \infty$ a.s. $P_{q_0}^{\infty}$ as $R \to \infty$.
So for any set $A \subset Q$ with $q_0 \notin A$, from eq. (3.2) we have that $\pi(A \mid \{V_i\}) \to 0$ a.s. $P_{q_0}^{\infty}$ as $R \to \infty$, and thus the theorem has been proved. □
From Lemma 3.1 we find that we maintain the differentiability and Lipschitz properties of the finite-dimensional semigroup as in eq. (2.9) and respective kernel as in eq. (2.12). Thus, with a straightforward rewriting of eq. (3.2) and eq. (3.2) in terms of the finite-dimensional posterior eq. (2.20), and repetition of the work following Lemma 3.1 through the proof of Theorem 3.5, we have the following corollary.

Deconvolution of BrAC from TAC
In this section we consider the problem of using the biosensor measured TAC signal to estimate BrAC. We do this by deconvolving it; to wit we invert the convolution given in eq. (2.12) subject to a positivity constraint and regularization to mitigate the inherent ill-posedness of the inversion. Recall that the convolution given in eq. (2.12) was found by solving the finite-dimensional discrete time system eq. (2.10) derived from eq. (2.2). We employ the method originally described in [25], wherein the problem is formulated as a constrained, regularized, optimization problem (see, for example, [48]).
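As an illustration of what a constrained, regularized deconvolution looks like in practice, the following is a minimal sketch, not the scheme of [25]. It assumes a discrete causal convolution kernel is already available, penalizes both the size and the first differences of the input, and enforces non-negativity with projected gradient descent. The function names, the weights `r1` and `r2`, the iteration count, and the toy kernel are all hypothetical choices made for this example.

```python
import numpy as np

def convolution_matrix(kernel, n):
    """Lower-triangular Toeplitz matrix H with (H u)[k] = sum_{j<=k} kernel[k-j] * u[j]."""
    H = np.zeros((n, n))
    for k in range(n):
        H[k, : k + 1] = kernel[: k + 1][::-1]
    return H

def deconvolve_nonneg(y, kernel, r1=1e-3, r2=1e-3, n_iter=5000):
    """Minimise ||H u - y||^2 + r1 ||u||^2 + r2 ||D u||^2 subject to u >= 0.

    Uses projected gradient descent, a generic solver chosen for this
    illustration (assumption: not the method used in the paper).
    """
    n = len(y)
    H = convolution_matrix(kernel, n)
    D = np.diff(np.eye(n), axis=0)              # first-difference operator
    A = H.T @ H + r1 * np.eye(n) + r2 * (D.T @ D)
    b = H.T @ y
    step = 1.0 / np.linalg.eigvalsh(A)[-1]      # 1 / largest eigenvalue of A
    u = np.zeros(n)
    for _ in range(n_iter):
        # gradient step on 0.5 u'Au - b'u, then projection onto u >= 0
        u = np.maximum(u - step * (A @ u - b), 0.0)
    return u

# Toy usage: recover a smooth non-negative input from its noiseless output.
n_demo = 40
kernel_demo = 0.3 * 0.7 ** np.arange(n_demo)    # hypothetical decaying kernel
u_demo = 0.08 * np.sin(np.linspace(0.0, np.pi, n_demo))
y_demo = convolution_matrix(kernel_demo, n_demo) @ u_demo
u_rec = deconvolve_nonneg(y_demo, kernel_demo, r1=1e-8, r2=1e-8)
```

With noisy output data the weights `r1` and `r2` would need to be substantially larger than in this noiseless toy case; how they are selected in the paper is described in terms of eq. (3.14).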
We first briefly summarize the treatment in [25] and then follow by showing how our work is able to make direct use of this theory. Let $V$ and $H$ be Hilbert spaces forming a Gelfand triple, $V \hookrightarrow H \hookrightarrow V^*$. For an admissible set $Q$, a compact subset of the positive orthant of $\mathbb{R}^2$, with $q \in Q$, let $A(q)$ be an abstract parabolic operator defined by a sesquilinear form $a(q, \cdot, \cdot): V \times V \to \mathbb{R}$ (i.e., one that satisfies items 1 to 3 in Section 2.1) that, when restricted to $\{\varphi \in V : A(q)\varphi \in H\}$, generates a holomorphic semigroup on $H$, $\{e^{A(q)t} : t \ge 0\}$.
We next specify finite-dimensional operators $A^N \in \mathcal{L}(V^N, V^N)$, $\mathcal{B}^N \in \mathcal{L}(U, V^N)$, and $C^N \in \mathcal{L}(V^N, \mathbb{R})$ that define the finite-dimensional system analogous to eq. (3.9). That is, let
To numerically carry out the requisite computations to actually determine $u_L^*$ for given values of $M$, $N$ and $L = (M, N)$, we continue to apply the results in [25] while also connecting them to our treatment in Sections 2.1 and 2.2 above. We assume that the feasible parameter set $Q$ is a compact rectangle in the positive orthant of $\mathbb{R}^2$, we take $H$ and $V$ to be the spaces $H_q$ and $V$ of eq. (2.4), and we identify the operators in eq. (3.8) with those in eq. (2.6). Our distribution over $q$, $\pi$, is the finite-dimensional posterior $\pi_n(\cdot \mid \{V_j^i\})$ for fixed $n$ as in eq. (2.20), and we proceed with the Bochner spaces $\mathcal{V} = L^2_{\pi_n(\cdot \mid \{V_j^i\})}(Q; V)$ and $\mathcal{H} = L^2_{\pi_n(\cdot \mid \{V_j^i\})}(Q; H)$ to achieve eq. (3.9).
For the state variables x j (η, q) we have that η ∈ [0, 1] and q ∈ Q = [a 1 , b 1 ] × [a 2 , b 2 ] for 0 < a i < b i , i = 1, 2. Further, for the inputs u(t, q) we have that t ∈ [0, T] and q ∈ Q. Let n be as in eq. We note that the optimization problem eq. (3.11) is a constrained problem, in that U k N of the previously stated matrix system are to be non-negative. With a proper placement of ℎ k L into the block matrix ℍ L , the approximating deconvolution problem eq. , (3.13) where U M is the Kℳ dimensional column vector of the coefficients of u ∈ U M , and Y M is the K + Kℳ column vector of measured output values y k followed by Kℳ zeros. Further, ℚ i M for i = 1, 2 are matrices with entries given by the U inner products of the basis elements for the subspaces S M as determined by the regularization term where u L; j i, * = E π θ* u L; j i, * , u L; j i, * are the predicted BrAC values found by finding the minimum of J L ( · ; r 1 , r 2 ) given by eq. (3.13) with r 1 and r 2 candidate values for the regularization weights from a specified feasible set in the positive orthant of ℝ 2 , ℝ + × ℝ + and y L; j i, * are the TAC values found by using u L; j i, * as input to eq. (3.12).

Numerical results
All of the data used in the studies detailed below, unless otherwise specifically stated (e.g., as in Section 3.4.2), were collected in USC IRB approved human subject experiments designed and run by researchers in the laboratory of one of the authors (S. E. L.) as part of a National Institutes of Health (NIH) funded investigation (see [51]). These experiments were carried out in controlled environments wherein 40 participants completed one to four drinking episodes, with viable data recorded in 146 drinking episodes. BrAC was obtained using Alco-Sensor IV breath analyzer devices from Intoximeters, Inc., St. Louis, MO. For TAC, each participant wore two SCRAM (Secure Continuous Remote Alcohol Monitoring) devices manufactured by Alcohol Monitoring Systems (AMS) in Littleton, Colorado (see Figure 1), placed simultaneously on the left and right arms. For each separate SCRAM device, participants started their readings with a TAC and BrAC of 0.000, consumed alcohol (equivalent across all sessions per participant) in one of three different drinking patterns (single: over 15 minutes; dual: over two 15-minute periods spaced 30 minutes apart; or steady: over 60 minutes), and then ended their session when their TAC and BrAC had returned to 0.000. We note that the placement of the two sensors challenges the independence assumption from Section 2.2, but for experimental purposes we include all of the data as independently measured drinking episodes with this caveat in mind. In addition, we did not focus on any specific drinking pattern, as including all possible patterns is in line with real-world, varying drinking patterns and may improve the generalizability of our model. In the calculations of Sections 3.4.1 and 3.4.2, as in eq. (2.5), time is discretized by a constant sampling time $\tau$ of 5 minutes and is subject to our zero-order hold assumption.
While this challenges the implications of our zero-order hold assumption, namely that $\tau = 0.0833$ hours implies that subjects' BAC is constant for 5 minutes, this restriction is needed because the computational cost grows as $\tau$ decreases. In order to achieve this sampling time, we first linearly interpolate all of the data (both BrAC and TAC), and then re-sample at our desired interval of $\tau = 5$ minutes. For Section 3.4.3, a different $\tau$ is used and is discussed there. Further, in all sections we assume a truncated multivariate normal (tMVN) prior $\pi_0$ (as in eq. (2.20)) on $q$ with mean $\mu$ and covariance matrix $\Sigma$, which vary from example to example.
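The interpolate-then-resample step can be sketched as follows. This is an illustrative example only: the function name, the argument layout, and the toy measurement times are assumptions, and time is taken to be in hours so that 5 minutes corresponds to `5.0 / 60.0`.

```python
import numpy as np

def resample_episode(t_hours, brac, tac, tau_hours=5.0 / 60.0):
    """Linearly interpolate BrAC and TAC onto a uniform grid with spacing tau_hours."""
    t_hours = np.asarray(t_hours, dtype=float)
    # number of whole steps that fit in the recording (small tolerance for round-off)
    n_steps = int(np.floor((t_hours[-1] - t_hours[0]) / tau_hours + 1e-9))
    grid = t_hours[0] + tau_hours * np.arange(n_steps + 1)
    return grid, np.interp(grid, t_hours, brac), np.interp(grid, t_hours, tac)

# Toy usage with irregularly spaced, made-up readings.
t_raw = [0.0, 0.25, 0.5, 1.0]
brac_raw = [0.000, 0.040, 0.060, 0.020]
tac_raw = [0.000, 0.010, 0.030, 0.035]
grid, brac_5min, tac_5min = resample_episode(t_raw, brac_raw, tac_raw)
```

Under the zero-order hold assumption each resampled BrAC value is then treated as constant over the following interval of length $\tau$.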
Unfortunately, the USC IRB approved experiments for collecting human subject data were not designed around the problem of estimating the sensor collection chamber inflow and outflow parameters, q 3 [15,21,24,25], in particular with respect to the creation of the finite-dimensional, discrete-time kernel as in eq. (2.12). Ported code was verified against the original code through the use of unit tests.

Convergence in distribution-We used surface plots as well as Metropolis-Hastings (MH) Markov Chain Monte Carlo (MCMC) methods to validate our convergence in distribution results. Throughout the results described here we have that, from eq. (2.14), for all sample times the i.i.d. noise $\varepsilon$ is distributed as $N(0, 0.005^2)$, and the prior $\pi_0$ as in eq. (2.20) is the optimal distribution found in Section 6 of [25]. Specifically, the prior is a tMVN random variable. The choice of 0.005 for the standard deviation of $\varepsilon$ was made to limit the role of noise in our subsequent sampling algorithms so that we may focus on the role of the dimension of our approximating system in the resulting posterior distribution. In addition, when comparing this choice in standard deviation to the peak TAC values of our training dataset, we had a typical peak TAC to noise ratio of 20. For computational reasons, we limit ourselves to a random subgroup of $R = 3$ subject drinking episode measurements. Figure 2 contains the resulting surface plots for $n$ values of 1, 3, and 25. Further, Table 1 contains the means and credible regions for $n$ values of 1, 2, 3, and 25 as determined by respective 1000 sample (1100 draws with a 100 draw burn-in period) MH MCMC sampling runs. The MCMC sample size chosen here was limited by computational cost and runtime. Figure 3 displays deconvolution results for a randomly chosen, non-training drinking episode for different values of the dimension of the approximating system, $n$. This figure used the method from Section 3.3 along with the resulting posteriors as shown in Figure 2 and Table 1.
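A sampling run of the kind described here (1100 draws, the first 100 discarded as burn-in) can be illustrated with a random-walk Metropolis-Hastings sketch. The proposal distribution is not specified in the text, so a symmetric Gaussian random walk is assumed; the forward model, the box standing in for Q, the prior mean and covariance, the step size, and all function names are hypothetical placeholders rather than the quantities used in the paper.

```python
import numpy as np

def log_posterior(q, data, forward, sigma, mu, cov_inv, lower, upper):
    """Unnormalised log posterior: truncated-normal prior on the box
    [lower, upper] plus i.i.d. N(0, sigma^2) observation noise."""
    if np.any(q < lower) or np.any(q > upper):
        return -np.inf                      # outside the box: zero prior mass
    d = q - mu
    log_prior = -0.5 * d @ cov_inv @ d
    resid = data - forward(q)
    return log_prior - 0.5 * np.sum(resid ** 2) / sigma ** 2

def metropolis_hastings(log_post, q_init, n_draws=1100, burn_in=100, step=0.05, seed=0):
    """Random-walk MH; q_init must lie inside the support of the prior."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q_init, dtype=float)
    lp = log_post(q)
    chain = np.empty((n_draws, q.size))
    for k in range(n_draws):
        prop = q + step * rng.standard_normal(q.size)   # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:        # accept/reject
            q, lp = prop, lp_prop
        chain[k] = q
    return chain[burn_in:]                              # drop burn-in draws

# Toy usage with a made-up two-parameter forward model.
t_demo = np.linspace(0.0, 1.0, 20)
forward_demo = lambda q: q[0] * np.exp(-q[1] * t_demo)
data_demo = forward_demo(np.array([1.0, 2.0]))
lo_demo, hi_demo = np.array([0.1, 0.1]), np.array([3.0, 5.0])
post_demo = lambda q: log_posterior(q, data_demo, forward_demo, 0.05,
                                    np.array([1.5, 2.5]), np.eye(2), lo_demo, hi_demo)
samples_demo = metropolis_hastings(post_demo, np.array([1.5, 2.5]))
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of posterior densities, and the normalizing constant of the truncated prior cancels.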

Consistency-
We again used surface plots as well as MH MCMC sampling methods to verify our consistency results. For these studies we have assumed that the noise $\varepsilon$ is now distributed as $N(0, 0.025^2)$ while our prior $\pi_0$ from eq. (2.20) is still the optimal distribution found in Section 6 of [25], a tMVN.
We now investigate the results of Section 3.2 with respect to the field-measured (BrAC, TAC) data pairs. Note that we are no longer able to know the true value of the parameters, $q_0$. Surface plots for increasing numbers of subject drinking episode measurements, $R = 1, 26, 76$, and $101$, are contained in Figure 4. Table 3 displays the calculated means and 90% credible circle radii for increasing numbers of subjects, and thus data (corresponding to $R$ as in Section 3.2), included in determination of the prior. To calculate these values, for each $R$, we again used 1400 MH MCMC samples (1500 draws with a 100 sample burn-in phase) generated according to our chosen prior. The noise used was distributed as $N(0, 0.025^2)$. Note that this choice in prior also highlights the effects of data on the posterior by not providing any initial information to the posterior. When comparing this choice in noise standard deviation to the peak TAC values of our training dataset, we had a typical peak TAC to noise ratio of 8. Further, for the subspaces from Section 3.3 we set our discretization to be $n = 3$, the time discretization to be $m = 1300$, and discretized $Q$ with $m_1 = m_2 = 20$. As with Section 3.4.2, the noise distribution is meant to simulate a situation where little is known about external effects that play a role in determining noise, and so the data are assumed noisy.

Deconvolution-
In all of the numerical results presented and discussed in this section, the test dataset consisted of five drinking episodes from four different participants. These drinking episodes were chosen heuristically so that the test dataset had two drinking episodes with peak BrAC greater than peak TAC, two drinking episodes with peak BrAC less than peak TAC, and one drinking episode with peak BrAC within 0.015 of peak TAC (deemed "close"). The remaining drinking episodes were used as training data, with the added restriction that, whenever the desired number of training sets was not too large, BrAC/TAC pairs from any participant who had a dataset included in the selected test data were excluded from the data used for determining the posterior. The primary exception to this restriction is Figure 6c, wherein we allowed all data other than the current test data point to be included in the training set.
By linearly interpolating the BrAC and TAC data for each subject in all test and training datasets, we are able to re-sample our data with sampling interval $\tau = 45$ seconds and the time discretization $m = 1300$ previously mentioned. The associated participant IDs, TAC device placement (left vs. right arm), type of drinking pattern used (single, dual, or steady), and number of subjects used in posterior distribution determination ($R$) are labeled in Figures 5 and 6. As in eq. (3.14), we utilized all available non-test subject drinking episode measurements ($R = 136$) to determine the population parameters $(r_1^*, r_2^*)$ to be (4.7733, 1.7020).
Figure 2 illustrates rapid convergence with respect to the dimension $n$ of the spatial approximation, which is consistent with the results of Theorem 3.1. Within two steps ($n = 3$), the surface plot differs from that of $n = 25$ in ways that are barely perceptible. Paired with the credible circles in Table 1, these provide evidence that beyond $n = 3$ the mean and radius of the $q$ credible circles remain stable. Thus one can choose a computationally efficient value of $n$ that limits the information lost when projecting eq. (2.6) into finite dimensions, eq. (2.11).

Bayesian estimation of model parameters
For the consistency results, Table 2 exemplifies the theoretical prediction in Theorem 3.5 that as the amount of subject data R grows, the posterior distribution better predicts the true q value by localizing the true parameter q 0 in mean with higher confidence (smaller credible circles). This increasing confidence is backed by the decreasing variance results shown in Figure 4. Notice that although the variance decreases, the mean is allowed to shift as more data are incorporated, as evident from comparing Figure 4c to Figure 4d. This shifting mean is permitted by the theoretical results and is likely due to the incorporation of 70 extra data points. Table 3 displays the shifting of the mean as more data are incorporated while quantitatively displaying a decreasing 90% credible circle radius, as expected.
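One simple way to turn MCMC samples of q into a mean and a credible circle radius is sketched below. The text does not spell out how the circles are constructed, so this example assumes a circle centred at the sample mean whose radius is the empirical 90% quantile of the distances to that mean; the function name and the synthetic samples are hypothetical.

```python
import numpy as np

def credible_circle(samples, level=0.90):
    """Return (center, radius): a circle centred at the sample mean whose radius
    is the empirical `level` quantile of the distances to that mean."""
    samples = np.asarray(samples, dtype=float)
    center = samples.mean(axis=0)
    dist = np.linalg.norm(samples - center, axis=1)
    return center, np.quantile(dist, level)

# Toy usage: a tighter cloud of samples gives a smaller radius, mimicking a
# posterior that concentrates as more drinking episodes are included.
rng_demo = np.random.default_rng(1)
wide_demo = rng_demo.normal(size=(2000, 2))          # stand-in for small R
tight_demo = 0.1 * rng_demo.normal(size=(2000, 2))   # stand-in for large R
_, radius_wide = credible_circle(wide_demo)
_, radius_tight = credible_circle(tight_demo)
```

A shrinking radius with a drifting center is compatible with this construction, since the center is recomputed from each new set of samples.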
As a final note, recall that TAC data were collected simultaneously from both the right and left arms of participants. For an investigation into this see [32].

Deconvolution of BrAC from TAC
In Figure 5a, the deconvolved mean BrAC curve more closely resembles the overall curve of the measured TAC values rather than the desired BrAC, with its increased values towards the latter part of the curve. This is to be expected as the measured TAC plays a role in the Bayesian step, but notice that the severity of the increase in the mean value curve is attenuated when compared to that of the TAC curve (red vs. yellow curves at the five-hour mark). A similar phenomenon also appears in Figure 6a. For Figures 6a to 6c, as the number of subject drinking episodes $R$ increases, we find that the mean curve moves toward the actual BrAC curve, an expected convergence phenomenon given the theoretical consistency results from Section 3.2.
Lastly, the 90% conservative credible bands about the deconvolved BrAC curves appear to always have a lower bound of zero. For the upper bound, the extreme case is shown in Figure 5c. These wide ranges in BrAC values allow us to capture the true BrAC value with high probability, but they also capture far more area under the curve than needed. Thus, there are times when our two-step method would falsely signal that the TAC device wearer is far more inebriated than they actually are. This incorrect signaling might be due in part to the quantitatively inaccurate readings in Figure 5c, wherein the TAC curve is greater than the BrAC curve. If our (training) data are mainly composed of the other cases (TAC following BrAC at an attenuated rate), then the algorithm will learn to "guess up" when turning the TAC back into BrAC. This phenomenon may also be due to the use of an uninformed prior, as the credible regions in Table 3 do not approach zero. Hence, in the future, use of an informed prior may be preferable.
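To make the notion of a band and of "area under the curve" concrete, the following sketch builds an equal-tailed pointwise band from a collection of deconvolved curves and measures the area it encloses. This is a simplification and an assumption: the paper's bands are described as conservative and may be constructed differently. The function names and the synthetic curves are hypothetical.

```python
import numpy as np

def credible_band(curves, level=0.90):
    """curves: array (n_samples, n_times), e.g. one deconvolved BrAC curve per
    posterior draw of q. Returns the pointwise mean, lower and upper curves."""
    curves = np.asarray(curves, dtype=float)
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(curves, alpha, axis=0)
    upper = np.quantile(curves, 1.0 - alpha, axis=0)
    # BrAC cannot be negative, so the lower curve is clipped at zero
    return curves.mean(axis=0), np.maximum(lower, 0.0), upper

def band_area(t, width):
    """Trapezoidal-rule area of a non-negative curve `width` sampled at times t."""
    t = np.asarray(t, dtype=float)
    width = np.asarray(width, dtype=float)
    return float(np.sum(0.5 * (width[1:] + width[:-1]) * np.diff(t)))

# Toy usage: curves that differ only by a random amplitude.
t_demo = np.linspace(0.0, 6.0, 61)
bump_demo = 0.08 * np.exp(-0.5 * ((t_demo - 2.0) / 0.8) ** 2)
amp_demo = np.random.default_rng(0).uniform(0.5, 1.5, size=(500, 1))
mean_demo, lower_demo, upper_demo = credible_band(amp_demo * bump_demo)
excess_demo = band_area(t_demo, upper_demo - lower_demo)
```

Comparing `band_area` of the upper curve with the area under a measured BrAC curve gives one rough, illustrative way to quantify how much a band overstates intoxication.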

Concluding remarks
We believe that the i.n.i.d. assumption from Section 2.2 (specifically Section 3.2) may not reflect the realities of the data collection method wherein two sensors are worn simultaneously on participants' left and right arms. We are currently investigating the elimination of this i.n.i.d. assumption. However, the results from Section 3.4 are quite reasonable and are extremely useful when seeking to use this approach computationally in practice. Further investigation is needed regarding the traveling mean exhibited in the numerical results and how it is related to the non-inclusion of other covariate data (age, height, weight, etc.). This investigation may also be aided by attempting to combine the results of Sections 3.1 and 3.2 and letting both the approximating dimension of the kernel and the amount of training data go to infinity simultaneously.
We also believe that the packaging of all error sources into a single random variable in Section 2.2 may yield larger uncertainties than formulations where many additive errors are considered. Namely, mixed-effects formulations may be utilized in order to separate errors and might lower overall uncertainty. However, the results from Section 3.4 are again quite reasonable, and the usage of mixed-effects formulations can be left as a design choice when considering the main goals and implementations of the PDE model from Section 2.1.
When our approach and results are optimized for use in actual practice, some care will have to be taken in regard to the sampling methods used in Sections 3.4.1 and 3.4.2. If Markov Chain Monte Carlo methods are still the method of choice, then issues such as sample size, convergence of the chains, and randomized chain starting points will need to be taken into account. In addition, a laboratory protocol will need to be developed to estimate the sensor-dependent values of $q_3$ and $q_4$ that appear in eq. (2.3). As far as the numerical results presented in Section 3.4 are concerned in regard to the values chosen for $q_3$ and $q_4$, they primarily serve to reinforce the theoretical results in Sections 3.1 and 3.2.
Finally, of primary interest is the direct inversion of BrAC, $u$, given TAC as in eq. (2.12) without the need for a two-step process like that of the method used in this paper. We believe that a hierarchical model paired with a Gaussian Process framework may reduce the problem down to a single step (see [52]). In such a framework, we place a prior on $q$, as well as a function space prior over $u$. In this way, we obtain a method that statistically deconvolves BrAC from TAC while providing a distribution from which we may derive error bars on the estimated BrAC values. We are also currently examining the inclusion of another hierarchical Bayesian model that incorporates covariates in both priors placed over $q$ and $u$. We believe that this will improve the accuracy of our predictions by allowing the use of all available subject and environment data.
Associated data are contained in Table 4.