A Container-Attachable Inertial Sensor for Real-Time Hydration Tracking

Various sensors have been proposed to address the negative health ramifications of inadequate fluid consumption. Amongst these solutions, motion-based sensors estimate fluid intake using the characteristics of drinking kinematics. This sensing approach is complicated due to the mutual influence of both the drink volume and the current fill level on the resulting motion pattern, along with differences in biomechanics across individuals. While motion-based strategies are a promising approach due to the proliferation of inertial sensors, previous studies have been characterized by limited accuracy and substantial variability in performance across subjects. This research seeks to address these limitations for a container-attachable triaxial accelerometer sensor. Drink volume is computed using support vector machine regression models with hand-engineered features describing the container’s estimated inclination. Results are presented for a large-scale data collection consisting of 1908 drinks consumed from a refillable bottle by 84 individuals. Per-drink mean absolute percentage error is reduced by 11.05% versus previous state-of-the-art results for a single wrist-wearable inertial measurement unit (IMU) sensor assessed using a similar experimental protocol. Estimates of aggregate consumption are also improved versus previously reported results for an attachable sensor architecture. An alternative tracking approach using the fill level from which a drink is consumed is also explored herein. Fill level regression models are shown to exhibit improved accuracy and reduced inter-subject variability versus volume estimators. A technique for segmenting the entire drink motion sequence into transport and sip phases is also assessed, along with a multi-target framework for addressing the known interdependence of volume and fill level on the resulting drink motion signature.


Introduction
The underconsumption of water is a considerable global health concern [1,2]. Estimates suggest that 20% of adults exhibit significant dehydration, which is associated with numerous adverse health outcomes [3]. Recent evidence suggests that even mild underconsumption of water may have various negative health ramifications, including reduced cognitive function, obesity, and cancer [4]. The lack of an appropriate fluid intake is especially concerning for elderly individuals, due to the degradation of regulatory mechanisms with age [5].
To promote hydration maintenance, numerous sensing technologies have been demonstrated for tracking fluid consumption. Solutions include augmented drinking containers [6], which are currently available in the commercial marketplace, along with wearable [7] and contactless architectures [8]. Amongst these approaches, inertial measurement unit (IMU) sensors are a commonly employed Sensors 2019, 19, 4008 2 of 18 sensing modality. IMU sensors have been used to implement motion-based consumption estimation strategies for both augmented container [9] and wearable [10] hydration trackers.
Previous research has also proposed a container-attachable IMU sensor for hydration tracking applications [11]. This approach alleviates the restrictiveness of augmented containers by allowing for simplistic reconfiguration across multiple drinking vessels [12]. In addition, the non-wearable form factor of this device is especially promising for deployment amongst the at-risk elderly population, who may reject wearable solutions due to either preferences or various physical limitations [13]. The detection of drinking events for container-attached devices is also simplified versus wearable sensors, which may exhibit false alarms for arm movements exhibiting similar kinematics to drinking (i.e., eating, etc.) [14].
While IMU-based sensing is a promising solution for hydration tracking, prior work employing this modality has achieved only limited accuracy [14]. Moreover, while previous container-attachable studies have qualitatively described the relationship between the motion pattern of the bottle and drink volume [11], a direct description of the container's inclination during drinking has not been used. Finally, while approaches estimating the fill level of a drink have been explored for alternative sensing architectures as a potential mechanism to improve the system's performance, this approach has not been considered for an attachable sensing form factor [7].
The research described herein seeks to address these limitations for an attachable triaxial accelerometer sensor. An image of the sensor prototype attached to the bottle used in the collections described within this manuscript is shown in Figure 1. consumption estimation strategies for both augmented container [9] and wearable [10] hydration trackers. Previous research has also proposed a container-attachable IMU sensor for hydration tracking applications [11]. This approach alleviates the restrictiveness of augmented containers by allowing for simplistic reconfiguration across multiple drinking vessels [12]. In addition, the non-wearable form factor of this device is especially promising for deployment amongst the at-risk elderly population, who may reject wearable solutions due to either preferences or various physical limitations [13]. The detection of drinking events for container-attached devices is also simplified versus wearable sensors, which may exhibit false alarms for arm movements exhibiting similar kinematics to drinking (i.e., eating, etc.) [14].
While IMU-based sensing is a promising solution for hydration tracking, prior work employing this modality has achieved only limited accuracy [14]. Moreover, while previous container-attachable studies have qualitatively described the relationship between the motion pattern of the bottle and drink volume [11], a direct description of the container's inclination during drinking has not been used. Finally, while approaches estimating the fill level of a drink have been explored for alternative sensing architectures as a potential mechanism to improve the system's performance, this approach has not been considered for an attachable sensing form factor [7].
The research described herein seeks to address these limitations for an attachable triaxial accelerometer sensor. An image of the sensor prototype attached to the bottle used in the collections described within this manuscript is shown in Figure 1. By utilizing a feature space which more thoroughly describes the kinematics of the container during drinking, we seek to improve accuracy and reduce inter-subject variability versus prior studies utilizing only a limited motion description [14]. In addition, we seek to improve performance by forming aggregate consumption estimates based upon fill level estimates. As prior studies have demonstrated the ability to accurately classify fill levels with low resolution [3], we hypothesize that aggregate consumption estimates utilizing this quantity may improve performance.

Related Work
Successful estimation of the fluid intake associated with a drinking event may be conceptualized as a two-stage process. Namely, the drinking event must first be segmented from the streaming sensor output (i.e., sip detection), followed by the estimation of drink volume from the partitioned data. While the prior task, which constitutes a particular case of online gesture detection or spotting [15], remains an area of active research interest (i.e., [10,16]), this review focuses solely on By utilizing a feature space which more thoroughly describes the kinematics of the container during drinking, we seek to improve accuracy and reduce inter-subject variability versus prior studies utilizing only a limited motion description [14]. In addition, we seek to improve performance by forming aggregate consumption estimates based upon fill level estimates. As prior studies have demonstrated the ability to accurately classify fill levels with low resolution [3], we hypothesize that aggregate consumption estimates utilizing this quantity may improve performance.

Related Work
Successful estimation of the fluid intake associated with a drinking event may be conceptualized as a two-stage process. Namely, the drinking event must first be segmented from the streaming sensor output (i.e., sip detection), followed by the estimation of drink volume from the partitioned data.
While the prior task, which constitutes a particular case of online gesture detection or spotting [15], remains an area of active research interest (i.e., [10,16]), this review focuses solely on the latter portion of the problem statement. Details regarding drink event spotting for our container-attachable sensor can be found in [17].
Amft et al. [7] classified the initial fill level from which a drink was consumed using a magnetic coupling sensor system attached at both the shoulder and wrist. An experiment was conducted, in which three participants consumed 30 drinks from nine different container types in a scripted sequence. Drinks were consumed at three initial fill levels (full, half-full, and near empty), with subjects instructed to ingest only a minimal amount during each drink to avoid overconsumption. Individual-specific classifiers achieved an average classification accuracy of 72% across all subjects and container types. The classification accuracy varied considerably across subjects, ranging from 58% to 83%. While the estimation of aggregate consumption using fill level information is feasible, practical deployment requires increased resolution, along with a consideration of the effect of a varying drink volume on the estimation process. Moreover, the requirement of individual-specific training data limits the practical feasibility of this technique.
Mirtchouk et al. [18] estimated drink volume using multiple wearable audio and motion sensors. Sensors included an acoustic earbud, along with two commercial smartwatches and a commercial headset with embedded IMU sensors. Six participants consumed 171 drinks of multiple liquid types (i.e., coconut water, coffee, etc.) over a 72 hour period in an unscripted experiment intended to mimic real-world deployment. Regression models were trained using a leave-one-drink-out approach to account for the lack of consistent consumption patterns across participants. The mass of each drink was estimated with a best-case mean absolute percentage error (MAPE) of 47.2% under the assumption of a known fluid type. While multi-sensor systems are useful for laboratory collections, their feasibility for practical deployment is limited.
To eliminate the multiple sensor requirement of prior work, Hamatani et al. [14] proposed FluidMeter, a tracking system utilizing the embedded IMU sensor within a commercial smartwatch. While the results for multiple experiments were reported, the Lab-micro+ dataset most closely resembles the experiments conducted herein. In this experiment, 16 individuals consumed 1069 drinks in a laboratory setting. The ground-truth mass of each drink was recorded using a digital scale. Linear regression models utilizing both the sip duration and the integral of the accelerometer signals not parallel to the forearm were employed to estimate drink mass. A best-case MAPE of 58.9% was achieved for the integration model trained using leave-one-subject-out (LOSO) validation. While variability across subjects was not reported for the Lab-micro+ dataset, models trained with this data exhibited considerable dispersion in accuracy across subjects (MAPE ranging from 57.9% to 11.0%) when applied to dedicated in-situ collection (Wild-office dataset). MAPE for the in-situ collection using ground-truth data collected with a commercial smart bottle was 31.8%. While Fluidmeter offers an unobtrusive mechanism for consumption tracking for existing smartwatch users, some individuals may refuse to adopt the requisite technology to employ this approach. Moreover, while the authors noted the influence of both fill level and drink volume on the resulting motion signature, no methods were explored to address this interdependence within the estimation process.
Limited volume estimation results have been previously reported for the attachable architecture considered herein. Dong et al. [11] introduced the proposed sensing methodology, providing results for an experiment where seven subjects took 10 drinks from a refillable bottle. Various regression models using four hand-engineered features extracted from the accelerometer component parallel to the vertical axis of the bottle were evaluated. A best-case average aggregate consumption estimation error of 25% across subjects was achieved using support vector machine (SVM) regression models trained in a LOSO framework. A technique for estimating the relative drink mass using features describing the container's inclination during drinking, denoted as the inclination signature (IS) feature set, was introduced in [19]. The utility of the proposed feature set for predicting the volume of each drink was evaluated by training an SVM classifier for a subset of the data described herein (1200 drinks) Sensors 2019, 19, 4008 4 of 18 using a 90%/10% train/test split. For preliminary proof-of-concept, drink instances were classified according to their relative volume using various percentile-based partitions. For a median partition, a minimum classification error of 29.2% was achieved using a subset of the aforementioned feature set selected using filter-based techniques.
Characteristics of the aforementioned literature are summarized in Table 1.

Data Collection
Eighty-four college-aged subjects (52 male, 32 female, aged 18-37) completed 161 trials of an experiment requiring the consumption of 12 drinks from a refillable 750 mL bottle shown in Figure 1. Subjects self-identified as healthy during the consent process. The proctor did not observe any motor deficiencies (i.e., tremors, etc.) influencing the drink motion pattern during any of the trials. Subjects were permitted to complete a maximum of four trials over multiple sessions.
To begin the experiment, the bottle was filled to a consistent level, as determined visually by the experimental proctor. To ensure that a variety of drink volumes were captured, subjects were instructed to consume either a small, medium, or large drink prior to each sip, according to their personal preferences. The bottle was placed on an electronic scale following each drink to record the ground-truth mass. Variations from protocol (i.e., grasping and transporting the bottle without completing a drink, etc.) were noted by the proctor to allow for removal in pre-processing. The ground-truth fill level was estimated offline using knowledge of the bottle's geometry. Two trials were discarded after collection due to hardware failure, yielding a total valid data set of 159 trials (1908 drinks).
All subject recruitment, data collection, and record storage was conducted according to protocol approved by the Institutional Research Board at Michigan State University. The univariate distribution of the initial fill ratio (fill level normalized to total fillable height) and mass of each drink collected, along with their joint distribution, are depicted in Figure 2. As shown, drink instances are skewed towards larger fill ratios due to the protocol of filling the bottle completely at the beginning of each trial. The volume distribution is right skewed due to the variability in large drink consumption across individuals.
Subjects were permitted to complete a maximum of four trials over multiple sessions.
To begin the experiment, the bottle was filled to a consistent level, as determined visually by the experimental proctor. To ensure that a variety of drink volumes were captured, subjects were instructed to consume either a small, medium, or large drink prior to each sip, according to their personal preferences. The bottle was placed on an electronic scale following each drink to record the groundtruth mass. Variations from protocol (i.e., grasping and transporting the bottle without completing a drink, etc.) were noted by the proctor to allow for removal in pre-processing. The ground-truth fill level was estimated offline using knowledge of the bottle's geometry. Two trials were discarded after collection due to hardware failure, yielding a total valid data set of 159 trials (1908 drinks).
All subject recruitment, data collection, and record storage was conducted according to protocol approved by the Institutional Research Board at Michigan State University. The univariate distribution of the initial fill ratio (fill level normalized to total fillable height) and mass of each drink collected, along with their joint distribution, are depicted in Figure 2. As shown, drink instances are skewed towards larger fill ratios due to the protocol of filling the bottle completely at the beginning of each trial. The volume distribution is right skewed due to the variability in large drink consumption across individuals.

Hardware
A prototype sensor module consisting of a triaxial accelerometer (ADXL345, Analog Devices, Norwood, MA, USA), gyroscope (InvenSense IMU-3000, TDK, San Jose, CA, USA), and 802.15.4 wireless transceiver (IRIS Mote Module, MEMSIC, Andover, MA, USA) was connected to the refillable bottle, as depicted in Figure 1, for all experiments. Only data from the accelerometer was used within the current work. The sensor was placed beneath the lid at the bottom of the bottle to avoid interference with gripping. The local coordinate frame of the sensor was aligned with the bottle geometry as follows: (1) the x axis was aligned parallel to the vertical axis of the bottle, such that the accelerometer signal only consisted of the static acceleration due to gravity when the bottle was placed stationary on a level surface (i.e.: a = gx), (2) the y and z axes were aligned parallel and normal to the surface of the bottle, respectively. Data from the sensor module was transmitted to a base station (IRIS Wireless measurement system, MEMSIC, Andover, MA, USA) connected to a PC via USB at a target polling frequency of 20 Hz. All files were processed offline using MATLAB (Mathworks, Natick, MA, USA). A diagram of the collection system is shown in Figure 3.
refillable bottle, as depicted in Figure 1, for all experiments. Only data from the accelerometer was used within the current work. The sensor was placed beneath the lid at the bottom of the bottle to avoid interference with gripping. The local coordinate frame of the sensor was aligned with the bottle geometry as follows: (1) the x axis was aligned parallel to the vertical axis of the bottle, such that the accelerometer signal only consisted of the static acceleration due to gravity when the bottle was placed stationary on a level surface (i.e.: a = gx), (2) the y and z axes were aligned parallel and normal to the surface of the bottle, respectively.
Data from the sensor module was transmitted to a base station (IRIS Wireless measurement system, MEMSIC, Andover, MA, USA) connected to a PC via USB at a target polling frequency of 20 Hz. All files were processed offline using MATLAB (Mathworks, Natick, MA, USA). A diagram of the collection system is shown in Figure 3.

Preprocessing
To begin preprocessing, the bias of each component was estimated by averaging the initial 50 samples of each recording while the bottle was stationary. Portions of the signal corresponding to variations in protocol were then removed. Each file was subsequently parsed into macro-events using a threshold-based algorithm exploiting the stationary placement of the bottle between drinks. This process captured the entire interval for which the bottle was in motion (i.e., both transport to and from the mouth, along with sipping).
After partitioning, signals were resampled to the target frequency of 20 Hz to account for variability in the base station polling interval. Smoothing was performed using a two-sample moving average filter to mimic the frequency response of the original work conducted in [11]. The sensor's inclination with respect to gravity, which is equivalent to that of the container under ideal sensor

Preprocessing
To begin preprocessing, the bias of each component was estimated by averaging the initial 50 samples of each recording while the bottle was stationary. Portions of the signal corresponding to variations in protocol were then removed. Each file was subsequently parsed into macro-events using a threshold-based algorithm exploiting the stationary placement of the bottle between drinks. This process captured the entire interval for which the bottle was in motion (i.e., both transport to and from the mouth, along with sipping).
After partitioning, signals were resampled to the target frequency of 20 Hz to account for variability in the base station polling interval. Smoothing was performed using a two-sample moving average filter to mimic the frequency response of the original work conducted in [11]. The sensor's inclination with respect to gravity, which is equivalent to that of the container under ideal sensor alignment, was then estimated under the assumption of negligible dynamic acceleration, as specified in (1): where a j denotes the conditioned jth component of the accelerometer. Variation in the inclination over a single trial is depicted in Figure 4. in (1): where denotes the conditioned j th component of the accelerometer. Variation in the inclination over a single trial is depicted in Figure 4.

Micro-Event Partitioning
Further partitioning was conducted to isolate the drinking portion of the total segmented event. As described in similar work (i.e., [14]), this segmentation was motivated by the substantial variation that may occur in the transport motion pattern, depending on the specific drinking scenario, which does not necessarily influence drink volume. For the experiments described herein, variability in the transport motion pattern could be associated with the order of the drink within the trial (i.e., more careful handling for full containers, more rapid transport as the subject becomes familiar with protocol, etc.). In addition, differing orientation of the container upon retrieval may also introduce variability.
To isolate the drinking portion of the event, the asymmetry of the container about its vertical axis was exploited. Namely, as the lid of the container encourages consumption from the opposite edge, we hypothesized that while axial rotations may occur during transport, such rotations would be minimized during drinking to avoid spilling. As originally demonstrated in [20], these rotations may be quantified by computing the sensor's angular position within the cross-sectional plane of the bottle, as specified in (2).
As shown for a random sample of drinks in Figure 5, maintains a relatively constant value near the middle of each event. This stationary interval is used to define the following micro-event partition:


Lift: The portion of the macro-event proceeding the sip micro-event;  Sip: The portion of the micro-event for which is estimated as stationary;  Place: The remainder of the macro-event after termination of the sip micro-event.
To isolate the sip micro-event, the sample-over-sample difference signal of was initially thresholded to a maximum value of 8 degrees. All intervals meeting the threshold criteria which were separated by less than two samples were then merged, with the largest resulting interval extracted.

Micro-Event Partitioning
Further partitioning was conducted to isolate the drinking portion of the total segmented event.
As described in similar work (i.e., [14]), this segmentation was motivated by the substantial variation that may occur in the transport motion pattern, depending on the specific drinking scenario, which does not necessarily influence drink volume. For the experiments described herein, variability in the transport motion pattern could be associated with the order of the drink within the trial (i.e., more careful handling for full containers, more rapid transport as the subject becomes familiar with protocol, etc.). In addition, differing orientation of the container upon retrieval may also introduce variability.
To isolate the drinking portion of the event, the asymmetry of the container about its vertical axis was exploited. Namely, as the lid of the container encourages consumption from the opposite edge, we hypothesized that while axial rotations may occur during transport, such rotations would be minimized during drinking to avoid spilling. As originally demonstrated in [20], these rotations may be quantified by computing the sensor's angular position within the cross-sectional plane of the bottle, as specified in (2).
As shown for a random sample of drinks in Figure 5, α maintains a relatively constant value near the middle of each event. This stationary interval is used to define the following micro-event partition: • Lift: The portion of the macro-event proceeding the sip micro-event; • Sip: The portion of the micro-event for which α is estimated as stationary; • Place: The remainder of the macro-event after termination of the sip micro-event.
To isolate the sip micro-event, the sample-over-sample difference signal of α was initially thresholded to a maximum value of 8 degrees. All intervals meeting the threshold criteria which were separated by less than two samples were then merged, with the largest resulting interval extracted. The resulting micro-partition for the four random drink events depicted in Figure 5 is shown in Figure 6. The resulting micro-partition for the four random drink events depicted in Figure 5 is shown in Figure 6.

Feature Engineering
A set of 33 hand-engineered features describing the container's inclination during drinking were developed based upon visual observations during the experiment and resulting inclination estimates. In addition to key kinematic quantities (i.e., maximum inclination, maximum rate of inclination, etc.), amplitude values of both the raw and normalized curves were binned to create a low-level time-invariant description of the signal. Previously denoted as the IS feature set [11], the version employed herein has been slightly modified from its original form to employ exclusive binning in order to reduce the correlation between features. This modified feature set is summarized in Table A1.
For purposes of benchmarking, linear regression (LR) models utilizing previously identified characteristic motion features (i.e., sip duration and integral of inclination) [14] were also evaluated. While motivated by the methods of [14], it should be re-emphasized that a direct comparison is not applicable due to differences in the sensor placement (i.e., wearable versus attachable).  The resulting micro-partition for the four random drink events depicted in Figure 5 is shown in Figure 6.

Feature Engineering
A set of 33 hand-engineered features describing the container's inclination during drinking were developed based upon visual observations during the experiment and resulting inclination estimates. In addition to key kinematic quantities (i.e., maximum inclination, maximum rate of inclination, etc.), amplitude values of both the raw and normalized curves were binned to create a low-level time-invariant description of the signal. Previously denoted as the IS feature set [11], the version employed herein has been slightly modified from its original form to employ exclusive binning in order to reduce the correlation between features. This modified feature set is summarized in Table A1.
For purposes of benchmarking, linear regression (LR) models utilizing previously identified characteristic motion features (i.e., sip duration and integral of inclination) [14] were also evaluated. While motivated by the methods of [14], it should be re-emphasized that a direct comparison is not applicable due to differences in the sensor placement (i.e., wearable versus attachable).

Feature Engineering
A set of 33 hand-engineered features describing the container's inclination during drinking were developed based upon visual observations during the experiment and resulting inclination estimates. In addition to key kinematic quantities (i.e., maximum inclination, maximum rate of inclination, etc.), amplitude values of both the raw and normalized curves were binned to create a low-level time-invariant description of the signal. Previously denoted as the IS feature set [11], the version employed herein has been slightly modified from its original form to employ exclusive binning in order to reduce the correlation between features. This modified feature set is summarized in Table A1 (Appendix A).
For purposes of benchmarking, linear regression (LR) models utilizing previously identified characteristic motion features (i.e., sip duration and integral of inclination) [14] were also evaluated. While motivated by the methods of [14], it should be re-emphasized that a direct comparison is not applicable due to differences in the sensor placement (i.e., wearable versus attachable).

Machine Learning
To promote comparability with prior work, leave-one-trial-out (LOTO) validation was performed. This approach is consistent with the target use case, where models trained on a broad pool of users would be employed on a new user absent of customization. While a LOTO approach includes some subject-specific training data, the magnitude of this contribution is limited (i.e., maximum subject-specific training data of 1.9%). LOTO was chosen versus a LOSO approach to ensure that the sizes of the training and testing sets were consistent across evaluations.
A set of support vector machine (SVM) regression models was trained for both volume and fill ratio estimation. Linear, medium (kernel scale = 5.7), and coarse (kernel scale = 23) Gaussian kernel functions were considered. Hyperparameters were set to the default values, as established in MATLAB's Regression Learner toolbox. SVMs were chosen based upon their superior performance for the current sensor architecture in [11], where they outperformed both neural network and tree-based regression models. Features were standardized using z-score normalization.

Performance Metrics
Multiple performance metrics were used to assess the quality of the estimation on both a per-drink and aggregate basis. MAPE was used to quantify the estimation performance on a per-drink basis. This metric was chosen over alternative measures (i.e., root mean squared error, etc.) due to its utilization in prior work (i.e., [14,18]). For assessing the aggregate (i.e., multi-drink) consumption estimation accuracy, mean overall absolute percentage error (MOAPE) was used. Within the remainder of this manuscript, MOAPE(n) denotes the computation of MOAPE after the completion of n drinks within a trial. While MAPE provides the most rigorous assessment of the model's performance, MOAPE is useful for exploring utility in practical scenarios where aggregate consumption is of primary concern (i.e., estimating total daily consumption, estimating consumption during exercise, etc.).

Micro-Event Partitioning Analysis
The correlation between micro-event durations and drink volume is provided in Table 2. Consistent with the hypothesis presented in Section 3.4, sip duration is more strongly correlated with volume versus the two transport durations. Correlations between sip duration, along with the previously proposed motion feature related to the integral of the inclination [14], are shown in Table 3 for various ranges of controlled fill levels. Results are consistent with those presented in [14], which reported a correlation coefficient with drink volume of 0.69 and −0.60/−0.55 for sip duration and the integral of accelerometer signals not parallel to the forearm (which were denoted as being related to container inclination), respectively. Moreover, it should be noted that the strength of the correlation between both motion features and volume increases when the fill level is restricted within a narrower range of values. This result is also consistent with prior work [14], which denoted an influence of both volume and fill level on the resulting motion signature. The utility of the proposed micro-event partitioning process for volume estimation is explored in the remainder of this section.

Volume Estimation
Variation in MAPE for the various learning models considered is depicted in Figure 7. Models computed for the sip micro-event only are labeled as Stat., with all other reported results computed for the entire macro-event duration. Consistent with the wearable results in [14], LR models employing the integral of inclination outperformed those using duration. The level of improvement is enhanced versus the results presented in [14]. We hypothesize that this difference is associated with the use of the inclination estimate of the container, as opposed to the individual accelerometer channels which are related to this quantity.
parallel to the forearm (which were denoted as being related to container inclination), respectively. Moreover, it should be noted that the strength of the correlation between both motion features and volume increases when the fill level is restricted within a narrower range of values. This result is also consistent with prior work [14], which denoted an influence of both volume and fill level on the resulting motion signature. The utility of the proposed micro-event partitioning process for volume estimation is explored in the remainder of this section.

Volume Estimation
Variation in MAPE for the various learning models considered is depicted in Figure 7. Models computed for the sip micro-event only are labeled as Stat., with all other reported results computed for the entire macro-event duration. Consistent with the wearable results in [14], LR models employing the integral of inclination outperformed those using duration. The level of improvement is enhanced versus the results presented in [14]. We hypothesize that this difference is associated with the use of the inclination estimate of the container, as opposed to the individual accelerometer channels which are related to this quantity.
All SVM models outperformed the simplistic single factor LR motion models. Moreover, all SVM models exhibited superior performance to the previous best-case reported MAPE of 58.9% for a single wearable sensor in an experiment using scale-based ground-truth [14]. Only minimal differences in MAPE were observed for models utilizing the proposed sip micro-event segmentation versus those computed based on the entire drinking event.
Variation in MAPE across trials is depicted in Figure 8 for the best-case volume estimator (medium kernel, sip micro-event partition). Consistent with prior observations [14], dispersion in the observed error metric is substantial, with a standard deviation of 28.18%.   All SVM models outperformed the simplistic single factor LR motion models. Moreover, all SVM models exhibited superior performance to the previous best-case reported MAPE of 58.9% for a single wearable sensor in an experiment using scale-based ground-truth [14]. Only minimal differences in MAPE were observed for models utilizing the proposed sip micro-event segmentation versus those computed based on the entire drinking event.
Variation in MAPE across trials is depicted in Figure 8 for the best-case volume estimator (medium kernel, sip micro-event partition). Consistent with prior observations [14], dispersion in the observed error metric is substantial, with a standard deviation of 28.18%.
Volume MOAPE for varying drink sequence lengths is presented in Table 4. The aggregate estimation accuracy generally improves with an increased sequence length, with reductions being more pronounced for the proposed IS-based SVM models. While not directly comparable due to the employment of the more stringent MOAPE cumulative metric herein, the best-case aggregate consumption estimation accuracy of 19.49% is improved versus the average value of 25% reported in [11]. Volume MOAPE for varying drink sequence lengths is presented in Table 4. The aggregate estimation accuracy generally improves with an increased sequence length, with reductions being more pronounced for the proposed IS-based SVM models. While not directly comparable due to the employment of the more stringent MOAPE cumulative metric herein, the best-case aggregate consumption estimation accuracy of 19.49% is improved versus the average value of 25% reported in [11].
While the best-case MOAPE (12) value exceeds that computed for the in-the-wild data set reported in [14] (16.95%), direct comparability is limited by the inclusion of potential sip detectionrelated errors (i.e., both false alarms and missed drink detections) in this latter metric, along with the utilization of a commercial smart-bottle for ground-truth labeling (the vendor only reports the accuracy of "a fraction of an ounce" for this device). Variability of the best-case aggregate estimator (medium kernel, entire macro-event duration) is presented in Figure 9. Similar to MAPE, inter-subject variability is considerable (standard deviation of 14.75%). For purposes of comparison, the standard deviation across participants for the in-the-wild dataset in [14] was 14.17%. Table 4. Variation in volume mean overall absolute percentage error for multiple prompt periods (bold values emphasize the best performing model for each period).

Model Identifier MOAPE(3) MOAPE(6) MOAPE(9) MOAPE(12)
Duration   While the best-case MOAPE (12) value exceeds that computed for the in-the-wild data set reported in [14] (16.95%), direct comparability is limited by the inclusion of potential sip detection-related errors (i.e., both false alarms and missed drink detections) in this latter metric, along with the utilization of a commercial smart-bottle for ground-truth labeling (the vendor only reports the accuracy of "a fraction of an ounce" for this device). Variability of the best-case aggregate estimator (medium kernel, entire macro-event duration) is presented in Figure 9. Similar to MAPE, inter-subject variability is considerable (standard deviation of 14.75%). For purposes of comparison, the standard deviation across participants for the in-the-wild dataset in [14] was 14.17%.

Fill Level Estimation
Variation in fill ratio MAPE for the multiple models considered is depicted in Figure 10. Sip duration was replaced with maximum inclination for a single-factor LR benchmark model due to its Figure 9. Distribution of volume mean overall absolute percentage error after 12 drinks for the best-case estimator.

Fill Level Estimation
Variation in fill ratio MAPE for the multiple models considered is depicted in Figure 10. Sip duration was replaced with maximum inclination for a single-factor LR benchmark model due to its strong correlation with the fill ratio. This relationship is emphasized by the considerable increase in this quantity over the course of the experiment, as shown in Figure 4. Fill ratio estimation accuracy is greatly improved versus volume prediction for both the single factor regression and more complex SVM models. Variability in MAPE for the best-case estimator (coarse kernel, entire macro-event partition) is shown in Figure 11. Error dispersion across trials is greatly reduced versus volume estimators. Namely, the fill ratio MAPE standard deviation is 3.39%, versus 28.18% for the best-case volume MAPE. Figure 9. Distribution of volume mean overall absolute percentage error after 12 drinks for the bestcase estimator.

Fill Level Estimation
Variation in fill ratio MAPE for the multiple models considered is depicted in Figure 10. Sip duration was replaced with maximum inclination for a single-factor LR benchmark model due to its strong correlation with the fill ratio. This relationship is emphasized by the considerable increase in this quantity over the course of the experiment, as shown in Figure 4. Fill ratio estimation accuracy is greatly improved versus volume prediction for both the single factor regression and more complex SVM models. Variability in MAPE for the best-case estimator (coarse kernel, entire macro-event partition) is shown in Figure 11. Error dispersion across trials is greatly reduced versus volume estimators. Namely, the fill ratio MAPE standard deviation is 3.39%, versus 28.18% for the best-case volume MAPE. Figure 10. Variation in the fill ratio mean absolute percentage error for the various models considered. 9 Figure 11. Distribution of the fill ratio mean absolute percentage error across trials for the best-case estimator.
The fill ratio MOAPE is shown in Table 5 for varying drink sequence lengths. In contrast to volume estimation, the point nature of fill ratio estimates does not allow for sequential error cancelation across multiple drinks. Accuracy generally degrades with an increasing sequence length. We hypothesize that this discrepancy might be associated with the aforementioned skewing of training data towards larger fill ratios. The variability in fill ratio MOAPE(12) estimates across trials is depicted in Figure 12 for the best-case estimator (coarse kernel, sip micro-event), with a standard deviation of 8.58% observed (versus 14.75% for volume MOAPE(12) estimates).  Figure 11. Distribution of the fill ratio mean absolute percentage error across trials for the best-case estimator.
The fill ratio MOAPE is shown in Table 5 for varying drink sequence lengths. In contrast to volume estimation, the point nature of fill ratio estimates does not allow for sequential error cancelation across multiple drinks. Accuracy generally degrades with an increasing sequence length. We hypothesize that this discrepancy might be associated with the aforementioned skewing of training data towards larger fill ratios. The variability in fill ratio MOAPE(12) estimates across trials is depicted in Figure 12 for the best-case estimator (coarse kernel, sip micro-event), with a standard deviation of 8.58% observed (versus 14.75% for volume MOAPE(12) estimates). cancelation across multiple drinks. Accuracy generally degrades with an increasing sequence length. We hypothesize that this discrepancy might be associated with the aforementioned skewing of training data towards larger fill ratios. The variability in fill ratio MOAPE(12) estimates across trials is depicted in Figure 12 for the best-case estimator (coarse kernel, sip micro-event), with a standard deviation of 8.58% observed (versus 14.75% for volume MOAPE(12) estimates).

Residual Volume Estimation
Fill ratio estimates for pairs of drinks may be used to estimate the aggregate inter-drink consumption for a known container geometry, as specified in (3), whereV i: f denotes the estimated aggregate consumption from drink i to f ; β is a container-specific linear density parameter; and v k and FR k denote the ground-truth volume and estimated fill ratio at the initiation of drink k, respectively.
This mechanism, hereby denoted as residual volume estimation, was assessed herein based upon the noted superior accuracy and reduced inter-subject variability of fill ratio versus volume estimators. The comparison was performed using the MOAPE(11) metric. This sequence length was chosen as it represents the maximum number of drinks which can be assessed using initial fill ratio estimates for our 12-drink protocol. As shown in Table 6, the enhanced accuracy of fill ratio estimates does not improve aggregate consumption estimates versus those formed through the summation of drink-level volume estimates (hereby denoted as cumulative consumption estimation). This discrepancy is attributed to the ability of the latter method to benefit from the cancelation of sequential estimation errors within a drink sequence. Moreover, normalization effects during the conversion to the aggregate consumption volume (i.e., residual volume-based OAPE) serve to distort the achieved accuracy in fill ratio estimation (i.e., fill ratio APE). This distortion is more pronounced for trials with smaller levels of aggregate consumption, as emphasized in Figure 13 and specified in (4).

Multi-Target Estimation Framework
Based upon the observations summarized in Table 3, various techniques were explored for developing volume estimators incorporating fill ratio information. The first approach conditioned the training set using the estimated fill ratio. Namely, training data was restricted to the 150 samples whose fill level labels were closest to the estimated fill ratio in the Euclidean sense. While the computational overhead of this approach is not feasible in practical deployment, similar techniques could be realized by selecting from a pretrained model library for targeted fill ratio ranges.
To explore the maximum achievable benefit of this approach, an analysis was conducted using ground-truth fill ratio information in addition to estimates. Moreover, to assess the utility of explicitly mandating this form of fill ratio incorporation, a strategy of appending the fill ratio into the feature space was also considered. The results for all four analysis combinations are presented in Table 7 for the best-case macro-event volume estimator (coarse Gaussian SVM). Estimated fill ratios were obtained using the coarse Gaussian SVM regression model. As demonstrated, while ground-truth fill ratio information improves volume estimation accuracy, no benefit is realized using noisy fill ratio estimates. Moreover, the proposed approach of restricting training data to examples with similar fill ratios produced only minimal error reduction versus feature space expansion. We hypothesize that this limitation is associated with the reduction in available training data using the prior method.

Multi-Target Estimation Framework
Based upon the observations summarized in Table 3, various techniques were explored for developing volume estimators incorporating fill ratio information. The first approach conditioned the training set using the estimated fill ratio. Namely, training data was restricted to the 150 samples whose fill level labels were closest to the estimated fill ratio in the Euclidean sense. While the computational overhead of this approach is not feasible in practical deployment, similar techniques could be realized by selecting from a pretrained model library for targeted fill ratio ranges.
To explore the maximum achievable benefit of this approach, an analysis was conducted using ground-truth fill ratio information in addition to estimates. Moreover, to assess the utility of explicitly mandating this form of fill ratio incorporation, a strategy of appending the fill ratio into the feature space was also considered. The results for all four analysis combinations are presented in Table 7 for the best-case macro-event volume estimator (coarse Gaussian SVM). Estimated fill ratios were obtained using the coarse Gaussian SVM regression model. As demonstrated, while ground-truth fill ratio information improves volume estimation accuracy, no benefit is realized using noisy fill ratio estimates. Moreover, the proposed approach of restricting training data to examples with similar fill ratios produced only minimal error reduction versus feature space expansion. We hypothesize that this limitation is associated with the reduction in available training data using the prior method.

Limitations
Although the proposed attachable architecture offers notable advantages versus competing approaches, it is characterized by some fundamental limitations. Namely, the sensing approach described herein is restricted to drinking vessels in which flow is introduced through inclination (i.e., no straw-based containers, etc.). Moreover, the attachable device limits ubiquity versus wearable sensors, due to the requirement that dedicated hardware be manually repositioned on the container before each drinking episode.
Beyond these innate restrictions, generalization of the results presented herein is limited by the scripted nature of the experiment. Further analysis exploring the efficacy of the proposed approaches for the intended use case, which includes the requirement of drink spotting amongst potentially confounding activities (i.e., handling, bottle maintenance, etc.), is required. Moreover, the described techniques should also be assessed for more natural (i.e., no scripted volume prompts) consumption patterns. Finally, experiments should be conducted for additional types of drinking containers (i.e, mugs, glasses, etc.).

Future Work
Future work will focus on further analysis of the collected dataset, including the incorporation of gyroscope data within the estimation process. In addition, an analysis of the effect of sensor placement on the estimation performance using a subset of data for which two sensors were attached at varying positions will be conducted. Furthermore, more sophisticated analysis techniques, including the application of end-to-end learning strategies, along with more complete multi-target regression approaches yielding the joint estimation of both volume and fill ratio [21] will be explored. In addition to analyzing existing data, further data collection addressing the mentioned limitations in the prior subsection will be conducted.

Summary
A container-attachable IMU sensor for tracking fluid consumption was demonstrated herein. Consumption estimates were formulated using support vector machine regression models with hand-engineered features describing the inclination of the container during drinking. Results were presented for an experiment consisting of 1908 drinks consumed by 84 participants. MAPE was reduced by 11.05% on a per-drink basis versus the prior state-of-the-art for a single wearable IMU in an experiment utilizing scale-based ground-truth values [14]. A MOAPE of 19.49% was achieved over the 159 trials conducted. This multi-drink estimation accuracy improves upon previously reported results for a container-attachable sensing architecture [11].
Consistent with prior motion-based studies, errors in volume estimates were shown to demonstrate considerable inter-subject variability, as quantified by a MAPE standard deviation of 28.18% for the best-case estimator. As a possible alternative tracking approach, aggregate consumption estimates using fill level information were explored. Fill ratio estimates were shown to exhibit an improved accuracy (best-case MAPE of 7.77%) and reduced inter-subject variability (corresponding standard deviation of 3.39%) versus volume estimates. Aggregate consumption estimates based upon the fill ratio did not exhibit an improved accuracy versus those obtained through the summation of individual drink volume estimates.
In addition, a technique for segmenting the entire drink motion sequence into transport and sip micro-events was proposed and demonstrated. While this micro-partitioning did not considerably affect the estimation accuracy for the scripted results considered herein, it may be useful for in-the-wild applications where variability in transport motion patterns is enhanced. Furthermore, the possibility to improve volume estimation accuracy by exploiting the influence of fill level on the resulting motion pattern was investigated. While the inclusion of ground-truth fill ratio data as both an additional feature and conditioning factor in training data was shown to improve the accuracy, the utilization of noisy fill ratio estimates provided no performance improvement. Funding: This work was supported by a TSGTD grant from Michigan State University.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A
The inclination signature feature set is specified in Table A1, where θ j = {θ 1 , θ 2 , . . . , θ D } denotes the inclination estimate of the jth drink event defined according to (1).