Validity and Feasibility of the Monitoring and Modeling Family Eating Dynamics System to Automatically Detect In-field Family Eating Behavior: Observational Study

Background: The field of dietary assessment has a long history, marked by both controversies and advances. Emerging technologies may be a potential solution to address the limitations of self-report dietary assessment methods. The Monitoring and Modeling Family Eating Dynamics (M2FED) study uses wrist-worn smartwatches to automatically detect real-time eating activity in the field. The ecological momentary assessment (EMA) methodology was also used to confirm whether eating occurred (ie, ground truth) and to measure other contextual information, including positive and negative affect, hunger, satiety, mindful eating, and social context.
Objective: This study aims to report on participant compliance (feasibility) to the 2 distinct EMA protocols of the M2FED study (hourly time-triggered and eating event–triggered assessments) and on the performance (validity) of the smartwatch algorithm in automatically detecting eating events in a family-based study.
Methods: In all, 20 families (58 participants) participated in the 2-week, observational, M2FED study. All participants wore a smartwatch on their dominant hand and responded to time-triggered and eating event–triggered mobile questionnaires via EMA while at home. Compliance to EMA was calculated overall, for hourly time-triggered mobile questionnaires, and for eating event–triggered mobile questionnaires. The predictors of compliance were determined using a logistic regression model. The number of true and false positive eating events was calculated, as well as the precision of the smartwatch algorithm.
Bell et al. JMIR Mhealth Uhealth 2022 | vol. 10 | iss. 2 | e30211 | https://mhealth.jmir.org/2022/2/e30211


Challenges to Dietary Assessment
A prevailing challenge in dietary and eating research is the ability to accurately measure dietary intake. Historically, the assessment of dietary intake and eating behaviors uses self-reporting tools [1,2], such as food diaries, food frequency questionnaires, and 24-hour dietary recalls [3,4]. All dietary assessment self-report methods have some level of measurement error (difference between measured and true values) [5,6]. Dietary data collected via self-report methods may be misreported because of biases, such as recall or memory bias (when a respondent erroneously recalls their dietary intake) and social desirability bias (when a respondent desires to present oneself positively) [7][8][9]. Studies have also found that those with certain characteristics (eg, obese weight status and body image dissatisfaction) are more likely to underreport their energy intake [10,11].

Shifting Focus From Dietary Intake to Eating Behavior and Context
The field of nutritional epidemiology has produced an abundance of studies that have examined the role of dietary intake (ie, what and how much is consumed) in human health and disease, specifically macronutrients (eg, fats and carbohydrates), types of food, quality of food, dietary patterns, and more [12]. Decades of laboratory-based and observational research indicate that dietary intake is a critical component of chronic disease prevention [13]. However, the measurement of diet in free-living populations remains a significant challenge in the field. In addition, even if public health researchers can easily and accurately track free-living dietary intake, dietary intake patterns are notoriously difficult to change long-term [14].
Eating behaviors and patterns (ie, food choices and motives and feeding practices) and context (who is eating, when, where, with whom, etc) also play a significant role in the development of obesity and other chronic diseases, including type 2 diabetes and heart disease [15][16][17][18][19][20]. These findings indicate that the patterns and features of eating events may be key contexts that shape dietary intake, and thus could be more malleable features of eating behavior that interventions could target. However, the field lacks appropriate behavioral theories that provide a richer understanding of how eating behaviors vary across contexts and across time [21,22].

Technology-Assisted Dietary Assessment
Emerging technologies offer a potential solution for the accurate assessment of dietary intake by addressing the limitations of self-reported dietary assessment methods. The incorporation of technologies into dietary assessment can improve the quality and validity of dietary data by passively measuring eating in naturalistic settings over long periods with minimal user interaction [23]. Two emerging technological advances in dietary assessment tools include the following:
1. Ecological momentary assessment (EMA): a data collection technique in which one's behavior is repeatedly sampled in real time and in context [24][25][26].
2. Wearable devices and sensors: allow for the passive collection of various data streams from the physical environment (eg, acoustic, visual, and inertial) [27].
EMA and wearable sensors are able to measure behavior near or just in time, thereby reducing or eliminating the recall bias that can affect retrospective self-report measures. In addition to improving the validity of data, these technologies offer the opportunity to measure eating behavior frequently and over long periods, allowing researchers to examine how it varies over multiple timescales (varies over the day, over the week, etc).

Monitoring and Modeling Family Eating Dynamics Study
To address the limitations of traditional dietary assessment methods and theories, the Monitoring and Modeling Family Eating Dynamics (M2FED) study developed a sensor system that used smartphones as well as deployable and wearable sensors to collect synchronized real-time data on family eating behavior [28]. This study used the following: (1) wrist-worn smartwatches containing inertial sensors (accelerometer and gyroscope) to automatically detect arm movements and hand gestures associated with eating; (2) EMA via smartphone to confirm whether the eating occurred and to measure other contextual information, such as who was present during the eating event and the current mood of the respondent; and (3) Bluetooth proximity beacons to determine the approximate location of the smartwatches.
Rather than focusing on dietary intake (caloric intake, portion sizes, etc), this study took a novel approach by measuring eating behaviors (ie, food choices and motives and feeding practices) and context (who is eating, when, where, with whom, etc). Family eating dynamics have yet to be measured and modeled dynamically to better contextualize our understanding of social influence processes within family systems. This paper begins the first step toward producing new models that develop behavioral theory, and it may enable the identification of temporally specific processes and events within the family system that can be targeted for personalized, context-specific, real-time feedback.

Assessing Validity of Wearable Sensors
The validity of using wearable sensors to automatically assess eating behavior and context has been tested in both laboratory and field settings [27,29-31], indicating that the performance of the wearable sensors decreases in naturalistic settings (compared with controlled laboratory settings). Studies have used a variety of sensors (eg, microphones, cameras, smartwatches, and electromyography electrodes) to measure various dietary outcomes, including bites, chewing, swallowing, and duration of eating occasions [27,29-33]. A review by Bell et al [27] indicates that there is still a strong reliance on retrospective self-report methods (eg, end-of-day food diaries) to determine ground-truth eating activity to evaluate wearable sensors in the field. Given the aforementioned limitations of retrospective self-report methods to accurately assess diet, the M2FED study used event-contingent EMA to determine ground-truth eating activity in families. The use of EMA offers unique methodological advantages, such as the following:
• The ability to measure behavior near or just in time, thereby reducing recall bias and reducing participant burden.
• The ability to measure behavior at the location in which it actually occurs, thereby maximizing ecological validity [24].
The validity of this method has been tested in a few in-field studies [34,35]; however, it has not yet been tested in a family-based study.

Assessing Feasibility of EMA
One disadvantage of using technologies for data collection is the potential for participant noncompliance. A recent systematic review and meta-analysis by Wen et al [36] found that compliance rates among EMA studies in youth samples were suboptimal; the weighted average compliance rate was 78.3%, falling under the recommended 80% compliance rate [24]. Many studies have explored EMA compliance for various behaviors in various populations [36][37][38][39][40], but the compliance rate for a family-based EMA study is underexplored. A recent EMA study involving mothers and their children found that mothers' presence may enhance children's compliance with EMA questionnaires [41], suggesting that family members and other social relations may be leveraged to increase compliance in future EMA studies.

Study Aims
Therefore, the overall purpose of this study is to report on participant compliance (feasibility) to the 2 distinct EMA protocols of the study (hourly time-triggered and eating event-triggered assessments) and on the performance (validity) of the wearable sensor in automatically detecting eating events in a family-based study. Specifically, the primary aims of this study include the following:

Eligibility
The research team recruited families that contained at least two members (including at least one adult parent and one child between the ages of 11 and 18 years) living in Los Angeles County. Families with children aged <11 years were eligible to participate; however, children aged <11 years were not permitted to participate in the study. Families were not eligible to participate if one or more family members living at home did not primarily speak English. There were no demographic or disease-related exclusion criteria.

Method of Recruitment
Families were recruited in public spaces and at public events in Los Angeles County from May 2017 to August 2019. Snowball sampling was also used, such that participating families were offered an additional US $20 if they referred other eligible families that were successfully enrolled in the study.
All families that expressed interest and met the eligibility requirements were invited to participate in the study. An intake screening tool was administered over the phone by recruitment coordination staff to confirm eligibility before enrolling in the study.
This study was approved by the Institutional Review Board of the University of Southern California (UP-16-00227). All parents provided informed written consent, and all children provided assent.

Overview
The primary objective of the M2FED study is to develop and deploy the M2FED cyberphysical system (Figure 1) in the homes of families. Cyberphysical systems can be defined as "physical and engineered systems whose operations are monitored, coordinated, controlled, and integrated by a computing and communication core" [42]. This novel system monitored in-home family eating behaviors in all participants. This system contained four primary components: (1) sensors (including smartwatches, smartphones, and Bluetooth proximity beacons), (2) a base station, (3) an EMA subsystem, and (4) a remote monitoring subsystem, all of which were connected through a Wi-Fi router (Figure 1).
For the scope of this study, all data collected by the system were measured in the home (ie, no data were collected outside of the home).

Sensors
Participants were instructed to wear a Sony Smartwatch 3 (Android Wear operating system) on their dominant hand during all waking hours that they were in their home. The smartwatches were used to automatically detect eating-related hand-to-mouth (H-t-M) gestures for each participant at home and in real time. Arm movements and H-t-M gestures were detected via an algorithm that used motion data from the inertial sensors inside the smartwatch (accelerometer and gyroscope) [43]. If a cluster of at least two H-t-M gestures was detected within a 1-minute time frame, then the motion data were processed with a more sophisticated algorithm, and these clusters were then characterized as an eating event. An eating event can be defined as a set of temporally clustered H-t-M gestures, representing phenomena such as consuming a meal, snack, drink, or a combination of these consumption behaviors. The technical details of the eating event detection algorithm are provided elsewhere [43]. Participants were instructed to wear the smartwatch only at home and not to take it outside of the home. Consequently, data on H-t-M gestures and eating events that the proximity beacons determined to have occurred outside of the home were discarded.
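The first-stage trigger described above (at least two H-t-M gestures within a 1-minute time frame) can be illustrated with a minimal sketch. This is not the detection algorithm of [43]; it only shows one plausible reading of the clustering rule, in which gestures are chained together whenever consecutive gestures are no more than 1 minute apart. The function name `candidate_clusters` and the chaining rule are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def candidate_clusters(gesture_times, window=timedelta(minutes=1), min_gestures=2):
    """Group gesture timestamps into candidate clusters.

    Assumption (not from the source): consecutive gestures no more than
    `window` apart belong to the same cluster. Clusters with fewer than
    `min_gestures` gestures are dropped.
    Returns a list of (first_gesture_time, last_gesture_time) tuples.
    """
    clusters = []
    current = []
    for t in sorted(gesture_times):
        # A gap longer than the window closes the current cluster.
        if current and t - current[-1] > window:
            if len(current) >= min_gestures:
                clusters.append((current[0], current[-1]))
            current = []
        current.append(t)
    if len(current) >= min_gestures:
        clusters.append((current[0], current[-1]))
    return clusters

# Hypothetical gesture timestamps: three close together, one isolated.
base = datetime(2019, 6, 3, 12, 0, 0)
gestures = [base,
            base + timedelta(seconds=30),
            base + timedelta(seconds=70),
            base + timedelta(minutes=10)]
clusters = candidate_clusters(gestures)
```

In this example, the three gestures between 12:00:00 and 12:01:10 form a single candidate cluster, and the isolated gesture at 12:10:00 is ignored. In the study, such candidates were passed to a more sophisticated algorithm before being labeled as eating events.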
Participants were each provided with a Samsung Galaxy S7 smartphone (Android operating system) preprogrammed with limited functioning. The smartphone app in which they responded to mobile questionnaires was pinned to the screen so that they could not access other apps on the smartphone. This smartphone was only intended for use as a data collection tool. Participants were instructed to keep their smartphones at home and not take them outside of the home. If a smartphone left home and was not within the range of the Wi-Fi router, the phone did not receive any mobile questionnaires. Consequently, data on participants' states and behaviors outside of the home were not collected.
Estimote Bluetooth Low Energy proximity beacons were used to determine the approximate location of smartwatches of participants (including approximately which room the watches were in and whether they were still at home) during the study period. The beacons continuously broadcasted packets that included the unique media access control address of the Bluetooth interface, whereas the smartwatches periodically scanned for these packets. The smartwatches then recorded the received signal strength indicator (signal from the beacons), which indicated the proximity of the smartwatches to the beacons.
Typically, 1 to 2 beacons were placed on a wall in each living space at home (excluding bathrooms and bedrooms), and they required no further action by the participants during the study.
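As a rough illustration of how received signal strength can indicate proximity, the sketch below picks the room whose beacon has the strongest average signal in a batch of scans. The source does not describe the localization rule that was actually used, so the function `nearest_room`, the beacon-to-room mapping, the placeholder addresses, and the "strongest average signal wins" rule are all assumptions. Signal strength values are in dBm, where values closer to 0 indicate a stronger signal.

```python
from collections import defaultdict
from statistics import mean

def nearest_room(scans, beacon_rooms):
    """Return the room of the beacon with the strongest mean signal.

    scans: list of (beacon_address, rssi_dbm) tuples from one scan period.
    beacon_rooms: hypothetical mapping of beacon address -> room label.
    Returns None when no known beacon was heard, which could mean the
    watch is out of range of every beacon.
    """
    by_beacon = defaultdict(list)
    for address, rssi in scans:
        if address in beacon_rooms:  # ignore unknown Bluetooth devices
            by_beacon[address].append(rssi)
    if not by_beacon:
        return None
    strongest = max(by_beacon, key=lambda address: mean(by_beacon[address]))
    return beacon_rooms[strongest]

# Hypothetical beacons and scan results.
rooms = {"AA:AA": "kitchen", "BB:BB": "living room"}
scan_batch = [("AA:AA", -60), ("AA:AA", -62), ("BB:BB", -80), ("CC:CC", -40)]
room = nearest_room(scan_batch, rooms)
```

Here the kitchen beacon has the strongest average signal among the known beacons, so the watch would be assigned to the kitchen. The unknown device `CC:CC` is ignored.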

Base Station
A base station is a radio receiver and transmitter and a computing platform that serves as the hub of a local wireless network (the M2FED system). The base station for the M2FED system was a Lenovo ThinkPad laptop, which was placed in the home of the family for the duration of the study. The laptop was placed in a locked cage so that it could not be tampered with. The base station collected and processed the data received from the smartphones and smartwatches through the Wi-Fi router, and managed the EMA subsystem that ran on the laptop as well.

EMA Subsystem
EMA is a data collection technique in which one's behavior is repeatedly sampled in a natural environment [24]. In this study, participants were assessed on several individual behaviors and states via mobile questionnaires sent to their smartphone approximately every hour during waking hours. Each smartphone had an app developed by the members of our research team installed on it. The app acted as a mobile questionnaire platform (ie, participants answered the questionnaires within the app interface).
The two types of EMAs that the participants received are as follows: (1) time-triggered mobile questionnaires and (2) eating event-triggered mobile questionnaires.
A time-triggered mobile questionnaire was sent to the participants' smartphones every hour at the top of the hour (eg, 10 AM, 11 AM, 12 PM, etc; Figure 2A). The questionnaire included a brief validated positive affect and negative affect survey [44][45][46][47] (see Table 1 for the full list of questions).
Shortly after an eating event was detected for any given participant, an eating event-triggered mobile questionnaire was sent to the corresponding participant's smartphone asking to confirm whether they had just eaten (Figure 2B). If they confirmed that they had just eaten, then following this first question, they were asked a battery of survey items including previously validated measures of hunger and satiety [48], mindful eating [49], positive and negative affect [44][45][46][47], and with whom they were eating, if anyone (see Table 1 for the full list of questions). If the participant had not finished eating, they were given the option to request more time before filling out the questionnaire.
If they responded to the first question indicating that they had not just eaten, then they were asked to report what activity they had just completed. They were then asked to respond to validated measures of positive and negative affect [44][45][46][47]. Figure 3 illustrates the full eating event-triggered EMA question logic. The full list of questions for the time-triggered and the eating event-triggered mobile questionnaires can be found in Table 1.
Figure 2. Figure 2A is an example of a time-triggered mobile questionnaire that the participants received on their phone during the study. It contains the first 4 questions of the questionnaire that measure negative affect. Figure 2B is an example of an eating event-triggered mobile questionnaire that the participants received on their phone during the study. It contains the first question of the questionnaire that measures whether the participant had just eaten or drank. EMA: ecological momentary assessment.

Participation Windows
Before a family's deployment started, all participants were individually asked about the time at which they normally woke up and the time at which they normally went to bed. The participants were limited to only 1 personalized participation window for the study. Therefore, they could not have different windows for Monday versus Tuesday and weekday versus weekend. If the times at which they woke up or went to bed varied extensively among days, then they were asked to provide a time frame that generally worked for all days. The purpose was to create personalized participation windows to account for variations in the daily routines and sleeping patterns of the participants. For the duration of the study, the participants only received EMAs during their personalized participation window. For example, if the window of a participant was from 6:30 AM to 11:00 PM, then they only received EMAs during that period.
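The interaction between the hourly schedule and a personalized participation window can be sketched as follows. This is a minimal illustration rather than the study's scheduling code: the function name `hourly_prompts` is hypothetical, the window is assumed not to cross midnight, and a prompt falling exactly on the end of the window is assumed to be sent.

```python
from datetime import date, datetime, time, timedelta

def hourly_prompts(day, window_start, window_end):
    """List the top-of-the-hour prompt times inside one participation window.

    Assumptions (not from the source): the window starts and ends on the
    same calendar day, and the window end is inclusive.
    """
    start = datetime.combine(day, window_start)
    end = datetime.combine(day, window_end)
    # First top of the hour at or after the start of the window.
    first = start.replace(minute=0, second=0, microsecond=0)
    if first < start:
        first += timedelta(hours=1)
    prompts = []
    t = first
    while t <= end:
        prompts.append(t)
        t += timedelta(hours=1)
    return prompts

# The example window from the text: 6:30 AM to 11:00 PM.
prompts = hourly_prompts(date(2019, 6, 3), time(6, 30), time(23, 0))
```

For the 6:30 AM to 11:00 PM window, this yields 17 time-triggered prompts, the first at 7:00 AM and the last at 11:00 PM.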

Remote Monitoring Subsystem
The monitoring subsystem was used to monitor the status of the M2FED system in real time [50]. The subsystem monitored several things, including the battery status and network connection of the smartwatches, smartphones, and base station; the processes running on the base station; the detected eating events; and whether participants responded to any given EMA sent to their smartphones. When the monitoring system detected an issue (eg, the base station was no longer connected to the router), an email was sent to the research team to alert them of the issue.

Procedures
Following enrollment, 2 members of the research team visited the home of each recruited family 2 separate times.

Visit 1
During the first home visit, the team went to the participants' home to obtain consent from all participating family members, take body measurements of the participants using a research-grade Tanita scale (Model TBF 300) and stadiometer, administer baseline surveys, and install the components of the cyberphysical system around the home (all living spaces, not including bedrooms or bathrooms).
The base station, Wi-Fi router, and Bluetooth beacons were placed in a discreet location in the home of the family, so they could run without interference for the duration of the study. Samsung smartphones and Sony smartwatches were provided to all participating family members for the duration of the study (all features except answering questionnaires were turned off). Each phone and watch was designated to a specific participant and labeled with their name so that they knew which devices were their own. The team instructed the family on how to properly wear, charge, and care for the smartwatches and how to answer an EMA on the smartphones. The family was instructed to wear the watch at all times when they were at home and to answer all EMA questionnaires they received when they were at home. They were also instructed to leave their designated phone and watch at home when they left home to prevent the devices from getting damaged or lost while outside of the home.
After the visit, family members underwent approximately 14 consecutive days of (1) use of a smartphone to complete hourly time-triggered and eating event-triggered mobile questionnaires, up to once every hour during waking hours; and (2) eating event monitoring, in the form of a wrist-worn smartwatch during waking hours.

Visit 2
At the final home visit, approximately 2 weeks following the first home visit, the research team terminated data collection, and all equipment was uninstalled and removed from the homes. Each participant received US $100 in a Visa gift card format as compensation for the 2-week study.

Eating Events
During the 2-week assessment period, participants were asked to wear their dedicated smartwatch on their dominant wrist at all times while they were home during waking hours. Automatic eating event detection software on the smartwatches developed by our research team [43] collected the timestamps (approximate start and end times in the format mm/dd/yyyy, hh:mm:ss) for all detected eating events that occurred while the watch was worn. After an eating event was detected, participants received a brief mobile questionnaire on their study phones to confirm whether the detected eating event was a true event. The first question on the questionnaire was "Were you eating or drinking just now?" If the participant responded "No," they were asked to report what they were doing. Options included using my phone, smoking, fixing my hair, putting on sunscreen or lotion, or other with an open text field. If the participant responded "Yes," they were asked to report on a range of momentary measures, such as hunger level before the eating event and with whom they were eating. The full list of questions for the time-triggered and the eating event-triggered mobile questionnaires can be found in Table 1.

EMA Questionnaires
Timestamps (format: mm/dd/yyyy, hh:mm:ss) when the hourly time-triggered and eating event-triggered mobile questionnaires were sent to and received by the smartphones of participants were obtained from the monitoring system. In addition, the responses of the participants to the questionnaires were obtained from the monitoring system.

Timing
The time of day at which and the day of the week on which an eating event occurred were calculated using the timestamp of the detected eating event. The time of day at which the eating event occurred was stored in hh:mm:ss format. The lubridate R package [51] was used to convert the date on which the eating event occurred (format: mm/dd/yyyy) to the corresponding day of the week (Monday, Tuesday, etc), which was then classified as a weekday (Monday through Friday) or a weekend day (Saturday or Sunday).
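The timestamp conversion can be illustrated with a short sketch. The study used the lubridate R package; the Python version below is only an analogue built on the standard library, and the function name `classify_event` is hypothetical. It assumes timestamps in the stated format (mm/dd/yyyy, hh:mm:ss).

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def classify_event(timestamp):
    """Split an eating event timestamp into time of day, day name, and day type.

    timestamp: string such as "05/06/2017, 18:45:10".
    Returns (time_of_day "hh:mm:ss", day name, "weekday" or "weekend").
    """
    dt = datetime.strptime(timestamp, "%m/%d/%Y, %H:%M:%S")
    index = dt.weekday()  # Monday is 0, Sunday is 6
    day_type = "weekend" if index >= 5 else "weekday"
    return dt.strftime("%H:%M:%S"), DAY_NAMES[index], day_type

saturday_event = classify_event("05/06/2017, 18:45:10")
monday_event = classify_event("05/08/2017, 07:05:00")
```

The first call returns `("18:45:10", "Saturday", "weekend")` and the second returns `("07:05:00", "Monday", "weekday")`.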

Anthropometrics
During home visit 1, height (cm), weight (kg), and body fat percentage (%) were measured in all participants in a private section of the home, using a portable stadiometer and a research-grade Tanita scale (model TBF 300).

Demographics
During home visit 1, participants were asked to provide basic demographic information via a paper-based questionnaire, including their current age (years), gender (female or male), race (Hispanic or Latino, Asian or Pacific Islander, White, Black or African American, American Indian or Native American, Mixed, or other), Hispanic or Latino ethnicity (Yes, No, or Do not know), and family role (mother, father, child, grandparent, aunt, uncle, and others).

Data Processing
A limitation of the EMA sampling protocol of the M2FED study was that the study phones of participants (which participants were instructed to keep at home at all times) received hourly, time-triggered surveys regardless of whether the participants themselves were at home or not (at school or work, running errands, etc). This means that the time frame in which any given participant was at home and participating in the study was not necessarily continuous. Although we do not possess the ground truth for presence of the participants at home (eg, no cameras and no self-report diaries), our research team generated a participation algorithm using the EMA system, proximity sensors, and accelerometer in the watch to identify time intervals in which we were confident that the participants were both at home and actively participating in the study (ie, answering EMAs or wearing the smartwatch; Figure 4).
If participants had answered an EMA at time t, then we assigned their status as participating for the 30-minute interval surrounding time t (ie, from t −15 to t +15 minutes). For times outside the EMA interaction windows, we used data from the sensors (smartwatch accelerometer and Bluetooth beacons) to determine the status of the participants. For every minute, if the accelerometer data of the smartwatch were both available (ie, not missing for that minute) and indicated movement (ie, the frequency and instantaneous changes of the sensor signal were above a threshold, representing change in the signal because of movement) and beacon data were available, then they were classified as participating for that 1-minute interval. Contiguous minute intervals with participating status were merged to acquire larger time intervals. For each participant, these participation time intervals were calculated, and the union of all intervals (Figure 5) was used as the valid time interval in the analyses.
In the figure, the first interval shows that the participant answered an ecological momentary assessment (EMA), and there were available data from the accelerometer and beacon. In the second interval, the participant did not answer an EMA, but there were available data from the accelerometer and beacon. In the third interval, the participant answered an EMA and there were some available data from the accelerometer.
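The interval logic of the participation algorithm can be sketched as follows. This is an illustrative reconstruction from the description above, not the research team's code. The function names are hypothetical, and `active_minutes` is assumed to be precomputed upstream as the minutes in which accelerometer data were available, indicated movement, and beacon data were available (the movement threshold itself is not given in the text).

```python
from datetime import datetime, timedelta

def merge_intervals(intervals):
    """Return the union of (start, end) intervals as a sorted, merged list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def participation_intervals(ema_answer_times, active_minutes):
    """Combine EMA-based and sensor-based participation into valid intervals.

    ema_answer_times: times t at which an EMA was answered; each
        contributes the interval from t - 15 to t + 15 minutes.
    active_minutes: start times of 1-minute slots classified as
        participating from sensor data (assumed computed elsewhere).
    """
    half = timedelta(minutes=15)
    ema = [(t - half, t + half) for t in ema_answer_times]
    sensor = [(m, m + timedelta(minutes=1)) for m in active_minutes]
    return merge_intervals(ema + sensor)

# Hypothetical data: one answered EMA and four active sensor minutes.
day = datetime(2019, 6, 3)
answered = [day.replace(hour=10)]
active = [day.replace(hour=10, minute=14), day.replace(hour=10, minute=15),
          day.replace(hour=10, minute=16), day.replace(hour=12)]
valid = participation_intervals(answered, active)
```

In this example, the EMA interval (9:45 to 10:15) and the three contiguous sensor minutes merge into one interval from 9:45 to 10:17, and the isolated minute at 12:00 forms a second interval.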

Individual- and Family-Level Characteristics
The mean and SD or the count and proportion of the analytic sample's age, BMI, gender, race, and ethnicity were calculated and reported by family role (child or parent). At the family level, the count and proportion of the type of household of the family (1- or 2-parent household), number of children living at home, and average length of family deployment were reported.

EMA Characteristics
The mean and SD of EMAs received per family, received per person, and received per person per day were calculated after applying the participation algorithm to the EMA data. The frequency distribution of EMAs by family role and time of day was calculated.

Primary Analyses
To test study aim 1A, EMA compliance was calculated as follows (i can be values from 1 to n, where n represents the number of participants in the study):
To test study aim 2A, we evaluated the performance of the smartwatch by computing the following metrics for all eating events automatically detected during deployments:
• True positives = cases in which an eating event actually occurred, and that eating event was correctly detected by the smartwatch algorithm
• False positives = cases in which an eating event actually did not occur, but an eating event was erroneously detected by the smartwatch algorithm

Precision = true positives / (true positives + false positives) (4)
To test study aim 2B, nonparametric methods were used to determine whether there were differences in the detection of eating events by participant age, gender, family role, and height. The metric we used to compare across demographic groups was the following:
Proportion of correctly detected eating events for participant i = true positives for participant i / total number of detected eating events for participant i (5)
If any participant had received fewer than 3 eating event-triggered EMAs, their data were excluded from this analysis.
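Equations 4 and 5 can be illustrated with a small sketch. The data below are invented for illustration only and are not study results. The function names are hypothetical. As a simplifying assumption, each participant's list holds one True or False value per answered eating event-triggered EMA (True meaning the participant confirmed eating), and the threshold of 3 EMAs is applied to the length of that list.

```python
def precision(true_positives, false_positives):
    """Equation 4: true positives / (true positives + false positives)."""
    detected = true_positives + false_positives
    if detected == 0:
        return float("nan")  # undefined when nothing was detected
    return true_positives / detected

def proportion_correct(confirmations, min_emas=3):
    """Equation 5 per participant, excluding those with fewer than min_emas.

    confirmations: dict of participant id -> list of bool, one entry per
        detected eating event with a ground-truth answer.
    """
    result = {}
    for participant, answers in confirmations.items():
        if len(answers) < min_emas:
            continue  # too few EMAs to include in the comparison
        result[participant] = sum(answers) / len(answers)
    return result

# Invented example data (not from the study).
example = {
    "p01": [True, True, False, True],
    "p02": [True, False],            # fewer than 3, excluded
    "p03": [False, True, True],
}
per_participant = proportion_correct(example)
overall = precision(true_positives=6, false_positives=3)
```

With these invented values, `p01` has a proportion of 0.75, `p02` is excluded, and the pooled precision across all 9 detected events is 6/9.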
For categorical variables with 2 groups (ie, gender), the appropriate assumptions were tested, and then the Mann-Whitney U test was used to test for equality of central tendency of the 2 distributions; for categorical variables with 3 or more categories (ie, family role), the Kruskal-Wallis test was used. Finally, for continuous variables (ie, height [cm] and age [years]), the appropriate assumptions were tested, and Spearman rank correlation was used to measure the strength and direction of the relationship between the continuous variable and the proportion of correctly detected events.

Missing Data
There were no missing anthropometric or demographic data. Similarly, there were no missing data on detected eating events and corresponding variables, including time of eating event and day of eating event; however, there were missing data for time-triggered and eating event-triggered EMAs.

Missingness Attributed to Technical Issues
Preliminary analyses indicated that not all EMAs that were sent to the study phones of the participants by the M2FED system were received by the phone. The M2FED system ran independently on the base station regardless of the network connection, and therefore sent EMAs regardless of network connection. However, a network connection was needed for the phone to successfully receive the EMA.
Although we do not have data that explain why this happened at every instance, we know from in-the-field troubleshooting and from accounts given by participants that at least a portion of the nonreceived EMAs resulted from (1) network connection issues at home (ie, the router was not working and the EMAs could not be received on the phone) and (2) EMA app failure (ie, the EMA app on the phone failed to work properly).
For these analyses, we removed any EMAs that were sent by the system but were not received by the phone.

Missingness Attributed to Participant Nonresponse or Partial Response
The different types of missing data that we encountered were because of participant nonresponse (ie, participants did not respond to any EMA questions) or partial responses (ie, participants did not respond to all EMA questions).
For aim 1 analyses, if participants did not respond to any questions on a given mobile questionnaire, then this EMA was labeled as received but not answered. If participants did not respond to all questions, then this EMA was labeled as received and partially answered. These EMA observations were kept in the data set to calculate EMA compliance.
For aim 2 analyses, if participants did not respond to at least the first question on a given eating event-triggered EMA ("Were you eating or drinking just now?"), then this EMA observation was removed from the data set.
Statistical software R (version 4.0.2) was used to perform these analyses.

Individual- and Family-Level Characteristics
A total of 74 participants from 20 families were enrolled in the M2FED study. In all, 18% (13/74) of participants dropped out of the study or were removed from the data set because their participation (as determined by the participation algorithm) was 0% (ie, they did not answer any EMAs and never wore the smartwatch; Figure 6).
In addition, the 3 nonparent adult participants (4%, 3/74) accounted for only approximately 1.44% (61/4232) of the EMAs received, so these participants were removed from the analytic sample as well. The remaining 78% (58/74) of participants included in the analytic sample did not significantly differ from the enrolled sample (N=74) by age, gender, or parent role (P>.05; Table 2).

Predictors of Compliance
Three separate logistic regression models were fitted with the following data sets: (1) all EMAs, (2) time-triggered EMAs, and (3) eating event-triggered EMAs.
Results from the first model indicate that time of day and whether other family members had also answered an EMA were significant predictors of compliance to all EMAs (Table 7). Participants were 37% less likely (odds ratio [OR] 0.63, 95% CI 0.46-0.86) to respond to an EMA in the afternoon and 39% less likely (OR 0.61, 95% CI 0.45-0.81) to respond to an EMA in the evening compared with the morning (reference group). Participants were 91% more likely (OR 1.91, 95% CI 1.56-2.34) to respond to an EMA if another family member had responded to an EMA in the surrounding 30-minute time interval.
The results from the second model indicate that time of day and whether other family members had also answered an EMA were significant predictors of compliance to time-triggered EMAs (Table 7). Participants were 40% less likely (OR 0.60, 95% CI 0.42-0.85) to respond to a time-triggered EMA in the afternoon and 47% less likely (OR 0.53, 95% CI 0.38-0.74) to respond to a time-triggered EMA in the evening than in the morning (reference group). Participants were approximately 2 times as likely (OR 2.07, 95% CI 1.66-2.58) to respond to a time-triggered EMA if another family member had responded to any EMA in the surrounding 30-minute time interval.
Results from the third model indicate that weekend status and deployment day were significant predictors of compliance to eating event-triggered EMAs (Table 7). Participants were 2.4 times as likely (OR 2.40, 95% CI 1.25-4.91) to respond to an eating event-triggered EMA on the weekend than on a weekday. Participants were 8% less likely (OR 0.92, 95% CI 0.86-0.97) to respond to an eating event-triggered EMA for every 1-day increase in deployment day.
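The percentages quoted alongside each OR follow from a simple conversion, sketched below in Python. The function names are hypothetical and the code is illustrative rather than part of the study's analysis. Strictly speaking, the conversion describes a change in the odds of responding. The second function assumes that a per-day OR compounds multiplicatively across days, as it does in a logistic regression with a linear deployment-day term.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Convert an odds ratio into a percent change in odds
    relative to the reference group (negative = lower odds)."""
    return (odds_ratio - 1.0) * 100.0


def cumulative_odds_ratio(per_day_or: float, days: int) -> float:
    """Odds ratio after `days` one-day increases, assuming the
    per-day OR compounds multiplicatively."""
    return per_day_or ** days


# ORs reported in the text for the three models
for label, odds_ratio in [
    ("afternoon vs morning, all EMAs", 0.63),
    ("family member responded, all EMAs", 1.91),
    ("per deployment day, eating event-triggered EMAs", 0.92),
]:
    print(f"{label}: {percent_change_in_odds(odds_ratio):+.0f}% odds")
```

For instance, an OR of 0.63 corresponds to 37% lower odds, and an OR of 1.91 corresponds to 91% higher odds, matching the figures in the text.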

Smartwatch Algorithm Evaluation
At least one eating event was automatically detected during the deployment for 46 participants. This subsample (ie, the analytic sample for aim 2A) did not significantly differ from the enrolled sample (N=74) by age, gender, or parent role (P>.05; Table 2).
A total of 461 eating events were automatically detected using the smartwatch algorithm across these 46 participants. Participants responded to 85.7% (395/461) of the corresponding eating event-triggered EMAs. Participants confirmed that 76.5% (302/395) of the detected events were true eating events (ie, true positives) and 23.5% (93/395) were not true eating events (ie, false positives). For approximately one-third of these false positives, participants reported that they were using their phones at the time. The calculated precision measure, that is, the number of true positives divided by the sum of true positives and false positives, was 0.77.
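As a worked check of these figures, the short Python sketch below recomputes the EMA response rate and the precision from the reported counts, using the standard definition of precision as true positives divided by all confirmed or rejected detections (true positives plus false positives). The variable and function names are illustrative assumptions.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("precision is undefined with no detections")
    return true_positives / total


detected_events = 461   # eating events detected by the smartwatch
answered_emas = 395     # eating event-triggered EMAs that were answered
true_positives = 302    # detections confirmed as eating
false_positives = answered_emas - true_positives  # detections rejected

response_rate = answered_emas / detected_events
print(f"response rate: {response_rate:.3f}")
print(f"precision: {precision(true_positives, false_positives):.3f}")
```

Only answered EMAs enter the precision calculation, so the 66 detections without a response contribute to neither the numerator nor the denominator.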

Differences in Eating Event Detection
At least three eating event-triggered EMAs were received by 36 participants. This subsample (ie, the analytic sample for aim 2B) did not significantly differ from the enrolled sample (N=74) by age, gender, or parent role (P>.05; Table 2). For this subsample, the average individual-level proportion of correctly detected eating events (true positives / total number of detected eating events) was 78.5% (SD 19%; range 30%-100%). In all, 72% (26/36) of the analytic sample had at least one falsely detected eating event (false positive).
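The individual-level summary described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name, the input layout (one boolean per answered eating event-triggered EMA), and the example data are hypothetical, and the threshold is applied to answered EMAs for simplicity.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize_detection(responses: Dict[str, List[bool]],
                        min_events: int = 3) -> dict:
    """Summarize per-participant proportions of confirmed eating events.

    `responses` maps a participant ID to one boolean per answered EMA:
    True = confirmed eating (true positive), False = false positive.
    """
    eligible = {pid: r for pid, r in responses.items()
                if len(r) >= min_events}
    proportions = [sum(r) / len(r) for r in eligible.values()]
    return {
        "n_participants": len(eligible),
        "mean_proportion": mean(proportions) if proportions else None,
        # The sample SD needs at least 2 participants
        "sd_proportion": stdev(proportions) if len(proportions) > 1 else None,
        "n_with_false_positive": sum(
            1 for r in eligible.values() if not all(r)
        ),
    }


# Hypothetical data, not taken from the study
example = {
    "p1": [True, True, True, False],
    "p2": [True, True, True],
    "p3": [True, False],  # fewer than 3 EMAs, so excluded
}
summary = summarize_detection(example)
```

Averaging per-participant proportions weights each participant equally, which is why this summary (78.5%) can differ from the pooled proportion of confirmed events (76.5%).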

Discussion
The M2FED study sought a dramatically different mobile health (mHealth) approach to obesity prevention and intervention by not focusing directly on diet and activity, but rather on family eating dynamics. An in-home sensor system was developed and deployed to monitor family eating dynamics in real time and context.

Evaluating EMA Compliance
After applying our customized participation algorithm, we found that both individual- and family-level compliance rates to the EMA protocols of the study were relatively high (both greater than the recommended 80%) [24]. Compliance was significantly higher in the mornings overall and higher on the weekends for eating event-triggered EMAs, which supported the informal feedback we received from participants that they were more likely to participate (ie, respond to EMAs and wear the smartwatch) when they did not need to go to work or school (typically the weekend days). We also saw that overall compliance decreased as the 2-week study went on, most likely attributable to participant fatigue.
One particularly interesting finding was that participants were significantly more likely to answer an EMA if another family member had answered an EMA in a similar time frame. A similar finding was reported by Dzubur et al [41], in which mother-child dyads were more likely to comply with prompts when they were together. Although the overarching aims of the M2FED study were to measure the social influence of family members on eating behavior, this finding also indicates that social influence came into play in other parts of the study as well. Drawing from the social psychology field, several social mechanisms could partially explain these findings. For instance, an expectation could have been set early on in particular families to answer the EMA prompts, thus establishing a social norm for EMA compliance [52,53]. Similarly, some individuals may have been inclined to answer EMA prompts to conform to the behavior of other family members around the same time [52,53], especially considering that family members received their time-triggered EMAs at approximately the same time as each other.
Studies have used EMA to measure various dietary outcomes, including frequency of food intake, intake of specific types of foods (eg, low glycemic index foods), and energy intake [25]. It has been suggested in a recent systematic review of mobile ecological momentary diet assessment methods that EMA has the potential to be a novel dietary assessment method, both on its own and as a supplement to other mHealth technologies [25]. The use of EMA to assess dietary intake and eating behavior provides some key advantages, namely, the reduction of participant burden and recall bias and the maximization of ecological validity [25]. Taken together with the findings from Dzubur et al [41] and Schembre et al [25], our findings suggest that EMA can be used to sufficiently supplement automatic dietary assessment (ADA) approaches and may be a particularly useful approach for leveraging social relations and maintaining compliance in dyad- and group-based EMA studies.

Evaluating ADA
Various technologies have been used to passively measure eating activity in naturalistic settings over long periods with minimal user interaction. One of the most popular technologies for assessing eating activity in the field is the wrist-worn smartwatch or accelerometer [23,27]. The performance of automatic, wearable-based, in-field eating detection approaches to date has been reviewed by Bell et al [27]. The smartwatch used in the M2FED study performed on par with other in-field devices, although comparability is difficult owing to the wide and varying metrics used by other papers [27]. Although some wearable devices included in this review performed very well, the duration of the free-living deployment was 1 day (approximately 24 hours) or shorter for more than half of the studies, and another one-third were 1 week in length or shorter [27].
Overall, 3 studies had durations of at least 2 weeks [34,54,55], 67% (2/3) of which had sample sizes of only 1 participant each. Therefore, the M2FED study is one of the first studies to extensively test the feasibility of deploying an ADA approach for a considerable amount of time (2 weeks) and with a relatively large sample size (>50 participants). Part of this success stems from the combined use of mobile devices (for EMA) and smartwatches, which were selected for the M2FED study to maximize long-term usability. Although other technologies have been able to perform better in the field, the usability of these technologies (electromyography electrodes, ear and neck sensors, wearable video cameras, etc) may be lower compared with wrist-worn devices because of the inconvenient location of sensor placement, the potential to interfere with the behavior of participants in real life [56], and the potential intrusiveness or discomfort caused by the sensor [57].
This study also demonstrates that EMA is a feasible tool for collecting ground-truth eating activity and thus evaluating the performance of wearable sensors in the field. Only 2 studies [34,35] included in the review by Bell et al [27] used a novel method for obtaining ground-truth eating activity in the wild, similar to the way EMA was used in the M2FED study. In a study by Ye et al [34], when an eating gesture was automatically detected via a wrist-worn sensor, participants were sent a short message on their smartwatch to confirm or reject in real time whether they were eating. Similarly, in a study by Gomes and Sousa [35], when drinking activity was detected via a wearable sensor, participants were sent an alert on their smartphone and could then confirm or reject whether they were drinking via EMA. Although EMA and similar self-report methods have their own limitations [23,58], they offer the ability to capture and validate ground-truth eating activity near the time of eating, thus improving research scalability and participant acceptability [25].
Another key feature of the M2FED study was the ability to capture intrapersonal (individual) and interpersonal (social) contexts with our combined event- and signal-contingent protocols. A systematic review noted that <7% of EMA studies assessing diet use a combined approach [59]. EMA is a powerful tool that can be used to validate automatically detected eating behavior in the field and to easily collect information about meaningful contexts; however, few studies have used this approach, and many still rely on paper-and-pen questionnaires to validate their findings [27].

Limitations and Strengths
The M2FED study design had notable limitations. First, our method of obtaining ground-truth eating was only deployed via eating event-triggered EMA after an eating event was detected by the smartwatch. Thus, we could only verify true positive and false positive eating events. The M2FED system was not designed to verify true negative or false negative eating events, which limited our ability to calculate common evaluation metrics (ie, accuracy and F1-score) and compare our results to other in-field studies described in the literature. Future research can build upon our study by implementing a verification of true negative and false negative eating events, via time-triggered EMA or other methods, to gain a better understanding of the strengths and weaknesses of such an event detection algorithm.
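The standard definitions of these metrics make the limitation concrete. Writing TP, FP, TN, and FN for the counts of true positive, false positive, true negative, and false negative eating events (notation introduced here for illustration), the usual formulas are:

```latex
\[
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN},
\]
\[
F_1 = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.
\]
```

Precision depends only on TP and FP, which the eating event-triggered EMAs could verify. Recall and the F1-score also require FN, and accuracy requires both FN and TN, neither of which could be verified in this design.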
Second, detected events were classified as false positives on the basis of participant self-report, which may be subject to social desirability bias (eg, a participant denying that an eating event occurred). This could bias the validity results. Third, we encountered various difficulties with the deployed technologies, including smartwatches (ie, limited battery), mobile phones (ie, limited battery and app crashes), and the Wi-Fi router (ie, wireless connection dropped). Although these challenges were anticipated and were addressed in a timely manner on all occasions, some data were lost during the data collection process.
Finally, as the scope of this study only covered in-home eating behavior, we observed relatively few eating event-triggered EMAs per person across the 2-week study (approximately 8 per person). However, the range was very wide, indicating that some participants consumed more meals inside their homes compared with others. Reasons often provided informally by participants included eating all or most meals at school or work, working early or late, traveling for work, and participating in after-school extracurricular activities.
On the other hand, this study also possesses several strengths. First, we recruited a large and ethnically diverse sample of families from Los Angeles. It has been previously noted that the lack of diverse samples in eating-related mHealth and EMA studies is a major limitation of past research [60]. Second, as noted above, the M2FED study facilitated one of the longest in-field deployments found in the literature so far. Most ADA research has been conducted in the laboratory. By deploying in the field, we are able to better understand real-life eating behavior (vs eating behavior in a laboratory) and gain a better understanding of the challenges that arise when deploying wearable sensors outside of the laboratory. Third, as the deployments spanned a 2-year period, we were able to iteratively improve our automatic eating event detection algorithm and then use the newest version in subsequent deployments. Finally, this study produced momentary measures of theoretical constructs as well as momentary measures of eating behaviors. The theoretical contribution that we can now make is to understand which constructs influence behavior, which behaviors influence various constructs, and which constructs play no role at all. We can also begin to understand the role of timing in these influences.

Future Directions
The mHealth field is converging toward the use of a combination of user-friendly devices to assess eating behavior in the wild (eg, mobile phones and wrist-worn devices) [27,31]. Implementing user-friendly technologies for in-field dietary assessment or eating behavior interventions offers at least two substantial advantages: people are generally familiar with them [31] and may be willing to use them for longer periods compared with more intrusive devices. Although early studies experimented with less familiar, often not off-the-shelf technologies (eg, piezoelectric strain gauge sensors), most recent studies have opted for accelerometers and gyroscopes that are embedded within a wrist-worn smartwatch [27]. Furthermore, the combination of a wrist-worn smartwatch to automatically detect eating and a mobile or wearable device to capture ground-truth eating has been featured in a few studies published in the past year [61][62][63]. This approach is becoming more common, and these types of devices offer advantages for the user (participant) and make the use of mHealth technologies more accessible to nonengineering behavioral researchers. However, a number of related challenges have emerged. Future research will need to address comparability between newer technology-assisted measures and more traditional self-report measures of eating [64], as well as among similar technology-assisted measures [27].
These user-friendly technologies also allow for passive measurement or low-effort reporting of various contexts and environments with relative ease. For example, fine-grained real-time GPS data can be scraped from both mobile devices and smartwatches to determine an individual's location and potentially assess the external influences on behavior [65,66]. Similarly, the social environment can be gleaned from wearable cameras [67], self-report EMA [68], or proximity Bluetooth sensors [69].
The ability to determine one's context or environment is a necessary component of ecological momentary interventions [70] or just-in-time interventions [71]. These types of intervention designs aim to provide the right amount of support at the right time and in the right context to promote behavior change [71][72][73]. These types of designs are well-suited for and offer unique opportunities for family-based settings [74]. They offer the ability to intervene in children and adolescents and can be designed to target the behavior of multiple family members at once [74]. As family members share genetic, environmental, and behavioral risks, family units are especially important targets for intervention and prevention [75] and have the potential to halt the intergenerational transmission of obesity and other chronic diseases.

Conclusions
This paper demonstrates that EMA is a feasible tool to collect ground-truth eating activity and thus evaluate the performance of wearable sensors in the field. The combination of a wrist-worn smartwatch to automatically detect eating and a mobile or wearable device to capture ground-truth eating activity offers key advantages for the user (participant) and makes the use of mHealth technologies more accessible to nonengineering behavioral researchers.