An idiographic study into physiology, alcohol craving and lapses during one hundred days of daily life monitoring

Highlights
• In an intensive 100-day study, considerable intra- and interindividual differences were found in the physiology and psychological craving of people with alcohol use disorder.
• For one third of the participants, heightened heart rate was associated with high craving.
• For most participants who reported lapses, lapses co-occurred with craving at least 92% of the time.
• During treatment, ambulatory physiological data can support the detection and discussion of possible high-risk craving situations in innovative and reliable ways.

The present study showed the feasibility of, and paved the way for, future intensive longitudinal designs that integrate physiological, psychological and contextual factors during the challenging and lengthy recovery from addiction.

Problem description
Alcohol craving, the urge to use alcohol (Martinotti et al., 2013), is viewed as a highly challenging obstacle to recovery from dependence (Lowman, Hunt, Litten, & Drummond, 2000). Identifying challenging moments of craving, by finding immediate precursors of craving and constructs associated with it, provides an opportunity to develop treatments that facilitate recovery. Making patients in treatment for alcohol use disorder (AUD) aware of high-risk situations in time may prompt them to mobilize their coping resources (Rohsenow & Monti, 1999). After years of drinking, these AUD patients have become oversensitized to such high-risk situations, which evoke both physiological (Quintana et al., 2013) and psychological (Ooteman, Koeter, Verheul, Schippers, & Brink, 2006) responses that may lead to lapsing. Lapses are defined as temporary drinking that violates the abstinence goal (Larimer et al., 1999). Unfortunately, many patients do not recognize such early responses in time and fail to take the precautions necessary to prevent lapsing (Baker et al., 2004). To date, little research outside the lab has tried to identify these critical moments (Kuerbis et al., 2020; Witteman et al., 2015). Moreover, critical moments might be highly individualized (Drummond, 2001; Kavanagh et al., 2013). Current alcohol models are based on and tested with aggregated data, which do not allow inferences about how individuals experience craving over time (Fisher et al., 2018). Therefore, identifying individualized moments of high craving in a person's natural environment is an important step toward actively supporting a person in achieving long-term abstinence.

Craving and (Re)lapse
There is a debate about the definition and role of craving within alcohol addiction models (Alayan et al., 2018; Kavanagh et al., 2013). Most scholars (for a recent review see van Lier et al., 2018) posit a causal role for craving (Baker et al., 2004; Larimer et al., 1999; Reis, 2012; Robinson & Berridge, 1993; Tiffany & Conklin, 2000; Verheul et al., 1999), with a few notable exceptions (e.g. Cox & Klinger, 1988). Some studies find a significant relation between craving and relapse (DeMartini et al., 2020; Higley et al., 2011; Miller et al., 1996; Waters et al., 2020), whereas multiple other studies do not (Cooney et al., 1997; Holt et al., 2012; Krahn et al., 2005). Cooney et al. (1997) hypothesize that craving may occur in only a subset of patients, which could explain why the relation between craving and relapse is absent in some AUD patients and therefore weak in cross-sectional studies. Moreover, relapse is often treated as a dichotomous construct within a study: the amount of craving reported is related to whether or not a participant relapses at any point during the study. Finally, the definition of relapse in these studies is very broad, ranging from any amount of drinking to drinking large amounts within specific periods; for an overview see Maisto et al. (2016). Consequently, the relation between craving and actual relapse in empirical studies remains complicated (Waters et al., 2020). Serre et al. (2015) argue that although craving is believed to play a major role in the process of (re)lapse, this association remains poorly understood because of the time-limited nature of craving and the retrospective reporting bias present in many studies. Therefore, a significant next step would be to investigate the relationship between craving and lapses within persons over a longer timeframe, making it possible to examine how often individuals can or cannot resist different levels of craving in daily life.

Craving and physiology
If heightened craving is predictive of lapses within persons, measuring craving continuously would clearly open novel avenues for preventing (re)lapse, for example by helping patients to use their coping skills or call for help at critical moments of craving (Rohsenow & Monti, 1999). Currently, craving is most often measured subjectively using a questionnaire, an interview (Cooney et al., 1997; DeMartini et al., 2020; Higley et al., 2011; Miller et al., 1996) or a daily diary (Holt et al., 2012b; Krahn et al., 2005; Waters et al., 2020). However, relying on self-reports in treatment or research for extensive and longer timeframes (i.e. more than two weeks) is burdensome and, because of retrospective bias, undesirable (Shiffman, 2009). Therefore, Wray, Merrill and Monti (2014) propose to substitute self-reports with physiological measures. In particular, electrodermal activity (EDA) and cardiovascular activity (CVA) are physiological measures frequently used to investigate the relation with craving in laboratory settings (Ooteman et al., 2006). Rosenberg (2009) showed that in a laboratory setting heightened physiological responses (heart rate and sweating) and heightened self-reported craving co-occur. CVA indicates to what extent the autonomic nervous system is responding to changing situational demands (Appelhans & Luecken, 2006). Patients with substance-related disorders might therefore benefit from direct (bio)feedback of CVA as an early warning of relapse. In addition, EDA provides a measure mostly associated with the activity of the sympathetic nervous system. In a lab setting, this parameter has been shown to differentiate between low- and high-risk individuals with respect to substance abuse (Taylor, 2004).
There are multiple perspectives on the role of physiology in addiction. In the past, physiology was seen solely as a (conditioned) withdrawal symptom (Skinner & Aubin, 2010). More recently, physiology has often been associated with cue reactivity, a direct response to an alcohol stimulus (Reynolds & Monti, 2013; Witteman et al., 2015), in combination with a craving response. The idea of general response coherence, the co-occurrence of cognitive and physiological responses to certain cues or high-risk situations, presumes (from an evolutionary point of view) that cognitive states prepare a person to take action when the current surroundings require it, whether those surroundings pose a threat or an opportunity (Evers et al., 2014; Kuppens et al., 2010). However, Carter and Tiffany (1999) performed a meta-analysis on this general response coherence, covering studies in which both physiology and self-reported craving were measured after an alcohol cue, and found only a moderate correlation of 0.38 between physiology and self-reported craving. Since then, multiple explanations have been offered for this moderate relation. Baker et al. (2004) suggested that an individual might only experience craving once it surpasses a certain threshold. Ooteman et al. (2006) argued that concordance between physiological cue reactivity and craving may be present only in a subgroup of alcohol-dependent persons who are sensitive to their bodily reactions. Drummond (2001) hypothesized that craving has a more temporal character and that its interaction with physiology is probably too complex to capture with isolated laboratory-based cues. In real life, stimuli or precursors of craving are less clear-cut and more personal, which makes it more difficult to determine the moment of the cue and the response that follows. In line with Ooteman et al. (2006), Kavanagh et al. (2013) proposed measuring craving at least daily and during situations of high risk for drinking, together with craving's impact on a person's situation, to better understand the relationship between cognitive craving and physiology.
To capture the temporal and personal character of craving, longitudinal ecological momentary assessment (EMA; Shiffman, Stone, & Hufford, 2008) has been proposed. EMA methods are designed to address personal differences between people and to collect data on behavior, thoughts and feelings as they occur (Shiffman, 2009). EMA thereby captures within-person fluctuations and counters retrospective bias, the errors participants are believed to make when answering questions from memory (Shiffman, 2009). EMA research allows researchers to draw conclusions about both the within-person fluctuations of craving over time and between-person differences (given a large enough sample) (Drummond, 2001). However, alcohol craving EMA research is currently limited in scope, as mostly only cognitive and background variables are included; the physiological components of craving have so far been explored only in lab studies. A meta-analysis (Serre et al., 2015) included 15 alcohol-related EMA craving studies, none of which measured parameters related to the physiological side of addiction. Therefore, insights into the temporal within-person relation between craving and physiology outside the lab are lacking.

Craving and influential contextual variables
EMA research takes place outside the lab, in the real world, where stimuli or precursors of craving or physiological responses are more complex (Drummond, 2001). In a literature review (van Lier et al., 2018), we explored potential evidence-based emotional, cognitive and contextual precursors of craving or relapse. This review showed that negative affect, stress and social situations are relevant factors preceding craving (or relapse), in agreement with EMA studies that found an effect of negative affect and positive social experiences on craving (Zheng et al., 2015). Additionally, the meta-analysis of Serre et al. (2015) identified four EMA studies on the effect of negative affect on craving, three of which found a positive correlation. Furthermore, Shiffman, Paty, Gnys, Kassel and Hickcox (1996) described two factors that could further explain the influence of social situations as trigger settings for craving: whether alcohol is available and whether alcohol is permitted in social situations are important contextual factors for alcohol craving in daily life. Cooney, Litt, Morse, Bauer, and Gaupp (1997) found that alcohol craving and nicotine craving are highly related in persons with both addictions. Finally, Rohsenow and Monti (1999) describe that craving is not directly related to relapse, but that this relation is mediated by a person's own belief in the effectiveness of their coping skills.

Current study
Current advances in wearable and smartphone technology provide novel opportunities for the detection of personalized situations with a heightened risk of (re)lapse, by enabling continuous tracking of fluctuations in psychological and physiological parameters (e.g., Intille, 2012). The current study explores the association between physiological measures and psychological (emotional and cognitive) craving, and the association of craving with (re)lapse, by monitoring multiple individuals with alcohol addiction over a long period (100 days). During these hundred days, the CVA and EDA of each participant were measured with a wearable device, in combination with self-reported variables collected on a mobile device, including craving, (re)lapses, and contextual variables related to the social situation.
In summary, the study had two primary aims: to determine (1) the within-person association between self-reported craving and relapses, and (2) the within-person association between heightened physiological activity and heightened self-reported craving during one hundred days of monitoring people trying to recover from AUD in daily life. The secondary aim was to study whether the association between physiology and craving is moderated by contextual variables.

Materials and methods
This study was an observational study with an Intensive Repeated and Continuous Measures in Naturalistic Settings Case-study design (Moskowitz et al., 2009). Participants were monitored with a wearable bio-sensor (E4 wristband of Empatica) and answered multiple questions every three hours on a smartphone app. This study was approved by the Medical Ethical Committee Twente (registration number: NL58392.044.16).

Participants
Participants were recruited from the pool of AUD patients in the online (alcoholdebaas.nl) and face-to-face outpatient treatment of an addiction care facility in the Netherlands. At the beginning of treatment, all patients were assessed on the type and severity of substance use disorder with the Substance Abuse Module of the Composite International Diagnostic Interview (Compton et al., 1996). Recruitment of AUD patients ran from September 2016 to March 2017. Participants were asked to start participating once they had set their main treatment goal (abstinence or less drinking), which was after approximately six weeks of treatment, since they were expected to start experiencing craving from then on. We were interested in craving in the context of a clear abstinence goal. With such a goal, drinking can be defined as a lapse; without it, lapses cannot be meaningfully defined. Additionally, without this goal craving is likely to be less prominent and urgent: without an explicit commitment to stop or severely cut back on drinking, there is a much smaller psychological barrier to drinking, and less associated craving. Before setting this treatment goal, participants were likely to drink without much craving, since they were not trying to abstain from alcohol. Therefore, including participants before the six-week mark was expected to yield little or no craving and unrepresentative drinking moments. Participants completed an informed consent form prior to inclusion in the study. Ten participants were included in the study; one additional participant dropped out within a week due to difficulties using the technology. Six men and four women participated, with an average age of 40 years (SD = 11). The inclusion and exclusion criteria can be found in Appendix A.

Study design
Participants carried their mobile phone throughout the day, and the phone prompted them for assessments at set times (time-contingent design; see Fig. 1). In a pilot study with students, the time-contingent design was found to be the least burdensome for this set of questions (van Lier et al., 2017). At each set time, participants were asked to answer a short set of questions. These questions were discussed with four individuals diagnosed with AUD to reflect on measurement reactivity (Shiffman, 2009), specifically the possibility that the phrasing of a question itself induces additional craving. Further adjustments to the questions were made together with these experience experts. Additional questions about craving moments and alcohol use were offered at the end of the day (daily diary). Administration via mobile phone enhanced ecological validity, as the data were collected in real time and in the participants' natural environment (Bolger & Laurenceau, 2013).
In addition to the self-report questions, participants were asked to wear a biosensor wristband that measured electrodermal activity (EDA) and cardiovascular activity (CVA). According to Carter and Tiffany (1999), both measures are related to cognitive craving. Participants had to turn the E4 wristband on when waking up, and download the data and charge the wristband during the night. The pilot study (van Lier et al., 2017) also showed that the usability of the wristband was high, but that wear time varied from occasional (a few hours a day) to regular (every day for five or more hours); therefore, micro incentives (explained further below) were added (Musthag et al., 2011).
The monitoring period lasted 100 days, since the objective of the study was to capture craving and lapses. In a review of 20 studies, Kirshenbaum et al. (2009) showed a decelerating relapse risk of 60% from initial abstinence (time point zero) to 100 days, after which the hazard of relapse declines to nearly zero. One might therefore conclude that any patient who has reached this milestone likely also has the ability to remain abstinent for one year or longer, so monitoring for longer than 100 days would probably not add new information. Fig. 1 shows the complete design of the study.

Apparatus
The E4 wristband of Empatica (see Fig. 2) was used to collect the physiological measures: electrodermal activity (EDA) and cardiovascular activity (CVA). Acceleration was also collected with the E4 wristband, to correct the EDA and CVA data for movement when necessary. In a previous study (van Lier et al., 2020), we performed a validation experiment comparing the E4 with a Thought Technology T7500M sensor acquired with a ProComp Infinity unit, which measures EDA on the fingers and CVA on the wrist. The fingers are often regarded as the gold-standard location for EDA measurement. The comparison showed that the E4 is valid for the parameters instantaneous heart rate and SD during a high-stress event, and for the total amplitude of skin conductance responses only when studying strong sustained stressors.
In a pilot study (Enewoldsen et al., 2016), we reviewed the quality of the E4 data by following eight participants for a week. Noise analysis (Taylor et al., 2015) showed that 90% of the EDA data was artifact-free. Participants showed between 0 and 24 skin conductance responses per minute, in line with prior findings (Boucsein et al., 2012). Around 20% of the blood volume pulse (BVP) data from this pilot study consisted of unrealistic values plus artifacts, detected through visual inspection (the exact proportion of artifact-free data could not be determined).

Procedure
In the start meeting, demographic variables were collected, and in the concluding interview the experiment was evaluated. As mentioned before, participants received a questionnaire at the beginning and end of each day, multiple questionnaires during the day, and one at the end of each week.
Time-contingent design
Alcohol ecological momentary assessment studies use between two and eight questionnaires a day (Collins et al., 1998; Cooney et al., 2009; Helzer et al., 2006; Holt et al., 2012a; Krahn et al., 2005; Litt et al., 2009; Miranda et al., 2014; Ray, 2013; Tidey et al., 2008; Wang et al., 2014). However, most of these studies use four or five predetermined moments, combined with an event-contingent design in which participants are asked to report craving moments whenever they occur. Four to five times a day is in line with the Handbook of Research Methods for Studying Daily Life (Mehl & Conner, 2012), which gives a range of 4 to 10, with 6 as the norm. Shiffman (2009a) showed that AUD patients might have alternative daily rhythms, and that it is therefore best to monitor them whenever their phone is switched on. We did not assume that participants would turn off their phones; instead, all data missing at night (or structurally missing during the day) were categorized as 'sleeping'. Therefore, the questionnaires during the day were administered every three hours around the clock, at 7, 10, 13, 16, 19, 22, 1 and 4 o'clock. Participants had one hour to respond to each questionnaire. Each finished questionnaire earned a cumulative micro incentive, up to a maximum of 1 euro a day. Musthag, Raij, Ganesan, Kumar and Shiffman (2011) showed that micro-incentive studies are low cost while ensuring high compliance, good data quality, and fewer retention issues. With a "normal" sleeping rhythm, participants were expected to answer 5 questionnaires a day (see Fig. 2).
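As a rough illustration of how the three-hourly schedule translates into an expected number of daily questionnaires, the sketch below lists the eight prompt times and counts those that fall within a waking window. The wake and sleep hours are hypothetical, and the helper `prompts_while_awake` is an illustrative simplification: a prompt counts if its hour lies within the waking window, and the one-hour response window is ignored.

```python
# Prompt hours of the time-contingent design (every three hours, around the clock).
PROMPT_HOURS = [7, 10, 13, 16, 19, 22, 1, 4]


def prompts_while_awake(wake_hour, sleep_hour, prompt_hours=PROMPT_HOURS):
    """Return the prompt hours in [wake_hour, sleep_hour), sorted.

    Hypothetical helper: assumes waking and sleeping on the same calendar day
    (wake_hour < sleep_hour <= 24) and ignores the one-hour response window.
    """
    return sorted(h for h in prompt_hours if wake_hour <= h < sleep_hour)


# A 16-hour waking day from 07:00 to 23:00 covers six prompts ...
print(prompts_while_awake(7, 23))   # [7, 10, 13, 16, 19, 22]
# ... whereas a day from 08:00 to midnight covers five.
print(prompts_while_awake(8, 24))   # [10, 13, 16, 19, 22]
```

The difference between six and five prompts depends only on whether the 7 o'clock prompt falls inside the waking window, which is why five is the more conservative daily expectation.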
To lower the burden (van Lier et al., 2017), some additional questions about craving moments were administered at the end of the day. Moments during the day at which reported craving was higher than 2 (on the 0-10 scale) were presented back to the participant, who was then asked to provide additional information. For every such high-craving event (>2), a 4G scheme (Events, Thoughts, Feelings and Behaviour; Shoda & LeeTiernan, 2002) was administered at the end of the day, together with questions about social activities during the entire day.

Measures
Lapse
There were two moments at which participants could indicate that they had lapsed. The first was at the end of every day, when participants were asked what they did after a craving moment; one of the options was drinking alcohol (thus a lapse). The second was in the morning, when drinking moments from the prior day could be registered, since a participant might not have answered the previous day's questions because of being intoxicated or already asleep. When a participant reported drinking, the number of alcoholic units was also asked, but this number was not used in the study.
Self-reported craving
Ooteman, Koeter, Verheul, Schippers, and Brink (2006) argued that a single-item measure is highly correlated with more extensive measures when both focus on the current state. Since self-reported craving was administered every three hours, asking multiple craving items each time would have increased the burden of the study unacceptably. Craving was measured on a 0-10 Likert scale, with 0 indicating no craving and 10 high craving, by asking: "How strong is your craving currently?" (for the Dutch question see Fig. 3). Before the reactivity reflection session the question was: "How strong is your alcohol craving currently?" However, the experience experts advocated removing the word alcohol from the questions, as they perceived that it was likely to induce further craving.
Total Amplitude - EDA
The E4 wristband uses electrodes to collect skin conductance (SC), from which EDA measures are derived. Total amplitude was selected as the EDA parameter and mean HR as the CVA parameter. Total amplitude was selected because it combines the number of skin conductance responses (SCRs) and their amplitude, two commonly used measures of EDA. Total amplitude was retrieved from the SC through classical trough-to-peak (TTP) analysis, with the threshold for an SCR amplitude set at 0.01 µS (Boucsein, 2012). The data were analyzed with Ledalab using the program's default filtering and smoothing settings (Benedek & Kaernbach, 2010). The amplitude of an SCR was determined as the difference in conductance between response onset and response peak, and the amplitudes were summed to obtain the total amplitude per minute. The total amplitude was therefore a function of both the number of SCRs and their amplitudes. The total amplitude per minute was then averaged over the three hours before the end of each questionnaire's response window.
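The per-minute total amplitude can be illustrated with a simplified trough-to-peak scan. This is a minimal sketch, not Ledalab's implementation: it applies no filtering or smoothing, and it assumes a 4 Hz EDA sampling rate (the E4's nominal EDA rate, which the text does not state). The function names are hypothetical; only the 0.01 µS threshold, the per-minute summation and the three-hour averaging come from the text.

```python
import numpy as np


def ttp_amplitudes(sc, min_amp=0.01):
    """Trough-to-peak SCR amplitudes (in µS) within one skin conductance segment.

    A trough is a sample where the signal stops falling and starts rising;
    the paired peak is the next sample where it stops rising. Responses
    smaller than min_amp (0.01 µS, as in the text) are discarded.
    """
    sc = np.asarray(sc, dtype=float)
    d = np.diff(sc)
    amps, trough = [], None
    for i in range(1, len(d)):
        if d[i - 1] <= 0 < d[i]:                            # local minimum at sample i
            trough = i
        elif d[i - 1] > 0 >= d[i] and trough is not None:   # local maximum at sample i
            amp = sc[i] - sc[trough]
            if amp >= min_amp:
                amps.append(amp)
            trough = None
    return amps


def total_amplitude_per_minute(sc, fs=4.0):
    """Sum of SCR amplitudes in each complete minute of signal (fs in Hz, assumed)."""
    n = int(60 * fs)
    return [sum(ttp_amplitudes(sc[k:k + n])) for k in range(0, len(sc) - n + 1, n)]


def three_hour_mean(per_minute):
    """Average of the last 180 per-minute values (the three-hour window)."""
    return float(np.mean(per_minute[-180:]))


# Two minutes of flat signal with one 0.1 µS response in the first minute.
signal = np.ones(480)
signal[10] = 1.1
print(total_amplitude_per_minute(signal))
```

Summing the amplitudes, rather than counting responses, is what makes the measure sensitive to both the number and the size of SCRs.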
Mean HR - CVA
The E4 wristband uses photoplethysmography (PPG) to collect blood volume pulse (BVP), from which CVA measures are derived. Mean HR was selected because it is a familiar and relevant indicator of CVA; the SD and RMSSD require higher-quality data, which cannot be guaranteed in the wild (Enewoldsen et al., 2016). Instantaneous HR (in beats per minute) can be determined by dividing 60 (seconds) by the mean PP-interval per minute (in seconds). HR was used instead of the PP-interval because it is the better-known transformation of the PP-interval. Mean HR per minute was again averaged over the three hours before the end of each questionnaire's response window.
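The conversion from PP-intervals to mean HR, followed by the three-hour averaging, can be sketched as below. The sketch assumes that pulse-peak timestamps (in seconds) have already been extracted from the BVP signal; peak detection itself, and the E4's own processing, are not reproduced. The helper names `mean_hr_per_minute` and `window_mean` are hypothetical.

```python
import numpy as np


def mean_hr_per_minute(beat_times_s):
    """HR in bpm per minute, computed from pulse-peak times in seconds.

    Each PP-interval is assigned to the minute in which it ends; HR for that
    minute is 60 divided by the mean PP-interval (in seconds).
    """
    beats = np.asarray(beat_times_s, dtype=float)
    pp = np.diff(beats)                        # PP-intervals in seconds
    minute = (beats[1:] // 60).astype(int)     # minute index of each interval
    return {int(m): float(60.0 / pp[minute == m].mean()) for m in np.unique(minute)}


def window_mean(per_minute, end_minute, width=180):
    """Mean of the per-minute values in [end_minute - width, end_minute)."""
    vals = [v for m, v in per_minute.items() if end_minute - width <= m < end_minute]
    return float(np.mean(vals)) if vals else float("nan")


# A steady pulse every 0.75 s for two minutes corresponds to 80 bpm.
hr = mean_hr_per_minute(np.arange(0, 120, 0.75))
print(hr)                             # {0: 80.0, 1: 80.0}
print(window_mean(hr, end_minute=2))  # 80.0
```

Note that HR is 60 divided by the PP-interval, not the other way round: a PP-interval of 0.75 s gives 60 / 0.75 = 80 bpm.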
Movement
The E4 wristband has an onboard MEMS-type 3-axis accelerometer that continuously measures the gravitational force (g) applied along each of the three spatial dimensions (x, y, and z). The range is limited to ±2 g and the data are sampled at 32 Hz. Force was determined from the three axes as the magnitude of the acceleration vector:
$$F = \sqrt{x^2 + y^2 + z^2}$$
From this force, the standard deviation was computed over the three hours before the end of each questionnaire's response window. The standard deviation was chosen over the mean because the time intervals in this study are long: mean force is expected to be fairly steady over hours, whereas the SD of force captures fluctuations in movement.
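A minimal sketch of the movement measure, assuming force is taken as the Euclidean magnitude of the three-axis acceleration and that the input holds one three-hour window of samples (at 32 Hz, 345,600 rows). The function name is hypothetical, and the population SD (NumPy's default `ddof=0`) is an assumption; the text does not say which SD estimator was used.

```python
import numpy as np


def force_sd(acc_xyz):
    """SD of the acceleration magnitude sqrt(x^2 + y^2 + z^2) over one window.

    acc_xyz: array-like of shape (n_samples, 3), in g.
    Uses the population SD (ddof=0), which is an assumption.
    """
    acc = np.asarray(acc_xyz, dtype=float)
    force = np.sqrt((acc ** 2).sum(axis=1))    # magnitude per sample
    return float(force.std())


# A wrist lying still registers only gravity (1 g), so the SD is zero.
print(force_sd([[0.0, 0.0, 1.0]] * 10))        # 0.0
```

Because the magnitude is orientation-independent, rotating the wrist without moving it does not inflate the measure, whereas alternating periods of activity and rest do.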
Other self-reported measures
Several other self-reported measures were collected, all as single-item constructs on a 0 to 10 Likert scale unless described otherwise: (1) negative affect, administered on a two-dimensional valence-arousal scale with valence (negative to positive) on the x-axis and arousal (low to high energy) on the y-axis; (2) stress; (3) social situation, divided into five possible social activities, namely "no social activity/work", "friend/family", "terrace/restaurant", "party" and "other", where other activities were hobby-related or religious; (4) whether alcohol was available (yes/no); (5) whether drinking was permitted (yes/no); (6) the person's own belief in the effectiveness of their coping skills (0 to 10); and (7) nicotine craving (0 to 10).

Descriptive statistics
The compliance rates, the number of craving reports above zero, the number of lapses and the use of craving medication are reported to give an overview of the data. Jones et al. (2018) found an average compliance rate of 70% in an alcohol-dependent sample. Stone and Shiffman (2002) warn that responses may not be representative of the true responses when compliance rates are below 80%. Furthermore, the correlation between compliance rate and lapses was computed to assess whether data might not be missing at random, since in an alcohol-dependent population missing data could be expected to be more likely near lapse moments (Stone & Shiffman, 2002). Cooney et al. (2009) showed that individuals with alcohol addiction report craving in only 8% of their assessments, which leads to imbalanced craving data. Imbalanced data are data in which one class is underrepresented (Ganganwar, 2012); here, craving is often zero and less often above zero. As dichotomization is suitable for analysing imbalanced data (Luque et al., 2019), both physiological measures and self-reported craving were recoded into binary variables; lapses were already binary. The data were dichotomized around each participant's personal mean, so that above-average craving is coded 1 and below-average craving 0. The personal mean as cut-off point was chosen by optimizing the Matthews correlation coefficient (MCC) over all participants: choosing a cut-off point higher or lower than the personal mean led to more false positives or more false negatives. The MCC, sensitivity, specificity and precision are measures derived from a confusion matrix and provide four metrics for evaluating the performance of a dichotomized model.
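The recoding around a personal mean can be sketched for one participant's series as follows. The handling of missed prompts (NaN, ignored when computing the mean) and the coding of values exactly equal to the mean as 0 are assumptions made for illustration; the function name is hypothetical.

```python
import numpy as np


def dichotomize_personal_mean(values):
    """Code values above the person's own mean as 1 and the rest as 0.

    NaNs (missed prompts) are ignored when computing the mean and stay NaN.
    Values exactly equal to the mean are coded 0 (an assumption).
    """
    x = np.asarray(values, dtype=float)
    mu = np.nanmean(x)
    out = np.where(x > mu, 1.0, 0.0)
    out[np.isnan(x)] = np.nan
    return out


# Hypothetical craving reports with one missed prompt: the personal mean is 1.0,
# so only the reports of 4 and 2 are coded as above-average craving.
craving = [0, 0, 0, 4, np.nan, 0, 2]
print(dichotomize_personal_mean(craving))
```

Because the cut-off is each person's own mean, a craving score of 2 can count as "high" for one participant and "low" for another, which fits the idiographic aim of the study.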
The MCC, sensitivity, specificity and precision were determined for physiology in the three hours "during" questionnaire completion (the three hours before the end of the questionnaire's response window) and "lagged", meaning the three hours preceding that timeframe of self-reported craving. Other timeframes were also explored, namely one minute, 15 min, 30 min and one hour, but these resulted in lower or similar cross-correlations with craving and are not reported further.
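The two physiology windows can be located relative to a prompt as sketched below, assuming the one-hour response window described in the Procedure starts at the prompt time. The date and the helper name `physiology_windows` are hypothetical.

```python
from datetime import datetime, timedelta

RESPONSE_WINDOW = timedelta(hours=1)   # participants had one hour to respond
SPAN = timedelta(hours=3)              # length of each physiology window


def physiology_windows(prompt_time):
    """Return the 'during' and 'lagged' windows as (start, end) tuples."""
    end = prompt_time + RESPONSE_WINDOW      # end of the response window
    during = (end - SPAN, end)               # three hours before that end
    lagged = (end - 2 * SPAN, end - SPAN)    # the three hours before 'during'
    return during, lagged


during, lagged = physiology_windows(datetime(2017, 1, 9, 13, 0))
print([t.strftime("%H:%M") for t in during])   # ['11:00', '14:00']
print([t.strftime("%H:%M") for t in lagged])   # ['08:00', '11:00']
```

Using full datetimes rather than clock hours keeps windows that cross midnight (for example around the 1 o'clock prompt) correct.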

Confusion matrix
The match between above-average craving and above-average physiology is called a True Positive (see Table 1), and a mismatch a False Negative or False Positive. A True Negative occurs when both craving and physiology are below average. From the numbers of true and false positives and true and false negatives in the confusion matrix, four measures of coherence can be determined: the MCC, sensitivity, specificity and precision.
The MCC is a measure of the quality of two-class classifications (Chicco, 2017), calculated as:
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
In the following formulas, physiology serves as the reference: a False Negative is a moment with above-average physiology but below-average craving, and a False Positive a moment with above-average craving but below-average physiology. Sensitivity is the proportion of moments with, for example, above-average heart rate at which craving was also above average (Swift et al., 2020):
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity is the proportion of moments with, for example, below-average heart rate at which craving was also below average (Swift et al., 2020):
$$\text{Specificity} = \frac{TN}{TN + FP}$$
Precision is the proportion of moments with, for example, above-average craving at which physiology was also above average (Deng et al., 2016):
$$\text{Precision} = \frac{TP}{TP + FP}$$
The MCC's outcomes are comparable to a correlation coefficient, with scores between −1 and +1, where +1 represents a perfect predictive relation between physiology and craving, 0 indicates random prediction, and −1 total disagreement between physiology and craving (when physiology is heightened, craving is low and vice versa). The acceptable level of sensitivity and specificity differs per discipline and depends on the context. Precision gives additional information on the ratio between True Positives and False Positives, which matters because the balance of the data has large implications for the outcome. Lalkhen and McCluskey (2008) note that 100% for both sensitivity and specificity is unrealistic in a practical context. For our study, low specificity would be especially problematic: a low proportion of correct non-events would mean that physiology is often below average when self-reported craving is actually above average. This would point towards two possible conclusions. First, heightened craving cannot be detected and possible high-risk situations are therefore missed, meaning that physiology measured with wearable technology is not suitable for detecting heightened craving. Second, if the wearable is believed to measure reliably, then the association between physiology and craving is questionable (for that person), since heightened craving does not co-occur with heightened physiology. In conclusion, both for clinical relevance and for empirical support of psycho-physiological concordance, specificity should be high (above 80%; Marôco, 2018).
However, when specificity is high, low sensitivity might be less problematic, since this would only indicate that heightened craving is predicted more often for a person than is relevant. This could have several plausible theoretical explanations. First, a person's physiology responds to more triggers than craving alone. EDA and CVA are both influenced by the sympathetic nervous system (SNS), which is responsible for the fight-or-flight response (Braithwaite, Watson, Jones, & Rowe, 2013) and is triggered by many situations, not only by craving. Therefore, in the next section several contextual psychological variables, such as stressful events, are included. Movement is another influence on CVA and EDA. To assess whether a correction for movement was needed, movement was measured with the wearable and its association with mean HR and total amplitude was inspected. If outliers of mean HR, that is, values above a certain movement threshold, were strongly related to movement alone and consistently unrelated to craving, these outliers were to be removed. Second, as persons diagnosed with AUD have been found to be reluctant to admit their craving (Shiffman, 2009), or may even fail to recognize their own craving (Baker et al., 2004), physiology may detect certain cognitive responses that remain unconscious.
Similarly, the MCC, sensitivity, specificity and precision were determined for self-reported craving and (re)lapse. Here, the number of false negatives was expected to be larger, because few or no lapses were expected per participant.
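The four coherence measures can be computed from two dichotomized series as sketched below. The sketch treats the first series as the reference (dichotomized physiology, or craving when relating craving to lapses) and the second as the prediction, so that FN means reference above average with the other series below, and FP the reverse; this cell assignment is an interpretation of the text, not a documented convention. Setting the MCC to 0 when a margin is empty follows a common convention. All names and data are hypothetical.

```python
import math


def confusion_counts(reference, predicted):
    """Count TP, FN, FP, TN for two equally long binary (0/1) sequences."""
    tp = fn = fp = tn = 0
    for r, p in zip(reference, predicted):
        if r and p:
            tp += 1          # both above average
        elif r:
            fn += 1          # reference above, other series below
        elif p:
            fp += 1          # other series above, reference below
        else:
            tn += 1          # both below average
    return tp, fn, fp, tn


def coherence_metrics(reference, predicted):
    tp, fn, fp, tn = confusion_counts(reference, predicted)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    nan = float("nan")
    return {
        # MCC is set to 0 when a margin is empty (a common convention).
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }


# Hypothetical dichotomized series for six questionnaire moments.
heart_rate_high = [1, 1, 0, 0, 1, 0]
craving_high = [1, 0, 0, 1, 1, 0]
print(coherence_metrics(heart_rate_high, craving_high))
# MCC = 1/3; sensitivity, specificity and precision are each 2/3
```

Because the MCC uses all four cells, it stays informative for imbalanced series where accuracy alone would be dominated by the many below-average moments.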

Decision tree
Conditional inference trees (CI trees) were fit to these dichotomized data. The aim was to explore which external factors, measured three hours prior, predicted the level (below average/above average) of self-reported craving. Thus, above or below average craving was predicted from the lagged self-reported external factors and the current and lagged physiological parameters. Self-reported parameters were only included as lagged variables, since we are interested in how well these parameters predict craving in addition to the physiological parameters. A CI tree is a decision tree algorithm, used here as a binary classifier, that bases each split on permutation tests in an attempt to distinguish between significant and insignificant improvements (see Sardá-Espinosa, Subbiah, & Bartz-Beielstein, 2017 for a further explanation of CI trees in health data). The final tree was built on the entire dataset, without dividing it into a training and a test set, in order to use as much information per participant as possible; the results are therefore exploratory. The tree was built on the following formula: Craving* ~ mean HR + mean HR L + Total Amplitude + Total Amplitude L + Stress L + Coping L + Valence L + Arousal L + Nicotine Craving L + Social setting L + Alcohol available L + Permitted L + friend/family L + terrace/restaurant L + party L, where L denotes lagged variables.
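The split criterion of a CI tree can be illustrated with a simplified Python sketch. It is not the full algorithm used by implementations such as `ctree` in the R package partykit, which use a linear test statistic with its conditional distribution and then search for the best cut point. The sketch only shows root-node variable selection in three steps:

- a permutation test of each predictor's association with above average craving;
- a Bonferroni adjustment over the predictors;
- a stop (null model) when no adjusted p-value falls below alpha.

The lag helper assumes that the previous prompt stands in for "three hours prior", which is an assumption about the prompt schedule. All names are hypothetical.

```python
import random
from statistics import mean


def lag_one_prompt(series):
    """Hypothetical helper: value at the previous prompt (None for the first prompt).

    Assumes prompts are roughly 3 h apart, so the previous prompt approximates
    'three hours prior'.
    """
    return [None] + list(series[:-1])


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Permutation p-value for the association between numeric x and binary y.

    Test statistic: absolute difference in the mean of x between the two classes.
    """
    def statistic(labels):
        above = [xi for xi, yi in zip(x, labels) if yi]
        below = [xi for xi, yi in zip(x, labels) if not yi]
        if not above or not below:
            return 0.0
        return abs(mean(above) - mean(below))

    observed = statistic(y)
    rng = random.Random(seed)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between x and y
        if statistic(labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_root_split(features, outcome, alpha=0.05, n_perm=2000):
    """Return (variable, adjusted p) for the root split, or None for the null model."""
    # Keep only registrations where every (lagged) predictor is available.
    rows = [i for i in range(len(outcome))
            if all(values[i] is not None for values in features.values())]
    y = [outcome[i] for i in rows]
    best_name, best_p = None, 1.0
    for name, values in features.items():
        x = [values[i] for i in rows]
        # Bonferroni adjustment over the number of candidate predictors.
        p = min(1.0, len(features) * permutation_pvalue(x, y, n_perm))
        if p < best_p:
            best_name, best_p = name, p
    return (best_name, best_p) if best_p < alpha else None
```

In a hypothetical series where stress at one prompt anticipates craving at the next, `select_root_split` returns the lagged stress variable. A predictor that carries no information yields `None`, which mirrors the null model (no craving) reported for most participants.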

Descriptive statistics
Table 2 presents the descriptive statistics of the participants' self-reported craving and relapse data.

Compliance rates
Because measurements were taken around the clock and participants were not expected to answer all questions, the exact compliance rate is somewhat difficult to determine. Assuming a participant is awake for 16 h, he or she can theoretically answer at most 6 time points a day. However, it is more realistic to assume five administration moments a day, which gives a target of 500 registrations over 100 days. Following this logic, compliance rates ranged from 13% to 82%, with an average of 66%. Stone and Shiffman (2002) warn about the representativeness of the sampling when the non-response is 20% or higher, especially when the data are expected to be not missing at random. However, this study was much longer than is typical (see for example Van Berkel et al., 2017), and compliance was difficult to maintain over this longer period. We included all participants with a compliance rate of at least 40%, meaning at least 200 data points per individual. Therefore, participants 4, 7 and 8 were excluded from the research. Additionally, there was a moderate positive correlation between compliance and (re)lapse: the more a participant (re)lapsed, the higher the compliance.
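The compliance arithmetic and the inclusion rule can be sketched in Python. The names are hypothetical. The target of five prompts a day over 100 days and the 40% threshold come from the text; the counts in the example are invented for illustration.

```python
DAYS = 100
PROMPTS_PER_DAY = 5  # realistic number of answered prompts per day, per the text
TARGET = DAYS * PROMPTS_PER_DAY  # 500 registrations


def compliance_rate(n_registrations):
    """Share of the 500 targeted registrations that a participant answered."""
    return n_registrations / TARGET


def included_participants(registrations, threshold=0.40):
    """IDs of participants at or above the 40% threshold (at least 200 registrations)."""
    return [pid for pid, n in registrations.items() if compliance_rate(n) >= threshold]


# Hypothetical counts, not the study's data.
print(included_participants({"A": 410, "B": 199, "C": 200}))  # ['A', 'C']
```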

Alcohol craving and lapses
The percentage of administered data points with craving above 0 differed between participants, ranging from 8% to 73% (mean = 36%). The mean craving was 1.45 (sd = 1.13), ranging from 0.4 to 3.87. Four participants reported no lapses at all. These participants did experience craving, but they were excluded from this first analysis because they had no lapses. They were included in the further analyses of the relation between craving and physiology. The other participants registered between 6 and 28 lapses each.

Confusion matrix
The confusion matrices were used to determine (1) the association between craving and lapses, and (2) the association between physiology (both HR and EDA) and craving.

Association between craving and relapse
Table 3 presents the MCC, sensitivity and specificity between craving and relapse, during and 3 h prior to a (re)lapse, for the individuals who did relapse.
The MCC for self-reported craving during relapse was between 0.19 and 0.90, indicating correlations varying from weak to very strong. In the 3 h prior to lapsing, the MCC was negligible to weak for all participants. Relapse events were few compared to the complete data, so the specificity for all participants was above 80% and the sensitivity below 30%. However, three of the five participants had a precision above 92%. This means that above average craving did not always co-occur with lapses, but if a lapse occurred, craving was almost always heightened at the same moment. For craving prior to lapses, this was the case for one or two persons (participant 3 and, to some extent, participant 1). This showed that in most individuals craving was not already heightened 3 h prior to a lapse. Figs. 4 and 5 show two visualizations of the data of participant 6. The first shows the association between lapses and concurrent craving, and the second the association between lapses and craving 3 h prior to these lapses. Participant 6 showed the highest MCC (0.90) for craving during lapses and the lowest (0.00) for craving prior to lapses. Other participants showed less clear patterns.
The figures show that participant 6 nearly always experienced heightened craving during lapses. In contrast, 3 h prior to these lapses craving was mostly absent and was high in only a few incidents. For the first lapse, craving was self-reported only during the lapse and not in the 3 h prior.

Association between mean HR and self-reported craving
The association between mean HR and self-reported craving was analysed by determining the MCC, sensitivity and specificity during craving and three hours prior to craving, and is shown in Table 4.
All participants showed a negligible to weak MCC between HR and craving, both during and prior to craving. Four of the seven participants showed high specificity during craving, between 88% and 94%. This means that when their HR was below average, they experienced above average craving in only approximately 10% of the registrations. In these cases the sensitivity was 30% or higher. This showed that craving almost never occurred without heightened HR, but HR can be heightened in the absence of craving. This was also reflected in the precision, with rates of 71% or higher. Fig. 6 shows participant 5, who had high specificity, and Fig. 7 shows participant 6, who had a lower specificity.

Table 2
Descriptive statistics of the craving and relapse data. The first column shows the participant number. The second shows the number of questionnaires answered by an individual and the percentage of 500 possible administrations (5 a day for 100 days). The third shows the number of above zero answers to the craving question, with the percentage relative to the number of registrations in the second column between brackets. The remaining columns indicate the number of lapses, the mean and standard deviation of the reported craving levels (possible range 0-10), and whether a participant used craving medication.

Three hours prior to the registration of craving, the MCC, precision, specificity and sensitivity are lower for most participants. Only two participants (2 and 9) have a specificity of 88% or higher. These participants have a moderate MCC, and only participant 9 had a strong precision.

Movement
The association of movement (standard deviation of force) with mean HR and craving showed no consistent outliers of mean HR related to movement. Therefore, no extra threshold based on the SD of force could be included to increase the sensitivity of mean HR for detecting craving. The details of this analysis can be found in Appendix B.

Association between total amplitude and craving
A similar MCC, sensitivity and specificity table was made for the coherence between self-reported craving and the Total Amplitude (TA) of the skin conductance responses identified in the EDA signal (see Table 5).
Almost all participants showed a negligible to weak MCC between TA and craving, both during and prior to craving. Only participants 9 and 10 showed high specificity with TA, both during and 3 h prior to an above average craving registration. Only for participant 10 (see Fig. 8) was the MCC with EDA prior to craving higher than during craving. Note that participant 10 had only 30 entries where both craving and EDA were measured successfully. Participant 9 had many data entries, but low sensitivity. The data of participant 9 are visualized in Fig. 9. Fig. 10 again shows an example of a participant with low specificity.

Movement
The association of movement (standard deviation of force) with Total Amplitude and craving also showed no consistent outliers of Total Amplitude related to movement. Therefore, no extra threshold based on the SD of force could be included to increase the sensitivity of Total Amplitude for detecting craving. The details of this analysis can be found in Appendix C.

Association between craving and both physiological and contextual variables
Table 3
MCC and sensitivity/specificity table of craving and lapses. The first column provides the participant number. The second and seventh show the number of lapses compared to the total number of registrations where both craving and lapses were registered. The third and eighth show the MCC of craving prior to and during a lapse. The fourth, fifth, ninth and tenth are the True Positives and False Positives, with the sensitivity and specificity between brackets; the True Positives and False Positives add up to the total number of lapses. The sixth and eleventh columns display the precision.

Fig. 4. Participant 6 with high compliance, MCC and multiple lapses; red dots are craving measured during a lapse. The x-axis represents time, the y-axis craving. Craving ranges from 0 to 10, with 0 being low craving and 10 high. In the plot every (re)lapse is represented by a red line. The craving values during relapse are colored red, if not missing.

Table 6 presents the results of the decision trees, in which contextual variables are added in an attempt to improve the prediction of craving. To allow comparison of the decision trees' MCC with the MCCs found using HR and TA, the MCCs of HR and TA are given in the third and fourth columns of the table. The MCC during craving is given, since this was almost always higher than the MCC found 3 h prior to craving. Only for participant 10 was the MCC with EDA prior to craving higher than during craving (0.85 instead of 0.84). For most participants a decision tree could not be found, meaning that no node could improve the prediction of craving compared to the null model. The null model is the largest-class model, which predicts that there was no craving. For most participants the MCC of the decision tree was lower than the MCC between craving and physiology (both HR and EDA). The exceptions were participants 6 and 9, for whom the MCC was higher than for HR or TA separately; for all other participants the MCC declined after including contextual variables. Furthermore, participants 6 and 9 had stress and (lagged) total amplitude as part of their decision tree. However, the precision for both was very low: their decision trees missed more craving incidents than they found. Multiple participants have a specificity above 80% and a sensitivity above 30%. However, the data are unbalanced, with low craving moments overrepresented. Consequently, more above average craving incidents are often missed than registered (low precision).
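A worked example shows why high specificity and moderate sensitivity can coexist with low precision in unbalanced data. The counts are hypothetical, not data from this study: 200 registrations with 25 above average craving moments. Following the convention of Table 6, True Positives and False Positives add up to the craving moments, giving TP = 10, FP = 15, FN = 20 and TN = 155:

$$
\begin{aligned}
\mathrm{Sensitivity} &= \frac{10}{10 + 20} \approx 0.33, &
\mathrm{Specificity} &= \frac{155}{155 + 15} \approx 0.91, \\
\mathrm{Precision} &= \frac{10}{10 + 15} = 0.40, &
\mathrm{MCC} &= \frac{10 \cdot 155 - 15 \cdot 20}{\sqrt{25 \cdot 30 \cdot 170 \cdot 175}} \approx 0.26.
\end{aligned}
$$

Here more craving moments are missed (FP = 15) than found (TP = 10), even though specificity exceeds 80% and sensitivity exceeds 30%.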

Discussion
The primary aim of the present ideographic study is twofold: determining (1) the within-person association between self-reported craving and relapses, and (2) the within-person association between heightened physiological activity and heightened self-reported craving during one hundred days of monitoring people trying to recover from AUD in daily life. The secondary aim is to study whether the association between physiology and craving is moderated by contextual variables.
The association between (re)lapses and self-reported craving measured at the same moment is strong for two of the five participants who relapsed. The other three participants had negligible to weak associations, both prior to and during relapse. HR has a negligible to weak association with concurrently measured heightened self-reported craving, and for HR three hours prior to craving all participants' correlations are smaller. For most participants the association with EDA is lower than with HR, both prior to and during craving; the exception is one participant. The association between physiology (both HR and EDA) and craving improves for two of the seven participants when contextual variables are added, with stress as the most consistent contributing factor; however, the precision is low. Below, we further discuss the results and their theoretical and practical implications for craving research and alcohol addiction treatment.

Fig. 5.
Participant 6 with high compliance, MCC and multiple relapses; red dots are craving measured 3 h prior to relapse. The x-axis represents time, the y-axis craving. Craving ranges from 0 to 10, with 0 being low craving and 10 high. In the plot every (re)lapse is represented by a red line. The craving value (3 h) prior to the lapse is colored red, if not missing. The black horizontal line is the mean craving.

Table 4
MCC and sensitivity/specificity table of mean HR and craving. The first column provides the participant number. The second and seventh show the number of above average HR measures compared to the total number of registrations where both craving was reported and HR was collected. The third and eighth show the MCC of HR prior to and during craving. The fourth, fifth, ninth and tenth are the True Positives and False Positives, with the sensitivity and specificity between brackets; the True Positives and False Positives add up to the total number of above average craving registrations. The sixth and eleventh columns display the precision.

Association between craving and relapse
The association between (re)lapses and self-reported craving is evaluated to determine whether heightened self-reported craving leads to (re)lapses in individuals. Of the 10 participants, six registered lapses. This agrees with the findings of Kirshenbaum et al. (2009), who found that 60% of people in treatment relapse within the first 100 days. For the five participants who experienced lapses (one was excluded due to a lack of other data), the association varies between weak (0.19) and strong (0.53). Overall, the strength of the association between lapses and craving differs considerably between individuals, ranging from weak to strong, but an association is found in every individual. The association between lapses and lagged craving is weaker. Two participants (40% of the lapse sample) showed consistent craving three hours prior to relapse. This moderate relation of craving during lapses is in line with prior cross-sectional non-EMA studies that found a relation between craving and relapse (Higley et al., 2011; Miller et al., 1996), assuming that this cross-sectional relation translates to the individual level. However, it seems to contrast with prior EMA studies that found no between-person association between craving and relapse (Cooney et al., 1997; Holt et al., 2012; Krahn et al., 2005). To the best of the authors' knowledge, this is the first study to research the within-person association of craving with separate lapses or drinking moments in a longitudinal design. Comparison with between-person relapse studies is therefore difficult.

Fig. 6. Participant 5 with specificity 90% and sensitivity 30% during above average craving. The red lines represent craving moments and the red dots the corresponding mean HR values. The black horizontal line is the average HR for the individual participant that was used to determine the sensitivity and specificity. The red dots above this mean line are True Positives and below this line are False Positives. The grey dots below the line indicate true negatives, and above the line false negatives. The dark grey dots are overlapping light grey dots.

Fig. 7. Participant 6 with specificity 73% and sensitivity 30% during above average craving. The red lines represent craving moments and the red dots the corresponding mean HR values. The black horizontal line is the average HR for the individual participant that was used to determine the sensitivity and specificity. The red dots above this mean line are True Positives and below this line are False Positives. The grey dots below the line indicate true negatives, and above the line false negatives.
The association between self-reported craving and lapses is further explored through the sensitivity, specificity and precision. The precision is high for most participants, which indicates that almost all of their lapses are accompanied by heightened craving. However, two participants experienced approximately 40% of their lapses without craving. For the other participants this share is as low as 0% to 8%, meaning that they almost always experienced craving during a lapse. Again, this points to substantial heterogeneity within this population. This heterogeneity might explain the complex relation between craving and drinking, in which not every person experiences craving prior to drinking. Cooney et al. (1997) argued that craving may only occur in a subset of individuals. This hypothesis seems less likely here, since all participants did experience some level of craving. Alternatively, in the cognitive processing model, Tiffany (1999) hypothesised that craving is only experienced when there is a blockage in the drinking process. This means that if a person drinks routinely, drinking can happen without craving. Two participants lapsed without craving in about 40% of their lapses, whereas the other individuals almost always experienced craving during lapses. Tiffany's cognitive processing model (1999) would imply that these other individuals first encountered some kind of blockage (for example, trying to stay abstinent or a bar being closed). Yet adding the availability and allowance of alcohol, and even own coping skills, as contextual predictors did not improve the association between physiology and craving within the individuals in our study.
The sensitivity of lapse prediction lies between 8% and 25%, showing that a participant lapsed in at most a quarter of the moments (s)he experienced craving. This shows that even people who lapse are mostly successful in abstaining. This might be explained by Rohsenow and Monti's (1999) proposal that the relation between craving and drinking is moderated by a person's coping skills: a person can withstand craving and decide not to drink. Both Cooper et al. (1992) and Cox and Klinger (1988) advocate the idea that the link between craving and drinking is mediated by motivation and by a decision whether or not to drink.

Table 5
MCC and sensitivity/specificity table of Total Amplitude and craving. The first column provides the participant number. The second and seventh show the number of above average registrations compared to the total number of registrations where both craving was reported and Total Amplitude was collected. The third and eighth show the MCC of TA prior to and during craving. The fourth, fifth, ninth and tenth are the True Positives and False Positives, with the sensitivity and specificity between brackets; the True Positives and False Positives add up to the total number of above average craving registrations. The sixth and eleventh columns display the precision. *Removal of outliers did not increase the sensitivity or specificity.

Association between physiology and craving
The within-person associations between physiology and self-reported craving are lower than the between-person correlation of 0.38 found in the meta-analysis of lab studies by Carter and Tiffany (1999). Five out of nine participants show a weak association between craving and HR, between 0.25 and 0.29; the other four participants had a negligible association. This seems to be in line with Ooteman et al. (2006), who argued that concordance of physiological cue reactivity and craving may only be present in a subgroup of alcoholic persons. Their hypothesis is that certain individuals are more sensitive to their bodily reactions than others. However, even in this subgroup the associations are weak. The correlation is further explored by dividing it into sensitivity and specificity. All five participants had relatively high specificity (on average 90%) for the association between HR and craving, meaning that few craving "events" occurred without above average HR. The low correlation is explained more by the low sensitivity of heightened HR. Low sensitivity indicates that no craving is present during a considerable number of moments of heightened HR. At these moments many other external or internal events may cause heightened HR (e.g., an aggressive dog or excessive rumination about a previous argument), and not all of these are included in this research. Cacioppo et al. (2016) describe this as a one-to-many psychophysiological mapping, rather than a one-to-one relation between physiology and craving. Heightened HR maps onto multiple psychological responses and not only onto craving.
An association between heightened self-reported craving and EDA is present in only one participant. This participant shows increased TA nearly only during craving, and vice versa. However, the overlap in data between EDA and craving is limited for this participant. This result is promising, but it should be replicated in other individuals with denser longitudinal data before drawing conclusions. Other participants have low precision, meaning that TA misses heightened craving moments more often than it detects them. This finding contrasts with Rosenberg (2009), who argues that both HR and EDA are related to craving.

Association between (lagged) context precursors and craving
The secondary aim is to study whether the prediction of craving from physiological parameters can be improved by including context-related variables such as stress, social activities (e.g. upcoming parties) and perceived self-control. The results of this study show that the associations cannot be improved consistently across all individuals by including these evidence-based predictors of craving and relapse. This contrasts with prior daily life studies (Serre et al., 2015) and with our literature review (van Lier et al., 2018), which showed multiple context variables to be related to craving. The current study possibly differs from prior studies because it investigates only the lagged influence of the context variables. These contextual variables might therefore act closer to the moment of craving, or more than 3 h before it, rather than at a 3 h lag.

Clinical implications
This study shows that individuals experiencing relapse during their attempt to abstain from drinking drank in 8% to 25% of craving incidents during 100 days. Conversely, these participants successfully resisted 75% to 92% of craving incidents. This suggests that, in at least some people undergoing alcohol dependence treatment, preventing relapse comes down to helping them get through those few critical events where personal control fails. However, alerting a person specifically to these rare events based on physiology currently seems not viable as an ecological momentary intervention (EMI; Heron & Smyth, 2010) outside the lab. Physiology and craving do not co-occur with sufficient frequency over time in daily life. For only 57% of participants did above average heart rate and craving co-occur, and then in on average 31% of the events. This implies that 69% of the alarms would be false positives. Such alarms might become too burdensome (Beckjord & Shiffman, 2014) and disruptive (Yu et al., 2018), and participants could end up ignoring them (Nahum-Shani et al., 2018). However, this assumes that all moments of heightened HR not accompanied by craving are false positives, rather than craving-related incidents missed by the individual. As Baker et al. (2004) hypothesized, an individual might only experience craving if it surpasses a certain awareness threshold. Before wearables are used as an EMI for specific biocueing in addiction treatment, future studies should therefore explore two questions. The first is the true false-positive rate of physiology with respect to craving or lapses; the second is whether that rate is acceptable for biocueing (ter Harmsel et al., 2020). Nevertheless, specificity is high for these participants, indicating that above average heart rate missed few craving incidents.
Therefore, we do see value in exploring the potential added benefit of general, non-specific biocueing during treatment. Many craving incidents happen between counseling sessions, and due to recall bias patients forget what happened during their week. Furthermore, physiology could be related to relapse without a causal role of craving, meaning that physiology rather than craving is predictive of lapsing. Many modern wearables and dedicated clinical apps (Derks et al., 2019) provide neutral reminders of above average HR occurrences during the week. Such reminders can support patients in recognizing potentially risky moments and help start the conversation about possible high-risk situations during that week. They would also give the counselor an overall view of the client's state, since combining human and automated feedback yields the highest treatment effect (Wang & Miller, 2020).
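The false-alarm figure follows from the complement of sensitivity. This reading takes the 31% as the average sensitivity of the four participants (57% of seven) with high specificity. An alarm is a moment of above average HR, and FN counts alarms without above average craving. The same complement gives the share of craving incidents resisted, using the lapse sensitivities of 8% to 25%:

$$
\begin{aligned}
\text{false alarms} &= \frac{FN}{TP + FN} = 1 - \mathrm{Sensitivity} \approx 1 - 0.31 = 0.69,\\
\text{craving incidents resisted} &= 1 - \mathrm{Sensitivity}_{\text{lapse}} \in [1 - 0.25,\ 1 - 0.08] = [0.75,\ 0.92].
\end{aligned}
$$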

Compliance rate
The average compliance rate of 66% for self-reported questions is comparable to the 70% found by Jones et al. (2018). Stone and Shiffman (2002) warn about the representativeness of the sampling when the non-response is 20% or higher, especially when the data are expected to be not missing at random. One case of data not missing at random might be that participants stopped answering experience sampling questions, or stopped wearing the wearable technology, near to or during (re)lapses. However, there is a moderate positive correlation between compliance and relapse: the more a participant (re)lapsed, the higher the compliance. It therefore seems implausible that much data was missing not at random. Additionally, the nonresponse that Stone and Shiffman (2002) warned about is mostly based on short EMA studies, where representativeness is a bigger issue.

Table 6
MCC and sensitivity/specificity table of the decision tree. The first column provides the participant number and the second the number of above average craving situations. The third and fourth columns show the MCC for HR and TA during craving. The fifth shows the MCC for the decision tree. The sixth and seventh are the True Positives and False Positives, with the sensitivity and specificity between brackets; the True Positives and False Positives add up to the total number of above average craving registrations. The eighth shows the MCC, and the last the significant features from the CI decision tree.

Study duration
EMA studies differ in duration. Serre et al. (2015) found that EMA studies in substance abuse last on average 34 days, with a maximum of 175. However, longer studies often had a less intensive design. Stone et al. (1991) recommended a maximum study duration of 2 to 4 weeks, due to a decline in data quality after these weeks. Van Berkel et al. (2019) found that accuracy on a working memory task did not change over the study duration, whereas accuracy on a recall task dropped as the study progressed. We expect this decline in recall accuracy to have had little impact, since the study mainly asked about current rather than past experiences. However, this study needed more than the advised 2 to 4 weeks of data. This was due to the within-person design, and because fewer individuals were expected to (re)lapse within such a short period (Kirshenbaum et al., 2009).

E4
Another limitation of the study is the limited validity of the wearable device for measuring EDA responses to smaller stressors (Menghini et al., 2019; van Lier et al., 2020). The low concordance between EDA and self-reported craving could be explained by the E4 wearable not being sensitive enough for precise EDA measurements. In a validation study, the E4 wearable was found to be valid only for strong sustained stressors with TA, whereas HR was also valid for smaller environmental stressors (van Lier et al., 2020). The low co-occurrence between TA and craving could thus be explained by craving not being a strong sustained stressor, in which case more precise measures of TA are needed. Menghini et al. (2019) found no correlation between the gold standard (fingers) and the E4 (wrist). They hypothesized that the differences in measures could be due to the different measurement sites; that is, EDA measured at the fingers may respond differently to emotional and cognitive stressors than EDA measured at the wrist. Menghini et al. (2019) also found the best results for HR, for both a cognitive and an environmental stressor.

Repeated single subject design
This was a repeated single-subject or n-of-1 design (Vieira et al., 2017), hence with a small sample. Given that this study was the first of its kind, our focus was on exploring the temporal and within-person fluctuations of craving, rather than on testing specific hypotheses. However, n-of-1 studies are more difficult to generalize to the population of interest. Vieira et al. (2017) argue that these studies are of particular interest when developing more tailored interventions. They state that if multiple studies explore the same topics with small samples, the n-of-1 studies can be aggregated with meta-analysis or mixed methods. We therefore recommend that future studies keep exploring the relations addressed in this study following the trailblazing approach we presented, in order to enable a meta-analysis over a larger participant base.

Conclusions
The current study is one of the first longitudinal EMA studies to investigate the association between craving and physiology in a within-subject design in the daily life of recovering alcoholics. It is an important step towards using wearable devices in alcohol treatment on the basis of physiological data, specifically measures related to cardiovascular fluctuations. This study underscores the importance of individual differences between people, as suggested by Drummond (2000) and Kavanagh et al. (2013). There is a real need for personalized research, and perhaps even for individualized models and treatment (Alayan et al., 2018).
Funding: This work was supported by the University of Twente's Tech4people program of the BMS faculty.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Relation between Total Amplitude, SD of Force and Craving for participant 10 (correlation = 0.07).