How can food choice best be trained? Approach-avoidance versus go/no-go training

Behavior toward appetitive stimuli can be changed by motor response training procedures in which participants approach or respond to some stimuli and avoid or inhibit behavior to other stimuli. There is discussion in the literature whether effects are different when participants approach versus avoid stimuli during approach-avoidance training compared to when they respond versus not respond to stimuli during go/no-go training. Here, we directly compared effects of approach-avoidance training and go/no-go training on food choice within the same rigorous experimental protocol. Results showed that both training procedures influence food choice such that participants preferred Approach over Avoidance food items, and Go over NoGo food items, and these training effects were not statistically different. The present work suggests any inconsistencies in the literature on possible differences in effectiveness of these training procedures may be explained by differences in methods employed. The present work also raises new theoretical and applied questions about motor response training as a means to change behavior.


Introduction
An important challenge for psychologists is to help people change their behavior toward appetitive stimuli such as high calorie foods or alcoholic beverages to reduce overconsumption of such stimuli, which can have severe health consequences (e.g., Marteau, Hollands, & Fletcher, 2012). Various approaches to address this challenge exist, ranging from extensive individual counseling to brief internet interventions employing motivational and social influence techniques (e.g., Kaner et al., 2018; Wilfley, Hayes, Balantekin, Van Buren, & Epstein, 2018). A relatively novel approach to this topic is the use of so-called motor response training (e.g., Houben & Jansen, 2011; Stice, Lawrence, Kemps, & Veling, 2016; Wiers, Rinck, Kordts, Houben, & Strack, 2010). During such training participants execute or withhold responses to images of appetitive stimuli on an electronic device. Despite their apparent simplicity, meta-analyses suggest such trainings can be effective in modifying consumption of high calorie foods and alcoholic beverages (e.g., Aulbach, Knittle, & Haukkala, 2019; Jones et al., 2016; Kakoschke, Kemps, & Tiggemann, 2017a).
Interestingly, although different response training procedures exist (Stice et al., 2016), little is known about which motor response training can best be used to change behavior. In the present work we directly compare the effectiveness of two prominent response training procedures in influencing food choice: Go/no-go training (GNG) and approach-avoidance training (AAT). We think this comparison in the domain of eating behavior is important and interesting for two reasons. First, results from two recent meta-analyses suggest that GNG may be more effective than AAT in influencing eating behavior (e.g., Aulbach et al., 2019; Yang et al., 2019), but it is theoretically unclear why this would be the case. Second, a direct comparison between GNG and AAT tasks would provide information on possible differences in effect sizes, which would be informative for scientists and practitioners to determine which task to choose to influence food choice.
During GNG participants are asked to respond with a simple key press when a go cue is presented and to not respond when a no-go cue is presented. The go cues are consistently presented together with images of Go food items or control Go items, and the no-go cues are consistently presented with NoGo food items. After this training, participants preferred Go food items over NoGo food items for actual (Chen, Holland, Quandt, Dijksterhuis, & Veling, 2019; Porter et al., 2018) and hypothetical (Veling, Aarts, & Stroebe, 2013) consumption, and consumed less of the NoGo items compared to a control condition in a bogus taste test (e.g., Houben & Jansen, 2011; Lawrence, Verbruggen, Morrison, Adams, & Chambers, 2015b; Oomen, Grol, Spronk, Booth, & Fox, 2018; but see Kakoschke, Kemps, & Tiggemann, 2017b). Meta-analyses suggest GNG is effective in influencing eating behavior (Aulbach et al., 2019; Yang et al., 2019). A recent p-curve analysis (Simonsohn, Nelson, & Simmons, 2014), however, suggests that reported effects of GNG on eating behavior may partly reflect publication bias and/or p-hacking (Carbine & Larson, 2019). Then again, updating the p-curve analysis with results of recent preregistered experiments (from Chen et al., 2019) does suggest evidential value of GNG on eating behavior (Veling, Chen, Liu, Quandt, & Holland, 2020).
During AAT, participants approach some food items (e.g., by pulling a joystick toward the body in response to low calorie food items) and avoid other food items (by pushing a joystick away from the body in response to high calorie food items) depending on some stimulus feature (e.g., whether the image is in landscape or portrait form). A first study found approaching healthy foods (words like yogurt and apple) and avoiding unhealthy foods (words like cookie and fries) was effective in increasing choices for healthy over unhealthy food items compared to a between-subjects condition with reversed-mapping (Fishbach & Shah, 2006), but this study had several methodological limitations (see e.g., Becker, Jostmann, Wiers, & Holland, 2015). When these limitations were addressed, no consistent effect on choice was found (Becker et al., 2015). One study found avoiding chocolate versus approaching chocolate reduced chocolate intake (Schumacher, Kemps, & Tiggemann, 2016), but several other experiments failed to observe effects on intake (e.g., Becker, Jostmann, & Holland, 2018; Dickson, Kavanagh, & MacLeod, 2016; Kakoschke et al., 2017b). Sometimes an effect on one measure of eating behavior was found (e.g., choice), but not on another measure (e.g., intake; Kakoschke et al., 2017b). Other studies employing modified AAT trainings (e.g., a gamified version or a version in which the movements have clear evaluative consequences) reported effects on choice or intake (e.g., Schakel et al., 2018; Van Dessel, Hughes, & De Houwer, 2018). Nonetheless, a recent meta-analysis suggests no effect of AAT on eating behavior even when including these latter studies (Yang et al., 2019).
Thus, it appears that GNG may be more effective than the AAT in modifying eating behavior. Interestingly, there is no clear theoretical reason to expect that the training tasks should elicit different effects in the domain of eating behavior, but this is also partly because the exact working mechanisms of these tasks have not yet been fully uncovered (e.g., Aulbach et al., 2019; Veling, Lawrence, et al., 2017). Both tasks do seem to change evaluations of trained stimuli (i.e., Go or Approach items are evaluated more positively than NoGo or Avoidance items; e.g., Chen, Veling, Dijksterhuis, & Holland, 2016; Kawakami, Phills, Steele, & Dovidio, 2007), which points to the possibility that both tasks may work by modifying food evaluations (Johannes, Buijzen, & Veling, 2021; see also Aulbach et al., 2019 for this argument). Awaiting further theoretical development, it also seems important to consider methodological differences between AAT and GNG research, which may account for different effects. Indeed, what complicates comparisons between AAT and GNG is that each research line has used different conventions in the experimental protocols. Therefore, evaluating whether GNG is more effective than AAT with meta-analyses can be problematic, because these different training tasks may have been examined with crucial methodological differences.
There are a number of methodological aspects of experiments examining AAT versus GNG that may need to be controlled to better compare the effects. For instance, in most GNG work, the go or no-go cues are presented slightly after picture onset (e.g., 100 ms; Chen et al., 2019) whereas this is not the case for most AAT work. Another notable difference is that research employing the AAT often included a measurement of bias modification after the training and before the measurement of behavior (e.g., an approach avoidance task to measure whether the training is effective; e.g., Becker et al., 2015; Dickson et al., 2016; or an implicit association task; Kakoschke et al., 2017b) whereas this is usually not the case for GNG (e.g., Houben & Jansen, 2011; Chen et al., 2019; Porter et al., 2018). Measuring the effectiveness of the intervention with a measure that resembles the training may, at least partly, undo the effect of the training on behavior. Another difference is that during GNG contingencies between stimuli and responses are usually 100% consistent across trials (e.g., Houben & Jansen, 2011; Chen et al., 2019; Porter et al., 2018), whereas in AAT work this consistency is often lower (e.g., on 10% of the trials participants execute the opposite movement; Becker et al., 2015; Dickson et al., 2016; Kakoschke et al., 2017b). This inconsistent mapping between trained responses and images may reduce the strength of the training (see Jones et al., 2016). Therefore, we think an important step to evaluate whether the two training tasks are differently effective in changing eating behavior is to compare them directly within the same design in which procedural aspects that could reduce the effectiveness of the trainings on behavior are eliminated.
That is, we examined the effectiveness of both tasks without inclusion of a measure of bias modification directly after the training procedure, and with 100% consistency in contingencies between responses and specific food items (i.e., between foods and go or no-go responses or between foods and approach or avoidance responses). Note that one previous study that compared the AAT and GNG within the same experimental design did not find consistent effects on behavior. Supporting our own analysis, this study, however, did include a bias modification measure between the training task and the behavioral measure and also used the 10% inconsistent mapping of responses and images described above (Kakoschke et al., 2017b). Another study that also compared the AAT and GNG within the same experimental protocol, but in the domain of alcohol consumption, found similar effects of both training procedures (i.e., a decrease in intake of alcohol during a taste test), when booster training sessions were implemented after the measurement of bias modification, and right before the measurement of alcohol intake (Di Lemma & Field, 2017). The goal of the present study was to provide another direct comparison of the two training procedures, but this time in the domain of food choice, using a rigorous experimental protocol.
Accordingly, the research question was whether the GNG and AAT influence food choice similarly when they are employed using a similar protocol. We chose to use an experimental protocol that we found to be very robust in eliciting effects on food choice using GNG. Here we adapted that procedure such that it could be used to directly compare the GNG with the AAT. In the current study, participants were randomly assigned to GNG or AAT to train responses to snack food items. The two training tasks were kept as comparable as possible in terms of consistency between items and responses, task design, and cues for Go or Approach items versus NoGo or Avoidance items. The experimental procedure for GNG and AAT participants was equal; participants started with rating the snacks, then received the training (either GNG or AAT), chose preferred snacks in a binary decision task, and finally rated the snacks again. We predicted that participants would choose Go over NoGo items for consumption based on our previous work. We did not have a directional hypothesis for the AAT, because of the null finding in the literature for the AAT, or for the post-training evaluations. For the preregistration, see https://osf.io/syec2/.

Method
This research was approved by the ethics committee of the faculty of social sciences at Radboud University, ECSW-2018-065.

Participants
Participants were recruited using the Radboud University research participation system (i.e., SONA) and ads posted on social media for a study about food preferences and cognitive performance. They were randomly and equally distributed across the GNG and AAT conditions, so that 62 participants were assigned to each condition. In return for their participation, they received study credit or a gift card of 7.50 Euro value as well as one of the snacks chosen in the experiment. The sample size was based on a previous power analysis of effects of GNG on food choice indicating that with 60 participants a preference for Go over NoGo items can be detected with a power of 90% at an alpha level of .05 assuming an effect size of Cohen's d of 0.5 (slightly more participants took part because participants signed up via an online system where multiple slots were available). Based on preregistered exclusion criteria, two participants were excluded from the GNG condition (% correct responses was under 90% and below 3 SDs from the mean performance), and one participant was excluded from the AAT condition (more than 40 food evaluation ratings under −75 on a scale from −100 to 100). A more detailed description of the sample is provided in Table 1.

Materials
The stimuli used in the GNG and AAT were pictures of 60 energy-dense sweet or savoury snacks that were created by Veling, Chen, et al. (2017). The pictures showed the snack in the package and part of the unwrapped snack on a black background. Ten additional snack pictures were used as practice stimuli.

Procedure
Participants were asked to not eat anything for at least 3 h before participating in the experiment. Therefore, all sessions were scheduled after 11:00 a.m. Before participants went to their individual cubicle, we showed them our complete range of snacks in order to reassure them that the food choices they made in a later task had real consequences. The research assistant who ran the study was blind to the training group of participants (GNG, AAT). Participants executed the tasks in the order that they are described below.

Rating task
Participants first completed a rating task, in which 60 different calorie-dense snack pictures were presented one by one. For each picture, they indicated how much they wanted to eat the depicted snack at the moment by using a 200-point slider (−100 = Not at all; 100 = Very much). The purpose of this task was to ensure that the food items could be matched on attractiveness for each participant before the training as explained next.

Sorting and selection
After the rating task, the experimental program rank ordered the pictures from 1 to 60, from the lowest to the highest rating for each participant. Based on this ranking, 32 pictures were selected and used in the training tasks, where 16 items were selected for the go or approach condition, and 16 for the no-go or avoidance condition. Of the 16 items in each item condition (i.e., Go, NoGo, Approach, Avoidance), 8 items were chosen from the lowest ratings (the items ranked 8-23; low value condition), and 8 items from the highest ratings (the items ranked 38-53; high value condition), resulting in an equal distribution of low-value and high-value items across item conditions. For instance, items 8, 11, 12 and 15 were trained as Go or Approach items, and items 9, 10, 13 and 14 were trained as NoGo or Avoidance items. The more extremely ranked items were used as experimental items (i.e., items 8-15 and items 46-53) and the less extremely ranked items were used as filler items (i.e., items 16-23 and items 38-45).
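The sorting and selection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the experimental program itself: the function name `select_training_items`, the picture identifiers, and the labels `"go"`/`"nogo"` (standing for Go/Approach and NoGo/Avoidance) are hypothetical, ties in ratings are broken arbitrarily by sort order, and the alternation given in the text for ranks 8-15 is assumed to repeat in the other three rank windows.

```python
import random


def select_training_items(ratings):
    """Pick 32 of 60 rated pictures and assign value, role and item condition.

    ratings: dict mapping a picture id to its rating (-100..100).
    """
    # Rank 1 = lowest rating, rank 60 = highest rating.
    ranked = sorted(ratings, key=ratings.get)
    by_rank = {rank: pic for rank, pic in enumerate(ranked, start=1)}

    # Rank windows as described in the text.
    windows = {
        ("low", "experimental"): range(8, 16),    # ranks 8-15
        ("low", "filler"): range(16, 24),         # ranks 16-23
        ("high", "filler"): range(38, 46),        # ranks 38-45
        ("high", "experimental"): range(46, 54),  # ranks 46-53
    }

    # The text gives ranks 8, 11, 12, 15 -> Go/Approach and
    # 9, 10, 13, 14 -> NoGo/Avoidance.
    # ASSUMPTION: the same pattern is reused in every window.
    pattern = ["go", "nogo", "nogo", "go", "go", "nogo", "nogo", "go"]

    items = []
    for (value, role), ranks in windows.items():
        for rank, condition in zip(ranks, pattern):
            items.append({
                "picture": by_rank[rank],
                "rank": rank,
                "value": value,
                "role": role,
                "condition": condition,
            })
    return items


# Usage with made-up ratings for 60 hypothetical pictures.
random.seed(0)
toy_ratings = {f"snack_{i:02d}": random.randint(-100, 100) for i in range(60)}
selected = select_training_items(toy_ratings)
print(len(selected), "items selected")
```

Alternating the assignment within each window keeps the average rank, and hence the average pre-training rating, roughly equal across the two item conditions, which is what the selection procedure check in the Results verifies.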

Training
Next participants received instructions for the training. In both conditions, the training was framed as an attention task while looking at food pictures.
During the GNG, a picture of a food item was presented on screen for 1 s, and 100 ms after picture onset a tone (1000 Hz or 400 Hz, counterbalanced across conditions) was played through headphones which lasted for 300 ms. Participants responded to this cue by using the keyboard. They were instructed to press the B key as fast as possible if they heard a go cue and to not respond when they heard a no-go cue. Trials were separated by a jittered inter-trial interval of 500 ms (±100 ms). Participants received 10 training blocks in which each picture was presented once in random order, resulting in 160 training trials in total. Every 2 blocks, participants would be able to take a break, and they received feedback on their performance in that block. In case of the GNG, this was the percentage of accurate responses.
During the AAT, the timing of the images and tones was identical to the GNG. However, instead of responding or not, participants now always responded by using a joystick, and they either approached or avoided the images depending on the cue. Because a joystick was used, trials were now self-paced and each trial started by holding the joystick upright and then pressing a button on top of the joystick. Immediately after the button press, the picture was presented on screen and 100 ms later, the approach or avoidance tone was played via a headphone. If this tone was an approach cue, participants were instructed to pull the joystick towards them, which was accompanied by an optical zoom of the picture coming closer (i.e., by making the image larger); participants were instructed to push the joystick away from them if the tone was an avoidance cue, creating a zoom effect of the picture decreasing in size. The change in size of the picture was linear to the movement of the joystick, with the maximum format being such that the picture of the snack covered the whole screen. When the joystick was moved to the furthest point in the target direction, the picture disappeared and the trial ended. Participants received the same number of trials as in the GNG. They also received feedback after every second block, consisting of the average response time in that block.
Before each training, participants in both conditions first heard the two cues that would be used in the training, with an explanation of how they should respond to the respective cues. To get familiar with the procedure, participants completed four practice trials of both training conditions within their training (go and no-go, or approach and avoidance), showing additional food items only used for practice. In the GNG, participants received visual feedback on whether their response was correct; in the AAT, the picture disappeared only if the response was correct. After this, participants could repeat the practice as often as they wanted. The instructions were repeated after practice and before the training started.

Choice task
Immediately after the training, participants completed a snack choice task. They were informed that they would receive a series of choices between snack items, and that at the end of the task one trial would be selected and their choice on that trial would be honoured. On each trial, participants were presented with two snack items, side by side, and they indicated which one they preferred by pressing the U button for the left item or the I button for the right item on a QWERTY keyboard. As feedback, a green frame appeared around the selected item. Participants had 1500 ms to indicate their choice, and if they responded too late, no choice was registered and the trial was repeated at the end. First, participants practiced the task using 8 pairs of non-trained food items that were reserved for this practice block.
The actual choice task consisted of experimental trials and filler trials. On experimental trials, the two snacks in one trial were matched on ratings during the rating task (items were both of high or both of low value), but one had been associated with go (GNG condition) or approach (AAT condition) and one with no-go (GNG condition) or avoidance (AAT condition). On filler trials, the two snacks were from the same item condition (both Go/Approach or both NoGo/Avoidance), but one had received a low rating and one a high rating during the rating task. These filler trials were included to assess the validity of the choice task and rating task (i.e., participants should prefer high over low rated items).
The choice task consisted of 64 experimental trials and 64 filler trials. The experimental and filler trials included the 16 Go/Approach and 16 NoGo/Avoidance trained items. Within each group of these 16 items, 8 were high-value items and 8 were low-value items. Within each of these latter 8 items, 4 were consistently used in filler trials, and 4 in experimental trials. Each item was paired with each of the four items of its subcategory (so for the experimental trials, one high-valued Go/Approach item was matched to the four high-valued NoGo/Avoidance items, and for the filler items, one high-valued Go/Approach item was matched to the four low-valued Go/Approach items). This means that each item was present in four different choice pairs. Moreover, all choice pairs were presented twice in the choice task with the position of the choice alternatives on the screen (left versus right) counterbalanced. This resulted in a total of 64 experimental trials in which participants chose between a Go/Approach and NoGo/Avoidance item (16 high value choice pairs presented twice and 16 low value choice pairs presented twice), and 64 filler trials in which participants chose between a low and high value item (16 choice pairs between two Go items presented twice and 16 choice pairs between two NoGo items presented twice). Furthermore, two additional choice trials were presented that were randomly selected from a pool of six snacks. The script randomly selected one of these two trials and participants received the item of their choice on this trial after the experiment.
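The pairing scheme above can be checked with a short Python sketch. It is an illustration only: the item names, the labels `"go"`/`"nogo"` (standing for Go/Approach and NoGo/Avoidance), and the function `build_choice_trials` are hypothetical, and trial order is left unshuffled because the text does not describe how trials were ordered. The two additional reward trials are not modelled.

```python
from itertools import product


def build_choice_trials(items):
    """Build experimental and filler choice trials from 32 trained items.

    items: list of dicts with keys "picture", "value" ("high"/"low"),
    "condition" ("go"/"nogo") and "role" ("experimental"/"filler").
    """
    def pick(value, condition, role):
        return [i["picture"] for i in items
                if (i["value"], i["condition"], i["role"]) == (value, condition, role)]

    pairs = []
    # Experimental pairs: same value level, different item condition.
    for value in ("high", "low"):
        go_items = pick(value, "go", "experimental")
        nogo_items = pick(value, "nogo", "experimental")
        pairs += [("experimental", a, b) for a, b in product(go_items, nogo_items)]
    # Filler pairs: same item condition, different value level.
    for condition in ("go", "nogo"):
        high_items = pick("high", condition, "filler")
        low_items = pick("low", condition, "filler")
        pairs += [("filler", a, b) for a, b in product(high_items, low_items)]

    # Present every pair twice, with left/right position counterbalanced.
    trials = []
    for kind, a, b in pairs:
        trials.append({"kind": kind, "left": a, "right": b})
        trials.append({"kind": kind, "left": b, "right": a})
    return trials


# Usage with 32 hypothetical items: 4 per value x condition x role cell.
toy_items = [
    {"picture": f"{value}_{condition}_{role}_{k}",
     "value": value, "condition": condition, "role": role}
    for value in ("high", "low")
    for condition in ("go", "nogo")
    for role in ("experimental", "filler")
    for k in range(4)
]
toy_trials = build_choice_trials(toy_items)
n_experimental = sum(t["kind"] == "experimental" for t in toy_trials)
n_filler = sum(t["kind"] == "filler" for t in toy_trials)
print(n_experimental, "experimental trials;", n_filler, "filler trials")
```

Crossing four items with four items gives 16 pairs per cell, so two cells yield 32 pairs per trial type and presenting each pair in both screen positions yields the 64 trials per type stated in the text.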

Recognition task
For exploratory reasons, we also measured to what degree people could remember the training condition of each food item. All the pictures from the training task were shown one by one, and participants indicated for each picture whether it was associated with pressing B/not pressing B (GNG) or pulling the joystick/pushing the joystick (AAT) during the training.

Post-training rating task
The rating task was administered again after the training to explore if evaluations of the food items had changed after the training and choice tasks.

Demographics
Participants filled out their age, gender, height, weight, the last time of food consumption, and the current hunger level (0 = Not hungry at all; 100 = Very hungry). They were asked what they thought the study was about and what they thought that the goal of the training ("attention task") was.
At the end of the study, participants were thanked, debriefed and received their payment and the selected snack.

Data analyses

Preregistered analyses
ANOVAs were used to check whether the item matching procedures resulted in item sets (Go versus NoGo or Approach versus Avoidance) that were matched on item evaluation before the training (based on ratings from the pre-training rating task) within each training condition (GNG, AAT) separately, and to examine effects of the training on item evaluation across time (pre versus post training rating task). 1 The main hypothesis that participants would prefer Go over NoGo items was tested with an intercept-only mixed logistic regression model with item condition of the chosen snack as the outcome variable (choices for Go over NoGo items), and including a random intercept per participant. Note that we only included a random intercept per participant, and not per item, in all reported mixed logistic regression models, because items were different for each participant due to the sorting and selection procedure, which may result in convergence problems when fitting models. We addressed this issue for the main analyses in the additional analyses described below.
A similar analysis was performed on the AAT data, but now with choices for Approach versus Avoidance items as the outcome variable, and this analysis was exploratory instead of confirmatory. Additional exploratory analyses were performed to test whether effects of the training were different within high value and low value choice pairs by adding the factor value of choice pairs (high versus low) to the model as the predictor variable. Effects of the training within low and high value choice pairs separately within each training condition were also explored with mixed logistic regressions. We also tested whether participants preferred high over low value items using the intercept-only mixed logistic regression model within each training condition.
For each mixed model, the contrast coding was as follows: training condition (GNG = 1, AAT = 0), item condition (experimental trials, Go/Approach = 1, NoGo/Avoidance = 0; filler trials, high value item = 1, low value item = 0), value of choice pair (high = 1, low = 0). A coding of 0 refers to the reference category. An effect is considered significant when the confidence interval (CI) of the effect, expressed on the odds scale, does not contain 1. Hence, it should be noted that we report CIs instead of p-values for the main analyses. This is because to compute p-values for mixed models, one should preferably perform likelihood ratio tests (LRT; Gudicha, Schmittmann, & Vermunt, 2017), which compare the likelihood of one model to the likelihood of a simpler model, such as the intercept-only model. However, since we employ the intercept-only mixed models (the simplest possible model) for the main analyses here, it would be impossible to obtain the p-values with this method.
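To make the decision rule concrete, the following Python sketch converts a log-odds estimate and its standard error into odds with a 95% interval and checks whether that interval contains 1. It assumes a Wald-type interval (estimate ± 1.96 × SE on the log-odds scale); the text does not state how the reported CIs were computed, so results obtained from rounded inputs may differ slightly from the reported ones. The function names are hypothetical, and the example inputs are the rounded GNG estimate and SE given in the Results.

```python
import math


def odds_with_wald_ci(estimate, se, z=1.96):
    """Convert a log-odds estimate and SE to odds with a Wald-type CI.

    ASSUMPTION: a normal-approximation (Wald) interval; the original
    analysis may have used a different method (e.g., profile likelihood).
    """
    odds = math.exp(estimate)
    lower = math.exp(estimate - z * se)
    upper = math.exp(estimate + z * se)
    return odds, lower, upper


def excludes_one(lower, upper):
    """Decision rule from the text: significant if the CI does not contain 1."""
    return not (lower <= 1.0 <= upper)


# Rounded GNG intercept reported in the Results: Estimate = 0.26, SE = 0.08.
odds, lower, upper = odds_with_wald_ci(0.26, 0.08)
print(f"odds = {odds:.2f}, 95% CI [{lower:.2f}; {upper:.2f}], "
      f"significant: {excludes_one(lower, upper)}")
```

Because an intercept of 0 on the log-odds scale corresponds to odds of 1 (choosing either item type equally often), testing whether the interval excludes 1 is the same as testing whether choices deviate from chance.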
To explore whether the GNG training and choices influenced subsequent evaluations of snack items, we preregistered an exploratory repeated measures ANOVA on the change in ratings from pre-training to post-training. This analysis was conducted separately for the GNG and AAT, with time (pre versus post), item value (high versus low) and item condition (Go versus NoGo or Approach versus Avoidance) as within factors.

Additional analyses
We preregistered Bayesian analyses of the effects of the training on food choice within each training condition but did not specify exactly how they would be conducted; we therefore report them as additional analyses. As a robustness check, we also ran a Bayesian mixed logistic regression model with item condition of the chosen snack (Go versus NoGo) as the outcome variable, including random intercepts per participant and per item. Note that in this Bayesian version of the mixed model we included random intercepts at both the participant-level and the item-level, as the maximal varying effect structure can generally be fitted in a Bayesian framework (Bates et al., 2015a; Eager & Roy, 2017; Nicenboim & Vasishth, 2016; Sorensen, Hohenstein, & Vasishth, 2016). We chose a weakly informative prior for the intercept (i.e., α ~ Normal(0, 10)) according to the advice of Nicenboim and Vasishth (2016). For random effects we used the default priors provided by the brms package (Bürkner, 2017; Carpenter et al., 2017). If the 95% credible interval does not include 1, we deem an effect "significant": we get a probability distribution of plausible values for a specific parameter (in this case the intercept), and if the 95% range of that distribution does not include 1, we deem it likely enough that the true value differs from 1 and call the effect significant. See Kruschke and Liddell (2018) for more detailed information.
To explore the effect of the training condition on food choice, a mixed logistic regression model was performed with choices for Approach/Go items versus Avoidance/NoGo items as the outcome variable, and with training condition as the predictor variable and including a random intercept per participant. To test whether memory for training cues (i.e., go or no-go, approach or avoidance) was significantly different in the GNG and AAT condition, we performed another mixed logistic regression model with the proportion of correct over incorrect remembered cues as the outcome variable and the training condition (GNG vs AAT) as the predictor variable and including a random intercept per participant. For all mixed models, we only report confidence intervals (CIs) in order to be consistent with the aforementioned main preregistered analyses.
To further explore evaluations, we conducted an additional repeated measures ANOVA and computed correlations. We investigated whether the effects of GNG and AAT on the evaluations were significantly different in a repeated measures ANOVA with time (pre versus post) and training condition (Go or Approach versus NoGo or Avoid) as within factors, and training type (GNG or AAT) as between factor. Furthermore, and to explore the association between choices and evaluations, we computed two-sided Pearson correlations between choice proportions and evaluations. Per participant, choice proportions were computed for Go versus NoGo items, or for Approach versus Avoidance items. These choice proportions were correlated to the participants' average difference in evaluations from pre-training to post-training as well as for their evaluations in the post-training measure only, also separately for Go and for NoGo, or for Approach and Avoidance items.
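As an illustration of the correlational analysis, the sketch below computes a per-participant choice proportion and a Pearson correlation in plain Python. The helper names and all numbers are made up for demonstration and are not data from this study; the two-sided p-value that accompanies each reported correlation is not computed here.

```python
import math


def choice_proportion(choices):
    """Proportion of trials on which the Go/Approach item was chosen.

    choices: list of 1 (Go/Approach chosen) or 0 (NoGo/Avoidance chosen).
    """
    return sum(choices) / len(choices)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up example: four hypothetical participants.
toy_choices = [
    [1, 1, 0, 1],  # participant 1 chose the Go/Approach item on 3 of 4 trials
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
# Made-up pre-to-post change in mean rating of Go/Approach items.
toy_rating_change = [4.0, -1.5, -6.0, 8.0]

proportions = [choice_proportion(c) for c in toy_choices]
print("r =", round(pearson_r(proportions, toy_rating_change), 3))
```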

Results
We first report whether the selection procedure selected Go/Approach and NoGo/Avoidance items that were equally liked by participants before the training. This is followed by the analyses on food choice. Finally, we report the preregistered exploratory analyses on food evaluations. Before presenting these results, we describe the general performance of participants in the two training tasks and in the memory task.

Performance
Participants' response accuracy in the GNG Training is reflected in the percentage of not responding to no-go trials and responding to go trials. On no-go trials, accuracy of withholding responses was 98% (SD = 2%), and the accuracy of responding on go trials was 99% (SD = 3%). The median reaction time of the responses in go trials was 478 ms (SD = 56 ms). Participants correctly categorized on average 66% (SD = 18%) of the Go trained items as go trained, and 64% (SD = 19%) of the NoGo items as no-go trained in the recognition task.
Response accuracy in AAT consists of moving the joystick directly in the correct direction, without moving to the other direction. The reaction times reflect the time it took participants to move the joystick to the end position. On approach trials, participants had an average response accuracy of 86% (SD = 8%) and a median reaction time of 667 ms (SD = 78 ms). On avoidance trials, they had an average accuracy of 84% (SD = 9%) and a median reaction time of 665 ms (SD = 76 ms). Participants on average correctly remembered 53% (SD = 15%) of the Approach items as approach-trained, and 54% (SD = 16%) of the Avoidance items as avoidance-trained. Regarding whether memory for training cues (i.e. go or no-go, approach or avoidance) was significantly different in the GNG and AAT condition: The average proportion of correctly remembered training cues was significantly higher in the GNG training condition compared to the AAT training condition, OR = 1.70, 95% CI [1.35; 2.15].

Selection procedure check
Then, we checked whether the selection procedure correctly categorized positively and negatively evaluated items as high and low value items in the case of GNG, and whether these items were evenly distributed across the go and no-go item conditions. In order to test this, the average pre-training ratings were analysed with a 2 (item value, high versus low) by 2 (item condition, Go versus NoGo) Analysis of Variance (ANOVA). As expected, average pre-training ratings for high versus low value items differed significantly, F(1, 59) = 737.66, p < .001, generalized η² = 0.760, indicating that items categorized as high value were the items that received higher ratings before the training compared to low value items. There were no significant effects on pre-training ratings of item condition (Go versus NoGo), F(1, 59) = 0.07, p = .784, or of the interaction between item condition and value, F(1, 59) = 1.57, p = .215. This indicates that the selection procedure selected Go and NoGo items that were equally liked by participants before the training. 1 In the AAT, we ran a similar selection procedure check as for the GNG training. A 2 (item value, high versus low) by 2 (item condition, Approach versus Avoidance) ANOVA with pre-training ratings as dependent variable yielded a significant effect of item value on pre-training ratings, F(1, 60) = 789.40, p < .001, generalized η² = 0.745. Item condition did not significantly predict pre-training ratings, F(1, 60) = 1.878, p = .176, nor did the interaction between item condition and value, F(1, 60) = 0.005, p = .944. This analysis confirms that snack items were correctly categorized as low or high value, and evenly distributed across item conditions.

The effect of training condition on food choice
We hypothesized that participants would be more likely to choose a Go than a NoGo snack on experimental trials. Results of the mixed logistic regression model showed that the odds of the intercept were significantly higher than one, Estimate = 0.26 (SE = 0.08), odds = 1.29, 95% CI [1.10; 1.53], confirming the hypothesis that overall participants chose Go snacks (M = 56%; SD = 15%) over NoGo snacks (M = 44.1%; SD = 14.9%) more often than chance. Moreover, these results were consistent with those from the Bayesian mixed logistic regression model, Estimate = 0.37 (SE = 0.11), odds = 1.45, 95% CrI [1.15, 1.80], providing further support that overall participants chose Go over NoGo items more often after the training. See right panel of Fig. 1 for the patterns.
Next, we explored whether the AAT increased the likelihood of choosing Approach over Avoidance items in the choice task. Results of the mixed logistic regression model showed that the odds of the intercept were significantly higher than one, Estimate = 0.166 (SE = 0.062), odds = 1.18, 95% CI [1.05; 1.33]. This showed that participants preferred Approach trained snacks (M = 54%; SD = 12%) over Avoidance trained snacks (M = 46%; SD = 12%). Moreover, these results were consistent with those from the Bayesian mixed logistic regression model, Estimate = 0.22 (SE = 0.09), odds = 1.25, 95% CrI [1.05, 1.49], providing further support that overall participants chose approach trained over avoidance trained snacks more often after the training. See left panel in Fig. 1 for the patterns. Note that we present the results separately for low value and high value choice pairs, as value level has been shown to sometimes influence effects of response training procedures in previous work (Chen et al., 2016; Schonberg et al., 2014).
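The relation between the reported estimates (on the log-odds scale), the odds, and the confidence intervals can be illustrated with a minimal sketch. It assumes that the intervals are Wald intervals (estimate ± 1.96 × SE, exponentiated); the text does not state how the intervals were computed, so this is an assumption, and the function names are hypothetical. Small deviations from the reported values are expected because the inputs are rounded.

```python
import math


def odds_with_wald_ci(estimate, se, z=1.96):
    """Exponentiate a log-odds estimate and its Wald interval (assumed method)."""
    odds = math.exp(estimate)
    lower = math.exp(estimate - z * se)
    upper = math.exp(estimate + z * se)
    return odds, lower, upper


def odds_to_probability(odds):
    """Convert odds of choosing the trained item into a choice probability."""
    return odds / (1.0 + odds)


# Intercepts as reported in the text (rounded values).
gng_odds, gng_lo, gng_hi = odds_with_wald_ci(0.26, 0.08)    # GNG: Go over NoGo
aat_odds, aat_lo, aat_hi = odds_with_wald_ci(0.166, 0.062)  # AAT: Approach over Avoidance

for label, odds, lo, hi in [("GNG", gng_odds, gng_lo, gng_hi),
                            ("AAT", aat_odds, aat_lo, aat_hi)]:
    # An interval that excludes 1 corresponds to a significant preference.
    print(f"{label}: odds = {odds:.2f}, 95% CI [{lo:.2f}; {hi:.2f}], "
          f"implied choice probability = {odds_to_probability(odds):.2f}")
```

Under these assumptions the recomputed odds and intervals lie close to the reported ones, and the implied choice probabilities are in line with the observed mean choice percentages. Note that the intercept of a mixed model is conditional on the random effects, so the implied probability need not equal the raw mean exactly.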

Comparison between AAT and GNG on food choice
Next, we tested whether the GNG and AAT were differently effective in changing food choice. Results showed that the effect of training task (AAT versus GNG) on choices for Go/Approach items was nonsignificant, Estimate = 0.09 (SE = 0.10), OR = 1.09, 95% CI [0.89; 1.34]. Furthermore, and for exploratory reasons, we tested whether recognition accuracy was related to the strength of the effect of the training on choice. Specifically, we computed Pearson correlations between the proportion of choices for Go or Approach items and the percentage of items that were correctly remembered as go/approach or nogo/avoidance trained. This correlation was not significant in either of the two conditions (for participants in the GNG-condition: r = 0.19, p = .15; for participants in the AAT-condition: r = 0.10, p = .43).
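As an illustration of how the p values of the Pearson correlations above follow from r and the sample size, the sketch below computes the usual t statistic, t = r·sqrt((n − 2)/(1 − r²)), and a two-sided p value from the t distribution with n − 2 degrees of freedom. The sample sizes (n = 60 for GNG, n = 61 for AAT) are assumptions inferred from the degrees of freedom of the ANOVAs reported earlier, and the function names are hypothetical. The t distribution is integrated numerically so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_for_r(r, n, steps=2000):
    """Return (t, p) for a Pearson correlation r based on n participants."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # Simpson's rule (steps must be even) for the density mass between 0 and t.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2.0 * (h / 3.0) * total  # P(-t < T < t)
    return t, 1.0 - central_mass


t_gng, p_gng = two_sided_p_for_r(0.19, 60)  # assumed n for the GNG condition
t_aat, p_aat = two_sided_p_for_r(0.10, 61)  # assumed n for the AAT condition
print(f"GNG: t = {t_gng:.2f}, p = {p_gng:.2f}")
print(f"AAT: t = {t_aat:.2f}, p = {p_aat:.2f}")
```

Because the reported correlations are rounded to two decimals, the recomputed p values can differ slightly from those in the text.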

High value versus low value choice pairs
Then, we explored whether the effect of GNG training was different for high-value compared to low-value choice pairs. Results showed that the odds of the intercept were still significantly different from one, Estimate = 0.39 (SE = 0.09), odds = 1.47, 95% CI [1.233; 1.759]. The effect of value was significant as well, Estimate = −0.26 (SE = 0.07), OR = 0.77, 95% CI [0.68; 0.88]. This indicates that the GNG effect on snack choice was more pronounced in choices between two high value snacks compared to choices between two low value snacks.
In two additional analyses, preference for Go over NoGo items was examined separately for high-value and low-value choice pairs. Results showed that the training effect was only significant within high value choice pairs, Estimate = 0.42 (SE = 0.13), odds = 1.52, 95% CI [1.19; 1.95]. However, if only low-value items were included in the analysis, participants did not have a preference for Go over NoGo items, Estimate = 0.15 (SE = 0.11), odds = 1.16, 95% CI [0.93; 1.44].
On filler trials, participants chose between a high-value and low-value snack. In these trials, results of the mixed logistic regression model indicated that participants chose high-value snacks (M = 82%; SD = 14%) over low-value snacks (M = 18%; SD = 14%) more often, Estimate = 1.806 (SE = 0.145), odds = 6.09, 95% CI [4.58; 8.08].
Fig. 1. Choices for food items in the AAT (left) and GNG (right) training conditions. Data are presented separately for high value choice pairs (choices between two high value items; dark bars) and low value choice pairs (choices between two low value items; light bars). Bars represent the mean percentage of choices across participants for either Approach items (AAT condition) or Go items (GNG condition) and dots represent the average preference for Go or Approach items within each participant. Error bars represent the 95% within-participant confidence interval. NS = not significant. * = significant difference. An effect is considered significant when the confidence interval of the confirmatory (GNG) or exploratory (AAT) mixed logistic regression analyses does not contain 1.
We also explored whether the training effects in the AAT condition were different for low and high value snacks. Results showed that the odds of the intercept were significantly larger than one, Estimate = 0.21 (SE = 0.07), odds = 1.22, 95% CI [1.07; 1.41], confirming that participants preferred Approach items over Avoidance items. The effect of value of the choice pair on choices was not significant, Estimate = −0.08 (SE = 0.07), OR = 0.92, 95% CI [0.81; 1.05], indicating that participants' preference for Approach items over Avoidance items was not different between the low and high value choice pairs. Next, we investigated whether the AAT effect on choices for Approach over Avoidance items would be present within high-value choice pairs and low-value choice pairs. In high-value choice pairs, the odds of the intercept were significantly higher than one, Estimate = 0.22 (SE = 0.10), odds = 1.24, 95% CI [1.03; 1.50]. This means that in high-value choice pairs participants were more likely to choose Approach over Avoidance items. However, in low-value choice pairs, the odds of the intercept were not significantly different from one, Estimate = 0.14 (SE = 0.10), odds = 1.14, 95% CI [0.94; 1.39]. This means that in low-value choice pairs the probability of choosing the Approach items did not significantly differ from the probability of choosing the Avoidance items.
Finally, we tested whether participants in the AAT condition would choose high-value items over low-value items more often. The odds of the intercept were significantly higher than one, Estimate = 1.97 (SE = 0.15), odds = 7.18, 95% CI [4.58; 8.08], indicating that participants were more likely to choose high value snacks (M = 83%; SD = 14%) over low value snacks (M = 17%; SD = 14%).

Effects on food evaluation
The effect of GNG training and choices on subsequent evaluations of snack items was analysed using a repeated measures ANOVA with time (pre versus post), item value (high versus low) and item condition (Go versus NoGo) as within factors. This analysis indicated a significant ef- However, this interaction effect was not significant, and all other interaction effects were not significant either (all ps > .053; see Table 2).
The effect of AAT on evaluations was examined in the same way. As in the GNG training, this repeated measures ANOVA showed a significant main effect of value on the ratings, F(1, 60) = 455.54, p < .001, generalized η² = 0.699, with items from the high value condition receiving higher ratings (M high = 49.39, SD high = 34.35; M low = −44.96, SD low = 44.87). In the AAT condition, item condition also influenced ratings, F(1, 60) = 9.60, p = .003, generalized η² = 0.003, with higher ratings for Approach items compared to Avoidance items (M approach = 3.96, SD approach = 61.85; M avoidance = 0.47, SD avoidance = 61.78). The ratings did not change significantly over time, p = .802, but the interaction between value and time on ratings was significant, F(1, 60) = The direction of this effect thus indicated a regression to the mean, as in the GNG training; ratings of high-value items reduced after the training, and ratings of low-value items increased after the training.
Interestingly, the interaction between item condition and time was significant as well, F(1, 60) = 8.44, p = .005, generalized η² = 0.003, see Table 2. None of the other interaction effects was significant (all ps > .817). The means in Table 2 suggest that, if anything, this interaction is caused by the fact that Approach items were liked more strongly after the training compared to before the training. However, this difference was not significant, p = .121.
To investigate whether the effects of GNG and AAT on the evaluations were significantly different, we conducted a repeated measures ANOVA with time (pre versus post) and training condition (Go or Approach versus NoGo or Avoid) as within factors, and training type (GNG or AAT) as between factor. This analysis indicated a significant main effect of training condition, F(1, 119) = 9.81, p = .002, generalized η² = 0.004, such that snacks in the Go or Approach condition received higher ratings (M go/approach = 3.87, SD go/approach = 61.71; M no-go/avoid = 0.87, SD no-go/avoid = 60.83). The interaction effect between training condition and time was significant as well, F(1, 119) = 8.74, p = .004, generalized η² = 0.003, showing that ratings of Go or Approach trained items on average increased more strongly from before the training to after the training than the ratings of NoGo or Avoidance trained items (for Go/Approach: M pre = 1.31, SD pre = 63.29; M post = 6.43, SD post = 60.00; for NoGo/Avoidance: M pre = 1.18, SD pre = 63.13; M post = 0.57, SD post = 58.47). None of the other effects were significant (all ps > .102). If the AAT had led to a significantly stronger increase in evaluations compared to the GNG, this would have been visible in a significant interaction effect between time, training condition, and training type on the ratings. However, this three-way interaction was not significant (p = .671).

The relation between choices and evaluations
Finally, for exploratory purposes, the association between participants' choices on the one hand and their evaluations on the other hand was explored by computing correlations. In the GNG, the proportion of choices for Go over NoGo items was significantly and positively correlated with the increase in evaluations (from pre to post) of Go items, r = 0.32 (p = .012; 95% CI [0.08; 0.53]), and it was significantly and negatively related to an increase in evaluations (from pre to post) of NoGo items, r = −0.51 (p < .001; 95% CI [−0.68 [...] significantly and negatively correlated with post-evaluations of Avoidance items (r = −0.38, p = .002, 95% CI [−0.58; −0.15]).
Note. Data of high and low value items are aggregated. The scale ranges from −100 to 100.
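The confidence intervals that accompany these correlations can be approximated with the Fisher z transformation, as in the minimal sketch below. It assumes that the intervals were obtained this way (the text does not say so) and that the sample sizes were n = 60 (GNG) and n = 61 (AAT), as inferred from the degrees of freedom of the ANOVAs; the function name is hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via the Fisher z transformation."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Correlations as reported in the text (rounded), with assumed sample sizes.
go_lo, go_hi = fisher_ci(0.32, 60)        # GNG: choices and change in Go evaluations
avoid_lo, avoid_hi = fisher_ci(-0.38, 61)  # AAT: choices and Avoidance evaluations
print(f"r = 0.32:  95% CI [{go_lo:.2f}; {go_hi:.2f}]")
print(f"r = -0.38: 95% CI [{avoid_lo:.2f}; {avoid_hi:.2f}]")
```

Under these assumptions the recomputed intervals come out within about 0.01 of the reported ones; the remaining difference is plausibly due to the correlations being rounded to two decimals.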

Discussion
The main results of the present experiment are clear: Both GNG and AAT influence choices for food items when they are implemented with the same experimental protocol. On average, people prefer Go over NoGo items after GNG (confirmatory preregistered test), and they prefer Approach over Avoidance items after AAT (exploratory preregistered test). The fact that people preferred high over low value food items on filler trials speaks to the validity of these results. Furthermore, despite the fact that the effect of the GNG was descriptively stronger, effects were not statistically different between the AAT and GNG, suggesting that both tasks work equally well to influence food choice. The results converge well with a previous experiment showing similar effects of GNG and AAT on alcohol intake (Di Lemma & Field, 2017). The current results are important in light of recent discussions in the literature on the reasons why GNG may work better to influence eating behavior than AAT (e.g., Aulbach et al., 2019; Yang et al., 2019). The present experiment suggests that the observed difference in the literature may be due to different experimental protocols that have been employed in the GNG literature on the one hand and the AAT literature on the other hand, as we find both trainings are effective when procedural differences are eliminated.
There were three interesting findings regarding the comparison between the two trainings. First, effects of GNG were more pronounced for high value items than low value items whereas this difference was not statistically significant for the AAT. The fact that the effect for GNG is more pronounced for high value compared to low value food items is consistent with some other findings showing that effects of GNG (Veling, Holland, & van Knippenberg, 2008; Chen et al., 2016) and other response training procedures (e.g., cue-approach training; Bakkour et al., 2016; Schonberg et al., 2014; Salomon et al., 2018) can be stronger for high value food items. This value effect is not consistently observed, however (Chen, Veling, Dijksterhuis, & Holland, 2018a; Salomon et al., 2018). One explanation for the value effect is that high value items draw more attention during the training, which has been shown to amplify the effect of GNG on food evaluation. It would be interesting to test in future work whether there are differences between the GNG and AAT in terms of how much these training procedures rely on people's attention during the training. However, it should be noted that descriptively the effect of the AAT was also stronger for high versus low-value food pairs. The effect of food value also raises the question to what degree response training tasks may be more effective in changing responses to unhealthy foods, which may be perceived as very attractive, than to healthy foods, which may be perceived as less attractive.
A second and clear difference between the AAT and GNG concerned recognition memory. Participants were better at categorizing items into the go and no-go conditions after the training than they were at categorizing food items into the approach and avoidance categories. This finding again raises the question whether people may attend more closely to the food items during GNG than during AAT. Alternatively, this finding may reflect the fact that action (executing go responses) improves episodic encoding for stimuli unrelated to the action, compared to inaction (no-go responses; Yebra et al., 2019), and that it is hence harder for people to distinguish between two action relevant items (Approach or Avoidance) than between action and non-action items (Go versus NoGo). Interestingly, the relation between recognition memory and choice behavior was not significant in either condition, suggesting recognition memory may not contribute to the strength of the training effect.
A third interesting difference was found on stimulus evaluation. Effects on evaluation were found for the AAT, but not for GNG, although this difference between conditions was not significant. The fact that we did not observe effects for GNG on evaluation is noteworthy as effects of GNG on food evaluations have been shown to be very robust, in particular lower evaluations of NoGo items compared to Go items after the training (Chen et al., 2016, 2018; Lawrence et al., 2015a; Veling et al., 2008; Quandt et al., 2019). One explanation for the absence of an effect of GNG on evaluation in the current experiment could be that the choice task distorted effects of the training on evaluations more strongly for GNG than for AAT (previous work on GNG and evaluation did not present a choice task before evaluation, e.g., Chen et al., 2016). After all, choices for NoGo items can be conceived of as go responses and responding to NoGo items during the choice task may hence partly serve to undo the training, weakening any effects of GNG on evaluation that would have been found without implementation of the choice task. This may be less of an issue in the case of the AAT, because people respond to all stimuli during AAT, and responses to the stimuli during the choice task may be less influential after this training.
Recent work using a similar experimental protocol as used here, but using smartphone apps as stimuli, has shown that effects of GNG may best be explained by training-induced changes in stimulus evaluation (Johannes et al., 2021). The current exploratory correlational findings indicate that increased choices for Go over NoGo items or Approach over Avoidance items were respectively accompanied by increases in evaluations of Go and Approach items. Because the evaluations were measured after the choices this could indicate that the choices increased the evaluations of these items, or that the evaluations were directly influenced by the training, or a combination of these factors. Therefore, the current correlational data do not allow for any conclusions on how the tasks influenced the choices. For instance, this effect may be caused by changes in evaluation or by changing motor responses to the items (for elaborate discussions see Chen et al., 2019;Johannes et al., 2021).
The present findings raise a number of new theoretical and applied questions. First, in light of the differences between GNG and AAT on the secondary measures just discussed, it seems important to gain a better understanding of the underlying mechanisms of how these two different training tasks influence choices. Is there a qualitative difference between learning to go or not to go versus learning to approach or to avoid food items? Do these training tasks rely similarly on evaluative, motor and attention processes? Another important question raised by the current findings is how robust effects of the current trainings are. As outlined in the introduction, in the AAT literature, there is often a measurement of the effect of the training (e.g., an approach avoidance task) between the training and the outcome measure. We show that without such a measurement we could find effects of the training on food choice. However, because we did not manipulate the presence of this measurement, we cannot conclude that it is this feature that is indeed of importance. With respect to GNG, we observed that the otherwise very robust effect of GNG on food evaluation was not found when a food choice task preceded the evaluation. This may indicate that when a measure is presented after a training that resembles the training in some way, effects of the training may become weaker. Future research is needed to test this possibility directly.

Conclusion
The goal of the present study was to provide a direct comparison of two popular response training tasks that have in the past been used to change (food) preferences. Here, we present a fully-powered and preregistered study using a rigorous experimental protocol, and internally valid measurements of food value and food choice. Our findings suggest that the AAT and GNG have similar effects on food choices, such that participants preferred food items associated with go or approach responses over food items associated with no go or avoid responses. The contribution of this research lies in directly addressing a discussion in the literature on whether AAT is less effective than GNG in influencing eating behavior, and by providing evidence that under controlled conditions they impact food choice similarly.

Ethics statement
This research was approved by the ethics committee of the faculty of social sciences at Radboud University, ECSW-2018-065.