The German Version of the Dutch Eating Behavior Questionnaire: Psychometric Properties, Measurement Invariance, and Population-Based Norms

The Dutch Eating Behavior Questionnaire (DEBQ) is an internationally widely used instrument assessing different eating styles that may contribute to weight gain and overweight: emotional eating, external eating, and restraint. This study aimed to evaluate the psychometric properties of the 30-item German version of the DEBQ including its measurement invariance across gender, age, and BMI-status in a representative German population sample. Furthermore, we examined the distribution of eating styles in the general population and provide population-based norms for DEBQ scales. A representative sample of the German general population (N = 2513, age ≥ 14 years) was assessed with the German version of the DEBQ along with information on sociodemographic characteristics and body weight and height. The German version of the DEBQ demonstrated good item characteristics and reliability (restraint: α = .92, emotional eating: α = .94, external eating: α = .89). The 3-factor structure of the DEBQ could be replicated in exploratory and confirmatory factor analyses and results of multi-group confirmatory factor analyses supported its metric and scalar measurement invariance across gender, age, and BMI-status. External eating was the most prevalent eating style in the German general population. Women scored higher on emotional and restrained eating scales than men, and overweight individuals scored higher in all three eating styles compared to normal weight individuals. Small differences across age were found for external eating. Norms were provided according to gender, age, and BMI-status. Our findings suggest that the German version of the DEBQ has good reliability and construct validity, and is suitable to reliably measure eating styles across age, gender, and BMI-status. Furthermore, the results demonstrate a considerable variation of eating styles across gender and BMI-status.


Introduction
containing restraint, external eating and a two-dimensional emotional eating factor: eating in response to clearly labeled emotions (e.g., fear, anger) and eating in response to diffuse emotions (e.g., loneliness), which was also found for the Dutch original version [6]. In contrast, three factor solutions containing the three theoretical domains (restraint, emotional eating, external eating) were found for the English version [13] and other translations thereof [14]. Thus, the evidence with regard to the factorial validity of the DEBQ is somewhat controversial and the evaluation of the factor structure of the German version is limited to an exploratory approach applied in a small convenience sample of college students. Furthermore, item statistics for the German version of the DEBQ have not been systematically reported. Therefore, the first aim of our study was to evaluate the psychometric properties of the German version of the DEBQ, including item statistics, factorial structure using exploratory and confirmatory factor analysis (CFA), and internal consistency in a large population-based sample of the German general population.
Another important aspect of factor-analytic studies is to determine the measurement invariance across relevant subgroups. Studies examining DEBQ subscales as a function of gender and BMI-status consistently showed higher scores on DEBQ subscales among individuals with overweight compared to normal weight individuals [16,19,27–29]. Women were shown to score significantly higher than men on the restraint and emotional eating scales [13,16,22,30]. Research on the influence of age on the DEBQ subscales is limited, indicating lower scores among older age groups on the emotional eating subscale [16,22]. Such between-group differences can only be interpreted when measurement invariance is given, i.e., if the same construct is measured in every subgroup [31,32]. Measurement invariance across gender, age, and BMI-status has rarely been assessed for the DEBQ [16,20,22]. It was confirmed for the 20-item DEBQ children's version [20] and a 16-item French version adapted for the older population [22]. Dakanalis et al. [16] found evidence for measurement invariance of a 33-item Italian translation of the DEBQ across gender, age, and BMI-status. Measurement invariance has, however, not been examined for the 30-item German version of the DEBQ thus far. The second aim of our study was therefore to examine the measurement invariance of the German version of the DEBQ across gender, age, and BMI-status (underweight/normal weight, overweight, obesity) and to explore the distributions of eating behavior across gender, age, and BMI-status. Furthermore, while norms for different subgroups are available for the Dutch and English versions of the DEBQ [6,13], norms for the German general population have not yet been reported. Therefore, the third aim of our study was to provide population-based norms for the German version of the DEBQ based on a representative sample of the German general population.

Sampling
Between March and May 2015, a representative sample of the German population was recruited for a cross-sectional questionnaire survey. A three-stage random sampling procedure was conducted: (1) selection of 258 sample point areas using a random allocation procedure, (2) random selection of target households within the sample point areas through a random route procedure, (3) random selection of target persons within target households using a Kish selection grid. Participants were included in the study if they were 14 years or older, fluent in German, and provided written informed consent. The study was conducted according to the ethical standards of the Declaration of Helsinki and was approved by the Ethical Review Committee of the University of Leipzig.

Field work and measures
The field work was conducted by an independent market and social research agency (USUMA, Berlin, Germany). Selected individuals were approached in person by a trained interviewer. At most, four attempts were made to reach a target person. Potential participants were informed about the study and provided written informed consent including additional parental consent for minor participants. Interviewers collected sociodemographic information face-to-face. Afterwards, participants anonymously filled out a battery of self-report questionnaires, including the German version of the DEBQ [18] with 30 items on a five-point Likert scale (1 = 'never', 5 = 'very often'; for a detailed description of the German version of the DEBQ see introduction section). Body weight and height were assessed by self-report and the BMI was calculated using the formula: BMI = body weight (kg) / (body height (m))². BMI was categorized into underweight/normal weight (BMI < 25 kg/m²), overweight (25 kg/m² ≤ BMI < 30 kg/m²), and obesity (BMI ≥ 30 kg/m²) [33].
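The BMI formula and the three-category grouping described above can be sketched in a few lines. This is only an illustration of the stated cut-offs; the function names `bmi` and `bmi_status` are hypothetical and are not taken from the study's analysis scripts.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by squared height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_status(bmi_value: float) -> str:
    """Map a BMI value to the three categories used in the text."""
    if bmi_value < 25:
        return "underweight/normal weight"
    if bmi_value < 30:
        return "overweight"
    return "obesity"


# Hypothetical respondent: 70 kg, 1.75 m
print(bmi_status(bmi(70, 1.75)))
```

Note that self-reported weight and height enter this computation directly, so any reporting bias carries over to the BMI category.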

Participants
A total of 4844 individuals were randomly sampled. Of these, 2576 agreed to participate and participated in the assessment (response rate: 53.2%). Reasons for non-participation were that households could not be reached (13.8%) or refused to participate (14.6%), that target persons could not be reached (2.0%), were out of town (0.6%) or incapacitated (0.4%), and that target persons refused to participate (15.4%). Overall, 63 cases had to be excluded from the analyses resulting in a final data set of 2513 individuals. A detailed overview of descriptive data is displayed in Table 1. Of the total study sample, 1394 (55.5%) were women and 1119 (44.5%) were

Statistical analysis
All statistical analyses, if not otherwise indicated, were conducted using SPSS 20 and the significance level was set to α = .05.
Item analyses. We examined the item descriptives, percentages of missing values, item difficulties (in %) using the formula p_i = ((x̄_i − min(x_i)) / (max(x_i) − min(x_i))) × 100 (with x̄_i = mean of item i; min(x_i) = minimal value on item i; max(x_i) = maximum value on item i), and corrected item-total-correlations.
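As an illustration of the two item statistics named above, the following sketch computes the item difficulty index and a corrected item-total correlation (the item correlated with the sum of the remaining items of its subscale). It assumes that the minimum and maximum in the formula are the scale bounds 1 and 5 of the Likert format; the function names and the toy data are hypothetical.

```python
from statistics import mean


def item_difficulty(scores, min_score=1, max_score=5):
    """Item difficulty in %: (mean - min) / (max - min) * 100.

    Assumption: min/max are the bounds of the response scale (1 to 5).
    """
    return (mean(scores) - min_score) / (max_score - min_score) * 100


def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(items, index):
    """Correlate one item with the sum of all *other* items.

    items: list of item columns, each a list of scores per respondent.
    """
    target = items[index]
    rest = [
        sum(col[r] for j, col in enumerate(items) if j != index)
        for r in range(len(target))
    ]
    return pearson(target, rest)


# Toy subscale with three items and four respondents (made-up values)
toy = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 1, 2, 2]]
print(item_difficulty(toy[0]), corrected_item_total(toy, 0))
```

Excluding the item from its own total avoids inflating the correlation, which is why the corrected variant is usually reported. The correlation is undefined when an item or the rest score has zero variance.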
Internal factor structure. The internal factor structure of the DEBQ as an indicator of construct validity was evaluated by means of a split-half factor analysis approach. The total sample was randomly divided into two subsamples using the SPSS 20 random case selection procedure. In a first step, a Principal Axis Factor analysis (PAF) with VARIMAX rotation was conducted in the first split-half sample. Extraction criteria were eigenvalues > 1 in conjunction with a visual inspection of the scree plot. In a second step, a second PAF was conducted using a pre-determined number of factors according to the number of factors extracted in the first analysis and a criterion for factor loadings of ≥ 0.40. Then a CFA was performed in the second split-half sample testing the model obtained in the EFA using AMOS 23. The Standardized Root Mean Square Residual (SRMR) and the Root Mean Square Error of Approximation (RMSEA) including the 90% confidence interval were used as absolute fit indices, and the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI) were used as comparative fit indices. SRMR values < .08 indicate a good model fit. RMSEA values < .05 can be considered as good fit and values between .05 and .08 as acceptable fit. CFI and TLI values ≥ .90 indicate a good model fit [34,35].
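The fit criteria listed above amount to a small set of threshold checks, sketched below. The function name `evaluate_fit` and the label "criterion not met" are illustrative choices, not terminology from the study; only the cut-off values come from the text.

```python
def evaluate_fit(srmr: float, rmsea: float, cfi: float, tli: float) -> dict:
    """Classify fit indices using the cut-offs stated in the text."""
    if rmsea < 0.05:
        rmsea_label = "good"
    elif rmsea <= 0.08:
        rmsea_label = "acceptable"
    else:
        rmsea_label = "criterion not met"
    return {
        "SRMR": "good" if srmr < 0.08 else "criterion not met",
        "RMSEA": rmsea_label,
        "CFI": "good" if cfi >= 0.90 else "criterion not met",
        "TLI": "good" if tli >= 0.90 else "criterion not met",
    }


# Hypothetical values for SRMR, RMSEA and CFI; a TLI just below .90
print(evaluate_fit(srmr=0.05, rmsea=0.06, cfi=0.91, tli=0.895))
```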
Reliability. The internal consistency of the subscales was determined using Cronbach's α. Mean inter-item correlations were calculated as an indicator of subscale homogeneity.
Measurement invariance across gender, age, and BMI-status. To test the measurement invariance across different subgroups, the sample was split by the variables of interest (gender, age, BMI-status). Firstly, the factorial model found in the total sample was fitted for different subgroups of age, gender, and BMI-status. Successive multi-group CFAs with nested models [31] were conducted to examine the measurement invariance across gender, age, and BMI-status using AMOS 23 [36]. In a first step, a multi-group baseline model (Model 0) was fitted. All subsequent models were compared to the baseline model. In a second step, we tested for metric invariance assuming the equivalence of factor loadings across groups (Model 1). In a third step, scalar invariance was tested by assuming equal factor loadings and equal item intercepts across groups (Model 2). As the χ²-difference test is very sensitive to large sample sizes, ΔCFI was used as indicator of model equivalence. According to Cheung and Rensvold [32], changes ≥ .01 in the CFI between two nested models indicate a significant decrease in model fit and lead to a rejection of the constrained model.
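Two of the quantities described above are simple enough to sketch directly: Cronbach's α from item and total-score variances, and the ΔCFI decision rule for nested models. The sketch uses the standard variance-based formula for α; the function names, the rounding guard against floating-point noise, and all numeric inputs are assumptions for illustration only.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / var(total)).

    items: list of item columns, each a list of scores per respondent.
    """
    k = len(items)
    n = len(items[0])
    totals = [sum(col[r] for col in items) for r in range(n)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def invariance_holds(cfi_baseline, cfi_constrained, threshold=0.01):
    """Return True if the drop in CFI stays below the threshold.

    A drop >= threshold leads to rejection of the constrained model.
    The difference is rounded to 4 decimals to avoid float artefacts.
    """
    delta = round(cfi_baseline - cfi_constrained, 4)
    return delta < threshold


# Made-up two-item example and made-up CFI values
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
print(invariance_holds(0.950, 0.945))
```

In the study's procedure, both the metric model (Model 1) and the scalar model (Model 2) would be compared against the baseline (Model 0) in this way.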
Distribution of DEBQ subscale scores across gender, age, and BMI-status. Effects of gender, age and BMI on DEBQ subscale scores were assessed using multi- and univariate three-factorial (Gender × Age × BMI-status) analyses of variance (ANOVA) and post-hoc tests. Interaction effects were specified up to second-order (Gender × Age, Gender × BMI-status, Age × BMI-status). Results of univariate ANOVAs and post-hoc tests were only interpreted in case of significant higher-order effects. Partial η² was calculated as an estimate of effect size. Partial η² values of 0.01 can be considered small, and values of 0.06 and 0.14 medium and large effects, respectively.
Normative data. We calculated percentile ranks based on the DEBQ subscale raw scores for different subgroups.
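For readers unfamiliar with the effect-size measure, a minimal sketch of partial η² and the conventions quoted above follows. It uses the standard definition, effect sum of squares divided by the sum of effect and error sums of squares; the function names, the label "negligible" for values below 0.01, and the numeric inputs are illustrative assumptions.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def effect_size_label(eta_sq: float) -> str:
    """Label an effect using the 0.01 / 0.06 / 0.14 conventions."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"


# Hypothetical sums of squares from an ANOVA table
eta = partial_eta_squared(ss_effect=6.0, ss_error=94.0)
print(eta, effect_size_label(eta))
```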

Item analyses
The item characteristics according to the numbering and subscale attribution of the German version of the DEBQ are displayed in Table 2. The percentage of missing values was low (≤ 0.7%) for all items with a mean percentage of missing items per subscale of 0.4% (SD = 0.07). About 37.0% (SD = 18.12) of the scores per domain fell into the lowest category ('never'). The emotional eating domain showed the highest percentages of scores in the lowest category (M = 55.9%, SD = 6.58) while the lowest percentages were found for the external eating domain (M = 19.8%, SD = 9.80). All items, except for three items of the external eating domain, were positively skewed. Items of the restraint and external eating domain showed a negative kurtosis and items of the emotional eating domain showed a positive kurtosis. Item difficulties ranged between 13.3% and 50.0% indicating a low to medium probability of scores > 1 ('never'). Corrected item-total-correlations were moderate to high (0.48 ≤ r_it ≤ 0.80).

Internal factor structure
The PAF conducted in the first randomly selected subsample (N₁ = 1152) revealed three factors with eigenvalues > 1, which were confirmed by a visual inspection of the scree plot and a second PAF using a pre-determined number of three factors and factor loadings ≥ 0.40. In the CFA, a three-factor model with correlated factors (restraint, emotional eating, external eating) and cross-loadings fixed to 0 revealed acceptable scores for the CFI, RMSEA and SRMR (Table 3). Each item significantly loaded on the specified factor (all p < .001). The TLI was slightly out of the acceptable range (TLI = .895). Additional inspection of the modification indices (MIs) and standardized expected parameter changes (SEPCs) showed that this was a result of correlated unique variances of items 8 and 9, which both assess eating in response to diffuse emotions (eating when bored, eating when nothing to do). After allowing the unique variances of both items to correlate (three-factor model (a)), the model fit improved substantially, indicating a good model fit (Table 3). PAF factor loadings and CFA standardized coefficients are displayed in Table 2.

Measurement invariance across gender, age, and BMI-status
Table 3 shows the fit measures of the multi-group models for testing measurement invariance across gender, age, and BMI-status. In a first step, the same three-factor model (with correlated unique variance between items 8 and 9) which revealed a good fit in the previous CFA was calculated for both genders, the seven age groups and underweight/normal weight, overweight and obese individuals. Model fit was acceptable for all subgroups, with the exception of age groups 14-24 years and ≥ 75 years, where the CFI and TLI were slightly below the acceptable range while all other fit indices met the cut-offs.
Measurement invariance was tested using multi-group comparisons with nested models. Considering gender, model 0, which assumes that the three-factor model fits both groups with varying parameter values (configural invariance), showed an adequate fit. The fit of model 1, which assumes equal factor loadings (metric invariance), was nearly identical, and ΔCFI was below .01. Although model 2, which assumes equal factor loadings and equal item intercepts (scalar invariance), fit slightly worse than model 0, ΔCFI was still below .01, suggesting that measurement invariance can be assumed across gender. Similar results were found for age and BMI-status (ΔCFI below .01), indicating that the structure, factor loadings and item intercepts are invariant across age and BMI-status (Table 3).
Notes (Table 3). CFA = confirmatory factor analysis; CFI = comparative fit index; CI = confidence interval; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual; three-factor model = correlated factors with cross-loadings fixed to 0; three-factor model (a) = correlated factors with cross-loadings fixed to 0 and correlation of unique variances between items 8 and 9 (diffuse emotions); ΔCFI = difference between model 1/2 and model 0; a = three-factor model with correlation of unique variances between items 8 and 9 (diffuse emotions).
Women showed significantly higher scores of restraint and emotional eating than men. Post-hoc tests revealed that younger age groups showed significantly higher scores of emotional eating and external eating than older age groups. Individuals with overweight and obesity scored significantly higher on all DEBQ subscales compared to normal weight individuals. On the emotional and external eating subscales, individuals with obesity also showed significantly higher scores compared to individuals with overweight (Table 4).
Univariate analyses also revealed significant interaction effects of Age × BMI-status on the emotional eating and external eating subscales (all p-values < .01). Effect sizes were, however, negligibly small (η² ≤ 0.01), and the interaction effect may thus be attributable to the large sample size. No interaction effects were found for BMI-status and gender. Cronbach's α of the DEBQ subscales was assessed in every subgroup, showing a good internal consistency across different groups (.88 ≤ α ≤ .96). In all subgroups, DEBQ subscales were significantly correlated (.13 ≤ r ≤ .61, all p-values < .05) except for the restraint and external eating subscales among participants with obesity (r = .10, p = .056). The highest correlations were consistently found between the emotional and external eating subscales (.54 ≤ r ≤ .61, all p-values < .001).

Normative data
Given the measurement invariance of the DEBQ subscales across gender, age and BMI-status and the significant main effects for gender, age and BMI but negligible interaction-effects between the variables, we calculated percentile ranks based on the DEBQ subscale raw scores for gender, age and BMI-status separately. Age-groups with homogeneous eating behavior were combined (S1-S3 Tables).
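How a percentile rank is obtained from a subgroup's raw scores can be sketched as follows. The text does not state which percentile-rank definition was used; this sketch assumes the common mid-rank definition (share of scores below the raw score plus half the share of tied scores). The function name and the toy norm sample are hypothetical.

```python
def percentile_rank(norm_scores, raw_score):
    """Percentile rank of raw_score within a norm sample.

    Assumption: mid-rank definition,
    (count below + 0.5 * count equal) / N * 100.
    """
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    return (below + 0.5 * equal) / len(norm_scores) * 100


# Made-up subscale mean scores for one subgroup
toy_norm_sample = [1.0, 1.5, 2.0, 2.5, 3.0]
print(percentile_rank(toy_norm_sample, 2.0))
```

Computing the ranks separately per subgroup, as described above, means the same raw score can map to different percentile ranks for, say, women and men.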

Discussion
The DEBQ, originally developed by van Strien et al. [20], is a prominent questionnaire that is used extensively worldwide to assess three theoretically derived (psychosomatic theory, externality theory, restraint theory) types of eating behavior associated with weight gain and overweight: restraint, emotional eating, and external eating. Our study aimed to extend previous work on the 30-item German version of the DEBQ by examining its dimensional structure using an exploratory and confirmatory approach, by testing configural, metric and scalar measurement invariance across gender, age, and BMI-status, and by providing norm data based on a representative sample of the German general population. Furthermore, item descriptives and internal consistency were evaluated.
Item descriptives showed that the German version of the DEBQ was very well accepted by the participants as reflected by the very low percentage of missing values on DEBQ items (≤ 0.7%). Item difficulties were medium to high and items were mostly positively skewed, supporting the questionnaire's focus on atypical eating behaviors. Item difficulties and skewness were highest for emotional eating, which is considered an atypical response to emotional distress [7,8], and lowest for external eating. Results for corrected item-total correlations and item homogeneity were satisfactory, reflecting findings for the Dutch original version [6]. Cronbach's α exceeded the critical value of .80 for all subscales, indicating good internal consistency, which is in accordance with findings for the Dutch original version [6] and previous findings for the German version of the DEBQ, based on small convenience or clinical samples [18,23–26]. The internal consistency of the DEBQ subscales was equally high across genders, different age-groups and BMI-status, indicating that the German version of the DEBQ reliably measures eating behavior across different subgroups.
The construct validity of the German version of the DEBQ was examined using EFA and CFA. Results from the EFA indicated a three-factor solution which replicated the three theoretical domains restraint, emotional eating and external eating and explained 56.6% of the variance. The finding of substantial cross-loadings of items 8 (eating when nothing to do) and 9 (eating when bored) has also been observed for the original [6] and translated (e.g. English, French, Spanish) versions of the DEBQ [13–15], reflecting the partial inter-relatedness of the concepts of external and emotional eating [16]. The three-factor model showed an acceptable fit in a CFA, confirming the three main theoretical factors of the original version [6] and the three-factor structure which has been reported for the English version of the DEBQ [13] and translations thereof [14–16]. The model fit substantially improved when allowing the unique variances of items 8 (eating when nothing to do) and 9 (eating when bored) to correlate. This modification is reasonable because both items assess eating in response to diffuse emotions while most other items loading on the factor assess eating in response to clearly labeled emotions. Furthermore, both items showed substantial cross-loadings on the external eating domain in EFA. Thus, the correlated unique variances of both items may reflect an underlying diffuse emotions or external eating latent construct. In the German translated version of the DEBQ the wording of both items is quite similar due to a rough translation of 'restless' into 'nothing to do' (item 8: 'I have the desire to eat when I have nothing to do'; item 9: 'I have the desire to eat when I am bored or I have nothing to do'), which may also contribute to the fact that both items also represent manifest variables for another latent concept which was not specified in the model.
Besides one study evaluating the measurement invariance across gender, age and BMI-status in an adult sample using a 33-item Italian translation of the DEBQ, our study is the first to systematically examine measurement invariance across these subgroups for the 30-item German version of the DEBQ in a large-scale study taking into account the entire life-span (14-94 years). Using multi-group comparisons of nested models, our results demonstrate configural, metric, and scalar measurement invariance across gender, age, and BMI-status (normal weight, overweight, obesity), indicating that the German version of the DEBQ validly measures restraint, emotional eating and external eating across different subgroups. However, in the models for the youngest (14-24 years) and oldest (≥ 75 years) age groups in the study, CFI and TLI were slightly out of the acceptable range, while the RMSEA and the SRMR were still in the acceptable range. This indicates that it may be of value to develop adapted versions of the DEBQ for adolescents and for the older and oldest population, as has been done for the older population by Bailly et al. [22].
The significant positive correlation between emotional and external eating found in the total sample and across all subgroups corroborates findings from previous studies [6,13–16]. The finding also corresponds with the theory that, although emotional eating and external eating are different concepts, both emotionality and food cues can operate together and promote eating behavior disregarding internal signals of hunger or satiety [6,16,19,20]. Furthermore, negative emotions or emotional distress may increase the awareness of the environment (e.g., the immediate food environment) and decrease the awareness of the self [37]. In contrast to previous findings [13,16,19,20], where non-significant or even negative correlations were found between restrained eating and both emotional and external eating, we found lower but still significant positive correlations in the total sample and most subgroups. While the correlation between external eating and restraint was negligibly low, a medium-low relation was found between emotional eating and restraint, indicating that at least for some individuals who consciously restrict their food intake emotional distress may be a factor leading to a disruption of restrictive cognitive control [6].
Considering the distributional analysis in a representative sample of the German population, external eating was found to be the most prevalent eating behavior in the total sample and across gender, age, and BMI-status, followed by restraint and emotional eating. Corroborating findings from previous studies [13,16,22,30,38], women showed higher levels of emotional eating and restraint. Higher scores of restraint among women reflect the fact that women are more likely than men to diet [16,22], which may be explained by higher levels of weight and shape concerns among women [39] and current standards for female beauty in society [16,22]. The most prominent effects of age were found for external eating, indicating higher levels of external eating among younger age groups compared to older age groups, which is in line with prior research [14,27]. Furthermore, BMI-status and eating behavior were significantly related, with individuals with obesity and overweight scoring significantly higher than normal weight individuals on all DEBQ subscales and individuals with obesity scoring significantly higher than overweight individuals on the emotional and external eating subscales. This result supports the theoretical basis of the DEBQ. The finding that restrained eating is more frequent in overweight and obese individuals is in line with several previous studies [14,19,27,28], supporting the assumption that, paradoxically, restrained eating may be a risk factor for overconsumption and weight gain when cognitive self-control is undermined [10]. Based on these findings, population norms were provided for several subgroups according to gender, age and BMI-status.
A major strength of our study is the use of a large, population-based sample representative of the German general population with regard to gender and age. To the best of our knowledge, our study is the first to examine the factor structure of the German version of the DEBQ in a large sample using a confirmatory approach and to demonstrate its measurement invariance across gender, age, and BMI-status. The provided population-based norms will enhance the applicability of the German version of the DEBQ with regard to individual diagnostics. Major limitations of our study concern the fact that we did not collect data on retest-reliability and criterion validity, which means that two other important indicators of psychometric quality were not evaluated. The cross-sectional design of the study does not allow for conclusions about the direction of obtained associations (e.g., between eating behavior and overweight) and precludes the assessment of measurement invariance over time. Furthermore, the calculation of BMI was based on self-reported body weight and height. Although it has been shown that self-reported and objectively measured BMI are strongly correlated, BMI based on self-report tends to underestimate the real BMI due to a typical overvaluation of body height and an undervaluation of body weight in self-report [40]. The BMI-distribution in our study differs from the BMI-distribution found in the large representative 'German Health Interview and Examination Survey for Adults' (DEGS1 study, N = 7116) conducted by the Robert Koch Institute between 2008 and 2011 using objectively measured body weight and height. While in our study 49.3% of women and 61.3% of men were overweight or obese, the DEGS1 study estimated the prevalence of overweight and obesity at 53.0% for women and 67.1% for men. The prevalence of obesity in our study was 16.7% among women and 14.5% among men, while the DEGS1 study found prevalence figures of 23.3% and 23.9%, respectively [4]. This discrepancy may at least partly be attributed to the use of self-reported height and weight and limits the generalizability of our findings.
In conclusion, our study demonstrated that the 30-item German version of the DEBQ [18] has adequate psychometric properties and reliably measures eating behavior across age, gender, and BMI-status. Furthermore, our study indicated measurement invariance of the DEBQ across gender, age, and BMI-status supporting the assumption that the construct validity of the instrument is not influenced by these variables. The provided population-based norms can be applied for diagnostic purposes. Detailed information on different eating styles is of value for the prevention and treatment of overweight and obesity.

Author Contributions
Conceived and designed the experiments: EB AK AH MdZ.