Reliability and validity of assessing energy and nutrient intake with the Vienna food record: a cross-over randomised study

Background The Vienna Food Record was developed as a simple paper-based pre-coded food record for use in Austrian adults, which can be completed over a flexible period of time. The present study aimed to evaluate the test-retest reliability of the Vienna Food Record and its concurrent validity against a weighed food record.
Methods A randomised cross-over study was used to compare outcomes of the Vienna Food Record with those of the weighed food record. The Vienna Food Record was completed a second time in order to assess test-retest reliability. The three assessment phases were separated by two-week wash-out phases. Sixty-seven free-living Austrians aged 18–64 years, without (self-)diagnosed food allergies or intolerances, not on any medication, and not nutrition experts, were randomly assigned to one of two study arms. After drop-outs and exclusion of under-reporters, data from 35 participants were analysed. Paired t-tests were performed for comparisons regarding test-retest reliability and criterion validity, with mean differences calculated as effect sizes. Consistency between repeated assessments with the Vienna Food Record was expressed by intra-class correlation coefficients (ICC), while Pearson's r was used for agreement regarding validity. Bland-Altman plots with 95% limits of agreement were created for energy and macronutrients. Validity metrics for macronutrients were additionally analysed separately by gender, with an adjustment for energy intake. Total energy intake as well as intakes of macro- and selected micronutrients, expressed as daily means, were defined as 34 primary outcomes.
Results ICCs for energy and intake of preselected nutrients, expressing the consistency of the Vienna Food Record, ranged from not significant to 0.95. Pearson's correlation coefficients, expressing the agreement of the Vienna Food Record with the weighed food record, ranged from not significant to 0.80.
Conclusions This study demonstrates acceptable reliability and validity of the Vienna Food Record as an instrument for the assessment of energy and nutrient intake, comparable to the results of similar studies.


Background
Self-reported dietary intake is assessed by prospective record methods and by recall methods [1]. Food records are considered an accurate way of assessing dietary intake [2]. However, food records, especially weighed food records (WFR), are very time consuming for participants and research personnel [3]. Recall methods are thus employed more frequently [1]. Among those, Food Frequency Questionnaires (FFQ), which evaluate a person's usual intake over a defined period of time, are relatively cheap and easy to administer [4]. Standards for evaluating the intake of food, nutrients, and potentially hazardous chemicals by means of 24-h recalls have been developed and validated within the European Food Consumption Validation (EFCOVAL) project [5]. Self-administered web-based applications of 24-h recalls have recently gained popularity [3,6,7]. Compared to food records, recalls have been shown to be more prone to over-reporting of low intakes and under-reporting of high intakes, which has also been referred to as the flat slope syndrome [8]. While all of the aforementioned methods are subject to potential sources of error, the participants' inability to fully and accurately recall their intakes applies specifically to recall methods [1]. Prospective records, in contrast, may influence the participants' dietary behaviour [9]. The Vienna Food Record (VFR) was developed as a simple paper-based prospective food record for use in Austrian adults. It can be completed over a flexible period of time, with a minimum of three weekdays and one weekend day being recommended to draw conclusions on overall dietary behaviour. However, the VFR may also be used for assessing a single meal. A similar prospective food record has been developed and validated for use in the German part of the EPIC (European Prospective Investigation into Cancer and Nutrition) project [10].
The VFR includes a brief introduction page and infographics to support the estimation of portion sizes, and can hence be completed without an interview or instruction by an expert. A total of 182 predefined food items were selected in order to meet the specific requirements of Austrian users. Within this user-centred design approach, user feedback on the selected food items, clarity and usability was collected over three iterations. The VFR has been embedded in the software package nut.s [11]. The software-integrated evaluation routine allows a completed VFR to be analysed in about 10 min, by entering the sum of portions for each food item recorded. Moreover, the VFR is freely available for non-commercial teaching and research (Creative Commons CC BY-NC-ND 3.0 AT https://creativecommons.org/licenses/by-nc-nd/3.0/at/deed.en). Details on the development process of the VFR are published elsewhere [Bersenkowitsch I, Kogler B, Tritscher A, Visontai S, Putz P: User-centered development of a prospective estimated dietary record for use in Austrian adults: The Vienna food record, submitted]. The aim of the present study was to evaluate the VFR concerning its test-retest reliability and its concurrent validity against a WFR as reference method.

Aim and design
A randomised cross-over design was chosen in order to avoid bias arising from the sequence of protocol completion and to require a smaller sample size [12]. Participants were randomly allocated in equal numbers to one of two study arms, completing first the VFR and then the WFR, or vice versa. Finally, all participants completed the VFR a second time, with the purpose of obtaining outcomes concerning test-retest reliability. The assessments were separated by two-week wash-out phases in order to reduce the risk of possible carry-over effects related to diet or diet-recording behaviour. No changes to the specifications of materials, methods and outcomes were made after the study had commenced. Data collection ended as scheduled after completion of the assessments, which took place between February and March 2018.

Participants and setting
Subjects were considered eligible if they 1) were 18-64 years old, 2) had no (self-)diagnosed food allergies or food intolerances, 3) were not on any medication, 4) were not nutrition experts (such as nutrition scientists or dietitians) or in training to become one, and 5) provided informed consent. Participants were excluded after randomisation if they 1) were classified as under-reporters, 2) were classified as over-reporters, 3) reported energy intakes differing by more than two standard deviations (SD) between two assessments, 4) commenced any medication due to sickness, or 5) left Austria in the course of an active assessment phase. Participants were classified as under-reporters if they reported an average daily energy intake smaller than their basal metabolic rate multiplied by 1.1, where basal metabolic rate was estimated with the Henry equations [13], based on self-reported body height and weight. Participants were classified as over-reporters if they reported an average daily energy intake higher than 4500 kcal. Deliberations on how to critically evaluate energy intake are described elsewhere [14,15]. All participants were residents of Austria and were visited at home by a member of the study team to receive instructions and study materials. Materials were handed out and instructions were given by students of dietetics in their final year of academic education, who had received 3 h of training for that purpose.
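The plausibility screening of reported energy intake described above can be illustrated with a minimal sketch. The function name and the example values are hypothetical and not part of the study materials. The basal metabolic rate is assumed to have been estimated beforehand (the study used the Henry equations [13], which are not reproduced here).

```python
def classify_reporter(energy_kcal_per_day, bmr_kcal_per_day,
                      under_factor=1.1, over_cutoff_kcal=4500.0):
    """Classify a participant by the plausibility of the reported energy intake.

    energy_kcal_per_day: average reported daily energy intake (kcal/d)
    bmr_kcal_per_day:    basal metabolic rate (kcal/d), estimated elsewhere
    under_factor:        multiplier on the BMR below which intake is implausibly low
    over_cutoff_kcal:    absolute intake above which intake is implausibly high
    """
    if energy_kcal_per_day < under_factor * bmr_kcal_per_day:
        return "under-reporter"
    if energy_kcal_per_day > over_cutoff_kcal:
        return "over-reporter"
    return "plausible"


# Hypothetical participants, not study data
print(classify_reporter(1500, 1400))  # below 1.1 x BMR
print(classify_reporter(2300, 1500))  # within both thresholds
print(classify_reporter(4600, 1800))  # above the 4500 kcal cut-off
```

The thresholds are kept as parameters so that other cut-offs from the literature [14,15] could be substituted without changing the logic.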
A sample size of 49 participants proved to be sufficiently statistically powered in a comparable study [16], and hence 50 participants were aimed for. Considering an estimated drop-out rate of 25%, 67 subjects were enrolled. The principal investigator used an online sequence generator [17] to randomly allocate the participants to one of the two study arms (Fig. 1), without applying any clustering or blocking approaches. All study materials (case report forms, protocols, scales, picture books) were labelled with the individual participant code, including an indication of the allocation. Due to the nature of the studied assessments, no measures were taken regarding allocation concealment and participant blinding. Data entry and analysis were done by blinded outcome assessors, with pseudonyms created by means of participant coding.
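The enrolment figure follows from inflating the target sample by the expected drop-out rate. The short sketch below reproduces that arithmetic; the function name is hypothetical, and rounding up to the next whole participant is an assumption about how the figure was derived.

```python
import math


def enrolment_target(required_completers, dropout_rate):
    """Number of subjects to enrol so that, after the expected drop-out,
    at least `required_completers` remain (rounded up)."""
    if not 0.0 <= dropout_rate < 1.0:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(required_completers / (1.0 - dropout_rate))


# 50 completers aimed for, 25% estimated drop-out
print(enrolment_target(50, 0.25))
```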

Outcome measures
Each of the three assessments was carried out between February and March 2018, over four consecutive days, including one weekend day. Participants were instructed to select, and stick with, one of two options: to record from Wednesday to Saturday, or from Sunday to Wednesday. For the completion of the VFR, participants received no instructions other than to read the information provided on its cover page and back page. This comprises one A5-format page explaining how to record consumed items, and one A5-format page showing infographics that support the estimation of portion sizes, e.g. 150 g for a full portion of meat. For the completion of the WFR, the aforementioned students of dietetics explained the process of completing the protocol using a pre-filled example and a written guideline. In brief, the guideline instructed participants 1) to weigh all foods before consumption, as well as leftovers, with a kitchen scale (Soehnle Vita 65,119), 2) how to precisely describe the food item (product specifications where applicable), 3) how to indicate fat content (where applicable) and way of preparation, 4) not to skip drinks or in-between meals, and 5) how to indicate out-of-home consumption. Components of mixed dishes were weighed separately. Intake of dietary supplements was recorded and included in the analyses, in both VFR and WFR. Regarding out-of-home consumption, participants were asked to take pictures of foods/dishes with their smartphone and to match their pictures later on with a print-out of the Austrian adaptation of a portion size picture book, provided with kind permission by the International Agency for Research on Cancer (IARC). Participants were encouraged to contact the principal investigator in case of uncertainty, and they were offered an overview compiling the results of their WFR after completing all study procedures. Apart from that, no incentives were given.
Total energy intake and intakes of macro- and micronutrients, expressed as daily means, were defined as 34 primary outcomes, where micronutrients were selected as done for the Austrian Nutrition Report 2017 [18].
Fig. 1 Flow chart showing the progress of participants through the phases of the cross-over randomised study. Blue boxes refer to assessments with the Vienna Food Record (VFR), and orange boxes refer to assessments with the weighed food record (WFR). Participants were classified as under-reporters if their self-reported average daily energy intake was smaller than their basal metabolic rate multiplied by 1.1, where basal metabolic rate was estimated with the Henry equations [13].
Dietary intake data were analysed using nut.s nutritional software (https://www.nutritional-software.at) in its recent version (May 2018) [11], based on the German food composition database Bundeslebensmittelschlüssel (BLS) and its Austrian extension. All VFR were entered into the software by one person, and all WFR were entered by another person. Gender, age, self-reported body weight, height, and highest completed education were recorded as background information. Moreover, a system usability survey was filled in after completing the VFR twice, as a secondary pre-specified outcome. Hence, the usability sample is equal to the one described for test-retest reliability. Five questions, in the style of the Food4Me FFQ validation study [16], assessed whether the participants perceived the VFR as 1) easy to complete, 2) time consuming, 3) interesting to complete, 4) making them reflect on their dietary behaviour, and 5) something they would be willing to complete again in future. For each question, one box of a closed five-level scale had to be checked, including the options "applies fully", "applies rather", "neutral", "applies rather not", and "does not apply".

Statistical analyses performed
Shapiro-Wilk tests and graphical inspections of histograms were performed to check data for normality. Data were expressed as daily means including SD. Independent t-tests were performed to see whether the VFR outcomes of follow-up 2 differed between study arm A and study arm B. For comparisons regarding test-retest reliability and criterion validity, paired t-tests were performed, and effect sizes were expressed as mean differences, including an indication as percentage. For reliability, i.e. the consistency of a test, the intra-class correlation coefficient (ICC) and the standard error of measurement (SEM) are described as common metrics [19]. For the test-retest evaluation, ICCs (3,1) were calculated for absolute agreement of single values. SEM (SEM = SD × √(1 − ICC)) was calculated in order to examine the precision of the measurement in the unit of the specific outcome (e.g. kcal/d), where the standard deviation for all test scores was derived from the total sum of squares of the ICC's ANOVA (SD = √(SS/(n − 1))) [19]. Pearson's r was used to express agreement for validity. Moreover, Bland-Altman plots [20] with 95% limits of agreement were created for energy and macronutrients. Due to the wide range of possible applications of the VFR, no indications for clinically acceptable deviations were defined a priori. Alpha was set at 0.05; p-values rounded to two decimal places are reported throughout the manuscript. Statistical analysis was performed with IBM SPSS Statistics version 24 [21].
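As an illustration of two of the metrics named above, the following minimal sketch computes the SEM from a given SD and ICC, and the mean difference with 95% limits of agreement as used in a Bland-Altman plot. It is not the SPSS procedure used in the study. The function names and the example intakes are hypothetical, and the SD is taken as an input rather than derived from the ANOVA sum of squares.

```python
import math
import statistics


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), in the unit of the outcome (e.g. kcal/d)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("this sketch expects an ICC between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def bland_altman_limits(method_a, method_b):
    """Return (mean difference, lower limit, upper limit) for paired values.

    The 95% limits of agreement are mean difference +/- 1.96 * SD of the
    paired differences (sample SD, n - 1 in the denominator).
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical energy intakes (kcal/d) from two methods, not study data
vfr = [2100.0, 2450.0, 1980.0, 2700.0, 2250.0]
wfr = [2000.0, 2500.0, 1900.0, 2600.0, 2300.0]
print(bland_altman_limits(vfr, wfr))
print(standard_error_of_measurement(sd=100.0, icc=0.75))
```

Taking the SD as a parameter keeps the sketch independent of how the ICC itself is computed, which depends on the chosen ICC model.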

Summary
An overview of the numbers of participants who were randomly assigned to a study arm, who carried out assessments, and who were analysed for the primary outcomes is given in Fig. 1, by means of a CONSORT flow chart [22]. Participants' outcomes were analysed for validity if assessments were complete at baseline and follow-up 1. Reliability analyses were carried out if the VFR was completed twice. After drop-outs and exclusion of under-reporters, 35 participants remained in the analysis of reliability, and also, largely overlapping, 35 participants in the analysis of validity. None of the participants was classified as an over-reporter. One participant was excluded based on initial outlier screening, with more than two SD difference in energy intake between two assessments, both for reliability and validity. For 4 out of 34 outcomes (protein, cholesterol, iodine, phosphorus), the VFR outcomes in follow-up 2 differed significantly between the two study arms. Based upon the overall consistency, a pooled reliability analysis, merging both study arms, was carried out. For outcomes like vitamin D and alcohol, neither the Shapiro-Wilk test nor the graphical inspection supported the assumption of approximate normality. However, due to the high share of participants scoring "0" in these outcomes, parametric options for data presentation and inferential statistics were applied to those and all other outcomes.

Overview of the study population
Baseline demographic characteristics are summarised in Table 1. Men were underrepresented in the sample, and were on average younger than female participants. Due to this lack of gender balance, gender-separated validity outcomes were additionally analysed for macronutrients, taking an adjustment of energy intake (2500 kcal/d) into account. Body mass indices (BMI) of all participants analysed ranged from 17.0 to 31.1 kg/m², based on self-reported body weight and height. Of the participants analysed, 29% indicated a university degree as highest completed education, while 21% did not have a general qualification for university entrance. When comparing participants included in the validity analysis (n = 35) with those lost to follow-up or excluded as under-reporters or outliers (n = 30), no significant deviations were observed regarding age, BMI, and level of education. Another two participants were randomly assigned to a study arm, but did not report such data.
Concerning the analysis of reliability and usability, there were no significant deviations regarding age and level of education. However, BMI differed significantly (p = 0.01) between participants analysed (n = 35, mean: 22.2, SD: 3.0) and those lost to follow-up and excluded (n = 30, mean: 24.3, SD: 3.6).

Test-retest reliability of the VFR
Table 2 displays SEM as a further metric of consistency.

Agreement of the VFR with the WFR as reference method
Figure 2 shows Bland-Altman plots with 95% limits of agreement, comparing the VFR with the WFR for energy and macronutrients. As for reliability, the energy intake was fairly similar between the two methods, with a mean difference of 78 kcal/d. Proportional bias was observed for fat (beta: 0.39; T: 2.40, p: 0.022), but not for energy, protein, and carbohydrates. Table 4 shows exploratory outcomes related to criterion validity for macronutrients as shown in Table 3, but separated by gender. In order to adjust for gender differences in energy intake, these data were divided by the respective energy intake in kcal/d and multiplied by 2500. Agreement was markedly stronger in men (ranging from 0.64 to 0.87) than in women (ranging from 0.26 (not significant) to 0.52).
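The energy adjustment described above (dividing by the individual energy intake and multiplying by 2500) can be written as a short sketch. The function name and the example values are hypothetical.

```python
def energy_adjust(nutrient_per_day, energy_kcal_per_day, reference_kcal=2500.0):
    """Scale a daily nutrient intake to a common reference energy intake.

    The nutrient intake is divided by the participant's own energy intake
    (kcal/d) and multiplied by the reference intake (2500 kcal/d in the study).
    """
    if energy_kcal_per_day <= 0:
        raise ValueError("energy intake must be positive")
    return nutrient_per_day / energy_kcal_per_day * reference_kcal


# Hypothetical example: 80 g protein at 2000 kcal/d, expressed per 2500 kcal
print(energy_adjust(80.0, 2000.0))
```

After this scaling, two participants with the same nutrient density have the same adjusted intake, regardless of how much energy each consumed in total.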

System usability of the VFR
In terms of the VFR's system usability, 26 of 35 participants (74%) perceived it as easy or rather easy to complete, 10 of 35 (29%) indicated that it was time consuming or rather time consuming, and 23 of 35 (66%) found it interesting or rather interesting to complete. 26 of 35 (74%) fully or rather agreed that the VFR made them reflect on their dietary behaviour. 20 of 35 (57%) fully or rather agreed that they would be willing to complete the VFR again in future. No important harms or unintended effects were observed.

Discussion of the results
The energy intake derived from the VFR appeared to be fairly reproducible over time (ICC = 0.69), strongly correlated with the WFR as reference method (r = 0.78), and generally of a very plausible magnitude (test: mean 2245, SD: 496 kcal/d; retest: mean 2301, SD: 596 kcal/d).
In the first VFR, men reported a mean energy intake of 2736 kcal/d (n = 11, SD: 433 kcal), while women reported a mean intake of 2020 kcal/d (n = 24, SD: 338 kcal). A similar Danish study validating a pre-coded food diary reported a mean energy intake of 2317 kcal (SD: 573) for both genders [23]. Differences between the two assessments were found for some micronutrients, both for reliability and for agreement with the reference method. While ICCs for energy and nutrient intake regarding test-retest reliability of the VFR ranged from 0.01 to 0.95, Pearson's correlation coefficients, expressing agreement with the reference method, ranged from 0.15 to 0.80. The aforementioned Danish study reported Pearson's correlation coefficients from 0.16 to 0.71 in this context [23]. Energy, protein, and carbohydrates were among the outcomes with the highest correlation coefficients, being above 0.7 in both studies. Validation studies on self-administered web-based applications of 24-h recalls reported Pearson's correlation coefficients of up to 0.75 [6], and from 0.06 to 0.64 [7], respectively. The number of outcomes, and specifically of micronutrients, analysed in these studies was fairly similar. When testing energy-adjusted macronutrient intake, validity of the VFR was better in men (average r: 0.74) than in women (average r: 0.41). Supposing that women are able to complete the VFR as accurately and completely as men, this may be due to a generally higher inconsistency of dietary behaviour in women. Although only 57% of the participants fully or rather agreed that they would be willing to complete the VFR again in future, the response in terms of system usability was generally positive.

Implications for clinical practice
For clinical practice, the VFR may be recommended for estimations of energy and nutrient intake, e.g. as a basis for a dietetic consultation. For the interpretation of repeated assessments from a single client, clinicians may also calculate the so-called minimal detectable change (MDC) based on the SEM values provided in Table 2. MDC95 indicates the change in a measurement that would, with a certainty of 95%, exceed the outcome's test-retest variability (MDC95 = SEM × 1.96 × √2) [24]. Health professionals, such as dietitians, may also find the additional software output "intake of food groups" useful. As indicated on the VFR introduction page, a minimum of 4 days including one weekend day should serve as the basis to draw conclusions on overall dietary behaviour. The VFR may also serve as an instrument for research studies conducted in Austria, while for cross-national comparisons preference should be given to instruments that were specifically designed for this purpose.
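The MDC95 formula above can be applied as in the following minimal sketch. The function names and the SEM value are hypothetical; actual SEM values would be taken from Table 2.

```python
import math


def minimal_detectable_change_95(sem):
    """MDC95 = SEM * 1.96 * sqrt(2), in the unit of the outcome."""
    return sem * 1.96 * math.sqrt(2.0)


def exceeds_mdc95(first_value, second_value, sem):
    """True if the change between two assessments of one client is larger
    than the test-retest variability expected with 95% certainty."""
    return abs(second_value - first_value) > minimal_detectable_change_95(sem)


# Hypothetical SEM of 100 kcal/d, for illustration only
print(round(minimal_detectable_change_95(100.0), 1))
print(exceeds_mdc95(2200.0, 2400.0, sem=100.0))  # change of 200 kcal/d
print(exceeds_mdc95(2200.0, 2600.0, sem=100.0))  # change of 400 kcal/d
```

The factor √2 reflects that the difference between two assessments carries the measurement error of both.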
Conclusions on the overall dietary behaviour, based upon a 4 day assessment with the VFR, need to be drawn more conservatively for women, as compared to men.

Strengths and limitations
The randomised cross-over design, with the WFR carried out in a rigid and transparent way, with extensive  users' behaviour and thus introduce bias [9,23]. Since nutritional epidemiology is still lacking a gold-standard measurement, such a "validation study" can only aim at understanding the structural equation of the measurement error model rather than to assess the validity of an instrument measuring dietary intakes. Hence, the administration of a combination of both, objective biomarkers and subjective reports is becoming increasingly popular to address methodological limitations, such as comparing one self-reported tool against another [1].

Conclusions
The VFR is a simple paper-based pre-coded dietary intake record, which is fully flexible regarding the duration of logging. However, the provided details regarding its reliability and validity refer to a period of four consecutive days, including one weekend day. The study resulted in acceptable reliability and agreement with the reference method, with a very plausible estimation of energy intake. These results are comparable to those of similar validation studies of prospective records carried out in other countries.