Evaluation of a Diet Quality Index Based on the Probability of Adequate Nutrient Intake (PANDiet) Using National French and US Dietary Surveys

Background: Existing diet quality indices often show theoretical and methodological limitations, especially with regard to validation.
Objective: To develop a diet quality index based on the probability of adequate nutrient intake (PANDiet) and evaluate its validity using data from French and US populations.
Material and Methods: The PANDiet is composed of adequacy probabilities for 24 nutrients grouped into two sub-scores. The relationship between the PANDiet score and energy intake was investigated. We evaluated the construct validity of the index by comparing scores for population sub-groups with 'a priori' differences in diet quality, according to smoking status, energy density, food intakes, and plasma folate and carotenoid concentrations. French and US implementations of the PANDiet were developed and evaluated using national nutritional recommendations and dietary surveys.
Results: The PANDiet was not correlated with energy for the French implementation (r = −0.02, P>0.05) and correlated at a low level for the US implementation (r = −0.11, P<0.0001). In both implementations, a higher PANDiet score (i.e. a better diet quality) was associated with not smoking, having a lower-energy-dense diet, consuming higher amounts of fruits, vegetables, fish, milk and other dairy products and lower amounts of cheese, pizza, eggs, meat and processed meat, and having higher plasma folate and carotenoid concentrations after controlling for appropriate factors (all P<0.05; carotenoid data were not available for the US).
Conclusions: The PANDiet provides a single score that measures the adequacy of nutrient intake and reflects diet quality. This index is adaptable for use in different countries and relevant at the individual and population levels.


Introduction
Nutritional epidemiology typically involves the analysis of associations between a specific nutrient, food or food category, and health-related outcomes. Such an approach fails to consider the complexity of the diet as a whole, which includes multiple correlations between foods and nutrients. Dietary patterns are complementary to classical analyses because they can tackle the complexity of the diet using a holistic approach [1]. There are two main approaches for characterizing dietary patterns in a population. The first approach, known as 'a posteriori', uses data-driven techniques such as principal component analysis and cluster analysis [2,3]. The second approach, known as 'a priori', defines dietary patterns based on current nutrition knowledge, mainly expressed as food- or nutrient-based dietary guidelines [4][5][6]. The overall adherence or proximity to these dietary patterns is used to build indices of diet quality. The majority of existing indices are based on the traditional Mediterranean diet or national food-based dietary guidelines.
One practical drawback of the food-based dietary guidelines approach is that indices can rarely be applied to populations with different dietary practices and therefore must be adapted [7][8][9] or developed [10]. Another drawback is the paucity of nutritional evidence used to construct a food-based index. In contrast, there is a large body of evidence regarding nutrient intakes (including recommended dietary intakes and lower and upper intake levels) that has not often been used to estimate the overall quality of the diet. Nutrient-based diet quality indices are robust and adaptable to different populations and countries. For example, the Mean Adequacy Ratio index is used as an indicator of nutritional quality [11,12] and the Mean Probability Adequacy index provides a composite measure of adequacy of several nutrients [13,14]. However, these indices do not take into account the upper levels of intake and therefore cannot be used to estimate the overall quality of the diet.
Lastly, it has been reported that current diet quality indices present many theoretical and methodological limitations [4][5][6][15], including a lack of evaluation or validation. This is due partly to the lack of a criterion for estimating diet quality and a lack of amenability to classical criterion validation. Nevertheless, some studies have proposed strategies to evaluate the validity of diet quality indices [10,16] or a nutrient profile model [17] based on relevant methodologies developed in the psychometric sciences for questionnaire scales [18,19].
The aim of this study was therefore to develop a new diet quality index based on the intake of all nutrients, using a probabilistic approach for estimating the adequacy of nutrient intake [20], and to carry out an evaluation of its validity using French and US national survey data.

Subjects and Data
Data used in this study came from the French Nutrition and Health Survey (Etude nationale nutrition santé - ENNS, 2006-2007) and the US National Health and Nutrition Examination Survey (NHANES, 2007-2008). The design, methodology and results of ENNS have been described in full elsewhere [21]. Briefly, the ENNS survey was a multistage stratified descriptive cross-sectional survey undertaken on a randomly selected sample of non-institutionalized 18-74 year olds living in mainland France. Dietary data were collected using three 24-hour recalls (one of which was on the weekend) randomly selected within a 2-week period. Dietary recalls were conducted over the telephone by trained dieticians. Nutritional values for energy and nutrients came from a previously published nutrient database [22], updated to include recently marketed foods and recipes. Blood samples were collected for determination of plasma folate using competitive immunoassay with direct chemiluminescence and for determination of alpha- and beta-carotene using HPLC.
The design, methodology and results of NHANES have also been described in full elsewhere [23]. Briefly, the NHANES survey was a multistage stratified descriptive cross-sectional survey on a randomly selected sample of the civilian non-institutionalized population of the US, 20 to 80 years old. Subjects completed two 24-hour recalls, the first collected in person by trained dieticians and the second collected over the telephone between 3 and 10 days later. Nutritional values for energy and nutrients came from the USDA's Food and Nutrient Database for Dietary Studies 4.1 (FNDDS 4.1). Blood samples were collected for determination of plasma folate using the microbiologic assay. Carotenoid data were not collected.
In both surveys, mean individual intakes of food (in grams) and nutrients were calculated, including a weighting for the day of the week (weekday or weekend day). Nutrient intakes were expressed as absolute values and as a percentage of total energy intake, excluding energy from alcohol. In the present study, the food and drink items from the ENNS (n = 1427) and NHANES (n = 7177) food composition databases were classified into thirty-seven food categories. These food categories are principally the same for the two databases; however, some minor discrepancies exist due to differences in data collection and coding procedures.
Of those subjects who completed the surveys (n = 3115 in ENNS and n = 5935 in NHANES), we excluded those who (i) did not provide complete dietary data (complete data were defined as three 24-hour recalls in ENNS and two 24-hour recalls in NHANES), (ii) had missing information for analysed variables or variables required for the development of the index (e.g. bodyweight), (iii) were pregnant or lactating and (iv) were identified as over- or under-reporters based on the method proposed by Black and colleagues [24]. This resulted in a final number of 1330 subjects in ENNS (43% of those who completed the survey) and 2391 subjects in NHANES (40%) available for the analysis.

Development of a Diet Quality Index Based on the Probability of Adequate Nutrient Intake (PANDiet)
The PANDiet aims to measure the overall diet quality of an individual through the probability of having an adequate nutrient intake.
We used the probabilistic approach developed by the Institute of Medicine [20] to estimate, for each individual, whether the usual intake of a nutrient was adequate. The calculation of the probability takes into account the number of days of dietary data, the mean intake and the day-to-day variability of intake, the nutrient reference value and the interindividual variability (Figure 1). Values range from 0 to 1, where 1 represents a 100% probability that the usual intake was adequate. For each nutrient, adequate intake was assumed to be the level likely to satisfy the nutrient requirements and unlikely to be excessive and elicit adverse health effects. Therefore, we assessed separately the probability that the intake was adequate inasmuch as it satisfied the requirement, on the one hand, and the probability that it was not excessive, on the other hand. Consequently, the PANDiet was constructed based on two sub-scores: the Adequacy sub-score and the Moderation sub-score.
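The inputs listed above can be combined as in the following minimal sketch. It assumes a normal approximation in which the variance of the requirement and the variance of the observed mean intake (within-person variance divided by the number of recall days) are added; this is one reading of the approach in [20], not a confirmed reproduction of Figure 1. The function name, parameter names and all numeric values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def probability_of_adequacy(mean_intake, sd_within, n_days, reference, cv_requirement):
    """Probability that usual intake meets the requirement (normal approximation).

    mean_intake    -- individual's mean observed intake over n_days
    sd_within      -- day-to-day (within-person) standard deviation of intake
    n_days         -- number of 24-hour recalls
    reference      -- nutrient reference value (e.g. an EAR)
    cv_requirement -- interindividual variability of the requirement (e.g. 0.10)
    """
    sd_requirement = cv_requirement * reference
    # Assumption: the two sources of uncertainty are independent, so variances add.
    sd_difference = sqrt(sd_requirement ** 2 + sd_within ** 2 / n_days)
    if sd_difference == 0:
        # No uncertainty at all: the comparison is deterministic.
        return 1.0 if mean_intake >= reference else 0.0
    z = (mean_intake - reference) / sd_difference
    return NormalDist().cdf(z)


# Hypothetical values for illustration only (not taken from either survey).
p = probability_of_adequacy(mean_intake=110, sd_within=40, n_days=3,
                            reference=100, cv_requirement=0.10)
```

Under these assumptions, an observed mean intake exactly equal to the reference value gives a probability of 0.5, and more recall days narrow the uncertainty around the observed mean.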
The Adequacy sub-score was calculated as the average of the probability of adequacy for items for which the usual intake should be above a reference value, multiplied by 100. According to the nutrient reference values, the probability was determined as follows:

1) For the majority of nutrients, the probability was determined from the distribution of requirements as specified by the Estimated Average Requirement (EAR) and the variability of the requirement in the specific population.
2) For some nutrients, the probability was determined from the same principle using the Adequate Intake (AI) instead of the EAR. Because interindividual variability is not specified for the AI, it was set at the same value as the variability for most nutrient requirements, 15% for France [25] and 10% for the US [20].
3) For total carbohydrate and total fat, the recommended dietary intakes are expressed as a percentage of energy intake excluding alcohol and represented by an acceptable distribution range in both French and US recommendations. The probability was calculated using the lower bounds of the acceptable distribution range. Because the use of an acceptable distribution range already accounts in part for the interindividual biological variability, no variability value was added.
4) For iron, the probability was determined using published values [31].
The Moderation sub-score was calculated as the average of the probability of adequacy for items for which the usual intake should not exceed a reference value and penalty values, multiplied by 100. According to the nutrient reference values, the probability was determined as follows:

1) For protein (upper bound), SFA, cholesterol and sodium, the probability was determined from the same principle as above, using the upper tolerable limit of intake instead of the AI. Because interindividual variability has not been specified for the upper tolerable limit, we set it at 15% for France [25] and 10% for the US [20], except for protein (upper bound) where it was set at 0% [26].
2) For total carbohydrate and total fat, the probability of an excess in intake was calculated using the upper bounds of the acceptable distribution range.
3) For other vitamins and minerals with available upper tolerable limits but where the risk of excessive intake is low, we used a penalty value system: a value equal to 0 was generated when the average intake of a nutrient exceeded the upper tolerable limit of intake.
The PANDiet score is the average of the Adequacy and Moderation sub-scores. In principle, the score ranges from 0 to 100; the higher the score, the better the diet quality.
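The aggregation described above can be sketched as follows. The sketch takes already computed item probabilities as input. How penalty values enter the Moderation average is not spelled out in the text, so the code assumes that each triggered penalty is appended as an extra item with value 0; this assumption, the function name and the example numbers are all hypothetical.

```python
from statistics import mean


def pandiet_scores(adequacy_probs, moderation_probs, n_triggered_penalties=0):
    """Return (Adequacy sub-score, Moderation sub-score, PANDiet score), each on 0-100.

    adequacy_probs        -- probabilities for items whose intake should exceed a reference
    moderation_probs      -- probabilities for items whose intake should not exceed a reference
    n_triggered_penalties -- number of nutrients whose mean intake exceeded the upper limit
    """
    adequacy = 100 * mean(adequacy_probs)
    # Assumption: each triggered penalty counts as one additional item scored 0.
    moderation_items = list(moderation_probs) + [0.0] * n_triggered_penalties
    moderation = 100 * mean(moderation_items)
    # The final score is the simple average of the two sub-scores.
    return adequacy, moderation, (adequacy + moderation) / 2


# Hypothetical item probabilities for one individual.
adequacy, moderation, score = pandiet_scores([1.0, 0.5], [1.0, 1.0])
```

With these illustrative inputs the sub-scores are 75 and 100, and the PANDiet score is 87.5; adding two triggered penalties would lower the Moderation sub-score to 50 under the stated assumption.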
A French implementation of the PANDiet (Figure 2) was developed based on the French nutritional recommendations for adults [25][26][27], including European Community values when specific French recommendations did not exist [28][29][30]. A US implementation of the PANDiet (Figure 3) was developed based on the US nutritional recommendations for adults [31][32][33][34][35][36][37][38]. Although the structure of these two implementations is almost identical, it should be noted that the differences in reference values render cross-national comparisons of PANDiet scores meaningless.

Evaluation of the Validity
The French and US implementations of the PANDiet were evaluated by assessing their content and construct validity.
Content validity consists of a judgment as to whether or not the index samples all the relevant or important content or domains [18,19]. The correlations between the individual items and the PANDiet and the relationship between the PANDiet score and energy intake were investigated. The latter was checked in order to verify if a higher score would be automatically attributed to a higher energy diet.
Construct validity is an on-going process which involves three steps: 1) explicitly spelling out a set of theoretical concepts and how they are related, 2) developing scales to measure these theoretical constructs and 3) testing the relationships among the constructs [18,19]. We evaluated the construct validity of the PANDiet using different theories relating to subgroups of the population that present 'a priori' different diet qualities. We selected specific traits supported by literature in both France and the US:

1) We hypothesised that non-smokers have a better diet quality than smokers [39][40][41]. Accordingly, participants with a higher PANDiet score should be more likely to be non-smokers. In the present study, smokers were defined as current smokers (including heavy or occasional) and non-smokers were defined as ex- or never-smokers.
2) We hypothesised that individuals consuming a lower-energy-dense diet have a better diet quality than individuals consuming a higher-energy-dense diet [42][43][44]. Accordingly, participants with a higher PANDiet score should be more likely to have a lower-energy-dense diet. In this study, total energy density of the diet was calculated by dividing total energy intake (kcal) from food for each day by the total weight of the reported food intake (g). All beverages were excluded from this calculation based on an approach previously published [45].
3) We hypothesised that following food-based recommendations [38,46] ensures a good nutritional quality of the diet. Accordingly, participants with a higher PANDiet score should be more likely to have food intakes in line with the international nutrition policies (e.g. more fruits and vegetables and less meat and processed meat).

In addition, given that fruit and vegetable intakes are main contributors to intakes of folate [47] and carotenoids [48], we hypothesised that higher plasma folate and carotenoid concentrations would reflect diet quality. Accordingly, participants with a higher PANDiet score should be more likely to have higher plasma folate, alpha- and beta-carotene concentrations.
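The energy density calculation in hypothesis 2 can be illustrated with a short sketch for a single recall day. The record layout (energy, weight, beverage flag) and the example foods are hypothetical; only the rule itself (energy from food divided by weight of food, beverages excluded) comes from the text.

```python
def energy_density(items):
    """Energy density (kcal/g) of one day's intake, excluding all beverages.

    items -- iterable of (kcal, grams, is_beverage) tuples, one per reported item
    """
    foods = [(kcal, grams) for kcal, grams, is_beverage in items if not is_beverage]
    total_weight = sum(grams for _, grams in foods)
    if total_weight == 0:
        raise ValueError("no solid food reported; energy density is undefined")
    total_energy = sum(kcal for kcal, _ in foods)
    return total_energy / total_weight


# Hypothetical recall day: two foods and one beverage (the beverage is ignored).
day = [(200, 100, False), (100, 100, False), (150, 330, True)]
density = energy_density(day)  # 300 kcal / 200 g = 1.5 kcal/g
```

How the daily values were then combined across recall days is not detailed here, so the sketch stops at a single day.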

Statistical Analyses
All analyses were performed using SAS version 9.1.3 (SAS Institute). Weighting schemes proposed by ENNS and NHANES were used to account for the complex survey designs and were adapted to the population samples analyzed. To describe the distribution of the PANDiet, elemental statistics (mean, standard error of the mean and quartiles) were used. Continuous variables are presented as mean ± SEM. Because the probabilities of adequacy were not normally distributed, correlation coefficients between the PANDiet items, sub-scores, score and energy intake were assessed using Spearman's correlations. Associations between the PANDiet (dependent variable) and sex, age, smoking status, total energy density of the diet, food intakes, plasma folate, and alpha- and beta-carotene (independent variables) were assessed in simple linear models and in a multivariate model after adjusting for age, sex and smoking status where appropriate. P<0.05 was considered significant.
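For readers unfamiliar with the rank-based correlation used here, the following is a minimal, unweighted sketch of Spearman's coefficient in plain Python. It is not the SAS procedure used in the study and it ignores the survey weights; the function names and example data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores and energy intakes for five individuals.
rho = spearman([55.0, 61.2, 70.4, 48.9, 66.0], [2400, 2100, 1900, 2600, 2000])
```

Because only ranks are used, the coefficient is unaffected by the skewed distribution of the individual probabilities, which is the reason given above for preferring it.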

French Implementation of the PANDiet
The mean PANDiet score was 63.25 ± 0.29 (range: 42.69-89.61). The PANDiet was approximately normally distributed (skewness = 0.21 and kurtosis = −0.34). The correlation with the PANDiet score was higher for the Moderation sub-score (r = 0.71) than the Adequacy sub-score (r = 0.47, Table 1). The correlations between the PANDiet score and PANDiet items were as expected, except for PUFA, zinc, vitamin A, vitamin B-12 and vitamin D (Table 1). The inter-correlations between individual items, expressed in absolute values, ranged from r = 0.00 to r = 0.84 (Table S1). The correlation with the PANDiet score was not significant for total energy intake excluding alcohol (r = −0.02, P = 0.50). While participants with a higher PANDiet score were more likely to be older (P = 0.0314), there was no significant association with sex (P = 0.10, Table 2).
Participants with a higher PANDiet score were more likely to be non-smokers (P = 0.0007) and to have a lower-energy-dense diet (P<0.0001, Table 2). Figure 4 presents the results for the PANDiet score according to 10 food groups identified as likely to indicate diet quality, important in terms of nutrition policies and with a robust number of consumers. Full results for all food groups are shown in Table S2. Participants with a higher PANDiet score had a diet higher in the intake of milk, other dairy products (e.g. yogurt), fish, fruit and vegetables (all P<0.01 except for milk where P = 0.0237) and lower in cheese, eggs, meat, processed meat and pizza (all P<0.01 except for meat where P = 0.0131 and eggs where P = 0.0570). Participants with a higher PANDiet score were more likely to have higher plasma folate, alpha- and beta-carotene concentrations (all P<0.0001, Table 2).

US Implementation of the PANDiet
The mean PANDiet score was 58.73 ± 0.36 (range: 34.74-89.97). The PANDiet was approximately normally distributed (skewness = 0.13 and kurtosis = −0.60). The correlation with the PANDiet score was higher for the Moderation sub-score (r = 0.82) than the Adequacy sub-score (r = 0.43, Table 3). The correlations between the PANDiet score and PANDiet items were as expected, except for PUFA, vitamin B-12 and vitamin E (Table 3). The inter-correlations between individual items, expressed in absolute values, ranged from r = 0.00 to r = 0.72 (Table S3). The correlation with the PANDiet score was significant but low for total energy intake excluding alcohol (r = −0.11, P<0.0001).
Participants with a higher PANDiet score were more likely to be female (P = 0.0002) whereas there was no association with age (P = 0.42, Table 2). Participants with a higher PANDiet score were more likely to be non-smokers (P = 0.0020) and to have a lower-energy-dense diet (P<0.0001, Table 2). As shown in Figure 5, participants with a higher PANDiet score had a diet higher in the intake of milk, other dairy products (e.g. yogurt), fish, fruit and vegetables (all P<0.01 except for fish where P = 0.0327) and lower in intakes of cheese, eggs, meat, processed meat and pizza (all P<0.01). Full results are shown in Table S2. Participants with a higher PANDiet score were more likely to have a higher plasma folate concentration (P<0.0001, Table 2).

Discussion
The present study describes the development of a new diet quality index, the PANDiet. This index provides a measure of overall diet quality and each PANDiet item assesses the probability of adequate nutrient intake according to a specific nutritional reference. We report the strategy used to evaluate the validity of this index, and the ensuing validity elements based on the application of the PANDiet to data from two different populations.
The correlations between the PANDiet score and PANDiet items reflect the contribution of the variation of each item to the variation of the PANDiet score. In both implementations, we found that the items related to total carbohydrates (lower bound), total fat (upper bound), SFA, fibre and vitamin C had the most important influence on the PANDiet score. Satisfying the recommendations for these nutrients was therefore the most important factor in discriminating the diet quality of the population samples analyzed. Conversely, low correlations reflected that some nutritional recommendations were not discriminating factors and the related items did not influence the PANDiet score (e.g. vitamin D). Nevertheless, such items still provide important information and need to be taken into account in an overall assessment of diet quality.
Table 3. PANDiet and individual item scores shown by quartiles and Spearman correlations between the PANDiet score and individual item scores for US sample (n = 2391).
Recent publications have emphasized that diet quality indices developed to date present several unresolved methodological issues that may reduce their diagnostic capacity [4][5][6][15]. One issue concerns the existence of high inter-correlations between index items that may lead to an undesirable over-contribution of some items to the score. The inter-correlations between items of the PANDiet reflect the complexity of the diet and interactions between dietary and nutrient intakes. These inter-correlations do not point to a problem of assessing similar aspects of the diet with different items. Because of the lack of a science-based rationale to develop a weighting system for the nutrients, we used an equal weighting for nutrients within each sub-score of the PANDiet. It should be noted that using two sub-scores and averaging their scores to provide the final PANDiet score assigns a higher weight to the items of the Moderation sub-score than to the items of the Adequacy sub-score, since the former includes fewer items than the latter.
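The weighting consequence noted above can be made explicit with a short derivation. The symbols are introduced here for illustration only: $n_A$ and $n_M$ are the numbers of items in the Adequacy and Moderation sub-scores, and $p_i$ and $q_j$ are the corresponding item probabilities (penalty values are ignored for simplicity).

```latex
\mathrm{PANDiet}
  = \frac{1}{2}\left(\frac{100}{n_A}\sum_{i=1}^{n_A} p_i
  + \frac{100}{n_M}\sum_{j=1}^{n_M} q_j\right)
  = \sum_{i=1}^{n_A} \frac{100}{2\,n_A}\, p_i
  + \sum_{j=1}^{n_M} \frac{100}{2\,n_M}\, q_j
```

Each Adequacy item therefore carries a weight of $100/(2 n_A)$ and each Moderation item a weight of $100/(2 n_M)$. Since $n_M < n_A$, a single Moderation item moves the final score more than a single Adequacy item does.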
Like very few other diet quality indices [10,16], the validity of the PANDiet was evaluated through a strategy based on methodologies developed in the psychometric sciences [18,19].
The PANDiet passed the different tests of validity that were based on factors considered to be associated with diet quality from the literature in both France and the US. We have shown that the PANDiet was in line with published findings that consistently indicate smokers have higher intakes of total fat and SFA, and lower intakes of folate, vitamin C and fibre compared with non-smokers [39][40][41]. This ability to detect differences in the quality of the diet of smokers and non-smokers has also been reported for several other diet quality indices [7,10,16][49][50][51][52]. We have also shown that the PANDiet assesses nutrient adequacy independently of energy intake, as demonstrated by the absence of a correlation in the French sample and a very low correlation in the US sample between the PANDiet score and total energy intake. Furthermore, the significant negative association with energy density indicates that a higher PANDiet score reflects diets that are nutrient-dense but not energy-dense. Low or non-significant correlations between the total score and total energy intake have been reported for several diet quality indices [10,16,50] but the association with energy density has rarely been investigated [53]. Lastly, we have shown that the PANDiet assesses diet quality in terms of relative food consumption. The variation in the intake of the ten food groups presented according to the PANDiet score is in line with the international nutrition policies [38,46] and diet modelling based on current nutritional recommendations [54,55]: lowering the intakes of several animal products (e.g. meat and processed meat), increasing that of fruits, vegetables and fish and balancing the intake of items within the dairy product category (lowering the intake of higher fat cheeses in favour of lower fat milks or yogurts).
Unfortunately some nutrients could not be included in the index despite nutritional recommendations existing (e.g. added sugars) due to a lack of data in the food composition databases. Similarly, items estimating the probability of an adequate intake of simple and complex sugars could not be included due to a lack of specific nutritional recommendations. Nevertheless, when such recommendations are developed or updated or nutrient composition information is available, it will be possible to include new items in the PANDiet and confirm the validity of the updated index. Lastly, it should be noted that the restricted samples on which these analyses were undertaken could limit the representativeness of the findings and the generalizability of the results. The use of relevant weighting schemes has limited this potential bias.
The majority of other published diet quality indices rely on food-based dietary guidelines, which simplifies the selection of the items in the index, the scoring system and the weighting. Since this approach does not require a translation of food intakes into nutrient intakes, it can be applied to shorter or less detailed methods of dietary assessment, which are often used in field research. In addition, this approach indirectly assesses intakes of nutrient and non-nutrient components in food. However, food-based dietary guidelines are drawn from a mix of different nutrition knowledge: some recommendations are based on epidemiological data that have ascertained a relationship with a health-related outcome (e.g. intake of fruits and vegetables), other food intake recommendations arise indirectly from a recommendation on nutrient intake (e.g. intake of dairy products in relation to the requirement for calcium) or, even more indirectly, from the place left for some food categories once the frequency or amount of others has been defined. Therefore, food-based dietary guidelines account for nutrient intake recommendations only very indirectly. Accordingly, scoring using food-based dietary guidelines does not use the precise information on food and diet quality at the individual level. One example of the mismatch between food-based dietary guidelines and nutrient adequacy is that adherence to food patterns built from food-based dietary guidelines does not always ensure adequate intake of several nutrients, such as vitamin E or potassium [56]. The large heterogeneity commonly found within food groups in terms of nutrient density tends to reduce the sensitivity of the index. Furthermore, food-based dietary guideline indices have to be adapted [7][8][9] in order to be used in countries with different dietary practices.
Indeed, nutrient requirements can be covered in many different ways, which explains why considering the nutrient level can assess the quality of the diet more accurately at the individual level. In the PANDiet, which is a diet quality index based only on nutrients, this accuracy is strengthened by the use of the probabilistic calculation of nutrient adequacy. The PANDiet accounts for the precision of the estimation of usual intakes of nutrients from dietary surveys, and utilizes all current knowledge based on nutrient intakes (including EAR, AI, and tolerable upper limit of intake). Finally, the PANDiet offers a complete diet quality index relevant at the nutrient level. For studying the diet quality of populations, the PANDiet appears complementary to indices relying on food-based patterns (e.g. Mediterranean diets). At the individual level, the PANDiet offers an accurate index of diet quality that could be used for individual diagnosis and follow-up in the framework of tailored dietary advice.
In conclusion, there is strong evidence suggesting that the PANDiet is a useful tool to assess diet quality at the population level. Although this study concerns the French and US general adult populations, the PANDiet could be applied to other countries or specific populations, where relevant nutritional recommendations and nationally or specific population representative dietary data are available. Further validation of the PANDiet would require the examination of the relationship between the PANDiet score and a large set of biochemical and clinical indicators of nutritional status. The PANDiet stands as a useful tool to explore how diet quality, as captured by this nutrient-based index, relates to risk of morbidity and mortality using longitudinal surveys.