Rasch analysis of consumer attitudes towards the mountain product label

In 2012 the European Union adopted Regulation No. 1151/2012, which, among other things, defines the legal framework to protect the originality and authenticity of mountain foods through the “Mountain Product” quality scheme. The research aims to analyze people’s attitudes towards mountain foods and the EU Mountain Product label, as well as their area of origin, i.e., the mountains. For the purpose of this research, the Rasch model was used since its properties make it suitable to identify the measure of interest. The results allow us to identify potential leverage for planning promotional activities that enhance the value of mountain food and raise awareness of the EU label, thus improving the sustainability of mountain farms and regions.

Page 2 of 25 Bassi et al. Agricultural and Food Economics (2022) 10:13
scheme grants consumers the possibility to take informed decisions (Bonadonna 2016; McMorran et al. 2015; Nicolosi et al. 2019). A number of scholars have already investigated some issues concerning mountain products and the EU label. Martins and Ferreira (2017) provided an overview of mountain foods using the European certification schemes, i.e. PDO/PGI/TSG labels and the optional quality term "mountain food". The authors emphasized how the use of this term could encourage better recognition of mountain foods by consumers, thus improving the economic and market performance of farms and consequently supporting the development of mountain areas. Bonadonna (2016) and Bonadonna et al. (2017) analyzed the level of awareness of and interest in the EU quality scheme among farmers in the North-Western Alps. They highlighted the positive attitude of farmers, who consider this quality scheme a useful tool to promote their products. They also underlined the need for more communication to improve label recognition. Bonadonna et al. (2015) and Bonadonna and Duglio (2016) investigated the applicability of the EU regulation to the production of some traditional mountain cheeses in Piedmont (Italy). They found that farms producing such cheeses could use the label since they are able to self-produce the fodder they need or obtain it locally from other farms. This is not the case for all requirements, for which national derogations may be necessary. Bentivoglio et al. (2019) studied the main protocols used to assess the authenticity of dairy and meat products, the most widespread mountain food products in Italy. Indeed, the possibility of assessing authenticity through traceability methods, and then communicating it through the label, plays an important role in increasing consumer loyalty. Finco et al. (2017) analyzed the interest in the application of the EU label and the perception of mountain products in the Italian region of Marche. They carried out two surveys, one among producers and one among retailers, and found that although this new label is generally accepted, it is not yet well known, probably due to a lack of communication. Before the label was introduced, Baritaux et al. (2011) also focused on retailers and analyzed their perceptions of consumers' preferences as regards mountain food products in Europe. They found that perceptions differ by country and supply channel, since retailers associated with alternative supply chains appear to have a more accurate perception.
Consumers' behavior in specific agricultural sectors has been the focus of a few other scholars. Sanjuán and Khliji (2016) focused on the beef sector and investigated the role of the EU mountain label among urban consumers: the results highlight its limited impact on consumption, particularly when compared to other, more straightforward differentiation claims, such as cattle breeds. Nam et al. (2020) analyzed consumers' preference and willingness to pay for milk. They estimated that the marginal willingness to pay for "performing sustainability" through mountainous farming was the highest, especially among consumers with pre-existing knowledge and high awareness of sustainability. Mazzocchi et al. (2021) assessed the influence of consumers' attitudes in purchasing a "Mountain Product"-labelled Alpine cheese, showing the influence of green consumers' values on the mountain product label choice, and a strong relationship between green consciousness and adherence to animal well-being values. Brun et al. (2020) focused on the honey market and explored Italian consumers' attitudes towards the EU label. They found that Italian consumers have a positive attitude towards "mountain" honey, suggesting that the mountain quality term could be a useful tool for the valorization of honey.
The review of the literature shows that, while some important questions have been already addressed, further investigation is needed to fully understand consumers' attitudes towards mountain food products and the EU label, as well as their area of origin. To this aim, we decided to investigate four focal dimensions: Mountain Product label attitude, purchase intention, mountain attractiveness and mountain food attractiveness. The research presented in this paper aims to analyze people's attitude towards these issues in order to identify potential leverage for enhancing the value of mountain foods through promotional activities, thus contributing to the sustainability of mountain farms and regions. For the purpose of this research, the Rasch model was used since its properties make it suitable for identifying the measure of interest, as described in the Methods section.

Research design
To describe and measure the four abovementioned dimensions, 47 items were developed by reference to the relevant literature. The measurement scales were adapted to appropriately suit the study topics (Table 1).
To collect the data, a questionnaire consisting of two sections was planned. The first section included the items listed in Table 1, which were explored by using a 5-point Likert-like scale, ranging from 1 (strongly disagree) to 5 (strongly agree). The second section aimed at exploring the respondent's sociodemographic characteristics such as gender, age, education level, income and province of residence.
Data were collected through a CAWI (Computer Assisted Web Interview) survey by an external specialized company. This survey method presents multiple benefits: online panels are a pool of people who have agreed to repeatedly take part in web surveys; they can be used as a sampling source for thematically and methodologically diverse studies; in contrast to ad hoc recruitment, online panels reduce the costs associated with locating appropriate respondents and ensure their immediate availability; they allow an easy identification of key sample segments, increased response, augmented response quality, shorter field times and ethical advantages. Furthermore, web-based questionnaires allow for complexity even if they appear simple and attractive. Online panels are an important, if not the dominant, form of reactive web-based research in the medium term (Couper 2000;Gritz et al. 2002).
The Friuli Venezia Giulia region (north-eastern Italy) was designated as the study area. The total sample size was 310 and included respondents from all four provinces of the region. Most of them were from the provinces of Trieste (39.4% of the units) and Udine (23.4%) and lived in small municipalities (31.6%). The sample included 57.1% of females and 42.9% of males. All age classes were adequately represented. The largest number of respondents fall into the middle age classes, that is 35-44 (27.7%) and 45-54 (20.6%) years old. Most respondents (82.6%) held at least a high school diploma. The size of the subsamples defined on the basis of income classes was quite uneven: most respondents were people with low (less than 1,500 euro per month) or mid-low (between 1,500 and 3,000 euro) income level (24.8% and 55.5% respectively).

Table 1 (fragment): purchase intention (PI) items
PI03  There is a strong likelihood that I will try this product if seen in a Farm shop/Mountain hut
PI08  There is a strong likelihood that I will seek out this product in a Farm shop/Mountain hut
PI04  There is a strong likelihood that I will try this product if seen in a Supermarket
PI09  There is a strong likelihood that I will seek out this product in a Supermarket
PI05  There is a strong likelihood that I will try this product if seen in Food store specialties
PI10  There is a strong likelihood that I will seek out this product in Food store specialties

Methods: the Rasch models
Rasch models are measurement models which use dichotomous or ordinal data to construct a measure of the latent quantity of interest for each person interviewed. With respect to our study, a linear measure of interest in the four dimensions was constructed from the 1-5 Likert-like scores attached by respondents to the related items. Different Rasch models are available depending on the nature of the variables. For two ordered categories, the Dichotomous Rasch model is used (Rasch 1960), while the Rating Scale model (Andrich 1978) and the Partial Credit model (Masters 1982) are best suited for more than two ordered categories. We applied the Rating Scale model, which in its adjacent-category log-odds form reads:

$$\ln\frac{P(X_{ij}=k)}{P(X_{ij}=k-1)} = \alpha_i - \beta_j - \tau_k, \qquad i = 1,\dots,N;\ j = 1,\dots,J;\ k = 2,\dots,K \tag{1}$$

where $N$ is the number of persons; $J$ is the number of items; $X_{ij} \in \{1, 2, \dots, K\}$ is the response of person $i$ to item $j$; $\alpha_i$ is the so-called "ability" of the person (i.e. the degree of his/her positive attitude towards the aspect of interest); $\beta_j$ is the so-called "difficulty" of the item (i.e. the degree of difficulty to endorse the question posed), expressed on the same scale as the latent trait; and $\tau_k$ is a "threshold" that measures the difficulty to endorse category $k$, identical for every item. Higher values of $\alpha_i$ mean that the person has a high probability of answering the question with high scores; conversely, lower values of $\alpha_i$ mean that the person has a high probability of answering with lower scores. Higher values of $\beta_j$ mean that it is unlikely that a person answers question $j$ with high scores; conversely, lower values of $\beta_j$ mean that a person would likely answer question $j$ with high scores. In this model, $\tau_k$ (with $k \in \{2, \dots, K\}$) is called the Andrich Threshold (Linacre 2001) and is expected to be in ascending order (see later).
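To make the role of the three parameters concrete, the following minimal sketch computes the category probabilities of the Rating Scale model. It assumes the adjacent-category log-odds form ln[P(X = k)/P(X = k-1)] = α - β - τ_k for k = 2, ..., K; the function names and the threshold values are hypothetical and chosen only for illustration (they are not estimates from this study, and this is not how Winsteps is implemented).

```python
import math

def rsm_probabilities(alpha, beta, taus):
    """Probabilities of categories 1..K under the Rating Scale model.

    alpha: person measure; beta: item difficulty (both in logits).
    taus:  Andrich thresholds [tau_2, ..., tau_K], shared by all items.
    """
    # Cumulative log-numerators; category 1 is the reference (0.0).
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (alpha - beta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_num)
    weights = [math.exp(v - m) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(alpha, beta, taus):
    """Model-expected score for a person/item pair."""
    probs = rsm_probabilities(alpha, beta, taus)
    return sum(k * p for k, p in enumerate(probs, start=1))

# Hypothetical thresholds for a 5-point scale (illustrative values only).
taus = [-2.0, -0.5, 0.5, 2.0]
probs = rsm_probabilities(alpha=1.0, beta=0.0, taus=taus)
print([round(p, 3) for p in probs])
print(round(expected_score(2.0, 0.0, taus), 3), round(expected_score(0.0, 0.0, taus), 3))
```

Raising `alpha` shifts probability towards the higher categories, while raising `beta` shifts it towards the lower ones, which is the behavior described in the text.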
For instance, considering the latent dimension "Mountain attractiveness (MA)", if respondent $i$ assigned a high score to the item MA01, he/she would be expected to show a more positive attitude $\alpha_i$ than respondent $g$ who assigned a lower score to the same item: this means that person $i$ has a more positive attitude towards the latent dimension MA than person $g$. Moreover, if MA02 was more difficult to endorse (i.e. with a higher value of $\beta_j$) than MA01, then the same persons are expected to assign lower scores to MA02 than to MA01. In practice, persons with higher estimated attitude $\alpha_i$ are expected to assign higher scores to all items than persons with less positive attitude, and items with a higher $\beta_j$ value (i.e. greater difficulty to endorse item $j$) are expected to receive lower scores than items with lower difficulty. This is one of the fundamental properties of Rasch models, the so-called "Specific Objectivity" (Rasch 1960, 1977). Specific Objectivity is the requirement that measures produced by a measurement model should be independent of the distribution of the difficulties of the items and the abilities of the persons. Indeed, examples show that when the original sample is split into two sets, one including persons with lower positive attitude and the other grouping persons with higher positive attitude, the difficulty parameters $\beta_j$ estimated on the two subsamples are (statistically) the same. This means that, excluding one part of the sample, the estimates of the attitude parameters for persons in the reduced sample should be (statistically) the same as those obtained from the complete sample: if this does not happen, the data do not satisfy the model, possibly due to coding or response errors, such as miscoding in the data, random answers, or misinterpretation of measurement scales (e.g. respondents interpret the scale in a reversed manner).
This property is also known as "Person-free Test Calibration" and in practice ensures that, to estimate a Rasch model, we do not necessarily need a random sample. Symmetrically, "Item-free Person Measurement" means that estimates of person attitudes are as statistically independent as possible of which items, and which distribution of item difficulties, are included in the test. In particular, the familiar statistical assumption of a normal (or any known) distribution of model parameters is not required. To put it another way, if we estimate persons' attitudes using different sets of items, these attitudes tend to coincide within the usual statistical error margin.

Given the optimal theoretical properties of the Rasch model, the main problem in the analysis is to understand how well the data fit the model. Data were analyzed using Winsteps (www.winsteps.com), one of the most widely used software packages for Rasch analysis (Bond and Fox 2015). To verify compatibility of data with the model and compliance with its assumptions, the correlation coefficients between the empirical observations $X_{ij} \in \{1, 2, \dots, K\}$ and the Rasch measures obtained in a first run of the estimation program were examined. The correlations were calculated both for items and persons. These correlations are expected to be positive as, according to the assumptions of our model, persons with more positive attitude tend to answer the item with higher scores, while persons with less positive attitude tend to answer with lower scores. Therefore, a negative or very low correlation would mean that the item does not work as intended, or that its codes have been reversed, i.e. 1 means "more" (rather than less), and $K$ means "less" (instead of more). In this case the item should be excluded, or its codes reversed. As for persons, the correlation is computed between the answers $X_{ij}$ of person $i$ to the items $j$ and the estimated "easiness to endorse" of those items, $-\beta_j$ (putting a minus sign in front of the difficulty parameter $\beta_j$ gives the easiness of the item). Since persons should tend to assign higher scores to items that are easier to endorse, and lower scores to less easy items, if the data fit the model we expect a positive correlation. Negative or very low correlations would imply that the person answered the questions randomly, without reflecting enough, or used the codes in a reversed manner. Should this be the case, the unit would be excluded. Unfortunately, our dataset showed negative or very low correlations for a large part of the sample (ranging from 25 to 40% of the sample, depending on the dimension investigated); therefore, these observations were dropped to reduce bias in the estimated measures for items and persons. This is not a manipulation of the results: such statistical data treatment is essential to handle errors related to response bias or badly conceived items. In other words, the Rasch model analysis checks for good quality data before obtaining the final model estimate.
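The person-level screening described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy data and the item difficulties are hypothetical, and the correlation is a plain Pearson coefficient between each person's responses and the item easiness values, which approximates but does not reproduce the Winsteps output. The 0.20 cutoff is the one reported later in the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def flag_incoherent_persons(responses, item_difficulty, cutoff=0.20):
    """Indices of persons whose scores correlate below `cutoff` with item easiness."""
    easiness = [-b for b in item_difficulty]  # easiness is minus the difficulty
    flagged = []
    for i, row in enumerate(responses):
        r = pearson(row, easiness)
        # Constant response strings (extreme scores) give NaN and are handled separately.
        if not math.isnan(r) and r < cutoff:
            flagged.append(i)
    return flagged

# Hypothetical difficulties (logits) and responses of three persons to four items.
difficulties = [-1.0, 0.0, 1.0, 2.0]
responses = [
    [5, 4, 3, 2],  # coherent: higher scores on easier items
    [1, 2, 4, 5],  # reversed use of the scale
    [5, 5, 5, 5],  # extreme score, correlation undefined
]
print(flag_incoherent_persons(responses, difficulties))
```

The same `pearson` helper could be applied column-wise, correlating each item's responses with the person measures, to screen items instead of persons.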
A further reduction in the number of observations may derive from the analysis of extreme scores, i.e. respondents who answer every item using only the lowest (or the highest) score of the scale, 1 (or 5) in our case. Extreme scores imply extreme, but indefinitely located, measures (abilities). Methods to estimate measures corresponding to extreme scores are available, but these produce truncated variables. Hence, we opted for their exclusion from further analysis. The presence of extreme scores implies that the items used to investigate the dimension of interest are either too easy (leading to maximum scores) or too difficult (resulting in minimum scores), and the only way to resolve this issue is to add new items to the dimension. However, such an option was not included in our survey, and depending on the dimension investigated we observed extreme scores ranging from 10 to 20% of the sample. Finally, additional exclusions are related to the analysis of fit indexes, which will be discussed later.
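Identifying extreme scores is a purely mechanical step. A minimal sketch, assuming a 1-5 scale with no missing answers (the function name and the toy data are hypothetical):

```python
def split_extreme_scores(responses, low=1, high=5):
    """Separate persons with all-minimum or all-maximum answers from the rest."""
    kept, all_low, all_high = [], [], []
    for i, row in enumerate(responses):
        if all(x == high for x in row):
            all_high.append(i)   # maximum extreme score: items too easy for this person
        elif all(x == low for x in row):
            all_low.append(i)    # minimum extreme score: items too difficult
        else:
            kept.append(i)
    return kept, all_low, all_high

# Hypothetical responses of four persons to three items.
responses = [[5, 5, 5], [1, 1, 1], [3, 4, 5], [1, 1, 2]]
kept, all_low, all_high = split_extreme_scores(responses)
print(kept, all_low, all_high)
```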
Once the preliminary analysis is completed, the other assumptions of the model should be checked to understand whether the categories coded 1, 2, etc. have an actual meaning and can therefore be interpreted. Once the model is estimated, the average of the estimated person attitudes, for each item, is expected to grow with the scores 1, 2, etc. Indeed, the model assumes that a more positive attitude should result in a higher score; hence we should observe lower average attitudes for lower scores and higher average attitudes for higher scores. A violation of this assumption, e.g. an average attitude for score 2 that is higher than the average attitude for score 3, would mean that respondents inverted or incoherently used scores 2 and 3, and that the two should therefore be merged into a single category. The Andrich Threshold $\tau_k$ (Linacre 2001) provides further information on the coherent or incoherent use of the categories: in case of inverted use (i.e. thresholds not in ascending order by score $k$), a common solution is to reduce the number of categories, merging adjacent ones with very similar average attitudes or Andrich Thresholds.
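The category check described above can be sketched for a single item as follows. The helper names, the responses and the person measures are hypothetical; the same ordering test is applied to a set of illustrative Andrich thresholds.

```python
def average_measure_by_score(item_responses, person_measures):
    """Average person measure among respondents choosing each score of one item."""
    groups = {}
    for score, measure in zip(item_responses, person_measures):
        groups.setdefault(score, []).append(measure)
    # Sorted by score so that adjacent categories are adjacent in the dict.
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}

def disordered_adjacent(values_by_score):
    """Adjacent score pairs whose values are not in strictly ascending order."""
    keys = list(values_by_score)
    return [
        (keys[i], keys[i + 1])
        for i in range(len(keys) - 1)
        if values_by_score[keys[i + 1]] <= values_by_score[keys[i]]
    ]

# Hypothetical data: answers of eight persons to one item and their measures (logits).
item_responses = [1, 2, 2, 3, 3, 4, 5, 5]
person_measures = [-2.0, -0.5, 0.5, -1.0, 0.0, 1.0, 2.0, 3.0]
averages = average_measure_by_score(item_responses, person_measures)
print(averages)
print(disordered_adjacent(averages))      # candidates for merging

# Hypothetical Andrich thresholds tau_2..tau_5, keyed by category.
thresholds = {2: -2.0, 3: -0.5, 4: 0.5, 5: 2.0}
print(disordered_adjacent(thresholds))    # empty list: thresholds are ordered
```

In this toy example scores 2 and 3 are used incoherently, so they would be the candidates for merging.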
Another important aspect of fit is the possible violation of the local independence hypothesis (Lord and Novick 1968) and the presence of multidimensionality (Linacre 2011). As for the first problem, we may look at the correlations between the standardized residuals of the items: if these are low (< 0.70), we may conclude that the local independence hypothesis is not violated; a correlation > 0.70 means that a pair of items has almost the same meaning, and therefore one of the two must be eliminated to satisfy the local independence hypothesis. Regarding the second problem, in a dataset fitting the Rasch model, variability depends on both the model and residual variability due to randomness. The Rasch "Principal Component Analysis (PCA) of residuals" looks for patterns in the part of the data due to randomness. Any such pattern is the "unexpected" part of the data that may be due, among other reasons (Smith 2002), to the presence of multiple dimensions in the data. In the Rasch PCA of residuals, we are looking for groups of items sharing the same patterns of unexpectedness. In particular, the matrix of item correlations based on residuals is decomposed to identify possible "contrasts" (the principal components) that may be affecting response patterns. Usually, a contrast needs to have the strength (eigenvalue) of at least two items to be above the noise level. If the largest eigenvalue of the PCA is around 2 or less, the latent measure under investigation may be considered unidimensional.
If instead it is much greater than 2, we can look at the correlation between measures obtained on the same set of persons, splitting the items into clusters according to the loadings and applying the model separately to each cluster: as suggested by Linacre (2011), if these correlations are near 1, we can consider the items as forming part of a single dimension; otherwise, if the correlations are very low (< 0.30), we may split the items and exclude those that do not seem compatible with the dimension of interest. Once these issues have been investigated and resolved, the analysis of fit statistics gives an estimate of the degree to which participants and items respond according to our expectations based on the model. These fit statistics are therefore a summary of all the residuals (the differences between what is observed and what is expected) of each item for each person. They can assume values between zero and infinity. Values above 1 indicate greater variation than expected, while values less than 1 indicate lower variation than expected. Values around 1 mean that the data fit the model adequately.
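The local-independence and dimensionality checks described above both start from the matrix of standardized residuals. The sketch below is a simplified illustration, not the Winsteps procedure: it assumes the standardized residuals are already available as a persons-by-items matrix, computes the inter-item residual correlations, lists the pairs above 0.70, and approximates the eigenvalue of the first contrast by power iteration. All names and the toy residuals are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation (assumes neither vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residual_correlations(z):
    """Item-by-item correlation matrix of standardized residuals (rows = persons)."""
    cols = list(zip(*z))
    n = len(cols)
    return [[1.0 if a == b else pearson(cols[a], cols[b]) for b in range(n)]
            for a in range(n)]

def dependent_pairs(corr, cutoff=0.70):
    """Item pairs whose residual correlation exceeds the cutoff."""
    n = len(corr)
    return [(a, b) for a in range(n) for b in range(a + 1, n) if corr[a][b] > cutoff]

def largest_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric non-negative definite matrix (power iteration)."""
    n = len(matrix)
    v = [1.0 / (k + 1) for k in range(n)]  # arbitrary non-uniform starting vector
    for _ in range(iterations):
        w = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
    return sum(a * b for a, b in zip(v, w))  # Rayleigh quotient of the unit vector v

# Hypothetical standardized residuals: 4 persons x 3 items; items 0 and 1 move together.
z = [[1, 1, 1], [-1, -1, 1], [1, 1, -1], [-1, -1, -1]]
corr = residual_correlations(z)
print(dependent_pairs(corr))
print(round(largest_eigenvalue(corr), 6))
```

Here the two redundant items are flagged as locally dependent and the first contrast has the strength of two items, which is the borderline value mentioned in the text.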
The fit statistics fall into two categories: a weighted one, called Infit, and an unweighted one, called Outfit. For suggestions regarding good-practice intervals see Bond and Fox (2015). In our analysis we followed the suggestions of Linacre (2011): 0.5-1.5 for items. Persons that do not fit should be removed from the model to increase the validity of the results obtained. We decided to retain in the analysis persons with both Infit and Outfit below 3.
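For reference, the two mean-square statistics can be written in a few lines. The sketch assumes the usual definitions (Outfit as the mean of squared standardized residuals, Infit as the sum of squared residuals divided by the sum of model variances) and takes the model-expected scores and variances as given; the numbers are hypothetical.

```python
def outfit_mnsq(observed, expected, variance):
    """Unweighted mean square: average of squared standardized residuals."""
    z2 = [(x - e) ** 2 / v for x, e, v in zip(observed, expected, variance)]
    return sum(z2) / len(z2)

def infit_mnsq(observed, expected, variance):
    """Information-weighted mean square: squared residuals over total model variance."""
    squared = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared / sum(variance)

def within_item_range(mnsq, low=0.5, high=1.5):
    """True when the mean square lies in the 0.5-1.5 interval used for items."""
    return low <= mnsq <= high

# Hypothetical responses to one item with their model expectations and variances.
observed = [4, 2, 5, 3]
expected = [3.5, 2.5, 4.0, 3.0]
variance = [1.0, 1.0, 0.5, 1.0]
print(outfit_mnsq(observed, expected, variance))
print(round(infit_mnsq(observed, expected, variance), 4))
```

Because Outfit is unweighted, a single unexpected answer with a small model variance (the third observation here) moves it more than it moves Infit.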
Finally, we may proceed to examine the overall fit of the model, and in particular the reliability and separation indexes for items and persons: values of reliability > 0.80 and separation > 3 are indicators of good fit of the scale, telling how well this sample of respondents has spread out the items along the measure of the test, and so defined a meaningful dimension.
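The two summary indexes are closely related. A minimal sketch, assuming the common definitions (reliability as the share of observed variance not due to measurement error, separation as the ratio of the error-adjusted "true" standard deviation to the root mean square error) and using the population variance, which is an assumption of this example; the measures and standard errors are hypothetical.

```python
import math

def reliability_and_separation(measures, standard_errors):
    """Return (reliability, separation) for a set of measures with their SEs."""
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / n   # population variance
    error_var = sum(se ** 2 for se in standard_errors) / n      # mean square error
    true_var = max(observed_var - error_var, 0.0)
    reliability = true_var / observed_var
    separation = math.sqrt(true_var / error_var)
    return reliability, separation

def separation_from_reliability(reliability):
    """Separation implied by a given reliability: G = sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1.0 - reliability))

# Hypothetical item measures (logits) and standard errors.
measures = [-2.0, -1.0, 0.0, 1.0, 2.0]
standard_errors = [0.5] * 5
rel, sep = reliability_and_separation(measures, standard_errors)
print(round(rel, 3), round(sep, 3))
print(round(separation_from_reliability(0.9), 3))
```

Under these definitions a separation of 3 corresponds to a reliability of 0.90, so the separation criterion quoted above is the stricter of the two.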

Checking the fit of the model: preliminary analysis and final model
We applied Winsteps to the data to estimate an initial model for each dimension (LA, PI, MA, and MFA). The results are illustrated in Table 2. As we can see, all the items in all dimensions show a positive and high correlation with the estimated measure and therefore they are consistent with the hypothesis of the model. As for persons, instead, we observe a high percentage of persons who score the items incoherently with respect to the scale (correlation ≤ 0.20). The percentage of persons with extreme scores ranges from 11 to 23%, meaning that the scales lack items: in particular, as we will see later, the items are too easy, and this leads to a high share of persons with the maximum score (i.e. they give 5 to every item).
After this first run, we ran the Winsteps program recursively, excluding persons with correlation less than 0.20 and persons with Infit or Outfit greater than 3, thus obtaining the final model. Misfitting items were removed from the final model, but this will be discussed later. Here we want to compare the item and person measures estimated with the initial model to those obtained with the final model. As the Rasch model satisfies "Specific Objectivity", and therefore enjoys the Person-free Test Calibration property, if deleting persons and items as described above does not bias the results, the measures for items and persons estimated with the final and the initial model should be the same, lying within a 95% confidence interval around the identity line, as explained above. To this aim, for each dimension we divided the sample of persons (the set of persons included in the final model does not necessarily coincide across dimensions) into two parts: the upper part, including persons with attitude greater than the median, and the lower part, including persons with attitude lower than or equal to the median. Figure 4 (in Appendix) shows the identity lines for the difficulties of the items estimated separately on the two subsamples.
Moreover, for each dimension, we divided the items into two subsets: with the items ordered by measure, the first set was composed of the items in odd positions and the second set of the items in even positions. Then we estimated the person attitudes from the two sets of items. Figure 5 (in Appendix) shows the identity lines for these estimated attitudes (see footnote 1). As we can see from Fig. 4, the item difficulties estimated with the upper sample coincide with those estimated with the lower sample, within a reasonable margin of error (if we use a 95% confidence interval around the identity line, all the items lie inside it). The same logic applies to persons' attitudes (Fig. 5). This confirms that the sample selected for the final model satisfies the Specific Objectivity property of the Rasch model: this example shows what Person-free Test Calibration and Item-free Person Measurement mean, namely that no random sample is needed to estimate the Rasch model.
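The comparison against the identity line can be approximated numerically. The sketch below is an assumption-laden illustration rather than the exact procedure behind Figs. 4 and 5: for each item (or person) it standardizes the difference between the two estimates by their joint standard error and checks it against the 1.96 bound of a 95% interval. The function names and all numbers are hypothetical.

```python
import math

def standardized_difference(m1, se1, m2, se2):
    """Difference between two estimates of the same parameter, in joint-SE units."""
    return (m1 - m2) / math.sqrt(se1 ** 2 + se2 ** 2)

def outside_control_lines(estimates_a, estimates_b, z_crit=1.96):
    """Indices whose two estimates differ by more than the 95% bound.

    Each list contains (measure, standard_error) pairs in the same order.
    """
    flagged = []
    for idx, ((m1, se1), (m2, se2)) in enumerate(zip(estimates_a, estimates_b)):
        if abs(standardized_difference(m1, se1, m2, se2)) > z_crit:
            flagged.append(idx)
    return flagged

# Hypothetical item difficulties estimated on the "upper" and "lower" subsamples.
upper = [(-1.0, 0.2), (0.5, 0.1), (1.1, 0.3)]
lower = [(-0.8, 0.2), (1.2, 0.1), (1.0, 0.3)]
print(outside_control_lines(upper, lower))
```

An empty result would correspond to all points lying inside the confidence band around the identity line.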
The preliminary analysis allowed us to come to the results of the final model. Next, we must estimate the fit of the final model. The Category Characteristic Curves for each dimension (Fig. 1) suggest that the scale categories (scores 1 to 5) function as expected: the curves are well separated along the measurement scales; the estimated average measures are in ascending order with the scores for all dimensions, meaning that persons with more positive attitude tend to use higher scores, and persons with less positive attitude tend to use lower scores, as expected. The Infit and Outfit indices are near 1 and all in the acceptance range (0.5-1.5) suggested by literature (Bond and Fox 2015).
Tables 4, 5, 6, and 7 (in Appendix) report for each dimension the final list of items (in descending order of difficulty) and, for each item, Measure, SE, Infit and Outfit Mean Squares, Infit and Outfit t, and Correlation. Marginal fit values are marked with an asterisk. Given the low-stakes nature of this investigation, we decided to retain items with Infit and Outfit less than or equal to 1.50 (Bond and Fox 2015). The tables also contain the values of the Separation and Reliability indexes, both for items and persons. From these tables we may see that all items have positive correlation with the estimated measures (persons have a minimum correlation ranging from 0.20 to 0.24, contrary to the initial model, where a considerable number of persons showed negative correlation with the estimated measures). All dimensions show good indexes of separation and reliability.

Footnote 1: The difficulty parameters of the items estimated with the "lower subsample" (y-axis) are plotted against the difficulty parameters estimated with the "upper subsample" (x-axis) (Fig. 4). Conventionally, the value zero is assigned to the average difficulty of items (and persons) in either set, so that an identity line is expected (the diagonal). Similarly, persons' attitude levels estimated with the first set of items (y-axis) are plotted against persons' attitude levels estimated with the second set of items (x-axis) (Fig. 5). In both cases the identity line is flanked by 95% confidence lines (Wright and Stone 1979).

Mountain product label attitude, LA (Table 4)
Infit and Outfit values greater than 1.50 suggest that the item LA13 "I think MP label is an attractive label" is slightly misfitting, probably because the label is still little known, so that it could be difficult for persons to describe its attractiveness. Hence, we eliminated the item from the final model. The other items generally show good fit; the only marginal case is LA09 "I like what MP label stands for", whose Infit (1.43) is nonetheless within the 0.5-1.5 interval. As we may observe, LA09 behaves much like the misfitting item LA13.

Purchase intention, PI (Table 5)
Item PI08 "There is a strong likelihood that I will seek out this product in a Farm shop/Mountain hut" is slightly misfitting (Infit = 1.5, Outfit = 1.62). One possible explanation could be that participants did not know what a "malga" (Italian word for Mountain hut) is, or do not go to the mountains very often. Indeed, Winsteps reports unexpected answers (with respect to their estimated measure) to item PI08 by 14 persons, 9 of whom used the middle score 3 (say, neutral) against an expected score of 1 or 5. In this case, we decided to assign a missing value to the answers of these persons to item PI08 and to rerun the Winsteps estimation program, obtaining the "final model for PI with PI08 missing", which includes the item PI08. Figure 6 (in Appendix) compares the measures for persons (attitudes) and items (difficulties) estimated with the "final model for PI" and with the "final model for PI with PI08 missing" (which sets item PI08 to missing for the 14 persons). As we can see from the graphs, the person attitudes and item difficulties estimated by the two models lie within a 95% interval around the identity line, which means that exclusion and inclusion of the item PI08 lead to the same results. We also considered the Infit and Outfit indices of the items in the final model for PI with PI08 missing, and we found that the fit of the item PI08 is almost perfect once the answers of the 14 persons with unexpected responses are set to missing. The other items show values of Infit and Outfit near 1, which lie inside the range 0.5-1.5. In the light of the above considerations, PI08 was dropped from the final model for subsequent analysis.
Mountain attractiveness, MA (Table 6)
All items, except MA09 "I think that the mountains have convenient transportation", show good fit indices. The reason why MA09 misfits may be that people mainly use private means of transport. Consequently, we decided to exclude this item from the final model.

Mountain food attractiveness, MFA (Table 7)
All items show good fit indices, and none was excluded from the final model.
Finally, it is worth investigating the local independence and unidimensionality hypotheses. The analysis of the largest standardized residual correlations between items showed very low values, which for all dimensions do not exceed the threshold above which items are considered dependent (0.70): the highest correlation, though rather low (0.30), is observed between items MFA06 and MFA07. This confirms that the local independence hypothesis is not violated. Table 3 shows the results of the Rasch PCA of residuals, which can be used to detect the presence of multiple dimensions in the data. As we may see, dimensions PI and MA show an unexplained variance in the first contrast that does not exceed 2, so that we do not expect multidimensionality in the data. As for LA and MFA, the unexplained variance in the first contrast slightly exceeds 2; however, looking at the correlations between person measures obtained by splitting the items into clusters, as suggested by the loadings, we may see that these correlations are 1 or near 1 (> 0.80), and in any case very far from the threshold (0.30) suggested for splitting the items into different dimensions. Looking at dimension LA, the items with the highest loadings (LA07, LA06, LA11, LA17) may represent the psychological side, while the items with lower loadings (LA15, LA02, LA03, LA10) may represent the physical side of the dimension. However, the evidence does not suggest that these two traits of the dimension should be considered separately. As for MFA, in this case as well we may consider the absence

Results and discussion
Item difficulties and their standard errors estimated from the final models are shown in Tables 4, 5, 6, and 7 (in Appendix). They are of fundamental importance to confirm the validity of the proposed measurement scales. Figure 2 illustrates the Wright map, which compares the difficulties of the items with the persons' attitudes, expressed on the same scale unit. The order of the item difficulties is important: if it is in line with theoretical expectations, it confirms the validity of the scale. The easiest items to endorse are PI06 "I would consider buying MP food" and PI03 "There is a strong likelihood that I will try this product if seen in a Farm shop/Mountain hut", as they deal with the mere possibility of buying MP products if people encounter them by chance. Items PI01 and PI02 are somewhat more difficult (difficulty around -1), as they investigate the strong possibility of buying these products. Then follow the items regarding the possibility of buying MP products "if seen" in a specialized shop (easier, -0.06) or at the supermarket (more difficult, 0.67). The most difficult items to endorse are, in ascending order, PI08 "seek out this product in a Farm shop/Mountain hut", PI10 "seek out this product in Food store specialties", and PI09 "seek out this product in a Supermarket" (the hardest, with difficulty 2.14). The results of the model are consistent with what we could expect from a theoretical point of view for this dimension as well.

Mountain attractiveness, MA
The easiest item to endorse, rather distant from the others with a difficulty of -3.27, is MA05 "the mountains have beautiful/attractive scenery", whose interpretation is straightforward. At difficulty -1.70 on the Rasch scale is item MA01 "mountains have a unique eco-environment". At average difficulty we find MA04 "local specialties and souvenirs" and MA03 "historical vestiges". Then come the hardest items to endorse: MA06 "well-developed environment management" and MA07 "well-appointed accommodation and restaurants", at a level around 1, and lastly MA02 "special village culture" and MA08 "good services and facilities". For this dimension as well, the item ordering that comes from the Rasch model is plausible.

Mountain food attractiveness, MFA
The easiest items to endorse are MFA04 "mountains provide traditional food culture", MFA02 "mountains provide delicious food", and MFA03 "I will say positive things about MP food". Then we observe MFA01 "mountains provide a rich food culture" and MFA10 "I will recommend MP food". Items MFA09 "the mountains, as a tourism destination, provide unique food", MFA06 "I would like to (re)visit the mountains to explore different local foods", MFA08 "I would like to come back to the mountains to enjoy MP food", and MFA07 "I would like to travel to the mountains for food tourism" show average difficulty. The hardest item to endorse (with difficulty 1.75) is MFA05 "the mountains, as a tourism destination, provide diverse food". Finally, in all dimensions the average attitude of persons is well above the average difficulty of the items, which by default is set to zero in Rasch models (measures in Rasch models are on an interval scale, so their origin is conventional and set equal to the average item difficulty). This means that, for all the dimensions of interest, the items chosen to measure the persons' attitudes are too easy to endorse: indeed, Fig. 2 shows at the top left many participants whose maximum measure coincides with an extreme score (in this case, persons who answered 5 to all items). These persons did not find, among the items, any that could match their attitude. In view of this, for future research in this area, we suggest selecting more difficult items than those considered in this paper, so as to adequately measure all respondents.
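The two conventions mentioned above (an origin fixed at the average item difficulty, and extreme scores that no item can match) can be illustrated with a minimal Python sketch. The item names, person identifiers, measures, and responses below are hypothetical and are not taken from the study; the function names are ours.

```python
# Hypothetical illustration: none of these values come from the paper.

def recenter(item_difficulties, person_measures):
    """Shift items and persons by the same constant so that the mean item
    difficulty is zero. Differences (person - item) are left unchanged."""
    shift = sum(item_difficulties.values()) / len(item_difficulties)
    items = {k: v - shift for k, v in item_difficulties.items()}
    persons = {k: v - shift for k, v in person_measures.items()}
    return items, persons

def is_extreme(responses, min_cat=1, max_cat=5):
    """True when a person chose the lowest or the highest category on every item."""
    return all(r == min_cat for r in responses) or all(r == max_cat for r in responses)

# Hypothetical logit values on an arbitrary origin.
raw_items = {"item_a": -1.2, "item_b": 0.3, "item_c": 2.4}
raw_persons = {"p1": 3.0, "p2": 1.0}
items, persons = recenter(raw_items, raw_persons)

# Hypothetical 5-point responses: p1 and p3 are extreme scores.
answers = {"p1": [5, 5, 5], "p2": [4, 5, 3], "p3": [1, 1, 1]}
extreme_ids = [pid for pid, resp in answers.items() if is_extreme(resp)]

print(items)
print(extreme_ids)
```

Because persons and items are shifted together, the person-item differences that drive the model are unaffected; only the conventional origin moves.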
One final aspect to analyze is Differential Item Functioning (DIF). DIF investigates the potential interactions between items and sample characteristics. In practical terms, we want to check whether the difficulty of endorsing the items differs between Sex or Age groups, or according to other individual characteristics. If the difference is statistically significant and relevant, it is advisable to split the item: in practice, if males and females differ in the difficulty of endorsing one item, we may duplicate the item's column, setting one copy to missing for males and the other to missing for females, so as to estimate two different difficulties for the same item, one for each category.
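The item-splitting step described above can be sketched as follows. This is a minimal illustration under our own assumptions: the data layout (one dictionary per respondent), the column names, and the function name are hypothetical, and `None` stands for a missing response.

```python
# Hypothetical illustration of item splitting for DIF; not the authors' code.

def split_item_by_group(rows, item, group_key):
    """Replace `item` with one column per group level.
    Each respondent keeps the response only in the column of their own group;
    the columns of the other groups are set to None (missing)."""
    groups = sorted({row[group_key] for row in rows})
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != item}
        for g in groups:
            new_row[f"{item}_{g}"] = row[item] if row[group_key] == g else None
        out.append(new_row)
    return out

# Hypothetical responses on a 5-point scale.
data = [
    {"sex": "F", "item_x": 4, "item_y": 5},
    {"sex": "M", "item_x": 2, "item_y": 3},
]
split = split_item_by_group(data, "item_x", "sex")
print(split)
```

Estimating the model on the split data would then yield a separate difficulty for `item_x_F` and `item_x_M`, while all other items remain common to both groups.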
The DIF analyses were carried out for Sex, Age, Education, Province of residence, and Income, revealing that no DIF was present for these characteristics, with one exception. Indeed, the only significant DIF was found in dimension MA for the variable Age: the item MA02 "I think that the mountains have special village culture" shows, for the second Age class (25-34 years) (Fig. 3), a t-value greater than 3, while in most cases the t-values fall within the range (−2, +2). This item appears more difficult to endorse for persons in the 25-34 age class, meaning that young people do not place much value on "village culture", maybe because they are more interested in sport activities than in culture when they go to the mountains (the dimension MA regards "mountain attractiveness"). We decided not to split this single item; nevertheless, future work based on this scale should investigate this question more deeply.
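As an illustration of how a DIF t-value of this kind can be obtained, the sketch below uses a common formulation: the difference between two difficulty estimates divided by their joint standard error. This is an assumption on our part, since the exact statistic depends on the software used, and all numbers are hypothetical. The cut-off of 2 mirrors the (−2, +2) range mentioned above.

```python
import math

# Hypothetical illustration: a common DIF contrast statistic, not necessarily
# the one computed by the software used in the study.

def dif_t_value(d_group, se_group, d_reference, se_reference):
    """t = (difference in item difficulty) / (joint standard error)."""
    contrast = d_group - d_reference
    joint_se = math.sqrt(se_group ** 2 + se_reference ** 2)
    return contrast / joint_se

def outside_range(t, limit=2.0):
    """True when t falls outside the interval (-limit, +limit)."""
    return abs(t) > limit

# Hypothetical estimates (logits) for one item in one group vs. a reference.
t = dif_t_value(d_group=1.5, se_group=0.3, d_reference=0.3, se_reference=0.4)
print(round(t, 2), outside_range(t))
```

A positive t indicates that the item is harder to endorse for the group than for the reference; whether a flagged difference is also practically relevant remains a separate judgment.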
We also analyzed the average attitudes for the different sample characteristics. Calculations are based only on persons without extreme scores. Even though this further reduces the sample size, we preferred to exclude extreme observations since the model cannot measure their attitude. In what follows we comment only on statistically significant differences.
As regards Sex, for dimension LA, females show a lower appreciation (attitude 1.95 for females against 2.48 for males), suggesting a gender influence on label appreciation, at least with regard to the issues associated here with the Mountain Product label.
Looking at Age classes, the highest appreciation for dimension LA is observed in the middle classes, 35-44 and 45-54, with attitudes of 2.86 and 2.62, while the youngest classes show the lowest level (1.48 and 1.69) and the oldest classes an intermediate level (2.15 and 2.05). With respect to dimension PI, we observe a similar pattern, with the class 35-44 showing the highest level (2.86). These findings may be related: the middle age group includes people who are already permanently employed, perhaps in well-paid jobs, and therefore more inclined to buy food that has a higher value, including food with a label.
Regarding the level of Education, all dimensions show some significant difference between categories: having a relatively low level of education makes people more sensitive to all dimensions.
As regards the Province of residence, no significant difference is observed between provinces in any dimension.
Finally, with respect to Income classes, a difference is observed only for dimension PI, with a lower purchase intention in the lowest class: this can be explained by the generally high prices of mountain products.
Given the low number of persons in the selected samples, these differences must be interpreted with caution.

Conclusion
This research aimed to analyze people's attitudes towards mountain food and the EU label, as well as their area of origin, i.e. the mountains, thus enriching the understanding of these topics with useful information for designing effective promotional activities.
The selected sample and items satisfy quite well all the conditions required for a good fit of the Rasch model, the methodological tool used in our research, allowing us to state that the measures obtained for the persons selected with the procedure described above satisfy the fundamental property of Specific Objectivity, typical of the Rasch model.
The results show that people are relatively more aware of some specific issues. As regards the label, we found that people easily associate safety and tastiness with it. However, its association with other quality characteristics such as flavors, variety of ingredients, nutritional aspects etc., is less straightforward, and even more difficult with respect to the hygienic and healthy aspects of food. The hardest items to endorse are all related to the psychological effect of the MP label ("makes me feel happy" etc.). In the light of these considerations, the promotion of the MP label should rely primarily on the broad concepts of safety and tastiness (for example, products produced in a healthy environment, which gives them good flavor), and to a lesser extent on "good nutrition", "variety of ingredients", and "high standard of quality". More intangible connotations of the consumption experience, and in particular its psychological dimension, may have less communicative and evocative power: for this reason, their integration in communication and marketing strategies should be carefully weighed, and framed rather in the context of consumer education.
As regards purchase intention, people do not exclude a priori the possibility of buying MP products. Nevertheless, a random encounter with these products will not automatically result in a buying decision, and an active search for MP products is even less likely. In this sense, MP foods are perceived as geographically and locally embedded, as the shopping venues most easily associated with them are local farm shops and mountain farms, rather than specialty food stores or supermarkets in urban contexts.
We also found that people more easily associate mountains with positive environmental characteristics (beautiful scenery, cleanliness etc.) than with local products, even traditional ones (local specialties and souvenirs). The association with hospitality and services, such as accommodation, restaurants etc., is even more difficult.
Finally, the results show that people are quite aware that mountains provide delicious foods and traditional food culture, and therefore people will recommend them. By contrast, variety and diversity do not seem to be the strengths of mountain foods. Furthermore, it seems to be difficult to associate food with visiting the mountains, probably because people do not consider it necessary to go there to experience a specific product.
The disaggregation of the analysis by sociodemographic characteristics provides further insights to plan and target promotional activities. As regards gender, we found differences between males and females only in label perception, with the latter apparently less sensitive to it. Age affects both label perception and purchase intention, with middle-aged people showing higher levels of both. The level of education seems to affect all dimensions: having a relatively low level of education makes people more sensitive to mountains, mountain food and its specific label, and increases purchase intention. Finally, income only influences purchase intention, which is lower in the lowest income class, probably due to the generally high prices of mountain products.
To conclude, we argue that using the results of this research to plan promotional activities can effectively contribute to enhancing the value of mountain foods, raising awareness of the EU label, and thus improving the sustainability of mountain farms and regions. Our findings are also relevant regarding the appropriateness of the psychometric methodology (Rasch model) used in the analysis of the latent dimensions related to these topics. From the point of view of the validity of the instruments used, to adequately measure all persons, future research should focus on other items, more difficult to endorse than those considered in this paper, since persons with extreme scores do not contribute to the estimation of Rasch models and their attitude can be estimated only under additional hypotheses. Although the sample size and its composition proved to be quantitatively and qualitatively appropriate for the scope of our study, the replication of the investigation in different geographical areas may allow for a broader generalization of the results. For this reason, further research should be conducted in other regions for comparison, both in Italy and in other EU countries.
Finally, from the statistical point of view, the selection of the persons who fit the Rasch model, with valid measurement scales like the one adopted here, may lead to results for causal-effect models between the dimensions involved that are quite different from those obtained by methodologies that do not perform this preliminary Rasch-based selection. This aspect, however, is best left to future work.