Communicating epidemiological results through alternative indicators: Cognitive interviewing to assess a questionnaire on risk perception in a high environmental risk area

Abstract
Participatory approaches to environmental research and decision-making require that all social stakeholders are involved from the onset of the debate. In such a setting, communication among different areas of expertise is crucial, but language and technicalities may represent a barrier. In the clinical setting, decisions regarding treatment preferences may be influenced by the summary statistics used, but, according to the literature, no study has compared different statistical indicators for risk communication in environmental epidemiology. In this paper, we report on the qualitative results of the cognitive interviews conducted for assessing two questionnaires devoted to investigating risk perception when selected epidemiological results are communicated, by using different statistical indicators of health impact and uncertainty. The initial questionnaires were tested on 15 people residing in the high environmental risk area of Livorno (Italy). Cognitive interviewing led to substantial revision of the initial drafts. Moreover, it highlighted the difficulty of communicating statistical uncertainty and the need to account for the complex interaction between mathematical skills, affective factors and individual a priori knowledge on environmental risk perception.


PUBLIC INTEREST STATEMENT
Understanding how individual risk perception may vary depending on the different ways used to communicate the results of environmental and health research is crucial to empower citizens and promote their informed participation in decision-making processes. We thus designed a randomized study to investigate whether individual risk appraisal changed when the same epidemiological results were communicated using different epidemiological measures of health risk and statistical uncertainty. The target population was the population residing in the area of Livorno (Italy), classified since 1995 as a high risk environmental site, due to the presence of polluting industries and a large commercial harbor. In this paper, we report on the results of cognitive interviews conducted in a pilot phase of the study to validate the questionnaires subsequently used in the experiment. The cognitive interviews led to substantial revision of the initial questionnaires. They highlighted the important role of mathematical skills, affective factors and individual a priori knowledge on environmental risk perception, understanding and appraisal.

Introduction
Since the 1980s, the growing hazards posed by industrial sites, the emerging need for environmental equity, greater transparency and direct public participation in environmental policies have shifted risk communication strategies from a top-down approach, where the public was a passive component of the process, to a participatory approach (Boholm, 2008; Leiss, 1996; Otway, 1987). The right to be informed, as stated by the Seveso Directive, has become the right to participate (Commission of the European Communities, 1990, 1997, 2012; Marchi, Funtowicz, & Pereira, 2001).
According to Habermas' theory of communicative rationality, risk communication is a dialogic and deliberative process where all participants are actively involved to improve the message (Habermas, 1985; Jovchelovitch, 2007; Krauss & Fussell, 1996; Renn, 2004). Risk understanding is an inter-subjective process which is "continually modified by acts of communication" (Wardman, 2008, p. 1627). In this perspective, a top-down communication approach based on technocratic expertise is inefficient (Renn, 1998). On the contrary, a participatory approach allows for the inclusion of collective values and preferences in risk management and communication (Löfstedt, 2005; Stern & Fineberg, 1996). When all social stakeholders (people, government, scientists and experts, corporations and media) are involved from the onset of the debate, each area of expertise, value and perspective is shared and taken into account within a multi-actor process oriented to produce health and environmental policies and decisions (Bennett & Calman, 1999; Brydon-Miller, 2003; Leung, Yen, & Minkler, 2004; Renn, 1998, 2004; Wardman, 2008). This results in the enhancement of democratized research practices through implementation of co-production strategies and co-operative inquiry techniques (Cataldi, 2014; Reason & Bradbury, 2001). In this framework, social science has a public engagement role (Burawoy, 2005; Karnieli-Miller, Strier, & Pessach, 2009). Implementing participatory risk communication strategies is a challenging task because of the complexity of risk perception. According to the psychometric paradigm, risk perception is multi-dimensional and depends on individual attitudes and factors, e.g. voluntariness, dread, knowledge, hazard controllability, familiarity (Fischhoff, Slovic, Lichtenstein, Read, & Combs, 1978; Slovic, Fischhoff, & Lichtenstein, 1984).
The socio-cultural approach suggests that media portrayals of hazards and risk, public perceptions, social roles and networks can also affect risk perception (Bouyer, Bagdassarian, Chaabanne, & Mullet, 2001;Scherer & Cho, 2003;Sjoberg, 2000). The important analysis of Douglas and Wildavsky (1982) showed that risk perception is affected by social orientations such as egalitarianism, individualism, hierarchism, and fatalism.
The work reported in this paper represents the preliminary phase of a research project on risk communication conducted in Livorno, a city of 159,431 inhabitants located in Tuscany, Italy. This research, which was conducted according to a participatory perspective, originated from the need of the citizens to be informed about the environmental and health risks related to the presence of high environmental impact industries and a large commercial harbor in the city area. The present paper focuses on the methods and qualitative results of cognitive interviews conducted during the initial phase of the project, in order to assess two questionnaires to be subsequently administered to a large sample of the local population within a randomized experiment (see Section 2). The questionnaires were designed to investigate risk perception when selected epidemiological results were communicated using different statistical indicators of health impact and uncertainty.
After presenting the context in which the project arose and the research questions guiding the first qualitative phase of the study (Sections 2 and 3), we explain the alternative indicators employed in the questionnaires (Section 4). Then, we describe the structure of the preliminary drafts (Section 5), the modality of recruitment of people participating in the cognitive interviews and the cognitive interview process (Section 6). Finally, the qualitative results of the interviews are reported and discussed (Sections 7 and 8). In the last section of the paper we briefly draw our conclusions (Section 9).

The local context
As briefly explained in the introduction, the city of Livorno is characterized by the presence of high environmental impact industries and a large commercial harbor. In particular, the northern districts of Livorno and the neighboring municipality of Collesalvetti (16,791 inhabitants) are characterized by high population density and social housing; they host a petrochemical plant, an incinerator and several highly polluting industries for waste treatment. In 1995 the Livorno-Collesalvetti area was classified as a high risk environmental site according to the Seveso Directive, but, despite the acknowledged environmental pressure, private houses and industries have continued to coexist up to the present day (Commission of the European Communities, 1990, 1997, 2012). Moreover, no effective environmental requalification plan has been implemented and, recently, new impacting interventions were proposed by the local government to reinforce the activities of waste treatment: the building of a regasification plant, a biomass power plant, a landfill and a second incinerator. Faced with this political agenda, the local community rose up, with the creation of citizen committees that lodged complaints, organized strikes and petitions against the polluting industrial activities, accusing local authorities of hiding data and avoiding the involvement of the population in the decision processes. In response to these actions, in 2007 the regional government funded a project aimed at implementing a participatory and inclusive approach to define strategies for sustainable development of the area (http://www.provincia.livorno.it/altri/ambiente/progetto-parteciparia/). However, in a climate of growing distrust of institutions, the committees repeatedly asked epidemiologists to directly communicate their results on health risks in the area to the population.
These requests inevitably confronted the "experts" with the need to consider the impact of their scientific communications on people's risk perceptions.
As we discussed above, the way data are presented, including the statistical indicators used to express health risks, can influence risk perception and/or create misunderstandings deriving from the discrepancy between the rationality of lay people and the rationality of experts. Most literature on the use of risk indicators is focused on informed individual clinical decisions, and studies have shown that decisions regarding treatment preferences may be influenced by the summary statistics used (Akl et al., 2011; Feldman-Stewart, Kocovski, McConnell, Brundage, & Mackillop, 2000; Halvorsen, Selmer, & Kristiansen, 2007; Misselbrook & Armstrong, 2001; Nelson, Reyna, Fagerlin, Lipkus, & Peters, 2008; Paling, 2003; Sheridan et al., 2003; Sorensen et al., 2008). However, no data exist on risk perception when different statistical indicators are used in environmental epidemiology, particularly in Italy. Therefore, in the spirit of promoting effective communication strategies within a participatory perspective, in 2010 the Istituto Toscano Tumori funded our project titled: "Epidemiological and statistical approaches to Risk Communication in an area at high environmental hazard". The project, approved by the local ethics committee on 6 September 2010, aimed to collect information about environmental risk perception in Livorno and to experiment with different ways to communicate the epidemiological results to the population. A trial was planned on a sample of 600 residents in Livorno, randomized to reply to two different questionnaires, where two risk indexes and two statistical uncertainty indicators were introduced to express the same epidemiological results. The ultimate goal was to obtain useful insights for the development of future communication plans on local environmental health risks, accounting for people's judgment on clarity and understanding of epidemiological results, local knowledge and risk perception.
The project required the development of ad hoc questionnaires aimed to assess whether the residents' risk perception was influenced by the statistical indicators used to communicate the epidemiological data and to collect other relevant baseline information. The main items of the questionnaires were based on results concerning the actual health profile of the Livorno-Collesalvetti population, establishing a link between individual risk communication for patients and community risk communication.

Aim of cognitive interviewing and research questions
The social mobilization against industrial pollution in Livorno, as well as the awareness of a gap between lay and expert knowledge, motivated us to use co-operative inquiry techniques in order to incorporate lay people's points of view and perceptions in the questionnaires (Heron, 1996;Reason & Bradbury, 2001). Therefore, according to a participatory perspective, we preliminarily assessed the first draft of the questionnaires through qualitative cognitive interviews involving a selected sample of citizens.
Cognitive interviewing is a qualitative method which is frequently applied to develop questionnaires and used within multistage approaches aimed to pre-test survey instruments (Grant et al., 1999; Horwood, Pollard, Ayis, McIlvenna, & Johnston, 2010; Simon et al., 2012; Tourangeau, 1984; Willis, Royston, & Bercini, 1991). It makes it possible to identify how the respondents interpret and answer the proposed questions, and to detect possible errors in the items. Interpretation of words or phrasing, relevance of items and responses can be investigated in order to assess the feasibility of the questionnaires, heighten the validity of the results, and limit the non-response rate due to incorrect interpretation (Collins, 2003; Drennan, 2003).
In the present study we adopted cognitive interviewing to pursue two general and broad objectives: to problematize the distance between the participants' and researchers' language; and to reorient, where necessary, the researchers' initial considerations and the items of the questionnaires, also enabling the emergence of possible new categories of interpretation and new research hypotheses. These objectives translated into four main research questions:
(Q1) Is it possible to improve the items in order to make the statistical indicators more understandable?
(Q2) How should the questions be formulated in order to enable understanding of the concept of statistical uncertainty?
(Q3) What is the level of respondents' understanding when faced with scientific terms and sentences that the experts consider as "neutral" or "objective"? This question is more general and aimed to improve the overall suitability of the items, updating them according to suggestions and comments on each specific question and the questionnaires as a whole.
(Q4) Do we need to collect additional information in order to better capture issues relevant to risk perception?
Cognitive interviews were accompanied by unstructured interviews, designed to investigate the knowledge, feelings and attitudes of the respondents about general issues such as environmental hazards, involvement in advocacy groups, and trust in local authorities.

Statistical indicators
A preliminary analysis of the health profile of the population of Livorno from 2001 to 2006 supplied the epidemiological results used to formulate the main items of the questionnaire. The analysis was performed on current administrative data and compared mortality from specific causes or groups of causes in Livorno to Tuscany as a whole, according to standard methodologies (Biggeri et al., 2006; Martuzzi, Mitis, Biggeri, Terracini, & Bertollini, 2002). For each cause of death, we estimated the Standardized Mortality Ratio (SMR), i.e. the ratio between the observed number of deaths in Livorno during the period of interest (O) and the number of deaths that would have been observed in the same period if the risk in Livorno had been equal to the average risk in the region (E). We adjusted SMRs for age, gender and deprivation level. Then, we expressed the burden of mortality attributable to "living in Livorno" through two indicators:
• the percent excess of risk of death in Livorno with respect to Tuscany: % excess = 100 * (O − E)/E = 100 * (SMR − 1);
• the Time Needed to Harm (TNH), i.e. the average number of days one has to expect to observe one death in excess in Livorno, taking Tuscany as the reference: TNH = N/(O − E), where N is the total follow up duration, in days.
For example, in Livorno, during the reference period there were 620 deaths from cancer, corresponding to an SMR equal to 104.5%. This result was expressed in terms of a 4.5% excess in mortality from cancer, or in terms of one more death from cancer every 13 days. While the percent excess represents a relative measure of excess, the TNH is an absolute measure of excess. It should be noted that the TNH is a new statistical indicator developed within this project. It can be seen in some sense as analogous to the number needed to harm (NNH) used in a clinical context, which expresses the number of patients, on average, that have to be exposed to a risk factor/treatment, over a specific period of time, to cause harm in one patient who otherwise would not have been harmed. Comparisons between NNH and relative risk have been performed in clinical risk communication (Naik, Ahmed, & Edwards, 2012; Sheridan et al., 2003).
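The cancer example can be checked numerically with a short sketch. It assumes that the 620 deaths are a yearly count, so that N = 365 days; the function name and this reading of the figures are illustrative assumptions, not the project's own code.

```python
def excess_indicators(observed, smr, follow_up_days):
    """Return (expected deaths E, percent excess, TNH in days).

    observed       -- observed number of deaths O
    smr            -- standardized mortality ratio as a ratio (1.045 for 104.5%)
    follow_up_days -- follow-up duration N, in days
    """
    expected = observed / smr               # E, because SMR = O / E
    excess_deaths = observed - expected     # O - E
    percent_excess = 100 * (smr - 1)        # 100 * (SMR - 1)
    tnh_days = follow_up_days / excess_deaths   # TNH = N / (O - E)
    return expected, percent_excess, tnh_days


# Figures quoted in the text: 620 cancer deaths, SMR = 104.5%.
# Assumption: the 620 deaths are a yearly count, hence N = 365 days.
expected, percent_excess, tnh_days = excess_indicators(620, 1.045, 365)
print(f"E = {expected:.1f}, excess = {percent_excess:.1f}%, TNH = {tnh_days:.1f} days")
```

Under this assumption the sketch gives a 4.5% excess and a TNH between 13 and 14 days, in line with the "one more death every 13 days" wording. The TNH is only meaningful when O exceeds E, i.e. when the SMR is above 1.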
We also focused on the communication of statistical uncertainty around epidemiological results. We compared the usual way to express uncertainty in terms of a statistically "significant" or "not significant" result, on the basis of the p-value, with an alternative approach which quantifies uncertainty in terms of the q-value (Storey, 2003). In our specific context, each q-value could be interpreted as the posterior probability that the null hypothesis of no excess in Livorno was true, given the observed data.
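To make the step from p-values to q-values more concrete, the following sketch computes q-values under the simplifying assumption that the proportion of true null hypotheses is fixed at 1, in which case the q-value coincides with the Benjamini-Hochberg adjusted p-value. Whether and how that proportion was estimated in the project is not stated here, so the function below is a hypothetical illustration rather than the procedure actually used.

```python
def q_values_pi0_one(p_values):
    """q-values with the proportion of true nulls (pi0) fixed at 1.

    With pi0 = 1 this reduces to Benjamini-Hochberg adjusted p-values.
    Results are returned in the same order as the input.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Illustrative input: the four p-values quoted in question R5 of questionnaire A.
p = [0.02, 0.0001, 0.08, 0.32]
q = q_values_pi0_one(p)
for p_i, q_i in zip(p, q):
    print(f"p = {p_i:<7} q = {q_i:.4f}")
```

Because q-values depend on the whole set of tests considered together, applying the sketch to these four p-values alone is not expected to reproduce the q-values shown to respondents in questionnaire B.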

Structure of the initial questionnaires
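We developed two questionnaires, A and B, each containing 7 sets of questions:
R1-Risk perception
R2 and R3-Understanding of impact indicators
R4 and R5-Understanding of uncertainty indicators
R6-Comparing messages containing both impact indicator and uncertainty indicator
R7-Individual socio-demographic characteristics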
While sections R1 and R7 were common to both questionnaires, sections R2-R6 varied depending on the indicators used to communicate the epidemiological results: % excess risk and p-value in draft A, TNH and q-value in draft B. Through questions R2 and R3, participants were asked to express their judgment about selected epidemiological results on cause-specific mortalities, deriving from the analysis of the health profile of the population of Livorno. Questions R4-R6 were formulated in order to assign a degree of reliability to different epidemiological results when presented together with a measure of statistical uncertainty. These questions progressively introduced more difficult issues, in order to disclose the highest number of problems in understanding or answering.

R1-Risk perception
Question R1 on risk perception (not reported) was drawn from the Original 40-Item Domain-Specific Risk-Taking (DOSPERT) Scale 2002 (Weber, Blais, & Betz, 2002). The DOSPERT scale is a psychometric scale that assesses risk attitude and perception in five content domains: financial decisions, health/safety, recreational, ethical, and social decisions. It has been validated in a wide range of settings and populations, and it is considered particularly relevant in clinical situations (Harrison, Young, Butow, Salkeld, & Solomon, 2005). First, respondents rate the likelihood that they would engage in domain-specific risky activities; then, they rate the magnitude of the perceived risks associated with the same activities. In our questionnaire, we initially included only the risk perception scale and only the questions relative to the health/safety domain. The risk perception responses evaluated the respondents' gut-level assessment of the risk entailed by each activity/behaviour, on the basis of a 7-point rating scale ranging from 1 (Not at all) to 7 (Extremely Risky).

R2 and R3-Understanding of impact indicators
Question R2 was formulated in order to rate the individual concern about mortality from cancer in Livorno with respect to Tuscany as a whole, on a scale ranging from 1 (no concern) to 10 (extremely concerned), plus an opt-out category (Table 1). In question R3, we asked the interviewee to rate, according to her/his concern, three different epidemiological results concerning mortality from ovarian cancer, thyroid cancer and myocardial infarction in women. The results were reported in terms of percent excess risk in questionnaire A and in terms of TNH in questionnaire B. In questionnaire A, we also reported the total number of yearly deaths from each specific cause of death. This additional information is needed to derive the TNH from the percent excess, and thus ensured that the information provided by the two questionnaires was equivalent from a statistical point of view. Finally, participants were asked to explain the reasons for their response through an open-ended question (R3.1) (not reported).
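The equivalence between the two formats can be illustrated with a small sketch that derives an approximate TNH from the percent excess and the yearly number of deaths quoted in question R3 of questionnaire A, and then compares the two rankings. The helper name and the use of 365 days per year are assumptions made for illustration only.

```python
DAYS_PER_YEAR = 365  # assumption: yearly death counts, so N = 365 days

# (cause, percent excess, yearly deaths) as quoted in questionnaire A, question R3
results = [
    ("ovarian cancer", 25, 13),
    ("thyroid cancer in women", 60, 2),
    ("acute myocardial infarction in women", 6, 152),
]


def tnh_from_percent_excess(percent_excess, yearly_deaths):
    """Approximate TNH in days from the percent excess and observed yearly deaths."""
    smr = 1 + percent_excess / 100
    expected = yearly_deaths / smr            # E = O / SMR
    return DAYS_PER_YEAR / (yearly_deaths - expected)   # N / (O - E)


tnh = {cause: tnh_from_percent_excess(pct, deaths) for cause, pct, deaths in results}

# Most concerning first: largest percent excess vs shortest time to one excess death.
by_percent = [cause for cause, pct, _ in sorted(results, key=lambda r: r[1], reverse=True)]
by_tnh = sorted(tnh, key=tnh.get)

for cause in by_tnh:
    print(f"{cause}: one excess death about every {tnh[cause]:.0f} days")
print("Ranking by percent excess:", by_percent)
print("Ranking by TNH:", by_tnh)
```

With these inputs the two rankings are exactly reversed: thyroid cancer has the largest relative excess but the longest TNH, while myocardial infarction has the smallest relative excess but the shortest TNH. The computed values are close to, but not identical with, the figures in questionnaire B, presumably because the quoted percentages and death counts are rounded.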

R4 and R5-Understanding of uncertainty indicators
Items R4 and R5 were designed to investigate how respondents interpreted messages containing a description or a value of the statistical uncertainty associated with the impact indicators. In these questions, the level of uncertainty around selected epidemiological results was presented in terms of "significance"/p-values (questionnaire A), or in terms of q-values (questionnaire B) (Table 2).
Through posing question R4, we wanted to test how people dealt with the uncertainty indicators, without any introductory explanation about the concept of statistical uncertainty. The question consisted of a brief sentence reporting the result concerning the risk of death from melanoma in Livorno-Collesalvetti with respect to Tuscany as a whole. Then, the respondents were asked to rate the reliability of this result under three hypothetical scenarios corresponding to different levels of statistical uncertainty.
Through posing question R5, we investigated the individual level of confidence on the presented results, after a brief introduction which focused on p-value in questionnaire A and q-value in questionnaire B. Participants were asked to express their judgment about statements reporting both impact estimate and uncertainty indicator (that is, to express their judgment considering two different pieces of information simultaneously).
In both questions, the respondents could express their judgment about the reliability of the results on a scale ranging from 1 to 10, plus an opt-out category.

R6-Comparing messages containing both impact indicator and uncertainty indicator
Question R6 consisted of a comparative table, where number of deaths, impact estimates (RRs or TNH) and related uncertainty indicators (p or q values) were presented for 27 causes of death (not reported). Respondents were asked to rate the three most concerning illnesses. This table was introduced to gather information on people's ability to face a very complex task: to make a comparison among many diseases by considering three different pieces of information (risk indicator, uncertainty indicator, cause of death).

R7-Socio-demographic characteristics
Socio-demographic data were gathered at the end of the questionnaire in R7 (not reported).

Table 1
Question R2
Questionnaire A
Please state your concern about this result on a scale ranging from 1 (no concern) to 10 (extremely concerned)
Questionnaire B
From 2001 to 2006, we observed 1 more death from cancer every 13 days in Livorno-Collesalvetti in respect to Tuscany
Please state your concern about this result on a scale ranging from 1 (no concern) to 10 (extremely concerned)

Question R3
Questionnaire A
From 2001 to 2006, we observed the following results in Livorno-Collesalvetti:
(1) The risk of death from ovarian cancer was 25% higher than in Tuscany, with an overall number of 13 deaths every year;
(2) The risk of deaths from thyroid cancer in women was 60% higher than in Tuscany, with an overall number of 2 deaths every year;
(3) The risk of deaths from acute myocardial infarction in women was 6% higher than in Tuscany, with an overall number of 152 deaths every year.
Please, tick which result is the most concerning to you and which result is the least concerning to you
Questionnaire B
From 2001 to 2006, we observed the following results in Livorno-Collesalvetti:
(1) One more death every 4 months in respect to Tuscany from ovarian cancer;
(2) One more death every 14 months in respect to Tuscany from thyroid cancer in women;
(3) One more death every 46 days in respect to Tuscany from acute myocardial infarction in women.
Please, tick which result is the most concerning to you and which result is the least concerning to you

Table 2
Question R4
Questionnaire A
(1) Statistically significant, how reliable do you rate this result on a scale ranging from 1 (not at all) to 10 (completely)?
(2) Highly statistically significant, how reliable do you rate this result on a scale ranging from 1 (not at all) to 10 (completely)?
(3) Not statistically significant, how reliable do you rate this result on a scale ranging from 1 (not at all) to 10 (completely)?
Questionnaire B
From 2001 to 2006 in the Livorno-Collesalvetti area, we observed a higher risk of death from melanoma than in Tuscany.
(1) If I say that there is a 20% probability that this result does not represent a true higher risk of death, how reliable do you rate this result on a scale ranging from 1 (not at all) to 10 (completely)?
(2) If I say that there is a 5% probability that this result does not represent a true higher risk of death, how reliable do you rate this result on a scale ranging from 1 (not at all) to 10 (completely)?
(3) If I say that there is an 80% probability that this result does not represent a true higher risk of death, how reliable do you rate this result on a scale ranging from 1 (not at all) to 10 (completely)?

Question R5
Questionnaire A
In the Livorno-Collesalvetti area, in 2001-2006 we observed:
(1) A 13% excess of mortality from breast cancer in women in respect to Tuscany. This result is statistically significant (p-value = 0.02). How probable do you think that this excess is due to chance on a scale ranging from 1 (not at all) to 10 (completely)?
(2) A 68% excess of mortality from pleural cancer in men in respect to Tuscany. This result is highly statistically significant (p-value = 0.0001). How probable do you think that this excess is due to chance on a scale ranging from 1 (not at all) to 10 (completely)?
(3) A 100% excess of mortality from testis cancer in respect to Tuscany. This result is not statistically significant (p-value = 0.08). How probable do you think that this excess is due to chance on a scale ranging from 1 (not at all) to 10 (completely)?
(4) An 8% excess of mortality from bone cancer in men in respect to Tuscany. This result is not statistically significant (p-value = 0.32). How probable do you think that this excess is due to chance on a scale ranging from 1 (not at all) to 10 (completely)?
Questionnaire B
In the Livorno-Collesalvetti area, in 2001-2006 we observed:
(1) One more death every 2 months from breast cancer in women in respect to Tuscany, with a q-value of 20%. How probable do you think that this excess is due to chance on a scale ranging from 1 (not at all) to 10 (completely)?
(2) One more death every 3 months from pleural cancer in men in respect to Tuscany, with a q-value of 3%. How probable do you think that this excess is due to chance on a scale ranging from 1 (not at all) to 10 (completely)?
(3) One more death every 6 years from testis cancer in respect to Tuscany, with a q-value of 50%. How probable do you think that this excess is due to chance on a scale ranging from 1 (not at all) to 10 (completely)?
(4) One more death every 6 and a half years from bone cancer in men in respect to Tuscany, with a q-value of 52%. How probable do you think that this excess is due to chance on a scale ranging from 1 (not at all) to 10 (completely)?

Interviewing process and sample recruitment
Each participant had to complete both questionnaires A and B, which were alternately submitted as the first questionnaire to consecutive participants. The interview was structured into 4 parts:
Part 1-A brief explanation of the aim of the interview was provided. The interviewer read and explained to the interviewee the instructions reported on the first page of the questionnaire. The participant then filled in the first assigned questionnaire. After its completion, she/he was asked to highlight the difficulties encountered and to provide a general comment.
Part 2-As we needed to give the respondent a "break" before the administration of the second questionnaire, open questions were asked to explore her/his knowledge about local industrial hazards, trust in local authorities and involvement in local advocacy groups; the participant was also asked about personal experiences related to the health conditions presented in the questionnaire.
Part 3-Questions R2-R6 from the second questionnaire were submitted.
Part 4-Impressions about the general formulation of the two drafts were collected, asking which wording, format or statistical indicator was the most suitable or clear. A brief discussion was undertaken with the interviewee when inconsistent answers emerged in parallel questions from the two questionnaires. This allowed us to investigate mechanisms in answer generation and detect possible relevant factors affecting responses.
The interviewer was a trained sociologist with experience in qualitative interviews and risk perception studies. Questionnaires were handed to respondents. Both think-aloud and probing approaches, frequently used in cognitive interviews, were used (Beatty & Willis, 2007).
Each respondent was informed about the content and purpose of the survey before the interview in order to guarantee that decision to participate derived from informed judgment. Confidentiality and anonymity of the responses were guaranteed.

Sample
A sample of 15 individuals participated in the interviewing process. Participants were chosen through the snowball sampling technique, a form of non-probability sampling which relies on referrals from initial subjects to generate additional subjects (Biernacki & Waldorf, 1981). First, a small pool of initial participants was identified; then, these initial participants were used to nominate, through their social networks, other individuals who could be appropriate respondents. Snowball sampling does not produce a representative sample of the study population, but, in order to limit the selection bias, we used three different initial contacts who did not know each other. They were then asked to recruit five acquaintances with specific socio-demographic characteristics, including age, gender, education level, kind of occupation and level of activism in local committees.

Items revision
We did not use a formal classification of the problems emerging during the cognitive interviews, nor a quantitative scale for summarizing the results (Conrad, 1996). A detailed qualitative description of each interview was produced instead. Each interview was recorded with the participant's consent. A written report was completed by the sociologist after each interview, including information on the first questionnaire submitted (A or B), the duration of the interview, the socio-demographic characteristics of the interviewee, her/his degree of trust in media, as well as specific comments on single items and on the questionnaire as a whole. Participants' written notes were also taken into account. A final summary was then completed: individual and common themes were discussed and used in drafting the final version of the questionnaires.

Results
Fifteen respondents participated in the cognitive interview process (Table 3): nine males and six females, aged 37-74 years, with different educational backgrounds (from primary to university levels). Six participants were active in local environmental organizations. Time for completing the whole interview ranged from 50 to 70 min. In most cases, questions from the questionnaire administered first, regardless of whether it was A or B, took more time to complete than the same questions from the questionnaire administered second.
All participants were given a clear view of the aim of the interview and its relevance within the project. However, all subjects rated the questionnaires as very demanding, and most of them thought that the interview required complex reasoning and high mathematical competence. Some respondents were afraid of giving "wrong answers", although the interviewer had clearly explained before the interview that there were no right or wrong responses. Despite this, a maximum of 15 min was needed to complete the first questionnaire and only one person, a 74-year-old woman with a low educational level, was unable to complete it. As she appeared stressed and uncomfortable, the interviewer decided not to ask her to respond to a second questionnaire.
Cognitive interviewing provided relevant insights to address the issues summarized in the research questions Q1-Q4. In describing the evidence arising from the results, we referred to each specific research question when possible, but it should be stressed that Q1-Q4 are closely related to each other, thus justifying an overall discussion.
Question R1 was rated as clear in content. Suggestions emerged for rewording two items: "Having sex without protection" was turned into "Having occasional sex without protection" and "Buying drugs for personal use" was turned into "Using drugs". However, despite clarity, we observed that most respondents had difficulties in relating the proposed risks to their own situation and rated all items as risky (scores higher than six). For this reason, in the final questionnaires we decided to also add the DOSPERT attitude scale, in order to gain additional and more effective information.
Questions R2 and R3 provided some insight regarding research question Q1. In question R2, we asked the respondents to express their concern about mortality from cancer in Livorno compared to Tuscany (Table 1). In both questionnaires, R2 was rated as clear in content. At first glance, TNH was interpreted straightforwardly by most respondents. Those who preferred communication based on percent excess underlined that the temporal reference (annual deaths) and the percentage format were clearer and allowed an easier rating of concern. When directly asked to compare the clarity of the two indicators, some participants suggested that TNH should have been matched with the total number of yearly deaths, as in the corresponding question in questionnaire A: without a baseline reference, TNH alone was perceived as insufficient to evaluate the magnitude of the effect.
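To make the contrast between the two indicators concrete, the following is a minimal sketch of how the same excess mortality could be expressed either as a percent excess or as a TNH, read here as the time frame in which one excess death occurs. The function names, the 365-day year and the counts used are illustrative assumptions and are not taken from the study data.

```python
def percent_excess(observed, expected):
    """Excess deaths as a percentage of the expected (baseline) deaths."""
    return 100.0 * (observed - expected) / expected


def tnh_days(observed, expected, years):
    """Average number of days in which one excess death occurs.

    Returns None when there is no excess, since the indicator is
    then undefined. Assumes a 365-day year (illustrative choice).
    """
    excess_per_year = (observed - expected) / years
    if excess_per_year <= 0:
        return None
    return 365.0 / excess_per_year


# Hypothetical counts, NOT the Livorno results:
# 1100 observed vs 1000 expected deaths over a 10-year period.
pe = percent_excess(1100, 1000)        # 10.0 (percent)
tnh = tnh_days(1100, 1000, years=10)   # 36.5 (days per excess death)
print(f"Percent excess: {pe:.1f}%")
print(f"One excess death every {tnh:.1f} days")
```

The sketch also illustrates the remark made by some participants: the TNH value alone carries no information on the baseline number of deaths, which has to be reported separately.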
In question R3, we asked the interviewee to rate mortalities from three different diseases, according to his/her concern (Table 1). Overall, question R3 was considered understandable, but two main issues emerged:
• When communication was based on percent excess, most participants recognized the total number of deaths as the most correct indicator to rate the three diseases; only a few participants were driven in their judgment by the percentage values.
• In the final comments, most respondents considered TNH easier to interpret than percent excess: the time frame for one excess death was interpreted as clear information, with no need to keep in mind different data in the decision process, as in questionnaire A (percent value and total number of deaths).
Furthermore, the following points arose concerning the overall suitability of the item (Q3):
• Seven respondents stated that their answer was driven by their personal experience or their knowledge about the health effects of pollution, irrespective of the numbers and of whichever statistical indicator was used. Involvement in environmental advocacy groups strongly affected the response. Low trust in local agencies and fear of data manipulation were the main concerns.
• Some respondents underlined that using strongly different diseases (ovarian and thyroid cancer on one side and acute myocardial infarction on the other side) was misleading: cardiovascular diseases and cancer were perceived very differently with regard to etiology, prevention and care. In particular, cancer was considered highly associated with pollution, while myocardial infarction was mainly associated with genetics or individual lifestyle: "Acute myocardial infarction is less worrying because it is not caused by pollution" (Interview 8). Moreover, cancer was considered incurable and myocardial infarction curable. Therefore, many people rated cancer as more concerning than myocardial infarction, irrespective of the data.
• Some male participants emphasized that items should not address problems that are relevant to females only, like ovarian cancer: "Ovarian cancer is less concerning because it affects women only" (Interview 7).
Following the last two observations, in the final questionnaires we modified question R3 by reporting only results for cancers relevant to both sexes. Moreover, because personal experience or perceived knowledge of pollutants so strongly influenced the response, we replaced the open-ended question R3.1 with a closed-ended question on the possible reasons for the choice: the numerical data presented; personal knowledge or experience about the illness; personal knowledge about pollutants released in Livorno (Q4).
Question R4 asked respondents to assign a "reliability" score to the same sentence concerning mortality from melanoma in Livorno, when coupled with different levels of uncertainty expressed in terms of statistical significance (i.e. p-value) (questionnaire A) or q-value (questionnaire B) (Table 2). The phrasing of question R4 was very problematic in both questionnaires (Q2). Most participants needed to read the question several times and required extensive explanations before replying. In general, the wording "higher risk of death from melanoma than in Tuscany", without numerical quantification, was felt to be insufficient to frame the burden of mortality. Moreover, respondents did not understand that they were dealing with alternative hypothetical scenarios of statistical uncertainty (Q3). The difficulty of asking questions which require comparison of hypothetical scenarios is reported elsewhere in the literature (Jardine & Hrudey, 1997). Respondents emphasized that the "if" in the main sentence generated confusion. In fact, in the brief description of the study reported on the first page of the questionnaires, it was clearly stated that the epidemiological results used were those actually estimated for Livorno-Collesalvetti during the period 2001-2016. Thus, proposing three alternative scenarios, two of which were necessarily false, contradicted this initial information, generating ambiguities and producing stress in the respondents. As a consequence, question R4 was perceived as a double-barreled question or a trick question, and the interviewees had the uncomfortable feeling of being under scrutiny. Finally, because the question was perceived as "confusing", respondents fell back on personal beliefs: for melanoma, "all excesses are true because of the widespread habit of the inhabitants of Livorno of sunbathing without protection" (Interview 9).
Problems were also detected with regard to the response scale and the adjectives used to express the degree of confidence in the reported results:
• The word "true" in question R4 of questionnaire B prompted a binary reading of the question itself: are the results true or false? As a consequence, respondents tended to use the extreme values of the response scale.
• Some respondents interpreted "reliable" as a statement concerning the validity of the result or the way data were generated, and directly related this concept to their individual trust in the research agency which provided the results.
• The answering scale was used differently by participants. In particular, central values were used both by respondents who actually wanted to express a medium level of reliability and by those who were uncertain about the meaning of the question and did not know how to answer.
For all these reasons, we decided not to include question R4 in the final questionnaires.
In reporting epidemiological results to the population, the focus is usually on risk or impact indicators. One of the aims of our research was also to explore effective ways to communicate to lay people the statistical uncertainty affecting impact estimates (in terms of p-values in questionnaire A and q-values in questionnaire B), making risk communication multidimensional. Question R5 consisted of four statements in which we reported, at the same time, both impact and uncertainty measures concerning different causes of death. Interviewees were asked to express their judgment about the probabilities that four different results were due to chance. An explanatory note on p-values and q-values preceded question R5 (not reported). The wording of this item was rated as complex by the respondents (Q2). Many answers were influenced by personal experiences and focused on extreme values of the answering scale: "all excess are true because Livorno is a polluted area" (Interview 7). However, important observations arose which guided us in rewording the question in a more appropriate way. During the interview process, question R5 was reformulated several times and presented in alternative versions to the respondents. In the final questionnaires, the explanatory note about uncertainty was made clearer, question R5 was completely reworded and the answering scale was reduced to a four-point scale, plus an opt-out choice. A verbal description of the score was added to each point of the scale.
The comments of the respondents on questions R4 and R5 highlighted that q-values were initially considered less intuitive. However, after the explanation given by the interviewer (following the explanatory notes reported in the questionnaires), most participants perceived them as clearer than p-values. The assigned scores appeared more graduated when uncertainty was expressed in terms of q-values, with the whole range of the answering scale being used. By contrast, p-values seemed to induce a true/false interpretation of the communicated results (Q2).
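For readers unfamiliar with the distinction, the following is a minimal sketch of how a set of p-values can be turned into q-values. The text above does not state how the q-values in questionnaire B were computed; as an assumption, the sketch uses the Benjamini-Hochberg adjustment, one common route to a q-value, and the function name and p-values are purely illustrative.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values (often reported as q-values).

    Assumption: this is one common definition, not necessarily the
    one used in the questionnaires described in the text.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for four causes of death (not study results).
p = [0.01, 0.04, 0.03, 0.20]
for p_i, q_i in zip(p, bh_q_values(p)):
    print(f"p = {p_i:.2f}  ->  q = {q_i:.3f}")
```

Under this definition a q-value is never smaller than the corresponding p-value, because it accounts for the number of results examined together rather than for each result in isolation.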
Regarding question R6, which consisted of a very complex comparative table (not reported), most respondents based their answers on one measure only, being unable to use the three figures (number of deaths, impact indicator and uncertainty) jointly without contradiction. As in previous questions, scoring was driven by individual experience, especially when people felt they could not grasp the required reasoning. Again, involvement in environmental groups and committees strongly affected data interpretation. As expected, respondents felt uncomfortable with question R6, which thus appeared clearly unsuitable for use in the larger randomized experiment, despite its utility during the cognitive interviewing process. Question R6 was therefore excluded from the final questionnaires.

Discussion
As cognitive interviewing is considered particularly useful where there is uncertainty about respondents' understanding of concepts and questions, we considered it an appropriate approach for pretesting the feasibility of our initial drafts and for understanding which factors might affect interpretation of questions, validity of answers and nonresponse (Beatty & Willis, 2007; Drennan, 2003). Cognitive interviewing is a method of questionnaire testing from the participant's point of view, as respondents are asked to explain the problems they face and to suggest solutions for item design (Carbone et al., 2002; Drennan, 2003). Therefore, cognitive interviewing gave us the opportunity to share information with the participants and to extend our insight into more general factors relating to the local context, which could affect the answering process. Cognitive interviewing also allowed us to include lay people's suggestions, requirements and language in the questionnaires, in accordance with a participatory approach to risk communication.
The results of the cognitive interviewing process led to a radical revision of the initial questionnaires designed by the experts, in response to the problems addressed by the participants. At the same time, they brought out interesting points which deserve to be discussed.
The relevance of the questions was challenged by participants. Male participants claimed to be less interested in, or unable to answer, the question on ovarian cancer, because it concerns females only. This result stresses the importance of paying attention when proposing gender-specific or disease-specific items (Hay et al., 2014). More generally, it confirms that people are more prone to engage in the answering task if they feel that the proposed issue is relevant to them (Fortune-Greeley et al., 2009; Morgan, Amtmann, Abrahamson, Kajlich, & Hafner, 2014; Thompson et al., 2011).
Inconsistencies in word interpretation emerged. For example, the Italian words for "true" and "reliable" are used in plain language, but their meaning was not univocal among participants, resulting in different interpretations of the same question and different uses of the answering scale. Ambiguity of terms, inconsistent interpretation of words or ignorance of their meanings are the most frequent issues reported during cognitive interviews for questionnaire design, and may occur even when plain language is used (Carbone et al., 2002; Hay et al., 2014; Horwood et al., 2010; Sherman et al., 2014; Watt et al., 2008). Word choice, both in items and in response options, is a key step to ensure that a question is homogeneously interpreted by the interviewees and to guarantee substantial consistency with the original intention of the researcher (Carbone et al., 2002; Horwood et al., 2010; Irwin, Varni, Yeatts, & DeWalt, 2009; Myers & Newman, 2007). Sometimes, even minimal changes in wording can improve participants' understanding of questions (Johnson & Slovic, 1995). When interpretation problems arose, we modified the questionnaire items. However, because of economic constraints, we could not perform multiple rounds of cognitive interviews in order to check whether our item reformulation reduced the initial inconsistencies.
The feedback from the interviewees on the reported statistical results was not homogeneous. Some people reacted by calling for "less numbers". For instance, one respondent explicitly stated that, in question R3, TNH was clearer because "risk was conveyed by one single number". On the other hand, others asked for additional numerical information, even if not strictly needed to understand the data. Both these reactions clearly indicate that statistical sufficiency does not correspond to informative sufficiency.
Our results on questions R4, R5 and R6 confirm that statistical uncertainty is a very difficult issue to communicate (Johnson & Slovic, 1995, 1998; Williams, 2004). The wording of these questions reflected a style which was usual for epidemiologists and statisticians but unfamiliar to lay people. The respondents had difficulties in evaluating alternative hypothetical scenarios linked to increasing degrees of uncertainty (R4), and in expressing their judgment on the basis of multiple pieces of information about impact and statistical uncertainty (R5 and R6). Selective attention, that is, the inability to judge more than a single item of information at any one time, was widely observed during the interviews: most participants were unable to use number of deaths, effect and uncertainty measures jointly, grounding their judgment on only one of these items. This does not mean that we must abandon the idea of multidimensional risk communication. On the contrary, cognitive interviews allowed us to figure out what was wrong in the structure of the questions (e.g. the use of hypothetical scenarios), to avoid words that could generate ambiguity ("true", "reliable"), and to simplify the answering scale. We received important suggestions to improve the quality of the language used in our questionnaires. From a more general perspective, this indicates that involving lay people in designing risk communication is crucial.
Our findings also suggest the importance of investigating mathematical literacy as a factor which can have a strong influence on personal decision processes (Gigerenzer & Edwards, 2003; Manganello & Clayman, 2011; Nelson et al., 2008; Reyna et al., 2009). In order to better address the comparison between TNH and percent excess, which was the main objective of the subsequent randomized experiment, a simple 3-item scale was introduced in the final questionnaires to assess the basic numeracy comprehension of the interviewee (Schwartz, Woloshin, Black, & Welch, 1997).
Because of the complexity of communicating environmental risk to the resident population, factors other than strict cognitive processes affected item understanding and answering (Slovic, Finucane, Peters, & MacGregor, 2007). We found that personal experiences of illnesses, a priori knowledge of the risk factors, trust and involvement in advocacy groups potentially had a strong influence on the interpretation of the epidemiological results and on related concern (Covello, 1989; Covello & Sandman, 2001; McComas, 2006; White et al., 2004). For example, the different perception of the etiology and curability of cancer versus cardiovascular diseases, already reported in the literature, was clearly underlined by the respondents (Wang et al., 2009). Furthermore, environmental activists stated their position before answering, and it was difficult to redirect their reasoning to the numbers that had been presented.
Finally, we noticed that the effect of personal beliefs on the responses was stronger in people who experienced greater difficulty in understanding questions or tasks required for answering. This is in line with the literature on medical decision-making, which provides evidence that low numeracy comprehension affects risk perception through increased susceptibility to affective factors (Nelson et al., 2008; Peters, 2008; Peters et al., 2006; Reyna et al., 2009).

Conclusions
The work reported in this paper is part of a more general project that aims to investigate how citizens' risk perception varies according to the indicators of impact and statistical uncertainty used to communicate specific epidemiological results concerning the health profile of the population in a high environmental risk area. The project as a whole, as well as the cognitive interviews described in the present paper, has been conceived within a participatory perspective. Adopting participatory approaches allowed for the involvement of the population from the onset of the risk communication process, enhancing its efficacy and accountability. Moreover, such approaches can improve community empowerment, enabling people to better represent their interests.
From a practical point of view, communication or study tools can be tuned and refined on the basis of lay people's points of view. In our study, the use of cognitive interviewing helped us to perfect and debug two questionnaires, subsequently administered to a larger random sample of inhabitants, by incorporating interviewees' suggestions and requirements. In fact, the initial drafts of the questionnaires, formulated by experts, reflected their ideas regarding the best way to ask the relevant questions to the population, without accounting for lay people's expertise and perspectives. For this reason, these drafts could not be considered optimal for collecting information and measuring risk perception in the population before they had been confronted with the community's standpoint.
The cognitive interviews confirmed the need to change part of the initial questionnaires and to introduce a few new sections. The qualitative results of the cognitive interviews highlighted problems related to the meaning and interpretation of some words, and the importance of a careful selection of the causes of mortality compared in the questionnaires, in order to avoid biases related to gender and a priori beliefs about disease severity. Interesting issues also arose regarding the influence of personal experiences and concerns on risk perception and the crucial role of mathematical skills. Finally, we ascertained that it was quite difficult for interviewees to reply to questions presenting uncertainty measures alongside health impact measures.
As widely discussed in the previous section, our findings partly confirmed the results of previously published qualitative and quantitative studies.
In conclusion, our study is an example of how cognitive interviewing can be used to integrate lay people's standpoints in defining study tools/questionnaires conceived by experts. This kind of approach is particularly recommended when sensitive issues, such as community risk perception, are investigated. In the future, it would be advisable to implement procedures for the co-building of questionnaires by lay people and experts, through inclusive methodologies such as those developed in participatory action research and in co-operative inquiry (Baum, MacDougall, & Smith, 2006; Chevalier & Buckles, 2013; Heron, 1996; Reason & Bradbury, 2001).