Quality of epidemiological studies: Procedural rules for uncertain science for policy, a case study on bisphenol-A

https://doi.org/10.1016/j.envsci.2018.03.010

Highlights

  • Qualichem_epi allows collective quality assessment of epidemiologic studies.

  • Qualichem_epi reflects both majority and minority judgments in a group.

  • Expert judgments about quality are heterogeneous even using the same criteria.

  • Group consensus can be an artifact of procedural rules in health agencies.

Abstract

This paper proposes a method for in-depth mapping of heterogeneity in expert judgment in the evaluation of the quality of epidemiological studies used in regulatory chemical risk assessment. Whereas consensus in scientific advisory groups provides legitimation for subsequent political action, it can also have unintended effects on the quality of regulatory risk assessment.

Based on empirical testing of our method, called Qualichem_epi, with ten experts and two epidemiological case studies about bisphenol A (BPA)’s effects on human health, we show that expert judgment plays an essential role in managing uncertainty and deciding what the “quality” of a study actually means. We found substantial heterogeneity in scientists’ judgments about the quality of epidemiological studies, even when the same criteria were used for the assessment. This heterogeneity is no longer present in reports produced by expert groups, where results are presented under the collective signature of all the scientists involved. We argue that flattening heterogeneity can be an important problem when it is not the result of true scientific agreement but only a secondary effect of the consensus-based working procedures of agencies that experts have to follow.

Qualichem_epi provides an easy-to-understand, color-based picture of both majority and minority opinions in a scientific advisory group. We suggest that it could be used on a regular basis for communicating quality assessments of epidemiological studies in regulatory chemical risk assessment.

Introduction

What counts as valid “evidence”, along with the implicit and explicit criteria used to appraise its quality, is the crux of the scientific advisory process. In practice, however, these may differ widely among scientists and expert groups active at the science-policy interface. For example, Beronius et al. (2010) compared ten risk assessments produced by expert groups in the European Union (EU), the United States (USA), Canada and Japan. Seven of them (published between 2002 and 2008) found no risk to the general population. One, produced by the Chapel Hill experts who gathered in 2006 at a meeting sponsored by the National Institute of Environmental Health Sciences (NIEHS) and the US Environmental Protection Agency (US EPA), concluded that there is a risk to the entire population at current exposure levels. The remaining two committees published their results in 2008 and expressed concern about some risks, primarily to fetuses and infants.

These divergent conclusions are characteristic of the controversy about the effects of endocrine disrupters on human health and the environment. The core of the disagreement concerns the health effects of small doses of chemicals, potentially following nonmonotonic dose-response patterns, which might act on the endocrine system and hence affect a wide range of body functions over the long term (e.g., exposure of mothers leading to lifelong effects on the unborn child) (Vandenberg et al., 2009). In a regulatory framework, two problems fuel the controversy: such effects might be hard to capture with standardized OECD testing protocols, which nevertheless have the status of “recognized evidence” (Maxim and Van der Sluijs, 2014), and the definition of endocrine disrupters is debated (Horel and Bienkowski, 2013).

More recent assessments of BPA risk by ANSES (Agence nationale de sécurité sanitaire de l’alimentation, de l’environnement et du travail) and EFSA (European Food Safety Authority) reveal the same pattern: ANSES (2013) concluded that “…handling thermal paper receipts leads to risk situations for the four types of effects considered: mammary gland, brain and behaviour, the female reproductive system, metabolism and obesity.” (p. 6) Two years later, EFSA (2015) published the opposite conclusion: “there is no health concern for any age group from dietary exposure and low health concern from aggregated exposure” (p. 1).

Different scientists may evaluate a given published study (or even raw data) as having very different quality and relevance (Rudén, 2001). Similarly, Maxim and Van der Sluijs (2014) showed that disciplinary background can influence expert judgment about the quality of a scientific article on BPA. What is considered a “good paper” can differ greatly between specialists in endocrinology and scientists trained in other disciplines. Furthermore, at the science–policy interface, scientific quality can be assessed according to regulatory standards, such as OECD’s standardized protocols, which can come into conflict with academic standards (Demortain, 2013; Maxim and Van der Sluijs, 2014). Van der Sluijs and Van Eijndhoven (1998) investigated variability in the climate risk assessment reports of expert groups in the Netherlands during the pre-IPCC period. They found remarkable differences in the conclusions of two expert groups despite a large overlap in composition. They showed that the context in which the experts operated, and the commitments they made in each setting, were key factors in explaining the variability in framing, judgments and conclusions. In practice, an expert group can introduce a lot of flexibility into its argumentative strategy when new scientific data or new practical situations arise. Van Eijndhoven and Groenewegen (1991) showed that despite the availability of scientific data that may call for a change in the assessment, the context can drive expert groups to stick to their former conclusions, whereas other conclusions can be constructed from the same data if the context changes.

For highly complex issues such as chemical risks, experts must reach a conclusion despite uncertainty, and expert judgment is used to fill in the gaps (Van der Sluijs et al., 2008). How uncertainty is dealt with in a particular context and by a particular group of scientists determines what is considered to be “good science” (and “evidence”), which has the political function to decide if the risk is “acceptable” or not (Jasanoff, 1990).

The essential role of uncertainty in science for policy has long been recognized and extensively addressed in the field of post-normal science (see Strand, 2017 for a recent overview). Funtowicz and Ravetz (1993) showed that present-day complex issues at the science-policy interface exhibit characteristics that make them hard to tackle with normal scientific procedures. This requires new ways of interfacing science and policy (Funtowicz and Ravetz, 1990). Funtowicz and Ravetz (1993) called this class of problems post-normal, where 'normal' refers to Kuhn's (1962) concept of normal science: the practice of uncritical puzzle solving within an unquestioned framework or 'paradigm'.

Funtowicz and Ravetz (1993) signalled that normal science runs into serious limitations when addressing societal issues (at the time, nuclear reactor safety) where scientific evidence is highly contested and plagued by uncertainties while decisions are urgent, stakes are high, and values are in dispute. The available knowledge is typically characterised by imperfect understanding of the complex systems involved. Models, scenarios, and assumptions dominate assessments of such issues, and many (hidden) value loadings reside in the problem frames, indicators chosen, and assumptions made.

Scientific assessments of complex risks are thus unavoidably based on a mixture of knowledge, assumptions, models, scenarios, extrapolations, and known and unknown unknowns. Consequently, scientific assessments will unavoidably rely on expert judgements. The resulting knowledge base comprises bits and pieces of knowledge that differ in status, covering the entire spectrum from well-established knowledge to judgments, educated guesses, tentative assumptions and even crude speculations (Van der Sluijs et al., 2005, 2008). Knowledge utilisation for risk governance requires a full and public awareness of the various sorts of uncertainty and underlying assumptions. To perform this task, Knowledge Quality Assessment (KQA) tools are essential (Van der Sluijs et al., 2008; Maxim and Van der Sluijs, 2011; Maxim and Van der Sluijs, 2014). KQA seeks to systematically reflect on the limits of knowledge in relation to its fitness for function. It comprises systematic analysis of, and critical reflection on, uncertainty, assumptions and dissent in scientific assessments, in their societal and institutional context.

Despite the central role of study quality in underpinning chemical risk policies, there is currently no structured framework for assessing the quality of the epidemiologic studies regularly used in agencies. For toxicological studies, several frameworks have been proposed in the literature1 (Maxim and van der Sluijs, 2014; Samuel et al., 2016; Roth and Ciffroy, 2016), and the Klimisch score is recommended - while disputed - in regulatory contexts such as REACH.2 Some similar attempts have been published for epidemiologic studies (World Health Organization, 2000; Deeks et al., 2003; Vandenbroucke et al., 2007; Briggs et al., 2009; Dor et al., 2009; Westreich, 2012), but no guidelines are available for their use in advisory practice in chemical risk assessment at the European level. Each agency may then use its own reviewing guideline, which may include the criteria considered relevant at that moment and in that specific institutional and socio-political context.

The choice of the types of studies to be included in or excluded from the risk assessment is also implicitly or explicitly decided during the advisory activity, and depends on each topic and its contextual/regulatory framework: whereas only published studies were used in some groups (Barthes, 2014), both published and unpublished literature, such as ad-hoc reports funded by agencies to fill gaps in the existing exposure knowledge, were used by ANSES (2013) and by EFSA (2015). Furthermore, the endpoints considered relevant for measuring the health or environmental impact of a substance can differ from one expert group to another (e.g., different endpoints were considered by ANSES and EFSA when deciding about the risk of BPA).

The work of expert groups is organized along formal and informal procedures in force in health and environmental agencies (henceforth “agencies”). A very common procedure is judgment by consensus among the members. While in practice part of the advisory work may include exchanges of contradictory evidence and contradictory interpretations of evidence, true - or at least apparent-but-undisputed - consensus is encouraged for producing final conclusions.

Besides providing legitimacy for decision-making, the pursuit of consensus in expert groups has a second political role, which is to give a public sign of adherence to a conclusion by all the members of the group. Jasanoff (1990) has shown how the conclusions of expert groups represent the result of a negotiation between the scientists involved, and how expertise in itself may represent, for agencies, a way to lower or prevent controversies through involvement of the relevant scientists in advisory procedures. The exchange of arguments in an expert group can also lead to sound argument closure (Beauchamp, 1987) on parts of the scientific disputes. Further, dissident scientists might have the opportunity to express their views in a process that avoids deployment of their arguments in the public arena and subsequent criticism.

However, consensus is sometimes only an outward appearance (see for example the case of Love Canal described by Jasanoff, 1990). In any group, strong personalities can greatly influence collective discussions and limit the ability of other individuals to express critical opinions. When consensus is overtly favored, some individuals might be reluctant to express criticism when their opinions disagree with the group’s majority view and/or chairman, a phenomenon known as “the spiral of silence” (Noelle-Neumann, 1986). The chairman of an expert group has a major role in balancing the different views, but he/she might unconsciously favor those views that agree with his/her own or those of his/her institution.

It might be argued that recent procedures in agencies allow the expression of minority opinions. For example, ANSES recently encouraged the expression of minority opinions, supported by French legislation that grounds scientific advisory activities in the “contradiction principle” (ANSES, 2012, 2016). Minority opinions can hence be included as an annex to the main scientific output (e.g., a report), a procedure similar to that of EFSA (EFSA, 2017).

However, in practice, very few experts use this opportunity. Indeed, expressing a minority opinion remains exceptional in advisory procedures, as it isolates the expert from the rest of the group and demands his/her strong commitment to using this procedure. Furthermore, it is suited to a few salient aspects of the work, but not to managing the regular heterogeneity of experts’ judgments. Whereas it could theoretically reinforce the robustness of the group’s results, regular criticism can also be perceived as an individual inability to work in a group, or even worse, as questioning the scientific qualities of group colleagues (Barthes, 2014). In extreme cases, such overt and repeated criticism can contribute to labeling the “guilty” expert a troublemaker who slows down the collective work. The resulting attitude of those experts can be to prioritize their criticisms and voice only some of them, concealing the others for the sake of the harmony of group functioning. In addition, other personal aspects come into play and discourage overt criticism, such as the sympathy and respect that often develop during informal interactions (lunches, coffee breaks, etc.).

In all these ways, the informal rule of consensus leads to the loss of the deviating views and judgments of certain individuals in expert groups’ discussions, despite their potentially significant contribution to the quality of the final conclusions. Indeed, minority (individual) views in a group are not necessarily minority views in science, but can simply be an artifact of the criteria used to choose the experts included in that group (Maxim and van der Sluijs, 2014).

Overall, downplaying “minority” views can have important, if not dramatic, consequences, as shown for the Fukushima case by Fujigaki and Tsukahara (2011). Even though some scientists had predicted earthquake- and tsunami-related nuclear crises similar to what finally happened, those responsible for atomic policies ignored them.

The push for consensus can negatively influence the final quality of the advisory work through yet another mechanism: one expert with undeclared conflicts of interest, but with sufficient discursive skills, can be enough to influence the whole judgment of the group, as the other members collegially strive to consider his/her opinion in order to reach consensus.

The criteria used to select the experts to be included influence both the ability to reach final consensus and its content. In practice, these criteria are rather general, referring to scientific competence, conflicts of interest, the time available for advisory work, and the balance among different disciplines in a group.3

Similarly, the relative weight given to the competence of potential experts, compared with their personality, previous public positions taken on the issues to be addressed in the advisory activities, or their scientific discipline, is settled case by case, specific to each advisory situation. Certain personalities might be preferred, e.g., those who reach consensus more easily with their colleagues, and who adapt more readily to the institutional framework of expertise, to procedural rules and to group functioning - all very different from those in academia. The choice of experts may also depend on the very particular setting of the environmental topic assessed. For example, for the controversial issue of the effects of radio frequencies on human health in France, AFSSET built the group with the objective of finding a balance between scientists who had previously taken public positions against NGOs and those who had not, an offset aiming to attain a form of “group impartiality” (Barthes, 2014). At EFSA, the group must reflect “a balanced representation of skills and qualities and a broad and deep range of expertise and scientific perspectives”.4

The overall objective in agencies is balance between the experts in a group, which additionally contributes to flattening heterogeneity among the scientists relevant to a particular topic. Aiming at balance among different disciplines, and even among public views on a topic, increases the probability that few - if not single - specialists of particular issues in a discipline or research field are present in a group. Given that the perceived level of uncertainty in a given body of knowledge depends on the “distance from site of production” - i.e., the degree of specialization and knowledge about that issue (MacKenzie, 1990) - allowing the expression of heterogeneity in a group is of key importance to avoid biases. Indeed, compromise may lead to the exclusion of specific knowledge that only one member of the group holds, which in some cases might even contradict the views of other members of the group who are not specialists in that specific subject (e.g., the statistical analysis of data in scientific papers, for which a specialist can provide a highly qualified insight that most generalist users of statistics do not have).

The BPA case study is particularly appropriate for our objectives: suspected of being an endocrine disrupter, BPA has made headlines all over the world, particularly in the USA and the EU. In recent years, beyond the ANSES and EFSA reports addressed in this paper, the risk of BPA has been intensively assessed by many advisory groups and agencies, e.g., by the European Scientific Committee on Food in 2002, the European Chemicals Bureau in 2003, EFSA in 2006, 2008, 2009, 2010, 2011, 2015 and 2016, Environment Canada and Health Canada in 2008, WHO and FAO in 2009 and 2010, FDA in 2010 and 2014, the Swiss federal health authority in 2016, the Japanese Research Institute of Science for Safety and Sustainability and the National Institute of Advanced Industrial Science and Technology in 2007 and 2011, the Danish EPA in 2011, and RIVM in 2016. These numerous assessments were institutional responses to the intense socio-political controversy over the potential negative impacts of BPA, which is present in many products (baby bottles and other baby food containers, cash receipts, epoxy resins, coatings of cans for food and beverages, electronic equipment housings, dental sealants, etc.). Exposure during pregnancy was suspected to produce endocrine-related damage in the babies of the women concerned, including cancer, metabolic diseases such as obesity, and neurobehavioral problems.

Explicitly addressing the heterogeneity of expert groups can contribute to reinforcing high-quality professional scientific judgment, based on continuous contradiction5 and critical peer review. Indeed, scientific work is based on the principle of peer review as an essential contributor to quality and robustness. Accounting for criticism and diverging opinions does not contradict the pursuit of consensus, but aims to avoid consensus being reached for the wrong reasons (e.g., in scientific advisory activities, undue influence from one particular expert, or experts striving collectively to agree on a conclusion that nevertheless remains scientifically unsatisfactory for some of them).

In cases where divergences remain, reporting them can be an option. The assumption that scientific legitimacy can only be based on consensus rests on the untenable linear model of the relationship between science and policy, which - in spite of its demonstrated unrealism - is still very present and produces a problematic underexposure of policy-relevant scientific dissent (Van der Sluijs et al., 2010). For an agency, the perverse effect of flattening heterogeneity can be to provide, by itself, all the reasons why it could be criticized by those not involved in its work - who will inevitably exist, given the large number of scientists working on some controversial issues, such that the agency cannot include them all in its groups - or even by those involved but uncomfortable with the conclusions produced by the group (Barthes, 2014).

Inspired by post-normal science, we propose a tool allowing the expression of disagreement between experts, in addition to points of convergence, for the review of epidemiological studies considered in chemical hazard and risk assessments. Such a tool could be used to strengthen the quality and transparency of the group’s work and/or to communicate remaining uncertainties and dissent.

An original typology

To enable testing of our hypothesis, we combined the analysis of documents produced by ANSES (2011) and EFSA (2010, 2014) with an empirical setting that involved 10 scientists in academia — which is a sample of suitable size (Knol et al., 2010). We thus compared the evaluation of study quality by academic scientists alone (who were not subject to any procedural rules) with quality assessment of the same studies made by expert groups in two agencies, the EFSA and ANSES (where experts worked

Results: Qualichem_epi for two published papers and comparison with assessments by ANSES and EFSA

In applying Qualichem_epi we distinguish two levels of quality: aggregated quality and level of confidence in the whole study (see details in Appendix A in Supplementary material). They provide a way to represent both majority and minority opinions, and hence those results should be considered together. We represent the overall scores in graphs that are divided into three colored areas: red (including scores and median scores < 3), orange (for scores and median scores between 3 and 4)
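The color-banding logic described above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' implementation: it assumes criteria are scored on a 1–5 scale, uses the red (< 3) and orange (3–4) bands named in the text, and assumes that scores above 4 fall in a green band. The point it demonstrates is how the median conveys the majority view while the individual scores keep minority judgments visible.

```python
# Hypothetical sketch of Qualichem_epi-style color banding.
# Assumptions (not from the source): 1-5 scoring scale, green band for
# scores above 4; red (< 3) and orange (3-4) follow the text.
from statistics import median

def color_band(score):
    """Map a score (or median score) to its colored area on the graph."""
    if score < 3:
        return "red"
    elif score <= 4:
        return "orange"
    return "green"  # assumed cutoff for the third area

def summarize(scores):
    """Combine the majority (median) band with each individual band,
    so minority judgments remain visible alongside the group view."""
    m = median(scores)
    return {
        "median_score": m,
        "majority_band": color_band(m),
        "individual_bands": [color_band(s) for s in scores],
    }

# Example: eight of ten hypothetical experts judge a study acceptable,
# two score it in the red band.
result = summarize([4, 4, 3, 4, 5, 4, 3, 4, 2, 2])
print(result["majority_band"])                   # orange
print(result["individual_bands"].count("red"))   # 2 minority judgments
```

Reporting `majority_band` alone would reproduce the flattening the paper criticizes; keeping `individual_bands` is what lets a reader see that two of the ten judgments diverge from the group median.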

Discussion and conclusion

Our comparison of Qualichem responses with the reports of ANSES (2011) and EFSA (2010, 2014) showed that the level of heterogeneity in our respondents’ answers is much higher than what was reflected in these documents, where study quality assessment is reported as a result of the whole expert group, without differentiation between the views of particular scientists. That heterogeneity is a normal feature of any expert group, which nevertheless is lost during the processes of consensus-based

Acknowledgments

This work was supported by the French Ministry of Ecology, of Sustainable Development, of Transport and Housing (MEDDTL) in the framework of the PNRPE 2010 programme (URL: http://www.pnrpe.fr/), as part of the project “Toolkit for uncertainty and knowledge quality analysis of endocrine disruptors’ risk assessments: the case study of Bisphenol A” (DICO-Risk). We are grateful to Céline Vaslin for help with the figures, to Kara Lefevre for stylistic and linguistic improvements and to two anonymous

References (41)

  • ANSES (Agence Nationale de Sécurité Sanitaire, Alimentation, Environnement, Travail), 2013. Perturbateurs Endocriniens...
  • ANSES (Agence Nationale de Sécurité Sanitaire, Alimentation, Environnement, Travail), 2016. Avis n° 2016-2 relatif à la...
  • Y. Barthes, 2014. L’expertise scientifique vue de l’intérieur : le groupe de travail « Radiofréquences » de l’Afsset (2008–2009). Environnement, risques et santé.
  • T.L. Beauchamp. Ethical theory and the problem of closure.
  • D.J. Briggs et al., 2009. Uncertainty in epidemiology and health risk and impact assessment. Environ. Geochem. Health.
  • J.J. Deeks et al., 2003. Evaluating non-randomised intervention studies. Health Technol. Assess.
  • D. Demortain, 2013. Regulatory toxicology in controversy. Sci. Technol. Hum. Values.
  • EFSA (European Food Safety Authority), 2010. Scientific opinion on bisphenol A: evaluation of a study investigating its neurodevelopmental toxicity, review of recent scientific literature on its toxicity and advice on the Danish risk assessment of bisphenol A. EFSA J.
  • EFSA (European Food Safety Authority), 2014. Draft Scientific Opinion on the risks to public health related to the presence of bisphenol A (BPA) in foodstuffs. Endorsed for public consultation draft scientific opinion.
  • EFSA, 2015. Scientific opinion on the risks to public health related to the presence of bisphenol A (BPA) in foodstuffs. EFSA J.