Exploring Perspectives and Preferences for Overdiagnosis and Overtreatment of Thyroid Nodules Using an MCDA Method

Background: The detection of thyroid cancer has increased rapidly over the last few decades without an increase in disease-specific mortality. Several studies claim that the diagnosis of thyroid nodules through routine ultrasound imaging often triggers cascade effects that lead to unnecessary follow-up over many years or to invasive treatment. The objective of this study was to explore physicians' and patients' insights and preferences for the diagnosis and treatment of thyroid nodules, as well as their awareness of overdiagnosis and overtreatment. Methods: An online survey was developed using a comprehensive multi-criteria decision analysis (MCDA) framework, EVIDEM (EVIdence-based Decision-Making). The EVIDEM core model used in this study encompassed 13 quantitative criteria and four qualitative criteria. Participants were asked to provide weights for each criterion, performance scores appraising the thyroid nodule interventions, and their view of the impact of the contextual criteria. Normalized weights and standardized scores were combined to calculate value contributions across all participants, and differences between participants were explored. Results: Of the 105 participants, 48 were patients, 31 physicians and 26 citizens. The highest value estimate of the intervention, 0.401, was reported by the citizens' group and the lowest, 0.287, by the physicians' group.


Background
Thyroid nodules (TN) are abnormal growths of thyroid cells that form a lump within the thyroid gland [1]. They are mostly benign and harmless, but can also be malignant or lead to hyperthyroidism. The nature of most TN cannot be determined by imaging alone. TN are very common: according to the American Thyroid Association, about 50% of all people have TN by age 60, and over 90% of these nodules are noncancerous [1].
To detect thyroid cancer, most TN seen on ultrasound need further evaluation, such as follow-up thyroid ultrasonography over time, thyroid hormone tests, scintigraphy or fine-needle aspiration cytology.
The detection, and thus the prevalence, of thyroid cancer has increased rapidly over the last few decades without an increase in disease-specific mortality [2]. Thyroid surgery is more common in Germany than in some other countries: in a 2015 study, the rate was about four times higher than in England and seven times higher than in the Netherlands [3]. In Germany in particular, the diagnosis of TN leads to a growing number of thyroid operations after which histology reveals benign nodules that would not have needed to be removed [4]. Several studies from industrialized countries have shown increasing numbers of thyroid cancers with constant or decreasing mortality [2]. The significant increase in TN and thyroid cancer in developed countries is attributed mainly to non-indicated diagnostic ultrasound imaging of the thyroid gland [5,6]. We claim that the diagnosis of TN through routine ultrasound imaging is often the trigger for cascade effects leading to unnecessary follow-up over many years or to invasive treatment [7].
"PRO PRICARE (Preventing Overdiagnosis in Primary Care)" is a health services research network at the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) funded by the German Federal Ministry of Education and Research (BMBF) [8]. The project was led by the Institute of General Practice, in cooperation with other institutes at FAU and the University Hospital Erlangen. PRO PRICARE was divided into three sub-projects, one of them being "Adverse cascade effects" (ACE). Cascade effects are defined as "processes that, once initiated, will stepwise proceed until their seemingly inevitable results".
One common example is the management, discussed above, of patients with potentially malignant TN.
The aim of ACE was to learn more about the existence of, and reasons for, cascades in the care of patients with TN. The study reported here is one of the studies within ACE. Although overdiagnosis and overtreatment in the care of patients with TN in Germany are well described, little is known about the perspectives and preferences of physicians and patients.
The objective of this study was to explore physicians' and patients' insights and preferences for the diagnosis and treatment of TN, as well as their awareness of overdiagnosis and overtreatment, using a multi-criteria decision analysis (MCDA) method.

Study design
The EVIdence-based Decision-Making (EVIDEM, 10th Edition 2019) framework was selected to investigate participants' insights and preferences. The EVIDEM framework is designed to stimulate structured reflection and the pragmatic collection of preferences on healthcare interventions from all participants, through a broad spectrum of quantitative and qualitative criteria [9]. In this study, we provided synthesized data from an analysis of the interventions on TN for each of these criteria to create a TN-specific EVIDEM core model online questionnaire, which participants were invited to fill in. Participants included physicians, patients who have or had thyroid disease, and members of the public who had never had thyroid disease (referred to as citizens in this study); their perspectives stemmed from treating or living with the disease, or from a strong interest in the topic. Participants were recruited through publicly accessible channels such as medical networks, regional distribution lists and newspaper announcements. Participation was anonymous and voluntary.

Online questionnaire design and conduct
An online survey for the participants was created in line with the core model of the EVIDEM framework. The EVIDEM core model used in this study encompassed 13 quantitative criteria in five categories, while the contextual tool consisted of four qualitative criteria. The description of each category and criterion used in this study is shown in Table 1. A definition of each criterion, as well as the background information (such as sociodemographic data) used in this survey, was provided to participants in German in the online questionnaire. To provide sufficient evidence for appraising each criterion, a literature review was conducted to obtain relevant information on TN and its current management [8,10,11]. All participants were invited to fill in the online questionnaire, which was constructed with the survey software EFS Survey by UNIPARK (https://www.unipark.com/umfragesoftware/) [12]. The informed consent form and the online questionnaire were approved by the data protection officer at FAU in accordance with the requirements of the Bavarian State Commissioner for Data Protection (https://www.datenschutz-bayern.de/vorstell/impressum.html). We conducted an online survey open to the public; each participant gave informed consent by clicking the consent statement while completing the survey. Participants were assured that the research data would not contain any personally identifying information.
In the first part of the survey, participants' perspectives on what matters most in general, i.e., which criteria contribute most to the value of healthcare interventions, were captured as weights on a 5-point scale (1 = lowest relative importance, 5 = highest relative importance). In the second part, participants appraised the performance of the actual intervention on TN on each criterion, captured as scores. Two scoring scales were used: 0 to +5 for non-comparative criteria and −5 to +5 for comparative criteria; higher scores indicate better performance. The third part covered the qualitative contextual criteria: participants indicated whether consideration of a given criterion had a negative, neutral or positive impact on the decision about the intervention on TN.
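The three answer formats described above can be sketched as a simple range check on one participant's responses. This is a minimal illustration, not the study's actual pipeline (the analysis was done in Excel): the `Response` structure, the function `validate` and the example answers are hypothetical, and only the ranges (weights 1 to 5, scores 0 to +5 or −5 to +5, three contextual impact labels) come from the questionnaire description.

```python
from dataclasses import dataclass, field

# Allowed ranges per question type, taken from the questionnaire description.
WEIGHT_RANGE = (1, 5)            # 1 = lowest, 5 = highest relative importance
NON_COMPARATIVE_RANGE = (0, 5)   # performance score, non-comparative criteria
COMPARATIVE_RANGE = (-5, 5)      # performance score, comparative criteria
IMPACT_LABELS = {"negative", "neutral", "positive"}  # contextual criteria


@dataclass
class Response:
    """One participant's answers (hypothetical layout, not the EFS Survey export)."""
    weights: dict[str, int]
    scores: dict[str, int]
    comparative: set[str]  # names of the criteria scored on the -5..+5 scale
    contextual: dict[str, str] = field(default_factory=dict)


def validate(r: Response) -> list[str]:
    """Return a list of problems; an empty list means every answer is in range."""
    problems = []
    for name, w in r.weights.items():
        if not WEIGHT_RANGE[0] <= w <= WEIGHT_RANGE[1]:
            problems.append(f"weight out of range for {name}: {w}")
    for name, s in r.scores.items():
        lo, hi = COMPARATIVE_RANGE if name in r.comparative else NON_COMPARATIVE_RANGE
        if not lo <= s <= hi:
            problems.append(f"score out of range for {name}: {s}")
    for name, label in r.contextual.items():
        if label not in IMPACT_LABELS:
            problems.append(f"unknown contextual impact for {name}: {label}")
    return problems


ok = Response(
    weights={"Disease severity": 5, "Comparative safety": 4},
    scores={"Disease severity": 3, "Comparative safety": -2},
    comparative={"Comparative safety"},
    contextual={"Political/historical/cultural context": "neutral"},
)
# Weight 6 is off the 5-point scale; -2 is invalid for a non-comparative criterion.
bad = Response(weights={"Disease severity": 6}, scores={"Disease severity": -2}, comparative=set())
print(validate(ok))        # []
print(len(validate(bad)))  # 2
```

Keeping the comparative/non-comparative distinction per criterion, rather than per response, mirrors the fact that the same participant answers both kinds of questions.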

Data analysis
Numerical outputs were calculated for each participant. We used Excel to calculate and analyse weights, scores and contextual impacts; means and standard deviations (SD) were calculated as descriptive statistics to quantify variability. Weights were normalized so that they sum to 1.0 (Wx, Σ Wx = 1): for a single participant, the normalized weight of a criterion equals the weight that participant gave it divided by the sum of all weights the participant gave. For example, if participant A weighted a criterion with a "5" and the total of all weights given by participant A was "50", the normalized weight of that criterion for participant A is 5/50 = 0.1. The value contribution (VCx) of each criterion was calculated as the product of its normalized weight and its standardized score (= score/5), and the overall value estimate as the sum of these value contributions across criteria, following a linear additive model. For example, if a criterion received a normalized weight of 0.05 and a score of 5, its value contribution is 0.05 × (5/5) = 0.05. To evaluate the contextual criteria, negative impact, no impact and positive impact were coded as −1, 0 and 1, respectively.
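The normalization and linear additive value model above can be sketched in a few lines of Python, reproducing both worked examples from the text. This is an illustrative sketch rather than the authors' Excel workbook; the function names and the 12 invented raw weights that bring participant A's total to 50 over 13 criteria are assumptions.

```python
import math


def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Divide each raw weight (1-5) by the participant's total so the result sums to 1.0."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def value_contributions(norm_weights: dict[str, float], scores: dict[str, float]) -> dict[str, float]:
    """VCx = normalized weight x standardized score, with standardized score = score / 5."""
    return {name: norm_weights[name] * (scores[name] / 5) for name in norm_weights}


def overall_value(norm_weights: dict[str, float], scores: dict[str, float]) -> float:
    """Linear additive model: the overall value is the sum of the per-criterion contributions."""
    return sum(value_contributions(norm_weights, scores).values())


# Participant A from the text: a criterion weighted 5 out of a total of 50.
# The other 12 raw weights are invented so that the total is 50 over 13 criteria.
raw = dict(zip([f"c{i}" for i in range(1, 14)], [5, 5, 5, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3]))
w = normalize_weights(raw)
print(w["c1"])                                     # 0.1
print(math.isclose(sum(w.values()), 1.0))          # True

# Second worked example: normalized weight 0.05, score 5 -> contribution 0.05.
print(value_contributions({"x": 0.05}, {"x": 5}))  # {'x': 0.05}
```

Because comparative criteria are scored from −5 to +5, a standardized score (and hence a value contribution) can be negative, which is how a comparative criterion can lower the overall value estimate.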

Results
The online survey was conducted from 20 November 2018 to 30 June 2019. When the online system closed on 30 June 2019, we had received valid data from 105 participants in total: 31 physicians, 48 patients who have or had thyroid disease, and 26 members of the public who had no thyroid disease, referred to as citizens in this study (see Table 2).

Perspectives of participants on decision criteria
Regarding the weights provided by participants, Fig. 1 shows that the most important criterion was "Comparative effectiveness" (0.088 ± 0.010), followed by "Type of therapeutic benefit" (0.086 ± 0.010) and "Disease severity" (0.086 ± 0.011). Normalized weights sum to 1.0 across criteria; a higher weight indicates that participants considered the criterion more important. As Fig. 1 shows, the three criteria relating to the economic consequences of the intervention were the least important. The largest variations in weights were observed for "Cost of non-medical intervention" (SD 0.022), "Size of population" (SD 0.021) and "Unmet needs" (SD 0.020); the smallest were for "Quality of life" (SD 0.008), "Comparative effectiveness" (SD 0.010) and "Type of therapeutic benefit" (SD 0.010).
Mean normalized weights assigned to each criterion were also calculated separately for the physicians', patients' and citizens' groups (Fig. 2). Physicians weighted "Safety of intervention" higher than the other two groups: 0.086 ± 0.011 for physicians, compared with 0.079 ± 0.014 for patients and 0.079 ± 0.017 for citizens. A large between-group difference was also seen for "Type of preventive benefits": the weight was 0.082 ± 0.011 for citizens and 0.084 ± 0.012 for patients, but much lower for physicians, at 0.072 ± 0.018. The same pattern appeared for "Comparative patient perceived health": physicians assigned a lower weight (0.074 ± 0.021) than patients (0.085 ± 0.008) and citizens (0.089 ± 0.009).
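The per-criterion summaries reported in this section (mean ± SD across all participants, a ranking by mean weight, and the same summaries per stakeholder group) can be sketched as follows. The data below are invented for illustration and are not study results; the function names are hypothetical. We also assume the sample SD (as in Excel's STDEV.S), since the text does not state which SD formula was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(rows: list[dict], criterion: str) -> tuple[float, float]:
    """Mean and sample SD of one criterion's normalized weight across participants."""
    values = [r["weights"][criterion] for r in rows]
    return mean(values), stdev(values)  # stdev needs at least two participants


def summarize_by_group(rows: list[dict], criterion: str) -> dict[str, tuple[float, float]]:
    """The same summary, computed separately for each stakeholder group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for r in rows:
        groups[r["group"]].append(r)
    return {g: summarize(members, criterion) for g, members in groups.items()}


def rank_criteria(rows: list[dict]) -> list[str]:
    """Criteria ordered from highest to lowest mean normalized weight."""
    criteria = rows[0]["weights"].keys()
    return sorted(criteria, key=lambda c: summarize(rows, c)[0], reverse=True)


# Invented normalized weights for four participants (only 2 of the 13 criteria shown).
rows = [
    {"group": "physicians", "weights": {"Comparative safety": 0.09, "Disease severity": 0.07}},
    {"group": "physicians", "weights": {"Comparative safety": 0.08, "Disease severity": 0.08}},
    {"group": "patients", "weights": {"Comparative safety": 0.07, "Disease severity": 0.09}},
    {"group": "patients", "weights": {"Comparative safety": 0.07, "Disease severity": 0.09}},
]
print(rank_criteria(rows))  # ['Disease severity', 'Comparative safety']
print(summarize_by_group(rows, "Comparative safety"))
```

Grouping first and then reusing the same summary function keeps the overall and group-wise figures computed in exactly the same way.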

Scores
As Fig. 3 shows, among the non-comparative criteria, "Type of preventive benefit" received the highest mean standardized score (0.682 ± 0.261), indicating that, on average, participants rated the performance of preventive options for thyroid disease highest. It was followed by "Type of therapeutic benefit", scored 0.636 ± 0.188.
This criterion also had the smallest SD, indicating broad agreement among participants that therapeutic options are useful for treating thyroid disease. In the category "Need of intervention", "Unmet needs" received the lowest score (0.408 ± 0.251), followed by "Disease severity" (0.488 ± 0.211), whereas "Size of affected population" in the same category received a higher score (0.533 ± 0.196).
Among the comparative criteria, "Comparative effectiveness" received the highest score, 0.634 ± 0.402. However, its SD was also very high, indicating that some participants gave very low scores. The same was true of the other comparative criteria: all five comparative criteria had relatively higher SDs than the non-comparative criteria. "Comparative safety", in the same category "Treatment interventions", received a low score of 0.299 ± 0.446; the corresponding question was "How do you assess the safety of surgical treatments of TN?" In this category, "Comparative patient perceived health" received a very low score of 0.006 ± 0.435; the question was "How do you assess the influence of the diagnosis of TN on the quality of life of those affected?"
As shown in Fig. 4, we also calculated mean (SD) standardized scores separately for the physicians', patients' and citizens' groups. The physicians' group scored "Comparative effectiveness" and "Quality of evidence" higher than the other criteria, whereas the patients' group and the citizens' group gave the highest scores to "Type of preventive benefit" and "Type of therapeutic benefit", respectively. Three criteria deserve particular attention: "Disease severity", "Comparative safety" and "Clinical practice guidelines". For "Disease severity", the score of the physicians' group (0.419 ± 0.209) was lower than that of the patients' group (0.479 ± 0.192) and much lower than that of the citizens' group (0.585 ± 0.219). For "Comparative safety", the score of the physicians' group (0.174 ± 0.534) was lower than those of the patients' group (0.388 ± 0.432) and the citizens' group (0.285 ± 0.319). For "Clinical practice guidelines", the score of the physicians' group (0.420 ± 0.215) was also lower than those of the other two groups (patients 0.583 ± 0.206, citizens 0.569 ± 0.185).
As shown in Fig. 6, the highest value estimate of the intervention on TN, 0.401, was reported by the citizens' group and the lowest, 0.287, by the physicians' group; the patients' group reported a value of 0.368. The figure clearly shows that the preferences and views of the stakeholder groups differ. In the physicians' group, the highest value contributors were "Comparative effectiveness" (0.058 ± 0.042) and "Quality of evidence" (0.048 ± 0.022). In the patients' group, they were "Type of preventive benefits" (0.059 ± 0.022) and "Comparative effectiveness" (0.058 ± 0.033). In the citizens' group, they were "Type of preventive benefits" (0.066 ± 0.021) and "Type of therapeutic benefits" (0.061 ± 0.019).
Figure 7 illustrates the qualitative contextual criteria. 59% of participants considered that "Mandate and scope of the healthcare system" (question: "How do you rate the impact of the healthcare system on thyroid diagnostics?") had a positive impact. For "System capacity and appropriate use of intervention", 63% of participants thought it had a negative impact, and for "Population priorities and access" (question: "How do you rate the influence of stakeholders on the procedure in thyroid diagnostics?"), 47% of participants considered it had a negative impact. Nearly 58% of participants thought that "Political/historical/cultural context" had no impact on the intervention on TN. Across all contextual criteria, the proportions of negative impact, no impact and positive impact ratings were 38%, 35% and 27%, respectively.
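The two aggregates reported in this section can be sketched as follows: the share of contextual answers coded −1, 0 and +1, and a group's overall value estimate. The function names and the example answers are hypothetical. We assume here that a group's value estimate is the mean of its participants' summed value contributions; because the model is linear, this equals the sum over criteria of the group's mean contributions, so either order of aggregation gives the same figure.

```python
from collections import Counter
from statistics import mean

# Coding of contextual answers used in the data analysis: -1, 0, +1.
IMPACT_LABELS = {-1: "negative", 0: "no impact", 1: "positive"}


def impact_shares(codes: list[int]) -> dict[str, float]:
    """Percentage of answers coded -1, 0 and +1 for one contextual criterion."""
    counts = Counter(codes)
    n = len(codes)
    return {label: 100 * counts[code] / n for code, label in IMPACT_LABELS.items()}


def group_value(contributions: list[dict[str, float]]) -> float:
    """Group value estimate, assumed to be the mean of each participant's summed
    value contributions (equal to the sum of mean contributions, by linearity)."""
    return mean([sum(vc.values()) for vc in contributions])


print(impact_shares([-1, -1, 0, 1]))  # {'negative': 50.0, 'no impact': 25.0, 'positive': 25.0}
print(group_value([{"a": 0.25, "b": 0.25}, {"a": 0.5, "b": 0.0}]))  # 0.5
```

Counting with `Counter` returns zero for codes nobody chose, so each criterion always reports all three categories.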

Using an online platform to conduct MCDA
This study addresses an issue currently discussed in the healthcare systems of many countries [10,11,13-16]. The incidence of thyroid cancer has been increasing faster than that of other cancers, driven by the detection of low-risk, non-lethal tumours from a large subclinical reservoir of disease [14]. This increase in thyroid cancer incidence, attributed to overdiagnosis, has been described in most developed countries where patients have good access to diagnostic testing. One study estimated that 70-90% of these patients would have had lesions that remained asymptomatic throughout their lifetime had ultrasound and other imaging not been available [14]. Our study used MCDA to identify the perspectives and preferences of physicians, patients and citizens regarding TN interventions in the German healthcare system. EVIDEM was the decision support tool we selected for this MCDA study.
EVIDEM provides a set of generic decision criteria that were developed and selected to support the substantive legitimacy of decisions with regard to the common goal of healthcare systems [9]. EVIDEM offers two benefits: first, it elicits which criteria contribute most to the value of healthcare interventions in general, captured as weights; second, it explores the value of the specific intervention on TN, captured as performance scores. Because the criteria are standardized, researchers can compare perspectives on the same criterion across different diseases. A growing number of researchers use EVIDEM to explore perspectives on healthcare interventions, most of them through face-to-face focus groups [17-20]. Our study used an online platform to involve more participants [21]. In addition, completing the survey individually avoids participants being influenced by others, as may happen in a focus group.

Perspectives of participants in general on decision criteria
Weights reflect participants' values and preferences, i.e., what matters most to them. Participants in this study indicated that the most important criteria were "Comparative effectiveness", "Type of therapeutic benefit", "Disease severity", "Comparative patients' perceived health", "Comparative safety" and "Unmet needs". These results are similar to those of a survey on chronic heart disease among stakeholders across the healthcare decision continuum in Germany, in which the most important criteria were "Clinical effectiveness", "Patients' perceived health", "Disease severity", "Clinical safety" and "Quality of evidence" [21].
Compared with physicians, patients tended to assign higher weights to patients' perceived health, suggesting that a better quality of life after the intervention matters more to them. Other studies have likewise found that patients assign greater weight to the impact on health-related quality of life [19,20]. Physicians, by contrast, took safety more into account than patients did. As healthcare providers, physicians know more about the risks of the intervention, and they indicated that keeping the intervention safe and effective was more important. This difference highlights the need for effective communication between physicians and patients, which helps patients express their needs and incorporate their individual priorities into a patient-centred healthcare system. All participants assigned lower weights to the "Economic consequences of intervention" criteria than to the other criteria, as was also found in another MCDA study in the German healthcare system [21]. This is consistent with reports that health professionals wish to help patients without focusing on economic constraints [21,22]. Patients, for their part, face no economic constraints for most diseases in the German health insurance system [23].

Appraisal of the intervention on thyroid nodules
Participants used scores to express their views on how the intervention performed on each criterion, based on their own experience and knowledge as well as on the information provided. Compared with physicians, the patients' and citizens' groups were more confident that TN can be prevented and treated. Although this disease is very common, the causes and risk factors of most TN are not clear, which makes it difficult to prevent [24]. One study has reported that nutrition, for example selenium, can help prevent TN [25]. More scientific research is needed to strengthen this evidence and to identify other preventive measures.
For "Disease severity", physicians rated TN as less severe than the other two groups did. This is consistent with many studies: although diagnoses increased from the beginning of the twenty-first century, thyroid cancer incidence and mortality rates have stabilized in more recent years [26]. For "Comparative safety", the score of the physicians' group was lower than those of the other two groups, showing that physicians considered surgery on TN less safe than patients and citizens did. This is consistent with reports that operative complications can be significant, including permanent hypoparathyroidism, vocal fold paralysis and airway compromise [27]. For "Clinical practice guidelines", the score of the physicians' group was also lower than those of the other two groups, suggesting that physicians, as practitioners, were less convinced than the others that surgical treatment of TN follows guideline recommendations. However, another study has shown that, following the 2009 American Thyroid Association (ATA) guideline revisions, the rapid increase in thyroid cancer incidence has recently slowed, especially for small cancers and among women [27].

Impact of contextual criteria
48% of participants considered that "Population priorities and access" had a negative impact on the intervention on TN. This agrees with many studies attributing the increased incidence of thyroid cancer to overdiagnosis in developed countries where patients have good access to diagnostic testing. Such diagnoses also lead to a growing number of operations on benign thyroid nodules that would not have needed to be removed. Nearly 58% of participants thought that "Political/historical/cultural context" had no impact on the intervention on TN. This is consistent with other studies suggesting that changes in exposure to risk factors such as diagnostic radiation, overweight and diabetes, which may increase patients' medical surveillance, as well as changes in access to examination of the thyroid gland, are the likely explanations [10].

Limitation of Study
There are several limitations in this study. One limitation is the lack of comparators: because the EVIDEM framework distinguishes comparative and non-comparative criteria, we compensated by analysing differences between participant groups. Since participants were recruited through networks and newspapers, recruitment bias may exist; for example, the participants we recruited in this study were more than 60 years old.

Conclusion
Overdiagnosis and overtreatment of TN are widely discussed by research teams in many countries, and many follow-up studies have been conducted on this topic. Regarding awareness of overdiagnosis and overtreatment, our study shows participants' preferences on each criterion: for example, they considered unmet needs and disease severity to be important criteria, but gave both criteria low scores for TN. The physicians' group in particular gave a lower score for disease severity than the other participants, whereas citizens, who had no experience or knowledge of this disease, rated its severity much higher than the two other groups did. Physicians indicated that keeping the intervention safe and effective was more important, while patients indicated that quality of life after the intervention was more important. However, our study does not allow us to conclude that there is overdiagnosis and overtreatment of TN. This study offers a new perspective for exploring the preferences and insights of different groups. By comparing physicians, patients and citizens, it highlights differences that can support better communication between physicians and patients. It also provides decision support for physicians and policy makers conducting research on TN. We hope the results of this study can help improve the diagnosis and treatment of this disease and contribute to a sustainable and equitable distribution of healthcare resources.