Exploring individual and demographic characteristics and their relation to CHNRI Criteria from an international public stakeholder group: an analysis using random intercept and logistic regression modelling

Introduction The Child Health and Nutrition Research Initiative (CHNRI) method for health research prioritisation relies on stakeholders weighting the criteria used to assess research options. These weights in turn affect the final scores and ranks assigned to research options. Three quarters of CHNRI studies published to date have not involved stakeholders in criteria weighting. Of those that have, few included members of the public in stakeholder groups. Studies that compared different stakeholder groups, such as donors, researchers, or policy makers, showed that different groups place different values upon CHNRI criteria. When choosing the composition of a stakeholder group, it may be important to understand factors that may influence weighting. Drawing upon a group of international public stakeholders, this study explores some of the effects that individual and demographic characteristics have on the weights assigned to the most commonly used CHNRI criteria, with the aim of helping future researchers avoid bias.
Methods Individual and demographic information and 5-point Likert scale responses to questions about the importance of 15 CHNRI criteria were collected from 1031 “Turkers” (workers on Amazon Mechanical Turk [AMT], an online crowdsourcing platform). Thirteen of the fifteen criteria were analysed using random-intercept models and the remaining two were analysed through logistic regression.
Results Self-reported health status was associated with differences in participants’ responses on the largest number of criteria (11/15), followed by being female (10/15), ethnicity (9/15), employment (8/15), and religion (7/15). Differences across criteria indicate that when choosing stakeholder groups, researchers need to consider these factors to minimise bias.
Conclusion Researchers should collect and report more detailed information from stakeholders, including individual and demographic characteristics, and ensure participation from both genders, multiple ethnicities, religious beliefs, and people with differing health statuses to be transparent regarding possible biases in health research prioritisation. Our analyses indicate that these factors do influence the relative importance of these values, even when the data appears fairly homogeneous.


Data collection
The survey was hosted on Amazon Mechanical Turk (AMT), a crowdsourcing platform. AMT pays its workers, called "Turkers," for completing micro-tasks, such as annotating images or answering a survey. Researchers buy credits on AMT, enabling a set number of tasks to be completed. AMT advertises the task on behalf of the researcher and its Turkers sign up to complete the tasks. Researchers are able to approve or reject the tasks based on the Turker's performance. No identifiable information is passed through AMT; however, researchers could ask for identifiable information in surveys.
Turkers were informed that this survey was for a research study, how and when they could withdraw their data, that their participation should be voluntary, as well as any perceived risks and benefits to the research. They were provided with the email address of the lead author, should they have further questions about the survey. Ethical approval was obtained through the Usher Institute of Population Health Sciences and Informatics and through the Moray School of Education, both at the University of Edinburgh, and the survey abided by the Guidelines for Academic Requesters.
Turkers were paid US$ 1.75 for each completed survey, which we allotted 30 minutes for. As AMT 'times out' and participants would lose the ability to be reimbursed for their participation after a set time, we allotted much more than the time expected, as we did not want the survey to time out on any participants. The average time to complete the surveys was just under 7 minutes per survey.
Questionnaire
CHNRI criteria were transformed into question statements (Appendix S1 in Online Supplementary Document). The criteria can also represent what one values more when investing in health research. For example, if there are competing interests, is it more important that one invests in research that reduces disease burden (criterion: disease burden reduction) or that is respectful to other cultures (criterion: acceptability/issues surrounding use)? The criteria and the corresponding questions can be found in Table 1.
Non-identifiable individual and demographic information was collected, including age, gender, self-reported urban vs rural status, self-reported health status, country of residence, political views, immigration status, employment status and ethnicity. The full survey is available in Appendix S1 in Online Supplementary Document.
Surveys were released in batches at different times of day to provide opportunities for Turkers living in different time-zones to answer and to facilitate a more global response. Location blocking was also used to facilitate as much of a global response as possible; in particular, as participants from India and the US were overrepresented in earlier surveys, in later surveys, "location blockers" were applied to these regions in order to encourage representation from other geographic regions. While AMT states that users from these countries cannot participate with their location-blocking function, users from these countries did participate in the 'location-blocked' surveys.
Questions to identify malicious Turkers, that is, those who indiscriminately click on answers without reading the questions, were included in the survey. An example of such a question is "please select the fourth star." Further questions are available in Appendix S1 in Online Supplementary Document. Twenty-five malicious Turkers (2% of respondents) were identified and rejected from the study; their data was disposed of and excluded from the analysis.
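The screening step described above can be illustrated with a short Python sketch. This is not the authors' code: the item name `select_fourth_star`, the answer coding, and the response layout are hypothetical, and the rule assumed here (a respondent is rejected if any attention-check item is answered incorrectly or skipped) is only one plausible reading of the procedure.

```python
# Hypothetical attention-check key: item id -> the only acceptable answer.
# The item name and coding are illustrative, not taken from the survey files.
ATTENTION_CHECKS = {"select_fourth_star": 4}


def passes_attention_checks(response, checks=ATTENTION_CHECKS):
    """Return True only if every attention-check item was answered as instructed."""
    return all(response.get(item) == expected for item, expected in checks.items())


def screen_responses(responses, checks=ATTENTION_CHECKS):
    """Split responses into (kept, rejected) lists based on the attention checks."""
    kept, rejected = [], []
    for response in responses:
        if passes_attention_checks(response, checks):
            kept.append(response)
        else:
            rejected.append(response)
    return kept, rejected


# Made-up example responses.
sample = [
    {"worker": "A", "select_fourth_star": 4},
    {"worker": "B", "select_fourth_star": 2},
    {"worker": "C"},  # skipped the check entirely, so treated as a failure
]
kept, rejected = screen_responses(sample)
```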

Data analysis
Descriptive statistics were calculated for individual and demographic variables. Due to low responses in certain categories, several categories of the variables were combined in order to prepare the data for random-intercept analysis. Black African, Black Caribbean, and "other black" ethnicities were combined into "Black," under ethnicity, and Southeast and East Asian were combined into a joint category, due to a low number of respondents in the categories. Within marital status, separated, divorced, and widowed were combined into a new category of "no longer married," while married, in a domestic partnership or a common-law relationship were combined into "married or in a domestic partnership/co-habiting." Buddhist, Greek or Russian Orthodox, Mormon, and spiritual were added to the "other" religion category. Catholic and Christian denominations (including Protestant, Baptist, Lutheran, and Methodist) were combined into "Christian or Catholic." In employment, not employed, disabled, not able to work, and retired were combined into a category of "not currently working." Primary and secondary education were combined into "no higher education," while completed college, university, graduate, or professional categories were combined into "higher education." Non-binary and other genders were combined into "non-binary or other." Finally, country of birth and country of residence were compared and used to create a proxy for immigration status; if country of residence was different from country of birth, immigration status was coded as "yes," and if they were the same, it was coded as "no." Additionally, countries were organised into the seven World Bank regions (Latin America and the Caribbean, North America, Europe and Central Asia, East Asia and the Pacific, South Asia, Middle East and North Africa, and Sub-Saharan Africa). Political beliefs, household size, age, and self-reported health status were treated as continuous variables in the analysis. For political beliefs, participants answered on a scale of 1-7 (extremely liberal to extremely conservative), which was coded as a continuous variable. In self-reported health status, Likert scale responses were used to create a continuous variable.
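The recoding rules above can be sketched in Python for two of the variables. This is an illustrative sketch rather than the authors' R code: the raw category labels, the case-insensitive matching, and the handling of missing countries (returned as `None`) are assumptions introduced here; only the collapsed category names and the yes/no immigration rule come from the text.

```python
# Collapsed marital-status categories as described in the text.
# The raw labels on the left are hypothetical survey wordings.
MARITAL_STATUS = {
    "separated": "no longer married",
    "divorced": "no longer married",
    "widowed": "no longer married",
    "married": "married or in a domestic partnership/co-habiting",
    "domestic partnership": "married or in a domestic partnership/co-habiting",
    "common-law relationship": "married or in a domestic partnership/co-habiting",
}


def collapse(value, mapping):
    """Map a raw category to its combined category; unmapped values pass through."""
    return mapping.get(value.strip().lower(), value)


def immigration_status(country_of_birth, country_of_residence):
    """Proxy for immigration status: 'yes' if the two countries differ, else 'no'.

    Returning None when either country is missing is an assumption of this sketch.
    """
    if not country_of_birth or not country_of_residence:
        return None
    same = country_of_birth.strip().lower() == country_of_residence.strip().lower()
    return "no" if same else "yes"
```

For example, `collapse("Widowed", MARITAL_STATUS)` gives "no longer married", and a respondent born in one country and living in another is coded "yes".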
Fixed intercept and random intercept models for each CHNRI criterion were compared to determine suitability for a random-intercept linear mixed effects model. This determined that a random-intercept linear mixed effects model was suitable in thirteen of fifteen cases. In order to account for variation between countries, the country of residence determined the random-intercept in the models. No random slopes were introduced to the models. In the remaining two cases, logistic regression was used to explore the relationship between the individual/demographic characteristics and the CHNRI criteria.
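For readers less familiar with this model class, a random-intercept linear mixed effects model with no random slopes is conventionally written as below. The notation is introduced here for illustration and is not taken from the paper: $y_{ij}$ denotes the rating given to a criterion by respondent $i$ living in country $j$, $x_{kij}$ the $k$-th individual or demographic predictor, $u_j$ the country-level random intercept, and $\varepsilon_{ij}$ the residual; the normality assumptions are those of the standard formulation.

```latex
y_{ij} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Setting $\sigma_u^2 = 0$ recovers the fixed-intercept model against which each random-intercept model was compared.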
A forced-entry method was used for each random-intercept model, with estimation by maximum likelihood, and the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and log likelihood ratio were used to determine goodness of fit.
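As a reminder of how these fit statistics relate to the maximised log-likelihood, the standard definitions can be written as a small Python sketch. The function names are chosen here for illustration, and the numbers in the example are made up; they are not results from this study. In a mixed model the parameter count would normally include the variance components as well as the fixed effects.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def likelihood_ratio_statistic(ll_reduced, ll_full):
    """Likelihood ratio statistic comparing a reduced model nested in a fuller one."""
    return 2 * (ll_full - ll_reduced)


# Illustrative values only: two nested models fitted to the same 1031 responses.
ll_fixed_intercept = -1210.0   # hypothetical log-likelihood, 10 parameters
ll_random_intercept = -1200.0  # hypothetical log-likelihood, 11 parameters
lr = likelihood_ratio_statistic(ll_fixed_intercept, ll_random_intercept)
```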
In the logistic regression models, because the data was extremely left skewed, the Likert scale options for "very important" and "important" were combined to represent a positive response and the options for "neutral," "slightly important," and "not important at all" were combined to represent a negative response. A forced-entry method was used to build the models. The AIC and the null and residual deviances were used to examine model fit. A model χ² statistic showed that each model fit significantly better than a null model (χ² = 50.66, df = 25, P = 0.002 for 'acceptability', and χ² = 45.31, df = 13, P < 0.001 for 'deliverability'). The Hosmer-Lemeshow and Nagelkerke's goodness of fit tests were non-significant for both models, indicating acceptable fit in each case. There was no multicollinearity in either model.
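The dichotomisation and the model χ² statistic described above can be sketched in Python as follows. This is an illustration, not the authors' code: the exact label strings, the 0/1 coding, and the deviance values in the example are assumptions. The model χ² is computed in the usual way, as the null deviance minus the residual deviance, with degrees of freedom equal to the difference in residual degrees of freedom.

```python
# Likert options grouped as described in the text; label strings are assumed.
POSITIVE = {"very important", "important"}
NEGATIVE = {"neutral", "slightly important", "not important at all"}


def dichotomise(label):
    """Code a 5-point Likert label as 1 (positive) or 0 (negative)."""
    key = label.strip().lower()
    if key in POSITIVE:
        return 1
    if key in NEGATIVE:
        return 0
    raise ValueError(f"Unrecognised Likert label: {label!r}")


def model_chi_square(null_deviance, residual_deviance, null_df, residual_df):
    """Return (chi-square statistic, degrees of freedom) for model vs null model."""
    return null_deviance - residual_deviance, null_df - residual_df


# Made-up deviances for illustration; these are not values from the paper.
statistic, df = model_chi_square(500.0, 460.0, 1030, 1005)
```

The resulting statistic would then be referred to a χ² distribution with the returned degrees of freedom to obtain the P-value.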
Nonlinearity of the continuous variables was tested against each outcome using multiple fractional polynomials. Several transformations were required due to nonlinearity for each of the four continuous variables. A legend displaying the transformations can be found in Table 2.
All analyses were completed in R Studio with R version 3.3.0 (R Studio, Boston, MA, USA).

RESULTS
A total of 1031 Turkers from 73 countries, representing the 7 World Bank regions, completed the survey. A summary of the individual and demographic characteristics can be found in Table 3. Table 4 displays the b-values and confidence intervals for all random-intercept models, with P-values indicated. Each model is displayed within a column, with estimates given in the corresponding rows. Rows without estimates were not included in the respective models, due to model fit. A legend of the transformations can be found in Table 2. Table 5 contains the results of the logistic regression models. Table 6 displays a summary of the individual and demographic characteristics and the criteria on which they differ.
Results are discussed below, by independent variable, across models. Results are presented in beta-values unless otherwise specified.

Immigration
Compared to those who are living in their country of birth, those classified as immigrants differed on only two of fifteen criteria. Those who have immigrated found the potential for research to translate to policy more important (0.24, 95% confidence interval (CI) = 0.05 to 0.45, P = 0.01) and were more likely to find deliverability important than those not classified as immigrants in a logistic regression (OR = 2.41, 95% CI = 1.19 to 5.55, P = 0.02).

Ethnicity
The largest differences in ethnicity were found between black and white Turkers.

Religion
Compared to those who were atheist or agnostic, those who were Hindu differed the most in their valuation of CHNRI criteria. All religions, compared to those who were atheist or agnostic, attributed greater Those who were Hindu also ranked cost (0.34, 95% CI = 0.05 to 0.63, P = 0.02), and technical possibility (0.42, 95% CI = 0.11 to 0.73, P = 0.01) higher than those who were atheist or agnostic. Conversely, Hindu Turkers ranked disease burden reduction and feasibility as less important than those who were atheist or agnostic (-0.21, 95% CI = -0.41 to -0.002, P = 0.05; -0.29, 95% CI = -0.48 to -0.09, P = 0.004, respectively). Catholic or Christian Turkers also rated cost (0.21, 95% CI = 0.04 to 0.38, P = 0.02) higher than those who were atheist or agnostic. There were no significant differences between Turkers who were Jewish and those who were atheist or agnostic, though the sample size was low and there may not have been sufficient power to detect a difference.
The table displays thirteen random-intercept models. Each model is displayed in a column, with the dependent variable listed at the head of the column, and the independent variables listed within each row. Where there is a "-" in a cell, the variable was not included in the model because it negatively impacted the fit. Each cell displays the b-value and 95% confidence intervals.

Education
Education only had a significant effect on scale, with those who enrolled, but did not complete, a college degree finding scale to be more important than those with no higher education (0.28, 95% CI = 0.06 to 0.52, P = 0.01).

Age
Age as a linear variable was positively correlated with equity (0.01, 95% CI = 0.001 to 0.010, P = 0.01). As a transformation (A1), age was positively associated with disease burden reduction (

DISCUSSION
The results show that, for many of the criteria, respondents differ in the relative importance they assign. The individual and demographic characteristics most commonly associated with differences across criteria were self-reported health status (significantly associated with differences in responses across 11 criteria), gender (10 criteria), ethnicity (9 criteria), employment (8 criteria), and religion (7 criteria).
Disease burden reduction, feasibility, and acceptability had the most individual and demographic characteristics that contributed to differences in their perceived importance; each of these criteria had 7 individual or demographic characteristics that significantly contributed to their perceived importance. Demographic and individual characteristics were least predictive of responses in likelihood to fill a knowledge gap, answerability, and sustainability. Interestingly, disease burden reduction, feasibility, answerability, and sustainability all have relatively high mean scores (4.42, 4.41, and 4.40 respectively), which would limit the variation in responses. Acceptability had the lowest mean among all criteria (3.11), which may be reflective of heterogeneity of the Turkers.
There were several counterintuitive results. Having a larger household size was negatively correlated with being concerned with the cost of the product of the research; one may assume that having a large household would result in financial constraints and more concern for cost. Those who were unemployed were less likely to consider cost or translational value (ie, that the research would inform policy) important, in comparison to those employed full-time. Additionally, those who were classified as health stakeholders were less likely to rank disease burden reduction, effectiveness, feasibility, or innovation as important in comparison with those who were not. While these results are indeed counterintuitive, the data was extremely left skewed. The resulting patterns may not mean that these groups find these criteria unimportant; rather, they may show that these groups simply find them slightly less important than their counterparts do. However, it may be interesting to run the experiment again asking participants to allocate a truly relative valuation of the criteria, for example through allocation of imaginary money amongst the criteria.
To the authors' knowledge, no CHNRI exercise involving stakeholder groups has asked stakeholders for information on their health status, gender, employment status, religion, or ethnicity. However, this information may be important in achieving a balanced, well-rounded and representative approach to forming a stakeholder group, especially one involving the public. Being female vs male was a significant predictor of finding 10 of the 15 criteria important. While many CHNRI exercises report on the gender of the researchers, none have reported on the gender of the stakeholder groups [6,8-11].
While no CHNRI exercises collected demographic data with regards to the stakeholder portion of the exercise, one asked health stakeholders (those working in national or district hospitals, health facilities, teaching hospitals, or in United Nations posts) in Uganda whether demographic characteristics of patients (eg, age, religion, societal power, affluence, mental, and physical capabilities) should be criteria to influence the priorities, but still did not report even gender-related information on the stakeholder group weighting the criteria [12].
Our data shows that self-reported health status predicted differences in 11 of 15 criteria, more than any other characteristic; this indicates that forming a stakeholder group of people affected by a disease may provide a unique perspective in terms of needs and values.
Being a health stakeholder, defined by responding yes to working or having worked in the health sector, was a predictor of a difference in rating 6 of the 15 CHNRI criteria, and all 5 of the original and most widely used CHNRI criteria. Many exercises that have employed stakeholder groups have used the original CHNRI criteria, but few have employed non-health stakeholders (eg, members of public, patients with the disease or condition, caregivers of patients, etc.). It may be important for future exercises to include these groups, as there are differences in how they view the importance of criteria. While researchers or health professionals may have a particular lens to viewing a criterion, a member of the public may find another aspect of research more of a priority and it can be important to consider this wider perspective as well.

Limitations
This exercise explores the associations between individual and demographic characteristics and CHNRI criteria using data collected from AMT, which is a crowdsourcing platform. The Turkers who partici-