INTRODUCTION

Health numeracy is the term used to describe a person’s ability to use quantitative information to communicate with their providers, participate in medical decision making, and take care of their health. Health numeracy is one domain of the general construct of health literacy,1 with theoretical frameworks identifying a range of skills encompassed by the construct including basic number sense, use of tables and graphs, and concepts of probability and statistics.2–5 Previous studies reporting an association between low health literacy and poorer health outcomes have used measures that incorporate both print and numeric domains.6 However, an emerging literature supports the association of skills in the health numeracy domain with health-related outcomes including accuracy of risk perceptions,7,8 ability to interpret risk information,9–12 identification of high quality care options,13 decreased susceptibility to framing biases,14 reduced hospitalizations for asthma,15 improved anticoagulation control16 and management of diabetes,17 and increased medication management for HIV infection.

Numeric concepts may differ across cultures and be challenging to translate across languages. Understanding a patient’s level of health numeracy may help a provider to communicate more effectively and provide a higher level of patient-centered care.18 Many limited English proficient (LEP) individuals in the US have inadequate or marginal general health literacy,19–22 defined as “the degree to which individuals have the capacity to obtain, process, and understand basic health information needed to make appropriate health decisions.”23 Low health literacy has been associated with disparities across a range of conditions and settings for LEP patients.24–28 This is a large and growing problem in the U.S.; the proportion of people with limited English proficiency increased by 80 % between 1990 and 2010,29 and the vast majority of this population is Spanish speaking.

Less is known about health numeracy among LEP patients regarding numerical concepts important in health-related self-care and decision making.25,30 While there are health numeracy measures available, they vary in their area of focus and whether they assess knowledge, skills, and/or preference for use of numbers or the full range of skills an adult needs to understand and use numbers in decision making and medication administration.11,15,16,31–37 Relatively few health numeracy specific measures have been translated into Spanish. We identified three composite health literacy measures that include measures of health numeracy and have also been translated and validated in Spanish: the Parental Health Literacy Activities Test (PHLAT-Spanish),38 the Newest Vital Sign (NVS),36 and the Spanish Test of Functional Health Literacy in Adults (S-TOFHLA).39 In addition, we identified one specific health numeracy measure, the Diabetes-Specific Numeracy Measure, that has been validated in Spanish.40 These measures have strengths but also limitations: they are not self-administered (PHLAT-Spanish and NVS) or were not developed to measure the full range of numeracy skills important to communication, decision making, and participation in one’s health care. The PHLAT-Spanish is also specifically designed to assess health literacy and numeracy among parents of young children rather than adults in general. Finally, none of these measures were developed using modern measurement approaches to scale development such as item response theory.41 Modern approaches have several advantages over classical test theory, including the ability to evaluate item bias that may exist between groups of respondents, which facilitates the development and validation of versions translated to other languages.41

Our objective was to develop and validate a short and easy to use measure of health numeracy for Spanish-speaking adults in the United States using item response theory psychometric methods.

METHODS

Overview

The Spanish Numeracy Understanding In Medicine Instrument (NUMi) was developed using item response theory (IRT). Details of the development and validation of the 20-item and eight-item versions of the English NUMi have been previously published.42,43 We chose to base our Spanish NUMi on the shorter (eight-item) version of the English NUMi to create a scale that has as low a respondent burden as possible while maintaining strong psychometric properties. In brief, the NUMi was based on an empirically derived theoretical framework of health numeracy including the domains of number sense, tables and graphs, probability, and statistics.2,4 The use of IRT psychometric methods offers several advantages for scale development. IRT methods allow for the assessment of measurement bias through use of differential item functioning (DIF) analysis, a valuable approach for the development and evaluation of cross-cultural measurement tools.44,45 We used IRT methods to generate and calibrate a large item bank (n = 110) to assess the health numeracy construct.42 Items were calibrated using data from the English respondents using BILOG software.46 Qualitative studies5,30 were conducted with both English-speaking and Spanish-speaking adults to develop items that would work in both Spanish-speaking and English-speaking and Hispanic and Non-Hispanic populations. Data used for psychometric analyses were initially obtained from English speakers, with 29 % of the sample identifying as Hispanic.

The selection of items involved a systematic process that considered the psychometric properties of each item and the range of item content. Data from respondents who took the English version of the test were used as a reference group and those who took the Spanish version of the test were used as a focal group in the DIF analysis. The assumption of unidimensionality and model fit were evaluated as described below. Scale reliability was evaluated with Cronbach’s alpha, and convergent validity was evaluated through a comparison of test scores with level of education and general health literacy as measured by the S-TOFHLA.

Spanish Translation

We translated all items originally created in English into Spanish using the group translation method. In this process, multiple qualified individuals translated the items individually, compared their translations, and, using group consensus, modified and agreed upon a final translation that would be most accessible and understandable to Spanish-speaking patients from a variety of backgrounds. Six translators participated in this process. One Spanish translator then reviewed the agreed-upon final translation of the Spanish items and compared them to the English items; their equivalency in both Spanish and English was then evaluated by this translator and the bilingual members of the study team. Several modifications were made to the Spanish items in this step to make them understandable, culturally relevant, and as equivalent as possible to the already validated English items. We then conducted 16 cognitive interviews with Spanish-speaking participants from a variety of educational levels to confirm that the items measured the same constructs of numeracy in Spanish as they did in English.

Study Population

The resulting Spanish item bank was administered to 232 Spanish speakers. Participants were recruited using purposive convenience sampling. We recruited participants from Milwaukee (a community center) and Chicago (three clinical sites, three schools, three community centers, and one church). Prior to enrollment, potential participants were interviewed in Spanish by a native Spanish-speaking and bilingual research assistant to ensure that the participant’s Spanish fluency was high enough to allow them to respond comfortably to the Spanish items. In addition to ascertaining basic demographic characteristics in this interview, we asked them if they could read and write in Spanish, and provided an example of a test question in Spanish. Potential participants who could respond adequately to the questions in Spanish were invited to participate in this study. Tests were administered in a classroom setting. All participants had the option of having the test items read to them in a private setting. Additional inclusion criteria included self-identifying Spanish as a primary language and being 21 years of age or older. Exclusion criteria included visual acuity less than 20/50 with corrective lenses. Test packets included the numeracy items, the S-TOFHLA, and socio-demographic questions covering age, gender, level of education, and country or region of origin.

Testing the Assumption of Unidimensionality

A key assumption of IRT is that the items represent a single construct, a concept defined as unidimensionality. The unidimensionality assumption was tested using the full item bank obtained from English speakers using Stout’s Test of Essential Dimensionality (DIMTEST).47 Unidimensionality was tested across the four domains that comprised our health numeracy construct: number sense, tables and graphs, probability, and statistics. Two hypotheses were tested: 1) the null hypothesis of unidimensionality between items in the statistics domain and remaining items and 2) the null hypothesis of unidimensionality between items in the probability and statistics domains combined compared to remaining items. Neither hypothesis yielded significant findings (t = 0.00, p = 0.500; and t = -0.37, p = 0.64, respectively). The analysis was then conducted on a 20-item version of the test with five items from each domain with similar results, supporting the assumption of unidimensionality.

Assessment of Model Fit

We used a two-parameter IRT model for item analysis.42 Log likelihood ratio tests were conducted to compare model fit for the one-parameter and two-parameter IRT models for our data. The chi-square statistic was large, and the null hypothesis of no difference between the one- and two-parameter models was rejected at a significance level of p < 0.001, indicating that the two-parameter model resulted in a statistically significant improvement in fit compared to the one-parameter model. We did not evaluate the three-parameter model (including a guessing parameter), as we directed our respondents to leave items blank rather than guess if they did not know the answer to a question.
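As an illustration of what the two parameters mean, the following minimal Python sketch evaluates a two-parameter logistic item response function. It assumes the plain logistic metric (no 1.7 scaling constant); the function name and the item parameters are hypothetical and are not taken from the calibration reported here.

```python
import math


def two_pl_probability(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a two-parameter logistic model.

    theta: respondent ability; a: item discrimination; b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Illustrative (hypothetical) parameters, not the calibrated values.
easy_item = {"a": 1.0, "b": -1.3}
hard_item = {"a": 1.0, "b": 0.2}

for theta in (-2.0, 0.0, 2.0):
    p_easy = two_pl_probability(theta, **easy_item)
    p_hard = two_pl_probability(theta, **hard_item)
    print(f"theta={theta:+.1f}  easy={p_easy:.2f}  hard={p_hard:.2f}")
```

At an ability equal to the item difficulty (theta = b), the modeled probability of a correct response is 0.5; a larger discrimination parameter makes the curve steeper around that point. A one-parameter model would fix a to a common value across items.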

Differential Item Functioning

Differential item functioning (DIF) refers to the potential that there are unequal probabilities of providing a correct response due to factors other than the primary construct being measured by test items. The presence of DIF would require the re-calibration of item parameters in the Spanish version of the test. We evaluated all candidate items and included in the Spanish NUMi only those that did not demonstrate DIF. DIF was evaluated using Simultaneous Item Bias Testing (SIBTEST).45,48,49 SIBTEST compares the performance on candidate studied items in the focal group (Spanish-speaking respondents) to performance on those items in the reference group (English-speaking respondents), after matching respondents by ability. The difference in item performance is represented by a Beta Index, with a level of 0.10 set as a significant degree of DIF. The null hypothesis of no difference is typically rejected if the p value is less than 0.05, with adjustments for multiple comparisons. A two-step analysis was conducted. In step 1, each Spanish item was compared to performance on the corresponding item in English, using the remaining items as the matching subtest. Items flagged for DIF were then removed from the matching or conditioning subtest. In step 2, the DIF analyses were repeated on all items using this purified matching subtest.44,45,48,49 For this analysis, the reference group consisted of English-speaking examinees (n = 200) and the focal group consisted of Spanish-speaking examinees (n = 232).
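The matching logic behind a Beta-type index can be sketched in a few lines of Python. This is a deliberately simplified analogue and not the SIBTEST procedure itself: it omits SIBTEST’s regression correction and significance test, it weights score strata by the focal group’s share (an assumption of this sketch), and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def uncorrected_beta(reference, focal):
    """Weighted reference-minus-focal difference in proportion correct.

    Each group is a list of (matching_score, item_correct) pairs, where
    matching_score is the number correct on the matching subtest and
    item_correct is 1 or 0 on the studied item. Strata are weighted by the
    focal group's share at each matching score (an assumption of this sketch).
    """
    ref_by_score = defaultdict(list)
    foc_by_score = defaultdict(list)
    for score, correct in reference:
        ref_by_score[score].append(correct)
    for score, correct in focal:
        foc_by_score[score].append(correct)

    # Use only matching scores observed in both groups.
    shared = [s for s in foc_by_score if s in ref_by_score]
    n_focal = sum(len(foc_by_score[s]) for s in shared)
    beta = 0.0
    for s in shared:
        weight = len(foc_by_score[s]) / n_focal
        p_ref = sum(ref_by_score[s]) / len(ref_by_score[s])
        p_foc = sum(foc_by_score[s]) / len(foc_by_score[s])
        beta += weight * (p_ref - p_foc)
    return beta


# Hypothetical toy data: (matching subtest score, studied item correct?).
reference_group = [(2, 1), (2, 1), (3, 1), (3, 0)]
focal_group = [(2, 1), (2, 0), (3, 1), (3, 0)]

beta = uncorrected_beta(reference_group, focal_group)
flagged = abs(beta) >= 0.10  # threshold described in the text
print(beta, flagged)
```

A positive value means that, among respondents with the same matching score, the reference group answered the studied item correctly more often than the focal group. In the two-step approach described above, flagged items would be dropped from the matching subtest before the comparison is repeated.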

Reliability and Validity Testing

We used Cronbach’s alpha to evaluate the internal consistency of the Spanish NUMi and evaluated its convergent validity by comparing scores on the Spanish NUMi to the Spanish Test of Functional Health Literacy in Adults (S-TOFHLA)39 and level of education. The association of the Spanish NUMi and S-TOFHLA was evaluated using a Spearman correlation test. An independent sample t test was performed, assuming unequal variances, comparing the mean Spanish NUMi scores of the high, medium, and low education groups. We divided our sample into these three groups based on the natural distribution of our sample into those who had middle school level of education or less, 9–12 years of education, and those who had some college or more. These groups are also likely to have different numeracy levels. We also evaluated the correlation of scores on the Spanish NUMi with the number correct of the 18 out of 20 translated items (those that did not demonstrate DIF) from the original NUMi.
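For readers unfamiliar with the statistic, Cronbach’s alpha can be computed from a respondents-by-items score matrix as sketched below. The sketch uses only the Python standard library; the function name and the toy response matrix are hypothetical and unrelated to the study data.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scored answers."""
    n_items = len(responses[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [
        variance([row[j] for row in responses]) for j in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical toy data: 5 respondents, 3 dichotomous items (1 = correct).
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 2))
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as a measure of internal consistency.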

We expected the Spanish NUMi to be positively associated with higher health literacy as measured by the S-TOFHLA (with print and numeracy components), higher levels of education, and scores from the translated longer original NUMi measure.

RESULTS

Study Population

The study population was diverse in age, gender, and level of education (Table 1). The majority (70 %) of participants reported that Mexico was their country of origin. Ninety-three percent of participants (93 %) cited Spanish as their primary language. Forty-one percent (41 %) had no more than an eighth grade level education. Twenty-nine percent (29 %) had inadequate health literacy as measured by the Spanish Test of Functional Health Literacy in Adults (S-TOFHLA).

Table 1. Characteristics of the Study Population

Translation Issues

A particular challenge was the translation of concepts pertaining to risk, uncertainty, and statistical significance. For example, the English version of items that asked about the risk of side effects used the phrases “chance of a serious side effect” and “how many would be expected” to have a side effect to assess understanding of the probability of an occurrence. The Spanish version translated this concept to “probabilidades de experimenta un effecto secundario” and “anticipa que experimentaran.” However, the concepts of a probabilistic expectation were not easy to translate. The term riesgo (risk) was not interpreted by participants as a probabilistic concept as much as a term to identify exposures that put one at risk for adverse outcomes. Items that attempted to assess false positive test results and the meaning of a p value were also difficult to translate, perhaps due to difficulty in translating concepts of chance, risk, and statistical significance between English and Spanish.

Differential Item Functioning

Ninety-six items from our original item bank were evaluated for DIF with 15 demonstrating DIF, including two items selected for the English NUMi Eight-Item Test (Table 2). Items evaluating understanding of false positive tests and the meaning of a statistically significant finding were dropped from the Spanish version due to the finding of DIF between the English and Spanish versions, resulting in a six-item Spanish NUMi. The English version of these items and those dropped due to DIF are presented in Table 2.

Table 2. English Version of Candidate Items for Spanish NUMi

Psychometric Analysis of Spanish NUMi

The six-item Spanish NUMi (available in the online appendix) demonstrated adequate reliability with a Cronbach’s alpha of 0.72. Items demonstrated a range of difficulty using classical test statistics (percent correct: 0.49 to 0.86) and adequate discrimination (item-total score correlation: 0.34–0.56). Similarly, IRT parameters indicate a range in the difficulty parameter. Typically, IRT difficulty parameters range from -3.0 (easiest) to 3.0 (hardest) and discrimination parameters range from 0 (poor discrimination) to 3.0 (excellent discrimination). The easiest items on the test addressed interpretation of a nutrition label and determining if a glucose level was within goal range (difficulty parameters of -1.31 and -1.27, respectively). The hardest item on the test addressed interpretation of small risks and had a difficulty parameter of 0.199. The discrimination parameters ranged from a low of 0.62 to a high of 1.40, indicating acceptable discrimination (Table 3). The Item Characteristic Curves (ICCs) and Item Information Curves (IICs) demonstrate the range of difficulty and discrimination across the items (Figs. 1 and 2). The majority of items had negative IRT difficulty parameters, indicating they were relatively easy for examinees with high health numeracy. The Test Information Function of the final Spanish NUMi indicates that the test is most discriminating at a lower health numeracy level (Fig. 3).

Table 3. Description and Item Level Analysis of the Spanish NUMi Candidate Items
Figure 1.

Item characteristic curves. This figure displays the item characteristic curves for the eight candidate items for the Spanish NUMi. Items 1–6 were included in the Spanish NUMi. The item content is presented in English in Table 2. The IRT difficulty and discrimination parameters that were used to determine these curves are presented in Table 3. The curves that have their inflection point at a higher theta value indicate harder items.

Figure 2.

Item information curves. This figure displays the item information function for each candidate item of the Spanish NUMi. Items 1–6 were included in the Spanish NUMi. The item content is presented in English in Table 2. The IRT difficulty and discrimination parameters that were used to determine these curves are presented in Table 3. The curves that have the greatest height indicate items that have the highest discrimination. The theta value at which the curve peaks indicates the difficulty level at which the item is most discriminating.

Figure 3.

Test Information Function of Spanish NUMi. This figure indicates the level of the numeracy trait, theta, at which the six-item Spanish NUMi test will be most discriminating.

Unidimensionality Assumption

The unidimensionality assumption for the item bank was supported by Stout’s Test of Essential Dimensionality (DIMTEST).42 To provide additional supportive evidence for unidimensionality, we conducted an exploratory factor analysis of the eight items considered for inclusion in the Spanish NUMi. The analysis yielded only one factor with an eigenvalue > 1.0. Evaluation of the residual matrix of the factor analysis also supports the independence of items (Online Fig. A).

Reliability and Validity

The Spanish NUMi correlated strongly (Pearson’s correlation coefficient) with print literacy as measured by the S-TOFHLA (0.67; p < 0.001), education (0.67; p < 0.001), and the mean score that the respondents had on the longer NUMi form (0.87; p < 0.001). We further divided the Spanish-speaking sample into three education levels: up to an eighth grade level education, ninth to twelfth grade education, or some college or more. As anticipated, those in the low education group scored significantly lower (2.48 ± 1.64; mean ± SD, n = 93) than the examinees in the medium (4.15 ± 1.45; mean ± SD, n = 78) and high education groups (4.82 ± 1.37; mean ± SD, n = 61) (t (135) = -7.109, p < .001).

For the entire sample of examinees, the mean score on the six-item Spanish NUMi was 3.66 with a standard deviation of 1.80. The distribution of scores is slightly negatively skewed in this sample of examinees (Online Fig. B).

Short Form Content

Despite the fact that the short form consists of only six items, these six items reflect a range of domains. Four items on the final shortened version of the NUMi are from the number sense domain, one item is from the tables and graphs domain, and one item is from the probability domain (Table 2).

DISCUSSION

We have documented that the Spanish NUMi is a valid measure of the range of important numerical concepts that are used in understanding and communicating health information. While there are other Spanish measures of health numeracy, they have important limitations that the Spanish NUMi does not have. The Spanish NUMi is an important addition to the field, as low health numeracy and literacy are more common in Spanish-speaking populations in the United States, placing this population at greater risk of health disparities. In addition, this tool is short, easy to use, and focuses on health numeracy and the range of numerical skills that individuals need to participate effectively and actively in their health care.

The rigorous development of the NUMi for use in Spanish and English should not be overlooked. We found that some terms and concepts were difficult to translate linguistically and culturally, highlighting the importance of excellent translation and a need for future research to focus on how to overcome these issues.

Measures of health numeracy may be useful in guiding communication strategies in health that are tailored to individuals or targeted to populations.18 For example, the Spanish NUMi could be given to patients at initial registration for care or while waiting for a visit, and information about the patient’s numeracy could be given to providers so they can adjust their communication accordingly. If patients are found to have low health numeracy, providers could use fewer numbers and/or spend more time explaining numerical health information or instructions. Information on a patient’s level of health numeracy could also cue the provider to use diagrams, tables, pictographs, and frequencies, rather than percentages, when trying to communicate risks, benefits, and other treatment information to a patient with low health numeracy.31

Our study was not without limitations. First, our sample was predominantly Mexican Spanish speakers from the Midwest. While this might limit generalizability to other Spanish-speaking populations, we do not think this is likely because we used a group translation method in which Spanish speakers from Mexico and Central and South America were involved. Second, although the study met the psychometric requirements of IRT (e.g., a large item bank that was originally calibrated using a large sample), our validation is based on one sample. Use of the Spanish NUMi among broader populations is needed to support the predictive validity of the Spanish NUMi with respect to communication, decision making, and health outcomes. Further work is also needed to support meaningful categories of scores that could define low, medium, and high numeracy groups.

Despite these limitations, we were able to document that the Spanish NUMi is a reliable and valid measure of the range of important numerical concepts that are used in understanding and communicating health information. The Spanish NUMi is an easy-to-use, brief measure that could be used with Spanish speakers in a wide range of settings to assess health numeracy and to guide communication and care to be more effective and accessible for low-literate Spanish-speaking patients.