Assessing measurement error in surveys using latent class analysis: application to self-reported illicit drug use in data from the Iranian Mental Health Survey

Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran; Iranian National Center for Addiction Studies, Tehran University of Medical Sciences, Tehran; Addiction and High Risk Behavior Research Center, Iran University of Medical Sciences, Tehran; Department of Psychiatry, Iran University of Medical Sciences, Tehran; Psychiatry and Psychology Research Center, Tehran University of Medical Sciences, Tehran; Department of Psychiatry, Tehran University of Medical Sciences, Tehran; Department for Mental Health and Substance Use, Iranian Research Center for HIV/AIDS, Iranian Institute for Reduction of High-Risk Behaviors, Tehran University of Medical Sciences, Tehran; Department of Epidemiology, School of Public Health, Iran University of Medical Sciences, Tehran, Iran


INTRODUCTION
Measurement error is a major source of systematic error when estimating variables, especially those reflecting outcomes related to sensitive topics in surveys. This error refers to the difference between the true value of an outcome and the value obtained from a measuring tool [1,2]. Such errors may cause misclassification in categorical outcomes. The measuring tool, the interviewers, the respondents, and the mode of data collection (in person, telephone, web, etc.) are among the major sources of this type of error in surveys [1-3]. In categorical data, this type of error causes bias and reduces the precision of estimates [4,5]. Despite the range of measures that may be taken in the design and execution of surveys, such errors are inevitable, especially for sensitive outcomes, such as risky behaviors involving sexual relations and the use of illicit drugs. In such situations, the assessment and correction of measurement error become more important.
If a survey contains at least two measurements of a categorical outcome, one of which is a gold standard (i.e., a measurement with negligible error), the classification error of the other tool can easily be estimated and corrected. This method of assessing classification error is known as the finite fixture approach [1]. However, using this approach to assess and correct classification errors is not possible in most cases because a gold standard is either unavailable or too costly. Latent class analysis (LCA) is a suitable solution for assessing and correcting the classification error of any measurement in a survey in which different wording is used to obtain repeated measurements of an outcome and none of the measurements is a gold standard [1,3]. LCA was first introduced by Lazarsfeld & Henry [6] for classification error assessment in 1968. Extensive experience has been accumulated regarding its advantages and limitations. LCA can be highly suitable for the quantitative analysis of classification error in survey data if the assumptions of the model are logically justified. Otherwise, as with any other modeling approach, its incorrect use may lead to invalid results [3].
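When a gold standard is available, the classification error of a fallible indicator follows directly from the cross-tabulation of the two measurements. The following minimal Python sketch illustrates that calculation; the function name and the counts are hypothetical and are not taken from the IranMHS data.

```python
def error_rates(tp, fp, fn, tn):
    """Sensitivity and specificity of a fallible indicator versus a gold standard.

    tp: indicator positive, gold standard positive
    fp: indicator positive, gold standard negative
    fn: indicator negative, gold standard positive
    tn: indicator negative, gold standard negative
    """
    sensitivity = tp / (tp + fn)  # 1 - false negative rate
    specificity = tn / (tn + fp)  # 1 - false positive rate
    return sensitivity, specificity


# Hypothetical counts, for illustration only
sens, spec = error_rates(tp=40, fp=10, fn=60, tn=890)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Without a gold standard, the four cells above are not observed, which is the situation that LCA is intended to address.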
This paper aims to introduce the LCA approach for assessing and correcting classification error in the estimation of categorical outcomes in surveys by focusing on the assumptions that must be considered in order to use it correctly. As an example, we used LCA to estimate the prevalence of illicit drug use in the past 12 months based on data from the Iranian Mental Health Survey (IranMHS), and we discuss the essential assumptions involved in doing so.

LATENT CLASS ANALYSIS AND THE ASSESSMENT OF MEASUREMENT ERROR IN SURVEY DATA
The LCA approach uses several indicator (manifest) variables to estimate a latent variable [1]. This method assumes a relationship between the classifications of the indicator variables and the unobserved latent classes of an outcome variable [7,8].
In order to use an LCA model to assess and correct the measurement error of a categorical outcome in survey data, two or more indicators should measure the variable, and none of them needs to be a gold standard [9]. The additional indicators can be obtained by re-measuring the outcome in another survey a few weeks later, or by measuring the outcome within one survey through multiple items that use different wording. The first method is difficult and costly to implement, and the short interval between the two measurements may lead to interrelated responses due to respondents' memory of their earlier answers. The limitations of the second method include the possibility that respondents may be reluctant to cooperate in answering similar questions that seem redundant, as well as the possibility that none of the items accurately measures a variable that is necessarily latent [3].
In standard LCA models, the local independence assumption is used to provide the degrees of freedom required to estimate the parameters of the model [7,10]. According to this assumption, the indicators are independent of each other conditional on the latent variable; that is, all of the correlation among the indicators is explained by the latent variable (Figure 1) [3,7,10]. As illustrated in Figure 1, once the latent variable is conditioned on, the connecting path between indicators A and B is closed completely. Most statistical software and packages used to carry out LCA have been designed based on this assumption.
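For two binary indicators A and B and a latent variable Y, the local independence assumption can be written as follows. The notation is ours and is introduced only for illustration.

```latex
% Local independence: A and B are independent given the latent class Y
P(A = a, B = b \mid Y = y) = P(A = a \mid Y = y)\, P(B = b \mid Y = y)

% Implied distribution of the observed cross-classification
P(A = a, B = b) = \sum_{y} P(Y = y)\, P(A = a \mid Y = y)\, P(B = b \mid Y = y)
```

For a binary outcome, P(A = 1 | Y = 1) is the sensitivity of indicator A and P(A = 0 | Y = 0) is its specificity, so the classification error rates are parameters of the model.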
It may not be possible to confirm the local independence assumption when using LCA models to assess measurement error in survey data [3,7]. The assumption can be violated for several reasons. One factor is behaviorally correlated error, which occurs when a respondent, for instance, answers indicators A and B in an intentionally incorrect manner or when the answers to both indicators are incorrect because the respon- The first scenario is probable when a measurement is repeated within a survey, and the second scenario is probable both when a measurement is repeated within a survey and when re-measurement is performed using another survey [3,9]. A second factor is bivocality, which occurs when the indicators do not necessarily measure a common latent variable. For example, indicator A measures latent variable X, while indicator B measures latent variable Y; in this scenario, the latent variables X and Y may be correlated, but not identical. In this scenario, indicators A and B remain correlated conditional on the latent variable (Figure 2B). This situation is probable when repeated measurements are made in a single survey [3,9]. A third factor, latent heterogeneity, occurs when the classification errors of the indicators change across the levels of an unknown grouping variable in a population. For instance, respondents with a low level of literacy are more likely to misunderstand a question than highly literate respondents (Figure 2C) [3,9]. If the possibility that the local independence assumption is violated is not considered in LCA models, measurement bias in estimating parameters such as the classification error rates of the indicators or the prevalence of the outcome variable is not completely corrected. For instance, if the classification errors of different indicators are positively correlated, an LCA model with the local independence assumption would underestimate those errors [11,12].
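To see why latent heterogeneity violates local independence, consider true drug users (Y = 1) drawn from two unobserved groups with different sensitivities. The sketch below, with hypothetical group weights and sensitivities, assumes that A and B are independent within each group and computes their covariance after pooling the groups.

```python
def pooled_conditional_covariance(weights, sens_a, sens_b):
    """Covariance of A and B among true positives after pooling hidden groups.

    Within each group, A and B are assumed independent given Y = 1.
    weights: group proportions among true positives (must sum to 1)
    sens_a, sens_b: group-specific sensitivities of indicators A and B
    """
    p_a = sum(w * a for w, a in zip(weights, sens_a))
    p_b = sum(w * b for w, b in zip(weights, sens_b))
    p_ab = sum(w * a * b for w, a, b in zip(weights, sens_a, sens_b))
    return p_ab - p_a * p_b


# Hypothetical example: low-literacy and high-literacy groups of equal size
cov_heterogeneous = pooled_conditional_covariance([0.5, 0.5], [0.4, 0.8], [0.5, 0.9])
# Same average sensitivities, but no difference between the groups
cov_homogeneous = pooled_conditional_covariance([0.5, 0.5], [0.6, 0.6], [0.7, 0.7])
print(cov_heterogeneous, cov_homogeneous)
```

The covariance is positive whenever the sensitivities of both indicators move in the same direction across groups, and zero when the groups do not differ. This is why including the relevant grouping variable in the model can restore conditional independence.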
The use of latent class log-linear (LCLL) models provides the flexibility required for data analysis when the local independence assumption does not hold [1]. Violations of the local independence assumption can be taken into account in LCLL models by inserting interaction terms between indicators that are correlated due to bivocality or behaviorally correlated error [3]. Sufficient degrees of freedom for estimating the correlation parameters between indicators can be provided by adding grouping variables to LCLL models [7]. Preferably, the grouping variables should be selected from those that play a role in latent heterogeneity [3].
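As an illustration, an LCLL model for three binary indicators A, B, and C, a latent variable Y, and a grouping variable G can be written in the usual log-linear form. The u-term notation below is an assumption made for this sketch; m denotes the expected count in the full (partly unobserved) table.

```latex
% Local independence: hierarchical model {GY, AY, BY, CY}
\log m_{gyabc} = u + u^{G}_{g} + u^{Y}_{y} + u^{A}_{a} + u^{B}_{b} + u^{C}_{c}
               + u^{GY}_{gy} + u^{AY}_{ay} + u^{BY}_{by} + u^{CY}_{cy}

% Relaxing local independence between A and B: add the term u^{AB}_{ab}
\log m_{gyabc} = u + u^{G}_{g} + u^{Y}_{y} + u^{A}_{a} + u^{B}_{b} + u^{C}_{c}
               + u^{GY}_{gy} + u^{AY}_{ay} + u^{BY}_{by} + u^{CY}_{cy} + u^{AB}_{ab}
```

The observed table is obtained by summing the expected counts over the latent variable y, which is why the parameters must be estimated with a method designed for incomplete tables.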

APPLICATION OF LATENT CLASS ANALYSIS TO THE IRANIAN MENTAL HEALTH SURVEY DATA ON ILLICIT DRUG USE
The IranMHS was a three-stage national household survey conducted from January to June 2011. The primary goal was to estimate the 12-month prevalence and severity of psychiatric disorders among the Iranian population aged 15 to 64 years.
The validated Persian version of the paper-and-pencil interview form of the Composite International Diagnostic Interview version 2.1 (CIDI 2.1) was used as the main tool for diagnosing psychiatric disorders in this study. The CIDI 2.1 includes questions regarding drug and alcohol use disorders. The study design and field procedures have been published elsewhere [13].
We aimed to assess measurement errors in estimations of the prevalence of illicit drug use in the past 12 months in the IranMHS data using an LCA approach. The list of illicit drugs in the IranMHS included various forms of cannabis; amphetamine-type stimulants; opioids, including opium and heroin/crack of heroin [14]; the hallucinogens ecstasy and lysergic acid diethylamide; volatile solvents; and other illicit substances. The list did not include over-the-counter sedative/hypnotics and codeine-containing medications.
Within the LCA approach, LCLL models were fitted to the IranMHS data. One LCLL model was fitted using two indicators of illicit drug use in the past 12 months from the IranMHS questionnaires, and 10 different LCLL models were fitted using three indicators. Moreover, a gender grouping variable was defined in order to account for latent heterogeneity and to increase the degrees of freedom of the models.
The following three indicators were used. Indicator A was based on answers to questions about the use of any kind of illicit drugs without a prescription more than five times in the past 12 months during a face-to-face interview in which the interviewee was presented with a list of the drugs in question. Indicator B was based on answers to 10 questions about the use of any kind of illicit drugs without a prescription at least one time in the past 12 months using a self-administered questionnaire. Indicator C corresponded to face-to-face questions about the respondent's need to receive treatment for drug use and dependency over the past 12 months, or the use of any outpatient, inpatient, short-term residential, or traditional services due to drug use and dependency in the past 12 months. Receiving agonist maintenance treatment and membership in Narcotics Anonymous were excluded from indicator C because many people in these long-term treatment programs would be likely to have been abstinent from their primary drug for a long time. Definitions of these three indicators are presented in Appendix 1 based on the specific terms used in the IranMHS questionnaires.
Table 2 presents the characteristics of the 11 different LCLL models applied to the IranMHS data. In these models, Y denotes the real drug consumption status in the absence of classification errors, which is latent. Model 0 is a two-indicator model, while models 1-10 are three-indicator models. In all models, the amount of classification error in the indicators is expressed by the interaction terms (AY and BY in model 0; and AY, BY, and CY in models 1-10). In all models, GY allows the estimated prevalence of illicit drug use, after adjusting for classification errors, to differ by gender. In models 0, 1, and 9, it was assumed that no correlation was present between the indicators conditional on the latent status (the local independence assumption) [7,10]. In all models except models 9 and 10, it was assumed that the amount of classification error for each indicator was the same in both genders. In models 2-8 and 10, different scenarios in which the local independence assumption did not hold were imposed on the data. An interaction term between two indicators means that a correlation was present between the two indicators conditional on the latent variable Y. Model 9 was obtained by adding the interaction terms AG, BG, and CG to model 1. These terms imposed a specific kind of variability between the two genders on the amount of classification error for each indicator. For example, the BG term next to the BY term in model 9 in Table 2 implies that the product of the odds of a false positive error in indicator B and the odds of a false negative error in indicator B was the same for both genders. For this to be the case, men who responded to indicator B with a higher false positive error probability than women, for example, would also have had to respond to the indicator with a lower false negative error probability. Thus, the above product of the odds would remain constant for both genders. This is discussed in more detail in Biemer & Wiesen [7].
Model 10 was obtained by adding the interaction terms AC and AB to model 9, which also relaxed the local independence assumption. Two differences are present between model 10 and model 1: first, in model 10, it was assumed that the amount of classification error for each indicator varied with gender; second, in model 10, the local independence assumption no longer held.
In order to select the best three-indicator model in Table 2, the Lin & Dayton [15] criteria were used. Based on these criteria, all of the three-indicator models in Table 2 were identifiable, since the number of their parameters was smaller than the degrees of freedom (i.e., the number of cells in the ABCG cross table) [16,17]. Model 7 had the lowest Bayesian information criterion (BIC) and was therefore selected as the best three-indicator model; it was then used to estimate the prevalence of illicit drug use after adjusting for classification error, as well as the sensitivity and specificity of each of the indicators.
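Model selection by BIC can be sketched as follows. This uses the textbook definition of BIC; the version reported in Table 2 is offset by a term involving the saturated model, which is the same for every model fitted to the same data and therefore does not change the ranking. The log-likelihoods, parameter counts, and sample size below are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Textbook Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_lowest_bic(models, n_obs):
    """models: {name: (log_likelihood, n_params)}. Returns the name with the lowest BIC."""
    return min(models, key=lambda name: bic(*models[name], n_obs))


# Hypothetical fits, for illustration only
candidates = {
    "local independence": (-1000.0, 10),
    "with two interaction terms": (-990.0, 12),
}
best = select_lowest_bic(candidates, n_obs=7000)
print(best)
```

In this example the richer model is preferred because its gain in log-likelihood outweighs the penalty for two additional parameters.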
The expectation maximization algorithm was used to estimate the parameters of the models [10,17,18]. The non-parametric bootstrap 95% confidence intervals of the estimates of the two-indicator and the best three-indicator LCLL models were constructed through independent resampling of the province strata in the main data. This was due to the fact that the first stage of the sampling of the IranMHS was carried out using stratified sampling of the provinces. The multi-stage sampling design of the IranMHS was not taken into account in the estimations. All statistical analyses were carried out in R software version 3.2.0 [19]. The emgllm function of the gllm package [20] was used to fit the LCLL models. Samples of the code used in the statistical analysis in R are presented in Appendix 2.
The reason for replacing BIC with "BIC [2*ln(likelihood of saturated model)]" is explained in Appendix 3.
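The expectation maximization idea can be illustrated with a minimal Python sketch for the simplest case: three binary indicators, a two-class latent variable, local independence, and no grouping variable. This is not the emgllm function used in the analysis and it does not fit the LCLL models of Table 2; the function names, starting values, and demonstration parameters are assumptions made for illustration.

```python
from itertools import product


def em_latent_class(counts, prevalence=0.3, p_pos=(0.7, 0.7, 0.7),
                    p_neg=(0.1, 0.1, 0.1), n_iter=2000):
    """EM for a two-class latent class model under local independence.

    counts: {(a, b, c): frequency} for the observed response patterns.
    Returns (prevalence, sensitivities, specificities).
    """
    prev = prevalence
    se = list(p_pos)  # P(indicator = 1 | Y = 1)
    fp = list(p_neg)  # P(indicator = 1 | Y = 0)
    total = sum(counts.values())
    for _ in range(n_iter):
        # E-step: posterior probability of Y = 1 for each response pattern
        post = {}
        for pattern in counts:
            l1, l0 = prev, 1.0 - prev
            for j, x in enumerate(pattern):
                l1 *= se[j] if x else 1.0 - se[j]
                l0 *= fp[j] if x else 1.0 - fp[j]
            post[pattern] = l1 / (l1 + l0)
        # M-step: update parameters from the expected complete-data counts
        n1 = sum(counts[p] * post[p] for p in counts)
        n0 = total - n1
        prev = n1 / total
        for j in range(len(se)):
            se[j] = sum(counts[p] * post[p] for p in counts if p[j]) / n1
            fp[j] = sum(counts[p] * (1.0 - post[p]) for p in counts if p[j]) / n0
    return prev, se, [1.0 - f for f in fp]


def expected_counts(n, prev, se, sp):
    """Expected pattern frequencies under the same model (used for the demo)."""
    counts = {}
    for pattern in product((0, 1), repeat=len(se)):
        p1, p0 = prev, 1.0 - prev
        for j, x in enumerate(pattern):
            p1 *= se[j] if x else 1.0 - se[j]
            p0 *= (1.0 - sp[j]) if x else sp[j]
        counts[pattern] = n * (p1 + p0)
    return counts


# Demonstration with hypothetical parameters (not IranMHS estimates)
demo = expected_counts(10000, 0.2, (0.8, 0.9, 0.7), (0.95, 0.9, 0.98))
prev_hat, se_hat, sp_hat = em_latent_class(demo)
print(round(prev_hat, 3), [round(s, 3) for s in se_hat])
```

Because the latent classes are unlabeled, the starting values (sensitivities above the false positive rates) determine which class is interpreted as drug users. Bootstrap confidence intervals would be obtained by repeating such a fit on resampled data.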

Ethics statement
The study protocol was approved by the institutional review board of Tehran University of Medical Sciences.

RESULTS OF APPLYING LATENT CLASS ANALYSIS TO THE IRANIAN MENTAL HEALTH SURVEY DATA
Indicators A and C were assessed based on face-to-face interviews with the participants. However, the questionnaire for indicator B was completed by the participants themselves and was put in a closed receptacle to protect their confidentiality. Table 1 presents the degree of disagreement among indicators A, B, and C for the use of any illicit drug in the IranMHS. Indicator B showed the highest degree of inconsistency with the other indicators.
As presented in Table 2, model 7 was selected as the best three-indicator model. This model took into account correlations between indicators A and C and indicators A and B in estimating the prevalence of illicit drug use in the past 12 months. An expert indicated that the estimated prevalence of illicit drug use by this model was unrealistically high.
Table 3 presents estimates of the sensitivity and specificity (%) of the indicators after application of the two-indicator model using indicators A and B (model 0) and of model 7 to the IranMHS data. The estimated specificity of all the indicators in models 0 and 7 was extremely high because the likelihood of a false positive response about the use of illicit drugs is very low. In contrast, the estimated sensitivity of the indicators in model 7 was much lower than in model 0, because model 0 did not allow for a correlation between the indicators, meaning that the false negative error of the indicators was underestimated.
The sensitivity of indicator B was higher than that of indicator A in model 0 and than that of the two other indicators in model 7, and the sensitivity of indicator A was higher than that of indicator C in model 7. Since indicator B directly assessed the use of illicit drugs at least one time in the past 12 months using a self-administered questionnaire, it had a better sensitivity than the other indicators. In contrast, since indicator C asked about the perceived necessity of treatment or service use for substance use disorders, and so ignored non-problematic drug users, it had the lowest sensitivity among the indicators in model 7.
In Table 4, estimates of the prevalence of illicit drug use employing the LCLL models with two and three indicators (adjusted for classification errors) and those using indicators A, B, and C (unadjusted) are presented by gender. After adjusting for classification errors in the self-report of illicit drug use, the estimated prevalence of drug use increased. This increase was lower in the LCLL model with two indicators than in the model with three indicators. The reason for this is that the measurement error correction in the LCLL model with two indicators was incomplete, because this model did not take into account the correlation between the indicators. The findings presented in Table 4 also indicate that measurement error in self-reported illicit drug use was an important factor in underestimating the prevalence of the use of these drugs. Although the adoption of techniques such as self-administered questionnaires leads to a reduction in the occurrence of such errors in the course of research [21], it does not fully prevent them.
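The direction of this adjustment can be illustrated with the standard relationship between observed and true prevalence for a single binary indicator, observed = true × sensitivity + (1 − true) × (1 − specificity). The sketch below inverts that relationship. It is not the procedure used to obtain the adjusted estimates in Table 4, which come from the fitted LCLL models, and the numbers are hypothetical.

```python
def corrected_prevalence(observed, sensitivity, specificity):
    """Prevalence corrected for misclassification of a single binary indicator.

    Inverts: observed = true * sensitivity + (1 - true) * (1 - specificity).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (observed + specificity - 1.0) / denom
    return min(1.0, max(0.0, corrected))  # keep the result within [0, 1]


# Hypothetical values: 2% observed, half of true users deny use, no false positives
adjusted = corrected_prevalence(observed=0.02, sensitivity=0.5, specificity=1.0)
print(round(adjusted, 4))
```

With near-perfect specificity and low sensitivity, as estimated for the self-report indicators, the corrected prevalence is higher than the observed one, and it rises further as the estimated sensitivity falls.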

DISCUSSION
In the IranMHS data, the indicators used in the LCA models were bivocal because they did not measure a common latent variable. Indicator A measured the latent variable of more than five instances of using any illicit drug in the past 12 months. Indicator B measured the latent variable of at least one instance of using any illicit drug in the past 12 months. Indicator C measured the latent variable of the need for or the use of treatment services due to drug abuse and addiction in the past 12 months. The latent variable of indicator C is nested within the latent variable of indicator A, and the latent variable of indicator A is nested within the latent variable of indicator B. The correlation between the latent variables of indicators A and B was high, but the correlation between the latent variable of indicator C and the latent variables of the other two indicators was weaker. Therefore, indicator C was considered a weak indicator.
In addition, behaviorally correlated error is expected to occur among the three indicators due to the nature of the questions, and in particular, the fact that they assessed illicit drug use. That is, if a respondent initially denies illicit drug use when answering indicator A, he or she would be expected to deny it deliberately when answering indicators B and C. However, this pattern may have occurred less for indicator C due to its indirect nature.
The presence of bivocality and behaviorally correlated error among the IranMHS indicators indicates that the local independence assumption does not hold when the LCA approach is used to analyze the data of this survey. As the correlation among the indicators in this survey was positive, failing to consider the violation of this assumption in LCA models may be expected to bias the estimated model parameters, manifesting as underestimation of the classification error of the indicators and, consequently, an incomplete correction of the estimated prevalence of drug use. In model 0, which used only indicators A and B along with a gender grouping variable for assessing and correcting classification error in the prevalence of illicit drug use (known as the Hui-Walter method [22]), the local independence assumption was imposed on the data. Therefore, it may be expected that the classification error rates of these two indicators would be underestimated and that the estimated prevalence of drug use would be lower than the real rate due to incomplete correction. The two-indicator model with one binary grouping variable has only eight degrees of freedom, all of which were used to estimate its eight parameters; it was therefore not possible to assess a correlation between indicators A and B by inserting an interaction term between them into the model, due to insufficient degrees of freedom.
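The degrees-of-freedom argument can be checked by counting the terms of a hierarchical log-linear model on binary variables, where every main effect or interaction contributes one parameter and the constant term is included. The helper below is a hypothetical sketch; the model specifications follow the terms described in the text (GY, AY, BY, and so on).

```python
from itertools import combinations


def count_loglinear_parameters(generators):
    """Number of parameters (including the constant) of a hierarchical
    log-linear model on binary variables.

    generators: highest-order terms as strings of variable letters, e.g. "AY".
    """
    terms = set()
    for gen in generators:
        letters = sorted(gen)
        # A hierarchical model contains every lower-order term of each generator
        for size in range(len(letters) + 1):
            terms.update(combinations(letters, size))
    return len(terms)


def observed_cells(observed_variables):
    """Number of cells in the observed cross table of binary variables."""
    return 2 ** len(observed_variables)


two_indicator = count_loglinear_parameters(["GY", "AY", "BY"])          # model 0
with_ab = count_loglinear_parameters(["GY", "AY", "BY", "AB"])          # model 0 plus AB
three_indicator = count_loglinear_parameters(["GY", "AY", "BY", "CY"])  # model 1
print(two_indicator, with_ab, observed_cells("ABG"))
print(three_indicator, observed_cells("ABCG"))
```

Model 0 uses all eight cells of the ABG table, so adding an AB term would require nine parameters. In the three-indicator models the ABCG table has 16 cells, which leaves room for interaction terms between indicators. Note that having no more parameters than cells is a necessary condition for identifiability, not a sufficient one.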
As sufficient degrees of freedom are present for estimating the parameters of correlation between indicators in the three-indicator models with one gender grouping variable (models 1-10), violations of the local independence assumption can be modeled in these models. Among the three-indicator models presented in Table 2, the local independence assumption is imposed in models 1 and 9. Therefore, these two models, like the two-indicator model (model 0), are not suitable for the IranMHS data. In the other three-indicator models, different forms of violation of the local independence assumption were imposed on the data. The estimated prevalence of drug use varied among models with different correlation structures among the indicators. However, experts believe that the prevalence estimate obtained from the selected three-indicator model (model 7) is unlikely to be correct.
Indicators A and C in the IranMHS data showed the highest marginal correlation, and the correlation between indicators A and B was the second strongest. Among models 2-4, which each added only one interaction term to the LCA model, the model with the interaction term AC had the lowest BIC and the model with the interaction term BC had the highest. Among models 5-7, which each added two interaction terms, the model with the AC and AB interaction terms had the lowest BIC and the model with the AB and BC interaction terms had the highest. This means that a lower BIC was obtained in the models in which the assumed correlation structure between indicators conformed more closely to the correlation structure in the data.
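The marginal correlation between two binary indicators can be summarized from their 2×2 cross table, for example with the phi coefficient. The text does not state which measure was used, so the choice of phi here is an assumption, and the counts are hypothetical.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient for two binary indicators.

    n11: both positive; n10: first positive only;
    n01: second positive only; n00: both negative.
    """
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    return numerator / denominator


# Hypothetical cross table of two indicators
phi_ab = phi_coefficient(n11=30, n10=20, n01=40, n00=910)
print(round(phi_ab, 3))
```

Computing this for each pair (AB, AC, BC) gives a quick view of which interaction terms are most likely to matter before any latent class model is fitted.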
Both bivocality and behaviorally correlated error occurred in the IranMHS data; however, the correlation structures that these two phenomena induced among the indicators did not coincide. While bivocality was strong between indicators A and C and between indicators B and C, and weak between indicators A and B, behaviorally correlated error was strong between indicators A and C and between indicators A and B, and weak between indicators B and C. The correlation structure caused by behaviorally correlated error in the data exceeded that caused by bivocality. Therefore, in the models that fit this type of correlation structure best (such as model 7, with the lowest BIC), some bivocality remained unaccounted for, and this may be expected to lead to estimates that are not completely unbiased.
In addition, only the gender grouping variable was used in the LCA models for the IranMHS data. Statistically, this variable did not play a crucial role in creating latent heterogeneity (judging by the BIC of models 9 and 10). Another factor that probably contributed to the incorrect estimates of the three-indicator model with the lowest BIC (model 7) was latent heterogeneity due to unknown variables that were not considered in analyzing the IranMHS data.
In order to obtain unbiased estimates when assessing and correcting classification errors in estimates of categorical outcomes in surveys using the LCA approach, the factors leading to violations of local independence (bivocality, behaviorally correlated error, and latent heterogeneity) should be considered in the models. At least three indicators should be developed for a latent variable when designing survey studies, and these indicators should be univocal or at least minimally bivocal. Moreover, all variables across whose levels the classification error of the indicators is expected to vary, and which may consequently lead to latent heterogeneity, should be identified and measured. Finally, well-known methods should be used to account for any factor violating the local independence assumption in models when analyzing survey data using LCA.
Indicator B = Yes, if any of options 2 to 5 (referring to the following statements, in the order given, for the past 12 months: "only once or several times", "at least once a month", "at least once a week", and "almost every day") was selected in response to each of the aforementioned questions; and Indicator B = No, if option 1 ("never") was selected for all questions.

Indicator C
This indicator was formed based on responses to six groups of questions about the need for receiving services because of drug use and addiction, and about receiving medical services because of drug use and addiction, other than taking part in Narcotics Anonymous (NA) groups or receiving agonist maintenance treatment, in the form of inpatient, short-term residential, outpatient, and traditional treatments in the past 12 months.

Table 2.
Characteristics of alternative latent class log-linear models applied to the IranMHS data regarding illicit drug use
IranMHS, Iranian Mental Health Survey; BIC, Bayesian information criterion.
1 -2*(ln[likelihood of current model] - ln[likelihood of saturated model]).
2

Table 3.
Estimated sensitivity and specificity (%) of indicators of illicit drug use in the past 12 months

Table 4.
Comparison of prevalence (%) estimates using indicators A, B, C and the LCLL models with two and three indicators for the use of any illicit drug in the past 12 months
Epidemiology and Health 2016;38:e2016013
ror correction in the LCLL model with two indicators was incomplete, because this model did not take into account the correlation between the indicators. The findings presented in Table
1 95% CIs calculated using non-parametric bootstrapping with 500 iterations.
Indicators A, B and C of any illicit drug use in the past 12 months in the Iranian Mental Health Survey (IranMHS) data
- Have you used, even once, any other substance in the past 12 months? (cocaine, gasoline, acetone, ether, paste, testosterone, nandrolone, etc.) Name the substance if you have.
- Have you been in need of "visiting" a center or a therapist to receive treatment or a solution for drug use and addiction in the past 12 months? Or has anyone suggested that you need to "visit" a medical center or therapist for treatment or a solution for the aforementioned problem?
- If you have been hospitalized in inpatient centers in the past 12 months because of psychiatric problems or addiction, have you received Ultra Rapid Detoxification (UROD) or detoxification with anesthesia? (Inpatient centers include hospital emergency wards, hospitals, clinics (general or specialized polyclinics or comprehensive psychiatric centers), care centers (for mentally ill patients, the elderly, etc.), detoxification camps (rehabilitation houses for quitting or care centers for addicts), and other inpatient centers.)
- Have you visited any outpatient treatment center (except for the centers providing agonist maintenance treatment) and received treatment services for drug use and addiction