Two Indias: The structure of primary health care markets in rural Indian villages with implications for policy

We visited 1519 villages across 19 Indian states in 2009 to (a) count all health care providers and (b) elicit their quality as measured through tests of medical knowledge. We document three main findings. First, 75% of villages have at least one health care provider and 64% of care is sought in villages with 3 or more providers. Most providers are in the private sector (86%) and, within the private sector, the majority are ‘informal providers’ without any formal medical training. Our estimates suggest that such informal providers account for 68% of the total provider population in rural India. Second, there is considerable variation in quality across states and formal qualifications are a poor predictor of quality. For instance, the medical knowledge of informal providers in Tamil Nadu and Karnataka is higher than that of fully trained doctors in Bihar and Uttar Pradesh. Surprisingly, the share of informal providers does not decline with socioeconomic status. Instead, their quality, along with the quality of doctors in the private and public sector, increases sharply. Third, India is divided into two nations not just by quality of health care providers, but also by costs: Better performing states provide higher quality at lower per-visit costs, suggesting that they are on a different production possibility frontier. These patterns are consistent with significant variation across states in the availability and quality of medical education. Our results highlight the complex structure of health care markets, the large share of private informal providers, and the substantial variation in the quality and cost of care across and within markets in rural India. Measuring and accounting for this complexity is essential for health care policy in India.

The Medical Advice Quality and Availability in Rural India (MAQARI) Study was conducted in the 19 most populous states in India, except Delhi, which was not included because the study focused on rural India. The MAQARI study of quality and availability is intended to reflect a nationally-representative sampling strategy with weights appropriate to reporting the availability of providers to the average person living in a rural area. The MAQARI sample comprised two distinct sampling frames. In the first, a census exercise was conducted to identify all providers practicing in a representative sample of villages in rural India. This is the main Village Provider sample, which is used to calculate all availability, cost, and quality statistics at the village level in the main text. The study also included a representative sample of Primary Health Centers (PHCs) and Community Health Centers (CHCs), which are the central units of the public health care system in India. This is the PHC/CHC sample. However, these centers tend to be located in denser and more urban areas. Therefore, this sample is used only for the construction of vignette scores and state-level knowledge measures in this study, and is not used for any calculation of costs or availability.
To construct these sample frames, a total of 10 districts were sampled from each state. To ensure broad geographical coverage, each state was divided into distinct sociocultural regions (SCRs), which served as strata for sampling districts and then villages. The number of SCRs in a given state ranges from three to eight depending on the size of the state. Once the state was divided into SCRs, the 10 districts were allocated across the SCRs proportional to population, so that larger SCRs had more sampled districts. The aim of the stratification was to ensure that the study is representative across all major social and cultural regions of the country. Once the number of districts was assigned to each SCR, districts were selected within the SCR with probability proportional to their population size. Since the study is on health care in rural India, districts with more than 60% urban population were removed from the sampling frame.
Districts and villages were randomly sampled using a population-based probability proportional to size (PPS) sampling method, on the basis of the 1991 population census. Villages were defined as having under 100,000 residents in Kerala and under 20,000 residents in all other states. Within each sampled district, 8 primary sampling units (PSUs) were selected. Primary sampling units were PHCs or CHCs in the PHC/CHC sample and villages in the main Village Provider sample. Exceptions were Uttarakhand and Uttar Pradesh. Uttarakhand had only 9 districts, while Uttar Pradesh was the largest state in terms of population. Therefore, the Uttarakhand sample had 9 districts (90 PSUs) with the additional district assigned to the neighboring state of Uttar Pradesh, where we sampled 11 districts or 110 PSUs. All analysis was weighted by the inverse of sampling probability, so the final estimates are nationally-representative on a population-weighted basis. Because sampling probability was proportional to population within a given SCR, the sample represents the population in each SCR by construction. SCRs with larger populations were allocated a larger number of sample districts; however, this allocation does not exactly reflect differences in population size, because district counts had to be rounded to integers.
Moreover, sampling allocation was conducted within each state and does not reflect the differences in population across states. Therefore, we adjust for population size by assigning a weight to each PSU (village or PHC/CHC), denoted by W_PSU, which is given by [equation not recovered in this text]. Note that all the PSUs in a given SCR are assigned the same weight. W_PSU is used in the analysis of the number of providers available per village from the Village Provider sample. This choice of weight is designed to capture the experience of the representative population in rural India. Under the PPS design, larger SCRs and larger villages are oversampled relative to smaller ones.
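As an illustration of the PPS selection described above, the following is a minimal Python sketch of one common variant, systematic PPS sampling. The text does not say which PPS variant was used, so the systematic approach, the function name `pps_systematic`, and the example frame are all assumptions made for illustration only.

```python
import random


def pps_systematic(units, n, start_fraction):
    """Select n units with probability proportional to size (systematic PPS).

    units: list of (name, population) pairs in frame order.
    start_fraction: a random number in [0, 1) that fixes the first selection point.
    A unit larger than the sampling step can be selected more than once.
    """
    total = sum(population for _, population in units)
    step = total / n
    # Equally spaced selection points along the cumulative population scale.
    targets = [(start_fraction + i) * step for i in range(n)]
    selected = []
    cumulative = 0
    next_target = 0
    for name, population in units:
        cumulative += population
        # A unit is selected once for every selection point that falls inside it.
        while next_target < n and targets[next_target] < cumulative:
            selected.append(name)
            next_target += 1
    return selected


# Hypothetical frame of three districts with made-up populations.
frame = [("A", 50), ("B", 30), ("C", 20)]
print(pps_systematic(frame, 2, random.random()))
```

In this toy frame, district A holds half of the population and two units are drawn, so its inclusion probability is 1 and it appears in every draw; B and C share the second slot in proportion to their sizes.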
To ensure the representativeness of the sample of medical service providers, enumerators first conducted a full mapping of all public and private providers in each sampled village. Enumerators conducted Participatory Resource Assessments (PRAs) with at least three households within each village to obtain a list of all medical service providers within the boundary of the village. After enumerators conducted the PRAs and compiled a list of providers, they visited all of the providers listed. These providers were asked about other providers in their village, and enumerators in turn added them to the village sampling list. All enumerated providers were then administered a Health Provider Census which included questions about their demographic characteristics, qualifications, patient fees, training, employment history, and various details of their practice.
20% of all listed providers did not complete the Health Provider Census (Appendix Table A1). Among them, 82 providers were recorded as having permanently left the clinic and they were dropped from the sample; 138 refused, and 21 had no reason recorded. The majority who did not complete the survey (618 providers) were temporarily away; therefore we believe the survey accurately reflects the average availability of providers as temporary unavailability is common among Indian providers.
Anganwadi and Accredited Social Health Activist (ASHA) workers were also excluded from the sample because they are public health workers but not medical care providers (2,390 individuals). Providers with dental degrees and chemists or pharmacists were also excluded. The final village provider sample used in the analysis of the availability of village providers included 4,335 health care providers, which we use to calculate the availability of medical care in the main text, and 1,622 paramedical staff, which we report here for completeness.

Medical vignette administration
The provider listing resulting from the Health Provider Census was used as the sampling frame for follow-up surveys, which included medical vignettes for both public and private providers. Up to six providers per village were sampled for the follow-up surveys. If the village had six or fewer providers, all providers were sampled. If the village had more than six providers, one public provider was randomly sampled and then the remaining five were sampled from the pool of both public and private providers. Since there were fewer public providers, this sampling procedure ensured that at least one public provider was included in the sample if there was one in a village. If a village had many public providers because a CHC was located in the village, up to four public providers were sampled. All estimates were re-weighted to be representative of the provider type.
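The within-village selection rule above can be sketched in Python as follows. This is a simplified illustration, not the study's field procedure: the function name, the tuple layout, and the handling of villages with more than six providers but no public provider are assumptions, and the special case of up to four public providers in CHC villages is not implemented.

```python
import random

MAX_PER_VILLAGE = 6  # "Up to six providers per village were sampled"


def sample_for_vignettes(providers, rng):
    """Pick providers for the follow-up survey in one village.

    providers: list of (provider_id, sector) tuples with unique ids,
               where sector is "public" or "private".
    rng: a random.Random instance, passed in so that draws are reproducible.
    """
    if len(providers) <= MAX_PER_VILLAGE:
        return list(providers)  # small villages: take everyone
    public = [p for p in providers if p[1] == "public"]
    if not public:
        # Assumption: with no public provider, draw six from the full list.
        return rng.sample(providers, MAX_PER_VILLAGE)
    first = rng.choice(public)  # guarantees one public provider
    remaining = [p for p in providers if p != first]
    # The other five come from the pooled public and private providers.
    return [first] + rng.sample(remaining, MAX_PER_VILLAGE - 1)


# Hypothetical village with one public and eight private providers.
village = [(0, "public")] + [(i, "private") for i in range(1, 9)]
print(sample_for_vignettes(village, random.Random(1)))
```

Because the public provider is drawn first, it is over-represented relative to a simple random draw, which is why the estimates need to be re-weighted by provider type afterwards.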
In a PHC or CHC in a village, only doctors, nurses, Auxiliary Nurse Midwives (ANMs), and Multipurpose Workers (MPWs) were eligible to be administered vignettes. In sub-centers and private clinics, all providers were subject to vignettes. 4,464 providers were sampled for the vignettes and 2,704 providers completed them.
All doctors, nurses, ANMs, and MPWs working in PHC/CHCs and all providers in sub-centers and private clinics in the sample were eligible to be administered vignettes. Vignettes were conducted by a team of two enumerators. For each vignette, one of the enumerators acted as the patient, while the other recorded all history and examination questions requested by the provider and furnished the results from the physical examinations carried out by the provider. The overall completion rate among PHC/CHC providers was 85% among MBBS and AYUSH providers and 45% among providers with no qualification who are mostly nurses, ANMs, and MPWs. The data used in this paper are for the sample of health care providers (that is, excluding those paramedical staff). Among public and private providers in the village sample who were selected for vignettes, the completion rates were 43% and 54%, respectively.
There is not enough information to conclude whether vignettes were missing at random, whether lower-quality providers opted out because they were not confident, or whether higher-quality providers opted out because they had a higher opportunity cost of participating. On the one hand, in the PHC/CHC sample, the completion rate of qualified doctors was higher than that of other providers, and enumerators reported anecdotally that many ANMs and MPWs refused to participate in the vignettes, saying that they do not treat patients. Both observations suggest that lower-quality providers may have opted out. On the other hand, the lower completion rates observed among qualified public doctors in the village sample suggest the opposite.
In the vignette presentations, each provider was presented with three out of four hypothetical cases with specific symptoms. Providers were instructed to treat these cases as though they were their real patients and could ask any necessary questions, conduct tests or examinations, and recommend treatments. The four hypothetical cases used in the vignettes are provided below:
• Tuberculosis: A 40-year-old man comes to you. The man complies with all tests and medications that you recommend and will return to you if you require. The patient says, "Doctor, I have been suffering from a fever, cough, weakness, and weight loss for the last month." (All providers)
• Pre-eclampsia: A 22-year-old female has come to you. As she walks in, you notice that her pregnancy is fairly advanced. The woman complies with all tests and medications that you recommend and will return to you if you require. The patient says, "Doctor, I have been having a splitting headache -it feels like my head is going to burst." (All providers)
• Diarrhea: A mother brings in an 8-month-old male child to you. The mother complies with all tests and medications that you recommend and will return to you if you require. The mother says "My child has been suffering from diarrhea for the last two days, and I do not know what to do." (Half of providers)
• Dysentery: A 25-year-old mother of a young child comes to you. She complies with all instructions, tests and medications that you recommend for her child and will follow up with you if you require. The mother says "Doctor, my 2-year-old child has been suffering from diarrhea for 2 days." The difference from the diarrhea case is that further questioning would reveal the presence of a fever and blood in the stool, suggestive of an infection. (Half of providers)
Each vignette case included an exhaustive list of questions that the doctor may ask, with standard responses prepared for the enumerators to provide additional information to the provider. Enumerators recorded whether the provider asked questions or conducted examinations that are relevant to each case. Providers were also asked to pronounce a diagnosis and recommend treatment for each case. Whether a provider prescribed the correct medicine or gave a referral was recorded for each case. Every prescription was post-coded by a team of pharmacologists. For tuberculosis and pre-eclampsia, treatment was coded as being correct if a provider prescribed a correct medicine or if a combination of correct or partially correct diagnosis and a referral was given. For the diarrhea cases, treatment was coded as being correct if Oral Rehydration Salt (ORS) solution was recommended.
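The treatment coding rule stated at the end of this passage can be written as a small Python function. This is a sketch only: the function name, argument names, and diagnosis labels are hypothetical, and treating the dysentery case under the same ORS rule as the diarrhea case is an assumption based on the phrase "the diarrhea cases".

```python
# Cases for which a referral plus an (at least partially) correct diagnosis counts.
REFERRAL_CASES = {"tuberculosis", "pre-eclampsia"}
# Assumption: "the diarrhea cases" covers both child diarrhea and dysentery.
ORS_CASES = {"diarrhea", "dysentery"}


def treatment_correct(case, correct_medicine=False, diagnosis="incorrect",
                      referral=False, ors_recommended=False):
    """Return True if the recorded response is coded as correct treatment.

    diagnosis is one of "correct", "partially correct", or "incorrect"
    (hypothetical labels for the post-coded diagnosis).
    """
    if case in REFERRAL_CASES:
        diagnosed = diagnosis in ("correct", "partially correct")
        return correct_medicine or (diagnosed and referral)
    if case in ORS_CASES:
        return ors_recommended
    raise ValueError(f"unknown vignette case: {case}")


# A provider who refers a tuberculosis patient after a partially correct diagnosis.
print(treatment_correct("tuberculosis", diagnosis="partially correct", referral=True))
```

Under this rule a referral alone is not sufficient for tuberculosis or pre-eclampsia; it must be paired with at least a partially correct diagnosis.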

Provider competence score construction
The competence score variable was estimated using Item Response Theory (IRT) across a checklist of diagnostic questions for all vignette cases combined. This method creates a score which weights more difficult questions (those asked by the fewest providers) as more highly demonstrative of diagnostic knowledge, and eliminates redundancy between highly correlated items.
The composite measure of provider quality is based on questions asked across all three cases and therefore represents an underlying generalized diagnostic competence score for that provider.
The same set of vignettes was administered to providers in the PHC/CHC sample. This sample was used to calculate average knowledge scores for public providers across all districts, and to calculate state-level average competence for MBBS and non-MBBS providers overall. Again, all providers were administered the tuberculosis and pre-eclampsia (hypertension in a pregnant woman) cases; half of the providers were administered the child diarrhea case and the other half the child dysentery case.
The distribution of scores in the full (combined) vignettes sample is illustrated in Figure A2, and correlations are shown with the correct management of the vignette case and the use of (unnecessary) antibiotics in the vignette case. Vignette weights are derived from the survey design, and a further adjustment was made to account for heterogeneous vignette completion rates across provider types and regions. Let us define a variable V_prov for each provider of a given (s, p, m) [definition not recovered in this text]. The weights used in the vignettes analysis, denoted by W_vig, are given by [equation not recovered in this text]. In the analysis of district-level observations, each observation was assigned the following weight: [equation not recovered in this text].

Socioeconomic status (SES) measure
As part of the Participatory Resource Assessments used to enumerate and identify health care providers in each village, information was collected on the asset ownership and status of households in each village and district to create a measure of socioeconomic status (SES) across the sample. The underlying components for this measure are reported in Appendix Table A2. They varied across both states and districts such that representative measures could be constructed at each level and used as predictors or covariates of quality and availability. Principal components analysis (PCA) was used at the household level to create an index of availability of these assets and statuses (membership in a scheduled caste/tribe; adult primary education completion). PCA was chosen in order to account for covariation among the index components and appropriately weight components according to their contribution to variation.
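To make the index construction concrete, the sketch below computes a first-principal-component score from a small household-by-asset matrix using only the Python standard library. Standardizing each component, taking the first component as the index, and the use of power iteration are assumptions made for illustration; the text does not describe these details, and the toy data are invented.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    rows: list of households, each a list of numeric indicators
          (for example 0/1 asset ownership). Every column must vary.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    # Standardize so that every indicator has mean 0 and variance 1.
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
             for b in range(k)] for a in range(k)]
    # Power iteration converges to the leading eigenvector of the correlation matrix.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(z[i][j] * v[j] for j in range(k)) for i in range(n)]
    return v, scores


# Invented data: six households, three binary indicators.
households = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
loadings, ses_index = first_principal_component(households)
print(loadings, ses_index)
```

Indicators that move together receive similar loadings, so the index does not double-count them as heavily as a simple sum of assets would.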

Cost and quality calculations
In our simulations of the effect of specific policy proposals on the cost and quality of health care in average rural Indian villages, we compared specific simulated outcomes to the status quo. The status quo results are reported in detail in Appendix Table A3.
There, we report the public sector share of patient visits, the average public MBBS salary cost, the overall average cost per patient in that state, and the competence of the provider seen by the average patient in that state. The estimated share of patients visiting public sector providers in each state is calculated as the total number of (self-reported) patients seen by public sector providers divided by the total number of (self-reported) patient visits across all providers observed in all villages sampled in each state. The average public sector MBBS salary is the average salary reported by MBBS providers in each state. (No public sector providers were observed in Jharkhand and Uttarakhand and therefore these values are not available.) The total cost per patient (in Indian Rupees) is the patient-share-weighted average of (a) the public sector average cost, calculated as the total salary cost divided by the total number of patients seen; and (b) the private sector average cost, calculated as the patient-weighted self-reported price per patient across private sector providers, or their total reported income divided by the number of patients, whichever is greater. The average provider competence is the patient-weighted average of the vignettes competence index across all providers in the state.
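The cost calculation described above can be followed with a short Python sketch. The function names and data layout are hypothetical. The conversion of daily caseloads to monthly visits using 20 working days follows the notes to Table A7; applying the "whichever is greater" rule at the aggregate level rather than provider by provider is an assumption.

```python
WORKING_DAYS_PER_MONTH = 20  # conversion used in the notes to Table A7


def public_cost_per_visit(monthly_salaries, daily_patients):
    """Total public wage bill divided by total monthly patient visits."""
    monthly_visits = sum(daily_patients) * WORKING_DAYS_PER_MONTH
    return sum(monthly_salaries) / monthly_visits


def private_cost_per_visit(providers):
    """providers: list of dicts with keys fee, monthly_income, daily_patients."""
    total_daily = sum(p["daily_patients"] for p in providers)
    # Patient-weighted average of the self-reported fee per patient.
    weighted_fee = sum(p["fee"] * p["daily_patients"] for p in providers) / total_daily
    # Reported income spread over the implied number of monthly visits.
    income_based = (sum(p["monthly_income"] for p in providers)
                    / (total_daily * WORKING_DAYS_PER_MONTH))
    return max(weighted_fee, income_based)  # "whichever is greater"


def total_cost_per_visit(public_share, public_cost, private_cost):
    """Patient-share-weighted average of public and private cost per visit."""
    return public_share * public_cost + (1 - public_share) * private_cost


# Invented numbers, in Indian Rupees.
public = public_cost_per_visit([40000, 20000], [10, 20])
private = private_cost_per_visit([
    {"fee": 30, "monthly_income": 10000, "daily_patients": 10},
    {"fee": 50, "monthly_income": 30000, "daily_patients": 30},
])
print(public, private, total_cost_per_visit(0.25, public, private))
```

With these invented inputs the public cost is 100 rupees per visit, the private cost is 50 (the income-based figure exceeds the fee-based 45), and a 25% public share gives an overall cost of 62.5 rupees per visit.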

Average village availability of paramedical staff
To supplement our report of the availability of health care providers in the main text, we report the estimated availability of paramedical staff (including nurses, midwives, compounders and assistants) in each village. Appendix Figure A1 illustrates this availability measure. The majority of these staff are public-sector staff in most states with the exception of Bihar, and the majority do not hold formal medical qualifications of any type. Since many are public-sector staff, however, they will be concentrated in a few villages and an average of, for example, 1 per village (as in Maharashtra) does not imply that every village has access to such a provider. Additionally, they are often co-located with health care providers in larger facilities. Among all such providers, a small minority are AYUSH practitioners, and only a handful in the entire national sample are MBBS-qualified.

As with our estimates of health care provider availability across states, there is no apparent correlation between health outcomes at the state level such as the under-5 mortality rate (reported here) and the number of paramedical staff available in the average village.

Provider competence score construction
We utilize the set of vignettes in which providers successfully completed some checklist questions to assess the degree to which demonstrated knowledge of diagnostic technique (as reflected in the calculated competence score) predicts demonstrated knowledge of appropriate management technique for the same vignette cases. The distribution is constructed such that the average provider in the distribution has a score of zero, and the overall distribution has a standard deviation of 1. In Appendix Figure A2, we illustrate the correlations between this score (estimated only from the diagnostic questions and examinations performed by each provider) and the subsequent care decisions they took in the vignettes, including the use of unnecessary antibiotics and their provision of the correct treatment for each of the cases. Across the entire range of competence scores, the likelihood of correctly managing the case increases from 25% to 100%. The average provider (score 0) demonstrated knowledge of correct management for 75% of cases, and a 1-standard-deviation increase in the competence score corresponds to a 10-percentage-point increase in the likelihood of correct treatment for any case.
By contrast, the use of any medication in the vignette and the use of antibiotics specifically were very weakly correlated with the diagnostic score: providers said they would give some medication in 75% of cases and said they would give antibiotics in 25%, at all levels of diagnostic competence and treatment ability. (Note that the mean provider in this sample is somewhat better than the mean provider in the village-availability sample due to the inclusion of the broader PHC sample.)

Patterns in primary care-seeking
As part of the household survey, residents were asked if they had sought primary medical care in the past 30 days. Of the 10% of respondents who did, 66% reported that they sought care in the private sector and 33% reported they sought care from the public sector. About two-thirds reported seeking care in a village; and a third went to a town or city for care. There was no appreciable variation across sector; however, there was significant variation across states, as illustrated in Appendix Figure A3.

Validation of provider caseload and time use self-reporting
In a previous study involving 315 rural health care providers in West Bengal, provider surveys were combined with a participant observation (PO) exercise. In the PO exercise, a trained enumerator spent a whole day in each clinic to capture information about the actual patients and practice of each provider; this produced matched data on 271 health care providers. This figure uses that data to compare the caseload and time-per-patient measures as self-reported in the provider survey against the caseload observed in a single day of complete clinic observation. The dashed line reports the relationship if the two were perfectly correlated; the solid line reports the relationship as observed in the data. We observe that the observed caseload is correlated with, but substantially below, the self-reported caseload in most instances, and that there is substantial heaping of self-reporting at 10, 15, and 20 patients per day, and at 10, 15, and 20 minutes per patient, as illustrated in the accompanying Appendix Figure.

Returning to Table 3, we note that the association between district SES and the knowledge of private providers disappears when we include the competence of public providers in Column 4; and we note that the MBBS coefficient is substantially reduced for private providers when we include state fixed effects. We provide some additional analysis in Appendix Figures A7, A8, and A9 by investigating specific relationships between quality and district SES and MBBS qualifications across the various specifications.
First, we note that the competence of public providers used as a control in the regression is strongly correlated with district SES across the whole sample. However, at the state level, the relationships are significantly weaker and more varied (Appendix Figure A8). For MBBS qualifications, we find that in the village sample, the states with a higher proportion of MBBS providers performed better, driving the overall relationship (but preventing significant results once state fixed effects are used, due to the small number of states with significant MBBS samples). Using the larger PHC/CHC sample, we estimate the MBBS differences at the state level and generally find significant effects (Appendix Table A5 and Appendix Figure A9). We note, however, that these are primarily public sector providers and not necessarily those in the same rural areas, so we still have little to say about the skills of the private MBBS providers who have chosen to locate in the areas covered by the village sample.

Attrition in vignettes scoring
Since a significant proportion of the village sample was lost to follow-up in the vignettes exercise, we perform an analysis of the bias in our estimates due to this loss. First, in Appendix Table A6, we assess the differences between the samples with completed vignettes and those without using the same characteristics as in our demographic analysis. We observe that, overall, those with no advanced education, no other occupation, and busier and more expensive clinics are over-represented in the vignettes sample; this suggests a potential upward bias in our overall estimates of quality.
To obtain a better picture of the potential size of this bias, we conduct a two-step estimation procedure. First, using a large number of covariates, including all those above and additional demographic information about caste/tribe, religion, facility ownership, and place of birth, we use the elastic net regression procedure to machine-select those covariates that best predict completion of the vignettes module. Then, we regress both the vignettes completion and the vignettes performance on that set of covariates, reporting the results in the accompanying figure. The units for completion are percentage points; the units for performance are standard deviations. To expect substantial bias, we would need to identify some set of covariates that have large effects on both of these margins. Very few fit this criterion -full-time providers are one (leading to the potential upward biases mentioned above). The remainder of the correlates are either not selected or not significant in one or both of the selected regression models. The selected correlates and their estimated correlations with both vignettes completion and vignettes performance are reported in Appendix Figure A10.

Finally, using this set of correlates, we re-estimate our cost and quality results from Figure 6, using inverse probability weighting on the predicted likelihood of vignette completion from the elastic net model to reweight each provider's contribution to the estimates. We report both the original estimate and the re-weighted estimates in Appendix Table A7. We note that the changes in the estimates are both small and variable in direction, suggesting no large, predictable biases based on the initial survey data.

Notes: This table reports the sampling and survey completion results from each state. The first column reports the total number of providers identified in the sampled villages in that state. The second reports the total number of providers successfully surveyed. The third reports survey non-completion. The remainder of the columns report recorded reasons for survey non-completion: permanent closure, temporary unavailability, refusal, and no reason recorded. Data source: village provider census and survey.

Table A4. Cost and quality of average patient visits by state
Notes: This table reports the public sector share of patient visits, the average public MBBS salary cost, the overall average cost per patient in that state, and the competence of the provider seen by the average patient in that state. The estimated share of patients visiting public sector providers in each state is calculated as the total number of (self-reported) patients seen by public sector providers divided by the total number of (self-reported) patient visits across all providers observed in all villages sampled in each state. The average public sector MBBS salary is the average salary reported by MBBS providers in each state. (No public sector providers were observed in Jharkhand and Uttarakhand and therefore these values are not available.) The total cost per patient (in Indian Rupees) is the patient-share-weighted average of (a) the public sector average cost, calculated as the total salary cost divided by the total number of patients seen; and (b) the private sector average cost, calculated as the patient-weighted self-reported price per patient across private sector providers, or their total reported income divided by the number of patients, whichever is greater. The average provider competence is the patient-weighted average of the vignettes competence index across all providers in the state. Data source: village provider census and survey; village vignettes sample.

Relationships between provider characteristics and vignette completion
Notes: This table reports balance checks from the sampled providers in the village survey that did and did not complete the vignettes samples. Data source: village provider census and survey.

Table A7. Effect of IPW reweighting on cost and quality estimates
Notes: This table reports the cost in Indian rupees per visit seen against the average caseload-weighted provider competence for each state, using both the original unweighted values as well as values calculated from weighting each provider by the inverse of the likelihood of their participation in the vignettes sample using the elastic net model. The cost per visit in the public sector is calculated as the sum of the total public sector wage bill using reported monthly salaries from all providers and dividing this total by the total number of patients seen each day multiplied by 20 working days per month. For the private sector, it is the greater of total reported medical income divided by the number of patients seen per day times 20 working days; or the self-reported fee per patient. The mean provider competence corresponds to a score of zero and the standard deviation is 1.
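The reweighting step described in these notes can be illustrated with a minimal Python sketch. It takes each provider's predicted probability of completing the vignettes as given (the elastic net model that produces it is not reproduced here), and the function name and example values are hypothetical.

```python
def ipw_mean(values, completion_probabilities):
    """Inverse-probability-weighted mean over providers who completed vignettes.

    values: competence scores (or costs) of providers with completed vignettes.
    completion_probabilities: each provider's predicted probability of
        completing the vignettes; must be strictly positive.
    """
    weights = [1.0 / p for p in completion_probabilities]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)


# Two hypothetical providers: the second was less likely to complete,
# so it stands in for more non-completers and receives a larger weight.
print(ipw_mean([1.0, 0.0], [0.5, 0.25]))
```

In this example the unweighted mean is 0.5, while the reweighted mean is one third, because the provider with the lower completion probability counts twice as much.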

[Figure panels: Completion; Performance]
Notes: This figure illustrates the estimated relationships between machine-selected covariates and completion and performance in the vignettes sample for providers observed in the village survey round. First, using our original demographic characteristics and additional demographic information about caste/tribe, religion, facility ownership, and place of birth, we use the elastic net regression procedure to machine-select those covariates that best predict completion of the vignettes module. We then regress both the vignettes completion and the vignettes performance on that set of covariates, reporting the estimated coefficients in this figure. Units for vignettes completion are percentage points (a coefficient of +0.5 represents a 50-percentage-point higher likelihood of completing the vignette); units for vignettes performance are standard deviations relative to the mean performance. N=3,338 providers (vignette completion) and 1,406 providers (vignette performance). Data source: village survey and vignettes sample.