The influence of population characteristics on variation in general practice based morbidity estimations

Background General practice based registration networks (GPRNs) provide information on morbidity rates in the population. Morbidity rate estimates from different GPRNs, however, reveal considerable, unexplained differences. We studied the range and variation in morbidity estimates, as well as the extent to which the differences in morbidity rates between general practices and networks change if socio-demographic characteristics of the listed patient populations are taken into account. Methods The variation in incidence and prevalence rates of thirteen diseases among six Dutch GPRNs and the influence of age, gender, socio-economic status (SES), urbanization level, and ethnicity were analyzed using multilevel logistic regression analysis. Results are expressed as median odds ratios (MOR). Results We observed large differences in morbidity rate estimates both at the level of general practices and at the level of networks. The differences in the distribution of SES, urbanization level and ethnicity among the networks' practice populations are substantial. The variation in morbidity rate estimates among networks did not decrease after adjusting for these socio-demographic characteristics. Conclusion Socio-demographic characteristics of populations do not explain the differences in morbidity estimations among GPRNs.


Background
Policy makers need valid epidemiological information about the incidence and prevalence rates of diseases in the population to formulate public health policy. Every four years, the Dutch Public Health Status and Forecasts Report presents an overview of the population's health status using key public health indicators such as (healthy) life expectancy, morbidity rates and health determinants [1,2]. In this report general practice based data are used to estimate the population's morbidity in terms of incidence and prevalence rates of many diseases.
Using data generated by general practice registration networks (GPRNs) to estimate morbidity has many advantages, especially in countries with a strong primary care system, like the United Kingdom and the Netherlands [3][4][5]. In these countries, all non-institutionalized residents are listed with a single general practitioner (GP), which makes a precise determination of the population at risk possible.
GPRNs put a lot of effort into building a reliable database. GPs who belong to the same GPRN are expected to use uniform recording methods and classification systems to record diseases. Furthermore, GPRNs systematically check the data to assure quality. Still, GPRNs differ from each other in several respects. For example, some GPRNs include all morbidity presented in general practice ('episode based' registries), while others only record chronic or very serious conditions in their database ('problem based' registries) [4].
In a previous paper, we identified possible explanations for differences in morbidity rates among Dutch GPRNs and categorized them into four types of factors: health care system, methodology, practice/practitioner characteristics and patient characteristics. The contribution of these factors to the differences in morbidity estimation among GPRNs, and the mechanisms behind them, are not yet fully understood [3,4]. To improve the usability of GPRN data for morbidity estimations of the total national population, these aspects need to be investigated.
In this paper we investigate the effect of differences in patient characteristics on variation in morbidity estimations among GPRNs. Age, gender, socio-economic status (SES), urbanization level and ethnicity affect the probability of being diagnosed with a certain disease. For example, 65 percent of the people in the low socio-economic class are chronically ill, compared to nearly 40 percent of the people in the highest socio-economic class [1]. There is reason to believe that the distribution of population characteristics varies among GPRNs, because some networks only operate in urban areas, while others operate in both urban and rural areas [4]. Furthermore, most networks operate in a specific region, while immigrants are not equally spread across the Netherlands [6].
Before investigating the effect of socio-demographic characteristics on the variation in morbidity among GPRNs, we studied the variation between networks and practices. We assume that for diseases with more ambiguous diagnostic criteria (e.g. depression) the variation among networks and among practices is larger than for diseases with clear diagnostic criteria (e.g. diabetes mellitus) [7]. For diseases with disease-free periods (e.g. dermatitis, depression), we expect more variation in prevalence rates than in incidence rates [8,9]. These differences result from difficulties in determining the end of an episode in the registration. An episode starts when a GP records information about a patient's health in the patient's medical record, either from contact with the patient or from information about the patient's condition supplied by other health care providers. A GP does not, however, receive information when a disorder is cured [10,11].
In summary, the goal of this paper is to study the variation among general practices and networks in incidence and prevalence rates of a selection of diseases. To gain more insight into possible explanations for these differences in morbidity rates, we investigate the influence of population characteristics. We hypothesize that adjusting for differences in age, gender, SES, urbanization level, and ethnicity among networks will reduce the variation among networks and therefore partly explain the differences in morbidity estimations among GPRNs.

Databases
We used 'episode based' data, which include information about all contacts for a specific health problem of an individual patient. An episode is defined as the period from the first presentation of a health problem in general practice until the last recorded contact for the same health problem or disease. Episodes contain the coded information about diagnosis, referrals, interventions and prescribed medication [10].
We used data from six Dutch GPRNs that were able to supply episode based data: the Continuous Morbidity Registration Nijmegen (CMR-N), the General Practice Network Academic Medical Centre (HAG-net-AMC), the Netherlands Information Network of General Practice (LINH), the Registration Network of General Practitioners Associated with Leiden University (RNUH-LEO), the Study of Medical Information and Lifestyle in Eindhoven (SMILE) and the Transition project (Trans). Details of these GPRNs and other Dutch databases can be found elsewhere [4].

Using the data
We performed an observational study without any interventions. In the Netherlands, no approval from an ethics committee is necessary for analyzing data from general practice registration networks. The data are not openly available; permission to use the data was granted by RNUH-LEO, SMILE, the Transition project, the LINH steering committee, the HAG-net-AMC steering committee and the chair of CMR-N.

Selection of diseases
We selected the diseases on the basis of three criteria. First, the expected occurrence of the disorder in the general Dutch population should be at least 3 per 1000 per year, with a preference for the more common diseases [7]. Second, we aimed to represent all ICD classification chapters to obtain a broad spectrum of diseases (chronic and acute illnesses, psychological and somatic diagnoses, illnesses of different organ systems). Third, we included diseases that mainly occur in specific groups of people (the young, the old, women, men). Twelve diseases were selected: gastrointestinal tract infections, diabetes mellitus, depression, anxiety disorders, stroke, coronary heart disease (CHD), chronic obstructive pulmonary disease (COPD), asthma, urinary tract infection, dermatitis, osteoarthritis and neck and back problems. Shingles (herpes zoster) was added as a thirteenth disease because of its consistent occurrence in the population. Fleming and colleagues demonstrated that the incidence rates of herpes zoster can be used as an indicator of accurate population estimates, and it might also be used as an indicator of recording quality [12].

Incidence and prevalence rates
In general, GPs record diagnoses according to the International Classification of Primary Care (ICPC) [13]; only one GPRN uses the so-called E-list codes [14,15]. To obtain comparable morbidity rates, some codes were combined to determine incidence and prevalence rates. For example, different codes for neck and back problems are combined into one disease category. The GPs of all GPRNs are trained to use the classification system properly.
In this study, we used data recorded in 2007. To determine incidence rates we counted all patients with a new episode of a certain disease in the period from 1 January 2007 to 31 December 2007 per 1000 listed patients. The incidence of chronic diseases represents the number of patients who were diagnosed with the disease for the first time. The incidence figures of acute or recurring illnesses represent the number of patients who had at least one new episode of the disease in 2007. Prevalence rates were calculated by counting the number of patients with a new or an existing episode of a specific disease in 2007 per 1000 listed patients. Incidence rates were calculated for all thirteen diseases; prevalence rates were only calculated for the 10 chronic or recurring diseases. The epidemiological denominator was determined by counting all listed patients, adjusted for the number of days a person was registered in the general practice in 2007 (to account for patients moving to or from the practice, deaths and births). One GPRN (HAG-net-AMC) had only prevalence data available.
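The denominator adjustment described above can be illustrated with a minimal Python sketch. It assumes that each patient contributes the fraction of 2007 during which he or she was registered; the function names, the tuple layout and the four example patients are hypothetical and are not taken from the GPRN databases.

```python
from datetime import date

YEAR_START = date(2007, 1, 1)
YEAR_END = date(2007, 12, 31)
DAYS_IN_YEAR = (YEAR_END - YEAR_START).days + 1  # 365 days in 2007


def person_years(registered_from, registered_until):
    """Fraction of 2007 during which a patient was listed in the practice."""
    start = max(registered_from, YEAR_START)
    end = min(registered_until, YEAR_END)
    if end < start:
        return 0.0  # not registered at all in 2007
    return ((end - start).days + 1) / DAYS_IN_YEAR


def rate_per_1000(n_cases, denominator):
    """Morbidity rate per 1000 listed patients (person-years)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 1000.0 * n_cases / denominator


# Hypothetical patients: (registered_from, registered_until, new_episode_in_2007)
patients = [
    (date(2000, 1, 1), date(2010, 1, 1), True),    # listed all year, new episode
    (date(2000, 1, 1), date(2010, 1, 1), False),   # listed all year
    (date(2007, 7, 1), date(2010, 1, 1), False),   # moved to the practice on 1 July
    (date(2000, 1, 1), date(2007, 3, 31), False),  # left the practice on 31 March
]

denominator = sum(person_years(start, end) for start, end, _ in patients)
new_cases = sum(1 for _, _, has_new_episode in patients if has_new_episode)
incidence = rate_per_1000(new_cases, denominator)
print(f"denominator = {denominator:.3f} person-years")
print(f"incidence   = {incidence:.1f} per 1000")
```

A prevalence rate would use the same denominator, with the numerator counting patients who had a new or an existing episode in 2007.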

Socio-demographic characteristics
We analyzed the effect of age, gender, SES, urbanization level and ethnicity. Age (in years) and gender were derived from the central database of the GPRN. SES, urbanization level and ethnicity were determined by proxy using the 4-digit postal codes of the patients' home addresses (the population size is about 4000 per postal code area) [16]. The SES score was developed by Knol and colleagues, who estimated SES using principal-component analysis on the basis of different factors indicating socio-economic position, such as average income per household, percentage of low income households, percentage unemployed, and percentage of households with a low educational level. These indicators are commonly used to determine SES and contribute to a fair estimation of the SES of the population of a particular area. The results of this analysis were available on the website of the Netherlands Institute for Social Research (SCP) [17]. The values were divided into quintiles, but to retain power in our analyses we recoded SES into three categories (quintiles 1-2 = high, 3 = medium, 4-5 = low SES). Following common practice, urbanization level and ethnicity were derived from Statistics Netherlands [16].
Urbanization level was analyzed in three categories: 'very urban', 'urban' and 'rural', based on the total number of addresses in one postal code. Ethnicity was based on the percentage of non-western immigrants in a postal code area according to the definition of Statistics Netherlands. To be classified as a non-western immigrant, a person or at least one of his/her parents must have been born in a non-western country (Turkey, all countries in Africa, and countries in Asia or South America, except the Netherlands East Indies and Japan). We distinguished four categories: people living in neighbourhoods with almost no (0 to <10%), some (10 to <50%), many (50 to <70%) or most (≥70%) persons of non-western origin. This represents the probability that a person is of non-western origin.
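The recoding rules for SES and ethnicity stated above can be written down as a short Python sketch. The function names and category labels are illustrative assumptions; only the cut-offs come from the text. Urbanization level is left out because the address-count thresholds are not given here.

```python
def recode_ses(quintile):
    """Collapse an SES quintile (1-5) into the three categories used in the analyses."""
    if quintile in (1, 2):
        return "high"
    if quintile == 3:
        return "medium"
    if quintile in (4, 5):
        return "low"
    raise ValueError(f"SES quintile must be 1-5, got {quintile!r}")


def ethnicity_category(pct_non_western):
    """Classify a postal code area by its percentage of non-western immigrants."""
    if not 0 <= pct_non_western <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if pct_non_western < 10:
        return "almost no"
    if pct_non_western < 50:
        return "some"
    if pct_non_western < 70:
        return "many"
    return "most"


# Hypothetical postal code areas: (SES quintile, % non-western immigrants)
for quintile, pct in [(1, 3.2), (3, 12.0), (5, 71.5)]:
    print(recode_ses(quintile), "/", ethnicity_category(pct))
```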

Analyses
Descriptive analyses were applied to gain insight into the frequency and distribution of socio-demographic characteristics of the listed patient populations of the GPRNs. To explore the differences in morbidity rate estimates among GPRNs, multilevel logistic regression analysis was used, distinguishing three levels (patient, practice, and network). We used random intercepts at the network and practice levels to determine the unexplained variation among GPRNs and practices. The differences in morbidity estimations among GPRNs were analyzed by calculating the corresponding median odds ratio (MOR) and its 95% confidence interval. The MOR quantifies the variation between clusters by comparing two 'identical' persons from two randomly chosen, but different, clusters. A cluster consists of all patients belonging to the same practice or network, respectively. The MOR expresses the heterogeneity among clusters on an odds ratio scale and represents the median increased risk. Consequently, the MOR can never be smaller than one. In multilevel logistic regression analysis, the MOR can be calculated for the network level and for the practice level. In this paper, a MOR of x implies that, between two randomly chosen practices or networks, the risk of being diagnosed with a disease (e.g. diabetes mellitus) is x times higher in the practice or network with the higher occurrence rate than in the practice or network with the lower occurrence rate [18,19].
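The text does not give the formula used to obtain the MOR. A common expression for a random-intercept logistic model, assumed here, is MOR = exp(sqrt(2 × σ²) × Φ⁻¹(0.75)), where σ² is the random-intercept variance at the level of interest and Φ⁻¹(0.75) ≈ 0.6745 is the 75th percentile of the standard normal distribution. The sketch below illustrates this relationship in Python; the function names are hypothetical and the code is not the SAS procedure used in the study.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745)
Z_75 = NormalDist().inv_cdf(0.75)


def mor_from_variance(variance):
    """Median odds ratio implied by a random-intercept variance (logit scale)."""
    if variance < 0:
        raise ValueError("variance cannot be negative")
    return exp(sqrt(2.0 * variance) * Z_75)


def variance_from_mor(mor):
    """Random-intercept variance implied by a median odds ratio (inverse of the above)."""
    if mor < 1:
        raise ValueError("a MOR cannot be smaller than one")
    return (log(mor) / Z_75) ** 2 / 2.0


# A variance of zero means no between-cluster variation, hence MOR = 1.
print(mor_from_variance(0.0))

# Network-level variance that would correspond to the MOR of 1.49 reported
# for the incidence of depression, under the assumed formula.
print(round(variance_from_mor(1.49), 3))
```

Under this expression the MOR equals one only when the cluster-level variance is zero, which matches the statement that the MOR can never be smaller than one.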
We analyzed the effect of socio-demographic characteristics in three steps. The first step consisted of analyzing the variation in an empty model (model 0), in which no socio-demographic characteristics were taken into account. In the second step, the variation among networks and practices was adjusted for age and gender (model 1), and in the third step SES, level of urbanization and ethnicity were also considered (model 2). All analyses were carried out using SAS version 9.2.
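The three models can be written out as follows. The notation is an assumption made for illustration and does not appear in the original analysis: $p_{ijk}$ is the probability that patient $i$ in practice $j$ of network $k$ is diagnosed with the disease, $\mathbf{x}^{(1)}_{ijk}$ holds age and gender, $\mathbf{x}^{(2)}_{ijk}$ holds SES, urbanization level and ethnicity, and $v_k$ and $u_{jk}$ are the random intercepts at network and practice level. How age entered the model (continuous or in categories) is not specified in the text.

```latex
\begin{align*}
\text{Model 0:}\quad \operatorname{logit}(p_{ijk}) &= \beta_0 + v_k + u_{jk} \\
\text{Model 1:}\quad \operatorname{logit}(p_{ijk}) &= \beta_0
    + \boldsymbol{\beta}_1^{\top}\mathbf{x}^{(1)}_{ijk} + v_k + u_{jk} \\
\text{Model 2:}\quad \operatorname{logit}(p_{ijk}) &= \beta_0
    + \boldsymbol{\beta}_1^{\top}\mathbf{x}^{(1)}_{ijk}
    + \boldsymbol{\beta}_2^{\top}\mathbf{x}^{(2)}_{ijk} + v_k + u_{jk} \\
v_k \sim N\!\left(0, \sigma^2_{\text{network}}\right), &\qquad
u_{jk} \sim N\!\left(0, \sigma^2_{\text{practice}}\right)
\end{align*}
```

If the socio-demographic characteristics explained the differences among networks, $\sigma^2_{\text{network}}$, and with it the network-level MOR, would shrink from model 0 to model 2.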

Socio-demographic characteristics
The total study population consisted of 487,516 persons in 109 practices, with a mean age of 38.5 years and almost fifty percent males (49.0%; see Table 1). The distribution of age and gender was comparable among GPRNs: the proportion of males ranged from 47.4 to 49.4 percent, the proportion of patients under 20 years of age ranged from 22.9 to 26.3 percent, and the proportion over 65 years ranged from 11.4 to 17.5 percent. The distribution of SES, urbanization level and ethnicity was more diverse: the relative size of the low SES group ranged from 10.6 to 79.7 percent, and some GPRNs operated almost exclusively in 'very urban' areas (highest rate 86.0%) while others operated mainly in 'rural' areas (highest rate 71.8%). Less than 0.5 percent of the population of CMR-N, RNUH-LEO, SMILE and Trans lived in neighbourhoods with 50% or more non-western immigrants.
Table 2 shows the included ICPC-1 codes of the diseases and disorders under study. The range of the incidence and prevalence rates among GPRNs is large (see Table 2). For example, the estimated incidence rates of depression range from 4.4 to 14.2 per 1000 in 2007. We observed these relatively large differences in most diseases. This is also illustrated by the MOR. The results of model 0 illustrate the variations without adjusting for any socio-demographic covariates. If we consider the incidence rates of depression again, a MOR of 1.49 (1.14-3.04) is shown among networks and 1.40 (1.29-1.52) among practices. This implies that in two randomly chosen GPRNs, the risk of being diagnosed with depression is "on average" about 1.5 times higher in the GPRN with the higher incidence rate than in the GPRN with the lower incidence rate. Statistically significant differences among GPRNs were found for most other diseases. There were some exceptions. The incidence rates of herpes zoster showed no significant differences among networks (MOR network = 1.08 (1.00-1.34), p-value = 0.19), and neither did the incidence rates of diabetes mellitus, coronary heart disease, urinary tract infection and osteoarthritis.

Differences in morbidity estimations among GPRNs
In general, the amount of variation among practices is larger than among networks. This is visible in the incidence rates of 10 out of 13 diseases and in the prevalence rates of 6 out of 10 diseases. A clear example is diabetes mellitus, for which the morbidity rate estimates show relatively small differences among networks (incidence rates MOR network = 1.00 (1.00-1.37) and prevalence rates MOR network = 1.20 (1.08-1.61)) but relatively large variations among practices (incidence rates MOR practice = 1.59 (1.44-1.77) and prevalence rates MOR practice = 1.49 (1.43-1.53)).
Looking at differences among networks, relatively large differences (MOR > 1.40) were seen in the incidence rates of gastrointestinal tract infections, depression and anxiety disorders and the prevalence rates of depression, anxiety disorders, stroke, CHD, dermatitis, osteoarthritis and neck and back problems. Overall, the variation in incidence rates is smaller than the variation in prevalence rates among networks as well as among practices.

Socio-demographic characteristics and differences in morbidity
The socio-demographic characteristics age and gender contributed significantly to the morbidity estimates of all diseases (except gender in COPD). SES, ethnicity and urbanization level contributed significantly to morbidity rate estimates for only some of the diseases under study (results not shown). Even though differences in the distribution of socio-demographic characteristics are apparent (Table 1), we observed only small changes in the variation in morbidity estimates among GPRNs (Tables 3 and 4). In most diseases the MOR seems to decrease after adjustment for population characteristics, although for some diseases the MOR even increased. For example, the variations among GPRNs in incidence rates of depression, expressed in MOR, are 1.49 (1.14-3.04) without adjustment, 1.48 (1.12-3.02) after adjustment for age and gender, and 1.40 (1.00-2.77) after adjustment for age, gender, SES, ethnicity and urbanization level. Overall, accounting for socio-demographic characteristics did not explain the variation between GPRNs or practices, though in most diseases the confidence intervals of the MOR became narrower.

Discussion
Morbidity estimates can be derived from routine data collected in general practice. A drawback of using these data for public health reporting is that morbidity estimates vary considerably between different general practice registration networks (GPRNs). In this study we quantified these differences and studied the effect of the socio-demographic characteristics of the populations covered by the different GPRNs on the variations in 'episode based' morbidity data.

Summary of main findings
There are large differences in morbidity rate estimates among GPRNs and these differences are more apparent for prevalence than for incidence rates. The risk of being diagnosed with a particular disease depends on the GPRN or general practice a patient belongs to. An exception is, for example, the incidence of diabetes mellitus which shows almost no variation. Differences in socio-demographic characteristics could not explain the variation in morbidity estimations among GPRNs.

Differences among networks and among practices
Hardly any variation among GPRNs is observed in the incidence rates of diabetes mellitus, CHD, urinary tract infections, osteoarthritis and herpes zoster. Diabetes is a disease which can be clearly diagnosed. The same is true for urinary tract infection, osteoarthritis and herpes zoster, which are often painful, so that patients are likely to seek medical care. For patients with CHD it is important to receive medical care, and therefore these patients are nearly always known to the GP. We expected differences among GPRNs and practices in morbidity estimates to be larger in diseases with more ambiguous diagnostic criteria [7]. In accordance with this expectation, large differences were seen in depression, anxiety disorders and gastrointestinal tract infections, where determination of these disorders depends highly on the presentation of the complaints to the GP.
Furthermore, large differences were expected in the prevalence rates of recurring diseases. Prevalence is influenced by the routine of closing episodes of diseases in the registration when the recurrence of the condition is over [3]. The large variations found in the prevalence rates of depression, dermatitis and neck and back problems might be explained by differences in these routines among GPRNs. Interestingly, we expected large differences in diseases for which people receive little medical care, but this was only observed in the prevalence of osteoarthritis. We observed hardly any differences in incidence rates, which suggests that GPs see and diagnose roughly the same number of patients with osteoarthritis. This may also be true for neck and back problems. The large differences in prevalence could be explained by different operational definitions and recording rules for prevalent cases in the different GPRNs. Defining a prevalent case in 'episode based' data can be done in two ways: 1) a case is prevalent only when the patient has had at least one GP contact for that disease in the year of interest, or 2) all known cases with a previously recorded diagnosis of that disease count as prevalent cases, irrespective of whether a contact for that disease took place in the observation year. Osteoarthritis is a chronic disease, but since health care cannot always provide effective treatment, patients do not necessarily contact their GP each year. These differences in recording rules may explain some of the variation in prevalence rates among GPRNs.
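The two ways of defining a prevalent case can be contrasted with a minimal Python sketch. The `Episode` record, its field names and the three example patients are hypothetical and only serve to show how the same registry data can yield different prevalence counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    patient_id: int
    first_recorded_year: int   # year in which the diagnosis was first recorded
    contact_years: frozenset   # years with at least one GP contact for the disease


def prevalent_contact_based(episodes, year):
    """Definition 1: prevalent only if there was a GP contact in the year of interest."""
    return {e.patient_id for e in episodes if year in e.contact_years}


def prevalent_known_cases(episodes, year):
    """Definition 2: every case with a recorded diagnosis counts, with or without contact."""
    return {e.patient_id for e in episodes if e.first_recorded_year <= year}


# Hypothetical osteoarthritis episodes in one practice
episodes = [
    Episode(1, 2003, frozenset({2003, 2005})),  # known case, no contact in 2007
    Episode(2, 2006, frozenset({2006, 2007})),  # known case with a contact in 2007
    Episode(3, 2007, frozenset({2007})),        # newly diagnosed in 2007
]

print(sorted(prevalent_contact_based(episodes, 2007)))
print(sorted(prevalent_known_cases(episodes, 2007)))
```

In this example the incidence in 2007 is the same under both rules (patient 3), while the prevalence count differs, which is the pattern described for osteoarthritis.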
For most diseases, differences are larger among practices than among GPRNs. This is apparent in the incidence rates of diabetes mellitus, even though diabetes mellitus has clear diagnostic criteria and the results are adjusted for the socio-demographic characteristics of the patients. This can possibly be explained by differences in coding quality between practices within networks or by differences in practice characteristics, but this was not investigated in this study. In this context it would also be interesting to investigate the effect of strict versus more interpretable recording rules on the variation among practices.

Influence of population characteristics
Although age and gender contribute significantly to the determination of morbidity, differences among GPRNs and among practices do not change after adjustment for these variables. This finding seems contradictory, but there are only small differences in age and gender distribution among GPRNs and therefore only small changes are possible.
The influence of SES, ethnicity and urbanization level is also limited, despite the large differences in their distribution among GPRNs. We believe this is due to limited statistical power, because the number of patients diagnosed with a disorder is small in comparison to the number of 'healthy' people. Furthermore, even where socio-demographic characteristics contribute significantly to an improved morbidity estimation, as for example SES and ethnicity in neck and back problems (results not shown), this effect is too small to change the variation among GPRNs. Despite the small changes in variation after adjustment, differences among GPRNs and practices remain large.

Strengths and limitations of this study
To our knowledge, this is the first study to investigate the influence of socio-demographic characteristics on the variation of morbidity estimates among 'episode based' GPRNs. The distribution of age and gender in the different network populations corresponds reasonably well to that of the Dutch general population. The differences in ethnicity and urbanization level are much larger among networks, because most networks operate regionally and these characteristics are not equally distributed between regions in the Netherlands. Therefore we think adjusting for these characteristics is essential. Some GPRNs show an extreme distribution on some of the socio-demographic characteristics; for example, more than 85% of the HAG-net-AMC population lives in very urban areas. Reanalysis without this GPRN did not lead to changes: some variations slightly increased, some decreased, and still hardly any changes were seen after adjusting for socio-demographic characteristics (results not shown).
To investigate the effect of socio-demographic characteristics, we adjusted for the differences in population composition among GPRNs. However, direct measures of SES and ethnicity were not available, and we had to rely on proxy measures. Because these estimates are less accurate, this may have led to an underestimation of the effects of SES and ethnicity. Overall, the relations found seem to be legitimate. For example, low SES was related to higher morbidity rates of diabetes mellitus, and in COPD high SES was related to lower morbidity rates (results not shown) [20]. Although direct measures are more precise, their absence cannot explain why some variations even increased. Therefore we assume our conclusion, that socio-demographic characteristics do not explain differences among GPRNs, to be valid.
The differences in incidence estimations of herpes zoster among GPRNs are small and within the range seen in other research [12]. As the crude figures for herpes zoster show no significant variation among networks (MOR network = 1.08 (1.00-1.34)), we can conclude that the populations used are sufficient. It might even indicate a good recording quality of the GPRNs [12].
We only used 'episode based' data to rule out differences due to different types of data. We have data from eight Dutch GPRNs: four networks only have 'episode based' morbidity data, two have 'problem based' data and two have both. Such a low number of GPRNs makes it impossible to include data type in the multilevel analyses. Other Dutch GPRNs did not want to participate or were not able to deliver their data on time. Dutch GPRNs differ from each other, but the distribution of the population characteristics in the participating GPRNs was broad, and therefore we think that including other GPRNs would not have changed our conclusion.

Conclusions
In a previous paper, we identified factors which may be responsible for the differences in morbidity among general practices and registration networks. The current study showed that one of these factors, the characteristics of the patient population, could not explain these differences. Understanding the differences between GPRNs and practices is a first step towards obtaining the most valid and reliable estimate of morbidity in the general population.