Reference values for spirometry – report from the Obstructive Lung Disease in Northern Sweden studies

Background Abnormal lung function is commonly identified by comparing observed spirometric values to corresponding reference values. It is recommended that such reference values for spirometry are evaluated and updated frequently. The aim of this study was to estimate new reference values for Swedish adults by fitting a multivariable regression model to a healthy non-smoking general population sample from northern Sweden. Further aims were to evaluate the external validity of the obtained reference values on a contemporary sample from south-western Sweden, and to compare them to the Global Lung Function Initiative (GLI) reference values.
Method Sex-specific multivariable linear regression models were fitted to the spirometric data of n=501 healthy non-smoking adults aged 22–91 years, with age and height as predictors. The models were extended to allow the scatter around the outcome variable to depend on age, and age-dependent spline functions were incorporated into the models to provide a smooth fit over the entire age range. Mean values and lower limits of normal, defined as the lower 5th percentiles, were derived.
Result This modelling approach resulted in unbiased estimates of the spirometric outcomes, and the obtained estimates were appropriate not only for the northern Sweden sample but also for the south-western Sweden sample. On average, the GLI reference values for forced expiratory volume in one second (FEV1) and, in particular, forced expiratory vital capacity (FVC) were lower than both the observed values and the new reference values, but higher for the FEV1/FVC ratio.
Conclusion The evaluation based on the sample of healthy non-smokers from northern Sweden shows that the Obstructive Lung Disease in Northern Sweden reference values are valid. Furthermore, the evaluation based on the south-western Sweden sample indicates a high external validity.
The comparison with GLI brought further evidence to the consensus that, when available, appropriate local population-specific reference values may be preferred.

spirometry in 2012 (10), which currently are endorsed by several respiratory societies such as the ERS and the American Thoracic Society (ATS) (10,15). However, it has recently been shown that the GLI reference values may not be appropriate in all countries (16–20).
It is essential that the sample from which the reference values are derived is representative of the contemporary healthy non-smoking population. The age range and other anthropometric, ethnic, environmental, and socioeconomic factors must be considered, since such factors can affect lung function (21). In addition, the spirometric measurements should be performed in line with recommended guidelines (22–25). It is also important that reference values for spirometry are evaluated and updated continuously (22,24,25).
Linear regression models have commonly been used to model lung function. In most models, the predictors are sex, age, and height, but sometimes also weight and ethnicity (1–6, 9–13). It has been shown that subjects of European ancestry (Caucasians) have larger lung volumes compared with subjects of other races/ethnicities (22,26,27). Height is a proxy for chest size, and women have smaller lung volumes than men. Age is a proxy for maturity: lung volumes increase with age during childhood and adolescence, reach a plateau, and then decrease from some years after adolescence (3,10,23). In general, there is a high variability in the lung function development in elderly subjects, and the age dependence in later stages of life is less studied (24).
Recently, progress has been made in the area of modelling lung function, and the Lambda-Mu-Sigma (LMS) method embedded in the generalised additive models for location, scale and shape (GAMLSS) (28,29) is preferred by some authors (12,30). This method was used to derive the GLI reference values (10,31). Besides the mean, GAMLSS allows skewness and kurtosis to be modelled. It is also common to use spline functions to allow both the predicted estimates and the standard deviation (SD) to vary non-linearly as functions of an explanatory variable (9,10,12,30). Previous studies have shown that not only the predicted mean but also the SD varies with age, especially when including ages from childhood to adulthood (10,31), while a large but somewhat older study indicated constant variance for adults (32).
The aim of this study was to estimate new up-to-date reference values for spirometry for adults of European ancestry by fitting a multivariable regression model to data from Caucasian healthy non-smokers sampled from the general population of northern Sweden. Further aims were to evaluate the external validity of the new reference values on contemporary data of healthy non-smokers from south-western Sweden and to compare them with the GLI reference values.

Material and method
The northern Sweden reference sample
As a part of the Obstructive Lung Disease in Northern Sweden (OLIN) studies, 1,016 randomly selected respondents from a large postal questionnaire survey in 2006 (33) were invited to clinical examinations in 2008–2009. Of them, 737 subjects (72.5%) aged 21–86 years participated in structured interviews and spirometry (34). In 2011–2013, 738 additional healthy non-smokers according to the 2006 questionnaire survey were invited to identical examinations, and 448 subjects (60.6%) aged 25–91 years participated (17). The study flow chart is illustrated in Fig. 1. Information about respiratory diseases and symptoms, other diseases, and smoking history was collected at the interview. The study was approved by the Regional Ethical Review Board at Umeå University, Sweden.
Healthy non-smokers were defined as meeting none of the following criteria: usually wheeze when breathing; sputum production most days in periods of 3 months per year; ever having had asthma; been diagnosed as having asthma, chronic bronchitis, COPD or emphysema by a physician; ever use of asthma medication regularly or when needed; ever use of medication for chronic bronchitis, COPD, or emphysema; ever had ischemic heart disease; wheeze in the past 12 months with concurrent breathlessness; wheeze in the past 12 months without having a cold; wheeze in the past 12 months most days per week; any other disability that could affect the lung capacity; or mMRC dyspnoea scale ≥2. A further exclusion criterion was a cumulative life-long smoking history of >1 pack-year.
Age was calculated to one decimal place as the difference between date of birth and date of examination. Date of birth was collected from the Swedish national registry. Height was measured without shoes with an accurate stadiometer with 0.5 cm precision. Weight was measured with 0.5 kg precision with empty pockets and without jacket and shoes. Caucasian ethnicity was defined as a subject of European ancestry. Two Jaeger Masterscope spirometers (JLAB version 5.21 software, CareFusion, Würzburg, Germany) were used to measure forced expiratory volume in one second (FEV1), forced expiratory vital capacity (FVC), and slow expiratory vital capacity (SVC). The procedure followed the ATS/ERS recommendations (35) but with a reproducibility criterion of ≤5% instead of ≤150 ml deviation from the second highest value. The same highly experienced, trained, and qualified research nurses performed the measurements throughout the study. The spirometers were calibrated every morning. The highest value for FEV1, FVC, and SVC was recorded for each subject, and at least three and at most eight measurements were performed to fulfil the reproducibility criterion. The mean (median) absolute differences between the highest and the second highest values were: 38 ml (27 ml) for women and 59 ml (45 ml) for men in FEV1, and 53 ml (44 ml) for women and 69 ml (54 ml) for men in FVC. In total, 2.8% of the FEV1 measurements and 6.1% of the FVC measurements deviated by >150 ml. Vital capacity (VC) was defined as the highest value of FVC and SVC. In total, 501 subjects, 49% women, were identified as healthy non-smoking subjects of European ancestry with adequate spirometry quality. Further details regarding the measurements of lung function can be found in the Appendix.
Derivation of the OLIN reference values based on the northern Sweden reference sample
Normal Q–Q plots were used to evaluate and confirm normal distribution of the spirometric indices FEV1, FVC, SVC, VC, the FEV1/FVC ratio and the FEV1/VC ratio, and linear regression was used to model each outcome measure. Sex-specific multivariable linear regression models were estimated for each of the spirometric indices, with age and height as independent variables. The relationships with weight were also investigated, but as model improvements were absent or negligible, weight was omitted to keep the models parsimonious. Age-dependent spline functions were incorporated in each model to allow the outcome variable to vary smoothly over the age range. A common assumption of multivariable linear regression is that the SD around the regression function is constant (homogeneous variance). This assumption was extended to include the case when the SD is a linear function of one of the regression variables (an explanatory variable), in this case age. Further details of the statistical modelling are described in the Appendix.
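The extension to an age-dependent SD can be illustrated with a minimal Python sketch of one subject's contribution to a normal log-likelihood in which the SD is a linear function of an explanatory variable. This is not the authors' program: the function names and the numerical values are hypothetical and chosen only for illustration, and the predicted mean is passed in as a number rather than modelled with age splines.

```python
import math


def log_lik_term(y, mean, x, a, b):
    """Log-likelihood contribution of one subject, assuming SD = a + b * x."""
    sd = a + b * x
    if sd <= 0:
        raise ValueError("a + b * x must be positive")
    resid = y - mean
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * resid ** 2 / sd ** 2


def score_a_b(y, mean, x, a, b):
    """Analytic first derivatives of log_lik_term with respect to a and b."""
    sd = a + b * x
    resid_sq = (y - mean) ** 2
    d_a = -1.0 / sd + resid_sq / sd ** 3
    # The derivative with respect to b is x times the derivative with respect to a.
    return d_a, x * d_a


# Illustrative values only (hypothetical, not taken from the study):
# observed value, predicted mean, age in years, and SD parameters a and b.
y_obs, y_pred, age, a_par, b_par = 3.10, 3.40, 60.0, 0.30, 0.002
d_a, d_b = score_a_b(y_obs, y_pred, age, a_par, b_par)
```

Summing `log_lik_term` over all subjects gives the function to be maximised; the analytic derivatives are the ingredients a Newton-type optimiser would use.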
Model accuracy check
Z-scores (standardised residuals) and percentiles were calculated for each subject. If the observed values are perfectly normally distributed with mean and SD equal to the mean and SD of the reference values, then the mean and SD of the percentile point will be 0.50 and 0.2887, respectively. Equivalently, the mean and SD of the Z-score will be 0 and 1, respectively. These facts were used to statistically test the accuracy of the estimations. A p-value ≥0.05 implies good model accuracy. In addition, possible relationships between Z-scores and age, height, weight, and sex were examined by ordinary linear regression models, with significant associations presented as Beta-coefficients. If the reference values are in perfect agreement with the observed values, no such relationship will exist.
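The quantities used in this check can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that the percentile point is the standard normal cumulative probability of the Z-score on a 0 to 1 scale. Under perfect agreement the percentile points are uniformly distributed on (0, 1), which is where the expected SD of 0.2887 (the square root of 1/12) comes from.

```python
import math


def z_score(observed, predicted_mean, predicted_sd):
    """Standardised residual of an observed value against its reference value."""
    return (observed - predicted_mean) / predicted_sd


def percentile_point(z):
    """Standard normal cumulative probability of z, on a 0-1 scale."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# SD of a variable uniformly distributed on (0, 1): sqrt(1/12).
UNIFORM_SD = math.sqrt(1.0 / 12.0)

# Illustrative values only (hypothetical, not taken from the study).
z_example = z_score(observed=3.10, predicted_mean=3.40, predicted_sd=0.42)
p_example = percentile_point(z_example)
```

A Z-score of -1.645 maps to a percentile point of about 0.05, which corresponds to the lower 5th percentile used as the lower limit of normal.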
The south-western Sweden sample
To evaluate the external validity of the OLIN reference values, they were also applied to spirometry data of a general population sample from another region in Sweden. In 2009–2012, clinical examinations were performed on a sample aged 17–78 years from the Swedish region of West Gothia (36). The examinations included dynamic spirometry also using a Jaeger Masterscope spirometer (JLAB version 5.03 software, CareFusion) and a structured interview with the same questions as in the interview for the northern Sweden sample. In total, 2,000 subjects were invited to participate in the clinical examinations, of which 1,172 (59%) participated. The study was approved by the Regional Ethical Review Board at the University of Gothenburg. From the participants, 358 healthy non-smoking subjects of European ancestry ≥22 years of age, 54% women, were identified using the same eligibility criteria for healthy non-smokers as in the northern Sweden reference sample. Percent of predicted, Z-scores (standardised residuals) and percentiles based on the OLIN reference values were calculated for each subject.

Comparison with the GLI reference values
The predicted sex- and age-specific FEV1, FVC and FEV1/FVC ratio (mean OLIN and LLN OLIN) were compared to predictions by the GLI reference values (mean GLI and

Results
Equations for the OLIN reference values for spirometry
As a result of the modelling technique, the SD, reference value (mean), and lower limit of normal (LLN, defined as the lower 5th percentile) for each of the spirometric indices are calculated by the following formulas:
\[ \mathrm{SD} = A + B \cdot \mathrm{age} \]
\[ \mathrm{Mean} = \Bigl( \sum_{J} B_J \cdot X_J \Bigr) \cdot \mathrm{SD} \]
\[ \mathrm{LLN} = \mathrm{Mean} - 1.645 \cdot \mathrm{SD} \]
where J ranges from 1 to 5, age is expressed in years, and the coefficients A, B, B1–B5 and the variables X1–X5 are found in Table 1. Calculation examples are illustrated in the Appendix.
The mean ages for the 244 women and 257 men in the reference sample from northern Sweden were 49.2 (range 22–91) and 46.6 (range 22–86) years, respectively. The mean height was 163.3 cm (range 139.0–181.0) for women and 178.9 cm (range 162.5–198.0) for men, and the mean weight was 68.2 kg (range 45.0–118.0) for women and 84.8 kg (range 56.0–148.0) for men. These are the ranges of age, height, and weight in which the equations for the OLIN reference values are valid.
Model accuracy check based on the northern Sweden reference sample
Mean observed and predicted values, observed values as percent of predicted values, Z-scores and percentiles for the reference sample (n=501) are shown in Table 2, stratified by sex and three age groups. The mean observed values and predicted values were close to identical, which yielded mean percent of predicted close to 100%. Correspondingly, the mean Z-scores were approximately 0 with SDs close to 1, and the mean percentile points were approximately 0.5 with SDs close to 0.288. Statistical testing revealed that the models for all spirometric indices were concordant with the observed values. These results were equivalent across all age groups and for both sexes. No significant associations between the Z-scores and sex, age, height or weight were found.
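The three equations for the SD, the mean and the LLN (SD = A + B*age; mean = the sum of B(J)*X(J) multiplied by the SD; LLN = mean minus 1.645 times the SD) can be sketched in Python as follows. The function names are hypothetical, and the coefficient and variable values are arbitrary placeholders with no physiological meaning: the real coefficients A, B, B1–B5 and the definitions of X1–X5 are given in Table 1, which is not reproduced here.

```python
def olin_sd(a, b, age):
    """SD as a linear function of age in years: A + B * age."""
    return a + b * age


def olin_mean(coefs, xs, sd):
    """Reference value (mean): sum of B(J) * X(J), multiplied by the SD."""
    if len(coefs) != len(xs):
        raise ValueError("coefs and xs must have the same length")
    return sum(b_j * x_j for b_j, x_j in zip(coefs, xs)) * sd


def olin_lln(mean, sd):
    """Lower limit of normal: the lower 5th percentile of a normal distribution."""
    return mean - 1.645 * sd


# Placeholder inputs only (NOT the Table 1 coefficients or variables).
placeholder_coefs = [2.0, 0.05, 0.0, 0.0, 0.0]  # stands in for B1..B5
placeholder_xs = [1.0, 50.0, 0.0, 0.0, 0.0]     # stands in for X1..X5
sd_example = olin_sd(a=0.3, b=0.002, age=50.0)
mean_example = olin_mean(placeholder_coefs, placeholder_xs, sd_example)
lln_example = olin_lln(mean_example, sd_example)
```

Any real use would need to take the sex-specific coefficients and variable definitions from Table 1 and stay within the stated ranges of age, height, and weight.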

Applying the OLIN reference values to the south-western Sweden sample
The OLIN reference values were applied to the south-western Sweden sample, and percent of predicted, Z-scores and percentiles were calculated for each subject. The mean percent of predicted was close to 100%, and the mean absolute values of the Z-scores were approximately ≤0.4, corresponding to approximately ≤3% of predicted, for all spirometric indices in both age groups and for both sexes in the south-western Sweden sample (n=358). However, statistical testing revealed that the model fit was not perfect for some of the spirometric indices, especially among women (Table 3). Also, small but statistically significant associations between the Z-scores and sex were found for FEV1, SVC, and VC, such that women on average had higher Z-scores than men when corrected for age and height (B-coefficients between 0.331 and 0.453). When also corrected for weight, the association between the Z-scores for FVC and FEV1/FVC reached statistical significance for female sex (B-coefficients of 0.330 and 0.177, respectively). No other statistically significant associations between the Z-scores and age, height, or weight were found when weight was included, except for the Z-score for FVC, where height measured in cm yielded a B-coefficient of 0.018 with p-value 0.049.

Comparison with GLI
The predicted mean GLI for FEV1 in the northern Sweden reference sample was on average 67 ml lower (p<0.001) than the predicted mean OLIN for women and 103 ml lower (p<0.001) than the predicted mean OLIN for men. For FVC, the predicted mean GLI was on average 203 ml lower for women (p<0.001) and 190 ml lower for men (p<0.001). The average predicted mean GLI for the FEV1/FVC ratio was 0.028 units higher (p<0.001) for women and 0.011 units higher for men (p<0.001).
To make further comparisons to the GLI reference values, the predicted reference values of FEV1, FVC and the FEV1/FVC ratio (mean and LLN) for a woman and man of average height are plotted by age in Fig. 2a–f. The comparisons showed that for a man of average height, mean GLI for both FEV1 and, in particular, for FVC, were lower than mean OLIN across the ages 22–86 years. For a woman of average height, mean OLIN and mean GLI for FEV1 were similar, but mean GLI FVC was consistently lower than mean OLIN FVC. Consequently, the mean GLI FEV1/FVC ratio was consistently higher than the mean OLIN for the woman. For a woman of average height, the LLN GLI and LLN OLIN for the FEV1/FVC ratio were in good concordance throughout the age span, but less so for FEV1 and FVC, where LLN GLI was consistently lower than LLN OLIN. The pattern was somewhat different for the average height man, where the gap between LLN GLI and LLN OLIN increased with increasing age for all three spirometric indices. LLN GLI for the FEV1/FVC ratio reached below 0.7 at the age between 46 and 47 years for a woman of average height and at the age

Discussion
The modelling approach in this study produced unbiased estimates of mean observed values in the reference sample from northern Sweden. The approach to model lung function by linear regression is a powerful technique, which we successfully extended to let the SD depend on age. In general, the GLI reference values for FEV1 and FVC were lower than the corresponding values in our sample, but higher for the FEV1/FVC ratio. In addition, the SD for the GLI reference values was larger for all spirometric indices, which can impact the LLN. Furthermore, the results indicate a high external validity for Swedish conditions, when evaluated on the sample from south-western Sweden. Historically, multivariable linear regression has commonly been used to model spirometric outcomes since it is robust and straightforward. However, linear regression assumes that the outcome variable is normally distributed, an assumption not always true for all spirometric indices (3,10). In addition, traditional multivariable linear regression modelling does not incorporate the dependence that exists between the dispersion around the spirometric outcome and age (10,25), that is, one of the independent or explanatory variables. In this study, however, we found all spirometric outcomes to be reasonably normally distributed. A new approach to model spirometric outcomes by linear regression is presented, where the sex-specific spirometric outcomes are predicted by age and height, and where the SD is also allowed to depend linearly on age. Because of the strengths of linear regression modelling mentioned above, this approach presents a useful technique which can be used to model lung function in an efficient way.
It is essential that reference values are evaluated and updated frequently in order to confirm that the classifications of normal and abnormal reflect the realities of the contemporary general population (22,24,25). It has recently been shown that the GLI reference values may not be appropriate for all countries (16–20). The comparison between these new Swedish reference values, that is, the OLIN reference values, and the GLI reference values shows that, on average, both the predicted mean values and LLN for FEV1, FVC and FEV1/FVC for a woman and man of average height differ across the entire life span, with differences generally more pronounced in older ages. Despite recent debate about its appropriateness (37), classification of airway and lung disease severity often relies on FEV1 and FVC expressed as percent of the predicted value (38). Also, since the LLN for the FEV1/FVC ratio is used as a key element for the diagnosis of airway obstruction, the use of GLI may lead to invalid diagnosis of airway obstruction along with erroneous classification of airway and lung disease severity in the Swedish population (17).
A prominent advantage of the GLI reference values is that they cover an extensive age span and different ethnicities, and thus provide estimates without junction points. The GLI reference values for Caucasians are based on collated data from >57,000 subjects, and estimations based on such an extensive amount of data are likely to be valid. Undoubtedly, the GLI is an extremely valuable contribution to the field of lung function reference values, and provides the opportunity to compare a subject or a population to the world-wide average within each ethnicity, which is of substantial value for, for example, international multi-centre studies. Quanjer et al. (39) have previously shown that the GLI SDs are expected to be larger compared to SDs derived from smaller populations due to the large number of data subsets included in the estimation of GLI, an expectation which is confirmed in our sample (17). Quanjer et al. argue that although the LLN according to GLI will identify a somewhat larger proportion of subjects as below LLN in smaller populations compared to in larger populations, this will not lead to any bias since the smaller populations merely are subsets of the population at large. However, we argue that since there are considerable differences in, for example, environmental pollution, occupational exposures and socioeconomic factors which can affect lung function between regions (21), bias can indeed be introduced and observed differences should not be neglected. In addition, although the GLI reference values currently are recommended by many respiratory societies, the consensus is still that, when available, appropriate and applicable local population-specific reference values may be preferred (22,23,25). This consensus is confirmed in our study.
An advantage of our study is that, along with FEV1 and FVC, reference values are provided also for SVC and for the highest value of the forced and slow VC, that is, the VC. Although the FVC is a recognised proxy for the VC, additional information about lung function can be obtained by measuring the SVC, especially in subjects where FVC is reduced by air-trapping due to small-airways obstruction, for example, at early stages of COPD. For example, it is common practice that the FEV1/VC ratio is relied on when diagnosing obstruction, and the GOLD criteria for the diagnosis of COPD recognise this alternative to the FEV1/FVC ratio (38), but hitherto few recently published studies provide reference values (and LLN) also for SVC and VC. A weakness of our study is that the age range does not cover children or adults younger than 22 years, that sitting height was not recorded although it has been argued to be a better predictor than standing height (20,26), and that the geographical coverage was limited to northern Sweden. The small but still observable differences in lung function between the samples from northern and south-western Sweden could possibly be due to minor technical or procedural differences, which are not uncommon (39), while substantial biological differences are less likely.
Two spirometers, instead of only one, were used for the northern Sweden sample. However, they were of identical brand and model, which limits the possible bias. The spirometers were calibrated on a daily basis, but the syringes were not sent for yearly calibration according to the ERS/ATS recommendations and there is a lack of calibration traceability. Further, future studies would most likely benefit from European Spirometry Driving Licenses for the staff and deep insight into previous study protocols such as the NHANES (40,41). However, all measurements in this study were performed by the same three well-trained research nurses with extensive experience. Also, seasonal variation effects on lung function are avoided due to the random sampling over several years, regardless of season.
The random sample size of 501 subjects is greater than the sample sizes in commonly used Swedish reference values (1,4,5) and suffices to produce valid estimates, results in line with previous studies (39). It is possible that our eligibility criteria are too strict and consequently exclude too many subjects, particularly among the elderly. However, obstructive lung diseases, smoking history, breathlessness, cough and wheeze have been shown to be feasible exclusion criteria to define healthy non-smokers in reference populations for spirometry (25,42). Since the same structured interview questions were used in northern and south-western Sweden, these criteria could be applied identically for both samples. The reference sample from northern Sweden constitutes a representative sample of the general healthy non-smoking population of the area. Thus, bias introduced by including smokers (1,4,5) or by using selected samples such as workers from specific industries (13) or respiratory healthy patients from certain hospitals is avoided. The evaluation of the reference values in the south-western Sweden sample yielded mean Z-scores close to 0 and Z-score SDs close to 1, with corresponding mean percent of predicted close to 100%, for all six different spirometric parameters. Although statistical differences were observed for some of the spirometric parameters, the evaluations reveal a high external validity.
To summarise, the evaluation of the OLIN reference values based on the reference sample of healthy non-smokers from northern Sweden shows that the models are valid. Further, the evaluation based on the healthy non-smokers from south-western Sweden indicates a high external validity of the model. The comparison with GLI brought further evidence to the consensus that, when available, appropriate local population-specific reference values may be preferred.

Details of the statistical modelling
For each individual we consider a vector $\mathbf{x}$ of independent variables, and $x$ is a component of the vector. The SD $\sigma(x)$ is assumed to be
\[ \sigma(x) = a + b \cdot x \]
where $a$ and $b$ are parameters to be estimated together with a set of parameters describing the mean $m$ of the dependent variable as a function of the components of the vector $\mathbf{x}$. The log-likelihood function is a sum of terms
\[ -0.5 \cdot \log(2\pi) - \log(\sigma(x)) - 0.5 \cdot (y - m(\mathbf{x}))^2 / \sigma^2(x) \tag{1} \]
The mean $m(\mathbf{x})$ does not need to be linear as a function of the components of $\mathbf{x}$, but the mean has to be linear as a function of the parameters. An efficient way of presenting (writing) the derivatives of the log-likelihood function simplifies programming and procedure checking. The derivative of (1) with respect to $a$, here denoted FA1, is
\[ \mathrm{FA1} = -1/(a + b \cdot x) - 0.5 \cdot (y - m(\mathbf{x}))^2 \cdot (-2)/(a + b \cdot x)^3 \]
The second derivative $d^2/da^2$, denoted FA2, which we need for the Taylor expansion, is
\[ \mathrm{FA2} = 1/(a + b \cdot x)^2 - 3 \cdot (y - m(\mathbf{x}))^2/(a + b \cdot x)^4 \]
and $d^2/da\,db$, which we denote FAB, is
\[ \mathrm{FAB} = x/(a + b \cdot x)^2 - 3 \cdot x \cdot (y - m(\mathbf{x}))^2/(a + b \cdot x)^4 \]
The derivative of (1) with respect to $b$, denoted FB1, is
\[ \mathrm{FB1} = -x/(a + b \cdot x) - 0.5 \cdot (y - m(\mathbf{x}))^2 \cdot (-2) \cdot x/(a + b \cdot x)^3 \]
The second derivative $d^2/db^2$, denoted FB2, is
\[ \mathrm{FB2} = x^2/(a + b \cdot x)^2 - 3 \cdot x^2 \cdot (y - m(\mathbf{x}))^2/(a + b \cdot x)^4 \]
The derivative $d^2/db\,da$, denoted FBA, equals $d^2/da\,db$, denoted FAB, which is given above.
Further evaluation of the statistical models
Predicted values based on mean heights for the n=501 subjects in the reference sample are plotted against age in the graphs below in order to further evaluate that the sample size is sufficient and that the models are valid. Height is predicted by sex and age in these graphs. Mean ± 1.645*SD for each of the spirometric indices are presented.
All equipment, including the two calibration syringes used in this study, was newly purchased, serviced and calibrated at study start. The service providers offered support during the study. The syringe was placed on a table during calibration; no 'hands on' was applied. The calibration was performed with both the elbow connection and the filter attached.
The three-flow protocol was applied at study start and at study stop and the results showed no deviance from linearity. The Masterscreen's warnings of insufficient forced expiration were enabled. The Lilly screen was visually inspected and cleaned according to the recommendations on a daily basis and replaced when necessary. A nose clip was applied during spirometry.
The following filter was used throughout the study: CareFusion V-892380 MicroGard IIB, bacteria/viral filter, with both inspiratory and expiratory resistance <0.04 kPa/LPS at 1 L/sec (<0.4 cmH2O/L/sec at 1 L/sec).
The research staff served as biological controls continuously during the data collection to check for inconsistencies in spirometer performance, however not in a structured form. No obvious deviations were observed.

Calculation of OLIN reference values – worked examples
The sex-specific coefficients A, B, B1, B2, B3, B4 and B5 for each of the different spirometric indices are found in Table 1 in the main manuscript.
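The formula for calculation of the predicted mean value is as follows:
\[ \mathrm{Mean} = \Bigl( \sum_{J} B_J \cdot X_J \Bigr) \cdot \mathrm{SD} \]
where J ranges from 1 to 5 and SD is calculated as A + B*age. This can be expressed as follows:
\[ \mathrm{Mean} = (B_1 X_1 + B_2 X_2 + B_3 X_3 + B_4 X_4 + B_5 X_5) \cdot (A + B \cdot \mathrm{age}) \]
The formula for calculation of the lower limit of normal (LLN) is as follows:
\[ \mathrm{LLN} = \mathrm{Mean} - 1.645 \cdot \mathrm{SD} \]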