Response rates and selection problems, with emphasis on mental health variables and DNA sampling, in large population-based, cross-sectional and longitudinal studies of adolescents in Norway

Background Selection bias is a threat to the internal validity of epidemiological studies. In light of the growing number of studies that aim to collect DNA, and the considerable number of invitees who decline to participate, we discuss response rates, predictors of lost to follow-up and failure to provide DNA, and the presence of possible selection bias, based on five samples of adolescents.
Methods We included nearly 7,000 adolescents from two longitudinal studies of 18/19 year olds with two corresponding cross-sectional baseline studies at age 15/16 (10th graders), and one cross-sectional study of 13th graders (18/19 years old). DNA was sampled from the cheek mucosa of 18/19 year olds. Predictors of lost to follow-up and failure to provide DNA were studied by Poisson regression. Selection bias in the follow-up at age 18/19 was estimated through investigation of prevalence ratios (PRs) between selected exposures (physical activity, smoking) and outcome variables (general health, mental distress, externalizing problems) measured at baseline.
Results Out of 5,750 who participated at age 15/16, we lost 42% at follow-up at age 18/19. The percentage of participants who gave their consent to DNA provision was as high as the percentage that consented to a linkage of data with other health registers and surveys, approximately 90%. Significant predictors of lost to follow-up and failure to provide DNA samples in the present genetic epidemiological study were: male gender; non-western ethnicity; postal survey compared with school-based; low educational plans; low education and income of father; low perceived family economy; unmarried parents; poor self-reported health; externalized symptoms and smoking, with some differences in subgroups of ethnicity and gender. The association measures (PRs) were quite similar among participants and all invitees, with some minor discrepancies in subgroups of non-western boys and girls.
Conclusions Lost to follow-up had marginal impact on the estimated prevalence ratios. It is not likely that the invitation to provide DNA influenced the response rates of 18/19 year olds. Non-western ethnicity, male gender and characteristics related to a low social class and general and mental health problems measured at baseline are associated with lost to follow-up and failure to provide DNA.


Background
The general decline in participation rates in epidemiological studies during recent years [1] may introduce errors in estimates of exposure, disease occurrence and association measures, and is a major concern for researchers. Several epidemiological studies have investigated factors associated with non-response among adolescents, but studies on the effect of self-selection on association measures, in both cross-sectional and longitudinal designs, are scarce. It is difficult to find discussions of response rates in genetic epidemiological studies, or of the factors associated with non-response or refusal to agree to DNA sampling, even though the same threats to internal validity operate as in studies with non-genetic information. This is a concern, as the proportion of epidemiological studies that collect biological specimens is increasing over time, as reviewed by Morton et al. [2]. This review further reported that less than one-third of epidemiological studies gave separate participation figures for the biological specimen component of the study. Additionally, we have been able to identify only one study that has investigated a possible selection bias in the association between genotype and outcome [3], and no studies on factors associated with a refusal to agree to DNA sampling.
Studies of pre-adolescents [4] and adults [5,6] concluded that selection has little or no impact on the association measures, although such information is insufficient, especially in adolescents and children.
Compared with postal surveys, school-based research is a comparatively inexpensive method of obtaining large samples of children and adolescents with high response rates. In surveys of adolescents under the age of 16, parental consent is required in several countries, including Norway. In a review, Tiggers [7] reported that requiring active parental consent led to parental permission and response rates in the range of 30%-60%, with samples biased towards the exclusion of minorities, students having problems in school and students engaged in or at risk for problem behaviours. When only passive parental consent is required, parental permission is in the range of 93%-100% [7].
The aims of the present genetic epidemiological study of adolescents aged 15/16 and 18/19 in the multi-cultural city of Oslo and the rural county of Hedmark were to: 1. Describe response rates across genetic and non-genetic epidemiological studies; 2. Identify predictors of lost to three-year follow-up in a genetic epidemiological study of 18/19-year-old adolescents, with a particular emphasis on gender and ethnicity; 3. Identify predictors of failure to provide DNA in a follow-up of 18/19-year-old adolescents; 4. Investigate the magnitude and direction of possible selection bias in a follow-up at age 18/19 through an investigation of association measures (prevalence ratio) between selected exposures and general and mental health outcome variables measured at baseline (aged 15/16).

15/16 year olds in 2001: two cross-sectional studies of 10th graders in Oslo and Hedmark
All students in 10th grade in all 60 primary schools in Oslo (Figure 1, Sample 1) and all 41 primary schools in Hedmark (Figure 1, Sample 2) were invited to enter the youth portion of the Oslo Health Study and the Hedmark Health Study, respectively. The data collection was performed at the end of the school year in 2001. All parents received written information and the students signed a consent form before beginning participation. The students completed two four-page questionnaires during two school hours. A project assistant was present in the classroom to inform the students about the survey and to administer the questionnaires. Questionnaires were left at school to be completed by students not present on the day of the survey. Those who did not respond received a copy by mail to their home address, together with a pre-stamped return envelope. Ten years at school is compulsory in Norway; hence, the study included all 15/16 year olds in the study areas. A more detailed description has been published elsewhere [27].

18/19 year olds in 2004: one cross-sectional study of 13th graders in Oslo and two longitudinal studies in Oslo and Hedmark
A three-year follow-up study of the two cross-sectional samples of 10th graders in Oslo (Figure 1, Sample 4) and Hedmark (Figure 1, Sample 5) was conducted in 2004, partly in school and partly via mail. In both instances, the participants were invited to give consent to and provide DNA through a sampling of cells from cheek mucosa. The data collection was performed before the end of the school year, and the participants were invited to join a lottery with three sums of NOK 15,000 (i.e. USD 2,470/EUR 1,740).
In Oslo, the follow-up was conducted as a school-based study similar to the baseline, inviting all the final year students (13th graders) in all 32 secondary high schools. This was done in order to reach as many of the baseline participants as possible, and it also represents a cross-sectional study of 13th graders in Oslo (Figure 1, Sample 3). In Norway, all young people between the ages of 16 and 19 have a right to three years of upper secondary education and training funded by the government, and the vast majority of them take advantage of this opportunity. The students filled in a four-page questionnaire and provided a cell sample from their cheek mucosa during one school hour. Because a number of students were not present when the study was conducted, schools were visited several times. Those students who were not reached in school were invited by mail and included in the school-based portion of the follow-up study.
Participants from the baseline study who were not enrolled in the final year of secondary high school in Oslo and who had consented to participate in a follow-up were invited by mail (Figure 1). The package included an invitation letter, an information brochure, a consent form, a questionnaire and two cytobrushes, including a container for buccal cell sampling and a pre-stamped return envelope. Two reminders were sent to those who did not respond.
Methods similar to those applied in the postal-based portion in Oslo were applied for the entire three-year follow-up of all participants from the 2001 baseline study in rural Hedmark (Figure 1). The youth studies are described more thoroughly elsewhere [28].

DNA
Two samples of cells from the cheek mucosa were collected from both the left and right side, using two cytobrushes (Medscand Medical AB, Malmö, Sweden) in the studies of 18/19 year olds. In the school-based study, the students were instructed on how to perform the rubbing, which was done simultaneously by all students in the classroom. In the postal-based studies an instruction letter was included, and a plastic tube with a cap and two brushes was returned by mail together with the questionnaire and consent form. The cytobrushes were frozen at -20 °C.
Figure 1 Flow chart of the study populations. a 27 individuals had an unknown address for the postal-based reminder; b 61 individuals had an unknown address and 173 did not consent to be invited to a follow-up; c 27 had an unknown address and 229 did not consent to be invited to a follow-up. An additional 55 who gave their consent, but did not fill out the questionnaire at baseline, were invited, of whom 18 participated. They are defined as non-responders at baseline, and are not included in the present analyses.

Mental health
Externalizing problems
We used the self-report version of the Strengths and Difficulties Questionnaire (SDQ) [29,30], which is a 25-item wide-angle screening questionnaire with five subscales. Each subscale consists of five items, generating scores for emotional symptoms, conduct problems, hyperactivity-inattention, peer problems and prosocial behaviour. Each item can be answered with "not correct" (0), "partly correct" (1) or "completely correct" (2). We combined two of the SDQ subscales, conduct problems and hyperactivity-inattention, into an index of externalizing problems with a cut-off point at the 90th percentile of the study sample [31].
The SDQ self-report is designed and validated for youngsters (11-16 years old), but has also been used as a valuable instrument for older youths [32,33]. To adapt the instrument to 18/19 year olds, some small linguistic changes were made in the follow-up questionnaire in accordance with the approved Norwegian translation.
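The construction of the externalizing index described above can be sketched as follows. This is a minimal illustration, not the authors' scoring code. It assumes that the five items of each subscale have already been coded 0-2 in the problem direction (any reverse-keyed items recoded beforehand), that the 90th percentile is computed with the "inclusive" interpolation method, and that a score at or above the cut-off is flagged. The function names are hypothetical.

```python
from statistics import quantiles


def externalizing_score(conduct_items, hyperactivity_items):
    """Sum two five-item subscales (items coded 0-2), giving a 0-20 score.

    Assumption: reverse-keyed items have already been recoded.
    """
    for items in (conduct_items, hyperactivity_items):
        if len(items) != 5 or any(x not in (0, 1, 2) for x in items):
            raise ValueError("each subscale needs five items coded 0, 1 or 2")
    return sum(conduct_items) + sum(hyperactivity_items)


def percentile90(scores):
    """90th percentile of the sample (inclusive method; an assumption)."""
    # quantiles(..., n=10) returns the nine decile cut points; the last is P90.
    return quantiles(scores, n=10, method="inclusive")[-1]


def flag_externalizing(scores):
    """Flag scores at or above the sample's 90th percentile (assumed rule)."""
    cutoff = percentile90(scores)
    return [score >= cutoff for score in scores]


# Hypothetical example: one respondent's score, and a toy sample of scores.
one_score = externalizing_score([2, 1, 0, 0, 1], [2, 2, 1, 0, 0])
toy_sample = list(range(21))
flags = flag_externalizing(toy_sample)
```

Because the cut-off is taken from the study sample itself, the same raw score can fall on different sides of the threshold in different samples.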

Mental distress
The Hopkins Symptom Checklist (HSCL-10) comprises 10 questions regarding psychological symptoms of depression and anxiety (mental distress) experienced during the previous week [34]. For each question, there are four possible answers ranging from "not troubled" (1) to "heavily troubled" (4), and the average score of the items is used as a measure of mental distress [34]. The 10-item version has approximately the same sensitivity and specificity for detecting psychological symptoms or global distress as the more widely used HSCL-25 [35-37]. The HSCL-25 version has proven to have a satisfactory validity and reliability as a measure of mental distress in adults [38], and the 10-question version performs almost as well as the longer versions, including among subjects aged 16-24 [34]. An average score for all 10 items equal to or above 1.85 was used as the cut-off point for mental distress, corresponding to the 1.75 cut-off of HSCL-25 [34].
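As an illustration of the scoring rule just described, the sketch below averages the 10 item scores and applies the 1.85 cut-off. It is a minimal example under the assumption of complete responses; how missing items were handled is not stated in the text, and the function names are hypothetical.

```python
from statistics import mean

HSCL10_CUTOFF = 1.85  # cut-off for the mean item score, as stated in the text


def hscl10_mean(items):
    """Return the mean of the 10 item scores (each coded 1-4).

    Assumption: complete responses only; missing items are not handled.
    """
    if len(items) != 10:
        raise ValueError("HSCL-10 needs exactly 10 item scores")
    if any(score not in (1, 2, 3, 4) for score in items):
        raise ValueError("each item must be coded 1, 2, 3 or 4")
    return mean(items)


def has_mental_distress(items):
    """True when the mean item score is equal to or above the cut-off."""
    return hscl10_mean(items) >= HSCL10_CUTOFF


# Hypothetical respondents: item sums of 19 and 18, i.e. means of 1.9 and 1.8.
above = has_mental_distress([2] * 9 + [1])
below = has_mental_distress([2] * 8 + [1, 1])
```

With 10 integer-coded items the mean moves in steps of 0.1, so under these assumptions the 1.85 cut-off is equivalent to an item sum of 19 or more.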

General health
Self-evaluated general health status was measured from the question: How would you describe your present state of health? (poor, not very good, good, very good). The categories were operationalized into poor/not very good and good/very good.

Physical activity in leisure time
Physical activity was measured by a question on the amount of weekly hours concerning physical activity outside of school "to an extent that made you sweat and/or out of breath". The answers were rated 0, 1-2, 3-4, 5-7, 8-10 or 11 hours or more per week, and were operationalized in the present study into 0-2 hours per week and more than 2 hours.

Smoking
The question: "Do you smoke or have you smoked earlier?" had four alternatives: never smoking; smoking before, but quit; smoking now and then; and smoking daily. The two middle categories were merged together into one category since the initial results showed that these two groups responded in a similar way.

Ethnic background/Country of origin
Ethnicity was self-reported and determined on the basis of the parents' country of birth. Statistics Norway's definition of ethnic minorities, i.e. those having both parents born in a country other than Norway, was applied [39]: western (one or both parents born in Norway or another western country) vs. non-western (both parents born in a non-western country).

Parental educational level, income and marital status
To obtain information on the parental educational level and income, the questionnaire data was linked to sociodemographic information collected by Statistics Norway for all participants [40,41]. We applied Statistics Norway's register of highest parental education completed as per Oct. 1, 2000.
The educational level was operationalized into three major groups according to the highest attained educational level for the father: university/college, higher secondary and lower secondary education. Income was operationalized into high (above the 75th percentile), medium (25th to 75th percentile) and low (below the 25th percentile) for the father. The family economic status was self-reported as bad, medium, good or very good, based on a question comparing the family economy with other families in Norway.
The marital status of the parents was dichotomized into married (married/living together) versus not married (unmarried/divorced/separated/one or both parents deceased).

Educational plans
The 15/16 year olds recorded the highest future education they had considered, which was operationalized into: university higher (i.e. university or regional college higher degree); other (university or regional college intermediate level; upper secondary school; vocational education at upper secondary school; one year at upper secondary school; other plans) and not decided. Their educational ambition was used as an indicator of social class of destination.

Invitation group
We created a variable "invitation group", dividing participants into three groups based on participation using "postal-based Hedmark", "postal-based Oslo" and "school-based Oslo" as an exposure variable and for adjustment in multivariate analyses.

Statistical methods
Response rates are presented as numbers and percentages for the five studies (Figure 1). For all studies of 18/19 year olds in 2004, the numbers and percentages for agreeing to link data to other health surveys and registers, as well as agreeing to provide DNA, are presented. Additionally, for the two longitudinal studies, response and consent rates are presented by gender and ethnicity. The relative risk (crude and adjusted RR) with a corresponding 95% confidence interval (CI) for lost to follow-up in 2004 is presented for baseline socio-demographic characteristics (in 2001), using Poisson regression analyses. Data from the two study sites (Hedmark and Oslo) are combined, as there were only small differences in results between the two samples, and we have adjusted for "invitation groups". The variables predicting "lost to follow-up" are similar to those predicting "failure to provide DNA". Thus, we do not present results for "failure to provide DNA", but regard analyses of "lost to follow-up" as a proxy.
Crude and adjusted relative risks (RRs), with a 95% CI of lost to follow-up for selected baseline variables, general health, mental health and risk factors, are presented. Adjustments were done for invitation group, gender, ethnicity, family economy, parental marital status, educational plans and father's education and income using Poisson regression.
The possible effects of selection in attendance on the associations were also assessed based on baseline data. The association (prevalence ratio, PR) between selected risk factors (smoking, physical activity) and three outcome variables (general health, mental distress and externalizing symptoms) are presented by gender and ethnicity among both the participants and all invitees. Analyses were conducted separately by gender and ethnicity due to the large differences in the occurrence of mental health problems. The independent variables were chosen since they are well-known risk factors for adverse health effects and one of them (smoking) is associated with non-response, while the other (physical inactivity) is not.
Analyses were performed using SPSS version 14 and STATA version 10.
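To make the association measure concrete, the sketch below computes an unadjusted prevalence ratio with a 95% CI from a 2x2 table. It is an illustration only: the analyses described above used Poisson regression in SPSS and STATA, whereas this example swaps in the simpler log-based (Katz) Wald interval for a crude ratio. The crude point estimate should coincide with that of a Poisson model with a single binary exposure, but the interval may differ. The counts and the function name are hypothetical and are not study data.

```python
import math


def prevalence_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Crude prevalence ratio with a log-based (Katz) Wald confidence interval."""
    p_exposed = cases_exposed / n_exposed
    p_unexposed = cases_unexposed / n_unexposed
    pr = p_exposed / p_unexposed
    # Standard error of log(PR) for two independent binomial proportions.
    se_log = math.sqrt(
        1 / cases_exposed - 1 / n_exposed
        + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper


# Hypothetical counts: 30 of 200 exposed and 45 of 600 unexposed have the outcome.
pr, lower, upper = prevalence_ratio(30, 200, 45, 600)
```

Running the same calculation once on the participants and once on all invitees, within each gender and ethnicity stratum, mirrors the comparison used here to look for selection bias.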

Ethics
The study protocol was evaluated by the Regional Committee for Medical Research Ethics and approved by the Norwegian Data Inspectorate. The studies carried out in the schools received approval from the school authorities.
Information from public registers in Statistics Norway about the father's education and income was linked with data from the questionnaire/studies through the individual's personal identification number. All personal identification was erased before the data were analysed.

Results
The samples constituting the present study include data from two cross-sectional school-based studies with a total of 5,750 (88.9% response) 15/16-year-old 10th graders in Oslo (Sample 1, Figure 1) and Hedmark (Sample 2, Figure 1) obtained in 2001, and one corresponding cross-sectional study of 3,308 (90.4% response) 18/19-year-old 13th graders in Oslo (Sample 3, Figure 1) obtained in 2004 (DNA from 3,095). The two longitudinal studies (Samples 4 and 5, Figure 1) comprise those who participated in 2004 and agreed to linkage to the baseline data. All data obtained in the latter two longitudinal studies are derived from some of the participants in the cross-sectional studies mentioned above (Table 1).

Response rates
In the studies of 10th graders aged 15/16, in which passive parental consent was obtained, the participation rate was similar in rural Hedmark (88.3%) and urban Oslo (89.2%) (Table 1). The response rate was also quite similar in the cross-sectional school-based study of 13th graders aged 18/19 (90.4%), although collection of DNA from each individual was added as part of the survey. The response rate in the pure postal-based study in Hedmark among 18/19 year olds was considerably lower (55.4%) (Table 1). Almost all participants in the school-based survey of 18/19 year olds consented to give their DNA (93.6%), and the rate was also high in the postal-based data collections, at 82% and 88.7% in Oslo and Hedmark, respectively (Table 1). The percentage of participants who consented to give their DNA was about the same as the percentage who agreed to link their data to registers and previous health surveys.

Factors associated with follow-up rates
Lost to follow-up was closely related to failure to provide DNA, and analyses revealed that there were similar predictors operating in the two instances. As a result, the factors associated with lost to follow-up presented in the following are valid for failure to provide DNA (data not shown).
For the purpose of linking data from the two time points, 2,489 (65.3%) out of 3,811 15/16 year olds in Oslo who participated in the baseline study in 2001 participated and consented to the linkage of data in 2004 (Table 2). Thus, the lost to follow-up in Oslo was 34.7%. In Hedmark, 827 (42.7%) of the participants in 2001 also participated and agreed to linkage in 2004, yielding a lost to follow-up of 57.3%. More girls than boys and more participants with a Norwegian/western than non-western background participated in the follow-up study (Table 2). In Oslo, 97.7% of participants in the follow-up who consented to linkage of data also agreed to provide DNA, which was quite similar to Hedmark (96.3%) (Table 2).
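The follow-up percentages above can be checked with a few lines of arithmetic. In this sketch the Hedmark baseline count is not quoted from the text. It is derived, as an assumption, by subtracting the 3,811 Oslo baseline participants from the combined baseline total of 5,750. The helper names are hypothetical.

```python
def followed_pct(followed, baseline):
    """Percentage of baseline participants retained at follow-up (1 decimal)."""
    return round(100 * followed / baseline, 1)


def lost_pct(followed, baseline):
    """Percentage of baseline participants lost to follow-up (1 decimal)."""
    return round(100 - 100 * followed / baseline, 1)


BASELINE_TOTAL = 5750    # Oslo and Hedmark combined, 2001
OSLO_BASELINE = 3811
OSLO_FOLLOWED = 2489
HEDMARK_FOLLOWED = 827
# Assumption: derived by subtraction, not reported directly in the text.
HEDMARK_BASELINE = BASELINE_TOTAL - OSLO_BASELINE

oslo = (followed_pct(OSLO_FOLLOWED, OSLO_BASELINE),
        lost_pct(OSLO_FOLLOWED, OSLO_BASELINE))
hedmark = (followed_pct(HEDMARK_FOLLOWED, HEDMARK_BASELINE),
           lost_pct(HEDMARK_FOLLOWED, HEDMARK_BASELINE))
overall_lost = lost_pct(OSLO_FOLLOWED + HEDMARK_FOLLOWED, BASELINE_TOTAL)
```

Under that assumption the derived figures agree with the reported 65.3%/34.7% for Oslo and 42.7%/57.3% for Hedmark, and the combined loss of 42.3% matches the 42% quoted for the full sample.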
There were no major differences between Oslo and Hedmark in socioeconomic predictors for lost to follow-up (data not shown). Therefore, the relative risks of selected socio-demographic baseline factors for lost to follow-up are presented for Hedmark and Oslo combined (Table 3). Significant predictors were: male gender; non-western ethnicity; postal survey compared to school-based; lower educational plans than university/higher education; low education and income of father; low perceived economy in the family and unmarried as compared to married parents (Table 3). Adjustments for "invitation group" resulted in weaker risk estimates, except for non-western ethnicity, for which the estimate increased from RR = 1.17 (95% CI: 1.08-1.26) to RR = 1.31 (95% CI: 1.21-1.43). Separate analyses by western/non-western ethnicity were conducted. There was no major change in risk estimates for western boys and western girls (results not shown). However, in non-western boys and girls, the father's income or education was not a significant predictor for lost to follow-up, and a poor perceived family economic situation was significantly associated with lost to follow-up in girls only (results not shown).
The relative risk of selected baseline health and health-related factors for lost to follow-up did not differ in separate analyses between Hedmark and Oslo. In the combined data set, poor self-reported health (borderline significant), externalized symptoms and smoking were significant predictors for lost to follow-up (Table 4). When stratifying by ethnic group, none of the selected factors were significant predictors for lost to follow-up in non-western boys and girls (data not shown).
In subgroups of western boys, the predictors were similar to the total sample, with mental distress yielding a significant effect, while in western girls only externalized symptoms and smoking were significant predictors for lost to follow-up (data not shown).

Association measures
In order to investigate a possible distortion in association measures due to lost to follow-up, we examined the prevalence ratios (PRs) between baseline exposure data from 2001 (smoking and physical activity) and selected outcome health variables (self-reported health, mental distress and externalized symptoms) among participants and all invitees in the follow-up in 2004.
Regarding the associations between physical activity and self-reported health, mental distress and externalized symptoms, the prevalence ratios were similar in the groups of participants and all invitees except for self-reported health among non-western girls, in which the PR was 3.0 (1.3-6.7) for participants and 2.0 (1.1-3.6) for all invitees (Table 5). Even so, the number of participants in these groups was low and the confidence intervals were wide. The analyses of association between smoking and self-reported health, mental distress and externalized symptoms gave similar prevalence ratios in the groups of participants and all invitees for both western boys and girls (Table 6). In subgroups of non-western boys and girls, there were differences in PRs and CIs between participants and all invitees. Nevertheless, the number of participants in these groups was low and the confidence intervals were wide (Table 6).

Main findings
We report similarly high response rates in all the school-based surveys, irrespective of the age of the adolescents, the year of the study or whether the survey was carried out in urban Oslo or rural Hedmark. In the follow-up, the response rate was markedly higher in Oslo, where school-based and postal-based data collection were combined, than in the pure postal-based data collection in Hedmark. The rate of participants who gave their consent to DNA provision was as high as the rate of those who consented to linkage of data with other health registers and previous surveys. Significant predictors of lost to follow-up and failure to provide DNA samples were: male gender; non-western ethnicity; postal survey compared with school-based; lower educational plans than university/higher education; low education and income of father; low perceived economy in the family; unmarried as compared with married parents; poor self-reported health; externalized symptoms and smoking, with some differences in subgroups of ethnicity and gender. Regarding the association between selected exposures and outcomes, the main finding was that the association measures (PRs) were quite similar in the groups of participants and all invitees. In subgroups of non-western boys and girls, however, we found some differences, though the pattern was inconsistent, the number in the analysis was small and the confidence intervals of the estimated associations were wide.

Response rates
When conducting epidemiological studies, we aim to select samples in which all groups are represented in the same way as in the general population. Still, almost every study is hampered by a number of invitees who decline to participate. Any sign of selective attendance in which certain exposed groups are grossly under- or overrepresented may distort the conclusions. Epidemiological studies with a low level of participation are particularly vulnerable to self-selection bias threatening the internal validity. In the three present cross-sectional, school-based studies, the response rate was approximately 90%. In addition to the advantage of the classroom setting, several other factors may have contributed to the high response rates. Active parental consent was not needed, which in other studies has reduced the response rate to the level of 19%-60% [7,42]. We used no invasive methods and the data collection took only a little of the participants' time [43,44]: two hours for the 15/16 year olds and one hour for the 18/19 year olds. Face-to-face recruitment instead of a less personal form of contact between the study recruiter and potential participants may also increase the participation rate [45]. Among adults, monetary incentives may increase the response rate, but the effect on differential study participation is mixed [46]. Greater monetary incentives may have a greater impact on participation among minority and low-education individuals than among non-minority individuals with a higher education, though in contrast, potential responders with a high income or education may have a greater demand to be compensated for their time [46]. In the present study of 18/19 year olds, an incentive was given by letting participants join a lottery consisting of three sums of NOK 15,000 (i.e. USD 2,470/EUR 1,740), but it is not known whether incentives may bias studies of adolescents or had any impact on the response rate in this study.
Table 3 footnotes: 1 When substituting "lost to follow-up" with "failure to provide DNA" as dependent variable, the RRs were similar; 2 Adjusted for invitation group; 3 Gender does not sum up to 5750 as shown in Figure 1 and Table 1 due to "missing information on gender" on 21 individuals; 4 Based on self-reported native country of mother and father.

Predictors of lost to follow-up
About 10% did not agree to the linkage to other health surveys or registries, including their own baseline, thereby contributing to "lost to follow-up". In the consent form, the question of agreeing to a linkage to their own baseline was written in the same sentence as the linkage to registers. We are not able to rule out that this mix could be the reason why so many adolescents refused the linkage.
Table 4 footnotes: 1 When substituting "lost to follow-up" with "failure to provide DNA" as dependent variable, the RRs were similar; 2 Adjusted for invitation group, gender, ethnicity, family economy, parental marital status, educational plans, father's education and income. The number included in adjusted analyses is lower than in crude analyses due to missing data for some of the variables. For example, in analyses of self-reported health there are 5657 in crude analyses and 4527 in adjusted analyses. In particular, we lack data on father's education and income, which are obtained from Statistics Norway. In reanalyses of crude RRs, including only the number available in adjusted analyses, the associations were slightly stronger (except for physical activity), but no confidence interval changed from significant to non-significant or vice versa.
Most of the predictors of lost to follow-up found in the present genetic epidemiological study have previously been reported in surveys of adolescents; the studies cited here are those most consistent with our findings [10,21]. In addition, we have found the following predictors of lost to follow-up, which as far as the authors are aware have not previously been reported: postal survey compared with school-based; lower educational plans than university/higher education and low perceived economy in the family. Most studies report that living in an urban area, as compared to a rural one, predicts non-response [10,21], which is in contrast to the findings of the present study, in which we detected no differences between Oslo and Hedmark in the response rate among 15/16-year-old 10th graders. It could be that school-based studies are less sensitive to location than other settings due to oral information about the purpose of the study [45] and a possible team feeling.

Association measures
In the follow-up studies, the response rate was 65% in Oslo when combining the school-based and postal portion and only 43% in Hedmark, which is a concern regarding internal validity. However, in a respiratory health survey in Norway of 15-70-year olds, early responders were compared with late responders after a first and second reminder and telephone follow-up with respect to prevalence estimates and association measures [6]. The response rates increased from 42.7% to 79.9%, but there were only marginal differences in the exposure-disease relationship and prevalence estimates when initial responders were compared with all responders. This is in accordance with the present study of adolescents, in which we found no marked differences in association measures (PRs) among responders and all invitees, when restricted to western girls and boys. In non-western girls and boys, however, there were differences in the prevalence ratios for the associations of smoking and physical activity with the selected outcomes, with no discernible pattern. This could be due in part to the low number of participants in these groups or to information bias, i.e. a linguistic or cultural problem in understanding the meaning of any of the questions from the questionnaire [47,48]. For western boys and girls in the present study, we may also draw support from a Dutch study on pre-adolescents by de Winter et al. [4], which also warns that prevalence estimates of mental health problems may increase with increasing participation rates. They utilized information from community registers, parents, teachers and classmates in order to investigate a possible bias in association measures and prevalence estimates. Responders were compared with late responders and non-responders, demonstrating that extra efforts to increase the response rate from 66% to 76% prevented an underestimation of the prevalence of psychopathology. Nonetheless, even with differences between non-responders and responders on several individual characteristics, no significant differences were found pertaining to associations between these characteristics and psychopathology [4]. In the present study, mental health was associated with lost to follow-up in the western subgroup, and we also detected that after reminders, late responders reported more mental health problems than early responders [49]. For that reason, we may have underreported the occurrence of mental health problems in the follow-up.
Table 5 Percentage with health symptoms and prevalence ratios (PRs) within categories of physical activity. 1 Based on self-reported native country of mother and father.
In a two-year follow-up of 15-18-year-old psychiatric outpatients, it was possible to reach all 101 patients except four, using a comprehensive tracking system [26]. Axis I and II disorders at the two-year follow-up were significantly associated with follow-up contact difficulties, while baseline psychopathology and sociodemographic variables were not. Thus, relying on baseline characteristics of adolescents may underestimate the extent of psychopathology at follow-up. Based on the above-mentioned studies, there might be a higher rate of mental health problems among those lost to follow-up than among participants in our study. Baseline information showed a higher overall prevalence of externalized symptoms in those lost to follow-up than in participants, and overall no difference regarding mental distress. Even so, we cannot rule out an underestimation of psychopathology at follow-up in the present study.
Table caption: Percentage with health symptoms and prevalence ratios (PRs) among smokers and non-smokers. Footnote 1: Based on self-reported native country of mother and father.

According to Hartge [50], poor response rates may be of little concern if the willingness to participate is essentially unrelated to exposure. Even if willingness differs with exposure, bias will still not result unless the tendency is stronger (or weaker) in different levels of outcome (i.e. in individuals with disease vs. no disease). According to Kleinbaum et al. [51], even if willingness to participate is unrelated to exposure overall, it may be more strongly (or weakly) associated with baseline exposure at different levels of outcome, in which case selection bias may occur. In the present study, we have investigated whether there are differences in association measures (PRs) between baseline exposures (smoking, physical activity) and baseline health outcomes (self-reported health, mental distress, externalized symptoms) among participants and all invitees. The use of baseline outcome variables must be regarded as a proxy evaluation of selection bias in associations between baseline exposures and outcomes at follow-up. In our study, willingness to participate was associated with one of the two exposure variables, namely baseline smoking, but not with baseline physical activity (Table 4). For the total material and in Norwegian/western participants, we detected no selection bias in the association measures (PRs) when utilizing baseline smoking and physical activity as the exposures and selected baseline outcomes (mental distress, externalized symptoms, self-reported health), thereby indicating that the associations between willingness to participate and smoking (and physical activity) are similar by level of outcome [51]. However, in subgroups of non-western youths (especially in boys), the association between willingness to participate and exposure (particularly smoking) differed by level of outcome (mental distress, externalized symptoms, self-reported health).
Thus, in accordance with Kleinbaum et al. [51], the estimated PRs in these subgroups are biased due to selection, as indicated by the differing PR estimates between participating and all invited non-western boys and girls (Table 6). However, the number of participants in these groups was low and the confidence intervals were wide. We cannot conclude that the associations between baseline exposures (smoking or physical activity) and selected outcomes measured at follow-up (self-reported health, mental distress, externalized symptoms) are free from selection bias. Nevertheless, judging from the analyses of the present baseline outcome data and reports from previous studies [4,6], there is probably no major selection bias. In the subgroups of non-western immigrants, it is more likely that the association measure is biased.
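The condition described by Hartge [50] and Kleinbaum et al. [51] can be illustrated with a small numerical sketch. The counts and response probabilities below are entirely hypothetical and are not taken from the present study; the function names are likewise illustrative. The sketch compares the PR among all invitees with the PR among participants under two response patterns, and uses a standard Wald interval on the log scale to show how small numbers widen the confidence interval.

```python
import math

def prevalence_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Prevalence ratio (exposed vs. unexposed) with a 95% Wald interval on the log scale."""
    pr = (exp_cases / exp_total) / (unexp_cases / unexp_total)
    se_log = math.sqrt(1 / exp_cases - 1 / exp_total
                       + 1 / unexp_cases - 1 / unexp_total)
    lower = math.exp(math.log(pr) - 1.96 * se_log)
    upper = math.exp(math.log(pr) + 1.96 * se_log)
    return pr, lower, upper

def pr_among_participants(invitees, response):
    """Thin each exposure-by-outcome cell by its response probability, then compute the PR."""
    n = {cell: invitees[cell] * response[cell] for cell in invitees}
    return prevalence_ratio(
        n["smoker", "symptoms"], n["smoker", "symptoms"] + n["smoker", "none"],
        n["nonsmoker", "symptoms"], n["nonsmoker", "symptoms"] + n["nonsmoker", "none"],
    )

# Hypothetical invitee counts by exposure (smoking) and baseline outcome (symptoms).
invitees = {
    ("smoker", "symptoms"): 120, ("smoker", "none"): 280,
    ("nonsmoker", "symptoms"): 150, ("nonsmoker", "none"): 850,
}

# Pattern A: response depends on exposure only (smokers respond less often).
response_a = {
    ("smoker", "symptoms"): 0.5, ("smoker", "none"): 0.5,
    ("nonsmoker", "symptoms"): 0.7, ("nonsmoker", "none"): 0.7,
}

# Pattern B: among smokers, response also depends on the outcome.
response_b = {
    ("smoker", "symptoms"): 0.3, ("smoker", "none"): 0.6,
    ("nonsmoker", "symptoms"): 0.7, ("nonsmoker", "none"): 0.7,
}

pr_all = prevalence_ratio(120, 400, 150, 1000)
pr_a = pr_among_participants(invitees, response_a)
pr_b = pr_among_participants(invitees, response_b)
# Same proportions as all invitees, but one tenth of the sample size.
pr_small = prevalence_ratio(12, 40, 15, 100)

for label, (pr, lower, upper) in [("all invitees", pr_all), ("pattern A", pr_a),
                                  ("pattern B", pr_b), ("small subgroup", pr_small)]:
    print(f"{label}: PR = {pr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With these assumed numbers, the PR among participants equals the PR among all invitees (2.0) under pattern A, even though smokers respond less often, whereas under pattern B it falls to about 1.18. The small subgroup has the same point estimate as all invitees but a markedly wider interval, which mirrors the caution needed when interpreting the non-western subgroups.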

DNA
In the present study, it is not possible to directly assess whether the task of providing DNA has affected the rate of lost to follow-up among 18/19 year olds. However, in the school-based study of 13th graders in which DNA was provided, the response rate was high at 90%, which is similar to the school-based surveys of 10th graders in which DNA sampling was not included. Because of this, it is unlikely that a particular fear of providing DNA played a role in the response rate for the school-based portion of the study. In a qualitative study in the UK [52] of 23-67-year-old participants from an epidemiological health study which collected DNA, it was reported that most of the panel had a positive attitude to medical research and that genetic research in particular was seen as being especially rich in the potential for medical advancement. Other reasons for participating in this genetic health study were: a desire to do good; the possibility of a health gain in the form of a health check; confidence in the research process and its governance and a perception of low risk [52]. The study revealed that the participants had these positive attitudes although most of them misunderstood the aim of the genetic epidemiological study, which was explained in information leaflets. It is not known which factors were in operation for adolescents, and this should be further explored.
To the best of our knowledge, the present study is the first of its kind to investigate predictors of failure to provide DNA. We hypothesize from the present data that there are similar personal reasons behind a willingness to provide DNA and a willingness to agree to linkage of data to registers and health surveys, which should also be further explored. The proposed hypothesis is based on detection of a similarly high response rate in the school-based studies that did or did not collect DNA, with a consent rate as high for providing DNA as for linking data to registers and health surveys.

Conclusions
Studies on the effect of self-selection on association measures in epidemiological studies on adolescents are scarce. For this reason, more studies primarily designed to address this topic are needed. Carefully designed studies on self-selection problems may, however, not be a universal answer or truth for all other studies on adolescents. Consequently, epidemiological studies should be carefully planned to allow a judgement of strength and direction of potential errors due to selection. In the present study, associations between selected exposures and health variables measured by prevalence ratios differed somewhat between the participants and all the invitees for non-western boys and girls, although not for western boys and girls. Further studies that aim at validating instruments and questions in a multicultural setting are recommended, as the difference between ethnic groups could be due to linguistic and cultural differences. In general, however, we conclude that the estimated prevalence ratios in the present study were only marginally influenced by lost to follow-up.
As opposed to most studies on adolescents, we did not find a lower response rate in an urban as compared with a rural area: response rates were similar among 15/16-year-old adolescents in the school-based studies in urban Oslo and rural Hedmark.
As expected, the response rate was considerably higher in school-based surveys than in postal-based surveys. It is not likely that the invitation to provide DNA has influenced the response rates of 18/19 year olds, especially in the school-based survey. We also conclude that the willingness to provide DNA is slightly lower in a postal-based study as compared with a school-based study and that there were similar proportions of participants who consented to provide DNA and who agreed to the linkage of data to other health surveys and health registers. Non-western ethnicity, male gender and characteristics related to low social class and general and mental health problems measured at baseline were associated with lost to follow-up and failure to provide DNA in the present genetic epidemiological study. The predictors were similar to those of non-genetic epidemiological studies of adolescents.
This study is based on both urban and rural samples, and results regarding lost to follow-up, failure to provide DNA and self-selection on association measures were similar across the various samples. The findings may therefore be generalizable to adolescents living in similar urban and rural areas. However, care should be taken in generalizing findings from the group of non-western ethnicity due to low numbers and corresponding wide confidence intervals of estimates.