Screening Foster Children for Mental Disorders: Properties of the Strengths and Difficulties Questionnaire

Background The high prevalence of mental disorders among foster children highlights the need to examine the mental health of children placed out of home. We examined the properties of the Strengths and Difficulties Questionnaire (SDQ) in screening school-aged foster children for mental disorders. Methods Foster parents and teachers of 279 foster children completed the SDQ and the diagnostic interview Developmental and Well-Being Assessment (DAWBA). Using the diagnoses derived from the DAWBA as the standard, we examined the performance of the SDQ scales as dimensional measures of mental health problems using receiver operating characteristic (ROC) analyses. Recommended cut-off scores were derived from ROC coordinates. The SDQ predictive algorithms were also examined. Results ROC analyses supported the screening properties of the SDQ Total difficulties and Impact scores (AUC = 0.80–0.83). Logistic regression analyses showed that the prevalence of mental disorders increased linearly with higher SDQ Total difficulties scores (χ² = 121.47, df = 13, p < .001) and Impact scores (χ² = 69.93, df = 6, p < .001). Our results indicated that there is an additive value of combining the scores from the Total difficulties and Impact scales, where scores above cut-off on either of the two scales predicted disorders with high sensitivity (89.1%), but moderate specificity (62.1%). Scores above cut-off on both scales yielded somewhat lower sensitivity (73.4%), but higher specificity (81.1%). The SDQ multi-informant algorithm showed low discriminative ability for the main diagnostic categories, with an exception being the SDQ Conduct subscale, which accurately predicted the absence of behavioural disorders (LHR− = 0.00). Conclusions The results support the use of the SDQ Total difficulties and Impact scales when screening foster children for mental health problems. Cut-off values for both scales are suggested. The SDQ multi-informant algorithms are not recommended for mental health screening of foster children in Norway.


Introduction
The high prevalence and comorbidity of mental disorders in foster children [1][2][3] highlight the need to examine the mental health of children entering foster homes. However, child welfare services often have limited competence and resources for conducting in-depth assessments of mental health. Therefore, shorter screening tools may be useful as a first step in identifying children in need of further specialised assessments. We examined the screening properties of the Strengths and Difficulties Questionnaire (SDQ) [4] with a sample of school-aged foster children in Norway.
The SDQ is a brief mental health questionnaire measuring symptoms and impairments in the child's daily life. Both a Total difficulties scale and an Impact scale may be considered dimensional measures of mental health [5]. Used this way, the SDQ Total difficulties score has shown good predictive ability in community samples in Britain (n = 18,415, of whom 983 had a mental disorder) [5], Sweden (n = 478, of whom 221 were clinical cases) [6], and the US (n = 10,367, where 9% were high scorers) [7], and in British looked-after children (n = 1,391, of whom 38.6% had a mental disorder) [8]. The Impact score has also been found to be a strong predictor of mental disorders in community samples (n = 4,479, where 7% had a mental disorder) [9] and of service use in child welfare samples (n = 292, of whom 29% had contact with mental health care) [10], and to discriminate well between a community sample (n = 467) and a clinical sample (n = 232) [11].
By combining the SDQ Symptom scores and the Impact score from different informants, multi-informant algorithms have been developed to estimate the probability that a child has a mental disorder [12]. In Britain, these algorithms have demonstrated acceptable levels of accuracy when predicting the type of disorder in a clinical sample (n = 101, of whom 74% had a mental disorder) [12], and in a sample of looked-after children with mental disorders (n = 539) [13]. In a community sample, these algorithms adequately discriminated between children with (n = 698) and without (n = 2,286) mental disorders, but were not suitable to discriminate between specific types of disorders [14]. In Norway, the algorithms have shown high sensitivity and specificity when screening children with chronic physical illness (n = 559, 11% high scorers) for Any mental disorder and disorder subtype [15]. However, this finding has not been confirmed in youth who have been referred to community mental health services (n = 286, of whom 66% had a mental disorder) in Norway [16].
R. Goodman, Renfrew, et al. [12] state that (the SDQ) "algorithms are… likely to work best in the sample on which they are developed" (p. 130); therefore, it is important to study the SDQ predictive algorithms in the settings in which they are to be used [17]. According to Goodman and Scott [18], the rather narrow range of problems measured by the SDQ limits its suitability in samples with broad psychopathology and high comorbidity. However, the SDQ is currently implemented as part of the annual follow-up of looked-after children in Britain [8]. Given that populations and child welfare systems differ substantially across societies [19], there is a need to examine the screening properties of the SDQ with foster children outside of Britain.
The present study examined the screening properties of the SDQ for categories of mental disorders in school-aged foster children in Norway. The following research questions were addressed: How well do the Total difficulties scale and the Impact scale discriminate between foster children with and without mental disorders? Can optimal cut-off values for use of the SDQ with foster children be recommended? Do the SDQ scales have equal validity across the full continuum of severity? Previous studies have demonstrated good predictive values for both the Total difficulties scale and the Impact scale, yet these scales have always been analysed separately. Will a combination of scores from the Total difficulties scale and the Impact scale yield additional predictive value? How accurate are the UK-based multi-informant algorithms for predicting mental disorders in foster children in Norway?

Measures
The SDQ is a 25-item mental health questionnaire for 3- to 16-year-olds that may be completed by parents and teachers, and as a self-report beginning at the age of 11 years [20]. The SDQ, originally developed in English, is currently available for downloading in 75 authorized translations from its official website run by Youthinmind (http://www.sdqinfo.org/). The SDQ consists of a prosocial subscale, a peer problems subscale and three symptom subscales, measuring Emotional symptoms, Conduct problems and Hyperactivity-Inattention symptoms. Each subscale consists of five items that are rated on a three-point scale (0-1-2), giving a subscale score range of 0-10. A Total difficulties score is computed by summing the three symptom subscales and the peer problems subscale, giving a total score ranging from 0-40. The two-page version of the SDQ also includes an Impact scale, measuring distress to the child and the interference of symptoms and problems in the child's daily life [11]. The parent version of the Impact scale consists of 5 items, providing a total score range of 0-10, whereas the teacher version consists of 3 items, providing a total score range of 0-6. In a recent review of 18 studies concerning the psychometric properties of the SDQ [21], the SDQ was found to have satisfactory internal consistency, test-retest reliability and inter-rater agreement. The current five-factor structure was supported by 15 of the 18 reviewed studies, two of these 15 studies presenting data from Norwegian community samples.
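The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the dictionary layout, the subscale key names and the example scores are assumptions made for this sketch, and item-level scoring is not covered.

```python
from typing import Mapping

# Key names are an assumption of this sketch, not part of any official
# SDQ scoring file. The prosocial subscale is not part of the total.
DIFFICULTY_SUBSCALES = ("emotional", "conduct", "hyperactivity", "peer")


def total_difficulties(subscales: Mapping[str, int]) -> int:
    """Sum the three symptom subscales and the peer problems subscale (0-40)."""
    total = 0
    for name in DIFFICULTY_SUBSCALES:
        score = subscales[name]
        if not 0 <= score <= 10:
            raise ValueError(f"{name} subscale must be in 0-10, got {score}")
        total += score
    return total


# Made-up subscale scores for one hypothetical child.
example = {"emotional": 4, "conduct": 3, "hyperactivity": 6, "peer": 2, "prosocial": 7}
print(total_difficulties(example))  # 15
```

The prosocial score is carried in the input but deliberately left out of the sum, mirroring the definition of the Total difficulties score.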
The multi-informant algorithms combine scores from the three SDQ symptom subscales and the Impact scale when these scales have been completed by at least two types of informants [12]. The algorithms estimate the following probabilities for the presence of a disorder: Unlikely, Possible and Probable. Independent estimates are provided for Emotional, Behavioural and Hyperactivity-Inattention disorders, and an overall estimate is provided for Any mental disorder.
The DAWBA [22] is a structured interview for the diagnostic assessment of mental disorders that may be rated according to the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) [23], or the International Classification of Diseases (ICD-10) [24]. The DAWBA may be completed by parents or caregivers, and children can complete it themselves beginning at the age of 11. There is also a shorter teacher version. Trained clinicians rate the interviews after reviewing all of the information from the informants, which is presented through a separate scoring program. The DAWBA adequately discriminates between children from community and clinical settings [22] and generates realistic prevalence estimates for mental disorders when used in public health services [25,26]. The SDQ has been validated against the DAWBA in a number of studies [5,8,9,13–17].

Procedure
The data collection started on September 1st, 2011, and lasted until the end of February 2012. In this prospective study, eligible participants were foster children between the ages of 6 and 12 years who had lived for at least 5 months in foster homes in the 63 municipalities encompassed by the Southern Regional Office for Children, Youth and Family Affairs (BUFETAT), following legally mandated placement. According to the central register of BUFETAT, a total of 391 children were eligible in the 63 municipalities. Information letters were sent to the head of each municipal child welfare office. The office heads were asked to review the list of foster children from the central register and to add any potentially eligible children who were missing from it. This search process identified 28 additional eligible children. Twenty children who had been returned to biological families or who had been adopted were removed from the list. Another three children were deemed ineligible because of serious neurological disabilities. The final number of eligible children was therefore 396 (391 + 28 − 20 − 3). The municipal child welfare offices were asked to provide contact information for schools and teachers of these children.
Foster parents received a postal letter with detailed information about the study, and instructions on how to complete the SDQ and DAWBA interview online. They were also asked to return contact information for the children's school and teacher. In total, contact information was obtained for 307 teachers, who were then contacted by postal mail and asked to complete the SDQ and DAWBA interview online. The data collection is illustrated in Figure 1.
The first and second authors, both specialists in child mental health, rated the DAWBA according to the DSM-IV criteria [23] and were blind to the SDQ scores. All available DAWBA information from both foster parents and teachers was used in the diagnostic assessment. For the present analyses, mental disorders were grouped into the following categories: Any mental disorder (includes all diagnoses), Emotional (i.e., Depression and Anxiety), Behavioural (i.e., Conduct and Oppositional Defiant disorders) and Attention-Deficit/Hyperactivity Disorder (ADHD). Further details regarding diagnostic ratings are reported in Lehmann et al. [3].

Ethics
The Regional Committee for Medical and Health Research Ethics for West Norway approved this study. In accordance with Norwegian ethics requirements, assent was obtained from children who were at least 12 years old. According to Norwegian legislation, foster parents do not have the mandate to consent on behalf of their foster children. The study was therefore reviewed by the Ministry of Children, Equality and Integration, which granted caseworkers, foster parents and teachers exemption from confidentiality for the current study. The study is reported in compliance with the STARD guidelines [27].

Study Sample
The study sample, hereafter referred to as the "All data" sample, comprised 279 of 396 eligible children (70.5%), such that at least one informant, i.e. a foster parent or teacher, had completed the SDQ and DAWBA.
For the multi-informant algorithms, we used data from a subset of children who had their SDQs completed by caregivers and teachers (n = 141), hereafter referred to as the "Two informants" sample.

Statistical Analysis
We used SPSS version 19 for Windows for data analyses, with the exception of the receiver operating characteristic (ROC) analyses, which were conducted using STATA 12.
The Total Difficulties and Impact scales. We conducted ROC analyses on the Total difficulties scale, the three symptom subscales and the Impact scale. Area under the receiver operating characteristics (AUROC) values were estimated for the scores reported by caregivers (n = 223) and by teachers (n = 195) separately.
The association between the SDQ scale scores and Any mental disorder was analysed by two separate logistic regression analyses using different definitions of the scales. In the first analysis, we estimated the relative increase in the prevalence of Any mental disorder with increasing scores on the Total difficulties and Impact scales. As in a previous study of the SDQ as a dimensional measure [5], the scores from both SDQ scales were recoded into broader score categories in order to prevent unstable estimates due to the small number of children (i.e., n < 10) at some scale scores. For the Total difficulties scale, scores 0 to 3 were collapsed into one single category "0-3". For scores from 4 to 25, adjacent scores were combined in pairs, e.g., scores 4 and 5 into "4-5", 6 and 7 into "6-7" and so on. Scores of 26 and higher were recoded into "26+". The original 40 steps in the scale were thus reduced to 13 categories. The same procedure was used for the Impact scale: scores 0-10 were recoded into 6 categories, starting with 0, and then values 1 and 2 were collapsed into one category "1-2" and so on. In a second logistic regression analysis, the Total difficulties and Impact scales were treated as continuous variables in order to obtain odds ratios (ORs) for mental disorders associated with a single-step increase in the scales. We thus ran logistic regression analyses for both the recoded version and the original version of the scales.
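The recoding scheme can be written down compactly. The following Python sketch is an illustration only: the function names are hypothetical, and the Impact categories beyond "1-2" (i.e., "3-4" through "9-10") are inferred from the "and so on" in the text together with the stated total of 6 categories.

```python
def recode_total_difficulties(score: int) -> str:
    """Map a Total difficulties score (0-40) to one of 13 categories."""
    if not 0 <= score <= 40:
        raise ValueError(f"Total difficulties score must be in 0-40, got {score}")
    if score <= 3:
        return "0-3"
    if score >= 26:
        return "26+"
    low = score - (score % 2)  # 4 -> 4, 5 -> 4, 6 -> 6, ...
    return f"{low}-{low + 1}"


def recode_impact(score: int) -> str:
    """Map an Impact score (0-10) to one of 6 categories."""
    if not 0 <= score <= 10:
        raise ValueError(f"Impact score must be in 0-10, got {score}")
    if score == 0:
        return "0"
    low = score if score % 2 == 1 else score - 1  # 1 -> 1, 2 -> 1, 3 -> 3, ...
    return f"{low}-{low + 1}"


# Sanity check against the category counts stated in the text.
print(len({recode_total_difficulties(s) for s in range(41)}))  # 13
print(len({recode_impact(s) for s in range(11)}))              # 6
```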
Coordinates of the ROC curves were used to select optimal cut-off values for the Total difficulties and Impact scales. We calculated sensitivity and specificity, together with positive and negative predictive values. As predictive values depend on the prevalence of disorder in the sample [28], we also calculated likelihood ratios (LHR), which express how much more likely a given test result is in children with a disorder than in children without a disorder [29]. For more details regarding the use of LHR estimates, see Fisher et al. [30], McGee [31], and Marasco, Doerfler and Roschier [32]. Predictive values were interpreted using a Bayes' theorem nomogram [33]. The added value of combining the Total difficulties and Impact scales was examined using logistic regression analyses.
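For readers who want the definitions of these measures in executable form, a minimal Python sketch follows. The 2×2 counts used in the example are not copied from the study's tables; they are back-calculated here from figures reported for carer-completed SDQs (n = 223, prevalence 57.4%, sensitivity 82.8% and specificity 73.7% at a Total difficulties cut-off of 13) and should be read as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a screening test cross-tabulated against the diagnosis."""
    tp: int  # disorder present, test positive
    fn: int  # disorder present, test negative
    fp: int  # disorder absent, test positive
    tn: int  # disorder absent, test negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1 - self.sensitivity) / self.specificity


# Back-calculated (assumed) counts: 128 cases and 95 non-cases among 223 children.
table = TwoByTwo(tp=106, fn=22, fp=25, tn=70)
print(f"sensitivity={table.sensitivity:.3f} specificity={table.specificity:.3f}")
print(f"PPV={table.ppv:.3f} NPV={table.npv:.3f}")
print(f"LHR+={table.lr_positive:.2f} LHR-={table.lr_negative:.2f}")
```

Sensitivity, specificity and the likelihood ratios are computed within the columns of the table and so do not change with prevalence, whereas PPV and NPV do.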

Probabilities based on the multi-informant algorithms. Chi-square analyses were used to estimate the goodness of fit between the three probability levels derived from the multi-informant algorithms, and the prevalence of mental disorders. The three probability levels were then dichotomised into a conservative "Probable" cut-off level and a more liberal "Possible" cut-off level for receiving a positive test result. As for the Total difficulties and Impact scales, predictive values for the algorithms were estimated for the two cut-off levels separately.
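The dichotomisation step amounts to treating the three probability levels as ordered and comparing them with the chosen cut-off level. A minimal Python sketch, in which the function name and the string labels as data values are assumptions of this illustration:

```python
# Ordered from least to most likely, as described in the text.
LEVELS = ("Unlikely", "Possible", "Probable")


def is_test_positive(level: str, cutoff: str) -> bool:
    """Return True if an algorithm level counts as a positive test.

    cutoff="Probable" is the conservative rule (only "Probable" is positive);
    cutoff="Possible" is the liberal rule ("Possible" and "Probable" are positive).
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    if cutoff not in ("Possible", "Probable"):
        raise ValueError(f"unknown cut-off: {cutoff!r}")
    return LEVELS.index(level) >= LEVELS.index(cutoff)


for level in LEVELS:
    print(level,
          is_test_positive(level, "Possible"),
          is_test_positive(level, "Probable"))
```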

Results
For the "All data" sample (N = 279), the mean age of children was 9.0 years (SD 2.0), with 47.0% being female. As described in a previous report [3], 50.9% (n = 142) of the sample had one or more DSM-IV disorders, in the following categories: Emotional (24.0%), Behavioural (21.5%), ADHD (19.0%) and Reactive attachment disorders (RAD) (19.4%). The comorbidity rate was high, with 63.4% of children with disorders having more than one mental disorder.
In the subsample used to calculate accuracy for carer-completed SDQs (n = 223), the prevalence of any disorder was 57.4%. In the subsample used to calculate accuracy for teacher-completed SDQs (n = 195), the prevalence of any disorder was 48.7%.
In the "Two informants" sample (n = 141), the prevalence of any disorder was 47.5%. The caregivers reported a mean SDQ Total difficulties score of 14.7 (SD 7.8), whereas the teachers reported a mean of 11.9 (SD 7.2, t = 4.8, df = 140, p < .001). The mean SDQ Impact score was 2.8 (SD 2.8) for the caregiver reports, and 1.8 (SD 1.9) for the teacher reports. As the Impact scale for foster parents comprised more items (5 vs 3 items) than the Impact scale for teachers, statistical analysis of the difference in mean score for the two samples could not be performed. No significant differences were evident between the "All data" and "Two informants" samples regarding age, gender, SDQ Total difficulties score or DAWBA disorder prevalence (results not shown).

AUROC and Dimensional Properties of the Total Difficulties and Impact Scales
The Total difficulties and Impact scores predicted the presence of disorders at greater than chance rates for both groups of informants (Table 1). For these scales, the results indicate excellent accuracy for caregivers and acceptable accuracy for teachers, according to criteria suggested by Hosmer Jr et al. [34]. Overall, the predictive values for the three SDQ subscale scores were comparable to those for the Total difficulties and Impact scores. Figure 2 displays the ROC curve for the Total Difficulties and Impact scales completed by caregivers (n = 223).
The level of agreement between the increase in recoded Total difficulties scores and the increase in prevalence of mental disorders was strong (χ² = 121.47, Kendall's tau-b = .47, df = 13, p < .001) for the "All data" sample, as illustrated in Figure 3. The recoded scores of "10-11" and "16-17" represented a break in the linear trend.
In the logistic regression analyses, the Total difficulties scale and the Impact scale were entered as continuous scales to estimate the ORs for the risk of Any mental disorder associated with a one-step increase on the relevant scale.

Cut-Off Values for the Total Difficulties and Impact Scales
Table 2 presents the sensitivities and specificities of the different Total difficulties scores, which were derived from the ROC analysis. Giving equal weight to specificity and sensitivity, a cut-off score of 13 is optimal for both caregivers (82.8% sensitivity, 73.7% specificity) and teachers (86.4% sensitivity, 77.3% specificity). Table 3 presents the sensitivities and specificities of the different Impact scale scores, which were derived from the ROC analysis. Giving equal weight to specificity and sensitivity, a cut-off score of 2 (80.0% sensitivity, 70.0% specificity) is suggested for caregiver-completed SDQs, whereas a cut-off score of 1 (77.9% sensitivity, 67.0% specificity) is optimal for teacher-completed SDQs.
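One common way to give equal weight to sensitivity and specificity is to maximise Youden's J (sensitivity + specificity − 1) over the ROC coordinates. Whether this exact criterion was applied here is not stated, so the Python sketch below is an assumption about the selection rule. It uses only the operating points reported in the text, since the full ROC coordinates in Tables 2 and 3 are not reproduced here.

```python
from typing import Iterable, Tuple

Coordinate = Tuple[int, float, float]  # (cut-off, sensitivity, specificity)


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J: 0 for a chance-level test, 1 for a perfect one."""
    return sensitivity + specificity - 1.0


def best_cutoff(coordinates: Iterable[Coordinate]) -> Coordinate:
    """Return the ROC coordinate with the largest J (first one on ties)."""
    return max(coordinates, key=lambda row: youden_j(row[1], row[2]))


# Operating points reported in the text for the suggested cut-offs.
REPORTED = {
    ("caregiver", "Total difficulties", 13): (0.828, 0.737),
    ("teacher", "Total difficulties", 13): (0.864, 0.773),
    ("caregiver", "Impact", 2): (0.800, 0.700),
    ("teacher", "Impact", 1): (0.779, 0.670),
}

for (informant, scale, cutoff), (sens, spec) in REPORTED.items():
    print(f"{informant:9s} {scale:18s} cut-off {cutoff:2d}: J = {youden_j(sens, spec):.3f}")
```

Given a full list of coordinates for one scale, `best_cutoff` would return the row with the highest J; other weightings of false negatives against false positives would lead to different choices.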
AUROC values revealed overlapping confidence intervals for males and females, and the coordinates for the curves indicated similar cut-off points across genders. Table 4 illustrates the distribution of cases and non-cases for test positives and test negatives according to the recommended cut-offs, for carer-completed and teacher-completed SDQs, respectively.
As shown in Table 5, we estimated the possible additive value of combining the Total difficulties and the Impact scales when interpreting the SDQ reports, using the recommended cut-off scores for both scales on SDQs completed by caregivers. With foster children scoring below the suggested cut-offs on both scales serving as a reference group, a score above the cut-off on either of the two scales increased the risk for Any mental disorder (adjusted OR 4.70, 95% CI 1.98-11.10, p < .001), predicting Any mental disorder with 89.1% sensitivity and 62.1% specificity. Scores above the cut-offs on both scales predicted Any mental disorder with 73.4% sensitivity and 81.1% specificity. Post-hoc tests revealed a significant increase in the risk for Any mental disorder for children who scored above the cut-offs on both scales compared to those who scored above the cut-off on only one of the scales.
Table 6 shows the predictive values of recommended cut-offs for each scale of carer-completed SDQs, separately and combined. The likelihood ratios indicate that a cut-off at 13 on the Total difficulties score will increase the post-test probability of any disorder to 81.0%, from the pre-test probability of 57.4%. A negative test will decrease the post-test probability to 23.0%. The predictive value of the Impact score was somewhat lower for test-positive scores. Using the combination of Total difficulties and Impact score, scoring above cut-off on both scales will increase the post-test probability to 84.0%, but with a weaker predictive value for negative tests, which leave a post-test probability of 30.0%. When test positives are defined as scoring above cut-off on at least one of the scales, the probability of disorder increases to only 76.0%, while for test negatives the probability of disorder decreases to 19.0%, from the pre-test probability of 57.4%.
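The post-test probabilities quoted above follow from converting the pre-test probability to odds, multiplying by the likelihood ratio and converting back. The Python sketch below applies this to the sensitivities, specificities and pre-test probability reported in the text. The function names are hypothetical, and the results may differ by up to about one percentage point from the reported figures, which were read from a nomogram.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LHR+, LHR-) for a test with the given sensitivity and specificity."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def post_test_probability(pre_test: float, lhr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LHR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lhr
    return post_odds / (1 + post_odds)


PRE_TEST = 0.574  # prevalence of any disorder, carer-completed SDQs (n = 223)

# (sensitivity, specificity) as reported for carer-completed SDQs.
RULES = {
    "Total difficulties 13+": (0.828, 0.737),
    "above cut-off on either scale": (0.891, 0.621),
    "above cut-off on both scales": (0.734, 0.811),
}

for name, (sens, spec) in RULES.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: positive -> {post_test_probability(PRE_TEST, lr_pos):.1%}, "
          f"negative -> {post_test_probability(PRE_TEST, lr_neg):.1%}")
```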

The Multi-Informant Algorithms: Testing the Predictive Values of Two Different Cut-Off Scores
In the "Two informants" sample (n = 141), the multi-informant algorithm predicted that Any mental disorder was "Unlikely" for 32.3% of the children, "Possible" for 24.7% and "Probable" for 43.0%. The level of agreement between the SDQ algorithms' results and the prevalence of Any mental disorder from DAWBA, as presented in Table 7, was strong (χ² = 37.15, Kendall's tau-b = .49, 95% CI = .35-.62, p < .001). A similar level of agreement was observed for the algorithmic predictions derived from the three SDQ symptom subscales and their corresponding diagnostic categories. The agreement was strongest for Behavioural disorders (χ² = 46.87, Kendall's tau-b = .55, 95% CI = .44-.65, p < .001) and somewhat more moderate for ADHD disorders (χ² = 27.68, Kendall's tau-b = .37, 95% CI = .22-.51, p < .001) and Emotional disorders (χ² = 24.27, Kendall's tau-b = .39, 95% CI = .23-.54, p < .001).
Table 8 presents the accuracy of the algorithms in predicting the corresponding DAWBA diagnostic groups based on the two cut-offs "Probable" and "Possible". Sensitivity was highest when the "Possible" cut-off was used. However, this cut-off had relatively low specificity. Using the stricter "Probable" cut-off for positive cases, sensitivity declined and specificity increased. Although this latter cut-off demonstrated sufficient ability to include only those children with a disorder, the relatively low sensitivity renders this cut-off level unsuitable for screening purposes.
Based on the LHR+ values, only the SDQ Emotional subscale with the "Probable" cut-off had the potential to identify emotional disorders without including too many false positives. Findings in a previous report [3] indicate that the pre-test probability of having an Emotional disorder is 24.0% for Norwegian foster children. An LHR+ value of 5.35 for the SDQ Emotional subscale signifies an increased post-test probability of disorder of 62.0% for Emotional disorders in children who scored above the cut-off. However, an LHR− value of 0.74 suggests that scoring below the cut-off decreases the probability of disorder only slightly, to a post-test probability of 19.0%.
Only the "Possible" cut-off for the Conduct subscale showed potential predictive usefulness, as no child scoring below this cut-off had Behavioural disorders, compared with a pre-test prevalence of 21.5%.

The Total Difficulties and Impact Scales
The ability of the Total difficulties and Impact scales to discriminate between children with and without Any disorder, according to the ROC analyses, is in the upper range compared to results from previous studies on SDQ used with school-aged children [21]. Furthermore, the AUROC for these two scales revealed discriminative ability superior to that reported for Norwegian pre-school children [35], especially as measured by the Impact scale. Examining an older age group with a higher prevalence of disorders may have contributed to the present findings for foster children compared to the pre-school community sample.
Our findings regarding the screening properties of the SDQ as a dimensional measure are generally consistent with previous reports with community samples [5,17], clinical samples [11] and looked-after children [8]. This suggests that the Total difficulties and Impact scales are appropriate for use across samples with different disorder prevalence rates. Our findings also suggest that the SDQ used as a dimensional measure is valid across a continuum of severity and thereby suitable for screening purposes in foster children with a broad range of mental health problems.
One purpose of screening is to identify children who are in need of more in-depth mental health assessments. To aid in this decision, a cut-off value is often preferred. Here, the consequences of not detecting mental disorders must be weighed against the costs of extensive assessments of children who do not have a disorder. Although a cut-off of 13 on the carer-completed Total difficulties scale may provide the best balance between sensitivity and specificity, it is important to note that children with Total difficulties scores in the low range from 4 to 9 had a prevalence of disorders ranging between 13.0 and 29.0% (Figure 3).
In line with this finding, the high prevalence of mental disorders in foster children warrants a general alertness in child welfare settings. False positives may still have vulnerabilities that do not manifest until children are exposed to new situations, demands and expectations, e.g., starting school. Furthermore, one cannot rule out the possibility that false positives in this high-risk group are children with substantial mental health problems that fall just below the diagnostic threshold. For example, in a newly reported study on mental health screening in a foster-care sample from New Zealand (N = 577), Tarren-Sweeney [36] found that a majority of false-positive children had at least one mental health score in the clinical range as measured with the Child Behaviour Checklist [37]. Post-hoc analyses of our data support this finding. Depending on the subscale, 52.0-88.0% of false positives were high scorers (defined as scoring one SD or more above the mean using British norms). Therefore, cut-offs with higher sensitivity may be preferable, in spite of their lower specificity.
An optimal balance between sensitivity and specificity was obtained when the cut-offs for both scales were combined. Defining test positives as a score above the cut-off on one of the two scales identified 89.1% of the children with a disorder. However, 37.9% of the children without a mental disorder also tested positive under this rule. The added predictive value when combining these two scales indicates that the Impact scale and the Total difficulties scale are not parallel; rather, they complement each other by measuring different but equally relevant aspects of child mental health. In high-risk samples, not only a high prevalence rate, but also a broad range of symptoms and high comorbidity may contribute to these results, which render the Impact scale as important as the Total difficulties scale for screening purposes.
To sum up, if the main purpose of screening is to reduce the number of undetected (false negative) children with a need for more detailed mental health examination, then we recommend that test positives be defined as scoring either 13+ on the Total difficulties scale or 2+ on the Impact scale. The low negative likelihood ratio for this combination indicates a decrease in post-test probability of having a disorder from 57.4% to 19.0% for test negatives. If, on the other hand, an equal emphasis on positive and negative predictive values is preferred, then test positives could be defined by scoring above cut-off on the Total difficulties scale only, regardless of score on the Impact scale. We cannot recommend requiring scores above cut-off on both the Total difficulties and Impact scales to be defined as test positive, as test negatives under this rule retain a 30.0% post-test probability of having a disorder. For teacher-completed SDQs, the threshold for the Impact scale should be lowered to 1+, while the recommended cut-off for the Total difficulties scale remains 13.
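The decision rules discussed in this section can be expressed as a small screening function. The sketch below is a hypothetical illustration of the recommended cut-offs (13+ on Total difficulties; 2+ on Impact for carers, 1+ for teachers). The function name, rule labels and data layout are assumptions and are not part of the SDQ or of the study's analysis code.

```python
# Recommended cut-offs from the text; a score at or above the value is "above cut-off".
CUTOFFS = {
    "caregiver": {"total": 13, "impact": 2},
    "teacher": {"total": 13, "impact": 1},
}


def screen_positive(total: int, impact: int,
                    informant: str = "caregiver", rule: str = "either") -> bool:
    """Classify one completed SDQ as test positive or test negative.

    rule="either":     above cut-off on at least one scale (highest sensitivity)
    rule="both":       above cut-off on both scales (not recommended in the text)
    rule="total_only": Total difficulties alone, ignoring the Impact score
    """
    if informant not in CUTOFFS:
        raise ValueError(f"unknown informant: {informant!r}")
    cut = CUTOFFS[informant]
    above_total = total >= cut["total"]
    above_impact = impact >= cut["impact"]
    if rule == "either":
        return above_total or above_impact
    if rule == "both":
        return above_total and above_impact
    if rule == "total_only":
        return above_total
    raise ValueError(f"unknown rule: {rule!r}")


# A made-up child with a sub-threshold symptom score but noticeable impact.
print(screen_positive(total=11, impact=3))                     # True
print(screen_positive(total=11, impact=3, rule="both"))        # False
print(screen_positive(total=11, impact=3, rule="total_only"))  # False
```

The example shows why the "either" rule is the most sensitive: a child whose impairment is high but whose symptom count is below 13 is still flagged for follow-up.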

The Multi-Informant Algorithms
Although estimates derived from the algorithms showed some discriminative ability (Table 7), the predictive values for the four diagnostic categories used in the present study were moderate to low, according to Fisher's guidelines [30]. However, the algorithmic estimates for Behavioural disorders showed markedly more sensitivity compared to those for Emotional disorders.
Goodman et al. [13] found 85.0% sensitivity and 80.0% specificity for the "Probable" prediction of Any mental disorder in looked-after British children. Given that the overall rates of disorder in our sample were comparable to those of that sample, our lower sensitivity is somewhat surprising. However, a previous study of the predictive value of the multi-informant algorithms in a Norwegian clinical sample reported results similar to ours [16]. The algorithms are calculated using a fixed combination of scores, derived from a British normative sample [25]. Finnish norms for the SDQ suggest a cut-off 2-3 points lower than that derived from the British norms [38], illustrating that the UK multi-informant algorithms are based on cut-offs that may not fit populations in other countries. Furthermore, when the algorithms were examined with a British clinical sample [12], the algorithms were modified by increasing the threshold for identifying emotional disorders. For both the clinical sample and the looked-after British children, behavioural disorders were reported almost three times as often as emotional disorders. By contrast, in our sample of Norwegian foster children, there were similar prevalence rates of these two disorders, with a lower rate of behavioural disorders and a higher rate of emotional disorders than in the British samples [3].

Limitations
The statistical analyses presented for the Total difficulties scale, the Impact scale and the multi-informant algorithms are all based on dichotomous diagnostic outcomes. However, individuals differ not only in the presence or absence of a disorder but also in the severity and number of symptoms experienced, their duration and their impact on daily life [39]. In a high-prevalence sample, the size of this sub-threshold group would be larger than in the general population, which would decrease the predictive value of a screening instrument with a defined cut-off value.
In addition, when a sample is divided into subgroups, the sample size determines the degree of vulnerability to random errors in the values of the target variable. In our study, the relatively small sample size may have influenced the fit between the Total difficulties score and the prevalence of disorders, as illustrated in Figure 3. Here, a relatively steadily ascending curve is interrupted by sudden drops that occur at scores "10-11" and "16-17", suggesting the need for caution when interpreting our results. The relatively large confidence intervals add to this reservation. Nevertheless, Chi-square analyses with corresponding ORs suggest that there is a relatively good correspondence between the increase in SDQ scores and the prevalence of mental disorders. Furthermore, the nearly identical ORs for the recoded and original version of the Total difficulties and the Impact scales support the validity of the SDQ used as a dimensional measure across a continuum of severity.
Table 5. Applying recommended cut-offs for SDQ: Total Difficulties Scale and Impact Scale for Caregiver SDQs (n = 223).

Clinical Implications
The good fit between the increased SDQ scores and the prevalence of disorders suggests that the SDQ is a useful measure for guiding service plans and for comparing child welfare groups with regard to intervention needs. Furthermore, the use of brief mental health questionnaires, such as the SDQ, may both improve communication between child welfare and mental health services, and facilitate the description of children's needs across these relevant services.
If a cut-off for further assessment is preferred, we recommend the use of an interpretation that is based on a combination of the Total difficulties score and the Impact score. Our findings suggest that either a Total difficulties score of 13+ or an Impact score of 2+ for the carer-completed SDQ may indicate the presence of a mental disorder and warrant a follow-up with the child. Based on our findings, we cannot recommend the use of the predictive algorithm to screen foster children in Norway for mental disorders.