Is health-related quality of life the same for elderly Polish migrants, Turkish migrants and German natives? Testing the reliability and construct validity of the Sf-36 health survey in a cross-cultural comparison

Abstract
Objective: The Sf-36 is the most widely used instrument to measure health-related quality of life (HRQoL) with the most convincing evidence of both internal consistency and test–retest reliability. In addition, it is appropriate for use among elderly and minority groups like migrants. The aim of this study is to investigate and compare the reliability and the factorial structure of the Sf-36 in a sample of elderly migrants and natives. The hypothesis is that the construct (the HRQoL consisting of eight dimensions correlated with two components) is the same for elderly Turkish migrants, Polish migrants and German natives. This means that the Sf-36 model shows good psychometric properties and model invariance for the three groups investigated in this study.
Methods: The Sf-36 v.2 was forward and backward translated to Turkish and Polish. In this cross-sectional study, interviews were conducted with a sample of elderly migrants from Turkey (n = 100), from Poland (n = 103) and a sample of elderly German natives (n = 101). All data were entered and analysed using SPSS version 21 and AMOS Graphics. Cronbach’s α was used to analyse the reliability of the Sf-36. Multi-group confirmatory factor analysis (MGCFA) and structural equation modelling (SEM) were used for the Sf-36 model invariance testing.
Results: The reliability of the Sf-36 was good to excellent for all Sf-36 dimensions (α > 0.7) except for General Health (0.55) in the Polish group. Multi-group confirmatory factor analysis (MGCFA) showed non-invariance between the three groups (CMIN: 180.172, df: 51, CMIN/df: 3.533, p < 0.001, CFI: 0.895, RMSEA: 0.092 for the unconstrained model). Model modifications resulted in a good model fit for the Polish group. However, an applicable common Sf-36 model for the three groups was not attained.
Conclusion: This study does not support the idea that the factorial structure of the Sf-36 with two components and eight dimensions is the same across three ethnically and culturally diverse groups of elderly subjects. Therefore, comparing subscale scores of the Sf-36 between different ethnic groups may be problematic.

ABOUT THE AUTHORS
Johanna Buchcik, PhD, is a research associate and lecturer for Statistics and Public Health Nutrition at the Hamburg University of Applied Sciences in Germany. Her research work focuses on health promotion, and in more detail on health and care of elderly migrants. She obtained her PhD at the University of the West of Scotland, where her research activities included description and comparison of cross-cultural quality of life of elderly Turkish and Polish migrants and German natives. These activities were carried out in the context of a study called Sağlık, where she worked as a research assistant. The study was conducted in 2013 by the Hamburg University of Applied Sciences under the leadership of Prof. Dr Joachim Westenhöfer. The major aim of the study was to develop, implement and evaluate intercultural and interdisciplinary measures of social space-oriented health promotions, with a special focus on the promotion of healthy nutrition, movement and social participation with the intention to promote the health of older Turkish women and men.

PUBLIC INTEREST STATEMENT
Health-related quality of life (HRQoL) is multidimensional, which means that it includes individually perceived aspects related to physical, mental, emotional and social circumstances. Therefore, it can differ between individuals and groups (e.g. seniors and adults, women and men, migrants and natives). However, there are instruments, like the Sf-36, which are uniformly used for measuring HRQoL across different groups and individuals. This article describes and compares the HRQoL of elderly migrants and non-migrants. HRQoL was measured using the Sf-36 questionnaire, which assumes that HRQoL consists of a mental and a physical component. Based on interviews with Turkish and Polish migrants and German natives, it was found that HRQoL with these two components is not the same across the three ethnically and culturally diverse groups of elderly subjects. Therefore, comparing HRQoL with the Sf-36 questionnaire between different ethnic groups may be problematic.

Introduction
Health-related quality of life (HRQoL) is the perceived quality of an individual's health and daily life; increasing HRQoL is therefore an important goal in health promotion and disease prevention. Consequently, information on health and HRQoL can help to meet needs in a socially and politically adequate way, to create health-related services and to adapt or change policies accordingly.
The group of elderly migrants is heterogeneous not only due to a number of different home countries, but also on account of different social milieus as well as cultural, economic and social backgrounds (Knipper & Bilgin, 2009). Accordingly, its group members differ not only in health status but also in their expectations and needs for the health care system (Knipper & Bilgin, 2009). Consequently, not only perceived health, but also HRQoL, depend on cultural background, subjective conceptions, cultural values, languages, attitudes, beliefs, intentions, motives, moods and behaviours and can lead to missing measurement invariance across different populations (Gregorich, 2006).
Research findings on the HRQoL of elderly migrants, such as the elderly Turkish and Polish population in Germany, are still rare, and the data can be categorised as insufficient, with only a few investigations available (Bayram, Thorburn, Demirhan, & Bilgel, 2007; Berdes & Zych, 2000; Knurowski et al., 2004). Moreover, studies often fail to compare the results of different migrant and native groups, especially elderly ones, with each other, a comparison that is inhibited by a lack of standardised questionnaires and procedures (Buchcik, Westenhoefer, & Martin, 2013).
When undertaking research in populations with diverse cultural and/or migration backgrounds, the psychometric properties of HRQoL measures are an important consideration (Hoopman, Terwee, Muller, Öry, & Aaronson, 2009). And especially "[…] when measurements are provided by self-report or other fallible methods, concerns about instrumentation are often exacerbated" (Gregorich, 2006, p. 1).
Where major differences in measurement equivalence are found, comparisons between migrant groups (and natives) should be interpreted cautiously (Milfont & Fischer, 2010). Without appropriate measurement equivalence, observed differences may be caused by measurement artefacts resulting from item response biases rather than by objective differences in HRQoL.
The Sf-36 is a generic instrument for measuring HRQoL across different age, gender and national groups (Bullinger, 2000). Additionally, several studies report the psychometric properties (reliability, validity and objectivity) of the Sf-36 in different languages and for different groups (Aaronson et al., 1992; Mbada et al., 2015; Salazar & Bernabé, 2015; Wang, Chen, Yang, & Wu, 2015). The Sf-36 was reported to be the most widely used questionnaire (Bullinger & Morfeld, 2004), to have the most convincing evidence of both internal consistency and test-retest reliability (Haywood, Garratt, & Fitzpatrick, 2005) and to be appropriate for use among elderly and minority groups like migrants (Buchcik et al., 2013). Analysing the results is straightforward, because it essentially requires a calculation of sum scores, which are then transformed into values between 0 and 100. This allows an interpretation of the sum scores and a comparison between different participants and with a reference population (Bullinger & Kirchberger, 1998).
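The transformation of sum scores to the 0 to 100 range can be illustrated with a minimal sketch. It assumes the commonly documented linear rescaling (raw sum minus lowest possible sum, divided by the possible range, times 100); the function name and the item coding in the example are hypothetical and are not taken from the scoring procedure used in this study.

```python
def transform_scale(raw_sum, lowest_possible, highest_possible):
    """Linearly rescale a raw dimension sum score to the 0-100 range.

    Assumption: plain linear rescaling; any recoding of single items
    that a scoring manual may require is not covered here.
    """
    if highest_possible <= lowest_possible:
        raise ValueError("highest_possible must exceed lowest_possible")
    if not lowest_possible <= raw_sum <= highest_possible:
        raise ValueError("raw_sum lies outside the possible range")
    possible_range = highest_possible - lowest_possible
    return (raw_sum - lowest_possible) / possible_range * 100.0


# Hypothetical dimension with 10 items, each coded 1-3 (raw range 10-30).
score = transform_scale(raw_sum=21, lowest_possible=10, highest_possible=30)
print(round(score, 1))  # 55.0
```

Because every dimension ends up on the same 0 to 100 metric, scores can be compared between participants regardless of how many items a dimension contains.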
Based on these reported psychometric properties and the usefulness of the instrument for research in different groups, we hypothesised that the construct of HRQoL (comprising eight Sf-36 dimensions which correlate with a mental and a physical component) is the same for elderly Turkish migrants, elderly Polish migrants and elderly German natives. As the Sf-36 was reported to show good psychometric properties across different population groups (e.g. Turner-Bowker, Bartley, & Ware, 2002), it is assumed that it will show good psychometric properties and model invariance for these three groups.
According to the original Sf-36 model (Ware, Kosinski, & Keller, 1994), we expected a model with the following characteristics for all three groups: (1) The model shows a two factor structure, consisting of two major components of health-related quality of life: the physical (PC) and the mental (MC), (2) four dimensions (GH, PF, RP, and BP) are indicators of physical and four dimensions (RE, MH, VT, and SF) are indicators of mental health and (3) the model also assumes two split loadings of the dimensions Vitality and General Health. This model is illustrated in Figure 1.

Material and methods
Approval for the interviews was granted by the University of Applied Sciences Hamburg (HAW) and by the committee of the University of the West of Scotland (UWS).

Study participants
According to the Statistical Office for Hamburg and Schleswig-Holstein (2010), people with a migration background living in Hamburg (the second largest city in Germany) predominantly come from Turkey (18.0%) and Poland (13.0%). Therefore, a sample of migrants born in Turkey (n = 100), in Poland (n = 103) and a sample of German natives (n = 101) was interviewed in Hamburg.
The participants in this study had to meet certain criteria for inclusion: Two of the three groups (Turkish and Polish) had to have first-hand migration experience. The participants had to be at least 60 years old, because it is expected that the number of elderly (migrants) will increase as the population ages. The participants had to live in selected districts of Hamburg (named Wilhelmsburg, Billstedt, Altona-Nord, Altona-Altstadt and Harburg) because the proportion of migrants is particularly high in these districts. In the case of German participants, the surveys also took place in these districts to ensure better comparability. The participants could not live in nursing or senior homes or require professional nursing care, because having professional support in daily life may result in a different health status and therefore a different HRQoL. Furthermore, one aim of the surveys was to estimate the need for health promotion measures among elderly women and men who lived independently, with and without a migrant background.

Recruitment of participants
The recruitment of participants using passive steps (like an announcement in a brochure or on a website) proved ineffective. The most effective recruitment method was to ask the participants for their participation directly, face to face. Turkish participants were located in Turkish facilities, in mosques, in Turkish cafés and on the street, where they spend their leisure time. Polish migrants were recruited in two Catholic churches, in cultural facilities, in cafés, in Polish grocery stores and on the street. German participants were found in cafés, in grocery stores and on the street. In addition, participants were recruited using the snowball method (family members or friends of the interviewers and family members or friends of the participants).
Because the recruitment processes differed and were usually informal (e.g. through family members), refusals by other potential subjects could not be documented. Table 1 gives an overview of the recruitment process for Turkish migrants, Polish migrants and German natives.

Data collection
The recruitment and the interview venue differed sometimes. The Turkish inquiries were carried out in different locations or at a friend's house or (more usually) in the participants' homes. The Polish interviews mostly took place in the parish hall, in cultural facilities or at the participants' houses. The German participants were mostly interviewed on the street or in various public locations (e.g. bakery, café). To ensure that migrants with poor German language skills and Germans could adequately reply to the questions, all participants were given the opportunity to answer in their native language (Turkish, Polish or German) (see translation process below). All interviews were conducted in the language according to the person's background. Table 2 gives an overview of where the interviews were carried out.

Study instrument
The Sf-36 (Ware & Sherbourne, 1992) is a generic instrument with 36 questions which measure eight dimensions of the physical and mental health component (see Figure 1). It was developed within the Medical Outcomes Study (MOS) with the aim of examining the services of health-insurance systems in America (Bullinger, 2000).
The Sf-36 model is described by Ware et al. (1994). In this model, the Sf-36 forms two major components of health-related quality of life: the physical (PC) and the mental (MC). These are covered by 36 items which refer to eight health dimensions (physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health). According to this theoretical model, the first four dimensions are indicators of physical health and the last four dimensions are indicators of mental health. The model also assumes two split loadings of the dimensions Vitality and General Health.

Translation process
The translation process included forward and backward translations from German to Turkish and Polish. Afterwards, the questionnaire was pre-tested on native speakers with a Turkish and Polish migrant background, respectively. These native speakers were not involved in the study or in the translation process, so that they could comment on the equivalences with neutrality and compare the back-translated questionnaire with the source version. Additionally, in a number of meetings lasting several hours, academic staff compared the back-translated questionnaire with the source, item by item and word by word. Words which were mistranslated, which would be too difficult to understand or which had another meaning were changed or left out without changing the conceptual frame. Some words, for example, which were appropriate within the German context, could not be used in the same way for Turkish or Polish participants.

Statistical analysis
All data were entered and analysed using SPSS version 21. Descriptive statistics are reported as means and standard deviations (means ± SD), absolute frequencies and percentages. Statistical significance was assessed for p < 0.05. The internal consistency was tested using Cronbach's α (Cronbach, 1951). Multi-group confirmatory factor analysis (MGCFA) and structural equation modelling (SEM) were used for the Sf-36 model invariance testing with AMOS. Thus, SEM was conducted on the eight dimensions and two components.
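As an illustration of the internal consistency measure named above, the following minimal sketch computes Cronbach's α for the items of one dimension. It is not the SPSS procedure used in this study; the function name and the response data are hypothetical, and sample variances (denominator n - 1) are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one dimension.

    rows: list of respondents, each a list of item scores
    (all respondents must have answered the same items).
    """
    n_items = len(rows[0])
    if n_items < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(n_items)]
    # Sample variance of the respondents' sum scores.
    total_variance = variance([sum(row) for row in rows])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five participants to a three-item dimension.
responses = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1], [4, 5, 4]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

A value of 0.7 or higher would count as acceptable under the threshold applied in this article.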
To investigate model invariance and measurement equivalence, we followed the procedure outlined by Weiber and Mühlhaus (2010) (see also Byrne, 2004), using five models with increasing constraints on the model parameters (factor loadings/regression slopes, intercepts, error variances, covariances between latent variables):
(1) In the "unconstrained model", which is estimated first, all model parameters are estimated without any requirement that these parameters are equal for the different groups.
(2) The "weights model" is used to assess whether factor loadings (i.e. regression slopes) are the same across the groups. If this invariance is satisfied, the latent variables are being measured in the same way across groups.
(3) The "intercepts model" assumes that intercepts, in addition to regression slopes, are the same for all groups. If this type of invariance does not hold, then group comparisons of the indicator variables may be limited in their validity.
(4) The "covariances model" requires, in addition to the constraints of the "intercepts model", that the covariances between the latent variables are the same for all the groups.
(5) Finally, the "residual model" additionally assumes invariance of the error variances (residuals) across the analysed groups. The "residual model" thus represents complete invariance of factor loadings, intercepts, covariances and error variances across the groups.
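The nesting of the five models described above can be summarised in a short sketch. It only records which parameter sets are held equal across groups at each step; it does not estimate any model, and the names used are illustrative rather than AMOS identifiers.

```python
# Each step names the model and the parameter set it newly constrains
# to be equal across groups (None = nothing constrained).
CONSTRAINT_STEPS = [
    ("unconstrained", None),
    ("weights", "factor loadings"),
    ("intercepts", "intercepts"),
    ("covariances", "covariances between latent variables"),
    ("residual", "error variances"),
]


def cumulative_constraints(steps):
    """Return, for each model, all parameter sets constrained so far."""
    constrained = []
    models = {}
    for name, added in steps:
        if added is not None:
            constrained.append(added)
        models[name] = tuple(constrained)
    return models


models = cumulative_constraints(CONSTRAINT_STEPS)
for name, constraints in models.items():
    print(f"{name}: {', '.join(constraints) or 'no equality constraints'}")
```

Each model contains all constraints of the previous one, which is why a poorly fitting unconstrained model already argues against invariance at every later step.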

Participants' characteristics
The demographic characteristics of the study groups (Turkish, Polish and German) are shown in Table 3. A total of 304 participants responded to the questionnaire (100 Turkish migrants, 103 Polish migrants, 101 German natives). The mean age of the study group was 68.3 ± 6.9 years (range 60-89); 13 Turkish participants did not state their age. Of the study group, 58.2% were female and 41.8% were male. All Turkish participants stated Turkish as their native language. Polish participants named two options: first, Polish and German, meaning that they indicated Polish as their first and German as their second native language; second, German and Polish, meaning that they indicated German as their first and Polish as their second native language. Table 4 shows socio-economic data of the participants. Of the Turkish women and men, 11.0% stated that they had never gone to school. In contrast, all other participants had at least some school education. Polish respondents most frequently reported having attended school for more than 12 years (45.6%). More Turkish than Polish or German participants reported not having a formal professional education. Only the Polish and German groups reported a monthly personal income >2.501€; however, only 2 and 3 men reported this, respectively.
Table 4 (excerpt): monthly personal income, % (n), in the order Turkish / Polish / German
[…] (57) / 50.5% (51)
1.501-2.500€: 6.0% (6) / 14.6% (15) / 17.8% (18)
>2.501€: - / 1.9% (2) / 3.0% (3)
Not specified/unknown: 14.0% (14) / 6.8% (7) / 6.9% (7)

Factor analysis
The calculation for the unmodified Sf-36 model shows the following results (CMIN/df, CFI and RMSEA) for each national group. The CMIN/df is above 3.0 for the Turkish and German group, the CFI is under 0.90 for the Turkish and German group and the RMSEA is in all three groups above 0.08. Of the three, the Polish group shows the best model fit (see Table 6).
Model modification was therefore based on the Polish model, and the following changes were made. First, the path between the mental component (MC) and vitality (VT) was dropped due to its high standard error (SE 1.349). Second, the path between MC and general health (GH) was excluded due to its high standard error (SE 1.405). Third, a covariance between e5 and e8 (MI 20.919) was included. These steps led to better model fit (CMIN/df: 1.819, CFI: 0.959 and RMSEA: 0.090). However, to improve the RMSEA (which was still above 0.08), a covariance between e3 and e4 (MI 5.028) was allowed.
As a result, good model fit for the Polish group was achieved (CMIN/df: 1.501, CFI: 0.976, RMSEA: 0.070). The same modifications did not lead to good model fit for the Turkish (CMIN/df: 5.307, CFI: 0.868, RMSEA: 0.209) and German (CMIN/df: 3.790, CFI: 0.851, RMSEA: 0.167) data. The modified Polish model was therefore used as the configural model and was transferred to the other two groups for calculation using MGCFA. These modified models and their factor loadings are shown in Figures 2(a)-(c).
The results of the MGCFA are shown in Table 7. Non-invariance is evident, because the CFI is < 0.90, the RMSEA is > 0.08 and the CMIN/df is > 3.
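The fit assessment can be made explicit with a small sketch that checks reported indices against the cut-offs used in this article (CMIN/df of at most 3, CFI of at least 0.90, RMSEA of at most 0.08). The function name is hypothetical; the input values are those reported in the abstract for the unconstrained model, and the cut-offs are parameters because, as discussed under Limitations, acceptable thresholds remain a matter of debate.

```python
def fit_flags(cmin, df, cfi, rmsea, max_cmin_df=3.0, min_cfi=0.90, max_rmsea=0.08):
    """Return each fit index with a flag saying whether it meets its cut-off."""
    cmin_df = cmin / df
    return {
        "CMIN/df": (round(cmin_df, 3), cmin_df <= max_cmin_df),
        "CFI": (cfi, cfi >= min_cfi),
        "RMSEA": (rmsea, rmsea <= max_rmsea),
    }


# Values reported for the unconstrained multi-group model.
unconstrained = fit_flags(cmin=180.172, df=51, cfi=0.895, rmsea=0.092)
for index, (value, acceptable) in unconstrained.items():
    print(f"{index}: {value} ({'acceptable' if acceptable else 'not acceptable'})")
```

With these inputs all three indices miss their cut-offs, which matches the conclusion of non-invariance drawn in the text.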

Reliability
The results show that the instrument had generally good to excellent Cronbach's α coefficients (≥0.7) for all dimensions, except for General Health in the Polish group. The psychometric properties suggest that HRQoL assessed by the Sf-36 has good reproducibility for the three groups and especially for elderly Germans. However, some low Cronbach's α coefficients were found only within the Turkish and Polish groups and not within the German group. Five possible reasons are mentioned in the literature to explain such differences between migrants and natives: First, people with a migration background have different perceptions of health, illness and health-related quality of life (Bowling et al., 2003). Second, the characteristics of the German sample were different from those of the samples with a Polish or Turkish background (e.g. it was a healthier population) (Razum, 2006). Third, the Vitality and General Health perception scales measure both physical and mental health components (these two dimensions have cross-loadings with both components) and therefore had the most complex assignment and interpretation (Mchorney, Johne, & Anastasiae, 1993). Fourth, these cross-loadings can lead to a lack of direct mapping, which in turn can mean that some items are not compatible with the dimensions of General Health. Fifth, the incompatibility of these two dimensions could also arise from the translation and adaptation process, which could lead to different understandings of the instrument (Uysal-Bozkir, Parlevliet, & de Rooij, 2013).
While we did not directly investigate the relevance of the first four possibilities, we tried to address the fifth possible reason through our translation and adaptation process, as described in the methods section. Nevertheless, we cannot exclude possible differences due to the translation process.
The results are in contrast to the findings of Bullinger and Kirchberger (1998), who examined the reliability of the Sf-36 within a German norm population (n = 2,914, mean age 47.7). The lowest internal consistency was reported for General Health (0.76) and Social Functioning (0.74). These results are partly consistent with Bullinger (1995), where the internal consistency coefficients of a German group were above 0.70. However, the General Health perception scale ranged from 0.64 to 0.75. Data from general population samples in 11 countries (including Germany (n = 2,914), but not Poland or Turkey) were used to assess psychometric data of the Sf-36. Internal consistency reliability of the eight Sf-36 dimensions was 0.74 or higher for all scales, with the lowest values for Social Functioning (0.74) and again for General Health (0.76). Gandek et al. (1998) explain these differing α values in the eight dimensions as the result both of linguistic and cultural differences and of differing modes of data collection (mail, phone, and interview).
The results are also partly consistent with those reported by Hoopman, Terwee, Muller, and Aaronson (2006), where internal consistency reliability was above 0.70 for all dimensions, with the exception of Social Functioning and General Health, among Turkish migrants with cancer in the Netherlands. The results are not consistent with those reported by Pinar (2005), where Cronbach's α ranged between 0.79 and 0.90. However, both studies focused on Turkish patients. It was not possible to directly compare the current study with other studies because no others targeted healthy Turkish and Polish migrants and German natives over the age of 60.
Moreover, the results are inconsistent with a study by Demiral et al. (2006) in which the Vitality and Mental Health scales had a low level of internal consistency, with coefficients of 0.65 and 0.64, respectively. Comparison with these results is limited, as the mean age of the Demiral et al. (2006) study group (a Turkish urban group) was 42.9 ± 14.7 years, a young and relatively healthy population, with only 9.0% of the sample over the age of 65. Demiral et al. (2006) explain this low level of internal consistency as resulting both from a lack of refinement in the questionnaire (which should include a full cultural adaptation and translation process) and from differences reflected by the diversity of the sample.

Factor structure
The results are somewhat different from those reported by Ware et al. (1994), where physical functioning, role physical, bodily pain and general health correlated best with the physical component, while vitality, social functioning, role emotional and mental health correlated best with the mental component. SEM did not support the eight first-order and two second-order factors, respectively, that are the basis for the summary measures of physical and mental health.
Within the national groups (Turkish, Polish, and German), modifications had to be undertaken to run MGCFA. As a result, non-invariance of factor structure was shown between the groups. Physical functioning, role physical, bodily pain, general health and vitality were shown to have a relatively high correlation with physical component, while role emotional and mental health were shown to correlate with mental component. Social functioning was shown, in all groups, to have a low correlation with the mental component.
The lack of good invariance between the groups suggests the need to be cautious when interpreting and comparing the results (mean scores of the Sf-36) of Turkish, Polish and German participants. In terms of the research questions, this means that individuals from different populations may interpret HRQoL in different ways. This may be caused by differences in perceptions, values and behaviours (going hand in hand with migration background and/or cultural background) and by individual preferences, as it is well known that migrants form a heterogeneous research group. This would suggest that models such as the Sf-36 model of Ware et al. (1994) may be of limited use for exploring HRQoL across such groups.
In addition, the differences between the groups may be explained as follows.
They could be caused by linguistic, cultural or translation features. It should be kept in mind that the questionnaire was translated forward and backward, taking conceptual equivalences (Herdman, Fox-Rushby, & Badia, 1997) into account. Turkish participants were classified as Turkish speaking, Polish participants as Polish or German speaking and German natives as German speaking, based on their subjective preference at the interview. In this context, an interview bias could be caused by the translation and adaptation process, in which words, sentences, phrases and questions were changed. These changes could have led to comprehension difficulties among the interviewees.
In addition, it should be kept in mind that the ways that Turkish migrants, Polish migrants and Germans define their health, illness and HRQoL might be different. This could be caused by cultural differences (which would explain the differences between Turkish and Polish migrants). To understand health and HRQoL within different groups is challenging as factors and predictors vary enormously between subjects and groups, as reported by Bowling et al. (2003). Qualitative research methods (like narrative interviews, focus groups, expert interviews etc.) are needed in more detail to understand different concepts of health and HRQoL within groups with different migration and cultural backgrounds.
Although studies have shown that the model of Ware et al. (1994) provides a good structure (Reed, 1998), this study shows that the second-order factorial structure of the Sf-36 differs from the hypothesised structure. This second-order factorial structure could not be supported for all nationality groups within this study. Only the model for Polish women and Polish men had a good fit with the collected data. In any case, the fit of the original model to the collected data was unacceptable. The fit improved after relaxing some of the constraints of the original model (e.g. correlated error variables of mental health and vitality as well as of physical functioning and role physical).
The results led to considerations of whether HRQoL is influenced by more than a Mental and a Physical Component, as there may also be other aspects including political, environmental and cultural aspects which are not included in the original model. In addition, it should be taken into account whether a third component (e.g. general well-being) should be included. This was shown, for example, by the IQOLA Project Group (Keller et al., 1998). This group tried to explore the structure by means of SEM across different countries (including Germany, not Turkey and Poland). They found their data to improve by a single third-order factor, interpreted as general well-being, as there are factor loadings between dimensions and Mental Component, Physical Component and well-being.
We investigated this single third-order factor (results not reported) and concluded that this structure did not lead to any improvement, as it showed worse goodness-of-fit parameters. However, comparisons of the results of the current study with other investigations were not completely possible, because very few studies focus either on the combination of these groups or on a Polish or Turkish translated version of the Sf-36 (Vet, Adèr, Terwee, & Pouwer, 2005). Researchers could take this as an indication of the need for further research to explore the reasons for these results in more detail.
When testing the measurement model with German data, Maurischat and Krüger-Bödecker (2004) conclude that there are interferences and common variance explanations between the physical and psychological component. This can be confirmed by the results of this study. The same authors point out that the Sf-36 model may be tested with or without mixed loadings. In this study, these mixed loadings were taken into account, because they correspond to the model of Ware et al. (1994). Further studies could examine whether and how different mixed loadings lead to different results.

Limitations
Some limitations of this study need to be mentioned: Findings can be generalised only to Turkish and Polish migrants and Germans aged 60 and above and living in selected districts of Hamburg, Germany. In addition, the generalisation of the results is limited to the analysed groups: the migrants have been defined based on their birth country, but migrant status measured by birth country is not the only differential factor in perceived health and HRQoL. Differences between ethnic groups (e.g. cultural differences between Turkish migrants and Polish migrants themselves which derive from elements such as group cohesion, social support and religion) may change the migration experience and the health and HRQoL, too. However, a distinction between ethnic groups would lead to excessively small sample sizes and limited comparisons. The questionnaire has been translated and adapted and all interviewers have been well educated in conducting interviews with the focus on minority groups. However, the possibility cannot be excluded that some of the Polish and Turkish migrants were reluctant to participate; therefore, the interviews were limited to the individuals found in public places and by friends or family members.
Access through familiar institutions and individuals and through the snowball method was useful (see Table 2). These methods could lead to selection bias and differences in results, but they enabled access to interviewees who could not be encountered on the street and/or in public places and who did not speak German. Otherwise, access to migrants in particular would not have been possible. Migrant groups often cannot be reached due to language barriers and mistrust and are consequently not included in investigations. This limited access is reported in different studies (Berens et al., 2015; Razum et al., 2008). In addition, it became apparent from preliminary discussions with members of the Turkish and Polish community in Hamburg that an approach via trusted institutions and people was sensible, as researchers might otherwise have encountered a high level of mistrust and a high refusal rate towards surveys. All interviews were conducted only in the respective mother tongues (Turkish, Polish and German). This ensured access to target groups not speaking German and ensured that they could answer the questions; however, it may have led to selection bias.
When using the Sf-36 questionnaire, different aspects of HRQoL, like constraints in daily life due to pain or emotional problems, were recorded, but the items can be perceived very differently from culture to culture. This leads to difficulties in making comparisons between different cultural groups. Finally, the researcher cannot discount the possibility that, although all the interviewers were native speakers of the relevant language, some questions may have been difficult to understand and led to misconceptions.
The confirmatory factor analysis (CFA) is an instrument to test reflective measurement models and the multi-group CFA allows the simultaneous estimation of a causal model across different groups.
In this study, the CMIN/df, the CFI and the RMSEA were reported for multi-group CFA assessment. Many additional indices are available but, first, not all software programs provide the same indices and, second, reviewers may prefer specific indices (Weston & Gore, 2006). However, this study reported the most common indices. The Chi-square value is the traditional measure for evaluating model fit (Hooper et al., 2008). The CFI takes sample size into account and performs well even with small samples. It was therefore used for this study sample (especially for gender comparisons). The RMSEA is 'one of the most informative fit indices' (Rye, 2014) because it is sensitive to the number of estimated parameters in the model (Hooper et al., 2008). These are the reasons for using these particular indices; other indices might have produced different results.
It still remains an area of controversy and discussion which specific goodness-of-fit measures should be used and also which cut-off points are acceptable. This leads to a lack of standard when reporting model fit and to the inability to compare this study with others. Despite the use of the most common indices, namely CMIN/df, the CFI and the RMSEA, in the current study comparisons with other studies using different indices will be limited.

Conclusion
To answer the research objectives on the reliability of the Sf-36 instrument, it can be concluded both from the literature and from the analysis of the current study data that, although the Sf-36 was well suited to these populations, it needs to be handled cautiously, especially with regard to the adaptation and translation process, when it is used to analyse the experience of minorities with diverse cultural backgrounds.
The results illustrate that a rigorous and complete invariance is not given and some differences are observable between the groups. However, these differences do not necessarily preclude the usefulness of the instrument or valid comparisons between ethnic minority groups, if researchers are aware of such differences and interpret results with appropriate precaution. Additional and larger studies are needed to study the psychometrics and equivalence of the underlying model of the questionnaire when used among elderly ethnic minority groups in Germany.