Residential satisfaction questionnaires: A systematic review

Residential satisfaction is a topic that has been extensively studied in recent decades because it can offer important insights into the quality of the residential environment. However, many inconsistencies and unanswered questions on this topic still persist. Because the understanding of any field of inquiry is importantly affected by the quality of the methodology and measurement instruments employed, this article explores the current state of development and investigation of the psychometric properties of one of the most widely employed methods of measuring residential satisfaction: self-assessment questionnaires that measure satisfaction by assessing satisfaction with specific aspects of the residential environment. A review of representative studies shows a general lack of properly developed and validated questionnaires, lack of sufficient reporting on the origin, development, and psychometric characteristics of the questionnaires employed, and often too little thought and effort invested in developing and validating questionnaires. Such observations are especially important for evaluating the quality of studies and their implications for residential satisfaction, and they are the points where research practice could be improved.


Introduction
Research on residential satisfaction has taken place for decades in disciplines such as planning, geography, sociology, and psychology (Lu, 1999), and has recently gained renewed interest and yielded insightful developments (Dekker et al., 2011; Aigbavboa & Thwala, 2016; Wang & Wang, 2016). Residential satisfaction has been identified as an important component of life satisfaction, wellbeing, and general quality of life (Lu, 1999; Wang & Wang, 2016), and, because it represents a subjective evaluation of the residential environment, it determines the way individuals respond to their environment (Lu, 1999). From a broader perspective, the importance of residential satisfaction research rests in the fact that many housing policies in different parts of the world often include improving residents' satisfaction with their housing environment as one of their main objectives (Wang & Wang, 2016). To achieve these objectives, an understanding of determinants of residential satisfaction is required (Aigbavboa & Thwala, 2016), and, to evaluate whether the objectives of these policies have been met, a good understanding of whether individuals are satisfied with their residential environment is of utmost importance (Wang & Wang, 2016).
To truly understand residential satisfaction, its determinants, and its implications, it first must be adequately measured (Gifford, 2014). In the history of residential satisfaction research, the most common way of quantifying it is through self-assessment questionnaires, which mostly take one of two main approaches (see Pinquart & Burmedi, 2003): either by measuring residential satisfaction with one or more global or general questions about satisfaction with overall or specific level(s) of the residential environment (Lu, 1999; Li & Song, 2009; Dekker et al., 2011) or by assessment through asking respondents about levels of satisfaction with specific aspects or components of the residential environment (Wang & Wang, 2016), usually resulting in a residential satisfaction index of some form.
Even though research on residential satisfaction has been present for a long time, to the best of our knowledge, no systematic review of the questionnaires employed in studies on residential satisfaction has been published yet. This qualitative systematic review article focuses on reviewing the psychometric quality of questionnaires for assessing residential satisfaction, particularly on psychometric evaluation of the questionnaires that assess satisfaction with a collection of aspects of the residential environment.

Literature review
Residential satisfaction is a multidimensional concept that has been defined in several different theories and frameworks (e.g., Amérigo & Aragonés, 1997; Parkes et al., 2002; Shin, 2016). Most commonly it is conceptualized as the perception of how the actual residential environment meets an individual's residential aspirations (Lu, 1999), therefore representing an individual's cognitive responses to the residential environment (Wang & Wang, 2016).
Residential satisfaction can be divided into satisfaction with one's dwelling (housing satisfaction), satisfaction with one's neighbourhood (neighbourhood satisfaction), and general satisfaction with the area (community satisfaction; Pinquart & Burmedi, 2003), which are usually considered separate components of residential satisfaction (Dekker et al., 2011) and are therefore mostly assessed and analysed separately (Aigbavboa & Thwala, 2016). As Buys and Miller (2012) point out, the majority of research on residential satisfaction has focused on only one of these three levels of the residential environment, with satisfaction at the level of neighbourhoods being most focused on, whereas much less is known about satisfaction at the level of dwellings (Aigbavboa & Thwala, 2016). Studies simultaneously assessing more than one of these domains are rare, despite the growing recognition that these domains of residential satisfaction are interrelated and share an overlap of predictors (Parkes et al., 2002). When residential satisfaction is being assessed, individuals implicitly evaluate their current housing situation with regard to more than one level (Galster & Hesser, 1981; Adriaanse, 2007); specifically, interrelatedness is obvious in the assessment of one's housing, which is likely to include its immediate surroundings and even relationships with neighbours (Lu, 1999; Aigbavboa & Thwala, 2016).
There is an extensive body of literature on the conceptualization, measurement, and determinants of residential satisfaction (e.g., Lu, 1999; Dekker et al., 2011; Wang & Wang, 2016). Special interest lies in which aspects of the residential environment predict residents' (global) residential satisfaction (Parkes et al., 2002). Studies to date have revealed some important determinants; namely, housing conditions, neighbourhood characteristics, and household economics (e.g., closeness of neighbourhoods to employment and recreation opportunities, the general appearance of a neighbourhood, the socioeconomic composition of residents, availability of services, etc.; e.g., Wang & Wang, 2016). This question is difficult to address because studies on residential satisfaction vary greatly in many aspects; for example, in the sample characteristics (from nationwide surveys to surveys of individual neighbourhoods) and the range of variables included (Parkes et al., 2002). They often yield contradictory findings on the predictors of residential satisfaction; for example, fear of crime or feelings of safety in some studies proved to be the most important predictors of neighbourhood satisfaction, whereas other studies found that this is a less important predictor in comparison to environmental variables such as sunlight and noise (Parkes et al., 2002), and a similar situation exists regarding crowdedness or population density in the neighbourhood (Wang & Wang, 2016).
There are many inconsistencies in empirical findings on residential satisfaction and, as Lu (1999) points out, at least part of them may be attributable to frequently different definitions of a key residential satisfaction variable among the studies, which, along with differences in model specification and the type of data collected, prevent a direct comparison of studies' results. Therefore, "the way residential satisfaction is measured is important in empirical analysis because it directly influences the findings" (Lu, 1999: 270).
Two main approaches to measuring residential satisfaction are assessment of general satisfaction and assessment of satisfaction with various aspects of the residential environment (Lu, 1999; Dekker et al., 2011; Wang & Wang, 2016). Although the majority of studies on residential satisfaction employ the approach of single-item indicators (115 studies versus forty-seven studies that employed sum-scales, as reported in a meta-analysis by Pinquart & Burmedi, 2003), measuring residential satisfaction might not be as simple as asking respondents whether or not they like their apartment or neighbourhood. It is known that the satisfaction of a resident can vary depending on many factors; for instance, the standard of comparison individuals have in mind when responding to questions on residential satisfaction and various aspects of the environment (e.g., based on the way these are used by the resident; Gifford, 2014; see also Jansen, 2013, 2014, for a discussion on why residential satisfaction usually proves to be relatively high across various conditions). Therefore, it is unlikely that a single question about satisfaction with the residential environment could be an accurate measure of what residents really think about their environment (Parkes et al., 2002).
The second approach, measuring responses to multiple items addressing various components of the environment, most commonly involves preparing a list of attributes of the residential environment that are potentially desirable or deemed important for residents and residential satisfaction, and asking respondents to express their satisfaction with them or (dis)agreement with statements reflecting attitudes toward these attributes, usually on a Likert-type scale. These ratings are then summed up in an additive index to represent an aggregate measure of residential satisfaction (Lu, 1999; Adriaanse, 2007). Two of the main pitfalls of this type of measurement are the arbitrariness with which additive measures are often constructed and the likelihood that individuals attach different levels of importance to various attributes of their housing environment, which is very challenging to understand well and makes the construction of a reliable measure very difficult (Lu, 1999). With this in mind, some researchers advocate against using this type of measure and claim that an overall measure is a better choice because it avoids these complications altogether (e.g., Lu, 1999). Although following their advice might be justified, it must be acknowledged that residents' opinions about the specific aspects of their environment might offer important insights; for example, they have the potential to reveal which neighbourhood characteristics have a positive/negative and greater/lesser impact on overall residential satisfaction (Adriaanse, 2007). Therefore, it is a great limitation of a study if residential satisfaction is assessed only through a general question without also focusing on specific attributes of the residential environment (Buys & Miller, 2012), provided that the research is not based on mere lists of physical and social characteristics arbitrarily defined by the researcher. Such arbitrary lists are nevertheless common: selection criteria for the attributes included are often absent because only a minority of studies have explored the relationship between satisfaction with specific attributes and overall assessments of residential satisfaction (Adriaanse, 2007).
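The contrast between an unweighted additive index and an importance-weighted score can be illustrated with a minimal sketch. The aspect names, ratings, and importance weights below are hypothetical and do not come from any of the studies reviewed; the weighting scheme is only one of several possible choices.

```python
from statistics import mean

# Hypothetical 5-point Likert ratings (1 = very dissatisfied, 5 = very satisfied)
# given by one respondent; the aspect names are illustrative only.
ratings = {"noise": 2, "safety": 4, "green_space": 5, "dwelling_size": 3}

# Hypothetical importance weights the same respondent attaches to each aspect.
importance = {"noise": 5, "safety": 4, "green_space": 1, "dwelling_size": 2}


def additive_index(ratings):
    """Unweighted additive index: the plain sum of the aspect ratings."""
    return sum(ratings.values())


def weighted_mean_index(ratings, importance):
    """Importance-weighted mean rating, expressed on the original 1-5 scale."""
    total_weight = sum(importance[aspect] for aspect in ratings)
    weighted_sum = sum(ratings[aspect] * importance[aspect] for aspect in ratings)
    return weighted_sum / total_weight


unweighted_mean = mean(ratings.values())
weighted_mean = weighted_mean_index(ratings, importance)
print(additive_index(ratings), unweighted_mean, round(weighted_mean, 2))
```

In this made-up case the respondent is least satisfied with the aspect rated as most important, so the weighted score falls below the unweighted mean. This is the kind of difference that a simple sum cannot express.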
Drawing attention to contradictions among findings in residential satisfaction research and the many questions to be answered regarding assessment, the fundamental question of appropriateness and quality of measures used in residential satisfaction research arises. The importance of this question is reflected in the quote from Furr and Bacharach (2013: 2): "If something is not measured or is not measured well, then it cannot be studied with any scientific validity. If you wish to interpret your research findings in a meaningful and accurate manner, then you must evaluate critically the data that you have collected in your research."

Research questions
The main objective of this review is to evaluate the development and psychometric properties of residential satisfaction questionnaires that measure satisfaction by assessing opinions on specific aspects of the residential environment and are focused on residential satisfaction with housing and neighbourhood because these represent the most personal and immediate home environments (Pinquart & Burmedi, 2003). We focus on the current state of questionnaires used to study residential satisfaction and explore the options for improving existing practices, rather than providing a detailed discussion of attaining a psychometric standard for each of the very heterogeneous group of questionnaires.
Two specific research questions were formulated:
1. What kind of questionnaires are employed in residential satisfaction research? Do researchers use already existing scales, adapt them from some other study or questionnaire, or do they develop them for the studies in question?
2. What procedures have been employed to assess the psychometric properties (generalizability, internal structure, and external validity) of the questionnaires used?

Method
Following the research questions, inclusion and exclusion criteria were formed for the studies included in the review. For the study to be included, the following criteria had to be met:
• An empirical, quantitative study focusing on residential, housing, dwelling, and/or neighbourhood satisfaction (excluding community satisfaction);
• Main focus on at least one of the following levels of the residential environment: dwelling unit, building or building complex, or neighbourhood (excluding cities and wider regions);
• Focus on apartment buildings at the level of dwellings and buildings (excluding studies that explicitly focused on single-family homes and excluding student dormitories, retirement homes, etc.);
• Assessment through a self-report questionnaire;
• Assessment of residential satisfaction through multiple aspects of the residential environment (excluding studies with only general questions about satisfaction with the residential environment);
• Adult population, excluding psychiatric patients and students.
A search of potential studies for inclusion was performed through the University of Ljubljana's digital library database from 28 August to 8 September 2017. The disciplines selected for the search were architecture, psychology, and environmental sciences, which resulted in the following databases/content providers included: PsychINFO, J-STAGE, Scopus, Complementary Index, Academic Search Complete, Science Citation Index, Social Sciences Citation Index, Supplemental Index, MEDLINE, GreenFILE, ScienceDirect, JSTOR Journals, ERIC, and PsychARTICLES. The search was limited to the following source types: academic journals, dissertations/theses, conference materials, eBooks, and reviews, with no limit on the publication date. The search terms included residential satisfaction, housing satisfaction, dwelling satisfaction, and neighbourhood satisfaction alone and in combination with the terms scale, measurement, and questionnaire in a Boolean/Phrase search mode. Additional studies were identified through full-text reading of selected articles, as indicated in the flow diagram of the study selection process (Figure 1).

Figure 1: Flow diagram of the study selection process. Electronic database search: potentially relevant studies identified and screened for retrieval (n = 144); studies excluded if not matching criteria after reading the abstract, with unclear cases retained for full-text reading (n = 74); potentially relevant studies for full-text reading (n = 50). Full-text reading of selected studies: additional potential studies identified (n = 65); studies excluded if not matching criteria after reading the abstract (n = 34); potentially relevant studies for full-text reading (n = 31). Combined: studies selected for full-text reading (n = 81); studies excluded if not matching criteria after full-text reading (n = 27); final number of studies included (n = 54).

The selection process resulted in fifty-four studies included in the review, with forty-seven original scales on residential satisfaction complying with the criteria presented. No studies were excluded based on the quality of the questionnaire employed or the study itself because one of the main points of this review is to present the most comprehensive picture possible of such questionnaires.
From the studies selected, the following data were extracted:
• Origin of the questionnaire used (already existing questionnaire, questionnaire developed for the study, questionnaire adapted from another questionnaire or system, etc.);
• Country where the study was carried out;
• Sample size;
• Level(s) of the residential environment;
• Number of items/aspects;
• Item form and scale type;
• Psychometric characteristics of the questionnaire reported and procedures employed (examination of the internal structure of the questionnaire, reliability of the questionnaire and its subscales, validation procedures).

Results and discussion
The fifty-four studies included in the review (see Table 1) employed forty-seven different scales or questionnaires on residential satisfaction. The number of questionnaires reviewed is not the same as the number of studies included because the main purpose of this review was to evaluate all available studies on residential satisfaction and the questionnaires they employed that complied with the selected criteria. This decision is further supported by the fact that the process of questionnaire validation is lengthy (John & Soto, 2009), often reported in more than one study. As noted in Table 1, despite our best effort, some of the studies were not available, and therefore the list of studies included is not perfect, but we conclude that it is sufficient to represent the general state of research practice in this field.

Questionnaires in residential satisfaction research
Following the first research question, we examined the type of questionnaires employed in the studies reviewed. The largest group of studies (n = 19; see Table 1) did not report where the questionnaire selected for the study originated; that is, there was no reference for the questionnaire employed, nor was information on the development of the questionnaire provided. The second-largest category (n = 18) consisted of studies with questionnaires developed specifically for the study in question. A smaller number of studies employed already existing scales (n = 8) or adapted them from some other study, questionnaire, or system (n = 9). Most of the questionnaires were employed in only one of the studies reviewed, except for the following five cases: 1) the Scale of habitability used by Phillips et al. (2005) and by Fernández-Portero et al. (2017), 2) the questionnaire used by Jansen (2013, 2014), 3) the questionnaire used by Leslie and Cerin (2008) and by Lee et al. (2017), 4) the questionnaire used by Kellekci and Berkoz (2006) and by Berkoz et al. (2009), where we assume that the same data set was employed in both studies, and 5) the questionnaire used by Ibem and Aduwo (2013) and by Ibem and Amole (2013a, 2013b), where there is no reference for the questionnaire employed in any of the studies, but we assumed that the same scale or a slight variation of it was employed in all four of them based on the reported questionnaire items and characteristics of the questionnaire.
The first thing to note about the studies reviewed is the lack of sufficient reporting on the questionnaires used. For nineteen studies, there was no clear information on the origin of the questionnaire, and therefore very limited information was available to the reader regarding the questionnaire characteristics and development, which are needed for making informed judgements about the quality of the questionnaire employed, the study methodology, and the general quality of the study implications.
The next interesting observation is that, in eighteen out of fifty-four studies, the authors decided to develop a questionnaire for the study in question, usually with very limited reporting on the rationale for making this kind of decision and on the development of the questionnaire, which supports the observation by Adriaanse (2007), who found that residential satisfaction research is often based on lists of characteristics of the residential environment that are arbitrarily defined by the researcher. Although the development of a new questionnaire for a specific study is not incorrect per se, the question arises about the rationality of this decision. In general, the development of a psychometrically sound questionnaire can take several years and require that many studies be conducted, which results in a questionnaire of known characteristics, based on which (among other things) judgements of the study's quality can be made. Although there is a lack of psychometrically sound questionnaires on residential satisfaction, if an ad hoc questionnaire is developed for each study, it is likely that the psychometric characteristics are not assessed thoroughly enough, therefore calling into question the implications of the studies conducted.

Table 1 notes: 0 Full papers were not available to the authors of this review. 1 If the initial selection of items was reduced for the final form of the questionnaire and/or for the final analysis, the number of the final selection is presented. 2 The same questionnaire is reported, but characteristics of the questionnaires differ. 3,5 The questionnaire employed is probably the same based on the characteristics reported, but there is no direct reference. 4,9 The same questionnaire is used. 6 Both the scale and its abbreviated version are reported in the same study. 7 The same questionnaire base is used, but reported characteristics of the questionnaires differ. 8 Thirty-six aspects were included in the Saga City study and thirty in the Kitakyushu City study. 10 Given the results reported in the paper, we assume that the data set was the same for both studies. 11 The same questionnaire is used, but psychometric analyses differ.

Table 2 (recovered cell content): 1. sense of community; 2. satisfaction with shared outdoor space; 3. satisfaction with nearby nature; 4. concern about local density; 5. concern about regional density.

Table 2 notes: This table contains information on questionnaires' internal structure, reliability, and validity reported in the studies reviewed, and therefore studies that did not provide any of this information are excluded from the table. 1 Analysis of internal structure of the questionnaire: EFA = exploratory factor analysis, FA = factor analysis (when not reported whether EFA or confirmatory factor analysis was employed), PCA = principal component analysis.
We then reviewed the questionnaires with regard to their basic characteristics. Based on the content of items (rather than on the information reported by the authors of the studies), the largest group of questionnaires focused on satisfaction with the neighbourhood (n = 18) and slightly fewer on all three levels of the residential environment included in this review (dwelling unit, building, and neighbourhood; n = 16). Only a small number of questionnaires focused on the dwelling and neighbourhood level (n = 8), building and neighbourhood level (n = 3), dwelling and building level (n = 2), and dwelling level (n = 1), which contradicts previous studies that found the majority of research to be focused only on one of the levels of the residential environment (e.g., Buys & Miller, 2012), but is in line with the observation by Aigbavboa and Thwala (2016) that the neighbourhood level is most focused on when studying residential satisfaction.
The questionnaires reviewed included from three to 107 aspects of the residential environment, with an average of 28.6 aspects. These aspects were presented in two forms of items, namely lists of aspects (n = 29) and sentence-like items (n = 14), and the item form was not reported for four questionnaires (nor were the items themselves). The most common response format was a five-point Likert scale (n = 21).

Procedures employed to assess psychometric properties of the questionnaires
In the theoretical model of scale development proposed by Loevinger (1957) and elaborated by Clark and Watson (1995), three components of construct validity are important: substantive and structural validity, which together refer to the measure's internal validity, and external validity. Substantive validity focuses on the critical point in the development of any scale because it refers to the theoretical conceptualization of what one wishes to measure and the development of items for potential inclusion in the measure, but it is not the primary focus of this review. Because several conceptualizations of residential satisfaction exist with different implications for measurement attached to them, and because the studies reviewed represented various forms and levels of detail that they include in reporting on the development of the questionnaire items from their conceptualizations, extensive separate review(s) are necessary to fully evaluate this process. Therefore, in the second part of the review process, we focused on structural validity with reliability and processes to understand the structure of the questionnaires, and on the external validity processes employed (see Table 2) because these also are the fundamental concepts that help evaluate the quality of measures (John & Soto, 2009).

Generalizability
Generalizability refers to the degree to which inferences from our observation can be made with regard to other items, samples, measures, and so on, which is one of the fundamental concerns of empirical science. Assessment of generalizability is needed in questionnaire validation because measurements for which evidence of generalizability can be provided are much more useful in comparison to those for which generalizations cannot be made (John & Soto, 2009). In this review, the majority of questionnaires (n = 30) fall into the latter category because no procedure for assessing reliability was reported in any of the studies reviewed.
The notion of generalizability includes traditionally examined concepts of both reliability and criterion validity, which are discussed later in this review. Reliability assessment plays an important role in the psychometric evaluation of a questionnaire because it refers to the consistency of a measurement procedure and its indices imply the extent to which the scores obtained by measurement are reproducible. The characteristics of a participant, testing situation, questionnaire, and experimenter can all introduce measurement error, and investigation of the reliability of the questionnaire offers insight into the amount of this error and provides cues for decisions about whether the amount of this error is still tolerable given the goals of the research. Following generalizability theory (John & Soto, 2009), we are concerned with reliability because of the desire to generalize from one observation to some other class of observations, be it to other items (within the questionnaire), occasions (e.g., satisfaction with a neighbourhood at two points in time), or raters (e.g., when assessing how similar the importance ratings are for various aspects of the environment across residents). It can be argued that, for any questionnaire included in this review, at least one of these aspects would be of interest to researchers and readers.
Depending on the kind of observation we want to generalize, three types of procedures and study designs are typical: internal consistency procedures (items), re-test designs (occasions), and interrater agreement designs (raters; John & Soto, 2009). Among the questionnaires for which procedures of assessing reliability were employed (seventeen questionnaires in twenty studies), the most prominently reported coefficient used was (only) Cronbach's alpha (n = 18). One study also reported test-retest reliability along with Cronbach's alpha value, and another study reported only the value but not the type of coefficient employed. In general, researchers were concerned with generalizability across items because Cronbach's alpha is the most widely used coefficient of internal consistency (John & Soto, 2009; Bonnet & Wright, 2015; Cho & Kim, 2015). For only one questionnaire (Lee et al., 2017), researchers additionally reported correlations between participants' scores at two points in time, expanding the focus of reliability assessment to occasions, and therefore potentially providing more evidence for the generalizability of the inferences based on the measure in question.
Because Cronbach's alpha was the procedure of choice employed in the studies reviewed, it is important to note that Cronbach's alpha should not be an automatic choice. It is an accurate measure of reliability when the test items are approximately essentially tau-equivalent, which implies that they measure a single factor, and when the error scores of the items are uncorrelated. Because the essential tau-equivalence in particular is rarely met in practice, it is recommended that this assumption be examined beforehand (Cortina, 1993; Cho & Kim, 2015), which was not (sufficiently) evident in the studies reviewed. Following general practice, the studies reported only the sample value of Cronbach's alpha coefficient, which, with some exceptions, was at the acceptable level of .80 or .90 (Nunnally & Bernstein, 1994; see Table 2). However, as suggested by Bonnet and Wright (2015), this is not an entirely appropriate approach, especially for small samples (e.g., as in Potter & Cantarero, 2006; Rioux & Werner, 2011; Dinç et al., 2014; Ibem & Amole, 2013a), because "the sample value of Cronbach's alpha contains sampling error of unknown directions and unknown magnitude" (Bonnet & Wright, 2015: 4). They suggest that confidence intervals for the population value of Cronbach's alpha should also be reported, which is lacking in the studies reviewed.
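To make the point about sampling error concrete, the sketch below computes the sample value of Cronbach's alpha and attaches an interval to it. It is only an illustration under stated assumptions: the response data are invented, and the interval is a percentile bootstrap, a generic resampling technique swapped in here for simplicity, not the analytic interval proposed by Bonnet and Wright (2015).

```python
import random
from statistics import variance


def cronbach_alpha(rows):
    """Sample Cronbach's alpha; rows is a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def bootstrap_ci(rows, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for alpha (an assumption of this sketch)."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        # Resample respondents (whole rows) with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(cronbach_alpha(sample))
        except ZeroDivisionError:
            # Degenerate resample with zero total-score variance; skip it.
            continue
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * len(estimates))]
    upper = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lower, upper


# Hypothetical data: 8 respondents rating 4 aspects on a 5-point scale.
rows = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
]

alpha = cronbach_alpha(rows)
lo, hi = bootstrap_ci(rows)
print(f"alpha = {alpha:.3f}, 95% bootstrap interval = [{lo:.3f}, {hi:.3f}]")
```

With only eight respondents the interval is expected to be wide, which is the practical reason for reporting it alongside the point estimate in small-sample studies.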

Structural validity
For the studies reviewed, it is interesting to note that some of the authors (Schwirian & Schwirian, 1993; Potter & Cantarero, 2006; Xue et al., 2016; Lee et al., 2017) report coefficients of internal consistency but do not report any attempt to assess the dimensionality of the measure. Such an assessment is important because coefficient alpha does not allow inferences on the dimensionality of a measure (John & Soto, 2009), even though it is sometimes interpreted as if it did. If a test demonstrates an acceptable level of alpha, then the error associated with the use of different items is relatively small. However, all that can be inferred from this information is that the test measures something consistently, but exactly what it measures is still unknown, and therefore to establish the meaning of a measure some form of construct validation is necessary (Cortina, 1993), which also includes assessing the internal structure of a test (Furr & Bacharach, 2013).
The internal structure of a test is a case of structural validity that requires evidence about the structure of the items being consistent with the hypothesized internal structure (John & Soto, 2009). It refers to the dimensionality of a questionnaire; that is, whether the questionnaire is intended to measure one or more physical or psychological attribute(s) of an object or person (Furr & Bacharach, 2013). The understanding of the type of questionnaire being developed or used in terms of its dimensionality is of utmost importance because different types of tests have different properties, which have important implications for scoring, evaluation, and use regarding the implications they provide. To evaluate a questionnaire's internal structure, a variety of statistical procedures are available (e.g., factor analysis, cluster analysis, multidimensional scaling; Furr & Bacharach, 2013). Among the questionnaires reviewed (see Table 2), for only less than half of them (n = 23 reported in twenty-five studies) some procedure for assessing the internal structure was reported in at least one of the studies. For most questionnaires (n = 20), principal component analysis was conducted, and other methods were less prominent: explor-atory factor analysis (n = 3) and not-further-specified factor analysis (n = 1). The findings of the procedures carried out to assess the internal structure of the questionnaires reviewed are beyond the scope of this review, but attention needs to be drawn to the fact that for twenty-four questionnaires reviewed there was no report of internal structure assessment procedures in any of the studies included (n = 25). 
In at least some of these studies, the intention may have been only to assess satisfaction with specific, deliberately chosen attributes of the residential environment, without proceeding to total scores representing residential satisfaction or to more complex analyses that would add to the understanding of the data. However, this was not the case for many of them.
When analysing the internal structure of a questionnaire, several questions have to be addressed. How many dimensions do the test items reflect? If there is more than one, are the dimensions correlated with each other? What exactly are those dimensions or, more specifically, what psychological, physical, or other kinds of attributes do they correspond to? These questions are important for three reasons: if there is more than one dimension, each dimension might be assessed by a separate subscale requiring a separate psychometric analysis; the associations between the dimensions have implications for the meaning of a "total score" if one is calculated; and, finally, when it comes to interpretation, the meaning of the score must be understood (Furr & Bacharach, 2013). Because many of the studies formed additive indices to represent satisfaction with the residential environment at the selected level(s), it would also be wise for such studies to explore dimensionality. Doing so could support more informed conclusions from the analyses and, in general, help reduce the arbitrariness with which these additive measures are too often constructed, as already noted by Lu (1999) and Adriaanse (2007).
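A first look at the question of how many dimensions the items reflect is often taken by inspecting the eigenvalues of the item correlation matrix, as in principal component analysis. The sketch below does this for a hypothetical twelve-item questionnaire with two clusters of six items (correlations of .70 within and .20 between clusters; all values are assumed for illustration). The pure-Python power iteration with deflation and the helper names are illustrative choices only; in practice a statistical package would be used, and the eigenvalue-greater-than-one count is a rough heuristic rather than a recommendation drawn from the studies reviewed.

```python
import math


def block_correlation_matrix(n_blocks, items_per_block, r_within, r_between):
    """Build a hypothetical item correlation matrix with clustered items."""
    k = n_blocks * items_per_block
    return [
        [
            1.0 if i == j
            else r_within if i // items_per_block == j // items_per_block
            else r_between
            for j in range(k)
        ]
        for i in range(k)
    ]


def largest_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and eigenvector of a symmetric matrix."""
    n = len(matrix)
    vector = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-degenerate start
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * p for v, p in zip(vector, product))  # Rayleigh quotient
    return eigenvalue, vector


def leading_eigenvalues(matrix, how_many):
    """Largest eigenvalues, found one at a time by deflating the matrix."""
    work = [row[:] for row in matrix]
    n = len(work)
    values = []
    for _ in range(how_many):
        value, vector = largest_eigenpair(work)
        values.append(value)
        # Remove the component just found so the next-largest one dominates.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vector[i] * vector[j]
    return values


items = block_correlation_matrix(2, 6, 0.7, 0.2)
eigenvalues = leading_eigenvalues(items, 3)
n_large = sum(1 for value in eigenvalues if value > 1.0)
print([round(value, 2) for value in eigenvalues], n_large)
```

For this assumed matrix the first two eigenvalues are 5.7 and 3.3 and all remaining ones are 0.3, which points to two correlated dimensions. In that situation a single additive index mixes two attributes, and separate subscale scores may be easier to interpret.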

External validity
The external validity of a measure is what is usually understood as validity in general: it refers to evidence that the measure relates to other measures and to non-test criteria in ways that would be theoretically expected. One of the most common ways to assess external validity is criterion correlation, where the question is whether measurement scores correlate with the criteria chosen (John & Soto, 2009). This was the method of choice in the nineteen studies (for seventeen questionnaires) that reported information on the validity procedures applied. The most common form of validation procedure reported was to predict or correlate general satisfaction with the chosen level of the residential environment from scores on individual dimensions or aspects included in the questionnaire. [...] Mridha, 2015; Xue et al., 2016), followed by examining correlation coefficients (Buys & Miller, 2012; Oshio & Urakawa, 2012; McCrea et al., 2014) and structural equation modelling (Fernández-Portero et al., 2017). An interesting procedure was reported by Adriaanse (2007): in validating the RESS scale and its abbreviated version, the author assessed whether scores on the residential satisfaction scale were related as anticipated to participants' neighbourhoods.
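A criterion correlation of the kind described above can be sketched as follows. The six respondents, their ratings, and the variable names are entirely hypothetical and serve only to show the computation: a scale score is formed as the mean of four aspect-satisfaction items and is then correlated with a single-item rating of general satisfaction, which plays the role of the criterion.

```python
import math

# Hypothetical 1-5 ratings of four aspects of the residential environment,
# one row per respondent (illustrative values, not data from any study).
aspect_ratings = [
    [4, 5, 4, 5],
    [2, 1, 2, 3],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
    [3, 4, 3, 4],
]
# Hypothetical single-item general satisfaction rating for the same respondents.
general_satisfaction = [5, 2, 3, 4, 1, 4]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Scale score: mean of the aspect items for each respondent.
scale_scores = [sum(row) / len(row) for row in aspect_ratings]
r = pearson_r(scale_scores, general_satisfaction)
print(f"criterion correlation r = {r:.2f}")
```

A substantial positive correlation would be in line with theoretical expectations, although a single coefficient from one sample is only one piece of validity evidence.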
Structural and external validity are only two of the directions that can be explored in the validation process. The decision to limit the scope of this review to these two directions was guided by the studies reviewed and by what they reported about the effort put into validating the questionnaires. The classic definition of validity refers to the degree to which a test measures what it is supposed to measure and includes construct, criterion, and content validity, whereas the more contemporary perspective reaches beyond this scope by stating that there must be underlying theory and empirical evidence supporting an interpretation of test scores (Furr & Bacharach, 2013). There is no single statistic that can be reported to prove that a measurement procedure is valid. Validation of any measurement is an ongoing process (John & Soto, 2009), in which every further step has the potential to provide more information and evidence about the questionnaire at hand and to show that its interpretations can be trusted in specific situations and contexts of use.

Conclusion
This review of studies on residential satisfaction that use questionnaires measuring responses to multiple items on satisfaction with various aspects of the environment leads to the conclusion that residential satisfaction is relatively frequently investigated through this approach, but that in most cases too little thought and effort are put into developing and validating the questionnaires employed, at least as far as can be observed from the information reported in the studies reviewed. Questionnaires or scales rely on measurement models that, like most models, are simplifications of the concept and situation investigated. "Although they should represent the best possible approximation of the phenomena of interest, we must expect them, like all 'working models,' to be eventually proven wrong and to be superseded by better models. For this reason, measurement models must be specified explicitly so that they can be evaluated, disconfirmed, and improved" (John & Soto, 2009: 462). However, as Clark and Watson (1995) observed, the complexity of these concepts is still not fully appreciated by researchers, and their statement also holds true for residential satisfaction. The lack of validation procedures, and of reporting on them, makes it unduly difficult to assess the quality of studies that employ these questionnaires. The lack of properly developed and psychometrically tested questionnaires might also explain why researchers so often decide to form their own measures: there are not many readily available questionnaires for use in this research. As a result, the inadequacy of residential satisfaction questionnaires persists.
Based on this review, a few recommendations for increasing the quality of research on residential satisfaction can be made. First of all, researchers (and reviewers) should make sure that clear information about the questionnaires employed is provided in all publications on the topic. This information should include the origin of the questionnaire and its basic characteristics (type of questionnaire, response scale, example of a questionnaire item, internal consistency coefficients, etc.). Even though there are not many readily available validated questionnaires on this topic, researchers should invest more effort in using questionnaires that have already been developed. Where this is not possible and a questionnaire is still needed, development of a new questionnaire should be carefully planned. It should be based on a thorough assessment of the theoretical foundation of the questionnaire, including an examination of the criteria and justifications for including specific aspects of the residential environment. Because these criteria and justifications are lacking in the field, this also represents an opportunity for more extensive research. Furthermore, when developing a new questionnaire, items should be carefully formulated, and then the generalizability and the structural and external validity of the questionnaire should be assessed. Once this process has taken place, an effort should be made to publish it for at least two reasons: first, to inform other researchers of the existence of a questionnaire that could potentially be helpful to them, and, second, to make empirical research employing the questionnaire more transparent. The same information should be provided for translations of existing questionnaires. In our opinion, by following these recommendations researchers can improve their work and make important contributions to the study of residential satisfaction.
Gregor Sočan University of Ljubljana, Faculty of Arts, Department of Psychology, Ljubljana, Slovenia E-mail: gregor.socan@ff.uni-lj.si