Evaluating assumptions of scales for subjective assessment of thermal environments – Do laypersons perceive them the way, we researchers believe?

Abstract
People's subjective response to any thermal environment is commonly investigated by using rating scales describing the degree of thermal sensation, comfort, and acceptability. Subsequent analyses of results collected in this way rely on the assumption that specific distances between verbal anchors placed on the scale exist and that relationships between verbal anchors from different dimensions that are assessed (e.g. thermal sensation and comfort) do not change. Another inherent assumption is that such scales are independent of the context in which they are used (climate zone, season, etc.). Despite their use worldwide, there is indication that contextual differences influence the way the scales are perceived and therefore question the reliability of the scales' interpretation. To address this issue, a large international collaborative questionnaire study was conducted in 26 countries, using 21 different languages, which led to a dataset of 8225 questionnaires. Results, analysed by means of robust statistical techniques, revealed that only a subset of the responses are in accordance with the mentioned assumptions. Significant differences appeared between groups of participants in their perception of the scales, both in relation to distances of the anchors and relationships between scales. It was also found that respondents' interpretations of scales changed with contextual factors, such as climate, season, and language. These findings highlight the need to carefully consider context-dependent factors in interpreting and reporting results from thermal comfort studies or post-occupancy evaluations, as well as to revisit the use of rating scales and the analysis methods used in thermal comfort studies to improve their reliability.

Introduction
The first part of the widely used and often cited definition of thermal comfort states that "thermal comfort is the condition of mind that expresses satisfaction with the thermal environment and is assessed by subjective evaluation" [1]. Despite the apparent simplicity and elegance of this definition, determining and providing such conditions is a complex and partly unresolved task.
As stated in the second part of the definition, thermal comfort "is assessed by subjective evaluation" [1]. Rating scales are widely used to collect such subjective evaluations of thermal conditions in built environments. Most commonly, thermal sensation is assessed to determine whether a specific thermal condition can be considered comfortable or not [1, 2]. The most prominent scale used for the assessment of thermal sensation is the ASHRAE 7-point scale, which consists of seven verbal anchors: "cold", "cool", "slightly cool", "neutral", "slightly warm", "warm", and "hot". The objective is to describe a one-dimensional relationship between the physical parameters of the indoor environment (such as air temperature, mean radiant temperature, air velocity, and relative humidity), personal parameters (such as activity level and clothing insulation), and subjective thermal sensation [3]. At the same time, ISO 10551 [4] suggests using one or more dimensions for the assessment of thermal perception depending on the subject of the examination. Dimensions mentioned in ISO 10551 are thermal sensation (from "cold" to "hot"), affective aspects (the level of discomfort from "comfortable" to "very uncomfortable"), thermal preference (from "colder" to "warmer"), personal acceptance ("generally acceptable", "generally unacceptable"), and tolerance of the indoor environment (5 points from "perfectly tolerable" to "intolerable"). Regarding data analysis, ISO 10551 [4] gives guidance on both the analysis of thermal sensation votes obtained at the ordinal measurement level and the determination of the percentage of dissatisfied from thermal sensation votes obtained from relatively small samples of respondents. These guidelines seem to be rarely applied in practice.
Several explicit and implicit assumptions underlie the usage of these rating scales. Three of these assumptions, relevant for rating scales applied in thermal comfort research, are briefly introduced in the background section and further explored in light of the results in the discussion section. They relate to: 1) the distances between individual verbal anchors of a scale, 2) the relationship between verbal anchors from different dimensions of rating scales, and 3) the independence of the scale interpretation from the context in which the scale is used (climate zone, season, etc.).
This paper reports the results of a large-scale international collaborative questionnaire study, whose main objectives were: a) to review the validity of some of these assumptions related to scales for subjective assessments of thermal environments; and b) to investigate possible differences in the interpretation of such scales due to the context (e.g. climate or season).
Beyond the scope of this introduction and paper are discussions related to the type of scales (ranging from binary outcome, verbal description, multi-point scales, to visual analogue scales) or the number of anchors used (see e.g. [5][6][7][8] for further discussions of this topic).

Assumptions related to distances between verbal anchors
The common approach for assessing thermal sensation indoors assumes that the distances between individual verbal anchors are equal. For example, the distance between "cold" and "cool" is assumed to be equal to that between "cool" and "slightly cool". This assumption underlies both approaches to thermal comfort (heat balance and adaptive comfort model) presented in ASHRAE 55-2017 [1]. However, recent research has questioned this assumption [6, 9-12] and hence the applicability of widely applied analyses (e.g. linear regression). For example, Fuchs et al. [12] analysed the equidistance assumption based on a data set from 63 participants by latent class regression (LCR). Their analyses revealed the existence of subgroups whose responses vary in the extent of equidistance. Likewise, Al-Khatri and Gadi [13] applied the successive categories method to investigate this assumption of equidistance and the assumption of coincidence between the middle category's centre and the centre of the thermal continuum for translated versions of the ASHRAE, Bedford, and Nicol scales, three different scales often used in surveys. Their findings revealed irregular widths of categories and shifts of central categories from the centre of the thermal continuum.
The assumption of equidistance combined with the assumption of coincidence between the centre of the middle category and the centre of the thermal optimum on the thermal sensation scale implies that a complete symmetry of the scale's categories can exist.
Thermal comfort scales used so far have different representations: some are symmetrical, such as the thermal sensation scale [14] , while others are presented like an intensity scale, having only one verbal anchor for the comfortable category, but more than one verbal anchor for different extents of feeling uncomfortable [4] . Although indicating an ordinal scale type, for which, based on its general definition, the distance between naturally ordered categories is unknown, the thermal comfort scale is often interpreted with equidistant verbal anchors.
Commonly used comfort acceptance scales are dichotomous: "generally (not) acceptable" [4]. In recent years, continuous scales have also been developed for thermal indoor environment acceptance [15], comparable to the scales used for indoor air quality assessment [16]. On such scales, the scale's centre is "0", with the adjacent verbal anchors "just (not) acceptable" lying close to, but not exactly at, 0 and the verbal anchors marking the scale's ends being "clearly (not) acceptable".

Assumptions concerning the relationship between different dimensions of rating scales
A classic assumption is that the middle three verbal anchors of the widely applied ASHRAE-scale, i.e. "slightly cool", "neutral", and "slightly warm", represent thermally comfortable or acceptable conditions, i.e. satisfaction. This assumption is the basis for the relationship between predicted percentage dissatisfied (PPD) and the predicted mean vote (PMV) [17] and was used to establish acceptance levels for the adaptive comfort model [18] . This assumption appears to originate from a study by Gagge et al. [19] that found that the subjective perception of "comfort" and "neutral" from one male subject occurred at the same temperature, and that discomfort began to occur at "slightly cool" or "slightly warm". Fanger cites these findings in his formulation of the PPD index [17] , which is subsequently cited by de Dear and Brager [20] in the development of the adaptive comfort model's upper and lower acceptability limits.
Several studies with far greater sample sizes have since shown individual and contextual differences that do not support this assumption [6, 10, 21, 22]. Back in 1998, based on a study with 100 German subjects voting on their thermal sensation and satisfaction, Mayer [23] showed a PMV-PPD curve shifted to the right, i.e. a value between neutral and slightly warm associated with the lowest PPD and the lowest PPD being 15% instead of 5% as given by Fanger [17]. The review by van Hoof [24] presents three more variations of the PMV-PPD relationship based on 1866, 40, and 1200 votes and assigns them to different ventilation types and sample sizes in addition to a large variation in the contextual differences in these studies (laboratory vs. field, Korean, Brazilian, German climatic context). In a study with 140 participants, the difference between comfort and neutrality was statistically proven [11]. In this study, conducted in the winter season with mostly English participants, comfort was shown to equal a preference for warmer conditions and neutrality did not represent the mid-point of the thermal continuum. Likewise, such a differentiation between comfort and neutrality was also revealed in a recent study that investigated Eastern Arabs' interpretation of ASHRAE thermal scale phrases [9], in which thermal sensations on the cold side of the scale were considered as comfortable. Using the ASHRAE Global Thermal Comfort Database II, analyses with 5000 to 90,000 data points showed that the PMV model predicted thermal sensation correctly in only one out of three cases and that the PPD was not able to predict the dissatisfaction rate when PMV was used as input. Cheung et al. [22] showed that the relationships between the observed thermal sensation and the observed percentage of unacceptability depend on climate, ventilation strategy, and building type. The percentage of dissatisfied was usually between 15% and 25% in the neutral zone.

Assumptions related to contextual independence
A third aspect of the use of any thermal perception scale concerns differences in respondents' perception or interpretation of a scale, due either to the background of the individual (climatic or cultural) or to the language and choice of words for the verbal anchors [10, 25]. While there are numerous studies analysing contextual influences on thermal perception, such as the type of building or outdoor conditions, there is hardly any research on their effect on the interpretation of scales.
Research has shown that translation and semantic aspects are meaningful in the assessment of thermal comfort. For instance, a sample of Eastern Arab students translated "cool" to an Arabic semantic that literally means "moderate", "mild", or "neither cool nor warm", which indicates a shift of neutrality to the cool side of the thermal scale [9] . Additionally, this study compared the translated phrases used in some recent Arabic studies of thermal comfort and results suggest that the literal translation, without consideration of the climatic background, results in sensation votes perceived as comfortable, but outside the three central categories.
The literature in the field of semantics of the words describing thermal sensations confirmed the dependence of temperature sensitivity on climate regions [25]. A total of 1141 university students of two different climate regions, Bangladesh and Japan, were investigated using the ASHRAE 11-point scale. The survey reported distinct preferences for levels of comfort and sensitivity to temperature for the two groups, namely "neutral" and "cold" for the first group from Bangladesh, and "cool" and "hot" temperature for the second. The authors suggest that these differences reflect the acclimatization effect in different climates. Although it was emphasized that words describing thermal sensations can be found in each dictionary, their exact semantic meaning does not correspond perfectly to the ASHRAE definitions. This linguistic aspect could cause difficulties for non-English speaking researchers in thermal comfort studies.
The importance of words and their definitions used in thermal comfort studies was also indicated by Damiati et al. [26] . They describe a comfort survey conducted in four different Asian countries. Related to semantics, they mention that in the ASHRAE Japanese translation "cool" and "warm" have positive meanings. To avoid mistakes in indicating discomfort conditions amongst Japanese respondents they chose the 7-point scale based on SHASE (Society of Heating, Air-conditioning and Sanitary Engineering of Japan) with less affective associations.

Relevance and research questions
An increasing amount of evidence, partly explored above, demonstrates that the traditional assumptions are likely incomplete or inadequately understood in the analysis of thermal perception scale data. This is a critical issue because these assumptions are implicit in thermal comfort models incorporated in international standards (e.g. ASHRAE Standard 55-2017 [1], CEN EN 15251-2007 [2], and CEN EN 16798-1 [27]) that are applied in the design and operation of buildings worldwide. The design of passive measures and building concepts to support adaptive actions [28] and the configuration of active HVAC systems rely on these standards, and they inform decisions by designers or engineers on whether comfort criteria are achievable with different design alternatives. With respect to the operation of buildings, there are trends towards personalization of climatic control in buildings [29] and the development of personal comfort models [30][31][32]. New technologies allow collecting data on thermal perception from occupants and utilizing such data for control/optimization of HVAC [33]. In this context, it is important that thermal perception data are collected using appropriate scales and analysed using appropriate statistical methods. A study by Petersen and Pedersen [34] is one example of a promising thermal sensation data polling station, in which any assessment of thermal sensation is mixed with assessment of comfort. The risk is that, through the application of potentially inappropriate standards, buildings are designed or operated based on decisions that either do not meet the needs of the occupants or do not consider the potential for relaxing requirements and therefore may have unintended consequences, e.g. excessive energy consumption or impact on occupants' health and wellbeing.
The main research questions arising from the introduced assumptions and research background are as follows:
1. Related to assumption 1: Are the distances between the verbal anchors of thermal perception scales, i.e. the thermal sensation scale, thermal comfort scale, and thermal acceptance scale, perceived by subjects as equidistant or not?
2. Related to assumption 2: What is the relationship between verbal anchors of the thermal sensation, the thermal comfort, and thermal acceptance scale? In addition, is the traditional assumption correct that the three middle votes of the thermal sensation scale can be considered as comfortable?
3. Related to assumption 3: Is the relationship between verbal anchors of thermal sensation and thermal comfort scale independent of contextual factors, particularly of short-term climatic context (e.g., season), long-term climatic context, and language?

Methods
The methods described below were developed within IEA EBC Annex 69 through several discussion rounds between an international and interdisciplinary group of researchers from the field of thermal comfort, the initial core group. In addition to face-to-face discussions, an online survey amongst these experts was conducted to speed up the communication. The details of these discussions, including the steps described in brief below up to Section 2.3.2, were submitted to and registered with the Open Science Framework as a pre-analysis plan (PAP) [35]. The aforementioned process resulted in the development of a detailed survey questionnaire focused on the composition and interpretation of assessment scales. At the time of submission of the PAP, one application of the questionnaire had been conducted, but the resulting completed questionnaires were securely stored and untouched until submission of the PAP.

Questionnaire and target group
The questionnaire consists of an introductory page, the main part dealing with the scales (2 pages), and a fourth page addressing the respondents' background and current thermal state (see English version in supplementary materials and all language versions in the online repository).
The questionnaire was designed to investigate scales on thermal sensation, thermal comfort, and thermal acceptance. In the main part, instructions prompted respondents to position the verbal anchors on a straight line with a verbal anchor fixed at either end (free-positioning task). A first application of the free-positioning task for the thermal sensation scale was presented by Pitts [11]. The free-positioning task applied in this study is based on previous work, which implemented and established the free-positioning task within a structured interview [6, 36]. During these interviews, the free-positioning task was combined with a think-aloud technique [37], which enabled a direct comparison of participants' drawings and verbalized explanations [38]. Further, Fuchs et al. [36] showed the predictive capacity of results obtained by the free-positioning task for participants' voting behaviour under experimental conditions. Fig. 1 illustrates the five free-positioning tasks, of which three were related to the distance between verbal anchors on the same scale and two to the relationship between (1) thermal sensation and thermal comfort and (2) thermal sensation and thermal acceptance. The questionnaire (see supplementary material) comprised written instructions with examples, unrelated to thermal comfort, demonstrating how to deal with this task. Respondents did not receive any further verbal instructions from the researchers in order to minimize potential biases or differences between applications. Each drawing was followed by the question whether the distances between verbal anchors were intended to be "all equal", "some equal / some not", or "none equal". In addition, respondents were asked to draw a circle representing the area they perceived as comfortable on the straight line for thermal sensation and a circle for the area perceived as acceptable on the line for thermal comfort.
On the fourth page, questions prompted respondents to state their current thermal perception, sex, age group, country and city of residence, previous residence, and origin together with the period they have been residing in the current city of residence.
Additional information collected by the researchers during or after distribution of the questionnaire comprised outdoor conditions acquired from nearby weather stations (either owned by the researchers, available to researchers, or using wunderground.com) and, optionally, a record of the indoor conditions.
The initial core group developed the English version of the questionnaire. Researchers from the core group using languages other than English translated the questionnaire and discussed the translation among experts in the field. The initial English version and the other language versions were piloted by seven independent research groups in six countries (Australia, China, Germany, Korea, Sweden, UK). Each of these language versions was tested with at least seven individuals (laypersons and experts) to ensure that questions were perceived as intended. After these pilot tests, the core group finalized the questionnaire. The group of researchers was extended through a call within the Network for Comfort and Energy Use in Buildings (NCEUB) and personal contacts. Subsequently, individual research groups developed additional language versions following the same protocol as outlined above.
In total, 21 language versions were developed. Verbal anchors of the English version were chosen according to ISO 10551. In other languages, verbal anchors were either taken from ISO 7730, national versions of ISO 10551, EN 15251, ASHRAE 55, already existing language versions, or new translations (see supplementary material for full list and descriptions). Table 1 shows the translations of the verbal anchors used for the thermal sensation scale. Translations of verbal anchors for the thermal comfort and acceptance scales are given in the supplementary materials.
The questionnaires were distributed as paper-and-pencil versions. In each city, the questionnaire had to be distributed at least twice, during two distinct seasons, depending on the local climate. Data had to be collected from a minimum of 100 respondents per city (minimum 50 per season where more than two seasons were covered). Questionnaires were distributed preferably at the end of or, if necessary, during classes, when respondents had been seated for at least 30 minutes.
Respondents were university students, because they have a minimal variance in age/activity level, promoting the focus on cultural differences, and because they are easy to access. In addition, students should not have been acquainted with the concept of thermal comfort in their studies. Each respondent could only participate once (between-subject sample).
Ethics approvals were acquired where institutional or national requirements made it necessary.

Data preparation
The position of the verbal anchors drawn in the free-positioning tasks was quantified by measuring, with a ruler, their distance from the left end of each horizontal line. Researchers participating in this study were advised to print the questionnaire such that the line was exactly 100 mm long. However, there were several cases where the printouts were slightly distorted, i.e. the lines were shorter or longer. Researchers reported the actual length of the line in the printout together with the measured distances. Where a line was distorted, the measured values were retrospectively adjusted by the ratio of the actual length of the line in the printed version to the prescribed length of 100 mm.
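The adjustment for distorted printouts amounts to a proportional rescaling. The following is a minimal Python sketch, assuming that the measured anchor position and the actual printed line length are both available in millimetres; the function and parameter names are illustrative and are not taken from the study's own scripts, which were written in R.

```python
NOMINAL_LINE_MM = 100.0  # prescribed line length in the questionnaire


def rescale_position(measured_mm, actual_line_mm, nominal_mm=NOMINAL_LINE_MM):
    """Map a position measured on a distorted printout onto the nominal line."""
    if actual_line_mm <= 0:
        raise ValueError("actual line length must be positive")
    return measured_mm * nominal_mm / actual_line_mm


# An anchor drawn 49 mm from the left end of a line printed at 98 mm
# corresponds to the midpoint of the nominal 100 mm line.
print(rescale_position(49.0, 98.0))
```

After this step, positions from all printouts are expressed on the same 0 to 100 mm range and can be compared directly.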
Köppen-Geiger (KG) classification was derived for the place of survey (provided by the researcher), and the places of residence, previous residence, and origin (as stated by the respondents). To get the KG class for each combination of city and country, the KG world map (Version March 2017) provided for R [39] was used, which represents a re-analysed KG map [40] .
Translations of verbal anchors for thermal sensation differ in ISO 10551 with respect to the number of adjectives for the verbal anchors of the warm and cold side of the scale; the same applies to the translations used for this study. Some languages, such as English, have in total four different adjectives, two on each side alongside neutrality ("warm" and "hot", "cool" and "cold"); for these language versions, the variable sensation type (SensType) has the value 2. Other combinations and their values are explained in Table 2.
Based on the comparison between the language of the questionnaire and the official languages in the country of origin, the variable native was either "yes", i.e. the questionnaire language was (one of) the official language(s) in the country of origin, or "no".
Independent variables, their scale of measurement, and levels (if applicable) relevant for this article are presented in Table 2 . For nominal variables, the reference level is defined as the level having the highest frequency in the data.
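The choice of the reference level for nominal variables can be illustrated with a short Python sketch. It assumes the variable is available as a plain list of labels; the function name is hypothetical, and the handling of ties (the first level encountered wins) is an assumption, since the text does not state a tie rule.

```python
from collections import Counter


def reference_level(values):
    """Return the most frequent level of a nominal variable."""
    counts = Counter(values)
    # most_common(1) returns a one-element list of (level, count) pairs.
    level, _count = counts.most_common(1)[0]
    return level


# Made-up labels for illustration only.
seasons = ["winter", "summer", "winter", "autumn"]
print(reference_level(seasons))
```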
Individual research groups digitized the data from their questionnaires and submitted them to the project leader. The project leader checked the raw dataset using an automated R script together with visual inspections and frequently requested further checks in case of inconsistent datasets. A detailed description of all data quality checks can be found in Schweiker et al. [41]. Table 3 presents a detailed overview of applied statistical analysis methods aligned with the research questions stated in the introduction.

Descriptive statistics
Descriptive statistics were used to summarize the univariate data obtained with the free-positioning tasks. In order to present the variability inherent in the data, a focus was placed on showing means alongside medians and quartiles.
In order to assess the agreement between the responses in the free-positioning tasks and the question of whether the drawing was intended to be equidistant, each response (i.e. subject and question) was classified as a) equidistant or b) non-equidistant. Based on a similar assessment presented by Schweiker et al. [6], the following procedure was described in the PAP: "If all positions of verbal anchors are within ±2 standard deviations (SD) from the theoretical equidistant position of the subset of those answering all equal, the response is regarded as equidistant". As the survey resulted in a dataset that was 100 times larger than the one analysed in the reference study [7], the range of ±2 SD was found to be too large to apply here; it would have resulted in a tolerance of up to half of the line length. Therefore, 0.32 SD, 0.68 SD, and 1 SD were used here, leading to comparable absolute tolerance levels in mm and corresponding to around 25%, 50%, and 68% of responses.
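The classification rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's implementation: it assumes that only the freely placed interior anchors are checked (the two end anchors being fixed at 0 and 100 mm), that one SD per interior anchor has been computed beforehand from the respondents answering "all equal", and that all names are hypothetical. The helper normal_coverage shows why multipliers of 0.32, 0.68, and 1 correspond to roughly 25%, 50%, and 68% of responses if deviations are approximately normally distributed.

```python
import math


def equidistant_positions(n_anchors, line_mm=100.0):
    """Theoretical positions when n anchors are spread evenly over the line."""
    return [line_mm * i / (n_anchors - 1) for i in range(n_anchors)]


def is_equidistant(interior_positions, interior_sds, k, line_mm=100.0):
    """True if every interior anchor lies within k * SD of its target."""
    n_anchors = len(interior_positions) + 2  # plus the two fixed end anchors
    targets = equidistant_positions(n_anchors, line_mm)[1:-1]
    return all(abs(p - t) <= k * sd
               for p, t, sd in zip(interior_positions, targets, interior_sds))


def normal_coverage(k):
    """Share of a normal distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2.0))


# Made-up response on a 7-point scale (five interior anchors), SD of 10 mm each.
response = [10.0, 35.0, 50.0, 65.0, 85.0]
sds = [10.0] * 5
for k in (0.32, 0.68, 1.0):
    print(k, is_equidistant(response, sds, k), round(normal_coverage(k), 2))
```

Tightening the multiplier makes the rule stricter: the same drawing can count as equidistant at 0.68 SD but not at 0.32 SD.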

Latent class regression analysis
Latent Class Regression (LCR) analyses were used in order to define groups of respondents, whose patterns in the free-positioning tasks differ significantly from each other. The R-package flexmix [44] and function initFlexmix were used, which automatically select a meaningful number of classes based on maximum likelihood principles and the Integrated Completed Likelihood (ICL) criterion.
In contrast to other clustering methods, LCR analyses derive clusters using a probabilistic model to describe the distribution of the data, on the assumption that some process or latent structure underlies the data. Because a statistical model is used, both the probability that a given case belongs to a given latent class and the goodness of fit can be assessed, neither of which is possible with other clustering methods.
The regression model used in R with LCR was defined as

Position on line [mm] ~ poly(as.numeric(Name of verbal anchor), 3, raw = T)    (1)

which creates a third-order polynomial model with the measured position on the line as dependent variable and the ordered verbal anchor names converted to integers as independent variable [see also 12]. Note that in the R language, the dependent variable is written to the left of the ~-sign and the independent variable(s) to the right.
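To make the regression component of Eq. (1) concrete, the sketch below fits a raw third-order polynomial by ordinary least squares in plain Python. It covers only the per-class regression, not the latent class estimation performed by flexmix, and it is an illustration rather than the code used in the study; the function names are hypothetical and the data are made up. Raw (non-orthogonal) powers correspond to raw = T in the R formula.

```python
def raw_poly_row(x, degree=3):
    """Return [1, x, x^2, ..., x^degree]: intercept plus raw powers of x."""
    return [x ** d for d in range(degree + 1)]


def fit_raw_polynomial(xs, ys, degree=3):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    rows = [raw_poly_row(x, degree) for x in xs]
    n = degree + 1
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Anchors coded 1..7 and placed exactly equidistantly on a 100 mm line.
anchors = [1, 2, 3, 4, 5, 6, 7]
positions = [100.0 * (x - 1) / 6 for x in anchors]
b0, b1, b2, b3 = fit_raw_polynomial(anchors, positions)
print(round(b0, 3), round(b1, 3), round(b2, 3), round(b3, 3))
```

For perfectly equidistant anchors the quadratic and cubic coefficients vanish up to rounding error, so non-zero higher-order terms indicate a departure from equidistance within a class.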

Table 3. Mapping of research questions, analysis methods, and corresponding section in methods and results (columns: research question, analysis method, details in section).

χ²-tests
χ²-tests comparing the first-order with the second-order polynomial model of the observed mean positions for each scale with the theoretical equidistant positions were used to assess the assumption of equidistance quantitatively. Further, χ²-tests were used to analyse whether observed frequencies of cluster sizes for distinct contextual differences in season, climate of residence, and language differed from expected frequencies. The R function chisq.test was used, and the contributions of individual cells to the χ²-statistic were analysed based on each cell's residual as a percentage of the χ²-statistic.
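The cell-wise contributions can be illustrated with a small Python sketch. It assumes a contingency table of observed counts (for example, cluster membership by season) given as a list of rows; the function name and the example counts are hypothetical. It computes the Pearson statistic without continuity correction and no p-value, whereas R's chisq.test applies Yates' correction to 2 x 2 tables by default.

```python
def chi_square_contributions(table):
    """Return (chi2, percent), where percent[i][j] is the share of cell (i, j)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Squared Pearson residual per cell: (observed - expected)^2 / expected.
    cells = [[(obs - r * c / total) ** 2 / (r * c / total)
              for obs, c in zip(row, col_totals)]
             for row, r in zip(table, row_totals)]
    chi2 = sum(sum(row) for row in cells)
    if chi2 == 0:
        # Observed counts match expected counts exactly.
        return 0.0, [[0.0 for _ in row] for row in cells]
    percent = [[100.0 * v / chi2 for v in row] for row in cells]
    return chi2, percent


# Made-up counts: two clusters (rows) by three seasons (columns).
observed = [[30, 20, 10],
            [10, 20, 30]]
chi2, percent = chi_square_contributions(observed)
print(round(chi2, 2))
for row in percent:
    print([round(v, 1) for v in row])
```

Cells with a large percentage are the ones driving a significant test result, which is how over- or under-represented combinations of cluster and context can be located.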

Data characteristics
The total number of questionnaires distributed was 9111 of which 8225 were submitted to the analysis (mean response rate 86%). The percentage of female/male respondents was 46.3 %/51.6 %. 0.3 % stated "Other" and 1.8 % either chose "I do not wish to specify" or did not respond to the question.
An overview of the geographic distribution of climates of residence, previous residence, and origin is given in Fig. 2. Note that, to preserve respondents' anonymity, respondents from climate zones with fewer than 5 respondents were placed in the group "other". Of the respondents, 600 (7.3%) had been living in the climate zone of residence for less than one year, 677 (8.2%) for between one and three years, and 6268 (76.2%) for longer than three years.
The distributions of the current thermal state of the respondents are presented in Fig. 3 . Sensation votes were nearly normally distributed with a tendency towards the cold side. The majority of respondents felt comfortable, preferred no change, and perceived conditions as just acceptable.
Indoor and outdoor conditions during and preceding the application of each questionnaire are summarized in Table 4 . Fig. 4 summarises the results of the free positioning task for the thermal sensation, comfort, and acceptance scales. It can be seen from the figure that the positions assuming equidistance are always placed within the interquartile range.

Descriptive statistics of interpretation of distances
Visual interpretation of Fig. 4 suggests that (1) the thermal sensation scale anchors are close to equidistance with a slight shift towards the cold side, (2) thermal comfort scale anchors are not equidistant and shifted towards extremely uncomfortable, and (3) thermal acceptance scale anchors are close to equidistance. Based on the χ²-tests, distributions of the positioning of the verbal anchors for thermal sensation and comfort (Fig. 4a and b) do not follow the assumption of equidistance (sensation: χ² = 10.7, df = 1, p < .0001; comfort: χ² = 50.9, df = 1, p < .0001). The majority (56%) of respondents indicated that only some verbal anchors of the thermal sensation scale should be equidistant (Table 5). From Fig. 4b it can be seen that the positioning on the comfort scale has the largest variance. The distances between "comfortable"/"slightly uncomfortable" and "slightly uncomfortable"/"uncomfortable" appear rather similar and larger than the distances towards "very uncomfortable", with different intentions regarding equidistance documented in Table 5.
The distribution of the verbal anchors on the thermal acceptance scale is in accordance with the equidistance assumption (χ² = 0.12, df = 1, p = .63). This result stands in contrast to the common assumption of positioning the two verbal anchors around 0, hence in the centre of the scale. One third of respondents each stated that "all", "some", or "none" of the verbal anchors were equidistant (Table 5).

Latent classes of perceived distances
While the previous section describes general trends, the results of the LCR analysis provide a more detailed picture (Fig. 5). The subgroups resulting from the LCR show distinct patterns, suggesting quite different interpretations of the scales. The number of statistically distinct clusters (subgroups) resulting from the LCR analysis differs across the scales used: 6 for thermal sensation, and 8 each for comfort and acceptance.
For thermal sensation, subgroup 2 is the largest group of respondents (25.6%) and its pattern shows near equidistance, slightly shifted towards cold. Still, the majority of respondents of this group (nearly 3 out of 4) did not perceive the thermal sensation scale as equidistant. Subgroup 1 perceived the distances as narrower on the cold side, while subgroup 5 perceived them as narrower on the warm side. Subgroup 4 perceived the distances between the middle votes as narrower, while subgroup 6 placed narrower distances close to the scale extremes. Subgroup 3 had the largest variance and is not well definable.
The verbal anchors of the thermal comfort scale are perceived by the largest group (subgroup 7) with the largest distance between "comfortable" and "slightly uncomfortable". This tendency is even more extreme in subgroup 3 (the second largest subgroup) and subgroup 1, indicating a very large comfort range. In contrast, subgroup 2 perceives the largest distance between "extremely" and "very uncomfortable". Positions of subgroups 6 and 8 appear nearly equidistant. Subgroup 5 perceived "slightly uncomfortable" close to "comfortable" and the other two verbal anchors closer to "extremely uncomfortable". Subgroup 4 had the largest variance and is not well definable.
For thermal acceptance anchors, the largest group (subgroup 4) perceived the distances as equidistant, but its weight is far from the majority. For most, the distance between the two middle verbal anchors increased from subgroup 4 to subgroups 5 (with a slight shift towards "unacceptable") and 8 (with a shift to "acceptable") to subgroups 3 and 1, which show the largest distance between "just acceptable" and "just unacceptable". The distance between these two verbal anchors is minimal for subgroup 2, which is close to the assumption that "just unacceptable" and "just acceptable" are close to each other in the middle of the scale. Note that subgroup 2 was the second largest in the sample. Subgroup 6 perceived the two verbal anchors as closer to "unacceptable", while subgroup 7 perceived them as closer to "acceptable".

Perceived relationship between dimensions of thermal perception (assumption 2)
The questionnaire contained two different ways to assess the perceived relationship between dimensions of thermal perception as presented in the following.

Overall interpretation of relationship between dimensions
In addition to drawing the position of verbal anchors for thermal sensation in Q1, respondents drew circles around the verbal anchors that they perceived as comfortable. As presented in Fig. 4a, overall the two boxes denoting the 1st and 3rd quartiles of the lower end and of the higher end of the comfort range as indicated by respondents comprised the central three categories of thermal sensation: "slightly warm", "neutral", and "slightly cool". Fig. 6a shows that the majority of respondents drew their circle around no or only one verbal anchor, but at the same time Fig. 6b confirms that the middle three verbal anchors are most often within the comfort range.
The two boxes representing the 1st and 3rd quartiles of the acceptance range (Fig. 4b) cover two verbal anchors of the comfort scale: "comfortable" and "slightly uncomfortable". This is again confirmed by the analysis presented in Fig. 6d, with these two verbal anchors appearing most often inside the acceptance range. Hence, for a large number of respondents, the perception of slightly uncomfortable conditions is still acceptable. Fig. 7 summarizes the drawings of the free-positioning task combining a) thermal sensation and comfort, and b) thermal sensation and acceptance. The distribution of all data tends to be steeper on the warm side, indicating that warmer sensations would be perceived as less comfortable/acceptable compared to cooler sensations. Interquartile ranges are lowest for "hot" sensations followed by "slightly cool" sensations, but show a comparable magnitude for "cold", "cool", "neutral", "slightly warm", and "warm" sensations. The difference between positioning verbal anchors for thermal sensation on the comfort scale compared to the acceptance scale appears to be rather small.
The perception of comfort (the area between "slightly uncomfortable" and "comfortable" in Fig. 7a) tends to occur for the range between the 1st and 2nd quartile of "slightly cool" and "neutral". This result is in good agreement with Fig. 4a. The acceptance range (the lowest half of Fig. 7b) also covers large parts of "cool" and, to a slightly lesser degree, "warm", hence "slightly uncomfortable" conditions. This tendency is in agreement with the results presented in Fig. 6d.

Latent classes of relationships
As observed for individual scales (Section 4.1.2), the LCR analysis shows distinct patterns indicating quite different interpretations of the relationships between the verbal anchors of the scales.
The number of statistically distinct subgroups was 6 each for sensation on comfort and for sensation on acceptance. In Fig. 8, the differences between these subgroups are clearly visible:
• Subgroup 1 - "warm comfort" (the largest group): Comfort is associated with verbal anchors on the warm side.
• Subgroup 2 - "3 central category comfort": Most symmetric pattern that follows the classic assumption of the three central verbal anchors of the thermal sensation scale being within the comfort range.
• Subgroup 3 - "wide comfort range": "Cool" included in the comfort range.
• Subgroups 4 and 6 - "cold comfort" and "cool comfort": Comfort was associated with verbal anchors on the cold side, with subgroup 4 being more extreme than subgroup 6 in this regard.
• Subgroup 5 - "linear": Nearly equidistant relationship with cool side in the comfort range, "neutral" placed as "uncomfortable" and warm side towards the "extremely uncomfortable" end. Given the almost perfect intervals and small interquartile differences, this subgroup will be investigated further in the discussion section.
Fig. 9 presents the results for each subgroup resulting from the LCR related to thermal sensation anchors on the thermal acceptance scale:
• Subgroup 1 - "wide acceptance": Symmetric with the five central anchors (including "cool" and "warm") within the acceptance range.
• Subgroups 2 and 3 - "cool acceptance" and "cold acceptance": The cool side is more acceptable with subgroup 3 being more extreme than subgroup 2 in this regard.
• Subgroup 4 (the largest group) - "3 central category acceptance": The three central verbal anchors of thermal sensation within the acceptance range.
• Subgroup 5 - "linear": Nearly equidistant with the cool side in the acceptability range, "neutral" placed between "just acceptable" and "just unacceptable", and the warm side towards the "unacceptable" end. Given the almost perfect intervals and small interquartile differences, this subgroup will be investigated further in the discussion section.
• Subgroup 6 - "warm acceptance": The warm side is more acceptable.

Season and climate
In order to analyse the influence of climatic context (season and type of climate of residence) on the perceived relationship between the verbal anchors of thermal sensation and thermal comfort, frequencies of each subgroup in different seasons were compared.
In Fig. 10, the frequencies for each subgroup and each season are presented. Only seasons with more than 100 respondents were considered here. Subgroup 1 ("warm comfort", see Fig. 8) is most frequent in autumn and winter (Fig. 10). Subgroup 2 ("3 central category comfort") is frequent in all seasons except "dry" and "wet". Subgroup 3 (wide comfort range from "cool" to "slightly warm") is most frequent in spring, while the frequency of subgroup 4 ("cold" to "slightly cool") is low in autumn, winter, and spring, but above average in "dry" and "wet" seasons. Subgroup 5 (equidistant with "cool" as lowest) is highest in "dry" and "wet" seasons, and subgroup 6 ("cool" to "neutral") has the highest frequencies in "summer" and "wet" seasons.
The χ²-test of independence reveals that there is a significant association between season and the subgroup number (χ² = 964, df = 25, p < 0.0001).
In Fig. 11, the frequencies for each subgroup and climate zone are presented. Only climates of residence with more than 100 respondents were considered. Subgroup 1 ("warm comfort", see Fig. 8) is most frequent for respondents living in KG class Cf, and least frequent for Af and Aw (Fig. 11). Subgroup 2 ("3 central category comfort") becomes more frequent in cooler climates such as Cf and Cw. Subgroup 3 ("wide comfort range") has in general a low frequency, but is most prominent in Cw. Subgroup 4 ("cold comfort") and subgroup 5 ("linear") are most frequent in Af and Aw. Subgroup 6 ("cool comfort") has the highest frequencies for Aw, BS, and BW.
The χ²-test of independence reveals that there is a significant association between climate of residence (KG class) and the subgroup number (χ² = 1427, df = 30, p < 0.0001).
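The two tests of independence above share the same mechanics, which can be sketched in a few lines. This is a minimal illustration only, not the authors' analysis code: the function name `chi_square_independence` and the small table of counts are hypothetical. The degrees of freedom follow (rows - 1) × (columns - 1), so df = 25 is what a 6 × 6 table (e.g. six subgroups by six seasons) would give, and df = 30 a 6 × 7 table.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts (NOT from the study): rows = seasons, columns = subgroups.
counts = [
    [30, 10],
    [10, 30],
]
stat, dof = chi_square_independence(counts)
print(f"chi2 = {stat:.1f}, df = {dof}")
```

The sketch stops at the statistic and the degrees of freedom. In practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value.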

Language
Questionnaire language, language type, and whether the respondents were native speakers of the questionnaire language might have affected the results of the free-positioning task. According to the χ²-test of independence, the distribution of subgroups for the free-positioning task for thermal sensation anchors on the thermal comfort scale differed significantly between the questionnaire language versions (χ² = 2027, df = 100, p < 0.0001). The analysis of the contribution (in %) of a given cell to the total χ² score reveals that the most contributing cell is the language Farsi/subgroup 6 (16.4%/+). This is followed by German/subgroup 1 (10%/+) and Korean/subgroup 3 (5.5%/-). There is also an effect of language type (χ² = 582.12, df = 15, p < 0.0001) (for a description of types see Table 2). The analysis of the contribution (in %) of a given cell to the total χ² score reveals that the most contributing cell is language type 3c/subgroup 6 (50.3%/+). This is followed by 3c/subgroup 2 (8%/-), 2/subgroup 6 (7.6%/-) and 2/subgroup 3 (5.7%/+). Note that the 780 votes for language type 3c are solely from the Farsi and Greek language versions.
This study incorporated two languages written from right to left (Arabic and Farsi), so the direction of writing might have confounded the results. The χ²-test of independence reveals that the distribution of subgroups for all free-positioning tasks differs significantly between these two language versions and all other data (sensation: χ² = 84.8, df = 5, p < 0.0001; comfort: χ² = 124.1, df = 7, p < 0.0001; acceptance: χ² = 263.3, df = 7, p < 0.0001; sensation on comfort: χ² = 355.3, df = 5, p < 0.0001; sensation on acceptance: χ² = 348.2, df = 5, p < 0.0001). However, inspection of individual cells' contributions does not reveal a consistent pattern that could be attributed to respondents starting from a different direction.
One third of our respondents (2731, 33.6%) were non-native speakers of the questionnaire language. As mentioned above, questionnaires were provided in the local language (all 21 language versions are available online [45]). The percentage of non-native speakers varies strongly between countries and applications. For example, the percentage of non-native speakers is below 5% in questionnaires from Ecuador, France, Iran, and Italy, while it is above 60% in Australia and the United Kingdom, which corresponds to differences in percentages of foreign students in the respective countries. The χ²-test of independence showed a significant difference in the distribution of subgroups between native speakers and non-native speakers (sensation: χ² = 28.3, df = 5, p < 0.0001; comfort: χ² = 122.9, df = 7, p < 0.0001; acceptance: χ² = 31.5, df = 7, p < 0.0001; sensation on comfort: χ² = 41.2, df = 5, p < 0.0001; sensation on acceptance: χ² = 61.5, df = 5, p < 0.0001). However, the inspection of individual cells did not reveal consistent patterns amongst the five free-positioning tasks.
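The cell contributions reported above can presumably be read as each cell's share of the total χ² statistic, with the sign indicating whether the observed count lies above (+) or below (-) the count expected under independence. A minimal sketch under that assumption follows; the function name `cell_contributions` and the counts are hypothetical and do not reproduce the study data.

```python
from typing import List, Tuple


def cell_contributions(table: List[List[float]]) -> List[List[Tuple[float, str]]]:
    """Return, per cell, (percent of total chi-square, sign of observed - expected)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    terms = []  # per-cell chi-square terms
    signs = []  # "+" if over-represented, "-" if under-represented, "0" if equal
    for i, row in enumerate(table):
        term_row, sign_row = [], []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            term_row.append((observed - expected) ** 2 / expected)
            if observed > expected:
                sign_row.append("+")
            elif observed < expected:
                sign_row.append("-")
            else:
                sign_row.append("0")
        terms.append(term_row)
        signs.append(sign_row)

    total = sum(sum(r) for r in terms)
    if total == 0:
        raise ValueError("all observed counts equal their expected counts")
    return [
        [(100.0 * t / total, s) for t, s in zip(term_row, sign_row)]
        for term_row, sign_row in zip(terms, signs)
    ]


# Hypothetical counts (NOT from the study): rows = language versions, columns = subgroups.
counts = [
    [40, 10],
    [20, 30],
]
contrib = cell_contributions(counts)
for row in contrib:
    print(", ".join(f"{pct:.1f}%/{sign}" for pct, sign in row))
```

Sorting the cells by their percentage would give the ranking of "most contributing" cells described in the text.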

Discussion
The questionnaire and the assessment procedure developed for this study are new; therefore, methodological aspects are discussed first in this section. This is followed by a discussion of our findings and their implications.

Methodological aspects
Methodological aspects refer to the questionnaire itself and potential limitations and implications for future applications arising from this study.
Earlier attempts to challenge the equidistance assumption of the thermal sensation scale used either the method of graded dichotomies [46] or the successive categories method [47]. The method of graded dichotomies requires large numbers of actual votes together with measured indoor environmental conditions. The successive categories method requires subjects in similar thermal conditions. In both methods, participants state their thermal perception for one specific thermal condition, so that, for example, a combination of one specific thermal sensation vote with one specific thermal comfort vote is obtained. In contrast, the questionnaire using the free-positioning task can be applied to analyse participants' conceptions of rating scales without any physical measurements and assesses the full range of the scales at the same time. The influence of actual conditions on the free-positioning task will be explored in future analyses.
The present sample consists of university students in lecture rooms. This homogeneous sample in terms of subjects and buildings was selected in order to focus on contextual differences such as season, climate, and language. It can be expected that the results will differ for other populations and in other circumstances. The method may introduce some bias since all surveys were conducted in somewhat controlled environments, such as college buildings, which can lead to a narrowing of personal expectations (no user expects to be very cold or very hot inside these buildings). We asked all data contributors to involve only students who had not had lectures on indoor environment, building physics, or building services, in order to mirror the thermal comfort concepts of laypersons in our study. Future work extending the current sample and context (e.g. by including different building types) is highly encouraged in order to analyse additional influences on the perception of scales.
There was a substantial number of questionnaires with missing data. The percentage of fully completed questionnaires was 33.4%. The percentage of questionnaires having all mandatory questions answered was 65.4%. The high proportion of surveys with missing data may affect generalizability; however, we have reasonable confidence in the results presented above, as the missing responses mainly concerned secondary questions (i.e. not the free-positioning tasks that are the primary focus of the questionnaire). One potential reason for the high number of missing responses may be the context in which the questionnaire was distributed (e.g. at the end of a lecture, when students would like to leave the lecture room) in combination with the complexity of the questionnaire, which is in part atypical and requires a high level of comprehension. The latter is reflected in a number of comments (3.5%) stating that the questionnaire was very complex and not easy to answer. The number of comments on complexity was low compared to the subjective impression of the researchers involved. In 31 out of 51 groups (60%), researchers experienced respondents reporting verbally about the complexity of the questionnaire, but some of these respondents seemed not to have added such comments in the space provided in the questionnaire. The median share of students reporting difficulties, as estimated by these 31 research groups, was 10%. Therefore, the total number of respondents perceiving the questionnaire as complex or not easy to answer is likely higher than 3.5%. Future applications need to consider such a rate and increase their sample size accordingly, or need to find solutions to reduce the complexity.
During the preparation of this study, two potential ways of balancing parts of the questionnaire were considered and discussed: (1) balancing the order of the free-positioning tasks and (2) balancing the direction of drawing. With respect to the order of the free-positioning tasks, respondents may understand the way of responding to the free-positioning tasks only after the first or second question. Therefore, balancing the order might change the response patterns, especially for the first two questions, in case respondents did not understand the given task but did not want to change their response later. However, it was decided not to balance the order, because the latter questions, which look at the relationship between verbal anchors of different dimensions, build upon the interaction between individual scales, and because sensation, comfort, acceptance is the common order in which these questions are presented. Future applications might have to think about a first "dummy" free-positioning task in order to accustom respondents to this type of question. Here, we decided to show examples of responses in the free-positioning task in order to visualize the way the task was expected to be completed. The preference scale traditionally used in thermal comfort studies was not used because previous applications of the free-positioning task [12], including our own pilot application [48], suggested that this scale is often misinterpreted by respondents. A similar tendency has been pointed out by Humphreys et al. [10].

Validity of responses
As presented in Tables 6-8, a remarkably high number of respondents contradicted themselves with their drawings and the answer to the question whether the anchors are supposed to be equidistant. Overall, the majority of those participants who answered that their drawing is not equidistant did not draw the markers as equidistant. In contrast, the percentage of those participants who answered that their drawing is equidistant and indeed did draw the markers as equidistant is not similarly high. For example, for thermal sensation and with a conservative tolerance level of 2.2 mm for the drawings, 91.7% (1561 out of 1703) of the respondents who answered that the markers were supposed to be equidistant produced drawings with no equidistance between markers. Even when assuming that some drawings might not have come out as intended by the respondents or were done in a sloppy way, and therefore applying a liberal tolerance of 6.7 mm, still 39% (665 out of 1703) of the respondents who answered that the markers were supposed to be equidistant presented drawings that could not be considered as such. This contradiction, even with a high tolerance, is an interesting observation, illustrating that the type of question or task can strongly affect respondents' responses. This phenomenon is well known in cognitive psychology, where it has been demonstrated, for example, that phrasing response alternatives in a vague or precise manner results in differential response patterns. In addition, decisions of individuals vary strongly depending on whether choices are phrased in a positive or negative context [49-51]. Accordingly, our findings illustrate that responses on scales and their anchors can be strongly influenced by the way participants are questioned. A simple question might trigger a simple answer. It is conceivable that such simple answers hide more complex conceptions of respondents, which can be observed in the free-positioning task.
It is impossible to decide which answer is the "correct" one, but this contradiction highlights that questionnaires and surveys have to be constructed carefully to minimise the risk of response biases (see [ 52 , 53 ]).
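How a tolerance turns a drawing into an "equidistant" or "not equidistant" verdict can be illustrated with a short sketch. The criterion used here is an assumption made for illustration and is not taken from the study: a drawing counts as equidistant if every gap between neighbouring anchors deviates from the mean gap by no more than the tolerance. The function name `is_equidistant` and the example positions are likewise hypothetical. Only the two tolerance values (2.2 mm and 6.7 mm) come from the text.

```python
from typing import Sequence


def is_equidistant(positions_mm: Sequence[float], tolerance_mm: float) -> bool:
    """Check whether adjacent anchors are evenly spaced within a tolerance.

    Assumed criterion (not taken from the paper): every gap between
    neighbouring anchors may deviate from the mean gap by at most
    tolerance_mm.
    """
    if len(positions_mm) < 3:
        raise ValueError("need at least three anchors to compare gaps")
    ordered = sorted(positions_mm)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return all(abs(gap - mean_gap) <= tolerance_mm for gap in gaps)


# Hypothetical drawing of seven anchors, positions in mm along the scale.
drawing = [0, 20, 45, 60, 80, 100, 120]
strict = is_equidistant(drawing, tolerance_mm=2.2)   # conservative tolerance
liberal = is_equidistant(drawing, tolerance_mm=6.7)  # liberal tolerance
print(strict, liberal)
```

In this made-up drawing one anchor is displaced by 5 mm, so the verdict flips between the two tolerance levels, which is the kind of case that separates the 91.7% and 39% figures.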
Table 6 Comparison of equidistance between verbal anchors for thermal sensation and the answer to the question whether the drawing was intended to be equidistant.
Subgroups 5, which show the remarkably linear patterns in Figs. 8 and 9, are characterized by a low variance, linearity, and nearly perfect equidistance in a question for which no equidistant answers were expected. This might suggest that respondents did not understand the question or may not have paid enough attention to understand that their response pattern should change between the first three free-positioning tasks and the last two. This assumption could be partially confirmed if the same respondents belonged to subgroup 5 in both free-positioning tasks. However, only 312 (of 538/517) respondents, i.e. about 60%, belonged to subgroup 5 in both cases. In addition, these respondents were distributed across different language applications, suggesting that there was no systematic distortion with a specific language version or application. Looking at the percentage of respondents from each application who belonged to subgroup 5, the highest values (between 40 and 50%) can be found in the applications from Nigeria (three out of four applications by two different research groups) and Malaysia (one out of five applications by one research group). Both these regions are characterized by hot and humid climates with mostly stable outdoor temperatures year-round. According to the discussion above, this climatic context increases the likelihood that the verbal anchors "cool" and "cold" are related to "comfortable" and "acceptable" conditions and that the dual nature of the scale (warm/cool with "neutral" in the middle) is not perceived in the same way as in climates with distinct cool and warm seasons. Therefore, these responses should be considered as valid, unless further analyses of additional answers support the suggestion that respondents did not fill out the questionnaire properly.
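The "about 60%" overlap quoted above follows directly from the three counts given in the text. A minimal worked check is shown below; the function name `overlap_share` is hypothetical, and the text does not state which of the two subgroup sizes (538 or 517) belongs to which task, so the variables are labelled neutrally.

```python
def overlap_share(n_both: int, n_group: int) -> float:
    """Percentage of a subgroup's members who are also in the other subgroup."""
    return 100.0 * n_both / n_group


# Counts from the text: 312 respondents in subgroup 5 of both tasks,
# with subgroup sizes of 538 and 517 in the two tasks.
share_of_538 = overlap_share(312, 538)
share_of_517 = overlap_share(312, 517)
print(f"{share_of_538:.1f}% and {share_of_517:.1f}%")
```

Both shares lie close to 60%, so roughly two in five members of either "linear" subgroup did not show the linear pattern in the other task.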
The validity of results can further be questioned with respect to whether respondents were able to relate a specific perception of intensity (thermal sensation vote), for example, to the corresponding affective response (thermal comfort vote). It could be argued that the free-positioning task does not explain how people would vote under real conditions. Respondents in the present study also described their current level of perceived thermal sensation, thermal comfort, and thermal acceptance. In Fig. 12, the distribution of votes for thermal sensation and comfort for the same subgroups as in Fig. 8 is very similar to the patterns of the free-positioning task presented in Fig. 8. This suggests that the answers in the free-positioning task for the relation between thermal sensation and thermal comfort are likely close to what people would vote in a natural setting. It also suggests that people, after having had many thermal experiences in their life, are able to associate sensations with comfort independent of their current thermal state, but dependent on the prevailing context such as season. In case these findings are supported by further analyses, the free-positioning task for thermal sensation on thermal comfort could serve as a tool to categorise participants of future studies without having to expose them to a large number of thermal conditions. The same figure as Fig. 12 but for current votes of thermal sensation on thermal acceptance is presented in the supplementary materials. In contrast to the comparison between Figs. 12 and 8, the relationship between thermal sensation and thermal acceptance is less similar between votes of the current thermal sensation and the free-positioning task (Fig. 9).
This dissimilarity can be attributed to the lower number of scale points for thermal acceptance (4) or to a larger variance in what is considered acceptable, which can result in a larger dispersion between votes of the current sensation and the free-positioning data.

Effect of language
Given the large number of language versions together with known issues related to the preparation of questionnaires in a multi-language context, the effect of language on the results needs to be discussed.
Translations, language, language type. Despite their widespread application, the ASHRAE thermal sensation scale as well as the thermal comfort and thermal acceptance scales are not defined in a large number of languages in the existing standards. Therefore, individual researchers or groups of researchers developed their own equivalent scales, so that the quality of the translation of the questionnaire in these cases depends on the effort made by the translator. The sources for the translation of verbal anchors for each language version together with additional comments can be found in the supplementary material. Note that the emphasis for translations was on the meaning of the translation. Zavala-Rojas [54] referred to various past publications to underline the importance of functional equivalence in the analysis. Functional equivalence theory takes into consideration the relationship between the original receptors and the original text [55]. The translation does not necessarily have to be identical, but measured concepts should have comparable behaviour in statistical analysis [54]. We found distinct differences between language versions contributing significantly to certain subgroups of the free-positioning task, in particular Farsi being overrepresented in the "cool comfort" subgroup 6, German being overrepresented in the "warm comfort" subgroup 1, and Korean being underrepresented in the "wide comfort range" subgroup 3. A similar significant effect was found for the language type (for the definition see Table 2).
Fig. 12. Relationships between current thermal sensation votes and current thermal comfort votes for the subgroups found by the LCR analysis of the data from the free-positioning task of thermal sensation on thermal comfort (see also Fig. 8).
An additional observation is that the neutral point was translated in some languages to follow the logic from left to right, while in other languages the neutral point breaks the linearity from left to right. For example, in German, the votes go from "kalt" (cold) on the left to "heiß" (hot) on the right, but the neutral point is defined as "weder warm noch kalt" (neither warm nor cold). In contrast, in Spanish, for example, the scale ranges from "fria" (cold) to "calurosa" (hot), with the neutral point corresponding to "ni fresca ni calida" (neither cool nor warm). Research on the linguistics of taste [56] suggests that it is important to connect to the way laypersons commonly talk about certain phenomena. Correspondingly, the German researchers agreed that the order "weder warm noch kalt" (neither warm nor cold) is perceived as more natural compared to "weder kalt noch warm" (neither cold nor warm). Nevertheless, whether the order of the middle point adjectives affects the results needs to be assessed in future research focusing on variations of individual languages.
Overall, it is important to remember that languages developed under certain climatic conditions and a specific context in terms of culture and traditional constructions. However, some languages spread out later to other climates and contexts, which requires regional variations. The analysis of such processes might be an interesting work to be done in cooperation with linguists, but at the same time, the validation of existing and future verbal anchors might benefit from corresponding discussion.
Direction of writing. During consistency checks, a higher percentage of non-consistent response patterns was observed for question 2 (thermal comfort) in the Arabic version. One potential explanation for this effect might be the direction of the text in the Arabic version. While the text runs from right to left, the extreme values of the scale (i.e., "comfortable" and "extremely uncomfortable") were not switched. As a result, while respondents go through the A to C options (placed on the right side of the scale), they still need to place them from left to right, which can be counter-intuitive for a respondent used to writing from right to left. The scales of the other questions (thermal sensation and thermal acceptance) were not switched either. However, the occurrence of errors was lower for the sensation and acceptance scales, which might be explained by their values being easier to understand (e.g., very hot, hot, etc.) compared to "slightly uncomfortable", "uncomfortable", and "very uncomfortable" (see also more specific comments related to the language version in the supplementary materials). This study included two languages written from right to left: Arabic and Farsi. According to the χ²-test of independence, the distribution of subgroups for all free-positioning tasks differs significantly between these two language versions and all other data. However, inspection of individual cells' contributions did not reveal a consistent pattern that could be attributed to respondents reading from right to left. In this context, it should also be noted that preference for drawing with the left or right hand was not assessed, which might have an effect similar to that of languages written from right to left.
Native speakers. One third of respondents were not native speakers of the questionnaire language. Their distribution over the subgroups was significantly different from native speakers, although no distinct pattern was observed. Future analyses need to look at these effects in more detail.
In summary, the results show that there are differences between languages, suggesting that translations might not have been functional. However, whether this is due to the language or the fact that some contextual conditions are more frequent in a specific language region (e.g. that a specific language can be related to a specific climatic zone), needs to be assessed in future analyses, taking into account more than one predictor at a time. Such analyses are beyond the scope of this paper and will be reported in future publications.

Effect of Köppen-Geiger classification
The analysis of contextual influences on the perceived relationship between thermal sensation and thermal comfort revealed a significant effect of the KG class on the pattern of this relationship. The KG class was chosen for this study due to its widespread use in the field of thermal comfort (see e.g. [57]) and the advantage that KG classes could be automatically derived from publicly available resources. At the same time, it should be mentioned that limitations in using the KG classification for thermal comfort studies have been shown [58, 59]. The main argument is that the KG method was developed to characterize climate types based on plant species and not for human perceptions. The development of a classification scheme for human beings within the built environment is still an open task, and the authors of this manuscript encourage any joint activity to develop one.

Semantic artefact hypothesis
Results on contextual influences on the perceived relationship between thermal sensation and thermal comfort suggest that seasonal and climatic influences affect this relationship. These observations can be related to the "semantic artefact hypothesis" [18], which suggests that the preferred temperature in cold climates may in fact be described as "slightly warm," while residents of hot climates may use words such as "slightly cool" to describe their preferred thermal state. For instance, a study conducted in naturally ventilated and air-conditioned classrooms in a hot humid climate in the northern area of Brazil (São Luís, Maranhão; Aw in the KG classification) showed that 100% of the occupants who considered the environment as "slightly cool" were comfortable, which was not the case for the "neutral" responses [60]. On the other hand, in a southern city with a humid temperate climate (Florianópolis, Santa Catarina; Cfa), the preference for a "slightly cool" condition was only observed in naturally ventilated offices in summer [61]. Based on a large sample, the semantic artefact hypothesis is supported by the analysis of the ASHRAE Global Thermal Comfort Database II, showing that the effect was stronger in warm climates, but not as prominent in colder climates [22]. The present study suggests that the relationship between thermal sensation and thermal comfort is not static, that is, not solely related to characteristics of the climate, but dynamic and also affected by present conditions. Due to the interaction between climate and season, an additional multiple variable analysis is necessary to validate this observation.
The authors of the "semantic artefact hypothesis" [18] further state that despite such differences in the interpretation of the verbal anchors, the actual preferred temperatures may be identical and that neutral temperatures, meant as optimal temperatures, may be lower than observed neutral temperatures in warm climates and higher in cold climates. This assumption incorporates the understanding that "neutrality" would be a desired condition. However, researchers have repeatedly identified a discrepancy with this assumption among users who declared satisfaction or comfort while feeling warm or cold [62, 63].
Further studies have shown that this is likely an incorrect assumption, at least for occupants of real buildings. This type of work generally implements a complementary "preference" scale to determine not just right-here-right-now thermal sensation, but also whether that sensation is a desired condition. Humphreys and Hancock [62] refer to this approach as the double enquiry method [62, p. 868]. Related studies found that in a considerable proportion of collected responses occupants' desired conditions are different to neutral. For example, Humphreys and Hancock's review of evidence from two thermal comfort studies (a sample of tertiary students and a sample of occupants of "ecological" houses) found that on 57% of occasions, occupants' desired conditions were different to neutral (n = 868 individual thermal comfort surveys). Similarly, Shahzad et al.'s [64] study of four office buildings reported that in 36% of responses, occupants' desired conditions were different to neutral (n = 313 thermal surveys). In a qualitative interview study by Schakib-Ekbatan et al. [65], which aimed to assess people's concepts of thermal sensation and comfort, 24% of 61 subjects mentioned the middle category ("neither cold nor warm") as difficult to describe. In addition, only a few others (N = 4; 6.6%) mentioned positive affective thoughts related to the verbal anchor "neither cold nor warm".
In addition to language and season, some findings might be influenced by experiences and expectations. For example, the results on thermal comfort (Fig. 6b) could be related to students' experiences and expectations, reflecting that the acceptance of slight thermal discomfort is a common and expected condition in lecture rooms. At the same time, while a slight lack of comfort is encountered often, exposure to extreme environments, such as those described as hot or cold, very uncomfortable, or unacceptable, is not, which could be associated with the mental distance to the verbal anchors located closer to the extremes.
These observations highlight again that there is still a need to scrutinize thermal perception scales to better qualify user responses and to continue the discussion on the operationalization of thermal satisfaction. At the same time, the authors of this paper are aware that "optimal" conditions are neither necessary nor beneficial on all occasions [66, 67].

Practical implications and future work
The goal of the present study was to challenge common assumptions made in the use of subjective assessment scales for thermal environmental research. The practical implications of this research can be divided into two parts. The first part consists of recommendations related to study design, data processing, and presentation and interpretation of data from thermal comfort studies. The second relates to the use of thermal comfort data to inform building evaluation and specification.
Focussing first on the implications for future thermal comfort research, several recommendations can be drawn from the analyses and discussions presented above:
- The distinct patterns of the subgroups found in the present analyses clearly indicate that parametric approaches to the presentation and analysis of the data are questionable. Instead, non-parametric visualisation and analysis methods should be used.
- Sometimes, results from thermal comfort studies (based on either actual or predicted votes) are presented as, e.g., mean values with two decimals. The variation found in the present study once more clarifies that a) it is important to look at the mean in combination with the variance (both overall and in subpopulations) and b) the apparent accuracy of such values is of no practical meaning, thus a lower number of decimals is sufficient.
- The large impact of contextual factors (i.e. climate, culture, and language) on how scales are interpreted by subjects leads to the recommendation that future work should be more rigorous in reporting the context in which a study was conducted, together with the language version and details of the questions and scales applied.
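As an illustration of the first two recommendations, the following minimal Python sketch summarises ordinal votes per subgroup by their frequency distribution and by medians that never interpolate between categories, instead of a mean reported with two decimals. The vote values, the group labels, and the function name `ordinal_summary` are hypothetical and invented for this illustration; they are not taken from the dataset of the present study.

```python
from collections import Counter
from statistics import median_high, median_low

# Hypothetical thermal sensation votes on a 7-point scale
# (-3 = cold ... +3 = hot). Values and group labels are invented.
votes_by_group = {
    "group_a": [-1, 0, 0, 0, 1, 1, 2],
    "group_b": [-2, -1, -1, 0, 0, 0, 1],
}


def ordinal_summary(votes):
    """Summarise ordinal votes without assuming equidistant categories."""
    counts = Counter(votes)
    return {
        "n": len(votes),
        # median_low / median_high always return an observed category,
        # so no distance between categories has to be assumed.
        "median_low": median_low(votes),
        "median_high": median_high(votes),
        # Full frequency distribution, ordered from cold to hot.
        "counts": {category: counts[category] for category in sorted(counts)},
    }


summaries = {group: ordinal_summary(v) for group, v in votes_by_group.items()}
for group, summary in summaries.items():
    print(group, summary)
```

Reporting the full distribution per subgroup keeps differences between subgroups visible that a single pooled mean would hide.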
In the present study, climatic and seasonal variables played a role, as did the language of the questions and potentially the exact wording or visual representation of the questions and verbal/visual anchors. Therefore, the authors of this study encourage all colleagues to make reporting of these factors a minimum requirement for publication of studies in thermal comfort research and to be rigorous in testing the characteristics of the scales they apply in their studies. The authors of the present paper deliberately decided not to suggest changes to current standards dealing with the thermal environment [1-4], because several issues remain that need to be studied further, as discussed above. However, it would be beneficial for the standards to provide information about the possible influence of the climatic context and the language of the verbal anchors on the interpretation of thermal perception scales and of the results obtained through their application. These findings also reveal the weakness of assessing only a single dimension of thermal perception. An important area for future work is to examine potential combinations of thermal scales that allow us to more directly infer what conditions can be considered as "comfortable" or "acceptable", similar to work on scales used in pain research [6].
Besides the implications for research, the results of the present study are also relevant to practice. The current focus on energy efficiency in buildings has led to an increased interest in collecting feedback from occupants. Such feedback mostly concerns the thermal environment. The most widespread applications are post occupancy evaluation (POE) or post occupancy monitoring (POM) [68, 69], which serve to document the performance of a building after it has been taken into use or, in the long term, as a performance optimization tool for facility management. Additionally, there are attempts to include building occupants and their perception directly in a control loop [70]. In the case of post occupancy evaluation, the occupant feedback is usually collected in the form of paper or internet based questionnaires [71]. When perception is to be included in the control of building services, desktop polling devices or mobile phone "apps" are utilized [72]. Regardless of the type of interface, various forms of assessment or preference scales are utilized. In some cases, these scales are constructed based on knowledge from existing thermal comfort research and related international standards [2, 4]. When designing feedback processes for building or HVAC control loops, it will be necessary to analyse this aspect in greater depth in order to establish interfaces adapted with greater precision to the user's actual needs.
Here, it is important to emphasize the dynamic relation between sensation and comfort, which is influenced by factors other than thermal perception (e.g. climatic context), and thus the importance of assessing them on separate scales. As the conditions found to be "optimal" or "comfortable" may vary significantly depending on the dimension or combination of dimensions assessed, further work and discussions are necessary in order to understand on which dimension or combination of dimensions the design or operation of a building should be based, and in which context. For example, it is important to decide whether set point temperatures should be based on results from thermal sensation, thermal comfort, or thermal acceptance. In addition, future research may aim at the development of new scales that are more robust to the influences identified in this research.
The development of any new scale needs to be carefully validated with well-planned experimental studies in the laboratory or field. During such attempts, further aspects such as influences between visual and textual individuals [73, 74] may be considered. In addition, the need to develop more detailed scales could be discussed, especially in the central region around neutrality and comfort. This aspect became of interest in the context of environmental control systems based on perception and adaptation of the users [75, 76].

Conclusions
The present paper questioned the validity of three basic assumptions on scales used in thermal comfort studies. First, the verbal anchors that label thermal sensation scales and thermal comfort scales are assumed to be equidistant and are often statistically treated as metric scales. Verbal anchors labelling thermal acceptance scales are assumed to have a defined distance with the two centre labels positioned just at ±0. Second, based on the idea that thermal comfort is experienced when thermal neutrality is achieved, the three middle votes of the thermal sensation scale are seen as representing comfortable conditions and the environment as thermally acceptable. Third, the way verbal anchors relate to each other is assumed to be independent of context, with particular reference to short-term and long-term climatic context. Considering the huge emphasis on balancing thermal comfort with energy saving issues in buildings, as well as the availability of technologies capable of controlling HVAC systems as a function of user needs, a proper interpretation of the user's thermal comfort needs becomes of primary importance.
Based on the results of the present world-wide questionnaire survey translated into 21 languages leading to a dataset of 8225 valid questionnaires, this study found that:
- General trends based on the complete dataset show that the distances between verbal anchors on thermal sensation and thermal comfort scales are not perceived as equal, in contrast to common assumptions.
- The trend for the thermal sensation scale is close to equidistance based on visual inspection, but not according to the statistical analysis. For the comfort scale, the grand average and statistical analysis showed that verbal anchors tended to be distributed in a non-uniform way and that this affected which verbal anchors on the thermal sensation scale were identified as comfortable.
- With regard to the thermal acceptance scale, the interpretation of respondents was equidistant, contrary to the researchers' concept behind the wording "just acceptable" and "just unacceptable", in which both labels mark the centre of the scale but with a clear tendency towards one or the other pole of the scale.
- Results of the free-positioning task combining thermal sensation with comfort, and thermal sensation with acceptance, showed a skewed distribution leading to non-equidistance between the thermal sensation labels drawn on the comfort and thermal acceptance scales.
- Significant variations in perceived distances between verbal anchors appeared when latent class regression analyses were applied, identifying subgroups with distinct scale interpretation patterns. The latent class analyses indicated the existence of six different distributions of verbal scale anchors, pertaining to subgroups of different consistence, but showing statistically significant relationships with both season and long-term climatic characteristics of the place of residence. For example, respondents residing in hot climates tended to assign comfort to thermal sensations on the cold side of the scale (from "cold" to "neutral"), while respondents residing in mild and colder climates assigned comfort to thermal sensations ranging from "neutral" to "slightly warm". This finding clearly indicates that the interpretation of scales and their relationships is subject to adaptation.
- Additional analyses showed a statistically significant effect of the survey language on the interpretation of scales, demanding further work to validate the functional behaviour of newly translated questionnaire versions.
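The notion of non-equidistance used in these findings can be illustrated with a short descriptive Python sketch. It assumes, hypothetically, that each respondent's placements of seven verbal anchors are recorded as positions between 0 and 100 along a line; the positions below are invented. The sketch only compares the median gap between adjacent anchors with the gap expected under equidistance and is not the latent class regression analysis applied in this study.

```python
from statistics import median

# Hypothetical positions (0-100) at which three respondents placed the
# seven verbal anchors of a scale, ordered from one pole to the other.
placements = [
    [0, 20, 38, 50, 62, 80, 100],
    [0, 10, 35, 50, 65, 90, 100],
    [0, 25, 40, 50, 60, 75, 100],
]


def adjacent_gaps(positions):
    """Distances between neighbouring anchors for one respondent."""
    return [right - left for left, right in zip(positions, positions[1:])]


# One list of six gaps per respondent.
gaps = [adjacent_gaps(p) for p in placements]

# Median of each gap across respondents (column-wise).
median_gaps = [median(column) for column in zip(*gaps)]

# Under equidistance, seven anchors on a 0-100 line are 100 / 6 apart.
equidistant_gap = 100 / 6
deviations = [round(gap - equidistant_gap, 1) for gap in median_gaps]

print("median gaps:", median_gaps)
print("deviation from equidistance:", deviations)
```

In this invented example, the gaps around the central anchor are smaller than the equidistant value and those near the poles are larger, which is the kind of pattern a metric treatment of the scale would ignore.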
A comparison between the subgroups' patterns in the free-positioning task and the votes assessing the actual perception of the respondents during the survey revealed strong similarities between these patterns for thermal sensation and thermal comfort votes. This finding can be interpreted as indicating that the respondents' concept of thermal perception, as represented in the first part of the questionnaire, is consistent with their current evaluation of the thermal environment. Whether the interpretation of the scale or the actual perception acts as cause or effect remains a question for further analyses. Nevertheless, the present results suggest that the free-positioning task can be used to assess participants' interpretation of scales and can thereby support a more reliable analysis of the perception votes obtained. In addition, results suggest that the free-positioning task for thermal sensation on thermal comfort could serve as a tool to categorize participants of future studies without having to expose them to a large number of thermal conditions. In order to extend the results presented in this paper, it is necessary to undertake multiple variable approaches because of the relationships between season, climate, and language, and to evaluate whether observed effects are attributable completely to a single factor or to combinations of factors, e.g., when some languages are more prominent in specific climatic regions with specific seasons. Such analyses are beyond the scope of this paper and will be addressed in future analyses of this dataset.

Data records
The dataset is available at https://doi.org/10.17605/OSF.IO/9P2GQ. Data citation: [45]. The database is open for additional submissions, e.g. from other activity contexts, such as offices or residential places, and from other target groups, i.e. non-students.

Declaration of Competing Interest
None

CRediT authorship contribution statement