The impact of classroom design on pupils' learning: Final results of a holistic, multi-level analysis

Assessments were made of 153 classrooms in 27 schools to identify the impact of physical classroom features on the academic progress of the 3766 pupils who occupied those specific spaces.
 
This study confirms the utility of the naturalness, individualisation and stimulation (or, more memorably, SIN) conceptual model as a vehicle for organising and studying the full range of sensory impacts experienced by an individual occupying a given space. In this case the naturalness design principle accounts for around 50% of the impact on learning, with the other two principles accounting for roughly a quarter each.
 
Within this structure, seven key design parameters have been identified that together explain 16% of the variation in pupils' academic progress. These are Light, Temperature, Air Quality, Ownership, Flexibility, Complexity and Colour. The muted impact of the whole-building level of analysis provides some support for the importance of “inside-out design”.
 
Identifying the impact of built-environment factors on learning progress is a major new finding for school research. It also suggests that the impact of building design on human performance and wellbeing more generally can be isolated, and that it is non-trivial. It is argued that it makes sense to capitalise on this promising progress and to develop these concepts and techniques further.


Overall
Thank you to the reviewers for their thoughtful comments. We feel the paper is much improved as a result. Our responses to your comments and suggestions are set out below and have been incorporated as appropriate into the revised paper, where changes are highlighted in green. Thank you!

Reviewer #1:
The authors investigated the impact of school built environments on pupils' learning progress. They collected a large amount of data and analyzed it comprehensively. Their findings and suggestions would be useful for improving primary schools' built environments. In particular, the Environment-Human-Performance (E-H-P) model would be useful for developing design parameters in school design, as the authors demonstrated in their manuscript. Additionally, there are some minor suggestions in the manuscript.
Thank you for these positive overall comments.
We are grateful for the "minor suggestions" listed below, which we have addressed individually (as set out below) with a resultant improvement in the paper.
a) The authors might want to avoid or reduce redundancy in their manuscript.
Where possible the text has been edited down.
b) The conclusion is long and could be shortened.
We have looked at this and feel that all the points are worth making to sum up the study, so we would like to leave it as it stands.
c) The authors used a multi-level model, the software package MLwiN, and '-2*loglikelihood', but their explanations were not clear enough to demonstrate its validity and the reason for choosing it in this study. The authors need to clarify them.
The rationale for and explanation of this approach have been strengthened in Section 3.5.
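Comment c) refers to '-2*loglikelihood'. The sketch below illustrates how that quantity is typically used. It is a hedged illustration, not a reproduction of the Section 3.5 procedure or of MLwiN output.

Two nested models fitted to the same pupils can be compared through the change in -2*log-likelihood (the deviance). Under standard conditions, this change is approximately chi-square distributed, with degrees of freedom equal to the number of added parameters. The function names and the example deviance values are hypothetical.

Two caveats apply:

* When fixed effects differ, the comparison assumes full maximum likelihood (in MLwiN, IGLS rather than RIGLS).
* When the added parameter is a variance, the naive chi-square p-value is conservative, because the null value lies on the boundary.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with positive integer df.

    Uses closed forms of the regularized upper incomplete gamma function Q(df/2, x/2)
    for integer and half-integer shape, so only the standard library is needed.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df: Q(k, z) = exp(-z) * sum_{i=0}^{k-1} z**i / i!, with k = df / 2
        term = math.exp(-z)
        total = term
        for i in range(1, df // 2):
            term *= z / i
            total += term
        return total
    # Odd df: start from Q(1/2, z) = erfc(sqrt(z)) and raise the shape by one per step
    total = math.erfc(math.sqrt(z))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-z) * z ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def likelihood_ratio_test(neg2ll_simpler: float, neg2ll_extended: float, added_params: int):
    """Compare two nested models via the change in -2*log-likelihood (deviance).

    Returns (change, p_value). Assumes both models were fitted by full maximum
    likelihood to the same data and that the extended model adds `added_params`
    parameters to the simpler one.
    """
    change = neg2ll_simpler - neg2ll_extended
    if change < 0:
        raise ValueError("the extended model should not have a larger -2*log-likelihood")
    return change, chi2_sf(change, added_params)


# Hypothetical deviances, for illustration only (not values from the study)
change, p_value = likelihood_ratio_test(1000.0, 995.0, added_params=1)
print(f"change in -2*loglikelihood = {change:.1f}, p = {p_value:.4f}")
```

With these made-up numbers, a drop of 5.0 in -2*log-likelihood for one added parameter gives p of about 0.025. The closed-form survival function avoids a SciPy dependency; `scipy.stats.chi2.sf` would serve equally well.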
d) The authors might want to review their manuscript thoroughly again. They use "this study", "this paper", "this project", etc. If these are different, they might want to clarify them. Alternatively, using one term consistently would help.
This has been checked and addressed. In almost all cases "this study" is now used. In a couple of isolated instances it is more accurate to say "paper" or "project", and so these have exceptionally been retained.
e) If the authors want to use an acronym of Naturalness, Individuality and Stimulation, "NIS" would be easier to understand than "SIN". Alternatively, the authors might want to change the order.
The introduction of SIN has been deferred until the conclusions to avoid any earlier confusion. However, we think the acronym is worth retaining: feedback indicates that it makes the three principles easier for people to remember, and the principles are quite a radical departure from the factors normally included to date.
However, there are still arguable issues in this study. Pupils' learning can be influenced by their socio-cultural and economic environments, not just their built environments. Although the authors include schools in various situations in their analysis, their investigations rarely controlled for socio-cultural and economic environments. For example, if the cases had similar socio-cultural and economic environments, yet different built environments, the findings on learning could be more applicable to school design.
The study does address the pupils' "sociocultural and economic environments" in several ways. Direct measures were not obtainable, but in the UK eligibility for Free School Meals (FSM) is a common measure of deprivation, and this was explicitly factored in at the pupil level. Beyond this, variation in these factors is to be expected given the areas chosen for the study (see 3.2). Any such impact not picked up by FSM is contained in the unexplained variation at the pupil level. These points are clarified in 3.1, 3.4 and 4.3.

Reviewer #2:
Overall observation: The paper explicates many potential social and building factors in educational settings that affect children's school performance, as measured by standardized test scores. First, the authors build a model of Environment-Human-Performance, which provides a holistic and contextually sensitive conceptual framework. Second, each of environment, human and performance is further operationalized. Environment is operationalized through their SIN model (Stimulation, Individualisation, Naturalness). Human is operationalized through various indicators that students bring, such as age, gender and special educational categories. Naturalness is operationalized through various indicators of IEQ, including visual environment and window views.
The true strength of the study lies in the complex conceptualization of human-environment interactions, as suggested by the E-H-P model, and in the use of MLM to reflect the nested nature of the school setting. The strong statistical results (portions explained by the model and weight of individual factors) also stand out. The paper can be further enhanced by attending to the following issues:
Thank you for these positive overall comments.
We are grateful for your careful reading of the paper and the issues raised with a view to the paper being "further enhanced". We have addressed these individually (as set out below) with a resultant improvement in the paper.
1. Methodology section needs major revision. As it stands now it is less well organized and hard to follow.
A range of improvements has been made, and these are spelt out below against the more specific comments.
2. The labelling of some important constructs is awkward and misleading (e.g., naturalness and natural environment). Rather than reinventing words for well-studied constructs, using widely accepted terminology is easier to read and understand.
The labelling we use is rooted in the neuroscience-informed basis of the conceptual model used. The development of this model has been published as a separate paper. That said, we have clarified the provenance and logic of the labels "naturalness", "individualisation" and "level of stimulation" in Section 1.2.
3. There are too many tables and diagrams, which are informative but nevertheless not central to the argument.
We have reviewed these and feel they aid the clarity of the paper; however, Table 6 has been removed.
4. Descriptions of measurement and of the reliability of the measurement tools are completely absent. In the context of modelling, such measurement needs to be explicitly described, and reliability data have to be provided.
Section 2.2 does explain how a hierarchy of principles, parameters, indicators and factors was created. However, to aid clarity further, an extra column has been added to Table 1 to provide information on the measures used for the factors. Section 3.3 then sets out how these measures were informed by a variety of data collection techniques and then represented on 5-point scales.
Line-by-line comments are provided below. Page numbers are missing from the manuscript, so my comments are provided section-by-section, then line-by-line.
Abstract: Could be reorganized to better reflect the conceptual strengths and overall structure of the study. For one thing, the study is informed by the SIN model, with its three major constructs of naturalness, individuality and stimulation. Each of these constructs was further divided into individual independent variables (IVs) of light, temperature, air quality, ownership, etc. These IVs were then operationalized further.
The way it is written does not really show the sequence of the conceptual process: The first part starts with individual IVs and the second part talks about three major constructs as if they are not related to each other.
Good point. The abstract has been reorganised.
Section 1.1: According to the description, it sounds as though the first stage was conducted some time ago, and the second stage reported in the paper is a separate process with a separate set of data, though informed by the first phase. However, Figure 1 seems to imply that the data sets from both phases are used for the analysis reported in the paper. Please clarify.
impact on various human behavior, they briefly mention the studies without really conveying how the studies demonstrated the complexity. A short expansion of what the major issues were in these studies would help readers understand the complexity / holistic approach portrayed in those studies and relate it to the current study.
Section 1.3, section on "Naturalness": The literature review in this section does not convey the authors' main point, but rather sounds like a laundry list. A literature review is not a list of what is going on in the field, but rather coherent support for the authors' argument. What is the main point that the authors are trying to convey by listing these studies?
Secondly, the current literature review relies mostly on review studies. A strong literature review will rely more on empirical studies than on summaries done by others.
Third, the word "Naturalness", which is one of the three major constructs of the study, does not appear to represent the environmental parameters measured. Most of the environmental parameters that the authors refer to here (acoustical environment, thermal quality, IEQ and visual exposure to nature) are classic examples of IEQ. I wonder why the authors needed to invent another name for the construct when there is a widely accepted term in the field. Arguably, IEQ and access to nature (or ambient environment) would give potential readers clearer expectations.
It is now made clear that the purpose of the section is to show that there are studies of individual aspects of schools and their impacts on learning. So there is plenty of potential for influence. However, after the three areas have been explored (necessarily quite briefly in the space available), the summing up makes it clear that the combined impact of these factors, when experienced together, is not known, and that this is what the study explores.
We now give strong examples of specific studies, although review studies still have a role.
We have explained why we used the word "naturalness" above and clarified this in the text. It does link to terms like IEQ, but it actually comes from an argument that places it alongside individualisation and stimulation. There is a logic to this typology, and IEQ may well expand its boundaries in future, so we would not want to use it as shorthand.
Equally all three aspects are to do with "ambient" environment, so we do not want to change this term and cause confusion.
Section on "Individualisation": Like the previous section, this section needs an operational definition of the construct, clearly stated upfront. What do the authors mean by individualization? Instead, the section simply starts by stating that "an optimal built environment" benefits students in some way or another.
in better individualization / personalization.
Section on "Stimulation": This is an interesting addition to the commonly studied factors in the classroom environment. Again, in this section of the literature review, the authors need to include primary studies rather than reviews.
Dealt with as above with preamble and more focused refs.
Fragmented sentence addressed.
Section 1.4: I am not sure whether this section is needed at all, especially when the overall structure is not much different from a conventional research report. Consider removing it.
These few lines could be deleted, but we feel brief orientating statements like this aid readers in navigating the paper.
So far, the authors have provided neither main research questions nor a set of hypotheses. Preferably, these components, together with definitions of the major constructs studied, should come at the very start of the introduction.
The Aim of the study is given at the very start of the paper, in 1.1. The research challenge is then developed in 1.2, together with the proposed conceptual approach, and 1.3 sets out the link to existing knowledge and the gap to be addressed. This seems to us a reasonable lead-in to the point where hypotheses can be introduced. We can see that the main hypothesis for the study is more implied than stated, so it has now been made explicit at the end of Section 1.2.
Section 2.1: A holistic model that supports the study well overall.
Thank you.
Section 2.2: From the description, it is not clear why the authors named their theoretical model E-H-P. Likewise, the table that this description refers to does not provide any insight into why the authors use this term. How are these related to the concept of SIN? It appears that SIN is part of Environment, but what about H and P? Does H refer to socioeconomic characteristics and P to school performance measured by NC score? If so, state this clearly. While this section talks extensively about hypothesized factors in pupils' learning progress, it does not offer a hypothesis itself. The authors need a clear directional hypothesis for the overall model, with a strong emphasis on the holistic model provided in the previous section. For the same reason, the term "the creation of these hypotheses" in line 12 should be rewritten as "the creation of these hypothetical factors".
Clearly E-H-P has caused confusion. To avoid this, we have dropped the term and instead used the well-known Environment-Behaviour (E-B) concept, including a reference to its past use in a similar study.
Section 3: The first two lines of this section appear unnecessary, as this is normally expected.
These could be deleted, but we feel brief orientating statements like this aid readers in navigating the paper.
Section 3.1: Plenty of climatic information about the studied sites provides good background to the study.
The explanation in the third paragraph is very confusing, especially for those who are not familiar with the UK educational system. It would be easier if the description started with what a Key Stage is and how many there are within the entire educational period, and then described the transitions between them. It would also be helpful to state that the study focuses only on the first two Key Stages (excluding the reception year).
The explanation of the UK terminology has been clarified. KS1 and KS2 cover the whole of the primary school population, and this is now emphasised. The exclusion of the reception year is explained later, in 3.4, and this seems appropriate to us.
Which Key Stages did the study focus on? These explanations, which appear sporadically throughout the study, should be described succinctly in one place around here. As it stands now, it is hard to grasp not only the UK school system but also the study samples unless readers scan through the entire sections multiple times. This is all the more important given the international scope of the journal.
The clarification on Key Stages mentioned above deals with this.
Section 3.3, Classrooms: Again, I highly recommend that the authors standardize the grade notation system throughout the paper. The authors sometimes use age, year or KS in ways that are not clearly related to each other. As indicated earlier, provide a full description of the grade system along with age at the beginning of the methods (i.e., KS-year-age), and choose one notation system and use it consistently throughout the paper.

Done.
First page line 46 to second page line 11: This section needs to be reorganized to better explain the links between design principles (SIN), design parameters, indicators and factors, and how these factors were operationalized and measured. It could potentially be combined with Table 1, where most of the information is already present. Perhaps adding a column to explain how each factor was measured would do it, along with brief descriptions in the main body. As it stands now, the bulleted paragraph is very incomplete in terms of necessary information. Missing information needed to make a judgment about construct validity and reliability includes: 1) How were the glazing orientations converted to quantitative data? Did southern exposure have a higher score than, for example, north? Or were they treated as categorical data?
Table 1 has now been augmented with information about the assessment criteria used, as suggested by this reviewer.
This has been simplified and the headings of Table 4 clarified.
Lines 49-56: Some of the missing descriptions I indicated in the previous comments actually appear here. Switching this paragraph with the previous one would improve the logical sequence.
We think this now flows, given the changes above. We have, however, rationalised the terms used for clarity.
Line 58 to next page line 5: Again, this paragraph is hard to follow unless the reader is already familiar with the UK education system. The main body talks about the NC point system and the P scale, but Table 4, which the main body refers to, does not use any of that language. Instead it uses two new terms, "TA level" and "Points". What does TA stand for? Does "Points" refer to the "P-scale"? What is this P-scale?
This is complicated, but it needs to be given for clarity in our approach. That said, we have again rationalised the terminology for clarity.
Lines 16-18: The latter part of the sentence is confusing. I am not sure what this sentence means.
The wording has been simplified. This element will be of particular interest to educationalists.
Lines 28-29: It is not critical, but it would help potential readers grasp the overall characteristics of the pupil population if the authors provided the percentages of these special students in addition to the actual numbers.

Thank you
Second page line 4: I am not sure about the term "novel". Any statistical analysis is novel (including simple mean and standard deviation) if it serves the purpose. Perhaps "complex" or a similar term would be more appropriate.
"Unusual" has been substituted.
Line 15: The authors switch between two-level and three-level analysis until well into the results, where they report that the third level was dropped. Reporting here that the study started out with a three-level model but ended up with a two-level model would help readers follow the line of argument.
This has now been made clear.
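A generic random-intercept specification makes the nesting discussed in this exchange concrete. It is illustrative only and is not the fitted model reported in the paper. The symbols are assumptions for this sketch:

* $y_{ijk}$ is a progress score for pupil $i$ in classroom $j$ in school $k$.
* $\mathbf{x}_{ijk}$ stands for pupil-level covariates, such as the FSM indicator mentioned in the response to Reviewer #1.
* $\mathbf{z}_{jk}$ stands for classroom-level design factors.

```latex
% Three-level starting point: pupils (i) within classrooms (j) within schools (k)
y_{ijk} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ijk} + \boldsymbol{\gamma}^{\top}\mathbf{z}_{jk}
          + v_k + u_{jk} + e_{ijk},
\qquad v_k \sim N(0,\sigma_v^2),\quad u_{jk} \sim N(0,\sigma_u^2),\quad e_{ijk} \sim N(0,\sigma_e^2)

% Dropping the school term (v_k, i.e. fixing \sigma_v^2 = 0) leaves the two-level model
y_{ij} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{z}_{j}
         + u_{j} + e_{ij}
```

Under this generic form, pupil-level variation not captured by $\mathbf{x}$ ends up in $e$, and classroom-level variation not captured by $\mathbf{z}$ ends up in $u$.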
Lines 26-28: I am not sure "In passing" is a common expression in academic journals. Perhaps the sentence would work just fine without it.
Deleted.
Line 54: For the significance level of p<0.10, is this a conventional threshold in MLM? It appears to be higher than the commonly accepted p-level.
The rationale for this is now explained.
Line 58 on the first page and line 16: Why did the authors use a "step-up" procedure and a "top-down" process? The explanation does not need to be lengthy, but the rationale and implications should be briefly noted.
The rationale for the two approaches and their complementary use is given.
Section 4.1, first page line 52: "In the formulation of the light parameter the highest quantity of natural and electrical light, but without direct sunlight, was found to be optimum". It is an interesting finding, but nowhere in the previous sections was it mentioned that direct sunlight was measured. Is this a combination of orientation and control?
The findings about individualisation and stimulation are intriguing.
This detail re the light parameter now appears much earlier in Table 1.

Section 4.2, lines 28-35:
This is an interesting finding and may reflect an important characteristic of the UK educational system. In the US, school districts are funded by local property taxes, resulting in very uneven quality across social strata. I suspect the finding can be applied to the US system. A little background information about the UK system would be greatly appreciated by international readers.
Also, the fact that the school level analysis was dropped should be reported early when the authors describe analysis procedure in the methodology section.
We don't think this is UK-dependent; rather, it links to the fact that primary school pupils (as opposed to secondary school pupils) spend most of their time in one classroom. We stress this in the conclusions. Where the same holds in other countries, the finding could well translate.
We don't think it is about social strata, and have said more about these factors in response to Reviewer 1.
We now indicate earlier that the school level was dropped.
Section 5, line 8: I am not sure "natural experiment" is appropriate here, since no form of experiment was employed. It reminds me of "quasi-experiment", but this study is clearly different in terms of site selection. I think terms like "natural inquiry" or "contextually sensitive inquiry", or something along these lines, would be more appropriate.
We have changed this to "natural inquiry" as suggested.
6.1: This section is well organized and easy to follow. Perhaps the introduction could replicate the structure of this paragraph. Line 40-147: This is a very clear summary and it is easy to follow.
Thank you.
6.2, line 54 to next page line 11, including Figure 4: The word "naturalness" continues to create the wrong impression that the study somehow investigated the natural environment rather than the built environment. In Figure 4 in particular, "Natural Environment" should be termed the physical or ambient environment, and naturalness should be IEQ, an already well-established and widely accepted term.
We have changed Fig 4 (now 3) to leave out "natural environment" as we can see how this could be confusing. However, we have retained "Naturalness" (as a feature of the spaces studied) for the reasons given above on query (2).
Line 26: Avoid fragmented sentence.
Done.
Lines 32-44: Again, a very intriguing finding. Would any characteristics of the UK system contribute to this finding?
We have responded to this in our reply to the query on Section 4.2 above.
Line 57 (Table 13): This table could be very useful for design-minded readers.
Thank you.
Section 6.3: While I agree with the set of suggestions, the proposed directions do not seem to be particularly derived from the current study; they read as a cliché.
This is one very small part of the suggestions and is valid. We would want to retain it, as designers in particular are always asking for some action research to "test" our findings.

Overview
This paper reports the final results of the HEAD (Holistic Evidence and Design) study of the impact of primary school design. The Aim of the project was: "To explore if there is any evidence for demonstrable impacts of school building design on the learning rates of pupils in primary schools". This is a focused study of a general issue, namely the impact, in practice, of physical spaces on human health and wellbeing. Primary schools are a good focus for addressing this knotty problem because: the pupils spend most of their time in one space (the classroom); there are available measures of their (in this case academic) performance; and maximising pupils' achievement is an important societal issue. Phase 1 of the project was reported in 2013 [1] and included 751 pupils from seven schools in the Blackpool area of the UK. In Phase 2, data were collected in two further geographical locations in the UK and combined with the Phase 1 data. This increased the sample size by around a factor of five and incorporated many more schools, classrooms and pupils (see Figure 1).

The research challenge / hypothesis
Internal environment quality (IEQ) research has understandably focused on the readily measurable aspects of heat, light, sound and air quality. Although impressive individual sense impacts have been identified, Kim and de Dear [2] argue strongly that there is currently no consensus as to the relative importance of IEQ factors for overall satisfaction. In parallel, a literature and area of practice has developed around "building performance", with a wide variety of typologies on offer [3,4]. The intelligence gained should feed forward into new designs; however, post-occupancy evaluations (POEs) are not commonplace, and the lessons learnt are not generally available for use in practice [5]. A recent benchmark for whole-life Building Performance Evaluation (BPE) [6] makes clear that BPE aspires to objectivity using "actual performance of buildings [assessed through] established performance criteria … objective, quantifiable and measurable 'hard' data, as opposed to soft criteria … qualitative … subjective" (pp27-28). In practice, however, this is difficult: hardly any of the collected chapters actually deliver such evidence, and the most common approach is occupant surveys / interviews (p169). Some specific aspects linked to "real" impacts have gained traction, for example Ulrich's [7] classic evidence of the positive healing effects of views of nature. But progress from this promising start still falls a long way short of comprehensively addressing the complexity of the design challenge. The difficulty of studying multiple dimensions is illustrated by the problems encountered when the impressive Heschong Mahone [8,9] daylighting studies were extended to include other issues. The initial Heschong Mahone study [8] found that children in classrooms with the most daylighting and the biggest windows progressed approximately 20% faster in maths and reading.
The follow-up study [9] included thermal comfort, air quality and acoustic measures along with daylighting, but concluded that the issue was more complex, with daylighting having both positive and negative effects on learning. The difficulty is also evident in Tanner's struggle to analyse the multiple aspects affecting learning rates in schools. His 2009 paper [10] is a second, more successful attempt to structure more cleanly the possibly important design factors first mooted in his 2000 analysis [11].
So there exists an important research challenge around the issue of better understanding, and evidencing, the holistic impacts of spaces on users. The work described here represents a radical exploration of a new direction. Rather than build up from the measurable dimensions of heat, light, sound and air quality, we have taken as a starting point the simple notion that the effect of the built environment on users is experienced via multiple sensory inputs in particular spaces, which are resolved in the users' brains. These mental mechanisms can provide a basis for understanding the combined effects of sensory inputs on users of buildings at a level of resolution where "emergent properties" [12] may be evident. Until recently the only exemplar study using this sort of thinking was focused on Alzheimer's care facilities [13]. The implication is that the broad structuring of the brain's functioning can be used to drive the selection and organisation of the environmental factors to be considered, not just their inherent measurability. Drawing from Roll's [14] detailed description of the brain's implicit systems, a novel organising model has been developed and proposed [15]. It reflects: the human "hard-wired" response to the availability of healthy, natural elements of our environments; our desire to be able to interact with spaces to address our individual preferences; and the various levels of stimulation appropriate to users engaged in different activities. Thus three dimensions, or design principles, have been used to suggest and structure the factors to be considered, namely:
• Naturalness: light, sound, temperature, air quality and links to nature;
• Individualisation: ownership, flexibility and connection;
• Stimulation (appropriate level of): complexity and colour.
Within this structure, the full range of relevant factors (e.g. light, layout, etc.) that might be elements of "good" design for a particular scenario (school) can be grouped, providing a clear and balanced set of factors to be tested. These go well beyond the usual "big four" of heat, light, sound and air quality. The utility of this approach depends, of course, on whether it allows clearer insights to be derived through practical research.
The underpinning hypothesis is that pupils' academic progress will be dependent on a full range of factors drawn from across all three of the design principles.
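The principle-to-parameter grouping above is a simple two-level hierarchy. The sketch below encodes it as a lookup table and checks whether a given list of parameters draws on all three principles. That is only the structural precondition of the hypothesis, not a test of it.

The parameter names come from the bullet list above, and the seven-parameter list comes from the abstract. The data layout and function names are illustrative assumptions, and the lower levels of the hierarchy (indicators and factors) are omitted.

```python
from collections import Counter

# Design principles (SIN model) and the design parameters grouped under each,
# as listed in the bullet points above. Indicators and factors are omitted.
SIN_MODEL: dict[str, tuple[str, ...]] = {
    "Naturalness": ("Light", "Sound", "Temperature", "Air Quality", "Links to Nature"),
    "Individualisation": ("Ownership", "Flexibility", "Connection"),
    "Stimulation": ("Complexity", "Colour"),
}

# Reverse lookup: parameter -> principle (each parameter sits under exactly one principle)
PRINCIPLE_OF: dict[str, str] = {
    param: principle for principle, params in SIN_MODEL.items() for param in params
}


def principles_covered(parameters: list[str]) -> set[str]:
    """Return the set of design principles that a list of parameters draws on."""
    return {PRINCIPLE_OF[p] for p in parameters}


def count_by_principle(parameters: list[str]) -> dict[str, int]:
    """Count how many of the given parameters fall under each principle."""
    return dict(Counter(PRINCIPLE_OF[p] for p in parameters))


# The seven key parameters named in the abstract
KEY_PARAMETERS = ["Light", "Temperature", "Air Quality", "Ownership",
                  "Flexibility", "Complexity", "Colour"]

# True when the list spans all three principles
spans_all = principles_covered(KEY_PARAMETERS) == set(SIN_MODEL)
print(spans_all, count_by_principle(KEY_PARAMETERS))
```

For the abstract's seven parameters the check is satisfied, with three Naturalness parameters and two each under Individualisation and Stimulation. These are counts of parameters, not their share of the impact on learning, which the abstract reports separately.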

Existing research on aspects of learning environments
Using the three-part structure above, a brief summary of relevant research findings on the impacts of various elements of school environments is provided below. Empirical studies of the individual factors that appear to influence pupils' performance and well-being are summarised here and will be compared with the findings of this study in the 'Discussion' (Section 5).
Naturalness: The Naturalness principle relates to the environmental parameters that are required for physical comfort. These are light, sound, temperature, air quality and 'links to nature'. Children's learning environments have particular requirements here, and each of the parameters has been individually researched. Natural light is known to regulate sleep/wake cycles [16], and the optimum level of daylighting is still an area of active research [8], [9], [10]. With regard to classroom acoustics, Crandell and Smaldino [17] define the important metrics, and Picard and Bradley [18] note that noise levels in classrooms are usually far in excess of optimal conditions for understanding speech. It has been shown that for 10-12-year-olds, numerical and language test speeds increased when temperature was reduced slightly and ventilation rates were increased [19]. Daisey et al. [20] conclude that ventilation rates are inadequate in many schools and that there is a risk to health. Research also suggests profound benefits of the experience of nature for children, owing to their greater mental plasticity and vulnerability [21,22].
Individualisation: The Individualisation principle relates to how well the classroom meets the needs of a particular group of children. It is made up of the Ownership, Flexibility and Connection parameters. Ownership is a measure of how identifiable and personalised the room is. Flexibility is a measure of how well the room addresses the needs of a particular age group and any changes in pedagogy. Connection is a measure of how readily the pupils can connect to the rest of the school. Research in this area focuses on how to create a personally optimised built environment that can benefit a pupil's learning process and behaviour. For example, it is argued that intimate and personalised spaces are better for absorbing, memorising and recalling information [23]. When children feel ownership of the classroom, it appears the stage is set for cultivating feelings of responsibility [24]. Classrooms and hallways that feature the products of students' intellectual engagements (representations of academic concepts, projects, displays and construction) have also been found to promote greater participation and involvement in the learning process [25]. Building Bulletin 99 (2006) [26] specified that flexibility must be a key design requirement within the brief. Flexibility is needed to allow for different activities within the classroom and / or the needs of different users. The inclusion of Connection within Individualisation is supported by Tanner [10] and Zeisel et al. [13]. Both emphasise that clearly marked pathways to activity areas improve utilisation of space and performance metrics.
Stimulation: The Stimulation principle relates to how exciting and vibrant the classroom is. It has two parameters: Complexity and Colour. Colour is straightforward, but encompasses all the colour elements in the room. Complexity is a measure of how the different elements in the room combine to create either a visually coherent and structured environment or a random and chaotic one. It has been suggested that focused attention is crucially important for learning, and that maintaining it in classroom environments may be particularly challenging for young children, because the visual features of the classroom may tax their still-developing and fragile ability to actively maintain task goals and ignore distractions [27]. Colour research shows that room colour affects both emotions and physiology, causing mood swings that can have an impact on performance [28].
Clearly, the literature suggests that the built environment of classrooms can be expected to have a substantial impact on pupils' academic performance, health and wellbeing. However, how these aspects act in combination has, up to now, been unclear. In other words, how the sort of factors discussed above behave in the context of all of the others adds a level of complication that has obscured the contribution of the physical space, despite all of the atomised evidence. Thus, the Education Endowment Foundation, in its well-respected reviews of factors influencing pupil learning, concluded in 2014 that: "changes to the physical environment of schools are unlikely to have a direct effect on learning beyond the extremes." [29].
The HEAD Project seeks to bridge the gulf between what is a high level of confidence in the literature about some of the different elements, and a lack of convincing evidence concerning their combined effects in practice.

Structure of the paper
The next section (2) picks up this challenge by setting out the distinctive conceptual approach taken within the HEAD Project. Section 3 turns to methods and sets out the sample used and provides an explanation of the multi-level modelling approach employed. Section 4 gives the results and these are discussed in the context of the existing literature in Section 5. Finally, conclusions are drawn in Section 6.

Overview of planned methodology
Drawing on the discussion above, Figure 2 places the individual pupil at the centre of the analysis, with a vertical flow from their academic starting position and individual characteristics, through their year spent in the classroom, to the output in terms of their academic improvement (and possibly other outcomes too, such as behaviour). This individual journey is sandwiched between non-built environment factors, such as the effect of teachers, and the built/physical features of the school environment. The latter draw on the full wealth of possible aspects, but are structured into the typology of naturalness, individualisation and stimulation. To operationalize these physical factors, it was necessary to create a coherent range of measurable factors that could be hypothesised to have impacts on learning progress. This process is described in the next subsection. The research approach adopted calls for diversity in the sample across all of the elements of the above model, so that there is an opportunity to reveal the impacts of variations in the factors. This aspect of the study is covered in Section 3, together with the use of multilevel modelling (MLM) to isolate the individual pupil effects from the impacts connected to the school built environment (BE).

Environment-behaviour (E-B) model
Following the approach taken by Zeisel [13], an "Environment-Behaviour factors model" was built drawing on the available literature, but also informed by preparatory surveys of pupils [30], teachers [31] and post-occupancy evaluations of schools [32]. The E-B model was first structured by the three main "design principles", namely naturalness, individualisation and stimulation. Each of these was then broken down into "design parameters", of which there are ten in total, and these in turn were expanded into eighteen more detailed "indicators". These were then underpinned by thirty more detailed, measurable "factors". Table 1 summarises these different levels down to the design factors thought to impact on a pupil's learning progress, including the criteria for a high rating in each case.

Data collection and statistical methodology
This section describes: the sample selection, driven by the desire for variety in our studied variables; the way measures were constructed; and the approach taken to the analysis.

Geographical / national context
All investigated schools are in England, UK. England has a temperate maritime climate owing to its proximity to the relatively warm Atlantic Ocean, and lies in the path of prevailing westerly winds. It has mild temperatures, with warm summers, cool winters and plentiful precipitation throughout the year, rather than seasonal extremes of hot and cold. This study focused on learning progress in a single academic year: 2011-2012 (Blackpool) and 2012-2013 (Hampshire and Ealing in London). From UK Met Office data, the average annual temperature for those two years was 10.1 °C, varying from 4.5 °C in January to 16.0 °C in August. The average monthly rainfall was 76.6 mm. December was the wettest month in both years, with 103.9 mm (2011) and 148.9 mm (2012) of total rainfall. By contrast, April 2011 and March 2012 were the driest months (11.6 mm and 26.5 mm respectively). Total sunshine hours were similar in both years: 1553.3 hours in 2011, 98.2 hours more than in 2012. Although it is difficult to be precise owing to within-area variations, these three local authority areas represent a broad spread of socio-economic conditions. Education in England is overseen by the Department for Education. For primary schools, Local Authorities (LAs) take the great majority of the responsibility for implementing policy for public education and state schools at a local level. Children start primary school either in the year, or the term, in which they reach five years old. All LA schools are obliged to follow a centralized National Curriculum (NC), with an emphasis on reading, writing and arithmetic.
In the earlier years at primary school, made up of a "reception" year, year 1 and year 2 and known in the UK as Key Stage 1 (hereafter KS1), pupils are introduced to learning with an emphasis on play. During the last four years at primary school, that is years 3 to 6 and known as Key Stage 2 (hereafter KS2), the approach progressively becomes more formal. In many schools this transition is gradual across the year groups. Throughout, mainstream schools generally use a "mixed teaching methods" approach, utilising different learning zones to varying degrees to support combinations of didactic, independent and group learning.

Schools
In UK schools, primary pupils spend the majority of their time in one classroom, making this age group the ideal focus for this study. Building on an initial pilot phase [1], this study collected data from 30 schools in total, in three local authority areas in the UK. The pilot study looked at 10 schools within the Blackpool local authority. Blackpool is a coastal town in the North-West of England with relatively high rates (approximately 30%) of child poverty. To increase the size and variety of the sample, ten diverse schools were additionally selected from the Hampshire local authority area. Hampshire is primarily a rural area in southern England, which includes the coastal city of Portsmouth. It has, on average, low levels (approximately 11%) of children on Free School Meals (FSM), which is a measure of child poverty used regularly in the UK. The third, very different, area chosen was Ealing, in Outer (West) London. Ten more schools were selected in this urban area, which has high-density housing and high levels of children with English as an Additional Language (EAL). These pupils often speak a different language at home and can start formal education with little or no knowledge of English.
The 30 schools within the study were chosen to have a wide spectrum of different architectures, built at different times and of different sizes. Two schools in Blackpool were "special" schools and were not used in the final analysis (Schools 2 and 10), and one dropped out part way through for local reasons (School 1). The remaining 27 schools ranged from small, mixed year group village schools with 103 pupils to multi-year intake schools with 819 pupils. The ages of the buildings ranged from Victorian (circa 1880s) to post-2000 builds. Among other metrics, school site area was also measured; the smallest was 858 m² and the largest was greater than 40,000 m² (Table 2). There is clearly a good diversity of physical characteristics amongst this sample.

Classrooms
The aim at the outset was to include the widest possible range of classrooms. However, in many reception classes it was not possible to obtain pupil performance measures comparable to those in the later years. Consequently, of the 203 classes studied, only 153 classes from Years 1-6 were used in the final analysis.
The architectural data collection consisted of two complementary surveys in each school, carried out on the same day: a very detailed survey of each selected classroom and a whole-school survey, taking measures of shared spaces, e.g. libraries, assembly halls, gyms and outdoor areas. In the classroom survey:
• Hard measures were taken, such as room dimensions, size of windows, placement of doors and the Interactive whiteboard (IWB), desk arrangement and learning zone layouts. A range of further factors was assessed in each classroom to create a database of measurements covering all of the hypothesised "indicators" in play. These included aspects such as: how much control there was over the classroom environment, for example the presence of a radiator thermostat or air conditioning; how the children used the space, whether they had their own coat pegs and the quality of the desks and chairs; and the colour of decorations and complexity of displays within the classroom. The measures are summarized as the factors in Table 1, and the creation of the metrics for each is discussed below.
• In addition, five spot meter readings were taken in each room to assess the environmental conditions at the time of the visit: lighting level, CO₂ level, temperature, noise level and relative humidity. These measurements gave the researchers an enhanced opportunity to identify potential problem areas; however, they were not used directly in the metrics created.
• Lastly, a questionnaire-based interview was completed, investigating each teacher's experience of their classroom. These questions sought the teachers' opinions of how the teaching spaces performed over the whole year (as opposed to the spot measurements above). They covered issues such as whether glare was a problem, and if so when. Again, the responses to the teachers' questionnaires were not used in the metrics that produced the final results of this study; however, they did help the researchers to highlight potentially important factors to consider.
For each of the factors in Table 1, a 5-point rating scale was used to assess, drawing on the above data, the characteristics of the factor over the study year. As far as possible this employed simple physical measurements, such as the size and orientation of the windows in relation to daylighting. However, for some factors it was necessary to employ "expert judgement" to give a comprehensive treatment of all of the hypothesised factors. One area where such judgement had to be used concerns the visual complexity of displays. Experimenter bias / internal validity was addressed by separate researchers making assessments, then comparing them and establishing a consistent approach, in this case based on assessing both coverage and coherence. As an indication of how the ratings were scored, Table 1 shows the criteria that make up the highest rating in each of the factor categories. The factor scores were averaged to build the ten HEAD design parameters: Light, Sound, Temperature, Air Quality, Links to Nature, Ownership, Flexibility, Connection, Complexity and Colour. Descriptive statistics for the HEAD design parameters are shown in Table 3, where it can be seen that the sample again displays a good level of variation in all of the factors.
Table 3 Basic metrics of the classroom sample.
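As a minimal illustration of the averaging step above, the Python sketch below turns one classroom's 1-5 factor ratings into design parameter scores. The ratings, the grouping of factors and the function name `parameter_scores` are hypothetical; the real factor lists are those in Table 1, and the sketch assumes a simple unweighted mean.

```python
# Minimal sketch: average 1-5 factor ratings into design parameter scores.
# Ratings and groupings are hypothetical placeholders, not Table 1 data.

classroom_ratings = {
    "Light": [4, 3, 5],
    "Temperature": [3, 4],
    "Colour": [2],
}


def parameter_scores(ratings):
    """Unweighted mean of the factor ratings within each design parameter."""
    scores = {}
    for parameter, factors in ratings.items():
        if not factors:
            raise ValueError(f"no factor ratings for {parameter}")
        if any(not 1 <= r <= 5 for r in factors):
            raise ValueError(f"rating outside the 1-5 scale for {parameter}")
        scores[parameter] = sum(factors) / len(factors)
    return scores


print(parameter_scores(classroom_ratings))
```

An unweighted mean treats every factor within a parameter as equally important; any weighting scheme would be a further assumption.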

Pupils
The HEAD project surveyed 203 classrooms from 30 schools and collected performance statistics from 4924 pupils. Data used in the final results came from 153 classes in 27 schools and 3766 pupils. For each pupil it was essential to identify the specific classroom they had occupied, so that in the analysis "pupil effects" could be distinguished from "classroom effects". The pupils were in Years 1 to 6. The data needed for the study were the pupil grade at the start of the academic year and the pupil grade at the end of the year. Grades were collected for three subjects: Reading, Writing and Maths.
Children in KS1 are assessed using a variety of performance systems. National Curriculum (hereafter NC) levels start at Level 1c, with an equivalent NC point score of 7 (Table 4), so children working at or above this NC level were used in this study. Some schools also used P scales at KS1, and again these data were used. However, some children were assessed on a 9-point Foundation Stage Profile which had been introduced, but then rapidly replaced by a much simpler 3-point version. For KS1 pupils in this study, the later 3-point scale was found not to include enough detail to place pupils on the NC equivalent points system, so these pupils were not used. It was also common to find schools recording progress as 'working towards', which again could not be used.
Table 4 Conversion of National Curriculum (NC) levels to NC points.
UK pupils throughout KS2 are normally assessed using the NC levels shown in Table 4. Each NC level has 3 sublevels (denoted by a, b and c), and on average pupils are expected to progress by 2 sublevels per year in each subject. National tests are taken at the end of Year 2 (KS1 test) and at the end of Year 6 (KS2 test). An average pupil is expected to be at level 2b at the end of KS1 and to progress to level 4b by the end of KS2. For pupils at KS2 who have been assessed as having special educational needs, a P scale, which leads into the NC levels, is used (see Table 4). For pupils in KS2 who have English as an Additional Language (EAL), teachers use a separate 5-point EAL scale (not shown).
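Under the assumptions stated in the comments, the level-to-points conversion can be sketched as follows. The constants and the function `nc_points` are illustrative rather than a reproduction of Table 4, which remains the authoritative mapping; P scales and the EAL scale are not covered.

```python
# Sketch: convert an NC sublevel such as "2b" into NC points.
# Assumes 1c = 7 points (as stated in the text) and 2 points per sublevel,
# which matches "12 points = two sublevels in each of three subjects".

SUBLEVEL_OFFSET = {"c": 0, "b": 1, "a": 2}  # c is the lowest sublevel
POINTS_AT_1C = 7
POINTS_PER_SUBLEVEL = 2  # assumption, see note above


def nc_points(level: str) -> int:
    """Return NC points for a sublevel string like '1c', '2b' or '4a'."""
    number, letter = int(level[:-1]), level[-1].lower()
    if number < 1 or letter not in SUBLEVEL_OFFSET:
        raise ValueError(f"not an NC sublevel: {level!r}")
    steps_above_1c = (number - 1) * 3 + SUBLEVEL_OFFSET[letter]
    return POINTS_AT_1C + POINTS_PER_SUBLEVEL * steps_above_1c


# Average expected levels at the end of KS1 and the end of KS2.
print(nc_points("2b"), nc_points("4b"))
```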
For analyses of performance statistics, the NC levels were converted to an NC points score as given in Table 4. For EAL pupils below the 4th point on the EAL scale there is no equivalent NC points score, so these pupils, who have no verbal or written skill in English, were not used. Pupils at the 4th and 5th EAL points are considered to be working at the low end and high end of NC level 1 respectively, so were converted to level 1c and level 1a. For each pupil, the NC points at the start and end of the year were used to create a measure of pupil progress in NC points. The progress points for the three subjects (Reading, Writing and Maths) were added together to create an Overall Progress score. Overall Progress is the dependent variable in our regression analysis. It has been grand mean centred over all 3766 pupils. The summary statistics for the learning measures used are given in Table 5. The mean progress for the pupils in the survey population is 11.90 NC points, where 12 NC points would equate to two sublevels in each of the three subjects, that is, the "expected" progress mentioned previously.
Table 5 Descriptive statistics for pupil NC points score.
To enhance the analysis of factors associated with individual pupils, schools were also asked to provide extra contextual data: date of birth, gender, date of first class of the year, date of last class of the year, attendance rate, and whether the pupil fell into any of the government classifications of Free School Meals (FSM, a measure of deprivation), EAL or Special Educational Needs (SEN). Date of first and last class and attendance rate were collected so that pupils could be excluded from the study where they had poor attendance or had not been in the class for the whole year of study. In total there were 669 pupils (18%) rated as SEN, 874 (23%) with EAL, and 775 (21%) with FSM status.
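The EAL conversion rule and the Overall Progress score described above can be sketched as below. The point values 7 (level 1c) and 11 (level 1a) assume 2 NC points per sublevel, and the helper names and pupil data are hypothetical.

```python
# Sketch: EAL conversion and Overall Progress as described in the text.
# NC point values for 1c and 1a (7 and 11) assume 2 points per sublevel.

EAL_TO_NC_POINTS = {4: 7, 5: 11}  # EAL point 4 -> level 1c, 5 -> level 1a
SUBJECTS = ("reading", "writing", "maths")


def eal_to_nc_points(eal_point):
    """NC points for an EAL-scale point, or None if the pupil is excluded."""
    return EAL_TO_NC_POINTS.get(eal_point)  # points 1-3 have no NC equivalent


def overall_progress(start, end):
    """Sum of end-of-year minus start-of-year NC points over the three subjects."""
    return sum(end[s] - start[s] for s in SUBJECTS)


# Hypothetical pupil making the "expected" two sublevels in every subject.
start = {"reading": 13, "writing": 13, "maths": 15}
end = {"reading": 17, "writing": 17, "maths": 19}
print(overall_progress(start, end))
```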
As a starting point in the study, several pupil factors had to be controlled for. Because pupils learn at different rates from year to year over their school life, a child's start grade compared to the average start grade in their year group is a key indicator of their potential progress. Start grade was therefore group mean centred on age (a proxy for year group, since pupils in the UK are almost always taught in classes of the same age) and is termed 'Weighted Start-on-age' in this study. The start grade was also grand mean centred on the whole dataset to form a second explanatory variable, which relates to how far a pupil is along their learning journey through the KS1 and KS2 syllabuses. This is termed the 'Weighted Start'. Other explanatory variables, such as gender, FSM, EAL and SEN, are straightforward. Two further variables were also created for the study: Actual Age, which is the grand mean centred age of the child in months, and Months Age, which is the number of months the child is past their birthday at the start of the academic year. The latter gives the pupil's relative age in months compared to their year group, that is, whether they were "old" or "young" in their year.
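The two centring schemes and the age variables can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that age groups stand in for year groups and that the reference date is the start of the academic year (for example 1 September; the study also collected each pupil's date of first class, which could be used instead).

```python
from collections import defaultdict
from datetime import date


def grand_mean_centre(values):
    """Subtract the mean of the whole dataset (as for 'Weighted Start')."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def group_mean_centre(values, groups):
    """Subtract each group's own mean (as for 'Weighted Start-on-age')."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def age_in_months(born: date, on: date) -> int:
    """Completed months of age on the reference date."""
    months = (on.year - born.year) * 12 + (on.month - born.month)
    if on.day < born.day:
        months -= 1  # this month's anniversary has not been reached yet
    return months


def months_past_birthday(born: date, year_start: date) -> int:
    """Months since the last birthday at the start of the academic year."""
    return age_in_months(born, year_start) % 12


# A September-born pupil is among the oldest in the year group.
print(months_past_birthday(date(2005, 9, 15), date(2012, 9, 1)))
```

Applying `grand_mean_centre` to the output of `age_in_months` would give a variable in the spirit of Actual Age.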
As a final step in creating the pupil variables for the study, the Overall Progress, Weighted Start-on-age, Weighted Start, Actual Age and Months Age variables were 'normalized'. This process involved calculating the deviation of each datum from the mean of the data set and then dividing by the standard deviation of the data set.
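The normalisation described above is a standard z-score transformation. A minimal sketch using only the Python standard library follows; whether the study used the sample or the population standard deviation is not stated, so the choice of `statistics.stdev` here is an assumption (`statistics.pstdev` would give the population version).

```python
from statistics import mean, stdev


def standardise(values):
    """Return z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    sd = stdev(values)  # sample SD; the text does not say sample or population
    if sd == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - m) / sd for v in values]


print(standardise([2, 4, 6]))
```

After this step each variable has mean 0 and standard deviation 1, which is what makes the model coefficients comparable in size.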
Again, it can be seen that the pupil population displays considerable variety across the measures used, including in features such as FSM, EAL and SEN.

Modelling strategy
The analysis followed two broad steps. First the influence on learning of each of the factors being studied was addressed separately through bivariate analysis. Then, once the measures likely to be in play had been identified, and any inadvertent inter-correlations had been minimised, a multi-level analysis of their combined effects was carried out. This latter part is the more unusual and so is described in greater detail below.
In this study we aimed to model pupil Overall Progress, which is a continuous variable, using a linear regression model. Because pupils learn together in classrooms we expected the pupil progress between pupils sharing the same classroom environment to be more correlated than pupil progress between pupils in different classrooms. For this reason we needed to use a type of linear regression model that allowed data to be clustered in groups, called a multi-level model (MLM). MLM analysis allows modelling of the variance-covariance matrix from the data directly so that the normal requirement of homogeneity of variance across the whole dataset can be dropped [33].
The structure of the MLM needed for this study was a two level model in which pupils at Level 1 are nested within classrooms at Level 2. A three level model, with pupils (Level 1) nested within classrooms (Level 2), and classrooms nested within schools (Level 3), was also tested but not used in the final analysis. This is discussed more fully in the results. The term 'nested' is used because each child learns in only one classroom, and each classroom is within only one school.
MLM analysis also allows unexplained variance to be identified at each of the model levels. In the case of the influence of teachers, for example, our efforts to create measures were unsuccessful owing to understandable confidentiality concerns. It is therefore assumed that this important element remains in the unexplained variance at the classroom level. Nye et al.'s meta-analysis puts the magnitude of the teacher effect at somewhere between 7% and 21% of the variance in pupils' achievement gains [34].
A specialist modelling software package, MLwiN [35], was used for the study. The modelling procedure follows that outlined by  for a two level model with clustered data. The initial Level 1 (pupil) model was written as:
$$\text{OverallProgress}_{ij} = \beta_{0j} + e_{ij}$$
where $\text{OverallProgress}_{ij}$ is the individual Overall Progress for child $i$ in classroom $j$, which depends on $\beta_{0j}$, the intercept (mean value) for classroom $j$, plus a residual, $e_{ij}$, associated with each child. The initial Level 2 (classroom) model was:
$$\beta_{0j} = \gamma_{00} + u_{0j}$$
where the intercept specific to classroom $j$ (the mean value in classroom $j$) depends on an overall fixed intercept $\gamma_{00}$ plus a random effect $u_{0j}$ associated with classroom $j$. The overall mixed level model was given by:
$$\text{OverallProgress}_{ij} = \gamma_{00} + u_{0j} + e_{ij}$$
After building the basic structure of the regression model, the explanatory variables could then be added. To test whether an additional explanatory variable improved the model, a likelihood ratio test was carried out. The '-2*log-likelihood' statistic was calculated for each of the competing models, that is, the simpler model and the model with the additional factor. Then, to test whether the latter model was a significant improvement, the difference in '-2*log-likelihood' between the two models was compared against a chi-squared distribution on 1 degree of freedom. This was repeated for each added explanatory variable (Chapter 2.5 [36]).
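The likelihood ratio test described above reduces to a few lines. In this sketch the deviances are hypothetical numbers, not values from the study, and the function name `lr_test_1df` is illustrative. It uses the identity that the upper tail of a chi-squared distribution on 1 degree of freedom at x equals erfc(sqrt(x/2)), so no statistics library is needed. Comparing models that differ in their fixed effects in this way requires deviances from full maximum likelihood estimation rather than REML.

```python
import math


def lr_test_1df(deviance_simple, deviance_extended):
    """Likelihood ratio test for one added parameter.

    Deviances are the '-2*log-likelihood' values of the simpler model and of
    the model with the extra explanatory variable.
    Returns (chi-squared statistic, p-value on 1 degree of freedom).
    """
    stat = deviance_simple - deviance_extended
    if stat < 0:
        raise ValueError("the extended model should not have a larger deviance")
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-squared(1)
    return stat, p


# Hypothetical deviances: a drop of about 3.84 sits right at p = 0.05.
stat, p = lr_test_1df(10240.0, 10236.16)
print(round(stat, 2), round(p, 3))
```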
The next step in building the model involved adding the explanatory variables at both Level 1 and Level 2. Following the procedure outlined in West et al. [37], explanatory variables at Level 1 were added first using a 'step-up' procedure. The two primary predictors of pupil progress used in this study were the start grades for each child: Weighted Start and Weighted Start-on-age. These two variables were added sequentially, and the significance of the model improvement was noted using the -2*log-likelihood statistic at each step. The model was then improved by adding a random effect on one of the Level 1 variables. The best improvement was found when a random effect was added to Weighted Start-on-age. Having allowed the intercept to vary according to which classroom a pupil was in, via the coefficient $\beta_{0j}$, we then also allowed the slope to vary by classroom, via the coefficient $\beta_{1j}$. This coefficient describes the relationship between Overall Progress and a pupil's start level compared to children in the same year, allowing that relationship to differ between classrooms. This type of MLM is sometimes called a random slope model [36].
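Written in the notation of the null model, one plausible form of the random slope model is shown below. It is a sketch, not a transcription of the fitted model: only Weighted Start-on-age (written $\text{StartOnAge}_{ij}$) carries a random slope, Weighted Start ($\text{Start}_{ij}$) enters as a fixed effect with the assumed coefficient $\gamma_{20}$, the remaining pupil and classroom covariates are omitted for brevity, and the usual assumption of normally distributed, possibly correlated, classroom random effects is made.

$$
\begin{aligned}
\text{OverallProgress}_{ij} &= \beta_{0j} + \beta_{1j}\,\text{StartOnAge}_{ij} + \gamma_{20}\,\text{Start}_{ij} + e_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + u_{1j} \\
\text{OverallProgress}_{ij} &= \gamma_{00} + \gamma_{10}\,\text{StartOnAge}_{ij} + \gamma_{20}\,\text{Start}_{ij} + u_{0j} + u_{1j}\,\text{StartOnAge}_{ij} + e_{ij} \\
e_{ij} &\sim N(0, \sigma_e^2), \qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\!\left(\mathbf{0}, \begin{pmatrix} \sigma_{u0}^2 & \sigma_{u01} \\ \sigma_{u01} & \sigma_{u1}^2 \end{pmatrix}\right)
\end{aligned}
$$

Setting $u_{1j} = 0$ recovers a random intercept model, and dropping the covariates as well recovers the null model $\text{OverallProgress}_{ij} = \gamma_{00} + u_{0j} + e_{ij}$.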
Each of the other Level 1 explanatory variables was then added to the Level 1 model, and the '-2*log-likelihood' was tested to make sure that the variable made a significant improvement to the model. A change was deemed significant where p < 0.05 (two-tailed). The step-up procedure is appropriate when the explanatory variables to be added are independent of each other; in this case gender, age and the key pupil metrics of FSM, EAL and SEN were all independent of each other.
The second part of the process involved adding the classroom explanatory variables at Level 2. Each environmental factor was first tested individually by creating a model with just this factor, and a change was deemed significant where p < 0.05 (two-tailed). Among the remaining variables there were still inadvertent correlations between some of the factors (see 4.1 below). Because of this, a top-down approach was used when adding these variables, so that the fitted model showed the combined effect of all these factors before each factor was removed to test its individual significance in the overall model [37]. As each remaining classroom parameter was sequentially removed, the '-2*log-likelihood' was compared to that of the full model to see whether there was a significant change (p < 0.10, two-tailed). Where the presence of the parameter significantly improved the model, it was retained; if not, it was left out. Once all of the non-significant parameters had been removed, a further step was carried out in which each of the rejected parameters was added back in. This last step is important because the classroom parameters, owing to their inter-correlation, had an impact on each other. A less strict p-value threshold was allowed in this final test because both the bivariate analysis and the individual modelling results had already shown each individual classroom parameter to be significant at the stricter p < 0.05 level.
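The top-down selection with an add-back pass can be sketched as below. The sketch is hypothetical in several respects: `deviance` stands for whatever fits the MLM for a given set of classroom parameters and returns its '-2*log-likelihood' (here replaced by a toy function), parameters are dropped one at a time starting with the least significant (the text does not specify the order), and each test is a 1 degree of freedom likelihood ratio test at p < 0.10.

```python
import math


def p_value_1df(stat):
    """Upper-tail p-value of a chi-squared statistic on 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def top_down_select(candidates, deviance, alpha=0.10):
    """Backward elimination from the full model, then re-test rejected terms."""
    kept = set(candidates)
    while kept:
        full = deviance(frozenset(kept))
        # p-value for removing each parameter from the current model
        pvals = {x: p_value_1df(deviance(frozenset(kept - {x})) - full) for x in kept}
        weakest = max(pvals, key=pvals.get)
        if pvals[weakest] < alpha:
            break  # every remaining parameter is significant
        kept.remove(weakest)
    # Add-back pass: a rejected parameter may matter once others are removed.
    for x in sorted(set(candidates) - kept):
        base = deviance(frozenset(kept))
        if p_value_1df(base - deviance(frozenset(kept | {x}))) < alpha:
            kept.add(x)
    return kept


# Toy deviance: each parameter lowers -2*log-likelihood by a fixed amount.
GAIN = {"A": 10.0, "B": 5.0, "C": 0.5, "D": 0.1}


def toy_deviance(params):
    return 1000.0 - sum(GAIN[p] for p in params)


print(sorted(top_down_select(GAIN, toy_deviance)))
```

In the toy function the parameters do not interact, so the add-back pass changes nothing; with inter-correlated real parameters it can.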
In the initial bivariate analysis (Table 6), focusing first on the pupil factors, the start scores were significantly negatively correlated with Overall Progress: the higher the start score, the less progress was made. This is also true for the Actual Age measure: children in older classes made less progress. The correlation for gender is not significant, so males and females did not make significantly different Overall Progress. Children on FSM made poorer progress, as did SEN children. EAL pupils on average made significantly better Overall Progress. These are significant influences that clearly had to be taken into account in the MLM if the impact of the environmental factors was to be isolated.
Table 6 Pearson correlation between each variable and each pupil's overall progress.
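The bivariate screening relies on Pearson's correlation coefficient, which also underlies the parameter inter-correlations discussed for Light and Air Quality. A dependency-free Python sketch is given below; the example data are hypothetical and chosen only to show the sign convention (higher start score, lower progress).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pupils: higher start scores paired with smaller progress.
start_scores = [7, 11, 15, 19, 23]
progress = [14, 13, 12, 12, 10]
print(round(pearson_r(start_scores, progress), 3))
```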
In the development of the environmental factors, scatterplots were initially produced to examine the relationship between pupil progress and each of the measures in isolation. Elements were retained in the study where a broad relationship was confirmed between the pupil progress and the measure. Particular note was taken when non-linear relationships were observed (see below) and for these factors curvilinear scales were created.
Correlations of each pupil's Overall Progress against the environmental measures showed that all ten parameters were positively correlated with progress. Of the five Naturalness parameters, Light has the highest correlation with Overall Progress. In the formulation of the Light parameter, the highest quantity of natural and electrical light, but without direct sunlight, was found to be optimum; too much direct sunlight entering the classroom was found to cause glare. In the Individualization theme, all three parameters were found to be significantly positively correlated. For the Level of Stimulation, the Complexity and Colour parameters were both found to be curvilinear, with an intermediate level being optimum. For example, both high-Complexity and low-Complexity classrooms scored poorly, while intermediate values of Complexity scored highly.
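The text does not specify how the curvilinear scales were constructed. As one hypothetical way to encode "an intermediate level is optimum", the sketch below folds a raw 1-5 rating around an assumed optimum of 3, so that both very low and very high raw values receive low scores; the function name, the linear fall-off and the default optimum are all assumptions.

```python
def curvilinear_score(raw, optimum=3.0, low=1.0, high=5.0):
    """Map a raw rating onto [low, high] so that the score peaks at `optimum`.

    Hypothetical transform: a linear rise up to the optimum and a linear
    fall beyond it.
    """
    if not low < optimum < high:
        raise ValueError("optimum must lie strictly inside the scale")
    if not low <= raw <= high:
        raise ValueError("raw rating outside the scale")
    if raw <= optimum:
        return low + (high - low) * (raw - low) / (optimum - low)
    return high - (high - low) * (raw - optimum) / (high - optimum)


print([curvilinear_score(r) for r in (1, 2, 3, 4, 5)])
```

A smooth alternative, such as adding a squared term to the regression, would capture the same inverted-U shape; the folded scale simply keeps the parameter on the familiar 1-5 range.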
In the creation of the measures for the factors, we endeavoured as far as possible to remove cases of high inter-correlation between the measures, given the attendant concern of double-counting. However, the driving focus had to remain on representing the hypothesised influences on learning being tested. Consequently, there were some instances of parameters with significant correlations; for example, the correlation between the Light and Air Quality parameters is 0.312. This is because Light includes a measure of 'window size', while Air Quality includes a measure of 'openable window size'. Against this context, Table 7 shows the inter-correlations between the parameters.

Multi-level model
Multilevel modelling allows the nesting of children within classrooms. Within a two level model, the variance is partitioned between the two levels: pupil level and classroom level. Fitting a statistical model with explanatory variables then allows some of the variance at each level to be explained. The empty, or null, two level model, set up without any explanatory variables, describes the partition between variance at the pupil level and at the classroom level. In our data set, the empty model for Overall Progress partitions approximately 55% of the variance to the pupil level and approximately 45% to the classroom level.
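The partition reported by the empty model is the variance partition coefficient (for a random intercept model, the intra-class correlation): each level's variance divided by the total variance. A minimal sketch follows; the variance components passed in are illustrative values chosen to mirror the approximate 45%/55% split above, not the estimates from the study.

```python
def variance_shares(components):
    """Proportion of total variance attributed to each model level."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return {level: v / total for level, v in components.items()}


# Illustrative null-model variances for the normalised Overall Progress score.
shares = variance_shares({"classroom": 0.45, "pupil": 0.55})
print(shares)
```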
In the three level model, only 3% of the variance was at the school level. This shows that, even though the schools were chosen to be as different as possible in both architecture and pupil intake, variance in yearly Overall Progress was dominated by pupil-level and classroom-level effects. The small level of variance at the school level may be influenced to a degree by all the schools being state funded, mixed gender and local authority controlled. This does not reflect the full spectrum of UK primary schools, but it does represent the great majority. It should also be noted that there is considerable variation in the physical characteristics of the schools, the impact of which is the focus of this study. Factors at the school level were investigated, but revealed only minor impacts, as would be expected given the distribution of variance set out above. For this reason, the three level model was not investigated further. However, the low impact on learning of the school-level factors, compared to classroom-level and pupil-level factors, is in itself an important finding. We return to this issue in the conclusions.
The results for the two level Overall Progress model are shown in Table 8. Values are shown for the fixed effect coefficients of each of the added explanatory variables and for each of the random effects. The sizes of the coefficients reflect the relative importance of the explanatory variables in the model.
Table 8 Parameter estimates and standard errors for factors significant in the MLM.
The proportion reduction in variance (PRV) obtained by adding explanatory variables to the model at Level 1 and Level 2 is given in Table 9. The pupil explanatory variables reduce the Level 1 variance by 18%, and the classroom explanatory variables reduce the Level 2 variance by 26%. The overall R-squared fit for the two-level model is 58%.
Table 9 Proportion reduction in variance (PRV) by adding Level 1 and Level 2 factors to the model.
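The proportion reduction in variance at a given level compares that level's variance in the null model with its variance after explanatory variables have been added. A sketch with hypothetical variance values follows; the study's own figures are those in Table 9, and the helper name `prv` is illustrative.

```python
def prv(var_null, var_fitted):
    """Proportional reduction in variance at one level of a multilevel model."""
    if var_null <= 0:
        raise ValueError("null-model variance must be positive")
    return (var_null - var_fitted) / var_null


# Hypothetical Level 2 (classroom) variances before and after adding the
# classroom parameters; these numbers are illustrative only.
print(round(prv(0.45, 0.333), 2))
```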
The following two sections discuss the explanatory variables significant at the classroom and pupil levels.

'Pupil level' influences
Results from the two-level model show that the Level 1 factors significant in the model were Weighted Start, Weighted Start-on-age, FSM, EAL and SEN. Gender was not significant in the model. Children on FSM and children with SEN did significantly worse than other pupils, while EAL pupils did significantly better. The sizes of the coefficients are indicative of their relative effect: EAL and FSM have effects of similar size, and the Overall Progress deficit for SEN pupils is more than three times as great. For Weighted Start the model coefficient is negative, indicating that pupils in higher year groups made less progress. It should be noted that although the NC points scale is linear, and there is an expectation that each pupil, whatever their age, makes the expected two-sublevel improvement per year, teachers acknowledge that learning rates in children are not linear. For Weighted Start-on-age the model coefficient is positive, indicating that pupils who are advanced for their age group did, on average, make more progress.
These results are similar to those of the earlier bivariate correlation analysis, but now provide an interactive backdrop within the same model as the environmental factors, to which we now turn. In addition to these operationalised pupil factors, other aspects linked to the pupils, but not measured, are also included in the modelling, within the unexplained variance partitioned at the pupil level.

'Class level' E-H-P influences
Out of the ten environmental parameters investigated in this study, seven significantly improve our two-level MLM for Overall Progress in primary-aged school children. These are shown with their model coefficients in Table 9. The significant environmental classroom parameters come from each of the three design principles: Naturalness, Individualization and Level of Stimulation. Table 10 gives the breakdown of the relative importance of the parameters. The Naturalness parameters of Light, Temperature and Air quality together explain 49% of the effect in the Overall Progress model. The Individualization parameters of Ownership and Flexibility together explain 28% of the effect. The Level of Stimulation parameters of Complexity and Colour together explain 23% of the effect. The relative sizes of these classroom effects across the three principles reflect a reasonable expectation that the most influential principle is the Naturalness of the environment. The second most influential is how well the classroom is individualized for its pupils, and the last component, which still accounts for almost one quarter of the effect, is the Appropriate Level of Stimulation in the classroom. Table 11 takes the findings on the individual parameters and compares them with existing evidence from the literature. Many of the sources used for the latter have focused on single factors, quite often in controlled conditions, whereas our findings derive from a "natural inquiry" in which, even when we focus on one factor, it is still acting in the context of all the others.
Table 11 Insights from main study results, by design parameter.
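The text does not state how the percentages in Table 10 were derived. One common way to express relative importance, assumed here purely for illustration, is to take each parameter's absolute standardised coefficient as a share of the total and then sum those shares by design principle. The parameter-to-principle mapping below follows the text. The coefficients are hypothetical, chosen so that the totals echo the reported 49/28/23 split, and they are not the study's estimates.

```python
from collections import defaultdict

# Design principle of each of the seven significant parameters (from the text).
PRINCIPLE = {
    "Light": "Naturalness",
    "Temperature": "Naturalness",
    "Air quality": "Naturalness",
    "Ownership": "Individualization",
    "Flexibility": "Individualization",
    "Complexity": "Stimulation",
    "Colour": "Stimulation",
}


def principle_shares(std_coefs: dict[str, float]) -> dict[str, float]:
    """Sum each parameter's share of total |standardised coefficient| by principle.

    Assumption: relative importance of a parameter = |beta_std| / sum(|beta_std|).
    """
    total = sum(abs(b) for b in std_coefs.values())
    shares: defaultdict[str, float] = defaultdict(float)
    for name, beta in std_coefs.items():
        shares[PRINCIPLE[name]] += abs(beta) / total
    return dict(shares)


# Hypothetical standardised coefficients (NOT the study's estimates).
example = {
    "Light": 0.20, "Temperature": 0.15, "Air quality": 0.14,
    "Ownership": 0.16, "Flexibility": 0.12,
    "Complexity": 0.13, "Colour": 0.10,
}
shares = principle_shares(example)
for principle, share in shares.items():
    print(f"{principle}: {share:.0%}")
```

Other relative-importance measures, such as variance decomposition, could rank the parameters differently, so this sketch should not be read as the method behind Table 10.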

Discussion
Although informed by previous studies, this study concentrates further on the complex interaction of a range of built environment factors affecting pupils in primary schools. That said, findings concerning comfort issues, rooted in the design principle of 'naturalness', are generally consistent with the literature. Light, temperature and air quality have a significant impact on the pupils' learning outcomes. However, this study also finds that large window size is not universally valuable in terms of maximizing learning benefits. Orientation, shading control (inside and outside), and the size and position of openings all have to be carefully considered so that the risks of glare, overheating and poor air quality can be avoided at the design stage. Furthermore, the importance of occupants' control over the 'naturalness' of the room is evident. High quality and quantity of electrical lighting, central heating with thermostatic control and mechanical ventilation can all give teachers and pupils opportunities to adjust the environment to a more comfortable level. It should be noted that although acoustics and links to nature displayed correlations with learning progress in the bivariate analysis, they dropped out of the MLM when competing with the other factors, so the evidence for their importance within this quite extensive and varied sample can only be said to be weak.
Pupils in primary schools usually have a relatively fixed learning space for most of their time there. They build up considerable familiarity with their classrooms, and the extent to which they are able to have a room that responds to their individual needs comes under 'individualization', the second design principle. Permanent individual display (artworks, photos, crafts) has been identified by many previous studies as an efficient way to promote a sense of ownership. This study confirms this and goes a step further. A classroom that has distinct architectural characteristics, e.g. a unique location (bungalow, or separate building); shape (L-shape, T-shape); embedded shelves for display; an intimate corner; facilities specifically designed for pupils; a distinctive ceiling pattern etc., also seems to strengthen the pupils' sense of ownership. Previous studies have reached no clear consensus on whether classroom size affects learning outcomes. It appears that classroom shapes and the optimum elements within a room depend on pupils' ages. Where play-based learning is the primary activity (KS1), the room needs to reflect this with varied learning zones. Where more formal instruction is given through the interactive whiteboard, all pupils must be in a position to see the front easily, and so a simpler plan seems appropriate (KS2). It should be stressed that this distinction appears to be a function of the predominant pedagogical approaches used in the UK. Lastly, the connection factor, concerning corridors and navigation about the school, has not appeared in the MLM and so receives only weak support from this study, through a link to learning progress in the bivariate correlation analysis alone.
A classroom in a primary school is for children, and arguably should be designed to make attending school an interesting and pleasurable experience. On the other hand, it must also be a place where learning can proceed without distraction. Lying behind this tension is the third design principle, concerning the 'appropriate level of stimulation' for a given activity. The influence of the parameters identified in this study as affecting the visual perception of diversity is found to be curvilinear, such that intermediate levels of the factors are optimal for learning. For example, the overall appearance, including the room layout and the displays on the walls, has to be stimulating, but balanced by a degree of order, ideally without clutter. Similarly, colours with high intensity and brightness work better as accents or highlights than as the main colour theme of the classroom. This simple notion that a moderate level of stimulation is appropriate for the learning situation provides a principle that can throw light on a number of more focused studies.
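The text says only that the stimulation effect is "curvilinear". A minimal way to see why an intermediate level comes out optimal is to assume a quadratic term. With a negative coefficient on the squared term, the predicted effect rises, peaks at x* = −b1 / (2·b2), and then falls. The sketch below is written under that assumption. The function names, the 1 to 5 rating scale and the coefficients are all hypothetical and are not the study's model.

```python
def quadratic_effect(x: float, b1: float, b2: float) -> float:
    """Predicted contribution of a stimulation rating x under a quadratic term."""
    return b1 * x + b2 * x ** 2


def optimal_level(b1: float, b2: float) -> float:
    """Turning point of b1*x + b2*x**2; it is a maximum only when b2 < 0."""
    if b2 >= 0:
        raise ValueError("no interior maximum unless b2 < 0")
    return -b1 / (2 * b2)


# Hypothetical coefficients on a hypothetical 1-5 rating scale (NOT study estimates).
b1, b2 = 1.5, -0.25
best = optimal_level(b1, b2)
print(best)  # prints: 3.0
for rating in range(1, 6):
    # Effects rise from 1.25 at rating 1 to 2.25 at rating 3, then fall back to 1.25.
    print(rating, quadratic_effect(rating, b1, b2))
```

Under this assumed form, both sparse rooms (low ratings) and cluttered or overly vivid rooms (high ratings) are predicted to support less progress than a balanced middle.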

Summary
The research in this study focused on a holistic environment-human-performance model, examining school and classroom spaces and relating these to individual pupil progress statistics. Researchers assessed 153 classrooms in 27 schools to measure school and classroom features. Data on the 3766 pupils who occupied those spaces were also collected, including the focal dependent variable of progress in learning. The design principles of Naturalness, Individualization and Level of Stimulation were used to develop ten design parameters. The underpinning hypothesis is that pupils' academic progress will depend on a full range of factors drawn from across all three design principles. Measures were then created for the ten design parameters for each classroom. All ten parameters individually correlated significantly with pupil progress. Multilevel regression modelling (including pupil factors) was then used, and identified seven key design parameters that best predict the pupils' progress. These were Light, Temperature, Air Quality, Ownership, Flexibility, Complexity and Colour. The modelled classroom parameters accounted for 16% of the total variability in pupils' learning progress. The inclusion of three very different local authority areas, with distinctly different pupil intake characteristics and school building environments, was intended to support the analysis at the school level. It did not do so. It became evident that the variability in learning progress to be explained at the school level in the multilevel model was only 3%. Including this level did not enhance the overall analysis, and so it was dropped.
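The 3% figure describes how the total variability is shared out between levels. A standard way to express this is the variance partition coefficient, which is each level's variance component divided by the sum of all components. The sketch below assumes a three-level structure (school, classroom, pupil). The function name and the variance components are hypothetical, chosen to give a 3% school share, and are not the study's estimates.

```python
def variance_partition(components: dict[str, float]) -> dict[str, float]:
    """Variance partition coefficients: each level's share of the total variance."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("variance components must sum to a positive number")
    return {level: var / total for level, var in components.items()}


# Hypothetical intercept-only variance components (NOT the study's estimates).
vpc = variance_partition({"school": 3.0, "classroom": 22.0, "pupil": 75.0})
for level, share in vpc.items():
    print(f"{level}: {share:.0%}")  # first line prints: school: 3%
```

A school-level share this small leaves little for school-level predictors to explain. This is consistent with the decision to drop that level from the model.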
In Phase 1 of the study, classroom parameters were found to explain 25% of the variance in learning progress [1]. In Phase 2 the sample is five times bigger and the classroom effect has levelled out at 16%, but with much greater certainty. The second phase of the study has also included additional pupil factors: Free School Meal (FSM) status, English as an Additional Language (EAL) status and Special Educational Needs (SEN) status. The R-squared value for the goodness-of-fit of the regression model has improved from 51% in Phase 1 to 58% in Phase 2.

Main contributions
This study has thrown light on a variety of issues ranging from broad conceptual challenges, to quite specific, practical questions.
One of the major, more general, contributions of this study is to confirm the hypothesised utility of the naturalness, individuality, stimulation (or more memorably, SIN) conceptual model (Figure 3) as a vehicle to organise and study the full range of sensory impacts experienced by an individual occupying a given space. That this might be a productive way forward was argued speculatively in 2010 [15], but the results obtained provide clear evidence that each of these dimensions appears to have a role in understanding the holistic human experience of built spaces. It is interesting that (in this particular case of primary schools) the naturalness factors account for around 50% of the impact on learning, with the individuality and appropriate level of stimulation factors accounting for roughly a quarter each. It could not be predicted beforehand whether each of the dimensions would remain in play and, if so, with what relative weight. We now have at least an initial indication, in one situation.

Figure 3 Holistic conceptual (SIN) model
The finding that the combined impact of the built environment factors amounts to 16% of the variation in learning progress is a major finding in an area where, as Baker and Bernstein phrase it [62]: "the relationship between school buildings and student health and learning … is more viscerally understood than logically proven" (p2). This is of course relevant to schools, but, as stated at the start of this paper, primary schools provide a relatively simple situation in which to study a complex general problem. By extension, the results suggest that the scale of the impact of building design on human performance and wellbeing can be identified and that it is non-trivial.
It has also been informative how some factors that display quite strong and significant correlations, as single factors, with (in this case) learning progress, drop out of the analysis when combined with all other factors, for example "links to nature". This demonstrates the value of single factor analyses in creating hypotheses, but highlights the danger of assuming they will translate simply to naturally experienced, multi-dimensional environments. This reinforces the utility of multilevel modelling in studying complex situations as "natural" experiments.
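The mechanism by which a factor can correlate with the outcome on its own yet drop out of a joint model can be reproduced with simulated data. The sketch below is written under hypothetical assumptions. A "proxy" variable has no effect of its own but tracks a "driver" that does. The variable names, effect sizes and seed are invented, and the sketch is not a claim about why links to nature dropped out in this study. For brevity it uses single-level ordinary least squares rather than an MLM.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def ols_two_predictors(x1, x2, y):
    """Slopes of y on x1 and x2 (with intercept) via the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 2000
driver = [rng.gauss(0, 1) for _ in range(n)]           # factor with a real effect
proxy = [d + rng.gauss(0, 0.7) for d in driver]        # tracks driver, no effect of its own
outcome = [0.5 * d + rng.gauss(0, 1) for d in driver]  # depends on driver only

r_proxy = pearson(proxy, outcome)
b_driver, b_proxy = ols_two_predictors(driver, proxy, outcome)
print(f"bivariate r(proxy, outcome) = {r_proxy:.2f}")
print(f"joint slopes: driver = {b_driver:.2f}, proxy = {b_proxy:.2f}")
```

On its own, the proxy shows a clear positive correlation with the outcome (about 0.37 in expectation). Once the driver is in the model, however, the proxy's slope is close to zero. This is the pattern described above for single-factor correlations that vanish in the combined analysis.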
One aspect that surprised the researchers was the muted impact of the whole-building level of analysis. To an extent this will be a result of this study's focus on primary state school education, where the pupils spend most of their time in one space and follow the national curriculum. That said, it does provide support for the rise in recent years of polemical works arguing for "inside-out design" [63] that builds from a focus on user needs and challenges the visual dominance of much design effort [64]. This is echoed by those arguing specifically for aspects of sensory-sensitive design [65,66]. It would seem that these aspects are more important than is often realised. Figure 4 provides a powerful illustration of this issue. Each column of plots represents the classes in a school, and it can be seen that the modelled performance of the classrooms within a given school varies very widely. There is no such thing here as a "good" or "bad" school, but there are very clearly more and less effective classrooms. Focusing on school design itself, the study has been able to identify and typify the elements of design that together appear to lead to optimal learning spaces for primary school pupils. These are summarised in Table 12. Several of the factors are not only issues for designers, but also present opportunities for users to adapt their spaces to better support learning. However, there remains a considerable design challenge in addressing all of these factors elegantly and optimally in combination.

Table 12
The main classroom characteristics that support the improvement of pupils' learning.

Limitations and future research
This study has strengths and weaknesses. The chosen focus and the conceptual and methodological approach employed have enabled progress to be made, but also carry limitations and consequent opportunities for alternative approaches. In addition the findings to date also provide a foundation upon which future studies could be built with greater confidence than before.
The sample is focused on one type of building (primary schools) in one country (UK / England) and has endeavoured to explain one measure of human performance (formal academic progress). Primary schools and the pedagogy practised within them in the UK are quite distinctive, and it could be anticipated that in other scenarios the impact of the whole-building level could be more prominent. It could also be that other factors, or other weightings of factors, are relevant to other dimensions of education, such as behavioural development in pupils. It would certainly be anticipated that different requirements could pertain for different activities where, for example, the appropriate level of stimulation varies. Further, the UK has quite specific climatic conditions, and in other geographical areas the specifics of how the optimum conditions are realised would be expected to vary. That said, basic human comfort needs would probably be more stable. So, for example, the orientation and power of the sun could be quite different in different regions, so that window design would need to take this into account, but the human need for sufficient light without excessive glare should carry over. More complex would be cultural differences, which could drive variations in the approach to pedagogy or, more basically, affect preferences for and reactions to factors such as colour.
The flip side to the above limitations is that, building on the experience of this study, further studies could fruitfully be carried out in different types of learning institutions, such as secondary schools and universities. This could extend beyond education to, say, offices, accommodation for the elderly, and retail [67]. For these, preliminary soft data studies would be advisable in order to provide a sound foundation for the hypotheses, and identifying a powerful dependent variable will not always be straightforward. It would also be beneficial to go beyond the methodology used to date and move, say, to an action research approach, where changes are made based on the results so far and the impacts (anticipated and unanticipated) are tracked through multiple triangulated methods.
Within the dataset already compiled, further sub-analyses are possible, for example of the impacts of spaces on SEN pupils in particular. It will also be interesting to see to what extent measures that currently rely on judgement, for example of visual complexity, can be replaced by objective ones.

A significant direction
Given the large sample size and the scale of the effects identified in this study, it seems reasonable to suggest that strong proof of concept has been provided for the efficacy of the approach used in this research. Using the broader SIN conceptual model, linked to MLM, clearly has the potential to reveal more about the holistic impacts of spaces on people. It is now vital to capitalise on this promising initial step and to further develop these concepts and techniques.

Overview
This paper reports the final results of the HEAD (Holistic Evidence and Design) study of the impact of the design of primary schools on pupils' learning. The aim of the project was: "To explore if there is any evidence for demonstrable impacts of school building design on the learning rates of pupils in primary schools".
This is a focused study of a general issue, namely the impact, in practice, of physical spaces on human health and wellbeing. Primary schools are a good focus to address this knotty problem as: the pupils spend most of their time in one space (the classroom); there are available measures of their (in this case academic) performance; and maximising pupils' achievement is an important societal issue.
Phase 1 of the project was reported in 2013 [1] and included 751 pupils from seven schools in the Blackpool area of the UK. In Phase 2, data were collected in two further geographical locations in the UK and combined with the Phase 1 data, increasing the sample size by around a factor of five and incorporating many more schools, classrooms and pupils. See Figure 1.

The research challenge / hypothesis
Indoor environmental quality (IEQ) research has understandably focused on the readily measurable aspects of heat, light, sound and air quality, and although impressive individual sense impacts have been identified, Kim and de Dear [2] argue strongly that there is currently no consensus as to the relative importance of IEQ factors for overall satisfaction. In parallel, a literature and area of practice has developed around "building performance", with a wide variety of typologies on offer [3,4]. The intelligence gained should feed forward into new designs; however, post-occupancy evaluations (POEs) are not commonplace, and the lessons learnt are not generally available for use in practice [5].
In a recent benchmark for whole-life Building Performance Evaluation (BPE) [6] it is made clear that BPE aspires to objectivity using "actual performance of buildings [assessed through] established performance criteria … objective, quantifiable and measurable 'hard' data, as opposed to soft criteria … qualitative … subjective" (pp27-28). However, in practice this is difficult and hardly anywhere amongst the collected chapters is such evidence actually delivered, with the most common approach being occupant surveys / interviews (p169).
Some specific aspects linked to "real" impacts have gained traction, for example Ulrich's [7] classic evidence of the positive healing effects of views of nature. But progress from this promising start still falls a long way short of comprehensively addressing the complexity of the design challenge. The difficulty of studying multiple dimensions is illustrated by the problems encountered when the impressive Heschong Mahone [8,9] daylighting studies were extended to include other issues. The initial Heschong Mahone study [8] found that children in classrooms with the most daylighting and the biggest windows progressed approximately 20% faster in maths and reading. The follow-up study [9] included thermal comfort, air quality and acoustic measures along with daylighting, but concluded that the issue was more complex, with daylighting having both positive and negative effects on learning. The same difficulty is evident in Tanner's struggle to analyse the multiple aspects impacting on learning rates in schools. His 2009 paper [10] is a second, more successful attempt to structure more cleanly the possibly important design factors first mooted in his analysis in 2000 [11].
So there exists an important research challenge around the issue of better understanding, and evidencing, the holistic impacts of spaces on users. The work described here represents a radical exploration of a new direction. Rather than build up from the measurable dimensions of heat, light, sound and air quality, we have taken as a starting point the simple notion that the effect of the built environment on users is experienced via multiple sensory inputs in particular spaces, which are resolved in the users' brains. These mental mechanisms can provide a basis for understanding the combined effects of sensory inputs on users of buildings at a level of resolution where "emergent properties" [12] may be evident. Until recently the only exemplar study using this sort of thinking was focused on Alzheimer's care facilities [13]. The implication is that the broad structuring of the brain's functioning can be used to drive the selection and organisation of the environmental factors to be considered, not just their inherent measurability. Drawing from Roll's [14] detailed description of the brain's implicit systems, a novel organising model has been developed and proposed [15] that reflects: the human "hard-wired" response to the availability of healthy, natural elements of our environments; our desire to be able to interact with spaces to address our individual preferences; and the various levels of stimulation appropriate to users engaged in different activities. Thus three dimensions, or design principles, have been used to suggest and structure the factors to be considered, namely:
• Naturalness: light, sound, temperature, air quality and links to nature;
• Individualisation: ownership, flexibility and connection;
• Stimulation (appropriate level of): complexity and colour.
Within this structure the full range of relevant factors (e.g. light, layout, etc.) that might be elements of "good" design for a particular scenario (school) can be grouped, so providing a clear and balanced set of factors to be tested. These go well beyond the usual "big four". The utility of this approach depends, of course, on whether it allows clearer insights to be derived through practical research.
The underpinning hypothesis is that pupils' academic progress will be dependent on a full range of factors drawn from across all three of the design principles.

Existing research on aspects of learning environments
Using the above three-part structure a brief summary is provided below of relevant research findings, focused on the impacts of various elements of school environments. Empirical studies of the individual factors that appear to influence pupils' performance and well-being are summarized here and will be compared with the findings of this study in the 'Discussion' (Section 5).
Naturalness: The Naturalness principle relates to the environmental parameters that are required for physical comfort. These are light, sound, temperature, air quality and 'links to nature'. Children's learning environments also have specific requirements of their own. Each of the parameters has been individually researched. Natural light is known to regulate sleep/wake cycles [16], and what level of daylighting is optimum is still an area of active research [8-10]. With regard to classroom acoustics, Crandell and Smaldino [17] define the important metrics, and Picard and Bradley [18] note that noise levels in classrooms are usually far in excess of optimal conditions for understanding speech. It has been shown that for 10-12-year-olds, numerical and language test speeds increased when temperature was reduced slightly and ventilation rates were increased [19]. In their study, Daisey et al. [20] conclude that ventilation rates are inadequate in many schools and that there is a risk to health. Research also suggests profound benefits of the experience of nature for children, owing to their greater mental plasticity and vulnerability [21,22].
Individualisation: The Individualisation principle relates to how well the classroom meets the needs of a particular group of children. It is made up of Ownership, Flexibility and Connection parameters.
Ownership is the first element and is a measure of how identifiable and how personalised the room is. Flexibility is a measure of how well the room addresses the needs of a particular age group and any changing pedagogy. Connection is a measure of how readily the pupils can connect to the rest of the school. In this area there is a focus on how to make a personally optimised built environment that can benefit a pupil's learning process and behaviour. For example, it is argued that intimate and personalised spaces are better for absorbing, memorising and recalling information [23]. When children feel ownership of the classroom, it appears the stage is set for cultivating feelings of responsibility [24]. Classrooms and hallways that feature the products of students' intellectual engagements (representations of academic concepts, projects, displays and constructions) are also found to promote greater participation and involvement in the learning process [25]. Building Bulletin 99 (2006) [26] specified that flexibility must be a key design requirement within the brief. Flexibility is needed to allow for different activities within the classroom and / or the needs of different users. The inclusion of Connection within Individualization is supported by Tanner [10] and Zeisel et al. [13], who emphasise that clearly marked pathways to activity areas improve utilisation of space and performance metrics.
Stimulation: The Stimulation principle relates to how exciting and vibrant the classroom is. It has two parameters: Complexity and Colour. Colour is straightforward, but does encompass all the colour elements in the room. Complexity is a measure of how the different elements in the room combine to create either a visually coherent and structured environment, or a random and chaotic one. It has been suggested that focused attention is crucially important for learning. Maintaining focused attention in classroom environments may therefore be particularly challenging for young children, because the visual features in the classroom may tax their still-developing and fragile ability to actively maintain task goals and ignore distractions [27]. Colour research shows that room colour affects both emotions and physiology, causing mood swings that can have an impact on performance [28].
Clearly, from the literature it can be anticipated that the built environment of the classrooms will have a great impact on pupils' academic performance, health and wellbeing. However, how these aspects impact in combination has, up to now, been unclear. In other words, how the sort of factors discussed above behave in the context of all of the others adds a level of complication that has confounded a clear view of the contribution of the physical space, despite all of the atomised evidence. Thus, the Education Endowment Foundation, in its well-respected reviews of factors influencing pupil learning, concluded in 2014 that: "changes to the physical environment of schools are unlikely to have a direct effect on learning beyond the extremes." [29].
The HEAD Project seeks to bridge the gulf between what is a high level of confidence in the literature about some of the different elements, and a lack of convincing evidence concerning their combined effects in practice.

Structure of the paper
The next section (2) picks up this challenge by setting out the distinctive conceptual approach taken within the HEAD Project. Section 3 turns to methods and sets out the sample used and provides an explanation of the multi-level modelling approach employed. Section 4 gives the results and these are discussed in the context of the existing literature in Section 5. Finally, conclusions are drawn in Section 6.

Overview of planned methodology
Drawing on the discussion above, Figure 2 places the individual pupil at the centre of the analysis, with a vertical flow from their starting position academically and individual characteristics; to their year spent in the classroom; to the output in terms of their academic improvement, but possibly other aspects too, such as behavioural outcomes. This individual journey is sandwiched between non-built environment factors, such as the effect of teachers, and the built / physical features of the school environment. These latter draw on the full wealth of possible aspects, but structured into the typology of naturalness, individualisation and stimulation. To operationalize these physical factors it was necessary to create a coherent range of factors to be measured that it could be hypothesised have impacts on learning progress. This process is described in the next subsection. The research approach adopted calls for diversity in the sample across all of the elements of the above model so that there is the opportunity to reveal the impacts of variations in the factors. This aspect of the study is covered in Section 3, together with the use made of multilevel modelling (MLM) to isolate the individual pupil effects from the impacts connected to the school built environment (BE).

Environment-behaviour (E-B) model
Following the approach taken by Zeisel [13] an "Environment-Behaviour factors model" was built drawing on the available literature, but also informed by preparatory surveys of pupils [30], teachers [31] and post-occupancy evaluations of schools [32]. The E-B model was first structured by the main three "design principles", namely naturalness, individualisation and stimulation. Each of these was then broken down into "design parameters", of which there are ten in total, and these in turn were expanded into eighteen more detailed "indicators". These were then underpinned by thirty more detailed, measurable, "factors". Table 1 summarises these different levels down to the design factors thought to impact on a pupil's learning progress, and including the criteria for a high rating in each case.
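The top two levels of this hierarchy can be written down as a simple nested data structure, using the principles and parameters listed in the introduction. The sketch below is illustrative only. The names `EB_MODEL`, `parameters` and `principle_of` are hypothetical. The eighteen indicators and thirty factors from Table 1 are not reproduced in this section, so they are left out rather than guessed.

```python
# Design principles -> design parameters, as listed in the text.
# The 18 indicators and 30 measurable factors (Table 1) would hang below each parameter.
EB_MODEL: dict[str, list[str]] = {
    "Naturalness": ["Light", "Sound", "Temperature", "Air quality", "Links to nature"],
    "Individualisation": ["Ownership", "Flexibility", "Connection"],
    "Stimulation": ["Complexity", "Colour"],
}


def parameters(model: dict[str, list[str]]) -> list[str]:
    """Flatten the principle -> parameter mapping into a single list."""
    return [p for params in model.values() for p in params]


def principle_of(model: dict[str, list[str]], parameter: str) -> str:
    """Return the design principle under which a parameter sits."""
    for principle, params in model.items():
        if parameter in params:
            return principle
    raise KeyError(parameter)


all_params = parameters(EB_MODEL)
print(len(all_params))                        # prints: 10
print(principle_of(EB_MODEL, "Flexibility"))  # prints: Individualisation
```

Keeping the hierarchy explicit makes it easy to aggregate classroom scores, or model results, from parameters up to principles.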

Data collection and statistical methodology
This section describes: the sample selection, driven by the desire for variety in our studied variables; the way measures were constructed; and the approach taken to the analysis.

Geographical / national context
All investigated schools are in England, UK. England has a temperate maritime climate due to its proximity to the warm Atlantic Ocean, and it lies in the path of a prevailing westerly wind. It has mild temperatures, with warm summers, cool winters and plentiful precipitation throughout the year, rather than seasonal extremes of hot and cold. This study focused on learning progress in primary schools. Education in England is overseen by the Department for Education. For primary schools, Local Authorities (LAs) take the great majority of the responsibility for implementing policy for public education and state schools at a local level. Children start primary school either in the year, or the term, in which they reach five years old. All LA schools are obliged to follow a centralised National Curriculum (NC), with an emphasis on reading, writing and arithmetic.
In the earlier years at primary school, made up of a "reception" year followed by Years 1 and 2 (the latter two forming Key Stage 1, hereafter KS1), pupils are introduced to learning with an emphasis on play. During the last four years at primary school, that is Years 3 to 6, known as Key Stage 2 (hereafter KS2), the approach progressively becomes more formal. In many schools this transition is gradual through the year groups. Throughout, mainstream schools take an apparent "mixed teaching methods" approach, using different learning zones to varying degrees to support combinations of didactic, independent and group learning.

Schools
In UK schools, primary pupils spend the majority of their time in one classroom, making this age group the ideal focus for this study. Building on an initial pilot phase [1], this study collected data from 30 schools in total, in three local authority areas in the UK. The pilot study looked at 10 schools within the Blackpool local authority. Blackpool is a coastal town in the North-West of England with relatively high rates (approximately 30%) of child poverty. To increase the size and variety of the sample, ten diverse schools were additionally selected from the Hampshire local authority area.
Hampshire is primarily a rural area in southern England, which includes the coastal city of Portsmouth. It has, on average, low levels (approximately 11%) of children on Free School Meals (FSM), which is a measure of child poverty used regularly in the UK. The third, very different, area chosen was the Outer (West) London area of Ealing. Ten more schools were selected in this urban area, with high-density housing and high levels of children with English as an Additional Language (EAL). These are pupils who often speak a different language at home and can start formal education with little or no knowledge of English.
The 30 schools within the study were chosen to have a wide spectrum of different architectures, built at different times and of different sizes. Two schools in Blackpool were "special" schools and were not used in the final analysis (Schools 2 and 10) and one dropped out part way through for local reasons (School 1). The remaining 27 schools ranged from small, mixed year group, village schools, with 103 pupils, to multi-year intake schools, with 819 pupils. The ages of the buildings ranged from Victorian (circa 1880's), to post 2000 builds. Among other metrics, school site area was also measured; the smallest being 858m 2 and the largest being greater than 40,000 m 2 ( Table 2). There is clearly a good diversity of physical characteristics amongst this sample.

Classrooms
The aim at the outset was to gain the widest possible range of classrooms. However, in many reception classes it was not possible to obtain pupil performance measures comparable to those in the later years. Consequently, of the 203 classes studied, only 153 classes, from Years 1-6, were used in the final analysis.
The architectural data collection consisted of two complementary surveys in each school, carried out on the same day: a very detailed survey of each selected classroom and a whole-school survey, taking measures of shared spaces, e.g. libraries, assembly halls, gyms and outdoor areas. In the classroom survey:
• Hard measures were taken, such as room dimensions, size of windows, placement of doors and the interactive whiteboard (IWB), desk arrangement and learning zone layouts. A range of further factors was assessed in each classroom to create a database of measurements covering all of the hypothesised "indicators" in play. These included aspects such as: how much control there was over the classroom environment, for example the presence of a radiator thermostat or air conditioning; how the children used the space, whether they had their own coat pegs and the quality of the desks and chairs; and the colour of decorations and complexity of displays within the classroom. The measures are summarized as the factors in Table 1, and the creation of the metrics for each is discussed below.
• In addition, five spot-meter readings were taken in each room to assess the environmental conditions at the time of the visit: light level, CO₂ level, temperature, noise level and relative humidity. These measurements gave the researchers an enhanced opportunity to identify potential problem areas. However, they were not used directly in the metrics created.
• Lastly, a questionnaire-based interview was completed, investigating each teacher's experience of their classroom. These questions sought the teachers' opinions of the teaching spaces as they performed through the whole year (as opposed to the spot measurements above). They covered issues such as whether glare was a problem, and if so when. Again, the responses to the teachers' questionnaires were not used in the metrics that produced the final results in this study; however, they did help the researchers highlight potentially important factors to consider.
For each of the factors in Table 1, a 5-point rating scale was used to assess, drawing on the above data, the characteristics of the factor over the study year. As far as possible this employed simple physical measurements, such as the size and orientation of the windows in relation to daylighting. However, for some factors it was necessary to employ "expert judgement" to give a comprehensive treatment of all of the hypothesised factors. One area where such judgement had to be used is the visual complexity of displays. Experimenter bias, a threat to internal validity, was addressed by having separate researchers make assessments and then compare them to establish a consistent approach, in this case based on assessing both coverage and coherence. As an indication of how the ratings were scored, Table 1 shows the criteria that make up the highest ratings in each of the factor categories. The factor scores were averaged to build the ten HEAD design parameters: Light, Sound, Temperature, Air Quality, Links to Nature, Ownership, Flexibility, Connection, Complexity and Colour. Descriptive statistics for the HEAD design parameters are shown in Table 3, where it can be seen that the sample again displays a good level of variation in all of the factors.
Table 3 Basic metrics of the classroom sample.
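The averaging step that turns 5-point factor ratings into design parameters can be sketched as below. This is a minimal illustration assuming equal weighting of the factors within each parameter; the factor names are hypothetical placeholders, since Table 1 is not reproduced here.

```python
from statistics import mean

# Hypothetical 1-5 factor ratings for one classroom; the real factor
# list for each design parameter is given in Table 1 (not reproduced).
ratings = {
    "Light": {"daylight": 4, "electric_light": 5, "glare_control": 3},
    "Temperature": {"heat_control": 2, "solar_gain": 4},
}

def design_parameters(factor_ratings):
    """Average the 5-point factor ratings within each design parameter (equal weights assumed)."""
    params = {}
    for parameter, factors in factor_ratings.items():
        scores = list(factors.values())
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"{parameter}: ratings must lie on the 1-5 scale")
        params[parameter] = mean(scores)
    return params

print(design_parameters(ratings))
```

A weighted mean would be a one-line change if some factors were judged more important, but the text only states that the scores were averaged.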

Pupils
The HEAD project surveyed 203 classrooms from 30 schools and collected performance statistics from 4924 pupils. Data used in the final results came from 153 classes in 27 schools and 3766 pupils. For each pupil it was essential that the specific classroom they had occupied was identified, so that in the analysis the "pupil effects" could be identified as distinct from "classroom effects". The pupils were in Years 1 to 6. The data needed for the study were the pupil grade at the start of the academic year and the pupil grade at the end of the year. Grades were collected for three subjects: Reading, Writing and Maths.
Children in KS1 are assessed using a variety of performance systems. National Curriculum (hereafter NC) levels start at Level 1c, with an equivalent NC point score of 7 (Table 4), so children working at or above these NC levels were used in this study. Some schools also used P scales at KS1, and again these data were used. However, some children were assessed on a 9-point Foundation Stage Profile, which had been introduced but then rapidly replaced by a much simpler 3-point version. For KS1 pupils in this study, the later 3-point scale was found not to include enough detail to place the pupils on the NC equivalent points system, so these pupils were not used. It was also common to find schools recording progress as 'working towards', which again could not be used.
Table 4 Conversion of National Curriculum (NC) levels to NC points
UK pupils throughout KS2 are normally assessed using the NC levels shown in Table 4. Each NC level has 3 sublevels (denoted a, b and c), and on average pupils are expected to progress by 2 sublevels per year in each subject. National tests are taken at the end of Year 2 (KS1 test) and at the end of Year 6 (KS2 test). An average pupil is expected to be at level 2b at the end of KS1 and to progress to level 4b by the end of KS2. For KS2 pupils assessed as having special educational needs, a P scale that leads into the NC levels is used (see Table 4). For KS2 pupils who have English as an Additional Language (EAL), teachers use a separate 5-point EAL scale (not shown).
For analyses of performance statistics, the NC levels were converted to an NC points score as given in Table 4. For EAL pupils below the 4th point on the EAL scale there is no equivalent NC points score, so these pupils, who have no verbal or written skill in English, were not used. Pupils at the 4th and 5th EAL points are considered to be working at the low end and high end of NC level 1, so were converted to level 1c and level 1a respectively. Overall Progress is the dependent variable in our regression analysis. It has been grand mean centred over all 3766 pupils. The summary statistics for the learning measures used are given in Table 5. The mean progress for the pupils in the survey population is 11.90 NC points, where 12 NC points would equate to two sublevels in each of the three subjects, the "expected" progress mentioned previously.
Table 5 Descriptive statistics for pupil NC points score
To enhance the analysis of factors associated with the individual pupils, schools were also asked to provide extra contextual data: date of birth, gender, date of first class of the year, date of last class of the year, attendance rate, and whether the pupil was in any of the government classifications of Free School Meals (FSM, a measure of deprivation), EAL or Special Educational Needs (SEN). Date of first and last class and attendance rate were collected so that pupils could be excluded from the study where they had poor attendance or had not been in the class for the whole year of study. In total there were 669 pupils (18%) rated as SEN, 874 children (23%) with EAL, and 775 pupils (21%) with FSM status.
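The conversion to NC points and the construction of Overall Progress can be sketched as follows. Table 4 is not reproduced here, so the 2-points-per-sublevel step is an assumption inferred from the text (Level 1c = 7 points, and 12 points equals two sublevels in each of three subjects); P scales and the Foundation Stage Profile are not handled.

```python
# Assumption: 2 NC points per sublevel, inferred from 1c = 7 and
# "12 NC points = two sublevels in each of three subjects"; Table 4 is not reproduced.
SUBLEVEL_OFFSET = {"c": 0, "b": 2, "a": 4}  # c is the lowest sublevel, a the highest
SUBJECTS = ("Reading", "Writing", "Maths")

def nc_points(level: str) -> int:
    """Convert an NC sublevel such as '2b' to NC points (1c = 7)."""
    number, sub = int(level[:-1]), level[-1].lower()
    if number < 1 or sub not in SUBLEVEL_OFFSET:
        raise ValueError(f"not an NC sublevel: {level!r}")
    return 7 + 6 * (number - 1) + SUBLEVEL_OFFSET[sub]

def eal_to_nc(eal_point: int):
    """EAL points 4 and 5 map to 1c and 1a; lower points have no NC equivalent (None)."""
    return {4: "1c", 5: "1a"}.get(eal_point)

def overall_progress(start: dict, end: dict) -> int:
    """Sum of NC-point gains across Reading, Writing and Maths for one pupil."""
    return sum(nc_points(end[s]) - nc_points(start[s]) for s in SUBJECTS)

start = {"Reading": "2b", "Writing": "2c", "Maths": "2b"}
end = {"Reading": "3c", "Writing": "2a", "Maths": "3c"}
print(overall_progress(start, end))  # 12: two sublevels in each subject
```

Pupils for whom `eal_to_nc` returns `None` would simply be dropped before computing progress, mirroring the exclusion described above.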
As a starting point in the study, several pupil factors had to be controlled for. Because pupils learn at different rates from year to year over their school life, the start grade of a child, compared with the average start grade in that year group, is a key indicator of their potential progress. Start grade was therefore group mean centred on age (a proxy for year group, since pupils in the UK are almost always taught in classes of the same age) and is termed 'Weighted Start-on-age' in this study. The start grade was also grand mean centred on the whole dataset to form a second explanatory variable, which relates to how far a pupil is along their learning journey through the KS1 and KS2 syllabuses. This is termed the 'Weighted Start'. Other explanatory variables, such as gender, FSM, EAL and SEN, are straightforward. Two further variables were also created for the study: Actual Age, which is the grand mean centred age in months of the child, and Months Age, which is the number of months the child is past their birthday at the start of the academic year. The latter gives the relative age in months of the pupil compared with their year group, that is, whether they were "old" or "young" in their year.
As a final step in creating the pupil variables for the study, the Overall Progress, Weighted Start-on-age, Weighted Start, Actual Age and Months Age variables were normalized. This involved calculating the deviation of each datum from the mean of the dataset and then dividing by the standard deviation of the dataset.
Again, the pupil population displays considerable variety across the measures used, including in terms of FSM, EAL and SEN status.
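The centring and normalization steps can be sketched as below. This is an illustration on hypothetical data, assuming the population standard deviation; the text does not say whether the population or sample standard deviation was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def group_mean_centre(values, groups):
    """Subtract the mean of each value's group (here: age group) from the value."""
    members = defaultdict(list)
    for value, group in zip(values, groups):
        members[group].append(value)
    group_means = {group: mean(vals) for group, vals in members.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]

def normalise(values):
    """Deviation from the dataset mean divided by the dataset standard deviation.

    Subtracting the grand mean here also performs grand mean centring.
    pstdev (population SD) is an assumption, not stated in the paper.
    """
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical start grades (NC points) for six pupils in two age groups.
start = [13, 15, 17, 19, 21, 23]
age_group = ["Y2", "Y2", "Y2", "Y3", "Y3", "Y3"]

weighted_start = normalise(start)                                       # position on the whole scale
weighted_start_on_age = normalise(group_mean_centre(start, age_group))  # position within age group
print([round(z, 2) for z in weighted_start_on_age])
```

Pupils 1 and 4 receive the same Weighted Start-on-age because each sits the same distance below their own age-group mean, even though their Weighted Start values differ.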

Modelling strategy
The analysis followed two broad steps. First the influence on learning of each of the factors being studied was addressed separately through bivariate analysis. Then, once the measures likely to be in play had been identified, and any inadvertent inter-correlations had been minimised, a multi-level analysis of their combined effects was carried out. This latter part is the more unusual and so is described in greater detail below.
In this study we aimed to model pupil Overall Progress, which is a continuous variable, using a linear regression model. Because pupils learn together in classrooms we expected the pupil progress between pupils sharing the same classroom environment to be more correlated than pupil progress between pupils in different classrooms. For this reason we needed to use a type of linear regression model that allowed data to be clustered in groups, called a multi-level model (MLM). MLM analysis allows modelling of the variance-covariance matrix from the data directly so that the normal requirement of homogeneity of variance across the whole dataset can be dropped [33].
The structure of the MLM needed for this study was a two-level model in which pupils at Level 1 are nested within classrooms at Level 2. A three-level model, with pupils (Level 1) nested within classrooms (Level 2) and classrooms nested within schools (Level 3), was also tested but not used in the final analysis; this is discussed more fully in the results. The term 'nested' is used because each child learns in only one classroom, and each classroom is within only one school.
MLM analysis also allows unexplained variance to be identified at each of the model levels. For example, our efforts to create measures of the influence of teachers were unsuccessful, owing to understandable confidentiality concerns. It is therefore assumed that this important element is left in the unexplained variance at the classroom level. Nye et al.'s meta-analysis puts the magnitude of the teacher effect at between 7% and 21% of the variance in pupils' achievement gains [34].
A specialist modelling software package, MLwiN [35], was used for the study. The modelling procedure follows that outlined by  for a two-level model with clustered data. The initial Level 1 (pupil) model was written as:

$$\text{OverallProgress}_{ij} = \beta_{0j} + e_{ij}$$

where $\text{OverallProgress}_{ij}$ is the individual Overall Progress for child $i$ in classroom $j$, which depends on $\beta_{0j}$, the intercept (mean value) for classroom $j$, plus a residual, $e_{ij}$, associated with each child. The initial Level 2 (classroom) model was:

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

where the intercept specific to classroom $j$ (the mean value in classroom $j$) depends on an overall fixed intercept $\gamma_{00}$ plus a random effect $u_{0j}$ associated with classroom $j$. The overall mixed-level model was given by:

$$\text{OverallProgress}_{ij} = \gamma_{00} + u_{0j} + e_{ij}$$

After building the basic structure of the regression model, the explanatory variables could then be added. To test whether an additional explanatory variable improved the model, a likelihood ratio test was carried out. The '-2*log-likelihood' was calculated for each of the competing models, that is, the simpler model and the model with the additional factor. Then, to test whether the latter was a significant improvement, the difference in '-2*log-likelihood' between the two models was compared against a chi-squared distribution with 1 degree of freedom. This was repeated for each added explanatory variable (Chapter 2.5 [36]).
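The arithmetic of this likelihood ratio test can be sketched with the standard library alone. This is not MLwiN output; it only shows how a drop in '-2*log-likelihood' maps to a p-value, using the fact that the upper tail of a chi-squared distribution with 1 degree of freedom equals erfc(sqrt(d/2)). The deviance values are hypothetical, and both models must be fitted to the same pupils for the comparison to be valid.

```python
import math

def lr_test_p(neg2ll_simple: float, neg2ll_extended: float) -> float:
    """p-value for adding one parameter, from the drop in -2*log-likelihood.

    Under the null hypothesis the drop follows a chi-squared distribution
    with 1 degree of freedom, whose upper tail is erfc(sqrt(d / 2)).
    """
    drop = neg2ll_simple - neg2ll_extended
    if drop < 0:
        raise ValueError("the extended model should not fit worse than the simpler one")
    return math.erfc(math.sqrt(drop / 2))

# Hypothetical deviances for a model without and with one extra explanatory variable.
p = lr_test_p(10000.0, 9996.16)
print(f"drop = 3.84, p = {p:.3f}")
```

A drop of about 3.84 sits right at the conventional p = 0.05 threshold.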
The next step in building the model involved adding the explanatory variables at both Level 1 and Level 2. Following the procedure outlined in West et al. [37], explanatory variables at Level 1 were added first using a 'step-up' procedure. The two primary predictors of pupil progress used in this study were the start grades for each child: Weighted Start and Weighted Start-on-age. These two variables were added sequentially, and the significance of the model improvement was noted using the '-2*log-likelihood' statistic at each step. The model was then improved by adding a random effect on one of the Level 1 variables. The best improvement was found when a random effect was added to Weighted Start-on-age. Just as we allowed the intercept to vary according to which classroom a pupil was in, using the coefficient β_0j, we then allowed the slope of the line to vary by classroom, with the coefficient β_1j. This coefficient describes the relationship between the average Overall Progress and the average start level compared with children in the same year. This type of MLM is sometimes called a random slope model [36].
Each of the other Level 1 explanatory variables was then added to the Level 1 model, and the '-2*log-likelihood' tested to make sure the variable made a significant improvement to the model. A change was deemed significant where p < 0.05 (two-tailed). The step-up procedure is used when the explanatory variables to be added are independent of each other. In this case gender, age and the key pupil metrics of FSM, EAL and SEN were all independent of each other.
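Written out in the notation of the null model, the random slope model described above would take roughly the following form. This is a sketch in standard multilevel notation rather than the authors' exact specification: the abbreviations WSoA (Weighted Start-on-age) and WS (Weighted Start) and the fixed coefficients γ_10 and γ_20 are our labels, and only Weighted Start-on-age carries a random slope, as stated in the text.

$$\text{OverallProgress}_{ij} = \beta_{0j} + \beta_{1j}\,\text{WSoA}_{ij} + \gamma_{20}\,\text{WS}_{ij} + e_{ij}$$

$$\beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j}$$

$$\text{OverallProgress}_{ij} = \gamma_{00} + \gamma_{10}\,\text{WSoA}_{ij} + \gamma_{20}\,\text{WS}_{ij} + u_{0j} + u_{1j}\,\text{WSoA}_{ij} + e_{ij}$$

Substituting the Level 2 equations into the Level 1 equation gives the combined model, in which $u_{1j}$ lets the start-on-age slope differ from classroom to classroom.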
The second part of the process involved adding the classroom explanatory variables at Level 2. Each environmental factor was tested individually by creating a model with just this environmental factor, and there was deemed to be a significant change where the p<0.05 (2 tailed). With the remaining variables there were still inadvertent correlations between some of the factors (see 4.1 below). Because of this a top-down approach was used when adding these variables so that the fitted model showed the combined effect of all these factors, before each factor was removed to test for its individual significance in the overall model [37]. As each remaining classroom parameter was sequentially removed the '-2*log-likelihood' was compared to the full model to see if there was a significant change (p<0.10, 2 tailed). Where the presence of the parameter significantly improved the model, it was retained; if not, then it was left out. Once all of the parameters that were not significant had been removed, a further procedure was carried out by adding back in each of the rejected parameters. This last step is important as the classroom parameters, because of their intercorrelation, had an impact on each other. A higher p-value limit was allowed in the final test as both the bivariate analysis and the individual modelling results had already shown the significance of each individual classroom parameter at the higher level.
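The top-down selection of classroom parameters, followed by the add-back pass, can be sketched as below. This is one reading of the procedure: at each step it compares against the current model and drops the weakest parameter while p ≥ 0.10. The `deviance` callable is hypothetical and stands in for refitting the two-level model in MLwiN; the toy deviance has no inter-correlation, so the add-back pass changes nothing in this example.

```python
import math

def p_for_drop(neg2ll_without: float, neg2ll_with: float) -> float:
    """Chi-squared (1 df) p-value for the change in -2*log-likelihood."""
    drop = max(neg2ll_without - neg2ll_with, 0.0)
    return math.erfc(math.sqrt(drop / 2))

def top_down_select(candidates, deviance, alpha=0.10):
    """Backward elimination from the full model, then an add-back pass.

    `deviance` is a hypothetical callable mapping a frozenset of parameter
    names to the -2*log-likelihood of the fitted two-level model.
    """
    model = frozenset(candidates)
    while model:
        current = deviance(model)
        # p-value for removing each parameter from the current model
        pvals = {x: p_for_drop(deviance(model - {x}), current) for x in model}
        weakest = max(pvals, key=pvals.get)
        if pvals[weakest] < alpha:
            break  # every remaining parameter earns its place
        model = model - {weakest}
    # Add-back pass: because parameters are intercorrelated, a rejected one
    # may become significant once others have been removed.
    for x in sorted(set(candidates) - model):
        if p_for_drop(deviance(model), deviance(model | {x})) < alpha:
            model = model | {x}
    return model

# Toy deviance: each parameter lowers -2LL by a fixed amount (no overlap).
gains = {"Light": 20.0, "Colour": 5.0, "Links to Nature": 2.0, "Sound": 1.0}

def toy_deviance(params):
    return 1000.0 - sum(gains[p] for p in params)

print(sorted(top_down_select(gains, toy_deviance)))
```

The bivariate screening at p < 0.05 that precedes this step is not shown.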
In the initial bivariate analysis (Table 6), focusing first on the pupil factors, the start scores were significantly negatively correlated with Overall Progress: the higher the start score, the less progress was made. The same is true of the Actual Age measure: children in older classes made less progress. The correlation for gender is not significant, so males and females did not make significantly different Overall Progress. Children on FSM made poorer progress, as did SEN children, whereas EAL pupils on average made significantly better Overall Progress. These are significant influences that clearly had to be taken into account in the MLM if the impact of the environmental factors was to be isolated.
Table 6 Pearson correlation between each variable and each pupil's overall progress.
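The bivariate screening uses Pearson correlations; a minimal sketch on hypothetical data is below. With a 0/1 indicator such as FSM the Pearson coefficient is the point-biserial correlation. Note that the usual significance test for r treats all pupils as independent, which pupils sharing a classroom are not; this is part of the motivation for the multi-level model.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pupils: FSM indicator (1 = eligible) and Overall Progress in NC points.
fsm = [1, 1, 0, 0]
progress = [9, 11, 13, 15]
print(round(pearson_r(fsm, progress), 3))  # negative: FSM pupils progressed less here
```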
In the development of the environmental factors, scatterplots were initially produced to examine the relationship between pupil progress and each of the measures in isolation. Elements were retained in the study where a broad relationship was confirmed between the pupil progress and the measure. Particular note was taken when non-linear relationships were observed (see below) and for these factors curvilinear scales were created.
Correlations of Overall Progress for each pupil against the environmental measures showed that all ten parameters were positively correlated with progress. Of the five Naturalness parameters, Light has the highest correlation with Overall Progress. In the formulation of the Light parameter, the highest quantity of natural and electrical light, but without direct sunlight, was found to be optimum; too much direct sunlight into the classroom was found to cause glare. In the Individualization theme, all three parameters were found to be significantly positively correlated. For the Level of Stimulation parameters, the two factors of Complexity and Colour were both found to be curvilinear, with an intermediate level of the parameter being optimum. For example, both high-Complexity and low-Complexity classrooms scored poorly, while intermediate values of Complexity scored highly.
In creating the measures for the factors, we endeavoured as far as possible to remove cases of high inter-correlation between the measures, given the attendant concern of double-counting. However, the driving focus had to remain on representing the hypothesised influences on learning being tested. Consequently there were some instances of parameters with significant correlations; for example, the correlation between Light and Air Quality is 0.312. This is because Light includes a measure of 'window size', while Air Quality includes a measure of 'openable window size'. Table 7 shows the inter-correlations between all the parameters.
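The paper does not give the formula used to build the curvilinear scales for Complexity and Colour. As a purely hypothetical illustration, a symmetric inverted-U rescaling of a 1-5 rating, with the optimum assumed at the midpoint, would turn "intermediate is best" into a score that rises monotonically with expected benefit:

```python
def inverted_u(raw: float, optimum: float = 3.0, low: float = 1.0, high: float = 5.0) -> float:
    """Hypothetical rescaling: the optimum rating scores `high`, the extremes score `low`.

    The linear, symmetric shape and the midpoint optimum are assumptions,
    not the scale used in the study.
    """
    if not low <= raw <= high:
        raise ValueError("rating outside the scale")
    span = max(optimum - low, high - optimum)
    return high - (high - low) * abs(raw - optimum) / span

print([inverted_u(r) for r in range(1, 6)])  # [1.0, 3.0, 5.0, 3.0, 1.0]
```

After such a transformation, a linear term in the regression can capture the curvilinear relationship.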

Multi-level model
Multilevel modelling allows nesting of children within classrooms. Within a two level model variance was then partitioned between the two levels: pupil level and classroom level. Using the explanatory variables to fit a statistical model allowed some of the variance at each of the levels to be reduced. The empty, or null, two level model, as it is initially set up, without any explanatory variables describes the partition between variance at the pupil level and at the class level. In our data set for the Overall Progress the empty model partitions approximately 55% of the variance into the pupil level and approximately 45% of the variance into the classroom level.
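The variance partition of the empty model can be sketched as below. The variance components are hypothetical values chosen to reproduce the reported 55/45 split; in a random-intercept model the classroom share is the intraclass correlation, i.e. the expected correlation between the Overall Progress of two pupils in the same classroom.

```python
def variance_shares(components):
    """Share of total outcome variance at each level of an empty (null) model."""
    total = sum(components.values())
    return {level: var / total for level, var in components.items()}

# Hypothetical null-model variances chosen to reproduce the reported 55/45 split.
two_level = variance_shares({"pupil": 0.55, "classroom": 0.45})
icc = two_level["classroom"]  # intraclass correlation at the classroom level
print(two_level, round(icc, 2))
```

The same function applies to the three-level model by adding a "school" component.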
In the three-level model only 3% of the variance was at the school level. This shows that, even though the schools were chosen to be as different as possible in both architecture and pupil intake, variance in yearly Overall Progress was dominated by pupil-level and classroom-level effects. The small level of variance at the school level may be influenced to a degree by all the schools being state funded, mixed gender and local authority controlled. This does not reflect the full spectrum of UK primary schools, but it does represent the great majority. It should also be noted that there is considerable variation in the physical characteristics of the schools, the impact of which is the focus of this study. Factors at the school level were investigated, but only minor impacts were revealed, as would be expected given the distribution of the variance set out above. For this reason the three-level model was not investigated further. However, the low impact on learning of the school-level factors, compared with classroom-level and pupil-level factors, is in itself an important finding. We return to this issue in the conclusions.
The results for the two-level Overall Progress model are shown in Table 8. Values are shown for the fixed effect coefficients for each of the added explanatory variables and for each of the random effects variables. The sizes of the coefficients reflect the relative importance of the explanatory variables in the model.
Table 8 Parameter estimates and standard errors for factors significant in the MLM.
The proportion reduction in variance (PRV) achieved by adding explanatory variables to the model at Level 1 and Level 2 is given in Table 9. The pupil explanatory variables reduce the Level 1 variance by 18%, and the classroom explanatory variables reduce the Level 2 variance by 26%. The overall R-squared fit for the two-level model is 58%.
Table 9 Proportion reduction in variance (PRV) by adding Level 1 and Level 2 factors to the model.
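The PRV at a given level compares the residual variance at that level before and after adding explanatory variables. A minimal sketch with hypothetical classroom-level variances is below; in multilevel models the PRV at one level can occasionally be negative, for example when Level 1 variables shift variance to Level 2.

```python
def prv(null_var: float, fitted_var: float) -> float:
    """Proportion reduction in variance at one level of the model."""
    if null_var <= 0:
        raise ValueError("null-model variance must be positive")
    return (null_var - fitted_var) / null_var

# Hypothetical classroom-level variances before and after adding classroom parameters.
print(round(prv(0.45, 0.333), 2))  # 0.26
```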
The following two sections discuss the explanatory variables significant at the classroom and pupil levels.

'Pupil level' influences
Results from the two-level model show that the Level 1 factors significant in the model were Weighted Start, Weighted Start-on-age, FSM, EAL and SEN. Gender was not significant in the model. Children on FSM and those with SEN did significantly worse than other pupils, while EAL pupils did significantly better. The sizes of the coefficients are indicative of their relative effect: EAL and FSM have similar-sized effects, and the SEN Overall Progress deficit is more than three times as great. For Weighted Start the model coefficient is negative, indicating that pupils in higher year groups made less progress. It should be noted that although the NC points scale is linear, and there is an expectation that each pupil, whatever their age, makes the expected two sublevels of improvement per year, teachers acknowledge that learning rates in children are not linear. For Weighted Start-on-age the model coefficient is positive, indicating that pupils who are advanced for their age group did, on average, make more progress.
These results are similar to the earlier bivariate correlation analysis, but now provide an interactive backdrop within the same model as the environmental factors, to which we now turn. In addition to these operationalised pupil factors, other aspects linked to the pupils, but not measured, are also included in the modelling, within the unexplained variation partitioned at the pupil level.

'Class level' E-H-P influences
Of the ten environmental parameters investigated in this study, seven significantly improve our two-level MLM for Overall Progress in primary-aged school children. These are shown with their model coefficients in Table 8. The significant environmental classroom parameters come from each of the three design principles: Naturalness, Individualization and Level of Stimulation. Table 10 gives the breakdown of the relative importance of the parameters. The Naturalness parameters of Light, Temperature and Air Quality together explain 49% of the effect in the Overall Progress model. The Individualization parameters of Ownership and Flexibility together explain 28% of the effect. The Level of Stimulation parameters of Complexity and Colour together explain 23% of the effect. The relative sizes of these classroom effects across the three principles reflect a reasonable expectation that the most influential principle is the Naturalness of the environment. The second most influential is how well the classroom is individualized for its pupils, and the last component, which still accounts for almost one quarter of the effect, is the Appropriate Level of Stimulation in the classroom. Table 11 takes the findings on the individual parameters and compares them with existing evidence from the literature. Many of the sources used for the latter have focused on single factors, quite often in controlled conditions, whereas our findings derive from a "natural inquiry" in which, even when we focus on one factor, it is still acting in the context of all the others.
Table 11 Insights from main study results, by design parameter.
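The text does not state how the percentages in Table 10 were derived. One plausible approach, assumed here purely for illustration, is each principle's share of the summed absolute coefficients of the (normalized) classroom parameters. The grouping of parameters by principle follows the text; the coefficients are placeholders, not the Table 8 estimates.

```python
# Grouping of the seven significant parameters by design principle, as given in the text.
PRINCIPLE = {
    "Light": "Naturalness", "Temperature": "Naturalness", "Air Quality": "Naturalness",
    "Ownership": "Individualization", "Flexibility": "Individualization",
    "Complexity": "Level of Stimulation", "Colour": "Level of Stimulation",
}

def principle_shares(coefficients):
    """Each principle's share of the summed absolute coefficients (an assumed method)."""
    total = sum(abs(c) for c in coefficients.values())
    shares = {}
    for parameter, coef in coefficients.items():
        principle = PRINCIPLE[parameter]
        shares[principle] = shares.get(principle, 0.0) + abs(coef) / total
    return shares

# Placeholder coefficients (all equal), NOT the Table 8 estimates.
example = {name: 0.1 for name in PRINCIPLE}
print({p: round(s, 2) for p, s in principle_shares(example).items()})
```

With equal placeholder coefficients the shares simply track the number of parameters per principle (3/7, 2/7, 2/7); the reported 49/28/23 split would depend on the actual estimates.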

Discussion
Although informed by previous studies, this study goes further in concentrating on the complex interaction of a range of built environment factors on pupils in primary schools. That said, findings concerning comfort issues, rooted in the design principle of 'naturalness', are generally consistent with the literature: light, temperature and air quality have a significant impact on pupils' learning outcomes. However, this study also finds that large window size is not universally valuable in terms of maximizing learning benefits. Orientation, shading control (inside and outside), and the size and position of openings all have to be carefully considered so that the risks of glare, overheating and poor air quality can be avoided at the design stage. Furthermore, the importance of occupants' control of 'naturalness' is evident. High quality and quantity of electrical lighting, central heating with thermostatic control and mechanical ventilation can all give teachers and pupils opportunities to adjust the environment to a more comfortable level. It should be noted that although acoustics and links to nature displayed correlations with learning progress in the bivariate analysis, they dropped out of the MLM once the other factors were included, so the evidence for their importance within this quite extensive and varied sample can only be said to be weak.
Pupils in primary schools usually have a relatively fixed learning space for most of their time there. They build up considerable familiarity with their classrooms, and the extent to which they have a room that responds to their individual needs comes under 'individualization', the second design principle. Permanent individual display (artworks, photos, crafts) has been identified by many previous studies as an efficient way to promote a sense of ownership. This study confirms this and goes a step further: a classroom with distinct architectural characteristics, e.g. a unique location (bungalow or separate building), shape (L-shape, T-shape), embedded shelving for display, an intimate corner, facilities specifically designed for pupils or a distinctive ceiling pattern, also seems to strengthen the pupils' sense of ownership. Previous studies have reached no clear consensus on whether classroom size is a factor that affects learning outcomes. It appears that classroom shapes and the optimum elements within a room depend on pupils' ages. Where play-based learning is the primary activity (KS1), the room needs to reflect this with varied learning zones. Where more formal instruction is given through the interactive whiteboard, all pupils must be in a position to easily see the front, so a simpler plan seems appropriate (KS2). It should be stressed that this distinction appears to be a function of the predominant pedagogical approaches used in the UK. Lastly, the Connection parameter, concerning corridors and navigation around the school, did not appear in the MLM, so it receives only weak support from this study, through a link to learning progress in the bivariate correlation analysis alone.
A classroom in a primary school is for children, and arguably should be designed to make attending school an interesting and pleasurable experience. On the other hand, it is also a place where learning can take place uninterrupted by distractions. Lying behind this dynamic is the third design principle concerning the 'appropriate level of stimulation' for a given activity. The influence of the parameters identified to affect the visual perception of diversity in this study is found to be curvilinear, such that intermediate levels of the factors are optimal for learning. For example, the overall appearance, including the room layout and display on the wall has to be stimulating, but in balance with a degree of order, ideally without clutter. Similarly, colours with high intensity and brightness are better as accents or highlights instead of being the main colour theme of the classroom. This simple notion of a moderate level of stimulation being appropriate for the learning situation provides a principle that can throw light on a number of more focused studies.

Summary
The research in this study focused on a holistic environment-human-performance model, examining school and classroom spaces and relating these to individual pupil progress statistics. Researchers assessed 153 classrooms in 27 schools to measure school and classroom features. Data on the 3766 pupils who occupied those spaces were also collected, including the focal dependent variable of progress in learning. The design principles of Naturalness, Individualization and Level of Stimulation were used to develop ten design parameters. The underpinning hypothesis is that pupils' academic progress will depend on a full range of factors drawn from across all three design principles. Measures were then created for the ten design parameters for each classroom. All ten parameters individually correlated significantly with pupil progress. Multi-level regression modelling (including pupil factors) was then used and identified seven key design parameters that best predict the pupils' progress: Light, Temperature, Air Quality, Ownership, Flexibility, Complexity and Colour. The modelled classroom parameters accounted for 16% of the total variability in pupils' learning progress. The inclusion of three very different local authority areas, with distinctly differing pupil intake characteristics and school building environments, was intended to support the analysis at the school level. It did not do so: the variability in learning progress to be explained at the school level in the multi-level model was only 3%. Including this level of analysis did not enhance the overall analysis, so it was dropped.
In Phase 1 of the study, classroom parameters were found to explain 25% of the variance in learning progress [1]. In Phase 2 the sample is five times bigger and the classroom effect has levelled out at 16%, but with much greater certainty. The second phase of the study has also included additional pupil impacts relating to: Free School Meal (FSM) status, English as an Additional Language (EAL) status and Special Educational Needs (SEN) status. The R-squared value for the goodness-of-fit of the regression model has improved from 51% in Phase 1 to 58% in Phase 2.

Main contributions
This study has thrown light on a variety of issues ranging from broad conceptual challenges, to quite specific, practical questions.
One of the major, more general, contributions of this study is to confirm the hypothesised utility of the naturalness, individuality and stimulation (or more memorably, SIN) conceptual model (Figure 3) as a vehicle to organise and study the full range of sensory impacts experienced by an individual occupying a given space. That this might be a productive way forward was argued speculatively in 2010 [15], but the results obtained provide clear evidence that each of these dimensions appears to have a role in understanding the holistic human experience of built spaces. It is interesting that (in this particular case of primary schools) the naturalness factors account for around 50% of the impact on learning, with the individuality and appropriate-level-of-stimulation factors accounting for roughly a quarter each. It could not be predicted in advance whether each of the dimensions would remain in play and, if so, with what relative weight. We now have at least an initial indication, in one situation.

Figure 3 Holistic conceptual (SIN) model
The finding that the built environment factors together explain 16% of the variation in learning progress is a major result in an area where, as Baker and Bernstein put it [62], "the relationship between school buildings and student health and learning … is more viscerally understood than logically proven" (p. 2). This is of course relevant to schools, but, as stated at the start of this paper, primary schools provide a relatively simple situation in which to study a complex general problem. By extension, the results suggest that the scale of the impact of building design on human performance and wellbeing can be identified and that it is non-trivial.
It has also been informative to see how some factors that, as single factors, display quite strong and significant correlations with (in this case) learning progress drop out of the analysis when combined with all the other factors; "links to nature" is one example. This demonstrates the value of single-factor analyses in generating hypotheses, but highlights the danger of assuming that they will translate simply to naturally experienced, multi-dimensional environments. It reinforces the utility of multilevel modelling in studying complex situations as "natural" experiments.
One aspect that surprised the researchers was the muted impact of the whole-building level of analysis. To an extent this will be a result of this study's focus on primary state school education, where pupils spend most of their time in one space and follow the national curriculum. That said, it does provide support for the recent rise of polemical works arguing for "inside-out design" [63] that builds from a focus on user needs and challenges the visual dominance of much design effort [64]. This is paralleled by works arguing specifically for aspects of sensory-sensitive design [65,66]. It would seem that these aspects are more important than is often realised. Figure 3 provides a powerful illustration of this issue. Each column of plots represents the classes in a school, and it can be seen that the modelled performance of the classrooms within a given school varies very widely. There is no such thing here as a "good" or "bad" school, but there are very clearly more and less effective classrooms. Turning to school design itself, the study has been able to identify and typify the elements of design that together appear to lead to optimal learning spaces for primary school pupils. These are summarised in Table 12. Several of the factors are not only issues for designers but also present opportunities for users to adapt their spaces to better support learning. However, there remains a considerable design challenge in elegantly addressing all of these factors optimally in combination.

Table 12
The main classroom characteristics that support the improvement of pupils' learning.

Design principle / Design parameter: Good classroom features

Naturalness
Light: Classrooms facing east and west can receive abundant daylight with a low risk of glare. Oversized glazing should be avoided, especially where the room faces the sun's path for most of the year. More, and higher-quality, electrical lighting can also provide a better visual environment.
Temperature: The classroom receives little sun heat or has adequate external shading devices. A radiator with a thermostat in each room also gives pupils more opportunities to adapt to the thermal environment.
Air quality: A large room volume with large window openings at different heights can provide ventilation options for varying conditions.

Individualisation
Ownership*: A classroom with distinct design characteristics, personalised displays and high-quality chairs and desks is more likely to provide a sense of ownership.
Flexibility: Larger, simpler areas for older children, but more varied plan shapes for younger pupils. Easy access to an attached breakout space and a widened corridor for pupils' storage. Well-defined learning zones that facilitate age-appropriate learning options, plus a large wall area for display.

Stimulation
Complexity*: The room layout, ceiling and displays catch the pupils' attention, balanced by a degree of order that avoids a cluttered or noisy feel.
Colour*: White walls with a feature wall (highlighted in a vivid and/or light colour) produce a good level of stimulation. Bright colours on furniture and displays are introduced as accents to the overall environment.

* Strongly usage-related classroom features.

Limitations and future research
This study has strengths and weaknesses. The chosen focus and the conceptual and methodological approach employed have enabled progress to be made, but they also carry limitations and consequent opportunities for alternative approaches. In addition, the findings to date provide a foundation on which future studies can be built with greater confidence than before.
The sample is focused on one type of building (primary schools) in one country (England, UK) and has endeavoured to explain one measure of human performance (formal academic progress). Primary schools and the pedagogy practised within them in the UK are quite distinctive, and it could be anticipated that in other scenarios the impact of the whole-building level would be more prominent. It could also be that other factors, or other weightings of factors, are relevant to other dimensions of education, such as behavioural development in pupils. Different requirements would certainly be anticipated for different activities where, for example, the appropriate level of stimulation varies. Further, the UK has quite specific climatic conditions, and in other geographical areas the specifics of how the optimum conditions are realised would be expected to vary. That said, basic human comfort needs would probably be more stable. For example, the orientation and power of the sun could be quite different in different regions, so window design would need to take this into account, but the human need for sufficient light without too much glare should translate. Cultural differences would be more complex: they could drive variations in the approach to pedagogy or, more basically, affect preferences for, and reactions to, factors such as colour.
The flip side of the above limitations is that, building on the experience of this study, further studies could fruitfully be carried out in different types of learning institution, such as secondary schools and universities. This could extend beyond education to, say, offices, accommodation for the elderly, and retail [67]. For these, preliminary soft-data studies would be advisable to provide a sound foundation for the hypotheses, and identifying a powerful dependent variable will not always be simple. It would also be beneficial to go beyond the methodology used to date and move, say, to an action research approach, where changes are made based on the results so far and the impacts (anticipated and unanticipated) are tracked through multiple triangulated methods.
Within the dataset already compiled, sub-analyses are possible, for example of the impacts of spaces on SEN pupils in particular. It will also be interesting to see to what extent currently judgemental measures can be replaced by objective measures, for example for visual complexity.

A significant direction
Given the large sample size and the scale of the effects identified in this study, it seems reasonable to suggest that strong proof of concept has been provided for the efficacy of the approach used in this research. Using the broader SIN conceptual model, linked to multilevel modelling (MLM), clearly has the potential to reveal more about the holistic impacts of spaces on people. That said, it is vital to capitalise on this promising initial step and to further develop these concepts and techniques.

Naturalness

*Light (natural light)
Previous research: Natural light significantly influences reading vocabulary and science scores. Large windows were found to be associated with better learning results over a one-year period [10,38].
This study: Different. Light has the highest impact on overall progress of all the design parameters. However, window size alone was not significantly correlated with learning progress. Only when orientation and the risk of glare were taken into consideration could pupils benefit from the optimum glazing size.

*Light (electrical light)
Previous research: Poor-quality electrical lighting causes headaches and impairs visual performance [39]. Students under full-spectrum fluorescent lamps with ultraviolet supplements had better attendance, achievement and growth than students under other lights [40].
This study: Consistent and goes further. Not only the quality but also the quantity of electrical lighting has a significant positive correlation with pupils' learning progress.

Sound (good acoustics)
Previous research: Significant effects of reverberation time (RT) on speech perception and short-term memory of spoken items were found [41].
This study: Weak support. RT was not measured in this study. However, there is some evidence to support a relationship between RT and some design strategies, e.g. room shape and carpet area. In the bivariate correlation analysis these factors were significantly correlated with the learning rate; however, they did not feature in the MLM results.

Sound (noise)
Previous research: External and internal noise were found to have a significant negative impact on performance [42][43][44].
This study: Weak support. Noise level was not tested in this study. However, factors that affect the noise level, e.g. distance from main traffic and busy areas adjacent to the room studied, displayed a bivariate correlation with the learning rate. These aspects did not, however, feature in the MLM results.

*Temperature (sun heat)
Previous research: Performance on two numerical and two language-based tests improved significantly when the temperature was reduced from 25°C to 20°C [19].
This study: Consistent. Factors affecting the temperature were correlated with learning progress. Unwanted sun heat was a problem where external shading was absent.

*Temperature (control)
Previous research: Occupants with more opportunities to adapt to the thermal environment are less likely to suffer discomfort [45].
This study: Consistent. Pupils performed better in rooms where the temperature was easy to control.

*Air quality (CO2 level)
Previous research: Pupils' mental attention is significantly slower when the CO2 level in classrooms is high [46] and when the air exchange rate is low [19,47].
This study: Consistent. Factors affecting CO2 are correlated with learning progress, e.g. pupils perform better in rooms that have mechanical ventilation, a large volume or large window openings.

Links to nature (window view)
Previous research: Patients assigned to rooms with windows looking out on a natural scene had shorter postoperative hospital stays than those in similar rooms with windows facing a brick building wall [7].
This study: Weak support. The quality of the view out of the window shows a bivariate correlation with learning progress where window sills are below children's eye level. That said, this aspect did not feature in the MLM results.

…
Previous research: … natural, greener environments [48].
This study: … learning progress, as are those with dedicated outdoor play areas. That said, this aspect did not feature in the MLM results.

Individualisation

*Ownership (distinct design features)
Previous research: An attractive physical environment in school is associated with fewer behaviour problems, whereas a negative physical environment is not [49].
This study: Consistent. Architectural design elements that make the room unique and child-centred are significantly correlated with learning progress.

*Ownership (nature of the display)
Previous research: Permanent student artwork enhanced students' sense of ownership over the learning process [50]. There was a significant positive effect on children's self-esteem [51].
This study: Consistent. Personal displays by the children create a "sense of ownership", and this was significantly correlated with learning progress.

*Ownership (furniture)
Previous research: Specialised facilities are essential to student wellbeing and achievement [52][53][54].
This study: Different. Furniture and features in the class that were ergonomic and comfortable for the children were significantly correlated with learning progress.

*Flexibility (room layout)
Previous research: Significantly more exploratory behaviour, social interaction and cooperation occurred in spatially well-defined behaviour settings [55,56].
This study: Consistent. The flexibility measures investigated in this study were breakout spaces and rooms, storage solutions, number of different learning zones and potential display area. More learning zones for younger children, and fewer for older children, correlated with learning progress. Breakout zones within the room were correlated with learning progress.

*Flexibility (size)
Previous research: Girls' academic achievement was negatively affected by less space per student; boys' classroom behaviour was negatively affected by spatial density [57].
This study: Different. Larger rooms with simpler (squarer) shapes enabled older children to function better in whole-class learning. However, complex room shapes for younger children facilitated learning zones and enabled flexibility.

Connection (pathway)
Previous research: Movement and circulation have a significant effect on reading comprehension [10].
This study: Weak support. Wider and more orienting corridors showed a bivariate correlation with better learning progress. However, these aspects did not feature in the MLM results.

Level of stimulation

*Complexity (room diversity and display diversity)
Previous research: Learning scores were higher in sparse classrooms than in decorated classrooms [27]. However, Read et al. [58] found that space with differentiated ceiling height and wall colour may be too stimulating for children. Children in Low Visual Distraction conditions spent less time off-task and obtained higher learning scores than children in the High Visual Distraction condition [59].
This study: Different / Consistent. This research found that it is the overall room and display diversity measure that correlates with learning progress. Across the range from under-stimulation to over-stimulation, its relationship with learning progress was curvilinear, indicating that only when the room has an intermediate level of stimulation does it have a positive effect on pupils' learning progress.

*Colour (wall and classroom colour)
Previous research: Off-task behaviours clearly dropped when the colours of the classroom walls were changed from off-white to saturated colours [58,60]. Children prefer the colour red in the interior environment. Cool colours were favoured over warm colours for children from 3-5 years old [61].
This study: Consistent. Rooms balancing light-coloured or white walls with a highlighted feature wall or organised bright display colours had the best correlation with learning progress. A brightness colour scale was used to distinguish colour elements. Added colour elements in the room, with bright-coloured furniture, carpets and other elements, were also correlated with learning progress.

*Significant in MLM

a Choice was renamed Ownership to better describe its relationship to the pupils.
b The Texture parameter was reconfigured from a measure of outdoor spaces into a new parameter called Links to Nature, which reflected classroom elements relating to natural elements. It was moved into the Naturalness principle.
c Within Connections, one element of the measure (clear corridor) was removed, as research into wayfinding indicates that temporary elements can be used as orienting features.
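The curvilinear complexity result in the table above (a positive effect only at an intermediate level of stimulation) can be pictured with a quadratic. As a minimal sketch, assuming a hypothetical progress model p(s) = b0 + b1·s + b2·s² with b2 < 0 (an inverted U; the authors' exact functional form and coefficients are not given here), the predicted benefit peaks at s* = -b1 / (2·b2) and falls away towards both under- and over-stimulation. The coefficients, function names and the 0-8 stimulation scale are all hypothetical.

```python
def predicted_progress(s: float, b0: float, b1: float, b2: float) -> float:
    """Quadratic model of progress against stimulation level s (inverted U when b2 < 0)."""
    return b0 + b1 * s + b2 * s * s


def optimum_stimulation(b1: float, b2: float) -> float:
    """Turning point of b1*s + b2*s**2; it is a maximum only when b2 < 0."""
    if b2 >= 0:
        raise ValueError("an inverted U needs a negative quadratic coefficient")
    return -b1 / (2 * b2)


# Hypothetical coefficients on a hypothetical 0-8 stimulation scale.
b0, b1, b2 = 0.0, 1.0, -0.125
s_star = optimum_stimulation(b1, b2)
print(f"optimum stimulation: {s_star}")
for s in (0, 2, s_star, 6, 8):
    print(f"s = {s}: predicted progress {predicted_progress(s, b0, b1, b2):.2f}")
```

With these made-up values the optimum is at s = 4, and rooms at 2 and 6 are predicted to do equally well, which is the symmetric "neither bare nor cluttered" pattern the complexity finding describes. A real fit would estimate b1 and b2 from data, and the curve need not be symmetric.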