Building energy performance monitoring through the lens of data quality: A review

Data quality is important across sectors to ensure that data meets the requirements of its users, but until now little attention has been given to how it is reported in the architecture, engineering and construction (AEC) sectors. The lack of visibility about data quality in building energy performance monitoring motivated a review of 162 articles published since 2017. This identified that data quality reporting was fragmented and limited, with a gap from best practice and a lack of standardisation around requirements specific to building performance, including data comparability and spatiotemporal granularity. Where data quality issues were diagnosed, this was in isolation and concerned individual data quality attributes rather than within a comprehensive data governance strategy. This powerfully evidences the need to build consensus across AEC about (1) the required levels of data quality from building energy performance data, (2) a consistent reporting vocabulary and (3) how data quality is achieved.


Introduction
Building energy monitoring data are being used in the architecture, engineering and construction (AEC) sectors more than ever before, but what about the quality of this data? Digitalisation and "maximising use of data" is a major priority to transform the construction industry [1], and this trend is seen today in the wider adoption of remote energy management [2] and the growth of big data analytics in the AEC sectors [3]. The data about buildings' energy performance is also growing, with half-hourly smart meter data [4] now available for nearly 30 million smart meters in Great Britain and penetration increasing [5]. Data pervades AEC, be it in energy management [6], commissioning [7], calibration of simulation [8], building design data [9] or building information modelling (BIM) [10], and therefore data quality merits the attention of both academia and industry. Presently, poor data quality is a known issue in AEC, which in turn has attracted "pessimism" and a broader lack of trust in data [11].
Poor quality data is not, however, an issue limited to AEC, with issues reported in healthcare [12], sports science [13], manufacturing [14] and finance [15]. More generally, data quality has been positioned as a core "quality concept" that must be "measured", "controlled" and "improved" [16]. Without data that meets the requirements of its use case, continued uncertainty about the quality of data will manifest in both tangible and intangible ways for AEC, particularly as energy monitoring data is used more to make decisions about how buildings are designed and operated. The marriage of data with cost-intensive buildings means that suboptimal decisions made on the basis of poor data are expensive, potentially increasing capital and operational expenditure significantly. Data quality could also hinder built environment decarbonisation more broadly: not only in the carbon impacts of suboptimal decision-making but in jeopardising the data needed for "monitoring progress" towards sector carbon reduction targets, considered essential for "successful delivery" of net zero [17].
Motivated by this context, an initial literature search revealed that, so far, no academic papers have explicitly reviewed the data quality of building energy monitoring studies. There is a need to address this research gap: answering what level of data quality is currently being achieved in building energy monitoring studies, how this is achieved and the root causes of any poor data quality. The aim of this paper is therefore to comprehensively review existing building energy monitoring studies through the lens of data quality. Without achieving this and first providing visibility of data quality, there will be continued difficulty in building consensus around the required data quality and overcoming any fragmentation in how data quality is reported in building energy monitoring.
To achieve this aim, the following constituent objectives must be fulfilled: to identify existing building monitoring studies that report on data quality, reporting on the proportion that do so; to critically review this literature to determine the extent to which data quality is achieved in building monitoring; and to make recommendations for future research direction in order to achieve the required consensus around data quality when using building energy performance monitoring data in AEC.
The paper is organised according to these objectives. To frame the review, Section 2 introduces the theory of data quality and how it relates to AEC. Section 3 provides the review methodology, with results quantitatively described and analysed in Section 4. Section 5 describes and qualitatively synthesises what was found in each article, according to different data quality concepts. Section 6 interprets this synthesis, providing the outlook and discussion for data quality in order to support the concluding recommendations about future research direction in Section 7.

Definitions of data quality
Data quality is defined by the International Organization for Standardization in BS ISO 8000-2:2020 as the "degree to which a set of inherent characteristics of data fulfils requirements" [18]. Positioned as a core "quality concept" [16], the requirements of data "depends on the use" [19] and data quality must therefore be defined contextually: reflecting the "needs and expectations" specific to its application [18]. Within the field of building performance analysis, literature highlights that "what may be excellent for one objective may not be appropriate for another", with quality depending on specific "performance criteria" [19].

Data quality attributes
Data quality attributes (or dimensions) are important data quality concepts and can be considered as sub-domains of data quality, although no "general agreement exists on which set of dimensions defines the quality of data" [20].
Data validation is key to ensuring "accessibility and clarity" for data users and determines whether a datapoint conforms to its structure, which can include structured, semi-structured and unstructured data [21]. Missing datapoints may cause a datapoint to fail a validation test, positioning completeness as another core attribute of data quality [22].
Each datapoint should be unique and free of any duplication, while consistent data will not conflict with nearby datapoints and will be free from outliers [22]. Timeliness and punctuality of data meanwhile require that data be available in reasonable time, with minimum levels of performance taking into account "user requirements" [21]. As data moves through an organisational system, data provenance and lineage describe what happens to data during its "lifecycle", something that can be traced using "metadata" and ultimately reported as part of research [23].
While not explicitly a data quality concept, identifying the source of data and understanding its journey as it is moved and processed is important, especially as the source of data can inform understanding about the "suitability of data for particular uses" [24].
In the case of building energy performance monitoring, measurement certainty becomes key [25]. Data must be accurate, achieving both trueness (defined as "the closeness of agreement between the average value obtained from a large series of test results and an accepted value") and precision (defined as "the closeness of agreement between individual test results under stipulated conditions") [26]. Respectively, this means comparing measurements with apparatus known to be true (calibration) and specifying high precision monitoring apparatus. Measurement resolution (or fineness) is another component of measurement certainty, referring to the "smallest change in the physical quantity that produces a change in the measurement" [25]. High resolution monitoring apparatus will therefore be able to indicate smaller changes in system variables.
Spatiotemporal granularity is distinct from measurement resolution, with "temporal granularity" coming from the frequency of measurement in time and "spatial granularity" depending on the fineness of the "measurement grid" [27]. To achieve high spatial granularity while maintaining accuracy, direct metering of building subsystems is necessary [28]. In-situ building monitoring increasingly has the potential to use big data approaches on building and occupancy data, including novel "sensing technologies" to increase the volume of data acquired and hence achieve increased granularity [3].
Data quality concepts such as granularity are related closely to "big data", grounded in five key characteristics: "volume" (related to completeness), "variety" (of data types and structures), "velocity" (related to timeliness), "value" (to data users for decision-making) and "veracity" (overall quality) [29].

Quantifying data quality
Data quality and its attributes can be represented numerically using "data quality measures" as part of "data quality assessment" [23]. One or more measures can be produced for each attribute of data quality and numerous examples exist to describe the quality of data numerically, with critical "thresholds" set to identify areas of poor data quality and diagnose problems [20]. Data quality measures exist within a wider field of "performance quantification", therefore being a subset of "performance measures" [19]. Measuring and monitoring data quality are central to BS ISO 8000-110 and embedded in an overall process of "data quality management" [18].
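As a simple illustration of such a measure, a completeness ratio can be computed as the share of expected readings actually received and then compared against a threshold; the half-hourly interval, the gap positions and the 95 % threshold below are hypothetical, not drawn from any standard or reviewed article.

```python
from datetime import datetime, timedelta

def completeness_ratio(timestamps, start, end, interval):
    """Share of expected readings actually received over a monitoring window."""
    expected = int((end - start) / interval)
    observed = len(set(timestamps))  # de-duplicate any repeated readings
    return observed / expected

# Half-hourly meter data over one day: 48 expected readings, 4 missing.
start = datetime(2023, 1, 1)
interval = timedelta(minutes=30)
received = [start + i * interval for i in range(48) if i not in (10, 11, 12, 13)]

ratio = completeness_ratio(received, start, start + timedelta(days=1), interval)
flagged = ratio < 0.95  # hypothetical threshold marking poor data quality
```

A measure like this makes the attribute reportable as a single number per dataset, which is the form most thresholds in data quality assessment take.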

Applications within AEC
In applications for AEC, there is no single definition of data quality; instead, data requirements and characteristics of poor data quality are defined for particular applications. In BIM, data quality is considered a major industry "pitfall" and source of "pessimism" about data [11]. Attributes of poor data quality include "null values", "outliers", "misleading values" and "non-standardised data".
Such poor data quality and the requirement for extensive data cleaning is a cause of reluctance in industry to adopt big data more widely [11].
With the diversity of buildings and their uses, comparability of data is an increasingly relevant data quality attribute. The established link between occupant behaviour and building energy performance [30] can impact comparability, and additional data ("long-term" and "high-resolution" data) is needed to determine the extent to which a building's use is comparable by capturing "the reality of occupants' presence and behaviour in buildings" [31]. As the data subject moves from buildings to people, however, data quality can trade off against concerns such as privacy and cost [3].
In monitoring, standardised procedures for building performance evaluation have recently emerged which explicitly refer to data quality concepts, such as CIBSE TM68 [27] and BS 40101:2022 [32]. BS 40101:2022 frames monitoring requirements around four levels of building performance evaluation depending on requirements, with different spatial and temporal requirements for each level [32]. CIBSE TM68 meanwhile includes guidance on "managing continuous sensing systems and their data", with recommendations on establishing data provenance using file naming, cleaning methods including "smoothing" and choosing "spatial and temporal" granularity, though without any "fixed rules", highlighting a continued lack of consensus [27]. Both include minimum thresholds for the accuracy of apparatus [27,32].
Building data is also key for building performance modelling more broadly, particularly "in-use evaluation" [33]. Modelling for Measurement and Verification (M&V) of performance requires data about "design stage assumptions, intents and targets", plus building "performance and modifications" at the operational stage [34]. The International Performance Measurement and Verification Protocol (IPMVP) sets out different requirements for M&V depending on scope, from isolated systems to whole facilities, with "quality control" of data a key M&V activity [35]. Monitoring data can be used to support operational decision-making, with the Soft Landings framework positioning recorded data as a core "objective evaluation method" in the process of analysing performance [7].
Of course, one of the most widely adopted data collection technologies in AEC is smart metering [4]. This automates the process of meter reading, collecting data about energy imported from and exported to the National Grid, which also means spatiotemporal granularity is typically limited to the whole-building level and half-hourly intervals, rather than individual end uses [36].

Data governance
Data quality management is a process rather than something achieved by a single action, a principle instated by BS ISO 8000-1:2022, which positions data quality within the concept of "data governance" [18]. Existing literature converges around defining data governance as the process of decision-making about data and its quality, with effective oversight and processes required. The Data Governance Institute states that data governance is an "exercise of decision-making and authority" [37], while BS ISO 8000-1:2022 refers to the "development and enforcement of policies related to the management of data" [18]. De Feo and Juran [16] highlight the need for effective "organisational structures" around data and the development of "data quality systems".
Processes for governing data quality include setting "direction", monitoring data quality, "quality focussed initiatives" and stakeholder management [16]. The latter can be challenging with the growth of big data, which is often integrated from "disparate sources" and where governance requirements are typically "underestimated" [38]. This commentary highlights the increased distance between those who create data and those with decision rights who govern data, meaning that those involved in a data acquisition process may not be literate in achieving data quality or managing data effectively. This is something also relevant to monitoring systems deployed remotely. The governance challenges that arise from big data are especially pertinent for the construction industry, being "well-known" for its "fragmented data management practices" [11].
Data quality improvement is central to effective governance, seeking to "make data better suited" to serve its purpose [23]. Interventions to improve data quality can take place either in the design of data acquisition or in the treatment of raw data, known as "data cleaning" [39]. While numerous data cleaning techniques exist, these will largely depend on defined "goals" [39]. Managing data quality is bound by "time, effort and resources", positioning resource allocation for data quality improvement itself as a key governance decision [29]. This has two implications: first, that resource and time availability trade off against any potential improvement in data quality and should be proportional to the cost of poor data quality; and second, that the value of improved data quality must be understood in order for its improvement to be appropriately resourced, particularly for industry.

Review methodology
Having established the theoretical background to defining, measuring and managing data quality, the review methodology was developed. With core data quality concepts established, together with practical impacts for AEC, the approach to identifying articles, summarising evidence in articles and analysis was framed. Research questions were aligned to each data quality attribute. The PRISMA methodology was applied to review articles [40] and the process is summarised in Fig. 1.

Literature search
A comprehensive literature search took place of the institutional library database (ExLibris PRIMO), Scopus and ScienceDirect. Search terms were centred around finding articles about building energy performance studies that involved an element of monitoring, with articles identified using relevant keywords. As each database search took place, these terms were refined to identify additional relevant keywords. Only English language articles were retrieved. Broader search terms such as "in-situ monitoring" were combined with other terms to narrow searches and ensure their relevance to building energy performance. As some papers do not include indexed keywords, parallel searches took place of titles and abstracts to ensure completeness of the search process. Article metadata was managed using a bibliometric file format (.RIS). Cleaning of the bibliometric data took place, assessing for plausibility and completeness. Missing information was treated by entering missing article years and filling empty fields, using either the reference management software or manually. Some entries were removed because they did not list authors and the original article could not be identified. This created a body of bibliometric information to screen and check for eligibility, resulting in 1,091 publications.

Fig. 1. PRISMA statement.

J. Morewood, Energy & Buildings 279 (2023) 112701

Eligibility criteria
The rationale for screening was to identify papers with a substantive element of building energy performance monitoring (Table 1). Only papers published in the last five years were included to ensure currency of the review, with a focus on understanding today's practice around data quality, excluding 574 papers.
The screening process saw the article title reviewed, removing 213 articles. A further 24 articles were removed during abstract screening, leading 281 to proceed to a full text availability check. The main reason for exclusion at the screening stage was that papers did not have a substantive component of building energy performance monitoring.
Full texts were available if the article was found in the institutional library or available via open access, with 263 articles proceeding to a full text eligibility check. The eligibility check resulted in 159 publications, or 11 % of original articles, meeting the eligibility criteria. The need to exclude so many articles was foreseen due to the breadth of the initial search terms. A final manual search took place for eligible articles using snowballing methods, which resulted in 3 additional records.

Data quality attributes and article classification
Descriptive information was collected about each article. This summarised the evidence and extracted relevant information about the building energy monitoring involved and findings relevant to data quality. A particular focus was identifying requirements from data set by researchers and how data quality was achieved. This included reporting about buildings for comparability, the type of monitoring and equipment used, the parameters monitored, measurement certainty attributes (trueness, precision and resolution) and other references to data quality concepts within the article. Where data quality was reported, further reading took place to understand how data quality was applied: objectives set in the research methods, reporting, any quantification of data quality, issues identified and how they were managed, including data governance and cleaning.
A classification framework was developed, based on 11 data quality concepts, including each attribute and data cleaning, with articles included where there was a valid mention of data quality or a data quality attribute in the full text. Synonyms in a data quality context (for example timeliness and currency) and antonyms (for example missingness against completeness) were also eligible to be included in a category. A list of terms, synonyms and antonyms was developed relevant to data quality as a domain, decomposed into both attributes of data quality and processes to achieve it (Table 2).
Because of the lack of standardised reporting around data quality in monitoring studies, a mixed approach was taken to classification: combining a keyword search with manual review of all articles. Early summary of evidence found that attributes were often homonyms with different meanings depending on sentence context (for example noisy data or noisy environment), so a simple text analysis would not suffice. Where an attribute was described in the article, a manual check ensured that the term was used in the appropriate data quality context. For example, 24 articles included terms related to timeliness but were excluded as the term was not used in a data quality context. Without a standardised set of data quality attributes, vocabulary may not be consistent, so manual reviews of the methodology and results sections of each article extracted further information relevant to data quality.
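A keyword pass of this kind can be sketched as follows; the vocabulary shown is an illustrative subset invented for this example (the review's full term list is in Table 2), and matches are only candidates that a manual review must confirm are used in a data quality context.

```python
import re

# Hypothetical subset of the attribute vocabulary (terms, synonyms, antonyms).
VOCABULARY = {
    "completeness": ["completeness", "complete data", "missingness", "missing data"],
    "timeliness": ["timeliness", "currency", "latency"],
    "consistency": ["consistency", "outlier", "noisy data"],
}

def candidate_attributes(text):
    """Keyword pass: return attributes whose terms appear in the full text.

    Matches are candidates only; a manual check must confirm each term is
    used in a data quality context (e.g. 'noisy data' vs 'noisy environment').
    """
    text = text.lower()
    found = set()
    for attribute, terms in VOCABULARY.items():
        if any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms):
            found.add(attribute)
    return found

hits = candidate_attributes(
    "Sensor faults caused missing data; readings arrived with high latency."
)
# → {'completeness', 'timeliness'}
```

The two-stage design reflects the homonym problem noted above: automated matching gives recall across 162 full texts, while the manual context check provides precision.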
The results of classification are given in Appendix A. Once classified, articles proceeded to quantitative analysis and qualitative synthesis in the subsequent sections, identifying areas of convergence and divergence in current research.

Quantitative analysis
Analysing bibliometric data, 162 articles were found, including 54 conference papers and 103 journal articles from 46 unique journals. Articles were most commonly published in Energy and Buildings (n = 22). The number of articles per year was generally steady, with articles reporting explicitly on data quality trending upwards over time (Table 3).
Only nine articles explicitly mentioned data quality (6 %); however, four made no reference to data quality at all. The vast majority of articles reported on at least one attribute of data quality without the term data quality being mentioned explicitly. Where articles did mention data quality concepts, spatial granularity was by far the most popular, featuring in 143 (88 %) of articles (Table 2).
On average, 3.23 data quality concepts were mentioned in each article out of a possible 11 (Fig. 2). The most concepts mentioned in a single article was eight, occurring in articles by Quintal, et al. [41] and by Bourdeau, et al. [42]. This implies a lack of comprehensive reporting about data quality, as data quality concepts are not described completely in any reviewed article.
A correlation analysis was performed (Fig. 3). This found that relationships between data quality attributes were generally weak, with a coefficient range of between −0.24 and 0.45. This adds further quantitative evidence to the lack of comprehensive data quality reporting, indicating that data quality concepts are described in isolation rather than within a comprehensive data governance strategy.
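On binary mention vectors (1 if an article mentions the attribute, 0 if not), Pearson's coefficient reduces to the phi coefficient, so such an analysis can be computed directly; the two vectors below are illustrative, not the review's data.

```python
import math

def pearson(x, y):
    """Pearson correlation; on binary mention vectors this is the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative data for eight articles: 1 = attribute mentioned, 0 = not.
completeness = [1, 1, 0, 1, 0, 0, 1, 0]
calibration  = [1, 0, 0, 1, 0, 0, 0, 0]

r = pearson(completeness, calibration)
```

A coefficient near zero between two attributes means articles mentioning one are no more likely to mention the other, which is what the weak coefficients in Fig. 3 indicate about isolated reporting.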

Data quality in articles
This section describes and qualitatively synthesises data quality in articles. It is organised relationally according to the concepts in Fig. 4, beginning with explicit references to data quality then moving into its attributes and cleaning.
Rusek, et al. [48] explained the need to achieve data quality during data preparation and pre-processing, highlighting that "assuring good quality of data is essential for data analytics to obtain reliable results and in consequence, draw accurate conclusions". Gupta, et al. [43] centred data quality around the development of a monitoring database with the aim of ensuring "high fidelity" of data, subjecting gathered performance data to a series of "quality checks". Nikdel, et al. [46] offered similar attention to data quality checks, alongside "formatting" of data in Microsoft Excel. Bourdeau, et al. [42] positioned data quality checks as central to ensuring their monitoring sensor network is "functional", with the aims of ensuring "no missing information or other issues and that the proper data format is displayed" for future applications.
Lewe, et al. [44] investigated the data quality of meter data, aiming to treat potential "noise", "communication failure", "deactivation", "sensing wrong properties" and "lack of calibration" using data filtering techniques. Having achieved "good quality meter data", they highlighted that buildings face unique contexts that may make data "prone to misinterpretation", giving the examples of "different weather conditions", "inside activities" and "plant operations" as affecting the comparability of data [44]. Wang and Zheng [49] meanwhile established data quality requirements for different real-time monitoring data: needing to be "synchronous", "collaborative", "continuous" and free of any "interruption" (which could mean "quantitative relationships are not available"). Two studies instead focused on the relationship of data quality with data cleaning. Ma, et al. [45] explored ways to identify and treat "abnormal" electricity consumption data while Zhao, et al. [50] investigated the potential for an online platform that disaggregates electricity consumption data to separate "lighting sockets" and produce high quality data in the absence of direct metering. Of all articles, only one explicitly defined a data quality measure. Rolando, et al. [47] introduced the "missed data points ratio" to report on the completeness of data and the "ratio of data to errors" to report on the proportion of data available after data filtering. This article also innovated in providing a heatmap of available data, identifying concentrations of missing data across multiple apartments [47].
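Measures like these are straightforward to operationalise; the definitions below are one plausible interpretation of the two ratios (Rolando, et al. do not publish formulas in the text quoted here), and the counts are illustrative.

```python
def missed_data_points_ratio(observed, expected):
    """Share of expected datapoints that were never received (completeness)."""
    return (expected - observed) / expected

def ratio_of_data_to_errors(total, errors):
    """Proportion of collected datapoints that survive error filtering."""
    return (total - errors) / total

# Illustrative values: a year of hourly readings with gaps and filtered errors.
expected = 8760   # hourly datapoints in a non-leap year
observed = 8322   # datapoints actually received
errors = 150      # datapoints removed by filtering (e.g. out-of-range values)

mdpr = missed_data_points_ratio(observed, expected)   # 0.05
usable = ratio_of_data_to_errors(observed, errors)
```

Reporting both ratios separates acquisition problems (data never received) from validity problems (data received but unusable), which is what makes the pair more informative than a single completeness figure.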

Measurement certainty
A summary of measurement certainty is given in Table 4. In total, 63 articles mentioned terms related to measurement certainty, of which overall accuracy was reported in 43, precision in four, calibration in 18, measurement range in 23 and measurement resolution in 11.

Accuracy
Accuracy was described alongside the type of sensors used. For environmental measurands, there was significant variation in the accuracy of sensors used. For indoor air temperature, accuracy ranged from ± 0.1 °C to as high as ± 2 °C, with a median accuracy range of ± 0.35 °C; five studies failed to meet the BS 40101:2022 minimum accuracy standard of ± 0.5 °C [32], and 23 failed to meet the CIBSE TM68 standard of ± 0.2 °C [27]. The same was true for indoor relative humidity, ranging from ± 0.1 %RH to ± 5 %RH, with a median accuracy range of ± 3 %RH. No studies exceeded the TM68 minimum accuracy threshold of ± 5 %RH [27], though 13 exceeded the BS 40101:2022 standard of ± 3 %RH [32]. CO2 concentration measurement accuracy was expressed both proportionally and absolutely, with the former ranging from ± 2 % to ± 5 % and the latter from ± 30 ppm to ± 200 ppm.
When reporting accuracy for energy consumption parameters, this was instead reported either numerically or in terms of accuracy classes rather than direct tolerances, following measurand-specific standards such as IEC 60751 [51]. Of these studies, some measurands were in class 1 (more accurate), three in class 1.5 and two in class 2 (less accurate).

Precision
Notably, five articles reported sensor precision rather than an overall measure of accuracy [52-56]. The two terms were largely used interchangeably despite being distinct concepts. Mataloto, et al. [56], who used a "DHT11" temperature and humidity sensor with a low precision (± 2 °C), noted that this data had to be "excluded" due to "unreliable outputs".

Trueness
Calibration was key to achieving true values, with Elnaklah, et al. [57] stressing the need for sensors to undergo "rigorous testing and calibration, making them suitable for obtaining time series with good accuracy".
Among studies, 16 (35 %) of indoor air temperature sensors were calibrated, two (29 %) of surface temperature studies, 14 (33 %) of indoor relative humidity studies, five (33 %) of CO2 concentration, three (75 %) of illuminance, two (28 %) of electrical current, two (50 %) of gas consumption, three (30 %) of electricity consumption, one (20 %) of heat delivery and one (33 %) of heat flux. This indicates a general underreporting of whether sensors had been calibrated or not, meaning little comment can be made on the overall trueness of data, despite technical standards such as BS 40101:2022 [32] and CIBSE TM68 [27] instating the importance of calibration within the manufacturer interval [27,32].
While numerous articles reported that sensors had been calibrated, only one article reported that sensors had been "incorrectly calibrated", causing "error" [58], with data subsequently cleaned by data removal based on expected ranges. Calibration need not be limited to the same type of test, with Alonso, et al. [52] comparing decay test results with their air tightness test to find that the latter "did not provide an accurate measure of real air change rates". Nor is calibration always straightforward: in a study that involved calibrating four current sensors, Khwanrit, et al. [59] found that different degrees of error were produced with each sensor.

Resolution
Measurement resolution was reported in 17 articles, leaving it unreported in the majority of reviewed articles (Table 4). Among indoor environmental measurands, there remained variation in the degree of resolution achieved: indoor air temperature sensor resolution ranged from 0.01 °C to 0.5 °C, surface temperature from 0.02 °C to 0.1 °C, indoor relative humidity from 0.01 %RH to 1 %RH, CO2 concentration was 1 ppm and illuminance ranged from 0.1 to 1 lx. There was significant underreporting in other measurands.

Completeness
Complete data is important for usability and to ensure the temporal quality of data. Missing data risks omitting events in building performance that increase energy consumption and preventing early diagnosis of building performance issues. A total of eight papers reported numerically the data completeness of their study (Table 5), with seven papers providing data quality measures based on percentages. A further 14 papers reported issues with missingness [41,42,44,45,48,50,112-119]. Data could be significantly affected by missingness, with the worst affected paper seeing an entire year of data missing [114].
Exploring the root cause of missingness, this was most commonly explained by practical issues with data collection and transmission [41,42,50,102,113,117,118,120]. Zhao, et al. [50] highlight that monitoring studies are prone to missingness because "when problems occur in the process of data collection and transmission, they cannot be dealt with in a short period of time". Attributing missingness to specific practical issues, Frei, et al. [120] reported three periods of missing data, respectively from an "unplugged" router, gateway and antenna preventing transmission of data. Missingness was also attributed to "internet outages" [113] or "blackouts" [41,118], sensor malfunctions [42,102] and "system failure" [117]. Data loss can occur even after successful collection, with Quintal, et al. [41] describing the "corruption" of an SD card.
In terms of mitigating the impact of missingness, several strategies were implemented in papers. During the design stage of monitoring, systems were developed to "identify" missingness [44], while Quintal, et al. [41] developed a system to "notify the research team" after 60 unsuccessful attempts to collect energy data. Dabaieh, et al. [121] collaborated with "local residents" to alert the research team to malfunctions and completed a manual check of sensors after 6 months. This aligns with data governance concepts and the importance of managing stakeholders, even remotely [16].
To clean missing data, three articles used linear interpolation to provide missing values [48,117,122]. As linear interpolation would be inappropriate for several weeks of missingness, Alrawi, et al. [113] instead replaced values with data from the subsequent year. Missingness is not always framed negatively within studies, with Sözer and Aldin [123] identifying that the use of an incomplete heating season "saves 77 % of the measurement time" while still being adequate for the purpose of their study: hence short periods of data collection about building energy performance are useful to calibrate building energy models.
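The distinction between short, interpolatable gaps and long gaps that need another treatment can be made explicit in the cleaning step; this is a minimal sketch (not any reviewed article's method), with a hypothetical two-sample gap limit and illustrative temperature readings.

```python
def interpolate_gaps(values, max_gap):
    """Linearly interpolate runs of None, but only when the run length is at
    most max_gap; longer gaps (e.g. weeks of missingness) are left untouched."""
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] is None:
            start = i
            while i < len(out) and out[i] is None:
                i += 1
            gap = i - start
            # Interpolate only interior gaps bounded by known values.
            if 0 < start and i < len(out) and gap <= max_gap:
                lo, hi = out[start - 1], out[i]
                step = (hi - lo) / (gap + 1)
                for k in range(gap):
                    out[start + k] = lo + step * (k + 1)
        else:
            i += 1
    return out

readings = [20.0, None, None, 23.0, None, None, None, None, 21.0]
cleaned = interpolate_gaps(readings, max_gap=2)
# The two-point gap is filled (21.0, 22.0); the four-point gap is left as None.
```

Leaving long gaps unfilled forces a deliberate decision about them, such as the substitution of data from another period used by Alrawi, et al. [113].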

Validity and data structure
Conformity to a data schema is important to ensure accessibility of data, with ease of data integration and interoperability. Seven articles reported the use of a structured database to store monitoring data, managed using the Structured Query Language (SQL) [69,120,126-131] or otherwise [53]. Because SQL databases require a pre-defined and relational data model, their use ensures data schema conformity. For data exchange, six articles used the JavaScript Object Notation (JSON) format to transmit data [41,53,69,109,127,132] and two used a RESTful API [53,98]. On the use of tabular data storage, four articles saw the Comma Separated Values (CSV) format used [41,42,96,113] and two articles reported the use of the proprietary Excel format [99,112]. Tabular data storage may be considered structured, but conformance to a data schema is not mandated and the data schema is able to change with each file.
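The schema enforcement that distinguishes a relational store from a loose CSV file can be demonstrated in a few lines using SQLite; the table and column names here are illustrative, not taken from any reviewed article.

```python
import sqlite3

# A minimal relational schema for monitoring readings (illustrative names).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        sensor_id   TEXT NOT NULL,
        recorded_at TEXT NOT NULL,
        value       REAL NOT NULL,
        PRIMARY KEY (sensor_id, recorded_at)
    )
""")

conn.execute("INSERT INTO readings VALUES ('temp_01', '2023-01-01T00:00', 19.6)")

# Unlike a CSV file, the database rejects rows that break the schema.
try:
    conn.execute("INSERT INTO readings VALUES ('temp_01', '2023-01-01T00:30', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The composite primary key also enforces uniqueness of each sensor-timestamp pair, addressing the duplication attribute discussed earlier in the same mechanism.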
Alternatively, some articles used semi-structured schemas. In the development of a digital curation model for building monitoring, Patlakas, et al. [133] used an Extensible Markup Language (XML) based data storage, attached to a hierarchical data schema that is interoperable with Industry Foundation Classes (IFC). Quintal, et al. [41] combined both structured and semi-structured data schemas, using an XML-based format to parse information about equipment from websites as well as JSON.
While data schemas and formats were mentioned in several articles, conformance to those schemas was not. The use of structured and semi-structured data was not widespread. This implies that the curation of monitoring data is ineffective and that much of the data collected by researchers does not necessarily conform to standardised schemas.

Comparability and consistency
Buildings and their environments are diverse, and this can impact the ability to compare the energy performance of different buildings. Comparability is particularly important for articles which report on the monitoring of multiple sites, buildings or rooms. Lack of comparability can be a major drawback of in-situ monitoring, in contrast to laboratory environments where monitoring takes place under controlled conditions.
A major barrier to achieving comparable performance is differing occupancy [43,44,52,61,68,84,86,100,116,[134][135][136][137]. Occupants impact building energy performance in different ways, something reflected in the reporting of occupant behaviour in articles. In a study of two classrooms, Bernardo, et al. [134] reported the ''occupancy density" and defined the occupancy for each monitored classroom. Becerra-Santacruz, et al. [86] describe extensively the demographics of each occupant for a selection of studied houses, including the number, age, gender and activity levels of each occupant. Other occupant behaviour, such as opened windows and doors, also impacts comparability, affecting both indoor air temperature and space heating [58,61,138,139]. Augustins, et al. [139] developed an automatic detection system for abnormal occupant behaviour, sending an email when energy consumption exceeded its ''theoretical value". Building use behaviours can be attributed to other passive behaviours, with Perisoglou, et al. [61] identifying that doors were opened more often ''due to smoking". There is also the risk of occupants tampering with equipment: Elnaklah, et al. [57] ''asked" building occupants ''not to cover, touch or unplug sensors from power" and Martinez-Molina, et al. [55] hid sensors to prevent ''tempering" or ''theft". Pereira, et al. [81] omitted CO2 measurements from bathrooms due to the impact of high relative humidity and water vapour pressure, caused by occupancy patterns, on sensors' accuracy. As with achieving data completeness, this highlights the need for resilient research methods to mitigate risks to data quality. Occupants can also have different preferences on heating system operation and setpoints, although few studies normalised on the basis of indoor air temperature [49,124,140]. Only one study normalised energy consumption by the number of occupants, defining ''energy intensity" as electricity consumption per occupant [136].
As an alternative approach to weather normalisation, Simanic, et al. [152] reported and normalised space heating using an ''energy index", a ''combination of heating degree hours and effects of the sun and wind". In a monitoring study of a selection of ''test days", Han and Zhang [106] on the other hand simply described weather conditions qualitatively and provided the outdoor air temperature and relative humidity range. Solar radiation can also affect the performance of renewable systems, with radiation measured for each monitoring year by Merabtine, et al. [87] and data normalised for annual irradiance by Nikdel, et al. [46]. A number of studies selected monitoring periods based on representative seasonal conditions [79,83,156]. Conversely, Becerra-Santacruz, et al. [86] deliberately conducted monitoring during ''the months that presented the most extreme conditions for that year" as part of a study of thermal comfort, while Zhang, et al. [89] chose mid-winter conditions to study ''climate adaption". Building-climate interaction should be considered when placing sensors: for example, Han and Zhang [106] selected ''inner cubicles" to deploy environmental sensor modules that were ''less affected by outdoor radiation and temperature change". Overall however, the variety of ways data was made comparable across different weather conditions serves only to highlight inconsistency and a lack of standardised approaches.
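Weather normalisation of the kind referenced above can be sketched with heating degree days. This is a minimal illustration under assumptions: the base temperature of 15.5 °C and the function names are illustrative, not the method of any cited study.

```python
BASE_TEMP = 15.5  # assumed heating base temperature, deg C

def heating_degree_days(daily_mean_temps):
    """Sum of positive differences between the base temperature and
    each daily mean outdoor temperature."""
    return sum(max(0.0, BASE_TEMP - t) for t in daily_mean_temps)

def normalise_heat_use(heat_kwh, observed_temps, reference_hdd):
    """Scale measured heat use from the observed weather to a
    reference period with a known degree day total, making
    consumption comparable across sites or years."""
    observed_hdd = heating_degree_days(observed_temps)
    return heat_kwh * reference_hdd / observed_hdd
```

A monitored winter twice as cold as the reference period would see its heat use scaled down by half, so that differences in fabric or system performance, rather than weather, dominate the comparison.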
Sensor placement has implications for comparability and representativeness more broadly. 15 articles reported on the placement of temperature and humidity sensors, with 13 articles [54,55,57,63,76,80,83,94,95,101,104,114,157] reporting sensors placed 1.1 m above floor level, which corresponds to ISO 7726:2001 [158] and represents the head height of a seated person. Belazi, et al. [84] alternatively placed temperature and relative humidity sensors at 1.5 m and Wang and Zheng [49] at 0.75 m. Others placed sensors at vertical positions additional to 1.1 m [63,94,101,157] or hung 60 cm from the ceiling [121]. Standardised sensor placement is sometimes constrained by the building and its occupants: Bernardo, et al. [134] highlighted that ISO 7726:2001 compliant sensor placement was not possible as it would interrupt building occupants. Jin, et al. [75] and Belazi, et al. [84] reported the placement of CO2 sensors at 1.5 m above floor level. McLeod and Swainson [63] used an infrared survey to ''inform sensor placement" and prevent ''elevated surface temperatures" from affecting measurements and causing unrepresentative data.
Building geometry is another major factor affecting comparability, and floor area was reported commonly when describing case studies. Mitchell and Natarajan [124] drew attention to comparability issues as the floor area measure itself (gross, internal or treated floor area) was ''uncertain". Oliveira, et al. [80] extensively describe the typology, treated floor area, internal volume, glazing and external opaque area for all monitored flats. The position of a room within a building, floor or façade orientation can impact performance, with studies of multiple rooms or buildings often choosing ''representative" rooms [90] or locating equipment based on the ''characteristics" of each floor [118]. In studies of multiple buildings, floor area normalisation can be used to create comparable data [124,136,149].
A recently completed building or retrofit may not represent long-term performance. The first heating season possibly sees ''higher demand" in Passivhaus, for example, due to moisture and the adjustment of building services [124]. In a monitoring study of a university building, Korsavi, et al. [154] drew attention to an incorrectly programmed BMS that affected the representativeness of data during early monitoring stages.
Reasons for lack of comparability can be multi-factorial: Li, et al. [116] described that ''potential reasons" for high consumption at a monitored house were ''house type", ''floor area" and ''occupant behaviour". Issues that go on to affect comparability can be dealt with at the design stage of monitoring; for example, Elnaklah, et al. [57] introduced ''coverage criteria" for a study of indoor environmental quality, locating sensors on the basis of ''high and low density" areas, ''areas experiencing any occupant complaints or discomfort" and ''different floor areas of buildings". The multifactorial nature of comparability is also reflected in recent developments in normalisation: traditional ''static normalisation" responds to deviations in individual variables, whereas emerging ''dynamic normalisation" combines monitoring data with dynamic building energy simulation to create a calibrated model and comparable data [145]. Alternatively, regression modelling can be applied to study an explanatory variable for performance and therefore its impact on comparability: Belazi, et al. [84] applied both univariate and multivariate logistic regression to thermostat changes; Lewe, et al. [44] applied a linear model to cooling degree days (CDDs); Gupta, et al. [43] did so for ''occupancy pattern", ''number of occupants" and ''occupancy type"; and Simanic, et al. [152] investigated occupancy and indoor air temperature. Ujeed, et al. [119] instead used correlation analysis to study the impact of outdoor temperature and relative humidity on air handling unit performance, though correlation coefficients are limited in not providing statistical significance.
Ultimately, however, there is a need to distinguish between achieving comparable data and what is a true representation of building energy performance. Articles widely explained building energy performance, interpreting results within analysis and discussion sections. Indeed, lack of comparability can even be scientifically interesting: Rouleau and Gosselin [68] compared the impact of COVID-19 lockdowns on energy performance, appreciating changes in occupancy between respective years.

Noisiness
Inaccurate or incomparable data may present as noisy, outlying or unstable data, affecting the usability of individual datapoints. Outliers were removed based on an acceptable range defined differently in several studies: the value relative to the median and interquartile range [125,159] or the mean and standard deviation [123], an equipment-specific range [50], reasonable ranges for each measurand [47] or exceedance of a ''winsorised mean" [58]. Maki, et al. [117] identified a range of anomalous values based on peaks, while Li, et al. [76] filtered indoor air temperature data based on daily range outlier indices and a maximum 4 °C difference from operative temperature. Some articles reported and removed zero value outliers [42,45,50]. Outlier removal can also take place contextually: assessing whether an outlying datapoint was continuous with neighbouring values and removing datapoints accordingly [45], using plausible ranges for CO2 concentration [47] or marking outliers as part of data visualisation to explain them within an overall pattern [76].
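The interquartile range approach to defining an acceptable range can be sketched as follows. This is an illustrative simplification: the fence multiplier `k = 1.5` is the conventional choice, and the quartile computation uses simple index positions rather than interpolation.

```python
def iqr_filter(values, k=1.5):
    """Flag datapoints outside interquartile-range fences.

    Returns (kept, outliers); k widens or narrows the fences.
    Quartiles are taken at simple sorted-index positions,
    adequate for a sketch.
    """
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lo <= v <= hi]
    outliers = [v for v in values if not (lo <= v <= hi)]
    return kept, outliers
```

Returning the outliers alongside the kept data, rather than silently dropping them, supports the contextual explanation of outliers discussed below.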
Four articles reported that outliers had been removed without providing a reason [50,94,124,125]. Importantly, noisy data is not necessarily the result of a data quality issue but may simply reflect actual energy behaviour at a point in time [45]. Two articles explained outliers and retained them in the dataset: Colclough, et al. [77] determined that high indoor air temperatures owed to seasonal extremes and represented a comfort issue, while Han and Zhang [106] explained their outliers by changes in outdoor air temperature. Because an outlier may be a true value that diagnoses a building performance issue (such as extreme temperatures and thermal comfort), close attention is needed to distinguish problematic from non-problematic outliers, and complete explanation is essential.
Outliers can also be treated using data smoothing techniques, particularly to understand long-term changes in variables, although this was not widely reported. Jin, et al. [75] applied locally weighted scatterplot smoothing (LOWESS) to do this, while three other articles applied moving averages [132,160,161].
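A moving average smoother of the kind reported can be sketched in a few lines. The centred window with shortened endpoints used here is one of several reasonable conventions, chosen for illustration only.

```python
def moving_average(values, window=3):
    """Centred moving average; endpoints use a shortened window
    so the output has the same length as the input."""
    out = []
    half = window // 2
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

Smoothing suppresses datapoint-level noise to reveal longer-term trends, at the cost of attenuating genuine short-lived events, which is one reason studies retained and explained, rather than smoothed away, some outliers.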

Zero values
Outlying data can also be caused by zero values, which were reported in four articles [42,45,50,119], where values were incorrectly recorded as zero and did not represent actual energy performance. Lewe, et al. [44] reported ''near zero" values being recorded due to ''faulty flow rate sensors", with erroneous datapoints subsequently removed.

Timeliness
Timeliness of data is important to ensure fast access to, and the relevance of, data. 32 articles mentioned timeliness, while 18 studies described timeliness numerically, taken to mean the time between measurement taking place and data availability to researchers on a server, dashboard or data storage service (Table 6).

J. Morewood, Energy & Buildings 279 (2023) 112701

Data can also be published in real-time using messaging protocols, with four studies using the Message Queuing Telemetry Transport (MQTT) protocol to send data [69,[107][108][109]. Timeliness is particularly important in articles which reported on the use of in-home displays (IHDs) [125] or data dashboards [53,75,130,132,165,167,168]. These studies relied on timely data to update data visualisations and ensure relevance, minimising delay between measurement and display. Among these studies, the largest delay between data collection and publishing was one week [168].
It is important to detect delays quickly, and poor timeliness can be detected automatically through the monitoring system. Ioannidis, et al. [169] provided a ''monitoring component to inform the facility manager via e-mail about abnormal delays in the reception of events" and Pereira and Nunes [130] developed a system to notify an administrator ''if no data is uploaded after a pre-defined time period" or ''if errors occur while communicating with the smartmeter". Mitigation can also be carried out: for example, Dzulkifly, et al. [163] completed ''proximity and delay tests" to ensure adequate ''signal coverage" so that individual sensors could relay information to their designated gateway device punctually, noting areas where placed sensors experienced ''high delay".
Unlike traditional monitoring, where data is collected only at the end of the study, remote monitoring is not bound by delayed access to data; traditional approaches were reported in only two studies [60,121] but are likely to be far more pervasive. The penetration of AMR, building management systems (BMS) and remote monitoring technologies within AEC will continue to improve timeliness, as well as supporting visibility of building performance during monitoring studies through IHDs and data dashboards.

Temporal granularity
The temporal granularity of articles is described in Fig. 5, with a full breakdown in Appendix B. Monitoring at five minute intervals was the most common (n = 32). The specification of temporal granularity in studies is largely dependent on the use of data, with the highest levels reserved for thermal comfort [42,74,95,118,134,170,171], detailed system evaluation [169] or energy loads [109,116,129,130,165].

Fig. 6. Reported spatial granularity in articles (n = 143). Fig. 7. Reported granularity at the system level in articles (n = 74).
Several studies highlighted the challenge of combining different monitoring apparatus operating independently, meaning measurements do not temporally align or have different levels of temporal granularity. The latter is particularly relevant, with 16 articles found to include different levels of temporal granularity. Asynchronous timestamps can impede data analysis and time synchronisation is necessary to correct this, with six studies reporting the retrospective post-processing of data into a consistent temporal granularity [48,85,113,122,123,172]. Time synchronisation can also be used to reduce temporal granularity, recognising that high levels of granularity can impede the end usability of a dataset. Six studies processed data into larger time intervals retrospectively, either into one minute intervals [81], one hour [74,118,160], daily [55] or multiple [41] time intervals. Most of the studies that reported a synchronisation method did so by averaging data in each time interval [41,55,74,81,113,118], with one study instead rounding timestamps to the nearest ten minutes [48]. Synchronisation can also be completed during deployment of apparatus, with Alonso, et al. [52] doing this during an initial ''0 period" of monitoring. In one study, timestamping (adding a datetime to each datapoint) was necessary [76]. In another, twice yearly gas meter readings were disaggregated into monthly data using modelling [124].
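The most commonly reported synchronisation method, averaging data within each time interval, can be sketched as follows. The record format (Unix timestamp, value pairs) and the one hour interval are illustrative assumptions.

```python
def synchronise(readings, interval_s=3600):
    """Average timestamped readings into fixed intervals.

    readings: iterable of (unix_timestamp, value) pairs from devices
    sampling at different, unaligned rates. Returns a dict mapping
    each interval start to the mean of the values falling within it.
    """
    bins = {}
    for ts, value in readings:
        start = ts - (ts % interval_s)  # floor to interval boundary
        bins.setdefault(start, []).append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(bins.items())}
```

After this step, series from separate loggers share a common time axis and can be joined for analysis, at the cost of reduced temporal granularity.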

Spatial granularity
The spatial granularity of articles is described in Fig. 6, with a full breakdown in Appendix C. Data collection at the whole building level was the most common (n = 84). For the purposes of analysis, studies were separated into those which increased granularity at spatial levels and those which investigated individual systems (such as space heating, domestic hot water and renewable energy systems) (Fig. 7). This highlights the diverse nature of building monitoring studies and the widespread use of sub-metering.
Minimum required levels of spatial granularity often emerge from research aims [68]. Elnaklah, et al. [57] highlighted a need for sufficient ''floor area coverage" for environmental data measurements, with one CO2 sensor per 500 m² of floor area. Stazi, et al. [101] took air temperature and relative humidity measurements at ''different heights", something needed to calculate the ''thermal gradient". Spatial granularity can be conceptualised around a five level ''sub-metering implementation options and hierarchy" which ranges from direct metering to estimations, trading off ''accuracy" against ''cheaper" monitoring design [28].
Several studies monitored the performance of one or more individual systems, and therefore granularity was reflected at a system level rather than a spatial level. It was particularly common for studies to measure space heating and/or cooling separately (n = 54). The rationale for improving system granularity was generally to improve understanding of the performance of measured systems and sub-systems: Kitzberger, et al. [146] highlighted the need to ''get a detailed behaviour of energy consumption", while Perisoglou, et al. [61] described that measurement at the system level was necessary to understand both system-level and whole house energy performance, achieving different ''depths of analysis". As detailed performance assessment is not always practicable, some studies combined different levels of system granularity. Mitchell and Natarajan [124] for example classified collected data into three categories according to granularity, depending on whether space heating had been separately measured at the house, while Janssens, et al. [140] applied ''additional metering in a small sample for in-depth studies", including of ventilation systems.
High spatial granularity need not always be achieved with detailed sub-metering; instead, data cleaning can be used to treat and improve spatial granularity. Two studies reported the use of ''spatial tagging" to pair time series monitoring data with spaces [44,76]. Some studies did not sub-meter at all and instead used modelling to disaggregate data into different end uses. Bennett [103] disaggregated space heating from hot water based on the firing behaviour of a gas boiler. For heat use measurements, Janssens, et al. [140] instead disaggregated space heating from hot water by measuring heat use during the summer months and considering it ''to represent the energy need for domestic hot water", while also correcting for ''degree days and indoor air temperature". Such methods can be particularly effective in ''highly insulated, airtight homes" where space heating is small in the summer [124]. This relies on the assumption that DHW loads are ''consistent over the year", which is not necessarily the case, with Mitchell and Natarajan [124] also applying monthly DHW factors when disaggregating in accordance with the Standard Assessment Procedure for dwellings. Rather than heat, Zhao, et al. [50] separated power consumption based on modelling combined with outdoor temperature and historical data. Ultimately however, the reliability of spatial disaggregation does not compete with direct metering [28].

Fig. 8. Data cleaning methods reported in articles.
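The summer-baseline disaggregation described above can be sketched as follows, under the stated (and contestable) assumption that DHW loads are constant over the year. Month indices, units and function names are illustrative; the sketch omits the degree day and indoor temperature corrections the cited studies applied.

```python
def disaggregate_dhw(monthly_heat_kwh, summer_months=(6, 7, 8)):
    """Estimate domestic hot water (DHW) use as the mean heat use of
    the summer months, when space heating is assumed negligible, and
    attribute the remainder of each month to space heating.

    monthly_heat_kwh: dict mapping month number to total heat use.
    Returns (estimated monthly DHW, dict of space heating by month).
    """
    dhw = sum(monthly_heat_kwh[m] for m in summer_months) / len(summer_months)
    space_heating = {m: max(0.0, q - dhw) for m, q in monthly_heat_kwh.items()}
    return dhw, space_heating
```

A winter month consuming 500 kWh against a 100 kWh summer baseline would be attributed 400 kWh of space heating; applying monthly DHW factors, as in the Standard Assessment Procedure, would refine the constant-baseline assumption.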
Spatiotemporal interpolation can also be applied, with Jin, et al. [75] aggregating datapoints into spatiotemporal bins and applying LOWESS regression to extract trends in order to increase the granularity of monitoring.

Uniqueness
Establishing the uniqueness of datapoints is important to avoid duplication; however, this was not reported widely in monitoring studies. Indeed, only one article reported the identification of duplicate (''additional") measurements [42]. Uniqueness can also be applied to monitoring devices themselves, with four articles reporting the use of unique identifiers for monitoring devices [44,109,120,132]. Lewe, et al. [44] used unique ''meter reading IDs" and ''quantity IDs" in the development of a monitoring system, having identified potential ''challenges" in gathering data as meters had been installed at different points in time. Their use of unique, hierarchical IDs for buildings, meters and measured quantities overcame any risk to data lineage and the potential for duplication. Quintal, et al. [41] meanwhile focused on the use of identifiers for each file to ensure unique records, adopting a file naming convention based on a datestamp.
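Keying datapoints on a unique identifier makes duplicate detection straightforward. The sketch below is a hypothetical illustration of the principle; the field names are not taken from any reviewed system.

```python
def deduplicate(readings):
    """Drop repeated datapoints, keyed on (meter_id, timestamp).

    Keeps the first occurrence of each key and counts the rest,
    so the extent of duplication can itself be reported.
    Returns (unique_readings, duplicate_count).
    """
    seen = set()
    unique, duplicates = [], 0
    for reading in readings:
        key = (reading["meter_id"], reading["timestamp"])
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
            unique.append(reading)
    return unique, duplicates
```

Without a stable identifier per meter and measured quantity, such a key cannot be formed reliably, which is why hierarchical IDs protect both uniqueness and data lineage.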

Data provenance
The means to achieve unique data relates closely to data provenance, which provides transparency and traceability around the original source of data for a study. As the screening and eligibility requirements generally resulted in monitoring studies that collected primary data, the majority of articles made limited relevant reference to data provenance.
A number of articles introduced secondary data, providing the original source (or sources). In an energy consumption study of kindergartens, Ding, et al. [150] used secondary data from an energy monitoring platform for local facilities, while Gupta, et al. [43] obtained annual measured performance data for a series of dwellings from an external database and Li, et al. [76] used data from a centralised database where sensors had been installed widely. Quintal, et al. [41] created a monitoring platform for smart meter data, and in their reported case study describe the sources of data used and when these were accessed to establish lineage. Mitchell and Natarajan [124] analysed multiple sources of secondary monitoring data, including individual monitoring campaigns, monitoring databases and data provided directly by homeowners, to investigate building energy performance across multiple sites. Luján, et al. [129] introduced secondary electrical consumption data to characterise and disaggregate domestic energy use. Some articles combined both primary and secondary data. 13 studies aggregated secondary weather data with primary monitoring data, in each case indicating the source [46,60,65,80,87,96,112,114,116,117,143,146,173]. This aggregation was not limited to weather: Rouleau and Gosselin [68] combined their own monitoring data for domestic hot water and space heating with electricity consumption data from an electricity supplier. Ebrahim, et al. [151] also complemented their data acquisition with central electricity supplier data, but did not explain the data source in detail. Overall, where secondary data was included, articles effectively described the source of data to ensure traceability and lineage.

Data cleaning
Data cleaning and the handling of data errors is important to maintain the quality of data. To assure the quality of data, many studies used particular data validation methods, with data quality rules or objectives implicit to this [44,53,69,76,109,116,148].
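Making such data quality rules explicit, rather than implicit, might look like the following sketch. The rules, thresholds and field names are hypothetical examples for an indoor air temperature datapoint, not drawn from any reviewed study.

```python
# Hypothetical, explicit data quality rules for one indoor air
# temperature reading; each rule maps a name to a pass/fail check.
RULES = {
    "not_null": lambda r: r["value"] is not None,
    "in_range": lambda r: r["value"] is not None and -10.0 <= r["value"] <= 50.0,
    "has_timestamp": lambda r: r.get("timestamp") is not None,
}

def validate(reading):
    """Return the names of any failed rules for a single datapoint,
    so failures can be logged and reported rather than silently dropped."""
    return [name for name, rule in RULES.items() if not rule(reading)]
```

Naming each rule turns an ad hoc cleaning step into a reportable data quality check, which is exactly the kind of explicit reporting the review found lacking.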

Outlook and discussion
The review firstly identified chronic underreporting of data quality issues in building monitoring studies. Where data quality was reported, this was typically limited to a handful of data quality concepts. Indeed, on average, articles reported only 3.23 data quality concepts out of a total of 11 classified, and only nine articles used the term data quality explicitly. Attributes that were particularly underreported included uniqueness, noisiness, validity and completeness, all in under one fifth of articles. Although there is a lack of comprehensive reporting on data quality, issues with individual data quality concepts are reported nonetheless. This evidence characterises poor data quality by low spatiotemporal granularity, outlying datapoints, data loss, comparability issues and low measurement certainty.
The way data was collected and treated was highly fragmented, in part reflecting the investigative nature of monitoring in articles and their different objectives. The accuracy of sensors used in studies varied dramatically, with each measurand being collected by a variety of equipment whose specified accuracy differed by as much as a factor of 50 (indoor relative humidity). Temporal granularity ranged from milliseconds to entire months, with data collected at multiple intervals even within the same study. Spatial granularity too was fragmented, moving from low granularity, where the performance of whole buildings and residences was monitored, to high granularity at the system level to understand how systems perform individually and in combination. For most studies, data was provided at a far more granular level than half-hourly, whole building smart meter data. While there is a time and resource cost to increasing spatiotemporal granularity, the promise of big data and new sensing technologies was beginning to be realised in many of the articles reviewed.
Fragmentation particularly affected data cleaning. The methods used were disparate: it was common to use normalisation to improve data comparability, but there was no consistency around which parameters were treated, which included normalisation for degree days, floor area, occupancy and indoor air temperature. Notably, the most common data cleaning method, data filtering, was reported in only 15 articles. This shows that built environment researchers have different objectives when cleaning data: interpolation methods were reported in 11 articles with the cleaning objective of improving completeness, but normalisation was also common, reported in 15 articles, positioning comparability of data as similarly important to researchers. Cleaning itself was reported in only 27 % of articles, suggesting it was not widely utilised, all of which positions data cleaning as another area where consensus needs to be built.
When summarising evidence from articles, focus was placed on identifying the requirements from data in studies; however, articles generally failed to describe how the aims and objectives of the research translated into data quality requirements from monitoring data. Some exemplars were found, however: Alrawi, et al. [113] for example identified the need for high temporal granularity in investigating household electricity loads and renewable energy generation; they highlighted that, because of the variability in generation, monitoring with a lower temporal granularity could cause an ''overestimation of PV self-consumption since fluctuations causing a mismatch between PV generation and load profiles will be ignored" and that ''sub-hourly data was needed to capture the behaviour of high peak powers". Here, the requirements for ''high-resolution" data to evaluate the PV system effectively were drawn from the aims of the research. This need not be for a single data quality attribute: multiple data quality attributes can be positioned within the overall aims and objectives of an article, including prerequisite data quality attributes. In a study focused on space heating demand, Mitchell and Natarajan [124] for example highlighted challenges in accurately disaggregating space heating when dealing with the poor temporal granularity of periodic meter readings. The authors applied different adjustments to estimate space heating depending on categorised data quality. In the absence of higher quality data, the authors needed to rely on additional ''assumptions" to estimate space heating and stated that ''adjustment 2 was better due to the higher temporal resolution" [124]. This can in turn affect data cleaning, as granularity is a prerequisite for degree day normalisation, which relies on ''separated weather and non-weather" data such as space heating [144]. Another example is in corequisite accuracy and granularity requirements. Accurate measurements were described as being particularly important as granularity increased, with measurement accuracy critical when investigating building systems in detail. Giving the example of heat pumps, Beermann and Sauper [148] reported that ''the measurement of small temperature differences of a few degrees is very sensitive to measurement errors of a few tenths of degrees within the class inaccuracy of the meter", meaning that without accurate data the authors could not achieve their research aim of investigating technologies within the building in detail.
The relationship between the aims of monitoring and data quality can of course become blurred, particularly when distinguishing between data quality issues and real energy performance issues which relate to the quality of the building. Data quality issues can mask genuine performance issues or mislead: in pursuit of high quality buildings, researchers and professionals may find robust energy performance analysis difficult without first excluding data quality issues. This is particularly relevant for outlying data, which were often filtered in articles according to an acceptable range. Once an error is detected, classification can take place to explain the error. Ma, et al. [45] for example classified erroneous datapoints into several types: ''real data" that ''reflects the real situation" and ''problem data" which does not. While ''real data" could still be ''abnormal", such an outlier is the result of a building performance issue rather than a data quality issue. This is also relevant as a data governance issue [18]. Although articles had little to report on the data governance processes followed, the fragmented and isolated way in which data quality attributes were dealt with suggests a stark gap from good practice. Data governance is also notable because of the different priorities of industry and academia, as resource allocation is necessary to achieve data quality [29]. Whereas industry will focus on delivering effective buildings and systems, academia is concerned with improving knowledge. If the AEC sector is focused on achieving high quality buildings rather than the quality of the data they consume, then deciding to invest time and resources in achieving data quality will be hard. If, however, industry can legitimately position improved decision-making, and therefore improved quality of buildings, within the value chain of data quality, then such investment will be more easily justified. This has very practical implications for monitoring. Cost, for instance, was identified as a barrier to precision: Wu, et al. [174] reported that the expense of high-precision meters was ''too high" for only small increases in ''overall measurement accuracy" and argued that data quality improvements could be achieved more cost effectively with data processing. High granularity monitoring data at the system level is important but made possible only by investments in additional sensing technologies to evaluate systems' performance in detail. It must be seen that for buildings' quality to improve, so must the quality of the data used to evaluate them, and that these investments produce value: ensuring buildings perform as intended in operation and energy performance gaps are avoided [175].
Trade-offs between cost, time and quality are particularly relevant where energy performance data is cost critical or where data is consumed at scale. Inaccurate smart metering would not be acceptable to a utility company or consumers, and so these remote systems are designed to be robust with strict governance by regulators [36]. With companies relying on big data and remote sensing technologies at scale, data quality issues could affect large numbers of buildings, with poor data quality attracting far greater cost.
There may also be hesitation about making data quality visible, and this may partly explain the underreporting of data quality in articles. Such hesitation can be overcome, however, and many current practices that increased transparency once faced organisational resistance in the same way, such as post-occupancy evaluation [176].
There is finally the question of who is involved in the data governance process, particularly who defines the requirements from data and therefore data quality. This could be approached in two ways: a bottom-up, holistic approach or a top-down approach. In the former, the consumers of energy performance data would drive quality, setting contextual requirements according to their own, application-specific "needs and expectations", reflecting how data quality was defined in the literature [18]. In monitoring studies, researchers themselves would drive these requirements, but the approach is equally relevant to industrial applications of building energy performance data. Alternatively, a top-down approach could be taken, which may better deal with the fragmentation of data quality. Building performance assessments increasingly mandate data acquisition: BREEAM provides credits for submitting data about in-use performance [177], while PAS 2035:2019 mandates different levels of monitoring and evaluation after a retrofit [178]. Such assessments instate certain requirements from data within the process of verifying a building's quality. Standards organisations could also mandate minimum standards for monitoring, as seen with both BS 40101:2022 [32] and CIBSE TM68 [27], which provide specific requirements around accuracy and spatiotemporal granularity. Of course, such requirements are only effective if implemented adequately. While BS 40101:2022 is new, and many studies could not be expected to comply with its accuracy requirements, the requirements for sensor positioning in the two-decades-old ISO 7726:2001 [158] are still not universally adopted in articles. Despite the standard mandating the placement of temperature and relative humidity sensors at 1.1 m above floor level, several articles placed sensors at different heights, affecting how representative that data is. Delivering more effective data governance for building energy monitoring will likely require a combined approach.

Limitations
Reflecting on the review's methods, a large sample size (n = 162) was achieved, with mostly complete access to full texts, and can be considered large enough to represent current monitoring practice. Limitations also arise from differences between the methodology developed for a monitoring study and what went on to be reported in the published article. Data quality may have been considered in far greater detail but left unreported in a publication for reasons of conciseness. This returns to the issue of seeing value in data quality in building energy monitoring: without this, there may be limited enthusiasm, among either authors or reviewers, to include such background work in a paper.
This study focused on measurands related to building energy performance, but its methods are replicable in other domains of building performance. For example, both BS 40101:2022 [32] and CIBSE TM68 [27] highlight comfort parameters, such as acoustic quality, which were not within the scope of this study. This could be important for two reasons. Firstly, to evaluate performance according to more holistic definitions of building performance: as understanding of what is required from a high quality building changes, so too could the requirements from data. Secondly, understanding how other building performance domains are monitored could reveal lessons applicable to building energy performance monitoring.
One of the most challenging aspects of completing this research was that, despite the large sample size, references to data quality were often implicit, and each article required a manual review. Adopting, as a scientific community, a consistent reporting method and vocabulary for data quality would go a long way towards overcoming this and would provide greater visibility of data quality.

Conclusion
A comprehensive review of building energy monitoring studies was completed to address the sparsity of reporting on data quality, looking at recent articles through the lens of data quality for the first time. The review identified 162 existing building monitoring studies, 158 of which reported on data quality in some way. Articles were classified by data quality attributes and concepts, finding that, for most of these, reporting was not comprehensive and did not cover all attributes of data quality. A critical review identified areas of divergence and dominant issues for each data quality attribute. Together with the concluding recommendations for future research that follow, the paper achieved its aim.
The main contribution of the paper is new evidence that approaches to data quality in building energy monitoring studies are fragmented and suffer from a lack of standardisation. Some variation in approach could be expected, particularly for investigative monitoring studies that evaluate the performance of emerging innovations in the AEC sectors, but the requirements from data in each study were neither properly described nor linked to the aims and objectives of the monitoring. The paper also identified more specific issues relevant to individual data quality attributes, such as improving spatial granularity and ensuring the completeness of data.
This paper is ultimately an early step in broader efforts to improve data quality in building energy monitoring. Although it succeeds in providing visibility of data quality in existing articles for the first time, the study faced a number of limitations which frame the direction for future research in the near term. This direction would see building performance researchers and decision-makers surveyed directly to understand their research methods more completely, beyond what was reported in an article, specifically addressing the issue of methodological information being lost between a study and its eventual publication. Such a survey could not only have a methodological focus but also evaluate how well researchers are trained to achieve data quality, as well as their attitudes towards it. As this next research step is taken, there would also be the opportunity to expand the scope of the research and explore building performance domains beyond those related to energy, including comfort parameters such as noise.
This article should not be thought of as a criticism of individual researchers, but rather as a critical assessment of the status quo in a data quality landscape where there is limited consensus. Synthesising the theoretical background to data quality with the review's findings, five wider recommendations can be made:

1. Data quality must be embedded in the lifecycle of monitoring research, including at the design stage, and not simply treated as a retrospective process. This includes identifying and managing the likely risks to data quality. The review drew attention to methodological issues such as data loss: research methods must be resilient if monitoring is to produce quality data that is shielded from interruptions such as transmission issues or power outages. Energy performance researchers must ask at the design stage of research what level of data quality will be necessary, specify and install apparatus appropriately to achieve this, and measure data quality to ensure it is actually achieved.
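The call in recommendation 1 to measure whether the required data quality was actually achieved can be illustrated with one simple metric, a completeness ratio against the intended sampling interval; the interval, timestamps and function name here are illustrative assumptions rather than a prescribed method.

```python
from datetime import datetime, timedelta

def completeness_ratio(timestamps, start, end, interval):
    """Fraction of expected readings actually received in [start, end),
    given the intended sampling interval."""
    expected = int((end - start) / interval)
    received = len({t for t in timestamps if start <= t < end})
    return received / expected

# Example: half-hourly monitoring over two hours with one reading lost.
start = datetime(2022, 1, 1)
stamps = [start + timedelta(minutes=30 * i) for i in (0, 1, 3)]
ratio = completeness_ratio(stamps, start, start + timedelta(hours=2),
                           timedelta(minutes=30))
# ratio == 0.75, i.e. 3 of the 4 expected readings arrived
```

Reporting such a ratio alongside results would make data loss from transmission issues or power outages visible rather than silently absorbed into the analysis.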

2. Informational requirements for building energy monitoring studies should be standardised, and the fragmentation in spatiotemporal granularity highlights the need to agree a minimum dataset. Emerging technical standards, such as BS 40101:2022 [32] and CIBSE TM68 [27], are beginning to realise this, although data quality concepts could be better represented within these standards. As agreeing these informational requirements may prove contentious, stakeholders at all levels in AEC must work together towards a robust definition and report data quality in articles according to an agreed dictionary of data quality attributes, data quality measures and minimum performance thresholds.

3. Effective data governance requires transparency, with widespread and comprehensive reporting of what level of data quality was achieved and how. Researchers and professionals must be transparent about data quality issues that arise. Measurement certainty, for example, was not widely reported, yet the calibration, precision, range and resolution of apparatus are essential traits that affect how useful data is. The impact of measurement uncertainty could be quantified in results, as is increasingly common in building performance simulation around uncertain parameters.

4. Achieving this level of transparency will be contentious unless researchers and professionals are trained in achieving data quality. Without such training, the value of data quality improvement efforts may be lost; both researchers and professionals must be literate about data quality, holding the skills necessary to diagnose and overcome data quality issues. Theory established the importance of authority in data governance [37], so decision-makers should provide accountability and leadership over data quality, much as is done for information security or ethical approval today. These decision-makers must also ensure monitoring is properly resourced to enable improved data quality, for example to procure high-precision monitoring apparatus or install additional sub-metering technologies.

5. For stakeholders in AEC to see the value in achieving data quality, the quality of buildings needs to be positioned within the value chain of data quality. Decisions made on the basis of poor quality energy performance data could lead to sub-optimal outcomes and negatively impact the quality of buildings. Without this link being known, it will be difficult to convince decision-makers to invest time and resources in achieving quality data. This relates closely to the energy performance gap: while that gap typically concerns predicted and measured performance, poor data quality introduces the potential for deviations around measured performance. A "data quality gap" could be promoted and analysed to better account for this, raising awareness of the deviation between measured and true energy performance that arises from issues with the quality of data.

(Fragment of a table of monitored measurands, counts and citing articles:)
… [43,44,47,50,61,63,70,77,100,112,115,116,119,138-140,145,146,148,149,153,159,182]
Domestic hot water: 20 [42,43,46,47,68,80,97,112,116,138-140,145,148,150,152,154,155,159,187]
Window or door opening: 3 [139,149,200]
Equipment and appliance circuits: 22 [28,42,43,46,52,61,70,71,78,87,100,112,115,116,130,138,140,149,153,169,182,187]
Total: 74

Fig. 2. Distribution of the total number of data quality attributes mentioned in each article.

Fig. 3. Correlation matrix of data quality attributes in articles.

Fig. 4. Relational diagram of data quality concepts in articles.

Table 1. Screening and eligibility criteria.

Table 3. Classification summary by year and journal (*2022 incomplete).

Table 4. Reported measurement certainty in articles.

Table 5. Reported completeness in articles.

Table 6. Reported timeliness in articles.