Semi-parametric modelling of sun position dependent solar gain using Bsplines in grey-box models

Modelling the effects of solar irradiation plays an important role in various applications. This paper describes a semi-parametric (combined grey-box and spline-based), data-driven technique that can be used to model systems in which the solar gain depends on the sun position. The solar gain factor is introduced, i.e. the absorbed fraction of measured solar irradiation, and estimated as a continuous non-parametric function of the sun position. The implementation of the spline-based solar gain factor in a grey-box model framework is described. The method is tested in two case studies—in a model of the internal temperature of a dwelling in Aalborg, Denmark, and a model of the return temperature of a solar collector field in Solrød, Denmark. It is shown that the solar gain factor as a function of sun position is able to account for structural variations in solar gain that may occur due to factors such as shading obstacles and window or absorber orientation. In both test cases, the spline-based solar gain function improved the model accuracy significantly, and largely reduced structural errors in prediction residuals. In addition, the shape of the estimated function provided insight into the dynamics of the system and the local solar input characteristics. Accurate representation of such site characteristics was not possible with any data-driven method found in the literature. Besides the grey-box models used in this study, the solar gain factor can be used in a variety of data-driven models, for example in linear regression models.


Introduction
Solar irradiation is a crucial factor in the field of building engineering, renewable energy generation, and many other applications. In many cases, measurements or predictions of the solar irradiation are available, and the effect of solar irradiation needs to be captured in a model. However, the relation between solar gain and measured solar irradiation is typically non-linear and dependent on the position of the sun and site characteristics. This paper, presents and implements a datadriven model for dynamical thermal systems, that include a sun position dependent solar gain. The model is tested in two thermal systems: a building and a solar collector field.
In both systems, the heat dynamics related to solar irradiation are modelled for various reasons. For buildings, models are used to document the energy performance and identify a potential energy performance gap (Roels et al., 2017;Johnston et al., 2016;Haldi and Robinson, 2011;Brohus et al., 2010;Socolow, 1977). For instance, this has been done using data-driven thermal dynamic building models such as grey-box models (Roels et al., 2017;Madsen and Holst, 1995). As solar heat gain can significantly affect estimated thermal building properties, the model used for solar heat gain needs to be accurate. This is especially true for well-insulated buildings with low thermal mass and large window areas.
Similarly, for solar collector fields, forecasting models need to describe solar irradiation effects accurately to improve heat output forecasts and system control. The control must respond to rapid fluctuations in solar irradiation to prevent the collector fluid from boiling, and to ensure a high and stable outlet temperature. As solar irradiation is the dominating effect on the outlet temperature, an accurate model of the absorbed solar energy can improve predictions and hence operation of solar heat systems.
The following two sections present a literature review on typical implementations of data-driven solar heat gain in thermal building and solar heat plant models. aperture, effective window area, or gA-value) can be seen as the equivalent area of a perfectly transparent surface that transmits the same amount of solar energy as the actual windows of the space.
For inverse problem solving, the location of windows and shading obstacles are often unknown. Even if the location of the shading obstacles is known, it can still be cumbersome to estimate their thermal effects, just as it is difficult to estimate the effects of other factors such as reflections or dirt on the windows. Some data-driven thermal dynamic building models disregard the solar gain (Zeifman and Roth, 2016;Coley and Penman, 1992) and consequently treat solar gain as model uncertainty and observation noise. In other models, solar gain is considered a constant fraction of the global solar irradiation (Rabl, 1988;Bauwens and Roels, 2013;Madsen and Holst, 1995;Jimenez and Madsen, 2008), for example where sol is the solar gain, I g is the global radiation, and is the solar gain factor. The solar gain factor can then be estimated from data. When assessing the energy performance of buildings, Mejri et al. (2011) describe the solar gain in buildings as a hidden forcing function. They propose a method in which the solar irradiation striking each building facade, I i , is estimated separately as where i is the facade number and k i is a constant for the i th facade. Letting the constant, k i , account for the properties of the windowpane-e.g. the solar energy transmittance (g-value) and the area-this method could be used to calculate the solar gain coming through the windows in a manner analogous to Eq. (1). However, there is no evidence that this method would improve model accuracy compared to the simpler approach in Eq. (1).
Both of the above approaches assume that the solar gain is independent of the sun position. While these approaches may be sufficient for buildings with limited window area (and therefore limited solar gain), they may fall short for buildings with higher sensitivity to solar irradiation. As suggested by Bacher and Andersen (2014) and Madsen et al. (2016), parametric grey-box models could be used in conjunction with non-parametric models of solar gain, such as splines. Hence, rather than assuming a constant solar gain factor for the entire enclosed space or the facade of interest, the factor can be described as a function of the sun position. The present paper details the specific procedure for achieving this.

Solar gain in solar heat plant models
Solar irradiation is the main factor that determines the heat production of a solar heat plant (SHP), so it is crucial to include the effects of solar radiation in detail when modelling the return temperature (also called outlet temperature) of a solar heat field. The solar incidence angle on the panel largely influences the share of reflected irradiation, and thereby the solar gain. Furthermore, the position of the sun affects the solar gain due to possible shading objects around the collectors. It is common for the collector rows to cast shade on one another at some point in the afternoon as well. Finally, dirt or snow on the panels may affect the solar gain.
The most recent standard for performance testing of solar heat collectors by the International Organization for Standardization (ISO) is described in ISO 9806:2017 (European Committee for Standardization (CEN) Technical Committee, 2017). The method is considered to be the state-of-the-art for solar collector modelling (Kicsiny, 2014). The solar gain is included in the ISO dynamic model as where I t is the total radiation in the collector plane. The parameter (usually named F ( ) en in this application) represents the fraction of  Rasmussen, et al. Solar Energy 195 (2020) 249-258 solar irradiation that is absorbed by the panels at an incidence angle equal to zero, whereas the incidence angle modifier (IAM) K ( ) accounts for incidence angle dependence of the solar gain. The incidence angle is defined as the angle between the sunbeam and the normal of the collector plane. When beam and diffuse irradiation are measured separately and an incidence angle modifier is used, the solar gain is modelled as where the diffuse IAM (K d ) is a constant, as it is independent of the incidence angle. The standard incidence angle modifier (IAM) used in the ISO performance test is Depending on the applied model type, the parameter b 0 that determines the slope of the curve is manually tuned or fitted to data using statistical techniques. For most flat plate collectors, this equation is considered sufficiently accurate for describing incidence angle effects (Perers, 1997). Another IAM given in ISO 9806:2017 is the Ambrosetti function In this case, the dimensionless parameter determines the slope of the function. Almost all models intended for control of SHPs present in the scientific literature combine (partial) differential equations with parameters estimated from data, see for example Pasamontes et al. (2013) and Andrade et al. (2015). The incidence angle dependence is either neglected, or modelled using IAM functions such as in Eq. (5) and (6). The vast majority of existing models for SHP control do not take shadowing effects into account (Bava, 2017;Bava and Furbo, 2018).
Although these IAM functions may suffice for performance testing and simulation, they may not be accurate enough for control purposes. The functions used currently are tuned by a single parameter and therefore only a limited amount of function shapes are possible. Furthermore, the functions do not allow for asymmetric diurnal variations of efficiency, which may exist due to shading of the collectors.

Motivation
This paper aims to formulate and test a novel non-parametric method to estimate the solar heat gain in thermal dynamical systems such as buildings and solar collector fields. To the best of our knowledge, and as seen in the literature review, only simple data-driven methods for modelling solar gain in buildings exist, i.e. models in which the solar gain is independent of the sun position. For solar collectors, parametric incident angle modifiers (IAM) have been widely applied in the literature, in both steady-state and dynamic models. However, these IAMs are not able to account for potential shading patterns on the collectors.
As the relationship between solar gain and measured solar irradiation is sun position dependent and site-specific, it is often not feasible to choose a parametric representation of this relationship. In addition, the solar gain factor, i.e. the share of measured solar irradiation that is absorbed in the system, depends on the sun position non-linearly.
Spline functions address both of these issues. A key benefit of this non-parametric modelling technique is that no model structure needs to be specified. Furthermore, the splines can model non-linear phenomena, but the nonlinear operations in the construction of basis splines are performed before the model fitting process. Hence, only linear operations on the constructed basis splines are carried out in the model fitting procedure, as will be demonstrated later.
This study implements and tests the spline-based solar gain factor in a grey-box model setting. Grey-box models are a class of parametric physics-based statistical models, which in form lie between the deterministic white-box models known from simulation tools such as TRNSYS, and entirely data-driven black-box models such as simple linear regression and neural networks. Grey-box models are often relatively simple and computationally lightweight, they have excellent forecasting properties, and parameters that are directly related to physical properties.
In Section 2 the proposed method is discussed, including the mathematics behind the applied B-spline functions (Section 2.1), the applied grey-box model (Section 2.2), and the actual implementation of splines in state-space models (Section 2.3). In Section 3 the approach is applied to two use cases: modelling the effect of solar irradiation on the internal temperature of a dwelling in Aalborg, Denmark (Section 3.1), and forecasting the heat production from a solar collector field in the municipality of Solrød, Denmark (Section 3.2). Lastly, in Section 4, the approach is discussed and closing conclusions are drawn in Section 5.

Method
This section presents the used modelling procedure, which is also outlined in Fig. 1. First, Section 2.1 introduces spline functions and the construction and usage of a B-spline basis for data fitting, corresponding to the first step in Fig. 1. Next, Section 2.2 presents the resistance-capacity grey-box models and model selection procedure used in this article, which span all remaining steps in Fig. 1. Finally, Section 2.3 discusses in detail how the spline functions are implemented in the grey-box model setting in particular.

Splines
Splines are piece-wise polynomials that are continuously differentiable up to a certain order. The points at which these piece-wise polynomials connect are called knots. Splines are in practice used for interpolation, smoothing, and function approximation (Boor, 2007). This paper concerns the latter challenge, and aims to estimate the solar gain factor from data as a smooth function.
Splines are attractive for function approximation due to several properties. The functions are smooth up to a certain order, have C. Rasmussen, et al. Solar Energy 195 (2020) 249-258 compact support, and their formulas are explicit and rather simple (Prautzsch et al., 2002;Christensen, 2010). The proposed method is based on a specific type of spline function: the B-spline. Given a knot sequence, one can construct a set of B-splines that form a basis for all splines with this same knot sequence (Boor, 2007;Prautzsch et al., 2002). The construction of the cardinal B-spline basis will be discussed in Section 2.1.1, and in Section 2.1.2 a more general construction of B-splines is shown.

Cardinal B-splines
The is a piece-wise polynomial of order m, meaning that the spline consists of m polynomials of degree m 1. A transition point from one polynomial to the next is called a knot. If the knot sequence is uniformly spaced, it is called a cardinal B-spline or a cardinal basis spline.
The cardinal B-spline can be defined and constructed in various ways. A compact definition is the recursive convolution relation, first shown by Schoenberg (1946), where * is the convolution operator m and B 1 is defined by the characteristic function The B-spline has a number of desirable properties, one of which is continuity of the 0 th to m ( 2) th derivatives in the knots, i.e. B m m 2 (Sauer, 2006). For example, the B-spline of order = m 4 is a piece-wise polynomial, each of these polynomials is of third degree, and the function value, slope and curvature are continuous in each knot.
Another property is that infinitely many cardinal B-splines pointwise add up to unity, when evenly distributed along the x-axis with a distance corresponding to the distance between two neighbouring knots (Christensen, 2010).
As demonstrated later, the B-splines are suitable for estimating the solar gain factor as a smooth function of sun position. The estimated function is obtained by summing the scaled B-splines. In many use cases, one is only interested in a series of B-splines that covers a finite range of the x-axis. However, the sum of a finite set of cardinal B-splines will go to zero at the boundary regions as seen in Fig. 2A. Therefore, the cardinal B-splines near the boundaries need to be modified to add up to unity for the entire finite domain. The following section illustrates this modification of the B-splines.

B-splines in a finite domain
For a strictly increasing series of knots , in Eq. (7) by the alternative Cox-de Boor recursion formula: The Cox-de Boor recursion formula can be used to define B-splines with non-uniform knot distances and to shift the B-spline along the xaxis. The above criterion is relaxed from strictly increasing to non-decreasing knots, so that knots may be positioned at equal locations. In order to still use Eq. (9) in this generalised case, division by zero is considered as zero.
If the first m knots of the knot sequence … x x x , , , , and the same holds for the last m knots of the knot sequence, then the B-splines as defined in Eq. (9) will add up to unity on the finite domain. The leftmost B-spline will have m coinciding knots, the second leftmost B-spline will have m 1 coinciding knots, etc. The resulting set of B-splines has the same properties as the cardinal Bsplines, except that continuity is reduced in the coinciding knots to m n , where n is the number of coinciding knots. Figs. 2A-D illustrate how the basis splines change shape as more knots are coinciding and eventually how the resulting spline S x ( ) adds up to unity on the finite domain.

Data fitting with splines
The property that the B-spline series sums up to unity for the entire range of interest is especially useful for data fitting. By scaling and summing the individual B-splines as in Eq. (10), the spline function S x ( ) is obtained: where i is the scaling factor (and model parameter) of the i th B-spline. Typically linear ( = m 2), quadratic ( = m 3), or cubic ( = m 4) splines are used. Fig. 2E illustrates how it is possible to fit a spline to a certain data set by scaling the individual basis splines. By utilising the modified boundary basis spline as described in the previous section, it is possible to estimate a nonzero value at the left and right side of the spline domain. Section 2.3 will further discuss the implementation of the spline function S x ( ) in a grey-box setting.

Grey-box models
Grey-box models are mathematical representations of a physical system that is described by a number of inputs, outputs and state variables. The state variables are described by first order differential or difference equations for continuous and discrete time, respectively. This paper focuses on grey-box models with states in continuous time and observations in discrete time.
In a grey-box model, a noise term is added to the-otherwise deterministic-white-box model. The stochastic nature of these models furthermore allows for the application of statistical methods for structural model identification, such as the likelihood ratio test.
A grey-box model consists of a series of stochastic differential equations (the system equations) and one or more observation equations. The grey-box model in the form used in this article, namely the C. Rasmussen, et al. Solar Energy 195 (2020) 249-258 state-space form, is written where x t is the state vector, y tk is the observed output vector, A is the transition matrix, B is the input matrix, C is the output matrix, u t is the input vector, t is the time, k is the number of the discrete time step, is the measurement noise, and finally t is a vector of Wiener processes. The measurement error vector Ñ (0, ) t obs k is assumed to be i.i.d., and t and tk are assumed independent. Further theory on stochastic differentiation and integration can be found in for example (Kloeden et al., 1994 andZastawniak andBrzezniak, 1998). For the purposes of this paper, a basic knowledge of random variables suffices.

Model selection procedure
In order to choose the optimal number of B-splines to approximate a functional relationship present in the data, several possibilities are implemented and compared. For cubic splines with intercept, the minimum number of splines is 4. The maximum is kept at 12, which typically is sufficient for the specific modelling purposes.
Forward model selection is used, in which a simple model is iteratively extended. The procedure is visualised in Fig. 1. In each iteration, the most significant model extension is selected using the likelihood ratio test (LRT), a statistical test providing a p-value that expresses the significance of the extension (Madsen and Thyregod, 2010). The model selection ends when no extension is significant. A significance level of 0.05 is used throughout the model selection. The likelihood ratio test is also used to determine the optimal number of B-splines in each model extension. Fig. 2E visualised how a set of basis splines is scaled such that the resulting spline fits a certain data set. For the case of modelling the solar gain in a given energy system, the exact same approach can be used.

Implementing splines in state-space models
The position of the sun is uniquely defined by the solar azimuth angle, , and the sun elevation, , which can be determined from latitude, longitude and time. This paper deals with the solar gain factor as a function of the azimuth angle only. This is a valid assumption for sufficiently short time periods during which the sun elevation corresponding to each azimuth angle is approximately constant over the days included in the data.
In the following, the spline-based approach is described for modelling solar gain in a resistance-capacity (RC) grey-box model. Only cubic B-splines with equidistant knots are used here. When the data set covers all times between sunset and sunrise, the boundary knots are placed at these two moments. Otherwise the boundary knots are placed at the extremes of the azimuth angle present in the data.
The solar heat gain sol in an RC model can be expressed by multiplying the solar gain factor and the outdoor solar irradiation measured in a certain plane as shown in Eq. (1), (3), and (4). To include azimuth angle dependence of the solar gain, the solar gain factor is substituted with the spline, One needs to specify a sequence of knots in any sub-interval of°°[ 0 , 360 ), where is the azimuth angle of the sun. This sub-interval will typically include the azimuth angles between sunrise and sunset. Thereby the functional representation of the solar gain sol is obtained as in Eq. (13), where is the sun elevation, and I is the solar irradiation. The knots are defined as described in Section 2.1.2 to ensure that the spline curve can take any value for the entire chosen sub-interval. Depending on the complexity of the shadow pattern over the day, the number of base splines can be adjusted.
The basis splines B ( ) i m , are uniquely defined by the knot sequence and the azimuth angle time series. Therefore, the basis splines can be constructed before model fitting, as seen in the first step of Fig. 1.
Assuming that the solar radiation only affects the first state, the stochastic differential equations on state-space form in Eq. (11) can be written as: : Second input matrix )and its associated input vector )excluding solar gains.
In Eq. (14), p is the number of states, q is the number of basis splines used to model the solar gain, r is the number of inputs excluding solar irradiation, i is the scaling factor of the i th basis spline, is the heat capacity of the temperature node to which the solar gain is assigned, and I t is a time series of the measured outdoor solar irradiation. Finally, × A p p , is the transition matrix and x p is the state vector.
Note that the matrix B 1 can be modified to make the solar gain affect more than one state, however, for simplicity of the formulations this is not done here.

Case study
This Section describes the implementation of the spline-based approach for solar gain modelling of two dynamical thermal systems. In the first case (Section 3.1), the splines are included in a model of the heat dynamics of a single apartment. In the second case (Section 3.2), splines are applied to model the solar gain in a solar collector field.
In the following sections the used data and applied models are described for each case, along with results of the model selection procedure.
All models are formulated and fitted with use of the package ctsmr (Continuous Time Stochastic Modelling for R, version 0.6.17) (Juhl, 2013), which uses a Kalman filter and maximum likelihood methods for parameter estimation. The B-spline basis functions are generated with use of the R-core package splines (version 3.5.1) (R Core Team, 2017).

Modelling the thermal dynamics of a building
In this case study, the physical parameters of a building, such as heat loss coefficients, heat capacities, and time constants, were estimated. The following sections show how the models improve when the solar gain is modelled with splines. This semi-parametric model allows the estimation of the azimuth-dependent solar gain factor and the resulting solar gain.

Data
During the period from February 13 to April 1, 2018, measurements were obtained from a two-room apartment in Aalborg, Denmark. The apartment consists of an open kitchen and living room, a bedroom, a bathroom, a small corridor, and a heated weather porch. The apartment has a gross floor area of 67 m 2 and a total window area of 10.7 m 2 . Similar heated apartments are located on the north side, south side, and above the apartment. An unconditioned basement is located below the apartment. The floor plan of the apartment is shown in Fig. 3, including C. Rasmussen, et al. Solar Energy 195 (2020) 249-258 window sizes and the physical parameters used for modelling. The total space heating of the apartment was measured every 15 min, and the air temperature in eight locations in the apartment was measured every 5 min. The internal temperature used in the model was the arithmetic mean temperature of all eight sensors in each time step.
In addition to the heating power heat (kW) and the internal temperature T i (°C), the outdoor ambient temperature T a (°C) and the global solar irradiation I g (kW/m 2 ), were measured every minute at Aalborg University, 2.6 km away. All measurements were re-sampled to hourly values by averaging. A subset of the data is plotted in Fig. 4.

Models
The model that all the other models originate from is the single-state model T i as given by Note that the time indices shown in e.g. Eqs. (11) and (12) are omitted for simplicity. In the model above, the only state is the internal temperature T i , as indicated by the model name. The C R , , , i i w and the variance of are model parameters which are to be estimated. T , a heat and I g are model inputs, and T i is the observed internal temperature. For further specification, see the Nomenclature.
All models include the state variables in their names. The temperature of an internal thermal mass, the heating system, and the exterior wall/building envelope are represented by T , T m h , and T w , respectively. Finally, the postfix S q indicates that the solar gain is modelled by a cubic spline with q basis spline.
For illustrative purposes Eq. (16) presents a model with two states: the internal temperature T i , and the hidden state temperature of the building envelope T w . The inputs of model T T i w are the ambient temperature T a , space heating heat , and global irradiation I g .
To convert a model with constant solar gain factor, , to a model with azimuth-dependent solar gain factor, ( ),the spline function in Eq. (10) is substituted by The solar gain = I ( ) sol g is assigned to a single state, namely the internal temperature node T i , for all fitted models. Fig. 5 shows the model selection framework and the result of the model selection. The diagram shows that the simplest model T i is first extended with a hidden state without use of solar splines (row 2). In this case, the model T T i w was found to be the best model. No more than one hidden state has been tested, to avoid potential structural identifiability issues. After the model was fitted, it was extended further by introducing splines to estimate the azimuth-dependent solar gain factor. This is shown in the lower part of the figure.

Results
All the spline models with five or more basis splines (except the models with eight and nine, which did not converge) outperform the null model T T i w . However, no statistically significant improvement was found when more than ten basis splines were used for this particular data set.
The likelihood ratio test between the null model T T i w and the alternative model T T _ S i w 10 reveals a p-value of less than e 2.2 ·10 16 , which clearly shows a significant improvement obtained by the alternative model. The best model was therefore found to be the two-state model T T _ S i w 10 , with a cubic spline describing the solar gain based on ten basis splines. Fig. 6 shows the estimated solar gain factor of model T T i w and T T _ S i w 10 . In the azimuth interval from 110°to 260°, two distinct peaks in the solar gain factor from model T T _ S i w 10 and a flat insignificant section in-between can be seen. This is in agreement with the apartment layout as it has windows towards the east and west, as seen in Fig. 3.
It is noted that solar gain factor for the morning hours (azimuth angle from 100°to 110°) is rather high and oscillates. As the solar irradiation intensity is low in this period of the day it is expected that the scaling factor of the basis spline is difficult to estimate. Furthermore, the confidence band is underestimated, as the scaling factor in this region is reaching the upper parameter search boundary. However, due to the low irradiation intensity, the effect on the model states is limited as well. This is seen in the general low solar gain levels in the interval from 100°to 110°, in the lower plot in Fig. 6.

Forecasting solar heat generation
In the second use case, the aim was to predict the return (outlet) temperature of a solar heat field. As the solar irradiation has a dominant influence on this variable, it can be expected that accurate modelling of shading patterns improves the model performance significantly.  C. Rasmussen, et al. Solar Energy 195 (2020)

Data
Measurements were collected at the Aalborg CSP Solar Heat Plant in Solrød Municipality between May 23, 2017 and May 31, 2017. The total panel area is 2569 m 2 , which yields around 1300 MWh annually with a peak power of 1.9 MW, serving a community of 350 households. A schematic overview of the site and measurements are shown in Fig. 7. The following measured inputs were used: total solar irradiation in collector plane I t , fluid flow per panel surface area Q f , fluid supply and return temperature T s and T r , and ambient temperature T a . The solar azimuth angle was computed from the longitude, latitude, and time. In Fig. 7, note the trees located west of the collectors, which shade (part of) the field in the afternoon.
A subset of the measurements is depicted in Fig. 8. The values were 1 min averages of measurements taken every second. A control of the return temperature had already been implemented at the time of data collection, which limited the variation in measured return temperature, A subset of the data was selected for parameter estimation. Periods of low or zero flow were excluded, as well as data points with large incidence angle , such that < cos( ) 0.2 (i.e. >°78.5 ). Finally, some periods in which the plant switches back and forth from preheating to operating were removed. This results in a data set with gaps and a varying number of measurements per day, including days with clear sky and days with intermittent cloud cover. This is, however, not a problem in the grey-box model setup.

Models
An initial stochastic state-space model was taken from , whose model in turn derives from Perers (1997). An extended form of the heat balance proposed by the latter is used in the current ISO standard for quasi-dynamic testing (European Committee for Standardization (CEN) Technical Committee, 2017). The resulting state-space formulation for the base model T r is The modelled quantity is the return temperature T r . The measured inputs are T Q T , , a f s , and I t .
The variable T f is the average of supply and return temperature. The factor c f is a given constant, whereas mC U ( ) , , , e fa f and obs are parameters to be estimated. The exact specification of the variables can be seen in the nomenclature.
Four extensions of the base model T r were considered, which all add detail to modelling the effect of solar irradiation. One extension was the splined solar gain factor, which is indicated by S q in the model name, where q is the number of splines. In addition, two different IAM functions were implemented. The model name includes IAM for the standard function as given in Eq. (5), and IAM a for the Ambrosetti IAM from Eq. (6). Finally, the total irradiation was split into direct and diffuse components, using the method proposed by Ruiz-Arias et al. (2010). Models including this irradiation split contain I split in their names. All applied models lump the field into a single compartment with one corresponding state variable, the return temperature T r . All extensions were applied to the base model in an iterative manner that is visualised in Fig. 9. In each iteration, the model was extended in several ways, one extended model for each of the remaining extensions. The extension with the lowest p-value were selected, and the procedure was repeated until no extensions were significant or all extensions were implemented.
Note that in cases of the split radiation being combined with the splined solar gain factor ( ), the solar gain was modelled differently to Eq. (4) to ensure identifiability. Instead, it was given by in case where the model did not include an IAM.

Results
In the first model extension, the spline was the most significant improvement to the model, in particular T _S r 8 . The next selection round  C. Rasmussen, et al. Solar Energy 195 (2020) 249-258 included the radiation split to the model, where again the version with eight basis splines was most significant. Finally, the model was extended with one of the IAM functions, again with eight basis splines. The model selection ended here, as all extensions considered were accepted. Note that the final model includes both a state-of-the-art Amrbrosetti incidence angle modifier and spline functions, which together model the sun-position dependence of the solar gain factor. This result shows that several solar gain modelling components can be combined and in this way complement each other. The combined effect of the IAM, split radiation, and spline functions on the estimated solar gain factor for the (estimated) direct radiation is shown as a function of azimuth angle in Fig. 10. The models have different estimated heat capacities mC ( ) e , which scale the functions to different maximum values. In order to compare the possible function shapes of the solar gain factor function that the different models provide, for this plot the functions were scaled to a common maximum value of 1. The spline-based solar gain factor decreases faster in the afternoon (azimuth angle from 225°to 280°), which indicates that the effect of the shading forest is captured in these models, but not in the state-of-the-art model.
Model performance was compared using prediction with perfect forecasts of the inputs, that is, using the actual measurements as inputs. Fig. 11 shows how the model predictions improved due to the added solar gain function. Note that all models, including the final model, overestimate the return temperature at the end of the day due to a measurement error in the ambient temperature. The measurement error occurs due to exposure of the sensor to direct sunlight. This could be accounted for in future improvements of the model, but is outside the scope of this article. Fig. 11A compares the residuals of two of the base model extensions, T _ IAM r and the selected T _ S r 8 . It is clear that the spline-based model is able to account for shading, and thereby reduces structural prediction errors that were present when modelling incidence angle effects with an IAM only. Fig. 11B shows improvements in the residuals after the final model extension from T _ S _ I r 8 split to T _ S _ I _IAM r 8 split . The improvement was less pronounced than in 11A, but it can be seen that the residuals in the late afternoon were slightly improved by including the IAM. The remaining error in the late afternoon is explained by the ambient temperature measurement error.
With each model extension, the time of day from which the model starts overestimating the return temperature is postponed. As one might expect, the improvements are less pronounced on cloudy days than on sunny days as plotted here, but still present.

Discussion
First of all, it is noted that although the splined solar gain factor was applied in grey-box models here, the used of the method is not limited to this model type. The B-splines can also be used in other models in which the parameters are fitted from data, such as linear regressions.
In the present article, the splined solar gain factor only affected a single state in the state-space models. It is however a simple adaptation to assign the solar gain to the system in more complex ways, e.g. by assigning it to more than one state.
While only equidistant spline knot sequences are used in this article, the location of the knots can perhaps be optimised by concentrating more knots around azimuth angles for which fast changes in the shading pattern occur. One should in this case be aware of potential overfitting when spline knots are placed too closely together, as the  , only the best model is shown in the diagram, however, models with four to eight basis splines were tested. C. Rasmussen, et al. Solar Energy 195 (2020) 249-258 individual basis splines would be fitted to fewer data points. When using longer periods of measurements covering several months, there will be a larger range of sun elevations associated with a single azimuth angle. As the elevation also affects shading patterns and the incidence angle, one may need to account for this as well. This can be solved in several ways. First, an additional spline curve can be introduced to account for the sun elevation variation of the solar gain factor. The downside of this approach is that it requires significant amounts of data to estimate the splines and adds parameters to the model. This will increase the computation time and may even result in an unidentifiable model.
A more convenient approach than introducing a second spline curve is to apply an adaptive model in which parameters are allowed to change over time. One example of adaptive time-varying parameters estimation is presented by Joensen et al. (2000). In that way the scaling factors of the B-splines will be specific to shorter time periods of data, so that the azimuth angle again largely determines the shading pattern within this time period.
An important note regarding the estimated spline curve is that its physical interpretation is not always straightforward. The splines may account for other effects than solar irradiation such as systematic use patterns (heat gain from people, computers etc.) or systematic measurement errors. These concerns are mostly relevant for climates or measurement periods with low daily variation in solar irradiation, i.e. climates and periods with mainly clear sky conditions. Finally, it should be noted that it can be difficult to estimate the scaling factors for periods with low solar irradiation, e.g. close to sunrise and sunset. This may be prevented by placing the boundary knots closer towards the south, and fitting a constant or linear solar gain factor to the periods outside these bounds.

Conclusion
Modelling solar gain is of crucial importance for energy system models. This paper shows that the approaches for solar gain modelling in the literature can be improved significantly by taking the position of the sun into account when modelling the thermal dynamics of an apartment and a solar collector field. This paper has proposed B-splines as a non-parametric method to estimate the relation between measured solar irradiation and the solar gain. In two case studies, this method was implemented and tested in grey-box models that included the solar gain as a function of the azimuth angle. This function is estimated using coefficients on a B-spline basis. The splines have been used on the global, total, and beam irradiation in different model versions.
A two-state model of an apartment was improved significantly by estimating the solar gain factor with use of a diurnal spline factor instead of the commonly used constant solar gain factor. The estimated solar gain curve showed good agreement with the authors expectations for the specific apartment with east-and west-facing windows, as the peak in solar gain occurred during morning and evening.
For a return temperature model of a solar collector field, it has been shown that the splines account for local shading patterns caused by the panels themselves and trees next to the field, while also accounting for incidence angle effects. The key difference compared to the methods from ISO 9806:2017 and most models in the literature is that the splinebased solar gain factor has a higher degree of freedom than frequently used single-parameter incidence angle modifier (IAM) functions. This makes it capable of accounting for site-specific shading patterns, which largely reduced structural prediction errors. In addition, the splines can be combined with standard IAM functions to achieve even better results.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. C. Rasmussen, et al. Solar Energy 195 (2020) 249-258