Stochastic generation of synthetic minutely irradiance time series derived from mean hourly weather observation data

Synthetic minutely irradiance time series are utilised in non-spatial solar energy system research simulations. It is necessary that they accurately capture irradiance fluctuations and variability inherent in the solar resource. This article describes a methodology to generate a synthetic minutely irradiance time series from widely available hourly weather observation data. The weather observation data are used to produce a set of Markov chains taking into account seasonal, diurnal, and pressure influences on transition probabilities of cloud cover. Cloud dynamics are based on a power-law probability distribution, from which cloud length and duration are derived. Atmospheric transmission losses are simulated with minutely variability, using atmospheric profiles from meteorological reanalysis data and cloud attenuation derived real-world observations. Both direct and diffuse irradiance are calculated, from which total irradiance is determined on an arbitrary plane. The method is applied to the city of Leeds, UK, and validated using independent hourly radiation measurements from the same site. Variability and ramp rate are validated using 1-min resolution irradiance data from the town of Cambourne, Cornwall, UK. The hourly irradiance frequency distribution correlates with R 1⁄4 0:996 whilst the mean hourly irradiance correlates with R 1⁄4 0:971, the daily variability indices cumulative probability distribution function (CDF), 1-min irradiance ramp rate CDF and 1-min irradiance frequency CDF are also shown to correlate with R 1⁄4 0:9903; 1:000, and 0:9994 respectively. Kolmogorov– Smirnov tests on 1-min data for each day show that the ramp rate frequency of occurrence is captured with a high significance level of 99.99%, whilst the irradiance frequency distribution and minutely variability indices are captured at significances of 99% and 97.5% respectively. The use of multiple Markov chains and detailed consideration of the atmospheric losses are shown to be essential elements for the generation of realistic minutely irradiance time series over a typical meteorological year. A freely downloadable example of the model is made available and may be configured to the particular requirements of users or incorporated into other models. 2015 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/


Introduction
Solar irradiance varies on a minutely time scale (Sayeef et al., 2012). The fluctuations are driven by cloud dynamics, atmospheric losses (Calinoiu et al., 2014), and the transport of airborne pollutants (Vindel and Polo, 2014a). Changes in irradiance that occur on the same time scale as changes in electricity demand will impact the benefits of storage and self-consumption in a domestic or community PV system (Marcos et al., 2014). Integrated electricity demand, PV supply, and storage simulations must operate on a minutely time scale to capture these effects, and therefore require minutely irradiance time series as an input (Widen et al., 2015;Sayeef et al., 2012;Hummon et al., 2012;Cao and Sirén, 2014). Calibrated minutely irradiance datasets are generally the output of isolated research projects and are often limited in duration, measurement consistency, and location. Hourly weather data, however, is widely collected and made available through national meteorological offices. This hourly data fails to capture the intermittent nature of solar irradiance (Sayeef et al., 2012), therefore some solar irradiance models use hourly weather datasets to artificially generate minutely irradiance time series.
The focus of solar irradiance models can vary from predicting the future irradiance, to providing a general expected irradiance at any location globally. Many examples of these models have been reviewed, analysed and validated in literature (Gueymard, 2012). The methodology of interest is a sun obscured type approach. This is where the cloud cover is predicted or determined, thereby implying when the solar beam irradiance will be obstructed. A more complex methodology is outlined and developed by Morf (1998Morf ( , 2011Morf ( , 2013 where cloud cover is two-dimensionally modelled to replicate sky with certain clouded conditions, whilst a random number generator driven model separates irradiance into its subcomponents of diffuse and beam. Atmospheric transmission losses from extraterrestrial irradiance to global horizontal irradiance is extensively detailed in literature, however its inclusion on a time series irradiance generation model is less so. Simplistically, clouded periods can be subjected to a random variate to represent these losses (Ehnberg and Bollen, 2005), however there is scope for a more sophisticated approach. Geographically dependent monthly clearness index distributions can be used to deterministically select the transmission losses during clouded periods (Morf, 2013). Probabilistic methods are commonly seen to generate irradiance data by stochastically selecting the clearness index Nomenclature Latin alphabet C cloud coverage fraction ðC=10Þ C 8 cloud coverage in okta (0-8) epm elements per minute in matrix f white-noise multiplier for k c variations G irradiance, specified by subscript (W m À2 ) i random start point within row vector k c clear-sky index (G=G cs ) n number of elements within a re-sampled cloud length row vector P 1 transition probability matrix P ðxÞ probability of x to occur r random variable between 0 and 1 R resolution of primary x (100 m/el) s number of states in Markov process t time-step in Markov process u wind speed (ms À1 ) u 10 u measured 10 m above ground (ms À1 ) x horizontal cloud length (m) X state at point t in Markov process x cloud cover row vector of 1's and 0's x w cloud cover row vector adjusted by w z cloud height (m) z 0ref roughness length (m) Numerical 0 used to represent clear sky minute 1 used to represent a clouded minute 60 used as a conversion for secs to min Greek alphabet a coefficient defined by x max b single power law exponent b tilt angle from horizontal of inclined plane d coefficient defined by x min minutely fluctuation from k c h i solar incidence angle normal to angled plane h z solar zenith angle r std. dev. around hourly mean of k c w sampling rate  (Ngoko et al., 2014), or by determining monthly probability distributions of clearness index (Graham et al., 1988), or stochastically producing cloud cover and applying beam irradiance transmission losses (Aguiar et al., 1988). There are numerous methodologies that can be utilised and developed upon. The methodology presented within this paper is a temporal only sun obscured type model with two distinct sections, the first generates 1-min resolution cloud cover, and the second calculates the irradiance based on whether the ensuing minute is clear or cloudy. A distinct aspect of this methodology is the use of a multitude of Markov chains to stochastically determine future weather condition states of pressure, wind speed, cloud height and okta, 2 incorporating the weather variation influence of season, the diurnal dependency and variations caused by pressure. Furthermore, the pattern of cloud cover is generated with greater complexity than in previous literature by using a one-dimensional method that considers cloud height, the speed of the cloud, and the statistical distribution of the horizontal cloud length. The irradiance is calculated in three stages, firstly the atmospheric transmission for each minute is determined, secondly the theoretical clear-sky irradiance is calculated based on earth-sun geometry, and thirdly the irradiance is broken down into its different components in order to obtain irradiance on an arbitrary plane. The most distinct element of the irradiance calculations is the production of clear-sky index probability distributions based on the okta number, and the deterministic approach to applying atmospheric losses minute-by-minute with the inclusion of an intra-hour dependency.
The methodology is geographically flexible, so long as the input data exist for the location desired. However, individual simulations at nearby locations would not correlate due to the non-spatial nature of the methodology. The model output is a synthetic global solar irradiance time series upon an arbitrary plane at a 1-min resolution. It is a temporal data series only, and does not include a spatial dimension. Its applications are therefore limited to cases where the spatial element is not integral, such as small scale studies where a single high-resolution irradiance data series input is ideal. The methodology has suggested application in the improved modelling of small-scale PV supply, demand, and storage systems, through the calculation of electricity supply on a time scale that matches the demand flows. Regular demand flows operate with high power across a small duration, meaning that mean hourly averages fail to capture electricity peaks in the supply and demand. To appropriately capture electricity flows within the residence that can accurately calculate efficiencies, self-consumption losses, battery charge and discharge states for example, an appropriately high-resolution time scale is required (Darcovich et al., 2015;Torriti, 2014). This methodology offers an improved input estimation of the solar irradiance to these types of studies.
A validation of the methodology is carried out using hourly input observation data from Leeds, UK, and with 1-min input observational data from Cambourne, Cornwall, UK. Comparisons with the irradiance output have been made against independent radiation observation data for the same location, and the key features of this methodology are discussed.

Methodology
An example of the output can be seen in Fig. 1. Fig. 1a is the mean okta number for the hour, Fig. 1b is the one-dimensional cloud cover where black indicates the presence of a cloud, and Fig. 1c is the corresponding horizontal global irradiance. The simulated day selected was typical of June for Leeds, UK.
The inputs required for this methodology are the hourly mean weather observation measurements of sea level pressure (hPa), 10 m wind speed (knots), cloud base height (decameters), and okta number (okta). In order to capture accurately the probability statistics of a typical meteorological year, a minimum of 10 years of observational data are recommended.

Cloud size
A two-dimensional approach to producing cloud cover could be used to replicate the measurement type of an okta number (Morf, 2011). The methodology presented, however, requires only a single passage of beam irradiance from sun to the solar panel; a representation of the whole sky is not necessarily required. The typical method for measuring cloud cover is to use a cloud base recorder which uses a vertical laser pulse to track cloud presence (UKMO, 2010). This measurement device takes readings in a single direction, the okta number derived from this type of measurement is considered an acceptable representation of obscured beam irradiance once it is adapted to a synthetic temporal representation. Unlike previous methodologies, the cloud cover will be generated to follow a size distribution for a single linear dimension.
Horizontal cloud size distributions are shown to be wellrepresented using a single power-law relationship using an exponent of b < 2 (Wood and Field, 2011;Stull, 1988;Leahy et al., 2012;Pressel and Collins, 2012;Ray and Engelhardt, 1992) where x is the horizontal cloud length, b is the exponent taken at 1.66 between 0.1 and 1000 km (Wood and Field, 2011), a is a constant and P ðxÞ is the probability that x will occur. Eq. (1) represents the probability of x without horizontal cloud length limits. Eq. (2) introduces these limits and to allow for pseudo-random number extraction where r is a random variable that is uniformly distributed between 0 and 1, and a and d are coefficients defined by the upper and lower limits of x as where x min is the minimum cloud length and x max is the maximum cloud length. These equations allow a 1 metre resolution row vector of cloud cover, x, to be generated with 1s and 0s representing single metres of cloudy and clear intervals respectively. A signal poly-phase filtering technique (implemented using the built in Matlab 'resample' function (Oppenheim et al., 1999;Matlab, 2012a)) is applied that decimates vector, x, by a sampling rate, w. This converts the cloud length row vector (in m) into a smaller vector that is representative of increasing cloud speeds, and so the resultant vector, x w , is a time series.
The clouds are assumed to be travelling at the concurrent wind speed and cloud height for each hour of simulation. The sampling rate used in the signal poly-phase filtering technique is where u is the wind speed (ms À1 ), 60 is a conversion from seconds to minutes, and R is the resolution of x (m/ element).
x w is the minutely time series cloud cover row vector as a function of the wind speed. Hourly periods are selected at random from x w and the hourly mean coverage value, C, is determined through Eq. (6). C is a fractional representation of the duration of obscured sun, where 0 10 is clear sky and 10 10 is fully obscured; where the overline represents the mean, n is the total number of elements in x w , and i is a random start point along the length of x w . Hours are selected from x w until there is a cloud cover database consisting of 1000 examples of an hour of cloud cover for each possible u at each C. During each hour of simulation, an hour of cloud cover is selected from the database using the stochastically determined values of u and C as a reference.

Markov chains
To implement the stochastic element to the modelling, a Markov process is used. Markov models are a very popular method of stochastic data generation and have been used in many applications, from wind estimates (Masseran, 2015), solar energy estimations (Bhardwaj et al., 2013;Vindel and Polo, 2014b;Hocaoglu, 2011), and in weather generation (Yang et al., 2011).
A Markovian process is a probabilistic mathematical method whereby transitions from one state to the next are directed by discreet probabilities taken from the statistics of real-world processes. In the case of this methodology, statistics are developed from real observational transitions of mean hourly okta, wind speed and cloud height. From these statistics, transition matrices can be constructed.
This methodology uses a single order Markov model whereby only one previous time-step, t À 1, influences the transition process from t À 1 to t. Higher order transitions exist, from first order t À 1, through to the nth order t À n. Ngoko et al. (2014) describes how a Markov process ðX t ; t ¼ 0; 1; 2; . . .Þ that has s allowable states ð1; 2; . . . ; sÞ, is in state j at time t if X t ¼ j. In this first order Markov process, given that the process is in state i at time t À 1, the probability that it will transition to state j at time t is given by discreet probability P ij , this is calculated as These probabilities are stored in a transition matrix P 1 and can be represented as Constructing a transition matrix requires the conversion of a variable's range of magnitudes into discreet states. Fortunately for this study all the variables are extracted in appropriate states, for example each okta number ð0; 1; . . . ; 9Þ is its own state. The linear time series of observed states must be converted into a statistical representation of transitions. Firstly, the frequency of transitions f s tÀ1 ;s t must be found. This is done by tallying every transition of states found in the observational data in the appropriate f s tÀ1 ;s t location. Secondly, the frequencies are converted into probabilities through dividing f s tÀ1 ;s t by the total number of occurrences that s tÀ1 transitioned, f Ã s tÀ1 . In the case where the observed frequency of state i transitioning to state j is f ij , and where f Ã i is the total number times state i has transitioned. The probability of this transition can be expressed as A pseudo-random number generator is used to determine s t from s tÀ1 within the simulation. To do this, the cumulative sum of the probabilities P 1 C is found as To implement the Markov process in Matlab r2012a, a random variate r, evenly distributed between 0 and 1, is introduced at each transition. P 1 C is queried against r in a logical statement, resulting in a Boolean matrix where ! denotes a logical if statement, and P 1 B is the resultant Boolean matrix. For example, if each of the cumulative transition probabilities are less than r, a value of 1 is assigned, else a value of 0 is set. The future state s t is then given by where the subscript s tÀ1 indicates the matrix row to sum using the previously determined state. In the subsequent iteration s t becomes s tÀ1 and the process is repeated.

Okta, wind speed, pressure and cloud height
Twenty different Markov chains are utilised to stochastically select weather variables for the subsequent hour, based on the conditions of the current hour. The variables selected through the Markov process are the okta number for each season during both above and below average pressures (8 matrices), wind speed for each season (4 matrices), cloud height for each season (4 matrices), and the diurnal changes for each season (4 matrices). The transition probabilities are calculated through statistical analysis of the transitions between hours within the observation data.
The wind speed and cloud height Markov chains are produced accounting for seasonal variations. A Markov chain is used for each variable representing each of the four seasons, capturing the variability at different times of the year, totalling four chains each. The okta number Markov chains also consider the effect of season, with the inclusion of impacts from pressure and diurnal variation. Eight okta Markov chains are produced that are split by above and below average pressure for each season, and four additional morning okta Markov chains are produced to capture the diurnal variation for okta transitions between 00:00 and 05:00 am for each season. The intent is to capture the variation in transition probability that occurs as a result of weather changes due to the presence of solar energy. 5 am is considered the cut-off because it is a typical sunrise in the summer for the applied study locations. 5 h represents 5 okta transitions and is considered an appropriate duration for the slight propensity to shift towards an increased okta to be represented, Fig. 8 demonstrates the diurnal transition differences. Fig. 2 visually demonstrates the mean okta Markov chain for the entire year, whilst the effect of season can be seen in Fig. 11.
The hourly mean sea level pressure is considered to be either above or below average at any hour, this is to account for weather condition variation between high and low pressure systems. The mean pressure is calculated from the observation data and the frequency distribution of below and above average pressure durations are produced. The duration of the pressure system is then selected using a pseudo-random number generator that follows the duration probability distribution. The pressure systems always alternate between periods when they are above and below average. Wind speed measured by weather stations is typically recorded at 10 m above ground level. For clouds in the atmospheric boundary layer (below 1 km) the wind speed can be extrapolated using the stochastically selected cloud base height as the target reference height, z ref . The extrapolated wind speed is assumed to be the speed at which the cloud travels. To extrapolate the wind speed from the 10 m measurement to a reference cloud height, the following logarithmic profile extrapolation (Best et al., 2008) is used where u ref is the wind speed at the reference height (ms À1 ), u 10 is the wind speed at 10 m above ground level (ms À1 ), z ref is the vertical reference height, and z 0ref is the roughness length of the location (set to 0.14 for rural locations (Best et al., 2008)). Above 1 km, the geostrophic wind speeds in the free atmosphere are influenced by pressure and thermal gradients (UKMO, 1997). Estimating geostrophic wind speeds is difficult using mean hourly surface observational data; presenting one of the limitations of the current state of the methodology. Without knowledge of pressure and thermal gradients, alternatives must be used. Excellent methods exist such as the wavelet variability model (Lave and Kleissl, 2013), however the data input required is an irradiance profile, an input avoided in the rationale of this methodology once clear sky statistical analysis has been performed. Future development is required to truly address the cloud speed, the validation results however were found to be relatively insensitive to this aspect, and so at this time a guided estimation is utilised. Using the same range as used in the wavelet variability model ð0 À 25 msÞ and typical geostrophic wind speeds of 12.5 km h À1 (Mathiesen et al., 2013), the estimated wind speed at cloud heights above 1 km is determined allowing variation between the suggested range with the mean found at the typical free atmosphere wind speed as u >1km ¼ Gammað2:69; 2:14Þ.
For each hour of simulation, the wind speed, cloud height and okta are stochastically selected from the appropriate Markov chain. From here, an hour of cloud cover at a minutely resolution can be selected from the cloud cover database, resulting in a temporal Boolean vector representing 1 min cloud cover.

Irradiance calculation
Once the cloud cover is generated for the entire time series desired, the irradiance is calculated. The irradiance model is in three parts. Firstly, atmospheric transmission for each cloudy and cloud-free minute is determined.
Theoretical clear-sky irradiance based on earth-sun geometry is then calculated and the atmospheric transmission factor applied to obtain simulated global irradiance. Lastly, simulated all-sky direct and diffuse irradiance is calculated and then applied to obtain irradiance on an arbitrary plane.

Generation of clear sky indices
The atmospheric transmission is parameterised in terms of the clear-sky index, k c ¼ G=G cs , where G cs is a theoretical cloud-free irradiance value. To account for the varying optical thicknesses of clouds, it is appropriate to choose k c from a distribution rather than to assign a fixed value. To this end, observations of G from the 91 MIDAS 3 stations that provided observations of both cloud okta and hourly global irradiance in the year of 2012 were used. Hours in which the mean solar zenith angle was greater than 80 were rejected to minimise the impact of horizon effects and exclude hours during which the sun rose or set.
Values of G cs were calculated by the DISORT radiative transfer solver (Stamnes et al., 2000) in the libRadtran package (Mayer and Kylling, 2005). In the radiative transfer calculation, atmospheric profiles of temperature, O 3 concentration, precipitable water vapour and surface albedo were obtained for each month of 2012 from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-interim reanalysis data (ECMWF, 2014) at a resolution of 1:5 Â 1:5 . Monthly aerosol data was provided by the GLOMAP model at a resolution of 2:8 Â 2:8 , specifying scattering and absorption coefficients and asymmetry parameters for 6 bands in the shortwave spectrum for sulphate, sea-salt, black carbon and particulate organic matter aerosols in four size modes (Scott et al., 2014). As each simulated hour is to be aligned with actual data to obtain k c , it is critical to obtain the solar geometry to high accuracy. The latitude and longitude of each MIDAS station was therefore provided to four decimal places, from which the effective cosine-weighted solar zenith angle for each hour was calculated using the Blanco-Muriel et al. (2001) algorithm (accurate to within 0.01 ) included in libRadtran.
Clear-sky index observations were grouped by corresponding cloud okta for the hour and histograms of the observations created (Fig. 3). There were 111090 observations in total. From this, the distributions from okta 0, 6, 7 and 8 were kept. Periods of okta 9, which represents sun obscured due to fog or other meteorological phenomena, were treated as overcast hours. For hours where okta is in the 1-6 range, the distribution for okta 6 is used, as the aim is to obtain a distribution for cloud optical thickness for partially cloudy hours which differs from cloud optical thickness in overcast or nearly overcast hours, but is relatively unaffected by periods of clear sky. As okta 1-5 h include a significant amount of clear sky, it is difficult to estimate a sun-obscured k c distribution for these hours. Furthermore, observations of okta 1-5 are comparatively rare. The distributions were fitted using maximum likelihood estimation (Table 1), except in the case of okta 0 where a normal distribution with mean of 0.99 and standard deviation of 0.08 was selected to provide a good visible fit to the peak of the histogram. In the okta 0 histogram, there are several k c values which are much less than 1, indicating that even when no cloud is registered, the irradiance reported can be much lower than expected. This can be explained by the fact that the cloud sensor is pointing directly upwards whilst the sun is in a different portion of the sky and may be obscured by cloud. This is a particular issue in the UK where the sun is always at least 27 from zenith. To ensure realistic clear-sky values where the sun is shining, these low values have been rejected and a normal distribution fitted visually to the spike of the histogram, which represents genuinely clear hours.

Construction of minutely clear-sky index
For each day, a baseline cloudless clear-sky index, k c;clear , is selected from the okta 0 distribution, and applied to all minutes where the sun is not obscured. For cloudy minutes, the baseline cloud-obscured clear-sky index, k c;cloud , is selected from the appropriate distribution. If the sky is overcast for an extended period (hourly okta 8 for > 6 hours), the model sets the baseline value of k c;cloud for each interval within the overcast period from the okta 8 distribution. This is justified because the clear-sky index of overcast skies does not vary extensively (Skartveit and Olseth, 1992). For partially cloudy hours (okta 1-7), the hourly k c;cloud value is allowed to change each hour and is drawn from the okta 6 distribution if the hourly okta C 8 is 6 or less, and the okta 7 distribution if C 8 ¼ 7. If the value of k c;cloud selected from the distribution exceeds 1, a new value from the distribution is drawn.
To move from hourly averages of k c;cloud to minutely instantaneous values, a linear interpolation between two values of k c;cloud was performed to obtain baseline minutely values k c;cloud;m . To produce more realistic irradiance profiles, the k c;cloud;m were allowed to fluctuate between intervals of 6 min. The model allows fluctuations at intervals that are a factor of 60. Allowing fluctuations >10 min intervals did not offer similarities in terms of ramp rate occurrences or the variability indices, whilst fluctuations at <5 min intervals saw too much variability. The intention is to capture the gentle rolling of the k c;cloud;m observed in real profiles. Following this, minutely clear-sky index variations were introduced for both cloudy and clear minutes by using a Gaussian white noise multiplier based on the hourly cloud okta: for cloudy minutes, and for clear minutes. Eq. (13) gives the greatest variation in cloudy minutes for higher okta; Eq. (14) gives the greatest variation in clear minutes with increasing cloud cover.

Generation of irradiance time series
Clear-sky irradiance values, and their direct and diffuse components, were obtained from the HELIOSAT method (Hammer et al., 2003). HELIOSAT is fully described by Hammer et al. (2003) and so is not repeated here. Whilst clear-sky irradiance from the radiative transfer code can be used, the HELIOSAT method is flexible enough to calculate minutely clear-sky direct and diffuse irradiances G B;cs and G D;cs for any location of choice in very little computational time.
The global clear sky horizontal irradiance is the sum of the components and the global all-sky irradiance is given by A further two adjustments are made to ensure that unrealistic values of G are not seen. If the value of k c;m is less than 0.01, it is set equal to 0.01. At the other end of the scale, there are many situations where k c;m exceeds 1 in the distribution tails. This is most likely to happen at low solar elevations where horizon and ground effects are  Table 1 Sky states used in the weather generator model and the distribution driving the initial value of k c for each change of state. The normal distribution is parameterised by ðl; rÞ where l is the mean and r is the standard deviation. The Weibull and Gamma distributions are parameterised by ða; bÞ where a is a shape parameter and b is a scale parameter.
Hourly okta k c distribution for cloud cover 66 k c $ Nð0:6784; 0:2046Þ 7 k c $ Weibullð0:5577; 2:4061Þ 8 k c $ Gammað3:5624; 0:0867Þ more pronounced, and the division of a small G by a very small G cs leads to high values of k c . The largest values of k c from the observed irradiances were found to obey the relationship given by (R 2 ¼ 0:9931) k c;max ðh z Þ ¼ 27:21expðÀ114 cosh z Þ ... þ 1:665 expðÀ4:494cos h z Þ þ 1:08 ð17Þ so that k c;m is set equal to k c;max if the value drawn from the distribution exceeds this upper threshold.

Decomposition of all-sky irradiance into direct and diffuse components
The decomposition of global irradiance into direct and diffuse components is necessary in order to calculate the irradiance on a plane of arbitrary alignment. Direct irradiance under all sky conditions is related to clear-sky index and clear-sky direct irradiance by an adjustment (Mü ller and Trentmann, 2010) such that ; G B;cs ðk c À 0:38ð1 À k c ÞÞ 2:5 19 69 6 k c 6 1; G B;cs k c k c > 1: Diffuse all-sky irradiance is the difference between the global and direct horizontal values, G D ¼ G À G B . The direct and diffuse horizontal irradiances are translated to the irradiance incident on a panel of arbitrary inclination and aspect G P using the Klucher model (Klucher, 1979) which is found to perform well even when compared to more recent models (Gueymard, 2009): where h i is solar incidence angle normal to the angled plane, and F ¼ 1 À ðG D =ðG B þ G D ÞÞ 2 .

Validation
All data processing was performed using the commercial software package Matlab r2012a (Matlab, 2012b). Hourly weather observational data are taken from the British Atmospheric Data Centre (BADC) (BADC, 2013) which stores data from MIDAS' land surface monitoring stations, which are well geographically dispersed across the UK. As monitoring stations are occasionally taken off-line for repairs or upgrades for months at a time, 12 years of data sets are used to allow at least 10 years data for each variable.

Hourly validation in Leeds, UK
For the application of the model, 12 years of hourly weather observations for the monitoring site at Leeds Church Fenton (Source ID: 533) were taken from the BADC MIDAS data sets to produce the Markov chains. In order to validate the model, 12 years of radiation observation data were taken for the same monitoring station for comparison. Missing data points are assumed to insignificantly impact the validation. In cases where duplicate hourly measurements exist, the hour that has undergone quality control by MIDAS is selected.
Figs. 4 and 5 illustrate direct comparisons between modelled and observational mean hourly irradiance, averaged across 10 years. The 10 year mean annual irradiance is also indicated by the + scatter point at 112.5 W m À2 observational and 113.5 W m À2 modelled, giving approximately a 0.9% yearly irradiance overestimate for the location of Leeds. Whilst the intent of the model is not to produce hourly mean data in this format, averaging the minutely irradiance generated by the model over hourly timesteps shows a strong correlation when compared with the observational data ðR 2 ¼ 0:9715Þ. Fig. 6 displays the mean 10-year frequency distribution of the mean hourly irradiance, normalised to a single year. The correlation between the modelled and observational outputs is R 2 ¼ 0:9963.

Minutely validation for Cambourne, UK
Validation against a minutely irradiance dataset is necessary to confirm that the model can achieve statistically representative minutely resolution. To demonstrate this, 7 years of minutely radiation data was taken from the World Radiation Monitoring Centre -Baseline Surface Radiation Network (WRMC-BSRN) from BSRN station number 50, located in Cambourne, Cornwall, UK (WRMC-BSRN, 2014). Missing data points were ignored and deemed to not significantly impact the distributions for comparison. The model inputs were adjusted for the location of the Cambourne weather station using a latitude of 50:2178, longitude of À5:32656, and   Fig. 4. Comparison of the seasonal mean diurnal irradiance profiles and the annual mean diurnal irradiance profile between 10-years of modelled output and 10-years of validation data from Leeds, UK.
height above sea level of 87 m. 12 years of hourly weather observations for the monitoring site at Cambourne (Source ID: 1395) were taken from the BADC MIDAS data sets to produce the appropriate Markov chains. Three metrics are used to validate the intermittent nature of the model output, the variability index CDF, the irradiance CDF and the ramp rate CDF. The variability index is the ratio between the rate of change of the measured irradiance and the rate of change of the reference clear sky irradiance (Stein et al., 2012). The more intermittent the day's irradiance, the higher the variability index. Furthermore, the 2-sample Kolmogorov-Smirnov (K-S) test was carried out for each metric, for each day of the year. The subset of each K-S test consisted of 7 of the same day's minutely data. For example, 7 modelled samples of the 1st-January represents one subset and is compared against 7 samples of the same day from the observational data.
The comparison of the CDFs of the mean daily variability indices for the model output and the measured validation data is shown in Fig. 7a. The two curves correlate well at R 2 ¼ 0:9903. The model generated variability indices have a slightly increased frequency of mildly variable days between variability indices 10-25 as is indicated by the steeper slope, whilst having a slightly reduced frequency of extremely variable days. Table 2 indicates that 94:38% of days analysed with the 2 sample K-S test reject the null hypothesis that the minutely variability indices of each day of the year observed and modelled cdfs are not from the same sample, when using a significance level of 99%. No annual temporal bias was observed from the days that accept the null hypothesis. The comparison of the observational ramp rate occurrence CDF and that of the model have an excellent correlation, as shown in Fig. 7b with R 2 ¼ 1. Both the 1-min interval of ramp-up and ramp-down events are captured excellently using this methodology. This is furthermore demonstrated with the results from the dayby-day K-S tests shown in Table 2. All days tested reject the null hypothesis across all tested significance levels with mean asymptotic p-value of 6:7365e À29 . Fig. 7c shows the comparison of the CDFs of the irradiance occurrence frequency between the 1-min generated data and the 1-min observational data across 7 years. The probability distribution functions correlate well at R 2 ¼ 0:994 with a very slight underestimation of the mid-range irradiance occurrences. A possible cause for this is combining the clear sky indices of okta values 1-6 with the same distribution. The day-by-day irradiance frequency distribution compare well with observational data as seen in Table 2, which shows that over 95% of days tested reject the null hypothesis at a significance level of 99%.

Application and discussion
Irradiance is generated over a 10 year simulation using Leeds, UK, input observational data to complete the methodology, the same simulation was used in the validation. A distinctive element of this methodology is the use of 20 different Markov chains in order to capture seasonal, diurnal and pressure based variations.
Combined, the Markov chains successfully replicated the okta frequency distribution of the 12 years of observation data. The mean percentage error between the modelled and observed okta transitions is À0:03%, which demonstrates that the overall statistics for Leeds are retained even with 20 separate Markov chains in use, whilst capturing more detailed transition characteristics at certain times of a year. The annual correlation of modelled and observational okta frequency distribution is R 2 ¼ 0:9956, showing a highly accurate reproduction of the location's okta statistics.
Each type of okta Markov chain (diurnal, pressure, and season) is analysed through examining the variation away from the annual mean okta Markov chain (Fig. 2) to assess the impact each of the different types have on the transition probability. The mean okta Markov chain is produced with every transition in the observational data, and not separated for season, pressure and season.

Diurnally weighted okta Markov chains
The most significant deviation of transition probabilities away from the mean is seen with the morning diurnal dependency type Markov chain. A comparison of the mean of the diurnal Markov chains (so that variations caused by pressure and season are excluded) minus the mean okta Markov chain is shown in Fig. 8. It illustrates the most significant deviations from the mean, as the differences in probability below AE0:01 are removed from the plot. There is a very distinct pattern for the probability of an okta number to remain the same from one state to the next, with a decrease of between 0.01 and 0.25 for okta 0-6. Okta 7, 8 and 9 however have an increased probability of remaining the same meaning that cloudier weather states are longer and more prevalent. Significant probability increases for an okta state transitioning to a higher okta number are seen across all okta states. There is also a slight increase for okta 3, 4 and 8 to transition to okta 0. The physical implication of this is that during the morning period (00:00-05:00 am), the current okta value has an increased tendency to transition towards an okta value of 6-8 or 0, and so cloud state is most likely to develop into either fully obscured or complete clear sky. This increased tendency towards both okta extremes during the morning  Table 2 The percentage of days that reject the null hypothesis when performing the 2-sample Kolmogorov-Smirnov test on 7 of the same day of modelled and observational CDF profiles of the minutely variability indices, minutely ramp rate occurrences, and the minutely irradiance frequency, tested at increasing confidence limits.  period is not captured when using a single Markov chain for all times of the year.

Pressure weighted okta Markov chains
The reason for the inclusion of transitions based on pressure is to attempt to capture variations in weather that are caused by high and low pressure systems. The approach is simplified to be either above or below the average pressure. Fig. 9 shows a slightly different analysis to the diurnal comparison for simplicity. The mean below average (BA) transition probabilities are subtracted from the mean above average probabilities (AA). The darker shade (blue) represents occurrences where the BA transition probability is greater than that of AA, for that particular transition only. The lighter shade (green) represents where the AA transition probability is greater than that of BA. The most significant key point to note is that the probabilities for okta 0-3 to transition to a cloudier state are consistently and significantly greater during periods of BA pressure. The transitions are more stable during AA pressure, with an increased tendency to remain at the same okta state. For okta 1-2 during AA pressure, there is a slight increase in probability to become less cloudy. Okta 7-9 during BA pressure has a higher tendency, þ0:1, to remain cloudy.

Seasonally weighted okta Markov chains
The variation due to the different seasons is detailed in Fig. 11, where the deviation from the mean okta Markov is shown for each of the four seasons. Again, in order to observe the most significant of differences, deviations of AE0:015 are removed. Autumn variations are the closest to the mean. The frequency of occurrence of okta 2-6 are  the lowest and so differences in this region offer the least impact on the overall model. The tendency for okta 1 to remain the same is significantly larger for autumn. Spring transitions are similar to those of autumn, in that the most significant deviations occur during the least sensitive okta range. There is however a distinct shift to favour lower okta values. Summer and winter contain the most significant variations from the mean. This is most important in summer as this is the period receiving the majority of the year's irradiance. Summer sees an increase in the probabilities along the diagonal (x ¼ y), hinting towards more stable transitions, directly contrasting the pattern observed in winter. Interestingly, winter has an increased tendency to move towards the two extremes of clear sky and fully obscured. It is worth noting that the differences reported here are specific to the location of Leeds, whilst the differences may be similar in pattern, the seasonal, diurnal and pressure differences are specific to the input observational data.

Irradiance
Hourly averages of clear sky indices derived from the model were compared with k c values from the MIDAS data (Fig. 10). It can be seen that the model recreates bi-modal structure of the real-world distribution of clear-sky indices. The peaks from the simulated data are of similar height to the MIDAS data, although the intermodal spread is lower and in particular there are fewer extreme high or low values in the simulated data. Fig. 12 displays the diurnal root mean square error (RMSe) performance of the simulated data for each season and annually, averaged across 10 years. The RMSe is calculated as a direct comparison of 10 years of modelled mean hourly irradiance data aligned with 10 years of mean hourly observational data. The time of day of most importance is around midday during the summer season during maximum clear-sky irradiance as this is when the potential for the largest ramp-rates and peak outputs are found.
Typical un-obscured irradiance outputs during these times are around 900 W m À2 at Cambourne. The RMSe in Fig. 12 at midday in summer is at 14 W m 2 , offering potential mean hourly irradiance error of AE1:5%. The vast majority of hourly RMSe values fall below 15 W m 2 . The correlation of 10 years observational and generated mean hourly irradiance profiles is R 2 ¼ 0:9715.

Conclusion
This paper presented a stochastic sun obscured type synthetic irradiance generation methodology with the intention of taking readily available hourly resolution input data and generating statistically accurate 1-min resolution irradiance profiles. One-dimensional cloud cover has been generated so that the horizontal cloud length follows a power law relationship. Twenty different Markov chains are used in setting the hourly conditions that determine the cloud cover. These Markov chains allow the stochastic selection of wind speed and cloud height whilst accounting for seasonal variability, and the selection of cloud cover accounting for seasonal, diurnal and pressure variability. The use of diurnal Markov chains is shown to have the most significant influence on the output. The variability of okta transition due to pressure system and season also allows for greater accuracy in capturing the weather conditions of a geographical location. The distribution of hourly clear-sky indices are recreated whilst allowing for minutely clear sky index fluctuations.
This methodology is validated on an hourly basis using radiation observations from Leeds Church Fenton, UK; and on a 1-min basis using radiation observations from Cambourne, Cornwall, UK. The model was shown to capture the irradiance frequency distribution with R 2 ¼ 0:9963 and the 10-year mean hourly irradiance with R 2 ¼ 0:9715. The 10-year mean diurnal comparison was shown to perform well for each season, the maximum RMSe during summer midday is at 14 W m À2 for a single hour. The largest RMSe equates to an error during summer height outputs of AE1:5%. The use of clear sky index frequency distributions as a function of the okta number allows the typical cloud characteristics inherent during okta to be produced. The 1-min validation in Cambourne showed the same performance on an hourly resolution as shown for Leeds, however it has been demonstrated that the 1-min irradiance ramprates, irradiance frequencies and the variability indices distributions correlated very well at R 2 values of 1:000; 0:9994, and 0:9903 respectively. Day by day 2-sample K-S tests were carried out on the modelled and obscured CDFs of the variability index, ramp rate occurrences, and the irradiance frequency; the percentage of days that reject the null hypothesis at significance levels of 99% are 94.38%, 100%, and 95.89% respectively. Whilst not a predictive model, visual inspection of both the 1-min resolution data displayed in Fig. 13 shows how the observational and modelled irradiance profiles have very similar characteristics during overcast and intermittent moments. This model can be used to generate statistically accurate 1-min time series irradiance data from readily available 1-h meteorological input data. The generated irradiance can be utilised as an input into other applications such as temporal intermittency studies, small scale grid impacts modelling associated with residential PV, and residential PV integrated battery storage modelling. Furthermore, this methodology is not limited to a minutely resolution and can theoretically be applied to produce secondly time series, however, in order to have confidence in the accuracy of the output, higher quality resolution observation data is required. This methodology could be extended by the inclusion of a spatial dimension using geographic smoothing techniques such as those demonstrated in Lave et al. (2012), as well as exploring potential improved accuracy through the use of higher order Markov models.
The solar resource model has other possible applications and a freely downloadable example of the model will be provided in the hope that researchers will adopt and adapt it for their own purposes (Bright et al., 2015).