Panel-based stratified cluster sampling and analysis for photovoltaic outdoor measurements

We study a stratified multisite cluster-sampling panel time series approach for analysing and evaluating the quality and reliability of produced items, motivated by the problem of sampling and analysing multisite outdoor measurements from photovoltaic systems. The specific stratified sampling in spatial clusters reduces sampling costs and allows for heterogeneity as well as for the analysis of spatial correlations due to defects and damages that tend to occur in clusters. The analysis is based on weighted least squares using data-dependent weights. We show that, under general conditions, this does not affect consistency and asymptotic normality of the least squares estimator under the proposed sampling design. The estimation of the relevant variance–covariance matrices is discussed in detail for various models including nested designs and random effects. The strata corresponding to damages or manufacturers are modelled via a quality feature by means of a threshold approach. The analysis of outdoor electroluminescence images shows that spatial correlations and local clusters may arise in such photovoltaic data. Further, relevant statistics such as the mean pixel intensity cannot be assumed to follow a Gaussian law. We investigate the proposed inferential tools in detail by simulations in order to assess the influence of spatial cluster correlations and serial correlations on the test’s size and power. © 2016 The Authors. Applied Stochastic Models in Business and Industry published by John Wiley & Sons, Ltd.


Introduction
Photovoltaics (PV) contributes substantially to the power supply in many developed countries. PV systems may consist of hundreds or thousands of PV modules, which are exposed to harsh operating conditions over many years. This exposure may result in degraded performance and damage, as well as defects and failures of modules. Collecting on-site outdoor measurements in order to detect such defects and assess their temporal development and impact, as well as to evaluate the overall quality and economic value of such systems, is a challenging task and requires careful designs for sampling and analysis.
In this article, we propose a panel-based methodology for the medium-term and long-term evaluation and analysis of a heterogeneous set of PV systems. Several problem-specific issues have to be taken into account and motivated the specification of the proposed approach. In practice, PV systems often consist of PV modules of different module types and even from different manufacturers. Further, various other site-specific factors may affect response variables related to quality and reliability issues, for example, the exposure to salt or snow, the type of the electrical connectors or the direct current (DC) to alternating current (AC) converter, also called solar inverter.
Another issue is that quality measurements taken at PV modules may be affected by spatial correlations. Several phenomena may result in such dependencies. For example, the modules may be installed in the same order as they were produced, such that correlations from the production line are propagated. Another source of correlation is the fact that the PV modules are electrically connected in strings and mechanically mounted in rows (typically one to six) on racks. In this way, electrical as well as mechanical issues may cluster. For example, hot spots visible in infrared irradiance images result from low current solar cells (e.g. due to a damage) connected in a string with good cells,

The general sampling design
The general sampling approach, which combines the concepts of stratified sampling, random cluster sampling and a panel design, can certainly be applied to a large number of field studies. Nevertheless, we shall explain and elaborate the approach in terms of the PV industrial application that motivated the work, namely, to design a methodology for the evaluation and analysis of large PV systems.
The aim was to elaborate a solution that makes it possible to answer different important questions, ranging from the estimation of defect rates to the testing of site-specific effects (such as the module type), based on measurements taken at several time points. Simultaneously, the sampling costs have to be taken into account and balanced with the statistical properties, which generally motivates the choices of the sample sizes.
Initially, at time zero, we propose to draw, at each site, a large random sample of size $n = 400$; samples at later stages may consist of 200 modules. The initial sample may be stratified by the module type (or manufacturer), as well as by a small number of important focus defects or predamages, such as an increased number of microcracks in the electroluminescence (EL) image supplied by the manufacturer or made after having installed the PV system. The reason to conduct stratified sampling is that for those focus defects a minimal sample size, say 40, should be guaranteed, in order to allow certain strata-wise statistical analyses, even when the strata correspond to rare events. Suppose, for simplicity, that there are $k = 4$ strata and that the first stratum represents good modules, whereas the other strata, with probabilities between 0.05 and 0.15, represent defects. The proposed stratified sampling is always conducted in multiples of the strata sample sizes $(m_1, \dots, m_4) = (80, 40, 40, 40)$, that is, according to the proportions 2 : 1 : 1 : 1. Let $w_i$ denote the true proportion of stratum $i$ in the population. Then the distribution function (d.f.) of a randomly drawn measurement $X$ is given by
$$F(x) = \sum_{i=1}^{k} w_i F_i(x),$$
where $F_i$ is the d.f. of the $i$th stratum. If stratified sampling is applied, one replaces the $w_i$ by $m_i / (m_1 + \cdots + m_k)$. Given estimates $\hat{F}_i(x)$ for the $i$th stratum, one may easily estimate the population's d.f. $F(x)$.
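The reweighting of stratum-wise estimates described above can be sketched in a few lines. This is only an illustration: it assumes empirical distribution functions as the stratum estimates, and the function name `population_cdf`, the toy data and the assumed true proportions are hypothetical.

```python
import numpy as np

def population_cdf(x, strata_samples, weights):
    """Estimate F(x) = sum_i w_i F_i(x) from stratum-wise samples.

    strata_samples: list of 1-D arrays, one per stratum.
    weights: stratum proportions w_i (must sum to 1).
    """
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    # Empirical d.f. of each stratum evaluated at x
    ecdfs = np.array([np.mean(np.asarray(s) <= x) for s in strata_samples])
    return float(weights @ ecdfs)

# Toy data (hypothetical): one 80:40:40:40 stratified sample
rng = np.random.default_rng(0)
samples = [rng.normal(0.0, 2.0, 80), rng.normal(0.0, 3.0, 40),
           rng.normal(-2.0, 2.2, 40), rng.normal(-6.0, 1.7, 40)]
w_true = [0.70, 0.10, 0.10, 0.10]   # assumed population proportions
w_design = [0.4, 0.2, 0.2, 0.2]     # m_i / (m_1 + ... + m_k)

F_pop = population_cdf(0.0, samples, w_true)       # population d.f. at 0
F_design = population_cdf(0.0, samples, w_design)  # d.f. of the stratified sample at 0
```

Because the defect strata are oversampled by design, the two values generally differ; only the version weighted by the population proportions targets the population d.f.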
It may happen that the classification of the modules with respect to the strata is unknown at the beginning of the study and thus has to be determined after having drawn the initial random sample of PV modules, for example, by appropriate measurements (e.g. based on expert audits, power output measurements and EL or infrared imaging) and by applying clustering techniques. In this case, one may proceed as follows: we observe for each quality feature $X$ a bivariate sample $(X_1, \zeta_1), \dots, (X_n, \zeta_n)$, where $\zeta_i \in \{1, \dots, k\}$ indicates the observed stratum of the $i$th observation. Let us assume that $E(\zeta_i)$ coincides with the true class of $X_i$ for all $i$. Then the unknown proportion $w_\ell$ of each stratum $\ell$ can be easily estimated by the relative frequency
$$\hat{w}_\ell = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(\zeta_i = \ell), \qquad \ell = 1, \dots, k.$$
In general, those strata proportions (defect rates) do not coincide with the proportions implied by the required strata sample sizes. As discussed in [1], one then may continue sampling until all strata have the required sample sizes.
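One simple reading of "continue sampling until all strata have the required sample sizes" is sketched below. It is not taken from [1]; the function `sample_until_quotas`, the label generator and the assumed defect rates are hypothetical. Every classified draw contributes to the estimated proportions, but a draw is only retained while its stratum is still below its quota.

```python
import numpy as np

def sample_until_quotas(draw_label, quotas, max_draws=100_000):
    """Draw modules one at a time until every stratum reaches its quota.

    draw_label: callable returning the stratum index (0..k-1) of a new module.
    quotas: required sample size per stratum.
    Returns (retained counts per stratum, estimated strata proportions).
    """
    quotas = np.asarray(quotas)
    seen = np.zeros(len(quotas), dtype=int)   # all classified draws
    kept = np.zeros(len(quotas), dtype=int)   # draws retained for the sample
    for _ in range(max_draws):
        if np.all(kept >= quotas):
            break
        s = draw_label()
        seen[s] += 1
        if kept[s] < quotas[s]:
            kept[s] += 1
    w_hat = seen / seen.sum()   # relative frequencies estimate the proportions
    return kept, w_hat

rng = np.random.default_rng(1)
true_w = [0.70, 0.10, 0.10, 0.10]   # hypothetical strata proportions
kept, w_hat = sample_until_quotas(lambda: rng.choice(4, p=true_w),
                                  quotas=[80, 40, 40, 40])
```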
Consider now a multisite study where $a \in \mathbb{N}$ PV systems (sites) are under investigation, such that the pooled PV modules represent a random sample of the underlying overall population of PV modules. By weighting the aforementioned estimates with the proportions of PV modules, $n_v / (n_1 + \cdots + n_a)$, where $n_v$ denotes the number of PV modules installed at site $v$, $v = 1, \dots, a$, one may easily generalize the aforementioned estimators.
The initial (stratified) large random sample allows assessment of the PV systems by current-voltage curves (IV curves) and EL imaging, in order to evaluate the initial quality and economic value of the systems in terms of power output, power losses and damages, defects (such as microcracks or cell breaks) and predicted future returns, right from the beginning. To assess the temporal development, one has to conduct a longitudinal study resulting in time series observations. Because, in practice, monitoring all initially selected PV modules over a longer time span is infeasible, we propose to select a panel. From the initial sample at time t = 0 one selects, say, 50 anchor modules at random, forming the core of the panel for the longitudinal study. Under the proposed stratification scheme, one may simply select half of the modules in each stratum at random. At each time instant, each of those 50 modules and four direct neighbours are measured. This means the sampling is conducted in randomly selected clusters (batches) consisting of b = 5 modules, thus resulting in a sample of size 200 at each time point. Consequently, we are given a panel time series where the panel of PV modules is observed over the time span of interest.
It is important to select the anchor modules spatially at random, within the strata defined by the stratification variable (module type and manufacturer), in order to ensure that the whole area of the PV site is covered. Taking that sample as a random sample of possibly correlated batches has two additional advantages. First, the engineer can easily detect clusters of defects and may perhaps infer their causes. Second, those clusters can be used to estimate local spatial correlations. Usually, there is an additional effect present in such a cluster. For example, for PV outdoor measurements, there is a row effect, as the modules are typically installed in two rows and, according to expert knowledge, measurements from the upper and lower row may differ.
We assume that the panel is remeasured at later time instants, for example, on an annual basis. In what follows, we assume that the study is planned for a small number of time points, such that serial correlations may be present. Then we are given only a short (vector) time series, such that methods from time series analysis cannot be applied. Therefore, we shall rely on a multiple testing procedure to test for interesting effects, especially main effects associated with a site or a certain defect. According to our Monte Carlo investigations, the proposed multiple testing approach is not severely affected by serial correlations, which justifies the approach: ignoring serial correlations provides a good approximation for the settings studied in our simulations.

Stochastic model and inference
In order to analyse data collected at, say, $a$ PV sites according to the aforementioned sampling design, we consider a stratified linear model with random effects; for a general exposition on linear models, we refer to [2]. This framework allows us to take into account fixed site-specific effects represented by regression vectors $\mathbf{x}_v$ for site $v \in \{1, \dots, a\}$. For an empirical study designed to investigate influential factors, one may select the sites with respect to factors of interest, such as the type of the converter, the type of grounding, the geographical location or the exposure to external factors, such as salt or snow. To keep things simple, we propose to consider binary effects (coded by +1 and −1) and a (nearly) orthogonal design to obtain statistically sound estimates. Orthogonal design vectors allow for uncorrelated estimation of the associated regression coefficients (main site effects), whereas a nearly orthogonal design aims at approximating orthogonality when it cannot be achieved exactly; for an exposition, we refer to [3]. Table I illustrates, as an example, a nearly orthogonal design for the case of $a = 5$ PV sites and four binary factors.
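Let $Y^{(v)}_{\ell j}$ be the $j$th observation of stratum $\ell$ at a fixed site $v$ and let $\mathbf{Y}^{(v)} = (Y^{(v)}_{\ell j} : j = 1, \dots, m_\ell,\ \ell = 1, \dots, k)$. We may and shall assume that all observations are sorted in such a way that the first $m_1$ entries correspond to the first stratum, the next $m_2$ to stratum two and so forth. To analyse the data in the presence of regressors as introduced earlier, we assume the linear model
$$Y^{(v)}_{\ell j} = \mu + \mathbf{x}_v' \boldsymbol{\beta} + \mathbf{z}_{v \ell j}' \boldsymbol{\gamma} + B_r + \epsilon_{v \ell j}. \qquad (3.1)$$
Here, $\mathbf{x}_v$ is the, say, $p$-dimensional regression vector with the experimental conditions of site $v$ (extensions are discussed next), $\mathbf{z}_{v \ell j}$ is a $q$-dimensional vector of additional explanatory variables, $p, q \in \mathbb{N}$, $\mu$ is the grand mean, $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are the regression coefficients and $\epsilon_{v \ell j}$ are mean zero and independently distributed error terms representing the measurement error when observing the $j$th measurement of the $\ell$th stratum. $B_r$ represents the mean zero random effect of the $r$th cluster (batch), $r = 1, \dots, m_\ell / b$, assumed to be i.i.d. with common variance $\sigma^2$ and independent from the measurement errors $\epsilon_{v \ell j}$. At this point, we confine ourselves to that model and postpone discussion of several extensions to the subsequent sections. Notice that the design matrix $\mathbf{X}^{(v)}$ of model (3.1) has the rows $(1, \mathbf{x}_v', \mathbf{z}_{v \ell j}')$, and that the associated cross-product matrix is a diagonal matrix for an orthogonal design. The aforementioned model can be used to analyse data from one site or data from all sites. Let us first discuss the former case. Fix $v \in \{1, \dots, a\}$ and put $\bar{\boldsymbol{\epsilon}}^{(v)} = (\bar{\epsilon}_{v \ell j} : j = 1, \dots, m_\ell,\ \ell = 1, \dots, k)$ with $\bar{\epsilon}_{v \ell j} = B_r + \epsilon_{v \ell j}$. Observe that the first $m_1$ entries correspond to the first stratum and follow the d.f. $F^{(v)}_1$, the next $m_2$ to stratum two and have distribution $F^{(v)}_2$ and so forth. For simplicity, we assume that the size of all batches is equal and each batch contains only PV modules of the same stratum. Then the variance-covariance matrix of $\bar{\boldsymbol{\epsilon}}^{(v)}$ is given by
$$\boldsymbol{\Sigma}^{(v)} = \bigoplus_{\ell=1}^{k} \bigoplus_{r=1}^{m_\ell / b} \boldsymbol{\Sigma}^{(v)}_\ell, \qquad \boldsymbol{\Sigma}^{(v)}_\ell = \sigma^2 \mathbf{1}_b \mathbf{1}_b' + \sigma_{v \ell}^2 \mathbf{I}_b,$$
with $\sigma_{v \ell}^2$ being the variance of the measurement error at site $v$ for stratum $\ell$.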
Here, ⊕ stands for the direct sum of two matrices and , that is, the block diagonal matrix with upper left submatrix and lower right submatrix , It is easy to see that the within-cluster correlation for stratum is given by . Of course, the structured variance-covariance (3.4) results from the linear random effects model (3.2). More generally, we may also allow for an unstructured variance-covariance matrix. To do so, partition (v) = ( r consist of the corresponding mean zero error terms v j with marginal d.f. F . The unstructured (fully unspecified) cluster sampling model now assumes that If the aforementioned model is used to analyse the data of the whole study, one puts Assuming independence across sites, the variance-covariance matrix of the errors is given by The design matrix is = ( (1)′ | · · · | (a)′ ) ′ with (v) as aforementioned. A straightforward calculation shows that the upper left submatrix of ′ is a diagonal matrix for an orthogonal design, and the same applies to ′ for any diagonal matrix .
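To make the structured variance-covariance matrix concrete, the following sketch assembles it as a direct sum of batch blocks. It assumes the random effects specification described above: a batch effect with variance `sigma_b2` shared by the b modules of a batch plus independent measurement errors with a stratum-specific variance. Under this assumption, the within-cluster correlation of a stratum equals `sigma_b2 / (sigma_b2 + sigma_eps2)`. All function names and parameter values are illustrative.

```python
import numpy as np

def cluster_block(sigma_b2, sigma_eps2, b):
    """Covariance of one batch: shared random effect plus independent errors."""
    return sigma_b2 * np.ones((b, b)) + sigma_eps2 * np.eye(b)

def direct_sum(blocks):
    """Block-diagonal matrix (direct sum) of square blocks."""
    n = sum(B.shape[0] for B in blocks)
    out = np.zeros((n, n))
    i = 0
    for B in blocks:
        d = B.shape[0]
        out[i:i + d, i:i + d] = B
        i += d
    return out

def site_covariance(sigma_b2, sigma_eps2_by_stratum, m, b):
    """Direct sum over strata and over the m_l / b batches of each stratum."""
    blocks = []
    for s2, m_l in zip(sigma_eps2_by_stratum, m):
        blocks += [cluster_block(sigma_b2, s2, b)] * (m_l // b)
    return direct_sum(blocks)

# Illustrative values: batch-effect variance 1, error variances 4, 9, 5, 3
Sigma = site_covariance(1.0, [4.0, 9.0, 5.0, 3.0], m=[80, 40, 40, 40], b=5)
rho_1 = Sigma[0, 1] / Sigma[0, 0]   # within-cluster correlation, first stratum
```

With these values the within-cluster correlation of the first stratum is 1 / (1 + 4) = 0.2, and entries belonging to different batches are zero.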
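The parameter vector of interest is $\boldsymbol{\theta} = (\mu, \boldsymbol{\beta}', \boldsymbol{\gamma}')'$. Following [4], we consider the weighted estimator $\hat{\boldsymbol{\theta}}_n$ of $\boldsymbol{\theta}$ as a solution of the following minimization problem
$$\min_{\boldsymbol{\theta}} \ (\mathbf{Y} - \mathbf{X} \boldsymbol{\theta})' \mathbf{W} (\mathbf{Y} - \mathbf{X} \boldsymbol{\theta}),$$
where $\mathbf{W}$ is the diagonal matrix of the weights. This estimator can be written explicitly as follows:
$$\hat{\boldsymbol{\theta}}_n = (\mathbf{X}' \mathbf{W} \mathbf{X})^{-1} \mathbf{X}' \mathbf{W} \mathbf{Y}.$$
Notice that if the weights, and therefore $\mathbf{W}$, are deterministic or estimated from an independent sample, then the covariance matrix (given the learning sample to estimate $\mathbf{W}$) of the estimator $\hat{\boldsymbol{\theta}}_n$ has the usual form
$$\mathrm{Cov}(\hat{\boldsymbol{\theta}}_n) = (\mathbf{X}' \mathbf{W} \mathbf{X})^{-1} \mathbf{X}' \mathbf{W} \boldsymbol{\Sigma} \mathbf{W} \mathbf{X} (\mathbf{X}' \mathbf{W} \mathbf{X})^{-1}.$$
If $b$ and $k$ are fixed, the sequence of error terms is a $b$-dependent series, such that the least squares estimator can be seen to be consistent and asymptotically normal under routine regularity conditions, as $\min_\ell m_\ell \to \infty$.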
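The weighted least squares estimator and its sandwich-type covariance matrix for fixed weights can be computed directly. The sketch below assumes a diagonal weight matrix and a known error covariance matrix; the function `wls_fit` and the small orthogonal toy design with +1/−1 coding are hypothetical.

```python
import numpy as np

def wls_fit(X, y, w, Sigma=None):
    """Weighted least squares with optional sandwich covariance.

    X: (n, p) design matrix, y: (n,) response,
    w: (n,) positive weights (the diagonal of W),
    Sigma: (n, n) covariance matrix of the errors, or None.
    """
    XtW = X.T * w                      # X'W for diagonal W
    A = XtW @ X                        # X'WX
    theta = np.linalg.solve(A, XtW @ y)
    cov = None
    if Sigma is not None:
        A_inv = np.linalg.inv(A)
        cov = A_inv @ (XtW @ Sigma @ XtW.T) @ A_inv   # sandwich form
    return theta, cov

# Orthogonal toy design: intercept plus two binary factors coded +1 / -1
X = np.column_stack([
    np.ones(8),
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]).astype(float)
theta0 = np.array([1.0, 0.7, 0.4])
y = X @ theta0                         # noise-free toy response
theta_hat, cov = wls_fit(X, y, w=np.ones(8), Sigma=np.eye(8))
```

For this orthogonal design with unit weights and uncorrelated unit-variance errors, the covariance matrix is diagonal, so the coefficient estimates are uncorrelated.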
For general stochastic weights, we have the following sufficient conditions for the validity of the method, which we formulate for a fixed site and thus omit the corresponding index. Suppose that Further, let us assume that the weighted least squares estimator satisfies the following regularity assumptions: Allowing for (non-random) regressors that may depend on the strata ∈ {1, … , k} and the repeated measurement j ∈ {1, … , m }, we assume that as min m → ∞, for some regular matrix , the second moments of the regressors as min m → ∞, for some regular matrix that depends on the cluster variance-covariance matrix , = 1, … , k. If the stochastic weights satisfyŵ then the weighted least squares estimator with stochastic weightsŵ has the same asymptotic distribution as the weighted least squares estimator with deterministic weights w : as min m → ∞. That central limit theorem also yields the weak consistency of the proposed estimator. It is worth mentioning that Assumption (3.10) is very weak. In particular, the weights can be estimated in-sample, that is, from the same data to be analysed. For the readers convenience, a proof that adopts arguments as detailed in [5, ch. 8 corresponding to (v) , which is a consistent estimator under fairly general conditions even if the specific structure of dependencies of the linear model approach does not hold. Now, the variance-covariance matrices (v) and are estimated by substituting the (v) by their estimates in the aforementioned formulas.

Comparison with linear mixed models and extensions to nested models
In this section, we compare the aforementioned nonparametric modelling and estimation approach with the classical linear mixed models approach [6] and discuss its extension to nested models as arising, for example, when repeated measurements at each module are available. The linear mixed model is similar as (3.1) but imposes much stronger assumptions. It is given by where it is assumed that | ∼  ( + , 2 ) and ∼  (0, 2 ), for some unknown matrix . In order to estimate the fixed effects parameter vector, , of the model and the distribution parameters, ( , ), of the random effects , one employs maximum likelihood or restricted maximum likelihood approaches. In the latter approach, the optimization problem is split into two parts, namely, estimation of fixed effects parameters and random effects [7].
In order to compare our approach with the classical linear mixed model approach, we remark that the model described in Section 3 can be easily extended to several random effects. For the application discussed in this paper, we shall be mostly interested in so-called nested random effects models, as there is strong evidence that our real data have this structure; see Section 7, where this issue is discussed in greater detail and substantiated by data analysis.
The nested structure arises when we not only observe a measurement Y (v) lj for each module but also a measurement Y where the error terms̄v ljc are mean zero with a variance-covariance matrix reflecting the nested random effects structure. A common approach is to assume that, for independent mean zero random variables B,vl (random effects for modules inside batch l), C,vj (random effects for cells inside module j) and vljc (measurement error), all with finite variances, Observe that this specification is in agreement with typical assumptions made in the classical linear mixed model, especially equi-correlated errors within a module. Those assumptions, however, can be relaxed easily in our approach.
Observe that, similar as in Section 3, the covariance matrix has a nested block structure; that is, each non-zero block is a block matrix on its own and has the following form where m l is the number of modules in batch l (possibly different for each batch) and the matrices (v) 0 and (v) are square matrices with dimensions equal to the number of cells in a module. Under the aforementioned specification, (v) 0 has identical elements on the diagonal as well as identical off-diagonal elements and the matrix (v) has identical elements everywhere.
We may weaken those assumptions for the variance-covariance matrix of the error terms by assuming only unknown correlations between different cells in a module (that are not necessarily equal). In that case, both matrices (v) 0 and (v) in (4.1) are unknown and have to be estimated as well.
Estimators for the variance-covariance matrix can be constructed in a similar way as described at the end of Section 3, namely, by averaging sample covariance matrices of the vector of observations corresponding to the cells inside each module.
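The averaging idea can be illustrated as follows. The sketch assumes mean zero residuals arranged so that each consecutive group of `b` values belongs to one unit (a batch, or the cells of one module), and it averages the outer products of the residual vectors. The function name is hypothetical and the estimator shown is a generic one, not necessarily the exact estimator used in the paper.

```python
import numpy as np

def average_cluster_covariance(residuals, b):
    """Average the outer products of the residual vectors of all units.

    residuals: 1-D array ordered by unit, of length r * b.
    Returns a (b, b) estimate of the common within-unit covariance matrix.
    """
    R = np.asarray(residuals, dtype=float).reshape(-1, b)  # one row per unit
    return R.T @ R / R.shape[0]

# Two units of size b = 2 with residual vectors (1, 2) and (3, 4)
Sigma_hat = average_cluster_covariance([1.0, 2.0, 3.0, 4.0], b=2)
```

No structure such as equal correlations is imposed on the estimate, which is the point of the unstructured approach.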
In contrast to the linear mixed models approach, we do not require the error terms to be normally distributed. This assumption is restrictive for applications and, as shown in Section 7, is not valid for our data. It can be dispensed with because our method of estimation does not require specifying distributions, and the conditions we impose for asymptotic normality of the estimated fixed effects coefficients are nonparametric. Further, our approach permits weaker assumptions on the variance-covariance structure of the error terms and thus covers a more general model for the observations than the linear mixed model.

Extended panel time series model and inference
Let us now extend the model of the previous section to the case that the anchor modules and the associated clusters are observed at T time instants thus forming a vector time series. We shall also generalize the model for the cluster effect to the case that a random effect inducing within-cluster correlations is only present with a certain probability.
To allow for serial correlations, we consider an additional additive component that follows a linear time series model. For general expositions on such models and extensions, especially to the class of ARMA (autoregressive-moving-average) models and linear processes, we refer to [8] and [5].
We assume that panels of a sites are observed over time at T time equidistant points, where T is small. At the t-th time point, t = 1, … , T, measurements are taken according to a stratified cluster sampling approach with four strata, stratified sample sizes (m 1 , … , m 4 ) = m * (2, 1, 1, 1) for some integer m * and a cluster size b. We formulate the model for a fixed site and therefore omit the site index v. So let us assume that where t j are the strata-related error components to be discussed next and t d j are cluster-related error terms, d j = ⌊(j − 1)∕b⌋ + 1, j = 1, … , m , = 1, … , k and t = 1, … , T. Further regressors are omitted to keep the exposition brief but are straightforward to take into account.
The cluster-related random effect is now assumed to follow a mixture distribution where 0 stands for the Dirac measure in 0. This means, with probability p within-cluster correlations are present, which degree is controlled by the parameter . The strata-related error t j is assumed to be governed by a stationary autoregressive AR(1) model, where and c are parameters and t j is a Gaussian white noise process; that is, t j are i.i.d.  (0, 1). Extensions to ARMA models, for lag polynomials (L) and (L), L the lag operator, are straightforward. For simplicity of exposition, we confine ourselves to an AR(1) model. The AR parameter controls the dependence between time points and is assumed to satisfy the stationarity condition ∈ (−1, 1). The variance of the strata-related errors becomes now a function of both and c, The model (5.1) combines spatial correlations as well as serial correlations of the measurements. Statistical inference can be conducted as follows: at each time instant t, the available observations are analysed using the test introduced in the previous section. The global null hypothesis that there is no (main) effect at all is rejected, if one test rejects. Obviously, we have to deal with the issue of a multiple testing problem, because T hypotheses are now tested. There are two well-known approaches to handle this, namely, the Bonferroni correction method and the Šidák corrections, where the latter assume independent samples. Because in model (5.1) the time series component is additive, any statistic U t that depends only on the time t observations treats its value as a constant, which therefore can be absorbed in the intercept. Hence, U 1 , … , U T are conditionally independent given {Y t j ∶ t = 1, … , T}, and it is easy to verify that the Šidák corrections apply.
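The two corrections for testing T hypotheses can be written down directly. The sketch below uses the standard definitions: the Bonferroni level is alpha / T and the Šidák level is 1 − (1 − alpha)^(1/T). The function names and the p-values are hypothetical.

```python
def bonferroni_level(alpha, T):
    """Per-test level of the Bonferroni correction."""
    return alpha / T

def sidak_level(alpha, T):
    """Per-test level of the Sidak correction (assumes independent tests)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / T)

def reject_global(p_values, alpha=0.05, method="sidak"):
    """Reject the global null hypothesis if at least one test rejects."""
    T = len(p_values)
    level = sidak_level(alpha, T) if method == "sidak" else bonferroni_level(alpha, T)
    return any(p <= level for p in p_values)

# Hypothetical p-values from T = 3 time points
decision_a = reject_global([0.20, 0.009, 0.50], method="bonferroni")
decision_b = reject_global([0.20, 0.030, 0.50], method="sidak")
```

The Šidák level is never smaller than the Bonferroni level, so it is slightly less conservative when independence across time points is justified.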

Modelling and analysing strata proportions (defect rates)
Let us now discuss how to estimate the strata proportions and how to assess the estimation error within the proposed stratified cluster sampling approach. Estimation of the strata proportions is, of course, interesting in its own right, but it is chiefly motivated by the fact that for the PV application of interest, the strata usually correspond to relevant defects or damages of solar cells and PV modules, respectively. Even if continuous quality measurements are available such that the approaches of the previous sections are applicable, valid confidence intervals for the defect rates are of interest. Of course, in the presence of quality measurements, the defect rates are related to the distribution of those measurements. Therefore, let us consider the following threshold model that links the strata to an underlying random variable $X$. Given strictly ordered thresholds $\tau_\ell$, $\ell = 0, \dots, k$, with $\tau_0 = -\infty$ and $\tau_k = \infty$, where $k$ is the number of strata, a PV module with quality measurement $X$, for example, the power output, belongs to stratum $\ell$ if $\tau_{\ell-1} < X \le \tau_\ell$. The corresponding proportions are $p_\ell = P(\tau_{\ell-1} < X \le \tau_\ell)$, for $\ell = 1, \dots, k$. That threshold model can be linked to the model of the previous section as follows: for given strata proportions at each time instant $t$, one may calculate the associated thresholds from the d.f. of $Y^{(v)}_{jt}$, which can be easily determined for given model parameters. Having calculated all resulting thresholds $\tau^{(v)}_{t \ell}$, one can determine the right strata for each simulated data set $\{Y^{(v)}_{jt}\}$ by simple classification. After these preparations, for each (simulated) sample of size $n_v$, we may therefore calculate the relative frequencies of the strata,
$$\hat{p}_n(t, \ell) = \frac{1}{n_v} \sum_{j=1}^{n_v} \mathbf{1}\big(\tau^{(v)}_{t, \ell-1} < Y^{(v)}_{jt} \le \tau^{(v)}_{t \ell}\big), \qquad (6.1)$$
to estimate the true strata proportions $p(t, \ell) = E\,\hat{p}_n(t, \ell)$.
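The threshold model can be sketched as follows. As a stand-in for the model-based d.f. mentioned in the text, the thresholds are taken here as empirical quantiles of a reference sample; the function names and the data are hypothetical.

```python
import numpy as np

def thresholds_from_proportions(sample, proportions):
    """Inner thresholds for given strata proportions (empirical quantiles)."""
    cum = np.cumsum(proportions)[:-1]      # cumulative proportions below 1
    return np.quantile(sample, cum)

def classify(x, inner_thresholds):
    """Stratum index 1..k: the interval (lower, upper] that contains x."""
    return np.searchsorted(inner_thresholds, x, side="left") + 1

values = np.arange(1, 101, dtype=float)    # toy quality measurements
thr = thresholds_from_proportions(values, [0.1, 0.2, 0.3, 0.4])
labels = classify(values, thr)
counts = np.bincount(labels, minlength=5)[1:]
rel_freq = counts / len(values)            # relative frequencies of the strata
```

Classifying the reference sample itself reproduces the prescribed proportions, which is a quick sanity check before applying the thresholds to new data.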
In order to calculate confidence intervals for the true strata proportions, it is necessary to take into account the dependence structure induced by the stratified cluster sampling approach. The usual formulas based on the binomial distribution are not valid, of course.
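We may rely on the following generic asymptotic result (see the Appendix for a derivation): let $\boldsymbol{\xi}_i = (\xi_{i1}, \dots, \xi_{ib})' \in \mathbb{R}^b$, $i = 1, \dots, r$, be $r$ i.i.d. random vectors with common variance-covariance matrix $\boldsymbol{\Sigma}$, where each coordinate is distributed according to the Bernoulli distribution with parameter $p$. Then the relative frequency $\hat{p}_n = n^{-1} \sum_{i=1}^{r} \sum_{j=1}^{b} \xi_{ij}$ satisfies
$$\sqrt{n}\,(\hat{p}_n - p) \to N\big(0, \mathbf{1}' \boldsymbol{\Sigma} \mathbf{1} / b\big)$$
in distribution, as $r \to \infty$, where $n = rb$ and $b$ stands for the cluster size. The unknown variance-covariance matrix $\boldsymbol{\Sigma}$ can be estimated nonparametrically as discussed in Section 3 when replacing the Y-observations by the corresponding indicators appearing in (6.1). Denote the corresponding estimator by $\hat{\boldsymbol{\Sigma}}$.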
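A confidence interval for a strata proportion that accounts for within-cluster dependence can be sketched as follows. The sketch rests on the elementary fact that the variance of the mean of r independent cluster sums equals the sum of all entries of the b × b cluster covariance matrix divided by r b². The covariance estimate used here (averaged outer products of centred indicator vectors) and the function name are assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

def cluster_proportion_ci(indicators, b, level=0.95):
    """Normal-approximation CI for a proportion under cluster sampling.

    indicators: 0/1 array ordered by cluster, of length r * b.
    """
    Z = np.asarray(indicators, dtype=float).reshape(-1, b)
    r = Z.shape[0]
    n = r * b
    p_hat = Z.mean()
    C = Z - p_hat
    Sigma_hat = C.T @ C / r            # b x b cluster covariance estimate
    avar = Sigma_hat.sum() / b         # sum of all entries, divided by b
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * np.sqrt(avar / n)
    return p_hat, (p_hat - half, p_hat + half)

# Extreme toy case: indicators are identical within each cluster of size 5
data = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5
p_hat, (lo, hi) = cluster_proportion_ci(data, b=5)
```

In this extreme case the half-width is larger by a factor of sqrt(5) than the one from the binomial formula, which illustrates why the usual binomial intervals are not valid under cluster sampling.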

Data analysis
We analysed real data collected at a PV system in order to investigate whether the complex structure of (nested) spatial correlations arises in real PV outdoor measurements. Further, we were interested in checking whether or not the strong assumptions, as imposed by the classical linear mixed model approach to analyse such data, hold. The analysed data are field measurements taken at the beginning of the PV-Scan study. From each PV module of a randomly drawn batch of modules belonging to the same string, an EL image was taken without demounting the module, resulting in more than 2000 images. Firstly, each EL image was preprocessed to correct for optical distortions of the camera's lens and perspective distortions that arise when the camera's sensor is not parallel to the PV module. A particular issue arising here is the robust estimation of the horizontal and vertical lines determining the boundaries of the module's cells and a cell's areas, which are separated by the grid fingers collecting the current. Here, a specialized algorithm was developed based on robust regression [9]. Then the cells of each PV module were extracted automatically and resized to a common (pixel) resolution. Lastly, several summary statistics were calculated including the average EL intensity of each solar cell, on which we shall focus in what follows. Figure 1 depicts the average EL intensity of all modules of a PV system. Each PV module corresponds to a subrectangle of 60 dots representing the 60 average cell intensities. The PV modules corresponding to a batch are put side by side, and the batches are plotted side by side as well, from left to right and row by row, without any correspondence to their (physical) spatial location.
It is clearly visible that the batch-to-batch variation of the average EL intensity is substantially larger than the within-batch variation. This indicates that a random effects model is appropriate for modelling and analysis.
We analysed the dependence of the average cell intensity on the batch (random), the module (random, nested within batch) and the cell (random, nested within module). The model was fitted with the R package lme4 using maximum likelihood under the assumption of Gaussianity. The variance, $\sigma^2$, of the random batch effect is estimated by $\hat{\sigma}_n^2 = 226$. However, the question arises whether the data satisfy the assumption of normality. Figure 2 shows a quantile-quantile plot of the model residuals. The departure from normality is obvious and is also confirmed by a significance test such as the Shapiro test, casting doubt on parametric approaches, such as maximum likelihood estimation and inference, for such data.
Contrary to the normal assumption behind the maximum likelihood approach for random effects linear models, the methodology developed in this paper allows for non-normal errors. In addition, for equal batch sizes, we allow for an arbitrary covariance matrix of the measurements taken within a batch, whereas the standard approach assumes a structured covariance matrix with equally correlated measurements within batches and modules.

Monte Carlo simulations
We conducted several simulation studies in order to investigate the statistical properties of the aforementioned methodology. The aim was to balance the sampling costs and the required statistical properties (power of tests and accuracy of estimators) under distributional assumptions that are realistic for PV outdoor measurements. All Monte Carlo simulations are based on 6000 repetitions unless otherwise stated. We used the mixed model (3.1) with fixed regressors $\mathbf{x}_v$ as specified in Table I and true coefficients $\boldsymbol{\beta} = (1, 0.7, 0.4, \beta_4)'$, where $\beta_4$ varies to study type I error rates and power under different settings.

Power of the stratified cluster sampling design
We first investigated the statistical power when testing a main effect in the presence of correlated clusters, for a fixed site. The size of the clusters was selected as $b = 5$, the number of strata was $k = 4$ and the 80 : 40 : 40 : 40 stratified sampling as discussed earlier was applied. The distributions of the strata were chosen as Gaussian distributions, $N(a_\ell, \sigma_\ell^2)$, with parameters $a_1 = 0$, $\sigma_1^2 = 4$, $a_2 = 0$, $\sigma_2^2 = 9$, $a_3 = -2$, $\sigma_3^2 = 5$ and $a_4 = -6$, $\sigma_4^2 = 3$ for the four strata. Table II provides the proportions of the strata.
Clearly, the variance of the strata distributions affects the degree of the within-cluster correlation, which is also controlled by the variance $\sigma^2$ of the random effect that was assumed to follow an $N(0, \sigma^2)$ law (see Table III). Being interested in the type I error rate and statistical power under alternatives when testing a main effect, we fixed $n = 200$ and $\boldsymbol{\beta} = (1, 0.7, 0.4, \beta_4)'$ and investigated testing the null hypothesis $H_0: \beta_4 = 0$ against the associated two-sided alternative $H_1: \beta_4 \neq 0$ at the nominal 5% significance level. Table IV provides the simulated rejection rates. It can be seen that the test behaves well in terms of the type I error rate even for substantial cluster correlations. The first row in Table IV provides the corresponding results when ignoring the cluster correlations, that is, when assuming independent observations. We can see that the effect of cluster correlations is substantial and leads to a test that rejects far too often when falsely assuming independent data. Indeed, even for small cluster correlations of 0.2, the size of the significance test is unacceptable, such that ignoring such correlations leads to invalid inference. The correlations also affect the statistical power, which decreases as the degree of dependence increases. However, for the chosen sampling approach, the power is still acceptable for correlations corresponding to $\sigma \le 1.5$.
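The inflation of the type I error rate when cluster correlations are ignored can be reproduced with a small simulation. The sketch below is a deliberately simplified stand-in for the simulation design of the paper: it tests a zero mean instead of a regression main effect, uses a single stratum and fewer repetitions, and compares a naive z-test with a test based on the independent batch means. All names and parameter values are illustrative.

```python
import numpy as np

def rejection_rates(sigma_b, sigma_eps=2.0, r=40, b=5, reps=2000, seed=1):
    """Simulated size of a naive and a cluster-aware test of a zero mean."""
    rng = np.random.default_rng(seed)
    z_crit = 1.959964                  # two-sided 5% normal quantile
    n = r * b
    naive = cluster = 0
    for _ in range(reps):
        B = rng.normal(0.0, sigma_b, size=(r, 1))          # batch effects
        Y = B + rng.normal(0.0, sigma_eps, size=(r, b))    # null hypothesis holds
        # Naive z-test: treats all n observations as independent
        se_naive = Y.std(ddof=1) / np.sqrt(n)
        naive += abs(Y.mean()) / se_naive > z_crit
        # Cluster-aware test: based on the r independent batch means
        means = Y.mean(axis=1)
        se_cluster = means.std(ddof=1) / np.sqrt(r)
        cluster += abs(means.mean()) / se_cluster > z_crit
    return naive / reps, cluster / reps

# Within-cluster correlation 4 / (4 + 4) = 0.5
naive_rate, cluster_rate = rejection_rates(sigma_b=2.0)
```

In this toy setting the naive test rejects a true null hypothesis far more often than the nominal 5%, whereas the cluster-aware test stays close to it.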
In order to investigate the practical influence of a non-uniform sampling across sites, such samples were generated using the strata sample sizes given in Table V. The simulated size and power when testing a main effect under non-uniform stratified sampling are shown in Table VI. Again, the first row shows the simulated rejection rates when the cluster correlations are not taken into account. Overall, the results presented in Table VI show that the corresponding rejection rates are quite similar to the findings for uniform stratified sampling.

Equivalent sample size
A sound and comprehensible measure of the power loss due to cluster correlations is the equivalent sample size, that is, the sample size that leads to the same statistical power when no correlations are present. For that purpose, Table VII provides the rejection rates for different sample sizes in the absence of cluster correlations, in order to allow the engineer to gain insight into that issue. For example, n = 200 with a random-effect standard deviation of 0.5 (third column in Table IV) leads to similar power as n = 150 without cluster correlations (second column in Table VII). Likewise, n = 200 with a random-effect standard deviation of 1 is approximately equivalent to n = 100 without cluster correlations.
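A rough analytical counterpart to these simulated equivalent sample sizes is the classical design-effect approximation for equal-sized clusters. The sketch below is an assumption on our part and not the procedure used for Table VII: it takes a simple random-intercept model with a single noise variance (here 4, the value of the first stratum), computes the intraclass correlation, and divides n by 1 + (b − 1)ρ. The function names are hypothetical.

```python
def intraclass_correlation(tau2, sigma2):
    """Within-cluster correlation for y = u + e, Var(u) = tau2, Var(e) = sigma2."""
    return tau2 / (tau2 + sigma2)

def equivalent_sample_size(n, b, rho):
    """Design-effect approximation: n divided by 1 + (b - 1) * rho."""
    return n / (1.0 + (b - 1) * rho)

# Random-effect standard deviations 0, 0.5 and 1 with noise variance 4.
for tau in (0.0, 0.5, 1.0):
    rho = intraclass_correlation(tau**2, sigma2=4.0)
    n_eq = equivalent_sample_size(200, 5, rho)
    print(f"tau = {tau}: rho = {rho:.3f}, equivalent n = {n_eq:.0f}")
```

Because the strata have different variances and the test is based on weighted least squares, this approximation can only indicate the order of magnitude of the power loss.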

Benchmarking
It is well known that non-normal measurement errors may affect the size as well as the power of statistical tests. When analysing the power output of PV modules, the distribution may take quite different forms owing to the production process and the way module classes are determined. Therefore, we benchmarked the proposed approach by simulating the rejection rates when sampling the measurement error from historical data sets available to us. For that purpose, the density was estimated from the historical data by the Parzen-Rosenblatt kernel density estimator, with the bandwidth selected by the Sheather-Jones method [10]. For alternative approaches to bandwidth selection, we refer to [11] and the references given there. The random effect modelling the cluster correlations was again assumed to be Gaussian.
The results are shown in Table VIII for uniform stratified sampling and in Table IX for the case of non-uniform stratified sampling. For each stratum, a different real historical data set was used to simulate the measurement error. It can be seen that the rejection rates differ from the corresponding values for Gaussian errors, but the proposed test generally behaves very well for the real-world error distributions.
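Sampling measurement errors from a kernel density estimate, as described in this section, can be sketched as follows. This is a minimal illustration under stated assumptions: the kernel is Gaussian, the bandwidth comes from Silverman's rule of thumb rather than the Sheather-Jones method used in the study, and the "historical" data are a hypothetical skewed sample generated on the spot. All function names are our own.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    scale = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * scale * x.size ** (-0.2)

def sample_from_kde(x, size, rng, bandwidth=None):
    """Draw from the Gaussian-kernel density estimate of x.

    A draw from a kernel density estimate is a resampled data point
    plus kernel noise with standard deviation equal to the bandwidth.
    """
    x = np.asarray(x, dtype=float)
    h = silverman_bandwidth(x) if bandwidth is None else bandwidth
    return rng.choice(x, size=size, replace=True) + rng.normal(0.0, h, size=size)

rng = np.random.default_rng(1)
# Hypothetical skewed "historical" errors standing in for real data.
historical = rng.gamma(shape=2.0, scale=1.0, size=500) - 2.0
errors = sample_from_kde(historical, size=20000, rng=rng)
print(errors.mean(), errors.std())
```

Drawing from a separate historical sample for each stratum, as done for Tables VIII and IX, would amount to calling `sample_from_kde` once per stratum.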

Influence of serial correlations
Bearing in mind the empirical findings discussed earlier, we were mainly interested in the influence of the degree of spatial cluster correlation on the statistical power for different sample sizes when testing a main effect, in the presence of a noticeable serial correlation between time points. Therefore, we varied the sample size by conducting the stratified sampling scheme with proportions 2 : 1 : 1 : 1 and a cluster size of b = 5. Thus, to generate data sets of different sample sizes, the size of the (stratified) panel, that is, the number of observations collected at each site, varied between 125 and 1125. In addition, at time t = 0, an initial random sample of size 400 was drawn to estimate the proportions of the strata. For the time series simulations, T = 8 time points were chosen.
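The arithmetic behind the 2 : 1 : 1 : 1 scheme can be made explicit with a small helper. The sketch below assumes that whole clusters of b modules are sampled within each stratum, which is our reading of the design; the function name `allocate_panel` is hypothetical.

```python
def allocate_panel(n, ratios=(2, 1, 1, 1), b=5):
    """Split a panel of n modules into strata sample sizes and cluster counts."""
    total = sum(ratios)
    if n % (total * b) != 0:
        raise ValueError("n must be a multiple of sum(ratios) * b")
    sizes = [n * r // total for r in ratios]   # modules per stratum
    clusters = [s // b for s in sizes]         # clusters of size b per stratum
    return sizes, clusters

for n in (125, 200, 1125):
    print(n, allocate_panel(n))
```

For n = 200 this reproduces the 80 : 40 : 40 : 40 allocation used in the first set of simulations.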
Recall that in model (5.1), the strata-related error depends on two parameters: an autoregressive (AR) parameter, which governs the dependence of the remeasured modules across time instants, and a scale parameter, which controls the variance of the strata-related error. Each stratum has its own scale parameter. In our simulation, we used the fixed values (4, 9, 5, 3) for the four strata, whereas the AR parameter was selected from the set {0, 0.2, 0.7}. Here, an AR parameter of 0 represents the case of independent observations over time.
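Serially correlated strata-related errors of this kind can be generated as follows. The sketch assumes a stationary Gaussian AR(1) process in which the per-stratum values (4, 9, 5, 3) act as marginal variances; model (5.1) may parameterise the error differently, so this is an illustration only, and the function name is hypothetical.

```python
import numpy as np

def simulate_ar1_errors(n_modules, T, rho, variance, rng):
    """Stationary AR(1) error paths, one row per remeasured module."""
    sd = np.sqrt(variance)
    e = np.empty((n_modules, T))
    e[:, 0] = rng.normal(0.0, sd, size=n_modules)
    innov_sd = sd * np.sqrt(1.0 - rho**2)  # keeps the marginal variance fixed
    for t in range(1, T):
        e[:, t] = rho * e[:, t - 1] + rng.normal(0.0, innov_sd, size=n_modules)
    return e

rng = np.random.default_rng(2)
variances = (4, 9, 5, 3)  # per-stratum values used in the text
errors = {i + 1: simulate_ar1_errors(20000, 8, 0.7, v, rng)
          for i, v in enumerate(variances)}
print({stratum: round(float(e.var()), 2) for stratum, e in errors.items()})
```

Setting `rho=0` yields independent observations over time, the baseline case of the simulation.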
Recall that two parameters related to the spatial clusters determine the degree of within-cluster correlation: the probability p with which such correlations occur and a parameter governing their size. They were chosen as p = 0.1 and a size parameter in {0, 1, 2, 3, 4}, thus defining four settings I-IV. Table X provides the corresponding values of the coefficient of within-cluster correlation in each stratum and for the different parameter values.
The power of the significance test for the null hypotheses $H_0^{(v)}$, v = 1, … , a, that the fourth component of the parameter vector equals zero was studied against various alternatives, employing corrected critical values to account for the multiple testing problem. For this set of simulations, each rejection rate was simulated using 3000 repetitions. Figure 3 shows the power of the considered test at a fixed time instant for different alternative values of the fourth component. The AR parameter determining the degree of serial correlation was chosen as 0.7 to study the case of rather strong serial correlations. The within-cluster correlations occur with probability p = 0.1, and their size is determined by setting the corresponding size parameter to 4. Figure 4 demonstrates the influence of the within-cluster correlation on the statistical power. A further finding of this set of simulations is that, for short time series as studied here, the serial correlations are of only minor importance. This is likely because at each time instant a fairly large number of independent random vectors (consisting of the cluster measurements) is available, and those observations are used for inference. The situation may change if one picks a small subset of the data, for example, a stratum or a couple of clusters, and observes it over time. For such an analysis, however, longer time series than those considered here have to be collected.
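The text does not state which correction was used for the critical values. As one common possibility, and purely as an assumption, the following sketch computes Bonferroni-corrected two-sided critical values for asymptotically normal test statistics; the function name is hypothetical.

```python
from statistics import NormalDist

def bonferroni_critical_value(alpha, n_tests):
    """Two-sided standard normal critical value with Bonferroni correction."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))

# Critical values for 1, 2, 4 and 8 simultaneous tests at overall level 5%.
for a in (1, 2, 4, 8):
    print(a, round(bonferroni_critical_value(0.05, a), 3))
```

Each of the a hypotheses is then rejected when the absolute value of its test statistic exceeds the corrected critical value, which keeps the overall type I error rate at or below the nominal level.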

Accuracy of sample strata proportions
Lastly, we investigated the accuracy of the proposed confidence intervals for the strata proportions under the proposed stratified cluster sampling approach. We employed the refined panel time series model from the last section in combination with the threshold model for the strata, as explained in Section 6.
Again, we assume that four strata are of interest. The corresponding proportions at each time instant t were fixed according to Table XI. From those probabilities, the associated thresholds were calculated for a fixed set of parameter values of the simulation model. Those thresholds were then used to determine the stratum of each observation in the simulated random sample.
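The step from strata proportions to thresholds can be sketched as follows. The example assumes a Gaussian quality feature, so that the thresholds are normal quantiles at the cumulative proportions; the proportions used below are hypothetical placeholders and not the values of Table XI, and the function names are our own.

```python
import numpy as np
from statistics import NormalDist

def strata_thresholds(proportions, mean=0.0, sd=1.0):
    """Thresholds of a Gaussian quality feature that yield the given proportions."""
    cum = np.cumsum(proportions)[:-1]
    dist = NormalDist(mean, sd)
    return np.array([dist.inv_cdf(float(c)) for c in cum])

def assign_strata(quality, thresholds):
    """Map each quality value to a stratum label 1, ..., k."""
    # Stratum 1 holds the lowest feature values, stratum k the highest.
    return np.searchsorted(thresholds, quality) + 1

proportions = (0.05, 0.10, 0.15, 0.70)  # hypothetical, not from Table XI
thresholds = strata_thresholds(proportions)
quality = np.random.default_rng(3).normal(size=100000)
labels = assign_strata(quality, thresholds)
freq = np.bincount(labels, minlength=5)[1:] / labels.size
print(thresholds, freq)
```

With this ordering, the fourth stratum collects the highest feature values and can be read as the good lot.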
The fourth stratum can be interpreted as a good lot, whereas the other three strata represent certain damages or defects. For simplicity, let us consider confidence intervals only for the first stratum, which has the lowest probability.
For engineers, it is instructive to look at simulated examples illustrating what the study and the associated confidence intervals could look like under different simulated worlds. Figures 5 and 6 depict confidence intervals for a simulated sample with a relatively large sample size n = 2500. The green (dotdash-circle) line corresponds to the true proportion of the stratum determined by the simulation model. The red (dash-diamond) lines are the bounds of the 95% confidence interval, and the black (triangle) line is the estimated proportion, calculated from one simulated sample. Figures 7-10 depict the corresponding results for fixed simulated samples of sizes 500 (and 200, respectively) and the four settings I-IV as in Table X, corresponding to different degrees of within-cluster correlation. Simulations not reported here in detail have shown that other parameters of the model are relevant when calculating those confidence intervals.
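To indicate how within-cluster correlation enters such an interval, the following sketch computes a Wald-type 95% confidence interval for a stratum proportion from cluster-level means. It is a generic cluster-robust construction for equal-sized clusters, given here as an assumption; it is not necessarily the interval proposed in this paper, and the function name is hypothetical.

```python
import numpy as np

def cluster_proportion_ci(indicators, z=1.96):
    """Wald interval for a proportion from clustered 0/1 indicators.

    indicators has shape (n_clusters, b); entry 1 means that the module
    belongs to the stratum of interest. The standard error is based on the
    variation between cluster means, so within-cluster correlation widens it.
    """
    x = np.asarray(indicators, dtype=float)
    cluster_means = x.mean(axis=1)
    p_hat = cluster_means.mean()
    se = cluster_means.std(ddof=1) / np.sqrt(len(cluster_means))
    return p_hat, max(p_hat - z * se, 0.0), min(p_hat + z * se, 1.0)

# Tiny hypothetical sample: four clusters of five modules each.
sample = [[1, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [1, 1, 0, 0, 0],
          [0, 0, 0, 0, 0]]
p_hat, lower, upper = cluster_proportion_ci(sample)
print(p_hat, lower, upper)
```

When the indicators within a cluster tend to agree, the cluster means vary more and the interval becomes wider than one computed under independence.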

Conclusions
A comprehensive methodology for sampling and the statistical analysis of multisite PV field studies is developed and thoroughly investigated. The proposed approach allows for stratification (e.g. due to predamages or different manufacturers), various forms of correlations, such as spatial correlations, correlations due to random effects or serial correlations, and non-normal measurement errors. In order to reduce the costs of sampling, it is proposed to take measurements at randomly selected clusters of neighbouring modules. To allow for the analysis of temporal developments, a panel design is employed. Estimation is based on weighted least squares, which provides consistent estimates under weak regularity conditions and allows for feasible inference based on hypothesis tests and confidence intervals. We rely on a nonparametric approach to estimate the unknown and possibly unstructured variance-covariance matrix of the errors, thus going beyond the scope of the classical mixed linear model. Several extensions of the basic approach are discussed in greater detail, including nested mixed models and serially correlated errors. The methodology also includes accompanying proposals for the analysis of binary (count) data collected under the proposed sampling design. The statistical properties are extensively studied by Monte Carlo simulations. Among the investigated properties are the influence of within-cluster correlations, serial correlations due to temporal effects and different strata proportions. Those simulations aim at a thorough understanding of the impact of those various issues, which quite naturally arise in such a study, on the statistical power of hypothesis testing. Here, a major aim is to provide guidance when determining the design and selecting the sample sizes. The results generally support the overall approach and the proposed choices for certain parameters such as the sample sizes of the panel(s). 
The analysis of a real data set taken at a PV system, which comprises a massive amount of image data of all involved solar cells, illustrates that the hypothesized correlations are present in real data and have to be taken into account.
Suppose that (i) f ∈ C(U) and (ii) the partial derivatives of f exist. The multivariate central limit theorem yields asymptotic normality, as n → ∞. An application of the delta method with a function f satisfying f(p, … , p) = p, followed by multiplication with a suitable factor, then yields the limit, as n → ∞, which establishes the assertion.