Probabilistic net load forecasting framework for application in distributed integrated renewable energy systems

Integrating various sectors enhances resilience in distributed sector-integrated energy systems. Forecasting is vital for unlocking their full potential and enabling well-informed decisions in energy management. Given the inherent variability in generation and demand prediction, quantification of uncertainty is crucial. Therefore, probabilistic forecasting is becoming imperative compared to deterministic forecasting, as it ensures a more comprehensive depiction of uncertainty. This paper introduces the probabilistic net load forecasting framework (PNLFF), a non-black-box approach that is robust, non-parametric, computationally and data inexpensive, and adaptable across sectors. It utilizes the personalized standard load profile for deterministic forecasts and integrates quantile regression to generate probabilistic forecasts. The cumulative distribution function is approximated from the quantiles of the probabilistic forecast using a piecewise cubic Hermite interpolating polynomial and is then differentiated to obtain the probability density function (PDF). The probabilistic net load (PNL) is then obtained by convolving the PDFs for electricity demand, heat demand and PV generation. A case study demonstrates its application in operational optimization for the distributed energy system of a logistics facility. In the first stage of the PNLFF, the results of the personalized standard load profiles clearly show that they can be applied in all sectors and outperform their respective benchmarks. The second stage, the probabilistic expansion using quantile regression, also performs promisingly across all sectors, with the best results being achieved with a small training data set of 30 days. With the extension of the quantiles and interpolation, it is demonstrated how a PDF can be approximated without prior knowledge of the distribution of the data. The results of the case study demonstrate that the PNL, as an aggregated PDF of the different sectors obtained by convolution, can be used for decision making under uncertainty, e.g., for the planning of flexible loads.


Background
Sector integration is already considered a cost-effective and efficient tool for decarbonizing the energy system and transforming it from a fossil and centralized basis to a renewable and more decentralized one [1,2]. This makes sector integration an important component for achieving a climate-neutral energy system in 2050, which is a goal of the European Union [3,4]. Ever more decentralized generation and electrification in all sectors (i.e., "sector coupling", sector integration, or P2X) in local grids are creating a growing need for energy and load management at this level. This also applies to commercial and industrial properties that have not previously dealt with energy issues beyond paying their energy bills [5,6].
At the decentralized level, it is becoming increasingly important to operate across generation and consumption, as well as across sectors (e.g., electricity, heating and cooling, and transportation). Thus, power demand and generation forecasts must also be considered together. The increasing number of electrified consumers, such as heat pumps [7] or electric cars, poses challenges for local energy and load management initiatives. However, opportunities also arise from optimized operation of the energy system. For example, own-consumption can be maximized, peak loads reduced, and CO2 emissions minimized. This can reduce dependence on energy prices and minimize the need to expand the public electricity grid. In order to optimize the operation of devices that are part of a functioning energy system, both forecasts and the quantification of their uncertainties are important.
In our view, renewable sector-integrated energy systems (P2X applications) and the expansion of decentralized generation will increase the need for forecast-based decentralized energy management solutions. Many households, companies, and other small decentralized energy systems will not be able to afford big data analysis or high costs for energy management services, but would nevertheless benefit from data-driven energy management. For them, simple, highly automated, and pragmatic solutions are needed. Decentralized data acquisition (e.g., smart meters, solar photovoltaic (PV), and battery inverters) is the basis and makes it possible to operate more independently from third-party providers and cloud services. If such data is collected and stored locally across sectors, it can be used to help plan and operate distributed integrated energy systems in a cheap and resilient manner. This paper offers solutions for using such data to make cross-sector predictions and to quantify their uncertainties.

Literature Review
In order to be able to plan and operate within the temporal differences between generation and consumption and to optimize, e.g., grid operation, costs, or emissions, forecasts have been developed and used for a long time [8,9,10]. In particular, deterministic machine learning approaches have been further developed in recent years. Today, however, distribution network operators often still rely on so-called standard load profiles (SLPs) for planning and balancing the consumption and generation of small consumer loads, such as residential buildings or small businesses [11,12]. At the distribution grid level, SLPs are used to approximate consumption when no measurements are available and are applied at different temporal and spatial scales. In their study, "Are standard load profiles suitable for modern electricity grid models?", Peters et al. show that a spatially higher resolution leads to better forecasting scores [13]. With the "smart meter rollout" taking place in many countries around the world, ever more data is also becoming available at the decentralized level and can be used in grid operation and energy management [14] and to improve SLPs [15,16]. In recent years, deterministic prediction of electricity [8,17,18] and heat load [19,20], as well as PV generation power [21,22], has also become a common tool. Personalized standard load profiles (PSLPs) [15], on the other hand, are more of an evolutionary step beyond today's standard of grid operators [11]. We compared this approach with machine learning techniques using locally collected data and found that it can have advantages, especially when data availability or computational power are limited [23,24]. In order to optimize the use of renewable energy, it is necessary to align renewable generation with consumption, minimizing the residual or net loads. The capacity to deterministically predict the net load has been further developed in recent years [23,25,26].
For robust and safe optimization and scheduling of a power system, it is indispensable to be able to evaluate the uncertainties inherent to the corresponding forecasts [27]. Probabilistic forecasts are therefore developed to assess these [28]. Many probabilistic approaches to generation and load prediction have emerged in recent years [18,29,30]. Critical to the application of predictions and the evaluation of uncertainty is the choice of evaluation metrics. For this purpose, detailed reviews of known techniques, application examples, and challenges have been published [28,31]. In addition to the metrics, it is also necessary to determine whether the uncertainty is aleatoric or epistemic [32]. There are a few examples of applications in which multiple sectors are taken into account; in [33], for example, transportation data are used to better predict electricity demand.
In this study, we combine the PSLP with our approaches for building a low-effort generation forecasting model for small-scale (residential) PV power systems [34,35] to create a probabilistic net load prediction. In order to do so, we employ quantile regression (QR) [36,37], a non-parametric approach that has the advantage that no prior knowledge regarding the distribution of the data is required, although it may require a higher computational effort than a parametric approach [30]. QR has already been shown to be applicable in the field of load or energy prediction [38,39]. For the operational optimization of energy systems with a high share of renewable generation, the net load is a key parameter. In recent years, several approaches to predicting the deterministic net load have been presented [40,41]. The probabilistic prediction of net load can be performed at various levels of aggregation [26]. When aggregating individual probabilistic predictions, the probability densities of two or more continuous random variables must be convolved as a joint probability density [42].

Contribution
In this study, the probabilistic net load forecasting framework (PNLFF), a simple, adaptable and robust framework, is presented. It provides load and generation forecasts and quantifies their uncertainty. Thus, the output of the PNLFF can be used in operational optimization or the scheduling of decentralized integrated energy systems, taking uncertainties in different sectors into account. For this purpose, a non-black-box approach was followed, which is based on locally collected data or a low data input. It can be trained with low computational time and implemented quickly, with the possibility of adapting independently to a change in consumption or generation behavior. The approach does not aim to increase the accuracy of forecasting algorithms, but to provide a structure that enables the creation and assessment of forecasts with minimal time, data and implementation requirements ("low cost", "small/low data", "easy to implement" and "independently adaptable").
The PNLFF is based on a personalized standard load profile (PSLP) and quantile regression (QR). In this work, the PSLP was further developed from electricity load demand forecasting [23,24] and applied to heat demand and PV generation forecasting. The PSLP offers various possibilities for addressing different consumption behaviors and taking seasonal effects into account. As a further extension of the original PSLP, two additional modes ("fix" and "variable") are introduced, and the options of choosing seasonality and day types are demonstrated. A persistence or naïve forecast is used as the benchmark for the PSLP. The PSLP then serves as an explanatory variable for the QR, where uncertainty within the PSLP is accounted for, and becomes the quantile personalized standard load profile (QPSLP). A cumulative distribution function (CDF) and probability density function (PDF) are then determined from the results of the day-ahead forecasts of the QPSLP. For this purpose, a method is presented for approximating a CDF and PDF from the quantiles of the QR. These help to evaluate the forecast uncertainty and to aggregate the individual forecasted PDFs. Thus, the probabilistic distributed net load or probabilistic net load (PNL) can be determined. This is necessary for determining the uncertainty in distributed energy systems, where there is fluctuating renewable generation and new flexible consumers (such as electric cars) must be integrated and their operation optimized technically, economically and ecologically. A case study of a decentralized integrated energy system is used to demonstrate, evaluate and discuss the application of the framework.
In Section 1 of the paper, a comprehensive framework is introduced with the aim of predicting distributed generation and load demand. Specifically, the framework focuses on the quantile personalized standard load profile (QPSLP) for the sectors of electricity, heat, and photovoltaic (PV) generation. The PNLFF also seeks to assess and quantify the inherent uncertainty associated with forecasts across these distinct sectors, along with the aggregated forecast of a distributed energy system. Section 2 presents and describes the development of the PNLFF in detail. For this purpose, an overview of all stages of the PNLFF is first given in order to introduce them step by step. The first part of the section introduces the case study and the data used, with the decentralized energy system of a logistics facility serving as an example to demonstrate the PNLFF. In the subsequent parts of the methodology, the deterministic (PSLP and benchmark) and probabilistic (QR and benchmark) prediction methods are introduced, and the evaluation metrics and the quantification of uncertainty are then explained. Furthermore, the aggregation methods for determining the decentralized residual QPSLP and the evaluation of its uncertainty are outlined. The results, reported in Section 3, present the outcome of the forecasting framework and the uncertainty quantification using case study 4 as an example. Using the example of an additional electrical load to be integrated, the PNL is used to demonstrate an optimized application. The results are then evaluated in the discussion section and reviewed for their transferability; possible future lines of research are also proposed.

Probabilistic Net Load Forecasting Framework
The PNLFF, illustrated in Figure 1, can be roughly explained in four steps. In the first step, locally collected data are processed and calendar features are created (e.g., public holidays). In the second step, the data are utilized as input for the creation of a deterministic forecast. As a deterministic model, the PSLP is used to forecast the next 24 hours (day-ahead) of the electricity demand, heat demand, and PV generation. Persistence models, i.e., naïve predictions, are employed as a benchmark. The PSLP prediction accuracy is evaluated using the four commonly used metrics of mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute scaled error (MASE) and compared with the benchmark in order to select the best models for the further steps [43,44]. The functionality and application of the PSLP are described in detail in Section 2.2. In the third step, we perform a QR based on the PSLP, as a probabilistic extension towards a quantile PSLP or QPSLP. In order to measure the performance and uncertainty of the QPSLP, the evaluation metrics of prediction interval coverage probability (PICP), mean prediction interval width (MPIW), and Winkler score are introduced. In the last step, an approach is presented for approximating the CDF and PDF from the QPSLP, together with how convolution can be used to determine the joint density functions and the PNL. To demonstrate the PNLFF, the energy system of a logistics facility [45] was used. A detailed description of the cross-sector energy system is given by the authors in [46]. The electricity and heat load data were taken from measurements collected as part of the ELogZ ("Energieversorgungskonzepte für Klimaneutrale Logistikzentren") project [45]. The PV generation power was simulated using the Python library pvlib [47]. For this purpose, a system with a nominal power of 200 kW that can realistically be installed on the demonstration premises was assumed. The modules were oriented half to the east and half to the west, with an inclination angle of 10°. Weather data, especially radiation, temperature and wind speed, which served as inputs for the PV simulation, were obtained from open DWD [48]. Figure 2 shows the density distributions of the data used.

PSLP fundamentals
In this study, the PSLP was applied and extended to the sectors of heat load and PV generation, and different adaptation options were presented to address different load and generation behaviors. The PSLP process and its extensions are described in detail in the following section.
A persistence approach or naïve forecast is used as a benchmark for the deterministic forecasts. Three different variants are considered: the naïve forecast corresponds to the values of the last day (d-1), the day before last (d-2), or the values from a week ago (d-7).
The PSLP was first described by Hinterstocker et al. [15] and has since been developed further elsewhere. The PSLP provides the ability to generate forecasts from historical measured data, which marks a significant improvement over the SLP [23,24]. It starts from the basic idea that the use of standard load profiles is no longer sufficient [13]. The SLP [11], for example, is based on annual energy demand. The year is divided into three seasons and each day is assigned to a characteristic day (see Table 1). Public holidays are treated as Sundays. The following SLPs are available: seven for commercial/industrial enterprises, two for agricultural enterprises and one for residential buildings. With the help of a dynamic modification factor, the profiles can be made even more variable. An annual consumption of 1000 kWh/a was defined as the standardization basis, i.e., the sum of all consumption values of one year results in 1000 kWh/a.
The PSLP also builds on these characterizations. For the PSLP, locally collected data on the power demand are required. As an option to better account for loads correlated with outdoor temperatures, a characterization of the season by mean daily temperature is proposed, as shown in Table 1. One possibility would be to follow the guideline VDI 4655 [49]. For this option, however, time series of the ambient temperature would need to be available for the location under consideration, as well as predictions of the ambient temperature for the day to be predicted. This results in the following characteristics: winter workday (ww); winter Saturday (wsa); winter Sunday (wsu); summer workday (sw); summer Saturday (ssa); summer Sunday (ssu); transition workday (tw); transition Saturday (tsa); and transition Sunday (tsu). The process of the PSLP can be traced in Figure 3.
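The assignment of a day to one of the nine characteristics can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the two switches and, in particular, the month boundaries of the seasons are assumptions made here, since the actual boundaries are those of Table 1 (or a temperature-based rule such as VDI 4655). Public holidays are treated as Sundays, as in the SLP.

```python
from datetime import date


def day_type(day, holidays=frozenset()):
    """Return 'w' (workday), 'sa' (Saturday) or 'su' (Sunday or public holiday)."""
    if day in holidays or day.weekday() == 6:
        return "su"
    return "sa" if day.weekday() == 5 else "w"


def season(day):
    """Calendar-based season. The month boundaries are placeholders (assumption)."""
    if day.month in (11, 12, 1, 2, 3):
        return "w"  # winter
    if day.month in (6, 7, 8):
        return "s"  # summer
    return "t"      # transition


def characteristic(day, holidays=frozenset(), use_day_type=True, use_season=True):
    """Build labels such as 'ww', 'ssa' or 'tsu'; 'all' if both switches are off."""
    label = season(day) if use_season else ""
    if use_day_type:
        label += day_type(day, holidays)
    return label or "all"


print(characteristic(date(2022, 1, 3)))  # a Monday in January
```

With both switches set to `False`, every day receives the same label, which corresponds to using the PSLP without any day-type or seasonal differentiation.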

PSLP Modes and Options
In order to also be able to use the PSLP for production or consumption time series that do not depend on weekdays, e.g., PV production, there is the option of switching off the characterizations. That is, the division by day type (workday, Saturday, Sunday) and/or by season (winter, summer, transition) can be disabled. If the day-type division is switched off, there is only a division into winter, summer, and transition. If the seasonal division is switched off, there is only a distinction between workdays, Saturdays and Sundays. If both options are switched off, there is no more differentiation and all available days are used, up to the maximum number of past days t_max to be considered for the PSLP forecast.
The process of a standard PSLP can be seen in Figure 3. Before a PSLP prediction can be made, a minimum amount of historical data is needed in the data storage. This means that, for the day to be forecast, at least one day from the past with the same characteristics is needed, namely ww, wsa, wsu, sw, ssa, ssu, tw, tsa or tsu (see Table 1). However, a single day would only correspond to a naïve forecast. Assuming the threshold t_threshold is set to 21 days, the first forecast can only be made for day 22. Within the PSLP, there is an option to determine the days from the past to be considered in the forecast. For this purpose, a value t_max must be specified. If this is, e.g., 21 days, the previous 21 days are taken into account for the creation of the forecast. The value t_max cannot be higher than t_threshold: t_max ≤ t_threshold. In the special case when the day to be predicted falls in a new season and there are not yet any corresponding days in the data storage, the last day that matches the characteristic attributes is used. The prediction profiles are then calculated from the data with the same characteristics as the day to be predicted. If more than one historical profile with the same characteristics is available, the mean value is calculated for each time point within the profiles.
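A minimal sketch of the standard mode described above is given below. It assumes that the data storage is a chronologically ordered list of (label, daily profile) pairs and that t_max counts calendar days, of which only those with a matching label enter the mean; the function name and the storage layout are hypothetical and chosen for illustration only.

```python
import numpy as np


def pslp_forecast(history, label, t_max=21):
    """Mean profile of the matching days among the last `t_max` stored days.

    history: list of (label, profile) tuples, oldest first (assumed layout).
    label:   characteristic of the day to be predicted, e.g. 'ww'.
    """
    window = history[-t_max:]
    matches = [profile for day_label, profile in window if day_label == label]
    if not matches:
        # Special case: no matching day in the window, so fall back to the
        # most recent matching day anywhere in the data storage.
        older = [profile for day_label, profile in history if day_label == label]
        if not older:
            raise ValueError(f"no stored day with characteristic {label!r}")
        matches = [older[-1]]
    # Point-wise mean over all matching daily profiles.
    return np.mean(np.asarray(matches, dtype=float), axis=0)


history = [
    ("ww", [1.0, 2.0]),
    ("wsa", [10.0, 10.0]),
    ("ww", [3.0, 4.0]),
]
print(pslp_forecast(history, "ww"))
```

With a single matching day, the function simply returns that day's profile, which is the naïve-forecast limit mentioned in the text.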
The PSLP can be run in three different modes, which are differentiated in Figure 3 as the "standard" (solid lines), the newly introduced "fix" (dashed), and the "variable" (dotted) modes. The "standard" mode corresponds to the previous description with a fixed value of t_max. In the "fix" mode, a separate t_max can be specified for each of the nine characteristics (ww, wsa, wsu, sw, ssa, ssu, tw, tsa or tsu). This can help to better take into account seasonal or day-type differences. That is, in order to determine the forecast profiles, different numbers of historical daily profiles are used depending on the season and day characteristics if different values of t_max were chosen. If the "variable" mode is selected, a maximum value t_max is stated. However, before each prediction, the optimal value t_max,opt is determined, as shown in Figure 3. This is the number of historical day profiles that should be used for the forecast. For this, the last available day with the same characteristics as the prediction day at time point t is selected as a reference point. Afterwards, the data storage is scanned from t − 1 to t − t_max. In the first step, only one day with the same characteristics is considered and the error against the reference point is determined. Either the MAE (1) or the MSE (2) can be selected for the error calculation (see also Section 2.2.3). In this way, the optimal size of t_max in relation to the reference point is analyzed iteratively. Furthermore, a kind of "early stopping" is used, i.e., after a certain number of iteration steps in which the calculated error has not improved, the search is stopped and the best result, i.e., the t_max,opt with the smallest error, is used.
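Written out with the symbols defined in the following paragraph, the standard textbook forms of the deterministic error metrics are given below. This is a sketch under the usual definitions; the exact formulation and numbering used in the study may differ in detail, and evaluating the naïve error of the MASE on the same series is an assumption.

```latex
\begin{align*}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \\
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2},
\qquad \mathrm{RMSE}=\sqrt{\mathrm{MSE}} \\
\mathrm{MAPE} &= \frac{100\,\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \\
\mathrm{MASE} &= \frac{\mathrm{MAE}}{\dfrac{1}{n-m}\sum_{j=m+1}^{n}\left|y_j-y_{j-m}\right|}
\end{align*}
```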
where y_i is the measured value and ŷ_i the predicted value at time point i, n is the number of samples, and y_{j−m} is a naïve forecast, with m as the number of previous days for the naïve forecast. In contrast to the MAE and RMSE, the MASE is scale-independent. This is achieved by comparing the MAE of the forecast with the MAE of a naïve forecast. Thus, a method can be considered reliable, i.e., better than the naïve forecast, if MASE < 1. The MAPE is only used to evaluate forecasts in the electricity and heat sectors, as it is only suitable for y_i > 0 and PV generation often has zero values.
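The four metrics can be computed along the following lines. This is a hedged sketch: the function names are chosen here, the lag `m` is expressed in samples rather than days, the naïve error of the MASE is evaluated on the same series, and the MAPE masks out non-positive measurements, which is one possible way of handling the restriction to y_i > 0.

```python
import numpy as np


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mape(y, y_hat):
    """MAPE in percent, evaluated only where the measurement is positive."""
    mask = y > 0
    return float(100.0 * np.mean(np.abs((y[mask] - y_hat[mask]) / y[mask])))


def mase(y, y_hat, m):
    """MAE of the forecast scaled by the MAE of a naive forecast with lag m (samples)."""
    naive_mae = np.mean(np.abs(y[m:] - y[:-m]))
    return mae(y, y_hat) / float(naive_mae)


y = np.array([2.0, 4.0, 6.0, 8.0])
y_hat = np.array([1.0, 5.0, 6.0, 10.0])
print(mae(y, y_hat), rmse(y, y_hat), mape(y, y_hat), mase(y, y_hat, m=1))
```

For quarter-hourly data and a d-1 benchmark, `m` would be 96 samples under these assumptions.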

Quantile Personalized Standard Load Profile -QPSLP
The following section describes how the PSLP is extended to a probabilistic forecast, a quantile PSLP (QPSLP), using quantile regression (QR).

Quantile regression
QR is a statistical approach developed in the 1970s by Koenker and Bassett [37] as an extension of linear models. It essentially models the relationship between the independent or explanatory variables (X) and the conditional quantile of the dependent variable (y). As opposed to linear regression, QR provides a more comprehensive picture of the effect of the independent variable on the dependent one. QR is expressed in linear form as $Q_y(q \mid X_t) = X_t^{\top}\beta_q$, where Q_y is the conditional q-th quantile of the load/generation distribution (y), q the quantile level, β_q the vector of unknown coefficients to be estimated for quantile q, and X_t the corresponding input feature vector at time t.
Quantiles are estimated by assigning asymmetric weights to the error, as defined by the pinball loss function given in Eq. (7).
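In its standard form, and with the symbols explained in the following paragraph, the pinball loss reads as shown below. This is the usual textbook definition, given here as a sketch; the notation in the study may differ slightly.

```latex
\[
\rho_{\tau}(u) =
\begin{cases}
\tau\,u, & u \ge 0,\\[2pt]
(\tau-1)\,u, & u < 0,
\end{cases}
\qquad u = y_t - q_t .
\]
```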
where τ is the quantile probability level, which ranges between 0 and 1, and u is the error term. The error term is given by y_t − q_t, where q_t is the quantile forecast and y_t the actual value.
Given the pinball loss function, the optimization problem of QR can be expressed as in Eq. (8): $\hat{\beta}_{\tau} = \arg\min_{\beta} \sum_{t} \rho_{\tau}\left(y_t - X_t^{\top}\beta\right)$. The quantile loss function ρ_τ is minimized separately for each τ. Following the minimization, regression coefficients are obtained for each quantile (τ), which can in turn be used to obtain the distribution of the forecast at different quantile levels. In this work, we use the PSLP as a single feature (explanatory variable). For this purpose, a historical period (the training or calibration window) spanning 30, 90 or 120 days was defined, from which the PSLPs were taken into account. The QPSLP, like the PSLP, was run as a rolling day-ahead forecast. Within the rolling scheme, the training window (30, 90 or 120 days) was moved forward by one day every day and the parameters of the models were recalculated for the next day, i.e., the next QPSLP. This was performed separately for all of the sectors considered.
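The quantile fit with the PSLP as the single explanatory variable can be sketched as a linear program that minimizes the summed pinball loss. The code below is an illustration under stated assumptions, not the implementation used in the study: it solves the problem with SciPy's `linprog`, includes an intercept term (an assumption), and the names `fit_quantile_line` and `qpslp_day_ahead` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_quantile_line(x, y, tau):
    """Fit q_tau(y | x) = b0 + b1 * x by minimising the summed pinball loss.

    LP form: y - (b0 + b1 * x) = u_pos - u_neg with u_pos, u_neg >= 0 and
    cost tau * sum(u_pos) + (1 - tau) * sum(u_neg).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), x])  # intercept + PSLP feature
    cost = np.concatenate([np.zeros(2), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * 2 + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:2]


def qpslp_day_ahead(pslp_hist, meas_hist, pslp_next, taus,
                    window_days=30, steps_per_day=24):
    """Quantile forecasts for the next day, trained on the last `window_days` days."""
    n_train = window_days * steps_per_day
    x_train = np.asarray(pslp_hist, dtype=float)[-n_train:]
    y_train = np.asarray(meas_hist, dtype=float)[-n_train:]
    forecasts = {}
    for tau in taus:
        b0, b1 = fit_quantile_line(x_train, y_train, tau)
        forecasts[tau] = b0 + b1 * np.asarray(pslp_next, dtype=float)
    return forecasts
```

Calling `qpslp_day_ahead` once per day with the updated histories reproduces the rolling scheme: the slice `[-n_train:]` moves the training window forward by one day at each call.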

QPSLP Evaluation Metrics
This paper evaluates probabilistic forecasting based on three key properties, namely sharpness, reliability (calibration), and resolution [51]. In order to fully assess these aspects, a set of well-established evaluation metrics, namely the prediction interval coverage probability (PICP), the mean prediction interval width (MPIW), and the Winkler score (WS), are employed. The PICP quantifies the reliability of the forecasts, the MPIW measures their sharpness, and the WS holistically considers both of these factors. It is important to note that there is often a trade-off between PICP and MPIW, as a higher PICP and a lower MPIW are both desired but can conflict with one another [52].
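In their commonly used form, and with the symbols explained in the following paragraph, the three metrics can be written as shown below. These are the standard definitions, given as a sketch; averaging the Winkler score over all N samples is an assumption about how the score is aggregated.

```latex
\begin{align*}
\mathrm{PICP} &= \frac{1}{N}\sum_{i=1}^{N} C_i \\
\mathrm{MPIW} &= \frac{1}{N}\sum_{i=1}^{N}\left(U_i-L_i\right) \\
\mathrm{WS}_i &=
\begin{cases}
\delta_i, & L_i \le y_i \le U_i,\\[2pt]
\delta_i + \dfrac{2\,(L_i-y_i)}{\alpha}, & y_i < L_i,\\[6pt]
\delta_i + \dfrac{2\,(y_i-U_i)}{\alpha}, & y_i > U_i,
\end{cases}
\qquad
\mathrm{WS}=\frac{1}{N}\sum_{i=1}^{N}\mathrm{WS}_i
\end{align*}
```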
where C_i indicates whether the measured value lies within the prediction interval (C_i = 1 if L_i ≤ y_i ≤ U_i, and C_i = 0 otherwise), L_i and U_i are the lower and upper quantile values of the evaluated prediction interval (PI), N is the number of samples, y_i is the measured value (the true load demand) at time step i, δ_i is the interval width given by δ_i = U_i − L_i, and α is the significance level, so that 1 − α is the nominal coverage of the PI.
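A compact way to compute the three metrics is sketched below. The function names are chosen for illustration, and `alpha=0.2` corresponds to the 80% PI between the 0.1 and 0.9 quantiles under the usual convention that the PI has a nominal coverage of 1 − α.

```python
import numpy as np


def picp(y, lower, upper):
    """Share of measurements that fall inside the prediction interval."""
    return float(np.mean((y >= lower) & (y <= upper)))


def mpiw(lower, upper):
    """Mean width of the prediction interval."""
    return float(np.mean(upper - lower))


def winkler(y, lower, upper, alpha):
    """Mean Winkler score: interval width plus a penalty for misses."""
    width = upper - lower
    below = np.maximum(lower - y, 0.0)  # distance below the lower bound
    above = np.maximum(y - upper, 0.0)  # distance above the upper bound
    return float(np.mean(width + 2.0 / alpha * (below + above)))


y = np.array([5.0, 12.0, 0.0])
lower = np.full(3, 2.0)
upper = np.full(3, 10.0)
print(picp(y, lower, upper), mpiw(lower, upper), winkler(y, lower, upper, alpha=0.2))
```

The example shows the trade-off directly: narrowing the interval lowers the MPIW, but every additional miss is penalized in the Winkler score by 2/α times the distance to the violated bound.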

Deterministic Net Load
In order to determine the uncertainty in the QPSLP and calculate the PNL, the following describes how a CDF and PDF are approximated from the predicted quantiles and how the PNL can be determined.
To achieve this, we first define the predicted deterministic net load (NL). The predicted NL is defined as the difference between the sum of all predicted electricity loads y_Load,i and the sum of the predicted distributed renewable generation y_Gen,i, as expressed in Eq. (12). In our case, this is the difference between the electricity load PSLP and the PV generation PSLP.
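Following the verbal definition above, the net load at time step i can be written as shown below. This is a sketch in which y_Load,i and y_Gen,i already denote the summed predictions of all loads and of all distributed generators, respectively.

```latex
\[
\mathrm{NL}_i = y_{\mathrm{Load},i} - y_{\mathrm{Gen},i}
\]
```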
Therefore, a positive NL indicates that additional power is drawn from the public grid. On the other hand, a negative NL implies that there is a generation surplus, i.e., grid feed-in. The NL is therefore essential for optimizing the operation of flexible loads: it shows the times at which the integration of flexible loads or storage is optimal, e.g., with regard to own-consumption, and the times at which it makes sense to shift a flexible load or storage unit to a later time.

Empirical CDF and PDF from the QPSLP
In order to further apply the QPSLPs, e.g., to determine the residual or net load or the total uncertainty within a distributed integrated energy system, the PDFs of each random variable are required. For this purpose, the empirical PDFs of the respective QPSLPs are established separately for each time step. To do so, we first determine the instantaneous empirical cumulative distribution function from the predicted quantiles. For each time step t, the CDF is approximated by piecewise cubic Hermite interpolating polynomial (PCHIP) interpolation. This special case of spline interpolation avoids oscillations at the edges of an interval, which are also known as "Runge's phenomenon". Therefore, following [53], the Python-based PCHIP interpolation function from SciPy [54] is used. The empirical PDF is then determined as the first derivative of the CDF, which is continuous without any jumps.
The condition for the interpolation is that the data are monotonically increasing, as the PCHIP interpolant P(x) uses piecewise monotone cubic splines to compute new values of points and their associated derivatives [53]. For each sub-interval x_k ≤ x ≤ x_{k+1}, P(x) is a cubic Hermite interpolation polynomial with a specific derivative at the interpolation points. The first derivative, dP/dx, is guaranteed to be continuous. The advantages of PCHIP interpolation compared to a spline interpolation are the lower effort, fewer oscillations if the data are not smooth, and the absence of overshoots. A spline interpolation, in turn, can provide a more accurate interpolation, and its second derivative is also continuous. A detailed description of the PCHIP algorithm and of the calculation of the one-sided three-point estimate of the slopes at the endpoints is given in [53] and [55]. In order to approximate a complete empirical CDF or PDF from the QPSLP, as exemplified in Figure 4, with the predicted quantiles between 0.1 and 0.9 as a basis, the quantile points 0.0 and 1.0 were estimated and added before the interpolation. The minimum and maximum values occurring in the last X days of the historical data store of the respective sector are taken as estimates. For example, if the CDF for the electrical load is to be determined for the time point 12:00, it could be obtained as follows. The measured electrical load profiles of the last 30 days for this time point are first selected. From these, the base load (minimum occurring power) is selected as the 0.0 quantile value (q=0.0) and the peak load (maximum occurring power) as the 1.0 quantile value (q=1.0) for the considered time. The condition that the quantile value for q=0.0 is smaller than that for q=0.1 and the quantile value for q=1.0 is larger than that for q=0.9 must be fulfilled. For highly variable or seasonally dependent load or generation profiles, care must be taken to ensure that the number of retrospective days is not too large.
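The approximation of the CDF and PDF for a single time step can be sketched with SciPy's `PchipInterpolator` as follows. The function name, its arguments and the grid resolution are assumptions made for illustration; `hist_min` and `hist_max` stand for the base and peak load of the last X days that serve as the 0.0 and 1.0 quantile points.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def cdf_pdf_from_quantiles(q_levels, q_values, hist_min, hist_max, n_grid=201):
    """Approximate CDF and PDF for one time step from predicted quantiles.

    q_levels: predicted quantile levels, e.g. 0.1 ... 0.9
    q_values: predicted quantile values (power) at these levels
    hist_min, hist_max: estimates for the 0.0 and 1.0 quantile values
    """
    levels = np.concatenate([[0.0], q_levels, [1.0]])
    values = np.concatenate([[hist_min], q_values, [hist_max]])
    if np.any(np.diff(values) <= 0):
        raise ValueError("quantile values must be strictly increasing")
    cdf = PchipInterpolator(values, levels)  # monotone, no overshoot
    pdf = cdf.derivative()                   # first derivative of the CDF
    grid = np.linspace(hist_min, hist_max, n_grid)
    return grid, cdf(grid), pdf(grid)


q_levels = np.linspace(0.1, 0.9, 9)
q_values = np.array([35.0, 40.0, 44.0, 47.0, 50.0, 53.0, 57.0, 62.0, 70.0])
grid, cdf_vals, pdf_vals = cdf_pdf_from_quantiles(q_levels, q_values, 20.0, 90.0)
```

The explicit check on the ordering of the values enforces the condition that the estimated 0.0 and 1.0 quantile points lie outside the predicted 0.1 and 0.9 quantiles; crossing quantiles would have to be sorted or corrected before the call.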

Convolution
Assessing joint uncertainty within an integrated system or calculating the distributed net load necessitates the aggregation of random variables, such as electricity load and PV generation. For this combination of the individual PDFs, a convolution (*) of the PDFs is necessary. To achieve this, the determination of joint PDFs becomes imperative. The PDFs f_X, f_Y: R → R+ of two independent continuous random variables X and Y can be combined into the PDF of their sum as a convolution of f_X and f_Y, as described in Eq. (13). The sum X + Y is then itself a continuous random variable with the PDF f_{X+Y}(z).
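For two independent continuous random variables, the convolution takes the standard form shown below. This is the textbook expression, given as a sketch with the symbols used above.

```latex
\[
f_{X+Y}(z) = (f_X * f_Y)(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z-x)\, \mathrm{d}x
\]
```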
For a non-parametric approach with discretely approximated PDFs, the discrete convolution is defined as in Eq. (14): $(f_X * f_Y)[z] = \sum_{k=-\infty}^{\infty} f_X[k]\, f_Y[z-k]$. For the convolution of two approximated PDFs, discretized probability mass functions (PMFs) on the same symmetrical value range of the x-axis are used. The resulting PMFs can then be convolved and subsequently multiplied by a discretization factor. The sum of each PMF and of the convolution result must be equal to one. For the implementation of the convolution, the Python library SciPy [54] is used.
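The aggregation of a load PDF and a generation PDF to the PDF of the net load can be sketched as follows. Several points are assumptions made for this illustration: both PDFs are sampled on the same grid, which is symmetric around zero and has an odd number of points; generation enters with a negative sign, so its PMF is mirrored before the convolution; and the helper names are hypothetical. The conversion between PMF and density by the grid step `dx` plays the role of the discretization factor.

```python
import numpy as np
from scipy.signal import convolve


def to_pmf(pdf, dx):
    """Discretise a sampled PDF into a PMF that sums to one."""
    pmf = np.asarray(pdf, dtype=float) * dx
    return pmf / pmf.sum()


def net_load_pdf(pdf_load, pdf_gen, dx):
    """PDF of (load - generation) on the same symmetric grid.

    Reversing the generation PMF on a grid symmetric around zero gives the
    PMF of the negated generation; mode='same' keeps the central part of the
    full convolution, which is aligned with the original grid when the
    number of grid points is odd.
    """
    pmf_load = to_pmf(pdf_load, dx)
    pmf_neg_gen = to_pmf(pdf_gen, dx)[::-1]
    pmf_net = convolve(pmf_load, pmf_neg_gen, mode="same")
    return pmf_net / dx  # back to a density


def expected_value(grid, pdf, dx):
    return float(np.sum(grid * pdf) * dx)


grid = np.linspace(-150.0, 150.0, 3001)
dx = grid[1] - grid[0]


def gauss(mu, sigma):
    # Synthetic stand-in for an approximated PDF, for demonstration only.
    return np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


pdf_net = net_load_pdf(gauss(60.0, 5.0), gauss(40.0, 8.0), dx)
print(expected_value(grid, pdf_net, dx))  # close to 60 - 40 = 20
```

The value range of the grid has to be wide enough to contain the support of the aggregated distribution; otherwise probability mass is cut off by `mode="same"` and the result no longer sums to one.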
The expected value E(X) or µ of a discrete random variable X is given by Eq. (15):
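For a discrete random variable with values x_i and probabilities p(x_i), the standard definition is shown below. This is the textbook form, given as a sketch; the index notation is chosen here.

```latex
\[
E(X) = \mu = \sum_{i} x_i \, p(x_i)
\]
```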

PSLP Forecasting Evaluation
In the following section, the prediction results of the different PSLP modes and the naïve predictions are presented and compared. For this purpose, the forecast accuracy for the electricity demand, heat demand, and PV generation was evaluated using the metrics MAE, RMSE, MAPE, and MASE for the period from October 5, 2021 to the end of September 2022. Before the first forecasts were made, a waiting period of 21 days was specified to serve as a basis for PSLP generation. Subsequently, the measured values of the previous day were taken into account in the forecast of the next day. In this manner, the data storage grows constantly. This means that the forecasts were created in a rolling format for each day, using a training data set whose size depends on the mode of the PSLP (see also Section 2.2.2). Thus, a real-world scenario with changing training data sets and seasons can be simulated. The results of the PSLP predictions for each sector, mode and day-type and/or season classification are shown in Figure 5. The figure shows the results, including those for the electricity demand forecast, using boxplots. The box marks the interquartile range (IQR) (25% to 75% quantile). The black line within the box marks the median (50% quantile). The upper and lower whiskers extend from the box by a maximum of 1.5 times the IQR. The boxplots in Figure 5 and Table 3 show that all PSLP models outperform the benchmarks, i.e., the naïve predictions. The PSLP prediction accuracy in the different sectors (electricity, heat and PV) shows that the performance of the PSLP depends on the type of data profile. The forecast error distributions in the boxplots of Figure 5 show the differences between the individual modes (standard, fix and variable), but also the influence of characterizing by day types and/or seasons. In the heat and PV forecasts, the larger IQRs compared to the electricity sector are noticeable. In addition, higher outliers occur and the differences compared to the naïve forecasts are smaller, as can be seen in Figure 5 and read from the MASE in Table 4. The performance of the different PSLP modes and of the option of whether or not to consider the day types (d) or seasons (s) within the different sectors is also remarkable. While the standard (std) PSLP performs best in the electricity sector, it is significantly worse in the heat and PV sectors. For these sectors, the variable (var) PSLP performs best. Furthermore, the PSLP in the electricity sector is best when day type and season are taken into account, whereas the PSLP is best for the PV and heat sectors when day-type or seasonal characteristics are not considered (Figure 5). Based on the MAPE of the best prediction results, the fundamental performance differences of the predictions between electricity and heat demand can be shown. While the average MAPE in the electricity sector is 14.56%, it is significantly higher in the heat sector at 25.04%, as shown in Table 4.
For the probabilistic extension, the mean standard PSLP with day and season characteristics (std_el_d_s) was used for the electricity sector, and the mean variable PSLP (var_ht, var_pv) was selected for the heat and PV sectors. The mean and median values of the selected PSLP for each sector are listed in comparison to the best-performing naive predictions in Table 3. The selected PSLPs are shown for a randomly selected day in Figure 6, which also displays the deterministic net load (Section 2.4.1). Based on the prediction errors and the example from Figure 3, it is clear that the PSLP performs differently depending on the sector in which it is applied. Particularly for highly variable profiles such as daily PV generation or heat demand, higher forecasting errors tend to occur than for the electricity load profile.
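The point-forecast metrics used in this comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used for the study: the function name `point_forecast_metrics` is hypothetical, zero measurements are skipped in the MAPE, and the MASE is assumed to scale the MAE by the MAE of the naive benchmark forecast.

```python
import numpy as np


def point_forecast_metrics(y_true, y_pred, y_naive):
    """Return MAE, RMSE, MAPE (%) and MASE for one forecast series.

    y_naive is the benchmark forecast (e.g., the values measured one day
    earlier). Scaling by the benchmark MAE is an assumption about the MASE.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_naive = np.asarray(y_naive, dtype=float)

    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    # MAPE is undefined for zero measurements (e.g., PV at night),
    # so those samples are skipped here (assumption).
    nonzero = y_true != 0
    mape = float(100.0 * np.mean(np.abs(err[nonzero] / y_true[nonzero])))
    # MASE < 1 means the forecast beats the naive benchmark.
    mase = mae / float(np.mean(np.abs(y_true - y_naive)))
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "MASE": mase}


# Hypothetical values in kW for illustration only.
metrics = point_forecast_metrics(
    y_true=[10, 20, 30, 40],
    y_pred=[12, 18, 33, 40],
    y_naive=[14, 16, 36, 44],
)
```

In a rolling evaluation, such a function would be called once per forecast day and the daily results collected, which yields the distributions shown in the boxplots.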

QPSLP Forecasting Evaluation
For the generation of the QPSLPs, as for the PSLP, a rolling forecast was used in which the training window was limited to 30, 90, or 120 days. All metrics were calculated with reference to the 80% PI (0.1 to 0.9 quantile). The probabilistic extension of the PSLP to a QPSLP is exemplified for the different sectors in Figure 7. In Figure 7, the gray solid lines indicate the quantiles 0.1 and 0.9, the dashed lines the predicted quantiles 0.2, 0.3, 0.4, 0.6, 0.7, and 0.8, the red line the median (0.5 quantile), the blue line the PSLP point forecast, and the green dots the actual measured values. The example day shows that the PI range can vary greatly. Large differences occur especially during times of fluctuating PV generation (Figure 7, right). The accuracy of the probabilistic forecasts was evaluated using the PICP, MPIW, and Winkler score metrics, which are described in Section 2.3.2. The forecasting metrics of the QPSLPs for each sector are shown as boxplots in Figure 8 and are also listed in Table 5. The median (orange line) of the PICP, as well as the mean (see Table 5), is around 80%, which is to be expected for the PI [0.1, 0.9]. However, there are very large differences in the range between the QPSLP in the electricity sector and those in the other sectors. This is especially reflected in the sharpness of the probabilistic prediction, as depicted by the MPIW metric.
The mean values given in Table 5 also make the differences between the sectors obvious. While the PICP, with an expected 80% for a PI spanning the 0.1 to 0.9 quantiles, is at about the same level, the MPIW and Winkler score differ significantly. In particular, the highly variable PV generation is not predicted as well as electricity and heat demand. The distribution of the evaluation metrics over the entire observation period can be seen in the IQR (boxes) of the boxplots in Figure 8. Again, the IQR of the MPIW and Winkler scores of the PV generation forecast is particularly striking, as it is significantly larger than in the other two cases. To quantify the uncertainty, it is necessary to obtain the CDF or PDF of the predictions.
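For orientation, the three interval metrics can be computed as in the following sketch. It uses the common textbook definitions; the exact normalization in Section 2.3.2 may differ, and the function name `interval_scores` as well as the sample values are hypothetical. For the 80% PI, the nominal miscoverage is alpha = 0.2.

```python
import numpy as np


def interval_scores(y, lower, upper, alpha=0.2):
    """PICP (%), MPIW, and mean Winkler score of a central (1 - alpha) PI."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    width = upper - lower
    covered = (y >= lower) & (y <= upper)
    picp = float(100.0 * covered.mean())   # share of observations inside the PI
    mpiw = float(width.mean())             # sharpness of the PI
    # Winkler score: interval width plus a penalty proportional to the
    # distance by which an observation misses the interval.
    miss = np.maximum(lower - y, 0.0) + np.maximum(y - upper, 0.0)
    winkler = float(np.mean(width + (2.0 / alpha) * miss))
    return picp, mpiw, winkler


# Hypothetical values in kW: two observations inside, two outside the PI.
picp, mpiw, winkler = interval_scores(
    y=[5, 12, 1, 7], lower=[4, 6, 2, 5], upper=[8, 10, 6, 9]
)
```

A PICP close to the nominal 80% indicates reliable intervals, while lower MPIW and Winkler values indicate sharper forecasts.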

CDF and PDF Approximation
The CDF and PDF were approximated from the QPSLP quantiles using PCHIP interpolation and its derivative, as described in Section 2.4.2. As an example, Figure 9 shows the approximated instantaneous CDFs (upper graphs) and PDFs (lower graphs) for electricity demand, heat demand, and PV generation. As a plausibility check, the PDFs were discretized into PMFs and it was checked whether their sum was equal to one.
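The approximation step can be illustrated with SciPy's `PchipInterpolator`, whose `derivative()` method returns the piecewise polynomial of the first derivative. The sketch below is an illustration under assumptions and not the original implementation: the quantile values are hypothetical, and the outer levels 0 and 1 are assumed bounds so that the discretized PDF can sum to one.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def cdf_pdf_from_quantiles(levels, values):
    """Approximate CDF and PDF from the predicted quantiles of one time step.

    levels: quantile levels (probabilities), increasing
    values: predicted quantile values, must be strictly increasing
    """
    levels = np.asarray(levels, dtype=float)
    values = np.asarray(values, dtype=float)
    if np.any(np.diff(values) <= 0):
        raise ValueError("quantile values must be strictly increasing")
    cdf = PchipInterpolator(values, levels)  # monotone cubic Hermite interpolant
    pdf = cdf.derivative()                   # PDF as first derivative of the CDF
    return cdf, pdf


def discretize_pdf(pdf, lower, upper, n_bins=500):
    """Discretize a PDF into a PMF on equally wide bins (midpoint rule)."""
    edges = np.linspace(lower, upper, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    pmf = pdf(centers) * (edges[1] - edges[0])
    return centers, pmf


# Hypothetical quantiles in kW; the outer levels 0 and 1 are assumed bounds.
levels = np.linspace(0.0, 1.0, 11)
values = np.array([18.0, 24.0, 27.0, 29.0, 31.0, 33.0, 35.0, 37.5, 40.0, 44.0, 52.0])

cdf, pdf = cdf_pdf_from_quantiles(levels, values)
centers, pmf = discretize_pdf(pdf, values[0], values[-1])
total_mass = pmf.sum()  # plausibility check: should be close to one
```

Because PCHIP preserves the monotonicity of the quantiles, the derivative does not become negative, which is what makes it usable as a density.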

Comparison of Different Distribution Functions
Comparing the PCHIP-approximated CDFs with a fit of a Gaussian, beta, or gamma distribution to the quantiles reveals the advantage of the proposed method, as shown in Figure 10. The Gaussian, beta, and gamma CDFs were fitted to 11 quantiles, which were also used for the PCHIP interpolation in Section 3.3. Each subplot of Figure 10 shows the quantiles, the normal (Gaussian), beta, and gamma distributions, and the PCHIP approximation in red. From left to right, the electricity, heat, and PV sectors are shown.
It is apparent that these "standard" distributions sometimes work better and sometimes worse depending on the application (e.g., the beta distribution works reasonably well for PV generation, but fails for the other two applications). In comparison, the PCHIP approximation adapts clearly better and has the advantage that no prior assumption about the distribution of the data needs to be made.
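The difference between a parametric fit and the interpolation can be reproduced in a small sketch. Here, a Gaussian CDF is fitted to hypothetical, right-skewed quantiles by least squares with `scipy.optimize.curve_fit`; the fitting procedure, the function names, and the values are assumptions for illustration and not necessarily the procedure behind Figure 10. PCHIP passes through the quantiles by construction, whereas a two-parameter Gaussian generally cannot.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator
from scipy.optimize import curve_fit
from scipy.stats import norm


def gaussian_cdf(x, mu, sigma):
    return norm.cdf(x, loc=mu, scale=sigma)


def max_cdf_error(levels, values):
    """Largest deviation from the quantile levels for a Gaussian fit and PCHIP."""
    levels = np.asarray(levels, dtype=float)
    values = np.asarray(values, dtype=float)
    # Starting values: median and a rough spread guess from the outer quantiles.
    p0 = [np.median(values), (values[-1] - values[0]) / 2.0]
    (mu, sigma), _ = curve_fit(
        gaussian_cdf, values, levels, p0=p0,
        bounds=([-np.inf, 1e-9], [np.inf, np.inf]),  # keep sigma positive
    )
    gauss_err = float(np.max(np.abs(gaussian_cdf(values, mu, sigma) - levels)))
    pchip = PchipInterpolator(values, levels)
    pchip_err = float(np.max(np.abs(pchip(values) - levels)))
    return gauss_err, pchip_err


# Hypothetical right-skewed quantiles (0.1 to 0.9) in kW, e.g., for PV.
levels = np.linspace(0.1, 0.9, 9)
values = np.array([1.0, 1.5, 2.2, 3.2, 4.8, 7.0, 10.0, 14.0, 20.0])
gauss_err, pchip_err = max_cdf_error(levels, values)
```

The comparison at the quantile points only shows how well each curve reproduces the predicted quantiles; it says nothing about the behavior between or beyond them.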

PNL Forecasting Evaluation
In order to add or subtract different predictions (random variables), their probability densities must be convolved, as described in Section 2.4.3. The results of such a convolution for a sample time point are displayed in Figure 11. The top graph in Figure 11 shows the approximated PDFs of the electricity and heat load forecasts, as well as of the PV generation (negative powers). The bottom graph shows the convolution of the PDFs. The probabilistic NL (orange line) results from the convolution of the PDFs of the power consumption and the negative PV generation. If one is interested in the joint probability density function of several random variables (here, e.g., electricity, heat load, and PV generation), the convolution must be performed with all variables, as shown in Figure 11. The joint density function or probabilistic NL can then be used to determine the probability of occurrence for the different power ranges.
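Numerically, the convolution can be carried out on discretized PDFs, as in the following sketch. It assumes independent variables and equally spaced grids with a common step size; the bell-shaped PDFs and all names (`mirror_pdf`, `convolve_pdfs`, `bell_pdf`) are hypothetical stand-ins for the PCHIP-based PDFs. Subtracting PV generation corresponds to adding the mirrored PV density.

```python
import numpy as np


def mirror_pdf(grid, pdf):
    """PDF of -X from the PDF of X (used to subtract PV generation)."""
    return -grid[::-1], pdf[::-1]


def convolve_pdfs(grid_a, pdf_a, grid_b, pdf_b):
    """PDF of A + B for independent A and B on grids with the same step size."""
    dx = grid_a[1] - grid_a[0]
    if not np.isclose(dx, grid_b[1] - grid_b[0]):
        raise ValueError("both grids need the same step size")
    pdf_sum = np.convolve(pdf_a, pdf_b) * dx
    grid_sum = grid_a[0] + grid_b[0] + dx * np.arange(pdf_sum.size)
    return grid_sum, pdf_sum


def bell_pdf(grid, mean, std):
    """Hypothetical bell-shaped PDF, normalized on the grid (illustration only)."""
    pdf = np.exp(-0.5 * ((grid - mean) / std) ** 2)
    return pdf / (pdf.sum() * (grid[1] - grid[0]))


load_grid = np.linspace(15.0, 45.0, 301)  # electricity load in kW
pv_grid = np.linspace(0.0, 24.0, 241)     # PV generation in kW
load_pdf = bell_pdf(load_grid, 30.0, 3.0)
pv_pdf = bell_pdf(pv_grid, 12.0, 2.0)

# Net load = load - PV = load + (-PV)
neg_pv_grid, neg_pv_pdf = mirror_pdf(pv_grid, pv_pdf)
nl_grid, nl_pdf = convolve_pdfs(load_grid, load_pdf, neg_pv_grid, neg_pv_pdf)

dx = nl_grid[1] - nl_grid[0]
nl_mass = nl_pdf.sum() * dx               # total probability
nl_mean = (nl_grid * nl_pdf).sum() * dx   # expected net load in kW
```

Further sectors, such as the heat load, can be included by applying `convolve_pdfs` again to the result.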

Case Study
In application, the probabilistic NL can support decisions in the operational optimization of distributed energy systems. It helps in deciding when an additional flexible load or energy storage (e.g., a BEV or electrolyzer) can be supplied, or when additional power can be provided by a flexible generation or storage unit. If the NL is positive, a power deficit results and power needs to be purchased from the public electricity grid. If, on the other hand, it is negative, there is a surplus of decentralized renewable power. In that case, demand should be shifted or energy should be stored. The convolution of the probability densities of stochastic supply sources and demands can help to determine not only an expected future NL but also its probability density function. In the following, a short case study is presented to demonstrate the advantages of the approach presented in this paper.
The case study considers the distributed power system of a logistics property. In a first step, a probabilistic forecast of the power demand and PV generation for the subsequent 24 hours is made. From this, the PNL is calculated as described previously. The example of the logistics sector was chosen because it is often time-critical, the transformation of the transport sector is proceeding rapidly, and solutions must be identified at a decentralized level. In this case, the flexible load is an electrical preconditioning process, e.g., pre-cooling a refrigerated trailer (reefer) from a certain temperature to a temperature set-point. The refrigerated trailer stands at the logistics site for four hours and can only be electrically preconditioned during this time. In [46], the authors described the cost advantage and emission reduction of the electrical process compared to diesel preconditioning, especially if PV own-consumption can be used.
In the first example, a randomly selected day is considered between 6:00 am and 10:00 am. The observation considers each full hour: start time = 06:00, t1 = 07:00, t2 = 08:00, t3 = 09:00, and end time = 10:00. Figure 12 shows the PDFs of the NL for the respective times.
It is assumed that own-consumption of PV-generated power imposes lower costs and CO2 emissions than the supply from the public electricity grid. The optimization goal is therefore that PV own-consumption should be as high as possible, i.e., the NL should be ≤ 0 kW (right vertical gray dashed line in Figure 12). For the sake of simplicity, we assume that the electrical pre-cooling takes one hour with an additional load of 10 kW. The left vertical gray dash-dotted line (≤ -10 kW) in Figure 12 marks when this load can be served solely from locally produced electricity.
In order to make the decision under uncertainty, the probabilities of meeting the criteria are accumulated. Table 6 presents these values for both the ≤ 0 kW (own-consumption) and the ≤ -10 kW (+10 kW pre-cooling load) criteria. For comparison, it also displays the expected value µ for each time step. In the first time step (starting at 6:00), there is most likely no PV generation yet (see Figure 12) and the PNL is therefore approximately 95 % ≤ 0 kW. By 10:00 am, PV power generation increases, as does the probability of a negative NL, and the expected values of the PNL shift towards negative values. At this point, the suggested method already helps to quantify the impact of a scheduling decision. If the pre-cooling is carried out between 09:00 and 10:00, it will increase the probability of needing electricity from the grid by 0.5 %. However, in the presented period, judging based on the expected net load would yield the same decision, as the load also takes its lowest value in that hour.
Now, the example is extended to consider a full day. For this, the expected NL and the aggregated probabilities as used before are considered. In Figure 13, the value with the highest probability of occurrence is also added, as it might also serve as a criterion for decisions. It can be seen directly that a pre-cooling operation between 10:00 and 12:00 is beneficial, as it combines a high probability of PV own-consumption with a strongly negative net load. Again, the aggregated probability (red dotted and dash-dotted lines in Figure 13), the expected value (solid line), and the value of the maximum probability (dashed line) yield the same schedule. However, the situation becomes more interesting in cases where it is not possible to pick this optimal solution. For example, with a set departure time of 8:00, it would make no difference at which point in time the pre-cooling process is started. This fact is only visible because the probability of own-consumption stays at 0 %, while the load values already start shifting. At these times, it makes sense to consider another decision criterion.
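The decision criteria used in the case study (accumulated probabilities below a threshold, expected value, and most probable value) can be derived from a discretized PNL as sketched below. The hourly PDFs are hypothetical bell curves and the function names are illustrative; the sketch only shows the type of calculation behind Table 6 and Figure 13, not its actual values.

```python
import numpy as np


def decision_criteria(grid, pdf, thresholds=(0.0, -10.0)):
    """Expected value, most probable value, and P(NL <= threshold) of a PNL."""
    pmf = pdf * (grid[1] - grid[0])
    pmf = pmf / pmf.sum()                 # guard against discretization error
    expected = float(np.sum(grid * pmf))
    most_probable = float(grid[np.argmax(pmf)])
    probs = {thr: float(pmf[grid <= thr].sum()) for thr in thresholds}
    return expected, most_probable, probs


def best_start_hour(pnl_by_hour, threshold=-10.0):
    """Hour whose PNL has the highest probability of staying below threshold."""
    scores = {
        hour: decision_criteria(grid, pdf, (threshold,))[2][threshold]
        for hour, (grid, pdf) in pnl_by_hour.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical hourly PNLs (bell-shaped, in kW) for illustration only.
grid = np.linspace(-40.0, 30.0, 701)


def bell(mean, std=5.0):
    return np.exp(-0.5 * ((grid - mean) / std) ** 2)


pnl_by_hour = {
    "08:00": (grid, bell(5.0)),
    "09:00": (grid, bell(-5.0)),
    "10:00": (grid, bell(-12.0)),
}
expected, most_probable, probs = decision_criteria(grid, bell(-5.0))
hour, scores = best_start_hour(pnl_by_hour)
```

With a threshold of -10 kW, the selected hour is the one in which the additional 10 kW pre-cooling load is most likely to be covered by local generation; other criteria, such as the expected value, can be substituted in the same way.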

Discussion
With the QPSLP and the probabilistic net load, a method was presented that takes up the concept of the standard load profile and extends it to make it applicable to other sectors, as well as to the optimization of sector-integrated distributed energy systems. This tool can be used to integrate renewable generation (e.g., PV) and new flexible loads (e.g., battery electric vehicles) into distributed energy systems in a technically and economically optimized way. Therefore, this tool can help decarbonize distributed energy systems in particular. The PNLFF is a non-blackbox approach, is easy to implement, and does not require more than locally collected measurements from the sectors to forecast load demands, generation, and the net load. It can be used by a large number of users and especially in small distributed energy systems, e.g., households, neighborhoods, or SMEs, for which a high investment in a forecasting system is not economically viable.
It was shown that the PSLP can also be applied to the heat and PV sectors. The PSLP performance differs considerably between sectors, e.g., the average MAPE of the electricity (14.82%) and heat (25.04%) loads differs by more than 10 percentage points, as shown in Table 4. A strong scattering of the errors can be observed, particularly in the PV prediction, as shown in the boxplots in Figure 5. This can be attributed to the larger stochastic component in the data and the large fluctuations between days. It can be noted that the PSLP performs well, especially for the electricity sector compared to heat demand and PV generation. However, it was also shown that the addition of the "fixed" and "variable" modes, as well as the option to turn off the characterization by day type or season, brought about significant improvements in the PSLP (see Table 3 and Figure 5) and outperformed the benchmarks. Individual hyper-parameters should be set for different energy systems and sectors, such as whether to classify by weekdays, workdays, or seasons, or which aggregation method is to be used. In the case of heat demand, the outdoor temperature, if temperature data are available, can be used to make the classification into different seasons, as described in Section 2.2.2, instead of a fixed date (day type characterization). However, this only carries an advantage if there is a correspondingly high correlation between outside temperature and heat demand. In the case of coupling between the heat and electricity sectors, for example by means of heat pumps, it could also become relevant for electricity demand. Furthermore, in the aggregation method, options other than the mean value, e.g., the median or maximum value, could be tested. The maximum value could be successful, especially for PV, as shown in [34]. In addition, as an outlook, the transportation sector could be included, with the electric power demand of BEVs or charging stations [56] added to the PNLFF. In contrast to deterministic prediction, probabilistic prediction offers the advantage that uncertainty can be taken into account.
The probabilistic extension towards a QPSLP shows how to use the PSLP simply as an input feature for QR in order to make a probabilistic forecast while achieving promising results. The QPSLP was evaluated for different training windows (30, 90, and 120 days). For the PICP, the expected value was 80%, as shown in Table 5. This is slightly overestimated in the mean value of the individual sectors, with 83.2% (30 days) and 83.91% (30 days) in the electricity and heat sectors, and is closest to the expected value in the PV QPSLP, with 80.57%. Overall, the PICP is overestimated in all sectors, with outliers at the bottom (see Figure 8). However, looking at the distribution of the other metrics (MPIW and Winkler score), it is clear that the magnitude and range of errors, as with the PSLP, are highest for PV (see Figure 8). A short training window of 30 days performs best, although the window length is less relevant in the electricity and heat sectors. The significant differences for PV are due to the changeable weather and the seasonal dependency of PV performance. This approach can be further optimized in future work, with the greatest potential for optimization in PV generation. For example, temperature time series or calendar features could be used to improve the QPSLP. It could also be examined whether a parametric approach can be used if the distribution of the load time series is known. The advantage of a non-parametric approach (QR) in our work is that no assumptions need to be made about the distribution of the data, as shown in Section 3.4 and Figure 10. In our assessment, this advantage initially outweighs the disadvantages of higher computation times and the risk of quantile crossing, which was described by [57]; a solution for QR without quantile crossing was presented in [58]. However, monitoring the results and taking countermeasures against quantile crossing would be essential in future work. In order to be able to determine the uncertainty within the QPSLPs, a method was presented by which the CDF and PDF can be approximated from the quantiles of the QPSLPs. It was also shown how a joint PDF can then be determined from the convolution of two independent variables. For example, the PDF of the net load can be determined from the convolution (Section 2.4.3) of the electricity and PV PDFs. If the random variables are not independent of each other, then the correlation between them must be determined. According to Sklar's theorem [59], copulas can be used to consider the dependencies of two random variables in their joint distribution [42]. Convolution allows the joint probability density to be determined for different sectors at a point in time and to be applied in uncertainty quantification or risk optimization. Thus, the PNLFF can provide the basis for stochastic operational optimization, as demonstrated by [60].
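Monitoring for quantile crossing can be as simple as checking whether the predicted quantiles of each time step are non-decreasing. The sketch below also applies sorting (rearrangement) as one possible countermeasure; this is a generic post-processing step chosen for illustration, not the method of [58], and the function name and values are hypothetical.

```python
import numpy as np


def fix_quantile_crossing(q_pred):
    """Detect and repair quantile crossing by sorting each time step.

    q_pred: array of shape (n_times, n_quantiles) whose columns are ordered
    by increasing quantile level (e.g., 0.1, 0.2, ..., 0.9).
    Returns the rearranged quantiles and a boolean flag per time step.
    """
    q_pred = np.asarray(q_pred, dtype=float)
    crossed = np.any(np.diff(q_pred, axis=1) < 0, axis=1)
    return np.sort(q_pred, axis=1), crossed


# Hypothetical predictions in kW: the second time step shows a crossing.
q_pred = np.array([[10.0, 12.0, 15.0],
                   [11.0, 14.0, 13.0]])
q_fixed, crossed = fix_quantile_crossing(q_pred)
```

Non-crossing quantiles are also a precondition for the PCHIP-based CDF approximation, since the interpolation requires strictly increasing quantile values.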
In the last step of our work, a case study was used to show how the predictions or the PNL can be used to optimize the integration or operation of additional electrical loads under uncertainty. The advantage of considering the probabilistic net load is obvious: if all electrical consumers and generators are taken into account, it can be used for decision-making under uncertainty. Thus, not only technical limitations (e.g., the house connection point) but also economic and ecological factors can be optimized. As shown in the example of the optimized integration of the electric refrigerated trailer, a decision can be made in the simplest way. Then, in a further step, operation schedules for decentralized sector-integrated energy systems can be generated in a stochastic optimization process for load, renewable generation, and storage.

Conclusions
In this study, we hypothesized that there is an increased need for forecast-based energy and load management in distributed energy systems to support integrated electrification in all sectors, as well as renewable generation. In the case of small-scale energy systems, such forecasting tools should cost as little as possible and should be generated and adapted on the basis of the system's own collected data (e.g., from smart meters or PV inverters). Thereby, they can be used for energy management by means of uncertainty assessment. For this purpose, a framework was developed that enables the creation of PSLPs for the electricity, heat, and PV sectors, and the generation of a non-parametric probabilistic forecast from them using QR, thus moving towards the QPSLP. To this end, the question of how PSLP and QPSLP can be transferred from the electricity sector to others (i.e., heat and PV generation) was answered. For further applications, an approach was then presented on how to derive the respective empirical CDF and PDF from the QPSLPs and thus, by convolution of the PDFs of the electricity, heat, and PV generation forecasts, to determine the probabilistic distributed net load. Based on the PNL, a case study for the use of the PNLFF in energy management was ultimately shown.
In the first section of the paper, we focused on the "related work" of previous developments, advantages, and disadvantages in the field of standard load profiles and personalized standard load profiles. In addition, probabilistic forecasts or extensions, with a focus on a simple QR and the evaluation of uncertainty, were highlighted. In the second section, the methodology was presented, which gives an overview of the developed PNLFF and the data basis used. Based on this, deterministic forecasting methods were first presented. As a benchmark, a naive forecast model built on the data from one, two, and seven days previously was used. For the PSLP, different options were presented that can be applied depending on the sector and data. As a novelty, the PSLP was transferred to the heat and PV generation sectors, and options for switching day type characterization and seasonal consideration on and off were introduced. The PSLP was then used as an explanatory variable and input into a QR to generate a probabilistic forecast. In order to determine the net load of the QPSLPs and to be able to optimize internal own-consumption, a new approach for determining the empirical CDF and PDF by means of PCHIP interpolation was introduced. The PNL was then calculated via convolution of the individual PDFs. Finally, a case study was used to demonstrate the use of the PNL, showing how the results of the framework could be used to optimize the operation of the distributed power system under uncertainty. The PNLFF considers energy systems as a holistic, cross-sectoral system for which only a few pieces of locally collected measurement data are required. It includes the characterization of the load and generation data (PSLP), the generation of probabilistic forecasts (QPSLP), the approximation of empirical distributions (CDF and PDF), and the calculation of the net load (PNL) as a basis for optimization under uncertainty. Owing to its simplicity, there will be proprietary methods and tools that can produce a more accurate prediction with increased resources, but the framework is holistic, widely applicable, and transferable, making it an important contribution to the development of cleaner energy systems.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figure 1: Flow chart of the probabilistic net load forecasting framework

Figure 2: Density distribution of electricity, heat load, and PV generation over the entire observation period

Figure 3: Flow chart of the PSLP algorithm. The three different modes "standard" (solid lines), "fixed" (dashed), and "variable" (dotted) are distinguished in the selection method of the maximum past days (tmax or tmax,opt).

Figure 4: Typical ECDF by PCHIP interpolation of the quantiles [0.1:0.9] for one time increment (left graph) and EPDF/derivative (right graph) for the electricity sector

Figure 5: Evaluation of the forecasting results (MAE, RMSE, and MAPE) for the electricity load, comparing different PSLP modes and naive forecasts

Figure 6: Example day (April 20th 2022) of PSLP forecast for all sectors and the net load (electricity - PV)

Figure 7: Example day (April 20th 2022) for the QPSLP of electricity load, heat load, and PV generation using QR.

Figure 8: Boxplots of probabilistic forecasting scores of the PICP, MPIW and Winkler score for electricity, heat and PV, with 30, 90 and 120 day training window, respectively.

Figure 9: PCHIP interpolation of the QPSLP quantiles to a CDF and PDF from its derivative for all sectors

Figure 10: Comparison of the PCHIP CDF approximation and the Gaussian, beta and gamma distribution fit to the quantiles.

Figure 11: Separate PDFs of each sector (upper graph) and the joint PDFs after convolution (lower graph).

Figure 12: PNL at different time points in the case study (the left vertical line marks -10 kW, the right one 0 kW)

Figure 13: Important characteristic values of the PNL for another example day.

Table 3: Mean MAE and RMSE forecasting errors of the best PSLP.

Table 4: Mean MAPE and MASE of the best PSLP forecasts.

Table 5: Mean QPSLP (QR) scores of the rolling forecast

Table 6: Accumulated probabilities and expected values of the case study criteria