Impact of tropical convective conditions on solar irradiance forecasting based on cloud motion vectors

Intra-day forecasts of global horizontal solar irradiance (GHI) are widely produced by displacing the clouds on a geostationary satellite image to their future locations with cloud motion vectors (CMVs) derived from preceding images. CMV estimation methods assume rigid cloud bodies with advective motion. This assumption works reasonably well in mid-latitudes but can be strained in tropical and sub-tropical climatic zones during prolonged periods of seasonal convection. We study the impact of South Asian monsoon convection on the accuracy of CMV-based forecasts by analysing 2 years of forecasts from three commonly used CMV methods: block-match, Farnebäck (optical flow) and TV-L1 (optical flow). Forecasted cloud index (CI) maps of the entire image section are validated against analysis CI maps for the period 2018–2019 for forecast lead times from 0 to 5.5 h. Site-level GHI forecasts are validated against ground-measured data from two Baseline Surface Radiation Network stations, Gurgaon (GUR) and Tiruvallur (TIR), located in the hot semi-arid and tropical savanna climatic zones, respectively. The inter-seasonal variation of forecast accuracy is prominent. A clear link is found between an increase in convection, represented by a decrease in outgoing longwave radiation (OLR), and a decrease in forecast accuracy. The GUR site shows the highest forecast error in the southwest monsoon period and exhibits a steep rise in forecast error with increasing convection. At TIR, the highest forecast error occurs in December, during the northeast monsoon period. The impact of convection on the number of erroneous time blocks of predicted photovoltaic production is also studied. Our results provide insights into the contribution of convection to errors in CMV-based forecasts. They also show that OLR can be used as a feature in future forecasting methods to account for the impact of convection on forecast accuracy.


Introduction
Solar photovoltaic (PV) power forecasting is essential for the secure and economic injection of the generated electricity into the grid. Capturing cloud-induced output fluctuations is a major challenge in increasing the accuracy of such predictions. As of June 2022, the installed capacity of solar PV systems connected to the Indian electricity grid had reached 57 GW according to statistics from the Central Electricity Authority of India (https://cea.nic.in/installed-capacity-report/?lang=en). However, the current installed capacity is only 6.5% of the country's solar potential of 748 GW. This potential was estimated by the Ministry of New and Renewable Energy (https://mnre.gov.in/solar/current-status/) under the assumption that 3% of the total wasteland area is utilized for solar PV installation. Furthermore, the installed capacity is expected to increase manifold in the upcoming years in order to fulfil the target of 300 GW of solar generation capacity by 2030 (Malik et al 2020, Kumar et al 2021). In this regard, several studies have highlighted the increasing importance of generation forecasts. Examples include the reports of the Green Energy Corridors project (https://mnre.gov.in/img/documents/uploads/80f821f916274ab9b73ac8869a0fa619.pdf) under the Indo-German Energy Programme and the SAARC Energy Centre's report (www.saarcenergy.org/wp-content/uploads/2021/02/Draftof-Study-Report-on-Assessment-of-wind-and-solarpower-forecasting-techniques-in-SAARC-countries_02.12.2020.pdf). Short-term forecasting and scheduling of solar energy generation has become a widely pursued area of research and policy development in the Indian context. This is due to the challenges involved in the large-scale integration of solar PV into the existing grid (Das 2017).

The Indian Central Electricity Regulatory Commission (CERC, https://cercind.gov.in/2015/regulation/SOR7.pdf) and the state regulators (www.tnerc.gov.in/Regulation/files/Reg-120220211048Eng.pdf) set deviation margins (e.g. 10% of the nominal capacity) for fluctuating renewable generation systems such as solar PV within their area. Deviations are calculated for each 15 min time block. This has financial implications for the operators of large-scale solar PV systems when the deviation penalty margins are crossed. CERC regulation allows a fixed number of intra-day revisions to the day-ahead schedule of solar PV generators up to 1.5 h before the delivery time. In such a situation, choosing the less erroneous forecast can make a significant difference. Deviation of solar PV generation from the day-ahead forecasted schedule across a large region causes load-generation imbalance in the power system and hampers frequency stability. Short-term deviations of renewable generation from their forecasts in this time period require balancing capacities in the form of reserves. Due to the intermittency of solar PV and wind generators, power system operators in India sometimes need to perform emergency curtailment of solar PV and wind generation to maintain grid stability. This not only causes financial losses for these generators, but also wastes emission-free renewable energy.

Clouds are the main source of uncertainty in hours-ahead PV power forecasts, and satellite-based methods can capture their effects to a large extent. Geostationary satellite images provide information at a spatio-temporal resolution appropriate for cloud detection and cloud motion estimation. Deep convective systems present complex cloud situations that involve motion in all three dimensions along with growth and decay. The South Asian monsoon spans a few months, during which deep convective systems occur frequently. It is caused by the northward and southward propagation of the Inter-Tropical Convergence Zone (ITCZ) from the equator in response to the seasonal variation of the latitude of maximum insolation (Gadgil 2003, Sharma et al 2021).
This leads to the formation of two distinct monsoon patterns: the southwest or summer monsoon (June to September) and the northeast or winter monsoon (October to December). These occur during the northward and the southward shift of the ITCZ, respectively.
During the summer monsoon, the ITCZ advances northward from its near-equatorial position over the Indian Ocean into the subcontinent (20°-25° N) (Sikka et al 1986). Two factors lead to the development of a large-scale meridional surface temperature gradient: the intense solar heating through boreal spring and summer, and the difference in heat capacity between the South Asian landmass and the adjoining oceans (Turner and Annamalai 2012). The withdrawal of the summer monsoon and the onset of the winter monsoon are characterized by the reversal of the lower-level wind direction from southwest to northeast (Rajeevan et al 2012). During the winter monsoon, the ITCZ retreats towards its near-equatorial position, and the maximum cloudiness is observed over the southern peninsula (Wonsick et al 2009). The monsoon period in general is characterized by frequent formation and dissipation of convective clouds over the landmass.
Motion extraction techniques were originally used in video compression to remove redundancy between consecutive frames and thus reduce the total size (Cros et al 2014). GHI forecasts for future time instances are obtained by extrapolating cloud structures forward in time with the CMVs estimated from past consecutive images. This treats clouds as rigid bodies with purely advective motion. Various CMV techniques are already used commercially, and several others are available in the literature (Hammer et al 1999, Lorenz et al 2004, Lee et al 2017, Gallucci et al 2018). Block-matching is the most widely used operational CMV estimation method for geostationary images (Cros et al 2014a, Hammer et al 1999). It estimates the CMVs by calculating the spatial correlation between nearby blocks in consecutive images. Optical flow (OF) techniques emerged from research in computer vision for object detection and tracking (Bai and Huang 2018, Oh et al 2021). They use differential techniques to estimate the apparent motion of image objects between two sequential frames, caused by the movement of either the object or the camera (Urbich et al 2018). Urbich et al (2018) compared the performance of the Farnebäck and TV-L1 OF methods in forecasting cloud albedo for 10 exemplary days and found that TV-L1 performed best. André et al (2019) validated the performance of block-match, spatio-temporal autoregressive and scaled persistence methods in producing intra-day GHI forecasts for Guadeloupe island over a period of 2 years with satellite images. Kallio-Myers et al (2020) applied the Farnebäck technique to produce site-level GHI forecasts. They validated these forecasts against ground measurements from five sites in Finland for a period of 4 months, using smart persistence of satellite images as the reference to highlight the superior accuracy of Farnebäck.
Al-Amaren et al (2021) used framewise peak signal-to-noise ratio (PSNR) to validate the accuracy of motion vectors in video compression. Cros et al (2020) produced intraday GHI forecasts by applying the block-match method on satellite images and validated them against the ground measured irradiance from the Palaiseau Baseline Surface Radiation Network (BSRN) site for a period of 3 years covering different weather regimes. Oh et al (2021) performed a spatial analysis of the performance of several OF and deep learning based CMV extraction methods over the South Korean peninsula with 1 year of satellite images.
Several authors have implemented and validated forecasting methods for different satellites around the globe. Prasad and Kay (2021) benchmarked solar power forecasts based on Himawari-8 images against four sites in Australia for 1 month. The authors used the Farnebäck method for CMV estimation and Heliosat for converting pixel intensity to solar irradiance. They found that the site located in the arid climatic zone had the highest forecast error. Notable errors were reportedly due to changes in image contrast in situations involving rapidly developing cumulus congestus clouds. Kim et al (2017) estimated GHI over the Korean Peninsula and part of the Japanese islands from COMS images using the UASIBS/KIER model, which takes visible channel reflectance and infrared channel brightness temperatures as inputs. Yang et al (2019) presented a model for forecasting GHI up to 3 h ahead with visible and infrared channel images from the Fengyun-4 satellite. They validated it against measurements from a site in the Gobi Desert for some sample days. The authors used particle image velocimetry (PIV) to estimate the motion field. To derive GHI from satellite reflectivity values, they used a cloud/shadow detection algorithm based on spectral indices (Zhai et al 2018). Yang et al (2020) used Heliosat-2 to estimate GHI and the PIV method to estimate the motion field. They validated the forecasts against ground measurements from one site for a few typical months and found that the forecasts performed best in January and worst in July for the Chengde site. Cheung et al (2015) analysed the spatio-temporal variation in cloud cover and the resulting reduction in surface solar irradiance at eight sites distributed across Australia.
Mejia et al (2018) assessed the seasonal day-ahead GHI forecast accuracy of numerical weather prediction (NWP) data from the Weather Research and Forecasting (WRF) model. They validated the forecasts against ground measurements from a site in Las Vegas, Nevada, for the period August 2015 to December 2016. Similar studies on the seasonal variability of NWP-based day-ahead irradiance forecasts for different regions can be found in the literature (Lara-Fanego et al 2012, Ohtake et al 2013, 2015). Gregory et al (2012) identified two primary causes of day-ahead GHI forecast error: the incorrect representation of convective clouds in the tropics, and orographic lifting over mountainous areas in southeastern Australia. Huang et al (2018) performed a climatological validation and intercomparison of multiple NWP models over the entire Australian landmass with 1 year of data. By validating against ground-measured data from 13 sites, they found a strong correlation between the monthly forecast error and monthly cloudiness. Tuononen et al (2019) investigated how operational NWP forecasts of low and mid-level clouds affect the accuracy of GHI forecasts. They used 4 years of cloud and GHI observations from a site in Helsinki, Finland, and observed that the relative error in the solar irradiance forecast remains more or less constant throughout the year.
The aforementioned CMV-based forecast methods have primarily been tested for mid-latitude climatic zones with predominantly advective cloud motion. Wonsick et al (2009) extensively studied the seasonal variation of the total cloud cover at various locations inside the Indian monsoon region with images from Meteosat-5. However, they did not analyse the relation between cloud cover and GHI forecast accuracy. Mao and Wu (2007) and Yamazaki and Nakamura (2021) utilized satellite-estimated outgoing longwave radiation (OLR) as a measure of convection. Aicardi et al (2022) benchmarked the performance of block-match and three OF techniques with a 1 year dataset of GOES-East satellite images of a region in south-east South America. The authors found that TV-L1 performed best on average. However, they concluded that further studies of the relation between the regional cloudiness regime and CMV performance are required for different parts of the globe. A brief overview of the relevant studies is provided in table 1. Under the convection-dominated weather situations in the tropics, the underlying assumptions of CMV estimation may not hold. For an individual solar PV system, this would imply an increase in erroneous forecast updates and intra-day bids at the power exchange. With many such systems connected to the grid, the deviation of the actual generation from the forecast could pose significant load-generation balance problems for the grid operator. Additionally, the intra-annual or seasonal variability in this forecast error may affect long-term planning and the optimal location of solar PV systems (Davy and Troccoli 2011). In particular, no performance analysis of CMV methods in tropical climatic regions has yet covered a duration long enough to capture seasonal effects.
This paper analyses the performance of the operational block-match technique and two other commonly used OF methods, Farnebäck and TV-L1, over a period of 2 years (2018-2019). The following contributions are made:
• Quantification of the impact of seasonal convection on CMV-based solar irradiance forecast accuracy in two Köppen-Geiger climatic zones.

Furthermore, the error in albedo at 0.6 µm is less than that at 0.8 µm. This analysis uses visible channel data centred at 0.6 µm, with a spatial resolution of 3 km × 3 km at nadir, for the period January 2018 to December 2019.

Ground measurements
Ground-measured GHI data at 1 min temporal resolution are used to validate the satellite-predicted GHI. The data come from two stations of the World Meteorological Organization's BSRN in India (Kumar et al 2014, Driemel et al 2018), namely Gurgaon (GUR) and Tiruvallur (TIR). The GUR BSRN site, located in northern India (28.42° N, 77.16° E),

OLR
Monthly and daily averaged OLR data from the Climate Prediction Center of the National Centers for Environmental Prediction are used as a measure of convection. The OLR dataset is derived using a multi-spectral technique involving the water vapour channel and multiple infrared channels. It can therefore detect convection with higher certainty than the brightness temperature from any single infrared channel. It has a spatial resolution of 2.5° latitude × 2.5° longitude and a spatial coverage of (Liebmann and Smith 1996). It is derived from NOAA-18 polar-orbiting satellite images, which are available from September 2005 to the present, and has a daily temporal resolution. Because of the low spatial resolution, the OLR at any point also reflects convective activity in its wider vicinity. This is appropriate for the present study given the forecast horizon of up to 5.5 h. Since we study the effect of seasonal convection on irradiance forecast accuracy, the daily temporal resolution is sufficient. Figure 1 shows the OLR map constructed from these data for the South Asian section over the summer and winter monsoon periods of 2018. Ghanekar et al (2010), Jiang and Zhu (2020) and Su et al (2020) consider an OLR value of less than 250 W m⁻² an indication of tropical convection, and a value of less than 240 W m⁻² a sign of deep convection. As noted in Midhuna and Dimri (2019), OLR is influenced not only by the presence of convective clouds but also by cold surface temperatures.
Therefore, a careful interpretation of the OLR over the Himalaya is necessary.
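The OLR thresholds cited above can be expressed as a simple labelling rule for daily averaged values. The sketch below assumes the function name and label strings, which are for illustration only. As the preceding caveat notes, such a rule would mislabel low OLR caused by cold surfaces (e.g. over the Himalaya) as convection.

```python
def convection_class(olr_w_m2):
    """Label a daily averaged OLR value (W m-2) using the thresholds cited in the text.

    < 240 W m-2: deep convection; < 250 W m-2: tropical convection.
    The label strings are illustrative, not taken from the cited studies.
    """
    if olr_w_m2 < 240.0:
        return "deep convection"
    if olr_w_m2 < 250.0:
        return "convection"
    return "no significant convection"


# Example: a monsoon day, a marginal day and a clear pre-monsoon day
for value in (215.0, 246.0, 270.0):
    print(value, convection_class(value))
```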

Pre-processing satellite image
The original MSG-IODC High Rate Information Transmission (HRIT) image has a size of 3712 pixels × 3712 pixels and covers large parts of Africa and Asia. This image is cropped to 1200 pixels × 1200 pixels to fit the South Asian section, as shown in figure 2(a). Enough margin is allocated to detect incoming clouds in advance. The section extends from 1.39° N to 43.34° N and from 58.04° E to 122.27° E. The digital counts from the satellite image are converted to radiances with the calibration factors given in the image headers. The radiances are then converted to bidirectional reflectance factors (BRFs) according to the EUMETSAT report on the conversion of radiance to reflectance (www-cdn.eumetsat.int/files/2020-04/pdf_msg_seviri_rad2refl.pdf). From these, cloud index (CI) images are derived using the Heliosat method (Hammer et al 2015). The CI is a measure of cloudiness, as shown in figure 2, and is defined as

$$\mathrm{CI} = \frac{\rho - \rho_0}{\rho_c - \rho_0}$$

where ρ is the actual BRF value of a given pixel, and ρ0 and ρc are determined from a time series of BRFs. ρ0 is estimated individually for each pixel. It is taken as the most frequent low value of reflectance (5th percentile) in the time series of that pixel. ρc, on the other hand, is estimated for the entire image. It corresponds to the most frequent high value of reflectance (95th percentile of all values of ρ > 0.5) in the time series, considering all pixels. Both values are updated daily using images taken at the same time of day over the previous 30 days. The final step of the Heliosat method transforms CI to GHI as described in Hammer et al (2015). This step uses the clear sky model introduced in Dumortier (1995) with climatological turbidity values from an International Energy Agency (IEA) report by Remund and Domeisen (https://meteonorm.com/assets/publications/ieashc36_report_TL_AOD_climatologies.pdf).
Forecasts of GHI are produced by applying this last step of the Heliosat method on forecasted CI images and using the expected clear sky irradiance for the time ahead.
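The cloud index definition and the percentile rules for ρ0 and ρc can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the operational Heliosat code. The function name is hypothetical, and the history stack is assumed to hold BRF images taken at the same time of day on previous days (30 days in the text). The conversion from CI to GHI is not shown.

```python
import numpy as np


def heliosat_cloud_index(rho, rho_history):
    """Cloud index CI = (rho - rho_0) / (rho_c - rho_0).

    rho         -- current BRF image, shape (rows, cols)
    rho_history -- BRF images at the same time of day on previous days,
                   shape (days, rows, cols)
    """
    # rho_0: per-pixel ground reflectance, the 5th percentile of each pixel's series
    rho_0 = np.nanpercentile(rho_history, 5, axis=0)
    # rho_c: a single scene-wide cloud reflectance, the 95th percentile of all values > 0.5
    bright = rho_history[rho_history > 0.5]
    rho_c = np.nanpercentile(bright, 95)
    return (rho - rho_0) / (rho_c - rho_0)


# Synthetic history: ground reflectance 0.1 on most days, two cloudy days at 0.8
history = np.full((20, 4, 4), 0.1)
history[:2] = 0.8
print(heliosat_cloud_index(np.full((4, 4), 0.45), history))  # CI of 0.5 everywhere
```

If no pixel in the history exceeds 0.5, ρc is undefined (`nanpercentile` of an empty array). An operational implementation would need a fallback for that case.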

CMV estimation techniques
Although different in their implementation, the various CMV methods make similar basic assumptions:
• Cloud pixel intensity values remain constant along motion trajectories.
• Cloud motion is advective and there is no formation or dissipation of clouds between consecutive images.
• CMV fields are smooth over a window or block.
The block-match technique (Lorenz et al 2004, Gallucci et al 2018) estimates the displacement of cloud structures by searching between two consecutive images. Within a search window, it finds the image segment or block with the least mean squared error relative to the reference block. The operational block-match algorithm used at the German Aerospace Center (DLR, Institute of Networked Energy Systems) on Meteosat-10 derived CI images is tuned for the South Asian section. We tested the sensitivity of the block-match algorithm to different block sizes (21, 26, 31, 36, 41 and 46 pixels) and search window sizes (37, 42, 47, 52, 57 and 62 pixels). The best results are obtained with a block size of 31 pixels (≈31 × 3 km = 93 km at nadir) and a search window of 47 pixels (≈47 × 3 km = 141 km at nadir). For a 1 km resolution GOES-East image section of south-east South America, Aicardi et al (2022) found optimum block and search window sizes of 120 and 144 pixels.
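The search described above can be illustrated with a minimal exhaustive-search block matcher in NumPy. This is not the DLR operational algorithm. The function name, the non-overlapping block layout and the integer-pixel search are assumptions for illustration. With the tuned values, a 47-pixel search window around a 31-pixel block allows displacements of up to (47 - 31)/2 = 8 pixels (about 24 km) in each direction per image pair.

```python
import numpy as np


def block_match(prev, curr, block=31, search=47):
    """One motion vector per non-overlapping block, found by exhaustive MSE search.

    prev, curr -- consecutive CI images (2-D float arrays)
    block      -- block edge length in pixels
    search     -- search-window edge length in pixels (>= block)
    Returns integer arrays (dy, dx): row/column displacement of each block from prev to curr.
    """
    half = (search - block) // 2  # largest displacement tested in each direction
    n_rows = (prev.shape[0] - 2 * half) // block
    n_cols = (prev.shape[1] - 2 * half) // block
    dy = np.zeros((n_rows, n_cols), dtype=int)
    dx = np.zeros((n_rows, n_cols), dtype=int)
    for i in range(n_rows):
        for j in range(n_cols):
            r0, c0 = half + i * block, half + j * block
            ref = prev[r0:r0 + block, c0:c0 + block]
            best = np.inf
            # Try every candidate offset inside the search window
            for u in range(-half, half + 1):
                for v in range(-half, half + 1):
                    cand = curr[r0 + u:r0 + u + block, c0 + v:c0 + v + block]
                    mse = np.mean((ref - cand) ** 2)
                    if mse < best:  # keep the least mean squared error
                        best, dy[i, j], dx[i, j] = mse, u, v
    return dy, dx


# Synthetic check: shift a random field by 2 rows and 3 columns
rng = np.random.default_rng(0)
img = rng.random((64, 64))
dy, dx = block_match(img, np.roll(img, (2, 3), axis=(0, 1)), block=8, search=16)
print(dy[0, 0], dx[0, 0])
```

The block-wise vectors still have to be interpolated to pixel resolution before they can be used to displace a full CI image.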
OF employs differential methods to extract motion vectors for small displacements and uses a pyramidal approach to detect large displacements (Bai and Huang 2018, Li et al 2019). The OpenCV (http://opencv.org/) implementations of Farnebäck (Pérez et al 2013) and TV-L1 (Zach et al 2007) are used in this analysis. With a similar tuning approach, we found that a window size of 8 pixels (≈24 km at nadir) is optimum for Farnebäck. This is comparable to the window size of 22 pixels at 1 km resolution found optimum in Aicardi et al (2022). Furthermore, those authors showed that optimizing the parameters led to a maximum improvement of 2%-3% for Farnebäck and less than 1% for TV-L1. The time step parameter of TV-L1 is kept at 0.1, as in Urbich et al (2018). Tables 2 and 3 list the complete parameters used for Farnebäck and TV-L1.
We chose images with a time difference of 30 min because of the low spatial resolution of 3 km × 3 km. A higher temporal resolution is not necessary, since at this resolution slow-moving clouds such as cumulus may not be displaced by a full pixel over shorter intervals. Yang et al (2020) also chose a temporal resolution of 30 min when analysing irradiance forecasts from a channel with 2 km × 2 km spatial resolution. Figure 3 visualizes the CMV fields estimated by the three methods for one example image pair from 14 June 2019 (06:30 UTC and 07:00 UTC). The direction of motion is indicated by colour and the magnitude by colour intensity. In all three methods, the direction of rotation of the cyclonic formation fits well with the general counter-clockwise motion of cyclones in the northern hemisphere. Block-match detects more areas with motion due to its coarse resolution, while TV-L1 provides the smoothest motion field. Computing the CMV field from one image pair used in this analysis takes 3.5 s with TV-L1, 0.4 s with Farnebäck and 0.1 s with block-match.

Extrapolation of CI image with CMV
The 30 min ahead location of the pixels in the CI image is determined by adding the displacement vectors for each pixel obtained from the CMV estimation method to the row and column numbers of that pixel. These represent the new (row, column) locations of the pixels. Gridded interpolation is then applied to resample the CI pixel values in the new grid to the original (row, column) grid. The forecast for 60 min ahead is obtained by displacing the 30 min ahead forecast again by 30 min with the same displacement vectors. The same procedure is repeatedly applied for producing forecasts up to a horizon of 5.5 h.
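The forward displacement and gridded resampling described above can be sketched with SciPy's `griddata`. The sketch assumes that the CMV field is available per pixel (block-match vectors would first have to be interpolated to pixel resolution). Linear interpolation is an assumed choice, and the function name is hypothetical. Grid points that no displaced pixel covers (for example, at the inflow boundary) become NaN, which is one reason for the margin around the cropped section.

```python
import numpy as np
from scipy.interpolate import griddata


def extrapolate_ci(ci, dy, dx, steps=11):
    """Advect a CI image forward in 30 min steps with a fixed CMV field.

    ci     -- analysis CI image, shape (rows, cols)
    dy, dx -- row/column displacement per 30 min (scalars or arrays shaped like ci)
    steps  -- number of 30 min steps (11 steps = 330 min = 5.5 h)
    Returns a list of forecast CI images, one per step.
    """
    rows, cols = np.indices(ci.shape, dtype=float)
    # New (row, column) location of every pixel after one 30 min step
    moved = np.column_stack([(rows + dy).ravel(), (cols + dx).ravel()])
    forecasts = []
    current = ci.astype(float)
    for _ in range(steps):
        # Resample the displaced values back onto the original (row, column) grid;
        # the same displacement vectors are reused for every step.
        current = griddata(moved, current.ravel(), (rows, cols), method="linear")
        forecasts.append(current)
    return forecasts


# A CI field whose value equals the row index, moved down by one row per step
ci = np.repeat(np.arange(10.0)[:, None], 10, axis=1)
f30, f60 = extrapolate_ci(ci, 1.0, 0.0, steps=2)
print(f30[:3, 0])  # first row is NaN, then values shifted by one row
```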

Conversion of GHI to solar PV power output
In the first step, GHI is split into diffuse horizontal irradiance (DHI) and direct normal irradiance (DNI) using the Engerer2 model as implemented in Bright and Engerer (2019). The module tilt at each site is assumed to be equal to the site latitude, and an installed capacity of 100 MWp is assumed at both sites. DHI is transformed into diffuse tilted irradiance (DTI) using the model introduced in Perez et al (1990) and implemented in pvlib (https://pvlib-python.readthedocs.io/en/stable/reference/generated/pvlib.irradiance.perez.html). The global tilted irradiance (GTI) is then obtained by adding the DTI, the direct beam component and the ground-reflected component using the pvlib function get_total_irradiance (https://pvlib-python.readthedocs.io/en/v0.6.0/generated/pvlib.irradiance.get_total_irradiance.html). GTI is converted to DC power with the PVWatts model from pvlib (https://pvlib-python.readthedocs.io/en/v0.4.2/generated/pvlib.pvsystem.pvwatts_dc.html), as shown in equation (2). This step assumes that temperature and wind speed have no effect on module power output. The inverter AC power output is estimated using the PVWatts inverter model (https://pvlib-python.readthedocs.io/en/stable/reference/generated/pvlib.inverter.pvwatts.html), as shown in equation (3). The inverters are assumed to have a nominal efficiency of 96% and a reference efficiency of 96.37%. ζ is the ratio of the actual DC power input to the DC power input limit of the inverter.
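The DC and inverter steps can be sketched without pvlib by writing out the PVWatts formulas as documented for pvlib's `pvwatts_dc` and `inverter.pvwatts`. With no temperature effect, the DC model reduces to P_dc = GTI / 1000 × P_dc0. The inverter efficiency follows η = (η_nom / η_ref)(-0.0162ζ - 0.0059/ζ + 0.9858), and the output is clipped at η_nom × P_dc0. The sketch sets the inverter DC input limit equal to the 100 MWp array capacity (a DC/AC ratio of 1), which is a hypothetical assumption. The function names are also illustrative.

```python
import numpy as np


def pvwatts_dc_no_temperature(gti_w_m2, pdc0_mw=100.0):
    """DC power (MW) from global tilted irradiance, ignoring temperature and wind effects."""
    return np.asarray(gti_w_m2, dtype=float) / 1000.0 * pdc0_mw


def pvwatts_inverter(pdc_mw, pdc0_mw=100.0, eta_inv_nom=0.96, eta_inv_ref=0.9637):
    """AC power (MW) following the PVWatts inverter model.

    pdc0_mw is the inverter DC input limit (assumed equal to the array capacity).
    """
    pdc = np.asarray(pdc_mw, dtype=float)
    zeta = pdc / pdc0_mw  # ratio of actual DC input to the inverter DC input limit
    safe_zeta = np.where(zeta > 0, zeta, 1.0)  # avoid division by zero; masked below
    eta = eta_inv_nom / eta_inv_ref * (-0.0162 * zeta - 0.0059 / safe_zeta + 0.9858)
    pac = np.where(zeta > 0, eta * pdc, 0.0)
    pac = np.minimum(pac, eta_inv_nom * pdc0_mw)  # clip at the AC rating
    return np.maximum(pac, 0.0)  # no negative output at very low input


# At 1000 W m-2 the DC input equals the limit (zeta = 1) and the efficiency is 96%
print(pvwatts_inverter(pvwatts_dc_no_temperature(1000.0)))
```

The reference efficiency of 96.37% is exactly the value of the efficiency polynomial at ζ = 1. As a result, the inverter runs at its nominal 96% efficiency at full DC input.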

Forecast evaluation technique
The satellite-based forecasts from all four methods (CI persistence as the reference and the three CMV approaches) are validated at the section-wide level and at the two BSRN sites over the daytime period. Forecast horizons range from 0 to 330 min ahead in 30 min steps (see figures 4(a)-(c), 5(a)-(c), 7(a)-(c) and 8(a)-(c)). The forecasted CI maps for the entire section are validated against the real-time analysis CI maps for the period 2018-2019 using the PSNR metric shown in equation (7). PSNR is the ratio of the maximum possible value MAX_f of an m × n image or signal (here 1) to the magnitude of the distorting noise or error, RMSE_section. It is expressed in decibels (dB) (Poobathy and Chezian 2014). The section-wide PSNR values are computed separately for each forecast horizon from 0 to 330 min ahead. A higher PSNR implies a better match between the predicted and the analysis image. Site-level CI forecasts for the coordinates of the two BSRN stations are extracted from the forecasted CI maps for the calendar year 2018 and transformed to irradiance as described in section 2.2.1. The GHI forecasts are validated against ground measurements using the root mean square error metric shown in equation (4). The RMSE_site values are normalised by the average non-zero ground-measured GHI, computed separately for each forecast step from 30 to 330 min ahead (see equation (8)), to obtain the normalised root mean square error (nRMSE). The normalised mean absolute error (nMAE) is calculated as shown in equation (9).
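The metrics described above can be sketched in NumPy as follows. The sketch assumes that PSNR = 20 log10(MAX_f / RMSE_section) with MAX_f = 1, matching the verbal definition. It also assumes that both the site errors and the normaliser are restricted to samples with non-zero measured GHI, and that nMAE uses the same normaliser as nRMSE. Equations (7)-(9) are not reproduced here, so these choices are assumptions, and the function names are illustrative.

```python
import numpy as np


def psnr(ci_forecast, ci_analysis, max_value=1.0):
    """Peak signal-to-noise ratio (dB) of a forecast CI map against the analysis CI map."""
    diff = np.asarray(ci_forecast, dtype=float) - np.asarray(ci_analysis, dtype=float)
    rmse = np.sqrt(np.nanmean(diff ** 2))  # RMSE_section over all valid pixels
    return 20.0 * np.log10(max_value / rmse)


def nrmse_nmae(ghi_forecast, ghi_measured):
    """nRMSE and nMAE (%) normalised by the mean non-zero measured GHI."""
    f = np.asarray(ghi_forecast, dtype=float)
    m = np.asarray(ghi_measured, dtype=float)
    day = m > 0  # assumed daytime filter: non-zero measured GHI
    err = f[day] - m[day]
    norm = m[day].mean()
    return 100.0 * np.sqrt(np.mean(err ** 2)) / norm, 100.0 * np.mean(np.abs(err)) / norm


# A uniform CI error of 0.1 gives a PSNR of 20 dB
print(psnr(np.full((4, 4), 0.5), np.full((4, 4), 0.6)))
print(nrmse_nmae([110.0, 90.0, 0.0], [100.0, 100.0, 0.0]))  # both about 10 %
```

In the study, these metrics are evaluated separately for each forecast horizon.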
Additionally, a no-motion forecast, i.e. persistence of the CI map, is considered in this analysis as the worst-case reference. In this forecast, the same analysis CI map is assumed to persist up to 330 min ahead.

Validation against reference analysis maps
The forecast accuracy is expected to start from a high initial value and decrease as the forecast horizon increases. This is illus[...] CMV methods outperform the persistence method. The worst performance is observed in August, when the PSNR drops below 15 dB for forecast horizons longer than 120 min for all methods.
On average, TV-L1 outperforms all other methods at the section-wide level, as seen in table 4, where the 30 min ahead PSNR value is given as the initial best value.
The highest forecast accuracy for TV-L1 is observed in March of both years: 24.35 dB in 2018 and 23.76 dB in 2019, as shown in figures 4(d) and 5(d), respectively. During this period, the section-wide average CI is low (<0.15) and the OLR is high (>260 W m⁻²). Low forecast accuracy for the entire section is observed during the southwest monsoon period of June to September (see table 5) in both 2018 and 2019. The lowest values for TV-L1, 20.42 dB and 20.80 dB respectively, occur in August. High values of section-wide averaged CI (>0.2) are also observed during the southwest monsoon period, as shown in figures 4(e) and 5(e). The section-wide averaged OLR remains close to or below 250 W m⁻² during this period, as shown in figures 4(f) and 5(f). Intermediate accuracy values of 23.11 dB and 22.56 dB are observed in December 2018 and 2019, respectively, during the northeast monsoon period (see table 5). Both CI and OLR remain close to their threshold values of 0.2 and 250 W m⁻², respectively, during this period. The impact of convection and cloudiness on forecast accuracy is further analysed on a day-to-day basis. Figure 6 shows that the forecast accuracy increases with increasing daily averaged OLR. Low forecast accuracy is also typically observed for days with high section-wide CI.

Validation against ground measurements
The variation in forecast error with forecast horizon for the three selected months is shown in figures 7(a)-(c) for the GUR site and figures 8(a)-(c) for the TIR site. As shown in figure 7(d), the lowest forecast nRMSE at the GUR site is observed in March (9.35% for TV-L1). The OLR at the site is also high (>280 W m⁻²) during this period, as seen in figure 7(e). A high forecast nRMSE is observed at the GUR site during the July to September southwest monsoon period (see table 5) of 2018. In July and August 2018, the site also shows monthly averaged OLR of less than 230 W m⁻², indicative of deep tropical convection. The highest forecast nRMSE for the CMV-based methods is observed in July (29.46% for TV-L1). Persistence shows the highest forecast error in August (32.41%), when the OLR at the site is also at its lowest (220 W m⁻²). A relatively low forecast nRMSE (12.37% for TV-L1) and high OLR (270 W m⁻²) are observed again in December. As figure 9(a) shows, the GHI forecast nRMSE at the GUR site rises steeply as the OLR decreases below 250 W m⁻². The p-value is close to 0, implying that the correlation is significant. High CI (>0.4) is predominantly observed when the OLR is below 250 W m⁻². The quadratic polynomial fit shows the best correlation, with a coefficient of 0.7 (see figure 10(a)).
Figure 8(d) shows the averaged 30 min ahead GHI forecast error at the TIR site for each month of 2018. Note that ground-measured GHI from the TIR station is unavailable for October and November 2018 because of missing or poor-quality data. The lowest forecast nRMSE at the TIR site, 9.47% for TV-L1, is observed in March. During the southwest monsoon month of August 2018 (see table 5), a high forecast nRMSE (20.67% for TV-L1) and the lowest OLR (<210 W m⁻²) are observed. The highest forecast error (23.46% for TV-L1) occurs in December, during the northeast monsoon period (see table 5). However, a relatively high OLR of 265 W m⁻² is observed in December. As shown in figure 9(b), the daily averaged error in the 30 min ahead GHI forecast increases gradually with decreasing OLR. The low p-values indicate that the correlations are significant. Days with high CI (>0.4) typically occur when the OLR is below 200 W m⁻². The exponential fit of CI against OLR shows the best correlation coefficient (0.72), see figure 10(b). On average, TV-L1 shows the best results at both sites (see table 6).
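A quadratic fit of daily CI against daily OLR and a significance test of the error-OLR relation could be sketched as follows. The text does not specify how the fit correlation is computed. Here it is assumed to be the Pearson correlation between observed and fitted values, and the p-value is assumed to come from a Pearson test. The exponential fit used for TIR is not shown, and the function names are illustrative.

```python
import numpy as np
from scipy import stats


def quadratic_fit(olr, ci):
    """Fit CI = a*OLR**2 + b*OLR + c.

    Returns the coefficients (a, b, c) and the correlation coefficient
    between observed and fitted CI.
    """
    olr = np.asarray(olr, dtype=float)
    ci = np.asarray(ci, dtype=float)
    coeffs = np.polyfit(olr, ci, deg=2)
    fitted = np.polyval(coeffs, olr)
    r = np.corrcoef(ci, fitted)[0, 1]
    return coeffs, r


def linear_significance(olr, error):
    """Pearson correlation and two-sided p-value between daily OLR and daily forecast error."""
    r, p = stats.pearsonr(olr, error)
    return r, p


# Synthetic daily values: CI rising quadratically as OLR falls below 260 W m-2
olr_days = np.linspace(200.0, 300.0, 50)
ci_days = 0.001 * (olr_days - 260.0) ** 2 + 0.1
print(quadratic_fit(olr_days, ci_days)[1])            # close to 1 for noise-free data
print(linear_significance(olr_days, 30.0 - 0.1 * olr_days))
```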

Evaluation of forecasted power
The percentage of time blocks with Error_PV exceeding 10% is highest at both sites during the convective period with low OLR (see figure 12). The greatest benefit of using OF-based methods over persistence is also observed during this convective period. Unlike the nRMSE of the GHI forecast, the percentage of erroneous blocks is not highest in December at the TIR site. The monthly GHI forecast nMAE plot for TIR shows a similar pattern (see figure 11). Note that figures 7 and 8 differ from figures 11 and 12 in the order of the lines. This is due to the change in forecast horizon (30 min ahead versus 90 min ahead).
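Because the definition of Error_PV is not reproduced here, the sketch below assumes a hypothetical definition. It takes Error_PV as the absolute difference between forecast and measured power, averaged over each 15 min time block and divided by the installed capacity, mirroring the regulatory deviation margin of 10% of nominal capacity. The function name and the 1 min input resolution are also assumptions. Night-time blocks are not excluded.

```python
import numpy as np


def erroneous_block_percentage(p_forecast, p_measured, capacity,
                               block_len=15, threshold=0.10):
    """Percentage of 15 min blocks whose deviation exceeds threshold * capacity.

    p_forecast, p_measured -- 1 min power series in the same unit as capacity
    """
    f = np.asarray(p_forecast, dtype=float)
    m = np.asarray(p_measured, dtype=float)
    n_blocks = len(f) // block_len
    n = n_blocks * block_len  # drop an incomplete trailing block
    f_blocks = f[:n].reshape(n_blocks, block_len).mean(axis=1)
    m_blocks = m[:n].reshape(n_blocks, block_len).mean(axis=1)
    error_pv = np.abs(f_blocks - m_blocks) / capacity  # hypothetical Error_PV definition
    return 100.0 * np.mean(error_pv > threshold)


# One hour at 1 min resolution for a 100 MW plant: block deviations of 0, 20, 5 and 10 MW
forecast = np.full(60, 50.0)
measured = np.concatenate([np.full(15, v) for v in (50.0, 70.0, 55.0, 40.0)])
print(erroneous_block_percentage(forecast, measured, 100.0))  # only the 20 MW block exceeds 10 %
```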

Discussion
Of the methods tested, TV-L1 outperforms the others on average at both the section-wide and the site level. However, the inter-seasonal difference in forecast accuracy is far more significant than the inter-method difference. The highest 30 min ahead section-wide CI forecast accuracy is observed in March. It coincides with high section-wide OLR, indicating very little convection, and low section-wide CI, signifying the prevalence of clear skies. The forecast accuracy then drops to a minimum in August, accompanied by a sharp dip in OLR and a rise in CI, indicating a period of deep tropical convection. This is because the southwest monsoon, active in August (see table 5), affects a large part of the Indian subcontinent and the parts of South Asia shown in the image section. The forecast accuracy rises to an intermediate value in December. This could be attributed to the relatively narrow area of influence of the northeast monsoon (Wonsick et al 2009, Misra and Bhardwaj 2019). In general, low forecast accuracy is expected for periods with heavy convection and high cloudiness. Huang et al (2018) found a similar relation between cloudiness and NWP forecast accuracy at 13 irradiance measurement stations across the Australian landmass. Our analysis demonstrates the contribution of convection to the low accuracy of satellite-based CI image forecasts. Gregory et al (2012) reported that the difficulty of representing convection at the lower spatial resolution of an NWP model increases day-ahead GHI forecast error. The same effect is observed here in satellite-based intra-day GHI forecasts, which have a higher spatial resolution. However, a similar multi-year analysis of NWP and ground-measured GHI data for a site in Helsinki, in the boreal climatic zone, revealed no seasonal difference in the relative error of predicted GHI (Tuononen et al 2019).
The seasonal variation of the 30 min ahead GHI forecast error at the GUR site is similar to that observed for the section-wide CI forecast, except in December. The lowest average forecast error is observed in March and the highest in August, during the southwest monsoon period (see table 5); the error rises by a factor of 3 from March to August. As in the section-wide case, this rise is accompanied by very low OLR. In December, however, the average forecast error is low because the GUR site lies in a hot semi-arid climatic zone outside the influence of the northeast monsoon.
Higher average GHI forecast errors are observed at the TIR site. Here too, the lowest 30 min ahead GHI forecast error occurs in March, after which the error climbs by a factor of 2 until August, when the OLR drops to its lowest value. The forecast error rises further and peaks in December, during the northeast monsoon period (see table 5). However, the OLR in December is relatively high compared to August. This could be attributed to the lower cloud top height observed during the northeast monsoon, as noted in Amudha et al (2016) and Rajeevan et al (2012). Additionally, the northeast monsoon of December 2018 was reported to be deficient in rainfall (https://mausam.imd.gov.in/chennai/mcdata/ne_monsoon_2018.pdf), which fits well with the relatively high OLR observed. On average, 30 min ahead GHI forecast errors are higher at TIR than at GUR. Our results show that this is due to the higher cloudiness at TIR, located in an Aw (tropical savanna) climatic zone, than at GUR, located in a BSh (hot semi-arid) climatic zone. Longer periods of high cloudiness at sites in the Aw zone than in the BSh zone have also been reported in Bojanowski et al (2018). At GUR, the majority of the cloudy period falls within the seasonal southwest monsoon convection; as a result, the forecast error increases sharply with decreasing OLR. TIR has a longer cloudy period, and not all of its forecast error is due to deep convection with high cloud tops, as indicated by the higher OLR at the site during the northeast monsoon than during the southwest monsoon. Therefore, the increase in forecast error with decreasing OLR is more gradual at TIR.
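The contrast between a steep rise in error with falling OLR (GUR) and a more gradual one (TIR) can be quantified, for instance, by the least-squares slope of monthly forecast error against monthly OLR. The following is a minimal sketch under that assumption; the study does not state that it fits such a slope, the function name `error_olr_slope` is hypothetical, and the input values below are synthetic illustrations, not data from this study.

```python
import numpy as np


def error_olr_slope(olr, error):
    """Least-squares slope of forecast error against OLR (error units per W m^-2).

    A more negative slope means the error grows more steeply as OLR falls,
    i.e. as convection deepens. The linear fit is an illustrative assumption.
    """
    olr = np.asarray(olr, dtype=float)
    error = np.asarray(error, dtype=float)
    # np.polyfit with deg=1 returns [slope, intercept]
    slope, _intercept = np.polyfit(olr, error, deg=1)
    return slope


# Synthetic monthly values for two hypothetical sites (NOT study data)
olr = [280.0, 250.0, 220.0, 190.0]         # OLR in W m^-2
steep_site = [10.0, 20.0, 30.0, 40.0]      # e.g. nRMSE in %
gradual_site = [25.0, 28.0, 31.0, 34.0]

steep = error_olr_slope(olr, steep_site)
gradual = error_olr_slope(olr, gradual_site)
```

Here `steep` is about -0.33 and `gradual` about -0.1 % per W m^-2, so the first hypothetical site shows the sharper increase in error as OLR decreases. A linear fit is only a first-order summary; a rank correlation would be an alternative if the relation is clearly non-linear.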
By analysing the percentage of erroneous time blocks in the forecasted 90 min ahead PV production for each method, we observed that the OF methods provided the greatest improvement over persistence during convective situations with low OLR. Unlike the nRMSE, the number of erroneous blocks and the nMAE at the TIR site are not highest in December. This can be attributed to a few large forecast errors, whose effect is amplified by the nRMSE metric because it squares the errors.
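Why a few large errors raise the nRMSE without changing the nMAE can be shown with a small numerical sketch. Normalising both metrics by the mean measured value is an assumption made here for illustration; the function names and the synthetic series are hypothetical.

```python
import numpy as np


def nrmse(forecast, measured):
    """Root mean square error normalised by the mean measured value (assumed normalisation)."""
    f = np.asarray(forecast, dtype=float)
    m = np.asarray(measured, dtype=float)
    return np.sqrt(np.mean((f - m) ** 2)) / np.mean(m)


def nmae(forecast, measured):
    """Mean absolute error normalised by the mean measured value (assumed normalisation)."""
    f = np.asarray(forecast, dtype=float)
    m = np.asarray(measured, dtype=float)
    return np.mean(np.abs(f - m)) / np.mean(m)


measured = np.full(10, 500.0)       # synthetic GHI in W m^-2
uniform = measured + 50.0           # ten moderate errors of 50 W m^-2
one_large = measured.copy()
one_large[0] += 500.0               # nine perfect values, one 500 W m^-2 error
```

Both forecasts have the same nMAE (0.1), but the nRMSE of `one_large` is about 0.32 versus 0.1 for `uniform`: squaring lets the single large error dominate, which is consistent with the December nRMSE at TIR standing out while the nMAE and the erroneous-block count do not.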

Conclusion
By analysing 2 years of Meteosat-8 derived image sections of the Indian subcontinent and parts of South Asia, and 1 year of ground-measured GHI data from two BSRN sites in India, we demonstrated the negative impact of the South Asian monsoon on the accuracy of CMV based forecasts. Our findings suggest that a direct link can be established between convection and high forecast error. As discussed previously, the seasonality of forecast error clearly depends on the climatic zone (Gregory et al 2012, Huang et al 2018, Tuononen et al 2019). Our study adds to this by showing that the limitations observed in day-ahead NWP based prediction are also present in intra-day CMV based forecasts. The results also show that the difference in accuracy between the block-match and the OF methods tested is not significant compared with the inter-seasonal difference in forecast accuracy. The section-wide analysis provides a broader assessment than single-site validation and can give power system operators and policymakers insights into possible load-generation mismatch scenarios. We further examined how the influence of convection on forecast error differs between two sites located in different Köppen-Geiger climatic zones. Tropical savanna locations like TIR experience higher cloudiness, higher forecast errors and a more gradual increase in forecast error with decreasing OLR than semi-arid locations like GUR.
Tropical and sub-tropical regions like the Indian subcontinent receive high solar irradiance and are undergoing a rapid increase in grid-connected solar PV capacity. However, the economic operation of such PV systems, and of the power system as a whole, is limited by the inherent inaccuracy of forecasts. Our analysis provides timely information on the quality of intra-day forecasts that can be expected. Viewed in the context of the deviation and imbalance settlement regulations, these results provide stakeholders in large-scale PV systems with an estimate of the average risk or expected losses due to deviations. On the one hand, further development of models that better incorporate the effect of convection on GHI prediction is a major open question from an energy-meteorology point of view. On the other hand, knowing that such a seasonal limitation of forecast accuracy exists would help grid operators and solar PV operators to allocate resources or bid in the power exchange more effectively.
Convection thus limits the accuracy of satellite based CMV methods. One option could be to use images of higher spatio-temporal resolution, such as those from ground-based all-sky imagers, which would allow a finer separation of the advective and convective parts. However, it is still not feasible to model the formation and dissipation of clouds with these methods. Another way forward is to use data from satellite and ground imagers as inputs to large eddy simulations, which could potentially simulate cloud formation and dissipation. These topics are part of our group's planned future work. Furthermore, since higher errors are expected during the convective period, a probabilistic forecast could be more useful than a deterministic one in such situations.
Apart from convection, aerosols have a significant impact on satellite estimated or predicted GHI in the region considered here, depending on the choice of clear-sky model. However, this factor was ignored in the current study in order to focus purely on cloudiness and convection.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.