WIND SPEED FORECASTING IN BIG DATA AND MACHINE LEARNING: FROM PRESENTS, OPPORTUNITIES AND FUTURE TRENDS

Wind speed forecasting is an active research area because it spans both the climate and energy disciplines, in which forecasting is the most common research focus. Over the last decade, wind speed forecasting techniques have shifted significantly from traditional statistical methods to machine learning. This article discusses publication trends from 1945 to the end of 2020 using keyword co-occurrences.


Data collection
In this study, publication data were retrieved from Scopus using the keyword "wind speed forecasting" for the period 1945 to 2020, a span of 75 years.

Data analysis
Text is an important source of information and can be obtained from books, newspapers, websites, or e-mail messages. A text is a stretch of language, spoken or written, that carries meaning, serves a practical purpose for the public, and relates to the real world [31]. To analyze frequently occurring keywords, the most crucial step is to measure how often words appear together relative to how often they appear separately [32] [33] [34]. This quantity is the correlation between words. For text, the correlation between words is measured in binary form: two words either appear together or they do not. The common measure for such binary correlation is the coefficient given in Table 1 and Eq. (1).
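As a minimal sketch, assuming the binary correlation measure is the phi coefficient computed from a 2x2 table of keyword co-occurrence counts, the calculation could look as follows. The function names and the toy keyword sets are hypothetical illustrations and are not taken from the study.

```python
from math import sqrt


def cooccurrence_counts(documents, word_x, word_y):
    """Count the four cells of the 2x2 table for two keywords.

    documents: iterable of keyword sets, one set per publication
               (an assumed input format, not the study's actual data).
    Returns (n11, n10, n01, n00): both present, only X, only Y, neither.
    """
    n11 = n10 = n01 = n00 = 0
    for keywords in documents:
        has_x = word_x in keywords
        has_y = word_y in keywords
        if has_x and has_y:
            n11 += 1
        elif has_x:
            n10 += 1
        elif has_y:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table; returns 0.0 if a margin is empty."""
    with_x, without_x = n11 + n10, n01 + n00
    with_y, without_y = n11 + n01, n10 + n00
    denominator = sqrt(with_x * without_x * with_y * without_y)
    if denominator == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denominator


# Toy keyword sets (hypothetical).
documents = [
    {"wind speed", "forecasting"},
    {"wind speed", "forecasting"},
    {"wind speed"},
    {"solar radiation"},
]
counts = cooccurrence_counts(documents, "wind speed", "forecasting")
print(counts, round(phi_coefficient(*counts), 3))
```

A value near 1 means the two keywords almost always appear together or are both absent, a value near 0 means they appear independently, and a negative value means one tends to appear without the other.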
Chi-square feature selection is based on statistical theory. Eq. (2) relates two events, the occurrence of a feature and the occurrence of a category. Each feature is scored with this statistic, and the features are then sorted from the largest value to the smallest [8] [35]. A chi-square value greater than the critical value at the chosen significance level indicates rejection of the independence hypothesis. When the two events are dependent, the occurrence of the feature closely follows the corresponding category label.
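The ranking step can be sketched as follows, assuming the standard chi-square statistic for a 2x2 feature/category table. The cell names a, b, c, d, the function names, and the toy counts are hypothetical. The threshold 3.841 is the critical value for one degree of freedom at the 0.05 significance level.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for one feature and one category.

    a: documents in the category that contain the feature
    b: documents outside the category that contain the feature
    c: documents in the category that lack the feature
    d: documents outside the category that lack the feature
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # an empty margin carries no evidence of dependence
    return n * (a * d - c * b) ** 2 / denominator


def rank_features(tables, critical_value=3.841):
    """Sort features by chi-square score, largest first.

    Returns a list of (feature, score, independence_rejected) tuples.
    """
    scores = {name: chi_square_2x2(*cells) for name, cells in tables.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, score > critical_value) for name, score in ranked]


# Toy counts (hypothetical): "turbine" is concentrated in the category,
# while "the" is spread evenly and carries no information about it.
tables = {
    "the": (20, 30, 20, 30),
    "turbine": (30, 5, 10, 55),
}
for name, score, rejected in rank_features(tables):
    print(name, round(score, 2), rejected)
```

Features whose score exceeds the critical value are treated as dependent on the category and are the ones a selection step would keep.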

Wind Speed in Climate to Energy
Research on wind speed is important because it serves climate-related needs, including the assessment of the monsoon [36]. The monsoon in Indonesia is part of the East and Southeast Asian monsoons, and this extension of the monsoon system is called the North Australian monsoon [37]. The East Asian monsoon is characterized by a strong winter component. It forms during the Northern Hemisphere winter, namely in December, January and February [40]. High pressure lies over the Asian continent and low pressure over the Southern Hemisphere, where it is summer on the Australian continent, so the wind blows from Asia to Australia. During this period, from the southern tip of Sumatra through Java, Bali and Nusa Tenggara to Irian, the monsoon winds blow from west to east.
ROHAYANI, WARNARS1, MAURITSIUS1, ABDURRACHMAN
Conversely, during the Northern Hemisphere summer, namely June, July and August, low pressure lies over the Asian continent and high pressure over the Australian continent, so the wind blows from Australia to Asia [41]. From the southern tip of Sumatra through Java, Bali and Nusa Tenggara to Irian, the monsoon wind blows from east to west. This period brings dry air masses, so it coincides with the dry season in most parts of Indonesia. Wind speed is also useful in oceanographic studies, for example in analyzing the fertility of a body of water, which fluctuates constantly under the influence of oceanographic phenomena.
Meanwhile, a study of the North Pacific region [42], based on a two-year data series from 1999 and 2000, concluded that chlorophyll-a concentrations were higher during periods of relatively strong winds and decreased during periods of relatively weak winds. This pattern shows that, in most areas, increasing wind speed can deepen the ocean mixed layer vertically, which cools the ocean surface and increases the concentration of chlorophyll-a.
Wind speed studies are also useful for renewable energy [43]. Moving air has mass, density and velocity, and because of these factors the wind has both kinetic energy and potential energy [44] [45].
However, the velocity of the air matters more than the position of its mass relative to the earth's surface, so kinetic energy dominates over potential energy. Because moving air molecules carry kinetic energy, the number of air molecules passing through an area during a given period of time determines the amount of power available locally. This area is not an area of the earth's surface but the upright (vertical) area through which the wind passes. Different topography or altitude leads to different wind potential, and because wind power is proportional to the cube of wind speed, even a small difference in wind speed results in a large difference in power. Wind conditions and speed determine the rotor type and size. Average wind speeds from 3 m/s are adequate for small propeller-type wind turbines, above 5 m/s for medium wind turbines, and above 6 m/s for large wind turbines [46]. A wind power system thus uses windmills to generate electricity from the wind [47]. Wind energy is an alternative energy source with good prospects because it is always available in nature and is clean and renewable [48]. The utilization of wind energy goes through two conversion stages: first, the wind flow moves the rotor, which rotates in accordance with the wind; second, the rotation of the rotor is transmitted to a generator, which produces electricity. Wind energy is therefore kinetic energy, that is, energy due to wind speed, used to rotate windmill blades [49] [50].
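The cubic dependence can be made concrete with the standard expression for the kinetic power carried by wind through an area A perpendicular to the flow, P = 0.5 * rho * A * v^3. The sketch below is illustrative only: the air density of 1.225 kg/m^3 (a common sea-level reference value), the 10 m rotor diameter, and the function names are assumptions, and the result is the power in the wind, not the electrical output of a turbine.

```python
from math import pi

AIR_DENSITY = 1.225  # kg/m^3, assumed sea-level reference value


def swept_area(rotor_diameter_m):
    """Upright circular area swept by a rotor, in square metres."""
    return pi * (rotor_diameter_m / 2) ** 2


def wind_power_watts(speed_m_s, area_m2, density=AIR_DENSITY):
    """Kinetic power of the wind through an area: 0.5 * rho * A * v^3."""
    return 0.5 * density * area_m2 * speed_m_s ** 3


area = swept_area(10.0)  # hypothetical 10 m rotor
for speed in (3.0, 5.0, 5.5, 6.0):
    print(f"{speed:.1f} m/s -> {wind_power_watts(speed, area) / 1000:.2f} kW")

# A 10 % increase in speed raises the available power by about 33 %.
ratio = wind_power_watts(5.5, area) / wind_power_watts(5.0, area)
print(f"power ratio for 5.5 vs 5.0 m/s: {ratio:.3f}")
```

Doubling the wind speed multiplies the available power by eight, which is why the speed thresholds for small, medium and large turbines lie so close together.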

Scopus Database
In this paper, we perform a corpus analysis on a dataset generated from Scopus using the keyword "Wind Speed Forecasting". The search returned 8430 publications from 1945 to the end of 2020, including 2745 open access, 891 gold, 206 hybrid gold, 1043 bronze, and 1473 green access publications. Figure 1 shows that the first paper in the Scopus database was published in 1945 and that the number of publications rose sharply up to 2020, most notably from early 2000 to 2016.
Then, Table 2: Lei M., Shiyan L., Chuanwen J., Hongling L., Yan Z. [52], A review on the forecasting of wind speed and

CONCLUDING REMARKS
Based on this analysis, future research is expected to focus heavily on applying deep learning methods to wind speed forecasting. However, ensemble techniques and the very broad family of hybrid methods are still used because they combine information from both parametric and non-parametric methods, so that the resulting information is richer. Examples are ARIMA+FFNN [70], ARIMAX+FFNN [71], and SARIMA+SVR [72] [73]. Advanced techniques are also used, such as deep neural networks [74] [75], long short-term memory [76] [77], and the Facebook Prophet model [78] [79].
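The two-stage idea behind such hybrids can be sketched in a few lines: a parametric model captures the linear structure, and a non-parametric model is fitted to its residuals. This is not the cited authors' implementation. To stay self-contained, a least-squares AR(1) model stands in for ARIMA and a k-nearest-neighbour average stands in for the neural network. All function names and the toy series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares AR(1) with intercept: x[t] ~ a + b * x[t-1]."""
    previous, current = series[:-1], series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(previous, current))
    var = sum((p - mean_prev) ** 2 for p in previous)
    b = cov / var
    a = mean_curr - b * mean_prev
    return a, b


def knn_residual(states, residuals, query, k=3):
    """Non-parametric stage: mean residual of the k past states nearest to query."""
    ranked = sorted(zip(states, residuals), key=lambda pair: abs(pair[0] - query))
    nearest = ranked[:k]
    return sum(residual for _, residual in nearest) / len(nearest)


def hybrid_forecast(series, k=3):
    """One-step-ahead forecast = parametric part + residual correction.

    Returns (hybrid_forecast, parametric_forecast_only).
    """
    a, b = fit_ar1(series)
    states = series[:-1]
    residuals = [c - (a + b * p) for p, c in zip(states, series[1:])]
    linear_part = a + b * series[-1]
    correction = knn_residual(states, residuals, series[-1], k)
    return linear_part + correction, linear_part


# Toy hourly wind speeds in m/s (hypothetical values).
speeds = [4.1, 4.6, 5.2, 6.0, 5.4, 4.9, 4.3, 4.8, 5.5, 6.1, 5.6, 5.0]
hybrid, linear_only = hybrid_forecast(speeds)
print(round(linear_only, 3), round(hybrid, 3))
```

When the series is purely linear the residuals vanish and the hybrid forecast reduces to the parametric one. The second stage contributes only where the first stage leaves systematic structure behind.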
Meanwhile, wind speed series are complex and exhibit several levels of seasonality [80]: the wind speed at a given hour depends not only on the wind speed at the previous hour, but also on the wind speed at the same hour on the previous day, and on the wind speed at the same hour on the same day of the previous week. At the same time, there are many important exogenous variables that must be considered, especially climate-related variables.
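These dependencies translate directly into lagged input features. A minimal sketch, assuming an hourly series with no gaps, uses lags of 1, 24 and 168 hours for the previous hour, the previous day and the previous week. The function name and the synthetic series are hypothetical.

```python
def build_lag_features(hourly_speeds, lags=(1, 24, 168)):
    """Pair each observation with its lagged values.

    hourly_speeds: list of wind speeds at consecutive hours (no gaps assumed).
    lags: offsets in hours (previous hour, previous day, previous week).
    Returns a list of (features, target) tuples, starting at the first hour
    for which every lag is available.
    """
    longest = max(lags)
    rows = []
    for t in range(longest, len(hourly_speeds)):
        features = [hourly_speeds[t - lag] for lag in lags]
        rows.append((features, hourly_speeds[t]))
    return rows


# Synthetic series: the value at hour t is simply t, which makes lags easy to read.
series = [float(hour) for hour in range(200)]
rows = build_lag_features(series)
print(len(rows), rows[0])
```

Exogenous climate variables, where available, could be appended to each feature list in the same way before the rows are passed to a forecasting model.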
In addition, optimization techniques can be combined with hybrid models, as in ANFIS+Quantum-behaved PSO [81], ARIMA+FFNN+GA, VAR-NN-PSO [82], VAR-NN-GA [10], ARIMA+Deep Learning [83], and VAR+GSTAR+SVM [84]. Multiple kernel learning is another option. Its approaches include fixed rules, heuristic approaches, Bayesian approaches, and boosting approaches, and the combination can be linear, nonlinear, or data-dependent. A further technique is feature selection, which has three aims. It reduces overfitting, because less redundant data means less opportunity to make decisions based on noise. It improves accuracy, because less misleading data means better modeling accuracy. It reduces training time, because fewer data points reduce algorithm complexity and algorithms train faster. Examples include random forest, Boruta, XGBoost, random multinomial logit (RMNL), auto-encoding networks with a bottleneck layer, submodular feature selection, local-learning-based feature selection, and recommender systems based on feature selection, in which feature selection methods are introduced into recommender system research.

CONFLICT OF INTERESTS
The authors declare that there is no conflict of interests.