Data Mining Based on Neural Networks for Gridded Rainfall Forecasting

The application of neural networks in data mining has become widespread. Although neural networks may have complex structures and long training times, they tolerate noisy data well and can achieve high accuracy. Over the last decade, the artificial neural network (ANN) has emerged as an analysis and forecasting tool in the field of weather. In this chapter, data mining based on neural networks is used to forecast daily rainfall over the Indian region. An ANN has been trained to forecast the rainfall of the current year from the previous year's rainfall for the months of June to September. The ANN thus trained has demonstrated promising results.


Introduction
The application of neural networks in data mining has become widespread. Although neural networks may have complex structures and long training times, they tolerate noisy data well and can achieve high accuracy. Over the last decade, the artificial neural network (ANN) has emerged as an analysis and forecasting tool in the field of weather. In this chapter, data mining based on neural networks is used to forecast daily rainfall over the Indian region. An ANN has been trained to forecast the rainfall of the current year from the previous year's rainfall for the months of June to September. The ANN thus trained has demonstrated promising results.

Literature review
ANN has been applied in a few weather forecasting cases in the past. A neural network using input from the Eta Model and upper air soundings was developed by Hall et al. (1999) to produce probability of precipitation (PoP) and quantitative precipitation forecasts (QPF) for the Dallas-Fort Worth, Texas, area. Forecasts from two years were verified against a network of 36 rain gauges. The resulting forecasts were remarkably sharp, with over 70% of the PoP forecasts being less than 5% or greater than 95%.
A neuro-fuzzy system has been used for rainfall forecasting, with data from 1893-1933 as the training set and data from 1934-1980 as the test set. ANN has shown outstanding forecasting performance in many other weather-related forecasts (Hayati & Mohebi, 2007; Chattopadhyay, 2007; Paras et al., 2007; Collins & Tissot, 2008). In this chapter, we have tried to forecast rainfall based only on the latitude and longitude of the previous year's rainfall datasets, and the results were found to be very convincing.

About artificial neural network
An ANN is a mathematical or computational model inspired by the structure and/or functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation (Sivanandam et al., 2009). In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Modern neural networks are nonlinear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs or to find patterns in data. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. For this configuration, network functions are used for training and testing the network, as explained in the following sections.

Network function
The word 'network' refers to the interconnections between the neurons in the different layers of each system. The most basic system has three layers. The first layer has input neurons, which send data via synapses to the second layer of neurons and then via more synapses to the third layer of output neurons. More complex systems have more layers of neurons, and some have more input and output neurons. The synapses store parameters called "weights", which are used to manipulate the data in the calculations.
The layers are connected through the mathematics of the system algorithms. The network function f(x) is defined as a composition of other functions g_i(x), which can themselves be defined as compositions of other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables, as shown in Fig. 1.
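One widely used form of this composition is the nonlinear weighted sum. It is given here as an illustrative assumption, not necessarily the exact function behind Fig. 1. The w_i are the weights stored on the connections and K is a fixed transfer function such as the hyperbolic tangent:

```latex
f(x) = K\left( \sum_{i} w_i \, g_i(x) \right)
```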

Training and testing the network
In an Artificial Neural Network, the system parameters are changed during operation, normally called the training phase. After the training phase, the Artificial Neural Network parameters are fixed and the system is deployed to solve the problem at hand (the testing phase). The Artificial Neural Network is built with a systematic step-by-step procedure to optimize a performance criterion or to follow some implicit internal constraint, which is commonly referred to as the learning rule (Kosko, 2005). The input/output training data are fundamental in neural network technology, because they convey the information necessary to "discover" the optimal operating point. The nonlinear nature of the neural network processing elements (PEs) gives the system great flexibility to achieve practically any desired input/output map, i.e., some Artificial Neural Networks are universal mappers.
An input is presented to the neural network and a corresponding desired or target response is set at the output (when this is the case, the training is called supervised). An error is computed from the difference between the desired response and the system output. This error information is fed back to the system, which adjusts its parameters in a systematic fashion (the learning rule). The process is repeated until the performance is acceptable. It is clear from this description that the performance hinges heavily on the data. If one does not have data that cover a significant portion of the operating conditions, or if the data are noisy, then neural network technology is probably not the right solution. On the other hand, if there is plenty of data and the problem is too poorly understood to derive an approximate model, then neural network technology is a good choice. In artificial neural networks, the designer chooses the network topology, the performance function, the learning rule, and the criterion to stop the training phase, but the system automatically adjusts the parameters. It is therefore difficult to bring a priori information into the design, and when the system does not work properly it is also hard to refine the solution incrementally. However, ANN-based solutions are extremely efficient in terms of development time and resources, and in many difficult problems artificial neural networks provide performance that is difficult to match with other technologies.
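The supervised loop described above (present an input, compare the output with the target, feed the error back, repeat until performance is acceptable) can be sketched in Python for a single linear neuron. This is a minimal illustration using the delta (LMS) rule, not the chapter's Matlab setup; the function name, learning rate, stopping goal and toy data are all assumptions made for the example.

```python
# Minimal sketch of supervised, error-driven training (delta / LMS rule)
# on a single linear neuron. All names and values are illustrative.

def train_linear_neuron(samples, learning_rate=0.1, goal=1e-6, max_epochs=1000):
    """samples: list of (inputs, target) pairs. Returns (weights, bias, epochs)."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        sum_sq_error = 0.0
        for inputs, target in samples:
            output = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - output  # desired response minus system output
            # Learning rule: adjust each parameter in proportion to the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
            sum_sq_error += error * error
        # Stopping criterion: mean squared error over the epoch is small enough.
        if sum_sq_error / len(samples) < goal:
            return weights, bias, epoch
    return weights, bias, max_epochs

# Hypothetical toy data generated from target = 2*x1 - x2 + 0.5.
data = [((0.0, 0.0), 0.5), ((1.0, 0.0), 2.5),
        ((0.0, 1.0), -0.5), ((1.0, 1.0), 1.5)]
weights, bias, epochs = train_linear_neuron(data)
print(weights, bias, epochs)
```

The designer's choices named in the text appear here as explicit arguments: the learning rule is the update inside the loop, and the stopping criterion is the `goal` threshold.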

MLP back propagation network
This is the most common neural network model, also known as a supervised network because it requires a desired output in order to learn. The goal of this type of network is to create a model that correctly maps the input to the output using historical data, so that the model can then be used to produce the output when the desired output is unknown.
In this network, shown in Fig. 2, the input data are fed to the input nodes and then passed to the hidden nodes after being multiplied by a weight. A hidden node adds up the weighted inputs received from the input nodes, adds the bias, and passes the result through a nonlinear transfer function. The output node performs the same operation as a hidden node. This type of network is preferred because back propagation learning is a popular algorithm for adjusting the interconnection weights during training, based upon the generalized delta rule (Kosko, 2005).
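The forward pass just described can be sketched in Python as follows. It assumes a tangent-sigmoid hidden layer and a linear output node, as stated in the Methodology section; the layer sizes and all weight values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP with a single output node."""
    # Each hidden node: weighted sum of the inputs, plus bias, through tanh.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(hidden_weights, hidden_biases)]
    # Output node: weighted sum of the hidden activations plus bias
    # (linear transfer function, an assumption for this sketch).
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Three scaled inputs and two hidden nodes; the numbers are illustrative only.
x = [1.0, 0.0, 0.0]
hidden_weights = [[1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]]
hidden_biases = [0.0, 0.0]
output_weights = [2.0, 3.0]
output_bias = 0.5

y = mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
print(y)  # equals 2 * tanh(1) + 0.5 for these weights
```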

Case study of rainfall forecasting

Datasets used
A very high resolution (0.5° × 0.5°) daily rainfall (in mm) dataset for mesoscale meteorological studies over the Indian region has been provided by the Indian Meteorological Department (IMD) and described by Rajeevan & Bhate (2009). The dataset is in .grd format, and a control file describing the structure of the .grd file has been provided. There is one .grd file for each year of rainfall. The dataset consists of daily rainfall data for each year of the period 1984-2003 and covers the geographical region from longitude 66.5°E to 100.5°E and latitude 6.5°N to 38.5°N. There are 4485 grid-point readings every day, and the rainfall records for 122 days (June to September) per year are selected for analysis, i.e. 547,170 records out of a total of 1,637,025 records for one year of rainfall.
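The record counts quoted above follow directly from the grid extent and resolution. The short Python check below reproduces them; the helper name `grid_points` is illustrative, and a 365-day year is assumed for the annual total.

```python
def grid_points(start, stop, step):
    """Number of grid points from start to stop inclusive at the given spacing."""
    return round((stop - start) / step) + 1

n_lon = grid_points(66.5, 100.5, 0.5)   # longitudes 66.5°E to 100.5°E
n_lat = grid_points(6.5, 38.5, 0.5)     # latitudes 6.5°N to 38.5°N
points_per_day = n_lon * n_lat

monsoon_days = 30 + 31 + 31 + 30        # June, July, August, September
monsoon_records = points_per_day * monsoon_days
year_records = points_per_day * 365     # assumes a non-leap year

print(n_lon, n_lat, points_per_day)     # 69 65 4485
print(monsoon_days, monsoon_records)    # 122 547170
print(year_records)                     # 1637025
```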

Data pre-processing
The steps followed to pre-process the .grd file, so that the ANN can be trained and tested, are listed below:
1. The .grd file has been converted to a .dat file using a FORTRAN (Formula Translator) programme. This dataset is very large.
2. The .txt files have been exported to an Excel worksheet and then to an Access database.
Before training, the inputs and outputs have been scaled so that they fall in the range [-1, 1].
The following code has been used at the Matlab prompt:

    [pn1992,minp,maxp,tn1992,mint,maxt] = premnmx(linput,loutput)

The original network inputs and targets are given in the matrices linput and loutput. The normalized inputs and targets, pn1992 and tn1992, that are returned will all fall in the interval [-1, 1]. The vectors minp and maxp contain the minimum and maximum values of the original inputs, and the vectors mint and maxt contain the minimum and maximum values of the original targets.
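For readers without the Matlab toolbox, the scaling can be sketched in plain Python. The sketch assumes the usual min-max formula, pn = 2 (p - minp) / (maxp - minp) - 1, applied to each row (one variable per row, one sample per column). The function names and sample values are hypothetical, and a row whose values are all equal would need special handling that is omitted here.

```python
def scale_rows(rows):
    """Scale each row (one variable) to [-1, 1]. Returns (scaled, mins, maxs)."""
    mins = [min(row) for row in rows]
    maxs = [max(row) for row in rows]
    scaled = [[2.0 * (v - lo) / (hi - lo) - 1.0 for v in row]
              for row, lo, hi in zip(rows, mins, maxs)]
    return scaled, mins, maxs

def unscale_rows(scaled, mins, maxs):
    """Invert scale_rows using the stored minimum and maximum of each row."""
    return [[0.5 * (v + 1.0) * (hi - lo) + lo for v in row]
            for row, lo, hi in zip(scaled, mins, maxs)]

# Hypothetical inputs: rows are day number, latitude, longitude.
linput = [[1.0, 2.0, 3.0],
          [17.5, 27.0, 37.0],
          [70.5, 80.0, 90.0]]
pn, minp, maxp = scale_rows(linput)
restored = unscale_rows(pn, minp, maxp)
print(pn[0])  # [-1.0, 0.0, 1.0]
```

Keeping the minimum and maximum vectors matters because the same values are needed later to scale new inputs consistently and to convert network outputs back to millimetres of rainfall.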

Methodology
Different transfer functions for the hidden and output layers were tried to find the best ANN structure for this study. The transfer function used in the hidden layer of the back propagation network is tangent-sigmoid, while a pure linear transfer function is used in the output layer.
The ANN developed for rainfall prediction is trained with different learning algorithms, learning rates, and numbers of neurons in its hidden layer. The aim is to create a network that gives the best result. The network was simulated using three different back propagation learning algorithms: Resilient Backpropagation (trainrp), Fletcher-Reeves Conjugate Gradient (traincgf) and Scaled Conjugate Gradient (trainscg).
Resilient Backpropagation (trainrp) eliminates the harmful effect of gradients with small magnitude. The magnitude of the derivative has no effect on the weight update; only the sign of the derivative is used to determine the direction of the update. Trainrp is generally much faster than standard steepest descent algorithms and requires only a modest increase in memory, which suits networks with sigmoidal transfer functions.
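The sign-based update can be sketched in Python as follows. This is a simplified resilient-propagation variant (often called iRprop-) applied to a toy quadratic error. It is not the toolbox's trainrp implementation, and the step-size constants are assumed values.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def rprop_minimize(grad_fn, weights, n_iter=200, step_init=0.1,
                   eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Minimize a function from its gradient using only the gradient signs."""
    weights = list(weights)
    steps = [step_init] * len(weights)
    prev_grad = [0.0] * len(weights)
    for _ in range(n_iter):
        grad = grad_fn(weights)
        for i, g in enumerate(grad):
            if g * prev_grad[i] > 0:
                # Same sign as before: grow this weight's step size.
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif g * prev_grad[i] < 0:
                # Sign change means a minimum was overshot: shrink the step
                # and skip the update for this weight in this iteration.
                steps[i] = max(steps[i] * eta_minus, step_min)
                g = 0.0
            # Only the sign of the derivative sets the update direction.
            weights[i] -= sign(g) * steps[i]
            prev_grad[i] = g
    return weights

# Toy error surface E(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2 (illustrative only).
def toy_gradient(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

solution = rprop_minimize(toy_gradient, [0.0, 0.0])
print(solution)  # close to [3.0, -1.0]
```

Because the gradient magnitude is ignored, the very different curvatures of the two weights in the toy surface do not slow the poorly scaled direction, which is the property the paragraph above describes.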
Fletcher-Reeves Conjugate Gradient (traincgf) generally converges in fewer iterations than trainrp, although more computation is required in each iteration. The conjugate gradient algorithms are usually much faster than variable learning rate back propagation, and are sometimes faster than trainrp. Traincgf also requires only a little more storage than simpler algorithms, so it is often a good choice for networks with a large number of weights.
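The Fletcher-Reeves update of the search direction can be illustrated on a small quadratic error surface E(x) = 0.5 xᵀAx - bᵀx, where the line search has a closed form. This is a sketch under that assumption; traincgf itself works on the network error and uses a numerical line search. The matrix, vector and starting point are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def fletcher_reeves(A, b, x, n_iter):
    """Minimize 0.5*x'Ax - b'x (A symmetric positive definite) by conjugate gradient."""
    x = list(x)
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # gradient A x - b
    d = [-gi for gi in g]                              # first direction: steepest descent
    for _ in range(n_iter):
        gg = dot(g, g)
        if gg < 1e-20:                                 # gradient already negligible
            break
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)                # exact line search along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / gg                  # Fletcher-Reeves ratio
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x

A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]
x_min = fletcher_reeves(A, b, [2.0, 1.0], n_iter=2)
print(x_min)  # close to [1/11, 7/11]
```

On a quadratic with n unknowns the method reaches the minimum in at most n iterations, which illustrates why conjugate gradient methods tend to need fewer iterations than sign-based or steepest descent updates, at the cost of more work per iteration.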
The third algorithm, Scaled Conjugate Gradient (trainscg), was designed to avoid the time-consuming line search. It differs in this respect from the other conjugate gradient algorithms, which require a line search at each iteration. The trainscg routine may require more iterations to converge, but the number of computations in each iteration is significantly reduced because no line search is performed. Trainscg requires only modest storage.

Results
Daily rainfall data for 122 days in a year, i.e. the months June to September, were chosen for training and testing. Networks were trained with data from 1989 and tested using rainfall data from 1990. The training has been done using three different training functions, as mentioned before: traincgf, trainrp and trainscg.

Conclusion
It is concluded that ANN has demonstrated promising results and is very suitable for solving the problem of rainfall forecasting. Using only the gridded location as input parameters, the ANN has been trained to predict rainfall. This study has clearly brought out that data mining techniques, when applied rigorously, can help in providing advance information for forecasting sub-grid phenomena.

Acknowledgement
This study is based on the datasets made available by courtesy of the Indian Meteorological Department, India. The author is thankful for the support extended by IMD. The author also thanks Dr. Rattan K. Datta, Former Advisor, Department of Science & Technology, and Former President, Indian Meteorological Society and Computer Society of India, for his motivation and guidance.

References
Chattopadhyay, S. (2007). Multilayered feed forward Artificial Neural Network model to predict the average summer-monsoon rainfall in India, Acta Geophysica, Vol. 55, No. 3, pp. 369-382.
Collins, W. & Tissot, P. (2008). Use of an artificial neural network to forecast thunderstorm location, Proceedings of the Fifth Conference on Artificial Intelligence Applications to Environmental Science, Published in Journal of AMS, San Antonio, TX, January 2008.

Fig. 2. Neuron Model

Fig. 3 to Fig. 5 show the results of training with the 1989 dataset and testing with the 1990 dataset. The results are convincing: the network, once trained, has been tested with the 1990 dataset, and the error comes out to be less than 0.005 in 5 epochs for the training functions trainscg and traincgf. With the trainrp function, it takes 35 iterations to train. A second experiment uses the datasets for 1991 and 1992, training with 1991 and testing with 1992. Fig. 6 to Fig. 8 show the results of training with the 1991 dataset and testing with the 1992 dataset. Here again the results are convincing: the network, once trained, has been tested with the 1992 dataset, and the error comes out to be less than 0.005 in 3 epochs for the training functions trainscg and traincgf. With the trainrp function, it takes 13 iterations to train.

Fig. 3. Result of training ANN with rainfall data of year 1989 and testing with rainfall data of year 1990 using learning function traincgf

The data resemble a rectangular grid filled with rainfall values in mm (a sample of the 1989 rainfall is shown in Table 1).
3. A programme is written in Visual Basic to organize the data in tabular format, with the rainfall given at every grid point on each day, as shown in Table 2.
4. Finally, the dataset is exported into .xls format for analysis with Matlab (Matrix Laboratory).
The daily rainfall dataset considered for training the neural network covers longitude 70.5°E to 90.0°E and latitude 17.5°N to 37.0°N for the period June to September of the years 1989 to 1992, as the focus is on the Indian subcontinent only.
(Source: as a result of pre-processing rf1989.grd provided by IMD)

Table 2. Rainfall for year 1989 organized in tabular format

Table 3. Sample of location-wise rainfall for year 1989

A back propagation network has been used. The input dataset comprises the day number (day 1 corresponds to June 1, day 2 to June 2, and so on up to day 122, which corresponds to September 30), latitude and longitude. The output data correspond to rainfall in mm. A sample of the dataset is shown in Table 3. In this table, columns 1 to 3 are used as input and column 4 is used as the target.
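The day-number convention and the split into inputs and targets can be sketched in Python as follows. The column order (day number, latitude, longitude, rainfall) is assumed from the description above, and the sample records are hypothetical values, not taken from Table 3.

```python
from datetime import date, timedelta

def day_number_to_date(year, day_number):
    """Map day 1..122 of the season to a calendar date (day 1 = June 1)."""
    if not 1 <= day_number <= 122:
        raise ValueError("day_number must be between 1 and 122")
    return date(year, 6, 1) + timedelta(days=day_number - 1)

def split_records(records):
    """Split (day, lat, lon, rain) records into network inputs and targets."""
    inputs = [(day, lat, lon) for day, lat, lon, _ in records]
    targets = [rain for _, _, _, rain in records]
    return inputs, targets

# Hypothetical records: (day number, latitude °N, longitude °E, rainfall in mm).
records = [(1, 17.5, 70.5, 0.0),
           (1, 17.5, 71.0, 2.3),
           (122, 37.0, 90.0, 0.4)]
inputs, targets = split_records(records)

print(day_number_to_date(1989, 1))    # 1989-06-01
print(day_number_to_date(1989, 122))  # 1989-09-30
```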