Data on Field Canals Improvement Projects for Cost Prediction Using Artificial Intelligence

Field Canals Improvement Projects is an important sustainable project to save fresh water in our world. Machine learning and artificial intelligence (AI) needs sufficient dataset size to model and predict the cost and duration of Field Canals Improvement Projects. Therefore, this data paper presents dataset includes the key parameters of such project to be used for analyzing and modelling project cost and duration. The data were acquired based on questionnaire survey and collecting historical cases of Field Canals Improvement Projects. The data consists of the following features: area served, total length of PVC pipe line, number of irrigation values, construction year, geographical zone, cost of FCIP, and duration of FCIP construction. The data can be applied to compare and evaluate the performance of machine learning algorithms for predicting cost and duration.


Specifications
Management Information Systems. Specific subject area Predictive analysis, conceptual cost estimate, duration prediction, algorithms validation, ensemble machine learning. Type of data Table, Excel file (7 columns X 1276 rows). How data were acquired The data were acquired based on questionnaire survey and collecting historical cases of Field Canals Improvement Projects as shown in Table 1 and Appendix B. Moreover, Delphi rounds [9] have been conducted as displayed in Appendix C. Data format Raw data. Parameters for data collection The data consists of the following features: area served, total length of PVC pipe line, number of irrigation values, construction year, geographical zone, cost of FCIP, and duration of FCIP construction. Description of data collection Key conceptual cost drivers affecting the cost estimation of FCIPs based on using historical quantitative data. Raw data are publicity available on the following repository. Data

Value of the data
• The data explains the key cost drivers of Field Canals Improvement Projects (FCIPs).
• The dataset is important for irrigation authorities and stakeholders such as contractors, engineers and decision makers to estimate the conceptual cost of FCIPs based on financial and feasibility perspectives. • The data objective is developing a reliable parametric cost or duration estimation model at the conceptual phase for Field Canals Improvement Projects (FCIPs). • The data can be applied to compare and evaluate the performance of machine learning algorithms for predicting cost and duration. • Data can be used as a benchmark data to assess the accuracy of other novel frameworks or models against the developed models in the previous studies [1] . • The data can be conducted to applied advanced computational theories and algorithms such as fuzzy-genetic model and deep learning algorithms.

Data Description
Construction cost estimation can be applied for several projects such as Irrigation [1] , transportation [2 , 5] , petroleum exploration and safety [4 , 6] . The current trend of cost estimation is using Artificial intelligence and machine learning to get the most accurate cost predictions [3] . Artificial intelligence and machine learning require sufficient dataset to model the cost prediction. The objective of this paper is describing the data on FCIPs for conceptual cost prediction using artificial intelligence and data science. The data are presenting the key conceptual cost drivers affecting the cost estimation of FCIPs based on using historical quantitative data. The collected parameters are denoted from P1 to D where minimum (Min), maximum (Max) and standard deviation (StD) have been displayed as showed in Table 1 . Such parameters are gathered based on surveying historical cases of Field Canals Improvement Projects via construction site records and contact information as quantitative data based on past project construction contracts' information and site recordings from 2011 to 2018 [1 , 7] . Such information is described in Table 1 and Appendix B . The geographical zone parameter is divided into three categories: 0 is the middle of the delta region, 1 is the east of delta, 2 is the west of delta. The poly venial chloride (PVC) pipeline diameters are ranging from 225 mm to 350 mm as shown in Fig. 1 and Fig. 2 . Moreover, these collected parameters depend on the previous literature.

Experimental design, materials, and methods
Questionnaire survey and collecting historical cases of Field Canals Improvement Projects have been conducted to collect the data using Delphi rounds as Appendix C . The Delphi rounds consists of three main rounds: collecting, rating and revising rounds [9] . The collecting round collects the all possible parameters. Rating and revising rounds have been applied to assessing and ranking the parameters. Accordingly, the key parameters were the top-rated parameters based on expert's evaluation. This approach was the identifying the key parameters based on qualitative technique.
Developing a reliable parametric cost estimation model consists of two main stages: key cost drivers identification and machine learning model development [7 , 10] . Firstly, identifying the key conceptual drivers can be conducted using qualitative or quantitative approaches as shown in Fig. 3 . Selecting the key drivers are affecting the accuracy of the cost estimation of FCIPs is based on using historical quantitative data. The data objective aims identifying FCIPs' cost drivers of preliminary cost estimate (CDPCE) by using the historical quantitative data. Experts' opinions  are not utilized here to avoid biased selection when using human judgment. The purpose of the data is to discover and apply data-driven methods to select the key cost drivers based only on the quantitative collected past data. The importance of cost drivers is to help decision makers to predict the preliminary cost of FCIPs and study the financial feasibility of these projects [1 , 7] .
As shown in Fig. 4 , the heat map correlation presents high positive correlation among the total length of PVC, cost of FCIP and the duration of FCIP. A slight positive correlation exists among the area served and the number of irrigation valves and duration of the project [1] . No strong correlation exists among the geographical zone of the project and project duration. Accordingly, this parameter can be removed. A slight negative correlation exists among the construction year and project duration. Accordingly, the heat map of key parameters correlation shows the pair relation among all the selected parameters [7] .
Secondly, a comprehensive tool for parametric cost or duration estimation can be developed using ML algorithms such as multiple regression analysis and the optimum neural network model. The data objective is developing a reliable parametric cost estimation model before the construction of FCIPs. Therefore, a total of 1276 FCIPs of constructed projects are collected to build up the proposed model. This data can be used for executing the most common artifi-  cial intelligence (AI) techniques which are conducted for cost modeling such as fuzzy logic (FL) model, artificial neural networks (ANNs), regression model, case-based reasoning (CBR), hybrid models, and evolutionary computing (EC) such as genetic algorithm (GA) [1] as showed in Fig. 5 .

Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.dib.2020.105688 .

Appendix A. Supplementary material
Supplementary data associated with this article will be found in Excel (.csv) format in the online version and in the following public repository: https://github.com/HaythamElmousalami/ Data _ in-Breif .

Appendix B: Field survey module and contract information
Likert scale is a rating scale to represent the opinions of experts where Likert scale can be consisted of three points, five points or seven points. For example, a five-point Likert scale may be "Extremely Important", "Important", "Moderately Important", "Unimportant", and "Extremely Unimportant" where the experts will select these points to answer received questions [4 , 3 , 8] . Based on Likert scale (5 points), select the most appropriate rate for each of the following parameters to evaluate each parameter affecting on the cost of FCIPs.