Optimal Common Job Block Table (CJBT) to Improve Performance in the Hadoop Framework

With the rapid transformation of technology, a huge amount of data, both structured and unstructured, is generated every day. With the aid of 5G technology and IoT, the volume of data generated and processed every day is very large: approximately 2.5 quintillion bytes. This data (Big Data) is stored and processed with the help of the Hadoop framework. The Hadoop framework has two phases for storing and retrieving data in the network.


I. INTRODUCTION
The Hadoop framework handles both the storage and the processing of big data efficiently.
Hadoop uses large clusters of commodity hardware to store and process big data in a distributed fashion.
Its open-source nature, massive data storage, and fast processing capabilities have made it very popular.
Retrieving data from a distributed environment requires a lot of computation, so efficient algorithms are needed to handle such cases. MapReduce is an efficient framework for processing huge data sets. Even though MapReduce is efficient at processing big data, it has some limitations:

• Task scheduling is based on the location of the data.
• Reducing can start only after all mapping has finished.
• Intermediate data generated during the MapReduce process is destroyed after use.
• Providing data location and resource allocation requires a lot of effort.
• MapReduce treats each job as a new job and repeats all the computations.
The Hadoop framework has three daemons: the Name node, the Secondary Name node, and the Data node.
The Name node holds the metadata of the files distributed in the cluster. The Secondary Name node holds a replica of that metadata, which is used if the master Name node fails. The Data nodes hold the actual file data, which is divided into blocks.
Each block has three replicas in the cluster. When a client requests access to a file, the request is sent to the Name node, which replies with the file's metadata. The client converts that metadata into HDFS form and then sends the request to the Data nodes in the cluster.
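The read path described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and function names (`NameNode`, `DataNode`, `store_file`, `read_file`) are hypothetical and do not correspond to the Hadoop API, block placement is simple round-robin, and the cluster is assumed to have at least three Data nodes.

```python
from dataclasses import dataclass, field

REPLICATION_FACTOR = 3  # the text states that each block has three replicas


@dataclass
class NameNode:
    # file name -> ordered list of (block_id, [names of Data nodes holding a replica])
    metadata: dict = field(default_factory=dict)

    def locate(self, filename):
        """Return block locations only; the Name node holds no file data."""
        return self.metadata[filename]


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes
    alive: bool = True

    def read(self, block_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.blocks[block_id]


def store_file(filename, chunks, name_node, data_nodes):
    """Place each block on REPLICATION_FACTOR nodes (hypothetical round-robin policy)."""
    names = sorted(data_nodes)
    entries = []
    for i, chunk in enumerate(chunks):
        block_id = f"{filename}#blk{i}"
        replicas = [names[(i + k) % len(names)] for k in range(REPLICATION_FACTOR)]
        for node_name in replicas:
            data_nodes[node_name].blocks[block_id] = chunk
        entries.append((block_id, replicas))
    name_node.metadata[filename] = entries


def read_file(filename, name_node, data_nodes):
    """Client side: get metadata from the Name node, then fetch blocks from Data nodes."""
    content = b""
    for block_id, replicas in name_node.locate(filename):
        for node_name in replicas:
            try:
                content += data_nodes[node_name].read(block_id)
                break  # one healthy replica is enough
            except ConnectionError:
                continue  # fall back to the next replica
        else:
            raise IOError(f"no live replica for block {block_id}")
    return content
```

In this model the Name node answers only the metadata lookup, while every byte of file content comes from a Data node; the replicas let a read succeed even when some Data nodes are down.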
The MapReduce algorithm is implemented at each Data node. The map step finds the data sets present at each Data node. The reduce step aggregates all the data sets (blocks) into one file, which is then transferred to the client machine. The Common Job Block Table acts as a cache for the files and contains the following attributes.
• Common job name

The size of the Common Job Block Table keeps increasing in real time, so there should be a limit on the size of the CJBT. To keep the CJBT at an optimal size, we apply a replacement algorithm to the table. The Least Recently Used (LRU) algorithm will give the best results: when the table is full, the existing file entry that has not been used recently is replaced with the new file.
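A size-capped table with LRU replacement can be sketched as follows. This is a minimal illustration, not the implementation evaluated here: the class name `CommonJobBlockTable`, the `capacity` parameter, and the stored value `result_location` are assumptions made for the example, since only the "common job name" attribute is listed above.

```python
from collections import OrderedDict


class CommonJobBlockTable:
    """LRU-bounded table: common job name -> location of its stored result (assumed)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # least recently used entry comes first

    def get(self, job_name):
        """Return the stored location, or None on a miss."""
        if job_name not in self._entries:
            return None
        self._entries.move_to_end(job_name)  # mark as most recently used
        return self._entries[job_name]

    def put(self, job_name, result_location):
        """Insert or refresh an entry, evicting the LRU entry if over capacity."""
        if job_name in self._entries:
            self._entries.move_to_end(job_name)
        self._entries[job_name] = result_location
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._entries)
```

With a capacity of 2, inserting jobs `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `b` is the entry that has gone unused the longest.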

IV. EXPERIMENTAL RESULTS
The results are compared between the native Hadoop framework and the improved Hadoop framework. The MapReduce task is performed only if the file has not been processed previously. After the MapReduce tasks, the results are stored at the Data nodes and an entry is made in the CJBT. If the same file is requested by the client again, the CJBT is searched first to get optimal performance. This reduces recomputation and data transfer within the network. Very few Data nodes are required during the operation, which also helps to reduce energy consumption.

Fig. 3. Native Hadoop framework
Fig. 4.
Each request from the client is treated as a new request, which wastes time and resources. The improved Hadoop framework illustrates how this problem is minimized.

II. IMPROVED HADOOP
As discussed above, in the native Hadoop framework every request is treated as a new request. When the same file is requested again, all the steps are repeated, which wastes time and resources.
In the improved Hadoop framework, we extend the capabilities of the Name node with a special table called the Common Job Block Table (CJBT). The CJBT holds the metadata of files and acts as a cache for them. When a request arrives from the client, the Name node first inspects the CJBT. If the file information is found in the CJBT, the Name node gets the file directly from the nodes where it was already computed during the previous request. If the file information is not available in the CJBT, it is treated as a new request.
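The lookup described above can be sketched in a few lines. This is a simplified model under stated assumptions: `ImprovedNameNode`, `run_map_reduce`, and the `jobs_run` counter are hypothetical names, a word count stands in for an arbitrary MapReduce job, the table is a plain unbounded dictionary, and the result itself is stored in the table rather than a pointer to the Data nodes that hold it.

```python
from collections import Counter


def run_map_reduce(blocks):
    """Stand-in for a MapReduce job: word count over the blocks of one file."""
    partials = [Counter(block.split()) for block in blocks]  # map: one count per block
    total = Counter()
    for partial in partials:  # reduce: merge the per-block counts
        total.update(partial)
    return dict(total)


class ImprovedNameNode:
    def __init__(self, files):
        self.files = files  # file name -> list of text blocks (stands in for Data nodes)
        self.cjbt = {}      # (job name, file name) -> stored result; unbounded in this sketch
        self.jobs_run = 0   # number of times MapReduce actually executed

    def request(self, job_name, filename):
        key = (job_name, filename)
        if key in self.cjbt:  # found in the CJBT: reuse the earlier result
            return self.cjbt[key]
        result = run_map_reduce(self.files[filename])  # not found: treat as a new request
        self.jobs_run += 1
        self.cjbt[key] = result  # record an entry for later requests
        return result
```

Requesting the same job on the same file twice returns identical results while `jobs_run` stays at 1, which is the saving in recomputation that the table is meant to provide.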

A Common Job Block Table (CJBT) at the Name node reduces file access time and resource usage, in exchange for the cost of implementing the table at the Name node. We also implement the Least Recently Used algorithm to cap the size of the Common Job Block Table.