Service Oriented Load Balancing Framework in Computational Grid Environment

Grid computing, or the computational grid, has become a vast research field in academia. It is a promising platform that provides resource sharing through multi-institutional virtual organizations for dynamic problem solving. Such platforms are much more cost-effective than traditional high performance computing systems. Because grid resources can scale on demand, grid computing has also become popular in industry. However, the computational grid has constraints and requirements different from those of traditional high performance computing systems. In order to fully exploit such grid systems, resource management and scheduling are key challenges, and the issues of task allocation and load balancing represent a common problem for most grid systems because the load on individual grid resources is dynamic in nature. The objective of this paper is to review existing load balancing algorithms and techniques applicable to grid computing and to propose a layered service oriented framework for the computational grid that addresses the problem of dynamic load balancing.


I. INTRODUCTION
The popularity of the Internet and the availability of powerful computers and high-speed networks as low-cost commodity components are changing the way we use computers today. These technical opportunities have led to the possibility of using geographically distributed and multi-owner resources to solve large-scale problems in science, engineering, and commerce. Recent research on these topics has led to the emergence of a new paradigm known as Grid Computing [12]. It aggregates dispersed heterogeneous resources for solving various kinds of large-scale applications in science, engineering and commerce. In large-scale grid environments, the underlying network connecting the resources is heterogeneous, and bandwidth varies from link to link. Not limited to the grid, in many of today's distributed computing environments the computers are linked by a delay- and bandwidth-limited communication medium that inherently inflicts tangible delays. Moreover, the impacts of trust and availability on performance and development difficulty can influence the choice of whether to deploy onto a dedicated computer cluster, onto idle machines internal to the developing organization, or onto an open external network of volunteers.
Due to uneven task arrival patterns and unequal computing capabilities, a computing node in one grid site may be overloaded while others in a different grid site are under-utilized. As a result, to take full advantage of such grid systems, task scheduling and resource management are essential functions provided at the service level of the grid software infrastructure, where the issues of task allocation and load balancing represent a common problem for most grid systems.
Hence, the objective of this paper is to propose a layered service oriented framework to address the problem of load balancing in the grid. Section II discusses related work on load balancing to date. Aspects of load balancing in the grid and the relevant algorithms are discussed in Sections III and IV. Section V briefly presents the open challenges of load balancing in the computational grid. The functionalities of the proposed layered service oriented framework are explained in Section VI, and Section VII concludes the paper.

II. RELATED WORKS
Load balancing has been discussed in the traditional distributed systems literature for a long time. Various strategies and algorithms have been proposed, implemented, and classified in a number of studies. In those studies, the load balancing algorithms attempt to improve the response time of a user's submitted applications by ensuring maximal utilization of available resources. The main goal of this type of algorithm is to prevent, where possible, the condition in which some processors are overloaded with a set of tasks while others are lightly loaded or even idle.
In [6]-[10], researchers have proposed several load balancing strategies for grid environments. Cao [6] and his co-researchers [11] use an ant-like self-organizing mechanism to achieve system-wide grid load balancing through a collection of simple local interactions between grid nodes. In this model, multiple resource management agents cooperate to achieve automatic load balancing of distributed job queues. Each ant takes two sets of m steps in succession to determine the least and the most loaded nodes, respectively. The two nodes then redistribute the load between themselves. After a series of successive redistributions, system-wide uniform load balancing can be achieved.
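This ant-like mechanism can be sketched as follows. This is a minimal illustration, not the published implementation: the node names, the load values, and modelling the two m-step walks as sampling m random nodes are all assumptions made here for concreteness.

```python
import random

def ant_balance_step(loads, m, rng):
    """One ant: two walks of m random steps, the first to find the most
    loaded node seen, the second to find the least loaded; the two nodes
    then split their combined load between themselves."""
    nodes = list(loads)
    heavy = max(rng.sample(nodes, m), key=lambda n: loads[n])
    light = min(rng.sample(nodes, m), key=lambda n: loads[n])
    if heavy != light:
        total = loads[heavy] + loads[light]
        loads[light] = total // 2            # integer split of the joint load
        loads[heavy] = total - total // 2
    return loads

loads = {f"n{i}": w for i, w in enumerate([90, 10, 50, 5, 80, 30])}
rng = random.Random(0)
for _ in range(200):                         # repeated local redistributions
    ant_balance_step(loads, m=3, rng=rng)
```

After a few hundred such purely local interactions the gap between the heaviest and lightest node shrinks toward zero, while the total load in the system is preserved.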
Yagoubi and Slimani [7] propose an algorithm that achieves dynamic load balancing in grid computing. On the basis of a tree model, their algorithm has the following main features: (i) it implements a layered framework; (ii) it supports heterogeneity and scalability; and (iii) it is totally independent of any physical architecture of a grid.
Erdil and Lewis [9] describe information dissemination protocols that distribute the load without resorting to rebalancing through job migration, which is more difficult and costly in large-scale heterogeneous grids. Essentially, in their model, nodes adjust their advertising rates and aggressiveness to influence where jobs get scheduled.
Ludwig and Moallem [10] propose two new distributed load balancing algorithms inspired by swarm intelligence: one based on ant colony optimization and the other on particle swarm optimization.
The approaches discussed above have seldom given importance to the deadline stringency of the submitted jobs. This research concentrates on prioritizing submitted jobs by deadline. There are two aspects of the proposed approach: (i) the nearer the deadline of a submitted job, the higher its priority; and (ii) the highest priority job is allocated to the least loaded processing node. Hence, the proposed "service oriented framework" described in this paper provides resources capable of satisfying the Service Quality Agreement (SQA) of the task submitted by the client.

III. LOAD BALANCING IN GRID
The load balancing mechanism in the grid aims to spread the load equally over the computing nodes, maximizing their utilization and minimizing the total task execution time. In order to achieve these goals, the load balancing mechanism needs to be "fair" in distributing the load across the computing nodes, implying that the difference between the heaviest-loaded node and the lightest-loaded node should be minimized.
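This fairness criterion can be made concrete with a small sketch. The function names, node loads and task cost below are illustrative, not taken from any particular middleware: the imbalance metric is the gap between the heaviest- and lightest-loaded nodes, and greedily placing each new task on the currently lightest node is the choice that keeps that gap smallest.

```python
def imbalance(loads):
    """Gap between the heaviest- and lightest-loaded nodes;
    a fair distribution drives this toward zero."""
    return max(loads) - min(loads)

def fairest_assignment(task_cost, loads):
    """Greedy placement: give the new task to the currently
    lightest-loaded node."""
    i = loads.index(min(loads))
    new = loads[:]                 # leave the caller's list untouched
    new[i] += task_cost
    return i, new

# A task of cost 4 lands on node 1 (load 3), shrinking the imbalance.
node, after = fairest_assignment(4, [10, 3, 7])
```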
To implement the policies discussed above, grid middleware plays a significant role in creating a computational grid environment. It enables sharing and manages grid components based on user requirements and resource attributes (e.g., processor type, speed, performance). Middleware is software that connects other software components or applications in order to provide facilities such as the secure execution of data intensive applications on suitable resources, with resources allocated according to policy.
The functionalities of grid middleware can be broadly classified into three categories: resource management, data management and information services. The resource management functionality is mainly responsible for (i) resource allocation, (ii) job submission and (iii) remote execution of jobs and receipt of the results. It is also responsible for managing job status and progress.
The data management functionality provides support to transfer files among nodes in the grid and for the management of these transfers.
The information services provide support for collecting information in the grid and for querying this information.
All of these are supported by the grid security infrastructure, which provides security functions including single/mutual authentication, confidential communication, authorization, and delegation.
One major aspect of resource management in the computational grid is load balancing. Load balancing can be defined by the following policies:
(1) The information policy specifies what workload information is to be collected, when, and from where.
(2) The triggering policy determines the appropriate time at which to start a load balancing operation.
(3) The resource type policy classifies a resource as a server or a receiver of tasks according to its availability status.
(4) The location policy uses the results of the resource type policy to find a suitable partner for a resource provider or a resource receiver.
(5) The selection policy defines the tasks that should be migrated from overloaded resources (source) to the idlest resources (receiver).
In fact, a distributed system consists of policies for the use of the resources and the resources themselves. The policies include load balancing, scheduling, and fault tolerance. Although a grid belongs to the class of distributed systems, traditional policies of distributed systems cannot be applied to a grid directly [1], [2]. In addition, although load balancing methods have been intensively studied in conventional parallel and distributed systems, they cannot work in grid architectures because these two classes of environment are radically distinct [3], [4]. Indeed, the scheduling of tasks on multiprocessors or multiple computers supposes that the processors are homogeneous and linked with homogeneous and fast networks. The rationale behind this approach is as follows [1], [2]:
(1) The resources have the same capabilities.
(2) The interconnection bandwidth between processing elements is high.
(3) Input data is readily available at the processing site.
(4) The overall time spent transferring input and output data is negligible in comparison with the total application duration.
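The five load balancing policies listed above can be realized in many ways; the following is one minimal, threshold-based sketch. The class name, the fixed-threshold rule and the node/task dictionaries are assumptions made purely for illustration.

```python
class ThresholdPolicy:
    """Minimal realization of the five load balancing policies using a
    fixed load threshold; every name here is illustrative."""

    def __init__(self, threshold):
        self.threshold = threshold

    def collect_info(self, node):            # (1) information policy
        return {"node": node["name"], "load": node["load"]}

    def should_trigger(self, nodes):         # (2) triggering policy
        return any(n["load"] > self.threshold for n in nodes)

    def classify(self, node):                # (3) resource type policy
        return "server" if node["load"] > self.threshold else "receiver"

    def locate_partner(self, node, nodes):   # (4) location policy
        pool = [n for n in nodes if self.classify(n) != self.classify(node)]
        pick = min if self.classify(node) == "server" else max
        return pick(pool, key=lambda n: n["load"]) if pool else None

    def select_tasks(self, server):          # (5) selection policy
        excess, picked, moved = server["load"] - self.threshold, [], 0
        for task in sorted(server["tasks"], key=lambda t: t["cost"]):
            if moved >= excess:
                break
            picked.append(task)
            moved += task["cost"]
        return picked
```

A server partners with the lightest receiver (and a receiver with the heaviest server), and tasks migrate off a server only until its load drops back to the threshold.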
Because of the distribution of a large number of resources in a grid environment and the size of the data to be moved, the traditional distributed approaches are not adequate in a grid [5]. The heterogeneity, autonomy, scalability, adaptability, resource selection and computation-data separation of a grid make load balancing more difficult. These challenges place significant obstacles in the way of designing an efficient and effective load balancing system for grid environments. Some of the resulting problems have not yet been solved successfully and remain open research issues. Thus, designing a load balancing framework or system that can integrate all these factors is a challenging problem.

IV. LOAD BALANCING ALGORITHM
Load balancing algorithms aim to spread the load equally over the computing nodes, maximizing their utilization and minimizing the total task execution time. Load scheduling algorithms can be divided into two groups: centralized and decentralized [19], [25], [34].
In the centralized approach [19], a central controller performs the load distribution among the different sites. Since this controller has a global view of all the resources and sites, it can assign every job to an appropriate resource. However, as the number of resources and sites increases, the controller becomes a bottleneck.
The decentralized approach [25] performs load balancing based on dynamic information derived from the sites. Its advantages include scalability and high fault tolerance. A scheduling strategy can further be divided into two groups: clairvoyant and non-clairvoyant.
A clairvoyant algorithm [28] allocates jobs to suitable resources according to the characteristics of the jobs, such as their service time.
A non-clairvoyant algorithm [28], on the other hand, allocates jobs to resources without considering the characteristics of the jobs.
Another method used to achieve load balancing is the Branch & Bound algorithm [30]. This method builds the solution space (nodes) by searching a binary tree and assigning an interval of numbers to each available node of the tree. It then prunes intervals, and the numbers they contain, via an elimination policy in order to shorten the search for a balanced assignment. The Branch & Bound method is based on the farmer-worker model; it simply assigns a job to a worker without paying attention to the processing power of the workers.
Another study uses a dynamic tree model to achieve load balancing. This model consists of three levels: the first level, comprising the leaves, represents the sites; the second level represents the clusters; and the third is the grid level. Using these three levels, load balancing is performed at three scopes, namely intra-site, intra-cluster and intra-grid.
Hybrid methods establish load balancing by combining First-Come-First-Served (FCFS) [27] with genetic algorithms (GA) [27]. When the number of jobs entering the queue is low, the FCFS algorithm is appropriate. When the number of jobs grows beyond what the queue can comfortably hold, the genetic algorithm is used and load balancing is performed over a sliding window: only the jobs that fall within this window are scheduled by the GA, which allows jobs to be allocated to resources rapidly. The only drawback of this method is that switching between the two algorithms increases the overhead.
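The switching logic of such a hybrid scheduler can be sketched as below. The window size, the load threshold and the `ga_order` stand-in are all assumptions: a real implementation would evolve the ordering inside each window with a genetic algorithm, which is stubbed here by a plain sort.

```python
from collections import deque

WINDOW = 4       # sliding-window size (illustrative)
LOW_LOAD = 4     # queue length up to which plain FCFS suffices (illustrative)

def hybrid_schedule(queue, dispatch, ga_order=sorted):
    """Hybrid dispatcher: FCFS while the queue is short; otherwise slide a
    window over the queue and let the GA (stubbed by ga_order) decide the
    dispatch order inside each window."""
    if len(queue) <= LOW_LOAD:                        # light load: FCFS
        while queue:
            dispatch(queue.popleft())
    else:                                             # heavy load: GA window
        while queue:
            window = [queue.popleft()
                      for _ in range(min(WINDOW, len(queue)))]
            for job in ga_order(window):
                dispatch(job)
```

A short queue is dispatched strictly in arrival order; a long one is consumed window by window, so jobs never wait for the whole backlog to be optimized, which mirrors the "rapid allocation" property described above.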
Another proposed approach to load balancing uses intelligent agents [11]. In this method, each agent represents a local resource. The agents try to reduce the execution time by sharing and exchanging information with each other. These agents are organized hierarchically, and by exploiting this structure, together with static and dynamic methods, they provide load balancing.
Yet another proposed algorithm for load balancing uses ant colonies. In this method, which is based on ant colony optimization (ACO) [10], the ants move in either search-max or search-min mode. An ant first moves ahead at random to find an overloaded node; it then switches to search-min mode to find an underloaded node, at which point it balances the load between the heavily loaded node and the lightly loaded one. Artificial life techniques are another means of providing load balancing in the grid. This approach utilizes two kinds of algorithms, genetic algorithms and Tabu search (TS), to solve the grid load balancing problem. The genetic algorithm can be used whenever the solution space is broad; it performs job scheduling repeatedly and continuously, since load balancing can be done periodically.
The Tabu algorithm [27] starts with a neighborhood structure that includes a list of neighboring nodes. This list is searched, and during the search the best move, corresponding to the current optimum solution, is selected. The jobs are then allocated according to that solution.
The Sufferage, Max-min and Min-min algorithms [27] can be used statically or dynamically to establish load balancing. In these algorithms, one job is selected for scheduling from the set of jobs that have entered the environment and is then removed from the set; the process continues until all jobs have been allocated. The Min-min algorithm selects the job with the minimum completion time (MCT) for allocation, whereas the Max-min algorithm selects the job with the maximum completion time. The Sufferage algorithm calculates a sufferage value for each job and performs the allocation accordingly; the sufferage value of a task is the difference between its best (minimum) MCT and its second-best MCT (the smallest MCT larger than the best).
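The three heuristics differ only in which job is picked each round, so one selection function can serve all of them. In this sketch the job sizes, node loads and speeds are illustrative, and MCT is modelled simply as the node's current load plus job size divided by node speed; none of this is prescribed by [27].

```python
def mct_table(jobs, loads, speed):
    """MCT of each job on each node: current load + job size / node speed."""
    return {j: {n: loads[n] + size / speed[n] for n in loads}
            for j, size in jobs.items()}

def pick_next(jobs, loads, speed, rule):
    """One selection round shared by Min-min, Max-min and Sufferage;
    returns the chosen (job, node) pair."""
    table = mct_table(jobs, loads, speed)
    best = {j: min(row.values()) for j, row in table.items()}
    if rule == "min-min":
        job = min(best, key=best.get)            # smallest best MCT first
    elif rule == "max-min":
        job = max(best, key=best.get)            # largest best MCT first
    else:                                        # sufferage
        def sufferage(j):
            a, b = sorted(table[j].values())[:2]
            return b - a                         # second-best minus best MCT
        job = max(jobs, key=sufferage)
    node = min(table[job], key=table[job].get)   # place on its best node
    return job, node
```

In a full scheduler, each round removes the chosen job from the set and adds its execution time to the chosen node's load, repeating until the set is empty.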
One of the proposed methods in the field of load balancing uses load managers. In this method, each site in the grid includes a unit named the load manager, which accepts the jobs entering the system and communicates with the load managers of other sites. When this unit accepts a job, it executes the job at its local site via the local scheduler; if the site is overloaded, the unit sends the job to a remote site over the network. Monitoring and managing the load scheduler and load manager is done by a storage manager. Ultimately, all of these units establish load balancing together. Static load balancing algorithms that use the scatter operation are among the existing algorithms used to establish load balancing in this manner in the grid.

V. OPEN CHALLENGES
Task scheduling in parallel and distributed systems has been intensively studied, but new challenges in grid environments keep it an interesting topic and call for further exploration. Among them, heterogeneity, dynamism, and computation-data separation are of prime importance. These unique characteristics of grid computing, which make the design of scheduling algorithms more challenging, are explained in what follows. Although we can look to previous research for inspiration, traditional scheduling models generally produce poor grid schedules in practice.

A. Heterogeneity and Autonomy
Although heterogeneity is not new to scheduling algorithms, having been present even before the emergence of grid computing, it is still far from fully addressed and remains a big challenge for scheduling algorithm design and analysis. In grid computing, because resources are distributed across multiple domains in the Internet, not only the computational and storage nodes but also the underlying networks connecting them are heterogeneous. This heterogeneity results in different capabilities for job processing and data access. In traditional parallel and distributed systems, the computational resources are usually managed by a single control point. The scheduler not only has full information about all running and pending tasks and resource utilization, but also manages the task queue and resource pool. It can thus easily predict the behavior of resources and assign tasks to resources according to certain performance requirements. In a grid, however, resources are usually autonomous and the grid scheduler does not have full control over them. It cannot violate the local policies of resources, which makes it hard for the grid scheduler to estimate the exact cost of executing a task at different sites. The autonomy also results in diverse local resource management and access control policies, such as the priority settings for different applications and the resource reservation methods. A grid scheduler is therefore required to adapt to different local policies. Heterogeneity and autonomy on the grid user side are represented by various parameters, including application types, resource requirements, performance models, and optimization objectives. In this situation, concepts such as application-level scheduling and grid economy have been proposed and applied to grid scheduling.

B. Performance Dynamism
Producing a feasible schedule usually depends on estimating the performance that candidate resources can provide, especially when the algorithms are static. In traditional parallel and distributed systems, contention caused by incoming applications can be managed by the scheduler according to some policy, so its impact on the performance the site can provide to each application can be well predicted. Computations and their data reside at the same site, or data staging is a highly predictable process, usually from a predetermined source to a predetermined destination, which can be viewed as a constant overhead. By contrast, grid schedulers work in a dynamic environment where the performance of the available resources is constantly changing. The change comes from site autonomy and from competition among applications for resources. Because of resource autonomy, grid resources are usually not dedicated to a single grid application. For example, a grid job submitted remotely to a computer cluster might be interrupted by one of the cluster's internal jobs with higher priority; new resources may join and provide better services; or other resources may become unavailable. The same problem affects the networks connecting grid resources: the available bandwidth can be heavily affected by Internet traffic flows unrelated to grid jobs. For a grid application, this kind of contention results in performance fluctuation, which makes it hard to evaluate grid scheduling performance under classic performance models. From the point of view of job scheduling, performance fluctuation might be the most important characteristic of grid computing compared with traditional systems. A feasible scheduling algorithm should be able to adapt to such dynamic behavior. Other measures are also provided to mitigate the impact of this problem, such as SQA negotiation, resource reservation (provided by the underlying resource management system) and rescheduling.

C. Resource Selection and Computation-Data Separation
In traditional systems, the executable code of an application and its input/output data usually reside at the same site, or the input sources and output destinations are determined before the application is submitted. Thus the cost of data staging can be neglected, or it is a constant determined before execution, and scheduling algorithms need not consider it. But in a grid, which consists of a large number of heterogeneous computing sites (from supercomputers to desktops) and storage sites connected via wide area networks, the computation sites of an application are usually selected by the grid scheduler according to resource status and certain performance models. Additionally, in a grid, the communication bandwidth of the underlying network is limited and shared by a host of background loads, so the inter-domain communication cost cannot be neglected. Further, many grid applications are data intensive, so the data staging cost is considerable. This situation brings about the computation-data separation problem: the advantage gained by selecting a computational resource with low computational cost may be neutralized by its high access cost to the storage site. These challenges are unique characteristics of grid computing and put significant obstacles in the way of designing and implementing efficient and effective grid scheduling systems.

VI. PROPOSED FRAMEWORK
The proposed "service oriented framework" for solving the load balancing problem in the computational grid is depicted in Figure 1. The major focus of this framework is the deadline stringency of the submitted jobs.

[Figure 1: block diagram showing the Grid User, Resource Broker and Resource Pool, connected by Job Submission, Resource Allocation, Computation Result and Completion Information flows.]
The framework consists of three different layers:
a. Grid User - submits computation or data intensive applications to the grid for execution.
b. Resource Broker - solely responsible for distributing the jobs in an application to the grid resources, based on the user's service quality requirements and the details of the available grid resources.
c. Resource Pool - a pool of resources including clusters, PCs, supercomputers, etc.
Initially, users submit their jobs or applications, with details, through the portal. The resource broker of the grid then collects runtime status information from the resource pool. The grid information service thus collects the details of the available grid resources and passes the information to the resource broker, which keeps a record of them for use during task allocation.
On the other side, the tasks submitted to the broker have their own "service quality requirements": a task is submitted to the broker only once these are mutually agreed between the broker and the user or client (who submits the jobs). In this Service Quality Agreement (SQA), stress is placed on completing the task efficiently within its deadline.
The framework works on the notion of priority scheduling: the nearer the deadline of a submitted job, the higher its priority.
The broker allocates the highest priority job to the least loaded processing entity, provided that this processing entity is capable of satisfying the SQA. If two tasks are submitted to the broker with the same deadline, the one submitted first is allocated to the minimally utilized resource first. On the other hand, a submitted job cannot be scheduled on the grid resources if its SQA cannot be satisfied.
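The allocation rule just described, earliest deadline first, FIFO on equal deadlines, least-loaded capable node, and rejection when no node can honor the agreement, can be sketched as follows. The job fields, the node-load bookkeeping and the simple deadline-feasibility test standing in for the full SQA check are all assumptions made for illustration.

```python
import heapq
import itertools

_arrival = itertools.count()    # submission order; breaks deadline ties (FIFO)

def submit(queue, job):
    """Priority queue keyed on (deadline, arrival): the nearer the
    deadline, the higher the priority."""
    heapq.heappush(queue, (job["deadline"], next(_arrival), job))

def allocate(queue, node_loads, now=0.0):
    """Broker sketch: pop the highest-priority job and place it on the
    least-loaded node that can still meet the job's deadline (the SQA
    check); jobs no node can satisfy are rejected, not scheduled."""
    placed, rejected = [], []
    while queue:
        _, _, job = heapq.heappop(queue)
        feasible = [n for n in node_loads
                    if now + node_loads[n] + job["runtime"] <= job["deadline"]]
        if not feasible:
            rejected.append(job["id"])               # SQA not satisfiable
            continue
        node = min(feasible, key=node_loads.get)     # least-loaded capable node
        node_loads[node] += job["runtime"]
        placed.append((job["id"], node))
    return placed, rejected
```

Note that a job with a tight deadline may be feasible only on a busier node once the lightest node has accepted earlier work, which is why the SQA check is applied per job against the node loads as they evolve.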

Figure 1 - Service Oriented Load Balancing Framework