Literature Survey of Dynamic Data Replication in Cloud Computing

The main aim of this survey is to explore existing replication strategies for cloud databases so that researchers can include all the necessary metrics in their work in this domain and overcome the limitations of the existing strategies. Cloud computing is a promising paradigm that provides computing resources as a service over a network. Many data replication approaches for data clouds have been presented over the past decades. Replication techniques address attributes such as fault tolerance, scalability, reliability, performance, storage consumption and data access time. In this review, the various issues involved in data replication methodologies are identified, and different replication procedures are studied to discover which attributes each one addresses and which it ignores. To categorize the techniques, all articles with the phrase “dynamic data replication” in their title or keywords that were published between January 2003 and December 2014 were first selected from IEEE, Elsevier, Springer and other international journals. We categorize the research from three perspectives: the features used, the applications addressed and the parameters measured. In addition, this study gives a detailed overview of dynamic data replication in cloud computing.


INTRODUCTION
Cloud computing: Cloud computing is becoming an increasingly significant technique that enables computing services to be used all around the globe. Its attributes include transparency in the resource allocation procedure and in service delivery. The Internet has supported major advances in information services as well as business applications, and many applications are now developed on the cloud computing platform because of its flexibility (Anjali and Rokade, 2014). Cloud computing is also a highly parallel distributed computing system. It consists of a set of interconnected and virtualized computing resources that can be managed as a single computing resource. The abstracted, virtual resources, which include servers, data, networks and storage applications, can be supplied as a service. The services are delivered over the Internet at end users' request and can take the form of any of three service models, namely Platform as a Service (PaaS), Software as a Service (SaaS) and Infrastructure as a Service (IaaS). Cloud computing was developed to offer users highly flexible services and substantial computing resources in a cost-efficient, scalable, transparent and highly available manner (Buyya et al., 2009). The layered architecture of cloud computing is given in Fig. 1.
When a more secure cloud computing solution has to be offered, the main criterion to focus on is the type of cloud to be deployed. At present, four kinds of cloud deployment models are available: the public cloud, the private cloud, the hybrid cloud and the community cloud.
Public cloud: It enables users to access the cloud through interfaces such as mainstream web browsers. This model is paid per use, much like a prepaid electricity metering system, and it offers improved flexibility in accommodating demand spikes, which is essential for optimizing cloud usage. Hence, cloud clients can cover their IT expenses as operating costs and reduce the capital cost spent on IT infrastructure (Jansen and Grance, 2011).
Private cloud: This kind of cloud exists inside an organization's in-house enterprise data center. It supports security, compliance and regulatory requirements better and offers a greater level of enterprise control over its use (Malathi, 2011).
Hybrid cloud: This cloud builds on the private cloud category: it is connected to multiple external cloud services in such a way that it is managed centrally and treated as one unit confined to a more secure network (Mell and Grance, 2011). It blends public and private clouds to yield virtual IT solutions. With a hybrid cloud, data control is more secure, and the application can allow multiple parties to access data over the Internet (Mell and Grance, 2011).

Community cloud:
In this type of cloud, the participating clouds operate with a common purpose. The cloud may serve one or more organizations and allows them to share information related to mission, policies, security, regulatory compliance requirements and similar concerns. A community cloud may be managed by the constituent organization(s) or by a third party (Mell and Grance, 2011). Figure 2 shows the deployment models of cloud computing. Normally, cloud computing offers both hardware and software infrastructure as services with the help of massive data centers (Björkqvist et al., 2011). Accordingly, cloud computing has moved computation and data storage from the end user to numerous data center infrastructures. The cloud infrastructure adjusts hardware capacity to achieve the desired non-functional Quality of Service (QoS). Because the cloud is massive and dynamic, it is difficult to attain high reliability and efficiency when the cloud data centers are accessed. Additionally, a cloud computing system runs numerous applications that process enormous amounts of network data. The network is composed of many nodes, and this large number of nodes increases the chance of hardware failure. When the hardware of a node fails, the data stored on that node may also be corrupted, and an application requesting that data during execution will be unable to access it from the corrupted node (Anjali and Rokade, 2014). Such corruption prevents applications from producing useful and successful results. Data replication is a method in cloud computing systems that keeps applications running despite data corruption. Replication refers to providing multiple replicas of a particular service on different nodes (Kemme and Alonso, 2010).
Data replication: Data replication refers to frequently copying electronic data from a database on one computer to a database on another, so that all users share the same level of information (Mazilu, 2010; Zhuo et al., 2011). As a result, users can access the data they need from a distributed database without interfering with other users of the same distributed database. Data replication has attracted researchers' attention in both concept and practice, and it has been applied extensively to improve the efficiency of data access in conventional wired and wireless networks. Data replication helps users access the desired data with less dependence on the network infrastructure and reduces the traffic load (Oo et al., 2010). It helps travelling salespeople, disconnected roaming users and mobile users with laptops stay up to date with the latest database contents when they connect to the server and upload their data. Here, the data are generated first and the replication process follows (Tiwari et al., 2011). Replication techniques can be either active or passive. In active replication, all replicas receive and process the same sequence of client requests. In passive replication, client requests are sent to a primary that processes them and sends update messages to the backups (Dang and Lim, 2007). Replication aims to reduce remote data accesses, whether these are user accesses or accesses made during task execution (Ratner et al., 2004). For mobile computers, replication offers better performance and reliability by creating multiple replicas of important data (Nguyen et al., 2010).
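The difference between the two styles described by Dang and Lim (2007) can be made concrete with a small sketch. It assumes in-memory dictionaries as replicas, requests of the hypothetical form (key, delta), and reliable, in-order delivery with no failures; the function names are illustrative only.

```python
def execute(state, request):
    """Apply a client request (key, delta) to one replica and return the update it produces."""
    key, delta = request
    state[key] = state.get(key, 0) + delta
    return key, state[key]


def active_replication(replicas, requests):
    # Active: every replica receives and processes the same ordered request sequence.
    for request in requests:
        for state in replicas:
            execute(state, request)


def passive_replication(primary, backups, requests):
    # Passive (primary-backup): only the primary processes requests; the backups
    # just install the resulting values carried in update messages.
    for request in requests:
        key, new_value = execute(primary, request)
        for state in backups:
            state[key] = new_value


requests = [("x", 5), ("y", 2), ("x", -1)]
active = [{}, {}, {}]
active_replication(active, requests)
primary, backups = {}, [{}, {}]
passive_replication(primary, backups, requests)
print(active[0], primary, backups[0])
```

Both runs leave every copy in the same state; the practical difference is that active replication spends computation on every replica, whereas passive replication computes once at the primary and ships the results.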

Dynamic data replication in cloud:
Several methodologies in distributed systems aim to improve reliability and availability. Replication is one such methodology in cloud computing: it provides multiple replicas of a given service to the user on different nodes in order to reduce user waiting time and bandwidth consumption in the cloud system and to increase data availability (Wei et al., 2010). There are two kinds of data replication algorithms: dynamic replication algorithms (Doğan, 2009; Ghemawat et al., 2003) and static replication algorithms (Shvachko et al., 2010; Wang et al., 2010; Ghemawat et al., 2003). Replication is already used in cloud storage systems such as GFS (Google File System) and HDFS (Hadoop Distributed File System) (Xhafa et al., 2012; Rani and Yadav, 2013). However, cloud data centers have grown enormously in both size and number. Moreover, dynamically scalable and fully virtualized resources are being provided as a service over the Internet (Bell et al., 2002). In a cloud, the data resource pool is what supports data replication, and the number of data replicas is typically set statically, based on historical records and experience. With dynamic data replication, however, it is not necessary to replicate every data file, and unpopular data files in particular need not be replicated. Hence, popular data files should be replicated adaptively. Further, the decisions on how many replicas to create and on which data nodes to place the new replicas have to be made according to the current conditions in the cloud.
By dynamically replicating the key data files, higher availability, greater fault tolerance and improved efficiency can be achieved. Data replication has both merits and demerits. Creating more replicas does not necessarily improve data availability and can lead to unnecessary expense. In addition, problems arise when placing new replicas on different nodes according to the current state of distributed systems such as cloud systems. These shortcomings can be overcome, and better dynamic replication systems built, if three main issues are solved. The first issue is deciding which data should be replicated and when to replicate them in the cloud system. The second issue is determining how many new replicas should be created in the cloud so that the system availability requirement is reasonably satisfied. The third issue concerns the location of the new replicas, because they should be placed so as to achieve a good execution rate and meet bandwidth consumption requirements. Therefore, a technique is needed that replicates popular files in cloud computing at minimum system maintenance cost while solving these issues, since increasing the number of replicas tends to raise cost rather than always improving system availability. In general, dynamic replication is triggered when a file's popularity crosses a threshold (Stockinger et al., 2002). Hence, the goal of dynamic replication algorithms can be viewed as identifying the popular files.
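A popularity trigger of the kind attributed to Stockinger et al. (2002) can be sketched as below. The sliding window of recent accesses, the fixed threshold and the class name PopularityMonitor are assumptions made for illustration; published strategies define popularity and the threshold in different ways.

```python
from collections import Counter, deque


class PopularityMonitor:
    """Track recent file accesses and report files whose popularity crosses a threshold."""

    def __init__(self, window_size, threshold):
        self.window = deque(maxlen=window_size)  # oldest accesses fall out automatically
        self.threshold = threshold

    def record(self, file_id):
        self.window.append(file_id)

    def files_to_replicate(self):
        counts = Counter(self.window)
        # Replication is triggered only for files accessed more than `threshold` times
        return sorted(f for f, c in counts.items() if c > self.threshold)


monitor = PopularityMonitor(window_size=5, threshold=2)
for f in ["a", "b", "a", "c", "a"]:
    monitor.record(f)
print(monitor.files_to_replicate())  # ['a']
for f in ["b", "b"]:
    monitor.record(f)  # the two oldest accesses (a, then b) slide out of the window
print(monitor.files_to_replicate())  # [] because a and b now have only 2 accesses each
```

Because only recent accesses count, a file that was popular earlier stops triggering replication once its accesses age out, which is what makes the decision dynamic rather than static.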
Figure 3 illustrates, with an architectural diagram, how data are replicated in the cloud. The three important components of the architecture are the user, the scheduling manager and the replica manager. The users are clients at various sites who access the cloud. Each user accesses the cloud independently and issues its own accesses to the replicated data. A task is first sent to the scheduling manager, which splits it into sub-tasks and forwards them through the replica manager to the relevant data centers. This routing takes into account the number of users of each data center, so that files can be accessed without collisions. Every dynamic replication strategy must consider three essential issues to achieve optimal replication. These issues are as follows.
Replica management: Replica management refers to the procedure of creating or removing replicas at a storage site (Chang and Chen, 2007). This implies that it is necessary to decide which files should be replicated. Copying all the data files residing in the data resources is never sensible, because doing so would consume excessive storage capacity.

Determination of the number of replicas:
The number of replicas could be decided easily if the creation of replicas of a popular file depended solely on the file's owner. However, determining the number of replicas is very difficult in grid systems, and the problem becomes more serious as the numbers of nodes and resources grow quickly. Hence, replication depends on both the owner and the nodes.
Determination of the replica location: Replica location is not an easy task, because it requires efficiently discovering the physical locations where the replicas of the required data reside within a massive data space.
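The roles in Figure 3 and the replica location issue can be illustrated with a minimal sketch. It assumes a hypothetical in-memory catalog that maps each file to the data centers holding its replicas, and it stands in for collision-free routing with a simple least-loaded choice; none of the class or method names come from the surveyed systems.

```python
class ReplicaManager:
    """Hypothetical replica manager: a catalog of replica locations plus routing."""

    def __init__(self):
        self.catalog = {}       # file id -> set of data centers holding a replica
        self.active_users = {}  # data center -> accesses routed there so far

    def register(self, file_id, data_center):
        self.catalog.setdefault(file_id, set()).add(data_center)
        self.active_users.setdefault(data_center, 0)

    def locate(self, file_id):
        # Replica location: every physical site that holds a copy of the file
        return sorted(self.catalog.get(file_id, ()))

    def route(self, file_id):
        sites = self.locate(file_id)
        if not sites:
            raise KeyError(f"no replica registered for {file_id!r}")
        # Prefer the replica site with the fewest users (ties: first in sorted order)
        site = min(sites, key=lambda s: self.active_users[s])
        self.active_users[site] += 1
        return site


class SchedulingManager:
    def __init__(self, replica_manager):
        self.replica_manager = replica_manager

    def submit(self, task):
        # Split the task into one sub-task per file and forward each via the replica manager
        return [(f, self.replica_manager.route(f)) for f in task]


rm = ReplicaManager()
for file_id, dc in [("f1", "dc1"), ("f1", "dc2"), ("f2", "dc2")]:
    rm.register(file_id, dc)
sm = SchedulingManager(rm)
routes = sm.submit(["f2", "f1", "f1"])
print(routes)
```

For brevity the per-site counter is never decremented; a fuller manager would release a slot when each sub-task completes.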
The main objective of this survey is to explain dynamic data replication in cloud computing and to give a detailed overview of cloud computing-based dynamic data replication.

Survey of dynamic data replication in cloud:
All articles with the phrase "data replication" in their title or keywords that were published from January 2004 to December 2014 were first selected from IEEE, Elsevier, Springer and other international journals. During this period, a large number of studies were published.

Techniques based on peer to peer architecture:
This part reviews earlier works on data replication in peer-to-peer architectures; this literature inspired the present research. Some of these studies are briefly described below. Early work on data replication was typically based on these methods.
Sarada and Nagarajan (2011) studied improving the performance and fault tolerance of P2P in cloud computing. In their design, a cluster contains a single master and multiple chunk servers and is accessed by multiple clients. When a client wants to read data on a chunk server, it first sends a request to the master, which replies with the corresponding chunk handle and the locations of the replicas. In this way, the processing load on the servers is balanced. In addition, Agarwal et al. (2012) described peer-to-peer data sharing for scientific workflows on Amazon EC2, using a peer-to-peer approach to address the problem of data sharing in scientific workflows. They compared the performance of their peer-to-peer file manager with that of two network file systems for storing the data of a typical data-intensive workflow application. Their results show that the peer-to-peer file manager performs significantly better than one of the network file systems tested but does not perform as well as the other. Similarly, data replication in P2P collaborative systems is described in Xhafa et al. (2012), who examined several replication techniques for XML document files. To overcome the difficulties of data transfer and storage in data-intensive workflows, Rani and Yadav (2013) proposed a fuzzy-based approach for efficient data replication in peer-to-peer networks. Such systems require setting up subsystems over the network to distribute the network load; these subsystems are called replication servers. Each replication server holds a complete or partial copy of the real centralized server. Finding the optimal number of replication servers is the most important problem in such systems, and this work aimed to identify the required number of such subsystems over the network. To this end, a statistical study was presented based on network capabilities, network distribution and node requirements. In addition, García-Rodríguez and López-Fuentes (2014) described a storage service based on a P2P cloud system, in which scalability, confidentiality, file sharing, data replication, data management, quality of service, decentralization and transparency problems are addressed by means of a P2P architecture.
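The master and chunk-server lookup described for Sarada and Nagarajan (2011) resembles the pattern used by GFS (Ghemawat et al., 2003). The sketch below assumes a 64 MB chunk size (the value GFS used) and a hypothetical in-memory chunk table; it only shows how a client turns a byte offset into a chunk request and receives the replica locations.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size in bytes (GFS used 64 MB chunks)


class Master:
    """Hypothetical master that stores only metadata, never file contents."""

    def __init__(self):
        # (file name, chunk index) -> (chunk handle, chunk servers holding replicas)
        self.chunk_table = {}

    def add_chunk(self, filename, index, handle, servers):
        self.chunk_table[(filename, index)] = (handle, list(servers))

    def lookup(self, filename, index):
        return self.chunk_table[(filename, index)]


def locate_read(master, filename, offset):
    """Client side: map a byte offset to a chunk index, then ask the master where it lives."""
    index = offset // CHUNK_SIZE
    handle, servers = master.lookup(filename, index)
    # The client then reads directly from one of `servers`; the master is off the data path.
    return index, handle, servers


master = Master()
master.add_chunk("/logs/app.log", 1, "handle-17", ["cs1", "cs2", "cs3"])
print(locate_read(master, "/logs/app.log", 70 * 1024 * 1024))
```

Keeping the master out of the data path is what allows reads to spread across the chunk servers that hold replicas, which is the load-balancing effect mentioned above.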

Techniques based on multi-tier architecture:
In this section, we explain dynamic data replication using multi-tier architectures. The multi-tier topology offers an inexpensive and efficient way to allocate storage, computational and network resources, allowing hundreds or thousands of users to share common resources efficiently. Tang et al. (2005) proposed dynamic replication algorithms for the multi-tier Data Grid. They applied two dynamic replication algorithms, Simple Bottom-Up (SBU) and Aggregate Bottom-Up (ABU), to the multi-tier Data Grid and developed a multi-tier Data Grid simulator called DRepSim to study the performance of the dynamic replication algorithms. ABU achieved large performance improvements for all access patterns, even when the available storage size of the replication server was very small. In addition, Chang and Chang (2008) described dynamic data replication using weights in data grids. They proposed a dynamic data replication mechanism called Latest Access Largest Weight (LALW). LALW selects a popular file for replication and determines a suitable number of copies and grid sites for replication. The importance of each historical data access record is distinguished by assigning it a different weight.
A more recent data access record has a larger weight, indicating that the record is more relevant to the current state of data access.
To improve system availability, Sun et al. (2012) modeled dynamic data replication in the cloud. They put forward a dynamic data replication strategy, together with a brief survey of replication strategies suitable for distributed computing environments. It comprises:
• analyzing and modeling the relationship between system availability and the number of replicas;
• evaluating and identifying popular data and triggering a replication operation when the popularity of the data passes a dynamic threshold;
• calculating a suitable number of copies to meet a reasonable system byte effective rate requirement and placing replicas among data nodes in a balanced way;
• designing the dynamic data replication algorithm in a cloud.
Experimental results show the effectiveness and efficiency of the improved system produced by the proposed approach in a cloud.
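The recency weighting behind LALW (Chang and Chang, 2008), described above, can be sketched as follows. This is an illustrative assumption rather than the published algorithm: access history is split into equal time intervals, and each older interval counts half as much as the next newer one.

```python
from collections import Counter


def weighted_popularity(intervals):
    """intervals: per-interval lists of accessed file ids, oldest first."""
    n = len(intervals)
    scores = Counter()
    for i, accesses in enumerate(intervals):
        weight = 2 ** -(n - 1 - i)  # newest interval weighs 1, the one before 1/2, ...
        for file_id, count in Counter(accesses).items():
            scores[file_id] += count * weight
    return scores


def most_popular(intervals):
    scores = weighted_popularity(intervals)
    return max(scores, key=scores.get)


history = [["a"] * 4, ["b", "b"], ["b"]]
print(weighted_popularity(history))
print(most_popular(history))  # b
```

File a has more raw accesses (4 versus 3), but b is selected because its accesses are recent, which is the intent of giving the latest access the largest weight.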
In addition, Repantis et al. (2011) studied consistent replication in distributed multi-tier architectures. They described the basic replication approaches with differing consistency guarantees and argued for the feasibility of strong consistency. They used the TPC-W transactional web e-commerce benchmark to provide a comprehensive performance comparison of the different replication approaches under a variety of workload mixes. Moreover, Hussein and Mousa (2012) proposed a lightweight data replication approach for cloud data. The approach adaptively selects the data files that need replication in order to improve system availability, and it dynamically determines the number of replicas and the effective data nodes for replication. Experimental results showed that the system effectively improves the availability of the cloud system. Similarly, Kirubakaran et al. (2013) presented a data replication strategy based on a modified D2RS algorithm. They proposed a Modified Dynamic Data Replication Strategy (MDDRS) to choose a reasonable number and the right location of replicas, and they compared it with the Dynamic Data Replication Strategy (DDRS). DDRS has three distinct phases: identifying the data file to replicate, determining the number of replicas to create, and placing the new replicas. MDDRS modifies the popularity degree used in the first phase of the standard strategy, while the other two phases remain the same as in the standard strategy.

Techniques based on Hadoop Distributed File System (HDFS):
A number of earlier works on data replication in the Hadoop Distributed File System (HDFS) architecture inspired this research; some recent studies are briefly described in this section. HDFS is a distributed storage system that stores large-scale data sets reliably and streams them to applications at high bandwidth. HDFS provides high performance, reliability and availability by replicating data, typically keeping three copies of each data block. The popularity of data in HDFS varies over time, so the replication policy of HDFS should be flexible and adapt to data popularity to achieve better performance and higher disk utilization.
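The default HDFS placement for a replication factor of three is documented in the Apache Hadoop HDFS Architecture guide: one replica on the writer's node (when the writer runs on a data node) and the other two on two different nodes of a single remote rack. The sketch below is a simplified, deterministic version for illustration only; real HDFS picks nodes randomly, checks node health and free space, and handles writers outside the cluster.

```python
def place_three_replicas(topology, writer):
    """topology: rack name -> list of data node names; writer: the data node writing the block."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer]
    # Remote racks that can host the second and third replicas on two distinct nodes
    remote_racks = sorted(
        r for r, nodes in topology.items() if r != local_rack and len(nodes) >= 2
    )
    if not remote_racks:
        raise ValueError("need a remote rack with at least two data nodes")
    second, third = sorted(topology[remote_racks[0]])[:2]
    # The three replicas span two racks, so the block survives the loss of a whole rack.
    return [writer, second, third]


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4", "n5"]}
print(place_three_replicas(topology, writer="n2"))  # ['n2', 'n3', 'n4']
```

Putting two replicas on one remote rack rather than three separate racks reduces inter-rack write traffic while still tolerating a rack failure.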
Wang et al. (2009) described Hadoop high availability through metadata replication. They proposed a metadata replication-based solution that enables Hadoop high availability by removing the single point of failure in Hadoop. The solution involves three main phases: in the initialization phase, each standby/slave node is registered with the active/primary node, and its initial metadata (such as the version file and file system image) are synchronized with those of the active/primary node; in the replication phase, the runtime metadata needed for future failover (such as outstanding operations and lease states) are replicated; and in the failover phase, the standby/newly elected primary node takes over all communications. In addition, Wang et al. (2009) described the Elastic Replication Management System (ERMS) for HDFS, which provides an active/standby storage model for HDFS. It uses a complex event processing engine to distinguish real-time data types, dynamically adds extra replicas for hot data, removes these extra replicas when the data cool down, and applies erasure codes to cold data. ERMS also introduces a replica placement strategy for the extra replicas of hot data and the erasure coding parities. Metadata management is critical in distributed file systems. To address this, Varade and Jethani (2013) proposed a distributed metadata management scheme for HDFS. In the standard HDFS architecture, a single master server manages all metadata, while a number of data servers store file data. This architecture cannot keep up with the exponentially growing storage demand in cloud computing, because the single master server may become a performance bottleneck. To maintain reliability, metadata are replicated across different NameNodes using log replication, and the Paxos algorithm is used to keep the replicas consistent. In addition, Mamatha et al. (2014) gave an overview of the Hadoop file system with elastic replication management. Apache Hadoop is a software framework that uses a simple programming model to process and analyze large data sets (Big Data) across clusters of computers.
The Hadoop Distributed File System (HDFS) is one such technology, managing Big Data efficiently.
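The hot and cold data handling attributed to ERMS can be sketched with a simple threshold classifier. The thresholds, the default of three replicas and the number of extra replicas for hot data are hypothetical parameters, and plain access counts stand in for the complex event processing engine that ERMS uses.

```python
def classify(access_counts, hot_threshold, cold_threshold):
    """Label each file 'hot', 'normal' or 'cold' from its recent access count."""
    labels = {}
    for file_id, count in access_counts.items():
        if count >= hot_threshold:
            labels[file_id] = "hot"
        elif count <= cold_threshold:
            labels[file_id] = "cold"
        else:
            labels[file_id] = "normal"
    return labels


def replication_plan(labels, default_replicas=3, extra_for_hot=2):
    """Map each label to an action: extra replicas, default replication or erasure coding."""
    plan = {}
    for file_id, label in labels.items():
        if label == "hot":
            plan[file_id] = ("replicate", default_replicas + extra_for_hot)
        elif label == "cold":
            plan[file_id] = ("erasure_code", None)  # coded parities replace full copies
        else:
            # Normal data keeps (or falls back to) the default count, so extra
            # replicas created while the file was hot are cleaned up.
            plan[file_id] = ("replicate", default_replicas)
    return plan


labels = classify({"a": 120, "b": 30, "c": 1}, hot_threshold=100, cold_threshold=5)
print(replication_plan(labels))
```

Re-running the classification periodically gives the elastic behaviour: a file moves between labels as its access count changes, and its replica count follows.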

Techniques based on QoS-aware data replication:
This part covers papers on QoS-aware data replication, which is the most common approach to data replication. Tang and Xu (2005) studied QoS-aware replica placement for content distribution under responsiveness QoS requirements. They consider two classes of service models: replica-aware services and replica-blind services. In replica-aware services, the servers are aware of the replica locations and can therefore optimize request routing to improve responsiveness. They show that the QoS-aware placement problem for replica-aware services is NP-complete, and they propose and experimentally evaluate several heuristic algorithms for quickly computing good solutions. Wang et al. (2006) proposed a QoS-aware heuristic algorithm for replica placement that determines the positions of replicas so as to satisfy the quality requirements imposed by data requests. Experimental results indicate that the algorithm finds near-optimal solutions effectively and efficiently and that it can be adapted to different parallel and distributed environments. In cloud computing, Esteves et al. (2012) addressed quality of service for the consistency of data geo-replication. They use VFC3, a consistency model for data replicated across data centers, with framework and library support to enforce increasing degrees of consistency for different types of data (based on their semantics). It targets cloud tabular data stores, offering rationalization of resources (especially bandwidth) and improvement of QoS (performance, latency and availability) by providing strong consistency where it matters most and relaxing it for less critical classes or items of data.
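A generic greedy heuristic for QoS-aware replica placement is sketched below: every client has a maximum acceptable access distance, and sites are added one at a time, each time choosing the site that satisfies the most still-unsatisfied clients. It is a set-cover style heuristic given for illustration, not the specific algorithms of Tang and Xu (2005) or Wang et al. (2006), and the distances and QoS bounds are made-up inputs.

```python
def greedy_qos_placement(distance, qos):
    """distance[site][client]: access distance; qos[client]: maximum acceptable distance."""
    unsatisfied = set(qos)
    chosen = []
    while unsatisfied:
        # Clients each candidate site would newly satisfy
        gains = {
            site: {c for c in unsatisfied if distance[site][c] <= qos[c]}
            for site in sorted(distance)
        }
        best = max(gains, key=lambda s: len(gains[s]))  # ties go to the first site in sorted order
        if not gains[best]:
            raise ValueError(f"no site can satisfy clients {sorted(unsatisfied)}")
        chosen.append(best)
        unsatisfied -= gains[best]
    return chosen


distance = {
    "s1": {"c1": 1, "c2": 5, "c3": 9},
    "s2": {"c1": 6, "c2": 2, "c3": 2},
    "s3": {"c1": 2, "c2": 8, "c3": 8},
}
qos = {"c1": 3, "c2": 3, "c3": 3}
print(greedy_qos_placement(distance, qos))  # ['s2', 's1']
```

Since the exact problem is NP-complete for replica-aware services, a greedy choice like this trades optimality for speed; it may use more replicas than the minimum.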
In addition, Liao et al. (2012) proposed a QoS-aware dynamic data replica deletion strategy for distributed storage systems in cloud computing environments. Their QoS-aware Dynamic data Replica Deletion Strategy (DRDS) is designed to save disk space and maintenance cost. Experimental results show that DRDS saves disk space and maintenance costs for distributed storage systems while guaranteeing the availability and performance QoS requirements.
Andronikou et al. (2012) proposed dynamic QoS-aware data replication in grid environments based on data importance. They examined how these factors affect the replication lifecycle in Data Grids and introduced a set of interoperable file replication algorithms that take into account the infrastructural constraints and the 'importance' of the data. The latter is estimated through a multi-parametric factor that summarizes a set of data-specific parameters, such as popularity and content importance. Lin et al. (2013) proposed QoS-aware data replication for data-intensive applications in cloud computing systems. To consistently meet the QoS requirements of an application after data corruption, they introduced two QoS-Aware Data Replication (QADR) algorithms. The first algorithm adopts the intuitive idea of High-QoS First-Replication (HQFR) to perform data replication. However, this greedy algorithm cannot minimize the data replication cost or the number of QoS-violated data replicas. The second algorithm transforms the QADR problem into the well-known minimum-cost maximum-flow (MCMF) problem to achieve both minimization goals. By using an existing MCMF algorithm to solve the QADR problem, the second algorithm produces the optimal solution in polynomial time; however, it takes more computational time than the first algorithm. In addition, they noted that a cloud computing system usually has a large number of nodes. Zeng et al. (2014) described monetary-and-QoS-aware replica placement in cloud-based storage systems, using two greedy algorithms, GS QoS and GS QoS C1, for replica placement. Reena and Alone (2014) presented an assessment of data replication in cloud computing. They employed the MCMF algorithm to minimize the cost of data replication and compared their approach with HQFR. Because a cloud environment has many nodes, the computation time of MCMF is high compared with that of HQFR; to reduce this time complexity, a node combination technique was introduced into MCMF. This work was further extended to consider energy consumption in the cloud environment. Anjali and Rokade (2014) also studied data replication in cloud computing. They used the HQFR (High QoS First Replication) algorithm with the main goal of minimizing data replication cost while meeting the QoS requirement, and they proposed a further algorithm inspired by the MCMF (Minimum Cost Maximum Flow) algorithm. Finally, they presented an efficient scheme for data replication based on QoS requirements.
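The idea behind HQFR (Lin et al., 2013), serving the strictest QoS requirements first, can be sketched as a greedy assignment. This is a simplified reading rather than the published algorithm: each application needs one replica on a node whose access time meets its requirement, nodes have a hypothetical number of free slots, and all access times and costs are made-up inputs.

```python
def hqfr(apps, nodes, access_time, cost):
    """apps: app -> required access time; nodes: node -> free replica slots;
    access_time[app][node] and cost[app][node]: hypothetical per-pair figures.
    Returns (placement, list of QoS-violated apps)."""
    free = dict(nodes)
    placement, violated = {}, []
    # Strictest requirement (smallest allowed access time) is served first
    for app in sorted(apps, key=lambda a: (apps[a], a)):
        qualified = [n for n in sorted(free) if free[n] > 0 and access_time[app][n] <= apps[app]]
        if not qualified:
            violated.append(app)  # no remaining node can meet this app's QoS
            continue
        best = min(qualified, key=lambda n: cost[app][n])  # ties: first in sorted order
        placement[app] = best
        free[best] -= 1
    return placement, violated


apps = {"A": 2, "B": 5}
nodes = {"n1": 1, "n2": 1}
access_time = {"A": {"n1": 1, "n2": 3}, "B": {"n1": 2, "n2": 4}}
cost = {"A": {"n1": 10, "n2": 1}, "B": {"n1": 1, "n2": 5}}
print(hqfr(apps, nodes, access_time, cost))  # ({'A': 'n1', 'B': 'n2'}, [])
```

Because each decision is final, the greedy order can push later applications onto costlier nodes (B here) or leave them without a qualified node at all; that gap is what the MCMF formulation is described as closing.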

Techniques based on cost effective dynamic data replication:
Among cost-effective dynamic data replication techniques, Wei et al. (2010) proposed CDRM, a cost-effective dynamic replication management scheme for cloud storage clusters. A model is proposed to capture the relationship between availability and replica number, and CDRM uses this model to calculate and maintain the minimal replica number for a given availability requirement. Replica placement is based on the capacity and blocking probability of data nodes. By adjusting the replica number and location according to workload changes and node capacity, CDRM dynamically redistributes workloads among data nodes in the heterogeneous cloud. They implemented CDRM in the Hadoop Distributed File System (HDFS), and experimental results show that CDRM is cost-effective and outperforms the default replication management of HDFS in terms of performance and load balancing for large-scale cloud storage. In addition, Li et al. (2011) proposed a cost-effective dynamic data replication strategy for reliability in cloud data centres. Whereas data reliability is commonly ensured with three-replica strategies, their approach uses an incremental replication method to reduce storage cost while meeting the data reliability requirement. Similarly, Li et al. (2012) proposed a cost-effective mechanism for cloud data reliability management based on proactive replica checking. Cost-effective image replication in cloud computing is also described in Shen et al. (2014). They examined the characteristics of image provisioning by studying traces collected from a real-world cloud data centre. They observed that overloaded and dynamic requests for some popular images degrade system performance and availability and make them fluctuate. To address this, they proposed a stochastic model based on queueing theory that captures the main factors in image provisioning in order to optimize the number and placement of image replicas, so that VM images are managed cost-effectively.
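The two ingredients of CDRM (Wei et al., 2010) described above, a minimal replica number for an availability target and placement guided by node blocking probability, can be sketched as follows. Both formulas are assumptions for illustration and CDRM's exact formulation may differ: node failures are taken as independent with probability p, so r replicas give availability 1 - p^r, and each node's blocking probability is computed with the Erlang B formula from its offered load and number of concurrent sessions. The node figures are made up.

```python
def min_replicas(p_fail, target):
    """Smallest r with 1 - p_fail**r >= target, assuming independent node failures."""
    if not (0 <= p_fail < 1 and 0 < target < 1):
        raise ValueError("need 0 <= p_fail < 1 and 0 < target < 1")
    r = 1
    while 1 - p_fail ** r < target:
        r += 1
    return r


def erlang_b(offered_load, sessions):
    """Erlang B blocking probability, computed with the standard recursion."""
    b = 1.0
    for m in range(1, sessions + 1):
        b = offered_load * b / (m + offered_load * b)
    return b


def choose_nodes(nodes, count, block_size):
    """nodes: name -> (free_capacity, offered_load, max_sessions).
    Return `count` nodes with room for the block, lowest blocking probability first."""
    candidates = sorted(
        (erlang_b(load, sessions), name)
        for name, (free, load, sessions) in nodes.items()
        if free >= block_size
    )
    if len(candidates) < count:
        raise ValueError("not enough data nodes with free capacity")
    return [name for _, name in candidates[:count]]


r = min_replicas(p_fail=0.2, target=0.99)
nodes = {"d1": (100, 2.0, 2), "d2": (100, 0.5, 2), "d3": (10, 0.1, 4), "d4": (100, 1.0, 4)}
print(r, choose_nodes(nodes, r, block_size=64))  # 3 ['d4', 'd2', 'd1']
```

Node d3 is skipped for lack of space even though it is lightly loaded, and the busy node d1 is chosen last, which mirrors the stated goal of placing replicas by both capacity and blocking probability.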

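To give a flavour of how a queueing model can size image replicas, the sketch below treats each replica of a popular VM image as one server of a textbook M/M/c queue and picks the smallest replica count whose mean waiting time stays under a target. The M/M/c stand-in, the function names and the workload numbers are assumptions for illustration only; Shen et al.'s stochastic model captures more factors than this.

```python
import math


def erlang_c(servers: int, offered_load: float) -> float:
    """Erlang C: probability that an arriving request has to wait in an
    M/M/c queue. offered_load = arrival_rate / service_rate (< servers)."""
    a, c = offered_load, servers
    rho = a / c
    idle_terms = sum(a ** k / math.factorial(k) for k in range(c))
    wait_term = a ** c / math.factorial(c) / (1 - rho)
    return wait_term / (idle_terms + wait_term)


def mean_wait(replicas: int, arrival_rate: float, service_rate: float) -> float:
    """Mean queueing delay W_q = C(c, a) / (c * mu - lambda)."""
    a = arrival_rate / service_rate
    return erlang_c(replicas, a) / (replicas * service_rate - arrival_rate)


def min_image_replicas(arrival_rate: float, service_rate: float,
                       max_wait: float, max_replicas: int = 100) -> int:
    """Smallest replica count that is stable (c * mu > lambda) and keeps the
    mean waiting time for image requests at or below max_wait."""
    c = math.floor(arrival_rate / service_rate) + 1  # first stable count
    while c <= max_replicas:
        if mean_wait(c, arrival_rate, service_rate) <= max_wait:
            return c
        c += 1
    raise ValueError("max_wait not reachable within max_replicas")


# Hypothetical workload: 20 requests/min for one popular image, each replica
# serves 3 requests/min, and requests should wait at most 0.05 min on average.
print(min_image_replicas(arrival_rate=20.0, service_rate=3.0, max_wait=0.05))
```

The search starts at the first stable replica count because the M/M/c formulas are only valid when total service capacity exceeds the arrival rate.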
Other techniques based on dynamic data replication:
In recent years, several other methods have also been proposed for dynamic data replication. Ye et al. (2010) proposed a hybrid architecture based on data replication. In their paper, they presented a Two-level DHT (TDHT) approach for widely distributed cloud storage. First, they examined the trade-offs in security and availability between TDHT and the conventional pure data partitioning approach (integrated with a DHT and called GDHT, for global DHT). The results show that TDHT offers better security than GDHT and roughly the same level of availability. To compare their performance, they proposed a Two-Level Access (TLA) protocol for the TDHT approach and compared it with the Distributed Version Server (DVS) protocol for the GDHT approach. Moreover, Ma et al. (2013) presented An Ensemble of Replication and Erasure Codes for Cloud File Systems, in which they proposed CAROM, an ensemble of replication and erasure codes that provides resiliency in cloud file systems with high efficiency. While maintaining the same consistency semantics seen in today's cloud file systems, CAROM offers low bandwidth cost, low storage cost and low access latencies. They carried out a large-scale evaluation using real-world file system traces and showed that CAROM outperforms replication-based schemes in storage cost by up to 60% and erasure-coded schemes in bandwidth cost by up to 43%, while keeping access latencies close to those of replication-based schemes.
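One way to picture an ensemble of replication and erasure codes is a two-tier store in which every block is kept erasure coded and recently read blocks are additionally held as full replicas, so hot reads avoid decoding while cold data pays only the coding overhead. The class below is a hypothetical sketch of that idea; `HybridStore`, the LRU policy and the (k, m) parameters are our assumptions, not CAROM's actual design.

```python
from collections import OrderedDict


class HybridStore:
    """Hypothetical two-tier store: every block is kept erasure coded with
    k data + m parity fragments, and recently read blocks are additionally
    held as full replicas in an LRU cache."""

    def __init__(self, k: int, m: int, replicas: int, cache_capacity: int):
        self.k, self.m = k, m
        self.replicas = replicas
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()  # block_id -> None, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, block_id: str) -> str:
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return "replica"                  # served without decoding
        self.misses += 1
        # Cold read: rebuild from any k of the k + m fragments, then cache it.
        self.cache[block_id] = None
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return "decode"

    def storage_units(self, total_blocks: int) -> float:
        """Footprint in block-sized units: coded copies of all blocks plus
        full replicas of the cached (hot) blocks."""
        coded = total_blocks * (self.k + self.m) / self.k
        return coded + len(self.cache) * self.replicas


store = HybridStore(k=6, m=3, replicas=3, cache_capacity=2)
print([store.read(b) for b in ["a", "b", "a", "c", "b"]])
# prints ['decode', 'decode', 'replica', 'decode', 'decode']
print(store.storage_units(100))  # prints 156.0 (plain 3-way replication: 300)
```

The storage figure shows the intuition behind the trade-off: the coded tier keeps overhead near (k + m) / k, while only the small hot set pays the cost of full replication.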
Gupt and Bala (2013) proposed autonomic data replication in the cloud environment. In their work, data are replicated automatically from the local host to the cloud environment. Replication was implemented with Hadoop, which stores the data on different nodes, so that if one node goes down the data are recovered seamlessly from another node. A storage system is an important building block of cloud computing infrastructures. Although high-performance storage servers are the ultimate solution for cloud storage, implementing an inexpensive storage system remains an open issue. To address this problem, Myint and Naing (2011) implemented an efficient cloud storage system with inexpensive commodity computer nodes organized into a PC cluster based datacenter. The Hadoop Distributed File System (HDFS) is an open source cloud-based storage platform designed to be deployed on low-cost hardware, and their PC cluster based cloud storage system is built on HDFS with an improved replication management scheme. Data objects are distributed and replicated across a cluster of commodity nodes located in the cloud, and the system provides the optimal replica number as well as weighting and balancing among the storage server nodes. Similarly, Long et al. (2014) proposed a multi-objective offline optimization approach for replica management, in which they analyse the factors influencing replication decisions and treat mean file unavailability, mean service time, load variance, energy consumption and mean access latency as five objectives. The approach decides the replication factor and replication layout with an improved artificial immune algorithm that evolves a set of solution candidates through clone, mutation and selection processes. The algorithm, called Multi-objective Optimized Replication Management (MORM), searches for near-optimal solutions by balancing the trade-offs among the five optimization objectives.
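MORM balances five objectives rather than optimizing a single one. The sketch below shows only the selection idea common to such multi-objective searches: it keeps the candidate replication configurations that no other candidate Pareto-dominates, with all five objectives minimized. The candidate names and scores are invented for illustration, and MORM's clone and mutation steps are not shown.

```python
from typing import Dict, List, Tuple

# (mean file unavailability, mean service time, load variance,
#  energy consumption, mean access latency); all are minimized
Objectives = Tuple[float, float, float, float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse in every objective and strictly
    better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name for name, obj in candidates.items()
        if not any(dominates(other, obj)
                   for other_name, other in candidates.items() if other_name != name)
    ]


# Hypothetical (replication factor, layout) candidates with invented scores
candidates: Dict[str, Objectives] = {
    "r2-spread": (0.010, 1.2, 0.30, 50.0, 12.0),
    "r3-spread": (0.001, 1.1, 0.20, 75.0, 10.0),
    "r3-packed": (0.002, 1.3, 0.40, 80.0, 11.0),
    "r4-spread": (0.0001, 1.0, 0.15, 100.0, 9.0),
}
print(pareto_front(candidates))  # prints ['r2-spread', 'r3-spread', 'r4-spread']
```

Here "r3-packed" is dropped because "r3-spread" is at least as good on all five objectives, while the remaining three trade energy against availability and latency; an operator (or a further weighting step) would pick among them.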
Cidon et al. (2013) proposed MinCopysets: Derandomizing Replication in Cloud Storage. Their work decouples the mechanisms used for load balancing from those used for data replication: nodes are chosen randomly for load balancing, but node selection for data replication is derandomized. They show that MinCopysets offers significant improvements in data durability. For example, in a 1000-node cluster under a power outage that kills 1% of the nodes, it reduces the probability of data loss from 99.7% to 0.02% compared with random replica selection. Moreover, Loukopoulos and Ahmad (2004) proposed static and adaptive distributed data replication using genetic algorithms. Replicating some of the objects at multiple sites is one feasible way to reduce network traffic. Deciding what to replicate where requires solving a constrained optimization problem, which is NP-complete in general. Such problems are known to stretch the capabilities of a Genetic Algorithm (GA) to its limits. They also proposed a hybrid GA that takes the current replica distribution as input and computes a new one using knowledge about the network attributes and the changes that have occurred. Keeping more pragmatic scenarios of today's distributed information environments in view, they evaluated these algorithms under a storage capacity constraint at each site and with variations in object popularity, and also examined the trade-off between running time and solution quality.
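The durability gain from derandomizing replica selection can be illustrated with a small model. The sketch below statically partitions nodes into disjoint groups of r nodes, picks a random primary for load balancing, and places the other replicas on the rest of the primary's group; it then compares the exact loss probability of this scheme with an approximation for fully random placement. The function names, the 999-node cluster (chosen to divide evenly by 3) and the hypothetical 10 million chunks are our assumptions, so the outputs are not meant to reproduce the paper's figures.

```python
import math
import random
from fractions import Fraction
from typing import List


def build_copysets(num_nodes: int, r: int, seed: int = 0) -> List[List[int]]:
    """Derandomized replication: statically split the nodes into disjoint
    groups of r nodes. Leftover nodes (num_nodes % r) are unused here."""
    nodes = list(range(num_nodes))
    random.Random(seed).shuffle(nodes)
    usable = num_nodes - num_nodes % r
    return [nodes[i:i + r] for i in range(0, usable, r)]


def place_chunk(copysets: List[List[int]], rng: random.Random) -> List[int]:
    """Randomized load balancing: a random group, then a random member, gives
    a uniform primary because groups are equal-sized; the other replicas are
    the rest of the primary's group."""
    group = copysets[rng.randrange(len(copysets))]
    primary = rng.choice(group)
    return [primary] + [n for n in group if n != primary]


def min_copysets_loss_probability(num_nodes: int, r: int, failed: int) -> float:
    """P(some group lies entirely inside a uniformly random set of `failed`
    failed nodes), computed exactly by inclusion-exclusion over groups."""
    groups = num_nodes // r
    total = math.comb(num_nodes, failed)
    prob = Fraction(0)
    for j in range(1, min(groups, failed // r) + 1):
        ways = math.comb(groups, j) * math.comb(num_nodes - j * r, failed - j * r)
        prob += (-1) ** (j + 1) * Fraction(ways, total)
    return float(prob)


def random_replication_loss_probability(num_nodes: int, r: int, failed: int,
                                        chunks: int) -> float:
    """Approximate P(data loss) when each chunk picks r nodes uniformly at
    random, treating chunks as independent."""
    p_chunk = math.comb(failed, r) / math.comb(num_nodes, r)
    return -math.expm1(chunks * math.log1p(-p_chunk))


copysets = build_copysets(999, 3)
print(place_chunk(copysets, random.Random(1)))
# Hypothetical scenario: 10 of 999 nodes (about 1%) fail at once.
print(min_copysets_loss_probability(999, 3, 10))
print(random_replication_loss_probability(999, 3, 10, chunks=10_000_000))
```

With random placement and many chunks, almost every combination of r failed nodes holds some chunk, so loss becomes near certain; with a fixed set of groups, loss requires an entire group to fail together, which is far less likely.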

Categorization based on features: For the
Table 5 shows the categorization based on the parameters measured. In data replication, researchers use various measurements to demonstrate the effectiveness of their work, and cost and time are the most commonly used. Minimizing time and cost maximizes the system output. Many researchers also use security as a major performance parameter.

CONCLUSION
Cloud computing is a user-oriented technology in which users can draw on hundreds of thousands of virtualized resources for each task, and dynamic data replication is considered a major factor in cloud environments. In this survey, we have reviewed various data replication techniques and tabulated the parameters they address under cloud and grid environments. Drawing on articles from standard publishers, we have presented a survey of dynamic data replication in cloud computing. In total, 40 articles related to replication, published from 2003 to 2014, were identified, and these 40 articles are grouped into three periods by publication year. We have also studied their limitations and time complexity. The survey indicates that the availability and reliability of data replication in cloud computing still need to be improved. Finally, some open research issues are addressed to guide further research in this direction.

Fig. 2: Deployment model of cloud computing
cloud, the hybrid cloud and the community cloud. Each of these cloud deployment models is explained below.

Fig. 3: Architectural diagram of replication in cloud computing
learning and other techniques to solve the problems of cloud computing. In this section, we survey and categorize the dynamic data replication techniques that have been developed so far for cloud computing. They can be subdivided into six broad categories: techniques based on peer-to-peer architecture, techniques based on multi-tier architecture, techniques based on the Hadoop Distributed File System (HDFS), techniques based on QoS-aware data replication, techniques based on cost-effective dynamic data replication and other techniques based on dynamic data replication.

Figure 4 shows the summary of the analysis by the type of architecture used for data replication, with the papers from 2004 to 2014 categorized by the method used. The figure shows that 13 of the research papers on dynamic data replication date from 2004 to 2011, 11 from 2012 and 16 from 2013 to 2014. Among the 40 research papers, eleven are based on the HDFS architecture and only two use tree-based

Table 1: Categorization based on features from 2003 to 2011

Table 2: Categorization based on features in 2012

Table 3: Categorization based on features from 2013 to 2014

Table 4 shows the categorization based on application from 2003 to 2014. In total, 40 research papers are considered, each targeting a particular application. Table 4 shows that most of the studies address data replication in the cloud environment: nearly 32 works relate to data replication and only 8 to other applications. Moreover, 33 studies use the cloud environment, while the remaining seven use the grid environment.

Table 4: Categorization based on application from 2003 to 2014

Table 5: Categorization based on parameter measure from 2003 to 2014