Efficient Version Management for Content Sharing Using File Splitting and Differences in Hybrid Peer-to-Peer Network

We previously proposed an efficient content sharing strategy for hybrid Peer-to-Peer (P2P) networks that uses file splitting and the differences between versions. In this strategy, when a user requests a content item that has been updated and exists in several versions, the user can obtain it from the network by retrieving a replica of another version together with the difference between that version and the requested one. This way of sharing content is expected to enable effective and flexible operation. However, when many peers are concentrated within a certain range of the network, the content control procedure may leave some peers without any replica, so that storage is not always used effectively. In this research, we remedy this disadvantage and evaluate the effectiveness of the improved method on a more realistic network topology.


I. INTRODUCTION
In content sharing using a P2P network [1], a user subscribing to the service may be unable to acquire or refer to some shared contents because of peer failures, departures from the network, and so on. As a countermeasure, replicas of a content are placed on multiple peers. Since one content can then be referred to from multiple peers, the possibility of obtaining the same content from another peer increases even when a certain content-holding peer cannot be accessed. In addition, since multiple peers hold a replica of the same content, accesses to the content can be distributed among them. Furthermore, because a replica of the requested content is more likely to exist near the requesting peer, the load on the network can be suppressed [2], [3].
In an environment where multiple replicas of a content exist on the network, an update of the content at one peer creates a situation in which old and new replicas exist at the same time. If only the updated content is used thereafter, every replica must be quickly replaced with the updated one so as to maintain consistency of the content [4]. On the other hand, if users need both the old and new versions of the content, holding full replicas of both requires a lot of storage, and the efficiency of content sharing might be reduced. In this case, a content with multiple versions can be shared efficiently by holding one version together with the difference between the two, assuming that the difference is smaller than the other version.
Manuscript received May 10, 2018; revised August 1, 2018.
Various studies on content sharing [5]-[11] and version management [12], [13] in P2P networks have been conducted so far. However, content sharing in environments where multiple versions of the same content exist has not been investigated sufficiently. Therefore, in our previous research [14], we proposed an efficient content sharing method for hybrid P2P network environments that uses the difference between contents before and after updating. In addition, by dividing a replica of each content version into multiple blocks and storing them across peers, we made effective use of the peers' storage. However, in this method, each peer has a reference range defined by a certain number of hops, and each content replica that exists within the range is assigned a management peer so that no two management peers of the same content exist in the same reference range. Therefore, when there are many peers within a reference range, peers possessing no or few replicas exist, and the storage of the peers is not necessarily used effectively. Also, in the evaluation of this method, we used a lattice network as the network topology for the simulations, which cannot be said to conform to real communication networks.
In this paper, we propose an improved method that uses the storage of each peer more effectively by defining the reference range of each peer with both the number of hops from the peer and the number of peers to be included in the range, thereby controlling the number of peers within the reference range. In addition, we verify the effectiveness of the proposed method by computer simulation using the more realistic BA model as the network topology.
In the following, Section II states the assumptions about the environment in which the proposed method works and defines the problem to be solved. Section III describes the proposed method, Section IV evaluates it, and Section V concludes the paper.

II. ASSUMED ENVIRONMENT AND PROBLEM DEFINITION
In this paper, we assume a hybrid P2P network in which each peer can request and acquire a content held as a replica in the storage of another peer. In addition, we make the following assumptions.
• Each peer provides storage capacity of a size it decides itself to the content sharing system, and the system can freely use this capacity for content sharing.
• There is no substantial difference between the original of a content and its replicas.
• A replica that is complete in itself is called a full object, and a replica of a certain version can be reproduced from a full object replica of another version patched with the difference data between the two versions.
• A replica can be divided into small blocks that are stored separately on different peers.
• Each peer may leave the network or join it again at any time, according to the user's decision or the network environment.
The costs of content sharing assumed in this paper are as follows.
a) Network Cost
It is the load that occurs in the network when referring to, deploying, and relocating contents. In this paper, it is assumed to be proportional to the product of the size of the contents and the distance moved on the network (number of hops).
b) Contents Loss
It is the cost that occurs when a specific content disappears from the network, because peers drop out of the network or replicas are deleted, and the content requested by a user cannot be acquired (the user's purpose cannot be achieved). In this paper, it is defined as the total number of times that users could not obtain the replicas of the requested contents, divided by the total number of times that users requested contents.
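The two cost definitions above can be written down directly. The following is a minimal sketch, not the simulator used in the paper; the function names and the proportionality constant `k` are assumptions introduced only for illustration.

```python
def network_cost(size: float, hops: int, k: float = 1.0) -> float:
    """Cost of moving one object: proportional to size times hop count.

    `k` is a hypothetical proportionality constant (the paper gives none).
    """
    return k * size * hops


def contents_loss(failed_requests: int, total_requests: int) -> float:
    """Fraction of requests for which no replica could be obtained."""
    if total_requests == 0:
        return 0.0  # assumption: no requests means no loss
    return failed_requests / total_requests


# Two transfers: a 100-unit full object over 3 hops and a 5-unit difference over 1 hop.
transfers = [(100.0, 3), (5.0, 1)]
total_cost = sum(network_cost(size, hops) for size, hops in transfers)
```

With these definitions, fetching a small difference from a nearby peer contributes far less to the network cost than fetching a full object from a distant one, which is the effect the proposed method relies on.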
In this paper, we will pursue a content sharing method that minimizes network cost and contents loss in the above environment.

III. PROPOSED METHOD
Each content is divided into multiple blocks as shown in Fig. 1, and each block generally has a different version configuration. For each block of each content, there is a regular data file of a certain version and a difference, produced by updating, between that regular data file and the data file of another version. In this paper, the regular file is called a "full object." A content records the number of references for each version of each block and determines the usefulness of each version accordingly.
A range determined from a peer by a procedure described later is referred to as the reference range of the peer. In the initial state, each peer has the administrative authority over its own contents (such a peer is hereinafter referred to as a replica management peer). A replica management peer manages the full objects and the differences of the contents it is responsible for within its own reference range. When joining the network, a newly joining peer becomes the replica management peer of the content replicas it possesses.
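As an illustration of the block structure described above, the following sketch stores one full object per block, the differences for the other versions, and a per-version reference counter. The class name, field names, and the patch format (a list of byte-range replacements between equal-length versions) are assumptions made for this example; the paper does not specify a difference format.

```python
from dataclasses import dataclass, field

# Hypothetical patch format: (offset, replacement bytes) pairs.
Patch = list[tuple[int, bytes]]


@dataclass
class Block:
    full_version: int                      # version stored as the full object
    full_data: bytes                       # the full object itself
    diffs: dict[int, Patch] = field(default_factory=dict)      # version -> difference
    ref_counts: dict[int, int] = field(default_factory=dict)   # version -> references

    def get(self, version: int) -> bytes:
        """Return the requested version and count the reference."""
        self.ref_counts[version] = self.ref_counts.get(version, 0) + 1
        if version == self.full_version:
            return self.full_data
        data = bytearray(self.full_data)
        for offset, chunk in self.diffs[version]:   # patch the full object
            data[offset:offset + len(chunk)] = chunk
        return bytes(data)


block = Block(full_version=2, full_data=b"hello world")
block.diffs[1] = [(0, b"H"), (6, b"W")]
v1 = block.get(1)
```

The reference counters are what a content would consult when judging the usefulness of each version of a block.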
Proposed Method's Procedures: We have proposed an efficient content sharing method (the conventional method) that reduces network cost and content loss in an environment where shared contents are updated and both the new and the old versions of a content are equally worthwhile [14]. However, in the conventional method, the reference range is determined only by the number of hops, and two different replica management peers of the same content are controlled so as not to be located within the same reference range of a peer. As a result, peers that do not store any content replicas exist, and the storage of the peers is not used effectively enough.
Therefore, in this paper, we change the method so that the reference range is determined by both the number of hops from a peer and the number of peers within the range. This change lets the method form adequate ranges and use a sufficient amount of storage for more effective content sharing. Only the part of the procedure that differs from the conventional method is shown below. For the other parts of the procedure, see the conventional method described in [14].
<Procedure to determine the reference range>
When a peer joins the network, the following steps are executed. After the procedure completes, if a replica management peer for a content that the joining peer also holds exists within the joining peer's reference range, the joining peer keeps only the differences needed to produce the versions whose full objects are not managed by that replica management peer, and discards the other components of the content. The differences held by the joining peer are also managed by the replica management peer.
Note that in the initial state, the peers included in the reference range of each peer are determined by selecting up to R_th peers, in descending order of degree, from the peers within H_th hops.
1. Set all peers within H_th hops of the joining peer as reference candidate peers.
2. Exclude from the candidates any peer that has more than R_th peers in its reference range.
3. If reference candidate peers remain, select the one closest to the joining peer; this peer and the joining peer include each other in their reference ranges, and the selected peer is excluded from the candidates. If none remain, increase H_th by 1, add the newly included peers to the candidate peers, and go back to step 1.
4. If the number of peers within the reference range of the joining peer is less than R_th, go back to step 3. Otherwise, terminate the procedure.
When a peer leaves the network, the following procedure is executed.
1. Let the set of peers be all peers that include the leaving peer in their reference range.
2. Select one peer from the set, and find the peers that are within H_th hops of the selected peer and are not included in its reference range. If no such peer is found, return to the beginning of step 2 if peers remain in the set; if no peer remains in the set, terminate the procedure.
3. Among the peers found in step 2, the peer closest to the peer selected in step 2 and the selected peer itself include each other in their reference ranges.
4. Exclude the peer selected in step 2 from the set of peers, and go back to step 2 if at least one peer still remains in the set. If no peer remains in the set, terminate the procedure.
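The join and leave procedures above can be sketched as follows. This is a simplified reading under stated assumptions, not the authors' implementation: the overlay is an adjacency dictionary of sets with integer peer identifiers, `h_th` and `r_th` stand for the hop and peer-count thresholds, ties between equally distant peers are broken by the smaller identifier, and each peer affected by a departure is visited once. Replica handling after a join is omitted.

```python
from collections import deque


def hop_distances(adj, src, max_hops):
    """Breadth-first search: {peer: hops} for peers within max_hops of src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[src]
    return dist


def join(adj, ref_range, peer, h_th, r_th):
    """Build the reference range of a joining peer (join steps 1-4)."""
    ref_range.setdefault(peer, set())
    hops = h_th
    while len(ref_range[peer]) < r_th and hops <= len(adj):
        dist = hop_distances(adj, peer, hops)
        # Steps 1-2: peers within `hops`, minus peers already linked and
        # peers with more than r_th entries (literal reading of step 2).
        candidates = [p for p in dist
                      if p not in ref_range[peer]
                      and len(ref_range.get(p, set())) <= r_th]
        if not candidates:
            hops += 1          # step 3, second branch: widen by one hop
            continue
        closest = min(candidates, key=lambda p: (dist[p], p))
        ref_range[peer].add(closest)                     # step 3: mutual link
        ref_range.setdefault(closest, set()).add(peer)
    return set(ref_range[peer])


def leave(adj, ref_range, peer, h_th):
    """Repair the ranges that contained a leaving peer (leave steps 1-4)."""
    affected = [p for p, rng in ref_range.items() if peer in rng]   # step 1
    for neighbour in adj.pop(peer, ()):
        adj[neighbour].discard(peer)
    ref_range.pop(peer, None)
    for p in affected:                                   # steps 2 and 4
        ref_range[p].discard(peer)
        dist = hop_distances(adj, p, h_th)
        fresh = [q for q in dist if q not in ref_range[p]]
        if fresh:                                        # step 3
            q = min(fresh, key=lambda x: (dist[x], x))
            ref_range[p].add(q)
            ref_range.setdefault(q, set()).add(p)


# Six peers on a ring; peer 0 joins with h_th = 1, r_th = 2, then peer 1 leaves.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
ranges = {}
joined = join(ring, ranges, peer=0, h_th=1, r_th=2)
leave(ring, ranges, peer=1, h_th=2)
```

In this toy run, peer 0 first links to its two ring neighbours; when peer 1 leaves, peer 0 replaces it with the nearest peer within two hops that is not yet in its range.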

IV. EVALUATION
In this paper, we evaluate the proposed method by computer simulations.

A. Evaluation Criteria
In this research, among the proposed method and the methods used for comparison, the method that most reduces network cost and contents loss is evaluated most highly.

B. Methods to Be Compared
In this research, the conventional method proposed in [14], the owner replication method [15], and a simplified BitTorrent (hereinafter referred to as the BT method) are compared with the proposed method. Note that in all cases, the definition of the usefulness of content replicas or content replica components is the same as in the proposed method. Brief explanations of the methods are given below.

1) Conventional method
The reference range is determined by the threshold H_th, and each content replica component is given its management peer. When a content is requested, if there is a replica management peer of the same content within the reference range of the requesting peer, the replica is only referred to by the requesting peer and is not replicated on it. If there is no replica management peer in the range, the content replication process is executed. When a content replica cannot be held on the peer where it is located because free storage capacity is lacking, a transfer is attempted, component by component, to another peer with free storage within the reference range of the replica management peer. Furthermore, when a peer leaves the network, the content replicas held by that peer are supplemented on a neighboring peer. See [14] for details.
2) Owner replication method
When a content requesting peer acquires a replica of its requested content from a peer holding the replica, the replica is copied into the requesting peer's storage so that it can be provided to other peers. If there is not enough space in the storage, unnecessary replicas are selectively eliminated from all replicas in the storage, including the acquired one, so as to maximize their usefulness.
3) BT method
Each content replica is divided into multiple blocks that are allocated to several peers. When a content is requested, for each block of the requested content, a peer is randomly selected from the peers that hold the replica of the requested version, and the requesting peer acquires the replica of each block of the requested version from the selected peer. After acquiring a set of replicas, the requesting peer holds the set if there is enough space in its storage, and thereafter responds to acquisition requests from other peers. If there is not enough free space in the storage, the content replicas are sorted out so that the total utility in the storage is maximized, and what remains is retained.
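The per-block source selection of the BT method can be illustrated with a short sketch. The function name and the data layout (a mapping from block index to the set of peers holding that block of the requested version) are assumptions for this example only; storage management after acquisition is not modelled.

```python
import random


def pick_sources(holders_by_block, rng):
    """For each block, choose one holder uniformly at random.

    Blocks with no holder are left out of the result, which means the
    requested content cannot be assembled completely.
    """
    return {block: rng.choice(sorted(peers))
            for block, peers in holders_by_block.items() if peers}


holders = {0: {"p1", "p2"}, 1: {"p3"}, 2: set()}
picked = pick_sources(holders, random.Random(0))
complete = len(picked) == len(holders)
```

Because sources are drawn without regard to distance, the selected peers may lie anywhere in the network, which is why this method tends to incur longer transfer paths than methods that prefer nearby replicas.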

C. Simulation Conditions
Simulation parameters and their values set for the evaluations are shown in Table I.
In this simulation, a network with a BA model topology is created with 1,200 initially deployed active peers. The number of active peers on the network varies around 1,200, the initial value, because peers join and leave at almost the same arrival rate. The number of kinds of contents is 2,400, and in all methods, a replica of each content is stored in the initial state on a single peer randomly selected from the active peers. The reference frequency is recalculated every fixed time T, and T is set to 1,500 [unit time]. In addition, the number of divisions of a single content in the proposed method and the BT method is set to 4 or 8, and the threshold C_sup for complementing the content is 3% (see [14]).
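For readers who want to reproduce a comparable topology, a BA (Barabási-Albert) graph can be generated by preferential attachment with the standard library alone. This is a sketch under assumptions: the number of edges per new node `m`, the star-shaped seed graph, and the random seed are not given in the paper and are chosen here only for illustration.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Adjacency sets of an n-node BA graph; each new node attaches to m nodes."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # `repeated` lists every node once per incident edge, so a uniform draw
    # from it selects nodes with probability proportional to their degree.
    repeated = []
    for i in range(1, m + 1):          # seed graph: a star centred on node 0
        adj[0].add(i)
        adj[i].add(0)
        repeated += [0, i]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:        # m distinct, degree-biased targets
            targets.add(rng.choice(repeated))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            repeated += [new, t]
    return adj


graph = barabasi_albert(1200, m=2, seed=1)
max_degree = max(len(neighbours) for neighbours in graph.values())
```

A graph built this way has a few high-degree hubs and short average path lengths, in contrast to the uniform degrees and longer paths of a lattice.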
The contents requested by each peer are decided according to the user's preference, which is defined by the following two-phase procedure. First, the breadth of preference of each user is determined (Step 1). Here, the breadth of preference means the number of kinds of contents matching the preference of the user; in the actual simulations it is given according to a normal distribution with an average of 30 and a standard deviation of 5. Next, according to the breadth determined in Step 1, the contents matching the preference are determined for each user (Step 2). In the actual simulations, the probability that each content is selected is given by a Zipf distribution, and contents are allocated so as not to exceed the breadth of the user's preference. The contents allocated to a user are then requested by that user according to a Zipf distribution. To reach the steady state earlier, from the beginning of the simulation to 500 [unit time], peers neither leave nor join the network; content requests are generated according to a Poisson distribution with arrival rate λ_req = 36 (3 [%] of the average number of active peers in the network); and content updates, whose number is decided according to a Poisson distribution with arrival rate λ_update = 0.5 per unit time, occur at randomly selected peers for contents selected in the same way as content requests.
After 50 [unit time] from the beginning, the simulation is assumed to have reached the steady state. From then on, the arrival rate of content requests λ_req is kept at 36, while the arrival rate of content updates λ_update is reset to 0.1. The simulation continues in an environment where randomly selected peers join and leave the network in each unit time; the numbers of joining and leaving peers are determined according to a Poisson distribution with arrival rate λ_mov = 1, and each joining peer holds replicas of contents, the number of which is decided according to a normal distribution with an average of 3 and a standard deviation of 1. The kinds of contents possessed by the joining peers are determined in the same way as content requests.
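The request model described above can be sketched as follows. The Zipf exponent `s`, the clamping of the breadth to at least 1, the use of the content index as its popularity rank, and all function names are assumptions, since the paper does not state them. The Poisson sampler uses Knuth's multiplication method because Python's `random` module has no Poisson generator.

```python
import math
import random


def zipf_weights(n, s=1.0):
    """Unnormalised Zipf weights 1 / rank**s for ranks 1..n (s is assumed)."""
    return [1.0 / rank ** s for rank in range(1, n + 1)]


def sample_preference(rng, n_contents, weights):
    """Step 1: breadth ~ N(30, 5). Step 2: that many distinct contents by Zipf weight."""
    breadth = max(1, min(n_contents, round(rng.gauss(30, 5))))
    chosen = set()
    while len(chosen) < breadth:
        chosen.add(rng.choices(range(n_contents), weights=weights, k=1)[0])
    return sorted(chosen)


def next_request(rng, preferred):
    """Pick one of the user's preferred contents, again Zipf-weighted by rank."""
    return rng.choices(preferred, weights=zipf_weights(len(preferred)), k=1)[0]


def poisson(rng, lam):
    """Knuth's method: count uniform draws until their product falls below exp(-lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(42)
weights = zipf_weights(2400)
prefs = sample_preference(rng, 2400, weights)
lam_req = 0.03 * 1200                  # 3% of 1,200 active peers
n_requests = poisson(rng, lam_req)     # requests generated in one unit time
requested = next_request(rng, prefs)
```

The arithmetic behind the request rate is simply 0.03 × 1,200 = 36 requests per unit time on average.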
The storage capacity of each peer is set according to a normal distribution with an average of 5 [GB] and a standard deviation of 1 [GB]. The size of the difference for a block is assumed to be 1% to 50% of the size of the full object of the block, drawn from a uniform distribution.
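A short calculation shows what these parameters imply for storage. Since the difference ratio is uniform on [0.01, 0.50], its mean is (0.01 + 0.50) / 2 = 0.255, so holding one full object plus one difference takes on average 1.255 times the full object size, compared with 2 times for two full objects. The sketch below samples these quantities; the function names and the clamping of negative capacities to zero are assumptions.

```python
import random


def sample_storage_gb(rng):
    """Peer storage ~ N(5 GB, 1 GB); negative draws are clamped to 0 (assumption)."""
    return max(0.0, rng.gauss(5.0, 1.0))


def sample_diff_size(rng, full_size):
    """Difference size: uniform between 1% and 50% of the full object."""
    return full_size * rng.uniform(0.01, 0.50)


def storage_for_two_versions(full_size, diff_size):
    """Return (full object + difference, two full objects)."""
    return full_size + diff_size, 2 * full_size


rng = random.Random(7)
capacity = sample_storage_gb(rng)
diff = sample_diff_size(rng, full_size=100.0)
with_diff, two_full = storage_for_two_versions(100.0, 25.5)
```

For a 100-unit block with the mean difference size, the pair of versions occupies 125.5 units instead of 200.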

D. Simulation Results and Discussions
The results of the simulations are shown in Fig. 2 and Fig. 3.
As shown in Fig. 2, the proposed method achieves the lowest network cost when the content replicas are divided into eight blocks. On the other hand, the BT method has the largest network cost of all the methods when the content replicas are divided into four blocks. However, when the content replicas are divided into eight blocks, its network cost falls below that of the owner replication method after 600 [unit time] from the beginning of the simulation. The reasons are as follows. In the proposed method and the conventional method, it is sometimes sufficient to request only the difference between versions at the time of a content request, so the network cost generally becomes small. In addition, since each content is divided into multiple blocks, the requested content can sometimes be acquired by requesting only a small number of blocks, which also reduces the network cost. Furthermore, as the number of content divisions increases, the storage capacity of each peer can be used more effectively and the number of content replicas that can be held in the entire network increases, so the possibility of acquiring requested contents from neighboring peers increases. This is another factor that suppresses the network cost.
Journal of Advances in Information Technology Vol. 9, No. 3, August 2018
On top of that, the proposed method and the conventional method do not allocate multiple management peers of the same content within the reference range of a single peer. Because the proposed method also limits a reference range by the number of peers it includes, it can use the storage of the peers within the reference range effectively by deploying only useful content replicas. In the conventional method, on the other hand, the reference range of a peer is defined only by the distance from the peer, so a single reference range tends to contain many peers, and many of them tend to have storage that is not used effectively. This makes the network cost of the conventional method larger than that of the proposed method.
In the owner replication method, the number of replicas held in a peer is smaller than in the other methods. For this reason, the content providing peers are likely to be far away at the time of a content request, and since a content replica is acquired as full objects, the network cost tends to become larger.
In the BT method, content blocks are requested not only from neighboring peers but also from peers at various positions, so the content providing peers are likely to be farther away than in the proposed method and the owner replication method. However, because the BA model is adopted as the network topology in these simulations, the average distances between peers are smaller than with a lattice topology, and the network cost does not become much larger than that of the other methods (about 50% larger this time, but double with the lattice topology, as shown in [14]). Also, just as in the proposed method, since each content is divided into multiple blocks, the requested content can sometimes be acquired by retrieving only part of the blocks. In particular, when the content is divided into eight blocks, the network cost became smaller than that of the owner replication method after 600 [unit time] from the beginning.
In the owner replication method, the network cost increases from the beginning to 2700 [unit time] and starts to decrease after that. The increase occurs because the network load for acquiring requested contents grows as the number of versions of each content increases and as contents are more often requested from distant peers.
The increasing tendency of the other methods has the same cause. The decrease occurs because, as the number of versions of each content increases, content replicas that cannot be retained in the storage of the peers start to disappear from the network; the number of contents that cannot be found then increases, and as a result, the traffic for content acquisition decreases.
In the proposed method, the conventional method, and the BT method, the network cost generally decreases as the number of content divisions increases. This is because, in each method, dividing a content into smaller pieces allows the storage capacity of each peer to be used effectively without waste.
As shown in Fig. 3, from the beginning to 1600 [unit time], the proposed, conventional, and owner replication methods constrain the contents loss to almost the same extent; after that, the proposed and conventional methods suppress the loss to a similar extent as each other and more than the other methods. The reasons are as follows. From the beginning to 1600 [unit time], content loss can be suppressed because each peer has enough storage capacity to hold contents, whether the proposed method, the conventional method, or the owner replication method is adopted. After that, however, only the proposed and conventional methods can hold a version of a content as the difference from another version of the same content. The storage capacity required for storing a version is therefore reduced, and as a result, more versions of each content can be held on the network than with the other methods. In addition, since each content replica is divided into small blocks, even a small free area of a peer's storage is not wasted and can contribute to holding contents. Since the content replicas are relocated in the proposed and conventional methods, the network cost of the relocations is incurred; at the same time, however, the storage of each peer can be used effectively, so the number of replicas that can be held throughout the network increases, which also suppresses content loss.
Furthermore, although the proposed method limits the reference range by the number of included peers, content loss is suppressed to the same extent as in the conventional method because a sufficient number of useful content replicas can be held within this range. On the other hand, the owner replication method tends to run short of storage capacity on the peers, and its content loss becomes the largest after 2,700 [unit time].
In the proposed, conventional, and BT methods, contents loss decreases as the number of divisions of contents increases. This is because dividing the contents into smaller pieces allows the storage of each peer to be used effectively, and it can be said that the division of contents is effective in reducing both network cost and contents loss. In all methods, contents loss tends to increase with the passage of time. This is because, in this simulation, all versions of all contents that have ever appeared on the network can be requested. The set of contents that may be requested grows with the passage of time, whereas the storage capacity is finite, which leads to this result.

V. CONCLUSION
In this paper, we proposed an improved version management method for efficiently sharing contents with multiple versions when updates occur in a hybrid P2P network. Specifically, to use the storage of each peer effectively, we added to our previously proposed method a mechanism that determines the reference range by the number of peers included in the range as well as by the number of hops. We then evaluated the effectiveness of the improved method by computer simulations. As a result, the improved method successfully suppressed both content loss and network cost. This confirmed that the proposed method is sufficiently effective.
As our future work, we plan to investigate the influence of increasing the division number of content replicas on the effectiveness of the proposed method.

Figure 1. A version configuration of a content.

Figure 2. Relationship between elapsed unit time and network cost.

Figure 3. Relationship between elapsed time and content loss.

TABLE I. SIMULATION PARAMETERS