Energy efficient target set selection and buffer management for D2D mobile data offloading

Data offloading offers a significant solution to the explosive rise in mobile data traffic. A naive approach is to use the infrastructure (cellular towers, WiFi, femtocells) or other mobile devices to offload data. However, increasing the number of cellular towers, WiFi access points, or femtocells is a costly way to deliver data. Recently, the device-to-device (D2D) paradigm of data communication has emerged as one of the most promising solutions for cost-effective cellular traffic offloading. D2D communication provides a direct communication link between closely located mobile users. Another significant feature of D2D is its content-centric nature, which makes it useful in data offloading. In this paper, we address the issue of data offloading in mobile devices and propose a hybrid model of D2D communication with an ad-hoc nature. The paper also considers issues such as the memory constraints of the devices, pruning of replicated messages, and energy efficiency to increase battery lifetime. Considering all the constraints and trade-offs, we model our problem as an optimal target selection problem and a distributed community detection problem, both of which are NP-hard. We propose a clustering algorithm to optimize the cooperative mobile nodes. The proposed algorithm uses betweenness centrality and k-means to optimize target set selection. It requires less computation time and limited space. For validation, we compare it with the community-based approach in terms of load transferred for varying target set sizes. The simulation results indicate that the suggested algorithm may reduce the energy requirements by up to 16.7% and is able to accommodate 80% more traffic than the community-based algorithm.


Introduction
The demand for data and internet access has increased in the last few years due to cheap and easily available smartphones and personal digital assistants. Global mobile data usage grew by around 71% in 2017, and it has grown approximately 17-fold over the five years since 2012 (Index, C.V.N., 2017). Overall mobile traffic is expected to reach about 77 exabytes by 2022. In 2017, 54% of mobile data traffic was offloaded through technologies such as WiFi or femtocells. The study indicates that by 2022 smartphones could account for 90% of mobile data traffic. WiFi offload is expected to depend more on fourth-generation (4G) and fifth-generation (5G) networks than on slower networks. The exponential growth in data is driven by mobile social applications and multimedia streaming. Data plans are expected to be more generous with the introduction of 5G networks, and the respective offloading percentage is estimated to be around 71% by 2022. The capacity of service providers to bear this network burden is likely to be exhausted very soon. In situations like the present COVID-19 pandemic, more and more work relies on data networks through remote networking solutions. There is a tremendous need for technically and economically feasible alternatives to satisfy these high demands for mobile data and to prevent network congestion. Several solutions have been proposed to overcome limited bandwidth and heavy network load. An intuitive approach is to build new base stations (BS), increase cellular network capacity by adding smaller cells to each BS, or upgrade the cellular network to next-generation networks such as Wi-Max and Long-Term Evolution (Kumar et al., 2013) to handle large volumes of data transmission.
Another solution could be to use broadband cellular network technologies such as 4G/5G to accommodate massive data traffic. However, because of their higher energy usage (Akyildiz et al., 2010) and high setup cost, these solutions are not adaptable enough to sustain today's mobile internet services. Moreover, the practical need of the hour is to keep pace with the rapid rise in data demand using the available network supply. Alternatives such as WiFi offloading are effective ways to move traffic off the cellular network, and they have shown a significant improvement in offloading (Fortetsanakis & Papadopouli, 2013). Two modes are frequently used: immediate offloading and deferred offloading. Immediate offloading relies on unrestricted WiFi availability and on-the-spot sharing of information. Most advanced mobile phones currently prefer WiFi over the cellular interface to transmit data on the spot. In deferred offloading, data has a deadline, and the transfer resumes repeatedly until the exchange finishes. Delay may occur in deferred offloading because of WiFi association or the movement of the mobile device in and out of the WiFi region. The main limitation of WiFi offloading and cellular offloading is that they require high financial investment and a long process of infrastructure development (Aijaz et al., 2013), with low return and high maintenance cost. Such drawbacks can be overcome by using partially connected (Vahdat et al., 2000) or delay tolerant networks (Jain et al., 2004). Thus, D2D communication (Kaufman et al., 2013) using the ad-hoc property of nodes, as illustrated in Figure 1, may offer a cheaper and more adaptable substitute. It shows that if an association such as a store-carry-and-forward approach can be established among the mobile users, then opportunistic communication and Delay Tolerant Networks (DTN) can be deployed for data offloading.
The major advantage of using a D2D network is its lower monetary cost (Vahdat et al., 2000) and optimal resource utilization (Jain et al., 2004). However, optimal target-set selection plays an important role in data offloading. The other challenges are restricted storage on each device, caching of data, and unwanted replication of data.
The important benefit of our proposed mobile data offloading method is that it uses the existing framework and the social activity of the mobile nodes at almost no financial cost. In order to minimize mobile data traffic by the vector classification, we aim to analyze how to choose the initial target set of only k users. The target set nodes help all subscribed users through their social involvement to transmit data further. For example, nodes within transmission range may communicate with each other opportunistically. The information can also be spread by non-target users after they receive it from target users or others. To validate our proposed work, we evaluate it on the real-world data sets of MIT (Eagle et al., 2006), INFOCOM (Hui et al., 2005), and NUS (Srinivasan et al., 2006). We compared our proposed protocol with the community-based, greedy, and heuristic approaches in our implementation in Sharma et al. (2018), but restrict the comparison in this article to the community-based approach, since it is the best of the three. This paper makes the following contributions: 1. It proposes a hybrid approach for data offloading using D2D, infrastructure-based nodes and mobile nodes. 2. The distributed algorithm prunes redundant messages at the subscriber level. 3. We also propose community formation, so that data dissemination takes less time to propagate the information to the rest of the subscribed users. The remainder of this paper is organized as follows. Section 2 briefly describes the relevant work related to data offloading and opportunistic networks. The device model and the problem formulation are outlined in Section 3 and Section 4 respectively. Section 5 deals with our proposed strategies for data offloading. Section 6 and Section 7 cover the simulation results and the experimental results respectively. Finally, Section 8 concludes the paper and outlines the future scope.

Theoretical background
Numerous researchers and practitioners have been interested in wireless network analysis, and several studies on mobile data transmission have been published. In the last few years, however, little has been reported on methods and calculations that relieve the overloading conditions addressed by data offloading. We have divided the related work by data communication technique: cellular networks, ad-hoc networks using Bluetooth, WiFi offloading, D2D networks and opportunistic networks. The cellular network-based data offloading approach uses GSM, CDMA, AMPS, LTE, or small cells such as microcells, picocells, and femtocells (Ghosh et al., 2012). Ioannidis et al. (2009) propose a subscription-based dynamic-content distribution service for mobile users with the help of a service provider. The paper proposes a new approach for optimum bandwidth allocation by the service provider to upload user content. The research in Zhuo et al. (2011) concentrates mainly on stimulating customers to offload cellular traffic via a Delay Tolerant Network (DTN). Chuang et al. (2012) use encounter-based frequency and community detection for data forwarding within the same classroom area. The contribution of this work is the selection criterion of primary sources for disjoint communities. The authors compare their work with forwarding strategies such as encounter-based forwarding, rarest-first forwarding, and random forwarding, and validate their claim on the NUS dataset. The paper uses social network properties for community detection. Its limitation is the assumption that a community develops on the basis of location and network from a fully connected graph, which is impractical in the real world. The second technique is Bluetooth- or WiFi-based offloading (Ioannidis et al., 2009). Baier et al. (2016) define traffic offloading as reducing cellular traffic by transferring it to local networks such as WiFi. They offer the TOMP model, which analyzes prospective inter-device connectivity using mobility predictions for smartphone users. The encounter probability is computed as a heuristic parameter to determine the coverage relation. Lee et al. (2014) discuss the economic benefits by modeling the interaction between one client and one source as a two-stage sequential game. The authors concentrate on the key issue of WiFi offloading for several sensitive time frames with limited capacity of the access points (APs). They use the game-theoretic concept of Nash equilibrium to solve the bargaining problem. Jung et al. (2014) propose a solution to increase the per-user throughput while considering deadline constraints. To improve mobile network capacity, Osseiran et al. (2009) and Asadi et al. (2014) propose a network coding scheme with D2D communication.
They assume a cooperative network where two D2D communications are used to interchange the uplink data among D2D users before the message is transmitted to the base station. The proposed solution is limited to the uplink; downlink data management is not considered. Bao et al. (2014) propose a caching-based approach for data transmission in which each device caches the available content. An area containing dense clusters of mobile devices is identified as a data spot; when a device reaches such an area, the operator instructs the mobile devices to share similar data. Their approach does not consider redundant data pruning and memory constraints. Jiang et al. (2015) propose a solution for maximizing content sharing through D2D communication. The authors use a specified content caching mechanism to identify the best matching pairs between senders and receivers. The optimization problem is defined as a knapsack problem using a selective cache, and the sender-receiver problem is solved as a maximum weighted matching problem on a bipartite graph. The limitation of this work is that it does not consider partially connected networks or the pruning of unwanted messages. Data can be downloaded from the end-user or the ISP (Aijaz et al., 2013). This allows operators to reduce cellular network congestion while offering cost savings to the end user, together with better data management and greater access to data transmission. By offloading mobile data traffic to available complementary networks, operators can increase the utilization of the available networks, reduce congestion, and free their network resources for other clients. There are two types of infrastructure-based offloading, namely immediate offloading and deferred offloading, featured in Valerio et al. (2014).
Immediate offloading means using WiFi connectivity without restrictions and sharing information on the spot. Deferred offloading is tied to a deadline and continues to share information until the transfer is completed. A delay may occur because of problems with infrastructure association or because a customer moves into and out of the infrastructure's coverage area. In the worst case, the cell will have to adjust its operation if the exchange of information does not take place in time. The fourth technique is to use opportunistic communication for data offloading. Li et al. (2013) propose a data offloading solution for disruption tolerant networks. The article formulates the issue as a 0-1 knapsack maximization problem with linear constraints. The maximization problem is solved by a heuristic greedy method or by an approximation that takes less time. The proposed algorithm is compared with greedy, approximated, and optimal algorithms. Han and Srinivasan (2012) and Han et al. (2013) study the target set selection problem using opportunistic communications. They propose a greedy algorithm, a heuristic algorithm, and a random algorithm using mobile social networks for data offloading. They use the regularity of mobility patterns as a key factor of opportunistic communication to suggest social interaction. The authors show that the selection problem is submodular, then introduce the greedy algorithm before applying the heuristic approach based on patterns of human mobility. The comparison among the greedy, heuristic and random implementations reveals the greedy approach to be the best in terms of practical implementation and global information needs. Valerio et al. (2014) identify two major sub-problems for data offloading. The first is deciding the number of nodes to be assigned for the diffusion of content, and the second is deciding which nodes are more efficient and useful for accelerating the dissemination process. The authors use a reinforcement learning method based on the actor-critic approach to solve the first problem. For the second, they use information from acknowledgment messages to establish the heuristic behavior for message dissemination. Barbera et al. (2014) propose opportunistic-communication-based data offloading. The authors choose a subset of important VIP clients by analyzing the contact patterns between end-clients using centrality and PageRank. These VIP delegates help in data forwarding using their social importance and mobility. The VIP selection methods are classified as global blind promotion and global greedy. Neighborhood VIP delegation is also considered, based on the k-clique community algorithm and on the observation that most repeated meetings between people usually take place at the same place. The significance of a node is evaluated on the basis of betweenness centrality, closeness centrality and PageRank. To validate their results, they use the real data sets of Dartmouth and Taxis. Other works such as Han and Srinivasan (2012) and Wang et al. (2014) propose traffic offloading approaches assisted by social network services (SNS) via opportunistic sharing in mobile social networks. Zhuo et al. (2011) focus on stimulating clients to offload mobile traffic using DTNs. The authors propose an incentive framework for downlink mobile traffic offloading based on an auction mechanism.
The major gap identified in the literature survey is the lack of focus on target set selection for heterogeneous networks that handles multiple communities of users simultaneously in a real-time environment. We try to exploit realistic, social-behavior-based environments using D2D communications to enable opportunistic communications among users.

System model and problem definition
Our concern is to model the practical behavior of an opportunistic network, which may or may not be completely connected, taking into consideration the energy dissipation across it. We enable both opportunistic communications using the devices.

Issues, Controversies and Problems
The literature lacks a suitable model that can relate all types of opportunistic communications. We have considered a hybrid of two scenarios: infrastructure-based and sparsely connected networks. We assume that the cellular towers form a connected graph. To simplify the scenario, we consider a single access point that serves the connections of the set of mobile users. Mobile users within range of the access point can download or upload data through it. Each data item is described by the size in bytes of the message held by node $M_i$ and by the time taken to transfer it.
Without loss of generality, we assume that the data items are arranged in ascending order. We also assume that each data object is indivisible. In addition, a transfer counts as successful only if the upload or download completes before the TTL expires, which can be viewed as the transmission deadline of the data item. The cellular coverage is assumed to have clusters, especially indoors, where the cellular tower is deployed in such a way that it covers all the mobile users. We consider this because coverage may be weak in the center of massive structures: coverage is excellent on the street, but the cellular signal deteriorates quickly when moving further indoors. Although inadequate indoor coverage can be solved by adding small cells, this takes a considerable amount of time to implement and is costly. Hence, we propose that each device is capable of D2D communication for indoor or spotty-coverage scenarios. For rural environments, we assume that the cellular towers are sparsely deployed and sometimes create a partially connected network. In such a scenario, we assume that nodes are capable of forming an ad-hoc network. We list our assumptions as follows: 1. All mobile devices form a disconnected undirected graph. Vertices of the graph represent the mobile devices. 2. Data are replicated among all nodes if available before the specified delivery deadline. 3. Every network has interconnections focused on adaptive interactivity. 4. Every node can share the list of its neighbors.
The initial step of any data offloading procedure is target selection, since the targets are crucial to start the data dissemination properly. Community-based source selection is one of the approaches for source selection. The community-based source selection problem is NP-complete, by reduction from the dominating set problem. To overcome this issue, a heuristic approach is used. The next objective is the pruning of messages as data floods in before the TTL expires. Due to the limitations of local storage, it is sometimes infeasible to cache all messages. In addition, there are scenarios where multiple devices can send or receive requests, which leads to multiple messages of the same type. Hence the main challenge is how to improve the delivery of messages offloaded using D2D communication.
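The storage and TTL constraints described above can be illustrated with a minimal buffer sketch. The class name `MessageBuffer`, its methods, and the policy of rejecting a message when the buffer is full are assumptions made for illustration; the source does not specify an eviction policy.

```python
class MessageBuffer:
    """Hypothetical per-device buffer with a byte capacity and per-message TTL."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.items = {}  # message id -> (size in bytes, expiry time)

    def used(self):
        # Total bytes currently held in the buffer.
        return sum(size for size, _ in self.items.values())

    def prune(self, now):
        # Drop every message whose TTL has expired; return how many were dropped.
        expired = [m for m, (_, expiry) in self.items.items() if expiry <= now]
        for m in expired:
            del self.items[m]
        return len(expired)

    def offer(self, msg_id, size, expiry, now):
        # Accept a message only if it is new, still valid, and fits.
        self.prune(now)
        if msg_id in self.items or expiry <= now:
            return False  # duplicate or already expired
        if self.used() + size > self.capacity:
            return False  # assumed policy: reject rather than evict
        self.items[msg_id] = (size, expiry)
        return True
```

A duplicate offer is refused, so the same message is never cached twice, and space is reclaimed only once a TTL has passed.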

Optimal Target set Selection Algorithm
Our problem is to select a subset of users, belonging to the same or different communities as defined in the system model, that offloads the data traffic otherwise meant for the cellular network. We consider a single community S of users based on subscription to the same service. Our aim is to subdivide them at various levels of identification and optimize the subset selection procedure. Several authors in the literature encounter a similar problem, but the proposed solutions are generally limited to static nodes. Such solutions are impractical to realize since the users are mobile, and in most cases the authors infer a relationship from the history of encounters, which yields only a heuristic relationship. Taking these flaws into account, we concentrate on an optimal and realistic implementation for the allocation of target sets. Because the assignment of nodes to target classes changes over time, we suggest a dynamic algorithm. Target set selection (TSS) optimization is divided into two phases: the first phase involves the nodes within a single access point, and the second phase optimizes TSS by avoiding overlapping communities and using message pruning. Both rely on the selection of subscribers to a similar service. We first describe the algorithmic optimization for identifying the initial target set on the basis of identical subscription interests. The second step of the algorithmic description detects the subscribers that belong to more than one target set category. After the optimized target set for a single group is determined, the findings are extended to multiple groups with overlapping communities. Initially, a set of mobile users $N = \{n_1, n_2, n_3, \ldots\}$ is assumed to be connected to the access point.
The interconnection between mobile devices is represented as a graph in which each node denotes an individual mobile device and each edge denotes a communication link between two devices. Every mobile device creates a neighbor list containing information about the devices in its proximity. Mobile devices periodically share this information with the access point. The nodes are clustered using the k-means algorithm, with betweenness and cut-vertex as input parameters. The cluster-based CV node then uses the overlapping node to find the other clients for message dissemination. On receiving this information, the access point computes the betweenness centrality $BI(n_i)$ (Daly et al., 2008) of all mobile devices using the following Eq. (1).

Fig. 2. Increasing traffic in undirected graph of nodes
$$BI(n_i) = \sum_{j \neq i \neq k} \frac{g_{jk}(n_i)}{g_{jk}} \qquad (1)$$

where $g_{jk}$ represents the number of geodesic paths connecting node $n_j$ and node $n_k$, and $g_{jk}(n_i)$ identifies the number of those paths which include $n_i$. In addition to betweenness centrality, a cut-vertex function CV is also computed using a k-truss algorithm similar to the one used in (Cohen et al., 2008). It is derived from graph theory, where the k-truss of a graph is defined as the largest subgraph in which every edge belongs to at least $k - 2$ triangles. The betweenness value of each node and its CV value are saved in a sorted data structure. In the next step, the algorithm creates $k$ clusters using k-means clustering. On the basis of the betweenness centrality and location of each node, the access point creates the $d$-dimensional observation vectors $X = (x_1, x_2, x_3, \ldots, x_n)$. The k-means clustering partitions the $n$ observations into $k$ sets $C = \{C_1, C_2, C_3, \ldots, C_k\}$ so as to minimize the within-cluster sum of squares. The objective of k-means clustering is given by Equation 2:

$$\arg\min_{C} \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 \qquad (2)$$

where $\mu_i$ is the mean of the points in $C_i$. Since the overall variance is constant, this is equivalent to maximizing the squared deviations between points in different clusters. The equivalence follows from the identity in Equation 3:

$$|C_i| \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 = \frac{1}{2} \sum_{x, y \in C_i} \lVert x - y \rVert^2 \qquad (3)$$

The resulting values are used to find the optimal target set. The clusters with the largest number of nodes are the most capable of disseminating information between different communities. This is accomplished by the following algorithm.

Algorithm 1. Algorithm executed at access point for optimized target set identification
Input: Set of users N, neighbor list of nodes Nbrlist($n_i$), location of nodes.
Output: Clusters C and cut-vertices CV.
Step 2: For each user $n_i$ in the range, the access point accesses Nbrlist($n_i$).
Step 3: For each user $n_i$ in the range, the AP evaluates BI($n_i$).
Step 4: The AP computes CV using the k-truss method.
Step 5: Create the d-dimensional observation vectors X using location and BI($n_i$).
Step 6: Create the clusters C by applying the k-means method over matrix X.
Step 7: Return C and CV.
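The betweenness and clustering steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: betweenness is computed with Brandes' algorithm on an unweighted graph, k-means is a plain Lloyd iteration, the k-truss based cut-vertex step is omitted, and the function names (`betweenness`, `kmeans`, `cluster_nodes`) are hypothetical.

```python
from collections import deque
import random


def betweenness(adj):
    """Unnormalized betweenness (Brandes) for an unweighted, undirected graph.

    adj maps each node to the set of its neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths and recording predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iteration; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cluster_nodes(adj, location, k):
    """Build observations (x, y, BI) per node and cluster them (Steps 3, 5, 6)."""
    bi = betweenness(adj)
    nodes = sorted(adj)
    X = [(*location[n], bi[n]) for n in nodes]
    return bi, dict(zip(nodes, kmeans(X, k)))
```

Because location and betweenness are on different scales, a practical version would likely normalize the features before clustering; the sketch leaves them unscaled.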

Overlapping node detection and Summary vector computation
The second algorithm performs overlapping node (OL) detection and summary vector (SV) computation. Since the network is dynamic, the key point of this approach is to use only local information about the network, as global information incurs high communication cost, processing, and energy drain. Egocentric betweenness is used in the algorithm because it detects when a node belongs to multiple communities, which yields the overlapping nodes. The major benefit of egocentric betweenness is that a node does not need global knowledge of the network, only connectivity information up to its 2-hop neighbors. Table 1 shows the nodes depicted in Fig. 2 and their overlapping nodes. The concept of the SV is based on a Bloom filter that is used to reduce excessive and redundant messages. Each node is allocated a buffer and keeps track of its latest message exchanges with the other nodes, so that no redundant communication takes place. The buffer collects and stores the messages that arrive from other mobile nodes. To manage every node effectively, the indexes of all messages are saved in a hash table and held in a single identifier known as the summary vector. When a node enters the proximity of another node, the summary vectors are exchanged. The messages requested from a neighbor are determined by each summary vector exchange. The nodes also save the time at which they last exchanged with a given node, to prevent redundant message exchange. The algorithm for the computation of the ego node and the pruning of redundant messages is defined in the following steps:

Algorithm 2. Overlapping node detection and summary vector based message pruning algorithm
Step 1: Each node $n_i$ evaluates its egocentric betweenness.
Step 2: Node $n_i$ exchanges its egocentric betweenness with all of the neighbors in its neighbor list and computes a threshold value from the sum of the egocentric betweenness values of its neighbors $j = 1, 2, 3, \ldots$, where $j$ denotes the $j$-th neighbor of node $n_i$.
Step 3: Node $n_i$ finds the overlapping nodes in its neighbor set by comparing the egocentric betweenness of each neighbor with the threshold value. If the egocentric betweenness exceeds the threshold, the node is declared an overlapping node (OL); otherwise it is not.
Step 4: The overlapping nodes (OL) identified in the previous step share their SV with each node and compare them. If there exist common entries between them, the common entries are ignored in the final target set; otherwise, the data of the uncommon entries is shared.
Step 5: Return the final target set.
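Steps 1 to 3 can be sketched as below. The egocentric betweenness follows the usual ego-network construction: for every pair of non-adjacent neighbors of the ego, add the reciprocal of the number of length-2 paths joining them inside the ego network. Two points are assumptions made for this sketch: the threshold is taken as the mean of the neighbors' values (the exact normalization is not recoverable from the text), and the function names are hypothetical.

```python
from itertools import combinations


def ego_betweenness(adj, ego):
    """Egocentric betweenness of `ego` using only its ego network.

    adj maps each node to the set of its neighbors.
    """
    members = adj[ego] | {ego}
    total = 0.0
    for j, k in combinations(sorted(adj[ego]), 2):
        if k in adj[j]:
            continue  # adjacent neighbors are joined by a geodesic of length 1
        # Length-2 paths between j and k inside the ego network; the ego
        # itself is always one of them, so the count is at least 1.
        two_paths = len(adj[j] & adj[k] & members)
        total += 1.0 / two_paths
    return total


def overlapping_neighbors(adj, node):
    """Neighbors of `node` whose ego betweenness exceeds the threshold."""
    scores = {j: ego_betweenness(adj, j) for j in adj[node]}
    if not scores:
        return set()
    # Assumed threshold: mean of the neighbors' egocentric betweenness.
    threshold = sum(scores.values()) / len(scores)
    return {j for j, s in scores.items() if s > threshold}
```

In a graph where one node bridges two otherwise separate triangles, only the bridging node exceeds the mean and is reported as overlapping.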
We now explain the algorithm to be implemented by the access point. In the first step, every node $n_i$ finds its ego betweenness to determine its usefulness to the neighbors in its neighbor list. Egocentric betweenness characterizes how likely a node is to lie between two other nodes: the higher the value, the more likely the node is to link other nodes. With $A$ as the adjacency matrix of the node's ego network, $A^2_{i,j}$ contains the number of walks of length 2 between node i and node j. Walks of unit length need not be evaluated, since they do not contribute to betweenness. In the end, $A^2[1 - A]_{i,j}$ is computed, which gives the number of geodesics of length 2 between node i and node j. Each node i exchanges its ego betweenness value with its neighbors. Each node then decides which of its neighbors may have a greater effect on data dissemination in the network; such a neighbor is termed a sensitive OL. Finally, the cut-vertex from Algorithm 1 and the OLs periodically update the SV message-sharing process in an anti-entropy session. Suppose a non-overlapping node and an overlapping node encounter each other, each holding its own summary vector. Upon encounter, they exchange their summary vectors, compare them with their buffered messages, and request the missing data accordingly.
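The anti-entropy session can be illustrated with a small sketch. It is a simplification under stated assumptions: the summary vector is an exact set of message identifiers rather than a Bloom filter, buffer capacity is not modeled, and the names `Node`, `anti_entropy`, and `min_gap` are hypothetical.

```python
class Node:
    """Hypothetical mobile node holding buffered messages."""

    def __init__(self, name):
        self.name = name
        self.buffer = {}         # message id -> payload
        self.last_exchange = {}  # peer name -> time of the last session

    def summary_vector(self):
        # Exact set of buffered message ids (stand-in for a Bloom filter).
        return set(self.buffer)


def anti_entropy(a, b, now, min_gap=0.0):
    """Exchange summary vectors and copy only the messages each side lacks.

    Returns the number of messages transferred in this session.
    """
    # Skip the session if the two nodes exchanged too recently.
    if now - a.last_exchange.get(b.name, float("-inf")) < min_gap:
        return 0
    sv_a, sv_b = a.summary_vector(), b.summary_vector()
    for mid in sv_b - sv_a:
        a.buffer[mid] = b.buffer[mid]  # a requests what only b holds
    for mid in sv_a - sv_b:
        b.buffer[mid] = a.buffer[mid]  # b requests what only a holds
    a.last_exchange[b.name] = now
    b.last_exchange[a.name] = now
    return len(sv_a ^ sv_b)
```

Messages present on both sides are never retransmitted, which is the pruning effect the summary vector is meant to provide. A Bloom filter would shrink the vector at the cost of occasional false positives, that is, a missing message wrongly assumed to be present.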

Complexity analysis
The first phase of the algorithm computes the betweenness centrality using . This step requires ( ) space complexity for matrix creation and ( ) as runtime, which can be reduced to ( ) using the randomized algorithm. The second phase of algorithm computes Cutvertex using k-truss algorithm which ( + ) as run time, E is the edge in the graph created by n nodes at a particular instance. Finally, k-means algorithm for clustering, which needed ( )time complexity, where n is the number of mobile nodes k is the numbers of clusters created and d is the dimension constant. So the dominating complexity in all the three phases is ( ) as ≥ 2, space complexity is ( ) and communication cost is (1). In Algorithm 2 the communication cost is the same as of the first phase of Algorithm 1, and the space complexity depends on the size of the bloom filter used for the creation of the summary vector.

Simulation Results
We have divided our simulation work into two parts; in the first part, we have evaluated our proposed algorithm standalone for the analysis of D2D communication, and in the second part, a comparative analysis is done with a community-based approach (Valerio et al., 2014).

Standalone analysis
In this subsection, we evaluate the following parameters to analyze performance: 1. Energy utilization in terms of battery consumption. 2. Delays involved with different target set sizes. 3. Signal-to-noise-ratio-based unpredictability and mean square error. The number of nodes in the network is varied from 100 to 1000, up to a total of 1000 nodes, in an area of 1000 × 1000 meters. The nodes are clustered using the k-means algorithm with betweenness and cut-vertex as input parameters. The cluster node then uses the overlapping node to find the other clients for message dissemination. The simulation parameters are listed in Table 2, and the simulation is performed using MATLAB.

Energy Consumption
To compare the energy drain of the clients, their battery usage is studied. Table 3 shows that Bluetooth consumes the most energy, whereas D2D consumes the least. The table also indicates that additional data is taken from other devices while WiFi is in sleep mode, and the results indicate that such an implementation uses more energy. Battery consumption is approximately 95% lower for D2D than for Bluetooth. The energy consumption of Bluetooth is significantly higher than that of D2D because it uses the internal power of the device.

Delay with respect to load transfer
Fig. 4 depicts the transfer load delay in milliseconds for different target sets. In this figure, the x-axis indicates the size of the target set, whereas the y-axis indicates the heuristics-based transfer load. It shows the variation of load delay for a fixed target of 1000 nodes during a 6-hour session. We depict the results for five delay scenarios: 30 minutes, 1 hour, 2 hours, 3 hours, and 6 hours. For delays shorter than the session, the run must be repeated over the remaining session period so that the results cover the same total duration. Thus, we run two iterations with the 3-hour delay, three iterations with the 2-hour delay, six iterations with the 1-hour delay, and twelve iterations with the 30-minute delay. Fig. 4 clearly shows that the delay is smallest for the scenario with the shortest allowed delay. We observe that the load delay does not vary significantly across the different allowed delays. Hence, we focus on other aspects for comparison.
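The iteration counts follow from dividing the 6-hour session by each allowed delay. A short check, with variable names chosen only for illustration:

```python
SESSION_HOURS = 6
delays_hours = [0.5, 1, 2, 3, 6]

# Number of repetitions needed for each delay to span the full session.
iterations = {d: int(SESSION_HOURS / d) for d in delays_hours}
print(iterations)
```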

Mean Square Error (MSE) and Signal-to-Noise Ratio (SNR) Variance
Battery consumption is defined as the deterioration of energy due to usage. We compare it for the same five time slabs as in the previous subsection. MSE stands for the mean square error, and mathematically it is evaluated from ngs, nngs, and NSC, where ngs is the noisy group strength, nngs is the non-noisy group strength, and NSC is the total number of nodes within a single cluster. SNR is the signal-to-noise ratio, which assesses the unpredictability of the total group against its total strength. Mathematically, it is defined as $\mathrm{SNR} = \mathrm{MaxBitValue}^2 / \mathrm{MSE}$. Fig. 5 illustrates the different load variations and their deviations for the mean square error. In the figure, the x-axis is the load variation and the y-axis is the value of the MSE or the SNR, respectively. The analysis is done for four load variations. The MSE rises with the increase in load. It increases steadily up to the third load variation but rises sharply on entering the fourth. This is because several noisy nodes become attached at the third load variation, which makes the error intensify much more quickly. Where the MSE rises abruptly, the SNR drops. The maximum value of the MSE is approximately $14 \times 10^{7}$, whereas the minimum value is about unity. The SNR ranges between 0.4 and 2. Fig. 5 shows that the SNR increases steadily until the third load variation and then decreases as the group becomes noisier. The mean square error at the third iteration is high, and the SNR then begins to decrease, as shown in the figure. The highest SNR, 1.79, is obtained for the third load variation, whereas the lowest SNR values are measured for the first and second load variations.
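The two metrics can be sketched as follows. The MSE is described here only through its symbols, so the sketch assumes the usual form, the mean of the squared differences between ngs and nngs over the NSC nodes of a cluster; that form is an assumption rather than the paper's stated expression. The SNR follows the ratio MaxBitValue squared over MSE given above. The function names and the sample values are hypothetical.

```python
def mean_square_error(ngs, nngs):
    """Assumed form: (1 / NSC) * sum((ngs_i - nngs_i) ** 2) over one cluster."""
    if len(ngs) != len(nngs) or not ngs:
        raise ValueError("ngs and nngs must be non-empty and of equal length")
    nsc = len(ngs)  # NSC: number of nodes within the cluster
    return sum((noisy - clean) ** 2 for noisy, clean in zip(ngs, nngs)) / nsc


def signal_to_noise_ratio(max_bit_value, mse):
    """SNR = MaxBitValue^2 / MSE, as a plain ratio (not in decibels)."""
    if mse <= 0:
        raise ValueError("MSE must be positive for the ratio to be defined")
    return max_bit_value ** 2 / mse


# Hypothetical group strengths for a four-node cluster.
noisy_strength = [4, 2, 6, 8]
clean_strength = [2, 2, 4, 8]

mse_value = mean_square_error(noisy_strength, clean_strength)  # (4 + 0 + 4 + 0) / 4
snr_value = signal_to_noise_ratio(2, mse_value)
```

Because the MSE sits in the denominator, any abrupt rise in the error lowers the ratio, which matches the inverse behavior described for Fig. 5.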

Comparative Analysis
We compare our proposed clustering algorithm with the community-based algorithm. The comparison is made on load transfer, latency, and buffer usage, and the results are illustrated in Figs. 6, 7, and 8, respectively. The simulation results reveal that, on average, our proposed algorithm offers about 75% to 80% less delay in the data traffic offloaded from the access point, with approximately 16.7% better energy optimization over five iterations. Fig. 7 shows that our proposed algorithm also outperforms the community-based approach in terms of latency as the number of nodes increases steadily from 100 to 1000. The maximum latency, observed as 30%, occurs for the smaller target set sizes and reduces as the number of nodes in the target set increases. This drop in latency is due to the summary vector and overlapping nodes used in our approach, which help deliver data faster. In this simulation, the buffer size of each mobile node is fixed at around 50 MB. However, we have also analyzed the impact of a varying buffer size on latency, as shown in Fig. 8. We gradually increase the size of the message to be disseminated from 10 MB to 26 MB. To make the simulation delay-tolerant, we assume that each data item can be delayed by up to 2 hours. The results show that latency tends to increase as the buffer capacity increases. The results also show that our proposed approach can perform better than the community-based approach for a known size of the target set community.
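The role of the summary vector together with a bounded buffer can be sketched as below. This is a generic illustration of summary-vector exchange under stated assumptions, not the exact protocol of the proposed algorithm: the `Node` class, the `exchange` function, the message identifiers, and the rule of refusing messages that do not fit are all hypothetical. Only the 50 MB buffer and the 10 MB to 26 MB message sizes come from the simulation setup.

```python
BUFFER_CAPACITY_MB = 50  # per-node buffer size used in the simulation


class Node:
    """Mobile node with a bounded message buffer (hypothetical structure)."""

    def __init__(self, name, capacity_mb=BUFFER_CAPACITY_MB):
        self.name = name
        self.capacity_mb = capacity_mb
        self.buffer = {}  # message id -> size in MB

    def summary_vector(self):
        """Identifiers of the messages this node already holds."""
        return set(self.buffer)

    def free_mb(self):
        return self.capacity_mb - sum(self.buffer.values())

    def store(self, msg_id, size_mb):
        """Store a message unless it is a duplicate or does not fit."""
        if msg_id in self.buffer or size_mb > self.free_mb():
            return False
        self.buffer[msg_id] = size_mb
        return True


def exchange(sender, receiver):
    """Send only the messages missing from the receiver's summary vector."""
    missing = sender.summary_vector() - receiver.summary_vector()
    delivered = []
    for msg_id in sorted(missing):
        if receiver.store(msg_id, sender.buffer[msg_id]):
            delivered.append(msg_id)
    return delivered


a = Node("a")
for msg_id, size_mb in [("m1", 10), ("m2", 26), ("m3", 14)]:
    a.store(msg_id, size_mb)

b = Node("b")
b.store("m1", 10)   # already held, so it is never retransmitted
b.store("m4", 20)   # leaves 20 MB free

delivered = exchange(a, b)  # "m2" (26 MB) does not fit; "m3" (14 MB) does
```

Comparing summary vectors before sending prunes replicated messages, and the capacity check keeps each node within its buffer limit.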

Conclusion and future scope
In this paper, we have proposed an efficient D2D communication approach for data offloading based on k-means clustering. The proposed work not only offloads mobile data efficiently but also optimizes network performance in terms of energy consumption, buffer usage, and latency. A limitation of this work is that it does not cover the mobility of the nodes in depth. However, it is well suited to offloading in geographically limited settings such as conferences, educational institutes, and residential locations, especially in pandemic situations such as COVID-19. In future work, we plan to use meta-heuristic algorithms to optimize target set selection automatically and to reduce the delivery time of data items in disconnected data. The assumption that all users are available as helpers also needs to be examined through trust-based and incentive-based evaluation.