Inferring Ties in Social IoT Using Location-Based Networks and Identification of Hidden Suspicious Ties

Stochastic Internet of Things (IoT)-based communication behavior of the progressing world is tremendously impacting social networks. The growth of social networks helps to quantify the effect on the Social Internet of Things (SIoT). The repeated presence of two persons at several geographical locations in different time frames hints at a social connection. We investigate the extent to which social ties between people can be inferred by critically reviewing the social networks. Our study used anonymized caller data records (CDRs) from a Chinese telecommunication operator and two openly available location-based social network data sets, Brightkite and Gowalla. Our research identified social ties based on mobile communication data and further explores communication reasons based on geographical location. This paper presents an inference framework, built on a pipe and filter architecture, that predicts missing ties as suspicious social connections. It highlights secret relationships between users that do not appear in the real data. The proposed framework consists of two major parts. Firstly, users' cooccurrence based on a mutual location in a specific time frame is computed and inferred as social ties. Results are investigated based upon the cooccurrence count, the gap time threshold values, and the mutual friend count values. Secondly, the details of direct connections are collected and cross-related to the inferred results using Precision and Recall evaluation measures. In the later part of the research, we examine the false-positive results methodically by studying human cooccurrence patterns to identify hidden relationships using a social activity. The outcomes indicate that the proposed approach achieves comprehensive results that further support the theory of suspicious ties.

further classified as strong or weak depending upon the frequency of communication, number of times, emotional attachment, number of mutual ties, relationship actions, and a combination of these parameters [5]. Equation (1) quantifies the strength of social ties, denoted by a weight, such that a higher weight indicates a stronger tie and vice versa [10]. wAB represents the weight of the social tie between Node A and Node B, while CA and CB represent the degrees of Node A and Node B; cAB is the number of mutual nodes between A and B. Community structures were also found to be one of the main reasons for social tie strength [11]; people from the same community have stronger ties than people from different communities [5].
Various models and techniques have been developed to infer the social network based on inadequate aspects [7,12]. One specific category of such inference determines cooccurrence based on time and location. Despite the many measures proposed, precise and accurate inference remains difficult. In our research, we consider several threshold parameters to obtain more precise inferences. We also develop a framework that infers both the existing social ties and the hidden relationships in a social network.
Initially, in our research, we present the inference of social ties among people by correlating their physical presence at several sites with their direct connections. We define a social connection if two individuals X and Y cooccur in a cell s within a time frame of t hours, such that X calls a person R while connected to a base station b1 and, in the same time frame, Y calls a person S from the same base station b1. Furthermore, we count the number of cooccurrences of X and Y. Firstly, we find social ties depending upon the number of direct calls between two people. To ensure a correct social connection, we set a threshold, such that the count of direct calls is more than the threshold. Secondly, we evaluate the social relationship between two people by counting the number of calls by X and Y in a specific time frame. Figure 1 shows an example that explains the quantification procedure. Each hexagon represents the area of a single base station. X and Y are together 6 times in various base stations, and the gap between their calls varies. In the first part of the research, we use the CDR data set provided by the telecommunication company and two openly available location-based social network data sets, i.e., Brightkite and Gowalla [13]. All data sets resemble the example in Figure 1. We counted the number of cooccurrences based on multiple gap time frame thresholds and mutual friends. Furthermore, we correlate the results with the direct-call-based social connections using Precision and Recall evaluation measures.
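The cooccurrence definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout `(user, base_station, call_time_in_hours)` and the function name `count_cooccurrences` are hypothetical, and every pair of calls that falls within the gap counts once, with no normalization.

```python
from collections import defaultdict
from itertools import combinations

def count_cooccurrences(calls, gap_hours):
    """Count, per user pair, calls placed from the same base station within gap_hours.

    calls: iterable of (user, base_station, call_time_in_hours) tuples.
    This record layout is an assumption made for illustration.
    """
    # Group calls by the base station they were placed from.
    by_station = defaultdict(list)
    for user, station, t in calls:
        by_station[station].append((user, t))

    counts = defaultdict(int)
    for station_calls in by_station.values():
        # Compare every pair of calls seen at the same base station.
        for (u1, t1), (u2, t2) in combinations(station_calls, 2):
            if u1 != u2 and abs(t1 - t2) <= gap_hours:
                counts[frozenset((u1, u2))] += 1
    return dict(counts)

calls = [
    ("X", "b1", 14.0),  # X calls R from b1 at 2 pm
    ("Y", "b1", 14.5),  # Y calls S from b1 at 2:30 pm
    ("X", "b2", 18.0),
    ("Y", "b2", 21.0),  # 3 hr after X's call at b2
]
cooccurrence = count_cooccurrences(calls, gap_hours=1.0)
```

With a one-hour gap only the calls at `b1` count as a cooccurrence; widening the gap to three hours would also count the calls at `b2`.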
In the second phase of the research, we explore the false-positive results produced by the CDR-based social tie inference model. We define a missing tie as a suspicious tie between two people if they do not have any direct calls but are found together numerous times and also have a certain number of mutual friends. In the literature, missing ties are defined as either nonresponse or absent ties [12]. A nonresponse tie arises when an actor in an activity gives no information about a tie [14], while an absent tie arises when an actor gives no indication about the tie detail. A survey was conducted to monitor the liking patterns of boys and girls. It was limited to binary data, such that one represents a tie and zero represents no tie. Figure 2 shows a visual representation (block modeling) of the adjacency matrix built from the survey data [14]. In Figure 2, green-filled slots represent the existence of a tie, regardless of its strength, while white slots indicate either absent ties or nonresponse ties. In our research, we explore and classify a subset of missing ties as suspect ties. We conduct a social activity and a simulation that generate a data set with the same structure as the CDR data to support this concept. Furthermore, we correlate the false-positive results of the CDR-based social tie inference model with the activity and simulation results. The contributions of this paper are as follows:
(1) We developed an inference model and a classifier that identifies location-based social ties. The inference model is tested on the CDR-based social network, Brightkite, and Gowalla, using Precision and Recall measures.
(2) We identify a class of suspect ties by examining the false-positive results of the social tie inference model.
(3) We conducted an activity-based survey and a simulation that demonstrate and evaluate the suspect ties.
The rest of this article is organized as follows: Section 2 describes the literature review. Section 3 presents the descriptions of cooccurrence count normalization, the inference algorithm, and social tie inference. Brief concepts about hidden relationships and suspicious links are described in Section 4. The proposed framework and an algorithm to infer suspicious relationships are given in Section 5. Section 6 describes the data sets, results, and analysis. Finally, the conclusion of this article is presented in Section 7.

Related Work
The physical world social network is represented as a graph, where nodes are treated as people and edges represent the social tie between two people [15]. In the literature, edge weight represents the strength of that particular social tie [10,16]. A social network such as Twitter forms a directed graph, e.g., a fan follows a celebrity but the celebrity hardly ever follows back. Such graphs are used to investigate influential networks and the most influential people [17][18][19]. Recommendation and targeted marketing are some of the essential objectives of exploring social ties [20,21]. The theme-based model adopts dynamic programming to explore critical factors; for example, playing and dating are kinds of themes [22]. Social tie coupling and the prediction of user mobility were researched by examining physical and network properties (geosocial properties) [23,24]. An effective prediction technique was proposed to find the typical patterns of two users by comparing their check-in details [24,25]. Area significance is measured using a weight-assigning method that incorporates two users' cooccurrence in a specific area. A coffee shop is a more significant area than an ordinary place [26]. The scoring mechanism helps in categorizing and labelling social ties [10]. Inference about any social network is incomplete if the associated features are neglected. The baseline of any social network is the single social connection between two people. In the state of the art, social ties are generally categorized as (1) strong ties, (2) weak ties, and (3) absent ties [9,17], whereas the strength of a tie depends upon (a) amount of time, (b) emotional intensity, (c) intimacy variables, and (d) social distance [3,10]. The repeated presence of two individuals in a specific geographical location within a limited time also suggests a social connection [7]. The strength of the social tie is directly proportional to the number of such cooccurrence events.
IoT has emerged as one of the most powerful and impressive technological research domains [2]. IoT presents a novel connectivity concept, where machines can collaborate equally with humans based on actuators and sensors [27,28]. One study forecasts that smart devices such as electronic medical kits and smart watches will reach a worth of USD 160 billion by the end of 2026 [28]. The communication network between smart devices and humans forms the Social Internet of Things (SIoT) and opens up new research challenges. Managing problems such as data scalability, velocity, and variety is among the emerging issues in SIoT [29]. Understanding social ties among human-to-human, human-to-machine, and machine-to-machine connections helps to quantify network performance issues [30].
Social ties are the backbone of any social network [31]. Formation and deformation of social ties affect communities in a network [32]. Besides social tie strength, factors such as location, emotion, situation, age, gender, religion, personality, and many more have a substantial impact on the social connection [10,33]. Granovetter highlighted the strong connection between weak social connections and finding jobs [34]. In the literature, sources of data commonly used for social analysis are call logs [35,36], emails, and social-networking websites [5]. Extensive challenges associated with the integration of visible and invisible networks have also been highlighted [37]. Investigating criminal social networks using limited clues is one of the emerging research areas of social network analysis (SNA) [38,39]. Statistically, there are always some hidden or visible associated parameters among cooccurred events. Social network analysis is performed to explore such intriguing knowledge. In the physical world, social network analysis is utilized in job searching [4], studying urban life psychology, investigation of guilt association [12], finding communities [40], spreading of news [41], and influential networks [18,42]. In the recent era of information and technologies, massive logs are generated for each person, e.g., call records, bank transactions, online purchase records, daily emails, CCTV cameras, and many more media [7,43,44]. In contrast to the physical world, such media further refine the accuracy of results by highlighting the associated features. Despite numerous data sources, there is no optimal procedure to quantify stochastic human nature and social network evolution [45]. The grouping method identifies hidden social groups, which further explores the friend circles and focuses under high privacy settings [46]. Another study explores hidden social ties using respondent sampling [47].
In the literature, hidden social ties refer to a population that is hard to access. A population that tries to hide from the social network is hidden in a network [47]. In our research, suspect ties refer to actors in a social network who are present and accessible but try to hide their social connections. The second part of our research explores the suspicious ties within the existing network instead of a hidden population in a social network.

Data Set Descriptions.
In our research, we incorporated three large location-based data sets, i.e., CDR, Brightkite, and Gowalla [13]. The large CDR data set used in this study was provided by one of the Chinese mobile telecommunication operators. The data set contains 202,000 subscribers along with user demographic information.
The call detail records cover six months (June 2014-December 2014) for these 202,000 subscribers and comprise 221,451,169 records. Each record of the data set is represented in the following format.

Duration LAC ID CELL ID
Brightkite and Gowalla are openly available location-based social network data sets [13,48]. Both data sets were gathered using online social-networking websites. The websites maintain user check-in data by fetching mobile GPS location data.
These services used to help people find nearby users and build social connections. Brightkite contains 58,228 nodes and 214,078 edges, and Gowalla contains 196,591 nodes and 950,327 edges. Other than social network data, both data sets also contain direct social tie data. Figure 3 shows an example of a social network with a case of suspect actors and their hidden ties. Actors who have several mutual friends but no direct connection may have a secret connection. This information helps in identifying them as a suspect tie. The social network evolves, and new connections expand the scope of the social network. One social network is a combination of multiple social networks involving different individuals [31]. A social network can be sliced based on starting and ending time. Social networks can also be divided into subsocial networks monthwise if they have developed over one year [49,50]. Social network slicing helps our research to further explore the missing ties between friend-of-friend relationships. The following abbreviations are used for the quantification processes of Precision and Recall and in several other parts of the paper:
SK = calling record. The calling records represent the actual number of direct calls that occur between two users. The value of SK is counted to identify the social tie between two users.
CK = times of cooccurrence. The times of cooccurrence represent the presence of two users in the range of a common base station. We counted CK when two users were connected to a common base station and they called any other user.
G = time-frame gap value. The time-frame gap value represents the time interval between two users' calls while connected to a specific base station. For example, user X calls someone at 2 pm and user Y calls someone else at 4 pm; in this case, the gap between the calls is 2 hr. To quantify the results and

Cooccurrence Count Normalization Measure.
The cooccurrence count value CV captures the presence of two users in the region of one base station. An issue related to counting CV is explained and resolved using an example for two users X and Y, shown in Figure 4. We counted CV when two users were connected to a common base station and they called any other user in a specific time frame. The example in Figure 4 shows the call log details of users X and Y gathered in a time frame T. In Figure 4, the call times x1 and x2 are the closest to the call time y1. In this case, the count value CV can be calculated as 2. However, such counting may lead to a wrong inference: if one person calls once and another person calls n times within a specific time frame, the count value equals n. To resolve this issue, we propose a normalization equation that decreases the count value periodically. We introduce the Beta (β) value as a periodic normalizing factor.
Let X denote the set of calls by user X and Y denote the set of calls by user Y. According to the example in Figure 4, we assume β for the set Y = (y1β, y2β, y3β).
For the first match, β = 1.
For the second match, β = β/2. Likewise, for the nth match, β = β/n. In equation (9), mk refers to the total number of calls made by user X to Y, while nk refers to the total number of calls made by user Y to X.
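The effect of the periodic factor can be illustrated with a short sketch. It rests on one reading of the scheme above, which is an assumption: starting from β = 1, the nth call of X that matches a given call of Y contributes β/n to the count. The function name `normalized_cooccurrence` and the use of minutes as the time unit are hypothetical, and equation (9) itself is not reproduced here.

```python
def normalized_cooccurrence(x_times, y_times, gap, beta=1.0):
    """Sum the damped contributions of X's calls that fall within `gap` of each Y call.

    Assumed reading of the normalization: for each call of Y, the n-th
    matching call of X adds beta / n instead of 1.
    """
    total = 0.0
    for y in y_times:
        matches = 0
        for x in x_times:
            if abs(x - y) <= gap:
                matches += 1
                total += beta / matches  # 1, 1/2, 1/3, ... for successive matches
    return total

# Call times in minutes since midnight: x1 and x2 are both close to y1.
x_calls = [600, 612]
y_calls = [606]
normalized = normalized_cooccurrence(x_calls, y_calls, gap=30)
```

Under this reading, the two calls of X near y1 yield a count of 1.5 rather than the unnormalized value of 2, so repeated calls by one user inflate the count less and less.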

Social Tie Inference
We initially investigated the direct social ties formed by the CDR data set and compared them to the indirect social ties formed based on common location using Algorithm 1. By direct ties, we mean calling or direct connection. For example, person A calling person B constitutes a direct tie between A and B. Algorithm 1 takes GV, SNK, and the CDR data set (social network) as inputs. The algorithm has two parts. First, it finds the direct ties between two individuals depending upon the SNK threshold value. Second, it counts the joint presence of two individuals based on several parameters. The Calculate Cooccurrence Count() function finds the number of cooccurrences using equation (9), explained in the previous section. The Infer Social Ties() function finds the social connections depending upon CTK, DT, and SNK and infers them as social ties.
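The two parts of Algorithm 1 can be sketched as two threshold steps. This is an illustrative outline, not the algorithm's actual listing: the function names, the input layouts, and the use of "greater than or equal to" for both thresholds are assumptions, and the parameter names `snk` and `ctk` merely echo the symbols used in the text.

```python
from collections import Counter

def find_direct_ties(call_pairs, snk):
    """Part 1: user pairs whose number of direct calls reaches the SNK threshold.

    call_pairs: iterable of (caller, callee) tuples (assumed layout).
    Calls in either direction count toward the same unordered pair.
    """
    counts = Counter(frozenset(pair) for pair in call_pairs if pair[0] != pair[1])
    return {pair for pair, n in counts.items() if n >= snk}

def infer_social_ties(cooccurrence_counts, ctk):
    """Part 2: user pairs whose cooccurrence count reaches the count threshold."""
    return {pair for pair, n in cooccurrence_counts.items() if n >= ctk}

call_pairs = [("A", "B"), ("B", "A"), ("A", "C")]
direct = find_direct_ties(call_pairs, snk=2)

cooccurrence_counts = {frozenset(("A", "B")): 6, frozenset(("B", "C")): 1}
inferred = infer_social_ties(cooccurrence_counts, ctk=5)
```

Keeping the two result sets separate makes the later cross-validation straightforward, since the inferred ties can be compared directly against the direct ties.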

CDR-Based Social Tie Inference.
A social tie is inferred between two persons if they are found together at several sites numerous times. The inference algorithm identifies two sets of results, i.e., direct social ties and inferred social ties. For the cross-validation of results, we correlate the direct tie results with the inferred ones. Precision and Recall evaluation measures are used to examine the results. We tested all records based on threshold values, where K is the number of direct calls, M is the number of cooccurrences, and G is the time frame gap value. The direct call count N shows the degree of friendship: a higher value of N indicates a stronger friendship. Figure 5 shows the Precision graph, which contains four panels, Figures 5(a)-5(d). The whole data set is examined based on K and the values of M and G.
In Figure 5(a), the value of K is 15, which represents users with 15 or more direct calls between each other. The value of M is the number of cooccurrences of two different users. The Precision values are comparatively low for M in the range of 0 to 10. In contrast, Precision rises sharply for M in the range of 10 to 30. A higher value of M indicates a higher cooccurrence of users. A positive correlation can be observed between the values of M and Precision. This implies that cooccurrence is a significant attribute that contributes positively to identifying social ties. All graphs in Figure 5 have six different lines; each line represents a different time gap range. It can also be seen that the 30-minute gap value yields higher Precision, while the lines for 1 hour, 2 hours, 6 hours, 12 hours, and 24 hours yield lower Precision.
This also suggests that the strength of ties has a specific effect on Precision. Users with strong social connections are, most of the time, found together in certain areas. This pattern is explicitly observed for cooccurrence values M = 20 to 40 and a gap time frame G = 30 minutes. Another positive correlation is found between the degree of friendship and physical presence at a specific place.
To see the effect of friendship strength, we evaluated results for four different K values, i.e., 2, 5, 10, and 15. A typical pattern is found in all the graphs shown in Figure 5. It shows that Precision is lower for people whose mutual presence at different sites is lower. Also, people with strong social ties spent less than 1 hr together at a specific location. To understand the graph's actual meaning, we quantify and reconcile the results with the actual direct social ties. The positive correlation in the results implies that people with strong social connections often visit places together. Figure 6(a) shows the Recall results. We tested and evaluated Recall based on the same measures as Precision, i.e., direct calls, cooccurrence, and gap time frame. This part of the research finds people's cooccurrence based on the same base station connectivity in a specific time frame and infers them as social ties. Furthermore, it cross-relates the inferred results with the direct call results.
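The cross-validation described in this subsection amounts to comparing two sets of user pairs. The sketch below uses the standard definitions of Precision and Recall; the function name `precision_recall` and the toy pairs are hypothetical and are not taken from the CDR results.

```python
def precision_recall(inferred, direct):
    """Compare inferred ties against direct ties, both given as sets of user pairs.

    Precision: share of inferred ties that are also direct ties.
    Recall: share of direct ties that were inferred.
    """
    true_positives = len(inferred & direct)
    precision = true_positives / len(inferred) if inferred else 0.0
    recall = true_positives / len(direct) if direct else 0.0
    return precision, recall

def pair(a, b):
    # Unordered pair, so (A, B) and (B, A) denote the same tie.
    return frozenset((a, b))

inferred_ties = {pair("A", "B"), pair("A", "C"), pair("B", "D")}
direct_ties = {pair("A", "B"), pair("A", "C"), pair("C", "D"), pair("D", "E")}
precision, recall = precision_recall(inferred_ties, direct_ties)
```

In this toy case two of the three inferred ties are confirmed by direct calls (Precision of about 0.67), and two of the four direct ties are recovered (Recall of 0.5). The unconfirmed inferred pair is the kind of false positive that the second phase of the study examines.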

Brightkite-and Gowalla-Based Social Tie Inference.
The Brightkite and Gowalla data sets contain direct social ties as well as the check-in information of each user. In our study, we investigated both data sets along several dimensions and found some very interesting facts. inferences, we conducted a social activity and simulation. The false-positive results of the first part of the research serve as the foundation for the second part. An activity modeled on the first part's data set is conducted, and the false-positive results are examined by studying the human cooccurrence patterns, described in the next section, Suspicious Ties. This stage of the research gave us a clue to further explore the category of missing ties.

Suspicious Ties
An absent tie can be inferred as a suspect tie if it satisfies the following properties:
Let TH denote a set of n points th_i, where th_i is the threshold value for the time frame.
Let C denote a set of n points c_i, where c_i is the call information, which includes id, the call id, and ct, the outgoing call time.
Let R denote a set of n points r_i, RT a set of n points rt_i, and S a set of n points s_i, where s_i is a set of elements that identifies distinct callers based on the same base station connectivity and a definite number of calls in a specific time frame.

Suspect Inference Framework
We studied the pattern of exceptional cases belonging to the false-positive set and described a subset of the false-positive set as suspect ties. A physical activity was designed and conducted to investigate the formation of suspect social ties. The activity involved 50 people, and a data set was generated in about 4-5 hours. A basketball court was utilized for the activity. Nine circles were drawn on the basketball court, each representing a base station cell. Out of the 50 people, nine were directed to act as base stations. The boundary of each circle was considered the range of the base station cell. The remaining 41 persons were directed to perform the following two steps.

Selection Step.
Initially, each of the 41 people was asked to choose two sets of friends: one set of obvious friends and a second set of hidden friends, such that the size of the hidden friend set is at most 1/5 of the size of the obvious friend set; e.g., a person with five people in the obvious friend set can have no more than one hidden friend. After each individual had selected both sets, the information was shared with one of our representatives.

Operation Step.
In this phase, each person was directed to follow these rules:
(1) You should not call your hidden friend.
(2) You should call all of your obvious friends at least once.
(3) You should make the maximum number of calls to your closest obvious friend, the second highest number to your second-level obvious friend, and so on down to the least close friend.
(4) You must try to meet your hidden friend physically as much as possible.
The method of calling is as follows: if person A wants to call person B from base station B1, person A has to go to base station B1 and register the call with the person acting as, and standing in, base station B1. The respective base station person writes an entry with five parameters, i.e., Caller Name, Callee Name, Time, From Base Station Name, and To Base Station Name. The data set was gathered in this format. To understand the variations and patterns, the same activity was also reproduced using a simulation. The whole simulation followed the same conditions, and another data set was generated using a random function. Based on the activity, a framework is designed to separate a class of suspect ties. The proposed inference framework is designed and implemented using a pipe and filter architecture, shown in Figure 8. Algorithm 2 shows the implementation of the suspicious tie inference framework illustrated in Figure 8. The framework takes the social network matrix, count threshold value, gap time value, and mutual friend count value as inputs and filters the result. Initially, the Calculate Levels() function finds the five-level depth information for each distinct user. For example, if A calls B, B calls C, C calls D, and D calls E, then A = 0, B = 1, C = 2, D = 3, and E = 4 represent the five levels. This step ensures that all the levels have distinct users. Secondly, the Find Suspects() function selects only those pairs of users from level 1 and level 3 that do not have any direct calls and have M mutual friends.
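The level computation and suspect selection described above can be approximated with a breadth-first traversal. The sketch below is an illustration under stated assumptions rather than the listing of Algorithm 2: calls are treated as undirected links, "M mutual friends" is read as "at least M", and the function names `calculate_levels` and `find_suspects` only mirror the names used in the text.

```python
from collections import deque

def calculate_levels(graph, root, max_depth=4):
    """Breadth-first depth of each user from root (levels 0..max_depth).

    graph: dict mapping a user to the set of users they exchanged calls with
    (undirected, an assumption). Each user is assigned to exactly one level.
    """
    levels = {root: 0}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        if levels[user] == max_depth:
            continue  # do not expand beyond the fifth level
        for friend in graph.get(user, ()):
            if friend not in levels:
                levels[friend] = levels[user] + 1
                queue.append(friend)
    return levels

def find_suspects(graph, levels, min_mutual):
    """Pairs from level 1 and level 3 with no direct call and enough mutual friends."""
    level1 = [u for u, depth in levels.items() if depth == 1]
    level3 = [u for u, depth in levels.items() if depth == 3]
    suspects = []
    for a in level1:
        for b in level3:
            mutual = set(graph.get(a, ())) & set(graph.get(b, ()))
            if b not in graph.get(a, ()) and len(mutual) >= min_mutual:
                suspects.append((a, b))
    return suspects

graph = {
    "A": {"B"},
    "B": {"A", "C1", "C2"},
    "C1": {"B", "D"},
    "C2": {"B", "D"},
    "D": {"C1", "C2", "E"},
    "E": {"D"},
}
levels = calculate_levels(graph, root="A")
suspects = find_suspects(graph, levels, min_mutual=2)
```

Here B (level 1) and D (level 3) never call each other but share the two level-2 friends C1 and C2, so the pair is selected as a candidate suspect tie.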
Furthermore, the Calculate Subsocial Network() function generates subgraphs using the level 1 and level 3 details depending upon the gap time value. Results are filtered on the basis of the time gap value, e.g., two users called some other user while connected to the same base station within the given time frame, as explained in the previous section. After that, the Infer Hidden Ties() function uses the proposed normalization method, defined in equation (9), to compute the cooccurrence count, and then all results are filtered according to the cooccurrence count CV and the mutual friend threshold value M. Based on the mentioned parameters and thresholds, Algorithm 2 identifies a subclass of missing ties as suspicious ties.
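A pipe and filter arrangement of this kind can be sketched as a chain of threshold stages, each consuming the output of the previous one. The code below is a generic illustration of the architecture, not the framework's implementation: the candidate record fields (`pair`, `cooccurrence`, `mutual_friends`), the helper names, and the threshold values are all hypothetical.

```python
from functools import reduce

def make_threshold_filter(key, minimum):
    """Build a filter stage that keeps candidates whose `key` value is at least `minimum`."""
    def stage(candidates):
        return [c for c in candidates if c[key] >= minimum]
    return stage

def run_pipeline(candidates, stages):
    """Pipe the candidate list through each filter stage in order."""
    return reduce(lambda data, stage: stage(data), stages, candidates)

# Candidate pairs with no direct calls, annotated with illustrative values.
candidates = [
    {"pair": ("B", "D"), "cooccurrence": 6.5, "mutual_friends": 2},
    {"pair": ("B", "F"), "cooccurrence": 7.0, "mutual_friends": 0},
    {"pair": ("C", "G"), "cooccurrence": 1.5, "mutual_friends": 3},
]
stages = [
    make_threshold_filter("cooccurrence", 5),    # cooccurrence count threshold
    make_threshold_filter("mutual_friends", 2),  # mutual friend threshold
]
suspicious = run_pipeline(candidates, stages)
```

Because each stage depends only on its own threshold, stages can be reordered, added, or retuned independently, which is the main appeal of the pipe and filter style for a threshold-driven analysis.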
The results of the activity and simulation are computed and evaluated using Precision and Recall measures.
Figure 8: Suspect social tie inference framework. The diagram includes the normalized cooccurrence sum $\sum_{n}\sum_{m} \big((x_n, y_m) \times (y_m \beta / n)\big)$.

Scientific Programming 11
Evaluation results of the simulation and the social activity are shown in Table 1. Precision, Recall, and F1 Score measures are used to evaluate the framework; they are calculated for different settings of the cooccurrence count CV, mutual friend count M, and gap time GV parameters. F1 Score is calculated using equation (5). Definitions of the related parameters are given as follows.
TP: pairs that were hidden friends and that the system infers as hidden friends.
TN: pairs that were not hidden friends and that the system infers as not hidden friends.
FP: pairs that were not hidden friends and that the system infers as hidden friends.
The system obtains a maximum number of relevant hidden ties along with false-positive results. The setting GV ≥ 30, CV ≥ 5, and M ≥ 2 means that the gap between the two calls is 30 min or more, the count of cooccurred events is at least five, and the mutual friend count is at least two. The system's performance drops when the gap time is reduced to < 30, < 20, and < 10. Even though there is a drop in true-positive values, a significant drop in the false-positive values can also be seen. Results become more concise as the values of GV and M are varied. The limited data set collected using the activity highlights the occurrence of hidden ties. The whole activity and simulation were designed to produce the same data fields as CDR so that the proposed framework is compatible with the CDR data set.
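The three evaluation measures can be computed from these counts as in the sketch below. It uses the standard definitions, which equation (5) is assumed to follow. The false-negative count FN (hidden friends that the system does not infer as hidden friends) is not defined in the list above and is introduced here as an assumption, and the example counts are illustrative rather than values from Table 1.

```python
def evaluate(tp, fp, fn):
    """Return (precision, recall, f1) from true-positive, false-positive, and false-negative counts.

    F1 is the harmonic mean of Precision and Recall (standard definition).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only: 8 hidden ties found, 2 wrongly flagged, 2 missed.
precision, recall, f1 = evaluate(tp=8, fp=2, fn=2)
```

Tightening a threshold typically lowers both TP and FP, so reporting F1 alongside Precision and Recall helps show whether the loss of true positives is offset by the reduction in false positives.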
Results shown in Figures 9 and 10 exhibit the existence of hidden relationships. The gap time value GV helps to select the time frame. During the complete evaluation process, we found some notable differences between the simulation and activity results. The simulation results do not reach the Recall and Precision values of the activity data results. We attribute this to the simulation data set being generated by a random function, whereas the activity captured human hiding patterns. These results also help to infer human psychology in building hidden ties. The simulation data set generator works under the same constraints as the rules of the activity. However, the key difference between simulation and activity lies in the selection of friends and hidden friends and in the pattern of calling. While conducting the simulation, only trivial variation in results was observed, as random selection produces random patterns. According to our findings, simulation and activity both exhibit patterns of hidden ties. However, the activity results are more pronounced and identify the hidden relationships more clearly. It is essential to highlight some critical questions and dependent variables that help to find hidden social ties between two people. For example, why do two people hide their social tie? Is this a deliberate or an unintentional action? What if they are deliberately hiding their social tie for a purpose? In such a scenario, extracting a social tie between two people is a hard problem. It is a kind of investigation process that looks for a clue to draw some relationship between two people. The investigation indicates that two people posting a picture on social media are more cautious about not revealing their social tie, while they are less careful when doing some private activity; compare, for example, posting a picture on Facebook or Flickr with calling a person from a specific location.
Although extracting hidden social ties involves various privacy issues, we designed an activity and a simulation that generated a data set modeled on the caller data record (CDR) data set. We thoroughly investigated the patterns of connectivity and established a framework to infer social ties. This research has opened up a new direction to further explore connectivity in the Social Internet of Things (SIoT), specifically machine-to-machine direct communication and machine-to-human or human-to-machine hidden relationships. In many cases, several machines work together but rarely have a direct connection.

Conclusion and Future Work
In this research, we have examined the development of social connection patterns based on physical gathering. In the first part of our research, we explored the correlation between direct communication and the physical copresence of two individuals. To assess the system's performance, we examined all results using Precision and Recall evaluation measures. We also presented a periodic normalization equation for the cooccurrence count. In the second phase, we proposed the suspect tie inference framework. The false-positive results of the first part of the research are the basis for the second part of the study. The proposed framework adopts a pipe and filter architecture, where threshold values control each filter. The framework's fundamental objective is to take a data set such as CDR (caller data record) and infer suspect social ties, depending upon the specified threshold values. Analyzing the results critically, we propose a theory that identifies suspect social ties. Besides this, for comparison and evaluation, we conducted a real-time human-based activity and a simulation. The whole activity was designed and evaluated with the structure of the actual CDR data set in mind. In contrast to existing work, our research focuses on hidden ties instead of hidden actors. In the future, we aim to explore the homophilic nature of suspect ties within the Social Internet of Things (SIoT).

Data Availability
The data used can be found at http://snap.stanford.edu/data/index.html#locnet.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this work.