User Perception of Security on Social Networking Sites Using Fuzzy Logic

Human nature often frowns on engaging or interacting with near strangers. On online social media networks, however, this norm is largely ignored. There is open interaction among both known users and loosely connected users.


INTRODUCTION
Social networking sites (SNSs) have come to stay and are now an integral part of our lives. In recent years, participation in social networking sites has increased dramatically. Online social media services such as Facebook, Twitter, and LinkedIn allow millions of people to create online profiles and share personal information with vast networks of friends and, sometimes unknowingly, with strangers. However, while the popularity of these SNSs is soaring and millions of users sign onto them daily, there are also growing concerns about breaches of security on these sites. Privacy issues and identity theft on social media sites are major concerns. The phenomenon is attracting the attention of academic and industry researchers, who are intrigued not just by the wide reach of these social networking sites but also by the increasing security risks posed to users.

Overview of the Problem
As of June 2010, 22 percent of all time online, or one in every four and a half minutes spent online, was social: sharing, messaging, commenting, and blogging [1]. It is also interesting to note that, for the first time ever, social networks or micro-blogging sites are visited by three quarters of global consumers who go online [2]. Brazil leads the world with the highest percentage (86%) of internet consumers visiting a social networking site, and in the U.S. the total minutes spent on social networking sites have increased eighty-three percent year-over-year. Facebook alone, one of the major SNSs, had by September 2012 reached one billion active users, with each active user linked to an average of 305 other users, making it the second most visited website on the Internet [3,4].
While there is no doubt about the range of opportunities for communication and real-time exchange of all kinds of information offered by social networking sites, critical concerns have emerged about their privacy and security [5]. In most SNSs, there is very little protection against copying personal data from profiles and re-publishing the data elsewhere [6], yet one of the most important challenges of information sharing is how to assure its security [7]. For example, in recent times, the reputation of social networking sites has been hit by a number of incidents reported by various media platforms [8,9]. It is therefore incumbent on SNSs to have clear data protection policies so as to deliver the same level of social privacy that exists face to face. As with any new tool or application, it is important to keep a close watch on its security, especially when the majority of the population actively uses the tool.

Information Security Evaluation Criteria
The widely accepted model for evaluating information security is the basic CIA triad, standing for Confidentiality, Integrity and Availability. These three criteria are deemed fundamental to guaranteeing security in any information system. They have been applied across the whole subject of security analysis, from access to a user's internet history to the security of encrypted data across the internet [10]. The classic definition of information security is therefore brief and simple: information security is the confidentiality, integrity, and availability of information [11]. By extension, if any one of the three principles is violated or breached, it can have serious consequences for the parties concerned, be it an organization or the individual user of an information system. The Information Technology Security Evaluation Criteria (ITSEC), a consortium of information security experts from France, Germany, Holland and the United Kingdom, also employs confidentiality, integrity and availability as the yardstick for evaluating information technology security [12]. The relationship among these factors, however, involves much ambiguity and conflict [13,14,15], such that it is reasonable to apply a fuzzy comprehensive evaluation method for evaluating security risk in an information technology system such as an online social networking site. The CIA triad variables are not independent and sometimes oppose one another in use. For example, [15] explains that "locking your data in a safe and throwing away the key may help confidentiality and integrity but not availability".

Fuzzy Sets Theory Applications in Information Security Evaluation
The fuzzy set theory approach, pioneered by Zadeh [16], was intended to deal with uncertainties that are not statistical in nature. The approach has been widely used to represent the uncertainties of real-life situations, and the number and variety of applications using fuzzy set theory have grown rapidly. In the field of computer security, fuzzy set theory was used by [17] to assess the quality performance of an e-banking security system. That research focused on the complex and dynamic nature of the various factors considered in e-banking security assessment, and the authors concluded that a fuzzy logic (FL) model is an effective tool for assessing and evaluating e-banking security performance and quality. Fuzzy set theory has also been applied in assessing online risk for distributed intrusion prediction and prevention systems [18]. That research illustrated how a fuzzy logic design based on Distributed Intrusion Prediction and Prevention Systems (DIPPS) can be used to assess online risk effectively. Hierarchical Takagi-Sugeno models have also been used for online security evaluation systems [19], where the risk assessment was carried out using an evolutionary algorithm to automatically design a hierarchical Takagi-Sugeno fuzzy inference system. The hierarchical structure is evolved using Probabilistic Incremental Program Evolution (PIPE) with specific instructions. The authors of [20] further used a neuro-fuzzy learning method to optimize the performance of fuzzy risk models; the architecture of the developed hierarchical fuzzy inference system was, however, designed manually. Other works include an information security risk assessment method based on fuzzy logic [21] and network information security comprehensive evaluation using interval-valued fuzzy mathematics [22].

METHODOLOGIES
The evaluation of users' perception of security on online social networking sites by fuzzy set theory was carried out in stages. Fig. 1 outlines the procedure.

Design Model of Linguistic Variables
The inputs to the system were confidentiality, integrity and availability. These criteria, or linguistic variables, are assumed to be of equal weight, and a value is determined for each of them based on answers to questions about a specific social networking site. Designing the fuzzy system requires that each input (confidentiality, integrity, and availability) be represented by fuzzy sets. The fuzzy sets are in turn represented by membership functions. The membership function used in this paper is the triangular membership function, a three-point function defined by minimum (α), modal (m) and maximum (β) values, where α ≤ m ≤ β, as shown in Fig. 2.
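To make the three-point definition concrete, the following is a minimal Python sketch of a triangular membership function with parameters (α, m, β). The function name `triangular` and the handling of shoulder sets (where α = m or m = β) are illustrative assumptions, not taken from the paper's MATLAB model.

```python
def triangular(x, alpha, m, beta):
    """Membership degree of x in the triangular fuzzy set (alpha, m, beta).

    Assumes alpha <= m <= beta. Shoulder sets (alpha == m or m == beta)
    are handled by returning 1.0 at the modal value before any division.
    """
    if not alpha <= m <= beta:
        raise ValueError("expected alpha <= m <= beta")
    if x == m:
        return 1.0                       # peak of the triangle
    if x <= alpha or x >= beta:
        return 0.0                       # outside the support
    if x < m:
        return (x - alpha) / (m - alpha)  # rising edge
    return (beta - x) / (beta - m)        # falling edge


# Example: degree to which a score of 6.25 belongs to the set (5, 7.5, 10)
print(triangular(6.25, 5, 7.5, 10))
```

The degree rises linearly from 0 at α to 1 at m and falls linearly back to 0 at β.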

Fuzzy Sets
The values of the linguistic variables were represented with fuzzy sets defined by triangular membership functions [20]. The triangular membership function was chosen mainly because of its simplicity and appropriateness for this work [25]. Each linguistic variable takes 5 values, considered an ideal choice because with more than 5 values the design becomes cumbersome. The level of confidentiality was defined on the membership functions not confidential, slightly confidential, confidential, very confidential and extremely confidential. The level of integrity was defined on the scale very low, low, high, very high, and extremely high, while the level of availability was defined on the scale not often, rarely often, often, very often, and always available. These levels were defined over an estimated interval of [0, 10]. The level of security, the output, was defined on the scale not secure, slightly secure, secure, very secure, and extremely secure within the range [0, 30].
A web survey on the CIA triad was put on the three social networking sites for a period of 4 months. Some questionnaires were also sent to people by email. In all there were 829 respondents, but 18 responses were incomplete, bringing the number to 811. Of the respondents, 376 were between ages 19-25, 227 between 26-32, and 187 between 33-39. The rest were either below age 17 or above age 39. The respondents were from diverse cultural and racial backgrounds. Based on the results of the web survey, the interval estimation method was used to define the ranges for each of the membership functions belonging to the three inputs and the output.

Input Variables
The input variables were the CIA triad of confidentiality, integrity and availability, which are used in the evaluation of information technology security. The same CIA criteria were deemed appropriate for a secure social networking site application system. Tables 1, 2, 3 and 4 show the ranges for each membership function for the input variables respectively.

Membership Functions for Input Variables
The inputs were defined on a domain interval of [0, 10], based on the results from the survey using the interval estimation method. Again based on the results, the domain was divided into 2N + 1 regions, and a membership function was attached to each region. In this paper, the domain was divided into 5 regions (N = 2). The regions were represented by triangular membership functions as shown in Figs. 3, 4 and 5 respectively for confidentiality, integrity and availability in MATLAB.
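The division of a domain into 2N + 1 regions can be sketched as follows. This is only an illustration in Python: it assumes evenly spaced, half-overlapping triangles, whereas the paper derives its ranges from the survey by interval estimation. The helper name `partition` is hypothetical. With N = 2 on [0, 10], the even spacing does reproduce the triangular fuzzy numbers (0, 0, 2.5) and (5, 7.5, 10) quoted later in the paper for "not confidential" and "very confidential".

```python
def partition(lo, hi, n):
    """Return 2n+1 triangular sets (alpha, m, beta) evenly covering [lo, hi].

    ASSUMPTION: peaks are evenly spaced and neighbouring triangles overlap
    by half their base; the first and last sets are shoulders.
    """
    count = 2 * n + 1
    step = (hi - lo) / (count - 1)
    sets = []
    for k in range(count):
        peak = lo + k * step
        sets.append((max(lo, peak - step), peak, min(hi, peak + step)))
    return sets


CONFIDENTIALITY_LABELS = [
    "not confidential", "slightly confidential", "confidential",
    "very confidential", "extremely confidential",
]

# Five regions (N = 2) on the input domain [0, 10]
confidentiality_sets = dict(zip(CONFIDENTIALITY_LABELS, partition(0, 10, 2)))
for label, params in confidentiality_sets.items():
    print(label, params)
```

Calling `partition(0, 30, 2)` gives the corresponding layout for an output domain of [0, 30]; its middle set is (7.5, 15, 22.5).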

Membership Functions for Output Variables
The output domain interval was estimated to be [0, 30]. The domain interval was further divided into 2N + 1 regions, and a membership function was attached to each region. The level of security (the output) is divided into 5 regions (N = 2) represented by the fuzzy sets not secure, slightly secure, secure, very secure, and extremely secure. Fig. 6 shows the triangular membership functions for the output variable as modeled in MATLAB.

Formulating Rules and Populating the Rule Base
The rules were built on intuitive knowledge of the relationships between the variables. They were formulated to cover every possible combination of the input variables and to relate each combination to the output variable, that is, to relate the levels of confidentiality, integrity and availability to the level of security. To determine the overall security level for each social networking site, the rule base needs 5³ = 125 rules, since there were five linguistic values and three linguistic variables (confidentiality, integrity and availability). A sample of the rule base used to construct the overall knowledge base is shown in Table 5 for different linguistic values. The levels of confidentiality, integrity, and availability were used in the antecedent of the rules and the level of security as the consequent. A fuzzy rule is a conditional statement of the form: IF x₁ is A₁ AND x₂ is A₂ AND x₃ is A₃ THEN y is B, where x₁, x₂, x₃ and y are linguistic variables and A₁, A₂, A₃ and B are linguistic values determined by fuzzy sets on the universes of discourse X₁, X₂, X₃ and Y.
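The size of the rule base can be checked by enumerating every combination of linguistic values. The Python sketch below generates the 125 antecedents from the labels given in this paper. The consequent assigned to each rule is a purely hypothetical placeholder (the rounded mean of the three level indices), since the actual consequents come from the authors' rule table and are not reproduced here.

```python
from itertools import product

CONFIDENTIALITY = ["not confidential", "slightly confidential", "confidential",
                   "very confidential", "extremely confidential"]
INTEGRITY = ["very low", "low", "high", "very high", "extremely high"]
AVAILABILITY = ["not often", "rarely often", "often", "very often",
                "always available"]
SECURITY = ["not secure", "slightly secure", "secure", "very secure",
            "extremely secure"]


def hypothetical_consequent(c, i, a):
    """HYPOTHETICAL placeholder: rounded mean of the three level indices.

    The paper's rules were written from intuitive knowledge; this stand-in
    only lets the enumeration produce complete IF-THEN statements.
    """
    return SECURITY[round((c + i + a) / 3)]


# One rule per combination of the five values of each of the three inputs
rules = [
    (CONFIDENTIALITY[c], INTEGRITY[i], AVAILABILITY[a],
     hypothetical_consequent(c, i, a))
    for c, i, a in product(range(5), repeat=3)
]


def format_rule(rule):
    conf, integ, avail, sec = rule
    return (f"IF confidentiality is {conf} AND integrity is {integ} "
            f"AND availability is {avail} THEN security is {sec}")


print(len(rules))            # 5 ** 3 combinations
print(format_rule(rules[0]))
```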

APPLICATION OF FUZZY TECHNIQUE
To construct a fuzzy rule-based assessment of users' perception of security risk on social networking sites, an online questionnaire was designed based mainly on the CIA triad of confidentiality, integrity and availability. There were seven social network security questions under each linguistic variable to help capture users' perception of security on each of the selected social networking sites, namely Facebook, Twitter and LinkedIn. The link to the survey was posted on all three social networking sites for users to respond. Other users were emailed the link. The survey questionnaire was online for approximately three months. To represent users' perceptions as fuzzy membership functions, the interval estimation method was used. Interval estimation generates more suitable results for continuous measurements, and participants understand and represent their opinions more easily with it. An interval estimation method is often the most appropriate for constructing fuzzy membership functions and is commonly used [23,24].

Fuzzy Aggregation using Weighted Average
One of the most common aggregation operators found in the literature is the weighted average (WA), also known as the weighted mean. It is similar to an arithmetic mean except that, instead of each data point contributing equally to the final average, some data points contribute more than others. There are weighted versions of other means, such as the weighted geometric mean (WGM) and the weighted harmonic mean (WHM). There is also the ordered weighted averaging (OWA) operator.
Let X and Y be triangular fuzzy numbers (TFNs):
X = (X₁, X₂, X₃) (1)
Y = (Y₁, Y₂, Y₃) (2)
Then the sum of X and Y is
X + Y = (X₁ + Y₁, X₂ + Y₂, X₃ + Y₃)
In this paper, respondents were asked to choose, from a series of statements on an ordinal/interval scale, the one they judged most appropriate, and it is argued that the choice of score is, in effect, a judgement between 3 indicator statements. Thus, for example, as shown in Fig. 7 for the linguistic variable confidentiality, respondents rate the level of confidentiality of the dating history or intimate secrets they share with friends on social media on the following scale:

Fig. 7. Linguistic variable confidentiality
In this interpretation, a respondent who judges "very confidential" to be the appropriate score makes a constrained choice in the range where 5 is the minimum value, 7.5 is the modal value and 10 is the maximum. (To think of it another way, respondents must consider which of the five hypotheses, not confidential, slightly confidential, confidential, very confidential and extremely confidential, best represents their judgement of the situation.) In extracting the fuzzy scores on a range of 10, the descriptor "not confidential" corresponds to the triangular fuzzy number (0, 0, 2.5). Similarly, the descriptor "very confidential" corresponds to (5, 7.5, 10), and so on.

Evaluation using Weighted Average
To summarize the answers given by many users, each modeled as a fuzzy number, the weighted average was used. For a question put to n users, the summary is the average of the fuzzy numbers representing the users' answers.
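The averaging step follows directly from component-wise TFN addition. The Python sketch below averages a small set of answers to one question. The four responses are made up for illustration, and the TFNs for "slightly confidential", "confidential" and "extremely confidential" assume the evenly spaced layout implied by the two descriptors the paper does give; the function name `tfn_weighted_mean` is hypothetical.

```python
# TFNs on the [0, 10] scale. (0, 0, 2.5) and (5, 7.5, 10) are quoted in the
# paper; the other three are ASSUMED from an evenly spaced layout.
TFN = {
    "not confidential": (0.0, 0.0, 2.5),
    "slightly confidential": (0.0, 2.5, 5.0),
    "confidential": (2.5, 5.0, 7.5),
    "very confidential": (5.0, 7.5, 10.0),
    "extremely confidential": (7.5, 10.0, 10.0),
}


def tfn_weighted_mean(tfns, weights=None):
    """Component-wise weighted mean of triangular fuzzy numbers.

    With weights=None every answer counts equally, which reduces to the
    plain average of the fuzzy numbers.
    """
    if weights is None:
        weights = [1.0] * len(tfns)
    total = sum(weights)
    return tuple(
        sum(w * t[k] for w, t in zip(weights, tfns)) / total
        for k in range(3)
    )


# Hypothetical answers from four users to a single question
answers = ["very confidential", "confidential",
           "very confidential", "not confidential"]
summary = tfn_weighted_mean([TFN[a] for a in answers])
print(summary)
```

The result is again a triangular fuzzy number, so it can be passed on to a defuzzification step unchanged.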

Center of Gravity
To determine perceived confidentiality, integrity and availability, users responded to 7 different questions for each variable (Appendix A). Seven summarized fuzzy numbers were then derived, as shown in Table 6 and subsequently in Tables 7 and 8. To represent these 7 fuzzy numbers by a crisp value, the centroid method was used. Figs. 8 and 9 illustrate the centers of gravity for one triangle and two triangles respectively.
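As a rough illustration of the centroid method, the Python sketch below computes the center of gravity of a single triangular fuzzy number and then combines several triangles by weighting each centroid with its area. The area-weighted combination is an assumption made for this sketch (it ignores any overlap between triangles); the exact formula used in the paper is not recoverable from the text. The function names are hypothetical.

```python
def tfn_centroid(tfn):
    """x-coordinate of the centroid of the triangle (a, m, b) with height 1."""
    a, m, b = tfn
    return (a + m + b) / 3.0


def tfn_area(tfn):
    """Area of the triangle (a, m, b) with height 1."""
    a, _, b = tfn
    return (b - a) / 2.0


def combined_centroid(tfns):
    """ASSUMED combination rule: area-weighted mean of individual centroids.

    Overlap between triangles is counted twice, so this is only a sketch
    of one common way to obtain a single crisp value.
    """
    total_area = sum(tfn_area(t) for t in tfns)
    if total_area == 0:
        raise ValueError("all triangles are degenerate")
    return sum(tfn_centroid(t) * tfn_area(t) for t in tfns) / total_area


print(tfn_centroid((5.0, 7.5, 10.0)))
print(combined_centroid([(0.0, 0.0, 2.5), (7.5, 10.0, 10.0)]))
```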

This method of aggregation is used for all three linguistic variables (confidentiality, integrity and availability). The three inputs representing the three variables are then fed into the Fuzzy Logic Toolbox to generate the output level of security for each social networking site.
For illustration purposes, the results for the linguistic variables confidentiality, integrity and availability for Facebook are shown below. The same was done for Twitter and LinkedIn.
Center of gravity: c_T = 6.10
Table 9 summarizes the crisp values from the aggregated user responses for the input variables.

Implementation Procedure in MATLAB
The final results for each of the linguistic inputs, derived by aggregating the responses to the online social networking site security questions, were fed into MATLAB to derive the final output (security risk level) for each of the three selected social media sites. The inputs were supplied through the graphical user interface called the rule viewer.

MATLAB fuzzy inference system (FIS) editor
The fuzzy inference system editor in Fig. 10 shows a summary of the fuzzy inference system. The editor shows the mapping of the input variables to the output. The input variables were confidentiality, integrity and availability, and the output was the security level. The rules were constructed using Mamdani fuzzy reasoning, and defuzzification was done using the centroid technique. The Mamdani method was chosen over Sugeno because it is well suited to human input, as in this research, and generally has broad acceptance and applicability [25]. The three input variables (confidentiality, integrity and availability) were fed into the system. Each input corresponds to the weighted average of the user responses in the questionnaire for that variable, followed by its center of gravity. For example, in Fig. 11, the input values for confidentiality, integrity and availability are (5, 6, 7) respectively and the corresponding output (security level) is 18.1, as shown at the top of the corresponding graphs. The value of each input variable is shown at the top of its section, as is the value of the output variable.
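For readers without access to MATLAB, the Mamdani pipeline described here (fuzzification, min for AND, max aggregation, centroid defuzzification) can be sketched in plain Python. This is not the authors' model: the membership functions are assumed to be evenly spaced triangles, and the rule consequents are a hypothetical placeholder (rounded mean of the three input level indices). It should therefore not be expected to reproduce the value 18.1 from Fig. 11.

```python
from itertools import product


def tri(x, a, m, b):
    """Triangular membership of x in (a, m, b); shoulders (a == m, m == b) allowed."""
    if x == m:
        return 1.0
    if x <= a or x >= b:
        return 0.0
    return (x - a) / (m - a) if x < m else (b - x) / (b - m)


def partition(lo, hi, n=2):
    """2n+1 evenly spaced triangular sets on [lo, hi] (ASSUMED layout)."""
    step = (hi - lo) / (2 * n)
    return [(max(lo, lo + (k - 1) * step), lo + k * step,
             min(hi, lo + (k + 1) * step)) for k in range(2 * n + 1)]


IN_SETS = partition(0.0, 10.0)    # five sets per input on [0, 10]
OUT_SETS = partition(0.0, 30.0)   # five security sets on [0, 30]


def consequent(c, i, a):
    """HYPOTHETICAL rule table: output level = rounded mean of input levels."""
    return round((c + i + a) / 3)


def mamdani(conf, integ, avail, samples=301):
    """Mamdani inference with min-AND, max aggregation, centroid defuzzification."""
    inputs = (conf, integ, avail)
    if any(not 0.0 <= x <= 10.0 for x in inputs):
        raise ValueError("inputs must lie in [0, 10]")
    # Fuzzification: degree of each input in each of its five sets
    degrees = [[tri(x, *s) for s in IN_SETS] for x in inputs]
    # Rule evaluation: firing strength is the min of the three antecedents;
    # rules sharing a consequent are combined with max
    strength = [0.0] * len(OUT_SETS)
    for c, i, a in product(range(5), repeat=3):
        w = min(degrees[0][c], degrees[1][i], degrees[2][a])
        k = consequent(c, i, a)
        strength[k] = max(strength[k], w)
    # Aggregate the clipped output sets and take the centroid on a sample grid
    num = den = 0.0
    for j in range(samples):
        y = 30.0 * j / (samples - 1)
        mu = max(min(strength[k], tri(y, *OUT_SETS[k])) for k in range(5))
        num += y * mu
        den += mu
    return num / den


print(mamdani(5, 6, 7))
```

Replacing `consequent` with a lookup into the real rule table, and `IN_SETS` and `OUT_SETS` with the survey-derived ranges, would be needed before comparing results against the MATLAB output.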

The surface viewer
The MATLAB surface viewer as shown in Fig. 12 is a 3-D graph that shows the relationship between the inputs and the output. The output (security level) is represented on the Z-axis while 2 of the inputs (Confidentiality and Integrity) are on the x and y axes and the other input (Availability) is held constant. The surface viewer shows a plot of the possible ranges of the input variables against the possible ranges of the output.

EVALUATION
The output from the fuzzy system is a crisp number whose value is not intuitive. We therefore also interpreted it in terms of the notions already used: not secure, slightly secure, secure, very secure, and extremely secure. These notions were modeled with fuzzy sets, and the membership functions of those fuzzy sets were then evaluated at the crisp output. For example, the linguistic value secure is represented by the triangular membership function (7.5, 15, 22.5). It can be seen that for a crisp value x, the following holds:
μ_secure(x) = (x − 7.5) / 7.5 for 7.5 ≤ x ≤ 15 (14)
μ_secure(x) = (22.5 − x) / 7.5 for 15 < x ≤ 22.5 (15)
μ_secure(x) = 0 otherwise (16)
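The interpretation step can be illustrated with a short Python sketch that evaluates a crisp output against all five security sets. Only the "secure" triangle (7.5, 15, 22.5) is stated in the paper; the other four are assumed here from an evenly spaced layout on [0, 30], and the function names are hypothetical. Under that assumption, the crisp value 14.9 reported for LinkedIn in this paper evaluates to about 1% slightly secure and 99% secure, in line with the percentages the paper reports.

```python
def tri(x, a, m, b):
    """Triangular membership of x in (a, m, b); shoulders (a == m, m == b) allowed."""
    if x == m:
        return 1.0
    if x <= a or x >= b:
        return 0.0
    return (x - a) / (m - a) if x < m else (b - x) / (b - m)


# "secure" is from the paper; the remaining sets are ASSUMED (even spacing).
SECURITY_SETS = {
    "not secure": (0.0, 0.0, 7.5),
    "slightly secure": (0.0, 7.5, 15.0),
    "secure": (7.5, 15.0, 22.5),
    "very secure": (15.0, 22.5, 30.0),
    "extremely secure": (22.5, 30.0, 30.0),
}


def interpret(x):
    """Return the non-zero membership degrees of crisp output x."""
    degrees = {label: tri(x, *p) for label, p in SECURITY_SETS.items()}
    return {label: d for label, d in degrees.items() if d > 0.0}


for label, degree in interpret(14.9).items():
    print(f"{label}: {round(100 * degree)}%")
```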

User Security Perception of LinkedIn
LinkedIn was judged with input scores of 3.98, 2.70 and 7.70 for confidentiality, integrity and availability respectively. The crisp output was 14.9. This value corresponded to 1% slightly secure and 99% secure. Fig. 17 shows the security level for Facebook based on the membership functions secure.
Table 10 summarizes how the final values were arrived at for the variables involved. The research finds that users consider Twitter very secure and LinkedIn secure. Facebook lies between secure and very secure on the scale of security levels. It can therefore be inferred from the research that Twitter seems to meet users' expectations of more robust security better than the others. While the focus of the paper was to develop an appropriate methodology that captures the views of users, to be incorporated into future security designs aimed at improving security on social media, the ranking of which site best meets users' security expectations could also, in part, help instill some healthy competition among the sites, even though, admittedly, Facebook has more active users than the other two.

CONCLUSION
Users are by far the main building block of any online social networking site, and their security and privacy should therefore be of utmost concern to the managers of these sites. User perception is an important element when considering social networking security. Human perception processes cannot be analyzed and assessed by a binary approach or in a simple quantitative way. The human thought process is subjective, imprecise and complicated, and human perception usually uses a linguistic approach, as opposed to a numerical approach, to classify, describe, or "value" a system.
In addition, user perception of security risk on social networking sites is shaped by the individual evaluator's needs and requirements for what would make him or her secure on a social networking site. In this paper, a fuzzy system was implemented using fuzzy logic theory to evaluate user perception of security on social networking sites. Facebook, Twitter and LinkedIn were used as case studies. Employing MATLAB and its Fuzzy Logic Toolbox to design the fuzzy inference system, an overall user perception of security risk on SNSs was obtained.