TDSRC: A Task-Distributing System of Crowdsourcing Based on Social Relation Cognition



Introduction
Crowdsourcing systems [1] have become a powerful, scalable, and cost-effective way to complete tasks promptly: requesters allocate large-scale tasks to a crowd and obtain results by leveraging mass wisdom. A solver crowd is typically large, anonymous, transient, and nonprofessional, which makes it challenging to establish a trust relationship between a requester and solvers [2]. Some solvers may lack the abilities a task requires, or may want to obtain the reward without carefully performing the task, which significantly degrades the quality of the task outputs [3][4][5][6].
Many studies have recently sought to improve crowdsourcing quality. For example, Howe [1] proposed the golden standard data paradigm. Eickhoff et al. [7] and Cao et al. [8] proposed another popular methodology, "simple majority voting." Jeffrey et al. [9,10] leveraged the behavioral traces captured from online solvers to predict crowdsourcing quality.
The method of weighting results by a solver's historical performance is well established [5,6,11]. Some researchers leverage social relationships in crowdsourcing systems [12][13][14]. However, most of these prior studies assume that a crowdsourcing platform has information about all solvers and that these solvers can be treated as one large, stable resource set. The common processing flow is shown in Figure 1(a), where the platform matches the task needs against all participants. This platform has two defects: (1) tasks can be distributed only once, by the platform, and individual solvers cannot iterate the distribution process; and (2) all potential participants must be registered users of the platform. In addition, historical performance-based methods easily incur a "cold start" problem because new solvers have no historical records.
Considering an alternative scenario in which participants do not have to register with the crowdsourcing platform, we propose a novel model called the Task-Distributing system of crowdsourcing based on Social Relation Cognition (TDSRC), in which a requester can distribute crowd tasks to some of his friends without obtaining information about all potential participants (e.g., friends of friends). The requester needs only to generate a task and distribute it to his relevant friends. Iteratively, the friends can play the role of the next requester and redistribute the task in their own social networks without any extra burden (as shown in Figure 1(b)). By introducing social relation cognition (SRC) into crowdsourcing, we establish a trust relationship, which is considered challenging to achieve in a common crowdsourcing platform [2].
This study makes the following contributions:
(i) A method is proposed that enables a requester to efficiently distribute a task to more suitable solvers, improving the accuracy of task distribution.
(ii) Social relationships are used to create a crowdsourcing task and distribute it directly within the requester's friend circle, without global information (e.g., the set of all candidate solvers) that is often difficult to obtain.
(iii) The system effectively avoids the cold-start problem that exists in performance-based methods.
The remainder of this paper is organized as follows: related studies are described in Section 2; necessary definitions are given in Section 3; feature discovery and candidate solver selection are discussed in Section 4; the process and simulation are described in Section 5; and Section 6 summarizes this work and explores possibilities for future studies.

Related Works
Crowdsourcing has attracted considerable attention since it was proposed approximately ten years ago. Lease et al. [15] indicated that quality control must be considered if crowdsourcing quality is to be improved. Eickhoff et al. [7] pointed out that (1) filtering low-quality solvers reduces the number of malicious solvers but lengthens completion times and (2) a solver's reliability cannot be efficiently ensured by the solver's acceptance ratio on previous tasks. In general, the selection of crowdsourcing nodes and the guarantee of quality of service remain core issues, and many researchers have contributed to different aspects of this field. The related studies are briefly reviewed as follows.
2.1. Quality Control.
Howe [1] proposed the golden standard data paradigm. In this paradigm, certain questions (named golden standard data), elaborately predesigned with definite baselines, are carefully inserted into the tasks so that solvers cannot identify them. By comparing the solvers' responses with these baselines, a requester can identify unqualified solvers and precisely aggregate all task results. The main flaw of this approach is that designing golden data is generally challenging and costly. Another popular methodology, "simple majority voting" [7,8], has been extensively discussed. This method classifies solvers' responses and aggregates the results according to the class with the most votes. Although simple, it fails to identify a deceitful participant.
The basic principle of the historical performance-based methods in [5,6,11] is that solvers with better historical performance have a greater impact on the aggregated results. As a useful complementary technique, Jeffrey et al. [9,10] leveraged the behavioral traces captured from online solvers to predict crowdsourcing quality; the behavioral characteristics of participants are highly correlated with the quality of crowdsourcing. However, historical performance-based methods fail to consider the degree of match between the task requirements and the potential solvers' abilities, even though some solvers may perform better on a particular type of task than on another [3]. In addition, they easily incur a "cold start" because they require sufficient historical data to build an effective model. Moreover, crowdsourcing complex tasks raises further challenges. Some tasks (e.g., picture editing) may generate a substantial amount of traffic in task distribution and results collection, which hinders the ability to attract participants because of the large cost in energy and money [16][17][18]. Therefore, participants must collaborate with each other (e.g., applying the "store-carry-forward" routing pattern [18] to upload the result data). Considering these facts, an efficient alternative for improving task quality is to allocate the task to appropriate solvers rather than distributing it completely at random.
Many researchers have employed social relationships in crowdsourcing. A social relation is considered an essential and significant attribute of a human being, and numerous methodologies have been employed to establish social relationships [19][20][21]. The famous theory of "six degrees of separation" [22] maintains that people are six or fewer steps away from each other, so that a chain of "a friend of a friend" statements can connect any two people in no more than six steps. Trust relationships in society are broadly applied to personalized recommendation systems [23,24], software crowdsourcing [25], image annotation [26], and so on. Rahman et al. [12] proposed a framework that can create a large ad hoc social network and construct a context-aware incentive. This framework can solve many daily-life problems, such as finding lost individuals, handling emergency situations, helping pilgrims perform ritual events based on location and time, and sharing geotagged multimedia resources within the crowd. Assem et al. [13] proposed a framework, based on nonnegative matrix factorization and Gaussian kernel density estimation, for summarizing crowd mobility patterns in cities using Location-Based Social Network (LBSN) data, a spatial-temporal dataset crawled from Twitter. This framework uses a temporal functional to discover the correlation between locations and the crowd, and it can help allocate resources better based on the expected crowd mobility. Gan et al. [14] proposed a novel game-based incentive mechanism for multiresource sharing based on social networks; the incentive combines a task allocation process, a profit transfer process, and a reputation updating process to satisfy truthfulness and individual rationality. Yang et al. [27] introduced a novel approach named the social incentive mechanism to incentivize the social friends of the participants who perform sensing tasks.
The incentive leveraged the social ties among participants to promote global cooperation.
However, most prior studies focused on obtaining an optimal aggregate result by identifying and excluding frauds after analyzing the results collected from a crowd, which cannot remove deceivers at the earliest stage (e.g., the node selection stage). Research introducing social relations into crowdsourcing mainly focuses on the coverage of the sensing region based on participants' locations [12,13,28] and on motivating participants through social relationships. These studies use only the mutual influence between friends [14,27] and do not classify or quantify the abilities of friends.

Solver Selection.
The pioneering work on solver selection in a social network is the study by Lappas et al. [29], in which the authors proposed a model to identify a group of individuals who can function as a team while minimizing the communication cost. Zhao Dong et al. [30] designed two online mechanisms based on an online auction model; under certain constraints (e.g., budget and time), the mechanisms can select proper solvers for different tasks and maximize the value of services. Considering the mobility of mobile terminals and distinguishing time-sensitive from delay-tolerant tasks, Guo Bin et al. [31] proposed a framework named "ActiveCrowd" for multitask-oriented solver selection in large-scale mobile crowdsourcing scenarios, which applies a "greedy enhanced genetic algorithm" to achieve optimal or near-optimal solutions that minimize the total distance and the burden, respectively, for tasks and solvers. According to the constraints of tasks, Zhang et al. [32] provided an incentive mechanism that enables a requester to actively assign the most valuable tasks to solvers. Bozzon et al. [33] proposed a model to select the top-K experts in a social network when a set of task needs is received. Considering both profile information and social activities, the model matches the expertise needs to candidate experts by formulating both as vectors. In contrast with other team formation methods, Wang et al. [34] proposed an approach to build a collaborative team in a noncooperative social network, which assumes that individuals are selfish and pursue the maximization of their own profits. Montelisciani et al. [35] highlighted some critical issues in structuring team formation with the aim of identifying suitable solvers in crowdsourced natural language processing (NLP). Qing Liu et al. [36] devised four incentive mechanisms for selecting a team of solvers to accomplish complex tasks. The authors addressed the team formation problem by formulating it as a task allocation and pricing mechanism design problem.
However, most of these authors assumed that the requester (or crowdsourcing platform) can obtain all potential solvers in advance, which is typically impossible and unnecessary in reality. The TDSRC proposed in this paper can accommodate the lack of information about candidate solvers and needs only routine interaction information between the requester and his friends. Based on "six degrees of separation" [37,38], the trust chain between the requester and solvers can be established and transmitted iteratively. People generally prefer to trust those they know well over strangers; deception among friends is relatively rare, and the crowdsourcing results become more precise and reliable [39]. Therefore, by using social relationships, the TDSRC builds a trust chain between requesters and solvers and thereby enhances the accuracy and credibility of task distribution.

Preliminary Definitions
We aim to apply the social relations of the requester in the crowdsourcing system. The first step is to discover and quantify friend features. In this study, friends' abilities have the same meaning as friends' features and include interests, hobbies, personality, characteristics, and integrity.

Social Network.
A social network is a social structure that consists of many nodes, which typically refer to individuals or organizations. Such a network links various people or organizations regardless of whether they have a close relationship [37]. The interaction among individual members in a social network forms relatively stable relations and influences people's social behaviors [38].
In the book "Networked: The New Social Operating System" [40], published in 2012, Lee Rainie and Barry Wellman described the social network revolution, the mobile revolution, and the Internet revolution as the three revolutions that have influenced human society in the new period.
A social network is formed by nodes and the connections between these nodes, and nodes commonly have different types of properties [22]. The social network in this study can be any social network. A requester is the center of the network, and an edge is a one-way connection that indicates the friend features evaluated by the center node.
A participant node is denoted as $p = \langle A \rangle$, where $A = \{a_1, a_2, \ldots, a_n\}$ represents the attributes of the node. The social network is denoted as $G = \langle P, E \rangle$, where $P = \{p_1, p_2, \ldots, p_m\}$ denotes the friend nodes of the central node $p$ and $E = \langle e_{i,j} \rangle_{m \times n}$, with

$$e_{p,j} = \begin{cases} x, & \text{if } p \text{ and } p_j \text{ are directly connected},\\ 0, & \text{if there is no direct communication between } p \text{ and } p_j, \end{cases}$$

where $x$ represents the strength of communication between $p$ and $p_j$, and zero indicates that $p_j$ has no communication with $p$.

Definitions Based on SRC.
Each node has unique properties, such as hobbies and professional competence. A node typically evaluates the abilities of his friends, for example, their specific interests and which friends are suitable for specific tasks. Let $p$ be a node in a social network with $m$ friends. Requesters and solvers are both referred to as participants.
For convenience, the important and frequently used notations in this paper are listed in Table 1.

Definition 1 (ability).
Ability denotes the qualities that are needed to complete a project or task.An ability is denoted as a in this study.

Definition 2 (abilities set (AS)).
This set contains all types of abilities needed to complete a crowdsourcing task. In our system, the AS is $A = \{a_1, a_2, \ldots, a_n\}$. The AS is a global factor that should be shared across the system.

Definition 3 (abilities subset (ASS)).
This subset consists of elements from the AS.

Definition 4 (abilities value (AV)).
This numerical representation corresponds to the AS and is denoted as $C$. For example, $C_p = \{C_p^{a_1}, C_p^{a_2}, \ldots, C_p^{a_n}\}$ denotes the abilities of node $p$. The original values are set between zero and one by $p$, and the default value is zero.

Definition 5 (abilities coverage rate (ACR)).
The ACR is the proportion of the actual AS of the solvers to the demanded AS of the requester. We denote it by $\delta$:

$$\delta = \frac{|AS_{so}|}{|AS_{se}|},$$

where $AS_{so}$ denotes the AS of the solvers and $AS_{se}$ represents the AS demanded by the requester. The ACR indicates the degree of match between the solvers and the task. For example, if the government wishes to conduct a public poll, certain characteristics of the informants, such as knowledge, background, location, job category, gender, income, and party affiliation, may substantially influence the results. The larger the ACR, the more representative the results.
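A minimal Python sketch of the ACR follows. It assumes, beyond what the definition states, that $AS_{so}$ counts only the demanded abilities that at least one selected solver covers, so $\delta$ stays in $[0, 1]$; the function name and the ability labels are hypothetical.

```python
from typing import Iterable, Set


def abilities_coverage_rate(solver_abilities: Iterable[Set[str]], demanded: Set[str]) -> float:
    """Return delta = |AS_so| / |AS_se|.

    solver_abilities: one set of abilities per selected solver.
    demanded: the abilities set the requester demands (AS_se).
    Assumption: AS_so is restricted to demanded abilities covered by some solver.
    """
    demanded = set(demanded)
    if not demanded:
        raise ValueError("the demanded abilities set must be non-empty")
    covered: Set[str] = set()
    for abilities in solver_abilities:
        covered |= set(abilities) & demanded  # ignore abilities the task does not need
    return len(covered) / len(demanded)


# Public-poll example with hypothetical informant characteristics.
demanded = {"location", "job category", "income", "gender"}
solvers = [{"location", "job category"}, {"job category", "music"}, {"income"}]
print(abilities_coverage_rate(solvers, demanded))  # 0.75
```

Restricting $AS_{so}$ to the demanded set keeps abilities the task does not need (here, "music") from inflating the rate.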

Definition 6 (qualities factor (QF)).
The QF is the comprehensive valuation given by all friends of a solver after the solver finishes a crowdsourcing task, and it is denoted as $Q$. Suppose $s_j$ is the total number of tasks that friend $p_j$ invited $p$ to perform. After the tasks are completed, $p_j$ gives $p$ a valuation of the performance on every task. The valuation is represented as $y_z$ ($z = 1, 2, \ldots, s_j$) with $y_z \in [0, 1]$, and the QF is denoted as

Mobile Information Systems
Definition 7 (communication). Communication represents the number of interactions between a node and its friends in the social network. Short messages, telephone calls, and messages sent or received through social software all count as communication. We use $F_{p,p_j}$ to denote the total communication times between node $p$ and his friend $p_j$ in a sampling period.
Definition 8 (honesty index (HI)). This index is a weighted average of the QFs evaluated by all of a solver's friends. We denote it as $h$, and it is a global variable. For example, $h_p$ denotes the total evaluation that all friends of node $p$ give to $p$:

$$h_p = \frac{\sum_{j=1}^{m} W_p^j \, Q_p^j}{\sum_{j=1}^{m} W_p^j}, \quad (4)$$

where $Q_p^j$ is the QF that friend $p_j$ gives to $p$ and $W_p^j$ denotes the weight of friend $j$ for node $p$, which is generally set to one.
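The QF and HI can be sketched as below. The QF formula is not reproduced in this text, so the sketch assumes that the QF a friend $p_j$ gives to $p$ is the weighted mean of the valuations $y_z$ with task weights $w_z^j$, and that the HI is the weighted mean of these QFs with friend weights $W_p^j$ defaulting to one. Both function names are hypothetical.

```python
from typing import Optional, Sequence


def quality_factor(valuations: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """QF that friend p_j gives to p over s_j tasks (assumed weighted mean).

    valuations: y_z in [0, 1] for z = 1, ..., s_j.
    weights: w_z^j in [0, 1]; every task weighs 1 when omitted.
    """
    if weights is None:
        weights = [1.0] * len(valuations)
    total = sum(weights)
    if total == 0:
        return 0.0  # no weighted evidence yet
    return sum(w * y for w, y in zip(weights, valuations)) / total


def honesty_index(qfs: Sequence[float], friend_weights: Optional[Sequence[float]] = None) -> float:
    """HI h_p: weighted average of the QFs from p's friends (W_p^j defaults to 1)."""
    if friend_weights is None:
        friend_weights = [1.0] * len(qfs)
    total = sum(friend_weights)
    if total == 0:
        return 0.0
    return sum(w * q for w, q in zip(friend_weights, qfs)) / total


qf_1 = quality_factor([1.0, 0.5])  # friend 1 rated two tasks
qf_2 = quality_factor([0.25])      # friend 2 rated one task
print(qf_1, qf_2, honesty_index([qf_1, qf_2]))  # 0.75 0.25 0.5
```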

Definition 9 (friends' abilities vector (FAV)).
A solver, as the central node in his social network, assigns AVs over the AS to each of his friends according to their communications. For example, the FAV that node $p$ gives to his friend $p_j$ is denoted as $e_{p,p_j}$:

$$e_{p,p_j} = \{x_{p,p_j}^{1}, x_{p,p_j}^{2}, \ldots, x_{p,p_j}^{n}\}. \quad (5)$$

Definition 10 (friends' abilities matrix (FAM)).
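The FAM of a node is the matrix formed by all of that node's FAVs. For example, the FAM of node $p$ is denoted as $E_p$:

$$E_p = \begin{pmatrix} e_{p,p_1} \\ e_{p,p_2} \\ \vdots \\ e_{p,p_m} \end{pmatrix} = \begin{pmatrix} x_{p,p_1}^{1} & \cdots & x_{p,p_1}^{n} \\ \vdots & \ddots & \vdots \\ x_{p,p_m}^{1} & \cdots & x_{p,p_m}^{n} \end{pmatrix}. \quad (6)$$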

Feature Discovery and Candidate Solver Selection
As previously discussed, we redefine node $p$ in the social network as a triple:

$$p = \langle h_p, C_p, E_p \rangle, \quad (7)$$

where $h_p$ indicates the HI, $C_p$ denotes the AVs, and $E_p$ represents the FAM.

Computing and Updating the FAM.
where $F_{p,p_j}$ denotes the total communication times between node $p$ and his friend $p_j$, and $F_{p,p_j}^{a_k}$ represents the communication times for ability (topic) $a_k$.
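The formula that this sentence explains is not reproduced here. One plausible reading, assumed in the sketch below and not confirmed by the text, is the share of topic-specific interactions among all interactions, $x_{p,p_j}^{k} = F_{p,p_j}^{a_k} / F_{p,p_j}$; stacking these FAVs row by row gives the FAM $E_p$. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def friend_ability_vector(topic_counts: Sequence[int], total: int) -> List[float]:
    """FAV e_{p,p_j}: x^k = F^{a_k}_{p,p_j} / F_{p,p_j} (assumed ratio form)."""
    if total <= 0:
        return [0.0] * len(topic_counts)  # no communication with this friend
    return [count / total for count in topic_counts]


def friend_ability_matrix(topic_counts: Dict[int, Sequence[int]],
                          totals: Dict[int, int]) -> Dict[int, List[float]]:
    """FAM E_p as a mapping from friend id j to the FAV e_{p,p_j}."""
    return {j: friend_ability_vector(counts, totals[j]) for j, counts in topic_counts.items()}


# Friend 1: 10 interactions in the period, 5 on a_1 and 2 on a_3; friend 2: none.
fam = friend_ability_matrix({1: [5, 0, 2], 2: [0, 0, 0]}, {1: 10, 2: 0})
print(fam)  # {1: [0.5, 0.0, 0.2], 2: [0.0, 0.0, 0.0]}
```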
Based on formula (4) and algorithms 1 and 2, node $p$ can be reconstructed in the following form:
When $p$ wants to distribute a task, he needs only to select the task topics and set a weight for each topic. If the task is associated with a location, his friends are first filtered by location. Then, the TDSRC generates the CNs by algorithm 3.
The topics (abilities) of the task are set by node $p$, which must set two main parameters: the ASS and the weights of this subset. Assume that $ASS = \{a_{u_1}, a_{u_2}, \ldots, a_{u_p}\}$ with corresponding weights $\{w_{u_1}, w_{u_2}, \ldots, w_{u_p}\}$, that the number of nodes in the CNs is $\alpha$, and that $u_1, u_2, \ldots, u_p \in \{1, 2, \ldots, n\}$.
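Because algorithm 3 itself is not reproduced here, the following is only a sketch of one way to rank CNs: score each friend by the weighted sum of his FAV entries over the ASS and keep the top $\alpha$. The scoring rule, the tie-break by friend id, and the function name `select_candidate_nodes` are assumptions.

```python
from typing import Dict, List, Sequence


def select_candidate_nodes(fam: Dict[int, Sequence[float]],
                           ass: Sequence[int],
                           weights: Sequence[float],
                           alpha: int) -> List[int]:
    """Return up to alpha friend ids ranked by a weighted ability score.

    fam: friend id -> FAV, where index k holds the value for ability a_{k+1}.
    ass: 0-based indices of the abilities in the ASS (a_{u_1}, ..., a_{u_p}).
    weights: w_{u_1}, ..., w_{u_p}, one weight per entry of ass.
    """
    if len(ass) != len(weights):
        raise ValueError("ASS and weights must have the same length")
    scores = {j: sum(w * fav[k] for k, w in zip(ass, weights)) for j, fav in fam.items()}
    # Highest score first; ties go to the smaller friend id so results are deterministic.
    ranked = sorted(scores, key=lambda j: (-scores[j], j))
    return ranked[:alpha]


fam = {1: [0.9, 0.1], 2: [0.2, 0.8], 3: [0.5, 0.5]}
print(select_candidate_nodes(fam, ass=[0, 1], weights=[0.7, 0.3], alpha=2))  # [1, 3]
print(select_candidate_nodes(fam, ass=[1], weights=[1.0], alpha=3))          # [2, 3, 1]
```

Where a task is tied to a location, the location filter would be applied to `fam` before this call.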

Quick Task-Distribution Mode Based on Abilities Coverage.
According to algorithm 3, the CNs of $p$ can be determined, and $p$ can then push the task to the CNs. As shown in Figure 2, the social network of $p$ is enclosed by a red dotted line. The CNs of $p$ may be $\{p_1, \ldots, p_m\}$, and $p$ does not push the task to $\{p_3, p_4, p_5\}$, whose backgrounds are gray. A friend who receives the task can complete the task or redistribute it in his own social network in the same manner, and the process can be repeated until the task is completed.
According to the concept of "six degrees of separation," a task can reach anybody in the world after being forwarded six times [23][24][25][26]. At each step, a participant pushes the task to $\alpha$ friends in his social network (the value of $\alpha$ can be changed on demand). As a result, the distribution accuracy of the TDSRC is higher than that of random distribution, and friends are spared interference from irrelevant information.
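The arithmetic behind this fan-out can be checked with a short calculation. Under the simplifying assumption that no two participants share a friend (so the forwarding paths form a tree), $\alpha$ recipients per hop over $h$ hops reach at most $\sum_{i=1}^{h} \alpha^i$ people; with $\alpha = 10$ and six hops, that bound is 1,111,110. Real friend circles overlap, so the actual reach is lower. The helper name is hypothetical.

```python
def max_reach(alpha: int, hops: int) -> int:
    """Upper bound on recipients when each participant forwards to alpha new
    friends for `hops` rounds, assuming no shared friends (a pure tree)."""
    return sum(alpha ** i for i in range(1, hops + 1))


print(max_reach(10, 6))  # 1111110
```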

Framework and Process of the System.
The modules and flow of the distribution system are illustrated in Figure 3. Assume that P4 is the requester who wants to distribute a task. The main processing flow consists of the following steps:
Step 1. Requester P4 extracts the historical contents and records between his friends and himself.
Step 2. P4 statistically analyzes the contents and records, selects suitable abilities from the AS, and sets relevant weights to generate the task requirements.

Step 3. As the center, P4 reconstructs his social network and generates the triple, as shown in formula 7.
Step 4. P4 generates CNs using algorithm 3, and some best-matched friends in CNs are chosen as the solvers.
Step 5. e solvers iteratively undertake or redistribute the task.
Step 6. P4 evaluates the responses of the friends.
Step 7. P4 updates relevant data in his tables.
Step 8. Friends become the next requesters if they redistribute the task in their social networks, and these steps are repeated (as shown in Figure 2, second distribution).
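The iterative loop of Steps 4, 5, and 8 can be simulated as a breadth-first traversal. In this sketch, which is an assumption rather than the authors' implementation, each participant's friend scores for the task are already computed (for example, by a weighted sum over the ASS), each participant forwards the task to its $\alpha$ best-scoring friends who have not yet received it, and redistribution stops after a hop budget. The function name and the global `scores` dictionary are hypothetical and exist only for simulation; in the TDSRC each node would use only its own row.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def distribute(scores: Dict[str, Dict[str, float]], requester: str,
               alpha: int, max_hops: int) -> List[Tuple[str, str, int]]:
    """Simulate iterative redistribution and return (sender, receiver, hop) triples.

    scores[node][friend]: hypothetical task-match score that `node` assigns to `friend`.
    """
    received: Set[str] = {requester}
    queue: Deque[Tuple[str, int]] = deque([(requester, 0)])
    sent: List[Tuple[str, str, int]] = []
    while queue:
        node, hop = queue.popleft()
        if hop == max_hops:
            continue  # hop budget used up: this node does not redistribute
        friends = scores.get(node, {})
        # Each node ranks only its own friends; skip anyone who already has the task.
        ranked = sorted((f for f in friends if f not in received),
                        key=lambda f: (-friends[f], f))
        for friend in ranked[:alpha]:
            received.add(friend)
            sent.append((node, friend, hop + 1))
            queue.append((friend, hop + 1))
    return sent


scores = {
    "P4": {"P1": 0.9, "P2": 0.4, "P3": 0.7},
    "P1": {"P5": 0.8, "P6": 0.2},
    "P3": {"P7": 0.6},
}
for step in distribute(scores, "P4", alpha=2, max_hops=2):
    print(step)
```

With $\alpha = 2$ and a two-hop budget, P4 sends the task to P1 and P3, which forward it to P5, P6, and P7; P2 is skipped at the first hop. Marking a friend as served at send time means each person receives the task at most once, which is one way to limit interference.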

Simulation.
In recent years, WeChat has become the most popular social network in China.In 2017, the number of monthly active users reached 963 million, which is 20% more than the previous year [41].By the end of 2016, the WeChat public platforms published an average of 518 articles, each of which was read approximately 3603 times and won 17 praises [42].
Thus, WeChat has excellent transmission capacity. Regarding privacy protection, each WeChat user can view only his own contents and records through the WeChat system, which suits our system. The TDSRC simulates the process of information diffusion in a WeChat friends circle.

Input: communication times of p and his friends for different keywords;
Output: the ability value for p's friends;

Bozzon et al. [33] employed a perspective similar to ours: the authors selected the top-K experts in a social network who fulfilled the task needs, and all potential experts were regarded as one stable, whole resource set. However, the set of candidate experts cannot be built in our system, so the outcomes of the two methodologies are not directly comparable. Therefore, to validate the feasibility of our scheme, we used web crawler technology to collect data (e.g., task categories and time) for about 8 weeks from ZhuBaJie [43], a real and well-known crowdsourcing platform in China, and then simulated data based on them. Similar to [33], in which the experts' needs were classified into seven domains (namely, computer engineering; location; movies and TV; music; science; sport; and technology and video games), we roughly categorized the tasks into ten types by investigating ZhuBaJie. The task types are designated $A = \{1, 2, \ldots, 10\}$. Several keywords were extracted for every type, as shown in Table 2.
Assume that node $p$ has 100 friends, numbered from 1 to 100.
The data are sampled once every three months. Ten topics (abilities) exist, as shown in Table 2. The communication times between $p$ and each friend range from 0 to 300, and the interaction counts per ability follow a Poisson distribution. Several topics are randomly selected from the ten topics, and the FAM of $p$ is calculated and shown in Table 3; only 20 friends are included in the table because of length restrictions. The numbers in the table indicate the communication times with different friends on different topics in one sampling cycle. This table can also be denoted as $E_p$ (formula 6).
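A sketch of how such counts could be simulated with the Python standard library, which has no Poisson sampler, is shown below using Knuth's multiplication method. The per-topic rates, the seed, and the function names are hypothetical; the text does not state the Poisson parameters, and this sketch does not enforce the 0 to 300 range.

```python
import math
import random
from typing import List, Sequence


def poisson(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) sample with Knuth's method: multiply uniform draws
    until the product falls to e^-lam or below. Fine for the small lam used here."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_counts(n_friends: int, rates: Sequence[float], seed: int = 0) -> List[List[int]]:
    """One sampling period: row j holds p's interaction counts with friend j + 1 per topic."""
    rng = random.Random(seed)
    return [[poisson(rng, lam) for lam in rates] for _ in range(n_friends)]


# Hypothetical rates: topic 1 is discussed far more often than the other nine topics.
rates = [20.0] + [2.0] * 9
counts = simulate_counts(100, rates, seed=42)
print(counts[0])  # interaction counts with friend 1 for topics 1 to 10
```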

Abilities Discovery of Friends.
The data in Table 3 cover only one sampling period; we also counted the communication times over five sampling periods. The AVs of $p$ can be calculated by algorithm 1, and the results, normalized by formula 7, are shown in Table 4. Table 4 shows that columns 1 and 10 contain the largest values, which indicates that $p$ is good at (i.e., interested in) abilities 1 and 10. The differences between one sampling and five samplings are shown in Figure 4; only friends Nos. 14, 28, 42, 85, and 100 are randomly selected as examples.
As Figures 4 to 7 show, Figure 4 is similar to Figure 5, and Figure 6 is similar to Figure 7. We conclude that the number of interactions on topic 1 is large, whereas that on topics 2 and 3 is small, which implies that the AVs of $p$ are relatively stable and that $p$ likes topic 1 and is probably interested or skilled in it.

CN Selection.
When a requester intends to distribute a crowdsourcing task, he should select the ASS and its weights. Using algorithm 3, for every ability, $p$ can select the ten highest-priority friends (or another number, according to demand) to perform or redistribute a task. The CNs obtained by algorithm 3 in the experiment are listed in Table 5. If $p$ needs to release a crowdsourcing task on topic 1, he should send the task to his 92nd, 78th, 2nd, and 46th friends, and so on.
e experimental results with multiple topics/abilities are shown in Table 6.
The simulations reveal that the TDSRC can successfully count the communication times according to the AS and calculate the AVs and the FAM. These parameters can be updated simultaneously in every sampling period. For any task, the TDSRC can determine the most appropriate CNs by matching the ability demands against the friends' abilities. A CN can complete and redistribute the task in his social network, and all procedures can be iterated until the task constraints (e.g., time constraints) are violated.

Time Efficiency of Task Distribution.
The time efficiency of task distribution is important, mainly for delay-sensitive tasks. Therefore, we randomly selected four different types of tasks: Sports, Business, Public welfare, and Manufacturing (Nos. 1, 2, 9, and 10 in Table 2). We ran three simulation experiments, using the random distribution method, the full distribution method, and the TDSRC distribution method. In the experiments, the requester has 2,000 friends, and task distribution is considered successful once valid task execution results have been received from 50 friends. For both the random method and the TDSRC method, 200 friends were selected, whereas full distribution means that the task is distributed to all friends. We also assumed that a friend performs the task when the ability required by the task falls in the top 50% of the friend's abilities in the friends' abilities matrix (FAM). The time spent on the task is the average time spent by the 50 friends. The results for the different methods are shown in Figure 8.
As Figure 8 shows, the random strategy takes the longest time because it cannot accurately find the most appropriate participants. The results of the full strategy are almost the same as those of the TDSRC strategy, which shows that the TDSRC finds suitable task workers almost as effectively as the full strategy. However, the TDSRC selects only one-tenth as many friends as the full strategy (200 versus 2,000), which means it causes much less interference to unrelated friends. In addition, most task distribution is accompanied by some incentive, so the TDSRC strategy can save more cost than the full distribution strategy.
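The willingness rule assumed in this experiment can be made concrete. The sketch below reads "falls in the top 50%" as: ability $a_k$ ranks in the top half of the friend's own FAV (ties count in the friend's favor), and it counts how many selected friends would then perform the task. This reading and the function names are assumptions; the sketch does not reproduce the timing results in Figure 8.

```python
from typing import Dict, Sequence


def will_perform(fav: Sequence[float], k: int) -> bool:
    """Assumed reading of the rule: the friend performs a task on ability a_{k+1}
    when it ranks in the top half of the friend's own FAV (ties favor the friend)."""
    strictly_higher = sum(1 for value in fav if value > fav[k])
    return strictly_higher < len(fav) / 2


def count_willing(fam: Dict[int, Sequence[float]], selected: Sequence[int], k: int) -> int:
    """How many of the selected friends would perform a task on ability index k."""
    return sum(will_perform(fam[j], k) for j in selected)


fam = {1: [0.9, 0.1, 0.5, 0.3], 2: [0.1, 0.9, 0.2, 0.3], 3: [0.4, 0.4, 0.1, 0.1]}
print(count_willing(fam, [1, 2, 3], k=0))  # 2
```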

Conclusion and Future Work
Adequate qualified participation is one of the most crucial factors that determine whether a crowdsourcing system succeeds. We expand participant coverage in terms of location, attributes, background knowledge, social relations, and credibility. The TDSRC can dynamically and automatically discover participants' abilities from the routine communication between requesters and their friends and then reconstruct their social networks to facilitate task distribution.
This study is the first investigation of task distribution that leverages the trust chain and transmission capabilities implied in a friends circle. The TDSRC not only … The simulation results verify the effectiveness of the TDSRC. However, several issues warrant future investigation.

Time Factor of Keywords.
In this study, we use communication content without considering the time factor, which is significant (e.g., a keyword that appeared one month ago is more important than one that appeared six months ago). The TDSRC becomes more complex if the time factor is considered. A compromise is to set different weights for different sampling periods, so that more recent content carries more weight.
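One way to realize this compromise, sketched under assumptions not taken from the text, is an exponential decay: the most recent sampling period has weight 1, the one before it has weight $\gamma$, then $\gamma^2$, and so on, for a hypothetical decay factor $0 < \gamma \le 1$ (the `decay` parameter below). The function name is also hypothetical.

```python
from typing import List, Sequence


def decayed_counts(periods: Sequence[Sequence[int]], decay: float = 0.5) -> List[float]:
    """Combine per-period topic counts listed oldest first.

    The most recent period gets weight 1, the one before it `decay`, then decay**2,
    and so on; `decay` in (0, 1] is a hypothetical tuning parameter.
    """
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    combined = [0.0] * len(periods[0])
    for age, counts in enumerate(reversed(periods)):  # age 0 = most recent period
        weight = decay ** age
        for k, count in enumerate(counts):
            combined[k] += weight * count
    return combined


# Topic 1 was popular in the earlier period; topic 2 appears only in the latest one.
print(decayed_counts([[8, 0], [4, 2]], decay=0.5))  # [8.0, 2.0]
```

Setting `decay=1.0` recovers the unweighted sum used in this study.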

Weight Values of Friend Evaluation.
Many weights must be set in the TDSRC (e.g., in formulas 3 and 4), and different weights produce different results, so how to set them is a topic worthy of further discussion. In our system, we use default values, which can typically be set manually by the central node. In the future, we will attempt to employ machine-learning methods to set these weight values automatically.
6.3. Varied Interactive Data.
In this study, we considered only textual information. In reality, extratextual elements, such as voice, pictures, and emoji, are also popular in WeChat and play an increasingly important role in expressing emotions among friends. To take advantage of all information, AI technologies (e.g., speech recognition and image understanding) should be incorporated to extend the TDSRC. We plan to conduct extensive research in this area in the future.

where $w_z^j$ indicates the weight of task $z$ of friend $j$, and $w_z^j \in [0, 1]$.

4.1. Computing and Updating the AVs.
where $d$ represents the total number of samplings and $C_{p,p_j}^{a_k}$ denotes the communication times between $p$ and his friend $p_j$ for ability $a_k$. The AVs are updated once every sampling period, and $C_p^{a_k} \in [0, 1]$.
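The AV formula referenced here is not reproduced in this text. The sketch below assumes one simple form consistent with $C_p^{a_k} \in [0, 1]$: sum the communication counts for each ability over all $d$ sampling periods and all friends, then divide by the largest total so that the strongest topic gets 1. The normalization choice and the function name are assumptions.

```python
from typing import List, Sequence


def ability_values(samples: Sequence[Sequence[Sequence[int]]]) -> List[float]:
    """AVs C_p^{a_k} of node p from d sampling periods.

    samples[t][j][k]: communication times with friend j on ability a_{k+1}
    in period t. Assumed normalization: divide each ability's total by the
    largest total, so every value lies in [0, 1].
    """
    totals = [0] * len(samples[0][0])
    for period in samples:            # t = 1, ..., d
        for friend_counts in period:  # every friend p_j
            for k, count in enumerate(friend_counts):
                totals[k] += count
    peak = max(totals)
    if peak == 0:
        return [0.0] * len(totals)    # no communication at all: default value zero
    return [total / peak for total in totals]


# Two sampling periods (d = 2), two friends, three abilities.
samples = [
    [[4, 0, 1], [2, 1, 0]],
    [[3, 0, 2], [1, 1, 1]],
]
print(ability_values(samples))  # [1.0, 0.2, 0.4]
```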

Figure 2: Task releasing mode based on social relationship.

Figure 3: Crowdsourcing flow based on SRC in the social network.

Figure 8: Time efficiency under different distribution strategies.
The communication times between $p$ and his friend $p_j$ for ability $a_k$.

Table 2: Task topics and relevant keywords.

Table 4: Statistics of AVs for various sampling times (normalized).

Table 5: Selection results of CNs with single ability.

Table 6: CN selection results with multiple abilities.