Neurocomputing

Volume 97, 15 November 2012, Pages 390-397
Letters
Learning to blend vitality rankings from heterogeneous social networks

https://doi.org/10.1016/j.neucom.2012.06.024

Abstract

Heterogeneous social network services, such as Facebook and Twitter, have emerged as popular and often effective channels for Web users to capture updates from their friends. The explosion in popularity of these services, however, has created the problem of “information overload”. The problem is becoming more severe as more and more users engage in more than one social network simultaneously, each of which usually yields different friend connections and various sources of updates. It has therefore become necessary to perform effective information filtering to retrieve the information that is really attractive to Web users from each social network and to blend it into a unified ranking list. In this paper, we introduce the problem of blending vitality rankings from heterogeneous social networks, where vitality denotes all kinds of updates a user receives in various social networks. We propose a variety of content, user, and user correlation features for this task. Since vitalities from different social networks are likely to have different sets of features, we employ a divide-and-conquer strategy in order to fully exploit all available features for the vitalities from each social network. Our experimental results, obtained from a large-scale evaluation over two popular social networks, demonstrate the effectiveness of our method in placing vitalities that really interest users higher in the blended ranking list. We complement our results with a thorough investigation of feature importance and model selection with respect to both the blending strategy and the ranking for each social network.

Introduction

Social network services on the Web are emerging as a new medium of communication: users can compose and broadcast messages of various types, such as text, links, images, and videos, to their friends through the social network portal. In contrast to traditional Web portals that publish well-formed and static Web content, social network services, such as Facebook and Twitter, feature much more lightweight and real-time information, which usually includes status updates of friends, emerging news, and other content that is interesting to the publisher and his or her friends. As a fast and convenient channel for information sharing, social network services have gained explosive popularity among Web users. For example, Facebook already had more than 500 million registered users at the end of 2010, and Twitter claimed over 108 million registered users as of April 2010.

This explosion in popularity, however, leads to the problem of “information overload”: the sheer amount of information received by ordinary users can easily exceed their processing capabilities. For instance, it is estimated that active Twitter users received over 300 tweets per day on average as of early 2010. The problem has become more severe because a growing body of users actively engage in more than one popular social network service simultaneously. Thus, to let users surf the Web more efficiently, it has become necessary to introduce an effective information filtering mechanism that identifies the information most interesting to Web users from each social network and, going one step further, to build a blending method for aggregating interesting information from heterogeneous social networks.

Several previous works have studied information retrieval in the context of social networks, such as [1], [2]. However, most of them paid attention to only a single social network, without considering how to blend various types of content from different social networks into one unified ranking. This is inherently a challenging problem. While many social networks look similar on the surface, each individual user has different friend connections in each network and brings quite different attitudes toward obtaining information. In particular, according to a previous study [3], users on Twitter tend to connect with people they do not know and are more interested in breaking news or new discoveries, while users on Facebook usually connect with people they know and are more apt to post and see local events and issues that need feedback. Therefore, it is very hard to normalize users' interests in information updates from heterogeneous social networks, which makes it even harder to blend those various types of information into a unified ranking framework.

To address these problems, we propose a new learning framework for blending vitality rankings from heterogeneous social networks, where we use “vitality” to represent all the various types of updates users receive from different social networks. In particular, we first generalize several types of features, describing the content of the vitality, the characteristics of the vitality viewer, and the correlation between the viewer and the vitality poster, as signals of the viewer's interest in the vitality. However, since different vitalities are not generated by the same social network service, a number of features that are good indicators of a user's interest in a vitality under one social network service might be invalid in another. For example, the “like” behavior in Facebook is a strong signal that the user likes to see the vitality, but Twitter does not include this feature. In this paper, we address this challenge by employing a divide-and-conquer strategy, which fully exploits the features available for each individual vitality. With this strategy, we can apply learning-to-rank algorithms to obtain calibrated and comparable ranking scores for vitalities from different social networks, which then directly leads to a unified ranking list. The results of a large-scale evaluation over two popular social networks demonstrate the effectiveness of our method for identifying vitalities that interest users from heterogeneous social networks and blending them into a unified ranking list. We also complement our results with a thorough investigation of feature importance and model selection for this blending framework.
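The divide-and-conquer idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Vitality` class, the feature names, and the hand-set linear weights in `WEIGHTS` are hypothetical stand-ins for learned ranking models, and the sketch simply assumes that the per-network scores are already calibrated to a comparable scale.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Vitality:
    """One update shown to a viewer (hypothetical structure for illustration)."""
    network: str                # e.g. "facebook" or "twitter"
    vid: str                    # identifier of the update
    features: Dict[str, float]  # only the features this network provides


# Hypothetical per-network linear weights standing in for learned rankers.
# Each network has its own feature set; "likes" exists only for Facebook.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "facebook": {"text_match": 0.5, "poster_affinity": 0.3, "likes": 0.2},
    "twitter": {"text_match": 0.6, "poster_affinity": 0.4},
}


def score(v: Vitality) -> float:
    """Score a vitality with the model of its own network (the 'divide' step)."""
    weights = WEIGHTS[v.network]
    return sum(w * v.features.get(name, 0.0) for name, w in weights.items())


def blend(vitalities: List[Vitality]) -> List[Vitality]:
    """Merge vitalities from all networks into one list, best score first."""
    return sorted(vitalities, key=score, reverse=True)


updates = [
    Vitality("facebook", "fb1", {"text_match": 0.2, "poster_affinity": 0.5, "likes": 1.0}),
    Vitality("twitter", "tw1", {"text_match": 0.9, "poster_affinity": 0.5}),
    Vitality("twitter", "tw2", {"text_match": 0.1, "poster_affinity": 0.1}),
]
blended_ids = [v.vid for v in blend(updates)]
```

Each network keeps its own feature set and scorer, so a Facebook-only signal never has to be imputed for a tweet. The blended list is only meaningful if the scores from the different models are comparable, which the sketch assumes rather than enforces.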

The main contributions of this paper can be summarized as follows: (1) formalizing the problem of blending vitality rankings; (2) extracting several types of features that indicate users' interest in vitalities; (3) proposing a divide-and-conquer strategy for ranking and blending vitalities from heterogeneous social networks.

Section snippets

Related work

The recent growth and popularity of online social network services such as Facebook and Twitter has led to a surge of interest in the research community. Much of this work has focused on analyzing network structure and growth patterns. For example, [4] studied the evolution of network structure and group membership in MySpace. Java et al. [5] studied the topological properties of the social network formed by Twitter users. In addition, [6] analyzed the relationship strength in the social network of

Problem statement

We now formalize the problem of blending vitality rankings from heterogeneous social networks. Currently, there are several popular social network services on the Web, which are denoted as SN1, SN2, …, SNk. These social networks are inherently heterogeneous in the sense that each of them consists of its own set of users and the corresponding network structure. Recently, an increasing number of users actively engage in more than one social network. At a certain time point,

Blending vitality rankings

We now introduce our learning-based approach for blending vitality rankings from heterogeneous social networks. We focus on the specific characteristics of two heterogeneous social networks, Facebook and Twitter. Generally, we follow a learning-to-rank framework. Given a user u and the set of vitalities V(u) she receives from heterogeneous social networks at a specific time point, we first derive features for each user-vitality tuple ⟨u, v⟩ (v ∈ V(u)), (e.g. text of the vitality, user profile,

Datasets

User set: To collect vitalities from Facebook and Twitter, we first collect a set of users who have registered on both Facebook and Twitter. This user set is obtained from a commercial online portal service that allows users to integrate their Facebook and Twitter accounts into a single portal. All users have been anonymized, and each user is represented by an ID without any meaning. From the whole set of users, we select a subset by filtering those who have no behavior in consecutive

Blending vitality rankings

In this experiment, we train all ranking models on the training dataset, with parameter tuning on the validation dataset. Then, we test the respective results on the remaining held-out test dataset.

Fig. 2(a) and (b) illustrate the Precision at K of all compared methods listed in Table 2, where we apply GBDT in Fig. 2(a) and ListNet in Fig. 2(b). We also report the MAP and MRR scores for all compared methods in Table 3. From these figures and tables, we can observe that
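For reference, the three metrics named above (Precision at K, MAP, and MRR) can be computed as in the following minimal Python sketch. It assumes binary relevance labels listed in ranked order for each user, and it normalizes average precision by the number of relevant items found in the list. These are common conventions, not details confirmed by this text, and the function names are illustrative.

```python
from typing import List, Sequence


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (labels are 0/1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(rels[:k]) / k


def average_precision(rels: Sequence[int]) -> float:
    """Mean of precision values taken at the rank of each relevant item."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(rels: Sequence[int]) -> float:
    """Inverse rank of the first relevant item, or 0.0 if there is none."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean(values: List[float]) -> float:
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# One ranked list of 0/1 relevance labels per user (toy data).
rankings = [[1, 0, 1, 0], [0, 1]]
map_score = mean([average_precision(r) for r in rankings])
mrr_score = mean([reciprocal_rank(r) for r in rankings])
p_at_2 = mean([precision_at_k(r, 2) for r in rankings])
```

MAP and MRR are the per-user averages of `average_precision` and `reciprocal_rank`. Note that `precision_at_k` divides by k even when a list holds fewer than k items, which is one of several conventions in use.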

Conclusion and future work

In this paper, we presented, to the best of our knowledge, an early attempt at blending vitality rankings from heterogeneous social networks. We formalized the problem of blending vitality rankings from heterogeneous social networks, and proposed a variety of content, user, and user correlation features for this task. Due to the heterogeneity of vitalities, we employ a divide-and-conquer strategy in order to fully exploit all available features for vitalities from each social network,

References (18)

  • A. Dong, R. Zhang, P. Kolari, J. Bai, F. Diaz, Y. Chang, Z. Zheng, Time is of essence: improving recency ranking using...
  • J. Weng, E. Lim, J. Jiang, Q. He, Twitterrank: finding topic-sensitive influential twitterers, in: Proceedings of WSDM,...
  • S. Gordhamer, When do you use twitter versus facebook? 〈http://mashable.com/2009/08/01/facebook-vs-twitter/〉,...
  • L. Backstrom, D. Huttenlocher, J. Kleinberg, X. Lan, Group formation in large social networks: membership, growth, and...
  • A. Java, X. Song, T. Finin, B. Tseng, Why we twitter: understanding microblogging usage and communities, in:...
  • R. Xiang, J. Neville, M. Rogati, Modeling relationship strength in online social networks, in: Proceedings of WWW,...
  • William W. Cohen et al., Learning to order things, J. Artif. Int. Res. (1999)
  • R. Herbrich, T. Graepel, K. Obermayer, Large margin rank boundaries for ordinal regression, in: Advances in Large...
  • C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. Hullender, Learning to rank using gradient...
There are more references available in the full text version of this article.

Jiang Bian is a Scientist at Yahoo! Labs. He received the Ph.D. degree in Computer Science from Georgia Institute of Technology, U.S., in 2010 and the B.S. degree from Peking University, China, in 2006. His research focuses on Web search, recommendation, and social networks. He has authored dozens of papers in well-recognized international conferences and highly regarded journals. He has also served as a PC member and peer reviewer for several conferences and journals. He has extensive industrial experience, having worked or interned at several IT companies, including Yahoo!, Microsoft, and Facebook. More information about him can be found at https://sites.google.com/site/jiangbianhome/.

Yi Chang joined Yahoo! in 2006 and manages the ranking science team at Yahoo! Labs, working on multiple relevance ranking and recommendation projects. Prior to this, he was a Ph.D. student at Carnegie Mellon University. His research interests include applied machine learning, information retrieval, natural language processing, and text mining. Yi Chang serves as an associate editor of Neurocomputing and Pattern Recognition Letters, and he is the author or coauthor of more than 40 refereed journal and conference publications.

Yun Fu received his B.S. and M.Eng. in Computer Science from Nankai University, China, in 1995 and 1998 respectively and his M.S. and Ph.D. in Computer Science from Duke University, USA, in 2001 and 2004 respectively. He is currently a principal software engineer at Yahoo! Inc. His research interests include distributed systems, operating systems, computer networks, databases, and data mining.

Wen-Yen Chen joined Yahoo! in November 2009, mainly working on research, new initiatives, and data mining and analysis. Prior to this, he received his Ph.D. degree from the University of California, Santa Barbara, in 2009, and B.S./M.S. degrees from National Chiao Tung University in 2004. He interned at IBM, NEC Labs America, and Google Research. His research interests include data mining, machine learning, parallel computing, and their applications to social networks. He is a recipient of the Research Creativity Award from the National Science Council of Taiwan and a Graduate Fellowship from the University of California. More information about him can be found at http://alumni.cs.ucsb.edu/~wychen.
