Abstract
Community search (CS) enables personalized community discovery and has a wide range of emerging applications, such as organizing social events and recommending friends. While CS has been studied extensively for conventional homogeneous networks, the problem for heterogeneous information networks (HINs) has received attention only recently. Existing studies, however, suffer from several limitations; for example, they require users to specify either a meta-path or relational constraints, which is challenging for users who are not familiar with HINs. To address these limitations, in this paper we systematically study the problem of CS over large star-schema HINs without asking users to specify these constraints: given a set Q of query vertices of the same type, find the most likely community in a star-schema HIN that contains Q, in which all vertices have the same type and are closely related. To capture the close relationships among the vertices of the community, we employ the meta-path-based core model and maximize the number of shared meta-paths such that each of them yields a cohesive core containing Q. To enable efficient CS, we first develop online algorithms that exploit the anti-monotonicity property of shared meta-paths. We further improve efficiency by proposing a novel index and an efficient index-based algorithm with pruning techniques. Extensive experiments on four large real star-schema HINs show that our solutions are effective and efficient for searching communities, and that the index-based algorithm is much faster than the online algorithms.
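The idea of maximizing shared meta-paths under anti-monotonicity can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the toy bibliographic HIN, the function names (`build_neighbors`, `shared_core`, `max_shared_meta_paths`), and the simplified core definition (every member has at least k neighbors inside the set under every chosen meta-path, with the connectivity check omitted) are all assumptions made for illustration. The level-wise search relies only on the property that if a set of meta-paths yields a core containing Q, so does every subset of it.

```python
from collections import defaultdict
from itertools import combinations

def build_neighbors(papers, via):
    """Author adjacency under A-P-A (via=None) or A-P-X-P-A (via='V', 'T', ...)."""
    groups = defaultdict(set)
    for pid, links in papers.items():
        # Authors are grouped by shared paper (APA) or shared X-vertex (APXPA).
        keys = [pid] if via is None else links.get(via, ())
        for key in keys:
            groups[key] |= links["A"]
    adj = defaultdict(set)
    for members in groups.values():
        for u in members:
            adj[u] |= members - {u}
    return adj

def shared_core(adjs, query, k):
    """Peel vertices until every survivor has >= k neighbors under every adjacency.

    Returns the surviving set if it contains all query vertices, else an empty set.
    Simplification (assumption): connectivity to the query is not checked.
    """
    alive = set().union(*(set(a) for a in adjs))
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if any(len(adj.get(v, set()) & alive) < k for adj in adjs):
                alive.discard(v)
                changed = True
    return alive if query <= alive else set()

def max_shared_meta_paths(papers, meta_paths, query, k):
    """Level-wise (Apriori-style) search for the largest feasible meta-path sets."""
    adj = {name: build_neighbors(papers, via) for name, via in meta_paths.items()}
    level = {}
    for name in sorted(adj):
        core = shared_core([adj[name]], query, k)
        if core:
            level[(name,)] = core
    best = {}
    while level:
        best = level
        size = len(next(iter(level))) + 1
        names = sorted({n for combo in level for n in combo})
        nxt = {}
        for combo in combinations(names, size):
            # Anti-monotonicity: a set can only be feasible if all its subsets are.
            if all(sub in level for sub in combinations(combo, size - 1)):
                core = shared_core([adj[n] for n in combo], query, k)
                if core:
                    nxt[combo] = core
        level = nxt
    return best

# Hypothetical toy star-schema HIN: papers are hubs linked to authors (A),
# venues (V), and topics (T).
papers = {
    "p1": {"A": {"a1", "a2", "a3"}, "V": {"v1"}, "T": {"t1"}},
    "p2": {"A": {"a1", "a2"}, "V": {"v1"}, "T": {"t2"}},
    "p3": {"A": {"a3", "a4"}, "V": {"v2"}, "T": {"t1"}},
    "p4": {"A": {"a4", "a5"}, "V": {"v2"}, "T": {"t2"}},
}
meta_paths = {"APA": None, "APVPA": "V", "APTPA": "T"}
result = max_shared_meta_paths(papers, meta_paths, {"a1", "a5"}, k=2)
```

In this toy input, `result` maps each largest feasible tuple of meta-path names to the author set it yields. The candidate-generation step is where anti-monotonicity saves work: a combination is evaluated only when all of its smaller subsets were already found feasible.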