research-article
Artifacts Available / v1.1

Effective community search over large star-schema heterogeneous information networks

Published: 01 July 2022

Abstract

Community search (CS) enables personalized community discovery and has found a wide spectrum of emerging applications, such as setting up social events and recommending friends. While CS has been extensively studied for conventional homogeneous networks, the problem for heterogeneous information networks (HINs) has received attention only recently. However, existing studies suffer from several limitations; for example, they require users to specify either a meta-path or relational constraints, which is challenging for users who are not familiar with HINs. To address these limitations, in this paper we systematically study the problem of CS over large star-schema HINs without asking users to specify these constraints. That is, given a set Q of query vertices of the same type, we find the most likely community containing Q in a star-schema HIN, in which all vertices have the same type and are closely related. To capture the close relationships among the vertices of the community, we employ the meta-path-based core model and maximize the number of shared meta-paths such that each of them results in a cohesive core containing Q. To enable efficient CS, we first develop online algorithms that exploit the anti-monotonicity property of shared meta-paths. We further boost efficiency by proposing a novel index and an efficient index-based algorithm with elegant pruning techniques. Extensive experiments on four large real star-schema HINs show that our solutions are effective and efficient for searching communities, and that the index-based algorithm is much faster than the online algorithms.
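The idea of maximizing shared meta-paths can be illustrated with a minimal sketch. It is a simplified reading of the abstract, not the paper's algorithms. It assumes that each meta-path has already been reduced to a neighbor relation among vertices of the query type, that a core under a set of meta-paths is the largest vertex set in which every vertex keeps at least k neighbors under each meta-path, and that connectivity is ignored. The names `shared_core`, `max_shared_paths`, and the toy meta-path labels `APA`, `APVPA`, and `APTPA` are hypothetical. Under this reading, adding a meta-path can only shrink the shared core, so once a set of meta-paths fails to cover Q, every superset fails too, which permits a level-wise search.

```python
from itertools import combinations


def undirected(edges):
    """Build a symmetric adjacency map from a list of vertex pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shared_core(vertices, neighbors, paths, query, k):
    """Largest vertex set in which every vertex has at least k neighbors
    under each meta-path in `paths`; returns set() if it misses a query
    vertex. Connectivity is not enforced (simplifying assumption)."""
    alive = set(vertices)
    changed = True
    while changed:  # peel until no vertex violates the degree constraint
        changed = False
        for v in list(alive):
            if any(len(neighbors[p].get(v, set()) & alive) < k for p in paths):
                alive.discard(v)
                changed = True
    return alive if set(query) <= alive else set()


def max_shared_paths(vertices, neighbors, query, k):
    """Level-wise (Apriori-style) search for a largest set of meta-paths
    whose shared core contains the query. Relies on anti-monotonicity:
    a set can only be valid if all of its subsets are valid."""
    level = sorted(
        (frozenset([p]) for p in neighbors
         if shared_core(vertices, neighbors, [p], query, k)),
        key=sorted,
    )
    best = frozenset()
    while level:
        best = level[0]
        valid = set(level)
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = sorted(
            (c for c in candidates
             if all(c - {p} in valid for p in c)  # prune by subsets
             and shared_core(vertices, neighbors, c, query, k)),
            key=sorted,
        )
    return best


# Hypothetical toy data: four authors and three precomputed meta-path relations.
VERTICES = {"a", "b", "c", "d"}
NEIGHBORS = {
    "APA": undirected([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]),
    "APVPA": undirected([("a", "b"), ("a", "c"), ("a", "d"),
                         ("b", "c"), ("b", "d"), ("c", "d")]),
    "APTPA": undirected([("a", "b"), ("c", "d")]),
}

if __name__ == "__main__":
    paths = max_shared_paths(VERTICES, NEIGHBORS, {"a"}, k=2)
    print(sorted(paths))  # ['APA', 'APVPA']
    print(sorted(shared_core(VERTICES, NEIGHBORS, paths, {"a"}, 2)))  # ['a', 'b', 'c']
```

In this toy instance, `APTPA` alone yields no 2-core containing `a`, so no candidate set including it is ever tested. The peeling loop is deliberately naive; an efficient implementation would maintain degrees incrementally rather than recompute them.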

