Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applications

Bu, Fanchen; Lee, Geon; Shin, Kijung

doi:10.1007/s10618-023-00956-2

Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applications

Published: 08 August 2023

Volume 37, pages 2389–2437, (2023)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

320 Accesses
1 Citation
2 Altmetric
Explore all metrics

Abstract

Hypergraphs are a powerful abstraction for modeling high-order relations, which are ubiquitous in many fields. A hypergraph consists of nodes and hyperedges (i.e., subsets of nodes); and there have been a number of attempts to extend the notion of \({\varvec{k}}\)-cores, which proved useful with numerous applications for pairwise graphs, to hypergraphs. However, the previous extensions are based on an unrealistic assumption that hyperedges are fragile, i.e., a high-order relation becomes obsolete as soon as a single member leaves it.In this work, we propose a new substructure model, called \({\varvec{(k,t)}}\)-hypercore, based on the assumption that high-order relations remain as long as at least t fraction of the members remains. Specifically, it is defined as the maximal subhypergraph where (1) every node is contained in at least \({\varvec{k}}\) hyperedges in it and (2) at least \({\varvec{t}}\) fraction of the nodes remain in every hyperedge. We first prove that, given \({\varvec{t}}\) (or \({\varvec{k}}\)), finding the \({\varvec{(k,t)}}\)-hypercore for every possible \({\varvec{k}}\) (or \({\varvec{t}}\)) can be computed in time linear w.r.t the sum of the sizes of hyperedges. Then, we demonstrate that real-world hypergraphs from the same domain share similar \({\varvec{(k,t)}}\)-hypercore structures, which capture different perspectives depending on \({\varvec{t}}\). Lastly, we show the successful applications of our model in identifying influential nodes, dense substructures, and vulnerability in hypergraphs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Efficient Truss Computation for Large Hypergraphs

Improving the core resilience of real-world hypergraphs

Article 09 August 2023

Generalized core maintenance of dynamic bipartite graphs

Article 26 October 2021

Notes

A multiset is a set al.lowing duplicate elements.
In this work, the maximal subgraph (subhypergraph) satisfying some conditions means that every other graph (hypergraph) satisfying such conditions is a subgraph (subhypergraph) of the maximal one.
Similar to clique expansion, we can also have weighted star expansion, which is, however, not used in this work.
We assume that the input hypergraph is in the memory and thus do not count the complexity of loading the hypergraph, which is \(O(\sum _{e \in E} |e|)\).
Recall that \(\ell\)-hypercoreness with \(\ell = 2\) is include in t-hypercoreness with \(t = 0\). For each dataset, we apply min-max normalization to all the possible \(\ell\) values with \(\ell \ge 3\) so that t-hypercoreness and \(\ell\)-hypercoreness can fit in the same x-axis with the range [0, 1].
The average performance over five independent trials is reported.
We count \(\ell\)-hypercoreness with each \(\ell\) value as a separate method (\(\ell = 2\) is not counted since it is already included in the concept of t-hypercoreness with \(t = 0\)).
Simplicial complexes can be seen as a special class of hypergraphs.
The case \(\ell = 2\) is included in the proposed concept of t-hypercoreness with \(t = 0\).

References

Adamic LA, Lukose RM, Puniyani AR et al. (2001) Search in power-law networks. Phys Rev E 64(4):046–135
Article Google Scholar
Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47
Article MathSciNet MATH Google Scholar
Alvarez-Hamelin JI, Dall’Asta L, Barrat A, et al. (2006) Large scale networks fingerprinting and visualization using the k-core decomposition. In: NeurIPS
Alvarez-Hamelin JI, Dall’Asta L, Barrat A et al. (2008) K-core decomposition of internet graphs: hierarchies, self-similarity and measurement biases. Netw Heterog Media 3(2):371
Article MathSciNet MATH Google Scholar
Antelmi A, Cordasco G, Spagnuolo C et al. (2021) Social influence maximization in hypergraphs. Entropy 23(7):796
Article MathSciNet Google Scholar
Arafat NA, Khan A, Rai AK, et al. (2023) Neighborhood-based hypergraph core decomposition. PVLDB 16
Arya D, Gupta DK, Rudinac S, et al. (2020) Hypersage: Generalizing inductive representation learning on hypergraphs. arXiv preprint arXiv:2010.04558
Bai S, Zhang F, Torr PH (2021) Hypergraph convolution and hypergraph attention. Pattern Recogn 110(107):637
Google Scholar
Batagelj V, Zaversnik M (2003) An \(o(m)\) algorithm for cores decomposition of networks. In: arXiv
Benson AR (2019) Three hypergraph eigenvector centralities. SIAM J Math Data Sci 1(2):293–312
Article MathSciNet MATH Google Scholar
Benson AR, Abebe R, Schaub MT et al. (2018) Simplicial closure and higher-order link prediction. PNAS 115(48):E11221–E11230
Article Google Scholar
Benson AR, Kumar R, Tomkins A (2018b) Sequences of sets. In: KDD
Blanco R, Lioma C (2012) Graph-based term weighting for information retrieval. Inf Retr 15(1):54–92
Article Google Scholar
Bodó Á, Katona GY, Simon PL (2016) Sis epidemic propagation on hypergraphs. Bull Math Biol 78(4):713–735
Article MathSciNet MATH Google Scholar
Bonacich P, Lloyd P (2001) Eigenvector-like measures of centrality for asymmetric relations. Social Netw 23(3):191–201
Article Google Scholar
Bonchi F, Khan A, Severini L (2019) Distance-generalized core decomposition. In: SIGMOD
Bu F, Lee G, Shin K (2023) Code, datasets, and supplementary materials. https://github.com/bokveizen/non-fragile-hypercore
Chein M, Mugnier ML (2008) Graph-based knowledge representation: computational foundations of conceptual graphs. Springer
MATH Google Scholar
Chen Z, Yuan L, Han L, et al. (2021) Higher-order truss decomposition in graphs. In: TKDE
Chien E, Pan C, Peng J, et al. (2021) You are allset: a multiset function framework for hypergraph neural networks. arXiv preprint arXiv:2106.13264
Corominas-Murtra B, Fuchs B, Thurner S (2014) Detection of the elite structure in a virtual multiplex social system by means of a generalised k-core. PLoS ONE 9(12):e11,2606
Article Google Scholar
Cui H, Lu Z, Li P, et al. (2022) On positional and structural node features for graph neural networks on non-attributed graphs. In: CIKM
Dai Q, Li RH, Qin L, et al. (2021) Scaling up distance-generalized core decomposition. In: CIKM
Debnath S, Ganguly N, Mitra P (2008) Feature weighting in content based recommendation system using social network analysis. In: WWW
Do MT, Yoon Se, Hooi B, et al. (2020) Structural patterns and generative models of real-world hypergraphs. In: KDD
Feng Y, You H, Zhang Z, et al. (2019) Hypergraph neural networks. In: AAAI
Gabert K, Pinar A, Çatalyürek ÜV (2021a) Shared-memory scalable k-core maintenance on dynamic graphs and hypergraphs. In: IPDPSW
Gabert K, Pinar A, Çatalyürek ÜV (2021b) A unifying framework to identify dense subgraphs on streams: Graph nuclei to hypergraph cores. In: WSDM
Gao Y, Feng Y, Ji S, et al. (2022) Hgnn\(^{+}\): General hypergraph neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence
Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp 855–864
Han Z, Zheng X, Chen C, et al. (2023) Intra and inter domain hypergraph convolutional network for cross-domain recommendation. In: WWW
He T, Ong YS, Bai L (2021) Learning conjoint attentions for graph neural nets. In: NeurIPS
Hua QS, Zhang X, Jin H et al. (2023) Revisiting core maintenance for dynamic hypergraphs. IEEE Trans Parallel Distrib Syst 34:981–994
Article Google Scholar
Huang J, Yang J (2021) Unignn: a unified framework for graph and hypergraph neural networks. arXiv preprint arXiv:2105.00956
Jiang J, Wei Y, Feng Y, et al. (2019) Dynamic hypergraph neural networks. In: IJCAI
Kim H, Ko J, Bu F, et al. (2023) Characterization of simplicial complexes by counting simplets beyond four nodes. In: WWW
Kim J, Oh S, Cho S, et al. (2022) Equivariant hypergraph neural networks. In: ECCV
Kitsak M, Gallos LK, Havlin S et al. (2010) Identification of influential spreaders in complex networks. Nat Phys 6(11):888–893
Article Google Scholar
Ko J, Kook Y, Shin K (2022) Growth patterns and models of real-world hypergraphs. KAIS 64(11):2883–2920
Google Scholar
Konstantinova EV, Skorobogatov VA (2001) Application of hypergraph theory in chemistry. Discret Math 235(1–3):365–383
Article MathSciNet MATH Google Scholar
Lee D, Shin K (2023) I’m me, we’re us, and i’m us: Tri-directional contrastive learning on hypergraphs. In: AAAI
Lee G, Shin K (2021) Thyme+: Temporal hypergraph motifs and fast algorithms for exact counting. In: ICDM
Lee G, Ko J, Shin K (2020) Hypergraph motifs: concepts, algorithms, and discoveries. PVLDB 13(11):2256–2269
Google Scholar
Lee G, Choe M, Shin K (2021) How do hyperedges overlap in real-world hypergraphs? - patterns, measures, and generators. In: WWW
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: Densification and shrinking diameters. TKDD 1(1):2–es
Li P, Wang H, Li K, et al. (2023) Influence without authority: Maximizing information coverage in hypergraphs. In: SDM
Liao X, Xu Y, Ling H (2021) Hypergraph neural networks for hypergraph matching. In: ICCV
Limnios S, Dasoulas G, Thilikos DM, et al. (2021) Hcore-init: Neural network initialization based on graph degeneracy. In: ICPR
Liu B, Yuan L, Lin X et al. (2020) Efficient (\(\alpha\), \(\beta\))-core computation in bipartite graphs. VLDB J 29(5):1075–1099
Article Google Scholar
Lotito QF, Musciotto F, Montresor A et al. (2022) Higher-order motif analysis in hypergraphs. Commun Phys 5(1):79
Article Google Scholar
Lu Z, Zhu Y, Zhong M, et al. (2022) On time-optimal (k, p)-core community search in dynamic graphs. In: ICDE
Luo F, Li B, Wan XF, et al. (2009) Core and periphery structures in protein interaction networks. In: BMC bioinformatics
Luo Q, Yu D, Cai Z, et al. (2021) Hypercore maintenance in dynamic hypergraphs. In: ICDE
Luo Q, Yu D, Cai Z et al. (2022) Toward maintenance of hypercores in large-scale dynamic hypergraphs. VLDB J 32:1–18
Google Scholar
Malliaros FD, Giatsidis C, Papadopoulos AN et al. (2020) The core decomposition of networks: theory, algorithms and applications. VLDB J 29:61–92
Article Google Scholar
Mastrandrea R, Fournet J, Barrat A (2015) Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PloS one 10(9):e0136497
Article Google Scholar
McGlohon M, Akoglu L, Faloutsos C (2008) Weighted graphs and disconnected components: patterns and a generator. In: KDD
Mihalcea R, Radev D (2011) Graph-based Natural Language Processing and Information Retrieval. Cambridge University Press
Book MATH Google Scholar
Peng C, Kolda TG, Pinar A (2014) Accelerating community detection by using k-core subgraphs. In: arXiv
Peng Y, Zhang Y, Zhang W, et al. (2018) Efficient probabilistic k-core computation on uncertain graphs. In: ICDE
Preti G, De Francisci Morales G, Bonchi F (2021) Strud: Truss decomposition of simplicial complexes. In: WWW
Qu C, Tao M, Yuan R (2018) A hypergraph-based blockchain model and application in internet of things-enabled smart homes. Sensors 18(9):2784
Article Google Scholar
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Article Google Scholar
Rossi MEG, Malliaros FD, Vazirgiannis M (2015) Spread it good, spread it fast: Identification of influential nodes in social networks. In: WWW
Sarıyüce AE, Pinar A (2018) Peeling bipartite networks for dense subgraph discovery. In: WSDM
Seidman SB (1983) Network structure and minimum degree. Social Netw 5(3):269–287
Article MathSciNet Google Scholar
Shin K, Eliassi-Rad T, Faloutsos C (2018) Patterns and anomalies in k-cores of real-world graphs with applications. KAIS 54(3):677–710
Google Scholar
Shin K, Hooi B, Faloutsos C (2018) Fast, accurate, and flexible algorithms for dense subtensor mining. TKDD 12(3):1–30
Article Google Scholar
Silva NB, Tsang R, Cavalcanti GD, et al. (2010) A graph-based friend recommendation system using genetic algorithm. In: CEC
Sinha A, Shen Z, Song Y, et al. (2015) An overview of microsoft academic service (mas) and applications. In: WWW
Sun B, Chan THH, Sozio M (2020) Fully dynamic approximate k-core decomposition in hypergraphs. TKDD 14(4):1–21
Article Google Scholar
Torres L, Blevins AS, Bassett DS et al. (2021) The why, how, and when of representations for complex systems. SIAM Rev 63:435–485
Article MathSciNet MATH Google Scholar
Tudisco F, Higham DJ (2021) Node and edge nonlinear eigenvector centrality for hypergraphs. Commun Phys 4(1):1–10
Google Scholar
Victor F, Akcora CG, Gel YR, et al. (2021) Alphacore: data depth based core decomposition. In: KDD
Vogiatzis D (2013) Influence study on hyper-graphs. In: AAAI Symposia
Wang K, Cao X, Lin X, et al. (2018) Efficient computing of radius-bounded k-cores. In: ICDE
Watts DJ, Strogatz SH (1998) Collective dynamics of small-world networks. Nature 393(6684):440–442
Article MATH Google Scholar
Wood CI, Hicks IV (2015) The minimal k-core problem for modeling k-assemblies. J Math Neurosci 5(1):1–19
Article MathSciNet MATH Google Scholar
Wu T, Ling Q (2023) Self-supervised heterogeneous hypergraph network for knowledge tracing. Inf Sci 624:200–216
Article Google Scholar
Xia L, Huang C, Xu Y, et al. (2022) Hypergraph contrastive collaborative filtering. In: SIGIR
Xie M, Zhan X, Liu C, et al. (2023) An efficient adaptive degree-based heuristic algorithm for influence maximization in hypergraphs. Inf Process Manage 60(2):103161
Article Google Scholar
Yang C, Wang R, Yao S, et al. (2022) Semi-supervised hypergraph node classification on hypergraph line expansion. In: CIKM
Yin H, Benson AR, Leskovec J, et al. (2017) Local higher-order graph clustering. In: KDD
Zhang C, Zhang F, Zhang W, et al. (2020) Exploring finer granularity within the cores: efficient (k, p)-core computation. In: ICDE
Zhang F, Zhang Y, Qin L, et al. (2017a) Finding critical users for social network engagement: the collapsed k-core problem. In: AAAI
Zhang F, Zhang Y, Qin L, et al. (2017b) When engagement meets similarity: efficient (k, r)-core computation on social networks. In: PVLDB
Zhang Y, Parthasarathy S (2012) Extracting analyzing and visualizing triangle k-core motifs within networks. In: ICDE
Zhu J, Zhu J, Ghosh S et al. (2018) Social influence maximization in hypergraph in social networks. TNSE 6(4):801–811
MathSciNet Google Scholar
Zhu W, Chen C, Wang X, et al. (2018b) K-core minimization: an edge manipulation approach. In: CIKM
Zhu W, Zhang M, Chen C, et al. (2019) Pivotal relationship identification: the k-truss minimization problem. In: IJCAI
Zien JY, Schlag MD, Chan PK (1999) Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Trans Comput Aided Des Integr Circuits Syst 18(9):1389–1399
Article Google Scholar

Download references

Funding

This work was supported by National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2020R1C1C1008296) and Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00871, Development of AI Autonomy and Knowledge Enhancement for AI Agent Collaboration) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)).

Author information

Authors and Affiliations

School of Electrical Engineering, KAIST, Daejeon, South Korea
Fanchen Bu & Kijung Shin
Kim Jaechul Graduate School of AI, KAIST, Seoul, South Korea
Geon Lee & Kijung Shin

Authors

Fanchen Bu
View author publications
You can also search for this author in PubMed Google Scholar
Geon Lee
View author publications
You can also search for this author in PubMed Google Scholar
Kijung Shin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kijung Shin.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Responsible editor: Charalampos Tsourakakis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 4972 KB)

Appendices

A. Proofs

1.1 A.1 Proof of proposition 1

Proof

Since H is finite, the number of subhypergraphs of H is also finite. Therefore, there exists one subhypergraph with maximal total size (which is possibly an empty hypergraph) where each node has degree at least k and at least t proportion of the constituent nodes remain in each hyperedge, completing the proof of existence. To show the uniqueness, suppose the opposite, and let \(C^1 = (V^1, E^1)\) and \(C^2 = (V^2, E^2)\) be two distinct (k, t)-hypercores of H. Then we consider the hypergraph \(C' = (V', E')\) with \(E' = \{e^1_i \cup e^2_i: i \in I_{E^1} \cup I_{E^2}\}\). Clearly, \(C'\) is a subhypergraph of H with a larger total size that satisfies the node-degree and hyperedge-fraction conditions, which contradicts the maximality and completes the proof. \(\square\)

1.2 A.2 Proof of proposition 2

Proof

Suppose that \(C_{k, t_2}(H)\) is not a subhypergraph of \(C_{k, t_1}(H)\). Then we take the union \(C_{k, t_2}(H) \cup C_{k, t_1}(H)\) and we obtain a hypergraph that is strictly larger than \(C_{k, t_1}(H)\) and satisfies the conditions of \((k, t_1)\)-hypercore, which contradicts with the maximality, completing the proof. The second statement can be proved similarly. \(\square\)

1.3 A.3 Proof of lemma 1

Proof

This equivalence is immediate by two facts. First, for each node \(v \in V\), the degree of v in H is equal to the degree of v in \(G_{bp}(H)\). Second, for each hyperedge \(e \in E\), the number of nodes in e is equal to the degree of e in \(G_{bp}(H)\). With the above two facts, this equivalence immediately follows. \(\square\)

1.4 A.4 Proof of lemma 2

Proof

By Definition 5, when \(t = 0\), the definition of \(C_{k; t = 0}(H)\) is the maximal subhypergraph of H where (1) every node in \(C_{k, t=0}(H)\) has degree at least k and (2) at least two nodes remain in every hyperedge of \(C_{k, t=0}(H)\). Such a definition exactly coincides with \({\tilde{C}}_{k; \ell = 2}(H)\), completing the proof. \(\square\)

1.5 A.5 Proof of lemma 3

We can understand the differences between \((k; \ell )\)-hypercores and (k, t)-hypercores by two intuitions. When we obtain the \((k; \ell )\)-hypercore of a given H with \(\ell > 2\), all hyperedges of cardinality 2 are removed in the first place. Therefore, if we want to find an \(\ell\) such that \({\tilde{C}}_{k; \ell } = C_{k, t}\) where \(C_{k, t}\) contains any hyperedge of cardinality 2, the only possible \(\ell\) value is \(\ell = 2\). Since the threshold in the (k, t)-hypercore is proportional, it imposes different absolute cardinality thresholds for hyperedges of different sizes. On the contrary, the \((k; \ell )\)-hypercore imposes the same absolute cardinality threshold for each hyperedge. We shall show two counterexamples from the two intuitions above.

Proof

Consider \(H = (V, E)\) with \(E = \{\{1, 2\}, \{1, 3\} \{1, 2, 3, 4\}, \{1, 3, 4, 5, 6\}\}.\) The \((k=2, t=3/4)\)-hypercore of H consists of the hyperedges \(\{\{1,2\}, \{1,3\}, \{1, 2, 3\}\},\) where the hyperedge \(\{1,3,4,5,6\}\) is totally removed since only \(3/5 < t = 3/4\) of the constituent nodes remain by the node-degree threshold \(k = 2\). For the \((k; \ell )\)-hypercore, the \((k=2; \ell =2)\)-hypercore of H consists of the hyperedges \(\{\{1, 2\}, \{1, 3\}, \{1, 2, 3, 4\}, \{1, 3, 4\}\}\); the \((k=2; \ell =3)\)-hypercore of H consists of the hyperedges \(\{\{1, 3, 4\}, \{1, 3, 4\}\}\); when \(\ell \ge 4\), the \((k=2;\ell )\)-hypercore of H is empty, completing the proof.

We show another counterexample. Consider \(H = (V, E)\) with \(E = \{\{1,2,3,4\}, \{1,2,5,6\}, \{5,6,7,8\},\{3,4,9,10,11\},\{1,2,3,4,5,6,7,8\}\}.\) The \((k=3, t=1/2)\)-hypercore of H consists of the hyperedges \(\{\{1,2\}, \{1,2,5,6\}, \{5,6\}, \{1,2,5,6\}\}.\) For the \((k; \ell )\)-hypercore, when \(\ell = 2\), the \((k=3; \ell =2)\)-hypercore of H consists of the hyperedges \(\{\{1,2,3,4\},\{1,2,5,6\},\{5,6\},\{3,4\},\{1,2,3,4,5,6\}\};\) when \(\ell \ge 3\), the \((k=3; \ell )\)-hypercore of H is empty, completing the proof. \(\square\)

Remark 1

Our proposed (k, t)-hypercore allows arbitrarily fine-grained adjustment since the value of t is continuous in [0, 1], while \(\ell\) must be an integer. In real-world hypergraphs, many hyperedges are of cardinality 2. Therefore, the \((k; \ell )\)-hypercore with \(\ell > 2\) is significantly less meaningful than the (k, t)-hypercore since many hyperedges are not taken into consideration at all. See Tbl. 5 for the detailed number of hyperedges of different cardinality in each dataset we have used. See Figs. 8 and 9 for the performance of hypercoreness w.r.t \((k; \ell )\)-hypercore to indicate the influence of nodes. Note again that the \((k; \ell =2)\)-hypercore is included in our proposed concept as the \((k, t=0)\)-hypercore. We observe that in most datasets, the \((k; \ell )\)-hypercores become less meaningful and fail to indicate the influence of nodes when \(\ell\) becomes large, as expected.

1.6 A.6 Proof of theorem 1

Proof

Correctness. The size of a hyperedge changes only when some node in \({\mathcal {R}}\) is removed from it, and the degree of a node changes only when some incident hyperedge is removed. Therefore, when Algorithm 1 ends, each node has degree at least k, otherwise it must have been included in \({\mathcal {R}}\) and removed, and each hyperedge satisfies the hyperedge-fraction condition, otherwise it must have been removed. This implies that the output of Algorithm 1 satisfies both the node-degree and hyperedge-fraction conditions w.r.t H, k, and t. We now show the maximality. Suppose not, and let \((v, e_i)\) be the first node-hyperedge pair that appears during the process of Algorithm 1 with \(v \in e_i \in E(C_{k, t})\) but \(v \notin e_i' \in E'\), where \(C' = (V', E')\) is the returned hypergraph. This implies that v is removed from e, and thus v is included in \({\mathcal {R}}\) because its degree has been below k. However, by the definition of the (k, t)-hypercore and the assumption that \((v, e_i)\) is the first pair, before the deletion, the degree of v is at least k, which completes the proof by contradiction.

Time complexity. We assume the input hypergraph has been loaded in the memory and thus do not count the complexity of loading the hypergraph. Checking the initial degrees (Line 1) takes \(O(|V|)\). In the while loop, each node is added to the set of nodes to be removed at most once since each node is added exactly when its degree decreases from k to \(k - 1\). Therefore, this process takes \(O(|V|)\). By checking the incident edges of each node in \({\mathcal {R}}\), we find all \(e'_i\)s intersecting with \({\mathcal {R}}\), which takes \(O(|{\mathcal {R}}|) = O(|V|)\). Hash tables are used to implement the sets. Before a hyperedge \(e \in E\) is totally removed, at least \(\max \left( \lceil t|e| \rceil , 2\right)\) nodes remain in it (otherwise it has been removed earlier), and thus it can be visited at most \(|e| - \max \left( \lceil t|e| \rceil , 2\right) + 1\) times (because one node is removed at each time). This process takes \(O(\sum _{e \in E} (|e| - \max \left( \lceil t|e| \rceil , 2\right) + 1)) = O(|E| + (1 - t) \sum _{e \in E} |e|)\). Therefore, the total time complexity is \(O(|V|) + O(|E| + (1 - t) \sum _{e \in E} |e|) = O(|E| + (1 - t) \sum _{e \in E} |e|)\). \(\square\)

1.7 A.7 Proof of theorem 2

Proof

Correctness. For each node v, the assignment of \(c_t(v)\) happens only once when \(v \in {\mathcal {R}}\), i.e., before its deletion. By Theorem 1, \(c_t(v) = k - 1\) implies that v is not in the (k, t)-hypercore but in the previous \((k', t)\)-hypercore where each node has degree at least \(k - 1\), i.e., v is in the \((k-1, t)\)-hypercore.

Time complexity. The values of k increases \(O(c_t^*)\) times, thus the process in Lines 4 and 5 is repeated for \(O(c_t^*)\) times and takes \(O(c_t^* |V|)\). The assignment of t-hypercoreness of each node (Line 7) takes \(O(|V|)\). As shown in the proof of Theorem 1, each hyperedge is visited at most \(|e| - \max (\lceil t|e| \rceil , 2) + 1\) times before being deleted and each node is added to the set of nodes to be removed only once. Therefore, the remaining process takes \(O(|V| + |E| + (1 - t) \sum _{e \in E} |e|)\). \(\square\)

1.8 A.8 Proof of theorem 3

Proof

Correctness. For each node v, the assignment of \(f_k(v)\) happens only once when \(v \in {\mathcal {R}}\), i.e., before its deletion. By Theorem 1, \(f_k(v) = t\) implies that v is in the (k, t)-hypercore with degree k and is in at least one hyperedge that is in the (k, t)-hypercore but not in any \((k, t')\)-hypercore with \(t' > t\). Therefore, v is not in any \((k, t')\)-hypercore with \(t' > t\).

Time complexity. Recording the hyperedge sizes (Line 1) takes \(O(|E|)\). By Theorem 1, computing \(C_{k, 0}\) (Line 2) takes \(O(\sum _{e \in E} |e|)\). As shown in the previous proofs, the while loop (Lines 4 to 12) takes \(O(|V| + |E| + \sum _{e \in E} |e|) = O(\sum _{e \in E} |e|)\). \(\square\)

Details of datasets

In this section, we provide more details of the datasets used in our experiments.

Coauth-DBLP/Geology. In these two coauthorship hypergraphs, each hyperedge represents a publication, and the constituent nodes of a hyperedge represent the authors of the corresponding publication.
NDC-classes/substances. In these two hypergraphs from the National Drug Code (NDC) Directory, each hyperedge represents a drug (with its unique NDC code), and the constituent nodes of a hyperedge represent the class labels (for NDC-classes) or the ingredients (for NDC-substances) of the drug.
Contact-high/primary. In these two contact hypergraphs, each hyperedge represents a group of interacting individuals (the constituent nodes) within a predetermined time period.
Email-Enron/Eu. In these two email hypergraphs, each hyperedge represents an email (possibly sent to multiple people individually at the same time), which contains the sender and all the receivers as its constituent nodes.
Tags-ubuntu/math/SO. In these three tags hypergraphs from https://stackoverflow.com/, each node represents a tag, and each hyperedge represents a question, where each constituent node represents a tag applied to the question.
Threads-ubuntu/math/SO. In these three threads hypergraphs also from https://stackoverflow.com/, each hyperedge represents a thread, where each constituent node represents a person that participates in it.

We have used the preprocessed version of the datasets where each hyperedge consists of at most 25 nodes. In Table 5, we report the number of hyperedges of different cardinality in each dataset. Notably, on https://www.cs.cornell.edu/~arb/data/, the full version of the datasets, in which the cardinality of the hyperedges is not limited, is also available.

Table 5 The number of hyperedges of different cardinality in each dataset. For each dataset, we list the number of hyperedges of each specific size. Specifically, in most datasets, a large number of hyperedges are of cardinality 2. We use \(E_s\) to denote the set of hyperedges of cardinality s, for each s

Full size table

Table 6 For each dataset, we report the information gain over degree, for each of the considered quantities: \(\ell\)-hypercoreness with \(\ell \in \{3, 4, 5\}\), neighbor-hypercoreness, and neighbor-degree-hypercoreness

Full size table

Additional experimental results

In this section, we provide additional experimental results supplementing the main text. In Fig. 15, we report the results regarding the statistical difference between t-hypercoreness and other centrality measures, as well as among t-hypercoreness with different t, on the datasets not covered in the main text. In Fig. 16, we report the results regarding the information gain over degree, on the datasets not covered in the main text. In Table 6, for each dataset, we report the information gain over degree for the following quantities: \(\ell\)-hypercoreness with \(\ell \in \{3, 4, 5\}\),^{Footnote 9} neighbor-hypercoreness, and neighbor-degree-hypercoreness (Definitions. 9, 12, and 14). Notably, we do not claim that higher information gain is always better, since degree is still a reason measure by cohesiveness, and being too different from degrees can be negative as a cohesiveness measure. In Fig. 17, we report the results on influential-node identification, on the datasets not covered in the main text. In Table 7, we report the full results of the collapsed (k, t)-hypercore problem.

Table 7 Full results of the collapsed (k, t)-hypercore problem (\(b = 100\)). Time: running time (in seconds). Red.: reduction in the hypercore size

Full size table

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Bu, F., Lee, G. & Shin, K. Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applications. Data Min Knowl Disc 37, 2389–2437 (2023). https://doi.org/10.1007/s10618-023-00956-2

Download citation

Received: 08 February 2023
Accepted: 06 July 2023
Published: 08 August 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s10618-023-00956-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applications

Abstract

Access this article

Similar content being viewed by others

Efficient Truss Computation for Large Hypergraphs

Improving the core resilience of real-world hypergraphs

Generalized core maintenance of dynamic bipartite graphs

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (PDF 4972 KB)

Appendices

A. Proofs

1.1 A.1 Proof of proposition 1

Proof

1.2 A.2 Proof of proposition 2

Proof

1.3 A.3 Proof of lemma 1

Proof

1.4 A.4 Proof of lemma 2

Proof

1.5 A.5 Proof of lemma 3

Proof

Remark 1

1.6 A.6 Proof of theorem 1

Proof

1.7 A.7 Proof of theorem 2

Proof

1.8 A.8 Proof of theorem 3

Proof

Details of datasets

Additional experimental results

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation