
The Effects of Randomness on the Stability of Node Embeddings

  • Conference paper
  • In: Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)

Abstract

We systematically evaluate the (in-)stability of state-of-the-art node embedding algorithms due to randomness, i.e., the random variation of their outcomes given identical algorithms and networks. We apply five node embedding algorithms—HOPE, LINE, node2vec, SDNE, and GraphSAGE—to assess their stability under randomness with respect to their performance in downstream tasks such as node classification and link prediction. We observe that while the classification of individual nodes can differ substantially, the overall accuracy is mostly unaffected by the geometric instabilities in the underlying embeddings. In link prediction, we also observe high stability in the overall accuracy and a higher stability in individual predictions than in node classification. While our work highlights that the overall performance of downstream tasks is largely unaffected by randomness in node embeddings, we also show that individual predictions might depend solely on randomness in the underlying embeddings. Our work is relevant for researchers and engineers interested in the effectiveness, reliability, and reproducibility of node embedding approaches.

T. Schumacher, H. Wolf and M. Ritzert—Equal contribution.

This work is supported by the German research council (DFG) Research Training Group 2236 UnRAVeL, the Federal Ministry of Education and Research (BMBF), and the Ministry of Culture and Science of the German State of North Rhine-Westphalia (MKW). We thank Jan Bachmann, Max Klabunde, and Florian Frantzen for their help with implementing and running the experiments.


Notes

  1. All code is available at https://github.com/SGDE2020/embedding_stability.

References

  1. Antoniak, M., Mimno, D.: Evaluating the stability of embedding-based word similarities. Trans. Assoc. Comput. Linguist. 6, 107–119 (2018)

  2. Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)

  3. Goyal, P., Ferrara, E.: GEM: a Python package for graph embedding methods. J. Open Source Softw. 3, 876 (2018)

  4. Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM (2016)

  5. Hamilton, W.L., Leskovec, J., Jurafsky, D.: Diachronic word embeddings reveal statistical laws of semantic change. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1489–1501 (2016)

  6. Hamilton, W.L., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems, pp. 1024–1034 (2017)

  7. Hellrich, J., Hahn, U.: Bad company-neighborhoods in neural embedding spaces considered harmful. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2785–2796 (2016)

  8. Kunegis, J.: KONECT: the Koblenz network collection. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 1343–1350 (2013)

  9. Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection, June 2014. http://snap.stanford.edu/data

  10. Leszczynski, M., May, A., Zhang, J., Wu, S., Aberger, C., Ré, C.: Understanding the downstream instability of word embeddings. In: Proceedings of Machine Learning and Systems 2020, pp. 262–290 (2020)

  11. Mahoney, M.: Large text compression benchmark (2011)

  12. Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114. ACM (2016)

  13. Pierrejean, B., Tanguy, L.: Predicting word embeddings variability. In: Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pp. 154–159 (2018)

  14. Rozemberczki, B., Davies, R., Sarkar, R., Sutton, C.: GEMSEC: graph embedding with self clustering. arXiv preprint arXiv:1802.03997 (2018)

  15. Stark, C., Breitkreutz, B.J., Reguly, T., Boucher, L., Breitkreutz, A., Tyers, M.: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34(suppl 1), D535–D539 (2006)

  16. Šubelj, L., Bajec, M.: Model of complex networks based on citation dynamics. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 527–530. ACM (2013)

  17. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. International World Wide Web Conferences Steering Committee (2015)

  18. Tang, L., Liu, H.: Relational learning via latent social dimensions. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 817–826. ACM (2009)

  19. Wang, C., Rao, W., Guo, W., Wang, P., Liu, J., Guan, X.: Towards understanding the instability of network embedding. IEEE Trans. Knowl. Data Eng. (2020)

  20. Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234. ACM (2016)

  21. Watts, D.J., Strogatz, S.H.: Collective dynamics of 'small-world' networks. Nature 393(6684), 440 (1998)

  22. Wendlandt, L., Kummerfeld, J.K., Mihalcea, R.: Factors influencing the surprising instability of word embeddings. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 2092–2102 (2018)

  23. Zafarani, R., Liu, H.: Social computing data repository at ASU (2009). http://socialcomputing.asu.edu


Author information

Corresponding author: Hinrikus Wolf.

Appendices

A Experimental Setup

1.1 A.1 Datasets

We provide some more details on the graph datasets used in our experiments. Note that the Facebook dataset was only used in our preliminary experiments on embedding geometry (cf. Appendix B), as it does not provide any node labels. Statistics for each graph can be found in Table 1.

  • BlogCatalog: This graph models the relationships among the users of the BlogCatalog website. Each user is represented by a node and two nodes are connected if the respective users are friends. Each user additionally has one or more labels which correspond to the news category their blog belongs to.

  • Cora [16]: In the well-known Cora citation network each scientific paper is represented by a node, and a directed edge indicates that the outgoing node cites the target node. Each paper is associated with a category that refers to its research topic.

  • Facebook [14]: The Facebook government dataset models the social network structure of verified government sites on Facebook. Each site is represented by a node and nodes are connected by an edge if both sites like each other.

  • Protein [15]: This biological network models protein interactions in human beings. Each node represents a protein and two nodes are connected if the corresponding proteins interact with each other. Additionally, each node is associated with one or more labels that represent biological states.

  • Wikipedia [11]: This network represents the co-occurrence of words within a dump of Wikipedia articles. Each word corresponds to a node, and weighted edges represent the number of times two words occur in the same context. Additionally, each node has one or more labels that encode its part of speech.

We used the Cora dataset from the KONECT graph repository [8] and BlogCatalog from the ASU Social computing repository [23]. The other empirical datasets were taken from the SNAP graph repository [9].

Table 1. Statistics of empirical graph datasets. We show the number of nodes (|V|) and edges (|E|), the density, and the number of node labels. MC indicates multi-class, ML multi-label problems.

1.2 A.2 Implementations and Parameter Settings

To complement Sect. 3, in the following we give a more detailed overview of the chosen implementations and parameter settings of the node embedding algorithms, as well as the experimental setups of the downstream classification tasks used in our experiments.

Node Embedding Algorithms. For every algorithm from Sect. 3 we use the reference implementation except for HOPE, for which no reference implementation was published. Thus we resorted to the HOPE implementation from the GEM library [3]. We run the algorithms with default parameters from the given implementations whenever possible and compute embedding vectors of length \(d=128\). We adapted SDNE to use only a single intermediate layer and for larger graphs increased the weight on the reconstruction error and the regularization term, as otherwise SDNE maps all nodes onto the same vector.

Downstream Classification. For both node classification and link prediction, we use AdaBoost, decision trees, random forests, and feedforward neural networks as downstream classification algorithms. For all classifiers, we used the standard methods with default parameters from scikit-learn (AdaBoost, decision tree, random forest) and TensorFlow (neural networks). For the neural networks, we use a single hidden layer of width 100 with ReLU activation and an output layer with softmax or sigmoid activation, depending on the classification type. Deeper and wider networks did not improve performance, which is why we worked with this very simple architecture.
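The classifier setup described above can be sketched as follows. This is a minimal sketch with a hypothetical helper name; the authors' actual code is in the linked repository, and scikit-learn's MLPClassifier stands in here for the TensorFlow network described in the text.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier  # stand-in for the TensorFlow net
from sklearn.tree import DecisionTreeClassifier


def make_classifiers():
    """Downstream classifiers with scikit-learn default parameters, as in the
    paper; the neural network mirrors the described architecture:
    one hidden layer of width 100 with ReLU activation."""
    return {
        "adaboost": AdaBoostClassifier(),
        "decision_tree": DecisionTreeClassifier(),
        "random_forest": RandomForestClassifier(),
        "neural_net": MLPClassifier(hidden_layer_sizes=(100,), activation="relu"),
    }
```

Each classifier would then be fitted on embedding vectors as features and node labels (or edge labels, for link prediction) as targets.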

In node classification we predict either the class of a node, e.g., the top-level research category in Cora, or a set of labels of a node, e.g., the news categories in BlogCatalog. In the latter case of multi-label classification, we assume that we know the number l of labels and thus predict the l most probable labels. This approach leads to more stable predictions and is common in the literature [18].
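The top-l prediction scheme for the multi-label case can be illustrated as follows. The names are hypothetical; `proba` is assumed to hold per-label probabilities produced by any of the classifiers above.

```python
import numpy as np


def predict_top_l(proba, num_labels):
    """For each node, predict the l most probable labels, where l is the
    (assumed known) number of true labels for that node.

    proba:      array of shape (n_nodes, n_classes) with label probabilities
    num_labels: sequence of length n_nodes with the label count l per node
    """
    predictions = []
    for p, l in zip(proba, num_labels):
        # indices of the l highest probabilities, in descending order
        top = np.argsort(p)[::-1][:l]
        predictions.append(set(top.tolist()))
    return predictions
```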

For the link prediction task, we considered subgraphs of each network where we removed 10% of the original edges at random while ensuring that the residual graph is still connected. For each reduced network, we computed 10 embeddings per algorithm. We then interpreted link prediction as a binary classification task on the Hadamard product of two embedding vectors. The removed edges are then the positive examples for the link prediction, and we chose as many non-edges at random as negative examples for training the classifier.
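The construction of the link-prediction training data can be sketched as follows. Helper names are hypothetical; as described above, the feature of a node pair is the Hadamard (element-wise) product of the two endpoint embeddings, removed edges are positives, and sampled non-edges are negatives.

```python
import numpy as np


def edge_features(Z, pairs):
    """Hadamard product of the two endpoint embeddings for each (u, v) pair."""
    pairs = np.asarray(pairs)
    return Z[pairs[:, 0]] * Z[pairs[:, 1]]


def build_link_prediction_data(Z, removed_edges, non_edges):
    """Binary classification data: removed edges are positive examples,
    randomly sampled non-edges are negative examples."""
    X = np.vstack([edge_features(Z, removed_edges), edge_features(Z, non_edges)])
    y = np.concatenate([np.ones(len(removed_edges)), np.zeros(len(non_edges))])
    return X, y
```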

For the stability of performance, we compute the variance of micro-F1 scores of a classifier trained on each of the 30 embeddings per graph and embedding algorithm in node classification, and on each of the 10 embeddings per graph and embedding algorithm in link prediction. In both experiments, macro-F1 yields very similar results, so we only report micro-F1.

For the stability of single classifications, we have to separate the inherent instability of the classifiers from the influence of different embeddings. We estimate the instability of a classifier by running it 10 times on a single embedding, averaged over 5 embeddings. The total variance in individual predictions is computed from the results of one classifier trained on each of the 30 embeddings, using 75% of the nodes for training and 25% for evaluation.

B Experiments on Geometric Stability

In this section, we present our preliminary experiments on the geometric stability of node embeddings. We first give a brief description of the measures for geometric stability, and then present the results.

1.1 B.1 Measures for Geometric Stability

To quantify geometric instability of node embeddings, we use two measures which have been introduced in related literature on word embeddings, namely aligned cosine similarity [5] and k-NN Jaccard similarity [1].

The aligned cosine similarity computes the node-wise cosine similarity between two embeddings after aligning the axes of the corresponding embedding spaces. To obtain the optimal alignment, we normalize all embedding vectors and solve the Procrustes problem: Given two embedding matrices \(Z^{(1)}, Z^{(2)}\in \mathbb {R}^{N\times d}\), with N denoting the number of nodes in a given network, and d denoting the embedding dimension, we determine the transformation matrix \(Q\in \mathbb {R}^{d \times d}\) by solving the minimization problem

$$ Q := \underset{Q^TQ=I}{{\text {argmin}}} \left\| Z^{(1)} Q - Z^{(2)}\right\| _F . $$
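A minimal sketch of this computation, assuming NumPy: the orthogonal Procrustes problem above has the closed-form solution \(Q = UV^T\), where \(U\varSigma V^T\) is the SVD of \((Z^{(1)})^T Z^{(2)}\).

```python
import numpy as np


def aligned_cosine_similarity(Z1, Z2):
    """Node-wise cosine similarity between two embeddings of the same graph,
    after orthogonally aligning Z1 to Z2.

    All embedding vectors are normalized first, then the orthogonal
    transformation Q minimizing ||Z1 Q - Z2||_F is obtained in closed form
    from the SVD of Z1^T Z2 (Q = U V^T)."""
    Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
    Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd(Z1.T @ Z2)
    Q = U @ Vt
    aligned = Z1 @ Q
    # rows are unit vectors, so the dot product is the cosine similarity
    return np.sum(aligned * Z2, axis=1)
```

As a sanity check, an embedding compared against an orthogonally rotated copy of itself yields similarity 1 for every node.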

The k-NN Jaccard similarity measure compares the local neighborhoods of nodes between different embeddings. In both embedding spaces, we compute for a node u the k nearest neighbors with respect to cosine similarity. We then calculate the Jaccard similarity of the two nearest-neighbor sets of \(u\).

Each of these two measures computes a score for a single node in a pair of embeddings. To obtain a single score per graph and algorithm, which allows comparing different algorithms, we average over all pairs of embeddings and all nodes.
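The k-NN Jaccard similarity can be sketched as follows, assuming NumPy. `knn_jaccard` is a hypothetical helper computing the node-wise scores for one pair of embeddings, which would then be averaged over all embedding pairs and nodes as described above.

```python
import numpy as np


def knn_jaccard(Z1, Z2, k=20):
    """Per-node Jaccard similarity of the k nearest neighbors (by cosine
    similarity) in two embeddings of the same graph."""

    def knn_sets(Z):
        Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
        S = Zn @ Zn.T                          # pairwise cosine similarities
        np.fill_diagonal(S, -np.inf)           # exclude the node itself
        idx = np.argsort(-S, axis=1)[:, :k]    # k most similar nodes per row
        return [set(row.tolist()) for row in idx]

    sets1, sets2 = knn_sets(Z1), knn_sets(Z2)
    return np.array([len(a & b) / len(a | b) for a, b in zip(sets1, sets2)])
```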

Fig. 4.
figure 4

Geometric stability. Each letter-value plot shows the node-wise similarity values resulting from 30 runs per algorithm and graph. In (a) we use aligned cosine similarity, in (b) 20-NN Jaccard similarity.

1.2 B.2 Experimental Results

In our experiments on geometric stability, we used the same algorithmic parameter settings and datasets that have been introduced in Appendix A. Next to the overall stability of the embeddings, we also look into the influence of node centrality, and the influence of network size and density on the stability of node embeddings.

Geometric Stability. We start our analysis by computing 30 embeddings per dataset with every algorithm. We then compute node-wise stability measures averaged over all pairs of embeddings computed per graph and embedding algorithm. Figure 4 shows the distributions of (a) aligned cosine similarity and (b) k-NN Jaccard similarity over the nodes of each graph.

For the aligned cosine similarity, we observe that GraphSAGE achieves similarities that are generally only slightly above zero and sometimes even negative. Negative values correspond to angle differences of more than 90\(^\circ \) between two embeddings of the same node. Thus, even after aligning axes, embedding vectors of the same node are mostly close to orthogonal to each other. In contrast, HOPE yields near-constant embeddings (not shown) and shows hardly any instability. The algorithms SDNE, node2vec and LINE achieve aligned cosine similarities in the interval (0.8, 0.9) with low variances. These values correspond to angles between 25\(^\circ \) and 35\(^\circ \) such that corresponding embedding vectors roughly point in the same direction after aligning the embedding spaces. Thus, the latter algorithms exhibit a moderate, but significant degree of instability in their embeddings.

Results for the k-NN Jaccard similarity, as shown in Fig. 4(b), generally confirm these findings. For HOPE, we observe perfectly matching neighborhoods, while for GraphSAGE the neighborhoods are completely disjoint. This matches our observations for aligned cosine similarity. For the other three algorithms, the resulting similarities seem to be highly dependent on the dataset, with quite large variances. Generally, node2vec appears most stable among these algorithms, though only by a slight margin over LINE. SDNE appears to be significantly less stable than node2vec and LINE with respect to Jaccard similarity, with similarity values close to zero on BlogCatalog, Protein and Wikipedia. This contrasts with the results with respect to aligned cosine similarity, where SDNE appeared as stable as the other two algorithms.

Fig. 5.
figure 5

Influence of node centrality. The moving averages of the node-wise (a) aligned cosine similarities and (b) 20-NN Jaccard similarities resulting from 30 embeddings per graph are plotted against each node’s closeness centrality.

Influence of Node Centrality. Next, we analyze whether nodes that are central in their graph have more stable embeddings. Closeness centrality has been identified as one of the top influence factors for stability in the analysis of Wang et al. [19]. Also, from the definition of node2vec we expect this algorithm, among others, to produce more stable embeddings for central nodes, since central nodes occur more often in random walks. In Fig. 5, for the Cora and Facebook datasets we plot each node's closeness centrality against a moving average (window size 25) of its node-wise (a) aligned cosine similarity and (b) k-NN Jaccard similarity, aggregated over all 30 embeddings per network and algorithm. First of all, the (in)stability of the extreme cases HOPE and GraphSAGE appears invariant to the centrality of the node, both in (a) and (b). For SDNE, we observe that stability with respect to k-NN Jaccard similarity appears to increase with growing closeness centrality. This trend, however, is not visible when considering aligned cosine similarity. For LINE and node2vec, there is no simple trend visible with respect to either of the two measures; their similarity scores look rather arbitrary. Overall, we see that although closeness centrality is ranked high in the factor analysis of Wang et al. [19], there are no clear signs that more central nodes have more stable embeddings.

Influence of Graph Properties. To evaluate the impact of graph properties on the stability of the embeddings, we generated synthetic graphs with varying sizes and densities. More precisely, we utilized two network models, namely Barabási-Albert networks [2] and Watts-Strogatz networks [21]. For each model, we generated two sets of networks, in which we either fixed the network size at \(n = 8000\) nodes and varied the density, or fixed the density at \(D=0.01\) and varied the size. The results of this analysis can be found in Fig. 6, where we plot the average aligned cosine similarities over all nodes and embeddings per graph and algorithm against (a) graph size and (b) graph density. Figure 6(a) contains missing data points that result from terminating the embedding computation after a maximum of 72 h per embedding.
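For reference, the model parameters that yield a target density can be derived from the edge counts of the two models. The helper names below are hypothetical; this is a sketch of the parameter choice, not the authors' generation code.

```python
def ba_m_for_density(n, density):
    """Barabási-Albert parameter m (edges added per new node).
    The model produces roughly m*(n - m) edges; for m << n this is about m*n,
    so density D = 2|E| / (n*(n-1)) is approximately 2m / (n-1)."""
    return max(1, round(density * (n - 1) / 2))


def ws_k_for_density(n, density):
    """Watts-Strogatz ring-lattice degree k (must be even).
    The model has |E| = n*k/2 edges, so density D = k / (n-1)."""
    k = max(2, round(density * (n - 1)))
    return k + (k % 2)  # round up to the next even value if necessary
```

For the fixed setting above (\(n = 8000\), \(D = 0.01\)), this gives m = 40 for Barabási-Albert and k = 80 for Watts-Strogatz.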

Fig. 6.
figure 6

Influence of graph properties. In (a) synthetic graphs with varying size at fixed density 0.01 and in (b) synthetic graphs with varying density and 8000 nodes are used to measure the influence of those graph properties on stability. Each data point represents the average node-wise similarity over all nodes per graph and all 435 embedding pairs resulting from 30 runs of the corresponding algorithm.

Considering the impact of network size, we see that for GraphSAGE, the already low stability rapidly drops with larger graph size on both synthetic models, whereas for HOPE, the near-perfect stability seems invariant to graph size. In between, LINE, SDNE and node2vec show stabilities similar to those in our experiments on empirical graphs; however, there is no consistent trend regarding the impact of network size on their stability. This finding contrasts with results from Wang et al. [19], who stated that the stability of DeepWalk and node2vec primarily depends on the size of the input graphs.

For the dependence on network density, plotted in Fig. 6(b), we see that the embedding stability of SDNE and node2vec seems to increase as graphs get denser. HOPE is once again consistent in its high stability, whereas GraphSAGE shows consistently low stability that is unaffected by network density. Finally, LINE does not display any clear trend, as it diverges between the two synthetic models.

Summary. Our results indicate clear differences in geometric stability between the embedding algorithms, which is in line with the results by Wang et al. [19]. HOPE consistently yields near-constant embeddings, whereas GraphSAGE was shown to be very volatile. In between, the other algorithms (LINE, node2vec, and SDNE) exhibit a moderate but significant degree of instability. When checking possible influence factors for stability, we found no strong and general trend for any of them. In particular, node centrality, graph size, and graph density have a rather small to negligible influence on the stability of node embeddings. This does not match the high ranking of these node and graph properties in the factor analysis by Wang et al. [19]. Instead, stability is dominated by the choice of the embedding algorithm, which overshadows the aforementioned influences.

Fig. 7.
figure 7

Stability of classification performance. Stability of the micro-F1 score of the used classification methods is plotted against the used embedding algorithms. Each box corresponds to the prediction of 30 embeddings with 10 repetitions.

C Additional Results on Downstream Stability

In the following, we present additional plots from our experiments on downstream stability that were left out of the main part due to space limitations.

1.1 C.1 Node Classification

We first present our results on the node classification task. Figure 7 depicts the stability of classification performance on all datasets. We observe that over all algorithms and datasets, the resulting accuracies vary only marginally, and higher variances appear to depend on the datasets rather than the embedding techniques.

Fig. 8.
figure 8

Stability in node-wise predictions. This figure shows the stability of the classifiers as ratios of nodes which are always predicted to be in the same class. Saturated colors represent the mean stable core of all 30 embeddings and lighter colors the mean stable core of five randomly sampled embeddings with 10 repetitions each.

Our results regarding the stability of single predictions are shown in Fig. 8. The results on Wikipedia are mostly in line with those obtained on BlogCatalog and Cora and discussed in the main part. For Protein, where we already obtained the overall lowest accuracies in node classification, we observe much lower stability in individual predictions compared to the other datasets.

1.2 C.2 Link Prediction

We close with the results regarding the stability of link prediction performance on all datasets, shown in Fig. 9. We observe that, once again, the performance differences between different embeddings are negligible, except for neural networks on HOPE embeddings of the Protein network.

Fig. 9.
figure 9

Stability of link prediction performance. Stability of the link prediction performance, measured as area under the curve (AUC), of the used machine learning algorithms is plotted against the used embedding algorithms. Each box corresponds to the predictions of 10 embeddings with 10 repetitions.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Schumacher, T., Wolf, H., Ritzert, M., Lemmerich, F., Grohe, M., Strohmaier, M. (2021). The Effects of Randomness on the Stability of Node Embeddings. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-93736-2_16


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93735-5

  • Online ISBN: 978-3-030-93736-2

