1 Introduction

Recently, large-scale knowledge graphs have been used to represent transportation networks, e-commerce and shopper preference networks, social and communication networks, and many other real-world systems. Such graphs often contain hundreds of millions, or even billions, of vertices and edges. Many data processing methods rely on large-scale graph analytics, often based on node and graph embedding and node feature extraction, which can in turn be used in various machine learning tasks. However, state-of-the-art techniques do not scale to large graphs without losing accuracy and/or efficiency. A knowledge graph often needs to be partitioned into multiple sub-graphs, called shards, stored at multiple computing nodes, which then requires distributed or parallel graph processing.

Graph embedding, also known as network embedding, is a frequently used technique for learning low-dimensional representations of a graph’s vertices that attempt to capture and retain the graph’s structure, as well as its inherent properties. Many tasks on graphs, such as link prediction, node classification, and visualization, benefit greatly from embedding a very large, web-scale graph into a low-dimensional vector space. More specifically, we might be interested in estimating the most likely labels for nodes in a network, predicting user interests in a social network, or predicting the functional labels of proteins in a protein-protein interaction network [1]. Similarly, in a link prediction task [2], we might want to know whether a pair of nodes in a graph should be connected by an edge. Link prediction is beneficial in numerous fields. For example, in bioinformatics, it aids in the discovery of novel protein interactions [3], and in social networks it can recognize “real-world buddies” [4].

A knowledge graph (KG) is a directed graph G = (V, E) whose nodes vi ∈ V are entities and whose edges ei ∈ E are relations connecting those entities. Knowledge graphs are often represented as RDF [5] datasets, in which triples (vi, ei, vj) represent some type of semantic relationship between the connected entities; nodes/entities are identified by URIs. Target nodes in triples are either URIs or literals, and edges/relationships have types represented by URIs as well. RDFS [6] is used to define a schema for an RDF knowledge graph. KGs are closely related to Heterogeneous Information Networks (HIN) [21].
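For illustration, the sketch below (a toy example of our own, with hypothetical URIs, rather than code from the paper) shows how RDF-style triples induce a graph of entity nodes and typed, directed edges:

```python
# Toy RDF-style triples as Python tuples; the URIs are hypothetical.
triples = [
    ("http://example.org/Alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/Bob"),                       # entity-to-entity relation
    ("http://example.org/Alice", "http://xmlns.com/foaf/0.1/name",
     '"Alice"'),                                      # literal-valued object
]

# Entities become graph nodes; predicates become typed, directed edges.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples
                                      if not o.startswith('"')}
edges = [(s, p, o) for s, p, o in triples if not o.startswith('"')]
print(len(nodes), "nodes,", len(edges), "entity-to-entity edge(s)")
```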

Various approaches to graph embedding have been presented in the machine learning literature, e.g., [7,8,9]. They function well on smaller networks, but real-world knowledge graphs, which often have millions of nodes and billions of edges, present a far more difficult challenge. For example, a decade ago, Twitter’s followee-follower network had 175 million active users and approximately twenty billion edges [10]. Most existing graph embedding algorithms do not scale up to networks of this magnitude.

Knowledge graphs can be partitioned into smaller subgraphs, with the hope that many tasks can take advantage of distributed and/or parallel processing. Given a graph G = (V, E), where V is a set of vertices and E is a set of edges, and a number k > 1, a graph partitioning of G is a subdivision of the vertices of G into subsets V1, …, Vk that partition the set V. A balance constraint requires that all partitioned subgraphs be equal, or close, in size. In addition, a common objective is to minimize the total number of cut edges (min-cut), i.e., edges crossing (cutting) partition boundaries.
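As a small worked example of these two objectives, the sketch below (for illustration only; it is not part of the system described in this paper) computes the partition sizes and the cut edges for a toy graph with k = 2:

```python
# Measure partition balance and the min-cut objective for a fixed assignment.
from collections import Counter

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("c", "e"), ("e", "f")]
part  = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1}   # V1 and V2

sizes = Counter(part.values())                        # balance: |V1|, |V2|
cut   = [(u, v) for u, v in edges if part[u] != part[v]]

print("partition sizes:", dict(sizes))                # {0: 4, 1: 2}
print("cut edges:", cut)                              # [('c', 'e')]
```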

In this paper, we propose PartKG2Vec, an algorithm for scalable feature learning in partitioned knowledge graphs. Our approach creates embeddings based on random walks in partitioned knowledge graphs and offers significant runtime improvements by performing the walks in parallel. This is important in random walk-based methods, especially semantics-based ones such as metapath2vec [29], due to the high cost of selecting the next node at each step of a random walk.

The rest of the paper is structured as follows. In Sect. 2, we briefly discuss related work. We present the technical details of PartKG2Vec in Sect. 3. In Sect. 4, we explain the implementation of PartKG2Vec. In Sect. 5, we empirically evaluate PartKG2Vec. We conclude with a discussion of the PartKG2Vec framework and highlight some interesting directions for future work in Sect. 6.

2 Related Work

Recently, graph representation learning has attracted a lot of attention. In general, there are two types of graph representation learning methods: unsupervised and supervised. The goal of unsupervised approaches is to learn low-dimensional representations that preserve the structure of a given graph. Supervised methods work in the same way, but are trained for a specific prediction task, such as node or graph classification. Only unsupervised approaches are discussed in this paper.

Unsupervised embedding methods map a graph’s nodes and edges into a continuous vector space. Several graph embedding techniques have been motivated by the Word2Vec algorithm [12], which originated in natural language processing. One variant of this algorithm relies on the skip-gram embedding model, in which a word’s embedding is optimized to predict its context, or adjacent words. A random walk in a graph is akin to a sequence of words in a sentence, where the nodes visited in the walk can be thought of as words.
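To make the analogy concrete, the sketch below (assuming the gensim library, version 4 or later) feeds toy random walks to a skip-gram Word2Vec model exactly as sentences of words would be fed:

```python
# Random walks play the role of sentences, node ids the role of words.
from gensim.models import Word2Vec

walks = [["a", "b", "c", "d"], ["b", "a", "d", "c"]]   # toy walks over 4 nodes
model = Word2Vec(sentences=walks, vector_size=16, window=2,
                 min_count=0, sg=1, workers=1)          # sg=1 selects skip-gram
print(model.wv["a"][:4])                                # embedding of node "a"
```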

Deepwalk [13] is one of the first approaches to embedding graph-structured data. Deepwalk relies on the parallels between graph nodes and words: its neural networks are trained to maximize the likelihood of predicting the context nodes of each target vertex in a graph, in terms of vertex proximity.

node2vec [14] is a popular unsupervised graph embedding algorithm that extends Deepwalk's sampling strategy. It utilizes biased random walks whose parameters interpolate between breadth-first and depth-first exploration, capturing both local and global community structures and resulting in more informative embeddings.

LINE [15], an acronym for Large-scale Information Network Embedding, produces embeddings that preserve first- and second-order proximity. For first-order proximity, LINE minimizes a graph regularization loss; for second-order proximity, it decodes embeddings into context conditional distributions for each node, which is computationally expensive. LINE uses negative sampling to sample negative edges based on a noisy distribution over edges. Finally, LINE combines the first- and second-order embeddings by concatenation.

HARP [16], or Hierarchical Representation Learning for Networks, lowers the number of nodes in the graph by coarsening it in a hierarchical manner. By iteratively grouping nodes into super nodes, it creates a series of graphs of decreasing size with properties similar to the original network. Existing approaches, such as LINE or Deepwalk, are then used to learn node embeddings for each coarsened graph. The random walk technique on Gt-1 uses the embedding learned for Gt as the initial embedding at time-step t. This process is repeated until each node in the original graph is embedded.

2.1 Embedding of Partitioned Graphs

PyTorch-BigGraph [17], also known as PBG, is a multi-relation embedding system that can scale to graphs with billions of nodes and trillions of edges by incorporating various improvements to existing multi-relation embedding systems. PBG can train very large embeddings on a distributed cluster using graph partitioning. The adjacency matrix is decomposed into N buckets, and the edges of each bucket are trained on individually. PBG then either performs distributed execution across multiple machines or swaps embeddings from each partition to disk to reduce memory use.

MILE [18], or Multi-Level Embedding, is a graph embedding framework that can scale to large graphs. It uses a hybrid matching technique to repeatedly coarsen the graph into smaller ones while maintaining its structure. It then applies known embedding methods to the coarsest graph and uses a graph convolution neural network to refine the embedding back to the original graph. It is independent of the underlying graph embedding technique and may be applied to a wide range of existing graph embedding methods without requiring them to be modified. It has been demonstrated that MILE dramatically improves graph embedding time (by an order of magnitude).

Accurate, Efficient and Scalable Graph Embedding [19] builds on the GCN [20] model; GCNs and their variants are strong graph embedding tools for graph classification and clustering. The work proposes a unique graph-sampling-based GCN parallelization strategy that achieves excellent scalability on very large graphs without sacrificing accuracy. To scale, it exploits parallelism within and across many sampling instances in the graph sampling step and devises an efficient data structure for concurrent accesses. Data partitioning improves cache utilization within the sampled graph. On several large datasets, its parallel graph embedding exceeds state-of-the-art approaches in terms of scalability, efficiency, and accuracy.

PartKG2Vec, presented in this paper, processes a partitioned knowledge graph in parallel to generate the random walks for the embedding. PartKG2Vec is a graph embedding system capable of handling big graphs, with a parallelization approach that achieves good scalability on very large graphs while maintaining accuracy. PartKG2Vec can be used with any random walk generator algorithm with minor adjustments.

The nodes in the knowledge graph are partitioned into sub-graphs using METIS [11]. First, random walks within the subgraphs of a partitioned graph generate partial random walks. These partial walks are then combined to form complete walks. In this way, the likelihood of preserving network neighborhoods of nodes in a d-dimensional feature space is maximized. The random walks are performed independently, starting from the initial nodes within each partition. Some of these walks will be incomplete (shorter than a desired length), as they reach a partition boundary. These walks are then completed with fragments of random walks from the neighboring partitions. The full set of complete walks is then used for representation learning to generate the knowledge graph embedding.

PBG and PartKG2Vec both use partitioning to support knowledge graphs that are too large for a single machine and to enable distributed training of the model. PBG creates buckets from the cross edges (pi, pj); these buckets are loaded and subdivided among the CPU threads for training. PartKG2Vec is different: we create metadata of cut edges for each partition, and complete or partial random walks are generated separately in each partition. Before representation learning, the partial random walks are completed by concatenation with other walk fragments (from neighboring partitions). MILE repeatedly coarsens the graph into smaller graphs, where multiple nodes are collapsed into super nodes and the edges between them are the union of the original edges; in PartKG2Vec, by contrast, we partition the knowledge graph to reduce the size of each graph while maintaining its structure.

3 Partitioned Knowledge Graph Embedding

Our method, PartKG2Vec, (1) partitions a knowledge graph into k partitions; (2) distributes the resulting shards to k computing nodes; (3) creates complete and/or partial random walks in each shard, where a walk is partial if a cut edge is encountered before the desired walk length is reached; (4) obtains a complete set of random walks from the already complete walks in individual shards and by concatenating partial walks with sub-walks from neighboring partitions, where a sub-walk is a fragment of a walk beginning with the target node of the cut edge that terminated the corresponding partial walk; and (5) uses the complete walks for graph embedding. Further downstream, the graph embedding can be used to solve other problems, including link prediction, node classification, and many other tasks.

Knowledge graphs used in the method presented in this paper are represented as RDF datasets; however, other graph representations can easily be adapted for use in PartKG2Vec. The method has a time complexity bounded by O(|V| log |V|).

3.1 Graph Indexing, Partitioning, and Segregation

To speed up the process of learning embeddings for the knowledge graph, we created indices on all triples in the knowledge graph, using Apache Lucene [22], based on their subjects, predicates, and objects. Using these indexes, triples of the form (S, P, O) can be efficiently searched, similarly to our prior work in WawPart [27] and AWAPart [28]. This indexing helps the system convert the knowledge graph into a representation suitable for graph partitioning, as the URIs used in triples are converted to numeric identifiers. This new graph representation is then partitioned into several sub-graphs using METIS [11]. We have experimented with other ways to partition the knowledge graph, including bisection methods, community detection methods, and others, but found that METIS produces the best partitions, with a low number of cut edges, in acceptable runtime.
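The id-conversion step can be sketched as follows (a simplification: the actual system uses a Lucene index, for which plain Python dictionaries stand in here):

```python
# Map triple URIs to consecutive integer ids, as METIS expects numeric vertices.
triples = [("ex:a", "ex:knows", "ex:b"), ("ex:b", "ex:knows", "ex:c")]

node_id, edges = {}, []
for s, _, o in triples:
    for n in (s, o):
        node_id.setdefault(n, len(node_id) + 1)    # METIS vertex ids start at 1
    edges.append((node_id[s], node_id[o]))

id_node = {i: n for n, i in node_id.items()}       # inverse map restores URIs later
print(edges)                                       # [(1, 2), (2, 3)]
```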

The partitioning outputs the list of nodes and their partition identifiers. Our system compares this list with the complete graph to produce the list of cut edges. Consequently, along with its edges, each partition stores information about its cut edges. The cut edges are replicated across the shards that share them; that is, a cut edge {u, v} is stored with both partitions to which the nodes u and v belong.
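This bookkeeping can be sketched as follows (an illustration with assumed names, not the system's actual code); the cut-edge lists are derived from the partitioner's output, and each cut edge is replicated to both shards:

```python
# Derive per-shard cut-edge metadata from a node-to-partition assignment.
edges = [(1, 2), (2, 3), (3, 4)]
part  = {1: 0, 2: 0, 3: 1, 4: 1}                   # METIS-style output

shard_cut_edges = {p: [] for p in set(part.values())}
for u, v in edges:
    if part[u] != part[v]:                         # edge crosses a boundary
        shard_cut_edges[part[u]].append((u, v))    # stored with u's shard...
        shard_cut_edges[part[v]].append((u, v))    # ...and replicated to v's
print(shard_cut_edges)                             # {0: [(2, 3)], 1: [(2, 3)]}
```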

3.2 Random Walk Generation

The created knowledge graph partitions (sub-graphs) are stored as shards at the computing nodes for processing and random walk generation. Figure 1(a) shows an example of two partitions, P1 and P2, with vertices {a, b, c, d, u, v} and {m, n, o, p, q, v, u}, respectively. The partitions are connected by the cut edge {u, v}. A modified node2vec algorithm attempts to generate random walks of length walk_length within each partition. However, if a walk in partition P1 encounters the cut edge {u, v}, transitioning to partition P2, the random walk is terminated and recorded as a partial walk. The node v is recorded as the exit node of P1 and an entry node of P2. Figure 1(b) shows such a partial random walk, interrupted because it attempted to cross to P2 via the cut edge {u, v}. The exit and entry nodes and the current walk length are stored with the partial walk. This data is later used to complete the partial walk, as described below.
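This behavior can be sketched as follows (simplified to an unbiased walk rather than node2vec's p/q-biased walk; the function and variable names are our own):

```python
# Per-shard walk generation: a walk that would cross a cut edge stops and is
# recorded as partial, together with its exit node in the other partition.
import random

def shard_walk(adj, part_of, my_part, start, walk_length):
    walk = [start]
    while len(walk) < walk_length:
        nxt = random.choice(adj[walk[-1]])
        if part_of[nxt] != my_part:        # cut edge reached: stop here
            return walk, nxt               # partial walk + its exit node
        walk.append(nxt)
    return walk, None                      # complete walk, no exit node

# Mirrors Fig. 1(a): node v belongs to P2, the remaining nodes to P1.
adj = {"a": ["b"], "b": ["a", "c", "u"], "c": ["b", "d"], "d": ["c"],
       "u": ["b", "v"]}
part_of = {"a": 1, "b": 1, "c": 1, "d": 1, "u": 1, "v": 2}
print(shard_walk(adj, part_of, 1, "a", 8))
```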

Fig. 1. Completion of walks from partial walks. (a) Nodes in partitions P1 and P2. (b) A random walk in P1 is interrupted at the exit node v. (c) Random walks W and sub-walks SW passing through the entry node v in P2. (d) Formation of a complete walk CW from a partial walk in P1 and a sub-walk in P2.

A random walk in a partition may traverse a node v that is also an endpoint of a cut edge. However, even though the node belongs to a cut edge, the walk does not terminate at v (as a partial walk) and continues within the same partition. A sub-walk is a sub-sequence of nodes in a random walk, beginning at a node v of a cut edge, but not crossing to the other partition. The modified node2vec algorithm also records and indexes all sub-walks within each partition. Figures 1(c1) and 1(c2) show random walks and sub-walks in P2. In Fig. 1(c3), no sub-walk exists, because the walk in the figure does not traverse node v, which belongs to the cut edge.
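The sub-walk index can be sketched as follows (an illustrative simplification with assumed names):

```python
# Index every suffix of a walk that begins at an entry node of a cut edge.
from collections import defaultdict

def index_sub_walks(walks, entry_nodes):
    idx = defaultdict(list)
    for w in walks:
        for i, n in enumerate(w):
            if n in entry_nodes:
                idx[n].append(w[i:])       # suffix starting at the entry node
    return idx

walks_p2 = [["m", "v", "p", "o", "n", "q"],   # traverses entry node v
            ["q", "n", "o", "p"]]             # never visits v: no sub-walk
print(dict(index_sub_walks(walks_p2, {"v"})))
# {'v': [['v', 'p', 'o', 'n', 'q']]}
```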

3.3 Accumulator and Graph Embeddings

The Random Walk Accumulator collects all the partial and complete random walks generated within all partitions. Complete walks need no further work, as they already have the desired walk length; however, partial walks must be extended to the required length. For each partial walk, a sub-walk starting at the partial walk’s exit node is randomly selected and concatenated to the partial walk to form a complete random walk. As shown in Fig. 1(d), the complete random walk a, b, d, c, u, v, p, o, n, q is created from the sub-walk in Fig. 1(c1) and the partial walk in Fig. 1(b). Once the full set of complete random walks is created, it can be used by the representation learning module.
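The stitching step can be sketched as follows (again a simplification with assumed names, not the authors' code):

```python
# Complete a partial walk with a randomly chosen matching sub-walk.
import random

def complete_walk(partial, exit_node, sub_walk_index, walk_length):
    sub = random.choice(sub_walk_index[exit_node])  # sub-walk starts at exit node
    return (partial + sub)[:walk_length]            # trim to the desired length

partial = ["a", "b", "d", "c", "u"]                 # stopped at cut edge {u, v}
index   = {"v": [["v", "p", "o", "n", "q"]]}
print(complete_walk(partial, "v", index, 10))
# ['a', 'b', 'd', 'c', 'u', 'v', 'p', 'o', 'n', 'q']  -- matches Fig. 1(d)
```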

Fig. 2. PartKG2Vec pipeline

4 Implementation

Figure 2 shows the processing pipeline used in PartKG2Vec, while Fig. 3 shows the architecture of the system. A knowledge graph is given as input and its embedding is produced as output. The KG2Index Converter indexes the knowledge graph using Lucene [22] and converts it into a format suitable for partitioning. The graph is partitioned into k partitions using METIS [11] by the Partition Engine. The partition data (the edge lists) are sent to the Graph Partition Segregator, which creates the final partitions and identifies the cut edges to be included with each partition.

The k partitioned sub-graphs (shards) are then sent to k processing nodes to produce random walks. All partial (PRW) and complete random walks (CRW) are transferred from the processing nodes to the master node. The master node runs the Accumulator, which gathers all the walks (partial walks, sub-walks, and complete walks) and other critical information. Already complete walks are simply retained, while the Accumulator uses partial walks and matching sub-walks to create complete walks of the desired length. At the end of the accumulation process, a corpus of complete random walks is finalized. This set of random walks is then used for representation learning. Finally, the Lucene index is used to restore the original node identifiers (URIs) in the knowledge graph embedding.

Fig. 3. PartKG2Vec architecture

5 Evaluation

Two popular datasets, Yago39K [23] and NELL [24], were used to evaluate PartKG2Vec. Yago39K contains a subset of the Yago knowledge base [25], which includes data extracted from Wikipedia, WordNet, and GeoNames. Yago39K contains 123,182 unique entities (nodes) and 1,084,040 edges, with 37 different relation types. NELL is a knowledge graph mined from Web documents and contains 49,869 unique nodes and 296,013 edges, with 827 relation types. The evaluation experiments discussed here were conducted on an Intel i7-based cluster.

Two experiments were used to evaluate the performance of PartKG2Vec. The first experiment was designed to evaluate the runtime of producing the embedding on the complete vs. the partitioned graph. The second experiment was intended to compare the graph embeddings produced by PartKG2Vec (based on the modified node2vec and Deepwalk algorithms) with the embeddings produced by the original algorithms on un-partitioned graphs. In both experiments, the two knowledge graphs (Yago39K and NELL) were partitioned into N = 10 partitions. We set all walk parameters to their default values, namely the number of walks per node to 10, the walk length to 80, the number of workers to 8, the window size to 10, and the walk parameters p and q both to 1.
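For reference, these settings can be collected in one place (the dictionary form is purely illustrative; it is not the system's actual configuration interface):

```python
# Walk and partitioning settings used in both experiments.
walk_params = dict(
    num_partitions=10,   # N, partitions per knowledge graph
    num_walks=10,        # walks started per node
    walk_length=80,
    workers=8,
    window_size=10,
    p=1.0, q=1.0,        # node2vec return / in-out parameters
)
```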

5.1 Experiment 1: Runtime Improvement

This experiment demonstrates the improvement in the runtime of random walk generation on the partitioned graph, as compared to random walks produced on the complete graph by the original algorithms. Figure 4 shows the runtime of node2vec and PartKG2Vec (with modified node2vec) on Yago39K and NELL, while Fig. 5 shows the runtime of Deepwalk and PartKG2Vec (with modified Deepwalk) on Yago39K and NELL. Graph preprocessing and the 10 iterations of random walk generation are shown. Node2vec did not require the steps before graph preprocessing, nor the accumulation of random walks; these extra steps were required only by PartKG2Vec, but they did not take considerable time, so we consider them insignificant. Figure 4 indicates that graph preprocessing in PartKG2Vec took 32% of the time required by node2vec (695 vs. 2175 s) on Yago39K. Ten iterations of random walk generation were used in both algorithms, but since PartKG2Vec runs in parallel, it took only 17.75% of the time required by the original node2vec (480 vs. 2702 s) on the complete knowledge graph.

Fig. 4. PartKG2Vec (node2vec) runtime comparison with node2vec on Yago39K and NELL

Fig. 5. PartKG2Vec (Deepwalk) runtime compared to Deepwalk on Yago39K and NELL

Similarly, Fig. 4 shows that the time required by PartKG2Vec for graph preprocessing was only 20.5% of the time required by node2vec (13.5 vs. 66 s) on the NELL dataset. Ten iterations of random walk generation were used in both algorithms, but since PartKG2Vec_N2V runs in parallel, it took only 6.5% of the time (51 vs. 794 s) taken by node2vec on the complete graph. Learning the graph embedding takes the same time for node2vec and PartKG2Vec, because at this point both algorithms work on a similar random walk pool.

Figure 5, with the results for the Yago39K dataset, shows that the time required for random walk generation by PartKG2Vec is only 21.6% of the time used by Deepwalk (46 vs. 213 s); for the NELL dataset, the time required by PartKG2Vec is only 28% of the time used by Deepwalk (23 vs. 82 s).

5.2 Experiment 2: Embedding Quality

This experiment evaluates the quality of the embeddings based on the random walks generated by PartKG2Vec vs. node2vec and Deepwalk. Again, the experiment used the same two knowledge graphs (NELL and Yago39K) and produced their embeddings with PartKG2Vec, node2vec, and Deepwalk, with varied dimensions d ∈ {128, 64, 32, 16}. The algorithms were executed 25 times for each dimension. To compare the produced embeddings, the average divergence scores [26] SA,d were computed. Broadly speaking, a divergence score is the result of comparing the original graph with a graph whose edges are re-created from the embedding produced for it. When comparing embeddings, a lower divergence score indicates a better embedding; conversely, a higher divergence score means that a given embedding is not as good.

Figure 6 shows the divergence scores of the embeddings of the Yago39K dataset produced by node2vec and PartKG2Vec_N2V (the modified node2vec), as well as the embeddings produced by Deepwalk and PartKG2Vec_DW (the PartKG2Vec implementation based on Deepwalk). The embeddings have very similar divergence scores at every dimension. Incidentally, node2vec (and PartKG2Vec_N2V) produces better embeddings than Deepwalk (and PartKG2Vec_DW). Comparing the embeddings produced for the NELL dataset leads to similar conclusions, as the divergence scores for node2vec vs. PartKG2Vec_N2V and for Deepwalk vs. PartKG2Vec_DW are very similar.

Fig. 6. Average divergence scores of embeddings on Yago39K and NELL produced by node2vec and Deepwalk and by their corresponding PartKG2Vec_N2V and PartKG2Vec_DW methods.

This demonstrates that the embeddings generated by node2vec on the original graph are very similar to those generated by PartKG2Vec_N2V, and that the embeddings generated by Deepwalk are similar to those from PartKG2Vec_DW.

6 Conclusions and Future Work

We have proposed PartKG2Vec, a system for creating embeddings of partitioned knowledge graphs. The method uses modified node2vec and Deepwalk random walk algorithms to take advantage of the partitioning and run in parallel. Our experiments showed that the embeddings produced on the original knowledge graphs are very similar to those produced by our method on the partitioned graphs. Importantly, PartKG2Vec offers significant performance improvements over the embedding algorithms run on the unpartitioned (original) knowledge graphs, which improves the runtime of embedding very large graphs.

In the future, we intend to study other embedding algorithms utilizing different types of random walks, especially those incorporating the semantics of knowledge graphs, such as metapath2vec [29] and RegPattern2Vec [30].