Mobile Link Prediction: Automated Creation and Crowd-sourced Validation of Knowledge Graphs

Building trustworthy knowledge graphs for cyber-physical social systems (CPSS) is a challenge. In particular, current approaches relying on human experts have limited scalability, while automated approaches are often not accountable to users, resulting in knowledge graphs of questionable quality. This paper introduces a novel pervasive knowledge graph builder that brings together automation, experts' knowledge, and crowd-sourced citizens' knowledge. The knowledge graph grows via automated link predictions using genetic programming that are validated by humans for improving transparency and calibrating accuracy. The knowledge graph builder is designed for pervasive devices such as smartphones and preserves privacy by localizing all computations. The accuracy, practicality, and usability of the knowledge graph builder are evaluated in a real-world social experiment that involves a smartphone implementation and a Smart City application scenario. The proposed knowledge graph building methodology outperforms the baseline method in terms of accuracy while demonstrating efficient calculations on smartphones and the feasibility of the pervasive human supervision process in terms of high interaction throughput. These findings promise new opportunities to crowd-source and operate pervasive reasoning systems for cyber-physical social systems in Smart Cities.


I. INTRODUCTION
Mobile cyber-physical systems involve humans utilizing mobile services in their social contexts. This inclusion of human actors extends the classical cyber-physical systems paradigm [1] to cyber-physical-social systems (CPSS) [2]. These systems integrate both social and physical systems through intelligent human-machine interactions in cyber-physical space [3].
Knowledge graphs store information in a graph structure and are often utilized in these CPSSs to improve services such as route navigation [4], health recommendations [5,6], or question answering [7]. In particular, knowledge graphs improve the performance of learning algorithms at predicting unobserved relationships between entities in an application domain [8,9]. Nevertheless, manually building knowledge graphs may be impractical and unscalable [10]. Hence, systems utilizing link prediction methods are proposed to automate the building of knowledge graphs [11].
These CPSSs are designed either explicitly or implicitly for values such as usability [12], autonomy [13], or privacy [14]. The successful implementation of these values into CPSSs can determine their adoption by humans [15,16,17] and thus should be explicitly accounted for in the design phase [18].
Hence, this work applies a value-sensitive design methodology [19,20,18,21,22] that explicitly considers values such as privacy and accountability to design a CPSS in the form of a knowledge graph builder that constructs a knowledge graph by adding links. By utilizing a novel link prediction methodology, the knowledge graph building is automated. In particular, users are assisted in identifying missing relationships in a knowledge graph via a link prediction method. By following a privacy-by-design approach, both the knowledge graph and the link prediction method are deployed locally on users' mobile phones without third-party access. The automated knowledge graph building remains accountable-by-design to humans by letting users supervise the accuracy of recommendations via accepting or rejecting recommended links. Moreover, as this feedback is in turn utilized to train the link prediction method, users can control the calibration of their machine intelligence. This value-sensitive design approach builds a trustworthy domain-specific knowledge graph about users' reality that can improve services provided by CPSSs such as privacy-preserving recommenders.
The contributions of this work are the following:
• An automated knowledge graph builder for CPSSs that is accountable via human-supervision, preserves the privacy of its users and runs locally on smart phones. In particular, the novel approach connects expert knowledge, automation and crowd-sourcing to collaboratively build a trustworthy and personalized knowledge graph.
• The extension of an existing link prediction methodology [23] with structural semantic and temporal information.
• Extended and novel similarity metrics that measure the probability of link formation between two nodes of a knowledge graph.
• Identification of dominant metric ensembles that guide link prediction in knowledge graphs in a smart city application scenario.
This paper is organized as follows: In Section II, content-based recommenders and knowledge graph building via link prediction are discussed. A data model for knowledge graphs and its applications for digital assistance is introduced in Section III, while the automated and privacy-preserving knowledge graph builder is then outlined in Section IV. Thereafter, Section V illustrates the methodology of the conducted experiment and Section VI presents the evaluation. Section VII summarizes the findings and Section VIII draws a conclusion and gives an outlook on future work.

II. BACKGROUND AND LITERATURE REVIEW
Intelligent CPSSs in the form of recommenders are studied to sort through information and to make personalized recommendations to individual users [24]. Two types of methods are applied in these recommender systems [25]: user-based collaborative filtering and content-based filtering. The former is often not privacy-preserving as it relies on collecting sensitive information from users [26,27,28]. In contrast, the latter relies on informative content descriptors [26] in the form of a common and transparent information source that can be constructed by expert knowledge [14], crowd-sourced information [29,30], or automation [31]. Often this approach does not rely on sensitive user information and thus can better preserve users' privacy. In particular, this approach optimizes recommendations by matching users' preferences (e.g. watched products) with product information. A novel approach in content-based recommender systems that follows a value-sensitive design improves product recommendations while shopping by matching local user personalization with a centrally maintained information source in the form of a knowledge graph [14], which has been shown to improve recommender systems [32]. By performing this matching on the users' phones, the accuracy of recommendations is improved while users' privacy is preserved. Nevertheless, as knowledge graphs are often static and incomplete [33,34], this approach misses the opportunity to improve recommendations by letting users build the utilized knowledge graph [9].
In general, such knowledge graph building can be performed by (i) human experts, (ii) crowd-sourcing, or (iii) automation [35]. Utilizing human experts results in highly accurate knowledge graphs, but lacks scalability due to the limited available human resources [36]. Crowd-sourcing information scales better but may result in less accurate knowledge graphs [37]. Additionally, the scalability of that approach, though increased, can also become saturated, as the slowdown of Wikipedia's growth indicates [38]. Hence, automating knowledge graph building is promising to increase its scalability. Nevertheless, it is a challenge to determine the accuracy of automatically constructed knowledge graphs, which reduces their trustworthiness [35].
Two tasks in knowledge graph building are identified [39]: Knowledge graph completion and error detection. The former focuses on adding new instances (e.g. links) to the knowledge graph whereas the latter identifies and removes erroneous information from the knowledge graph. To tackle these challenges with automation and thus scale up the knowledge graph building, two types of methods are utilized [35]: latent feature and graph feature-based methods. Latent feature-based methods often lack the capability to account for new entities entering the knowledge graph as those are not considered in the latent feature calculations [40]. Moreover, these methods utilize the whole knowledge graph in their calculation which can result in limited scalability and privacy concerns [9]. In contrast, graph-based methods utilize the knowledge graph directly to calculate features. Three types of methods are identified that utilize graph-based information [35]: Similarity measures, rule mining and inductive logic programming, and path rank algorithms.
Similarity-based methods are the most commonly used approach in link prediction [41]. In this approach, a score is assigned to new candidate links, and the top-k links with the highest score are recommended [42]. These algorithms require no domain knowledge to compute the similarity scores [43] and can identify homophily patterns in knowledge graphs [42]. Depending on the structural information utilized in the calculations, similarity measures can be clustered into three groups [44]: local, quasi-local, and global similarity metrics. Compared to global metrics, local similarity metrics are more efficient to compute and parallelizable, but are restricted to distance-two nodes (neighbors of neighbors) [44]. Quasi-local similarity metrics are less efficient than local metrics but, in contrast to those, can assign similarity scores to nodes that are further apart [44].
Recently, knowledge graphs grew to networks consisting of thousands of different object and link types [45]. These networks are often incomplete and change dynamically, which makes mining and analysis challenging [46,47]. In particular, link prediction in such networks has to model topological as well as temporal and semantic influences between various types of relationships and to identify the underlying mechanisms that drive the formation of new relationships [48]. Extensive reviews of link prediction are outlined in Wang et al. [47] for social networks, in Shi et al. [49] for networks with more than one relationship type and in Martinez et al. [44] for complex networks. In the following, an overview of the link prediction literature that is important for this work is illustrated.
Tylenda et al. [50] find that temporal information about changes in knowledge graphs is a dominant feature in link prediction. This has been confirmed by Yang et al. [51] by introducing supervised and unsupervised methods for link prediction in knowledge graphs. Moreover, the authors introduce the multi-relational influence propagation metric for heterogeneous networks. Likewise, other researchers developed measures and algorithms utilizing the ontology of knowledge graphs. For instance, Maedche et al. [52] introduce relation and taxonomy similarity metrics to measure the similarity between any two objects in a knowledge graph by analyzing ontological information. It was shown that these measures perform well for cluster analysis [53]. Likewise, Opuszko et al. [54] predict links between actors using ontology-based similarity measures. They show that including these measures can improve the prediction performance. Nonetheless, they also show that their results are often not easily interpretable and that it is not obvious how to weight the different measures when combined.
In particular, this often requires domain knowledge and manual effort [55]. This is confirmed by Brando et al. [56], who state that the weighting of different metrics is a grand challenge in the context of link prediction.
Bliss et al. [23] address this problem by utilizing a genetic algorithm for link prediction, which adjusts the weights of the similarity measures by optimization. They use local similarity measures to estimate the likelihood of an unobserved link existence. The strength of this approach compared to other link prediction algorithms is that it neither requires the assumption of network classes nor prior knowledge about the analyzed knowledge graph, as the weights are calculated by the optimization strategy [57]. It is shown that this approach produces comparable results to other link prediction approaches while enabling researchers to analyze the network's driving mechanisms. In particular, the change of weights of different similarity measures during a period of time or for different networks can be analyzed [23]. Often, neither is a single metric dominant for predictions [58] nor is the combination of metrics stable across application domains. Thus, identification of dominant metric weights in novel application domains is required. Nevertheless, Bliss et al. [23] primarily focus on topological information of a homogeneous network [57], and thus they neither investigate the performance of temporal and ontology-based similarity metrics nor the applicability of their method on heterogeneous or multi-dimensional networks consisting of a multitude of link and node types.
In summary, current approaches in recommender systems often lack privacy-preservation and rely on a centralized completion of global knowledge graphs which does not scale well when applied in a smartphone setting. The identification of dominant combinations of similarity metrics for an application domain is challenging and the approach of Bliss et al. [23] does not consider temporal and ontological similarity metrics which might improve prediction accuracy.
In order to close these identified gaps, this work extends the method of Bliss et al. [23] with temporal, ontology-based, local, and quasi-local similarity measures that are applied to a multi-dimensional and heterogeneous knowledge graph to identify the dominant similarity metrics in a smart city application scenario. The method is then utilized in a human-supervised and privacy-preserving knowledge graph builder to enable users to build a personalized knowledge graph in a given application scenario that collaboratively combines experts' knowledge, crowd-sourced information, and automation.

PREDICTION
This article focuses on knowledge graphs that are modeled as an ontology. Such a structure enhances the capability of machines and algorithms to analyze and interpret information [52]. In the following, the concept of an ontology is introduced (Section III-A) and its data model is defined (Section III-B). Moreover, applications of knowledge graphs are illustrated (Section III-C).

Symbol      Explanation
G(V, E)     graph consisting of nodes in V and links in E
k_u         degree of node u
P_n         path of length n between u and v
J(u)        set of all links in which u is subject
J(u, v)     set of realized links in which u is the subject and v the object node
E(j)        set of node pairs that are connected via a relation identifier j

A. Ontology
Ontologies formally define types, properties, and relationships between entities that are applied to a concrete domain and enable the construction of knowledge graphs [40]. Ontologies and all related concepts are rigorously defined in Maedche et al. [52]. In the following, those terms that are relevant for this work are introduced. An example of an ontology and an instance of it (a metadata structure), which together construct a knowledge graph, are depicted in Figure 1. An ontology consists of concepts (e.g. human in Figure 1) and relation identifiers (e.g. waited at in Figure 1). The concepts are structured in a concept hierarchy (e.g. human is a sub-concept of root). Concepts are instantiated by instances (e.g. Albert Einstein is an instance of a human in Figure 1). Moreover, a concrete relationship is an instantiation of a relationship between two instances (e.g. Albert Einstein - born in - Germany is a concrete relationship in Figure 1).

B. Directed graphs as a data model for knowledge representations
As depicted in Figure 1, an instance of a knowledge graph can be modeled as a graph. In the following necessary definitions are given.
Let $i \in \{1, \dots, N\}$ be a concept (e.g. human, country, bus stop) and $N$ the number of different concepts. Then $V_i$ is the set of nodes of the same concept $i$ and $V = \bigcup_{i=1}^{N} V_i$ is the set of all nodes. An instance of a concept $i$ can be denoted as $u_i$ or as $u \in V_i$. A relation identifier $j \in \{1, \dots, M\} := S_M$ (e.g. waited at, born) can connect two nodes $u, v$, with $M$ being the number of different relation identifiers. A concrete relationship is denoted as a triplet $(u, v, j)$, where $u, v \in V$ and $j$ denotes the relation identifier. These links are directed, where $u$ is the subject and $v$ the object. The set of realized triplets of link type $j$ is denoted as $E_j$, and $E = \bigcup_{j=1}^{M} E_j$ is the set of all realized links. An element $e \in E$ is referred to as a concrete relationship or link in the following. The graph or network with nodes in $V$ and links in $E$ is denoted as $G(V, E)$. If $N > 1$ the graph $G$ is called heterogeneous, and if $M > 1$ it is called multi-dimensional. On such a multi-dimensional and heterogeneous network a similarity measure $s_i$ can be defined rigorously as in Chen et al. [59]. Each $s_i$ measures how similar two nodes $u, v$ are. The linear combination of such metrics is also a similarity metric [59]. A similarity measure can be normalized onto the range $[0, 1]$.
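To make the data model concrete, the following is a minimal Python sketch of a directed graph of typed triplets. The class and method names (`Triple`, `KnowledgeGraph`, `add_node`, `add_link`) are hypothetical and not taken from the paper. As a simplifying assumption, N and M are counted from the instantiated nodes and links rather than from the ontology.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str   # node u
    obj: str       # node v
    relation: str  # relation identifier j


class KnowledgeGraph:
    """Directed, typed graph G(V, E) with one concept label per node."""

    def __init__(self):
        self.concept_of = {}                        # node -> concept i
        self.links_by_relation = defaultdict(set)   # j -> E_j

    def add_node(self, node, concept):
        self.concept_of[node] = concept

    def add_link(self, u, v, j):
        if u not in self.concept_of or v not in self.concept_of:
            raise KeyError("both endpoints must be added as nodes first")
        self.links_by_relation[j].add(Triple(u, v, j))

    @property
    def links(self):
        """E as the union of all E_j."""
        return set().union(*self.links_by_relation.values())

    def is_heterogeneous(self):
        # Assumption: N is the number of concepts that have instances.
        return len(set(self.concept_of.values())) > 1

    def is_multidimensional(self):
        # Assumption: M is the number of relation identifiers in use.
        return len(self.links_by_relation) > 1


kg = KnowledgeGraph()
kg.add_node("Albert Einstein", "human")
kg.add_node("Germany", "country")
kg.add_link("Albert Einstein", "Germany", "born in")
```

With two concepts and a single relation identifier, this toy graph is heterogeneous but not multi-dimensional.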

C. Automated knowledge graph building via link prediction
In this work, link prediction is utilized to automate the completion of a knowledge graph by predicting links between existing instances. In the following, the link prediction problem is introduced (Section III-C1). A method is then proposed in Section IV.
1) Problem formulation: The link prediction task on a multi-dimensional and heterogeneous graph $G$ (Section III-B) can be stated as follows: Let $G_p(V, E_p) \subset G(V, E)$ be a subgraph such that $E_p \subset E$. The task of link prediction is to identify those links $e \in E$ that are currently not observed in $E_p$. Let $u, v \in V$, $j \in S_M$, and let $s_1 : V \times V \times S_M \to [0, 1]$ and $s_2 : V \times V \to [0, 1]$ be normalized similarity measures. A similarity measure models the probability of an unobserved link to be established between the nodes. Two types of predictions are considered in this work:
• Existence prediction is utilized in one-dimensional knowledge graphs ($M = 1$) and predicts whether any link exists between the two nodes $u$ and $v$ ($\exists j \in S_M : (u, v, j) \in E$), thus without specifying the relation identifier. The probability of such a link formation is defined as the normalized similarity measure $s_2(u, v)$ between the two nodes.
• Semantic prediction is utilized in multi-dimensional knowledge graphs ($M > 1$) and predicts which relation identifier connects the two nodes $u$ and $v$. The probability of a link $j \in S_M$ to be formed is defined as the similarity measure $s_1(u, v, j)$ between these two nodes along the candidate link.
This work distinguishes between these two types of predictions because of computational considerations: Heterogeneous and multi-dimensional knowledge graphs have complex dependency structures [48]. Computing the semantic type of a link between two nodes requires distinguishing the formation mechanism for each link type [48], which is computationally costly. Hence, this work predicts the semantic type of a relationship only between nodes that are already known to be linked. Existence prediction is utilized for nodes that are not connected yet.

IV. A HUMAN-SUPERVISED AND PRIVACY-PRESERVING KNOWLEDGE GRAPH BUILDER
In this section, the knowledge graph builder is introduced. It performs link prediction (Section III-C1) on a heterogeneous and multi-dimensional knowledge graph (Section III-B) utilizing an optimization mechanism in the form of genetic programming (Section IV-A) to optimize the weights of various similarity metrics (Section IV-B3). These weighted metrics then facilitate the recommendation of missing links (Section III-C1) in a knowledge graph (Section III-A). Moreover, the builder is supervised by the user and preserves their privacy.
In the following, a background on genetic programming is given (Section IV-A) before the link prediction method is introduced (Section IV-B). Finally, the knowledge graph builder is illustrated (Section IV-C).

A. Background: genetic programming
Genetic programming is an optimization method that is utilized in symbolic regression to identify underlying functions to given data points that explain their dependencies [60]. Compared to other optimization strategies, genetic programming provides solutions for large, poorly defined search spaces that are high-dimensional, multi-modal, and noisy [61]. Due to its flexibility in adjusting to diverse problems [62], genetic programming has gained an increased interest in diverse research communities such as software improvement [63], image processing [64], production scheduling [65], and machine learning [66].

B. Link prediction method
The method is an extension of the link prediction algorithm found in Bliss et al. [23] with temporal and ontology-based metrics added: The core idea of the prediction algorithm is to measure topological, temporal and semantic similarity metrics $s_i$ between a target node $u$ and a candidate node $v$ on a multi-dimensional and heterogeneous graph $G$ and then to predict, based on a weighted combination of these metrics, whether a link $j$ between these nodes exists (existence prediction, Section III-C1) or what type of link should be formed (semantic prediction, Section III-C1).
The link prediction method consists of two main steps and is depicted in Algorithm 1. In the first step, the weights $a_i$ are obtained via genetic programming (Section IV-A). Then, Equation 1 is utilized to predict which links are unobserved in a given candidate set that was obtained by a baseline heuristic. In both steps, semantic and existence prediction are distinguished, as formulated in Section III-C1.
Algorithm 1 Link Prediction Algorithm
procedure PREDICT(u)    // obtain link existence or type information
    a ← getWeights()    // see Section IV-B1
    r ← predictType(u, a) or predictExistence(u, a)    // see Section IV-B2
    return r    // list of links (semantic prediction) or nodes (existence prediction) with assigned similarity values as calculated by Equation 1
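The second step of the method can be sketched in Python as follows. The sketch assumes that the weighted combination is a plain weighted sum of the normalized metrics, which matches the "linear combination" mentioned in Section III-B but is not confirmed here as the exact form of Equation 1. The function names and the two toy metrics are hypothetical.

```python
def combined_score(u, v, metrics, weights):
    """Weighted sum of normalized similarity metrics s_i(u, v) with weights a_i."""
    if len(metrics) != len(weights):
        raise ValueError("one weight per metric is required")
    return sum(a * s(u, v) for a, s in zip(weights, metrics))


def rank_candidates(u, candidates, metrics, weights, top_k):
    """Order candidate nodes by descending combined score and keep the top_k."""
    scored = [(combined_score(u, v, metrics, weights), v) for v in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy metrics on strings, for illustration only.
def same_initial(u, v):
    return 1.0 if u[0] == v[0] else 0.0


def same_length(u, v):
    return 1.0 if len(u) == len(v) else 0.0


ranking = rank_candidates("anna", ["alex", "bob", "carl"],
                          [same_initial, same_length], [0.75, 0.25], top_k=2)
# ranking == [(1.0, "alex"), (0.25, "carl")]
```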
1) Training - Obtaining weights $a_i$: Algorithm 2 depicts the weight calculation algorithm: It first calculates a training set and then uses this training set as an input for the genetic programming algorithm, which after termination returns the weight vector.

Algorithm 2 Weight calculation
    List<TrainingInstance> l ← getTrainingInstances()
    a ← calcGenetic(l)
    return a

The training set generation algorithm creates training sets consisting of positive and negative training instances. In link existence prediction, an instance has the form $(u, v, \{0, 1\})$ with $u, v \in V$. In the case of semantic prediction, it has the form $(u, v, j, \{0, 1\})$ with $u, v \in V$ and $j \in S_M$. For both types, 1 indicates that a link (of type $j$) exists between nodes $u$ and $v$, and 0 that no link exists. The training set consists of 50% positive (1) and 50% negative (0) instances. Depending on the available information, the algorithm operates in two modes for the generation of negative instances:
(i) Knowledge of non-existent links: It is known that specific links of type $j$ do not exist between some nodes $u, v \in V$. This often requires manual work but is considered the gold standard in method evaluation [39].
(ii) No knowledge of non-existent links: It is assumed that all unobserved links in the network are non-existent. Due to the incompleteness of knowledge graphs [33,34], this approach is considered the silver standard in evaluation [39].
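The balanced training set for existence prediction could be generated along the following lines. This is a sketch under stated assumptions: the function name `build_training_set` and its parameters are hypothetical, and the two modes for negative instances are selected by whether a set of rejected pairs is supplied.

```python
import random


def build_training_set(nodes, links, size, rejected=None, rng=None):
    """Return a balanced list of (u, v, label) instances for existence prediction.

    links    : set of observed (u, v) pairs, used as positive instances
    rejected : optional set of (u, v) pairs known not to exist (gold standard);
               if None, unobserved pairs are assumed non-existent (silver standard)
    """
    rng = rng or random.Random()
    half = size // 2
    positives = rng.sample(sorted(links), min(half, len(links)))
    if rejected is not None:
        pool = sorted(rejected)
    else:
        pool = [(u, v) for u in nodes for v in nodes
                if u != v and (u, v) not in links]
    negatives = rng.sample(pool, min(len(positives), len(pool)))
    positives = positives[:len(negatives)]  # keep the 50/50 balance
    return [(u, v, 1) for u, v in positives] + [(u, v, 0) for u, v in negatives]


nodes = ["a", "b", "c", "d"]
links = {("a", "b"), ("b", "c"), ("c", "d")}
training_set = build_training_set(nodes, links, size=4, rng=random.Random(0))
```

If fewer negatives than positives are available, the sketch truncates the positives so that the set stays balanced.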
The details of how genetic programming is implemented in this work can be found in Appendix A.
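As an illustration of how metric weights can be evolved from training instances, the sketch below uses a plain genetic algorithm with truncation selection, uniform crossover, and Gaussian mutation. This is a swapped-in, simplified technique and not necessarily the configuration described in Appendix A. The fitness function (classification accuracy at a fixed threshold), the normalization of weights to sum to one, and all parameter values are assumptions.

```python
import random


def fitness(weights, instances, metrics, threshold=0.5):
    """Fraction of training instances (u, v, label) classified correctly."""
    correct = 0
    for u, v, label in instances:
        score = sum(a * s(u, v) for a, s in zip(weights, metrics))
        correct += int((score >= threshold) == bool(label))
    return correct / len(instances)


def normalize(weights):
    """Clip weights to be non-negative and rescale them to sum to one."""
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]


def evolve_weights(instances, metrics, generations=30, population_size=20,
                   mutation_scale=0.1, rng=None):
    rng = rng or random.Random()
    n = len(metrics)
    population = [normalize([rng.random() for _ in range(n)])
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, instances, metrics), reverse=True)
        parents = population[:population_size // 2]           # truncation selection
        children = []
        while len(parents) + len(children) < population_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p, q)]   # uniform crossover
            child = [w + rng.gauss(0.0, mutation_scale) for w in child]  # mutation
            children.append(normalize(child))
        population = parents + children
    return max(population, key=lambda w: fitness(w, instances, metrics))


# Toy setup: one informative metric and one misleading metric.
linked = {("a", "b"), ("b", "c")}
metrics = [lambda u, v: 1.0 if (u, v) in linked else 0.0,
           lambda u, v: 0.0 if (u, v) in linked else 1.0]
instances = [("a", "b", 1), ("b", "c", 1), ("a", "c", 0), ("c", "a", 0)]
best = evolve_weights(instances, metrics, rng=random.Random(1))
```

Because the parents are carried over unchanged, the best fitness in the population never decreases between generations.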
2) Prediction: Figure 2 illustrates the utilized link prediction method: In a first step, for a given target node $u$ (e.g. Max Frisch in Figure 2) a candidate set $c$ is calculated by a baseline method. This method returns a set consisting of candidate nodes $v_i$ (existence prediction) or of candidate node-relationship pairs $(v_i, j)$ (semantic prediction), $j \in S_M$. Then, in a second step, Equation 1 is applied on the candidate set to obtain a similarity score for each candidate $(u, v_i)$, respectively $(u, v_i, j)$. The set is ordered based on these scores and the top entries are utilized for link prediction. In the evaluation (Section VI), the accuracy of these predictions is compared to the accuracy of drawing instances randomly from the baseline's candidate set. Two baseline methods are utilized for comparison, one for the existence and one for the semantic prediction. The existence baseline method (Algorithm 3) for a target node $u$ considers both exploitation and exploration of the existing knowledge graph: In order to exploit topological information, 50% of the candidate set consists of neighbors of existing neighbors of $u$ that are not already connected to $u$. The other half is constructed by exploring the remaining knowledge graph, that is, by including not yet connected nodes randomly with equal probability.

Algorithm 3 Existence baseline method
    t ← getNeighbors(u)
    s ← getNeighborsForEach(t)
    s.remove(t)
    result.add(random(s, N/2))
    s ← getAllNodes()
    s.remove(t)
    result.add(random(s, N/2))
    return result    // candidate set of nodes
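The existence baseline can be sketched in Python as shown below. The function name and the adjacency-dictionary representation are hypothetical. Unlike the pseudocode, the sketch also excludes the target node itself and avoids drawing the same node twice, which is an assumption about the intended behavior.

```python
import random


def existence_candidates(u, neighbors, all_nodes, n, rng=None):
    """Candidate nodes for target u: half neighbors-of-neighbors, half random nodes.

    neighbors : dict mapping each node to the set of its adjacent nodes
    n         : desired size of the candidate set
    """
    rng = rng or random.Random()
    direct = neighbors.get(u, set())
    excluded = direct | {u}

    # Exploitation: neighbors of neighbors that are not yet connected to u.
    two_hop = set()
    for t in direct:
        two_hop |= neighbors.get(t, set())
    two_hop -= excluded
    result = set(rng.sample(sorted(two_hop), min(n // 2, len(two_hop))))

    # Exploration: any other unconnected node, drawn uniformly at random.
    remaining = set(all_nodes) - excluded - result
    result |= set(rng.sample(sorted(remaining), min(n // 2, len(remaining))))
    return result


adjacency = {"u": {"a"}, "a": {"u", "b", "c"}, "b": {"a"}, "c": {"a"},
             "d": set(), "e": set()}
candidates = existence_candidates("u", adjacency, list(adjacency), n=4,
                                  rng=random.Random(0))
```

In this toy graph there are exactly two distance-two nodes and two remaining nodes, so the candidate set contains all four of them regardless of the random seed.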
Because instances in knowledge graphs are often linked by more than one relation identifier, the semantic baseline method (Algorithm 4) exclusively exploits topological information: The candidate set is constructed from the neighbors of $u$, which are included in the candidate set with equal probability. Hence, in contrast to the existence baseline, the full knowledge graph is not explored, in order to decrease the computational complexity as outlined in Section III-C1. Each of the selected nodes is then accompanied by possible relation identifiers of links that can still be formed between the selected node and $u$.

Algorithm 4 Semantic baseline method
    s ← getNeighbors(u)
    for Node c : s do    // for each candidate
        rel ← chooseNonExistingRelationship(u, c)
        l.add(c, rel)
        if l.size() > N then
            break
    return l    // candidate set of pairs (node, link)

3) Utilized similarity metrics: The utilized metrics are clustered in three groups, characterized by their applicability in existence prediction, semantic prediction or both predictions. Moreover, the metrics are normalized to take values in the interval [0, 1]. In the following, additional notation to illustrate the metrics is introduced (Section IV-B3a) before the utilized metrics are illustrated in greater detail (Section IV-B3b).
a) Additional notation: Besides the notation introduced in Section III-B, the following is required to define the utilized similarity metrics: The neighborhood of $u$ for link type $j$ is defined as $\Gamma(u, j) = \{v \in V \mid (u, v, j) \in E_j\}$. The neighborhood of $u$ is defined as $\Gamma(u) = \bigcup_{j \in S_M} \Gamma(u, j)$. In addition, $k_u$ is the degree of node $u$. A path of length $n$ between $u$ and $v$ is denoted as $P_n$. $J(u) = \{j \in S_M \mid \exists v \in V : (u, v, j) \in E\}$ is the set of all link types in which $u$ is the subject node. $J(u, v) = \{j \in S_M \mid (u, v, j) \in E\}$ is the set of realized link types in which $u$ is the subject and $v$ the object node. Moreover, $E(j) = \{(u, v) \mid (u, v, j) \in E\}$ is the set of node pairs that are connected via a relation identifier $j$, and $N(j) = \{u \in V \mid \exists v \in V : (u, v, j) \in E\}$ is the set of subject nodes of relation identifier $j$.
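The notation above maps directly onto set comprehensions over triplets. The following Python sketch is illustrative only: the function names are hypothetical, and the Jaccard coefficient is used as a well-known example of a local similarity metric normalized to [0, 1], without implying that it is one of the metrics in Table II.

```python
def gamma(links, u, j=None):
    """Γ(u, j): objects v with (u, v, j) in E; all relation types if j is None."""
    return {v for (s, v, r) in links if s == u and (j is None or r == j)}


def relations_from(links, u):
    """J(u): relation identifiers of links in which u is the subject."""
    return {r for (s, _, r) in links if s == u}


def relations_between(links, u, v):
    """J(u, v): relation identifiers realized from subject u to object v."""
    return {r for (s, o, r) in links if s == u and o == v}


def jaccard(links, u, v):
    """|Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|, a local metric with values in [0, 1]."""
    gu, gv = gamma(links, u), gamma(links, v)
    union = gu | gv
    return len(gu & gv) / len(union) if union else 0.0


links = {("anna", "stop_1", "waited at"), ("anna", "zurich", "visited"),
         ("ben", "stop_1", "waited at"), ("ben", "bern", "visited")}
similarity = jaccard(links, "anna", "ben")  # one shared neighbor out of three
```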
b) Metric description: Table II depicts the 27 utilized metrics. As illustrated in Section III-C1, this article distinguishes between semantic and existence prediction. Not all metrics can be utilized in both of these types of predictions (Columns 4 and 5 in Table II). Nineteen metrics are topological metrics that utilize the metadata structure of the knowledge graph (Section III-A), two are time-based and 14 utilize semantic information of the knowledge graph by using its ontology (Column 6 in Table II). Four metrics are introduced in this paper (ID 16-19 in Table II), two metrics found in literature are modified such that they are normalized to take values in the range [0, 1] (ID 2, 3 in Table II) and three metrics are adjusted such that they can be utilized with the information model (ID 9-11 in Table II). These novel and modified metrics are illustrated in greater detail in the Appendix.

C. Knowledge graph builder for cyber-physical systems
The knowledge graph builder consists of the link prediction method, a feedback mechanism called tinder view, knowledge graph visualizations, and a metric weight dashboard. They are integrated into users' daily life by deploying the builder as a mobile application on users' phones, as depicted in Figure 3. The user can supervise the link prediction method by giving feedback via the tinder view, as depicted in Figure 4. In this way, the completion of the knowledge graph happens in a supervised way: links are recommended by the algorithm, and the final decisions on their acceptance or rejection are made by the users, which is a necessary condition for users' autonomy [77]. Based on this supervision (information about existing and non-existing links), the weights of the link prediction method are updated, as illustrated in Section IV-B. In particular, by utilizing this evaluation strategy, which is considered the gold standard in validation [39], the builder continuously collects information about non-existing links that is utilized in the training set construction of the genetic programming. The obtained weights are presented to the users in the metric dashboard. Users can adjust the metric weights in the dashboard and thus control the link prediction mechanism. In addition, they can learn how the algorithm is configured and thus reason about link predictions. Finally, as the data and algorithms are deployed locally without the requirement to communicate with a centralized server, the privacy of the users is preserved. In particular, no information about the metric weights or the knowledge graph is revealed to third parties.

Fig. 4. Illustration of the feedback mechanism, as presented to the experiment participants. Links can be rejected or accepted by swipes. In the example, the user is asked if the contact Alex Bachler is connected with Cathy Zelkowsk. Any instance of the knowledge base such as bus stops or cities could be presented instead of a contact.
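The supervision loop can be pictured as a small on-device store that turns swipes into labeled training instances. The sketch below is hypothetical: the class `FeedbackStore` and its methods are not part of the described application, and it assumes that a later answer for the same pair overrides an earlier one.

```python
class FeedbackStore:
    """Collects accepted and rejected link suggestions locally on the device."""

    def __init__(self):
        self.accepted = set()
        self.rejected = set()

    def record(self, u, v, accepted):
        pair = (u, v)
        (self.accepted if accepted else self.rejected).add(pair)
        # Assumption: the latest answer for a pair wins.
        (self.rejected if accepted else self.accepted).discard(pair)

    def training_instances(self):
        """Gold-standard instances (u, v, label) built from user feedback."""
        return ([(u, v, 1) for u, v in sorted(self.accepted)] +
                [(u, v, 0) for u, v in sorted(self.rejected)])


store = FeedbackStore()
store.record("Alex Bachler", "Cathy Zelkowsk", accepted=True)
store.record("Alex Bachler", "Paradeplatz", accepted=False)
instances_from_feedback = store.training_instances()
```

The rejected pairs are what provide known non-existent links for the training set, so no assumption about unobserved links is needed for them.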

V. EXPERIMENT METHODOLOGY
The knowledge graph builder is evaluated in a social experiment. Additionally, the system is utilized to investigate the performance of various similarity metrics in predicting links. In the following, the methodology of the experiment is illustrated. In particular, hypotheses are introduced (Section V-A), the data schema is described (Section V-B) and the experiment execution is outlined (Section V-C).
A. Hypotheses and Operationalisation
1) Knowledge graph builder: The usability of the knowledge graph builder and its accuracy in predicting unobserved links in a knowledge graph are investigated by the following hypotheses:
a) The knowledge graph builder improves the accuracy of link prediction compared to the baseline heuristic: The knowledge graph builder utilizes a genetic programming approach to estimate the weights of various similarity measures (Table II). These weights are then utilized to improve the accuracy in link prediction of the baseline method (Figure 2 and Algorithms 3-4).
Both methods are evaluated in the following way: Users rate link suggestions (true/false) via the feedback mechanism (Section IV-C) of the knowledge graph builder. One-third of these suggested links are drawn randomly from the baseline's candidate set and two-thirds are taken from the highest-ranked results of the link prediction method (Figure 2). The accuracy in the form of true positives is then evaluated.
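The evaluation protocol can be sketched in Python as follows. The function names and the source labels `"baseline"` and `"predicted"` are hypothetical, and the share of accepted suggestions per source is used here as a simple stand-in for the true-positive measure.

```python
import random


def draw_suggestions(candidates, ranked, total, rng=None):
    """Mix one-third random baseline candidates with two-thirds top-ranked links."""
    rng = rng or random.Random()
    n_baseline = total // 3
    n_ranked = total - n_baseline
    drawn = rng.sample(candidates, min(n_baseline, len(candidates)))
    suggestions = ([(c, "baseline") for c in drawn] +
                   [(c, "predicted") for c in ranked[:n_ranked]])
    rng.shuffle(suggestions)  # users should not see which source a link comes from
    return suggestions


def acceptance_rate(feedback, source):
    """Share of suggestions from `source` that users rated as true."""
    ratings = [accepted for (src, accepted) in feedback if src == source]
    return sum(ratings) / len(ratings) if ratings else float("nan")


suggestions = draw_suggestions(list("abcdef"), list("fedcba"), total=6,
                               rng=random.Random(0))
feedback = [("baseline", True), ("baseline", False),
            ("predicted", True), ("predicted", True),
            ("predicted", False), ("predicted", True)]
```

For the toy feedback above, the acceptance rate is 0.5 for the baseline and 0.75 for the link prediction method.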
b) The knowledge graph builder is usable measured in terms of user interaction: As reasoned in Section I, the success of a CPSS depends on its usability for humans. The usability of the knowledge graph builder is measured by analyzing the frequency with which users utilize the feedback mechanism of the knowledge graph builder to train the link prediction method.
2) Metric weights: The dominance of different metric weights is investigated by the following hypotheses.
a) A combination of metrics compared to a single metric increases the accuracy of link predictions: It is known from link prediction in energy grids that not all metrics show the same performance. Moreover, the combination of several metrics often outperforms a single metric [78]. Hence, it is assumed that in knowledge graph completion, too, a combination of metrics increases the accuracy of predictions compared to single metrics. The hypothesis is evaluated by analyzing the final metric weights obtained from genetic programming. Assuming that genetic programming maximizes accuracy, a single metric is dominant if its weight is close to one and those of all other metrics are zero.
b) Semantic and temporal metrics improve the link prediction performance: Leveraging semantic and temporal information increases link prediction performance compared to a scenario that utilizes only topological information (Section II). This is evaluated by analyzing the metric weights obtained by the genetic programming algorithm. In particular, the weights of temporal and semantic metrics are compared to those of topological metrics.
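The two weight-based checks described above can be illustrated with a short sketch. The tolerance `tol`, the metric names, and the example weights are hypothetical and chosen for illustration only; they are not results of the experiment.

```python
def is_single_metric_dominant(weights, tol=0.05):
    """True if one weight is close to one and all other weights are close to zero."""
    values = sorted(weights.values(), reverse=True)
    return values[0] >= 1.0 - tol and all(v <= tol for v in values[1:])


def mean_weight_by_group(weights, groups):
    """Average the weights per metric family (e.g. topological, semantic, temporal)."""
    sums, counts = {}, {}
    for name, weight in weights.items():
        group = groups[name]
        sums[group] = sums.get(group, 0.0) + weight
        counts[group] = counts.get(group, 0) + 1
    return {group: sums[group] / counts[group] for group in sums}


# Made-up weights, only to show how the checks are applied.
example_weights = {"SP": 0.8, "HP": 0.4, "AOR": 0.4, "OneDay": 0.2}
example_groups = {"SP": "topological", "HP": "topological",
                  "AOR": "semantic", "OneDay": "temporal"}

dominant = is_single_metric_dominant(example_weights)
group_means = mean_weight_by_group(example_weights, example_groups)
```

Under hypothesis 2a, `is_single_metric_dominant` would be expected to return false for the learned weights; under hypothesis 2b, the group means of semantic and temporal metrics would be compared with the topological mean.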

B. Knowledge graph instantiation: Model and schemas
The data model of the knowledge graph is a directed graph, as illustrated in Section III-B. Figure 5 depicts the instantiation flow of the data model: In a first step, existing ontologies such as Friend of a Friend 3 are merged in Protege 4 and extended with relation identifiers and concepts that describe a city. In particular, in order to support users in completing their existing knowledge graph within the cyber-physical system of a smart city, relation identifiers (e.g. the visit relation) and concepts (e.g. bus stops) are added.
In order to automate the process of storing information on an Android mobile phone that utilizes SQLite 5, a relational data schema is created in MySQL Workbench and populated with the data from Protege. This data is extended with information from Open Data Zurich describing tram and bus stops in Zurich (e.g., the geo-locations).
Finally, this data is exported to users' phones, where it is extended with the personal contact book information of each user. This results in a personalized knowledge graph illustrating the user's social contacts in the city of Zurich. This approach preserves a user's privacy, as all personal information is stored locally on the user's phone.
3 Ontology that defines people-related terms suitable for storing generalized user profile data, as well as social friendship relations [28]: http://www.foafproject.org/ (last accessed: May 2020).
4 A free, open-source ontology editor and framework for building intelligent systems that uses the OWL schema and is developed by Stanford University: http://protege.stanford.edu/.
5 C-language library implementing a SQL database: https://www.sqlite.org/index.html (last accessed: May 2020).
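To make the storage step more tangible, the following sketch shows how a directed, typed graph could be held in a local SQLite database. The table and column names, the example nodes, and the coordinates are hypothetical placeholders; the relational schema actually used in the experiment is not reproduced in this section. Python's built-in `sqlite3` module stands in for the on-device database.

```python
import sqlite3

# In-memory database as a stand-in for the local database on a phone.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE node (
        id      INTEGER PRIMARY KEY,
        concept TEXT NOT NULL,      -- e.g. Person, BusStop
        label   TEXT NOT NULL,
        lat     REAL,               -- geo-location, only for city concepts
        lon     REAL
    );
    CREATE TABLE link (
        source_id INTEGER NOT NULL REFERENCES node(id),
        relation  TEXT    NOT NULL, -- relation identifier, e.g. visit
        target_id INTEGER NOT NULL REFERENCES node(id),
        PRIMARY KEY (source_id, relation, target_id)
    );
    """
)

# Placeholder data: one contact-book entry and one stop with made-up coordinates.
conn.execute("INSERT INTO node VALUES (?, ?, ?, ?, ?)", (1, "Person", "Alice", None, None))
conn.execute("INSERT INTO node VALUES (?, ?, ?, ?, ?)", (2, "BusStop", "Example stop", 47.0, 8.5))
conn.execute("INSERT INTO link VALUES (?, ?, ?)", (1, "visit", 2))

rows = conn.execute(
    """
    SELECT s.label, l.relation, t.label
    FROM link AS l
    JOIN node AS s ON s.id = l.source_id
    JOIN node AS t ON t.id = l.target_id
    """
).fetchall()
```

Because the database lives only on the device, personal entries such as contact names never leave the phone, which matches the privacy argument made above.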

C. Setup
The experiment runs for one week (17.08.2017 - 23.08.2017) and consists of three phases as depicted in Figure 6. The eleven participants are recruited by convenience sampling. In the initialization phase, users receive a welcome email that contains detailed experiment instructions, which can be found in the Supplementary material. In the experiment phase, the users utilize the feedback mechanism and knowledge graph view of the knowledge graph builder (Figure 3) to complete their knowledge graph and to supervise the link prediction method. In the exit phase, users export their knowledge graph via an export button and send it by mail to the instructors of the experiment. During the export process, all personal data of the users are anonymized.

VI. EXPERIMENTAL EVALUATION
By investigating the hypotheses of Section V-A, both the accuracy and usability of the knowledge graph builder (Section VI-A) and the capability of metrics to recommend unobserved links (Section VI-B) are analyzed in the following sections.
Table III depicts the true (TP) and false (FP) positives of the genetic link prediction method compared to the baseline. The accuracy of the method is 27.9% 6 and that of the baseline is 12.20%. Table IV depicts the true positives at user level. For all users, the genetic link prediction method outperforms the baseline heuristic. On average over all experiment participants, the link prediction method outperforms the baseline by a factor of 2.13. Moreover, it also depicts the number of evaluated links per user. On average, a user evaluates 2627 links. Assuming a time of five seconds for a user to evaluate a link, users spent on average 3.6 hours evaluating links. Considering that the utilization of the feedback mechanism is not incentivized, this indicates that the feedback mechanism is practical.
6 In contrast to typical application scenarios in which recommender search spaces are small (e.g. types of pasta in a supermarket), the search space of the experiment is large, consisting of every possible link between any two nodes in the knowledge graph. Hence, this larger search space could explain the lower TP probability of the applied method when compared to the performance of recommender systems in other typical scenarios. Also, a cold start of the algorithm is applied, which can initially lower the TP probability and which could be analyzed in future work by extending the study period of the experiment.
Figure 7(a) illustrates the cumulative distribution function (CDF) for similarity values of links that have been evaluated as existing (1) and non-existing (0) by the users. One notices that an existing link has a higher probability of having a high similarity value than a non-existing link and that, in turn, non-existing links have a higher probability of low similarity values. Thus, the knowledge graph builder assigns higher similarity values to existing links than to non-existing links, which indicates that it distinguishes between existing and non-existing links.
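The time estimate and the CDF comparison reported above can be reproduced in a few lines. The estimate uses the figures stated in the text (2627 links, five seconds per link); the similarity values are toy numbers chosen for illustration and are not experimental data.

```python
def evaluation_hours(n_links, seconds_per_link=5.0):
    """Time spent rating links, in hours."""
    return n_links * seconds_per_link / 3600.0


def empirical_cdf(values, x):
    """Fraction of observed similarity values that are less than or equal to x."""
    if not values:
        return 0.0
    return sum(1 for v in values if v <= x) / len(values)


hours = evaluation_hours(2627)  # 13135 s, i.e. roughly 3.6 hours

# Toy similarity values for links rated as existing (1) and non-existing (0).
existing = [0.2, 0.6, 0.7, 0.9]
non_existing = [0.1, 0.2, 0.3, 0.8]

cdf_existing = empirical_cdf(existing, 0.5)
cdf_non_existing = empirical_cdf(non_existing, 0.5)
```

A CDF for existing links that lies below the CDF for non-existing links at a given similarity value means that existing links carry less probability mass at low similarity values, which is the pattern described for Figure 7(a).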

A. Knowledge graph builder
In the experiment, the top-ranked results of the candidate set are always recommended to the user. In particular, no threshold in the form of a specific similarity value is utilized that would prevent recommendations of links in case the candidate set consists entirely of non-existing links with low similarity values. In future work, such a threshold could be introduced to achieve higher true-positive probabilities by automatically removing links with a low similarity value from the recommendations.
Figure 7(b) depicts the average time genetic programming requires to recalculate the weights for the existence and semantic prediction on users' phones (Equation 1, Section IV-B1). The weight calculation takes on average 7.5 minutes for the existence prediction and 17.5 minutes for the semantic prediction. As the weights are recalculated on average every two hours to account for new user feedback, it is concluded that the deployment of the knowledge graph builder on users' phones is feasible.
In the case of existence prediction, a combination of topological (Shortest path), semantic (AORelation), and temporal (OneDayEps) metrics is dominant in predicting links. In semantic prediction, shortest path dominates the ensemble of deciding metrics. Nevertheless, other topological (hub promoted, Edge dimension connectivity), semantic (Conditional probability), and temporal (Oneday eps, half day eps) metrics also contribute significantly to the predictions. In none of the scenarios (top 2, top 4, all users) is a single metric found to determine the link prediction alone. Nevertheless, in both types of predictions, the shortest path shows a large dominance when compared to the other metrics. In particular, for both ensembles, a hierarchy can be observed in the dominance of the metric weights: Temporal metrics have lower weights.
An explanation for this is that due to the short time period of
Both observations indicate that the algorithm is challenged by the cold start and that its performance could be improved when the study period is extended.
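Two statements of this subsection lend themselves to a small worked example: the similarity threshold proposed as future work, and the share of time the phone spends recalculating weights. The sketch below is illustrative only. The function names and the candidate values are hypothetical, and the runtime share assumes that both recalculations run once per two-hour interval, which the text does not state explicitly.

```python
def recommend(candidates, k, threshold=None):
    """Return the k top-ranked (link, similarity) pairs, optionally dropping low scores."""
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        # Proposed future-work filter: never recommend links below the threshold.
        ranked = [pair for pair in ranked if pair[1] >= threshold]
    return ranked[:k]


def runtime_share(busy_minutes, interval_minutes):
    """Fraction of an interval spent on weight recalculation."""
    return busy_minutes / interval_minutes


candidates = [("a", 0.9), ("b", 0.1), ("c", 0.5)]   # toy similarity values
without_threshold = recommend(candidates, 2)
with_threshold = recommend(candidates, 2, threshold=0.6)

# 7.5 min (existence) + 17.5 min (semantic) per assumed 120 min interval.
share = runtime_share(7.5 + 17.5, 120.0)
```

Without a threshold the two best candidates are always shown, even if their similarity is low; with a threshold the recommendation list may be shorter than k. Under the stated assumption, weight recalculation occupies about one fifth of each two-hour interval.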

VII. SUMMARY OF FINDINGS
The key findings of the performed experiment are summarized as follows:
• The knowledge graph builder is practical. In particular, its usability is indicated by the large number of supervision actions performed by users and by the feasibility of running it locally on users' phones.
• Optimizing the weighting of diverse similarity metrics for link prediction with genetic programming outperforms a baseline heuristic with regard to accuracy. In particular, links with a higher similarity value are recommended to the user, which improves the acceptance of predicted links.
• An ensemble of semantic and temporal metrics is identified that dominates link prediction in a smart city application domain. This confirms findings from the literature that an ensemble of metrics can outperform a single metric.
• The novel metric AO Relation is dominant in the evaluated link existence prediction scenario.
In a nutshell, the findings demonstrate that the contributions of this paper support the domain-independent building of scalable and trustworthy knowledge graphs. In particular, the automation scales up the building process by accurately suggesting links of high similarity to users. The local human supervision, which is considered the gold standard in knowledge graph evaluation [39], facilitates trust in the constructed knowledge graph. Finally, the learning of underlying mechanisms that guide link formation in knowledge graphs, in the form of dominant metric weight ensembles, indicates that the knowledge graph builder can be applied domain-independently to novel applications.

VIII. CONCLUSION AND FUTURE WORK
This paper argues that an accurate and automated knowledge graph builder for cyber-physical-social systems can be constructed that accounts for values such as privacy preservation and accountability. By applying a value-sensitive design approach, a system is designed that builds knowledge graphs automatically while remaining accountable to humans via local human supervision, and thus can be applied effectively to novel application domains such as smart cities. In particular, localized supervision is considered the gold standard in knowledge graph evaluation and thus increases the trust in the constructed graph, while the automation facilitates the scalability of the building process. This is evaluated by a methodology that integrates the constructed system into users' daily lives.
The results point to various avenues for future research. First, the identification of dominant similarity metrics in a smart city application scenario suggests investigating these metrics further in varying application domains. In particular, the domain-independence of the knowledge graph builder could be further demonstrated. Second, several machine learning models utilized in automation are not explainable [79], which limits users' trust [18]. The transparent display of metric weights in the metric dashboard of the knowledge graph builder could be a basis for the explainability of recommended links. In particular, a user could reason why a link was recommended based on the observed metric weights. Third, the participant pool of the user study and the time frame of the experiment could be enlarged to add significance to the identified findings, reduce the cold start problem, and identify further temporal patterns that could improve the recommendation accuracy. Finally, the parameters of the knowledge graph builder could be fine-tuned by a meta-optimization strategy to improve the prediction accuracy.

APPENDIX GENETIC PROGRAMMING
Genetic programming (Section IV-A) is utilized to calculate the weights $a_i$ of Equation 1. One way of implementing a genetic programming algorithm is described in Algorithm 5. In the following, this implementation is illustrated in greater detail. The reader is referred to Koza [60] for definitions and motivations of utilized terms.
a) Fitness Evaluation Function: The function is defined from the space of weight vectors $a$ and training sets $I$ (Section IV-B1) to the space of real numbers: $f: (\mathbb{R}^n, I) \to \mathbb{R}$, where $n$ is the number of utilized similarity measures. A training set with positive and negative examples of links between nodes is used to calculate how closely a particular $a$ predicts the existence of a link or the type of a link, respectively (Section III-C1).
Let $I$ be a concrete set of training instances of size $m$. Let $i \in I$ be a single training instance. In the case of link existence prediction, the instance $i = (u, v, \{0, 1\})$ is an array of size three (Section IV-B1), with $u$ being the target and $v$ the candidate node. A mean squared error evaluation is then utilized as the fitness function (Equation 2).
b) Genotype and phenotype: The genotype of an individual is its weight vector $a$. This vector is stored in a linked list, where the last element points to the first. The phenotype of an individual is then its fitness value, calculated via its weight vector and the training instances by $f(a, I)$. The smaller the fitness value of an individual, the better the genotype (weight vector $a$) of that individual is able to predict links in the training set.
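The fitness evaluation above, together with the crossover, mutation, and selection operators described in items c) to f) of this appendix, can be sketched as a small micro genetic algorithm. The sketch is an illustration under stated assumptions and not the implementation of Algorithm 5: Equation 1 is assumed to be a weighted sum of similarity values, each training instance is represented directly by its metric values and a 0/1 label, a plain Python list replaces the circular linked list, and the mutation rate, tolerance, and toy training set are hypothetical.

```python
import random


def predict(weights, metric_values):
    """Assumed form of Equation 1: weighted sum of similarity measures."""
    return sum(a * m for a, m in zip(weights, metric_values))


def fitness(weights, training_set):
    """Mean squared error over the m training instances; smaller is better."""
    m = len(training_set)
    return sum((label - predict(weights, values)) ** 2
               for values, label in training_set) / m


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: both parents are split at the same random position."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(weights, rng, rate=0.3):
    """With probability `rate` (assumed), redraw one random position from U[0, 1)."""
    child = list(weights)
    if rng.random() < rate:
        child[rng.randrange(len(child))] = rng.random()
    return child


def evolve(training_set, n_metrics, pop_size=5, max_iter=200, tol=1e-3, seed=0):
    """Micro approach: a small population (5 to 11 individuals) evolved for max_iter steps."""
    rng = random.Random(seed)

    def score(weights):
        return fitness(weights, training_set)

    population = [[rng.random() for _ in range(n_metrics)] for _ in range(pop_size)]
    best = min(population, key=score)
    for _ in range(max_iter):
        if score(best) < tol:
            break
        population.sort(key=score)
        parents = population[:-1]  # selection: the least fit individual is removed
        parent_a, parent_b = rng.sample(parents, 2)
        children = [mutate(c, rng) for c in crossover(parent_a, parent_b, rng)]
        # Keep the population size constant by retaining the fittest individuals.
        population = sorted(parents + children, key=score)[:pop_size]
        if score(population[0]) < score(best):
            best = population[0]
    return best


# Toy training set: (metric values of a candidate link, label 1 = exists, 0 = does not).
training = [([1.0, 0.2, 0.5], 1), ([0.1, 0.9, 0.0], 0),
            ([0.8, 0.1, 0.6], 1), ([0.0, 0.7, 0.1], 0)]
best = evolve(training, n_metrics=3)
```

Because the best individual is only replaced by a strictly fitter one, the returned fitness never exceeds that of the best individual in the initial population.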
c) Crossover: A single-point crossover is chosen. The linked list of the weight vector $a$ is split randomly at the same position for both parents and then two children are created.
d) Mutation: Mutation is simulated by randomly identifying a position in the weight vector $a$ and altering its value if a specific threshold is matched. The new value is randomly chosen from a uniform distribution in the interval [0, 1).
e) Selection: The least fit individual of a generation is removed from reproduction.
f) Micro approach: A micro genetic programming approach is utilized by considering small population sizes with 5 to 11 individuals per generation. Please refer to Hafner [80] for details.
Algorithm 5 (excerpt)
    while notDone && (iter < maxIter) do
        iter++
        population.run()
        tmpBestIndividual ← population.getBest()
        if tmpBestIndividual.getFitness() < bestIndividual.getFitness() then
            bestIndividual ← tmpBestIndividual
        if bestIndividual.getFitness() < tol then
            notDone ← FALSE
        population.purge()
    return bestIndividual.getGenotype()    // return weight vector a
Table II illustrates the similarity measures utilized in the experiment. Some of those are obtained by modifying metrics found in the literature or are introduced in this work. These two types of metrics are described in the following.
• Resource Allocation (R) This measure is also normalized:

A. Modified Metrics
• Focci distance The measure found in Jahanbakhsh et al. [75] is adjusted because a neighborhood cannot be defined in the same way as in the referenced work. In this work, a neighborhood is defined as the union of nodes to which both nodes $u$ and $v$ are connected via the same relation identifier. Thus, the Focci distance looks as follows: $\frac{1}{|\Gamma(z, \mathrm{inverse}(j))|}$ (5)
• Shortest Path (SP) Due to the computational complexity of path-based methods, this measure is restricted to paths of maximum length 5. If a shortest path is longer than 5, then the similarity value is 0:
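A minimal sketch of the length cutoff described for the Shortest Path measure is given below. It uses a breadth-first search that stops expanding nodes once the maximum length of 5 is reached. The adjacency dictionary is a toy example, and the mapping of a path length d within the cutoff to the similarity 1/d is an illustrative assumption, not the formula of this paper.

```python
from collections import deque


def shortest_path_length(adj, source, target, max_len=5):
    """Breadth-first search; returns None if no path of length <= max_len exists."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_len:
            continue  # do not expand beyond the cutoff
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def sp_similarity(adj, u, v, max_len=5):
    """Similarity is 0 if the shortest path is longer than max_len (or missing)."""
    dist = shortest_path_length(adj, u, v, max_len)
    if dist is None or dist == 0:
        return 0.0
    return 1.0 / dist  # assumed mapping for paths within the cutoff


# Toy directed chain a -> b -> c -> d -> e -> f -> g.
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": ["g"]}
sim_within = sp_similarity(chain, "a", "f")   # path of length 5
sim_beyond = sp_similarity(chain, "a", "g")   # path of length 6
```

Stopping the expansion at the cutoff bounds the number of visited nodes, which is the reason given above for restricting path-based measures on a phone.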

B. Novel Metrics
Two novel semantic metrics are introduced that are independent of the topological distance of the target and candidate node. These novel metrics are:
• Active Relations Relation (ARR) This measure is the Jaccard index for the active relationships of two nodes. Hence, it counts the number of active relationships in which both target node $u$ and candidate node $v$ engage and divides the result by the number of all relationships in which $u$ engages. The idea is that two nodes are more similar when they share the same type of relationships in which they actively engage.
• AO Relation (AOR) Counts the number of neighbors of candidate node $v$ which are of the same concept as the target node $u$ and divides the number by the number of neighbors of $v$:
$$\mathrm{AOR}(u, v) = \frac{\sum_{z \in \Gamma(v)} \mathrm{Eq}(\mathrm{type}(u), \mathrm{type}(z))}{|\Gamma(v)|}$$
where Eq returns 1 if both types are equal and 0 otherwise. The rationale is that a relationship is more likely to form if the candidate node $v$ already engages in relationships with nodes of the same type as $u$. This metric $m(u, v)$ has a reversed version $m_r(u, v)$, defined as $m_r(u, v) := m(v, u)$, and a combined version $m_c(u, v)$, defined as $m_c(u, v) := \frac{m(u, v) + m_r(u, v)}{2}$.
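The AO Relation metric and its reversed and combined versions can be illustrated with a short sketch. The neighbor sets, node types, and node names are toy data; returning 0 for a node without neighbors is an assumption, since the text does not define this case.

```python
def aor(u, v, neighbors, node_type):
    """Share of v's neighbors that have the same concept (type) as u."""
    nbrs = neighbors.get(v, set())
    if not nbrs:
        return 0.0  # assumed convention for nodes without neighbors
    same = sum(1 for z in nbrs if node_type[z] == node_type[u])
    return same / len(nbrs)


def reversed_metric(metric):
    """m_r(u, v) := m(v, u)."""
    return lambda u, v, *args: metric(v, u, *args)


def combined_metric(metric):
    """m_c(u, v) := (m(u, v) + m_r(u, v)) / 2."""
    return lambda u, v, *args: (metric(u, v, *args) + metric(v, u, *args)) / 2


# Toy graph: alice (Person) is the target, stop1 (BusStop) the candidate.
node_type = {"alice": "Person", "bob": "Person", "carol": "Person",
             "stop1": "BusStop", "stop2": "BusStop", "depot": "Depot"}
neighbors = {"stop1": {"bob", "carol", "depot"},
             "alice": {"bob", "stop2"}}

aor_value = aor("alice", "stop1", neighbors, node_type)                   # 2 of 3 neighbors
aor_reversed = reversed_metric(aor)("alice", "stop1", neighbors, node_type)  # 1 of 2 neighbors
aor_combined = combined_metric(aor)("alice", "stop1", neighbors, node_type)
```

In the toy graph, two of the three neighbors of stop1 are persons, so a link from the person alice to stop1 receives a high AO Relation value even though alice and stop1 may be far apart topologically.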