Semantic 3D City Database – an enabler for a dynamic geospatial knowledge graph

This paper presents a dynamic geospatial knowledge graph as part of The World Avatar project, with an underlying ontology based on CityGML 2.0 for three-dimensional geometrical city objects. We comprehensively evaluated, repaired and refined an existing CityGML ontology to produce an improved version that passes the necessary quality tests and is covered by unit tests. A corresponding data transformation tool, originally designed to work with CityGML, was extended to transform the original data into semantic triples. We compared several scalable technologies for storing this semantic data and chose Blazegraph™ because it provides the required geospatial search functionality. We also evaluated scalable hardware data solutions and file systems, using the publicly available CityGML 2.0 data of Charlottenburg in Berlin, Germany as a working example. The structural isomorphism of the CityGML schemas and the OntoCityGML Tbox allowed the data to be transformed without loss of information. Efficient geospatial search algorithms allowed us to retrieve building data for any point in a city using coordinates. The use of named graphs and namespaces for data partitioning kept system performance well below its capacity limits. This was achieved by using scalable, dedicated data storage hardware capable of hosting expansible file systems, which strengthened the architectural foundations of the target system.


Introduction
General context of the paper (problem space): Development of sustainable digitisation practices is widely recognised as an important part of roadmaps at organisational, industry [60], national [3] and international levels [31]. Radermacher [50] points out that global governing bodies, such as the UN, the G20 and the World Bank, all agree on the importance of adopting digitisation standards to achieve international comparability. Despite the complexity of existing standards and the time-consuming adoption and implementation of information systems in digital form, those bodies agree that the benefits greatly outweigh the costs. Radermacher [50] also notes that a scientific approach and adherence to such standards ensure trust based on evidence, and positively impact accountability, transparency and control of execution at all levels. Undoubtedly, roadmaps strengthening comparability also include technological solutions designed to support systems interoperability [51].
The World Avatar (TWA) is an all-encompassing dynamic knowledge graph built on agent-based system [3,52] architectural principles. Within it, intelligent agents operate on a knowledge graph constructed in accordance with the semantic web standards and recommendations provided by the W3C. The system can therefore be regarded as an example of a general knowledge graph capable of multi-domain knowledge representation [3,17,18,20,22,23,41,64,65]. Addressing inter-domain interoperability problems is at its core. The target system is a digital representation of the world, in which representations of different domains must also adhere to the applicable standards.
Designed as one of the critical TWA components, the J-Park Simulator (JPS) [35,47,48,62,63,66] includes representations of built environments and agent-based subsystems capable of simulating emissions dispersion from various types of air pollution sources as well as optimising designs of Eco-Industrial Parks (EIPs) with respect to their carbon footprint [21].
The system architecture for the Semantic 3D City Database proposed in this paper aims at closing some of the gaps, particularly related to current built environment representation within JPS and TWA. Basing it on reusable Open Source components and standardised interfaces encourages wider adoption throughout other information systems that require scalable and interoperable three-dimensional representation of such environments.

Cities and geospatial information
Existing City Information Models (CIM) already integrate large urban datasets in order to represent multiple aspects of cities, such as the built environment, energy management and transport [26]. Linking across domains and securing scalability are key challenges in developing urban Digital Twins (DT) [53]. Representing cities as three-dimensional models of built environments is a crucial step towards enriching representations of urban environments, as they are developed and planned, with more urban data and knowledge [59].
One of the common ways of 3D modelling built environments in various information systems is to use the CityGML standard, provided by the Open Geospatial Consortium (OGC) [28]. It can serve as a data exchange standard for city landscape management and planning systems, or even as a file-based data source for applications visualising 3D city landscapes on the web. Information about different domains can be encoded within this format through domain-specific extensions, and purely geospatial concerns can be combined with any others in order to analyse or support decisions regarding digitised urban design blueprints. A digital twin of the Manchester landscape in CityGML 2.0, with solar irradiation projected onto the roofs of the buildings [43], is just one of a plethora of examples currently available on the web. However, developing applications built on static files lacks flexibility. Apart from compliance with standards, flexibility is a key ingredient for achieving interoperability [51]; it also keeps standards-based systems open to future innovation. Dynamism of the stored data is required to perform simulations under various conditions and to hot-swap certain information in representations on demand. Moreover, a dynamic representation allows the gaps in static city models, arising from the constant evolution of entities within built environments, to be addressed.
The open source 3D City Database, developed at the Technische Universität München (TUM), was meant to close some of those gaps [57]. The 3D City Database is a suite of tools to transform data encoded in flat CityGML 2.0 files into a more flexible database format, and to store and visualise 3D city landscapes. It has been under development since 2003 [61]. The flagship examples showcasing how to store and visualise city data with these tools are models of Berlin and New York in Level of Detail 2 (LOD2). This approach demonstrates the possibility of storing city data adhering to the CityGML 2.0 standard in a different way than static XML files. However, the relational database backends of this solution limit the implementation of semantic data interoperability. While some authors have reported first attempts to use graph databases for geospatial data [2], the discussions contain predominantly general ideas and partial results [46]. Adopting a semantic data store allows a bare 3D City Database to be turned into a knowledge base [52] with inference, truth maintenance and reasoning engines. This makes the resulting Semantic 3D City Database an enabler for the dynamic geospatial knowledge graph in TWA.

Synthesis
Research at the University of Geneva, focused on the other side of the problem spectrum, led to the production of a CityGML ontology. It turned out to be possible to generate an ontology, with one-to-one matching between concepts, by applying XSLT transformations to the original CityGML 2.0 schemas. The ontology produced by applying those techniques can be regarded as a step towards bringing the standard together with applications providing semantic interoperability. However, closer examination of the available ontology reveals a number of issues concerning its quality and, because of that, its suitability as a Tbox (an ontology schema) for reliable applications adhering to semantic web standards and recommendations. Apart from that, there is also a lack of data transformation tools which would allow the ontology to be populated with data and produce instances for an Abox. Schema and instances are equally needed to build any application able to operate on three-dimensional geospatial data. Such applications also need to provide reliable geospatial search functionality with acceptable data retrieval times [38]. Although there are semantic triple stores implementing geospatial search, there is a lack of examples of semantic web applications operating on the multitude of geometries required to represent entire cities. Any software application satisfying such requirements needs appropriate hardware to facilitate this specific functionality. Fully semantic 3D city database architecture definitions, bringing together all of the above components in order to provide foundations for semantic applications operating on dynamic city models, are not currently available.
The purpose of this paper is to present such an architecture definition, as well as the steps necessary to produce proof of concept solutions addressing the problems mentioned above. The ontology refinement process and its evaluation with regard to quality and correctness, required to ensure Tbox reliability, are elaborated on in Section 2. Next, the use of the refined ontology concepts in the process of augmenting data transformation tools, based on existing open source solutions, is presented in Section 3. The evaluation of existing semantic data stores, carried out before producing an Abox utilising terms of the refined ontology, is described in Section 4. The final Section 5 is devoted to presenting estimated hardware requirements, resulting from evaluating geospatial functionality on tens of thousands of buildings with over 2000 different types of two-dimensional and three-dimensional geometrical shapes.

Refined Ontology for CityGML
Producing a proof of concept Semantic 3D City Database first required ensuring the reliability of the ontology used as its schema. Following the general principle of reusability, this schema, OntoCityGML, is based on an existing ontology reflecting the CityGML 2.0 standard and developed at the University of Geneva [15,36]. Because of the methods used to produce it, it is referred to as the CityGML ontology in the following subsections, where the steps and methodology undertaken in refining it to a version suitable for the proof of concept solutions are elaborated on.

An evaluation of the CityGML ontology
The publicly available CityGML ontology [15], which served as the base for the OntoCityGML ontology, was developed by applying XSLT transformations to the CityGML 2.0 schema [28], together with some manual mapping [15]. The ontology implements 185 classes, 281 object properties and 92 data properties. It also contains 1254 implemented axioms. The categorisation of criteria used during its evaluation to check suitability for the proof of concept presented in this paper is enumerated in Table 1. The suite of tools and plugins available in the Protégé ontology editor [45] was utilised during this process.
The following errors were reported by those tools in the CityGML ontology, and then manually fixed during evaluation, based on all of the above metrics. First, it did not pass the Accuracy test. There were a number of "Illegal redeclarations of entities: reuse of entity" errors. For instance, the term year of construction, also present in the original Charlottenburg CityGML 2.0 data, was implemented as owl:ObjectProperty and owl:DatatypeProperty entities at the same time. More than fifty errors of this nature were reported by the editor. Second, the Conciseness test checked the base ontology for redundant or duplicated terms.

An evaluation of the OntoCityGML ontology
The OntoCityGML ontology, which served as a Tbox [5] for the proof of concept Semantic 3D City Database, is an extension of the CityGML ontology and the result of resolving the previously mentioned issues. As in the case of the base ontology, tools and plugins available for the Protégé ontology editor [45] were used for its evaluation. The OntoCityGML ontology implements 344 classes, 272 object properties and 78 datatype properties. It also contains 3363 implemented axioms.
The Computational efficiency test shows that the expressivity of the OntoCityGML ontology is equivalent to the ALEH(D) description logic. Because this expressivity falls between DL-Lite [11] and SROIQ [30], OntoCityGML cannot be used to query city data stored in relational databases by means of ontology-based data access (OBDA) technologies [39]. The HermiT reasoner is able to classify the OntoCityGML ontology. In debugging mode, it also detects that this ontology is Consistent and Coherent. The OntoCityGML ontology fully passed the Accuracy, Conciseness and Completeness tests as well. Protégé does not show any errors related to illegal declaration of entities or reuse of entities.
To cover the TWA domain appropriately, and to the extent needed for the proof of concept, sixty-nine new terms were implemented in the OntoCityGML ontology. The terms were checked for one-to-one correspondence between their implementation in the ontology and the CityGML 2.0 specification [28]. The list of terms corresponds to the unique list of CityGML 2.0 tags found in the Charlottenburg-Wilmersdorf data used in this proof of concept. The OntoCityGML axioms relevant to this list are included in Appendix A.
Additionally, each of the new terms implemented in OntoCityGML was covered by unit tests. This step ensures that any further changes to the OntoCityGML ontology preserve its structure. Sample test cases are included in Appendix B. Furthermore, the OntoCityGML ontology has been placed under the Git version control system in order to make tracking such changes transparent.
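A structural unit test of this kind could be sketched as follows. This is a minimal illustration using the Python standard library only; the `ONTOLOGY_FRAGMENT`, IRIs and test names are hypothetical stand-ins, not the actual OntoCityGML test suite described in Appendix B.

```python
import unittest
import xml.etree.ElementTree as ET

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Hypothetical minimal RDF/XML fragment standing in for the OntoCityGML Tbox.
ONTOLOGY_FRAGMENT = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/ontocitygml#Building"/>
  <owl:DatatypeProperty rdf:about="http://example.org/ontocitygml#yearOfConstruction"/>
</rdf:RDF>"""

def declared_terms(rdf_xml, owl_type):
    """Return the set of IRIs declared with the given OWL entity type."""
    root = ET.fromstring(rdf_xml)
    return {el.attrib[RDF + "about"] for el in root.iter(OWL + owl_type)}

class TestOntologyStructure(unittest.TestCase):
    def test_building_class_declared(self):
        # A structural expectation: the Building class must stay declared.
        self.assertIn("http://example.org/ontocitygml#Building",
                      declared_terms(ONTOLOGY_FRAGMENT, "Class"))

    def test_no_illegal_redeclarations(self):
        # Guards against the 'illegal redeclaration' errors found in the base ontology:
        # no IRI may be both an object property and a datatype property.
        obj = declared_terms(ONTOLOGY_FRAGMENT, "ObjectProperty")
        dat = declared_terms(ONTOLOGY_FRAGMENT, "DatatypeProperty")
        self.assertFalse(obj & dat)
```

Tests of this shape fail as soon as a refactoring drops a term or reintroduces a double declaration, which is exactly the regression protection the text describes.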

Augmented Data Transformation Tools
For the proof of concept, the augmented Importer/Exporter tool, originating from TUM [57], with its data validation mechanisms, was used to transform CityGML 2.0 data into the Semantic 3D City Database [5], which uses OntoCityGML terms to describe city models. In this process, every CityGML object is validated by the tool prior to its instantiation as a corresponding Java object. The data transformation process is described in the next section (3.1).

3D City Database Importer/Exporter Tool
Depending on the level of detail, CityGML models can form quite complex and, when measured by the average present-day computing capabilities, relatively large datasets. An architecture of any application designed to work with such models needs to be developed with this in mind. In the particular case of the semantic geospatial knowledge graph, it has to ensure efficiency of SPARQL queries on an Abox as well as balance performance and optimal data storage. While citygml4j [16], an open source Java class library and API, provides a very good start to work with CityGML 2.0 models programmatically, there is a lack of tools which would be able to turn such models into semantic triples forming an Abox of a geospatial knowledge graph.
After exploring the options, the closest existing data transformation tool able to fulfil such requirements is the 3D City Database Importer/Exporter [1]. It is also based on citygml4j and is available as an open source project. The TUM tool is optimised to work with large CityGML 2.0 models and uses multithreading to read the data, transform it and write it into a database [57]. Hence, it is more computationally efficient than the raw library when the potential number of city model objects being processed simultaneously is taken into consideration; this matters for large and detailed models. The unmodified tool supports Oracle and PostGIS relational databases and makes use of the Java Database Connectivity (JDBC) API with the respective database connectors. This particular design allowed the reuse and augmentation of large parts of its code to work with a Semantic 3D City Database based on a non-relational graph triple store.
In order to augment the tool in such a way, Jena JDBC, a SPARQL over JDBC driver framework [32], was utilised for the proof of concept presented in this paper, choosing its Remote Endpoint driver to connect to a SPARQL Protocol compliant triple store that exposes SPARQL query and SPARQL update endpoints. Adding the new driver allowed augmentation of the tool while preserving its original functionality. The majority of the new features were added to two of the tool's original code packages: impexp-client and impexp-core. Modifying the first allowed the addition of components enabling the selection of the new database type, while augmenting the second made the tool capable of establishing a connection to a triple store via JDBC. This required adding a new database backend adapter and incorporating five new classes into the foundation codebase. Namely, the BlazegraphAdapter, GeometryConverterAdapter, SchemaManagerAdapter, SQLAdapter and UtilAdapter classes had to be implemented, at minimum, for the tool to be able to facilitate connectivity to a semantic triple store with the new driver.

Figure 1: CityGML 2.0 data transformation tool augmented to support Blazegraph™ as a data store back-end. In the depicted menu, a connection to the semantic database has been established and the tool is ready to start importing the data. In this augmented version of the Importer/Exporter tool [57], city model data is imported by executing SPARQL statements with OntoCityGML vocabulary against the semantic data store, instead of SQL statements against a relational database with the predefined 3DCityDB schema, as in the original version. The original functionality is preserved and relational database types can still be used.
As the Importer/Exporter tool was originally designed to work with relational databases at a very high level, it validated the CityGML models before instantiating model members as Java objects, which were, in turn, persisted in a database by means of corresponding SQL statements. For the tool to be fit for purpose as a data transformation tool for the Semantic 3D City Database, this last step had to be augmented with functionality to generate equivalent SPARQL statements for the respective Java objects. Preserving the existing data structures leveraged the many years of query and storage optimisation and fine-tuning invested at TUM [57].
In order to produce a semantic twin of the 3D City Database representation for the Charlottenburg-Wilmersdorf district of Berlin, the following classes of the org.citydb.citygml.importer.database.content module had to be modified to work with SPARQL JDBC prepared statements instead of SQL JDBC prepared statements: DBCityObject, DBBuilding, DBAddress, DBAddressToBuilding, DBCityObjectGenericAttrib, DBSurfaceGeometry, DBAppearance, DBAppearToSurfaceData, DBExternalReference, DBSurfaceData, DBTexImage, DBTextureParam and DBThematicSurface. New methods generating SPARQL prepared statements were added to each of these classes and covered by appropriate unit tests. The existing code was augmented to fill in those statements with CityGML object data when the semantic backend is specified as the chosen option for the 3D City Database Importer/Exporter tool. As depicted in Figure 1, the modified tool is able to produce the Semantic 3D City Database Abox, using OntoCityGML as a Tbox. It also produces a semantic "mirror twin" of the relational 3D City Database, with additional properties specific to semantic knowledge bases. Those properties are described in more detail in the next subsection.
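The role of such a SPARQL-generating method can be sketched in a few lines. The sketch below assembles an INSERT DATA statement for a building; the `ocgml:` prefix, the predicate names and the IRIs are illustrative assumptions, not the actual OntoCityGML vocabulary or the tool's Java code.

```python
def build_insert_statement(graph_iri, building_iri, gml_id, function=None):
    """Assemble a SPARQL INSERT DATA statement for one building, mirroring the
    role played by SQL prepared statements in the original Importer/Exporter.
    All prefixes and predicate names here are illustrative only."""
    triples = [
        f"<{building_iri}> rdf:type ocgml:Building .",
        f'<{building_iri}> ocgml:gmlId "{gml_id}" .',
    ]
    if function is not None:  # optional CityGML attributes map to optional triples
        triples.append(f'<{building_iri}> ocgml:function "{function}" .')
    body = "\n    ".join(triples)
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "PREFIX ocgml: <http://example.org/ontocitygml#>\n"
        "INSERT DATA {\n"
        f"  GRAPH <{graph_iri}> {{\n    {body}\n  }}\n"
        "}"
    )

stmt = build_insert_statement(
    "http://example.org/citydb/building/",
    "http://example.org/citydb/building/1",
    "BLDG_000300000007a403",
)
```

The GRAPH clause reflects the named-graph partitioning discussed later: each class of city object writes its triples into its own graph within the target namespace.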

Relational Schema to Graph Mapping
The heart and soul of much mathematics consists of the fact that the "same" object can be presented to us in different ways [44]. The present section elucidates this statement by considering the results of the data transformations produced by the augmented data transformation tool. The augmented Importer/Exporter is able to produce both the original database and the Semantic 3D City Database. The structural isomorphism of the 3D City Database and its semantic twin, illustrated in Figure 2, shows their equivalence.
There is a one-to-one correspondence [6] between the schemas of the proof of concept databases, in terms of the number of schema objects as well as their names: RDB ∼ SDB. Both make use of names defined in the CityGML 2.0 conceptual schema. The open world assumption (OWA) in the semantic representation [5] is what distinguishes it from the relational database representation with its closed world assumption (CWA) [54]. An analogy can be drawn to the equation 10 = 9.99999... It is possible to say much more about the world when one has the realm of real numbers at hand to describe it, as on the right side of the relation, than when one is left to do so with only decimal numbers, as on the left side. The same statement holds when considering representations built on the elements listed in Table 2.
The following example illustrates one of the potential consequences of the CWA and the OWA when considering the relational versus the semantic 3D City Databases used to build the proof of concept described in this paper. Both databases contain information concerning a building identified by gml id BLDG_000300000007a403, which can also be found in the original CityGML 2.0 representation of Charlottenburg-Wilmersdorf downloaded from https://www.businesslocationcenter.de/en/economic-atlas/download-portal/ for the purpose of building this proof of concept. Both databases also contain information about an address, identified by the gml id UUID_76daf80a-2fef-443d-88bb-b9bc0c24fffb, belonging to this building. When querying for information concerning the building and its address in both databases, it is possible to find that this building is in Berlin at 36 Tauroggener Str., and that the building is bounded by a polygon specified by a set of coordinates. However, the relational database address record for the building contains NULL in place of the country, whereas the semantic database contains a BLANK NODE as a vertex, which is an endpoint of the country edge connected to the address of the building on the other side. In the case of the relational database, under the CWA, this can be interpreted as "The building with this address does not belong to any country". In the case of its semantic twin, under the OWA, the interpretation is "It is not known to which country the building with this address belongs".
This would matter if both databases were integrated into a bigger system, such as TWA, and turned into knowledge bases. For example, one could imagine a subsystem integrating different information sources from a few neighbouring European countries in order to find the most suitable roof locations for solar panels. It is not hard to imagine that some of those information sources would not contain any geospatial information about the buildings, but would instead allow some information about buildings to be retrieved by postal address. Geospatial search, as one of the features of the dynamic geospatial knowledge graph in TWA, would allow retrieval of information concerning one-roof buildings in a square area spanning those countries, and comparison of this information with that retrieved from the information sources containing no geospatial information whatsoever.
It would not be possible to easily integrate those systems and make them interoperable under the CWA. On the one hand, one would end up with information on the one-roof buildings in a few countries, specified by the coordinates of the square. On the other hand, one would end up with buildings with either no country, under the CWA, or an unknown country, under the OWA. Under the OWA, it would then be straightforward to fill in the missing country information by narrowing the geospatial search down to the country level.
This way, it would be possible to say that "The building identified by gml id BLDG_000300000007a403 is in Germany" and integrate this information with other systems which do not contain any geospatial information for European buildings. Under the CWA, however, one would end up with two contradictory statements: "The building identified by gml id BLDG_000300000007a403 is in Germany" and "The building identified by gml id BLDG_000300000007a403 does not belong to any country". Contradictory statements contain no information and, therefore, it is possible to say much more about the world under the OWA. Adding the new statement "The building with the address identified by the gml id BLDG_000300000007a403 is in Germany" to the CWA system changes that system and invalidates previous inferences, whereas this does not happen in OWA systems.
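The contrast between the NULL and the blank node can be made concrete with a toy sketch in plain Python structures. The identifiers and predicate names are illustrative; the point is only how the two assumptions read the same missing value.

```python
# Relational (closed world): a NULL country column.
address_row = {
    "gml_id": "UUID_76daf80a-2fef-443d-88bb-b9bc0c24fffb",
    "street": "Tauroggener Str.", "number": "36",
    "city": "Berlin", "country": None,
}

# Semantic (open world): the country edge ends in a blank node.
triples = {
    ("addr1", "hasStreet", "Tauroggener Str."),
    ("addr1", "hasCity", "Berlin"),
    ("addr1", "hasCountry", "_:b0"),  # blank node: a country exists, identity unknown
}

def cwa_country(row):
    # Closed world: absence of a value is read as a negative fact.
    return "no country" if row["country"] is None else row["country"]

def owa_country(triples, subject):
    # Open world: a blank node asserts existence without identifying the value.
    for s, p, o in triples:
        if s == subject and p == "hasCountry":
            return "unknown country" if o.startswith("_:") else o
    return "unknown country"  # a missing edge is also merely unknown

print(cwa_country(address_row))        # -> no country
print(owa_country(triples, "addr1"))   # -> unknown country
```

Adding the triple ("addr1", "hasCountry", "Germany") later is monotonic under the OWA, whereas overwriting the NULL under the CWA retracts the earlier negative conclusion.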

Semantic 3D City Data Store
The results of working on the OntoCityGML ontology and the augmented data transformation tool, described in the previous sections, ensured the possibility of city model representations compliant with W3C and OGC standards at the same time. To realise this result, promising from the point of view of sustainable digitisation practices, in the form of a dynamic geospatial knowledge graph, the Semantic 3D City Database also had to be created within a scalable and W3C compliant triple store. To keep the architecture open to further collaborations and maximise the potential for its reuse and innovative modification in the future, research during the proof of concept stage focused on open source stores. From the scalability point of view, the store must be capable of accommodating city data in the form of semantic triples. Furthermore, it must be possible to add more data without significantly reducing the performance of geospatial queries. In addition, the store must be capable of ensuring multi-domain interoperability by allowing city data to be linked with any other data in semantic form and to be queried for such relationships. The following describes the results of this research, briefly summarised in Figure 3, as well as the motivations behind the final choice of triple store technology as the target solution for this proof of concept.
Considering Eclipse RDF4J, an open source framework for processing Resource Description Framework (RDF) data [19], as a triple store for the Semantic 3D City Database was motivated mainly by its relative popularity as well as familiarity. It would allow the implementation of TWA data interoperability within a dynamic geospatial knowledge graph without the need for data migration, as a few existing TWA components already use it as a data store backend. The framework has a modular architecture, composed of a parser/writer API, a model API, a repository API and a storage and inference layer (SAIL) API. Its repository API supports the SPARQL 1.1 query and update language. Different core database implementations are also supported, such as the memory store, native store and elastic search store. On top of these core databases, the RDF4J API can be extended with SPARQL Inferencing Notation (SPIN) rule based reasoning functionalities [37]. The RDF4J framework implements GeoSPARQL functions, but it fails almost all of the GeoSPARQL benchmark tests [33]. The SAIL interface can be successfully used for communication between the RDF4J framework and an Apache HBase database in order to process petabytes of heterogeneous RDF data [55]. However, limited geospatial support as well as limited out of the box scalability motivated further research on the triple store of choice for the semantic geospatial database storing city models.
Another open source SPARQL server project, Apache Jena Fuseki, was also considered because of its relative popularity and familiarity (use of, and interoperability with, TWA components); it has been used as a triple store backend for some TWA components as well. Users can run it as an operating system service, a Java web application or a standalone server. The server follows the SPARQL 1.1 protocol to query and update RDF data [4] and also provides a graph store protocol [56]. A Fuseki SPARQL server evaluation test shows that it is too slow to be used in the production of software for intensive use [34]. Apache Jena Fuseki supports an HTTP server component that conforms to the GeoSPARQL standard [25]. A GeoSPARQL compliance benchmark test used thirty benchmark requirements to show that Jena Fuseki can handle geographical vector data representation literals. The Jena Fuseki server supports top level spatial and topological relation vocabulary components, as well as Resource Description Framework Schema (RDFS) entailment [33]. Because geospatial search support has been set as an essential requirement for the dynamic geospatial knowledge graph, the lack of such search functionality, as well as limited scalability, motivated further research on the triple store of choice for the Semantic 3D City Database proof of concept.
Although the spatiotemporal store Strabon, provided as an open source project by the University of Athens, had never been used with TWA or any of its components before, it also attracted initial attention during the research on semantic data stores for the dynamic geospatial knowledge graph. This was partially for the reasons discussed before, because it uses the familiar RDF4J backend as one of its components. Moreover, the store provides rich geospatial support and implements GeoSPARQL, stSPARQL, GML and WKT literals [40]. Its architecture is also based on using named graphs to separate data. Strabon is known to show good performance on a single machine and on synthetic datasets [24]. The novelty of the underlying stRDF model and stSPARQL query language consists of adding temporal extensions to the semantic representations [42]. Although the mentioned query language and model have not yet made it into the realm of W3C standards, from their authors' perspective they provide a major advantage over pure GeoSPARQL. However, in the context of TWA, a time-varying knowledge graph was already considered during the development of its Parallel World Framework [21]. Within TWA, this approach will be explored further instead, as it was conceptualised to address a much broader problem spectrum. Strabon scales up to 500 million triples [42]. This is less than double the estimated number of triples required to transform the entire Berlin CityGML 2.0 data available at the moment. It would be hard to link other datasets with the city data and provide sufficiently rich multi-domain interoperability for TWA under those limits. Moreover, apart from the limited scalability of the pure RDF4J backend, there are still a number of open problems related to scalability and inferencing with stSPARQL and its underlying data model [37]. Some of them are solved by using PostGIS in addition to RDF4J within the system.
However, the Semantic 3D City Database is a proof of concept demonstrating the possibility of utilising fully non-relational data stores for geospatial representations, so that they can be incorporated into dynamic geospatial knowledge graphs. Strabon also appears less than production ready for larger deployments, due to its relatively rudimentary documentation [58] compared to other RDF stores. An additional point is the lack of technical community forums which would allow users to find answers to common questions and help with resolving any issues that might arise. Geospatial search as a feature is also not mentioned in the currently available documentation or publications concerning the store.
Blazegraph™ is an active open source project and a triple store which met the requirements listed at the beginning of this section. It is a W3C compliant semantic data store released under a GPL-2.0 License [8]. Because of its compliance with standards, it proved relatively easy to migrate other TWA data to this triple store as well, even though it had not been used within the system before. The latest stable version, 2.5.1, was released on the 19th of March 2019; the latest release candidate, 2.6.1, on the 4th of February 2020. Blazegraph™ is in production use at Fortune 500 companies such as EMC and Autodesk, as well as at the Wikimedia Foundation's Wikidata Query Service. The semantic transformation of the Charlottenburg-Wilmersdorf CityGML 2.0 LOD2 data covers 20,570 buildings and results in 24,244,610 triples materialised and stored in Blazegraph™ across multiple named graphs in a single namespace. Blazegraph™ supports up to 50 billion edges on a single machine and thus appears capable of accommodating the whole Berlin city data, which is split into 12 parts (Charlottenburg-Wilmersdorf being one of them). Assuming that each part contains between 20,000 and 25,000 buildings, between 240,000 and 300,000 buildings would need to be accommodated in order to semantically represent the whole of Berlin. Assuming uniform complexity of the buildings in the other parts of the city, this would result in between 282,873,427 and 353,591,784 semantic triples. This is still quite far from the single machine limit and shows the possibility of integrating various additional layers of heterogeneous data to complement the city data and achieve high levels of multi-domain data interoperability within a general knowledge graph, such as TWA.
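The scaling estimate above is a simple linear extrapolation from the Charlottenburg-Wilmersdorf counts, which can be reproduced as follows:

```python
# Back-of-envelope scaling estimate, using the counts reported in the text.
TRIPLES_CHARLOTTENBURG = 24_244_610
BUILDINGS_CHARLOTTENBURG = 20_570

def estimate_triples(n_buildings):
    """Linear extrapolation assuming uniform building complexity across districts."""
    return TRIPLES_CHARLOTTENBURG * n_buildings // BUILDINGS_CHARLOTTENBURG

low = estimate_triples(240_000)    # 12 districts x ~20,000 buildings
high = estimate_triples(300_000)   # 12 districts x ~25,000 buildings
print(low, high)  # -> 282873427 353591784, far below the 50 billion edge limit
```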
Blazegraph™ supports geospatial search via SPARQL queries. Partitioning data into namespaces allows queries to be optimised by executing them on smaller portions of the data, potentially in parallel. Using named graphs for different parts of city objects (i.e. walls, roofs, etc.) allows smaller graphs within namespaces to be queried independently. Information resulting from such independent queries can be combined into information about larger objects, as well as about various interdependencies between such objects. The linked data approach allows OntoCityGML buildings' data to be combined with other semantic data, either stored within one and the same namespace or across named graphs in separate namespaces. Federated queries are supported by Blazegraph™ too. Scale-out and High Availability features are available in the Enterprise editions of Blazegraph™, which also support GPU query optimisation, amongst many other features. Transactions, very high concurrency and very high aggregate IO rates are supported in all of its editions.
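A geospatial search of this kind can be sketched as a SPARQL request against Blazegraph's geo:search SERVICE. The endpoint URL, coordinate predicate, datatype IRI, field names and bounds below are all illustrative assumptions rather than the CKG's actual configuration; the geo: predicate names follow the Blazegraph geospatial documentation:

```python
from urllib.parse import urlencode

# Hypothetical local Blazegraph namespace endpoint (not the live CKG one).
ENDPOINT = "http://localhost:9999/blazegraph/namespace/berlin/sparql"

# A customFields search sketch; the datatype IRI, coordinate predicate,
# field names and bounds are placeholders, not the deployment's values.
query = """
PREFIX geo: <http://www.bigdata.com/rdf/geospatial#>
SELECT ?cityObject WHERE {
  SERVICE geo:search {
    ?cityObject geo:predicate <http://example.org/hasCoordinates> ;
                geo:searchDatatype <http://example.org/SOLID-3-15-15-15-15-15-15> ;
                geo:customFields "X#Y#Z" ;
                geo:customFieldsLowerBounds "13.28#52.49#0" ;
                geo:customFieldsUpperBounds "13.34#52.53#100" .
  }
}
"""

# Blazegraph accepts SPARQL over HTTP GET with a 'query' parameter.
request_url = ENDPOINT + "?" + urlencode({"query": query})
```

Partitioned data would be targeted simply by changing the namespace segment of the endpoint URL, or by wrapping the pattern in a GRAPH clause for a specific named graph.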
In conclusion, the Blazegraph™ triple store has been used for this proof of concept because of its scalability as well as the geospatial search algorithms already implemented as part of its functionality. Enabling this functionality in Blazegraph™ required the development of a custom vocabulary class as well as a datatype configuration properties file. Due to the variety of the geometrical shape types found in the proof of concept data, it was not feasible to create such a configuration manually. Therefore, functionality for automatically creating such datatype configurations, as well as the corresponding vocabulary items, was added to the augmented TUM data transformation tool's GeometryConverterAdapter class. Together with a newly introduced BlazegraphConfigBuilder class, based on a thread-safe singleton design pattern, the tool detects any new shape type not previously encountered in the data and creates appropriate configurations based on its geometrical properties. Those properties are also encoded in vocabulary item names to make it easier to break the stored data down into its underlying geometries. This way, for instance, it is possible to tell that coordinates stored under a datatype whose IRI ends with SOLID-3-15-15-15-15-15-15 describe a cube. The first number, 3, gives the dimensionality of the stored geometry type. The remaining numbers indicate that the stored data consists of 6 parts of such a cube, each piece describing a polygon in terms of 15/3 = 5 points encoded in a coordinate system. This algorithm allowed datatype configurations to be detected and built automatically for over 2,000 different geometrical shape types found in the Charlottenburg-Wilmersdorf LOD2 building data. These were required to fully complete and materialise the Semantic 3D City Database proof of concept, enabling dynamic geospatial knowledge graph components within TWA.
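The naming scheme described above can be decoded mechanically. The following sketch (a hypothetical helper, not part of the TUM tool) splits a datatype suffix such as SOLID-3-15-15-15-15-15-15 into its dimensionality and per-part point counts:

```python
def decode_geometry_suffix(suffix: str) -> dict:
    """Decode a datatype suffix like 'SOLID-3-15-15-15-15-15-15'.

    The first number gives the dimensionality; each remaining number is a
    coordinate count for one part, so points per part = count / dimension.
    """
    shape, dim_str, *count_strs = suffix.split("-")
    dim = int(dim_str)
    counts = [int(c) for c in count_strs]
    return {
        "shape": shape,
        "dimension": dim,
        "parts": len(counts),
        "points_per_part": [c // dim for c in counts],
    }

# A cube: 6 parts, each a polygon of 15/3 = 5 points in 3D
# (4 corners plus the repeated closing point).
info = decode_geometry_suffix("SOLID-3-15-15-15-15-15-15")
print(info)
```

The decoded result matches the worked example in the text: dimensionality 3, six parts, five points per polygon.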
Sample live query results, performed on the Charlottenburg-Wilmersdorf data with the Semantic 3D City Database already integrated into TWA and particularly relevant to city planning, are presented in the next section, together with hardware requirements, estimates and recommendations.

Hardware Requirements
Integration of the Semantic 3D City Database, populated with Charlottenburg-Wilmersdorf LOD2 building data obtained by transforming the original CityGML 2.0 representation using the augmented Importer/Exporter tool and the OntoCityGML ontology, into the TWA enabled dynamic geospatial knowledge graph capabilities within the wider system. Cities Knowledge Graph (CKG) is a TWA subsystem under active development and research [13] in collaboration between the Cambridge Centre for Advanced Research and Education in Singapore (CARES) [12] and the Singapore-ETH Centre (SEC) [14]. A decision support system for Smart City planning, based on the CKG, showcases sustainable urbanisation practices aided by the sustainable digitisation practices described in the previous sections. Results of sample questions, important from the city planning perspective, translated into the appropriate SPARQL queries and executed against the CKG are listed in Tables 3 and 4. The number of query solutions as well as the elapsed time are listed next to each query.

[Fragment of Tables 3 and 4, interleaved here in the original layout, lists sample queries such as: Return the specified generic attribute "Qualitaet" with all its values found in the dataset, ordering solutions by value in descending order. 8. Return all distinct street names found in the dataset and count the number of buildings in every street. 9. Return all distinct street names that have building function code "1134", count the occurrences of that function and order streets by the number of occurrences in descending order (7 solutions, 155 ms). 10. Return all distinct street names that have building function code "1444", compute the ratio of the specified function in each street and order results by the ratio in descending order.]
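For illustration, a query in the style of "count the number of buildings in every street" might be phrased roughly as follows; the ocgml: prefix and the address/street predicates are hypothetical placeholders, since the actual OntoCityGML vocabulary used by the CKG is not reproduced here:

```python
# A hedged sketch of a buildings-per-street aggregation query;
# ocgml: and the address/street predicates are hypothetical.
query = """
PREFIX ocgml: <http://example.org/OntoCityGML#>
SELECT ?street (COUNT(DISTINCT ?building) AS ?count) WHERE {
  ?building a ocgml:Building ;
            ocgml:address/ocgml:streetName ?street .
}
GROUP BY ?street
ORDER BY DESC(?count)
"""
print(query)
```

The function-code and ratio variants follow the same shape, adding a FILTER on the building function value and dividing per-street counts by totals.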
The system is deployed to a server running Microsoft Windows Server 2016 Standard, with 1TB of storage space, 200GB of RAM and 2 Intel® Xeon® E5-2620 v3 @ 2.40GHz CPUs. Out of the total 200GB, 32GB of RAM is assigned solely to Blazegraph™, deployed in Nano SPARQL Server mode. The journal file it currently uses to store the available city data is 6.13GB in size. System performance on the type of queries illustrated in Tables 3 and 4 is satisfactory in single-user mode.
The following are the results of executing multiple geospatial search queries simultaneously on the CKG, using the same hardware. This incremental geospatial search concurrency test follows the first few numbers of the Fibonacci sequence, each multiplied by 10.
• 10 concurrent geospatial search queries completed in 611 ms, increasing overall system CPU utilisation by 3% and memory utilisation by 0%. Queries returned 3913 city objects contained in variable square size areas in total.
• 20 concurrent geospatial search queries completed in 1617 ms, increasing overall system CPU utilisation by 4% and memory utilisation by 0%. Queries returned 28653 city objects contained in variable square size areas in total.
• 30 concurrent geospatial search queries completed in 2771 ms, increasing overall system CPU utilisation by 8% and memory utilisation by 0%. Queries returned 52196 city objects contained in variable square size areas in total.
• 50 concurrent geospatial search queries completed in 3959 ms, increasing overall system CPU utilisation by 9% and memory utilisation by 0%. Queries returned 76689 city objects contained in variable square size areas in total.
• 80 concurrent geospatial search queries completed in 5766 ms, increasing overall system CPU utilisation by 10% and memory utilisation by 0%. Queries returned 110318 city objects contained in variable square size areas in total.
• 130 concurrent geospatial search queries completed in 9274 ms, increasing overall system CPU utilisation by 11% and memory utilisation by 0%. Queries returned 184496 city objects contained in variable square size areas in total.
• 210 concurrent geospatial search queries completed in 12292 ms, increasing overall system CPU utilisation by 11% and memory utilisation by 0%. Queries returned 255711 city objects contained in variable square size areas in total.
Geospatial search queries were sent to the /citieskg/namespace/berlin/sparql endpoint in the TWA as collections of HTTP GET requests over the Internet, using the Postman v8.1.0 web API testing tool [49]. The proof of concept CKG, populated with Charlottenburg-Wilmersdorf LOD2 building data and integrated into TWA, also shows satisfactory results in tests of concurrent execution of geospatial search queries, when judged against expected workloads.
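A concurrency test of this kind can also be scripted rather than driven through Postman. The sketch below fans out N HTTP GET requests with a thread pool and reports the elapsed wall-clock time; the endpoint URL is an assumed local deployment, and the fetch function is injectable so the harness can be exercised without a live endpoint:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed local deployment; the paper's live endpoint path is
# /citieskg/namespace/berlin/sparql.
ENDPOINT = "http://localhost:9999/citieskg/namespace/berlin/sparql"

def http_fetch(query: str) -> bytes:
    """Send one SPARQL query as an HTTP GET request, as in the Postman test."""
    return urlopen(ENDPOINT + "?" + urlencode({"query": query})).read()

def run_concurrently(queries, fetch=http_fetch, workers=None):
    """Execute all queries in parallel; return (results, elapsed_ms)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers or len(queries)) as pool:
        results = list(pool.map(fetch, queries))
    elapsed_ms = (time.perf_counter() - start) * 1000
    return results, elapsed_ms
```

Calling run_concurrently with 10, 20, 30, 50, 80, 130 or 210 copies of a geospatial search query reproduces the test series above, while system CPU and memory utilisation are observed separately.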
Recommendations for larger and more intensive workloads vary. For instance, Nguyen and Kolbe [46] test graph comparison algorithms on city data using a machine running SUSE Linux Enterprise Server 12 SP1 (64 bit), equipped with an Intel® Xeon® CPU E5-2667 v3 at 3.20GHz (16 CPUs + Hyper-Threading), a PCIe Solid-State Drive (SSD) array and 1 TB of main memory. The RDF GAS API implemented in Blazegraph™, by contrast, does not appear particularly demanding on memory and CPU; instead it emphasises the importance of SSD technology to achieve close to 1 million traversed edges per second on a MacBook Air [10]. Upon consultation, one of the market-leading server hardware vendors recommended the following configuration for the CKG, at least as a starting point for a system able to store and query the 282,873,427 to 353,591,784 semantic triples generated in the case of the whole of Berlin and integrated within TWA: 1. Frontend web server:

Conclusions and future work
Sustainable digitisation practices, applied to build scalable technological solutions based on standards and open-source components, can be used to create intelligent decision support systems. The architecture definition for one of them, the CKG, elaborated on in this paper, shows that incorporating the Semantic 3D City Database into TWA enables dynamic geospatial knowledge graph capabilities within it, able to aid sustainable urbanisation practices and decision-making processes.
The previous sections demonstrate how to build knowledge graphs capable of semantically representing three-dimensional geometrical city objects and, in this way, providing comprehensive insights based on multi-domain data interoperability. The refined OntoCityGML ontology, presented in Section 2, is an example of bringing a well-defined and specified international standard, namely CityGML 2.0, into the world of the semantic web while making it compliant with W3C recommendations and standards at the same time. The demonstration of using this ontology with augmented data transformation tools proved its suitability to serve as a schema for the semantic twin of the 3D City Database, designed and optimised at TUM over many years. The CKG leverages this past work by keeping data in named graphs organised within namespaces. Advantages of the new semantic representation arising from the implied open-world assumption (OWA) have also been shown where adding intelligence to a bare database is of interest. The last sections show the possibility of materialising such a database within a scalable triple store with geospatial search capabilities. The store adheres to W3C standards and makes it possible to integrate geospatial data within a general knowledge graph, such as TWA, providing multi-domain data interoperability. Such systems can be demanding on hardware. Sample tests of single-user and concurrent queries on a live CKG gauge existing TWA capabilities in that regard, allowing a better understanding of how to evolve this aspect of the system further. Such recommendations conclude the last section of this paper.
TWA, at its core, is an agent-based system in which intelligent autonomous agents operate on the knowledge graph. A system of agents specific to the CKG has not been considered during its proof of concept as presented in this paper. Doing so would require further strengthening of OntoCityGML, mainly by cross-checking more CityGML 2.0 concepts not found in the sample city data used in this proof of concept. This would allow more data to be added to the CKG, enabling it to serve as a base for even broader insights with such data linked to other datasets already available in TWA. At the same time, further extensions of the TUM data transformation tools would be needed as well. The higher-level geospatial search functionalities currently implemented in Blazegraph™, namely inCircle and inRectangle [9], failed in some of the tests conducted using different coordinate reference systems. Therefore, the CKG makes use of only the customFields geospatial search feature at the moment. Resolving the issues with those higher-level types of search would add more capabilities and enable the implementation of certain functionalities going forward. Making use of the Parallel World Framework, already implemented in TWA, with the city data would allow users to simulate, analyse, dynamically visualise and evaluate various urbanisation scenarios, and enable cross-domain sustainability impact analyses.
One cannot forget that TWA, as an information system, also occupies physical space in a data centre: a group of buildings used to house computer systems and their associated components. Improving the sustainability of such buildings has come into focus for many big technology companies in recent years. Very often they look into more efficient cooling, utilisation of renewable energy sources such as solar panels installed on roofs, and multi-tenancy or maximisation of optimal computing resource utilisation, eliminating idle but energy-consuming times. Seen from this angle, the presented proof of concept may be regarded as one brick in paving the road towards self-sustainable knowledge graphs.