Approximate Query Answering Based on Topological Neighborhood and Semantic Similarity in OpenStreetMap

In this paper we focus on a pictorial query language, referred to as Geographical Pictorial Query Language (GeoPQL), and we revise its formal semantics by considering the polygon-polyline, polyline-polyline, and polygon-polygon topological relationships. This work proposes the Approximate Answering Engine (AAE) within a distributed system, referred to as GeoPQLJSON (GeoPQLJ). The AAE provides approximate answers to queries with empty results by following two directions: the Operator Conceptual Neighborhood (OCN) graph, and the OpenStreetMap (OSM) attribute hierarchy, giving maximum flexibility to the user's choices. According to the former, the geo-operators of a query can be replaced with the ones labeling the adjacent nodes of the OCN graph. According to the latter, the system evaluates the semantic similarity of OSM attributes according to the information content approach, and proposes possible attribute replacements to the user. Note that the presence of OSM attributes allows quick and direct access to large amounts of geographical data, without requiring in our case the use of the topological elements. The functionalities of the Distributed GeoPQLJ System are illustrated by several query examples.


I. INTRODUCTION
Distributed Geographic Information Systems (GIS) have significantly enriched the expressiveness of geographic data models thanks to the contribution of users. OpenStreetMap (OSM) [3] is an example of distributed environment which allows a new way of collecting geographic information through the crowd rather than organizations, and brought free access to a plethora of geographic information.
In general, in a world-wide distributed GIS, the presence of native data models for querying geographic data is a key requirement [10]. Geographic queries indeed can be better expressed by using graphical metaphors in query languages which are powerful to express the user's mental model of the query [34]. In GIS, geographic query languages should satisfy two basic requirements: they must be powerful and easy to use at the same time. Powerful, because they have to retrieve information about complex database schemas, keeping track of several relations existing among data. Easy to use, because the access to the stored information should not be limited to experts, but should be conceived for non-specialized end-users [11], [33]. These two basic requirements find a common solution in the development of advanced visual geographic query languages [20].
The associate editor coordinating the review of this manuscript and approving it for publication was Waleed Alsabhan.
In this paper, we concentrate on pictorial query languages, i.e., visual languages where queries are formulated by freehand drawing [25]. In particular, we focus on the Geographical Pictorial Query Language (GeoPQL) [23], [24], [28], which is a pictorial query language that provides drawing facilities to formulate queries and correctly interprets their syntax and semantics. In the context of GIS, one of the main challenges is providing approximate answers to queries with empty results by relying on both topological neighborhood and semantic similarity approaches. In the context of distributed GIS, a further challenge concerns the query response time, due to the access to large amounts of data. In this work we address both problems and propose an integrated approach, based on OSM, for providing approximate answers to queries with empty results. OSM is a digital archive of geospatial data in which the physical objects represented on the surface, e.g., roads, buildings, parks, etc., are referred to as "features". These basic features represent the database structure, and each feature is associated with a set of attributes representing geographical characteristics.
OSM is a potential alternative to commercial data and aims at generating and maintaining a free editable map database of the world in a collaborative manner without restrictive rules of the traditional copyright and license commitments [12]. In this perspective, we exploit the large amount of data associated with attributes in OSM and provide approximate answers to queries with empty results.
In this paper we have revised the formal semantics of the GeoPQL operators, and we focus on the polygon-polygon, polyline-polyline and polygon-polyline topological relationships. Then, in order to elaborate geographical queries in the distributed environment, we have associated with the revised GeoPQL operators the GeoPQLJSON (GeoPQLJ for short) functions which conform to GeoJSON format specification [1], in line with [26]. In particular, in this work we extend the GeoPQLJ functions related to polygon-polyline relationship, introduced in the mentioned paper, to the polygon-polygon, and polyline-polyline ones. These functions are based on the simple and text based GeoJSON format specification. Since GeoJSON is a JavaScript Object Notation (JSON) encoding [18], [41], the parsing and data interchanging for web services are flexible. For this reason it is one of the most popular encodings for transferring data to client-side map visualization.
The main contribution of this paper is the definition of the Approximate Answering Engine (AAE) within a system, referred to as GeoPQLJ Distributed System, which provides approximate answers, by following two different directions: the topological neighborhood based on the Operator Conceptual Neighborhood (OCN) graph, and the OSM attribute hierarchy. In the case of empty answers, according to the former direction, the geo-operators of the queries can be replaced with the geo-operators labeling the adjacent nodes of the OCN graph, on the basis of the user preferences; according to the latter direction, by keeping the same geo-operators, the attributes of the query can be replaced by accessing the list of OSM attributes, giving maximum flexibility to the user choices. In particular, the WordNet similarity engine can be used in order to evaluate OSM attribute similarity [7]. Among the similarity metrics that can be elaborated by WordNet, we selected the semantic similarity measure proposed by Lin in [36], which has been extensively experimented in the literature and shows higher correlation with human judgment.
In order to clarify the problem addressed in this paper, suppose the user is interested in the number of highways touching some amenities where it is possible to have a coffee break during a trip in Amsterdam. Such a query can be pictorially defined in our system as shown in Figure 1, where the features highway and amenity are associated with polyline and polygon respectively, and highway is the target of the query (see the top left side of Figure 1, where Target 1 is associated with highway). This query, as shown in detail in the experiment presented in Section V of this work, is translated according to the GeoJSON format specification and, by selecting the feature amenity and the related attribute cafe, is submitted to OSM. As described in the experiment, in this case the system provides an empty answer (see Figure 8), i.e., there are no highways touching an amenity with a coffee shop in Amsterdam, in the selected bounding box. In order to have approximate non-empty answers to this query, the user is provided with different possible alternatives, on the basis of his/her preferences. One of them is, for instance, the replacement of the touch geo-operator with pass-through, because of their adjacency in the OCN graph (see Figure 2). By replacing the touch geo-operator with pass-through, the answer to the query is non-empty and in particular the system returns 3 highways, among the 52425, which pass through an amenity with a coffee shop in Amsterdam (see Figure 9). As described in Section V, another possibility, depending on the user preferences, is the modification of the query attributes by replacing them with the most similar ones in OSM, if there are any, according to the similarity measure of Lin [36].
Overall, in this paper we present an integrated approach for approximate query answering in distributed GIS, whose benefits can be summarized as follows: the user can express his/her queries pictorially by using drawing facilities; in the case of empty results, the proposed system provides possible approximate answers by following two different approaches, i.e., the topological neighborhood of geo-operators or the semantic similarity of OSM attributes; the presence of attributes in OSM allows the quick and direct access to large amount of geographical data, without requiring in our proposal the use of the topological elements such as point, polygon or polyline.
The paper is structured as follows. In the next section, the related work is given. In Section III, the revised GeoPQL operators are presented, and the OCN graphs for polygon-polyline, polyline-polyline, and polygon-polygon relationships are given. In Section IV, the GeoPQLJ Distributed System is described. In Section V, the main functionalities of the GeoPQLJ Distributed System are illustrated. Finally, in Section VI the conclusion is given and in the Appendix the GeoPQLJ functions are illustrated.

II. RELATED WORK
In order to access and share data in distributed GIS [16], [35], the current Web-based systems adopt traditional browser/server architectures [35]. In these environments, the browser usually sends a request to the server, which processes it and returns the result to the browser. In this activity, spatial query languages become guidelines for Web-based GIS.
In [9] a framework, called XQuery for OpenStreetMap (XOSM), for integrating and querying OSM and Linked Geo Open Data (LGOD) resources is presented. It is equipped with a Web Tool and an XQuery-based library that allows the definition of queries combining OSM layers created from LGOD. This library is based on the spatial operators introduced in [21] and [22]. The preliminary version of XOSM is given in [8], for the retrieval of layers with boolean spatial/keyword operators, in line with [17]. With regard to the implementation, the authors use the BaseX XQuery processor, and the PostGIS system over PostgreSQL. In [31], the logical and topological inconsistencies in Volunteered Geographic Information (VGI), with a focus on OSM, are addressed. In the mentioned work, it has been demonstrated that parameters such as direction, distance, and topological relationships between objects can directly affect human comprehension and analysis results. Therefore, by considering these relationships, the spatial similarity in multi-representation is used to build a framework to determine the probable inconsistencies in OSM, and fulfill the spatial quality assurance in VGI. In [40], the problem of computing and representing topological relations solely from geometries is considered and, to solve such a problem, ontologies and multi-layered topological relations, as well as a dataset of topologically linked places derived from DBpedia and OSM, have been used. Since DBpedia places are only represented as point coordinates, in the mentioned work OSM is used to match as many places from DBpedia as possible with their corresponding polygon or polyline geometries in OSM.
With respect to the mentioned papers, in our proposal, we also use OSM but we address attributes in order to directly access large amount of data and, furthermore, in the case of queries with empty answers, we give the possibility to the users to select semantically similar attributes. In addition, we adopt a pictorial approach in order to formulate geographic queries.
The Open Geospatial Consortium (OGC) GeoSPARQL standard [2], [15] is the genesis of a significant amount of work on the combination of the Resource Description Framework (RDF) and the Web Ontology Language (OWL) with geospatial data for representing and querying data on the Semantic Web. In order to facilitate querying in GeoSPARQL, for instance in [29] and [42], a graphical geospatial query tool, called GeoQuery, is presented. It includes a map-based user interface to search functions and geospatial operators enabling queries against GeoSPARQL. In our work we adopt an inherently different approach with respect to the mentioned papers for the following reasons. First of all, in GeoPQLJ we address the user's mental model and provide him/her with a drawing environment for easily expressing pictorial queries, whereas in the mentioned papers the query formulation by means of the graphical interface is performed according to a procedural paradigm. In fact, in our approach, the query is formulated in a non-procedural way and therefore the order in the query specification is not relevant. Furthermore, as already mentioned, in our proposal the system provides approximate answers by exploiting both topological neighborhood and semantic similarity. Note that the topic of approximate query answering has been extensively investigated in the literature in different contexts, for instance, in relational databases [19], graph-modeled data [37], aggregate databases [39], and GIS [13]. Our proposal falls under the area of approximate query answering in GIS.
In [38], the Overpass API (or OSM3S) is proposed, which is an extension of the API to select parts of an OSM layer. It has its own query language, which can be specified by an Extensible Markup Language (XML) template. However, OSM3S facilities (i.e., query composition and filtering) cannot be combined with spatial topological operators (e.g., touch and cross), which indeed are at the basis of our proposed approach.
With regard to the Geography Markup Language (GML), in [30] the authors propose GQL, which is a query language conceived to support spatial queries over GML documents. In particular, they extend the underlying data model, the algebra, and the semantics of XQuery. Furthermore, in [32], a similar approach has been proposed, by also addressing the performance problem of manipulating large GML documents. With respect to the aforementioned papers, our approach can be used in a general distributed system thanks to GeoJSON (and its functions) which, by making use of spatial operators, allows the access to geo-spatial repositories in an efficient way.
Regarding the semantic similarity in [14] the authors present a Semantic Network based on OSM and propose an approach to compute semantic similarity of geographic classes in this Network. However, this work differs from our proposal because it does not rely on the information content approach, but on co-citation algorithms that compute the semantic similarity of concepts in a graph of inter-linked objects based on the intuition that similar objects are referenced together. Furthermore, in [27] a method for measuring the semantic similarity of geographic classes organized as PartOf hierarchies has been investigated. In particular, the proposed method takes into account both the concept similarity within the PartOf hierarchy (through the information content approach) and the tuple similarity (through the sets of typed attributes). However, in the mentioned paper ISA hierarchies are not addressed.
Note that, to the best of our knowledge, in the literature there are no proposals integrating at the same time the different characteristics of the GeoPQLJ Distributed System presented in this paper, that are: a non-procedural declarative pictorial query language with related drawing facilities; approximate answers to queries with empty results obtained by replacing topological geo-operators or OSM attributes on the basis of the OCN graph and the widely evaluated semantic similarity of Lin [36], respectively; quick access to large amounts of geographical data in OSM. Therefore, comparisons with other proposals have not been presented.

III. GeoPQL OPERATORS
In this paper, we focus on GeoPQL which is based on the notion of Symbolic Graphical Objects (SGO) [24], [28]. It has been defined to graphically represent the spatial configurations of geographic entities (i.e., point, polyline, and polygon), and the spatial relationships between SGO (see footnote 1).

Definition (SGO): Given a GIS, a Symbolic Geographical Object (SGO) is a pair whose first component is geometric_type, where:
• geometric_type can be a point, a polyline or a polygon;
• the second component is an ordered set of pairs of coordinates, which defines the spatial extent and position of the SGO with respect to the coordinate reference system of the working area.

Footnote 1: A polyline is non self-intersecting (self-crossing), without loops, spirals, and bifurcations. A polygon is simple, i.e., it does not intersect itself.
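
As an illustration only, the SGO pair defined above could be represented in code as follows. This is a minimal sketch: the class name, the field name `coordinates`, and the minimum number of coordinate pairs per type are assumptions introduced here, not part of the GeoPQL definition.

```python
from dataclasses import dataclass
from typing import Tuple

Coordinate = Tuple[float, float]

# Assumed minimum number of coordinate pairs per geometric type.
MIN_PAIRS = {"point": 1, "polyline": 2, "polygon": 3}


@dataclass(frozen=True)
class SGO:
    """A geometric type together with its ordered pairs of coordinates."""
    geometric_type: str                   # "point", "polyline" or "polygon"
    coordinates: Tuple[Coordinate, ...]   # field name chosen for illustration

    def __post_init__(self) -> None:
        if self.geometric_type not in MIN_PAIRS:
            raise ValueError(f"unknown geometric type: {self.geometric_type}")
        if len(self.coordinates) < MIN_PAIRS[self.geometric_type]:
            raise ValueError("too few coordinate pairs for this geometric type")


# A polyline with two vertices (coordinates are made up).
road = SGO("polyline", ((4.88, 52.36), (4.89, 52.37)))
```

The sketch does not check the constraints of footnote 1 (simple polygons, non self-intersecting polylines); those would require geometric tests that are outside its scope.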
The GeoPQL algebra consists of a set of binary geo-operators, which are logical (Geo-union, Geo-any, Geo-alias), metrical (Geo-difference, and Geo-distance), and topological (Geo-disjunction, Geo-touch, Geo-inclusion, Geo-intersect, Geo-cross, Geo-pass-through, Geo-overlap, Geo-equal). Our focus, as mentioned in the Introduction, is on the polygon-polyline, polyline-polyline, and polygon-polygon topological relationships. Therefore, in each of the above mentioned cases, we consider the related set of the topological operators. In particular, in the case of:
• polygon-polyline topological relationships, they are: Geo-disjunction (DSJ), Geo-inclusion (INC), Geo-touch (TCH), Geo-intersect (INT), and Geo-pass-through (PTH);
• polyline-polyline topological relationships, they include Geo-touch (TCH), Geo-cross (CRS), Geo-inclusion (INC), and Geo-equality (EQL);
• polygon-polygon topological relationships, they are: Geo-disjunction (DSJ), Geo-touch (TCH), Geo-overlap (OVL), Geo-inclusion (INC), and Geo-equality (EQL).
In Table 1 the above geo-operators and some related configurations are shown.
Below we assume that, for a given SGO, the subscripts i, b, and e denote respectively, the interior, boundary, and exterior points of the SGO. Furthermore, for any SGO, if P is a polygon and L is a polyline, the following holds: Note that in the definitions below, the order of the operands is not relevant. Furthermore, in this paper we focus on pictorial configurations representing only one geo-operator, i.e., combined geo-operators are not addressed.
In the following, the formal semantics of the polygon-polyline geo-operators is given.

Definition (Polygon-Polyline Geo-Operators):
Given a polygon P and a polyline L, which are two SGO, the binary geo-operations DSJ, INC, TCH, INT, and PTH are formally defined as follows, where k, j ∈ {i,b,e}:
Below, the notion of semi-neighborhood is recalled, which will be used to formally define the geo-operators between polylines [25].
Definition (Semi-Neighborhood): Given a polyline $L \in SGO$, let $R^L_a$ and $R^L_b$ be the semi-planes of $\mathbb{R}^2$ defined by $L$ and its extension. Let us consider a point $x \in L$ and a neighborhood of $x$, $I(x)$. Then, $I(x)^L_a$ and $I(x)^L_b$ represent the two semi-neighborhoods of $I(x)$ belonging to $R^L_a$ and $R^L_b$, respectively, without $L$.
Finally, the formal semantics of the polyline-polyline and polygon-polygon geo-operators are presented.

Definition (Polygon-Polygon Geo-Operators):
Given two polygons P and Q, which are two SGO, the binary geo-operations DSJ, TCH, OVL, INC, and EQL are formally defined as follows, where k, j ∈ {i,b,e}:
The above geo-operators are invoked in the GeoPQLJ functions, which are introduced in the Appendix.

A. OCN GRAPH
In this section we recall the definition of the Operator Conceptual Neighborhood (OCN) graph for topological relationships introduced in our previous work [25].

Definition (OCN Graph):
The Operator Conceptual Neighborhood (OCN) graph is a graph where each node is labeled by one geo-operator corresponding to a possible pictorial configuration, and an arc directly connects two nodes if and only if it is possible to transit from one configuration to another by applying either a translation or a rotation operation.
By neighborhood we mean the continuous translation or rotation of objects within a given topological relationship. For instance, if two polygons touch and one is moved towards the other, they must first overlap before one is included in the other. Essentially, in the graph two nodes are adjacent if and only if the operators they denote can be transformed into each other by continuously modifying the related SGO, applying either a translation or a rotation.
According to different topological relationships between SGO, in Figure 2 the OCN graphs of polygon-polyline, polyline-polyline, and polygon-polygon topological relationships are shown, respectively.
For instance, let us consider the OCN graph shown in Figure 2(c), related to the polygon-polygon topological relationship. The transitions from DSJ to TCH, from TCH to OVL, and from OVL to INC nodes, obtained by applying translation operations, are shown in Figure 3. Accordingly, in the transition from DSJ to TCH, we obtain the configuration shown in Figure 3. Analogously, in Figure 2(a), related to the polygon-polyline topological relationship, from the INT node it is possible to transit to the adjacent TCH, INC and PTH nodes. Similarly, in Figure 2(b), related to the polyline-polyline topological relationship, the transition from the CRS to the TCH node can be obtained by applying a translation operation, whereas the transition from TCH to INC requires a rotation and possibly a translation operation. Furthermore, the transition from CRS to EQL occurs by applying a rotation if and only if the lengths of the polylines coincide. Similar considerations hold for the remaining cases.
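
To make the notion of adjacency concrete, the following sketch encodes the OCN graphs as undirected adjacency sets and looks up the operators adjacent to a given one. It lists only the arcs explicitly mentioned in the text, so it is a partial rendering of Figure 2 and not the complete graphs; the function and variable names are introduced here for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph as a mapping from node to adjacent nodes."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


# Only the arcs named in the text; the full graphs of Figure 2 may have more.
OCN = {
    "polygon-polygon": build_graph(
        [("DSJ", "TCH"), ("TCH", "OVL"), ("OVL", "INC")]),
    "polygon-polyline": build_graph(
        [("INT", "TCH"), ("INT", "INC"), ("INT", "PTH"), ("TCH", "PTH")]),
    "polyline-polyline": build_graph(
        [("CRS", "TCH"), ("TCH", "INC"), ("CRS", "EQL")]),
}


def adjacent_operators(relationship: str, operator: str) -> List[str]:
    """Return the operators adjacent to `operator`, in alphabetical order."""
    return sorted(OCN[relationship].get(operator, set()))


print(adjacent_operators("polygon-polyline", "TCH"))
```

For the polygon-polyline case, the operators adjacent to TCH in this partial graph are INT and PTH, which are the candidate replacements for a query based on the touch operator.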

IV. THE GeoPQLJ DISTRIBUTED SYSTEM
The GeoPQLJ Distributed System is based on the GeoPQLJ functions which conform to the GeoJSON format specification. It is a format for encoding a variety of geographic data structures using JSON [18]. In particular, the former is a text markup language for representing geometry objects according to a vector format and reference systems of spatial objects. The latter is used to transfer and store the same information in specific geographic databases.
The GeoPQLJ functions have been inspired by taking into account the syntax of Turf.js [5]. In the following, the spatial types Point, LineString, and Polygon, which allow us to define the spatial operators' functions, are given. These operators are: disjoint, inclusion, touch, intersect, passthrough, cross, overlap, and equal, which are admissible for representing the topological relationships between SGO. Table 2 summarizes the above mentioned functions with the corresponding definitions and invoked GeoPQL operators.
In Figure 4 the diagram of the GeoPQLJ Distributed System is shown. As shown in this figure, within the local GeoPQL system a Feature can be associated with a set of attributes. Note that, in this paper we assume that in a query a feature is associated with at most one attribute. Suppose the user expresses a pictorial query q. It is transformed into a query by identifying the corresponding operator GeoPQL_op, as follows:

q: Feature_1 GeoPQL_op Feature_2

where Feature_i, i = 1, 2, are associated with geometric_types, and Feature_1 is the target of the query. Successively, the query is translated into the format defined according to the GeoPQLJ functions described above, as follows:

... [x_n, y_n]]))) → true

where geom1 and geom2 are the geometric_types associated with Feature_1 and Feature_2, respectively. The wkt method converts the geometries to WKT geometry formats [5]. In the wkt method, the WKTReader extracts the geometry objects from either Readers (i.e., the abstract class for reading character streams) or Strings. It is a parser that allows the reading of the geometry objects from text blocks embedded in other data formats (e.g., XML). The syntax of the geopqlj_op functions is given in the Appendix.
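
The relation between GeoJSON geometries and their WKT text form can be illustrated with a small hand-written converter. This is only a sketch for the three spatial types named above (Point, LineString, Polygon); it is not the wkt method of the system, and the function names are hypothetical.

```python
from typing import Sequence


def _positions(points: Sequence[Sequence[float]]) -> str:
    """Format a list of [x, y] positions as 'x1 y1, x2 y2, ...'."""
    return ", ".join(f"{p[0]} {p[1]}" for p in points)


def geojson_to_wkt(geometry: dict) -> str:
    """Convert a GeoJSON Point, LineString or Polygon geometry to WKT text."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return f"POINT ({coords[0]} {coords[1]})"
    if kind == "LineString":
        return f"LINESTRING ({_positions(coords)})"
    if kind == "Polygon":
        # A GeoJSON polygon is a list of linear rings (exterior ring first).
        rings = ", ".join(f"({_positions(ring)})" for ring in coords)
        return f"POLYGON ({rings})"
    raise ValueError(f"unsupported geometry type: {kind}")


# Made-up coordinates, for illustration only.
line = {"type": "LineString", "coordinates": [[4.88, 52.36], [4.89, 52.37]]}
square = {"type": "Polygon",
          "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}
print(geojson_to_wkt(line))
print(geojson_to_wkt(square))
```

Only the first two components of each position are used, so an optional third (altitude) value is ignored in this sketch.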
Then, the above query is sent to OSM which connects to the data sources in order to select the required URLs (see the dashed box shown in Figure 4) and, possibly, to associate attributes with features:

Feature_1(attr_1) geopqlj_op Feature_2(attr_2)

In other words, OSM uses the GeoJSON data format in order to access the territorial data directly from distributed data sources, as well as to analyze and process them. OSM uses the well-known MapFeatures, Metadata list of Features, and Primary Features. They capture the general information about the metadata that contain the list of features corresponding to the query. This list is associated with the Geodetic Parameter Registry (EPSG), that allows the accurate overlapping of the required features.
As already mentioned in the Introduction, OSM is a digital archive of geospatial data represented on the surface, as for instance roads, buildings, parks, etc., that are objects referred to as features. The OSM features allow the access to the content of data attributes, and the related spatial coordinates. They represent the database structure, and each of them is associated with a set of attributes denoting geographical characteristics. Finally, the query answer is sent to the Web Client, and if it is non-empty, it is shown as a map. Otherwise, in the case the answer is empty, the AAE is activated, which is shown in Figure 5, and described in the next subsection.

A. APPROXIMATE ANSWERING ENGINE
In this section, the AAE proposed in this paper is illustrated. In the presence of empty answers, it provides the user with one or more results that better approximate the given queries. Analogously to GeoPQL, in OSM a query is defined by two features (Feature_i, i = 1, 2), each having the same name as the class it refers to. The first feature is the target of the query, and defines the elements expected in the answer. For this reason, we say that in a query symmetry does not hold because by exchanging the features, and therefore the target of the query, in general we do not have the same result. Furthermore, also in OSM, a feature is associated with a possibly empty set of attributes, which in this paper is restricted to a singleton. The set of attributes belonging to a given feature in OSM is organized according to an ISA hierarchy, which is often a forest. For instance, consider the previous query q: Feature_1(attr_1) geopqlj_op Feature_2(attr_2) and suppose that the answer to this query is empty (see the right side of Figure 4). In order to give an approximate answer, the AAE provides the user with two different solutions. They are based on the OCN graph and the OSM attribute hierarchy, respectively, as described below.
• OCN graph. According to this approach, the OCN graph corresponding to the topological relationships involved in the query is considered. To this end, the geo-operator of the above query is mapped to the original GeoPQL operator of q, by means of the GeoMapper, as follows:

geopqlj_op → GeoPQL_op

Then, the geo-operators labeling the adjacent nodes of the GeoPQL_op in the OCN graph are considered. Therefore, the user decides which is the operator that best approximates the original query, on the basis of his/her interests. Suppose the user chooses, among the nodes adjacent to GeoPQL_op, the node labeled with GeoPQL_op_k. Then, the query is updated as follows:

Feature_1(attr_1) GeoPQL_op_k Feature_2(attr_2)

which is again submitted to the local GeoPQL system, as shown in Figure 4 (see Step 1), and elaborated according to the steps illustrated above in the GeoPQLJ Distributed System. The result is then proposed to the user as an approximate answer to the query q.
• OSM attribute ISA hierarchy. In order to have an approximate answer to the query q, in place of the OCN graph, it is possible to access the sets of OSM attributes associated with the features involved in the query. Attributes in OSM are organized according to ISA hierarchies. The goal of the approach is to replace one or both the attributes of the query q with the ones preferred by the user, if there are any, or similar attributes in the OSM ISA hierarchy. In particular, in our approach attribute similarity is evaluated according to the semantic similarity approach defined by Lin [36], also referred to as information content approach. It is based on the association of probabilities with the attributes (concepts) of the ISA hierarchy. The probability of an attribute a is defined as:

$$p(a) = \frac{f(a)}{M}$$

where f(a) is the frequency of the attribute a estimated using noun frequencies from large text corpora, such as the Brown Corpus of American English, and M is the total number of observed instances of nouns in the corpus. According to the standard argumentation of information theory, the information content of a is defined as −log p(a). We assume that the ISA hierarchy is a tree, therefore the least upper bound (lub) of any pair of attributes is always defined and provides the maximum information content shared by the pair of attributes in the hierarchy. Formally, given two attributes a_i and a_j of the ISA hierarchy, their similarity, referred to as sim(a_i, a_j), is defined as twice the maximum information content shared by the attributes divided by the sum of their information contents:

$$sim(a_i, a_j) = \frac{2 \log p(lub(a_i, a_j))}{\log p(a_i) + \log p(a_j)}$$

In order to evaluate Lin's similarity, in our proposal we rely on the WordNet similarity engine [7]. This engine allows the evaluation of the similarity between nouns, verbs, adjectives, etc. by following different similarity metrics defined in the literature.
In our proposal, the information content approach of Lin is adopted because it has been extensively evaluated and shows a higher correlation with human judgment with respect to most of the similarity methods defined in the literature [36]. Consider one of the attributes of the query q, for instance attr_1, and suppose the user, having accessed the set of OSM attributes related to Feature_1, wants to replace attr_1 with the attribute attr_k that is either the most similar to attr_1 according to Lin, or one among his/her favorite ones, independently of the similarity values. Then, the query:

Feature_1(attr_k) geopqlj_op Feature_2(attr_2)

is submitted to the local GeoPQLJ Distributed System (see Step 2), and elaborated. The result of this query is then proposed to the user as an approximate answer to the query q.
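
The information content computation can be traced on a toy example. The sketch below is not the WordNet similarity engine used in our proposal: the hierarchy, the attribute names, and the frequencies are invented for illustration, and the probability of an attribute is assumed to include the occurrences of its descendants, as is common in information content approaches.

```python
import math
from typing import Dict, List

# Hypothetical ISA tree (child -> parent) and made-up corpus frequencies.
PARENT: Dict[str, str] = {
    "vegetation": "natural", "landform": "natural",
    "grassland": "vegetation", "wood": "vegetation",
    "sand": "landform", "beach": "landform",
}
FREQ: Dict[str, int] = {
    "natural": 0, "vegetation": 10, "landform": 10,
    "grassland": 20, "wood": 40, "sand": 15, "beach": 5,
}
ROOT = "natural"


def cumulative(a: str) -> int:
    """Occurrences of a plus those of all its descendants (assumption)."""
    return FREQ[a] + sum(cumulative(c) for c, p in PARENT.items() if p == a)


M = cumulative(ROOT)  # total number of observed instances


def ic(a: str) -> float:
    """Information content -log p(a), written as log(M / count)."""
    return math.log(M / cumulative(a))


def ancestors(a: str) -> List[str]:
    """Path from a up to the root, a included."""
    path = [a]
    while a in PARENT:
        a = PARENT[a]
        path.append(a)
    return path


def lub(a: str, b: str) -> str:
    """Least upper bound of a and b in the tree."""
    above_a = set(ancestors(a))
    return next(c for c in ancestors(b) if c in above_a)


def lin(a: str, b: str) -> float:
    """Lin similarity: 2 * IC(lub) / (IC(a) + IC(b))."""
    total = ic(a) + ic(b)
    return 1.0 if total == 0 else 2 * ic(lub(a, b)) / total


print(lin("grassland", "wood"), lin("grassland", "sand"))
```

In this toy tree, grassland and wood share the ancestor vegetation, whereas grassland and sand only share the root, whose information content is zero; the first pair is therefore ranked as more similar than the second.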

V. THE EXPERIMENT
In this section some experiments are shown by considering the geodata related to the city of Amsterdam, stored in the OSM data repositories. Note that OSM data includes all the elements of maps such as nodes, ways, and relations, which are gathered in the Planet.osm file of 1192.4 GB size when uncompressed [4]. In order to evaluate the proposed approach, an area of Amsterdam delimited by a given bounding box was selected, and the related XML information has been extracted from OSM. In this area, the number of nodes is 15249954, and the size of data is 557 MB. Let us start by considering the query presented in the Introduction, i.e., suppose the user is interested in the number of highways touching some amenities where it is possible to have a coffee break during a trip in Amsterdam. This query, that is pictorially defined in the Local GeoPQL System as shown in Figure 1 in the Introduction, is formulated in GeoPQL as:

highway TCH amenity
where the geometric_types associated with the features highway and amenity are polyline and polygon respectively and, as anticipated in the Introduction, highway is the target of the query. It is translated into the following query by associating with TCH the geopqlj_tch operator:

highway geopqlj_tch amenity

In Figure 6, the selection of the feature amenity is illustrated. In addition, since the user is interested in an amenity where it is possible to have a coffee, he/she can also access the list of attributes of amenity, as shown in Figure 7, in order to select the preferred one which, in this case, is cafe, i.e.:

highway geopqlj_tch amenity(cafe)
This query is submitted to OSM which, in the case of the selected bounding box for Amsterdam, provides an empty answer as shown in Figure 8.
In particular, the above query involves 52425 highways, 2013 amenities, and 17 coffee shops. Note that 166 of the 52425 highways touch an amenity (see the bottom left side of Figure 8). However, with the addition of the attribute cafe, the answer to this query is empty, i.e., there are no highways touching an amenity with a coffee shop in Amsterdam, as shown in Figure 8.
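
Counts such as the ones above are obtained by selecting OSM elements according to their tags. As a rough illustration of this selection step, and not of the actual implementation of the system, the sketch below counts the ways carrying a given key (feature) and, optionally, a given value (attribute) in a small hand-written OSM XML fragment; the fragment and the function name are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-written sample in the OSM XML layout (ways with nd and tag children).
SAMPLE = """<osm version="0.6">
  <way id="1"><nd ref="10"/><nd ref="11"/>
    <tag k="highway" v="residential"/></way>
  <way id="2"><nd ref="20"/><nd ref="21"/><nd ref="22"/><nd ref="20"/>
    <tag k="amenity" v="cafe"/></way>
  <way id="3"><nd ref="30"/><nd ref="31"/><nd ref="32"/><nd ref="30"/>
    <tag k="amenity" v="school"/></way>
</osm>"""


def count_ways(root: ET.Element, key: str, value: Optional[str] = None) -> int:
    """Count the ways having a tag with the given key (and value, if any)."""
    total = 0
    for way in root.iter("way"):
        for tag in way.findall("tag"):
            if tag.get("k") == key and (value is None or tag.get("v") == value):
                total += 1
                break  # count each way at most once
    return total


root = ET.fromstring(SAMPLE)
print(count_ways(root, "highway"), count_ways(root, "amenity"),
      count_ways(root, "amenity", "cafe"))
```

The sketch only inspects ways; features mapped as nodes or relations would need the same test on those element types.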
Therefore, the user can look for possible approximate answers to this query by following our proposed approach. For instance, suppose he/she wants to follow the first direction, i.e., replacing the geo-operator TCH. As shown in Figure 5, according to the GeoMapper, the geopqlj_tch geo-operator is converted to the corresponding GeoPQL TCH, and the OCN graph of the polygon-polyline topological relationships is accessed. Suppose the user selects the PTH geo-operator because, among the nodes adjacent to TCH, he/she is interested in the highways which pass through a cafe. Then, the query is updated as follows: highway PTH amenity(cafe) and submitted to the system (see Step 1). The obtained answer is 3, i.e., there are 3 among the 52425 highways which pass through an amenity with a coffee shop in Amsterdam, as shown in Figure 9.
In order to illustrate the second direction, consider the following query:

waterway INT natural(grassland)
where waterway is the target of the query and is of type polyline, whereas natural is of type polygon, taking into account that in Amsterdam there are 1114 waterways, 4279 naturals, and 21 grasslands. Also in this case, the answer to this query is empty, i.e., there are no waterways intersecting a natural which is a grassland in Amsterdam, as shown in Figure 10.
Suppose the user wants to modify the attribute grassland of natural. Then, he/she accesses the list of OSM attributes of the natural feature, such as tree, sand, beach, water, etc., which are organized according to the ISA hierarchy shown in Figure 11. Furthermore, assume the user wants to select the attribute most similar to grassland. Therefore, each of the attributes shown in Figure 11 is compared with grassland, whose sense according to WordNet is: "land where grass or grasslike vegetation grows and is the dominant form of plant life". The similarity values between grassland and each attribute associated with natural are computed according to the semantic approach of Lin. The subset of attributes of natural that are most similar to grassland is shown in Table 3.
In this table, the attribute most similar to grassland is wood (their similarity is 0.35), whose sense is "the trees and other plants in a large densely wooded area". Therefore, by considering that there are 1114 waterways, 4279 naturals (as mentioned above), and 33 woods, the above query is modified as follows: waterway geopqlj_int natural(wood) and submitted to the system (see Step 2), where geopqlj_int corresponds to the original geo-operator INT of the query. As shown in Figure 12, the answer to this query is equal to 6, i.e., there are 6 waterways, among the 1114 identified, which intersect a wood natural area (see the bottom left side of Figure 12).
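To make the attribute ranking concrete, recall that Lin's information content similarity between two concepts is twice the information content of their least common subsumer in the ISA hierarchy, divided by the sum of their own information contents. The following is a minimal sketch of this computation. The hierarchy fragment, the intermediate node vegetation, and all probabilities are hypothetical values chosen for illustration only; they are not the ones of Figure 11 and do not reproduce the values of Table 3.

```python
import math

# Hypothetical ISA fragment (child -> parent); "vegetation" is an assumed node.
PARENT = {
    "natural": None,
    "vegetation": "natural",
    "grassland": "vegetation",
    "wood": "vegetation",
    "water": "natural",
}

# Hypothetical concept probabilities (the root has probability 1).
PROB = {"natural": 1.0, "vegetation": 0.25, "grassland": 0.05,
        "wood": 0.125, "water": 0.3}


def ic(concept):
    """Information content: IC(c) = -log p(c)."""
    return -math.log(PROB[concept])


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def least_common_subsumer(a, b):
    """Most specific concept that subsumes both a and b."""
    up_a = set(ancestors(a))
    for c in ancestors(b):
        if c in up_a:
            return c
    raise ValueError("concepts do not share a root")


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(lcs(a, b)) / (IC(a) + IC(b))."""
    denom = ic(a) + ic(b)
    if denom == 0:  # both concepts are the root
        return 1.0
    return 2 * ic(least_common_subsumer(a, b)) / denom


# Rank the candidate replacements for grassland, most similar first.
candidates = ["wood", "water"]
ranking = sorted(candidates, key=lambda c: lin_similarity("grassland", c),
                 reverse=True)
print(ranking)
```

With this toy data, two attributes whose only common ancestor is the root obtain similarity 0, since the root carries no information content.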
Consider now a query involving the polygon-polygon topological relationship and suppose the user is interested in all the grasslands overlapping the historical buildings in Amsterdam. This query can be formulated as follows:

natural(grassland) OVL historic
where both the geometric_types of natural and historic are polygons, considering that there are 4279 naturals of type polygon as shown before, 21 grasslands, and 78 historic buildings of type polygon. The answer to this query is empty, see Figure 13, i.e., there are no grasslands, among the 21 in Amsterdam, overlapping one of its 78 historical buildings.
Suppose now the user prefers to replace the attribute grassland rather than modifying the OVL operator by accessing the OCN graph. In particular, instead of considering the attribute most similar to grassland in the ISA hierarchy (which, as shown above, is wood), the user is interested in the specific attribute water. Therefore, taking into account that in Amsterdam there are 3508 waters, the query becomes:

natural(water) geopqlj_ovl historic
where geopqlj_ovl corresponds to the OVL geo-operator.
As shown in the bottom left side of Figure 14, the answer to this query is 3, i.e., there are 3 among the 3508 waters which overlap historic buildings in Amsterdam.
Finally, in order to provide an example of a query involving the polyline-polyline topological relationship, suppose the user is interested in obtaining the canals, which are waterways, touching a cycleway (the attributes associated with waterway are shown in Figure 15). Considering that there are 1114 waterways, 385 canals, and 1382 cycleways, the query is the following:

waterway(canal) TCH cycleway
where the geometric_types of both the features waterway and cycleway are polylines. Also in this case, the answer to this query is empty, as shown in Figure 16. Suppose the user accesses the OCN graph and replaces the TCH operator with the CRS operator, thereby asking for the canals which cross, rather than touch, the cycleways in Amsterdam. The answer to this query is 32, i.e., there are 32 among the 385 canals which cross a cycleway in Amsterdam, as shown in Figure 17.
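The two operator replacements used in this section (TCH with PTH for the polygon-polyline case, and TCH with CRS for the polyline-polyline case) can be sketched as a lookup in an adjacency map. The following is only an illustration: OCN_EXCERPT contains just the two adjacencies mentioned in this section rather than the full OCN graphs, and the function name operator_alternatives is hypothetical.

```python
# Excerpt of the OCN graphs, keyed by the geometric types of the two features.
# Only the adjacencies mentioned in this section are listed (assumption: the
# full graphs contain further nodes and edges that are omitted here).
OCN_EXCERPT = {
    ("polyline", "polygon"): {"TCH": ["PTH"]},
    ("polyline", "polyline"): {"TCH": ["CRS"]},
}


def operator_alternatives(target, operator, other, geometry_pair,
                          graph=OCN_EXCERPT):
    """Return the relaxed queries obtained by replacing `operator` with
    each of its adjacent nodes in the graph selected by `geometry_pair`."""
    neighbours = graph.get(geometry_pair, {}).get(operator, [])
    return [f"{target} {op} {other}" for op in neighbours]


# Empty query of Figure 8 and its relaxation.
highway_candidates = operator_alternatives(
    "highway", "TCH", "amenity(cafe)", ("polyline", "polygon"))

# Empty query of Figure 16 and its relaxation.
canal_candidates = operator_alternatives(
    "waterway(canal)", "TCH", "cycleway", ("polyline", "polyline"))

print(highway_candidates)
print(canal_candidates)
```

In the system, the choice among the adjacent operators is left to the user; the sketch simply enumerates the candidates from which he/she would select.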
In Table 4, the benchmarks related to the response time in seconds (s) of the eight queries described above are presented. Note that we ran the queries on a workstation with two Intel Xeon CPUs at 2.666 GHz, 16 slots, 128 GB of DDR4 RAM, and 16 GB of GDDR5X GPU memory. As we observe, the execution time of the query shown in Figure 9 is significantly higher (22 s) than the others because the computation of the PTH geo-operator requires more complex operations, according to the formal semantics defined in Section III. In fact, besides verifying the conditions related to the intersection among the internal and external points of the involved features, the condition that both the boundary points of the polyline must be external to the polygon also has to be checked.
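The additional check required by PTH can be illustrated with a simplified planar test. The following sketch is not the formal semantics of Section III: it assumes simple polygons given as vertex lists, treats a proper crossing of the polygon boundary as evidence that the interior of the polyline meets the interior of the polygon, and ignores degenerate configurations (endpoints or vertices lying exactly on the boundary, collinear overlaps). All function names are hypothetical.

```python
def point_in_polygon(pt, poly):
    """Ray casting test; points exactly on the boundary are not handled."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def orient(a, b, c):
    """Signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def properly_cross(p, q, r, s):
    """True if segments pq and rs cross at a single interior point."""
    return (orient(r, s, p) * orient(r, s, q) < 0
            and orient(p, q, r) * orient(p, q, s) < 0)


def passes_through(line, poly):
    """Simplified PTH: both endpoints of the polyline are outside the
    polygon, and some polyline segment properly crosses its boundary."""
    if point_in_polygon(line[0], poly) or point_in_polygon(line[-1], poly):
        return False
    edges = [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]
    return any(properly_cross(p, q, r, s)
               for p, q in zip(line, line[1:])
               for r, s in edges)


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(passes_through([(-1, 1), (3, 1)], square))   # crosses the square
print(passes_through([(-1, 1), (1, 1)], square))   # ends inside the square
print(passes_through([(-1, 3), (3, 3)], square))   # stays outside
```

Even in this simplified form, every polyline segment is compared with every polygon edge in addition to the two endpoint tests, which gives an intuition of why PTH is more expensive to evaluate than the other geo-operators.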

VI. CONCLUSION
In this paper, we proposed the Approximate Answering Engine within the GeoPQLJ Distributed System, which provides answers to empty queries according to either the Operator Conceptual Neighborhood graph or the OpenStreetMap (OSM) attribute hierarchy, giving maximum flexibility to the user choices. In the GeoPQLJ Distributed System, queries are formulated in the GeoPQL pictorial query language in the Local GeoPQL System, whose semantics has been revised for the polygon-polyline, polyline-polyline, and polygon-polygon topological relationships, and the corresponding GeoPQLJ functions have been defined. The system has been illustrated by several query examples.
As future work, we are planning to allow the user to modify both the geo-operators and the OSM attributes of a query at the same time. Furthermore, in this context, we are extending the proposed approach to topological relationships involving directed polylines, and we are investigating the formal semantics of the related geo-operators.

She served as a Referee of several international journals and conferences. She has taken part in various research projects of the European Framework Programs and bilateral projects with international institutions. Her research interests include query processing, data analysis and management, GIS, data warehousing and OLAP, semantic web, and similarity reasoning.
MAURIZIO RAFANELLI received the degree in mathematics from the University of Rome "La Sapienza," in 1976. He is currently an Associate Researcher with the "Istituto di Analisi dei Sistemi ed Informatica" (IASI) "Antonio Ruberti," Italian National Research Council (Consiglio Nazionale delle Ricerche-CNR), Rome, where he works with the Information Systems and Knowledge Representation Group. In 1988 and 1998, he organized, as the Program Chair and the General Chairman, the IV International Working Conference on Statistical and Scientific Database Management and the X International Conference on Scientific and Statistical Database Management, respectively. He serves as a Referee of several international journals and conferences. His current research interests are geographical information systems and advanced query languages.