Toward the Multilingual Semantic Web: Multilingual Ontology Matching and Assessment

The amount of multilingual data on the Web is growing rapidly; as a result, developing ontologies in various natural languages is attracting considerable attention. To achieve semantic interoperability for the multilingual Web, cross-lingual ontology matching techniques are needed. This paper proposes a Multilingual Ontology Matching (MoMatch) approach for matching ontologies in different natural languages. MoMatch uses machine translation and various string similarity techniques to identify correspondences across different ontologies. Furthermore, we propose a Quality Assessment Suite for Ontologies (QASO) that comprises 14 metrics, seven of which assess the quality of the matching process and seven the quality of the ontology. We present an in-depth comparison of different string similarity techniques across various languages to identify the most effective similarity measure(s) between multilingual terms. To illustrate the applicability of our approach and how it can be used in different domains, we present two use cases. MoMatch has been implemented using Scala and Apache Spark under an open-source license. We have compared our results with the results from the Ontology Alignment Evaluation Initiative (OAEI 2020). MoMatch has achieved significantly higher precision, recall, and F-measure than five state-of-the-art matching approaches.


I. INTRODUCTION
Ontologies are widely used in fields of science beyond computer science, including Biology [1], Engineering [2], and Medicine [3]. With the rapid expansion of multilingual data on the Semantic Web, more ontologies have become available in different natural languages. According to Linked Open Vocabularies (LOV), 1 English is by far the most prominent language, i.e., most ontologies in the Semantic Web are in English; however, many ontologies in other Indo-European languages (e.g., German) also exist. Specifically, out of a total of 782 vocabularies found in LOV, 584 are in [...] sharing, and web service composition [5]. The main idea is that all ontologies should be mapped to a core set of related ontologies in advance when a system attempts to automatically search for a specific piece of information from, or exchange knowledge with, other systems. To date, there is no clear winner in solving the cross-lingual matching problem. Further research is required to advance cross-lingual ontology matching techniques so that they obtain results comparable to those of monolingual approaches. Therefore, there is a need for a flexible framework for matching ontologies in different languages without relying on a specific language.
The associate editor coordinating the review of this manuscript and approving it for publication was Mansoor Ahmed.
1 https://lov.linkeddata.es/dataset/lov/vocabs
In this paper, we propose a Multilingual Ontology Matching (MoMatch) approach for matching ontologies in different natural languages. MoMatch uses different string similarity techniques and machine translation to match classes and properties across ontologies. MoMatch comprises four phases: 1) resource extraction: extracting all resources (classes and properties) from the input ontologies, 2) translation: translating the extracted resources, 3) pre-processing: preparing the translated resources for the matching phase, and 4) matching: identifying potential matches between the input ontologies.
The quality of the matching process greatly depends on the quality of the input ontologies to be matched. Existing approaches in the literature have addressed the quality evaluation of ontologies along different dimensions, such as syntactic, semantic, pragmatic, social, structural, functional, content, schema, and usage [6], [7], [8], [9]. Each dimension has its own set of criteria, and each criterion has its own set of metrics to measure the characteristics of an ontology that can be represented formally [10]. A vast number of metrics have been proposed from different perspectives. As a result, determining which quality metrics affect the quality of the ontology matching process is complex. In our previous work MULON [11], an approach for merging monolingual ontologies in different natural languages, we proposed seven quality metrics to assess the quality of ontologies on the schema level. In this paper, we extend these quality metrics by proposing the Quality Assessment Suite for Ontologies (QASO). QASO comprises 14 quality metrics for assessing the quality of the matching process in addition to the quality of the input ontologies' schema that influences the matching process (more details in subsection V-A). Out of the 14 metrics, seven are used to assess the quality of the input ontologies and seven to evaluate the quality of the matching process. We address the following research questions: RQ1) What is the most effective string similarity measure(s) for matching multilingual terms across different ontologies? RQ2) How robust is the performance of the similarity measure when the language of the input ontologies is changed? and RQ3) How can the quality of the ontology matching process be measured using a set of metrics?
The contributions of this work can be summarized in the following points:
• MoMatch can efficiently match ontologies in any natural language compared to the state of the art (cf. subsection VII-C),
• Ten language pairs, including Indo-European and non-Indo-European languages, have been tested in this approach,
• We present a comparative analysis of 13 different string similarity measures,
• A metric suite (QASO) has been designed for assessing the quality of the matching process, which can also be used to assess the quality of the ontology,
• Two use cases are presented that demonstrate the usability of MoMatch in matching multilingual as well as monolingual ontologies from different domains, and
• MoMatch empirically showed significantly better performance when compared to five state-of-the-art approaches.
We believe that MoMatch is a crucial step towards realizing the multilingual Semantic Web as it supports the integration of ontologies in different languages. MoMatch 2 and QASO 3 are available in two separate public repositories in GitHub, in which the source code is documented, describing each configurable parameter and function.
The rest of this paper is organized as follows: In section II, we provide background information about our research topic as well as formal definitions of ontology and the ontology matching process. In section III, we provide an overview of related work. The proposed approach is described in detail in section IV. The quality metric suite is presented in section V. Two use cases are presented in section VI to demonstrate possible applications of MoMatch. The results of experiments and evaluations are presented in section VII. Finally, in section VIII, we summarize the main conclusions with an outline of future work.

II. PRELIMINARIES AND DEFINITIONS
Ontology is considered one of the significant cornerstones of representing information more meaningfully, providing machine-understandable semantics of knowledge. It is a container for capturing semantic information of a particular domain, which allows sharing and reusing knowledge in this domain [17], [18].
Definition 1 (Ontology): Formally, an ontology can be represented as a tuple of five components [19]: O = ⟨C, R, P, I, X⟩, where C is the set of classes/concepts, R is the set of relationships between classes (object properties), P is the set of data properties (a specific type of relation whose domain is a class and whose range is a data type), I is the set of class instances (concrete objects), and X is the set of axioms and rules used for verifying ontology consistency and inferring new knowledge.
Ontology matching is a complicated procedure to bridge the semantic gap between multiple representations of the same domain [20]. It identifies correspondences between the entities (classes and properties) of two or more ontologies that satisfy specific conditions.
Definition 2 (Ontology Matching): ''a function f which, from a pair of ontologies to match O 1 and O 2 , an input alignment A, a set of parameters p and a set of resources r, returns an alignment A ′ between these ontologies: A ′ = f (O 1 , O 2 , A, p, r)'' [5].
Definition 3 (Alignment): the alignment between two ontologies, O 1 and O 2 , is a set of correspondences between pairs of entities belonging to O 1 and O 2 , respectively [5]. It is the output of the matching process.
Definition 4 (Correspondence): given two ontologies O 1 and O 2 , the correspondence between two entities e 11 and e 12 , where e 11 ∈ O 1 and e 12 ∈ O 2 , is the relation between e 11 and e 12 produced by a matching algorithm [5]. It can be represented by the triple ⟨ e 11 , e 12 , r ⟩, where r is the relation between the two entities e 11 and e 12 .
Definition 5 (Monolingual ontology matching): the process of matching ontologies in the same natural language, i.e., L 1 = L 2 , where L 1 is the natural language of O 1 , and L 2 is the natural language of O 2 [4].
The growing amount of multilingual data on the Web and the resulting development of ontologies in different natural languages has increased the demand for cross-lingual ontology matching.
Ontology matching techniques are used to identify the correspondence between two ontologies' entities, including the analysis of subsumption between classes and the similarity between the entity names. Different ontology matching techniques have been proposed, which can be classified as [20]: a) Element-level matching techniques: identify the correspondences by analyzing entities in the ontologies in isolation, ignoring the structure, i.e., ignoring their relations with other entities or instances. b) Structure-level matching techniques: identify the correspondences by analyzing the structure of entities in the ontology, i.e., considering the relations between entities and their instances. In this paper, we propose an element-level matching technique.
String-based techniques (as element-level techniques) are utilized to match names and name descriptions of ontology entities [5]. These techniques consider strings as sequences of letters. The more similar the strings are, the more likely they represent the same concepts. Inspired by the py_stringmatching library 4 and Doan et al. [13], string-based techniques are categorized as follows:
• Sequence-based: Input strings are considered as sequences of characters. Examples include Hamming distance, Jaro, Jaro-Winkler, Levenshtein, Partial ratio, Partial token sort, Ratio, and Token sort.
• Set-based: Input strings are considered as sets or multisets of tokens. Examples include Cosine, Dice, Jaccard, Overlap coefficient, and Tversky index.
• Bag-based: Input strings are considered as bags, i.e., collections of tokens in which a token may appear multiple times. An example is TF/IDF.
• Phonetic-based: Strings are matched based on their sound instead of their appearance. This kind of measure is effective for matching names that are spelled differently but sound the same, such as Meyer, Meier, and Mire; or Smith, Smithe, and Smythe. An example is Soundex.
MoMatch utilizes a set of string similarity measures in order to find the most effective ones for identifying identical matches between entities. The utilized string-based techniques are described, with their formulas, in Table 1 and Table 2.
TABLE 2. An overview of five set-based string similarity measures utilized in MoMatch, as well as their formulas. The input is two sets X and Y , each containing the tokens of the two strings x and y , respectively. All the resulting values are in the range [0,1].
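To make the set-based category concrete, the following minimal Python sketch computes four of the set-based measures named above (Jaccard, Dice, Overlap coefficient, and Cosine) using their textbook definitions over token sets. It is an illustration only and not the MoMatch implementation, which is written in Scala; the function names, the whitespace tokenizer, the lower-casing, and the convention of returning 0.0 when either token set is empty are all assumptions made here.

```python
import math


def tokens(label):
    """Split a label on whitespace into a set of lower-cased tokens (assumed tokenizer)."""
    return set(label.lower().split())


def jaccard(x, y):
    """|X ∩ Y| / |X ∪ Y|"""
    if not x or not y:
        return 0.0  # assumed convention for empty input
    return len(x & y) / len(x | y)


def dice(x, y):
    """2 |X ∩ Y| / (|X| + |Y|)"""
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def overlap_coefficient(x, y):
    """|X ∩ Y| / min(|X|, |Y|)"""
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


def cosine(x, y):
    """|X ∩ Y| / sqrt(|X| * |Y|), the set-based form of cosine similarity"""
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))


if __name__ == "__main__":
    a = tokens("Program Committee Member")
    b = tokens("Program Committee")
    for measure in (jaccard, dice, overlap_coefficient, cosine):
        print(measure.__name__, round(measure(a, b), 3))
```

All four functions return values in [0,1]. The Overlap coefficient reaches 1.0 as soon as one token set is contained in the other, which is one reason different measures can rank the same candidate pairs differently.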

III. RELATED WORK
According to a recent review of the multilingual Web of Data literature, few researchers have addressed cross-lingual ontology matching [23]. Cross-lingual ontology matching techniques are mainly used for matching linguistic information across ontologies in different natural languages [4], [23].
4 http://anhaidgroup.github.io/py_stringmatching/v0.4.x/index.html

A. CROSS-LINGUAL MATCHING APPROACHES
Abu Helou and Palmonari [24] proposed a cross-lingual lexical matching technique to map lexically-rich language resources such as WordNet. The results of word translations are used as evidence to map concepts lexicalized in different languages. Google Translate and BabelNet are used as external resources for translation. Four language versions of WordNet (Arabic, Italian, Slovene, and Spanish) are mapped to the English WordNet. Musyaffa et al. [25] proposed a framework for interlinking heterogeneous multilingual open fiscal data. Machine translation and similarity measures are used to map concepts across different languages.
Fu and Brennan [26], [27] proposed an algorithm for matching English and Chinese ontologies that takes into account the target ontology's semantics, mapping intent, operating domain, time and resource constraints, and user feedback. Hertling and Paulheim [28] proposed an approach for finding corresponding ontology elements that makes use of Wikipedia's interlanguage links. Lin and Krizhanovsky [29] proposed an approach that uses Wiktionary 5 as a resource for background information to match English and French ontologies.
Trojahn et al. [30] proposed a multi-agent architecture for cross-lingual ontology matching. The translation agent translates the source ontology into the target language using a dictionary. Lexical databases and thesauri have been used for identifying mappings. Tigrine et al. [31] presented an approach for matching several ontologies in different natural languages that uses the multilingual semantic network BabelNet 6 as a source of background information.
Ali et al. [32] proposed a cross-lingual ontology enrichment approach, based on a multi-agent architecture, to enrich ontologies from multilingual text or ontologies. Ibrahim et al. [33], [34] proposed a fully automated ontology enrichment approach based on cross-lingual matching that creates a multilingual ontology by enriching a monolingual one from another in a different language. They used lexical similarity (Jaccard) and semantic similarity (based on WordNet) to filter the equivalent classes. All translations produced by Google Translate for each class are considered during the matching process. They proposed another approach for merging monolingual ontologies in various natural languages to produce a multilingual ontology [11]. First, the alignments between the input ontologies are identified using cross-lingual matching techniques; they are then added to the merged multilingual ontology by adding an rdfs:label for each language (using language-tagged strings). Offline dictionaries have been built using the Yandex translate API for concept translations, and lexical (Jaccard) and semantic similarity are used to find the alignment between the input ontologies.
SimCat [35] and Crolom [36] proposed a lexical matching technique that uses Yandex translator and WordNet to compute the semantic similarity between concepts. They first apply NLP techniques to normalize ontology entities, translate all entities, and compute the similarity between them. Such order affects the quality of the translation because NLP techniques eliminate and normalize words, which can greatly affect the translation quality and significantly reduce the alignment quality.

B. CROSS-LINGUAL MATCHING TOOLS IN OAEI 2020
In the context of the OAEI 2020 7 campaign for evaluating ontology matching technologies, VeeAlign [37], AML [38], LogMap [39], and Wiktionary Matcher [40] provide high-quality alignments for the cross-lingual matching task. These approaches rely heavily on lexical matching techniques, except VeeAlign [37], which discovers alignments using a supervised deep learning approach. VeeAlign [37] proposes a two-step model which uses contextualized representations of concepts to discover alignments based on semantic and structural aspects of an ontology. AML [38] is based on lexical and structural matching algorithms. It utilizes background knowledge and machine translation tools, such as Microsoft Translator, before starting the matching process. LogMap [39] implements optimized data structures for lexically and structurally indexing the input ontologies for the matching process. It is an iterative process that begins with initial mappings (i.e., almost exact lexical correspondences) and proceeds to the discovery of new mappings. Therefore, LogMap cannot find matches between ontologies that lack sufficient lexical information. LogMapLt [39] is a ''lightweight'' version of LogMap, which only applies string matching techniques. Wiktionary Matcher [40] is an element-level, label-based matcher which uses multiple language versions of Wiktionary as an external background knowledge source. Meilicke et al. [41] developed a benchmark dataset (MultiFarm) based on manual translations of a set of ontologies from the conference domain into eight natural languages. This dataset is widely utilized to evaluate cross-lingual matching approaches [37], [38], [39], [40]. A good survey of the state-of-the-art approaches in cross-lingual ontology matching is provided in [4].
Despite ongoing efforts to develop various techniques, no clear winner has emerged in solving the cross-lingual matching problem. Further investigations are still needed to advance cross-lingual ontology matching techniques, not only to obtain good results but also to assess them.

IV. MOMATCH: THE PROPOSED APPROACH
In this section, we will describe in detail the four phases of our approach (cf. Figure 1). The input consists of two ontologies O 1 and O 2 , which can be in two different natural languages (for example, L 1 = fr and L 2 = de) or in the same language. The output is the alignment between the input ontologies in addition to the assessment sheet for the input ontologies and the resultant alignment. In the following subsections, we describe each of these phases in detail.

A. RESOURCE EXTRACTION
This phase aims to extract all resources (including both classes and properties) from the two input ontologies and store them in the resources matrix R. R is a two-dimensional matrix (n × 6), where n is the number of extracted resources. Each row in R is represented as a tuple of ⟨resource, type, source, language, translation, pre-processed translation⟩, which contains the resource label, the type of the resource ('C' for a class and 'P' for a property), the source ontology of this resource, the language tag, the translation of this resource, and the pre-processed translation. At this stage, the translation and the pre-processed translation of each resource are NULL; they are assigned in the following phases. To illustrate, consider the tuple ⟨Comité de programme, C, O 1 , fr, NULL, NULL⟩, in which the resource ''Comité de programme'' is a class extracted from O 1 and its language is French. Similarly, in the tuple ⟨ist geschrieben von, P, O 2 , de, NULL, NULL⟩, the resource ''ist geschrieben von'' is a property extracted from O 2 and its language is German. The output of this phase is R.
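One way to picture a row of R is as a six-field record whose last two fields start out empty. The Python sketch below is only an illustration of that data structure; the class name ResourceRow and its field names are hypothetical and are not taken from the MoMatch code base.

```python
from typing import NamedTuple, Optional


class ResourceRow(NamedTuple):
    """One row of the resources matrix R (hypothetical field names)."""
    resource: str                                    # resource label
    kind: str                                        # 'C' for a class, 'P' for a property
    source: str                                      # source ontology, e.g. 'O1'
    language: str                                    # language tag, e.g. 'fr'
    translation: Optional[str] = None                # filled in by the translation phase
    preprocessed_translation: Optional[str] = None   # filled in by pre-processing


# The two example tuples from the text, as they look after resource extraction.
R = [
    ResourceRow("Comité de programme", "C", "O1", "fr"),
    ResourceRow("ist geschrieben von", "P", "O2", "de"),
]

if __name__ == "__main__":
    for row in R:
        print(tuple(row))
```

Because the rows are immutable, a later phase would produce an updated row, for example with `row._replace(translation=...)`, rather than modifying it in place.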

B. TRANSLATION
In order to match resources from different languages, all of them should be translated into a common natural language. Two translation paths could be followed. The first path is to translate one ontology's resources into the other's language. For example, if the input ontologies are in German and French, then either the resources of the German ontology are translated to French or the resources of the French ontology are translated to German. The second path is to translate both input ontologies into a chosen pivot language, such as English. In our approach, we chose the second path because we found that most Natural Language Processing (NLP) techniques perform better on English text than on text in other languages. Furthermore, most of the external knowledge sources used in the matching process, such as dictionaries and thesauri, are available in English.
According to Abu Helou et al. [42], machine translation tools can return proper translations for a very large number of resources in the cross-lingual matching task. We use Yandex translate API 8 to translate each resource in the resource matrix R and add the resource's translation to R. For example, consider the same tuple from the previous example in subsection IV-A ⟨Comité de programme, C, O 1 , fr, Program Committee, NULL⟩, in which the English translation ''Program Committee'' is attached to the French class ''Comité de programme''. The output of this phase is R with the translations of all resources.

C. PRE-PROCESSING
8 https://tech.yandex.com/translate/
This phase aims to clean and prepare the translated resources for the matching phase by employing the following NLP techniques.
• Tokenization: divides the resource name into a set of tokens. Tokens are separated by delimiters such as whitespace characters.
• True casing: recognizes resources written in camel case and adds a space between lower-case and upper-case letters, so that ''BestPosterAward'' and ''isSponsorOf'' become ''Best Poster Award'' and ''is Sponsor Of'', respectively.
• POS-tagging: classifies tokens into their parts of speech, depending on their definition and context, then assigns them a corresponding label or tag (such as Adjective (ADJ), Conjunction (CONJ), Verb (V), and Preposition (PREP)).
• Stop words removal: removes tokens that occur very frequently and do not contribute to the subject of a text, such as pronouns, prepositions, and conjunctions.
• Normalization and regular expressions: transforms a token into a standard form by removing non-alphanumeric characters and additional white spaces.
The output of this phase is R with the pre-processed translated resources.
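The pre-processing steps above can be sketched in a few lines of Python using only the standard library. This is a simplified illustration and not the Stanford CoreNLP pipeline used by MoMatch: POS-tagging is omitted, the stop-word list is a small hypothetical sample, and the lower-casing of tokens is an assumption added here to make later comparisons case-insensitive.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "by", "is", "in", "on", "and", "or", "to"}


def split_camel_case(label):
    """Insert a space wherever a lower-case letter is followed by an upper-case one."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)


def preprocess(label):
    """Camel-case splitting, normalization, tokenization, and stop-word removal."""
    text = split_camel_case(label)
    # Normalization: replace every non-alphanumeric, non-space character by a space.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Tokenization on whitespace; this also collapses additional white spaces.
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(split_camel_case("BestPosterAward"))
    print(preprocess("isSponsorOf"))
    print(preprocess("Program_Committee "))
```

Note that the order of the steps matters: camel-case splitting has to run before lower-casing, otherwise the case boundaries it relies on are lost.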

D. MATCHING
This phase aims to identify correspondences (matched resources) between the two input ontologies. We compute the pairwise lexical similarity between the translated, pre-processed resources of the two input ontologies. We use the 13 string matching measures described in Table 1 and Table 2.
We use different threshold values to select the top-matched resources in the matching process. MoMatch matches not only ontologies in different natural languages (i.e., cross-lingual matching) but also ontologies in the same natural language (i.e., monolingual matching). In monolingual matching, the translation phase is skipped, and the matching occurs between the pre-processed resources of the two ontologies.
The output of this phase is the matched resources stored in a matrix M . M is a two-dimensional matrix (m × 4), where m is the number of matched resources. Each row in M is represented as a tuple of ⟨resource1, resource2, type, simScore⟩, which contains resource1 from O 1 , resource2 from O 2 , the type of both resources, and the similarity score between them. For example, ⟨Car, , C, 1.00⟩, i.e., ''Car'' in English and '' '' in Arabic are two resources of type class, from O 1 and O 2 , respectively, with a similarity score of 1.00.
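The pairwise comparison with a threshold can be sketched as follows. This is a minimal illustration, not the distributed Scala/Spark implementation: the function names, the triple layout of the inputs, the restriction to pairs of the same type, and the German label ''Programmkomitee'' with its pre-processed translation are all assumptions introduced for the example.

```python
def jaccard(text1, text2):
    """Jaccard similarity between the whitespace-token sets of two strings."""
    x, y = set(text1.split()), set(text2.split())
    if not x or not y:
        return 0.0
    return len(x & y) / len(x | y)


def match(resources1, resources2, similarity, threshold):
    """Return rows (resource1, resource2, type, simScore) scoring at or above the threshold.

    Each input resource is a triple (label, kind, preprocessed_translation).
    Only resources of the same kind ('C' or 'P') are compared (an assumption).
    """
    matches = []
    for label1, kind1, text1 in resources1:
        for label2, kind2, text2 in resources2:
            if kind1 != kind2:
                continue
            score = similarity(text1, text2)
            if score >= threshold:
                matches.append((label1, label2, kind1, round(score, 2)))
    return matches


# Illustrative inputs; the pre-processed translations are hypothetical.
o1 = [("Comité de programme", "C", "program committee")]
o2 = [
    ("Programmkomitee", "C", "program committee"),
    ("ist geschrieben von", "P", "written"),
]

if __name__ == "__main__":
    for row in match(o1, o2, jaccard, threshold=0.9):
        print(row)
```

The nested loop is quadratic in the number of resources, which is acceptable for a sketch but is the part one would expect to be parallelized in a large-scale setting.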

V. QUALITY ASSESSMENT
We propose the metric suite QASO for assessing the quality of the matching process. These metrics require prior statistical information about the ontologies. We calculate the statistics of the input ontologies using a distributed in-memory approach for computing statistics of large-scale RDF datasets with Apache Spark [43]. A sample of the statistical metrics in QASO is described in Table 3.

A. QASO SUITE
QASO comprises seven metrics for assessing the quality of the ontologies and seven for evaluating the quality of the matching process. We adapt and reformulate the metrics defined in [6], [7], and [8].
• Relationship Richness (RR) [6]: refers to the diversity of relations and their position in the ontology. The more relations the ontology has (other than rdfs:subClassOf relations), the richer it is. The quality score function f RR : O → R for an input ontology O is defined as follows: [...] where P obj represents the relationships (i.e., object properties) and P subClassOf represents the rdfs:subClassOf relations in O.
• Attribute Richness (AR) [6]: refers to how much knowledge about classes is in the schema. The more attributes are defined, the more knowledge the ontology provides. The quality score function f AR : O → R for an input ontology O is defined as follows: [...] where C represents the ontology classes and P attr represents all classes' attributes (i.e., data properties).
• Inheritance Richness (IR) [6]: refers to how well knowledge is distributed across different levels in the ontology. The more rdfs:subClassOf relations, the broader the range of general knowledge the ontology provides. The quality score function f IR : O → R for an input ontology O is defined as follows: [...]
• Readability (RB) [6]: refers to the existence of human-readable descriptions (HRD) in the ontology, such as comments, labels, or descriptions. The more HRD exist, the more readable the ontology is. The quality score function f RB : O → R for an input ontology O is defined as follows: [...] where HRD ∈ {label, comment, description} and R represents the ontology resources.
• Isolated Elements (IE) [7]: refers to classes and properties which are defined but not connected to the rest of the ontology, i.e., not used. The quality score function f IE : O → R for an input ontology O is defined as follows: [...] where R isolated represents resources defined but not used in O.
• Missing Domain or Range in Properties (MP) [7]: refers to missing information about properties. The less information about properties is missing, the more complete the ontology. The quality score function f MP : O → R for an input ontology O is defined as follows: [...] where P incomplete represents properties that do not have a domain or range.
• Redundancy (RD) [8]: refers to how many redundant resources exist. Resources which are syntactically (e.g., ''isMemberOf'' and ''is_member_of'') or semantically (e.g., ''Chair'' and ''Chairman'') close are considered redundant resources. The quality score function f RD : O → R for an input ontology O is defined as follows: [...] where R r represents the redundant resources in O.
All the previous metrics can be used to assess the quality of any ontology. To assess the quality and effectiveness of a matching process, we need to verify whether all relevant correspondences have been retrieved (high recall) and whether the retrieved ones are correct (high precision). The following metrics are adapted to assess the quality of the matching process not only by using the reference alignment (Ref ) but also without it.
• Class Precision (CP): refers to the fraction of relevant matched classes among the retrieved ones. The larger the share of relevant results among those retrieved, the higher the precision of the matching process. The quality score function f CP : O → R for the matching process M is defined as follows: [...] where C match is the set of matched classes retrieved by MoMatch, and Ref c is the set of matched classes in the reference alignment.
• Class Recall (CR): refers to the fraction of relevant matched classes retrieved by the system. The more of the relevant results are retrieved, the higher the recall of the matching process. The quality score function f CR : O → R for the matching process M is defined as follows: [...]
• Property Precision (PP): refers to the fraction of relevant matched properties among the retrieved ones. The larger the share of relevant results among those retrieved, the higher the precision of the matching process. The quality score function f PP : O → R for the matching process M is defined as follows: [...] where P match is the set of matched properties retrieved by MoMatch, and Ref p is the set of matched properties in the reference alignment.
• Property Recall (PR): refers to the fraction of relevant matched properties retrieved by the system. The more of the relevant results are retrieved, the higher the recall of the matching process. The quality score function f PR : O → R for the matching process M is defined as follows: [...]
• Degree of Overlap (OV): refers to how many common resources exist between the input ontologies. Resources that are syntactically or semantically close are considered common resources. The quality score function f OV : O → R for the matching process M is defined as follows: [...] where R match represents the set of correspondences found when matching the ontologies O 1 and O 2 . R 1 and R 2 represent the resources of the input ontologies O 1 and O 2 , respectively.
If the reference alignment is unavailable, we use rough approximations for recall and precision based on the relative quality of the matching results produced by MoMatch [44].
• Match Coverage (MC) [44]: an estimate of recall. It refers to the fraction of resources that occur in at least one correspondence in the matching results, relative to the total number of resources in the input ontologies. The quality score function f MC : O → R for the matching process M is defined as follows: [...] where R O 1 −match and R O 2 −match represent the sets of matched resources of ontologies O 1 and O 2 , respectively.
• Match Ratio (MR) [44]: an estimate of precision. It refers to the ratio between the number of found correspondences and the number of matched resources in the input ontologies. The closer the ratio is to 1.00, the better the precision of the match result. In other words, the match result is better when a resource is not loosely matched to many other resources but only to the most similar ones. The quality score function f MR : O → R for the matching process M is defined as follows: [...]
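As an illustration of the matching-process metrics, the following Python sketch computes precision and recall against a reference alignment, together with the two reference-free estimates. Precision and recall use their standard set-based definitions. The expressions for Match Coverage and Match Ratio are assumptions inferred from the prose above: coverage is taken as the matched resources of both ontologies over all resources, and ratio as twice the number of correspondences over the matched resources of both ontologies, so that a strictly one-to-one result yields 1.00. All function names are hypothetical.

```python
def precision(found, reference):
    """Fraction of retrieved correspondences that appear in the reference alignment."""
    return len(found & reference) / len(found) if found else 0.0


def recall(found, reference):
    """Fraction of reference correspondences that were retrieved."""
    return len(found & reference) / len(reference) if reference else 0.0


def _matched(found):
    """Resources of O1 and O2 that take part in at least one correspondence."""
    return {a for a, _ in found}, {b for _, b in found}


def match_coverage(found, resources1, resources2):
    """Assumed form: (|R_O1-match| + |R_O2-match|) / (|R_1| + |R_2|)."""
    m1, m2 = _matched(found)
    total = len(resources1) + len(resources2)
    return (len(m1) + len(m2)) / total if total else 0.0


def match_ratio(found):
    """Assumed form: 2 |R_match| / (|R_O1-match| + |R_O2-match|)."""
    m1, m2 = _matched(found)
    matched = len(m1) + len(m2)
    return 2 * len(found) / matched if matched else 0.0


if __name__ == "__main__":
    r1 = {"a", "b", "c", "d"}
    r2 = {"x", "y", "z", "w"}
    found = {("a", "x"), ("b", "y"), ("b", "z")}      # "b" is matched twice
    reference = {("a", "x"), ("b", "y"), ("c", "w")}
    print(precision(found, reference), recall(found, reference))
    print(match_coverage(found, r1, r2), match_ratio(found))
```

Under these assumed definitions the match ratio never drops below 1.00 for a non-empty result, and it grows as resources are matched to several counterparts, which is consistent with the statement that values closer to 1.00 indicate better precision. The same two precision and recall functions apply to classes (CP, CR) and to properties (PP, PR) by passing the corresponding subsets.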

VI. USE CASES
In this section, we show the applicability of MoMatch in monolingual and cross-lingual ontology matching in different domains.

A. USE CASE 1: CROSS-LINGUAL MATCHING IN SCHOLARLY COMMUNICATION DOMAIN
In this use case, we use ontologies from the scholarly communication domain in various natural languages. We use an example scenario to match the SEO en 9 ontology (with 106 classes and 88 properties), in English, with Conference de ontology (60 classes and 64 properties), in German, from the MultiFarm dataset (see section VII). The goal of this use case is to demonstrate the entire process, from submitting the input ontologies to producing the alignment. Here, O 1 is the German ontology Conference de and O 2 is the English ontology SEO en . Table 4 demonstrates the matching process and shows each phase's sample output. The relevant matching results are identified in the matching phase using Jaccard similarity with a threshold θ ≥ 0.90. MoMatch identified eight matched classes and two matched properties.

B. USE CASE 2: MONOLINGUAL MATCHING IN BIOMEDICAL DOMAIN
In this use case, we use ontologies from the biomedical domain. We match monolingual ontologies from BioPortal, 10 a web-based application for accessing and sharing biomedical ontologies and providing alignments between them. We match the Sample Processing and Separation Techniques Ontology (SEP) 11 with several ontologies from BioPortal, such as the Plant Experimental Conditions (PECO), 12 Plant (PO), 13 Plant Trait (PTO), 14 and Units of Measurement (UO) 15 ontologies. Table 5 shows the statistics of the BioPortal ontologies and the degree of overlap between SEP and each ontology. The degree of overlap refers to how many common resources exist between ontology pairs, as defined in subsection V-A. MoMatch identified 434 out of 448 correspondences found in BioPortal for matching SEP × UO with a 34% degree of overlap. Surprisingly, MoMatch found new correspondences which were missing in BioPortal. In matching SEP × PECO, MoMatch found 35 new correspondences. Similarly, in matching SEP × PO, MoMatch found 38 new correspondences. In matching SEP × PTO, MoMatch found 37 new correspondences.
10 https://bioportal.bioontology.org/
11 https://bioportal.bioontology.org/ontologies/SEP
12 https://bioportal.bioontology.org/ontologies/PECO

VII. EVALUATION
In this section, we show the results of two experiments to provide an in-depth analysis for comparing similarity measures across different languages. In addition, we evaluate the quality of the matching process using QASO.

A. DATASET AND EXPERIMENT SETUP

1) DATASET
We use the MultiFarm benchmark 17 from the T-Box/Schema matching track of OAEI 2020. 18 OAEI is an annual international ontology matching competition. MultiFarm is a benchmark for evaluating cross-lingual ontology matching systems. It consists of seven ontologies (Cmt, Conference, ConfOf, Edas, Ekaw, Iasted, Sigkdd) derived from the Conference benchmark of OAEI, their translations into nine languages (Chinese, Czech, Dutch, French, German, Portuguese, Russian, Spanish, and Arabic), and the corresponding cross-lingual alignments between them. Statistics for the MultiFarm ontologies are presented in Table 6. The classification of the nine languages according to Wikipedia 19 is presented in Table 7. We could not use the dataset from OAEI 2021 [45] because its results are not available in detail (for each language pair and each ontology pair) as they are for OAEI 2020.

2) EXPERIMENT SETUP
Scala and Apache Spark 20 were used to implement all phases of MoMatch. A graphical user interface of MoMatch is created (cf. Figure 2). To parse and manipulate the input ontologies (as RDF triples), the SANSA-RDF library 21 [46] with the Apache Jena framework 22 is used. To process the resource labels, Stanford CoreNLP 23 [47] is used. All experiments are run on Ubuntu 16.04 LTS with an Intel Core i7-4600U CPU @ 2.10 GHz × 4 and 10 GB of memory.
20 https://spark.apache.org/
21 https://github.com/SANSA-Stack/SANSA-RDF

TABLE 8. Average values (for all ontology pairs) for precision (P), recall (R), and F-measure (F) of matching multilingual ontologies. P* and F* describe the results with adjusted precision and F-measure. Red, green, and blue entries are the top scores for precision, recall, and F-measure for each similarity measure per row. Similar values are merged.

3) EVALUATION METRICS
The precision, recall, and F-measure metrics, adopted from the information retrieval community, are used to evaluate the effectiveness of the matching process. We use the gold standard alignments between each pair of ontologies in MultiFarm to compute precision, recall, and F-measure. Precision is the fraction of retrieved resources that are relevant, while recall is the fraction of relevant matched resources retrieved by MoMatch. The F-measure is the harmonic mean of precision and recall. Formally, precision is defined as $P = TP/(TP + FP)$, recall as $R = TP/(TP + FN)$, and F-measure as $F = 2PR/(P + R)$, where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives with respect to the gold standard.
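As a minimal sketch of these metrics, the following Python snippet compares a produced alignment with a gold standard alignment, both modeled as sets of (source entity, target entity) pairs. The function name `evaluate_alignment` and the example correspondences are hypothetical and are not taken from MoMatch or MultiFarm.

```python
from typing import Set, Tuple

Correspondence = Tuple[str, str]  # (entity in O1, entity in O2)


def evaluate_alignment(found: Set[Correspondence],
                       gold: Set[Correspondence]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of `found` against `gold`."""
    tp = len(found & gold)   # correspondences that are also in the gold standard
    fp = len(found - gold)   # retrieved but not in the gold standard
    fn = len(gold - found)   # in the gold standard but not retrieved
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical German-English example: 2 true positives, 1 false positive,
# and 2 false negatives.
gold = {("Person", "Person"), ("Autor", "Author"),
        ("Artikel", "Paper"), ("Gutachter", "Reviewer")}
found = {("Autor", "Author"), ("Artikel", "Paper"), ("Konferenz", "Meeting")}

p, r, f = evaluate_alignment(found, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Modeling alignments as sets keeps the three counts to simple set operations; relation type and confidence are left out for brevity.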

4) EXPERIMENTAL CONFIGURATION
In our experiments, we choose one natural language for each language category in Table 7. For categories with more than one language, such as Italic, we select the most widespread language according to Linked Open Vocabularies (LOV). 1 For example, the Italic family has three languages, with 65 vocabularies in French, 19 in Portuguese, and 49 in Spanish, so French, which has the most vocabularies in the Italic family, is selected. We did not include English in our experiments because it is the most prominent language, and a lot of work has been done in this language. Therefore, we used German, French, Russian, Chinese, and Arabic in the experiments. The MultiFarm benchmark is composed of 55 pairs of languages, with 49 matching tasks for each of them, taking into account the alignment direction (e.g., Cmt en → Edas de and Cmt de → Edas en are distinct matching tasks), i.e., 2,695 matching tasks [48]. In MoMatch, we used 13 similarity measures, so we would need to perform 35,035 matching tasks. To decrease the number of matching tasks without losing their value, we design our experiments to 1) evaluate the effectiveness of the cross-lingual matching process in MoMatch using different similarity measures compared to the reference alignment provided in the MultiFarm benchmark, 2) compare MoMatch matching results with five state-of-the-art approaches, and 3) assess the quality of the matching process using QASO.

B. EFFECTIVENESS OF MOMATCH
In this experiment, we use all similarity measures listed in Table 1 and Table 2. Different threshold values (θ ∈ {1.00, 0.95, 0.90, 0.85, 0.80}) are used to select the top-matched results according to the similarity scores between the labels of every two resources. We select two ontologies (Iasted and ConfOf) from the MultiFarm benchmark and two language pairs from different families: French-German and German-Arabic. In matching French-German ontologies, we match the resource labels of the French version of Iasted (Iasted fr) with the resource labels of the German version of ConfOf (ConfOf de), i.e., Iasted fr × ConfOf de. Similarly, we match the resource labels of ConfOf fr with the resource labels of Iasted de. The same procedure is followed in matching German-Arabic ontologies. The resulting alignments are compared with the reference alignments provided in the benchmark as a gold standard for each pair of ontologies. Table 8(a) and Table 8(b) show the average values (for all ontology pairs) of precision, recall, and F-measure for matching French-German and German-Arabic ontologies, respectively. Surprisingly, we found new correspondences missing in the gold standard alignments, such as ⟨écrit, schreibt, ≡, 1.00⟩, which states that the French and German properties "écrit" and "schreibt" are identical with a similarity score of 1.00. P* and F* denote the adjusted precision and F-measure obtained when these new correspondences are taken into account, since they are not false positives in practice. Figure 3(a) and Figure 3(b) present the significant improvement of P* and F* over P and F, respectively, for all similarity measures in matching French-German ontologies, where θ ≥ 0.90. The precision and F-measure results are improved by an average of 23% and 8%, respectively.
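The adjustment behind P* and F* can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: manually validated new correspondences are counted as correct for precision, while recall is still measured against the original gold standard (the text reports no adjusted recall). The function names and all example correspondences except ⟨écrit, schreibt⟩ are hypothetical.

```python
from typing import Dict, Set, Tuple

Correspondence = Tuple[str, str]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def adjusted_scores(found: Set[Correspondence],
                    gold: Set[Correspondence],
                    validated_new: Set[Correspondence]) -> Dict[str, float]:
    """Compute P, R, F and the adjusted P*, F*.

    Assumption: correspondences in `validated_new` (found, absent from the
    gold standard, but judged correct) count as true positives for P*.
    """
    tp = len(found & gold)
    new_tp = len((found - gold) & validated_new)
    p = tp / len(found) if found else 0.0
    p_star = (tp + new_tp) / len(found) if found else 0.0
    r = tp / len(gold) if gold else 0.0
    return {"P": p, "P*": p_star, "R": r,
            "F": f_measure(p, r), "F*": f_measure(p_star, r)}


# Hypothetical French-German example with one validated new correspondence.
gold = {("auteur", "Autor"), ("article", "Artikel"),
        ("relecteur", "Gutachter"), ("conférence", "Konferenz")}
found = {("auteur", "Autor"), ("article", "Artikel"),
         ("écrit", "schreibt"), ("personne", "Papier")}
validated_new = {("écrit", "schreibt")}

scores = adjusted_scores(found, gold, validated_new)
print({name: round(value, 2) for name, value in scores.items()})
```

In this toy case, one of the two apparent false positives is reclassified, so P* exceeds P while recall is unchanged.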
Similarly, Figure 4(a) and Figure 4(b) present a significant improvement in matching German-Arabic ontologies. The precision and F-measure results are improved by an average of 18% and 5%, respectively. To address our research questions, we study these results along two dimensions: 1) similarity measures (RQ1) and 2) language pairs (RQ2).

1) SIMILARITY MEASURES
As shown in Table 8, precision increases with the threshold value, especially for θ ≥ 0.90, whereas recall decreases as the threshold increases; e.g., the highest recall is achieved with θ = 0.80.
In matching French-German ontologies, Levenshtein, Hamming, Jaccard, and Tversky achieve the best precision of 100% for all thresholds, except that Tversky reaches 92% at θ = 0.80 (see Figure 5(a)). Similarly, Levenshtein, Hamming, Jaccard, and Tversky achieve the best F-measure of 71% for all thresholds, except that Tversky reaches 69% at θ = 0.80 (see Figure 5(c)). Cosine, Overlap coefficient, Partial ratio, and Partial token sort achieve the best recall of 78%, 78%, 73%, and 73%, respectively, at θ = 0.80 (see Figure 5(b)). In matching German-Arabic ontologies, Levenshtein, Hamming, Jaccard, and Tversky achieve the best precision of 100% for all thresholds, except that Tversky reaches 78% at θ = 0.80 (see Figure 6(a)). Levenshtein, Hamming, Jaccard, and Tversky achieve the best F-measure of 59% for all thresholds, except that Tversky reaches 54% at θ = 0.80 (see Figure 6(c)). Cosine, Overlap coefficient, Partial ratio, and Partial token sort achieve the best recall of 54% at θ = 0.80 (see Figure 6(b)).

[Figure caption, beginning missing] ... Table 8. Similarity measures are sorted from left (the highest value) to right (the lowest value) according to precision, recall, and F-measure. The higher the threshold values, the higher the precision and F-measure scores. In contrast, the lower the threshold values, the higher the recall scores.
Across the two language pairs, French × German and German × Arabic, precision and F-measure in sequence-based similarity measures are sensitive to variations in the threshold (see Figure 7(a) and Figure 7(e)). Levenshtein, Hamming, Token sort, and Ratio provide the best precision and F-measure scores. In contrast, recall remains constant across all threshold values (see Figure 7(c)). Partial ratio and Partial token sort provide the highest recall of 64% across all threshold values. Similarly, precision and F-measure in set-based similarity measures are also sensitive to variations in the threshold (see Figure 7(b) and Figure 7(f)). Jaccard, Tversky, and Dice have the best precision and F-measure scores. The recall is relatively stable against threshold changes (see Figure 7(d)). The Overlap coefficient provides the best recall of 64% and 66% for θ ≥ 0.80.

FIGURE 6. Precision, recall, and F-measure charts of different similarity measures over all thresholds for matching German-Arabic ontologies, as shown in Table 8(b). There is no significant difference between these charts and the charts for matching French-German ontologies (Figure 5).
The choice of string similarity measure greatly influences the precision and recall of the matching process. When selecting a similarity measure to be used in ontology matching, it is crucial to examine the features of the ontologies being matched and whether the precision or the recall is more significant to the matching technique. Levenshtein, Hamming, and Jaccard achieve the best precision. If one wants to use more than one similarity measure, then one can choose from the following list, sorted in descending order of precision: 1) Levenshtein, Hamming, and Jaccard, 2) Tversky, 3) Token sort and Ratio, 4) Dice, 5) Jaro, 6) Jaro Winkler, 7) Cosine, 8) Partial ratio, 9) Partial token sort, 10) Overlap coefficient. On the other hand, the Overlap coefficient achieves the best recall. Similarly, if one wants to use more than one similarity measure, then one can select from the following list sorted in descending order of recall: 1) Overlap coefficient, 2) Partial ratio and Partial token sort, 3) Cosine, 4) Jaro, Jaro Winkler, Levenshtein, Hamming, Ratio, Token sort, Dice, Jaccard, and Tversky.
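The precision/recall trade-off between set-based measures can be illustrated with a small sketch. It assumes the textbook definitions of the Jaccard and Overlap coefficients over lowercase whitespace-separated tokens; the exact definitions and tokenization used by MoMatch (Table 1 and Table 2) may differ, and the two labels are hypothetical.

```python
def tokens(label: str) -> set:
    """Lowercase whitespace tokenization (an assumption of this sketch)."""
    return set(label.lower().split())


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the token sets of the two labels."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def overlap_coefficient(a: str, b: str) -> float:
    """|A ∩ B| / min(|A|, |B|) over the token sets of the two labels."""
    ta, tb = tokens(a), tokens(b)
    smaller = min(len(ta), len(tb))
    return len(ta & tb) / smaller if smaller else 0.0


def is_match(score: float, theta: float) -> bool:
    """Keep a candidate correspondence only if its score reaches the threshold."""
    return score >= theta


# Hypothetical labels (after translation into a common language).
label_1 = "conference paper"
label_2 = "accepted conference paper"

j = jaccard(label_1, label_2)              # 2 shared tokens out of 3 distinct
o = overlap_coefficient(label_1, label_2)  # 2 shared tokens, smaller set has 2

for theta in (0.90, 0.80):
    print(theta, is_match(j, theta), is_match(o, theta))
```

Because the Overlap coefficient divides by the smaller set, it is never lower than Jaccard for the same pair of labels. At a fixed threshold it therefore accepts more candidates, which is consistent with its higher recall and lower precision in the rankings above.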

2) LANGUAGE PAIRS
The French-German language pair yields better results than German-Arabic (see Figure 8). Two native speakers of Arabic found that the reason is the linguistic mistakes in the Arabic ontologies, which negatively affect the translation and the matching results. We corrected these mistakes and made the corrected ontologies available in the MoMatch GitHub repository. 2
In order to answer the research question RQ2, we calculate the Spearman correlation between the two language pairs for precision, recall, and F-measure, where θ ∈ {1.00, 0.95, 0.90, 0.85, 0.80}. The correlation is 1.00 for all threshold values except θ = 0.80, where it is 0.99, 0.80, and 0.98 for precision, recall, and F-measure, respectively. As a result, the two language pairs are positively and strongly correlated with each other, i.e., the higher a similarity measure is ranked for French-German, the higher it is ranked for German-Arabic, and vice versa. This result indicates that, regardless of the language pair, the scores for precision, recall, and F-measure obtained from the combination of different similarity measures and threshold values stay highly correlated.

FIGURE 7. Average precision, recall, and F-measure on different similarity measures and thresholds across the two language pairs French-German and German-Arabic. The greater the distance between the threshold line and the polygon's center, the higher the precision, recall, and F-measure scores.
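The following is a minimal, dependency-free sketch of the Spearman rank correlation used in this analysis: values are replaced by their ranks (ties receive the average rank), and the Pearson correlation of the ranks is returned. The function names are hypothetical, and the score lists are toy values chosen for illustration; they are not taken from Table 8.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (sx * sy)


# Toy precision scores of four similarity measures for two language pairs.
scores_pair_1 = [1.00, 0.92, 0.80, 0.65]
scores_pair_2 = [0.95, 0.78, 0.70, 0.50]

rho = spearman(scores_pair_1, scores_pair_2)
print(round(rho, 2))
```

A coefficient near 1.00 means the two language pairs rank the similarity measures in (almost) the same order, even when the absolute scores differ.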

C. COMPARISON WITH THE STATE-OF-THE-ART
In this experiment, we include five related approaches (AML [38], LogMap [39], LogMapLt [39], VeeAlign [37], Wiktionary [40]) in our evaluation. The other related works publish neither their code nor their evaluation datasets [26], [27], [31]. In order to show the applicability of MoMatch, we use different language pairs. We select the most widespread language from each category in Table 7, i.e., German (Germanic), French (Italic), Russian (Balto-Slavic), Chinese (Tai-Kadai), and Arabic (Afro-Asiatic). Therefore, we have ten language pairs: German × French, German × Russian, German × Chinese, German × Arabic, French × Russian, French × Chinese, French × Arabic, Russian × Chinese, Russian × Arabic, and Chinese × Arabic. We choose the most effective similarity measures in terms of precision, recall, and F-measure from the previous experiment: Jaccard and Levenshtein, which achieved the best precision and F-measure, and the Overlap coefficient, which achieved the best recall. In order to compare our results with the state-of-the-art, we match Conference (the middle ontology in Table 6) with the Edas and Ekaw ontologies, as in the results of OAEI 2020. 7 Therefore, there are four ontology pairs in each language pair. For example, in matching French × Chinese, the ontology pairs are Conference fr × Edas cn, Conference cn × Edas fr, Conference fr × Ekaw cn, and Conference cn × Ekaw fr. We evaluate the quality of the matching process by calculating precision, recall, and F-measure as in the previous experiment. Table 9(a), Table 9(b), and Table 9(c) compare MoMatch's results for matching ontologies in ten language pairs against five state-of-the-art systems using Jaccard, Levenshtein, and Overlap coefficient, respectively. For all ontology pairs, we found new correspondences that were missing in the gold standard alignments.
MoMatch* represents the results when considering the new correspondences, i.e., the matching results with the adjusted precision and F-measure. LogMapLt achieves the highest precision of 100% but the lowest recall and F-measure of 0% for all language pairs. Similarly, Wiktionary achieves the highest precision of 100% but the lowest recall and F-measure of 0% for all language pairs with Arabic or Chinese, except for German × Chinese. MoMatch outperforms most other systems in terms of precision, recall, and F-measure when using Jaccard and Levenshtein similarities and not considering the new correspondences as false positives. For instance, in matching German × French ontologies, AML has the highest precision (after LogMapLt), recall, and F-measure among the other systems, and MoMatch outperforms it by 38%, 10%, and 17% in precision, recall, and F-measure, respectively, when using Jaccard similarity. Similarly, MoMatch outperforms AML by 41%, 9%, and 17% in precision, recall, and F-measure, respectively, when using Levenshtein similarity.
The Jaccard and Levenshtein similarity measures give relatively similar results in precision, recall, and F-measure (see Figure 9), while the Overlap coefficient in MoMatch achieves the highest recall among all systems for all language pairs. These results confirm our findings from the previous experiment, where the Jaccard and Levenshtein similarity measures achieve the best precision and F-measure while the Overlap coefficient achieves the best recall.
We calculate the Spearman correlation between the ten language pairs over the F-measure values produced by MoMatch with Jaccard, Levenshtein, and Overlap coefficient. All language pairs achieve the highest correlation of 1.00, except for the correlation between French × Chinese and Russian × Chinese, which is 0.87. In addition, the correlation between German × Chinese and French × Chinese is 0.87 as well. As a result, the ten language pairs are positively and strongly correlated with each other. These results confirm our results from the previous experiment (see subsection VII-B), where the effects of different similarity measures are independent of the language pair.
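The experimental grid of this subsection (ten language pairs, each with four ontology pairs) can be enumerated with a short sketch. The language codes follow the subscripts used in the text, and the `Ontology_lang` naming is only an illustrative convention, not the benchmark's file naming.

```python
from itertools import combinations

# The five selected languages: German, French, Russian, Chinese, Arabic.
languages = ["de", "fr", "ru", "cn", "ar"]
targets = ["Edas", "Ekaw"]  # ontologies matched against Conference

# Unordered language pairs: C(5, 2) = 10.
language_pairs = list(combinations(languages, 2))

# Each language pair yields four ontology pairs: Conference in one language
# against Edas/Ekaw in the other, in both directions.
tasks = []
for lang_a, lang_b in language_pairs:
    for target in targets:
        tasks.append((f"Conference_{lang_a}", f"{target}_{lang_b}"))
        tasks.append((f"Conference_{lang_b}", f"{target}_{lang_a}"))

print(len(language_pairs), len(tasks))
```

For French × Chinese, this enumeration produces the four ontology pairs listed in the text.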

D. EVALUATING THE QUALITY OF THE MATCHING PROCESS USING QASO
In this experiment, we randomly choose three pairs of ontologies from the previous experiments (Conference de × Edas fr, Conference ru × Ekaw ar, and ConfOf de × Iasted ar). We assess the quality of the matching process using QASO, described in subsection V-A. Table 10 shows the assessment results for each input ontology in addition to the assessment of the matching results. The quality metric results for Conference de and Conference ru are identical because they are the same ontology in two different natural languages (German and Russian). Conference has the highest results of 48% and 78% in terms of relationship richness and attribute richness, respectively. In terms of inheritance richness, Ekaw ar achieves the highest result of 97%. Ekaw ar is also the ontology that suffers most from missing property information (21%).
MoMatch achieves the highest class precision of 100% for matching ConfOf de × Iasted ar and the highest class recall of 46% for matching Conference de × Edas fr. In terms of property precision and recall, MoMatch achieves the highest results of 50% and 67%, respectively, in matching Conference de × Edas fr. The property recall for matching ConfOf de × Iasted ar is N/A because the reference alignment contains no properties (Ref p = 0). MoMatch identifies the highest degree of overlap of 6% in matching Conference ru × Ekaw ar, where it also has the highest match coverage and match ratio of 11% and 104%, respectively.

VIII. CONCLUSION
We propose the MoMatch approach, which matches ontologies in different natural languages. We show a comparative analysis of 13 different string similarity measures. Additionally, we present QASO, a metrics suite for assessing the quality of any ontology and the quality of the matching process. We test the performance of MoMatch over different ontologies in different natural languages, including Indo-European and non-Indo-European languages. We find that the Levenshtein, Hamming, and Jaccard similarity measures have the highest precision and F-measure, while Partial ratio, Partial token sort, and Overlap coefficient have the highest recall for matching multilingual ontologies (RQ1). Accordingly, we sorted the 13 similarity measures into two lists according to precision and recall to support choosing the most appropriate one for the matching process. The correlation between language pairs is consistently high and positive (RQ2). The results of the cross-lingual matching process in MoMatch are found to be promising compared to five state-of-the-art approaches.
We assess the quality of the matching process using QASO (RQ3). We show the usability of MoMatch by presenting two use cases in the scholarly communication and biomedical domains for both cross-lingual and monolingual ontology matching. MoMatch can be easily adapted for other use cases and domains. In conclusion, MoMatch takes a first step toward a multilingual Semantic Web.
In the future, we intend to 1) consider individuals in the matching process, 2) include other similarity measures, such as string-based structural measures that consider the entity's neighbors in the matching process, and 3) develop scalable approaches to match large-scale ontologies and knowledge graphs efficiently.

SAID FATHALLA received the M.Sc. degree in computer science from the University of Alexandria, Egypt, and the Ph.D. degree in information systems from the University of Bonn, Germany, in 2021. He has published over 40 papers in peer-reviewed journals and international conferences on the semantic web, text classification, bioinformatics, scholarly communication, and knowledge engineering. His current research interests include semantic web technologies, scholarly communication, knowledge graph construction, metadata management, and natural language processing.

JENS LEHMANN received the Ph.D. degree (summa cum laude) from the University of Leipzig and the joint master's degree in computer science from the Technical University of Dresden (TU Dresden) and the University of Bristol. He is currently a Principal Scientist with Amazon, working at Alexa AI on conversational AI, knowledge graphs, and generalized intelligence. He is also an Honorary Professor with TU Dresden. Previously, he was a Full Professor at the University of Bonn and Fraunhofer IAIS. His academic activities at TU Dresden and InfAI support the Smart Data Analytics Research Group. He has authored more than 300 publications, which have been cited more than 25000 times. His research interests include semantic web technologies, question answering, machine learning, and knowledge graph analysis. He was selected as a fellow of ELLIS. He is a member of InfAI. He has received 16 international best paper awards, including multiple "Test of Time" awards.

HAJIRA JABEEN received the Ph.D. degree in computer science from the National University of Computing and Emerging Sciences, Islamabad, Pakistan. She is currently a Team Leader in big data analytics with the GESIS-Leibniz Institute for the Social Sciences and an Associate Researcher with the University of Bonn. Her research interests include artificial intelligence, data analytics, knowledge graphs, and scalable machine learning. She has authored many journal articles and conference papers in these areas. VOLUME 11, 2023