Elsevier

Information Sciences

Volume 467, October 2018, Pages 35-58

Partial multi-dividing ontology learning algorithm

https://doi.org/10.1016/j.ins.2018.07.049

Abstract

As an effective model for data representation, storage, management, calculation and analysis, ontology has attracted increasing attention from researchers and has been applied in various engineering disciplines. In the era of big data, ontologies are expected to hold ever larger amounts of information, and the structure of the corresponding ontology graph has become more important because of its complexity. This demands ontology algorithms that are more efficient than before. In a specific engineering application, the ontology algorithm must quickly find the set of concepts that semantically match a given concept and return them to the user ranked by similarity. Therefore, using learning techniques to obtain better ontology algorithms remains an open problem.

This paper presents a partial multi-dividing ontology algorithm, with the aim of obtaining an efficient approach to optimizing the partial multi-dividing ontology learning model. To this end, we state several theoretical results from the perspective of statistical learning theory. Moreover, we present five experiments in different engineering fields to show the precision of our partial multi-dividing algorithm from the points of view of ontology similarity measuring and ontology mapping building.

Introduction

The concept of ontology, inspired by the philosophical notion, came into use in the sciences in the 1980s to refer to the different properties of a subject matter and their relations. Later, it was introduced into computer science and information technology, and from the 1990s it became one of the hot research fields in artificial intelligence. Because of its powerful semantic query and concept management abilities, ontology has been applied to other fields over the past 10 years. It is now used in nearly all disciplines, such as chemical science (see, for instance, Vijayasarathi and Sankar [47] or Banchetti-Robino [4]), pharmacology (see Sarntivijai et al. [36]), biology (see Kohler et al. [26], Levine et al. [30] and Vishnu et al. [48]), psychology (see Aime and Charlet [1] and Petrunia [34]), education (see Demartini et al. [12], Kruger-Ross [28] and Ochara [33]), geographic information systems (GIS) (see Vaccari et al. [46], Delgado et al. [11] and Tahmoorespur et al. [44]), medical science (see Bertaud-Gounot et al. [6] and Lousado et al. [31]), material science (see Cuccia [9] and Ghibaudi and Cerruti [23]), and neuroscience (see Bowden et al. [7] and Fumagalli [13]).

As a conceptual model for storing and managing information, ontology has received wide attention in the field of information retrieval. Using ontology similarity calculation, we can effectively find concepts that are semantically similar to the original retrieval concept, carry out an extended query during retrieval, and return the results to the user. This technique can greatly improve the intelligence of information retrieval. For example, if we search for the keyword “computer”, a traditional search returns computer-related information ordered by degree of relevance from high to low. However, this retrieval is based on keyword matching, so similar information that contains only the word “laptop” cannot be matched to “computer”, even though the words “computer” and “laptop” share high semantic similarity. With the help of an ontology for query expansion, it can be found that the similarity between “laptop” and “computer” is very high. Thus, in order to find information related to “computer”, we also find laptop-related information and return it to the user ranked by similarity. The advantage is that the retrieval is intelligent and more comprehensive.
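The query expansion idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity scores in `SIMILARITY` are hand-made and hypothetical (a real system would derive them from an ontology), and the names `similarity`, `expand_query` and the threshold value are introduced here for the example rather than taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-made similarity scores between concepts.
# In practice these would come from an ontology similarity measure.
SIMILARITY: Dict[Tuple[str, str], float] = {
    ("computer", "laptop"): 0.9,
    ("computer", "server"): 0.7,
    ("computer", "banana"): 0.05,
}


def similarity(a: str, b: str) -> float:
    """Symmetric lookup in the toy similarity table (0.0 if the pair is unknown)."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def expand_query(term: str, vocabulary: List[str],
                 threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return concepts similar enough to `term`, most similar first."""
    scored = [(c, similarity(term, c)) for c in vocabulary if c != term]
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


expanded = expand_query("computer", ["laptop", "server", "banana", "computer"])
print(expanded)
```

With these toy scores, a query for “computer” is expanded to “laptop” and “server”, in that order, while “banana” falls below the assumed threshold and is dropped.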

There have been several advances in ontology semantic similarity computation. Rodriguez and Egenhofer [35] presented a method to compute semantic similarity which relaxes the requirement of a single ontology. Steichen et al. [42] constructed a morphological abnormality ontology in breast pathology to assist inter-observer consensus, and implemented position-based, content-based and mixed semantic similarity measures between concepts in this ontology. Al-Mubaid and Nguyen [3] proposed an ontology-structure-based technique for measuring semantic similarity across multiple ontologies. By means of the human phenotype ontology, Kohler et al. [27] adapted semantic similarity metrics to compute the phenotypic similarity between queries and annotated hereditary diseases. Batet et al. [5] studied a measure based on the exploitation of the taxonomical structure of a biomedical ontology. Albacete et al. [2] proposed computing a similarity function for each dimension of knowledge. Taha [43] presented techniques for determining the semantic relationships among GO terms. Taieb et al. [45] proposed an ontology measure for quantifying the degree of semantic similarity between concepts. Mazandu et al. [32] introduced adaptable gene ontology semantic similarity-based functional analysis. Lastra-Diaz et al. [29] presented a detailed companion reproducibility article on the techniques and experiments proposed by earlier researchers, in a survey that presents the state of the art on this topic.

Specifically, the framework of ontology can be expressed as a simple graph in which each concept, element or object corresponds to a vertex of the graph and each edge represents a potential link (or potential relationship) between two concepts.

In this setting, let $G=(V(G),E(G))$ be the graph corresponding to the ontology $O$, with vertex set $V(G)$ and edge set $E(G)$. In engineering applications of ontology to various fields, the fundamental goal of the ontology algorithm is to obtain the best ontology function, which is used to measure the similarities between ontology vertices in a single ontology or in multiple ontologies. The aim of the ontology mapping is to find highly similar vertices from different ontologies, i.e., to deduce the similarity between two or more ontologies. It is used to build a bridge between different ontologies and thus helps to yield potential connections among the elements or objects of the target ontologies.

Initially, the formulas for ontology similarity measuring were designed heuristically, i.e., the similarity formula was determined by researchers according to the structural features of the ontology and the characteristics of the specific application domain. The shortcomings of this method are:

  1. It relies on the participation of high-level domain experts.

  2. The similarity formula contains many manually set parameters.

  3. It cannot adapt to dynamic changes in the ontology.

  4. It has high complexity, and is thus not suited to specific applications in a big data setting.

To overcome these shortcomings, machine learning techniques have gradually been applied to ontology algorithms. The idea is to learn from samples an optimal ontology function $f: V(G) \to \mathbb{R}$, which maps each vertex of the ontology graph to a real number and thus maps the whole ontology graph onto the one-dimensional real axis (for multiple ontologies, we put all the graphs into one graph, in which each ontology is regarded as a connected component). The similarity between ontology concepts is then determined by the distance between their corresponding vertices on the real axis; that is, the similarity between vertices $v_i$ and $v_j$ is measured by $|f(v_i) - f(v_j)|$, and a smaller distance means a higher similarity. The advantages of this approach are that it does not depend on domain experts, the results are intuitive, the number of manually set parameters is greatly reduced and, most importantly, the computational complexity is greatly reduced because no pairwise similarity calculation is needed.
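As a minimal sketch of this idea, suppose an ontology function has already been learned and is stored as a table of real-valued scores. The scores below are hypothetical values chosen for illustration, and the helper names `distance` and `rank_by_similarity` are assumptions of this example, not part of the paper.

```python
from typing import Dict, List

# Hypothetical values f(v) of an already learned ontology function,
# one real number per ontology vertex (concept).
scores: Dict[str, float] = {
    "computer": 0.82,
    "laptop": 0.80,
    "server": 0.70,
    "banana": 0.10,
}


def distance(f: Dict[str, float], vi: str, vj: str) -> float:
    """|f(vi) - f(vj)|: the smaller the distance, the more similar the concepts."""
    return abs(f[vi] - f[vj])


def rank_by_similarity(f: Dict[str, float], query: str) -> List[str]:
    """Order all other vertices by their distance to `query` on the real axis."""
    others = [v for v in f if v != query]
    return sorted(others, key=lambda v: distance(f, query, v))


ranking = rank_by_similarity(scores, "computer")
print(ranking)
```

Each vertex is scored once, and ranking the matches for a query then only requires comparing real numbers, which reflects the reduction in complexity mentioned above.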

Several ontology learning algorithms and theoretical analysis results have been proposed in recent years. For instance, Gao et al. [17] studied the strong and weak stability of k-partite ranking based ontology algorithms. Gao and Xu [19] presented a uniform stability analysis of learning algorithms for ontology similarity computation. Gao and Zhu [20] proposed a gradient-based ontology learning algorithm. Gao et al. [21] obtained an ontology sparse vector learning algorithm using ADAL technology. Gao and Farahani [15] studied the generalization bounds and uniform bounds for multi-dividing ontology algorithms with convex ontology loss function. More related work on ontology and machine learning can be found in Cucker and Zhou [10], Smale and Zhou [41], Zhou [50], Ibrahim et al. [24], Jiao et al. [25], and Shang et al. [37], [38], [39], [40].

Among these ontology learning algorithms, the multi-dividing ontology algorithm is the most popular approach, in which all vertices of an ontology graph or multi-ontology graph are divided into $k$ parts (corresponding to $k$ classes of rates). It is assumed that $f(v_a) > f(v_b)$ if $v_a$ belongs to rate $a$ and $v_b$ belongs to rate $b$ with $1 \le a < b \le k$. Note that for an ontology graph with a tree or tree-like structure, each kind of branch corresponds to a rate in the dividing. Since most ontology graphs have a tree structure, the multi-dividing ontology algorithm is widely used in various engineering fields such as biology, medicine and chemistry. Gao and Farahani [15] and Wu et al. [49] each presented examples showing how the multi-dividing ontology algorithm is applied to specific engineering applications.
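The ordering assumption of the multi-dividing setting can be made concrete with a small check. The sketch below counts, over all pairs of vertices taken from two different rates $a < b$, how often a given scoring satisfies $f(v_a) > f(v_b)$. The function name `ordering_agreement`, the vertex names and the scores are hypothetical, and this agreement ratio is an illustrative quantity rather than the criterion used in the paper.

```python
from itertools import combinations
from typing import Dict, List


def ordering_agreement(parts: List[List[str]], f: Dict[str, float]) -> float:
    """Fraction of cross-rate pairs (va in rate a, vb in rate b, a < b)
    for which the scoring satisfies f(va) > f(vb).

    parts[0] holds the vertices of rate 1, parts[1] those of rate 2, and so on.
    """
    correct = 0
    total = 0
    for a, b in combinations(range(len(parts)), 2):  # yields only a < b
        for va in parts[a]:
            for vb in parts[b]:
                total += 1
                if f[va] > f[vb]:
                    correct += 1
    return correct / total if total else 1.0


# Hypothetical example with k = 3 rates and made-up scores.
parts = [["v1", "v2"], ["v3"], ["v4", "v5"]]
f = {"v1": 0.9, "v2": 0.6, "v3": 0.7, "v4": 0.2, "v5": 0.65}

agreement = ordering_agreement(parts, f)
print(agreement)
```

In this toy example 6 of the 8 cross-rate pairs are ordered as required; the two violations both involve `v2`, whose score is lower than those of `v3` and `v5` even though it belongs to a higher-ranked rate.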

Although there have been several recent advances in the development of algorithms for various settings of the multi-dividing ontology learning problem, the study of further techniques and of the generalization properties of multi-dividing ontology learning algorithms has been largely limited to special settings. This inspires us to explore more advanced ontology learning techniques in the multi-dividing setting, together with their theoretical analysis from the viewpoint of statistical learning theory.

In this paper, we present a partial multi-dividing ontology learning algorithm and study its statistical characteristics from a mathematical point of view. In this approach, we divide the whole ontology graph into branches that correspond to several rates. The optimal ontology function is obtained by learning from the ontology sample set, which can also be divided into k training subsets, and the partial learning framework in the multi-dividing setting plays a key role in the implementation process. The structure of the paper is as follows: first, we introduce the setting of multi-dividing ontology learning; second, the main algorithm is presented in Section 3; and finally, the effectiveness of the proposed ontology learning algorithm is demonstrated via five experiments in various engineering applications.

Preliminaries, notation and background

For the mathematical discussion and the formulation of the learning setting, we use a $p$-dimensional vector for each vertex in the ontology graph to express all the semantic information of its corresponding ontology concept. With a slight abuse of notation, we use $v$ to denote both the vertex and its corresponding vector in $\mathbb{R}^p$.

Let $V \subseteq \mathbb{R}^p$ ($p \in \mathbb{N}$) be a vertex space for the ontology graph $G$, with the vertices in $V$ drawn independently at random according to a certain unknown distribution $D$. The target of ontology learning algorithms is to predict

Description of the partial multi-dividing ontology algorithm

In this section, we consider an ontology function $f: V \to \mathbb{R}$ given by $f(v) = \beta^{\top} v$ for some $\beta \in \mathbb{R}^p$. This section is organized as follows: we first introduce the structural SVM based multi-dividing ontology framework with hinge ontology loss; then the partial multi-dividing ontology framework with hinge ontology loss is presented; next, we discuss the optimization methods for the partial multi-dividing ontology framework based on structural SVM in interval [0,α2a,b] and [0,α2a,b], respectively
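To illustrate a linear ontology function together with a hinge-type loss, the sketch below reads $f(v)$ as the inner product of $\beta$ and $v$ (an assumption of this example) and evaluates the standard pairwise hinge loss over all cross-rate pairs. This is not the structural SVM or the partial formulation of the paper, whose details are not reproduced in this excerpt; the unit margin, the function names and the data are all hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def f(beta: Vector, v: Vector) -> float:
    """Linear ontology function: inner product of beta and the vertex vector v."""
    return sum(b * x for b, x in zip(beta, v))


def pairwise_hinge_loss(beta: Vector, parts: List[List[Vector]]) -> float:
    """Mean of max(0, 1 - (f(va) - f(vb))) over all pairs with va in rate a,
    vb in rate b and a < b. The margin of 1 is an assumed, conventional choice."""
    losses = []
    for a, b in combinations(range(len(parts)), 2):  # yields only a < b
        for va in parts[a]:
            for vb in parts[b]:
                margin = f(beta, va) - f(beta, vb)
                losses.append(max(0.0, 1.0 - margin))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical data: p = 2, k = 3 rates, one vertex per rate.
beta = [1.0, 0.0]
parts = [[[3.0, 0.0]], [[1.0, 5.0]], [[0.5, 1.0]]]

loss = pairwise_hinge_loss(beta, parts)
print(loss)
```

Here the scores are 3.0, 1.0 and 0.5, so only the pair from rates 2 and 3 violates the margin (by 0.5), and the mean loss over the three pairs is 0.5/3.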

Experiments

We underline that, to implement our algorithm in the mathematical learning setting, for each vertex of each ontology in the experiments we use a fixed-dimensional vector to express the vertex’s semantic and structural information. All the information about the concept, including its name, attributes, instances and the structure of the vertex in the ontology graph, is packaged in its corresponding vector. In this section, five experiments are designed and presented to measure the effectiveness of our partial

Conclusions

In recent years, since most ontology structures can be expressed as a tree or a tree-like graph, multi-dividing ontology learning has become a hot topic in ontology research, in which all concepts are divided into k parts corresponding to k rates according to the branches of the ontology tree, and the ranking among these k parts is determined by domain experts. There have been several advances in both theory and engineering applications in the multi-dividing ontology setting, and proved to be in high

Conflict of interests

The authors hereby declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

We thank the reviewers for their constructive comments in improving the quality of this paper. This work has been partially supported by MINECO grant number MTM2014-51891-P and Fundación Séneca de la Región de Murcia grant number 19219/PI/14 and National Science Foundation of China grant number 11761083.

References (50)

  • K. Taha

    Determining the semantic similarities among gene ontology terms

    IEEE J. Biomed. Health Inform.

    (2013)
  • M.A.H. Taieb et al.

    Ontology-based approach for measuring semantic similarity

    Eng. Appl. Artif. Intell.

    (2014)
  • E. Albacete et al.

    Semantic similarity measures applied to an ontology for human-like interaction

    J. Artif. Intell. Res.

    (2012)
  • H. Al-Mubaid et al.

    Measuring semantic similarity between biomedical concepts within multiple ontologies

    IEEE Trans. Syst. Cybern. Part C-Appl. Rev.

    (2009)
  • M.P. Banchetti-Robino

    Van Helmont’s hybrid ontology and its influence on the chemical interpretation of spirit and ferment

    Found. Chem.

    (2016)
  • V. Bertaud-Gounot et al.

    Ontology and medical diagnosis

    Inform. Health Soc. Care

    (2012)
  • D.M. Bowden et al.

    NeuroNames: an ontology for the BrainInfo portal to neuroscience on the web

    Neuroinformatics

    (2012)
  • N. Craswell et al.

    Overview of the TREC 2003 Web Track

    Proceedings of the Twelfth Text Retrieval Conference, Gaithersburg

    (2003)
  • E. Cuccia

    Aquinas’s ontology of the material world. change: hylomorphism and material objects

    Scripta Mediaevalia

    (2016)
  • F. Cucker et al.

    Learning Theory: An Approximation Theory Viewpoint

    (2007)
  • F. Delgado et al.

    An evaluation of ontology matching techniques on geospatial ontologies

    Int. J. Geogr. Inf. Sci.

    (2013)
  • G. Demartini et al.

    The Bowlogna ontology: fostering open curricula and agile knowledge bases for Europe’s higher education landscape

    Semant. Web

    (2013)
  • R. Fumagalli

    Choice models and realistic ontologies: three challenges to neuro-psychological modellers

    Eur. J. Philos. Sci.

    (2016)
  • W. Gao et al.

    Generalization bounds and uniform bounds for multi-dividing ontology algorithms with convex ontology loss function

    Comput. J.

    (2017)
  • W. Gao et al.

    Distance learning techniques for ontology similarity measuring and ontology mapping

    Clust. Comput. J. Netw. Softw. Tools Appl.

    (2017)