A Methodology for Engineering Domain Ontology using Entity Relationship Model

org


I. INTRODUCTION
Currently, data on the World Wide Web (WWW) is increasing and changing rapidly. Extracting precise information from this huge unstructured pool is like searching for a needle in a haystack. To extract precise and relevant information, researchers have proposed representing unstructured WWW data in an intelligent knowledge structure, namely, an ontology. An ontology is an explicit specification of domain concepts, properties, constraints and security [1]. Ontology knowledge helps to uncover implicit domain semantics, which can be used in various intelligent systems such as query expansion [2,3] and expert systems [4,5].
Recently, various ontology building techniques have been developed and published by researchers and experts [6,7]. These techniques provide useful guidelines for ontology creation, such as the ontology development life cycle, tools and ontology languages. Despite the existing techniques, the ontology development process is complex, restricted to particular scenarios, and lacks validation [9]. Moreover, domain ontologies are still inadequate and have yet to be applied widely [8]. To overcome these limitations, this research work proposes a simple and portable approach that helps ontology developers create an accurate, high-quality domain-specific ontology with less effort and time.
The proposed procedure of ontology engineering (OE) is novel in two aspects: (1) it is based on the well-known ER schema, which is readily available in most database-based organizations or can be developed efficiently and accurately for any domain of interest; (2) the method proposes instant and cost-effective rules for ER to ontology translation while maintaining all semantic checks. The ER-schema and translation model (viewing the two facets as simple and portable) support the development of an ontology for any knowledge domain. Focusing on the discipline of information technology, where the learning material related to the curriculum is highly unstructured, we have developed an OE tool that captures semantics from the ER-schema and automatically constructs the domain ontology.
This paper is structured as follows. Section 2 covers the background study, which describes the existing techniques related to OE. Section 3 discusses the proposed OE methodology. The ER to ontology mapping model is outlined in Section 4. Section 5 gives the implementation details of ontology development for the domain of information technology curriculum. Section 6 wraps up the article with the conclusion and future work.

II. BACKGROUND STUDY

A. Ontology Engineering
OE deals with the systematic process of ontology creation. The technique describes the terms used in the domain and the associations between them. In the past, various manual OE techniques have been developed that are based on different steps [11]. However, many of the existing methodologies confine the process of ontology creation to a group of different phases that are used only to create a native or domain-dependent ontology.

B. Issues in Ontology Engineering
In the process of ontology creation, it is necessary to discuss the three most important issues in detail.

1) Rules definition: The main reasons for ontology development include the removal of ambiguity among the ontology concepts, enlargement of the ontology scope, and enhancement of the quality of domain knowledge. Furthermore, the procedure of ontology management becomes more complicated and multifaceted in large-scale development [12]. Therefore, to create a true and good-quality ontology with a smooth development procedure and less effort and time, some rules (i.e., steps or phases) need to be defined and followed.
2) Reusability of domain information: An important goal of OE is to develop an ontology that can be used in various applications and tasks. Reusability not only saves time and effort but also increases the reliability and consistency of the ontology. The high reusability of an ontology indicates its acceptance in various applications. For instance, a general ontology can be used to represent different domain aspects (e.g., UNSPSC).
3) Ontology usability: The main focus of the ontology creation process is usability. Usability refers to the ability of an ontology to fulfil the application requirements. Another point of view of usability is to create diverse or contradictory ontologies (satisfying application requirements) by using similar domain concepts. However, Gyrard et al. [13] argued that this feature makes the ontology dependent on a particular application or task, thus making its reusability low.
So, the technique involved in ontology creation must be based on good-quality rules and must determine whether the ontology is usable or reusable. Note that if the ontology creation process pays attention to application requirements and utilization, the resulting ontology will be usable (application-dependent). On the contrary, the final ontology will become application-independent (reusable) if the development process overlooks the purpose and utilization of the application.

C. Literature Review
Currently, various techniques exist for the development of ontologies. Most of these techniques follow general steps: (1) identify the set of general terms, (2) create classes for the terms and organize the classes in a hierarchy, and (3) apply constraints on the identified associations between the classes. For instance, Reda et al. [14] created an ontology graph (in the RDF language) from diverse Internet of Things (IoT) data to facilitate the interoperability of different IoT devices. The approach first constructs the IoT fitness ontology by recognizing the classes and the associations between them using the Protégé ontology-editing tool. The proposed mapping rules and fitness ontology are then used to generate the semantic RDF graph (i.e., final ontology) for IoT. Another promising technique is adopted in [15], where the seven rules of software engineering and features of structured design are combined to develop a generic educational ontology. However, the approach follows a complex procedure comprising six phases, which are further split into smaller steps to build the final ontology. A similar approach is exploited in Remolona et al. [16], where machine learning and natural language processing techniques are combined to generate an ontology from a journal database.
Chujai et al. [20] demonstrated a stepwise approach for building an ontology from an ER model using the Protégé tool; however, the research does not address ontology translation independent of Protégé features. A more closely related approach is presented in [10], where a prototype tool is developed to automatically convert relational data to an ER schema. The intermediate ER data is then mapped to an OWL ontology. However, the approach does not provide details about the ontology development life cycle or the ER to OWL mapping (e.g., composite attribute and restriction mappings are inappropriate). The authors themselves suggest that the final ontology may be incomplete due to mapping inconsistencies. Recently, Ellefi et al. [17] proposed a novel model to develop an ontology in the cultural heritage (CH) domain, where the data is vast and diverse. The novelty lies in conceptualizing CH resources under three dimensions, namely, topology-based, photogrammetrical process-based and spatial information-based. Moreover, the authors have published the final ontology for knowledge sharing and reusability purposes.
Another vein of OE is to develop an ontology from textual data. In [18], the authors presented a novel technique for ontology construction. The method is based on a combination of knowledge extraction and knowledge capturing approaches from text. The knowledge extraction approaches are used to extract the synonyms, terms, linear relations, hierarchical relationships and rules needed for ontology construction. On the other hand, the latter methodologies (namely, natural language processing and text mining) provide a means to capture semantic knowledge from textual data. Another model for extracting the vocabulary (terms and phrases) and the semantic relations among the vocabulary from agricultural textual data is presented in [19]. The model is based on the RelExOnt algorithm for the automatic extraction of pre-defined relationships from textual data. The final generated ontology was validated by experts against the limited relationships and achieved 75.7% precision.
In the literature survey, the main focus of this study was to analyze the core structure and ontology development life cycle of manual approaches towards OE. Furthermore, the general steps that every approach usually follows were also identified.
After a detailed analysis of existing methodologies, it was observed that some techniques do not provide the details of each step involved in the ontology development life cycle, while others are specialized for a particular application only. Therefore, due to the absence of a standardized approach for OE, researchers still face issues (as described in Section 2-B) in ontology development.

III. ENGINEERING ONTOLOGY FOR INFORMATION TECHNOLOGY CURRICULUM
The proposed framework constructs an ontology for the Information Technology Curriculum (ITC) using ER diagrams. The procedure for ontology development encompasses three simple phases: feasibility study, planning, and ontology formulation (as illustrated in Fig. 1). In the first phase, a preliminary investigation is performed to gather the requirements for the ontology domain. Based on the gathered data, the ER-schema is crafted in the second phase. The last phase deals with the identification of ER schema to ontology (EROnt) conversion rules and the implementation aspects of the system prototype.

A. Feasibility Study (Step 1)
For ontology construction, a preliminary investigation is performed to collect the entire requirements set of the ICT domain. This investigation helps to understand the domain and recognize the sources for acquiring knowledge about ICT.

1) Domain understanding: The ontology designers must have complete knowledge about the structure of the intended domain of interest. This knowledge is necessary to create an ontology in an easy and smooth manner.
The ontology domain can be easily extracted by analyzing the details of the targeted subject. The ICT ontology must identify the various courses taught under different disciplines of information technology and the relationships that exist between these disciplines; therefore, this research work has selected universities and the Higher Education Commission (HEC) of Pakistan as its target subjects.

2) Gathering intended knowledge: To gather the requirements for the ICT ontology, we analyzed the prospectuses and websites of numerous universities of Pakistan that follow the HEC curriculum. We also consulted different experts and students from the universities to understand the hierarchy and structure of various disciplines and the courses taught in each discipline. Further, the latest edition of the HEC curriculum was consulted, which helped us to identify the classification of ICT courses (for instance, the learning material can be grouped into general-education, core, compulsory, supporting and elective categories).

B. Planning (Step 2)
Researchers utilize numerous methodologies to mine knowledge for ontology construction, for example, using a structured relational model [21], exploiting an unstructured Web source [22], or utilizing both structured and unstructured data sources [23]. The main objective of this step is to select an appropriate model to acquire precise knowledge of the domain.
To achieve the goal of a simple and portable OE scheme, we have chosen the ER model. The ER schema offers several benefits: (1) an ER diagram is simple and can be created quickly with little expertise or by database experts, (2) many existing systems have already been archived in the form of ER diagrams (e.g., database management systems), and (3) ER is a portable model (not limited to a particular domain) that can accurately capture the conceptual needs of an intended domain (e.g., ICT). Fig. 2 depicts an example of ICT entities and the relationships between them, in which an entity (class) is denoted by a rectangle, a relationship by a diamond and an attribute of the entity by an oval.
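As a rough illustration of what the planning phase produces, the ER components named above (entities, attributes, relationships and cardinalities) could be held in a small in-memory structure before translation. The sketch below is written in Python for brevity and is not the Java prototype described later; the class names (`Attribute`, `Entity`, `Relationship`), the helper `key_attributes`, and the sample `Course`/`Discipline` fragment are hypothetical and are not taken from the actual ICT schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    datatype: str = "varchar"    # e.g. date, varchar, integer
    is_key: bool = False
    nullable: bool = True
    multi_valued: bool = False


@dataclass
class Entity:
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Relationship:
    name: str
    source: str         # name of the first participating entity
    target: str         # name of the second participating entity
    cardinality: str    # "1:1", "1:M" or "M:N"


# Hypothetical fragment of a curriculum schema (illustrative names only).
course = Entity("Course", [
    Attribute("courseCode", "varchar", is_key=True, nullable=False),
    Attribute("creditHours", "integer", nullable=False),
])
discipline = Entity("Discipline", [
    Attribute("title", "varchar", is_key=True, nullable=False),
])
offers = Relationship("offers", source="Discipline", target="Course", cardinality="1:M")


def key_attributes(entity: Entity) -> List[str]:
    """Return the names of the key attributes of an entity."""
    return [a.name for a in entity.attributes if a.is_key]


print(key_attributes(course))  # ['courseCode']
```

Keeping nullability, key status and multi-valuedness as explicit flags is a design choice made here so that each flag can later select one translation rule; a real schema export may encode them differently.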

C. Ontology Formulation (Step 3)
Once the ICT entities, attributes, relationships, and cardinalities are identified in the form of an ER-schema, the next step is to convert the schema into an ontology knowledge base. The process of ER to ontology translation is shown in Fig. 3. This phase proceeds with defining the EROnt conversion rules. These rules are then applied to illustrate the possible conceptualization (ontology) of the ER model (see Section 4). Finally, the ICT ontology vocabulary (which includes concepts, data properties, object properties, and constraints) is validated using a reasoning engine (i.e., HermiT). A semantic reasoner is a good choice as it makes it possible to derive logical consequences within the newly created ontology. This interpretation identifies inaccurate and inconsistent classifications of ICT concepts within the newly created ontology.
Further, the performance of the EROnt translation model for ontology creation is evaluated in terms of precision and recall (see Section 5). To this end, experts' opinions are obtained to assess whether the generated ontology vocabulary precisely reflects all the elements of the ER model of the ICT domain (i.e., the one developed in the planning phase of the proposed framework).

IV. INTERPRETING ERONT MAPPING MODEL
This section presents the ER model in detail and the proposed mapping rules for the ontology creation process. We have identified a total of 54 entities, 12 relationships, and 31 attributes as part of the ICT ER schema. Fig. 4 gives a sample list of ICT entities along with their attributes and the relationships between the entities.
The semantic interpretation of the ICT ER-schema associates each component of the ER model with ontology vocabulary (an entity with a concept, an attribute or relationship with a datatype property or object property, and a cardinality with a restriction) using the OWL-Lite language [24]. Table I lists these interpretations as EROnt mapping rules for the ER to ontology conversion process. The conversion process proceeds by applying the set of outlined rules; for instance, an entity or strong entity of the ER schema is translated to an OWL class. A single-valued attribute that may be NULL is mapped to an OWL functional property with the minimum cardinality restricted to zero, whereas a non-NULL single-valued attribute has its minimum cardinality restricted to one. The final outcome of the process is the OWL ontology vocabulary that accurately mirrors all components of the ER schema.
The ICT-schema and proposed EROnt rules are used to obtain the ontology vocabulary for the intended domain. A system prototype is developed using the Java framework and language. Other tools and APIs, such as the ARQ engine and the Jena API, are used to implement the ER to ontology translation and obtain the resultant ICT OWL ontology. The proposed method also utilizes Protégé (an open-source ontology editor) to visualize and validate the resultant ontology. Fig. 5, from the OntoGraf tool (a built-in visualization plug-in in Protégé), depicts a graphical overview of the new ICT ontology.
To validate the consistency of the ICT ontology, this study relied on a logic reasoning engine, namely HermiT (a plug-in in Protégé). The reasoner tested the ontology (without human intervention) for concept redundancy and the accuracy of the extracted relationships between the concepts, and reported a consistency of 100%. Furthermore, the identified vocabulary (e.g., concepts, relationships) of the new ontology was inspected by experts to estimate the performance of the system prototype. Two groups of twenty participants (i.e., a faculty member and research students) from two universities took part in the evaluation process. Each group of experts received an ER schema and the corresponding generated ontology to explore four key ontology elements obtained as an outcome of the system prototype: (1) concepts, (2) data properties, (3) object properties and (4) constraints. Furthermore, the group assessments were exchanged between the groups to avoid any misinterpretation.
We calculated precision and recall values to measure the effectiveness of the system prototype. The precision measure is important as it represents the accurate modeling of domain knowledge, while the recall value shows the reliability of the EROnt rules in generating the final ontology. These measures are calculated manually from the experts' judgment about the extracted ontology vocabulary using Equation (1) and Equation (2), where T can be a concept, attribute or relation.
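For readers who want to reproduce this kind of evaluation, the following minimal Python sketch computes both measures for one vocabulary type T. It assumes the standard information-retrieval definitions (precision as the share of extracted items judged correct, recall as the share of expected items that were extracted); the exact form of the paper's Equations (1) and (2) may differ. The function name `precision_recall` and the sample concept sets are hypothetical and do not come from the reported experiment.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], expected: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one vocabulary type T.

    extracted: items of type T produced by the translation
    expected:  items of type T the experts expect from the ER schema
    """
    correct = extracted & expected
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical expert judgment for T = concept (illustrative names only).
extracted_concepts = {"Course", "Discipline", "Semester", "Teacher"}
expected_concepts = {"Course", "Discipline", "Semester", "Program", "Department"}

p, r = precision_recall(extracted_concepts, expected_concepts)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

The same function can be called once per element type (concepts, data properties, object properties, constraints) and the results averaged to obtain overall figures.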

Table II reports the results for the extracted vocabulary.
From the results, it is evident that our approach achieved a high value for the precision measure (i.e., valid vocabulary identification). The recall findings are also significant, with a little variation in the reliable conversion of constraints, which we believe may be due to inconsistencies in the design of the ICT ER-schema. Ultimately, the framework achieved 95.75% average precision and 90.75% average recall in the overall procedure of engineering the ICT domain ontology.

VI. CONCLUSION AND FUTURE DIRECTIONS
The use of the ICT ontology can improve domain description and meaningful information retrieval. This semantic structure helps students in course selection as well as researchers in identifying the constitution and hierarchies of the higher education curriculum. However, the formulation of an ontology requires the acquisition of a complete and precise description of the ICT structure. Furthermore, it is important that the OE process is carried out efficiently and accurately.
With this in view, we have presented an ER-schema based approach that allows researchers to develop a domain ontology in a standard and domain-independent form. In the context of ICT, our methodology acquires ICT needs from the universities and HEC documentation. The ER model of ICT is used as a representation of domain requirements due to its semantic orientation. The ontology vocabulary (concepts, properties, etc.) is then identified from the ER schema using the EROnt translation rules. These rules govern the working of the system prototype in the overall process of OE. The evaluation via experts (in terms of precision and recall) and a logic reasoner (i.e., consistency test) confirms that the resultant ICT ontology accurately represents the domain knowledge.
In the future, the ICT ontology can be enhanced by adding other disciplines and constraints, which would make its use feasible for every field of academia and for users performing semantic search over the WWW.

TABLE I. ERONT MAPPING RULES

- Transform into a subclass of the class obtained from a strong entity (called the host class)
- Set the subclass min-cardinality to one and add an object-property in the host class with its range set to the subclass

Attributes
Attribute
- Map to a data-type property
Key attributes
- Map to functional data-type properties
- Max-cardinality is set to one and uniqueness is represented by an inverse functional property
Data type (date, varchar, integer, etc.)
- Map to ranges of the data-type property
Single-valued attribute
- Null: map to a functional data-type property with min-cardinality set to zero
- Non-null: map to a functional data-type property with min-cardinality set to one
Composite attribute
- Ignore the composite attribute and map its simple attributes to data-type properties, or map the composite attribute as sub-properties of the corresponding data-type properties
Multi-valued attribute
- Null: map to a data-type property with min-cardinality set to zero
- Non-null: map to a data-type property with min-cardinality set to one

Relations
Relation
- Map to an object-property
IS-A relationship
- Map to a subClassOf relation
Ternary relationship
- Map the relation as a class having three inverse object-properties (participating class, associating class and relationship class)
Recursive relation (constraints)
- 1:N relationship: map to an object-property with range and domain set to the same class, and set min and max cardinalities
- M:N relationship: map the relation to a class (an entity and relationship class) with a someValuesFrom constraint
Binary relation (with attributes)
- Map the relation to a class and create two object-properties (relation class and participating class)
Binary relation (no attributes)
- Map to two object-properties that are inverses of each other
Binary relation (constraints)
- 1:M relationship: map to min and max cardinalities
- M:N relationship: apply constraints after dividing the relationship into 1:M and M:1 relationships
- 1:1 relationship: map as a functional property and set max-cardinality to one
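A few of the rules in Table I can be sketched as small translation functions that emit OWL axioms in Turtle syntax. This is a minimal Python illustration under stated assumptions, not the Java/Jena prototype: the function names, the `http://example.org/ict#` namespace, the datatype lookup `XSD_TYPES`, and the sample `Course`/`Discipline` schema are all hypothetical. Only three rules are covered: entity to class, single-valued or key attribute to data-type property with a min-cardinality restriction, and binary relation without attributes to two inverse object-properties.

```python
PREFIXES = "\n".join([
    "@prefix : <http://example.org/ict#> .",   # hypothetical namespace
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
])

# Assumed lookup from ER data types to XSD ranges.
XSD_TYPES = {"varchar": "xsd:string", "integer": "xsd:integer", "date": "xsd:date"}


def entity_to_owl(entity):
    """Entity (strong entity) -> OWL class."""
    return [f":{entity} a owl:Class ."]


def attribute_to_owl(entity, name, datatype, nullable=True,
                     multi_valued=False, is_key=False):
    """Attribute -> data-type property plus a min-cardinality restriction."""
    types = ["owl:DatatypeProperty"]
    if not multi_valued:
        types.append("owl:FunctionalProperty")         # single-valued attribute
    if is_key:
        types.append("owl:InverseFunctionalProperty")  # uniqueness of a key
    min_card = 0 if nullable else 1                    # Null -> 0, Non-null -> 1
    return [
        f":{name} a {', '.join(types)} ;",
        f"    rdfs:domain :{entity} ;",
        f"    rdfs:range {XSD_TYPES[datatype]} .",
        f":{entity} rdfs:subClassOf [ a owl:Restriction ;",
        f"    owl:onProperty :{name} ;",
        f'    owl:minCardinality "{min_card}"^^xsd:nonNegativeInteger ] .',
    ]


def binary_relation_to_owl(name, inverse_name, source, target, one_to_one=False):
    """Binary relation without attributes -> two inverse object-properties."""
    kind = "owl:ObjectProperty"
    if one_to_one:
        kind += ", owl:FunctionalProperty"             # 1:1 relationship
    return [
        f":{name} a {kind} ;",
        f"    rdfs:domain :{source} ;",
        f"    rdfs:range :{target} .",
        f":{inverse_name} a owl:ObjectProperty ;",
        f"    rdfs:domain :{target} ;",
        f"    rdfs:range :{source} ;",
        f"    owl:inverseOf :{name} .",
    ]


lines = [PREFIXES]
lines += entity_to_owl("Course") + entity_to_owl("Discipline")
lines += attribute_to_owl("Course", "courseCode", "varchar", nullable=False, is_key=True)
lines += attribute_to_owl("Course", "description", "varchar", nullable=True)
lines += binary_relation_to_owl("offers", "offeredBy", "Discipline", "Course")
turtle = "\n".join(lines)
print(turtle)
```

Emitting plain Turtle text keeps the sketch free of external dependencies; in a setting closer to the described prototype, the same rules would presumably be expressed through an ontology API, and the resulting file could then be loaded into an editor such as Protégé for reasoning.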
V. IMPLEMENTATION AND RESULTS

TABLE II. PRECISION AND RECALL OF EXTRACTED VOCABULARY