Ontology and the Semantic Web

This paper discusses the development of a new information representation system embodied in ontology and the Semantic Web. The new system differs from other representation systems in that it is based on a more sophisticated semantic representation of information, aims to go well beyond the document level, and is designed to be understood and processed by machine. A common theme underlying these three features, i

creation of subject sets. Relationship lists emphasize the connection between terms and concepts (Hjørland, 2007). In the first category (term lists), terms that contain specific meanings are listed, typically in alphabetical order, so that they can be easily accessed when needed. The associations among these terms normally do not go beyond their alphabetical order. In other words, the meaning of a term has no relation to the meaning of the term that comes before or after it. They are related by alphabetical order, not by the meaning they contain. The relations they indicate are generally not semantic relations. The second category (classifications and categories) arranges terms or concepts hierarchically. The hierarchical order is determined by a specific type of relation among terms or concepts. Those arranged at a higher level belong to a higher class or a broader category and are usually more inclusive in meaning than those arranged at a lower level. Hierarchical lists indicate at least class-subclass semantic relations among terms and concepts that are associated in meaning. In the third category (relationship lists), relations indicated among terms or concepts normally go beyond their hierarchical order. More semantic relations are constructed and expressed in relationship lists. Terms and concepts can be meaningfully associated, for instance, in hierarchical order (class-subclass), horizontal order (synonyms), reverse order (antonyms), or causation order (cause-effect).
The understanding of different semantic relations indicated in term lists, hierarchical lists, and relationship lists provides a useful framework to explain how ontology is different from or similar to other forms of representation models. Researchers in library and information science note that ontology is associated in one way or another with traditional library representations such as a thesaurus, taxonomy, classification scheme, controlled vocabulary, or even a dictionary (Daconta et al., 2003; Jacob, 2003). To what extent traditional library representation models and ontology are associated can be illustrated by arranging them in the following taxonomy: Semantically speaking, the association between an ontology and representation models in the term list category remains fairly weak. An ontology is a rich expression of semantic relations while a term list, free or controlled, is a natural arrangement of word forms. The semantic tie between an ontology and representation models in the hierarchical list category is stronger, because hierarchical semantic relations are present in an ontology as well as in a classification scheme and a taxonomy. However, as Wang et al. (2006) have pointed out, classification schemes are largely tied to a paper-based environment and more constrained within the academic community, while taxonomies are largely created in a Web environment to organize digital resources that are not limited within subjects. As a result, a taxonomy bears a closer tie to an ontology than a classification scheme does. Daconta et al. (2003) noted that in the model of ontological representation lies an underlying taxonomical relationship and the basic taxonomic sub-class of hierarchies acts as the framework of ontologies. Welty and Guarino (2001) identified that some notions in a taxonomy are also used to represent the most important properties in an ontology, thus indicating strong mutual relationships between these two content representation forms. Hjørland (2007) stated that a thesaurus is basically a semantic tool because the "road map" it provides mainly connects concepts via semantic relations. The same is true of an ontology. However, one major difference between an ontology and a thesaurus is the richer set of relations used in an ontology (Khoo and Na, 2006). According to Daconta et al. (2003), the basic taxonomic sub-class of hierarchies acts as the skeleton of ontologies, but ontologies add muscle and organs in the form of elaborate relations, properties/attributes, or property values. Ontologies thus enable people to specify the semantics of their domain in great detail. Because of their rich semantic representation power, to equate ontologies with any other type of representational structure is to diminish both the function and potential of ontologies (Jacob, 2003). Jacob thus urged the library community to make a conscious effort to rethink the traditional representational approaches in light of the changing requirements generated by Web environments.
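The contrast among term lists, hierarchical lists, and ontologies drawn above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the weather vocabulary, the relation names (`subclass_of`, `synonym_of`, `antonym_of`, `causes`), and the `Concept` and `Ontology` classes are hypothetical and do not come from any of the cited works or from any Semantic Web standard.

```python
from dataclasses import dataclass, field

# 1. Term list: alphabetical order only; neighbours share no meaning.
term_list = sorted(["rain", "drought", "flood", "rainfall", "weather event"])

# 2. Taxonomy: a hierarchy that records class-subclass relations only.
taxonomy = {"weather event": ["rain", "drought"]}  # parent -> children


# 3. Ontology: the taxonomy as "skeleton", plus typed relations and properties.
@dataclass
class Concept:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Ontology:
    concepts: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)  # (subject, relation, object)

    def add_concept(self, name, **properties):
        self.concepts[name] = Concept(name, dict(properties))

    def relate(self, subject, relation, obj):
        self.relations.add((subject, relation, obj))

    def related(self, subject, relation):
        """Objects linked to `subject` by the given relation type."""
        return sorted(o for s, r, o in self.relations
                      if s == subject and r == relation)

    def relation_types(self):
        return sorted({r for _, r, _ in self.relations})


onto = Ontology()
onto.add_concept("weather event")
onto.add_concept("rain", measured_in="mm")   # property/attribute with a value
onto.add_concept("drought")

# Reuse the taxonomy as the hierarchical backbone of the ontology.
for parent, children in taxonomy.items():
    for child in children:
        onto.relate(child, "subclass_of", parent)

# Relations a taxonomy cannot express.
onto.relate("rain", "synonym_of", "rainfall")   # horizontal order
onto.relate("rain", "antonym_of", "drought")    # reverse order
onto.relate("rain", "causes", "flood")          # causation order

print(term_list)
print(onto.relation_types())
print(onto.related("rain", "causes"))
```

The term list can only be scanned alphabetically and the taxonomy can only be walked up or down, whereas the ontology object can answer questions about any relation type it records, which is the sense in which the hierarchy serves as a skeleton for richer structure.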

Granular accessibility
Classification schemes are used to classify and allocate library collections into predefined subjects while taxonomies are used to categorize information resources (Wang et al., 2006). They usually do not go below the document level. Terms in classification schemes and taxonomies contain summary information of document content to describe the document as a whole. As a result, they have a low level of granular access to information and are usually used to support browsing or aid navigation. In contrast, ontologies can be used to describe individual words and phrases in a document and have a higher level of granularity in information access. Fast and Campbell (2001) compared the level of granularity between a metadata harvesting system and the Semantic Web. They found that in a metadata harvesting-based system like the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), users search metadata records, not the full text of documents, and resource discovery is therefore significantly less granular. The Semantic Web, through its semantic markup, provides highly granular access to semantically meaningful segments within an entire Web document: In the Semantic Web, documents on the Web may or may not have enveloping metadata descriptions, as envisioned by the designers of metadata harvesting protocols. But these documents will have metadata embedded within them: descriptions which use the emerging Resource Description Framework (RDF) to link specific elements within these documents to definitions which enable computers to interpret these elements semantically (p. 13).
Granular accessibility built into ontology and the Semantic Web has a twofold meaning. First, there is what Vickery calls "the 'granularity' or 'grain size' of an ontology - to what degree of specificity should the concept hierarchy be continued" (Vickery, 1997, 278). According to Gilchrist (2002), Vickery was one of the first in the LIS field to draw attention to the emergence of the term ontology in knowledge engineering and in information science. In his 1997 article, Vickery highlighted two general trends among authors in the ontology literature: those who concentrated on the top-level types of concept occurring in the domain, i.e., the ontological categories, and those who considered it necessary to include all the specific concepts occurring in the domain to achieve high granularity. Similar discussions can be found in the ontology engineering literature. Researchers try to reduce the problems posed by complexity in constructing ontologies by distinguishing between upper- and domain-level ontologies. Upper-level ontologies describe domain-independent concepts while domain-level ontologies describe the knowledge of a specific domain (Gahleitner et al., 2005). Gahleitner and his colleagues proposed a system that uses upper-level ontologies as the starting point for defining domain-level ontologies. Sanchez-Alonso and Garcia-Barriocanal (2006) found that upper-level ontologies only apply to large general knowledge bases and do not include concepts specific to given domains. However, upper-level ontologies can be used to reduce definition time and complexity, again serving as a useful starting point.
Granular accessibility in the context of ontology and the Semantic Web also reflects what Fast and Campbell (2001, 14) label as "long-term trends" of breaking down "the traditional document into its component elements" in order to improve information accessibility. Prior to the advent of the digital revolution, human access to information resources relied largely on traditional access tools (e.g., classification schemes, catalogs, indices) to bridge the distance between the user and the resource. Inspired by the promising potential to reduce that distance and bring the full text directly to the user, the library community has started to consider seriously its need and obligation to expand the functions of traditional representations to enable more granular access to library resources. Markey et al. (2006) predicted that in the middle of the next decade, mass digitization efforts are likely to come to a successful end and access to digital collections will probably take precedence over physical library collections; they made three recommendations regarding the future of classification online: (1) revisiting chain indexes for producing brief, sound-bite-sized phrases to serve as the briefest document representations in the staging of access to lengthier document representations, (2) building new dimensions in a classification to retrieve the best digital information for the topics that interest people, and (3) building information search tactics into Web search engines that execute automatically to find additional material based on the end user's assessment of retrieved documents (p. 35).
The University of California Libraries Bibliographic Services Task Force called for simpler and more efficient cataloging practices to keep pace with the constantly changing digital environment and considered the descriptive function of traditional cataloging "obviously not as important in a world where the item is made directly accessible to users on a computer terminal" (2005, 23). In her final report prepared for the Library of Congress, Calhoun commented that in today's academic environment, "a large and growing number of students and scholars routinely bypass library catalogs in favor of other discovery tools, and the catalog represents a shrinking proportion of the universe of scholarly information" (2006, 5) and urged library leaders to "move swiftly to establish the catalog within the framework of online information discovery systems of all kinds" (7). Coyle also called for a more radical change to the "rules for cataloging that are remnants of a long departed technology: the card catalog" and claimed that information professionals "needed a much simpler yet standard way to describe the new forms of intellectual output, as well as the more granular items turning up as products of libraries' and archives' own digital library projects" (Coyle, 2007, online).
More direct and granular access to information is an important goal made explicit in ontology and Semantic Web technologies. With more and more full-text documents and digital objects available online, the future trend will continue to shift "from document retrieval to component aggregation based on specific needs" (Fast and Campbell, 2001, 17). Where ontology and the Semantic Web meet user expectations is in the potential to break the text or the object into meaningful components for the computer to process in a way that better satisfies user needs. Machine processibility is thus a key component rooted in ontology and the Semantic Web.

Machine processibility
To develop a Web with semantics, resources on the Web need to be represented or annotated with structured machine-understandable descriptions of their contents and relationships, using vocabularies and constructs that have been explicitly and formally defined with a domain ontology (Lu et al., 2002). The machine processibility that may be achieved on the Semantic Web relies to a great extent on the availability and proliferation of ontologies. Research in ontology engineering covers ontology generation, maintenance, and reuse. Ding and Foo (2001, 2002) conducted a survey of ontology generation, mapping, and maintenance and found that most of the generation, mapping, and maintenance reviewed in the surveyed systems depend on human experts, and that the available facilitating tools remain limited in their functions. Although researchers realize that manual construction of ontologies is a tedious, time-consuming, and error-prone task, fully automated tools to build ontologies from existing information are still at a very early stage of implementation. As a result, semi-automatic ontology extraction can be seen as a practical short-term solution (Benslimane et al., 2006). The fact that ontologies are tedious and difficult to create also makes the investigation of how to reuse existing ontologies a popular research topic. Alani (2006) described a number of steps necessary for reusing online ontologies to construct new ontologies, including ontology search, ranking, segmentation, mapping and merging, annotation, and evaluation.
The core technology for building a machine-understandable Web is "a series of new markup languages" (Legg, 2007, 415) capable of representing semantic relationships in ontologies. When Berners-Lee described his vision in 2001, the development of the Semantic Web made use of two existing technologies: Extensible Markup Language (XML) and the Resource Description Framework (RDF). In February 2004, the World Wide Web Consortium announced the final approval of two key Semantic Web technologies, the revised RDF and the Web Ontology Language (OWL). By 2006, RDF, RDF Schema (RDFS), and OWL were generally regarded as standard Semantic Web technologies that had been developed to add another layer on top of XML to make Web representation more semantically meaningful to computers (Robu et al., 2006).
Extensible Markup Language (XML), by keeping content, structure, and representation apart, is considered a far more adequate means of knowledge representation (Lu et al., 2002) and is generally regarded as the first level of "semantics" of the Semantic Web upon which other representation tools will be built. Resource Description Framework (RDF) is a language designed to represent information about resources in the World Wide Web so that this information can be exchanged between applications without loss of meaning. RDF provides a foundation for building ontologies, performing logical reasoning, describing Web services, and a host of other Semantic Web activities (Passin, 2004). RDF Schema (RDFS) is designed to express classes and their (sub-class) relationships, as well as to define properties and associate them with classes to facilitate inference and enhance searching (Passin, 2004). Finally, Web Ontology Language (OWL) extends the limited expressiveness of RDFS by adding constructors that allow the building of complex class expressions, cardinality restrictions on properties, characteristics of properties, and mapping between classes and individuals (Taniar and Rahayu, 2006).
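How RDF statements and RDFS class relationships "facilitate inference and enhance searching" can be sketched without any RDF library. The sketch below is a simplified illustration, not a conforming RDF or RDFS implementation: statements are plain Python tuples, the `ex:` names are invented for the example, and only one inference (an instance of a subclass is also an instance of its superclasses) is modelled. `rdf:type` and `rdfs:subClassOf` are genuine RDF/RDFS vocabulary terms; everything else is an assumption.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"

# Each statement is a (subject, predicate, object) triple.
# The "ex:" identifiers are hypothetical and stand in for full URIs.
triples = {
    ("ex:Novel", RDFS_SUBCLASS, "ex:Book"),
    ("ex:Book", RDFS_SUBCLASS, "ex:Publication"),
    ("ex:contact", RDF_TYPE, "ex:Novel"),
    ("ex:contact", "ex:author", "Carl Sagan"),
}


def superclasses(cls, graph):
    """All classes reachable from `cls` by following rdfs:subClassOf."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASS and o not in found:
                found.add(o)
                stack.append(o)
    return found


def infer_types(graph):
    """Return the graph plus the rdf:type triples implied by subclassing."""
    inferred = set(graph)
    for s, p, o in graph:
        if p == RDF_TYPE:
            for parent in superclasses(o, graph):
                inferred.add((s, RDF_TYPE, parent))
    return inferred


def instances_of(cls, graph):
    """Subjects explicitly typed as `cls` in the given graph."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)


closed = infer_types(triples)
print(instances_of("ex:Publication", triples))  # search before inference
print(instances_of("ex:Publication", closed))   # search after inference
```

A search for publications finds nothing in the raw statements, because the resource was only ever described as a novel; after the subclass inference is applied, the same search finds it. OWL's additional constructs (cardinality restrictions, property characteristics, and so on) are beyond what this sketch attempts.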
Machine processibility by means of markup languages cannot be achieved on the Web without a large-scale effort to "mark up Web pages with the required semantic metadata" (Legg, 2007, 414). Unfortunately, since Semantic Web markup languages such as RDF, RDFS, and OWL are not as straightforward as HTML, the general public will not be able to adopt them quickly. Moreover, there is a lack of motivation among those responsible because "until the Web includes a significant quantity of semantic metadata, developers have little incentive to produce applications for the Semantic Web; but if few Semantic Web applications exist, there is little incentive for the Web authors to mark up pages semantically" (Legg, 2007, 415). The alternative solution, as Legg explained, might be automatic markup. Some research programs have explored the possibility of automation, but with little success. Sure and Studer (2005) introduced annotation tools that allow users to add semantic markup to documents or resources. They also commented that the great challenge was to automate the annotation task as much as possible so as to reduce the burden of manual annotation for large-scale resources.
Making machines understand the meaning of language and act intelligently is also a field of natural language processing in Artificial Intelligence (AI) research (Chu, 2003). Chu cited Doszkocs (1986) to explain that natural language can be processed in AI at phonological, morphological, lexical, syntactic, semantic, and pragmatic levels and "each holds implications for information representation and retrieval" (Chu, 2003, 230). The phonological level of processing reflects the "sound like" representation feature in information retrieval (e.g., finds documents containing terms that sound like "music"). The morphological level of processing includes truncation as a retrieval technique and automatic indexing as an information representation model. The lexical level of processing recognizes the signification and application of words (e.g., determines whether book is meant as in a publication or as in making a reservation) and provides automatic search term substitution and augmentation. The syntactic level of processing conducts phrase and proximity searching. The semantic level of processing automatically displays cross references, synonyms, and related terms. Finally, the pragmatic level represents the highest level of language processing designed to "decide the meaning of the language by considering the surrounding context, the author, the user, and knowledge of the real world" (Chu, 2003, 231).
Chu considers language processing at the semantic and pragmatic levels a key to the success of the vision of the Semantic Web. "If language processing can be done successfully at the semantic and pragmatic level, the Semantic Web envisioned by Berners-Lee, Hendler, and Lassila would become a reality in processing the semantics of Web pages" (Chu, 2003, 231-232). Scenarios created to illustrate the potential machine processibility empowered by Semantic Web technologies indicate that three levels of language processing (lexical, semantic, and pragmatic) would be needed to achieve a semantic search on the Web. Researchers envisioned a Semantic Web that would be able to tell the meaning of a word based on where it is used (lexical level processing): This will allow authors to make a distinction between "contact" as in contact information, Contact as in the film starring Jodie Foster (or the book by Carl Sagan upon which the film is based), and "contact" in the context of electrical circuits (Fast and Campbell, 2001, 13).
For example, one might pose a query "return all the reviewers for book 'The Semantic Web: an Introduction'" to a semantics-based Web search engine, then the engine will return only reviewers for this book instead of returning Web pages that contain keyword "reviewer" and/or term "The Semantic Web: an Introduction". For another example, if one pose query "return all the chairs", with the guidance of a furniture ontology, only those furniture chairs are returned; and with the guidance of a person ontology, only people who are chairs of some organizations will be returned (Lu et al., 2002, online).
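The "return all the chairs" query quoted above can be illustrated with a toy comparison between keyword matching and ontology-guided matching. This is a minimal sketch under loud assumptions: the resource labels, the `furniture:Chair` and `person:Chair` concept identifiers, and both search functions are hypothetical, and real semantics-based search engines are far more involved.

```python
# Hypothetical annotated resources: each has a label and, where it has been
# marked up, the ontology concept it is an instance of.
RESOURCES = [
    {"label": "oak dining chair", "type": "furniture:Chair"},
    {"label": "Dr. A. Example, department chair", "type": "person:Chair"},
    {"label": "folding chair", "type": "furniture:Chair"},
    {"label": "chair lift schedule", "type": None},  # no semantic markup
]

# Hypothetical ontologies: each maps a query term to the concept it denotes
# within that domain.
ONTOLOGIES = {
    "furniture": {"chair": "furniture:Chair"},
    "person": {"chair": "person:Chair"},
}


def keyword_search(term, resources):
    """Plain keyword matching: any label containing the term is returned."""
    return [r["label"] for r in resources if term in r["label"].lower()]


def semantic_search(term, ontology_name, resources):
    """Resolve the term to a concept in one ontology, then match on concept."""
    concept = ONTOLOGIES[ontology_name].get(term)
    if concept is None:
        return []
    return [r["label"] for r in resources if r["type"] == concept]


print(keyword_search("chair", RESOURCES))
print(semantic_search("chair", "furniture", RESOURCES))
print(semantic_search("chair", "person", RESOURCES))
```

Keyword search returns every label containing the string, including the unrelated chair lift, while the ontology-guided search returns only resources annotated with the concept that "chair" denotes in the chosen domain. The sketch also shows the dependency discussed in this section: the semantic result is only as good as the markup attached to the resources.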
Researchers hope that the Semantic Web will help end-users locate documents that contain a concept that can be described using a variety of terms (semantic level processing): To enhance the search process, reasoning algorithms will be distributed across the Semantic Web. Knowing that "Tony Blair" and "prime minister" are equivalent, the algorithms will deduce that text written by the "leader of the Labor Party" was written by Tony Blair, because the Labor Party currently forms the British government and hence its leader is the prime minister (Warren, 2006, 53-54).
The most challenging part of the vision outlined by Berners-Lee and his colleagues (2001) is to make computers understand what users need and carry out sophisticated tasks (pragmatic level processing) like Pete and Lucy booking a doctor's appointment for their Mom: The [Semantic Web] agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules (p. 34).
Machine processibility at various levels in the Semantic Web world would be achieved, as currently conceived, by means of ontologies that express semantic relations within a domain, new markup languages to turn ontologies into machine-understandable languages, and annotation tools to mark up Web pages with machine-processable semantic metadata. It is no small undertaking, and it inevitably faces serious challenges.

Challenges
It has been five years since the vision of the Semantic Web was laid out in Berners-Lee's 2001 Scientific American article and fifty years since the term Artificial Intelligence (AI) was coined by John McCarthy at the 1956 Dartmouth Conference (Shadbolt et al., 2006). In spite of the progress made, the Semantic Web remains a vision. A good-quality Semantic Web with wide coverage has not yet appeared (McCool, 2005). The number of Web pages written in semantic markup languages is very small (Lee and Goodwin, 2005). McCool traces the root of Semantic Web challenges to the technique of knowledge representation, i.e., Edgar Codd's work using set theory and predicate calculus that led to the relational database revolution in the 1980s. According to McCool, knowledge representation (e.g., ontology) uses Codd's mathematical theory to translate information that humans represent with natural language into sets of tables that use well-defined schema to define what can be entered in the rows and columns. It is a technique similar to a database, but with a large number of columns and a relatively sparse set of non-empty cells. Such a complex format requires enormous cost in creation and maintenance, which makes it difficult for the Semantic Web to achieve widespread public adoption.
Given this limitation, McCool (2006) called for a new approach. He cited lessons in simplicity learned from how the World Wide Web was first developed. According to McCool, Berners-Lee developed the Web by taking the salient ideas of hypertext and SGML syntax and removing complexities such as backward hyperlinks, which made authoring, sharing, and copying simple enough for people to adopt quickly. Similarly, the Semantic Web formats must be simplified in order to produce user communities. McCool claimed that instead of a Semantic Web containing classes, relations, and triples, parameters should be added to existing markup tags to generate a named-entity Web (NEW). A radical simplification would be the solution to barriers facing the Semantic Web such as limited participation. NEW would make use of existing Web technologies and provide direct benefits at a far lower participation cost. Hepp (2006) went even further to challenge such a data-centric approach. In his opinion, McCool's lightweight approach to annotating existing Web data (i.e., adding some extra tags to existing Web content) might work for a small part of the Web, but would not make the original Semantic Web vision a reality. Hepp considered building the Semantic Web by means of meddling with existing Web data a flawed idea because it is based on several myths about the Web. First, the common assumption that everything is on the Web and one just needs to find the means to locate it is not true. Second, the business Web is not static, and constant updates would defeat any data-centric annotation. To complicate matters further, the symmetry and strategic aspects of revealing information in the business world (e.g., disclosing information only to seriously interested parties) run counter to the Semantic Web notion that requires data to be persistently published for an unknown audience. Hepp proposed a different approach. He suggested that entities are more willing to expose functionality than data in business settings and urged that more research attention be paid to developing Semantic Web services (i.e., annotating computational functionality) than to annotating Web content data. Hepp advocated a substantial shift from the data-centric approach of annotating information on Web pages to annotating exposed functionality in Semantic Web services technologies.
More important than proposed solutions are the inquiries focusing on the root of the Semantic Web's challenges. In an attempt to tackle the uncontrollable nature of data on the Web, the Semantic Web presents a unique challenge to current knowledge and information representation techniques. Edgar Codd's seminal contributions to the theory of relational databases led to the success of modern database technology, but it is no easy task to turn information represented through natural human language into machine-interpretable data. The key to the success of the Semantic Web, according to McCool (2005), lies in finding this generation's Edgar Codd to solve the representation problem. Representations to be developed under a new theoretical framework must be easy to translate to and from natural language to make semantic representation of human knowledge more a reality than a theory.

Conclusion: The Semantic Web revisited
In spite of all the challenges, Berners-Lee and his colleagues remain optimistic about the future of ontology and the Semantic Web (Shadbolt et al., 2006). They believe in the notion of a Web of data and information (for computers to manipulate) in contrast to the current Web of documents (for humans to read). To them, the dream of the Semantic Web is not only about building a Web of actionable information derived from data through a semantic theory, but also about contributing to a new Web science, which they define as a science that seeks to develop, deploy, and understand distributed information systems, processible by both humans and computers, and operating on a global scale. The future of the Web, as Berners-Lee recently stated (2007), lies largely in its ability to manage, integrate, and analyze data, i.e., individual information elements within documents. With technical innovations like RDF, which identifies and exchanges data, and OWL, which expresses how data sources connect together, the Semantic Web will "enable better data integration by allowing everyone who puts individual items of data on the Web to link them with other pieces of data using standard formats" (Berners-Lee, 2007, online). Ultimately, a sophisticated, granular, and machine-processible semantic representation system will evolve in support of this grand mission.
summarized Hodge's list of systems into the following taxonomy: