4 DOI® Data Model

This chapter explains the basis for the second main technical component of the DOI® System, the DOI Data Model, and its ability to ensure interoperability of DOI name metadata assigned through existing metadata schemes. The chapter gives an overview of the system, and then separate sections discuss the aims of the DOI Data Model policy -- interoperability and good administration -- and the three tools of the Metadata System -- kernel metadata, the data dictionary and schemas for metadata interchange. Readers are advised to consult the Glossary of Terms at the start of the Handbook in conjunction with this chapter. For RAs and those wishing to explore this further, more extensive discussion of the issues and detailed specification of the components of the DOI Data Model are found in Appendices 4-6.
4.1 Overview of the DOI® Data Model

Without metadata, an identifier is of very little value. Metadata, which may be defined in this context as information about an identified Referent, provides human beings or machines with the data they need to enable them to make use of that identified Referent. Metadata may include names, identifiers, descriptions, types, classifications, locations, times, measurements, relationships and any other kind of information related to a Referent. For a fuller review of the relation of metadata to the DOI System in general see the factsheet "DOI® System and data dictionaries".

There are two ways in which every IDF Registration Agency is bound to deal with metadata. An RA will gather input metadata from Referent providers (typically, descriptions of the Referents and associated rights and policies); and an RA will need to provide some level of output or service metadata to support DOI System services. Input metadata will provide some, but not necessarily all, of the service metadata. In some cases, a metadata declaration will itself be a complete DOI System service (for example, "provide an ONIX Product message for this Referent"). These two flows of metadata declarations are illustrated in figure 1.
Figure 1: Flows of metadata within the RA network

DOI System policy places no restrictions on the form and content of an RA's input and service metadata declarations, except insofar as input metadata must support the minimum requirements implicit in the DOI Kernel (see below). RAs may specify their own metadata schemes and messages, or use any existing schemes in whole or part for their input and service metadata declarations.

DOI data model policy is concerned with the internal management and exchange of metadata between RAs within the "RA network", and is designed to achieve two aims:
- to promote interoperability within the network of DOI name users;
- to ensure minimum standards of quality of administration of DOI names by Registration Agencies, and facilitate the administration of the DOI System as a whole.
The DOI Data Model has three tools to support its metadata policy:
- the DOI Kernel Metadata Declaration (Kernel metadata);
- the data dictionary (the indecs Data Dictionary, iDD);
- the Referent Metadata Declaration (RMD), a schema for metadata interchange.
The responsibilities of RAs can be summarized in these three statements:
These responsibilities are not mandatory for all DOI names: exceptions are discussed in terms of the requirement for interoperability described in the next section.

4.2 Aims of DOI Data Model policy

The first aim of DOI Data Model policy is to promote interoperability within the network of DOI name users. It does this by providing ways of achieving "semantic compatibility" between different RAs described in this chapter.

Standardization of any kind is driven by a need for interoperability. If an RA is issuing DOI names for Referents for use within a private domain where that RA is able to command all aspects of metadata gathering and output, then it has no need for standardization or conformance with DOI Data Model obligations. The RA will lay out its schemas and declarations, and its providers and users will, hopefully, conform to them. Such a situation is described as restricted use of the DOI System, and applies typically where an organization becomes an RA for the specific purpose of issuing DOI names for use only within its own private organization. Restricted use is discussed more fully in section 6.5 of the Handbook.

However, such isolation is unusual. Normally, when a DOI name is issued to a Referent, one fundamental assumption may be made about interoperability: the RA or the Referent provider may wish (now or in the future) that the DOI name should be available for use in services provided by other RAs. For example, where several RAs are issuing DOI names to journal articles from different publishers, it is likely that some RAs and publishers will want their DOI names to be included in journal-related services supported by other RAs. In a similar way, many RAs will want DOI names issued by other RAs to be available for inclusion in services they themselves are providing. Such interoperability is one of the principal benefits of the DOI System.
As the RA network grows, such requirements are emerging, and where specific opportunities do not yet exist they are anticipated. In such circumstances neither the RA nor the Referent provider wishes to issue a second DOI name for the Referent, nor to provide and capture the input metadata all over again from its source. In addition, some DOI System services may not, in future, be the direct responsibility of RAs. Any service provider making use of DOI names issued by different RAs under different Application Profiles will be faced with the question of metadata interoperability. Any DOI name which is intended for interoperability -- that is, which has the possibility of use in services outside of the direct control of the issuing RA -- is subject to DOI Data Model policy. The aim of metadata interoperability can therefore be expressed in these two objectives:
The first objective is dealt with by the DOI Kernel, and the second by the interchange provisions of the RMD and iDD. The above provisions do not apply to DOI names registered under the legacy "Zero AP" described in Chapter 5.

4.2.2 Administrative capability

The second aim of DOI Data Model policy is "To ensure minimum standards of quality of administration of DOI names by Registration Agencies, and facilitate the administration of the DOI System as a whole". This aim may also be seen as supporting the first aim of interoperability, but it specifically addresses the need to ensure that a prospective RA is competent to issue DOI names responsibly and that ambiguous DOI names do not enter the network.

The Data Model policy provides a simple test for an RA's competence: the ability to make a DOI® Kernel Declaration, which ensures that the RA has an internal system which can support the unambiguous allocation of a DOI name and is fundamentally sound enough to support interoperability within the network. In addition, Data Model policy also requires that an RA maintains records of the date of allocation of a DOI name and the identity of the registrant on whose behalf the DOI name was allocated.

The metadata policy also exists to support the future development of mechanisms for facilitating the administration of the DOI System as a whole. This might be done, for example, through the use of iDD-registered terms as types to classify DOI names, Services or Application Profiles.

4.3 DOI Data Model tools

4.3.1 DOI Metadata Specification

Data Dictionary

The Digital Object Identifier Registration Authority shall provide a data dictionary as the repository for all data elements and allowed values (the items which may be used as values of each element) used in DOI metadata specifications. Further details shall be provided in the User Manual.
The data dictionary shall enable the definition within an ontology of all metadata elements to be available to all registration agencies, and provide the mappings to support metadata integration and transformations required for data interchange between registration agencies. If a registration agency wishes to consolidate metadata provided by several other registration agencies for a specific service, the data dictionary will provide the data mappings required to enable the registration agency to present the consolidated metadata as if from a single set. The data dictionary shall also contain mappings of other relevant schemes, as determined by the Digital Object Identifier Registration Authority (such as ISO codes for territories, currencies and languages). All allowed values used by a registrant in Kernel Metadata shall be registered in the data dictionary.

DOI Kernel Metadata Declaration

Assignment of a DOI name shall require the Registrant to record metadata describing the object to which the DOI name is being assigned. At minimum this shall consist of a DOI Kernel Metadata Declaration. Values of each kernel element shall be drawn from a set of allowed values specified by the Digital Object Identifier Registration Authority.

Table 1 shows the basic descriptive elements in a DOI Kernel Metadata Declaration (also known as the DOI Kernel). The formal specification of the DOI Kernel Metadata Declaration is given in an XML schema maintained and published by the Digital Object Identifier Registration Authority.
Table 1: Descriptive elements of the DOI Kernel Metadata Declaration

Table 2 shows the basic administrative elements in a DOI Kernel Metadata Declaration. These elements relate to the issuance of the DOI name and to the registration record itself.
Table 2: Administrative elements of the DOI Kernel Metadata Declaration

The Digital Object Identifier Registration Authority shall specify the Kernel elements that will be used in common by all DOI registration agencies and shall issue prescribed sets of allowed values for such elements. For other elements and sub-elements, DOI registration agencies can develop and use their own choice of values as needed. Each DOI registration agency shall register such value sets in the data dictionary specified by the Digital Object Identifier Registration Authority in order to facilitate the integration of DOI data from different sources by a common application.

The Kernel Declaration, which is formally specified in an XML schema, answers a number of basic questions about the identified Referent (see Table 3). The answers to these questions should all be known by the RA at the time the DOI name is issued: if they are not, it is questionable whether the DOI name has been allocated unambiguously.
Table 3: Kernel elements

There may also be a few questions about the issuing of the DOI name and Kernel itself (Table 4):
Table 4: Administrative Kernel elements

The Kernel has one major function: it ensures that a basic set of interoperable, descriptive metadata exists so that DOI names can be discovered and disambiguated across multiple services and Application Profiles in a coherent way. The "AP1" Application Profile for Kernel metadata is under development to enable access to Kernel metadata for any DOI name. It is not mandatory that all DOI names should be accessible through such a service: but no such cross-network tool, however limited, would be feasible without a standard such as the Kernel.

Values of some Kernel elements (names and identifiers) are simply data strings. The other elements are drawn from sets of allowed values: for example, an agentRole might be "Publisher", "Composer" or "Distributor". These values may be expressed in different ways in code lists or "pick-lists", and they may be more or less well defined, but what they share is that, for interoperability to succeed, the values used by different providers or RAs must be reconciled at some point through mapping.

Two Kernel elements (structural Type and mode) have a small, prescribed set of allowed values which all RAs must recognize. For the other elements and sub-elements, RAs may use their own choice of values, and add to them as and when required. These value sets must be registered in the data dictionary (iDD) for mapping purposes, so that any application using Kernel metadata from more than one source may be capable of presenting an integrated set of values to its users.

The use of certain standard values and the registration and mapping of other key values has another essential purpose: to ensure that metadata from RAs is not fundamentally inconsistent. For example, if one RA is issuing DOI names for digital fixations of journal articles, and another is issuing DOI names for abstractions of the same articles, the two cannot be used in the same way in the same service.
Such distinctions are by no means self-evident, and unless they are made explicitly, using a common or mapped vocabulary, confusion is inevitable. As the RA network grows such confusion would result in costly problems and constraints on commerce.

The Kernel Declaration described here applies to referents in the form of Creations (items of intellectual property which represent the scope of early DOI System implementation). However, other types of referent (such as Parties and Places) are also necessarily involved in intellectual property transactions and may in principle be identified by DOI names. As DOI names are applied to entities other than Creations, an appropriate Kernel will be defined.

Kernel metadata for all DOI names may be published under Application Profile AP1. Technical arrangements for the provision of Kernel metadata records through a generalized Kernel Metadata service are under development. The detailed specification and XML schema for the Kernel Declaration are given in Appendix 6.

4.3.3 indecs Data Dictionary (iDD)

The indecs Data Dictionary (iDD) is under development as the repository for all data elements and allowed values used in Kernel Metadata declarations and Referent Metadata Declarations (RMDs). The iDD enables the definition and ontology of all metadata elements to be available to all RAs, and provides the necessary mappings to support metadata integration and transformations required for data interchange between RAs. For example, if an RA wishes to consolidate metadata provided by several other RAs for a specific service, the iDD will provide the data mappings required to enable the RA to present the consolidated metadata as if from a single set. iDD also contains mappings of "third party" schemes such as ONIX, the MPEG-21 Rights Data Dictionary and ISO Territory, Currency and Language codes.
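The consolidation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the iDD itself: the RA names (`ra_a`, `ra_b`), the local role codes, the `term:` identifiers, the DOI names and the functions `to_shared_term` and `consolidate` are all hypothetical, and a flat lookup table stands in for the much richer contextual mappings a real dictionary would hold.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a data dictionary: (RA, local value) -> shared term.
# None of these names or identifiers are taken from the iDD.
DICTIONARY = {
    ("ra_a", "Publisher"): "term:publisher",
    ("ra_b", "PUB"): "term:publisher",
    ("ra_a", "Composer"): "term:composer",
    ("ra_b", "CMP"): "term:composer",
}


@dataclass(frozen=True)
class AgentRecord:
    """One agent entry as supplied by a single RA."""
    source_ra: str
    doi_name: str
    agent_name: str
    agent_role: str  # local value, meaningful only within source_ra


def to_shared_term(record: AgentRecord) -> str:
    """Map an RA-local role value to the shared dictionary term."""
    key = (record.source_ra, record.agent_role)
    if key not in DICTIONARY:
        # An unregistered value cannot be reconciled with other sources.
        raise KeyError(f"{record.agent_role!r} from {record.source_ra} is not registered")
    return DICTIONARY[key]


def consolidate(records):
    """Group agent names by DOI name and shared role term."""
    merged = {}
    for rec in records:
        term = to_shared_term(rec)
        merged.setdefault(rec.doi_name, {}).setdefault(term, set()).add(rec.agent_name)
    return merged


# Two RAs describe the same (hypothetical) DOI name with different local codes.
records = [
    AgentRecord("ra_a", "10.1000/example.1", "Example Press", "Publisher"),
    AgentRecord("ra_b", "10.1000/example.1", "Example Press", "PUB"),
    AgentRecord("ra_b", "10.1000/example.2", "A. Composer", "CMP"),
]
consolidated = consolidate(records)
```

Because both local codes resolve to the same shared term, the two records for the first DOI name collapse into a single entry, which is the sense in which consolidated metadata can be presented "as if from a single set". Raising an error for unregistered values is a design choice of this sketch; it reflects the requirement that value sets be registered before they can be mapped.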
The iDD is based on a contextual metadata framework developed under the <indecs> project to support interoperability of multiple metadata schemes (the IDF was a partner in the original indecs activity). The contextual structure of iDD supports mapping and transformation in a richer and more comprehensive way than conventional one-to-one "crosswalks". It is explicitly designed to enable metadata to be expressed in the simplest or most complex ways and transformed from one to the other.

iDD is a structured ontology compliant with logical axioms and constructors common to ontology languages such as W3C's OWL (Web Ontology Language). It can, for example, support the production of legal OWL ontologies.

All allowed values used by an RA in its Kernel Metadata, and all data elements used by an RA when mapping to an RMD, must be registered in the iDD. The iDD is administered on behalf of the IDF by an agency appointed for the purpose. Each iDD-registered Term will have its own DOI name to support DOI System services accessing the dictionary.

A more detailed description of the iDD is given in Appendix 4. See also the factsheet "DOI® System and data dictionaries".

4.3.4 Referent Metadata Declaration (RMD) for metadata interchange

A DOI Referent Metadata Declaration (RMD) is a message designed specifically for metadata exchange between RAs. The format may also be used for input or service metadata, but it is not intended as a replacement for other domain or service specific schemes. An RMD is in the form of an XML document which conforms to an XML Schema (xsd). All its elements and allowed values are mapped into the iDD. The first RMD ("Journal-RMD") was designed in the spring of 2004 for exchange of journal metadata used by several RAs to support different services.

An RMD may be developed for any "domain", which may be defined in any way that a group of RAs requires.
Typically these are expected to be for domains such as "eBooks", "sound recordings", "multimedia rights" or "educational coursepacks", which may be centred on a type of referent, or sector or function, supporting any group of Application Profiles or DOI System services. Interoperability of RMDs will be ensured by a common structure and the underlying dictionary.

The RMD uses a generic metadata structure of ten basic data element classes, developed from the <indecs> framework model and designed to incorporate all types of Referent metadata in a structured and flexible way. Table 3 shows the ten RMD basic elements, and to which class each of the more specialized Kernel elements belongs:
Table 3: RMD basic element classes

Subtypes can be added to the ten RMD elements to any level of granularity: for example, an identifier might have a subtype of ISBN, or vehicle registration number, and a relative might be a page or an edition. The elements can be nested in any way required: for example, a place may have a name which has an annotation, or an agent may have a category which has an identifier. Elements can be grouped together in any combination in composite elements.

RMDs may incorporate data elements, allowed values, codes and composites from any other standard or proprietary message or metadata schemes (for example ONIX, SCORM or MARC) and draw on standard ISO codes and formats for Languages, Territories, Currencies, Measures and Dates and Times. All element types and allowed values for an RMD are registered in the iDD. Every RA wishing to make use of an RMD must register the corresponding data elements and values in its own database to ensure reliable mapping by other RAs.

A set of standard element groups or "composites" will be developed to form a core XML schema so that these composites can be re-used in different RMDs. The generic RMD structure and early iDD vocabulary is particularly appropriate for multimedia referent and rights metadata, but is extensible to any Referent and domain.

RAs are free, of course, to use existing standards to communicate metadata between them where they are suitable. If, for example, two RAs are providing services requiring ONIX metadata, then one would be expected to provide ONIX messages to the other. Likewise, one RA may wish to make different metadata records available to its users: a MARC-based RA may provide users with ONIX metadata records supplied by another RA for the same DOI name. The RMD is not a replacement for these; it deals with a different issue: the integration of metadata from RAs and other sources using different standards (or none) where it is required.
The RMD and iDD combine to provide a generic solution for this problem, ensuring that all such interchange schemas within the RA network are themselves compatible and maximise the opportunities to re-use data and formats. An RMD is developed with contributions from two or more RAs. RMDs are available for use by any RA. Any RA making use of a specific RMD may contribute to the editorial development of the RMD. An RMD will include the metadata elements required for all nominated services by any participating RA. Specific data elements within an RMD may be required only for specific RAs or Application Profiles, enabling the same RMD to be used flexibly within a community. The metadata flows between RAs, using RMDs and the iDD, are illustrated in figure 2 below.
Figure 2: Flows of metadata within the RA network

More detailed discussion of the RMD data elements and structure can be found in Appendix 5.
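The subtyping and nesting of RMD elements described in section 4.3.4 can be sketched as a small recursive data structure in Python. This is an illustration only, not the RMD XML schema: the `Element` class, its field names and the bracketed path notation are assumptions made for the example, and the element classes used are limited to those the text itself mentions (identifier, place, name, annotation). All values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """A generic metadata element (hypothetical model, not the RMD schema)."""
    element_class: str                 # e.g. "identifier", "place", "name"
    value: Optional[str] = None
    subtype: Optional[str] = None      # e.g. "ISBN" for an identifier
    children: List["Element"] = field(default_factory=list)

    def add(self, child: "Element") -> "Element":
        """Nest a child element and return self so calls can be chained."""
        self.children.append(child)
        return self

    def paths(self, prefix: str = ""):
        """Yield a slash-separated path for this element and each nested one."""
        label = self.element_class
        if self.subtype is not None:
            label = f"{label}[{self.subtype}]"
        here = f"{prefix}/{label}" if prefix else label
        yield here
        for child in self.children:
            yield from child.paths(here)


# "A place may have a name which has an annotation."
place = Element("place").add(
    Element("name", value="placeholder name").add(
        Element("annotation", value="placeholder note")
    )
)

# "An identifier might have a subtype of ISBN."
isbn = Element("identifier", value="placeholder", subtype="ISBN")

place_paths = list(place.paths())
isbn_paths = list(isbn.paths())
```

Keeping a single element type with an optional subtype and arbitrary children mirrors the idea that the same small set of classes can be refined and nested to any depth, rather than defining a separate structure for each specialized element.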