Great Idea, but how do I do it? A practical example of learning object creation using SGML/XML

The educational community is interested in learning objects: what they are, how they are used, and the many benefits derived from their use. Most educators are familiar with the value of learning objects in theory but, on the practical side, wonder what is involved in creating them. This paper offers a "how-to" of learning object implementation for text, based on four years' experience working with the Open Learning Agency’s structured content development model. Throughout the paper, the analogy of a yogurt container is used to help illustrate the concepts behind implementing a structured content development model.


Introduction
The format and size of a learning object depend on the need the object was created to meet, including its potential for reuse. The value of learning objects is in their reuse.

The Open Learning Agency (OLA) is a publicly funded BC learning establishment involved in the development and delivery of distributed learning materials for university, college, K-12, and workplace training. OLA has been using a structured content development model to create its text-based learning objects since 1998 and, in the fall of 1999, published fourteen senior secondary courses developed using this model. The model was implemented because course content had to be output in both print and online media. The efficiency of developing and maintaining one instance of content, in a platform- and software-independent format, is an important feature, as is the potential for reuse. For example, content developed for Information Technology grade 11 and 12 courses is output as two secondary courses ( http://k12online.ola.bc.ca/index.html ) and is re-purposed for use in The Learning Lab ( http://www.ola.bc.ca/tll ), a teacher's professional development program (Porter, 2001).

Structured Content Development Model
To implement a structured content development model, the underlying structure for all text learning objects must be designed. The learning objects themselves are authored in accordance with this defined structure using a markup language such as SGML (Standard Generalized Markup Language) or XML (eXtensible Markup Language). Authored content in the form of SGML/XML files resides in a content repository, which has capabilities for search, retrieval, revision, and version control, among other things. Such a repository must be acquired and administered. For output, the SGML/XML file must be transformed for a specified medium. The transformation process requires a programmer to develop custom scripts, a technician to run the scripts once developed, and visual designers to create the look of the output media, such as web pages, print products, and CD-ROMs.
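The transformation step can be illustrated with a minimal sketch in Python. It is not OLA's actual script: the element names (lesson, title, outcome), the sample content, and the two rendering functions are assumptions made for illustration. It shows one source file being transformed for two output media.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source lesson; element names and text are illustrative only.
SOURCE = """<lesson>
  <title>Reading Journal</title>
  <learningoutcomes>
    <outcome>Respond personally to a text</outcome>
    <outcome>Support opinions with examples</outcome>
  </learningoutcomes>
</lesson>"""


def to_html(lesson):
    """Render the lesson for online delivery."""
    items = "".join(f"<li>{escape(o.text)}</li>" for o in lesson.iter("outcome"))
    return f"<h1>{escape(lesson.findtext('title'))}</h1><ul>{items}</ul>"


def to_print(lesson):
    """Render the same lesson as plain text for a print layout."""
    lines = [lesson.findtext("title").upper()]
    lines += [f"  - {o.text}" for o in lesson.iter("outcome")]
    return "\n".join(lines)


lesson = ET.fromstring(SOURCE)
html_output = to_html(lesson)    # online medium
print_output = to_print(lesson)  # print medium
```

A correction made once in the source text appears in both outputs the next time the two functions are run, which is the "single source" benefit the model relies on.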
Developing educational content using a structured content development model requires a great deal of effort. For organizations that create a lot of similar content, the effort is worthwhile because content can be output in a variety of media, with a consistent look within each output medium. Also, the concept of content that can be reused in different contexts, both within and outside an institution, is the underlying theme behind the "economics" of learning objects (Downes, 2000).
write content, but also spend hours formatting so it has a pleasing look. More sophisticated users set up style sheets in their word processing or desktop publishing software in order to apply consistent styles throughout their documents. Usually these styles have names like body text, heading, and subhead. This naming reflects the style applied to the content, not what the content is. For example, body list may be applied to a list of learning outcomes and a list of required resources for a lesson. It may be visually attractive to style these two items in the same way; semantically, however, they are very different. Using a markup language to define the content, the learning outcomes may be identified with a tag called learningoutcomes and the resources with a tag called resources. In output to the specified medium, the content in learningoutcomes and resources could have the same style applied to them, or different styles.

The power of identifying content for what it is lies in the ability to intelligently search content for reuse. For example, an instructor able to search for learningoutcomes can easily and efficiently find and evaluate existing content. In the same way, if a certain textbook is already being used, searching for it in resources is an efficient way to find relevant content.

It is advantageous to use a markup language for educational content because:
• A markup language allows you to define content for what it is, thereby creating a database of content that can be easily searched and reused.
• Defining a structure ensures consistency and completeness in all documents of that type.
• Content is independent of format and style (these are applied at output when the medium is specified).
• Corrections only need to be done in one file, not in every output instance of the document. For example, if a typo is found in a course delivered both in print (developed in desktop publishing software) and online (developed in HTML), the correction must be made in two places. If the course were developed using a markup language, the correction would only need to be made in one place and the two output instances run again. If more extensive changes were required, the benefits of "single source" would be even greater.
• SGML and XML are ASCII files and so are platform and software independent.
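As a rough illustration of searching content "for what it is", the following Python sketch queries a small marked-up course by element name. The learningoutcomes and resources tags come from the discussion above; the lesson and item elements, the id attribute, and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical course fragment; only learningoutcomes and resources are named in the text.
LESSONS = """<course>
  <lesson id="l1">
    <learningoutcomes><item>Describe the water cycle</item></learningoutcomes>
    <resources><item>Hypothetical Science Textbook</item></resources>
  </lesson>
  <lesson id="l2">
    <learningoutcomes><item>Explain evaporation</item></learningoutcomes>
  </lesson>
</course>"""


def lessons_using(course, resource_name):
    """Return the ids of lessons whose resources mention the given title."""
    hits = []
    for lesson in course.iter("lesson"):
        for item in lesson.findall("./resources/item"):
            if resource_name.lower() in (item.text or "").lower():
                hits.append(lesson.get("id"))
                break
    return hits


course = ET.fromstring(LESSONS)

# Every learning outcome in the course, regardless of how it is styled on output.
all_outcomes = [i.text for i in course.findall(".//learningoutcomes/item")]

# Lessons that already use a particular textbook.
textbook_lessons = lessons_using(course, "science textbook")
```

The same two lists could not be told apart if both were tagged only with a visual style such as body list.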
Designing the Structure: Document Type Definitions
The first step in developing a structured content development model is to define the document's structure for the markup language. This structure is referred to as a Document Type Definition (DTD). A DTD is a "rule set" (Maler & El Andaloussi, 1996, p.4) to identify what pieces of content (defined as "elements") are required or allowed at what place in the document. Elements such as learningoutcomes and resources are created, and rules such as "learningoutcomes are required in a lesson, but resources are optional" (there may be no resources required for a lesson) are defined. Once the structure is defined, content is written to the structure using a markup language (SGML or XML). The DTD is a text file that is referenced by the conforming SGML or XML file that holds the authored content. To create a valid SGML or XML file, a parser is used to compare the file to the DTD to ensure it conforms to the structure defined in the DTD.
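The idea of a rule set can be sketched in a few lines of Python. This is only a stand-in for a validating parser, which would read the rules from the DTD itself; here the single rule quoted above ("learningoutcomes are required in a lesson, but resources are optional") is written by hand as a dictionary, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written rule set for a lesson: element name -> is the element required?
# A real validating parser would take these rules from the DTD.
LESSON_RULES = {"learningoutcomes": True, "resources": False}


def check_lesson(xml_text):
    """Return a list of rule violations; an empty list means the lesson conforms."""
    lesson = ET.fromstring(xml_text)
    problems = []
    for child in lesson:
        if child.tag not in LESSON_RULES:
            problems.append(f"unexpected element: {child.tag}")
    for name, required in LESSON_RULES.items():
        if required and lesson.find(name) is None:
            problems.append(f"missing required element: {name}")
    return problems


valid = check_lesson("<lesson><learningoutcomes/></lesson>")
invalid = check_lesson("<lesson><resources/></lesson>")
```

In this sketch, a lesson without resources passes, while a lesson without learningoutcomes is reported as non-conforming.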
Creating a DTD involves a detailed analysis of the current set of documents it needs to represent (Maler & El Andaloussi, 1996). As part of the analysis, group discussions need to be conducted with the individuals involved in the document creation process. This includes working with instructors, course designers, writers, editors, desktop publishers, and end users of the document to find out what is important, what works well, and where the problem areas are. In the analysis, look for patterns and anomalies, and clarify whether anything that breaks a pattern should be allowed. This structure will be the defining blueprint from which content is authored. An SGML or XML file is not valid unless it conforms to the defined DTD, so it is important to involve all stakeholders and be sure they understand that the DTD will define the standard for the organization's document creation. Group consensus is needed in decisions regarding the structure of the document.
As an example, the text for a reading journal activity follows. In analysing the meaning of the content of the activity, three distinct pieces are identified: title, instructions, and criteria. In looking at other activities, one would check whether these three pieces were common to all activities and whether other pieces were needed, such as introduction, associated marks, exemplar, etc. In studying a group of activities, it might also become apparent that one activity type was not sufficient, and that activities should be modelled according to their type. For example, a reflection activity such as the reading journal has a much simpler structure than an activity with set question types such as multiple choice.

Another factor to keep in mind when designing a DTD is the need for structures to hold information that accompanies authored content but is not output for viewing. For example, it might be useful to include information such as author and date of creation for the reading journal activity above. Information such as this is referred to as metadata.
One of the most difficult decisions in designing a DTD is establishing how detailed to make it. The goal is a good representation of structure, with enough rigidity to force compliance and enough flexibility to handle the different nuances within documents: detailed enough to handle all possibilities, yet simple enough to be easy to use. Too detailed a structure is difficult to write content to and more difficult to process for output, because the more elements are defined, the more must be written to and addressed. A loose structure allows elements to be interpreted and used in different ways by different authors. Knowing the authors and their tolerance for following a set structure can be used as a guide to how rigid or loose to make the model. If it is too complex, people will not want to use it.
Development of a DTD is an iterative process. Once it is defined and implemented, users will request changes based on things they need that were not included or things that are not working well for them. The best way to see if the structure will work is to try it out with actual content as a beta project. There is a significant learning curve as users learn to develop within a structured environment. This is to be expected, as change and the establishment of new methods is a lengthy process. It is best to start with a simple structure and build on it as stakeholders and users become more experienced working within a structured environment and more knowledgeable about what it can and cannot do. At OLA we are currently revising our original DTD based on feedback from four years' use.
A DTD is a defined, hierarchical structure that can be thought of as a collection of containers to hold and identify specified objects. Using an analogy, if you wanted to purchase yogurt you would go to a food store and look for the yogurt in the dairy case. The food store can be thought of as a structured environment, a container for food that can be purchased. The dairy case too is a structure, a container for dairy-based foods. The yogurt container is a structure designed to contain yogurt. The labelling on the container provides information about the kind of yogurt. Just as the yogurt is contained within a dairy case and food store, a learning object, such as an activity, is contained within a course structure.
When designing DTDs, diagrams are often used to visualize the structure before writing occurs. Following are two tree diagrams (Maler & El Andaloussi, 1996) of simplified structures. Elements are depicted as boxes. No occurrence indicator beside an element means that the element is required and can occur only once. The "?" means the element is optional (zero or one occurrence), "*" means there can be zero or more elements, "+" means one or more elements are required, and "…" means that part of the diagram is described elsewhere. For example, in the course structure there are three choices under assessment: reflection, practice, and assignment. There must be at least one assignment within assessment, and there can be zero or more reflection and/or practice activities. In this diagram the structure for reflection is modelled; the structures for practice and assignment are not. To keep the examples clear, the structures represented are greatly simplified from how they would be designed for actual application.
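The occurrence rules for assessment can be checked mechanically. The following Python sketch uses the standard DTD occurrence indicators ("?" for zero or one, "*" for zero or more, "+" for one or more, none for exactly one) and encodes the assessment example by hand; the dictionary and function names are assumptions, and a real parser would derive the model from the DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Occurrence indicator -> (minimum, maximum) allowed; None means no upper limit.
INDICATORS = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}

# Simplified assessment model from the text: at least one assignment,
# any number of reflection and practice activities.
ASSESSMENT_MODEL = {"reflection": "*", "practice": "*", "assignment": "+"}


def occurrence_errors(element, model):
    """List the child elements whose counts break their occurrence indicator."""
    counts = Counter(child.tag for child in element)
    errors = []
    for name, indicator in model.items():
        low, high = INDICATORS[indicator]
        n = counts[name]
        if n < low or (high is not None and n > high):
            errors.append(f"{name}: found {n}, indicator '{indicator or 'none'}'")
    return errors


ok = occurrence_errors(
    ET.fromstring("<assessment><reflection/><assignment/></assessment>"),
    ASSESSMENT_MODEL,
)
bad = occurrence_errors(
    ET.fromstring("<assessment><reflection/></assessment>"),
    ASSESSMENT_MODEL,
)
```

The second document fails because assignment is marked "+" and none is present, which mirrors the rule that assignment is a required element within assessment.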

Figure 1. Tree diagram showing simplified food store structure.

("PCDATA" stands for parsed-character data, which means it is text or "data characters" (Maler & El Andaloussi, 1996, p. 22).)

Figure 3. Course DTD segment.

Authoring to a Structure
Once the structure (DTD) is defined, it can be filled with authored content. In our example of a yogurt container as the defined structure, the content is the yogurt. While any type of yogurt could be placed in the container, in this example it is Market's 2% organic plain yogurt. This would be marked up as shown in the following figure.

Figure. Marked-up content for yogurt example.
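As an illustration only, the yogurt example might be built and serialized in Python as follows. The element names (foodstore, dairycase, yogurt, brand, milkfat, kind) are assumptions and are not taken from the original figure.

```python
import xml.etree.ElementTree as ET

# Element names are assumptions; the markup in the original figure is not reproduced here.
foodstore = ET.Element("foodstore")              # outermost container
dairycase = ET.SubElement(foodstore, "dairycase")
yogurt = ET.SubElement(dairycase, "yogurt")      # the container for the content

# The "labelling" that identifies what kind of yogurt is inside.
ET.SubElement(yogurt, "brand").text = "Market's"
ET.SubElement(yogurt, "milkfat").text = "2%"
ET.SubElement(yogurt, "kind").text = "organic plain"

# Serialize the nested containers as one marked-up string.
markup = ET.tostring(foodstore, encoding="unicode")
```

Each container is nested inside a larger one, so the yogurt can be located by its path through the structure (food store, then dairy case, then yogurt).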
Below, the reading journal activity is marked up using the assessment reflection structure as defined in the Course DTD. The original content was revised to make it a standalone learning object: references to other parts of the course (lessons and sections) and media-specific references (workbook) were removed. Information about the activity has also been added, so the content now consists of metadata, learning outcome, instructions for the learner, and evaluation criteria. Some parts of the metadata, such as title, will be output for the learner to view, while others, such as date and description, are only for indexing purposes. A more extensive set of metadata would be used to index a "real" activity, just as a "real" learning object and course would have a much more complex structure than the one shown. To keep things simple, the example does not contain content for assignment, which is a required element within assessment.
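The split between metadata that is displayed and metadata that is used only for indexing can be sketched in Python. The activity text below is placeholder content, not the original reading journal activity, and the element names are assumptions based on the pieces named above (metadata, learning outcome, instructions, criteria).

```python
import xml.etree.ElementTree as ET

# Placeholder content; the original activity text is not reproduced here.
ACTIVITY = """<reflection>
  <metadata>
    <title>Reading Journal</title>
    <date>placeholder date</date>
    <description>Placeholder description used for indexing.</description>
  </metadata>
  <learningoutcome>Placeholder learning outcome.</learningoutcome>
  <instructions>Placeholder instructions for the learner.</instructions>
  <criteria>Placeholder evaluation criteria.</criteria>
</reflection>"""

# Metadata fields shown to the learner; all others are kept for indexing only.
DISPLAYED_METADATA = {"title"}


def learner_view(xml_text):
    """Return the text the learner sees, in output order."""
    activity = ET.fromstring(xml_text)
    lines = [m.text for m in activity.find("metadata") if m.tag in DISPLAYED_METADATA]
    lines += [activity.findtext(tag) for tag in ("learningoutcome", "instructions", "criteria")]
    return lines


view = learner_view(ACTIVITY)
```

The date and description stay in the source file for the repository to index, but never reach the learner's view.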
Part of writing a DTD is writing the accompanying documentation, including an element dictionary (to define all elements in the structure) and a best practices guide (the best way to write content for the structure). Documentation such as this is necessary so that content is structured in a consistent and meaningful manner.
For example, one could argue that the reading journal activity might be more appropriate in the content structure, as it is not formally assessed. If the best practices guide directs authors to put any writing or other creation action of the learner within the assessment structure, and any activity to do with personal opinion in the reflection structure, it is clear where the reading journal activity should go.

In establishing the set of metadata to use for learning objects, one needs to keep in mind whether conformance to a standard or specification is important (how broadly will the learning object be shared?). Decisions also need to be made about which elements of the standard or specification to use (most elements are optional). One has to balance the amount of useful information to hold against the time and effort needed to enter and maintain it. The possible exchange of learning objects within a community of organizations will also affect the choice of metadata.

We find what is inside based on the labelling of the containers (the metadata).
For example, in the case of the reading journal activity, a search could be done on the title element using the keyword "journal". The description element could then be viewed to assess whether or not to investigate the content. A content repository is a database designed to hold structured documents. The repository includes features such as search, edit, access control, version tracking, reuse of elements within other documents through reference, and dynamic delivery to the web. In choosing a repository, consider the integration requirements of systems, such as the learning management system, student registration and administration system, test bank, and external learning institution systems.
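The search just described can be sketched in Python, with a dictionary standing in for the content repository. The object ids, the sample records, and the function name are hypothetical; a real repository would provide its own search interface.

```python
import xml.etree.ElementTree as ET

# A dictionary stands in for the content repository; ids and records are hypothetical.
REPOSITORY = {
    "act-001": "<reflection><metadata><title>Reading Journal</title>"
               "<description>Personal responses to a reading.</description>"
               "</metadata></reflection>",
    "act-002": "<practice><metadata><title>Vocabulary Quiz</title>"
               "<description>Ten matching questions.</description>"
               "</metadata></practice>",
}


def search_titles(repository, keyword):
    """Map the id of each object whose title contains keyword to its description."""
    results = {}
    for object_id, xml_text in repository.items():
        meta = ET.fromstring(xml_text).find("metadata")
        title = meta.findtext("title", default="")
        if keyword.lower() in title.lower():
            results[object_id] = meta.findtext("description", default="")
    return results


matches = search_titles(REPOSITORY, "journal")
```

The descriptions returned for the matching objects can then be read to decide whether the full content is worth retrieving.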

Once the content repository is populated with learning objects, clearly identified by associated metadata, there exists a bank of information to draw upon. This bank may be only within an institution or may be part of a broader community of institutions.

Output
We take the yogurt (content) out of the refrigerator (repository) to use in a variety of ways, such as eating it plain, mixing it with other ingredients to make a salad, or stirring it into a curry. We can do this because we know we have plain yogurt by the labelling on its container (metadata) and plain yogurt can be reused in many ways. Output of learning objects is similar in that they can easily be found (due to their metadata) and reused (due to how they are written). Standalone content is content that is self-contained. Self-containment facilitates use and reuse of learning objects as they may be combined in any number and sequence. To be self-contained, a learning object should not make explicit reference to objects outside itself. For example, the revised reading journal activity does not make reference to a workbook or a specific story or poem.

For example, the reading journal activity could be used for a number of
Directions to read particular works are separate from the activity, so the activity is standalone, which makes its potential for reuse much higher.
Granularity refers to the size of the component that defines a self-contained learning object or "the degree of precision with which learning objects … can be described" (Porter, 2001, p. 49). How granular the object is should be decided upon before writing so the writer knows the level of self-containment to write to.
It could be as large as a course or as small as an activity within a lesson. The smaller the piece, the greater the potential for reuse, but a piece that is too small can be difficult to write to. Also, contextualizing many small pieces in a meaningful way can be challenging.
Writing content that is separate from format and media is also important. This involves avoiding specifics such as "glossary terms will be shown in blue" or "bold face", as this may not be the case in all output media. Language must be as non-specific as possible in terms of format and media. Write to the meaning, not the look or location of the content. Statements such as "look at the green area on the map on the next page" should be written as "look at the Black Land area beside the Nile River on the following map of Egypt". Once a writer becomes familiar with this style of writing, it is not difficult to write standalone content that still provides clear instruction.

As content is self-contained, it is necessary to break it up into distinct chunks where each chunk contains only information relevant to itself. This set of information should be given a meaningful title. Keeping this in mind while authoring contributes to clearer instructional writing as the writer strives to separate concepts into discrete pieces rather than mixing them (Kilian, 2000).

At OLA writers are typically contracted so it is unreasonable to expect them to purchase and learn to use a structured authoring tool for a short-term contract.
Our solution is to provide a guidelines document explaining the structured development model along with word processing templates which define the structure (in broad terms). The files from the writers are converted into markup by a team of production staff dedicated to "tagging" content. Having a team of dedicated staff results in consistent markup and frees the writer to concentrate on the content. Throughout the process, however, the writer must be guided to provide content that will fit the defined structure of the model.

Team Support and Education
In and contributors, date, revision history, and copyright; they just don't think of it as metadata.) Select a metadata application profile, like CanCore, and choose the elements that are appropriate to identify the content. The content itself does not have to be part of the database; a reference to where it resides is enough. This way the idea of searching and reusing learning objects can begin to happen. As user sophistication about metadata grows, the metadata set can be extended.
Much of the work to date has concentrated on defining the metadata around the learning object, not on the structure of the object itself. For experience in using markup languages, a DTD could be written for the metadata set. Once the DTD is written, the metadata for the content can be structured in a markup language like XML.
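A first metadata record of this kind might look like the following Python sketch. The field names are a hypothetical minimal set and are not actual CanCore element names; the location field holds only a reference to where the content resides, and the path shown is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal metadata set; these are NOT actual CanCore element names.
FIELDS = ("title", "contributors", "date", "copyright", "location")


def metadata_record(**values):
    """Build a metadata element, refusing records with missing fields."""
    missing = [f for f in FIELDS if f not in values]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    record = ET.Element("metadata")
    for field in FIELDS:
        ET.SubElement(record, field).text = values[field]
    return record


record = metadata_record(
    title="Reading Journal",
    contributors="placeholder contributor",
    date="placeholder date",
    copyright="placeholder copyright statement",
    location="courses/english/reading-journal.xml",  # invented path to the content
)
record_xml = ET.tostring(record, encoding="unicode")
```

Because the record only points to the content, an institution can start indexing existing materials before any of them are converted to a markup language.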

Conclusion
As more institutions become involved in e-learning, the wealth of available learning objects will grow, and finding an efficient way to handle interoperability (the identification and exchange of educational content) will become a crucial issue. Using a markup language such as XML to define the metadata and the object itself is an obvious solution. In a structured
• Identifying priority areas for Canadian involvement in national and international e-learning specifications and standards activities to ensure a coordinated and effective role for Canada.
• Establishing a focal point for coordinating and disseminating information on educational standards and specifications.
• Promoting and encouraging networking between key players in the field, both within Canada and internationally, to leverage insights and knowledge and add weight to Canadian priorities internationally.