Outstanding

This document describes a coding of email and other messages in XML, for use by XML applications that deal with such messages.


Introduction
This document describes a coding of email and similar messages (such as RFC822 [1]) using XML [2], described here as the Email+XML message format.
This document presents a design that can be used by XML applications that deal with email and similar messages. The design goals include:
o to be based on existing XML [2] and XML namespace [9] specifications.
o to allow header information to be compatible with RDF format [10], for use by generalized metadata processing applications.

Structure of this document
Section 2 describes the overall message structure, showing how the message header and message content can be conveyed in MIME and XML transfer environments.
Section 3 describes the message header in greater detail, with particular reference to differences in the value of individual fields compared with their RFC822 counterparts.
Section 4 discusses issues that may arise when converting between traditional RFC822 and the Email+XML message format described here.
Appendix A contains a MIME content-type registration for Message/Email+XML.

Klyne Internet draft [Page 3]
XML coding of RFC822 messages 9 April 2002 <draft-klyne-message-rfc822-xml-03.txt>
Appendix B contains a DTD for the Email+XML message format.
Appendix C contains an XML schema for the Email+XML message format. (XML schemas are set to replace DTDs as the preferred way to describe XML document content.)
Appendix D briefly discusses the RDF representation [10] and its applicability to the Email+XML message format.
Appendix E contains an RDF schema [23] description for the Email+XML message format.

Document terminology and conventions
Message
   an assemblage of information that constitutes a communication of information from a sender to one or more recipients. Consists of a message header and message content.
Message header
   contains information about the message that is conveyed between message user agents, and not used by the message transfer mechanisms. This may include who the message is from, who it is addressed to, other parties to whom it has been copied, subject of the message, date the message was composed, etc.
Message content
   some arbitrary data carried in a message.
Email+XML
   is the message format defined by this document. (This name uses the XML content type labelling convention [11].)
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [19].
There is, of course, some overlap in capabilities, and reasonable people may disagree about the appropriateness of using MIME and/or XML in particular circumstances.
This document is predicated on the idea that XML is a useful mechanism (in addition to existing facilities) for structuring message header information. It aims to be agnostic with regard to using MIME or some other framework for composing and encapsulating messages.
o The message header contains information about the message: who it was sent by, who it is addressed to, its subject, date it was sent, and many other related pieces of information.
o The message content is any data that is carried by the message: e.g. a text message, fax image, voice message or arbitrary application data. In principle, any data that can be transferred as a MIME object can be message content, though specific applications may limit the kinds of data that can be transferred.
The Email+XML message format uses a URI-reference [3] in the message header to reference the message content. Thus, the message content may be completely separate from the message header; the message header is the root information of a message, from which message content may be discovered.
Two specific message structure scenarios are contemplated here:
o Multipart/related, and
o An XML element within the message header.
These are described below. Other message structures are possible (e.g. multiple resources on a web server, multiple channels in a multiplexed protocol), but are not described here.

Message header overview
The message header is an XML document whose root element is <Message>. This contains a number of elements; an initial set of such elements is defined based on RFC822 message headers [1].
The message content is indicated by an attribute of the <Message> element whose value is a URI-reference for the content.
The message header is discussed in greater detail in section 3 below.
The Multipart/related content-type header indicates the root of the message by its Content-ID value [6]. In turn, the message header refers to the message content with a <Message> element 'content=' attribute whose value is a 'cid:' URI [16].
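As a minimal sketch of the Multipart/related arrangement, the following Python fragment builds a two-part message in memory and follows the 'cid:' reference from the message header to the message content. The Content-ID values, the sample text, and the use of the Multipart/related 'start' parameter to name the root part are assumptions made for illustration; the helper name find_part is hypothetical.

```python
import xml.etree.ElementTree as ET
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from urllib.parse import unquote

# Header part: an Email+XML <Message> whose 'content=' attribute is a cid: URI.
# (Sample identifiers are invented for this sketch.)
header_xml = (
    '<emx:Message xmlns:emx="URN:ietf:params:email-xml:" '
    'content="cid:body@example.org"/>'
)
header_part = MIMEBase("message", "email+xml")
header_part.set_payload(header_xml)
header_part["Content-ID"] = "<header@example.org>"

# Content part: plain text carried as a separate MIME entity.
body_part = MIMEText("Hello, world", "plain")
body_part["Content-ID"] = "<body@example.org>"

# The container names the root (header) part by its Content-ID.
container = MIMEMultipart("related", start="<header@example.org>")
container.attach(header_part)
container.attach(body_part)


def find_part(message, content_id):
    """Return the first part whose Content-ID matches, or None."""
    for part in message.walk():
        if part.get("Content-ID") == "<" + content_id + ">":
            return part
    return None


# Step 1: locate the root part from the container's 'start' parameter.
root_part = find_part(container, container.get_param("start").strip("<>"))
# Step 2: read the 'content=' attribute of the <Message> element.
message_element = ET.fromstring(root_part.get_payload())
content_ref = message_element.get("content")
# Step 3: resolve the cid: URI to the part carrying the message content.
content_part = find_part(container, unquote(content_ref[len("cid:"):]))
content_text = content_part.get_payload()
```

The sketch never serializes the message; it only shows the chain of references from container to header to content.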

Inline XML message structure
When the message content can be expressed as simple text or XML, it may be included within the message header using a <content> element containing the message content instead of a 'content=' attribute.
The document may contain <?xml?> and <!DOCTYPE> declarations, but these are not required.
The body of the document is a <Message> element, as described below.
The character set encoding used in a Message/Email+XML entity is UTF-8.
A Content-type registration template for Message/Email+XML is contained in Appendix A of this document.

Message header
The Email+XML message header contains header fields based on RFC822, and coded in XML.
The message header contains information about the message that is conveyed between message user agents, and not used by the message transfer mechanisms. This may include who the message is from, who it is addressed to, other parties to whom it has been copied, subject of the message, date the message was composed, etc.
The message header also contains a reference to the message content, as described in the previous section.

The <Message> element
The <Message> element contains the message header, and references the message content. Header field elements may appear in any order. When present, the <content> element MUST be the last one in the <Message>.
The <Message> element MUST contain either a 'content=' attribute or a single <content> element. It must not contain both.
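The structural rules above can be checked mechanically. The following is a minimal sketch, assuming the namespace name given in this document and an unprefixed 'content' attribute; the function name check_message and the sample messages are hypothetical.

```python
import xml.etree.ElementTree as ET

EMX = "URN:ietf:params:email-xml:"  # namespace name as given in this document


def check_message(root):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if root.tag != "{%s}Message" % EMX:
        return ["root element is not <Message>"]
    children = list(root)
    contents = [c for c in children if c.tag == "{%s}content" % EMX]
    has_attribute = root.get("content") is not None
    if has_attribute and contents:
        problems.append("both a 'content=' attribute and a <content> element")
    if not has_attribute and len(contents) != 1:
        problems.append("need a 'content=' attribute or one <content> element")
    if contents and children[-1] is not contents[-1]:
        problems.append("<content> is not the last element in <Message>")
    return problems


NS = ('xmlns:emx="URN:ietf:params:email-xml:" '
      'xmlns:rfc822="URN:IANA:namespace:rfc822:"')

by_reference = ET.fromstring(
    '<emx:Message %s content="cid:body@example.org"/>' % NS)
inline = ET.fromstring(
    '<emx:Message %s><rfc822:subject>Hi</rfc822:subject>'
    '<emx:content>Hello</emx:content></emx:Message>' % NS)
misplaced = ET.fromstring(
    '<emx:Message %s><emx:content>Hello</emx:content>'
    '<rfc822:subject>Hi</rfc822:subject></emx:Message>' % NS)
```

A DTD or XML schema can express some of these constraints, but the either/or relationship between an attribute and an element is easier to state in code.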

Use of XML namespaces
The <Message> element, <Address> and related element names, the <content> element and <Message-content> element names are all associated with a namespace called 'URN:ietf:params:email-xml:'. RFC822 header element names are associated with a namespace called 'URN:IANA:namespace:rfc822:'. (These namespace identifiers are based on "A URN Sub-namespace for Registered Protocol Parameters" [20].) The namespaces must be declared, either as a default namespace or using a namespace prefix (which is an arbitrary local name). The namespace declaration may appear as an attribute of the <Message> element, or in the surrounding XML context.

The message examples in section 2 use namespace prefixes 'emx:' and 'rfc822', but any prefix could be used here. Here is a different message example using a default namespace rather than a namespace prefix for the non-RFC822-derived names:
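To illustrate that the choice between a prefix and a default namespace is purely local, the sketch below parses two hypothetical messages (the subject text and 'cid:' value are invented, and neither is an example taken from this draft). They differ only in how the Email+XML namespace is declared, and a namespace-aware parser reports the same qualified names for both.

```python
import xml.etree.ElementTree as ET

# Email+XML names carry the 'emx:' prefix.
prefixed = """<emx:Message xmlns:emx="URN:ietf:params:email-xml:"
    xmlns:rfc822="URN:IANA:namespace:rfc822:"
    content="cid:body@example.org">
  <rfc822:subject>Hello</rfc822:subject>
</emx:Message>"""

# Email+XML names use the default namespace; RFC822 names keep a prefix.
defaulted = """<Message xmlns="URN:ietf:params:email-xml:"
    xmlns:rfc822="URN:IANA:namespace:rfc822:"
    content="cid:body@example.org">
  <rfc822:subject>Hello</rfc822:subject>
</Message>"""


def qualified_names(xml_text):
    """List element names in document order as '{namespace}local-name'."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


names_prefixed = qualified_names(prefixed)
names_defaulted = qualified_names(defaulted)
```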

The <content> element
The <content> element is used to include the message content as text or XML data in the message header. It is present when the <Message> element does not have a 'content=' attribute.
Possible <content> attributes are:
o 'type=' is optional, and indicates the MIME content-type of the message content. If not specified, a content type of "text/xml" is assumed. (Whatever MIME content-type may be declared, the message content must be well-formed XML or character data. In practice, this means the content must be some character-based data representation.)
o 'xml:lang=' [2] may be used, in which case it specifies the language of the message content.

The character encoding for the message content is the same as that used for the surrounding XML. (This is typically UTF-8, from the character set encoding of the MIME content-type Message/Email+XML.)
The message content may be any well-formed XML, which includes simple character data. Characters '<' and '&' that are not part of XML markup MUST be represented as '&lt;' and '&amp;' respectively. The character '>' appearing in the sequence ']]>', other than at the end of a CDATA section, MUST be represented as '&gt;'.
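A minimal sketch of these escaping rules, using the Python standard library: escape() replaces every '&', '<' and '>' in character data, which is stricter than required for '>' but always safe. The function name encode_character_data and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def encode_character_data(text):
    """Escape '&', '<' and '>' so the text can be placed inside an element."""
    return escape(text)


raw = "if a < b && c then x]]> end"
encoded = encode_character_data(raw)

# Round trip: an XML parser recovers the original characters.
decoded = ET.fromstring("<content>" + encoded + "</content>").text
```

Escaping every '>' avoids having to detect the ']]>' sequence specifically.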

General form of header field elements
Each header field is represented by an XML element that identifies the field.
The element content is the header field value. For RFC822 and MIME header fields, the field value is character data in which the characters '<', '&' and '>' are represented as for character data in <Message-content> (see above).

RFC822-derived header elements
For representing information about email messages, this specification introduces message header elements with names and semantics based on RFC822 header fields [1]. The intent is that the semantics of any RFC822 header field is easily represented in an Email+XML header element; it is not a goal to capture the detailed syntax of any particular RFC822 message, or to construct a corresponding RFC822 message from any Email+XML message.
RFC822-derived header elements have names based on RFC822 header names, using all lower-case characters (noting that XML element names are case sensitive).
o Special considerations apply to fields containing human-readable text values (subject, comments, etc.); see section below.

Header fields containing addresses
Parts of an RFC822 address value are separated out into separate elements, all contained within an <Address> element. The element types defined here are <adrs> and <name>.
A major change from RFC822 is that all addresses are presented as URIs, rather than as RFC822 'addr-spec' values. Email addresses (the only kind that appear in RFC822 headers) are expressed as 'mailto:' URLs [21]. Address URIs are enclosed in an <adrs> element.
This change anticipates that XML-based message headers may be used with a variety of different protocols with different addressing schemes.
Finally, only one address per message header element is allowed (or an address group: see below). Where permitted, multiple values are represented by repeating the header element for each value.
Note that characters in URIs are drawn from a limited repertoire; the URI '%' escape sequence may be used to represent other characters that are legal for the URI scheme used [14].
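As a sketch of the address mapping described above, the following fragment converts one RFC822 address into an <Address> element holding a 'mailto:' URI in <adrs> and, when a display name is present, a <name> element. The order of the child elements, the function name address_element, and the sample address are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parseaddr
from urllib.parse import quote

EMX = "URN:ietf:params:email-xml:"  # namespace name as given in this document


def address_element(rfc822_address):
    """Build an <Address> element from a single RFC822 address string."""
    display_name, addr_spec = parseaddr(rfc822_address)
    address = ET.Element("{%s}Address" % EMX)
    adrs = ET.SubElement(address, "{%s}adrs" % EMX)
    # Characters outside the URI repertoire are %-escaped; '@' is kept as is.
    adrs.text = "mailto:" + quote(addr_spec, safe="@")
    if display_name:
        ET.SubElement(address, "{%s}name" % EMX).text = display_name
    return address


example = address_element("Mr Sanders <sanders@example.org>")
example_uri = example.find("{%s}adrs" % EMX).text
example_name = example.find("{%s}name" % EMX).text
```

A header with several addresses would call this once per address and repeat the enclosing header element for each result.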
The RFC822 address structures using 'phrase' are supported. The 'phrase' is a "formal name", and is enclosed in a <name> element.
<emx:name>MR SANDERS</emx:name> </emx:Address>
Any '<', '&' and certain '>' characters appearing in a formal name (<name> element) MUST be represented using '&lt;', '&amp;' or '&gt;' as noted previously in section 3.4.

Header fields containing address groups
Some RFC822 headers can have address group values as well as just address values. The RFC822 'group' structure associates a collection of addresses with a name for that collection. The individual addresses in a group may be omitted.
An address group is expressed using a <Group> element containing the name of the group and zero, one or more <member> elements each containing an <Address>:
In the absence of such an attribute, any language applicable to the surrounding XML is to be assumed.

MIME header fields
MIME content header fields MAY be part of the message header, using the same general format and XML namespace as RFC822-derived header fields (i.e. element name based on the MIME header field name, and associated with the same XML namespace).
But note that most MIME header fields are not appropriate for use with the Email+XML message format. When the message content is supplied as a separate MIME entity then MIME content header fields SHOULD be applied to that entity.
It is expected that MIME header fields may be useful in the following circumstances:
o When the message content is included as inline XML, to convey information about it that cannot be conveyed using native XML mechanisms; e.g. the Content-features header [22].
o MIME headers, not having an obvious XML counterpart, that express information that might be taken as metadata applying to the message as a whole, in isolation from the specific message content; e.g. the Content-description header field.

Other header fields
A message header MAY contain header fields that are not derived from RFC822 or MIME. Any such header field names used MUST be associated with a different namespace.
But sometimes it is desirable to introduce new header fields that must be understood for proper processing of the message to take place. This specification defines an XML attribute 'mustUnderstand=', which indicates whether or not the element to which it applies must be understood by a message processor:
mustUnderstand='false' is the default case, and indicates that the corresponding element MAY safely be ignored.
mustUnderstand='true' indicates that the element to which it applies MUST be processed, OR processing of the entire message (or message header) MUST be abandoned.
In XML namespace terms [9], the 'mustUnderstand=' attribute belongs to a "per-element-type namespace partition". Interpretation of the attribute is a property of the element to which it applies. In any case, the DTD or XML schema must declare that the attribute is allowed on any particular XML element type. It is strongly recommended that any header elements used within an Email+XML message header allow this attribute with the interpretation described here.
Non-validating XML processors used to handle Email+XML message headers MAY interpret the 'mustUnderstand=' attribute appearing on any header field element as described here.
Notwithstanding the presence or absence of a 'mustUnderstand=' attribute, individual applications may require that certain header elements are present or absent from any header that they interpret.
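The 'mustUnderstand=' processing rule can be sketched as follows. The function and exception names, the extension namespace 'urn:example:ext', and its element names are hypothetical; only the attribute name and its 'true' and default 'false' meanings come from this document.

```python
import xml.etree.ElementTree as ET


class MustUnderstandError(Exception):
    """Raised when an unrecognized header element is marked mustUnderstand='true'."""


def process_header(message, known_tags):
    """Return the tags of recognized header elements, in document order.

    Unrecognized elements are skipped unless they carry mustUnderstand='true',
    in which case processing of the whole header is abandoned.
    """
    recognized = []
    for field in message:
        if field.tag in known_tags:
            recognized.append(field.tag)
        elif field.get("mustUnderstand", "false") == "true":
            raise MustUnderstandError(field.tag)
        # Otherwise the element may safely be ignored.
    return recognized


SUBJECT = "{URN:IANA:namespace:rfc822:}subject"
NS = ('xmlns:emx="URN:ietf:params:email-xml:" '
      'xmlns:rfc822="URN:IANA:namespace:rfc822:" '
      'xmlns:ext="urn:example:ext"')

ignorable = ET.fromstring(
    '<emx:Message %s content="cid:body@example.org">'
    '<rfc822:subject>Hi</rfc822:subject>'
    '<ext:colour>blue</ext:colour></emx:Message>' % NS)
critical = ET.fromstring(
    '<emx:Message %s content="cid:body@example.org">'
    '<rfc822:subject>Hi</rfc822:subject>'
    '<ext:expires mustUnderstand="true">2002-05-01</ext:expires>'
    '</emx:Message>' % NS)
```

Calling process_header(ignorable, {SUBJECT}) returns only the subject tag, while the same call on the second message raises MustUnderstandError.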

Internationalization considerations
This specification attempts to relax the restrictions on international data imposed by RFC822.
RFC822 limits characters in address local parts to US-ASCII. This specification uses URIs and an XML-based address format, relaxing that constraint so that foreign language personal names can be represented. Character restrictions apply to URIs, and the %-escape mechanism defined by RFC2396 must be followed for representing non-URI characters. The character encoding used is dependent on the URI scheme, but UTF-8 ...
Similarly, the characters that can be used in domain names are currently severely constrained. Work is under way to define international forms for domain names.
Message content is tagged using standard MIME capabilities (charset parameter for text data [13], and Content-language header for language tagging [22]). Mandating handling of international data formats is a matter for particular applications; it is recommended that applications using the Email+XML message format be required to process UTF-8 coded character data. That does not necessarily mean that all characters received can be displayed.
For content included in an XML element, language tagging can be achieved by including an 'xml:lang=' attribute [16] in the <Message-content> element (subject to appropriate DTD or XML schema permission to use that attribute).
The original XML spec says (http://www.w3.org/TR/1998/REC-xml-19980210#sec-external-ent):
   An XML processor should handle a non-ASCII character in a URI by representing the character in UTF-8 as one or more bytes, and then escaping these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
This says that the XML processor should do this for you, and therefore it should be okay for you to put in the original characters. But there are three problems here:
o It says 'should', not must.
o It's not clear whether it applies to all URIs, or just to the URIs used in System Identifiers, and in the former case, it's not clear how an XML processor would find all URIs in a document (without e.g. Schema information).
o The text in the second edition of XML (http://www.w3.org/TR/REC-xml#sec-external-ent) is much clearer about how the conversion has to take place; unfortunately, it doesn't make clear who should do this conversion (the original document producer or the XML processor). The idea was not to change this for the second edition, but somehow it got lost. I'm following up on this.
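Whoever performs it, the conversion described in the quoted XML text is straightforward. The following is a minimal sketch with a hypothetical function name and sample address: each non-ASCII character is encoded as UTF-8 and each resulting byte is written as %HH. ASCII characters, including any that are not legal in a URI, are left untouched in this sketch.

```python
def escape_non_ascii(uri):
    """Replace each non-ASCII character with the %HH escapes of its UTF-8 bytes."""
    pieces = []
    for character in uri:
        if ord(character) < 128:
            pieces.append(character)
        else:
            pieces.extend("%%%02X" % byte for byte in character.encode("utf-8"))
    return "".join(pieces)


# U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8.
escaped = escape_non_ascii("mailto:jos\u00e9@example.org")
```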