This version of the document may be found at:
http://pueblo.lbl.gov/~olken/mendel/X3L8/xmlannex/annex_v7.htm
This version of the document may also be found at:
http://www.sdct.itl.nist.gov/~ftp/x3l8/sc32wg2/projects/11179p3r/Current-mdl-terms/20000605-version/annex_v7.htm
June 13, 2000
This annex describes an XML (Extensible Markup Language) representation for the contents of a Data Registry of ISO/IEC 11179 Part 3. The starting point of this design is the normative UML metamodel for the data registry.
This document does not (at present) specify an encoding of the UML metamodel for the data registry.
This document does describe the generation of an XML Schema suitable for the specification of an XML document which would contain the contents of an ISO 11179 Data Registry.
This document does not describe how to formulate an XML query language query against the data registry contents, because no such query language has yet been standardized by the W3C. However, we envisage that the XML Schema specified here could serve as the basis for formulating such queries once XML QL is standardized. One consideration in the design of the schema was to ensure that the resulting document could be uniformly queryable. We acknowledge, however, that the design sacrifices some ease of querying in favor of simplicity.
This document does not specify in detail the process of encoding the contents of a data registry into an XML document. The XML schema we specify for such a document is of sufficient simplicity that the requisite process should be self-evident. In any case it will vary somewhat depending on implementation decisions made by the registry designer.
There are many different ways that the contents of an ISO 11179 Data Registry could be encoded into XML for data transfer. Interoperability requires that there be only one or at most a small number of standard ways to conduct such interchanges. There are a number of reasons for specifying a semi-automatic algorithmic approach to the generation of the XML Schema:
We have adopted the most recent public specification of the XML Schema Language to specify the structure of the XML document to be used to encode the contents of a data registry, in lieu of an XML DTD (Document Type Definition). The XML Schema Language is expected to supersede DTDs. A DTD can be mechanically generated from the XML Schema, albeit with some loss of information (e.g., concerning types and keys). We anticipate that the XML Schema Language Recommendation will be adopted by the World Wide Web Consortium prior to the completion of the final International Standard for ISO 11179 Part 3.
We had originally envisioned the use of XMI (XML Metadata Interchange Format) which has been standardized by the OMG (Object Management Group). This approach would have met our functional requirements, and has been the subject of much more detailed design studies. However, members of the L8 committee felt strongly that the XMI encodings were far too verbose. Furthermore, there was a reluctance to make the 11179 registry XML encoding dependent on an OMG standard.
The implication of the decision not to use XMI is that it will be necessary to write our own detailed specifications for the construction of the XML schema. Furthermore, it will be necessary to construct software (schema synthesizers, dumpers, loaders) that is specific to the ISO 11179 standard, rather than rely on commercially available XMI tools.
The fundamental problem we confront in constructing an XML encoding of the data registry content is that the data registry schema is a graph, whereas XML (i.e., the nested element structure) is basically a tree. Thus we need to encode a collection of graphs (registry contents) into a forest of trees. Questions that need to be resolved include the following.
We have decided to make each object in the registry metamodel into an element, with the attributes of an object becoming nested elements. This solves the problem of choosing the roots of the trees - every object is the root of its own tree. This design has the advantage of simplicity, ease of dumping from a relational DBMS, and uniform treatment of all object instances. However, it is likely to be cumbersome to query - in effect requiring multiple joins to reconstruct the deep structure of the database.
The shallow tree design (shrubs) requires that every association (relationship) be encoded using IDREFs - again a very simple design.
We propose to use IDs constructed from persistent keys from the data registry. The advantage of this approach is that it does not require a separately constructed symbol (a surrogate key), and the keys (IDs) are persistent and hence can be used for navigational queries.
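Because every association in the shallow tree design is an IDREF, a loader has to resolve references by looking up IDs, which is the join mentioned above. The following is a minimal sketch of that lookup, not part of this annex. It assumes Python's standard `xml.etree.ElementTree` module, attributes literally named `ID` and `IDREF` as in the example later in this annex, and a hypothetical, heavily trimmed registry fragment. The function names and the sample values `AC--1` and `no_such_document` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed registry fragment in the shallow-tree style:
# every object sits directly inside its collection, and associations
# are empty elements that carry an IDREF attribute.
SAMPLE = """
<ISO_11179_Data_Registry_Contents>
  <administered_component_collection>
    <administered_component ID="AC--1">
      <administered_component--described_by IDREF="ISO_11404" />
      <administered_component--described_by IDREF="no_such_document" />
    </administered_component>
  </administered_component_collection>
  <Reference_Document_collection>
    <Reference_Document ID="ISO_11404">
      <Reference_Document--title>Language Independent Data Types</Reference_Document--title>
    </Reference_Document>
  </Reference_Document_collection>
</ISO_11179_Data_Registry_Contents>
"""


def build_id_index(root):
    """Map each ID attribute value to the element that carries it."""
    index = {}
    for element in root.iter():
        key = element.get("ID")
        if key is None:
            continue
        if key in index:
            raise ValueError(f"duplicate ID: {key}")
        index[key] = element
    return index


def dangling_references(root, index):
    """Return the IDREF values that match no ID in the document."""
    missing = set()
    for element in root.iter():
        target = element.get("IDREF")
        if target is not None and target not in index:
            missing.add(target)
    return sorted(missing)


root = ET.fromstring(SAMPLE)
index = build_id_index(root)
print(sorted(index))                      # IDs present in the fragment
print(dangling_references(root, index))   # the deliberately unresolved reference
```

A validating XML parser would perform the same check if the attributes were declared with ID and IDREF types. The sketch only shows the lookup that a registry loader needs when it follows an association.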
XML 1.0 requires that tag names are globally unique within the document. However, it is common modeling practice to reuse attribute names in multiple object classes. Hence we will prefix attribute names with the object class names to assure uniqueness. Because this has already been done in some portions of the registry data model, we will remove duplicate object class name prefixes wherever we encounter them.
| UML construct | XML construct |
| --- | --- |
| objects | elements |
| object class names | tag names - blanks changed to underscores |
| attributes | nested elements |
| attribute names | tag names - prefixed with object class name (separated by double hyphen) |
| associations | IDREFs |
| keys (see discussion) | IDs |
| specialization relations | IDREFs |
| containment relations | IDREFs |
Each object in the UML metamodel will be mapped into an element in the XML schema. The UML object class name will be used to construct the XML element tag name. Since XML tag names cannot contain blanks or colons (which are used to indicate namespace prefixes) we replace spaces with underscores and colons with hyphens. We assume that object class names are unique within the UML metamodel. Note that we have wrapped each collection of object instances in <object_class_name_collection> tags.
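The object naming rule can be sketched as follows. This is a minimal illustration rather than a normative algorithm. The function names `element_tag_name` and `collection_tag_name` are hypothetical, and the sketch assumes that the case of the UML name is preserved, as it is in the example later in this annex.

```python
def element_tag_name(object_class_name: str) -> str:
    """Build an element tag name from a UML object class name.

    Spaces become underscores and colons become hyphens, as described
    in the text. Case is left unchanged (an assumption).
    """
    return object_class_name.strip().replace(" ", "_").replace(":", "-")


def collection_tag_name(object_class_name: str) -> str:
    """Build the tag name of the wrapper around all instances of a class."""
    return element_tag_name(object_class_name) + "_collection"


print(element_tag_name("Reference Document"))
# Reference_Document
print(collection_tag_name("administered component"))
# administered_component_collection
```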
Each attribute of an object in the UML metamodel will be mapped into an element in the XML schema, nested within the object element. The UML attribute name will be prefixed with the object class name (separated by a double hyphen) to construct the XML element tag name. Again we will replace spaces with underscores, and colons with double hyphens. If the object class name has already been prefixed to the attribute name in the UML, it will be removed, i.e., only one copy of the object class name will appear in the final element tag name.
Note that we do not propagate object class name into complex types (nested element structures), since we assume that complex types are uniquely named within the entire registry metamodel. Hence, we prefix the name of the complex type to the attributes of the complex type in order to assure that the corresponding element names are unique within the XML document. See examples below.
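The attribute naming rule, including the removal of a duplicated prefix, can be sketched as follows. The function name `attribute_tag_name` and its parameter `owner_tag` (the already constructed tag name of the owning object class or complex type) are hypothetical. The text does not say how a duplicated prefix appears in the UML, so the sketch assumes it is the owner name followed by a space, compared without regard to case.

```python
SEPARATOR = "--"  # double hyphen between owner name and attribute name


def attribute_tag_name(owner_tag: str, attribute_name: str) -> str:
    """Build the tag name for an attribute nested in `owner_tag`.

    `owner_tag` is the tag name of the owning object class or complex
    type. Spaces in the attribute name become underscores and colons
    become double hyphens, following the attribute rule in the text.
    """
    attribute = attribute_name.strip().replace(" ", "_").replace(":", "--")

    # Assumption: a duplicated prefix looks like "<owner> <rest>" in the
    # UML, which reads "<owner>_<rest>" after the space replacement.
    duplicate_prefix = owner_tag.lower() + "_"
    if (attribute.lower().startswith(duplicate_prefix)
            and len(attribute) > len(duplicate_prefix)):
        attribute = attribute[len(duplicate_prefix):]

    return owner_tag + SEPARATOR + attribute


print(attribute_tag_name("administered_component", "registration status"))
# administered_component--registration_status
print(attribute_tag_name("administered_component",
                         "administered component identifier"))
# administered_component--identifier
print(attribute_tag_name("Registration_Authority_Identifier", "OPI source"))
# Registration_Authority_Identifier--OPI_source
```

The third call uses the name of a complex type as the owner, which corresponds to the nested identifier elements in the example later in this annex.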
Associations will be mapped into empty (i.e., singleton) XML tags. Empty tags have no content and no matching closing tags. The tag name will be generated by concatenating the object class name with the role name, separated by a double hyphen.
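As a minimal sketch of this mapping, the following builds one association element with Python's standard `xml.etree.ElementTree` module. The function name `association_element` is hypothetical. The attribute name `IDREF` and the role name follow the example later in this annex. Replacing spaces in the role name with underscores is an assumption carried over from the other naming rules.

```python
import xml.etree.ElementTree as ET


def association_element(object_tag: str, role_name: str, target_id: str) -> ET.Element:
    """Build an empty element that points at `target_id` via IDREF."""
    # Assumption: role names follow the same space-to-underscore rule
    # as object class and attribute names.
    tag = object_tag + "--" + role_name.strip().replace(" ", "_")
    return ET.Element(tag, {"IDREF": target_id})


element = association_element("administered_component", "described by", "ISO_11404")

# The element has no text and no children, so it serializes as an
# empty (self-closing) tag.
print(ET.tostring(element, encoding="unicode"))
```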
Keys are not explicitly specified in UML. Hence we have inferred that attribute names ending in "identifier" are usually keys or part of keys. We concatenate all of the key attributes into a single string to construct a key (ID) for the object instance. We use double hyphens to separate key components and underscores to replace embedded spaces in the key components.
Note that this example is incomplete.
<ISO_11179_Data_Registry_Contents>
<administered_component_collection>
<administered_component
ID="DB--Lawrence_Berkeley_Lab--NERSC--DOE--data_ID--Version_7.3" >
<administered_component--identifier>
<Component_Identifier--registration_authority_identifier>
<Registration_Authority_Identifier--International_Code_Designator>
DB
</Registration_Authority_Identifier--International_Code_Designator>
<Registration_Authority_Identifier--organization_identifier>
Lawrence Berkeley Lab
</Registration_Authority_Identifier--organization_identifier>
<Registration_Authority_Identifier--organization_part_identifier>
NERSC
</Registration_Authority_Identifier--organization_part_identifier>
<Registration_Authority_Identifier--OPI_source>
DOE
</Registration_Authority_Identifier--OPI_source>
</Component_Identifier--registration_authority_identifier>
<Component_Identifier--data_identifier>
data ID
</Component_Identifier--data_identifier>
<Component_Identifier--version>
Version 7.3
</Component_Identifier--version>
</administered_component--identifier>
<administered_component--registration_status>
registered
</administered_component--registration_status>
<administered_component--administrative_status>
unknown
</administered_component--administrative_status>
<administered_component--creation_date>
1999-07-14
</administered_component--creation_date>
<administered_component--effective_date>
1999-07-14
</administered_component--effective_date>
<administered_component--until_date>
2000-07-14
</administered_component--until_date>
<administered_component--change_description>
unchanged
</administered_component--change_description>
<administered_component--administrative_note>
none
</administered_component--administrative_note>
<administered_component--explanatory_comment>
no explanation
</administered_component--explanatory_comment>
<administered_component--unresolved_issues>
no unresolved issues
</administered_component--unresolved_issues>
<administered_component--origin>
United Nations
</administered_component--origin>
<administered_component--described_by
IDREF="ISO_11179_Part_3--2000" />
<administered_component--described_by
IDREF="ISO_11404" />
</administered_component>
</administered_component_collection>
<Reference_Document_collection>
<Reference_Document ID="ISO_11179_Part_3--2000" >
<Reference_Document--identifier>
ISO_11179_Part_3--2000
</Reference_Document--identifier>
<Reference_Document--type_description>
ISO Standard
</Reference_Document--type_description>
<Reference_Document--language>
English
</Reference_Document--language>
<Reference_Document--title>
Specification and Standardization of Data Elements, Part 3
</Reference_Document--title>
<Reference_Document--describing
IDREF="DB--Lawrence_Berkeley_Lab--NERSC--DOE--data_ID--Version_7.3" />
</Reference_Document>
<Reference_Document ID="ISO_11404" >
<Reference_Document--identifier>
ISO_11404
</Reference_Document--identifier>
<Reference_Document--type_description>
ISO Standard
</Reference_Document--type_description>
<Reference_Document--language>
English
</Reference_Document--language>
<Reference_Document--title>
Language Independent Data Types
</Reference_Document--title>
<Reference_Document--describing
IDREF="DB--Lawrence_Berkeley_Lab--NERSC--DOE--data_ID--Version_7.3"/>
</Reference_Document>
</Reference_Document_collection>
</ISO_11179_Data_Registry_Contents>
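The key construction rule described earlier (concatenate the key attributes with double hyphens, and replace embedded spaces with underscores) can be sketched as follows. The function name `registry_key` is hypothetical. The component values are taken from the example above. Their order follows the nesting order of the identifier elements, which is an assumption, since the text does not fix an order.

```python
def registry_key(key_components):
    """Join key attribute values into a single ID string.

    Components are separated by double hyphens and embedded spaces
    are replaced by underscores.
    """
    return "--".join(part.strip().replace(" ", "_") for part in key_components)


components = [
    "DB",                     # International_Code_Designator
    "Lawrence Berkeley Lab",  # organization_identifier
    "NERSC",                  # organization_part_identifier
    "DOE",                    # OPI_source
    "data ID",                # data_identifier
    "Version 7.3",            # version
]

print(registry_key(components))
# DB--Lawrence_Berkeley_Lab--NERSC--DOE--data_ID--Version_7.3
```

The result matches the IDREF values used by the Reference_Document entries in the example. A key component that itself contains a double hyphen would make the ID ambiguous to split, so a real implementation would likely need an escaping rule that this sketch does not attempt.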
[Editorial note:
Maintained by Frank Olken (Lawrence Berkeley National Laboratory)
olken@lbl.gov
Last revised:
2000-06-13 2:13 PDT]