<!-- Annex D -->
<annexn cols=1 id=annfsi>Formal System Identifiers
<p>This annex states the requirements for the formal definition of
notations used in system identifiers to specify access to the storage
objects in which entities are stored.  Access is provided by "storage
managers" such as file systems, databases, and main memory managers.
Objects may be stored individually, or as part of larger storage
objects, called "containers" or "archives", with a defined format for
multiple-object storage (e.g., PKZIP or TAR).  Access may involve
auxiliary processes, such as coding conversion, record boundary
recognition, and other processes required to present storage objects
to the SGML parser as entities.
<h2>System identifiers
<p>
An entity is a virtual storage object. A system identifier parameter
of a markup declaration can be used to map an entity onto one or more
real storage objects (or portions thereof), and to specify processes
to be performed in the course of accessing the object as an entity.
The format of a system identifier is normally system-specific (an
"informal system identifier"). However, when access to storage is
specified in the manner described in this annex, the system identifier
is called a "formal system identifier" (FSI).
.*
<h3>Storage object specification (SOS)
<p>
A formal system identifier consists of one or more "storage object
specifications", each of which identifies a storage object. The format
of an SOS resembles an element, in that it consists of a tag followed
by content. The storage objects are concatenated in the order
specified to comprise the storage of the entity.
<p>
In the SOS tag, the name appearing as the generic identifier is that
of the storage manager (SMName).  It is the name of a "storage manager
notation" that was identified as such by a declaration.  There can
also be an attribute specification list, consisting of attributes
defined for the storage manager notation.  These SOS attributes serve
as parameters that govern the access to the storage object.
<p>
The content of an SOS is known as the "storage object
identifier" (SOI). Its syntax and semantics depend on the individual
storage manager.
<p>
An SOS tag is recognized by the occurrence of a
start-tag open delimiter followed immediately by a declared storage
manager name (see <hdref refid=fsiops>) followed either by an SGML "s"
separator or a tag-close delimiter.
The concrete syntax of an SOS is that of the prolog
in which the SOS appears.
<p>
SGML numeric character references are recognized in the attribute
value literals and content of an SOS. They can be used to avoid false
delimiter recognition.
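<p>
The recognition rules above can be illustrated with a minimal Python
sketch. It is not a normative parser: the function names are
hypothetical, storage manager names are assumed to be matched without
regard to case, and attribute values are assumed not to contain a
tag-close delimiter.
<cptr><![CDATA[
```python
import re

def decode_char_refs(text):
    """Replace SGML numeric character references such as &#60;."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

def parse_fsi(sysid, sm_names):
    """Split a system identifier into (SMName, attributes, SOI) triples.

    Returns None when the identifier does not begin with an SOS tag,
    that is, when it is an informal system identifier.
    """
    names = "|".join(re.escape(n)
                     for n in sorted(sm_names, key=len, reverse=True))
    # Start-tag open, a declared name, then an "s" separator or tag close.
    # Assumption: no ">" occurs inside the attribute specification list.
    tag = re.compile(r"<(%s)(?=[\s>])([^>]*)>" % names, re.IGNORECASE)
    attr = re.compile(
        r"""([\w.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([\w.-]+))""")
    if not tag.match(sysid):
        return None
    tags = list(tag.finditer(sysid))
    result = []
    for i, m in enumerate(tags):
        # The SOI runs up to the next SOS tag or the end of the identifier.
        end = tags[i + 1].start() if i + 1 < len(tags) else len(sysid)
        attrs = {}
        for a in attr.finditer(m.group(2)):
            value = next(v for v in a.groups()[1:] if v is not None)
            attrs[a.group(1).lower()] = decode_char_refs(value)
        soi = decode_char_refs(sysid[m.end():end])
        result.append((m.group(1).lower(), attrs, soi))
    return result
```
]]></cptr>
<p>
Character references are decoded only after the tags have been
located, so a reference such as "&#38;#60;" in an SOI yields a literal
start-tag open delimiter without beginning a new SOS.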
.*
<h3>Informal system identifiers
<p>
A system identifier is recognized as an FSI only if it begins with an
SOS tag. Informal system identifiers can be used in the same document as
FSIs as long as storage manager names are chosen so that informal
system identifiers do not appear to begin with an SOS tag.
.***
.*
<h2 id=fsiaux>Auxiliary processes
<p>
Several auxiliary processes may be required to convert a newly created
entity into the form in which it will be stored. Conversely, auxiliary
processes may be needed to convert the bits of a storage object to the
bit combinations seen by an SGML parser.
<p>
The SGML language is designed so that SGML documents can be stored in
the same form as other text files.
<note>
This design feature allows access and processing with normal
text processing tools, in case SGML-aware tools are unavailable or are
deficient in some respect.
</note>
As a result, after an entity is created or modified, a number of
processes can take place as it is stored. First, if the entity is not
to be stored in a single storage object, it is divided into as many
portions as there are to be storage objects. The storage objects
can be in different storage systems (that is, under different
storage managers). For each portion, the following steps may be
performed:
<ol>
<li>Record boundaries are converted to the storage system form of line
endings for text files (for example, carriage returns, line feeds, or both).
<li>The fixed-width bit combinations seen by the SGML parser may be
converted to a variable-width code to save storage space (most likely
when multi-byte codes are used).
<li>The stored text may be encrypted.
<li>The stored text may be compressed.
<li>The stored text may be "sealed" by calculating a check number
that will no longer be valid if the storage object is modified.
<li>The location that the stored text is to occupy in the storage
object is determined (if it is not the entire object).  The text can
occupy one or more extents in the storage object.
</ol>
<p>
When an entity is accessed, the entity manager invokes the storage
manager for each storage object specification. For each storage
object, the following steps may be performed:
<ol>
<li>The extents of the storage object that are occupied by the
entity are located and concatenated into a single portion of stored
text.
<li>The integrity of the stored text (if sealed) is verified.
<li>The stored text is decompressed (if it was compressed).
<li>The stored text is decrypted (if it was encrypted).
<li>If the code in which the text is stored requires translation
and/or conversion to fixed-width bit combinations, the code is
normalized.
<li>Record boundaries are recognized and converted to SGML RE and RS
characters.
</ol>
<note>
Although the auxiliary processing is described sequentially for
clarity, an implementation can perform the processes in parallel and
in any order as long as identical results are achieved.
</note>
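<p>
The access steps listed above can be sketched in Python as follows.
This is an illustration under stated assumptions, not a defined
procedure: the function and its parameters are hypothetical, SHA-256
and zlib merely stand in for whatever sealing and compression
notations are registered, decryption is omitted, and an unterminated
final record is treated like a terminated one.
<cptr><![CDATA[
```python
import hashlib
import zlib

RS, RE = "\n", "\r"  # record start and record end, reference concrete syntax

def access_storage_object(raw, extents, seal=None, compressed=False,
                          encoding="utf-8", terminator="\n"):
    """Turn the bytes of one storage object into entity text.

    All parameters are hypothetical stand-ins for SOS attribute values.
    """
    # 1. Locate the occupied extents (byte offset, byte count); concatenate.
    stored = b"".join(raw[start:start + count] for start, count in extents)
    # 2. Verify the seal; SHA-256 is an assumed check-number method.
    if seal is not None and hashlib.sha256(stored).hexdigest() != seal:
        raise ValueError("storage object was modified after sealing")
    # 3. Decompress; zlib is an assumed compression method.
    if compressed:
        stored = zlib.decompress(stored)
    # 4. Decryption would go here; it is omitted from this sketch.
    # 5. Code normalization, delegated here to a Python codec.
    text = stored.decode(encoding)
    # 6. Record boundaries: delimit each record with RS and RE.
    records = text.split(terminator)
    if records and records[-1] == "":
        records.pop()  # the final terminator does not start a new record
    return "".join(RS + record + RE for record in records)
```
]]></cptr>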
.***
<h3>Code normalization
<p>
Code normalization processes are "registered" by declaring them as
notations. The following ones are defined in this International
Standard:
<syml>
<sym>utf8
<desc>
Converts UTF-8 to a fixed-width encoding.  Invalid multi-byte sequences
are represented by the character 0xFFFD.
<sym>ucs2
<desc>
Converts UCS-2 to a fixed-width encoding.  The more significant octet
of each character always precedes the less significant octet
irrespective of the system's native byte-order.  The codes 0xFFFE and
0xFEFF are not treated specially in any way.
<sym>unicode
<desc>
Converts the Unicode coding system to a fixed-width encoding.  The
Unicode coding system treats each pair of octets as a character in the
system's byte order.  If the first character is the byte order mark
character (0xFEFF), it will be discarded.  If the first character is
the byte order mark character byte-swapped, it will be discarded and
the remaining characters will be byte-swapped.
<sym>ujis
<desc>
Converts from the variable-width (packed) UJIS (EUC) coding scheme,
to an entity coding system that represents each character in the same
way as the EUC complete two-byte format.  In the entity coding system
the code of characters in the G0 set (usually the Japanese version of
ISO 646) is unchanged; the code of characters in the G1 set (usually
JIS X 0208-1990) is ORed with 0x8080; the code of characters in the G2
set (usually half-width katakana from JIS X 0201-1986) is ORed with
0x0080; the code of characters in the G3 set (JIS X 0212-1990) is
ORed with 0x8000.
<sym>sjis
<desc>
Performs an encoding conversion where the storage coding system is
Shift JIS and the entity coding system is the same as with the ujis
encoding (except for characters in the G3 set, which are not
representable using Shift JIS).
<sym>zero
<desc>
Converts bytes to characters by zero-extending each byte.
<sym>same
<desc>
The encoding conversion of the storage object in which the system
identifier was specified is used.
</syml>
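<p>
Several of the normalizations described above can be sketched in
Python. The function names are hypothetical, each function returns a
list of integer character codes, and the width of the fixed-width
result is left open, as it is in the descriptions. Truncated input is
not handled gracefully.
<cptr><![CDATA[
```python
import sys

def norm_utf8(data):
    # Python's "replace" handler substitutes 0xFFFD for invalid sequences.
    return [ord(ch) for ch in data.decode("utf-8", errors="replace")]

def _pairs(data, byteorder):
    if len(data) % 2:
        raise ValueError("odd number of octets")
    return [int.from_bytes(data[i:i + 2], byteorder)
            for i in range(0, len(data), 2)]

def norm_ucs2(data):
    # More significant octet first; 0xFFFE and 0xFEFF are ordinary codes.
    return _pairs(data, "big")

def norm_unicode(data, native=sys.byteorder):
    chars = _pairs(data, native)
    if chars and chars[0] == 0xFEFF:      # byte order mark: discard it
        return chars[1:]
    if chars and chars[0] == 0xFFFE:      # swapped mark: discard and swap
        return [((c & 0xFF) << 8) | (c >> 8) for c in chars[1:]]
    return chars

def norm_ujis(data):
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b == 0x8E:                     # single shift 2: one G2 byte
            out.append(data[i + 1] | 0x80)
            i += 2
        elif b == 0x8F:                   # single shift 3: two G3 bytes
            out.append(((data[i + 1] | 0x80) << 8) | (data[i + 2] & 0x7F))
            i += 3
        elif b >= 0x80:                   # G1: two bytes, both high bits set
            out.append((b << 8) | data[i + 1])
            i += 2
        else:                             # G0: unchanged
            out.append(b)
            i += 1
    return out

def norm_zero(data):
    return list(data)                     # each byte, zero-extended
```
]]></cptr>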
.***
<h3>Encryption, compression, and sealing
<p>
The methods used for encryption, compression, and sealing are
"registered" by declaring them as notations.
.**
<h2 id=boxes>Containers
.*** THIS CLAUSE NEEDS A REWRITE FOR CONSISTENCY ***
<p>A container is an entity whose storage
is partitioned so that the data of several other entities ("contained
objects") can be kept in it.  The locations of the contained objects
are specified by their
entity declarations.  The entity declarations serve as entries in a
"table of contents" or directory.
<note>
The name "sbento" comes from the Japanese word "bento": "A box or
basket with multiple compartments, containing a collection of disparate
entities arranged in an esthetically pleasing manner."  It is an
acronym for "Standard Bento Entity for Natural Transport of Objects".
</note>
<note>
Container entities provide a storage organization that applications
may take advantage of to avoid redundant descriptor information.
Containers may also facilitate interleaving and other techniques that
optimize access to multimedia data.
</note>
<note>
HyTime does not prohibit overlapping of contained objects; it is for
the application to determine whether overlapping is valid.
Overlapping can be a useful technique, for example, for identifying
the color table in a graphics entity.  The color table would be
declared as a separate entity, but its offset and size would position
it within the storage occupied by the graphics entity.
</note>
<p>The external identifier of the entity declaration of a contained
object must be identical to that of its container entity.
<note>
Container entities can be nested by specifying the insbento attribute
on the subordinate (inner) container entity.
</note>
<cptr><![CDATA[
                   <!-- HyTime Data Attributes -->
   <!-- Standard Bento Entity for Natural Transport of Objects -->
<!attlist #NOTATION sbento  -- Attributes of sbento data content notation type--
                            -- Creates container by partitioning storage object
                               so several entities can store data in it. --
                   unitsize -- Sbento partitioning unit size (bits per unit) --
                            NUMBER   8
                   insbento -- Location of this entity in a sbento's storage --
                            -- External identifier of entity must be same as on
                               sbento entity so SDIF will not duplicate data --
                            -- ENTITY: sbento entity in which this occurs.
                               Digit+: Offset in sbento units (origin 1).
                               Digit+: Size in sbento units.
                               Offset/size repeatable for segmentation --
                            -- lextype(ENTITY, (s+, digit+, s+, digit+)+) --
                            CDATA    #IMPLIED  -- Default: not in a sbento --
>
]]></cptr>
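<p>
As an illustration of the insbento value described in the declaration
above, the following Python sketch parses the attribute and extracts a
contained object from the container's storage. The function names are
hypothetical, and the sketch assumes a unit size that is a whole number
of octets.
<cptr><![CDATA[
```python
def parse_insbento(value):
    """Parse 'ENTITY offset size [offset size ...]' into its parts."""
    tokens = value.split()
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        raise ValueError("expected an entity name and offset/size pairs")
    if not all(t.isdigit() for t in tokens[1:]):
        raise ValueError("offsets and sizes must be unsigned digits")
    numbers = [int(t) for t in tokens[1:]]
    segments = list(zip(numbers[0::2], numbers[1::2]))
    return tokens[0], segments

def extract(container, segments, unitsize=8):
    """Concatenate the segments of a contained object.

    Offsets have origin 1 and are counted in sbento units.
    """
    if unitsize % 8:
        raise ValueError("this sketch handles whole-octet units only")
    unit = unitsize // 8
    return b"".join(container[(off - 1) * unit:(off - 1 + size) * unit]
                    for off, size in segments)
```
]]></cptr>
<p>
For example, the value "box 1 3 7 2" places an object in the first
three units and in the seventh and eighth units of the sbento entity
"box".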

.*
.*********
<h2>Identification facilities
<p>
A document indicates the presence of formal system identifiers by
using the APPINFO parameter of the SGML declaration and the
other facilities described in this sub-clause.
<h3 id=fsiinfo>APPINFO parameter of SGML declaration
<p>
Potential use of one or more storage managers defined in
accordance with these requirements is indicated by specifying the
keyword "FSISM" as a sub-parameter of the APPINFO parameter of the
SGML declaration.  The keyword indicates the potential presence in one
or more DTDs of a formal system identifier storage managers (FSISM)
declaration that identifies the "storage object specification
notations".
<p>
The format of the sub-parameter is:
<cptr><![CDATA[
FSISM
]]></cptr>
The sub-parameter can also specify the
name of the FSISM declarations in the document's DTDs if it is other
than "FSISM". The format is:
<cptr><![CDATA[
FSISM=FSIUsed
]]></cptr>
where "FSIUsed" is replaced by the declaration name.
.*
<h3 id=fsiops>Formal system identifier storage managers declaration
<p>
An FSISM declaration identifies one or more storage manager notations
used in system identifiers. System identifiers in which such notations
occur are known as "formal system identifiers".
An FSISM declaration should precede the storage manager notation
declarations pertaining to the storage managers that it identifies.
There can be more than one FSISM declaration in a DTD.
<p>
Syntactically, the FSISM declaration is a processing instruction
(PI), not an SGML markup declaration.  In the template below, it is
shown in the reference concrete syntax.  In use, the SMName-list
parameter must be replaced by one or more storage manager names
(SMName)
declared as notation names, separated by SGML <hp1>ts</hp1> separators
(white space).  It is an RAE if the DTD or meta-DTD does not declare a
notation for each SMName specified.
<p>
The declaration name is the initial character string, up to the first
ts separator. The name is always "FSISM" in meta-DTDs. It should
also be "FSISM" in DTDs, but provision is made for changing it in
the APPINFO parameter of the SGML declaration, if necessary, to avoid
the (admittedly unlikely) possibility of conflicts when retro-fitting
an architecture to a document that already has PIs that begin with
"FSISM ".
<cptr><![CDATA[
              <!-- FSI Storage Managers Declaration -->
         <!-- TEMPLATE FOR PI IN DTD OR DERIVED META-DTD -->
<?FSISM SMName-list >
]]></cptr>
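<p>
A processor might handle the APPINFO keyword and the FSISM declaration
along the following lines. This Python sketch is an assumption-laden
illustration: the function names are hypothetical, the APPINFO keyword
and notation names are compared without regard to case, and the
declaration name is compared exactly.
<cptr><![CDATA[
```python
def fsism_declaration_name(appinfo):
    """Return the FSISM declaration name announced in APPINFO, or None."""
    for token in appinfo.split():
        key, _, value = token.partition("=")
        if key.upper() == "FSISM":
            return value or "FSISM"
    return None

def storage_manager_names(pi_text, decl_name="FSISM"):
    """Return the SMNames listed in a PI, given its text without the
    PI delimiters, or None if the PI is not an FSISM declaration."""
    parts = pi_text.split()
    if not parts or parts[0] != decl_name:
        return None
    return parts[1:]

def undeclared(sm_names, notations):
    """List the SMNames for which no notation has been declared."""
    declared = {n.lower() for n in notations}
    return [n for n in sm_names if n.lower() not in declared]
```
]]></cptr>
<p>
A non-empty result from the last function corresponds to the error
condition described above, in which an SMName has no notation
declaration.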
.****
<h3 id=fsidec>Storage manager support declarations
<p>
Each storage manager name (SMName) specified in an
FSISM declaration must be declared as a notation name in a notation
declaration. The declaration identifies the storage manager definition
document, which defines the syntax and semantics of the storage object
specifications for that storage manager.
Associated with the notation declaration can be an attribute
definition list declaration for "storage manager support attributes".
<note>
It is not necessarily an error if the definition document cannot
be accessed, as an implementation might not require access to it.
The primary purpose of the declarations is to identify the storage
managers that are used and to declare any support attributes that
they require.
</note>
.****
<h4 id=fsinot>Storage manager notation declaration
<p>
A storage manager is a program (or a combination of programs or a
portion of a program) that manages the storage of real physical
objects (as opposed to an entity manager, which manages the storage of
virtual objects).
<note>
For a given system environment, there is likely to be a well-defined
set of available storage managers. The declarations for these could be
maintained in a public entity and made available to all users.
</note>
<p>
The content of an SOS (that is, the SOI) conforms to the rules of the
storage manager. Where those rules allow a relative filename, it is
interpreted relative to the file in which the system identifier is
specified.
<p>
Declarations for an incomplete list of well-known
storage managers are provided below. Because they are well-known, this
International Standard is referenced as the definition document.
Alternatively, declarations referencing the actual definition
documents could be used.
<p>
The list also includes declarations for special storage
managers actually defined in this International Standard. They are:
<syml>
<sym>fd
<desc>
The SOI is a number.  The storage object specification locates
the storage object that is created when the system reads from the file
descriptor with that number.  For example, in Unix and DOS
systems, fd:0 will read the storage object from standard input.
<sym>url
<desc>
The SOI is a Uniform Resource Locator, as used in the Internet's
World Wide Web.
<sym>literal
<desc>
The SOI is treated as the literal text of a storage object.
SGML named character references are recognized in the literal text.
<note>
Literal text is used chiefly as a connector when concatenating other
storage objects.
The named character references can be used to insert record
boundaries between the concatenated objects.
</note>
<sym>thisone
<desc>
The SOI must be empty. The storage object is that
in which the formal system identifier occurs.
<note>
The thisone storage manager can be used in containers to make the
system identifiers portable.
</note>
</syml>
<cptr keep=5><![CDATA[
            <!-- Storage Manager Notation Declarations -->
<!NOTATION Literal  PUBLIC "ISO/IEC 10744:1992//NOTATION
                    Literal Storage Object Specification//EN" >
<!NOTATION ThisOne  PUBLIC "ISO/IEC 10744:1992//NOTATION
                    This One Storage Object Specification//EN" >
<!NOTATION url      PUBLIC "ISO/IEC 10744:1992//NOTATION
                    URL Storage Object Specification//EN" >
<!NOTATION fd       PUBLIC "ISO/IEC 10744:1992//NOTATION
                    File Descriptor Storage Object Specification//EN">
<!NOTATION unix     PUBLIC "ISO/IEC 10744:1992//NOTATION
                    Unix Storage Object Specification//EN" >
<!NOTATION dos      PUBLIC "ISO/IEC 10744:1992//NOTATION
                    DOS Storage Object Specification//EN" >
<!NOTATION os2fat   PUBLIC "ISO/IEC 10744:1992//NOTATION
                    OS/2 FAT Storage Object Specification//EN" >
<!NOTATION os2hpfs  PUBLIC "ISO/IEC 10744:1992//NOTATION
                    OS/2 HPFS Storage Object Specification//EN" >
<!NOTATION win95    PUBLIC "ISO/IEC 10744:1992//NOTATION
                    Win95 Storage Object Specification//EN" >
<!NOTATION system7  PUBLIC "ISO/IEC 10744:1992//NOTATION
                    System 7 Storage Object Specification//EN" >
<!NOTATION cms      PUBLIC "ISO/IEC 10744:1992//NOTATION
                    CMS Storage Object Specification//EN" >
<!NOTATION mvs      PUBLIC "ISO/IEC 10744:1992//NOTATION
                    MVS Storage Object Specification//EN" >
<!NOTATION vms      PUBLIC "ISO/IEC 10744:1992//NOTATION
                    VMS Storage Object Specification//EN" >
<!NOTATION vse      PUBLIC "ISO/IEC 10744:1992//NOTATION
                    VSE Storage Object Specification//EN" >
<!NOTATION sql      PUBLIC "ISO/IEC 10744:1992//NOTATION
                    SQL Storage Object Specification//EN" >
<!NOTATION notes    PUBLIC "ISO/IEC 10744:1992//NOTATION
                    Lotus Notes Storage Object Specification//EN" >
<!NOTATION afs      PUBLIC "ISO/IEC 10744:1992//NOTATION
                    AFS Storage Object Specification//EN" >
<!NOTATION netware  PUBLIC "ISO/IEC 10744:1992//NOTATION
                    Netware Storage Object Specification//EN" >
]]></cptr>
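<p>
The behavior of three of the special storage managers described above
can be sketched in Python. The function name and its "current"
parameter are hypothetical, the literal text is assumed to be stored as
UTF-8, and the url storage manager is omitted because it requires
network access.
<cptr><![CDATA[
```python
import os

def read_special(sm_name, soi, current=b""):
    """Return the bytes of the storage object located by one SOS.

    'current' stands for the storage object in which the formal
    system identifier itself occurs.
    """
    name = sm_name.lower()
    if name == "literal":
        # The SOI is itself the text of the storage object.
        return soi.encode("utf-8")
    if name == "thisone":
        if soi:
            raise ValueError("the SOI of thisone must be empty")
        return current
    if name == "fd":
        # The SOI is a file descriptor number; the caller keeps ownership.
        with os.fdopen(int(soi), "rb", closefd=False) as stream:
            return stream.read()
    raise KeyError("not a special storage manager: " + sm_name)
```
]]></cptr>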
.****
<h4 id=fsiatt>Storage manager support attributes
<p>
The designer of a storage object specification can optionally provide
an attribute definition list declaration. The attributes are used to
specify parameters to the storage access, in addition to the storage
object identifier.
<p>
The template for such a declaration is shown below. In use, SMName is
replaced by the actual notation name for the storage manager.
The template includes all of the storage manager support attributes
defined in this International Standard. In practice, an actual
declaration might include some or none of them, possibly with other
attributes unique to a particular storage manager.
<p>
The attribute <hp2>container</hp2> (<hp1>containr</hp1>) identifies
the object as a contained object of a container entity.  The SOI
locates the object within the container.  If the container is an
sbento entity, the SOI must be empty or the same as an SOI for the
container.  The "extents" attribute then provides the necessary table
of contents information.
<p>
The attribute <hp2>occupied extents</hp2> (<hp1>extents</hp1>)
specifies the extents of the storage object occupied by the entity as
a HyTime dimlist.  The number of bits per quantum is determined by the
"extquant" attribute.  Multiple dimension specifications can be
specified if the entity is segmented and distributed in several
locations within the storage object.
<note>
For example, this technique can be used to interleave the text of
entities that are accessed concurrently.
</note>
<p>The attribute <hp2>extent quantum</hp2>
(<hp1>extquant</hp1>) specifies the unit of storage in which the
partitioning of the container is expressed.
<note>
For example, the partitioning unit would be "8" if an
8-bit bit combination were used and the desired granularity for
specifying offset and size of the contained objects was the bit combination.
The unit would be "16" if either a 16-bit bit combination were
used and bit combination granularity was wanted, or if an 8-bit bit
combination were used but the granularity for offset and size was to be
pairs of bit combinations.
</note>


<cptr keep=5><![CDATA[
                  <!-- Storage Manager Support Attributes -->
<!ATTLIST #NOTATION SMName
                   containr -- Container in which this object is stored --
                            -- lextype(ENTITY) --
                            NAME     #IMPLIED  -- Default: not in container --
                   extents  -- Dimensions of occupied extents of object --
                            -- Constraint: applies to object as stored --
                            -- Constraint: interpreted as HyTime dimlist --
                            -- lextype(snzi, snzi)+ --
                            CDATA    "1 -1"    -- Default: entire object --
                   extquant -- Extent quantum: bits per quantum for
                               HyTime dimlist in extents attribute --
                            NUMBER   8
                   seal     -- Integrity information --
                            -- ulextype(NOTATION, s+, char*) --
                            CDATA    #IMPLIED  -- Default: none --
                   compress -- Compression information --
                            -- ulextype(NOTATION, s+, char*) --
                            CDATA    #IMPLIED  -- Default: none --
                   encrypt  -- Encryption information --
                            -- ulextype(NOTATION, s+, char*) --
                            CDATA    #IMPLIED  -- Default: none --
                   codenorm -- Code normalization --
                            -- lextype(NOTATION) --
                            NAME     same
                   records  -- Record boundary recognition --
                              (system|cr|lf|crlf|rms|none) system
                   socdel   -- Storage object character number delimiter --
                            -- Constraint: single character --
                            CDATA    #IMPLIED  -- Default: none --
                   tracking -- Record boundary tracking in messages --
                            (track|notrack) track
>
]]></cptr>
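<p>
The interaction of the extents and extquant attributes declared above
can be illustrated with a Python sketch. It rests on one reading of
HyTime dimension specifications, which should be treated as an
assumption: a positive first marker counts from the start and a
negative one from the end, while a positive second marker is a count of
quanta and a negative one is the position of the last quantum counted
from the end. The function names are hypothetical, and the quantum is
assumed to be a whole number of octets.
<cptr><![CDATA[
```python
def resolve_extents(extents, total_quanta):
    """Resolve a dimlist into (first, last) quantum positions, origin 1."""
    nums = [int(t) for t in extents.split()]
    if not nums or len(nums) % 2 or 0 in nums:
        raise ValueError("expected pairs of non-zero integers")
    spans = []
    for first, second in zip(nums[0::2], nums[1::2]):
        start = first if first > 0 else total_quanta + first + 1
        if second > 0:
            end = start + second - 1         # second marker is a count
        else:
            end = total_quanta + second + 1  # position counted from the end
        if not 1 <= start <= end <= total_quanta:
            raise ValueError("extent lies outside the storage object")
        spans.append((start, end))
    return spans

def occupied_bytes(data, extents="1 -1", extquant=8):
    """Concatenate the occupied extents of a storage object."""
    if extquant % 8:
        raise ValueError("this sketch handles whole-octet quanta only")
    unit = extquant // 8
    total = len(data) // unit
    return b"".join(data[(start - 1) * unit:end * unit]
                    for start, end in resolve_extents(extents, total))
```
]]></cptr>
<p>
Under this reading, the default value "1 -1" selects the entire
storage object, which agrees with the comment in the declaration.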