L1 (Linear Assertion Notation)
Description
During the development of GwTk we noticed that the
process of constructing a topic map graph from markup (e.g. XTM) can be
split into two phases: transforming the markup into a sequence of assertions
between subjects and building the graph from that sequence.
The advantage of this approach is that it decouples the markup
processor from the graph implementation. It also emphasizes that the
interpretation of a certain type of markup (the processing model) is
entirely independent of the concept of the topic map graph and its
validity requirements.
In GwTk, the connection between the processor and the topic map graph was made
by way of callback functions: a portion of glue code registered callbacks with
the processor and passed the received assertions on to the graph-building
module. Goose takes this a step further and introduces an intermediate
representation of the topic map information. L1 is the notation for this
intermediate representation.
L1 is a linear notation: each line of an L1 representation contains
exactly one assertion as described in the ISO draft Reference Model for
Topic Maps.
So, what is an assertion?
An assertion can be pictured in ASCII-art like this:
                P
                |
        R1      |      R2
        |       |      |
        |       |      |
x1------C1------A------C2------x2
The assertion itself (the subject that represents the relationship) is expressed
as the A-node. The P-node represents the assertion pattern (the subject that
expresses the type of relationship). For each membership there is an RCx subgraph
that expresses that a certain player (x) plays a certain role (R) in the assertion.
The 'fact that x plays R in A' is itself a subject that can be talked about, hence
it is represented as a node, too, called the casting node (C-node).
A line in L1 notation is the linear representation of such an assertion and the
above example would look like this:
P A ( R1 C1 x1 )( R2 C2 x2 )
To identify which subjects the nodes refer to, the subject indicators of
the nodes are used. In most cases, the markup contains so-called node demanders
(elements that demand the existence of a node in the resulting graph) for the various
portions of the assertion, and the addresses of those elements become the subject
indicators in the L1 notation. Sometimes no node demanders are present for the A and
C nodes, and these may therefore be omitted from the L1 line. Thus
P ( R1 x1 )( R2 x2 )
is also a valid L1 version of an assertion.
P, A and R subjects can never be addressable subjects, so they can be unambiguously
identified by the plain address (e.g. URI) of any of their subject indicators. Not so
in the case of the role players. They may be either non-addressable subjects
(indicated by a resource) or addressable subjects (constituted
by a resource). To distinguish between the two, the addresses are
surrounded by angle brackets in the former case and square brackets in the latter:
< http://www.w3.org/ >
refers to the subject indicated by the resource (presumably the W3C) and
[ http://www.w3.org/index.html ]
refers to the particular resource itself (the particular document).
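The rules above can be sketched as a small serializer. This is only an illustrative sketch, not code from Goose or GwTk: the names Assertion, Member, player_to_l1 and to_l1 are hypothetical, and the spacing follows the BNF given in this document (one space between parts, no space between adjacent member groups).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Member:
    """One role membership (hypothetical helper type, not part of Goose)."""
    role: str                      # address of a subject indicator for R
    player: str                    # address of the resource for the role player x
    addressable: bool = False      # True: the resource itself is the subject
    casting: Optional[str] = None  # address for the C-node, if one was demanded


@dataclass
class Assertion:
    """One assertion: pattern P, optional A-node, and its memberships."""
    pattern: str
    anode: Optional[str] = None
    members: List[Member] = field(default_factory=list)


def player_to_l1(member: Member) -> str:
    # Square brackets: the subject constituted by the resource.
    # Angle brackets: the subject indicated by the resource.
    if member.addressable:
        return "[ " + member.player + " ]"
    return "< " + member.player + " >"


def to_l1(assertion: Assertion) -> str:
    """Render one assertion as a single L1 line."""
    line = assertion.pattern + " "
    if assertion.anode is not None:      # the A-node may be omitted
        line += assertion.anode + " "
    for m in assertion.members:
        line += "( " + m.role + " "
        if m.casting is not None:        # the C-node may be omitted
            line += m.casting + " "
        line += player_to_l1(m) + " )"
    return line


example = Assertion(
    pattern="P",
    anode="A",
    members=[
        Member("R1", "http://www.w3.org/", casting="C1"),
        Member("R2", "http://www.w3.org/index.html", addressable=True, casting="C2"),
    ],
)
print(to_l1(example))
# P A ( R1 C1 < http://www.w3.org/ > )( R2 C2 [ http://www.w3.org/index.html ] )
```

P, A, R1, C1 and so on stand in for real addresses here, just as in the schematic lines above.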
BNF for L1
Here is the BNF for L1:
assertion ::= patternLoc ws (anodeLoc ws)? member*
member ::= '(' ws roleLoc ws (cnodeLoc ws)? (sir | scr) ws ')'
sir ::= '<' ws resource ws '>'
scr ::= '[' ws resource ws ']'
resource ::= locator (ws '"' data '"')?
patternLoc ::= loc
anodeLoc ::= loc
roleLoc ::= loc
cnodeLoc ::= loc
loc ::= character+ except ws
data ::= character* /* '"' must be escaped as \" */
ws ::= space
space ::= #x20 /* US-ASCII space - decimal 32 */
character ::= [#x20-#x7E] /* US-ASCII space to tilde - decimal 32 to 126 */
tab ::= #x9 /* US-ASCII horizontal tab - decimal 9 */
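To illustrate how a consumer might read this grammar, here is a minimal parser sketch. It is not the Goose implementation, and the function name parse_l1 and the dictionary layout are hypothetical. It makes two simplifying assumptions that go beyond the BNF: any run of whitespace is accepted where the grammar asks for a single space, and addresses are assumed not to contain parentheses, angle brackets, square brackets or double quotes.

```python
import re

# A token is a delimiter, a double-quoted data string, or a run of other
# non-whitespace characters (an address). Characters that fit none of these,
# such as a lone unterminated quote, are silently skipped by this sketch.
TOKEN = re.compile(r'[()<>\[\]]|"(?:\\.|[^"\\])*"|[^\s()<>\[\]"]+')
DELIMS = set("()<>[]")


def parse_l1(line):
    """Parse one L1 line into a dict; raise ValueError on malformed input."""
    tokens = TOKEN.findall(line)
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of line")
        tok = tokens[pos]
        pos += 1
        return tok

    def take_loc():
        tok = take()
        if tok in DELIMS or tok.startswith('"'):
            raise ValueError("expected an address, got %r" % tok)
        return tok

    def expect(symbol):
        tok = take()
        if tok != symbol:
            raise ValueError("expected %r, got %r" % (symbol, tok))

    pattern = take_loc()
    anode = None
    if pos < len(tokens) and tokens[pos] != "(":
        anode = take_loc()               # optional A-node address

    members = []
    while pos < len(tokens):
        expect("(")
        role = take_loc()
        casting = None
        if pos < len(tokens) and tokens[pos] not in ("<", "["):
            casting = take_loc()         # optional C-node address
        opener = take()
        if opener not in ("<", "["):
            raise ValueError("expected '<' or '[', got %r" % opener)
        address = take_loc()
        data = None
        if pos < len(tokens) and tokens[pos].startswith('"'):
            data = take()[1:-1].replace('\\"', '"')   # unescape \" in data
        expect(">" if opener == "<" else "]")
        expect(")")
        members.append({
            "role": role,
            "casting": casting,
            "address": address,
            "addressable": opener == "[",   # [ ... ] is the resource itself
            "data": data,
        })

    return {"pattern": pattern, "anode": anode, "members": members}
```

Because every line holds exactly one assertion, a graph builder could call such a function once per input line and needs no state carried between lines.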