openEHR logo

Object Data Instance Notation (ODIN)

Issuer: openEHR Specification Program

Release: LANG latest

Status: STABLE

Revision: [latest_issue]

Date: [latest_issue_date]

Keywords: data, notation, JSON, syntax

openEHR components
© 2003 - 2023 The openEHR Foundation

The openEHR Foundation is an independent, non-profit foundation, facilitating the sharing of health records by consumers and clinicians via open specifications, clinical models and open platform implementations.

Licence

image Creative Commons Attribution-NoDerivs 3.0 Unported. https://creativecommons.org/licenses/by-nd/3.0/

Support

Issues: Problem Reports
Web: specifications.openEHR.org

Amendment Record

Issue Details Raiser Completed

1.5.3

SPECLANG-10: Fix typos in ODIN.

R Kavanagh

03 Apr 2023

LANG Release 1.0.0

1.5.2

SPECLANG-7: Move IDON spec to LANG component.

openEHR SEC

11 Nov 2018

SPECLANG-3: Add grammar support for plus-or-minus ranges;
SPECLANG-4: Update grammar to support documented ODIN keys;
SPECLANG-5: Update grammar to support documented openEHR ISO 8601 variant using '??'; make MONTH and DAY regexes in grammar more accurate; correct xref in section 7.

K Allen,
T Beale,
S Garde

29 Aug 2018

SPECLANG-6: Correct minor comment errors in section 7 to do with intervals.

C Nanjo

08 Apr 2016

BASE Release 1.0.3

1.5.1

Separate from ADL spec and rename to ODIN.

T Beale

15 Apr 2013

1.5.0

Add fractional seconds to dADL grammar.

T Beale

15 Aug 2012

Release 1.0.2

1.4.1

SPEC-268: Correct missing parentheses in dADL type identifiers. dADL grammar rules updated.

R Chen

12 Dec 2008

Release 1.0.1

1.4.0

SPEC-213: Include 'T' in dADL Date/time, Time and Duration values.

T Beale

13 Mar 2007

 

SPEC-216: Allow mixture of W, D etc in ISO 8601 Duration (deviation from standard).

S Heard

 

Release 1.0

1.3.0

SPEC-143. Add partial date/time values to dADL syntax.

S Heard

18 Jun 2005

SPEC-149. Add URIs to dADL and remove query() syntax.

T Beale

Release 0.95

1.2

SPEC-110. Added explanatory material; added domain type support; rewrote of most dADL sections.

T Beale

15 Nov 2004

Release 0.9

0.9.9

SPEC-75. Added dADL object model.

T Beale

28 Dec 2003

0.9.8

SPEC-70. Copyright Assigned by Ocean Informatics P/L Australia to The openEHR Foundation.

T Beale,
S Heard

29 Nov 2003

0.9.7

Added simple value list continuation (",…​").
Changed path syntax so that trailing '/' required for object paths.
Remove ranges with excluded limits.
Added terms and term lists to dADL leaf types.

T Beale

01 Nov 2003

0.9.6

Additions during HL7 WGM Memphis Sept 2003

T Beale

09 Sep 2003

0.9.5

Renamed dDL to dADL. Changed path syntax to conform (nearly) to Xpath.

T Beale

03 Sep 2003

0.9

Rewritten with sections on dDL.

T Beale

28 July 2003

0.8

Initial Writing

T Beale

10 July 2003

Acknowledgements

Primary Author

  • Thomas Beale, Ars Semantica; openEHR Foundation Management Board; (previously Ocean Informatics UK).

Contributors

This specification has benefited from formal and informal input from the openEHR and wider health informatics community. The openEHR Foundation would like to recognise the following people for their contributions.

  • Kurt W. Allen, Applicadia Inc, Minnesota, USA

  • Rong Chen, MD, PhD, Cambio Healthcare Systems, Sweden

  • Sam Heard, MD, DRCOG, MRCGP, FRACGP, Director at Ocean Informatics, Australia

  • Claude Nanjo, MA African Studies., M Public Health, Cognitive Medical Systems Inc., California, USA

Supporters

The work reported in this paper has been funded by the following organisations:

  • University College London - Centre for Health Informatics and Multi-professional Education (CHIME);

  • Ocean Informatics.

Special thanks to David Ingram, Emeritus Professor of Health Informatics at UCL who provided a vision and collegial working environment ever since the days of GEHR (1992).

Trademarks

  • 'openEHR' is a trademark of the openEHR Foundation

  • 'Java' is a registered trademark of Oracle Corporation

  • 'Microsoft' and '.Net' are trademarks of the Microsoft Corporation

1. Preface

1.1. Purpose

This document describes the syntax of the Object Data Instance Notation (ODIN), previously known as the 'dADL' syntax from the openEHR ADL specifications. It is intended for software developers, technically-oriented domain specialists and subject matter experts (SMEs). ODIN is designed as a human-readable and computer-processible data representation syntax, and can be hand-edited using a normal text editor.

1.2. Nomenclature

In this document, the term 'attribute' denotes any stored property of a type defined in an object model, including primitive attributes and any kind of relationship such as an association or aggregation. Where XML is mentioned, XML 'attributes' are always referred to explicitly as 'XML attributes'.

1.3. Status

This specification is in the STABLE state. The development version of this document can be found at https://specifications.openehr.org/releases/LANG/latest/odin.html.

Known omissions or questions are indicated in the text with a 'to be determined' paragraph, as follows:

TBD: (example To Be Determined paragraph)

1.4. Feedback

Feedback may be provided on the openEHR languages specifications forum.

Issues may be raised on the specifications Problem Report tracker.

To see changes made due to previously reported issues, see the LANG component Change Request tracker.

1.5. Conformance

Conformance of a data or software artifact to an openEHR specification is determined by a formal test of that artifact against the relevant openEHR Implementation Technology Specification(s) (ITSs), such as an IDL interface or an XML-schema. Since ITSs are formal derivations from underlying models, ITS conformance indicates model conformance.

2. Overview

The ODIN syntax provides a formal means of expressing instance data based on an underlying object-oriented or relational information model, which is readable both by humans and machines. The general appearance is exemplified by the following:

person = (List<PERSON>) <
    [01234] = <
        name = < -- person's name
            forenames = <"Sherlock">
            family_name = <"Holmes">
            salutation = <"Mr">
        >
        address = < -- person's address
            habitation_number = <"221B">
            street_name = <"Baker St">
            city = <"London">
            country = <"England">
        >
    >
    [01235] = < -- etc >
>

In the above the attribute names person , name , address etc, and the type List<PERSON> are all assumed to come from an information model. The [01234] and [01235] tags identify container items.

The basic design principle of ODIN is to be able to represent data in a way that is both machine-processable and human-readable, while making the fewest assumptions possible about the information model to which the data conforms. To this end, type names are optional; often, only attribute names and values are explicitly shown. No syntactical assumptions are made about whether the underlying model is relational, object-oriented or what it actually looks like. More than one information model can be compatible with the same ODIN-expressed data. The UML semantics of composition/aggregation and association are expressible, as are shared objects. Literal leaf values are only of 'standard' widely recognised types, i.e. Integer, Real, Boolean, String, Character and a range of Date/time types. In standard ODIN, all complex types are expressed structurally.

The ODIN syntax as described above has a number of useful characteristics that enable the extensive use of paths to navigate it, and express references. These include:

  • each <> block corresponds to an object (i.e. an instance of some type in an information model);

  • the name before an '=' is always an attribute name or else a container element key, which attaches to the attribute of the enclosing block;

  • the formal type of leaf values can be inferred purely from syntax;

  • paths can be formed by navigating down a tree branch and concatenating attribute name, container keys (where they are encountered) and '/' characters;

  • every node is reachable by a path;

  • dynamically bound types can be explicitly indicated;

  • shared objects can be referred to by path references.

3. Basics

3.1. File Encoding

ODIN files are intended to be capable of accommodating characters from any language, and may be used for multiple languages at once, e.g. as translation files for software. Accordingly, the assumed encoding for an ODIN file is UTF-8 unicode.

ODIN files encoded in latin 1 (ISO-8859-1) or another variant of ISO-8859 - both containing accented characters with unicode codes outside the ASCII 0-127 range - may work perfectly well, for various reasons:

  • the contain nothing but ASCII, i.e. unicode code-points 0 - 127; this will be the case in English-language authored archetypes containing no foreign words;

  • some layer of the operating system is smart enough to do an on-the-fly conversion of characters above 127 into UTF-8, even if the archetype tool being used is designed for pure UTF-8 only;

  • a tool using the ODIN file might support UTF-8 and ISO-8859 variants.

For situations where binary UTF-8 cannot be supported, ASCII encoding of unicode characters above code-point 127 should only be done using the system supported by many programming languages today, namely \u escaped UTF-16. In this system, unicode codepoints are mapped to either:

  • \uHHHH - 4 hex digits which will be the same (possibly 0-filled on the left) as the unicode code-point number expressed in hexadecimal; this applies to unicode codepoints in the range U+0000 - U+FFFF (the 'base multi-lingual plane', BMP);

  • \uHHHHHHHH - 8 hex digits to encode unicode code-points in the range U+10000 through U+10FFFF (non-BMP planes); the algorithm is described in IETF RFC 2781.

It is not expected that the above approach will be commonly needed, and it may not be needed at all; it is preferable to find ways to ensure that native UTF-8 can be supported, since this reduces the burden for ODIN parser and tool implementers. The above guidance is therefore provided only to ensure a standard approach is used for ASCII-encoded unicode, if it becomes unavoidable.

Thus, while the only officially designated encoding for ODIN and its constituent syntaxes is UTF-8, real software systems may be more tolerant. This document therefore specifies that any tool designed to process ODIN files need only support UTF-8; supporting other encodings is an optional extra. This could change in the future, if required by the ODIN user community.

URIs, which have their own data type in ODIN, are handled in a specific way: all characters outside the 'unreserved set' defined by IETF RFC 3986 are 'percent-encoded'. The unreserved set is:

unreserved : ALPHA | DIGIT | '-' | '.' | '_' | '~' ;

ALPHA : [a-zA-Z] ;
DIGIT : [0-9] ;

3.2. Special Character Sequences

In string and character data values, characters not in the lower ASCII (0-127) range should normally be UTF-8 encoded, but with the option of using the following quoted forms customary in software development:

  • \r - carriage return

  • \n - linefeed

  • \t - tab

  • \\ - backslash

  • \" - literal double quote within a String terminal value

  • \' - literal single quote within a Character terminal value

Any other character combination starting with a backslash is illegal; to get the effect of a literal backslash, the \\ sequence should always be used.

Typically in a normal string, including multi-line paragraphs as used in ODIN, only \\ and \" are likely to be necessary, since all of the others can be accommodated in their literal forms; the same goes for single characters - only \\ and \' are likely to commonly occur. However, some authors may prefer to use \n and \t to make intended formatting clearer, or to allow for text editors that do not react properly to such characters. Parsers should therefore support the above list.

3.3. Keywords

ODIN has no keywords of its own: all identifiers are assumed to come from an information model.

3.4. Reserved Characters

In ODIN, a small number of characters are reserved and have the following meanings:

  • <: open an object block;

  • >: close an object block;

  • =: indicate attribute value = object block;

  • (, ): type name or plug-in syntax type delimiters;

  • <#: open an object block expressed in a plug-in syntax;

  • #>: close an object block expressed in a plug-in syntax.

Within <> delimiters, various characters are used as follows to indicate primitive values:

  • ": double quote characters are used to delimit string values;

  • ': single quote characters are used to delimit single character values;

  • |: bar characters are used to delimit intervals;

  • []: brackets are used to delimit coded terms.

3.5. Comments

In an ODIN text, comments are indicated by the characters "--". Multi-line comments are achieved using the "--" leader on each line where the comment continues.

In this document, ODIN comments are shown in brown.

3.6. Information Model Identifiers

Two types of identifiers from information models are used in ODIN: type names and attribute names. A type name is any identifier with an initial upper case letter, followed by any combination of letters, digits and underscores. A generic type name (including nested forms) additionally may include commas and angle brackets, and must be syntactically correct as per the UML. An attribute name is any identifier with an initial lower case letter, followed by any combination of letters, digits and underscores. Any convention that obeys this rule is allowed.

At least two well-known conventions that are ubiquitous in information models obey the above rule. One of these is the following convention:

  • type names are in all uppercase, e.g. PERSON , except for 'built-in' types, such as primitive types (` Integer` , String , Boolean , Real , Double ) and assumed container types (List<T> , Set<T> , Hash<T, U> ), which are in mixed case, in order to provide easy differentiation of built-in types from constructed types defined in the reference model. Built-in types are the same types assumed by UML, OCL, IDL and other similar object-oriented formalisms.

  • attribute names are shown in all lowercase, e.g. home_address .

  • in both type names and attribute names, underscores are used to represent word breaks. This convention is used to maximise the readability of this document.

The other common style is the programmer’s mixed-case or "camel case" convention exemplified by Person and homeAddress , as long as they obey the rule above. The convention chosen for any particular ODIN document should be based on the convention used in the underlying information model. Identifiers are shown in green in this document.

3.7. Semi-colons

Semi-colons can be used to separate ODIN blocks, for example when it is preferable to include multiple attribute/value pairs on one line. Semi-colons make no semantic difference at all, and are included only as a matter of taste. The following examples are equivalent:

term = <text = <"plan">; description = <"The clinician's advice">>
term = <text = <"plan"> description = <"The clinician's advice">>

term = <
    text = <"plan">
    description = <"The clinician's advice">
>

Semi-colons are completely optional in ODIN.

4. ODIN Artefacts

An ODIN text may be created in two different physical forms: embedded fragments and documents. For both the following general structure applies:

odin_text           ::=   ( schema_identifier )? main_text ;
schema_identifier   ::=   `@` schema '=' URI ;

where 'URI' is an value of the URI primitive type. This is used to indicate the schema, including its version, on which the main ODIN text is based. It is optional because in many cases the schema is known a priori, or can be inferred from context.

The following sections describe various types of ODIN artefact.

4.1. Embedded Fragment

Fragments of ODIN text can appear embedded within other artefacts. A fragment typically includes no object identifiers nor schema identifier since both of these are assumed to be inferred from the surrounding context. A typical fragment has the following appearance:

... other formalism text ...
... delimiter ...

--
-- ODIN Embedded Fragment
--
    attr_1 = <
        attr_12 = <
            attr_13 = <leaf_value>
        >
    >
    attr_2 = <
        attr_22 = <leaf_value>
    >

... delimiter ...
... other formalism text ...

4.2. Document

An ODIN document is considered a standalone artefact whose contents can take various forms, all assumed to correspond to one or more serialised objects.

4.2.1. Implicit Object Document

A document containing only an embedded fragment such as shown above is considered to be an 'implicit' object document, and its contents are assumed to consist of values of various object properties. This format is a degenerate form of the 'anonymous' object document but so common and useful it is treated as a legal ODIN form.

Implicit object documents are commonly used to serialise models, such as information model schemas, application configuration files and so on. The usual assumption is that the filename and/or ODIN content will identify the artefact sufficiently for tools to know what its information model is.

4.2.2. Anonymous Object Document

Any other kind of ODIN document contains one or more explicit objects. The anonymous object form has one object per document, with no object identifier and consists of an outer '<>' delimiter pair containing an ODIN embedded fragment, i.e.:

--
-- ODIN Anonymous Object Document
--
<
    attr_1 = <
        attr_12 = <
            attr_13 = <leaf_value>
        >
    >
    attr_2 = <
        attr_22 = <leaf_value>
    >
>

This form has no practical benefit over the implicit document form, but is syntactically more correct, and should be supported by parsers.

4.2.3. Identified Object Document

The next variant corresponds to serialisations of multiple objects, each of which is identified.

--
-- ODIN Identified Object Document
--
["id_1"] = <
    attr_1 = <
        attr_12 = <
            attr_13 = <leaf_value>
        >
    >
>

["id_2"] = <
    attr_1 = <
        attr_12 = <
            attr_13 = <leaf_value>
        >
    >
>

...

["id_N"] = <
    attr_1 = <
        attr_12 = <
            attr_13 = <leaf_value>
        >
    >
>

 

Identifiers can be values of the String, Integer or any Date/Time primitive types. Strings are most commonly used e.g.:

--
-- ODIN Identified Object Document
--

["aaa"] = <
    ...
>
["bbb"] = <
    ...
>
["ccc"] = <
    ...
>

Identified Object Documents are most commonly used for representing serialised in-memory instance networks, i.e. the notion of an 'object dump'.

5. Content

5.1. General Structure

Within any kind of ODIN object instance (i.e. implied, anonymous or identified), the structure is a hierarchy of attribute names and object values. In its simplest form, an ODIN object consists of repetitions of the following pattern:

object = attribute_name '=' '<' value '>' ;

Each attribute name is the name of an attribute in an implied or actual object or relational model. Each "value" is either a literal value of a primitive type (see section Section 7.1) or a further nesting of attribute names and values, terminating in leaf nodes of primitive type values. Where sibling attribute nodes occur, the attribute identifiers must be unique, just as in a standard object or relational model.

The following shows a typical structure.

attr_1 = <
    attr_2 = <
        attr_3 = <leaf_value>
        attr_4 = <leaf_value>
    >
    attr_5 = <
        attr_3 = <
            attr_6 = <leaf_value>
        >
        attr_7 = <leaf_value>
    >
>
attr_8 = <...>

In the above structure, each "<>" encloses an instance of some type. The hierarchical structure corresponds to the part-of relationship between objects, otherwise known as composition and aggregation relationships in object-oriented formalisms such as UML (the difference between the two is usually described as being "sub-objects related by aggregation can have independent lifetimes, whereas sub-objects related by composition have co-terminal lifetimes and are always destroyed with the parent"; ODIN does not differentiate between the two, since it is the business of a model, not the data, to express such semantics). Associations between instances in ODIN are also representable by references, and are described in the section Section 6.1.

Validity rules for object-structuring of ODIN are as follows:

VDATU: attribute name uniqueness: sibling attributes occurring within an object node must be uniquely named with respect to each other, in the same way as in class definitions in an information model.

5.2. Paths

For any ODIN structure, a set of paths can be extracted that correspond to the tree structure of the data. The complete set of paths for the above example is as follows.

    /attr_1
    /attr_1/attr_2
    /attr_1/attr_2/attr_3           -- path to a leaf value
    /attr_1/attr_2/attr_4           -- path to a leaf value
    /attr_1/attr_5
    /attr_1/attr_5/attr_3
    /attr_1/attr_5/attr_3/attr_6    -- path to a leaf value
    /attr_1/attr_5/attr_7           -- path to a leaf value
    /attr_8

The path syntax used with ODIN maps trivially to W3C Xpath and Xquery paths, and is described in section Section 8.

5.3. Void Objects

A void object, i.e. an object attribute that has no value is allowed in an ODIN text, but ignored by parsers. It is legal to output void objects, but not recommended. A void object looks as follows:

    address = <...>    -- person's address

5.4. Container Objects

The syntax described so far allows an instance of an arbitrarily large object to be expressed, but does not support attributes of container types such as lists, sets and hash tables, i.e. items whose type in an underlying reference model is something like attr:List<Type> , attr:Set<Type> or attr: Hash<KeyType, ValueType> . There are two ways instance data of such container objects can be expressed in ODIN. The first applies to leaf values and is to use a list style literal value for , where the "list nature" of the data is expressed within the manifest value itself, as in the following examples.

fruits = <"pear", "cumquat", "peach">
some_primes = <1, 2, 3, 5>

See Section 7.4 for the complete description of list leaf types.

However for containers holding non-primitive values, including more container objects, a different syntax is needed. Consider by way of example that an instance of the container List<Person> could in theory be expressed as follows.

-- WARNING: THIS IS NOT VALID ODIN

people = <
    <name = <...> date_of_birth = <...> sex = <...> interests = <...> >
    <name = <...> date_of_birth = <...> sex = <...> interests = <...> >
    -- etc
>

Here, 'anonymous' blocks of data are repeated inside the outer block. However, this makes the data hard to read, and does not provide an easy way of constructing paths to the contained items. A better syntax becomes more obvious when we consider that members of container objects in their computable form are nearly always accessed by a method such as member(i) , item[i] or just plain [i] , in the case of array access in the C-based languages.

ODIN opts for the array-style syntax, known in ODIN as container member keys. No attribute name is explicitly given; any primitive comparable value is allowed as the key, rather than just integers used in C-style array access. Further, if integers are used, it is not assumed that they dictate ordinal indexing, i.e. it is possible to use a series of keys [2] , [4] , [8] etc. The following example shows one version of the above container in valid ODIN:

people = <
    [1] = <name = <...> birth_date = <...> interests = <...> >
    [2] = <name = <...> birth_date = <...> interests = <...> >
    [3] = <name = <...> birth_date = <...> interests = <...> >
>

Strings and dates may also be used. Keys are coloured blue in this specification in order to distinguish the run-time status of key values from the design-time status of class and attribute names. The following example shows the use of string values as keys for the contained items.

people = <
    ["akmal:1975-04-22"] = <name = <...> birth_date = <...> interests = <...> >
    ["akmal:1962-02-11"] = <name = <...> birth_date = <...> interests = <...> >
    ["gianni:1978-11-30"] = <name = <...> birth_date = <...> interests = <...> >
>

The syntax for primitive values used as keys follows exactly the same syntax described below for data of primitive types. It is convenient in some cases to construct key values from one or more of the values of the contained items, in the same way as relational database keys are constructed from sufficient field values to guarantee uniqueness. However, they need not be - they may be independent of the contained data, as in the case of hash tables, where the keys are part of the hash table structure, or equally, they may simply be integer index values, as in the 'locations' attribute in the 'school_schedule' structure shown below.

Container structures can appear anywhere in an overall instance structure, allowing complex data such as the following to be expressed in a readable way.

school_schedule = <
    lesson_times = <08:30:00, 09:30:00, 10:30:00, ...>

    locations = <
        [1] = <"under the big plane tree">
        [2] = <"under the north arch">
        [3] = <"in a garden">
    >

    subjects = <
        ["philosophy:plato"] = < -- note construction of key
            name = <"philosophy">
            teacher = <"plato">
            topics = <"meta-physics", "natural science">
            weighting = <76%>
        >
        ["philosophy:kant"] = <
            name = <"philosophy">
            teacher = <"kant">
            topics = <"meaning and reason", "meta-physics", "ethics">
            weighting = <80%>
        >
        ["art"] = <
            name = <"art">
            teacher = <"goya">
            topics = <"technique", "portraiture", "satire">
            weighting = <78%>
        >
    >
>

The example above conforms directly to the object-oriented type specification (given in a pascal-like syntax):

class SCHEDULE
    lesson_times: List<Time>
    locations: List<String>
    subjects: List<SUBJECT> -- or it could be Hash<SUBJECT>
end

class SUBJECT
    name: String
    teacher: String
    topics: List<String>
    weighting: Real
end

Other class specifications corresponding to the same data are possible, but will all be isomorphic to the above.

How key values relate to a particular object structure depends on the object model being used during the ODIN parsing process. It is possible to write a parser which makes reasonable inferences from an information model whose instances are represented as ODIN text; it is also possible to include explicit typing information in the ODIN itself (see section Section 5.6 below).

The validity rule for objects within a container attribute is as follows:

VDOBU: object identifier uniqueness: sibling objects occurring within a container attribute must be uniquely identified with respect to each other.

Paths through container objects are formed in the same way as paths in other structured data, with the addition of the key, to ensure uniqueness. The key is included syntactically enclosed in brackets, in a similar way to Xpath predicates. Paths through containers in the above example include the following:

/school_schedule/locations[1]                   -- path to "under the big..."
/school_schedule/subjects["philosophy:kant"]    -- path to "kant"

5.5. Nested Container Objects

In some cases the data of interest are instances of nested container types, such as List<List<Message>> (a list of Message lists) or Hash<List<Integer>, String> (a hash of integer lists keyed by strings). The ODIN syntax for such structures follows directly from the syntax for a single container object. The following example shows an instance of the type List<List<String>> expressed in ODIN syntax.

list_of_string_lists = <
    [1] = <
        [1] = <"first string in first list">
        [2] = <"second string in first list">
    >
    [2] = <
        [1] = <"first string in second list">
        [2] = <"second string in second list">
        [3] = <"third string in second list">
    >
    [3] = <
        [1] = <"only string in third list">
    >
>

The paths of the above example are as follows:

/list_of_string_lists[1]/[1]
/list_of_string_lists[1]/[2]
/list_of_string_lists[2]/[1]
etc

5.6. Adding Type Information

In many cases, ODIN data is of a simple structure, very regular, and highly repetitive, such as the expression of simple demographic data. In such cases, it is preferable to express as little as possible about the information model on which the data are based, since various software components want to use the data, and use it in different ways. However, there are also cases where the data is highly complex, and more model information is needed to help a parser. Examples include large design databases for aircraft and health records. Data obeying more complex models typically include sub-objects that are of a subtype of the statically declared type in the information model, in other words, dynamically bound types.

Where dynamic binding occurs in the data, it must be indicated in an ODIN document. Typing information is added to using a syntactical addition inspired by the (type) casting operator of the C language, whose meaning is approximately: force the type of the result of the following expression to be type. In ODIN typing is therefore done by including the type name in parentheses after the '=' sign, as in the following example.

destinations = <
    ["seville"] = (TOURIST_DESTINATION) <
        profile = (DESTINATION_PROFILE) <...>
        hotels = <
            ["gran sevilla"] = (HISTORIC_HOTEL) <...>
            ["sofitel"] = (LUXURY_HOTEL) <...>
            ["hotel real"] = (PENSION) <...>
        >
        attractions = <
            ["la corrida"] = (SPORT_VENUE) <...>
            ["Alcázar"] = (HISTORIC_SITE) <...>
        >
    >
>

The path set from the above example is as follows:

/destinations["seville"]/hotels["gran sevilla"]
/destinations["seville"]/hotels["sofitel"]
/destinations["seville"]/hotels["hotel real"]

/bookings["seville:0134"]/customer_id
/bookings["seville:0134"]/period
/bookings["seville:0134"]/hotel

/hotels["sofitel"]
/hotels["hotel real"]
/hotels["gran sevilla"]

In the above, no type identifiers are included after the hotels and attractions attributes, so it is assumed by the parser that they are of their statically declared type, typically something like List<HOTEL> and List<ATTRACTION> respectively. Nevertheless, complete typing information can be included, as follows.

hotels = (List<HOTEL>) <
    ["gran sevilla"] = (HISTORIC_HOTEL) <...>
>

This illustrates the use of generic, i.e. template types, expressed in the standard UML syntax, using angle brackets. Any number of template arguments and any level of nesting is allowed, as in the UML. At first view, there may appear to be a risk of confusion between template type '<>' delimiters and the standard ODIN block delimiters. However the parsing rules are easy to state; essentially the difference is that an ODIN data block is always preceded by an '=' symbol.

Type identifiers can also include namespace information, which is necessary when same-named types appear in different packages of a model. Namespaces are included by prepending package names, separated by the '.' character, in the same way as in most programming languages, as in the qualified type names org.openehr.rm.ehr.content.ENTRY and Core.Abstractions.Relationships.Relationship.

6. References

6.1. Associations and Shared Objects

All of the facilities described so far allow any object-oriented data to be faithfully expressed in a formal, systematic way which is both machine- and human-readable, and allow any node in the data to be addressed using an Xpath-style path. The availability of reliable paths allows not only the representation of single 'business objects', which are the equivalent of UML aggregation (and composition) hierarchies, but also the representation of associations between objects, and by extension, shared objects.

6.1.1. Within An Object

Consider that in the example above, 'hotel' objects may be shared objects, referred to by association. This can be expressed as follows.

destinations = <
    ["seville"] = <
        hotels = <
            ["gran sevilla"] = </hotels["gran sevilla"]>
            ["sofitel"] = </hotels["sofitel"]>
            ["hotel real"] = </hotels["hotel real"]>
        >
    >
>

bookings = <
    ["seville:0134"] = <
        customer_id = <"0134">
        period = <...>
        hotel = </hotels["sofitel"]>
    >
>

hotels = <
    ["gran sevilla"] = (HISTORIC_HOTEL) <...>
    ["sofitel"] = (LUXURY_HOTEL) <...>
    ["hotel real"] = (PENSION) <...>
>

Associations are expressed via the use of fully qualified paths as the data for an attribute. In this example, there are references from a list of destinations, and from a booking list, to the same hotel object. If type information is included, it should go in the declarations of the relevant objects; type declarations can also be used before path references, which might be useful if the association type is an ancestor type (i.e. more general type) of the type of the actual object being referred to.

6.1.2. Across Objects

In an ODIN document containing identified objects, with references across objects, reference paths will include object identifiers, as shown below:

["travel_db_0293822"] = <
    destinations = <
        ["seville"] = <
            hotels = <
                ["gran sevilla"] = <["tourism_db_13"]/hotels["gran sevilla"]>
                ["sofitel"] = <["tourism_db_13"]/hotels["sofitel"]>
                ["hotel real"] = <["tourism_db_13"]/hotels["hotel real"]>
            >
        >
    >

    bookings = <
        ["seville:0134"] = <
            customer_id = <"0134">
            period = <...>
            hotel = <["tourism_db_13"]/hotels["sofitel"]>
        >
    >
>

["tourism_db_13"] = <
    hotels = <
        ["gran sevilla"] = (HISTORIC_HOTEL) <...>
        ["sofitel"] = (LUXURY_HOTEL) <...>
        ["hotel real"] = (PENSION) <...>
    >
>

6.1.3. Across ODIN Documents

Data in other ODIN documents can be referred to using a URI containing a reference path to locate the document, with the internal path included as described above.

7. Leaf Data

All ODIN data eventually devolve to instances of the primitive types String , Integer , Real , Double , String , Character , various date/time types, lists or intervals of these types, and a few special types. ODIN does not use type or attribute names for instances of primitive types, only manifest values, making it possible to assume as little as possible about type names and structures of the primitive types. In all the following examples, the manifest data values are assumed to appear immediately inside a leaf pair of angle brackets, i.e.

some_attribute = <manifest value>

7.1. Primitive Types

7.1.1. Character Data

Characters are shown in a number of ways. In the literal form, a character is shown in single quotes, as follows:

    'a'

Characters outside the low ASCII (0-127) range must be UTF-8 encoded, with a small number of backslash-quoted ASCII characters allowed, as described in the section Section 3.1.

7.1.2. String Data

All strings are enclosed in double quotes, as follows:

    "this is a string"

Quotes are encoded using ISO/IEC 10646 codes, e.g. :

    "this is a much longer string, what one might call a &quot;phrase&quot;."

Line extension of strings is done simply by including returns in the string. The exact contents of the string are computed as being the characters between the double quote characters, with the removal of white space leaders up to the left-most character of the first line of the string. This has the effect of allowing the inclusion of multi-line strings in ODIN texts, in their most natural human-readable form, e.g.:

    text = <"And now the STORM-BLAST came, and he
            Was tyrannous and strong :
            He struck with his o'ertaking wings,
            And chased us south along.">

String data can be used to contain almost any other kind of data, which is intended to be parsed as some other formalism. Characters outside the low ASCII (0-127) range must be UTF-8 encoded, with a small number of backslash-quoted ASCII characters allowed, as described in section Section 3.1.

7.1.3. Integer Data

Integers are represented simply as numbers, e.g.:

    25
    300000
    29e6

Commas or periods for breaking long numbers are not allowed, since they confuse the use of commas used to denote list items (See Section 7.4 below).

7.1.4. Real Data

Real numbers are assumed whenever a decimal is detected in a number, e.g.:

    25.0
    3.1415926
    6.023e23

Commas or periods for breaking long numbers are not allowed. Only periods may be used to separate the decimal part of a number; unfortunately, the European use of the comma for this purpose conflicts with the use of the comma to distinguish list items (See Section 7.4 below).

7.1.5. Boolean Data

Boolean values can be indicated by the following values (case-insensitive):

    True
    False

7.1.6. Dates and Times

7.1.6.1. Complete Date/Times

In ODIN, full and partial dates, times and durations can be expressed. All full dates, times and durations are expressed using a subset of ISO 8601. The openEHR Foundation Types specification provides a full explanation of the ISO 8601 semantics supported in openEHR.

In ODIN, the use of ISO 8601 allows extended form only (i.e. ':' and '-' must be used). The ISO 8601 method of representing partial dates consisting of a single year number, and partial times consisting of hours only are not supported, since they are ambiguous. See below for partial forms.

Durations are expressed using a string which starts with 'P', and is followed by a list of periods, each appended by a single letter designator: 'Y' for years, "M' for months, 'W' for weeks, 'D' for days, 'H' for hours, 'M' for minutes, and 'S' for seconds. The literal 'T' separates the YMWD part from the HMS part, ensuring that months and minutes can be distinguished. Examples of date/time data include:

    1919-01-23                 -- birthdate of Django Reinhardt
    16:35:04,5                 -- rise of Venus in Sydney on 24 Jul 2003
    2001-05-12T07:35:20+1000   -- timestamp on an email received from Australia
    P22DT4H15M0S               -- period of 22 days, 4 hours, 15 minutes
7.1.6.2. Partial Date/Times

Two ways of expressing partial (i.e. incomplete) date/times are supported in ODIN. The ISO 8601 incomplete formats are supported in extended form only (i.e. with '-' and ':' separators) for all patterns that are unambiguous on their own. Dates consisting of only the year, and times consisting of only the hour are not supported, since both of these syntactically look like integers. The supported ISO 8601 patterns are as follows:

    yyyy-MM            -- a date with no days
    hh:mm              -- a time with no seconds
    yyyy-MM-ddThh:mm   -- a date/time with no seconds
    yyyy-MM-ddThh      -- a date/time, no minutes or seconds

To deal with the limitations of ISO 8601 partial patterns in a context-free parsing environment, a second form of pattern is supported in ODIN, based on ISO 8601. In this form, '?' characters are substituted for missing digits. Valid partial dates follow the patterns (corresponding to reduced accuracy forms described in ISO 8601-2004, section 4.1.4.3):

    yyyy-MM-??         -- date with unknown day in month
    yyyy-??-??         -- date with unknown month and day

Valid partial times follow the patterns (corresponding to reduced accuracy forms described in ISO 8601-2004, section 4.2.2.3):

    hh:mm:??           -- time with unknown seconds
    hh:??:??           -- time with unknown minutes and seconds

Valid date/times follow the patterns (corresponding to reduced accuracy forms described in ISO 8601-2004, section 4.3.3):

    yyyy-MM-ddThh:mm:??     -- date/time with unknown seconds
    yyyy-MM-ddThh:??:??     -- date/time with unknown minutes and seconds

7.2. Intervals of Ordered Primitive Types

Intervals of any ordered primitive type, i.e., Integer, Real, Date, Time, Date_time and Duration, can be stated using the following uniform syntax, where N, M are instances of any of the ordered types:

    |N..M|        -- the two-sided range N >= x <= M;
    |>N..M|       -- the two-sided range N > x <= M;
    |N..<M|       -- the two-sided range N <= x <M;
    |>N..<M|      -- the two-sided range N > x <M;
    |<N|          -- the one-sided range x < N;
    |>N|          -- the one-sided range x > N;
    |>=N|         -- the one-sided range x >= N;
    |<=N|         -- the one-sided range x <= N;
    |N +/-M|      -- interval of N ±M.
    |N±M|         -- interval of N ±M.

The allowable values for N and M include any value in the range of the relevant type.

Note
for the plus/minus form (N ± M format), spaces are not significant either side of the '±' sign or the equivalent '+/-' string.

Examples of this syntax include:

    |0..5|              -- integer interval
    |0.0..1000.0|       -- real interval
    |0.0..<1000.0|      -- real interval 0.0 <= x < 1000.0
    |08:02..09:10|      -- interval of time
    |>=1939-02-01|      -- open-ended interval of dates
    |5.0 ±0.5|          -- 4.5 ±5.5
    |5.0 +/-0.5|        -- 4.5 ±5.5
    |>=0|               -- >= 0

7.3. Other Built-in Types

7.3.1. URIs

URI can be expressed as ODIN data in the usual way found on the web, and follow the standard syntax from IETF RFC3986. Examples of URIs in ODIN:

    http://openEHR.org/home
    ftp://get.this.file.com?file=cats.doc#section_5
    http://www.mozilla.org/products/firefox/upgrade/?application=thunderbird

Encoding of special characters in URIs follows the IETF RFC 3986, as described in the section Section 3.1.

7.3.2. Coded Terms

Coded terms are ubiquitous in medical and clinical information, and are likely to become so in most other industries, as ontologically-based information systems and the 'semantic web' emerge. The logical structure of a coded term is simple: it consists of an identifier of a terminology (with optional version), and an identifier of a code within that terminology. The ODIN string representation is of the following form:

    [terminology_id::code]

where terminology_id is an alphanumeric name, optionally followed by a version in parentheses, and code is a string. The allowed characters in each part are described in the grammar.

Examples from clinical data:

    [icd10AM::F60.1]            -- from ICD10AM
    [snomed_ct::2004950]        -- from snomed-ct
    [snomed_ct(3.1)::2004950]   -- from snomed-ct v 3.1

7.4. Lists of Built-in Types

Data of any primitive type can occur singly or in lists, which are shown as comma-separated lists of items, all of the same type, such as in the following examples:

    "cyan", "magenta", "yellow", "black"    -- printer's colours
    1, 1, 2, 3, 5                           -- first 5 fibonacci numbers
    08:02, 08:35, 09:10                     -- set of train times

No assumption is made in the syntax about whether a list represents a set, a list or some other kind of sequence - such semantics must be taken from an underlying information model.

Lists which happen to have only one datum are indicated by using a comma followed by a list continuation marker of three dots, i.e. "…​", e.g.:

    "en", ...       -- languages
    "icd10", ...    -- terminologies
    [at0200], ...

White space may be freely used or avoided in lists, i.e. the following two lists are identical:

    1,1,2,3
    1, 1, 2,3

8. Path Syntax

8.1. Semantics

The general form of the path syntax is as follows (see syntax section below for full specification):

path            =     ['/'] path_segment { '/' path_segment } ;
path_segment    =     attr_name [ '[' object_id ']' ] ;

Essentially, ODIN paths consist of segments separated by slashes ('/'), where each segment is an attribute name with optional object identifier predicate, indicated by brackets ('[]').

ODIN Paths are formed from an alternation of segments made up of an attribute name and optional object node identifier predicate, separated by slash ('/') characters. Node identifiers are delimited by brackets (i.e. []).

Paths are absolute or relative with respect to the document in which they are mentioned. Absolute paths commence with an initial slash ('/') character.

A typical ODIN path used to refer to a node in an ODIN text is as follows.

/term_definitions["en"]/items["at0001"]/text

In the following sections, paths are shown for all the ODIN data examples.

8.2. Relationship with W3C Xpath

The ODIN path syntax is semantically a subset of the Xpath query language, with a few syntactic shortcuts to reduce the verbosity of the most common cases. Xpath differentiates between "children" and "attributes" sub-items of an object due to the difference in XML between Elements (true sub-objects) and Attributes (tag-embedded primitive values). In ODIN, as with any pure object formalism, there is no such distinction, and all sub-parts of any object are referenced in the manner of Xpath children; in particular, in the Xpath abbreviated syntax, the key child:: does not need to be used.

ODIN does not distinguish attributes from children, and also assumes the node_id attribute. Thus, the following expressions are legal for cADL structures:

items[1]            -- the first member of 'items'
items["systolic"]   -- the member of 'items' with meaning 'systolic'
items["at0001"]     -- the member of 'items' with node id 'at0001'

The Xpath equivalents are:

items[1]                                -- the first member of 'items'
items[@key = 'systolic']                -- the member of 'items' with key "systolic"
items[@archetype_node_id = 'at0001']    -- the member of 'items' with archetype_node_id attribute 'at0001'

9. Plug-in Syntaxes

Using the ODIN syntax, any object structure can be serialised. In some cases, the requirement is to express some part of the structure in an abstract syntax, rather than in the more literal serialised object form of ODIN. ODIN provides for this possibility by allowing the value of any object (i.e. what appears between any matching pair of <> delimiters) to be expressed in some other syntax, known as a "plug-in" syntax. Plug-in syntaxes are indicated in ODIN in a similar way as typed objects, i.e. by the use of the syntax type in parentheses preceding the <> block. For a plug-in section, the <> delimiters are modified to <# #>, to allow for easier parser design, and easier recognition of such blocks by human readers. The general form is as follows:

attr_name = (syntax) <#
...
#>

The following example illustrates a cADL plug-in section in an archetype, which it itself an ODIN document:

definition = (cadl) <#
    ENTRY[at0000] ∈ { -- blood pressure measurement
        name ∈ { -- any synonym of BP
            CODED_TEXT ∈ {
                code ∈ {
                    CODE_PHRASE ∈ {[ac0001]}
                }
            }
        }
    }
#>

Clearly, many plug-in syntaxes might one day be used within ODIN data; there is no guarantee that every ODIN parser will support them. The general approach to parsing should be to use plug-in parsers, i.e. to obtain a parser for a plug-in syntax that can be built into the existing parser framework.

Appendix A: Relationship with other Syntaxes

A.1. XML

A common question about ODIN is why it is needed, when there is already XML? To start with, this question highlights the widespread misconception about XML, namely that because it can be read by a text editor, it is intended for humans. In fact, XML is designed for machine processing, and is textual to guarantee its interoperability, not its readability. Realistic examples of XML (e.g. XML-schema instance, OWL-RDF ontologies) are generally unreadable for humans. ODIN is on the other hand designed as a human-writable and readable formalism that is also machine processable; it may be thought of as an abstract syntax for object-oriented data. ODIN also differs from XML by:

  • providing a more comprehensive set of leaf data types, including intervals of numerics and date/time types, and lists of all primitive types;

  • adhering to object-oriented semantics, particularly for container types, which XML schema languages generally do not;

  • not using the confusing XML notion of 'attributes' and 'elements' to represent what are essentially object properties;

  • requiring roughly half the space of the equivalent XML.

This does not prevent ODIN documents being converted to XML and indeed the conversion to XML instance is rather straightforward.

A.1.1. Expression of ODIN in XML

The ODIN syntax maps relatively easily to XML instance. It is important to realise that developers using XML often develop different mappings for object-oriented data, due to the fact that XML does not have systematic object-oriented semantics. This is particularly the case where containers such as lists and sets such as 'employees: List<Person>' are mapped to XML; many implementors have to invent additional tags such as 'employee' to make the mapping appear visually correct. The particular mapping chosen here is designed to be a faithful reflection of the semantics of the object-oriented data, and does not try to take into account visual aesthetics of the XML. The result is that Xpath expressions are the same for ODIN and XML, and also correspond to what one would expect based on an underlying object model.

The main elements of the mapping are as follows.

A.1.1.1. Single Attributes

Single attribute nodes map to tagged nodes of the same name.

A.1.1.2. Container Attributes

Container attribute nodes map to a series of tagged nodes of the same name, each with the XML attribute 'id' set to the ODIN key. For example, the ODIN:

subjects = <
    ["philosophy:plato"] = <
        name = <"philosophy">
    >
    ["philosophy:kant"] = <
        name = <"philosophy">
    >
>

maps to the XML:

<subjects id="philosophy:plato">
    <name> philosophy </name>
</subjects>
<subjects id="philosophy:kant">
    <name> philosophy </name>
</subjects>

This guarantees that the path subjects[@id='philosophy:plato']/name navigates to the same element in both ODIN and the XML.

A.1.1.3. Nested Container Attributes

Nested container attribute nodes map to a series of tagged nodes of the same name, each with the XML attribute 'id' set to the ODIN key. For example, consider an object structure defined by the signature countries:Hash<Hash<Hotel,String>,String> . An instance of this in ODIN looks as follows:

countries = <
    ["spain"] = <
        ["hotels"] = <...>
        ["attractions"] = <...>
    >
    ["egypt"] = <
        ["hotels"] = <...>
        ["attractions"] = <...>
    >
>

can be mapped to the XML in which the synthesised element tag "_items" and the attribute "key" are used:

<countries key="spain">
    <_items key="hotels">
        ...
    </_items>
    <_items key="attractions">
        ...
    </_items>
</countries>

<countries key="egypt">
    <_items id="hotels">
    ...
    </_items>
    <_items key="attractions">
    ...
    </_items>
</countries>

In this case, the ODIN path countries["spain"]/["hotels"] will be transformed to the Xpath countries[@key="spain"]/_items[@key="hotels"] in order to navigate to the same element.

A.1.1.4. Type Names

Type names map to XML 'type' attributes e.g. the ODIN:

destinations = <
    ["seville"] = (TOURIST_DESTINATION) <
        profile = (DESTINATION_PROFILE) <...>
        hotels = <
            ["gran sevilla"] = (HISTORIC_HOTEL) <...>
        >
    >
>

maps to:

<destinations id="seville" rm:type="TOURIST_DESTINATION">
	<profile rm:type="DESTINATION_PROFILE">
		...
	</profile>
	<hotels id="gran sevilla" rm:type="HISTORIC_HOTEL">
		...
	</hotels>
</destinations>

A.2. JSON

The JavaScript Object Notation (JSON) was designed with the aim of representing JavaScript data objects in a programming language independent way, primarily for use with the web and JavaScript. The majority of use was for small fragments, although in more recent years it is starting to be used for more complex data representation tasks, for example with REST web services.

A.2.1. Leaf types

ODIN has more terminal types than JSON, including the date/time types, and the Interval types.

Date/time types would typically be mapped to and from Strings containing ISO 8601 syntax dates and times.

The interval is a built-in ODIN type that would need to be explicitly expanded into a JSON structure, with an assumed model of the parts of the Interval. For this purpose, the following model is recommended as a basis for constructing the JSON equivalent:

class Interval <T: Ordered> {
    T lower;
    T upper;
    Boolean lower_included;
    Boolean upper_included;
}

A.2.2. Typing

ODIN supports optional type markers, which are not available with JSON. In a conversion situation these would need to be converted to an explicit structure.

Appendix B: Syntax Specification

The grammar and lexical specification for the standard ODIN syntax is shown below in ANTLR4 form.  

//
// description: Antlr4 grammar for Object Data Instance Notation (ODIN)
// author:      Thomas Beale <thomas.beale@openehr.org>
// contributors:Pieter Bos <pieter.bos@nedap.com>
// support:     openEHR Specifications PR tracker <https://openehr.atlassian.net/projects/SPECPR/issues>
// copyright:   Copyright (c) 2015- openEHR Foundation <http://www.openEHR.org>
// license:     Apache 2.0 License <http://www.apache.org/licenses/LICENSE-2.0.html>
//

grammar odin;
import odin_values;

//
// -------------------------- Parse Rules --------------------------
//

odin_text :
      attr_vals
    | object_value_block
	;

attr_vals : ( attr_val ';'? )+ ;

attr_val : odin_object_key '=' object_block ;

odin_object_key : ALPHA_UC_ID | ALPHA_UNDERSCORE_ID | rm_attribute_id ;

object_block :
      object_value_block
    | object_reference_block
    ;

object_value_block : ( '(' rm_type_id ')' )? '<' ( primitive_object | attr_vals? | keyed_object* ) '>' | EMBEDDED_URI;

keyed_object : '[' key_id ']' '=' object_block ;

key_id :
      string_value
    | integer_value
    | date_value
    | time_value
    | date_time_value
    ;

object_reference_block : '<' odin_path_list '>' ;

odin_path_list  : odin_path ( ( ',' odin_path )+ | SYM_LIST_CONTINUE )? ;
odin_path       : '/' | ADL_PATH ;

rm_type_id      : ALPHA_UC_ID ( '<' rm_type_id ( ',' rm_type_id )* '>' )? ;
rm_attribute_id : ALPHA_LC_ID ;

The following grammar defines ODIN terminal value syntax, and can be imported by any parser needing the ODIN values.

//
// grammar defining ODIN terminal value types, including atoms, lists and intervals
// author:      Pieter Bos <pieter.bos@nedap.com>
// support:     openEHR Specifications PR tracker <https://openehr.atlassian.net/projects/SPECPR/issues>
// copyright:   Copyright (c) 2018- openEHR Foundation <http://www.openEHR.org>
//

grammar odin_values;
import base_lexer;

//
// ========================= Parser ============================
//

primitive_object :
      primitive_value
    | primitive_list_value
    | primitive_interval_value
    ;

primitive_value :
      string_value
    | integer_value
    | real_value
    | boolean_value
    | character_value
    | term_code_value
    | date_value
    | time_value
    | date_time_value
    | duration_value
    ;

primitive_list_value :
      string_list_value
    | integer_list_value
    | real_list_value
    | boolean_list_value
    | character_list_value
    | term_code_list_value
    | date_list_value
    | time_list_value
    | date_time_list_value
    | duration_list_value
    ;

primitive_interval_value :
      integer_interval_value
    | real_interval_value
    | date_interval_value
    | time_interval_value
    | date_time_interval_value
    | duration_interval_value
    ;

string_value : STRING ;
string_list_value : string_value ( ( ',' string_value )+ | ',' SYM_LIST_CONTINUE ) ;

integer_value : ( '+' | '-' )? INTEGER ;
integer_list_value : integer_value ( ( ',' integer_value )+ | ',' SYM_LIST_CONTINUE ) ;
integer_interval_value :
      '|' SYM_GT? integer_value '..' SYM_LT? integer_value '|'
    | '|' relop? integer_value '|'
    | '|' integer_value SYM_PLUS_OR_MINUS integer_value '|'
    ;
integer_interval_list_value : integer_interval_value ( ( ',' integer_interval_value )+ | ',' SYM_LIST_CONTINUE ) ;

real_value : ( '+' | '-' )? REAL ;
real_list_value : real_value ( ( ',' real_value )+ | ',' SYM_LIST_CONTINUE ) ;
real_interval_value :
      '|' SYM_GT? real_value '..' SYM_LT? real_value '|'
    | '|' relop? real_value '|'
    | '|' real_value SYM_PLUS_OR_MINUS real_value '|'
    ;
real_interval_list_value : real_interval_value ( ( ',' real_interval_value )+ | ',' SYM_LIST_CONTINUE ) ;

boolean_value : SYM_TRUE | SYM_FALSE ;
boolean_list_value : boolean_value ( ( ',' boolean_value )+ | ',' SYM_LIST_CONTINUE ) ;

character_value : CHARACTER ;
character_list_value : character_value ( ( ',' character_value )+ | ',' SYM_LIST_CONTINUE ) ;

date_value : ISO8601_DATE ;
date_list_value : date_value ( ( ',' date_value )+ | ',' SYM_LIST_CONTINUE ) ;
date_interval_value :
      '|' SYM_GT? date_value '..' SYM_LT? date_value '|'
    | '|' relop? date_value '|'
    | '|' date_value SYM_PLUS_OR_MINUS duration_value '|'
    ;
date_interval_list_value : date_interval_value ( ( ',' date_interval_value )+ | ',' SYM_LIST_CONTINUE ) ;

time_value : ISO8601_TIME ;
time_list_value : time_value ( ( ',' time_value )+ | ',' SYM_LIST_CONTINUE ) ;
time_interval_value :
      '|' SYM_GT? time_value '..' SYM_LT? time_value '|'
    | '|' relop? time_value '|'
    | '|' time_value SYM_PLUS_OR_MINUS duration_value '|'
    ;
time_interval_list_value : time_interval_value ( ( ',' time_interval_value )+ | ',' SYM_LIST_CONTINUE ) ;

date_time_value : ISO8601_DATE_TIME ;
date_time_list_value : date_time_value ( ( ',' date_time_value )+ | ',' SYM_LIST_CONTINUE ) ;
date_time_interval_value :
      '|' SYM_GT? date_time_value '..' SYM_LT? date_time_value '|'
    | '|' relop? date_time_value '|'
    | '|' date_time_value SYM_PLUS_OR_MINUS duration_value '|'
    ;
date_time_interval_list_value : date_time_interval_value ( ( ',' date_time_interval_value )+ | ',' SYM_LIST_CONTINUE ) ;

duration_value : ISO8601_DURATION ;
duration_list_value : duration_value ( ( ',' duration_value )+ | ',' SYM_LIST_CONTINUE ) ;
duration_interval_value :
      '|' SYM_GT? duration_value '..' SYM_LT? duration_value '|'
    | '|' relop? duration_value '|'
    | '|' duration_value SYM_PLUS_OR_MINUS duration_value '|'
    ;
duration_interval_list_value : duration_interval_value ( ( ',' duration_interval_value )+ | ',' SYM_LIST_CONTINUE ) ;

term_code_value : TERM_CODE_REF ;
term_code_list_value : term_code_value ( ( ',' term_code_value )+ | ',' SYM_LIST_CONTINUE ) ;

relop : SYM_GT | SYM_LT | SYM_LE | SYM_GE ;

The following grammar defines lexer patterns of generic lexical tokens.

//
//  General purpose patterns used in all openEHR parser and lexer tools
//  author:      Pieter Bos <pieter.bos@nedap.com>
//  support:     openEHR Specifications PR tracker <https://openehr.atlassian.net/projects/SPECPR/issues>
//  copyright:   Copyright (c) 2018- openEHR Foundation <http://www.openEHR.org>
//

lexer grammar base_lexer;

// ---------- whitespace & comments ----------

WS         : [ \t\r]+    -> channel(HIDDEN) ;
LINE       : '\r'? EOL  -> channel(HIDDEN) ;  // increment line count
CMT_LINE   : '--' .*? '\r'? EOL  -> skip ;   // (increment line count)
fragment EOL : '\n' ;


// -------------- template overlay cannot be handled in a more simple way because it includes the comment line
SYM_TEMPLATE_OVERLAY : H_CMT_LINE (WS|LINE)* SYM_TEMPLATE_OVERLAY_ONLY;
fragment H_CMT_LINE : '--------' '-'*? ('\n'|'\r''\n'|'\r')  ;  // special type of comment for splitting template overlays
fragment SYM_TEMPLATE_OVERLAY_ONLY     : [Tt][Ee][Mm][Pp][Ll][Aa][Tt][Ee]'_'[Oo][Vv][Ee][Rr][Ll][Aa][Yy] ;

// ---------- path patterns -----------

ADL_PATH : ADL_ABSOLUTE_PATH | ADL_RELATIVE_PATH;
fragment ADL_ABSOLUTE_PATH : ('/' ADL_PATH_SEGMENT)+;
fragment ADL_RELATIVE_PATH : ADL_PATH_SEGMENT ('/' ADL_PATH_SEGMENT)+;

fragment ADL_PATH_SEGMENT      : ALPHA_LC_ID ('[' ADL_PATH_ATTRIBUTE ']')?;
fragment ADL_PATH_ATTRIBUTE    : ID_CODE | STRING | INTEGER | ARCHETYPE_REF | ARCHETYPE_HRID;


// ---------- ISO8601-based date/time/duration constraint patterns

DATE_CONSTRAINT_PATTERN      : YEAR_PATTERN '-' MONTH_PATTERN '-' DAY_PATTERN ;
TIME_CONSTRAINT_PATTERN      : HOUR_PATTERN ':' MINUTE_PATTERN ':' SECOND_PATTERN TZ_PATTERN? ;
DATE_TIME_CONSTRAINT_PATTERN : DATE_CONSTRAINT_PATTERN 'T' TIME_CONSTRAINT_PATTERN ;

fragment YEAR_PATTERN   : 'yyyy' | 'YYYY' | 'yyy' | 'YYY' ;
fragment MONTH_PATTERN  : 'mm' | 'MM' | '??' | 'XX' | 'xx' ;
fragment DAY_PATTERN    : 'dd' | 'DD' | '??' | 'XX' | 'xx'  ;
fragment HOUR_PATTERN   : 'hh' | 'HH' | '??' | 'XX' | 'xx'  ;
fragment MINUTE_PATTERN : 'mm' | 'MM' | '??' | 'XX' | 'xx'  ;
fragment SECOND_PATTERN : 'ss' | 'SS' | '??' | 'XX' | 'xx'  ;
fragment TZ_PATTERN     : '±' ('hh' | 'HH') (':'? ('mm' | 'MM'))? | 'Z' ;

DURATION_CONSTRAINT_PATTERN  : 'P' [yY]?[mM]?[Ww]?[dD]? ( 'T' [hH]?[mM]?[sS]? )? ;

// ---------- Delimited Regex matcher ------------
// In ADL, a regexp can only exist between {}.
// allows for '/' or '^' delimiters
// logical form - REGEX: '/' ( '\\/' | ~'/' )+ '/' | '^' ( '\\^' | ~'^' )+ '^';
// The following is used to ensure REGEXes don't get mixed up with paths, which use '/' chars

//a
// regexp can only exist between {}. It can optionally have an assumed value, by adding ;"value"
CONTAINED_REGEXP: '{'WS* (SLASH_REGEXP | CARET_REGEXP) WS* (';' WS* STRING)? WS* '}';
fragment SLASH_REGEXP: '/' SLASH_REGEXP_CHAR+ '/';
fragment SLASH_REGEXP_CHAR: ~[/\n\r] | ESCAPE_SEQ | '\\/';

fragment CARET_REGEXP: '^' CARET_REGEXP_CHAR+ '^';
fragment CARET_REGEXP_CHAR: ~[^\n\r] | ESCAPE_SEQ | '\\^';

// ---------- various ADL2 codes -------

ROOT_ID_CODE : 'id1' '.1'* ;
ID_CODE      : 'id' CODE_STR ;
AT_CODE      : 'at' CODE_STR ;
AC_CODE      : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;

// ---------- ISO8601 Date/Time values ----------

ISO8601_DATE      : YEAR '-' MONTH ( '-' DAY )? | YEAR '-' MONTH '-' UNKNOWN_DT | YEAR '-' UNKNOWN_DT '-' UNKNOWN_DT ;
ISO8601_TIME      : ( HOUR ':' MINUTE ( ':' SECOND ( SECOND_DEC_SEP DIGIT+ )?)? | HOUR ':' MINUTE ':' UNKNOWN_DT | HOUR ':' UNKNOWN_DT ':' UNKNOWN_DT ) TIMEZONE? ;
ISO8601_DATE_TIME : ( YEAR '-' MONTH '-' DAY 'T' HOUR (':' MINUTE (':' SECOND ( SECOND_DEC_SEP DIGIT+ )?)?)? | YEAR '-' MONTH '-' DAY 'T' HOUR ':' MINUTE ':' UNKNOWN_DT | YEAR '-' MONTH '-' DAY 'T' HOUR ':' UNKNOWN_DT ':' UNKNOWN_DT ) TIMEZONE? ;
fragment TIMEZONE : 'Z' | ('+'|'-') HOUR_MIN ;   // hour offset, e.g. `+0930`, or else literal `Z` indicating +0000.
fragment YEAR     : [0-9][0-9][0-9][0-9] ;		   // Year in ISO8601:2004 is 4 digits with 0-filling as needed
fragment MONTH    : ( [0][1-9] | [1][0-2] ) ;    // month in year
fragment DAY      : ( [0][1-9] | [12][0-9] | [3][0-1] ) ;  // day in month
fragment HOUR     : ( [01]?[0-9] | [2][0-3] ) ;  // hour in 24 hour clock
fragment MINUTE   : [0-5][0-9] ;                 // minutes
fragment HOUR_MIN : ( [01]?[0-9] | [2][0-3] ) [0-5][0-9] ;  // hour / minutes quad digit pattern
fragment SECOND   : [0-5][0-9] ;                 // seconds
fragment SECOND_DEC_SEP : '.' | ',' ;
fragment UNKNOWN_DT  : '??' ;                    // any unknown date/time value, except years.

// ISO8601 DURATION PnYnMnWnDTnnHnnMnn.nnnS 
// here we allow a deviation from the standard to allow weeks to be // mixed in with the rest since this commonly occurs in medicine
// TODO: the following will incorrectly match just 'P'
ISO8601_DURATION : '-'?'P' (DIGIT+ [yY])? (DIGIT+ [mM])? (DIGIT+ [wW])? (DIGIT+[dD])? ('T' (DIGIT+[hH])? (DIGIT+[mM])? (DIGIT+ (SECOND_DEC_SEP DIGIT+)?[sS])?)? ;

// ------------------- special word symbols --------------
SYM_TRUE  : [Tt][Rr][Uu][Ee] ;
SYM_FALSE : [Ff][Aa][Ll][Ss][Ee] ;

// ---------------------- Identifiers ---------------------

ARCHETYPE_HRID      : ARCHETYPE_HRID_ROOT '.v' ARCHETYPE_VERSION_ID ;
ARCHETYPE_REF       : ARCHETYPE_HRID_ROOT '.v' INTEGER ( '.' DIGIT+ )* ;
fragment ARCHETYPE_HRID_ROOT : (NAMESPACE '::')? IDENTIFIER '-' IDENTIFIER '-' IDENTIFIER '.' LABEL ;
fragment ARCHETYPE_VERSION_ID: DIGIT+ ('.' DIGIT+ ('.' DIGIT+ ( ( '-rc' | '-alpha' ) ( '.' DIGIT+ )? )?)?)? ;
VERSION_ID          : DIGIT+ '.' DIGIT+ '.' DIGIT+ ( ( '-rc' | '-alpha' ) ( '.' DIGIT+ )? )? ;
fragment IDENTIFIER : ALPHA_CHAR WORD_CHAR* ;

// --------------------- composed primitive types -------------------

// e.g. [ICD10AM(1998)::F23]; [ISO_639-1::en]
TERM_CODE_REF : '[' TERM_CODE_CHAR+ ( '(' TERM_CODE_CHAR+ ')' )? '::' TERM_CODE_CHAR+ ']' ;
fragment TERM_CODE_CHAR: NAME_CHAR | '.';

// --------------------- URIs --------------------

// URI recogniser based on https://tools.ietf.org/html/rfc3986 and
// http://www.w3.org/Addressing/URL/5_URI_BNF.html
EMBEDDED_URI: '<' ([ \t\r\n]|CMT_LINE)* URI ([ \t\r\n]|CMT_LINE)* '>';

fragment URI : URI_SCHEME ':' URI_HIER_PART ( '?' URI_QUERY )? ('#' URI_FRAGMENT)? ;

fragment URI_HIER_PART : ( '//' URI_AUTHORITY ) URI_PATH_ABEMPTY
    | URI_PATH_ABSOLUTE
    | URI_PATH_ROOTLESS
    | URI_PATH_EMPTY;

fragment URI_SCHEME : ALPHA_CHAR ( ALPHA_CHAR | DIGIT | '+' | '-' | '.')* ;

fragment URI_AUTHORITY : ( URI_USERINFO '@' )? URI_HOST ( ':' URI_PORT )? ;
fragment URI_USERINFO: (URI_UNRESERVED | URI_PCT_ENCODED | URI_SUB_DELIMS | ':' )* ;
fragment URI_HOST : URI_IP_LITERAL | URI_IPV4_ADDRESS | URI_REG_NAME ; //TODO: ipv6
fragment URI_PORT: DIGIT*;

fragment URI_IP_LITERAL   : '[' URI_IPV6_LITERAL ']'; //TODO, if needed: IPvFuture
fragment URI_IPV4_ADDRESS : URI_DEC_OCTET '.' URI_DEC_OCTET '.' URI_DEC_OCTET '.' URI_DEC_OCTET ;
fragment URI_IPV6_LITERAL : HEX_QUAD (':' HEX_QUAD )* ':' ':' HEX_QUAD (':' HEX_QUAD )* ;

fragment URI_DEC_OCTET  : DIGIT | [1-9] DIGIT | '1' DIGIT DIGIT | '2' [0-4] DIGIT | '25' [0-5];
fragment URI_REG_NAME: (URI_UNRESERVED | URI_PCT_ENCODED | URI_SUB_DELIMS)*;
fragment HEX_QUAD : HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT ;

fragment URI_PATH_ABEMPTY: ('/' URI_SEGMENT ) *;
fragment URI_PATH_ABSOLUTE: '/' ( URI_SEGMENT_NZ ( '/' URI_SEGMENT )* )?;
fragment URI_PATH_NOSCHEME: URI_SEGMENT_NZ_NC ( '/' URI_SEGMENT )*;
fragment URI_PATH_ROOTLESS: URI_SEGMENT_NZ ( '/' URI_SEGMENT )*;
fragment URI_PATH_EMPTY: ;

fragment URI_SEGMENT: URI_PCHAR*;
fragment URI_SEGMENT_NZ: URI_PCHAR+;
fragment URI_SEGMENT_NZ_NC: ( URI_UNRESERVED | URI_PCT_ENCODED | URI_SUB_DELIMS | '@' )+; //non-zero-length segment without any colon ":"

fragment URI_PCHAR: URI_UNRESERVED | URI_PCT_ENCODED | URI_SUB_DELIMS | ':' | '@';

//fragment URI_PATH   : '/' | ( '/' URI_XPALPHA+ )+ ('/')?;
fragment URI_QUERY : (URI_PCHAR | '/' | '?')*;
fragment URI_FRAGMENT  : (URI_PCHAR | '/' | '?')*;

fragment URI_PCT_ENCODED : '%' HEX_DIGIT HEX_DIGIT ;

fragment URI_UNRESERVED: ALPHA_CHAR | DIGIT | '-' | '.' | '_' | '~';
fragment URI_RESERVED: URI_GEN_DELIMS | URI_SUB_DELIMS;
fragment URI_GEN_DELIMS: ':' | '/' | '?' | '#' | '[' | ']' | '@'; //TODO: migrate to [/?#...] notation
fragment URI_SUB_DELIMS: '!' | '$' | '&' | '\'' | '(' | ')'
                         | '*' | '+' | ',' | ';' | '=';

// According to IETF http://tools.ietf.org/html/rfc1034[RFC 1034] and http://tools.ietf.org/html/rfc1035[RFC 1035],
// as clarified by http://tools.ietf.org/html/rfc2181[RFC 2181] (section 11)
fragment NAMESPACE : LABEL ('.' LABEL)* ;
fragment LABEL : ALPHA_CHAR (NAME_CHAR|URI_PCT_ENCODED)* ;


GUID : HEX_DIGIT+ '-' HEX_DIGIT+ '-' HEX_DIGIT+ '-' HEX_DIGIT+ '-' HEX_DIGIT+ ;

ALPHA_UC_ID :   ALPHA_UCHAR WORD_CHAR* ;   // used for type ids
ALPHA_LC_ID :   ALPHA_LCHAR WORD_CHAR* ;   // used for attribute / method ids
ALPHA_UNDERSCORE_ID : '_' WORD_CHAR* ;     // usually used for meta-model ids

// --------------------- atomic primitive types -------------------

INTEGER : DIGIT+ E_SUFFIX? ;
REAL :    DIGIT+ '.' DIGIT+ E_SUFFIX? ;
fragment E_SUFFIX : [eE][+-]? DIGIT+ ;

STRING : '"' STRING_CHAR*? '"' ;
fragment STRING_CHAR : ~["\\] | ESCAPE_SEQ | UTF8CHAR ; // strings can be multi-line

CHARACTER : '\'' CHAR '\'' ;
fragment CHAR : ~['\\\r\n] | ESCAPE_SEQ | UTF8CHAR  ;

fragment ESCAPE_SEQ: '\\' ['"?abfnrtv\\] ;

// ------------------- character fragments ------------------

fragment NAME_CHAR     : WORD_CHAR | '-' ;
fragment WORD_CHAR     : ALPHANUM_CHAR | '_' ;
fragment ALPHANUM_CHAR : ALPHA_CHAR | DIGIT ;

fragment ALPHA_CHAR  : [a-zA-Z] ;
fragment ALPHA_UCHAR : [A-Z] ;
fragment ALPHA_LCHAR : [a-z] ;
fragment UTF8CHAR    : '\\u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT ;

fragment DIGIT     : [0-9] ;
fragment HEX_DIGIT : [0-9a-fA-F] ;

// -------------------- common symbols ---------------------

SYM_COMMA: ',' ;
SYM_SEMI_COLON : ';' ;

SYM_LPAREN   : '(';
SYM_RPAREN   : ')';
SYM_LBRACKET : '[';
SYM_RBRACKET : ']';
SYM_LCURLY   : '{' ;
SYM_RCURLY   : '}' ;

// --------- symbols ----------
SYM_ASSIGNMENT: ':=' | '::=' ;

SYM_NE : '/=' | '!=' | '≠' ;
SYM_EQ : '=' ;
SYM_GT : '>' ;
SYM_LT : '<' ;
SYM_LE : '<=' | '≤' ;
SYM_GE : '>=' | '≥' ;

// TODO: remove when [] path predicates supported
VARIABLE_WITH_PATH: VARIABLE_ID ADL_ABSOLUTE_PATH ;

VARIABLE_ID: '$' ALPHA_LC_ID ;


//
// ========================= Lexer ============================
//

// -------------------- symbols for lists ------------------------
SYM_LIST_CONTINUE: '...' ;

// ------------------ symbols for intervals ----------------------

SYM_PLUS : '+' ;
SYM_MINUS : '-' ;
SYM_PLUS_OR_MINUS : '+/-' | '±' ;
SYM_PERCENT : '%' ;
SYM_CARAT: '^' ;

SYM_IVL_DELIM: '|' ;
SYM_IVL_SEP  : '..' ;