ContextItem Data - 2016 ASABE AIM
Basic Concept: Identity and the Compound Identifier
The CompoundIdentifier serves two purposes.
First, it has a locally-scoped identifier (ReferenceId) which allows its parent object to be used by reference within a given collection of data (dataset). When dealing with especially large datasets, the ability to use objects by reference results in a smaller footprint when persisting to storage or attempting transmission.
Second, it enables the association of multiple unique identifiers and their origin information (UniqueId) to an object. This allows for an enhanced exchange of data between different systems because each system can include its internal, unique identifier for an object without compromising any of the others.
One historic pain point in data exchange has been that systems vary in the way they construct unique identifiers. For this reason, the UniqueId class uses a combination of a string (Id) and an enumeration (CompoundIdentifierTypeEnum) to fully describe the unique identifier. This allows us to support a broad variety of identification methods. For example, Company A might use integers as identifiers (unique only inside its own system), which is fundamentally incompatible with Company B, which uses UUIDs (globally unique).
Almost as important is the need to capture the originating source, or issuing authority, of the identifier. Again, a combination of a string (Source) and an enumeration (IdSourceTypeEnum) is used to fully describe the identifier's source.
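The two roles of the CompoundIdentifier can be sketched in a few lines of Python. This is only an illustration, not the model's actual class definitions: the snake_case field names, the enumeration members, the find_by_source helper, and the example.com-style sources are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CompoundIdentifierTypeEnum(Enum):
    # Assumed members; consult the model for the authoritative list.
    UUID = "UUID"
    LONG_INT = "LongInt"
    STRING = "String"
    URI = "URI"


class IdSourceTypeEnum(Enum):
    # Assumed members.
    GLN = "GLN"
    URI = "URI"


@dataclass(frozen=True)
class UniqueId:
    id: str                              # always carried as a string
    id_type: CompoundIdentifierTypeEnum  # how to interpret `id`
    source: str                          # who issued the identifier
    source_type: IdSourceTypeEnum        # how to interpret `source`


@dataclass
class CompoundIdentifier:
    reference_id: int                    # locally scoped to one dataset
    unique_ids: List[UniqueId] = field(default_factory=list)

    def find_by_source(self, source: str) -> Optional[UniqueId]:
        """Return the first UniqueId issued by `source`, if any (hypothetical helper)."""
        return next((u for u in self.unique_ids if u.source == source), None)


# One object, identified independently by two systems.
field_id = CompoundIdentifier(reference_id=1)
field_id.unique_ids.append(
    UniqueId("1042", CompoundIdentifierTypeEnum.LONG_INT,
             "https://company-a.example", IdSourceTypeEnum.URI))
field_id.unique_ids.append(
    UniqueId("0f8fad5b-d9cb-469f-a165-70867728950e", CompoundIdentifierTypeEnum.UUID,
             "https://company-b.example", IdSourceTypeEnum.URI))
```

Each system can look up its own identifier by source without disturbing the one contributed by the other.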
The ContextItem System Data Model: Bird's-eye view
(Note: showing all relationships)
Detailed ContextItem System Data Model
(Note: CompoundIdentifier not shown for clarity; see separate class diagram.)
A ContextItem is a single, specific use of a ContextItemDefinition. Put more simply, a ContextItem is a key / value pair. Its Code property is the "key" (corresponding to the unique Code property of a single ContextItemDefinition), while the rest of its properties describe the "value".
The Value property is ALWAYS expressed as a string even though that may not be how it was collected or how it is expected to be used elsewhere. The ValueType property of the associated ContextItemDefinition supplies the user with this data type information. In some cases, like the presence of NestedItems, there is no real Value to record so this property is optional.
The ValueUOM property contains the optional unit of measure, further defining the Value. A DefaultUOM property is included with the associated ContextItemDefinition and may be used instead of including it locally with the ContextItem. In an effort to foster the broadest appeal, UN Rec 20 codes will be the default vocabulary for expressing units of measure.
If the ContextItemDefinition has a ValueType of "Nested", the NestedItems collection allows a resulting ContextItem to support a hierarchical structure of other instances of ContextItem. See the discussion below of the NestedDefIds property of ContextItemDefinition for further information.
Notice the inclusion of an optional collection of TimeScope objects. This is used to record the various relationships a given Value may have with time. For example, there could be a TimeScope that captures when the Value was recorded and another TimeScope that expresses the duration for which the Value is considered valid.
The ContextItemDefinition describes a property of interest. It should communicate everything needed to enable that property to be captured and displayed appropriately. What follows is a description of the ContextItemDefinition object and some of the rationale behind its design.
The Code property is:
- expected to be universally unique within the ContextItemDefinition domain
- used as part of the URI that forms the identity of a ContextItemDefinition
- the "key" used by ContextItem in its Code property
- issued by a central authority to ensure its uniqueness
The Version property serves one purpose: change detection. Whenever a member property of ContextItemDefinition is changed, the Version number is incremented. This allows users who have a cached copy of a ContextItemDefinition to realize when a change has taken place. It does NOT communicate what change was made, by whom, or why it occurred; only that it has happened. This approach was deemed more reliable than trying to use a "modified" date time stamp. All ContextItemDefinitions start with a Version value of 0.
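A cache check based on Version might look like the following sketch. The Codes ("DEF-A" and so on), the dictionary layout, and the stale_codes function are hypothetical. The only behavior taken from the text is that a differing Version signals a change.

```python
from typing import Dict, List


def stale_codes(cached: Dict[str, int], published: Dict[str, int]) -> List[str]:
    """Return the Codes whose cached Version is missing or differs from the published one."""
    return sorted(code for code, version in published.items()
                  if cached.get(code) != version)


# Code -> Version, as held locally and as currently published (made-up values).
cached = {"DEF-A": 0, "DEF-B": 3}
published = {"DEF-A": 0, "DEF-B": 4, "DEF-C": 0}

to_refresh = stale_codes(cached, published)
```

The comparison is deliberately an inequality test rather than "greater than": the Version only says that something changed, so any mismatch is treated as a reason to fetch the definition again.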
The Status property is selected from an enumeration of two values: "Active" and "Inactive". We realize that there will be instances where a ContextItemDefinition or one of its ContextItemEnumItem(s) will need to be "retired" from use. We also recognize that those ContextItemDefinition(s) might still be needed in order to interpret historical data. As a result, we choose not to delete things but instead to mark them as "Inactive". It will be the responsibility of the user to avoid the use of "Inactive" objects in the generation of new data.
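One way a consuming application could honor this rule is to keep two access paths: a pick list restricted to "Active" definitions for new data, and an unrestricted lookup for interpreting historical data. The sketch below assumes definitions are held as plain dictionaries; the Codes and both function names are hypothetical.

```python
from typing import Dict, List, Optional

definitions = [
    {"Code": "DEF-A", "Status": "Active"},
    {"Code": "DEF-B", "Status": "Inactive"},  # retired, but never deleted
]


def selectable(defs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Definitions that may be offered when generating new data."""
    return [d for d in defs if d["Status"] == "Active"]


def lookup(defs: List[Dict[str, str]], code: str) -> Optional[Dict[str, str]]:
    """Resolve any definition, including retired ones, to interpret existing data."""
    return next((d for d in defs if d["Code"] == code), None)
```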
The ValueType property describes the expected data type for the "value" of a ContextItem. It is expressed as an enumeration with the values of Bool, String, Double, Integer, Enum, and Nested. The first four data types (Bool, String, Double, Integer) can be easily represented as a string and then parsed back to their original form. This is the reason that the Value property of ContextItem is a string. The "Enum" data type indicates that this ContextItemDefinition is an encoded enumerated list. The items in the list are ContextItemEnumItem objects and are included by value in the EnumItems collection property. When creating a ContextItem using an "Enum" type ContextItemDefinition, the ContextItem Value property corresponds to the Value property of the selected ContextItemEnumItem. The "Nested" data type indicates that this ContextItemDefinition is a container that encapsulates a group of other ContextItemDefinitions. When creating a ContextItem using a "Nested" type ContextItemDefinition, the ContextItem Value property is left empty but its NestedItems property contains a collection of ContextItems.
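The round trip from string back to the original type can be sketched as below for the four simple types. The function name and the accepted spellings of boolean values ("true" / "false", case-insensitive) are assumptions; the text does not specify how booleans are serialized.

```python
from typing import Union


def parse_value(value: str, value_type: str) -> Union[bool, str, float, int]:
    """Convert a ContextItem Value string according to its definition's ValueType."""
    if value_type == "String":
        return value
    if value_type == "Bool":
        lowered = value.strip().lower()
        if lowered in ("true", "false"):   # assumed serialization
            return lowered == "true"
        raise ValueError(f"not a boolean: {value!r}")
    if value_type == "Integer":
        return int(value)
    if value_type == "Double":
        return float(value)
    # "Enum" values are matched against EnumItems, and "Nested" items carry no Value.
    raise ValueError(f"{value_type} is not a simple scalar type")
```

For example, parse_value("42", "Integer") yields the integer 42, while asking for "Nested" raises an error because such a ContextItem has no Value to parse.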
The Description property is nothing more than a "friendly" name that makes it easier to correctly choose the ContextItemDefinition of interest from a pick list. It is not meant to be a lengthy explanation.
At this point we have discussed all the properties that are required to define a simple ContextItemDefinition (everything except "Nested" or "Enum"). This minimum set of information, while sufficient to allow for the capturing of data, does little to convey any real meaning. The goal of the remaining properties is to allow a fuller description of the target concept in a semantically rich way.
The Keywords property is a collection of strings intended to be an aid for querying. They are expected to be single words, lower case.
The Lexicalizations property is a collection of Lexicalization objects. The concept that a ContextItemDefinition represents may be expressed a number of different ways, not just in different languages, but also using different terms in the same language. For example, regional differences in the common name of a pest. A Lexicalization object contains the text, a reference to the language that text is in, and a list of GeoPoliticalContext objects that describe "where" that terminology is used. We have chosen to use the IANA Language Subtag Registry as the controlled vocabulary for languages.
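To illustrate how a Lexicalization might be chosen for display, the sketch below prefers a match on both language and geopolitical context and falls back to language alone. The field names, the two-letter region codes standing in for GeoPoliticalContext references, the pick_text function, and the fallback rule are all assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lexicalization:
    text: str
    language: str                                          # IANA language subtag, e.g. "en"
    geo_contexts: List[str] = field(default_factory=list)  # stand-ins for GeoPoliticalContext


def pick_text(lexs: List[Lexicalization], language: str, geo: str) -> Optional[str]:
    """Prefer language + region, then language only; None if the language is absent."""
    same_language = [lex for lex in lexs if lex.language == language]
    for lex in same_language:
        if geo in lex.geo_contexts:
            return lex.text
    return same_language[0].text if same_language else None


# The same concept under different terms in one language, plus another language.
lexs = [
    Lexicalization("corn", "en", ["US"]),
    Lexicalization("maize", "en", ["GB"]),
    Lexicalization("maíz", "es"),
]
```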
The Properties property is a collection of ContextItem objects. This is how we can supply additional information needed to use the definition correctly. For example, if a ContextItemDefinition involves capturing values for latitude & longitude, it might include in the Properties a ContextItem that indicates the geodetic datum is expected to be WGS84.
The NestedDefIds property is a collection of references to other ContextItemDefinition(s). This is only populated if ValueType is set to "Nested". Sometimes there are groups of data points that need to be collected together rather than individually. For example, the Public Land Survey System is composed of four pieces of data: the Meridian, Township, Range, and Section. Each is defined as its own ContextItemDefinition, and is represented as ContextItems that are included by value in the NestedItems property of the PLSS ContextItem.
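The PLSS example could be represented as in the sketch below. The Codes ("PLSS", "MERIDIAN", and so on), the sample values, the simplified ContextItem class, and the leaf_values helper are hypothetical. What matters is the shape: the parent has no Value, and its children are included by value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ContextItem:
    code: str
    value: Optional[str] = None          # left empty for "Nested" items
    nested_items: List["ContextItem"] = field(default_factory=list)


def leaf_values(item: ContextItem, prefix: str = "") -> Dict[str, Optional[str]]:
    """Flatten a nested ContextItem into {path: value} for its leaf items."""
    path = f"{prefix}/{item.code}" if prefix else item.code
    if not item.nested_items:
        return {path: item.value}
    result: Dict[str, Optional[str]] = {}
    for child in item.nested_items:
        result.update(leaf_values(child, path))
    return result


plss = ContextItem(code="PLSS", nested_items=[
    ContextItem("MERIDIAN", "6"),
    ContextItem("TOWNSHIP", "12N"),
    ContextItem("RANGE", "4W"),
    ContextItem("SECTION", "16"),
])
```

Because nesting is recursive, the same traversal works if one of the children is itself a "Nested" item.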
The Presentations property is a collection of Presentation objects. One of the challenges in collecting quality data is being able to make sure that it is entered properly. The Presentation object contains a "friendly" name description, regular expressions that define how the value is supposed to look when entered or displayed, and a list of GeoPoliticalContext objects that describe "where" this presentation is used. A regular expression is a sequence of one or more characters, alone or in groups, that can be used to describe an expected pattern. It is used to test a given string to see if it follows the pattern.
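As a small illustration of testing a string against a Presentation's pattern, the sketch below uses Python's re module. The dictionary keys, the township pattern, and the follows_presentation function are assumptions. The pattern is not taken from any published definition.

```python
import re

# Hypothetical Presentation for a PLSS township such as "12N" or "4S".
township_presentation = {
    "Description": "Township number and direction, e.g. 12N",
    "Pattern": r"\d{1,3}[NS]",
}


def follows_presentation(value: str, presentation: dict) -> bool:
    """True when the whole string matches the Presentation's regular expression."""
    return re.fullmatch(presentation["Pattern"], value) is not None
```

Using a full match rather than a search means that stray leading or trailing characters cause the value to be rejected, which is usually what data entry validation wants.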
The EnumItems property is a collection of ContextItemEnumItem objects. This is how we encode the enumerated values for a ContextItemDefinition of ValueType "Enum". The ContextItemEnumItem contains some of the same properties that ContextItemDefinition does. Instead of having a Code property, it has a Value. This Value is expected to be unique within the domain of the ContextItemDefinition it is attached to. When creating a ContextItem using an "Enum" type ContextItemDefinition, the ContextItem Value property corresponds to the Value property of the selected ContextItemEnumItem.
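Checking a ContextItem Value against the EnumItems of its definition could look like this sketch. It assumes each ContextItemEnumItem is a dictionary carrying at least Value and Status. The item values and the function name are made up for the example.

```python
from typing import Dict, List


def check_enum_value(value: str, enum_items: List[Dict[str, str]]) -> str:
    """Return the Status of the matching enum item; raise if the Value is unknown."""
    for item in enum_items:
        if item["Value"] == value:
            return item["Status"]
    raise ValueError(f"{value!r} is not a Value of this enumeration")


# Hypothetical EnumItems; Values must be unique within their definition.
enum_items = [
    {"Value": "OPTION_1", "Status": "Active"},
    {"Value": "OPTION_2", "Status": "Inactive"},
]
```

Returning the Status rather than a plain boolean lets the caller accept an "Inactive" item when reading historical data while refusing it for new data.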
The DefaultUOM property is an optional string describing the expected unit of measure. In an effort to foster the broadest appeal, UN Rec 20 codes will be the default vocabulary for expressing units of measure.
The AllowConversion property is an optional boolean value that is used in conjunction with the DefaultUOM. This flag determines whether the user may enter the value in any unit compatible with DefaultUOM or is required to use the default unit.
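The interplay of ValueUOM, DefaultUOM, and AllowConversion can be sketched as follows. The compatibility table is a hypothetical stand-in for whatever unit system an implementation uses, and the function name is an assumption. The unit codes are UN Rec 20 codes (MTR for metre, CMT for centimetre, KGM for kilogram).

```python
from typing import Optional

# Hypothetical compatibility table: UN Rec 20 code -> physical dimension.
DIMENSION = {"MTR": "length", "CMT": "length", "KGM": "mass"}


def accepted_uom(entered: Optional[str], default_uom: str, allow_conversion: bool) -> str:
    """Return the unit to record with the value, or raise if it is not permitted."""
    if entered is None or entered == default_uom:
        return default_uom                      # fall back to the definition's DefaultUOM
    compatible = (entered in DIMENSION
                  and DIMENSION.get(entered) == DIMENSION.get(default_uom))
    if allow_conversion and compatible:
        return entered                          # a compatible unit is allowed
    raise ValueError(f"unit {entered!r} is not permitted (default is {default_uom!r})")
```

With DefaultUOM set to "MTR", a value entered in "CMT" is accepted only when AllowConversion is true, and "KGM" is rejected in either case.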
The TimeScopes property is an optional collection of TimeScope objects. This collection allows us to attach a variety of time-related attributes to the ContextItemDefinition. For example, a TimeScope could describe when its ContextItemDefinition was created, another when it was updated, and a third the date range it is valid for.
The ModelScopeIds property is a collection of references to ModelScope objects. The ModelScope object represents a business object in either the ADAPT model or ISO11783 (potentially other models as well). For example, there is a ModelScope object for the ADAPT Farm class as well as a separate one for the ISO11783 Farm tag (FRM). This coded list of business objects forms the controlled vocabulary we use to specify which objects a given ContextItemDefinition can be used to describe. We have tried to reuse existing controlled vocabularies wherever we could; the language and geopolitical vocabularies we use both point to external sources. In this case, however, we were forced to create our own.
The GeoPoliticalContextIds property is a collection of references to GeoPoliticalContext objects. The GeoPoliticalContext object represents an entry in an external controlled vocabulary that describes a particular geographic/political domain or organization. This allows us to tag ContextItemDefinitions with a marker that conveys "where" the data it represents is relevant.