2017 ASABE AIM Semantic Infrastructure

Next meeting: Wednesday, June 7th, 10:30 AM CDT, GoToMeeting #385939997

Abstract

Helping machines and people agree on what things mean in production agriculture: AgGateway’s Glossary and Semantic Infrastructure

Daggett, Dennis G.; Ferreyra, R. Andres; Garrett, Rachel; Mund, Arren; Rhea, Stuart T.; Russo, Joseph M.; Wilson, James A.

AgGateway is a nonprofit consortium of 240+ companies dedicated to advancing implementation of standards for interoperability in agriculture. Its work originally targeted supply chain processes, but was expanded in 2010 to include production agriculture field operations such as planting, crop protection, crop nutrition, irrigation, and harvest. An important deliverable of AgGateway’s field operations projects (SPADE, PAIL) has been to identify the data variables that support the growers’ field operations processes, document their meaning, and establish the various contexts in which they are used. It subsequently became necessary to create machine-readable variable type registries, controlled vocabularies, and other semantic resources to facilitate interoperability, thus enabling all participants in a data exchange process (or their software, rather) to unambiguously understand the meaning of the data being exchanged. These resources will be made available to users primarily through one or more application programming interfaces (APIs) for use by farm management information systems (FMIS) and other software.

There are, so far, three fundamental types of variable registries in this semantic infrastructure: a) The Representation System, which describes “universal” variables and their units of measure (e.g., yield as mass per unit area), derived from an internal representation system John Deere donated to the ADAPT project; b) The ContextItem System, which describes geopolitical-context-dependent variables (e.g., EPA numbers, FSA tract IDs) and complements the Representation System; c) Observation Codes, which are a specialized form of representation system built using the ContextItem System’s sophisticated architecture.

This infrastructure will require ongoing additions and maintenance, but the resources’ and APIs’ originating projects (e.g., SPADE, PAIL) have a finite duration. AgGateway’s Standards and Guidelines Committee consequently saw fit to create a Controlled Vocabularies Working Group (CVWG) to provide a permanent home for the semantic resources’ management processes. The overarching intent is to create resources that will be used extensively by the industry. An important precondition for that usage is transparency in governance; as a result, the CVWG derived its governance model from the ISO 19135 standard for register management.

AgGateway has also invested in a human-readable Glossary, hosted at www.agglossary.org and designed to bring together agricultural terminology from different sources (e.g., ASABE) as an educational resource and discussion-support tool. This resource includes a taxonomy for representing the origin and context of each term, and was initially hosted using the MediaWiki platform. The Glossary Team is now working toward upgrading the architecture of the glossary, seeking machine-readability and compatibility with the ISO 25964 data model. The goal is to maximize the value of the resource to contributors and users of the data, especially for concepts and terms that may be contained within it but not in international efforts such as GACS (the Global Agricultural Concept Scheme); e.g., legal terms (insert reference to RMA terms) and terms derived from ASABE (and other) standards.

Keywords: information systems; software; semantics; infrastructure; API

Introduction

  • Basic introductory statement

  • Interoperability problems

  • Semantic component of the interoperability problem

    • Frame the discussion
    • What is a semantic asset? 
    • How does it help?
  • Case for shared semantic assets

AgGateway Projects and Semantic Resources

  • Stuff we've done at this point. An important deliverable of AgGateway’s field operations projects (SPADE, PAIL) has been to:
    • identify the data variables that support the growers’ field operations processes
    • document their meaning, and 
    • establish the various contexts in which they are used.  (Can we point to resources?)
  • It subsequently became necessary to create machine-readable (why is this necessary: sheer volume + need to enable automated reasoning, value discovery, etc.) variable type registries, controlled vocabularies, and other semantic resources to facilitate interoperability, thus enabling all participants in a data exchange process (or their software, rather) to unambiguously understand the meaning of the data being exchanged.
  • Bolster credibility by pointing out the number of experts that have participated.

Delivering Semantic Resources: APIs

  • Idea and importance of making semantic resources available to all (Stuart; S3CM-67)
  • APIs (definition). Different kinds of APIs. Why REST's self-documenting system and HTTP-based structure are convenient for our application. (Arren; S3CM-68)
    • APIs are used to communicate information from one information system to another. Traditional APIs require that a client system fully understand an API before requesting data from it. REST APIs follow a standard set of architectural principles that allow a client to know only a base pointer to the API. The API's sub-structures are then discovered automatically, because each response carries the appropriate paths to follow to dig deeper into the API. As a result, the system presenting the data can change its infrastructure and internal design without affecting existing systems that request data from it (e.g., all of the paths to data in the API can change, and a requester will still follow them automatically because the changes are reflected in the responses to new requests).
    • Why is REST a benefit to this project? It's not really, considering the static nature of the data. 
  • Documenting APIs: Need to be user (developer)-friendly. Swagger: what it is, how it works, how developers can use it when implementing an API, and when consuming data from an API (Arren; S3CM-69)
    • To ensure easy adoption and proper usage, an API needs documentation that is complete and easy to read. While REST is self-documenting and allows routes to be discovered at run time, an API developer still needs to understand the flow of information through the API and what data structures are ultimately available to consume.
    • The OpenAPI specification defines a format for files that document a REST API. Tools like Swagger-UI can then parse those files and provide rich, interactive documentation. This lets developers who are considering an API not only see what paths are available and the format of the data objects, but also run the API in a live environment and see the results, giving them a complete picture of how data flows through the API.
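
As a rough illustration of the documentation idea above, the sketch below holds a tiny OpenAPI-style description in a Python dict and lists the operations it declares, which is roughly what a documentation tool does before rendering. The paths echo endpoint names used elsewhere in this outline; the parameter name "modelScope", the summaries, and the version numbers are assumptions made for the example, not the actual API description.

```python
# Minimal, hypothetical OpenAPI 3 description held as a Python dict.
# The parameter name "modelScope" and all summaries are invented here.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}

spec = {
    "openapi": "3.0.0",
    "info": {"title": "ContextItem API (illustrative)", "version": "0.0.1"},
    "paths": {
        "/ms": {"get": {"summary": "List ModelScopes"}},
        "/def": {
            "get": {
                "summary": "Query ContextItemDefinitions",
                "parameters": [
                    {"name": "modelScope", "in": "query",
                     "schema": {"type": "string"}}
                ],
            }
        },
    },
}


def list_operations(document):
    """Return (METHOD, path, summary, [parameter names]) per operation."""
    operations = []
    for path, item in sorted(document["paths"].items()):
        for method, operation in sorted(item.items()):
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            names = [p["name"] for p in operation.get("parameters", [])]
            operations.append(
                (method.upper(), path, operation.get("summary", ""), names)
            )
    return operations


for method, path, summary, params in list_operations(spec):
    print(f"{method} {path}: {summary} {params}")
```

A real description would normally live in a JSON or YAML file served alongside the API, so that the same file drives both the interactive documentation and any client tooling.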

The Three Semantic Resource Systems, and their APIs

  • Task for Stuart and Andres: Figure out how to generalize this  
    • Approach to finding out what you want to know
  • Representation (Stuart) - API does not exist yet (S3CM-59)
    • Story of ways to find out what you want to know. By dimension? By what field operation you want to conduct? (Example: ISO elements' lists of DDIs that are in scope)
  • ContextItem (Andres: description, S3CM-71; Stuart: API, S3CM-72) (Lay out story of finding stuff out by next call!!!)
    • Story of ways of finding out what you want to know is well understood: by GeoPoliticalContext, by ModelScope, etc.
    • We expect the ContextItem System to grow quickly.

      We recognized that finding what you want/need in a big set of ContextItemDefinitions would be a usability issue.

      We built in five separate controlled vocabularies to support classifying ContextItemDefinitions as they are added to the service.

      A ModelScope represents the "what" (Grower, Farm, Field, Product, etc.) a given ContextItemDefinition is expected to describe or modify. The collection of available ModelScopes contains entries for both ADAPT classes and ISO-11783 tags, with representations available from the "model-scopes" endpoint (api.contextitem.org/ms). ModelScopes are used by reference (ModelScope.Code) as part of a ContextItemDefinition (ContextItemDefinition.ModelScopeCodes).

      A GeoPoliticalContext represents the "where" (USA, California, EU, etc.) a given ContextItemDefinition is expected to be used. ISO 3166-1 alpha-3 ("ISO3") is the starting point. It is expected that this service will grow to also use elements of the FAO's Geopolitical Ontology and geonames.org, though both have issues with granularity and completeness. Entries will be added as needed, with representations available from the "geo-political-contexts" endpoint (api.contextitem.org/gpc). GeoPoliticalContexts are used by reference (GeoPoliticalContext.Code) as part of a ContextItemDefinition (ContextItemDefinition.GeoPoliticalContextCodes), its Lexicalizations (Lexicalization.GeoPoliticalContextCodes), and its Presentations (Presentation.GeoPoliticalContextCodes).

      A Source describes "who" submitted a ContextItemDefinition or ContextItemEnumItem. The collection of Sources contains entities who have contributed to the service, with representations available from the "sources" endpoint (api.contextitem.org/src).

      A Keyword aids the querying of ContextItemDefinitions. The collection of Keywords is an emergent controlled vocabulary that grows as ContextItemDefinitions are tagged with descriptive terms. The collection of Keywords is available from the "keywords" endpoint (api.contextitem.org/key).

      Finally, Language. The IANA Language Subtag Registry (http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) is the starting point for this controlled vocabulary. Entries will be added as needed, with representations available from the "languages" endpoint (api.contextitem.org/lang). A Language is used by reference (Language.Code) as part of a Lexicalization (Lexicalization.LanguageCode) and identifies the language of the Lexicalization.Text property.

      This results in a robust mechanism for querying the service to find exactly what you need for your given situation. The "definitions" endpoint (api.contextitem.org/def) supports querying for ContextItemDefinitions based on these five controlled vocabularies.

  • Observations (Andres) - API does not exist yet (S3CM-73)
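
The way the five controlled vocabularies narrow a search over ContextItemDefinitions can be illustrated with a small in-memory sketch. It does not call the service: the query syntax of the "definitions" endpoint is not described here, so the filter function, the field names in snake_case, and every code beginning with "EX_" are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lexicalization:
    text: str            # mirrors Lexicalization.Text
    language_code: str   # mirrors Lexicalization.LanguageCode
    geo_political_context_codes: List[str] = field(default_factory=list)


@dataclass
class ContextItemDefinition:
    code: str                               # hypothetical identifier
    model_scope_codes: List[str]            # the "what"
    geo_political_context_codes: List[str]  # the "where"
    source_code: str                        # the "who"
    keywords: List[str]
    lexicalizations: List[Lexicalization]


def find_definitions(definitions, model_scope=None, geo_context=None,
                     source=None, keyword=None, language=None):
    """Return the definitions matching every criterion that was supplied."""
    matches = []
    for definition in definitions:
        if model_scope and model_scope not in definition.model_scope_codes:
            continue
        if geo_context and geo_context not in definition.geo_political_context_codes:
            continue
        if source and source != definition.source_code:
            continue
        if keyword and keyword not in definition.keywords:
            continue
        if language and not any(
            lex.language_code == language for lex in definition.lexicalizations
        ):
            continue
        matches.append(definition)
    return matches


# Invented sample entries, loosely based on examples named in this outline.
definitions = [
    ContextItemDefinition("EX_EPA_NUMBER", ["Product"], ["USA"], "EX_SOURCE_A",
                          ["registration"], [Lexicalization("EPA number", "en")]),
    ContextItemDefinition("EX_FSA_TRACT_ID", ["Field"], ["USA"], "EX_SOURCE_A",
                          ["land"], [Lexicalization("FSA tract ID", "en")]),
    ContextItemDefinition("EX_EU_PRODUCT_ID", ["Product"], ["EU"], "EX_SOURCE_B",
                          ["registration"],
                          [Lexicalization("EU product identifier", "en")]),
]

us_products = find_definitions(definitions, model_scope="Product", geo_context="USA")
print([d.code for d in us_products])
```

Combining criteria is what keeps a growing collection usable: each vocabulary on its own returns a broad set, while the intersection returns the few definitions relevant to one situation.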


Semantic Infrastructure vs (Product?) Reference Data APIs

S3CM-66

  • Difference between a variable-type registry and a product reference data API (e.g., products vs ContextItems): Reference data can change over time. Both are controlled vocabularies and both describe concepts, but they serve different functions. A reference data API has less in the way of a semantic underpinning.
  • ContextItem et al. provide data-driven descriptions of parts of a data model (pieces of data to be collected or displayed); reference data API provides instances of the same data model (actual data)?

Examples of Variable Type Registries in Action

  • How would this stuff be used:
    • These resources will be made available to users primarily through one or more application programming interfaces (APIs) for use by farm management information systems (FMIS) and other software.
    • Example 1 (Representation): Various flavors of Yield, planting rate / density.
    • Example 2 (ContextItem): EPA number vs European equivalent.
    • Example 3 (Observation code): "Is 'T' (in that CSV header) a temperature? If so, WHAT temperature"? 
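
Example 3 can be made concrete with a short sketch: a CSV header such as "T" says nothing on its own, but once the data producer declares a registered code for each column, the consumer can look up what was measured and in which unit. The registry contents, the "EX_" codes, and the units below are invented for illustration and are not actual Observation Codes.

```python
import csv
import io

# Hypothetical registry entries; real Observation Codes are not shown here.
OBSERVATION_CODES = {
    "EX_TEMP_AIR": {"label": "Air temperature", "unit": "degC"},
    "EX_TEMP_SOIL": {"label": "Soil temperature", "unit": "degC"},
}


def describe_columns(header, column_codes):
    """Map each CSV column name to its registered meaning.

    Columns with no declared or no registered code map to None, which
    makes the remaining ambiguity explicit instead of guessing.
    """
    described = {}
    for name in header:
        code = column_codes.get(name)
        described[name] = OBSERVATION_CODES.get(code)
    return described


sample = io.StringIO("T,T2,X\n21.5,14.0,3\n")
header = next(csv.reader(sample))

# Declared by the data producer alongside the file (an assumption here).
column_codes = {"T": "EX_TEMP_AIR", "T2": "EX_TEMP_SOIL"}

described = describe_columns(header, column_codes)
for name, meaning in described.items():
    print(name, "->", meaning)
```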

The Ag Glossary

Rationale for the AgGlossary Tool (Andres on conceptual motivation; Dennis to add)

AgGateway's work, both within its projects and when partnering with other organizations, is inherently a team activity, presented in an organizational context that prioritizes collaboration and consensus-building (Reference from AGW material). The success of this teamwork depends very strongly on how well the different participants can be socialized into this context and develop the ability to make sense of, and respond to, each other's actions (Suchman, 1987; Weick, 1993). And while most of AgGateway's collaborations take place in virtual environments such as GoToMeeting and Webex teleconferences, ..... (Andres; S3CM-74)


Creating translucence in virtual teams requires, among other things, the development of a shared work language.

Previous Work in Agriculture

History: AGROVOC, National Agricultural Library Thesaurus, CABI thesaurus, GACS (Dennis; S3CM-75)

AgGateway's Glossary, and how it differs from all of the above (Dennis; S3CM-76) (Note: Dennis has added content here!)

  • ISO and other standards stuff
  • Geopolitical-context-dependent stuff (incl. regulatory agencies)

The AgGlossary (What was done)

Dennis: Description of how we did what we did, including taxonomy and rationale therefor.

Semantics of a glossary, and how to get organized (Dennis and Andres)

  • Bottom-up approach → semantically weak, disconnected from rest of infrastructure. Opportunity to get better, starting with ISO 25964 effort.
  • ISO 25964: Concept vs term.
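
The concept-versus-term distinction can be sketched as a data structure: a concept is the unit of meaning and carries an identifier, while terms are the labels used for it in particular languages. This is a simplified illustration in the spirit of ISO 25964, not a reproduction of the standard's data model; the class names, the example.org URI, and the sample terms are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Term:
    text: str
    language: str


@dataclass
class Concept:
    uri: str  # identifier for the meaning, independent of any label
    preferred: Dict[str, Term] = field(default_factory=dict)  # one per language
    non_preferred: List[Term] = field(default_factory=list)   # synonyms, variants

    def set_preferred(self, term: Term) -> None:
        """Register the single preferred term for the term's language."""
        if term.language in self.preferred:
            raise ValueError(f"preferred term already set for {term.language!r}")
        self.preferred[term.language] = term

    def labels(self, language: str) -> List[str]:
        """Return all labels in a language, preferred term first."""
        labels = []
        if language in self.preferred:
            labels.append(self.preferred[language].text)
        labels.extend(t.text for t in self.non_preferred if t.language == language)
        return labels


# Hypothetical concept with labels in two languages.
crop_yield = Concept(uri="https://example.org/concept/crop-yield")
crop_yield.set_preferred(Term("crop yield", "en"))
crop_yield.set_preferred(Term("rendimiento", "es"))
crop_yield.non_preferred.append(Term("yield", "en"))

print(crop_yield.labels("en"))
```

Because relationships would attach to the concept's identifier rather than to any one label, the same structure supports mapping between vocabularies that use different words for the same idea.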

Application: Asserting relationships

Makes everything else a lot more powerful.

Governance: The Glossary and Controlled Vocabularies Working Groups

  • ISO 19135
  • Transparency
  • Simple process
  • How to get involved?

Discussion

  • 1: Funding / business model for this stuff? RDAPI has idea of Tiered structure in delivery. Delivery and business models: easy to enable different ways of working with stuff. How would this work?
  • 2: Improvement of data consistency; not everyone will use the same vocabularies right now, but we're setting the stage for that to happen eventually ("one vocabulary emerging"). There are precedents of projects that use crowd-sourced vocabularies, but we're not going in that direction, because this isn't a popularity contest, really. 
  • 3: Concept mapping opportunities. 
  • 4: Do not reinvent the wheel. Leverage existing vocabularies, research stuff, etc.
  • 5: Aiming for simple; the industry is probably not ready for ontologies and the lengthy development process thereof. By virtue of enabling concepts to have URIs, we enable mapping and so forth. (Ask Nick for an example)
  • 6: Maybe talk about Joe Tevis' son's browser thing?
  • 7: Uncertainty reduction

Conclusions



