Species 2000 data standards
Species 2000 is currently operating a federated environment
which delivers the Species 2000 global Dynamic Checklist and a regional species checklist for Europe. The latter may be seen as a prototype for further regional hubs in other parts of the world and for similar developments.
This page is intended to act as a source of information
about this environment and, in particular, the interoperability conventions and standards which it adopts. Note that it deals primarily with what is now called the "e-1 architecture". A new architecture, the "e-2 architecture", is being developed at Cardiff University as its main contribution to the EC Framework 7 4D4Life project.
Species 2000 protocols, Common Data Model and schemas
Species 2000 operates a working federated biodiversity information system
to deliver a Catalogue of Life,
consisting of basic information about all species of organisms
together with the hierarchy in which they are arranged.
Species data providers interoperate in this federation
according to a number of conventions and principles.
-
A model for a federation in which each data provider supplies data about one or more large homogeneous groups of species (taxonomic sectors),
chosen according to the expertise of its staff,
and in which these sectors are assembled side by side
into the complete global or regional catalogue.
-
A framework for information exchange between components of the federation,
including data providers, the hub and its software interfaces,
based on a number of defined requests (currently six)
for specific purposes, and their corresponding responses, thus avoiding the need for data providers to handle general database queries. These six requests are not typically used by people working through the user interfaces, where simpler search and browse strategies are more appropriate.
-
A human-readable Common Data Model (CDM) for reference purposes,
which provides an abstract definition for the requests and responses
(a request model),
including the data transmitted and received
(the data model),
but without specifying particular communication protocols,
as these are subject to periodic change in the community as new Web and Grid standards are created and adopted.
-
A set of specific computer-readable interface definitions, following the CDM, for use with particular implementations of communication interfaces within the system,
including a CORBA IDL, XML DTD and XML Schema, which can be extended as new protocols come into use.
These protocols and standards are intended to be open and available
for others to use when building similar federated information systems.
They are described in more detail in the documents listed below.
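The federation model and the fixed-request framework listed above can be illustrated with a minimal Python sketch. It is not the Spice implementation: the class names, the two request names and the sample records are all hypothetical, and the real CDM defines six request types that are not reproduced here. The sketch only shows the general idea that each provider answers a small set of predefined requests for its own taxonomic sector, and that a hub places the sectors side by side.

```python
from dataclasses import dataclass, field
from enum import Enum


class RequestType(Enum):
    # Hypothetical names; the real CDM defines six request types.
    SEARCH_BY_NAME = "search_by_name"
    GET_SPECIES_DATA = "get_species_data"


@dataclass
class SectorProvider:
    """A data provider responsible for one taxonomic sector."""
    sector: str
    species: dict = field(default_factory=dict)  # name -> basic record

    def handle(self, request_type, argument):
        # Only predefined requests are answered; no general database queries.
        if request_type is RequestType.SEARCH_BY_NAME:
            return sorted(n for n in self.species if argument.lower() in n.lower())
        if request_type is RequestType.GET_SPECIES_DATA:
            return self.species.get(argument)
        raise ValueError(f"unsupported request: {request_type!r}")


class Hub:
    """Places sectors side by side and fans a request out to every provider."""

    def __init__(self, providers):
        self.providers = {p.sector: p for p in providers}

    def search(self, text):
        hits = []
        for sector, provider in sorted(self.providers.items()):
            for name in provider.handle(RequestType.SEARCH_BY_NAME, text):
                hits.append((sector, name))
        return hits


# Illustrative data only; sector names and records are assumptions.
ticks = SectorProvider("Ixodida", {"Ixodes ricinus": {"status": "accepted"}})
legumes = SectorProvider(
    "Fabaceae",
    {"Vicia faba": {"status": "accepted"}, "Vicia sativa": {"status": "accepted"}},
)
hub = Hub([ticks, legumes])
```

Because a provider rejects anything outside the enumerated requests, it never has to interpret an arbitrary query, which is the point made in the second item of the list above.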
1. Federation model
Diagram showing the relationship between the Species 2000 Hierarchy (top),
array of GSDs (middle) and linked rich data providers (below):
2. Common Data Model (CDM)
As mentioned above, this defines abstract models for both requests and data.
However, to deal with some issues which arose during the later stages of the EuroCat project, the concrete implementations described in the next section were modified, and this has created some discrepancies between the abstract and concrete definitions.
As an interim measure, we therefore present here three versions of the CDM which were referred to during these developments.
Future work will be directed towards convergence of these definitions.
Specific concrete implementations are defined in the next section.
3. Communications interface definitions
(i) Requests from hub to wrappers
At present, the Spice hub communicates with its wrappers over HTTP:
HTTP CGI [GET] requests are sent to a wrapper, which responds with an XML document.
This is not a Web Service, nor is SOAP used
(but see below for the Web Service which supports the Dynamic Checklist user interface).
-
The XML Schema
defines the specific XML requests and responses used by the current Spice
software to communicate with GSD wrappers using CDM 1.20.
-
The XML Document Type Definition (DTD, version 1.20)
was formerly used to describe the XML generated by wrappers in their responses,
but is now superseded by the use of the XML Schema
to define and validate both the XML requests sent to wrappers and the responses generated by the wrappers.
-
The Spice CORBA IDL is no longer used as an external interface to current wrappers,
but is used internally within Spice for communications between modules.
It consists of a master IDL file
which invokes two further data-definition files compliant with versions
1.11 and
1.20 of the CDM.
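The hub-to-wrapper exchange described in (i), an HTTP GET request answered with an XML document, can be sketched in Python as follows. This is an illustration under stated assumptions, not the Spice protocol itself: the wrapper address, the query parameter names (`requesttype`, `searchstring`) and the XML element names are hypothetical, and the actual vocabulary is the one defined by the XML Schema. No network call is made; a sample response string stands in for the wrapper's reply.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_wrapper_url(base_url, request_type, **params):
    """Encode a request as an HTTP GET URL with query-string parameters."""
    query = urlencode({"requesttype": request_type, **params})
    return f"{base_url}?{query}"


def parse_names(xml_text):
    """Extract the text of every <NAME> element from a wrapper response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("NAME")]


# Hypothetical wrapper address and response body, for illustration only.
url = build_wrapper_url("http://example.org/wrapper", 1, searchstring="Ixodes")
sample_response = """<RESPONSE>
  <SPECIES><NAME>Ixodes ricinus</NAME></SPECIES>
  <SPECIES><NAME>Ixodes scapularis</NAME></SPECIES>
</RESPONSE>"""
names = parse_names(sample_response)
```

In a real deployment the response would also be validated against the schema before being parsed; that step is omitted here because the Python standard library has no XML Schema validator.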
(ii) Requests from external software to the hub
A Web Service to allow other software, such as the Species 2000 Dynamic Checklist user interface, to interrogate the Spice global and European hubs is currently located at
http://spice.sp2000europa.org/SPICE/services/CASWebService
but the location and definition of this service may change.
The XML used by the Web Service to communicate with the user interface
may not currently agree in all details with the schema listed above;
this is work in progress.
Support for wrapper writers
-
Species 2000 has provided a page called
Instructions for wrapper writers,
which includes a URL for downloading an archive called the
Wrapper Development Kit
(WDK), assembled by Qinglai Ni.
This version of the WDK is said to include some of the resources mentioned above together with further documentation and a current example wrapper for the Ticksbase database,
implemented as a Java Servlet on Apache Tomcat.
This kit should prove most useful for writing CDM 1.20 compliant wrappers in Java,
but should also serve as a guide for developers working in other languages.
Also included in the kit are documents that explain the wrapper writing procedure.
-
The URL provided for the WDK does not currently work.
However, there is an alternative web page with some more information, called
Dynamic Checklist Instructions,
which provides a link to the
Guidelines for Wrapper Writers, version 1.3 (January 2006).
The Instructions page also provides links to download
documentation and source code for writing Java wrappers,
with Python support also promised.
Further information
-
The Spice software which implements these standards to provide the Common Access System (CAS) for the Species 2000 Dynamic Checklist
is described further in
Spice for Species 2000.
-
The Species 2000 programme and the Sp2000 & ITIS Catalogue of Life are described further in
Species 2000.
-
The Biodiversity Software Repository at Cardiff
gives access to Spice, other software and some of the wrappers.
Last updated by Andrew Jones on 25 July 2011