The Accession Clearing House Mechanism (ACHM) could perhaps better be named the Germplasm Clearing House Mechanism or the Global Germplasm Index.
The name ACHM would perhaps be unclear to people outside the (plant) genetic resources community.
Accession is used to describe the process of acquiring something; a museum collection is extended through accession…
The word germplasm will produce more useful dictionary lookup descriptions.
Elements of the CHM (data classes)
Global unique ID (GUID)
Taxonomy classification
Cultivar classification
Germplasm collecting missions
Original Source (collecting event, breeding event)
Conserved germplasm (ex situ, in situ)
Germplasm donor events (exchange of germplasm between genebanks)
Germplasm regeneration events (rejuvenation and multiplication)
Germplasm distribution events (seed distribution)
Contact institutes and persons (is WIEWS sufficient?)
...
Activities/outputs
Global unique IDs and a system to resolve metadata from the GUID
Taxonomic name service (synonyms, etc.)
Inventory of commercial cultivars (breeding company, names, release year…)
Inventory of germplasm collecting missions
Inventory of collecting events (collecting number, location, time, collecting mission)
Inventory of breeding events (cultivars, landraces)
Global index of conserved germplasm accessions (fixed releases of a downloadable version, every month or every 3 months…?)
Inventory of donor events (?)
Inventory of regeneration events (?)
Inventory of germplasm distribution events (?)
Inventory of institutes and persons (complement and expand the WIEWS db?)
...
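The first activity, GUIDs plus a system to resolve metadata from the GUID, can be sketched as follows. This is a minimal illustration assuming name-based UUIDs (uuid5) derived from the holding institute code and the accession number; the namespace URL, the key format and the `GuidResolver` class are hypothetical, and a real CHM might choose another identifier scheme and resolve GUIDs at the holding genebank.

```python
import uuid
from typing import Dict, Optional

# Hypothetical namespace for CHM identifiers (example.org is a placeholder).
CHM_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/chm")


def mint_guid(institute_code: str, accession_number: str) -> str:
    """Derive a stable GUID from a holding institute code and an accession number.

    uuid5 is name-based and deterministic: the same normalized input always
    yields the same GUID, so re-harvesting an accession does not create a new ID.
    """
    key = f"{institute_code.strip().upper()}:{accession_number.strip()}"
    return str(uuid.uuid5(CHM_NAMESPACE, key))


class GuidResolver:
    """Minimal in-memory registry resolving a GUID to its metadata record."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def register(self, guid: str, metadata: dict) -> None:
        self._records[guid] = metadata

    def resolve(self, guid: str) -> Optional[dict]:
        # Unknown GUIDs resolve to None rather than raising an error.
        return self._records.get(guid)


resolver = GuidResolver()
guid = mint_guid("ABC001", "A-1234")  # hypothetical institute code and accession number
resolver.register(guid, {"accession_number": "A-1234", "holding_institute": "ABC001"})
print(resolver.resolve(guid))
```

The deterministic derivation is a design choice: it avoids duplicate IDs on repeated harvests, but it also ties the GUID to the accession number, so renumbering at the genebank would produce a new GUID.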
Use cases for the CHM
Use cases for the taxonomic name service
Your germplasm sample has been assigned a taxonomic name, and you are curious whether it belongs to the same species as another germplasm sample from another genebank on another continent.
You want to check the spelling of the taxonomic names in your local database and need a checklist of correct names or perhaps a list of commonly misspelled names.
You want to provide more taxonomic information from within your germplasm information system than you have capacity or competence to curate within your institute.
You want to provide a small illustration of the species in question to make the public interface of your online germplasm catalogue more accessible to lay users.
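The first two use cases (checking spelling against a checklist, and deciding whether two differently named samples belong to the same species) could be served roughly as below. This is a sketch only: the tiny hard-coded checklist is illustrative, a real service would load accepted names and synonyms from a nomenclature source, and fuzzy matching with Python's `difflib` is one possible technique, not a prescribed one.

```python
import difflib
from typing import List, Optional

# Illustrative checklist; a real service would load names and synonyms from a
# nomenclature source rather than hard-code them.
ACCEPTED = ["Hordeum vulgare", "Triticum aestivum", "Brassica oleracea"]
SYNONYMS = {"Hordeum sativum": "Hordeum vulgare", "Triticum vulgare": "Triticum aestivum"}


def suggest_spelling(name: str, cutoff: float = 0.8) -> List[str]:
    """Return up to three checklist names similar to a possibly misspelled name."""
    known = ACCEPTED + list(SYNONYMS)
    return difflib.get_close_matches(name, known, n=3, cutoff=cutoff)


def accepted_name(name: str) -> Optional[str]:
    """Resolve a name to its accepted name, or None if it is not in the checklist."""
    if name in ACCEPTED:
        return name
    return SYNONYMS.get(name)


def same_taxon(name_a: str, name_b: str) -> bool:
    """True if both names resolve to the same accepted name."""
    a, b = accepted_name(name_a), accepted_name(name_b)
    return a is not None and a == b


print(suggest_spelling("Hordeum vulgre"))
print(same_taxon("Hordeum sativum", "Hordeum vulgare"))
```

Names missing from the checklist never compare as equal, so an unknown name is reported rather than silently matched.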
Use cases for the commercial cultivar index
You want to complete or improve the information in your breeding event database with breeding company name, breeder person, breeding year, commercial release year etc.
You want to look up the release of a commercial cultivar in other countries.
You want to look up the GUID of a commercial cultivar to improve the data interoperability of your cultivar database.
You want to look up the pedigree of a germplasm accession you have received from a genebank.
Use cases for the inventory of germplasm origin
You want to study duplication and identify the “most original sample” (MOS) of germplasm accessions globally or in a region, e.g. Europe. Comparing the passport metadata will give you a likelihood that two samples are duplicates. If the two samples are classified to the same original germplasm source in the CHM, the relation between the samples is also stated. The storage conditions since the origin of the germplasm sample may still cause the two accessions to be (significantly?) different, but the statement that both samples originate from the same source would be valid (as far as the classification itself is correct).
You want to complete or improve the passport data for your stored accession by looking up the original source germplasm sample in the CHM to find the coordinates of the collecting/gathering site, the collecting institute or person, etc.
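The duplication use case separates two kinds of evidence: a likelihood from comparing passport metadata, and an explicit statement when both accessions point to the same original source in the CHM. A minimal sketch keeping the two apart; the chosen fields, the weights and the `origin_guid` key are hypothetical and would need calibration against known duplicates.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical passport fields and weights (summing to 1.0); a real duplication
# study would select and calibrate these against known duplicates.
WEIGHTS = {"taxon": 0.3, "collector_number": 0.3, "origin_country": 0.2, "collecting_date": 0.2}

Passport = Dict[str, Optional[str]]


def field_similarity(a: Optional[str], b: Optional[str]) -> float:
    """String similarity in [0, 1]; a missing value on either side scores 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def compare_passports(acc_a: Passport, acc_b: Passport) -> Tuple[float, bool]:
    """Return (weighted passport similarity, same original source stated in the CHM)."""
    score = sum(w * field_similarity(acc_a.get(f), acc_b.get(f)) for f, w in WEIGHTS.items())
    origin_a, origin_b = acc_a.get("origin_guid"), acc_b.get("origin_guid")
    same_origin = origin_a is not None and origin_a == origin_b
    return score, same_origin


a = {"taxon": "Hordeum vulgare", "collector_number": "K-123", "origin_country": "NOR",
     "collecting_date": "19850812", "origin_guid": "orig-1"}
b = dict(a, collector_number="K123", origin_guid=None)
print(compare_passports(a, b))
```

Returning both values, instead of letting a shared origin override the score, mirrors the point above: a shared origin does not mean the stored samples are still identical.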
Use cases for the index of germplasm genebank accessions
The accession passport metadata are published from the holding genebank, or uploaded as a data export, to facilitate data exchange from the local database to the CHM index.
The CHM index is publicly searchable online and provides a global view of all provided germplasm accessions.
The CHM index is exported to a more download-friendly format, and fixed releases (every month, every 3 months or every 6 months) are made available and stored for future reference. Fixed releases make it easier to reference specific data…
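A fixed release is only useful as a reference if exporting the same data twice gives the same file. A minimal sketch of such an export to CSV; the column set and the `chm-index-YYYY-MM.csv` naming convention are hypothetical.

```python
import csv
import io
from datetime import date
from typing import Dict, Iterable

# Hypothetical column set for a downloadable release of the accession index.
FIELDS = ["accession_guid", "accession_number", "holding_institute", "taxon_name"]


def release_name(release_date: date) -> str:
    """Hypothetical file name convention for a fixed release, e.g. chm-index-YYYY-MM.csv."""
    return f"chm-index-{release_date:%Y-%m}.csv"


def export_release(records: Iterable[Dict[str, str]]) -> str:
    """Serialize index records to CSV text.

    Rows are sorted by GUID so that two exports of the same data are identical;
    fields not in FIELDS are ignored and missing fields are written as empty.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in sorted(records, key=lambda r: r["accession_guid"]):
        writer.writerow(record)
    return buffer.getvalue()


records = [
    {"accession_guid": "g2", "accession_number": "A-2", "holding_institute": "ABC001"},
    {"accession_guid": "g1", "accession_number": "A-1", "holding_institute": "ABC001",
     "taxon_name": "Hordeum vulgare", "internal_note": "not exported"},
]
snapshot = export_release(records)
print(release_name(date.today()))
print(snapshot)
```

The snapshot text could then be written with `open(release_name(...), "w", newline="", encoding="utf-8")` and archived alongside earlier releases.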
Description and design of the CHM data classes
Design of the taxonomic name service
The taxonomic name service could be based on, or be a simple mirror of, the Mansfeld database of cultivated plants, the GRIN Taxonomy database, the Swedish cultivated plants database (SCUD) and other relevant nomenclature projects focusing on cultivated germplasm. Further, the taxonomic name service should be compatible with Species 2000, ITIS, IPNI and the GBIF ECAT activity (Electronic Catalogue of Names of Known Organisms).
Design of the commercial cultivar index
The index of commercial cultivars could perhaps be based on data from UPOV. Other sources are the national lists of released commercial cultivars for individual countries. Help from national experts would be needed to understand the national procedures and languages of the national cultivar catalogues. Further sources of information could be the seed catalogues produced by commercial breeding companies, as well as the documentation on cultivars maintained by genebank institutes.
Design of the germplasm origin data index
The collecting/gathering event data class would be based on the design of the germplasm collecting form. The breeding event data class design would differ between traditional breeding/domestication, which produces landraces, and modern commercial breeding. Breeding lines and the results of modern molecular breeding methods would in turn produce different data classes for these breeding events.
Design of the germplasm accession index
Basic metadata on all germplasm accessions conserved by any genebank worldwide should be indexed here. All germplasm samples should be resolvable through a GUID, preferably implemented at the holding genebank information system.
Data elements/concepts for the CHM data classes
All indexed date values should be stored both in a proper date-time format and as date text, in case the reported value cannot be converted to a valid timestamp.
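Storing each date both as the reported text and as a parsed timestamp might look like this. It is a sketch assuming a short, hypothetical list of accepted input formats; values that match none of them keep only their text form.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Formats tried in order; this list is an assumption, not an agreed CHM convention.
DATE_FORMATS = ["%Y-%m-%d", "%Y%m%d", "%d.%m.%Y"]


@dataclass
class ReportedDate:
    text: str                  # the value exactly as reported by the data source
    value: Optional[datetime]  # parsed timestamp, or None if no format matched


def parse_reported_date(text: str) -> ReportedDate:
    """Keep the raw text and, when possible, a proper date-time value."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return ReportedDate(text=text, value=datetime.strptime(cleaned, fmt))
        except ValueError:
            continue
    return ReportedDate(text=text, value=None)


print(parse_reported_date("1985-08-12"))
print(parse_reported_date("summer 1985"))
```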
All data classes should store a record-created timestamp (when the record was added to the index), a record-updated timestamp (when the record was last modified in the index) and a record-deleted timestamp (when the record was deleted from the index; the alternative is to destroy the record in the index). If available, the source data object created timestamp and source data object updated timestamp should also be recorded.
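The bookkeeping timestamps above, using the "record deleted" option rather than destroying records, could be sketched as a small wrapper class. The `IndexRecord` name and the use of UTC are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class IndexRecord:
    """Hypothetical wrapper adding index bookkeeping timestamps to any data class record."""
    payload: dict
    record_created: datetime = field(default_factory=utc_now)
    record_updated: Optional[datetime] = None
    record_deleted: Optional[datetime] = None
    source_created: Optional[datetime] = None  # only if reported by the data source
    source_updated: Optional[datetime] = None  # only if reported by the data source

    def update(self, payload: dict) -> None:
        self.payload = payload
        self.record_updated = utc_now()

    def soft_delete(self) -> None:
        """Mark the record as deleted instead of destroying it."""
        self.record_deleted = utc_now()

    @property
    def is_deleted(self) -> bool:
        return self.record_deleted is not None


rec = IndexRecord({"accession_number": "A-1"})
rec.update({"accession_number": "A-1", "taxon_name": "Hordeum vulgare"})
rec.soft_delete()
print(rec.is_deleted, rec.record_deleted)
```

Keeping deleted records marked rather than removed lets earlier fixed releases still point at something resolvable.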
Data concepts for original germplasm (under development…!)
Original germplasm GUID
Original germplasm designation (collector accession designation/number, breeder accession designation)
Origin country
Institute (collecting or breeding institute)
Person (collector or breeder)
Date/time of the “creation” event (collecting date, cultivar release date…)
For gathering/collecting events
Site/location
Longitude (decimal degree format)
Latitude (decimal degree format)
Coordinate precision
Elevation
For breeding/cultivation events
Geographic area where germplasm is cultivated or developed to be farmed
Year of development
Pedigree formula (Purdy)
Ancestral notes/remarks
NB! Should we split the gathering event, traditional cultivation event, modern breeding event and breeding-for-research event more distinctly…? Perhaps we also need to define the events and the limits of the data classes better…? Should we build an origin superclass to inherit from, and a breeding event class from which the breeding subclasses inherit, as indicated above…?
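One possible answer to the inheritance question, sketched with Python dataclasses: an origin superclass holding the shared concepts, a collecting event subclass, and a breeding event class from which further breeding subclasses inherit. The class names, the `breeding_company` field and the coordinate range checks are assumptions, not a settled design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OriginEvent:
    """Hypothetical superclass holding the concepts shared by all origin events."""
    guid: str
    designation: str                   # collector or breeder accession designation
    origin_country: Optional[str] = None
    institute: Optional[str] = None    # collecting or breeding institute
    person: Optional[str] = None       # collector or breeder
    event_date: Optional[str] = None   # date of the "creation" event, as reported


@dataclass
class CollectingEvent(OriginEvent):
    site: Optional[str] = None
    longitude: Optional[float] = None  # decimal degrees
    latitude: Optional[float] = None   # decimal degrees
    coordinate_precision: Optional[float] = None
    elevation: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject coordinates outside the valid decimal-degree ranges.
        if self.latitude is not None and not -90 <= self.latitude <= 90:
            raise ValueError("latitude out of range")
        if self.longitude is not None and not -180 <= self.longitude <= 180:
            raise ValueError("longitude out of range")


@dataclass
class BreedingEvent(OriginEvent):
    cultivation_area: Optional[str] = None
    year_of_development: Optional[int] = None
    pedigree: Optional[str] = None     # Purdy pedigree formula
    ancestral_notes: Optional[str] = None


@dataclass
class TraditionalCultivationEvent(BreedingEvent):
    """Landraces from traditional breeding/domestication."""


@dataclass
class ModernBreedingEvent(BreedingEvent):
    breeding_company: Optional[str] = None


event = CollectingEvent(guid="orig-1", designation="K-123", latitude=60.0, longitude=10.7)
print(event)
```

Because the superclass ends with defaulted fields, every subclass field also needs a default; this is a dataclass ordering rule, and it pushes mandatory subclass data into validation code rather than the constructor signature.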
Data concepts for germplasm accessions
Accession GUID (global unique ID)
Accession UID (local unique ID)
Accession number
Data resource/provider (BioCASE entry point, uploaded by, etc.)
Data source (germplasm holding genebank institute)
Data source code (FAO code, WIEWS)
Sub collection (named collection within the holding institute)
Taxon name (raw name as reported)
Taxon concept (reference to a taxon name service or checklist)
Original germplasm (reference to the collecting/gathering or breeding/cultivation event, preferably on GUID)
...
Accession name
Country of origin
Many database tables versus one or a few tables, relational schema versus flat schema… A flat schema is easier to harvest data into and to export data from. A relational schema makes it easier to build links between data classes (an accession has a holding institute, the holding institute has staff, the staff have collected the following accessions…). A relational schema will also store the data more efficiently, consuming less disk space.
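The trade-off can be made concrete with a small relational schema from which a flat, export-friendly view is produced by a join. This sketch uses SQLite through Python's `sqlite3` module; the table and column names and the sample rows are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a small relational schema: institutes, persons and accessions."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE institute (code TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                             institute_code TEXT REFERENCES institute(code));
        CREATE TABLE accession (guid TEXT PRIMARY KEY, accession_number TEXT,
                                holding_institute TEXT REFERENCES institute(code),
                                collector_id INTEGER REFERENCES person(id),
                                taxon_name TEXT);
    """)
    return con


# Flat view for harvesting/export: one row per accession, links resolved to names.
FLAT_QUERY = """
    SELECT a.guid, a.accession_number, i.name AS holding_institute,
           p.name AS collector, a.taxon_name
    FROM accession a
    JOIN institute i ON i.code = a.holding_institute
    LEFT JOIN person p ON p.id = a.collector_id
    ORDER BY a.guid
"""


def flat_rows(con: sqlite3.Connection) -> list:
    return con.execute(FLAT_QUERY).fetchall()


def accessions_collected_by(con: sqlite3.Connection, person_name: str) -> list:
    """Follow the accession -> person link: which accessions did this person collect?"""
    rows = con.execute(
        "SELECT a.guid FROM accession a JOIN person p ON p.id = a.collector_id "
        "WHERE p.name = ? ORDER BY a.guid",
        (person_name,),
    ).fetchall()
    return [guid for (guid,) in rows]


con = build_demo_db()
con.execute("INSERT INTO institute VALUES ('ABC001', 'Example Genebank')")
con.execute("INSERT INTO person VALUES (1, 'A. Collector', 'ABC001')")
con.execute("INSERT INTO accession VALUES ('g1', 'A-1', 'ABC001', 1, 'Hordeum vulgare')")
con.execute("INSERT INTO accession VALUES ('g2', 'A-2', 'ABC001', NULL, 'Triticum aestivum')")
for row in flat_rows(con):
    print(row)
```

The institute name is stored once and repeated only in the flat view, which is the storage saving; the `LEFT JOIN` keeps accessions without a known collector in the flat export.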
Text: Dag Endresen, NGB/IPGRI.
Please feel free to reuse or modify the text above.