GBIF Tools

An index to tools brought to you by the Global Biodiversity Information Facility (GBIF).



Darwin Core Archive Assistant

The Darwin Core Archive Assistant is a web application with a simple interface for describing the data elements that a data publisher wishes to serve to the GBIF network as basic text files. It then composes the accompanying XML descriptor file, as defined in the Darwin Core Text Guidelines. The Assistant communicates with the GBIF registry to obtain an up-to-date list of all relevant Darwin Core terms and available extensions, and presents these as a simple checklist.

The Darwin Core is a body of standards that includes a set of terms relating to taxa and their occurrence in nature, and a set of practices for using these terms to publish biodiversity data and information. GBIF has adopted a text-based solution for using Darwin Core that both simplifies and extends the publication of species and species-occurrence data. This format, referred to as a Darwin Core Archive (DWCA), provides a relatively non-technical option for publishing biodiversity data that does not require installing complex data publishing software. A Darwin Core Archive can be published via a simple web address (URL).

Darwin Core Archives support the publication of enriched data types that extend the core terms while retaining the relatively simple, text-based data format. These extensions, however, require an XML descriptor file (meta.xml) that maps the different files and data elements in the archive. Many biologists and data managers find the technical threshold for producing Darwin Core Archives quite low, yet find working with XML challenging.
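
As a rough illustration of the kind of descriptor the Assistant composes, the sketch below builds a meta.xml for a single tab-delimited occurrence file with Python's standard xml.etree.ElementTree. The function name build_meta_xml and the example columns are hypothetical; the element and attribute names follow the Darwin Core Text Guidelines, and a real descriptor may additionally declare extensions and default values.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(core_file, row_type, columns, metadata="eml.xml"):
    """Return a meta.xml string describing one tab-delimited core data file.

    `columns` lists Darwin Core term names in file column order; the first
    column is assumed (hypothetically) to hold the record identifier.
    """
    # Declare the Darwin Core text namespace as the document's default namespace.
    archive = ET.Element("archive", xmlns=DWC_TEXT_NS, metadata=metadata)
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",  # the literal two characters \t, as written in meta.xml
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",     # the data file starts with a header row
        "rowType": row_type,
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = core_file
    ET.SubElement(core, "id", index="0")
    # Map every column to the full URI of its Darwin Core term.
    for index, term in enumerate(columns):
        ET.SubElement(core, "field", index=str(index), term=DWC_TERMS + term)
    return ET.tostring(archive, encoding="unicode")
```

For example, build_meta_xml("occurrence.txt", DWC_TERMS + "Occurrence", ["occurrenceID", "scientificName", "eventDate"]) yields a descriptor whose core maps column 1 of occurrence.txt to the scientificName term.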


Darwin Core Archive Validator

The validator is a tool for testing Darwin Core Archives against the Darwin Core Text Guidelines. Because the archives are simple, GBIF encourages publishers to create them with simple custom scripts. This creates the need for a testing framework that lets developers make sure GBIF and others can read the information as expected.

The validator uses the official XML schema to validate the meta.xml descriptor. In addition, it uses the Darwin Core Archive Reader Java library to validate the content against the extensions and terms registered within the GBIF network for sharing biodiversity data. GBIF runs both a production and a development registry to keep track of extensions, and the validator uses both.
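
The idea of checking a descriptor against registered terms can be sketched in a few lines of Python. This is not the validator's code, which is written in Java on top of the Darwin Core Archive Reader library: KNOWN_TERMS is a hypothetical, hard-coded stand-in for the term list held in the GBIF registries, check_meta_terms is a hypothetical name, and only unknown terms and duplicated column indexes are reported.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
FIELD_TAG = f"{{{DWC_TEXT_NS}}}field"

# Hypothetical stand-in for the term list the validator obtains from the GBIF registry.
KNOWN_TERMS = {
    "http://rs.tdwg.org/dwc/terms/occurrenceID",
    "http://rs.tdwg.org/dwc/terms/scientificName",
    "http://rs.tdwg.org/dwc/terms/eventDate",
    "http://rs.tdwg.org/dwc/terms/basisOfRecord",
}


def check_meta_terms(meta_xml, known_terms=KNOWN_TERMS):
    """Return human-readable problems found in the field mappings of a meta.xml string."""
    problems = []
    root = ET.fromstring(meta_xml)
    for section in root:  # the <core> element and any <extension> elements
        name = section.tag.split("}")[-1]
        seen_indexes = set()
        for field in section.iter(FIELD_TAG):
            term = field.get("term")
            if term not in known_terms:
                problems.append(f"unknown term: {term}")
            index = field.get("index")  # fields carrying only a default value may omit it
            if index is not None:
                if index in seen_indexes:
                    problems.append(f"duplicate column index {index} in {name}")
                seen_indexes.add(index)
    return problems
```

A schema check alone would accept a misspelled term URI; a registry-backed check like this one is what catches it.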

GBIF recommends bundling an Ecological Metadata Language (EML) XML file with an archive. Because EML is a large and complex schema, GBIF has specified a GBIF profile that uses a subset of EML 2.1.1 and declares its specific additions within the generic additionalMetadata section of EML. Every valid GBIF profile document should therefore also be valid according to the official EML schema. EML documents are validated against both of these XML schemas.


Spreadsheet Processor

The spreadsheet processor is a web application that transforms pre-configured MS Excel spreadsheet files into a GBIF-supported standard data format called a Darwin Core Archive. The pre-configured Excel files contain multiple worksheets that support data entry. One worksheet supports the GBIF metadata profile. A second worksheet supports the publication of either primary biodiversity data (natural history collection or species observation data) or basic species checklists. The spreadsheet processor accepts completed spreadsheet files via a web form or as an email attachment. It performs a series of data validation and transformation steps and returns a validated Darwin Core Archive file to the user, which can then be published via GBIF or other biodiversity networks that support this format.
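
A minimal sketch of a validate-then-transform step, assuming the occurrence worksheet has already been read into a list of dictionaries keyed by Darwin Core term names (reading the Excel file itself needs a third-party library and is left out). REQUIRED, clean and rows_to_archive are hypothetical and do not reflect the processor's actual rules; the first column is assumed to be occurrenceID, and generating eml.xml from the metadata worksheet is omitted.

```python
import io
import zipfile

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical minimal requirements; the real processor's rules may differ.
REQUIRED = ("occurrenceID", "scientificName", "basisOfRecord")


def clean(value):
    """Collapse tabs, newlines and repeated spaces so a value fits in one tab-delimited cell."""
    if value is None:  # empty spreadsheet cells often arrive as None
        return ""
    return " ".join(str(value).split())


def rows_to_archive(rows, columns):
    """Validate worksheet rows and package them as a zipped Darwin Core Archive.

    Returns (zip_bytes, []) on success or (None, errors) if any row fails validation.
    """
    errors = []
    seen_ids = set()
    for number, row in enumerate(rows, start=2):  # worksheet row 1 holds the headers
        for term in REQUIRED:
            if not clean(row.get(term)):
                errors.append(f"row {number}: missing {term}")
        record_id = clean(row.get("occurrenceID"))
        if record_id and record_id in seen_ids:
            errors.append(f"row {number}: duplicate occurrenceID {record_id}")
        seen_ids.add(record_id)
    if errors:
        return None, errors  # report problems instead of producing an archive

    # Transform: one header line plus one tab-delimited line per row.
    lines = ["\t".join(columns)]
    lines += ["\t".join(clean(row.get(term)) for term in columns) for row in rows]
    fields = "".join(
        f'<field index="{i}" term="{DWC_TERMS}{term}"/>' for i, term in enumerate(columns)
    )
    meta = (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">'
        '<core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" '
        'fieldsEnclosedBy="" ignoreHeaderLines="1" '
        f'rowType="{DWC_TERMS}Occurrence">'
        "<files><location>occurrence.txt</location></files>"
        f'<id index="0"/>{fields}</core></archive>'
    )

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", "\n".join(lines) + "\n")
        archive.writestr("meta.xml", meta)
    return buffer.getvalue(), []
```

Returning all errors at once, rather than stopping at the first, mirrors how a submitter would want feedback on a whole spreadsheet.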


Name Parser

A simple HTML form that uses the ECAT name parser web service. The parser is written in Java and uses regular expressions to dissect name strings into their components. It keeps only the name parts required to reconstruct a full three-part name with an optional subgenus, and ignores additional infraspecific parts such as the subspecies given for varieties.
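
The regular-expression approach can be illustrated with a much-simplified Python sketch. The patterns, parse_name and canonical are hypothetical and handle only plain names; the ECAT parser itself is written in Java and copes with authorships, hybrids and many other cases. Following the description above, only the last infraspecific part is kept, so for a variety the intermediate subspecies is dropped.

```python
import re

# Hypothetical, much-simplified patterns for plain scientific names.
NAME = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"                # capitalised genus
    r"(?:\s+\((?P<subgenus>[A-Z][a-z]+)\))?"  # optional subgenus in parentheses
    r"\s+(?P<species>[a-z-]+)"                # species epithet
    r"(?P<rest>.*)$"                          # authorship, infraspecific parts, ...
)
INFRA = re.compile(r"\b(?P<rank>subsp\.|ssp\.|var\.|f\.)\s+(?P<epithet>[a-z-]+)")
BARE_INFRA = re.compile(r"\s+(?P<epithet>[a-z-]+)(?=\s|$)")  # trinomial without rank marker


def parse_name(name):
    """Split a name string into genus, subgenus, species and one infraspecific part."""
    match = NAME.match(name.strip())
    if match is None:
        return None
    parsed = {"genus": match["genus"], "subgenus": match["subgenus"],
              "species": match["species"], "rank": None, "infraspecific": None}
    rest = match["rest"]
    markers = list(INFRA.finditer(rest))
    if markers:
        # Keep only the last (lowest) infraspecific part, e.g. the variety, not the subspecies.
        parsed["rank"] = markers[-1]["rank"]
        parsed["infraspecific"] = markers[-1]["epithet"]
    else:
        bare = BARE_INFRA.match(rest)
        if bare:
            parsed["infraspecific"] = bare["epithet"]
    return parsed


def canonical(parsed):
    """Reconstruct the three-part name, with subgenus if present."""
    parts = [parsed["genus"]]
    if parsed["subgenus"]:
        parts.append(f"({parsed['subgenus']})")
    parts.append(parsed["species"])
    if parsed["infraspecific"]:
        if parsed["rank"]:
            parts.append(parsed["rank"])
        parts.append(parsed["infraspecific"])
    return " ".join(parts)
```

With these patterns, "Carabus (Procrustes) coriaceus Linnaeus, 1758" reduces to "Carabus (Procrustes) coriaceus", because the capitalised authorship is left in the unparsed remainder.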