GBIF Tools

An index to tools brought to you by the Global Biodiversity Information Facility (GBIF).

Darwin Core Archive Assistant

The Darwin Core Archive Assistant is a web application that presents a simple interface for describing the data elements a data publisher wishes to serve to the GBIF network as basic text files. It composes the appropriate XML descriptor file, as defined in the Darwin Core Text Guidelines, to accompany them. It communicates with the GBIF registry to provide an up-to-date listing of all relevant Darwin Core terms and available extensions and presents these in a simple checklist format.

The Darwin Core is a body of standards that includes a set of terms relating to taxa and their occurrence in nature, and a set of practices regarding the use of these terms in the publication of biodiversity data and information. GBIF has adopted a text-based solution for using Darwin Core that both simplifies and extends the publication of species and species-occurrence data. This format is referred to as a Darwin Core Archive (DWCA) and provides a relatively non-technical option for publishing biodiversity data that does not require complicated installations of data publication software. Darwin Core Archives can be published via a simple web address or URL.

Darwin Core Archives support the publication of enriched data types that extend the core terms while retaining the relatively simple, text-based data format. These extensions, however, require the inclusion of an XML descriptor file (meta.xml) that serves as a map to the different files and data elements in the archive. Many biologists and data managers find working with XML challenging while otherwise finding the technical threshold for producing Darwin Core Archives quite low.
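As a rough illustration of what such a descriptor contains, the following Python sketch builds a minimal meta.xml for a single tab-delimited core file. It is not the Assistant's own code. The file name occurrence.txt, the choice of terms and the helper build_meta_xml are assumptions made for the example, and extensions are left out.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"   # namespace of the descriptor
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"    # namespace of Darwin Core terms


def build_meta_xml(data_file, terms):
    """Return a meta.xml string for one core data file (hypothetical helper).

    Column 0 is treated as the record identifier; the remaining columns are
    mapped, in order, to the given Darwin Core term names.
    """
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in the XML
        "linesTerminatedBy": "\\n",    # written literally as \n in the XML
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS + "Occurrence",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = data_file
    ET.SubElement(core, "id", {"index": "0"})
    for index, term in enumerate(terms, start=1):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_TERMS + term})
    return ET.tostring(archive, encoding="unicode")


# Example data file and terms are assumptions for illustration only.
meta = build_meta_xml("occurrence.txt", ["scientificName", "eventDate", "country"])
print(meta)
```

Each field element ties a column position in the text file to a term URI, which is the "map" role the descriptor plays in the archive.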

Darwin Core Archive Validator

The validator is a tool to test Darwin Core Archives as specified in the Darwin Core Text Guidelines. Because the archives are simple, GBIF encourages publishers to create them using simple custom scripts. Developers therefore need a testing framework to make sure GBIF and others can read the information as expected.

The validator uses the official XML schema to validate the meta.xml descriptor. In addition, it uses the Darwin Core Archive Reader Java library to validate the content against the known extensions and terms registered within the GBIF network for sharing biodiversity data. GBIF runs a production and a development registry that keep track of extensions, both of which are used by this validator.
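Publishers who generate archives with their own scripts may want a quick local sanity check before submitting to the validator. The Python sketch below is only that: it is far weaker than schema validation, does not consult the registry, and is not how the validator itself works. The function name check_descriptor and the particular checks are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def check_descriptor(meta_xml):
    """Return a list of problems found in a meta.xml string (empty if none)."""
    try:
        root = ET.fromstring(meta_xml)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    core = root.find("dwc:core", NS)
    if core is None:
        return ["missing <core> element"]

    problems = []
    if core.find("dwc:files/dwc:location", NS) is None:
        problems.append("core has no <files><location> entry")
    for field in core.findall("dwc:field", NS):
        if not field.get("term"):
            problems.append("field without a term attribute")
        elif field.get("index") is None and field.get("default") is None:
            problems.append(f"field {field.get('term')} has neither index nor default")
    return problems


# Illustrative descriptor; the file name and term are example values.
sample = (
    '<archive xmlns="http://rs.tdwg.org/dwc/text/">'
    '<core><files><location>occurrence.txt</location></files>'
    '<id index="0"/>'
    '<field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>'
    '</core></archive>'
)
print(check_descriptor(sample))
```

A descriptor that passes these checks can still fail the real validator, for example when a term is not registered.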

GBIF recommends bundling an Ecological Metadata Language (EML) XML file with an archive. As EML is a rather large and complex schema, GBIF has specified a GBIF profile that uses a subset of EML 2.1.1 and also declares specific additions to EML within the generic additionalMetadata section of EML. Every valid GBIF profile document should therefore always be valid according to the official EML schema. The EML validation is done according to those two XML schemas.

Spreadsheet Processor

The spreadsheet processor is a web application that transforms pre-configured MS Excel spreadsheet files into a GBIF-supported standard data format called a Darwin Core Archive file. The pre-configured Excel files contain multiple worksheets that support data entry. One worksheet supports the GBIF metadata profile. A second worksheet supports the publication of either primary biodiversity data, in the form of natural history collections or species observational data, or basic species checklists. The spreadsheet processor accepts completed spreadsheet files via a web form or as an email attachment. The processor performs a series of data validation and transformation steps and returns a validated Darwin Core Archive file to the user that can be published via GBIF or other biodiversity networks that support this format.

DarwinCore Archive Registration Form

If you have a Darwin Core Archive that you would like to publish through GBIF and are NOT using the Integrated Publishing Toolkit (IPT), this simple form allows you to manually register your dataset with our registry via the GBIF helpdesk.

GBIF Resource Browser

The GBIF Resource Browser provides search and browse access to the terms, and their definitions, that form the basis of data exchange within the GBIF network. There are three categories of terms accessible through the browser.

This browser uses data stored on the GBIF Resource Repository.

Taxon Tagger

TaxonTagger is a rich-client web application that identifies, highlights, and extracts scientific names from web pages and PDF documents. TaxonTagger uses the GBIF name finder web services as its name-finding engine. The application highlights names in a document and provides a limited capacity to annotate the service output, such as highlighting missed names, extending a find, etc. The extracted list of species names can subsequently be exported as a simple Darwin Core list, or it can be cross-referenced and mapped to the Catalogue of Life or other GBIF-indexed species lists to output a complete classification of the extracted taxa.

Name Parser

A simple HTML form using the ECAT name parser web service. The parser is written in Java and uses regular expressions to dissect name strings into their components. It keeps only the name parts required to reconstruct a full three-part name with an optional subgenus, and ignores additional infraspecific parts such as the subspecies given for varieties.
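To give a feel for regular-expression-based name dissection, here is a deliberately simplified Python sketch. It is not the ECAT parser and does not reproduce its rules; the pattern, the rank markers it recognises and the function parse_name are assumptions for illustration. Among other limitations, it would misread lowercase author particles (such as "de") as an epithet.

```python
import re

# Simplified pattern: genus, optional (Subgenus), species epithet,
# optional rank marker plus infraspecific epithet, optional authorship.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+\((?P<subgenus>[A-Z][a-z]+)\))?"
    r"(?:\s+(?P<species>[a-z-]+))?"
    r"(?:\s+(?:(?P<rank>subsp\.|ssp\.|var\.|f\.)\s+)?(?P<infraspecies>[a-z-]+))?"
    r"(?:\s+(?P<authorship>.+))?$"
)


def parse_name(name):
    """Split a scientific name string into parts, or return None if no match."""
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        return None
    return match.groupdict()


for text in ("Bombus (Psithyrus) vestalis",
             "Pinus nigra subsp. laricio",
             "Abies alba Mill."):
    print(text, "->", parse_name(text))
```

Parts that are absent from the input come back as None, so a caller can rebuild a canonical name from whichever of genus, subgenus, species and infraspecies are present.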

Name Finder

A simple HTML client for the name-finding web services hosted at GBIF. These indexing services try to locate and extract scientific names within a text document. The Global Names Architecture (GNA) has defined a standard Name Finding API that our services adhere to. We currently host two finding services, based on different algorithms, that you can use with the client:

  1. uBio's TaxonFinder: A service wrapping the original uBio TaxonFinder.
  2. Lucene Name Indexer: A Java implementation by GBIF, based on Apache Lucene, that consumes documents in many file formats. This service is still considered experimental.

Text Extraction Service

This Apache Tika-based service accepts a URL pointing to a document in one of many binary formats, such as PDF, MS Word, Excel, PowerPoint, Keynote and many others.

Please pass the location of the document to convert as a GET parameter named url. For example:

You can use this service to pipe its results into the name finding service. Enter the complete extraction service URL into the finder's URL field.
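The chaining described above can be sketched in Python as follows. The service addresses are not given here, so both endpoints are placeholders on example.org, and the assumption that the finder also takes its input as a parameter named url is ours. The point of the sketch is that the document location must be URL-encoded when it is embedded as a parameter value, and encoded again when the whole extraction URL is handed to the finder.

```python
from urllib.parse import urlencode

# Hypothetical endpoints: placeholders, not the real service addresses.
EXTRACTOR_ENDPOINT = "http://example.org/text-extraction"
FINDER_ENDPOINT = "http://example.org/name-finder"


def extraction_url(document_url):
    """Build the extraction request, passing the document as the url parameter."""
    return EXTRACTOR_ENDPOINT + "?" + urlencode({"url": document_url})


def finder_url(input_url):
    """Build a finder request; the parameter name url is an assumption."""
    return FINDER_ENDPOINT + "?" + urlencode({"url": input_url})


doc = "http://example.org/papers/checklist.pdf"   # example document location
step1 = extraction_url(doc)      # URL that returns the document's plain text
step2 = finder_url(step1)        # URL that runs name finding on that text
print(step1)
print(step2)
```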

Terms Used in Bionomenclature

A glossary of over 2,100 terms used in biological nomenclature, the naming of whole organisms of all kinds. It covers terms in use in the current editions of the different internationally mandated and proposed organismal Codes, i.e. those for botany (including mycology), cultivated plants, prokaryotes (archaea and bacteria), virology, and zoology, as well as the Draft BioCode and PhyloCode. Any abbreviations, latinizations, and synonyms are incorporated, as are terms which are no longer employed, are used outside the formal nomenclatural Codes, or are otherwise likely to be encountered. As some of the terms used in the classification of plant communities are identical to those of whole organisms, terms used in phytosociological nomenclature are also included.

The glossary has been prepared with inputs from numerous nomenclatural specialists, especially representatives of the different Codes serving on the IUBS/IUMS International Committee on Bionomenclature. It is intended for use as a reference work by all biologists involved with the description or re-classification of organisms, as well as those investigating the status and application of previously proposed names.

Global Protected Areas Assessment and Monitoring Pilot Viewer

The GPAAMP Viewer is a freely distributable, standards-based, Open Source web-based GIS client. Supporting viewing and download services, it functions primarily as a means for visualizing information from disparate sources delivered through standards-based web services. This project supports the interaction of geospatial data according to Open Geospatial Consortium (OGC) formats. These geoportals focus on:

  1. Viewing WMS, WFS, GeoRSS and KML data from many different OGC geospatial servers.
  2. Providing turnkey geoportals for setting up interactive WFS-T layers to add, edit and delete data.

Please visit the project site to learn more or download the tool.
A demonstration is available to explore the viewer.