Various specimen labels

Traditionally, transcribing specimen labels and extracting Darwin Core terms has been time-consuming and difficult to scale. Labels vary in form, context and language, and the process needs human input at almost every stage. As a result, most specimens in many natural history collections have still not been digitized and published.

The potential for using AI to overcome these problems and accelerate label digitization at scale has grown rapidly in recent years. This has drawn considerable interest from the GBIF community, with several experiments under way in various countries. This project aims to consolidate these efforts, taking into account the diverse use cases, and to develop a broadly applicable proof-of-concept AI Machine Annotation Service (MAS) for bulk specimen image OCR and text extraction into Darwin Core for FAIR data publication.

This project features:

  • a hackathon to design and build the MAS AI pipeline using paid and open-source AI software and services. The DiSSCo team supports the hackathon to integrate the pipeline with the DiSSCo infrastructure, with testing in the DiSSCo sandbox environment.
  • a capacity-building workshop to optimize prompts and to test and refine the service, with participants curating the AI annotations on the DiSSCover annotations platform.
  • a period of online testing in which project participants work through a larger sample of the available records to catch more edge cases.
  • a final recorded dissemination workshop open to GBIF Nodes for discussion, further testing and refinement of the service.

All software code, documentation and training materials developed in the project will be open source. Datasets created will be published in GBIF. The outcomes may serve as the basis for a larger project that eventually creates a community MAS for large-scale specimen label digitization using AI.

€ 18,027
€ 46,668
Duration
1 December 2024 - 31 December 2025
Project identifier
CESP2024-016
Funded by
Partners
Project lead
GBIF Poland
Contact details

Piotr Tykarski
University of Warsaw
Żwirki i Wigury 101
CNBCh
02-089 Warsaw
Poland