dataflow:general_dataflow [2025/04/15 16:43] – birgit

The [[https://www.gfbio.org/data-centers/LIB|LIB Biodiversity Data Center]] is one of the seven [[https://www.gfbio.org/data-centers|GFBio Collection Data Centers]] that together form the backbone of the GFBio Submission, Repository and Archiving Infrastructure. Data archiving and publication at LIB rely on the [[https://diversityworkbench.net/Portal/Diversity_Workbench|Diversity Workbench]] management system, the digital asset management system [[https://fylr.io/|fylr]] and the [[https://data.bolgermany.de/gbol1/metabarcoding|ASV-Registry]], a tool to manage ASV/OTU tables. The management tools and archiving processes used at the Data Center are described under [[https://gfbio.biowikifarm.net/wiki/Technical_Documentations|Technical Documentations]]. This includes services for the documentation, processing and archiving of the original data and metadata sets provided by data producers (source data; SIP).
Data producers are welcome to use the spreadsheet templates provided under [[https://gfbio.biowikifarm.net/wiki/Forms_and_Assessments|Templates for data submission]].
The workflow for submission, archiving and publication of data follows the standard for an __O__pen __A__rchival __I__nformation __S__ystem ([[https://www.iso.org/standard/87471.html|OAIS - Open archival information system]] and [[https://ccsds.org/wp-content/uploads/gravity_forms/5-448e85c647331d9cbaf66c096458bdd5/2025/01//650x0m3.pdf|Reference Model for an Open Archival Information System (pdf)]]). This ISO standard distinguishes between different information packages for submission (SIP), archiving (AIP) and dissemination (DIP). For an overview of ISO standards for digital archives, see [[https://gfbio.biowikifarm.net/wiki/ISO_Standards_for_Digital_Archives|ISO Standards for Digital Archives]].
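
The package flow can be illustrated with a minimal Python sketch. It is a conceptual model only, assuming that ingest adds fixity information (a SHA-256 checksum per file) and that dissemination verifies it before handing content out. The class and function names (''SIP'', ''AIP'', ''DIP'', ''ingest'', ''disseminate'') are hypothetical and do not describe the software used at LIB.

```python
# Conceptual model of the three OAIS information package types.
# All names are hypothetical; this is not LIB or Diversity Workbench code.
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what the data producer delivers."""
    producer: str
    files: dict[str, bytes]  # file name -> raw content
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class AIP:
    """Archival Information Package: content plus preservation information."""
    files: dict[str, bytes]
    metadata: dict[str, str]
    fixity: dict[str, str]  # file name -> SHA-256 checksum (assumed fixity method)


@dataclass
class DIP:
    """Dissemination Information Package: what a data consumer receives."""
    files: dict[str, bytes]
    metadata: dict[str, str]


def ingest(sip: SIP) -> AIP:
    # Record one checksum per file so later integrity checks are possible.
    fixity = {name: hashlib.sha256(data).hexdigest() for name, data in sip.files.items()}
    metadata = {**sip.metadata, "producer": sip.producer}
    return AIP(files=dict(sip.files), metadata=metadata, fixity=fixity)


def disseminate(aip: AIP) -> DIP:
    # Verify integrity against the stored checksums before release.
    for name, data in aip.files.items():
        if hashlib.sha256(data).hexdigest() != aip.fixity[name]:
            raise ValueError(f"fixity check failed for {name}")
    return DIP(files=dict(aip.files), metadata=dict(aip.metadata))


sip = SIP(producer="Example Working Group", files={"occurrences.csv": b"id,taxon\n1,Carabus\n"})
dip = disseminate(ingest(sip))
print(sorted(dip.metadata))  # ['producer']
```

Keeping the three package types as separate structures mirrors the OAIS distinction: the producer's submission is never handed out directly, only a package derived from the archived copy.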

At LIB, the Diversity Workbench modules for specimen occurrence data, literature, taxonomies and other domains are used for data and metadata import, metadata enrichment and data quality control (see [[https://www.gfbio.org/data/tools|Tools & Workbenches for Data Management at GFBio]]).

**Provision of versioned datasets**

Datasets containing occurrence data are published by creating a snapshot of the data and metadata of a single dataset in Diversity Workbench. This is done with an external helper tool available from [[https://gitlab.leibniz-lib.de/BioCASe/vcat-transfer|LIB GitLab: VCAT-Transfer]]. All data are mapped to the [[https://archive.bgbm.org/TDWG/CODATA/Schema/ABCD_2.1/ABCD_2.1.html|ABCD 2.1 Standard]] using the [[https://wiki.bgbm.org/bps|BioCASe Provider Software]]. A Dissemination Information Package (DIP according to OAIS) is created and stored as a zip archive in the digital asset management system [[https://media.leibniz-lib.de/biocase-archives|fylr at LIB]]. Each DIP is versioned: a version is identified by a date suffix and a version number consisting of a major and a minor version (e.g. 2.1). Major changes, such as the addition of further data, increment the major version. Minor changes, e.g. the correction of typing errors or changes to the metadata, increment the minor version.
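
The versioning rule can be expressed as a small Python sketch. The change labels (''data_added'', ''typo_fixed'', ''metadata_changed'') are hypothetical, the date suffix is not modelled, and resetting the minor version to 0 after a major change is an assumption that the text above does not state.

```python
# Sketch of the major.minor versioning rule for DIPs.
# Change labels are hypothetical, not a controlled vocabulary used at LIB.
from dataclasses import dataclass

MAJOR_CHANGES = {"data_added"}                      # e.g. addition of further data
MINOR_CHANGES = {"typo_fixed", "metadata_changed"}  # e.g. corrections, metadata edits


@dataclass(frozen=True)
class DatasetVersion:
    major: int
    minor: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"

    def bump(self, change: str) -> "DatasetVersion":
        if change in MAJOR_CHANGES:
            # Assumption: a major change resets the minor counter to 0.
            return DatasetVersion(self.major + 1, 0)
        if change in MINOR_CHANGES:
            return DatasetVersion(self.major, self.minor + 1)
        raise ValueError(f"unknown change type: {change}")


v = DatasetVersion(2, 1)
print(v.bump("metadata_changed"))  # 2.2
print(v.bump("data_added"))        # 3.0
```

Making the version immutable (''frozen=True'') means every bump yields a new version object, which matches the idea that each published DIP is a fixed snapshot.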

Datasets stored and curated in [[https://media.leibniz-lib.de|fylr]] are published from within the software.
Published datasets are citable using direct URLs to the DIP or via their DOIs. Based on the data provider's input (submission metadata), the LIB data curator prepares the dataset citation and adjusts it to conform to the GFBio citation pattern. The citation is finalized in close collaboration with the data provider. For details, see the general part of [[https://gfbio.biowikifarm.net/wiki/Data_Publishing/General_part:_GFBio_publication_of_type_1_data_via_BioCASe_data_pipelines|GFBio publication of type 1 data via BioCASe data pipelines]].

Example: ''ZFMK Ichthyology Working Group. (2024). ZFMK Ichthyology collection (Version 5) [Data set]. LIB Biodiversity Datacenter. https://doi.org/10.20363/zfmk-coll.ichthyology-2024-06''
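
A minimal sketch of how a citation in the layout of the example could be assembled from submission metadata. The field names are assumptions and do not reflect the GFBio metadata schema; in practice the citation is prepared and finalized by the data curator as described above.

```python
# Hypothetical helper reproducing the layout of the example citation.
# Field names are illustrative, not the GFBio citation schema.
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    creator: str
    year: int
    title: str
    version: str  # kept as text so values like "5" or "2.1" both work
    publisher: str
    doi: str      # bare DOI, without the https://doi.org/ prefix

    def render(self) -> str:
        return (
            f"{self.creator}. ({self.year}). {self.title} "
            f"(Version {self.version}) [Data set]. {self.publisher}. "
            f"https://doi.org/{self.doi}"
        )


c = DatasetCitation(
    creator="ZFMK Ichthyology Working Group",
    year=2024,
    title="ZFMK Ichthyology collection",
    version="5",
    publisher="LIB Biodiversity Datacenter",
    doi="10.20363/zfmk-coll.ichthyology-2024-06",
)
print(c.render())
```

Storing the DOI without its resolver prefix keeps the identifier reusable, for example in the DOI link of the citation or in a resolver request.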

==== Access to data via different portals ====

Indexed and faceted data are available in public portals such as GBIF, Europeana and GFBio, which are operated by national or international consortia. In addition, the LIB Data Center develops and provides specialized web portals for access to the data. These include the [[https://collections.leibniz-lib.de|LIB digital collection catalogue]], the portal of the [[https://bolgermany.de|German Barcode of Life project (GBOL)]] and data interfaces that provide APIs for machine-readable formats and access to the data using CETAF stable identifiers ([[https://id.zfmk.de|id.zfmk.de]], [[https://id.zmh-coll.de|id.zmh-coll.de]]) or [[https://id.zfmk.de/collection_GFBIO/|id.zfmk.de/collection_GFBIO]].
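
A minimal sketch of machine access to a CETAF stable identifier via HTTP content negotiation, using only the Python standard library. It assumes that the resolvers return an RDF representation for ''Accept: application/rdf+xml'' and an HTML page for ''text/html'', as the CETAF stable identifier guidelines recommend; which formats the LIB resolvers actually offer is not specified here. The function names are hypothetical and the identifier is supplied by the caller.

```python
# Sketch: HTTP content negotiation against a CETAF stable identifier.
# Assumption: the resolver serves RDF/XML for machines and HTML for humans.
import urllib.request

RDF_XML = "application/rdf+xml"
HTML = "text/html"


def build_request(stable_id: str, machine_readable: bool = True) -> urllib.request.Request:
    # The Accept header tells the resolver which representation is wanted.
    accept = RDF_XML if machine_readable else HTML
    return urllib.request.Request(stable_id, headers={"Accept": accept})


def fetch(stable_id: str, machine_readable: bool = True) -> bytes:
    # urlopen follows HTTP redirects (e.g. 303) to the chosen representation.
    req = build_request(stable_id, machine_readable)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

Only ''fetch'' performs a network request; ''build_request'' can be used on its own to inspect or reuse the prepared request.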

The published data are provided with a recommended citation, license and DOI (see above).
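
A DOI can also be used to retrieve a formatted citation through DOI content negotiation. This sketch assumes that the DOI is registered with an agency that supports content negotiation at doi.org (as DataCite and Crossref do) and that the requested citation style is available; the helper names are hypothetical.

```python
# Sketch: request a formatted citation for a dataset DOI via content negotiation.
import urllib.request

DOI_RESOLVER = "https://doi.org/"
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(value: str) -> str:
    """Strip common resolver or scheme prefixes so only the bare DOI remains."""
    v = value.strip()
    for prefix in PREFIXES:
        if v.lower().startswith(prefix):
            v = v[len(prefix):].strip()
            break
    return v


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    # Ask the resolver for a formatted bibliography entry in the given style.
    return urllib.request.Request(
        DOI_RESOLVER + normalize_doi(doi),
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi: str, style: str = "apa") -> str:
    with urllib.request.urlopen(citation_request(doi, style), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

''fetch_citation'' performs a network request. For machine-readable citation metadata, the ''Accept'' value ''application/vnd.citationstyles.csl+json'' can be requested instead of a formatted string.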