DEV | BGE - Self-Assessment

This workflow describes data publishing (also called data publication), in which research data is released in published form for reuse by others.

This policy area does not have a further breakdown into components.

How

ABS stands for Access and Benefit-Sharing. Access means requesting a permit to collect and utilise genetic resources (in-situ and ex-situ resources), and benefit-sharing means giving something back to the country where the material comes from.

How

The Convention on Biological Diversity (CBD) entered into force in December 1993. The CBD recognises that countries have sovereign rights over their natural resources, and its ‘third’ objective is the fair and equitable sharing of benefits arising from the utilisation of genetic resources, that is, sharing between the users of genetic resources (commercial and non-commercial) and the countries providing those resources.

The Nagoya Protocol, which entered into force on 12 October 2014, is a supplementary agreement to the CBD that aims to provide a clear, transparent and non-arbitrary international framework for implementing the CBD's ‘third’ objective. In other words, the Nagoya Protocol is the international framework for ABS and compliance. The Annex of the Protocol gives examples of monetary and non-monetary benefits that could be shared from the utilisation of genetic resources and/or traditional knowledge associated with genetic resources held by indigenous peoples and local communities. ABS measures may cover genetic resources acquired in situ (from their natural habitat in a country) and ex situ (held outside their natural habitat, for example stored in an institute outside the country of origin, or even bought from a shop).

How

A is for access

Each country decides whether and how it regulates access to its genetic resources.

Access is NOT regulated at EU level: each Member State has the right to decide.

Provider countries may require prior informed consent (PIC), which means asking for permission first.

The ABS National Focal Points should provide information on the country’s ABS rules and procedures.

B is for benefit-sharing

Benefit-sharing is based on mutually agreed terms (MAT) between the provider country and the user. Benefits could be monetary or non-monetary.

C is for compliance

Each Party to the Nagoya Protocol takes measures to check that genetic resources utilised within its jurisdiction have been accessed legally.

Compliance obligations are set at EU level through Regulation (EU) No 511/2014, which establishes the framework of obligations for users of genetic resources within the EU.

How

EU Regulation 511/2014 on compliance measures for users of genetic resources and associated traditional knowledge establishes rules governing compliance with ABS in accordance with the provisions of the Nagoya Protocol. The EU ABS Regulation defines “utilisation” as conducting research and development on the genetic and/or biochemical composition of genetic resources.

To determine whether you have compliance obligations under the EU ABS Regulation, these are the elements that need to be considered.

To Be Aware

EU Member States may have additional, stricter laws on access to genetic resources than those set out in the EU ABS Regulation.

How

Before checking your ABS responsibilities, it helps to understand these commonly used terms:

Prior Informed Consent (PIC)

Permission from the provider country (usually through a national authority) to access its genetic resources. This must be obtained before collecting or using the material.

Mutually Agreed Terms (MAT)

A legally binding agreement between the user and the provider country, outlining how the material may be used and how benefits will be shared (e.g. co-authorship, sharing results, royalties).

Material Transfer Agreement (MTA)

Material Transfer Agreements (MTAs) are legal documents that set the terms for transferring biological materials—such as specimens or genetic samples—between organizations. They define the rights, responsibilities, and conditions related to the material’s use, including legal, ethical, and institutional obligations.

MTAs are essential for:

Ensuring proper credit and acknowledgment in research outcomes

Protecting ownership, intellectual property, and access to benefits

Defining the nature and conditions of the transfer

Clarifying liability and responsibility in case of misuse or loss

Complying with legal, ethical, and funding requirements (e.g., BGE project standards)

Digital Sequence Information (DSI)

Refers to digital genetic data such as DNA, RNA, or protein sequences. DSI is not currently regulated under the EU ABS Regulation, but some provider countries do consider it part of their ABS requirements. Always check local rules via the ABS Clearing-House.

DECLARE

The EU’s online platform for submitting due diligence declarations under the ABS Regulation, required when conducting research on ABS regulated genetic material, especially if using EU funding.

How

When publishing or sharing genetic data derived from biological material, you must consider whether your activity is subject to ABS requirements under the Nagoya Protocol, the EU ABS Regulation (Regulation (EU) No 511/2014) or the ABS legislation of individual EU Member States. Even if you are only handling digital data, the same obligations may apply if it originates from a physical genetic resource.

Some questions to ask yourself:

Did your data originate from genetic resource material collected in a country that is party to the Nagoya Protocol with national ABS laws?

Was the material accessed after 12th October 2014?

Are you publishing or sharing research that investigates the genetic or biochemical composition of the material (e.g. sequencing, gene function, etc)?

If the answer to all of the above is yes, your data falls within the scope of ABS and you will need to demonstrate compliance.
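The three questions above can be read as a simple decision rule: the data is in scope only when all three answers are yes. The Python sketch below encodes that rule as an illustrative aid, not a legal determination. The class and field names are hypothetical, an unanswered question yields "unsure" rather than a guess (in line with the advice to check when in doubt), and stricter national rules are not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Date the Nagoya Protocol entered into force (including for the EU).
# This sketch conservatively treats access on that day as in scope.
NAGOYA_ENTRY_INTO_FORCE = date(2014, 10, 12)


@dataclass
class MaterialAnswers:
    """Hypothetical record of answers; None means 'not yet checked'."""
    provider_party_with_abs_laws: Optional[bool]
    access_date: Optional[date]
    genetic_or_biochemical_research: Optional[bool]


def abs_scope(answers: MaterialAnswers) -> str:
    """Return 'in scope', 'not in scope' or 'unsure' for the three questions."""
    accessed_after = (
        None if answers.access_date is None
        else answers.access_date >= NAGOYA_ENTRY_INTO_FORCE
    )
    results = [
        answers.provider_party_with_abs_laws,
        accessed_after,
        answers.genetic_or_biochemical_research,
    ]
    if all(r is True for r in results):
        return "in scope"
    if any(r is False for r in results):
        # At least one 'no': the all-yes condition can no longer be met.
        return "not in scope"
    return "unsure"  # some answers are missing: check before publishing
```

For example, `abs_scope(MaterialAnswers(True, date(2019, 5, 1), True))` returns `"in scope"`. A "not in scope" result only reflects these three questions; Member State laws may still impose obligations.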

To Be Aware

If you are unsure, it is still best to check and follow the ABS compliance steps.

How

Whether you collected the material yourself or received it from another institution or researcher, you will need to confirm what legal and ethical terms apply before publishing the genetic data.

If you collected the material yourself, did you obtain Prior Informed Consent (PIC) and agree on Mutually Agreed Terms (MAT) with the provider country? Do those agreements cover data sharing or publication?

If you acquired the material from another individual or institution, was it transferred under a Material Transfer Agreement (MTA)? If so, review it carefully for any restrictions or obligations (e.g. embargoes, licensing, co-authorship or benefit-sharing terms).

Use the ABS Clearing-House (https://absch.cbd.int/en/) to check the provider country’s ABS rules.

If the terms are unclear, contact the National Focal Point listed for that country.

Always check with your institution's legal or ethics office if you're unsure. These agreements form the legal basis for whether you can publish or share the data.

Also review the ABS checkpoints for terrestrial and freshwater sampling, marine sampling and sampling in ex-situ collections for further information.

How

Use trusted repositories that allow you to include information about permits, origin, and usage terms (e.g. Zenodo, ENA, GBIF, GGBN).
Clearly document the provider country, collection permits and any conditions in your metadata and publications.
Even if data is open, you must respect the original agreements.
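Before uploading, it can help to check that each record actually carries the provenance information described above. The sketch below assumes records are plain Python dictionaries and uses hypothetical field names (`provider_country`, `collection_permits`, `usage_conditions`); in practice, map them to the fields your chosen repository or metadata standard provides.

```python
from typing import Any

# Hypothetical field names for illustration only; map them to the terms
# your repository or metadata standard actually uses.
REQUIRED_PROVENANCE_FIELDS = ("provider_country", "collection_permits", "usage_conditions")


def missing_provenance(record: dict[str, Any]) -> list[str]:
    """List the provenance fields that are absent or empty in a record."""
    return [name for name in REQUIRED_PROVENANCE_FIELDS if not record.get(name)]


record = {
    "provider_country": "Kenya",
    "collection_permits": [],  # no permit recorded yet
}
print(missing_provenance(record))  # ['collection_permits', 'usage_conditions']
```

An empty result means every required field has a value; it does not check that the values themselves are correct.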

How

DECLARE is a web-based application that requires a login. By the time you reach this step you have done most of the work, as the due diligence declaration itself is straightforward: it consists of five questions and simply asks you to enter information you have already gathered (permits, permissions, etc.), so it takes about five minutes. Store all documents safely so that, if genetic material is transferred to a third party, it is accompanied by these documents. Before any third-party transfer, however, you must make sure the transfer is allowed under the conditions of the MAT. Even when a transfer is allowed, you must keep a paper trail of it. Finally, if you receive material from an external party, always ask where the material was obtained.
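The transfer rules described above (check the MAT first, send the documents with the material, keep a record) can be sketched as a small Python helper. Everything here is hypothetical: the document types, the `mat_allows_third_party_transfer` flag and the log format are illustrative assumptions, not part of DECLARE or any official tool.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical document types that must travel with the material.
REQUIRED_DOCUMENTS = {"PIC", "MAT"}


@dataclass
class MaterialFile:
    sample_id: str
    documents: dict[str, str]  # document type -> where the stored copy lives
    mat_allows_third_party_transfer: bool
    transfer_log: list[dict] = field(default_factory=list)  # the paper trail


def record_transfer(material: MaterialFile, recipient: str, on: date) -> dict:
    """Check the MAT and the documents, then log the transfer."""
    if not material.mat_allows_third_party_transfer:
        raise PermissionError(f"MAT for {material.sample_id} does not allow third-party transfer")
    missing = REQUIRED_DOCUMENTS - set(material.documents)
    if missing:
        raise ValueError(f"Missing documents for {material.sample_id}: {sorted(missing)}")
    entry = {
        "sample_id": material.sample_id,
        "recipient": recipient,
        "date": on.isoformat(),
        "documents_sent": sorted(material.documents),
    }
    material.transfer_log.append(entry)
    return entry
```

Raising an error, rather than returning a flag, makes it harder to skip the MAT check by accident; the log entries form the paper trail.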

How

Below are the references used for writing the checkpoints; they are good starting points. Please also see the resource hub for more resources.

Convention on Biological Diversity (n.d.). ABS Factsheet: Frequently Asked Questions. Available at: https://www.cbd.int/doc/programmes/abs/factsheets/abs-factsheet-faqs-en.pdf
Consortium of European Taxonomic Facilities (CETAF) (2019). Code of Conduct & Best Practice for Access and Benefit‑Sharing [Leaflet]. Available at: https://www.cetaf.org/wp-content/uploads/
Consortium of European Taxonomic Facilities (CETAF) (2019). CETAF Code of Conduct and Best Practices. Available at: https://cetaf.org/wp-content/uploads/CETAF_Code-of-Conduct-and-Best-Practices_UK-final-version.pdf
de Mestier, A.H., et al. (2023). Policies Handbook on Using Molecular Collections. Research Ideas and Outcomes, 9: e102908. Available at: https://riojournal.com/article/102908/
European Parliament and Council of the European Union (2014). Regulation (EU) No 511/2014 of 16 April 2014 on compliance measures for users under the Nagoya Protocol (text with EEA relevance) [Regulation]. Official Journal of the European Union L 150, pp. 59–71. Available at: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32014R0511 (Accessed: 28 August 2025).
European Commission (2014). Nagoya Protocol: Questions and Answers [Press Release]. Available at: https://ec.europa.eu/commission/presscorner/detail/en/memo_14_411
European Commission (2021). Guidance document on the scope of application and core obligations of Regulation (EU) No 511/2014 of the European Parliament and of the Council on the compliance measures for users from the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilisation in the Union (2021/C 13/01). Official Journal of the European Union C 13, pp. 1–22. Available at: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv%3AOJ.C_.2021.013.01.0001.01.ENG&toc=OJ%3AC%3A2021%3A013%3ATOC
Nagoya Protocol Compliance Self-Assessment Tool (n.d.). Self-Assessment Tool for ABS Compliance in Organizations. Available at: https://nagoyaprotocol.myspecies.info/content/self-assessment-tool-abs-compliance-organizations
Nagoya Protocol Hub (n.d.). Homepage. Available at: https://nagoyaprotocol-hub.de/
Rochette, N.C., et al. (2023). Global survey of biodiversity access and benefit sharing awareness and capacity. bioRxiv. Available at: https://www.biorxiv.org/content/10.1101/2023.06.28.546652v4
Sett, S. & Neumann, D. (2023). Biodiversity Genomics workshop on ABS Training Materials: Training Videos (1 & 2) and Presentations [Video 1, Video 2 and presentation slides]. BGE Project.
Secretariat of the Convention on Biological Diversity (2010). Introduction to Access and Benefit‑Sharing (ABS) [Brochure]. Available at: https://www.cbd.int/abs/infokit/brochure-en.pdf (Accessed: 28 August 2025).

How

Before conducting your field sampling, check whether you need to comply with the Nagoya Protocol/ABS regulations. Non-compliance could lead to penalties and to being unable to publish or use your research results. Take particular care when sampling outside the EU, as regulations may differ.

To Be Aware

It is strongly advised that you also look at the ABS guidance for terrestrial and freshwater sampling, marine sampling, or sampling in ex-situ collections (depending on how you obtained the genetic resources) to determine whether and how you need to comply with ABS.

The FAIR principles provide guidelines to enhance the Findability, Accessibility, Interoperability, and Reuse of digital assets, stressing machine-actionability to manage increasing data volume, complexity, and speed.

How

The FAIR data principles are a set of best-practice guidelines for research data. They were first published in 2016 (Wilkinson et al. 2016) by a group of international researchers to address the growing need for data that is not just openly available, but also structured and described so that it is machine-readable and interoperable. They also aim to improve reuse and to support more consistent interpretation by people.

FAIR stands for:

Findable: Data should be easy to find for both humans and computers. Metadata and data should be assigned a globally unique and persistent identifier.

Accessible: Once found, data should be retrievable using standardised protocols and, where necessary, with clear authentication and authorisation procedures.

Interoperable: Data should use a formal, accessible, shared, and broadly applicable language for knowledge representation.

Reusable: Data should be richly described with accurate and relevant attributes, released with a clear usage licence, and meet relevant community standards.

While the FAIR principles are not legally mandatory, they are widely endorsed by the EU and by the genomics, natural history, biodiversity and taxonomic communities and research infrastructures (ERGA, BIOSCAN, ELIXIR, EOSC, CETAF, DiSSCo, BGE, etc.). These principles help ensure that research data remains discoverable, shareable and valuable long after a project ends.

FAIR data does not have to be fully open; full openness is not always possible because sensitive data needs to be protected.

How

Biodiversity genomics and taxonomy often rely on international networks, shared databases, and multi-site research projects (e.g. ERGA, BIOSCAN, GBIF). The FAIR principles ensure that data can be accessed, interpreted, and reused across different countries and institutions without having to “reinvent the wheel” each time.

By making data findable and linked through persistent identifiers (such as specimen catalogue numbers and GenBank or ENA accessions), researchers support reproducibility, validation, and future use, particularly in areas like DNA barcoding, voucher-based research, and species identification.

Well-documented, reusable data is also critical for real-world applications. It helps assess the conservation status of species, monitor invasive species or pathogens, and guide ecological restoration. Without FAIR data, these efforts become fragmented, inconsistent or even impossible.

How

Need to write

The CARE Principles for Indigenous Data Governance are people and purpose-oriented, reflecting the crucial role of data in advancing Indigenous innovation and self-determination.

How

The CARE principles are a set of guidelines for Indigenous data governance; more specifically, they are about ensuring that data use respects people and communities, especially Indigenous Peoples. They were developed in 2019 by the Global Indigenous Data Alliance (GIDA) in response to the limitations of FAIR in addressing Indigenous rights, ethics and sovereignty in data practices.

CARE stands for:

Collective benefit: Data involving Indigenous Peoples should support their communities, helping them thrive and pursue their goals (socially, culturally and economically). Data should contribute to shared well-being, not just individual or institutional research interests.

Authority to control: Indigenous people have the right to say how data about them, including their lands, culture, languages, and biological knowledge is collected, used and shared. Their authority over this information must be respected at all stages.

Responsibility: Those using Indigenous data should engage respectfully and meaningfully with communities, making sure their work doesn't cause harm. Responsibility means being transparent, accountable, and working in a way that honours cultural values and priorities.

Ethics: Ethical research with Indigenous communities goes beyond legal compliance. It means listening to and respecting indigenous worldviews, protocols and rights, and making sure that research and data practices are built on trust, fairness and care.

Resources

https://www.rd-alliance.org/wp-content/uploads/2024/04/CARE%20Principles%20for%20Indigenous%20Data%20Governance_OnePagers_FINAL%20Sept%2006%202019.pdf

https://datascience.codata.org/articles/10.5334/dsj-2020-043

How

Before you begin, check whether your sampling, sequencing, or data publication involves:

Specimens collected from Indigenous territories

Traditional knowledge about species or ecosystems

Collaborations with indigenous communities or citizen scientists

If so, the CARE principles apply.

How

If Indigenous Peoples may be connected to your research, material or data, engage with them before data collection or publication. Seek consent, understand community priorities, and respect cultural and knowledge protocols. Communities should be involved in planning the research and in sharing its benefits, and should receive appropriate credit and acknowledgment.

How

Indigenous communities have the right to control how data about their heritage, lands or knowledge is used and shared. This may include requesting data restrictions, requiring review before publication, or declining certain uses altogether.

How

Consider how your research can support Indigenous goals, whether through co-authorship, shared data ownership, funding, training, or making findings accessible and useful to communities. This is both a legal requirement (e.g. under Access and Benefit-Sharing rules) and an ethical one.

How

Follow your institution's ethics policies, but also check if there are community research codes, indigenous data governance protocols, or national guidelines (e.g. GIDA, local agreements). These will help guide your work.

How

That means depositing it in a community-endorsed genomic repository that allows open access, rather than in a private database. Use community-endorsed genomic metadata standards to make sure the data is understandable and reusable and that permits and permissions can be attached.

Pick a trusted data repository that suits your data type, for example:

ENA, GenBank, BOLD, GGBN for genomic sequences

GBIF for species occurrence records

Dryad, Zenodo or your institution's data repository for more general datasets.

Also see the guidance on the FAIR and CARE principles for data governance and standards.

How

Share your dataset in a machine-readable format that software can process, such as CSV or JSON, or another open format that does not require specialist or proprietary software (e.g. software that needs a paid licence).

Use community-endorsed genomic metadata standards to make sure the data is understandable and reusable and that permits and permissions can be attached.
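As an illustration of machine-readable output, the sketch below writes the same records to CSV and JSON using only the Python standard library. The column names are Darwin Core terms; the record itself is made up for the example, and a real dataset would follow whichever standard its target repository requires.

```python
import csv
import io
import json

# Darwin Core term names used as column headers.
FIELDS = ["occurrenceID", "scientificName", "country", "eventDate",
          "decimalLatitude", "decimalLongitude"]

# Made-up example record, for illustration only.
records = [
    {"occurrenceID": "example:occ:1", "scientificName": "Bombus terrestris",
     "country": "Belgium", "eventDate": "2023-06-14",
     "decimalLatitude": 50.85, "decimalLongitude": 4.35},
]


def to_csv(rows: list[dict]) -> str:
    """Serialise records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[dict]) -> str:
    """Serialise records as human-readable JSON, keeping non-ASCII characters."""
    return json.dumps(rows, indent=2, ensure_ascii=False)
```

To write to disk instead, open the file with `open(path, "w", newline="", encoding="utf-8")` and pass it to `csv.DictWriter` in place of the in-memory buffer.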

The EU Open Data Directive promotes free access and reuse of publicly funded data, including research, to support transparency, innovation, and interoperability across Europe.

How

The Directive on open data and the re-use of public sector information (Directive (EU) 2019/1024) is an EU law that requires publicly funded data, including research data, to be made as openly available as possible. The idea is that data collected using public money should be reusable by others, whether for research, policy or even commercial use, unless there is good reason to restrict it (e.g. location data on protected species).

For biodiversity genomics researchers, this means that data you generate through publicly funded projects should be published in a way that allows others to access and reuse it. It does not apply to academic papers, only to the data behind them.

Resources add - EU open science framework.

How

This directive only applies to data that is:

Fully publicly funded or partly publicly funded

Already made publicly available by you or your team (e.g. in a repository)

Not protected by specific restrictions (e.g. CITES, endangered species, personal data regulations, national security, etc.).

So, if you are working on a biodiversity genomics project funded by a government agency in an EU Member State or by an EU programme, and you plan to share your data in a public archive, this directive applies to you.

How

The directive expects you to allow others to reuse your data, as long as it is safe and lawful to do so. The easiest way to do this is to attach an open licence.

The most common choices are:

CC0 (Creative Commons Zero): (add source) waives all rights; people can use the data however they want.

CC-BY (Attribution): (add source) lets others re-use your data as long as they credit you.

Most repositories let you choose the licence when you upload the dataset. If you are unsure, check what your funder or institution recommends, or ask your data manager.

How

If your dataset includes information that could put endangered species at risk, such as the precise locations of nesting sites or breeding areas, you are allowed to restrict that information.

You can do this by:

Generalising or masking location data (e.g. reporting at the 10km level instead of exact coordinates),

Putting sensitive data in a restricted-access part of a trusted repository,

Explaining in the metadata why certain information has been withheld.
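One way to generalise coordinates, sketched below in Python, is to snap each point to the centre of a fixed grid cell and record what was done using the Darwin Core terms `coordinateUncertaintyInMeters` and `dataGeneralizations`. This sketch rests on assumptions: it uses a 0.1-degree cell (about 11 km north to south, and narrower east to west away from the equator) as a rough stand-in for the 10 km level, and a flat-Earth estimate of the uncertainty; a projected equal-area grid would give true 10 km cells.

```python
import math

METRES_PER_DEGREE_LAT = 111_320  # approximate length of one degree of latitude


def generalise_point(lat: float, lon: float, step_deg: float = 0.1) -> dict:
    """Replace a point with the centre of its step_deg x step_deg grid cell."""

    def snap(value: float) -> float:
        # round() guards against float error such as 0.3 / 0.1 == 2.9999999999999996
        cell = math.floor(round(value / step_deg, 9))
        return round(cell * step_deg + step_deg / 2, 6)

    g_lat, g_lon = snap(lat), snap(lon)
    # The cell is widest at its edge nearest the equator.
    widest_lat = min(abs(g_lat - step_deg / 2), abs(g_lat + step_deg / 2))
    height_m = step_deg * METRES_PER_DEGREE_LAT
    width_m = height_m * math.cos(math.radians(widest_lat))
    # Half the cell diagonal bounds the distance from the true point to the centre.
    uncertainty_m = math.ceil(math.hypot(height_m, width_m) / 2)
    return {
        "decimalLatitude": g_lat,
        "decimalLongitude": g_lon,
        "coordinateUncertaintyInMeters": uncertainty_m,
        "dataGeneralizations": f"Coordinates generalised to the centre of a {step_deg}-degree grid cell",
    }
```

The published record would then carry these generalised values in place of the exact ones; Darwin Core also has an `informationWithheld` term for stating what was removed and why.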

See IUCN mapping standards

The directive allows for this. In fact, it follows the principle of “as open as possible, as closed as necessary” which gives you flexibility to protect the species and habitats you work on.

How

Check what your funder and institution require. Many already have open data policies in place that reflect this directive, or go even further. Following those policies will almost always mean you’re also complying with the directive.

If you’re unsure, contact your research office or data manager. They can often help with things like choosing a repository, setting licences, and writing data management plans.

Repositories recognised by the scientific community or certified by standards bodies ensure long-term preservation, accessibility, and integrity of data or specimens through best practices in curation, metadata, and legal compliance.

How

A trusted data repository meets recognised standards for long-term preservation, accessibility, interoperability and data quality; uses recognised metadata standards; and has either been evaluated and certified (e.g. with the CoreTrustSeal) or been endorsed by the genomics research community as reliable for storing, managing and sharing specific types of data.

It is recommended that genomic researchers and collection managers deposit specimen data, sequence data, DNA barcodes and occurrence data in appropriate trusted public repositories, because this ensures that data is:

Preserved long-term, because trusted repositories provide ongoing curation and back-ups and format migration so that data isn’t lost when a project ends or staff change.

Accessible to different stakeholders, which allows other researchers, conservationists, and policymakers to discover and use the data.

Interoperable and standardised: trusted repositories apply community metadata standards (e.g. Darwin Core, MIxS), making it possible to integrate your data with other datasets globally, and to enrich and compare it.

Citable and traceable: trusted genomic repositories usually issue persistent identifiers for datasets, so your data can be easily referenced, linked to publications, and credited to you and your institution.

Compliant with funder and journal requirements: many funders, including the EU’s Horizon Europe, and scientific journals mandate deposition in recognised repositories for transparency and reproducibility.

Aligned with the FAIR principles, which strengthens scientific impact.

How

Repository	Purpose/use	Metadata standards	Requirements/restrictions
Barcode of Life Data System (BOLD)	Specialised database for DNA barcoding data. Stores specimen, sequence, and metadata, assigning BINs (Barcode Index Numbers) to clusters of related sequences.	BOLD-specific metadata schema aligned with Darwin Core and MIxS for sequence-associated data.	Requires user registration. Sequence data must be linked to voucher specimens. Metadata must meet BOLD standards including taxonomy, collection data, and georeferencing.
European Nucleotide Archive	Comprehensive repository for nucleotide sequences, raw reads, assemblies, and related metadata. Issues accession numbers for data traceability.	MIxS (Minimum Information about any (x) Sequence), INSDC feature tables, and BioSample metadata	Data submitters must register and prepare data using ENA submission tools. Must provide MIxS-compliant metadata for sequence data.
Global Genome Biodiversity Network	Portal linking biodiversity biobanks worldwide, providing persistent identifiers for physical specimens and standardised metadata.	GGBN Data Standard (extension of Darwin Core and ABCD schema).	Institutions must be registered GGBN members to deposit data. Metadata must comply with GGBN Data Standard; some data can be restricted
Global Biodiversity Information Facility (GBIF)	Global platform for biodiversity occurrence data. Aggregates specimen and observation records using Darwin Core standards.	Darwin Core (DwC) and EML (Ecological Metadata Language).	Data must be licensed (e.g., CC0, CC BY). Only occurrence/specimen data accepted; no sequence storage. Data providers control update cycles.
International Nucleotide Sequence Database Collaboration (INSDC)	Partnership between ENA, GenBank, and DDBJ to share nucleotide sequence data globally. Ensures synchronisation and interoperability.	MIxS, INSDC feature table	Submissions via partner repositories (ENA, GenBank, DDBJ) only. Metadata and sequence formats must meet INSDC specifications.
Collaborative Open Plant Omics	Platform for managing, annotating, and publishing plant omics datasets, facilitating submission to public repositories.	MIxS, ISA-Tab for experimental	Platform for managing, annotating, and publishing plant omics datasets, facilitating submission to public repositories.
Genomes on a Tree (GoAT)	Phylogenomic resource providing visualisation of genome relationships across species, based on available genomic data.	Phylogenetic metadata standards, linked to INSDC accessions	Data sources vary; quality depends on underlying repositories. Primarily for visualisation, not raw data deposition.
European Molecular Biology Laboratory (EMBL)	Major European life sciences organisation hosting key repositories such as ENA, providing data storage, analysis tools, and training.	Dependent on sub-repository (e.g., ENA uses MIxS; ArrayExpress uses MAGE-TAB).	Access policies vary by repository/service within EMBL. ENA follows INSDC rules; other EMBL databases have domain-specific requirements.

Agreed-upon specifications for the structure, format, and content of descriptive information (metadata) that ensure data are consistent, interoperable, and reusable across systems and disciplines.

How

Metadata standards are agreed sets of rules, definitions and formats that ensure consistency across datasets and institutions. A metadata standard often specifies a schema, but also includes guidelines for controlled vocabularies, syntax and usage.

Using community-endorsed metadata standards is important for:

Scientific reproducibility: standards ensure that essential information about a sequence, specimen or dataset is captured consistently, so that other researchers can replicate or validate results.

Linking data across domains: genomic data often lives in sequence databases, while specimen and ecological data live in biodiversity repositories. Standards act as a “common language” so these systems can link a sequence to its specimen, habitat and project.

Interoperability across infrastructures: standards let different repositories, projects and organisations share data without custom reformatting each time.

Compliance and legal requirements: demonstrating compliance with the Nagoya Protocol on Access and Benefit-Sharing depends on recording certain metadata, such as country of origin, permit numbers and dates.

Standards ensure this legal metadata is consistently included, which is essential for international sharing of genomic data.

FAIR principles: community-endorsed formal standards work together to make genomic data Findable, Accessible, Interoperable and Reusable (FAIR). Without them, genomic data would be fragmented, hard to verify, and often unusable outside the original project.

Long-term data preservation: genomic projects are long-term investments. Without standardised metadata, future researchers may find a sequence useless because its origin or methods are unclear.

How

When starting a new biodiversity genomics project, plan your metadata strategy from the very beginning. First, check what data will be created or used, your institution’s policies, and any requirements from your funders or the data repositories you intend to use; these will often determine which standards you must follow.

Name	Purpose	Owner	Repositories	Legal/ethical
Darwin Core (DwC)	Global biodiversity data standard for occurrence data, taxonomy, collection events and specimens.	TDWG	GBIF, many natural history museums, DiSSCo Research Infrastructure, GGBN, BOLD, IBOL	Basic: Country, rights, permit info via extensions
Access to Biological Collection Data (ABCD)	More detailed alternative to DwC for natural history collections/specimens	TDWG & Botanic Garden and Botanical Museum Berlin (BGBM)	GBIF, many natural history museums, DiSSCo Research Infrastructure, GGBN	Strong: Explicit permit & authorization fields
DwC Extension: DNA/DwC	Added fields to DwC for DNA extraction, sequencing and genetic resource IDs	GBIF in collaboration with TDWG	GBIF, BOLD, GGBN	Moderate: Adds fields for permits, MTAs, sequence provenance; complements core DwC ABS coverage
Minimum Information about any (x) Sequence (MIxS)	A set of standards for genomic, metagenomic and marker-gene sequences.	Genomic Standards Consortium (GSC)	INSDC members (GenBank, ENA, DDBJ), GOLD (Genomes OnLine Database), MGnify (EMBL-EBI metagenomics platform), UNITE, EU projects like Biodiversity Genomics Europe (BGE)	Basic: Country + custom fields for ABS
MIxS - Environment Packages	Links specimen data with environmental context	Genomic Standards Consortium (GSC)	GOLD, MGnify, ENA, large ecological sequencing projects (e.g. Earth Microbiome Project)	Basic: Same as MIxS core; no extra ABS-specific fields, but inherits country/location attributes
MIGS/MIMS/MIMARKS	Specialisation of MIxS for specific sequence types: Genome (MIGS), Metagenome (MIMS), Marker-gene (MIMARKS).	Genomic Standards Consortium (GSC)	NCBI BioSample, ENA, BOLD	Basic: Use of geo_loc_name, lat_lon, and voucher/specimen linkage; no dedicated ABS terms.
INSDC Submission Standards		INSDC International Nucleotide Sequence Database Collaboration (GenBank [NCBI, USA], ENA [EMBL-EBI, UK], DDBJ [Japan])	GenBank, ENA (EMBL-EBI), DDBJ	Basic: Require country, isolation_source, voucher specimen link; optional free-text notes for permits/licences.
BioSample (NCBI/INSDC)	Metadata model for describing biological samples in public sequence databases.	INSDC International Nucleotide Sequence Database Collaboration (GenBank [NCBI, USA], ENA [EMBL-EBI, UK], DDBJ [Japan])	NCBI BioSample database (for all sequence-linked samples), ENA Sample records (compatible schema), DDBJ BioSample Large projects: iBOL, GGBN genome submissions, Darwin Tree of Life	Basic: geo_loc_name, isolation_source, plus free-text or custom attributes for permit/MTA numbers.
ENA checklist	European Nucleotide Archive-specific metadata	EMBL-EBI – European Bioinformatics Institute (UK)	EMBL-EBI / ENA for all submissions, MGnify (uses ENA checklist in submission pipeline), European sequencing centers (e.g., Wellcome Sanger Institute) when submitting	Moderate: Requires country; some checklists have collection_permit or project permit fields; free-text license notes possible
GGBN Darwin Core/ABCD Extensions	Extends DwC/ABCD with genomics-specific fields	Global Genome Biodiversity Network (GGBN) in cooperation with TDWG	Global Genome Biodiversity Network (GGBN), GBIF.	Strong: Detailed legal provenance: collecting permit IDs, ABS agreement status, MTAs, country of origin, restrictions.
BOLD metadata specification	Not a formal open standard like Darwin Core or ABCD; it is a custom schema that combines specimen, taxonomic and sequence data fields tailored for DNA barcoding. BOLD maps its fields to Darwin Core and other standards when sharing with external systems such as GBIF or GenBank.	Centre for Biodiversity Genomics (CBG)	BOLD Systems, iBOL, iBOL EU	Moderate: Country of collection, legal/permit info for projects; provider institution; mapped to DwC when exporting to GBIF
Dublin Core	Occasionally used in data repositories for describing datasets	Dublin Core Metadata Initiative (DCMI)	General repositories: Zenodo, Dryad, Figshare (dataset-level metadata), GBIF	Basic: rights, rightsHolder, license fields can describe ABS restrictions at dataset level; not specimen/sequencing specific.
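As a rough illustration of the ABS coverage column above, the Python sketch below checks a sample's metadata, held as a plain dictionary, for the shared location attributes geo_loc_name and lat_lon and for a permit/MTA note. It is a sketch under stated assumptions, not a validator for any of these standards: the attribute name abs_permit_note is hypothetical and stands in for the free-text or custom attributes that BioSample and similar schemas allow.

```python
from typing import Dict, List

# Location attributes shared by MIxS-based and INSDC sample metadata.
REQUIRED_LOCATION_FIELDS = ("geo_loc_name", "lat_lon")
# HYPOTHETICAL custom/free-text attribute name; not a term defined by
# MIxS, BioSample or ENA.
ABS_NOTE_FIELD = "abs_permit_note"


def abs_metadata_gaps(sample: Dict[str, str]) -> List[str]:
    """Return human-readable gaps in ABS-relevant sample metadata."""
    gaps = []
    for field in REQUIRED_LOCATION_FIELDS:
        # Treat missing keys and blank values the same way.
        if not sample.get(field, "").strip():
            gaps.append(f"missing {field}")
    if not sample.get(ABS_NOTE_FIELD, "").strip():
        gaps.append("no permit/MTA reference recorded")
    return gaps


if __name__ == "__main__":
    sample = {"geo_loc_name": "Norway: Svalbard", "lat_lon": "78.22 N 15.65 E"}
    print(abs_metadata_gaps(sample))  # ['no permit/MTA reference recorded']
```

A gap list rather than a pass/fail result is used so that a submitter can see exactly which provenance details still need to be added before submission.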

Refers to the use of stable, globally recognised identifiers (e.g. DOIs, accession numbers, specimen IDs) to track and link genomic data, specimens, and related documentation, ensuring long-term traceability, citation, and data integration.

How

Need to write

How

An International Nucleotide Sequence Database Collaboration (INSDC) accession number is a unique, permanent identifier that is automatically assigned to a sequence record when it is submitted to the INSDC. The INSDC is a global partnership between ENA (Europe), GenBank (USA) and DDBJ (Japan) that provides free, permanent archiving of nucleotide sequence data. Researchers must submit sequence data to one of these partner databases to receive an accession number, which serves as a persistent identifier for the sequence. This number should be cited in publications, linked to specimen metadata (e.g. voucher ID, GUID) and shared in repositories to ensure findability, traceability, and compliance with community standards.
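Accession numbers follow fixed patterns, so a simple format check can catch typos before an accession is cited or linked to specimen metadata. The sketch below assumes only the classic nucleotide record formats (one letter plus five digits, or two letters plus six or eight digits, with an optional .version suffix). WGS, TSA, protein and other accession types use different patterns and are deliberately not covered, so a rejection here means "check manually" rather than "invalid".

```python
import re
from typing import Optional, Tuple

# Simplified pattern: classic INSDC nucleotide record accessions only
# (1 letter + 5 digits, 2 letters + 6 digits, or 2 letters + 8 digits),
# optionally followed by ".<version>". WGS, TSA, protein and other
# accession types are NOT covered.
ACCESSION_RE = re.compile(r"^([A-Z][0-9]{5}|[A-Z]{2}[0-9]{6}|[A-Z]{2}[0-9]{8})(?:\.([0-9]+))?$")


def parse_accession(text: str) -> Tuple[str, Optional[int]]:
    """Split an accession such as 'MN908947.3' into ('MN908947', 3).

    Raises ValueError if the string does not match the simplified pattern.
    """
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised nucleotide accession: {text!r}")
    base, version = match.groups()
    return base, int(version) if version else None


if __name__ == "__main__":
    print(parse_accession("MN908947.3"))  # ('MN908947', 3)
    print(parse_accession("U12345"))      # ('U12345', None)
```

Keeping the version separate from the base accession is useful because a citation should normally name the exact version that was analysed.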

How

BioSample IDs and BioProject IDs are both identifiers issued by the INSDC partner databases (ENA, GenBank, DDBJ) to help organise and link related genomic data.

BioSample ID: Identifies the biological sample from which sequence data were derived. It contains metadata such as organism name, collection location, collection date and links to the relevant specimen or culture collection record. This ensures that any sequence submitted from that sample can be traced back to its origin.

BioProject ID: Identifies the overall research project or initiative under which the sequencing was carried out. It can group multiple BioSamples, sequencing runs and data types under one umbrella.

Both BioSample IDs and BioProject IDs are assigned automatically by the INSDC databases when a researcher submits the required metadata.
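Because each INSDC partner issues BioSample and BioProject IDs with its own prefix, the prefix alone indicates the kind of identifier and the issuing database. The sketch below assumes the commonly seen prefixes SAMN/SAMEA/SAMD (BioSample) and PRJNA/PRJEB/PRJDB (BioProject); other, older or special-purpose prefixes exist and simply return None here. The example IDs are made up.

```python
import re
from typing import Optional, Tuple

# Assumed subset of prefixes; other (older or special-purpose) prefixes exist.
PREFIXES = {
    "SAMN": ("BioSample", "NCBI"),
    "SAMEA": ("BioSample", "EMBL-EBI/ENA"),
    "SAMD": ("BioSample", "DDBJ"),
    "PRJNA": ("BioProject", "NCBI"),
    "PRJEB": ("BioProject", "EMBL-EBI/ENA"),
    "PRJDB": ("BioProject", "DDBJ"),
}
# Letters followed by digits, e.g. 'PRJEB12345'.
ID_RE = re.compile(r"^([A-Z]+)([0-9]+)$")


def classify_id(identifier: str) -> Optional[Tuple[str, str]]:
    """Return (identifier kind, issuing database), or None if unrecognised."""
    match = ID_RE.match(identifier.strip())
    if match is None:
        return None
    return PREFIXES.get(match.group(1))


if __name__ == "__main__":
    # Made-up example IDs.
    print(classify_id("PRJEB12345"))    # ('BioProject', 'EMBL-EBI/ENA')
    print(classify_id("SAMN00000001"))  # ('BioSample', 'NCBI')
```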

How

When researchers submit DNA barcode data to the Barcode of Life Data System (BOLD), the system automatically assigns several types of identifiers to manage, track and cite barcode data. These identifiers link specimens, sequences and datasets, ensuring they remain findable, traceable and reusable. The main types are:

Process ID: A unique code for each specimen record uploaded to BOLD, linking its metadata, images, and sequences.

BIN (Barcode Index Number): An algorithmically assigned cluster ID that groups similar barcode sequences, often serving as a species proxy when taxonomic names are uncertain.

Dataset DOI: A permanent, citable Digital Object Identifier (DOI) assigned to an entire dataset when it is made public in BOLD.

These IDs work together: a Process ID identifies the individual record, a BIN groups related sequences, and a Dataset DOI cites the entire published dataset. Together, they ensure traceability from the individual specimen to its place in the broader biodiversity data landscape.
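The relationship between the three BOLD identifiers can be modelled as a small data structure. In the sketch below, BarcodeRecord, records_per_bin and trace are hypothetical names, and the Process IDs, BINs and DOI are made-up placeholders rather than real BOLD records; it only shows how a Process ID leads to its BIN and to the citable dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BarcodeRecord:
    process_id: str   # one specimen record in BOLD
    bin_uri: str      # BIN cluster the barcode was assigned to
    dataset_doi: str  # DOI of the published dataset containing the record


def records_per_bin(records: List[BarcodeRecord]) -> Dict[str, List[str]]:
    """Group Process IDs by BIN, e.g. to see which specimens share a cluster."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        groups[record.bin_uri].append(record.process_id)
    return dict(groups)


def trace(records: List[BarcodeRecord], process_id: str) -> Optional[Tuple[str, str]]:
    """Return (BIN, dataset DOI) for one Process ID, or None if it is unknown."""
    for record in records:
        if record.process_id == process_id:
            return record.bin_uri, record.dataset_doi
    return None


if __name__ == "__main__":
    # Placeholder values only; these are not real BOLD records.
    doi = "10.XXXX/placeholder-dataset"
    records = [
        BarcodeRecord("TEST001-24", "BOLD:AAA0001", doi),
        BarcodeRecord("TEST002-24", "BOLD:AAA0001", doi),
        BarcodeRecord("TEST003-24", "BOLD:AAB0002", doi),
    ]
    print(records_per_bin(records))
    print(trace(records, "TEST003-24"))  # ('BOLD:AAB0002', '10.XXXX/placeholder-dataset')
```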

How

Specimen catalogue numbers refer to the physical specimen and are the primary identifier given by the holding institution (e.g. a museum, herbarium or biobank). They are usually assigned in the institution's collection management system and typically combine the institution ID, the catalogue number and the specimen type. It is best practice for these IDs to be globally unique and machine-resolvable.
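To make the idea of a composite catalogue number concrete, the sketch below joins an institution ID, catalogue number and specimen type into one local identifier, and builds an HTTP URI from the same parts so that it can be made machine-resolvable. The colon separator, the path layout, the institution code NHMX and the base URL https://specimens.example.org are all assumptions for illustration; each institution defines its own scheme in its collection management system.

```python
from urllib.parse import quote


def catalogue_id(institution: str, catalogue_number: str, specimen_type: str) -> str:
    """Join the three parts into one local identifier, e.g. 'NHMX:12345:skin'.

    The ':' separator is an assumption; parts may not contain it, so the
    identifier can be split back into its parts unambiguously.
    """
    parts = (institution, catalogue_number, specimen_type)
    if any(not part or ":" in part for part in parts):
        raise ValueError("parts must be non-empty and must not contain ':'")
    return ":".join(parts)


def specimen_uri(base_url: str, institution: str, catalogue_number: str, specimen_type: str) -> str:
    """Build a URI with one percent-encoded path segment per part."""
    segments = (quote(part, safe="") for part in (institution, catalogue_number, specimen_type))
    return base_url.rstrip("/") + "/" + "/".join(segments)


if __name__ == "__main__":
    # 'NHMX' is a made-up institution code; example.org is a reserved example domain.
    print(catalogue_id("NHMX", "12345", "skin"))  # NHMX:12345:skin
    print(specimen_uri("https://specimens.example.org", "NHMX", "12345", "skin"))
```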

Collection managers can obtain globally unique identifiers from related communities. The table below lists some of these specimen identifiers:

Identifier type	Purpose	Where to get it	Restrictions/requirements
CETAF Specimen ID	Globally unique, resolvable URI for physical specimens	Minted by institutions via their collection management system	Must have structured specimen metadata describing the physical voucher specimen.
GGBN GUID	Persistent identifier for genomic samples and linked specimens in the Global Genome Biodiversity Network.	Minted by GGBN when the physical specimen dataset is submitted to the GGBN repository	Must be a GGBN institutional member; data must meet GGBN metadata standards; can be linked to GBIF and INSDC.
GBIF Occurrence ID	Globally unique identifier for occurrence or specimen records in GBIF.	Assigned via the GBIF Integrated Publishing Toolkit (IPT) when an institution publishes a dataset.	Requires dataset registration with GBIF; occurrence data must meet Darwin Core standards.

These physical specimen IDs can be included in the metadata of records held in different community-trusted repositories, allowing those records to be cross-referenced.
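A minimal sketch of this cross-referencing, assuming sequence records are available as dictionaries with an accession and a specimen_id field (both hypothetical field names, not a repository schema): it reports the accessions whose specimen reference is missing or is not an absolute http(s) URI, the form that resolvable specimen IDs such as CETAF IDs take. It checks syntax only and does not test whether the URI actually resolves.

```python
from typing import Dict, List
from urllib.parse import urlparse


def is_http_uri(identifier: str) -> bool:
    """True if the string is an absolute http(s) URI with a host and a path."""
    parts = urlparse(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc) and parts.path not in ("", "/")


def records_needing_links(sequence_records: List[Dict[str, str]]) -> List[str]:
    """Return accessions whose hypothetical 'specimen_id' is missing or not an http(s) URI."""
    return [
        record["accession"]
        for record in sequence_records
        if not is_http_uri(record.get("specimen_id") or "")
    ]


if __name__ == "__main__":
    # Placeholder records; accession values are made up.
    records = [
        {"accession": "EXAMPLE1", "specimen_id": "https://specimens.example.org/NHMX/12345/skin"},
        {"accession": "EXAMPLE2", "specimen_id": "NHMX:12345:skin"},
        {"accession": "EXAMPLE3"},
    ]
    print(records_needing_links(records))  # ['EXAMPLE2', 'EXAMPLE3']
```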

How

Need to write

How

need to write

General Data Protection Regulation (GDPR) compliance.

How

Before your sampling activity begins, assess whether you will collect or use any personal data (e.g. names, contact details, images, audio recordings or any other data linked to participants). If so, GDPR is likely to apply. Check your institution's policies for confirmation; they may also include mandatory consent templates, storage specifications or reporting requirements.
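Before sampling, it can help to screen the planned data fields for obvious personal data. The sketch below flags field names containing keywords for the categories listed above; the keyword list and the field names are assumptions for illustration. It errs towards over-flagging (a field such as species_name would also be flagged), and a flag only means "check against your institution's policies", not a legal determination.

```python
from typing import Dict, Iterable

# Hypothetical keyword -> category mapping based on the examples in the text.
PERSONAL_DATA_HINTS = {
    "name": "names",
    "email": "contact details",
    "phone": "contact details",
    "address": "contact details",
    "photo": "images",
    "image": "images",
    "audio": "audio recordings",
    "participant": "data linked to participants",
}


def flag_personal_data(field_names: Iterable[str]) -> Dict[str, str]:
    """Map each suspicious field name to the first matching category."""
    flagged = {}
    for field in field_names:
        lowered = field.lower()
        for keyword, category in PERSONAL_DATA_HINTS.items():
            if keyword in lowered:
                flagged[field] = category
                break  # one category per field is enough to trigger a review
    return flagged


if __name__ == "__main__":
    planned = ["collector_name", "collector_email", "lat_lon", "specimen_photo", "tissue_type"]
    print(flag_personal_data(planned))
```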

How

If personal data is involved, make sure your activity follows key GDPR principles, including:

Informed consent: Clearly explain how the data will be used, stored and shared, and how participants can access their data and request its withdrawal.

Data minimisation: Only collect the data that are necessary.

Purpose limitation: Do not use the data for purposes other than those agreed.

Security: Use secure storage (e.g. encrypted drives or approved platforms).

Access control: Limit who can view or process the data.
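The data-limitation (minimisation), purpose-limitation and consent points can be reflected directly in data-handling code. The sketch below is illustrative only: ConsentRecord, minimise and may_process are hypothetical names, and real consent management, secure storage and access control should follow your institution's approved tools and templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    """What a participant agreed to (hypothetical structure)."""
    participant_id: str
    agreed_purposes: frozenset
    withdrawn: bool = False


def minimise(record: dict, needed_fields: set) -> dict:
    """Data minimisation: keep only the fields the activity actually needs."""
    return {key: value for key, value in record.items() if key in needed_fields}


def may_process(consent: ConsentRecord, purpose: str) -> bool:
    """Purpose limitation: allow processing only for agreed, non-withdrawn purposes."""
    return not consent.withdrawn and purpose in consent.agreed_purposes


if __name__ == "__main__":
    raw = {"participant_id": "P01", "name": "A. Person", "samples_collected": 3}
    print(minimise(raw, {"participant_id", "samples_collected"}))
    consent = ConsentRecord("P01", frozenset({"dna_barcoding"}))
    print(may_process(consent, "dna_barcoding"), may_process(consent, "marketing"))  # True False
```

Checking consent at the point of processing, rather than only at collection, means a later withdrawal is respected automatically.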

Resources: EU GDPR legislation.

How

European Parliament and Council of the European Union (2016). Regulation (EU) 2016/679 of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) [Regulation]. Official Journal of the European Union L 119, pp. 1–88. Available at: https://eur-lex.europa.eu/eli/reg/2016/679/oj/eng

Genomic data sharing and publication

Policy & Best Practice Self-assessment Tool