# Transparency and Traceability of Data

 




## Introduction

This section explores the best practices associated with sharing DNA sequence data.

## What are some ways that scientists share genetic data?

The [International Nucleotide Sequence Database Collaboration](https://www.insdc.org/) (INSDC) is “a long-standing foundational initiative that operates between [DDBJ](https://www.ddbj.nig.ac.jp/), [EMBL-EBI](https://www.ebi.ac.uk/) and [NCBI](https://www.ncbi.nlm.nih.gov/). INSDC covers the spectrum of data raw reads, through alignments and assemblies to functional annotation, enriched with contextual information relating to samples and experimental configurations.” The member databases are hosted by the DNA Data Bank of Japan ([DDBJ](https://www.ddbj.nig.ac.jp/contact-ddbj-e.html)), the European Molecular Biology Laboratory's European Bioinformatics Institute ([EMBL-EBI](https://www.ebi.ac.uk/)), and the National Center for Biotechnology Information ([NCBI](https://www.ncbi.nlm.nih.gov/)). All of the databases are publicly available at no cost, both for uploading data and for using the data. Each organization also provides capacity-building opportunities to further metagenomic research.

## Would it be possible to use the INSDC databases for the purposes of sharing the DNA sequence data generated from MGRs of ABNJ?

Yes. In fact, the INSDC databases are the only databases that are publicly available and usable for sharing DNA sequence data generated from MGRs of ABNJ. Further, the INSDC policy change that makes spatio-temporal data mandatory for new sequence data submissions will support better identification of sequences from ABNJ; *see* [https://www.insdc.org/spatio-temporal-annotation-policy-18-11-2021](https://www.insdc.org/spatio-temporal-annotation-policy-18-11-2021).
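To illustrate how spatio-temporal annotation could help with identifying where a sequence came from, here is a minimal Python sketch that checks whether a record carries a parsable position and a collection date. It assumes a position written in the `lat_lon` style (for example `47.94 N 28.12 W`); the dict layout, the key names, and the function names are hypothetical, and which fields a given submission actually carries depends on the policy linked above. Deciding whether a coordinate lies in ABNJ would additionally require maritime boundary data, which is not shown.

```python
import re
from typing import Optional, Tuple

# Assumed position format: "<lat> N|S <lon> E|W" in decimal degrees.
LAT_LON = re.compile(r"^(\d{1,2}(?:\.\d+)?) ([NS]) (\d{1,3}(?:\.\d+)?) ([EW])$")


def parse_lat_lon(value: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) as signed decimal degrees, or None if unparsable."""
    match = LAT_LON.match(value.strip())
    if match is None:
        return None
    lat = float(match.group(1)) * (1 if match.group(2) == "N" else -1)
    lon = float(match.group(3)) * (1 if match.group(4) == "E" else -1)
    if abs(lat) > 90 or abs(lon) > 180:
        return None
    return lat, lon


def has_spatiotemporal_metadata(record: dict) -> bool:
    """Check a record (hypothetical dict layout) for a usable position and a collection date."""
    position = parse_lat_lon(str(record.get("lat_lon") or ""))
    return position is not None and bool(record.get("collection_date"))
```

A record that fails this check could be flagged for manual review rather than rejected outright, since missing values are sometimes reported with placeholder terms.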

The benefit of using the INSDC databases is that those utilizing the data associated with MGRs of ABNJ would not need to actively notify the Clearing-House Mechanism. Rather, the Clearing-House Mechanism could mine citations of the data from the literature available in the databases. Methods and tools already exist to carry out this task, and providing them as services for the Clearing-House Mechanism would be a comparatively minor addition to the existing functions of the INSDC databases.
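As a rough illustration of what mining citations from the literature can involve, the Python sketch below scans article text for strings that look like nucleotide accession numbers. The regular expression assumes the classic accession shapes (one letter plus five digits, or two letters plus six or eight digits, optionally followed by a version suffix); it is not the method used by any existing service, the function name is hypothetical, and a production tool would need to handle other accession types and filter out false positives.

```python
import re

# Assumed accession shapes: 1 letter + 5 digits, or 2 letters + 6 or 8 digits,
# optionally followed by ".<version>".
ACCESSION = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?\b")


def find_accessions(text: str) -> list[str]:
    """Return unique accession-like strings in order of first appearance."""
    seen = {}
    for match in ACCESSION.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


sample = "Sequences were deposited under MN908947.3 and U49845; see also MN908947.3."
print(find_accessions(sample))
```

Matching each accession found this way against the sampling location recorded for that sequence is what would give a view of how widely data from a given marine region are cited.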

## What are some ways that the INSDC databases can be adapted to support the MGR Mechanism?

The following functions could be added to the INSDC databases to support the MGR Mechanism:

● build "contextual data services" to provide marine region mark-up on areas beyond national jurisdiction and coastal records to aid management and classification; and/or

● build further services to support reporting on citations of data (based on the literature mining) to provide a view of the "reach" of a sequence from a given marine region, for example.

Any new services on the literature database or INSDC would have development costs and some operational costs. Such expenses are expected to be low, but there is no immediately available source of funding for the new functions. If these services would be useful in supporting the Treaty, the BBNJ Treaty's Conference of the Parties could request the Science and Technology Body and INSDC to explore how best to collaborate as part of designing the Clearing-House Mechanism.

## Do the INSDC databases track usage at the level of individual users? 

No. The INSDC does not track usage at the level of individual users. Such tracking would require the registration of all users (including, for example, high school students who might be using the data for a school report). It would also require a login for each use, which in turn would require implementing an authentication system for all users. INSDC is against such an approach in principle because any requirement to log in (which is needed to document access) will increase friction to data access. For the many users who access INSDC databases directly, logging in would be an inconvenience and, for some, an unacceptable loss of anonymity. Moreover, usage is often through machine access. In other words, many users work with software that accesses various databases and retrieves a data set for further processing or analysis.
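For readers unfamiliar with machine access, the following Python sketch shows roughly what such software does when it retrieves a sequence. It uses the NCBI E-utilities `efetch` endpoint as one example of a programmatic interface; the helper names `efetch_url` and `fetch_fasta` are hypothetical, and the accession passed in is whatever the calling software supplies. No login step is involved, which is the point made above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for retrieving records by identifier.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build a request URL for a nucleotide record in FASTA text format."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download the record over HTTPS and return it as text (requires network access)."""
    with urlopen(efetch_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A pipeline might call `fetch_fasta` thousands of times on behalf of a single researcher, or once on behalf of many, which is part of why request logs alone do not identify individual users.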