Guidelines for genomics & transcriptomics data

Guidelines for COVID-19 data

Make your COVID-19 research data useful and accessible to the rest of the research community by publishing it in a public repository together with descriptive metadata.


We suggest that raw virus sequence data, as well as assembled and annotated genomes, be submitted to ENA. For submission instructions, see the SARS-CoV-2 submission documentation. Contaminating human reads must be removed from raw sequence data (e.g. shotgun sequencing) before submission.
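As a rough illustration of the last step, the sketch below drops flagged reads from a FASTQ file. It assumes that the human reads have already been identified by an upstream step (for example by aligning against a human reference genome) and that their read IDs are available as a set; identifying the reads is not shown. The function names `read_id` and `drop_flagged_reads` are hypothetical, and the code assumes plain four-line FASTQ records without wrapped sequences.

```python
from typing import Iterable, Iterator, List, Set


def read_id(header: str) -> str:
    """Return the read ID: the text after '@' up to the first whitespace."""
    tokens = header[1:].split()
    if not header.startswith("@") or not tokens:
        raise ValueError(f"Not a valid FASTQ header line: {header!r}")
    return tokens[0]


def drop_flagged_reads(
    fastq_lines: Iterable[str], flagged_ids: Set[str]
) -> Iterator[str]:
    """Yield the lines of every four-line FASTQ record whose ID is not flagged.

    Assumption: `flagged_ids` holds the IDs of reads classified as human
    by a separate tool; this function only does the filtering.
    """
    record: List[str] = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if read_id(record[0]) not in flagged_ids:
                yield from record
            record = []
    if record:
        raise ValueError("Input ended in the middle of a FASTQ record")


if __name__ == "__main__":
    example = [
        "@read1 sample=A", "ACGT", "+", "IIII",
        "@read2 sample=A", "TTTT", "+", "IIII",
    ]
    # Hypothetical result of the upstream classification step.
    human = {"read2"}
    for out_line in drop_flagged_reads(example, human):
        print(out_line)
```

For paired-end data the same set of IDs would need to be applied to both files so that the mates stay in sync.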


Metadata provides ‘data about data’, and may include information on the methodology used to collect the data, analytical and procedural information, definitions of variables, units of measurement, any assumptions made, the format and file type of the data, and the software used to collect and/or process the data. Researchers are strongly encouraged to use community metadata standards where these are in place.

MINSEQE (Minimal Information about a high throughput SEQuencing Experiment) is the preferred minimal metadata standard for transcriptomics data in general. For viral data, consider using the ENA virus pathogen reporting standard checklist.

We highly recommend structuring metadata (e.g. sample metadata) from the very beginning of the project in a way that enables sequence data submission without having to reformat it.
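One way to work towards this is to keep sample metadata in a table whose columns match the fields of the checklist you plan to submit against, and to check it regularly for gaps. The sketch below is a minimal example under that assumption. The field names in `REQUIRED_FIELDS` are illustrative placeholders and not the authoritative checklist; take the real field names from the checklist itself. The helper names `find_missing` and `to_tsv` are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Illustrative field names only (assumption): replace them with the
# fields of the checklist you actually submit against.
REQUIRED_FIELDS = [
    "sample_alias",
    "collection date",
    "geographic location (country and/or sea)",
    "host scientific name",
]


def find_missing(
    samples: List[Dict[str, str]], required: List[str] = REQUIRED_FIELDS
) -> Dict[int, List[str]]:
    """Map row index to the required fields that are absent or blank."""
    problems: Dict[int, List[str]] = {}
    for index, sample in enumerate(samples):
        missing = [f for f in required if not (sample.get(f) or "").strip()]
        if missing:
            problems[index] = missing
    return problems


def to_tsv(
    samples: List[Dict[str, str]], fields: List[str] = REQUIRED_FIELDS
) -> str:
    """Serialise the samples as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        delimiter="\t",
        extrasaction="ignore",  # columns outside `fields` are left out
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(samples)
    return buffer.getvalue()


if __name__ == "__main__":
    samples = [
        {
            "sample_alias": "s1",
            "collection date": "2020-04-01",
            "geographic location (country and/or sea)": "Sweden",
            "host scientific name": "Homo sapiens",
        },
        {"sample_alias": "s2", "collection date": ""},
    ]
    print(find_missing(samples))
    print(to_tsv(samples[:1]))
```

Running such a check whenever samples are added surfaces missing values early, while the information is still easy to recover.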