European Nucleotide Archive: Quick tour

The European Nucleotide Archive [2] (ENA) provides a comprehensive, accessible and publicly available repository for nucleotide sequence data. The ENA attracts users from a multitude of research disciplines and serves as an underlying data infrastructure for other EBI services, including Ensembl [3], Ensembl Genomes [4], UniProt [5] and ArrayExpress [6]. Data submitted to the ENA are validated by automated quality checking and, where possible, manual inspection and curation [7].

the EBI Toolbox and third party tools.
Report novel annotation [13] relating to existing sequence as part of the Third Party Data [15] policy.
Browse existing sequence and annotation referred to in the literature.
Find all sequence and annotation available for a gene of interest.
Use sequence similarity to search data (including unassembled raw data [11]) and find out what is known about your new sequence.
Link through from nucleotide data to a host of integrated resources, such as genomes (Ensembl [3] and Ensembl Genomes [4]), the scientific literature (CiteXplore [16]), protein products (UniProt [17]) and protein families, motifs and domains (InterPro [18]).

Search and download data
The ENA data are available through several routes (outlined below). For more information, see the ENA search and browse page [19].

Search from the ENA homepage
Carry out searches using accession [20] IDs for nucleotide sequences, text searches (against annotation [13] fields, such as gene symbols and culture collection identifiers), and rapid comprehensive sequence similarity searches.
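
As a rough illustration of how a single search box could route the three query types described above, the sketch below classifies a query string as an accession, a nucleotide sequence or free text. The regular expressions are simplified assumptions about common accession shapes, and `classify_query` is a hypothetical helper, not part of any ENA interface.

```python
import re

# Simplified, assumed accession shapes (not an exhaustive or official list):
#   sequence records: 1 letter + 5 digits, or 2 letters + 6 or 8 digits,
#   optionally followed by a version suffix such as ".1"
#   read-archive objects: E/D/S, then R, then one of A/P/R/S/X/Z, then 6+ digits
SEQUENCE_ACCESSION = re.compile(
    r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$"
)
READ_ACCESSION = re.compile(r"^[EDS]R[APRSXZ]\d{6,}$")
# A long run of nucleotide codes is treated as a sequence similarity query.
NUCLEOTIDES = re.compile(r"^[ACGTUN]{20,}$")


def classify_query(query: str) -> str:
    """Return 'accession', 'sequence' or 'text' for a search-box query."""
    token = query.strip().upper()
    if SEQUENCE_ACCESSION.match(token) or READ_ACCESSION.match(token):
        return "accession"
    if NUCLEOTIDES.match(token):
        return "sequence"
    return "text"
```

Anything that is neither accession-shaped nor a long nucleotide string falls through to a text search, which matches the way gene symbols and culture collection identifiers are searched against annotation fields.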

Web browser
All types of data held in the ENA are available from the browser in integrated views, which include annotated sequences, projects, coding sequences, taxa [21] and experiments in the Sequence Read Archive [22]. See more information about the ENA browser [23].

Parallel web services [24]
Programmatic data access [23] to these services is also provided.
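
A minimal sketch of programmatic access follows. It assumes the ENA Browser API is served under `https://www.ebi.ac.uk/ena/browser/api` and accepts a format name followed by an accession in the URL path; neither detail is stated in this course, so check the current ENA documentation before relying on them. The helper names `record_url` and `fetch_record` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the ENA Browser API; verify against current ENA docs.
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# Formats assumed to be offered by the service.
SUPPORTED_FORMATS = {"fasta", "embl", "xml"}


def record_url(accession: str, fmt: str = "fasta") -> str:
    """Build the retrieval URL for one record without contacting the server."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # safe="" escapes every reserved character, including "/".
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession.strip(), safe='')}"


def fetch_record(accession: str, fmt: str = "fasta", timeout: float = 30.0) -> str:
    """Download one record as text (requires network access)."""
    with urlopen(record_url(accession, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping URL construction separate from the download makes the request easy to inspect or log before any network traffic is generated.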

FTP [25] server
For large-scale data downloads, there are FTP and Aspera [26] channels.
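
For bulk downloads it helps to compute FTP locations rather than look them up one by one. The sketch below assumes the host `ftp.sra.ebi.ac.uk` and a directory layout in which FASTQ files for a run live under `/vol1/fastq/<first six characters>/[<padded extra digits>/]<run accession>/`. Both are assumptions drawn from ENA's file-download guidance as understood at the time of writing, not from this course, and the function names are hypothetical.

```python
import re
from ftplib import FTP

# Assumed FTP host and run-accession shape (e.g. ERR, SRR or DRR + 6 to 9 digits).
FTP_HOST = "ftp.sra.ebi.ac.uk"
RUN_ACCESSION = re.compile(r"^[EDS]RR\d{6,9}$")


def fastq_directory(run_accession: str) -> str:
    """Return the assumed FTP directory holding FASTQ files for one run."""
    acc = run_accession.strip().upper()
    if not RUN_ACCESSION.match(acc):
        raise ValueError(f"not a run accession: {run_accession!r}")
    parts = ["vol1", "fastq", acc[:6]]
    extra_digits = acc[9:]  # digits beyond the sixth, if any
    if extra_digits:
        # Assumed rule: pad the extra digits to three characters with zeros.
        parts.append(extra_digits.zfill(3))
    parts.append(acc)
    return "/" + "/".join(parts) + "/"


def fastq_url(run_accession: str) -> str:
    """Return an ftp:// URL for the run's FASTQ directory."""
    return f"ftp://{FTP_HOST}{fastq_directory(run_accession)}"


def list_fastq_files(run_accession: str, timeout: float = 60.0) -> list:
    """List the files in the run's directory (requires network access)."""
    with FTP(FTP_HOST, timeout=timeout) as ftp:
        ftp.login()  # anonymous login
        return ftp.nlst(fastq_directory(run_accession))
```

Only `list_fastq_files` touches the network; the path helpers can be used to prepare a download list for an FTP or Aspera client.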


Submitting data to the ENA
Many journals and funders require authors to submit their sequence information to a database that is a member of the International Nucleotide Sequence Database Collaboration [27] (INSDC; see information on collaborations in get help and support on the ENA [28]) prior to publication. The advantage of submitting your sequence data to the ENA is that your data will be permanently available and readily accessible to scientists worldwide. After submission, accession [20] numbers are assigned to identify your sequence and any related information. These accession numbers should be included in your manuscript. You can choose whether to make your data publicly available immediately or wait until your paper is published.

Manual submissions
You can register new sequencing projects and submit assembled sequence and annotation [13] to the ENA using Webin, the EBI's preferred web-based submission system. Webin provides interactive web forms that are tailored to the type of data to be submitted and that capture and validate the information required. Sequence and annotation can also be uploaded to Webin in several formats. Webin is available from the ENA's submissions login page [29].
To submit small-scale raw sequence data, send an e-mail request to datasubs [at] ebi.ac.uk (submissions); a secure Webin-box will then be set up for you. You can upload data and metadata [30] files (prepared in third-party editors) into the Webin-box by FTP [25] or Aspera [26] and through a dedicated webpage.

Automated submissions
The EBI works closely with sequencing centres and other facilities to ensure the timely incorporation of data into the ENA. Several options are available for automated submissions, including submission accounts for annotated sequences, the RESTful web-based submission service for next-generation sequence metadata, and FTP and Aspera drop-boxes. All enquiries should be directed to datasubs [at] ebi.ac.uk (submissions).
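
Automated metadata submission typically means generating a structured document in a script and sending it to the submission service. The sketch below builds a minimal sample description as XML. The element names (`SAMPLE_SET`, `SAMPLE`, `TITLE`, `SAMPLE_NAME`, `TAXON_ID`) are assumptions based on the read-archive metadata XML format, and `build_sample_xml` is a hypothetical helper. A real submission needs further fields, a submission document and authenticated upload, so confirm the details with the submissions team.

```python
import xml.etree.ElementTree as ET


def build_sample_xml(alias: str, title: str, taxon_id: int) -> str:
    """Return a minimal sample metadata document as an XML string.

    Element names are assumed, not taken from this course.
    """
    sample_set = ET.Element("SAMPLE_SET")
    # The alias is the submitter's own unique name for the sample.
    sample = ET.SubElement(sample_set, "SAMPLE", alias=alias)
    ET.SubElement(sample, "TITLE").text = title
    sample_name = ET.SubElement(sample, "SAMPLE_NAME")
    ET.SubElement(sample_name, "TAXON_ID").text = str(taxon_id)
    return ET.tostring(sample_set, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only; 9606 is the NCBI taxonomy ID for Homo sapiens.
    print(build_sample_xml("demo-sample-1", "Demonstration sample", 9606))
```

Building the document with an XML library rather than string concatenation ensures that special characters in titles and aliases are escaped correctly.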

Updating ENA content
Records can become out of date: authors might need to correct sequences and assemblies, or they might discover new features that need to be added through annotation. Because such findings are rarely published in journals, it is important that authors communicate them to the ENA. Authors wishing to do so should use the update procedure available from the submissions page [31].
