Refers to the current gold standard: genome sequences with fewer than 1 error per 100,000 base pairs, in which each replicon is assembled into a single contiguous sequence, with a minimal number of possible exceptions commented in the submission record. All sequences are complete and have been reviewed and edited, all known misassemblies have been resolved, and repetitive sequences have been ordered and correctly assembled. Remaining exceptions to highly accurate sequence within the euchromatin are commented in the submission. The Finished product is appropriate for all types of detailed analyses and acts as a high-quality reference genome for comparative purposes. Some microbial genome sequences for which multiple platforms were used on the same genome have exceeded this standard, and it is believed that no bases are incorrect except for natural, low-level biological variation.
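The two quantitative criteria above (error rate and one contiguous sequence per replicon) can be sketched as a simple check. This is only an illustration: the `Replicon` record, its field names, and the function `meets_finished_criteria` are hypothetical, the error count is assumed to come from some external estimate, and the commented exceptions that the standard permits are not modeled.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the text: fewer than 1 error per 100,000 base pairs.
BASES_PER_ALLOWED_ERROR = 100_000


@dataclass
class Replicon:
    """Hypothetical record for one replicon (chromosome or plasmid)."""
    name: str
    contig_lengths: List[int]  # length in bp of each contig in the assembly
    estimated_errors: int      # assumed to be supplied by an external estimate


def meets_finished_criteria(replicons: List[Replicon]) -> bool:
    """Return True if the assembly passes both numeric checks.

    Checks only (a) one contig per replicon and (b) the genome-wide
    error rate. Review, misassembly resolution, and documented
    exceptions are outside the scope of this sketch.
    """
    total_bases = sum(sum(r.contig_lengths) for r in replicons)
    total_errors = sum(r.estimated_errors for r in replicons)
    if total_bases == 0:
        return False

    one_contig_each = all(len(r.contig_lengths) == 1 for r in replicons)

    # Integer form of errors / bases < 1 / 100,000, avoiding float rounding.
    low_error_rate = total_errors * BASES_PER_ALLOWED_ERROR < total_bases

    return one_contig_each and low_error_rate
```

For example, a 4.6 Mb chromosome in one contig with 20 estimated errors passes, whereas the same chromosome with 50 estimated errors, or split into two contigs, does not.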
Intermediate standards often overlap, and although we do not advocate any one standard, we recommend that the target standard be based on the needs and goals of each project. There may be cases where select regions will be targeted for improvement and more than one standard may apply (such regionally improved sequences should be identified). This approach is most often used for eukaryotic whole-genome sequencing projects, where the cost of complete finishing remains prohibitive, and allows improvement to be directed at euchromatic sequence, because heterochromatic sequence remains largely recalcitrant to available approaches. Legacy eukaryotic tiling path standards will remain in use for a time.
Here, we have attempted to capture in a technology-independent fashion the types of whole-genome sequencing projects that are beginning to populate databases, and we have defined a set of standards that accommodate a growing list of alternative genome products that have been obtained via less conventional means, such as environmental (metagenomic) or single-cell sequencing. Ongoing discussions with genome database repositories have been met with enthusiasm, and the implementation of these standards as a requirement for genome submissions is expected. To aid in adoption of this classification of sequence finishing standards, we have added it to the Sequence Ontology (3), where it can now be used to comply with the “sequencing status” descriptor of the Genomic Standards Consortium's (GSC) “Minimum Information about a Genome Sequence” standard (4). Furthermore, the efforts described here have recently been adopted under the umbrella of the GSC (5). This common currency in defining the products of genome projects enables better management of expectations and allows users of genomic data to assess the quality of the deposited sequences and decide whether these meet their needs.