P0934 Submitting Sequence Data to NCBI: An Avenue to Improve Genome Annotation

Melissa J. Landrum, National Center for Biotechnology Information, NLM, NIH, Bethesda, MD
Kim D. Pruitt, National Center for Biotechnology Information, NLM, NIH, Bethesda, MD
Ilene Mizrachi, National Center for Biotechnology Information, Bethesda, MD
Karen Clark, National Center for Biotechnology Information, NLM, NIH, Bethesda, MD
the NCBI Data Submission Groups, National Center for Biotechnology Information, NLM, NIH, Bethesda, MD
Effective use of ever-increasing genome sequence data depends on the availability of comprehensive, high-quality gene annotation, which in turn depends on the availability of evidence for gene prediction. NCBI maintains databases for various kinds of primary sequence and annotation data that serve as evidence for gene predictions, variation calls, and more. GenBank is an archival database for many sequence types, including genomic, mRNA and EST sequences. It includes the Third Party Annotation (TPA) division, which accepts annotation of GenBank sequences that the submitter did not determine. Submission of large-scale, high-throughput datasets, either genomic or transcriptomic, starts with registration in the BioProject database. BioProject organizes metadata for research initiatives and links to the submitted data, allowing simple retrieval of multiple data types for a single registered BioProject. Each dataset is then submitted to the appropriate specialized database. For example, transcript sequences computationally assembled from sequences that the same submitter deposited in dbEST, the Short Read Archive (SRA), or the Trace Archive are submitted to the Transcriptome Shotgun Assembly (TSA) repository. RNA-seq data for functional genomics studies are submitted to the Gene Expression Omnibus (GEO), which brokers the submission of the sequence reads to SRA. Whole genome assemblies are submitted to GenBank, with or without annotation. Descriptions of the biological source materials used in experimental assays may be submitted to the BioSample database. A new submission portal for BioProject and BioSample metadata is under development and will be expanded to facilitate submission of datasets to NCBI archival databases.
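The routing described above can be summarized as a small lookup table. The Python below is a minimal sketch under stated assumptions: the data-type labels, the `Target` enum, the `ROUTES` table and the `plan_submission` helper are all hypothetical names chosen for illustration. The sketch only restates the routing given in this abstract and is not an NCBI API or an official rule set.

```python
from enum import Enum


class Target(str, Enum):
    """NCBI databases named in the abstract (hypothetical enum for illustration)."""
    BIOPROJECT = "BioProject"
    BIOSAMPLE = "BioSample"
    GENBANK = "GenBank"
    GENBANK_TPA = "GenBank (TPA division)"
    TSA = "TSA"
    GEO = "GEO"
    SRA = "SRA"


# Hypothetical data-type labels. The mapping mirrors the routing described
# in the abstract; it is not an official NCBI rule set.
ROUTES: dict[str, list[Target]] = {
    "genomic": [Target.GENBANK],
    "mrna": [Target.GENBANK],
    "est": [Target.GENBANK],
    "third_party_annotation": [Target.GENBANK_TPA],
    "whole_genome_assembly": [Target.GENBANK],
    "transcriptome_assembly": [Target.TSA],
    # GEO receives the RNA-seq study and brokers the reads on to SRA.
    "rnaseq_functional_genomics": [Target.GEO, Target.SRA],
    "biological_source_description": [Target.BIOSAMPLE],
}


def plan_submission(data_type: str, high_throughput: bool = False) -> list[Target]:
    """Return the ordered list of databases a dataset would pass through.

    Large-scale, high-throughput datasets start with a BioProject registration,
    so BioProject is prepended for them.
    """
    try:
        targets = list(ROUTES[data_type])  # copy so ROUTES is never mutated
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None
    if high_throughput and Target.BIOPROJECT not in targets:
        targets.insert(0, Target.BIOPROJECT)
    return targets


if __name__ == "__main__":
    plan = plan_submission("rnaseq_functional_genomics", high_throughput=True)
    print([t.value for t in plan])  # ['BioProject', 'GEO', 'SRA']
```

Keeping the routing in a plain dictionary separates the "which database" decision from the BioProject step, which in the abstract applies to any high-throughput dataset regardless of type.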