P0933 dbSNP and dbVar: NCBI Databases of Simple and Structural Variations

Lon Phan, NIH/NLM/NCBI, Bethesda, MD
Ming Ward, NIH/NLM/NCBI
Hua Zhang, NIH/NLM/NCBI
Dmitry Rudnev, NIH/NLM/NCBI
Mike Kholodov, NIH/NLM/NCBI
David Shao, NIH/NLM/NCBI
Eugene Shekhtman, NIH/NLM/NCBI
Rama Maiti, NIH/NLM/NCBI
John Lopez, NIH/NLM/NCBI
Tim Hefferon, NIH/NLM/NCBI
John Garner, NIH/NLM/NCBI
Deanna M. Church, NIH/NLM/NCBI
Karl Sirotkin, NIH/NLM/NCBI
Donna Maglott, NIH/NLM/NCBI, Bethesda, MD
Mike Feolo, NIH/NLM/NCBI
Steve Sherry, NIH/NLM/NCBI
The National Center for Biotechnology Information (NCBI) creates and maintains a set of databases that archive, process, display, and report information on germline and somatic variants from multiple species. These databases, primarily the Database of Short Genetic Variations (dbSNP) and the Database of Genomic Structural Variations (dbVar), are integrated with many NCBI resources, including Gene, PubMed, Nucleotide, and Genome. This presentation focuses on dbSNP and dbVar, summarizing their current functions and highlighting recent improvements. dbSNP and dbVar represent millions of variants from over 150 species, including many agriculturally important plants and animals. The primary roles of both databases are to process submissions, archive the data, and distribute them for general use. Each submission is assigned a database identifier (ss# in dbSNP or nsv#/esv# in dbVar) based either on flanking invariant sequence or on locations asserted on reference sequences. These submissions are then processed to aggregate information from multiple submitters (assigning an rs# in dbSNP) and to calculate locations on NCBI Reference Sequences (RefSeqs). Because these stable public accessions are citable in publications, they facilitate aggregation of information as diverse organisms are tested for variation. Researchers and genetic testers are encouraged to submit their variation data and to cite their submissions in manuscripts and on the web. Once data are accessioned, they are made available in diverse ways: Entrez searches, study-specific reports, annotation on the genome, and FTP transfer.
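
As an illustration of the identifier scheme described above, the following minimal Python sketch sorts accession strings by prefix. It is not NCBI code: the function name `classify_accession`, the role labels, and the assumption that each accession is a prefix followed only by digits are hypothetical choices made for this example. The prefix-to-database mapping itself (ss# and rs# for dbSNP, nsv# and esv# for dbVar) follows the abstract.

```python
import re

# Prefix-to-database mapping as stated in the abstract.
# The "role" labels are illustrative wording, not official NCBI terms.
ACCESSION_PREFIXES = {
    "ss": ("dbSNP", "submission"),
    "rs": ("dbSNP", "aggregated record"),
    "nsv": ("dbVar", "submission"),
    "esv": ("dbVar", "submission"),
}

# Assumption: an accession is one of the prefixes above followed by digits only.
_ACCESSION_RE = re.compile(r"^(ss|rs|nsv|esv)(\d+)$", re.IGNORECASE)


def classify_accession(accession):
    """Return database, role, and numeric part for an accession, or None."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return None  # not one of the identifier styles named in the abstract
    prefix = match.group(1).lower()
    database, role = ACCESSION_PREFIXES[prefix]
    return {"database": database, "role": role, "number": int(match.group(2))}


if __name__ == "__main__":
    # The accession numbers below are arbitrary examples of the format.
    for acc in ["rs334", "ss12345", "nsv1000", "esv27", "chr1:100"]:
        print(acc, classify_accession(acc))
```

A helper of this kind could be used, for example, to route identifiers cited in a manuscript to the matching database before looking them up; strings that do not fit the assumed pattern return `None` rather than a guess.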