P0945 Improving Protein Annotation Through Protein Clusters

Anjana Raina Vatsan, NCBI/NLM/NIH, Bethesda, MD
Azat Badretdin, NCBI/NLM/NIH, Bethesda, MD
Slava Chetvernin, NCBI/NLM/NIH, Bethesda, MD
Boris Fedorov, NCBI/NLM/NIH, Bethesda, MD
William Klimke, NCBI/NLM/NIH, Bethesda, MD
Boris Kiryutin, NCBI/NLM/NIH, Bethesda, MD
Sergei Resenchuk, NCBI/NLM/NIH, Bethesda, MD
Brian Smith-White, NCBI/NLM/NIH, Bethesda, MD
Igor Tolstoy, NCBI/NLM/NIH, Bethesda, MD
Tatiana Tatusova, NCBI/NLM/NIH, Bethesda, MD

NCBI's Protein Clusters database for plants includes 71,000 clusters composed of 373,000 RefSeq proteins from seven higher plants (Arabidopsis thaliana, Arabidopsis lyrata, Vitis vinifera, Populus trichocarpa, Ricinus communis, Oryza sativa japonica, Sorghum bicolor), the moss Physcomitrella patens, the lycophyte Selaginella bicolor, five algae (Chlamydomonas reinhardtii, Volvox carteri, Ostreococcus lucimarinus, Micromonas pusilla and Micromonas sp. RCC299), two diatoms (Thalassiosira pseudonana, Phaeodactylum tricornutum), and the nucleomorph genomes of Guillardia theta, Hemiselmis andersenii and Bigelowiella natans.

Clusters are manually curated to add functional annotation for genes and proteins, along with EC numbers and publications. Annotation added to a cluster is then transferred to every protein member of that cluster, and the change is reflected in the genomic RefSeq records in Entrez Nucleotide, Genome, Protein and Gene. This will help NCBI enrich the functional annotation of RefSeq genomes and keep information on molecular function up to date for the genomes submitted to NCBI.
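To illustrate the cluster-to-member transfer described above, the following Python sketch models a cluster as one curated annotation plus a list of member protein accessions, and copies that annotation onto every member. The class names, field names, placeholder accessions and the rule that each protein belongs to exactly one cluster are all assumptions made for illustration; this is not the ProtClust schema or NCBI's actual annotation pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterAnnotation:
    """Hypothetical curated annotation attached to one protein cluster."""
    protein_name: str
    gene_name: Optional[str] = None
    ec_numbers: List[str] = field(default_factory=list)
    pmids: List[int] = field(default_factory=list)  # supporting publications


@dataclass
class ProteinCluster:
    """Hypothetical cluster record: an ID, its curated annotation, its members."""
    cluster_id: str
    annotation: ClusterAnnotation
    members: List[str]  # protein accessions belonging to this cluster


def transfer_annotation(clusters: List[ProteinCluster]) -> Dict[str, ClusterAnnotation]:
    """Map every member protein accession to its cluster's curated annotation.

    Assumes each protein belongs to exactly one cluster and raises
    ValueError if an accession is found in more than one. Members share
    the cluster's annotation object, so a later curation change to the
    cluster is visible for every member protein.
    """
    by_protein: Dict[str, ClusterAnnotation] = {}
    for cluster in clusters:
        for accession in cluster.members:
            if accession in by_protein:
                raise ValueError(f"{accession} is in more than one cluster")
            by_protein[accession] = cluster.annotation
    return by_protein


# Usage with placeholder IDs (not real ProtClust or RefSeq accessions).
rubisco = ProteinCluster(
    cluster_id="CLUSTER_1",
    annotation=ClusterAnnotation(
        protein_name="ribulose bisphosphate carboxylase small chain",
        ec_numbers=["4.1.1.39"],
    ),
    members=["PROT_A", "PROT_B", "PROT_C"],
)
annotated = transfer_annotation([rubisco])
print(annotated["PROT_B"].ec_numbers)  # ['4.1.1.39']
```

Sharing one annotation object per cluster mirrors the idea that curation happens once, at the cluster level; a system that writes annotation into separate per-protein records would instead copy it, at the cost of re-propagating after each curation change.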

The ProtClust database is part of NCBI's Entrez search system and can be accessed at http://www.ncbi.nlm.nih.gov/proteinclusters.