P0932 UniGene: an NCBI resource for plant and animal transcripts

Lukas Wagner, National Center for Biotechnology Information, Bethesda, MD
Wonhee Jang, National Center for Biotechnology Information, Bethesda, MD
Kirill Rotmistrovsky, National Center for Biotechnology Information, Bethesda, MD
The UniGene datasets are assembled with the goal of grouping mRNA sequences and ESTs into sets ("clusters") that are in one-to-one correspondence with genes. UniGene datasets exist for over 140 organisms and are refreshed monthly whenever new sequence data are available. UniGene supports several modes of searching and browsing. Query-bar searches accept words describing sequences within a cluster or proteins similar to sequences in a cluster, as well as gene names for organisms with accepted nomenclature. Browsing is supported for expression neighbors (that is, clusters with similar expression patterns), for clusters whose sequences are differentially expressed across user-defined sets of libraries, and for clusters homologous to known genes. Species for which UniGene datasets are newly available this year include Amphimedon queenslandica. Direct links from UniGene to MapViewer displays have also recently been enabled for some organisms. UniGene transcript clusters are singly linked: a sequence need only be pairwise similar to one other sequence in a cluster to be included. Because UniGene builds are updated frequently, the membership of a cluster changes as new sequences are added to the dataset, and clusters may split or merge. While cluster identifier numbers are kept as stable as possible across UniGene builds, they are not stable archival references. Keeping a list of the GenBank accession numbers that make up a cluster of long-term interest is the safest course, particularly for small clusters. http://www.ncbi.nlm.nih.gov/unigene/
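
The single-linkage rule described above, and the way one new sequence can merge two clusters, can be illustrated with a minimal Python sketch. This is not the UniGene build procedure: the function name, the sequence labels, and the list of "similar" pairs are all hypothetical, and the sketch assumes pairwise similarity has already been decided by some external comparison step.

```python
from collections import defaultdict


def single_linkage_clusters(sequence_ids, similar_pairs):
    """Group sequences so that each member is similar to at least one other
    member of its cluster (single linkage), using a union-find structure.

    sequence_ids  -- hypothetical sequence labels (stand-ins for accessions)
    similar_pairs -- hypothetical (a, b) pairs already judged similar
    """
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # one similar pair is enough to join clusters

    groups = defaultdict(set)
    for s in sequence_ids:
        groups[find(s)].add(s)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical data: two separate clusters in an earlier build.
ids = ["A", "B", "C", "D"]
pairs = [("A", "B"), ("C", "D")]
before = single_linkage_clusters(ids, pairs)

# A new sequence "E" that is similar to one member of each cluster
# bridges them, so the two clusters merge in the later build.
after = single_linkage_clusters(ids + ["E"], pairs + [("B", "E"), ("E", "C")])

print(before)
print(after)
```

In this toy example the membership of both original clusters changes once "E" is added, which is why a saved list of member accessions is a more durable record than a cluster identifier.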