W194 Using RNA-Seq in Multi-Genome Gnomon for Fungal Genomes Annotation

Date: Tuesday, January 17, 2012
Time: 4:10 PM
Room: Royal Palm Salon 1,2,3
Alexandre Souvorov, NIH/NLM/NCBI, Bethesda, MD
The comparative multi-genome approach exploits the fact that protein-coding regions are more conserved than non-functional regions. The multi-genome Gnomon method for the parallel annotation of several genomes is an iterative process that starts from single-genome Gnomon gene predictions and uses the predicted proteins to gradually improve the annotation. At each iteration, the best models are selected and used as a training set and as evidence for the next Gnomon step. At the end of the iterative process, all genomes are trained and annotated. The resulting annotations are consistent across all of the genomes involved, and the cDNA evidence available for any one genome contributes to the annotation of the others. The RNA-Seq data available from deep-sequencing projects has tremendous potential for gene finding, but it also creates new challenges for gene-finding algorithms. These include the sheer amount of raw data, a large number of false-positive hits, and the logical problem of finding the right chaining path to connect small pieces into a full-length gene. In the new Gnomon version we implemented filtering and chaining algorithms that are capable of finding genes and their alternative variants using a combination of RNA-Seq data and all other available evidence (mRNAs, ESTs and proteins). The multi-genome approach has been applied to eight Aspergillus genomes, four of which have RNA-Seq data available. It was shown that the multi-genome approach successfully used the RNA-Seq data indirectly to improve the annotation of the genomes without such data.
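To give a feel for the chaining problem mentioned above, the following is a minimal sketch of a generic highest-scoring chain search over non-overlapping fragments, written in Python. It is not the Gnomon implementation: the `Fragment` type, the `best_chain` function, the scoring and the coordinates are all hypothetical, and a real gene finder would also have to check strand, splice-site and reading-frame compatibility between neighbouring pieces.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """A hypothetical aligned piece of evidence on one strand."""
    start: int    # first genomic position covered (inclusive)
    end: int      # last genomic position covered (inclusive)
    score: float  # illustrative support score, e.g. read depth


def best_chain(fragments: List[Fragment]) -> Tuple[float, List[Fragment]]:
    """Return the highest-scoring chain of mutually non-overlapping fragments.

    Classic dynamic programming over fragments sorted by end coordinate:
    for each fragment, either skip it or append it to the best chain that
    ends strictly before it starts.
    """
    frags = sorted(fragments, key=lambda f: f.end)
    ends = [f.end for f in frags]
    n = len(frags)

    best = [0.0] * (n + 1)    # best[i]: best score using the first i fragments
    take = [False] * (n + 1)  # take[i]: whether fragment i is in that chain
    prev = [0] * (n + 1)      # prev[i]: prefix length the chain continues from

    for i, frag in enumerate(frags, start=1):
        # Count earlier fragments whose end lies strictly before frag.start.
        p = bisect_right(ends, frag.start - 1, 0, i - 1)
        with_frag = best[p] + frag.score
        if with_frag > best[i - 1]:
            best[i], take[i], prev[i] = with_frag, True, p
        else:
            best[i] = best[i - 1]

    # Trace back the chosen fragments from the end of the table.
    chain: List[Fragment] = []
    i = n
    while i > 0:
        if take[i]:
            chain.append(frags[i - 1])
            i = prev[i]
        else:
            i -= 1
    chain.reverse()
    return best[n], chain


# Made-up coordinates and scores, purely for illustration.
pieces = [
    Fragment(100, 200, 5.0),
    Fragment(150, 300, 8.0),
    Fragment(250, 400, 4.0),
    Fragment(410, 500, 3.0),
]
total, chain = best_chain(pieces)
```

In this toy input the second fragment has the highest individual score but overlaps both of its neighbours, so the search prefers the three compatible pieces around it. This is the kind of trade-off a chaining step has to resolve at much larger scale with noisy RNA-Seq alignments.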