P1005 trainAUGUSTUS - A Webserver Application for Parameter Training and Gene Prediction in Eukaryotes

Katharina Hoff, University of Greifswald, Greifswald, Germany
Mario Stanke, University of Greifswald, Greifswald, Germany

AUGUSTUS is a tool for predicting genes in eukaryotic genomic sequences [1, 2]. Accurate gene prediction requires a species-specific set of parameters. Because the number of newly sequenced genomes is growing rapidly, an automated and easy-to-use procedure is needed to make gene prediction parameters available for new species. Gene prediction parameters are optimized using annotated genes from the species of interest. Such initial gene sets may be generated automatically, e.g. by aligning expressed sequence tags (ESTs) to genomic sequences, or by mapping protein-coding genes from other species to the genome. We present a web server application for creating high-quality training gene sets from ESTs or protein sequences. After finding training genes, the web server application optimizes AUGUSTUS parameters and makes predictions in the supplied genomic sequence, using the newly trained parameters and the supplied ESTs or protein sequences as external supporting evidence ("hints"). It is also possible to supply hints that were created externally, e.g. through manual editing or from RNA-Seq data alignments. The web server application is available at http://bioinf.uni-greifswald.de/trainaugustus
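
As an illustration of what an externally created hint might look like, the following minimal Python sketch writes a single hint as a tab-separated, GFF-style line. It assumes the nine-column layout and the `source`, `grp` and `pri` attribute keys of the AUGUSTUS hints format; the function name `format_hint`, its parameters, and the example values are hypothetical and are not part of the web server application. The exact format should be checked against the AUGUSTUS documentation before use.

```python
def format_hint(seqname, feature, start, end, strand=".",
                source="M", group=None, priority=None, program="example"):
    """Return one GFF-style hint line (hypothetical helper, not an AUGUSTUS API).

    Assumed columns: seqname, program, feature, start, end, score,
    strand, frame, attributes. Coordinates are 1-based and inclusive.
    """
    if start < 1 or end < start:
        raise ValueError("coordinates must be 1-based with start <= end")
    if strand not in {"+", "-", "."}:
        raise ValueError("strand must be '+', '-' or '.'")

    # Assumed attribute keys: source (evidence type, e.g. M for manual,
    # E for EST), grp (hints belonging together), pri (priority).
    attributes = [f"source={source}"]
    if group is not None:
        attributes.append(f"grp={group}")
    if priority is not None:
        attributes.append(f"pri={priority}")

    columns = [seqname, program, feature, str(start), str(end),
               ".", strand, ".", ";".join(attributes)]
    return "\t".join(columns)


if __name__ == "__main__":
    # An intron hint derived from a (made-up) EST alignment.
    print(format_hint("chr1", "intron", 1201, 1350,
                      strand="+", source="E", group="est_42"))
```

Writing one such line per hint into a plain text file gives a hints file in the assumed layout.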

[1] M. Stanke, S. Waack (2003) "Gene prediction with a hidden Markov model and a new intron submodel", Bioinformatics, 19(Suppl. 2):ii215-ii225

[2] M. Stanke, M. Diekhans, R. Baertsch, D. Haussler (2008) "Using native and syntenically mapped cDNA alignments to improve de novo gene finding", Bioinformatics, 24(5):637