Date: Tuesday, January 17, 2012
Time: 3:45 PM
Room: Golden Ballroom
Genome annotation is one of the most difficult tasks in genome sequencing projects, but it is essential for connecting genome sequence to biology. With the advent of next-generation sequencing technologies, new genomes are being sequenced faster than they can be fully and correctly annotated. To manage the large amount of data generated by sequencing projects for genomes larger than 1 Gb, sequence annotation needs to be automated. To achieve a systematic and comprehensive annotation of the bread wheat genome sequence, a parallelized automated pipeline called TriAnnot (http://www.clermont.inra.fr/triannot) has been developed under the umbrella of the IWGSC (http://www.wheatgenome.org) and installed on a cluster of 712 cores (60 TB, 8.5 Tflops). The goals of TriAnnot are to provide the international scientific community with a user-friendly online interface for simple BAC or BAC contig analysis and to facilitate large-scale analyses such as the annotation of the ~1 Gb wheat chromosome 3B sequence. The modular architecture of the TriAnnot pipeline supports the annotation of repeats and transposable elements (TEs), the structural and functional annotation of protein-coding genes, and the identification of RNA-coding genes and other biological features. The pipeline uses 73 databanks and 21 bioinformatics programs. EMBL/GFF output files can be displayed with GBrowse, Artemis, and GenomeView to support subsequent manual expert curation. The pipeline can be adapted to the annotation of other plant species. We will explain how to use TriAnnot for small-scale (web interface) or large-scale (Unix command line) analyses, and describe the different output files and their use through wheat case studies.
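As a rough illustration of how a GFF output file might be summarized before loading it into a viewer, the sketch below counts features by type. It assumes only the generic GFF layout (nine tab-separated columns, with the feature type in the third column); the function name and the example records are hypothetical and are not taken from TriAnnot or its actual output.

```python
from collections import Counter
from io import StringIO


def count_feature_types(handle):
    """Count how often each feature type (GFF column 3) occurs.

    `handle` is any iterable of text lines, e.g. an open file.
    """
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A generic GFF record has nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Made-up records for illustration only (not real TriAnnot output).
EXAMPLE_GFF = (
    "##gff-version 3\n"
    "contig1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "contig1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "contig1\texample\texon\t100\t400\t.\t+\t.\tParent=mrna1\n"
    "contig1\texample\texon\t600\t900\t.\t+\t.\tParent=mrna1\n"
)

if __name__ == "__main__":
    for feature_type, n in sorted(count_feature_types(StringIO(EXAMPLE_GFF)).items()):
        print(feature_type, n)
```

For a file on disk, the same function could be called as `count_feature_types(open(path))`, where `path` stands in for whatever GFF file the pipeline produced.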