P0617 Pig Genome 10.2 Annotation and Gene Prediction using mRNA and microRNA Sequencing Evidence

Frank Panitz, Aarhus University, Department of Molecular Biology and Genetics, Faculty of Science and Technology, Tjele, Denmark
Henrik Hornshoj, Aarhus University, Department of Molecular Biology and Genetics, Faculty of Science and Technology, Tjele, Denmark
Rasmus Ory Nielsen, Aarhus University, Department of Molecular Biology and Genetics, Faculty of Science and Technology, Tjele, Denmark
Mathilde Nielsen, Aarhus University, Department of Molecular Biology and Genetics, Faculty of Science and Technology, Tjele, Denmark
Bo Thomsen, Aarhus University, Department of Molecular Biology and Genetics, Faculty of Science and Technology, Tjele, Denmark
Christian Bendixen, Aarhus University, Department of Molecular Biology and Genetics, Faculty of Science and Technology, Tjele, Denmark
We contribute to the annotation of the Pig Genome version 10.2 by performing AUGUSTUS gene predictions, which can incorporate genome-aligned mRNA sequence evidence as hints to improve prediction. A large collection of porcine mRNA sequences was assembled from internal and external sources, including more than one billion in-house Illumina RNA-seq reads (muscle, liver, lung, brain, kidney, spleen, heart), 5.3 million in-house Roche 454/Sanger long EST reads (100 different tissues) and about 400 million Illumina RNA-seq reads from the Pinky Tabasco clone (pool of 10 different tissues). In addition, publicly available sequences were used: 19,039 Ensembl Known cDNAs (November 2, 2011), 3,310 NCBI RefSeq mRNAs (Release 49) and 1.3 million NCBI ESTs (UniGene build 41). The mRNA sequences were mapped with TOPHAT to a target database built from the assembled chromosomes (1-18, X, Y) of the NCBI genome sequence, with mapping options adjusted to the sequence type of each source. The TOPHAT alignment BAM files were then processed by the CUFFLINKS pipeline for genome-wide mRNA transcript assembly and generation of a single transcriptome hints file. Finally, AUGUSTUS gene prediction was performed chromosome-wise (chrID), generating non-alternatively spliced transcripts including UTRs. In total, AUGUSTUS produced 21,077 gene predictions, of which 18,328 map to 14,414 NCBI human RefSeq targets and 20,864 map to 16,754 NCBI mammalian RefSeq targets (TeraBLASTNH, e-value cutoff 1e-8). Further annotation of the Sus scrofa genome version 10.2 was performed by microRNA sequencing and gene prediction based on miRDeep analysis.
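The abstract does not describe how the CUFFLINKS transcript assembly was turned into the hints file, so the following is only a minimal sketch of one way that step could look. It assumes the assembly is available as GTF lines and that each assembled exon becomes an `exonpart` hint in the GFF-style format that AUGUSTUS accepts for extrinsic evidence. The function names, the `cufflinks` source label, the `E` source key and the priority value are illustrative choices, not details taken from the original work.

```python
import re
from collections import defaultdict


def gtf_exons_to_hints(gtf_lines, source_key="E", priority=4):
    """Convert GTF exon records into AUGUSTUS-style hint lines.

    Hypothetical helper: every 'exon' record becomes one 'exonpart' hint,
    grouped by its transcript_id. source_key and priority are assumed values.
    """
    hints = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # keep exon records only (skips 'transcript' rows)
        seqname, _, _, start, end, _, strand, _, attrs = fields
        match = re.search(r'transcript_id "([^"]+)"', attrs)
        group = match.group(1) if match else "unknown"
        hint_attrs = f"grp={group};pri={priority};src={source_key}"
        hints.append("\t".join(
            [seqname, "cufflinks", "exonpart", start, end, ".", strand, ".", hint_attrs]
        ))
    return hints


def split_hints_by_chromosome(hints):
    """Group hint lines by sequence name, mirroring a chromosome-wise run."""
    by_chrom = defaultdict(list)
    for hint in hints:
        by_chrom[hint.split("\t", 1)[0]].append(hint)
    return dict(by_chrom)


if __name__ == "__main__":
    # Made-up example records, not data from the study.
    example = [
        'chr1\tCufflinks\ttranscript\t100\t500\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
        'chr1\tCufflinks\texon\t100\t200\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1";',
        'chr2\tCufflinks\texon\t300\t500\t1000\t-\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1";',
    ]
    for chrom, lines in split_hints_by_chromosome(gtf_exons_to_hints(example)).items():
        print(chrom, len(lines))
```

Grouping the hints per chromosome reflects the chromosome-wise prediction described above: each group could be written to its own file and passed to a separate AUGUSTUS run, although the exact invocation used by the authors is not stated.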