W291 The genome of aspen (Populus tremula) - the most complex genome sequenced to date?

Date: Sunday, January 15, 2012
Time: 11:30 AM
Room: Sunrise
Stefan Jansson, Umeå Plant Science Centre, Dept of Plant Physiology, Umeå University, Umeå, Sweden
Populus species are dioecious, and available data indicate that aspen (Populus tremula) has higher levels of nucleotide polymorphism and allelic divergence than other tree species investigated. Since the natural Swedish aspen population also has unusually low levels of population structure, association mapping has great potential and gives within-gene resolution, perhaps allowing identification of quantitative trait nucleotides. We have sequenced the ca. 450 Mbp genome of aspen to >10X coverage using 454 and >100X using Illumina. The assembly is more complex than for other plant genomes sequenced to date because of the high abundance of rather large indels between haplotypes in intergenic regions. Even at this very high coverage, contigs typically do not span two genes; the NG50 of the current best assembly is 14.6 kbp, and matching the two divergent haplotypes to each other in intergenic regions is very hard. Therefore, a large fraction of the genome appears “haplotype-specific”. K-mer plots show that the two haplotypes of the sequenced individual are much more divergent than those of e.g. Vitis vinifera ‘Pinot noir’ or a P. nigra x P. deltoides hybrid. Only ca. 50% of the contigs map to the P. trichocarpa genome. Nevertheless, ≈97% of the transcripts predicted in the P. trichocarpa genome are recovered in the assembly, most in a single contig. Deep sequencing of transcripts is used to define the gene space; many assembled transcripts do not match predicted transcripts in the P. trichocarpa genome. The ongoing resequencing of our association mapping population allows us to extend these findings to the population level.
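
For readers unfamiliar with the NG50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and NG50 is the length of the contig at which the running total first reaches half of the estimated genome size (rather than half of the assembly size, as for N50). The function name and the toy contig lengths are hypothetical and for illustration only; they are not data from this assembly, and this is not necessarily the tool used by the authors.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it cannot be defined.

    contig_lengths: iterable of contig lengths in bp
    genome_size:    estimated genome size in bp (e.g. about 450e6 for aspen)
    """
    half_genome = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_genome:
            return length
    # The assembly covers less than half of the estimated genome size.
    return None


# Toy example (hypothetical lengths, not from the aspen assembly).
toy_contigs = [50, 40, 30, 20, 10]
print(ng50(toy_contigs, genome_size=200))
```

Because NG50 is normalised to the estimated genome size, it stays comparable between assemblies of the same genome even when haplotype-specific contigs inflate the total assembly length.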