P0059 Sequencing and Assembly of the Largest and Most Complex Genome to Date: the Norway Spruce (Picea abies)

Bjorn Nystedt, SciLifeLab, Stockholm, Sweden
The Spruce Genome Project
Conifers are the dominant plant species in many ecosystems, including large areas of Sweden. Despite this, no conifer genome has yet been published, mainly because conifer genomes are very large and complex. The lack of a genome sequence has hampered our understanding of conifer biology and evolution, as well as the development of potential novel breeding strategies for these economically important species.

We are currently performing whole-genome sequencing and assembly of the 20 Gbp Norway spruce genome. This genome contains large amounts of repetitive elements, with an estimated gene density of only one gene per 500 kbp. As in other tree genomes, heterozygosity is high, which further complicates the assembly process. The Spruce Genome Project is addressing questions of genome size, content and evolution, including analyses of gene families and repeats, and will establish Norway spruce as a prime model species for conifer research.

In this talk, we will present our main strategies for sequencing and assembling the Norway spruce genome and give an update on the results obtained so far. In brief, we use a combination of whole-genome shotgun and fosmid pool sequencing, followed by scaffolding and merging of the separate assemblies. This is complemented by a manually curated spruce-specific repeat library, sequencing of random fosmid clones for assembly benchmarking, and assemblies of the chloroplast and mitochondrial genomes.

With this approach we hope to rapidly produce an accurate and comprehensive genomic resource that will be of great value to the conifer community and the wider plant community.