P1028 Preliminary Data Analysis and Assembly of the Loblolly Pine Genome

Daniela Puiu, Johns Hopkins University School of Medicine, Baltimore, MD
Aleksey Zimin, University of Maryland, College Park, MD
Steven L Salzberg, Johns Hopkins University School of Medicine, Baltimore, MD
The Loblolly Pine Genome Project (LPGP) is part of the larger PineRefSeq project, and its goal is the sequencing, assembly, and annotation of the loblolly pine (Pinus taeda) genome. Development of a high-quality reference genome will serve as a model for assembling two other conifer genomes, Douglas fir (Pseudotsuga menziesii) and sugar pine (Pinus lambertiana). With 12 chromosomes, a genome size of ~24 Gbp, and high repeat content, loblolly pine presents a major challenge for genome assembly. We are sequencing the genome with Illumina technology, using a combination of whole-genome shotgun (WGS) and pooled fosmid libraries. Johns Hopkins University School of Medicine and the University of Maryland, College Park are currently working on sequence library QC, read correction, whole-genome and pooled-library assembly, and finishing of the chloroplast and mitochondrial genomes. We are evaluating existing read correction and assembly software as well as developing new methods targeted at this particular genome. We will present preliminary results from the assembly of four test datasets, including pools containing 500 and 1000 fosmids as well as WGS sequence from megagametophytes. Each dataset contains multiple libraries corresponding to different preparation conditions. We have assembled these datasets using two different genome assemblers, SOAPdenovo and MSR-CA, and we will describe the overall results as well as analyses of chloroplast and mitochondrial DNA. In initial testing, the genome appears to be highly repetitive, as expected given its unusually large size.