P0033 A Whole Genome Reference Sequence of Amborella

Srikar Chamala, University of Florida, Gainesville, FL
Brandon Walts, University of Florida, Gainesville, FL
Jamie Estill, University of Georgia, Athens, GA
Seunghee Lee, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Paula Ralph, Penn State University
Lynn P. Tomsho, Pennsylvania State University
Yeisoo Yu, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Joshua P. Der, Penn State University, University Park, PA
Victor A. Albert, University of Buffalo, Buffalo, NY
Claude dePamphilis, Penn State University, University Park, PA
Jim Leebens-Mack, University of Georgia, Athens, GA
Hong Ma, The Pennsylvania State University, University Park, PA
Jeff Palmer, University of Indiana
Steve Rounsley, University of Arizona/Dow Agrosciences, Indianapolis, IN
Stephan C. Schuster, Pennsylvania State University
Douglas E. Soltis, University of Florida, Gainesville, FL
Pamela S. Soltis, University of Florida, Gainesville, FL
Susan Wessler, University of California, Riverside
Rod A. Wing, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Brad Barbazuk, University of Florida, Gainesville, FL
Amborella trichopoda, as the sister to all other extant angiosperms, occupies a crucial evolutionary position, and its genome sequence is an important reference for comparative genomic studies across the angiosperms. A complete genome sequence will help in understanding the evolution of key angiosperm traits and provide a baseline for examining genome organization throughout the angiosperms. We are using a whole-genome shotgun strategy to sequence the ~790-980 Mbp Amborella genome on the Roche Genome Sequencer FLX and Illumina platforms. To date we have analyzed 28.5 plates of unpaired and five plates of paired GS FLX sequencing, along with 11 plates of unpaired FLX Titanium reads, which together provide ~21x coverage. De novo assembly of these 454 runs after filtering (28.5 plates of unpaired FLX, 5 plates of paired reads with 8 kbp inserts, and 11 plates of XLR), together with 69,466 BAC-end sequences, was performed using Newbler. This resulted in 10,967 scaffolds with a mean scaffold size of 66 kbp and an N50 of 3.5 Mbp, covering ~90% of the Amborella genome. Illumina reads are currently being used to provide additional scaffolding and error correction. Annotation of the assembled contigs is underway using DAWGPAWS and TWINSCAN. We are developing a GBrowse-based website to share these results with the community.
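
The gap between the reported mean scaffold size and the N50 is expected when an assembly contains a few very long scaffolds and many short ones. The following is a minimal sketch of how the two statistics are commonly computed from a list of scaffold lengths; it is not the Newbler implementation, and the function names and toy lengths are hypothetical, chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical toy assembly (lengths in bp), not Amborella data:
# three long scaffolds plus 200 short ones.
toy = [4_000_000, 3_500_000, 500_000] + [10_000] * 200

print("scaffolds:", len(toy))
print("mean (bp):", round(mean_length(toy)))
print("N50 (bp): ", n50(toy))
```

In this toy case the many short scaffolds pull the mean down to tens of kbp, while the N50 stays in the Mbp range because most bases sit in the few long scaffolds.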