W315 A Draft Assembly and Analysis of the Highly Heterozygous Diploid Red Raspberry Genome (Rubus idaeus cv. Heritage)

Date: Saturday, January 14, 2012
Time: 9:45 AM
Room: Pacific Salon 3
Judson A. Ward, Cornell University, Geneva, NY
Jared Calvin Price, Brigham Young University, Provo, UT
Mark Clement, Brigham Young University, Provo, UT
Michael Schatz, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
Courtney A. Weber, Cornell University, Geneva, NY
John David Swanson, Salve Regina University, Newport, RI
Paul Bodily, Brigham Young University, Provo, UT
Kimberly S. Lewers, USDA-ARS, Beltsville, MD
Felicidad Fernandez, East Malling Research, East Malling, Kent, United Kingdom
Paul D. Burns, Georgia Tech, Atlanta, GA
Riccardo Velasco, IASMA Research and Innovation center, Foundation E. Mach, Dept. of Genomics and Crop Biology, San Michele all'Adige, Italy
Dan Sargent, FEM-IASMA, San Michele all'Adige, Italy
Joshua Udall, Brigham Young University, Provo, UT
An improved draft genome sequence of the highly heterozygous diploid red raspberry variety ‘Heritage’ (R. idaeus subsp. vulgatus Arrhen. x R. idaeus subsp. strigosus Michx) is presented. Fragments from 400 bp to 20,000 bp in length were sequenced using a combination of 454 and Illumina technologies. The reads were assembled using Newbler to produce 14,867 scaffolds containing a total of 252,359,088 bases (Ns included) with an N50 scaffold size of 218,942 bp. The internal contig graph structure of the assembly corroborates genetic linkage studies that suggest an extremely high rate of heterozygosity. This extreme heterozygosity places several interesting demands on the assembly, including (1) distinguishing between sequences that are present on both chromosomes of a homologous pair and sequence variants that may be present on only one, and (2) ensuring that the contigs are consistently phased for a given chromosome. Efforts have been made to fill the intrascaffold gaps through the use of syntenic evidence from the closely related Fragaria vesca genome, additional paired-end reads, and high-density linkage maps. Preliminary gene finding with GeneMark ES+ estimates a gene content of 37,803 genes, which is close to the estimated 34,809 genes of F. vesca. Annotation of the predicted gene models demonstrates that most genes from conserved metabolic pathways, such as the tricarboxylic acid cycle and glycolysis, are present in the scaffolds. Novel algorithms being developed for this assembly will be of value for the assembly of other highly heterozygous genomes.
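For readers unfamiliar with the N50 statistic quoted in the abstract, the following is a minimal sketch of how it is commonly computed from a list of scaffold lengths. It is not taken from the authors' pipeline; the function name `n50` and the toy input are assumptions made purely for illustration. N50 is taken here as the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Hypothetical helper for illustration only: scaffolds are sorted from
    longest to shortest, and the function returns the length at which the
    running total first reaches half of all assembled bases.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (made-up lengths, not the raspberry assembly).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

In the toy example the lengths sum to 54, and the three longest scaffolds (10, 9, 8) together reach 27, so the N50 is 8. Applied to the 14,867 scaffold lengths of the assembly, the same calculation would be expected to give the reported 218,942 bp, assuming the authors used this conventional definition.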