W625 De novo assembly of complex genomes using 3rd generation sequencing

Date: Sunday, January 15, 2012
Time: 3:50 PM
Room: Pacific Salon 2
Michael Schatz, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
Emerging third-generation single-molecule sequencing instruments can generate much longer sequences than prior methods, with the potential to dramatically improve genome and transcriptome assembly for complex genomes. However, the high error rate of these reads makes de novo assembly challenging and has limited their use to specialized applications. To address these limitations, we introduce a novel sequence correction algorithm and assembly strategy that uses shorter, high-identity sequences to correct the inherent errors in long, single-molecule sequences. We demonstrate the utility of this approach on PacBio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the de novo assembly of yeast (Saccharomyces cerevisiae) and the novel genome of the parrot Melopsittacus undulatus, as well as on RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than any other sequencing strategy currently available, in many cases doubling the median contig size relative to high-coverage, second-generation assemblies.
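
The abstract does not describe the correction algorithm in detail, so the following is only a toy illustration of the general idea of correcting a long, error-prone read from shorter, high-identity reads. It assumes the short reads have already been placed on the long read at known offsets and that errors are substitutions only; real single-molecule reads also contain insertions and deletions, which this sketch ignores. The function name `correct_long_read`, the `(offset, short_read)` input format, and the `min_depth` parameter are hypothetical and are not taken from the authors' software.

```python
from collections import Counter


def correct_long_read(long_read, alignments, min_depth=2):
    """Toy majority-vote correction (hypothetical; substitution errors only).

    long_read  -- the error-prone long sequence
    alignments -- list of (offset, short_read) pairs; each short read is
                  assumed to align gaplessly starting at `offset`
    min_depth  -- minimum short-read coverage required to overwrite a base
    """
    # One vote counter per position of the long read.
    columns = [Counter() for _ in long_read]

    # Tally the base each short read supports at every covered position.
    for offset, short_read in alignments:
        for i, base in enumerate(short_read):
            pos = offset + i
            if 0 <= pos < len(long_read):
                columns[pos][base] += 1

    corrected = []
    for original_base, votes in zip(long_read, columns):
        if sum(votes.values()) >= min_depth:
            # Enough coverage: take the most frequently supported base.
            corrected.append(votes.most_common(1)[0][0])
        else:
            # Too little evidence: keep the long read's own base.
            corrected.append(original_base)
    return "".join(corrected)


# Usage: the long read has one substitution error at index 4 (T instead of A).
noisy = "ACGTTCGTAC"
short_reads = [(0, "ACGTACG"), (2, "GTACGTA"), (4, "ACGTAC")]
fixed = correct_long_read(noisy, short_reads)
```

Positions covered by fewer than `min_depth` short reads are left unchanged, which mirrors the intuition that correction is only as reliable as the short-read coverage behind it.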