P0989 Improving Draft Genome Assemblies Using Next-gen Data with Gap-Filling and Scaffolding Assembly Tools

Jixin Deng, Baylor College of Medicine
Jiaxin Qu, Baylor College of Medicine
Huaiyang Jiang, Baylor College of Medicine
Yue Liu, Baylor College of Medicine
Xiang Qin, Baylor College of Medicine
Xing Zhi Song, Baylor College of Medicine
Stephen Richards, Baylor College of Medicine, Houston, TX
Kim Worley, Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX
Richard A. Gibbs, Human Genome Sequencing Center, Baylor College of Medicine
Recent advances in sequencing technology allow large amounts of additional sequence data to be generated at relatively low cost, both to improve existing genome assemblies and to generate de novo assemblies. Extracting information from those data to produce a good genome assembly requires meticulous and effective methodologies and tools. We present two tools that work with initial assemblies in a de novo assembly process to improve scaffolding and fill gaps. Atlas-Link can produce a de novo scaffold structure, or upgrade the existing scaffold structure of a draft assembly, using mate pair sequencing data from different next-gen sequencing platforms in the standard BAM file format. Atlas-GapFill identifies mate pair data that can step into assembly gaps and locally assembles those data to fill the gaps in an existing scaffold or to extend contigs at the ends of a scaffold. The tools have been applied to Illumina, SOLiD and 454 sequencing data. These new modules of the Atlas assembly suite have significantly improved previously released draft genome assemblies for a number of projects at the BCM-HGSC, including several genomes of agriculturally important arthropod pests. Both Atlas-Link and Atlas-GapFill were used to produce assemblies of the Assemblathon 2 data for that community-wide comparison. Ongoing comparative genomics projects generating high-quality draft genomes for primates, mammals, cetaceans, echinoderms and arthropods are benefitting from these methods.
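The idea of mate pairs that "step into" a gap can be illustrated with a minimal Python sketch. This is not the Atlas-GapFill implementation, whose internals are not described here. It assumes a simple model in which one mate is anchored on the scaffold, its partner is unmapped, the library has a known insert size with some tolerance, and mates face each other (forward-reverse orientation). The names `Mate`, `steps_into_gap` and `select_gap_reads`, and all parameters, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mate:
    """Anchored mate of a pair (hypothetical record, not a real BAM API)."""
    name: str
    pos: int              # leftmost scaffold coordinate of the anchored mate
    length: int           # aligned length of the anchored mate
    forward: bool         # True if the anchored mate maps to the forward strand
    partner_mapped: bool  # True if the partner already aligns to the assembly


def steps_into_gap(m: Mate, gap_start: int, gap_end: int,
                   insert_size: int, tolerance: int) -> bool:
    """Return True if the unmapped partner is expected to overlap the gap.

    Assumption: forward-reverse orientation, so a forward anchor places its
    partner downstream and a reverse anchor places its partner upstream.
    The gap is the half-open interval [gap_start, gap_end).
    """
    if m.partner_mapped:
        return False  # partner is already placed, so it adds nothing to the gap
    if m.forward:
        # Fragment starts at the anchor; partner sits at the fragment's far end.
        lo = m.pos + insert_size - tolerance - m.length
        hi = m.pos + insert_size + tolerance
    else:
        # Fragment ends at the anchor's right edge; partner sits at its start.
        end = m.pos + m.length
        lo = end - insert_size - tolerance
        hi = end - insert_size + tolerance + m.length
    # Standard half-open interval overlap test.
    return lo < gap_end and hi > gap_start


def select_gap_reads(mates: List[Mate], gap_start: int, gap_end: int,
                     insert_size: int, tolerance: int) -> List[str]:
    """Names of pairs whose partner reads are candidates for local assembly."""
    return [m.name for m in mates
            if steps_into_gap(m, gap_start, gap_end, insert_size, tolerance)]


if __name__ == "__main__":
    mates = [
        Mate("r1", 600, 100, True, False),   # upstream anchor pointing into gap
        Mate("r2", 100, 100, True, False),   # too far upstream of the gap
        Mate("r3", 1500, 100, False, False), # downstream anchor pointing back
        Mate("r4", 600, 100, True, True),    # partner already mapped
    ]
    print(select_gap_reads(mates, 1000, 1200, insert_size=500, tolerance=50))
```

In this sketch, the selected partner reads would then be passed to a local assembler to build sequence across the gap. A real pipeline would read the anchors from a BAM file and would need to handle library-specific orientation and insert-size distributions.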