P0123 De novo Assembly of a Novel Filamentous Blue-green Algal Genome for Leptolyngbya sp. Strain BL0902 Enabled by a Novel, Extra-long Read Sequencing Protocol

Kevin Clancy, Life Technologies, Carlsbad, CA
Michael Laptewicz, Life Technologies, Carlsbad, CA
Steve A. Kay, Division of Biological Sciences, University of California San Diego, La Jolla, CA
Wally Zhang, Life Technologies, Carlsbad, CA
Arnaud Taton, Division of Biological Sciences, University of California San Diego, La Jolla, CA
John Bishop, Life Technologies, Carlsbad, CA
Ewa Lis, Life Technologies, Carlsbad, CA
Susan S. Golden, Division of Biological Sciences, University of California San Diego, La Jolla, CA
Todd Peterson, Life Technologies, Carlsbad, CA
James W. Golden, Division of Biological Sciences, University of California San Diego, La Jolla, CA
Gina Costa, Life Technologies, Carlsbad, CA
Assembling whole genome sequences de novo for previously unsequenced organisms is a computationally challenging process that is greatly simplified by the availability of long (>300 bp) sequencing reads. We describe here several reagent and protocol advances that provide >300 Mb of Q20 sequencing reads with >350 bp average read length from a single sensor-based sequencing run. We first assessed the accuracy of these novel methods by comparing a de novo assembly of reads produced from Synechococcus elongatus PCC 7942 to the existing Synechococcus reference genome. We subsequently produced and sequenced a ~400 bp insert library from a previously unsequenced filamentous cyanobacterium, Leptolyngbya sp. strain BL0902. Sensor-based Ion Torrent sequencing and existing open-source bioinformatics tools allowed us to complete both genome sequencing and de novo whole genome assembly of the ~5 Mb genome on a single Linux computer in less than one day. The >350 bp reads yielded a contig N50 >16,000 bp, with a largest contig of 105,000 bp. These values are considerably higher than those from a parallel assembly performed with ~100 bp reads. Gene ontology analysis and targeted PCR of the genome indicate the presence of many genes for primary and secondary metabolism, photosynthesis, and an interesting array of transport proteins. Comparison of predicted genes from the short-read, long-read, and combined short- and long-read assemblies is guiding the resolution of the assemblies as we move from de novo contigs to sequence scaffolds. The identification of the genes present within these de novo assembled contigs and their assignment to multiple pathway systems support the accuracy and completeness of the de novo assembly methodologies and sequencing approach. These reagents and techniques enable the rapid and inexpensive exploration of previously uncharacterized genomes by any lab.
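
For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed, together with a rough fold-coverage estimate from the round figures in the abstract (>300 Mb of reads, ~5 Mb genome). The function names `n50` and `fold_coverage` and the toy contig lengths are illustrative assumptions; they are not taken from the pipeline used in this work.

```python
def n50(contig_lengths):
    """Return the contig N50: the length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def fold_coverage(total_read_bases, genome_size):
    """Approximate average sequencing depth (bases sequenced per genome base)."""
    return total_read_bases / genome_size


# Toy contig lengths (hypothetical, not data from this study).
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))  # 8: the two longest contigs (10 + 8) cover >= 15 of 30 bases

# Round numbers from the abstract: ~300 Mb of reads over a ~5 Mb genome.
print(fold_coverage(300_000_000, 5_000_000))  # 60.0
```

Under these round numbers, a single run corresponds to roughly 60-fold coverage of the genome; the actual depth depends on the exact read yield and final genome size.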