Production of Long (1.5kb – 15.0kb), Accurate, DNA Sequencing Reads Using an Illumina HiSeq2000 to Support De novo Genome Assembly

Date: Saturday, January 12, 2013
Time: 3:50 PM
Room: Town and Country
Geoff Waldbieser, USDA - Agricultural Research Service, Stoneville, MS
Michael Kertesz, Moleculo, Inc., San Francisco, CA
Dmitry Pushkarev, Moleculo, Inc., San Francisco, CA
Tim Blauwkamp, Moleculo, Inc., San Francisco, CA
John Liu, Auburn University, Auburn, AL
Interspersed repeat sequences such as transposons and short tandem repeats fragment de novo genome assemblies because short sequencing reads cannot span the repeats. To produce a more contiguous genome assembly for the blue catfish, Ictalurus furcatus, we used Moleculo’s Long Reads product to generate long, accurate reads through Illumina-based sequencing of libraries produced from long genomic fragments. To date, 201,508 long reads (933Mb total) have been produced from two libraries. The reads range in length from 1.5kb to 15.8kb, and 83% of the total sequence is contained in 129,554 long reads of at least 3.0kb. Pairwise alignments revealed that 145,189 long reads contained at most one mismatched base along lengths of 400bp to 15,887bp. A preliminary assembly of the long reads alone, requiring 99% sequence identity in overlaps, produced 46,098 contigs with an N50 length of 12.9kb and an N80 length of 8.5kb. A further 42,141 long reads remained singlets (N50 = 7.0kb, N80 = 4.6kb). The long reads were also aligned with the 1.6kb Tip1 and 1.0kb Tip2 transposons of channel catfish. Twenty-one long reads contained an average of 2.1kb (minimum 200bp) of sequence flanking the Tip1 orthologs. Similarly, 226 long reads contained an average of 3.0kb of sequence flanking the Tip2 orthologs. These initial results demonstrate the utility of long, accurate DNA reads in bridging repetitive regions of the genome that cannot otherwise be resolved. Four additional Moleculo Long Read libraries are currently being sequenced to produce a further ~2Gb of sequence in long reads.
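The N50 and N80 statistics quoted above follow the usual definition: the length L such that sequences of length L or greater contain at least 50% (or 80%) of the total bases. The sketch below illustrates that calculation under this standard definition. The function name nx_length and the toy contig lengths are hypothetical and chosen only for illustration; they are not data from this study, and the abstract does not state which software computed its values.

```python
def nx_length(lengths, percent):
    """Return the Nx length for a collection of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least `percent` percent of all bases (N50 -> percent=50).
    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")

    ordered = sorted(lengths, reverse=True)  # longest first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that pushes the cumulative sum
        # to the requested share of the total.
        if running * 100 >= percent * total:
            return length
    return ordered[-1]  # not reached for valid input


# Hypothetical contig lengths in bp (illustration only, not study data).
toy_contigs = [10_000, 8_000, 6_000, 4_000, 2_000]
n50 = nx_length(toy_contigs, 50)
n80 = nx_length(toy_contigs, 80)
print(f"N50 = {n50} bp, N80 = {n80} bp")
```

Because N80 must cover a larger share of the total bases, it can never exceed N50 for the same set of sequences, which matches the ordering of the values reported for both the contigs and the singlets.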