P1010 PacBio Sequencing and Assembly of Complex BAC Clones

David Horvath, USDA-ARS, Fargo, ND
Victor Weigman, Expression Analysis Inc., Durham, NC
Saeed Salem, North Dakota State University, Fargo, ND
Shadi BaniTaan, North Dakota State University, Fargo, ND
PacBio sequencing is well suited to sequencing individual BAC clones. With read lengths averaging nearly 3,000 bases across more than 75,000 reads, sequencing a single BAC with an average insert size of 150 kb yields approximately 1,500X coverage (3,000 × 75,000 / 150,000). This high redundancy makes it possible to run multiple BACs in a single cell. At current costs of ~$650 for library construction and $550 per cell, sequencing a whole genome's worth of BAC clones could be an attractive alternative to whole-genome shotgun sequencing for previously unsequenced organisms, particularly since BAC clones can be prescreened for gene-containing regions. However, the high error rate of PacBio sequence data makes assembly difficult. We used PacBio sequencing on a series of BACs from leafy spurge (an auto-allohexaploid invasive weedy perennial) containing different FLOWERING LOCUS T (FT) family members and several DORMANCY ASSOCIATED MADS-BOX (DAM) genes that are suspected to be organized as tandemly repeated gene families. Six libraries were constructed from either a single BAC or pools of multiple BACs. Based on comparison to known vector sequences, the error rate of the sequencing data was nearly 14%. The actual redundancy, estimated from the presence of vector sequence, ranged from as little as 26X to slightly over 700X. Significant contamination with unknown sequences was observed. Because initial assemblies failed, various correction protocols were developed and tested. The results of these processing methods will be presented, as will information on the gene structure of FT and DAM genes in leafy spurge.
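The coverage and cost figures above come from simple arithmetic: total sequenced bases divided by insert length, and fixed costs divided by the number of clones sharing a cell. The Python sketch below reproduces that arithmetic and adds a vector-based error-rate calculation. It rests on several assumptions. The even-split pooling model, the 50X per-clone target, the pool size of 10, and the error-rate definition (mismatches plus inserted and deleted bases per aligned reference base) are all illustrative. None of them are the procedures used in this study, and all function names are hypothetical.

```python
"""Back-of-the-envelope estimates for PacBio sequencing of BAC clones.

A sketch under stated assumptions: the even-split pooling model, the target
coverage, and the error-rate definition are illustrative, not the study's methods.
"""


def expected_coverage(mean_read_len: float, n_reads: int, target_bp: float) -> float:
    """Fold coverage = total sequenced bases / length of the target sequence."""
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    return mean_read_len * n_reads / target_bp


def max_bacs_per_cell(mean_read_len: float, n_reads: int, insert_bp: float,
                      target_coverage: float) -> int:
    """Largest number of equal-sized BACs that could share one cell and still
    reach target_coverage, assuming (hypothetically) that reads are split
    evenly among the pooled clones."""
    if insert_bp <= 0 or target_coverage <= 0:
        raise ValueError("insert_bp and target_coverage must be positive")
    total_bases = mean_read_len * n_reads
    return int(total_bases // (insert_bp * target_coverage))


def cost_per_bac(library_cost: float, cell_cost: float, bacs_per_cell: int) -> float:
    """Per-clone cost when one pooled library is run on one cell."""
    if bacs_per_cell < 1:
        raise ValueError("bacs_per_cell must be at least 1")
    return (library_cost + cell_cost) / bacs_per_cell


def alignment_error_rate(mismatches: int, inserted: int, deleted: int,
                         aligned_ref_len: int) -> float:
    """Errors per aligned reference base, e.g. for reads aligned to a known
    vector sequence. This is one common convention, assumed here."""
    if aligned_ref_len <= 0:
        raise ValueError("aligned_ref_len must be positive")
    return (mismatches + inserted + deleted) / aligned_ref_len


if __name__ == "__main__":
    # Figures quoted in the abstract: ~3,000-base reads, 75,000 reads, 150 kb insert.
    print(expected_coverage(3000, 75000, 150_000))      # 1500.0
    # A hypothetical 50X target per clone leaves room for about 30 BACs per cell.
    print(max_bacs_per_cell(3000, 75000, 150_000, 50))  # 30
    # $650 library + $550 cell split across an illustrative pool of 10 BACs.
    print(cost_per_bac(650, 550, 10))                   # 120.0
```

One way to estimate observed per-clone redundancy is to apply `expected_coverage` to bases aligned to the vector backbone rather than to the nominal read totals. Each BAC clone carries one copy of the vector, so this ratio is the kind of measure behind the 26X to 700X range reported above.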