P0994 Genome Puzzle Master (GPM) - an Integrated Platform for Assembling

Jianwei Zhang, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Yeisoo Yu, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Seunghee Lee, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Jose Luis Goicoechea, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Kristi Collura de Baynast, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Dave Kudrna, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Marina Wissotski, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Rod A. Wing, Arizona Genomics Institute, University of Arizona, Tucson, AZ
Next-generation sequencing technology has revolutionized our ability to generate vast quantities of raw sequence data cheaply and quickly. Once generated, sequences are assembled into contigs, scaffolds, superscaffolds, and sometimes even whole pseudomolecules. What is missing from this process is the ability to easily and efficiently integrate 1) long-range scaffolding data (e.g. BAC end sequences), 2) BAC-based physical maps, and 3) molecular genetic markers, and then to edit “next-gen assemblies” to produce high-quality, “annotation-ready” assemblies that more accurately and completely reflect the structure of the genome under investigation. To fill this gap, we developed an integrated, semiautomated tool named “Genome Puzzle Master” (GPM). We first focused on two major functions: data management and information-guided assembly. GPM can currently use any kind of sequence data, including contigs, scaffolds, BAC end sequences, and reference genome sequences. The loaded data sets can be linked to one another through their relationships, and these links guide assembly operations including, but not limited to, grouping, merging, ordering, and orienting. The GPM assembly process takes into consideration as much information as possible from multiple data sources to create correctly ordered and oriented genome sequences. Most manual editing can be performed through a user-friendly graphical interface. GPM is a web-based platform built on a LAMP stack and can be easily deployed locally. The package will soon be available at www.genome.arizona.edu, and preliminary results will be discussed in detail.
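As an illustration only, the grouping step described above can be sketched as follows. This is not GPM's implementation; the function name `group_contigs` and the input format (a dictionary mapping each BAC clone to the two contigs hit by its end sequences) are assumptions made for this example. The sketch places contigs in the same group whenever a pair of BAC end sequences links them.

```python
from collections import defaultdict


def group_contigs(contigs, bac_end_hits):
    """Group contigs that are linked by paired BAC end sequences.

    contigs: iterable of contig names.
    bac_end_hits: dict mapping a BAC clone name to the pair of contigs
        hit by its two end sequences (hypothetical input format).
    """
    contigs = list(contigs)
    parent = {c: c for c in contigs}

    def find(c):
        # Follow parent pointers to the group representative,
        # shortening the path along the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for left, right in bac_end_hits.values():
        # A clone whose two ends hit different contigs links those contigs.
        if left != right:
            parent[find(left)] = find(right)

    groups = defaultdict(list)
    for c in contigs:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    hits = {
        "bacA": ("c1", "c2"),
        "bacB": ("c2", "c3"),
        "bacC": ("c4", "c4"),
    }
    print(group_contigs(["c1", "c2", "c3", "c4"], hits))
```

Ordering and orienting within a group would need additional evidence, such as hit positions and strands, physical-map order, or genetic markers, which this sketch deliberately leaves out.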