W370 Reference-Assisted Approach for Chromosome Assembly of Non-Classical Model Genomes Generated Using NGS Techniques

Date: Sunday, January 15, 2012
Time: 12:00 PM
Room: Pacific Salon 4-5 (2nd Floor)
Denis Larkin, Aberystwyth University, Aberystwyth, United Kingdom
Jaebum Kim, University of Illinois at Urbana-Champaign, Urbana, IL
Qingle Cai, Beijing Genomics Institute, Shenzhen, China
A. Asan, Beijing Genomics Institute, Shenzhen, China
Yongfen Zhang, Beijing Genomics Institute, Shenzhen, China
Ri-Li Ge, Beijing Genomics Institute, Shenzhen, China
Loretta Auvil, University of Illinois at Urbana-Champaign, Urbana, IL
Boris Capitanu, University of Illinois at Urbana-Champaign, Urbana, IL
Guojie Zhang, Beijing Genomics Institute, Shenzhen, China
Harris Lewin, University of California, Davis, Davis, CA
Jian Ma, University of Illinois at Urbana-Champaign, Urbana, IL
Next-generation sequencing (NGS) technologies, together with de novo assembly algorithms, have provided an unprecedented opportunity to unravel the genomes of different species at low cost. This trend will be further accelerated by the launch of large-scale genome projects such as the Genome 10K and i5k projects. However, because of the limited read length of NGS and the lack of a physical map for most target species, identifying the order and orientation on chromosomes of the sequence scaffolds produced by de novo assembly remains a pressing and largely unresolved challenge. To address this problem, we developed a novel computational method, called RACA, that further assembles the sequence scaffolds of a de novo genome assembly without using a genetic or physical map. Given the de novo genome assembly of a target species, the genome of a closely related species as a reference, and one or more outgroup genomes, RACA reconstructs a highly probable order and orientation of the scaffolds in the target species based on (i) the likelihood of scaffold adjacencies, taking genome evolution into account, and (ii) the coverage of paired-end reads from the target species. Simulation results indicated that our approach can be applied to any de novo assembled genome if a good reference assembly is available. We applied our method to the reconstruction of Tibetan antelope chromosome fragments based on the 1,434 scaffolds assembled by SOAPdenovo, using the cattle genome as the reference and the human genome as the outgroup. Our method was able to further assemble these scaffolds into 65 chromosome fragments, of which 15 correspond to complete cattle chromosomes. In addition, we were able to identify Tibetan antelope scaffolds that span 46 known evolutionary breakpoint regions, as well as 130 mis-assembled scaffolds in the de novo assembly.
As NGS read lengths increase with the implementation of new technologies, the chromosome assemblies obtained by our method will become even more accurate. We believe this method will significantly facilitate the study of chromosome evolution and genome rearrangement for the large number of genomes sequenced by NGS.
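
The abstract describes ordering and orienting scaffolds from two kinds of evidence: the evolutionary likelihood of each scaffold adjacency and paired-end read coverage. The toy sketch below illustrates that general idea only; it is not RACA's actual algorithm or code. The names `Adjacency`, `combined_score`, and `chain_scaffolds`, the linear weighting with `alpha`, the `min_score` cutoff, the greedy joining strategy, and all scores in the example are assumptions made for illustration.

```python
from collections import namedtuple

# A scaffold end is (scaffold_name, "h") for head or (scaffold_name, "t") for tail.
# p_evo and p_reads are hypothetical scores in [0, 1] for the two evidence types.
Adjacency = namedtuple("Adjacency", "end_a end_b p_evo p_reads")


def combined_score(adj, alpha=0.5):
    """Weighted mix of the two evidence scores (alpha is an assumed parameter)."""
    return alpha * adj.p_evo + (1 - alpha) * adj.p_reads


def chain_scaffolds(scaffolds, adjacencies, alpha=0.5, min_score=0.5):
    """Greedily join scaffold ends, best score first, into linear chains."""
    parent = {s: s for s in scaffolds}  # union-find, used to reject cycles

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    link = {}  # maps a joined scaffold end to its partner end
    ranked = sorted(adjacencies, key=lambda a: combined_score(a, alpha), reverse=True)
    for adj in ranked:
        if combined_score(adj, alpha) < min_score:
            break  # all remaining candidates score lower still
        a, b = adj.end_a, adj.end_b
        if a in link or b in link:
            continue  # each end can take part in at most one adjacency
        root_a, root_b = find(a[0]), find(b[0])
        if root_a == root_b:
            continue  # joining would close a cycle
        parent[root_a] = root_b
        link[a], link[b] = b, a

    other = {"h": "t", "t": "h"}
    chains, seen = [], set()
    for s in scaffolds:
        if s in seen:
            continue
        free = [e for e in ("h", "t") if (s, e) not in link]
        if not free:
            continue  # interior scaffold, reached when its chain is walked from an end
        chain, name, entry = [], s, free[0]
        while True:
            seen.add(name)
            # Entering at the head means the scaffold is read in forward orientation.
            chain.append((name, "+" if entry == "h" else "-"))
            exit_end = (name, other[entry])
            if exit_end not in link:
                break
            name, entry = link[exit_end]
        chains.append(chain)
    return chains


# Made-up example: four scaffolds and four candidate adjacencies.
example = [
    Adjacency(("A", "t"), ("B", "h"), 0.9, 0.8),
    Adjacency(("B", "t"), ("C", "t"), 0.7, 0.9),
    Adjacency(("A", "t"), ("C", "h"), 0.6, 0.6),
    Adjacency(("C", "h"), ("D", "h"), 0.2, 0.3),
]
chains = chain_scaffolds(["A", "B", "C", "D"], example)
print(chains)
```

In this made-up example, A, B, and C are joined into one chain with C reversed, and D stays on its own because its only candidate adjacency falls below the cutoff. A real pipeline would derive the two scores from whole-genome alignments and read mapping rather than from hand-entered numbers.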