W112 Status of the Buffalo Genome

Date: Saturday, January 14, 2012
Time: 8:15 AM
Room: Pacific Salon 4-5 (2nd Floor)
Giordano Mancini, Università della Tuscia, Viterbo, Italy
Tommaso Biagini, CASPUR - InterUniversity Consortium for Supercomputing Applications, Rome, Italy
Giovanni Chillemi, CASPUR - InterUniversity Consortium for Supercomputing Applications, Rome, Italy
Francesco Strozzi, Parco Tecnologico Padano, Lodi, Italy
John Williams, Fondazione Parco Tecnologico Padano, Lodi, Italy
Steven G. Schroeder, BFGL, ARS-USDA, Beltsville, MD
Aleksey Zimin, University of Maryland, College Park, MD
The genome of the water buffalo, Bubalus bubalis, was sequenced using two different Next Generation Sequencing platforms: Roche 454 and Illumina Genome Analyzer II. Most of the coverage comes from Illumina short paired-end libraries, which were used to build contigs. The assembly used the 454 20 kb paired-end data and the Illumina “jump library” 5 kb paired ends to merge contigs and build scaffolds. We started with ~900M Illumina reads (of which about 35% were from 5 kb jump libraries) and 13M 454 reads (of which about 30% were mate pairs). The total clone coverage was more than 40X. The water buffalo genome was assembled using the MSR-CA software pipeline developed by Zimin et al., which pre-processes the short-read data and then performs the final assembly. MSR-CA includes newly developed software for the error correction of reads, the Jellyfish k-mer counter, and an ad hoc modified version of the CABOG assembler (Celera Assembler with Best Overlap Graph). This method combines the benefits of the de Bruijn graph and overlap-layout-consensus assembly approaches.
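The role of k-mer counting in read error correction can be illustrated with a minimal Python sketch. It is not the Jellyfish or MSR-CA implementation: the function names, the toy reads, and the count threshold are assumptions chosen for illustration only. The idea it shows is the general one: k-mers that occur very rarely across a deeply sequenced read set are more likely to contain sequencing errors than k-mers seen many times.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_canonical_kmers(reads, k):
    """Count canonical k-mers across all reads.

    A k-mer and its reverse complement are counted together under
    whichever of the two sorts first, so strand does not matter.
    K-mers containing characters other than A, C, G, T are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # ambiguous base such as N
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def rare_kmers(counts, min_count):
    """Return the k-mers seen fewer than min_count times.

    The threshold is a hypothetical parameter; real pipelines derive
    it from the k-mer count distribution of the data.
    """
    return {kmer for kmer, n in counts.items() if n < min_count}


if __name__ == "__main__":
    # Toy reads: the third differs from the first two by one base.
    reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
    counts = count_canonical_kmers(reads, k=4)
    print(sorted(rare_kmers(counts, min_count=2)))
```

In this toy input the single-base difference in the third read produces k-mers that occur only once, while k-mers shared by all reads accumulate higher counts. A dedicated counter such as Jellyfish is needed at the scale of hundreds of millions of reads, where an in-memory Python dictionary would not be practical.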