W063 Avocado genome sequencing project

Date: Sunday, January 15, 2012
Time: 10:20 AM
Room: Pacific Salon 1
Enrique Ibarra-Laclette, Laboratorio Nacional de Genomica para la Biodiversidad del Centro de Investigacion y de Estudios Avanzados del IPN
Alfonso Mendez-Bravo, Laboratorio Nacional de Genomica para la Biodiversidad del Centro de Investigacion y de Estudios Avanzados del IPN
Anahi Perez-Torres, Laboratorio Nacional de Genomica para la Biodiversidad del Centro de Investigacion y de Estudios Avanzados del IPN
Gustavo Hernandez, Laboratorio Nacional de Genomica para la Biodiversidad del Centro de Investigacion y de Estudios Avanzados del IPN
Victor A. Albert, University at Buffalo (SUNY), Buffalo, NY
Luis Herrera-Estrella, Laboratorio Nacional de Genomica para la Biodiversidad del Centro de Investigacion y de Estudios Avanzados del IPN, Guanajuato, Mexico
Avocado (Persea americana) is a major fruit crop with high nutritional and industrial value. It belongs to the Lauraceae family and is one of the few crop plants among the basal angiosperms. Genomic characterization of avocado will provide an opportunity to examine the developmental genetics of fleshy fruits and some of the mechanisms of early evolutionary adaptation in flowering plants. To generate a de novo draft sequence of the avocado genome, we combined several large-scale sequencing strategies: conventional paired-end (PE) Sanger sequencing of BACs on an ABI 3730xl sequencer; single-end (SE) and PE 454 reads from Roche GS platforms; and PE reads from a SOLiD sequencer. First, we constructed a BAC library, from which we generated 55,824 paired-end Sanger sequences (approximately 0.03-fold genome sequence coverage). We then used sheared genomic DNA to generate 34 read sets on Roche GS-Titanium and GS-Plus sequencers, producing about 43 million reads with average lengths of 338.84 and 583.87 bp, respectively (16.1-fold genome sequence coverage). We supplemented these data sets with about six million paired-end 454 reads (approximately 2-fold genome sequence coverage; sequencing libraries were constructed with insert sizes of 3 and 8 kbp). Finally, we generated around 125 million short PE reads using the SOLiD system (approximately 6.2 Gbp of sequence data; 6.4-fold genome sequence coverage). In total, our read collection represents 24.5-fold redundant coverage of the Persea americana var. drymifolia genome. In the near future, we expect to complete the full genome assembly and gene annotation by integrating both transcriptomic and genomic sequences. In this paper I will present the advances in the transcriptome analysis and in genome assembly and annotation.
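The coverage figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the authors' pipeline: the dictionary keys and function names are hypothetical, and the genome size is not stated in the abstract, so the value computed here is only what the rounded SOLiD figures (6.2 Gbp at 6.4-fold) would imply.

```python
# Per-library fold coverage, copied from the figures reported in the abstract.
# The labels are illustrative names chosen for this sketch.
coverage_by_library = {
    "BAC paired-end Sanger": 0.03,
    "454 shotgun (GS-Titanium and GS-Plus)": 16.1,
    "454 paired-end (3 and 8 kbp inserts)": 2.0,
    "SOLiD paired-end": 6.4,
}


def total_coverage(coverages):
    """Sum per-library fold coverage into a combined fold coverage."""
    return sum(coverages.values())


def implied_genome_size_gbp(sequenced_gbp, fold_coverage):
    """Genome size implied by coverage = sequenced bases / genome size.

    This is an estimate from rounded inputs, not a measured genome size.
    """
    return sequenced_gbp / fold_coverage


total = total_coverage(coverage_by_library)
genome_gbp = implied_genome_size_gbp(6.2, 6.4)

print(f"Combined coverage: {total:.2f}-fold")
print(f"Genome size implied by the SOLiD figures: {genome_gbp:.2f} Gbp")
```

The per-library values sum to the reported 24.5-fold total once rounded to one decimal place, and the SOLiD figures imply a genome of slightly under 1 Gbp, under the assumption that coverage was computed as sequenced bases divided by genome size.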