W178: Long-Read Deep Sequencing and Assembly of the Allotetraploid Coffea arabica cv. Caturra and its Maternal Ancestral Diploid Species Coffea eugenioides
Time: 4:20 PM
We collaborated with Pacific Biosciences on the construction of three long-insert coffee genomic libraries, BluePippin size selection, QC, high-throughput PacBio sequencing, and de novo assembly of the allotetraploid Coffea arabica cv. Caturra genome. This genotype was selected because we have already developed a high-density molecular genetic map (>1,000 sequence-based markers), a BAC library (11X coverage, 114,816 clones) that has been completely BAC-end sequenced with Sanger technology and fingerprinted, comprehensive transcriptomics data (>125,000 Sanger-sequenced ESTs), and several populations that are currently being phenotyped for climate change adaptation. High-throughput PacBio sequencing of the libraries generated 73.54 Gb of post-filter data with a read N50 of 12-15 kb, for ~57-60X coverage of the C. arabica genome (1,300 Mb). These data have been used for a first assembly of the allotetraploid C. arabica genome.
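As a rough check on the figures above, the sketch below shows how sequencing depth and read N50 are conventionally computed. The function names and the toy read lengths are illustrative assumptions, not part of the authors' pipeline. With the quoted yield (73.54 Gb) and genome size (1,300 Mb), the simple ratio comes out at about 56.6X, close to the lower end of the quoted ~57-60X range.

```python
def coverage(total_bases, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def read_n50(lengths):
    """Read N50: the length L such that reads of length >= L
    contain at least half of all sequenced bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("read_n50 requires at least one read length")


# Figures quoted in the abstract: 73.54 Gb of data, 1,300 Mb genome.
depth = coverage(73.54e9, 1.3e9)
print(f"Estimated depth: {depth:.1f}X")

# Toy read lengths (hypothetical), only to illustrate the N50 definition.
toy_reads = [15000, 12000, 8000, 5000, 2000]
print(f"Toy read N50: {read_n50(toy_reads)} bp")
```

The same ratio applies to any of the coverage values reported here, provided the genome size estimate and the post-filter yield are expressed in the same units.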
To guide and validate the allotetraploid assembly, we will use high-quality assemblies of the genomes of its two diploid ancestral species. We collaborated with Roche to generate the first high-quality de novo assembly of the maternal ancestor of C. arabica, the diploid species C. eugenioides (660 Mb). We used Roche 454 FLX+ (average read length 750 bp) to sequence a WGS library at 9X coverage, 454 Titanium (paired-end reads) to sequence twelve 20 kb insert libraries at 3.03X coverage, and Illumina Moleculo libraries (10 kb fragments) sequenced with paired-end reads at 3.43X coverage. Our strategy mimicked the sequencing strategy used for the recently published high-quality assembly of the diploid paternal ancestor of C. arabica, the cultivated species C. canephora (Denoeud et al. 2014. Science 345: 1181-1184; genome assembly available at: http://coffee-genome.org). Transcriptome assembly and GBS studies to validate the genome assemblies and anchor scaffolds to chromosomes for C. arabica and C. eugenioides are ongoing. They should dramatically improve our understanding of coffee genetics and genomics and provide direct applications to breeders for climate change adaptation. Integrating genomic studies of equivalent quality across the allotetraploid C. arabica and its diploid progenitors will maximize scientific insights into the complex biology of polyploids.
This work is co-funded by IDB/FONTAGRO, FNC/CENICAFE, Colombia/MinAgriculture.
This abstract will be co-presented, with extended time, by co-authors Gaitán, Cristancho and Góngora.