Dihaploid Coffea arabica Genome Sequencing and Assembly
Date: Sunday, January 11, 2015
Time: 5:20 PM
Room: Esquire - Meeting House
Coffea arabica, which accounts for 70% of world coffee production, is an allotetraploid with a genome size of approximately 1.3 Gb, derived from the hybridization of C. canephora (710 Mb) and C. eugenioides (670 Mb). To elucidate the evolutionary history of C. arabica and generate critical information for breeding programs, a sequencing project is underway to finalize a reference genome using a dihaploid line and a set of 30 C. arabica accessions. For the reference genome, we have generated two assemblies, one from Illumina data (>150x coverage) and a second from PacBio sequences (>50x coverage). The present assemblies cover 1,031 and 1,042 Mb, respectively. After further refinement using Illumina mate pairs and optical mapping, the genome assemblies will be annotated using RNA-Seq. Resequencing of C. eugenioides and C. canephora has been completed and is being used to better assess homeologs within the sub-genomes. Furthermore, 30 C. arabica accessions, representing wild and cultivated genotypes, are being resequenced (20x coverage) using Illumina. A C. arabica genetic map, currently including over 600 SSR markers that differentiate between the two sub-genomes, is being used to anchor the assemblies. Newly identified SNP markers are being added to the map.
The final goals of the project are to produce a high-quality reference genome, assess any neo-diversification that may have occurred in the cultivated varieties, better understand the formation and evolution of the species, and develop tools that will make the finished genome accessible and useful to breeders and researchers.
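
As a rough illustration of the figures quoted in this abstract, the sketch below expresses each assembly as a percentage of the estimated genome size and sums the two progenitor genome sizes. It assumes the "approximately 1.3 Gb" estimate can be treated as 1,300 Mb; the variable and function names are illustrative and not part of the project. "Percent of genome" here means assembled length over estimated genome size, not sequencing depth.

```python
# Figures quoted in the abstract, all in Mb.
# Assumption: "approximately 1.3 Gb" is taken as exactly 1300 Mb.
ESTIMATED_GENOME_MB = 1300
PROGENITOR_MB = {"C. canephora": 710, "C. eugenioides": 670}
ASSEMBLY_MB = {"Illumina": 1031, "PacBio": 1042}


def percent_of(part_mb, whole_mb):
    """Return part_mb as a percentage of whole_mb."""
    return 100.0 * part_mb / whole_mb


# Combined size of the two progenitor genomes.
progenitor_sum_mb = sum(PROGENITOR_MB.values())

# Share of the estimated genome represented by each assembly, to one decimal.
assembly_percent = {
    name: round(percent_of(size_mb, ESTIMATED_GENOME_MB), 1)
    for name, size_mb in ASSEMBLY_MB.items()
}

if __name__ == "__main__":
    print(f"Progenitor genomes combined: {progenitor_sum_mb} Mb")
    for name, pct in assembly_percent.items():
        print(f"{name} assembly: {pct}% of the estimated genome size")
```

Under that assumption, both assemblies represent roughly 80% of the estimated genome, and the two progenitor genomes together (1,380 Mb) are slightly larger than the C. arabica estimate.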