W178
Long-Read Deep Sequencing and Assembly of the Allotetraploid Coffea arabica cv. Caturra and its Maternal Ancestral Diploid species Coffea eugenioides

Date: Sunday, January 11, 2015
Time: 4:20 PM
Room: Esquire - Meeting House
Alvaro Gaitan, Centro Nacional de Investigaciones de Cafe, CENICAFE, Chinchiná, Colombia
Marco A. Cristancho, Centro Nacional de Investigaciones de Cafe, CENICAFE, Chinchiná, Colombia
Carmenza E. Gongora, Centro Nacional de Investigaciones de Cafe, CENICAFE, Chinchiná, Colombia
Pilar Moncada, Centro Nacional de Investigaciones de Cafe, CENICAFE, Chinchiná, Colombia
Huver Posada, Federacion Nacional de Cafeteros de Colombia (FNC)/ Centro Nacional de Investigaciones de Café (CENICAFE), Chinchina, Caldas, Colombia
Fernando Gast, Centro Nacional de Investigaciones de Cafe, CENICAFE, Chinchiná, Colombia
Marcela Yepes, Cornell University/ School of Integrative Plant Sciences/ Plant Pathology and Plant Microbe Biology Section, Geneva, NY
Herb Aldwinckle, Cornell University/ School of Integrative Plant Sciences/ Plant Pathology and Plant Microbe Biology Section, Geneva, NY
Over the last decade, climate change has caused major reductions in coffee production through increased incidence of insect pests and diseases and through abiotic stresses, threatening sustainable coffee production at a global scale. In Latin America, the coffee leaf rust epidemic has had a devastating effect, with losses of >1 billion dollars that threaten coffee production and food security for many small coffee farmers. During 2008-2011, coffee leaf rust reduced the coffee harvest in Colombia by nearly one third, and between 2012 and 2014 it caused a >50% reduction in production in Central America, affecting more than 5 million people. Peru was hit particularly hard in 2014. Our research is focused on de novo sequencing and assembly of the coffee genome to accelerate adaptation of the crop to climate change.

We collaborated with Pacific Biosciences for the construction of three long-insert coffee genomic libraries, BluePippin size selection, QC, and high-throughput PacBio sequencing, as well as de novo assembly of the allotetraploid Coffea arabica cv. Caturra genome. This genotype was selected because we have already developed a high-density molecular genetic map (>1,000 sequence-based markers) and a BAC library (11X coverage, 114,816 clones) that has been completely BAC end-sequenced using Sanger and fingerprinted, as well as comprehensive transcriptomics data (>125,000 Sanger-sequenced ESTs) and several populations that are currently being phenotyped for climate change adaptation. High-throughput PacBio sequencing of the libraries generated 73.54 Gb of post-filtered data with a read N50 of 12-15 kb, for ~57-60X coverage of the C. arabica genome (1,300 Mb). These data have been used for a first assembly of the allotetraploid C. arabica genome.
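
The coverage figure above can be checked with simple arithmetic. The following is a minimal sketch, not the pipeline used for the assembly: the function names `coverage` and `read_n50` are hypothetical, and the read lengths in the N50 example are made-up toy values, since the actual read-length distribution is not given here.

```python
def coverage(total_bases, genome_size):
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def read_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no reads supplied")


# Figures quoted in the abstract: 73.54 Gb of filtered data, 1,300 Mb genome.
arabica_cov = coverage(73.54e9, 1300e6)
print(f"C. arabica coverage: {arabica_cov:.1f}X")

# Toy read lengths (illustrative only, not real data).
toy_n50 = read_n50([2000, 4000, 6000, 8000, 10000])
print(f"toy read N50: {toy_n50} bp")
```

With the quoted totals this gives about 56.6X, which matches the lower end of the ~57-60X range stated above.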

To guide and validate the allotetraploid assembly, we will use high-quality assemblies of the genomes of its two diploid ancestral species. We collaborated with Roche to generate the first high-quality de novo assembly of the maternal ancestor of C. arabica, the diploid species C. eugenioides (660 Mb). We used Roche 454 FLX+ (average read length 750 bp) to sequence a WGS library at 9X coverage, and 454 Titanium (paired-end reads) to sequence twelve 20 kb insert libraries at 3.03X coverage, as well as Illumina Moleculo (10 kb fragments), paired-end sequenced at 3.43X coverage. Our strategy mimicked the sequencing strategy used for the high-quality assembly of the diploid paternal ancestor of C. arabica, the cultivated species C. canephora, which was recently published (Denoeud et al. 2014. Science 345: 1181-1184; genome assembly available at: http://coffee-genome.org). Transcriptome assembly and GBS studies to validate the genome assemblies and anchor scaffolds to chromosomes for C. arabica and C. eugenioides are ongoing, and should dramatically improve our understanding of coffee genetics and genomics, providing direct applications to breeders for climate change adaptation. Integration of genomic studies of equivalent quality among the allotetraploid C. arabica and its diploid progenitors will maximize scientific insights into the complex biology of polyploids.
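
As a rough illustration of what the C. eugenioides coverage figures imply, the sketch below converts each quoted depth into an approximate data volume for a 660 Mb genome. It assumes that all three figures are sequence coverage (not physical or clone coverage of the paired-end inserts), which the abstract does not state; the dictionary keys and variable names are hypothetical labels.

```python
GENOME_SIZE_BP = 660e6  # C. eugenioides genome size quoted in the abstract

# Coverage per data set as quoted; treating each as sequence coverage is an assumption.
library_coverage = {
    "454 FLX+ WGS": 9.0,
    "454 Titanium 20 kb paired-end": 3.03,
    "Illumina Moleculo": 3.43,
}

# Implied bases per data set: coverage multiplied by genome size.
implied_bases = {name: cov * GENOME_SIZE_BP for name, cov in library_coverage.items()}
total_coverage = sum(library_coverage.values())

for name, bases in implied_bases.items():
    print(f"{name}: ~{bases / 1e9:.2f} Gb")
print(f"combined coverage: ~{total_coverage:.2f}X")
```

Under that assumption the three data sets together amount to roughly 15.5X of the diploid genome.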

This work is co-funded by IDB/FONTAGRO, FNC/CENICAFE, Colombia/MinAgriculture.

This abstract will be co-presented, with extended time, by co-authors Gaitán, Cristancho and Góngora.