Branching Out – Expanding the Sequencing of the Mammalian Tree

Johnson, Jeremy

The Broad Institute has been sequencing a large number of vertebrate genomes with the goals of annotating the human genome, understanding vertebrate genome evolution and leveraging model organisms. These goals align well with some of the goals of the Genome 10K (G10K) community. The analysis of 29 mammalian genomes has identified 3.6 million conserved elements, accounting for ~4.2% of the human genome. Sequence analysis and comparison with other datasets have allowed a candidate function to be assigned to up to 60% of these elements, including a rich annotation of hundreds of novel RNA structures and synonymous constraint elements within coding genes that are likely involved in gene regulation. We estimate that 150-200 mammals will be needed to develop a map of constraint at single-base resolution. The Broad has started on this quest by sequencing an additional 30 mammals selected in collaboration with the G10K community. Progress has been rapid: more than half of these genomes have already been sequenced using ~80x Illumina data, including Fosmid links. Broad Illumina genomes are assembled with ALLPATHS-LG and achieve a quality rivaling Sanger "deep" coverage draft sequencing in terms of connectivity, accuracy and completeness. All assemblies are made public by submission to NCBI. In addition to providing high-quality de novo Illumina genomes, the Broad has also exploited RNA-Seq as a powerful tool for genome annotation and transcriptome analysis for selected species. We have also provided assistance to a number of communities with more targeted research questions. Here we present the status of mammalian genome sequencing at the Broad Institute.

P0078 Branching Out – Expanding the Sequencing of the Mammalian Tree