W158 Sheep Genome Project: Gap Filling and Unbalanced Allelic Expression

Date: Saturday, January 14, 2012
Time: 9:55 AM
Room: San Diego
Yu Jiang, CSIRO Livestock Industries, on behalf of the International Sheep Genomics Consortium, Brisbane, Australia
A de novo assembly of the sheep genome (Oarv2.0) has been constructed using 120-fold Illumina sequence from a female and a male Texel. The genome assembly was publicly released in early 2011 (http://www.livestockgenomics.csiro.au/sheep/). To design an efficient gap-filling and finishing strategy, the size and distribution of both intra- and inter-scaffold gaps were analyzed. Comparative genomics was used to evaluate the gaps separating 4430 scaffolds mapped to chromosomes. This revealed a median inter-scaffold gap length of 760 bp, with these gaps spanning 42 Mb of sequence. Analysis within scaffolds revealed a much larger number of gaps (158,000, with a total length of 123 Mb), representing 4.7% of the genome. Most gaps within scaffolds arise from repeat elements, but GC-rich sequences at the 5’ ends of genes are also a significant problem, which has implications for the consortium’s proposed gap-filling strategy. To explore the transcriptional complexity of the genome, 15 Gb of RNA-seq data were collected from seven tissues harvested from the female Texel used to build Oarv2.0. Preliminary analysis focused on 5 million SNPs that are heterozygous within Oarv2.0. The RNA-seq data were examined to detect genes containing SNPs where only one allele appears to be expressed. This revealed 636 genes, some of which are clustered in genomic regions likely to be under coordinated control. One example is the DLK1 cluster, which resides in a 220 kb region of chromosome 18 containing 194 adjacent mono-allelically expressed SNPs. See: http://www.sheephapmap.org
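
The abstract does not describe how mono-allelic expression was called, so the following is only an illustrative sketch of one way heterozygous SNPs could be screened using RNA-seq allele counts. The `SnpCounts` record, the depth and minor-allele thresholds, and the minimum run length are all hypothetical choices for this example and are not the consortium's method or parameters.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """RNA-seq read counts at one heterozygous SNP (hypothetical record)."""
    chrom: str
    pos: int
    ref_reads: int
    alt_reads: int


def is_monoallelic(snp, min_depth=20, max_minor_fraction=0.05):
    """Flag a SNP when nearly all reads carry a single allele.

    Both thresholds are assumptions chosen for illustration only.
    """
    depth = snp.ref_reads + snp.alt_reads
    if depth < min_depth:
        return False  # too few reads to judge either way
    minor = min(snp.ref_reads, snp.alt_reads)
    return minor / depth <= max_minor_fraction


def monoallelic_runs(snps, min_run=3):
    """Group adjacent flagged SNPs on the same chromosome into runs.

    Any SNP that is not flagged (including low-depth SNPs) ends the
    current run; runs shorter than min_run are discarded.
    """
    runs, current = [], []
    for snp in sorted(snps, key=lambda s: (s.chrom, s.pos)):
        flagged = is_monoallelic(snp)
        if flagged and (not current or current[-1].chrom == snp.chrom):
            current.append(snp)
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = [snp] if flagged else []
    if len(current) >= min_run:
        runs.append(current)
    return runs


# Toy counts (invented for the example, not data from the study).
snps = [
    SnpCounts("18", 100, 30, 0),
    SnpCounts("18", 200, 0, 25),
    SnpCounts("18", 300, 40, 1),
    SnpCounts("18", 400, 20, 18),  # both alleles expressed, ends the run
    SnpCounts("1", 50, 50, 0),     # isolated SNP, too short to form a run
]
runs = monoallelic_runs(snps)
for run in runs:
    print(run[0].chrom, run[0].pos, run[-1].pos, len(run))
```

A real analysis would also need to account for reference-mapping bias, sequencing error, and tissue-specific expression, none of which this sketch attempts to model.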