P0129 SNP-calling in the Zebrafish Mutation Project

Richard White, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Ian Sealy, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Ross Kettleborough, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Steve Harvey, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Elisabeth Busch-Nentwich, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Colin Herd, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Fruzsina Fenyes, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
Derek Stemple, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
The Zebrafish Mutation Project (ZMP) aims to create a knockout allele in every protein-coding gene in the zebrafish genome, using a combination of whole-exome enrichment and Illumina next-generation sequencing. Samples of genomic DNA from F1 mutagenised individuals are exome-enriched using an Agilent SureSelect assay designed from the gene build of the Zv8 zebrafish genome assembly, and are then sequenced on the Illumina platform. Continual improvements in sequencing yield have allowed us to multiplex increasing numbers of samples, improving throughput. The sequencing data are mapped to the genome, and we then use three different SNP callers (GATK's UnifiedGenotyper, SAMtools mpileup and QCall) to call SNPs and indels. SNPs must be called by all three callers and are then filtered according to quality thresholds, which allows us to eliminate a large proportion of false positives. Ensembl gene-build information is then used to annotate the potential consequences of each SNP, and nonsense and essential splice-site mutations are confirmed by KASP genotyping. In addition, SNP and indel data for the whole sequenced population are included in the design of the genotyping assays in order to reduce the problem of allele bias in genotyping. Using this strategy, we have so far identified 4163 alleles in 3525 genes from 381 sequenced individuals; over the next five years, we intend to sequence a total of 10,000 exomes. The identified alleles are displayed on the ZMP website (http://www.sanger.ac.uk/Projects/D_rerio/zmp) and distributed free of charge to external requestors.
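The consensus step described above (a SNP must be reported by all three callers, then pass quality thresholds) can be illustrated with a minimal sketch. This is not the project's pipeline code: the `Call` record, the `consensus_calls` function, the caller labels, the example variants and the threshold values are all hypothetical, and the sketch assumes each caller's output has already been parsed into (chromosome, position, reference, alternate, quality) records.

```python
from typing import Dict, List, NamedTuple, Tuple


class Call(NamedTuple):
    """One variant call as reported by a single caller (hypothetical record)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float


Key = Tuple[str, int, str, str]


def consensus_calls(
    callsets: Dict[str, List[Call]],
    min_qual: Dict[str, float],
) -> List[Key]:
    """Return variants reported by every caller that also pass each
    caller's quality threshold (thresholds are assumed, per caller)."""
    # Index each caller's calls by variant identity.
    by_caller: Dict[str, Dict[Key, float]] = {
        name: {(c.chrom, c.pos, c.ref, c.alt): c.qual for c in calls}
        for name, calls in callsets.items()
    }
    # Step 1: keep only variants present in all call sets.
    shared = set.intersection(*(set(d) for d in by_caller.values()))
    # Step 2: apply the quality thresholds to the shared variants.
    passed = [
        key for key in shared
        if all(by_caller[name][key] >= min_qual[name] for name in by_caller)
    ]
    return sorted(passed)


# Made-up example data; the qualities and thresholds are illustrative only.
example = {
    "gatk": [Call("1", 100, "A", "T", 50), Call("1", 200, "G", "C", 60),
             Call("2", 300, "C", "T", 10)],
    "samtools": [Call("1", 100, "A", "T", 40), Call("2", 300, "C", "T", 45)],
    "qcall": [Call("1", 100, "A", "T", 30), Call("2", 300, "C", "T", 35),
              Call("3", 5, "T", "G", 99)],
}
thresholds = {"gatk": 30.0, "samtools": 30.0, "qcall": 20.0}
result = consensus_calls(example, thresholds)
print(result)
```

In this toy input, the variant at 1:200 is dropped because only one caller reports it, and the variant at 2:300 is reported by all three callers but fails the assumed threshold for one of them, so only 1:100 A>T survives. Requiring agreement first and filtering second mirrors the order given in the abstract.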