MAPHiTS: an efficient workflow for SNP detection

Bras, Marc

In the framework of large-scale SNP discovery projects, we developed a novel “Mapping Analysis Pipeline for High-Throughput Sequences” (MAPHiTS). The pipeline detects single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) by comparing high-throughput Illumina short reads (GAIIx or HiSeq) with a reference sequence from the same or a different species. It is based on publicly available software (BWA, Bowtie, SAMtools, VarScan and Tablet) and in-house tools. In particular, we developed tools to filter out low-quality short reads and to prepare the mapping and SNP calling. We also developed tools to filter the called SNPs by genome coverage, allele frequency, p-value, and SNP position within the read. Finally, we developed tools to parallelize all the required computation on a computer cluster. MAPHiTS has been used efficiently on 20 GAIIx runs of grapevine sequences. Most analyses run in a few hours on our 700-core computer cluster. We integrated MAPHiTS into our Galaxy workflow manager (http://urgi.versailles.inra.fr/galaxy), allowing biologists without any Unix skills to easily analyse short-read sequences through a user-friendly interface. Results can be used in various types of diversity analysis and whole genome association studies (WGAS), inserted into our GnpIS information system (http://urgi.versailles.inra.fr/gnpis), or used to design genotyping microarrays.
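The post-calling filter on coverage, allele frequency, p-value, and position in the read can be illustrated with a minimal Python sketch. This is not the MAPHiTS code. The `SnpCall` record, its field names, the function names, and every threshold value are assumptions made for illustration, as is the rule that a SNP must be supported by at least one read in which the variant base lies away from the read ends.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpCall:
    """Hypothetical record for one called SNP (field names are assumptions)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    coverage: int              # number of reads covering the position
    alt_freq: float            # fraction of reads supporting the alternate allele
    p_value: float             # significance reported by the SNP caller
    read_positions: List[int]  # 0-based offset of the variant base in each supporting read


def passes_filters(call: SnpCall, read_length: int,
                   min_coverage: int = 10, max_coverage: int = 200,
                   min_alt_freq: float = 0.2, max_p_value: float = 0.01,
                   end_margin: int = 5) -> bool:
    """Return True if the call satisfies all four illustrative criteria."""
    # Coverage: too few reads is unreliable, too many suggests repeats.
    if not (min_coverage <= call.coverage <= max_coverage):
        return False
    # Allele frequency: the alternate allele must be seen often enough.
    if call.alt_freq < min_alt_freq:
        return False
    # p-value: keep only statistically supported calls.
    if call.p_value > max_p_value:
        return False
    # Position in read: require at least one supporting read where the
    # variant base is not within end_margin bases of either read end.
    return any(end_margin <= p < read_length - end_margin
               for p in call.read_positions)


def filter_calls(calls: List[SnpCall], read_length: int) -> List[SnpCall]:
    """Keep only the calls that pass every filter with default thresholds."""
    return [c for c in calls if passes_filters(c, read_length)]


# Made-up example calls for 76-base reads.
good = SnpCall("chr1", 1042, "A", "G", 35, 0.46, 1e-6, [12, 40, 63])
low_cov = SnpCall("chr1", 2210, "C", "T", 4, 0.50, 1e-3, [20, 31])
edge_only = SnpCall("chr2", 877, "G", "A", 40, 0.30, 1e-5, [0, 2, 74])

kept = filter_calls([good, low_cov, edge_only], read_length=76)
```

With these made-up inputs only the first call is kept: the second has too little coverage, and the third is supported only by bases near the read ends, where sequencing and alignment errors tend to concentrate.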

C28 MAPHiTS: an efficient workflow for SNP detection