P0013 Pipseq: a DNA Sequencing Alignment, Variant Detection, Annotation and Effect Prediction Pipeline

Eric Fritz, Iowa State University, Ames, IA
James Koltes, Iowa State University, Ames, IA
Richard G. Tait Jr, Iowa State University, Ames, IA
Dorian J. Garrick, Iowa State University, Ames, IA
James M. Reecy, Iowa State University, Ames, IA
Next-generation sequencing is a versatile tool that opens new avenues for genome-based analyses. Whole-genome re-sequencing (WGRS) of sufficient depth allows detection of nearly all single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) in DNA with intermediate GC content. The variants so identified will include mutations that are causal for phenotypic variation. We have developed a pipeline, Pipseq, that streamlines the detection and interpretation of variants. Pipseq aligns sequence reads to reference genomes with either bowtie or bwa. SNPs and indels are identified with the Genome Analysis Toolkit (GATK) or samtools. In a preliminary analysis, WGRS data from six related Angus individuals were analyzed to identify genetic variants. More than 16.4 million unique, high-quality (Q>=50) SNPs and more than 600,000 unique, high-quality (Q>=50) indels were detected. These variants were annotated to genomic regions and genes, using HGNC nomenclature, with Ensembl’s Variant Effect Predictor API. In the patriarch sire, we identified 2,552,763 intergenic, 25,859 synonymous, 12,655 non-synonymous, 7,096 3’-UTR, 3,136 splice-site, 1,343 5’-UTR, and 10 mature miRNA SNPs, as well as 147 newly introduced stop codons and 26 lost stop codons. Furthermore, we identified 59 frameshifts due to indels. Methods to detect copy number variation (CNV) and to phase variants using pedigree relationships are being tested to facilitate imputation. An interactive website is being designed that lets users select the alignment software, variant calling software, reference genome, and read type, so that users without programming experience can run variant detection automatically. This website will also facilitate data transfer to, and retrieval from, a MySQL database.
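
The counting of unique, high-quality SNPs and indels mentioned above can be illustrated with a minimal Python sketch. It is not Pipseq's implementation: it assumes the variant caller's output is in VCF format, that the abstract's Q>=50 threshold corresponds to the VCF QUAL column, and that a variant is "unique" when its chromosome, position, reference allele, and alternate allele have not been seen before. The function names `classify_variant` and `count_high_quality` and the example records are hypothetical.

```python
from typing import Dict, Iterable


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'snp', 'indel', or 'other'."""
    # Symbolic alleles (e.g. <DEL>) and the '*' placeholder are not sequence.
    if alt.startswith("<") or alt == "*":
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_high_quality(vcf_lines: Iterable[str], min_qual: float = 50.0) -> Dict[str, int]:
    """Count unique variants whose QUAL is at least min_qual (assumed mapping of Q>=50)."""
    counts = {"snp": 0, "indel": 0, "other": 0}
    seen = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alts, qual = fields[:6]
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low quality
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            key = (chrom, pos, ref, alt)
            if key in seen:
                continue  # already counted, e.g. in another individual
            seen.add(key)
            counts[classify_variant(ref, alt)] += 1
    return counts


# Hypothetical records for illustration only; not data from the study.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t60\tPASS\t.",
    "1\t100\t.\tA\tG\t75\tPASS\t.",
    "1\t200\t.\tAT\tA\t55\tPASS\t.",
    "1\t300\t.\tC\tT\t20\tPASS\t.",
    "2\t50\t.\tG\tGA,T\t90\tPASS\t.",
]

if __name__ == "__main__":
    print(count_high_quality(EXAMPLE_VCF))
```

Keying uniqueness on position plus alleles means a variant shared by several related individuals is counted once, which is one plausible reading of "unique" in the abstract; other definitions would change the key.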