P0657 Comprehensive Annotation of the Transcriptome of Channel Catfish (Ictalurus punctatus) by Paired-end RNA-Seq

Shikai Liu, Auburn University, Auburn, AL
Yu Zhang, Auburn University, Auburn, AL
Zunchun Zhou, Auburn University, Auburn, AL
Geoff Waldbieser, USDA-ARS, Stoneville, MS
Fanyue Sun, Auburn University, Auburn, AL
Jianguo Lu, Auburn University, Auburn, AL
Jiaren Zhang, Auburn University, Auburn, AL
Yanliang Jiang, Auburn University, Auburn, AL
Hao Zhang, Auburn University, Auburn, AL
Xiuli Wang, Auburn University, Auburn, AL
Rajendran K.V., Auburn University, Auburn, AL
Huseyin Kucuktas, Auburn University, Auburn, AL
Eric Peatman, Auburn University, Auburn, AL
John Liu, Auburn University, Auburn, AL
Genome annotation depends on the availability of transcript information as well as orthology information. In teleost fish, genome annotation is seriously hindered by genome duplication. Because of gene duplications, orthologies cannot be established simply by homology comparisons. Rather, intensive phylogenetic analysis or structural analysis of orthologies is required for the identification of genes. To conduct phylogenetic and orthology analyses, full-length transcripts are essential. In this work, we took advantage of a doubled haploid channel catfish, which carries two identical sets of chromosomes and should therefore, in theory, contain no SNPs. As such, transcript sequences generated by next-generation sequencing can be assembled into full-length transcripts. Deep sequencing of the doubled haploid channel catfish transcriptome was performed using the Illumina HiSeq 2000 platform, yielding over 300 million high-quality trimmed reads totaling 27 gigabase pairs. To obtain the most comprehensive and reliable assembly, three different assemblers (CLC Genomics Workbench, ABySS, and Velvet) were used for de novo assembly, generating 217,114, 192,558, and 311,734 contigs with a minimum length of 200 bp, respectively. Functional annotation of the assemblies was initially carried out by BLASTX searches against the NCBI zebrafish RefSeq protein and UniProt/Swiss-Prot protein databases. Among the contigs from the three assemblers, only the one with the best BLAST match was chosen for further analysis, resulting in 27,597 contigs with unique protein hits from either of the two protein databases. Over 16,000 of the contigs with unique protein hits were identified as putative full-length transcripts with complete open reading frames. Gene Ontology analysis of these transcripts showed high similarity to the transcriptomes of other fish species with known genome sequences.
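The selection step described above (keeping one contig per protein hit across the three assemblies) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline. It assumes BLASTX results in the common 12-column tabular format, ranks hits by bit score (the abstract does not state which criterion defined the "best" match), and uses hypothetical contig and protein identifiers.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    contig: str      # query contig ID (column 1)
    protein: str     # subject protein ID (column 2)
    evalue: float    # column 11
    bitscore: float  # column 12


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse 12-column tab-separated BLAST output, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[10]), float(fields[11]))


def top_hit_per_contig(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-scoring protein hit for each contig."""
    top: Dict[str, Hit] = {}
    for hit in hits:
        current = top.get(hit.contig)
        if current is None or hit.bitscore > current.bitscore:
            top[hit.contig] = hit
    return top


def best_contig_per_protein(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-scoring contig for each protein (assumed criterion)."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.protein)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.protein] = hit
    return best


# Hypothetical hits pooled from the three assemblies (illustrative values only).
rows = [
    "clc_contig_1\tNP_0001\t95.0\t300\t15\t0\t1\t900\t1\t300\t1e-150\t520.0",
    "abyss_contig_7\tNP_0001\t98.0\t310\t6\t0\t1\t930\t1\t310\t1e-170\t580.0",
    "velvet_contig_3\tNP_0002\t90.0\t200\t20\t0\t1\t600\t1\t200\t1e-90\t350.0",
    "velvet_contig_3\tNP_0001\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-20\t90.0",
]

per_contig = top_hit_per_contig(parse_tabular(rows))
selected = best_contig_per_protein(per_contig.values())

for protein, hit in sorted(selected.items()):
    print(protein, hit.contig, hit.bitscore)
```

Reducing each contig to its top hit first prevents a weak secondary match from standing in for a contig that is better explained by another protein. Ties and alternative criteria such as E-value or alignment coverage are left out of this sketch.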
The large set of transcripts reconstructed in this study will provide a much-needed resource for functional genome research in catfish, serving as a reference transcriptome for studying gene family structures and for digital gene expression analysis, as well as aiding in the annotation of the catfish genome. Furthermore, the full set of transcripts with “signs” of SNPs has been identified; because a doubled haploid should carry no allelic variation, these may represent transcripts from duplicated gene copies in the genome. Therefore, this work also lays the groundwork for genome-scale analysis of gene duplication in catfish.