C27 iPlantTF: The Web Server That Identifies Plant Transcription Factors at The Genome Scale

Date: Wednesday, January 18, 2012
Time: 11:20 AM
Room: California
Xinbin Dai, The Samuel Roberts Noble Foundation, Ardmore, OK
Patrick Xuechun Zhao, The Samuel Roberts Noble Foundation, Ardmore, OK
Advances in high-throughput sequencing technologies, e.g. the successful development and deployment of cost-effective second- and third-generation DNA sequencing (2GS and 3GS) platforms, have significantly accelerated the study of gene expression regulatory mechanisms at the whole-genome level in plants. This creates an urgent need for high-throughput bioinformatics systems that can effectively mine the large-scale sequence data generated by 2GS and 3GS platforms to systematically identify important regulatory elements, such as transcription factor genes. We present a web-based analysis server named iPlantTF, which integrates 1) a sophisticated back-end high-performance parallel computing prediction module that systematically identifies and classifies plant transcription factors in user-submitted sequences with very high prediction accuracy and coverage, and 2) a series of intuitive web interfaces for users to submit large-scale sequence sets and retrieve analysis results.

iPlantTF integrates conserved domain patterns for 103 published plant transcription factor families. These patterns were manually collected, compiled and curated to ensure prediction quality. The back-end prediction module employs InterProScan, a popular protein domain search tool, to search for unique domains in the user-submitted sequences, and then screens for potential transcription factors by comparing the detected domains against the conserved domain patterns of each transcription factor family. To provide genome-scale analysis capability, we optimized InterProScan by trimming its databases to include only relevant domain information, and further optimized the back-end prediction module using parallel computing techniques that can effectively use a 400-core Linux cluster. With these optimizations, iPlantTF is able to analyze genome-scale sequence sets. For example, in one test, iPlantTF satisfactorily analyzed the entire Arabidopsis thaliana genome within eight minutes.
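
The screening step described above, matching the domains found in each sequence against per-family domain patterns, could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the format of the curated patterns, so the "required" and "forbidden" domain sets, the family and domain names (FamilyA, domA, etc.), and the functions classify and classify_all are all hypothetical placeholders, and the domain hits are supplied by hand rather than produced by InterProScan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyRule:
    """Hypothetical domain pattern for one transcription factor family."""
    family: str
    required: frozenset                  # domains that must all be present
    forbidden: frozenset = frozenset()   # domains that must be absent


def classify(domains, rules):
    """Return the names of all families whose pattern matches the domain set."""
    found = set(domains)
    return [
        rule.family
        for rule in rules
        if rule.required <= found and not (rule.forbidden & found)
    ]


def classify_all(domain_hits, rules):
    """Map each sequence ID to its matching families, skipping non-matches."""
    result = {}
    for seq_id, domains in domain_hits.items():
        families = classify(domains, rules)
        if families:
            result[seq_id] = families
    return result


# Placeholder rules and domain hits (assumptions, not taken from iPlantTF).
RULES = [
    FamilyRule("FamilyA", frozenset({"domA"})),
    FamilyRule("FamilyB", frozenset({"domB", "domC"}), frozenset({"domD"})),
]

HITS = {
    "seq1": ["domA"],                  # matches FamilyA
    "seq2": ["domB", "domC"],          # matches FamilyB
    "seq3": ["domB", "domC", "domD"],  # excluded by the forbidden domain
    "seq4": ["domX"],                  # no matching family
}

if __name__ == "__main__":
    print(classify_all(HITS, RULES))
```

Because each sequence is classified independently of the others in this sketch, the input could in principle be split into batches and processed by separate workers, which is one plausible reading of how the parallel back end scales to a cluster.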

iPlantTF is publicly and freely available at http://plantgrn.noble.org/iPlantTF/.