P0943 Managing Multiple Assemblies: Comparative Genomics Tools for Annotation and Visualization

Deanna M. Church, NIH/NLM/NCBI, Bethesda, MD
NCBI Assembly Annotation Group, NCBI/NLM/NIH, Bethesda, MD
Genome assembly is challenging: different assemblers given the same set of reads will produce different assemblies. These assemblies will have much in common, but the differences are of interest because they often represent regions of biological complexity. Variation between individuals also results in some divergent regions that cannot be resolved to a single, haploid consensus. As a result, multiple assemblies are available for some species, such as cow. The ability to compare these assemblies and annotate them consistently is critical to providing a coherent view of genomic biology. We have built many key pieces of infrastructure to facilitate management of multiple assemblies. The first of these is an assembly database that tracks sequence relationships and metadata. We have also developed a robust process for aligning two assemblies. This process takes advantage of the assembly structure and employs a two-pass system to capture reciprocal best-hit alignments and repeat expansions/contractions. We provide users access to these alignments via our ‘NCBI Remap’ tool. The alignments are also used in our genome annotation process to provide consistent gene annotation between assemblies. We have developed visualization tools for viewing data across multiple assemblies. Genome Workbench is a client-side tool that can be used to both view and analyze sequence data; NCBI Remap provides Genome Workbench files containing the assembly-assembly alignments. Our web-based comparative genomics tool, Map Viewer, allows direct querying and viewing of annotation data on multiple assemblies.
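
The idea of a reciprocal best hit between two assemblies can be illustrated with a minimal Python sketch. This is not the NCBI alignment pipeline; the `Hit` record, the region identifiers, and the scores are hypothetical, and the "second pass" here simply collects the alignments left over after the first pass as candidates for further inspection (for example, possible repeat expansions or contractions).

```python
from collections import namedtuple

# Hypothetical alignment record: a region in one assembly aligned to a
# region in the other, with a similarity score (higher is better).
Hit = namedtuple("Hit", ["query", "subject", "score"])


def best_hits(hits):
    """Return the highest-scoring hit for each query region."""
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return best


def reciprocal_best_hits(a_to_b, b_to_a):
    """First pass: keep (a, b) pairs where each region is the other's best hit."""
    best_ab = best_hits(a_to_b)
    best_ba = best_hits(b_to_a)
    pairs = []
    for region_a, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if back is not None and back.subject == region_a:
            pairs.append((region_a, hit.subject))
    return sorted(pairs)


def second_pass_candidates(a_to_b, first_pass_pairs):
    """Second pass (assumed): alignments not captured as reciprocal best hits."""
    paired = set(first_pass_pairs)
    return sorted(
        (h.query, h.subject) for h in a_to_b if (h.query, h.subject) not in paired
    )


if __name__ == "__main__":
    # Toy data: regions A2 and A3 both align well to B2, as a collapsed
    # or expanded repeat might.
    a_to_b = [
        Hit("A1", "B1", 950),
        Hit("A1", "B2", 400),
        Hit("A2", "B2", 900),
        Hit("A3", "B2", 880),
    ]
    b_to_a = [
        Hit("B1", "A1", 950),
        Hit("B2", "A2", 900),
        Hit("B2", "A3", 880),
    ]
    first = reciprocal_best_hits(a_to_b, b_to_a)
    print(first)
    print(second_pass_candidates(a_to_b, first))
```

In the toy data, A3 is not part of any reciprocal best-hit pair because B2's best hit is A2, so the A3-to-B2 alignment surfaces only in the second pass.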