Extracting and screening synteny blocks in genome comparisons

Tang, Haibao

Whole-genome duplications (WGDs) present a major challenge for accurately attributing synteny blocks to different evolutionary origins, especially when there are multiple sequential WGD events. In particular, some WGDs may be species-specific and others shared by multiple species. Identifying which genomic regions are the products of which duplication or speciation event is not trivial, and until recently it has largely been done manually. We present a method for screening a set of synteny blocks so that only those compatible with the overall ploidy relationship between the genomes are retained for downstream analyses. We formulate the screening as an optimization problem known as “Binary Integer Programming” (BIP), and our QuotaAlign tool can be used to convert genomic data into the appropriate BIP instance. For a pair of genomes, the method works by asking the user to specify the ratio of “quotas” for the two genomes. A genome's quota is the maximum permitted ploidy (i.e. copy number) of regions resulting from WGDs specific to that genome. Our second tool, called SynFind, uses a sensitive algorithm to extract local synteny regions in multiple genomes given a query sequence. SynFind not only reports the regions that contain a conserved gene homologous to the query, but also lists the regions where synteny is “expected” but the matching genes were lost, a much-needed utility for tracking the evolutionary history of gene families. Both QuotaAlign and SynFind are implemented as modular components in the CoGe analysis system (http://genomevolution.com/CoGe), offering non-programmers easier access to thousands of genomes.
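To make the quota idea concrete, the following is a minimal Python sketch of quota-based block screening posed as a binary selection problem. It is not the QuotaAlign implementation: the block representation, the function names `max_depth` and `screen_blocks`, the scores, and the reading of a quota as "the maximum number of retained blocks that may cover any one position in a genome" are all assumptions made for illustration, and the tiny instance is solved by exhaustive enumeration rather than by a BIP solver.

```python
from itertools import product

# Hypothetical toy blocks: (start_a, end_a, start_b, end_b, score).
# Coordinates are inclusive gene ranks in genomes A and B; scores are made up.
BLOCKS = [
    (1, 10, 1, 10, 50),
    (1, 10, 20, 30, 40),
    (5, 15, 40, 50, 30),
    (20, 30, 5, 12, 15),
]


def max_depth(intervals):
    """Return the largest number of closed intervals sharing a single point."""
    events = []
    for start, end in intervals:
        events.append((start, 0))  # 0 = open; sorts before a close at the same point
        events.append((end, 1))    # 1 = close
    events.sort()
    depth = best = 0
    for _, kind in events:
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def screen_blocks(blocks, quota_a, quota_b):
    """Pick the 0/1 vector over blocks that maximizes total score, subject to
    coverage depth <= quota_a along genome A and <= quota_b along genome B.
    Exhaustive search, so only suitable for very small inputs."""
    best_score = 0
    best_choice = tuple(0 for _ in blocks)
    for choice in product((0, 1), repeat=len(blocks)):
        kept = [b for b, keep in zip(blocks, choice) if keep]
        if max_depth([(b[0], b[1]) for b in kept]) > quota_a:
            continue
        if max_depth([(b[2], b[3]) for b in kept]) > quota_b:
            continue
        score = sum(b[4] for b in kept)
        if score > best_score:
            best_score, best_choice = score, choice
    return best_score, best_choice


print(screen_blocks(BLOCKS, 2, 1))  # (90, (1, 1, 0, 0))
print(screen_blocks(BLOCKS, 1, 1))  # (55, (0, 1, 0, 1))
```

Under these assumptions, loosening the quota on genome A from 1 to 2 lets two overlapping blocks on A be retained together, which is the kind of ploidy-aware filtering the abstract describes. A realistic data set would need a proper integer programming solver instead of enumeration.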

W089 Extracting and screening synteny blocks in genome comparisons