Michael Hiller speaking about TOGA2 at the Vertebrate Genomes Project (VGP) conference
Annotating coding genes in newly sequenced genomes and inferring orthologs are classical challenges in genomics. TOGA (Tool to infer Orthologs from Genome Alignments) is a method that simultaneously annotates coding genes and infers orthologous loci from intronic and intergenic alignments. We will present TOGA 2.0, the unpublished, completely reimplemented successor to the original method, which provides a number of new features. A new alignment strategy that exploits orthology at the exon level reduces runtime ~7-fold and memory requirements ~300-fold. By incorporating deep learning-based splice site predictions, TOGA 2.0 identifies exon boundaries more accurately and copes with evolutionary changes in exon-intron structure such as splice site shifts, precise intron deletions, and exonization of intronic sequences. A gene tree-based reconciliation step improves orthology resolution and detects additional 1:1 orthologs. Finally, TOGA 2.0 can also predict untranslated regions. As a reference-based method, TOGA 2.0 scales to hundreds and even thousands of species. To demonstrate this, we provide a comprehensive comparative genomics resource for the VGP and other genomes, including gene annotations, sets of orthologs, lost and duplicated genes, and codon alignments for >1000 mammal and >650 bird genome assemblies.

