Resources

Programs, scripts and data


  • CodonPhyML - a program for phylogeny inference under codon models using maximum likelihood
  • Models of codon substitution are commonly used to detect signals of natural selection acting on proteins. In protein-coding genes, selection (negative or positive) on the protein is an important evolutionary force. Consequently, codon models that account for the genetic code and include selection as an explicit parameter can be expected to use phylogenetic information efficiently to resolve both early and recent events. We have developed CodonPhyML, a fast and accurate implementation of maximum likelihood estimation of phylogenies under codon models. If you use our software, we will be grateful to receive your questions and comments (both positive and negative :-)

  • ALF: simulating genome evolution
  • ALF (Artificial Life Framework) simulates a root genome and evolves it along a phylogeny into a number of related genomes. Result files include the gene sequences at the leaves of the tree, the true tree and the true MSAs. A description of ALF can be found in Dalquen et al. (2011). We provide a web server for ALF, where a standalone version of the program can also be downloaded: here.
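
    To make the idea of evolving a root sequence along a phylogeny concrete, here is a minimal Python sketch. It is not ALF's algorithm or interface: it assumes a single nucleotide sequence, substitutions only, and the Jukes-Cantor (JC69) model, and the names `evolve`, `simulate` and the nested-list tree format are hypothetical choices made for this illustration.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def evolve(seq, t, rng):
    """Evolve seq along one branch of length t (expected substitutions per site).

    Assumption: JC69, where a site differs after time t with probability
    3/4 * (1 - exp(-4t/3)) and the new base is uniform over the other three.
    """
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(tree, seq, rng, leaves=None):
    """Recursively evolve seq down a tree and collect the leaf sequences.

    tree is either a leaf name (str) or a list of (subtree, branch_length)
    pairs; this format is a hypothetical one chosen for the sketch.
    """
    if leaves is None:
        leaves = {}
    if isinstance(tree, str):
        leaves[tree] = seq
        return leaves
    for child, length in tree:
        simulate(child, evolve(seq, length, rng), rng, leaves)
    return leaves


rng = random.Random(42)  # fixed seed so repeated runs give the same data
root = "".join(rng.choice(NUCLEOTIDES) for _ in range(60))
tree = [([("A", 0.1), ("B", 0.1)], 0.05), ("C", 0.2)]  # ((A,B),C) with branch lengths
leaves = simulate(tree, root, rng)
for name, sequence in sorted(leaves.items()):
    print(name, sequence)
```

    Because there are no insertions or deletions in this sketch, the leaf sequences are already aligned column by column, which is the simplest case of knowing the "true" alignment together with the true tree.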


  • Simulated data for benchmarking: protein tandem repeats
  • Tandem repeats (TRs), repetitive DNA segments arranged in tandem, are a key genomic feature. While most TRs are found in non-coding sequences, mounting evidence suggests their substantial presence in protein-coding genes: in at least 14% of proteins across all kingdoms of life, and at much higher frequency in eukaryotes (>30% in humans). A high incidence of TRs has been observed in proteins with fundamental biological functions, in those related to infectious and neurodegenerative diseases in humans, and in virulence and resistance-conferring genes. We have recently proposed a framework for TR scoring that helps to filter false positive repeat predictions: see Schaper et al. (2012). The simulated data used for this study can be downloaded here.


  • PANDITplus is a database that extends PANDIT (Whelan et al. 2006), containing protein-coding Pfam alignments and phylogenetic trees for protein domains and families from the three domains of life. Along with DNA and amino acid sequence data for homologs, PANDITplus provides pre-computed estimates from evolutionary models, data on protein interactions, functional and chemical pathway annotation, gene expression, and association with disease. The idea behind PANDIT was to encourage evolution-centric analyses of protein domains and families, based on reliable sets of HMM-based alignments and associated phylogenetic trees. If you use PANDITplus, please cite Dimitrieva and Anisimova (2010).

  • ProGraphMSA+TR is a program for global graph-based phylogeny-aware alignment of sequences with tandem repeats (TRs).
  • TR events are not restricted by TR unit boundaries (which are often artificial due to slippage). TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This improves the accuracy of reconstructed alignments, helps to disentangle TRs, and allows indel events and rates to be measured in a biologically meaningful way. Our method detects not only duplication events, but all changes in TR regions due to recombination, strand slippage, and other events inserting or deleting TR units. Citation: Szalkowski and Anisimova (2013).