Programs, scripts and data
Models of codon substitution are commonly used to detect signals of natural selection acting on proteins. In protein-coding genes, selection (negative or positive) on the protein is an important evolutionary force. Consequently, codon models that account for the genetic code and include selection as an explicit parameter can be expected to use phylogenetic information efficiently to resolve both early and recent events. We have developed CodonPhyML, a fast and accurate implementation of maximum likelihood estimation of phylogenies under codon models. If you use our software, we will be grateful to receive your questions and comments (both positive and negative) :-)
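To illustrate what "selection as an explicit parameter" can mean, here is a minimal sketch of one widely used codon rate parameterization (a GY94/M0-style rate with a transition/transversion ratio kappa and a nonsynonymous/synonymous ratio omega). It is a generic illustration only: the function name `rate` and the uniform codon frequencies are assumptions made here, and it is not taken from CodonPhyML or meant to describe its models or options.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c in CODE if CODE[c] != "*"]  # the 61 sense codons

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def rate(ci, cj, kappa, omega, pi):
    """Instantaneous rate from sense codon ci to sense codon cj.

    kappa: transition/transversion rate ratio
    omega: nonsynonymous/synonymous rate ratio (the selection parameter)
    pi:    dict of equilibrium frequencies for the sense codons
    """
    diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes have a nonzero rate
    r = pi[cj]
    if frozenset(diffs[0]) in TRANSITIONS:
        r *= kappa  # transition (A<->G or C<->T)
    if CODE[ci] != CODE[cj]:
        r *= omega  # amino acid change, scaled by selection
    return r


# Uniform codon frequencies, assumed here purely for illustration.
pi = {c: 1.0 / len(SENSE) for c in SENSE}
print(rate("TTT", "TTC", kappa=2.0, omega=0.5, pi=pi))
```

Under this parameterization, omega < 1 corresponds to negative (purifying) selection and omega > 1 to positive selection on the protein.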
ALF (Artificial Life Framework) simulates a root genome and evolves it along a phylogeny into a number of related genomes. Result files include the gene sequences at the leaves of the tree, the true tree and the true MSAs. A description of ALF can be found in Dalquen et al. (2011). We provide a web server for ALF, where a standalone version of the program can also be downloaded: here.
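The general idea of evolving a root sequence along a phylogeny can be sketched with a toy example. The code below is not ALF and does not use its interface: it applies only point substitutions under the Jukes-Cantor model to a single nucleotide sequence. The tree encoding and the names `evolve` and `simulate` are hypothetical choices made for this illustration.

```python
import math
import random


def evolve(seq, t, rng):
    """Evolve a DNA sequence along a branch of length t (expected
    substitutions per site) under the Jukes-Cantor model."""
    # Probability that a site shows a different base after time t.
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def simulate(tree, parent_seq, rng):
    """Return {leaf name: sequence} for a tree given as nested tuples
    (name, branch_length, children)."""
    name, length, children = tree
    seq = evolve(parent_seq, length, rng)
    if not children:
        return {name: seq}
    leaves = {}
    for child in children:
        leaves.update(simulate(child, seq, rng))
    return leaves


# A small hypothetical tree: ((B:0.2, C:0.3):0.05, A:0.1)
tree = ("root", 0.0, [
    ("A", 0.1, []),
    ("inner", 0.05, [("B", 0.2, []), ("C", 0.3, [])]),
])
rng = random.Random(42)  # fixed seed so that runs are reproducible
root_seq = "".join(rng.choice("ACGT") for _ in range(60))
for leaf, seq in simulate(tree, root_seq, rng).items():
    print(leaf, seq)
```

Because the simulation keeps every sequence aligned to the root, the true tree and the true alignment are known by construction, which is what makes simulated data useful for benchmarking.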
One key genomic feature is repetitive DNA segments arranged in tandem, known as tandem repeats (TRs). While most TRs are found in non-coding sequences, mounting evidence suggests their substantial presence in protein-coding genes: in at least 14% of proteins across all kingdoms of life, and at a much higher frequency in eukaryotes (>30% in humans). A high incidence of TRs has been observed in proteins with fundamental biological functions, in those related to infectious and neurodegenerative diseases in humans, and in virulence and resistance-conferring genes. We have recently proposed a framework for TR scoring that helps to filter out false positive repeat predictions: see Schaper et al. (2012). The simulated data used for this study can be downloaded here.
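As a concrete picture of what a tandem repeat is, the following toy sketch reports runs of exactly repeated units in a sequence. It is only an illustration: the function name and its parameters are hypothetical, it finds exact copies only, and it is unrelated to the statistical scoring framework of Schaper et al. (2012), which addresses diverged repeats and false positive predictions.

```python
def find_exact_tandem_repeats(seq, min_unit=1, max_unit=10, min_copies=2):
    """Return a list of (start, unit, copies) for runs of exact,
    adjacent copies of a unit in seq (DNA or protein)."""
    hits = []
    n = len(seq)
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len <= n:
            unit = seq[i:i + unit_len]
            copies = 1
            # Extend the run while the next window equals the unit.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # skip past the reported run
            else:
                i += 1
    return hits


# Hypothetical protein fragment with four copies of the unit "PQ".
print(find_exact_tandem_repeats("MKPQPQPQPQAA", min_unit=2, max_unit=2, min_copies=3))
```

Note that the same region can be reported under several unit lengths (for example "PQ" and "PQPQ"), and real TRs usually contain substitutions and indels, which is why dedicated detectors and a scoring step are needed in practice.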
TR events are not restricted by TR unit boundaries (which are often artificial due to slippage). TR indels are modeled separately, and penalized using the phylogeny-aware alignment algorithm. This ensures enhanced accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events, but all changes in TR regions due to recombination, strand slippage, and other events inserting or deleting TR units. Citation: Szalkowski and Anisimova (2013).