Research
|
Most research projects in my group focus on the process of gene and genome evolution and the mechanisms of adaptive genetic change. Our genomics studies are enhanced by new data from complementary disciplines, such as transcriptomics, proteomics and metabolomics. We strongly believe in an interdisciplinary approach to understanding the dynamics of large-scale biological data. Main projects:
One key genomic feature is the presence of repetitive DNA segments arranged in tandem, so-called tandem repeats (TRs). While most TRs are found in non-coding sequences, mounting evidence suggests their substantial presence in protein-coding genes: in at least 14% of proteins across all kingdoms of life, and at much higher frequency in eukaryotes (e.g., ~30% in humans). A high incidence of TRs has been observed in proteins with fundamental biological functions, in proteins related to infectious and neurodegenerative diseases in humans, and in genes conferring virulence and resistance. Some interesting examples are shown below. We have recently proposed a framework for TR scoring that helps to filter out false-positive repeat predictions. We are currently conducting a large-scale study of proteins with tandem repeats: their structure, function and the dynamics of their protein-protein interactions. We are assessing the evolutionary rates and the role of adaptive change in tandem repeat regions, and we aim to infer the characteristics of TRs associated with disease.
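To make the notion of a tandem repeat concrete, here is a minimal Python sketch that scans a protein sequence for exact, back-to-back copies of a short unit. It is only an illustration: the function name, its parameters and the toy sequence are hypothetical, and it is not the TR scoring framework mentioned above. Real protein TRs are usually degenerate (copies differ by substitutions and indels), which is precisely why statistical scoring is needed to separate true repeats from false positives.

```python
def find_exact_tandem_repeats(seq, min_unit=2, max_unit=10, min_copies=3):
    """Return (start, unit, copies) for exact tandem repeats in seq.

    Hypothetical helper for illustration: it only finds perfect copies
    and reports, at each position, the repeat covering the longest span.
    """
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (span_length, unit, copies)
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # ran off the end of the sequence
            copies = 1
            # Count how many consecutive copies of the unit follow.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            span = copies * unit_len
            if copies >= min_copies and (best is None or span > best[0]):
                best = (span, unit, copies)
        if best is None:
            i += 1
        else:
            hits.append((i, best[1], best[2]))
            i += best[0]  # skip past the reported repeat region
    return hits


# Toy sequence (not a real protein): the unit "PA" occurs four times in a row.
repeats = find_exact_tandem_repeats("MKPAPAPAPAGG")
```

For the toy sequence, `repeats` holds a single hit starting at index 2 with unit `"PA"` and four copies. A detector for degenerate repeats would replace the exact string comparison with an alignment or profile-based similarity score.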
Models of codon substitution are commonly used to detect signals of natural selection acting on proteins. The ratio of non-synonymous to synonymous substitution rates acts as an indicator of selective pressure. In most protein-coding genes, synonymous rates are much higher than non-synonymous rates because of selection at the protein level. Consequently, codon models can be expected to use phylogenetic information efficiently to resolve both early and recent events. Codon models, however, are computationally very intensive because of the large state space assumed in the Markov process (61 states, as opposed to 4 for nucleotide and 20 for amino-acid models). We have developed CodonPhyML, a fast and accurate implementation for the maximum likelihood estimation of phylogenies under codon models. CodonPhyML offers about 50 different codon models, including semiparametric variants. We have thoroughly tested our implementation (e.g., see figure below) and are continuously working on optimization of the code and further development of this unique application.
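The size of the codon state space, and the distinction between synonymous and non-synonymous changes, can be illustrated with a short Python sketch. It assumes the standard genetic code (three stop codons, hence 61 sense codons); the names below are illustrative and have nothing to do with the CodonPhyML code base.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# States of a codon-level Markov process: all codons except stops.
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]


def single_step_changes():
    """Count single-nucleotide changes between sense codons.

    Returns (synonymous, non_synonymous) counts of ordered codon pairs;
    changes that would create a stop codon are excluded.
    """
    syn = nonsyn = 0
    for codon in SENSE_CODONS:
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    continue
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    syn += 1
                else:
                    nonsyn += 1
    return syn, nonsyn


n_states = len(SENSE_CODONS)
syn, nonsyn = single_step_changes()
```

A 61-state model implies a 61 x 61 rate matrix, compared with 4 x 4 for nucleotides and 20 x 20 for amino acids, which is where much of the computational cost arises. The counts returned by `single_step_changes` show that most single-nucleotide changes are non-synonymous, so an excess of observed synonymous substitutions points to purifying selection on the protein.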
In this project we study selection on protein-coding DNA that acts on the choice of synonymous codons, i.e. DNA triplets that code for the same amino acid. Synonymous (or ‘silent’) changes are usually assumed to be neutral: free of selective constraints and without effect on genetic fitness. However, in many organisms the observed synonymous codon bias cannot be explained by neutral mutational biases alone, and is thought to be due to selection. Recent experimental evidence and in silico modeling show that synonymous mutations can have important consequences for genetic fitness (see the schema below). We use a phylogenetic approach to study fluctuations in synonymous rates and to identify codon positions where a synonymous change may affect the protein product.
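Synonymous codon bias is often summarized with the relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all codons for that amino acid were used equally. The Python sketch below computes RSCU for a single coding sequence under the standard genetic code. It is a simple descriptive statistic offered as an illustration, not the phylogenetic approach described above, and the function name and toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Relative synonymous codon usage for an in-frame coding sequence.

    Returns {codon: RSCU} for every amino acid observed in cds, skipping
    stop codons, unrecognized triplets and single-codon amino acids.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    # Group sense codons into synonymous families, one per amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous choice (Met, Trp)
        expected = total / len(family)  # count under equal usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Toy sequence: four leucine codons, three CTG and one TTA.
usage = rscu("CTGCTGCTGTTA")
```

Leucine has six synonymous codons, so with four leucine codons the expected count per codon is 4/6. In the toy example `usage["CTG"]` is 4.5 and `usage["TTA"]` is 1.5, while the four unused leucine codons score 0. Values far from 1 indicate uneven usage, although RSCU alone cannot distinguish selection from mutational bias.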
Intrinsically disordered proteins (IDPs) and proteins with intrinsically disordered regions (IDRs) do not have a well-defined tertiary structure, yet they perform a multitude of functions, often relying on their native disorder to achieve binding flexibility by switching between alternative conformations. Intrinsic disorder is frequently found in all three domains of life, and may occur in short stretches or span whole proteins. Based on DisProt (an experimentally validated database of proteins with IDRs), we estimated Markov amino acid models for disordered and ordered protein regions (Szalkowski and Anisimova 2011). Our phylogenetic mixture models can be used to study the evolution of proteins with IDRs and to improve the prediction of IDRs that are conserved in homologous proteins.
The phytopathogenic bacterium Ralstonia solanacearum encodes type III effectors known as GALA proteins, which contain F-box and LRR domains. Our phylogenetic analyses of F-box domains support the lateral transfer of GALA protein genes from host plants to the bacterium (Kajava et al. 2008). Examination of the selective pressure acting on GALA proteins shows that the convex side of their horseshoe-shaped LRR domains (see figure below) is more prone to positive selection than the concave side. We therefore proposed that the convex surface might be the site of protein binding relevant to the adaptor function of the F-box GALA proteins. We are now conducting further evolutionary, functional and pathogenicity analyses of GALA proteins (e.g., see Remigi et al. 2011). In addition, we collaborate on the construction of a database of all known type III effectors in the genus Ralstonia (with up to 100 different gene families). |
