Orthology Inference: Practical part
Christophe Dessimoz, cdessimoz@inf.ethz.ch
Part 1: Interactive, web-based exploration of orthology
The following tools are recommended for Part 1:
- Orthology: http://omabrowser.org
- MSA: http://www.ebi.ac.uk/mafft/
- Distance Trees: http://omabrowser.org/PhylogeneticTree.html
- ML trees: http://phylobench.vital-it.ch/raxml-bb/index.php
Orthology and function
Consider the following protein sequence:
MAINPQYEEIGKGFVTQYYA LFDDSTQRPSLVNLYNAELS FMTFEGQQIQGAAKILEKLQ SLTFQNIKRVLTAVDSQPMF DGGVLINVLGRLQCDEDPPH AYSQTFVLKPLGGTFFCAHD IFRLNIHNSA
- In which species can you find this protein sequence?
- Consider now the orthologs predicted by OMA. In which type of organism are they present?
- What could be the function of this sequence? Does such function make sense in the organisms that have it?
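The query sequence above is printed in blocks separated by spaces. Before pasting it into a search form, it can help to strip the whitespace and check that no residues were lost. A minimal Python sketch, where the helper name `clean_sequence` is ours and not part of any tool mentioned here:

```python
def clean_sequence(raw: str) -> str:
    """Remove all whitespace from a protein sequence and upper-case it."""
    return "".join(raw.split()).upper()


# The sequence exactly as printed in the exercise, blocks included.
raw = (
    "MAINPQYEEIGKGFVTQYYA LFDDSTQRPSLVNLYNAELS FMTFEGQQIQGAAKILEKLQ "
    "SLTFQNIKRVLTAVDSQPMF DGGVLINVLGRLQCDEDPPH AYSQTFVLKPLGGTFFCAHD IFRLNIHNSA"
)

query = clean_sequence(raw)
print(len(query))  # number of residues in the cleaned query
print(query)       # single line, ready to paste
```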
Orthology and Distance Tree Reconstruction
Now let us consider the protein with SwissProt ID ADC_HUMAN.
- Create a FASTA file with 6 sequences: the ADC_HUMAN sequence; its orthologs in chimpanzee (PANTR), mouse (MOUSE), elephant (LOXAF) and chicken (CHICK); and the paralog DCOR_HUMAN.
- Reconstruct a distance tree (format "Phylogram") from these sequences. Is the tree obtained consistent with OMA's predictions of orthology? Why?
- The constructed tree is rooted. Can you trust that the rooting is correct? What would be the alternative topologies?
- If we consider the tree correct, what is the most likely evolutionary relation between DCOR_HUMAN and the protein from chicken?
- To increase your confidence in the tree, add the protein with Ensembl ID ENSGALP00000036115 to this set of sequences, and reconstruct the tree on the extended set. What do you observe?
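The FASTA file asked for above can be assembled by hand, but a small script makes it easier to add or remove sequences later. The following Python sketch is one possible way to do it; the function name `to_fasta` is ours, and the sequences are placeholders that must be replaced with the ones you retrieve from the OMA browser:

```python
def to_fasta(records, width=60):
    """Format {identifier: sequence} as FASTA text, wrapping sequence lines."""
    lines = []
    for identifier, sequence in records.items():
        lines.append(f">{identifier}")
        # Split the sequence into chunks of at most `width` residues.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Placeholder sequences only: these are NOT the real proteins.
records = {
    "ADC_HUMAN": "M" + "A" * 69,
    "DCOR_HUMAN": "M" + "K" * 29,
}

fasta_text = to_fasta(records)
print(fasta_text)
```

The remaining orthologs, and later ENSGALP00000036115, can be added as further entries of `records` before writing `fasta_text` to a file.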
ML Tree Reconstruction (optional)
Finally, we reconstruct the extended tree using Maximum Likelihood.
- Construct a multiple sequence alignment for the extended set of sequences from the previous part.
- Use this alignment as input for the ML reconstruction method (options: protein, JTT substitution matrix, maximum likelihood search), with DCOR_HUMAN and ENSGALP00000036115 as explicit outgroups. Once the computation is done (it takes about 10 minutes), view the tree with branch lengths and bootstrap support values, and compare it to the distance tree.
Part 2: Local inference of orthology using stand-alone packages
The goal of this part is to identify orthologs among a few sequences from human, mouse and dog, using Inparanoid and OMA.
For the purpose of this exercise, we will not compare the real, full genomes, as this would take too long. Instead, we will use toy databases with only a few sequences, in FASTA format, which you can download from here: human.fa, mouse.fa, dog.fa
Inparanoid comes in the form of a Perl script and relies on NCBI BLAST for alignments. You can download Inparanoid from here: inparanoid4.tgz
OMA comes in the form of a Darwin script. Darwin is a programming environment for bioinformatics that is related to Maple, a well-known software package for symbolic computation. The OMA standalone version can be downloaded from here: OMA Standalone
Questions
- How many pairs of orthologs do Inparanoid and OMA identify between human and mouse?
- You will now do a simple evaluation of the results obtained by the two programs. Consider the orthologs reported between human and mouse. Compare the protein annotations (on http://www.ensembl.org or http://www.omabrowser.org - use the protein IDs as key). Do they match?
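Once the human-mouse pairs reported by each program have been extracted, counting and comparing them is a matter of set operations. The Python sketch below assumes the pairs are already available as tuples of protein IDs; the IDs shown are hypothetical, and parsing the programs' actual output files is not covered here:

```python
def normalize(pairs):
    """Return pairs as a set of frozensets so that (a, b) equals (b, a)."""
    return {frozenset(pair) for pair in pairs}


# Hypothetical IDs, for illustration only.
inparanoid_pairs = [
    ("HUMAN_1", "MOUSE_1"),
    ("HUMAN_2", "MOUSE_2"),
    ("HUMAN_3", "MOUSE_4"),
]
oma_pairs = [
    ("MOUSE_1", "HUMAN_1"),  # same pair as above, listed in reverse order
    ("HUMAN_2", "MOUSE_2"),
]

a = normalize(inparanoid_pairs)
b = normalize(oma_pairs)

shared = a & b           # pairs reported by both programs
only_inparanoid = a - b  # pairs reported by Inparanoid only
only_oma = b - a         # pairs reported by OMA only

print(len(a), len(b), len(shared))
```

The pairs reported by only one of the two programs are the most interesting ones to check against the protein annotations.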
Part 3: Your own implementation!
This exercise complements the work you did on Thursday on BLAST bidirectional best hits. You will do it in Darwin, the language OMA is based on.
You can start your session by entering "omadarwin" in the terminal.
Before you start the exercise, you might want to familiarize yourself with Darwin. The following documents will be of help:
- First steps in Darwin. (The input commands are in green, Darwin output is in red.)
- Darwin Quick Reference
- Darwin Online Help Index
- Bio-recipes (On this site, you can find a number of "How To" descriptions that solve biological and mathematical problems using Darwin)
Now you are ready for the exercise! Please refer to the description on the following web page:
http://people.inf.ethz.ch/cdessimo/pasteur/OrthologsEx.html
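As a reminder of the bidirectional best hit idea that this exercise builds on, here is a language-neutral sketch written in Python rather than Darwin. It is a generic illustration under simple assumptions (one score per sequence pair, ties resolved by first occurrence), not the solution expected on the page above and not OMA's algorithm. All IDs and scores are hypothetical:

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    `scores` maps (query, target) to an alignment score.
    On ties, the first pair encountered is kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a and b are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores: h* are proteins of one genome, m* of the other.
a_vs_b = {("h1", "m1"): 250, ("h1", "m2"): 90, ("h2", "m2"): 180, ("h3", "m2"): 200}
b_vs_a = {("m1", "h1"): 250, ("m2", "h1"): 90, ("m2", "h2"): 180, ("m2", "h3"): 200}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, h2's best hit is m2, but m2's best hit is h3, so h2 is left without a partner.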
File translated from TeX by TTH, version 3.86, on 9 Jul 2010, 10:20.