Idea of the algorithm

Next: Algorithm Up: Tree Construction Algorithm Previous: Tree Construction Algorithm

Idea of the algorithm

The basic idea of the tree construction algorithm is the following: write down the leaves in order of C. Starting from the leaves we want to find the internal nodes. Since we consider binary trees only, we know that at least two leaves are connected. When two leaves that are connected are swapped, the result a tree with the same topology. If we traverse the tree with the swapped leaves in the same circular order as before, we get the same path length. But when two leaves are swapped that are not connected (see Figure 3), some edges of the tree are traversed too often, which increases the score. So all we have to do is to swap each of the neighboring pair of leaves and to calculate the resulting total path length or score. If the score stays the same, we know that the two leaves are connected. If the score increases, the leaves are not connected. Since we use a TSP algorithm to find the order with the smallest score, the resulting score can never decrease. To save computation time we only recalculate the terms that actually change.

**Figure 6:** d(A, B) + d(C, D) - d(A, C) - d(B, D) = 0
$\begin{figure} \begin{center} \mbox{\psfig{file=firstdiff.ps,height=0.12\textheight,angle=0} } \end{center} \end{figure}$

Definition 4.1 Given is a tree T and a circular order C and leaves $L=\{s_1, .., s_n\}$ . Rename the leaves in a way s.t. the order of the leaves is in circular order C. We define $\delta (s_i)$ to be: $\delta (s_i) = d(s_{i-1}, s_i) + d(s_{i+1}, s_{i+2}) - d(s_{i-1}, s_{i+1}) - d(s_i, s_{i+2})$

Example 4.1: Assume leaves B and C were connected (see figure 6). In the tree the distance d(A, B) plus d(C, D) is the same as the distance d(A, C) plus d(B, D) (when the leaves B and C are interchanged), because the tree topologies are identical. The difference between those two sums $\delta(B)$ is:
$\delta (B) = d(A, B) + d(C, D) - d(A, C) - d(B, D)$
$\delta(C)$ is 2*x (if we try to swap the leaves (C, D)) (see Figure 7), because the two leaves are not connected. So with the function $\delta (s_i)$ we can determine whether leaves s_i and s_i+1 are connected or not.
Note that $\delta (s_i)$ corresponds exactly to the equation we used in the determination of the bound for the circular order. This means that the same error bound holds for the tree construction as for the circular order.

**Figure:** $d(B, C) + d(D, E) - d(B, D) - d(C, E) = -2\cdot x$
$\begin{figure} \begin{center} \mbox{\psfig{file=firstdiff1.ps,height=0.12\textheight,angle=0} } \end{center} \end{figure}$

Algorithm

Next: Algorithm Up: Tree Construction Algorithm Previous: Tree Construction Algorithm

Chantal Korostensky
1999-07-14