next up previous
Next: Dynamic Programming Approach Up: Appendix Previous: Appendix

Search Space and Error Analysis

For a tree with n different leaves, there are in the order of $
N(n) = 3 * 5 * .. * (2n -5) = \prod_{k=1}^{n-3} 2*k +1$ different tree topologies [10]. If a circular order is given, there are O(2n-3) different trees that can be built with that order. So a circular order alone reduces the search space. To get an intuition we build a small table (see Table 7.1).
  
Figure 11: Number of possible trees that can be built without and with a circular order C
\begin{figure}
\begin{tabular}{\vert c\vert c\vert c\vert}\hline
n & no circu...
...000 \\ \hline
\end{tabular}\begin{footnotesize}\end{footnotesize}
\end{figure}

We have seen that the probability to get a wrong TSP order is very small. But given a correct TSP order, there are still 2n-3 possible different trees. So far we know that if the error of each distance measurement is not larger then $\frac{x}{2}$, where x is the shortest edge length, we can guarantee that the tree construction is correct. As a next step we want to determine the probability to get a wrong tree for a given normal distribution of the distance measurement. Our goal is to express the probability to get a wrong edge in terms of the shortest edge x. At each connection step, we connect leaf si and si+1 with the smallest $\delta (s_i)$. Now assume that $\delta (s_i)$ has some error s.t we connect the wrong leaves, and one of the correct choices would have been $\delta(s_j)$. We know that if there was no error, then
$\delta (s_i) = d(s_{i-1}, s_i) + d(s_{i+1}, s_{i+2}) - d(s_{i-1},
s_{i+1}) - d(s_i, s_{i+2}) = -2 \cdot x$
We assume that the $\epsilon _{ij}$ follow a normal distribution with standard deviation $\sigma $. Then the error $\epsilon_{d}$ also follows a normal distribution $\sigma_d$, where the standard deviation is the sum of the individual standard deviations: $\sigma_d = 4 \sigma$ (see Figure 12). In order to get a wrong connection, the following condition must be true: for all $j \neq i$, where $\delta(s_j)$ is a correct connection, $\epsilon_{dd} = \delta(s_i) - \delta(s_j) \leq
2\cdot x$. This again is a normal distribution, with $\sigma_{dd}
= 8 \sigma$. $\epsilon_{dd}$ must be $\leq 2\cdot x$ for all possible valid connections. For a balanced tree with n leaves (tree depth log(n)), there are at most $\frac{n}{2}$ valid connections. In the worst case, if the tree is extremely unbalanced, there is only 1 correct connection at each step. Let $P_{dd}(\sigma)$ be the probability that one $\epsilon_{dd} \leq
2\cdot x$ for a given standard deviation $\sigma $ :
$P_{dd}(\sigma) = \frac{2}{\sqrt{2 \pi}} \int \limits_{z=
\frac{x}{4 \sigma }}^{inf} e^{\frac{1}{2}z^2}$
So the overall probability to get a wrong connection is:
$P_w(\sigma) = P_{dd}(\sigma)^\frac{n}{2}$ in the best case, and
$P_w(\sigma) = P_{dd}(\sigma)$ in the worst case.

  
Figure: Normal distribution for the error of $\delta (s_i)$
\begin{figure}
\begin{center}
\mbox{\psfig{file=normal.eps,height=0.12\textheight,angle=0} }
\end{center}
\end{figure}

To illustrate the probability to get the shortest edge x wrong we build a table (see Table 14) for an example tree of 16 leaves. So if the standard deviation $\sigma $ of the error $\epsilon$ of the distance measurement is $50\%$ of the shortest edge x, then the probability to get this edge wrong is $2\%$ in the best, and $60\%$ in the worst case. Hence for very large errors, a dynamic programming approach is more appropriate.
  
Figure: Probability to get the shortest edge x wrong depending on the standard deviation $\sigma $ of the error $\epsilon _{ij}$ of the distance measurements in terms of x.
\begin{figure}
\begin{center}
\begin{tabular}{\vert c\vert c\vert c\vert}
\hl...
...\%$\space & $80\%$\space \\ \hline
\end{tabular}
\end{center}
\end{figure}


next up previous
Next: Dynamic Programming Approach Up: Appendix Previous: Appendix
Chantal Korostensky
1999-07-14