Search Space and Error Analysis

Next: Dynamic Programming Approach Up: Appendix Previous: Appendix

Search Space and Error Analysis

For a tree with n different leaves, there are in the order of $N(n) = 3 * 5 * .. * (2n -5) = \prod_{k=1}^{n-3} 2*k +1$ different tree topologies [10]. If a circular order is given, there are O(2^n-3) different trees that can be built with that order. So a circular order alone reduces the search space. To get an intuition we build a small table (see Table 7.1).

**Figure 11:** Number of possible trees that can be built without and with a circular order C
$\begin{figure} \begin{tabular}{\vert c\vert c\vert c\vert}\hline n & no circu... ...000 \\ \hline \end{tabular}\begin{footnotesize}\end{footnotesize} \end{figure}$

We have seen that the probability to get a wrong TSP order is very small. But given a correct TSP order, there are still 2^n-3 possible different trees. So far we know that if the error of each distance measurement is not larger then $\frac{x}{2}$ , where x is the shortest edge length, we can guarantee that the tree construction is correct. As a next step we want to determine the probability to get a wrong tree for a given normal distribution of the distance measurement. Our goal is to express the probability to get a wrong edge in terms of the shortest edge x. At each connection step, we connect leaf s_i and s_i+1 with the smallest $\delta (s_i)$ . Now assume that $\delta (s_i)$ has some error s.t we connect the wrong leaves, and one of the correct choices would have been $\delta(s_j)$ . We know that if there was no error, then
$\delta (s_i) = d(s_{i-1}, s_i) + d(s_{i+1}, s_{i+2}) - d(s_{i-1}, s_{i+1}) - d(s_i, s_{i+2}) = -2 \cdot x$
We assume that the $\epsilon _{ij}$ follow a normal distribution with standard deviation $\sigma$ . Then the error $\epsilon_{d}$ also follows a normal distribution $\sigma_d$ , where the standard deviation is the sum of the individual standard deviations: $\sigma_d = 4 \sigma$ (see Figure 12). In order to get a wrong connection, the following condition must be true: for all $j \neq i$ , where $\delta(s_j)$ is a correct connection, $\epsilon_{dd} = \delta(s_i) - \delta(s_j) \leq 2\cdot x$ . This again is a normal distribution, with $\sigma_{dd} = 8 \sigma$ . $\epsilon_{dd}$ must be $\leq 2\cdot x$ for all possible valid connections. For a balanced tree with n leaves (tree depth log(n)), there are at most $\frac{n}{2}$ valid connections. In the worst case, if the tree is extremely unbalanced, there is only 1 correct connection at each step. Let $P_{dd}(\sigma)$ be the probability that one $\epsilon_{dd} \leq 2\cdot x$ for a given standard deviation $\sigma$ :
$P_{dd}(\sigma) = \frac{2}{\sqrt{2 \pi}} \int \limits_{z= \frac{x}{4 \sigma }}^{inf} e^{\frac{1}{2}z^2}$
So the overall probability to get a wrong connection is:
$P_w(\sigma) = P_{dd}(\sigma)^\frac{n}{2}$ in the best case, and
$P_w(\sigma) = P_{dd}(\sigma)$ in the worst case.

**Figure:** Normal distribution for the error of $\delta (s_i)$
$\begin{figure} \begin{center} \mbox{\psfig{file=normal.eps,height=0.12\textheight,angle=0} } \end{center} \end{figure}$

To illustrate the probability to get the shortest edge x wrong we build a table (see Table 14) for an example tree of 16 leaves. So if the standard deviation $\sigma$ of the error $\epsilon$ of the distance measurement is $50\%$ of the shortest edge x, then the probability to get this edge wrong is $2\%$ in the best, and $60\%$ in the worst case. Hence for very large errors, a dynamic programming approach is more appropriate.

**Figure:** Probability to get the shortest edge x wrong depending on the standard deviation $\sigma$ of the error $\epsilon _{ij}$ of the distance measurements in terms of x.
$\begin{figure} \begin{center} \begin{tabular}{\vert c\vert c\vert c\vert} \hl... ...\%$\space & $80\%$\space \\ \hline \end{tabular} \end{center} \end{figure}$

Next: Dynamic Programming Approach Up: Appendix Previous: Appendix

Chantal Korostensky
1999-07-14