Development of Methods for Estimating the Prediction Error of FIR

Introduction

The five nearest neighbors (5NN) in the database of experience data, the so-called training data records, are used to predict a quantitative value of the output variable of a qualitative Fuzzy Inductive Reasoning (FIR) model.

To this end, the fuzzy values of the input variables of the testing data record to be predicted are compared with the fuzzy values of the input variables of all stored training data records. The 5NNs are determined by performing a pseudo-quantification in the input space of the testing data record as well as in the input spaces of all training data records. The pseudo-quantified records are then compared with each other in the input space, and the 5NNs are selected as the training data records with the smallest Euclidean distances from the testing data record.

The prediction of the output value of the testing data record is then computed as a weighted average of the output values of the 5NNs, with neighbors that are closer to the testing data record in the input space being assigned larger weight factors.
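The selection and averaging steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `five_nn_predict`, the plain numeric input vectors standing in for pseudo-quantified records, and the inverse-distance weighting are all hypothetical choices, not the weighting formula that FIR itself uses.

```python
import math


def five_nn_predict(train_inputs, train_outputs, query, k=5):
    """Predict an output as a distance-weighted average of the k nearest neighbors.

    train_inputs: list of numeric input vectors (stand-ins for pseudo-quantified records)
    train_outputs: list of numeric output values, one per training record
    query: numeric input vector of the testing data record
    """
    # Euclidean distance from the query to every training record in the input space.
    dists = [math.dist(x, query) for x in train_inputs]
    # Indices of the k training records closest to the query, nearest first.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Assumed weighting: inverse distance, so closer neighbors get larger weights.
    # The small eps avoids a division by zero when a neighbor coincides with the query.
    eps = 1e-12
    raw = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Weighted average of the neighbors' output values.
    prediction = sum(w * train_outputs[i] for w, i in zip(weights, nearest))
    return prediction, nearest, weights


if __name__ == "__main__":
    inputs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (10, 10)]
    outputs = [1.0, 2.0, 2.0, 3.0, 5.0, 40.0]
    pred, idx, w = five_nn_predict(inputs, outputs, (0.4, 0.4))
    print(pred, idx, w)
```

The function also returns the neighbor indices and weights, since any confidence estimate would need the same neighbors that produced the prediction.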

Two types of errors can occur when making predictions of analytical functions. These are related to the resolution inaccuracies in the input and output spaces.

The resolution inaccuracy in the input space relates to the relevance of the available training data. The larger the distance of the 5NNs from the testing data record in the input space, the more heavily the prediction must interpolate in the output space. Consequently, the confidence in a prediction of an output variable must be related to the distance of the 5NNs from the testing data record in the input space.
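As a rough illustration of how neighbor distance could be turned into a confidence value, the sketch below maps the mean distance of the neighbors to a number between 0 and 1. The function name `distance_confidence`, the `scale` parameter, and the mapping itself are assumptions made for this example; this is not the probabilistic distance metric developed in the dissertation.

```python
import math


def distance_confidence(neighbor_inputs, query, scale=1.0):
    """Map the mean neighbor distance in the input space to a value in (0, 1].

    scale is a hypothetical tuning parameter: the mean distance at which
    the confidence drops to 0.5.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Mean Euclidean distance of the neighbors from the query in the input space.
    mean_dist = sum(math.dist(x, query) for x in neighbor_inputs) / len(neighbor_inputs)
    # Assumed mapping: 1.0 when all neighbors coincide with the query,
    # decreasing monotonically toward 0 as the neighbors move away.
    return 1.0 / (1.0 + mean_dist / scale)
```

Any monotonically decreasing mapping would serve the same illustrative purpose; the reciprocal form is chosen here only because it needs a single parameter.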

The resolution inaccuracy in the output space relates to the internal consistency among the 5NNs. It can happen that neighbors that are very close to each other in the input space are associated with very different output values. Such neighbors are virtually useless for making a prediction. Consequently, the confidence in a prediction of an output variable must also be related to the dispersion among the 5NNs in the output space.
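The dispersion argument can be illustrated in the same spirit. The sketch below scores the agreement among the neighbors' output values using their standard deviation, normalized by the width of the output range. The name `dispersion_confidence`, the `output_range` parameter, and the normalization are assumptions for illustration; this is not the possibilistic similarity metric from the dissertation.

```python
import statistics


def dispersion_confidence(neighbor_outputs, output_range):
    """Return 1.0 when all neighbor outputs agree and 0.0 at maximal dispersion.

    output_range is the assumed width (max - min) of the output variable.
    """
    if output_range <= 0:
        raise ValueError("output_range must be positive")
    # Population standard deviation of the neighbors' output values.
    spread = statistics.pstdev(neighbor_outputs)
    # For values confined to an interval of width R, the standard deviation
    # cannot exceed R / 2, so this ratio lies in [0, 1]; min() guards against
    # outputs that fall outside the assumed range.
    normalized = min(1.0, spread / (output_range / 2.0))
    return 1.0 - normalized
```

A low score signals the situation described above: neighbors that are close in the input space but disagree strongly in the output space.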

In her Ph.D. dissertation, Josefina (Fina) López investigated how the confidence values can be used to estimate the prediction error. To this end, she developed two separate metrics, a probabilistic distance metric and a possibilistic similarity metric, that quantify the confidence to be placed in a prediction [1].

The estimate of the confidence in a prediction was furthermore exploited to improve the prediction itself. To this end, several predictions were computed in parallel using different sub-optimal masks (qualitative models). In each forecasting step, the prediction accompanied by the largest confidence value was accepted [2].
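The per-step selection among parallel predictions can be sketched as a simple maximum over confidence values. The tuple layout `(mask_id, prediction, confidence)` and the function names are hypothetical; how each mask produces its prediction and confidence is outside the scope of this sketch.

```python
def select_most_confident(candidates):
    """Pick the candidate with the largest confidence value.

    candidates: iterable of (mask_id, prediction, confidence) tuples,
    one per sub-optimal mask evaluated in parallel.
    """
    return max(candidates, key=lambda c: c[2])


def forecast(steps):
    """Apply the selection independently in every forecasting step.

    steps: list with one list of candidates per forecasting step.
    Returns the accepted (mask_id, prediction, confidence) for each step.
    """
    return [select_most_confident(candidates) for candidates in steps]
```

Because the choice is made anew in every step, the mask that supplies the accepted prediction can change from one forecasting step to the next.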

Most Important Publications

[1] Cellier, F.E., J. López, A. Nebot, and G. Cembrano (1996), Means for Estimating the Forecasting Error in Fuzzy Inductive Reasoning, Proc. ESM'96, European Simulation MultiConference, Budapest, Hungary, pp. 654-660.
[2] López, J., and F.E. Cellier (1999), Improving the Forecasting Capability of Fuzzy Inductive Reasoning by Means of Dynamic Mask Allocation, Proc. ESM'99, European Simulation MultiConference, Warsaw, Poland, pp. 355-362.
[3] López, J. (1999), Time Series Prediction Using Inductive Reasoning Techniques, Organització i Control de Sistemes Industrials, Universitat Politècnica de Catalunya, Barcelona, Spain.
