ApproxPts

© 2006 John Abbott
GNU Free Documentation License, Version 1.2



index page

User documentation for files ApproxPts.H and ApproxPts.C

[ApproxPts] offers three functions for preprocessing sets of approximate points whose coordinates are given as [double]s. Given a large set of approximate points with considerable overlap of the error boxes of adjacent points, the preprocessing algorithms determine a smaller set of approximate points which preserve the geometrical disposition of the original points but with little or no overlap of the error boxes. The preprocessed points are NOT necessarily a subset of the original points. Here is a quick summary of the functions.

   typedef ApproxPts::ApproxPt ApproxPt;
   vector<ApproxPt>  OriginalPoints; // to be filled with the original approx pts
   vector<double>    epsilon;        // epsilon[i] is semiwidth of error box in dimension i
   vector<ApproxPt>  NewPoints;      // the preprocessed points will be put here
  
   ApproxPts::PreprocessGridAlgm(NewPoints, OriginalPoints, ErrorBoxSemiwidths);
   ApproxPts::PreprocessAggrAlgm(NewPoints, OriginalPoints, ErrorBoxSemiwidths);
   ApproxPts::PreprocessSubdivAlgm(NewPoints, OriginalPoints, ErrorBoxSemiwidths);

PreprocessGridAlgm
is fast but the results tend to be rather crude; it is possible that some of the preprocessed points are close together. In contrast,

PreprocessAggrAlgm
gives much better results but can be considerably slower, perhaps requiring an hour's computation for around 10000 original points. Finally,

PreprocessSubdivAlgm
gives the best results (fewest output points, and best visual disposition of them); it can be rather slower than PreprocessAggrAlgm in certain cases (e.g. when the input points are already fairly well separated) but can be faster (e.g. when only few preprocessed points are produced, which will happen if the original points are densely packed compared to their error neighbourhoods).

All three algorithms work by partitioning the original point into sets, and then choosing the average of each set as the representative of those original points. Full details of these algorithms can be found in Marialaura Torrente's thesis.

PreprocessGridAlgm the sets in the partition comprise all original points which are closer to a fixed grid point than to any other of the grid points. In other words, viewing the grid as a lattice, the whole space can be covered by grid-translates of the fundamental region; the partitions are simply the original points lying in one of these grid-translates.

PreprocessAggrAlgm the sets in the partition are determined by an iterative process. Initially each set contains a single original point, then closest pair of sets are iteratively merged (provided that no original point would end up too far from the average of the merged sets).

PreprocessSubdivAlgm the sets in the partition are determined by an iterative process. Initially there is a single set containing all the original points, then if some original point is too far from the average of the set to which it belng, that point is moved to its own new set, then an iterative redistribution occurs (reassigning original points to optimize the "goodness of representation").

Maintainer documentation for files ApproxPts.H and ApproxPts.C

All the preprocessing algorithms rescale their inputs so that the error widths in each dimension are all equal to 1. The main work is done with these rescaled pointed, and at the very end the results are scaled back.

PreprocessGridAlgm might be better if we were to use std::maps, but it seems fast enough as is. We consider each input point: if the nearest grid point is already in out list of grid points we append the input to its list of associated original points, otherwise we append a new grid point with just the single input point in its list of associated original points. Finally we compute the averages of each list of original points associated to a fixed grid point. These averages are our result.

PreprocessAggrAlgm implements an "aggregative" algorithm: initially the original points are split into subsets each containing exactly one original point, then iteratively nearby subsets are coalesced into larger subsets provided each original point of the two subsets is not too far from the "centre of gravity" of the coalesced set -- this proviso is necessary as otherwise there are pathological examples (see Laura Torrente's thesis).

PreprocessSubdivAlgm implements a "subdivision" algorithm. Initially all original points are placed into a single partition. Then iteratively we seek the original point furthest from the average of its set. If this distance is below the threshold then we stop (all original points are sufficiently well represented by the averages of their sets). Otherwise we separate the worst represented original point into a new set initially containing just itself. Now we redistribute the original points: we do this by minimizing the sum of the squares of the L2 distances of the original points from their respective representatives.

Bugs, Shortcomings and other ideas

I do not like the typedef for ApproxPts::ApproxPt because the name seems very redundant; I am also uneasy about having a typedef in a header file -- perhaps it should be a genuine class?

The preprocessing algorithms should really receive input as a pair of iterators, and the output should be sent to an output iterator. But such an interface would rather uglify the code -- what to do???

Laura Torrente should check this file, and complete the descriptions of the implementation of PreprocessAggrAlgm and PreprocessDisjAlgm.