The letter ‚b‘ as a grayscale image with 21×32 pixels.
It is a point in X = [0,1]^{21×32}.
The image is clipped in 16 different ways, then scaled down to 5×7 pixels.
A threshold operation turns them into black and white images with 17 black pixels each.
Each image as a hexadecimal code:
[0] 44a50f7bc   [1] 44a58f79c   [2] 44ad0f79c   [3] 05ad0f79c
[4] 44a50f7bc   [5] 44a50f7bc   [6] 44ad0e7bc   [7] 05ad0f79c
[8] 44ac8779e   [9] 45a48779e  [10] 45ac0f79c  [11] 45ac8779c
[12] 44aca779c  [13] 44ad8779c  [14] 45ac0f79c  [15] 45ad0779c
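The pipeline can be sketched in a few lines. The exact clipping offsets and the scaling filter are not specified above, so `downscale`, `threshold` and `to_hex` below are assumed, minimal versions: nearest-pixel sampling, and the convention (inferred from the codes, each of which has 18 one-bits) that the 9 hex digits pack the 35 pixels with light = 1 and black = 0.

```python
def downscale(image, w=5, h=7):
    """Resample a grayscale image (list of rows, values in [0,1])
    to w x h by nearest-pixel sampling (an assumed, crude filter)."""
    H, W = len(image), len(image[0])
    return [[image[r * H // h][c * W // w] for c in range(w)]
            for r in range(h)]

def threshold(small, n_black=17):
    """Turn the n_black darkest pixels black (0), the rest white (1)."""
    flat = [p for row in small for p in row]
    order = sorted(range(len(flat)), key=lambda i: flat[i])
    dark = set(order[:n_black])
    return [0 if i in dark else 1 for i in range(len(flat))]

def to_hex(bits):
    """Pack the 35 bits (light = 1) into a 9-digit hexadecimal code."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return format(value, "09x")
```

Applied to a 21×32 grayscale image, `to_hex(threshold(downscale(img)))` yields a 9-digit code of the kind listed above; the 16 variants would come from 16 different clippings before downscaling.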
A nearest neighbor classifier that uses Euclidean distance in R^{21×32} must calculate many scalar products or vector norms. Early computers were pitifully slow. By replacing Euclidean distance in R^{21×32} with Hamming distance in the space of bitmaps, we can speed up the calculation considerably. This programming trick has been around for a long time.
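The trick itself is one XOR and one popcount per comparison; a minimal sketch (the helper names are mine, not from the text):

```python
def hamming(a, b):
    """Hamming distance between two bit vectors stored as integers:
    XOR marks the differing bits, then count them."""
    return bin(a ^ b).count("1")

def nearest(code, prototypes):
    """Nearest neighbor lookup: one XOR and one popcount per stored
    prototype, instead of a 672-dimensional Euclidean norm."""
    return min(prototypes, key=lambda p: hamming(code, p))
```

For example, `hamming(0x44a50f7bc, 0x44a58f79c)` is 2, matching the first entry in the distance table below.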
If the reduction map from R^{21×32} to bit vectors were isometric, it would be a perfect solution, and nothing more than a trick. But it is not even close. The 16 small grayscale images form a tight cluster of similar images, while their binary counterparts look quite erratic. It is difficult to represent letters as small bitmaps. The following table shows the Hamming distance between image [i] and image [j]. The diameter of this cluster is eight!
     [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [a] [b] [c] [d] [e] [f]
[0]   0   2   2   4   0   0   2   4   6   6   4   6   6   4   4   4
[1]   2   0   2   4   2   2   4   4   4   4   4   4   4   2   4   4
[2]   2   2   0   2   2   2   2   2   4   6   2   4   4   2   2   2
[3]   4   4   2   0   4   4   4   0   6   6   2   4   6   4   2   2
[4]   0   2   2   4   0   0   2   4   6   6   4   6   6   4   4   4
[5]   0   2   2   4   0   0   2   4   6   6   4   6   6   4   4   4
[6]   2   4   2   4   2   2   0   4   6   8   4   6   6   4   4   4
[7]   4   4   2   0   4   4   4   0   6   6   2   4   6   4   2   2
[8]   6   4   4   6   6   6   6   6   0   2   4   2   2   2   4   4
[9]   6   4   6   6   6   6   8   6   2   0   4   2   4   4   4   4
[a]   4   4   2   2   4   4   4   2   4   4   0   2   4   4   0   2
[b]   6   4   4   4   6   6   6   4   2   2   2   0   2   2   2   2
[c]   6   4   4   6   6   6   6   6   2   4   4   2   0   2   4   4
[d]   4   2   2   4   4   4   4   4   2   4   4   2   2   0   4   2
[e]   4   4   2   2   4   4   4   2   4   4   0   2   4   4   0   2
[f]   4   4   2   2   4   4   4   2   4   4   2   2   4   2   2   0
The diameter of this cluster of images of the same letter ‚b‘ is eight. Which image should be chosen as a prototype for a nearest neighbor classifier? There is only one conclusion: We need to store them all. They are still somewhat close together, because they are effects of the same cause.
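The diameter claim can be checked directly from the 16 hexadecimal codes listed above:

```python
# The 16 codes of the letter 'b' from the list above.
codes = [int(c, 16) for c in
         "44a50f7bc 44a58f79c 44ad0f79c 05ad0f79c "
         "44a50f7bc 44a50f7bc 44ad0e7bc 05ad0f79c "
         "44ac8779e 45a48779e 45ac0f79c 45ac8779c "
         "44aca779c 44ad8779c 45ac0f79c 45ad0779c".split()]

def hamming(a, b):
    """Bit-level Hamming distance via XOR and popcount."""
    return bin(a ^ b).count("1")

# Largest pairwise distance within the cluster.
diameter = max(hamming(a, b) for a in codes for b in codes)
print(diameter)  # -> 8, attained e.g. by the pair [6], [9]
```

Note that several clippings collapse to identical bitmaps (e.g. [0], [4] and [5]), which is why zeros appear off the diagonal of the table.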
A nearest neighbor classifier with few stored prototypes cannot possibly work at this low resolution. The table below shows 16 images of the letter ‚h‘ and their pairwise distances from the 16 images of the letter ‚b‘. The smallest distance is only 2: a change of two pixels may turn a ‚b‘ into an ‚h‘. Can you spot the stepping stone that connects the two clusters?
     [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [a] [b] [c] [d] [e] [f]
[0]   4   4   2   4   4   4   2   4   6   6   4   6   6   6   4   4
[1]   4   4   4   4   4   4   4   6   4   4   4   4   4   4   6   4
[2]   6   6   4   6   6   6   4   6   6   6   6   6   6   6   6   6
[3]   6   6   4   6   6   6   4   4   6   6   6   4   6   6   4   6
[4]   4   4   2   4   4   4   2   4   6   6   4   6   6   6   4   4
[5]   4   4   2   4   4   4   2   4   6   6   4   6   6   6   4   4
[6]   6   6   4   6   6   6   4   6   8   8   6   8   8   8   6   6
[7]   6   6   4   6   6   6   4   4   6   6   6   4   6   6   4   6
[8]   8   8   8   8   8   8   8  10   8   8   8   8   8   8  10   8
[9]   8   8   8   8   8   8   8   8   8   8   8   6   8   8   8   8
[a]   8   8   6   8   8   8   6   6   8   8   8   6   8   8   6   8
[b]   8   8   8   8   8   8   8   8   8   8   8   6   8   8   8   8
[c]   8   8   8   8   8   8   8  10   8   8   8   8   8   8  10   8
[d]   6   6   6   6   6   6   6   8   6   6   6   6   6   6   8   6
[e]   8   8   6   8   8   8   6   6   8   8   8   6   8   8   6   8
[f]   8   8   6   8   8   8   6   6   8   8   8   6   8   8   6   8
Interestingly enough, the specimens with Hamming distance 2 are among the best representations of ‚b‘ and ‚h‘ at this resolution. The bad, severely pixelated versions are separated by larger distances. The search for perfect representations actually harms discrimination performance!
Um, nothing, really.
It’s just silly to expect that these simple constructions will give a nonexpanding map to a discrete set of symbols, defined almost everywhere, that factors through the space of bitmaps with the Hamming metric.
The discretization maps h_{k} from X to {0,1}^{35} are definitely not useless. After all, we can still recognize ‚b‘ and ‚h‘ at this resolution.
Let’s look at the pre-images of balls around x = ‚b‘ ∊ X.
First, consider k=0. The ball is a single black-and-white bitmap. What does the fibre look like? The pre-image in [0,1]^{35} is the solution set of a system of 306 inequalities among the coordinates. Indeed, each bitmap corresponds to a partitioning of {1,2,…,35} into a light subset of size 18 and a dark subset of size 17. Each pixel from the dark subset must be darker than every pixel from the light subset, which gives 17·18 = 306 inequalities, and there are no further constraints. The solution set of these inequalities is a polytope inside [0,1]^{35}. It does not look like a small ball in Euclidean space at all: its diameter is large, its volume is small. Yet most of its members are clearly recognized as the letter ‚b‘.
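Membership in this fibre is easy to test: the 306 pairwise inequalities collapse to a single comparison of extremes. A minimal sketch (the function name and data layout are mine):

```python
def in_fibre(gray, bitmap):
    """Check whether a grayscale 5x7 image (35 values in [0,1], as a
    flat list) discretizes to the given bitmap (35 bits, 1 = light).
    'Every dark pixel is darker than every light pixel' holds for all
    17*18 = 306 pairs iff max(dark) < min(light)."""
    dark  = [g for g, b in zip(gray, bitmap) if b == 0]
    light = [g for g, b in zip(gray, bitmap) if b == 1]
    return max(dark) < min(light)
```

The polytope is exactly the set of grayscale images for which this single comparison succeeds.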
The pre-image of B_{1}(x) is the union of this set with its 35 neighbors of similar shape. The larger B_{k}(x) start filling the entire space, until B_{35} equals X. Very soon, the strong impression of ‚b‘-ness is lost. There are examples of the letter ‚h‘ at distance 2 from a ‚b‘. Obviously, Hamming distance in the space of 5×7 bitmaps is fairly useless for distances greater than zero. The bitmap space may be treated as discrete without much loss.
As an application of the vernier principle, we now use several discretization maps h_{k} from X to discrete spaces X_{k}, k = 1,…,n, at once.
Each X_{k} carries the discrete metric. On their product, we have a nontrivial Hamming metric given by d(x_{1},x_{2}) = 1/n · Sum(k=1,n, [h_{k}(x_{1}) ≠ h_{k}(x_{2})]). The balls are defined as B_{ε}(x) = { y ∊ X : d(x,y) ≤ ε }. Small balls are B_{ε} with small ε ≥ 0; B_{0} is a single point and B_{1} is the whole space. The space of 5×7 bitmaps considered above arises in the same way, as a product of the 35 single-bit spaces. In that case the individual spaces X_{k} are too small, and the discrete metric on them conveys only a tiny amount of information. If we instead take each X_{k} to be the space of 5×7 bitmaps, the value of a single coordinate function h_{k}(x) is already almost sufficient to identify the letter ‚b‘.
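The metric itself is a one-liner. Here is a sketch with toy one-dimensional discretization maps (threshold functions on the unit interval, an assumption for illustration, not the image pipeline from the text):

```python
def product_hamming(maps, x, y):
    """Normalized Hamming metric on the product Prod(k=1,n, X_k):
    the fraction of coordinates k with h_k(x) != h_k(y)."""
    return sum(1 for h in maps if h(x) != h(y)) / len(maps)

# Toy discretizations: four threshold maps on [0,1].
maps = [lambda x, t=t: x > t for t in (0.2, 0.4, 0.6, 0.8)]
```

For instance, `product_hamming(maps, 0.3, 0.5)` is 1/4: only the threshold at 0.4 separates the two points, so exactly one of the four coordinates differs.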
The pre-image of B_{0} under this map is an intersection of the polyhedral shapes discussed above. It is much smaller, and it is more rounded than the spiky monsters of the individual h_{k}. The small movements that differentiate between the h_{k} break the sharp edges. Locally, for ε ≪ 1, the map h is almost an isometry. All global information, however, is lost as the distance reaches 1. This is fine: we can’t expect to get a global classifier all in one go. We have already seen that the Euclidean metric on X isn’t the last word either. Why should we bother to preserve it globally? We just want to obtain spaces and maps by gluing local data.
The discretization maps that arise from small distortions, scaling and thresholding are familiar to anyone who has used the Gimp or Photoshop. Unfortunately, there are no simple visualizations of the fibres. The Voronoi diagram below shows another discretization map that’s perhaps better known. X is the unit square [0,1]×[0,1] with the Euclidean metric. The small discrete space X_{1} consists of 100 points chosen at random in X. The discretization map h_{1} takes each x ∊ X to its nearest neighbor in X_{1}. A larger X_{1} gives us a higher resolution. Alternatively, we can use several spaces X_{1} … X_{n}, each consisting of points in X chosen at random. The fibres of the map into Prod(k=1,n, X_{k}) are intersections of Voronoi cells. Their number grows slowly with n because the dimension of X is low (two). The total number of points in the product space grows much more rapidly, but the image of X in it stays small.

And still, the Hamming metric in the product space preserves some local information about X. For example, a path in X that connects two points a and b will appear as a sequence of points in the product space. Generically (add a small deformation if necessary), the path will never cross more than one cell boundary at once. In the product space, one component will change after the other. This gives us a sequence of points x_{0} = h(a), x_{1}, …, x_{s} = h(b) that connects h(a) and h(b) in Hamming space. The distance between adjacent points has the smallest positive value, 1/n. Perception tends to be constant along such paths!
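The Voronoi construction above can be sketched directly; the parameter choices (8 maps, 100 random sites each) and all names here are illustrative assumptions:

```python
import random

random.seed(1)
n, m = 8, 100  # n discretization maps, m random sites per map
sites = [[(random.random(), random.random()) for _ in range(m)]
         for _ in range(n)]

def h(k, x):
    """Index of the Voronoi cell of x in the k-th random point set X_k,
    i.e. the nearest site by (squared) Euclidean distance."""
    return min(range(m), key=lambda i: (sites[k][i][0] - x[0]) ** 2
                                     + (sites[k][i][1] - x[1]) ** 2)

def d(x, y):
    """Hamming metric on the product: the fraction of the n maps
    that put x and y into different Voronoi cells."""
    return sum(1 for k in range(n) if h(k, x) != h(k, y)) / n
```

Two nearby points of the square typically share most of their cells, so d stays small; walking along a generic path, one coordinate changes at a time and d moves in steps of 1/n.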