We want to build a classifier for a simplistic OCR system. It should map any grayscale image of an ‚a‘ to the discrete symbol ‚a‘. The grayscale image to the left is already a member of a discrete, finite set X: There are 17×20 pixels, and each pixel takes one of 256 values. The number of possible images is small enough to be written down on this page: 6.33E818 needs less than 1K digits. On the other hand, there is no chance to store the classifying function X → Y as a table.
The a-ness of the image survives under a wide range of linear or nonlinear transformations. Most notable are combinations of scaling and threshold operations. First, the image is scaled down to 5×6 pixels. You may need to squint to see more clearly, but the ‚a‘ is still recognizable. The threshold operation separates the thirty pixels into two sets. Each pixel in the dark set is darker than any pixel in the white set and vice versa. Note how small the image space X_{1} has become. It has only binomial(30,15) ≈ 1.55e8 points. The seriphs are still visible. ( see also Phenomena )
Let’s pretend for the moment that we have a good classifier f, perhaps sitting as a black box in front of the screen. The classifier f : X → Y still factors over this small space X_{1}. At least, this is true for the fiber over ‚a‘ ∈ Y. There may be more difficult points in X, where the factorization becomes impossible. Let’s define U_{1} ⊂ X as the subset of X where the factorization of f through h_{1} : X → X_{1} remains possible. If we choose several maps h_{i}: X → X_{i}, the open sets U_{i} may form an open cover of X! We may recover our classifier although the individual spaces are very small.
The useful spaces X_{1} lie somewhere between these extremes.
Yes, certainly. In our simplistic OCR example we’ve ignored character segmentation. The bounding box of our ‚a‘ is not given, we have to compute it, and we should expect some variability in this process. Likewise, there may be small changes in page orientation. Our helper functions h_{i} are locally constant on pitifully small domains. Even very slight changes of bounding box and orientation will be reflected in the bitmaps.
The original ‚a‘ is a 17×20 image. We can crop off any of four borders. That will give us 16 slightly different images.
The scaled-down versions are very similar.
The threshold operation separates images that are close together in R^{30}.
In this special case we may identify the spaces X_{i}. The pairs (h_{i}, X_{i}) are different, though. In general, the X_{i} will have nothing in common.
We can use small rotations instead of clipping:
Or scale down to different grid resolutions:
Or fix the geometry and vary the threshold value:
The range of locally constant maps is in no way exhausted with these graphic examples. The threshold operation is just a very special case. We may, for example, calculate the rank order of all pixel values. This gives a map from R^{6×5} into a discrete set S_{30} of permutations. It is large, but the number of frequently occuring permutations is much smaller. This bold idea works remarkably well with tiny images of 5×4 pixels each! (See here.)In the same vein, every patch of 2×2 pixels in our 6×5 area determines a permutation of four pixel values. It encodes the local gradient. Taken together, these codes give a map from R^{6×5} into {1,…,24}^{5×4}.
Next: On Hamming Distance