We [4] have attempted three models for textons; their results on a cheetah
image are shown below.
Textons refer to fundamental micro-structures in
generic natural images and the basic elements in early (pre-attentive) visual
perception. The study of textons has important implications for several
problems. First, decomposing an image into its constituent components reduces
information redundancy and thus leads to better image coding algorithms.
Second, the decomposed representation often has much lower dimension and
weaker dependence between variables (coefficients), which facilitates the image
modeling needed for segmentation and recognition. Third, in biological vision,
the micro-structures of natural images provide an ecological cue for
understanding the function of neurons in the early stages of the visual system.
However, in the computer vision and visual perception literature, "texton"
remains a vague concept, and a precise mathematical definition has yet to be
found. Below we summarize several studies related to this topic.
-
Sparse coding with over-complete basis
As shown in Figure 1, Olshausen and Field (1997) [3] learned a set of bases in
non-parametric form from a large ensemble of image patches. The bases form an
over-complete set learned under the general principle of sparse coding: each
patch is represented as a linear combination of bases in which only a few
coefficients are significantly different from zero. In contrast to the
orthogonal bases or tight frames of the Fourier and wavelet transforms, these
bases are highly correlated.

Figure 1. Some image bases learned with sparse coding
(Olshausen and Field 1997)
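The following minimal Python sketch illustrates what coding a patch with an over-complete set of bases means. It covers only the coding step, and it is a stand-in rather than the method of [3]. Olshausen and Field learn the bases by minimizing reconstruction error plus a sparseness cost. This sketch instead fixes a tiny hand-made dictionary and finds a sparse code with greedy matching pursuit, which is a different and simpler algorithm. All names (`normalize`, `matching_pursuit`, `D`) are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (matching pursuit assumes unit-norm bases)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse code of `signal` using at most `n_atoms` selections."""
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_atoms):
        # Pick the basis most correlated with what is still unexplained.
        scores = [dot(residual, d) for d in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        if abs(scores[k]) < 1e-12:
            break  # residual is already (numerically) zero
        coeffs[k] += scores[k]
        residual = [r - scores[k] * d for r, d in zip(residual, dictionary[k])]
    return coeffs, residual

# Hypothetical over-complete dictionary: 6 unit vectors in a 4-D "patch" space.
# The four axis-aligned atoms alone would be an orthonormal basis; the two
# diagonal atoms make the set over-complete and correlated.
D = [normalize(v) for v in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
                            [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, -1])]

x = [2 * v for v in D[4]]            # a patch aligned with a diagonal atom
coeffs, residual = matching_pursuit(x, D, n_atoms=3)
# Only coeffs[4] (approximately 2.0) is non-zero.
```

With only the four axis-aligned atoms, the same patch would need two non-zero coefficients. The extra atoms buy sparser codes, and the price is that the bases are correlated.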
-
K-means clustering in feature space
Leung and Malik (1999) [2] use a discriminative model to compute image
elements by clustering filter responses. The image is convolved with a pyramid
of filters at various scales and orientations, giving a vector of filter
responses at each pixel, and these vectors are clustered with K-means. Because
the feature vector over-constrains a local image patch, a pseudo-inverse method
can recover an image icon from each cluster center, as shown in Figure 2. One
problem is evident: the same underlying image structure appears several times
among the icons, as shifted, rotated, or scaled versions of one another.
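The pseudo-inverse recovery can be written out under assumed notation (ours, not taken from [2]). Stack the filters applied to an n-pixel patch as the rows of an m-by-n matrix F, with m > n and F of full column rank, so that a patch x produces the responses r = Fx. Because m > n, a cluster center c in response space generally has no exact solution of Fx = c, which is what "over-constrains" means here. The pseudo-inverse returns the least-squares patch, whose responses are closest to the center:

$$
\hat{\mathbf{x}} \;=\; \arg\min_{\mathbf{x}} \,\lVert F\mathbf{x} - \mathbf{c} \rVert^{2}
\;=\; F^{+}\mathbf{c}
\;=\; \left(F^{\top}F\right)^{-1} F^{\top}\mathbf{c}
$$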
Figure 2. (a) Polka-dot image. (b) Textons found via
K-means with K=25. (c) Mapping of pixels to the texton channels. (Leung and
Malik 1999)
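The clustering step can be sketched as plain K-means (Lloyd's algorithm) over per-pixel feature vectors. This minimal, standard-library-only sketch assumes that the filter responses have already been computed. The tiny 3-D vectors stand in for real filter-bank outputs, and the deterministic farthest-point initialization is our choice for reproducibility, not something taken from [2]. The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_farthest(points, k):
    """Deterministic seeding: start at points[0], then add farthest points."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers

def kmeans(points, k, n_iter=20):
    """Lloyd's K-means; returns (cluster centers, label per point)."""
    centers = init_farthest(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each feature vector joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Toy per-pixel feature vectors (3 "filter responses" each) from two textures.
features = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
            (10.0, 10.0, 10.0), (10.1, 10.0, 10.0), (10.0, 10.1, 10.0)]
centers, labels = kmeans(features, k=2)
# labels == [0, 0, 0, 1, 1, 1]
```

Each cluster center is a vector in response space. Mapping it back to an image icon is the pseudo-inverse step described above.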
-
Transformed components in filter space
To overcome this problem in Leung and Malik's model [2], we [4] adopt a
transformed component analysis (TCA) method that introduces the transformation
as a hidden (latent) variable. Instances of the same image structure that
differ only by a transformation are aligned and combined into one cluster.
Figure 3 shows two examples.
Figure 3. Textons learned by the TCA method in filter space; the label maps
are shown to the right of the icons. (Zhu, Guo, Wu and Wang 2002)
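Treating the transformation as a latent variable can be sketched as a hard-assignment EM, a kind of "transformed K-means". Each vector chooses both a cluster and the transformation that best aligns it with that cluster's center, and the centers are then updated from the aligned vectors. This is a simplified illustration under stated assumptions, not the TCA model of [4]. The transformation set here is only cyclic shifts of a 1-D response profile, assignments are hard rather than probabilistic, and all names are hypothetical.

```python
def shift(v, s):
    """Cyclic shift of a 1-D profile by s positions (the toy transformation)."""
    v = list(v)
    s %= len(v)
    return v[-s:] + v[:-s] if s else v

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def transformed_kmeans(signals, init_centers, n_iter=10):
    """Hard-EM clustering in which each signal also picks a latent shift."""
    centers = [list(c) for c in init_centers]
    n = len(signals[0])
    assign = []
    for _ in range(n_iter):
        # E-step (hard): best (cluster, shift) pair for every signal.
        assign = [min(((j, s) for j in range(len(centers)) for s in range(n)),
                      key=lambda js: sq_dist(shift(centers[js[0]], js[1]), x))
                  for x in signals]
        # M-step: undo each signal's shift, then average within each cluster.
        for j in range(len(centers)):
            aligned = [shift(x, -s) for x, (c, s) in zip(signals, assign) if c == j]
            if aligned:
                centers[j] = [sum(col) / len(aligned) for col in zip(*aligned)]
    return centers, assign

bump = [0, 1, 3, 1, 0, 0, 0, 0]
edge = [1, 1, 1, 1, -1, -1, -1, -1]
signals = [bump, shift(bump, 2), shift(bump, 5), edge, shift(edge, 3)]
centers, assign = transformed_kmeans(signals, init_centers=[bump, edge])
# assign == [(0, 0), (0, 2), (0, 5), (1, 0), (1, 3)]
```

The three shifted bumps and two shifted edges fall into just two clusters, and the recovered shifts are kept in `assign`. Plain K-means would instead tend to spread shifted copies across separate centers, which is the redundancy visible in Figure 2.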
-
Transformed components in image space
In the method above, a local image patch can be obtained from its vector of
filter responses (for example, by pseudo-inverse), but this is not a convenient
way to reconstruct the image: the model is discriminative, not generative.
We [1] build a generative model by using the image patches themselves, rather
than filter responses, as the features. The image patches can move within a
local area and can be rotated and scaled. As with TCA in filter space, these
local patches are transformed to form tight clusters in the 121-dimensional
patch space by an EM algorithm. Figure 4 shows the image elements found in a
cheetah skin pattern.

Figure 4. Textons learned by TCA in image space. (Guo,
Zhu and Wu 2001)
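The sketch below shows one EM iteration for clustering raw patches with a latent transformation. Its assumptions are ours and none are taken from [1]. The only transformations are the four 90-degree rotations, whereas the model in [1] also allows local translation and scaling. Noise is isotropic Gaussian with a fixed sigma, and priors over clusters and rotations are uniform. The patches are tiny 3 by 3 arrays, not patches that flatten to 121 values (for instance 11 by 11). All names are hypothetical.

```python
import math

def rot90(p):
    """Rotate a square patch (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*p[::-1])]

def all_rotations(p):
    """The patch under the 4 rotations, each flattened to a vector."""
    out, q = [], p
    for _ in range(4):
        out.append([v for row in q for v in row])
        q = rot90(q)
    return out

def em_step(patches, means, sigma=1.0):
    """One EM iteration for a mixture of flat templates with a latent rotation."""
    k, dim = len(means), len(means[0])
    sums = [[0.0] * dim for _ in range(k)]
    mass = [0.0] * k
    for x in patches:
        # E-step: log-likelihood of each (cluster, rotation) pair. Comparing
        # rotated x with mu equals comparing x with inversely rotated mu,
        # because rotations preserve distances and form a group.
        cands = []
        for j, mu in enumerate(means):
            for xr in all_rotations(x):
                d = sum((a - b) ** 2 for a, b in zip(xr, mu))
                cands.append((j, xr, -d / (2 * sigma ** 2)))
        top = max(ll for _, _, ll in cands)
        z = sum(math.exp(ll - top) for _, _, ll in cands)
        for j, xr, ll in cands:
            r = math.exp(ll - top) / z  # posterior p(cluster j, rotation | x)
            mass[j] += r
            sums[j] = [s + r * v for s, v in zip(sums[j], xr)]
    # M-step: each mean is the responsibility-weighted average of aligned patches.
    return [[s / mass[j] for s in sums[j]] if mass[j] > 0 else list(means[j])
            for j in range(k)]

ell = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
patches = [ell, rot90(ell), rot90(rot90(ell))]
means = em_step(patches, [all_rotations(ell)[0]], sigma=0.1)
```

The three input patches are rotated copies of one L-shaped patch. The E-step therefore gives almost all responsibility to the rotation that undoes each copy, and the updated mean is the upright L again. Repeating `em_step` until the means stop changing gives the full EM loop.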
-
Texton learning: from bases to textons
A main limitation of the previous work is the lack of variability in the
learned image elements. We [4] propose to define a "texton" as a mini-template
consisting of a number of bases in certain geometric and photometric
configurations. Figure 5 shows an example with a star pattern: each "star" is
represented explicitly by a generative model of several bases. Moreover, these
bases are no longer assumed to be independently distributed; a probabilistic
model that accounts for the spatial relations among the bases is built and
learned.

a) Reconstructing a star pattern with two layers of bases. Each star is
decomposed into a LoG base in the upper layer for the body of the star, plus a
few other bases (mostly Gcos and Gsin) in the lower layer for the angles.

b) The texton template for the star pattern.

c) How bases compose the image of a star.
Figure 5. Illustration of the step from bases to textons, using a star pattern
as an example. (Zhu, Guo, Wu and Wang 2002)
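A texton as a mini-template of bases in a geometric and photometric configuration can be sketched as a small data structure plus a renderer. This is a hypothetical illustration, not the model of [4]. The `Basis` fields, the LoG and Gcos formulas (Gsin is omitted) and the star layout are all assumptions. The learned probabilistic model over the spatial relations of the bases is not reproduced either, so the template here is fixed.

```python
import math
from dataclasses import dataclass

@dataclass
class Basis:
    """One base inside a texton template (field names are hypothetical)."""
    kind: str     # "LoG" (blob) or "Gcos" (even, Gabor-like component)
    dx: float     # geometric offset from the texton centre, in pixels
    dy: float
    scale: float  # Gaussian width sigma
    theta: float  # orientation in radians (unused for LoG)
    amp: float    # photometric contrast

def basis_value(b, x, y):
    """Value of base b at position (x, y) relative to the texton centre."""
    u, v = x - b.dx, y - b.dy
    r2 = u * u + v * v
    g = math.exp(-r2 / (2 * b.scale ** 2))
    if b.kind == "LoG":
        # Proportional to the negative Laplacian of a Gaussian; equals amp at its centre.
        return b.amp * (1 - r2 / (2 * b.scale ** 2)) * g
    # "Gcos": isotropic Gaussian window times a cosine varying along theta.
    along = u * math.cos(b.theta) + v * math.sin(b.theta)
    return b.amp * g * math.cos(along / b.scale)

def star_texton(n_arms=5, radius=4.0):
    """A hypothetical star template: LoG body plus one Gcos base per arm."""
    body = Basis("LoG", 0.0, 0.0, 2.0, 0.0, 1.0)
    arms = []
    for k in range(n_arms):
        a = 2 * math.pi * k / n_arms
        # Arm centred on a circle; its cosine varies across the arm (a + pi/2).
        arms.append(Basis("Gcos", radius * math.cos(a), radius * math.sin(a),
                          1.0, a + math.pi / 2, 0.5))
    return [body] + arms

def render(texton, cx, cy, size):
    """Sum the texton's bases into a size-by-size image centred at (cx, cy)."""
    img = [[0.0] * size for _ in range(size)]
    for b in texton:
        for y in range(size):
            for x in range(size):
                img[y][x] += basis_value(b, x - cx, y - cy)
    return img

star = star_texton()
img = render(star, cx=8, cy=8, size=17)
```

The rendered image has a bright LoG body at the centre surrounded by five oriented Gcos components. Editing `dx`, `dy`, `theta`, `scale` or `amp` changes the texton's geometry or contrast while leaving its composition unchanged.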
References:
-
[1] Guo, C., Zhu, S. and Wu, Y. "Visual learning by integrating descriptive and
generative methods", Proc. of 8th Int'l Conf. on Computer Vision, Vancouver,
Canada, July 2001.
-
[2] Leung, T. and Malik, J. "Recognizing surfaces using three-dimensional
textons", Proc. of 7th Int'l Conf. on Computer Vision, Corfu, Greece,
September 1999.
-
[3] Olshausen, B. and Field, D. "Sparse coding with an over-complete basis set:
A strategy employed by V1?", Vision Research, 37:3311-3325, 1997.
-
[4] Zhu, S., Guo, C., Wu, Y. and Wang, Y. "What are textons?", Proc. of 7th
European Conf. on Computer Vision, Copenhagen, Denmark, May-June 2002.
-
[5] Guo, C., Zhu, S. and Wu, Y. "A mathematical theory of primal sketch and
sketchability", Proc. of 9th Int'l Conf. on Computer Vision, Nice, France, 2003.