
Training images - Parameter estimation

A total of 30 training images were used (20% of the image set), with 10 training images for each of the three classes. The remaining 120 images (80%) were used for testing. The feature vector ${\bf {X}}_i$, $i \in\{1,2,3\}$ (the subscript $i$ denotes its class), is extracted from these training images. The parameters ${\bf {\mu}}_i$ and $\Sigma_{i}$ are estimated using maximum likelihood estimation [25], i.e.,
\begin{displaymath}{\bf {\mu}}_{i} = E[{\bf {X}}_{i}] \end{displaymath} (30)
\begin{displaymath}\Sigma_{i} = E[({\bf {X}}_{i} - {\bf {\mu}}_{i})({\bf {X}}_{i} - {\bf {\mu}}_{i})^t] \end{displaymath} (31)
where $E[\cdot]$ is the expectation operator, evaluated here as the average over the training samples of class $i$.

Results obtained

This section describes the results of the two experiments performed on the database. In generating these results we have assumed that the a priori probabilities $P(\Omega_i)$ are equal. The images present in the database were specifically selected to represent an approximately equal a priori distribution. The database contains 150 images, of which 55 are building images, 51 are non-building images, and 44 are intermediate images. Images used in the training phase were not used in either experiment.

The first experiment measured recall and precision. Recall is defined as the fraction of the total number of building images that are retrieved correctly by the system. Precision is defined as the fraction of the images retrieved that are actually building images. In this experiment the term $\log[p({\bf X})]$ in equation 29 may be ignored because it does not depend on the class; hence, $g_i({\bf X}) =-\log\vert\Sigma_i\vert - ({\bf X} - {\bf {\mu}}_i)^t \Sigma_i^{-1} ({\bf X} - {\bf {\mu}}_i)$.
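The estimation and classification steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`mle_gaussian`, `discriminant`, `classify`) are hypothetical, a Cholesky factorization is used in place of an explicit matrix inverse, and each covariance matrix is assumed to be positive definite.

```python
import math

def mle_gaussian(samples):
    """Maximum-likelihood mean and covariance (divides by N, not N - 1)."""
    n, d = len(samples), len(samples[0])
    mu = [sum(x[k] for x in samples) / n for k in range(d)]
    sigma = [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in samples) / n
              for b in range(d)] for a in range(d)]
    return mu, sigma

def cholesky(sigma):
    """Lower-triangular L with L L^t = sigma (assumes sigma is positive definite)."""
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(a + 1):
            s = sigma[a][b] - sum(L[a][k] * L[b][k] for k in range(b))
            L[a][b] = math.sqrt(s) if a == b else s / L[b][b]
    return L

def discriminant(x, mu, sigma):
    """g_i(X) = -log|Sigma_i| - (X - mu_i)^t Sigma_i^{-1} (X - mu_i)."""
    L = cholesky(sigma)
    log_det = 2.0 * sum(math.log(L[k][k]) for k in range(len(L)))
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Solve L z = (x - mu) by forward substitution; z.z equals the quadratic form.
    z = []
    for a in range(len(L)):
        z.append((diff[a] - sum(L[a][k] * z[k] for k in range(a))) / L[a][a])
    return -log_det - sum(v * v for v in z)

def classify(x, params):
    """0-based index of the class whose discriminant is largest."""
    scores = [discriminant(x, mu, sigma) for mu, sigma in params]
    return max(range(len(scores)), key=scores.__getitem__)
```

Here `params` would hold one `(mu, sigma)` pair per class, each obtained by calling `mle_gaussian` on the training feature vectors of that class.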

Some of the images retrieved by the system that are classified as building images are shown in figure 3. Recall and precision are shown in table 2. The first column shows the three classes. The second, third and fourth columns show the number of images (T) in each of the three classes, the number of images retrieved (R) in the respective classes, and the number of correct images (C) in the set of images retrieved, respectively.

Figure 3: Some of the building images retrieved.

Table 2: Recall and precision.

Class        T    R    C    Recall (C/T)   Precision (C/R)
$\Omega_1$   45   43   36   80.00%         83.72%
$\Omega_2$   41   32   25   60.98%         78.13%
$\Omega_3$   34   45   21   61.76%         46.67%


The system retrieved a set of 43 images as building images. Of these, 36 images were actually building images. Therefore, the system has a recall of 80% (36/45) and a precision of 83.72% (36/43) for the building class. The values of recall and precision for the other two classes are also shown in table 2.
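The arithmetic behind table 2 can be checked with a short Python sketch. The counts (T, R, C) are taken from the table; the function name `recall_precision` and the dictionary layout are assumptions made for illustration only.

```python
def recall_precision(total, retrieved, correct):
    """Return (recall, precision) = (C/T, C/R) as fractions."""
    return correct / total, correct / retrieved

# (T, R, C) for each class, as listed in table 2.
table2 = {
    "Omega_1": (45, 43, 36),
    "Omega_2": (41, 32, 25),
    "Omega_3": (34, 45, 21),
}

# Maps each class name to its (recall, precision) pair.
results = {name: recall_precision(*counts) for name, counts in table2.items()}
```

Multiplying each fraction in `results` by 100 gives the percentages reported in the table.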
Table 3: Distribution of correct images in the ``best matches'', and the efficiency of the system.

1            2      3       4       5       6        7         8    9    10
Class        1-20   21-40   41-60   61-80   81-100   101-120   T    M    Efficiency (M/T)
$\Omega_1$   19     15      8       2       1        -         45   36   80.00%
$\Omega_2$   17     12      8       3       1        -         41   29   70.73%
$\Omega_3$   10     9       8       3       3        1         34   17   50.00%


In the second experiment the ``best matches'' for the three classes were retrieved. Images were sorted in descending order of the corresponding value of $g_i({\bf X})$. The term $\log[p({\bf X})]$ in equation 29 cannot be ignored now; hence, $g_i({\bf X}) = -\frac {1} {2} \log\vert\Sigma_i\vert - \frac {1} {2} ({\bf X} - {\bf {\mu}}_i)^t \Sigma_i^{-1} ({\bf X} - {\bf {\mu}}_i) - \log[\sum_{j=1}^{3} \vert\Sigma_j\vert^{-\frac {1} {2}} e^{-\frac {1} {2} [({\bf X} - {\bf {\mu}}_j)^t \Sigma_j^{-1} ({\bf X} - {\bf {\mu}}_j)]}]$.
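The ranking step, together with the efficiency figure (M/T) reported in table 3, can be sketched as follows. This is an illustration under assumptions rather than the authors' code: it assumes equal priors and Gaussian class-conditional densities, so that subtracting $\log[p({\bf X})]$ amounts to normalising over the three classes. It takes as input, for each image, the hypothetical per-class terms `h[j]` $= -\frac{1}{2}\log\vert\Sigma_j\vert - \frac{1}{2}({\bf X} - {\bf \mu}_j)^t \Sigma_j^{-1} ({\bf X} - {\bf \mu}_j)$, computed elsewhere. All function names are hypothetical.

```python
import math

def log_posterior(h, i):
    """g_i = h_i - log(sum_j exp(h_j)), evaluated stably via log-sum-exp."""
    m = max(h)
    return h[i] - (m + math.log(sum(math.exp(v - m) for v in h)))

def best_matches(h_per_image, i):
    """Image indices sorted by descending g_i for class i."""
    scores = [log_posterior(h, i) for h in h_per_image]
    return sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)

def efficiency(ranking, labels, i):
    """Fraction M/T of class-i images found among the first T best matches."""
    t = sum(1 for lab in labels if lab == i)
    m = sum(1 for n in ranking[:t] if labels[n] == i)
    return m / t

# Toy values (not from the paper): four images, three classes.
h_per_image = [[-1.0, -5.0, -6.0], [-4.0, -1.0, -3.0],
               [-2.0, -2.5, -7.0], [-6.0, -2.0, -1.0]]
labels = [0, 1, 0, 2]
ranking = best_matches(h_per_image, 0)
eff = efficiency(ranking, labels, 0)
```

Subtracting the maximum before exponentiating avoids underflow when the quadratic forms are large, which is why the normalising term is computed in log-sum-exp form.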

The best matches are analyzed in table 3. The first column shows the three classes. The number of images that actually belong to a particular class within the best matches is shown, in ranges of 20 images, in the $2^{nd}$ to $7^{th}$ columns of the table. Efficiency is defined as the number of images (M) that actually belong to a particular class among the first T best matches for that class, expressed as a fraction of T. These values are shown in the $8^{th}$ to $10^{th}$ columns of the table.

Conclusions

This paper has presented perceptual grouping as an effective tool in the content-based image retrieval framework for the retrieval of images based on the semantic interrelationships of different primitive image features. A methodology for applying perceptual grouping rules to the retrieval of building images from a database of still monocular grayscale outdoor images is illustrated as an example. The images were taken from a ground-level camera.

The system analyzed each of the images to extract features that provide strong evidence of the presence of buildings. These features are generated by the strong boundaries typical of the different structures that comprise a building. The features, which are specific shapes of corners, junctions and parallels, are obtained through bottom-up perceptual grouping of primitive image features. A Bayesian framework analyzed these features and retrieved the images that it perceived to be building images. The results obtained encourage future work on applying higher-level semantic knowledge to image retrieval.


Qasim Iqbal 2001-03-01