
Object Recognition

Object recognition is very similar to object classification, and perhaps even to detection. It is quite sensible to ask: what is the difference between object classification and object recognition? Even I would say that I have trouble. One way that I like to think about it is that classification is more about differentiating instances that are clearly different (fork vs. spoon vs. knife, or cat vs. dog vs. face), whereas recognition is more about differentiating objects within a class (dinner fork vs. salad fork vs. seafood fork, or the faces of different celebrities). We more often hear “face recognition” than “face classification.” Likewise, we more often hear animal classification than animal recognition (as in cat vs. dog vs. human vs. elephant).

So, here we study methods that deal with differentiating objects within a given class of objects (or so I believe …). It may also involve differentiation across different types of objects.

/*

(1) object detector with boosting: 
http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html 

*/

Module #1: Bag-of-Words

Before people got into auto-learning feature spaces, it was common to try to hand-craft a feature space, or to come up with a mechanism for creating feature spaces that generalized. One such mechanism is called Bag-of-Words. It comes from the text classification field. Imagine trying to classify a book as either horror, romance, comedy, or tragedy. You would imagine that the language in each would be slightly different, and that the words that inspired the related emotions would vary. By comparing the frequencies of special words in the books, one could identify the type of stories the books contained. Computer vision researchers made an effort to translate that same concept to imagery. But how do images have words? Well, let's see …
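
The word-counting idea fits in a few lines. Here is a minimal Python sketch; the two "books" and their genre labels are made up purely for illustration:

```python
from collections import Counter

# Made-up snippets standing in for two "books" (purely illustrative).
horror = "the shadow crept and the scream echoed in the dark dark night"
romance = "the letter spoke of love and love again beneath the summer sky"

# A bag of words is just a histogram of word counts; word order is discarded.
horror_bow = Counter(horror.split())
romance_bow = Counter(romance.split())

# Emotion-laden words separate the genres even in these tiny samples.
print(horror_bow["dark"], romance_bow["dark"])    # 2 0
print(horror_bow["love"], romance_bow["love"])    # 0 2
```

For images, the open question is what plays the role of a "word," which is what the weeks below build up.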

Week #1: Clustering to Define Words

  1. Study the k-means clustering algorithm and its algorithmic steps.
  2. Download (or clone) the clustering skeleton code here
  3. Implement the k-means clustering algorithm in RGB space by following the algorithmic steps. You are welcome to implement it from scratch without the skeleton code.
  4. Test your algorithm by segmenting the image segmentation.jpg with k=3.
  5. Try different random initializations and show the corresponding results.
  6. Comment on the differences among your segmentation results.

Matlab Notes: Matlab has several functions that can assist with the calculations so that you do not have to process the data in for loops. For example, there is pdist2; the mean function can process data row-wise or column-wise if specified properly; and there are ways to perform sub-assignments using either a binary array or a list of indices. You should take advantage of these opportunities to write compact code.
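
For reference, the algorithmic steps can be sketched compactly. A minimal Python/NumPy version (the course skeleton is in Matlab; the function name and defaults here are my own):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means on an (N, d) array X; returns labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # random initialization
    for _ in range(iters):
        # Assign each point to its nearest center
        # (the distance matrix pdist2 would give you in Matlab).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# To segment an image: reshape its (H, W, 3) RGB pixels to (H*W, 3),
# cluster with k = 3, then reshape the labels back to (H, W).
```

Different seeds give different initializations, which is exactly what step 5 asks you to explore.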

Week #2: Object Recognition

  1. Study the bag-of-words approach for the classification/recognition task.
  2. We begin by implementing a simple but powerful recognition system to classify faces and cars.
  3. Check here for skeleton code. First, follow the README to set up the dataset and the vlfeat library.
  4. In our implementation, you will find the vlfeat library very useful. One may use vl_sift, vl_kmeans, and vl_kdtreebuild.
  5. Now, use the first 40 images in each category for training.
  6. Extract SIFT features from each image.
  7. Derive k codewords with the k-means clustering from Week 1.
  8. Compute the histogram of codewords using the kd-tree functions in vlfeat.
  9. Use the remaining 50 images in each category to test your implementation.
  10. Report the accuracy and computation time for different values of k.
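
Steps 6–8 amount to mapping every local descriptor to its nearest codeword and counting. A minimal Python/NumPy sketch of that representation (the kd-tree query is replaced here by a brute-force nearest-neighbor search; vlfeat's kd-tree just does this lookup faster):

```python
import numpy as np

def bow_histogram(descriptors, codewords):
    """Map each local descriptor to its nearest codeword and count.
    descriptors: (n, d) array; codewords: (k, d) array.
    Returns a normalized length-k histogram."""
    d2 = ((descriptors[:, None, :] - codewords[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)          # brute-force stand-in for the kd-tree query
    hist = np.bincount(nearest, minlength=len(codewords)).astype(float)
    return hist / hist.sum()             # normalize so image size doesn't matter

# Training sketch: stack the SIFT descriptors from the 40 training images,
# run k-means on them to get the k codewords, represent every image by its
# bow_histogram, and classify a test image by its nearest training histogram.
```

The normalization makes histograms comparable across images with different numbers of keypoints.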

Week #3: Spatial Pyramid Matching (SPM)

Usually objects have different properties across spatial scales, even though they may appear similar at any one given scale. Consequently, differentiation of objects is often improved by concatenating feature vectors or agglomerating feature descriptors that exist at different scales. This set of activities will explore how well that can work.

  1. Study Spatial Pyramid Matching, which can improve the BoW approach by concatenating histogram vectors.
  2. We will implement a simplified version of SPM based on your Week 2 implementation.
  3. First, for each training image, divide it equally into 2 × 2 spatial bins.
  4. Second, for each of the 4 bins, extract the SIFT features and compute the histogram of codewords as in Week 2.
  5. Third, concatenate the 4 histogram vectors in a fixed order. (hint: the resulting vector has 4k dimensions.)
  6. Fourth, concatenate the vector you have from Week 2 with this vector (weight both by 0.5 before concatenating).
  7. Finally, use this 5k-dimensional representation and re-run the training and testing.
  8. Compare the results from Week 3 and Week 2. Explain what you observe.
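
The steps above can be sketched as follows. This is a Python/NumPy illustration under my own assumptions (keypoints carry (x, y) locations used to assign them to bins; in practice you would reuse your own Week 2 code):

```python
import numpy as np

def bow_histogram(desc, codewords):
    # Nearest-codeword counting, normalized (the Week 2 representation).
    if len(desc) == 0:
        return np.zeros(len(codewords))
    d2 = ((desc[:, None, :] - codewords[None, :, :]) ** 2).sum(axis=2)
    h = np.bincount(d2.argmin(axis=1), minlength=len(codewords)).astype(float)
    return h / h.sum()

def spm_vector(locs, desc, codewords, width, height):
    """locs: (n, 2) keypoint (x, y) positions; desc: (n, d) descriptors.
    Returns the 5k-dimensional level-0 + 2x2 level-1 SPM representation."""
    global_h = bow_histogram(desc, codewords)            # the Week 2 vector
    col = np.minimum((locs[:, 0] * 2 // width).astype(int), 1)
    row = np.minimum((locs[:, 1] * 2 // height).astype(int), 1)
    parts = [bow_histogram(desc[(row == r) & (col == c)], codewords)
             for r in (0, 1) for c in (0, 1)]            # fixed bin order
    # Weight both levels by 0.5 before concatenating (step 6 above).
    return np.concatenate([0.5 * global_h] + [0.5 * p for p in parts])
```

With k codewords this yields k + 4k = 5k dimensions, matching the hint above.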

Module #4

Features

  1. Now you have a good object recognition system in hand.
  2. But we used the SIFT descriptor as a black box without understanding it. Are there other feature descriptors? Yes!
  3. In Module 4, try to replace SIFT with another feature descriptor. For example, you may want to see how SURF or HOG works.
  4. Select one feature descriptor, rerun the training and testing, and show your comparison results and observations.
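
To get a feel for what a gradient-based descriptor like HOG computes, here is a toy Python/NumPy sketch of a single orientation histogram. This is only the core idea, not real HOG (which adds cells, overlapping blocks, and block normalization); the function name and bin count are my own:

```python
import numpy as np

def hog_like(patch, bins=9):
    """Toy HOG-style descriptor for one grayscale patch: an unsigned
    orientation histogram of image gradients, weighted by magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)           # orientation in [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist                # L2-normalized
```

A vertical edge in the patch puts all its weight in the near-zero orientation bin, which is what makes such descriptors informative about local structure.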

Module #5

ece4580/module_recognition.1485711152.txt.gz · Last modified: 2024/08/20 21:38 (external edit)