Object Recognition
Object recognition is closely related to object classification, and arguably to detection as well. It is sensible to ask: what is the difference between object classification and object recognition? Even I have trouble drawing the line. One way I like to think about it is that classification differentiates instances that are clearly different (fork vs spoon vs knife, or cat vs dog vs face), whereas recognition differentiates objects within a class (dinner fork vs salad fork vs seafood fork, or different celebrities' faces). We more often hear “face recognition” than “face classification.” Likewise, we more often hear “animal classification” than “animal recognition” (as in cat vs dog vs human vs elephant).
So, here we study methods that deal with differentiating objects within a given class of objects (or so I believe …).
/*
(1) object detector with boosting: http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html
*/
Module #1
Clustering
- Study the k-means clustering algorithm and its algorithmic steps.
- Download (or clone) the clustering skeleton code here
- Implement the k-means clustering algorithm in RGB space by following the algorithmic steps. You are welcome to implement it from scratch without the skeleton code.
- Test your algorithm by segmenting the image segmentation.jpg using k=3.
- Try different random initializations and show the corresponding results.
- Comment on your different segmentation results.
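The k-means steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the skeleton code: the function name kmeans_rgb, the iters cap, and the seed parameter are assumptions made for this example, and pixels is assumed to be a flat list of (r, g, b) tuples already read from the image.

```python
import random

def kmeans_rgb(pixels, k, iters=20, seed=0):
    """Cluster (r, g, b) tuples into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initialization: pick k pixels at random as the starting centers.
    centers = [tuple(float(c) for c in p) for p in rng.sample(pixels, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest center
        # (squared Euclidean distance in RGB space).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in pixels
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        # An empty cluster simply keeps its previous center.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels
```

Changing seed gives a different random initialization, which is one way to produce the different segmentation results asked for above. To display a segmentation, each pixel can be recolored with the center of the cluster it was assigned to.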
Module #2
Object Recognition
- Study the bag-of-words (BoW) approach for the classification/recognition task.
- We begin with implementing a simple but powerful recognition system to classify faces and cars.
- Check here for skeleton code. First, follow the README to set up the dataset and the vlfeat library.
- In our implementation, you will find the vlfeat library very useful. You may use vl_sift, vl_kmeans and vl_kdtreebuild.
- Now, use the first 40 images in each category for training.
- Extract SIFT features from each image
- Derive k codewords using the k-means clustering from Module 1.
- Compute the histogram of codewords for each image using the kd-tree algorithm in vlfeat.
- Use the remaining 50 images in each category to test your implementation.
- Report the accuracy and computation time with different k
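The histogram and evaluation steps above might look roughly like the following sketch. It makes several assumptions that are not part of the assignment: a brute-force nearest-codeword search stands in for the vlfeat kd-tree, histograms are normalized to sum to 1 so that images with different numbers of features stay comparable, and a 1-nearest-neighbor rule is used as the classifier. All function names here are hypothetical.

```python
def nearest_codeword(descriptor, codewords):
    """Index of the closest codeword (brute force, not a kd-tree)."""
    return min(
        range(len(codewords)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(descriptor, codewords[j])),
    )

def bow_histogram(descriptors, codewords):
    """Count how often each codeword is the nearest one, then normalize."""
    hist = [0.0] * len(codewords)
    for d in descriptors:
        hist[nearest_codeword(d, codewords)] += 1.0
    total = sum(hist)
    # Assumption: normalize to sum 1; an image with no features stays all-zero.
    return [h / total for h in hist] if total else hist

def classify(hist, train_hists, train_labels):
    """Assumed classifier: label of the closest training histogram."""
    best = min(
        range(len(train_hists)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(hist, train_hists[i])),
    )
    return train_labels[best]

def accuracy(predicted, actual):
    """Fraction of test images whose predicted label is correct."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

To report computation time for different k, the codebook construction and the test loop can be wrapped with time.perf_counter() calls from the standard library and repeated for each value of k.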
Module #3
Spatial Pyramid Matching (SPM)
- Study Spatial Pyramid Matching, which can improve the BoW approach by concatenating histogram vectors.
- We will implement a simplified version of SPM based on your Module 2.
- First, divide each training image equally into 2 × 2 spatial bins.
- Second, for each of the 4 bins, extract the SIFT features and compute the histogram of codewords as in Module 2.
- Third, concatenate the 4 histogram vectors in a fixed order. (hint: the resulting vector has 4k dimensions.)
- Fourth, concatenate the vector you have from Module 2 with this vector (weight each by 0.5 before concatenating).
- Finally, use this 5k-dimensional representation to re-run the training and testing.
- Compare the results from module 3 and module 2. Explain what you observe.
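The simplified SPM representation described above can be sketched as follows. One assumption to note: instead of running SIFT separately on each of the 4 sub-images, this sketch assigns already-extracted features to a bin by their keypoint location, which is a common shortcut but not necessarily what the skeleton code does. The function names and the brute-force codeword search are hypothetical stand-ins.

```python
def bow_histogram(descriptors, codewords):
    """Normalized histogram of nearest codewords (brute-force search)."""
    hist = [0.0] * len(codewords)
    for d in descriptors:
        j = min(
            range(len(codewords)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(d, codewords[c])),
        )
        hist[j] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def spm_vector(keypoints, descriptors, width, height, codewords):
    """Build the 5k-dimensional vector: whole-image histogram plus 2 x 2 bins."""
    # Assign each feature to one of the 4 spatial bins by keypoint position.
    bins = [[] for _ in range(4)]
    for (x, y), d in zip(keypoints, descriptors):
        col = 0 if x < width / 2 else 1
        row = 0 if y < height / 2 else 1
        bins[2 * row + col].append(d)
    # Fixed order: top-left, top-right, bottom-left, bottom-right (4k values).
    level1 = []
    for cell in bins:
        level1.extend(bow_histogram(cell, codewords))
    # Whole-image histogram as in Module 2 (k values).
    level0 = bow_histogram(descriptors, codewords)
    # Weight both parts by 0.5 and concatenate.
    return [0.5 * v for v in level0] + [0.5 * v for v in level1]
```

The bin order only needs to be the same for every training and test image; the particular order chosen here is arbitrary.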
Module #4
Features
- Now you have a good object recognition system in hand.
- But we selected the SIFT descriptor as a black box without understanding it. Are there any other feature descriptors? Yes!
- Select one other feature descriptor, rerun the training and testing, and show your comparison results and observations.
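As one purely illustrative option for a descriptor to swap in, the sketch below computes a histogram of gradient orientations over a grayscale patch. It is neither SIFT nor a full HOG implementation; the function name, the choice of 8 orientation bins, and the L2 normalization are all assumptions for this example. The patch is assumed to be a 2D list of grayscale values.

```python
import math

def orientation_histogram(patch, bins=8):
    """Gradient-orientation histogram of a 2D grayscale patch, L2-normalized."""
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    # Central differences on interior pixels only (the border is skipped).
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Orientation in [0, 2*pi), mapped to one of the bins.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            # Each pixel votes with its gradient magnitude.
            hist[b] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist
```

Descriptors computed this way on patches around sampled image locations can be fed into the same codebook, histogram, and classification steps as in Modules 2 and 3, which keeps the comparison with SIFT fair.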