Object Recognition

/*

(1) object detector with boosting: 
http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html

*/

Module #1

Clustering

Study the k-means clustering algorithm and its algorithmic steps.
Download (or clone) the clustering skeleton code here
Implement the k-means clustering algorithm in RGB space by following those steps. You are welcome to implement it from scratch without the skeleton code.
Test your algorithm by segmenting the image segmentation.jpg with k = 3.
Try different random initializations and show the corresponding results.
Comment on how the segmentation results differ between initializations.
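
The assignment, update and stopping steps of k-means can be sketched as follows. This is a minimal, dependency-free illustration and not the skeleton code: the names kmeans_rgb, nearest_center, seed and max_iter are hypothetical, pixels are assumed to be already loaded as a flat list of (R, G, B) tuples, and initial centers are assumed to be drawn from the pixels themselves.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(pixel, centers):
    """Index of the center closest to the given pixel."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(pixel, centers[j]))


def kmeans_rgb(pixels, k, seed=0, max_iter=100):
    """Cluster RGB pixels into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initialization (assumed scheme): k pixels at distinct positions.
    centers = [tuple(float(c) for c in p) for p in rng.sample(pixels, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each pixel joins its nearest center.
        new_labels = [nearest_center(p, centers) for p in pixels]
        if new_labels == labels:
            break  # no label changed, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(pixels, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(ch) / len(members)
                                   for ch in zip(*members))
    return centers, labels
```

Calling kmeans_rgb(pixels, 3, seed=s) with several values of s is one way to produce the different random initializations. Replacing each pixel with its cluster center then gives the segmented image.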

Module #2

Object Recognition

Study the bag-of-words (BoW) approach to classification and recognition tasks.
We begin by implementing a simple but powerful recognition system that classifies faces and cars.
Check here for the skeleton code. First, follow the README to set up the dataset and the vlfeat library.
You will find the vlfeat library very useful for this implementation, in particular vl_sift, vl_kmeans and vl_kdtreebuild.
Now, use the first 40 images in each category for training.
Extract SIFT features from each image.
Derive k codewords with k-means clustering, as studied in Module 1.
Compute the histogram of codewords for each image, using the kd-tree algorithm in vlfeat to find the nearest codeword for each feature.
Use the remaining 50 images in each category to test your implementation.
Report the accuracy and computation time for different values of k.
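
The histogram step and the accuracy report can be sketched in a few lines. This is an illustration under stated assumptions, not the skeleton code: the function names are hypothetical, the nearest codeword is found by brute force where the assignment uses a kd-tree, the histograms are L1-normalized, and the 1-nearest-neighbor classifier is only a stand-in because this page does not name a classifier.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def bow_histogram(descriptors, codewords):
    """Normalized histogram of nearest-codeword assignments for one image."""
    counts = [0] * len(codewords)
    for d in descriptors:
        # Brute-force search; a kd-tree query would replace this line.
        j = min(range(len(codewords)),
                key=lambda i: squared_distance(d, codewords[i]))
        counts[j] += 1
    total = sum(counts)
    # Assumption: L1 normalization, so images with different numbers
    # of features remain comparable.
    return [c / total for c in counts] if total else [0.0] * len(codewords)


def classify_1nn(hist, train_hists, train_labels):
    """Label of the closest training histogram (assumed classifier)."""
    best = min(range(len(train_hists)),
               key=lambda i: squared_distance(hist, train_hists[i]))
    return train_labels[best]


def accuracy(predicted, actual):
    """Fraction of test images whose predicted label is correct."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

To report computation time, the training and testing calls for each k can be wrapped with time.perf_counter() from the standard library.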

Module #3

Spatial Pyramid Matching (SPM)

Study Spatial Pyramid Matching, which can improve the BoW approach by concatenating histogram vectors computed over spatial sub-regions of the image.
We will implement a simplified version of SPM based on your Module 2 implementation.
First, divide each training image equally into 2 × 2 spatial bins.
Second, for each of the 4 bins, extract the SIFT features and compute the histogram of codewords as in Module 2.
Third, concatenate the 4 histogram vectors in a fixed order. (Hint: the resulting vector has 4k dimensions.)
Fourth, concatenate the vector you obtained in Module 2 with this vector (weight both by 0.5 before concatenating).
Finally, use this 5k-dimensional representation to re-run the training and testing.
Compare the results from Module 3 and Module 2. Explain what you observe.
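
The construction of the 5k-dimensional vector can be sketched as below. The function names are hypothetical. The sketch makes three assumptions: each feature already has a codeword index and a keypoint position (x, y), features are grouped into bins by position rather than by re-running SIFT on each sub-image, and every histogram is L1-normalized.

```python
def normalized_histogram(word_ids, k):
    """L1-normalized histogram over k codewords (normalization assumed)."""
    counts = [0] * k
    for w in word_ids:
        counts[w] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * k


def spatial_bin(x, y, width, height):
    """Index 0..3 of the 2 x 2 cell containing (x, y), in row-major order."""
    col = 1 if x >= width / 2 else 0
    row = 1 if y >= height / 2 else 0
    return 2 * row + col


def spm_vector(positions, word_ids, k, width, height):
    """Whole-image histogram followed by the four bin histograms (5k values)."""
    global_hist = normalized_histogram(word_ids, k)
    per_bin = [[] for _ in range(4)]
    for (x, y), w in zip(positions, word_ids):
        per_bin[spatial_bin(x, y, width, height)].append(w)
    spatial = []
    # Fixed order: top-left, top-right, bottom-left, bottom-right.
    for ids in per_bin:
        spatial.extend(normalized_histogram(ids, k))
    # Both parts are weighted by 0.5 before concatenation.
    return [0.5 * v for v in global_hist] + [0.5 * v for v in spatial]
```

The same bin order must be used for every training and test image. Otherwise the corresponding entries of two vectors would describe different image regions.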