
Image-Based Classification

Classification involves obtaining a sample and determining to which class, from a finite set of classes, the sample belongs. The simplest form is binary classification, where you must determine whether a sample belongs to class A or class B. The task ultimately boils down to identifying features of each class that distinguish it from the others, then using those features to define a selection or classification policy. Here, we will explore different ways to generate or discover the identifying features, also known as the feature descriptor or feature vector of the sample.

This set of learning modules proceeds somewhat backwards relative to how our understanding of classification has evolved. It starts with an unsupervised method for feature descriptor estimation, where machine learning tools are used to come up with the identifying features. The methods explored later are more engineered solutions that have been shown to work well for some problems.


Module #1: Digit Classification Using Stacked Autoencoders


The classical artificial intelligence/machine learning example of classification is digit classification. Can a computer learn to recognize the 10 digits 0 through 9? Clearly this question was of great utility to the United States Postal Service as well as to many banks (for automatically processing cashed checks). Most likely the first person to really solve this well is now fairly wealthy (can you figure out who this person might be?).

This set of activities comes courtesy of Prof. Andrew Ng at Stanford University. We will be using his tutorials, which you will read and implement as you go through this learning module.

Week #1: Sparse Autoencoder
Implement the 'sparse autoencoder' section.
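To make the sparse autoencoder concrete, here is a minimal sketch of the idea in Python/NumPy (the tutorial itself uses MATLAB): a single hidden layer trained by gradient descent on a reconstruction loss plus a KL-divergence sparsity penalty on the mean hidden activations. The function names and hyperparameters here are illustrative assumptions, not the tutorial's own code.

```python
import numpy as np

def kl_sparsity(rho, rho_hat):
    # KL divergence between the target sparsity rho and mean activations rho_hat
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def train_sparse_autoencoder(X, n_hidden=8, rho=0.05, beta=0.1,
                             lr=0.5, epochs=200, seed=0):
    """Train a one-hidden-layer sparse autoencoder by plain gradient descent.
    Returns the per-epoch loss so training progress can be inspected."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d)); b2 = np.zeros(d)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # hidden activations
        Xhat = sigmoid(H @ W2 + b2)         # reconstruction of the input
        rho_hat = H.mean(axis=0)            # average activation of each hidden unit
        loss = 0.5 * np.sum((Xhat - X) ** 2) / n + beta * kl_sparsity(rho, rho_hat)
        losses.append(loss)
        # backpropagation through the reconstruction and sparsity terms
        dXhat = (Xhat - X) * Xhat * (1 - Xhat) / n
        gW2, gb2 = H.T @ dXhat, dXhat.sum(axis=0)
        d_sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        dH = (dXhat @ W2.T + d_sparse) * H * (1 - H)
        gW1, gb1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return losses
```

The sparsity term pushes each hidden unit's average activation toward a small target rho, which is what forces the learned features to be selective rather than always-on.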

Week #2: Data Pre-Processing and Classification
Complete the 'Vectorized Implementation', 'Preprocessing: PCA and Whitening', and 'Softmax Regression' sections. With the completion of this step, you should have a digit classifier that performs decently. Here the feature space is the data itself, modulo the data normalization that might have been performed as the first step.
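The PCA-whitening preprocessing step can be sketched compactly. This is a Python/NumPy illustration of the standard recipe (the tutorial works in MATLAB), with a hypothetical `pca_whiten` helper name: rotate the centered data into the PCA basis, then rescale each axis to unit variance, with a small epsilon for numerical stability.

```python
import numpy as np

def pca_whiten(X, eps=1e-5):
    """PCA-whiten the rows of X: zero mean and (approximately) identity covariance."""
    Xc = X - X.mean(axis=0)                   # center the data
    cov = Xc.T @ Xc / X.shape[0]              # empirical covariance
    eigval, U = np.linalg.eigh(cov)           # eigendecomposition of the covariance
    Xrot = Xc @ U                             # rotate into the PCA basis
    return Xrot / np.sqrt(eigval + eps)       # rescale each axis to unit variance
```

After this transform the features are decorrelated and on a common scale, which typically makes the downstream softmax regression easier to train.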

Week #3: Feature Learning and Classification.
Complete the 'Self-Taught Learning and Unsupervised Feature Learning' section for digit classification.

Week #4: Deep Neural Networks.
Now, we are going to use the autoencoder repeatedly to create a deep network. The feature space learned in the first week will itself be fed to another autoencoder, which learns a feature space over it (and the process may be repeated). The idea is that each level will capture higher-order relationships in the data. At this point, your trained classifier should have a classification accuracy of 98%. In principle, the deep neural network should be able to beat that. Based on your results to date, please complete the 'Building Deep Networks for Classification' section.
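The greedy layer-wise recipe above amounts to a short loop: train an autoencoder, encode the data, and repeat on the encoded features. A minimal Python sketch, where `train_layer` is a hypothetical stand-in for your Week 1 autoencoder trainer (anything that returns an encoder callable):

```python
import numpy as np

def train_stack(X, layer_sizes, train_layer):
    """Greedy layer-wise pretraining: each autoencoder learns a feature space
    over the representation produced by the layer below it.
    train_layer(features, n_hidden) must return an encoder callable."""
    encoders, feats = [], X
    for n_hidden in layer_sizes:
        enc = train_layer(feats, n_hidden)   # learn features of the current level
        encoders.append(enc)
        feats = enc(feats)                   # feed the new features to the next level
    return encoders, feats
```

The returned `encoders` list defines the stacked network; in the tutorial the whole stack is then fine-tuned jointly with the softmax layer on top.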

Week #5: Character Classification with Deep Neural Networks.
Now you have a very nice classification system for digits. Can we classify other, similar data, such as characters, using the same approach? The answer should be yes, but the bigger question is: with what accuracy? Take your feature-learning and classifier-training deep network procedure and train it with characters instead of digits. For characters, we will use the Omniglot dataset. Let's do the following:

  1. Use the feature space you learned already from the digits, and use it as your neural network feature descriptor function. Using the Omniglot data as training data, re-learn a new regressor (as the last network layer). Get the accuracy and confusion matrix.
  2. Start all over from scratch, and learn with autoencoders a feature space for characters. Display the first-layer output to see what you get. Discuss how similar or different it is to the digits feature space.
  3. Use the new character feature space to train a regression classifier. Get the accuracy and confusion matrix.
  4. Report your results and comparison.
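Steps 1 and 3 both ask for an accuracy and a confusion matrix. A minimal way to compute them (hypothetical helper names, Python/NumPy rather than MATLAB):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1
    return M

def accuracy(y_true, y_pred):
    # fraction of samples whose predicted class matches the true class
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))
```

The diagonal of the confusion matrix holds the correctly classified counts; off-diagonal entries show which character classes get mistaken for which.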

Module #2: Engineered Features: Bag-of-Words

Before people got into auto-learning of feature spaces, it was common to hand-craft a feature space, or to come up with a mechanism for creating feature spaces that generalized. One such mechanism is called Bag-of-Words. It comes from the field of text classification. Imagine trying to classify a book as horror, romance, comedy, or tragedy. You would imagine that the language in each would be slightly different, and that the words evoking the relevant emotions would vary. By comparing the frequency of special words in the books, one could identify the type of stories the books contained. Computer vision researchers made an effort to translate that same concept to imagery. But how do images have words? Well, let's see …

Reference: the bag-of-words classifier short course from ICCV 2005: http://people.csail.mit.edu/fergus/iccv2005/bagwords.html

Week #1: Clustering to Define Words

  1. Study the k-means clustering algorithm and its algorithmic steps.
  2. Download (or clone) the clustering skeleton code here
  3. Implement the k-means clustering algorithm in RGB space by following the algorithmic steps. You are welcome to implement it from scratch, without the skeleton code.
  4. Test your algorithm by segmenting the image segmentation.jpg using k = 3.
  5. Try different random initializations and show the corresponding results.
  6. Comment on your different segmentation results.
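The steps above can be sketched as follows; this is a Python/NumPy version of Lloyd's algorithm (the assignment itself is in MATLAB), vectorized with broadcasting instead of per-pixel loops. The `init` parameter and function name are illustrative choices, not part of the skeleton code.

```python
import numpy as np

def kmeans_rgb(pixels, k=3, iters=100, init=None, seed=0):
    """Lloyd's k-means on an (N, 3) array of RGB pixels.
    Returns (centers, labels); pass init to control the initialization."""
    rng = np.random.default_rng(seed)
    centers = (pixels[rng.choice(len(pixels), size=k, replace=False)]
               if init is None else np.asarray(init, dtype=float))
    for _ in range(iters):
        # (N, k) matrix of squared distances from every pixel to every center
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                     # assignment step
        new_centers = np.array([pixels[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])    # update step
        if np.allclose(new_centers, centers):          # converged
            break
        centers = new_centers
    return centers, labels
```

Different random initializations can converge to different local minima, which is exactly what steps 5 and 6 ask you to observe on the segmentation image.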

Matlab Notes: Matlab has several functions that can assist with the calculations so that you do not have to process the data in for loops. For example, there is pdist2; the mean function can process data row-wise or column-wise if specified properly; and there are ways to perform sub-assignments using either a binary array or a list of indices. You should take advantage of these opportunities to keep your code compact.

Week #2: Object Recognition

  1. Study the bag-of-words approach for the classification/recognition task.
  2. We begin by implementing a simple but powerful recognition system to classify faces and cars.
  3. Check here for skeleton code. First, follow the README to set up the dataset and the vlfeat library.
  4. In this implementation, you will find the vlfeat library very useful. One may use vl_sift, vl_kmeans, and vl_kdtreebuild.
  5. Now, use the first 40 images in both categories for training.
  6. Extract SIFT features from each image.
  7. Derive k codewords with the k-means clustering from Week 1.
  8. Compute the histogram of codewords using the kd-tree algorithm from vlfeat.
  9. Use the remaining 50 images in both categories to test your implementation.
  10. Report the accuracy and computation time for different values of k.
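Steps 6–8 reduce each image to a fixed-length feature: assign every local descriptor to its nearest codeword and histogram the counts. A minimal Python/NumPy sketch of that reduction (vlfeat's kd-tree accelerates the nearest-codeword search; here it is done by brute force, and the function name is an illustrative assumption):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize (N, d) local descriptors against a (k, d) codebook and return
    the L1-normalized histogram of codeword counts: the bag-of-words feature."""
    # squared distance from every descriptor to every codeword, (N, k)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                  # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                   # normalize so images are comparable
```

The normalized histogram is what gets fed to the classifier, so images with different numbers of detected SIFT keypoints remain comparable.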

Week #3: Spatial Pyramid Matching (SPM)
Objects usually have different properties across spatial scales, even though they may appear similar at any one given scale. Consequently, differentiation of objects is often improved by concatenating feature vectors or agglomerating feature descriptors that exist at different scales. This set of activities will explore how well that can work.

  1. Study Spatial Pyramid Matching, which can improve the BoW approach by concatenating histogram vectors.
  2. We will implement a simplified version of SPM based on your Week 2 work.
  3. First, for each training image, divide it equally into 2 × 2 spatial bins.
  4. Second, for each of the 4 bins, extract the SIFT features and compute the histograms of codewords as in Week 2.
  5. Third, concatenate the 4 histogram vectors in a fixed order. (hint: the resulting vector has 4k dimensions.)
  6. Fourth, concatenate the vector you have from Week 2 with this vector (both weighted by 0.5 before concatenation).
  7. Finally, use this 5k-dimensional representation and re-run the training and testing.
  8. Compare the results from Week 3 and Week 2. Explain what you observe.
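The simplified pyramid above can be sketched in a few lines. This Python/NumPy illustration assumes the codeword assignments have already been computed (as in the previous week); the `spm_feature` name and argument layout are assumptions for the sketch.

```python
import numpy as np

def spm_feature(keypoints, words, k, width, height):
    """Two-level spatial pyramid feature: the level-0 (whole image) histogram
    plus the four level-1 (2x2 grid) histograms, each weighted by 0.5,
    concatenated into a single 5k-dimensional vector.
    keypoints: (N, 2) array of (x, y) locations; words: codeword index per keypoint."""
    words = np.asarray(words)
    def hist(mask):
        h = np.bincount(words[mask], minlength=k).astype(float)
        return h / max(h.sum(), 1.0)           # L1-normalize; empty bins stay zero
    # which half of the image each keypoint falls in, clamped to {0, 1}
    ix = np.minimum((keypoints[:, 0] * 2 // width).astype(int), 1)
    iy = np.minimum((keypoints[:, 1] * 2 // height).astype(int), 1)
    parts = [hist(np.ones(len(words), dtype=bool))]        # level 0: whole image
    for row in range(2):
        for col in range(2):
            parts.append(hist((ix == col) & (iy == row)))  # level 1: each bin
    return np.concatenate([0.5 * p for p in parts])
```

Because the grid histograms record where codewords occur, two images with identical global word counts but different layouts now map to different feature vectors, which is the whole point of SPM.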

ECE4580 Learning Modules

ece4580/module_classification.txt · Last modified: 2023/03/06 10:31 by 127.0.0.1