Classification involves obtaining a sample and determining to which class, from a finite set of classes, the sample belongs. The simplest form is binary classification, where you must decide whether the sample is of class A or class B. The task ultimately boils down to finding identifying features of each class that distinguish it from the others, then using those features to define a selection or classification policy. Here, we will explore different ways to generate or discover these identifying features, also known as the feature descriptor or feature vector of the sample.
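To make that concrete, here is a minimal sketch, with made-up features and a hand-set threshold purely for illustration, of what a feature descriptor and a classification policy look like for a binary problem:

```matlab
% Minimal sketch, purely illustrative: a two-element feature descriptor and a
% hand-set linear decision rule for a binary (class A vs. class B) problem.
patch = rand(16);                         % stand-in for a real sample
f = [mean(patch(:)); std(patch(:))];      % feature descriptor: overall brightness and contrast
w = [1; 0];  b = -0.5;                    % hand-chosen classification policy
if w' * f + b > 0
    label = 'A';
else
    label = 'B';
end
```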
This set of learning modules runs a little backwards relative to how our understanding of classification has evolved. It starts with an unsupervised method for estimating the feature descriptor, where machine learning tools are used to come up with the identifying features. The methods explored later are more engineered solutions that have been shown to work well for certain problems.
The classical artificial intelligence/machine learning example of classification is digit classification. Can a computer learn to recognize the 10 digits 0 through 9? Clearly this question was of great utility to the United States Post Office as well as to many banks (for automatically processing cashed checks). Most likely the first person to really solve this well is now fairly wealthy (can you figure out who this person might be?).
This set of activities comes courtesy of Prof. Andrew Ng at Stanford University. We will be using his tutorials, which you will read and implement as you go through this learning module.
Week #1: Sparse Autoencoder
Implement the 'Sparse Autoencoder' section.
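As a sanity check while you work through it, the cost you minimize has three pieces: reconstruction error, weight decay, and a KL-divergence sparsity penalty on the average hidden activation. A rough, self-contained sketch of those terms follows (my own variable names and random stand-in data, not the tutorial's starter code):

```matlab
% Rough sketch of the terms in the sparse autoencoder cost. Random data and
% randomly initialized weights stand in for the real patches and parameters.
x  = rand(64, 100);            % 64-dim inputs, 100 examples (columns)
nH = 25;                       % hidden units
W1 = 0.1 * randn(nH, 64);  b1 = zeros(nH, 1);
W2 = 0.1 * randn(64, nH);  b2 = zeros(64, 1);
rho = 0.01; lambda = 1e-4; beta = 3;   % sparsity target, weight decay, sparsity weight
m  = size(x, 2);
a2 = 1 ./ (1 + exp(-(W1 * x  + repmat(b1, 1, m))));   % hidden activations
a3 = 1 ./ (1 + exp(-(W2 * a2 + repmat(b2, 1, m))));   % reconstruction
rhoHat = mean(a2, 2);                                 % average hidden activation
kl = sum(rho * log(rho ./ rhoHat) + (1 - rho) * log((1 - rho) ./ (1 - rhoHat)));  % KL sparsity penalty
cost = (0.5 / m) * sum(sum((a3 - x).^2)) ...              % reconstruction error
     + (lambda / 2) * (sum(W1(:).^2) + sum(W2(:).^2)) ... % weight decay
     + beta * kl;
```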
Week #2: Data Pre-Processing and Classification
Complete the 'Vectorized Implementation', 'Preprocessing: PCA and Whitening', and 'Softmax Regression' sections. With the completion of this step you should have a digit classifier that performs decently. Here the feature space is the data itself, modulo whatever normalization might have been performed as the first step.
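For the whitening portion, here is a minimal sketch of PCA whitening on stand-in data (the epsilon value is just a typical regularization choice, not one prescribed by this module):

```matlab
% Minimal sketch of PCA whitening, assuming examples are stored as columns.
x = rand(64, 1000);                       % stand-in data, columns are examples
x = bsxfun(@minus, x, mean(x, 2));        % zero-mean each dimension
sigma = x * x' / size(x, 2);              % covariance matrix
[U, S, ~] = svd(sigma);                   % principal directions and variances
epsilon = 1e-5;                           % small regularizer (typical choice)
xRot   = U' * x;                          % rotate into the PCA basis
xWhite = diag(1 ./ sqrt(diag(S) + epsilon)) * xRot;   % equalize the variances
```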
Week #3: Feature Learning and Classification
Complete the 'Self-Taught Learning and Unsupervised Feature Learning' section for digit classification.
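The key step of self-taught learning is simple once the Week #1 autoencoder is trained: push the labeled images through the encoder and hand the hidden activations to the Week #2 softmax classifier as features. A sketch with random stand-ins for the learned weights and the data, so it runs on its own:

```matlab
% Sketch of the self-taught-learning step. W1, b1 would really come from the
% Week #1 autoencoder, and trainImages from the digit dataset.
W1 = 0.1 * randn(196, 784);  b1 = zeros(196, 1);   % pretend these were learned
trainImages = rand(784, 500);                      % pretend labeled digit images (columns)
m = size(trainImages, 2);
features = 1 ./ (1 + exp(-(W1 * trainImages + repmat(b1, 1, m))));   % hidden activations
% 'features' (196 x m) now replaces the raw pixels as the input to softmax training
```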
Week #4: Deep Neural Networks
Now, we are going to use the autoencoder repeatedly to create a deep network. The feature space learned in the first week will itself be fed to another autoencoder to learn a feature space on top of it (and the process can be repeated). The idea is that each level captures higher-order relationships in the data. At this point, your trained classifier should have a classification accuracy of 98%. In principle, the deep neural network should be able to beat that. Based on your results to date, please complete the 'Building Deep Networks for Classification' section.
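A sketch of the stacking idea follows (the weights below are random stand-ins so the snippet runs; in the exercise each layer's weights come from training a sparse autoencoder on the activations of the layer beneath it, and the whole stack is then fine-tuned):

```matlab
% Sketch of a two-layer stacked autoencoder feature hierarchy with stand-in weights.
data = rand(784, 500);                             % stand-in for the digit images (columns)
W1 = 0.1 * randn(196, 784);  b1 = zeros(196, 1);   % layer 1: trained on raw pixels
W2 = 0.1 * randn(196, 196);  b2 = zeros(196, 1);   % layer 2: trained on layer-1 features
a1 = 1 ./ (1 + exp(-bsxfun(@plus, W1 * data, b1)));
a2 = 1 ./ (1 + exp(-bsxfun(@plus, W2 * a1,  b2)));
% a2 is the deep feature space fed to the softmax layer; fine-tuning then
% backpropagates the classification error through W2 and W1 as well.
```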
Week #5: Character Classification with Deep Neural Networks
Now you have a very nice classification system for digits. Can we classify other, similar data, such as characters, using the same approach? The answer should be yes, but the bigger question is: with what accuracy?
Take your deep-network feature-learning and classification training procedure and train it with characters instead of digits. For characters, we would like to use the Omniglot dataset. Let's do the following:
Before people got into the automatic learning of feature spaces, it was common to try to hand-craft a feature space, or to come up with a mechanism for creating feature spaces that generalized. One such mechanism is called Bag-of-Words. It comes from the text classification field. Imagine trying to classify a book as horror, romance, comedy, or tragedy. You would expect the language in each to differ slightly, with the words evoking the relevant emotions varying from genre to genre. By comparing the frequencies of such telltale words across books, one could identify the type of story each book contains. Computer vision researchers made an effort to translate that same concept to imagery. But how do images have words? Well, let's see …
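As a toy illustration of the text version (the vocabulary words below are chosen only for illustration), counting a few hand-picked words in a passage already gives a feature vector that hints at the genre:

```matlab
% Toy illustration of bag-of-words on text: count a few hand-picked
% "vocabulary" words in a passage.
vocab   = {'ghost', 'love', 'laugh', 'death'};
passage = lower('The ghost appeared at midnight and death followed the ghost');
tokens  = strsplit(passage);
counts  = zeros(1, numel(vocab));
for k = 1:numel(vocab)
    counts(k) = sum(strcmp(tokens, vocab{k}));     % word frequency
end
% counts is now [2 0 0 1]; comparing such histograms across books is the classifier's job
```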
Related material: the bag-of-words classifier tutorial from the ICCV 2005 short courses: http://people.csail.mit.edu/fergus/iccv2005/bagwords.html
Week #1: Clustering to Define Words
Matlab Notes: Matlab has several functions that can assist with these calculations so that you do not have to process the data in for loops. For example, there is pdist2, the mean function can operate row-wise or column-wise if specified properly, and indexed assignments can be performed using either a logical array or the indices themselves. Take advantage of these features to keep your code compact.
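To make those notes concrete, here is a rough sketch of the clustering step: a plain k-means over random stand-in descriptors with an arbitrary vocabulary size (your actual descriptors and K will differ), leaning on pdist2 and logical indexing instead of inner loops over the data:

```matlab
% Rough sketch of building the visual vocabulary: k-means over a pile of local
% descriptors, with the cluster centers acting as the "words".
descriptors = rand(5000, 128);                     % one 128-dim descriptor per row (stand-in)
K = 50;                                            % vocabulary size (arbitrary choice)
centers = descriptors(randperm(5000, K), :);       % initialize words from the data
for iter = 1:20
    [~, assign] = min(pdist2(descriptors, centers), [], 2);   % nearest word for every descriptor
    for k = 1:K
        if any(assign == k)
            centers(k, :) = mean(descriptors(assign == k, :), 1);   % recenter the word
        end
    end
end
```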
Week #2: Object Recognition
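In the bag-of-words pipeline this week builds on, an image becomes a histogram over the visual words from Week #1, and that histogram is the feature vector handed to whatever classifier you choose. A rough sketch with stand-in data:

```matlab
% Rough sketch of turning one image into a bag-of-words feature vector:
% assign each local descriptor to its nearest visual word and histogram the
% assignments. 'centers' plays the role of the Week #1 vocabulary.
centers = rand(50, 128);                           % stand-in vocabulary (K = 50 words)
imgDescriptors = rand(300, 128);                   % descriptors extracted from one image
[~, words] = min(pdist2(imgDescriptors, centers), [], 2);   % nearest word per descriptor
bow = accumarray(words, 1, [size(centers, 1) 1]);  % word-frequency histogram
bow = bow / sum(bow);                              % normalize away the descriptor count
% 'bow' is the feature vector handed to whatever classifier you train on it
```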
Week #3: Spatial Pyramid Matching (SPM)
Usually objects have different properties across spatial scales, even though they may appear similar at any single scale. Consequently, differentiation of objects is often improved by concatenating feature vectors, or agglomerating feature descriptors, computed at different scales. This set of activities will explore how well that can work.
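A rough sketch of the pyramid construction for a single image follows (descriptor positions and word labels are random stand-ins, and the vocabulary size and two-level pyramid are arbitrary choices for illustration):

```matlab
% Rough sketch of a two-level spatial pyramid for one image. Level 0 is the
% whole-image word histogram; level 1 histograms the words inside each cell of
% a 2x2 grid; everything is concatenated into one long feature vector.
K   = 50;                                  % vocabulary size (arbitrary)
pos = rand(300, 2);                        % (x, y) position of each descriptor, in (0, 1)
wrd = randi(K, 300, 1);                    % visual-word label of each descriptor
cellIdx = ceil(pos(:, 1) * 2) + 2 * (ceil(pos(:, 2) * 2) - 1);   % which 2x2 cell, 1..4
level0  = accumarray(wrd, 1, [K 1]);                             % whole-image histogram
level1  = accumarray([wrd cellIdx], 1, [K 4]);                   % one histogram per cell
pyramid = [level0; level1(:)]';            % concatenated 1 x (5*K) descriptor
% SPM additionally down-weights the coarser levels before the histograms are compared
```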