Classification

Classification involves obtaining a sample and determining to which class, from a finite set of classes, the sample belongs. The simplest form is binary classification, where you must determine whether the sample belongs to class A or class B. The task ultimately boils down to finding identifying features of each class that distinguish it from the others, then using those features to define a selection or classification policy. Here, we will explore different ways to generate or discover the identifying features, also known as the feature descriptor or feature vector of the sample.

This set of learning modules proceeds somewhat backwards relative to how our understanding of classification has historically evolved. It starts with an unsupervised method for feature descriptor estimation, where machine learning tools are used to come up with the identifying features. The methods explored later are more engineered solutions that have been shown to work well for certain problems.

/* (2) bag-of-words classifier: http://people.csail.mit.edu/fergus/iccv2005/bagwords.html They were short courses on ICCV 2005. */

/* Andrew Ng */

Module #1: Digit Classification Using Stacked Autoencoders

The classical artificial intelligence/machine learning example of classification is digit classification. Can a computer learn to recognize the 10 digits 0 through 9? Clearly this question was of great utility to the United States Postal Service as well as to many banks (for automatically processing cashed checks). Most likely the first person to really solve this well is now fairly wealthy (can you figure out who this person might be?).

This set of activities comes courtesy of Prof. Andrew Ng at Stanford University. We will be using his tutorials, which you will read as you go through this learning module.

Week #1: Sparse Autoencoder
Implement the 'sparse autoencoder' section.
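As a reference point for what the tutorial asks you to build, here is a minimal sparse autoencoder sketch in Python/NumPy (the tutorial's starter code is MATLAB; the toy data, layer sizes, and hyperparameters below are illustrative assumptions, not the tutorial's values). It minimizes squared reconstruction error plus a KL-divergence sparsity penalty that pushes the mean hidden activation of each unit toward a small target rho:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy stand-in data: 200 samples of 8-dimensional inputs in [0, 1].
X = rng.random((200, 8))
m, n_in, n_hidden = len(X), 8, 4

W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, n_in)); b2 = np.zeros(n_in)

rho, beta, lr = 0.05, 0.1, 0.5   # sparsity target, penalty weight, step size

def recon_error():
    return np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - X) ** 2)

err_before = recon_error()

for _ in range(500):
    A1 = sigmoid(X @ W1 + b1)            # hidden activations
    Xhat = sigmoid(A1 @ W2 + b2)         # reconstruction
    rho_hat = A1.mean(axis=0)            # mean activation of each hidden unit

    # Gradients of: mean squared reconstruction error
    #             + beta * sum_j KL(rho || rho_hat_j).
    dXhat = (Xhat - X) * Xhat * (1 - Xhat) / m
    spars = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / m
    dA1 = (dXhat @ W2.T + spars) * A1 * (1 - A1)

    W2 -= lr * A1.T @ dXhat; b2 -= lr * dXhat.sum(axis=0)
    W1 -= lr * X.T @ dA1;    b1 -= lr * dA1.sum(axis=0)

err_after = recon_error()
```

On image patches, visualizing the columns of W1 as small images is the standard way to inspect what features the autoencoder has learned.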

Week #2: Data Pre-Processing and Classification
Complete the 'Vectorized Implementation', 'Preprocessing: PCA and Whitening', and 'Softmax Regression' sections. With the completion of this step you should have a digit classifier that performs decently. Here the feature space is the data itself, modulo the data normalization that might have been performed as the first step.
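The whitening step has a compact closed form once you diagonalize the data covariance. A NumPy sketch (the synthetic correlated data below is a made-up stand-in for the tutorial's image patches):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic correlated data with known per-axis scales (stand-in for patches).
scales = np.linspace(0.5, 3.0, 10)
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))     # random rotation
X = (rng.normal(size=(500, 10)) * scales) @ Q.T

# Zero-mean the data and diagonalize its covariance.
Xc = X - X.mean(axis=0)
eigval, U = np.linalg.eigh(Xc.T @ Xc / len(Xc))

# PCA whitening: rotate onto the eigenbasis and rescale each axis to unit
# variance; epsilon regularizes near-zero eigenvalues.
eps = 1e-5
Xpca = (Xc @ U) / np.sqrt(eigval + eps)

# ZCA whitening rotates back so the result stays close to the original data.
Xzca = Xpca @ U.T

cov_white = Xpca.T @ Xpca / len(Xpca)   # should be (nearly) the identity
```

Dropping the eigenvector columns with the smallest eigenvalues before whitening gives dimensionality reduction for free.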

Week #3: Feature Learning and Classification.
Complete the 'Self-Taught Learning and Unsupervised Feature Learning' section for digit classification.
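The self-taught pipeline amounts to: freeze the encoder learned in Week 1, push labeled data through it, and train a softmax regressor on the resulting feature descriptors. A sketch under stated assumptions (random encoder weights stand in for Week 1's learned W1; the Gaussian-blob data is a made-up substitute for digit images):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 3-class problem: well-separated Gaussian blobs in 5-D.
n_per, n_cls, d = 100, 3, 5
centers = 4.0 * np.eye(n_cls, d)
X = np.vstack([rng.normal(c, 1.0, (n_per, d)) for c in centers])
y = np.repeat(np.arange(n_cls), n_per)

# Frozen feature encoder: random weights stand in for the learned W1.
W_enc = rng.normal(0, 0.5, (d, 8))
feats = 1.0 / (1.0 + np.exp(-(X @ W_enc)))   # fixed feature descriptor

# Softmax regression on the fixed features, by batch gradient descent.
W = np.zeros((8, n_cls))
onehot = np.eye(n_cls)[y]
for _ in range(300):
    logits = feats @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.5 * feats.T @ (p - onehot) / len(X)

accuracy = (np.argmax(feats @ W, axis=1) == y).mean()
```

The key point of self-taught learning is that the encoder can be trained on plentiful unlabeled data, while only the final softmax layer needs labels.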

Week #4: Deep Neural Networks.
Now, we are going to use the autoencoder repeatedly to create a deep network. The feature space learnt in the first week will itself be fed to an autoencoder, learning a feature space for that feature space (and the process may be repeated). The idea is that each level captures higher-order relationships in the data. At this point, your trained classifier should have a classification accuracy of 98%. In principle, the deep neural network should be able to beat that. Based on your results to date, please complete the 'Building Deep Networks for Classification' section.
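The greedy layer-wise recipe can be sketched compactly: train an autoencoder on the data, then train a second autoencoder on the first one's hidden activations, and stack the encoders (a plain, non-sparse autoencoder and toy data are used here for brevity; layer sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, steps=400, lr=0.5):
    """Plain autoencoder trained by batch gradient descent; returns encoder."""
    m, n_in = X.shape
    W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_in)); b2 = np.zeros(n_in)
    for _ in range(steps):
        A1 = sigmoid(X @ W1 + b1)
        Xhat = sigmoid(A1 @ W2 + b2)
        dXhat = (Xhat - X) * Xhat * (1 - Xhat) / m
        dA1 = (dXhat @ W2.T) * A1 * (1 - A1)
        W2 -= lr * A1.T @ dXhat; b2 -= lr * dXhat.sum(axis=0)
        W1 -= lr * X.T @ dA1;    b1 -= lr * dA1.sum(axis=0)
    return W1, b1

X = rng.random((300, 16))   # toy stand-in for digit images

# Greedy layer-wise pretraining: layer 2 trains on layer 1's activations.
W1, b1 = train_autoencoder(X, 8)
H1 = sigmoid(X @ W1 + b1)
W2, b2 = train_autoencoder(H1, 4)
H2 = sigmoid(H1 @ W2 + b2)   # deep feature descriptor fed to the classifier
```

In the tutorial, the stacked encoders plus the softmax layer are then fine-tuned jointly by backpropagation, which is where the accuracy gain over the shallow classifier comes from.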

Week #5: Character Classification with Deep Neural Networks.
Now you have a very nice classification system for digits. Can we classify other, similar data, such as characters, using the same approach? The answer should be yes, but the bigger question is: with what accuracy? Take your feature-learning and classifier-training deep network procedure and train it on characters instead of digits. For characters, we would like to use the Omniglot dataset. Let's do the following:

  1. Use the feature space you learned already from the digits, and use it as your neural network feature descriptor function. Using the Omniglot data as training data, re-learn a new regressor (as the last network layer). Get the accuracy and confusion matrix.
  2. Start all over from scratch, and learn with autoencoders a feature space for characters. Display the first layer output to see what you get. Discuss how similar or different it is to the digits feature space.
  3. Use the new character feature space to train a regression classifier. Get the accuracy and confusion matrix.
  4. Report your results and comparison.
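For steps 1 and 3, the accuracy and confusion matrix can be tallied in a few lines of NumPy once you have predicted labels (the tiny label arrays below are made-up examples, not Omniglot results):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # unbuffered increment per pair
    return cm

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, 3)
accuracy = np.trace(cm) / cm.sum()
```

The off-diagonal entries tell you which character classes get confused with each other, which is the interesting part of the digits-versus-characters comparison.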

/* Other related material, maybe more classic. */


ECE4580 Learning Modules

ece4580/module_classification.1485388875.txt.gz · Last modified: 2024/08/20 21:38 (external edit)