Here, we are going to step away from the video processing part for a moment and focus solely on the datasets that you have all generated. We want to test out an algorithm for correctly matching the pairs of people based on their image information alone. The more people in the dataset, the trickier it becomes. In a binary selection process, random guessing will get you a 50% correctness outcome. When selecting among 10 things, random guessing is correct only 10% of the time. In general, the chance rate is 1/N for N distinct people, so it drops in inverse proportion to their number. Can we beat this probabilistic nightmare? Let's see.
The most basic way to distinguish targets based on their appearance is to fit a Gaussian mixture model to each target using the Expectation Maximization algorithm.
- Everyone, please upload your people Matlab files (with some unique name) to the ECE4580 people google drive folder. This is necessary so that others can have access to your people, much like you'll need access to their people.
- Implement a target-based Gaussian mixture model estimator. Apply it to each target in your peopleA dataset.
- Visualize the Gaussian mixture model estimate.
OK, now that we know what to do, there is the matter of how to do it. We are not going to use just any Gaussian mixture model because that typically leads to nonsensical results. Rather, we will use an over-segmented Gaussian mixture model.
First, the data needs to be converted from image form to the proper form for use with a Gaussian mixture model. You will need to create a specialized flattening function. The input is the image proper, but the output is both the image data and the pixel coordinate location of the data. Data type wise, we have an M x N x 3 matrix that gets converted to a 5 x (M*N) matrix which has the pixel locations appended. Assume that the image is centered. That means the middle of the image is at coordinate (0,0). It is almost as easy as doing a reshape, but the pixel coordinates are missing. For that, you can create them using the meshgrid function. Your two grids should have the same dimensions as the image patch of the target, but be centered (really, it is just an offset). You should be able to figure it out with a little messing around. Call it flattenTarget.
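The flattening step can be sketched as below. This is only one possible layout, written as a script on a random stand-in patch so it runs on its own; the variable names, the choice that x runs along columns and y along rows, and the R-G-B-x-y row order are assumptions. Wrapping the body in a function named flattenTarget is left to you.

```matlab
% Stand-in for one target patch (assumption); replace with a patch from your data.
I = rand(31, 21, 3);                 % M x N x 3 (rows x columns x RGB)

[M, N, C] = size(I);

% Centered coordinates: for an odd-sized patch the middle pixel is at (0,0).
xs = (1:N) - (N + 1) / 2;            % assumed: x runs along the columns
ys = (1:M) - (M + 1) / 2;            % assumed: y runs along the rows
[xx, yy] = meshgrid(xs, ys);         % both grids are M x N

% reshape and (:) both walk the array column by column, so every color
% stays aligned with its own pixel coordinate.
rgb = reshape(double(I), M * N, C).';    % 3 x (M*N)
X   = [rgb; xx(:).'; yy(:).'];           % 5 x (M*N), rows are R, G, B, x, y
```

For an even-sized patch the same offset gives half-integer coordinates, which is harmless for the mixture model. Note that double() keeps a uint8 image on the 0 to 255 scale, so color and position ranges may differ a lot.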
Once you have the data converted, we need to estimate a mixture model. Matlab will give crappy models if you let it guess, so you should provide the guesses. The guesses will be sampled data from the actual image patch.
Think of it as writing a flattenTargetSubsampled function. Instead of returning every pixel and its coordinates, it should only return them on a coarser grid, such as every 5 pixels over and 5 down, or every 7 over and down. So, for a 21 x 31 target with the origin at the middle, it might grab x coordinates (-10, -5, 0, 5, 10) and y coordinates (-15, -10, -5, 0, 5, 10, 15) for a total of 35 sub-sampled points. These will be the guesses. Then run the Gaussian mixture model estimation on the full data. You might have to prevent the death of a model, or allow for the death of a model. Play around with the parameters.
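A minimal sketch of the sub-sampling and fitting steps follows, assuming the Statistics and Machine Learning Toolbox function fitgmdist is what gets used. The stand-in image, the grid spacing, the initial covariance values, and the regularization value are all placeholders to tune, not recommended settings. Note that fitgmdist expects one observation per row, so the data here is (M*N) x 5, the transpose of the 5 x (M*N) layout described above.

```matlab
rng(0);                              % reproducible stand-in data (assumption)
I = rand(31, 21, 3);
[M, N, C] = size(I);
[xx, yy] = meshgrid((1:N) - (N + 1) / 2, (1:M) - (M + 1) / 2);
X = [reshape(double(I), M * N, C), xx(:), yy(:)];   % (M*N) x 5, one pixel per row

% Coarse grid: every `step` pixels, shifted so it is roughly centered.
step = 5;
rows = (1 + floor(mod(M - 1, step) / 2)) : step : M;
cols = (1 + floor(mod(N - 1, step) / 2)) : step : N;
mask = false(M, N);
mask(rows, cols) = true;
mu0 = X(mask(:), :);                 % K x 5 initial means taken from the patch
K   = size(mu0, 1);

% Initial guess for EM: sampled means, a guessed diagonal covariance, equal weights.
S.mu = mu0;
S.Sigma = repmat(diag([0.05 0.05 0.05 step^2 step^2]), [1 1 K]);
S.ComponentProportion = ones(1, K) / K;

% A small regularization value keeps covariances from becoming singular
% when a component ends up explaining very few pixels.
gm = fitgmdist(X, K, 'Start', S, ...
               'RegularizationValue', 1e-4, ...
               'Options', statset('MaxIter', 200));
```

On the 31-row by 21-column stand-in this picks 7 rows and 5 columns, which gives the 35 guesses from the example above. The regularization value is one knob for experimenting with components that collapse; the fit may still warn about convergence on some patches.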
Now, how do we visualize the data? We will do so using nearest neighbors. Take the resulting mixture means and covariances. Extract only the position part of each mean (the last two entries if the data is R-G-B-x-y), and also extract only the position sub-matrix of each covariance matrix. Pack these back up into a mixture distribution. Take the meshgrid output for the image patch coordinates only and use the cluster function on the list of image patch coordinates to give each one an assignment. What you want to do is replace each assignment with the color of the original Gaussian model (ignore the coordinates). Now you should have a bunch of colors. Reshape them into the image patch size and that should give you an image. Plot the images for your people. Each should be some kind of piecewise constant approximation of the original person based on your mixture model. It might look cartoony or stylized in some sense.
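The visualization steps can be sketched as follows. To keep the example self-contained, it builds a hypothetical two-component model by hand (a red blob on the left, a blue blob on the right) instead of using a fitted one; in practice gm would come from your own estimate. It assumes the Statistics and Machine Learning Toolbox (gmdistribution, cluster) and colors scaled to [0,1].

```matlab
% Hypothetical hand-built model over [R G B x y]; substitute your fitted model.
M = 31;  N = 21;
mu    = [1 0 0 -5 0;                 % red component centered left of the origin
         0 0 1  5 0];                % blue component centered right of the origin
Sigma = repmat(diag([0.01 0.01 0.01 9 25]), [1 1 2]);
gm    = gmdistribution(mu, Sigma, [0.5 0.5]);

% Keep only the position part of each mean and covariance.
muPos    = gm.mu(:, 4:5);
SigmaPos = gm.Sigma(4:5, 4:5, :);
gmPos    = gmdistribution(muPos, SigmaPos, gm.ComponentProportion);

% Assign every pixel coordinate to its most likely position component.
[xx, yy] = meshgrid((1:N) - (N + 1) / 2, (1:M) - (M + 1) / 2);
idx = cluster(gmPos, [xx(:), yy(:)]);        % (M*N) x 1 component labels

% Replace each label by that component's mean color and fold back into an image.
approx = reshape(gm.mu(idx, 1:3), M, N, 3);

image(min(max(approx, 0), 1));       % clamp to [0,1] for a truecolor display
axis image off;
```

With this toy model the result is a patch that is red on the left half and blue on the right half. If your colors are on the 0 to 255 scale, divide by 255 before displaying.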
The sequence of activities above both creates the model and tests that the model is correct. You are hopefully comfortable with reshaping images into vector data, manipulating it, then reshaping it back into image data and displaying it. If not, then it is time to become an expert.