Module #2: Target Modelling and Re-Identification
Week #2: Testing the Method
Once we know the mixture models are proper, the next step is to actually use the models as a means to test the proximity of one model to another. If two models are close, or if the data from one target is consistent with another target's model, then the two should be the same target. In this manner, we can identify when the same person leaves and re-enters the scene, or gets occluded and re-appears. This process is called re-identification.
To start simply, let's first consider the case of checking new data against an existing model for consistency. Imagine that there are 10 people who have walked in, each with their own Gaussian mixture model as a signature, and that they have subsequently left the scene. One of those 10 returns. Which of them is it? … Well, we have a new set of image data from the binary masked image portion of the recently entered person. Our question gets converted to: Which of the existing models is the new data a good fit for? That naturally begets the question: How can we create a scoring mechanism for testing the fitness of data to existing models?
We have been doing this in the course homework using a scoring energy principally based on squared distance. For Gaussian models, the plain squared distance is not so appropriate, since there is a known covariance matrix that describes how the space should be warped to respect the spread of the data. Such a warped squared distance is known as the Mahalanobis distance, though in other domains it is simply a weighted $L_2$ norm. Matlab's implementation of the Gaussian mixture (GM) model class has a member function (called 'mahal') for computing the squared Mahalanobis distance of data to each component of a GM model. The equation for the scoring energy is:
$$E(T ; M) = \int_{\mathcal{D}} \min_{i}\, (T(x) - \mu_i)^T \Sigma_i^{-1} (T(x) - \mu_i)\, dx \approx \sum_{t_\alpha \in T} \;\min_{i}\, (t_\alpha - \mu_i)^T \Sigma_i^{-1} (t_\alpha - \mu_i) $$
where $T$ is the template image data extracted from the image of the newly entered or newly detected target/person, and $M$ is a pre-existing target Gaussian mixture model whose mixture components $(\mu_i, \Sigma_i)$ are indexed by $i$. The two versions of the energy are the continuous model and the discrete approximation. Let's review their details.
Recall that we treat an image as a 2D function $I:\mathbb{R}^2 \rightarrow \mathbb{R}^d$ whose output dimension is $d$. Based on the earlier module, our template model appends the pixel coordinate position in a template-centered frame, so it is $T:\mathbb{R}^2 \rightarrow \mathbb{R}^{d+2}$. However, the template is only valid on the subset $\mathcal{D} \subset \mathbb{R}^2$, which is the domain of the template coordinate system. In the discretized approximation, we think of the “vectorized” or sequentially listed discrete pixel elements, hence the indexing via $\alpha$ into $T$ as $t_\alpha$. Secondly, only one component within the mixture model should be selected for each data point in the scoring or energy process, hence the $\min$ evaluation. The distances to all components are computed, but only the best is taken. The minimization makes sense because if a data point $t_\alpha$ is spot on for one component, it will be far from the others. That farness from the others should not be included, since it is an unnecessary penalty.
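As a concrete illustration of the discrete approximation, here is a minimal Matlab sketch using the built-in Gaussian mixture class and its 'mahal' member function. The variable names (tmpl, gmModel, trainData, k) are placeholders for illustration, not part of the course code.

<code matlab>
% Score a template against one stored Gaussian mixture model.
% tmpl    : N x (d+2) matrix, one row per template pixel
%           (appearance features with appended template-frame coordinates).
% gmModel : a gmdistribution object previously fit to the target's data,
%           e.g. gmModel = fitgmdist(trainData, k);

% mahal returns an N x k matrix of squared Mahalanobis distances from
% each row of tmpl to each of the k mixture components.
D = mahal(gmModel, tmpl);

% Keep only the closest component per pixel, then sum over all pixels
% to get the discrete energy E(T; M).
E = sum(min(D, [], 2));
</code>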
Of course, that gives the energy for one single model. For multiple stored target models, the matching model for the test template would be the one with the smallest score. Let the model set be $\mathcal{M}$, which stores all of the known models $M_r$ indexed by $r$. The best match is:
$$ r^* = \arg\min_{r} \, E(T; M_r). $$
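A minimal sketch of this matching step, assuming the stored target models are kept in a cell array called models (a naming choice for illustration) and tmpl is the new template data as above:

<code matlab>
% Compute the energy of the new template against every stored model
% and pick the one with the smallest score.
scores = zeros(1, numel(models));
for r = 1:numel(models)
  D = mahal(models{r}, tmpl);        % squared Mahalanobis distances
  scores(r) = sum(min(D, [], 2));    % energy E(T; M_r)
end
[~, rStar] = min(scores);            % index of the re-identified target
</code>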