We can do a better job of handling things like occlusions, and also improve the data association, by adding a temporal filter. A simple, powerful method is to recursively estimate and correct the moving target dynamics with a Kalman filter. Matlab has a page on how to do so for object tracking, along with a more general introductory page that links to additional documentation. Incorporate the Kalman filter into the overall process so that detected people have their trajectories predicted.
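A minimal single-target sketch of that integration, assuming the Computer Vision Toolbox functions configureKalmanFilter, predict, and correct are available; detectPerson, the video file name, and the numeric noise values are placeholders you would replace with your own detector and tuning:

    % Minimal sketch: one Kalman filter tracking one person's centroid.
    vidReader = VideoReader('video.mp4');        % placeholder file name
    kf = [];
    while hasFrame(vidReader)
        frame    = readFrame(vidReader);
        centroid = detectPerson(frame);          % hypothetical: centroid from your
                                                 % background-subtraction detector,
                                                 % or [] when the person is occluded
        if isempty(kf)
            if ~isempty(centroid)
                % Initialize the filter on the first detection.
                kf = configureKalmanFilter('ConstantVelocity', centroid, ...
                                           [200 50], [100 25], 100);
            end
        else
            predictedLoc = predict(kf);              % prediction step
            if ~isempty(centroid)
                trackedLoc = correct(kf, centroid);  % correction with the measurement
            else
                trackedLoc = predictedLoc;           % occlusion: fall back on prediction
            end
        end
    end

For multiple people you would keep one such filter per track and use the predicted locations to perform the data association between existing tracks and new detections.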
A Kalman filter is really what would be called an observer in the world of control theory. In some instances it behaves the way you would imagine a filter should. In others, though, it goes one step further and can provide filtered estimates of quantities that haven't even been measured. That aspect is what makes it an observer. The most important part of the Kalman filter, the one that gives it the observer property, is the prediction dynamics. Through some nice statistics the prediction is joined with the measurement to produce corrected estimates of the system state.
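In standard Kalman filter notation (not tied to the Matlab implementation), the prediction and correction steps are:

    \hat{x}_k^- = A\,\hat{x}_{k-1}, \qquad P_k^- = A P_{k-1} A^\top + Q

    K_k = P_k^- H^\top \left( H P_k^- H^\top + R \right)^{-1}, \qquad
    \hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H\,\hat{x}_k^- \right), \qquad
    P_k = (I - K_k H)\, P_k^-

Here A encodes the prediction dynamics, H relates the state to the measurement z_k, and Q and R are the process and measurement noise covariances, which reappear as tuning parameters below.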
Explore and Deliverable: The Matlab version allows you to specify different prediction dynamics, including a constant velocity model and a constant acceleration model. How do they differ mathematically? How do they differ in implementation? For one video, quantitatively evaluate the different outcomes as before: use your known count of people entering and exiting and compare it to what the two versions output. In addition to the prediction model, the Kalman filter has covariance parameters that arise from its probabilistic derivation. What role do they play? What are the two main covariance parameters and what do they mean? How sensitive is the system to these two parameters?
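As a starting point, here is a sketch of how the two motion models and those covariance parameters are exposed through configureKalmanFilter; the numeric values are illustrative guesses, not recommended settings:

    initLoc = [120 240];   % example: first detected centroid of a person

    % Constant velocity: the state carries location and velocity, so the
    % initial-error and motion-noise vectors have two entries per quantity.
    kfCV = configureKalmanFilter('ConstantVelocity', initLoc, ...
                                 [200 50], ...    % initial estimate error
                                 [100 25], ...    % motion (process) noise, i.e. Q
                                 100);            % measurement noise, i.e. R

    % Constant acceleration: the state also carries acceleration, so the
    % vectors gain a third entry.
    kfCA = configureKalmanFilter('ConstantAcceleration', initLoc, ...
                                 [200 50 5], [100 25 10], 100);

The motion-noise and measurement-noise arguments are the two covariance parameters referred to above; sweeping them while watching the in/out counts is one concrete way to probe the sensitivity question.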
Additional Task: For future activities, we will need a body of test cases. Your assignment is to generate that data set of test cases. The next module will explore something in addition to spatio-temporal filtering for maintaining tracks of objects: person identification, or better put, person re-identification. Testing out this concept will require collecting pairs of images that correspond to the same person. From the three videos, and your background estimation-based tracking, extract the image regions associated with the bounding boxes of individual people. Also extract the same bounding-box regions from the binary foreground blobs. If you do this for two instances of a person, you should have four cropped images: two of the same person and two of that person's blob. Collect this information for 20 people, only 2 of which are from the mandatory video, 8 from your selected video pair, and 10 more from other videos you choose from the collection of videos available.
Save them all in a Matlab file in two structure arrays called peopleA and peopleB, where peopleA contains one image of each person and peopleB contains the second. The ordering should match, meaning that the structure element peopleA(1).person should be the same person as in peopleB(1).person, where the person field has the cropped image region of the person based on the bounding box. The structure field labeled fgblob should be the binary mask from the foreground detection; when applied as a mask, it should give the person region only (in principle).
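One possible way to assemble and save the data set, assuming that for each person you have already stored the two frames, the two foreground masks, and the matching bounding boxes from your tracker (the cell arrays and the .mat file name below are hypothetical placeholders):

    % Build the paired structure arrays and save them to a Matlab file.
    for i = 1:20
        % imcrop expects a bounding box in [x y width height] form.
        peopleA(i).person = imcrop(framesA{i}, bboxA{i});   % cropped person image
        peopleA(i).fgblob = imcrop(masksA{i},  bboxA{i});   % matching binary blob
        peopleB(i).person = imcrop(framesB{i}, bboxB{i});
        peopleB(i).fgblob = imcrop(masksB{i},  bboxB{i});
    end
    save('personPairs.mat', 'peopleA', 'peopleB');          % placeholder file name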