  - Target recognition

===== Learning Modules =====
----------------------------
The sequence below introduces one aspect of surveillance systems at a time. The weekly activities direct you to Matlab code that sometimes implements multiple steps at once. It is recommended that you implement each step individually to get a sense for what role it plays in the entire system, rather than just copy/paste the whole system.

Module Set #1: A Basic (Foreground Detection-Based) Surveillance System
  - [[ECE4580:Module_Surveillance:M1W1|Week #1]]: Setup, Data, and Basics
  - [[ECE4580:Module_Surveillance:M1W2|Week #2]]: Foreground Object Extraction
  - [[ECE4580:Module_Surveillance:M1W3|Week #3]]: Optimization-Based Data Association
  - [[ECE4580:Module_Surveillance:M1W4|Week #4]]: Adding Temporal Dynamics via a Kalman Filter

Module Set #2: Target Modelling and Re-Identification
  - [[ECE4580:Module_Surveillance:M2W1|Week #1]]: Differentiating People
  - [[ECE4580:Module_Surveillance:M2W2|Week #2]]: Testing the Person Model
  - [[ECE4580:Module_Surveillance:M2W4|Week #3]]: Re-Identification in Action
  - [[ECE4580:Module_Surveillance:M2W3|Week #4]]: Enhancing Tracking

Module Set #3: Merging and Splitting
  - TBD

Module Set #4: Tracking vs Detection
  - TBD

===== Module #1: A Basic Surveillance System =====
---------------------------------------------------
==== Week #1: Setup, Data, and Basics ====
Explore the [[ECE4580:Module_Surveillance:MatlabVideos|datasets available]] and select a couple of videos to use as the starting point for processing, testing, and debugging the surveillance system to be created as part of this learning module.

Meanwhile, check out this review of three [[http://www.eetimes.com/document.asp?doc_id=1275604|simple background modeling methods]]. You are not advised to use the code provided there when implementing the same methods in future activities: there are more efficient ways to do the same without resorting to as many for loops, the implementation needs to be packaged up for use by the overall system, and the activities below utilize existing Matlab libraries to the extent possible or desirable.
As a first step, obtain the [[http://pvela.gatech.edu/classes/files/ECE4580/Module_Surveillance/survSystem.m|surveillance system class stub]] and also the [[http://pvela.gatech.edu/classes/files/ECE4580/Module_Surveillance/mainLoop.m|main execution script]]. Modify the main loop code stub so it loads a video you've chosen, loops through the frames, displays the image associated with each frame, and quits when done. Naturally this code won't do any surveillance, but it will set up the system to do so.
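A minimal sketch of such a loop, assuming a placeholder file name and Matlab's stock ''VideoReader''; the class stub will eventually wrap this logic:

<code matlab>
% Minimal playback loop: load a video, display each frame, quit at the end.
vid = VideoReader('myvideo.avi');   % placeholder file name

while hasFrame(vid)
  I = readFrame(vid);               % grab the next frame as an image
  imshow(I);                        % display it
  drawnow;                          % force the figure to update
end
</code>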
As a second step, implement the basic background modeling detection step. Matlab has an implementation of the [[https://www.mathworks.com/help/vision/ref/vision.foregrounddetector-class.html|mixture of Gaussians adaptive background estimation algorithm]]. Just perform the estimation part and retrieve the binary foreground image. Modify the ''displayState'' function to display this output. When run, the system should display the source video plus a binary image sequence associated with the detected objects.
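A hedged sketch of that step; the detector parameters shown are starting guesses to tune, not prescribed values:

<code matlab>
% Mixture-of-Gaussians background estimation (Computer Vision Toolbox).
detector = vision.ForegroundDetector('NumGaussians', 3, ...
    'NumTrainingFrames', 50, 'LearningRate', 0.005);

% Inside the per-frame processing:
fgMask = step(detector, I);          % binary foreground image
imshowpair(I, fgMask, 'montage');    % source frame next to its mask
</code>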
//Explore & Deliverables:// How well is the background modeled? You can identify how well it works by examining the quality of the binary image sequence. Does it capture the target objects only? Are there more false positives or false negatives than you would like? What did you do to get the best result possible (and with what parameters)? You should turn in at least one image pair showing the input frame, plus the output frame after foreground detection (or with the mask as noted in the code stub).

==== Week #2: Foreground Object Extraction ====
With the background modelling step done, we have a means to identify regions of the image that do not conform to the expected scene. We will presume that these are all objects of interest to track, e.g. //targets//. Advance beyond the current foreground modelling step to include processing of the binary foreground image for extraction of detected targets and their bounding boxes.
[[https://www.mathworks.com/help/vision/examples/detecting-cars-using-gaussian-mixture-models.html|Extracting the foreground objects]] is really performing blob extraction from the binary foreground image. While there are Matlab examples that go beyond simply detecting the bounding box, this activity only asks you to place into the surveillance system the foreground detection, the blob extraction, and the visualization with track number.

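A sketch of the blob extraction, assuming the ''fgMask'' output from the detector above; the morphological cleanup sizes and the minimum blob area are tuning choices:

<code matlab>
% Clean up the binary mask, then extract blob centroids and bounding boxes.
fgClean = imopen(fgMask, strel('rectangle', [3 3]));      % remove speckle
fgClean = imclose(fgClean, strel('rectangle', [15 15]));  % fill small gaps

blobber = vision.BlobAnalysis('BoundingBoxOutputPort', true, ...
    'CentroidOutputPort', true, 'AreaOutputPort', false, ...
    'MinimumBlobArea', 150);
[centroids, bboxes] = step(blobber, fgClean);
</code>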
Flesh out the ''overlayState'' function so that it can overlay the surveillance system output on the current image frame. In addition to keeping track of the total number of detected objects, like in the example, for each box plotted, plot the detection index associated with the box like [[http://www.mathworks.com/help/examples/vision_product/multiObjectTracking_02.png|here]]. You will also need to modify the functions that are invoked within the ''process'' member function. You may find that some will not need modification due to how the Matlab implementation works.

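One way the overlay body might look, assuming the ''bboxes'' output from the blob analysis above; the labels here are simply the detection indices:

<code matlab>
% Annotate each detection with its bounding box and index, then display.
labels = cellstr(num2str((1:size(bboxes,1))'));
frameOut = insertObjectAnnotation(I, 'rectangle', bboxes, labels);
imshow(frameOut);
</code>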
Apply the algorithm to the video [[http://www.cvg.reading.ac.uk/PETS2009/a.html|S2.L1 Walking, View 001]] from the PETS 2009 People Tracking Dataset. It may be a collection of images, in which case some modification of the main loop will be needed. I have a [[https://github.gatech.edu/ivaMatlibs/readers|github repository]] of reader functions/classes, one of which allows for reading from a directory of images. It is called ''impathreader'' and has an interface somewhat similar to how Matlab deals with video. You are free to use it. Also apply the algorithm to the couple of videos that you've selected (here, couple = exactly two). If one of them is the S2.L1 PETS 2009 video, then select another one to process.

//Explore & Deliverables:// Turn in state overlays of the processed video for the specified PETS 2009 dataset, plus for your two additional chosen videos. Discuss the performance of the foreground detection process, as well as the detection labelling for the targets over time.
  * Are there any challenges that pop up regarding the box estimation process?
  * How did you select your parameters?
  * What are morphological operations?
  * Explain the relationship between the erode, dilate, open, and close operations.
  * One thing the Matlab examples do that I requested you not include is some kind of spatio-temporal tracking module to maintain the track identity of the objects. Based on pure detection and the processing natively done in Matlab, how often do you see the numerical identity given to a detected person change? What happens as people come in and out of the scene, or cross paths? How often?
Naturally, you should be turning in your code in the submission document. The best approach is to provide code snippets of the class member functions that you modified as part of this activity. If you submit the entire class, then highlight the modified functions so that we can see what was done.

==== Week #3: Optimization-Based Data Association ====
Performing detection does provide a means to identify objects of interest versus the prevailing background image. However, if we are interested in maintaining the identity of the objects, additional processing and logic is required. The simplest scheme considers the spatio-temporal history of the targets and tries to link the currently detected objects to the previously detected objects. This form of data association is known as the //assignment problem//.

The classic algorithms for performing the assignment are the [[https://www.mathworks.com/matlabcentral/fileexchange/20652-hungarian-algorithm-for-linear-assignment-problems--v2-3-|Hungarian or Munkres' algorithm]] and the [[https://www.mathworks.com/matlabcentral/fileexchange/26836-lapjv-jonker-volgenant-algorithm-for-linear-assignment-problem-v3-0|Jonker-Volgenant algorithm]]. Another [[http://goldberg.berkeley.edu/pubs/acc-2012-visual-tracking-final.pdf|paper]] from a few years ago looked at a different version of the assignment problem based on "stable marriage" selection. All of them seek a solution to the assignment problem, but do so in different ways.

Incorporate the assignment component into your surveillance system. The net result should be similar to this Matlab demo on [[http://www.mathworks.com/help/vision/examples/motion-based-multiple-object-tracking.html|multiple object tracking]], which also implements far more than what is described here. You should strip out the extras. You'll get there, but I want you to see the role that each component plays. Yes, Matlab gives almost all of the answer, but that's not the point. At least not for me.

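A hedged sketch of the association step, assuming each track stores a predicted centroid (the field name is illustrative); ''assignDetectionsToTracks'' is the Computer Vision Toolbox solver, and the cost of non-assignment is a tuning guess:

<code matlab>
% Build a track-by-detection cost matrix of distances, then solve assignment.
numTracks = numel(tracks);            % tracks(i).predictedCentroid assumed
numDetections = size(centroids, 1);   % centroids from the blob analysis
cost = zeros(numTracks, numDetections);
for i = 1:numTracks
  d = centroids - tracks(i).predictedCentroid;  % implicit expansion (R2016b+)
  cost(i, :) = sqrt(sum(d.^2, 2))';             % Euclidean distances
end
[assignments, unassignedTracks, unassignedDetections] = ...
    assignDetectionsToTracks(cost, 20);  % 20 = cost of non-assignment (a guess)
</code>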
//Explore and Deliverables:// Run your surveillance system on the two selected videos plus the mandatory one from earlier weeks, both with and without the assignment component. Turn in how many people there really were, plus how many the two versions of the system claimed there were at the end of the video, or at the end of the processing. Comment on whether the identity tracking appeared to improve the system's ability to maintain the identity of the targets.

==== Week #4: Adding Temporal Dynamics via a Kalman Filter ====
We can do a better job handling things like occlusions, as well as improve the data association, by adding in a temporal filter. A simple, powerful method is to recursively estimate and correct the moving target dynamics through a Kalman filter. Matlab has a page on how to do so for [[https://www.mathworks.com/help/vision/examples/using-kalman-filter-for-object-tracking.html|object tracking]] along with a more general [[https://www.mathworks.com/discovery/kalman-filter.html|intro page]] linking to additional documentation. Incorporate the Kalman filter into the overall process, so that detected people have their trajectories predicted.
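A sketch of the filter hookup, assuming a track is initialized from a detection centroid; the noise values are starting guesses to tune:

<code matlab>
% Constant-velocity Kalman filter for one track (Computer Vision Toolbox).
kf = configureKalmanFilter('ConstantVelocity', initialCentroid, ...
    [200, 50], ...     % initial estimate error (location, velocity)
    [100, 25], ...     % motion noise (location, velocity)
    100);              % measurement noise

predictedLoc = predict(kf);                % predict before data association
correctedLoc = correct(kf, newCentroid);   % correct with the matched detection
</code>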

A Kalman filter is really what would be called an //observer// in the world of control theory. In some instances it works like how you would imagine a filter should. In others, however, it goes one step further and can provide filtered estimates of quantities that haven't even been measured. That aspect is what makes it an //observer//. The most important part of the Kalman filter, the part that gives it the observer property, is the prediction dynamics. Through some nice statistics, the prediction is joined with the measurement to create corrected outputs of the system state (see the equations after the link below).
  * If you are interested in the details of the Kalman filter and their connection to code, this Matlab file exchange [[https://www.mathworks.com/matlabcentral/fileexchange/5377-learning-the-kalman-filter|code tutorial]] may be of assistance.
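For reference, the standard discrete-time Kalman filter equations (generic textbook notation, not tied to Matlab's variable names) make the predict/correct split explicit:

<code latex>
% Prediction: propagate the state estimate and covariance through the dynamics.
\hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1}, \qquad
P_{k|k-1} = A P_{k-1|k-1} A^\top + Q

% Correction: blend the prediction with the measurement z_k via the gain K_k.
K_k = P_{k|k-1} H^\top \left( H P_{k|k-1} H^\top + R \right)^{-1}, \qquad
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( z_k - H \hat{x}_{k|k-1} \right)
</code>

Here $Q$ and $R$ are the two covariance parameters the deliverable below asks about: $Q$ encodes trust in the motion model, $R$ encodes trust in the measurements.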

//Explore and Deliverables:// The Matlab version allows you to specify different prediction dynamics, including a constant velocity model and a constant acceleration model. How do they differ mathematically? How do they differ in implementation? For one video, quantitatively evaluate the different outcomes like before: use your known count of people entering and exiting from before and compare it to what the two versions output. In addition to the prediction models, the Kalman filter has some covariance parameters that arise from its probabilistic derivation. What role do they play? What are the main two covariance parameters and what do they mean? How sensitive is the system to these two covariance parameters?

**Additional Task:** For the future activities, we will need a body of test cases. Your assignment is to generate the data set of test cases. The next module will explore something in addition to spatio-temporal filtering for maintaining track of objects. It will be person identification, or better put, //person re-identification//. Testing out this concept will require collecting pairs of images that correspond to the same person. From the three videos, and your background estimation-based tracking, extract the image regions associated with the bounding boxes of individual people. Also, extract the bounding boxes for the binary foreground blob regions. If you do this for a given person, you should have four cropped images: two of the person and two of that person's blob. Collect this information for 20 people: 2 from the mandatory video, 8 from your selected video pair, and 10 more from other videos you choose from the collection of videos available.

Save them all in a Matlab file in two structure arrays called ''peopleA'' and ''peopleB'', where ''peopleA'' contains one image of each person and ''peopleB'' contains the second. The ordering should match: the structure element ''peopleA(1).person'' should be the same person as in ''peopleB(1).person'', where the ''person'' field has the cropped image region of the person based on the bounding box. The structure field labeled ''fgblob'' should be the binary mask from the foreground detection. When applied as a mask, it should give the person region only (in principle).

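A sketch of assembling one dataset entry, using the field names from the assignment; the crop inputs (''I'', ''fgMask'', ''bbox'') are placeholders from your own pipeline:

<code matlab>
% For each sighting: crop the person patch and the matching mask region.
peopleA(1).person = imcrop(I, bbox);        % color patch from the frame
peopleA(1).fgblob = imcrop(fgMask, bbox);   % binary blob for the same box
% ... repeat with the second sighting of the same person into peopleB(1) ...

save('myPeople.mat', 'peopleA', 'peopleB'); % one uniquely named .mat file
</code>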
===== Module #2: Going Further =====
------------------------------------

Here, we are going to step away from the video processing part for a moment and focus solely on the datasets that you have all generated. We want to test out an algorithm for correctly matching the pairs of people based on their image information alone. The more people in the dataset, the trickier it becomes. In a binary selection process, random guessing will get you a 50% correctness outcome. When selecting amongst 10 things, random guessing is only 10% correct. Naturally, correctness drops in inverse proportion to the number of distinct people. Can we beat this probabilistic nightmare? Let's see.

==== Week #1: Differentiating People ====

The most basic way to distinguish targets based on their appearance would be to build a Gaussian mixture model for each target using the [[http://www.mathworks.com/matlabcentral/fileexchange/26184-em-algorithm-for-gaussian-mixture-model|Expectation Maximization algorithm]].

  - Please everyone upload your people Matlab files (with some unique name) to the [[https://drive.google.com/open?id=0Bx2Un_yG4X8YOGhRVWd3TjdrWlU|ECE4580 people google drive folder]]. It is necessary so others can have access to your people, much like you'll need access to their people.
  - Implement a target-based [[https://www.mathworks.com/help/stats/gmdistribution.fit.html|Gaussian mixture model estimator]]. Apply it to each target in //your// **peopleA** dataset.
  - Visualize the Gaussian mixture model estimate.

OK, now that we know what to do, there is the matter of how to do it. We are not going to use just any Gaussian mixture model, because that typically leads to nonsensical results. Rather, we will use an over-segmented Gaussian mixture model. First, the data needs to be converted from image form to the proper form for use in a Gaussian mixture model. You will need to create a specialized flattening function. The input is the image proper, but the output is both the image data and the pixel coordinate location of the data. Data-type-wise, we have an M x N x 3 matrix that gets converted to a 5 x (M*N) matrix which has the pixel locations appended. Assume that the image is centered; that means the middle of the image is at coordinate (0,0). It is almost as easy as doing a reshape, but the pixel coordinates are missing. For those, you can create them using the ''meshgrid'' function. Your two grids should be of the same dimension as the image patch of the target, but be centered (really, it is just an offset). You should be able to figure it out with a little messing around. Call it ''flattenTarget''.

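A sketch of what ''flattenTarget'' might look like under those assumptions (row ordering and any color normalization are your call):

<code matlab>
function data = flattenTarget(img)
% flattenTarget  Convert an M x N x 3 patch into a 5 x (M*N) matrix whose
% rows are [R; G; B; x; y], with pixel coordinates centered on the patch.
  [M, N, ~] = size(img);
  [X, Y] = meshgrid((1:N) - (N+1)/2, (1:M) - (M+1)/2);  % centered grids
  C = reshape(double(img), M*N, 3)';   % 3 x (M*N), column-major pixel order
  data = [C; X(:)'; Y(:)'];            % append the matching coordinates
end
</code>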
Once you have the data converted, we need to estimate a mixture model. Matlab will give crappy models if you let it guess the initialization, so you should provide the guesses. The guesses will be sampled data from the actual image patch.
Think of it as writing a ''flattenTargetSubsampled'' function. Instead of returning every pixel and its coordinates, you should only return them on a coarser grid, like every 5 over and 5 down, or 7 over and down (numbers are pixels). So, for a 21 x 31 target with origin at the middle, it might grab all x coordinates (-10, -5, 0, 5, 10) and y coordinates (-15, -10, -5, 0, 5, 10, 15) for a total of 35 sub-sampled points. These will be the guesses. Then run the Gaussian mixture model estimation on the data. You might have to prevent death of a model, or allow for death of a model. Play around with the parameters.

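A hedged sketch of the estimation; ''flattenTargetSubsampled'' is the helper described above (its stride argument is illustrative), and the seed covariance and regularization values are guesses to play with:

<code matlab>
% Fit an over-segmented GMM seeded by the sub-sampled pixels.
data  = flattenTarget(patch);               % 5 x (M*N)
seeds = flattenTargetSubsampled(patch, 5);  % 5 x k, every 5th pixel
k = size(seeds, 2);

S = struct('mu', seeds', ...                      % k x 5 initial means
    'Sigma', repmat(25 * eye(5), [1 1 k]), ...    % guessed seed covariances
    'ComponentProportion', ones(1, k) / k);       % uniform initial weights
gm = fitgmdist(data', k, 'Start', S, 'RegularizationValue', 0.01);
</code>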
Now, how do we visualize the data? We will do so using nearest neighbors. Take the resulting mixture means and covariances. Extract only the position part (the last two rows if the data is R-G-B-x-y), and also extract only the position sub-matrix of each covariance matrix. Pack these back up into a mixture distribution. Take the meshgrid output for the image patch coordinates only and use the ''cluster'' function on the list of image patch coordinates to give them an assignment. What you want to do is to replace each assignment with the color of the original Gaussian model (ignore the coordinates). Now you should have a bunch of colors. Reshape them into the image patch size and that should give you an image. Plot the images for your people. It should be some kind of piecewise constant approximation of the original person based on your mixture model. It might look cartoony or stylized in some sense.

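A sketch of that visualization, assuming ''gm'' from the fit above and the same centered grids used in ''flattenTarget'':

<code matlab>
% Keep only the positional part of the mixture, then color pixels by component.
muXY    = gm.mu(:, 4:5);                 % position part of each component mean
SigmaXY = gm.Sigma(4:5, 4:5, :);         % position block of each covariance
gmPos   = gmdistribution(muXY, SigmaXY, gm.ComponentProportion);

[X, Y] = meshgrid((1:N) - (N+1)/2, (1:M) - (M+1)/2);  % centered grids again
idx = cluster(gmPos, [X(:), Y(:)]);      % assign each pixel to a component
colors = gm.mu(idx, 1:3);                % component color replaces the pixel
imshow(reshape(colors, M, N, 3) / 255);  % piecewise-constant approximation
</code>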
The sequence of activities above both creates the model and tests that the model is correct. You are hopefully comfortable with reshaping images into vector data, manipulating it, then reshaping it back into image data and displaying. If not, then it is time to become an expert.

==== Week #2: Testing the Method ====

Once we know the mixture models are proper, the next step is to actually use the models as a means to test proximity of one model to another. If two models are close, or the data from one model is consistent with the other model, then the two should be the same person. In this manner, we can identify when the same person comes in and out, or when the person gets occluded and re-appears. This action is called //re-identification//.

To start simply, let's first consider the case of checking the data against a model for consistency. Imagine that there are 10 people who have walked in, each with their own Gaussian mixture model as their signature, and that they have subsequently left the scene. One of those 10 returns. Which of them is it? ... Well, we have a new set of image data from the binary masked image portion of the recently entered person. Our question gets converted to: Which of the existing models is the new data a good fit for? That naturally begets the question: How can we create a scoring mechanism for testing fitness of data to existing models?

We have been doing this in the course homework using a scoring energy principally based on squared distance. For Gaussian models, the squared distance is not so appropriate, since there is a known covariance matrix which describes how the space should be warped to respect the spread of the data. Such a warped squared distance is known as the [[https://en.wikipedia.org/wiki/Mahalanobis_distance|Mahalanobis distance]], though in other domains it is simply a non-trivial L2 norm. Matlab's implementation of the Gaussian mixture (GM) model class has a member function for computing the Mahalanobis distance of data to a GM model (called ''mahal''). For a single Gaussian component with mean $\mu$ and covariance $\Sigma$, the squared Mahalanobis distance of a data point $x$ is $d^2(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu)$.
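A hedged sketch of one plausible scoring scheme (an assumption, not the prescribed energy): average, over the new person's pixels, the smallest per-component Mahalanobis distance to each stored model, and pick the model with the lowest score. The ''models'' container is hypothetical.

<code matlab>
% Score new flattened data against each stored person model.
scores = zeros(1, numel(models));       % models(p).gm holds each fitted GMM
for p = 1:numel(models)
  D = mahal(models(p).gm, newData');    % (num pixels) x (num components)
  scores(p) = mean(min(D, [], 2));      % best-component distance per pixel
end
[~, bestMatch] = min(scores);           % re-identified person index
</code>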

==== Week #3: Appearance-Based Data Association ====

Comparing models.

==== Week #4: Re-Identification ====

Using the comparison to know when the same person has re-entered.

===== Module #3: Merging and Splitting =====
--------------------------
TBD.

===== Additional Information =====