Obtaining a Sparser Set of Local Neighborhoods

Obtaining a Sparser Set of Local Neighborhoods

This section covers how to obtain a sub-sampled set of points for extraction of local neighborhoods of the point clouds. The idea behind using local neighborhoods is that some information is best not calculated for every point in a point cloud, but rather at a subset of points, called central points. Usually the information will require computations with a small region around each central point. Once computed for the small region, there is no need to compute it at points in the small region that differ from the chosen central point.

Below are a few ways to obtain these local neighborhoods. They are not all the same. Some allow for the neighborhood regions to overlap, some may not. all involve specifying how big the neighborhood should be. Usually that is done through a radius specification $\bar \rho$.

Using the Connectivity Matrix

The connectivity matrix approach requires one to select a radius $\rho$, then generate a connectivity matrix with that radius. Given a central point, all points that are connected to it form the neighborhood. That's easy to find. The goal then is to come up with a set of central points. One basic procedure is as follows:

First define an inPlay array consisting of all point in the point cloud. False if the point has been taken and is not available for use. True if it is still in play. Mark all entries as true.
Pick a point that has not been taken and mark it as taken.
Find the connected neighbors of the point. That is its neighborhood. Here, it doesn't matter if they are taken or not.
Mark them as taken also.
While there is still a point in play, repeat.

The trick of the approach is to be able to pick an inPlay point. The easiest is to simply find the first non-zero entry and use that as the new central point. Mind you, the above says how to do the picking and neighborhood selection. The procedure is missing the storing of that center point and the local neighborhood into some data structure for later use. The easiest structure to use would be a Matlab cell array that contains lists of indices of all points each the neighborhood. The first index in the list should be the central point, while the 2:end-1 indices are the neighbors. Processing the data for some computation will then use a for loop through the cell array and use of the index array at each cell location.

Note: The reason that a cell array is needed is that the set of neighbors will change from central point to central point. A cell array can store lists that have different lengths.

Note: Some local neighborhoods will not have enough points to compute anything useful (imagine a lone point far away from all the others). In that case, it is up to your processing algorithm to detect that and return some sensible default value.

Using Sub-Sampling and Clustering via Nearest Neighbors

One simple approach that uses a fair amount of Matlab's internal routines would be to sub-sample the point cloud on a grid that is of size $\rho$ or maybe $2 \rho$. Using a nearest-neighbor search on the full point cloud using the sub-sampled point set would then give you the neighboring points. Of course, one would have to check that the nearest neighbors were within some threshold distance (they really should be, but this is a sanity check that the code is working).

Matlab's nearest neighbor search routine provides both the distance to the nearest point, as well as the index to that point from the target set. The Matlab function that does e the searching is called knnsearch. It is documented here, and roughly works as follows:

  [idx, dist] = knnsearch(target, source)

where source is the set of points that we wish to compute the nearest distances for, as well as generate the label assignment. The set of points in target are the ones we want to check to nearest distance against. All points in the source set with the same index belong to the same neighborhood. The point in that neighborhood with the smallest distance to the target point (from the target set at the location index) should be used as the central point. One could try to use the actual target point, but some sub-sampling algorithms also change the data (for example through local averaging).

Implementation: Matlab has an internal point cloud routine for down sampling a point cloud, which is one way to do it. Another is to simply sub-sample by grabbing every $n$-th index from the set of all points, much like in the Matlab example for point cloud normals. Of course, the Matlab code is not optimized for use in the pipeline just described and there is a better, shorter way of doing it, but the main idea holds.

Warning: Many students get caught up on errors that arise from not reading the documentation well. In particular, Matlab and other implementations are not always consistent about how data should be organized when being used as arguments to functions. For vectorial arrays, you need to check if the data is expected to be row-wise or column-wise. Sending the wrong way leads to algorithm mistakes though the code is syntactically correct.

Using Greedy Point Selection

This is sort of the dual to the Matlab nearest neighbor search process and more in line with the connectivity matrix approach. Matlab has a function called rangesearch that finds all the points in a source set that are within range of points in the target set. The return values are cell arrays that give the necessary connectivity and distance information. Since the maximum radius as specified when invoking the function, only the index cell sets are needed. They provide the connectivity information. As noted in the Matlab documentation for the function, its goes as follows:

  [idx, dists] = rangesearch(source, target, radius)

Here, the source point set is the original data, while the target point set is the sub-sampled point set. The cell array provides the indices into the source set of close or connected points to each point in the target set. As an example idx{1} gives the indices of the source set that are within the specified distance radius of the point target(1,:) of the target set. How to compute the sub-sampled target set was explained above.

Then, the next step would be to process the connectivity sets determined by the cell array idx.

What Should the Radius Be?

Radius selection for locality calculations has been studied a fair amount, with many folk coming up with different rules. Surely trying to figure out what this number should be by guess and checking is probably not optimal, and definitely does not lend itself to automation. Here are a couple of ideas:

Compute the median distance of all pair-wise point distances. For that, the pdist function could be useful.
Compute the average distance of only the $k$-nearest neighbors of a point set to itself. Basically, the $k$-nearest neighbors are sought for each point, then all of the distances are collected for these neighbors. Their mean is then computed (or median if you'd prefer). No for loops should be necessary. Matlab's knnsearch method does most of the work. A couple lines later, you are good to go.
The value $k$ should be chosen so that you almost always get the number of neighbors you need with the radius chosen. Say you need 10 points, then a good value for $k$ would be something like 15. It should be that more than 70% of the time, you will get 10 points.

Back

Table of Contents

Obtaining a Sparser Set of Local Neighborhoods

Using the Connectivity Matrix

Using Sub-Sampling and Clustering via Nearest Neighbors

Using Greedy Point Selection

What Should the Radius Be?