====== Obtaining a Sparser Set of Local Neighborhoods ======

This section covers how to obtain a sub-sampled set of points for extraction of local neighborhoods of the point clouds. The idea behind using local neighborhoods is that some information is best not calculated for every point in a point cloud, but rather at a subset of points, called central points. Usually the information will require computations with a small region around each central point.  Once computed for the small region, there is no need to compute it at points in the small region that differ from the chosen central point.

Below are a few ways to obtain these local neighborhoods.  They are not all the same.  Some allow for the neighborhood regions to overlap, some may not.  all involve specifying how big the neighborhood should be.  Usually that is done through a radius specification $\bar \rho$.

===== Using the Connectivity Matrix =====

The connectivity matrix approach requires one to select a radius $\rho$, then generate a connectivity matrix with that radius.  Given a central point, all points that are connected to it form the neighborhood.  That's easy to find.  The goal then is to come up with a set of central points.  One basic procedure is as follows:
  - First define an ''inPlay'' array consisting of all point in the point cloud. False if the point has been taken and is not available for use.  True if it is still in play. Mark all entries as true.
  - Pick a point that has not been taken and mark it as taken.
  - Find the connected neighbors of the point.  That is its neighborhood. Here, it doesn't matter if they are taken or not.
  - Mark them as taken also.
  - While there is still a point in play, repeat.

The trick of the approach is to be able to pick an ''inPlay'' point.  The easiest is to simply ''find'' the first non-zero entry and use that as the new central point.  Mind you, the above says how to do the picking and neighborhood selection.  The procedure is missing the storing of that center point and the local neighborhood into some data structure for later use.  The easiest structure to use would be a Matlab ''cell array'' that contains lists of indices of all points each the neighborhood.  The first index in the list should be the central point, while the ''2:end-1'' indices are the neighbors.  Processing the data for some computation will then use a ''for'' loop through the cell array and use of the index array at each cell location.

//Note:// The reason that a cell array is needed is that the set of neighbors will change from central point to central point.  A cell array can store lists that have different lengths.

//Note:// Some local neighborhoods will not have enough points to compute anything useful (imagine a lone point far away from all the others).  In that case, it is up to your processing algorithm to detect that and return some sensible default value.

===== Using Sub-Sampling and Clustering via Nearest Neighbors =====

One simple approach that uses a fair amount of Matlab's internal routines would be to sub-sample the point cloud on a grid that is of size $\rho$ or maybe $2 \rho$. Using a nearest-neighbor search on the full point cloud using the sub-sampled point set would then give you the neighboring points.  Of course, one would have to check that the nearest neighbors were within some threshold distance (they really should be, but this is a sanity check that the code is working).

Matlab's nearest neighbor search routine provides both the distance to the nearest point, as well as the index to that point from the target set.  The Matlab function that does e the searching is called ''knnsearch''. It is documented [[https://www.mathworks.com/help/stats/knnsearch.html?requestedDomain=www.mathworks.com | here]], and roughly works as follows:
<code>
  [idx, dist] = knnsearch(target, source)
</code>
where ''source'' is the set of points that we wish to compute the nearest distances for, as well as generate the label assignment.  The set of points in ''target'' are the ones we want to check to nearest distance against.  All points in the ''source'' set with the same index belong to the same neighborhood.  The point in that neighborhood with the smallest distance to the target point (from the target set at the location ''index'') should be used as the central point.  One could try to use the actual target point, but some sub-sampling algorithms also change the data (for example through local averaging).

//Implementation:// Matlab has an internal point cloud routine for [[https://www.mathworks.com/help/vision/ref/pcdownsample.html|down sampling]] a point cloud, which is one way to do it.  Another is to simply sub-sample by grabbing every $n$-th index from the set of all points, much like in the [[https://www.mathworks.com/help/vision/ref/pcnormals.html|Matlab example]] for point cloud normals. Of course, the Matlab code is not optimized for use in the pipeline just described and there is a better, shorter way of doing it, but the main idea holds.

//Warning:// Many students get caught up on errors that arise from not reading the documentation well.  In particular, Matlab and other implementations are not always consistent about how data should be organized when being used as arguments to functions.  For vectorial arrays, you need to check if the data is expected to be **row-wise** or **column-wise**. Sending the wrong way leads to algorithm mistakes though the code is syntactically correct.

===== Using Greedy Point Selection =====

This is sort of the dual to the Matlab nearest neighbor search process and more in line with the connectivity matrix approach.  Matlab has a function called ''rangesearch'' that finds all the points in a source set that are within range of points in the target set. The return values are cell arrays that give the necessary connectivity and distance information.  Since the maximum radius as specified when invoking the function, only the index cell sets are needed.  They provide the connectivity information.  As noted in the [[https://www.mathworks.com/help/stats/rangesearch.html| Matlab documentation]] for the function, its goes as follows:
<code>
  [idx, dists] = rangesearch(source, target, radius)
</code>
Here, the ''source'' point set is the original data, while the ''target'' point set is the sub-sampled point set.  The cell array provides the indices into the ''source'' set of close or connected points to each point in the ''target'' set. As an example ''idx{1}'' gives the indices of the ''source'' set that are within the specified distance radius of the point ''target(1,:)'' of the target set. How to compute the sub-sampled ''target'' set was explained above.

Then, the next step would be to process the connectivity sets determined by  the cell array ''idx''.

===== What Should the Radius Be? =====

Radius selection for locality calculations has been studied a fair amount, with many folk coming up with different rules.  Surely trying to figure out what this number should be by guess and checking is probably not optimal, and definitely does not lend itself to automation.  Here are a couple of ideas:

  - Compute the median distance of all pair-wise point distances.  For that, the ''pdist'' function could be useful.
  - Compute the average distance of only the $k$-nearest neighbors of a point set to itself. Basically, the $k$-nearest neighbors are sought for each point, then all of the distances are collected for these neighbors.   Their mean is then computed (or median if you'd prefer). No for loops should be necessary. Matlab's ''knnsearch'' method does most of the work.  A couple lines later, you are good to go. \\ The value $k$ should be chosen so that you almost always get the number of neighbors you need with the radius chosen. Say you need 10 points, then a good value for $k$ would be something like 15. It should be that more than 70% of the time, you will get 10 points.

--------------------------
;#;
[[ECE4580:Module_PCD:LocalNormals|Back]]
;#;