Connected Components Clustering for Point Clouds
Connected component clustering seeks to identify the sets of points that are “connected” where connectivty is determined by some kind of test. In our case that test is proximity, if two points are close enough then they are connected. The simplest form of connectivity clustering is a greedy method that arbitrarily picks a point (called the seed point) and finds its neighbors, then finds their neighbors, and so on until there are no new neighbors being added. The set of points becomes a cluster, and no longer available. The process repeats with a new seed point selected, and only available points added to the mix until there are no new ones. Then it repeats again until there are no new seed points to be chosen.
The process we will use involves a connectivity matrix (also called an adjacency matrix). Let $C$ be the connectivity matrix. To know if point $p_i$ is close to point $p_j$, we look at matrix entry $C(i,j)$. If it is true, then the two points are connected.
Here is the main algorithm for doing so:
- Compute the proximity matrix (see
pdist
). - Apply a threshold to the proximity matrix to get the connectivity matrix $C$.
- Find the first row with a non-zero as the index.
- numSets = 0
- Set labels to be a 1 x $n_p$ matrix where $n_p$ is the number of points in the point cloud.
- While there is an index to grab
- numSets = numSets + 1
- Grab a random point from the point cloud. Let its index be $i$.
- Initialize the index set $S = {i}$.
- Initialize the neighbor index set $N = {i}$.
- While the neighbor set is not empty, do the following:
- Zero out the rows for the current neighbor index set ($C(N,:) = 0$). Makes them no longer available.
- Given the neighbor set, find indices of its connected components. These are all of the indices that are non-zero in the column $C(N,:)$.
- Add these indices to the index set $S$.
- SOverwrite the neighbor index set $N$ to the found connected components.
- The index set $S$ forms a cluster.
- Set labels($S$) = numSets
Each time a new set $S$ is computed, it defines a connected component cluster. The indices in the set $S$ are then used to label the points. The variable labels indicates what label each point is in, in order of the points stored in the point cloud matrix. In the code stub, there is code to generate a matrix of different color values by hue for each label. Use that to generate a large matrix that is $n_p$ x 3 of colors, where the color chosen is based on the label assignment. This color matrix can be computed in one line of code if you know your Matlab. Invoke the setColor
function with the matrix to assign a color to each point. The next time you plot the points, they will be colored according to their assignment.
The labels variable should be returned.