CUDA k-Means clustering

Clustering algorithms have been a big part of the research I have conducted over the last four years.

The k-Means clustering algorithm works as follows;

  • Randomly select k centre points from the population X
  • Assign each point in the population to the cluster where the centre is closest
  • Calculate a new centre for each cluster
  • Repeat 2 and 3 until there is no movement in the centre values

This is implemented in C using CUDA kernels for the assignment, calculating new cluster centers and checking for movement. It is important to note that this was written for a purpose so may not be robust when used for other things. It is published here as it may be useful for others.

Download

cuda_kmeans.cu

Usage

Compile:

nvcc cuda_kmeans.cu -o cuda_kmeans

To run it is the filename followed by k value and then number of columns in your data. e.g.

./cuda_kmeans points.txt 10 2

It assumes a space seperated file of points. e.g.

20085 8450
6861 17864
688 2877
5679 2363
10763 19460
4542 19642
7753 6872
7934 22417
12341 27547
12472 19925