CUDA k-Means clustering

Clustering algorithms have been a big part of the research I have conducted over the last four years.

The k-Means clustering algorithm works as follows;

Randomly select k centre points from the population X
Assign each point in the population to the cluster where the centre is closest
Calculate a new centre for each cluster
Repeat 2 and 3 until there is no movement in the centre values

This is implemented in C using CUDA kernels for the assignment, calculating new cluster centers and checking for movement. It is important to note that this was written for a purpose so may not be robust when used for other things. It is published here as it may be useful for others.

Download

cuda_kmeans.cu

Usage

Compile:

nvcc cuda_kmeans.cu -o cuda_kmeans

To run it is the filename followed by k value and then number of columns in your data. e.g.

./cuda_kmeans points.txt 10 2

It assumes a space seperated file of points. e.g.


				20085 8450

				6861 17864

				688 2877

				5679 2363

				10763 19460

				4542 19642

				7753 6872

				7934 22417

				12341 27547

				12472 19925

Martin Vickers

Researcher and former SysAdmin

CUDA k-Means clustering

Download

Usage