Machine Learning: K-means algorithm

What is k-means?

If you are given a data set x1, x2, x3, ..., xn that has no labels, so that each item consists only of a set of attributes, you can group the items using k-means. K-means groups unlabeled data into a number of groups set by k, so k is the number of groups to be formed. The process of grouping unlabeled data is called clustering.

Steps of k-means

The k-means calculation is simple: it needs only distance calculations and averages, repeated until the result stops changing.

K-means has the following steps:

1. Prepare an unlabeled data set x1, x2, x3, ..., xn. The data set consists only of numeric attributes.

2. Choose k, the number of groups (clusters) to create.

3. Choose the initial centers (centroids) c1, c2, ..., ck. The number of centers equals k. They can be chosen at random or by another method.

4. Calculate the distance between each data point and every centroid using the Euclidean distance. When this is done, you have the distance from each point to each centroid.

5. For each data point, compare its distances to all the centroids. The closest centroid determines the point's cluster.

6. Collect all points that belong to the same cluster and calculate their mean. This mean is the cluster's new centroid.

7. Compare the new centroids with the old ones. If they are the same, the k-means algorithm is finished. If they are not the same, return to step 4 and repeat steps 4 through 6 with the new centroids.
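
The seven steps can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: the function names euclidean and kmeans, the max_iter and seed parameters, the sample points, and the choice to keep the old centroid when a cluster ends up empty are all assumptions made for this example.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of numeric tuples) into k groups.

    Returns (centroids, labels). `max_iter` and `seed` are illustrative
    additions and are not part of the steps listed in the post.
    """
    rng = random.Random(seed)
    # Step 3: pick k distinct data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Steps 4-5: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Step 6: the new centroid is the mean of the points in each cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its old centroid.
                new_centroids.append(centroids[j])
        # Step 7: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Steps 1-2: a small made-up data set with two numeric attributes, and k = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

With these sample points, the first three points should end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initial centroids.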

For a sample k-means calculation, see the post Calculation of kmeans to cluster the data set of iris.


Friday, 28 July 2017
