Machine Learning: weaknesses of the k-means algorithm

Although k-means has many advantages and is widely used in computer science research, it also has weaknesses, and these shortcomings have motivated a good deal of follow-up research on the algorithm.

This article does not discuss how to solve these weaknesses; it only lists them. Among the weaknesses of the k-means algorithm are:

1. When there are only a few data points, the initial grouping determines the final clusters to a large extent.
2. The resulting clusters tend to be circular (spherical in higher dimensions), because points are assigned purely by their distance to a centroid.
3. The number of clusters, K, must be determined beforehand. Choosing K is itself a problem, and it is sometimes hard to predict in advance how many clusters the data contains.
4. We never know the real clusters. With few data points, the same data entered in a different order may produce different clusters.
5. It is sensitive to the initial condition. Different initial conditions may produce different clusters, and the algorithm may get trapped in a local optimum.
6. We never know which attribute contributes more to the grouping process, since each attribute is assumed to have the same weight.
7. The arithmetic mean is not robust to outliers. A data point very far from the centroid may pull the centroid away from its real position.
8. Experiments have shown that outliers can be a problem and can force the algorithm to identify false clusters.
9. Experiments have shown that the performance of the algorithm degrades in higher dimensions, and the result can be off from the optimum by a factor of 5.
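
Points 5, 7 and 8 can be illustrated with a small sketch. This is only a minimal, plain-Python version of the standard k-means iteration (assign each point to the nearest centroid, then move each centroid to the mean of its points). The function name `kmeans`, the four-point toy data set, the starting centroids and the outlier at (100, 0.5) are all assumptions made up for this illustration, not taken from any particular library or experiment.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2


def kmeans(points, centroids, max_iter=100):
    """Plain k-means on 2-D points, starting from the given centroids.

    Returns the final centroids and the sum of squared errors (SSE).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sq_dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                n = len(members)
                new_centroids.append((sum(p[0] for p in members) / n,
                                      sum(p[1] for p in members) / n))
            else:
                new_centroids.append(c)  # an empty cluster keeps its centroid
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Toy data (assumed): two tight pairs of points, far apart along x.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Same data, same K = 2, two different starting positions.
cents_good, sse_good = kmeans(points, [(0, 0.5), (4, 0.5)])
cents_bad, sse_bad = kmeans(points, [(2, 0), (2, 1)])
print(sse_good)  # 1.0  -> left pair / right pair
print(sse_bad)   # 16.0 -> stuck in a local optimum (bottom row / top row)

# One far-away outlier (assumed) added to the same data.
cents_out, sse_out = kmeans(points + [(100, 0.5)], [(0, 0.5), (4, 0.5)])
print(cents_out)  # [(2.0, 0.5), (100.0, 0.5)]
```

In the second run the algorithm converges immediately, because the starting centroids are already the means of the points assigned to them, even though the grouping is far worse than in the first run. In the third run the outlier first drags the right centroid to x = 36, then ends up as a cluster of its own, while the two real groups are merged into one.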

Wednesday, 02 August 2017