Tuesday, 01 August 2017

Example 1: K-means calculation to cluster the iris data set

The iris data set is easy to obtain: it is a public data set that is often used in computer science research. It consists of measurements of iris flowers that are grouped into three iris varieties based on the size of their petals and sepals. It can be downloaded from https://archive.ics.uci.edu/ml/datasets/iris.

Here I use the iris data set for the example calculation because it is simple: it has 4 attributes (sepal length, sepal width, petal length, and petal width) and three classes (setosa, versicolour, and virginica).

Although the iris data set is labeled, here it is treated as data whose labels are unknown; in other words, the labels are hidden. This makes the data suitable for use in this clustering example.

The clustering method used is k-means. Before doing the calculations, look at the slice of the data set that will be used in this example.


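| No. | Sepal length | Sepal width | Petal length | Petal width |
|-----|--------------|-------------|--------------|-------------|
| 1   | 5.1          | 3.5         | 1.4          | 0.2         |
| 2   | 4.9          | 3.0         | 1.4          | 0.2         |
| 3   | 4.7          | 3.2         | 1.3          | 0.2         |
| 4   | 4.6          | 3.1         | 1.5          | 0.2         |
| 5   | 4.8          | 3.4         | 1.6          | 0.2         |
| 6   | 5.1          | 3.5         | 1.4          | 0.3         |
| 7   | 7.0          | 3.2         | 4.7          | 1.4         |
| 8   | 6.4          | 3.2         | 4.5          | 1.5         |
| 9   | 5.5          | 2.3         | 4.0          | 1.3         |
| 10  | 6.3          | 3.3         | 4.7          | 1.6         |
| 11  | 5.2          | 2.7         | 3.9          | 1.4         |
| 12  | 6.1          | 2.9         | 4.7          | 1.4         |
| 13  | 7.1          | 3.0         | 5.9          | 2.1         |
| 14  | 6.5          | 3.0         | 5.9          | 2.1         |
| 15  | 7.2          | 3.6         | 6.1          | 2.5         |
| 16  | 5.8          | 2.8         | 5.1          | 2.4         |
| 17  | 7.7          | 2.6         | 6.9          | 2.3         |
| 18  | 5.6          | 2.8         | 4.9          | 2.0         |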

We will try to group this slice of the data set into 3 clusters.

Let's work through the steps.

1. Determine the number of clusters k
    k=3

2. Determine the initial centroids. This time we pick them at random, which selected the 5th, 6th, and 9th rows of the iris data slice above.


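| Centroid | Sepal length | Sepal width | Petal length | Petal width |
|----------|--------------|-------------|--------------|-------------|
| cent1    | 4.8          | 3.4         | 1.6          | 0.2         |
| cent2    | 5.1          | 3.5         | 1.4          | 0.3         |
| cent3    | 5.5          | 2.3         | 4.0          | 1.3         |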
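
As a minimal sketch of this step, the snippet below stores the data slice as a Python list and picks the centroids. The variable names (`data`, `chosen_rows`, `centroids`) are my own for illustration. A real random draw changes on every run, so the rows are fixed to 5, 6, and 9 here to reproduce this example.

```python
import random

# Slice of the iris data used in this example:
# (sepal length, sepal width, petal length, petal width)
data = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (4.6, 3.1, 1.5, 0.2), (4.8, 3.4, 1.6, 0.2), (5.1, 3.5, 1.4, 0.3),
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (5.5, 2.3, 4.0, 1.3),
    (6.3, 3.3, 4.7, 1.6), (5.2, 2.7, 3.9, 1.4), (6.1, 2.9, 4.7, 1.4),
    (7.1, 3.0, 5.9, 2.1), (6.5, 3.0, 5.9, 2.1), (7.2, 3.6, 6.1, 2.5),
    (5.8, 2.8, 5.1, 2.4), (7.7, 2.6, 6.9, 2.3), (5.6, 2.8, 4.9, 2.0),
]
k = 3

# A random draw of k distinct row positions; the result differs between runs.
random_rows = random.sample(range(len(data)), k)

# To reproduce this example, fix the draw to rows 5, 6 and 9 (1-based).
chosen_rows = [5, 6, 9]
centroids = [data[row - 1] for row in chosen_rows]

for name, centroid in zip(("cent1", "cent2", "cent3"), centroids):
    print(name, centroid)
```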

3. Calculate the distance between every centroid and each data point using the Euclidean distance.

Here I show the calculation for the first data point x1 against the three centroids (cent1, cent2, cent3).
This is the first data point taken from the iris data set.

| Data | Sepal length | Sepal width | Petal length | Petal width |
|------|--------------|-------------|--------------|-------------|
| x1   | 5.1          | 3.5         | 1.4          | 0.2         |


- Calculate the distance between x1 and the first centroid (cent1)



- Calculate the distance between x1 and the second centroid (cent2)



- Calculate the distance between x1 and the third centroid (cent3)
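
The three calculations can be written out as follows, using the standard Euclidean distance formula with the values of x1 and the centroids from the tables above. The notation (d, x_j, c_j) is my own for illustration. The results agree with the distances listed for the first data point in this example.

```latex
% Euclidean distance between a data point x and a centroid c (4 attributes)
d(x, c) = \sqrt{\sum_{j=1}^{4} (x_j - c_j)^2}

d(x_1, \text{cent}_1) = \sqrt{(5.1-4.8)^2 + (3.5-3.4)^2 + (1.4-1.6)^2 + (0.2-0.2)^2}
                      = \sqrt{0.14} \approx 0.3741657

d(x_1, \text{cent}_2) = \sqrt{(5.1-5.1)^2 + (3.5-3.5)^2 + (1.4-1.4)^2 + (0.2-0.3)^2}
                      = \sqrt{0.01} = 0.1

d(x_1, \text{cent}_3) = \sqrt{(5.1-5.5)^2 + (3.5-2.3)^2 + (1.4-4.0)^2 + (0.2-1.3)^2}
                      = \sqrt{9.57} \approx 3.093542
```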



The remaining data points are calculated in the same way as the first. Once all distances have been calculated, the distance from each data point to each cluster is as follows.

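| No. | Sepal length | Sepal width | Petal length | Petal width | Distance to Cluster1 | Distance to Cluster2 | Distance to Cluster3 |
|-----|--------------|-------------|--------------|-------------|----------------------|----------------------|----------------------|
| 1   | 5.1          | 3.5         | 1.4          | 0.2         | 0.3741657            | 0.1                  | 3.093542             |
| 2   | 4.9          | 3.0         | 1.4          | 0.2         | 0.4582576            | 0.547723             | 2.969848             |
| 3   | 4.7          | 3.2         | 1.3          | 0.2         | 0.3741657            | 0.519615             | 3.154362             |
| 4   | 4.6          | 3.1         | 1.5          | 0.2         | 0.3741657            | 0.655744             | 2.984962             |
| 5   | 4.8          | 3.4         | 1.6          | 0.2         | 0                    | 0.387298             | 2.944486             |
| 6   | 5.1          | 3.5         | 1.4          | 0.3         | 0.3872983            | 0                    | 3.059412             |
| 7   | 7.0          | 3.2         | 4.7          | 1.4         | 3.9912404            | 3.974921             | 1.886796             |
| 8   | 6.4          | 3.2         | 4.5          | 1.5         | 3.5637059            | 3.581899             | 1.382027             |
| 9   | 5.5          | 2.3         | 4.0          | 1.3         | 2.9444864            | 3.059412             | 0                    |
| 10  | 6.3          | 3.3         | 4.7          | 1.6         | 3.7188708            | 3.749667             | 1.489966             |
| 11  | 5.2          | 2.7         | 3.9          | 1.4         | 2.7166155            | 2.847806             | 0.519615             |
| 12  | 6.1          | 2.9         | 4.7          | 1.4         | 3.6041643            | 3.668787             | 1.104536             |
| 13  | 7.1          | 3.0         | 5.9          | 2.1         | 5.2488094            | 5.266878             | 2.701851             |
| 14  | 6.5          | 3.0         | 5.9          | 2.1         | 5.0149776            | 5.069517             | 2.39583              |
| 15  | 7.2          | 3.6         | 6.1          | 2.5         | 5.598214             | 5.599107             | 3.229551             |
| 16  | 5.8          | 2.8         | 5.1          | 2.4         | 4.2953463            | 4.368066             | 1.661325             |
| 17  | 7.7          | 2.6         | 6.9          | 2.3         | 6.4459289            | 6.466838             | 3.786819             |
| 18  | 5.6          | 2.8         | 4.9          | 2.0         | 3.8897301            | 3.984972             | 1.249                |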
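
The whole distance table can be reproduced with a short script. This is only a sketch using the Python standard library; the helper name `euclidean` and the variable names are my own for illustration.

```python
import math

# Slice of the iris data used in this example:
# (sepal length, sepal width, petal length, petal width)
data = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (4.6, 3.1, 1.5, 0.2), (4.8, 3.4, 1.6, 0.2), (5.1, 3.5, 1.4, 0.3),
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (5.5, 2.3, 4.0, 1.3),
    (6.3, 3.3, 4.7, 1.6), (5.2, 2.7, 3.9, 1.4), (6.1, 2.9, 4.7, 1.4),
    (7.1, 3.0, 5.9, 2.1), (6.5, 3.0, 5.9, 2.1), (7.2, 3.6, 6.1, 2.5),
    (5.8, 2.8, 5.1, 2.4), (7.7, 2.6, 6.9, 2.3), (5.6, 2.8, 4.9, 2.0),
]

# Initial centroids: rows 5, 6 and 9 of the slice (1-based)
centroids = [data[4], data[5], data[8]]


def euclidean(p, q):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# One tuple per data point: (distance to cent1, to cent2, to cent3)
distances = [
    tuple(euclidean(point, centroid) for centroid in centroids)
    for point in data
]

for number, (point, row) in enumerate(zip(data, distances), start=1):
    print(number, point, [round(d, 6) for d in row])
```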

4. Select the cluster. For each data point, compare its distances to the three clusters; the smallest value indicates the cluster the data point belongs to. Here I demonstrate with the first data point.

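| Sepal length | Sepal width | Petal length | Petal width | Distance to Cluster1 | Distance to Cluster2 | Distance to Cluster3 |
|--------------|-------------|--------------|-------------|----------------------|----------------------|----------------------|
| 5.1          | 3.5         | 1.4          | 0.2         | 0.3741657            | 0.1                  | 3.093542             |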





Comparing the cluster1, cluster2, and cluster3 distances shows that the smallest value is 0.1. This means the first data point is assigned to cluster2.


The same applies to the other data points: compare the distances to each cluster in the same way.

After this process, the clustering results are as follows.

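| No. | Sepal length | Sepal width | Petal length | Petal width | Cluster1 | Cluster2 | Cluster3 |
|-----|--------------|-------------|--------------|-------------|----------|----------|----------|
| 1   | 5.1          | 3.5         | 1.4          | 0.2         | -        | v        | -        |
| 2   | 4.9          | 3.0         | 1.4          | 0.2         | v        | -        | -        |
| 3   | 4.7          | 3.2         | 1.3          | 0.2         | v        | -        | -        |
| 4   | 4.6          | 3.1         | 1.5          | 0.2         | v        | -        | -        |
| 5   | 4.8          | 3.4         | 1.6          | 0.2         | v        | -        | -        |
| 6   | 5.1          | 3.5         | 1.4          | 0.3         | -        | v        | -        |
| 7   | 7.0          | 3.2         | 4.7          | 1.4         | -        | -        | v        |
| 8   | 6.4          | 3.2         | 4.5          | 1.5         | -        | -        | v        |
| 9   | 5.5          | 2.3         | 4.0          | 1.3         | -        | -        | v        |
| 10  | 6.3          | 3.3         | 4.7          | 1.6         | -        | -        | v        |
| 11  | 5.2          | 2.7         | 3.9          | 1.4         | -        | -        | v        |
| 12  | 6.1          | 2.9         | 4.7          | 1.4         | -        | -        | v        |
| 13  | 7.1          | 3.0         | 5.9          | 2.1         | -        | -        | v        |
| 14  | 6.5          | 3.0         | 5.9          | 2.1         | -        | -        | v        |
| 15  | 7.2          | 3.6         | 6.1          | 2.5         | -        | -        | v        |
| 16  | 5.8          | 2.8         | 5.1          | 2.4         | -        | -        | v        |
| 17  | 7.7          | 2.6         | 6.9          | 2.3         | -        | -        | v        |
| 18  | 5.6          | 2.8         | 4.9          | 2.0         | -        | -        | v        |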
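
The assignment step can be sketched in code as well. The snippet below assumes the same data slice and the same three initial centroids as in this example, and the function names `euclidean` and `nearest_cluster` are my own for illustration. For each data point it picks the cluster whose centroid is at the smallest Euclidean distance.

```python
import math

# Slice of the iris data used in this example:
# (sepal length, sepal width, petal length, petal width)
data = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (4.6, 3.1, 1.5, 0.2), (4.8, 3.4, 1.6, 0.2), (5.1, 3.5, 1.4, 0.3),
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (5.5, 2.3, 4.0, 1.3),
    (6.3, 3.3, 4.7, 1.6), (5.2, 2.7, 3.9, 1.4), (6.1, 2.9, 4.7, 1.4),
    (7.1, 3.0, 5.9, 2.1), (6.5, 3.0, 5.9, 2.1), (7.2, 3.6, 6.1, 2.5),
    (5.8, 2.8, 5.1, 2.4), (7.7, 2.6, 6.9, 2.3), (5.6, 2.8, 4.9, 2.0),
]

# Initial centroids: rows 5, 6 and 9 of the slice (1-based)
centroids = [data[4], data[5], data[8]]


def euclidean(p, q):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_cluster(point):
    """Return the 1-based number of the cluster with the closest centroid."""
    dists = [euclidean(point, centroid) for centroid in centroids]
    return dists.index(min(dists)) + 1


# Cluster number (1, 2 or 3) for every data point, in row order
assignments = [nearest_cluster(point) for point in data]

for number, (point, cluster) in enumerate(zip(data, assignments), start=1):
    print(number, point, "-> cluster", cluster)
```

With these centroids, the script puts rows 1 and 6 in cluster 2, rows 2 to 5 in cluster 1, and rows 7 to 18 in cluster 3.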

To be continued...
