If you are given labeled data with multiple attributes, you can use a decision tree to classify unknown data (test data). As the name implies, the algorithm builds a tree that serves as the model. With this model, each test record is assigned one of the labels found in the training data.
The decision tree model is built from calculations performed on the training data. These calculations identify the components of the tree: its root, its branches, and its twigs. The root, branches, and twigs represent the attributes of the dataset that affect the classification, while the leaves hold the resulting labels.
The decision tree is then turned into a set of rules used to classify new data (test data) fed into the tree. The rules take the form of "if else" logic, and following them from the root down to a leaf produces the label for the input record.
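To make the "if else" idea concrete, here is a minimal sketch of what such rules might look like once a tree has been built. The attributes (outlook, humidity, windy), their values, and the labels are hypothetical and are not taken from any dataset in this post.

```python
# Hypothetical rules read off a small decision tree.
# Attribute names, values, and labels are illustrative assumptions only.
def classify(record):
    if record["outlook"] == "sunny":
        # Second-level test, applied only on the "sunny" branch
        if record["humidity"] == "high":
            return "no"
        else:
            return "yes"
    elif record["outlook"] == "overcast":
        return "yes"
    else:  # any other outlook value, e.g. "rainy"
        if record["windy"]:
            return "no"
        else:
            return "yes"


test_record = {"outlook": "sunny", "humidity": "normal", "windy": False}
print(classify(test_record))  # prints "yes"
```

Each nested "if" corresponds to one attribute test in the tree, and each "return" corresponds to a leaf.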
How is a decision tree calculated?
The model is built by calculating the impurity value of each attribute with respect to the label in the training dataset. The measures that can be used are entropy, the Gini index, and the classification error. A smaller impurity value is better, because it means the attribute separates the labels more cleanly. The impurity calculation is performed on each attribute paired with the label. The impurity values of the attributes are then compared, and the attribute with the smallest value is used as the root.
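The comparison described above can be sketched roughly as follows. This is only an illustration: it assumes entropy as the impurity measure, assumes that the impurity of an attribute is the weighted average of the entropy in each group of records sharing a value of that attribute, and uses a made-up six-row dataset. The names attribute_impurity and choose_root are hypothetical helpers.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of labels: 0 when all labels are the same."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def attribute_impurity(rows, attribute, label_key="label"):
    """Weighted average entropy of the label after grouping rows by attribute value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label_key])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def choose_root(rows, attributes, label_key="label"):
    """Return the attribute with the smallest impurity value."""
    return min(attributes, key=lambda a: attribute_impurity(rows, a, label_key))


# Made-up training data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": False, "label": "no"},
    {"outlook": "sunny", "windy": True, "label": "no"},
    {"outlook": "overcast", "windy": False, "label": "yes"},
    {"outlook": "overcast", "windy": True, "label": "yes"},
    {"outlook": "rainy", "windy": False, "label": "yes"},
    {"outlook": "rainy", "windy": True, "label": "no"},
]

print(choose_root(rows, ["outlook", "windy"]))  # prints "outlook"
```

In this toy data, "outlook" leaves only the "rainy" group mixed, so its impurity is lower than that of "windy" and it becomes the root.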
What is the formula for impurity?
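A decision tree commonly uses entropy to measure the degree of impurity of its attributes. For a set of records S in which p_i is the proportion of records that carry label i, the formula is:
$$\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i$$
The Gini index can be used instead:
$$\mathrm{Gini}(S) = 1 - \sum_{i} p_i^2$$
Or the classification error:
$$\mathrm{Error}(S) = 1 - \max_{i} p_i$$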
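As a rough illustration, the three impurity measures named above can be written as small functions over a list of labels. The standard textbook definitions are assumed here, and the helper name proportions is hypothetical.

```python
from collections import Counter
from math import log2


def proportions(labels):
    """Proportion p_i of each distinct label in the list."""
    total = len(labels)
    return [n / total for n in Counter(labels).values()]


def entropy(labels):
    # -sum(p_i * log2(p_i))
    return -sum(p * log2(p) for p in proportions(labels))


def gini_index(labels):
    # 1 - sum(p_i ** 2)
    return 1 - sum(p ** 2 for p in proportions(labels))


def classification_error(labels):
    # 1 - max(p_i)
    return 1 - max(proportions(labels))


labels = ["yes", "yes", "no", "no"]
print(entropy(labels), gini_index(labels), classification_error(labels))
# prints: 1.0 0.5 0.5
```

All three measures are zero for a set where every record has the same label, and largest when the labels are evenly mixed, as in the example above.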
After obtaining the impurity value of each attribute, compare them: the attribute with the smallest impurity value is chosen as the root. The same comparison is then repeated on each resulting subset of the data to choose the branches and twigs.
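Repeating the selection at the root, then at each branch, then at each twig, is naturally expressed as recursion. The sketch below is one possible way to do it, not a reference implementation: it assumes the Gini index as the impurity measure, categorical attributes, and a made-up dataset, and the function names (split, weighted_gini, build_tree) are hypothetical.

```python
from collections import Counter


def gini(labels):
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


def split(rows, attribute):
    """Group rows by their value for the given attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return groups


def weighted_gini(rows, attribute):
    """Impurity of an attribute: Gini of each group, weighted by group size."""
    return sum(len(g) / len(rows) * gini([r["label"] for r in g])
               for g in split(rows, attribute).values())


def build_tree(rows, attributes):
    labels = [r["label"] for r in rows]
    # Leaf: the subset is pure, or no attributes are left to test.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the smallest impurity for this node.
    best = min(attributes, key=lambda a: weighted_gini(rows, a))
    remaining = [a for a in attributes if a != best]
    # Recurse into each subset to choose the next level of the tree.
    return {best: {value: build_tree(subset, remaining)
                   for value, subset in split(rows, best).items()}}


# Made-up training data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": False, "label": "no"},
    {"outlook": "sunny", "windy": True, "label": "no"},
    {"outlook": "overcast", "windy": False, "label": "yes"},
    {"outlook": "overcast", "windy": True, "label": "yes"},
    {"outlook": "rainy", "windy": False, "label": "yes"},
    {"outlook": "rainy", "windy": True, "label": "no"},
]

tree = build_tree(rows, ["outlook", "windy"])
print(tree)
```

The result is a nested dictionary: each key is an attribute chosen at that level, each value maps an attribute value either to a label (a leaf) or to another nested dictionary (a further branch).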