In data mining, the C4.5 algorithm is used as a decision tree classifier: it generates decisions from a given sample of data. A decision tree resembles a flow chart in which a sequence of conditions leads to a decision.
C4.5 converts the trained trees into sets of if-then rules. The accuracy of each rule is then evaluated to determine the order in which the rules should be applied. Pruning is done by removing a rule’s precondition if the accuracy of the rule improves without it.
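The precondition-dropping step can be sketched in Python as follows. This is a simplified illustration, not Quinlan's implementation: it scores a rule by plain accuracy on a set of labelled rows, and the names `rule_accuracy` and `prune_rule`, as well as the representation of a rule as a list of `(attribute, value)` pairs, are assumptions made for the example.

```python
def rule_accuracy(conditions, target, rows, labels):
    """Fraction of rows matched by the rule whose label equals the rule's target class."""
    covered = [label for row, label in zip(rows, labels)
               if all(row[attr] == value for attr, value in conditions)]
    if not covered:
        return 0.0
    return sum(label == target for label in covered) / len(covered)


def prune_rule(conditions, target, rows, labels):
    """Greedily drop preconditions while doing so strictly improves accuracy."""
    conditions = list(conditions)
    improved = True
    while improved and conditions:
        improved = False
        base = rule_accuracy(conditions, target, rows, labels)
        for cond in conditions:
            trial = [c for c in conditions if c != cond]
            if rule_accuracy(trial, target, rows, labels) > base:
                conditions = trial  # keep the shorter, more accurate rule
                improved = True
                break
    return conditions
```

A precondition is removed only when the shorter rule is strictly more accurate, so a rule that is already as accurate as any of its shortened versions is returned unchanged.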
The C4.5 algorithm was proposed in 1993 by Ross Quinlan, who also developed ID3, to overcome the limitations of the ID3 algorithm. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. Its criterion is the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data. The attribute with the highest normalized information gain is chosen to make the decision.
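As a rough illustration of the entropy difference behind this criterion, here is a minimal Python sketch for a categorical attribute. The helper names `entropy` and `information_gain` and the one-dict-per-row data layout are assumptions made for the example; the normalization step is not included here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the parent set minus the weighted entropy of the subsets
    obtained by splitting on the categorical attribute `attr`."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder
```

For a set with two equally frequent classes, an attribute that separates the classes perfectly yields a gain equal to the full parent entropy of 1 bit, while an attribute unrelated to the class yields a gain of 0.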
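C4.5 starts from information gain. This computation is the same one ID3 uses, so it does not, in itself, produce anything new. However, it allows for measuring a gain ratio. The gain ratio divides an attribute’s information gain by its split information, which is the entropy of the partition that the attribute induces on the samples.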
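In symbols, the standard textbook definition of the gain ratio can be written as below. The notation is an assumption, since the article does not fix one: S is the set of samples at a node, A is a candidate attribute, and S_v is the subset of S in which A takes the value v.

```latex
\[
\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)},
\qquad
\mathrm{SplitInfo}(S, A) = -\sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}
\]
```

Dividing by the split information penalizes attributes that break the data into many small subsets, which plain information gain tends to favor.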
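The C4.5 algorithm proceeds as follows: if all samples at a node belong to the same class, or no attributes remain, the node becomes a leaf; otherwise the attribute with the highest gain ratio is chosen, the samples are split on it, and the procedure is repeated on each subset.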
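A minimal Python sketch of this recursive procedure is shown below. It is an illustration under simplifying assumptions rather than a full C4.5 implementation: it handles only categorical attributes and omits pruning, missing-value handling, and continuous thresholds. All function names and the nested-dict tree layout are hypothetical choices for the example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of hashable values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def partition(rows, labels, attr):
    """Group rows and labels by the value each row has for `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[attr], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return groups


def gain_ratio(rows, labels, attr):
    """Information gain of `attr` divided by its split information."""
    total = len(labels)
    groups = partition(rows, labels, attr)
    remainder = sum(len(ls) / total * entropy(ls) for _, ls in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a dict describing an internal node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority  # no attribute separates the classes any further
    remaining = [a for a in attrs if a != best]
    branches = {
        value: build_tree(sub_rows, sub_labels, remaining)
        for value, (sub_rows, sub_labels) in partition(rows, labels, best).items()
    }
    return {"attr": best, "branches": branches, "default": majority}


def predict(tree, row):
    """Follow branches until a leaf; unseen values fall back to the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree
```

Storing the majority class at each internal node is a design choice of this sketch; it gives `predict` a fallback when a row carries an attribute value that never appeared in training.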
Applications of Decision Tree C4.5
In the analysis of coal logistics customers
A decision tree based on the C4.5 algorithm was built for coal logistics customer analysis and simplified with Pessimistic Error Pruning (PEP). The extracted rules were then applied to the CRM of the coal logistics road company, which showed that the method can accurately classify the types of customers.
In scholarship evaluation
Manual evaluation of educational scholarships tends to revolve around the rank of the scholarship and the number of students who are rewarded, rather than analyzing the factors that can influence whether a scholarship is earned. As a result, the evaluation lacks fairness and efficiency. Hence, a higher education scholarship evaluation model has been built on the C4.5 decision tree.
In soil quality grade forecasting
C4.5 has been used to establish a soil quality grade prediction model, with the soil composition in Lishu serving as the training sample. The C4.5 algorithm also expresses the acquired knowledge by means of quantitative rules. The experimental results show that the knowledge expressed by the C4.5 algorithm is easy to understand and convenient for practical application, and that the model improves forecasting accuracy and provides a reliable theoretical basis for precision fertilization.
In cattle disease classification
The C4.5 algorithm has also been used successfully to predict and classify disease in cattle, so that the animals can be treated accordingly without further delay.
In English sentiment classification
The C4.5 decision tree algorithm has been used to classify the sentiment (positive, negative, or neutral) of English documents. C4.5 is run on 70,000 positive English sentences to generate a decision tree, from which many association rules for the positive polarity are created. Likewise, C4.5 is run on 70,000 negative English sentences to generate a decision tree, from which many association rules for the negative polarity are created. The sentiment of an English document is then identified using the association rules of the positive and the negative polarity.
Advantages of C4.5 Decision Tree
- C4.5 is an easy algorithm to implement.
- It can cope with noisy data.
- It is not much affected by missing data.
- It can convert the tree to rules.
- It can also deal with continuous attributes.
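To illustrate how a continuous attribute can be handled, the sketch below turns it into a binary test of the form "value <= threshold" and picks the threshold with the highest information gain. It is a simplified illustration: using midpoints between consecutive distinct values as candidate cut points is an assumption of this example, and the name `best_threshold` is hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split `value <= threshold`,
    or (None, 0.0) if no split improves on the unsplit set."""
    pairs = sorted(zip(values, labels))
    total = len(pairs)
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, total):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut point between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        remainder = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_cut = gain, (low + high) / 2
    return best_cut, best_gain
```

Once a threshold is chosen, the attribute behaves like a two-valued categorical attribute at that node, so the same splitting criterion applies to it as to any other attribute.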
Disadvantages of C4.5 Decision Tree
- A small variation in the data can lead to a different decision tree.
- It does not work well on small training data sets.
- It is prone to overfitting.
- Only one attribute at a time is tested for making decisions.
Reference: https://towardsdatascience.com/what-is-the-c4-5-algorithm-and-how-does-it-work-2b971a9e7db0