Decision Tree C4.5 or J48 – Algorithm, Applications, Advantages & Disadvantages
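C4.5 starts from information gain, the reduction in entropy obtained by splitting the training set S on an attribute A: Gain(S, A) = Entropy(S) − Σ_i (|S_i| / |S|) · Entropy(S_i), where S_1, …, S_k are the subsets of S produced by the split. This computation does not, by itself, produce anything new; however, it allows a gain ratio to be measured. The gain ratio normalizes the information gain by the split information of A:
GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A), where SplitInfo(S, A) = −Σ_i (|S_i| / |S|) · log2(|S_i| / |S|)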
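As a minimal sketch, assuming categorical attributes and training rows stored as Python dictionaries, information gain, split information and their ratio can be computed as follows. The function names (`entropy`, `information_gain`, `split_info`, `gain_ratio`) and the toy data are illustrative only and do not come from any library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the class labels by the value each row takes on attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return list(groups.values())


def information_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy of the subsets."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(rows, labels, attr))
    return entropy(labels) - remainder


def split_info(rows, labels, attr):
    """Entropy of the subset sizes themselves (ignores the class labels)."""
    n = len(labels)
    return -sum(len(g) / n * math.log2(len(g) / n) for g in partition(rows, labels, attr))


def gain_ratio(rows, labels, attr):
    si = split_info(rows, labels, attr)
    # A single-valued attribute has zero split information; treat its ratio as 0.
    return information_gain(rows, labels, attr) / si if si > 0 else 0.0


# Hypothetical toy data: "a" is a genuine two-valued attribute, "id" is unique per row.
rows = [
    {"a": "x", "id": 1},
    {"a": "x", "id": 2},
    {"a": "y", "id": 3},
    {"a": "y", "id": 4},
]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, "id"))  # 1.0
print(gain_ratio(rows, labels, "a"))         # 1.0
print(gain_ratio(rows, labels, "id"))        # 0.5
```

Both attributes split the data into pure subsets, so both have an information gain of 1 bit, but `id` has a split information of 2 bits and its gain ratio is halved. This penalty on many-valued attributes is why C4.5 ranks attributes by gain ratio rather than by raw information gain.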
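The C4.5 algorithm builds the tree top-down and recursively:
- If all samples at the current node belong to the same class, or no attributes remain to test, create a leaf labelled with the (majority) class.
- Otherwise, compute the gain ratio of every candidate attribute (for a continuous attribute, at its best threshold).
- Let a_best be the attribute with the highest gain ratio, and create a decision node that splits on a_best.
- Recurse on each subset of samples produced by the split, and attach the resulting subtrees as children of the node.
- Finally, prune the grown tree to reduce over-fitting.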
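The recursive growing procedure can be sketched in a few dozen lines of Python. This is a simplified illustration, not Quinlan's implementation: it assumes categorical attributes only, string class labels, no missing values and no pruning, and the names `build_tree` and `classify` as well as the tree representation (a leaf is a class label, an internal node is an `(attribute, {value: subtree})` tuple) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, divided by its split information."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split if split > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a node of the form (attr, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    # Base cases: the node is pure, or there is nothing left to test.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    # Stop if even the best attribute gives (numerically) no gain.
    if gain_ratio(rows, labels, best) <= 1e-12:
        return majority
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return (best, children)


def classify(tree, row):
    """Follow the tree from the root; assumes every value was seen in training."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree


# Hypothetical toy data: categorical attributes, string class labels.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["no", "no", "yes", "no", "yes"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree[0])                                             # windy
print(classify(tree, {"outlook": "rain", "windy": "no"}))  # yes
```

On this toy data, plain information gain would prefer `outlook` at the root (about 0.571 bits against 0.420 for `windy`), but `outlook` has three values and therefore a larger split information, so the gain ratio selects `windy`.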
Applications of Decision Tree C4.5
In coal logistics customer analysis
For coal logistics customer analysis, a decision tree is built with the C4.5 algorithm and simplified with Pessimistic Error Pruning (PEP), and the extracted rules are applied to the CRM system of a coal logistics road company. The results show that the method can accurately classify customer types.
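The pruning test behind Pessimistic Error Pruning can be sketched as a single function. This follows Quinlan's formulation as it is commonly described: a continuity correction of 1/2 error per leaf, and a margin of one standard error of the subtree's corrected error count. The function name and parameters are hypothetical, and how the coal logistics study applied PEP is not specified here. In Quinlan's description the test is applied to internal nodes from the root downwards, using only the training data.

```python
import math


def pep_should_prune(n_cases, leaf_errors, subtree_errors, n_leaves):
    """Pessimistic Error Pruning test for one internal node (a sketch).

    n_cases:        training cases that reach this node
    leaf_errors:    errors if the node were replaced by a majority-class leaf
    subtree_errors: total training errors made by the subtree's leaves
    n_leaves:       number of leaves in the subtree
    """
    # Continuity correction: add 1/2 error for every leaf.
    corrected_leaf = leaf_errors + 0.5
    corrected_subtree = subtree_errors + 0.5 * n_leaves
    # Binomial standard error of the subtree's corrected error count
    # (clamped at 0 so the square root is always defined).
    variance = corrected_subtree * (n_cases - corrected_subtree) / n_cases
    se = math.sqrt(max(0.0, variance))
    # Prune when a single leaf is no worse than the subtree plus one standard error.
    return corrected_leaf <= corrected_subtree + se


# Hypothetical node: 100 cases, a 4-leaf subtree making 15 training errors.
print(pep_should_prune(100, 20, 15, 4))  # True  (20.5 <= 17 + 3.76)
print(pep_should_prune(100, 30, 15, 4))  # False (30.5 >  17 + 3.76)
```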
In scholarship evaluation
Manual evaluation of educational scholarships tends to focus on the scholarship rank and the number of students rewarded, rather than on analyzing the factors that influence whether a scholarship is awarded. As a result, the evaluation lacks fairness and efficiency. To address this, a higher education scholarship evaluation model has been built on the C4.5 decision tree.
In soil quality grade forecasting
C4.5 has been used to build a soil quality grade prediction model, with soil composition data from Lishu as the training sample. The algorithm expresses the acquired knowledge as quantitative rules. The experimental results show that the knowledge expressed by C4.5 is easy to understand and convenient to apply in practice, that it improves forecasting accuracy, and that it provides a reliable theoretical basis for precision fertilization.
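Expressing a tree as rules amounts to writing one IF-THEN rule per root-to-leaf path. A minimal Python sketch, assuming a tree stored as nested `(attribute, {value: subtree})` tuples with class labels at the leaves; this representation, the function names and the toy attributes are hypothetical. C4.5's own rule generator additionally simplifies the extracted rules, which this sketch omits.

```python
def tree_to_rules(tree, conditions=()):
    """Return one (conditions, label) pair per root-to-leaf path.

    A leaf is a class label; an internal node is (attribute, {value: subtree}).
    """
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]
    attr, children = tree
    rules = []
    for value, subtree in children.items():
        # Each branch adds one "attribute = value" test to the rule's conditions.
        rules.extend(tree_to_rules(subtree, conditions + ((attr, value),)))
    return rules


def format_rule(conditions, label):
    """Render a rule as text; an empty condition list becomes TRUE."""
    test = " AND ".join(f"{attr} = {value}" for attr, value in conditions) or "TRUE"
    return f"IF {test} THEN class = {label}"


# Hypothetical two-level tree over toy attributes.
tree = ("windy", {"yes": "no", "no": ("outlook", {"sunny": "no", "rain": "yes"})})
for conditions, label in tree_to_rules(tree):
    print(format_rule(conditions, label))
# IF windy = yes THEN class = no
# IF windy = no AND outlook = sunny THEN class = no
# IF windy = no AND outlook = rain THEN class = yes
```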
In Cattle Disease Classification
The C4.5 algorithm has also been used for cattle disease classification, where it successfully predicted and classified diseases, allowing affected cattle to be treated without further delay.
In English sentiment classification
The C4.5 decision tree algorithm has been used to classify the semantic polarity (positive, negative, neutral) of English documents. C4.5 is run on 70,000 positive English sentences to generate a decision tree, from which many association rules for the positive polarity are derived; likewise, it is run on 70,000 negative English sentences to generate a decision tree from which association rules for the negative polarity are derived. The sentiment of an English document is then identified from these positive-polarity and negative-polarity rules.
Advantages of C4.5 Decision Tree
- C4.5 is an easy algorithm to implement.
- It handles noisy data, largely through pruning.
- It is not much affected by missing data, since it can handle missing attribute values.
- It can convert the tree into a set of if-then rules.
- It can also deal with continuous attributes.
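C4.5 handles a continuous attribute by testing a binary split of the form value <= t versus value > t. A minimal Python sketch of choosing t, assuming candidate thresholds at the midpoints between adjacent distinct sorted values and ranking them by information gain; the function name is hypothetical, and the original implementation differs in some details.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best split value <= t vs value > t.

    Returns (None, 0.0) if no candidate threshold gives a positive gain.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        t = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical continuous attribute; the input does not need to be sorted.
print(best_threshold([12, 1, 11, 2, 10, 3], ["yes", "no", "yes", "no", "yes", "no"]))
# (6.5, 1.0)
```

Sorting once and sweeping the split point keeps the number of candidate thresholds linear in the number of distinct values; a production implementation would also update the class counts incrementally instead of recomputing both entropies at every step, as this sketch does for clarity.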
Disadvantages of C4.5 Decision Tree
- A small variation in the data can lead to a very different decision tree.
- It does not work well on small training data sets.
- It is prone to over-fitting.
- Only one attribute at a time is tested at each decision node.
Reference: https://towardsdatascience.com/what-is-the-c4-5-algorithm-and-how-does-it-work-2b971a9e7db0