Decision Tree C4.5 or J48 – Algorithm, Applications, Advantages & Disadvantages

In data mining, C4.5 is a supervised learning algorithm that builds a decision tree; its implementation in Weka is known as J48. The C4.5 or J48 algorithm is used to generate a decision tree or an equivalent set of IF-THEN-ELSE rules. Ross Quinlan proposed C4.5 in 1993 to overcome the limitations of the ID3 algorithm. At each node of the tree, C4.5 chooses the attribute that most effectively splits the node's samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (the reduction in entropy, scaled by how finely the attribute partitions the data) that results from choosing an attribute to split on. The attribute with the highest normalized information gain is chosen to make the decision.

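C4.5 first computes the information gain of each candidate attribute. This computation does not, in itself, produce anything new compared with ID3, but it is needed to measure the gain ratio. The gain ratio of an attribute A is its information gain divided by its split information, that is, the entropy of the partition that A induces on the samples:

GainRatio(A) = Gain(A) / SplitInfo(A)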
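The gain ratio can be made concrete with a small Python sketch. It assumes a single categorical attribute with no missing values; the function names `entropy` and `gain_ratio` and the toy data are illustrative only and do not come from any particular library.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of labels or attribute values."""
    total = len(items)
    return -sum(n / total * math.log2(n / total) for n in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio obtained by splitting `labels` on the categorical `values`."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: parent entropy minus the weighted entropy of the subsets.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the attribute values themselves.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0

# Toy data (hypothetical): both attributes separate the classes perfectly,
# but the second one has a distinct value for every sample.
labels = ["yes", "yes", "no", "no"]
two_valued = gain_ratio(["a", "a", "b", "b"], labels)
id_like = gain_ratio([1, 2, 3, 4], labels)
```

Both attributes have the same information gain (1 bit), but the identifier-like attribute has twice the split information, so its gain ratio is halved. This is the bias toward many-valued attributes that the normalization is meant to counter.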

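In outline, the C4.5 algorithm builds the tree recursively. If all samples at a node belong to one class, or no attributes remain to test, the node becomes a leaf labeled with the majority class. Otherwise the algorithm selects the attribute with the highest gain ratio, splits the samples on it, and repeats the procedure on each subset.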
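The recursive procedure can be sketched in Python as follows. This is a minimal illustration that assumes categorical attributes only, no missing values, and no pruning, so it is not a full C4.5 implementation. The names `build_tree` and `classify`, the tree representation, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    total = len(items)
    return -sum(n / total * math.log2(n / total) for n in Counter(items).values())

def gain_ratio(rows, labels, attr):
    values = [row[attr] for row in rows]
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy(values)
    return (entropy(labels) - remainder) / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or an (attribute, {value: subtree}) pair."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes are left to test.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Pick the attribute with the highest gain ratio.
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority  # no attribute separates the classes
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (best, branches)

def classify(tree, row):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

# Toy data (hypothetical): "outlook" determines the class, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
prediction = classify(tree, {"outlook": "rain", "windy": "no"})
```

In this sketch a leaf is a plain label and an internal node is a tuple, which keeps the recursion short; `classify` raises a `KeyError` for an attribute value that was not seen during training, a case a real implementation would need to handle.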

Applications of Decision Tree C4.5

In the analysis of coal logistics customers

For coal logistics customer analysis, a decision tree is built with the C4.5 algorithm and simplified with Pessimistic Error Pruning (PEP). The extracted rules are then applied to the CRM of a coal logistics road company, which shows that the method can accurately classify the types of customers.

In Scholarship evaluation 

Manual evaluation of educational scholarships tends to revolve around the rank of the scholarship and the number of students who are rewarded, rather than analyzing the factors that influence who earns the scholarship. As a result, the evaluation can lack fairness and efficiency. To address this, a higher education scholarship evaluation model has been built on the C4.5 decision tree.

In soil quality grade forecasting model 

C4.5 has been used to establish a soil quality grade prediction model, with soil composition data from Lishu as the training sample. The C4.5 algorithm also expresses the acquired knowledge as quantitative rules. The experimental results show that the knowledge expressed by the C4.5 algorithm is easy to understand and convenient for practical application, that it improves forecasting accuracy, and that it provides a reliable basis for precision fertilization.

In Cattle Disease Classification 

The C4.5 algorithm has also been used successfully to predict and classify diseases in cattle, so that the animals can be treated accordingly without further delay.

In English emotional classification 

The C4.5 decision tree algorithm has been used to classify the sentiment (positive, negative, neutral) of English documents. Applied to 70,000 English positive sentences, the C4.5 algorithm generates a decision tree, from which many association rules of positive polarity are derived. Applied in the same way to 70,000 English negative sentences, it generates a second decision tree, from which many association rules of negative polarity are derived. The sentiment of an English document is then identified from the association rules of positive and negative polarity.

Advantages of C4.5 Decision Tree

  1. C4.5 is an easy algorithm to implement.
  2. It tolerates noisy data.
  3. It is not much affected by missing data.
  4. It can convert the tree to rules.
  5. It can also deal with continuous attributes.
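The handling of continuous attributes can be illustrated with a short Python sketch that turns a numeric attribute into a binary test of the form `value <= threshold`. It assumes candidate thresholds at the midpoints between consecutive distinct sorted values and scores them by plain information gain; C4.5 itself differs in some details of threshold selection. The function name `best_threshold` and the toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal values
        threshold = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weighted entropy of the two sides of the candidate split.
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Toy data (hypothetical): the classes separate cleanly between 65 and 70.
values = [64, 65, 70, 80, 85]
labels = ["no", "no", "yes", "yes", "yes"]
threshold, gain = best_threshold(values, labels)
```

Here the chosen threshold is the midpoint 67.5, where both sides of the split are pure, so the gain equals the entropy of the full label set.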

Disadvantages of C4.5 Decision Tree 

  1. A small variation in the data can lead to a different decision tree.
  2. It does not work well on a small training data set.
  3. It is prone to overfitting.
  4. Only one attribute at a time is tested for making decisions.

