ID3 is a decision tree algorithm. The name is an abbreviation of Iterative Dichotomiser 3. It was developed by J. Ross Quinlan and described in his 1986 paper "Induction of Decision Trees". ID3 generates a decision tree from a fixed set of training examples by employing a top-down, greedy search that tests each attribute at every node of the tree. In the resulting tree, the leaf nodes contain class names and the non-leaf nodes are decision nodes. The tree is then used to classify future samples.

ID3 relies on two mathematical measures: entropy and information gain.

**1. Entropy**

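Entropy is a fundamental measure in information theory that quantifies the uncertainty (impurity) of a set of examples. Let x be our training set, containing positive and negative examples. The entropy of x relative to this binary classification is:

$$\mathrm{Entropy}(x) = -p_{+}\log_2 p_{+} - p_{-}\log_2 p_{-}$$

where $p_{+}$ and $p_{-}$ are the proportions of positive and negative examples in x, and $0 \log_2 0$ is taken to be 0. Entropy is 0 when all examples belong to one class and 1 when the two classes are equally represented.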
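As a minimal sketch, the entropy of a labelled set can be computed from its class frequencies. The function name `entropy` and the use of `"+"` and `"-"` as labels are illustrative choices, not part of any particular library. The sketch generalizes to more than two classes by summing over every class present.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    # Classes with zero count never appear in the Counter,
    # so the 0 * log2(0) case does not arise.
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


print(entropy(["+", "-"]))                     # evenly mixed set: 1.0
print(entropy(["+", "+", "+"]))                # pure set: 0.0
print(round(entropy(["+"] * 9 + ["-"] * 5), 3))  # 9 positive, 5 negative: 0.94
```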

**2. Information Gain**

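In multivariate calculus, a partial derivative isolates the effect of one variable while the others are held fixed. Information theory offers a loosely similar idea: information gain isolates the contribution of one attribute by measuring how much the entropy of the original population drops when the data is split on that attribute. For training set x and its attribute y, the formula of Information Gain is:

$$\mathrm{Gain}(x, y) = \mathrm{Entropy}(x) - \sum_{v \in \mathrm{Values}(y)} \frac{|x_v|}{|x|}\,\mathrm{Entropy}(x_v)$$

where $\mathrm{Values}(y)$ is the set of values that attribute y takes and $x_v$ is the subset of x in which y has the value v.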
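The following sketch computes information gain for one attribute, assuming each training example is a Python dict that maps attribute names to categorical values. The names `information_gain`, `rows`, `outlook`, `windy`, and `play` are hypothetical and chosen only for this illustration.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the whole set minus the weighted entropy after the split."""
    # Group the class labels by the value each row has for `attribute`.
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


# Toy data (made up for illustration): "outlook" separates the classes
# perfectly, while "windy" tells us nothing about "play".
rows = [
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "rain", "windy": True, "play": "yes"},
    {"outlook": "rain", "windy": False, "play": "yes"},
]

print(information_gain(rows, "outlook", "play"))  # 1.0
print(information_gain(rows, "windy", "play"))    # 0.0
```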

**Steps of ID3 Algorithm**

1. Calculate Entropy for the dataset.

2. For each attribute/feature:

a. Calculate the entropy for each of its categorical values.

b. Calculate the information gain for the feature.

3. Find the feature with the maximum information gain and split the dataset on it.

4. Repeat on each resulting subset until every subset contains a single class or no attributes remain.
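The steps above can be sketched as a short recursive function. This is a simplified illustration rather than a reference implementation: it assumes categorical attributes, examples stored as dicts, a nested-dict tree representation, and a majority vote when attributes run out. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def id3(rows, attributes, target):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    # Stop when the node is pure or no attributes are left (majority vote).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: pick the attribute with the maximum information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    # Step 4: repeat on the subset belonging to each value of that attribute.
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining, target)
    return {best: branches}


def classify(tree, sample):
    """Walk the tree; raises KeyError for attribute values unseen in training."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[sample[attribute]]
    return tree


rows = [
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "rain", "windy": True, "play": "yes"},
    {"outlook": "rain", "windy": False, "play": "yes"},
]
tree = id3(rows, ["outlook", "windy"], "play")
print(tree)
print(classify(tree, {"outlook": "rain", "windy": True}))  # yes
```

On this toy data the tree has a single decision node on `outlook`, because that attribute alone separates the two classes.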

**Applications of ID3 Algorithm**

**In Information Assets Identification**

At each node, the algorithm selects the attribute with the largest information gain as the test attribute. This minimizes the amount of information still needed to classify the data and reflects the principle of minimum randomness. ID3 has been applied to recognizing the value of information assets: the resulting tree yields asset-value identification rules, which provide important support for information security risk evaluations.

**In Soil Quality Grade Forecasting Model**

ID3 has been used to establish a soil quality grade prediction model, with soil composition data from Lishu serving as the training sample. The algorithm expresses the acquired knowledge as quantitative rules. The experimental results indicate that knowledge expressed by ID3 is easy to understand and convenient for practical application, that it improves forecasting accuracy, and that it provides a reliable theoretical basis for precision fertilization.

**In Cattle Disease Classification**

ID3 has also been used successfully to predict and classify diseases in cattle, so that the animals can be treated accordingly without further delay.

**In Scholarship Evaluation**

Manual evaluation of educational scholarships tends to revolve around the rank of the scholarship and the number of students rewarded, rather than analyzing the factors that influence scholarship achievement. As a result, the evaluation lacks fairness and efficiency. To address this, a higher education scholarship evaluation model has been built on the ID3 decision tree.

**In an Analysis of Coal Logistics Customers**

For coal logistics customer analysis, a decision tree was built with the C4.5 algorithm (the successor to ID3) and simplified with Pessimistic Error Pruning (PEP). The extracted rules were then applied to the CRM of a coal logistics road company, which showed that the method can accurately classify the types of customers.

**Advantages of ID3 Algorithm**

1. Understandable prediction rules are created from the training data.

2. It builds a tree quickly.

3. Only enough attributes need to be tested until all data is classified.

4. The whole dataset is searched to create a tree.

5. It easily handles irrelevant attributes.

**Disadvantages of ID3 Algorithm**

1. If only a small sample is used for training, the tree may overfit (over-classify) the data.

2. Only one attribute at a time is tested for making a decision.

3. Does not handle streaming data easily.

4. Classifying continuous data may be computationally expensive, as many trees must be generated to see where to break the continuum.
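To illustrate where the cost of continuous data comes from, the sketch below tries every candidate break point for one numeric attribute and keeps the one with the highest information gain. Using midpoints between adjacent distinct values is a common convention in ID3's successors rather than part of ID3 as described here, and the name `best_threshold` and the sample data are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split of a numeric attribute."""
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    # One candidate per gap between adjacent distinct values; each candidate
    # requires a full pass over the data, which is where the cost comes from.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best


values = [1.0, 2.0, 3.0, 4.0]
labels = ["no", "no", "yes", "yes"]
print(best_threshold(values, labels))  # (2.5, 1.0)
```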

