CART Algorithm – Applications, Advantages & Disadvantages

CART stands for Classification And Regression Trees. It is a widely used technique in machine learning. It is a predictive model in which the value of an outcome variable is predicted from the values of other variables. The output is a decision tree in which each fork is a split on a predictor variable and each terminal node contains a prediction for the outcome variable.

The CART or Classification & Regression Trees methodology was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone.

The structure of the CART algorithm can be described as a sequence of questions, where the answer to each question determines what the next question will be, if there is one. The CART algorithm can be applied to both regression and classification.

The main elements of CART (and any decision tree algorithm) are: 

  1. Rules for splitting data at a node based on the value of one variable;
  2. Stopping rules for deciding when a branch is terminal and can be split no more; and
  3. A prediction for the target variable in each terminal node.

The CART algorithm works via the following process: 

  1. Find the best split point for each input.
  2. From the best split points found in Step 1, identify the overall “best” split point.
  3. Split the chosen input at the “best” split point.
  4. Continue splitting until a stopping rule is satisfied or no further desirable split is available.
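Steps 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a numeric target (regression) and scores each candidate split by the sum of squared errors, a common choice that the text above does not specify. The function names and the toy data are made up for the example, and the recursion of Step 4 is left out.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_for_input(xs, ys):
    """Step 1: best threshold for one input, as (threshold, cost), or None."""
    best = None
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


def best_split(rows, ys):
    """Steps 2 and 3: choose the input whose best split has the lowest cost."""
    best = None
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        candidate = best_split_for_input(column, ys)
        if candidate is not None and (best is None or candidate[1] < best[2]):
            best = (j, candidate[0], candidate[1])
    return best  # (input index, threshold, cost)


# Toy data (hypothetical): two inputs per row, one numeric target.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
ys = [10.0, 10.0, 20.0, 20.0]

feature, threshold, cost = best_split(rows, ys)
print(feature, threshold, cost)  # 0 2.5 0.0
```

Here the first input separates the two target values perfectly at 2.5, so it wins over every split on the second input. A full tree builder would call `best_split` again on each of the two resulting subsets until a stopping rule applies.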

The algorithm uses impurity measures to quantify the purity of the nodes. A node is deemed “pure” when its target values or categories are homogeneous, so further splits are undesirable.
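As an illustration of an impurity measure, the sketch below computes the Gini impurity of a node, which is one common choice for classification trees. The text does not say which measure is meant, so treat Gini as an assumption; the function names and labels are made up for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def is_pure(labels):
    """A node is pure when all of its labels belong to one class."""
    return len(set(labels)) <= 1


pure_node = ["yes", "yes", "yes"]
mixed_node = ["yes", "no", "yes", "no"]

print(gini(pure_node), is_pure(pure_node))    # 0.0 True
print(gini(mixed_node), is_pure(mixed_node))  # 0.5 False
```

A pure node scores 0, and an evenly mixed two-class node scores 0.5, the maximum for two classes.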

Applications of CART Algorithm 

CART for Quick Data Insights

In this application, a CART model is used to find the relationship between defective transactions and “amount,” “channel,” “service type,” “customer category” and “department involved.” After the model is built, the Cp (complexity parameter) value is checked across the levels of the tree to find the optimum level, at which the relative error is at its minimum. The optimum Cp value is then used to prune the tree.
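The selection step can be sketched as follows. This assumes the relative error is measured on held-out or cross-validated data, since error on the training data only decreases as the tree grows. The table layout, the field names and every number in it are hypothetical, made up purely to show the mechanics.

```python
def pick_cp(cp_table):
    """Return the row with the smallest relative error.

    Ties are broken in favour of the larger Cp, i.e. the smaller tree.
    """
    return min(cp_table, key=lambda row: (row["rel_error"], -row["cp"]))


# Hypothetical Cp table: one row per candidate tree size.
cp_table = [
    {"cp": 0.200, "n_splits": 0, "rel_error": 1.00},
    {"cp": 0.050, "n_splits": 1, "rel_error": 0.80},
    {"cp": 0.010, "n_splits": 3, "rel_error": 0.62},
    {"cp": 0.001, "n_splits": 7, "rel_error": 0.70},
]

best = pick_cp(cp_table)
print(best["cp"], best["n_splits"])  # 0.01 3
```

The tree would then be pruned back to the size that corresponds to the chosen Cp value, here the tree with three splits.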

CART in Blood Donors Classification 

In this study, the CART decision tree algorithm was used as implemented in Weka. Numerical experiments on the UCI ML blood transfusion data, with the enhancements, helped to classify donors. The CART-derived model, along with the extended definition for identifying regular voluntary donors, provided a model with good classification accuracy.

CART for Environmental and Ecological Data

The CART algorithm is adapted to the case of spatially dependent samples, focusing on environmental and ecological applications. Two approaches are considered. The first one takes into account the irregularity of the sampling by weighting the data according to their spatial pattern using two existing methods based on Voronoi tessellation and regular grid, and one original method based on kriging. The second one uses spatial estimates of the quantities involved in the construction of the discriminant rule at each step of the algorithm. These methods are tested on simulations and on a classical dataset to highlight their advantages and drawbacks. They are then applied to an ecological data set to explore the relationship between pollen data and presence/absence of tree species, which is an important question for climate reconstruction based on paleoecological data.

CART in Psychiatric Services

CART was used to identify potential high users of services among low-income psychiatric outpatients. Sociodemographic variables, clinical variables (e.g., psychiatric diagnosis and type of presenting complaint), source of referral, and the most recent psychiatric treatment setting used were studied. Discharge from inpatient psychiatric treatment right before admission to outpatient psychiatric treatment was found to be the most consistent, the most powerful, and the only necessary predictor of high use of outpatient psychiatric services. 

CART in the Financial Sector

In financial applications, the main idea is that the learning sample is continually replenished with new observations. This gives the CART tree an important ability to adjust to the current situation in the market. Many banks use the Basel II credit scoring system to classify companies into risk levels, based on a group of coefficients and indicators. That approach, by contrast, requires continuous correction of all indicators and coefficients in order to adjust to market changes.

Advantages of CART Algorithm 

  1. CART requires minimal supervision and produces easy-to-understand models.
  2. It focuses on finding interactions and signal discontinuities.
  3. It finds important variables automatically.
  4. It is invariant to monotone transformations of predictors.
  5. It can use any combination of continuous and discrete variables.
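The invariance to monotone transformations can be checked with a small experiment. A split on one predictor depends only on the order of its values, so a strictly increasing transformation such as a logarithm leaves the chosen partition unchanged. The sketch below assumes Gini impurity as the split criterion; the function names and data are made up for the example.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_partition(xs, labels):
    """Return the sample indices on the left side of the best split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_cost, best_left = None, None
    for k in range(1, len(order)):
        # Two equal values cannot be separated by a threshold.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        left, right = order[:k], order[k:]
        # Weighted average impurity of the two child nodes.
        cost = (len(left) * gini([labels[i] for i in left])
                + len(right) * gini([labels[i] for i in right])) / len(order)
        if best_cost is None or cost < best_cost:
            best_cost, best_left = cost, frozenset(left)
    return best_left


xs = [1.0, 10.0, 100.0, 1000.0, 10000.0]
labels = ["a", "a", "b", "b", "b"]
log_xs = [math.log10(x) for x in xs]

same = best_partition(xs, labels) == best_partition(log_xs, labels)
print(same)  # True
```

The threshold value itself changes under the transformation, but the samples sent to each side of the split do not, so the resulting tree makes the same predictions.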

Disadvantages of CART Algorithm

  1. It does not use combinations of variables: each split is based on the value of a single variable.
  2. The tree structure may be unstable.
  3. It has a limited number of positions to accommodate available predictors.

