Classification tree model is a data mining technique that can select and prioritizes from among a large number of independent variables that are most important in determining the outcome of the dependent variable to be explained. The resulting model also show how the importance of an independent variable are affected by other independent variables. Marcom uses CART and CHAID algorithm to implement classification tree analysis.
The model is particular useful in segmentation study to group respondents into homogenous subgroups in terms of how they respond to different marketing mix.
What you need to give us?
You send us the data file. Tell us which value of the dependent variable you want to predict and optionally which variables most likely will affect the outcome of the dependent variable. In the Titanic example, "Y" is the value in the "survived" variable we want to predict.
What we give you back?
(A) A Tree Model indicating segments of people who are more likely to answer the predicted value. The tree diagram representing a classification system or a predictive model.
(B) Gain chart
the gain table of the Titanic case will indicates that:
a female passenger in 1st class will have 301% higher chance of survival than the average.
the first 4 nodes in total only account for 24.3 of the total sample, but has 52.5% of the total survived rather than just 24.3% of the total survived.
69.9% of the pasangers in the first 4 nodes have survived while only 32.3% of the total passengers have survived.
(C) Prioritize or rank independent variables by importance
A list of independent variables which are found to have significant effect on the outcome of the dependent variable are ranked in order of importance in their predictive power. For example, Sex is found to be the most factor in determining a passenager would survive or not.
(D) IF - THEN rules to predict the survival segments
In the Titanic case,
Segment/node 5 is ‘survived’ segment, and the rule is if female and 1st, then 97% survived.
Segment/node 6 is ‘survived’ segment, and the rule is If female and (2nd or crew), then 88% survived
From these rules and the tree model, we will observe that "first class, women and children first" policy and the fact that policy was not entirely successful in saving the women and children in the third class - are reflected
(E) A misclassification table indicating accuracy of the model
In the Titanic case, our generated tree model is correct in prediction 78% of the time.
How long does it take?
After clarification of the dependent and independent variables of interest, we take about 2 working days to finish your work and send you back the result.
Asian B2B Expert
China | Hong Kong | Japan | Korea | Taiwan | Singapore | India | Australia
Suppose you have a data file containing cases of passengers on the Titanic. In this example, "survived" is the variable of interest so it is called the dependent or target variable. The variables that may affect the dependent variable (sex, age, class) are known as independent variables.
Our analysis goal is:
How can we use our existing information to learn what variables would provide us with an indication of which persons are most likely to survive?