Abstract—Cancer datasets contain a large number of gene expression values but only a limited number of samples. Classifying these datasets is a challenging task for researchers because of their high dimensionality and large size. Extracting predictive features that allow accurate classification requires choosing an appropriate classification algorithm. In addition to the feature selection capability embedded in the classifiers, a separate feature selection method can improve classification accuracy on cancer datasets. Decision tree classifiers are good candidates for this purpose. In this paper, we use the AdaBoost boosting algorithm together with decision tree classifiers and evaluate their performance on several cancer datasets that differ in size and in number of features (genes). A previously proposed feature selection method is used alongside several conventional feature selection methods to obtain predictive features for classification, and the effect of these methods on classification accuracy is evaluated. The time taken to build the model is also analyzed for the decision tree induction classifiers and for the boosting algorithm.
Index Terms—Boosting algorithm, cancer datasets, classification, data mining, decision tree induction.
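The pipeline outlined in the abstract (feature selection, then boosted decision trees, with the model build time recorded) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy expression matrix, the signal-to-noise ranking used as the feature filter, the use of depth-1 trees (stumps) as the base learner, and all function names. It is not the feature selection method or the classifier implementation evaluated in the paper.

```python
import math
import time

def select_top_features(X, y, k):
    """Illustrative filter: rank genes by a signal-to-noise score, keep the top k."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        sd_pos = math.sqrt(sum((v - mean_pos) ** 2 for v in pos) / len(pos))
        sd_neg = math.sqrt(sum((v - mean_neg) ** 2 for v in neg) / len(neg))
        scores.append((abs(mean_pos - mean_neg) / (sd_pos + sd_neg + 1e-12), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def fit_stump(X, y, w, features):
    """Exhaustively search for the depth-1 tree with the lowest weighted error."""
    best = None
    for j in features:
        for threshold in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > threshold else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, threshold, sign)
    return best

def stump_predict(stump, row):
    _, j, threshold, sign = stump
    return sign if row[j] > threshold else -sign

def adaboost_fit(X, y, features, rounds=10):
    """Discrete AdaBoost for labels in {-1, +1} with stumps as weak learners."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w, features)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: 6 samples (rows) by 4 genes (columns); only gene 0 is informative.
X = [
    [5.1, 0.2, 1.0, 3.3],
    [4.8, 0.9, 1.1, 3.1],
    [5.5, 0.4, 0.9, 3.4],
    [1.2, 0.3, 1.0, 3.2],
    [0.9, 0.8, 1.2, 3.3],
    [1.5, 0.5, 0.8, 3.0],
]
y = [1, 1, 1, -1, -1, -1]

selected = select_top_features(X, y, k=2)
start = time.perf_counter()
ensemble = adaboost_fit(X, y, selected, rounds=5)
build_seconds = time.perf_counter() - start  # time to build the model
predictions = [adaboost_predict(ensemble, row) for row in X]
```

On real microarray data the number of genes is in the thousands, so restricting the stump search to the selected features is what keeps the build time manageable in a sketch like this; a held-out or cross-validated accuracy estimate would replace the training-set predictions shown here.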
Abid Hasan is with the Department of Computer Science and Information Technology, Islamic University of Technology, Gazipur 1704, Bangladesh (e-mail: email@example.com).
Cite: Abid Hasan, "Evaluation of Decision Tree Classifiers and Boosting Algorithm for Classifying High Dimensional Cancer Datasets," International Journal of Modeling and Optimization, vol. 2, no. 2, pp. 92-96, 2012.