Use significant attributes only, or use the full set of attributes to build a J48 model after checking information gain?
Weka's J48 allows one to check information gain on the full set of attributes. Should I use only the significant attributes to build the model, or should I use the full set of attributes?
In data mining, there is a multi-way trade-off between the number of features you use, your accuracy, and the time it takes to generate a model. In theory, you'd want to include every possible feature to boost accuracy; however, going about data mining in this way guarantees lengthy model generation times. Further, models that produce textual decision trees, like J48, aren't as useful when the tree has thousands of nodes.
Depending on how many features you start out with, you may want to remove the features that don't provide a large enough information gain. If you have a small number of features to begin with (e.g. fewer than 20), it might make sense to just keep all of them.
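If you do wish to limit the number of features you use, it would be best to choose those with the highest information gain. It would also be worthwhile to look into things like principal component analysis (which can be done through Weka) as another way to reduce the number of features.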
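To make the ranking idea concrete, here is a minimal sketch of how information gain can be computed for nominal attributes and then used to keep either the attributes above a cutoff or the top few. It is plain Python, not Weka's API; the toy data, the attribute names, the 0.05 cutoff, and the helper names (`entropy`, `information_gain`, `rank_attributes`) are all assumptions made up for illustration. In Weka itself, the attribute selection tools (for example `InfoGainAttributeEval` together with the `Ranker` search) should give you a comparable ranking without writing any code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Class entropy minus the weighted class entropy after
    splitting the rows on one nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Return (attribute name, gain) pairs, highest gain first."""
    gains = {name: information_gain(values, labels)
             for name, values in columns.items()}
    return sorted(gains.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data set: three nominal attributes, six rows.
columns = {
    "outlook": ["sunny", "sunny", "overcast", "rainy", "rainy", "overcast"],
    "windy":   ["no", "yes", "no", "no", "yes", "yes"],
    "coin":    ["h", "t", "t", "h", "h", "h"],
}
labels = ["no", "no", "yes", "yes", "no", "yes"]

ranked = rank_attributes(columns, labels)

# Option 1: drop attributes whose gain falls below an (assumed) cutoff.
MIN_GAIN = 0.05
kept = [name for name, gain in ranked if gain >= MIN_GAIN]

# Option 2: keep only the k attributes with the highest gain.
top_k = [name for name, _ in ranked[:1]]

for name, gain in ranked:
    print(f"{name}: {gain:.3f}")
```

Whichever option you pick, it is worth rebuilding the J48 model on the reduced attribute set and comparing its accuracy and tree size against the model built on the full set, since a good cutoff depends on the data.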