apache spark - Spark MLlib MulticlassMetrics.confusionMatrix() and precision() seem to give contradictory results
Hi, I am new to machine learning and Spark MLlib. I have created a random forest classifier model using RandomForest.trainClassifier(). My training data set is categorical in nature, and the response/target variable is Actionable/NoActionable. I have created the predictionAndLables RDD using the test data and model.predict().

I am trying the following to validate the model's accuracy.

    MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLables.rdd());
    System.out.println(metrics.precision());        // prints 0.94334140435
    System.out.println(metrics.confusionMatrix());  // prints the following:
    // 1948.0  0.0
    // 117.0   0.0

Now, if I look at the model accuracy printed by the precision() method, it seems to be around 94%. But if I look at the confusion matrix above, something seems wrong: I have 1948 NonActionable target variables and 117 Actionable target variables in the test data set. According to the confusion matrix, the model predicts NonActionable correctly but does not predict the Actionable variables at all.

Please help me understand the confusion matrix and why the precision is 94%. The results look contradictory. Please guide me. Thanks in advance.

Imagine the 117 actionable rows are glued to 500 nonactionable ones. The classifier can move all 617 to the actionable column and get the 500 nonactionable ones wrong, or it can move them to the nonactionable column and get the 117 wrong. Unless you tell it that getting the 117 actionable rows wrong is worse than getting the 500 nonactionable rows wrong, it will do the latter.

Figure out how to balance the problem (fake up more actionable items, subsample the nonactionable ones, weight the actionable items more heavily, etc.) and work on more features to weaken the "glue" (make actionable and nonactionable rows look as different as possible to the classifier).
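
To see why the two numbers do not actually contradict each other, it may help to redo the arithmetic by hand. The sketch below is plain Java, not the Spark API; the class and method names are made up for illustration. It assumes the matrix has actual classes in rows and predicted classes in columns, with index 0 = nonactionable and index 1 = actionable. As far as I know, the no-argument precision() in older MLlib versions reports the overall fraction of correct predictions (what later versions expose as accuracy()), which is why it comes out at about 0.943 here, while the per-class numbers for the actionable class are zero.

```java
/**
 * Plain-Java arithmetic on the 2x2 confusion matrix from the question.
 * Assumption: rows are actual classes, columns are predicted classes,
 * index 0 = nonactionable, index 1 = actionable.
 * The names here are hypothetical and are not part of Spark MLlib.
 */
final class ConfusionMatrixCheck {

    /** Overall accuracy: the diagonal (correct rows) divided by all rows. */
    static double accuracy(double[][] cm) {
        double correct = 0.0;
        double total = 0.0;
        for (int i = 0; i < cm.length; i++) {
            for (int j = 0; j < cm[i].length; j++) {
                total += cm[i][j];
                if (i == j) {
                    correct += cm[i][j];
                }
            }
        }
        return total == 0.0 ? 0.0 : correct / total;
    }

    /** Recall for one class: correct predictions over the actual row total. */
    static double recall(double[][] cm, int label) {
        double rowTotal = 0.0;
        for (double v : cm[label]) {
            rowTotal += v;
        }
        // Returning 0 when the class never occurs is a convention chosen here.
        return rowTotal == 0.0 ? 0.0 : cm[label][label] / rowTotal;
    }

    /** Precision for one class: correct predictions over the predicted column total. */
    static double precision(double[][] cm, int label) {
        double colTotal = 0.0;
        for (double[] row : cm) {
            colTotal += row[label];
        }
        // Returning 0 when the class is never predicted is a convention chosen here.
        return colTotal == 0.0 ? 0.0 : cm[label][label] / colTotal;
    }

    public static void main(String[] args) {
        double[][] cm = {
            {1948.0, 0.0},  // actual nonactionable: all predicted nonactionable
            {117.0, 0.0}    // actual actionable: all predicted nonactionable
        };
        System.out.println("overall accuracy      = " + accuracy(cm));
        System.out.println("recall (nonactionable) = " + recall(cm, 0));
        System.out.println("recall (actionable)    = " + recall(cm, 1));
        System.out.println("precision (actionable) = " + precision(cm, 1));
    }
}
```

The overall figure is 1948 / (1948 + 117), so a model that always answers "nonactionable" already scores about 94% on this test set. MulticlassMetrics also has per-label precision(label) and recall(label) methods, which are the ones to look at for the actionable class.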