apache spark - Spark MLlib MulticlassMetrics.confusionMatrix() and precision() seem to give contradictory results

- April 15, 2010

Hi, I am new to machine learning and Spark MLlib. I have created a random forest classifier model using RandomForest.trainClassifier(). My training data set is categorical in nature and has the response/target variables Actionable/NoActionable. I have created a predictionAndLables RDD using the test data and model.predict(), and I am trying the following to validate the model's accuracy.

    import org.apache.spark.mllib.evaluation.MulticlassMetrics;

    MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLables.rdd());
    System.out.println(metrics.precision());       // prints 0.94334140435
    System.out.println(metrics.confusionMatrix()); // prints the following:
    // 1948.0  0.0
    // 117.0   0.0

Now, if I look at the model accuracy printed by the precision() method, it seems to be around 94%. But if I look at the confusion matrix above, something seems wrong: I have 1948 NonActionable target variables and 117 Actionable target variables in the test data set. According to the confusion matrix, the model predicts NonActionable correctly and does not predict the Actionable variables at all. Please help me understand the confusion matrix and why the precision is 94%. The results seem to contradict each other. Please guide me. Thanks in advance.
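The two printed results can be reconciled by hand. The sketch below is plain Java with no Spark dependency; it assumes the matrix rows are actual classes and the columns are predicted classes (the layout MLlib documents), and that index 0 is NonActionable and index 1 is Actionable, which is a guess about the label encoding. The class and method names are hypothetical.

```java
/** Hand computation of metrics from a confusion matrix (rows = actual, columns = predicted). */
final class ConfusionMatrixCheck {

    // Overall accuracy: correctly classified rows (the diagonal) divided by all rows.
    static double accuracy(double[][] m) {
        double correct = 0.0;
        double total = 0.0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) {
                    correct += m[i][j];
                }
            }
        }
        return total == 0.0 ? 0.0 : correct / total;
    }

    // Recall for one class: diagonal entry divided by that class's row total (actual count).
    static double recall(double[][] m, int label) {
        double rowSum = 0.0;
        for (double v : m[label]) {
            rowSum += v;
        }
        return rowSum == 0.0 ? 0.0 : m[label][label] / rowSum;
    }

    // Precision for one class: diagonal entry divided by that class's column total (predicted count).
    // Returns 0.0 when the class was never predicted, to avoid dividing by zero.
    static double precision(double[][] m, int label) {
        double colSum = 0.0;
        for (double[] row : m) {
            colSum += row[label];
        }
        return colSum == 0.0 ? 0.0 : m[label][label] / colSum;
    }

    public static void main(String[] args) {
        // The matrix from the question; index 0 = NonActionable, 1 = Actionable (assumed).
        double[][] m = { {1948.0, 0.0}, {117.0, 0.0} };
        System.out.println("overall accuracy     = " + accuracy(m));
        System.out.println("recall (Actionable)  = " + recall(m, 1));
        System.out.println("recall (NonAction.)  = " + recall(m, 0));
    }
}
```

Under these assumptions the overall figure is 1948 / (1948 + 117), which matches the 0.9433 printed by the no-argument precision(), while recall for the Actionable class is 0. In other words, the overall number is dominated by the majority class, so the two outputs agree. MulticlassMetrics also has per-label overloads, precision(label) and recall(label), which report the per-class figures directly.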

Imagine the 117 actionable rows are glued to 500 nonactionable ones. The classifier can move all 617 into the actionable column and get the 500 nonactionable ones wrong, or it can move them all into the nonactionable column and get 117 wrong. Unless you tell it that getting 117 actionable rows wrong is worse than getting 500 nonactionable rows wrong, it will do the latter. Figure out how to balance the problem (fake out more actionable items, subsample the nonactionable ones, weight the actionable items more heavily, etc.) and work on more features to weaken the "glue" (make actionable and nonactionable as different as possible to the classifier).
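One of the balancing options mentioned above, subsampling the nonactionable rows, can be sketched in plain Java as follows. This is only an illustration under stated assumptions: labels are encoded as integers, the majority label is known in advance, and a 1:1 ratio is the target. The class and method names are hypothetical and not part of Spark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative undersampling of the majority class to a 1:1 ratio. */
final class ClassBalancer {

    // Returns the row indices to keep: every minority row, plus a random
    // subset of majority rows of the same size as the minority set.
    static List<Integer> balancedIndices(int[] labels, int majorityLabel, long seed) {
        List<Integer> majority = new ArrayList<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == majorityLabel) {
                majority.add(i);
            } else {
                kept.add(i); // minority rows are always kept
            }
        }
        // Shuffle with a fixed seed so the sample is reproducible.
        Collections.shuffle(majority, new Random(seed));
        int n = Math.min(majority.size(), kept.size());
        kept.addAll(majority.subList(0, n));
        return kept;
    }

    public static void main(String[] args) {
        // Same class counts as the test set in the question: 1948 vs 117.
        int[] labels = new int[2065];
        for (int i = 0; i < 117; i++) {
            labels[i] = 1; // 1 = actionable (assumed encoding)
        }
        List<Integer> kept = balancedIndices(labels, 0, 42L);
        System.out.println("rows kept: " + kept.size());
    }
}
```

Balancing would normally be applied to the training data only, with the test set left at its natural class ratio so the reported metrics still reflect real-world proportions. Discarding majority rows also throws away information, so oversampling or class weighting may be preferable when data is scarce.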
