R - Scatter plot for a multiclass dataset with class imbalance and class overlapping
I'm using Weka to develop a classifier for detecting semantic relations. Let's suppose I have a multiclass dataset. The dataset, at first, contains 4 numeric features (there could be more than 4) and a class attribute. A valid class attribute value is "hypernym", "synonym" or "no", i.e., 3 classes. So, examples of instances would be:
    feat1  feat2  feat3  feat4  class
    ....
    0.32   0.45   0.15   5      no
    0.26   0.48   0.93   20     hyper
    0.65   0.32   0.43   13     no
    0.43   0.19   0.89   45     syn
    ...
This is a typical classification problem. However, we must consider that the dataset is afflicted by the class imbalance problem (the problem in machine learning where the total number of instances of one class of data (positive) is far less than the total number of instances of another class of data (negative)) and by class overlapping (examples of different classes have very similar characteristics).
The question is: how can I represent each instance in a 2D graph, in a way that lets me visualize the degree of overlapping between the classes?
I have found a picture that illustrates a possible example of such a graph, a scatter plot. However, I don't know how to plot this.
Is there an easy way to make a similar figure, in R or using Weka?
You can use multidimensional scaling (MDS) first to reduce the dimension of the data, and then plot it. This method tries to preserve the distances between points when projecting them to a lower dimension.
Here is an example in R with the iris dataset:
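    data <- iris
    # One integer color code per class (the iris column is named Species)
    colors <- as.integer(as.factor(data$Species))
    # Pairwise Euclidean distances on the 4 numeric features
    d <- dist(data[, 1:4])
    fit <- cmdscale(d, k = 2)  # k is the resulting dimension
    x <- fit[, 1]
    y <- fit[, 2]
    plot(x, y, xlab = "Coordinate 1", ylab = "Coordinate 2",
         main = "MDS", pch = 19, col = colors)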
Or reduce to 3 dimensions and plot using the scatterplot3d library.
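    library(scatterplot3d)  # install.packages("scatterplot3d") if it is missing
    # d and colors are the objects computed in the 2D example
    fit <- cmdscale(d, k = 3)  # k is the resulting dimension
    x <- fit[, 1]
    y <- fit[, 2]
    z <- fit[, 3]
    scatterplot3d(x, y, z, color = colors, pch = 19)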
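One thing worth considering for the dataset in the question: feat4 (values like 5 to 45) is on a much larger scale than the other features (values between 0 and 1), so plain Euclidean distances would be dominated by it. A minimal sketch, assuming that standardizing the features is acceptable for your data, is to call scale() before dist(). The sketch also asks cmdscale() for its goodness-of-fit values (eig = TRUE), which give a rough idea of how much of the distance structure the 2D picture keeps. It still uses iris as a stand-in; the names features, d_scaled, coords and gof are only illustrative.

```r
# Sketch: standardize the features before MDS and check how faithful the 2D map is.
data <- iris
features <- scale(data[, 1:4])   # center each column and divide by its standard deviation
d_scaled <- dist(features)       # Euclidean distances on the standardized features

# With eig = TRUE, cmdscale returns a list (points, eig, GOF, ...) instead of a matrix
fit <- cmdscale(d_scaled, k = 2, eig = TRUE)
coords <- fit$points             # n x 2 matrix of coordinates
gof <- fit$GOF                   # two goodness-of-fit values; closer to 1 is better

colors <- as.integer(data$Species)
plot(coords[, 1], coords[, 2],
     xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = sprintf("MDS on standardized features (GOF = %.2f)", gof[1]),
     pch = 19, col = colors)
legend("topright", legend = levels(data$Species),
       col = seq_along(levels(data$Species)), pch = 19)
```

If the goodness of fit is low, overlap seen in the plot may partly be an effect of the projection rather than of the classes themselves.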
About the class imbalance problem, I don't know how to represent it in a scatter plot. Maybe by increasing the size of the points of the minority classes.
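A rough sketch of that idea, not a definitive recipe: draw each point with a size (cex) that grows as its class gets rarer, and show the class counts in the legend. Since iris is balanced, the imbalance here is created artificially by keeping only some rows per class; the row selection in keep and the square-root scaling are assumptions made for illustration.

```r
# Sketch: make minority classes stand out by scaling point size with class rarity.
# The imbalance is artificial: iris is balanced, so only some rows per class are kept.
keep <- c(1:50, 51:70, 101:108)   # 50 setosa, 20 versicolor, 8 virginica
data <- iris[keep, ]

class_counts <- table(data$Species)   # number of instances per class
# The majority class gets size 1 and rarer classes get larger points;
# the square root keeps the largest points from swamping the plot.
size_per_class <- sqrt(max(class_counts) / class_counts)
point_size <- as.numeric(size_per_class[as.character(data$Species)])

fit <- cmdscale(dist(data[, 1:4]), k = 2)
colors <- as.integer(data$Species)
plot(fit[, 1], fit[, 2],
     xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "MDS, point size by class rarity",
     pch = 19, col = colors, cex = point_size)
legend("topright",
       legend = paste0(names(class_counts), " (n = ", class_counts, ")"),
       col = seq_along(class_counts), pch = 19)
```

Larger points can hide smaller ones, so with heavy overlap it may be better to plot the majority class first or to use semi-transparent colors.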