scala - How to do distributed Principal Components Analysis + Kmeans using Apache Spark?


I need to run principal components analysis and k-means clustering on a large-ish dataset (around 10 GB) spread out over many files. I want to use Apache Spark, since it's known to be fast and distributed.

I know that Spark supports PCA, and also PCA followed by k-means.

However, I haven't found an example that demonstrates how to do this over many files in a distributed manner.
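For concreteness, here is a minimal sketch of the kind of job I have in mind, using the DataFrame-based `spark.ml` API. It is not a definitive implementation. The input path, the CSV layout (a header row, all columns numeric, no missing values), and the values of `k` for both PCA and k-means are assumptions made purely for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.{PCA, VectorAssembler}
import org.apache.spark.sql.SparkSession

object PcaKMeansSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PcaKMeansSketch").getOrCreate()

    // Hypothetical path. A glob (or a directory) makes Spark read all matching
    // files as a single DataFrame partitioned across the cluster.
    // Assumption: CSV with a header row, every column numeric, no nulls.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///path/to/data/*.csv")

    // Pack all numeric columns into the single vector column spark.ml expects.
    val assembler = new VectorAssembler()
      .setInputCols(raw.columns)
      .setOutputCol("features")

    // Project onto the top principal components.
    // Assumption: k = 10; it must not exceed the number of input columns.
    val pca = new PCA()
      .setInputCol("features")
      .setOutputCol("pcaFeatures")
      .setK(10)

    // Cluster in the reduced space. Assumption: 5 clusters.
    val kmeans = new KMeans()
      .setFeaturesCol("pcaFeatures")
      .setPredictionCol("cluster")
      .setK(5)
      .setSeed(1L)

    // Chain the stages so fitting and scoring run as one distributed job.
    val pipeline = new Pipeline().setStages(Array(assembler, pca, kmeans))
    val model = pipeline.fit(raw)

    model.transform(raw).select("pcaFeatures", "cluster").show(5)

    spark.stop()
  }
}
```

Packaged as a jar, something like this would be launched with `spark-submit`. PCA does not standardize its input, so if the columns are on very different scales, a `StandardScaler` stage before the PCA stage may be worth adding.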

