scala - How to do distributed Principal Components Analysis + K-means using Apache Spark?


I need to run Principal Components Analysis and k-means clustering on a large-ish dataset (around 10 GB) spread out over many files. I want to use Apache Spark since it's known to be fast and distributed.

I know Spark supports PCA, and PCA + k-means.

However, I haven't found an example that demonstrates how to do this over many files in a distributed manner.
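For reference, a minimal sketch of one way to do this with Spark's RDD-based MLlib API, assuming each input line is a comma-separated row of numeric features; the input path glob, delimiter, number of principal components, and cluster count are all placeholder assumptions:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.clustering.KMeans

object PcaKMeansExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PCA + KMeans")
    val sc = new SparkContext(conf)

    // textFile accepts a glob, so all matching files are read as a single
    // distributed RDD; the path below is only a placeholder.
    val data = sc.textFile("hdfs:///data/input/part-*")

    // Assumption: each line is a comma-separated list of numeric values.
    val rows = data.map { line =>
      Vectors.dense(line.split(',').map(_.toDouble))
    }.cache()

    // Distributed PCA: compute the top 10 principal components and
    // project every row onto them.
    val mat = new RowMatrix(rows)
    val pc = mat.computePrincipalComponents(10)   // local n-by-10 matrix
    val projected = mat.multiply(pc)              // distributed RowMatrix

    // k-means on the reduced representation (5 clusters, 20 iterations).
    val model = KMeans.train(projected.rows, 5, 20)

    model.clusterCenters.foreach(println)
    sc.stop()
  }
}

Because the RDD is partitioned across the files, both the PCA projection and the k-means iterations run in parallel across the cluster; only the principal-component matrix itself is collected to the driver.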

