scala - How to do distributed Principal Components Analysis + KMeans using Apache Spark?
I need to run principal components analysis and k-means clustering on a large-ish dataset (around 10 GB) spread out over many files. I want to use Apache Spark since it's known to be fast and distributed.
I know Spark supports PCA, and also PCA + KMeans.
However, I haven't found an example that demonstrates how to do this with many files in a distributed manner.
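For reference, here is a minimal sketch of what such a job might look like with Spark's DataFrame-based ML API (`org.apache.spark.ml`). It is not a tested solution for this dataset. It assumes the files are headerless CSVs with only numeric columns and no missing values. The input path, the object name `PcaKMeansSketch`, the helper `buildPipeline`, and the values of `pcaK` and `clusters` are all hypothetical placeholders. Passing a directory or glob to `spark.read.csv` lets Spark split the files across executors, so nothing is loaded onto the driver.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.{PCA, StandardScaler, VectorAssembler}
import org.apache.spark.sql.SparkSession

object PcaKMeansSketch {

  // Builds an (unfitted) pipeline: assemble columns -> standardize -> PCA -> KMeans.
  // pcaK must not exceed the number of feature columns.
  def buildPipeline(featureCols: Array[String], pcaK: Int, clusters: Int): Pipeline = {
    // Pack the numeric columns into a single vector column.
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("rawFeatures")

    // Center and scale each feature so PCA is not dominated by large-valued columns.
    val scaler = new StandardScaler()
      .setInputCol("rawFeatures")
      .setOutputCol("scaledFeatures")
      .setWithMean(true)
      .setWithStd(true)

    // Project onto the top pcaK principal components.
    val pca = new PCA()
      .setInputCol("scaledFeatures")
      .setOutputCol("pcaFeatures")
      .setK(pcaK)

    // Cluster the reduced vectors.
    val kmeans = new KMeans()
      .setFeaturesCol("pcaFeatures")
      .setPredictionCol("cluster")
      .setK(clusters)
      .setSeed(42L)

    new Pipeline().setStages(Array(assembler, scaler, pca, kmeans))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PcaKMeansSketch").getOrCreate()

    // Hypothetical location: a glob matching all input files.
    val inputPath = args.headOption.getOrElse("hdfs:///data/points/*.csv")

    // inferSchema costs an extra pass over the data; an explicit schema avoids it.
    val raw = spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv(inputPath)

    // Placeholder values: 10 components, 5 clusters.
    val model = buildPipeline(raw.columns, pcaK = 10, clusters = 5).fit(raw)

    model.transform(raw).select("pcaFeatures", "cluster").show(5)

    spark.stop()
  }
}
```

Standardizing before PCA is a design choice rather than a requirement; drop the `StandardScaler` stage if the features are already on comparable scales. The job would typically be packaged and launched with `spark-submit`, with the input path as the first argument.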