hadoop - How to efficiently find top-k elements?
I have a big SequenceFile storing the tf-idf values of documents. Each line represents a document, and the columns are the tf-idf values of each term (each row is a sparse vector). I'd like to pick the top-k words for each document using Hadoop. The naive solution is to loop through all the columns of each row in the mapper and pick the top-k, but as the file becomes bigger and bigger I don't think this is a good solution. Is there a better way to do this in Hadoop?
1. In every mapper, calculate a top-k (this is the local top-k for each map task).
2. Spawn a single reducer; the local top-k lists from all the mappers flow into that one reducer, and hence the global top-k is evaluated.
Think of the problem this way:
1. You have been given the results of x number of horse races.
2. You need to find the top N fastest horses.
Each race already gives you a local ranking, so to find the overall fastest horses you only need to compare each race's top N against each other, not every horse that ever ran. A sketch of this pattern follows.
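Below is a minimal sketch of that pattern in plain Hadoop MapReduce, assuming the sparse vectors have already been flattened into text lines of the form term<TAB>tfidf. The class names, the topk.k configuration key, and the text input format are assumptions for illustration, not from the question. Each mapper keeps a size-k min-heap and emits its local top-k in cleanup(); a single reducer merges those candidates into the global top-k.

    // Hypothetical sketch: local top-k per mapper, global top-k in one reducer.
    import java.io.IOException;
    import java.util.PriorityQueue;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TopK {
      // Heap entry ordered by tf-idf; a min-heap of size k keeps the k largest
      // values seen so far by always evicting the current minimum.
      static class Entry implements Comparable<Entry> {
        final String term; final double tfidf;
        Entry(String term, double tfidf) { this.term = term; this.tfidf = tfidf; }
        public int compareTo(Entry o) { return Double.compare(tfidf, o.tfidf); }
      }

      public static class TopKMapper
          extends Mapper<LongWritable, Text, DoubleWritable, Text> {
        private final PriorityQueue<Entry> heap = new PriorityQueue<>();
        private int k;

        protected void setup(Context ctx) {
          k = ctx.getConfiguration().getInt("topk.k", 10);
        }

        protected void map(LongWritable key, Text value, Context ctx) {
          // Assumed input line format: term<TAB>tfidf
          String[] parts = value.toString().split("\t");
          heap.add(new Entry(parts[0], Double.parseDouble(parts[1])));
          if (heap.size() > k) heap.poll();   // drop the current minimum
        }

        // Emit only the local top-k once the whole input split has been seen.
        protected void cleanup(Context ctx) throws IOException, InterruptedException {
          for (Entry e : heap) {
            ctx.write(new DoubleWritable(e.tfidf), new Text(e.term));
          }
        }
      }

      public static class TopKReducer
          extends Reducer<DoubleWritable, Text, Text, DoubleWritable> {
        private final PriorityQueue<Entry> heap = new PriorityQueue<>();
        private int k;

        protected void setup(Context ctx) {
          k = ctx.getConfiguration().getInt("topk.k", 10);
        }

        protected void reduce(DoubleWritable score, Iterable<Text> terms, Context ctx) {
          // Merge all mapper candidates into one global size-k heap.
          for (Text t : terms) {
            heap.add(new Entry(t.toString(), score.get()));
            if (heap.size() > k) heap.poll();
          }
        }

        protected void cleanup(Context ctx) throws IOException, InterruptedException {
          for (Entry e : heap) {
            ctx.write(new Text(e.term), new DoubleWritable(e.tfidf));
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("topk.k", 10);
        Job job = Job.getInstance(conf, "global top-k");
        job.setJarByClass(TopK.class);
        job.setMapperClass(TopKMapper.class);
        job.setReducerClass(TopKReducer.class);
        job.setNumReduceTasks(1);             // single reducer => global top-k
        job.setMapOutputKeyClass(DoubleWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Setting the number of reduce tasks to 1 is safe here because each mapper forwards at most k records, so the single reducer only sees (number of map tasks) × k candidates rather than the whole file.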