In the last post we saw how to write a MapReduce program for finding the top-n items of a dataset. The code in the mapper emits a pair key-value for every word found, passing the word as the key and 1 as the value. Since the book has roughly 38,000 w…
1.2MapReduce 和 HDFS 是如何工作的 MapReduce 其实是两部分,先是 Map 过程,然后是 Reduce 过程.从词频计算来说,假设某个文件块里的一行文字是”Thisis a small cat. That is a smalldog.”,那么,Map 过程会对这一行进行处理,将每个单词从句子解析出来,依次生成形如<“this”,1>, <”is”, 1>, <”a”, 1>, <”small”, 1>,<”cat”, 1>…