8 months ago
deusx : Tom White: "Disks have become tapes" - "In essence MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the transfer rate of the disk. Contrast this to accessing data from a relational database that operates at the seek rate of the disk"
Matthew M. Boedicker : thinking of disks as a sequential device rather than a random access device - (via reddit)
11 months ago
nelson : MapReduce whitepaper - Nice summary, link in comments to full paper. Google's apparently crunching the equivalent of 130,000 computers full time.
14 months ago
Jeremy Zawodny : Yahoo's Doug Cutting on MapReduce and the Future of Hadoop - "In this special InfoQ interview Cutting discusses how Hadoop is used at Yahoo, the challenges of its development, and the future direction of the project."
nelson : Hadoop interview - Doug Cutting is one of the smartest programmers I know
19 months ago
Jeremy Zawodny : Yahoo Pig and Google Sawzall - "I have to say, it is good to see Yahoo building these kinds of tools for large scale data manipulation."
nelson : Yahoo Pig and Google Sawzall - massively parallel data crunching platforms
28 months ago
kellan : Google: "one trillion words from public Web pages." - note to self, revisit Hadoop
Paul Hammond : Official Google Research Blog: All Our N-gram are Belong to You - "We processed 1,011,582,453,213 words of running text and are publishing the counts for all 1,146,580,664 five-word sequences that appear at least 40 times"
joshua : Official Google Research Blog: All Our N-gram are Belong to You - i wish this wasn't $150