blogmarks.net

Tom White: "Disks have become tapes"

8 months ago

deusx : Tom White: "Disks have become tapes" - "In essence MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the transfer rate of the disk. Contrast this to accessing data from a relational database that operates at the seek rate of the disk"

Matthew M. Boedicker : thinking of disks as sequential devices rather than random-access devices - (via reddit)

Tags : disks mapreduce scaling tapes
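The "sorting and merging" idea in the quote above can be sketched in a few lines. This is only an illustration, not Hadoop's or Google's implementation: the function names (`map_words`, `sorted_runs`, `reduce_counts`, `word_count`) and the `run_size` parameter are hypothetical, and in-memory lists stand in for the sorted runs a real system would stream to and from disk.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def sorted_runs(pairs, run_size):
    """Cut the pair stream into fixed-size chunks and sort each one.

    Assumption: each sorted chunk stands in for a run written
    sequentially to disk; here the runs simply stay in memory.
    """
    run = []
    for pair in pairs:
        run.append(pair)
        if len(run) == run_size:
            yield sorted(run)
            run = []
    if run:
        yield sorted(run)


def reduce_counts(merged):
    """Reduce step: sum the counts of adjacent pairs sharing a key."""
    for word, group in groupby(merged, key=itemgetter(0)):
        yield (word, sum(n for _, n in group))


def word_count(lines, run_size=4):
    runs = list(sorted_runs(map_words(lines), run_size))
    # heapq.merge reads every run front to back exactly once,
    # which is the sequential ("tape-like") access pattern.
    merged = heapq.merge(*runs)
    return dict(reduce_counts(merged))


counts = word_count(["disks have become tapes", "tapes stream disks seek"])
```

No step looks up an individual record by key; every pass reads its input in order, which is why the work is bounded by transfer rate rather than seek rate.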


MapReduce whitepaper

11 months ago

nelson : MapReduce whitepaper - Nice summary, link in comments to full paper. Google's apparently crunching the equivalent of 130,000 computers full time.

Tags : google mapreduce systems technology


Yahoo's Doug Cutting on MapReduce and the Future of Hadoop

14 months ago

Jeremy Zawodny : Yahoo's Doug Cutting on MapReduce and the Future of Hadoop - "In this special InfoQ interview Cutting discusses how Hadoop is used at Yahoo, the challenges of its development, and the future direction of the project."

nelson : Hadoop interview - Doug Cutting is one of the smartest programmers I know

Tags : links cluster code cutting distributed grid hadoop lucene mapreduce opensource scalability via:zawodny yahoo


Yahoo Pig and Google Sawzall

19 months ago

Jeremy Zawodny : Yahoo Pig and Google Sawzall - "I have to say, it is good to see Yahoo building these kinds of tools for large scale data manipulation."

nelson : Yahoo Pig and Google Sawzall - massively parallel data crunching platforms

Tags : links api google grid mapreduce pig programming sawzall yahoo


Google: "one trillion words from public Web pages."

28 months ago

kellan : Google: "one trillion words from public Web pages." - note to self, revisit Hadoop

Paul Hammond : Official Google Research Blog: All Our N-gram are Belong to You - We processed 1,011,582,453,213 words of running text and are publishing the counts for all 1,146,580,664 five-word sequences that appear at least 40 times

joshua : Official Google Research Blog: All Our N-gram are Belong to You - i wish this wasn't $150

Tags : public web google research ngram mapreduce big.numbers data ir
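The counting described above (every five-word sequence that appears at least 40 times) can be sketched on a toy scale. This is a minimal single-machine illustration, not Google's pipeline: the function names `ngram_counts` and `frequent_ngrams` are hypothetical, and a trillion-word corpus would need a distributed job rather than an in-memory `Counter`.

```python
from collections import Counter


def ngram_counts(words, n=5):
    """Count every run of n consecutive words in the list."""
    return Counter(
        tuple(words[i:i + n]) for i in range(len(words) - n + 1)
    )


def frequent_ngrams(words, n=5, min_count=40):
    """Keep only the n-grams seen at least min_count times."""
    return {
        gram: count
        for gram, count in ngram_counts(words, n).items()
        if count >= min_count
    }


# Toy input: the same five words repeated three times.
words = "a b c d e".split() * 3
kept = frequent_ngrams(words, n=5, min_count=3)
```

The defaults `n=5` and `min_count=40` mirror the figures quoted in the entry; the toy call lowers the threshold so that the small input yields a result.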
