3 months ago
Jeremy Zawodny : Hadoop at Twitter (part 1): Splittable LZO Compression - LZO sounds pretty interesting
nelson : LZO + Hadoop - Great article about using compression to make distributed computation faster
6 months ago
bmilleare : CloudCrowd - If Carlsberg made worker/job queue servers...
Simon Willison : cloud-crowd - cloud-crowd. New parallel processing worker/job queue system with a strikingly elegant architecture. The central server is an HTTP server that manages job requests, which are farmed out to a number of node HTTP servers which fork off worker processes to
10 months ago
nelson : Hadoop petabyte sort - Hadoop sorts a petabyte in 16 hours. Compare Google's 6 hour petabyte sort, but then that's not open source
24 months ago
deusx : Tom White: "Disks have become tapes" - "In essence MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the transfer rate of the disk. Contrast this to accessing data from a relational database that operates at the seek rate of the disk"
Matthew M. Boedicker : thinking of disks as a sequential device rather than a random access device - (via reddit)
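A note on the "sorting and merging data that is streamed" idea quoted above: the pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Hadoop implements it. The names `sorted_runs` and `merge_runs` and the sample records are hypothetical, and in-memory lists stand in for the on-disk runs a real system would stream.

```python
import heapq

def sorted_runs(records, run_size):
    """Split records into fixed-size runs and sort each run on its own.

    In a real system each sorted run would be written out sequentially;
    here the runs are plain in-memory lists (an assumption for brevity).
    """
    for start in range(0, len(records), run_size):
        yield sorted(records[start:start + run_size])

def merge_runs(runs):
    """Merge already-sorted runs in a single sequential pass.

    heapq.merge reads each run front to back, so no run is ever accessed
    at a random position.
    """
    return list(heapq.merge(*runs))

# Hypothetical sample data.
records = [42, 7, 19, 3, 88, 1, 56, 23]
runs = list(sorted_runs(records, run_size=3))
merged = merge_runs(runs)
```

Every run is only ever read from front to back, which is the sense in which the disk is treated as a sequential device rather than a random access one.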
26 months ago
nelson : MapReduce whitepaper - Nice summary, link in comments to full paper. Google's apparently crunching the equivalent of 130,000 computers full time.
30 months ago
Jeremy Zawodny : Yahoo's Doug Cutting on MapReduce and the Future of Hadoop - "In this special InfoQ interview Cutting discusses how Hadoop is used at Yahoo, the challenges of its development, and the future direction of the project."
nelson : Hadoop interview - Doug Cutting is one of the smartest programmers I know
35 months ago
Jeremy Zawodny : Yahoo Pig and Google Sawzall - "I have to say, it is good to see Yahoo building these kinds of tools for large scale data manipulation."
nelson : Yahoo Pig and Google Sawzall - massively parallel data crunching platforms
44 months ago
kellan : Google: "one trillion words from public Web pages." - note to self, revisit Hadoop
Paul Hammond : Official Google Research Blog: All Our N-gram are Belong to You - We processed 1,011,582,453,213 words of running text and are publishing the counts for all 1,146,580,664 five-word sequences that appear at least 40 times
joshua : Official Google Research Blog: All Our N-gram are Belong to You - i wish this wasn't $150
44 months ago
Paul Hammond : Joel on Software - Can Your Programming Language Do This? - By abstracting away the very concept of looping, you can implement looping any way you want
deusx : Can Your Programming Language Do This? - Joel on Software - "I hope you're convinced, by now, that programming languages with first-class functions let you find more opportunities for abstraction, which means your code is smaller, tighter, more reusable, and more scalable."
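For a concrete feel of "abstracting away the very concept of looping", here is a minimal Python sketch. It is not taken from the Joel on Software article; the helper name `map_reduce` and the sample values are assumptions made for illustration. The caller passes functions as arguments and never writes a loop, so the helper could change how it iterates without the caller noticing.

```python
from functools import reduce

def map_reduce(items, mapper, reducer, initial):
    """Apply mapper to every item, then fold the results with reducer.

    The caller supplies only functions; how the iteration happens
    (sequentially here, an assumption of this sketch) stays hidden.
    """
    mapped = map(mapper, items)
    return reduce(reducer, mapped, initial)

# Sum of squares with no explicit loop in the calling code.
total_squares = map_reduce(
    [1, 2, 3, 4],
    mapper=lambda x: x * x,
    reducer=lambda acc, value: acc + value,
    initial=0,
)
```

Because `mapper` and `reducer` are ordinary values, the same helper works for any pair of functions, which is the kind of reuse the quote describes.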