Some highlights:
- Number of links between pages in the index: roughly 1 trillion links
- Size of output: over 300 TB, compressed
- Number of cores used to run a single Map-Reduce job: over 10,000
- Raw disk used in the production cluster: over 5 Petabytes
- Hadoop has allowed us to run the same processing we ran pre-Hadoop, on the same cluster, in 66% of the time our previous system took, while also simplifying administration.
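To make the Map-Reduce model behind these numbers concrete, here is a minimal sketch of a link-counting job in plain Python. The map phase emits a `(target, 1)` pair for every link, and the reduce phase sums counts per target page. The toy corpus and function names are hypothetical, purely for illustration; the real job ran across thousands of cores on Hadoop, not in a single process.

```python
from collections import defaultdict

# Hypothetical toy corpus: page -> pages it links to.
pages = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html", "b.html"],
}

def map_phase(pages):
    """Map step: emit (target, 1) for every outgoing link."""
    for source, links in pages.items():
        for target in links:
            yield (target, 1)

def reduce_phase(pairs):
    """Reduce step: sum counts per key (the shuffle is implicit here)."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

inlink_counts = reduce_phase(map_phase(pages))
print(inlink_counts)  # {'b.html': 2, 'c.html': 2, 'a.html': 1}
```

In a real Hadoop job the map and reduce functions would be distributed across the cluster, with the framework handling the shuffle, sorting, and fault tolerance between the two phases.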
Update - added a few more links:
Jeremy Zawodny blog (Yahoo!) http://jeremy.zawodny.com/blog/archives/009992.html
Interview with Doug Cutting (InfoQ) http://www.infoq.com/articles/hadoop-interview