Archive for November, 2007

The New York Times Uses Amazon S3 To Store Archive

Thursday, November 1st, 2007

The New York Times (NYT) has just completed an interesting project.

They decided to make all their public domain articles from 1851-1922 available free of charge.

These articles are all in the form of images scanned from original editions of the paper.

To do this, they had to upload the scanned images of the 11 million articles to Amazon S3. That was about 4 TB (terabytes) of data.

Then they used Amazon’s EC2 “rent some seriously powerful servers” service to create PDF files of the 11 million articles using specially written software.

If you are familiar with Amazon’s EC2 service, you might like to know it took 24 hours using 100 EC2 instances to create the 11 million PDFs.

When finished, the 11 million articles generated 1.5TB of data to store in Amazon S3.
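To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this post (11 million articles, 100 instances, 24 hours, about 4 TB in and 1.5 TB out). The function name job_stats is made up for illustration, and treating 1 TB as 10**12 bytes is an assumption, so take the results as approximate.

```python
# Back-of-the-envelope figures from the numbers quoted in this post.
# Assumption: 1 TB is treated as 10**12 bytes (decimal), and the
# "about 4 TB" and "24 hours" figures are taken at face value.

ARTICLES = 11_000_000          # articles converted to PDF
INSTANCES = 100                # server instances used for the job
HOURS = 24                     # wall-clock time of the job
INPUT_BYTES = 4 * 10**12       # ~4 TB of scanned images uploaded
OUTPUT_BYTES = 15 * 10**11     # 1.5 TB of generated PDFs


def job_stats():
    """Return rough throughput and average file sizes for the job."""
    instance_hours = INSTANCES * HOURS
    per_instance_hour = ARTICLES / instance_hours
    return {
        "instance_hours": instance_hours,
        "articles_per_instance_hour": per_instance_hour,
        "articles_per_instance_second": per_instance_hour / 3600,
        "avg_scan_kb": INPUT_BYTES / ARTICLES / 1000,
        "avg_pdf_kb": OUTPUT_BYTES / ARTICLES / 1000,
    }


if __name__ == "__main__":
    for name, value in job_stats().items():
        print(f"{name}: {value:,.2f}")
```

On these assumptions the job works out to 2,400 instance-hours, which is a little over one article per second on each instance.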

And that’s where it now sits, waiting to be searched by anybody via the main NYT website at http://www.nytimes.com

Just look for the Search Box and select "NYT Archive 1851 - 1980" from the drop-down list!

…yet another great example of using the power of Amazon S3 in business!

Have you got your software to start using Amazon S3 yet?

Visit: http://www.databucketpro.com

Regards

Marc Liron - Microsoft MVP