
Poster: FamousLongAgo Date: Jul 14, 2003 2:00am

Forum: researchproposals Subject: Research Proposal from FamousLongAgo

Which collection would you like to work with: fullweb
Name: Maciej Ceglowski
Organization: NITLE
Email: maciej@ceglowski.com
Project name: NITLE Blog Census
Abstract: The blog census (http://www.blogcensus.net) is an attempt to identify and archive all weblogs on the Net. We currently have 614K blogs in our list and take a full snapshot of our database (including HTML for all blogs in the census) every twelve days or so. The crawl has been active since May 2003, and the earliest snapshot we still have was taken June 28. If this material is of interest to the Internet Archive, we would like to donate it on an ongoing basis. This would also ensure that the data is not lost (we lost our first snapshot, from June 10, to a weird RAID 5 error).
Description: The full database snapshot is about 3 GB compressed (12 GB uncompressed). Metadata includes:

* crawl date
* language (identified from content)
* blogging tool used
* URL
* full HTML
* outbound links
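
The fields listed above could be carried in a record along these lines. This is only a sketch: the class name `BlogSnapshotRecord`, the field names, the types, and the example values are all assumptions for illustration, not the census's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class BlogSnapshotRecord:
    """Hypothetical record for one blog in one snapshot (all names assumed)."""
    crawl_date: date        # when the blog was fetched
    language: str           # identified from content
    blogging_tool: str      # blogging tool used
    url: str                # address of the blog
    html: str               # full HTML as crawled
    outbound_links: List[str] = field(default_factory=list)


# Example with made-up values
record = BlogSnapshotRecord(
    crawl_date=date(2003, 6, 28),
    language="en",
    blogging_tool="Movable Type",
    url="http://example.com/blog/",
    html="<html><body>...</body></html>",
    outbound_links=["http://example.org/"],
)
```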

Since the number of blogs is growing rapidly, a conservative estimate is 12 GB/month of compressed data.
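
As a rough check on that figure, the numbers quoted above (a 3 GB compressed snapshot roughly every twelve days) give about 7.5 GB/month at the current size, so 12 GB/month leaves some room for growth. The sketch below just does that arithmetic; the function name and the 30-day month are assumptions for illustration.

```python
def monthly_compressed_gb(snapshot_gb: float, interval_days: float,
                          days_per_month: float = 30.0) -> float:
    """Compressed data produced per month if the snapshot size stays fixed."""
    snapshots_per_month = days_per_month / interval_days
    return snapshot_gb * snapshots_per_month


# Figures from the proposal: 3 GB compressed, one snapshot every ~12 days
current = monthly_compressed_gb(snapshot_gb=3.0, interval_days=12.0)
print(f"{current:.1f} GB/month at the current snapshot size")

# How much growth the 12 GB/month estimate allows for
print(f"{12.0 / current:.1f}x headroom")
```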
We're working on setting up diffs or some other way of reducing the storage requirements.
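
One way such diffs might work, offered purely as a sketch and not as the approach the census uses: hash each blog's HTML and keep only the pages whose hash changed since the previous snapshot, plus a list of blogs that disappeared. The function names, the URL-to-HTML dictionary layout, and the choice of SHA-1 are all assumptions.

```python
import hashlib
from typing import Dict


def content_hash(html: str) -> str:
    """Fingerprint a page so unchanged pages can be detected cheaply."""
    return hashlib.sha1(html.encode("utf-8")).hexdigest()


def snapshot_delta(previous: Dict[str, str], current: Dict[str, str]) -> dict:
    """Return what is needed to rebuild `current` from `previous`."""
    prev_hashes = {url: content_hash(html) for url, html in previous.items()}
    # New or modified pages: the hash is missing or differs from last time
    changed = {
        url: html
        for url, html in current.items()
        if prev_hashes.get(url) != content_hash(html)
    }
    # Pages present last time but absent now
    removed = sorted(set(previous) - set(current))
    return {"changed": changed, "removed": removed}


def apply_delta(previous: Dict[str, str], delta: dict) -> Dict[str, str]:
    """Rebuild the newer snapshot from the older one plus the delta."""
    rebuilt = {u: h for u, h in previous.items() if u not in delta["removed"]}
    rebuilt.update(delta["changed"])
    return rebuilt
```

Only the `changed` pages and the `removed` list would need to be stored per snapshot; how much space that saves depends on how many blogs actually change between crawls, which the proposal does not say.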
