November 14, 2002 10:33:51am
Research Proposal from Pedro Domingos
University of Washington
Email:
Predicting the Evolution of the World Wide Web
Abstract:
We use algorithms for mining massive data streams to build models that predict how many links new Web pages will accrue and how their PageRanks will vary as a result. We evaluate a wide variety of predictive features and select the best ones to use in our models.
Description:
The data we need is a series of snapshots of the Web. By comparing the first two snapshots, we find which pages are new in the second one. By comparing the second and third snapshots, we build models of how pages' inlinks and PageRanks vary over time. We test our models by making predictions for the fourth snapshot. With more snapshots, we can build longer-range models.
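A minimal sketch of the snapshot comparisons described above, assuming each snapshot is available as a mapping from a page's URL to the list of URLs it links to. This representation, the function names, and the toy data are illustrative assumptions, not part of the proposal; the predictive models themselves are not shown.

```python
from collections import Counter


def new_pages(earlier, later):
    """Return URLs present in the later snapshot but not in the earlier one."""
    return set(later) - set(earlier)


def inlink_counts(snapshot):
    """Count inlinks per URL, ignoring self-links and duplicate links."""
    counts = Counter()
    for source, targets in snapshot.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def inlink_growth(earlier, later, urls):
    """Change in inlink count between two snapshots for the given URLs."""
    before = inlink_counts(earlier)
    after = inlink_counts(later)
    # Counter returns 0 for URLs that have no inlinks in a snapshot.
    return {url: after[url] - before[url] for url in urls}


# Hypothetical toy snapshots: URL -> list of outgoing links.
snapshot_1 = {"a": ["b"], "b": []}
snapshot_2 = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
snapshot_3 = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

# Pages that first appear in the second snapshot.
fresh = new_pages(snapshot_1, snapshot_2)

# How those pages' inlinks changed between the second and third snapshots.
growth = inlink_growth(snapshot_2, snapshot_3, fresh)
```

In this sketch, `growth` would serve as the training target for the new pages found in `fresh`; the same comparison applied to the third and fourth snapshots would give the values against which predictions are tested.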
Our computational needs consist of some simple preprocessing of each page to extract the features we want to download (e.g., frequencies of words in the page).
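As one possible illustration of this preprocessing, the sketch below computes relative word frequencies for a single page. It assumes the page's visible text has already been separated from its markup, and the tokenization rule (lowercased runs of ASCII letters and digits) is an arbitrary choice made here, not one specified in the proposal.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Map each word in the text to its relative frequency."""
    # Assumed tokenization: lowercase runs of ASCII letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return {}
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


# Hypothetical page text used only for illustration.
frequencies = word_frequencies("Web pages link to Web pages")
```

Only the resulting frequency table, rather than the full page, would then need to be kept as a feature vector for that page.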