Poster: | Pedro Domingos | Date: | Nov 14, 2002 10:33am |
Forum: | researchproposals | Subject: | Research Proposal from Pedro Domingos |
Name: Pedro Domingos
Organization: University of Washington
Email: pedrod@cs.washington.edu
Project name: Predicting the Evolution of the World Wide Web
Abstract: We use algorithms for mining massive data streams
to build models that predict how many links new
Web pages are going to accrue, and how their PageRanks will vary as a result. We look at
a wide variety of predictive features, and
select the best ones to use in our models.
Description: The data we need is a series of snapshots of
the Web. By comparing the first two snapshots we find which pages are new in the second one.
By comparing the second and third ones we build
models of how pages' inlinks and PageRanks vary
over time. We test our models by making predictions for the fourth snapshot. With
more snapshots, we can build longer-range models.
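
As a rough illustration of the snapshot comparison described above, here is a minimal sketch. It assumes each snapshot is available as a mapping from a page's URL to the set of URLs it links to; that representation, and the function names new_pages, inlink_counts and inlink_growth, are hypothetical and chosen only for this example. PageRank is left out to keep the sketch short.

```python
from collections import Counter


def new_pages(earlier, later):
    """Return the URLs present in the later snapshot but not the earlier one."""
    return set(later) - set(earlier)


def inlink_counts(snapshot):
    """Count, for every linked-to URL, how many distinct pages link to it."""
    counts = Counter()
    for source, targets in snapshot.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


def inlink_growth(earlier, later, pages):
    """Change in inlink count between two snapshots for the given pages."""
    before = inlink_counts(earlier)
    after = inlink_counts(later)
    return {page: after[page] - before[page] for page in pages}


# Toy snapshots (made-up data): URL -> set of outgoing links.
snap1 = {"a": {"b"}, "b": set()}
snap2 = {"a": {"b", "c"}, "b": set(), "c": {"a"}}
snap3 = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}, "d": {"c"}}

fresh = new_pages(snap1, snap2)             # pages new in the second snapshot
growth = inlink_growth(snap2, snap3, fresh) # their inlink change by the third
```

The growth values would serve as training targets for a model, and the same two steps applied to the third and fourth snapshots would give held-out targets for testing.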
Our computational needs are modest: some simple preprocessing of each page to
extract the features we want to download (e.g., frequencies of words in the page).
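
The kind of per-page preprocessing meant here could look like the following sketch, which counts word occurrences in a page's visible text using only the Python standard library. The tokenization rule (lowercased runs of letters and digits) and the name word_frequencies are assumptions made for illustration, not a fixed part of the proposal.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_frequencies(html):
    """Return a Counter of lowercased word counts for one page's visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z0-9]+", text))


page = "<html><body><p>Web pages link to Web pages.</p><script>var x;</script></body></html>"
freqs = word_frequencies(page)
```

Only the resulting counts, rather than the full page contents, would then need to be transferred for each page.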