Poster: Pedro Domingos Date: Nov 14, 2002 10:33am
Forum: researchproposals Subject: Research Proposal from Pedro Domingos

Which collection would you like to work with: fullweb
Name: Pedro Domingos
Organization: University of Washington
Email: pedrod@cs.washington.edu
Project name: Predicting the Evolution of the World Wide Web
Abstract: We use algorithms for mining massive data streams
to build models that predict how many links new
Web pages are going to accrue, and how their PageRanks will vary as a result. We look at
a wide variety of predictive features, and
select the best ones to use in our models.
Description: The data we need is a series of snapshots of
the Web. By comparing the first two snapshots we find which pages are new in the second one.
By comparing the second and third ones we build
models of how pages' inlinks and PageRanks vary
over time. We test our models by making predictions for the fourth snapshot. With
more snapshots, we can build longer-range models.
Our computation needs consist of some simple preprocessing of each page to extract the features we want to download (e.g., the frequencies of
words in the page).
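
As a rough illustration of the snapshot comparison and per-page preprocessing described above, here is a minimal Python sketch. It is not the project's actual pipeline: the snapshot representation (a dict mapping each page URL to the set of URLs it links to), the toy data, and the function names new_pages, inlink_counts, inlink_growth and word_frequencies are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical toy snapshots: each maps a page URL to the set of URLs it links to.
snapshot_1 = {"a.example": {"b.example"}, "b.example": set()}
snapshot_2 = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"c.example"},
    "c.example": set(),
}
snapshot_3 = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"c.example"},
    "c.example": {"a.example"},
    "d.example": {"c.example"},
}


def new_pages(older, newer):
    """Pages present in the newer snapshot but absent from the older one."""
    return set(newer) - set(older)


def inlink_counts(snapshot):
    """Count incoming links per page, ignoring self-links."""
    counts = Counter({page: 0 for page in snapshot})
    for source, targets in snapshot.items():
        for target in targets:
            if target != source:
                counts[target] += 1
    return counts


def inlink_growth(older, newer, pages):
    """Change in inlink count between two snapshots for the given pages."""
    before = inlink_counts(older)
    after = inlink_counts(newer)
    return {page: after[page] - before[page] for page in pages}


def word_frequencies(text):
    """Simple per-page feature: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


# Pages that first appear in the second snapshot.
fresh = new_pages(snapshot_1, snapshot_2)
# How many inlinks those pages gained between the second and third snapshots.
growth = inlink_growth(snapshot_2, snapshot_3, fresh)
```

In this sketch, the growth values for pages first seen in the second snapshot would serve as training targets, with features such as word_frequencies as inputs. A fourth snapshot would then be held out to test the predictions.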