|
|
Distributed Algorithms for PageRank Computation
PageRank refers to the ranking system used at Google for efficiently listing the search results.
It quantifies the importance or popularity of webpages by examining the link structure of the
entire web. The underlying idea of this system is that pages having incoming links from important
pages should be important as well, see [1]. The main challenge in implementing this algorithm, however,
is the size of the web. It is reported that Google possesses over 8 billion webpage indices, which
poses serious issues in terms of computation. In fact the PageRank computation is performed once a
month and it takes about a week. There has been a line of research in developing efficient
numerical methods for PageRank including, for example, exploiting Markov chain methods,
utilizing techniques from Monte Carlo simulation, applying numerical analysis
methods based on asynchronous iteration. In particular, in [2], [3], [4] a different approach has been
proposed. The approach is randomized-based and distributed, and each page computes its own PageRank
value locally by communicating with the pages that are connected by direct links. That is, each
page exchanges its value with the pages that it links to and those linked to it and then update the
value in a sequential manner to find the true PageRank. Randomization is with respect to the time
instants at which each page decides to initiate the communication for updates. The time
is randomly chosen and is independent among the pages. Hence, there is no need of a fixed order
among the pages or a leader agent that specifies the pages to start updates.
The PageRank toolbar for Google (searching SICE of Japan)
PageRank update based on a decentralized randomized algorithm (Las Vegas-type)
References
-
S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web
Search Engine," Computer Networks & ISDN Systems, Vol. 30, pp. 107–117, 1998.
-
H. Ishii and R. Tempo, "A Distributed Randomized Approach for the
PageRank Computation: Part 1," Proc. 47th IEEE Conference on Decision
and Control, Cancun, Mexico, 2008.
-
H. Ishii and R. Tempo, "A Distributed Randomized Approach for the
PageRank Computation: Part 2," Proc. 47th IEEE Conference on Decision
and Control, Cancun, Mexico, 2008.
-
H. Ishii and R. Tempo, "Computing the PageRank Variation for Fragile
Web Data" SICE Journal on Control, Measurement, and System Integration,
Vol. 2, pp. 1-9, 2009.
|
|