Distributed Algorithms for PageRank Computation

PageRank refers to the ranking system used at Google for efficiently listing the search results. It quantifies the importance or popularity of webpages by examining the link structure of the entire web. The underlying idea of this system is that pages having incoming links from important pages should be important as well, see [1]. The main challenge in implementing this algorithm, however, is the size of the web. It is reported that Google possesses over 8 billion webpage indices, which poses serious issues in terms of computation. In fact the PageRank computation is performed once a month and it takes about a week. There has been a line of research in developing efficient numerical methods for PageRank including, for example, exploiting Markov chain methods, utilizing techniques from Monte Carlo simulation, applying numerical analysis methods based on asynchronous iteration. In particular, in [2], [3], [4] a different approach has been proposed. The approach is randomized-based and distributed, and each page computes its own PageRank value locally by communicating with the pages that are connected by direct links. That is, each page exchanges its value with the pages that it links to and those linked to it and then update the value in a sequential manner to find the true PageRank. Randomization is with respect to the time instants at which each page decides to initiate the communication for updates. The time is randomly chosen and is independent among the pages. Hence, there is no need of a fixed order among the pages or a leader agent that specifies the pages to start updates.



The PageRank toolbar for Google (searching SICE of Japan)




PageRank update based on a decentralized randomized algorithm (Las Vegas-type)


References

  1. S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Computer Networks & ISDN Systems, Vol. 30, pp. 107–117, 1998.


  2. H. Ishii and R. Tempo, "A Distributed Randomized Approach for the PageRank Computation: Part 1," Proc. 47th IEEE Conference on Decision and Control, Cancun, Mexico, 2008.


  3. H. Ishii and R. Tempo, "A Distributed Randomized Approach for the PageRank Computation: Part 2," Proc. 47th IEEE Conference on Decision and Control, Cancun, Mexico, 2008.


  4. H. Ishii and R. Tempo, "Computing the PageRank Variation for Fragile Web Data" SICE Journal on Control, Measurement, and System Integration, Vol. 2, pp. 1-9, 2009.


 back