Authoritative sources in a hyperlinked environment

Published: 01 September 1999

Abstract

The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.

Reviews

Lynda Hardman

Searching for relevant information on the World Wide Web can be very much a hit-or-miss experience. To improve matters, Kleinberg first introduces the notion of broad-topic queries, where the user is interested in information on a particular topic and a standard text search may produce hundreds or thousands of hits of uncertain relevance. He then introduces the notions of hubs and authorities, where an authority is a page linked to by many hubs, and a hub is a page linking to many authorities. While this at first seems to be a circular definition, he presents a computationally inexpensive algorithm that is able to identify hubs and authorities reliably. (Note that Kleinberg does not claim that the algorithm finds all hubs and authorities relevant to the query.) In addition, the motivation of the algorithm is highly intuitive and is, in itself, an interesting and insightful contribution. Beyond presenting his own work, the author devotes a large part of the paper to a thorough discussion of related work, covering studies not only of online sources but also of printed materials (such as journal citation indices). Not only is the paper authoritative in its own right, it is a hub pointing to other works on the topic.
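The circular definition of hubs and authorities described above can be resolved by iterating the two definitions against each other until the scores settle. The following is a minimal sketch of that idea, not code taken from the paper or from this page. The function name `hubs_and_authorities`, the dictionary-based graph format, the fixed iteration count, and the choice of Euclidean normalization are all assumptions made for illustration.

```python
from math import sqrt


def hubs_and_authorities(links, iterations=50):
    """Illustrative sketch of mutually reinforcing hub/authority scores.

    links: dict mapping each page to the pages it links to
           (hypothetical input format chosen for this example).
    Returns two dicts: hub scores and authority scores.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores
        # of the pages that link to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # A page's hub score is the sum of the authority scores
        # of the pages it links to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, ())) for p in pages}

        # Rescale so the scores stay bounded (assumed: Euclidean norm).
        a_norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Toy graph: three pages link out, two pages are linked to.
toy_links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [],
    "a2": [],
}
hub, auth = hubs_and_authorities(toy_links)
```

In this toy graph, `a1` receives links from all three linking pages and so ends up with a higher authority score than `a2`, while `h1` and `h2`, which each point to both targets, end up with higher hub scores than `h3`.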
