Web usage mining: discovery and applications of usage patterns from Web data

Published: 01 January 2000
Abstract

Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest, from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system is given.
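
The three phases named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper or from WebSIFT: the toy log records, the 30-minute session timeout, the filtering of image requests, the use of page-pair co-occurrence counts as the "pattern", and the support threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-parsed log records: (client IP, seconds since start, URL).
LOG = [
    ("10.0.0.1", 0, "/index.html"),
    ("10.0.0.1", 5, "/logo.gif"),
    ("10.0.0.1", 40, "/products.html"),
    ("10.0.0.2", 10, "/index.html"),
    ("10.0.0.2", 60, "/products.html"),
    ("10.0.0.1", 4000, "/index.html"),
    ("10.0.0.1", 4030, "/contact.html"),
]


def preprocess(log, timeout=1800):
    """Phase 1: drop image requests and split each client's requests into sessions."""
    sessions, last_seen, current = [], {}, {}
    for ip, t, url in sorted(log, key=lambda r: (r[0], r[1])):
        if url.endswith((".gif", ".jpg", ".png")):
            continue  # assumed cleaning rule: ignore embedded images
        if ip not in current or t - last_seen[ip] > timeout:
            current[ip] = []  # a long gap starts a new session (assumed heuristic)
            sessions.append(current[ip])
        current[ip].append(url)
        last_seen[ip] = t
    return sessions


def discover(sessions):
    """Phase 2: count how many sessions contain each unordered pair of pages."""
    counts = Counter()
    for pages in sessions:
        for pair in combinations(sorted(set(pages)), 2):
            counts[pair] += 1
    return counts


def analyze(counts, n_sessions, min_support=0.5):
    """Phase 3: keep only pairs whose share of sessions reaches min_support."""
    return {
        pair: count / n_sessions
        for pair, count in counts.items()
        if count / n_sessions >= min_support
    }


sessions = preprocess(LOG)
patterns = analyze(discover(sessions), len(sessions))
print(patterns)
```

On this toy log the sketch yields three sessions and retains only the pair of `/index.html` and `/products.html`, which co-occurs in two of them. A real system would replace each function with far more careful logic, for example user identification beyond the IP address in preprocessing.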

Published in

ACM SIGKDD Explorations Newsletter, Volume 1, Issue 2
January 2000
115 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/846183

Copyright © 2000 Authors

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 1 January 2000
