Abstract
Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in detail. Given its application potential, Web usage mining has seen a rapid increase in interest, from both the research and practice communities. This paper provides a detailed taxonomy of the work in this area, including research efforts as well as commercial offerings. An up-to-date survey of the existing work is also provided. Finally, a brief overview of the WebSIFT system as an example of a prototypical Web usage mining system is given.
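The three phases named above can be illustrated with a minimal sketch. It is not the WebSIFT implementation or any method taken from the paper. It assumes the input is a server log in Common Log Format, that a session ends after 30 minutes of inactivity from the same host, and that "patterns" are simply pairs of pages visited in the same session. The sample log, the function names (`preprocess`, `discover`, `analyze`), and the support threshold are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical sample in Common Log Format; hosts and paths are made up.
SAMPLE_LOG = """\
10.0.0.1 - - [01/Jan/2000:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 1024
10.0.0.1 - - [01/Jan/2000:10:00:01 +0000] "GET /logo.gif HTTP/1.0" 200 512
10.0.0.1 - - [01/Jan/2000:10:01:00 +0000] "GET /products.html HTTP/1.0" 200 2048
10.0.0.2 - - [01/Jan/2000:10:05:00 +0000] "GET /index.html HTTP/1.0" 200 1024
10.0.0.2 - - [01/Jan/2000:10:06:00 +0000] "GET /products.html HTTP/1.0" 200 2048
10.0.0.2 - - [01/Jan/2000:10:07:00 +0000] "GET /missing.html HTTP/1.0" 404 0
10.0.0.1 - - [01/Jan/2000:11:30:00 +0000] "GET /index.html HTTP/1.0" 200 1024
10.0.0.1 - - [01/Jan/2000:11:31:00 +0000] "GET /contact.html HTTP/1.0" 200 700
"""

LOG_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def preprocess(log_text, timeout=timedelta(minutes=30)):
    """Phase 1: parse and clean the log, then split requests into sessions."""
    by_host = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip malformed lines
        host, stamp, method, path, status = match.groups()
        if method != "GET" or not status.startswith("2"):
            continue  # keep successful page requests only
        if path.lower().endswith(IGNORED_SUFFIXES):
            continue  # drop embedded resources such as images
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        by_host[host].append((when, path))

    sessions = []
    for requests in by_host.values():
        requests.sort()
        current = [requests[0][1]]
        for (previous, _), (when, path) in zip(requests, requests[1:]):
            if when - previous > timeout:
                sessions.append(current)  # inactivity gap closes the session
                current = []
            current.append(path)
        sessions.append(current)
    return sessions


def discover(sessions):
    """Phase 2: count the sessions in which each pair of pages co-occurs."""
    counts = Counter()
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts


def analyze(counts, n_sessions, min_support=0.5):
    """Phase 3: keep only pairs whose support reaches the threshold."""
    return {
        pair: count / n_sessions
        for pair, count in counts.items()
        if count / n_sessions >= min_support
    }


sessions = preprocess(SAMPLE_LOG)
patterns = analyze(discover(sessions), len(sessions))
```

Identifying users by host address alone is a simplification; proxies and shared machines make this unreliable in practice, which is one reason preprocessing is treated as a phase in its own right.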
Index Terms
- Web usage mining: discovery and applications of usage patterns from Web data