research-article

ARCOMEM: from collect-all ARchives to COmmunity MEMories

Authors:
Thomas Risse, University of Hanover, Hanover, Germany
Wim Peters, University of Sheffield, Sheffield, United Kingdom

WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web, April 2012, Pages 275–278, https://doi.org/10.1145/2187980.2188027

Published: 16 April 2012

ABSTRACT

The ARCOMEM project addresses memory institutions such as archives, museums, and libraries in the age of the Social Web, where social media are becoming increasingly pervasive in all areas of life. ARCOMEM aims to help transform archives into collective memories that are more tightly integrated with their community of users, and to exploit Web 2.0 and the wisdom of crowds to make Web archiving a more selective and meaning-based process. ARCOMEM (FP7-IST-270239) is an Integrating Project in the FP7 program of the European Commission involving twelve partners from academia, industry, and the public sector. The project runs from January 1, 2011 to December 31, 2013.

Published in
WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web
April 2012
1250 pages
ISBN:9781450312301
DOI:10.1145/2187980
General Chairs: Alain Mille (Université de Lyon, France), Fabien Gandon (INRIA, France), Jacques Misselis (HP, France)
Program Chairs: Michael Rabinovich (Case Western Reserve University, USA), Steffen Staab (University of Koblenz-Landau, Germany)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
architecture
social web
text analysis
web archiving
web crawler
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%