Web Citations Analysis of the JASSS: the First Ten Years

The aim of this research is to examine the accessibility and decay of web citations (URLs) used in the refereed articles published by the Journal of Artificial Societies and Social Simulation (JASSS). To do this, we first downloaded all articles of JASSS from 1998 to 2007. Their web citations were then extracted and analyzed in terms of accessibility and decay. For web citations that were initially missed, complementary pathways such as using Internet Explorer and the Google search engine were employed. The collected data were then analyzed using descriptive statistical methods. The study revealed that at first check 75% of web citations were accessible while 25% had disappeared. Notably, the rate of accessibility increased to 94% and the rate of decay decreased to 6% after using the complementary pathways. The .edu/.ac.xx domain, with an accessibility of 98% (decay, 2%), was the most stable and persistent of all domains, while the most stable file formats were PDF, with an accessibility of 99% (decay, 1%), and HTM/HTML, with 96% accessibility (decay, 4%). Finally, some suggestions and recommendations are presented to stop, or diminish, the decay phenomenon.


Introduction

1.1
The Internet is one of the most important and complex innovations in human history, the largest and most complete tool for information exchange ever made available to the global population (Maharana et al. 2006). Nowadays, the use of the Internet for identifying valuable and timely information has become necessary for most scientists as well as the public with access to the World Wide Web, since scientific and other work is created and added in digital format on the Internet every day (Falagas et al. 2007).

1.2
With the progress of the Internet, the full texts of many articles in scientific journals are presented electronically and in open access form for researchers, and hence the Internet has become one of the main communication tools among researchers. E-books, e-journals, e-databases, e-theses and dissertations, e-prints of research papers, and the like have provided scope for researchers and authors in various subject fields and stimulated their research productivity. Consequently, citations to these Internet resources as novel references have emerged and increased in number. As Zhao and Logan (2002) have indicated, the main reason for such an increase in the number of web citations in scholarly papers is that the web has become the first choice for finding information on current research, for breaking scientific discoveries and for keeping up with colleagues at other institutions. In addition, there are many open access resources and journals available on the web. This has led authors to refer to more and more web resources as part of their increased research productivity.

1.3
Currently (February 2011), according to the Directory of Open Access Journals, there are 6114 open access journals (DOAJ 2011). Among them, 200 journals, including the Journal of Artificial Societies and Social Simulation (JASSS), are indexed in ISI (ISI 2004). JASSS is an interdisciplinary journal for the exploration and understanding of social processes by means of computer simulation. Since its first issue in 1998, it has been a leading worldwide reference for readers interested in social simulation and the application of computer simulation in the social sciences (JASSS 2011).

1.4
Social simulation can be considered a young and fast-developing field (Meyer et al. 2009). Social simulation's rapid development is reflected in its publication outlets. In addition to workshops and conferences being documented in edited volumes, scientific journals such as the Journal of Artificial Societies and Social Simulation (JASSS), the Journal of Economic Interaction and Coordination (JEIC) and Computational and Mathematical Organization Theory (CMOT) now specialize in social simulation (Gilbert 1998; Namatame et al. 2006; Carley 1995; Meyer et al. 2009).

1.5
These journals and their citations play a pivotal role in the publication of new papers by social simulation researchers. Nevertheless, the instability of the Internet is a concern for researchers. They are worried about the decay and disappearance of citations, which gradually hinders access to Internet citations (Germain 2000; Dimitrova and Bugeja 2007; Goh and Ng 2007; Wagner et al. 2009). Therefore, this study attempts to delineate the accessibility and decay of web citations in JASSS. Possible measures for preventing this problem are also discussed.

2.1
This study is based on a range of previous, related efforts to study the accessibility and decay of web citations in different journals.

2.2
Harter and Kim's (1996) article entitled "Electronic journals and scholarly communication: a citation and reference study" was one of the first studies on the availability and permanency of URLs. The major purpose of their research was to study the effects of scholarly, peer-reviewed e-journals on formal scholarly and scientific communication, as measured by cited references. Accordingly, they extracted and examined 47 unique URLs of 39 scholarly, peer-reviewed e-journals published during 1993 to 1995. They showed that one-third of citations (31%) had become inaccessible by the end of 1995 (Harter and Kim 1996). Koehler (1999 and 2002), in a longitudinal study, examined both the accessibility and content of 360 randomly chosen URLs obtained from a web crawler over 3 years. He found that about 50% of them were still active at the end of this time and most had changed in content.

2.4
Germain (2000) investigated the reliability of URLs in academic citation. Thirty-one randomly chosen academic journal articles, containing 64 citations with URLs, were reviewed. The academic journals used were from a variety of disciplines: 13 citations were from information and library science, 10 from the hard sciences, 17 from computer science, 11 from the humanities, and 13 from the social sciences. The printed journals were published between 1995 and 1997. Results of this longitudinal study showed an increasing decline in the availability of URLs. After a three-year period, almost 50 percent of the URLs could not be accessed and two-thirds of the journal articles contained corroded citations (Germain 2000). The main error message was "Not Found".

2.5
Davis and Cohen (2001) made a citation analysis of undergraduate term papers in microeconomics and revealed a significant decrease in the frequency of scholarly resources cited between 1996 and 1999. Web citations checked in 2000 revealed that only 55% of URLs cited in 1999 led to the correct Internet document (Davis and Cohen 2001). Davis (2002), in an update to the 1996-1999 citation analysis, concluded that 65% of the citations pointed directly to the cited document, up from 55% in 1999.

2.6
Dellavalle et al. (2003) systematically examined the extent of Internet referencing and Internet reference activity in medical or scientific publications in more than 1000 articles published between 2000 and 2003 in three journals: New England Journal of Medicine, Journal of the American Medical Association, and Science. They found that Internet references accounted for 2.6% of all references (672/25548) and that, in articles 27 months old, 13% of Internet references were inactive with error messages.

2.7
Casserly and Bird (2003) examined 500 Internet citations randomly chosen from scholarly articles published in library and information science journals. The average number of web citations for each paper was 2.5. They found that only 56.4% of those URLs were permanent, while the rest had disappeared from the original web address. Moreover, the error message "not available" was the most frequent message found by this study. They found that the most frequently cited URLs belonged to the domains ".org" and ".edu.xx".

2.8
McCown et al. (2005) explored the availability and persistence of URLs cited in articles published in D-Lib Magazine. For their research, they extracted 4387 unique URLs referenced in 453 articles published from July 1995 to August 2004. They found that approximately 28% of those URLs failed to resolve initially, and 30% failed to resolve at the last check. A majority of the unresolved URLs were due to the 404 (page not found) error. Moreover, based on the data collected, they found that URLs were more likely to be unavailable if they pointed to resources in the .net, .edu.xx or country-specific domains, used nonstandard ports (i.e., not port 80), or pointed to resources with uncommon or deprecated extensions (e.g., .shtml, .ps, .txt).

2.9
Wren et al. (2006) explored URL decay in dermatology journals. To do this, they considered URLs in articles published between January 1, 1999, and September 30, 2004, in the 3 dermatology journals with the highest scientific impact. The percentage of articles containing at least 1 URL increased from 2.3% in 1999 to 13.5% in 2004. Of the 1113 URLs, 81.7% were available (decreasing with time since publication from 89.1% of 2004 URLs to 65.4% of 1999 URLs) (p < .001).

2.10
Dimitrova and Bugeja (2007) examined the use of web citations, focusing on five leading journals in journalism and communication. They analyzed 1126 URL reference addresses in citations of articles published between 2000 and 2003. The results showed that only 61% of the web citations remained accessible in 2004. The content analysis also demonstrated that ".org" and ".gov" were the most stable domains with 70% active links.

2.11
Goh and Ng (2007) studied the accessibility and decay of web citations in three specialized library and information science journals during 1997-2003. They found that only 69% of those URLs were permanent, while the remaining 31% had disappeared from the original web address. 56% of the error messages were "404" (page not found). The ".edu.xx" domain, with 36% active links, was the most stable domain. Wagner et al. (2009) studied the accessibility and decay of web citations in health care management journals during 2002-2004. They found that only 50.7% of those URLs were permanent, while the remaining 49.3% had disappeared from the original web address and resulted in error messages. The ".edu.xx" domain, with 68.4% accessibility, appeared as the most stable domain.

2.13
In order to provide a better understanding and comparison of the related literature, a summary of previous studies on URL decay is included in Table 1.

3.2
Data were collected via the Internet. First, we downloaded all articles of JASSS from 1998 to 2007 and calculated the number of all citations to both printed and Web sources.

3.3
It should be noted that a 10-year span was selected because the Web is a dynamic and ever-changing medium and thus Web citations gradually become inaccessible after the publication of articles. In addition, Web citations in younger articles are typically more accessible than those in older articles. Consequently, Web citations of articles published at least 3 years ago (namely 2007 and before) were chosen for the final study.

3.4
It should be noted that only refereed articles were studied. Accordingly, editorials, reports, reviews and so on were excluded. A total of 241 refereed articles with 558 unique web citations was obtained, and the accessibility and decay of these web citations were then checked. When we could not directly access the e-address (link) given in an article by its author(s), we tried to find the content by viewing the referred website. If this attempt proved insufficient, following the procedure used in prior research (Casserly and Bird 2003), the search engine "Google" was employed to access the missing reference(s).

3.5
Google was selected as the search engine for this process because of the large number and variety of documents to which it provides access and because we believed that its relevancy ranking would be effective for the narrowly defined searches we would be conducting. We performed up to five Google searches using different combinations of titles, keywords, author names, and source information. If none of these searches returned the cited content in the first twenty-five results, that content was considered to be inaccessible. Finally, the collected data were analyzed, and tables and figures were drawn using Excel software.
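The access procedure described in 3.4 and 3.5 can be summarized as a short decision routine. The following is only an illustrative sketch, not the tooling used in the study (the checks were performed manually): the function name `locate_citation`, the four callables passed to it and the returned labels are all hypothetical, while the limits of five searches and twenty-five results are taken from the text.

```python
from typing import Callable, Iterable, List

MAX_SEARCHES = 5    # up to five searches per missing citation (from the text)
MAX_RESULTS = 25    # only the first twenty-five results are inspected (from the text)


def locate_citation(
    cited_url: str,
    queries: Iterable[str],
    url_resolves: Callable[[str], bool],       # hypothetical: does the cited URL open?
    site_has_content: Callable[[str], bool],   # hypothetical: found on the referred website?
    search: Callable[[str], List[str]],        # hypothetical: ranked results for a query
    is_cited_content: Callable[[str], bool],   # hypothetical: is this result the cited item?
) -> str:
    """Return a label for how the citation was reached, or 'inaccessible'."""
    # Step 1: try the address exactly as cited by the authors.
    if url_resolves(cited_url):
        return "cited_url"
    # Step 2: look for the content on the referred website.
    if site_has_content(cited_url):
        return "referred_website"
    # Step 3: fall back to a bounded number of search-engine queries,
    # inspecting only the top-ranked results of each.
    for query in list(queries)[:MAX_SEARCHES]:
        for result in search(query)[:MAX_RESULTS]:
            if is_cited_content(result):
                return "search_engine"
    return "inaccessible"
```

Passing the checks in as callables keeps the sketch independent of any particular browser or search service; in a manual survey each callable corresponds to a judgment made by the person checking the citation.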

Web Citations Analysis Results
Distribution of articles, citations and web citations

4.3
Based on the findings in Table 3, it can be said that social simulation authors have been increasingly making use of web citations in their studies. These results are also considerably different from those of Wren et al. (2006), who reported that 13.5% of dermatology articles had web citations, while, as illustrated in Figure 1, the share of JASSS refereed articles using web citations rose from 43% in 1998 to 77% in 2007. Table 4 shows that 75% (421 URLs) were accessible and the remaining 25% (137 URLs) were inaccessible at the first survey. Moreover, the highest accessibility of web citations pertains to the year 2004 (88%) and the highest decay is found in the year 1998 (62%).
Pearson's correlation coefficient = 0.219, Sig. = 0.000, N = 558

4.5
On the basis of the data in Table 4 and according to the statistical test of Pearson's correlation coefficient, there is a statistically significant positive correlation between accessibility and publication year.
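To illustrate how such a coefficient can be obtained, the sketch below computes Pearson's r between publication year and a 0/1 accessibility flag. It is an assumption that accessibility was coded this way; the study's calculation is not described in detail, and the six data points are invented solely for demonstration and are not taken from Table 4.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: publication year of the citing article and
# whether the web citation was accessible (1) or not (0).
years = [1998, 1998, 2001, 2004, 2007, 2007]
accessible = [0, 1, 0, 1, 1, 1]
r = pearson_r(years, accessible)
```

A positive r means that citations in more recent articles tend to be accessible more often, which is the direction of the relationship reported above.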

4.6
As reported in Table 4, 137 web citations were not directly accessible. When we failed to directly access the reference at the electronic address provided by the authors, extra efforts were made to reach these citations using the search engine "Google". These searches used different combinations of titles, keywords, author names, and source information. If none of these searches returned the cited content in the first twenty-five results, that content was considered to be inaccessible. The status of the accessibility and decay of web citations after searching through Google is illustrated in Table 5.

4.7
The data in Table 5 show that after searching for the missing web citations the corresponding accessibility rate increases from 75% to 94%.

4.8
Different methods of gaining access to the web citations are presented in Figure 2. Accordingly, of the total number of accessible web citations, 80% were found at the cited URL, 15% were accessed using the Google search engine, 2% were accessed through searching for the missing URL on the Internet and 3% were found at a URL other than the cited URL.

Accessibility and decay of URLs by type of domain

4.12
The URL is the address of the location of a digital document on the web. A URL essentially has four parts: protocol, domain, directory and file. A domain name is a way to identify and locate computers connected to the Internet. No two organizations can have the same domain name. A domain name always contains two or more components separated by periods, which are called "dots". A domain name can often tell the user if it is a government site, an academic site or a commercial site. Some common domain name endings are:
.com.xx or .co.xx: a commercial organization.
.gov: an official government site.
.net: traditionally for network organizations, but now can be used by anyone.
In this study, five different types of domain are taken into account from the accessibility and decay viewpoint: .org, .edu/.ac.xx, .com.xx, .gov and .net. Those domains not falling into any of these categories were assigned to the "others" category (Figure 4).

4.14
Accordingly, of the 558 web citations, the largest number (366) are of the .edu/.ac.xx type. This reveals that the data sources of most of the web citations in the present study are websites of various educational organizations and the like. Moreover, the percentages of accessibility and decay of domains, shown as lines in Figure 4, indicate that .edu/.ac.xx, with 98% accessibility (2% decay), is the most stable and persistent domain, while the .gov domain, with merely 75% accessibility (25% decay), is the least stable.
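The grouping of URLs into the five domain types and the "others" category can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used in the study: the function name `classify_domain`, the rule of looking one label to the left when the host ends in a two-letter country code, and the treatment of ".co.xx" as commercial are all assumptions, and URLs are assumed to include their protocol (e.g. "http://").

```python
from urllib.parse import urlparse

# Mapping from the decisive host label to the categories used in the text.
CATEGORY_BY_LABEL = {
    "org": ".org",
    "edu": ".edu/.ac.xx",
    "ac": ".edu/.ac.xx",
    "com": ".com.xx",
    "co": ".com.xx",
    "gov": ".gov",
    "net": ".net",
}


def classify_domain(url: str) -> str:
    """Assign a URL to one of the five domain types, or to 'others'."""
    host = (urlparse(url).hostname or "").lower()
    labels = [part for part in host.split(".") if part]
    if len(labels) < 2:
        return "others"
    last = labels[-1]
    if len(last) == 2:
        # Assumed country code (the ".xx" in the text): the type is
        # carried by the label before it, as in "ac.uk" or "co.uk".
        if len(labels) < 3:
            return "others"
        key = labels[-2]
    else:
        key = last
    return CATEGORY_BY_LABEL.get(key, "others")
```

Under this rule an address on the JASSS host, such as http://jasss.soc.surrey.ac.uk/, falls into the .edu/.ac.xx category.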

4.15
It can be said that much of the apparently high stability of the domain .edu/.ac.xx is because of the accessibility of all internal citations of JASSS.

4.16
In line with previous studies (e.g. McCown et al. 2005 and Maharana et al. 2006), the URLs were categorized into four different file formats as follows:
Slash files (/): URLs which end with the / sign.
HTML (hyper text markup language): Web documents created in the HTML markup language.
PDF (portable document format): the file format for documents created using Adobe Acrobat.
DOC: documents created using Microsoft Word.
The data illustrated in Figure 5 indicate that the greatest number of web citations are HTML files. It is important to note that all internal citations of JASSS were of this type.

4.17
Out of 558 Web citations, 367 are HTML files, followed by 67 PDF files, 56 Slash files and 2 DOC files. The remaining 66 files, in formats which did not match these four categories, were included in the "other" category (Figure 5). The percentages of accessible files by format are shown as lines in Figure 5. PDF files, with 99% accessibility, and HTML files, with 96% accessibility, were the most stable, and DOC files, with 50% accessibility, were the most unstable and susceptible to decay. The high permanence of PDF files is due to their recency, since PDF has recently become a standard for storing and preserving scholarly works. The stability of HTML files is based on the accessibility of all internal citations of JASSS.
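The file-format grouping can be sketched in the same spirit. The function name `classify_format` and the decision to read the format from the extension of the URL path (ignoring any query string) are assumptions made for illustration; the study does not state how formats were determined.

```python
import posixpath
from urllib.parse import urlparse

# Extensions mapped to the four named formats; anything else is "other".
FORMAT_BY_EXTENSION = {
    ".htm": "HTML",
    ".html": "HTML",
    ".pdf": "PDF",
    ".doc": "DOC",
}


def classify_format(url: str) -> str:
    """Assign a URL to Slash, HTML, PDF, DOC or 'other' by its path."""
    path = urlparse(url).path
    # Slash files: URLs which end with the "/" sign.
    if path.endswith("/"):
        return "Slash"
    extension = posixpath.splitext(path)[1].lower()
    return FORMAT_BY_EXTENSION.get(extension, "other")
```

Counting the labels returned for all 558 web citations would give a distribution of the kind reported in Figure 5.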

5.1
This research analyzed the web citations used in refereed articles published in the first ten years of JASSS. Findings indicated that among all citations (7433), there are 558 Web citations, with an average of 2.31 web citations per paper. Based on the data collected, 66% of articles have Web citations and the share of articles containing web citations increased from 43% in 1998 to 77% in 2007. In the study of accessibility and decay of web citations, findings showed that 75% (421 URLs) were accessible and the remaining 25% (137 URLs) were inaccessible at the first survey. Notably, the rate of accessibility increased to 94% after searching for web citations using the Google search engine.
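The headline figures can be recomputed from the counts given in the text (241 refereed articles, 7433 citations, 558 web citations, of which 421 were accessible and 137 inaccessible at first check). The short sketch below is only a worked check of that arithmetic; the variable and key names are illustrative.

```python
# Counts as reported in the text.
TOTAL_ARTICLES = 241
TOTAL_CITATIONS = 7433
WEB_CITATIONS = 558
ACCESSIBLE_FIRST_CHECK = 421
INACCESSIBLE_FIRST_CHECK = 137


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


summary = {
    "citations_per_article": TOTAL_CITATIONS / TOTAL_ARTICLES,
    "web_citations_per_article": WEB_CITATIONS / TOTAL_ARTICLES,
    "accessible_pct": percent(ACCESSIBLE_FIRST_CHECK, WEB_CITATIONS),
    "decayed_pct": percent(INACCESSIBLE_FIRST_CHECK, WEB_CITATIONS),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The accessible and inaccessible counts add up to the 558 web citations, and the two percentages round to the reported 75% and 25%. The unrounded ratio of web citations per article lies between 2.31 and 2.32, so the reported 2.31 appears to be truncated rather than rounded.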

5.2
Additionally, of the total number of accessible web citations, 80% were found at the cited URL, and most of the errors are related to the message "404 not found" (43%). The .edu/.ac.xx domain, with an accessibility of 98%, has the most stability and persistency among all domains, while the most stable file formats were PDF, with an accessibility of 99%, and HTML, with 96% accessibility.

5.3
Our results show that the decay of web citations in JASSS is 25% at first check (75% accessibility), while the decay reported by other researchers appears to be higher (Harter and Kim (1996) 31%, Koehler (1999 and 2002), Goh and Ng (2007) 31%, Wagner et al. (2009) 49.3%). With few exceptions, the decay we observed is lower, perhaps because JASSS is an open access journal, unlike the non-open access journals studied in previous research. On the other hand, the importance of medical science information in the study of Dellavalle et al. (2003) may account for the higher accessibility of their web citations.

5.4
Our findings on the error messages obtained when searching for the missing citations are compatible with those reported by Germain (2000), McCown et al. (2005) and Wagner et al. (2009). All showed that error 404 (not found) is the main message. Our findings on the question of how domains affect the accessibility and decay of web citations are in agreement with those of Wagner et al. (2009), but in contrast to those of McCown et al. (2005) and Dimitrova and Bugeja (2007). The latter two reports found that .org has the most stability and persistency among all domains.

5.5
As mentioned earlier, the Internet has become a valuable and perhaps indispensable resource in conducting scientific research. This is because the Internet adds the convenience of rapid information retrieval and sharing and provides resources that printed media simply cannot (Lawrence and Giles 1999). Therefore, even though authors may appreciate the risk of future inaccessibility of Internet references, they cannot easily avoid the use of the Internet in their publications (Falagas et al. 2007). Moreover, consistent with the statements of Dimitrova and Bugeja (2007), we found that the Internet may prove to be an inhospitable medium, especially for web-based research, because web citations are speedily and constantly fading away. Nevertheless, it should be accepted that the use of the Internet for identifying valuable and timely information has become inevitable for most scientists as well as the public with access to the World Wide Web, since scientific and other work is created and added in digital format on the Internet every day.

5.6
In order to overcome the risk of instability and decay of web citations and/or increase the rate of availability of URLs, some recommendations are suggested that publishers, editors, and authors should implement together (Wren et al. 2006; Casserly and Bird 2003; Germain 2000; Dimitrova and Bugeja 2007):
Systematic checking of the web citations before publication.
Getting backup of cited information.
Using the more stable file formats and domains.

5.7
The best solution for preventing the decay or disappearance of web citations will be the use of tools such as WebCite®-enhanced references and the Digital Object Identifier (DOI®) System. WebCite®, a member of the International Internet Preservation Consortium, is an on-demand archiving system for Web references (cited Web pages and websites, or other kinds of Internet-accessible digital objects), which can be used by authors, editors, and publishers of scholarly papers and books to ensure that cited web material will remain available to readers in the future. A WebCite®-enhanced reference is a reference which contains, in addition to the original live URL (which can and probably will disappear in the future, or whose content may change), a link to an archived copy of the material, exactly as the citing author saw it when he or she accessed the cited material (WebCite 2011a). Since its official launch in October 2005, more than 100 journals have come to use WebCite® on a routine basis (WebCite 2011b). Thus, it is suggested that all scholarly journals, particularly JASSS as an internationally well-known journal, should call for the use of WebCite®-enhanced references and oblige authors to utilize WebCitation.org for all web citations referred to in their articles. It is notable that the references in JASSS articles now include the DOI where available. The DOI System is for identifying content objects in the digital environment.
This bibliometric study emphasizes citation analysis. Citation analysis is a well-known technique that has long been used to study scholarly communication. In citation analysis studies, citations in research articles, often published in journals, are analyzed as artifacts of scholarly communication representing the citing author's use of the previously published work (Zhao and Logan 2002). As the Web is becoming a new and powerful medium for scientific communication, citation analysis and other bibliometric techniques have been applied to the study of this new phenomenon in scholarly communication (Maharana et al. 2006).

Figure 1. Percentage of articles with web citations in JASSS

Figure 2. Different methods to access web citations

Figure 3. Error messages found at inaccessible web citations

4.11
The error messages 403 (forbidden) and 401 (unauthorized) are possibly due to firewalls, filtering and Internet limitations in Iran, although other messages like 404 (not found) and 500 (internal server error) are likely to be due to the deletion of web site or

Figure 4. Accessibility and decay of URLs by type of domain

4.13
As one can see from Figure 4, the domains of the cited URLs are mostly of the .edu/.ac.xx type. All internal citations of JASSS (citations that refer to other articles within JASSS) were of the .edu/.ac.xx type.

Figure 5. Accessibility and decay of URLs by file format types

DOI names are assigned to any entity for use on digital networks. They are used to provide current information, including where they (or information about them) can be found on the Internet. Information about a digital object may change over time, including where to find it, but its DOI name will not change. The DOI System provides a framework for persistent identification, managing intellectual content, managing metadata, linking customers with content suppliers, facilitating electronic commerce, and enabling automated management of media. DOI names can be used for any form of management of any data, whether commercial or non-commercial. The DOI System is an ISO International Standard. It is managed by the International DOI Foundation, an open membership consortium including both commercial and non-commercial partners. Over 50 million DOI names have been assigned by DOI System Registration Agencies in the US, Australasia, and Europe. Using DOI names as identifiers makes managing intellectual property in a networked environment much easier and more convenient, and allows the construction of automated services and transactions (DOI 2011).

Table 1: Summary of previous studies on URL decay

Table 2: Distribution of articles, citations, and web citations in JASSS

As shown in Table 2, 241 refereed articles in total were published in JASSS during the years 1998-2007. The years 2006 and 2007, with 35 articles, and the year 2000, with 9 articles, have the most and the fewest articles, respectively. According to Table 2, there were a total of 7433 citations in 241 articles, an average of 30.84 citations per paper. Among all the citations (7433), there are 558 web citations, an average of 2.31 web citations per paper.

As can be seen in Table 3, of 241 published refereed articles, 141 have web citations. 59% of articles have at least one web citation.

Table 3: Distribution of articles with web citations in JASSS

Table 4: Accessibility and decay of web citations at first check

Table 5: Accessibility and decay of web citations after extra search attempts