PrestigeRank: A new evaluation method for papers and journals

https://doi.org/10.1016/j.joi.2010.03.011

Abstract

This paper studies how missing data affect the paper rankings produced by the PageRank algorithm and, on that basis, proposes the PrestigeRank algorithm. We use PrestigeRank to rank all physics papers in the Chinese Scientific and Technology Papers and Citation Database (CSTPCD) published between 2004 and 2006. We compare the PrestigeRank results with the PageRank and citation-count rankings and find that PrestigeRank is significantly correlated with both PageRank and citation counts. We also use paper citation networks to rank journals and compare the results with those obtained from journal citation networks. We propose two journal indicators, PRsum and PRave, and compare both with citation counts and the impact factor. The results indicate that PRsum and PRave reflect a journal's authority well. We also discuss the advantages and disadvantages, application scope, and application prospects of PrestigeRank in the evaluation of papers and journals.

Introduction

Citation analysis is one of the most widely used bibliometric tools for ranking papers and journals. Garfield proposed that a citation count could be used to measure the impact of publications (Garfield, 1955) and introduced the impact factor as a tool in journal evaluation (Garfield, 1972). Because these measures are easy to understand and quick to obtain, they are increasingly widely used. However, citation counts and the impact factor have intrinsic limitations (Buela-Casal, 2004; Maslov and Redner, 2008). They assume that all citations are equal, whether they come from an important paper or a poor-quality one, which is clearly unreasonable.

Many researchers have turned to search engine algorithms to differentiate the importance of citations. Bollen, Rodriguez, and Van de Sompel (2006) undertook a journal ranking study on journal citation data from ISI using the PageRank algorithm. Using PageRank, the SCImago Research Group defined the SCImago Journal Rank (SJR) based on the SCOPUS Database (http://www.scimagojr.com). The Journal Citation Report promulgated by ISI in 2008 used a new journal evaluation index, Eigenfactor, whose calculation is based on the PageRank algorithm but eliminates journal self-citations (http://www.eigenfactor.org). Su et al. (2009a, 2009b, 2009c) proposed a new journal ranking algorithm based on PageRank and the HITS algorithm and used it in an empirical study of Chinese science and technology journals. Several studies (Chen et al., 2007; Li and Zhai, 2007; Luo et al., 2007; Ma et al., 2008; Walker et al., 2007) applied the PageRank algorithm to the publication citation network to measure the importance of scientific papers. What all these studies have in common is that they consider both the quantity and the quality of citations when calculating the scores of journals or papers. They differentiate the importance of citations, which is undoubtedly more reasonable than considering citation counts alone. Nevertheless, there are at least two new questions worth considering. First, with regard to journal evaluation, is an algorithm based on a journal-level citation graph able to cover the differences in citations of the paper-level graph? If this is possible, we should conduct journal evaluation using a network of paper citations. Because the quality of a journal is decided by the quality of all the papers it contains, solving the evaluation of papers also solves the problem of journal evaluation.

Second, in contrast to a web page link graph, the references in a publication citation graph point to a wide range of document types, including traditional journal articles as well as conference papers, books, standards, patents and network information. No single database contains all these types of documents, so missing documents affect the metric. None of the above studies on networks of paper citations considered how to deal with missing parts of the literature.

To solve the two issues highlighted above, we propose PrestigeRank, a new algorithm for a publication citation graph based on PageRank. We also aim to find a solution suitable for cases where there are missing papers in the database citing network.
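The abstract names two journal-level indicators, PRsum and PRave, but this excerpt does not define them. The following is a minimal sketch under the assumption (suggested only by the names) that PRsum is the sum and PRave the mean of the paper-level scores of a journal's papers. The function name `journal_scores` and the input dictionaries are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def journal_scores(paper_scores, paper_journal):
    """Aggregate paper-level scores into per-journal (sum, mean) pairs.

    paper_scores:  dict mapping paper id -> paper-level score
    paper_journal: dict mapping paper id -> journal id
    Assumption: PRsum = sum of scores, PRave = mean of scores per journal.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for paper, score in paper_scores.items():
        journal = paper_journal[paper]
        totals[journal] += score
        counts[journal] += 1
    # Each journal maps to (assumed PRsum, assumed PRave).
    return {j: (totals[j], totals[j] / counts[j]) for j in totals}


# Toy, made-up scores for three papers in two journals.
scores = {"p1": 0.2, "p2": 0.4, "p3": 0.1}
journals = {"p1": "A", "p2": "A", "p3": "B"}
result = journal_scores(scores, journals)
```

Under this reading, a sum-based indicator grows with the number of papers a journal publishes, while a mean-based indicator is size-independent, which would parallel the comparison with citation counts and the impact factor mentioned in the abstract.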

Data

The Chinese Scientific and Technical Papers and Citations Database (CSTPCD) is a scientific publications system developed by the Institute of Scientific and Technical Information of China. CSTPCD is based on representative domestic scientific and technical journals. It contains more than 1700 Chinese scientific and technological journals published in English and Chinese (2008) with the source journals covering mathematics, information and systems science, physics, mechanics, chemistry,

The results of citation counts, PageRank and PrestigeRank rank

As can be seen from Table 5, PrestigeRank, PageRank and citation counts are positively correlated. The Spearman correlation coefficient between PrestigeRank and citation counts is greater than that between PageRank and citation counts, which suggests that PrestigeRank agrees with citation counts more closely than PageRank does.
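To make the comparison concrete, the sketch below shows how a Spearman rank correlation between two score lists can be computed: rank each list (ties share their average rank) and take the Pearson correlation of the ranks. The helper names and the toy numbers are illustrative assumptions. They are not the paper's data and do not reproduce Table 5.

```python
def average_ranks(values):
    """Return 1-based ranks for values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (vx * vy) ** 0.5


# Toy, made-up values for five papers (not from the paper).
citation_counts = [12, 3, 7, 0, 5]
some_rank_score = [0.30, 0.10, 0.25, 0.05, 0.20]
rho = spearman(citation_counts, some_rank_score)
```

Because only the ordering matters, two score lists that sort the papers identically give a coefficient of 1 regardless of how different their numeric scales are.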

As mentioned earlier, the PageRank value of one paper is in inverse proportion to the references counts of other

Discussion and conclusion

In this work we discuss how references that are cited by papers in the collection, but are not themselves included in the collection, influence the ranking of papers under the PageRank algorithm. We propose PrestigeRank, which uses a “virtual node” to represent those references not included in the collection; the virtual node receives all the citations that papers in the collection make to them. We use PrestigeRank to rank all physics-related papers from 2004 to 2006 in CSTPCD. Furthermore, we compared
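The virtual-node idea can be illustrated with a small power-iteration sketch. It is not the authors' exact formulation, which this excerpt does not give. It assumes that each paper splits its score equally over all of its references, that the share belonging to out-of-collection references flows to a single virtual node, and that the virtual node and papers without references redistribute their score uniformly. The function name, the damping value 0.85 and the input layout are likewise assumptions.

```python
def prestige_rank_sketch(in_refs, missing_refs, damping=0.85,
                         tol=1e-10, max_iter=200):
    """PageRank-style scores with one virtual node for missing references.

    in_refs[i]:      indices of in-collection papers cited by paper i
    missing_refs[i]: number of references of paper i outside the collection
    Returns (paper_scores, virtual_node_score).
    """
    n = len(in_refs)
    virtual = n              # index of the virtual node
    size = n + 1
    scores = [1.0 / size] * size
    for _ in range(max_iter):
        new = [(1.0 - damping) / size] * size
        dangling = scores[virtual]   # assumption: virtual node has no out-links
        for i in range(n):
            out_total = len(in_refs[i]) + missing_refs[i]
            if out_total == 0:
                dangling += scores[i]
                continue
            share = damping * scores[i] / out_total
            for j in in_refs[i]:
                new[j] += share
            # Missing references send their share to the virtual node.
            new[virtual] += share * missing_refs[i]
        # Spread the score of dangling nodes uniformly over all nodes.
        spread = damping * dangling / size
        new = [value + spread for value in new]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores[:n], scores[virtual]


# Toy graph: papers 0 and 2 cite paper 1; paper 0 also has 3 missing references.
papers, virtual_score = prestige_rank_sketch([[1], [], [1]], [3, 0, 0])
```

In this toy graph, paper 0 passes only a quarter of its distributable score to paper 1, because three of its four references lie outside the collection. If those references were ignored, paper 1 would receive the full share, which is the kind of distortion the virtual node is meant to avoid.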

Acknowledgments

This work is supported by the Ministry of Science and Technology in China under contract 2006BAH03B05, the National Natural Science Foundation of China (70973118) and the Foundation of the Institute of Scientific and Technical Information of China (YY-200902). The authors thank three anonymous reviewers for valuable comments and suggestions, which helped shape and improve this paper. We also thank Xiong Ping for her assistance in polishing this article.

References (21)

  • S. Brin et al. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems (1998)
  • P. Chen et al. Finding scientific gems with Google's PageRank algorithm. Journal of Informetrics (2007)
  • N. Ma et al. Bringing PageRank to the citation analysis. Information Processing and Management (2008)
  • J. Bollen et al. Journal status. Scientometrics (2006)
  • G. Buela-Casal. Assessing the quality of articles and scientific journals: Proposal for weighted impact factor. Psychology in Spain (2004)
  • E. Garfield. Citation indexes for science: A new dimension in documentation through association of ideas. Science (1955)
  • E. Garfield. Citation analysis as a tool in journal evaluation. Essays of An Information Scientist (1972)
  • J.M. Kleinberg. Authoritative sources in a hyperlinked environment
  • A. Langville et al. Google's PageRank and Beyond: The science of search engine rankings (2006)
  • C. Li et al. Exploration of PageRank-based citation analysis method. Information Studies: Theory & Application (2007)
There are more references available in the full text version of this article.
