Petra Gašparac [*] [1]


Introduction

 
The development of science, with a rapid increase in the volume of research and in the number of scientists and researchers, and the advent of novel scientific disciplines have led to an exponential rise in the number of journals, books, congress proceedings, dissertations, patents, technical reports and other publications reporting research results. These publications are referred to as primary publications or primary sources of information, a term that points to the original character of the information presented. Convenient searching of such a huge number of publications, issued in different countries, in different languages and stored on different media, would be impossible without special aids, i.e. secondary publications or secondary information sources, which process, analyze and summarize primary publications and support targeted searching. Because so many publications and their authors require proper evaluation, these secondary sources of information are also used for scientific validation, on the strength of the strict criteria they apply in selecting the primary publications to be systematically followed and processed.
 

Bibliographic and Citation Databases as Information Aids

 
Bibliographic databases, as a secondary source of information, arose from the need to make following, searching and accessing the most relevant literature ever more convenient for users. Current bibliographic databases were preceded by printed publications (so-called index journals and abstract journals), the oldest and best known of which are Index Medicus and Chemical Abstracts. These secondary publications consisted of several types of indexes (listing author names, topics, journal titles, citations, etc. in alphabetical order) that pointed to the original articles. That is why this type of publication is occasionally referred to as an index publication. In the 1960s, when computers were first introduced into the production of these publications, their machine-readable equivalents, bibliographic databases, initially appeared as a by-product. With time, however, electronic databases replaced the traditional printed publications in most cases.

Bibliographic index

Each bibliographic database follows and processes (a procedure popularly termed indexing) a great number (hundreds to thousands) of carefully selected publications, most of them articles from scientific journals. The selection and processing are performed by renowned experts from the scientific fields covered by the given database. Each paper is represented by a bibliographic record that contains data such as author name(s), title of the article, name of the publication in which the article appeared, year of publication, key words, abstract, author affiliation(s), original language of the article, type of article, etc. These data are structured and classified into separate fields (authors, title, source, key words, etc.); the more fields a database has, the more useful the record, as it allows a more precise targeted search and a more accurate search result. Providing an adequate amount of data on each paper, classified into particular fields, enables the user to search and survey a great number of papers and to find those relevant to his/her query. Initially, bibliographic databases did not provide access to the full text of the articles; lately, however, some bibliographic databases have used special tools to attach links to the full text of papers available in e-journals.
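As a rough illustration of why structured records allow a more precise search, the sketch below models a bibliographic record as a set of named fields and filters records by one or more of them. The class name, field names, sample records and matching rules are assumptions made for this example only; they do not reproduce the schema or search syntax of any actual database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BibRecord:
    """One bibliographic record; field names are illustrative only."""
    authors: List[str]
    title: str
    source: str          # name of the journal or other publication
    year: int
    keywords: List[str] = field(default_factory=list)
    language: str = "English"


def _matches(value, wanted):
    """Substring match for text, exact match for numbers, any-item match for lists."""
    if isinstance(value, list):
        return any(_matches(item, wanted) for item in value)
    if isinstance(value, str):
        return str(wanted).lower() in value.lower()
    return value == wanted


def search(records, **criteria):
    """Return the records that satisfy every field criterion given."""
    return [
        record for record in records
        if all(_matches(getattr(record, name), wanted)
               for name, wanted in criteria.items())
    ]


# Hypothetical sample records, invented for the example.
records = [
    BibRecord(["Author A", "Author B"], "Sample paper on citation analysis",
              "Journal X", 2005, ["citation", "impact factor"]),
    BibRecord(["Author C"], "Sample paper on drug stability",
              "Journal Y", 2005, ["pharmacy"]),
]

# Combining two fields narrows the result to a single record.
hits = search(records, keywords="citation", year=2005)
```

Each additional field in the query acts as a further filter, which is the sense in which a record with more fields supports a more targeted search.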

Citation databases

Citation databases form a separate group within bibliographic databases. Citing is standard practice in scientific communication; authors of articles in scientific journals therefore add a bibliography, or list of references, at the end of the paper. What distinguishes citation databases is that each article is represented not only by its bibliographic record but also by its list of references. These references are called cited references or citations. Searching by cited references is more complete because it allows a particular topic to be followed through all articles on that topic included in the database. Namely, citations are presumed to be related in content to the topic of the citing paper, irrespective of the reason for citing them (favorable, such as giving credit, or unfavorable, such as criticism and correction). In addition to allowing literature searches by topic, citation databases provide data on the number of citations received by a particular journal, author, or paper.
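The idea of searching by cited references can be sketched as inverting the reference lists: if every indexed paper is stored together with the papers it cites, a reverse lookup gives, for any paper, the set of papers that cite it, and hence its citation count. The paper identifiers and the data below are hypothetical, and the code is a minimal sketch rather than a description of how any citation database is implemented.

```python
from collections import defaultdict

# Hypothetical data: each indexed paper mapped to its list of references.
reference_lists = {
    "paper_A": ["paper_C", "paper_D"],
    "paper_B": ["paper_C"],
    "paper_C": ["paper_D"],
}


def build_cited_by(reference_lists):
    """Invert the reference lists: cited paper -> set of citing papers."""
    cited_by = defaultdict(set)
    for citing, references in reference_lists.items():
        for cited in references:
            cited_by[cited].add(citing)
    return cited_by


cited_by = build_cited_by(reference_lists)

# Number of citations received by each cited paper.
citation_counts = {paper: len(citers) for paper, citers in cited_by.items()}
```

Starting from one known paper, the `cited_by` lookup leads forward to later papers on the same topic, while the original reference list leads backward to earlier ones.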

Database selectivity

One of the main characteristics of bibliographic and citation databases is their selectivity, which arises from the impossibility of following the entire body of scientific publications in the world. These databases are dominated by journals, the major medium of scientific information dissemination in most fields. It is estimated that some 100,000 scientific, scientific-professional and professional journals are currently published in the world. Absolute comprehensiveness of bibliographic and citation databases would be both economically impractical and unnecessary. Analyses of scientific publications have revealed that in each particular field, the majority of relevant scientific results appear in a relatively small number of journals, referred to as “core journals”. Between 30% and 40% of the overall number of journals are selected and processed in the relevant discipline-oriented bibliographic databases. There are a great number of databases (about 900) worldwide, the most widely known in the fields of science and biomedicine being Medline and PubMed (biomedicine and related fields), Chemical Abstracts (chemistry and related fields), EMBASE (biomedicine, pharmacology), International Pharmaceutical Abstracts (pharmacy), and Biological Abstracts and Biological Sciences (biology and biological sciences).

Selection criteria

The basic criteria to be met by journals for inclusion in international databases were established as early as the end of the 1960s (1). The manuscripts accepted for publication should contain new scientific information based on reproducible and reliable methods and statistical procedures. Regular appearance means that the journal issues appear at the stated frequency, which in turn implies an adequate inflow of manuscripts to sustain the journal. The editorial board of the journal should include representatives of all subdisciplines covered by the journal. Besides editors, high quality reviewers provide an additional guarantee of the quality of the papers presented in the journal. The journal should also receive an appropriate number of citations from other journals. In addition to these criteria, the journal should have a series of formal (publishing) properties to be considered a serious publication, i.e. ISSN (International Standard Serial Number), CODEN (a six-character code identifying the journal title), information on the initial year of publication, place and country of publication, and frequency of appearance. The journal should also provide thorough instructions to authors, information on the types of articles it publishes, on the reviewing procedure, etc. (2). Along with these basic criteria, each database has set some specific criteria of its own; e.g., in journal selection, Medline takes discipline coverage into consideration and tends not to index journals whose biomedical content is already properly covered elsewhere, whereas EMBASE pays attention primarily to articles that bring new information on drugs.

International databases of highest selectivity

The Current Contents (CC) bibliographic database and the Science Citation Index (SCI), Social Sciences Citation Index (SSCI) and Arts and Humanities Citation Index (AHCI) citation databases are the international databases of highest selectivity. Until 2004, these were produced by the Institute for Scientific Information (ISI) from Philadelphia (hence the name ISI databases); in 2004, they were purchased by Thomson Corporation, now Thomson-ISI. Since 1997, the above mentioned citation databases have been combined into a single database, Web of Science (WoS). ISI databases are multidisciplinary and follow less than 10% of the overall periodical production in all fields worldwide, i.e. some 8,700 titles. These periodicals are considered the “core of world knowledge”. The ISI entry criteria are very strict: only 10% to 12% of the 2,000 new journals reviewed by ISI experts each year are selected, while the titles already included are continuously reconsidered and those that no longer meet the ISI quality criteria or are no longer relevant for a particular field are excluded. The journal core is thus not static but changes continuously. A set of criteria is taken into consideration in the procedure of journal validation (3). In addition to the previously mentioned regularity of appearance, quality of the editorial board and reviewer team, and verification of journal citations, there are some other criteria. English is mandatory for the paper title, summary and author’s key words. The author’s address should be provided to allow interested scientists to contact the author. Bibliographic data in the list of references should be absolutely correct and complete. National periodicals are expected to be internationally visible through a due proportion of authors from different countries and geographical variety among the authors of cited papers. ISI emphasizes its tendency to follow the best national journals. In the case of such a journal, the ISI editor considers it within the category of journals from the same geographical area rather than comparing it with all journals in the field.
 

Bibliographic and citation databases as instruments of scientific validation

 
Although bibliographic and citation databases were primarily developed to serve topical searching of the literature, they have gradually come to be used as a validation instrument because of the strict criteria they apply in selecting the journals to be followed. Therefore, in the existing model of scientific validation, the criteria used to assess scientific contribution (such as the reviewing procedure) also include some quantitative parameters: indexing, i.e. representation of the journal or papers in relevant databases, and data on the citation of papers in journals.
Indexing, or representation of the journal in relevant databases, contributes to its visibility and availability, and is considered an indicator of its impact on international scientific production. Representation in ISI databases is considered highly relevant because of their high quality filtering, and there is certainly no editor of a scientific periodical who would not want his journal to become part of the “core of world knowledge”. Scientists tend to publish their papers in renowned journals to ensure the best possible visibility in the international scientific community, as well as due respect and career promotion for themselves. In Croatia, the scientific validation of a particular author is also based on the number of published papers, with those indexed in relevant bibliographic databases being of special importance. For example, according to the Regulations on the Conditions for Scientific-Educational Degrees at the School of Pharmacy and Biochemistry, papers published in journals indexed in relevant databases receive a higher score, and those indexed in Current Contents the highest score.
 

Impact factor

A particular journal can be represented in several relevant databases; however, its true usefulness for scientists is determined by the number of citations received by its articles. Analysis of citations in the ISI citation databases provides numerical indicators on the basis of which the response to a journal in a particular field is estimated. The impact factor (IF) is the most popular of these indicators. IF is a figure stating how many times, on average, a scientific paper from a journal has been cited during a given period of time. IF for the current year is calculated by dividing the number of citations received in the current year by the papers published over the preceding two years by the number of papers published during this two-year period.
For example, if we are interested in a journal IF for 2005, it can be calculated by use of the following equation:
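$$\mathrm{IF}_{2005} = \frac{\text{number of citations received in 2005 by papers published in 2003 and 2004}}{\text{number of papers published in 2003 and 2004}}$$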
This way, a new impact factor is calculated for each year. It should be noted that impact factor for the current year can only be calculated in the following year.
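The two-year calculation described above can be sketched in a few lines of code. The function name, the dictionary-based input format and all figures below are hypothetical, chosen only to make the arithmetic visible; they are not data from Journal Citation Reports.

```python
def impact_factor(year, citations_received, papers_published):
    """Two-year impact factor for `year`.

    citations_received[y]: citations received in `year` by papers published in year y
    papers_published[y]:   number of papers the journal published in year y
    """
    window = (year - 1, year - 2)  # the two preceding years
    papers = sum(papers_published[y] for y in window)
    if papers == 0:
        raise ValueError("no papers published in the two preceding years")
    citations = sum(citations_received[y] for y in window)
    return citations / papers


# Hypothetical journal: 120 papers in 2003 and 130 papers in 2004,
# cited 260 and 240 times respectively during 2005.
citations_in_2005 = {2003: 260, 2004: 240}
papers = {2003: 120, 2004: 130}

if_2005 = impact_factor(2005, citations_in_2005, papers)  # 500 / 250 = 2.0
```

Because the numerator counts citations received throughout 2005, the value can only be computed once that year has ended.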
Based on IF, periodicals are ranked from the most to the least influential; however, it should be borne in mind that only journals within the same scientific field can be compared on the basis of this indicator. The reason is that the number of journals published and the citation practice vary from field to field. The fields with very high IFs are also the fields with a very large number of journals and published papers, and thus a high number of citations per paper. These are rapidly developing fields, such as biochemistry and molecular biology, where the highest IF for 2005 was 33.456. In smaller fields, on the other hand, fewer papers are published, so the expected citation rate is low. A good example is mathematics, where the highest IF in 2005 was 2.323 (6). The lists of journals ranked by IF serve, among other things, to guide scientists in choosing the journal to which to submit a paper. Publishing in a journal with a very high IF is a matter of prestige, and most scientists want to have their manuscripts published in these periodicals (4). Although this parameter was designed as an aid in assessing journal quality, it is now mainly applied to assess the quality of a scientist’s work and of individual papers, which is not fully justified. The true response to a paper is better indicated by the number of citations it receives from other papers, irrespective of the journal’s IF. This is supported by the fact that in the majority of journals, 20% of the articles contribute some 80% of the citations, while a large proportion of articles are never cited (5).
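To see why a journal-level average says little about an individual paper, the following sketch computes the share of a journal's citations contributed by its most cited articles. The function name and the citation counts are invented for illustration and are constructed to mirror the 20%/80% pattern mentioned above; they are not real data.

```python
def top_share(citation_counts, top_fraction=0.2):
    """Share of all citations contributed by the most cited `top_fraction` of articles."""
    ranked = sorted(citation_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / total


# Hypothetical citation counts for the ten articles of one journal.
counts = [40, 24, 6, 4, 3, 2, 1, 0, 0, 0]

share = top_share(counts)                 # 64 of 80 citations come from 2 articles
mean_citations = sum(counts) / len(counts)
uncited = sum(1 for c in counts if c == 0)
```

In this made-up journal the mean is 8 citations per article, yet eight of the ten articles lie below that mean and three are never cited, so the average describes the journal rather than any typical paper in it.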
 

Rate of citation

For a number of reasons, caution is warranted when using the number of citations as an indicator of the quality of a particular paper. Although a scientist’s contribution is frequently reflected in a high citation rate of his/her papers, one cannot claim that papers receiving no citations are not read at all or that they have no scientific value. In addition, a citation need not imply credit or approval; it may also be motivated by the need to correct, criticize or disapprove of the ideas and work of others. Furthermore, self-citations, where authors refer to their own previous papers, are regularly found in the bibliographies of scientific papers; in this sense, self-citation is a natural process. Yet self-citation may also be used to raise the citation rate (one’s own, or that of colleagues from the same school, university or institute) in a rather artificial way. Most authors agree that a self-citation rate of 10% to 20% is acceptable, depending on the field of research. In Croatia, however, the validation of a scientist for appointment to a scientific-educational degree must be supported by evidence of the response to the scientist’s papers, measured by the number of citations, including self-citations, without any further explanation of the nature of these citations.
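A self-citation rate of the kind discussed above could be estimated as follows. The sketch assumes one common working definition, namely that a citation counts as a self-citation when the citing paper shares at least one author with the cited paper; other definitions exist, and the author names and data are hypothetical.

```python
def self_citation_rate(cited_authors, citing_author_lists):
    """Fraction of citing papers that share at least one author with the cited paper.

    Assumes authors are identified by consistently spelled names, which real
    data often does not guarantee.
    """
    if not citing_author_lists:
        return 0.0
    cited = set(cited_authors)
    self_citations = sum(
        1 for authors in citing_author_lists if cited & set(authors)
    )
    return self_citations / len(citing_author_lists)


# Hypothetical paper by two authors, cited by five later papers.
cited_paper_authors = ["Author A", "Author B"]
citing_papers = [
    ["Author A", "Author X"],   # self-citation: shares Author A
    ["Author Y"],
    ["Author Z", "Author W"],
    ["Author V"],
    ["Author U"],
]

rate = self_citation_rate(cited_paper_authors, citing_papers)  # 1 of 5
```

A rate of 1 in 5 lies at the upper end of the 10% to 20% range that the text describes as generally acceptable.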
Although ISI emphasizes the multidisciplinary and international character of its databases, along with the quality of the periodicals processed, the databases have received considerable criticism throughout their history concerning the representation of national journals and of particular disciplines. This criticism has addressed their orientation toward journals published in English and in industrialized countries (mostly in the USA), at the expense of small countries, developing countries, and non-English speaking areas. In addition, big commercial publishers like Elsevier, Springer, Wiley, Blackwell, etc. have an advantage over small publishers such as academic institutions or professional organizations. Unlike the small, non-profit, voluntary scientific journals of the latter, the journals of the former have significantly better chances of entering the “core” of world knowledge, thanks to well-established business mechanisms and great financial, technological and manpower resources, from journal design through the system of paper reviewing to access to large journal “packages” in the form of full-text databases easily available to scientists worldwide (2). This is another shortcoming of the current model of scientific validation, which is based on indexing and citation data derived from ISI databases.
Finally, it should be noted that the quantitative indicators presented (number of published and indexed papers, number of citations, and impact factor) cannot be used as quality indicators or as exclusive parameters in the validation of a journal or of a scientist’s work. In the validation of scientific contribution, quantitative parameters can only serve as an adjunct to the content-based validation of papers through the reviewing procedure performed by competent experts.
 

Databases Relevant for the Fields of Biomedicine and Natural Sciences

 
A number of bibliographic and citation databases relevant for the fields of biomedicine and natural sciences are briefly described below. Web addresses are listed only for those databases to which access for Croatian university and scientific institutions has been ensured by the Croatian Ministry of Science, Education and Sports. (The list of all databases accessible to the Croatian scientific community can be found on the pages of the Online Database Center: http://www.online-baze.hr).
 

Current Contents (CC)

Current Contents (CC), the best known database in the world, covers all fields of science through its seven sections: Agriculture, Biology and Environmental Sciences; Clinical Medicine; Engineering, Technology and Applied Sciences; Life Sciences; Physical, Chemical and Earth Sciences; Social and Behavioral Sciences; and Arts and Humanities. All these sections are integrated into a single database. CC was preceded by a printed publication developed in the 1960s, which reproduced the tables of contents of individual journal issues (hence the name “current contents”) grouped by scientific discipline, with the aim of regularly informing scientists worldwide about newly published scientific data. Currently, CC covers, in the form of bibliographic records, more than 7,500 of the world’s leading periodicals and more than 2,000 books and congress proceedings.
 

Web of Science (WoS)

Web of Science (WoS) integrates three citation databases, i.e. Science Citation Index (SCI), Social Sciences Citation Index (SSCI) and Arts & Humanities Citation Index (AHCI), which taken together index more than 8,700 scientific periodicals from all scientific fields. There is a 90%-100% overlap between Current Contents and the citation databases. When ISI launched the development of Science Citation Index as the first citation index in the 1960s, it was based on some 600 titles in the fields of natural and applied sciences. As the number of periodicals in the world grew over the following decades, the journal “core” increased accordingly. Until the advent of Scopus in 2004, Web of Science was the only citation database in the world.

Scopus

Scopus is the latest product launched by Elsevier, the biggest scientific publisher in the world. It is a bibliographic and citation database like Web of Science, but with far greater coverage. It indexes more than 14,000 peer-reviewed journals from natural, technical and social sciences and biomedicine, issued by more than 4,000 publishers. In addition to periodicals, Scopus covers 250 million quality, relevant web sites, including 13 million patents. Scopus also completely covers two separate biomedical databases, Medline and EMBASE. A specific feature of this database is that more than 60% of the journals included come from non-American countries. The conditions that a journal has to meet to be covered by Scopus are as follows: paper title and abstract in English (whereas the full text of the paper may be in any other language), regular appearance, some form of quality control (e.g., a reviewing procedure), and high overall quality (assessed by the number of citations the journal receives in Scopus, the reputation of the publisher, authors and editorial board, and some other parameters). In contrast to the ISI citation databases, which tend toward exclusivity, Scopus plans to include all valuable sources of information and to develop further as the volume of new information grows.

Medline

Medline is a first-rate source of information in the fields of medicine, population and reproductive biology, and other fields related to medicine and health care. It is produced by the U.S. National Library of Medicine. Prior to its appearance in electronic form, Medline was issued as three printed publications with separate contents, i.e. Index Medicus, Index to Dental Literature, and International Nursing Index. This database processes articles from some 4,600 periodicals published in over 80 countries, whereby only some papers are selected and indexed (as distinguished from Current Contents, which follows all articles from the indexed journals). PubMed is a free-of-charge version of the Medline database available to the public.
 

EMBASE

EMBASE is a biomedical database produced by Elsevier, preceded by the printed secondary publication Excerpta Medica. It includes approximately 5,000 periodicals from some 70 countries, covering an array of fields such as drug research, pharmacology, pharmacy, pharmacoeconomics, pharmaceutics and toxicology, clinical and experimental medicine, drug addiction and abuse, psychiatry, forensic science, and biomedical engineering and instrumentation. EMBASE selectively covers nursing, dental medicine, veterinary medicine, psychology, and alternative medicine. Like Medline, it is international in scope, but with the emphasis on European journals (in contrast to Medline, which is focused on American titles). There is an 80% overlap in titles between Medline and EMBASE. It should be noted that Biochemia Medica has been indexed in EMBASE and Scopus since 2006.

International Pharmaceutical Abstracts (IPA)

International Pharmaceutical Abstracts (IPA) includes more than 800 journals and covers information on the following fields: drug reactions, toxicity, drug trials, drug assessment, drug interactions, medicinal products of synthetic and biologic origin, drug stability, pharmacology, preliminary drug testing, pharmaceutical chemistry, drug analysis, drug metabolism, pharmacognosy and methodology. Abstracts of reports presented at major pharmaceutical meetings are also included. The American Society of Health-System Pharmacists (ASHP) launched it as a printed publication in 1964, and since the 1970s it has also been issued in electronic form. Since 2005, it has been owned by Thomson Corporation.
 

Chemical Abstracts (CA)

Chemical Abstracts (CA), produced by the Chemical Abstracts Service (CAS), a division of the American Chemical Society (ACS), is the largest and most comprehensive guide to the literature of chemistry and related fields. It covers more than 8,000 journals and patents from 26 countries as well as congress proceedings, books, dissertations, and technical reports. In addition to bibliographic data with abstracts from these publications, Chemical Abstracts provides information on more than 30 million organic and inorganic substances and on 58 million DNA sequences (a collection known as the CAS Registry). Each substance is allocated a registration number (CAS registry number) that is now widely used for uniform identification of chemical substances. SciFinder and SciFinder Scholar are the electronic equivalents of the printed secondary publication Chemical Abstracts.
 

Biological Abstracts (BA)

Biological Abstracts (BA) covers the literature in natural sciences, including microbiology, biology, biochemistry, biomedicine, biotechnology, botany, ecology, genetics, nutrition, and pharmacology. There is a 30% overlap between Medline and Biological Abstracts; however, Medline is focused on clinical medicine, whereas Biological Abstracts provides more thorough coverage of preclinical and experimental medicine, pharmaceutical botany, pharmacognosy, proteomics, nanotechnology, and gene therapy. It covers over 5,000 periodicals from more than 100 countries.
 

Biological Sciences

Biological Sciences offers access to the literature in the fields of biochemistry, biotechnology, ecology, genetics, microbiology, molecular biology, zoology and some aspects of agriculture, medicine and veterinary medicine. This database covers more than 6,000 periodicals, congress proceedings, technical reports, books and patents from 1982 to the present. It is produced by Cambridge Scientific Abstracts (CSA).

References

1.     Zwemer RL. Identification of journal characteristics useful in improving input and output of a retrieval system. Fed Proc 1970;29(5):1595-1604.
2.     Jokić M. Bibliometrijski aspekti vrednovanja znanstvenog rada. Zagreb: Sveučilišna knjižara; 2005.
3.     The Thomson scientific journal selection process. Available at: http://scientific.thomson.com/free/essays/selectionofmaterial/journalsel... Accessed October 26, 2006.
4.     Lawrence PA. The politics of publication. Nature 2003;422(6929):259-261.
5.     Gisvold SE. Citation analysis and journal impact factors – is the tail wagging the dog? Acta Anaesth Scand 1999;43(10):971-973.
6.     Journal Citation Reports, Science Edition, 2005. Available at: http://portal.isiknowledge.com/. Accessed October 28, 2006.