Accessing the content of nineteenth-century periodicals: the Science in the Nineteenth-Century Periodical project (SciPer)

Nineteenth-century periodicals significantly outnumber books from that era, and present historians with an immensely valuable set of sources, but their use is constrained by the difficulty of identifying relevant material. For many periodicals, contents pages and volume indexes have been the only guide, and the few subject indexes that exist usually provide only an indication of the subjects mentioned in the article titles. By contrast, the Science in the Nineteenth-Century Periodical project (SciPer) indexed the science content of general-interest periodicals by skim-reading the entire text. The project's approach to indexing is described and the relative merits of indexing and digitization in aiding researchers to locate relevant material are discussed. The article concludes that, notwithstanding the more sophisticated search interfaces of more recent retro-digitization projects, human indexing still has an important role to play in providing access to the content of historic periodicals and in mapping their data structure.

JON TOPHAM
Lecturer in History of Science, University of Leeds

As they have strained to house the vast runs of dusty and rarely consulted quarto and octavo volumes, librarians have had good reason to be far more acutely aware than most academics of the immense bulk represented by nineteenth-century periodicals. Even so, the recent assertion by John North that 'periodicals and newspapers are more than 100 times the volume of printed books' might be greeted with surprise. 1 There is, nevertheless, clear evidence that, if not exceeding books by quite such a margin, nineteenth-century periodicals and newspapers did exceed them significantly, both in numbers of titles and in commercial value. North's own Waterloo Directory of English Newspapers and Periodicals, 1800-1900 suggests that, by the end of the century, more than 20,000 periodical and newspaper titles were in print in England, while the Nineteenth-Century Short-Title Catalogue suggests that only a third that number of new books were issued each year in the main British publishing centres. 2 This disparity is also reflected in Simon Eliot's finding that, when measured by net commercial value of printed output, periodicals and newspapers represented two-thirds as much again as books in 1907, a figure which rose to 250% more than the value of books by 1924. 3 Such findings as these give ample justification for John North's claim that 'Periodical literature is the largest single source of Victorian material available to us'. 4 However, despite the activities of a dedicated body of periodical researchers (the Research Society for Victorian Periodicals will shortly celebrate its fortieth anniversary), historical scholars have generally continued to be circumspect in their use of this vast and valuable corpus of material. Among the leading reasons for this have undoubtedly been the difficulties of access. With such an overwhelming weight of printed matter, how are historians to find material relevant to their particular researches? Even if they are aware of the relevant periodicals (an assumption it is far from safe to make), how are they to pinpoint the key references across many thousands of pages?
In this article, I will begin by considering the traditional means of accessing the content of Victorian periodicals. I will then outline the approach taken to the problem by the Science in the Nineteenth-Century Periodical project (SciPer), which involved manual indexing of periodical content. Finally, I will reflect on the relative merits of such an approach when compared with retro-digitization.
When I began periodical research some 20 years ago, much of the work had to be done by identifying relevant periodicals from existing historical literature and various periodical bibliographies and catalogues, before reading through innumerable contents pages (and volume indexes where they existed) and skim-reading the periodical texts, in order to track down relevant articles. (The first part of this exercise has certainly been eased in the interim by the advent of John North's monumental Waterloo Directory, which, by allowing subject-searching, has provided an invaluable means of identifying relevant periodicals.) In addition, however, there were in existence a relatively small number of printed indexes to the contents of periodicals which sometimes provided invaluable assistance.
The first class of index comprised author-title indexes usually based around particular periodicals or groups of periodicals. The most notable single-periodical index was Palmer's Index to The Times, which was issued separately in quarterly volumes, starting in 1868, although the advent of electronic publishing latterly provided a much more convenient, fully integrated version. Perhaps more common were author-title indexes of groups of periodicals, such as annuals or little magazines. Particularly significant here, of course, was the Wellesley Index to Victorian Periodicals (5 vols, 1966-89), which provided an index to articles in 43 leading magazines and reviews, identifying anonymous authors as far as possible. More recently, Chadwyck-Healey's online Periodicals Content Index has taken a similar approach to a far wider range of periodicals, although without the scholarly focus on the identification of anonymous authors.
A second class of index, which to some degree graduated into this first class, comprised periodical subject indexes. Often prompted by professional requirements, many of the early subject indexes focused on particular groups of specialist periodicals, such as the Royal Society's Catalogue of Scientific Papers, 1800-1900 (19 vols, 1867-1925).
Useful as they could be, both of these classes of index had only limited usefulness for the historian seeking to identify articles on a particular topic. Author-title indexes, of course, make no pretence to offer the reader a guide to the subjects of articles, although both title and author may incidentally do so. Even the subject indexes mentioned, however, provide only limited help. Poole's Index, for instance, takes as its subject terms words derived directly from article titles. As Scott Bennett has shown, this results in 26 articles on the Paris Commune appearing under 13 different index headings. 5 A key part of the motivation for the Nineteenth-Century Readers' Guide was the desire to substitute a standard list of subject headings for Poole's more haphazard list, but there is no evidence to suggest that subjects were assigned on the basis of any information other than the title.
Article titles are often, however, a poor guide to their content. For instance, the title of the Boy's Own Paper article, 'Men Who Are Talked About', does little to alert scholars to the extensive material it contains on both Darwin and Edison (Figure 1). Neither does the title of the Punch illustration, 'Animal Magnetism; Sir Rhubarb Pill Mesmerizing the British Lion', immediately suggest a caricature of Sir Robert Peel (Figure 2). Consequently, a number of scholars in recent years have put together subject indexes and bibliographies of articles based on a more detailed engagement with the source material. Thus, for instance, Eugenia Palmegiano's Health and British Magazines in the Nineteenth Century (1998) provides brief characterizations of medical articles from nineteenth-century magazines, the titles of many of which give no clue to their medical content. Supremely, Ruth Richardson and Robert Thorne's The Builder: Illustrations Index, 1843-1883 (1994) provides a fantastically detailed subject analysis of over 12,000 illustrations, based on personal inspection.
Such indexing is, however, extremely time consuming. When the Science in the Nineteenth-Century Periodical project was inaugurated at the Universities of Leeds and Sheffield eight years ago, the project staff seriously debated whether detailed manual indexing of this sort was warranted in an increasingly digital age. Would the advance of digitization rapidly supersede manual indexing? Our conclusion at the time was that it would not. Experience with existing retro-digitization projects, like the Internet Library of Early Journals, suggested that searches could return huge numbers of hits, many of which were irrelevant or trivial. For instance, searching a 20-year run of Blackwood's Edinburgh Magazine returned 3,247 hits for the word 'science', including multiple references to 'con-science'. 6 Employing trained historians to index the material would, we believed, add considerable value to the data that could only be added by such means.
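The 'con-science' problem can be illustrated with a minimal sketch. The sample text and the function names below are hypothetical, and nothing here is meant to describe how the Internet Library of Early Journals actually implemented its search. The sketch only shows how naive substring matching over text containing end-of-line hyphenation yields spurious hits that a stricter whole-word match avoids.

```python
import re

# Hypothetical OCR output: 'conscience' was hyphenated across a line break
# on the printed page and captured as 'con-science'.
SAMPLE = (
    "The progress of science in this age is remarkable. "
    "He was troubled in his con-science by the affair. "
    "Men of science met in Edinburgh."
)


def substring_hits(text, term):
    """Count every occurrence of term, even inside a longer word."""
    return len(re.findall(re.escape(term), text, flags=re.IGNORECASE))


def whole_word_hits(text, term):
    """Count occurrences of term not attached to letters, digits or a hyphen."""
    pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


print(substring_hits(SAMPLE, "science"))   # 3, including 'con-science'
print(whole_word_hits(SAMPLE, "science"))  # 2
```

Even the stricter match only removes the spurious hits; it cannot tell a trivial mention of science from a significant one, which is the judgement the project assigned to trained historians.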
The aim of the SciPer project was to identify and analyse the representation of science, technology and medicine, as well as the interpenetration of science and literature, in the general periodical press in Britain between 1800 and 1900. The approach taken was consequently to skim-read each periodical from cover to cover, however unlikely it seemed that individual articles would contain relevant material, in order to locate significant scientific references incorporated in fiction, news commentaries, poetry and even sermons. Articles were included in the index if references were either of relevance to the current interests of historians, or revealing of the science as it was conceived in its original historical context. However, while articles containing longer references of scientific relevance were catalogued however conventional their contents, those containing brief or passing references were handled with more selectivity. Thus, where a passing reference of scientific relevance was similar in content to many much longer references appearing in the same journal, it was omitted, while a passing reference of a more unusual or novel nature was generally included.
All articles selected for indexing were assigned a standard bibliographical record, but the project's larger aspirations demanded that additional interpretative information be provided, most notably genre classifiers (drawn from a restricted thesaurus of 65 terms) and subject classifiers (from a similar thesaurus of 350 terms). The subject classifiers are of two types, describing either scientific fields, like 'economic geology', 'matter theory', or 'physiology', or historical themes, like 'amateurism', 'experiment', or 'medical practitioners'. In addition, the index provides regularized records of people, institutions and publications mentioned in significant ways in the articles, with special records to indicate when publications were reviewed, extracted, abstracted, or merely noticed. Most of the entries also include a descriptive paragraph, ranging from 15 to 1,500 words, giving further information about the article content and sometimes containing quotations (Figure 3).
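The record structure just described can be sketched as a small data model. All class names, field names, the genre term 'fiction' and the sample entries are illustrative assumptions, not the project's actual schema or data; only the four publication relations and the subject terms 'physiology' and 'experiment' come from the description above. The sketch also shows the kind of fielded query that such structured records make possible.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PublicationRelation(Enum):
    """The four ways in which a mentioned publication may be recorded."""
    REVIEWED = "reviewed"
    EXTRACTED = "extracted"
    ABSTRACTED = "abstracted"
    NOTICED = "noticed"


@dataclass
class Person:
    name: str
    gender: Optional[str] = None      # hypothetical field
    birth_year: Optional[int] = None  # hypothetical field


@dataclass
class PublicationMention:
    title: str
    relation: PublicationRelation


@dataclass
class IndexEntry:
    # Standard bibliographical record (field names are assumptions).
    periodical: str
    volume: int
    title: str
    authors: List[Person] = field(default_factory=list)
    # Interpretative classifiers drawn from restricted thesauri.
    genres: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    # Regularized records of what the article mentions.
    people: List[Person] = field(default_factory=list)
    institutions: List[str] = field(default_factory=list)
    publications: List[PublicationMention] = field(default_factory=list)
    description: str = ""


def find_entries(entries, genre, subject, author_gender, born_from, born_to):
    """Return entries with the genre and subject and at least one matching author."""
    matches = []
    for entry in entries:
        if genre not in entry.genres or subject not in entry.subjects:
            continue
        for author in entry.authors:
            if (author.gender == author_gender
                    and author.birth_year is not None
                    and born_from <= author.birth_year <= born_to):
                matches.append(entry)
                break
    return matches


# Invented sample data, for illustration only.
ENTRIES = [
    IndexEntry("Example Magazine", 1, "A Tale of the Laboratory",
               authors=[Person("A. Writer", "female", 1820)],
               genres=["fiction"], subjects=["physiology"]),
    IndexEntry("Example Magazine", 2, "Notes on Experiment",
               authors=[Person("B. Writer", "male", 1810)],
               genres=["essay"], subjects=["experiment"]),
]

hits = find_entries(ENTRIES, "fiction", "physiology", "female", 1800, 1850)
print([e.title for e in hits])  # ['A Tale of the Laboratory']
```

Keeping genres and subjects as terms from a closed vocabulary, rather than free text, is what allows a query of this kind to be exact instead of depending on the wording of the article.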
The SciPer Index contains entries for approximately 15,000 articles indexed from runs of sixteen different periodicals, chosen to reflect a wide variety of genres across the century. 7 These range from the Wesleyan-Methodist Magazine to the Review of Reviews, and from the Youth's Magazine to Punch. The index can be accessed in both browsable and searchable interfaces. By allowing historians to browse through the index entries within a framework of periodical volumes, issues, sections and subsections, the main browsable interface to some extent replicates the original structure of the periodical, providing the context that may be crucial for historical interpretation. In addition, historians can browse through static indexes of the authors and illustrators of articles, and of people, books, periodicals and institutions that are mentioned in them, linking through to all relevant articles. These relatively conventional elements of the periodical index are supplemented by a complex search facility, using the full range of indexed fields to permit highly sophisticated searches. Thus, the researcher can search for all fictional articles relating to botany written by women born between 1800 and 1850, and indeed there are some!
It is easy to see that the SciPer Index circumvents the problems identified above with both the title-based printed subject indexes of the past and the first generation of digitized periodicals. The index identifies relevant references within articles, while avoiding the indiscriminately large number of hits encountered with full-text searching. As one reviewer has observed, 'If the scanner is the combine harvester of electronic scholarship, then SciPer (in contrast) should be seen as a quality organic producer'. 8 Of course, the main drawback of this approach is that it is immensely time consuming, and in three years the project team indexed 160 volumes.
In conclusion, therefore, I will reflect on the advance of retro-digitization in the eight years since the SciPer project began, and re-examine the question of its relation to manual indexing.
The progress in retro-digitization of nineteenth-century periodicals since 1999 has been considerable. At the latest count, 60 of the titles digitized in JSTOR include nineteenth-century runs, and back-files from many commercial publishers have added dozens more. Two significant JISC-funded projects (the Medical Journals Backfiles Digitisation project and the British Library Nineteenth-Century Newspapers project) are in the process of adding further titles. 9 Moreover, Chadwyck-Healey have developed their PCI Full Text product (recently renamed Periodicals Archive Online) to incorporate an increasing range of nineteenth-century titles. Particularly notable in this regard is the launch in 2006 of the first part of their British Periodicals Online, containing 160 titles from the UMI microfilm collection Early British Periodicals. Furthermore, Thomson Gale are preparing to launch their impressive full-text collection of nineteenth-century periodicals in the near future, which will make a further significant impact on the field.
The advent of these resources marks an epoch in historical scholarship. The quantity of often difficult-to-locate periodical material which they make widely and easily available is staggering. Moreover, the capacity they provide for locating references to individuals, institutions, publications, events and concepts can circumvent weeks, months, or even years of painstaking research, and thus make possible research projects that were previously unthinkable. Do they, therefore, render the retro-indexing of SciPer redundant? My view is that they do not. The quality of OCR has improved immeasurably since the Internet Library of Early Journals (ILEJ) project, leading to fewer spurious search results, and publishers have sought to provide increasingly sophisticated search interfaces using complex metadata and results clustering. Nevertheless, searches still frequently produce prohibitively large and inchoate data sets, which threaten to submerge the historian. Of course, one aspect of the problem is that historians need to develop new techniques to maximize both the efficiency of their research and the historical reliability of their findings. However, there are limits to the extent to which electronic searching of these relatively unstructured texts can be made to answer the historian's requirements. Thus, to take a single example, a search for 'music and science' in British Periodicals Online does not return a hit for the anecdote in the Mirror of Literature concerning the geologist Jean Deluc's love of music, whereas the SciPer Index identifies it by the subject keywords, 'Music' and 'Scientific Practitioners'.
The continuing value of human indexing is clearly recognized by those involved in retro-digitization. As I have already observed, the capture of metadata to allow more structured searching is, to a greater or lesser extent, a standard feature of the leading digitization projects. Moreover, Chadwyck-Healey has been quick to exploit existing indexes to provide enhanced searching, both with their Palmer's Full-Text Online and, more recently, with their cross-linking between electronic versions of the Wellesley and Poole's indexes and full-text periodicals in Periodicals Archive Online. This kind of amalgamation of digitization and traditional indexing clearly has the potential to provide the best of both worlds, and suggests a continuing role for retro-indexing. In addition, academic indexing provides an important conceptual resource for large-scale digitization projects. By exploring and analysing the data structure of the periodical in order to produce a useful index, the historical or literary scholar provides conceptual tools of importance in further developing the structuring of digital texts. This is one of the aspirations of the Nineteenth-Century Serials Edition project (NCSE), currently underway at Birkbeck College. The project is preparing an electronic 'exemplary edition' of six nineteenth-century newspaper and periodical titles, combining full-text images with sophisticated searching and indexing functions. 10 Seeking to address the concern that digitization can readily flatten out many elements of the data structure of the original periodical, particularly as regards its textual and illustrative content and its physical form, they have developed data and concept maps of extraordinary complexity. Aspects of this mapping will inform the data architecture of their own edition, but, in addition, they consider this an opportunity to 'stretch and develop current perspectives and ways of thinking about these materials'. 11
Moreover, the project is combining its exhaustive conceptual analysis with pioneering use of automated indexing techniques, so that, while the assignment of metadata is dependent on the initial academic analysis, academics do not have to spend their time assigning the metadata manually. Such techniques include automated indexical mark-up (e.g. from a name index), and data-mining (pattern-searching the dataset and relating the findings to a concept map), which might make possible the automated generation of thematic metadata. The combination of conceptual and technical analysis undertaken by projects like the NCSE promises significant dividends for those engaged in much more extensive retro-digitization projects. I, for one, await the outcome of their researches with great interest. Thus, my general conclusion remains, that academic interest in indexing nineteenth-century periodicals will continue to provide important resources for those involved in digitization, as we strive to achieve increasingly sophisticated means of accessing this vast and under-used resource.