Science in the Archives Pasts, Presents, Futures
edited by Lorraine Daston
University of Chicago Press, 2017
Cloth: 978-0-226-43222-9 | Paper: 978-0-226-43236-6 | Electronic: 978-0-226-43253-3
DOI: 10.7208/chicago/9780226432533.001.0001

ABOUT THIS BOOK

Archives bring to mind rooms filled with old papers and dusty artifacts. But for scientists, the detritus of the past can be a treasure trove of material vital to present and future research: fossils collected by geologists; data banks assembled by geneticists; weather diaries trawled by climate scientists; libraries visited by historians. These are the vital collections, assembled and maintained over decades, centuries, and even millennia, which define the sciences of the archives.
 
With Science in the Archives, Lorraine Daston and her co-authors offer the first study of the important role that these archives play in the natural and human sciences. Reaching across disciplines and centuries, contributors cover episodes in the history of astronomy, geology, genetics, philology, climatology, medicine, and more—as well as fundamental practices such as collecting, retrieval, and data mining. Chapters cover topics ranging from doxography in Greco-Roman Antiquity to NSA surveillance techniques of the twenty-first century. Thoroughly exploring the practices, politics, economics, and potential of the sciences of the archives, this volume reveals the essential historical dimension of the sciences, while also adding a much-needed long-term perspective to contemporary debates over the uses of Big Data in science.

AUTHOR BIOGRAPHY

Lorraine Daston is director of the Max Planck Institute for the History of Science in Berlin and is visiting professor in the Committee on Social Thought at the University of Chicago.

REVIEWS

“Science in the Archives achieves startling coherence despite its enormous range. The science at stake embraces varieties of knowledge making that include scientific disciplines like genetics, astronomy, and climatology, yet that also reach back to Antiquity and forward to the databases of today. Each of the twelve chapters argues a different case, together unfolding the crucial generative power of archival practice. This volume—rich, rigorous—should be required reading for anyone who thinks the sciences and the humanities are really distinct domains.”
— Lisa Gitelman, New York University

“Renowned historian of science Daston and her line-up of stellar scholars show that how data and information are organized is part of the scientific process. This essential book traces how archives provided crucial support to the process of creating scientific data.”
— Jacob Soll, University of Southern California

“This pathbreaking book brilliantly illuminates how scientific work consistently relies on the making and keeping of records. Twelve richly researched studies highlight long continuities in the hopes and resources invested in archiving of scientific research for current and future use.”
— Ann Blair, Harvard University

“The twelve essays in this elegantly crafted volume explore, as editor Lorraine Daston puts it, ‘how the sciences choose to remember past findings and plan future research.’ They look at ways in which scholars have preserved and ordered scientific knowledge from antiquity to the present….[T]he book raises important historical questions about how scholars know what has been done in the past, incorporate it into their own work, share it with others, and plan to preserve it for the future. At the same time, it offers intriguing insights into the practice of scholarly communities over a wide swathe of western history and a model of individual papers transformed into a coherent and readable whole.”
— Mathematical Association of America

“This volume’s articles deal with historically universal problems of accumulation, preservation, management, interpretation, and dissemination of data. The articles explore how stores of observations were used in the past and contemplate present techniques for retrieving electronic information. Additionally, the articles consider the types of documents in which knowledge was recorded, and evaluate the consequences of ‘data deluge.’ Political controversies have arisen over how data is gathered and analyzed, perhaps particularly in those disciplines in which the science is its archive, such as evolutionary genetics and climatology. Ownership of information and access to evidence are perennial dilemmas. Most authors focus on the biological sciences, although astronomy and paleontology are also addressed. Like editor Daston, director of the Max Planck Institute for the History of Science in Berlin, the contributors are experienced and well-regarded historians of science. While the text is likely of most interest to researchers or readers of works such as Ann Blair's Too Much to Know, this reviewer can see herself using parts of this collection in a historiography class, together with a textbook such as John Tosh's The Pursuit of History, now in its sixth edition, or The Houses of History, edited by Anna Green and Kathleen Troup. Recommended.”
— Choice

“engaging and accessible . . . these essays cover a wide range of disciplines and eras: from the aforementioned ancient astronomical records to Victorian medical case histories to contemporary data mining. Science in the Archives nonetheless manages to achieve thematic coherence by focusing readers’ attention on the ways in which scientific knowledge continually evolves amid a constant flux of tools, techniques, and scientific traditions.”
— Isis

“I found it illuminating to recognize, gradually, the subtle connections that link these diverse papers. The experience of the reader recapitulates, in a way, the process of production. Daston’s technique is to choose an open-ended subject, then to bring together about a dozen original scholars with diverse interests and let them gradually settle on their individual topics. The resources of the Max Planck Institute enable them to meet together several times over a few years, and the looseness or contestability of the framing stimulates them to think through their topics in relation to one another, without seeking a tight consensus even on definitions.”
— Journal of Modern History

“Ambitiously innovative . . . Science in the Archives is the fruit of a working group at the Max Planck Institute for the History of Science in Berlin. It assembles the diverse thinking of scholars who come from different directions yet animated by a shared curiosity in archives as material things, created with effort, and engendering repeated questioning, in some cases over hundreds of years.”
— Studies in History and Philosophy of Biology & Biomedical Science

TABLE OF CONTENTS

Preface


DOI: 10.7208/chicago/9780226432533.003.0000
[scientific archives;media revolutions;timescales;data storage;data retrieval;libraries;collections;third nature]
First nature is the teeming, tangled complexity of phenomena as they happen; second nature is the systematic and selective investigation of phenomena in the laboratory, the observatory, and the field; third nature is the repository of those findings from second nature selected to endure in the archives of science. In contrast to the laboratory, observatory, and field, the archives of the sciences – the herbaria of the botanists, the observational records of the astronomers, the digital silos of the climate researchers, the libraries of everything from parchment scrolls to digital scans – have been largely invisible as a site of science. We are, however, in the midst of an archival moment, as new digital media raise questions about how to insure the continuity of the scientific archives that make cumulative, collective knowledge possible in both the natural and human sciences. This is not the first such archival moment – the print revolution of early modern Europe triggered similar anxieties about irretrievable loss and visions of unlimited storage – and reflections on the long and diverse history of scientific archives and their associated practices of collection, selection, preservation, transmission, and retrieval hold lessons for the present. (pages 1 - 14)
This chapter is available at:
    University of Chicago Press

I. Nature’s Own Canon: Archives of the Historical Sciences

- Florence Hsia
DOI: 10.7208/chicago/9780226432533.003.0001
[applied historical astronomy;eclipse canon;canon;print culture;celestial history;data deluge]
The longue durée of natural processes studied by astronomers seems to call for an appropriately comprehensive database, a total archive encompassing all celestial phenomena observed throughout human history. This archival desideratum, however, is a relatively recent development. Grounded in the field of applied historical astronomy and the genre of the eclipse canon, the modern ideal of the total astronomical archive has older roots in the gradual reversal of ancient conceptions of canonicity; changing attitudes towards the probative value of empirical material; an evolving print culture of astronomical data (notably the early modern genre of celestial history); and shifts in the moral economy of astronomical practice. Close attention to such disciplinary investments reveals the complexity of astronomy’s long struggle to navigate its data deluge. (pages 17 - 52)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

- David Sepkoski
DOI: 10.7208/chicago/9780226432533.003.0002
[paleontology;databases;historical contingency;second natures;fossils]
Paleontology has, from its very beginnings, been a science deeply concerned with its archive. As this chapter argues, the fossil record—paleontology’s curated archive—has taken a variety of forms over the past two hundred years: it has sometimes been understood to be the collected physical specimens stored in museum cabinets; at others, the synoptic taxonomic information compiled in illustrated atlases or textual catalogs since the early nineteenth century; more recently, as an electronic archive of data. But these textual and digital archives have explicitly referred to an original, “natural” archive—the preserved strata of the earth itself—and may be considered to be attempts to construct a series of “second natures” that have sequentially reconfigured the natural archive as archives of information and data. One of the central features of paleontological archives is that they preserve historical contingency—sequences of events that took place at particular times, in a fixed order. This chapter explores the practical and epistemic consequences of the serial “second naturing” of archival configurations in paleontology over the past two hundred years, concluding that, despite the potential decontextualizing effects of recent electronic database analyses, contingency and narrative have remained an essential feature of paleontological archives. (pages 53 - 84)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

- J. Andrew Mendelsohn
DOI: 10.7208/chicago/9780226432533.003.0003
[case;data;disease;bibliography;periodical;library;archive;hospital medicine;clinical medicine;evidence-based medicine]
This chapter is about medicine’s published cases – not so much how they are written as how they are used. Medicine is shown to exemplify an often-distant relationship – temporal, spatial, cognitive – between observing and knowing. Between them stands a vast library of data, of description in the form of cases. More than a record of observations, this library is a record of their readings and re-readings. In the 16th-18th centuries, most case reports or observationes found their place in encyclopedic compilations. In the late 18th and 19th centuries, as cases came increasingly to be published in periodicals and discussed in local medical societies and reviews, “new” diseases of modern clinical medicine emerged – as case literatures. Examples include leukemia, Hodgkin’s disease, and Stokes-Adams disease. These processes are shown through examples from major medical centers, such as Edinburgh, Paris, London, Dublin, Vienna, and Berlin, as well as from early-modern Italian, Swiss, and German physicians. Medicine’s renaissance of observation after 1500, of “autopsia” and bedside empiricism, equally inaugurated 500 years of data-mining. Much medical research has been library research, and this has been empirical research. The library of cases was not only written knowledge. It was the written, investigable unknown. (pages 85 - 110)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

II. Spanning the Centuries: Archives from Ancient to Modern

- Liba Taub
DOI: 10.7208/chicago/9780226432533.003.0004
[doxography;ancient Greek;ancient Roman;Aristotle;textual format;natural philosophy;medicine;archiving;opinion;information retrieval]
Ancient Greeks and Romans confronted the need to select and store information, with a view to subsequent retrieval. A number of textual formats were used to store, organise and permit retrieval of various sorts of information. This chapter argues that certain types of texts (including doxographies, or collections of opinions) provided an archival function for studying topics and issues in scientific fields such as natural philosophy and medicine, allowing the accumulation, organisation, retrieval and use of data and ideas. Considering the functions of these and related texts can shape our understanding of ancient Greek and Roman scholarly and investigative methods. A crucial feature of doxographical texts is their archiving of already-produced opinions about nature, rather than ‘raw data’. In some cases, as Aristotle and others noted, we have to work with opinions because of a lack of data. One aim of doxography, as an ancient archiving text, is to allow the philosophical and scientific work of hypothesizing and explaining to proceed. It is perhaps only in later periods that ancient doxography becomes a repository of what had been selected to endure as archives of past philosophical and scientific enterprises, rather than as material to be mined for active scientific projects. (pages 113 - 136)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

- Suzanne Marchand
DOI: 10.7208/chicago/9780226432533.003.0005
[ancient history;source criticism;history of historiography;Quellenkritik;Leopold von Ranke;Historicism;George Grote]
This essay traces the impact of the rise of modern source criticism, Quellenkritik, on the credibility of ancient history in the late eighteenth and early nineteenth centuries. It argues that thoroughgoing, historicized source criticism had a corrosive effect on ancient history in particular, calling into question the veracity of previously admired ancient writers such as Thucydides and especially Herodotus. In an age, also, of greater and greater scholarly specialization, older forms of ‘universal history’ became increasingly suspect, and writers began to focus on individual, national trajectories. It was against this background that Leopold von Ranke and other nineteenth-century would-be authors of ‘scientific’ history made the use of archives—for which ancient historians had no real equivalent—the foundation of their scholarly authority. Since that time, modern and ancient historians have largely parted methodological ways—to the detriment, it is argued, of both. (pages 137 - 158)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

- Lorraine Daston
DOI: 10.7208/chicago/9780226432533.003.0006
[Corpus Inscriptionum Latinarum;Carte du Ciel;Theodor Mommsen;Big Science;astrophotography;scientific archives;epigraphy;positivism;scientific progress]
Big Science (and Big Humanities) were invented in the nineteenth century. Two huge, expensive, and long-lived projects, the Corpus Inscriptionum Latinarum (CIL) of the classical philologists and the Carte du Ciel of the astronomers, created disciplinary archives intended to serve future researchers for centuries and even millennia to come. The CIL would transcribe and publish all known Latin inscriptions from the length and breadth of the ancient Roman empire before they succumbed to the depredations of time. The Carte du Ciel would use the new methods of astrophotography to create a photograph of the sky as seen from the earth circa 1900, which future astronomers could use to detect phenomena that unfolded on a superhuman timescale. Both projects involved international cooperation, industrial-style organization, state funding, and disciplinary stamina on an unprecedented scale. Both raise questions about the investment of resources: why create the archives for future research instead of channeling energies and funds into present inquiry, especially when the topics of future research are uncertain? The answer lies in the melancholy realization of second-wave positivism that the price of progress was ephemeral scientific knowledge. Only the archives seemed to promise permanence. (pages 159 - 182)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

III. Problems and Politics: Controversies in the Global Archive

- Bruno J. Strasser
DOI: 10.7208/chicago/9780226432533.003.0007
[big data;data deluge;database;authorship;moral economy;collections;databases;open science;GenBank;Protein Data Bank]
How did scientific databases become filled with data? And how did they become a cornerstone of open science? Framing databases within the broader history of scientific collections provides some answers. Natural history collections were often assembled with the help of wide networks of amateurs. But in the 20th century, when experimentalists attempted to build their own collections, they had no amateur community to rely on. Professional researchers were often unwilling to share their data. Within the moral economy of the experimental sciences, “data” was something that belonged to the private archive of the experimentalist. Scientific journals made the researchers' data public in exchange for the attribution of authorship and credit. In order to create a steady flow of data from the researchers’ private archive to public databases, key scientists and database managers developed new strategies to align data deposition with the individual interest of researchers. Data submission became a condition for scientific publication, a policy enforced by journal editors. Other forms of incentives, based on data authorship and data citation, were established to reward scientists sharing data, creating our current “data deluge”. (pages 185 - 202)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

- Cathy Gere
DOI: 10.7208/chicago/9780226432533.003.0008
[human evolution;genetic archive;scientific ethics;biocolonialism;indigenous genetics]
This chapter explores the politics and ethics of the archiving of human genetic material for the purpose of evolutionary analysis. The science of human evolution has its roots in nineteenth-century taxonomies of race. The chapter investigates how the legacy of racial science was subverted, challenged, replicated, and transformed in human genetic archiving practices since WWII. In the immediate postwar period the investigation of human evolution was led by a network of geneticists who believed that good science was the best antidote to racism. At the same time as they repudiated crude definitions of race, they were collecting and storing the blood of indigenous people whom they defined as racially and genetically pure and therefore useful for reconstructing the story of the human journey out of Africa. Starting in the 1970s, the assumption that good science was all that was needed to combat racial prejudice came under increasing pressure. By the first decades of the twenty-first century a number of legal and political victories against the collection and analysis of indigenous genomes had forced the scientists to rethink the way that they collected, stored and used genetic material in the human archive. (pages 203 - 222)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

- Vladimir Janković
DOI: 10.7208/chicago/9780226432533.003.0009
[climatology;climate data;environmental policy;climate change;archiving]
Within a decade following the US Congress decision to pass the National Climate Program Act in 1978, the National Research Council published a series of reports on the state of climate research and institutional infrastructure intended for acquisition and management of climate data, products and services. The sense of urgency and the significance given to climate monitoring led to a series of high-level meetings aimed to address, among other issues, the adequacy of existing practices of data management and climate archiving. This chapter explores the methodological, institutional and economic dimensions of climatological data archiving argued in these meetings. It looks into the elements of the cognitive politics associated with the notion of climatological archiving in the context of the National Climate Program’s objective to provide robust climate products for outside use and the society at large. Deliberations on these issues informed the conceptualization of the climate archive as a working world combining methodological protocols, institutional politics, environmental research and the complexities arising from the materiality of archival metabolism. How did these processes affect the purpose, practice and status of modern climatological archiving and archiving more generally? (pages 223 - 244)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

IV. The Future of Data: Archives of the New Millennium

- Rebecca Lemov
DOI: 10.7208/chicago/9780226432533.003.0010
[self-archiving;algorithmic self;self-commodification;datafication;ceaseless curation;sciences of self]
This chapter traces self-archiving practices through their historical genesis in the professional and professionalizing human sciences--especially “sciences of self” in the modern era. Considering how a search for subjective data enlivened these nascent sciences, including projects from dream collection to self-narration to political surveillance, it argues that the archiving of lives is bolstered not only by emerging digital tracking technologies but by historically crafted inquisitorial data-collecting techniques. Paradoxes that arise in treating the self as an archive are explored, as well as longtime fantasies of total information with their dreams of dramatic compressibility and expansibility of information. Three individual cases of self-archiving are described: Buckminster Fuller’s Dymaxion Chronofile experiment (beginning in the 1930s); Gordon Bell’s MyLifeBits quest to record the totality of his experience (beginning in the 1990s); and Chris Dancy’s InnerNet, an extreme self-quantification project also exploring selflessness (in the 2000s). Overall, this essay investigates the question of how and in what sense the “self” is becoming more and more an archive made up of all the moments of a human life. As self-surveillance and self-tracking become ever more common norms, how are our selves reconfigured--specifically via temporal dynamics and the place of the individual within collectives? (pages 247 - 270)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

- Daniel Rosenberg
DOI: 10.7208/chicago/9780226432533.003.0011
[keyword;stop word;stop list;computer;data;archive;textuality;infraordinary;OULIPO;KWIC]
This paper sketches the story of one of the notable, little-noticed mechanisms of late-twentieth century information culture, the “stop list.” A stop list is a list of words that a computer is instructed to remove or ignore when processing text. These were important tools in twentieth-century electronic text processing, and in crucial ways, determined the character of texts as encountered by computers. Yet they have never been studied or themselves archived. In the past decade, big storage, fast processors, and new statistical techniques have made the stop list less important, and the very ethos of big data weighs against it. Yet the lessons of the stop list are, if anything, more valuable than ever today. The stop list is of special interest as a mechanism fundamental to electronic “systems of statements” and as an example of the kind of para-literary object that we should be especially attentive to archiving today. (pages 271 - 310)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

- Matthew L. Jones
DOI: 10.7208/chicago/9780226432533.003.0012
[data mining;database;statistical rigor;corporate archives;internet;search;marketing;apriori algorithm;page-rank;Google]
Focused on activity at Stanford, Google, and IBM, this chapter stresses the centrality of the database community of academic computer science and industry in the creation of data mining practices during the 1990s and early 2000s. Whether creating new search algorithms or tools for enhancing marketing, database practitioners could never forget the scale of data, understood not as something intangible but as something physical existing on slow hard drives, something incapable of being resident in memory, something requiring time to move from place to place and from drives to processors. The early creators of data mining offered powerful technological determinist narratives holding that great volumes of data require the development of new algorithms, the loosening of traditional accounts of statistical rigor, the creation of new epistemic virtues, and the creation of new experts. The sheer scale of data was held to demand—and to justify—new forms of scientific knowledge, at times in conflict with long-held views of statistical rigor. (pages 311 - 328)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

- Lorraine Daston
DOI: 10.7208/chicago/9780226432533.003.0013
Every scientific archive has a timeline. As long as the history of life on earth, or as short as the meteorological havoc wrought by the latest El Niño; reaching deep into astronomical past and future; spanning generations of natural philosophers since Thales and physicians since Hippocrates—the archive is where the scientific past, present, and future converge. More specifically, the archive is the physical expression of how present science creates a usable past for future science.... (pages 329 - 332)
This chapter is available at:
    University of Chicago Press
    https://academic.oup.com/chica...

Contributors

Bibliography

Index