Reviewed by:
  • Foundations of statistical natural language processing by Christopher D. Manning, Hinrich Schütze
  • K. Bretonnel Cohen and Andrew Dolbey
Foundations of statistical natural language processing. By Christopher D. Manning and Hinrich Schütze. Cambridge, MA: MIT Press, 1999. Pp. 680. $64.95.

This is a comprehensive reference text on statistical approaches to natural language processing. As such, anyone teaching courses or employed in that field will want to own it. The authors’ definition of the term ‘statistics’ is rather broad: ‘we take the basic meaning of the term “statistics” as . . . encompassing all quantitative approaches to data’ (xxxii). As a result, the book covers a wide range of topics in NLP and has something to offer readers with a pleasantly wide variety of interests.

It is less clear that this is a helpful textbook. The problem with this book as a textbook is that the authors have deliberately addressed it to a carefully circumscribed student audience: ‘It is assumed that the student has prior programming experience, and has some familiarity with formal languages and symbolic parsing methods. It is also assumed that the student has a basic grounding in such mathematical concepts as set theory, logarithms, vectors and matrices, summations, and integrations’ (xxxv). These seem like ludicrous expectations for any student in a linguistics department who is not majoring in computational linguistics and possibly for linguistics students who are majoring in computational linguistics. Thus, the text is unfortunately not likely to be accessible to students who are interested in investigating (statistical approaches to) computational linguistics but aren’t already involved in it.

For all the questions about its accessibility to the average linguistics student, this is an excellent reference for the reader who comes to it with the appropriate background. The chapter on collocations is the best coverage of this topic that we’ve seen, and the chapter on statistical inference is quite wonderful. The chapter on the care and handling of corpus data has good coverage of the sometimes mundane but omnipresent problems and issues of proper nouns, tokenization, case, and sentence boundaries that are associated with ‘real’ data and gives some useful approaches to dealing with them. The 2½-page ‘Table of notations’ is very much appreciated; along with the obvious care that the authors took to identify and define domain-specific notation where it appears in the text and in tables, it adds a lot to the readability of the book in general. Indeed, the overall readability of the text is high and could only have been improved by making the index more multilevel (compare, for instance, the index entries for tagging and for collocation).

It’s worth stating again that if you teach or are otherwise active in this field, you should own this book. An informal survey of the cubicles in a company at which one of the reviewers consults counted four reasonably well-thumbed copies of this book, and the company in question has made profitable use of the techniques described in it. If you meet the description of the intended reader, you can expect to find it quite useful.

K. Bretonnel Cohen
University of Colorado Health Sciences Center
Andrew Dolbey
University of California, Berkeley