
Techniques for automatically correcting words in text

Published: 01 December 1992

Abstract

Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction. In response to the first problem, efficient pattern-matching and n-gram analysis techniques have been developed for detecting strings that do not appear in a given word list. In response to the second problem, a variety of general and application-specific spelling correction techniques have been developed, some of them based on detailed studies of spelling error patterns. In response to the third problem, a few experiments using natural-language-processing tools or statistical language models have been carried out. This article surveys documented findings on spelling error patterns, describes various nonword detection and isolated-word error correction techniques, reviews the state of the art of context-dependent word correction techniques, and discusses research issues related to all three areas of automatic error correction in text.
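
As a rough illustration of the first two problems, the sketch below flags tokens that are absent from a word list (nonword detection) and then ranks list entries by edit distance to propose corrections (isolated-word correction). It is a minimal sketch and not a technique taken from the article: the tiny `WORD_LIST`, the helper names, and the `max_distance` threshold are all assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def find_nonwords(tokens, word_list):
    """Problem 1: return the tokens that do not appear in the word list."""
    return [t for t in tokens if t.lower() not in word_list]


def suggest(nonword, word_list, max_distance=2):
    """Problem 2: rank word-list entries by edit distance to the nonword,
    keeping only those within max_distance (an illustrative cutoff)."""
    scored = [(edit_distance(nonword.lower(), w), w) for w in word_list]
    return [w for d, w in sorted(scored) if d <= max_distance]


# Hypothetical toy word list, for illustration only.
WORD_LIST = {"the", "spelling", "of", "words", "form", "from"}

tokens = ["the", "speling", "of", "words"]
for nonword in find_nonwords(tokens, WORD_LIST):
    print(nonword, "->", suggest(nonword, WORD_LIST))
```

A check of this kind cannot address the third problem: a real-word error such as "form" typed for "from" is itself in the word list, so it passes nonword detection and can only be caught by using the surrounding context.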

References

  1. ABNEY, S. 1990. Rapid incremental parsing with repair. In Proceedings of the 6th New OED Conference: Electronic Text Research (Waterloo, Ontario, Oct. 1990).]]Google ScholarGoogle Scholar
  2. AHO, A.V. 1990. Algorithms for finding patterns in strings. In Handbook of Theoretical Computer Science, J. Van Leeuwen, Ed. Elsevier Science Publishers, B. V., Amsterdam.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. AHO, A. V., AND CORASICK, M.J. 1975. Fast pattern matching: An aid to bibliographic search. Commun. ACM 18, 6 (June), 333-340.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. AHO, A. V., AND PETERSON, T.G. 1972. A minimum distance error-correcting parser for context free languages. SIAM J. Comput. 1, 4 (Dec.), 305-312.]]Google ScholarGoogle ScholarCross RefCross Ref
  5. ALBERGA, C.N. 1967. String similarity and misspellings. Commun. ACM 10, 302 313.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. ALLEN, R. B., AND KAMM, C.h. 1990. A recurrent neural network for word identification from continuous phoneme strings. In Advances in Neural Information Processing Systems, vol. 3. R. P. Lippmann, J. E. Moody, D. S. Touretzky, Ed. Morgan Kaufmann Publishers, San Mateo, Calif.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. ALM, N., ARNOTT, J. L., AND NEWELL, A.F. 1992. Prediction and conversational momentum in an augmentative communication system. Cornman. ACM 35, 5 (May), 46 56.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. ANGELL, R. C., FREUND, G. E., AND WILLETT, P. 1983. Automatic spelling correction using a trigram similarity measure. Inf. Process. Manage. 19,255 261.]]Google ScholarGoogle ScholarCross RefCross Ref
  9. ATWELL, E., AND ELLIOTT, S. 1987. Dealing with ill-formed English text (Chapter 10). In The Computational Analysis of English: A Corpus- Based Approach. R. Garside, G. Leach, G. Sampson, Ed. Longman, Inc. New York.]]Google ScholarGoogle Scholar
  10. BAHL, L. R., BROWN, P. F., DESOUZA, P. V., AND MERCER, R.L. 1989. A tree-based statistical language model for natural language speech recognition. IEEE Trans. Acoust. Speech Stg. Process. 37, 7, (July), 1001-1008.]]Google ScholarGoogle Scholar
  11. BAHL, L. R., JELINEK~ F., AND MERCER, R.L. 1983. A maximum likelihood approach to continuous speech recognition. IEEE Trans. Patt. Anal. Machine Intell. PAMI-5, 2 (Mar.), 179 190.]]Google ScholarGoogle Scholar
  12. BENTLEY, J. 1985. A spelling checker. Commun. ACM 28, 5 (May), 456-462.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. BICKEL, M.A. 1987. Automatic correction to misspelled names: A fourth-generation language approach. Commun. ACM 30, 3 (Mar.), 224-228.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. BLAIR, C. R. 1960. A program for correcting spelling errors. Inf. Contr. 3, 60 67.]]Google ScholarGoogle ScholarCross RefCross Ref
  15. BLEDSOE, W. W., AND BROWMNG, I. 1959. Pattern recognition and reading by machine. In Proceedings of the Eastern Joint Computer Conference, vol. 16, 225-232.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. BOCAST, A. K. 1991. Method and apparatus for reconstructing a token from a Token Fragment. U.S. Patent Number 5,008,818, Design Services Group, Inc. McLean, Va.]]Google ScholarGoogle Scholar
  17. BOIWE, R. H. 1981. Directory assistance revisited. AT & T Bell Labs Tech. Mem. June 12, 1981.]]Google ScholarGoogle Scholar
  18. BROWN, P. F., DELLA PIETRA, V. J., DESOUZA, P. V., AND MERCER, R. L. 1990a. Class-Based n- Gram Models of Natural Language.]]Google ScholarGoogle Scholar
  19. BROWN. P., Cecum, J., DELLA PIETRA, S., DELLA PIETRA, V., JELINEK, F., MERCER, R., AND ROOSIN, P. 1990b. A statistical approach to machine translation. Con*put. Ling. 16, (June), 79-85.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. BROWN, P., DELLA PIETRA, S., DELLA PIETRA, V., AND MERCER, R. 1991. Word sense disambigaation using statistical methods. In Proeeedtngs of the 29th Annual Meeting of the Association for Computational Linguistics (Berkeley, Calif., June), ACL, 264 270.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. BURR, D. J. 1983. Designing a handwriting reader. IEEE Trans. Patt. Anal. Machine Intell. PAMI-5, 5 (Sept.), 554 559.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. BURR, D. J. 1987. Experiments with a connactionist text reader. In IEEE International Conference on Neural Networks (San Diego, Calif., June). IEEE, New York, IV:717-724.]]Google ScholarGoogle Scholar
  23. CARBERRY, S. 1984. Understanding pragmatically ill-formed input. In Proceedings of the lOth International Conference on Computational Linguistics. ACL, 100-206.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. CARBONELL, J. G., AND HAYES, P.J. 1983. Recovery strategies for parsing extragrammatical language. Amer. J. Comput. Ltng. 9, 3-4 (July-Dec.), 123 146.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. CARTER, D.M. 1992. Lattice-based word identification in CLARE. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics (Newark, Del., June 28-July 2). ACL, 159-166.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. CHERKASSKY, V., AND VASSILAS, N. 1989a. Backpropagation networks for spelling correction. Neural Net. 1, 3 (July), 166-173.]]Google ScholarGoogle Scholar
  27. CHERKASSKY, V., AND VASSILAS, N. 1989b. Performance of back-propagation networks for associative database retrieval. Int. J. Comput. Neural Net.]]Google ScholarGoogle Scholar
  28. CHERKASSKY, V., RAO, M., AND WECHSLER, H. 1990. Fault-tolerant database retrieval using distributed associative memories. Inf. Sci. 46, 135-168.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. CHERKASSKY, V.. FASSETT, K., AND VASSILAS, N. 1991. Linear algebra approach to neural associative memories and noise performance of neural classifiers. IEEE Trans. Comput. 40, 12, 1429-1435.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. CHERKASSKY, V., VASSILAS, N., BRODT, G. L., AND WECHSLER, H. 1992. Conventional and associative memory approaches to automatic spelling checking. Eng. Appl. Artif. Intell. 5, 3.]]Google ScholarGoogle ScholarCross RefCross Ref
  31. CHERRY, L., AND MACDONALD, N. 1983. The Writer's Workbench software Byte, (Oct.), 241 248.]]Google ScholarGoogle Scholar
  32. CHOUEKA, Y. 1988. Looking fbr needles in a haystack. In Proceedtngs of RIAO, 609 623]]Google ScholarGoogle Scholar
  33. CHURCH, K.W. 1988. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the 2nd Applted Natural Language Processing Conference (Austin, Tex, Feb.). ACL, 136 143.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. CHURCH, K. W., ANO GALE, W.A. 1991a. Probability scoring for spelling correction. Stat. Camput. 1, 93 103.]]Google ScholarGoogle ScholarCross RefCross Ref
  35. CHURCH, K. W., AND GAbE, W. A. 1991b. Enhanced Good-Turmg and cat-cal Two new methods for esmnating probabilities of English bigrams. Comput. Speech Lung. 1991.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. COHEN, G. 1980. Reading and searching for spelling errors. In Cognitive Processes in Spelhng. Uta Frith, Ed. Academic Press, London.]]Google ScholarGoogle Scholar
  37. COtLER, C. H., CHURCH, K. W., AND LIBERMAN, M. Y. 1990. Morphology and rhyming: Two powerful alternatives to letter-to-sound rules for speech synthesis. In Proceedings of the Conference on Speech Synthesis. European Speech Communication Association.]]Google ScholarGoogle Scholar
  38. CONTANT, C., AND BRUNELLE, E. 1992 Exploratexte: Un analyseur a l'affut des erreurs grammaticales. In Actes du colloque lexiquesgrammatres compares, Universite du Quebec a Montreal. In French.]]Google ScholarGoogle Scholar
  39. CUSHMAN, W. H., OJHA, P. S., AND DANIEl, S, C. M. 1990. Usable OCR: What are the mlmmum requirements. In CH1-90 Conference Proceedrags, Special Issue o/ the ACM SIGCHI Bulletin (Seattle, Wash., Apr 1-5.) ACM, New York, 145-151.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. DAHL, P, AND CHERKASSKY, V. 1990. Combined encoding in associative spelling checkers. Umv. of Minnesota EE Dept. Tech. Rep.]]Google ScholarGoogle Scholar
  41. DAMERAV, F.J. 1990. Evaluating computer-generated domain-oriented vocabularies. Inf Process. Manage. 26, 6, 791-801.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. DAMERAU, F.J. 1964. A technique for computer detection and correction of spelling errors. Cornmun ACM 7, 3 (Mar.), 171-176.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. DAMERAU, F. J., AND MAYS, E. 1989. An examinatmn oi undetected typing errors. Inf. Process. Manage. 25, 6, 659 664.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. DAVIDSON, L 1962. Retrieval of misspelled names in an airline passenger record system. Commun. ACM 5, 169 171.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. DEERWESTER, S., DUMAIS, S. T., FURNAS, G. W., LANDAUER, T K., AND HARSHMAN, R. 1990. Indexing by Latent Semantic Analysis. JASIS 41, 6, 391-407.]]Google ScholarGoogle ScholarCross RefCross Ref
  46. DEFFNER, R., EDER, K, AND GEiGER, I-I. 1990a. Word recognition as a first step towards natural langlmge processing with artificial neural nets. In Proceedings of KONNAI-90.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. DEFFNER, R., GEIGER, H., KAHLER, R., KREMPL, T., AND BRAUER, W. 1990b. Recognizing words with connectionist architectures. In Proceedings of INNC-90-Parts (Paris, France, July), 196.]]Google ScholarGoogle Scholar
  48. DEHEER, T. 1982. The application of the concept of homeosemy to natural language information retrieval. Inf. Process. Manage. 18, 229-236.]]Google ScholarGoogle ScholarCross RefCross Ref
  49. DELOCttE, G., AND DEmLh F. 1980. Order information redundancy of verbal codes in French and English' Neurolinguistic implications. J. Verbal Learn. Verbal Behav. 19, 525-530.]]Google ScholarGoogle ScholarCross RefCross Ref
  50. DEMASCO, P. W., AND McCoY, K.F. 1992. Generating text from compressed input: An intelhgent interface for people with severe motor impairments. Commun. ACM 35, 5 (May), 68-78.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. DEROUAULT, A.-M., AND MERIALDO, B. 1984a. Language modeling at the syntactic level. In Proceedmgs of the 7th International Conference on Pattern Recognition (Montreal, Canada, July 30-Aug. 2), 1373-1375.]]Google ScholarGoogle Scholar
  52. DEROUAULT, A.-M, AND MER~ALDO, B. 1984b. TASF: A stenotypy-to-French transcription system. In Proceedings of the 7th International Conference on Pattern Recogn~tton (Montreal, Canada, July 30-Aug. 2), 866-868.]]Google ScholarGoogle Scholar
  53. DUNLAVEY, M. R 1981. On spelling correction and beyond. Cammun. ACM 24, 9 (Sept.), 608.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. DURHAM, I., LAMB, D A, AND SAXE, J B. 1983. Spelling correction in user interfaces. Commun. ACM 26, 10 (Oct.), 764 773.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. EASTMAN, C. M., AND MCLEAN, D. S. 1981. On the need for parsing ill-ibrmed input. Amer. J Comput. Ling. 7.4, 257.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. ELLIOTT, R. J. 1988. Annotating spelling list worda with a~fixation classes. AT & T Bell Labs Int. Mem. Dec. 14.]]Google ScholarGoogle Scholar
  57. ELLIS, A. W. 1979 Slips of the pen. Vis. Lang. 13, 265-282.]]Google ScholarGoogle Scholar
  58. ELLIS, A. W. 1982. Spelling and writing (and reading and speaking). In Normahty and Pathology m Cognttwe Functwns, A. W Elhs, Ed. Academic Press, London.]]Google ScholarGoogle Scholar
  59. FA$$, n., AND WILKS, Y. 1983. Preference semantics, fil-formedness, and metaphor Amer J. Comput. Ling. 9.3 4 (July-Dec), 178 189.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. F~NK, P. K., AND BIERMANN, A.W. 1986. The correction of ill-formed input using history-based expectation with applications to speech understanding. Comput. Ling. 12, i (Jan.-Mar.), 13-36.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. FORNEY, G. D., JR. 1973. The Viterbi algorithm. Prec. IEEE 61, 3 (Mar.), 268-278.]]Google ScholarGoogle ScholarCross RefCross Ref
  62. Fox, E. A., CHEN, Q. F., AND HEATH, L.S. 1992. A faster algorithm for constructing minimal perfect hash functions. In Proceedings of the 15th Annual International SIGIR Meeting, SI- GIR'92 (Denmark, June). ACM, New York~ 266-273.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. FREDKIN, E. 1960. Trie memory. Commun ACM 3, 9, (Sept.), 490-500.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. FROMKIN, V., ED. 1980. Errors in Linguistic Performance: Shps of the Tongue, Ear, Pen and Hand. Academic Press, New York, 1980.]]Google ScholarGoogle Scholar
  65. GALE, W. A., AND CHURCH, K.W. 1990. Estimation procedures for language context: Poor estimates are worse than none. In Proceedings of Compstat-90 (Dubrovnik, Yugoslavia). Springer-Verlag, New York, 69-74.]]Google ScholarGoogle ScholarCross RefCross Ref
  66. GALLANT, S. I. 1991. A practical approach for representing context and for performing word sense disambiguation using neural networks. Neural Comput. 3, 293-309.]]Google ScholarGoogle ScholarCross RefCross Ref
  67. GARRETT, M. 1982. Production of speech: Observations from normal and pathological language use. In Normality and Pathology ~n Cognttive Functmns, A. W. Ellis, Ed. Academic Press, London.]]Google ScholarGoogle Scholar
  68. GARSIDE, R., LEACH, G., AND SAMPSON, G. 1987. The Computatwnal Analysis of English: A Corpus-Based Approach. Longman, Inc., New York.]]Google ScholarGoogle Scholar
  69. GENTNER, D. R., GRUDIN, J., LAROCHELLE, S., NOR- MAN, D. A., AND RUMELHART, D. E. 1983. Studies of typing from the LNR typing research group. In Cognitive Aspects of Skilled Typewriting, W. E. Cooper, Ed. Springer- Verlag, New York.]]Google ScholarGoogle Scholar
  70. GERSHO, M., AND REITER, R. 1990. Information retrieval using self-organizing and heteroassociative supmwised neural networks. In Procee&ngs oflJCNN (San Diego, Calif. June).]]Google ScholarGoogle Scholar
  71. GOOD, I.J. 1953. The population frequencies of species and the estimation of population parameters Biometrika 40, 3 and 4 (Dec.), 129-264.]]Google ScholarGoogle Scholar
  72. GORIN, R. E. 1971. SPELL: A spelling checking and correction program. Online documentation for the DEC-10 computer.]]Google ScholarGoogle Scholar
  73. GOSHTASBY, A., AND EHRICH, R.W. 1988. Contextual word recognition using probabilistic relaxation labeling. Patt. Recog. 21, 5, 455-462.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. GRANGER, R.H. 1983. The NOMAD system: Expectation-based detection and correction of errors during understanding of syntactically and semantically ill-formed text. Amer. J. Comput. Ling. 9, 3-4 (July-Dec.), 188-196.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. GRUDIN, J. 1983. Error patterns in skilled and novice transcription typing. In Cognitive Aspects of Skilled Typewriting, W. E. Copper, Ed. Springer-Verlag, New York.]]Google ScholarGoogle Scholar
  76. GRUHIN. J. 1981. The organization of serial order in typing. Ph.D. dissertation Univ. of California, ~an Diego.]]Google ScholarGoogle Scholar
  77. HALL, P. A. V., ANn DOWLING, G. R. 1980. Approximate string matching. ACM Comput. Surv. 12, 4 (Dec.), 17 38.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. HANSON, S. J., AND KEGL, J. 1987. PARSNIP: A connectionist network that natural language grammar from exposure to natural language sentences. In Proceedings of the Cognitive Science Conference.]]Google ScholarGoogle Scholar
  79. HANSON, A. R., RISEMAN, E. M., AND FISHER, E., 1976. Context in word recognition. Part. Recog. 8, 35-45.]]Google ScholarGoogle ScholarCross RefCross Ref
  80. HARMON, L. D. 1972.Automatic recognition of print and script. Proc. IEEE 60, (Oct.), 1165 1176.]]Google ScholarGoogle ScholarCross RefCross Ref
  81. HAWLEY, M.J. 1982. Interactive spelling correction in Unix: The METRIC Library. AT &T Bell Labs Tech. Mem., August 31.]]Google ScholarGoogle Scholar
  82. HEIDORN, G.E. 1982. Experience with an easily computed metric for ranking alternative parses. In Proceedings of the 20th Annual Meeting of the Associatzon for Computational Linguistics (Toronto, Canada). ACL, 82-84.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. HEIDORN, G. E., JENSEN, K., MILLER, L. A., BYRD, R. J., AND CHODOROW, M.S. 1982. The EPIS- TLE text-critiquing system. IBM Syst. J. 21, 3,305-326.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. HENSELER, J., SCHOLTES, J. C., AND VERDOEST, C. R. J. 1987. The design of a parallel knowledge-based optical character recognition system. Master of Science Theses, Dept. of Mathematics and Informatics, Delft Univ. of Technology.]]Google ScholarGoogle Scholar
  85. HINDLE, D. 1983. User manual for Fidditch, a deterministic parser. Tech. Mere. 7590 142, Naval Research Lab.]]Google ScholarGoogle Scholar
  86. Ho, T. K., HULL, J. J., AND SRIHARI, S. N. 1991. Word recognition with multi-level contextual knowledge. In Proceedings of IDCAR-91 (St. Malo, France), 905-915.]]Google ScholarGoogle Scholar
  87. HOTOPF, N. 1980. Slips of the pen. In Cognitive Processes in Spelling, Uta Frith, Ed. Academic Press, London.]]Google ScholarGoogle Scholar
  88. HULL, J.J. 1987. Hypothesis testing in a computational theory of visual word recognition. In Proceedings of AAAI-87, 6th National Conference on Artificial Intelligence. vol. 2 (Seattle, Wash., July 13 17). AAAI, 718 722.]]Google ScholarGoogle Scholar
  89. HULL, J. J., AND SRIHARI, S. N. 1982. Experiments in text recognition with binary n-gram and Viterbi algorithms. IEEE Trans. Patt. Anal. Machine Intell. PAMI-4, 5 (Sept.), 520 530.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  90. JELINEK, F., MERIALDO, B., ROUKOS, S., AND STRAUSS, M. 1991. A dynamic language model for speech recognition. In Proceedings of the DARPA Speech and Natural Language Workshop (Feb. 19-22), 293-295.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. JENSEN, K., HEIDORN, G. E., MILLER, L. A., AND RAVIN, Y. 1983. Parse fitting and prose fixM ing: Getting a hold on ill-formedness. Amer. J. Comput. Ling. 9, 3-4 (July-Dec.), 147 160.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. JOHNSTON, J. C., AND MCCLELLAND, J. L. 1980. Experimental tests of a hierarchical model of word identification. J. Verbal Learn. Verbal Behav. 19, 503-524.]]Google ScholarGoogle ScholarCross RefCross Ref
  93. JONES, M. A., STORY, G. A., AND BALLARD, B. W. 1991. Integrating multiple knowledge sources in a Bayesian OCR post-processor. In Proceedtngs of IDCAR-91 (St Malo, France), 925-933.]]Google ScholarGoogle Scholar
  94. JOSHI, A.K. 1985. How much context-sensitivity is necessary for characterizing structural descriptions-Tree Adjoining Grammars In Natural Language Processing Theoretzcal, Computatzonal and Pwcholog~cal Perspectives, D. Dowty, L. Karttunen, A. Zwicky, Ed. Cambridge University Press, New York.]]Google ScholarGoogle Scholar
  95. KAHaN, S, PAVLIDiS, T., AND BAIRD. H. S. 1987. On the recognition of characters of any font size IEEE Trans Patt. Anal. Machine Intell. PAMI-9, 9, 274-287]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. KASHYAP, R. L, AND OOMMEN, B. J. 1981 An effective algorithm for string correction using generalized edit distances. Inf Sci 23, 123-142.]]Google ScholarGoogle ScholarCross RefCross Ref
  97. KASHYAP, R. L., AND OOMMEN, B.J.1984. Spelling correction using probabilistic methods. Part Recog. Lett. 2, 3 (Mar.), 147 154.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  98. KEELER, J., AND RUMELHART, D.E. 1992. A selforganizing mtegreted segmentation and recognition neural net. In Advances ~n Neural ln/~rmation Proccsszng Systems, vol. 4. J. E. Moody, S. J. Hanson, R. P. Lippmann, Ed. Morgan Kaufmann, San Mateo, Calif., 496-503.]]Google ScholarGoogle Scholar
  99. KEMPEN, G., AND VOSSE, T. 1990. A languagesensitive text editor for Dutch. In Proceedings of the Computers and Writing 111 Conference (Edinburgh, Scotland, Apr )]]Google ScholarGoogle Scholar
  100. KERNIGHAN, M.D. 1991. Specialized spelling correction for a TDD system AT & T Bell Labs Tech. Mere., August. 30.]]Google ScholarGoogle Scholar
  101. KERNIGHAN, M. D., AND GALE, W.A. 1991. Varmtions on channel-frequency spelling correction in Spamsh. AT&T Bell Labs Tech. Mem., September.]]Google ScholarGoogle Scholar
  102. KERNIGHAN, M. D., CHURCH, K. W., AND GALE, W. A. 1990. A spelling correction program based on a noisy channel model. In Proceedings of COL- ING-90, The 13th International Conference on Computational Linguistics, vol. 2 (Helsinki, Finland). Hans Kar}gren, Ed. 205-210.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. KNUTH, D. E. 1973. The Art of Programming. Vol. 3, Sorting and Searching. Addison-Wesley, Reading, Mass.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  104. KOHONEN, T. 1980. Content Addre.ssable Memortes Springer-Verlag, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  105. KOHONEN, T. 1988. Self-Orgamzation arid Assoctative Memory. Springer-Verlag, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  106. KUCERA, H., AND FRANCIS, W.N. 1967. Computational Analysis of Present-Day American Engltsh Brown University Press, Providence, R.I.]]Google ScholarGoogle Scholar
  107. KUKICH, K. 1988a. Variatmns on a back-propagation name recognition net. In Proceedings of the Advanced Technology Conference, vol 2 (May 3-5). U.S. Postal Service, Washington D.C., 722-735.]]Google ScholarGoogle Scholar
  108. KUKICH, K. 1988b. Back-propagation topologies for sequence generation. In Proceedings o/ the IEEE International Conference on Neural Networks, vol. 1 (San Diego, Calif., July 24 27). IEEE, New York, 301-308.]]Google ScholarGoogle ScholarCross RefCross Ref
  109. KUKICH, K. 1990 A comparison of some novel and traditional lexical distance metrics for spelling correction. In Proceectzngs of INNC- 90-Paris (Paris, France, July), 309-313.]]Google ScholarGoogle Scholar
  110. KUK~CH, K. 1992. Spelling correction for the telecommunications network for the deaf. Commun ACM 35, 5 (May), 80 90.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  111. LANDAUER, T. K, AND STREETER~ L. A. 1973. Structural differences between common and rare words. J. Verbal Learn. Verbal Behav. 12, 119-131.]]Google ScholarGoogle ScholarCross RefCross Ref
  112. LEE, Y.-H., EVENS, M., MICfiAEL, J. A., AND ROVlCK, A.A. 1990. Spelling Correction for an intelligent tutoring system. Tech. Rep., Dept. of Computer Science, Illinois Inst. of Technology, Chicago]]Google ScholarGoogle Scholar
  113. TEIN, V I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 10, (Feb), 707-710.]]Google ScholarGoogle Scholar
  114. AN, M. Y., AND WALKER. D.E. 1989. ACL Data Collectmn mitmtlve: First release. Fznite String 15, 4 (Dec.), 46-47.]]Google ScholarGoogle Scholar
  115. CE, R., AND WAGNER, R. 1975. An extension of the string-to-string correction problem. J. ACM 22, 2 (Apr.), 177-183.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  116. O., BURGES, C. J. C, LECuN, Y, AND DENKER, J.S. 1992. Multi-digit recogmtion using a space displacement neural network. In Advances in Neural Information Processzng Systems, vol. 4, J. E Moody, S. J. Hanson, R. P. Lippnmnn, Ed. Morgan Kaufmann, San Mateo, Calif, 488-495.]]Google ScholarGoogle Scholar
  117. E., DAMERAU, F. J., AND MERCER, R L 1991. Context based spelling correction. Inf. Process. Manage. 27, 5. 517-522.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  118. J. L., AND RUMELHART. D.E. 1981 An interactive activation model of context effects in letter perception. Psychol. Rev. 88, 5 (Sept.), 375 407.]]Google ScholarGoogle Scholar
  119. K.F. 1989 Generating context-sensitive responses to object-related misconceptions. Artif. Intell. 41, 157-195]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  120. Y, M. D 1992. Development of a spelling li~t. IEEE Trans_ Comrnun. COM-30, i (Jan.), 91 99.]]Google ScholarGoogle Scholar
  121. L.G. 1988. Cn yur cmputr reed ths. In Proceedinss of the 2nd Applzed Natural Language Processing Conference (Austin, Tex, Feb.). ACL, 93-100.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  122. S., HAYES, P. J., AND FAIN J. 1985. Controlling search in fiemble parsing. In Proceedings of the Internatzonal Jmnt Conference on Artificml Intelhgence. Morgan Kaufman, San Marco, Calif., 786-787.]]Google ScholarGoogle Scholar
  123. R. 1987. Spelhng checkers, spelling correctors, and the misspellings of poor spellers. Inf. Process. Manage. 23, 5, 495-505.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  124. R. 1986. A partial-dictionary of English in computer-usable form. Lit. Ling. Comput. 1, 4, 214 215.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  125. R. 1985. A collection of computer-readable corpora of English spelling errors. Cog. Neuropsychol. 2, 3,275-279.]]Google ScholarGoogle ScholarCross RefCross Ref
  126. AND FRAENKEL, A. S. 1982a. Retrieval in an environment of faulty texts or faulty queries. In Proceedings of the 2nd International Conference on Improving Database Usabihty and Responsiveness (Jerusalem), P. Scheuerman, Ed. Academic Press, New York, 405-425.]]Google ScholarGoogle Scholar
  127. AND FRAENKEL, A. S. 1982b. A hash code method for detecting and correcting spelling errors. Commun. ACM 25, 12 (Dec.), 935 938.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  128. H.L. 1970. Spelling correction in systems programs. Commun. ACM 13, 2 (Feb.), 90-94.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  129. R., AND CHERRY, L.L. 1975. Computer detection of typographical errors. IEEE Trans. Profess. Commun. PC-18, 1, 54-63.]]Google ScholarGoogle Scholar
  130. E., JR., AND THARP, A.L. 1977. Correcting human error in alphanumeric terminal input. Inf. Process. Manage. 13, 329-337.]]Google ScholarGoogle ScholarCross RefCross Ref
  131. ER, G. L. 1966. Introduction to Dynamic Programming. Wiley, New York.]]Google ScholarGoogle Scholar
  132. J., PHILLIPS, V. L., AND DUMAIS, S. T. 1992. Retrieving imperfectly recognized handwritten notes. Behav. Inf. Teeh.]]Google ScholarGoogle Scholar
  133. M. K., AND RUSSELL, R. C. 1918. U.S. Patent Numbers, 1,261,167 (1918) and 1,435,663 (1922). U.S. Patent Office, Washington, D.C.]]Google ScholarGoogle Scholar
  134. T., TANAKA, E., AND KASAI, T. 1976. A method of correction of garbled words based on the Levenshtein metric. IEEE Trans. Comput. 25, 172-177.]]Google ScholarGoogle Scholar
  135. T., MACHI, F., EVANS, B., AND TOM, J. 1988. Computational techniques for improved name search. In Proceedings of the 2nd Annual Applied Natural Language Conference (Austin, Tex, Feb.). ACL, 203-210.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  136. E, K., CHIGNELL, M., KHOSHAFIAN, S., AND WONG, H. 1990. Intelligent databases. A/ Expert, (Mar.), 38 47.]]Google ScholarGoogle Scholar
  137. ON, J. L. 1980. Computer programs for detecting and correcting spelling errors. Commun. ACM 23, 12, (Dec.), 676-684.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  138. PETERSON, J.L. 1986. A note on undetected typing errors. Commun. ACM 29, 7 (July), 633-637.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  139. POLLOCK, J. J., AND ZAMORA, A. 1983. Collection and characterization of spelling errors in scientific and scholarly text. J. Amer. Soc. Inf. Sci. 34, 1, 51 58.]]Google ScholarGoogle ScholarCross RefCross Ref
  140. POLLOCK, J. J., AND ZAMO~, A. 1984. Automatic spelling correction in scientific and scholarly text. Commun. ACM 27, 4 (Apr.), 358-368.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  141. RAMSaAW, L. A. 1989. Pragmatic knowledge for resolving ill-formedness. Tech. Rep. No. 89-18, BBN, Cambridge, Mass.]]Google ScholarGoogle Scholar
  142. RHYNE, J. R., AND WOLF, C. G. 1991. Paperlike user interfaces. RC 17271 (#76097), IBM Research Division, T. J. Watson Research Center, Yorktown Heights, N.Y.]]Google ScholarGoogle Scholar
  143. RHYNE, J. R., AND WOLF, C. G. 1993. Recognition-based user interfaces. In Advances m Human-Computer Interaction, vol. 4, H. R. Hartson and D. Hix, Ed. Ablex, Norwood, N.J.]]Google ScholarGoogle Scholar
  144. RICHARDSON, S. D., AND BRADEN-HARDER, L. C. 1988. The experience of developing a largerscale natural language text processing system: CRITIQUE. In Proceedings of the 2nd Annual Applied Natural Language Conference, (Austin, Tex. Feb.). ACL, 195-202.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  145. E. M., AND HANSON, A.R. 1974. A contextual postprocessing system for error correction using binary n-grams. IEEE Trans. Cornput. C-23, (May), 480-493.]]Google ScholarGoogle Scholar
  146. ROBERTSON, A. M., AND WILLETT, P. 1992. Searching for historical word-forms in a database of 17th-century English text using spelling-correction methods. In Proceedings of the 15th Annual International SIGIR Meeting, SIGIR'92 (Denmark, June). ACM, New York, 256-265.
  147. ROSENFELD, A., HUMMEL, R. A., AND ZUCKER, S. W. 1976. Scene labeling by relaxation operations. IEEE Trans. Syst. Man Cybernet. SMC-6, 6, 420-433.
  148. RUMELHART, D. E., AND MCCLELLAND, J. L. 1982. An interactive activation model of context effects in letter perception. Psychol. Rev. 89, 1, 60-94.
  149. RUMELHART, D. E., HINTON, G. E., AND WILLIAMS, R. J. 1986. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart and J. L. McClelland, Eds. Bradford Books/MIT Press.
  150. SALTON, G. 1989. Automatic text transformations. In Automatic Text Processing: The Transformation, Analysis and Retrieval of Information by Computer. Addison-Wesley, Reading, Mass.
  151. SAMPSON, G. 1989. How fully does a machine-usable dictionary cover English text? Lit. Ling. Comput. 4, 1, 29-35.
  152. SANKOFF, D., AND KRUSKAL, J. B. 1983. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, Mass.
  153. SANTOS, P. J., BALTZER, A. J., BADRE, A. N., HENNEMAN, R. L., AND MILLER, M. S. 1992. On handwriting recognition system performance: Some experimental results. In Proceedings of the Human Factors Society 36th Annual Meeting (Atlanta, Ga., Oct. 12-16). Human Factors Society.
  154. SCHANK, R. C., LEBOWITZ, M., AND BIRNBAUM, L. 1980. An integrated understander. Am. J. Comput. Ling. 6, 1, 13-30.
  155. SHEIL, B. A. 1978. Median split trees: A fast look-up technique for frequently occurring keys. Commun. ACM 21, 11 (Nov.), 947-958.
  156. SHINGHAL, R., AND TOUSSAINT, G. T. 1979a. Experiments in text recognition with the modified Viterbi algorithm. IEEE Trans. Patt. Anal. Machine Intell. PAMI-1, 4 (Apr.), 184-193.
  157. SHINGHAL, R., AND TOUSSAINT, G. T. 1979b. A bottom-up and top-down approach to using context in text recognition. Int. J. Man-Machine Stud. 11, 201-212.
  158. SIDOROV, A. A. 1979. Analysis of word similarity on spelling correction systems. Program. Comput. Softw. 5, 274-277.
  159. SINHA, R. M. K., AND PRASADA, B. 1988. Visual text recognition through contextual processing. Patt. Recog. 21, 5, 463-479.
  160. SITAR, E. J. 1961. Machine recognition of cursive script: The use of context for error detection and correction. Bell Labs Tech. Mem.
  161. SLEATOR, D. D., AND TEMPERLY, D. 1992. Parsing English with a Link Grammar. Source code via internet host: spade.pc.cs.cmu.edu:/usr/sleator/public. Carnegie-Mellon Univ., Pittsburgh, Pa.
  162. SMADJA, F. 1991a. From n-grams to collocations: An evaluation of XTRACT. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (Berkeley, Calif., June). ACL, 279-284.
  163. SMADJA, F. 1991b. Extracting collocations from text. An application: Text Generation. Ph.D. dissertation, Columbia Univ., New York.
  164. SMADJA, F., AND MCKEOWN, K. 1990. Automatically extracting and representing collocations for language generation. In Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics (Pittsburgh, Pa., June). ACL, 252-259.
  165. SPENKE, M., BEILKEN, C., MATTERN, F., MEVENKAMP, M., AND H. M. 1984. A language independent error recovery method for LL(1) parsers. Softw. Pract. Exp. 14, 11.
  166. SRIHARI, S., ED. 1984. Computer Text Recognition and Error Correction. IEEE Computer Society Press, Piscataway, N.J.
  167. SRIHARI, S. N., HULL, J. J., AND CHOUDHARI, R. 1983. Integrating diverse knowledge sources in text recognition. ACM Trans. Office Inf. Syst. 1, 1 (Jan.), 68-87.
  168. SURI, L. Z. 1991. Language transfer: A foundation for correcting the written English of ASL signers. Tech. Rep. No. 91-19, Dept. of Computer and Information Sciences, Univ. of Delaware, Newark, Del.
  169. SURI, L. Z., AND MCCOY, K. F. 1991. Language transfer in deaf writing: A correction methodology for an instructional system. Tech. Rep. No. 91-20, Dept. of Computer and Information Sciences, Univ. of Delaware, Newark, Del.
  170. TAYLOR, W. D. 1981. GROPE--A spelling error correction tool. AT&T Bell Labs Tech. Mem.
  171. TENCZAR, P., AND GOLDEN, W. 1972. CERL Report X-35. Computer-Based Education Research Lab., Univ. of Illinois, Urbana, Ill.
  172. THOMPSON, B. H. 1980. Linguistic analysis of natural language communication with computers. In Proceedings of the 8th International Conference on Computational Linguistics (Tokyo, Japan), 190-201.
  173. TOUSSAINT, G. T. 1978. The use of context in pattern recognition. Patt. Recog. 10, 189-204.
  174. TRAWICK, D. J. 1983. Robust sentence analysis and habitability. Ph.D. dissertation, California Inst. of Technology, Pasadena, Calif.
  175. TROY, P. L. 1990. Combining probabilistic sources with lexical distance measures for spelling correction. Bellcore Tech. Memo., Bellcore, Morristown, N.J.
  176. TSAO, Y. C. 1990. A lexical study of sentences typed by hearing-impaired TDD users. In Proceedings of the 13th International Symposium on Human Factors in Telecommunications (Turin, Italy, Sept.), 197-201.
  177. TURBA, T. N. 1981. Checking for spelling and typographical errors in computer-based text. SIGPLAN-SIGOA Newslett. (June), 51-60.
  178. ULLMANN, J. R. 1977. A binary n-gram technique for automatic correction of substitution, deletion, insertion and reversal errors in words. Comput. J. 20, 141-147.
  179. VAN BERKEL, B., AND DESMEDT, K. 1988. Triphone analysis: A combined method for the correction of orthographical and typographical errors. In Proceedings of the 2nd Applied Natural Language Processing Conference (Austin, Tex., Feb.). Association for Computational Linguistics (ACL).
  180. VERONIS, J. 1988a. Computerized correction of phonographic errors. Comput. Hum. 22, 43-56.
  181. VERONIS, J. 1988b. Morphosyntactic correction in natural language interfaces. In Proceedings of the 12th International Conference on Computational Linguistics (Budapest, Hungary), 708-713.
  182. VOSSE, T. 1992. Detecting and correcting morpho-syntactic errors in real texts. In Proceedings of the 3rd Conference on Applied Natural Language Processing (Trento, Italy, Mar. 31-Apr. 3). ACL, 111-118.
  183. WAGNER, R. A. 1974. Order-n correction for regular languages. Commun. ACM 17, 5 (May), 265-268.
  184. WAGNER, R. A., AND FISCHER, M. J. 1974. The string-to-string correction problem. J. ACM 21, 1 (Jan.), 168-178.
  185. WALKER, D. E. 1991. The ecology of language. In Proceedings of the International Workshop on Electronic Dictionaries (Feb.). Japan Electronic Dictionary Research Institute, Tokyo, 10-22.
  186. WALKER, D. E., AND AMSLER, R. A. 1986. The use of machine-readable dictionaries in sublanguage analysis. In Analyzing Language in Restricted Domains: Sublanguage Description and Processing. Lawrence Erlbaum, Hillsdale, N.J., 69-83.
  187. WALTZ, D. L. 1978. An English language question answering system for a large relational database. Commun. ACM 21, 7, 526-539.
  188. Webster's New World Misspeller's Dictionary. Simon and Schuster, New York.
  189. WEISCHEDEL, R. M., AND SONDHEIMER, N. K. 1983. Meta-rules as a basis for processing ill-formed input. Amer. J. Comput. Ling. 9, 3-4 (July-Dec.), 161-177.
  190. WING, A. M., AND BADDELEY, A. D. 1980. Spelling errors in handwriting: A corpus and distributional analysis. In Cognitive Processes in Spelling, U. Frith, Ed. Academic Press, London.
  191. WONG, C. K., AND CHANDRA, A. K. 1976. Bounds for the string editing problem. J. ACM 23, 1 (Nov.), 13-16.
  192. WRIGHT, A. G., AND NEWELL, A. F. 1991. Computer help for poor spellers. Brit. J. Educ. Tech. 22, 2 (Feb.), 146-148.
  193. YANNAKOUDAKIS, E. J., AND FAWTHROP, D. 1983a. An intelligent spelling correcter. Inf. Process. Manage. 19, 12, 101-108.
  194. YANNAKOUDAKIS, E. J., AND FAWTHROP, D. 1983b. The rules of spelling errors. Inf. Process. Manage. 19, 2, 87-99.
  195. YOUNG, C. W., EASTMAN, C. M., AND OAKMAN, R. L. 1991. An analysis of ill-formed input in natural language queries to document retrieval systems. Inf. Process. Manage. 27, 6, 615-622.
  196. ZAMORA, E. M., POLLOCK, J. J., AND ZAMORA, A. 1981. The use of trigram analysis for spelling error detection. Inf. Process. Manage. 17, 6, 305-316.
  197. ZIPF, G. K. 1935. The Psycho-Biology of Language. Houghton Mifflin, Boston.

          Reviews

          Graeme J. Hirst

          It is often easy to tell when a poor speller or poor typist has used a spelling checker on a document: each word is correctly spelled, but not all are the words that the author intended. And optical character recognition of documents, with its occasional misrecognitions, has given the world a whole new source of spelling errors. Although spelling checkers (sometimes called “spell checkers” by people who need syntax checkers) have been available for many years now, there is still much room for improvement. In this paper, Kukich presents a careful and exhaustive survey of the techniques, many of them fascinating and ingenious, that have been developed for efficiently finding and correcting errors in spelling; she summarizes each method and its strengths and weaknesses.
The problem divides into two parts: detecting an error, which might be a non-word or a wrong real word; and correcting such errors, either in isolation or in context.
Non-word detection is the easiest form of the problem, and so the simplest spelling checkers are those that merely draw the user's attention to suspect words. The main techniques used are n-gram probabilities (for example, the trigram fkh has zero chance of occurring in an English word) and lexicons of correctly spelled words (which must be neither too big nor too small). Kukich finds the former better for detecting OCR errors, the latter better for human typing.
To correct the possible error, once it is found, a set of candidate corrections must be generated and ranked. These may be presented to the user for the final judgment, or the substitution may be automatic. Kukich reviews a wide variety of methods (including minimum edit distance, similarity keys, the Viterbi algorithm, and neural nets) but finds none wholly satisfactory; in particular, neural nets, which might have been thought to be ideally suited to a problem of this kind, require a prohibitive amount of training.
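As a rough illustration of the detection and ranking steps described above, the following Python sketch flags suspect words by lexicon lookup and by a table of letter trigrams, then ranks candidate corrections by minimum edit distance. It is only a sketch under stated assumptions: the tiny `LEXICON`, the `#` boundary marker, the function names, and the `max_distance` cutoff are all hypothetical choices made for this example, and the trigram test uses simple presence or absence rather than the probabilities the surveyed systems estimate from large corpora.

```python
# Hypothetical toy lexicon; a real checker would load a large word list.
LEXICON = {"spell", "spelling", "checker", "check", "word", "world", "text"}


def is_nonword(word, lexicon):
    """Lexicon-based detection: any string not in the word list is suspect."""
    return word not in lexicon


def trigrams(word):
    """Letter trigrams of a word, padded with '#' to mark word boundaries."""
    padded = f"#{word}#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_trigram_table(lexicon):
    """Collect every trigram that occurs in at least one lexicon word."""
    table = set()
    for entry in lexicon:
        table |= trigrams(entry)
    return table


def is_suspect_by_trigram(word, table):
    """N-gram detection: flag a word containing a trigram never seen in the lexicon."""
    return any(t not in table for t in trigrams(word))


def edit_distance(a, b):
    """Minimum number of single-letter insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def rank_candidates(word, lexicon, max_distance=2):
    """Return lexicon words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for dist, entry in scored if dist <= max_distance]
```

With this toy lexicon, `rank_candidates("wrld", LEXICON)` puts "world" (one insertion) ahead of "word" (two edits). Scanning the whole lexicon for every suspect word is the simplest possible candidate generator and would be too slow for a realistic word list.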
The hardest form of the problem is the detection and correction of erroneous real words, which generally requires some linguistic knowledge (and, in the worst case, a complete understanding of the meaning of the text). For example, a parser can determine when a real-word error causes a syntax error in the sentence; this technique is the basis for many grammar-based writer's aids, such as CRITIQUE [1]. Word bigram or trigram probabilities, derived from large text corpora, can improve other techniques. Because it admits so many different kinds of approaches, spelling checking is a problem that attracts an audience from many different subfields of computing. Despite much effort and many clever ideas, it remains far from solved. Kukich's review will become the definitive reference for work done up to this point; any computer scientist will enjoy reading it.
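The word bigram idea mentioned in the review can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the survey: the four-sentence training corpus, the confusion set ("piece" versus "peace"), the `<s>` start marker, and the add-one smoothing are all assumptions made for the example, and a usable model would need counts from a large text corpus.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and adjacent word pairs, with '<s>' marking sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Log probability of a token sequence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def best_in_context(tokens, position, confusion_set, unigrams, bigrams):
    """Pick the member of confusion_set that makes the whole sentence most probable."""
    def score(candidate):
        trial = tokens[:position] + [candidate] + tokens[position + 1:]
        return log_prob(trial, unigrams, bigrams)
    return max(confusion_set, key=score)


# Hypothetical toy corpus, standing in for a large training text.
CORPUS = [
    "a piece of cake",
    "a piece of paper",
    "peace and quiet",
    "world peace and love",
]
UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
```

Given the sentence "a peace of cake", calling `best_in_context("a peace of cake".split(), 1, ["piece", "peace"], UNIGRAMS, BIGRAMS)` selects "piece", because "a piece" and "piece of" occur in the training text while "a peace" and "peace of" do not. Both candidates are real words, so a lexicon lookup alone would not flag the error.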
