Information Compression as a Unifying Principle in Human Learning, Perception, and Cognition

This paper reviews evidence for the idea that much of human learning, perception, and cognition may be understood as information compression and often more specifically as “information compression via the matching and unification of patterns” (ICMUP). Evidence includes the following: information compression can mean selective advantage for any creature; the storage and utilisation of the relatively enormous quantities of sensory information would be made easier if the redundancy of incoming information was to be reduced; content words in natural languages, with their meanings, may be seen as ICMUP; other techniques for compression of information (such as class-inclusion hierarchies, schema-plus-correction, run-length coding, and part-whole hierarchies) may be seen in psychological phenomena; ICMUP may be seen in how we merge multiple views to make one, in recognition, in binocular vision, in how we can abstract object concepts via motion, in adaptation of sensory units in the eye of Limulus, the horseshoe crab, and in other examples of adaptation; the discovery of the segmental structure of language (words and phrases), grammatical inference, and the correction of over- and undergeneralisations in learning may be understood in terms of ICMUP; information compression may be seen in the perceptual constancies; there is indirect evidence for ICMUP in human cognition via kinds of redundancy, such as the decimal expansion of π, which are difficult for people to detect; much of the structure and workings of mathematics, an aid to human thinking, may be understood in terms of ICMUP; and there is additional evidence via the SP Theory of Intelligence and its realisation in the SP Computer Model. Three objections to the main thesis of this paper are described, with suggested answers. These ideas may be seen to be part of a “Big Picture” with six components, outlined in the paper.


Introduction
"Fascinating idea! All that mental work I've done over the years, and what have I got to show for it? A goddamned zipfile! Well, why not, after all?" (John Winston Bush, 1996).
This paper describes empirical evidence for the idea that much of human learning, perception, and cognition may be understood as information compression. (This paper updates, revises, and extends the discussion in [1, Chapter 2] but with the main focus on human learning, perception, and cognition.) To be more specific, evidence will be presented that much of human learning, perception, and cognition may be understood as information compression via the discovery of patterns that match each other, with the merging or "unification" of two or more instances of any pattern to make one. References will also be made to the SP Theory of Intelligence and its realisation in the SP Computer Model in which information compression has a central role (Section 2.2.1).
Although this paper is primarily about information compression in human brains, it seems that similar principles apply throughout the nervous system and throughout much of the animal kingdom. Accordingly, this paper has things to say here and there about the workings of neural tissue outside the human brain and in nonhuman species.
1.1. Abbreviations. For the sake of brevity in this paper: "information compression" may be shortened to "IC"; the expression "information compression via the matching and unification of patterns" may be referred to as "ICMUP"; and "human learning, perception, and cognition" may be "HLPC".

Complexity
Figure 1: A schematic representation of the way two instances of the pattern "INFORMATION" in a body of raw data may be unified to form a single "unified" pattern or "chunk" of information, below the "raw data". Lower again in the figure, "w62" is added to the unified chunk as a relatively short identifier or "code". The lowest part of the figure shows how the raw data may be compressed by replacing each instance of "INFORMATION" with a copy of the short identifier. Adapted with permission from Figure 2.3 in [1].
The main thesis of this paper, that much of HLPC may be understood as IC, may be referred to as "ICHLPC".
For reasons given in Section 2.2, the name "SP" stands for Simplicity and Power.
The SP Theory of Intelligence, with its realisation in the SP Computer Model, may be referred to, together, as the SP System.
1.2. Presentation. In this paper, the next section (Section 2) describes some of the background to this research and some relevant general principles; the next-but-one section (Section 3) describes related research; Sections 4 to 20 describe relatively direct empirical evidence in support of ICHLPC; and Section 21 summarises indirect support for ICHLPC via the SP Theory of Intelligence.
Appendix A, referenced from Section 2.3 and elsewhere, gives some mathematical details related to ICMUP and the SP System.
Appendix B, referenced from Section 3.1.1 and elsewhere, describes Horace Barlow's change of view about the significance of IC in mammalian learning, perception, and cognition, with comments.
Appendix C, referenced from Section 22 and elsewhere, describes apparent contradictions of ideas in this paper and how they may be resolved.

Background and General Principles
This section provides some background to this paper and summarises some general principles that have a bearing on ICHLPC and the programme of research of which this paper is a part.

Seven Variants of "Information Compression via the Matching and Unification of Patterns" (ICMUP)
This subsection fills out the concept of ICMUP, starting with the essentials, described in Section 2.1.1, next. Six variants of the basic idea are described in Sections 2.1.2 to 2.1.7.
While care has been taken in this programme of research to avoid unnecessary duplication of information across different publications, the importance of the following seven variants of ICMUP has made it necessary, for the sake of clarity, to describe them quite fully both in this paper and also in [2].

Basic ICMUP.
The main idea in ICMUP is illustrated in the top part of Figure 1. Here, a stream of raw data may be seen to contain two instances of the pattern "INFORMATION". Subjectively, we "see" this immediately. But, in a computer or a brain, the discovery of that kind of replication of patterns must necessarily be done by some kind of searching for matches between patterns.
In itself, the detection of repeated patterns is not very useful. But by merging or "unifying" the two instances of "INFORMATION" in Figure 1, we may create the single instance shown below the raw data, thus achieving some compression of information in the raw data (Appendix A.1).
Other relevant points include the following:
(i) Repetition of patterns and "redundancy" in information. From the perspective of ICMUP, the concept of redundancy in information may be seen as the occurrence of two or more arrays of symbols that match each other. As noted in Section 2.2.2 below, redundancy may take the form of good partial matches between patterns as well as exact matches between patterns.
(ii) A threshold on frequency of occurrence. With regard to the previous point, an important qualification is that, for a given repeating array of symbols, A, to represent redundancy within a given body of information, I, A's frequency of occurrence within I must be higher than what would be expected by chance for an array of the same size [
(iii) Frequencies and sizes of patterns. In connection with the preceding point, the minimum frequency needed to exceed the threshold is smaller for large patterns than it is for small patterns. Contrary to the common assumption that large frequencies are needed to attain statistical significance, frequencies as small as 2 can be statistically significant with patterns of quite moderate size or larger; and large patterns of a given frequency yield more compression than small ones of the same frequency (Appendix A.1; [1, Section 2.2.8.4]).
(iv) The concept of a "chunk" of information. A discrete pattern like "INFORMATION" is often referred to as a chunk of information, a term that gained prominence in psychology largely because of its use by George Miller in his influential paper The magical number seven, plus or minus two [3].
Miller did not use terms like "unification" or "IC", and he saw some uncertainty in the significance of the concept of a chunk: "The contrast of the terms bit and chunk also serves to highlight the fact that we are not very definite about what constitutes a chunk of information" (p. 93, emphasis in the original). However, he describes how chunking of information may achieve something like compression of information: ". . . we must recognize the importance of grouping or organizing the input sequence into units or chunks. Since the memory span is a fixed number of chunks, we can increase the number of bits of information that it contains simply by building larger and larger chunks, each chunk containing more information than before" (p. 93, emphasis in the original) and ". . . the dits and dahs are organized by learning into patterns and . . . as these larger chunks emerge the amount of message that the operator can remember increases correspondingly" (p. 93, emphasis in the original).

(v) Basic ICMUP means lossy compression of information.
A point to notice about basic ICMUP of a body of information, I, is that, without the code mentioned above, it must always be "lossy", meaning that nonredundant information in I will be lost.This is because, in the unification of two or more matching patterns in I, information is lost about the location of the following: (1) all but one of those patterns if the unified chunk is stored in one of the original locations within I or alternatively (2) all of those patterns if the unified chunk is stored outside I.
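Points (ii) and (iii) above may be illustrated with a rough calculation. The sketch below is not the calculation given in Appendix A.1: it assumes, purely for illustration, that symbols are drawn independently and uniformly from an alphabet of a given size, and the function name is hypothetical. Under that assumption, the expected number of chance occurrences of one specific pattern falls off exponentially with the size of the pattern, which is why a frequency as small as 2 can exceed the chance threshold for a pattern of moderate size.

```python
def expected_chance_count(text_len: int, pattern_len: int, alphabet_size: int) -> float:
    """Expected number of occurrences of one specific pattern in random text.

    Assumes independent, uniformly distributed symbols (an illustrative
    assumption, not the model used in the paper's Appendix A.1).
    """
    # Number of positions at which the pattern could start.
    positions = max(text_len - pattern_len + 1, 0)
    # Probability that the pattern occurs at any one given position.
    return positions / alphabet_size ** pattern_len


# A 2-letter pattern is expected many times by chance in a million letters...
short_pattern = expected_chance_count(1_000_000, 2, 26)
# ...whereas an 11-letter pattern such as "INFORMATION" is expected almost never.
long_pattern = expected_chance_count(1_000_000, 11, 26)
```

With these figures, two occurrences of a two-letter pattern are far below what chance would give, while two occurrences of an eleven-letter pattern are far above it.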

Chunking-with-Codes.
The key idea with the chunking-with-codes variant of ICMUP is that each unified chunk of information (Section 2.1.1) receives a relatively short name, identifier, or code, and that code is used as a shorthand for the chunk of information wherever it occurs.
As already noted, this idea is illustrated in Figure 1, where, in the middle of the figure, the relatively short code or identifier "w62" is attached to a copy of the "chunk" "INFORMATION", and we may suppose that the pairing of code and unified chunk would be stored in some kind of "dictionary", separate from the main body of data. Then, under the heading "Compressed data" at the bottom of the figure, each of the two original instances of "INFORMATION" is replaced by the short code "w62" yielding an overall compression of the original data.
Examples of chunking-with-codes from this paper are the use of "ICMUP" as a shorthand for "information compression via the matching and unification of patterns" and "HLPC" as a shorthand for "human learning, perception, and cognition".
The chunking-with-codes variant of ICMUP overcomes the weakness of basic ICMUP noted at the end of Section 2.1.1: that it loses nonredundant information about the locations of chunks in the original data, I. The problem may be remedied with chunking-with-codes because copies of the code for a given chunk may be used to mark the locations of each instance of the chunk within I.
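The way in which codes preserve the locations of chunks may be sketched in a few lines. The example below is a minimal illustration only: the raw data string and the function names are hypothetical, and only the chunk "INFORMATION" and the code "w62" are taken from Figure 1.

```python
def compress(raw: str, chunk: str, code: str) -> tuple[str, dict[str, str]]:
    """Replace every instance of `chunk` with `code`.

    Returns the compressed data and a one-entry dictionary pairing the
    code with the unified chunk, stored separately from the data.
    """
    if code in raw:
        raise ValueError("the code must not already occur in the raw data")
    return raw.replace(chunk, code), {code: chunk}


def decompress(compressed: str, dictionary: dict[str, str]) -> str:
    """Restore the raw data: each code marks where its chunk belongs."""
    for code, chunk in dictionary.items():
        compressed = compressed.replace(code, chunk)
    return compressed


raw = "abINFORMATIONcdefINFORMATIONgh"  # hypothetical raw data
compressed, dictionary = compress(raw, "INFORMATION", "w62")
restored = decompress(compressed, dictionary)
```

Because each copy of the code sits where an instance of the chunk used to be, the original data can be restored exactly, which is not possible with basic ICMUP alone.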
Another point of interest is that, with the chunking-with-codes technique, compression of information may be optimised by assigning shorter codes to more frequent chunks and longer codes to rarer chunks, in accordance with some such scheme as Shannon-Fano-Elias coding [4, Section 5.9].
Similar principles may be applied in the other variants of ICMUP described in Sections 2.1.3 to 2.1.7 below.
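The assignment of shorter codes to more frequent chunks may be made concrete with a small sketch of Shannon-Fano-Elias coding. In the standard scheme, each symbol with probability p receives the first ceil(log2(1/p)) + 1 bits of the binary expansion of the midpoint of its step in the cumulative distribution. The chunk names "w1" to "w4" and their probabilities below are hypothetical, and exact fractions are used to avoid rounding problems.

```python
from fractions import Fraction


def sfe_code(probs: dict[str, Fraction]) -> dict[str, str]:
    """Shannon-Fano-Elias codewords for symbols taken in dictionary order."""
    codes = {}
    cumulative = Fraction(0)
    for symbol, p in probs.items():
        # Codeword length ceil(log2(1/p)) + 1, found with exact arithmetic.
        length = 0
        while Fraction(2) ** length * p < 1:
            length += 1
        length += 1
        # Midpoint of this symbol's step in the cumulative distribution.
        frac = cumulative + p / 2
        # Take the first `length` bits of the midpoint's binary expansion.
        bits = []
        for _ in range(length):
            frac *= 2
            bit = int(frac >= 1)
            bits.append(str(bit))
            frac -= bit
        codes[symbol] = "".join(bits)
        cumulative += p
    return codes


# Hypothetical relative frequencies of four chunks.
chunk_probs = {
    "w1": Fraction(1, 4),
    "w2": Fraction(1, 2),
    "w3": Fraction(1, 8),
    "w4": Fraction(1, 8),
}
codes = sfe_code(chunk_probs)
```

In this example the most frequent chunk, "w2", receives a two-bit code while the two rarest chunks receive four-bit codes.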

Schema-Plus-Correction.
The schema-plus-correction variant of ICMUP is like chunking-with-codes but the unified chunk of information may have variations or "corrections" on different occasions.
An example from everyday life is a menu in a restaurant or café. This provides an overall framework, something like "starter, main course, pudding" which may be seen as a chunk of information. Each of the three elements of the menu may be seen as a place where each customer may make a choice or "correction" to the menu. For example, one customer may choose "starter(soup), main course(fish), pudding(apple pie)" while another customer may choose "starter(salad), main course(vegetable hotpot), pudding(ice cream)", and so on.
The schema-plus-correction variant of ICMUP may achieve compression of information via two mechanisms:
(i) The schema may itself have a short code. In our menu example, each menu may have a short code such as "bm" for the breakfast menu, "lm" for the lunch-time menu, and so on.
(ii) Each "correction" may have a short code. Again with our menu example, options such as "soup", "fish", and so on may each have a short code such as "s" for soup, "f" for fish, and so on.
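The two mechanisms may be combined, as in the following minimal sketch of the menu example. The codes "lm", "s", and "f" are those suggested above; the code "a" for apple pie, the table layout, and the function name are assumptions made for illustration.

```python
# Schema codes: each short code stands for a whole framework of slots.
SCHEMAS = {"lm": ("starter", "main course", "pudding")}  # "lm" = lunch-time menu

# Correction codes: each short code stands for one choice within a slot.
CORRECTIONS = {"s": "soup", "f": "fish", "a": "apple pie"}  # "a" is hypothetical


def expand(order: str) -> str:
    """Expand a compressed order such as 'lm s f a' into its full form."""
    schema_code, *correction_codes = order.split()
    slots = SCHEMAS[schema_code]
    return ", ".join(
        f"{slot}({CORRECTIONS[code]})" for slot, code in zip(slots, correction_codes)
    )


full_order = expand("lm s f a")
```

Here the eight-character order "lm s f a" stands for the much longer description of one customer's meal.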

Run-Length Coding.
The run-length coding variant of ICMUP may be used with any sequence of two or more copies of a pattern where each copy except the first one follows immediately after the preceding copy. In that case, it is only necessary to record one copy of the pattern with the number of copies or with symbols or "tags" to mark the start and end of the sequence.
For example, a repeated pattern like "INFORMATIONINFORMATIONINFORMATIONINFORMATIONINFORMATION" may be reduced to something like "INFORMATION(×5)" (where "×5" records the number of instances of "INFORMATION"). Alternatively, the sequence may be reduced to something like "p INFORMATION* #p", where "*" means that the pattern "INFORMATION" is repeated an unspecified number of times, and "p . . . #p" specifies where the sequence begins and where it stops.
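The first of these two forms may be sketched as follows. This is an illustration of run-length coding for one known pattern, with a hypothetical function name; it does not attempt to discover which patterns repeat.

```python
import re


def run_length_encode(data: str, pattern: str) -> str:
    """Collapse each run of two or more adjacent copies of `pattern`."""
    # Match two or more immediately adjacent copies of the pattern.
    run = re.compile(f"(?:{re.escape(pattern)}){{2,}}")

    def collapse(match: re.Match) -> str:
        copies = len(match.group(0)) // len(pattern)
        return f"{pattern}(×{copies})"

    return run.sub(collapse, data)


encoded = run_length_encode("INFORMATION" * 5, "INFORMATION")
```

A single isolated copy of the pattern is left unchanged, since recording a count would not save anything in that case.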

Class-Inclusion Hierarchy with Inheritance of Attributes.
With the class-inclusion hierarchy variant of ICMUP, there is a hierarchy of classes and subclasses, with "attributes" at each level. At every level except the top level, each subclass "inherits" the attributes of all the higher levels. For example, in simplified form, the class "motorised vehicle" contains subclasses like "road vehicle" and "rail vehicle"; the class "road vehicle" contains subclasses like "bus", "lorry", and "car", and so on. An attribute like "contains engine" would be assigned to the top level ("motorised vehicle") and would be inherited by all lower-level classes, thus avoiding the need to record that information repeatedly at all levels in the hierarchy, and likewise for attributes at lower levels. Thus a class-inclusion hierarchy with inheritance of attributes combines IC with inference, in accordance with the close relation between those two things, noted in Section 2.5.
Of course there are many subtleties in the way people use class-inclusion hierarchies, such as cross-classification, "polythetic" or "family resemblance" concepts (in which no single attribute is necessarily present in every member of the given category and there need be no single attribute that is exclusive to that category [5]), and the ability to recognise that something belongs in a class despite errors of omission, commission, or substitution. The way in which the SP System can accommodate those kinds of subtleties is discussed in [1, Sections 2.3.2, 6.4.3, 12.2, and 13.4.6.2].
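Leaving those subtleties aside, the simplified vehicle example may be sketched as a pair of lookup tables. The class names and the attribute "contains engine" come from the example above; the other attributes and the function name are hypothetical. Each attribute is recorded once, at the highest level to which it applies, and is recovered for lower levels by following the chain of parent classes.

```python
# Each class points to its immediate superclass; the top level has no entry.
PARENT = {
    "road vehicle": "motorised vehicle",
    "rail vehicle": "motorised vehicle",
    "bus": "road vehicle",
    "lorry": "road vehicle",
    "car": "road vehicle",
}

# Attributes are stored once, at the level where they first apply.
ATTRIBUTES = {
    "motorised vehicle": {"contains engine"},
    "road vehicle": {"runs on roads"},   # hypothetical attribute
    "car": {"carries few passengers"},   # hypothetical attribute
}


def attributes_of(cls: str) -> set[str]:
    """Collect the attributes of a class, including all inherited ones."""
    found = set()
    current = cls
    while current is not None:
        found |= ATTRIBUTES.get(current, set())
        current = PARENT.get(current)
    return found


car_attributes = attributes_of("car")
```

The inference that a bus contains an engine is obtained without that fact being stored for "bus" itself, which is the sense in which compression and inference go together here.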

Part-Whole Hierarchy with Inheritance of Contexts.
The part-whole hierarchy variant of ICMUP is like a class-inclusion hierarchy with inheritance of attributes except that the hierarchical structure represents the parts and subparts of some class or entity, and any given part inherits information about the context which it shares with all its siblings on the same level. A part-whole hierarchy promotes economy by sidestepping the need for each part of an entity at any given level to store full information about the higher-level structures of which it is a part, which is the same as for other parts on the same level.
A simple example is the way that a "person" has parts like "head", "body", "arms", and "legs", while an arm may be divided into "upper arm", "forearm", "hand", and so on. In a structure like this, inheritance means that if one hears that a given person has an injury to his or her hand, one can infer immediately that that person's "arm" has been injured and indeed his or her whole "person".
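The inference in this example may be sketched by storing, for each part, only the whole to which it immediately belongs. The table below follows the example above (with singular names for the parts); the function name is hypothetical.

```python
# Each part records only its immediate whole; higher levels are inherited.
PART_OF = {
    "head": "person",
    "body": "person",
    "arm": "person",
    "leg": "person",
    "upper arm": "arm",
    "forearm": "arm",
    "hand": "arm",
}


def affected_wholes(part: str) -> list[str]:
    """List every higher-level structure that contains the given part."""
    chain = []
    while part in PART_OF:
        part = PART_OF[part]
        chain.append(part)
    return chain


injured = affected_wholes("hand")
```

An injury to the hand is thus seen to be an injury to the arm and to the person, without "hand" itself storing anything about "person".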

SP-Multiple-Alignment as a Generalised Version of ICMUP.
The seventh of the versions of ICMUP considered in this paper is the concept of SP-multiple-alignment, described in Section 2.2.2 below.
SP-multiple-alignment may be seen to be a generalised version of ICMUP which encompasses the other six versions described in Sections 2.1.1 to 2.1.6.
This versatility in modelling other versions of ICMUP is not altogether surprising since SP-multiple-alignment is largely responsible for the SP System's versatility in diverse aspects of intelligence (including diverse kinds of reasoning), in the representation of diverse kinds of knowledge, and its potential for the seamless integration of diverse aspects of intelligence and diverse kinds of knowledge, in any combination (Section 2.2.5).

The SP Theory of Intelligence.
Readers will see that the paper contains references to the SP Theory of Intelligence, its realisation in the SP Computer Model, and associated ideas, especially the concept of SP-multiple-alignment. But it must be emphasised that the SP Theory is not the main focus of the paper. Instead it is relevant for subsidiary reasons: For those reasons, an outline of the theory is appropriate here.

Outline of the SP Theory of Intelligence: Introduction.
The SP Theory of Intelligence and its realisation in the SP Computer Model (together, the SP System) is a unique attempt to simplify and integrate observations and concepts across artificial intelligence, mainstream computing, mathematics, and human learning, perception, and cognition, with IC as a unifying theme. This broad scope for the SP programme of research has been adopted for reasons summarised in Section 2.6 below.
As mentioned in Section 1.1, the name "SP" stands for Simplicity and Power. This is because compression of any given body of information, I, may be seen as a process of reducing informational "redundancy" in I and thus increasing its "simplicity", while retaining as much as possible of its nonredundant expressive "power".
The SP Theory, the SP Computer Model, and some applications are described quite fully in [6] and much more fully in [1]. Details of other publications about the SP System, most with download links, may be found on http://www.cognitionresearch.org/sp.htm. A download link for the source code of SP71, the latest version of the SP Computer Model, may be found under the heading "SOURCE CODE" near the bottom of that page.
The SP Theory is conceived as a brain-like system as shown schematically in Figure 2. The system receives New information via its senses and stores some or all of it in compressed form as Old information.
All kinds of knowledge or information in the SP System are represented with arrays of atomic SP-symbols in one or two dimensions called SP-patterns. At present, the SP Computer Model works only with one-dimensional SP-patterns but it is envisaged that, at some stage, it will be generalised to work with two-dimensional SP-patterns.

SP-Multiple-Alignment.
A central part of the SP System is the powerful concept of SP-multiple-alignment, outlined here. The concept is described more fully in [6, Section 4] and [1, Sections 3.4 and 3.5].
The concept of SP-multiple-alignment in the SP System is derived from the concept of "multiple sequence alignment" in bioinformatics (see, e.g., [7]). That latter concept means an arrangement of two or more DNA sequences or sequences of amino-acid residues so that, by judicious "stretching" of sequences in a computer, symbols that match from row to row are aligned, as illustrated in Figure 3. A "good" multiple sequence alignment is one with a relatively high value for some metric related to the number of symbols that have been brought into line.
For a given set of sequences, finding or creating "good" multiple sequence alignments amongst the many possible "bad" ones is normally a complex process, normally too complex to be solved by exhaustive search. For that reason, bioinformatics programs for finding good multiple sequence alignments use heuristic methods, building multiple sequence alignments in stages and discarding low-scoring multiple sequence alignments at the end of each stage, with backtracking or something equivalent to improve the robustness of the search.
With such methods, it is not normally possible to guarantee that the best possible multiple sequence alignment has been found, but it is normally possible to find multiple sequence alignments that are good enough for practical purposes.
The two main differences between the concept of SP-multiple-alignment in the SP System and the concept of multiple sequence alignment in bioinformatics are the following:
(i) New and Old information. With an SP-multiple-alignment, one of the SP-patterns (sometimes more than one) is New information from the system's environment (see Figure 2), and the remaining SP-patterns are Old information, meaning information that has been previously stored (also shown in Figure 2).
(ii) Encoding New information economically in terms of Old information. In the creation of SP-multiple-alignments, the aim is to build ones that, in each case, allow the New SP-pattern (or SP-patterns) to be encoded economically in terms of the Old SP-patterns in the given SP-multiple-alignment. In each case, there is an implicit merging or unification of SP-patterns or parts of SP-patterns that match each other, as described in [6, Section 4.1] and [1, Section 3.5].
In the SP-multiple-alignment shown in Figure 4, one New SP-pattern is shown in row 0, and Old SP-patterns, drawn from a repository of Old SP-patterns, are shown in rows 1 to 9. By convention, the New SP-pattern(s) is always shown in row 0 and the Old SP-patterns are shown in the other rows, one SP-pattern per row.
In this example, the New SP-pattern is a sentence and the Old SP-patterns in rows 1 to 9 represent grammatical structures including words. The overall effect of the SP-multiple-alignment is to "parse" or analyse the sentence into its constituent parts and subparts, with each part marked with a category like "NP" (meaning "noun phrase"), "N" (meaning "noun"), "VP" (meaning "verb phrase"), and so on.
But, as described in Section 2.2.5, the SP-multiple-alignment construct can do much more than parse sentences.
Each SP-multiple-alignment is evaluated in terms of how well it allows the New SP-pattern in row 0 to be encoded economically in terms of the Old SP-patterns in the other rows. An SP-multiple-alignment is "good" if the encoding is indeed economical. Details of how this is done are described in Appendix A.4.
With SP-multiple-alignments in the SP System, as with multiple sequence alignments in bioinformatics, the process of finding "good" SP-multiple-alignments is too complex for exhaustive search, so it is normally necessary to use heuristic methods. This means that, as before, the best possible results may be missed but it is normally possible to find SP-multiple-alignments that are reasonably good.
At the heart of SP-multiple-alignment is a process for finding good full and partial matches between SP-patterns, described quite fully in [1, Appendix A]. As in the building of SP-multiple-alignments, heuristic search is an important part of the process of finding good full and partial matches between SP-patterns. Some details with relevant calculations are given in Appendix A.8.
As noted in Section 2.1.7, the concept of SP-multiple-alignment may be seen to be a generalised version of ICMUP, which encompasses all the other six variants of ICMUP described in Section 2.1.

Unsupervised Learning in the SP System.
Unsupervised learning in the SP System is described in [6, Section 5] and [1, Chapter 9]. In brief, it means searching for one or more collections of Old SP-patterns called grammars which are relatively good for the economical encoding of a given set of New SP-patterns.
As with the building of SP-multiple-alignments (Section 2.2.2) and the process of finding good full and partial matches between SP-patterns [1, Appendix A] and many other AI programs, unsupervised learning in the SP System uses heuristic techniques: doing the search in stages and, at each stage, concentrating the search in the most promising areas and cutting out the rest.Some of the details of relevant calculations are given in Appendix A.7.
As mentioned in Section 2.2.4, learning in the SP System is quite different from the popular "Hebbian" learning, often characterised as "Cells that fire together wire together", and it is quite different from how deep learning systems learn. (Hebb's original version of his learning rule is "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." [9, p. 62])
2.2.4. SP-Neural. Functionality that is similar to that of the SP System may be realised in a "neural" sister to the SP System called SP-Neural, expressed in terms of neurons and their interconnections [8], as illustrated in Figure 5. Although the main elements of SP-Neural have been defined, there are details to be filled in. As with the development of the SP Theory itself, it is likely that many insights may be gained by building computer models of SP-Neural.
An important point here is that SP-Neural is quite different from the kinds of "artificial neural network" that are popular in computer science, including those that provide the basis for "deep learning" [10].
It is relevant to mention that Section V of [11] describes thirteen problems with deep learning in artificial neural networks and how, with the SP System, those problems may be overcome. The SP System also provides a comprehensive solution to a fourteenth problem with deep learning, "catastrophic forgetting", meaning the way in which new learning in a deep learning system wipes out old memories.
Probably, SP-Neural's closest relative is Donald Hebb's [9] concept of a "cell assembly" but, since learning in SP-Neural is likely to be modelled on learning in the SP System (Section 2.2.3), it will be quite different from Hebbian learning and also quite different from learning in deep learning systems. More loosely, SP-Neural, when it is more fully developed, is likely to bear a superficial resemblance to Alan Turing's concept of an "unorganised" machine [12] because its neural tissues would become progressively more organised as it learns.
Figure 5: Each broken-line rectangle with rounded corners represents a pattern assembly, corresponding to an SP-pattern in the SP Theory. Each character or group of characters enclosed in a solid-line ellipse represents a neural symbol corresponding to an SP-symbol in the SP Theory. The lines between pattern assemblies represent nerve fibres with arrows showing the direction in which impulses travel. Neural symbols are mainly symbols from linguistics such as "NP" meaning "noun phrase", "D" meaning a "determiner", "#D" meaning the end of a determiner, and "#NP" meaning the end of a noun phrase. Reproduced with permission from Figure 3 in [8].

Strengths and
(iii) Versatility in the representation and processing of knowledge.The SP System has strengths in the representation and processing of several different kinds of knowledge including the syntax of natural languages; class-inclusion hierarchies (with or without cross classification); part-whole hierarchies; discrimination networks and trees; if-then rules; entity-relationship structures; relational tuples; and concepts in mathematics, logic, and computing, such as "function", "variable", "value", "set", and "type definition".With the addition of two-dimensional SP-patterns to the SP System, there is potential to represent such things as photographs, diagrams, structures in three dimensions, and procedures that work in parallel.
(iv) Seamless integration of diverse aspects of intelligence and diverse kinds of knowledge, in any combination.
Because the SP System's versatility (in diverse aspects of intelligence and in the representation of diverse kinds of knowledge) flows from one relatively simple framework, SP-multiple-alignment, the system has clear potential for the seamless integration of diverse aspects of intelligence and diverse kinds of knowledge, in any combination. That kind of seamless and 'semantic' integration appears to be essential in modelling the fluidity, versatility, and adaptability of the human mind.
Figure 6 shows schematically how the SP System, with SP-multiple-alignment centre stage, exhibits versatility and integration.
There are more details in [6] and many more in [1]. Distinctive features and advantages of the SP System are described quite fully in [11].
How absolute and relative probabilities for SP-multiple-alignments may be calculated (for use in reasoning and other aspects of AI) is detailed in Appendix A.6.

Potential Benefits and Applications of the SP System.
Apart from its strengths and potential in modelling aspects of the human mind, it appears that, in more humdrum terms, the SP System has several potential benefits and applications.These include helping to solve nine problems with big data, helping to develop intelligence in autonomous robots, development of an intelligent database system, medical diagnosis, computer vision and natural vision, suggesting avenues for investigation in neuroscience, commonsense reasoning, and more.Details of relevant papers, with download links, may be found on http://www.cognitionresearch.org/sp.htm.

Avoiding Too Much Dependence on Mathematics.
Many approaches to IC have a mathematical flavour (see, e.g., [14]).Much the same is true of concepts of inference and probability which, as outlined in Section 2.5, are closely related to IC.
In the SP programme of research, the orientation is different. The SP Theory attempts to get below or behind the mathematics of other approaches to IC and to concepts of inference and probability: it attempts to focus on ICMUP, the relatively simple, "primitive" idea that information may be compressed by finding two or more patterns that match each other, and merging or "unifying" them so that multiple instances of the pattern are reduced to one.
That said, there is some mathematics associated with ICMUP, and there is some more which is incorporated in the SP Computer Model. Both are described in Appendix A and referenced at appropriate points throughout this paper.
There are four main reasons for this focus on ICMUP and the avoidance of too much dependence on mathematics:
(ii) Do not use mathematics in describing the foundations of mathematics. The SP Theory aims to be, amongst other things, a theory of the foundations of mathematics [2], so it would not be appropriate for the theory to be too dependent on mathematics.
(iii) The SP Theory is not founded on the concept of a universal Turing machine. While the SP Theory has benefitted from valuable insights gained from mathematically oriented research on Algorithmic Probability Theory, Algorithmic Information Theory, and related work (Section 3.2), it differs from that work in that it is not founded on the concept of a "universal Turing machine".
Instead, a focus on ICMUP has yielded a new theory of computing and cognition, founded on ICMUP and SP-multiple-alignment, with the generality of the universal Turing machine [1, Chapter 4] but with strengths in the modelling of human-like intelligence which, as Alan Turing acknowledged [12,15], are missing from the universal Turing machine (Section 2.2.5, [1,6]).
(iv) ICMUP is not obvious in such techniques as wavelet compression and arithmetic coding. At some abstract level, it may be that all mathematically based techniques for compression of information are founded on ICMUP. And if the thesis of [2] is true, then all such techniques will indeed have an ICMUP foundation. But, nevertheless, techniques for the compression of information such as wavelet compression or arithmetic coding seem far removed from the simple idea of finding patterns that match each other and merging them into a single instance.
The SP System, including the concepts of SP-multiple-alignment and ICMUP, provides a novel approach to concepts of IC and probability (Section 2.5) which appears to have potential as an alternative to more widely recognised methods in these areas.
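As an informal illustration of the basic ICMUP idea described at the start of this section, the following Python sketch finds a pattern that occurs more than once in a string, keeps a single stored copy, and marks each instance with a short code. It is a toy under stated assumptions, not the SP Computer Model: the function names, the "#" code symbol, and the brute-force search are all hypothetical choices made for the example.

```python
def longest_repeat(text):
    """Return the longest substring occurring at least twice without overlap."""
    n = len(text)
    for size in range(n // 2, 0, -1):      # try the largest patterns first
        first_seen = {}
        for i in range(n - size + 1):
            chunk = text[i:i + size]
            start = first_seen.setdefault(chunk, i)
            if i - start >= size:          # a second, non-overlapping match
                return chunk
    return ""


def unify(text, code="#"):
    """Keep one stored copy of the repeated pattern; mark each instance with `code`."""
    if code in text:
        raise ValueError("the code symbol must not occur in the text")
    chunk = longest_repeat(text)
    if not chunk:
        return text, {}
    return text.replace(chunk, code), {code: chunk}


def restore(compressed, store):
    """Reverse `unify` by expanding every code back into its stored pattern."""
    for code, chunk in store.items():
        compressed = compressed.replace(code, chunk)
    return compressed


original = "INFORMATION about INFORMATION"
compressed, store = unify(original)
print(compressed, store)
```

In this example the two instances of "INFORMATION" are reduced to one stored copy, and the original can be recovered without loss from the compressed string plus the store.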

Empirical Evidence and Quantification.
Although quantification of empirical evidence can in some studies be necessary or at least useful, it appears that, with most of the evidence presented in this paper (except in Sections 15 and 16), quantification would not be feasible or useful. In any case, attempts at quantification would be a distraction from the main thrust of the paper: that many examples of IC in HLPC are staring us in the face without the need for quantification.
As an example (from Section 6), a name like "New York" is, in the manner of chunking-with-codes, a relatively brief "code" for the enormously complex "chunk" of information which is the structure and workings of the city itself. Similar things can be said about most other names for things, and also "content" words like "house", "table", and so forth. In short, natural language may be seen to be a very powerful means of compressing information via the chunking-with-codes technique, and this is clear without the need for quantification.

IC and Concepts of Inference and Probability.
It has been recognised for some time that there is an intimate relation between IC and concepts of inference and probability (Appendix A.2, [16][17][18][19]).
In case this seems obscure, it makes sense in terms of ICMUP. A pattern that is repeated is one that invites ICMUP but it is also one that, via inductive reasoning, suggests possible inferences:
(i) Any repeating pattern provides a basis for prediction. Any repeating pattern, such as the association between black clouds and rain, provides a basis for prediction: black clouds suggest that rain may be on the way, and probabilities may be derived from the number of repetitions.
(ii) Inferences via partial matching. With basic ICMUP and its variants, inferences may be made when one new pattern from the environment matches part of a stored, unified pattern. If, for example, we see "INFORMA", we may guess, on the strength of the stored pattern, "INFORMATION" (Figure 1), that the letters "TION" are likely to follow. This idea is sometimes called "prediction by partial matching" [20]. Of course, the pattern may be completed in a similar way if the incoming information is "INFORMAN", "INMATION", "INFRMAION", and so on.
(iii) The SP System is designed to find partial matches as well as exact matches. Because of the need to make inferences like those just described and because a prominent feature of human perception is that we are rather good at finding good partial matches between patterns as well as exact matches, the SP System, including the process for building SP-multiple-alignments, is designed to search for redundancy in the form of good partial matches between patterns, as well as redundancy in the form of exact matches. This is done with a version of dynamic programming. There is a lot more detail about how this works with the SP-multiple-alignment concept in Appendix A.6, [6, Section 4.4], and [1, Section 3.7 and Chapter 7]. The SP System has proven to be an effective alternative to Bayesian theory in explaining such phenomena as "explaining away" ([6, Section 10.2], [1, Section 7.8]).
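The kind of partial matching described in items (ii) and (iii) can be sketched with a textbook dynamic programming routine. The Python example below scores each stored pattern by the length of its longest common subsequence with the incoming fragment and picks the best. This is only an assumed stand-in for illustration: it is not the search method used in the SP Computer Model, and the stored patterns other than "INFORMATION" are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_match(new, stored):
    """Return the stored pattern sharing the most symbols, in order, with `new`."""
    return max(stored, key=lambda pattern: lcs_length(new, pattern))


# Hypothetical store of previously unified patterns.
STORED = ["INFORMATION", "COMPRESSION", "PERCEPTION"]

for fragment in ["INFORMA", "INFORMAN", "INMATION", "INFRMAION"]:
    print(fragment, "->", best_match(fragment, STORED))
```

Each of the four fragments, whether truncated or with letters missing or substituted, is matched to the same stored pattern, which can then be used to fill in what is missing.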
As indicated in Section 4, the close connection between IC and concepts of inference and probability makes sense in terms of biology.
2.6. The Big Picture.
The credibility of the ICHLPC thesis of this paper is strengthened by its position in a "Big Picture" of the importance of IC in at least six areas:
(i) Evidence for IC as a unifying principle in human learning, perception, and cognition. This paper describes relatively direct empirical evidence for IC (and more specifically ICMUP) as a unifying principle in HLPC.
(ii) IC in the SP Theory of Intelligence. ICMUP is central in the SP Theory of Intelligence (Section 2.2) which itself has much empirical and analytical support, summarised in Section 2.2.5, with pointers to where further information may be found.
(iii) IC in Neuroscience. Because of its central role in the SP System, IC is central in SP-Neural (Section 2.2.4) and may thus have an important role in neuroscience.
(iv) IC and concepts of inference and probability. It is known that there is an intimate relation between IC and concepts of inference and probability (Section 2.5).
(v) IC as a foundation for mathematics. The paper "Mathematics as information compression via the matching and unification of patterns" [2] argues that much of mathematics, perhaps all of it, may be understood in terms of ICMUP.
(vi) IC as a unifying principle in science. It is widely agreed that "Science is, at root, just the search for compression in the world" [21, p. 247], with variations such as "Science may be regarded as the art of data compression" [19, p. 585], and more.
The Big Picture, as just outlined, is important for reasons summarised here:
(i) You can't play 20 questions with nature and win. In his famous essay, "You can't play 20 questions with nature and win", Allen Newell [22] writes about the sterility of developing theories in narrow fields and calls for each researcher to focus on "a genuine slab of human behaviour" (p. 303). (Newell's essay and his book Unified Theories of Cognition [23] led to many attempts by himself and others to develop such theories. But the difficulty of reaching agreement on a comprehensive framework for general, human-like AI is suggested by the following observation in [24, : "Despite all the current enthusiasm in AI, the technologies involved still represent no more than advanced versions of classic statistics and machine learning." And what follows [24, Location 52] seems to confirm the persistence of the longstanding fragmentation of AI: "Behind the scenes, however, many breakthroughs are happening on multiple fronts: in unsupervised language and grammar learning, deep-learning, generative adversarial methods, vision systems, reinforcement learning, transfer learning, probabilistic programming, blockchain integration, causal networks, and many more".)
(ii) Ockham's razor. Newell's exhortation accords with a slightly extended version of Ockham's razor: in developing simple theories of empirical phenomena, we should concentrate on those with the greatest explanatory range. Such theories will, naturally, be more useful than those with narrow scope, but, in addition, it seems that they are often relatively robust in the face of new evidence.
(iii) If you can't solve a problem, enlarge it. In a similar vein, President Eisenhower is reputed to have said: "If you can't solve a problem, enlarge it", meaning that putting a problem in a broader context may make it easier to solve. Good solutions to a problem may be hard to see when the problem is viewed through a keyhole but become visible when the door is opened.
In keeping with these three reasons, the Big Picture is important in showing the potential of IC as a unifying principle across a wide canvas, including the six areas mentioned above.
Each of the six components of the Big Picture has support via empirical and analytical evidence which is specific to that component. In addition, the six components are mutually supportive in the sense that the credibility of any one of them, including the main ICHLPC thesis of this paper, is strengthened via its position in the Big Picture.
Implications of the Big Picture include, for example, the fact that IC should be a key part of any and all proposals for general, human-like AI, for theories of human learning, perception, and cognition and for theories of cognitive neuroscience.

Volumes of Data and Speeds of Learning.
As noted in Section 2.1.1, large patterns may exceed the threshold for redundancy at a lower frequency than small patterns. With a complex pattern, such as an image of a person or a tree, there can be significant redundancy in a mere 2 occurrences of the pattern.
If redundancies can be detected via patterns that occur only 2 or 3 times in a given sample of data, unsupervised learning may prove to be effective with smallish amounts of data. This may help to explain why, in contrast to the very large amounts of data that are apparently required for success with deep learning, children and non-deep-learning types of learning program can do useful things with relatively tiny amounts of data [11, Section V-E].
In this connection, neuroscientist David Cox has been reported as saying: "To build a dog detector [with a deep learning system], you need to show the program thousands of things that are dogs and thousands that aren't dogs. My daughter only had to see one dog" and, the report says, she was happily pointing out puppies ever since. ("Inside the moonshot effort to finally figure out the brain", MIT Technology Review, 2017-10-12, https://bit.ly/2wRxsOg.) This issue relates to the way in which a camouflaged animal is likely to become visible when it moves relative to its background (Section 12). As with random-dot stereograms (Section 11), only two images that are similar but not the same are needed to reveal hidden structure.
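The opening point of this subsection, that large patterns may exceed the threshold for redundancy at a lower frequency than small ones, can be illustrated with simple arithmetic. The Python sketch below assumes a deliberately crude cost model that is not taken from Section 2.1.1 or Appendix A: one copy of the pattern is stored, and each of its instances is replaced by a code of fixed size. The function names and the example sizes are hypothetical.

```python
def saving(size, frequency, code_size):
    """Symbols saved by storing a pattern once and replacing each instance by a code.

    Assumed cost model (illustrative only): before unification the pattern costs
    frequency * size symbols; afterwards it costs one stored copy plus one code
    per instance.
    """
    return frequency * size - (size + frequency * code_size)


def min_frequency(size, code_size):
    """Smallest number of occurrences at which unification starts to save space."""
    if size <= code_size:
        return None                # the code is no shorter than the pattern itself
    frequency = 2
    while saving(size, frequency, code_size) <= 0:
        frequency += 1
    return frequency


print(min_frequency(1000, 2))      # a large pattern, such as an image
print(min_frequency(3, 2))         # a very small pattern
```

Under these assumptions a pattern of 1000 symbols pays for itself at 2 occurrences, whereas a pattern of 3 symbols needs 4.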

Emotions and Motivations.
A point that deserves emphasis is that while this paper is part of a programme of research aiming for simplification and integration of observations and ideas in HLPC and related fields, it does not aspire to be a comprehensive view of human psychology. In particular, it does not attempt to say anything about emotions or motivations, despite their undoubted importance and relevance to many aspects of human psychology, including cognitive psychology. That said, it seems possible that IC might apply to emotions or motivations in the same way that it may be applied to sensory data and our concepts about the world.

Related Research
An early example of thinking relating to IC in HLPC was the suggestion by William of Ockham in the 14th century that "Entities are not to be multiplied beyond necessity". Later, Isaac Newton wrote that "Nature is pleased with simplicity" [25, p. 320], Albert Einstein wrote that "A theory is more impressive the greater the simplicity of its premises, the more different things it relates, and the more expanded its area of application" (quoted in [26, p. 512]), and more. Research with a more direct bearing on ICHLPC began in the 1950s and 1960s after the publication of Claude Shannon's [16] "theory of communication" (later called "information theory") and was partly inspired by it.
In the two subsections that follow, there is a rough distinction between research with the main focus on issues in HLPC and neuroscience and research that concentrates on issues in mathematics and computing. In both subsections, research is described roughly in the order in which it was published.
In this research, the prevailing view of information, compression of information, and probabilities is that they are things to be defined and analysed in mathematical terms. This perspective has yielded some useful insights but, as suggested in Section 2.3, there are potential advantages in the ICMUP perspective adopted in the SP research. This ICMUP perspective is what chiefly distinguishes the evidence that provides the main thrust of this paper from the related research described in this section.

Psychology-Related and Neuroscience-Related Research.
Research relating to IC and HLPC and neuroscience may be divided roughly into two parts: early research initiated in the 1950s and 1960s by Fred Attneave, Horace Barlow and others and then, after a relative lull in activity, later research from the 1990s onwards.

Early Psychology-Related and Neuroscience-Related Research.
In a paper called "Some informational aspects of visual perception", Fred Attneave [27] describes evidence that visual perception may be understood in terms of the distinction between areas in a visual image where there is much redundancy and boundaries between those areas where nonredundant information is concentrated: ". . . information is concentrated along contours (i.e., regions where color changes abruptly) and is further concentrated at those points on a contour at which its direction changes most rapidly (i.e., at angles or peaks of curvature)" [27, p. 184].
For those reasons, he suggests that "Common objects may be represented with great economy and fairly striking fidelity by copying the points at which their contours change direction maximally and then connecting these points appropriately with a straight edge" [27, p. 185]. And he illustrates the point with a drawing of a sleeping cat reproduced in Figure 7.
And he concludes with the suggestion that perception may be seen as economical description: "It appears likely that a major function of the perceptual machinery is to strip away some of the redundancy of stimulation, to describe or encode incoming information in a form more economical than that in which it impinges on the receptors" [27, p. 189].
Satosi Watanabe picked up the baton in a paper called "Information-theoretical aspects of inductive and deductive inference" [28]. He later wrote about the role of IC in pattern recognition [29,30].
At about this time, Horace Barlow published a paper called "Sensory mechanisms, the reduction of redundancy, and intelligence" [31] in which he argued, on the strength of the large amounts of sensory information being fed into the [mammalian] central nervous system, that "the storage and utilization of this enormous sensory inflow would be made easier if the redundancy of the incoming messages was reduced" (p. 537). And he draws attention to evidence that, in mammals at least, each optic nerve is too small, by a wide margin, to carry reasonable amounts of the information impinging on the retina unless there is considerable compression of that information [31, p. 548].
Figure 7: Drawing made by abstracting 38 points of maximum curvature from the contours of a sleeping cat and connecting these points appropriately with a straight edge. Reproduced from Figure 3 in [27], with permission.
In the paper, Barlow makes the interesting suggestion that ". . . the mechanism that organises [the large size of the sensory inflow] must play an important part in the production of intelligent behaviour" (p. 555), and in a later paper [32, p. 210] he writes the following: ". . . the operations required to find a less redundant code have a rather fascinating similarity to the task of answering an intelligence test, finding an appropriate scientific concept, or other exercises in the use of inductive reasoning. Thus, redundancy reduction may lead one towards understanding something about the organization of memory and intelligence, as well as pattern recognition and discrimination". These prescient insights into the significance of IC for the workings of human intelligence, with further discussion in [33], are a strand of thinking that has been carried through into the SP Theory of Intelligence, with a wealth of supporting evidence, summarised in Section 2.2.5. (When I was an undergraduate at Cambridge University, it was fascinating lectures by Horace Barlow about the significance of IC in the workings of brains and nervous systems that first got me interested in those ideas.) Barlow developed these and related ideas over a period of years in several papers, some of which are referenced in this paper. However, in [34], he adopted a new position, arguing that ". . . the [compression] idea was right in drawing attention to the importance of redundancy in sensory messages because this can often lead to crucially important knowledge of the environment, but it was wrong in emphasizing the main technical use for redundancy, which is compressive coding. The idea points to the enormous importance of estimating probabilities for almost everything the brain does, from determining what is redundant to fuelling Bayesian calculations of near optimal courses of action in a complicated world" (p. 242).
While there are some valid points in what Barlow says in support of his new position, his overall conclusions appear to be wrong.His main arguments are summarised in Appendix B, with what I'm sorry to say are my critical comments after each one.(I feel apologetic about this because, as I mentioned, Barlow's lectures and his earlier research relating to IC in brains and nervous systems have been an inspiration for me over many years.)

Later Psychology-Related and Neuroscience-Related Research.
Like the earlier studies, later studies relating to IC in brains and nervous systems have little to say about ICMUP. But they help to confirm the importance of IC in HLPC and thus provide support for ICHLPC. A selection of publications is described briefly here.
Ruma Falk and Clifford Konold [35] describe the results of experiments indicating that the perceived randomness of a sequence is better predicted by various measures of its encoding difficulty than by its objective randomness.They suggest that judging the extent of a sequence's randomness is based on an attempt to encode it mentally and that the subjective experience of randomness may result when that kind of attempt fails.
Jose Hernández-Orallo and Neus Minaya-Collado [36] propose a definition of intelligence in terms of IC. At the most abstract level, it chimes with remarks by Horace Barlow quoted in Section 3.1.1, and indeed it is consonant with the SP Theory itself. But the proposal shows no hint of how to model the kinds of capabilities that one would expect to see in any artificial system that aspires to human-like intelligence.
Nick Chater, with others, has conducted extensive research on HLPC, compression of information, and concepts of probability, generally with an orientation towards Algorithmic Information Theory, Bayesian theory, and related ideas. For example,
(i) Chater [37] discusses how "simplicity" and "likelihood" principles for perceptual organisation may be reconciled, with the conclusion that they are equivalent. He suggests that "the fundamental question is whether, or to what extent, perceptual organization is maximizing simplicity and maximizing likelihood" (p. 579).
(ii) Chater [38] discusses the idea that the cognitive system imposes patterns on the world according to a simplicity principle, meaning that it chooses the pattern that provides the briefest representation of the available information. Here, the word "pattern" means essentially a theory or system of one or more rules, a meaning which is quite different from the meaning of "pattern" or "SP-pattern" in the SP research, which simply means an array of atomic symbols in one or two dimensions. There is further discussion in [39].
(iii) Emmanuel Pothos and Nick Chater [40] present experimental evidence in support of the idea that, in sorting novel items into categories, people prefer the categories that provide the simplest encoding of these items.
(iv) Nick Chater and Paul Vitányi [41] describe how the "simplicity principle" allows the learning of language from positive evidence alone, given quite weak assumptions, in contrast to results on language learnability in the limit [42].There is further discussion in [43].
(v) Editors Nick Chater and Mike Oaksford [44] present a variety of studies using Bayesian analysis to understand probabilistic phenomena in HLPC.
(vi) Paul Vitányi and Nick Chater [45] discuss whether it is possible to infer a probabilistic model of the world from a sample of data from the world and, via arguments relating to Algorithmic Information Theory, they reach positive conclusions.
Jacob Feldman [46] describes experimental evidence that when people are asked to learn "Boolean concepts", meaning categories defined by logical rules, the subjective difficulty of learning a concept is directly proportional to its "compressibility", meaning the length of the shortest logically equivalent formula.
Don Donderi [47] presents a review of concepts that relate to the concept of "visual complexity". These include Gestalt psychology, Neural Circuit Theory, Algorithmic Information Theory, and Perceptual Learning Theory. The paper includes discussion of how these and related ideas may contribute to an understanding of human performance with visual displays.
Vivien Robinet and coworkers [48] describe a dynamic hierarchical chunking mechanism, similar to the MK10 Computer Model (Section 15). The theoretical orientation of this research is towards Algorithmic Information Theory, while the MK10 Computer Model embodies ICMUP.
From analysis and experimentation, Nicolas Gauvrit and others [49] conclude that how people perceive complexity in images seems to be partly shaped by the statistics of natural scenes. In [50], a slightly different grouping with Gauvrit as lead author describe how it is possible to overcome the apparent shortcoming of Algorithmic Information Theory in estimating the complexity of short strings of symbols, and they show how the method may be applied to examples from psychology.
In a review of research on the evolution of natural language, Simon Kirby and others [51] describe evidence that transmission of language from one person to another has the effect of developing structure in language, where "structure" may be equated with compressibility. On the strength of further research, [52] conclude that increases in compressibility arise from learning processes (storing patterns in memory), whereas reproducing patterns leads to random variations in language.
On the strength of a theoretical framework, an experiment, and a simulation, Benoît Lemaire and coworkers [53] argue that the capacity of the human working memory may be better expressed as a quantity of information rather than a fixed number of chunks.
In related work, Fabien Mathy and Jacob Feldman [54] redefine George Miller's [3] concept of a "chunk" in terms of Algorithmic Information Theory as a unit in a "maximally compressed code". On the strength of experimental evidence, they suggest that the true limit on short-term memory is about 3 or 4 distinct chunks, equivalent to about 7 uncompressed items (of average compressibility), consistent with George Miller's famous magical number.
And Mustapha Chekaf and coworkers [55] describe evidence that people can store more information in their immediate memory if it is "compressible" (meaning that it conforms to a rule such as "all numbers between 2 and 6") than if it is not compressible. They draw the more general conclusion that immediate memory is the starting place for compressive recoding of information.
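The notion of a "compressible" sequence in the last two studies can be made concrete with a small Python sketch. It recodes a list of numbers as runs of consecutive values, so that a sequence conforming to a rule like "all numbers between 2 and 6" collapses to a single chunk. The function name and the run-based recoding are assumptions made for illustration; they are not the materials or measures used in [54] or [55].

```python
def compress_runs(numbers):
    """Recode a sequence as (first, last) runs of consecutive increasing integers."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], n)    # extend the current run
        else:
            runs.append((n, n))            # start a new run
    return runs


regular = [2, 3, 4, 5, 6]      # conforms to "all numbers between 2 and 6"
irregular = [2, 5, 3, 6, 4]    # same items, no exploitable order

print(len(compress_runs(regular)), "chunk(s) for", regular)
print(len(compress_runs(irregular)), "chunk(s) for", irregular)
```

The same five items occupy one chunk when they follow the rule and five chunks when they do not, which is the sense in which a fixed number of chunks can hold a variable number of items.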
In addition to these several studies, there is quite a large body of research which relates to the concept of "efficient coding" in brains and nervous systems. This includes the studies described in the following paragraphs.
Tiberiu Teşileanu, Bence Ölveczky, and Vijay Balasubramanian [56] developed a computer model of efficient two-stage learning, which proved accurate against data for the learning of birdsong by birds.
Ann Hermundstad and colleagues [57] found evidence in support of the propositions that efficient coding extends to higher-order sensory features and that more neural resources are applied when sensory data is limited.
Vijay Balasubramanian [58] argues that the remarkable energy efficiency of the brain is achieved in part through the dedication of specialized circuit elements and architectures to specific computational tasks, in a hierarchy stretching from the scale of neurons to the scale of the entire brain, and that these structures are learned via an evolutionary process.
Francisco Heras and colleagues [59] provide evidence for mechanisms promoting energy efficiency in the workings of blowfly photoreceptors.
Biswa Sengupta and colleagues [60] investigate why the conversion of "graded" potentials in the brain's neural circuits to "action" potentials in those circuits is accompanied by substantial information loss and how this changes energy efficiency.
Simon Laughlin and Terrence Sejnowski [61] describe some of "the geometric, biophysical, and energy constraints that have governed the evolution of cortical networks", how "nature has optimized the structure and function of cortical networks with design principles similar to those used in electronic networks", and how "the brain . . . exploits the adaptability of biological systems to reconfigure in response to changing needs".
Joseph Atick [62] reviews evidence relating to the principle that efficiency of information representation may be a design principle for sensory processing. In particular, it appears that this principle applies to large monopolar cells in the fly's visual system and retinal coding in mammals in the spatial, temporal, and chromatic domains.
Joseph Atick and Norman Redlich [63] argue that the goal of processing in the retina is to transform the visual input as much as possible into a "statistically independent" form as a first step in creating a compressed representation in the cortex, as suggested by Horace Barlow. But the amount of compression that can be achieved in the retina is reduced by the need to suppress noise in the sensory input.
Adrienne Fairhall and colleagues [64] consider evidence relating to the optimisation of neural coding when the statistics of sensory data are changing. They conclude that "the speed with which information is optimized and ambiguities are resolved approaches the physical limit imposed by statistical sampling and noise".
Naama Brenner and colleagues [65] show that the input/output relation of a sensory system in a dynamic environment changes with the statistical properties of the environment. More specifically, when the dynamic range of inputs changes, the input/output relation rescales so as to match the dynamic range of responses to that of the inputs. And the scaling of the input/output relation is set to maximize information transmission for each distribution of signals.
William Bialek and colleagues [66] review progress on the question: "Does the brain construct an efficient representation of the sensory world?" In their answer to this question they take account of the biological value of sensory information, and they report preliminary evidence from studies of the fly's visual system which appear to support their view.
Stephanie Palmer and colleagues [67] show that efficient predictive computation starts at the earliest stages of the visual system and that this is true of nearly every cell in the retina and beyond. "Efficient representation of predictive information is a candidate principle that can be applied at each stage of neural computation".
Bruno Olshausen and David Field [68] discuss how "sparse coding" (the encoding of sensory information using a small number of active neurons at any given point in time) may confer several advantages and that there is evidence that "sparse coding could be a ubiquitous strategy employed in several different modalities across different organisms".
The same two authors, in [69], discuss the problem of how images can best be encoded and transmitted, with particular emphasis on how the eye and brain process visual information. They remark that "computer scientists and engineers now focusing on the problem of image compression should keep abreast of emerging results in neuroscience. At the same time, neuroscientists should pay close attention to current studies of image processing and image statistics".
Kristin Koch and colleagues [70] consider the question: how much information does the retina send to the brain and how is it apportioned among different cell types? They conclude that "with approximately 10^6 ganglion cells, the human retina would transmit data at roughly the rate of an Ethernet connection". This figure appears to be for the amount of information that is transmitted after decompression.

Mathematics-Related and Computer-Related Research.
Other research, with an emphasis on issues in mathematics and computing, including artificial intelligence, can be helpful in the understanding of IC in brains and nervous systems. This includes the following: Theory showing the intimate relation between IC and inductive inference [17,18] (Section 2.5).
(iii) Gregory Chaitin and Andrei Kolmogorov, working independently, developed Algorithmic Information Theory, building on the work of Ray Solomonoff. The main idea here is that the information content of a string of symbols is equivalent to the length of the shortest computer program that anyone has been able to devise that describes the string.
A detailed description of these and related bodies of research may be found in [19].
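The central idea of Algorithmic Information Theory noted in item (iii) can be given a rough, practical flavour in Python. The shortest possible description of a string cannot in general be computed, but the output length of any lossless compressor is a description that someone has been able to devise, and so serves as an upper bound. The sketch below uses the standard-library zlib module for this purpose; the choice of compressor, the function name, and the two test strings are assumptions made for the example.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes needed by zlib to describe `data`: an upper bound, not the true minimum."""
    return len(zlib.compress(data, 9))


# A highly regular string: 1000 bytes generated by a very short rule.
regular = b"ab" * 500

# 1000 pseudo-random bytes; the fixed seed only makes the example repeatable.
rng = random.Random(0)
irregular = bytes(rng.getrandbits(8) for _ in range(1000))

print("regular:  ", len(regular), "->", compressed_size(regular))
print("irregular:", len(irregular), "->", compressed_size(irregular))
```

The regular string shrinks to a small fraction of its length while the pseudo-random one does not shrink appreciably, even though both contain the same number of bytes.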
In research on deep learning in artificial neural networks, well reviewed by Jürgen Schmidhuber [10], there is some recognition of the importance of IC (in [10, Sections 4.2, 4.4, and 5.6.3]), but it appears that the idea is not well developed in deep learning systems.
Marcus Hutter, with others, [76][77][78] has developed the "AIXI" model of intelligence based on Algorithmic Probability Theory and Sequential Decision Theory. He has also initiated the "Hutter Prize", a competition with €50,000 of prize money, for lossless compression of a given sample of text. The competition is motivated by the idea that "being able to compress well is closely related to acting intelligently, thus reducing the slippery concept of intelligence to hard file size numbers". (From http://www.hutter1.net, retrieved 2017-10-10.) This is an interesting project which may yet lead to general, human-level AI.

IC and Biology
This section and those that follow (up to and including Section 21) describe evidence that, in varying degrees, lends support to the ICHLPC perspective. Most of this evidence comes directly from observations of people, but some of it comes from studies of animals, with the expectation that similar principles would be true of people.
First, let us take an abstract view of why IC might be important in people and other animals.In terms of biology, IC can confer a selective advantage to any creature by allowing it to store more information in a given storage space or use less storage space for a given amount of information and by speeding up the transmission of any given volume of information along nerve fibres-thus speeding up reactions-or reducing the bandwidth needed for the transmission of the same volume of information in a given time.

Perhaps more important than the impact of IC on the storage or transmission of information is the close connection, outlined in Section 2.5, between IC and concepts of inference and probability.Compression of information provides a means of predicting the future from the past and estimating probabilities so that, for example, an animal may learn to predict where food may be found or where there may be dangers.
As mentioned in Section 2.5, the close connection between IC and concepts of inference and probability makes sense in terms of ICMUP: any repeating pattern can be a basis for inferences, and the probabilities of such inferences may be derived from the number of repetitions of the given pattern.
Being able to make inferences and estimate probabilities can mean large savings in the use of energy and other benefits in terms of survival.
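The remark that probabilities may be derived from the number of repetitions of a pattern can be sketched in a few lines of Python. The example counts how often a cue has been followed by each outcome and turns the counts into a relative frequency. The observations are invented for the illustration, and the function name is hypothetical; the SP System's own calculation of probabilities is described in Appendix A and is not reproduced here.

```python
from collections import Counter


def conditional_probability(observations, cue, outcome):
    """Estimate P(outcome | cue) from how often each (cue, outcome) pair has recurred."""
    pair_counts = Counter(observations)
    cue_total = sum(n for (c, _), n in pair_counts.items() if c == cue)
    if cue_total == 0:
        return None                # the cue has never been observed
    return pair_counts[(cue, outcome)] / cue_total


# Invented record of past experience: (what the sky looked like, what followed).
observations = (
    [("black clouds", "rain")] * 8
    + [("black clouds", "dry")] * 2
    + [("clear sky", "dry")] * 9
    + [("clear sky", "rain")] * 1
)

print(conditional_probability(observations, "black clouds", "rain"))
print(conditional_probability(observations, "clear sky", "rain"))
```

With these invented counts, rain is predicted with a relative frequency of 8 in 10 after black clouds and 1 in 10 after a clear sky.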

Sensory Inflow, Redundancy, and the Transmission and Storage of Information
As mentioned in Section 3.1.1, Fred Attneave [27] describes how visual perception may be understood in terms of the distinction between areas in a visual image where there is much redundancy and boundaries between those areas where nonredundant information is concentrated. And he suggests that visual perception may be understood, at least in part, as the economical description of sensory input. Also mentioned in the same section is Horace Barlow's [31] argument that compression of sensory information is needed to cope with the large volumes of such information and, more specifically, his recognition that, without compression of the information falling on the retina, each optic nerve would be too small to transmit reasonable amounts of that information to the brain [31, p. 548].

Chunking-with-Codes
ICMUP is so much embedded in our thinking and seems so natural and obvious that it is easily overlooked. This section, with Sections 7 and 8, describes some examples.
In the same way that "TFEU" may be a convenient code or shorthand for the rather cumbersome expression "Treaty on the Functioning of the European Union" (Appendix C.1.2), a name like "New York" is, as previously noted in Section 2.4, a compact way of referring to the many things and activities in that renowned city and likewise for the many other names that we use: "Nelson Mandela", "George Washington", "Mount Everest", and so on.
The "chunking-with-codes" variant of ICMUP (Section 2.1.2) permeates our use of natural language, both in its surface forms and in the way in which surface forms relate to meanings. (Although natural language provides a very effective means of compressing information about the world, it is not free of redundancy. And redundancy has a useful role to play in, for example, enabling us to understand speech in noisy conditions and in learning the structure of language. How this apparent inconsistency may be resolved is discussed in Appendix C.2.) Because of its prominence in natural language and because of its intrinsic power, chunking-with-codes is probably important in nonverbal aspects of our thinking, as may be inferred from empirical support for the SP System and its strengths in several aspects of intelligence (Section 2.2.5). (Contrary to the view which is sometimes expressed that thinking is not possible without language, there is evidence in [79] for nonverbal thinking by congenitally deaf people without knowledge of written or spoken natural language, and there is further evidence in [80] for nonverbal thinking in people and in animals.)
Ever since George Miller's influential paper [3], the concept of a "chunk" has been the subject of much research in psychology and related disciplines (see, e.g., [81][82][83][84]).
Principles outlined in this section are likely to apply also to variants of ICMUP discussed in Sections 7 and 8 below.
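As a minimal sketch of chunking-with-codes, the following replaces each occurrence of a recurring chunk with a short code and recovers the original on demand. The function names and the sample sentences are hypothetical, and the sketch assumes that the code itself never occurs in the uncompressed text.

```python
def chunk_with_codes(text, chunks):
    """Replace each occurrence of a recurring chunk with its short code."""
    for code, chunk in chunks.items():
        text = text.replace(chunk, code)
    return text

def expand_codes(text, chunks):
    """Recover the original text by replacing each code with its chunk."""
    for code, chunk in chunks.items():
        text = text.replace(code, chunk)
    return text

# The chunk is stored once, with its code; every later use needs only the code.
chunks = {"TFEU": "Treaty on the Functioning of the European Union"}
original = ("The Treaty on the Functioning of the European Union has many articles. "
            "Lawyers often cite the Treaty on the Functioning of the European Union.")
encoded = chunk_with_codes(original, chunks)

# Cost of remembering the chunk and its code, to be set against the saving.
dictionary_cost = sum(len(code) + len(chunk) for code, chunk in chunks.items())
```

The saving grows with the number of repetitions: a chunk that occurs only once costs more to store with its code than without it.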

Class-Inclusion Hierarchies
As with chunking-with-codes, class-inclusion hierarchies, with variations such as cross-classification, are prominent in our use of language and in our thinking. Benefits arise from economies in the storage of information and in inferences via inheritance of attributes, in accordance with the "class-inclusion hierarchies" variant of ICMUP (Section 2.1.5).
As with chunking-with-codes, names for classes of things provide for great economies in our use of language: most "content" words (nouns, verbs, adjectives, and adverbs) in our everyday language stand for classes of things and, as such, are powerful aids to economical description.
Imagine how cumbersome things would be if, on each occasion that we wanted to refer to a "table", we had to say something like "A horizontal platform, often made of wood, used as a support for things like food, normally with four legs but sometimes three, ...", like the slow Entish language of the Ents in Tolkien's The Lord of the Rings. (J. R. R. Tolkien, The Lord of the Rings, London: HarperCollins, 2005, Kindle edition. For a description of Entish, see, e.g., page 480. See also pages 465, 468, 473, 477, 478, 486, and 565.) Similar things may be said for verbs like "speak" or "dance", adjectives like "artistic" or "exuberant", and adverbs like "quickly" or "carefully".
Classes and categories have been the subject of much research in psychology and related disciplines over several decades (see, e.g., [85][86][87]).
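The economy of a class-inclusion hierarchy can be sketched as follows. The class names and attributes are hypothetical; the point is only that each attribute is stored once, at the most general class to which it applies, and is inferred for more specific classes via inheritance.

```python
# Each class stores only the attributes that distinguish it from its parent.
hierarchy = {
    "furniture": {"parent": None, "attributes": {"movable": True}},
    "table": {"parent": "furniture",
              "attributes": {"surface": "horizontal", "typical_legs": 4}},
    "dining table": {"parent": "table", "attributes": {"used_for": "meals"}},
}

def inherited_attributes(name, classes):
    """Collect attributes by walking from a class up to the root of the hierarchy."""
    collected = {}
    while name is not None:
        node = classes[name]
        for key, value in node["attributes"].items():
            collected.setdefault(key, value)  # more specific classes take priority
        name = node["parent"]
    return collected

dining_table = inherited_attributes("dining table", hierarchy)
```

Nothing about legs or surfaces is recorded for "dining table" itself, yet both can be inferred from its membership of the class "table".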

Schema-Plus-Correction, Run-Length Coding, and Part-Whole Hierarchies
As with chunking-with-codes and class-inclusion hierarchies, it seems natural to conceptualise things in terms of other techniques described in Section 2.1. In all cases, there is clear potential for substantial economies in how knowledge is represented and for the making of useful inferences.

Schema-Plus-Correction.
As mentioned in Section 2.1.3, a menu in a restaurant or café is an obvious example of the schema-plus-correction device in everyday thinking. Other examples are the uses of forms to gather information about candidates for a job, the features of a house for sale, a checklist for repairs on a car, and so on. And knowledge of almost any skill such as baking a cake, gardening, or woodwork may be seen as a schema that may be tailored for a specific task (such as baking a coffee-and-walnut cake) by plugging in values for that task.
An interesting example of schema-plus-correction in everyday life is the UK shipping forecast which leaves out most of the schema and gives only the corrections to the schema.So, for example, "good, becoming moderate or poor" refers to visibility without mentioning that word; "moderate or rough" refers to the state of the sea, without mentioning that expression; figures for wind speed are given without mentioning that they refer to the Beaufort wind force scale; a word like "later" means a time that is more than 12 hours from the time the forecast was issued; and so on.
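The shipping-forecast example can be sketched in a few lines. The slot names, their order, and the wind value are assumptions made for illustration and are not taken from the format of any real forecast; the sketch shows only how a shared schema lets each broadcast carry the corrections alone.

```python
# The schema records what is common to every forecast; it is stored once.
FORECAST_SCHEMA = ("wind", "sea_state", "visibility")

def expand(schema, correction):
    """Combine the shared schema with one terse correction to recover the full record."""
    if len(correction) != len(schema):
        raise ValueError("correction must supply one value per schema slot")
    return dict(zip(schema, correction))

# A broadcast supplies only the values, never the slot names.
broadcast = ("southwest 5 to 7", "moderate or rough", "good, becoming moderate or poor")
full_forecast = expand(FORECAST_SCHEMA, broadcast)
```

A listener who knows the schema can recover the full record from the corrections, which is why the slot names need never be spoken.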

Run-Length Coding.
If anything is repeated, especially if it is repeated a large number of times, it seems natural and obvious to describe the repetition with a form of run-length coding. For example, an instruction to walk from one place to another may be "From the old oak tree keep walking until you see the river". Here, "the old oak tree" marks the start of the repetition, "keep walking" describes the repeated operation of putting one foot in front of the other, and "until you see the river" marks the end of the repetition.
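A minimal sketch of run-length coding, using the walking instruction as data: the list of 500 steps is a hypothetical stand-in for "keep walking", and the function names are illustrative.

```python
from itertools import groupby

def run_length_encode(sequence):
    """Collapse each run of identical items into an (item, count) pair."""
    return [(item, len(list(run))) for item, run in groupby(sequence)]

def run_length_decode(pairs):
    """Expand (item, count) pairs back into the original sequence."""
    return [item for item, count in pairs for _ in range(count)]

# Start marker, a long run of identical operations, end marker.
walk = ["oak tree"] + ["step"] * 500 + ["river"]
encoded = run_length_encode(walk)
```

Here 502 items reduce to three pairs, and decoding restores the original sequence exactly.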

Part-Whole Hierarchies.
As with class-inclusion hierarchies, part-whole hierarchies are prominent in our language and in our thinking. In describing anything that is more complex than "very simple", such as a house or a car, it seems natural and obvious to divide it into parts and subparts through as many levels as are needed, thus promoting economies and the making of inferences as described in Section 2.1.6.

Merging Multiple Views to Make One
Here is another example of something that is so familiar that we are normally not aware that it is part of our perceptions and thinking.
If, when we are looking at something, we close our eyes for a moment and open them again, what do we see? Normally, it is the same as what we saw before. But creating a single view out of the before and after views means unifying the two patterns to make one and thus compressing the information, as shown schematically in Figure 8. (It is true that people may, on occasion, not detect large changes to objects and scenes ("change blindness") [88] and that, without attention, we may not even perceive objects ("inattentional blindness") [89], but it is also true that we can detect differences between pairs of images that are similar but not identical, which means that we can also detect the similarities between such pairs of images. That ability to detect similarities, together with our ordinary experience that we normally merge multiple views to make one, as described in the main text, implies that compression of information is an important part of visual perception.)
It seems so simple and obvious that if we are looking at a landscape like the one in the figure, there is just one landscape even though we may look at it two, three, or more times. But if we did not unify successive views we would be like an old-style cine camera that simply records a sequence of frames, without any kind of analysis or understanding that, very often, successive frames are identical or nearly so.

Recognition
With the kind of merging of views just described, we do not bother to give it a name. But if the interval between one view and the next is hours, months, or years, it seems appropriate to call it "recognition". In cases like that, it is more obvious that we are relying on memory, as shown schematically in Figure 9. Notwithstanding the undoubted complexities and subtleties in how we recognise things, the process may be seen in broad terms as ICMUP: matching incoming information with stored knowledge, merging or unifying patterns that are the same, and thus compressing the information.
If we did not compress information in that way, our brains would quickly become cluttered with millions of copies of things that we see around us (people, furniture, cups, trees, and so on) and likewise for sounds and other sensory inputs.
As mentioned earlier, Satosi Watanabe has explored the relationship between pattern recognition and IC [29,30].

Binocular Vision
ICMUP may also be seen at work in binocular vision: "In an animal in which the visual fields of the two eyes overlap extensively, as in the cat, monkey, and man, one obvious type of redundancy in the messages reaching the brain is the very nearly exact reduplication of one eye's message by the other eye." [32, p. 213].
In viewing a scene with two eyes, we normally see one view and not two. This suggests that there is a matching and unification of patterns, with a corresponding compression of information. A sceptic might say, somewhat implausibly, that the one view that we see comes from only one eye. But that sceptical view is undermined by the fact that, normally, the one view gives us a vivid impression of depth that comes from merging the two slightly different views from both eyes.
Strong evidence that, in stereoscopic vision, we do indeed merge the views from both eyes comes from a demonstration with "random-dot stereograms", as described in [90, Section 5.1] (see also Appendix A.3).
In brief, each of the two images shown in Figure 10 is a random array of black and white pixels, with no discernable structure, but they are related to each other as shown in Figure 11: both images are the same except that a square area near the middle of the left image is further to the left in the right image.
When the images in Figure 10 are viewed with a stereoscope, projecting the left image to the left eye and the right image to the right eye, the central square appears gradually as a discrete object suspended above the background.
Although this illustrates depth perception in stereoscopic vision, a subject of some interest in its own right, the main interest here is in how we see the central square as a discrete object. There is no such object in either of the two images individually. It exists purely in the relationship between the two images, and seeing it means matching one image with the other and unifying the parts which are the same. This example shows that, although the matching and unification of patterns is a usefully simple idea, there are interesting subtleties and complexities that arise in finding a good match when the two patterns are similar but not identical.

Finding a Good Match.
Seeing the central object in a random-dot stereogram means finding a good match between relevant pixels in the central area of the left and right images and likewise for the background. Here, a good match is one that yields a relatively high level of IC. Since there is normally an astronomically large number of alternative ways in which combinations of pixels in one image may be aligned with combinations of pixels in the other image, it is not normally feasible to search through all the possibilities exhaustively.

The Best Is the Enemy of the Good.
As with the SP System (Sections 2.2.1 to 2.2.3) and many problems in artificial intelligence, the best is the enemy of the good. Instead of looking for the perfect solution, which may lead to outright failure, we can do better, achieving something useful on most occasions by looking for solutions that are good enough for practical purposes. With this kind of problem, acceptably good solutions can often be found in a reasonable time with heuristic search. One such method for the analysis of random-dot stereograms has been described by Marr and Poggio [92].
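To make the idea of a "good enough" match concrete, here is a minimal sketch that builds a random-dot stereogram and then searches a small set of horizontal shifts for the one that gives the most matching dots in a small window. This is simple block matching, chosen for brevity; it is not the method of Marr and Poggio [92], and the image size, square size, shift, window size, and function names are all assumptions.

```python
import random

def make_stereogram(size=40, square=12, shift=2, seed=0):
    """Build left/right random-dot images that differ only by a shifted central square."""
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    start = (size - square) // 2
    for r in range(start, start + square):
        for c in range(start, start + square):
            right[r][c - shift] = left[r][c]      # the square sits further left
        for c in range(start + square - shift, start + square):
            right[r][c] = rng.randint(0, 1)       # fresh dots fill the uncovered strip
    return left, right

def best_disparity(left, right, r, c, max_shift=4, half=2):
    """Return the horizontal shift giving the most matching dots in a window around (r, c).

    Assumes the window, after shifting, stays inside the image.
    """
    def matches(d):
        return sum(
            left[rr][cc] == right[rr][cc - d]
            for rr in range(r - half, r + half + 1)
            for cc in range(c - half, c + half + 1)
        )
    # Only a handful of shifts are tried, not every possible alignment of pixels.
    return max(range(max_shift + 1), key=matches)

left, right = make_stereogram()
centre_shift = best_disparity(left, right, 20, 20)     # inside the central square
background_shift = best_disparity(left, right, 5, 20)  # well outside the square
```

With these settings, the window inside the square should match best at the shift used to build the pair, and the background window at a shift of zero. Restricting the search to a few horizontal shifts is what keeps it tractable, at the cost of missing matches that lie outside that range.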

Abstracting Object Concepts via Motion
It seems likely that the kinds of processes that enable us to see a hidden object in a random-dot stereogram also apply to how we see discrete objects in the world. The contrast between the relatively stable configuration of features in an object such as a car, compared with the variety of its surroundings as it travels around, seems to be an important part of what leads us to conceptualise the object as an object [90, Section 5.2].
Any creature that depends on camouflage for protection, by blending with its background, must normally stay still. As soon as it moves relative to its surroundings, it is likely to stand out as a discrete object ([90, Section 5.2], see also Section 2.7).
The idea that IC may provide a means of discovering "natural" structures in the world, such as the many objects in our visual world, has been dubbed the "DONSVIC" principle: the discovery of natural structures via information compression [6, Section 5.2]. Of course, the word "natural" is not precise, but it has enough precision to be a meaningful name for the process of learning the kinds of concepts which are the bread-and-butter of our everyday thinking.
Similar principles may account for how young children come to understand that their first language (or languages) is composed of words (Section 15).

Adaptation in the Eye of Limulus and Run-Length Coding
IC may also be seen down in the works of vision. Figure 12 shows a recording from a single sensory cell (ommatidium) in the eye of a horseshoe crab (Limulus polyphemus), first when the background illumination is low, then when a light is switched on and kept on for a while, and later when the light is switched off. When the light is switched on, the rate of firing rises sharply and then falls back to the background rate. The rate of firing remains at that level until the light is switched off, at which point it drops sharply and then returns to the background level, a mirror image of what happened when the light was switched on.
In connection with the main theme of this paper, a point of interest is that the positive spike when the light is switched on and the negative spike when the light is switched off have the effect of marking boundaries, first between dark and light and later between light and dark. In effect, this is a form of run-length coding (Section 2.1.4). At the first boundary, the positive spike marks the fact of the light coming on. As long as the light stays on, there is no need for that information to be constantly repeated, so there is no need for the rate of firing to remain at a high level. Likewise, when the light is switched off, the negative spike marks the transition to darkness and, as before, there is no need for constant repetition of information about the new low level of illumination. (It is recognised that this kind of adaptation in eyes is a likely reason for small eye movements when we are looking at something, including sudden small shifts in position ("microsaccades"), drift in the direction of gaze, and tremor [94]. Without those movements, there would be an unvarying image on the retina so that, via adaptation, what we are looking at would soon disappear!)
Another point of interest is that this pattern of responding (adaptation to constant stimulation) can be explained via the action of inhibitory nerve fibres that bring the rate of firing back to the background rate when there is little or no variation in the sensory input [95].
Inhibitory mechanisms are widespread in the brain [96, p. 45] and it appears that, in general, their role is to reduce or eliminate redundancies in information ([8, Section 9]), in keeping with the main theme of this paper.
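The qualitative shape of this response can be reproduced with a toy model in which the output follows the difference between the input and a slowly adapting level. This is an illustrative assumption, not the physiological mechanism described in [95]; the background rate, the adaptation rate, and the stimulus values are all hypothetical.

```python
def adapting_response(stimulus, background=1.0, rate=0.5):
    """Toy model: firing tracks the gap between the input and a slowly adapting level."""
    adapted = stimulus[0]
    response = []
    for level in stimulus:
        response.append(background + (level - adapted))
        adapted += rate * (level - adapted)  # inhibition gradually cancels a constant input
    return response

# Dark, then light switched on and held, then light switched off.
light = [0.0] * 5 + [1.0] * 10 + [0.0] * 10
firing = adapting_response(light)
```

In this model the output departs from the background rate only at the two transitions and drifts back while the input is constant, so the signal marks boundaries rather than repeating the current level.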

Other Examples of Adaptation
Adaptation is also evident at the level of conscious awareness. If, for example, a fan starts working nearby, we may notice the hum at first but then adapt to the sound and cease to be aware of it. But when the fan stops, we are likely to notice the new quietness at first but adapt again and stop noticing it.
Another example is the contrast between how readily we notice when something or someone touches us and how we are mostly unaware of the way our clothes touch us in many places all day long. We are sensitive to something new and different and we are relatively insensitive to things that are repeated.
As with adaptation in the eye of Limulus, these other kinds of adaptation may be seen as examples of the run-length coding technique for compression of information.

Structure of Language
There is evidence that much of the segmental structure of language (words and phrases) may be discovered via ICMUP, as described in the following two subsections. To the extent that these mechanisms model aspects of HLPC, they provide evidence for ICHLPC. With regard to Section 2.4, about the possible role of quantification in empirical evidence for ICHLPC, the MK10 Computer Model, designed for the discovery of segmental structure in language and outlined below, assigns a central role to the quantification of frequencies with which basic symbols such as letters, or sequences of symbols, occur in any given sample of language.

15.1. The Word Structure of Natural Language.
As can be seen in Figure 13, people normally speak in "ribbons" of sound, without gaps between words or other consistent markers of the boundaries between words. In the figure, which shows the waveform for a recording of the spoken phrase "on our website", it is not obvious where the word "on" ends and the word "our" begins and likewise for the words "our" and "website". Just to confuse matters, there are three places within the word "website" which look as if they might be word boundaries.
Given that words are not clearly marked in the speech that young children hear, how do they get to know that language is composed of words? Learning to read could provide an answer but it appears that young children develop an understanding that language is composed of words well before the age when, normally, they are introduced to reading. Perhaps more to the point is that there are still, regrettably, many children throughout the world who are never introduced to reading but, in learning to talk and to understand speech, they inevitably develop a knowledge of the structure of language, including words. (It has been recognised for some time that skilled speakers of any language have an ability to create or recognise sentences that are grammatical but new to the world. Chomsky's well-known example of such a sentence is "Colorless green ideas sleep furiously" [97, p. 15], which, when it was first published, was undoubtedly novel. This ability to create or recognise grammatical but novel sentences implies that knowledge of a language means knowledge of words as discrete entities that can form novel combinations.)
In keeping with the main theme of this paper, ICMUP provides an answer: the MK10 Computer Model [98, p. 193], which works largely via ICMUP, can reveal much of the word structure in an English-language text from which all spaces and punctuation have been removed [6, Section 5.2]. It is true that there are added complications with speech but it seems likely that similar principles apply.
This discovery of word structure by the MK10 program, illustrated in Figure 14, is achieved without the aid of any kind of externally supplied dictionary or other information about the structure of English. The program builds its own dictionary via "unsupervised" learning using only the unsegmented sample of English with which it is supplied. It learns without the assistance of any kind of "teacher", or data that is marked as "wrong", or the grading of samples from simple to complex (cf. [42]).
Statistical tests show that the correspondence between the computer-assigned word structure and the original (human) division into words is significantly better than chance.
Two aspects of the MK10 model strengthen its position as a model of what children do in learning the segmental structure of language [98, p. 200]: the growth in the lengths of words learned by the program corresponds quite well with the same measure for children; and the pattern of changing numbers of new words that are learned by the program at different stages corresponds quite well with the equivalent pattern for children.
Discovering the word structure of language via ICMUP is another example of the DONSVIC principle, mentioned in Section 12, because words are the kinds of "natural" structure which are the subject of the DONSVIC principle and because ICMUP provides a key to how they may be discovered.
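How frequency counts alone can begin to reveal chunks in unsegmented text may be sketched as follows. This is not the MK10 algorithm, whose details are given in [98]; it is a generic greedy heuristic, swapped in for illustration, that repeatedly joins the most frequent adjacent pair of chunks. The function name and the sample text are hypothetical.

```python
from collections import Counter

def merge_frequent_pairs(text, merges):
    """Greedily join the most frequent adjacent pair of chunks, up to `merges` times."""
    chunks = list(text)  # start from single letters, with no spaces or dictionary
    for _ in range(merges):
        pairs = Counter(zip(chunks, chunks[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so nothing is gained by merging
        merged, i = [], 0
        while i < len(chunks):
            if i + 1 < len(chunks) and chunks[i] == a and chunks[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(chunks[i])
                i += 1
        chunks = merged
    return chunks

sample = "thecatthedogthecat"
chunks = merge_frequent_pairs(sample, merges=2)
```

After two merges the recurring sequence "the" has become a single chunk. With further merges this simple heuristic starts to join chunks across word boundaries (for example "thec"), which is one reason why a serious model needs more than raw pair counts.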

The Phrase Structure of Natural Language.
In addition to its achievements in learning the word structure of natural language, the MK10 Computer Model, featured in Section 15.1, does quite a good job at discovering the phrase structure of unsegmented text in which each word has been replaced by a symbol representing the grammatical class of the word [98, p. 194]. An example is shown in Figure 15. As before, the program works without any prior knowledge of the structure of English and, apart from the initial assignment of word classes, it works in unsupervised mode without the assistance of any kind of "teacher" or anything equivalent. As before, statistical tests show that the correspondence between computer-assigned and human-assigned structures is statistically significant. (Thanks to Dr. Isabel Forbes, a person qualified in theoretical linguistics, for the assignment of grammatical class symbols to words in the given text and for phrase-structure analyses of the text.)
Since ICMUP is central in the workings of the MK10 Computer Model, this result suggests that ICMUP may have a role to play not merely in discovering the phrase structure of language but more generally in discovering the grammatical structure of language.

Grammatical Inference
Regarding the last point from the previous section, it seems likely that learning the grammar of a language may also be understood in terms of ICMUP. Evidence in support of that expectation comes from research with two programs designed for grammatical inference:
(i) The SNPR Computer Model. This program was designed for the discovery of syntactic structures and works mainly via the building of hierarchical structures. Like the SP Computer Model, it is founded on ICMUP.
(ii) The SP Computer Model. The SP Computer Model, one of the main products of the SP programme of research, achieves results at a similar level to that of SNPR. As before, ICMUP is central in how the program works. With the solution of some residual problems, outlined in [6, Section 3.3], there seems to be a real possibility that the SP System will be able to discover plausible grammars from samples of natural language. Also, it is anticipated that, with further development, the program may be applied to the learning of nonsyntactic "semantic" knowledge and the learning of grammars in which syntax and semantics are integrated.
What was the point of developing the SP Computer Model when it does no better at grammatical inference than the SNPR Computer Model?The reason is that the SNPR Computer Model, which was designed for the discovery of syntactic structures and worked mainly via the building of hierarchical structures, was not compatible with the new and much more ambitious goal of the SP programme of research: to simplify and integrate observations and concepts across artificial intelligence, mainstream computing, mathematics, and HLPC.What was needed was a new organising principle that would accommodate hierarchical structures and several other kinds of structure as well.
It turns out that the SP-multiple-alignment concept is much more versatile than the hierarchical organising principle in the SNPR program, providing for several aspects of intelligence and the representation and processing of a variety of knowledge structures of which hierarchical structure is only one (Section 2.2.5). It appears that the SP System provides a much firmer foundation for the development of human-level intelligence than the SNPR Computer Model or indeed deep learning models, as discussed in [11, Section V].
With regard to Section 2.4 about the possible role of quantification in empirical evidence for ICHLPC, the SNPR Computer Model and the SP Computer Model, like the MK10 Computer Model (Section 15), both have a central role for quantification of the frequencies with which basic symbols such as letters, or contiguous or broken patterns of symbols, occur in any given sample of data.

Generalisation, the Correction of Wrong Generalisations, and "Dirty Data"
Issues relating to generalisation in learning are best described with reference to the Venn diagram shown in Figure 16.
That figure relates to the unsupervised learning of a natural language but it appears that generalisation issues in other areas of learning are much the same. The evidence to be described derives largely from the SNPR Computer Model and the SP Computer Model. Since both models are founded on ICMUP, evidence that they have human-like capabilities with generalisation and related phenomena may be seen as evidence in support of ICHLPC.
In the figure, the smallest envelope shows the finite but large sample of "utterances" from which a young child learns his or her native language (which we shall call L), where an "utterance" is a speech sound of any kind, and the speakers from which a young child learns are adults or older children. (To keep things simple in this discussion, we shall assume that each child learns only one first language, although many children learn two or more first languages.) The middle-sized envelope shows the (infinite) set of utterances in L, and the largest envelope shows the (infinite) set of all possible utterances, including those that are in L and those which are not. "Dirty data" are the many "ungrammatical" utterances that children normally hear: outside the envelope for L but inside the envelope representing the utterances from which a young child learns.
The child generalises "correctly" when he or she infers L, and only L, from the finite sample he or she has heard, including dirty data. Anything that spills over into the outer envelope, like "mouses" as the plural of "mouse" or "buyed" as the past tense of "buy", is an overgeneralisation, while failure to learn the whole of L represents undergeneralisation.
In connection with the foregoing summary of concepts relating to generalisation, there are three main problems:
(i) Generalisation without overgeneralisation. How can we generalise our knowledge without overgeneralisation, and this in the face of evidence that children can learn their first language or languages without the correction of errors by parents or teachers or anything equivalent? (Evidence comes chiefly from children who learned language without the possibility that anyone might correct their errors. Christy Brown was a cerebral-palsied child who not only lacked any ability to speak but whose bodily handicap was so severe that for much of his childhood he was unable to demonstrate that he had normal comprehension of speech and nonverbal forms of communication [99]. Hence, his learning of language must have been achieved without the possibility that anyone might correct errors in his spoken language.)
(ii) Generalisation without undergeneralisation. How can we generalise our knowledge without undergeneralisation? As before, there is evidence that learning of a language can be achieved without explicit teaching.
(iii) Dirty data. How can we learn correct knowledge despite errors in the examples we hear? Again, it appears that this can be done without correction of errors.

Figure 16: Categories of utterances involved in the learning of a first language, L. In ascending order of size, they are the finite sample of utterances from which a child learns, the (infinite) set of utterances in L, and the (infinite) set of all possible utterances. Adapted from Figure 7.1 in [98], with permission.
These things are discussed quite fully in [1, Section 9.5.3] and [6, Section 5.3]. There is also relevant discussion in [11, Sections V-H and XI-C].
In brief, IC provides an answer to all three problems like this: for a given body of raw data, I, compress it thoroughly via unsupervised learning; the resulting compressed version of I may be split into two parts, a grammar and an encoding of I in terms of the grammar; normally, the grammar generalises correctly without over- or undergeneralisation, and errors in I are weeded out; the encoding may be discarded.
This scheme is admirably simple, but, so far, the evidence in support of it is only informal, derived largely from informal experiments with English-like artificial languages with the SNPR Computer Model of language learning [98, pp. 181-185] and the SP Computer Model [1, Section 9.5.3].
The weeding out of errors via this scheme may seem puzzling, but errors, by their nature, are rare. The grammar retains the repeating parts of I (which are relatively common), while the encoding contains the nonrepeating parts including most of the errors. "Errors" that are not rare acquire the status of "dialect" and cease to be regarded as errors.
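The split into a grammar and an encoding can be illustrated with a deliberately crude sketch, in which whole utterances stand in for patterns and a repetition threshold stands in for thorough compression. It is not how the SNPR or SP models work; the function name, the threshold, and the sample utterances are hypothetical.

```python
from collections import Counter

def split_grammar_and_encoding(sample, min_count=2):
    """Keep recurring items as the 'grammar'; leave one-off items as literals in the encoding."""
    counts = Counter(sample)
    grammar = sorted(item for item, n in counts.items() if n >= min_count)
    codes = {item: index for index, item in enumerate(grammar)}
    # Recurring items are replaced by short codes; rare items stay as raw literals.
    encoding = [codes.get(item, item) for item in sample]
    return grammar, encoding

# Hypothetical sample heard by a learner, including one rare ungrammatical utterance.
heard = ["the cat sat"] * 5 + ["cat the sat"] + ["the dog ran"] * 4
grammar, encoding = split_grammar_and_encoding(heard)
```

The rare, ungrammatical utterance never enters the grammar; it survives only as a literal in the encoding, so discarding the encoding discards the error with it.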
A problem with research in this area is that the identification of any over- or undergeneralisations produced by the above scheme or any other model depends largely on human intuitions. But this is not so very different from the long-established practice in research on linguistics of using human judgements of grammaticality to establish what any given person knows about a particular language.
The problem of generalising our learning without over- or undergeneralisation applies to the learning of a natural language and also to the learning of such things as visual images. It appears that the solution outlined here has distinct advantages compared with, for example, what appear to be largely ad hoc solutions that have been proposed for deep learning in artificial neural networks [11, Section V-H].
As noted above, evidence for human-like generalisation with the SNPR and SP computer models, without either over- or undergeneralisation, may be seen as evidence in support of ICMUP as a unifying principle in HLPC.

Perceptual Constancies
It has long been recognised that our perceptions are governed by constancies:
(i) Size constancy. To a large extent, we judge the size of an object to be constant despite wide variations in the size of its image on the retina [100, pp. 40-41].
(ii) Lightness constancy. We judge the lightness of an object to be constant despite wide variations in the intensity of its illumination [100, p. 376].
(iii) Colour constancy. We judge the colour of an object to be constant despite wide variations in the colour of its illumination [100, p. 402].
These kinds of constancy, and others such as shape constancy and location constancy, may each be seen as a means of encoding information economically: it is simpler to remember that a particular person is "about my height" than many different judgements of size, depending on how far away that person is.In a similar way, it is simpler to remember that a particular object is "black" or "red" than all the complexity of how its lightness or its colour changes in different lighting conditions.
By filtering out variations due to viewing distance or the intensity or colour of incident light, we can facilitate ICMUP and thus, for example, in watching a football match, simplify the process of establishing that there is (normally) just one ball on the pitch and not many different balls depending on viewing distances, whether the ball is in a bright or shaded part of the pitch, and so on.
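The geometry behind size constancy gives a simple illustration of how filtering out viewing distance leaves one value to remember instead of many. The sketch below is plain trigonometry and makes no claim about how the visual system performs the computation; the ball diameter and viewing distances are hypothetical.

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle, in degrees, subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

def physical_size(angle_deg, distance):
    """Recover an object's size from the visual angle it subtends and its distance."""
    return 2 * distance * math.tan(math.radians(angle_deg) / 2)

# A hypothetical ball 0.22 m across, seen from 5 m and from 50 m.
near_angle = visual_angle_deg(0.22, 5)
far_angle = visual_angle_deg(0.22, 50)

# Once distance is factored out, both views yield the same single size.
near_size = physical_size(near_angle, 5)
far_size = physical_size(far_angle, 50)
```

The two retinal images differ greatly in size, yet both reduce to the same stored value, which makes it straightforward to match and unify them as one ball.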

Kinds of Redundancy That People Find Difficult or Impossible to Detect
Although the matching and unification of patterns is often effective in the detection and reduction of redundancy in information, there are kinds of redundancy that are not easily revealed via ICMUP. It seems that those kinds of redundancy are also ones that people find difficult or impossible to detect. A well-known example is the decimal representation of π, which appears to most people to be entirely random, but which can be created by a simple program so that, in terms of Algorithmic Information Theory, it contains much redundancy.
At first sight, this observation seems to contradict the main thesis of this paper that much of HLPC may be understood as IC. But there is nothing in the ICHLPC thesis to say that people can or should be able to detect all kinds of redundancy via ICMUP. And the apparent randomness of the decimal representation of π suggests that any natural or artificial system that works via ICMUP would fail to detect the redundancy in data of that kind.
In short, what appears at first sight to be evidence against ICHLPC turns out to be evidence in support of that thesis: the failure of most people to detect the redundancy in the decimal representation of π may be explained via the ICHLPC thesis, together with the apparent weakness of ICMUP in discovering and reducing that kind of redundancy.
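To make concrete the claim that the digits of π "can be created by a simple program", here is a minimal sketch in C. It is not taken from this paper or from the SP Computer Model: it is a tidied form of a widely circulated compact spigot-style routine that works in integer arithmetic, and the function name `pi_digits`, the constant `PI_TERMS`, and the fixed output length of 800 digits are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>

#define PI_TERMS 2800 /* working array length; yields 800 decimal digits */

/* Illustrative sketch (not from the paper): writes the first 800 decimal
   digits of pi, beginning "3141...", into out as a NUL-terminated string.
   out must have room for at least 801 chars. */
void pi_digits(char *out) {
    int r[PI_TERMS + 1];
    int i, k, b, d, c = 0;

    for (i = 0; i < PI_TERMS; i++) r[i] = 2000;
    r[PI_TERMS] = 0;

    /* Each pass of the outer loop produces four more digits. */
    for (k = PI_TERMS; k > 0; k -= 14) {
        d = 0;
        i = k;
        for (;;) {
            d += r[i] * 10000;
            b = 2 * i - 1;
            r[i] = d % b;   /* remainder carried to the next pass */
            d /= b;
            i--;
            if (i == 0) break;
            d *= i;
        }
        sprintf(out, "%.4d", c + d / 10000);
        out += 4;
        c = d % 10000;
    }
}
```

The whole routine is a few hundred bytes of source, yet its output shows few of the repeated patterns that ICMUP relies on, which is the contrast the paragraphs above draw.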

Mathematics
A discussion of mathematics may seem out of place in a paper about ICHLPC but mathematics is relevant because it has been developed over many years as an aid to human thinking. For that reason, in the spirit of George Boole's An investigation of the laws of thought [101], a consideration of the organisation and workings of mathematics is relevant to ICHLPC. (Another book whose title suggests that it is relevant to human thinking is William Thomson's "Outline of the Laws of Thought" [102], although his orientation is more towards concepts in logic than concepts in mathematics.)
In [2] it is argued that much of mathematics, perhaps all of it, may be seen as a set of techniques for the compression of information via the matching and unification of patterns and their application. In case this seems implausible, we have the following:
(i) An equation as a compressed representation of data. An equation like Albert Einstein's $E = mc^2$ may be seen as a very compressed representation of what may be a very large set of data points relating energy ($E$) and mass ($m$), with the speed of light ($c$) as a constant. Similar things may be said about such well-known equations as $s = (gt^2)/2$ (derived from Newton's second law of motion), $a^2 + b^2 = c^2$ (Pythagoras's equation), $PV = k$ (Boyle's law), and $\mathbf{F} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B})$ (the charged-particle equation).
(ii) Variants of ICMUP may be seen at work in mathematical notations. The second, third, and fourth of the variants of ICMUP outlined in Section 2.1 may be seen at work in mathematical notations. For example, multiplication as repeated addition may be seen as an example of run-length coding.
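The remark that multiplication as repeated addition may be seen as run-length coding can be sketched in C as follows. The type `Run` and the functions `encode_sum` and `evaluate` are hypothetical names introduced only for this illustration; they are not part of the SP System.

```c
#include <stddef.h>

/* A run-length code for a repeated addition: `count` copies of `value`.
   Hypothetical type, for illustration only. */
typedef struct {
    long value;
    size_t count;
} Run;

/* Compress a sum of identical terms, e.g. 7 + 7 + 7 + 7, into one Run.
   Returns 1 on success, 0 if there are no terms or they are not all equal. */
int encode_sum(const long *terms, size_t n, Run *out) {
    if (n == 0) return 0;
    for (size_t i = 1; i < n; i++) {
        if (terms[i] != terms[0]) return 0;
    }
    out->value = terms[0];
    out->count = n;
    return 1;
}

/* Evaluate the compressed form: value * count equals the long-hand sum. */
long evaluate(Run r) {
    return r.value * (long)r.count;
}
```

Here the four-term sum 7 + 7 + 7 + 7 is stored as the pair (7, 4), and evaluating that pair by multiplication gives the same result as adding the terms one by one.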
Owing to the close connections between logic and mathematics and between computing and mathematics, it seems likely that similar principles apply in logic and in computing [2, Section 4].
Although in this research it has seemed necessary to avoid too much dependence on mathematics (for reasons outlined in Section 2.3), there is now the interesting possibility that the scope of mathematics may be greatly extended by incorporating within it such concepts as SP-multiple-alignment and other elements of the SP Theory [2, Section 7].

Evidence for ICHLPC via the SP System
Another strand of empirical evidence for ICHLPC is via the SP System and the central role within it of SP-multiple-alignment (Section 2.2.2), a variant of ICMUP which, as described in Section 2.1.7, encompasses the six others described in Section 2.1.
The evidence for ICHLPC via the SP System derives largely from the strengths of the SP System in modelling several aspects of HLPC, summarised in Section 2.2.5 and described in more detail in [6] and in [1].

Some Apparent Contradictions and How They May Be Resolved
The idea that IC is fundamental in HLPC, and also in the SP Theory as a theory of HLPC, seems to be contradicted by the following:
(i) The ways in which people may create redundant copies of information as well as how they may compress information.
(ii) The fact that redundancy in information is often useful in detecting and correcting errors and in the storage and processing of information.
(iii) A less direct challenge to ICHLPC, and the SP Theory as a theory of HLPC, is persuasive evidence, described by Gary Marcus [103], that in many respects, the human mind is a kluge, meaning "a clumsy or inelegant – yet surprisingly effective – solution to a problem" (p. 2).
These apparent contradictions and how they may be resolved are discussed in Appendix C.

Conclusion
This paper presents evidence for the idea that much of human learning, perception, and cognition (HLPC) may be understood as IC, often via the matching and unification of patterns.
The paper is part of a programme of research developing the SP Theory of Intelligence and its realisation in the SP Computer Model, a theory which aims to simplify and integrate observations and concepts across artificial intelligence, mainstream computing, mathematics, and HLPC.
Since IC is central in the SP Theory, evidence for IC in HLPC, presented in this paper in Sections 4 to 20 inclusive (but excluding Section 21), strengthens empirical support for the SP Theory, viewed as a theory of HLPC.
More direct empirical evidence for the SP Theory as a theory of HLPC, summarised in Section 2.2.5, provides evidence for the IC in HLPC thesis which is additional to that in Sections 4 to 20 inclusive.
Three possible objections to the IC in HLPC thesis, and the SP Theory, are described in Appendix C, with answers to those objections.
The ideas developed in this research may be seen to be part of a "Big Picture" of the importance of IC in at least six areas, outlined in Section 2.6.

A. Mathematics Associated with ICMUP and Mathematics Incorporated in the SP System
As mentioned in Section 2.3, this appendix details some mathematics associated with ICMUP and some of the mathematics incorporated in the SP System.
A.1. Maximising compression means searching the space of possible unifications for the set of big, frequent patterns which gives the best value. For a sequence containing $N$ symbols, the number of possible subsequences (including single symbols and all composite patterns, both coherent and fragmented) is

$$P = 2^N - 1.$$

The number of possible comparisons is the number of possible pairings of subsequences, which is

$$C = \frac{P(P - 1)}{2}.$$

For all except the very smallest values of $N$, the value of $P$ is very large and the corresponding value of $C$ is huge. In short, the abstract space of possible comparisons between patterns and thus the space of possible unifications is, in the great majority of cases, astronomically large.
Since the space is normally so large, it is not feasible to search it exhaustively.For that reason, we cannot normally guarantee to find the theoretically ideal answer, and normally we cannot know whether or not we have found the theoretically ideal answer.
In general, we need to use heuristic methods in searching (conducting the search in stages and discarding all but the best results at the end of each stage) and we must be content with answers that are "reasonably good".
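To give a feel for how quickly this search space grows, the following C sketch computes the two counts for small sequences. It assumes that the number of subsequences of $N$ symbols is $2^N - 1$ (every non-empty selection of positions) and that the number of pairings of $P$ subsequences is $P(P - 1)/2$; the function names are hypothetical, and the limit of 32 symbols is imposed only so that the results fit in 64 bits.

```c
#include <stdint.h>

/* Number of non-empty subsequences of a sequence of n symbols: 2^n - 1.
   Illustrative sketch; valid for 1 <= n <= 32. */
uint64_t subsequence_count(unsigned n) {
    return (UINT64_C(1) << n) - 1;
}

/* Number of unordered pairs of distinct subsequences: P(P - 1) / 2.
   P is always odd, so (P - 1) / 2 is exact; for n <= 32 the product
   fits in 64 bits. */
uint64_t comparison_count(unsigned n) {
    uint64_t p = subsequence_count(n);
    return p * ((p - 1) / 2);
}
```

On these assumptions, a sequence of only 10 symbols has 1023 subsequences and 522,753 possible pairings, and a sequence of 20 symbols already has more than 5 × 10^11 pairings, which is why exhaustive search is ruled out for realistic inputs.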
A.2. Information, Compression of Information, Inductive Inference, and Probabilities. Solomonoff [17] seems to have been one of the first people to recognise the close connection that exists between IC and inductive inference (Section 2.5): predicting the future from the past and calculating probabilities for such inferences. The connection between them, which may at first sight seem obscure, lies in the redundancy-as-repetition-of-patterns view of redundancy and how that relates to IC (Section 2.1, [1, Section 2.2.11]):
(i) Patterns that repeat within I represent redundancy in I, and IC can be achieved by reducing multiple instances of any pattern to one.
(ii) When we make inductive predictions about the future, we do so on the basis of repeating patterns. For example, the repeating pattern "Spring, Summer, Autumn, Winter" enables us to predict that, if it is Spring time now, Summer will follow.
Thus IC and inductive inference are closely related to concepts of frequency and probability. Here are some of the ways in which these concepts are related:
(i) Probability has a key role in Shannon's concept of information. In that perspective, the average quantity of information conveyed by one symbol in a sequence is

$$H = -\sum_{i=1}^{n} p_i \log p_i,$$

where $p_i$ is the probability of the $i$th type in the alphabet of $n$ available alphabetic symbol types. If the base for the logarithm is 2, then the information is measured in "bits".
(ii) Measures of frequency or probability are central in techniques for economical coding such as the Huffman method [4, Section 5.6] or the Shannon-Fano-Elias method [4, Section 5.9].
(iii) In the redundancy-as-repetition-of-patterns view of redundancy and IC, the frequencies of occurrence of patterns in I are a main factor (with the sizes of patterns) that determines how much compression can be achieved.
(iv) Given a body of (binary) data that has been "fully" compressed (so that it may be regarded as random or nearly so), its absolute probability may be calculated as $p_{ABS} = 2^{-L}$, where $L$ is the length (in bits) of the compressed data.
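As a small worked illustration of points (i) and (iv), consider an alphabet of three symbol types with probabilities 1/2, 1/4, and 1/4, and a fully compressed body of data that is 10 bits long. These numbers are chosen for this example and are not from the paper; the first calculation assumes the standard Shannon formula with logarithms to base 2.

```latex
\begin{align*}
H &= -\sum_{i=1}^{3} p_i \log_2 p_i
   = -\left( \tfrac{1}{2}\log_2\tfrac{1}{2}
           + \tfrac{1}{4}\log_2\tfrac{1}{4}
           + \tfrac{1}{4}\log_2\tfrac{1}{4} \right) \\
  &= \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{4}\cdot 2
   = 1.5 \text{ bits per symbol}, \\[4pt]
2^{-L} &= 2^{-10} = \tfrac{1}{1024} \approx 0.00098
   \quad \text{(absolute probability for } L = 10 \text{ bits)}.
\end{align*}
```

The more frequent symbol type contributes fewer bits per occurrence, which is the link between frequency and economical coding described in point (ii).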
Probability and IC may be regarded as two sides of the same coin. That said, they provide different perspectives on a range of problems. In this research, the IC perspective, with redundancy-as-repetition-of-patterns, seems to be more fruitful than viewing the same problems through the lens of probability. In the first case, one can see relatively clearly how compression may be achieved by the primitive operation of unifying patterns, whereas these ideas are obscured when the focus is on probabilities.

In this case, assuming that the left image has the same number of pixels as the right image, the size of the search space is

$$C = P^2,$$

where $P$ is the number of possible patterns in each image, calculated in the same way as was described in Appendix A.1. The fact that the images are two-dimensional needs no special provision because the original equations cover all combinations of atomic symbols.
For any stereogram with a realistic number of pixels, this space is very large indeed. Even with the very large processing power represented by the $10^{11}$ neurons in the brain, it is inconceivable that this space can be searched in a few seconds and to such good effect without the use of heuristic methods.
David Marr [104, Chapter 3] describes two algorithms that solve this problem. In line with what has just been said, both algorithms rely on constraints on the search space and both may be seen as incremental search guided by redundancy-related metrics.

A.4. Coding and the Evaluation of SP-Multiple-Alignments in Terms of IC. Given an SP-multiple-alignment like one of the two shown in Figure 4 (Section 2.2.2), one can derive a code SP-pattern from the SP-multiple-alignment in the following way:
(1) Scan the SP-multiple-alignment from left to right looking for columns that contain an SP-symbol by itself, not aligned with any other symbol.
(2) Copy these SP-symbols into a code pattern in the same order that they appear in the SP-multiple-alignment.
The code SP-pattern derived in this way from the SP-multiple-alignment shown in Figure 4 is "S 0 2 4 3 7 6 1 #S". This is, in effect, a compressed representation of those symbols in the New pattern which form hits with Old symbols in the SP-multiple-alignment. Given a code SP-pattern derived in this way, we may calculate a "compression difference" as

$$CD = B_N - B_E$$

or a "compression ratio" as

$$CR = \frac{B_N}{B_E},$$

where $B_N$ is the total number of bits in those symbols in the New pattern which form hits with Old symbols in the SP-multiple-alignment and $B_E$ is the total number of bits in the code SP-pattern (the "encoding") which has been derived from the SP-multiple-alignment as described above.
In each of these equations, $B_N$ is calculated as

$$B_N = \sum_{i=1}^{h} C_i,$$

where $C_i$ is the size of the code for the $i$th symbol in a sequence, $H_1 \cdots H_h$, comprising those symbols within the New pattern which form hits with Old symbols within the SP-multiple-alignment (Appendix A.5).

JGW: Barlow is right to say that knowledge of and recognition of redundancy is important "for this can tell [an animal] about structure and statistical regularity in its environment that are important for its survival". In keeping with that remark, knowledge of the frequency of occurrence of any pattern may serve in the calculation of absolute and relative probabilities ([1, Section 3.7], [6, Section 4.4]) and it can be the key to the correction of errors, as Barlow mentions in the quote from him in the heading of Appendix B.2.
But, in the SP System, redundancy is not treated as "something useless that can be stripped off and ignored".Patterns that repeat are reduced to a single instance and the frequency of occurrence of that single instance is recorded.The existence of single instances like that, each with a record of its frequency of occurrence, is very important, both in the way that the SP System builds its model of the world and also in the way that it makes inferences and calculates probabilities of those inferences.
As noted in Section 10, if we did not compress sensory information, "our brains would quickly become cluttered with millions of copies of things that we see around us – people, furniture, cups, trees, and so on – and likewise for sounds and other sensory inputs". And, as noted in Section 3.1.1, Barlow himself has pointed out that the mismatch between the relatively large amounts of information falling on the retina and the relatively small transmission capacity of the optic nerve suggests that sensory information is likely to be compressed [31, p. 548]. And he has also pointed out that, in animals like cats, monkeys, and humans, "one obvious type of redundancy in the messages reaching the brain is the very nearly exact reduplication of one eye's message by the other eye" [32, p. 213], and because we normally see one view, not two, the duplication implies that the two views are merged and thus compressed. In general, the evidence presented in Sections 4 to 21 points strongly to IC as a prominent feature of HLPC.

B.2. "Redundancy Is Mainly Useful for Error Avoidance and Correction". JGW: The heading above, from [34, p. 244], implies that compression of information via the reduction of redundancy is relatively unimportant, in keeping with the quotes from Barlow in the previous subsection.
Redundancy can certainly be useful in the avoidance of or correction of errors (Appendix C.2). But experience in the development and application of the SP Computer Model has shown that compression of information via the reduction of redundancy is also needed for such tasks as the parsing of natural language, pattern recognition, and grammatical inference. And compression of information may on occasion be intimately related to the correction of errors of omission, commission, and substitution, as described in Appendix C.2 and illustrated in Figure 19 (see also [6, Section 4.2.2] and [1, Section 6.2]).

B.3. "There Are Very Many More Neurons at Higher Levels in the Brain" and "Compressed, Non-Redundant, Representation Would Not Be at All Suitable for the Kinds of Task That Brains Have to Perform". Following the remark that "This is the point on which my own opinion has changed most, partly in response to criticism and partly in response to new facts that have emerged." [34, p. 244], Barlow writes: "Originally both Attneave and I strongly emphasized the economy that could be achieved by recoding sensory messages to take advantage of their redundancy, but two points have become clear since those early days. First, anatomical evidence shows that there are very many more neurons at higher levels in the brain, suggesting that redundancy does not decrease, but actually increases. Second, the obvious forms of compressed, non-redundant, representation would not be at all suitable for the kinds of task that brains have to perform with the information represented; ..." [34, pp. 244-245].
and "I think one has to recognize that the information capacity of the higher representations is likely to be greater than that of the representation in the retina or optic nerve. If this is so, redundancy must increase, not decrease, because information cannot be created." [34, p. 245].
JGW: There seem to be two problems here: (i) The likelihood that there are "very many more neurons at higher levels in the brain [than at the sensory levels]" and that "the information capacity of the higher representations is likely to be greater than that of the representation in the retina or optic nerve" need not invalidate ICHLPC.It seems likely that many of the neurons at higher levels are concerned with the storage of one's accumulated knowledge over the period from one's birth to one's current age ([1, Chapter 11], [8, Section 4]).By contrast, neurons at the sensory level would be concerned only with the processing of sensory information at any one time.
Although knowledge in one's long-term memory stores is likely to be highly compressed and only a partial record of one's experiences, it is likely, for most of one's life except early childhood, to be very much larger than the sensory information one is processing at any one time.Hence, it should be no surprise to find many more neurons at higher levels than at the sensory level.
(ii) As discussed in Appendix B.4, next, there are reasons for doubting the proposition that "the obvious forms of compressed, non-redundant, representation would not be at all suitable for the kinds of task that brains have to perform with the information represented".

    #include <stdio.h>

    /* Print the phrase x times via recursion. */
    void oranges_and_lemons(int x) {
        printf("Oranges and lemons, Say the bells of St. Clement's; ");
        if (x > 1) oranges_and_lemons(x - 1);
    }

Algorithm 1: A simple recursive function showing how, via computing, it is possible to create repeated (redundant) copies of "Oranges and lemons, Say the bells of St. Clement's;".

B.4. "Compressed Representations Are Unsuitable for the Brain". Under the heading above, Barlow writes: "The typical result of a redundancy-reducing code would be to produce a distributed representation of the sensory input with a high activity ratio, in which many neurons are active simultaneously, and with high and nearly equal frequencies. It can be shown that, for one of the operations that is most essential in order to perform brain-like tasks, such high activity-ratio distributed representations are not only inconvenient, but also grossly inefficient from a statistical viewpoint ..." [34, p. 245].
JGW: With regard to these points, (i) It is not clear why Barlow should assume that a redundancy-reducing code would, typically, produce a distributed representation or that compressed representations are unsuitable for the brain.The SP System is dedicated to the creation of nondistributed compressed representations which work very well in several aspects of intelligence as outlined in Section 2.2.5 with pointers to where fuller information may be found.And in [8] it is argued that, in SP-Neural, such representations can be mapped on to plausible structures of neurons and their interconnections that are quite similar to Donald Hebb's [9] concept of a "cell assembly".
(ii) With regard to efficiency, (a) It is true that deep learning in artificial neural networks [10], with their distributed representations, is often hungry for computing resources, with the implication that they are inefficient.But otherwise they are quite successful with certain kinds of task, and there appears to be scope for increasing their efficiencies [105].(b) The SP System demonstrates that the compressed localist representations in the system are efficient and effective in a variety of kinds of task, as outlined in Section 2.2.5 with pointers to where fuller information may be found.

C. Some Apparent Contradictions of ICHLPC and the SP Theory, and How They May Be Resolved
The apparent contradictions of ICHLPC, and the SP Theory as a theory of HLPC that were mentioned in Section 22, are discussed in the following three subsections, with suggested answers to those apparent contradictions.

C.1. Redundancy May Be Created by Human Brains and via Mathematics and Computing. Any person may create redundancy by simply repeating any action, including any portion of speech or writing. Although this seems to contradict the ICHLPC thesis, the contradiction may be resolved as described in the following subsections.

C.1.1. Creating Redundancy via IC.
With a computer, it is very easy to create information containing large amounts of redundancy and to do it by a process which may itself be seen to entail the compression of information. We can, for example, make a "call" to the function defined in Algorithm 1, using the pattern "oranges_and_lemons(100)". The effect of that call is to print out a highly redundant sequence containing 100 copies of the expression "Oranges and lemons, Say the bells of St. Clement's;".
Taking things step by step, this works as follows:
(1) The pattern "oranges_and_lemons(100)" is matched with the pattern "void oranges_and_lemons(int x)" in the first line of the function.
(2) The two instances of "oranges_and_lemons" are unified and the value 100 is assigned to the variable x. The assignment may also be understood in terms of the matching and unification of patterns but the details would be a distraction from the main point here.
(3) The instruction "printf("Oranges and lemons, Say the bells of St. Clement's; ");" in the function has the effect of printing out "Oranges and lemons, Say the bells of St. Clement's; ".
(4) Then if x > 1, the instruction "oranges_and_lemons(x - 1)" has the effect of calling the function again but this time with 99 as the value of x (because of the instruction x − 1 in the pattern "oranges_and_lemons(x - 1)").

In Figure 17(b), the process is reversed. Now the code SP-pattern "S s0 n1 v0 #S" is supplied to the program as a New SP-pattern. Each of the SP-symbols in that SP-pattern is given extra bits of information to ensure that the program has some redundancy to work on, as mentioned above. The best SP-multiple-alignment that is created in this case contains "j o h n" followed by "r u n s", which is of course the original sentence, recreated via its code SP-symbols.
In general, the SP Computer Model, which is devoted to the compression of information, can reverse the process without any modification.It achieves "decompression by compression" without any paradox or contradiction.

C.1.4. How the SP System May Create Redundancy via Recursion.
The SP Computer Model may also create redundancy via recursion, as illustrated in Figure 18.
In this example, the SP Computer Model is supplied with two Old SP-patterns, "X b X #X 1 #X" and "X a 0 #X", and a one-symbol New SP-pattern: "0". The program processes this information like this:
(1) The SP-symbol "0" in the New SP-pattern is matched with, and implicitly unified with, the same SP-symbol in the Old SP-pattern "X a 0 #X", as shown in rows 0 and 1 in the figure.
(2) The SP-symbols "X" and "#X" at the beginning and end of "X a 0 #X" are matched and unified with the same two symbols at the third and fourth positions in the SP-pattern "X b X #X 1 #X", as shown in rows 1 and 2 in the figure.
(3) The SP-symbols "X" and "#X" at the beginning and end of "X b X #X 1 #X" are matched and unified with the same two symbols at the third and fourth positions in that same SP-pattern, as shown in rows 2 and 3 in the figure.
(4) After that, the process in step 3 repeats, as shown in rows 3 and 4 and rows 4 and 5 of the figure, and it may carry on like this, producing many SP-multiple-alignments, until the operator stops it or computer memory is exhausted.
If the matching symbols in Figure 18 are all unified (merging each matching pair into a single symbol), the result is a single sequence like this: "X b X b X b X b X a 0 #X 1 #X 1 #X 1 #X 1 #X", and likewise for all the many other SP-multiple-alignments that the program may produce.With all but the simplest of those SP-multiple-alignments, there would be redundancy in the repetition of the symbol "1" and likewise for other symbols in the figure.Hence, the SP Computer Model has created redundancy by a process which is devoted to the compression of information.
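The shape of the unified sequence can be reproduced with a short recursive function in C. This is only a sketch of the end result described above, not of how the SP Computer Model builds SP-multiple-alignments; the names `append` and `unify_recursive` and the `depth` parameter are hypothetical, introduced for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Append src to the NUL-terminated string in dst if it fits within cap
   bytes (including the terminator). Returns 1 on success, 0 otherwise. */
static int append(char *dst, size_t cap, const char *src) {
    size_t used = strlen(dst);
    size_t need = strlen(src);
    if (used + need + 1 > cap) return 0;
    memcpy(dst + used, src, need + 1);
    return 1;
}

/* Illustrative sketch: nest the self-referential pattern "X b X #X 1 #X"
   `depth` times around the base pattern "X a 0 #X", with each matching
   "X ... #X" pair merged into one. The caller must pass an empty string
   in out. Returns 1 on success, 0 if the buffer is too small. */
int unify_recursive(int depth, char *out, size_t cap) {
    if (depth <= 0) return append(out, cap, "X a 0 #X");
    return append(out, cap, "X b ")
        && unify_recursive(depth - 1, out, cap)
        && append(out, cap, " 1 #X");
}
```

With a depth of 4 this yields the sequence quoted above, and each extra level of recursion adds one more "X b" at the front and one more "1 #X" at the end, which is the redundancy in question.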

C.2. Redundancy Is Often Useful in the Detection and Correction of Errors and in the Storage and Processing of Information.
The fact that redundancy (repetition of information) is often useful in the detection and correction of errors and in the storage and processing of information, and the fact that these things are true in biological systems as well as artificial systems, is the second apparent contradiction to ICHLPC and the SP Theory as a theory of HLPC. Here are some examples:
(i) Backup copies. With any kind of database, it is normal practice to maintain one or more backup copies as a safeguard against catastrophic loss of the data. Each backup copy represents redundancy in the system.
(ii) Mirror copies. With information on the Internet, it is common practice to maintain two or more mirror copies in different places to minimise transmission times and to spread processing loads across two or more sites, thus reducing the chance of overload at any one site. Again, each mirror copy represents redundancy in the system.
(iii) Redundancies as an aid to the correction of errors. Redundancies in natural language can be a very useful aid to the comprehension of speech in noisy conditions.
(iv) Redundancies in electronic messages. It is normal practice to add redundancies to electronic messages, in the form of additional bits of information together with checksums and also by repeating the transmission of any part of a message that has become corrupted. These things help to safeguard messages against accidental errors caused by such things as birds flying across transmission beams or electronic noise in the system and so on.
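As a concrete illustration of how added redundancy supports error correction, here is a minimal sketch in C of a triple-repetition code with majority voting. This is a textbook scheme chosen for its simplicity, not a method described in this paper, and the function names are hypothetical. Each element of the input arrays is assumed to hold a single bit, 0 or 1.

```c
#include <stddef.h>

/* Encode: transmit every bit three times, which adds redundancy.
   out must have room for 3 * n entries. Illustrative sketch only. */
void encode_triple(const unsigned char *bits, size_t n, unsigned char *out) {
    for (size_t i = 0; i < n; i++) {
        out[3 * i] = bits[i];
        out[3 * i + 1] = bits[i];
        out[3 * i + 2] = bits[i];
    }
}

/* Decode: take a majority vote over each group of three received bits.
   This corrects any single flipped bit within a group. */
void decode_triple(const unsigned char *coded, size_t n, unsigned char *out) {
    for (size_t i = 0; i < n; i++) {
        int ones = coded[3 * i] + coded[3 * i + 1] + coded[3 * i + 2];
        out[i] = (unsigned char)(ones >= 2);
    }
}
```

If one of the three copies of a bit is corrupted in transit, the other two outvote it, so the original message is recovered at the cost of sending three times as much data.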
In information processing systems of any kind, uses of redundancy of the kind just described may coexist with ICMUP. For example, "... it is entirely possible for a database to be designed to minimise internal redundancies and, at the same time, for redundancies to be used in backup copies or mirror copies of the database ... Paradoxical as it may sound, knowledge can be compressed and redundant at the same time" [1, Section 2.3.7].
As noted in Appendix C.1.3, the SP System, which is dedicated to the compression of information, will not work properly with totally random information containing no redundancy. It needs redundancy in its "New" data in order to achieve such things as the parsing of natural language, pattern recognition, and grammatical inference. Also, for the correction of errors in any incoming batch of New SP-patterns, it needs a repository of Old patterns that represent patterns of redundancy in a previously processed body of New information.
Figure 19 shows two SP-multiple-alignments that illustrate error correction by the SP Computer Model. Figure 19(a) shows, as a reference standard, a parsing of the sentence "t w o k i t t e n s p l a y" in row 0 where that New SP-pattern is free of errors.For comparison, Figure 19(b) shows a parsing in which the New SP-pattern in row 0 contains an error of omission ("t w o" is changed to "t o"), an error of substitution ("k i t t e n s" is changed to "k i t t e m s"), and an error of addition ("p l a y" is changed to "p l a x y").Despite these three errors, the best SP-multiple-alignment created by the SP Computer Model is what would normally be regarded as correct.
This example illustrates the point, mentioned in Appendix B.2, that the exploitation of redundancy for the correction of errors may on occasion be intimately related to the exploitation of redundancy for the compression of information.

C.3. The Human Mind as a Kluge.
As mentioned in Section 22, Gary Marcus has described persuasive evidence that, in many respects, the human mind is a kluge. To illustrate the point, here is a sample of what Marcus says: "Our memory is both spectacular and a constant source of disappointment: we recognize photos from our high school year-books decades later-yet find it impossible to remember what we had for breakfast yesterday. Our memory is also prone to distortion, conflation, and simple failure. We can know a word but not be able to remember it when we need it ... or we can learn something valuable and promptly forget it. The average high school student spends four years memorising dates, names, and places, drill after drill, and yet a significant number of teenagers can't even identify the century in which World War I took place" [103, p. 18, emphasis as in the original].
Clearly, human memory is, in some respects, much less effective than a computer disk drive or even a book.And it seems likely that at least part of the reason for this and other shortcomings of the human mind is that "Evolution [by natural selection] tends to work with what is already in place, making modifications rather than starting from scratch" and "piling new systems on top of old ones" [103, p. 12].
The evidence that Marcus presents is persuasive: it is difficult to deny that, in certain respects, the human mind is a kluge.And evolution by natural selection provides a plausible explanation for anomalies and inconsistencies in the workings of the human mind.
Broadly in keeping with these ideas, Marvin Minsky has suggested that "each [human] mind is made of many smaller processes" called agents, each one of which "can only do some simple thing that needs no mind or thought at all.Yet when we join these agents in societies-in certain very special ways-this leads to true intelligence" [106, p. 17].Perhaps errors here and there in a society of agents might explain the anomalies and inconsistencies in human thinking that Marcus has described.
Superficially, evidence and arguments presented by Marcus and Minsky seem to undermine the idea that there is some grand unifying principle, such as IC via SP-multiple-alignment, that governs the organisation and workings of the human mind. But those conclusions are entirely compatible with ICHLPC and the SP Theory as a theory of mind. As Marcus says, "I don't mean to chuck the baby along with its bath – or even to suggest that kluges outnumber more beneficial adaptations. The biologist Leslie Orgel once wrote that "Mother Nature is smarter than you are," and most of the time it is" [103, p. 16], although Marcus warns that in comparisons between artificial systems and natural ones, nature does not always come out on top.
In general, it seems that, despite the evidence for kluges in the human mind, there can be powerful organising principles too.Since ICHLPC and the SP Theory are well supported by evidence, they are likely to provide useful insights into the nature of human intelligence, alongside an understanding that there are likely to be kluge-related anomalies and inconsistencies too.
Minsky's counsel of despair, "The power of intelligence stems from our vast diversity, not from any single, perfect principle" [106, p. 308], is probably too strong. It is likely that there is at least one unifying principle for human-level intelligence, and there may be more. And it is likely that, with people, any such principle or principles operate alongside the somewhat haphazard influences of evolution by natural selection.

Figure 2: Schematic representation of the SP System from an "input" perspective. Reproduced with permission from Figure 1 in [6].

Figure 5: A schematic representation of a partial SP-multiple-alignment in SP-Neural, as discussed in [8, Section 4]. Each broken-line rectangle with rounded corners represents a pattern assembly, corresponding to an SP-pattern in the SP Theory. Each character or group of characters enclosed in a solid-line ellipse represents a neural symbol corresponding to an SP-symbol in the SP Theory. The lines between pattern assemblies represent nerve fibres with arrows showing the direction in which impulses travel. Neural symbols are mainly symbols from linguistics such as "NP" meaning "noun phrase", "D" meaning a "determiner", "#D" meaning the end of a determiner, and "#NP" meaning the end of a noun phrase. Reproduced with permission from Figure 3 in [8].

Figure 6: A schematic representation of versatility and integration in the SP System, with SP-multiple-alignment centre stage.

Figure 8: A schematic view of how, if we close our eyes for a moment and open them again, we normally merge the before and after views to make one. The landscape here and in Figure 9 is from Wallpapers Buzz (https://www.wallpapersbuzz.com), reproduced with permission.

Figure 9: Schematic representation of how, in recognition, incoming visual information may be matched and unified with stored knowledge.

Figure 11: Diagram to show the relationship between the left and right images in Figure 10. Reproduced from [91, Figure 2.4-3], with permission of Alcatel-Lucent/Bell Labs.

Figure 12: Variation in the rate of firing of a single ommatidium of the eye of a horseshoe crab in response to changing levels of illumination. Reproduced from [93, Figure 16], with permission from the Optical Society of America.

Figure 13: Waveform for the spoken phrase "On our website" with an alphabetic transcription above the waveform and a phonetic transcription below it. With thanks to Sidney Wood of SWPhonetics (swphonetics.com) for the figure and for permission to reproduce it.

Figure 14: Part of a parsing created by the MK10 Computer Model from a 10,000-letter sample of English (book 8A of the Ladybird Reading Series) with all spaces and punctuation removed. The program derived this parsing from the sample alone, without any prior dictionary or other knowledge of the structure of English. Reproduced from Figure 7.3 in [98], with permission.
(i) The SNPR Computer Model. The SNPR Computer Model, which was developed from the MK10 Computer Model, can discover plausible grammars from samples of English-like artificial languages [98, pp. 181-185]. This includes the discovery of segmental structures, classes of structure, and abstract patterns. ICMUP is central in how the program works.

A.3. Random-Dot Stereograms. A particularly clear example of the kind of search described in Appendix A.1 is what the brain has to do to enable one to see the figure in the kinds of random-dot stereogram described in Section 11.
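The kind of search involved can be illustrated with a toy sketch. It is not a model of what the brain does, and it is not taken from the sources cited in this paper: the image sizes, the window-matching rule, and the names `make_stereogram` and `disparity_at` are all assumptions made for illustration. The left and right images are identical random dots except for a central rectangle that is displaced sideways in the right image. For a given location, the program tries each candidate displacement and keeps the one that gives the most matching dots.

```python
import random


def make_stereogram(width=40, height=20, shift=3, seed=0):
    """Return (left, right) random-dot images as lists of rows of 0/1.

    The right image copies the left one, except that a central rectangle
    is displaced `shift` dots to the left (an illustrative assumption).
    """
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    right = [row[:] for row in left]
    top, bottom = height // 4, 3 * height // 4
    lo, hi = width // 4, 3 * width // 4
    for y in range(top, bottom):
        for x in range(lo, hi):
            right[y][x - shift] = left[y][x]
        # The strip uncovered by the displacement gets fresh random dots.
        for x in range(hi - shift, hi):
            right[y][x] = rng.randint(0, 1)
    return left, right


def disparity_at(left, right, x, y, max_shift=6, half=3):
    """Search displacements 0..max_shift for the one that best matches a
    small window of the left image, centred on (x, y), to the right image."""
    height, width = len(left), len(left[0])
    best_d, best_matches = 0, -1
    for d in range(max_shift + 1):
        matches = 0
        for yy in range(max(0, y - half), min(height, y + half + 1)):
            for xx in range(max(0, x - half), min(width, x + half + 1)):
                # Skip comparisons that would fall off the left edge.
                if xx - d >= 0 and left[yy][xx] == right[yy][xx - d]:
                    matches += 1
        if matches > best_matches:  # ties go to the smaller displacement
            best_d, best_matches = d, matches
    return best_d


left, right = make_stereogram()
centre = disparity_at(left, right, 20, 10)  # inside the displaced rectangle
corner = disparity_at(left, right, 3, 2)    # in the undisturbed background
```

Even in this small case the search is over many alternative alignments for every location, and only one of them unifies the two views; with real images the number of alternatives is far larger.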

Figure 18: One of many SP-multiple-alignments produced by the SP Computer Model with a New SP-pattern, "0", and a repository of user-supplied Old SP-patterns: "X b X #X 1 #X". Reproduced with permission from Figure 4.4(a) in [1].

Figure 19: (a) The best SP-multiple-alignment created by the SP model with a store of Old SP-patterns like those in rows 1 to 8, representing grammatical structures, including words, and a New SP-pattern in row 0, representing a sentence to be parsed. (b) As in (a) but with errors of omission, commission, and substitution in the New SP-pattern and with the same set of Old SP-patterns as before.

[Figure labels: SP-multiple-alignment; unsupervised learning; analysis and production of natural language.]
1. Searching for Repeating Patterns. At first sight, the process of searching for repeating patterns (Sections 2.1.1 and 2.2.2) is simply a matter of comparing one pattern with another to see whether they match each other or not. But there are, typically, many alternative ways in which patterns within a given body of information, I, may be compared, and some are better than others. In the associated equation, $f_i$ is the frequency of the $i$th member of a set of $n$ patterns, and $s_i$ is its size in bits. Patterns that are both big and frequent are best. This equation applies irrespective of whether the patterns are coherent substrings or patterns that are discontinuous within I.
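The point that "big and frequent" patterns are best can be made concrete with a minimal sketch. It covers only coherent substrings, not discontinuous patterns, and the scoring rule used, (f - 1) * s (the bits saved if f copies of a pattern were merged into one), is an assumption chosen for illustration rather than a formula taken from this paper. The function names and the figure of 8 bits per symbol are likewise hypothetical.

```python
from collections import Counter


def repeated_substrings(text, min_len=2, max_len=8):
    """Count every contiguous substring with length in min_len..max_len
    and keep those occurring more than once (overlaps are counted)."""
    counts = Counter(
        text[i:i + k]
        for k in range(min_len, max_len + 1)
        for i in range(len(text) - k + 1)
    )
    return {pattern: f for pattern, f in counts.items() if f > 1}


def score(pattern, frequency, bits_per_symbol=8):
    """Assumed illustrative score: (f - 1) * s, with s the size in bits."""
    return (frequency - 1) * len(pattern) * bits_per_symbol


def best_patterns(text, top=3, **kwargs):
    """Rank repeated substrings by score, breaking ties alphabetically."""
    repeats = repeated_substrings(text, **kwargs)
    ranked = sorted(repeats.items(), key=lambda pf: (-score(*pf), pf[0]))
    return ranked[:top]


ranking = best_patterns("abcabcabcxyz")
```

Even for a 12-character string, dozens of candidate substrings have to be counted and compared. Allowing discontinuous patterns would enlarge the space of alternatives much further, which is why exhaustive comparison quickly becomes impractical.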
and ignored. An animal must identify what is redundant in its sensory messages, for this can tell it about structure and statistical regularity in its environment that are important for its survival." [34, p. 243], and "It is ... knowledge and recognition of ... redundancy, not its reduction, that matters." [34, p. 244].