A theory of consciousness from a theoretical computer science perspective: Insights from the Conscious Turing Machine

Significance This paper provides evidence that a theoretical computer science (TCS) perspective can add to our understanding of consciousness by providing a simple framework for employing tools from computational complexity theory and machine learning. Just as the Turing machine is a simple model to define and explore computation, the Conscious Turing Machine (CTM) is a simple model to define and explore consciousness (and related concepts). The CTM is not a model of the brain or cognition, nor is it intended to be, but a simple substrate-independent computational model of (the admittedly complex concept of) consciousness. This paper is intended to introduce this approach, show its possibilities, and stimulate research in consciousness from a TCS perspective.


Extended Summary
We consider consciousness from the perspective of theoretical computer science (TCS). Inspired by Alan Turing's simple yet powerful model of a computer, the Turing Machine (TM), and by Bernard Baars' Theater of Consciousness, we define a computational model of consciousness, the Conscious Turing Machine (CTM).
The CTM is defined formally as a 7-tuple, < STM, LTM, Up-Tree, Down-Tree, Links, Input, Output >. The theory includes a precise definition of George Miller's informally defined chunk, and a precise definition of a competition for deciding which of the (10^7 or more) Long Term Memory (LTM) processors gets access to Short Term Memory (STM).
Bi-directional links between processors that emerge in the life of the CTM enable conscious processing to become unconscious. Links are also crucial for the "global ignitions", described in the Global Neuronal Workspace Theory (GNWT), that reinforce and sustain conscious awareness. Input/Output maps enable communication between the CTM and its environment. Other features of the model can be found in (Blum & Blum, 2021).
The definition of the model is followed by formal definitions of conscious content, conscious awareness, and the stream of consciousness, in the CTM. While these are just formal definitions, we claim that the CTM supports high-level explanations for these and other phenomena associated with consciousness, including the feeling of consciousness. One purpose of our model is to argue these claims. Another is to provide a theoretical computer science foundation for understanding consciousness.
In particular, we argue that the feeling of consciousness arises in the CTM as a consequence of:
1. the global workspace architecture, which enables all processors, including those that are particularly responsible for the feeling of consciousness (Inner Speech, Inner Vision, Inner Sensations, and Model-of-the-World), to be privy to the same (conscious) content of STM;
2. the expressive power of CTM's multi-modal inner language Brainish, which is able to express gists that betoken images, sounds, tactile sensations, thoughts, pains, pleasures, and the whole range of emotions;
3. the close correspondence between gists of outer speech (what we say and hear in the world), outer vision (what we see in the world), and so on, and gists of inner speech (what we say to ourselves), inner vision (what we see in dreams), and the like; and
4. predictive dynamics, i.e., cycles of prediction, feedback, and learning that help the CTM develop its understanding of, and its ability to deal with, its environment and inner world.
We argue that the feeling of free will in the CTM, like the experiences of illusions and dreams, is a direct consequence of CTM's architecture, certain special processors such as the Model-of-the-World processor and the generalized Inner Speech processors, the expressive power of Brainish, and its predictive dynamics.

Q7. Why is the resolution mechanism (in the Up-Tree competition) the specific one that is proposed?
A. For the probabilistic CTM, the decision made at each interior node of the Up-Tree -namely which one of the node's two children's chunks should win the match -is decided by which chunk has the larger f-value. As processor p's chunk works its way up the tree, its f-value is affected by those processors that neighbor p. It is a surprising consequence of this mechanism that if f is additive then the probability that a chunk rises to STM is independent of its location, that is to say the location of the processor that generated it, on (the leaves of) the competition Up-Tree. As a consequence, the competition is permutation independent.
There are other, completely different and even more important reasons for making f additive: for one, when f is not additive, something goes terribly wrong in each and every one of the many nontrivial examples we have considered. For example, if f(chunk) = |mood|, then a strong positive mood and a strong negative mood appearing in two siblings (children of a node) completely cancel: neither "becomes" conscious (reaches STM), even when all other chunks are relatively unimportant. Concretely, consider an Up-Tree with f = |mood| and weights w1 = 100, w2 = -100, w3 = 1, w4 = 2. The moods of the first two chunks sum to 0 at their parent node, so their chunk competes at the root with f-value 0 and loses to the chunk carrying the (unimportant) combined mood of w3 and w4. For another example, if f(chunk) = |weight| (see the figure in answer A1 to Q9), then two chunks having the same maximum |weight| can have vastly different probabilities of reaching STM. Neither pathology arises when f is additive.
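The mood-cancellation example above can be checked in a few lines of code. This is a minimal sketch under our own assumptions, not the authors' specification: we take a chunk to carry (source processor, intensity, mood), let intensity and mood both add when sibling chunks meet at a node, send the chunk with the larger f-value upward, and break ties toward the left child.

```python
# Deterministic Up-Tree competition over four leaf chunks (illustrative sketch).
# A chunk is (source, intensity, mood); when two sibling chunks meet at a node,
# the one with the larger f-value moves up and inherits the summed parameters.

def compete(leaves, f):
    """Run the tournament; return the source of the chunk reaching STM."""
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for a, b in zip(level[::2], level[1::2]):
            winner = a if f(a) >= f(b) else b                  # ties go left
            nxt.append((winner[0], a[1] + b[1], a[2] + b[2]))  # additive params
        level = nxt
    return level[0][0]

# Weights w1..w4 = 100, -100, 1, 2; intensity = |weight|, mood = weight.
leaves = [("p1", 100, 100), ("p2", 100, -100), ("p3", 1, 1), ("p4", 2, 2)]

f_mood = lambda c: abs(c[2])   # non-additive competition function |mood|
f_intensity = lambda c: c[1]   # additive competition function: intensity

print(compete(leaves, f_mood))       # p4: the strong moods cancel at their parent
print(compete(leaves, f_intensity))  # p1: a strongly weighted chunk wins
```

With f = |mood|, the moods 100 and -100 sum to 0, so the unimportant chunk w4 reaches STM; with the additive intensity, a heavily weighted chunk wins, as it should.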

______________________________
Q8. Why do you focus on the probabilistic rather than the deterministic CTM as being the correct model?
A. There are many reasons for this. For one, as noted above, with any additive competition function such as f(chunk) = intensity, the competition is permutation independent if CTM is probabilistic. This is not the case if CTM is deterministic, not even if f is additive:

Deterministic Competition Trees with Competition Function f: chunk → intensity.
For another, in the probabilistic CTM, gists submitted to the competition, even those with small intensity, get into STM with probability proportional to their estimated importance (f-value). Again, this is not the case for a deterministic CTM.
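The permutation-independence claim is easy to test numerically. The sketch below is our own toy rendering, not the paper's formal model: each node's winner is chosen with probability proportional to the two competing f-values, and the winner's (additive) f-value becomes their sum. The empirical probability of reaching STM then matches f_i / (sum of all f-values), regardless of where a chunk sits on the leaves.

```python
import random

def prob_compete(fvals, rng):
    """One run of the probabilistic Up-Tree; returns the winning leaf's index."""
    level = list(enumerate(fvals))                     # (leaf index, f-value)
    while len(level) > 1:
        nxt = []
        for (ia, fa), (ib, fb) in zip(level[::2], level[1::2]):
            win = ia if rng.random() < fa / (fa + fb) else ib
            nxt.append((win, fa + fb))                 # additivity: f-values sum
        level = nxt
    return level[0][0]

rng = random.Random(0)
f = [1, 2, 3, 10]                                      # f-values of four chunks
trials = 100_000
results = {}
for perm in ([0, 1, 2, 3], [3, 2, 0, 1]):              # two leaf placements
    arranged = [f[i] for i in perm]
    wins = [0] * 4
    for _ in range(trials):
        wins[perm[prob_compete(arranged, rng)]] += 1
    results[tuple(perm)] = [w / trials for w in wins]

# Both placements give probabilities close to f_i / 16: 1/16, 2/16, 3/16, 10/16.
for perm, probs in results.items():
    print(perm, [round(p, 3) for p in probs])
```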

______________________________
Q9. Why do the authors choose to have intensity and mood in the chunk? It seems equally valid to discard them.

A1.
Without intensity and mood in the chunk, every "reasonable" competition function such as f(chunk) = |weight| is non-additive, which leads to the possibility of a weird, lopsided kind of consciousness. For example, suppose the competition tree has N/2 chunks, each of a heavy weight W, in the left-hand subtree (LHST), exactly 1 chunk of the same weight W in the right-hand subtree (RHST), and that all other chunks have negligible weight, as in the next figure. In that case, though the CTM considers all heavily weighted chunks equally important, the (heavily weighted) chunks in the LHST each have negligible probability 1/N of getting into STM, while the single heavily weighted chunk in the RHST has probability almost 1/2 of getting into STM.

A2. At time t+h, the winning chunk contains the weight that was originally assigned to it at time t, when it was put into the competition. The intensity and mood of that winning chunk, which were set to |weight| and weight respectively at time t, were continuously modified as the chunk moved up the competition tree until it entered STM at time t+h, at which time the intensity and mood are indicators of (N times) the average intensity and mood of the entire CTM at time t. We believe that humans are normally consciously aware of their global intensity and mood, a fact that makes it entirely reasonable to include intensity and mood in the chunk.

A3.
Chess and tennis tournaments use seeding to give players of equal strength roughly equal chances of winning the tournament. The Up-Tree competition with additive competition function assures that even without seeding, all players have a probability of winning proportional to their ability/expertise.
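The lopsided behavior described in A1 can likewise be simulated. The sketch below uses our own assumed reading of the non-additive case: N = 16 leaves, the node winner is chosen with probability proportional to |weight| but keeps its own |weight| rather than the sum. Each heavy chunk in the LHST then reaches STM with probability about 1/N = 1/16, while the lone heavy chunk in the RHST reaches STM with probability near 1/2.

```python
import random

def compete_nonadditive(weights, rng):
    """Probabilistic Up-Tree where the winner KEEPS its own f-value (|weight|)."""
    level = list(enumerate(weights))
    while len(level) > 1:
        nxt = []
        for (ia, fa), (ib, fb) in zip(level[::2], level[1::2]):
            nxt.append((ia, fa) if rng.random() < fa / (fa + fb) else (ib, fb))
        level = nxt
    return level[0][0]

rng = random.Random(1)
N, W, eps = 16, 100.0, 0.001
# LHST (leaves 0..7): N/2 heavy chunks; RHST (leaves 8..15): one heavy chunk
# among otherwise negligible ones.
weights = [W] * (N // 2) + [W] + [eps] * (N // 2 - 1)
trials = 50_000
wins = [0] * N
for _ in range(trials):
    wins[compete_nonadditive(weights, rng)] += 1

print(round(wins[0] / trials, 3))        # a LHST heavy chunk: about 1/16
print(round(wins[N // 2] / trials, 3))   # the RHST heavy chunk: about 1/2
```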

Q10. Where does feedback come from?
A. Feedback comes from chunks that are received in broadcasts from STM, through links, and from the environment via Input maps, all of which have information that can be graded as "erroneous" or "correct" in comparison to (stored) predictions.

Q11. How do processors judge whether or not their information is valuable?
A. Judgements are based on feedback. Each processor has a Sleeping Experts Algorithm (SEA) that learns, based on feedback, what the weight-giving power of its processor should have been, so that weight assignments eventually settle down to something more or less correct. Roughly speaking, when a |weight| is too low, the SEA multiplies the weight-giving power by 2 (more generally by some constant c > 1). When too high, the SEA multiplies it by ½ (more generally by 1/c).
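The doubling/halving rule can be written out in a few lines. This is only an illustrative fragment under our own assumptions (a scalar feedback sign), not the full Sleeping Experts Algorithm:

```python
def update_power(power, feedback, c=2.0):
    """Multiplicative correction of a processor's weight-giving power.

    feedback > 0: the processor's |weight| was judged too low;
    feedback < 0: too high; feedback == 0: about right."""
    if feedback > 0:
        return power * c
    if feedback < 0:
        return power / c
    return power

power = 1.0
for fb in (+1, +1, -1, 0):      # too low, too low, too high, correct
    power = update_power(power, fb)
print(power)                    # 1.0 * 2 * 2 / 2 = 2.0
```

Repeated corrections of this multiplicative kind let the weight-giving power settle, over time, to something more or less correct.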

Q12. Why would chunks contain queries and answers?
A. For example, when you meet a person at a party and can't remember her name, a chunk produced by a processor can pose the query "What's her name?". When this chunk wins the competition for STM, its query is broadcast to all LTM processors. Sometime later, another processor answers, "I think her name begins with T," which rises to STM and gets broadcast. This can later, perhaps much later, trigger another processor to answer, "Her name is Tina."

______________________________
Q13. Must the CTM have a Model-of-the-World processor?
A1. The Model-of-the-World processor is a fundamental component of consciousness. Our explanation for the "feeling of consciousness" is for a CTM that has a Model-of-the-World. We don't see how to do without it.
A2. Our argument that the CTM feels conscious depends on its having a Model-of-the-World processor and is akin to the argument given by Graziano's Attention Schema Theory. We argue that CTM's feeling of consciousness, in the sense that the term is normally understood, is a consequence of the fact that what the CTM consciously knows (in the formal sense) of the world, and of itself in the world, is the Model-of-the-World processor's view of the world and its view of the "CTM" in its models of the world; and that this view, which includes that the "CTM" is conscious, is broadcast to all processors.

Q14. Isn't naming specific processors, such as the Model-of-the-World processor, in definite and final terms, an over-specification of the model?
A. The proposed Model-of-the-World processor is only an example of how such a processor might work. It is not meant to be the definitive final specification. It's kind of like Turing's universal machine. The idea is important, as is having a description of such a machine. The particular machine is less important.

Q15. Why should the whole cortex not constitute a model of the world (as is commonly assumed in neuroscience), rather than just the Model-of-the-World processor?
A. The whole cortex may well be viewed as a model of the world in both the human brain and the CTM. We don't suggest otherwise.

Q16. Why would the lack of input from the environment lead to incoherent thoughts in dreams?
A. No, no, we're not saying that dreams must be incoherent, only that they can be incoherent, and that this is especially the case in dreams because the processors involved in a dream are not getting feedback from the environment. For example, in a dream, one might believe that one can fly.

Q17. What question does the theory address that is not already accounted for by the standard GWT-related theories?
A1. Unlike standard GWT-related theories of consciousness, the CTM is a substrate-independent computational model of consciousness, not a model of the brain. Its purpose is to explain how a machine can experience feelings. As Arlindo Oliveira has pointed out: "The proposed model is not a model of human consciousness, but a computational model that can explain many features of conscious behavior and that address directly the hard problem of consciousness, as defined by Chalmers. [It explains] why systems that are subject to the laws of physics can have subjective experiences."

A2. No other GWT-related theory gives a substantive idea of how processors might decide among themselves what information to send to the stage.

Q18. How does the theory argue that CTM has free will?
A. We don't. We argue that CTM has the feeling of free will. Our argument is two-fold.
The first part of the argument has to do with resource limitations, a complexity theory argument. For example, when the CTM plays chess, it can be faced with a selection of possible moves but, not yet having evaluated the consequences of those moves, the CTM is free (and knows it is free) to choose whichever move it reckons best within the time constraints. The second part of the argument is that CTM's Model-of-the-World processor tags the "CTM" in its models-of-the-world with a multi-modal Brainish gist asserting that CTM is in the process of choosing its next move, meaning that it (the CTM) is free to choose its next move. Of course, its decision is deterministic (assuming, as we do, Newton's deterministic physics). However, this labeling of "CTM" as having free will gives CTM its knowledge that its processors may now suggest the next move, and this knowledge is conveyed by a feeling of free will. This argument is similar to our argument that CTM feels it is conscious.

Q19. Do you have an implementation of the CTM?
A1. We do not. That said, Jean-Louis Villecroze is working on an implementation (Villecroze, 2019), and Paul Liang is working on developing the multi-modal language Brainish (Liang, 2022) in his PhD research on multi-modal machine learning. Our own focus is on understanding the hard problem of consciousness and some essential related issues. In this paper, we are not attempting to provide novel biological predictions, nor AI implementations, as wonderful as those would be, but to provide a simple machine model for consciousness.
We note that it took Alan Turing almost a decade (mostly due to his war work) to go from his theoretical one-tape universal machine to his complete circuit specification for the implementation of a universal computer, a description so complete that it included vacuum tube choices, resistor and capacitor values, mercury delay line memories, and even the cost of the computer in pounds (Turing A. M., 1945). Unfortunately, due to politics, Turing's Automatic Computing Engine (ACE) never saw the light of day; only a more primitive computer, the Pilot ACE, was constructed (Hodges, 1992).

A2. Even a complete knowledge of the "circuitry" of the brain and a complete knowledge of the neural correlates of consciousness, as wonderful and desirable as it would be to have these, cannot explain how the feeling of consciousness arises. To understand that feeling, something else is needed. That something is what we are proposing to get a handle on with the CTM.

Q21. Could brain dynamics and competition between attractors be a more neurally plausible explanation for how the brain works?
A. Perhaps, but... we're not looking to model the brain or brain dynamics but to understand consciousness. For this purpose, we use the mathematics that we find most helpful.

Q22. Assuming the brain is a CTM, what are some conditions for a part of the brain to be considered the STM?
A. The STM has a very small memory, a relatively small direct input, and an output that goes almost everywhere.
In "A Brain Structure Looking for a Function" (Koch, 2014), Christof Koch suggests that the claustrum might fit the bill.

Altered States of Consciousness (added section)
Under psychedelics or meditation, humans can experience altered states of consciousness ranging from a heightened sense of awareness to dissolution of self (feelings of being "one with the world"). We agree with (Bayne & Carter, 2018) that these are states, not levels, of consciousness. We disagree, however, with their assessment that the global workspace / global neuronal workspace theories are too simple to explain these altered states. Indeed, the beauty of those theories lies in the significant understanding that comes of their simplicity.
Here we show how the CTM might experience a simple form of dissolution of self. We start by describing a Mindful Meditation processor (MMp) that a CTM might have.
The conscious decision to meditate would be the concern of the MMp, which creates and submits a sequence of chunks to the competition for STM. Through repeated practice, this processor gains strength and increases the intensity of its chunks. It can be surprisingly difficult for the MMp to keep other chunks from entering STM. The difficulty is not in the sense of lifting a heavy weight or proving a difficult theorem, but in the sense of demanding focused, concentrated attention and practice. (Rathi, 2021) explains how a human, using the Mantra meditation technique, accomplishes this.
When the MMp is successful, its chunks get into STM and are broadcast. Those broadcasts generally contain feedback that other processors use, through their Sleeping Experts Algorithm, to hush their own self-evaluations. Thus, during successful meditation, the MMp's chunks get the lion's share of time in STM.
Additionally, during successful meditation, chunks that get communicated via links from all processors except the MMp get hushed (by the incoming broadcasts from the MMp), and thus processors are unlikely to pay their usual attention to the chunks they receive through links. This "hushing" or diminishing of functional connectivity is observed in studies of the effects of psychedelics and meditation. For example, brain imaging and electromagnetic studies of the effects of certain psychedelics (psilocybin) suggest that the dissolution of self ("ego-dissolution") is due to "disintegration" of functional connectivity (Calvey & Howells, 2018). This decreased connectivity accounts in part for the sense of dissolution of spatial boundaries, which in turn leads to the feeling of being "one with the world".
Neuroimaging studies on various forms of meditation from distinct traditions share some common neural correlates, see (Millière, Carhart-Harris, Roseman, Trautwein, & Berkovich-Ohana, 2018). Importantly, the latter report that in several forms of meditation there is "attenuation for either activity or functional connectivity" in the medial prefrontal cortex and in the posterior cingulate cortex, key nodes of the so-called default mode network (DMN). The DMN is active when a person is daydreaming or mind-wandering. It is also active when a person is thinking about others or themselves, remembering the past, or planning for the future, see (Buckner, Andrews-Hanna, & Schacter, 2008) and (Lieberman, 2013). Thus attenuation of functional connectivity in these areas may also account for dissolution of self.

Relation of CTM to Other Theories of Consciousness
The CTM is an abstract computational model designed to consider consciousness from a TCS perspective. It is not intended to model the brain nor the neural correlates of consciousness. Nevertheless, the CTM is both inspired by, and has certain features in common with, neural, cognitive, and philosophical theories of consciousness.
The CTM is directly influenced by Bernard Baars' GWT, which is supported by (Dehaene S., 2014) and (Mashour, Roelfsema, Changeux, & Dehaene, 2020) in their investigation of the neural correlates of consciousness known as the Global Neuronal Workspace Theory (GNWT). We are inspired by David Mumford's 1991 work on the computational architecture of the neocortex (Mumford, 1991), which we view as an early proposal for GNWT.
Like the LIDA model of cognition (Baars & Franklin, 2007) and (Baars & Franklin, 2009), CTM is architectural. Unlike LIDA, which is a more elaborate model of GWT, the CTM is intended to be a minimal model of GWT sufficient to explain a wide range of conscious phenomena and, in particular, the feeling of consciousness.
We see a kinship between the CTM and the self-aware robots developed by (Chella, Pipitone, Morin, & Racy, 2020). We also see a kinship between the CTM and the Global Latent Workspace (GLW) proposed by (VanRullen & Kanai, 2021) for deep learning.
Our explanation for CTM's "feeling of consciousness" aligns closely with Michael Graziano's Attention Schema Theory (AST). As in AST, the CTM is consciously aware of both external and internal events. Basic AST is similar to GWT: its i-consciousness (i for information) aligns somewhat with CTM's conscious awareness. However, we do not agree with Graziano et al. that GWT "leaves unexplained how people end up believing they have subjective experience", i.e., that it leaves an explanatory gap. Instead, we argue that in our model the feeling of subjective experience arises when "winning chunks" from imaginings and dreams, for example, are received by the same (unconscious) processors that receive chunks directly from the environment via Input maps. Additionally, the Model-of-the-World processor incorporates the information obtained from the winning chunks (i.e., the conscious content of the CTM) into its models of the world, as appropriate, tagging the "CTM" in all models of the world as "conscious". This is similar to Graziano's argument for consciousness in AST. A fuller discussion of the feeling of consciousness in the CTM is in (Blum & Blum, 2021).
Philosophically, we align with much of Daniel Dennett's functionalist perspective (Dennett D. C., 1991) except we don't agree with his view that we are the only species to have consciousness (Dennett D. C., 1978) (Dennett D. C., 2019). As for animal consciousness, we agree with (Mumford, 2019) that consciousness is a matter of degree. Here he cites (Merker, 2007) that consciousness does not need a cerebral cortex: it arises from midbrain structures. We would also cite other studies, e.g., (Slobodchikoff, 2012).
We do not see the explanatory gap (Levine, 1983) between functional and phenomenological consciousness as insurmountable. This viewpoint aligns closely with Baars (see (Kaufman, 2020) interview) and (Dennett D. C., 2016). Indeed, we see the CTM's ability to tag and test features in its models of the world as playing a role in the feeling of "what it is like" (Nagel, 1974).
Both AST and CTM appear to embody illusionist notions of consciousness proposed by (Dennett D. C., 2019) and Keith Frankish (Frankish, 2016). Saying that the feeling of consciousness is an illusion does not deny the existence of that feeling. As a familiar example, the fact that a movie is made up of (many) discrete still images does not affect the feeling of continuity one gets from viewing it. The feeling of continuity is an illusion.
By utilizing existing technology (or apps) to supplement its supply of LTM processors, the CTM incorporates elements similar to the "extended minds" advocated by (Clark & Chalmers, 1998).
Integrated Information Theory (IIT), the theory of consciousness developed by Giulio Tononi (Tononi, 2004) and supported by Koch (Tononi & Koch, 2015), proposes a measure of consciousness called PHI, inspired by Shannon's information theory, that essentially measures the amount of feedback in a system. It is a mechanism's intrinsic ability to influence itself, rather than its input-output information processing, that determines its consciousness.
This is consistent with CTM's intrinsic predictive dynamics (of prediction, feedback, and learning). Tononi proposes five "axioms" (properties) necessary for any causal system to have consciousness. Given a detailed specification of a CTM, one could in principle compute its PHI and compare it to the PHI of any other precisely defined causal system. It turns out that many causal physical systems have non-zero measures of PHI. IIT would validate animal consciousness.
With regard to the "adversarial collaboration" between advocates of GNWT and IIT, (Reardon, 2019) and (Melloni, Mudrik, Pitts, & Koch, 2021), the CTM shares features of both basic theories, as pointed out above. Our view is that both theories add to the discussion of consciousness. The adversarial aspects between the theories arise mainly from the advocates' differing views on the brain regions primarily responsible for consciousness: prefrontal cortex for GNWT, posterior cortex for IIT. We note, however, that it is possible to have some level of consciousness without a cerebral cortex at all (Merker, 2007), and suspect that in such cases, as in the CTM, aspects of the basic GWT and IIT are still in play.
Our view on free will is close to Dehaene's (Dehaene S. , 2014). Our explanation of the feeling of free will in the CTM incorporates additionally and especially, resource limits imposed by computational complexity considerations.

About the Authors of the expanded monograph (in progress)
Manuel has been motivated to understand the mind/body problem since he was in second grade, when his teacher told his mom she should not expect him to get past high school. As an undergrad at MIT, he spent a year studying Freud and then apprenticed himself to the great anti-Freud neurophysiologist, Dr. Warren S. McCulloch, who became his intellectual mentor. When he told Warren (McCulloch) and Walter (Pitts) that he wanted to study consciousness, he was told in no uncertain terms that it was verboten to do so, and why (there was no fMRI at the time). As a graduate student, he asked, and got, Marvin Minsky to be his thesis advisor. Manuel is one of the founders of complexity theory, a Turing Award winner, and has mentored many in the field who have charted new directions in computational learning, cryptography, zero knowledge, interactive proofs, proof checkers, and human computation. He is a Fellow of both AAAS's (the American Academy of Arts and Sciences and the American Association for the Advancement of Science), NAS, and NAE. Manuel Blum, mblum@cs.cmu.edu.

Lenore has been passionate about mathematics since she was 10. She attributes that to having dropped out of school when she was 9 to wander the world, then hit the ground running when she returned and became fascinated with the Euclidean Algorithm. Her interests turned to non-standard models of mathematics and of computation. As a graduate student at MIT, she showed how to use saturated model theory to get new results in differential algebra. Later, with Mike Shub and Steve Smale, she developed a foundational theory for computing and complexity over continuous domains such as the real or complex numbers. The theory generalizes the Turing-based theory (for discrete domains) and has been foundational for computational mathematics.
Lenore is internationally known for her work in increasing the participation of girls and women in STEM and is proud that CMU has gender parity in its undergraduate CS program. Over the years, she has been active in the mathematics community.

------------------------------
All three Blums received their PhDs at MIT and spent a cumulative 65 wonderful years on the faculty of the Computer Science Department at CMU. Currently the elder two are emeriti and the younger is Professor and Chief Academic Officer at TTIC (Toyota Technological Institute at Chicago), a PhD-granting computer science research institute focusing on areas of machine learning, algorithms, AI (robotics, natural language, speech, and vision), data science and computational biology, and located on the University of Chicago campus. Manuel Blum is Emeritus Professor of Computer Science at UC Berkeley and CMU. Lenore Blum is Emerita Distinguished Career Professor of Computer Science at CMU and is currently a Distinguished Professor-in-Residence at UC Berkeley.

Figure captions:
Connections in the CTM to and from an LTM processor.
A coin-flip neuron on input (a, b) with a + b > 0.
Baars' GWT model (left); CTM (right).
Selective attention test: screen shots from the video "The original selective attention task".
The Whodunnit video: beginning and ending screen shots from the video (London, 2008).