Learning human insight by cooperative AI: Shannon-Neumann measure

A conceptually sound solution to a complex real-world challenge is built on a solid foundation of key insights, gained by posing 'good' questions at the 'right' times and places. If the foundation is weak, due to insufficient human insight, the resulting conceptually flawed solution can be very costly or impossible to correct downstream. The responses to the 2020 global pandemic by countries relying on just-in-time supply and production chains and fragmented health-care systems are striking examples. Here, artificial intelligence (AI) tools that help human insight are of significant value. We present a computational measure of insight gains, which a cooperative AI agent can compute by maintaining a specific internal framework and observing how a human behaves. This measure enables a cooperative AI to maximally boost human insight during an iterated questioning process, providing a solid foundation for solving complex open-ended challenges. It is an AI-Human insight bridge, built on Shannon entropy and von Neumann utility. Our next paper will address how this measure and its associated strategy reduce a hard cooperative inverse reinforcement learning game to simple Q-learning, proven to converge to a near-optimal policy.


Introduction
Insight, defined as a combination of discernment, understanding and intuition, is critical in the conceptual phases of building solutions to open-ended real-world challenges, which involve a complex mixture of human, financial and technological needs. Advances in insights generated by big data and machine learning (ML) [1] can be extremely useful, yet will remain profoundly incomplete as long as the human dimension is not well accounted for.
In the context of business, financial or scientific insight and discovery, ML is extensively used to extract useful patterns, combining big data with human-generated prior domain knowledge to enable interpretation and explanation. We approach the question from the complementary perspective of human-compatible artificial intelligence (AI) [2]: instead of taking humans out of the loop during computations (which risks being an opaque 'black box', and missing the human dimension), we focus on human-centered AI that assists people in gaining insights by observing human behavior.
We propose a computational measure μ_I of insight I, enabling an AI to compute gains in I resulting from a cooperative questioning process between a human (H) and an AI cognitive assistant (A_cog).
The framework for the AI's learning is a 2-person Iterated Questioning (IQ) game: a set Q of (yes/no) questions q_i ∈ Q, posed to reach a goal G. The set Q is shared by both H and A_cog, whose only objective is to maximize the total insight gained by H over the course of the cooperative game [3]. Each IQ game is played until H achieves a given long-term objective: reaching a minimum threshold of total insight gained. Reaching this threshold defines a learning episode for A_cog. The game is 'solved' when optimal strategies (the shortest way to reach the threshold) are found for H and A_cog.
For A_cog to achieve its single-minded role, it needs to learn about human insights. It can learn this from the way we react during the IQ game. In this cooperative 2-person game, A_cog's decision policy is to suggest to H a probability-ranked list of questions, and H's decision policy is to select the most promising questions q_j ∈ Q to explore, and then signal how useful q_j was.
This will enable A_cog to compute μ_I simply by observing H. The measure μ_I(Q|G) combines the (objective) Shannon entropy S with a user-defined (subjective) von Neumann utility U. We call it the Shannon-Neumann, or SN-measure, of insight gains [3, 4].
The assistant A_cog's only objective is to help H gain a maximum total amount of insight μ_T(Q|G) over the pursuit of a long-term objective. This approach is consistent with Stuart Russell's thesis [5] for designing human-compatible AI, where the AI assistant cooperatively realigns with H's short- and long-term goals.
Insight gains in a questioning process

Measuring insight gains

Gains of insight are normally defined as changes in discernment (precision increase), understanding (error correction, uncertainty reduction), and intuition (familiarity boost) about a topic T. We want to introduce a reasonable computational measure of the gains in insight resulting from exploring a set of questions. Discernment, understanding, and intuition on a topic T are normally gained by posing a set of 'good questions' at the 'right times and places', leading to stories, metaphors, analogies, models and theories of T, with increasing accuracy, precision and relevance. In this paper we address how the AI agent learns what a 'good question' is (one that maximizes the insight gains). In the next paper [6], we will model the notion of the 'right times/places' using Cooperative Inverse Reinforcement Learning [7].
The IQ game is a precision-boosting, uncertainty-reducing process: we start from an approximate understanding of T, whose precision increases and whose error decreases under iterated questioning. We formalize the IQ game using the simplest model that still retains its essence: posing a suite of (yes/no) questions Q to achieve a goal G, which we describe next.

Framework for insight gains
We create a framework that enables us to separate the 'good questions' (more informative and insightful ones) from the 'bad questions' (less informative and insightful ones). The framework can be thought of as an elaborate form of the 20 (yes/no) question game.
A formal questioning framework F (our toy universe) is used to frame and measure insight gains towards achieving a goal G, when answering questions in Q. The measure is built by combining Shannon information S [8] with a von Neumann utility measure U associated with a set of questions Q: we call it the SN-measure. The elements of A_cog's toy cognitive universe F(T, G, Q) are:
• A topic T consisting of a bag of N balls, each with only two properties: a color and a size. The long-term goal is to fully determine T.
The set Q is a discrete information source [9], as the answer to each q_i produces a certain amount of information (and insight towards achieving G) by reducing the uncertainty, or ambiguity, about T. In this IQ game, our long-term goal is to fully determine T's composition. The short-term goals g_i ∈ G, to gain insights on T, are:
• to know the proportion of balls in T of each color;
• to know the proportion of balls in T of each size.
The insight-generating (yes/no) questions q_i ∈ Q on the topic T could be, for example, the following:
• q_2 = 'are the balls in T mostly B?'
• q_3 = 'are the balls in T mostly L?'
This simple framework allows us to express the necessary (and arguably sufficient) properties of a good measure μ_I(Q|G) of insight gains in mathematical terms.
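As a concrete illustration, the toy universe F(T, G, Q) above can be sketched in a few lines of Python. This is a hypothetical rendering: the second color "W" and the second size "S" are our own placeholder choices, since the text only names "B" and "L".

```python
import random

# Minimal sketch of the toy universe F(T, G, Q): the topic T is a bag of
# N balls, each with exactly two properties, a color and a size.
# The values "W" (second color) and "S" (second size) are hypothetical;
# the framework only names "B" and "L".
random.seed(0)
N = 100
T = [(random.choice(["B", "W"]), random.choice(["L", "S"])) for _ in range(N)]

# Two of the insight-generating (yes/no) questions on T:
def q2(topic):  # 'are the balls in T mostly B?'
    return sum(1 for color, _ in topic if color == "B") > len(topic) / 2

def q3(topic):  # 'are the balls in T mostly L?'
    return sum(1 for _, size in topic if size == "L") > len(topic) / 2

# Each 'yes'/'no' answer reduces the uncertainty about T's composition.
print("mostly B:", q2(T), "| mostly L:", q3(T))
```

Answering a sequence of such questions progressively narrows down the proportions of each color and size, which is exactly the pair of short-term goals g_i ∈ G above.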

Necessary properties of insight gains measures
We want to measure the insight μ_I gained by a 'yes' answer to each question q_i ∈ Q in the information source Q, with respect to each short-term goal g_j ∈ G. The required properties P_i ∈ P of a meaningful insight measure μ, and their justifications, are the following:
• P_1: μ_1 > μ_2 when a 'yes' answer to q_1 is more informative (reduces more uncertainty) than a 'yes' answer to q_2, and both are useful for achieving g_1.
• P_2: μ > 0 when a 'yes' answer to q_i is informative (about T) and useful for g_j.
• P_3: μ = 0 when a 'yes' answer to q_i is informative (about T), but q_i is useless for g_j.
• P_4: μ = 0 when a 'yes' answer to q_i does not produce relevant information.
The intuitive interpretation is that for a question to be insightful (produce new insight), it must be simultaneously informative (Shannon) about the topic T and useful (von Neumann) for achieving the goal G. In other words, we define insightful as both informative (uncertainty-, error- and ignorance-reducing) about a topic and actionable (useful) for achieving a goal. The information produced by a 'yes' answer, and the entropy of the source, are

S(q_i) = log2(1/p(q_i)) (bits),     (1)
S = Σ_i p_i log2(1/p_i) (bits).     (2)

Relations (1), (2) are fundamental definitions of entropy in information theory [11] and, previously, in statistical mechanics [12, 13]. In the context of our 2-person cooperative IQ game, p_i = p(q_i) is the probability that the answer to the (yes/no) question q_i about the topic T is 'yes'.
Note that the more uncertainty (ignorance) the answer to q_i resolves, the lower its probability p(q_i), and the more informative it is. The information units are bits when the log base is 2. So a good estimator for the probabilities p_i is the amount of uncertainty a 'yes' answer eliminates (just as 'good' questions in the 20-question game cut the possibilities by half; this is also true for the coin-weighing game to identify the single defective coin).
For example, if the random variable is the throw of a fair die, the answer to q_1 = 'is the outcome a 6?' is less probable (p(q_1) = 1/6) and more informative than the answer to q_2 = 'is the outcome an even number?' (p(q_2) = 1/2): S(q_1) ≡ log2(6) ≈ 2.58 bits > S(q_2) ≡ log2(2) = 1 bit, while the 100% certain answer to q_3 = 'is the outcome an integer?' adds no information: S(q_3) ≡ log2(1) = 0 bits. The Shannon entropy is consistent with some desirable properties of an insight-gains measure, but is missing a key ingredient: utility with respect to achieving the goal g, to which we now turn. The search for new insight is usually done in the context of a specific target goal.
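The fair-die example can be checked numerically with a short sketch, using only the definition S(q) = log2(1/p(q)); the helper name `info_bits` is ours:

```python
from math import log2

# Shannon information (in bits) of a 'yes' answer with probability p.
def info_bits(p):
    return log2(1 / p)

S1 = info_bits(1 / 6)  # q1: 'is the outcome a 6?'        -> log2(6) bits
S2 = info_bits(1 / 2)  # q2: 'is the outcome even?'       -> 1 bit
S3 = info_bits(1.0)    # q3: 'is the outcome an integer?' -> 0 bits (certain)

# The rarer the 'yes' answer, the more informative the question.
assert S1 > S2 > S3
print(round(S1, 3), S2, S3)  # 2.585 1.0 0.0
```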
Standard utility, as defined in the von Neumann-Morgenstern utility theorem [15], has long been a foundation of economic and (rational) decision-making, and has successfully described some aspects of human decisions, but by no means all [15]. For our cooperative 2-person IQ game, we use the simplest notion of utility which does the job (Occam's razor). The simplest utility function U(q_i, g_j) ∈ {0, 1} is assigned by H once the question q_i has been answered, since only H can tell how useful a question q_i is with respect to his/her short-term goal g_j. The cooperative H would need to express an observable sign for U (e.g. clapping or smiling for U = 1 util; not clapping or smiling for U = 0 utils), so that the agent A_cog can measure U(q_i, g_j) simply by observing H.
To satisfy all properties P_i ∈ P (i = 1, 2, 3, 4), we define the insight μ_I(q_i|g_j) gained from a 'yes' answer to a question q_i (information source), towards achieving a goal g_j, to be:

μ_I(q_i|g_j) = S(q_i) · U(q_i, g_j).     (3)

This SN-measure of insight gain satisfies all properties P_i ∈ P, and is a mixture of the Shannon information S produced and a personal subjective utility U (von Neumann), acting as an AI-Human bridge for gaining insight. The SN-measure's units can be taken to be bits·utils, and it can be normalized into a mathematical (monotone) measure on [0, 1]. It satisfies the definition of insight as a combination of discernment, understanding and intuition, which all increase under the error-correcting, uncertainty-reducing, and familiarity-boosting IQ game.
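A minimal sketch of how A_cog could compute the SN-measure, assuming the product form μ = S · U implied by the bits·utils units; the function name `sn_measure` is our own:

```python
from math import log2

# Sketch of the SN-measure: mu(q_i | g_j) = S(q_i) * U(q_i, g_j),
# in bits*utils. S is objective (Shannon information of a 'yes' answer
# with probability p_yes); U in {0, 1} is the utility signaled by H.
def sn_measure(p_yes, utility):
    if not 0 < p_yes <= 1:
        raise ValueError("p_yes must lie in (0, 1]")
    return log2(1 / p_yes) * utility

# Informative and useful (P_2):            mu > 0
assert sn_measure(0.25, 1) > 0
# Informative but useless for g_j (P_3):   mu = 0
assert sn_measure(0.25, 0) == 0
# No information produced, p = 1 (P_4):    mu = 0
assert sn_measure(1.0, 1) == 0
# Rarer 'yes', same utility (P_1):         larger mu
assert sn_measure(1 / 6, 1) > sn_measure(1 / 2, 1)
```

The assertions mirror the required properties of the measure: it vanishes whenever the answer is uninformative or useless, and grows with the uncertainty removed.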

Shannon-Neumann strategies
For the AI agent to learn, the SN-measure (3) demands an iterated cooperation between A_cog and the user H during the IQ game, in pursuit of the long-term goal of fully determining T. In this 2-person (A_cog, H) cooperation, each player agrees to follow a well-defined decision strategy (policy), Π_A and Π_H respectively.
Policy Π_A: given a short-term goal g_j ∈ G, A_cog suggests to H a Bayesian probability (evidence-based plausibility beliefs) distribution, as an ordinal-ranked list of questions Q, from most to least likely informative. A_cog can start with an estimated Bayesian prior probability distribution {p(q_i)}, using the relative amounts of uncertainty removed by 'yes' answers for an ordinal ranking of the p(q_i). This means A_cog's initial policy Π_A is approximate. The 2-person iterated cooperative game should be designed so that Π_A → Π_A*, the optimal policy. In our next paper, we show that a Shannon-Neumann strategy (based on the SN-measure) allows us to reduce a hard 2-person game of Cooperative Inverse Reinforcement Learning to simpler Q-learning, proven to converge to a near-optimum [17].
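The initial ranking step of Π_A can be sketched as follows. The dictionary of prior 'yes' probabilities is a hypothetical stand-in for A_cog's Bayesian prior, reusing the fair-die questions from above:

```python
from math import log2

# Policy Pi_A (initial, approximate): rank questions from most to least
# likely informative, by the information a 'yes' answer would produce,
# S(q) = log2(1/p(q)), under A_cog's current prior p(q).
def rank_questions(prior):
    return sorted(prior, key=lambda q: log2(1 / prior[q]), reverse=True)

prior = {  # hypothetical Bayesian prior over 'yes' probabilities
    "is the outcome a 6?": 1 / 6,
    "is the outcome even?": 1 / 2,
    "is the outcome an integer?": 1.0,
}
ranking = rank_questions(prior)
print(ranking[0])  # the rarest 'yes' is suggested first
```

H then works through this list under Π_H, signaling U(q_i, g_j) for each explored question, which is what lets A_cog refine the prior on later iterations.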
Policy Π_H: given a short-term goal g_j ∈ G, H goes through A_cog's proposed probability-ranked list of questions q_i ∈ Q, and expresses (signals) their utility U(q_i, g_j). The IQ game enables A_cog to refine its own understanding of human insight gains in an iterated manner. Recall that A_cog's only purpose in life (objective) is to maximize H's total gain of insights μ_T(Q|G) over the course of the IQ game. Using the SN-policy reduces the computationally hard 2-person cooperative game to a proven Q-learning of a near-optimal policy Π_A*.
We now ask an important question: is there an optimal way to structure the information source Q for the short-term goals G? The answer is 'yes', thanks to Shannon's notion of source entropy.
The Shannon entropy S(Q) of the whole information source Q is the expected amount E of information from the source Q:

S(Q) = E[S(q_i)] = Σ_i p_i log2(1/p_i).     (4)

In agreement with Shannon's theory, the probabilities are set by the information source, in our case A_cog, which structures Q. The AI agent A_cog can construct Q so as to interpret the questions q_i as independent hypotheses to be tested, and the p(q_i) as the probabilities of outcomes of independent experiments (e.g., repeated coin tosses), or as the probabilities of answers to independent questions (e.g., the 20 (yes/no) question game).
In this latter interpretation, S(Q) is the expected Shannon entropy (mean information gained per question) from the source Q, which we can maximize by setting an equilibrium distribution:

p(q_i) = 1/n for all q_i ∈ Q,     (5)

i.e., all (yes/no) questions q_i ∈ Q have an equally probable 'yes' answer, and the expected information gain S(Q) is maximal:

S(Q) = n (1/n) log2(n) = log2(n) bits per question.     (6)

To optimize the effectiveness and efficiency of the IQ game, the agent A_cog should aim to structure Q as close as possible to an equilibrium distribution, to maximize the expected information (and thus insight) gains by H during the course of the entire IQ episode. Note also that S(Q) increases with the number n of questions in Q.
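The claim that the equilibrium (uniform) distribution maximizes S(Q), reaching log2(n) bits per question, can be verified numerically; the skewed comparison distribution is an arbitrary example of ours:

```python
from math import log2

# Expected information of the source Q: S(Q) = sum_i p_i * log2(1/p_i).
def source_entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

n = 8
uniform = [1 / n] * n          # equilibrium distribution, eq. (5)
skewed = [0.65] + [0.05] * 7   # same n, unequal 'yes' probabilities

# At equilibrium S(Q) attains its maximum, log2(n) bits per question,
# and the maximum grows with the number n of questions in Q.
assert abs(source_entropy(uniform) - log2(n)) < 1e-9
assert source_entropy(skewed) < source_entropy(uniform)
print(source_entropy(uniform))  # 3.0 bits for n = 8
```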

Conclusion
In complex real-world challenges, conceptually sound solutions are based on a solid foundation of key conceptual insights, gained by posing 'good' questions at the 'right' times and places. If the foundation is weak due to insufficient insight (often strong from a strictly financial and/or technological standpoint, but weak on the human dimensions), the resulting flawed solutions can be very costly or impossible to correct downstream. The outcomes in countries with 'just-in-time' supply chains and fragmented health-care systems during the 2020 global pandemic come to mind. Thus AI tools which boost human insight are invaluable to help us build sound solutions to complex open-ended challenges: designing, implementing, optimizing, validating and verifying complex human-technological systems and networks.
We defined a computational measure of insight gains, which an AI agent can compute simply by having an internal questioning framework F(T, G, Q) and by observing how a human behaves. The measure requires an information source Q that is simultaneously informative S (in Shannon bits) about a topic T, and actionable U (useful, in von Neumann utils) for achieving a given goal G. This measure satisfies four necessary properties P_i ∈ P, as well as the intuitive definition of new insights as gains in discernment, understanding and intuition. In fact, we argue, along with many others before us [18-22], that H should always remain heavily in the loop, no matter how potent AI/ML becomes, for two main reasons: (1) it may eventually be the only work left for us to do; (2) no advanced non-human entity (e.g. an Artificial General Intelligence, AGI) can do it as well for our own benefit (just as we are incapable of doing it for, say, Bonobos and Orcas, no matter how much smarter we think we are). It takes a human to think and feel like one! This perspective makes our Shannon-Neumann measure μ_I(Q|G) most valuable as an AI-Human bridge. In this paper, our SN-measure addressed the question of 'good vs bad' questions. In paper II, we address the difficulty of determining the 'right' times/places, by formalizing the 2-person IQ game as cooperative inverse reinforcement learning.

Data availability statement
No new data were created or analysed in this study.