T-patterns, external memory and mass-societies in proteins and humans: In an eye-blink the naked ape became a string-controlled citizen

This project started in the 1970s, inspired by biological behavior research (Morris, 1967; Tinbergen, 1965), including that of N. Tinbergen, K. Lorenz and K. von Frisch, for which they shared the 1973 Nobel Prize in Physiology or Medicine, the first awarded for ethological research. Further inspiration came from research on social insects and children (Montagner, 1971, 2012), interactions in human adults (Duncan & Fiske, 1977), probabilistic real-time analysis of behavior (Skinner, 1969) and linguistic analysis (Chomsky, 1957). There was not yet talk of self-similarity or nano-scale agents. Adequate computational pattern discovery required models, algorithms and software. This led to the definition of the scale-independent T-pattern and the related pattern types making up the T-system, and to the creation of the only available dedicated T-pattern and T-system detection algorithms and software, THEME™. Theme has already allowed the detection of T-patterns in many different research areas, from human to neuronal interactions, at time scales from days to 10⁻⁶ s, and finally of spatial T-patterns in textual and molecular strings, T-strings. Temporal and spatial patterning thus seems to be similar from human and neuronal interactions to giant, purely informational physical strings, DNA and texts. There also seems to be a biologically extremely recent self-similarity between each human mass-society and the protein mass-societies of the cells making up each of its typically >10⁴ individuals. Giant T-patterned text strings, T-strings, serving as external memory have in a biological eye-blink allowed the development of modern mass-societies with their science and technology. These in turn allowed the discovery of this biologically sudden advent of unique self-similarity, and thus of a bio-mathematical continuum between nano and human scales, which may change views on modern human mass-societies and their modern lifestyle and issues.


Introduction
Those born in the late 1940's came into a world where some of the most advanced societies had in less than half a century killed countless millions of each other's members in two world wars and on all sides priests had pushed them much as in previous centuries of religious wars. Making sense of such a world naturally preoccupied the minds of many teenagers in the 1960's with still more wars.
Some looked to the biology of primates and of humans, who in a biological eye-blink went from small hunter-gatherer groups and villages to highly structured societies of hundreds of millions of specialized individuals, very different from flocks, schools and other such animal aggregates. For millions of years, structured mass-societies of more than, for example, 10⁴ individuals had existed only among social insects, which also sacrificed and killed large numbers of each other's individuals in their wars. What else was common to these only kinds of animals with mass-societies that could cast new light on our own species and possibly suggest, for example, some underlying mathematical, physical and/or biological principles? External memory seemed promising, that is, relatively specific and durable modifications […] papers on Temporal Configuration Analysis, precursors to T-Pattern Analysis (TPA) and the Theme software [23,24]. This initiated the present project and led to the definition of the T-pattern and its extensions, initially for the detection of hidden temporal patterns in behavior, but increasingly also focused on physical T-patterned strings, called T-strings.

Collaboration
This project has involved close collaborations with leading researchers in the US, mostly at The University of Chicago, the University of Arizona and the University of California, Irvine, and in Europe, mostly at the Museum of Mankind of the French National Museum of Natural History, Paris; the University of Paris V, VIII and XIII; the University of Barcelona and the University of Cambridge. Since 1995 the work has also been carried out within a formalized inter-university collaboration between 32 European and American universities, with bi-annual workshops/conferences, numerous papers and two edited books dedicated to the growing T-system and its applications ([1,3,33] and [4]).

Mathematics and T-patterns
Most mathematicians define mathematics as the "science of patterns" [8], suggesting an infinitely broad pattern concept. The T-patterns and T-strings are therefore a tiny special-purpose subset, initially proposed for the analysis of behavior and some other biological phenomena [29].

T-patterns and interactions
The hierarchical self-similar T-pattern is characterized by the order of its components and the special relative distances between them, defined as critical intervals (see below). It has allowed the description and detection of intra- and inter-individual causal and non-causal patterns, which frequently share the T-pattern structure, but the detection of intra-individual patterns may be a precondition for the detection of more complex inter-individual patterns. For example, verbal interactions normally do not rely on exchanging single phonemes or syllables, but rather patterns of these. So, if the coding categories do not reach that level, the higher-order patterns must first be detected.

Interdisciplinary feedback
The 20th century saw a more than thousand-fold increase in mathematical knowledge, including the addition of many new fields ([8], p. 3). A comparable increase continues in many other areas, not least in biology and related sciences, making it practically impossible to keep up, especially in interdisciplinary research. Presenting evolving research to groups of experts in related areas has thus provided valuable feedback: during the last few years the author has accepted numerous (mostly keynote) invitations to international conferences in genetics, proteomics, mass spectrometry, A.I., applied mathematics, nanoscience, science of religion, neuroscience, psychology and the biology of behavior (see, for example, [37–39]).

External giant memory strings
Attention to external memory strings has been amplified by new insight into the nano-scale intra-cellular world. Giant molecular T-strings (DNA) carry prescriptions for many kinds of specialized proteins (workers) forming highly organized "societies". Striking self-similarity thus appears with modern human mass-societies: billions of years ago, the RNA world invented purely informational strings, DNA, and now there is only the DNA world. Then, similarly, just a biological eye-blink ago, humans invented their purely informational strings, text, and suddenly their world is largely based on text.

Insight versus explanations
Causal explanations from nano to human scales will not be attempted here, but such self-similarity across so many levels and orders of size and time suggests something biologically essential. E. coli (about 10⁻⁶ m, see [45], p. xxxviii), with its mass-society of millions of different cooperating proteins, had existed for billions (10⁹) of years before the sudden advent of human mass-societies of, for example, >10⁴ specialized individuals, easily extending far beyond 10³ m.

Ethology and species specific behavior in nature
A central theme in Ethology, the biology of behavior, is respecting the special characteristics of each species, with a preference for their study under natural conditions. But having initially focused on animal behavior, it was not well prepared for the study of humans in modern mass-societies and megacities. Nor was it prepared for analyzing human language and its written form, text, a powerful and uniquely human kind of external memory that appeared in a biological eye-blink and without which modern human behavior cannot be understood. Ethology was even less prepared for self-similarity, that is, an entity being similar to a part of itself, or for its more rigorous fractal mathematical form [42], spanning numerous scales even throughout the increasingly known universe [2,19,55].
Research and findings in Ethology [59,60] were the initial inspiration for this project, especially the ethological work of Niko Tinbergen, for which he shared the 1973 Nobel Prize in Physiology or Medicine with Konrad Lorenz and Karl von Frisch. Lorenz and Tinbergen were honored for their studies of the behavior of birds, fish and humans, and von Frisch for his studies of the smallest, the highly social bees. At about this time there was also much new research on primate social behavior (for example, [9,18,48]), and E. O. Wilson's opus, Sociobiology [61], focused on insect mass-societies, stimulating interest in similarities with human societies.

Ethology, self-similarity and mass-societies
Mass-society here refers to structured animal groups of approximately >10 4 individuals of many types, found among animals only in social insects and modern humans. This thus excludes less structured animal groupings such as swarms, schools, flocks and herds.
The tiny insects were generally the smallest creatures of interest within ethology and none of the studied creatures were parts of the others. Konrad Lorenz's Nobel lecture, entitled "Analogy as a source of knowledge", did not mention self-similarity (self-analogy). There was much interest in discovering behavioral sequences, but no mention yet of fractals. With the intra-cellular nano-scale world mostly out of reach, the activities of proteins, including motor proteins, as citizens (members) of protein mass-societies, sometimes called "Cell Cities", were, as for billions of years, waiting to be discovered by their very recent descendants: human mass-societies. Now, thanks to recent technological progress and research efforts, striking self-similarities seem to appear between behavioral and social structures in protein and human mass-societies, many orders of magnitude apart in size and duration, suggesting a new bio-mathematical model that throws light on the sudden appearance of human mass-societies with their advanced science and technology. Humans have suddenly jumped from an illiterate nomadic hunter-gatherer lifestyle to that of millions and even billions of text-enabled and text-controlled highly specialized citizens, some even heading towards other planets. With more people added since the year 2000 than all of humanity in the year 1900, billions soon to be added and signs of serious trouble ahead, it seems essential to reach the best possible understanding of human behavior, and especially mass-social behavior.
Beginning in the early 1970s, this primarily ethological and methodological project was more specifically influenced by various ethological and human interaction research [6,7,10,46,47] and linguistics (for example, [5]), but also radical behaviorism (for example, [56]), all in different ways focusing on recurrent hierarchical and syntactically constrained temporal sequences, patterns or contingencies. That is, non-random probabilistic recurrent synchronic and/or sequential temporal patterns of behaviors that often were themselves such patterns. For example, common phrases are often composed of common combinations of words, which are combinations of syllables, themselves constrained temporal patterns of phonemes, or spatial patterns of letters in the case of written language (text). Some patterns occur as parts of interactive verbal and/or nonverbal behavior, where individuals interact with characteristic constraints on order and relative timing, much as between the elements of melodies or everyday routines, rituals and ceremonies.
"Behavior consists of patterns in time. Investigations of behavior deal with sequences that, in contrast to bodily characteristics, are not always visible." (Opening words of Eibl-Eibesfeldt's Ethology: The Biology of Behavior, [12], p. 1; emphasis added.) When phenomena are not directly discernible, humans have developed artificial means of detecting them. In continuation of theoretical and methodological studies ([21] & [22]), what has become the T-system and TPA was initially developed for the search for hidden or non-obvious human and animal interaction patterns [23,24]. But when far more structure than expected was discovered, interest soon turned towards possible broader applicability of the model and detection methods, eventually including the analysis of neuronal interactions [49,50] and finally also spatial T-patterning in bio-physical strings, that is, molecules and text.
Since the late seventies this has required the gradual development of detection algorithms corresponding to new pattern types and the design and programming of the detection software, THEME™, which in 1980 (before the PC) was a 3000-line Fortran IV program running on PDP-8 and PDP-11 computers [23,24], but is now a 300,000-line Windows program that has recently begun using parallel processing on multi-core processors [26–28,30,31,34–36,40].

The T-system
Starting with the T-pattern and its univariate version, the T-burst, other structural aspects have been added to the T-system, including T-Markers, T-Predictors, T-Retrodictors and T-packets with +/-T-Associates, as well as T-Composition, described elsewhere [30,31,33,35,39,40]. A relation between T-patterns and cyclical patterns has been considered [25]. The latest addition to the T-system is the T-string, a physical (spatial) string of elements where some form T-patterns on a single discrete dimension of length (space), analogous to behavioral elements forming T-patterns in time.
It is hoped that this system of concepts, with corresponding detection algorithms and software, may serve as a unified bio-mathematical framework for the discovery and description of analogous structures across levels of organization and orders of magnitude in time and space.
TPA has now been applied in a number of research areas involving very different time scales, from 10⁻⁶ s in neuronal interactions [49,50] to days and even years [1,3,4,30,31,33–35,40]. The latest addition to the T-system, the T-string, a physical T-patterned string, is of essential importance in biological cells and modern human mass-societies (see below). The first T-pattern detection in textual T-strings has appeared [40].
Giant T-strings (DNA vs. text) external to the individuals (proteins or humans) are of essential importance in biological cells and modern human mass-societies allowing the addition and modification of definitions of new specialized individuals, but also of other information, without any general increase in the individuals' information storing or processing power.
As recent publications about the T-system and TPA are available, including some freely available on the internet [35,40], care is taken here to avoid unnecessary overlap by focusing mostly on the more recent aspects, especially T-strings and the self-similarity suddenly reached between nano and human scales with the advent, in a biological eye-blink, of literate human mass-societies.

The T-pattern: Towards a scale & content independent pattern type
Some everyday observations illustrate the kind of hierarchical and order and time constrained patterns of patterns of patterns etc., which led to the definition of the T-pattern type.
The following examples thus have aspects in common: words or standard phrases in text or speech. For example: How do you do? (verbal, single speaker). How are you? Fine, thank you (verbal, interactive). Pass me the salt, Jack. Jack passes the salt (verbal, nonverbal & causal). Variously filled-in routines such as "If…then…else" (verbal) and the typical dinner: "sit down…take an entrée…take a main course…take dessert…drink coffee…stand up" (nonverbal); where "…" stands for variously filled intervals of characteristic lengths, as in numerous other time and/or space constrained everyday patterns.
Note that melodies are typically a sequence of notes with characteristic approximate durations and distances between them. Played too fast or too slowly they disappear, becoming just chords or disconnected notes. The same goes for all the above, which have significant constraints on the durations of their components and the lengths of the intervals between them. In other words, what matters is the similarity of the consecutive intervals over distinct occurrences of the "same" pattern, here a crucial defining aspect of their sameness and recurrence. The dinner pattern, for example, is thus here seen as hierarchical and self-similar, that is, with the same relation at all levels, and recurring with some elasticity, but with statistically significant translational symmetry in time; recurring, elastic, hierarchical and self-similar on a single dimension.
The more formal T-pattern definition attempts to integrate such aspects of known recurrent behavioral patterns assuming they are shared by unknown ones and thus allowing their computational (artificial) detection.
This initially led to algorithmic definition and detection of "temporal configurations" (now called T-patterns) as a kind of "artificial categories" and results of their computational detection using specially developed algorithms implemented in the earliest Theme version were first presented in Artificial Intelligence [23] and Applied Statistics [24]. The aim remains to obtain new objective, quantitative and structural bio-mathematical insights through the formulation and evaluation of hypotheses in terms of mathematical pattern types such as now make up the T-system.

T-Data
All T-system definitions and detection algorithms refer to the same type of data, called T-data, which has been collected using among other multi-media coders, chips inserted in brain tissue or transformation of text or molecular data (DNA or proteins) easily available on the internet.
T-data (Fig. 2) consists of ≥ 1 sets (samples) of ≥ 1 discrete (occurrence) point series, each sample occurring within a continuous observation period on a discrete scale [t₁, t₂]ᵢ. Each sample is stored in a two-column tab-delimited (time tab event) single-sample .txt file, which is the required input format for all Theme processing. Optionally, samples can be concatenated within Theme and analyzed together. The baseline probabilities used for detection are calculated for each sample or for all concatenated samples. The setting of a search parameter such as the minimum number of occurrences must take into account that a pattern rarely repeated in any single sample may occur often in multi-sample data. Visualization of the raw T-data is provided in Theme for an overview of the data and to help identify coding errors (Fig. 2). Note that a physical string, textual or molecular, of length T and with k different mutually exclusive elements is T-data with k series within [1, T]. It thus only differs from typical temporal T-data in that parallel (concurrent) occurrences do not occur. See also the Theme User Manual [41].
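The mapping from a physical string to T-data described above can be illustrated with a minimal Python sketch. The function names `string_to_tdata` and `tdata_to_lines` are hypothetical, and only the "time tab event" row layout is taken from the text; any further details of the Theme input file (headers, encoding, observation period markers) are not specified here and are not modeled.

```python
from collections import defaultdict

def string_to_tdata(s):
    """Map a string of length T to T-data: one occurrence series per distinct symbol.

    Positions are 1-based so that every series lies within [1, T].
    """
    series = defaultdict(list)
    for pos, symbol in enumerate(s, start=1):
        series[symbol].append(pos)
    return dict(series), len(s)

def tdata_to_lines(series):
    """Render T-data as 'time<TAB>event' rows sorted by time (assumed row layout)."""
    rows = sorted((t, ev) for ev, times in series.items() for t in times)
    return [f"{t}\t{ev}" for t, ev in rows]

series, T = string_to_tdata("abcab")
print(T, series)
print(tdata_to_lines(series))
```

Because every position of a string holds exactly one element, no two series share a time point, which is the only difference from typical temporal T-data noted above.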

T-pattern definition
The initial algorithmic definition of the T-pattern has been made independent of the detection algorithms used.
A T-pattern, $Q$, is $m$ ordered components, $X_{1..m}$, recurring on a single discrete dimension within $[1, T]$, where each component is a T-data category (called primitive or event-type) or a T-pattern:

$$Q = X_1 \approx dt_1\; X_2 \approx dt_2\; \ldots\; X_i \approx dt_i\; X_{i+1}\; \ldots\; X_{m-1} \approx dt_{m-1}\; X_m$$

where the $\approx dt$ ($0 \le dt$) terms stand for the approximate characteristic distances between the consecutive components $X$ of the pattern when it occurs within T-data, each significantly invariant relative to a zero hypothesis of independent random distribution of each component with constant baseline probability per unit time given by its number of occurrences divided by $T$. In terms of the variation intervals of the $\approx dt$ terms, the definition comes closer to the currently used detection algorithms:

$$Q = X_1\,[d_1, d_2]_1\; X_2\,[d_1, d_2]_2\; \ldots\; X_i\,[d_1, d_2]_i\; X_{i+1}\; \ldots\; X_{m-1}\,[d_1, d_2]_{m-1}\; X_m$$

The current binary-tree bottom-up search algorithm of the Theme software relies on finding in T-data at least one pair of series related by such intervals, called critical intervals, and then adding its occurrence series to T-data, thus including it in the continued search for more pairs and possibly pairs of pairs, etc.

T-pattern detection algorithm
Restricting the T-pattern definition above for detection purposes, any T-pattern $Q = X_1 X_2 \ldots X_m$ can be split top-down recursively into a pair of shorter ones related by a corresponding critical interval, CI:

$$Q = Q_{Left}\,[d_1, d_2]\,Q_{Right}$$

Recursively, $Q_{Left}$ and $Q_{Right}$ can each be split until the full T-pattern is expressed as the terminals $X_1 \ldots X_m$ of a binary tree of non-terminal critical interval relations.
Detection works in the opposite direction of the splitting above, that is, bottom-up, beginning with the series in T-data and using special algorithms for critical interval detection and pattern construction, and a pattern evolution algorithm that avoids redundant detections of the same underlying patterns.
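The recursive split into a left and a right part related by a critical interval can be pictured as a small binary-tree data structure. The following Python sketch is illustrative only: the class name `Node`, the event-type labels and the interval values are hypothetical and are not taken from Theme.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Node:
    """Non-terminal: a left and a right sub-pattern related by a critical interval."""
    left: "Pattern"
    right: "Pattern"
    ci: Tuple[int, int]  # [d1, d2], hypothetical values in this example

# A pattern is either a terminal event-type (a string) or a Node.
Pattern = Union[str, Node]

def terminals(q):
    """Read the terminals X_1..X_m of the tree from left to right."""
    if isinstance(q, str):
        return [q]
    return terminals(q.left) + terminals(q.right)

def levels(q):
    """Number of critical-interval levels above the terminals."""
    if isinstance(q, str):
        return 0
    return 1 + max(levels(q.left), levels(q.right))

# ((A B) (C D)) with made-up critical intervals at each non-terminal.
q = Node(Node("A", "B", (1, 3)), Node("C", "D", (2, 2)), (4, 9))
print(terminals(q), levels(q))
```

Reading the terminals left to right recovers the ordered components of the pattern, while each non-terminal holds exactly one critical interval relation.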
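As a rough illustration of the critical interval idea, the sketch below tests one given candidate interval [d1, d2] between two occurrence series, using the zero hypothesis stated in the definition (constant baseline probability equal to the number of occurrences divided by T). It is a simplified sketch under stated assumptions, not the Theme implementation: the function names are hypothetical, the candidate interval is supplied by the caller rather than searched for, and the pattern construction and evolution steps are omitted.

```python
from bisect import bisect_left
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def ci_test(a_times, b_times, d1, d2, T):
    """Count how often B follows A within [d1, d2] and compare with chance.

    Returns (hits, p_value), where hits is the number of A occurrences
    followed by at least one B at a distance between d1 and d2.
    """
    b_sorted = sorted(b_times)
    p_b = len(b_sorted) / T                     # baseline probability per unit
    p_window = 1 - (1 - p_b) ** (d2 - d1 + 1)   # at least one B in the window
    hits = 0
    for t in a_times:
        i = bisect_left(b_sorted, t + d1)       # first B at or after t + d1
        if i < len(b_sorted) and b_sorted[i] <= t + d2:
            hits += 1
    return hits, binom_tail(hits, len(a_times), p_window)

# Made-up series: B follows every A within 1 to 3 units.
a = [10, 110, 210, 310, 410]
b = [12, 113, 212, 311, 413]
hits, p = ci_test(a, b, 1, 3, 500)
print(hits, p < 0.005)
```

In a bottom-up search, the occurrence series of a pair passing such a test would be added to the data and treated like any other series in the next round.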

Statistical validation
When numerous significance tests are calculated, many may be positive even when the data is random so it is necessary to evaluate to what extent this explains the detection of T-patterns in a dataset for given search parameter values. Two methods are provided in Theme each using a different type of randomization, T-shuffling or T-rotation.
Under T-shuffling, each of the series in T-data is replaced with a series of random numbers within the observation interval, [1, T]. Under T-rotation, each series, t_i, is shifted by a new random value dt, with 0 < dt < T, so t_i = ((t_i + dt) mod T). Each method repeatedly randomizes the data, searches for T-patterns and stores the number of different patterns of each length found. Finally, the averages over all the randomizations are calculated and compared with the number detected in the original data. The number of standard deviations found for each pattern length is usually far greater than required for significance (for example, 10-1000 standard deviations).
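The two randomizations can be sketched in a few lines of Python. This is a minimal sketch, not Theme's code, and it rests on three assumptions that go beyond the text: shuffled points within a series are drawn without repetition, rotation wraps positions back into [1, T] rather than [0, T-1], and the deviation is expressed with the sample standard deviation of the counts from the randomized runs. All function names are hypothetical.

```python
import random
from statistics import mean, stdev

def t_shuffle(series, T, rng):
    """Replace each series by equally many random points in [1, T] (assumed distinct)."""
    return {ev: sorted(rng.sample(range(1, T + 1), len(times)))
            for ev, times in series.items()}

def t_rotate(series, T, rng):
    """Shift each series by its own random dt, 0 < dt < T, wrapping around T."""
    rotated = {}
    for ev, times in series.items():
        dt = rng.randint(1, T - 1)
        # 1-based wrap-around (assumption): positions stay within [1, T].
        rotated[ev] = sorted((t - 1 + dt) % T + 1 for t in times)
    return rotated

def deviation_in_sd(observed, random_counts):
    """Distance of the observed pattern count from the random mean, in SD units."""
    return (observed - mean(random_counts)) / stdev(random_counts)

rng = random.Random(0)
series = {"a": [1, 4, 9], "b": [2, 5]}
print(t_shuffle(series, 10, rng))
print(t_rotate(series, 10, rng))
```

Rotation keeps the internal structure of each series intact and only breaks the relations between series, whereas shuffling destroys both, so the two tests answer slightly different questions.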

[1,1] restricted T-patterns
The [1,1] restricted T-pattern type has the fixed critical interval [1,1] and can be used for TPA of text or molecular sequences and other strictly sequential (i.e. non-parallel) data, searching for patterns in T-strings, such as words in texts (see [40] and below).

Children's dyadic interaction
Most directly inspired by the results of Hubert Montagner et al.'s analysis of children's interactions [47] and Starkey Duncan et al.'s analysis of turn taking in adults [10], the present approach was initially developed for the detection of hidden interaction and turn-taking sequences in children's dyadic object play, with special attention to behavior preceding the transfer of a toy from one to the other. Montagner et al.'s extensive filming of unconstrained interactions in kindergartens, followed by frame-by-frame probabilistic functional analyses, had led, among other things, to the identification of the head tilt as an important solicitation (begging) and soothing behavior. In ethological studies, important behavior discovered in natural situations is often studied in more controlled laboratory settings, and such a study was thus initiated [23,24,28,40].
Pairs of five-year-old children and one picture viewer were brought into a room that was split by a thin separation wall with a 1.5 m long transparent section with space under it, through which the viewer and picture cards could be easily transferred. The children were instructed to take turns viewing pictures in the viewer and to continue as long as they wanted. Video recording was made with two opposite static cameras, one facing each child. Coding was done using mostly an existing ethological category list [44]. Even though the approximately 13.5 min dyadic interaction (Fig. 2) has over decades been coded repeatedly and then searched for T-patterns, something new continues to be noticed, as no single T-pattern captures all that is happening while some add new insight. The Monte Carlo results for this data, characteristic of TPA, are shown in Fig. 4.

T-patterns in brains and self-similarity
The brain itself is a self-similar structure: "…a nested hierarchy – smaller elements join together to form larger elements, which, in turn, form even larger elements, and so forth … many of the integrative aspects of brain function depend on this multiscale structural arrangement of elements and connections." (Discovering the Human Connectome, Olaf Sporns [57], p. 41.) As it seemed possible that interactions between neurons in brains could be patterned like interactions among other organisms, TPA was made of data simultaneously registered with chips in rat brains of the moments of firing of each of >40 neurons. Numerous complex and highly significant T-patterns were detected in each of the hundreds of 12-breathing-cycles long samples and led, among other things, to the hypothesis that T-patterns are involved in memory storage [49]. In addition to being highly statistically significant, the existence of complex T-patterns in neuronal brain networks gained external support by coincidence, as one of the subjects became unstable and eventually had to be terminated. TPA of the firing during its deterioration showed that while the amount of firing remained almost constant, the number and complexity (length) of T-patterns gradually diminished, from numerous, complex and highly significant to just a few simple ones, with Monte Carlo results practically at random expectation [50]. Considering only patterns including inhalation, exhalation and neuronal firing, 30 such patterns were found in the first sample (see Fig. 5), but none in the
Temporal T-patterned self-similarity thus seems to exist from interaction in humans and animals to neuronal interactions within their brains.

Spatial T-patterns on physical strings: T-strings
T-patterned T-string self-similarity in form and function exists across some nine orders of magnitude: from inert, purely informational molecules (DNA) and derived active molecules (proteins), arising in the RNA world, to the biologically sudden rise and influence of text, found in the animal world only among humans. This has motivated the first TPA of both types of strings, with first results supporting the view of both as T-strings, which in modern humans influence practically all behavior and have transformed the naked apes into string-enabled and string-controlled mass-social citizens, string-apes. Concerning the orders of magnitude: E. coli, for example, is approximately 10⁻⁶ m (see [45], p. 3, Fig. 1-1), while cities easily exceed 10⁴ m, a difference of >10 orders of magnitude (https://en.wikipedia.org/wiki/List_of_largest_cities_throughout_history).
This view of the recent mass-social context of human interaction has only recently become possible thanks to new technology and discoveries in, among other fields, cell biology, including genetics and proteomics.

Text as a T-string
Texts have long seemed obvious T-strings, that is, containing words as ([1, 1] restricted) T-patterns of letters, but to test this, TPA has been made of 50 different texts of lengths varying from about 150 to 10,000 words. Each text was analyzed independently. The only preprocessing was putting the whole text (string) into lowercase and removing all characters other than letters. The probability of a letter at any position is estimated as its number of occurrences divided by the number of positions (letters), that is, the length of the text letter string. The a priori probability of occurrence of any string is calculated as the product of the probabilities of its letters (as if the text had been randomized). In line with TPA practices, the significance of a word as a T-pattern is calculated using the usual binomial test, with the number of occurrences of its first letter as the number of trials, the number of word occurrences as the successes and the probability of the rest of the word as the probability of success. As is the default in TPA with Theme, the minimum number of occurrences is set to three and the level of significance to 0.005.
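The word test just described can be sketched as follows. This is a minimal sketch under stated assumptions, not the Theme procedure: "letters" are taken to be the ASCII letters a-z, word occurrences are counted as (possibly overlapping) substring occurrences in the letter-only string, and the test is one-sided (at least as many occurrences as observed). The function names are hypothetical.

```python
import re
from collections import Counter
from math import comb, prod

def letters_only(text):
    """Lowercase the text and drop everything except a-z (assumed letter set)."""
    return re.sub(r"[^a-z]", "", text.lower())

def count_occurrences(s, word):
    """Count occurrences of word in s, allowing overlaps."""
    count, start = 0, s.find(word)
    while start != -1:
        count += 1
        start = s.find(word, start + 1)
    return count

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def word_p_value(s, word):
    """Binomial test of a word as a [1,1] restricted T-pattern of letters.

    Trials: occurrences of the first letter. Successes: occurrences of the word.
    Success probability: product of the letter probabilities of the rest of the word.
    """
    freq, n = Counter(s), len(s)
    successes = count_occurrences(s, word)
    p_rest = prod(freq[c] / n for c in word[1:])
    return successes, binom_tail(successes, freq[word[0]], p_rest)

s = letters_only("The cat, the dog; the bird!")
occurrences, p = word_p_value(s, "the")
print(s, occurrences, occurrences >= 3 and p < 0.005)
```

With the defaults mentioned above, a word would be accepted as a T-pattern when it occurs at least three times and its p-value is below 0.005.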
Results. Approximately 90% of the words in every file were T-patterns and, as is so often seen in TPA, difference from random was little or none for the shortest patterns, but > 90% of words of length ≥ 4 were significant.

DNA as a T-string
The structure of the giant DNA molecules [15] and that of text have striking analogies ([54], p. 8). In DNA, bases form exons and introns, forming genes and genomes. Genes are thus patterns of exons (often with introns) separated by approximately fixed non-coding base sequences (see, for example, [14], Figs 2-6). The exons themselves are composed of smaller base patterns, see Fig. 1.
"Genes, we know, are long stretches of DNA code. Each is built up of smaller modules, like a mosaic. We don't know exactly how many such modules there are, but it looks as though there may be as few as a thousand or two. So these basic modules must be shared by a large number of genes." ([52], p. 103.)
Each gene is copied into structurally nearly identical RNA strings (any introns removed), which then enter ribosomes to form T-strings of some 20 different elements (amino acid residues), that is, proteins with special behavioral potentials and tendencies, members (citizens) of the cell's protein mass-society. For each gene a number of RNA copies and proteins exist at the same time, so the size of the protein population greatly exceeds the number of genes in the cell. For example, the protein population size is estimated at 3 × 10⁶ for E. coli, with 4000-5500 genes, and at 100 × 10⁶ for budding yeast, with some 6000 genes ([45], p. 105). These population sizes are in line with modern human populations. Given the great number of identical protein copies, there can be no doubt about their non-random T-string composition. In humans this recalls the number of, for example, similar curricula or university studies, each resulting in many similar specialists.
For each gene a number of protein strings may thus be working at the same time in the cell, recalling the many detected concurrent T-patterns probably interacting in a brain ([49], 2014).

Known patterns in proteins as T-patterns
Proteins themselves reflect recurrent gene components. Using the UniProtKB database (https://www.uniprot.org/help/uniprotkb), all 1176 known patterns stated as regular expressions [13] and found in at least one of 231,419 protein strings (amino acid residues) were downloaded and tested. In 99.7% (230,690) of the proteins the known patterns were found to be significant T-patterns (p < 0.005; the average p for these was < 0.0001).

Searching for T-patterns in proteins
Many known T-patterns in proteins cannot be detected with the current (to be extended) binary tree TPA algorithm as they involve alternatives at each terminal position [40]. For example, a known pattern may be, ([AQG] [2,5] [DKRS]), [13] such that at each of its occurrences, any one of the letters in the first brackets is followed within the next 2 to 5 positions by any one of the letters in the latter brackets. However, the following example shows that with the current algorithm, TPA of proteins has detected complex T-patterns.
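To make the example pattern concrete, the following Python sketch scans a protein string for occurrences of a pattern with alternatives at both terminal positions. It assumes that "within the next 2 to 5 positions" means a distance of 2 to 5 between the two positions and that only the first qualifying second letter is reported for each first letter. The function name `find_gapped` and the short test string are hypothetical.

```python
def find_gapped(seq, first, second, dmin, dmax):
    """Find 1-based position pairs (i, j) where a letter from `first` at i is
    followed by a letter from `second` at a distance j - i in [dmin, dmax]."""
    hits = []
    for i, a in enumerate(seq, start=1):
        if a not in first:
            continue
        # Scan the window after position i, clipped to the end of the string.
        for j in range(i + dmin, min(i + dmax, len(seq)) + 1):
            if seq[j - 1] in second:
                hits.append((i, j))
                break  # report only the first qualifying position (assumption)
    return hits

# Made-up residue string: A at position 2 is followed by D at position 5.
print(find_gapped("MALLDKQG", "AQG", "DKRS", 2, 5))
```

Such a matcher treats each bracketed set as a single terminal, which is exactly what a search restricted to one event-type per terminal position cannot express.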
Here TPA was made of the string T-data of each of 100 proteins randomly selected from those above (5.3), with a minimum number of occurrences = 3 and critical interval significance level = 0.005. T-patterns were detected in 12 of the proteins, all passing the Monte Carlo test (i.e., deviations for each pattern length > 2 standard deviations). The longest (m = 11) T-pattern (Fig. 7) was detected in protein Q55GU0, a string of length 916: MKGGGFYQNQYL...KRPSFTEILNLL-NEIP (see https://www.uniprot.org/uniprot/Q55GU0#sequences).
T-bursts were detected in each of the amino acid residues P, Q and G. Fig. 8 shows the results of T-shuffling and T-rotation simulation for that protein. Fig. 9 shows the results of another such simulation test using only the elements involved in the pattern (amino acid residues R, D, V, P, Q, E, T and G). No pattern of length >2 containing a T-burst was found in the randomized data.
A next step will be inspection of highly significant T-patterns detected in proteins that might draw attention to unnoticed structural aspects.

Mass-societies in proteins, insects and humans
Animal mass-societies are found only in insects and modern humans (still with primate small group brains), making modern human societies the only large brain mass-societies. But for mass-society emergence, great brain power is neither necessary nor sufficient (Table 1). In Cell City, the mass-societies of proteins, the citizens have no "brain", but can be seen as information processing entities defined by their structure. Amongst human social groups, all with the same brain size, only some have become modern mass-societies. The great creative potential of human brains possibly hampers the synchronization and coordination of numerous individuals, which, as in protein societies, needs support from external behavior-controlling strings (texts) with means of massive copying and distribution.
Fig. 1. The (black) sub-patterns are separated by relatively freely filled intervals (white). Any parts can be compressed and stretched to a limit, like rubber bands. Examples are a human behavioral routine like a dinner in a human mass-society or a gene in a protein mass-society, both facilitating processes within a mass-society. (About the "Dinner-Gene" see [30].) Figure adapted from [14], Figs. 2-6.
Fig. 2. A 13.5 min dyadic interaction between two 5-year-olds exchanging a toy, the interaction for which the T-pattern model, algorithm and the Theme software were first developed. The binary detection tree on the left shows how points in the T-data series on the right get connected level-by-level to form the four occurrences of the complete pattern, detected with significance at each connection p <= 0.005 and minimum number of occurrences = 4. The horizontal axis shows time in 1/15 s (one frame) units. The four trees above are the dynamic versions of the detection tree at each pattern occurrence. The vertical axis shows the pattern terminal event types from the top, where x and y are the two children and b or e means begins or ends. Automanipulate = fiddling with something without watching it (see [44], p. 69); haveviewer = having the viewer; manipulate,viewer = manipulates the viewer; view,long = viewing pictures in viewer for more than 3 s; immobile = no movement ("frozen"); glanceat,partner = glances at partner; headtilt = tilting the head to one side.

Mass-societies, string control and self-similarity
Comparison of mass-societies of proteins, cells and bodies shows that only in proteins and humans do giant durable T-strings, whether molecular or textual and external to the individuals, determine their specialization, including their behavioral potentials and tendencies (see Fig. 12).

Forming citizens in protein and human mass-societies. Ribosome: factory and/or school?
A unique analogy between the function of giant durable T-strings in protein and human mass-societies regards the specialization of citizens. In both cases various segments of their stringomes (DNA or texts) are used by special structures to create multiple kinds of specialized citizens and objects. In Cell City these structures are called ribosomes, but in human mass-societies "schools" (in the broadest sense) and factories. The protein citizen is literally formed, its structure making it able to perform particular interactions, information processing and actions, while in humans formation means modification of internal memory, that is, of neuronal network patterns (Fig. 11). The myriad of neuronal T-patterns in brains is also reminiscent of the myriad of T-patterned proteins in neurons, another self-similarity.

Standardized massively copied T-Strings promoted by the same (primate) alpha male
When considering the importance of textual T-strings for human social behavior, text-based religions can hardly be overlooked. Christianity was the first text-based (eventually the Bible) religion in Rome ([11], Kindle locations 333-338) and eventually destroyed nearly all the existing (competing) Latin and Greek literature/T-strings [51]. Islam was then based on another related T-string, the Quran. The importance of (standardized) texts and effective copying and distribution technologies thus seems hard to overestimate, with half of humanity now identifying exclusively with one or the other of just these two (over a thousand years old) T-strings, both with the same (primate) alpha male promoter (named, for example, God, Allah or Yahweh) and often called holy scriptures, but sometimes eliciting lethally incompatible behavior [32]. See Fig. 12.
Fig. 4. This figure shows the result of Monte Carlo analysis (100 random runs) of the children's dyadic interaction data above. It is characteristic that the difference between real and random is smallest for the shorter patterns and that for the longer patterns none are found in the randomized data.
"Religion is regarded by the common people as true, by the wise as false, and by the rulers as useful." This quotation is often attributed to the famous Roman philosopher Lucius Annaeus Seneca (5 BCE - 65 CE).
The modern world is confusing in that personal gods described in very old T-strings (Holy Scriptures) are passionately believed in and worshipped by countless millions of humans, while the majority of the most eminent scientists do not believe they ever existed (see, for example, [58]). Such disbelief is, for example, clearly stated by Einstein: "It was, of course, a lie what you read about my religious convictions, a lie which is being systematically repeated. I do not believe in a personal God and I have never denied this but have expressed it clearly. If something is in me which can be called religious then it is the unbounded admiration for the structure of the world so far as our science can reveal it." Albert Einstein, letter to an atheist [16], quoted in "Albert Einstein: The Human Side," edited by Helen Dukas & Banesh Hoffmann. Kindle edition, location 681.
With Seneca's words in mind, two of the world's greatest warlords may also be noted. The Roman emperor Constantine launched Christianity as the religion of his military at around 300 CE, and the 20th century's greatest warlord, Eisenhower, in 1956 made "In God We Trust" the official motto of the United States of America, supplanting "E pluribus unum" ("Out of Many, One"), and had it printed on the dollar bill from 1957. Both are evidence of faith in the power of a god and/or T-strings. (See, for example, https://en.wikipedia.org/wiki/In_God_We_Trust).

Behavior without neurons based on T-strings
Without neurons, motor proteins are now known to work within Cell City, for example, walking along tracks within the cell, transporting other worker molecules to other sites and tasks. So much T-string based self-similarity across so many orders of magnitude of time, space and levels of biological organization seems to suggest fundamental aspects. (See http://www.open.edu/openlearn/nature-environment/natural-history/cells-are-cities-simple-version)

7. Discussion

Pattern types and recognition of self-similarity
It can seem strange to approach self-similarity from above, i.e., from higher levels of organization, rather than starting at the base in the intracellular nanoscale world and searching upwards. But logically, biological self-similarity can only begin to exist and be discovered at higher levels as, obviously, there can be no similarity between the whole and any of its parts before the whole exists, and it cannot exist before its parts. The parts may involve no self-similarity. The cell is a component of human bodies and thus of mass-societies of human bodies, while, obviously, neither human bodies nor human mass-societies are parts of cells, but only human mass-societies have the external T-string self-similarity with the protein mass-societies in cells (Fig. 10).
Fig. 9. (... Fig. 7). Using only the elements (R, D, V, P, Q, E, T, and G) involved in the T-pattern with a T-burst in three of its elements and counting only random T-patterns including a T-burst in at least one of its elements. 100 T-shuffling runs found no such patterns and in 100 T-rotation runs (red) none longer than 7. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Sequential vs. parallel patterns
Another aspect is the difference between the ordinal patterns on DNA and the patterns in real-time streams of behavior, that is, between strictly sequential exclusive elements and sequences on DNA and the parallel elements and patterns in flexible time. While patterns of elements occurring at the same discrete position (time) are common and important in behavioral temporal data (streams of behavior), the regular expression language [13] does not consider such concurrence (impossible in strictly sequential data). Thus, while [BEGSA] means that B, E, G, S or A occurs at a given position, there is no possibility of expressing that, for example, all of them do.
A pattern type corresponding to the simpler data may not correspond to the more complex, while the opposite may be true. So discovering self-similarity naturally comes from the higher-level pattern types.
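The contrast between alternatives and concurrence can be made concrete with a small Python sketch. It is only an illustration under the assumption that each position (time unit) of behavioral data is represented as a collection of the event types occurring there; the names any_of and all_of are hypothetical.

```python
ALTERNATIVES = frozenset("BEGSA")


def any_of(position_events, letters=ALTERNATIVES):
    """What [BEGSA] expresses: at least one of the letters occurs here."""
    return bool(set(position_events) & letters)


def all_of(position_events, letters=ALTERNATIVES):
    """Concurrence: every one of the letters occurs at this position."""
    return letters <= set(position_events)


# Strictly sequential data: exactly one symbol per position.
sequential_position = ["G"]
# Behavioral data: several event types may share one time unit.
concurrent_position = ["B", "E", "G", "S", "A"]

print(any_of(sequential_position), all_of(sequential_position))  # True False
print(any_of(concurrent_position), all_of(concurrent_position))  # True True
```

In strictly sequential strings the second test can never succeed for more than one letter, which is why a pattern language designed for such strings has no need to express it.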

Why wasn't more learned from social insect mass-societies?
The RNA world had no purely informational giant molecules external to its citizens deciding their specialization and the same is true for social insects and all other animals including non-modern humans living in small groups and illiterate societies until a biological eye-blink ago.
The giant durable external T-strings, molecular or textual, essential for the specialization of individuals in protein and human mass-societies are without parallel in insect mass-societies, where specialization of individuals occurs very differently [17].

A bio-mathematical continuum
No simple evolutionary path exists from the internal workings of biological cells to human mass-societies. But as underlined by Konrad Lorenz in his inaugural Nobel Lecture [20], analogy is a valuable source of knowledge in ethology, here providing a new perspective on the human situation, where dramatic fast changes are taking place, not in genes but in lifestyle, due to the recent, fast expanding and increasingly accessible external T-string memory, texts, without which modern mass-societies and their science and technology are unthinkable. Mass-societies from proteins to modern humans based on "T-stringomes" thus represent a bio-mathematical continuum from molecules to modern culture, science and technology, with human culture in mass-societies now mostly external.

Reaching unique self-similarity and a revolution in human possibilities
While humans descend from other apes, apparently their very recent and powerful mass-societies descend from those of nano creatures using analogous mechanisms. Human societies are the only ones among animals to reach such self-similarity, and it comes accompanied by a unique explosion in knowledge and understanding of nature, including of those trillions of protein mass-societies constituting every human.

Wealth, inequality and T-strings in the world of money
Just a word about money in the strange new T-string world. The numbers 0 to 9 as particular T-strings of zeros and ones in modern computers recur in countless combinations and are frequently associated through special T-strings (bank account numbers, contracts, etc.) with the DNA of particular individuals who may thus be rich or poor, even wealthy or desperate.

Complementary bio-mathematical model
The biological cell was first discovered in the 17th century [43] and the DNA, ribosome (see [53]) and RNA world in less than a century. The discovery of protein mass-societies, with their giant T-strings external to their protein workers, depended on the T-string based science of human mass-societies, for which they now seem an attractive complementary bio-mathematical model. Human behavioral problems certainly did not end after the two world wars, and overpopulation, global warming, epidemics, reduced biodiversity and related threats to humanity have now been added. But it looks as if the recent science at the nano scale could revolutionize the way humans see themselves: surrounded by myriads of massively copied (for example, legal, religious, scientific and technical) T-strings, which suddenly made modern human mass-social life possible by enabling and controlling the citizens, making some even wonder if the feared "robots" are already here, that is, us. Naked apes are now suddenly mass-social "string-apes", reproducing and polluting at a previously unseen rate while creating and maintaining a barely sustainable world of billions with staggering inequality, the so-called world of the 1%.
Table 1. Brain size and mass-societies. Approximate relationship between brain size and existence of mass-societies in proteins, insects and humans.

Declaration of Competing Interest
The author is the sole creator and copyright owner of the THEME software and principal owner and COB of PatternVision Ltd (www.patternvision.com), responsible for the marketing of the THEME™ software.

Fig. 11. T-strings and formation of citizens. Various sub-sets of purely informational T-strings are used to form specialized citizens in protein and human mass-societies.

Fig. 12. Holy T-strings. Using T-strings to set special aspects of the behavioral potentials and tendencies of individuals in human mass-societies. About half of humanity is assigned, exclusively, to one or the other of just two such strings, called Bible and Quran.