Computer Vision and Mathematical Methods Used to Analyse Children’s Drawings of God(s)

Cocco, Christelle; Ceré, Raphaël

doi:10.1007/978-3-030-94429-2_9

Part of the book series: New Approaches to the Scientific Study of Religion (NASR, volume 12)


Abstract

In contrast to mainstream research methods in psychology, the project Children’s Drawings of Gods encompasses computer vision and mathematical methods to analyse the data (drawings and drawing annotations). The first part of the present work describes a set of methods designed to extract measures, namely features, directly from the drawings and from annotations of the images. Then, the dissimilarities between the drawings are computed based on particular features (such as the gravity centre of the smallest image unit, namely pixel, or the annotated position of god) and combined in order to measure numerically the differences between the drawings. In the second part, we conduct an exploratory data analysis based on these dissimilarities, including multidimensional scaling and clustering, in order to determine whether the chosen features permit us to distinguish the different strategies that the children used to draw god.



A crucial and original point of the project Children’s Drawings of Gods^{Footnote 1} is its intercultural and interfaith nature. In order to accomplish intercultural and interfaith comparisons, it is necessary to have a large number of drawings. However, it becomes difficult to analyse each drawing using methods that are standard in psychological research, because these methods are very resource-intensive, especially if inter-rater reliability must be assessed. In the latter case, moreover, the results can be subjective, because they depend, to some extent, on the raters. This project makes an original contribution by encompassing computer vision and mathematical methods to analyse the drawings and furnish (semi-)automatic methods to treat large numbers of drawings.

While computer vision methods are well developed to analyse natural images (see, e.g., Szeliski, 2011), such as aerial photography, human portraits, or natural landscapes, they are less developed for artistic work, such as paintings or drawings, with a few exceptions (see, e.g., Stork, 2009; Manovich, 2012; Romero et al., 2018). As for psychological studies of drawings, there are only a few works using computer vision (see, e.g., Kim et al., 2007; Kim et al., 2012; Ahi, 2017). Ahi (2017) uses the ImageJ program, a tool largely used in the area of biological imaging and fully discussed in Schneider et al. (2012). Regarding annotation tools, they are often developed for specific purposes, but with a usage either too restrictive or too permissive according to the aims of the project (Cocco et al., 2018).

Therefore, we developed specific methods to answer various research questions (see Table 9.1 for the relation between the research questions, the methods, and the chapters of this volume). In a nutshell, we developed two main method types in this project in order to extract features from the drawings: methods based on annotations and methods of computer vision. Both methods enable the extraction of counts and measurements (namely of features) from the drawings.

Table 9.1 Summary of research questions treated in this chapter with regard to the related chapters in this book, the considered features, and the type of features


First, regarding the methods based on annotations, we created a specific annotation tool, dubbed Gauntlet, for this project (Dessart et al., 2016).^{Footnote 2} This tool proposes a fixed list of items that can be annotated (with a box or a point depending on the items). The output provides a list of the annotated items with the coordinates of the boxes or points. Two types of annotations were produced with this tool in the present project and its sequel: annotations for anthropomorphism and annotations for position. Regarding annotations for anthropomorphism, we annotated all anthropomorphic items in the drawings.^{Footnote 3} Only the names of the items were used here. For position annotations, we placed a rectangular box around the god^{Footnote 4} representations in the drawings, in accordance with instructions found and detailed earlier (Chaps. 6 and 7, this volume). Based on this box position, we extracted features, namely the vertical and the horizontal position of god.

Second, we developed various methods of computer vision, enabling us to directly extract features from the drawings. Three general features are discussed in this segment: the gravity, which mainly consists of the positional mean of coloured pixels; the colour frequencies; and the colour organisation (both of which are based on the RGB (red, green and blue) colour space representation of the drawings).

Although we developed these types of methods to answer a variety of questions (see Table 9.1), they can sometimes serve more than one purpose; more than one method can aid us in answering a single question. For instance, to understand where children draw god on the page, we can: (1) consider the complete composition of the drawing with the mean position of coloured pixels (gravity features); or (2) focus on the annotated position of the figure of god (position features).

The aim of this work is to go further in the automated processes, to see whether expected patterns can be seen and new patterns discovered by using clustering methods that have not been used in other segments of analysis, as well as to add new features and to combine them. In the first part, we present the method: first, the feature extraction; second, the transformation of features into dissimilarities; and third, the clustering method, namely K-means. The next part provides results (illustrations) of clustering applied in combination with the previously defined features. Finally, we discuss the contribution of this method to the psychological research questions.

Method

In this section, we first describe the dataset of drawings and drawing annotations. Then we present various features that characterise the drawings (gravity, colour frequencies, colour organisation, god position, and anthropomorphism), features that have been extracted from the drawings and/or from the annotations. We explain how those features are used in computations to measure the dissimilarities between each pair of drawings (dissimilarities based on features). Finally, the clustering technique is described (clustering based on dissimilarities).

Dataset

The dataset contains N = 1211 drawings, indexed by i = 1, …, N, which come from the different origins summarised in Table 9.2. The drawings were collected between 2003 and 2016 from small groups of children of compulsory school age.

Table 9.2 Count of the drawings in the dataset


Not all drawings were annotated for position: in drawings with multiple figures it was difficult, or even impossible, to identify which or how many of the figures represented god. By contrast, we were able to annotate all drawings for anthropomorphism, with the exception of the Russian subset, for which our work is still in progress. To date, we have annotated 745 drawings for anthropomorphic characteristics and 1162 for the position of the god figure.

Each drawing i is defined here as a mathematical object: a three-dimensional array of size n × m × d with d = 3, whose dimensions code the vertical position on the Y-axis, the horizontal position on the X-axis, and the colour, respectively (think of a regular 3D grid). With y = 1, …, n and x = 1, …, m, each element p_yx represents the colour value of the pixel at the (x, y) coordinates. More precisely, p_yx is a triplet of values in the RGB colour space.

Features

We extracted two types of features from the drawings in the dataset: features extracted from manually executed annotations (god’s position and anthropomorphism) and features automatically computed from the drawings (gravity, colour frequencies, and colour organisation) using a computer vision approach.

Gravity

The disjunctive configuration of the coloured pixels (absence or presence) permits us to extract their so-called mean position, based on the weighted mean of the gravity computation proposed by Konyushkova et al. (2015). First, the pixels’ colour space is converted from RGB to HSV (Hue, Saturation and Value), and the indicator $ {p}_{yx}^b $ of a retained coloured pixel is defined as

$$ {p}_{yx}^b:= \left\{\begin{array}{l}1\ \textrm{if}\ {p}_{yx}\in \left[H,S>0.05,V<0.95\right]\\ {}0\ \textrm{otherwise}\end{array}\right. $$

Then, the standardised $ \left(\overline{x^b},\overline{y^b}\right)\in \left[0,1\right] $ mean position coordinates of the coloured pixels are obtained with

$$ \overline{x^b}=\frac{f^b\sum \limits_x\left(\sum \limits_y{p}_{yx}^b\left(x-0.5\right)\right)}{m}\kern2.5em \overline{y^b}=\frac{f^b\sum \limits_y\left(\sum \limits_x{p}_{yx}^b\left(y-0.5\right)\right)}{n} $$

(9.1)

where the normalisation factor $ {f}^b=1/\sum \limits_{xy}{p}_{yx}^b $ is the inverse number of coloured pixels.^{Footnote 5}
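As an illustration, the mask and the mean position of Eq. 9.1 can be sketched with NumPy as follows. This is a minimal sketch and not the project's implementation: the function names `coloured_mask` and `mean_position` are hypothetical, the image is assumed to be an (n, m, 3) float array scaled to [0, 1], and the threshold is read as the conjunction of S > 0.05 and V < 0.95.

```python
import numpy as np

def coloured_mask(rgb):
    """Binary mask p^b: 1 where S > 0.05 and V < 0.95, 0 otherwise.

    `rgb` is assumed to be an (n, m, 3) float array with values in [0, 1].
    Only the S and V channels of HSV are needed, so they are computed directly.
    """
    v = rgb.max(axis=2)                                          # HSV value
    delta = v - rgb.min(axis=2)
    s = np.divide(delta, v, out=np.zeros_like(v), where=v > 0)   # HSV saturation
    return ((s > 0.05) & (v < 0.95)).astype(float)

def mean_position(mask):
    """Standardised mean position (x_bar, y_bar) of the coloured pixels (Eq. 9.1)."""
    n, m = mask.shape
    f_b = 1.0 / mask.sum()               # inverse number of coloured pixels
    x = np.arange(1, m + 1) - 0.5        # pixel centres along the X-axis
    y = np.arange(1, n + 1) - 0.5        # pixel centres along the Y-axis
    x_bar = f_b * (mask.sum(axis=0) * x).sum() / m
    y_bar = f_b * (mask.sum(axis=1) * y).sum() / n
    return x_bar, y_bar
```

For a white 4 × 4 page with a single dark red pixel in the top-right corner, this sketch returns a mean position close to the right edge (x̄ = 0.875) and to the top edge (ȳ = 0.125), since y is counted from the top row.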

Moreover, the inertia Δ^b, measuring the dispersion of the coloured pixels, is computed as:

$$ {\varDelta}^b=\left\{\begin{array}{l}\operatorname{var}\left({x}^b\right)+{\left(\frac{n}{m}\right)}^2\operatorname{var}\left({y}^b\right)\ \textrm{if}\ m<n\\ {}\operatorname{var}\left({y}^b\right)+{\left(\frac{m}{n}\right)}^2\operatorname{var}\left({x}^b\right)\ \textrm{otherwise}\end{array}\right. $$

(9.2)

where var(x^b) and var(y^b) are defined as:

$$ \operatorname{var}\left({x}^b\right)={f}^b\sum \limits_x{\left(\frac{\sum \limits_y{p}_{yx}^b\left(x-0.5\right)}{m}-\overline{x^b}\right)}^2 $$

and

$$ \operatorname{var}\left({y}^b\right)={f}^b\sum \limits_y{\left(\frac{\sum \limits_x{p}_{yx}^b\left(y-0.5\right)}{n}-\overline{y^b}\right)}^2. $$

Thus, both the weighted mean of gravity and the inertia, together called gravity features in the sequel, are computed based on coloured pixels only. This means that uncoloured areas, for example a part that a child deliberately left white, are not taken into account, which influences these features. The next step in this work will be to find a way to consider these cases, using a method such as the one developed by Seong-in Kim et al. (2012).

In order to keep the process completely automatic, the whole dataset was included in the analysis for these gravity features. While we expected that some drawings (such as those left as blank sheets) would be automatically removed from this portion of the analysis, they were not, because there is always at least one coloured pixel, sometimes due to the scan or to noise.

Colour Frequencies

On a finer level, the colour frequencies (in pixel count) can be extracted automatically for each drawing with the method proposed by Cocco et al. (2019). This method uses a two-step procedure that assigns all pixels p_yx first to a set of 117 micro-colours, and second to a defined palette of G = 10 colours: gray-black scale (achromatic), blue, cyan, green, orange, pink, purple, red, white, and yellow.^{Footnote 6} Finally, for each colour g = 1, …, G, a binary matrix $ {b}_{yx}^g $ of the same dimension as the considered drawing is obtained, whose components are 1 if the pixel has the colour g and 0 otherwise. Thus, for each drawing, $ {c}^g=\sum \limits_{xy}{b}_{yx}^g $ is the number of pixels of colour g in the image. This allowed us to create a contingency table V^COL in which the rows are images and the columns are colours.

Colour Organization

Looking deeper, each drawing expresses a particular organisation of colours, which can be quantified by a measure of entropy (Parker, 2011): the higher the entropy, the more dispersed or “random” the corresponding colour distribution will be. Conversely, a lower measure of entropy corresponds to the use of fewer colours, and may indicate a more organised state of colours.

The triplet of values defining the colour of a pixel in the RGB space is first linearly converted to the so-called grey level $\tilde{p}$, defined as $ \tilde{p}=0.2125{p}_R+0.7154{p}_G+0.0721{p}_B $, which results in 256 discrete grey levels in t = 0, …, T = 1. Then, for each drawing, the relative frequency of each grey level t is

$$ \textrm{p}(t)=\frac{\sum \limits_{y=1}^{n}\sum \limits_{x=1}^{m}\mathbbm{1}\left(\tilde{p}_{yx}=t\right)}{P} $$

with P = n × m the number of pixels in the drawing. Thus, the entropy (in bits) associated with the 256 grey levels of the drawing is

$$ H(T)=-\sum \limits_{t=1}^T\textrm{p}(t)\ {\log}_2\ \textrm{p}(t) $$

and obeys 0 ≤ H(T) ≤ log₂(256) = 8 bits.
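The entropy computation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `grey_entropy` is hypothetical, the image is assumed to be a float RGB array scaled to [0, 1], and the quantisation of grey values into 256 integer levels by rounding is an assumed binning choice.

```python
import numpy as np

def grey_entropy(rgb, levels=256):
    """Shannon entropy (in bits) of the grey-level histogram of an RGB image.

    `rgb` is assumed to be an (n, m, 3) float array with values in [0, 1].
    """
    # Linear conversion to grey level with the weights given in the text.
    grey = 0.2125 * rgb[..., 0] + 0.7154 * rgb[..., 1] + 0.0721 * rgb[..., 2]
    # Quantise to integer levels 0..levels-1 (assumed binning by rounding).
    t = np.clip(np.rint(grey * (levels - 1)), 0, levels - 1).astype(int)
    # Relative frequency p(t) of each grey level.
    p = np.bincount(t.ravel(), minlength=levels) / t.size
    p = p[p > 0]                         # 0 * log2(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```

A blank page made of a single grey level has zero entropy, whereas a page split evenly between two grey levels has an entropy of 1 bit; the upper bound of 8 bits is reached only when all 256 levels are equally frequent.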

God Position

We annotated drawings in order to locate the god figure’s position in the pictorial space (god position in the image). At this stage, only drawings with a single god figure have been analysed (see Chap. 7, this volume). For each drawing, god’s representation is delimited by a box defined by two points: (x_min, y_min), the upper left corner, and (x_max, y_max), the bottom right corner. The position is defined by the standardised coordinates (x^c, y^c) ∈ [0, 1] of the centroid of the box, that is, the midpoint of its two corners divided by the image dimensions:

$$ {x}^c=\frac{\left({x}_{max}+{x}_{min}\right)/2}{m}\kern2em {y}^c=\frac{\left({y}_{max}+{y}_{min}\right)/2}{n} $$

(9.3)

Anthropomorphism

We annotated drawings with various labels and positions related to anthropomorphism.^{Footnote 7} For this task, we kept only the labels (position was not used) and only those directly connected to anthropomorphism. A contingency table V^ANT = (v_il) was obtained, where v_il counts the number of occurrences of the l^th anthropomorphic feature in the i^th drawing. We considered 13 labels, namely:

heads,
eyes,
noses,
mouths,
ears,
hair,
beard,
clothes,
arms,
hands,
legs,
feet and
no anthropomorphic item.

The last label (l = “no anthropomorphic item”) was assigned (v_il = 1) when no other label applied to drawing i, so that no drawing is left without labels and all drawings can be included in the dissimilarity computation detailed in the section below.

Dissimilarities Based on Features

Once each drawing has been characterised by uni- or multivariate features, computing the dissimilarities between each pair of drawings i, j constitutes a natural way to reveal their contrasts within a large dataset.

We computed two types of dissimilarities, depending on the type of the features. A numerical feature yields the N × N symmetric dissimilarity matrix D = (d_ij) with $ {d}_{ij}^2=\parallel {\overrightarrow{x}}_i-{\overrightarrow{x}}_j{\parallel}^2 $ representing the squared Euclidean distances. A categorical feature with m modalities instead yields a contingency table V = (q_il) of size N × m, a matrix counting the number of occurrences of modality l in drawing i (here m denotes the number of modalities, not the image width). From the latter, a chi-squared dissimilarity, denoted here χ², can be computed. However, the categorical features under investigation include various modalities that are over-represented and hide other subtle yet relevant modalities that are less frequently represented (think of the count of white pixels compared with the counts of pink or yellow pixels). The generalized χ² defined below (Ceré & Egloff, 2018) provides a parameter θ ≥ 0 to adapt the sensitivity of the measure to the high or low frequencies in the distribution of each modality l, with θ > 1 or θ < 1 respectively, whereas θ = 1 provides the usual χ². Then $ {d}_{ij}^{\chi^2}=\sum \limits_l{v}_l{\left({\rho}_{il}^{\theta }-{\rho}_{jl}^{\theta}\right)}^2 $, where $ {v}_l=\frac{q_{\bullet l}}{q_{\bullet \bullet }} $ is the modality weight and $ {\rho}_{il}=\frac{q_{il}{q}_{\bullet \bullet }}{q_{i\bullet }{q}_{\bullet l}} $ the independence quotient.
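The generalized χ² dissimilarity can be sketched from the formulas above as follows. This is a minimal NumPy sketch rather than the authors' implementation: the function name `generalised_chi2` is hypothetical, and the count table is assumed to have no empty row or column so that the independence quotients are defined.

```python
import numpy as np

def generalised_chi2(Q, theta=1.0):
    """Pairwise generalized chi-squared dissimilarities between the rows of Q.

    Q is a (drawings x modalities) table of counts, assumed to have
    no empty row and no empty column.
    """
    Q = np.asarray(Q, dtype=float)
    total = Q.sum()                              # q_..
    row = Q.sum(axis=1, keepdims=True)           # q_i.
    col = Q.sum(axis=0, keepdims=True)           # q_.l
    v = (col / total).ravel()                    # modality weights v_l
    rho = Q * total / (row * col)                # independence quotients rho_il
    R = rho ** theta
    diff = R[:, None, :] - R[None, :, :]         # shape (N, N, modalities)
    return (v * diff ** 2).sum(axis=2)
```

With θ = 1 the result coincides with the usual χ² distance between row profiles. The intermediate array has N × N × m entries, so for larger datasets a loop over rows may be preferable.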

In both cases, those dissimilarities are squared Euclidean, and so are their p-variate mixtures $ {D}_{ij}^{\prime }=\sum \limits_{k=1}^p{\alpha}_k{D}_{ij}^{(k)} $, where D^(k) is the k-th dissimilarity D_ij/Δ, standardised by the corresponding inertia Δ of the k-th feature, and the free coefficients α_k ≥ 0 with $ \sum \limits_{k=1}^p{\alpha}_k=1 $ permit us to tune the relative weight of each contribution.
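A mixture of standardised dissimilarities can be sketched as below. The function names are hypothetical, and the inertia is assumed to be Δ = ½ Σ_ij f_i f_j D_ij, by analogy with the cluster inertia Δ_k used in the clustering section; the text itself does not spell out this formula at this point.

```python
import numpy as np

def inertia(D, f):
    """Inertia of a squared Euclidean dissimilarity matrix D under weights f
    (assumed form: 0.5 * sum_ij f_i f_j D_ij)."""
    return 0.5 * f @ D @ f

def mix_dissimilarities(Ds, alphas, f):
    """Convex combination of dissimilarity matrices, each divided by its inertia."""
    alphas = np.asarray(alphas, dtype=float)
    if np.any(alphas < 0) or not np.isclose(alphas.sum(), 1.0):
        raise ValueError("alphas must be non-negative and sum to 1")
    return sum(a * D / inertia(D, f) for a, D in zip(alphas, Ds))
```

Dividing by the inertia puts the features on a comparable scale before mixing, so that the coefficients α_k alone control the relative contribution of, say, position and gravity.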

Therefore, of the five dissimilarity matrices of Table 9.1, three are straightforward squared Euclidean dissimilarities:

D^VAR = (‖H(T)_i − H(T)_j‖²) from the colour variety feature,
$ {D}^{GRAV}=\left({\left\Vert \overline{x_i^b}-\overline{x_j^b}\right\Vert}^2+{\left\Vert \overline{y_i^b}-\overline{y_j^b}\right\Vert}^2+{\left\Vert {\varDelta}_i^b-{\varDelta}_j^b\right\Vert}^2\right) $ from the gravity features,
$ {D}^{POS}=\left({\left\Vert {x}_i^c-{x}_j^c\right\Vert}^2+{\left\Vert {y}_i^c-{y}_j^c\right\Vert}^2\right) $ from the position features;

and two are chi-squared dissimilarities:

D^COL from the colour frequencies features and
D^ANT from the anthropomorphism features.

Clustering Based on Dissimilarities

Because the dataset is large (N is already large and will increase drastically in the near future of the project), a clustering method is needed to identify possible homogeneous groups of drawings. Among many other methods, the well-known iterative K-means approach has been adapted here, following Cocco (2014), to the dissimilarity formalism D = (d_ij) and used to assign each drawing to one of c < N clusters.^{Footnote 8} We consider a uniform weight for each drawing i, that is f_i = 1/N with $ \sum \limits_i{f}_i=1 $.

In contrast with the current practice of this method,^{Footnote 9} we first define a uniformly random binary partition matrix Z = (z_ik) where $ \sum \limits_{k=1}^c{z}_{ik}=1 $. Each drawing is then attributed to the cluster k with the probability z_ik.

At each iteration, the dissimilarity $ {D}_i^k $ between drawing i and the current centroid of cluster k is

$$ {D}_i^k=\sum \limits_j{f}_j^k{D}_{ij}-{\varDelta}_k\kern1em \textrm{with}\kern1em {\varDelta}_k=\frac{1}{2}\sum \limits_{ij}{f}_i^k{f}_j^k{D}_{ij} $$

where $ {f}_i^k={f}_i{z}_{ik}/{\rho}_k $ is the distribution of the drawings i within cluster k, which obeys $ \sum \limits_i{f}_i^k=1 $, and $ {\rho}_k=\sum \limits_i{f}_i{z}_{ik} $ is the relative weight of cluster k. At each iteration, the drawing i is attributed to the nearest cluster, $ {k}_i=\arg {\min}_{k=1}^c{D}_i^k $, that is z_ik = 1 if k = k_i and z_ik = 0 otherwise, and the process is continued until the partition Z converges.
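The iteration described above can be sketched directly on a dissimilarity matrix, without coordinates. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `dissimilarity_kmeans` is hypothetical, D is assumed to be squared Euclidean, and drawing a fresh random partition whenever a cluster becomes empty is an assumed policy that the text does not specify.

```python
import numpy as np

def dissimilarity_kmeans(D, c, f=None, max_iter=100, seed=0):
    """K-means on an (N x N) squared Euclidean dissimilarity matrix D."""
    N = D.shape[0]
    f = np.full(N, 1.0 / N) if f is None else np.asarray(f, dtype=float)
    rng = np.random.default_rng(seed)
    labels = rng.integers(c, size=N)             # random initial partition
    Dk = None
    for _ in range(max_iter):
        Z = np.eye(c)[labels]                    # binary membership matrix (N x c)
        rho = f @ Z                              # relative cluster weights rho_k
        if np.any(rho == 0):                     # empty cluster: redraw (assumed policy)
            labels = rng.integers(c, size=N)
            continue
        Fk = f[:, None] * Z / rho                # f_i^k, each column sums to 1
        delta = 0.5 * np.einsum("ik,ij,jk->k", Fk, D, Fk)   # cluster inertias
        Dk = D @ Fk - delta                      # dissimilarity of i to centroid k
        new_labels = Dk.argmin(axis=1)           # nearest cluster for each drawing
        if np.array_equal(new_labels, labels):   # partition has converged
            break
        labels = new_labels
    return labels, Dk
```

As with any K-means, the result depends on the initial partition, so in practice several seeds would typically be tried and the partition with the smallest within-cluster dissimilarity retained.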

The number of clusters c is chosen according to the heuristic rule of Hartigan (Chiang & Mirkin, 2010; Hartigan, 1975; Sablatnig et al., 1998), which defines the optimal number of clusters c^⋆ as the minimal c for which the Hartigan index $ HK(c)=\left(\frac{w_c}{w_{c+1}}-1\right)\left(N-c-1\right) $ satisfies HK(c) ≤ 10, where w_c = $ \sum \limits_{ik}{D}_i^k $ is the sum of the within-cluster distances to the c centroids. Such a criterion seeks to minimize the variation of w_c while increasing c. If the value HK(c) ≤ 10 is never reached, the number of clusters is chosen as $ {c}^{\star }=\arg {\min}_{c=2}^{c=10}\mid {w}_c-{w}_{c+1}\mid $ (de Amorim & Hennig, 2015).
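The selection rule can be sketched as follows, assuming that the within-cluster sums w_c have already been computed for consecutive values of c. The function names and the dictionary input are hypothetical choices made for illustration; the fallback searches whatever range of c is supplied, which corresponds to the rule in the text when that range is c = 2, …, 10.

```python
def hartigan_index(w_c, w_next, n_items, c):
    """Hartigan index HK(c) = (w_c / w_{c+1} - 1) * (N - c - 1)."""
    return (w_c / w_next - 1.0) * (n_items - c - 1)

def choose_n_clusters(w, n_items, threshold=10.0):
    """Pick the number of clusters from a dict mapping c to w_c.

    Returns the smallest c with HK(c) <= threshold; if none qualifies,
    falls back to the c minimising |w_c - w_{c+1}| over the supplied range.
    """
    candidates = sorted(c for c in w if c + 1 in w)
    for c in candidates:
        if hartigan_index(w[c], w[c + 1], n_items, c) <= threshold:
            return c
    return min(candidates, key=lambda c: abs(w[c] - w[c + 1]))
```

For example, with w = {1: 100, 2: 40, 3: 38, 4: 37} and 50 items, going from two to three clusters barely reduces w_c, so the rule stops at c = 2.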

The above yields various distinct and homogeneous clusters of the drawings, whereas Multi-Dimensional Scaling (MDS) makes it possible to explore the dissimilarities between drawings by representing and visualising the dataset in a lower number of dimensions. Although the dimension reduction implies a loss of information,^{Footnote 10} the combination of the labelled drawings and dissimilarities in a 2-dimensional plot constitutes a particularly intuitive way to understand patterns in a large dataset. This is useful when analysing two aspects:

1. The similarities between drawings within a given cluster and between differing clusters,
2. The identification of the uni- or multivariate profile contributing most to the distinction between drawings, or groups of drawings.

To perform the MDS, the matrix of scalar products B is computed from the dissimilarity matrix (Bavaud, 2011; Cocco, 2014) as:

$$ B=-\frac{1}{2}HD{H}^{\prime}\kern1em \textrm{with}\kern1em H=I-\mathbbm{1}{f}^{\prime } $$

which permits us to define the matrix of weighted scalar products $ {K}_{ij}=\sqrt{f_i{f}_j}{B}_{ij} $, whose trace is equal to the inertia Δ of the configuration. The spectral decomposition K = UΛU^′ (where Λ is diagonal and U orthogonal) finally defines the factor coordinates of each drawing i on the α^th dimension as

$$ {x}_{i\alpha}=\frac{\sqrt{\lambda_{\alpha }}}{\sqrt{f_i}}{u}_{i\alpha} $$

where the eigenvalue λ_α represents the inertia explained by the α^th dimension. Ideally, the first dimensions express an important part of the inertia, thus justifying the 2-dimensional plots, as illustrated in the section below.
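The weighted MDS steps above can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' code: the function name `weighted_mds` is hypothetical, uniform weights are used when none are given, and small negative eigenvalues caused by rounding are clipped to zero.

```python
import numpy as np

def weighted_mds(D, f=None, dims=2):
    """Weighted MDS coordinates from a squared Euclidean dissimilarity matrix D."""
    N = D.shape[0]
    f = np.full(N, 1.0 / N) if f is None else np.asarray(f, dtype=float)
    H = np.eye(N) - np.outer(np.ones(N), f)      # centring matrix H = I - 1 f'
    B = -0.5 * H @ D @ H.T                       # scalar products
    K = np.sqrt(np.outer(f, f)) * B              # weighted scalar products
    lam, U = np.linalg.eigh(K)                   # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:dims]         # keep the largest `dims`
    lam = np.clip(lam[order], 0.0, None)         # guard against rounding noise
    U = U[:, order]
    X = np.sqrt(lam) * U / np.sqrt(f)[:, None]   # factor coordinates x_{i alpha}
    explained = lam / np.trace(K)                # share of inertia per dimension
    return X, explained
```

When the dissimilarities come from points on a line, a single dimension reproduces them exactly and explains all of the inertia, which gives a quick sanity check of the sketch.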

Results

In this section, we propose some possible answers to questions mentioned in Table 9.1. For each feature or combination of features, an MDS and a K-means were computed, as described in the “Method” section, and the MDS served to plot the results.^{Footnote 11} For each cluster obtained with the K-means, four types of proportions were additionally computed, according to the metadata (country, sex, age and context) of the drawings:

The proportion of children from each country;
The proportion of males and females;
The proportion of children in each age group:
- Group “young”: up to 9 years and 5 months,
- Group “middle”: between 9 years and 6 months and 12 years and 5 months,
- Group “old”: at least 12 years and 6 months; and
The proportion of children met in a religious or in a public context.

Because the number of drawings varies in each sample based on the metadata, we computed a sample adjustment for these proportions. Indeed, each illustration of frequencies by clusters below shows the relative group proportion, which balances the number of subjects for each metadata group. For instance, we re-weighted the number of children in each country so that, for the calculations, the three countries contain the same number of children.
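One way to obtain such balanced proportions is sketched below. This is only an interpretation of the adjustment described above, not the authors' procedure: the function name `adjusted_proportions` is hypothetical, and the sketch assumes that each count is divided by the size of its metadata group before being normalised within the cluster.

```python
from collections import Counter

def adjusted_proportions(clusters, groups):
    """Relative group proportions per cluster, balanced for unequal group sizes.

    `clusters` and `groups` are parallel sequences giving, for each drawing,
    its cluster label and its metadata group (e.g. country).
    """
    group_size = Counter(groups)
    counts = Counter(zip(clusters, groups))
    result = {}
    for k in set(clusters):
        # Share of each group that falls in cluster k.
        rates = {g: counts[(k, g)] / group_size[g] for g in group_size}
        total = sum(rates.values())
        # Normalise so that the proportions within a cluster sum to 1.
        result[k] = {g: r / total for g, r in rates.items()}
    return result
```

With four drawings from group A and two from group B, a cluster holding one A and two B drawings yields adjusted proportions of 0.2 and 0.8, rather than the raw 1/3 and 2/3, because the smaller group B is entirely contained in that cluster.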

Anthropomorphism

In order to understand whether children represent god(s) as human, we computed a K-means on the χ² dissimilarities D^ANT, with the free parameter θ = 0.5,^{Footnote 12} produced with the anthropomorphism contingency table. The results are detailed in Figs. 9.1, 9.2, and 9.3. As explained before, these results cannot be used to answer the research questions directly, because the features include the anthropomorphic items of the whole drawing and not only of the god figure. However, they do permit us to distinguish some patterns that characterize the available dataset.

For instance, the most remarkable cluster (the third one) includes drawings without human items (Fig. 9.2). As expected, this cluster is highly distant from the others (Fig. 9.1). Moreover, it contains a majority of Swiss children, males, older children and/or drawings collected in a religious context (Fig. 9.3). Cluster 2 is also quite different from the others, containing drawings with an average of more than 10 heads and 20 arms and legs, but fewer than five noses or mouths (Fig. 9.2). It contains drawings with many small anthropomorphic figures that lack details, namely mouths and noses. All the drawings in Cluster 2 come from Switzerland and were produced primarily by females and older children in a religious context (Fig. 9.3). Figure 9.2 also demonstrates that Cluster 9 contains drawings with anthropomorphic figures without hands; Cluster 4, more heads than arms or legs; Cluster 7, more than two heads on average; Clusters 5, 6, and 8, one head on average; and Cluster 1, less than one head on average. Therefore, the first cluster is composed of drawings with one main anthropomorphic figure, without eyes, as seen in the drawing by a Japanese boy presented in Fig. 9.4. Children from Japan, the religious context, and the older group drew most of the representations in this cluster.

Position and Gravity

To understand the position of children’s god representation on the page, we developed two techniques. The first one is based on the position of coloured pixels and takes into account the whole drawing (gravity). The second one is based on an annotation that delimits only the figure of god (position). As explained in the method, the K-means can be applied to a single set of features, such as position or gravity, or to a combination of feature sets, such as position and gravity.

Position

As this feature has only two components, x^c and y^c, the MDS plots exactly the position, up to the orientation of the axes, and thus the sum of the variances explained by the first two dimensions is equal to 100% (see Fig. 9.5).

The first quadrant of Fig. 9.5 (top-right) corresponds to the fourth quadrant of the drawing (bottom-right), the second quadrant of Fig. 9.5 (top-left) to the first quadrant of the drawing (top-right), and so on. Thus, the plot is a 90° counter-clockwise rotation of the page: Cluster 5 represents drawings where god’s representation is in the upper-left position of the drawing, while in Cluster 7, god is represented in the lower part of the drawing. It is important to note that this rotation is due to the fact that the vertical position of god, represented by the first dimension, explains 71% of the variance, compared to only 29% for the horizontal dimension: the vertical position of god turns out to be more effective at differentiating between drawings than the horizontal position.

As shown in Fig. 9.6, the majority of drawings in Cluster 5 (god is at the upper-left position of the drawing) were drawn by Russian children, boys and children aged between 10 and 12 years old and/or met in a public context. Cluster 4 represents drawings where god was drawn on the right part of the page, near the bottom, mainly produced by Swiss children and children met in the religious context. Cluster 7 contains drawings where god was drawn at the bottom. As with Cluster 4, the majority of these representations were drawn by Swiss children. Moreover, these drawings were mainly produced by young children. As in the case of Cluster 5, Clusters 8 and 9, with god drawn at the left middle portion of the page, contain mainly Russian drawings. Moreover, representations in Cluster 8, where god is depicted slightly lower than the centre, were mainly drawn by girls, and those of Cluster 9, where the representation of god is situated higher than the centre, were drawn mainly by children of the middle age group.