1 Introduction

The “Brain Projects” have been widely implemented throughout the world in recent years, such as those in China [40], the USA [5], Europe [1], and Japan [36]. This trend has in turn fueled the burgeoning of research on visual information processing mechanisms in academic fields such as cognitive neuroscience and computer vision (CV) [10, 22, 30, 55]. Often regarded as an ideal image information processing system, the human visual system can quickly recognize objective information such as position, size, shape, color, and orientation, with substantial advantages in stability, robustness, efficiency, and simplicity [56]. For that reason, scholars in cognitive neurobiology, computational neuroscience, and CV have shown growing interest in examining the neural information processing mechanisms of the visual nervous system [13, 23, 38, 39, 41, 44].

Indeed, research on visual information processing mechanisms has kept accelerating as biological techniques have evolved over the past few decades [53]. In 1962, Wiesel and Hubel’s experimental findings on the cat’s primary visual cortex (V1) illustrated the correlation between the receptive fields (RFs) of the lateral geniculate nucleus (LGN) and the RFs of V1, which significantly advanced research in the field of biological vision [18]. In 1971, Dubner and Zeki studied the orientation selectivity of cells in visual area V5, providing early evidence that the middle temporal visual area (MT) is a central region for motion perception [11]. In 1994, Ungerleider and Haxby proposed the theory of ventral and dorsal visual processing streams, providing a physiological basis for the visual system to process motion and static information [51]. In 2002, Riesenhuber and Poggio discovered mutual projection and interaction between the dorsal and ventral pathways; building on this synergistic effect, they further investigated perception under the influence of visual stimuli [43]. In the same year, Yifeng Zhou and Tiande Shou (2002) revealed that the orientation sensitivity of LGN cells could be changed by feedback from the visual cortex. In 2010, Bin Zhu and Tiande Shou reported that visual area V4 has a positive effect on the orientation selectivity of V1 [50]. Further, Jianbo Xiao and Xin Huang characterized the ability of MT cells to distinguish complex orientations, indicating their importance for extracting multiple movement directions [52]. Altogether, these experimental results have contributed considerably to understanding the basic principles of visual information processing [32, 54, 58].

In parallel with neurobiological experiments, a number of neural computational models of the visual system have likewise been put forward. As early as 1982, Marr introduced a relatively comprehensive theory of visual computing informed by research grounded in neurobiology [31]. He argued that visual cognition obtains “what” and “where” information through the “seeing” behavior and that the brain follows the hierarchical processing of visual information and the bottom-up principle. These ideas laid the groundwork for research in subsequent years. In 1999, Riesenhuber and Poggio proposed a model named “HMAX” (hierarchical nonlinear maximum operation) based on V1 cells, mimicking the neural mapping from simple cells to complex cells in V1 [42]. In 2001, widely used computational models of visual attention were brought to the fore, covering environmental stimulus saliency, saliency maps, inhibition of return, attention and eye movements, and scene understanding and object recognition; these models enlarged the knowledge base concerning the neurobiological mechanisms of visual attention [19]. In 2003, Zhaoping Li explored the segmentation and contour enhancement of V1 cells from the perspective of a computational model [57]. In 2006, Schölkopf and colleagues proposed a model of visual saliency based on bottom-up attention, which was then employed to calculate the visual saliency map of the corresponding scene [46]. In 2011, Xianglin Meng and Zhengzhi Wang proposed an attentional-shroud-based enhancement strategy for regions of interest, which is physiologically and psychologically plausible and can be used for region segmentation, target recognition, and scene analysis [33]. In 2014, George et al. proposed a model for texture inhibition and contour enhancement based on the antagonistic and reverse inhibition properties of simple cells in V1 [3]. In the same year, Jeroen et al. proposed a recurrent motion model (RMM) based on the responses of MT cells to their preferred orientations, which can predict the motion perception characteristics of MT cells [21, 48]. In 2015, Chessa et al. proposed a V1–MT neural model for motion estimation, simulating the primary motion pathway from V1 to MT [8]. More specifically, a two-dimensional Gabor filter was used to simulate the RFs of simple cells in V1; the MT cell model was then obtained through a weighted combination of V1 cell responses with regularization and subsequently applied to motion estimation. In 2017, Klaus et al. constructed an interference model of working memory for visual object feature information, based on data from four continuous-reproduction experiments on working memory for color and direction [35]. They concluded that continuous and discrete visual information share the same mechanisms of cue-based retrieval and interference, paving the way for a unified theory of working memory for verbal, spatial, and visual information.

As the above review suggests, exploration of visual information mechanisms has gone through a long developmental period, giving rise to a substantial body of scientific achievements both in neurobiological experiments and in computational neuroscience. Nevertheless, a well-established theory to elucidate the significant phenomenon of visual information dynamic degradation in the visual nervous system is still lacking.

Clearly, visual information dynamic degradation occurs in the visual system. According to the experimental data provided by Anderson and Raichle [2, 41], the real world emanates an essentially unlimited amount of visual information. However, in the visual nervous system, only about 10¹⁰ bits/s are deposited in the retina; from a neurobiological point of view, this corresponds to nearly 1 million axons in each optic nerve. As a result of this limited number of axons in the optic nerves, only about 6 × 10⁶ bits/s leave the retina, and only 10⁴ bits/s reach layer IV of V1 [41, 60]. Thus, in the course of transmission from the retina to layer IV of V1, the visual information is reduced to about 10⁻⁶ of its original amount. Yet this dynamic degradation does not prevent the visual cortex from gaining a complete visual perception of the real world.
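Using the figures just cited, the scale of this degradation can be made explicit as a simple ratio, where the two symbols are merely shorthand for the information rates quoted above:

$$ \frac{I_{\mathrm{V1,\;layer\;IV}}}{I_{\mathrm{retina}}} \approx \frac{10^{4}\ \mathrm{bits/s}}{10^{10}\ \mathrm{bits/s}} = 10^{-6}. $$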

Previous research shows that the retina-LGN-V1 pathway can be described by a convolution calculation approach [60]. This pathway not only exhibits significant visual information dynamic degradation but also extracts edge features efficiently, in line with the principle of energy minimization of brain activity. Moreover, the computational model proposed on the basis of these findings provides quantitative methods for understanding the neural mechanisms of the dynamic degradation mapping from the retina to V1 and produces results that match the experimental data noted above.

As we mentioned earlier, however, the mechanism of visual information mapping from V1 to the secondary visual cortex (V2) remains unclear, both as to whether degradation exists during the mapping process and as to how such visual information can be quantitatively analyzed [20, 47, 59]. These questions are vital to understanding visual information processing in higher-order cortices.

Due to the lack of available models to address these questions, we established a computational model in the current paper to quantitatively predict and analyze the visual information dynamic degradation in the mapping from V1 to V2. The study was informed by the convolution calculation approach for the retina-LGN-V1 pathway [60], the theory of convolutional neural networks (CNNs) [26], and the anatomical architecture between V1 and V2 [14].

The novelty of this study mainly consists in the following three respects. First, CNNs are directly inspired by the classic notions of simple cells and complex cells in the visual system, and their overall architecture draws on the LGN-V1–V2 hierarchy of the visual cortex [26]. Drawing on CNNs, we built a computational model in a previous study whose results were consistent with experimental data, demonstrating its feasibility. Here, we extend that model based on the anatomical architecture between V1 and V2, which offers a new theoretical perspective on visual information processing.

Second, the computational model proposed in the current paper, which comprises six layers simulating photoreceptors, ganglion cells, LGN, V1 (simple and complex cells), and V2, mimics visual information processing. The results indicate that convolution calculation still exists in V1–V2, together with a slight degree of dynamic degradation. Specifically, the visual information of V2 is 0.18 times that of V1, which offers a precise understanding of the visual information mapping mechanism from V1 to V2. In addition, the computational results compensate for the lack of experimental data on V1–V2.

Lastly, the results demonstrate that although the RFs of V2 respond strongly to the “corners” of the visual image [17], they do not extract the feature information to any further degree. It can therefore be concluded that the significant dynamic degradation occurs in the pathway from the retina to V1. In other words, the novel visual information from the real world is processed entirely in the early visual areas and primarily in retina-LGN-V1. On the other hand, following the principle of synaptic plasticity, the RFs of V2 can accurately respond to and encode the scarce “corner” information about the real world. Contour detection (edge and corner detection) in the visual perception of natural scenes thus relies only on the visual information of lower-order areas.

2 Methods

2.1 The visual information changes from retina to V1

The visual system grants animals the capability to perceive the real world [14]. In the ventral pathway of the visual cortex, form perception is gradually refined along the cortical hierarchy from low order to high order [16]. The RFs of V1, V2, and V4 are selective for orientations, angles, and curvatures, respectively.

Light reaches the retina and is then mapped to the LGN and V1; the sequence of visual information processing follows Fig. 1, as shown below [60].

Fig. 1

The visual information processing from the retina to V1. The pathway of photoreceptors-ganglion cells-LGN-V1 is a one-to-one mapping, which involves significant degradation of the visual information flow

The pathway from the retina to V1 is a one-to-one neural mapping [56]. The photoreceptors convert external light signals into bioelectrical signals and deliver them to ganglion cells, from which they are finally transmitted to V1 through the LGN. In this system, about 10¹⁰ bits/s are deposited in the retina, but only 10⁴ bits/s reach V1. Evidently, the change in visual information from the retina to V1 is a dynamic degradation.

The ganglion cell is modeled as an On-center cell, as shown in Fig. 2.

Fig. 2

On-center model of a ganglion cell. The RF of the ganglion cell, composed of photoreceptors, contains an inner circle and an outer circle, which trigger the antagonistic mechanism that realizes the discrimination between light and dark

If light stimuli fall within the inner circle of the RF formed by the photoreceptors, the ganglion cell generates action potentials; if the stimuli fall within the outer circle, the action potentials are inhibited. Consequently, the central and peripheral stimulation responses offset each other, so that ganglion cells are highly sensitive to differences in brightness within their RFs [25, 34]. High sensitivity is positively correlated with visual information degradation, since high sensitivity indicates that the RFs extract features efficiently. In our previous study [60], the convolution calculation between RFs plays an important role in this degradation. Thus, visual information dynamic degradation occurs [41].

The architecture of the RFs of the LGN is the same as that of the RFs of ganglion cells, consisting of two concentric circles [14]. After processing by the LGN, the visual information is transmitted to V1. Similarly, the LGN can also identify the corresponding features, and these characteristics cause the visual information to degrade further after LGN processing.

Both simple cells and complex cells in V1 display a strong response to their specific preferred orientation. The architecture of simple cells is very similar to that of the Gabor filter [45], as shown in Fig. 3. Complex cells have no requirement for specific locations and are an abstraction of simple cells. These characteristics of V1 cells further strengthen the capability of feature detection, which also degrades the visual information.

Fig. 3

Simple cells for different preferred orientations

2.2 The visual information changes from V1 to V2

Section 2.1 briefly introduced and analyzed the reasons for visual information dynamic degradation from the retina to V1 in the visual nervous system. Nevertheless, it remains unknown how the visual information changes during transmission from V1 to V2, and how it changes along the ventral pathway as the cortical order increases.

Sparse coding theory is a critical approach in visual information processing. Owing to the restriction of energy metabolism during brain information processing and signal transmission, only very few neurons process large amounts of visual information [17]. To some extent, the activity of simple cells in V1 can be summarized as a linear function of RFs over a small spatial region, and the Gabor function can be used to represent the two-dimensional mapping characteristics of simple cells [37]. Complex cells are regarded as an abstraction of simple cells. From a purely morphological perspective, there is no significant difference between simple cells and complex cells, which appear to be the same type of cell [15]. Some research results have shown that the functional classification of simple cells and complex cells is not static and that their functions can transform into each other [49]. From the perspective of a computational model, tuning parameters achieves continuous behavior from simple cells to complex cells.

V2 cells are selective for corners [4], which comprise two different line segments joined end to end, each of whose orientations is derived from V1 cells. Consequently, a V2 cell can be represented as the sum of two weighted Gabor filters [56, 61].

According to the hierarchical hypothesis model of the primary visual cortex proposed by Hubel and Wiesel [18], external information arriving at the visual system follows the retina-LGN-V1–V2 pathway. According to the information separation and processing model proposed by Livingstone and Hubel [29], shape, color, motion, and stereopsis are separated in V1 and V2 during the information processing of retina-LGN-V1–V2. Since we focused on the changes in visual information, we paid greater attention to shape. Hosoya and Hyvärinen proposed a model based on a 3-layer network consisting of simple cells, complex cells, and V2 cells [17]. Accordingly, we contend that the visual information processing of retina-LGN-V1–V2 in the visual system, shown in Fig. 4, can be represented by the structural schematic diagram shown in Fig. 5.

Fig. 4

The diagram of visual information processing and transmission

Fig. 5

The structural diagram of visual information from retina to V2

In Fig. 4, the red-brown line represents the transmission of external information by neural mapping from the retina to V2. Based on Fig. 4 and neurobiological experiments [17, 27], we established a structural schematic diagram of visual information being transmitted to V2, as shown in Fig. 5. Figure 5 allows the visual information changes from V1 to V2 to be calculated.

2.3 The analysis of visual information changes from V1 to V2

The changes in visual information from V1 to V2 have long mystified neuroscientists. In other words, there has been no available method to quantitatively analyze the visual information changes from V1 to V2, whether from the perspective of neurobiological experiments or of computational models, which hinders the understanding of the mechanisms of visual information processing. The literature suggests that the edge detection channel of the visual system, that is, the functional channel of retina-LGN-V1, has the characteristic of one-to-one neural mapping [60]. The mapping mechanism from the retina to V1 is closely related to the convolution calculation, which partly causes the significant dynamic degradation. The EDMRV1 model was established based on the photoreceptor-ganglion cell-LGN-V1 pathway. Its simulation results fit well with the experimental data provided by Anderson [2] and clearly explain the dynamic degradation phenomenon, as shown in Fig. 6.

Fig. 6

The visual information dynamic degradation of photoreceptor-ganglion cell-LGN-V1 based on the EDMRV1 model [60]. The line representing the visual information of LGN and V1 is very close to the x-axis. To show that the y-axis values of the LGN and V1 points are not equal to 0, zoomed-in views are presented in two small boxes

Since the visual information is processed by the RFs of simple cells and complex cells in V1, the processed information is directly output to V2 cells. According to the neural mapping from V1 to V2, one RF of V2 is a weighted combination of two RFs of V1 with the same or different preferred orientations [17]. Hence, the visual information is in fact transmitted by way of this connection. Given this connection, we established a visual information detection model based on V2, which can predict and calculate the visual information changes in V2 through quantitative analysis.

As existing research shows [60], in the photoreceptor-ganglion cell-LGN-V1 pathway, the hierarchical transmission of visual information from low-order to high-order visual cortex follows the convolution calculation, which is also the main reason for the visual information changes from the retina to the LGN and V1. Following the convolution calculation through the retina-LGN-V1 pathway, the RFs of V2 are constructed by combining RFs of V1 (Minami & Naokazu, 2011), which indicates that the convolution calculation also exists in the neural mapping from V1 to V2. Therefore, it is reasonable and feasible to use the photoreceptor-ganglion cell-LGN-V1–V2 model to predict and calculate the visual information changes.

Due to the intricate connections between neurons, the characteristic of connections in different RFs is closely related to spike timing-dependent plasticity (STDP) [6, 24, 44], which is also known as pulse-time-dependent plasticity. The connection characteristics are also tightly linked with the orientation selectivity of RFs [7]. STDP comprises two types: long-term potentiation (LTP) and long-term depression (LTD) [14]. The relationship between the sequence of firing and the connection strength determines the detection of image feature information by RFs.

(1) In the edge areas of the image, the presynaptic and postsynaptic neurons produce synchronous, positive discharges with high probability under the LTP effect. At this time, the synaptic connection is continuously strengthened, as expressed in the following:

$$ \text{potentiation}(i,\;j) = t_{\text{post}}(i,\;j) \times \left( 1 + \text{synapse}(i,\;j) \times e^{ - \left| t_{\text{pre}}(i,\;j) - t_{\text{post}}(i,\;j) \right|} \right),\;t_{\text{pre}} < t_{\text{post}} , $$
(1)

where potentiation(i, j) represents the decoded image feature information after the LTP effect in STDP, and synapse(i, j) indicates the strength of the synaptic connection.

(2) In the non-edge areas, the presynaptic and postsynaptic neurons produce non-synchronous, non-positive discharges with high probability under the LTD effect. At this point, the synaptic connections are continuously suppressed, as expressed by the following:

$$ \text{depression}(i,\;j) = t_{\text{post}}(i,\;j) \times \left( 1 - \text{synapse}(i,\;j) \times e^{ - \left| t_{\text{pre}}(i,\;j) - t_{\text{post}}(i,\;j) \right|} \right),\;\text{otherwise,} $$
(2)

where depression(i, j) represents the decoded image edge information after the LTD effect in STDP. In Eqs. (1) and (2), tpre < tpost indicates that the pre- and postsynaptic neurons are positively discharged; conversely, tpre ≥ tpost indicates that they are non-positively discharged. The results of the STDP rule are shown in Fig. 7.

Fig. 7

STDP rule. The potentiation and depression distributions of the weights. The x-axis indicates the values of potentiation(i, j) and depression(i, j); the y-axis indicates the calculation counts, so a higher y-axis value means a higher frequency of occurrence of the corresponding x-axis value
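For illustration, Eqs. (1) and (2) can be applied element-wise over an image grid. The sketch below is a minimal NumPy rendering of the rule; the array names, shapes, and random firing times are assumptions introduced only for demonstration and are not values from the paper.

```python
import numpy as np

def stdp_update(t_pre, t_post, synapse):
    """Element-wise STDP rule of Eqs. (1)-(2).

    t_pre, t_post : firing times of pre-/postsynaptic neurons at each position (i, j)
    synapse       : synaptic connection strength at each position (i, j)
    """
    decay = np.exp(-np.abs(t_pre - t_post))        # e^{-|t_pre - t_post|}
    ltp = t_post * (1 + synapse * decay)           # Eq. (1): edge regions, t_pre < t_post
    ltd = t_post * (1 - synapse * decay)           # Eq. (2): non-edge regions
    return np.where(t_pre < t_post, ltp, ltd)      # choose LTP or LTD per position

# Illustrative call with random firing times and synaptic strengths (hypothetical values).
rng = np.random.default_rng(0)
t_pre, t_post = rng.random((2, 64, 64))
synapse = rng.random((64, 64))
weights = stdp_update(t_pre, t_post, synapse)
```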

To this end, in view of Fig. 5 and in order to advance research on the changes of visual information from V1 to V2 and the mechanism of visual information processing in V2, we designed a 6-layer feedforward network model, namely a predictive model for visual information changes in V2 (PMVICV2).

In our proposed model, layer 1 represents the photoreceptors of the retina. Information from the real world is transmitted to the photoreceptors after being refracted by the lens and is then converted into a bioelectric signal. At this point, the size of the entire image on the retina is denoted as A, which depends on the specific experimental subject. It is assumed that A can be divided into M × N patches. We define the image information on the photoreceptors as I(i, j) (i = 1, 2, 3, …, M; j = 1, 2, 3, …, N); then:

$$ A = \sum\limits_{{j = 1}}^{N} {\sum\limits_{{i = 1}}^{M} {I_{{i,j}} \left( a \right)} } , $$
(3)
$$ a = \Delta i \times \Delta j. $$
(4)

Layer 2 represents the RFs of ganglion cells. We define the visual information in ganglion cells as I2(i, j), and each RF of a ganglion cell as a. After processing by horizontal cells and bipolar cells, I2(i, j) is the visual information transmitted by I1(i, j) to the ganglion cells. One ganglion cell receives signal inputs from 10³ to 10⁴ photoreceptors [56]. Supposing that the RF of a ganglion cell, shown in Fig. 2, can be represented by DOG(i, j), the antagonism of the outer and inner circles can be described by the difference of two Gaussians (DOG) [49]:

$$ DOG\left( {i,\;j} \right)_{\text{ganglion cell}} = DOG_{1}\left( {i,j} \right) - DOG_{2}\left( {i,j} \right) = k_{\text{c}}\, e^{ - \frac{i^{2} + j^{2}}{2r_{\text{c}}^{2}}} - k_{\text{s}}\, e^{ - \frac{i^{2} + j^{2}}{2r_{\text{s}}^{2}}} , $$
(5)

where DOG1 represents the inner circle and DOG2 represents the outer circle. kc and ks indicate the maximum sensitivities of the central and peripheral areas of the RF, respectively. rc and rs represent the radii at which the sensitivities of the central and peripheral areas of the RF drop to e⁻¹ of their maxima.

Each Ii,j(a) contains the visual information of the corresponding RF and the feature information of part of the image. In other words, the RF of each ganglion cell has a corresponding Ii,j(a). Such neural mapping exists extensively in the visual system. A patch Ii,j(a) activates the corresponding ganglion cell and triggers its higher-frequency action potentials.

At this point, the visual information on the RF of the ganglion cell is I2(i, j), shown in the following:

$$ I_{2} \left( {i,\;j} \right) = \left( {I_{1} * {{DOG}}_{{{\text{ganglion}}\;{\text{cell}}}} } \right)\left( {i,\;j} \right). $$
(6)
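Equations (5) and (6) can be illustrated with a short NumPy/SciPy sketch. The kernel size and the values chosen for kc, ks, rc, and rs are assumptions made only for demonstration (the paper does not fix them), and scipy.signal.convolve2d stands in for the neural mapping.

```python
import numpy as np
from scipy.signal import convolve2d

def dog_kernel(size, k_c, k_s, r_c, r_s):
    """Difference-of-Gaussians RF of a ganglion cell (Eq. 5)."""
    half = size // 2
    j, i = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    center = k_c * np.exp(-(i**2 + j**2) / (2 * r_c**2))    # inner circle, DOG1
    surround = k_s * np.exp(-(i**2 + j**2) / (2 * r_s**2))  # outer circle, DOG2
    return center - surround

# Hypothetical parameter choices for illustration only.
dog = dog_kernel(size=9, k_c=1.0, k_s=0.5, r_c=1.0, r_s=2.0)

I1 = np.random.default_rng(0).random((128, 128))  # stand-in for the photoreceptor image
I2 = convolve2d(I1, dog, mode="same")             # Eq. (6): I2 = I1 * DOG
```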

In layer 3, the visual information processed by the ganglion cells of layer 2 is transmitted to the LGN. The RF of the LGN is divided into two antagonistic areas, whose structure and function are very similar to those of the ganglion cell [14]. We therefore still used the DOG model for its representation. We then suppose the visual information on the RF of the LGN in layer 3 is I3(i, j), mapped from layer 2, as shown in the following:

$$ I_{3} \left( {i,\;j} \right) = \left( {I_{2} * {{DOG}}_{{{\text{LGN}}}} } \right)\left( {i,\;j} \right). $$
(7)

In layer 4, the simple cells in V1 have orientation selectivity for features in the image [28]; that is, they are strongly selective for features with specific orientations, and the corresponding neuron responds most strongly at its preferred orientation. The RF is expressed as a two-dimensional Gabor function [49]:

$$ G_{{\lambda ,\theta ,\psi ,\sigma ,\gamma }} \left( {i,j} \right) = e^{{ - \frac{{{i^{\prime 2}} + {\gamma^{2}} j^{\prime 2} }}{{2\sigma ^{2} }}}} \cos \left( {2\pi \frac{{i^{\prime}}}{\lambda } + \psi } \right), $$
(8)
$$ \left\{ {\begin{array}{*{20}c} {i^{\prime} = i\cos \theta + j\sin \theta } \\ {j^{\prime} = - i\sin \theta + j\cos \theta } \\ \end{array} } \right.. $$
(9)

Equation (8) is the product of a Gaussian function and a cosine function. λ is the wavelength, which directly determines the scale of the filter; θ is the orientation of the filter; ψ is the phase shift of the tuning function; γ is the spatial aspect ratio (vertical to horizontal); and σ is the variance of the Gaussian envelope.
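For illustration, the Gabor RF of Eqs. (8) and (9) can be generated as follows. The kernel size and the parameter values (λ, ψ, σ, γ) are assumptions chosen only for demonstration; only the 60° orientation sampling is taken from the text.

```python
import numpy as np

def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Two-dimensional Gabor RF of a V1 simple cell (Eqs. 8-9)."""
    half = size // 2
    j, i = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    i_p = i * np.cos(theta) + j * np.sin(theta)       # rotated coordinates, Eq. (9)
    j_p = -i * np.sin(theta) + j * np.cos(theta)
    envelope = np.exp(-(i_p**2 + gamma**2 * j_p**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * i_p / lam + psi)
    return envelope * carrier

# Three preferred orientations sampled every 60 degrees, as in the text;
# the remaining parameter values are illustrative.
gabors = [gabor_kernel(size=15, lam=6.0, theta=np.deg2rad(t), psi=0.0, sigma=3.0, gamma=0.5)
          for t in (60, 120, 180)]
```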

The RFs in V1 prefer different orientations [14]; we sampled RFs at 3 orientations, every 60°, considering that the V1 area is not the emphasis of the current research. We then suppose the visual information on the RFs of the simple cells in layer 4 is I4(i, j), which follows the neural mapping of layer 3; we also define the RF of V1 as Gaborv1, as shown in the following:

$$I_{4} \left( {i,\;j} \right) = \left( {I_{3} * Gabor_{{{\text{v1}}}} } \right)\left( {i,\;j} \right).$$
(10)

In layer 5, complex cells receive inputs from simple cells with the same orientation preference but at different locations; they are thus an abstraction of the RFs of simple cells. Since the RFs of complex cells have no clear antagonistic area, there is no strict requirement on location, only on orientation selection. Simple cells and complex cells can sometimes be converted functionally. The image information in layer 5 can be written as I5(i, j); therefore, I5(i, j) and I4(i, j) are treated as the same function.

In layer 6, the visual information on the RFs in V2 is the neural mapping from layer 5, denoted as I6(i, j). The RF in V2 is composed of two RFs of V1 (defined as \(\left( {Gabor_{{{\text{v1}}^{\prime}_{{{1}}} }} + Gabor_{{{\text{v1}}^{\prime}_{{{2}}} }} } \right)\left( {i,\;j} \right)\)), whose preferred orientations can be the same or different. The RF in V2 is selective for the angle profile, as shown in the following equation:

$$I_{6} \left( {i,\;j} \right) = \left( {I_{5} * \left( {Gabor_{{{\text{v1}}^{\prime}_{{{1}}} }} + Gabor_{{{\text{v1}}^{\prime}_{{{2}}} }} } \right)} \right)\left( {i,\;j} \right),$$
(11)

where I5(i, j), the visual information reaching layer 5 (that is, the neural mapping from V1 to V2), is equivalent to a series of stimuli acting on the different RFs in V2. The RFs in V2 therefore extract the corresponding features according to the different strengths of the stimuli. To this end, the model of layer 6 is composed of two RFs of V1 with preferred orientations; such a combination forms an angle whose value ranges over [0°, 360°], and each angle also has an orientation, likewise ranging over [0°, 360°]. The details can be seen in Fig. 8.
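A minimal sketch of Eq. (11) builds the V2 RF as the sum of two V1 Gabor RFs whose preferred orientations differ by the corner angle, and convolves it with the layer-5 output. The helper names and all parameter values are illustrative assumptions; the 60° corner and 30° orientation are simply one admissible sample from the angle sets defined below.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Gabor RF of a V1 simple cell (Eqs. 8-9), as in the earlier sketch."""
    half = size // 2
    j, i = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    i_p = i * np.cos(theta) + j * np.sin(theta)
    j_p = -i * np.sin(theta) + j * np.cos(theta)
    return np.exp(-(i_p**2 + gamma**2 * j_p**2) / (2 * sigma**2)) * np.cos(2 * np.pi * i_p / lam + psi)

def v2_rf(size, angle_degree, angle_orientation, **gabor_params):
    """V2 RF as the sum of two V1 Gabor RFs forming a corner (Eq. 11)."""
    theta1 = np.deg2rad(angle_orientation)                  # fixed side of the corner
    theta2 = np.deg2rad(angle_orientation + angle_degree)   # rotated side
    return (gabor_kernel(size, theta=theta1, **gabor_params)
            + gabor_kernel(size, theta=theta2, **gabor_params))

# Example: a 60-degree corner at 30-degree orientation; parameter values are illustrative.
rf_v2 = v2_rf(15, angle_degree=60, angle_orientation=30, lam=6.0, psi=0.0, sigma=3.0, gamma=0.5)
I5 = np.random.default_rng(1).random((128, 128))            # stand-in for the layer-5 output
I6 = convolve2d(I5, rf_v2, mode="same")                      # Eq. (11)
```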

Fig. 8

The RFs in V2. Two RFs of V1 with different orientation selectivity combine into an RF of V2 with varying orientations. a. The angledegree defined in this subfigure. b. The angleorientation defined in this subfigure. In the following text, we use “orientation” in place of “orientationv2”

The RFs in V2 have different preferred angles according to varying degrees and orientations. In the current study, every 30° is used as the sampling angle; the resulting 12 angles and 12 orientations are shown in Fig. 9.

Fig. 9

The RFs (two gray sides, each of which is an RF of V1) in V2 with 12 angles and 12 orientations. a. The angles of the RFs equal 0° in different orientations; the dark gray indicates that the two sides of the RFs overlap. b. The angles of the RFs equal 30° in different orientations. c. The angles of the RFs equal 60° in different orientations. d. The angles of the RFs equal 330° in different orientations

As shown in the first row of Fig. 9, each angle has two sides, both composed of V1 RFs; one side is fixed and the other is rotated, and the two together form angles of different degrees. In the first column, the shape of the angle is fixed, and rotation produces different orientations of the angle. Each side of the angle is an RF of V1 with a specific orientation preference. Among the RFs in V2, the angles in the first column are 0°, as shown in (a) of Fig. 9; those in the second column are 30°, as shown in (b); those in the third column are 60°, as shown in (c); and those in the fourth column are 330°, as shown in (d).

The angles with different degrees and orientations in layer 6 are defined as follows:

(1) angledegree (see (a) of Fig. 8) is indicated in Eq. (12):

$$ \left\{ {angle_{{{\text{degree}}}} |angle_{{{\text{degree}}}} = 30^\circ \times n,\;n \in \left[ {0,\;11} \right]\;{\text{and}}\;n \in N} \right\}. $$
(12)

(2) angleorientation (see (b) of Fig. 8) is shown in Eq. (13):

$$\left\{ {angle_{{{\text{orientation}}}} |angle_{{{\text{orientation}}}} = 15^\circ \times n,\;n \in \left[ {0,\;23} \right]\;{\text{and}}\;n \in N} \right\}.$$
(13)

Under the effect of the visual information stimuli from layer 5, layer 6 performs convolution calculations with RFs of varying angles and orientations in V2 and finally obtains the RF responses in V2.

3 Results and analyses

3.1 Simulation

According to the description of the PMVICV2 model above, the following four diverse scenarios are used as experimental examples. In the V1 area, the sampling angle is 60°; in the V2 area, the sampling angles are 30°, 60°, 90°, 120°, 150°, and 180°, sequentially. In addition, since each row of Fig. 9 contains 4 RFs with different orientations, we used four sampling orientations in the following experiments. The phenomenon of dynamic degradation exists on the photoreceptor-ganglion cell-LGN-V1 pathway and on the V1–V2 pathway. Each pixel of the images was encoded in one byte.
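The experiments below report visual information in bits for each layer, but the counting rule is not spelled out beyond the one-byte-per-pixel encoding. The sketch below illustrates one plausible reading, counting 8 bits for every above-threshold pixel in a layer’s response map; the function name, the zero threshold, and the stand-in arrays are assumptions for illustration only.

```python
import numpy as np

def visual_information_bits(response, threshold=0.0, bits_per_pixel=8):
    """Count 8 bits (one byte) for every pixel whose response magnitude exceeds the threshold.

    This counting rule is an assumption made for illustration; the paper only
    states that each pixel is encoded in one byte.
    """
    return int(np.count_nonzero(np.abs(response) > threshold)) * bits_per_pixel

# Example: a dense 512 x 512 photoreceptor image versus a sparse edge/corner response map.
rng = np.random.default_rng(0)
photoreceptor = rng.random((512, 512))                                # nearly every pixel carries signal
sparse_response = np.where(rng.random((512, 512)) > 0.99, 1.0, 0.0)   # roughly 1% of pixels respond

print(visual_information_bits(photoreceptor))    # about 2.10 x 10^6 bits (512 x 512 x 8)
print(visual_information_bits(sparse_response))  # roughly 2 x 10^4 bits
```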

3.1.1 Experiment of the portrayal of Lena

As shown in (a) of Fig. 10, the picture of Lena (original image), whose resolution was 512 × 512, was used as the experimental object. The optical signal reached the photoreceptors, whose visual information was 2.10 × 10⁶ bits. Subsequently, the visual information was processed by cones and rods and then reached the ganglion cells for processing. Since the RFs of ganglion cells have antagonistic properties that are highly sensitive to changes between light and dark, the edge feature information of the image could be detected. Dynamic degradation occurred after the visual information was transmitted to the LGN for processing. Simple cells with different preferred orientations, defined as θ = 60°, 120°, and 180°, actively responded to the image information and recognized the edge feature information at the specific orientation. The visual information in V1 was 1.34 × 10³ bits, 1.07 × 10³ bits, and 1.14 × 10⁴ bits, respectively, about 6.39 × 10⁻⁴, 5.11 × 10⁻⁴, and 5.43 × 10⁻³ times that of the photoreceptors. The visual information in V1 was obtained through processing by the RFs of ganglion cells and those of the LGN. Finally, the visual information was transmitted from V1 to V2. The RFs in V2 have strong responses to the different corresponding angles and orientations, which can identify the image feature information. These angles are denoted as angledegree, shown in Eq. (14):

$$ \left\{ {{{angle}}_{{{\text{degree}}}} |{{angle}}_{{{\text{degree}}}} = 30^\circ \times n,\;n \in \left[ {1,\;6} \right]\;{\text{and}}\;n \in N} \right\}. $$
(14)
Fig. 10

Lena image and PMVICV2 model responses. a. The Lena original image and the processed images in V1 (60°) and V2 (30°, 60°, 90°, 120°, 150°, and 180°, with the same orientation). b. The results of a: dynamic degradation of the Lena image processed by the PMVICV2 model. The line representing the visual information of V1 and V2 is very close to the x-axis. To show that the y-axis values of the V1 and V2 points are not equal to 0, zoomed-in views are presented in two small boxes

The processed images are shown in (a) of Fig. 10. The visual information in V2 is shown in Table 1.

Table 1 Visual information in V2 of the experiment of Lena (Unit: bits)
Table 2 Relationship between photoreceptors and V2 of the experiment of Lena

The comparison of visual information between the photoreceptors and V2 is shown in Table 2.

From the above analysis, taking the image of Lena as the experimental object, we have shown the changes of visual information from the retina to V1 and V2, as presented in (b) of Fig. 10. The average visual information of the photoreceptors was 2.10 × 10⁶ bits; the average value in V1 was 4.60 × 10³ bits; and the average value in V2 was 4.26 × 10² bits. These values demonstrate that the visual information degrades significantly from the photoreceptors to V1: the visual information in V1 was 2.20 × 10⁻³ times that of the photoreceptors. During the processing from V1 to V2, dynamic degradation still existed but was slight; the visual information in V2 was 9.25 × 10⁻² times that of V1.

3.1.2 Experiment of the island of Manhattan

As shown in (a) of Fig. 11, the Manhattan image (original image), whose resolution was 1023 × 674, was used as the experimental object. The photoreceptors received the optical signal, whose visual information was 5.52 × 10⁶ bits. Afterward, the visual information was processed by cones and rods and then passed to the ganglion cells for processing. As the RFs of ganglion cells have antagonistic properties, the edge features of the image could be detected. Dynamic degradation occurred after the visual information was transmitted to the LGN for processing. Simple cells with different preferred orientations, defined as θ = 60°, 120°, and 180°, actively responded to the image information and recognized the edge feature information at the specific orientation. The visual information in V1 was 3.52 × 10³ bits, 3.36 × 10³ bits, and 6.62 × 10⁴ bits, respectively, about 6.39 × 10⁻⁴, 6.09 × 10⁻⁴, and 1.20 × 10⁻² times that of the photoreceptors. The visual information in V1 was obtained through processing by the RFs of ganglion cells and those of the LGN. Finally, the visual information was transmitted from V1 to V2. The RFs in V2 had strong responses to the different corresponding angles and orientations denoted as angledegree, which can identify the image feature information, as shown in Eq. (14). The processed images are shown in (a) of Fig. 11. Lastly, the visual information in V2 is shown in Table 3.

Fig. 11

Manhattan image and PMVICV2 model responses. a. The Manhattan original image and the processed images in V1 (60°) and V2 (30°, 60°, 90°, 120°, 150°, and 180°, with the same orientation). b. The results of a: dynamic degradation of the Manhattan image processed by the PMVICV2 model. The line representing the visual information of V1 and V2 is very close to the x-axis. To show that the y-axis values of the V1 and V2 points are not equal to 0, zoomed-in views are presented in two small boxes

Table 3 Visual information in V2 of the experiment of Manhattan (Unit: bits)
Table 4 Relationship between photoreceptors and V2 of the experiment of Manhattan

The comparison of visual information between the photoreceptors and V2 is shown in Table 4.

From the above analysis, taking the image of the island of Manhattan as the experimental object, we have shown the changes of visual information from the retina to V1 and V2, as presented in (b) of Fig. 11. The average visual information of the photoreceptors was 5.52 × 10⁶ bits; the average value in V1 was 2.43 × 10⁴ bits; and the average value in V2 was 1.65 × 10³ bits. These values demonstrate that the visual information degraded significantly from the photoreceptors to V1: the visual information in V1 was 4.41 × 10⁻³ times that of the photoreceptors. During the processing from V1 to V2, dynamic degradation still existed but was slight; the visual information in V2 was 6.77 × 10⁻² times that of V1.

3.1.3 Experiment of the harbor of Sydney

As shown in (a) of Fig. 12, the Sydney image (original image), whose resolution was 1663 × 934, was used as the experimental object. The photoreceptors received the optical signal, whose visual information was 1.24 × 10⁷ bits. Subsequently, the visual information was processed by cones and rods and then passed to the ganglion cells. The RFs of the ganglion cells readily detected the edge features of the image. Dynamic degradation occurred after the visual information was transmitted to the LGN. Simple cells with different preferred orientations were defined as θ = 60°, 120°, and 180°; their visual information in V1 was 7.17 × 10³ bits, 6.83 × 10³ bits, and 2.02 × 10⁴ bits, respectively, about 5.77 × 10⁻⁴, 5.50 × 10⁻⁴, and 1.63 × 10⁻³ times that of the photoreceptors. The visual information in V1 was obtained through processing by the RFs of ganglion cells and the LGN. Finally, the visual information was transmitted from V1 to V2. The RFs in V2 had strong responses to the different corresponding angles and orientations denoted as angledegree, which can identify the image feature information, as shown in Eq. (14). The processed images are shown in (a) of Fig. 12. Lastly, the visual information in V2 is shown in Table 5.

Fig. 12

Sydney image and PMVICV2 model responses. a. The Sydney original image and the processed images in V1 (60°) and V2 (30°, 60°, 90°, 120°, 150°, and 180°, with the same orientation). b. The results of a: dynamic degradation of the Sydney image processed by the PMVICV2 model. The line representing the visual information of V1 and V2 is very close to the x-axis. To show that the y-axis values of the V1 and V2 points are not equal to 0, zoomed-in views are presented in two small boxes

Table 5 Visual information in V2 of the experiment of Sydney (Unit: bits)
Table 6 Relationship between photoreceptors and V2 of the experiment of Sydney

The comparison of visual information between the photoreceptors and V2 is shown in Table 6.

From the above analysis, taking the image of Sydney as the experimental object, we have shown the changes of visual information from the retina to V1 and V2, as illustrated in (b) of Fig. 12. The average visual information of the photoreceptors was 1.24 × 10⁷ bits; the average value in V1 was 1.14 × 10⁴ bits; and the average value in V2 was 4.26 × 10³ bits. These values demonstrate that the visual information degrades significantly from the photoreceptors to V1: the visual information in V1 was 9.17 × 10⁻⁴ times that of the photoreceptors. During the processing from V1 to V2, the dynamic degradation was slight; the visual information in V2 was 3.74 × 10⁻¹ times that of V1.

3.1.4 Experiment of Mount Fuji

As shown in (a) of Fig. 13, the Mount Fuji image (original image), whose resolution was 3840 × 2160, was used as the experimental object. The photoreceptors received the optical signal, whose visual information was 6.64 × 10⁷ bits. Subsequently, the visual information was processed by cones and rods and then passed to the ganglion cells, where the edge features of the image could be detected. Dynamic degradation occurred after the visual information was transmitted to the LGN. Simple cells with different preferred orientations were again defined as θ = 60°, 120°, and 180°. The visual information in V1 was 22.4 bits, 26.9 bits, and 5.83 × 10² bits, respectively, about 3.38 × 10⁻⁷, 4.05 × 10⁻⁷, and 8.79 × 10⁻⁶ times that of the photoreceptors. The visual information in V1 was obtained through processing by the RFs of ganglion cells and those of the LGN. Finally, the visual information was transmitted from V1 to V2. The RFs in V2 had strong responses to the different corresponding angles and orientations denoted as angledegree, identifying the image feature information, as shown in Eq. (14). The processed images are displayed in (a) of Fig. 13. Lastly, the visual information in V2 is shown in Table 7.

Fig. 13

Mount Fuji image and PMVICV2 model responses. a. The Mount Fuji original image and the processed images in V1 (60°) and V2 (30°, 60°, 90°, 120°, 150°, and 180°, with the same orientation). b. The results of a: dynamic degradation of the Mount Fuji image processed by the PMVICV2 model. The line representing the visual information of V1 and V2 is very close to the x-axis. To show that the y-axis values of the V1 and V2 points are not equal to 0, zoomed-in views are presented in two small boxes

Table 7 Visual information in V2 of the experiment of Mount Fuji (Unit: bits)

The comparison of visual information between the photoreceptors and V2 is shown in Table 8.

Table 8 Relationship between photoreceptors and V2 of the experiment of Mount Fuji

From the above analysis, drawing on the image of Mount Fuji, we have shown the changes of visual information from the retina to V1 and V2, as presented in (b) of Fig. 13. The average visual information of the photoreceptors was 6.64 × 10⁷ bits; the average value in V1 was 2.11 × 10² bits; and the average value in V2 was 7.51 × 10² bits. These values demonstrate that the visual information degraded significantly from the photoreceptors to V1: the visual information in V1 was 3.18 × 10⁻⁶ times that of the photoreceptors. In the subsequent processing from V1 to V2, however, the visual information did not degrade; the value in V2 was 3.57 times that of V1.

3.2 Results and analyses

Based on the above experimental images, the visual information at the photoreceptors in the PMVICV2 model was 2.10 × 10⁶ bits, 5.52 × 10⁶ bits, 1.24 × 10⁷ bits, and 6.64 × 10⁷ bits, respectively; the average value was 2.16 × 10⁷ bits, as shown in (a) of Fig. 14.

Fig. 14

Results and analyses of visual information dynamic degradation in the four experiments. a. Visual information of the photoreceptors. b. Visual information of V1. c. Visual information of V2. d. Based on a to c, subfigure d shows the visual information dynamic degradation of photoreceptor-V1–V2. The lines representing the visual information of V1 and V2 overlap and are very close to the x-axis. To show that the y-axis values of the V1 and V2 points are not equal to 0, zoomed-in views are presented in three small boxes

After the visual information was transmitted through the RFs of the ganglion cells and the LGN to the V1 area, the values were 4.60 × 10³ bits, 2.43 × 10⁴ bits, 1.14 × 10⁴ bits, and 2.11 × 10² bits, respectively; the average value was 1.01 × 10⁴ bits, as shown in (b) of Fig. 14.

Ultimately, the processed visual information was transmitted from V1 to V2; its values were 4.26 × 10² bits, 1.65 × 10³ bits, 4.26 × 10³ bits, and 7.51 × 10² bits, respectively, with an average value of 1.77 × 10³ bits, as shown in (c) of Fig. 14.

From (a) to (c) of Fig. 14, the visual information changes of the PMVICV2 model in these four scenarios can be obtained, as shown in (d) of Fig. 14 and Table 9. The visual information transmitted to V2 was 8.19 × 10⁻⁵ times that of the photoreceptors and 0.18 times that of V1; the degradation percentage was 99.992% (kept to three decimal places for accuracy). Despite the different test images, there were no significant differences across the experimental results. It can therefore be concluded that the significant dynamic degradation occurred from the photoreceptors to V1 along the photoreceptor-ganglion cell-LGN-V1–V2 pathway, while the subsequent transmission from V1 to V2 involved only slight dynamic degradation. Taken together, the significant dynamic degradation existed in the photoreceptor-ganglion cell-LGN-V1 pathway, in which only the substantial differences between light and dark were retained by the convolution calculation, yielding the edge signal of the image. In the visual information processing of the V1–V2 pathway, although the RFs in V2 had strong responses to corners, they did not further extract image features, which partly accounts for the small dynamic degradation.

Table 9 Visual information changes in four experimental scenarios from PMVICV2 model

4 Conclusions

Taking into account energy metabolism, the brain’s capacity for fully transmitting visual information into the visual cortex is in fact limited, which inevitably leads to visual information degradation. How, then, can the brain perceive the environment efficiently? Chumbley and Friston contend that surprise, captured by prediction error (defined as the difference between observed and expected quantities), drives learning [9, 12]. Our previous research showed that one reason for the degradation, which is related to prediction error, is that retina-LGN-V1 contains the convolution calculation, which acts to extract the pivotal visual information and ignore the unnecessary, thus effectively saving the brain’s power consumption. These findings served as a further elaboration of the “prediction error” proposed by Friston. Building on this discovery, we were driven to further explore the visual information degradation or changes in V1–V2. As a result, in undertaking this study, we sought to shed light on the mechanism by which visual information is mapped from V1 to V2. By establishing an original PMVICV2 model and conducting a quantitative analysis, we reached four major conclusions, stated as follows:

(1) A quantitative description of visual information degradation in V1–V2.

According to the results of the PMVICV2 model summarized in Table 9, the visual information in V2 is 8.19 × 10⁻⁵ times that of the photoreceptors and 0.18 times that of V1. This yields an exact quantitative interpretation of the visual information dynamic degradation in V2, obtained by developing and experimenting with a new computational model. In doing so, it complements previous research, in which neuroscientific experiments on dynamic degradation focused chiefly on V1, and promotes a more accurate and specific understanding of the way visual information is encoded and managed in V2.

(2) A strong response to “corner” information, but only slight degradation in V1–V2.

While moving from low-order to high-order visual signal processing, the visual information degrades significantly along the photoreceptor-ganglion cell-LGN-V1 pathway [41, 60]. However, according to (d) of Fig. 14 and Table 9, hardly any dynamic degradation is observed during the mapping from V1 to V2. Although the RFs in V2 exhibit a strong response to “corner” information [17], they do not further extract image feature information. This demonstrates that significant dynamic degradation occurs only along the photoreceptor-ganglion cell-LGN-V1 pathway, leaving limited visual information in V1 for the RFs in V2 to encode. This is a new finding that has not been noticed before.

(3) Convolution calculation in V1–V2.

During visual information processing [26], the convolution calculation is found along the photoreceptor-ganglion cell-LGN-V1 pathway [60]. Moreover, in the anatomical architecture between V1 and V2, one RF of V2 is a weighted combination of two RFs of V1 [17], suggesting that the convolution calculation also exists in V1–V2.

(4) The STDP rule makes the response to “corner” information more effective.

As mentioned regarding Fig. 7, the STDP rule intensifies the edges of the image and moderates the non-edge regions. Therefore, the RFs of V2 can effectively respond to and encode “corner” information about the real world, coping with the scarcity of the visual information mapped from V1.

Despite the quantitative calculation and interpretation of the visual information changes in V1–V2, this study also has limitations. Structurally, we did not take all the details of retina-LGN-V1–V2 into account, because the human visual system is complicated (see Fig. 4) and its visual information processing mechanisms have not been fully uncovered [41, 60]. We therefore concentrated on basic contour features such as edges and corners, which are considered highly relevant to the degradation. Furthermore, we did not consider top-down predictions, since the novel visual information of the real world mapped from the retina to V2, which involves the degradation, is a bottom-up transmission. According to Chumbley and Friston [9, 12], bottom-up inputs generate prediction errors, which originate from the novel visual information and are linked to the degradation. The mutual exchange of bottom-up prediction errors and top-down predictions from higher-order areas proceeds until the prediction error is minimized, meaning that the degradation during the mapping from the retina to higher-order areas can likewise be minimized. This complex operative mechanism merits continued investigation in our future research.