Object Categorization at the Higher Levels Do With More Neurons Than Finer Levels and Takes Faster

Humans can categorize an object at three levels. For example, a car can be categorized as a vehicle (superordinate), a ground vehicle (basic) or a car (subordinate). These three modes are referred to as the semantic levels of categorization. The same object is categorized with different speed and accuracy at each level. Although much research has been done in this context, the ordering of these levels in accuracy and reaction time, and the reason for the differences, remain open questions. In this paper, we examine the order of these levels and the reason for their differences. We show a superordinate advantage, followed by the basic level and then the subordinate level. To this end, we first design an experiment to examine the semantic levels in the human visual system. The result of this experiment is a superordinate advantage: at the superordinate level, reaction time was lower and accuracy was higher than at the other levels. In addition, we introduce a computational model that incorporates time and can classify at the semantic levels. The model is trained on ten categories, which are treated as the subordinate level; from them, five basic-level and two superordinate-level categories are formed. We found that at higher levels, such as the superordinate, more neurons participate in the categorization task, so the outcome is faster and more accurate. Moreover, the proposed model was tested on inverted input images to verify the model.


I. INTRODUCTION
The processing time of the visual cortex is a useful tool for studying the different layers of the visual cortex and understanding its hierarchy [1]. Note that the processing time of different areas of the visual cortex depends on the type of problem. In addition, image processing is carried out quickly in the visual cortex and can be divided into two parts, primary processing and high-level processing [2]-[5].
The visual cortex is fed with image information at stimulus onset. It has been shown that the human visual system can react quickly, with an average reaction time of approximately 250 ms to 400 ms [6], while processing in the visual cortex proceeds bottom-up [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Junhua Li.
In fact, information is first relayed by the lateral geniculate nucleus (LGN), and in terms of the information hierarchy in the visual cortex, V1 is the first area that receives it. Then, in the ventral pathway, the information is fed to V2 and V4. Afterward, it is fed to the inferior temporal (IT) cortex, which comprises two sections, the posterior inferior temporal (PIT) and the anterior inferior temporal (AIT) cortex. Note that IT, the last stage of the visual cortex, contains neurons that spike only to specific stimuli. Finally, information is fed to the prefrontal cortex (PFC) and the motor cortex (MC). The delay in these last two areas is considered constant [8]-[11].
Semantic levels are divided into three levels: superordinate, basic and subordinate [12]. The superordinate is the highest level of categorization and carries a high degree of abstract and general information. The basic level is the fundamental level of categorization; however, it is not able to offer any particular configuration and gestalt. Most object categorizations in daily human life are made at this level. The basic level, which lies between the other two levels, concerns the objects' existence, and the most common features are categorized under this level. Finally, the subordinate level is the lowest level of categorization and classifies objects at the lowest degree of generality. These three levels differ in the resolution and detail they require. For example, one can categorize an object as a vehicle, a ground vehicle or a car (superordinate, basic and subordinate levels, respectively).
A number of psychophysical studies have shown more rapid perceptual access to basic-level category information than to the higher or lower levels, called the basic-level advantage [13]. Nevertheless, other psychophysical results challenge the basic-level advantage by demonstrating faster access to superordinate-level category information [14], [15]. Therefore, exploring the temporal sequence of object categorization at different levels of the hierarchy is one of the keys to understanding the neural mechanism of object categorization in the brain [16]. At a glance, humans can detect an object, recognize it as an animal or non-animal, categorize it as a marine or domestic animal, or finally identify it as a dog or cat. A crucial point is to know whether categorization time depends on the level of categorization. If so, which level is detected earlier?
In 1976, Rosch et al. first raised the question of which level of categorization occurs fastest [17]. They conducted an experiment in which labels such as 'Animal', 'Bird' and 'North Cardinal Bird' were shown to subjects, who were asked to sort stimuli according to the shown label. The basic level was found to be faster and stronger than the other two levels.
Another well-known study [18] shows that the categorization pathway differs between levels. In the experiment, participants categorized stimuli in upright and inverted orientations at the superordinate and basic levels. Surprisingly, accuracy at the superordinate level was very high and the same in both orientations; however, accuracy at the basic level dropped in the inverted orientation. The presence of different pathways for categorization was proposed as the reason.
Another interesting study of categorization levels explains that categorization performance in the visual system is based on perceived similarity relations between items within and outside the category [19]. In this study, classifying distractors and atypical objects took longer, and, most importantly, the ordering superordinate, basic, subordinate was established in terms of speed and accuracy. The authors attributed this ordering to the different processing speeds of coarse and fine information.
A recent study [14] reports a variety of experiments in which the superordinate level is faster in all cases. The conditions examined include variations in the stimuli and in stimulus onset time; in all experiments, the superordinate level is reported to be faster.
In 2015, a study [20] examined the speed of categorization at different levels. Numerous experiments revealed that the speed of categorization at different levels is associated with stimulus presentation time. It was therefore concluded that, as the display time increases, the superordinate advantage turns into a basic-level advantage.
In addition, another study [21] examined neural responses in the inferior temporal (IT) cortex of macaque monkeys viewing a large number of object images. It reported a basic-level advantage over the other two levels.
A recent study [22] investigated the role of spatial frequency in object categorization at different levels. The authors filtered images into low spatial frequency (LSF), intermediate spatial frequency (ISF) and high spatial frequency (HSF) bands and examined three levels of categorization with an HMAX model. They explained that the basic and subordinate levels require higher spatial frequencies and that accuracy at the basic level is higher than at the subordinate level. Also, in the HSF band, all levels yield similar accuracy. The study proposes the spatial frequency of the stimulus as the cause of the differences in categorization speed between levels.
A review of the levels of categorization can help in understanding the functionality of the human visual system. It can potentially explain the classification mechanisms in the visual cortex and address two questions: which level of categorization is faster, and which level is more accurate?
In this paper, we first present a psychophysical experiment performed on ten participants. We asked the participants to categorize vehicles, ground vehicles and cars according to a label shown on each trial. The vehicle, ground vehicle and car categories are considered the superordinate, basic and subordinate levels, respectively. The result of the experiment demonstrates a superordinate advantage. Then, a model based on the human visual system that includes time is proposed. The model is trained on ten categories. A superordinate advantage is also observed in the model results. A possible reason for these results is that more neurons are activated for the category at higher levels.

A. PARTICIPANTS
In total, ten volunteers (seven males and three females) took part in this experiment. Their ages ranged from 20 to 31 years, with a mean age of 23 years; one was left-handed. The experiment was approved by the ethics committee of the Iran University of Medical Sciences. All participants had normal or corrected-to-normal vision.

B. DATASET
We used images (255 × 255 pixels, subtending approximately 7° × 7° of visual angle) of eight object categories: fish, dolphins, cats, dogs, airplanes, helicopters, motorbikes and cars. Most of the images were selected from the Chandler dataset [23], and the others were gathered from publicly available sources on the internet. The categorization levels were vehicle/non-vehicle for the superordinate level, ground vehicle/non-ground vehicle for the basic level and car/non-car for the subordinate level. Figure 1 illustrates a number of images from the data set for each task. The distractors consist of everything except the target. The subjects had no prior knowledge of the target in the picture.

C. TASK AND SET-UP
To conduct the experiment, the subjects were seated in a dark room about 50 cm away from a computer screen (Intel Core 2 Duo processor at 2.66 GHz, 4 GB RAM, 85 Hz monitor refresh rate), and the MATLAB Psychophysics Toolbox [24]-[26] was used to run a speeded category verification task. All images were converted to 256 × 256 pixel grayscale and divided into two blocks. Each block contained 146 images (73 animals and 73 vehicles). Each volunteer responded to both image blocks (10 subjects × 2 blocks = 20 completed blocks), so we obtained 10 different reaction times for each image.
Each trial started with a fixation point shown for 1000 ms to center the participant's gaze, followed immediately by a category label (vehicle, ground vehicle or car) shown for 800 to 1200 ms. After that, the test image was displayed for 25 ms, and then a noise image remained on the screen for 100 ms, followed by a blank screen until the participant responded (see Figure 2). Participants responded by pressing a 'Yes' key if the label matched the object shown in the stimulus image, and a 'No' key if it did not. One-third of the category verifications were made at the superordinate level (car, motorbike, airplane, helicopter), one-third at the basic level (ground vehicles) and one-third at the subordinate level (car). Note that a response counts as correct only if it correctly reports whether the category label and the object in the stimulus match.
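On a fixed-refresh display, a stimulus can only be shown for a whole number of frames, so nominal durations are effectively rounded to the nearest frame. The following is a minimal sketch of that arithmetic for the 85 Hz monitor described above. The helper name `to_frames` is hypothetical and is not part of the Psychophysics Toolbox; rounding to the nearest frame is an assumption about how the durations were realized.

```python
def to_frames(duration_ms, refresh_hz=85.0):
    """Return (frame count, realized duration in ms) for a nominal duration.

    Assumption: the nominal duration is rounded to the nearest whole frame,
    with a minimum of one frame.
    """
    n_frames = max(1, round(duration_ms * refresh_hz / 1000.0))
    return n_frames, n_frames * 1000.0 / refresh_hz


# Nominal durations taken from the task description.
stim_frames, stim_ms = to_frames(25)     # test image
fix_frames, fix_ms = to_frames(1000)     # fixation point

print(stim_frames, round(stim_ms, 1))
print(fix_frames, round(fix_ms, 1))
```

Under these assumptions, the nominal 25 ms stimulus corresponds to 2 frames (about 23.5 ms), while the 1000 ms fixation period corresponds to exactly 85 frames.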

D. EXPERIMENTAL RESULTS
We observed high accuracy in all three categorization tasks: median 93% at the superordinate, 90% at the basic and 89% at the subordinate level (see Figure 3B). As mentioned previously, we obtained 10 different reaction times (RTs) for each image. Correct responses with a very high (>1000 ms) or very low (<200 ms) RT were not considered acceptable. Therefore, RTs greater than 1000 ms were eliminated from the analysis first. Also, only images with more than six correct responses were analyzed. The RT for each image was computed as the median RT over all correct responses. Figure 3A shows the median and the within-class and between-class variance of RT. As can be seen, the levels are well separated in terms of the average and variance of each level. Responses at the superordinate level are faster than at the basic level, and responses at the basic level are faster than at the subordinate level. In this experiment, we thus found a superordinate-level advantage.
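The per-image RT computation described above can be sketched as follows. This is an illustrative sketch rather than the analysis code used in the study: the function name `image_rt` and the toy responses are hypothetical, both RT bounds are applied, and the "more than six correct responses" criterion is assumed to be checked after the RT filter.

```python
from statistics import median

RT_MIN_MS, RT_MAX_MS = 200, 1000   # acceptance bounds quoted in the text
MIN_CORRECT = 7                    # "more than six correct responses"


def image_rt(responses):
    """Median RT of the valid correct responses for one image.

    responses: list of (rt_ms, is_correct) pairs, one per participant.
    Returns None when the image does not have enough valid correct responses.
    Assumption: the count criterion is applied after RT filtering.
    """
    valid = [rt for rt, ok in responses if ok and RT_MIN_MS <= rt <= RT_MAX_MS]
    if len(valid) < MIN_CORRECT:
        return None
    return median(valid)


# Toy data for one image (made up for illustration only).
toy = [(400, True)] * 7 + [(1200, True), (150, True), (500, False)]
kept = image_rt(toy)                     # seven valid correct responses remain
dropped = image_rt([(400, True)] * 6)    # too few correct responses
```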

III. MODEL
A. BASIC MODEL
In 2007, a novel temporal model based on spike-timing-dependent plasticity (STDP) was introduced [27]. STDP is a learning rule for temporal models and involves two important concepts, long-term potentiation (LTP) and long-term depression (LTD). Here, LTP occurs when the presynaptic neuron fires a few milliseconds before the postsynaptic neuron, and LTD occurs in the opposite case. A simplified version of STDP is used in the spiking HMAX model [28].
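For reference, the simplified STDP rule is commonly written in the form below. This is a reconstruction that assumes the formulation of [27], in which the factor ω(1 − ω) acts as a soft bound keeping the weights between 0 and 1, and a− is taken to be negative:

```latex
\Delta\omega_{ij} =
\begin{cases}
a^{+}\,\omega_{ij}\,(1-\omega_{ij}), & \text{if } t_j - t_i \le 0 \quad \text{(LTP)},\\[4pt]
a^{-}\,\omega_{ij}\,(1-\omega_{ij}), & \text{if } t_j - t_i > 0 \quad \text{(LTD)},
\end{cases}
```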
where i and j refer to the indices of the post- and presynaptic neurons, respectively, t_i and t_j are the corresponding spike times, and Δω_ij is the synaptic weight correction. In addition, a+ and a− are two parameters representing the learning rates. This model, which is based on a spiking network, looks for stable and discriminative features in the input data. The spiking model consists of five layers (S1, C1, S2, C2 and a classifier), with a layout similar to the HMAX model but a different structure. Note that S1 detects edges by convolving incoming images with Gabor filters. This layer represents the simple cells at the first stage of the visual cortex, as described by Hubel and Wiesel [29].
The Gabor filter kernels used in this model are 5 × 5 matrices at four orientations, π/8, π/4 + π/8, π/2 + π/8 and 3π/4 + π/8, where the π/8 offset avoids focusing on vertical and horizontal edges. These filters are applied to five different sizes of the original image (100%, 71%, 50%, 35% and 25%). Therefore, as shown in Fig. 4, there are 20 images at the output of this layer. The C1 layer is analogous to the complex cells of the visual cortex, and each C1 complex cell computes a local maximum over S1 cells. To calculate the local maximum, a 7 × 7 window is moved over the S1 output images in steps of 6, and the maximum within each window is stored as output. Afterwards, for the four images that correspond to the four orientations at a single scale, a pixel-wise maximum is taken. These maxima are saved in a matrix containing the coordinates, scale and orientation, together with a parameter called Time Latency. Time Latency is calculated as

Time Latency = 1 / max(a)

where max(•) is the maximization operator and a denotes the pixel values of the four images. Figure 5 illustrates the C1 layer procedure. Integrating the five output matrices results in a 5 × 7225 matrix. This matrix is then sorted by Time Latency.
In the next step, the data are fed to S2. Note that the S2 units correspond to mid-level visual features. Each neuron is described by a weight matrix, normally 16 × 16, for each of the four orientations; the weights are initialized randomly with values between 0 and 1, and the neuron comprises five maps with sizes 75 × 50, 53 × 35, 38 × 25, 26 × 17 and 19 × 12. As information arrives from the previous layer, each input is matched with the corresponding weight of the neuron, depending on its orientation, position and scale. Whenever the potential of a neuron reaches a defined threshold (64, chosen for maximum accuracy), that neuron fires. At this time, the weights that caused the firing are updated according to the STDP rule. In the test phase, all these steps are repeated, with the difference that when a neuron fires, the index of the input spike that caused the firing is saved in a spike series matrix. The number of these firings is called the firing rate. As the firing rate grows, the detection time is reduced and the model recognizes the object faster.
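The chain from latency coding to S2 firing and weight update can be sketched with a single toy neuron. This is a simplified illustration and not the model's implementation: the function names, the flat 8-input layout, the threshold of 1.51 and the learning rates `a_plus` and `a_minus` are all assumptions chosen for readability, and the update uses the soft-bounded form a·ω(1 − ω).

```python
import numpy as np


def latency_order(c1_values):
    """Turn C1 responses into spike latencies (1 / value) and return the
    input indices sorted from earliest to latest spike."""
    values = np.asarray(c1_values, dtype=float)
    latency = 1.0 / np.maximum(values, 1e-12)   # strong response -> early spike
    return np.argsort(latency, kind="stable")


def present(weights, c1_values, threshold, a_plus=0.03, a_minus=-0.02, learn=True):
    """Integrate input spikes in latency order until the potential reaches
    the threshold. Returns the number of input spikes needed to fire, or
    None if the neuron never fires. When learn is True, weights are updated
    in place: inputs that spiked up to the firing time are potentiated,
    all other inputs are depressed."""
    order = latency_order(c1_values)
    potential = 0.0
    for n_spikes, idx in enumerate(order, start=1):
        potential += weights[idx]
        if potential >= threshold:
            if learn:
                spiked = np.zeros(weights.shape, dtype=bool)
                spiked[order[:n_spikes]] = True
                rate = np.where(spiked, a_plus, a_minus)
                weights += rate * weights * (1.0 - weights)
            return n_spikes
    return None


# Toy example: 8 inputs, uniform initial weights (illustrative values only).
w = np.full(8, 0.5)
c1 = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]
first = present(w, c1, threshold=1.51)    # fires, then learns
second = present(w, c1, threshold=1.51)   # same input after one update
```

In this toy run the neuron needs four input spikes on the first presentation and three on the second, which illustrates how potentiating the earliest inputs shortens the time to fire for a repeated pattern.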

B. MODEL RESULT
To train the model, ten categories are considered. These categories are selected so that they can be grouped at different levels of categorization. Figure 6 shows these categories and the levels of categorization.
The model is trained using 110 neurons and 60,000 iterations for these 10 categories, each containing 100 images. Parameters such as the number of iterations and the number of neurons were selected to achieve the best accuracy. Upon completion of unsupervised training, different neurons have become selective to different input features. Figure 7 shows the trained weights of some neurons for some characteristic features of the car, airplane and butterfly.
In the classifier, the neurons associated with the features of each class are connected to the single neuron of that class in the next layer. The main reason for choosing this type of connection is that the level of categorization affects the classifier connections. Figure 8 illustrates this concept for the three categorization levels.
When the level of categorization is subordinate, i.e., classifying an image as a car, only the car-related neurons participate in the categorization. When the aim is to classify ground vehicles, the neurons that describe the features of the car and the motorcycle are selected. At the superordinate level, where the aim is to categorize vehicles, neurons related to the car, motorcycle, helicopter and airplane are in operation. Figure 9 illustrates the firing rate of the 110 neurons for car inputs. As time progresses, neurons that depend on car features start to fire.
The first threshold (NST in Figure 10) is used for neuron selection. After training, which starts from random initial weights, each of the 110 neurons in the model depends on one of the ten categories. To determine which neurons belong to which category, the firing rate of each neuron is examined for the input images of each category, and if that firing rate exceeds the threshold, the neuron is assigned to that category. The second threshold (CNT in Figure 10) is applied to the potential of the output neuron. As each spike enters from the spike series matrix, the potential of the corresponding category neuron increases, and when this potential reaches the threshold, the input image is assigned to that category. Figure 10 shows the method for determining these two thresholds: it plots the median accuracy over the 10 categories at the subordinate level as a function of NST and CNT. As can be seen in Figure 10, maximum accuracy is achieved at thresholds of 175 for NST and 870 for CNT. The accuracy at the subordinate level is 86% for these threshold values. Figure 11 shows some of the 110 neurons that belong to certain categories.
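The neuron-selection step can be illustrated with a small sketch. The function name `assign_neurons`, the array layout and the toy firing rates are assumptions made for illustration; only the thresholding rule itself (assign a neuron to a category when its firing rate exceeds the first threshold) comes from the description above.

```python
import numpy as np


def assign_neurons(firing_rates, categories, nst):
    """Assign neurons to categories by thresholding their firing rates.

    firing_rates: array of shape (n_neurons, n_categories) with each neuron's
                  firing rate for the selection images of each category.
    categories:   list of category names, one per column.
    nst:          neuron-selection threshold.
    Returns {category: [indices of neurons whose rate exceeds nst]}.
    Note: in this sketch a neuron may end up in more than one category.
    """
    rates = np.asarray(firing_rates)
    return {
        cat: np.flatnonzero(rates[:, k] > nst).tolist()
        for k, cat in enumerate(categories)
    }


# Toy firing rates for four neurons and two categories (made-up numbers).
toy_rates = [[200, 10],
             [50, 180],
             [176, 20],
             [100, 100]]
pools = assign_neurons(toy_rates, ["car", "motorcycle"], nst=175)
```

With the threshold of 175 reported above, neurons 0 and 2 are assigned to the car and neuron 1 to the motorcycle in this toy example, while neuron 3 stays unassigned.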
At the test stage, the output structure is specified according to the level of categorization. The categorization is done as follows: for each incoming spike, a fixed amount is added to the potential of the last-layer neuron. When this potential reaches the threshold value, the input image is classified into the category of this neuron. The growth of the potential in the output neurons is shown in Figure 12. Over time, the categories start to separate according to the firing rate of the neurons related to each category. Finally, we obtain the accuracy and speed of the model for the three levels of categorization. The maximum accuracy, 93%, is obtained at the superordinate level. For all categories at the basic level, the accuracy is above 89.2%. At the subordinate level, the highest accuracy is 90% for the spider and the lowest is 83% for the cat. The median subordinate-level accuracy is 86.2%, which is the lowest among the categorization levels. In terms of speed, the superordinate level is faster than the other levels. Note that in the test phase, images were presented to the model both upright and inverted. Figures 13 and 14 show model accuracy and speed for vehicles and animals at the three levels of categorization. As these charts show, accuracy is very high at the superordinate level in all cases (93±1% for vehicles and 93.5±2% for animals). In terms of both accuracy and model time, the order of the levels is superordinate, basic, subordinate. Note that these graphs were obtained from 10 runs of the model with random initial weights.
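The effect of pooling more neurons at higher levels can be illustrated with a toy readout. This is a sketch under stated assumptions, not the model's classifier: the hierarchy dictionary, the neuron pools, the spike series and the threshold of 3 are made up, and each spike from a pooled neuron is assumed to add a fixed amount `step` to the output potential.

```python
# Which subordinate categories feed each label (vehicle branch only).
HIERARCHY = {
    "car": ["car"],                                              # subordinate
    "ground vehicle": ["car", "motorcycle"],                     # basic
    "vehicle": ["car", "motorcycle", "helicopter", "airplane"],  # superordinate
}


def time_to_threshold(spike_series, pools, label, cnt, step=1.0):
    """Return the 1-based position in the spike series at which the output
    neuron for `label` reaches the threshold `cnt`, or None if it never does.

    spike_series: neuron indices in firing order.
    pools:        {subordinate category: list of neuron indices}.
    """
    members = set()
    for cat in HIERARCHY[label]:
        members.update(pools.get(cat, []))
    potential = 0.0
    for t, neuron in enumerate(spike_series, start=1):
        if neuron in members:
            potential += step
            if potential >= cnt:
                return t
    return None


# Toy pools and spike series (illustrative only).
toy_pools = {"car": [0, 1], "motorcycle": [2, 3], "helicopter": [4], "airplane": [5]}
toy_spikes = [0, 2, 4, 1, 3, 5, 0, 2]
times = {label: time_to_threshold(toy_spikes, toy_pools, label, cnt=3)
         for label in HIERARCHY}
```

For this toy spike series the vehicle neuron reaches threshold after 3 spikes, the ground-vehicle neuron after 4 and the car neuron after 7: the larger the pool of contributing neurons, the sooner the output potential crosses the threshold.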
Figure 13A shows accuracy and model time for the vehicle, ground vehicle, car and motorcycle in the upright and inverted states. The difference between the three levels is significant (p<0.05). However, this has no effect on the ordering of level advantages, because both of them are at the subordinate level. In Figure 13B, the difference in accuracy between the basic and subordinate levels for the airplane and helicopter categories is not significant; the reason is that the number of neurons selected for the airplane is much higher than for the helicopter. This is due to the higher diversity of airplanes in comparison with helicopters. However, for the helicopter, the basic-level advantage over the subordinate level is significant. Figure 14 illustrates accuracy and model time for animals at the three levels of categorization. For all categories, the ordering superordinate, basic, subordinate can be seen. The reason for low or high significance in some categories is that, at the subordinate level, the number of neurons selected for each category is different.
FIGURE 15. Some features related to the categories of car, fish, helicopter and dolphin. For the car, these features represent the wheels, and for the fish, they represent the dorsal, ventral and caudal fins. Inverting the images has no effect on these features. However, for the helicopter and dolphin, the features are selective to the overall shape of the cabin and the dorsal fin, respectively. The inverted state cannot be represented by these features.
In the inverted mode, some categories show a sharp drop in accuracy, while others show no change. The reason is that the model is trained on upright images, so for inverted images accuracy is reduced and detection time increases. Figure 15 illustrates some features selected for the car, fish, helicopter and dolphin. As can be seen, inverting some features does not change them.

IV. DISCUSSION
In this paper, the levels of categorization have been considered. First, the levels were examined in the human visual system with a rapid psychophysical test; the result of this experiment was a superordinate advantage. Then a classification model based on the visual system and incorporating time was introduced. The model works as follows: for each category, the neurons selective to that category begin to fire, and in the categorization layer, depending on the level of categorization, the neurons associated with that level are connected to the category neuron of the last stage. The model is trained with 110 neurons and 60,000 iterations on 100 images per category. Each neuron depends on some features of the data. A further 50 images were used to select the neurons attributed to each category, and finally 50 images were used for the test phase. The model shows that the superordinate level is faster and more accurate than the other levels. The possible reason for this in the model is that a higher number of neurons are involved in the classification at higher levels. Hence, the firing rate increases when categorizing at higher levels, and as a result, categorization becomes faster.
As noted in the introduction, previous studies hold different opinions about categorization levels, and some of them report a basic-level advantage. The human brain is a unique hierarchical system and has the same function for categorizing objects. Different factors explain the difference in opinions: stimulus presentation time, stimulus complexity and stimulus diversity are the main ones. In [20], two experiments were designed. In the first experiment, the stimulus was presented for only 25 ms and the mask was presented 250 ms after that; in the second experiment, the stimulus was presented for 250 ms, followed by a mask. A superordinate advantage was observed in the first experiment and a basic-level advantage in the second. The authors attributed this to the coding time of the information being separate from and independent of the time of categorization. Moreover, the model presented in this research, like our rapid categorization experiment, shows a superordinate advantage, which we attribute to changes in brain function when the category type changes. When the category type changes, the effective circuitry of the brain also changes, and the IT area recruits a different number of neurons for categorization.
VOLUME 9, 2021
Another important aspect of this issue is the size of the category domain. At higher levels, the inter-class area becomes larger, and therefore classification becomes simpler. In fact, the extent of the intra-class domain can affect the speed of categorization [19].
That more areas are included in categorization at higher levels was proposed by Bauer and Just in 2017 [30]. They examined the basic and subordinate levels and reported a basic-level advantage. In their opinion, the basic level has an advantage because humans classify at the basic level more often in daily life and are experts in basic-level classification.
As shown in the test phase, applying inverted images reduces accuracy and increases detection time, and the decrease is almost the same for all three levels. The reason for this decline is that the model is trained with upright images. During training, the model learns some features of each category. In the test phase, if those features are changed by inversion, accuracy decreases sharply. However, if those features are still present in the inverted images, accuracy changes only slightly. For example, in the car category, most features are wheels, and a wheel does not change with inversion. Hence, car accuracy and model time degrade very little, and the neurons selective to wheels still fire. These changes can be seen in the same way at higher levels. As seen in Figure 15, there are no significant changes in the inverted mode for the car and fish features, whereas the associated features are lost in the inverted mode for the helicopter and dolphin. Therefore, the type of category determines the accuracy in the inverted state, and this experiment cannot determine the categorization pathway for each level.

FIGURE 1. Sample images used in the experiment. The three categorization tasks are framed according to the stimulus category: blue for the superordinate level, gray for the basic level, and green for the subordinate level. For each level of categorization, the target category is illustrated on the left and the distractor category on the right.

FIGURE 2. Psychophysical categorization task. A fixation point is shown for 1000 ms to focus the subject's gaze. After that, the category label is shown for 800-1200 ms. Next, a 256 × 256 grayscale stimulus image is flashed for 25 ms, followed by a noise mask for 100 ms. At the end of the trial, the subject reports whether the presented image matched the shown label using the 'YES' or 'NO' key on a keyboard.

FIGURE 3. A) Accuracy for the three levels of categorization. It can be seen that the superordinate level is more accurate than the other levels. B) Reaction times for the three levels of categorization. The reaction time for the superordinate level is lower than for the other levels, which means that the superordinate is the fastest level. These two panels show the superordinate advantage.

FIGURE 4. S1 layer at the input of the model. In this layer, the input image at five scales is convolved with Gabor filters in four orientations, and the output of this layer is 20 images [27].

FIGURE 5. C1 layer of the model. A maximization operation is performed in this layer. The output of this layer is five matrices of size 75 × 50, 53 × 35, 38 × 25, 26 × 17 and 19 × 12, obtained by applying the maximum operator with a 7 × 7 window. Next, a maximum operator is applied across all orientations at each scale, and the scale, orientation and coordinates of this maximum are stored in a matrix together with its latency. The latency is equal to the inverse of the maximum value [27].

FIGURE 6. Some of the image stimuli and the hierarchy of the three categorization levels. Levels are indicated by different colors: blue for the superordinate, gray for the basic and green for the subordinate level.

FIGURE 7. Final reconstruction of the selectivity of some neurons to airplane, car, butterfly, fish, helicopter and motorcycle.

FIGURE 8. Three classifier structures for the three levels of categorization. (A) Categorization structure for the superordinate level. (B) Categorization structure for the basic level. (C) Categorization structure for the subordinate level.

FIGURE 9. Firing rate of each of the 110 neurons for car input. The scatter plot shows the firing rate over time per neuron. The diagram illustrates the maximum firing rate of each neuron for car input.

FIGURE 10. Median accuracy for the 10 categories at the subordinate level as a function of CNT and NST. The subordinate-level accuracy was used to obtain the best thresholds: CNT was varied from 100 to 1500 in steps of 10, and NST from 100 to 250 in steps of 5, and the subordinate-level accuracy was computed at every step. The maximum accuracy was obtained at CNT = 870 and NST = 175.

FIGURE 11. Neurons selected for some categories at a threshold value of 176. Car in red, motorcycle in green, helicopter in blue and butterfly in yellow. These neurons became selective to features of each category during the unsupervised training phase.

FIGURE 12. Potential of the output neurons for different data at the three levels of categorization over time. As time increases, the categories separate from each other and the classifier is able to recognize them. The insets around the chart illustrate the separation of the data at four different times. This separation is examined using K-means with 10 classes. It can be seen that the categorization becomes more precise over time.

FIGURE 13. Accuracy and model time for vehicle inputs in 10 runs of the model. A) Accuracy for car and motorcycle with upright and inverted images at the three levels of categorization. B) Model time for car and motorcycle with upright and inverted images at the three levels of categorization. C) Accuracy for airplane and helicopter with upright and inverted images at the three levels of categorization. D) Model time for airplane and helicopter with upright and inverted images at the three levels of categorization.

FIGURE 14. Accuracy and model time for animal inputs in 10 runs of the model. A) Accuracy for dog and cat with upright and inverted images at the three levels of categorization. B) Model time for dog and cat with upright and inverted images at the three levels of categorization. C) Accuracy for dolphin and fish with upright and inverted images at the three levels of categorization. D) Model time for dolphin and fish with upright and inverted images at the three levels of categorization. E) Accuracy for spider and butterfly with upright and inverted images at the three levels of categorization. F) Model time for spider and butterfly with upright and inverted images at the three levels of categorization.
Poncet and Thorpe, in 2014, firmly stated a superordinate-level advantage. They carried out several experiments in which the flashing time of the stimulus and the type of categorization were varied [17]. Mack and Palmeri, in 2015, rejected the theory of Poncet and Thorpe. They reported a superordinate advantage in rapid categorization, as Poncet and Thorpe did, but found a different result for stimuli flashed for a long time.