Several editorials in this journal have focused on why so little neuroscience data is being shared and what can be done to improve this situation [1, 2]. This editorial addresses a different challenge: what data should be shared, and how should this data be annotated and classified to promote efficient brain research? If the reader’s first reaction to this question is that surely this is a well-understood problem, then please read on, because I have got news for you: the traditional neuroscience research paradigm that emphasizes hypothesis-driven approaches is ill adapted to the study of real brains. This is because brains are degenerate systems.

Degeneracy is the ability of elements that are structurally different to execute the same function or produce the same output [3]. This should not be confused with the more familiar concept of redundancy, which describes systems where identical elements are replicated so that if one fails another can take over the function. In his seminal 2001 paper [3], which should be required reading for all biologists, Gerald Edelman describes 21 examples of degeneracy in different biological systems. Many of these are general cellular properties, but six fall within the specific scope of neuroscience, including behavior. The ultimate example of degeneracy is interanimal communication, with the multitude of human languages all serving the same function.

Degeneracy is related to complexity [4] in that more degenerate systems are more complex, but it is not a general property of complex systems. Edelman [3], however, argues that degeneracy is an essential property of all biological systems because they had to evolve. Without degeneracy it would be very difficult for living organisms to compensate for deleterious mutations and, because many random mutations will result in some loss of function, this implies that evolution would on average result in less fit individuals. Of course many lethal and disease-generating mutations are known, but most mutations are relatively innocent because degeneracy allows for compensatory adjustments. Conversely, some mutations may lead to improved adaptation to environmental conditions and become a selective advantage, a process called evolution… An additional advantage of degenerate systems is that they allow for more flexibility: although different entities may be able to perform the same function, they often do so with small differences. Therefore, depending on prevailing conditions, one type may be favored over another because of its improved performance.

Many hypothesis-driven neuroscience studies can be summarized as ‘we observed property X in system Y and hypothesized that entity Z is causing X’, followed by a series of experiments that confirm the second part of the statement. Examples are attributing specific functions to ion channel type Z in producing excitability property X in neuron type Y, and neuron type Z or synaptic connectivity Z in brain structure Y causing behavior X. The fallacy of ‘proving’ such a hypothesis in a degenerate system is that it provides incomplete information about structure Y, because it ignores both the many other functions and properties that Z may contribute to and the involvement of other elements in causing X. In other words, most brain functions depend on the dynamic interaction of many actors in a flexible manner. Take, for example, mechanisms contributing to synaptic plasticity. Edelman [3] reminds the reader that many presynaptic and postsynaptic mechanisms are involved in synaptic plasticity and goes on to say “The complexity of the system includes many sites at which a variety of changes can modulate synaptic efficacy in a similar manner. Whenever evidence for each of these changes has been sought experimentally, it has been found.” This makes synaptic plasticity an archetypical example of degeneracy where many (competing) hypotheses may be true at the same time. This probably explains why in cerebellar learning, depending on the experimental setup, researchers find that cerebellar long-term depression is sometimes essential to learn conditioned behavior and sometimes not [5]. Returning to specific functions of channels, it has been well established for both voltage-gated and synaptic channels that very specific neuronal properties can be produced by different combinations of channels acting together [6].
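The point that different channel combinations can produce the same neuronal property can be illustrated with a deliberately simple toy calculation. The sketch below is an assumption-laden illustration, not the models used in the cited studies: it treats a neuron as a passive parallel-conductance circuit, uses illustrative reversal potentials, and searches a small hypothetical grid of conductance values for combinations that yield the same steady-state membrane potential.

```python
from itertools import product

# Illustrative reversal potentials in mV (assumed values, not measurements).
REVERSAL_MV = {"K": -90.0, "Na": 50.0, "leak": -65.0}


def steady_state_voltage(g):
    """Steady-state potential of a passive parallel-conductance circuit.

    g maps channel name -> conductance (arbitrary but consistent units).
    V = sum(g_i * E_i) / sum(g_i)
    """
    total = sum(g.values())
    return sum(g[ch] * REVERSAL_MV[ch] for ch in g) / total


def degenerate_solutions(target_mv, grid, tol_mv=0.5):
    """Return every conductance combination on the grid whose
    steady-state voltage lies within tol_mv of target_mv."""
    hits = []
    for g_k, g_na, g_leak in product(grid, repeat=3):
        g = {"K": g_k, "Na": g_na, "leak": g_leak}
        if sum(g.values()) == 0:
            continue  # no conductance at all: voltage undefined
        if abs(steady_state_voltage(g) - target_mv) <= tol_mv:
            hits.append(g)
    return hits


# Hypothetical search grid of conductance values.
GRID = [0.0, 0.5, 1.0, 2.0, 4.0, 6.0]
solutions = degenerate_solutions(-70.0, GRID)

for g in solutions:
    print(g, round(steady_state_voltage(g), 2))
```

Even in this minimal setting, structurally different parameter sets (for example, a potassium-plus-leak combination without sodium and a potassium-plus-sodium combination without leak) reach the same output, so an experiment that confirms a role for one channel says little about whether the others could have produced the same result.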

The consequences of degeneracy for neuroinformatics are complex, but two issues stand out: the problem of selective data collected in support of specific hypotheses and the problem of generating classification schemes for degenerate systems. Although there is an increasing interest in big data approaches in neuroscience [7], many leading neuroscientists are still outspoken believers in the hypothesis-driven paradigm. Following this paradigm is also a good strategy to get work published in high-profile journals, especially if the work is presented as a nicely packaged story. The limited value of proving a specific hypothesis in a degenerate system has already been described, but from a neuroinformatics viewpoint the hypothesis-driven paradigm also introduces a serious problem of data bias. Scientists will only collect data in support of their hypothesis, combined with ‘control’ experiments. Alternative hypotheses are rarely investigated and, in general, there is no incentive to produce a complete description of the system. As a consequence, data about most systems under investigation in neuroscience, at any level, are incomplete and biased toward emphasizing differences and underreporting commonalities between related elements. In other words, the neuroscience literature structurally hides the degeneracy of the system.

These are familiar problems to computational neuroscientists. Because computational models require many parameters and a complete analysis is rarely available, a wide range of sources needs to be consulted and compared. In most cases, unfortunately, one is confronted not only with missing data but also with contradictory evidence in the published literature [2, 8]. Typical examples include divergence in measurements of specific properties, sometimes even between publications from the same laboratory, or dispute about the presence of specific voltage-gated channels in a type of neuron. While some of this can be attributed to biological variability, which is again de-emphasized (see below), it is more often caused by differences in experimental design related to the hypothesis being investigated. In a literature search this hypothesis-driven context is still visible, but this may no longer be true once the data is collated in a database.

The biased data collection in neuroscience poses challenges but also offers opportunities to neuroinformatics databases. It is a fundamental error to assume that data is context free. Neuroscience data should be accompanied not only by a description of how the data was collected, which techniques were used, the composition of drugs and media, etc., but also by its scientific context. In principle, referring to the original publication is sufficient, but this is not efficient because the publication is not always easily accessible and, more importantly, because it is not machine searchable [9]. The Resource Identification Initiative to develop standardized descriptions of methods [10] may facilitate the incorporation of experimental methods together with the data, even in the case of single measurements being stored, but does not yet offer a solution to the hypothesis context. A neuroinformatics opportunity is that properly set up databases can flag some of the missing data about specific systems and can prompt research to ‘fill the holes’. In cases where there are definite lists of properties or measurements that characterize an entity, missing entries will be obvious. Otherwise one can use data mining techniques to predict missing data by comparing across similar elements.
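Both ideas, flagging missing entries against a definite property list and predicting a missing value from similar elements, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the entity names, property names, and values are hypothetical, and the prediction is a plain nearest-neighbour average over unnormalized properties, which is only one of many possible data mining techniques.

```python
from math import sqrt

# Hypothetical list of properties that should characterize every entity.
REQUIRED = ["input_resistance", "spike_threshold", "membrane_tau"]

# Hypothetical records; None marks a value that was never measured.
records = {
    "cell_A": {"input_resistance": 100.0, "spike_threshold": -50.0, "membrane_tau": 10.0},
    "cell_B": {"input_resistance": 110.0, "spike_threshold": -48.0, "membrane_tau": None},
    "cell_C": {"input_resistance": 300.0, "spike_threshold": -40.0, "membrane_tau": 30.0},
}


def missing_entries(records, required):
    """Return {entity: [missing property names]} for entities with gaps."""
    gaps = {}
    for name, props in records.items():
        absent = [p for p in required if props.get(p) is None]
        if absent:
            gaps[name] = absent
    return gaps


def distance(a, b, keys):
    """Euclidean distance over the given keys (no normalization applied)."""
    return sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def predict_missing(records, target, prop, k=1):
    """Estimate records[target][prop] as the mean over the k most similar
    entities that do have a value for prop. Returns None if none qualify."""
    known = [p for p, v in records[target].items() if v is not None and p != prop]
    candidates = []
    for name, props in records.items():
        if name == target or props.get(prop) is None:
            continue
        shared = [p for p in known if props.get(p) is not None]
        if not shared:
            continue
        candidates.append((distance(records[target], props, shared), name))
    candidates.sort()
    nearest = candidates[:k]
    if not nearest:
        return None
    return sum(records[name][prop] for _, name in nearest) / len(nearest)


print(missing_entries(records, REQUIRED))
print(predict_missing(records, "cell_B", "membrane_tau", k=1))
```

A predicted value of this kind is best stored with a flag marking it as an estimate, so that it prompts a measurement rather than silently replacing one; in practice the properties would also need to be scaled before distances are compared.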

Degeneracy also has fundamental consequences for schemes that try to classify neuroscience entities, an important target of standardization efforts in neuroinformatics. This is especially apparent for neurons, where the diversity seems so great that attempts to create a definite classification may fail. This is an aspect of degeneracy that Edelman [3] did not focus on, although he does point out that within a tissue no individual differentiated cell is indispensable. However, in the nervous system, the multiplicity of function is complemented by a seemingly endless variability of some of the elements, such as neuron types. It is well known in the cerebellar literature that while inhibitory stellate cells and basket cells seem very different in morphology and details of their connectivity, they are really the extremes of a continuous distribution [11], and as a consequence they are now commonly grouped together as molecular layer interneurons. In cortex, though the discussion has not been fully settled, it has been difficult to agree on a definite classification of cortical interneurons, despite attempts to do so [12]. In general, intermediary cell types are observed that seem to defy the classification scheme, whether one uses schemes based on morphology, physiology and molecular markers [13] or on gene expression [14]. Conversely, a recent big-data study [15] reports that it is sufficient to use three simple connectivity motifs to come to a definite classification, suggesting that in cortex, too, it may be more useful to consider broader groupings of cell types instead of clearly separable types.

In conclusion, current neuroinformatics efforts have often been conceived based on a classic approach to neuroscience research and have not fully considered the challenges of investigating and describing a complex, degenerate system. This may result in neuroinformatics resources that fail to capture the true properties of the brain.