What is the difference between degenerate and redundant?




















This means that to self-evidence, it is necessary to find an accurate explanation for sensory observations that incurs the least complexity cost as indicated by the third equality above. Formally, the accuracy is the expected log likelihood of the sensory outcomes, given some posterior beliefs about the causes of those data. Complexity is the difference between these posterior beliefs and prior beliefs, that is, prior to seeing the outcomes. In essence, complexity scores the degree to which posterior beliefs have to move away from prior beliefs to explain the data at hand.

Alternatively, free energy can be decomposed into energy minus entropy; these quantities can be thought of as the degrees of freedom that are used to provide an accurate account of sensory data. These terms inherit, by analogy, from free energy functionals in statistical physics. The energy here is the expected log probability of both the sensory consequences and their causes, under posterior beliefs.

The entropy is the uncertainty of those posterior beliefs. A failure to minimize complexity in statistics is reflected in an overparameterization of the generative model, which leads to overfitting and a failure to generalize to new sensory data. This is a pernicious sort of failure that plagues many applications in machine learning (Hochreiter and Schmidhuber; Lin et al.). The free energy formulation of degeneracy and redundancy is elemental, and therein lies its significance.
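The two decompositions described above can be written as a standard variational identity; the notation below is generic (a posterior q(s) over causes s, given outcomes o), not the paper's own equations:

```latex
F = \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o,s)\right]
  = \underbrace{D_{\mathrm{KL}}\!\left[q(s)\,\|\,p(s)\right]}_{\text{complexity}}
  - \underbrace{\mathbb{E}_{q(s)}\!\left[\ln p(o \mid s)\right]}_{\text{accuracy}}
  = \underbrace{\mathbb{E}_{q(s)}\!\left[-\ln p(o,s)\right]}_{\text{energy}}
  - \underbrace{H\!\left[q(s)\right]}_{\text{entropy}}
```

Minimizing F therefore trades complexity against accuracy in the first form, and energy against entropy in the second.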

Free energy is simply a way to articulate what function means, in relation to degenerate or redundant function. Function here entails maximizing model evidence. This licenses the use of belief updating—with an overtly representational stance—to define degeneracy and redundancy mathematically.

In what follows below, we relate these concepts—of entropy and complexity—to degeneracy and redundancy. For example, degenerate eigenvalue solutions in quantum physics mean that there are many linearly independent eigenstates that have the same eigenvalue (Wheeler; Garriga and Vilenkin; Goold et al.). Solving this kind of problem underwrites nearly all Bayesian inference, namely, using prior constraints to resolve ill-posed problems characterized by degenerate mappings between causes and consequences.

To measure this sort of degeneracy, we will associate function with self-evidencing. This means that degeneracy is measured by the number or variety of distinct causes that could produce the same outcome.

Mathematically, high degeneracy implies that the posterior probability, or belief about causes, is dispersed over many possible states; that is, it has high entropy. This is precisely the entropy part of the free energy above. In short, this means that we can associate the entropy of posterior beliefs about the causes of our sensorium with degeneracy. Minimizing free energy therefore requires the maximization of degeneracy, under constraints of energy minimization. This is a ubiquitous conclusion that is found throughout statistics and physics, often referred to as the maximum entropy principle (Jaynes; Banavar et al.).
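To make the link between degeneracy and posterior entropy concrete, here is a small self-contained sketch (not from the paper; the numbers are illustrative) comparing the entropy, in nats, of a sharp posterior with one spread over two mutually exclusive causes:

```python
import math

def entropy_nats(posterior):
    """Shannon entropy of a discrete posterior belief, in natural units (nats)."""
    return -sum(p * math.log(p) for p in posterior if p > 0)

# A sharp posterior: one cause dominates, so degeneracy is low.
sharp = [0.97, 0.01, 0.01, 0.01]

# A degenerate posterior: two mutually exclusive causes explain the
# outcome equally well, so degeneracy (entropy) is high.
degenerate = [0.5, 0.5, 0.0, 0.0]

print(entropy_nats(sharp))       # ≈ 0.17 nats
print(entropy_nats(degenerate))  # = ln 2 ≈ 0.69 nats
```

The degenerate posterior attains ln 2 nats, the maximum for two equally plausible causes.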

Formally, this speaks to our representations, that is, posterior beliefs. A completely degenerate mapping from causes to consequences has the highest entropy: here, a posterior over two mutually exclusive causes. To link this example to lesion studies, if I wanted to use some posterior beliefs to predict what I am going to do, the accompanying structural representations of either a right- or left-handed movement are sufficient to produce that outcome.

However, if I am unable to represent either cause, I will be unable to realize the outcome. This follows from variational principles of least action, speaking to a unique and maximally efficient movement. The average efficiency can, in the present setting, be associated with redundancy, via the equivalence of the principles of maximum efficiency (Barlow; Linsker) and minimum redundancy (Mohan and Morasso). In terms of self-evidencing and free energy minimization, maximizing efficiency corresponds to minimizing the complexity of an inference, possibly about the action that we are currently taking.

Previously, we have defined complexity as the difference between posterior and prior beliefs, that is, beliefs after and before seeing the outcomes (Friston; Kanwal et al.). Therefore, large divergences from prior beliefs to posterior beliefs incur a greater complexity cost, that is, a larger redundancy. If, a priori, I believed I might use either hand, and in witnessing my action I only used one, the difference between posterior and prior beliefs would be large (a high complexity cost).

In contrast, if my prior beliefs suggest that I use the hand nearest to the cup, and I use that hand, the difference between posterior and prior beliefs would be small (a low complexity cost). This complexity is another important part of free energy; therefore, minimizing free energy requires a minimization of complexity or redundancy.
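The two hand-choice scenarios above can be sketched numerically; the priors and posteriors below are illustrative assumptions, with complexity scored as the KL divergence from prior to posterior beliefs:

```python
import math

def kl_nats(posterior, prior):
    """Complexity cost: KL divergence from prior to posterior beliefs, in nats."""
    return sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0)

posterior = [0.01, 0.99]  # after witnessing myself reach with the right hand

# Vague prior: either hand seemed equally likely a priori.
vague_prior = [0.5, 0.5]
# Informed prior: the cup is nearest my right hand.
informed_prior = [0.05, 0.95]

print(kl_nats(posterior, vague_prior))     # ≈ 0.64 nats: high complexity cost
print(kl_nats(posterior, informed_prior))  # ≈ 0.02 nats: low complexity cost
```

The informed prior barely needs updating, so the same observation incurs a much smaller complexity (redundancy) cost.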

This minimization manifests in many ways and—under some simplifying assumptions—directly implies the principle of maximum mutual information, or the Infomax principle (Linsker). This formulation of degeneracy and redundancy has several consequences, some of which are quite revealing. Firstly, degeneracy and redundancy are well-defined, measurable quantities, given some outcomes—and a generative model—under ideal active Bayesian observer assumptions.

Furthermore, they have the same units of measurement (i.e., nats, or natural units). This means degeneracy and complexity are measured in the same currency and can be compared quantitatively. Additionally, they are both attributes of posterior (probabilistic, subpersonal) beliefs. Degeneracy is a statement about the uncertainty (i.e., entropy) of those beliefs, while redundancy scores how far they have moved from prior beliefs. In virtue of the fact that one can measure the expected degeneracy and redundancy, given some outcomes and a generative model, both become empirically quantifiable.

In turn, this means that degeneracy and redundancy are context-sensitive attributes of a generative model—because belief updating depends upon the data or outcomes at hand.

This context sensitivity is important. In other words, what may be redundant in one context or experimental setting may not be redundant in another. Furthermore, the imperatives to maximize degeneracy and to minimize redundancy are in opposition.

These constraints mean that neither degeneracy nor redundancy is a complete specification of function, when defined in terms of self-evidencing. This means that one cannot talk about minimizing degeneracy or redundancy without knowing the implications for how changes in posterior beliefs affect accuracy and energy. Having said this, there is one important exception. In the absence of sensory data, the free energy reduces to redundancy, because the accuracy term disappears, leaving only complexity.

This means that optimizing a generative model offline (i.e., in the absence of new sensory data) reduces to minimizing redundancy. Crucially, one cannot change degeneracy without changing redundancy—and vice versa. In other words, redundancy can be interpreted as a cost that is offset by degeneracy, highlighting the opposing roles of redundancy and degeneracy.

This relationship also speaks to the pre-eminent role of prior beliefs in defining the relationship between degeneracy and redundancy—and their contributions to evidence or marginal likelihood.

These prior beliefs are the quantities that are optimized during learning, for example, through experience-dependent plasticity (Gershman and Niv; Tervo et al.). In short, changes to the structure can always be articulated in terms of changes to a prior (Friston and Penny). Thus, changes to the priors necessarily change the posterior (following belief updating), and changes in the two necessarily mean a change in redundancy and degeneracy.

From a practical perspective, given these definitions, it is clear that to quantify degeneracy and redundancy, one needs the entire belief structure—and belief updating—that accompanies experience-dependent learning. In turn, this means that it is necessary to measure the distributed aspects of probabilistic representations in the brain. Put another way, redundancy and degeneracy cannot be localized to a particular representation or neuronal system; they are properties of distributed representations that underwrite belief updating throughout hierarchical generative models or cortical structures.

By distributed representations we are referring to how sensory features of observed outcomes are being encoded neuronally. This may have important implications for understanding the impact of lesions—as scored through changes in degeneracy and redundancy. Note that in this quantitative treatment, degeneracy and redundancy are attributes of a function, namely, some perceptual recognition or overt action.

As noted above, this kind of degeneracy and redundancy can be context-sensitive and therefore cannot be inferred from anatomical connections alone—it can only be inferred from the functional message passing along these connections. In this sense, the empirical measurement of degeneracy and redundancy necessarily relies upon some form of functional neuroimaging or electrophysiology. In the remainder of this paper, we unpack some of the above points and pursue their construct validation using simulations of active inference, where we know exactly the form of belief updating and can measure degeneracy and redundancy.

Our aim is to illustrate the correspondence between these mathematical quantities and the use of these terms in the context of lesion studies.

In this section, our objective was to illustrate how learning or experience-dependent plasticity in neuronal structures (i.e., changes in model parameters) manifests in terms of degeneracy and redundancy. For this, we chose a canonical paradigm in the neuropsychology of language, namely, word repetition (Burton et al.). The first step was to specify a generative model and active inference scheme apt for simulating this paradigm. We used a Markov decision process generative model of discrete outcomes that are caused by discrete hidden states, described extensively in Friston, FitzGerald, et al.

For technically minded readers, we have included a detailed description of the generative model and accompanying belief updates in the Appendix. These equations are a bit involved; however, the generative model on which they are based is very general—and can be applied in most settings where outcomes and their causes can be expressed in terms of distinct (i.e., categorical) alternatives.

Through our simulations, we evaluated the minimization of redundancy, using a formulation of structure learning. In other words, after some suitable experience with—and learning of—a word repetition task, we withheld sensory information and adjusted the parameters of the generative model (i.e., eliminated redundant parameters). We then repeated the paradigm to quantify the effects on complexity and degeneracy. We anticipated that this targeted elimination of redundant parameters would selectively suppress redundancy in relation to degeneracy.

In the subsequent section, we use a similar manipulation but reduce nonspurious connectivity to simulate synthetic lesions. The paradigm used to illustrate the differences between degeneracy and redundancy was a word repetition task: the subject is presented with a single word (e.g., 'red'). If the agent repeats the word correctly, they are given a positive evaluation, and a negative one otherwise. The generative model—comprising appropriate probability distributions—was deliberately minimal, to illustrate the roles of redundancy and degeneracy (Fig. 1).

The categorical probability distributions were based on an empirical understanding of how subjects respond in a word repetition paradigm (Hope et al.). Specifically, the model is an expression of how experimental stimuli are generated during an experiment, conditioned upon subject responses. We assume that subjects adopt similar models to mirror this process. This model can plausibly be scaled to account for larger lexical content (i.e., a larger vocabulary). Figure 1. Generative model. Graphical representation of the generative model for word repetition.

There are three hidden state factors (epoch, target word, and repeated word) and three outcome modalities (proprioception, evaluation, and audition). The hidden state factors had the following levels. Epoch (three levels) indexes the phase of the trial: during the first epoch, the target word is heard; the second epoch involves repeating the word.

The third epoch elicits a positive evaluation if the repeated word matches the target word, and a negative evaluation otherwise. The repeated word factor (five levels) includes the words that our synthetic subject can choose to say. The target word factor (four levels) lists the words the experimenter can ask the participant to repeat. The lines from states to outcomes represent the likelihood mapping, and lines mapping states within a factor represent allowable state transitions.

For clarity, we have highlighted likelihoods and transition probabilities that are conserved over state factors and outcome modalities. One of the five transition probabilities is highlighted for the repeated word factor: the transition is always to 'blue', regardless of the previously spoken word (red, read, triangle, square, or blue).
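The action-dependent transitions just described can be sketched as follows; this is an illustrative reconstruction (not the authors' code) in which each action deterministically moves every previous word to the chosen word:

```python
# The five words the synthetic subject can say (from the figure caption).
words = ["red", "read", "triangle", "square", "blue"]
n = len(words)

# B[action][next][previous]: choosing an action transitions every previous
# word to the chosen word, so each column of B[action] puts all of its
# probability mass on that word.
B = [[[1.0 if nxt == action else 0.0 for _ in range(n)] for nxt in range(n)]
     for action in range(n)]

# Whatever was said before, the action "say blue" always leads to "blue".
say_blue = words.index("blue")
for prev in range(n):
    assert B[say_blue][say_blue][prev] == 1.0
```

Each B[action] is column-stochastic, so the set of matrices defines a valid, fully controlled transition model for the repeated word factor.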

Alternative actions then correspond to alternative choices of transition probability. In Figure 1, the lines represent plausible connections (and their absence reflects implausible connections), with the arrow denoting direction. The likelihood, A, maps between states (i.e., causes) and outcomes. When inversion of a model is formulated in terms of neuronal message passing, these connections take the form of extrinsic connections (i.e., connections between regions). Each sensory outcome modality is associated with its own likelihood. If I believe it is epoch 2 and I am repeating the target word, then my mouth is moving.

Which of these is responsible for generating auditory input depends on the epoch in play. Positive evaluation is given at epoch 3 if I am repeating the previously heard word correctly. For example, if I am repeating 'triangle' after hearing 'triangle' (resp. some other word), the evaluation is positive (resp. negative). The transition matrices, B—transitions among the hidden states encoding prior beliefs about trajectories or narratives—are represented by lines modeling transitions among states within each factor in Figure 1.

When interpreted as message passing between neural populations, these denote intrinsic connections (i.e., connections within a region). For the repeated word factor, these involve transitions to a specific word, where the word depends upon which action is selected; for the target word factor, the transitions are to the same state, which means the target word stays the same over all epochs. The likelihood and transition matrices outlined above can themselves be learned over multiple trials of the word repetition task, resulting in changes in degeneracy and redundancy associated with belief updating.

The more often a given pairing of state and outcome or past and present state is observed, the greater the number of counts attributed to that pairing. By dividing the number of counts for each by their total, we arrive at the new learned probability distributions.
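This count-and-normalize scheme can be sketched in a few lines; the dimensions and observations below are illustrative assumptions, not the paper's actual matrices:

```python
# Toy dimensions: two hidden states, three possible outcomes.
n_outcomes, n_states = 3, 2

# Initial Dirichlet counts encode prior confidence in each state-outcome
# pairing (weak, uniform priors here).
counts = [[1.0] * n_states for _ in range(n_outcomes)]

# Each observed (outcome, state) pairing increments its count.
observations = [(0, 0), (0, 0), (0, 1), (1, 0)]
for outcome, state in observations:
    counts[outcome][state] += 1.0

# Dividing the counts in each state's column by their total yields the
# learned likelihood distribution over outcomes for that state.
likelihood = []
for s in range(n_states):
    total = sum(counts[o][s] for o in range(n_outcomes))
    likelihood.append([counts[o][s] / total for o in range(n_outcomes)])

print(likelihood[0])  # state 0 now predicts outcome 0 most strongly
```

Larger initial counts would make the same observations move these probabilities less, which is the resistance to updating described in the next paragraphs.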

It is worth emphasizing two aspects of this accumulative process. The first is that it closely resembles Hebbian plasticity, where synaptic efficacy is incremented upon the simultaneous firing of a pre- and postsynaptic neuron. The second is that this plastic potential depends upon the number of Dirichlet counts assumed at first exposure to the task.

This number may be thought of as quantifying the confidence in prior beliefs about these conditional probability distributions and the resistance to moving away from these priors.

To enable learning, the initial Dirichlet concentration parameters of both the likelihood and transition prior distributions were set to 1 for plausible mappings and to a small fractional value for implausible ones. During learning, we hoped to demonstrate an overall reduction in redundancy—in relation to accuracy—and an increase in degeneracy, in relation to energy. This allows us to explicitly represent how redundancy may evolve over time within a system: as our model learns about its environment, redundancy decreases with appropriate evidence accumulation until it plateaus.

Please see Friston et al. for details of this belief-updating scheme. The simulated subject was equipped with strong preferences, measured in nats (i.e., natural units), for positive evaluations. Additionally, the subject was allowed to choose from a set of five different deep policies (sequences of actions), each of which is a different permutation of how controlled state transitions might play out. We have associated redundancy with complexity, namely, the difference between posterior beliefs and prior beliefs about hidden states generating outcomes.

Using this definition, we can measure redundancy (via complexity), given some outcomes and a specified generative model, under ideal Bayesian observer assumptions. We start with a simple setup. For each trial, the subject had to repeat one of the four possible target words. To illustrate the effects of learning on degeneracy and redundancy, we computed the free energy, posterior entropy (i.e., degeneracy), and complexity (i.e., redundancy) over trials.

The results are shown in Figure 2 as a function of trials. As anticipated, there is a gradual reduction in free energy, that is, an increase in model evidence due to experience-dependent learning. This is accompanied by a reduction in redundancy and a small reduction in degeneracy.

In other words, skill acquisition or structural learning under this word repetition task increases model evidence by reducing redundancy. In what follows, we turn to the effect of changing connections, not by experience-dependent learning but by selectively removing certain connections.

To examine this, we precluded further learning by preventing any further updates to the model parameters (i.e., fixing the accumulated Dirichlet counts). Figure 2. Learning-dependent changes in degeneracy and redundancy. This figure plots trial-specific estimates of free energy, degeneracy (i.e., entropy), and redundancy (i.e., complexity). We next examined the effects of removing redundant model parameters or connections.

In brief, we simulated structure learning by removing the spurious hidden state (i.e., the redundant state level). This was implemented by setting the corresponding Dirichlet concentration parameters to the same value (i.e., a flat, uninformative distribution). Figure 3. Dirichlet concentration parameters. The scale goes from white (high concentration) to black (low concentration), and gray indicates gradations between these. The top row represents the prior beliefs for both the generative model with (A) and without (C) the spurious level; the key difference to note is the prior beliefs for the model without the spurious level.

We reran the experimental paradigm in the absence of redundant connections (i.e., with the spurious level removed). We then measured the total redundancy and degeneracy, averaged over epochs, trials, hidden factors, and subjects. The results are shown in the first two rows of Table 1: eight simulations, with (Y) and without (N) spurious connections. For ease of visualization, the free energy, redundancy, and degeneracy are also shown as bar plots in Figure 4, with and without the removal of spurious connections.

This table summarizes the simulations in terms of the respective free energy, redundancy (complexity), accuracy, degeneracy (entropy), energy, and cost measurements. These are measured in natural units (nats). There are four sorts of simulations: control (with no lesions), intrinsic lesions, extrinsic lesions, and both intrinsic and extrinsic lesions.

The simulations were repeated with (Y) or without (N) spurious representations. The values are averages calculated for all the state factors across each epoch and simulated subjects. Note that, in order to appropriately simulate learning, more trials were simulated per agent for the control group with the spurious level, relative to the 10 trials simulated for all other groups.

Figure 4. The effects of structure learning. This figure reproduces the data in the first two rows of Table 1. It highlights the effects of removing redundant (i.e., spurious) connections. The key point to take from these results is that selective removal of redundant connections decreases free energy (increases model evidence), driven by the decrease in redundancy.

The targeted elimination of spurious connectivity parameters selectively reduced redundancy in relation to degeneracy. This is consistent with the interpretation of redundancy as a cost that is offset by degeneracy. It also demonstrates how prior beliefs define the relationship between degeneracy and redundancy: differences in priors produce differences in posteriors—as exemplified by the second row of Figure 3 (posterior beliefs in the spurious model are less precise, compared to posterior beliefs in the model without spurious representation).

In short, changes in priors and posteriors necessarily entail changes in redundancy and degeneracy. Furthermore, minimizing redundancy by changing posterior beliefs has a direct impact on the accuracy of the generative model; here, a small increase in accuracy.

However, as detailed below, this has little impact on behavior (postlearning), due to the very similar belief updating under the two models (see the figure below).

Belief updating for the model with (left panels) and without (right panels) the spurious level. Each panel reports belief updating over three epochs of a single trial, for the factors epoch, target word, and repeated word, when repeating the word 'red'.

The x-axis represents time in seconds (divided into 3 epochs), and the y-axis represents posterior expectations about each of the associated states at different epochs in the past or future.

References

Information geometry on complexity and stochastic interaction.
Applications of the principle of maximum entropy: from physics to ecology. J Phys Condens Matter.
Barlow H. Possible principles underlying the transformations of sensory messages. In: Rosenblith W, editor. Sensory communication.
Barlow HB. Inductive inference, coding, perception, and language.
Canonical microcircuits for predictive coding.
Bernstein NA. The co-ordination and regulation of movements. Oxford: Pergamon Press.
The anatomy of auditory word processing: individual variability. Brain Lang.
Clark A. Surfing uncertainty: prediction, action, and the embodied mind. Oxford: Oxford University Press.
Consequences of degeneracy in network function. Curr Opin Neurobiol.
Active inference on discrete state-spaces: a synthesis.
Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci.
The Helmholtz machine. Neural Comput.
Is the P300 component a manifestation of context updating? Behav Brain Sci.
Bayesian brain: probabilistic approaches to neural coding.
Friston K. A free energy principle for a particular physics.
Friston K, Buzsaki G. The functional anatomy of time: what and when in the brain. Trends Cogn Sci.
Active inference and learning. Neurosci Biobehav Rev.
Active inference: a process theory.
A free energy principle for the brain. J Physiol Paris.
Friston K, Penny W. Post hoc Bayesian model selection.
Friston K, Price C. Degeneracy and redundancy in cognitive anatomy.
Active inference and epistemic value. Cogn Neurosci.
The anatomy of choice: dopamine and decision-making.
The free-energy principle: a rough guide to the brain?
Active inference, curiosity and insight.
The graphical brain: belief propagation and active inference. Network Neuroscience.
The mismatch negativity: a review of underlying mechanisms. Clin Neurophysiol.
Garriga J, Vilenkin A. Many worlds in one. Physical Review D.
Gershman SJ. Predicting the past, remembering the future. Curr Opin Behav Sci.
Reinforcement learning and episodic memory in humans and animals: an integrative framework. Annu Rev Psychol.
Gershman SJ, Niv Y. Learning latent structure: carving nature at its joints.
The role of quantum information in thermodynamics: a topical review. J Phys A Math Theor.
The energy landscape of neurophysiological activity implicit in brain network structure. Sci Rep.
The "wake-sleep" algorithm for unsupervised neural networks.
Hobson JA, Friston K. Waking and dreaming consciousness: neurobiological and functional considerations. Prog Neurobiol.
Hochreiter S, Schmidhuber J. Flat minima.
Hohwy J. The self-evidencing brain.
Dissecting the functional anatomy of auditory word repetition. Front Hum Neurosci.
Hutter M. Universal artificial intelligence: sequential decisions based on algorithmic probability.
Isomura T, Friston K. In vitro neural networks minimise variational free energy.
Itti L, Baldi P. Bayesian surprise attracts human attention. Vision Res.
Jaynes ET. Information theory and statistical mechanics. Phys Rev Ser II.
Comparing information-theoretic measures of complexity in Boltzmann machines.
Bayes factors. J Am Stat Assoc.
Kelso JS. Multistability and metastability: understanding dynamic coordination in the brain.
Keep your options open: an information-based driving principle for sensorimotor systems. PLoS One.
Knill DC, Pouget A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci.
Laughlin SB. Efficiency and complexity in neural coding. Novartis Found Symp.
Why does deep and cheap learning work so well? Journal of Statistical Physics.
Linsker R. Perceptual neural organization: some approaches based on network models and information theory. Annu Rev Neurosci.
Divide et impera: subgoaling reduces the complexity of probabilistic inference and problem solving. J R Soc Interface.
Quantification of degeneracy in Hodgkin-Huxley neurons on Newman-Watts small world network. J Theor Biol.
Mohan V, Morasso P. Passive motion paradigm: an alternative to optimal control. Front Neurorobot.
Morlet D, Fischer C. MMN and novelty P3 in coma and other altered states of consciousness: a review. Brain Topogr.
The mismatch negativity (MMN) in basic research of central auditory processing: a review.
Optican L, Richmond BJ. Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. II. Information theoretic analysis. J Neurophysiol.
Precision and false perceptual inference. Front Integr Neurosci.
Parr T, Friston K. The computational anatomy of visual neglect. Cereb Cortex.
Computational neuropsychology and Bayesian inference.
Degeneracy and cognitive anatomy.
Active inference: demystified and compared.
Evidence for surprise minimization over value maximization in choice behavior.
Schwartenbeck P, Friston K. Computational phenotyping in psychiatry: a worked example.
Seifert U. Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep Prog Phys.
Information and efficiency in the nervous system: a synthesis. PLoS Comput Biol.
Shipp S. Neural elements for predictive coding. Front Psychol.
The hippocampus as a predictive map.
Thermodynamics of prediction. Phys Rev Lett.
Planning to be surprised: optimal Bayesian exploration in dynamic environments. Berlin, Heidelberg: Springer.
Reinforcement learning: an introduction.
Toward the neural implementation of structure learning.
Testolin A, Zorzi M. Probabilistic models and generative neural networks: towards a unified framework for modeling normal and impaired neurocognitive functions. Front Comput Neurosci.
Tononi G, Cirelli C. Sleep function and synaptic homeostasis. Sleep Med Rev.
Measures of degeneracy and redundancy in biological networks.
Information processing in decision-making systems.
With an eye on uncertainty: modelling pupillary responses to environmental volatility.
Wald A. An essentially complete class of admissible decision functions. Ann Math Stat.
Wheeler JA. Information, physics, quantum: the search for links.
Whitacre J, Bender A. Degeneracy: a design principle for achieving robustness and evolvability.

Degeneracy and Redundancy in Active Inference. Noor Sajid, Thomas Parr, Thomas M Hope, Cathy J Price, Karl J Friston. Wellcome Centre for Human Neuroimaging.

Actually, there are 3 stop codons, but these have identical 'meanings' to the cell and are therefore best counted as one. The fact that DNA could accommodate a larger variety of meanings (namely, 64) than it does (namely, 21) means that, in a technical sense, the DNA code is degenerate.
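The arithmetic behind this degeneracy is easy to check; the codon sets below are from the standard genetic code (leucine's six codons and the three stop codons):

```python
from itertools import product

# Four bases taken three at a time give 64 possible codons,
# yet the code only needs 21 distinct meanings (20 amino acids + stop).
codons = ["".join(c) for c in product("ACGT", repeat=3)]
print(len(codons))  # 64

# Synonymy absorbs the excess: all six of these DNA codons mean leucine,
# and all three of these mean stop.
leucine = {"CTT", "CTC", "CTA", "CTG", "TTA", "TTG"}
stop = {"TAA", "TAG", "TGA"}
print(len(codons) - 21)  # 43 codons' worth of "spare" coding capacity
```

Many causes (codons) mapping to one consequence (meaning) is exactly the degenerate many-to-one structure discussed above.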

The redundancy of the DNA code refers to the fact that either strand of the double-stranded DNA is theoretically enough to specify all the information in the genome. This is true because the two strands simply mirror one another - they're complementary.

If the other strand doesn't technically hold any additional information, why does the body persist with it? Chiefly, the complementary strand is useful to check for, and correct, errors. Having two copies of an important document is vastly preferable to having only one. If you spill coffee on one copy and can't make out some of the words, you can use the remaining copy to fill in the gaps.
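The coffee-spill analogy can be sketched computationally; `repair` is a hypothetical helper (not a biological mechanism) that re-reads each unreadable base, marked 'N', from the complementary strand:

```python
# Watson-Crick base pairing: each base determines its partner.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def repair(damaged, complement_strand):
    """Fill unreadable positions (marked 'N') using the complementary strand."""
    return "".join(
        base if base != "N" else COMPLEMENT[comp]
        for base, comp in zip(damaged, complement_strand)
    )

original   = "ATGCCGTA"
complement = "TACGGCAT"             # the base-paired partner strand
damaged    = "ATNCCGNA"             # two unreadable positions
print(repair(damaged, complement))  # ATGCCGTA
```

Because the second strand mirrors the first, every damaged position can be recovered exactly: the redundancy carries no new information, but it buys error correction.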


