A major problem faced by an RL agent is how to determine the relevant states and actions in the first place: given noisy sensory information from the world, how does the agent identify the features that constitute a state, and then work out which actions are relevant in that state [27, 28 and 29]? This problem is essentially one of perception and sensorimotor learning, as it depends on the capacity to segment and identify relevant objects, contexts and actions [30, 31, 32 and 33]. One approach to this problem involves setting up an experimental situation in which a given stimulus has multiple dimensional attributes (e.g. shape, color, motion). Inspired by earlier cognitive set-shifting tasks [34 and 35], one of these dimensions is, unbeknownst to the participant, selected to be ‘relevant’ in the sense of being associated with reward, and the goal of the agent is to work out which attribute is relevant, as well as which exemplar within that attribute (e.g. a green color vs a red color) is actually reinforced [36 and 37]. Bayesian inference or RL can then be used to establish the probability that a particular dimension is relevant, which can in turn guide further learning about the value of individual exemplars within that dimension.
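As a rough illustration of how such a model can work, the sketch below maintains a Bayesian posterior over which dimension is relevant and uses it to weight delta-rule updates of exemplar values. The specific dimensions, exemplar counts, learning rate and softmax temperature are illustrative assumptions, not details of the models in the studies cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative task structure (assumed): three dimensions, each with three exemplars.
dimensions = ["shape", "color", "motion"]
n_exemplars = 3

# Belief that each dimension is the relevant one (uniform prior).
p_relevant = np.ones(len(dimensions)) / len(dimensions)

# Learned reward value of each exemplar within each dimension.
values = np.full((len(dimensions), n_exemplars), 0.5)

alpha = 0.3  # learning rate for exemplar values (assumed)
beta = 5.0   # softmax inverse temperature (assumed)

def choose(options):
    """options[i][d] = exemplar shown on dimension d by the i-th compound stimulus."""
    dims = np.arange(len(dimensions))
    # Each option is valued by its exemplar values, weighted by dimension relevance.
    option_values = np.array([np.sum(p_relevant * values[dims, o]) for o in options])
    p_choice = np.exp(beta * option_values)
    p_choice /= p_choice.sum()
    return rng.choice(len(options), p=p_choice)

def update(chosen, reward):
    """Update the relevance posterior and the chosen exemplars' values after feedback."""
    global p_relevant
    dims = np.arange(len(dimensions))
    chosen_values = values[dims, chosen]
    # Treat each dimension's exemplar value as the reward probability it predicts.
    likelihood = chosen_values if reward == 1 else 1.0 - chosen_values
    p_relevant = p_relevant * likelihood
    p_relevant /= p_relevant.sum()
    # Delta-rule value update, weighted by how likely each dimension is to matter.
    values[dims, chosen] += alpha * p_relevant * (reward - chosen_values)

# One hypothetical trial: two compound stimuli, feedback favours color exemplar 0.
options = [np.array([0, 1, 2]), np.array([2, 0, 1])]
picked = choose(options)
reward = int(options[picked][1] == 0)
update(options[picked], reward)
```

In this sketch, outcome feedback does double duty: it sharpens the belief about which dimension matters and, in proportion to that belief, updates the values of the exemplars that were chosen.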
The ability to construct a simplified representation of the environment, focused only on essential details, reduces the complexity of the state-space encoding problem. One way to accomplish this is to represent states by their degree of similarity to other states, whether via relational logic [38], transition statistics [39•] or feature-based metrics [40 and 41]. Furthermore, generalized state-space representations can speed up state-space learning considerably by avoiding the time cost of re-learning repeated environmental motifs (if I learn how to open my first door, I can generalize this to all doors). RL agents ‘in the real world’ can also suffer from a dimensionality problem in which there are too many states over which to integrate information to make decisions, let alone to learn [42]. It has been proposed that state-space structures be compressed in order to make such calculations tractable. In particular, multiple actions (and their interceding states) might be concatenated into ‘meta-actions’ or, more generally, ‘options’ [43]. Decision policies would then be developed over these options rather than over individual actions, thus reducing the computational complexity of any policy-learning algorithm.
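A minimal sketch of this idea, assuming a toy corridor environment, a hand-specified option set and arbitrary hyper-parameters (none of which are taken from [43]), is SMDP-style Q-learning in which an option is simply a fixed sequence of primitive actions executed as a unit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical corridor of 10 states; reward only on reaching the rightmost state.
N_STATES, GOAL = 10, 9

# Each option is a fixed sequence of primitive steps executed as a unit.
OPTIONS = {
    "left": [-1],
    "right": [+1],
    "sprint_right": [+1, +1, +1],  # a 'meta-action' concatenating three steps
}

# Optimistic initial values encourage systematic exploration of all options.
Q = {name: np.ones(N_STATES) for name in OPTIONS}
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # assumed hyper-parameters

def run_option(state, name):
    """Execute an option to completion; return next state, discounted reward, and gamma**k."""
    total_reward, discount = 0.0, 1.0
    for step in OPTIONS[name]:
        state = min(max(state + step, 0), N_STATES - 1)
        total_reward += discount * (1.0 if state == GOAL else 0.0)
        discount *= gamma
        if state == GOAL:
            break
    return state, total_reward, discount

for episode in range(300):
    s = 0
    for t in range(200):  # cap episode length so the demo always terminates
        if s == GOAL:
            break
        # Epsilon-greedy choice over options rather than over primitive actions.
        if rng.random() < epsilon:
            o = rng.choice(list(OPTIONS))
        else:
            o = max(OPTIONS, key=lambda name: Q[name][s])
        s_next, reward, discount = run_option(s, o)
        # SMDP Q-learning update: the bootstrap term is discounted by the option's duration.
        best_next = 0.0 if s_next == GOAL else max(Q[name][s_next] for name in OPTIONS)
        Q[o][s] += alpha * (reward + discount * best_next - Q[o][s])
        s = s_next
```

However many primitive steps an option bundles together, the learner still faces only three choices per state, which is how policies defined over options shrink the effective size of the policy-learning problem.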