Embodied Discovery Models

Embodied

Our science is rooted in the idea that understanding - and by extension, intelligence - can only be obtained through interaction with the environment. Approaches like LLMs, which learn from symbols (i.e. words), cannot understand what those symbols mean because they never experience the real world that gives them meaning. They just learn to place one word after another based on how often they have seen similar combinations of words in their training corpus.

Hence, to give understanding to our models, we need to build them with the idea that they must experience their environment. The manipulation of symbols (i.e. words) will come at a later stage, as a byproduct of the abstractions each model builds of its environment.

To accomplish this, we need to give each model a body: a real-world (or digital) set of sensors and actuators that allows it to observe and interact with its environment. This set delineates what kind of world each model experiences, as each model can react only to what it perceives, and only in ways its actuators can execute.

With these limitations, it is clear that the body is a critical part of the design of any model. Hence, the question is: how can we build an algorithm able to control any body? The answer: The Embodiment.

Empowering All Bodies

To adapt each model to a different body while keeping the same fundamental algorithm as its "brain", an interface layer is mandatory. This interface is what we call The Embodiment.

The Embodiment is a communication layer that translates raw input signals into signals that our algorithm understands. For that, we decided to use Sparse Distributed Representations (SDRs), inspired by the work of Jeff Hawkins and his Thousand Brains Theory.

The goal here is to allow us to develop a general algorithm that only needs to know how to deal with SDRs. Hence, we can focus the development of our algorithm on how to learn from the world, generate abstractions, produce intelligence, etc., leaving the nitty-gritty of how to deal with different data types, how to adapt to different bodies, etc. to the Embodiment.
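As a rough illustration of this separation, the sketch below shows one possible shape for such an interface. Everything in it is an assumption made for illustration: the `Embodiment` class, its `encode` and `decode` methods, the toy `LightSeekerBody`, the placeholder `brain_step`, and the choice to store an SDR as the set of its active bit indices. It is not the actual Embodiment.

```python
from abc import ABC, abstractmethod


class Embodiment(ABC):
    """Hypothetical interface: translates between one body and SDRs.

    An SDR is stored here as a frozenset of active bit indices (an assumption).
    """

    @abstractmethod
    def encode(self, raw_observation) -> frozenset:
        """Turn a raw sensor reading into an SDR."""

    @abstractmethod
    def decode(self, action_sdr: frozenset):
        """Turn an SDR produced by the algorithm into an actuator command."""


class LightSeekerBody(Embodiment):
    """Toy body: one binary light sensor and one left/right actuator."""

    def encode(self, raw_observation) -> frozenset:
        # Dark activates bits 0-3, bright activates bits 4-7.
        return frozenset(range(4, 8)) if raw_observation else frozenset(range(0, 4))

    def decode(self, action_sdr: frozenset) -> str:
        # Mostly low bits (< 8) mean "left"; mostly high bits mean "right".
        low = sum(1 for bit in action_sdr if bit < 8)
        return "left" if low >= len(action_sdr) - low else "right"


def brain_step(observation_sdr: frozenset) -> frozenset:
    """Placeholder for the body-agnostic algorithm: it only ever sees SDRs."""
    # Fixed toy policy: if the "bright" bits are active, answer with high bits.
    if 4 in observation_sdr:
        return frozenset(bit + 8 for bit in observation_sdr)
    return observation_sdr


def control_step(body: Embodiment, raw_observation):
    """One sense-think-act cycle routed through the Embodiment."""
    return body.decode(brain_step(body.encode(raw_observation)))
```

Swapping in a different body only requires another `Embodiment` subclass; `brain_step` and `control_step` stay untouched, which is the point of the interface layer.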

This setup allows us to develop one algorithm able to control any body it is deployed on. Each body then has its own model, produced by letting the same algorithm control that specific body and learn how it works. This mirrors how brains work: the fundamental algorithm is the same in every human, but each brain behaves differently depending on the body it controls (and the experience it gathers).

Sparse Distributed Representations

Sparse Distributed Representations[1,2], or SDRs, are a vector-based data type (long binary vectors in which only a small fraction of the bits is active at any time) able to represent any kind of information. Moreover, recent studies suggest that the brain works with SDRs, as the activation of neurons in the brain conforms to the definition of an SDR.

Given their nature, there are mechanisms to translate data sources as different as images, waves, or plain numbers into SDRs in an efficient way. This versatility makes them perfect as an interface between any complex body and a unified algorithm serving as "brain", because it allows the algorithm to focus on working with SDRs regardless of what those SDRs represent in the real world. It is the Embodiment that knows what those SDRs represent, so it is the one that does the work of translating between the body and the brain.
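To make the translation of plain numbers concrete, here is a minimal sketch of a scalar encoder in the spirit of the encoders used with SDRs. The function names and all parameters (`min_value`, `max_value`, `size`, `active_bits`) are assumptions chosen for illustration, and the vector is kept small for readability; real SDRs are typically far larger and sparser.

```python
def encode_scalar(value, min_value=0.0, max_value=100.0, size=64, active_bits=8):
    """Encode a number as the set of active bit indices of an SDR-like vector.

    A contiguous run of `active_bits` ones slides along a vector of `size`
    bits, so nearby values share most of their active bits.
    """
    value = min(max(value, min_value), max_value)  # clamp to the encoder range
    n_buckets = size - active_bits + 1             # possible start positions
    fraction = (value - min_value) / (max_value - min_value)
    start = int(round(fraction * (n_buckets - 1)))
    return frozenset(range(start, start + active_bits))


def overlap(sdr_a, sdr_b):
    """Similarity between two SDRs: the number of shared active bits."""
    return len(sdr_a & sdr_b)


near = overlap(encode_scalar(50), encode_scalar(52))  # close values
far = overlap(encode_scalar(50), encode_scalar(90))   # distant values
print(near, far)  # prints "7 0"
```

The algorithm only ever sees the bit indices; the knowledge that they stand for a number between 0 and 100 lives in the encoder, that is, on the Embodiment side.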

The only downside of using SDRs is the loss of continuity, as they are discrete by nature. This should not be a problem, as the real world is also discrete (even though it appears continuous to us), given that everything is made of atoms, a discrete unit.

[1] Ahmad, S., & Hawkins, J. (2016). How do neurons operate on sparse distributed representations? A mathematical theory of sparsity, neurons and active dendrites. arXiv preprint arXiv:1601.00720.

[2] Cui, Y., Ahmad, S., & Hawkins, J. (2017). The HTM spatial pooler—A neocortical algorithm for online sparse distributed coding. Frontiers in computational neuroscience, 11, 111.

Discovery

Having an embodied algorithm is a first step towards building intelligent models. However, it is not enough to just have a body. The algorithm also needs to learn in the proper way: by Discovery.

Learning from canned samples (i.e. datasets) is like suddenly appearing in one scenario after another without transition or cause. This hampers learning because the algorithm never observes the transition that leads from one scenario to the next, hence losing any concept of causality.

To solve this problem, it is fundamental that the algorithm learns in an interactive setting: the algorithm should be an agent that not only observes the environment, but also acts on it and observes how its actions affect it. With this interactive setup, the algorithm is capable of learning causality and hence discovering how its body works and how its environment reacts to its actions.
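The difference from a dataset can be sketched with a toy agent that records what each of its own actions does. The `LineWorld` environment, the random action choice, and the `effects` table are all hypothetical stand-ins chosen for brevity; they only illustrate the act-then-observe loop, not our learning mechanism.

```python
import random
from collections import defaultdict


class LineWorld:
    """Toy environment (hypothetical): positions 0..4 on a line."""

    def __init__(self):
        self.position = 2

    def step(self, action):
        """Apply an action (-1 or +1) and return the new position."""
        self.position = min(4, max(0, self.position + action))
        return self.position


def explore(steps=200, seed=0):
    """Act, observe the consequence, and record it as (state, action) -> outcome."""
    rng = random.Random(seed)
    env = LineWorld()
    effects = defaultdict(set)
    state = env.position
    for _ in range(steps):
        action = rng.choice((-1, 1))
        next_state = env.step(action)
        # The agent sees the transition it caused, not just isolated states.
        effects[(state, action)].add(next_state)
        state = next_state
    return effects
```

Because every recorded outcome is tied to the state and action that produced it, the table captures cause and effect, including the fact that pushing against a wall changes nothing. A shuffled list of positions would contain none of that.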

However, having an interactive setting for learning is not enough. If we later freeze the training to do "test" runs, as current AI systems do, we are still impeding the development of intelligence. In current algorithms, this is a limitation of the learning mechanism - backpropagation - but it shouldn't be. To properly develop intelligence we need a setup similar to what humans do: learn continuously. That setup is what has been called continuous learning.

Continuous Learning As A Starting Point

Continuous learning is desirable for three main reasons: it allows the model to adapt to unforeseen circumstances, it allows it to learn new tasks while executing the ones it already knows, and it simplifies knowledge transfer scenarios.

First, because the model is continuously learning, it is able to adapt to unforeseen circumstances. It can do so because it can analyse the new situation and process its options but, most importantly, it can learn the characteristics of the new situation in real time by trying and testing. This gives the model a flexibility that is impossible to achieve if its knowledge is frozen in a version that never encountered this new situation. It also allows the model to solve the same situation in the future without needing to analyse it again.

Second, because it is always learning, the model is able to learn new tasks while executing the ones it already knows, hence adapting to changing requirements. If the model were frozen in time, this would be impossible after training, and you would need a new model to perform the new tasks. This increases the adaptability, and the usefulness, of any model trained in a continuous learning setup.

Finally, this setup simplifies knowledge transfer scenarios in which a model trained in a digital simulation is later deployed in the real world. Once deployed in the real world, the model keeps learning, and thus is able to better adapt to the small (or not so small) differences between what it was perceiving in the simulation, and what it can perceive now from the real world. This is a great advantage because it simplifies model development substantially, allowing for an initial discovery of the body in a controlled, digital environment before moving to the expensive real world body.

Given these arguments, an algorithm able to learn under a continuous learning setup is fundamental to developing intelligent models. However, to date, the most effective learning algorithm we have developed - backpropagation - is not suited for it.

Backpropagation As A Stopper

Backpropagation has three main problems that invalidate it as a continuous learning mechanism: efficiency, stability, and catastrophic forgetting. First, backpropagation is a brute-force approach to learning: it needs to process thousands of samples to learn basic things like numbers, while humans are able to learn them from a couple of samples. This inefficiency makes it unsuitable for learning in a continuous setup because it requires billions of attempts to learn even a basic policy, and in a continuous learning setup you cannot attempt the same thing that many times because the environment changes.

Second, backpropagation has a serious stability problem: learning must be cut off at some point, or else the algorithm will either fail to converge to a valid solution (due to the variability of the domain) or fit its training samples too closely (a.k.a. over-fitting). This stability problem makes it unfit for continuous learning, and it is the reason why "early stopping" is so fundamental when training LLMs.

Finally, backpropagation suffers from catastrophic forgetting: when learning a new task, it forgets an old one. This problem arises especially in the continuous learning setup because there you are not always learning the same task, but continuously learning new ones. It derives from backpropagation's global learning approach, in which all neurons have to learn from the latest sample, forcing neurons that learned to solve other tasks to overwrite what they learned with the new task.

For all these reasons, an alternative to backpropagation was necessary as a learning mechanism for an algorithm that aspires to build intelligent models. Our bet is that such an algorithm has to be a local learning algorithm.

Local Learning As An Enabling Tool

Local learning means that each neuron (or computational unit) learns independently of the others and of the global error. Each one learns based only on the knowledge it can extract from its surrounding peers. That is why it is called local.

The key advantage of this kind of learning is that each neuron can decide when it learns, hence avoiding the forced learning associated with global learning mechanisms. This allows each neuron to specialize, avoiding unnecessary updates that could produce catastrophic forgetting. It also allows the whole system to stabilize, as neurons will not be constantly updating what they have learned.
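As a generic illustration of a gated local rule (not Discovery Learning itself), the sketch below uses Oja's variant of Hebbian learning as a stand-in: a unit updates its weights using only its own inputs and activation, and skips the update entirely when its activation falls below a threshold. The function name, `learning_rate`, and `threshold` are assumptions.

```python
def local_update(weights, inputs, learning_rate=0.1, threshold=0.5):
    """Update one unit from its own inputs and activation; no global error.

    Returns the (possibly unchanged) weights and whether the unit learned.
    """
    activation = sum(w * x for w, x in zip(weights, inputs))
    if activation < threshold:
        # The unit decides this sample is not its business and does not learn.
        return list(weights), False
    # Oja's rule: a Hebbian term (activation * x) minus a decay term that
    # keeps the weights from growing without bound.
    new_weights = [
        w + learning_rate * activation * (x - activation * w)
        for w, x in zip(weights, inputs)
    ]
    return new_weights, True


# Two units specialised on different inputs see the same sample.
unit_a, learned_a = local_update([1.0, 0.0], [1.0, 1.0])
unit_b, learned_b = local_update([0.0, 0.2], [1.0, 1.0])
print(learned_a, learned_b)  # prints "True False"
```

Only the unit that responds strongly to the sample changes; the other keeps its weights, which is the mechanism by which gating can protect previously acquired knowledge.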

Another advantage of local learning is its ability to reduce the cost of learning. By producing fewer but more meaningful updates, a local learning mechanism reduces the amount of learning the model needs. This in turn allows for more intelligent learning that does not need to process the same sample millions of times.

With all these advantages in mind, we have developed our own learning mechanism, which we call Discovery Learning.

Discovery Learning

Discovery Learning is our patent-pending learning mechanism to develop intelligent models. It is a local learning mechanism specifically designed for continuous learning. It draws on the Reinforcement Learning, Predictive Coding (i.e. World Model), and Hebbian Learning theories, and on the Appreciation Thesis, to define how the updates are performed.

Its main focus is on reproducing the kind of learning that takes place in a baby: a baby does not observe the world passively, nor does it learn how the world works by watching others play with it. It learns by playing. First, it plays with its body to learn how to control it, what its limits are, what is painful and what is pleasurable. Once the baby has got hold of its body, it starts to play with the world, for example throwing things to check how gravity (and consequence) works.

To reproduce that behavior, our algorithm does not learn like traditional AI, but instead discovers: its body, its environment, its limits, its world. It does not learn physical laws; it gets an intuition of how they work by playing with them. It does not learn to solve problems; it gets an intuition of how to solve them by playing with their components.

All of this is possible thanks to the intrinsic reward we developed inspired by the Free Energy Principle of Karl Friston, and the Learning Progress Hypothesis of Pierre-Yves Oudeyer. With this reward, there is no need to define a task reward from the outside to drive an agent's learning, but instead the agent will learn to solve the problem based on the intrinsic reward it finds in acquiring knowledge.
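One common way to turn the Learning Progress Hypothesis into a number is to reward the agent for how much its prediction error has dropped recently. The sketch below follows that generic idea only; the `LearningProgressReward` class, its `window` parameter, and the two-window comparison are assumptions, not the intrinsic reward described above.

```python
from collections import deque


class LearningProgressReward:
    """Intrinsic reward = recent decrease in the agent's prediction error."""

    def __init__(self, window=4):
        self.window = window
        # Keep only the last two windows of prediction errors.
        self.errors = deque(maxlen=2 * window)

    def update(self, prediction_error):
        """Record a new prediction error and return the current reward."""
        self.errors.append(prediction_error)
        if len(self.errors) < 2 * self.window:
            return 0.0  # not enough history yet to measure progress
        errors = list(self.errors)
        older = sum(errors[: self.window]) / self.window
        recent = sum(errors[self.window :]) / self.window
        # Positive when errors are shrinking, i.e. the agent is learning.
        return older - recent


reward = LearningProgressReward(window=4)
for error in (8, 7, 6, 5, 4, 3, 2, 1):
    progress = reward.update(error)
print(progress)  # prints "4.0"
```

Under this measure, a situation the agent already predicts perfectly and one it cannot predict at all both yield a reward near zero; the reward peaks where the agent's predictions are improving fastest.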

Finally, with this approach, our algorithm is able to learn "on the fly" how to control its body, how to solve problems, and how to reason, without needing costly pre-training processes or any effort to collect huge amounts of data. It just needs its body and its environment, and Discovery Learning does the rest.

Models

Once we have laid out the pieces of the puzzle (i.e. an Embodiment and a Discovery Learning mechanism), it is clear that the traditional unit of computation used by most machine learning algorithms is not enough. The neuron used in Artificial Neural Networks is ill-suited to developing an embodied algorithm that learns locally and continuously. Hence, we needed to develop our own patent-pending computational unit, which we call the Cortical Computational Unit (CCU).

This CCU is inspired by the definition of a cortical column in the Thousand Brains Theory of Jeff Hawkins, and as such it has connections in three key directions: down, to CCUs from the immediate lower level of the architecture; up, to the CCUs from the immediate higher level of the architecture; and lateral, to all the CCUs in its same level of the architecture.

With this unit, we can stack layers of CCUs, from the bottom layer that connects with the Embodiment and processes the most urgent, quick-reaction inputs, to the top layer that processes reasoning and long-term planning. These layers are organised in a stack where the input and output are at the bottom, allowing this bottom layer to be already capable of executing basic reactive agent policies. The goal of the upper layers is then to give depth to these policies, allowing for the creation of abstractions, the disregard of some inputs, the focus on others, etc. This in turn facilitates stability and avoids catastrophic forgetting by storing more general information in the upper levels of the stack.
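The three connection directions and the layered stack can be pictured with a small data-structure sketch. It captures wiring only: the `CCU` dataclass, the `build_stack` helper, and the assumption that every unit connects to all units of the adjacent levels are illustrative choices, and nothing here reflects the internals of a real CCU.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison; the links form cycles
class CCU:
    """Hypothetical stand-in for a Cortical Computational Unit (wiring only)."""

    level: int
    down: list = field(default_factory=list, repr=False)     # units one level below
    up: list = field(default_factory=list, repr=False)       # units one level above
    lateral: list = field(default_factory=list, repr=False)  # peers in the same level


def build_stack(units_per_level):
    """Create levels of CCUs (index 0 is the bottom) and wire them together."""
    levels = [
        [CCU(level=i) for _ in range(count)]
        for i, count in enumerate(units_per_level)
    ]
    for i, layer in enumerate(levels):
        for unit in layer:
            unit.lateral = [other for other in layer if other is not unit]
            if i > 0:
                unit.down = list(levels[i - 1])
            if i + 1 < len(levels):
                unit.up = list(levels[i + 1])
    return levels


stack = build_stack([3, 2, 1])
bottom_unit = stack[0][0]
print(len(bottom_unit.down), len(bottom_unit.lateral), len(bottom_unit.up))  # prints "0 2 2"
```

In this picture the bottom level is the one that would exchange SDRs with the Embodiment, which is why its `down` list is empty.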

A key fact here is that our algorithm creates new CCUs, and with them new layers, when needed. This allows for a dynamic development of the stack that better fits the specific body the algorithm is controlling. It is what allows us to control any kind of body with the same algorithm, which builds a body-specific model by playing with the body and its environment.

Finally, a careful integration of the Discovery Learning mechanism into this architecture allows for the development of our Embodied Discovery Models, the next step in the road to Artificial General Intelligence.