On May 12, 2022, DeepMind released Gato, which it described as a “generalist agent.” Using a single transformer architecture, the team trained an agent that can perform over 600 tasks, including captioning images, playing video games, and moving real robotic arms.
In my opinion, this does look like a possible path toward a kind of general AI, but serious limitations will prevent it from being the kind of intelligence that humans possess. One limitation is that Gato acts as a single thread of sequential action, whereas human intelligence runs multiple such threads concurrently.
Transformers can likely be used to do some “thinking,” but the symbolic output would need to be transformed by a different kind of network into a parallel program for execution (consider how DALL-E 2 uses a diffusion model to decode its representations into images). This would be akin to the motor system in the human brain. It would also be necessary to provide an embedding for time, so that the model can output symbolic time data to help in formulating the action plan.
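One simple way to provide such a time embedding is to reuse the sinusoidal scheme the original Transformer used for token positions, applied to a timestamp instead of an index. The sketch below is illustrative, not anything Gato is documented to do; the function name and dimension choice are my own assumptions.

```python
import math

def time_embedding(t, dim):
    """Map a scalar timestamp t to a dim-dimensional vector using
    sinusoids at geometrically spaced frequencies, analogous to the
    Transformer's positional encoding. (Illustrative sketch only.)"""
    emb = []
    for i in range(dim // 2):
        # Lower i -> higher frequency; captures fine time differences.
        freq = 1.0 / (10000 ** (2 * i / dim))
        emb.append(math.sin(t * freq))
        emb.append(math.cos(t * freq))
    return emb
```

Because nearby timestamps map to similar vectors while distant ones diverge, an attention layer consuming these embeddings could, in principle, reason about when events happened and when actions should occur.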
Gato seems like a fruitful first step toward building more general agents, and scaling it to a much larger number of parameters will likely yield impressive results.