Was 2024 the year of embodied AI?
2023 was the big breakthrough year for generative AI – systems with a sometimes superhuman, creative ability to generate and analyse text and images, even videos.
Many predicted that 2024 would be the year of a similar breakthrough in the ability to control machines that move through, and manipulate, the physical world. So far, this has only been possible with narrow AI systems, designed for very specific tasks. They often fail completely when the problem looks a little different from what they were trained for. Many believe that 2024 will be the year of large-scale foundational models for physical robotics: one, or just a few, large models that can generate innovative and purposeful movements in virtually any robot or machine, making them move through unstructured environments and purposefully manipulate their physical surroundings. Is this realistic? And what would it mean for automated forestry machines, for example?

As early as 2022, we connected an AI to the control system of a forwarder at Skogforsk's Jälla test area outside Uppsala. We let it take over the control of the shuttle arm forwarder, XT28. We had trained the AI in advance, in a simulated environment on Umeå University's supercomputer. The model had undergone many millions of training steps in order to learn the relationships between control signals, sensor data, and what these say about the state of the machine in its environment, as well as which control actions are likely to fulfil the objective: to move safely and efficiently over uneven terrain. Now it was ready to be connected and to take over control of the physical machine. As far as we know, this was the first time anything like this had been done with this type of machine.
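The training pattern described above – control signal in, sensor reading out, reward for progress toward the objective – can be sketched in miniature. The toy example below is purely illustrative: the one-dimensional "tilt" dynamics, the proportional policy, and the reward function are invented stand-ins, not the project's actual simulator or learning method.

```python
import random

# Toy stand-in for the simulated machine: the state is a single "tilt"
# value disturbed by uneven terrain, and the policy is one parameter
# (a proportional gain) mapping the sensed tilt to a control signal.

def simulate_episode(gain, steps=100, seed=0):
    """Run one episode; reward is higher the closer the tilt stays to zero."""
    rng = random.Random(seed)
    tilt, reward = 0.0, 0.0
    for _ in range(steps):
        terrain_bump = rng.uniform(-1.0, 1.0)  # uneven-terrain disturbance
        control = -gain * tilt                 # policy: counteract sensed tilt
        tilt = tilt + terrain_bump + control   # crude dynamics update
        reward += -abs(tilt)                   # objective: stay level
    return reward

def train(candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """'Training' reduced to its essence: keep the policy parameter
    that earns the best reward in simulation."""
    return max(candidates, key=simulate_episode)

best_gain = train()
```

A real system replaces the single gain with millions of neural-network parameters and the parameter sweep with gradient-based reinforcement learning, but the loop – act, observe, score, improve – is the same.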
The result was... quite okay. Far from perfect, and certainly not safe enough to start using. We had managed to overcome many of the discrepancies that exist between a simulated training environment and the physical reality (which is ‘confusing’ for the AI) and in so doing, demonstrated that the methodology works in principle. We are able, with no small amount of effort, to train ‘narrow’ AI models to control individual functions in a simulated environment, and we now have a methodology for transferring that solution to physical machines.
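One common way of narrowing the simulation-to-reality gap – a plausible reading of "overcoming the discrepancies", though the text does not name the exact technique used – is domain randomization: varying the simulator's physical parameters from episode to episode so the policy cannot overfit to one exact configuration. The parameter names and ranges below are hypothetical.

```python
import random

# Illustrative domain-randomization sketch: each training episode is run
# in a slightly different "world", so the learned policy must be robust
# to the uncertainty it will meet on the physical machine.

def sample_sim_parameters(rng):
    """Draw a plausible simulator configuration for one episode."""
    return {
        "mass_scale": rng.uniform(0.8, 1.2),        # +/-20% mass uncertainty
        "friction": rng.uniform(0.4, 1.0),          # range of ground friction
        "sensor_noise_std": rng.uniform(0.0, 0.05), # imperfect sensor readings
    }

rng = random.Random(42)
episodes = [sample_sim_parameters(rng) for _ in range(1000)]
```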
Two interesting follow-up questions now arise. Firstly, what new, smart driver support are we already able to create that would relieve the machine operator in situations where that is desirable? Our focus here is now directed to the loading of logs onto timber lorries at the landing.
The second question is, how scalable is this methodology? If we use much larger computational resources and can train much larger models, can we then attain human capability in creatively planning and controlling the machine for the many tasks and the varying conditions that prevail in the forest? It should be noted that the models we have trained so far have a few million model parameters, while OpenAI's GPT-4, for example, has about one trillion (10¹²) parameters and has been trained on a substantial share of the world's accumulated books and articles. This is why we are currently devoting a lot of time and resources in Mistra Digital Forest to the collection of data from machines in the field, and from increasingly realistic simulators.
The sceptics do not believe that it is yet possible to train generic foundational models for embodied AI. One of the arguments put forward is that there is not enough training data in the world relative to how varied machines and physical situations can be. Other arguments are that embodied AI requires long-term memory and the ability to plan hierarchically, that is, to break down a problem into sub-problems, which are broken down into even smaller sub-problems, and so on, without getting bogged down in irrelevant details. The arguments culminate in the claim that, with current methodology, such models would require more energy to train and use than humanity can spare.
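The hierarchical planning the sceptics point to can be illustrated with a trivial recursive decomposition. The task names below are invented for the forestry setting and carry no claim about how a real planner – or the project's machines – would actually work:

```python
# Illustrative sketch of hierarchical planning: a task is expanded into
# sub-tasks recursively until only primitive actions remain. Tasks with
# no entry in PLAN are treated as primitive.

PLAN = {
    "deliver timber": ["drive to pile", "load logs", "drive to landing", "unload"],
    "load logs": ["position crane", "grip log", "lift", "place on bunk"],
}

def expand(task, depth=0):
    """Recursively expand a task into an indented list of sub-tasks."""
    lines = ["  " * depth + task]
    for sub in PLAN.get(task, []):
        lines.extend(expand(sub, depth + 1))
    return lines

plan_text = "\n".join(expand("deliver timber"))
```

The hard part, which this sketch sidesteps entirely, is deciding the decomposition itself – knowing which sub-problems matter and which details to ignore.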
That's why I actually think it could be beneficial if things don't move too quickly. We need to have the time to discuss what it is we are trying to build, and understand the consequences of it.
Martin Servin, Senior Lecturer at Umeå University and leader of the Automation work package in the Mistra Digital Forest research programme
Caption: The image shows a simulator for training and testing AI models for automated harvesting.
Source: Anders Backman, Algoryx Simulation