AI labs are spending billions of dollars on human data labelers who help AI models learn through reinforcement learning (RL). However, this means the AI will only ever learn what humans already know. That is not superintelligence, and it is not AGI. We're just replicating human knowledge, and that will not get us to AGI.
If we want to reach AGI, then ultimately AI agents need to be able to learn through trial and error, similar to how humans do. Today, we're preventing this by manually taking human-annotated data and training models on it. That approach got us to GPT-3.5, but since then we haven't seen any step-function changes.
AGI is when AI can learn and do things we humans have not yet discovered or thought of. That can only be achieved through a process where agents work through daily tasks, make mistakes, and learn from them, just as humans do. Just as we learn from our own feedback loops in the real world, AI agents need to be able to do the same. That is the only way an AI agent can complete a task fully, end-to-end, without human intervention. For example, an AI agent, by itself, winning a case for you in court.
The next level of training needs to be a hybrid of RL and human intervention, enabling the AI itself to learn through trial and error.
Imagine an AI agent that performs a task, makes a few mistakes, sits with the problem, draws on general context rather than hyper-focused notes provided by a human, and comes up with a solution. There is a high probability it's a solution the human data labeler would never have thought of. Because of this RL approach, rather than knowing only what the human data labeler annotated, when the agent runs into another problem it can draw on its previous learnings (which is also why memory is so important), work through general context, and solve even harder problems in the future. Once this process is possible for an agent, we can say we are close to reaching AGI. And once this process runs at scale, we will have unlocked AGI.
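The loop described above — try, fail, remember the mistake, and consult that memory on the next attempt — can be sketched in a few lines. This is a minimal toy illustration, not a real agent: the `Agent` class, the `succeeds` callback, and the action names are all hypothetical stand-ins for a far richer environment and memory system.

```python
class Agent:
    def __init__(self):
        # Memory of past mistakes: task -> set of actions known to fail.
        # This is the "previous learnings" the essay argues agents need.
        self.memory = {}

    def attempt(self, task, actions, succeeds):
        failed = self.memory.setdefault(task, set())
        for action in actions:
            if action in failed:       # skip mistakes it has already made
                continue
            if succeeds(task, action):
                return action          # solved via its own feedback loop
            failed.add(action)         # record the mistake for next time
        return None                    # no known action worked yet

# Toy "environment" (hypothetical): exactly one action solves the task.
agent = Agent()
solution = {"task_a": "retry_with_backoff"}
succeeds = lambda task, action: solution.get(task) == action

result = agent.attempt("task_a", ["first_guess", "retry_with_backoff"], succeeds)
print(result)           # the working action, found after one recorded failure
print(agent.memory)     # the failed attempt is now remembered
```

The point of the sketch is the feedback loop itself: the agent's knowledge grows from its own failures rather than from a labeler's annotations, and a second call for a similar task starts from that accumulated memory.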