LLMs vs World Models – Understanding why Yann LeCun and other top AI researchers are moving from Large Language Models to World Models

What are Large Language Models?

Large Language Models are AI models trained on billions of words of text. They generate output by predicting, one word at a time, what text should come next.

They are great for text-focused tasks such as coding, translation, legal research, editing, debugging, and documentation.

What are World Models?

World Models are AI models that more closely capture how humans view the world. Instead of just text, they take in a variety of inputs, including video, music, and spatial relationships.

Whereas LLMs take in billions of data points that are all text, World Models can take in billions of data points across different modes (not just text).

Yann LeCun explains it well when he says that it is like viewing the world as babies do and building a model of the world based on that approach.

Will adding more and more ‘compute’ and ‘resources’ to LLMs enable them to achieve Artificial General Intelligence (AGI)?

There is a very strong divide:

  1. Most companies built on LLMs think that just by adding more and more compute and more and more data points, LLMs can achieve Artificial General Intelligence.
  2. Many AI researchers are beginning to agree with Yann LeCun, who feels that LLMs can never reach Artificial General Intelligence.

Personally, at Deal AI, we feel that expecting LLMs to achieve AGI is a dream. Yes, the insane amounts of investment in LLM-based companies have led to a state where everyone WISHES that LLMs could achieve AGI.

The reality is that wishful thinking is not going to lead to AGI or Super Intelligence. We need World Models, and even more advanced models, to start building toward AGI and Super Intelligence (SI).

Why do some researchers, including one of the Godfathers of AI, Yann LeCun, think LLMs will never lead to AGI or Super Intelligence?

Reference: Life after Meta for Yann LeCun – Fast Company

It is a very good article and well worth reading. These two paragraphs are key:

 LeCun has been critical of that approach (LLMs to achieve AGI), and doubts that it has produced AI that truly reasons, rather than just detects patterns and predicts the next word or pixel in a sequence. 

LeCun has called for more foundational research on alternate paths that could more quickly lead to AI models that can match or exceed human intelligence. His recent research has focused on “world modeling”—developing AI systems capable of quickly learning about the physical world as human babies do. 

This is the crux: LLMs just detect patterns and predict the next word or pixel in a sequence.

No matter whether you feed an LLM billions of words or trillions of words, it is just a Prediction Engine. It is not AGI or Super Intelligence.
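To make "Prediction Engine" concrete, here is a toy sketch in Python. It is not how real LLMs are built (they use large neural networks over tokens, not simple word counts), and the function names `train_bigram` and `predict_next` and the tiny corpus are made up for illustration. It only shows the basic idea: count which word tends to follow which, then predict the most likely next word.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Hypothetical toy corpus, purely for illustration.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often in this corpus
print(predict_next(model, "dog"))  # never seen in training, so no prediction
```

The toy model can only echo patterns that appear in its training text. Feeding it more text gives it more patterns to echo, but the mechanism stays the same: predict what comes next.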

Why we think LLMs are a Dead End as regards Artificial General Intelligence and Super Intelligence

All the existing investors and CEOs in AI companies desperately need to sell the dream that LLMs will lead to AGI, Super Intelligence, and unbelievable levels of wealth for everyone.

They need it to keep their massive valuations, to keep getting funding, to help the founders and early investors cash out, and to get to IPOs and hand the bag over to retail bagholders.

Reality does not care whether or not AI investors and AI CEOs can get their cash and exit before the game of musical chairs stops.

The reality is that LLMs have not shown any signs of being close to AGI. LLMs are very impressive at tasks that are suited to Prediction Engines and Guessing Engines. They have shown no ‘human spark’ and no actual sparks of intelligence.

Why we think World Models have a lot more promise and are far likelier to lead to Artificial General Intelligence and Super Intelligence

In the end, the only path to Artificial General Intelligence and Super Intelligence is to have a Processing Engine that takes in multiple types of data (like the human senses do) and processes these data streams as fast as or faster than the human brain.

Text-only input can never catch up with the human brain, because the human brain is getting sight, sound, touch, and other sensory data.

Furthermore, the human brain is processing all of these together, in real time and in combination.

While you could argue that a ‘supercomputer’ with tens of thousands of Nvidia GPUs or AI chips could try to reach the complexity and processing power of the human brain, there is no way LLMs with a single data source (just words) could match the human brain, which processes data from multiple senses at the same time.

