Meta AI Chief Yann LeCun’s 5 Arguments: The Other Voice in the AI Hype

Cindy X. L.
6 min read · Feb 23, 2024

While most folks have cheered the amazing results from AI products over the past 1–2 years, Turing Award winner Yann LeCun has bravely delivered a series of unpopular opinions.

In this article, I’ll summarize his main objections and analyze the hotly debated questions so we can hear some other voices in the middle of the hype.

Sora is not a world model

Sora’s demos have many folks excited about a future in which AI can understand and simulate the physical world. OpenAI’s own technical report is titled “Video Generation Models as World Simulators.”

Prof. LeCun, however, states in his tweet that modeling the world for action by generating pixels is “wasteful and doomed to failure.” In his view, there are more effective approaches than generating every single pixel only to get a poor approximation. He points back to the heated debate of a few years ago about generative vs. discriminative methods for classification tasks.

For folks who are not that familiar, here’s a visual, intuitive illustration of the difference. Generative models learn a probability distribution over the data itself (how inputs and labels are jointly distributed), whereas discriminative models learn a direct mapping from an input to a discrete label.

Source: mmuratarat.github.io/2019-08-23/generative-discriminative-models
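
To make the distinction concrete, here is a minimal sketch on made-up 1D data. The data, function names, and hyperparameters are my own illustration and do not come from Prof. LeCun or the source above. The generative classifier models p(x | y) and p(y) with one Gaussian per class and then applies Bayes' rule; the discriminative one fits p(y | x) directly with logistic regression.

```python
import math

# Toy 1D data (hypothetical): class 0 clusters near 1.0, class 1 near 4.0.
xs = [0.5, 1.0, 1.5, 0.8, 3.5, 4.0, 4.5, 4.2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_generative(xs, ys):
    """Model p(x | y) as one Gaussian per class, plus the class prior p(y)."""
    params = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(pts) / len(pts)
        var = sum((p - mean) ** 2 for p in pts) / len(pts)
        params[label] = (mean, var, len(pts) / len(xs))
    return params

def predict_generative(params, x):
    """Pick the class that maximizes log p(x | y) + log p(y) (Bayes' rule)."""
    def log_joint(label):
        mean, var, prior = params[label]
        log_likelihood = -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return log_likelihood + math.log(prior)
    return max(params, key=log_joint)

def fit_discriminative(xs, ys, lr=0.3, steps=3000):
    """Logistic regression: learn p(y=1 | x) = sigmoid(w*x + b) by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return w, b

def predict_discriminative(wb, x):
    """Label 1 if the learned boundary puts x on the positive side, else 0."""
    w, b = wb
    return int(w * x + b > 0)

gen = fit_generative(xs, ys)
disc = fit_discriminative(xs, ys)
print(predict_generative(gen, 1.0), predict_discriminative(disc, 1.0))  # 0 0
print(predict_generative(gen, 4.0), predict_discriminative(disc, 4.0))  # 1 1
```

Both classifiers agree on this easy data. The difference is that the generative one also had to describe what each class looks like, which is cheap in one dimension and very expensive for pixels.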

I can see the limitations of generative models here. If these models can’t generate precise results, then the simulated world won’t follow physical laws precisely. Think about how light reflects off surfaces of various colors and roughness, or how objects move according to Newton’s laws. Small errors in each generated frame add up, so it’s extremely hard to match the “ground truth.”

OpenAI’s problematic generation result. Red liquid suddenly appears on the table before the cup pours.

From an application standpoint, I think folks may also be dissatisfied with generative models in real-world use. Image editing tools like Photoshop can carry out an exact instruction such as “add a layer at 50% opacity,” while a generative method cannot guarantee that kind of precision.
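
The exactness of such an editing operation comes from the fact that it is a fixed formula rather than a sample from a distribution. A minimal sketch of ordinary alpha blending, assuming 8-bit RGB tuples (the helper name `blend` is hypothetical, not a Photoshop API):

```python
def blend(base, layer, opacity):
    """Composite `layer` over `base` at the given opacity (0.0 to 1.0), per channel."""
    return tuple(round(opacity * l + (1 - opacity) * b) for b, l in zip(base, layer))

print(blend((200, 100, 0), (0, 100, 200), 0.5))  # (100, 100, 100)
```

The same inputs always give the same pixels, which is exactly what a sampled generation cannot promise.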

Prof. LeCun points out that, for recognition and planning tasks, using generative methods is a bad idea. Using them for language tasks, as today’s LLMs do, makes sense because text is discrete. For a high-dimensional continuous space like video, the prediction uncertainty is too high for the generated results to be satisfactory.

Auto-Regressive LLMs are not the way

Another product Prof. LeCun doesn’t find innovative is ChatGPT. He says, ‘Nothing revolutionary.’ ChatGPT adopts an auto-regressive model that predicts the next token based on the tokens that came before it, then feeds that prediction back in as input for the following step.

Auto-Regressive LLMs. Source: https://subscription.packtpub.com/book/data/9781788834131/6/ch06lvl1sec33/autoregressive-models
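
The auto-regressive loop itself is simple. Here is a minimal sketch in which a hand-written probability table stands in for a trained model; the table, the token names, and the greedy decoding choice are all assumptions for illustration, not how ChatGPT is implemented:

```python
# Hypothetical next-token probabilities, standing in for a trained LLM.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 1.0},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def next_token_probs(prefix):
    """A real LLM conditions on the whole prefix; this toy only looks at the last token."""
    return NEXT_TOKEN_PROBS[prefix[-1]]

def generate(max_tokens=10):
    """Greedy auto-regressive decoding: each output token becomes the next input."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)  # pick the most likely next token
        if token == "</s>":
            break
        tokens.append(token)
    return tokens[1:]

print(generate())  # ['the', 'cat', 'sat']
```

Every token depends on the ones generated before it, so the model never revises an earlier choice.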

In Prof. LeCun’s tweets, he attributes ChatGPT’s downsides to its auto-regressive architecture, which unfortunately:

  • has hallucinations;
  • has limited working memory;
  • is not Turing-complete;
  • is not controllable;
  • is super bad at planning.

LeCun’s ultimate approach

What structure does Prof. LeCun believe is the right way to go? In the same week as Sora, Meta released the Video Joint Embedding Predictive Architecture (V-JEPA) Model.

Source: https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/

First, note that the model converts its input into an abstract latent representation. It trains a visual encoder by predicting masked space-time regions of a video in that latent space, instead of predicting the pixels of the next frame.

Some of you may wonder: don’t models like Sora also work in a latent space? As I mentioned in my previous Sora article, they use a decoder at the end to transform the latents back into pixels, and V-JEPA doesn’t have such a decoder.

Also note that V-JEPA is a self-supervised model, which learns from unlabeled data like raw images or sounds rather than from language or other types of labeled data. By watching only 2 million videos without text or simulated interaction, the model learns an abstract representation and gains the ability to predict future actions.

V-JEPA can predict.

There’s no human-level AGI

Prof. LeCun looks disappointed with auto-regressive structures, so it’s no wonder he rejects the idea that they lead to Artificial General Intelligence (AGI), the ultimate form of AI that can perform general intellectual tasks like humans.

Today, AI with auto-regressive structures possesses very limited reasoning and planning abilities. Prof. LeCun states in his tweet: “This will not be fixed by making them bigger and training them on more data.”

When OpenAI’s Chief Scientist Ilya Sutskever tweeted about weak machine consciousness in 2022, Prof. LeCun responded “Nope.”

Hey bro, what’s up

Rather, he believes intelligence has various aspects that are not captured by a linear scale. Different animals have different skills and, therefore, different kinds of intelligence.

As for getting machines closer to human-like intelligence, he proposed an architecture for autonomous intelligence in his position paper, “A Path Towards Autonomous Machine Intelligence.” If you are interested, you can read all 62 pages.

The AI agent has modules to predict, reason, and make decisions, built around a “world model” that estimates missing information about the state of the environment and predicts plausible future states.
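
To give a feel for how a world model can support decision-making, here is a deliberately tiny sketch. It is not the architecture from the paper: the 1D environment, the `world_model` and `cost` functions, and the brute-force search are all hypothetical stand-ins. The agent imagines the outcome of each action sequence with its world model and picks the cheapest one, without acting in the real environment.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical moves: left, stay, right

def world_model(state, action):
    """Predict the next state. Here it is a trivial 1D position update."""
    return state + action

def cost(state, goal):
    """How undesirable a predicted state is: distance from the goal."""
    return abs(goal - state)

def plan(state, goal, horizon=3):
    """Imagine every action sequence with the world model; return the cheapest."""
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        s, total = state, 0
        for action in seq:
            s = world_model(s, action)  # predicted, not executed
            total += cost(s, goal)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq

print(plan(state=0, goal=2))  # (1, 1, 0)
```

The planner moves right twice and then stays put, because that sequence has the lowest predicted cost.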

Robot takeover is impossible

Prof. LeCun thinks it’s too early to worry about threats from AIs, as they are not that smart yet: “Before we have a basic design & basic demos of AI systems that could credibly reach human-level intelligence, arguments about their risks & safety mechanisms are premature.”

Along with Prof. Andrew Ng, he objected to the proposed 6-month moratorium on training the most powerful AI systems, which was called for in an open letter published by the Future of Life Institute.

AI should be open-sourced

Prof. LeCun has long been an open-source advocate and believes that AI must be open-sourced.

He’s unhappy about OpenAI’s closed-source practices, given that the company grew out of many open-source methods and contributions from the community. As he says in his tweet: “In fact, without Meta’s PyTorch, there would be no OpenAI products today.”

My thoughts

Power is shifting from academia to industry.

In the past, research labs in academia published methodologies that were later implemented in industry; that pattern may not hold in this wave of AI. Big data and Reinforcement Learning from Human Feedback (RLHF) are crucial to model training, so companies with user data can produce better results than university laboratories.

At the same time, the past practice of tech giants publishing papers to enhance their industry influence has fallen out of favor. Each company has entered a wartime state and no longer publishes technical details. Productization has become the key: methods that are too theoretical or fancy mean little in the face of massive data and actual users. This may also be a reason for discomfort among idealists like LeCun.

The spotlight moments of AI researchers are too short.

People expect the top researchers in science and engineering to make breakthroughs every few years, which is especially cruel in the field of machine learning. Since the advent of transformers, the old methods have been largely abandoned, and the new generation of researchers no longer needs to learn LSTM or BERT.

It should be noted that this situation is not common in other science and engineering fields such as mathematics, where knowledge is built up piece by piece. Deep learning is largely empirical, so people cannot judge whether a neural network architecture is any good until it produces good results.

I feel bad that a man as accomplished as Prof. LeCun, facing all the doubts and mockery, has to defend himself like this:

Prof. LeCun: “I never anticipated how wonderfully entertaining it would be to see so many people who have never contributed a single thing to AI or ML…”

Additional contributors: Ethan He, Tom Gou.
