As ChatGPT nears its second anniversary, I take a critical look at a foundational assumption in AI, one that may conceal key vulnerabilities at the heart of the current wave of progress.
Over the past four years, the AI community has embraced a core principle: intelligence emerges from scale. As neural networks grow larger, and as the data and computing power behind them increase, the resulting systems are believed to become smarter. This belief underpins ChatGPT and the current AI revolution, driving major investments and transforming the field.
However, recent trends suggest that relying solely on scaling may no longer be sufficient to achieve further advances in AI.
Tech giants are investing heavily in chips and AI infrastructure, banking on continued improvements from scaling up models. OpenAI is seeking trillions of dollars for chip production, while others anticipate AI investment surpassing one trillion dollars by 2027.
The Doctrine of Scaling
In 2019, computer scientist Richard Sutton argued that AI progress depends more on increasing computational power than on human knowledge. He implied that intelligence could be achieved with enough computing resources, not necessarily by understanding its nature.
OpenAI researchers soon confirmed Sutton’s idea empirically: Transformer-based models improved predictably with more data, compute, and parameters, following a consistent power-law curve.
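To make “a consistent power-law curve” concrete, here is a minimal sketch of the relationship in Python. The functional form follows the published scaling-law work; the constants are illustrative placeholders, not fitted values from any real model.

```python
# Illustrative sketch of a power-law scaling curve. The functional form
# L(C) = (c0 / C) ** alpha mirrors the scaling-law literature, but c0 and
# alpha below are made-up placeholders chosen only to show the curve's shape.

def predicted_loss(compute: float, c0: float = 1.0, alpha: float = 0.05) -> float:
    """Loss modeled as a power law in training compute."""
    return (c0 / compute) ** alpha

for compute in [1e3, 1e6, 1e9, 1e12]:   # arbitrary compute budgets (FLOPs)
    print(f"C = {compute:.0e}  ->  predicted loss {predicted_loss(compute):.3f}")

# Each 1000x jump in compute lowers the loss by the same *ratio*, so the
# absolute improvement shrinks as the curve flattens toward its floor.
```

On a log-log plot this relationship is a straight line, which is what made further gains look so predictable: pick a bigger budget, read the expected loss off the line.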
OpenAI’s release of GPT-3, and subsequent models such as GPT-4 and Google’s Gemini, demonstrated impressive advances, reinforcing the view that intelligence was an engineering challenge solvable with enough resources.
Sam Altman has been a leading advocate of this perspective. In his recent essay, “The Intelligence Age,” he summarized years of progress in a single line: deep learning worked, got predictably better with scale, and we dedicated increasing resources to it. He reiterated the point in a recent conversation with Garry Tan: “This is the first time ever where I felt like we actually know what to do from here to building an AGI. It will take a huge amount of work. There are some known unknowns, but I think we basically know what to do.”
Altman’s message has remained consistent: superintelligent AI is not only possible but inevitable, potentially arriving within the next few thousand days. Backed by this conviction, OpenAI has raised $22B, and the world is now watching to see whether scaling will deliver on its ultimate promise.
The First Cracks
Despite this outward confidence, the situation beneath the surface is shifting in unexpected ways.
Twenty percent of the way through training, OpenAI’s Orion reportedly matched GPT-4’s performance, as expected. But the remaining training yielded only modest improvements, nothing like the leap from GPT-3 to GPT-4, despite the far greater resources involved.
These diminishing returns extend beyond OpenAI. Google’s Gemini reportedly lags behind expectations, and Anthropic has delayed its next model as benchmark gains diminish. Progress now resembles an S-curve: each additional increment of data, compute, or model size brings smaller gains.
A recent remark from OpenAI’s former chief scientist, Ilya Sutskever, to Reuters is especially notable.
“The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Everyone is looking for the next thing. Scaling the right thing matters more now than ever.” Coming from one of the earliest and most vocal advocates of scaling, Sutskever’s remarks suggest a fundamental re-evaluation of AI’s direction.
To better understand the nature of these challenges, it helps to break them down into three walls that limit further progress.
Scaling faces three main challenges: data, compute, and the limits of next-token prediction. Each forms a barrier that cannot be overcome simply by adding more data, computing power, or parameters. To make these challenges concrete, consider the issue of data first.
- The Data Wall
The 2022 Chinchilla paper showed that model size and training data must scale in proportion for a model to make optimal use of its compute budget.
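As a rough illustration of what “scale in proportion” means, the sketch below combines two widely cited approximations: training compute C ≈ 6·N·D (N parameters, D training tokens) and the Chinchilla-style heuristic of roughly 20 tokens per parameter. The paper’s fitted coefficients differ somewhat, so treat the outputs as order-of-magnitude estimates only.

```python
# Back-of-envelope Chinchilla-style allocation: given a compute budget,
# split it between parameters and training tokens so they grow together.
# Assumes C ~ 6*N*D and D ~ 20*N; both are common approximations, not the
# paper's exact fitted values.

import math

def compute_optimal(compute_flops: float) -> tuple[float, float]:
    """Return (parameters, tokens) that roughly balance a compute budget."""
    n_params = math.sqrt(compute_flops / 120)   # from C = 6*N*(20*N) = 120*N^2
    n_tokens = 20 * n_params
    return n_params, n_tokens

for budget in [1e21, 1e23, 1e25]:               # illustrative budgets in FLOPs
    n, d = compute_optimal(budget)
    print(f"C = {budget:.0e} FLOPs -> ~{n:.1e} params, ~{d:.1e} tokens")

# A 100x larger budget calls for only ~10x more parameters *and* ~10x more
# tokens, which is why the supply of training data, not just hardware,
# becomes a binding constraint.
```

The practical consequence is the one the next paragraphs describe: every step up in compute demands a matching step up in data.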
The indexed web contains roughly 500 trillion unique tokens of text, about 30 times the size of the largest known training dataset. Yet most of the high-quality, human-created content suitable for AI training, excluding private or proprietary sources, has already been used. What remains is often repetitive, low-quality, or unsuitable for training.
Some estimates suggest that for an AI to reliably write a scientific paper, training would require roughly 10^35 FLOPs of compute and about 100,000 times more high-quality data than is currently available. The existing corpus of human scientific writing simply is not large enough to meet that need.
Advances in data efficiency may help. Some researchers propose using synthetic data generated by models such as GPT-4 to train future systems. However, this method creates a hall-of-mirrors effect in which models inherit and amplify the limitations of their predecessors. Unlike games such as chess or Go, where success is clearly defined, evaluating AI-generated training data is circular: it takes intelligence to assess intelligence. According to an OpenAI employee, Orion’s progress stalled partly because the model was trained on outputs from o1.
- The Compute/Energy Wall
The second barrier shifts from data to physical constraints. Training state-of-the-art models now consumes as much electricity as small cities. AI is reaching the limits of current power resources, prompting technology companies to seek clean energy solutions and Microsoft to explore nuclear options. Future models may require the energy resources of entire nations. When OpenAI researcher Noam Brown asks, “Are we genuinely going to develop models that cost hundreds of billions or trillions of dollars?” he is expressing worries about both financial and physical feasibility.
The computing requirements of scaling grow exponentially. Some estimates indicate that achieving human-level reasoning may require up to nine orders of magnitude more compute than today’s largest models. Eventually, energy consumption and heat dissipation themselves become limiting factors. And beyond energy, a third challenge emerges: the architectural limits that constrain how well current AI can generalize beyond its training examples.
- The Architecture Wall
One of the major constraints is architectural. Many real-world tasks involve what Meta’s Yann LeCun describes as the long-tail problem: an almost limitless range of edge cases that training data cannot fully cover. Current AI architectures perform well at interpolation but struggle to extrapolate beyond their training data.
This limitation is inherent to the transformer architecture. While next token prediction is effective, it tends to produce systems that react rather than truly understand. Researchers such as LeCun argue that increasing scale cannot overcome this design gap, just as more data could not enable a spreadsheet to interpret its numbers.
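A toy curve-fitting example makes the interpolation-versus-extrapolation point concrete. It says nothing about transformers specifically; it only shows how any model fit to a narrow slice of data can look excellent inside that slice and fail badly outside it.

```python
# Toy interpolation-vs-extrapolation demo: fit a cubic polynomial to noisy
# sine data on [0, pi], then evaluate it inside and outside that range.
# This is an analogy for the generalization gap, not a model of transformers.

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, np.pi, 200)          # the "training distribution"
y_train = np.sin(x_train) + rng.normal(0.0, 0.01, x_train.size)

coeffs = np.polyfit(x_train, y_train, deg=3)    # fit a cubic to the samples

for x in [1.0, 2.0, 5.0, 8.0]:                  # first two inside, last two outside
    pred, true = np.polyval(coeffs, x), np.sin(x)
    print(f"x = {x:.1f}  prediction = {pred:+.2f}  truth = {true:+.2f}")

# Inside [0, pi] the fit tracks sin(x) closely; past the training range the
# error grows without bound. More samples from the same interval would not fix it.
```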
The Search for New Paradigms
Pedro Domingos explains that engineering-style progress concentrates on enhancing proven methods, such as:
- Scaling transformers
- Improving training efficiency
- Sourcing cleaner data
However, he notes that we are reaching the limits of this approach, describing it as charging toward a local maximum. Surmounting these limits will require scientific advances and genuinely new ideas about how to create intelligence.
OpenAI’s recent work on test-time compute offers one such idea: rather than embedding all knowledge during training, the o1 model shifts effort toward reasoning at inference time. Noam Brown, the project’s research lead, says that 20 seconds of thinking time matched results that would otherwise require a 100,000x increase in model scale. Research from MIT and the success of China’s DeepSeek models further support this approach.
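OpenAI has not published how o1 allocates its extra inference-time compute, so the sketch below shows only the simplest generic form of the idea, best-of-N sampling with a scoring function. The names generate and score are hypothetical stand-ins for a real model call and a real verifier or reward model.

```python
# Minimal, generic test-time-compute sketch: spend extra model calls at
# inference and keep the candidate a scorer likes best. This illustrates the
# general idea only; it is not OpenAI's (unpublished) o1 procedure.

import random
from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 16) -> str:
    """Sample n candidate answers and return the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))

# Toy stand-ins so the sketch runs end to end.
def generate(prompt: str) -> str:
    return f"candidate answer #{random.randint(1, 1000)}"

def score(prompt: str, answer: str) -> float:
    return random.random()   # a real system would use a verifier or reward model

print(best_of_n("Prove that the square root of 2 is irrational.", generate, score))
```

The trade-off is the one Brown highlights: the same accuracy can be bought either with a vastly larger model or with more deliberation per query.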
While advances in test-time compute improve current methods, researchers are also developing new architectures to overcome the limitations of transformers. Notable alternatives include state space models, which handle long-range dependencies and continuous data, and RWKV, which uses a linear attention mechanism that is significantly more computationally efficient than the transformer’s.
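To see where the efficiency comes from, compare how the two attention variants group their matrix products, as in the sketch below. It illustrates the general kernelized linear-attention idea rather than RWKV’s specific formulation: standard attention builds an n-by-n weight matrix over the sequence, while the linear form first compresses keys and values into a small d-by-d summary.

```python
# Softmax attention vs. a generic linear (kernelized) attention, in numpy.
# Softmax attention forms an (n, n) matrix, so cost grows with n^2; linear
# attention regroups the product as phi(Q) @ (phi(K).T @ V), so cost grows
# linearly in sequence length n. Not RWKV's exact formulation.

import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an (n, n) weight matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    """Kernelized attention: summarizes keys/values into a (d, d) matrix."""
    Kv = phi(K).T @ V                     # (d, d) summary of the whole sequence
    z = phi(K).sum(axis=0)                # (d,) normalizer
    return (phi(Q) @ Kv) / (phi(Q) @ z)[:, None]

n, d = 1024, 64                           # sequence length, head dimension
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

Doubling the sequence length doubles the cost of the linear variant but quadruples the cost of the softmax one, which is why these architectures are attractive for very long contexts.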
The most radical proposals come from Domingos, Meta’s Yann LeCun, and others such as Fei-Fei Li, who support a grounded perspective on AI. They argue for moving beyond text-based models and advocate world-model systems designed to understand causal relationships and physical interactions, rather than merely recognize patterns in text.
Toward A Pluralistic Future
AI research is expanding into diverse approaches, which is expected to enhance the field’s long-term development.
François Chollet, co-founder of the ARC Prize, argues that the focus on scaling up LLMs may have set back progress toward AGI by quite a few years, perhaps five to ten. He notes that leading research has become less open, moving away from the collaborative culture that produced advances such as the Transformer. Most importantly, LLMs’ success has fostered an intellectual monoculture in AI research.
LLMs have sucked the oxygen out of the room.
Chollet continues: Everyone is just doing LLMs. I see LLMs more as an off-ramp on the path to AGI. If you look further back, to around 2015 or 2016, there were perhaps a thousand times fewer people working in AI, yet the rate of progress was higher because people were exploring more directions. The environment felt more open-ended. You could just go and try: have a cool idea, launch it, and get some interesting results. There was this energy, and now everyone is doing some variation of the same thing.
Today’s LLMs may not lead directly to superhuman AI, but they are powerful tools with significant untapped potential. We have achieved a minimum viable intelligence. That is enough to drive improvements across industries and to support a generation of AI-native products that change how services are delivered worldwide.
Researchers, policymakers, and innovators should actively pursue diverse approaches and prioritize novel research directions, resisting intellectual monocultures and embracing pluralism. That is how the field can produce the breakthroughs needed for real progress. The next big step in AI may require bold investment in new paradigms, as Sutskever suggests, bringing back a sense of wonder and investigation.
Two years ago, ChatGPT changed our understanding of AI’s capabilities. The next major leap may arrive just as unexpectedly, driven by deeper insight rather than greater computing power.










