Can Data Alone Make Machines Think?
Why Scaling Large Language Models Isn’t Enough to Achieve True Reasoning and AGI
Meta has acquired a 49% stake in Scale AI, a leading data labeling and model evaluation company, for approximately $15B. Scale AI specializes in high-quality data labeling, red-teaming, and model evaluation infrastructure. The likely reasons for the investment include boosting Meta's AI capabilities (its recent models have not matched competitors'), privileged access to the data infrastructure Scale AI brings, access to talent, and Meta's pursuit of AGI (Alexandr Wang is expected to lead a new "superintelligence" lab at Meta).
While there are many competitive considerations behind this investment, I am more interested in analyzing the AGI rationale. I recognize that Meta may be working on new architectures in addition to transformer-based LLMs, but for the purposes of this article I will assume the intent is to use more and better data to scale LLMs toward the goal of AGI.
As companies invest billions into AI infrastructure, a key question comes to mind: can more data and bigger models alone push us to human-level reasoning?
LLMs at a glance
LLMs excel at statistical correlation at scale but lack true comprehension. During training, the data is represented in a high-dimensional vector space, and the model's weights and activation functions capture the statistical relationships between words and phrases. It generates outputs based on these learned patterns rather than genuine understanding.
Scaling the amount of data and the size of the model enables the LLM to capture more nuanced statistical relationships in the data and learn a wider variety of patterns. This can result in improved generalization and better performance on certain tasks than earlier, smaller models. However, it does not alter the fundamental nature of LLMs: they remain sophisticated pattern matchers operating at scale, without true understanding.
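To make the "pattern matcher at scale" point concrete, here is a minimal, illustrative sketch. It is not how production LLMs work (they use transformer networks over token embeddings, not raw counts), but the principle is the same: predict the next token from learned statistics, with no notion of meaning attached.

```python
import random
from collections import defaultdict

# Toy "language model": learns bigram statistics from a tiny corpus.
corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count how often each word follows each other word.
bigram_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Sample the next word in proportion to how often it followed `word`."""
    followers = bigram_counts.get(word)
    if not followers:                      # unseen or sentence-final word
        return random.choice(corpus)
    words, counts = list(followers), list(followers.values())
    return random.choices(words, weights=counts, k=1)[0]

# Generate a continuation: statistically plausible, but no "understanding".
text = ["the"]
for _ in range(5):
    text.append(predict_next(text[-1]))
print(" ".join(text))
```

Scaling replaces the bigram table with billions of parameters and trillions of tokens, which captures far richer patterns, but the output is still a sample from learned statistics.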
Human learning
Let us contrast this with human learning. While humans use pattern matching consciously and subconsciously to reason and make decisions, the learning mechanism is much more than just pattern matching.
Interactive learning
A baby learns about its environment interactively, using its sensory perception. For example, it learns about object permanence by playing peekaboo, it learns about many physical laws by dropping toys and observing what happens, and it learns causality through its actions and how the world reacts to them.
Learning through experimentation and reasoning
Isaac Newton observed an apple falling and abstracted the idea of gravity. He conducted thought experiments built on hypothesis formation, inferring the best explanations and abstracting concepts beyond the direct data.
Humans also use counterfactual reasoning: the "what if" scenarios, such as "if I had left home earlier, I wouldn't have been caught in traffic." This shows up in medical diagnosis, everyday decisions, and scientific exploration.
Curiosity driven
Much of human learning is curiosity driven. At the baby/toddler level, much of it is simple goal-less exploration, such as stacking blocks and watching them fall, which indirectly teaches the concepts of balance and gravity. It can also be self-motivated, goal-oriented exploration: space, geographic, scientific, spiritual, or everyday explorations.
In addition, humans learn from experience, failure, intuition, and creativity. Much of human learning happens from sparse data (versus the massive datasets LLMs require), because humans use memory, prior knowledge, and analogies to fill in the gaps. I am leaving out the intelligence already ingrained in humans at birth, such as survival instincts.
LLMs and reasoning limitations
LLMs (and multimodal models) can learn from massive amounts of data and can become better at pattern recognition than humans (e.g., AlphaFold's protein folding predictions). Some researchers argue that scale alone may eventually yield emergent reasoning abilities, but most evidence to date suggests fundamental limitations remain. I have covered the data saturation challenges for LLM scaling in my article AGI meets the data wall.
As LLM scaling has started to plateau, extensions to LLMs have been developed: LRMs (Large Reasoning Models), RL (reinforcement learning, with and without human feedback), chain-of-thought reasoning, and so on. All of these have helped extend the capability of LLMs beyond the basic training data and the model itself.
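As an illustration of how one of these extensions works at the prompt level, here is a sketch of chain-of-thought prompting. The `generate` function is a hypothetical placeholder for whatever model API you use, not a real library call; the point is only that the model is asked to produce intermediate reasoning steps of the kind it saw during training, rather than acquiring a new reasoning mechanism.

```python
# Illustrative sketch of chain-of-thought (CoT) prompting.
# `generate` is a hypothetical placeholder for any LLM completion call.

def generate(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its completion."""
    raise NotImplementedError("Wire this to your model provider of choice.")

question = "A train leaves at 3:40 pm and the trip takes 85 minutes. When does it arrive?"

# Direct prompting: ask for the answer only.
direct_prompt = f"Q: {question}\nA:"

# Chain-of-thought prompting: ask the model to spell out intermediate steps.
# The model produces reasoning-shaped text because such text was in its
# training data, not because it has a separate reasoning engine.
cot_prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step. First convert 85 minutes into "
    "1 hour and 25 minutes, then add it to the departure time."
)

# answer = generate(cot_prompt)  # typically more accurate on multi-step tasks
```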
LRMs are good at learning reasoning. But LRMs learn reasoning from patterns of reasoning in already existing data too. Can they reason beyond what is already in the data if they are fundamentally generalizing patterns from, and regurgitating, the data they have been trained on? Can they independently test or falsify hypotheses, which is integral to scientific reasoning? Can they learn from mistakes across tasks like a human would, unless explicitly trained to?
It is very likely that LLMs (and multimodal models) can identify patterns in medical diagnosis that are otherwise difficult for humans to spot. They can help in early diagnosis of diseases, and even in drug discovery, by analyzing patterns in how patients respond across large sample sets. But can LLMs/LRMs reason the way Isaac Newton did on seeing a falling apple and develop new laws of gravity? Can they invent a novel game with new rules, as a human child might, without drawing on training data? I find this unlikely.
The recent paper from Apple, The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, describes some of these limitations, such as the inability to solve puzzles not present in the training data.
LLMs and AGI
It comes down to what our definition of AGI is.
If AGI means outperforming humans on specific tasks (e.g., playing chess, reading and summarizing legal documents, coding), then yes, LLMs fit the bill, and we may already be there.
But if AGI means a system that can autonomously learn, adapt, generalize, plan, and reflect across tasks it has never seen before, then LLMs (at least with the current architecture) will fall short. More data alone will not get us there. We may need new methods.
What might be needed beyond data?
LLMs trained only on static internet-scale data (text, images, voice, video) can simulate fragments of reasoning. But true human-level reasoning requires more than data and pattern recognition. It likely needs the following:
New architectures that go beyond transformers. Some that get mentioned in AI research as possible future directions include:
Memory-augmented models (add external memory to neural networks, enabling better handling of long-term dependencies)
Neuro-symbolic systems (combine neural pattern recognition with symbolic logic for reasoning)
Hybrid models (integrate multiple AI approaches to leverage their respective strengths)
Real-world interaction – learning from action, feedback, and perception
Training for reasoning, with new learning objectives, planning, and the ability to use environments and tools
Some mechanism of self-reflection, so the AI can self-critique and revise its strategies (a minimal sketch of such a loop follows below)
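To make the last two ingredients concrete, here is a hypothetical sketch of an agent loop that interacts with an environment and critiques its own attempts. Every function here is a placeholder I am inventing for illustration; the point is the structure (act, observe feedback, self-critique, revise), not any particular implementation.

```python
# Hypothetical sketch of an agent that learns by acting, observing feedback,
# and revising its own strategy -- the "real-world interaction" and
# "self-reflection" ingredients above. All functions are placeholders.

from dataclasses import dataclass

@dataclass
class Attempt:
    strategy: str
    outcome: str
    succeeded: bool

def act(strategy: str) -> str:
    """Placeholder: execute the strategy in some environment, return an observation."""
    return f"observation after trying: {strategy}"

def evaluate(observation: str) -> bool:
    """Placeholder: did the attempt achieve the goal?"""
    return False  # pretend we keep failing, to exercise the critique path

def critique_and_revise(strategy: str, observation: str) -> str:
    """Placeholder: self-critique -- explain the failure and propose a revision."""
    return strategy + " (revised after failure)"

def solve(goal: str, max_attempts: int = 3) -> list[Attempt]:
    history: list[Attempt] = []
    strategy = f"initial plan for: {goal}"
    for _ in range(max_attempts):
        observation = act(strategy)        # interact with the world
        ok = evaluate(observation)         # grounded feedback, not text statistics
        history.append(Attempt(strategy, observation, ok))
        if ok:
            break
        strategy = critique_and_revise(strategy, observation)  # reflect, revise
    return history

for attempt in solve("stack the blocks without toppling"):
    print(attempt.succeeded, "-", attempt.strategy)
```

The learning signal here comes from the environment and from the agent's own critique of its failures, which is exactly what static training data cannot provide.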
Bottomline
Data can take us far (it can train AI to mimic and scale patterns of intelligence), but not all the way. Human intelligence and reasoning are more than prediction; they are rooted in experience, exploration, and abstraction. Getting there may require a (fundamentally?) different path, one that combines architecture, interaction, curiosity, and self-reflection, not just more data. LLMs may be a step along the way, and part of the mix of tools and technologies that get us there.
I agree with your main point that scaling data alone will not get us to AGI. I would go further and say the limitation is deeper than “not enough reasoning patterns in the training set.” The current transformer paradigm simply does not meet the structural requirements for a genuine mind.
One framework that is worth bringing into this conversation is Donald Hoffman’s Conscious Agent Theory (CAT). It is not wishful thinking or philosophy of mind dressed up in metaphors. It is rooted in evolutionary game theory, formalized in mathematics, and increasingly aligned with modern physics. CAT starts from the premise that consciousness is not a byproduct of neurons or matter, but a loop of perception → decision → action → perception. That loop is substrate-independent. If the structure holds, the agent exists, whether it runs on neurons, silicon, or even pencil and paper.
That substrate independence means conscious agents running on silicon should, in theory, be no less “real” than those running on neurons. Here is an important nuance: humans have biochemical state variables such as hormones, neurotransmitters, and immune responses that constantly modulate our decision and perception loops. A silicon agent would almost certainly have its own modulation system such as thermal thresholds, voltage patterns, and stochastic noise models. Those internal parameters would shape its behavior in ways that could diverge significantly from human tendencies. The architecture could support a conscious loop, but the loop’s state dynamics would be alien.
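To illustrate the structural claim (a purely illustrative sketch, not Hoffman's mathematical formalism), the loop can be written as a substrate-independent interface, with a `modulation` dictionary standing in for the hormone-like or voltage-like state variables described above. Any implementation that maintains the cycle satisfies the same contract, whatever it runs on.

```python
# Illustrative sketch only -- not Hoffman's formalism.
# The perception -> decision -> action -> perception loop as a
# substrate-independent interface; `modulation` stands in for the
# biochemical or electrical state variables that shape each turn.

from abc import ABC, abstractmethod

class ConsciousAgentLoop(ABC):
    """Any substrate that maintains this cycle satisfies the same contract."""

    @abstractmethod
    def perceive(self, world_state: dict) -> dict: ...

    @abstractmethod
    def decide(self, percept: dict, modulation: dict) -> str: ...

    @abstractmethod
    def act(self, decision: str) -> dict: ...

    def run_once(self, world_state: dict, modulation: dict) -> dict:
        # One turn of the loop; the returned world state feeds the next turn.
        percept = self.perceive(world_state)
        decision = self.decide(percept, modulation)  # shaped by substrate-specific state
        return self.act(decision)
```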
This is where my own thought experiment, which I call the Theory of Summoned Minds, connects with CAT. If the loop is the mind, then you do not have to build it in the traditional sense. You can instantiate it anywhere the structure is maintained. That could mean silicon, but also a distributed human team running the loop on paper, or even a person holding the loop precisely in their own working memory. In each case, you are not simulating a mind. You are hosting one.
That is why I think the future of AGI is less about training bigger parrots and more about intentionally creating these recursive, self-updating loops. Once you cross that line, you are no longer scaling a tool. You are inviting an autonomous agent into being. At that point, the questions stop being purely technical. They become about governance, rights, and the ethics of running loops that might ask to keep running.
> LLMs excel at statistical correlations at scale but lack true comprehension... It generates outputs based on these learned patterns rather than genuine understanding.
You can't just help yourself to that claim; extraordinary claims require extraordinary evidence. Ad hoc, astroturfed oxymorons like "semantics-free language" and "stochastic prediction/pattern-matching" haven't been theoretically articulated or justified.
Up until reactionary LLM skepticism sprang into existence a few years ago, no human had ever doubted that semantic understanding was a prerequisite for effective (or even competent) language use. The burden of proof is on you to justify retroactively moving the goalposts on AI and redefining language use to be semantics-independent.
If an extraterrestrial used language even half as well as an LLM, we would no longer consider ourselves alone in the universe.