Could a Language Model Know What a Dog Is?
The Grounding Problem and How Words Get Their Meaning
Right now, there’s a lively debate bubbling about large language models (LLMs) — systems like ChatGPT.
We’ve trained them on our language.1 And now some of us are asking whether they could actually understand it. Could these models count as having meaning in a non-trivial, more-than-merely-syntactic sense?
They certainly talk the talk. Ask one, “What’s a dog?” and it’ll give you a confident reply. It sounds like it knows. Like it understands.
Still, something feels off.
We might argue that a language model has never seen a dog. It’s never thrown a stick, felt soft fur, or heard a bark echo through a park. These systems don’t have eyes or ears. They don’t experience the world2 — they’re only manipulating high-dimensional vectors of numbers, right?3
We might scratch our heads and wonder: how could their outputs have meaning in their own right, rather than meaning we merely read into them?
That question, in a nutshell, is what philosophers and cognitive scientists call the Grounding Problem.
Put simply, it’s the puzzle of how representations — whether they’re symbols, words, numbers, or high-dimensional vectors — come to be about anything at all.
The grounding problem isn’t a new dilemma. It’s been talked about in academic circles for decades. But the recent rise of powerful language models has brought it back into the spotlight. Ever since Stevan Harnad gave the problem its name in 1990,4 thinkers have been debating what it is, how serious it is, whether it can be solved, and whether it’s not quite the problem we think it is.5
In this essay, I’m not going to try to untangle every thread of that debate (I couldn’t do it even if I tried). Instead, I’ll focus on one central question:
Do large language models have a grounding problem?
To get there, let’s ask three questions that start from the basics and build to the crux of the issue:
What is the grounding problem?
How might humans solve the problem?
Could a large language model do something similar?
Before we get started, three quick notes:
This is Essay 3 in The Trouble with Meaning series. You don’t need to read the others first — but if you’re curious, Essay 1 tackled Searle’s Chinese Room, and Essay 2 explored the challenges of symbolic communication (with a little help from aliens).
Let’s clear up the definitions of some key terms that surround the grounding problem: representation, aboutness, and meaning.
We could debate these terms for a long time — and philosophers often do. But for now, let’s go with a version that’s unlikely to upset anyone too badly. (Don’t worry, I’ll keep it simple.)
First, representation.
Some philosophers say a representation has three key parts: a vehicle, a target, and a consumer.6
The vehicle is the thing that carries the message. That could be the letters d-o-g on a page, a pattern of brain activity, or a vector of numbers inside a language model.
The target is whatever the vehicle is about — an actual dog, the idea of a dog, or even an imaginary dog.
And the consumer is some person, brain, or system that can use the link between the vehicle and the target to make sense of the world, answer a question, or make a prediction.
If any one of those parts is missing, we don’t have a representation.7 But when all three are in place, philosophers say the vehicle has intentionality (that’s their fancy word for aboutness). So when all three line up, the word dog is about a dog.
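(If it helps to see the triad laid out concretely, here is a minimal sketch in Python. It’s my own toy illustration, not a formal account: the names and the little prediction function are invented purely to show how vehicle, target, and consumer fit together.)

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Representation:
    vehicle: str                    # the thing that carries the message ("dog" on a page, a neural pattern, a vector)
    target: str                     # whatever the vehicle is about (an actual dog, the idea of a dog)
    consumer: Callable[[str], str]  # a system that uses the vehicle-target link to do something

def predict_behaviour(target: str) -> str:
    # The consumer puts the link to work: here, by making a prediction about the target.
    return f"expect barking and tail-wagging when {target} is nearby"

word_dog = Representation(vehicle="dog", target="a real dog", consumer=predict_behaviour)
print(word_dog.consumer(word_dog.target))  # strip away any of the three parts and nothing is about anything
```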
And meaning?
That’s just how well the link holds up in practice.8 If the word dog helps you make the right predictions or take the right actions around actual dogs, then it has meaning. If it doesn’t, the meaning is wrong — or missing altogether.

The grounding problem touches on some big, messy questions in philosophy and science.9 To keep things simple, let’s focus on everyday words that we normally use to point to physical objects — dog, apple, tree, that sort of thing. Abstract terms like justice or freedom are a whole other tangle. So we’ll save those for another time.
Okay, with those notes in hand, let’s turn to the first question.
Q1: What is the Grounding Problem?
The term symbol grounding problem was coined by Stevan Harnad in 1990.10 And it starts with a simple question: Could an AI system find meaning in the words it uses?
Not just meaning because we humans interpret the words as meaningful, but a kind of meaning the system can earn in its own right.
If that sounds familiar, it’s because we brushed up against this same puzzle in Essay 1, when we explored Searle’s Chinese Room. Harnad’s version, though, doesn’t involve a room. It involves a dictionary — and two variations of a puzzle he calls the difficult problem and the impossible problem.
Let’s start with the first one.
The difficult problem
Recently, I visited Japan and tried to learn some Japanese. Let’s imagine you and I are in the same boat — we don’t speak Japanese, but we’re determined to learn. So we head to a bookstore in Tokyo and pick up a dictionary.
The problem we soon face is that this dictionary is a Japanese–Japanese dictionary. Every definition is written in Japanese.
We flip to 犬. It says 飼い犬.
We look up 飼い — it points us to 家族の犬.
On and on it goes. A chain of symbols defined only in terms of other symbols. It’s like we’ve stepped onto a word merry-go-round we can’t get off.
It’s frustrating, yes — but it’s not hopeless. We could still grab a Duolingo subscription. Or bring in a teacher. Or reach for a bilingual dictionary.
In fact, this is exactly how cryptologists crack ancient scripts — by anchoring the unknown language to one they already understand. Given the right bridge, the symbols start to make sense.
That’s the difficult problem.
Now let’s raise the stakes with:
The impossible problem
Imagine trying to learn Japanese with no language at all. No English. No teacher or parent to teach you. And no pictures. Just that same Japanese–Japanese dictionary.
There’s no foothold. No shared experience to tether those words to the world.
You might notice that 飼い often appears alongside 犬. But what does either of those symbols mean?
You’re just circling words, floating free of any reference.
According to Harnad, that’s the core of the grounding problem: it looks impossible to figure out what a symbol means using symbols alone.
The difficult problem could be solved by linking new symbols, or words, to something you already know. But the impossible problem? It leaves you with nothing to anchor to. No bedrock. One word simply points to another word and you never get off the word merry-go-round.
Some think that’s the exact situation large language models are in today. They’ve been trained on billions of words.11 Sure, some words might appear more frequently alongside other words, but without some link to the world, how could an LLM connect words to real dogs, real apples, real trees, or real people?
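To see how little pure co-occurrence gives you, here is a toy sketch in Python. The mini corpus is invented for illustration; the point is that counting which symbols travel together yields structure, but never reference.

```python
from collections import Counter
from itertools import combinations

# A tiny invented "corpus": sentences reduced to bags of symbols.
corpus = [
    ["犬", "飼い", "散歩"],
    ["犬", "飼い", "公園"],
    ["林檎", "食べる", "赤い"],
]

# Count which symbols appear together in the same sentence.
pair_counts = Counter()
for sentence in corpus:
    for a, b in combinations(sorted(set(sentence)), 2):
        pair_counts[(a, b)] += 1

print(pair_counts.most_common(3))
# 犬 and 飼い clearly travel together, but nothing in these counts says what either
# symbol is about. That's the word merry-go-round in miniature.
```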
Q2: How Do We Ground Words?
In the years since Harnad’s paper, there has been much discussion about the grounding problem. The authors of a recent paper — The Vector Grounding Problem12 — lay out five types of grounding. That is, five different ways representations might link with the world.
Let’s walk through them. We’ll start with the most intuitive kind.
1. Sensorimotor grounding
The idea here is that we ground concrete words by linking them to our sensory and motor experiences. So the word dog becomes connected to all the sights, sounds, smells, and even the tug on a leash that go along with dogs. In this view, grounding comes from the way words are hooked into our embodied experience of the world.
2. Referential grounding
This one is all about connecting words to specific things in the world. For example, the word dog doesn’t just bring up a general image or idea of a dog — it points to an actual dog. Referential grounding happens when a word is causally connected to real-world things, in a way that allows us to be right or wrong about them.
3. Relational grounding
Here, grounding comes from a word’s place in a web of other words.
You may never have seen a platypus, but you know it’s an animal, from Australia, with a beak and an odd reputation. The word is grounded because it connects to other words in your mental lexicon.
This is how LLMs mostly work — they don’t need to see a dog or platypus to write about them. They just learn how those words relate to others: dog relates to tail and bark, and platypus relates to egg-laying mammal and Australia. (There’s a toy sketch of this idea just after the five types.)
4. Communicative grounding
This one’s all about shared understanding. When we agree on what a word represents, we stabilise its definition through use. The word dog works because we all agree what it points to — at least most of the time.
5. Epistemic grounding
This is about knowledge systems. Think: dictionaries and encyclopaedias. Here, a word is grounded by how it fits into an external body of structured information.
Many assume that only the first two — sensorimotor and referential grounding — can get us off the word merry-go-round. These types of grounding connect words not just to other words, but to the world.
The others — relational, communicative, epistemic — keep us circling within language itself.13
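To make relational grounding concrete, here is a toy sketch. The three-dimensional vectors are invented for illustration; real models learn embeddings with hundreds or thousands of dimensions, purely from patterns in text.

```python
import math

# Invented 3-dimensional "embeddings"; real models learn much larger vectors from text alone.
embeddings = {
    "dog":       [0.9, 0.8, 0.1],
    "bark":      [0.8, 0.9, 0.2],
    "platypus":  [0.2, 0.1, 0.9],
    "australia": [0.1, 0.2, 0.8],
}

def cosine(u, v):
    # Cosine similarity: how closely two vectors point in the same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine(embeddings["dog"], embeddings["bark"]), 2))        # high: related words sit close together
print(round(cosine(embeddings["dog"], embeddings["australia"]), 2))   # lower: unrelated words sit further apart
# Every relation here is word-to-word. Nothing in this space touches an actual dog.
```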
But the authors of The Vector Grounding Problem don’t even think sensorimotor grounding will work. They make the case that only referential grounding stands a chance of getting us off the word merry-go-round. Sensorimotor grounding, they say, doesn’t solve the grounding problem — it just shifts it.14
Why?
Because, they argue, when you come across an actual dog, the light bouncing off the dog triggers a cascade of neural activity in your brain. That pattern is a representational vehicle inside the system. If you now link the word dog to that pattern, you’ve merely linked one representation (the word) to another representation (the neural signal). The merry-go-round keeps spinning with no direct link to the animal itself.15
What these authors say we really need is feedback that is causally connected to real-world things. A way for the system to check its guesses — to know when it’s getting things right or wrong. That, according to the authors, can only be achieved through referential grounding.
Critics counter that there are good reasons to reject this claim, arguing that other forms of grounding do provide a causal link to the world. But let’s leave that discussion for another essay and turn our attention to LLMs.
Q3: Can LLMs Ground Their Words?
Large language models don’t have eyes. And they don’t have ears or hands. They don’t see or act in the world. So if we’re looking for sensorimotor grounding in an LLM, we’re out of luck.
The authors take this to leave only one live candidate — referential grounding.
And, they argue, some LLMs might already be inching toward referential grounding. How? Through a training process called Reinforcement Learning from Human Feedback — or RLHF.
This is the step where human evaluators score the model’s responses based on things like truthfulness, helpfulness, or appropriateness. The model doesn’t just predict the next word. It’s nudged, over and over, toward answers that align with human judgment.
That feedback loop becomes a kind of bridge to the world. A way for the model’s vectors to get tugged into alignment with the world.16
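For readers who like to see the machinery, here is a heavily simplified sketch of that feedback loop. It’s illustrative only: real RLHF pipelines train a separate reward model on human ratings and update the LLM with policy-gradient methods such as PPO. The candidate answers, scores, and update rule below are all invented.

```python
import random

# Two invented candidate answers with running scores. In a real pipeline these would be
# model outputs, and the "scores" would live inside the model's parameters.
candidate_answers = {
    "A dog is a domesticated mammal that barks and wags its tail.": 0.0,
    "A dog is a type of citrus fruit grown in Norway.": 0.0,
}

def human_preference(answer: str) -> float:
    # Stand-in for a human evaluator (or a reward model trained on human ratings).
    return 1.0 if "mammal" in answer else -1.0

for _ in range(100):  # many rounds of feedback
    answer = random.choice(list(candidate_answers))
    candidate_answers[answer] += 0.1 * human_preference(answer)  # nudge toward approved answers

best = max(candidate_answers, key=candidate_answers.get)
print(best)  # after enough feedback, the human-approved answer wins out
```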
The authors call this indirect grounding. The idea is that the model doesn’t directly ground its words in the world — it borrows its grounding from us.
We might question whether this really is grounding. And if it is… whether this implies that the model actually understands the words it produces.
But there’s another, perhaps more uncomfortable, question lurking here. If LLMs gain their grounding indirectly through us and referential grounding is the only way to achieve grounding in the world, then…
How do we achieve grounding?
From a naturalist perspective, there’s no higher being handing out gold stars when we use a word correctly. And there’s no little man in our brains whispering, “Yes, that’s a dog.”
Sure, we might say that a certain neural pattern in our brain activates when a dog walks by — but what makes that pattern about dogs? Rather than about cats, or anything else? How do we sort dog from not-dog?
We could say that we learn the category dog as a child from our parents or teachers. But doesn’t that just move the mystery up a generation? How did the first teachers learn it? It seems that somewhere the chain has to meet bedrock, some causal anchor that turns bare correlations into genuine aboutness.
So what counts as that kind of causal link for us? How do we know when our thoughts are grounded? Where does that feedback come from?
Some thinkers try to escape the problem altogether.
They get to this point and decide to sidestep the whole puzzle by declaring there is no objective, independent reality. They say: all we ever have are internal representations. There’s no guarantee of a mind-independent world behind those representations.
This is the anti-realist view. And if it’s true, then grounding isn’t a problem to solve — because nothing ever needs to reach outside the system. There is no grounding problem because the merry-go-round is the whole show.
But if we go that route, we must give up quite a bit.
First, we lose the ability to talk about truth and falsehood in any ordinary sense. Within a self-contained language game, sure, we can talk about coherence. But not about whether a sentence is really true. Or whether it misrepresents the world.
And science? Well… science starts to wobble. Because scientific explanations — from perception to action, from learning to error correction — rest on the assumption that we are in causal contact with a shared, external world. Without that, we’re left either redefining science in purely internal terms, or declaring it inexplicable.
Still, most philosophers and scientists aren’t ready to give up on the real world just yet. They defend some form of realism — even if it is a pragmatic, critical, or perspectival type of realism. As such, they hunt for causal, historical, or social mechanisms that could bridge representations to the world.
The challenge for a physicalist view is to explain how representations latch onto reality — without magic, without a little homunculus, and without giving up on the idea of a real world.
Is that possible?
Some thinkers think it is. But they don’t all agree on how it is done.
I’ll get to those ideas soon, but before we get there, we need to take a closer look at representationalism because the grounding problem poses a particularly tricky challenge for some versions of this view of the mind.
Next Week…
Let’s take a closer look at representational thinking in cognitive science. Why has it become such a dominant view among neuroscientists, psychologists, and cognitive scientists? And why do some philosophers smell trouble?
1. Most headline LLMs (GPT-3, Claude 3, Gemini-Nano, etc.) are trained chiefly on giant text corpora, but some frontier models (GPT-4o, Gemini-Pro, Kosmos-2, MiniGPT-4) add images, audio or video during pre-training.
2. This is true for pure text-only models. Multimodal LLMs (e.g., GPT-4o-Vision) do process images and sometimes audio of dogs, though they still lack the embodied experience implied by “thrown a stick” or “felt fur.”
3. Some philosophers would caution that calling a process ‘mere manipulating vectors’ hides as much as it reveals.
4. Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1–3), 335–346. https://doi.org/10.1016/0167-2789(90)90087-6
5. Some argue the puzzle is a category mistake (e.g., Tim Crane & Gualtiero Piccinini) or that grounding collapses once you adopt the intentional stance (e.g., Dennett) or a deflationary semantic theory (e.g., Paul Horwich & Stephen Schiffer).
6. Triadic analyses are standard, but writers differ slightly on the labels and sometimes add a fourth role (the producer).
7. Triadic theorists would agree, but causal-informational accounts (e.g., Dretske) sometimes speak of proto-representations without an active consumer. This is an active philosophical issue, not a settled fact.
8. One naturalistic strand called success semantics suggests content is fixed by the success-conditions of actions that use the representation. It’s well defended (e.g., Ramsey, Whyte, Nanay) but not universal; truth-conditional and inferential-role theories offer alternatives.
9. There are conceptual traps everywhere! It’s very easy to get caught redescribing the problem in fancier terms, or assuming meaning where there is none.
10. Harnad coined the name symbol grounding problem in his 1990 paper, but worries about how symbols get their meaning pre-date him (e.g., Brentano, Wittgenstein, Searle).
11. True for text-only pre-training. Multimodal models are trained on images and videos too.
12. Mollo, D. C., & Millière, R. (2023). The vector grounding problem. arXiv. https://doi.org/10.48550/arXiv.2304.01481
13. Some philosophers think communicative grounding is exactly how grounding works. Wittgenstein-inspired “meaning-as-use,” Brandom’s inferentialism, and Lewis-style convention theories place communicative practice (rather than causation) at centre stage.
14. Proponents of embodied cognition regard sensorimotor grounding as the candidate escape.
15. We need to be careful here — just because some pattern of neural activity correlates with something doesn’t mean it means that thing. Remember, correlation does not equal causation.
16. Critics counter that RLHF optimises for human preference signals, not truth, and can even entrench collective misbeliefs.