Very interesting Suzi!

It seems like we could say that LLMs have an alternative grounding to the one we have. Their grounding is actually in us, in our social dynamics. So I guess in a way that does mean we could say they have indirect grounding in the world. Although, like old tape recordings copied and passed around, it seems like something gets lost with each layer of indirection.
I'm not clear why sensorimotor processing isn't part of the answer for more direct grounding. In us, referential grounding seems like a result of sensorimotor grounding. If we put an LLM in a robot and made it an agent in the world (which I realize is trivial to say but very non-trivial to implement), then through its sensorimotor mechanisms it would gain more direct grounding. It might even make sense to say it has a world model. This is why I find self-driving cars and other autonomous robots much more interesting than LLMs as steps toward intelligence.
But as I noted on your last post, the main thing I think to focus on is the causal chain. Once we do, to me, the philosophical issue goes away. Not that we don't still have an enormous amount of learning to do on the details.
I’d be a bit careful with painting all humans with one brush and LLMs with another to say “they” have an alternate grounding than us. I’ve been looking for ways to humanise philosophical questions like grounding and meaning and I found these videos by a person who has been blind from birth talking about his experience of colour — what would your view of grounding make of a person like him using the words “my favourite colour used to be blue” I wonder?

https://youtu.be/nwgkF_HOh-I?si=CTV18fXpdUZkN-6t

https://youtu.be/59YN8_lg6-U?si=7KZuPI7s4zoiTuNj
Thanks! His comments about color are fascinating. It's interesting how much he knows about colors without having ever directly experienced them. It's a reminder of how much of our own concept of color is bound up in its associations.
Certainly I was referring to a typical human with more or less complete sensorimotor capabilities. I would say that his concept of color is still grounded in the sensorium he does have, in what he's been told about it throughout his life, and how he relates it to the sounds, feels, smells, and tastes he does have access to. So his grounding for some things is different from a sighted person's, but I would say it clusters much closer to the sighted person's than the LLM's.
Isn’t it? There are also videos of him talking about how he “sees” himself or what he finds “attractive” in other people (spoiler: it’s not looks) … and it’s all very fascinating to hear.
But anyways, the next thing I’m wondering about is which LLM you mean. Suzi relegated mentions of multimodal ones to the footnotes, but where would you cluster something like RT-2?

https://deepmind.google/discover/blog/rt-2-new-model-translates-vision-and-language-into-action/
Interesting, and somewhat along the lines of the scenario I mentioned above about putting an LLM into a robot. Although, skimming through their discussion, it does seem to have a couple of big differences from how an animal or a human understands things.
One is that its movement commands seem to be through a text interface. So the tight coupling between sensory and motor processing isn't the same.
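As a rough illustration of what such a text-style action interface might look like (the token format and decoder here are invented for this sketch, not RT-2's actual scheme): the model emits a string of discrete tokens, and a thin wrapper turns them into continuous motor commands for the robot controller.

```python
# Hypothetical sketch of an action-as-text interface (not RT-2's exact format).
# The language model emits a string of discrete integer tokens; a wrapper
# decodes them into continuous values for the low-level controller.

def decode_action_tokens(token_string: str, num_bins: int = 256):
    """Map space-separated integer tokens to values in [-1, 1]."""
    tokens = [int(t) for t in token_string.split()]
    return [2.0 * t / (num_bins - 1) - 1.0 for t in tokens]

# e.g. seven action dimensions emitted as plain text by the model
model_output = "1 128 91 241 5 101 127"
action = decode_action_tokens(model_output)
print(action)  # continuous values the robot controller would execute
```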
The other is it appears to be a pretrained model (which is typical for most of these models). So it doesn't appear to be a robot forming concepts from its sensory data of the environment, but taking actions depending on a generalized model built on training data.
Overall, it looks like an interesting project. But I'm not sure it's on track to process information the way we do.

Is it on track to (or beyond?) the way Tommy Edison understands… “things”?

I wouldn't say so.
Of course, like all AI, it's superhuman in certain ways. No human can take in the amount of training data it's built on. But even simple animals still seem to have a stronger ability to take meaning from their input. I don't doubt that will change, but I do doubt the current models will get us there.

I always appreciate your comments, Mike. When the credits start rolling on Suzi’s essay, I always look for your insightful reaction.

Thanks Sunny! You're too kind.
I came here to say this. I think at some point, you have to compare Tesla++ combined with LLM++ and wonder what exactly it is that humans can supposedly do that current-AI cannot. If AI can see and touch and feel and talk to other AIs and humans, I would say that they understand meaning as well as we can.

In principle I agree. In practice, I think there's still a lot of work to be done to get there.

One thing that seems clear: AI won't be free from the kinds of biases and other issues that plague our minds.

Wise words! Also, one could say that brains have indirect grounding.
Good point. I guess all we can really talk about is grounding that is more direct or less direct. It seems like there's always a causal chain with many steps involved.
Yes, I agree. A spectrum of directness. And there are important differences in plasticity too, as many have pointed out. But it's important to remember that a preserved context window is a real form of plasticity even if weights aren't shifted.
I don’t know exactly what I should be saying here in a political sense, given that these modern LLMs seem to be doing my work for me. (I also mean this literally, since I’ve now included three wonderful AI podcasts for my first three posts.) Thus hammering home my points could paint me as an insensitive jerk. Daniel Dennett has been considered nothing less than a naturalistic god. But now such technology seems to be redefining him as a person who accidentally backed something spooky. Once the dust settles, and perhaps this will even require the empirical validation of the brain physics which constitutes value, that’s when the true work will need to be done to build up the still primitive science of psychology.
Except that if the laws of physics as we know them are obeyed, it doesn't matter how crucial or trivial electromagnetic fields are to consciousness. It's still a computational process and Dennett would be right!

Stochastic parrots is what they are, and all they will ever be in their current forms.

Representations latch onto reality by imitating it.

A friendly encouragement: repeat after me: "I will not stochastically parrot the stochastic parrot meme!". If you wanna know why:

https://open.substack.com/pub/markslight/p/biologically-assisted-large-language?r=3zjzn6&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
I do not have a philosophy degree, nor have I taken a single course in it (though I believe everyone is a philosopher in their own way), and I have no background in cognitive science, so I might not be using the right terms here. However, I wanted to share some thoughts that resonate with ideas I've encountered, like Thomas Nagel's essay "What Is It Like to Be a Bat?".
Do we not think that what it means to be a dog is fundamentally different for humans and dogs? Can we ever truly understand a dog without being a dog ourselves? Nagel's argument seems relevant here: even if we could study every detail of a dog's behavior, brain, and biology, we would still lack access to its subjective experience—the "dog-ness" of being a dog. Our understanding of dogs is always an external, human-centered representation, not the dog's own intrinsic experience of itself.
If we follow this line of thought, any understanding of the "dog" we humans have is just our representation of what a dog is. Our sensory experiences, cultural frameworks, and language shape it. But this representation differs from the dog's version of "dog-ness." Similarly, if AI systems develop their understanding of "dog," it will be their version of "dog"—a computational abstraction based on patterns in data rather than direct experience. This AI version would differ from human representation and the dog's experience.
In this sense, only the dog has the "right" or most authentic version of what it means to be a dog, as it is directly tied to its lived experience. Our human and AI versions are both external and, to some extent, incomplete. That said, I'm not discounting the idea that humans may have a more detailed or nuanced representation of dogs than AI, given our capacity for sensory interaction and cultural understanding. Still, even that representation is removed from the dog's reality.
Interestingly, other animals also have their versions of dogs. For example, a cat's version of a dog might be a competitor or potential threat, while a bird might interpret a dog as a large, moving object to avoid. These representations are shaped by each animal's sensory and ecological perspective, what Jakob von Uexküll called their Umwelt—their unique perceptual world.
Ultimately, all external representations—human, AI, or other animals—are referential, shaped by the observer's perspective. They are interpretations of "dog-ness," not the direct, intrinsic experience of being a dog. This realization makes me wonder if any being can fully understand another's perspective or if we are all limited in living and representing the world.
I really get that intuition — there’s something deeply personal about experience, and it’s hard to imagine truly knowing what it’s like to be another creature.
But an alternative view might suggest that what it’s like to be a dog isn’t something sealed off inside the dog, accessible only to the dog itself. Instead, what it is like to be a dog is something we can glimpse by looking at how the dog moves through the world — what it notices, what it avoids, what it seeks out.
I'm sorry. I don't think I was very clear, so let me try again.
Let me explain where I am coming from: your post reminded me of a conversation between Richard Feynman and his father, which Feynman has paraphrased as follows:
Feynman articulates the difference between knowing the name of something and understanding it.
"See that bird? It’s a brown-throated thrush, but in Germany it’s called a halzenfugel, and in Chinese they call it a chung ling and even if you know all those names for it, you still know nothing about the bird. You only know something about people; what they call the bird. Now that thrush sings, and teaches its young to fly, and flies so many miles away during the summer across the country, and nobody knows how it finds its way."
I am trying to say that AI knows more than just the names of things when it comes to a dog.
Now, what I’m trying to say is that there’s a broad spectrum of understanding of what a dog is. On one extreme, you have AI, which has a minimal and abstract understanding based on patterns in data. Conversely, you have the dog with the most authentic knowledge because it directly experiences what it means to be a dog. A dog doesn’t just know facts about itself—it knows what it feels like to be a dog, from the inside out.
Humans fall somewhere in between. We can observe and interpret dogs, but our experiences shape our understanding. For example, a human who has never had a dog will likely have a more detached understanding, based on external observations or cultural ideas. In contrast, a human who has lived with a dog will have a deeper, more experiential understanding, shaped by emotional connection and a shared life with the dog.
Other animals also have their interpretations of dogs, shaped by their interactions and sensory capabilities. For example, a cat might see a dog as a competitor, while a bird might see a dog as a potential threat. These interpretations differ from both human and dog perspectives.
AI exists at one extreme of the spectrum. It knows specific characteristics of dogs (like "dogs bark" or "dogs have four legs") but lacks direct experience and emotional connection. Its understanding is a tiny part of the spectrum compared to humans or dogs, but it still exists.
At the other extreme, the dog itself knows what it feels like to be a dog, from wagging its tail to smelling the world in ways no human or AI can truly understand. Just as I can never fully know what it feels like to be you, no one can ever fully understand what it feels like to be a dog except the dog itself. Understanding is not static—it evolves through interaction. A human with a dog learns more about it over time by observing its behaviors, emotional responses, and unique personality. Similarly, a cat that grows up with a dog develops a different understanding of dogs than one that only encounters them as predators. Even within a species, experience shapes interpretation. AI, however, lacks this dynamic quality—it processes static data and doesn’t adapt its understanding through real-world interaction (at least in its current form). This limits its ability to learn from the richness of lived experience.
Beyond individual perspectives, there is also collective understanding. Humans share knowledge about dogs through language, culture, and science. This collective knowledge allows even those who have never lived with a dog to learn about them through books, documentaries, or conversations. AI’s understanding, too, is shaped by collective human knowledge, as it is trained on massive datasets representing human language and thought. However, this collective knowledge is filtered and incomplete—it lacks the richness of direct interaction or sensory experience.
Ultimately, the spectrum of understanding—from AI to humans to the dog itself—reminds us that all perspectives are limited and relative. Each has its value, but none can fully encompass the truth of being a dog.
Ah! I see where you're coming from. Thanks for taking the time to clear that up. There’s a lot here that resonates — and some interesting philosophical moves too.
Starting with the factual parts:
Yes, current AI systems — especially large language models — do go beyond just knowing the names of things. They capture statistical patterns about dogs: how we talk about them, what behaviours they exhibit, the roles they play in human life. That’s more than labelling.
And, yes! Humans develop richer, more intuitive models of dogs through direct interaction. There's research showing that people who live with dogs are better at interpreting their behaviour than those who don’t — so experience really does shape understanding. The idea that other animals have species-specific interpretations of dogs also holds up: a cat’s experience of a dog is shaped by very different sensory systems and survival strategies than ours.
You're spot on -- most current AI systems don’t adapt their understanding through lived, real-time interaction. They process static data, and while some can work with images or audio, they don’t (yet) update continuously from experience like a brain does. That’s (I think) a real limitation, and it’s one reason AI systems still fall far short of what animals or humans do naturally.
Where your comment moves into more philosophical terrain is the idea that the dog has the “most authentic” understanding of what it means to be a dog. That’s a powerful intuition. It echoes thinkers like Thomas Nagel, who argued that the subjective feel of an experience can’t be fully captured from the outside.
But it’s not a scientific claim, it's a philosophical one. And some philosophers argue that the intuition might be wrong. For example, philosophers like Dennett argue that what matters is how well a system can predict, explain, or respond to the world. In that view, understanding isn’t about a private inner feeling; it’s about functional capacity. So a system that can't “feel” like a dog, but can navigate the world as effectively as one, might still count as having robust “dog knowledge.”
So I think you make a great point -- all perspectives are partial. But this might include the dog too. The dog, the human, the AI — each has access to different aspects of “dog-ness,” shaped by different capacities and experiences. The question then becomes whether we think that one perspective counts as more authentic than another. That answer will depend on the philosophical lens we choose to look through.
Thank you. I have not read any of Dennett’s books yet, but I have read a little about functionalism. If we were to understand consciousness fully, do you think our opinions on intelligence would likely change in ways that would make functionalism seem incomplete or even possibly outdated? Also, does functionalism dominate today because it works with what we can study, even if it doesn’t address everything we want to understand?
I think if we see functionalism as a strategy: a way to explain mind and intelligence in terms of what systems do, how they behave, how they react, how they learn, then it is very useful. As for why it dominates — yes, I think it is because it works. It’s helped us make progress without getting trapped in unanswerable metaphysical riddles.
But there is a move happening in neuroscience, atm, that says, functionalism is not wrong — it’s just incomplete. We don’t have to throw functionalism out. But what we might need to do is change what we mean by function.
This is a masterful essay within a masterful series, within a masterful body of work. I’ve never seen anything quite like this.
If a neuroscientist, philosopher and AI researcher decided to explain the inexplicable by taking advantage of a curiosity exploit, the result could be no better than When Life Gives You a Brain.

What a lovely thing to say! Thank you so much, Sunny.

I love this series, and the niche you occupy between philosophy and AI. Thank you! Keep the content coming

Great work. Very thought provoking.

Thank you!
There is another option we might call "functional grounding." It was explored by Wittgenstein, but I don't think it's covered in your discussion. Functional grounding includes elements of sensorimotor grounding, relational grounding, and communicative grounding; but instead of simple sensorimotor grounding, which has difficulty with abstractions like justice or exclamations like "Aha!", it relies on what you could call "way-of-life" grounding, where the expression is associated with our activities or behaviours.
In footnote 13, you associate Wittgenstein's "meaning is use" with accounts that "place communicative grounding at centre stage." But for communicative grounding as you describe it, "The word 'dog' works because we all agree what it points to," and this lets Wittgenstein out. The concept of "pointing" may have been important to his early position in the Tractatus, but in Philosophical Investigations he repudiates it (by most accounts), precisely because of the difficulties its positivism presents for abstractions, interjections, opinions, moral and aesthetic discourse, and other language games not obviously connected with the empirical verifications of sensorimotor activity.
All language games, however, are connected with ways of life or patterns of behaviour. Their webs of words are linked to webs of behaviour, so there is an aspect of relational grounding. Their role in a shared way of life involves a shared or communal understanding, so there is an aspect of communicative grounding. But they stop short of full referential grounding, for that would imply, for example, the reification of "dog" because of some particular way of life in which the utterance "dog" plays a role. This is a stronger ontological claim than the language-game account needs to make -- and if we go there anyway, we are left wondering what we would say then about the ontological status of "justice" in some language-game where that utterance plays a role. Wittgenstein's functional grounding allows us to remain ontologically agnostic across language-games (a position that some may find dismaying).

Hi Jim!

This is a great thoughtful push. Thank you.
Yes, I can see that. Wittgenstein’s later work resists any simple pointing theory of meaning. I included him under the umbrella of communicative grounding (in the 'meaning is use' sense), but you are right, this may have underplayed how radical his shift was — especially in Philosophical Investigations.
And, yes, good point: I should have mentioned functional grounding, though I am planning on addressing something along those lines in a future essay. But just quickly, yes, I agree, I think this sort of view is especially helpful when it comes to abstractions, emotions, or interjections like 'Aha!' (or even 'Justice!'), where there's no neat pointing relationship to be had. Instead, these utterances do something — they function within our shared activities, reactions, and expectations.
Thanks again Jim -- really appreciate your insight here.
It's hard to cover all the bases in this format while remaining readable, informative, and entertaining -- which you do most admirably. I look forward to that future essay!
Something related and really weird is that we could train an LLM on an undeciphered writing system and it would work. If we had a large enough training corpus, it would do about as well as LLMs do for English.
Although I wonder if we can't fairly see words as at least partially grounded in other words. Words are part of reality, and refer to other words/concepts much as they refer to physical things. Some words are even just about words, like the word "word". Also, if we learn that a particular word is a noun, verb, adjective etc, we have learned something about its meaning just by learning how it relates to other words, even though we don't yet know its full meaning.
I think we could also see LLMs as having a kind of "sense" in the form of the words we use to prompt them. And while they don't have muscles, they can write out a response. If we imagine a human in a coma who could only receive text messages into their brain and type out responses, I think we could fairly consider this to be their senses and their acting upon the world. And then there are multimodal LLMs, and they can be hooked up to control computers or even robots.
I think meanings must ultimately be grounded by causal connection, similar to what Mike said. It's a point Whitehead makes in 'Process & Reality', distinguishing between "perception in the mode of presentational immediacy" (our representations that work as a kind of projected image of the world) and "perception in the mode of causal efficacy" (the world as it immediately affects us). All of our representations ultimately derive from these causal influences, although we can be mistaken in how we conceptually link the two modes. We know the world because we are part of it. We do not need to merely make correlations between perceptions, because we do directly feel causality -- causality is the root of sensation.
I think of those pin art toys, where you press your hand or face on one side and it moves the needles so that a 3D image of it is formed on the other. Is that a representation or just a presentation? I would lean towards the latter.

Hi Joseph! Great comment — there’s so much here.
Yes — I think that’s such a great point about words referring to other words. And like you said, in some cases, they don’t even need to point outward at all. Some words refer to themselves — like “word” or “noun.” Even whole sentences can be their own content: “This sentence is false” is the classic one. It’s a sentence that points to itself — a kind of self-contained representation.
And I really like the point you made about LLMs and undeciphered scripts. In a way, that’s what they’re doing already. Words get transformed into vectors, and those vectors are manipulated based on patterns of use. The model just needs to "know" how a word/phrase "behaves" in context.
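To make that concrete, here is a toy sketch (my own illustration, not anything from the essay) of a statistical "language model" over an undeciphered symbol stream. It only counts which symbol tends to follow which; nothing in the loop needs to know what any symbol refers to, which is the sense in which the training signal is pure patterns of use.

```python
# Toy bigram "language model" over an undeciphered symbol stream.
# The symbols are arbitrary placeholders; only co-occurrence matters.
from collections import defaultdict, Counter

corpus = "◇ ▲ ◆ ◇ ▲ ◇ ◆ ▲ ◇ ▲ ◆".split()   # stand-in for an undeciphered script

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1                  # count which symbol follows which

def predict_next(symbol):
    """Return the most likely next symbol given the one before it."""
    return counts[symbol].most_common(1)[0][0]

print(predict_next("◇"))                    # prediction grounded only in co-occurrence
```

A real transformer learns far richer statistics over vectors rather than raw counts, but the grounding situation is the same: the model never needs anything beyond the symbol stream itself.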
And yes, I agree with you and Mike — causal connection does seem crucial.
Language models, including multimodal ones, are static mathematical functions. That should be enough to settle any question regarding their capacity to learn, know or experience anything.

Being active seems important to me too. A system that can reshape itself through interaction feels very different from one that doesn't.
Agree! But as I see it, given a sufficiently large context window, the appropriate transformer architecture, compute and memory, we have a reshaping system. From a functionalist perspective, there's no difference between updating the parameters and updating the activation patterns by means of updating the context window.

Interesting claim — I'll play devil's advocate.
From the outside, both parameter updates and context updates can change a model’s behaviour. But does functionalism really treat those as equivalent?
A weight update alters the model’s standing dispositions — it reshapes what the system can do across contexts. A context window, by contrast, only tweaks the current activation state. Once the prompt is gone, so is the change.
Only one of these methods makes the model able to do something new, reliably, across conditions. The other relies on keeping the context window open. Is this more like scaffolding than integrated changes?
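Here is a toy illustration of that distinction (a made-up model, nothing like a real transformer): one change lives only in the context that gets passed in, the other is a standing change to the system itself.

```python
# Conceptual sketch of the two kinds of "update" being contrasted:
# changing the context vs. changing the weights.

class ToyModel:
    def __init__(self, bias: float = 0.0):
        self.bias = bias                      # stands in for the weights

    def respond(self, context: str) -> float:
        # output depends on the frozen weights AND whatever is in the context
        return self.bias + len(context)

model = ToyModel()

# 1) Context update: behaviour changes only while the extra text is present.
print(model.respond("hello"))                 # 5.0
print(model.respond("hello" + " remember X")) # different, but only for this call
print(model.respond("hello"))                 # back to 5.0 once the context is gone

# 2) Weight update: a standing change that persists across all future contexts.
model.bias += 10.0
print(model.respond("hello"))                 # 15.0, in every later session
```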
So the question is: Is that just a structural difference? Or is it a functional one too? Is it the session that has functional equivalence?

Oh, please do!
I was probably not clear enough. Yes, I'm talking about a session! But a session that may be running for a lifetime. I think this is the most natural and least confusing way to think and talk about AI agents, but that may be only me!
Compare to identical twins. For the sake of the argument, let's say that they are completely identical at birth. If we want to say that all instances of an LLM are one and the same system, then the equivalent is to claim that the twins are the same system, just two "sessions" (for many years, hopefully). But I view them as two different individuals even if they start with the same architecture and context window.
Once we have agents that are reshaping significantly over years, via the context window or weight updating (mostly the latter, probably), then I think we'll move gradually away from saying that they are the same AI.

Does that make more sense?
Suzi your brain has never seen a dog (no light penetrates your skull). Yet you can vividly recall that experience when no dog is present. WE are the “eyes” of the LLM. Our experiences are laid down in their weights in a similar manner to the way in which empirical sense data are laid down in your brain. Furthermore, we can give them literal “eyes” (cameras) and have done so to a limited degree.

You’re right! No light reaches our brain. Yet we say things like “I can visualise a dog.”

LLM weights do store human-generated patterns, but unlike a brain they’re mostly frozen after training. This misses the motor-sensory loop.

Multimodal models with cameras are a fascinating case. But I think a robot dog is still a long way from a real dog.
There are virtually no people who understand both philosophy at her level and LLMs in depth. For example, she provides a naive footnote about how some models are trained on only text, while others include video. Generative Pre-trained Transformers are based on tokens that encode semantic meaning (word2vec). LLMs can’t directly ingest video. The video must first be broken down into images, and the images converted into text by a different kind of machine learning process (typically a convolutional neural network). The text *describing* the video then becomes part of the training data. LLMs can’t “see” images or “watch” videos during training. Adding narratives about videos enriches the training data, but it is not different in kind.
I didn’t read Suzi’s footnote as you apparently did. It seems perfectly accurate.
As for your comment:
Aren’t base GPT models (GPT-1/2/3) trained exclusively on text corpora?
Don’t specialized models (e.g., Video-LLaMA, LaViLa) use video-derived data that converts videos into visual tokens (via encoders like CLIP or Vision Transformers) paired with text narrations?
Don’t GPTs use subword tokenization (e.g., Byte-Pair Encoding) and transformer-based embeddings learned during pretraining, rather than Word2vec, which is an older, static embedding method not used in modern LLMs?
Aren’t videos split into frames, with images then encoded into numerical embeddings (not descriptive text) using vision models like ViT or CLIP?
Thanks! But I think some of these claims are a bit outdated.
GPT-style models use sub-word tokens (BPE, SentencePiece). These tokens are then mapped to high-dimensional embeddings learned end-to-end. They are not Word2Vec vectors, and the model does not assume each token has a fixed semantic meaning.
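If you want to see the sub-word behaviour directly, here is a quick check (assuming the tiktoken package is installed; any GPT-style tokeniser would show the same thing):

```python
# GPT-style tokenisers split rare words into byte-pair pieces; the pieces are
# integer IDs whose embeddings are learned during training, not fixed
# Word2Vec-style word vectors.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # a tokeniser used by recent GPT models
ids = enc.encode("ungroundedness")
pieces = [enc.decode([i]) for i in ids]

print(ids)     # a handful of integer token IDs
print(pieces)  # sub-word fragments, e.g. the word split into several pieces
```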
It is true that earlier research pipelines did convert images to captions. But that's not the case anymore. Newer models (e.g., Flamingo, GPT-4o, Gemini, Perceiver-AR, VILA) pass visual feature embeddings (from CNNs or Vision Transformers) straight into a shared transformer. No intermediate textual captions are required; the visual tokens are treated as another modality. For video, models often sample frames, extract frame-level embeddings, and feed them directly.
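As a schematic sketch of that "visual tokens as another modality" idea (toy dimensions, written with PyTorch; real systems such as Flamingo or GPT-4o are far more elaborate): patch embeddings from a vision encoder are projected into the same space as text-token embeddings, and the concatenated sequence runs through one shared transformer.

```python
# Schematic multimodal fusion: vision features and text tokens share one transformer.
import torch
import torch.nn as nn

d_model = 64
text_embed = nn.Embedding(1000, d_model)          # text token embeddings
vision_proj = nn.Linear(512, d_model)             # project vision features into the same space

encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
shared_transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)

text_ids = torch.randint(0, 1000, (1, 12))        # 12 text tokens
image_feats = torch.randn(1, 16, 512)             # 16 patch features from a vision encoder

tokens = torch.cat([vision_proj(image_feats), text_embed(text_ids)], dim=1)
out = shared_transformer(tokens)                  # one sequence, two modalities
print(out.shape)                                  # torch.Size([1, 28, 64])
```

No caption is generated anywhere in this pipeline; the visual information enters as embeddings, not as descriptive text.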
The claim that LLMs can’t "see" images or "watch" videos during training depends on how we define "seeing". One might argue that multimodal LLMs are trained jointly on image–text pairs, and sometimes on video–text pairs. They “see” embeddings that preserve spatial/temporal structure, even if they don’t output pixels. As you said in your other comment -- one could (and many philosophers of perception do) argue that neither LLMs nor brains "see" raw images.
On whether adding narratives about videos enriches the data: I can see this. But I can also see someone arguing that it is a difference “in kind” relative to text-only corpora.
It seems to me that this is a problem only if we assume the existence of an independent, external reality to which we need to ground our symbols. If we were to live entirely in a world of symbols there would be no grounding problem at all, right? Now why do we think we don’t live in a world of pure symbols and we are instead led to think that symbols have a meaning, i.e. that at least in some cases they point to something that is not a symbol?
Yes! The grounding problem feels like more of a problem if we assume there’s something outside the system of internal representations that matters — something those representations need to “hook onto.” If we lived entirely in a world of internal patterns and coherence, there’d be no concern about grounding at all.
But I think many people suspect we don’t live in that kind of world. One reason to think that is because our representations can fail. We might say, “There’s a dog in the backyard,” and sometimes — there isn’t. We were mistaken. That kind of mismatch, for many, suggests there’s a gap between our internal models and the world itself. And it’s that gap (or perceived gap) that makes things like truth, error, and knowledge such thorny philosophical topics.
Strictly speaking, the moment when we realize the dog isn’t there is a clash between symbols: a stored memory of having thought that the dog was there and a current perception of the dog not being there. That is, if we are willing to call perception “a symbol”. But then if everything is a symbol, nothing is: our notion of a symbol rests on the idea that symbols point to something else. Still, it seems we can’t talk meaningfully about that something else in any other way than by using symbols to point at it. At the opposite extreme, symbols are already something else (they are already pieces of reality), since they can be pointed at by other symbols. I feel stuck.
Welcome to the swamp! You’ve captured the dilemma perfectly.
This week's essay will finish off laying out the puzzle. But after that, I’ll start exploring the different paths thinkers have taken to try to solve it — or at least, make it seem like not so much of a puzzle.
You’ve chosen to push abstract grounding to the future post right away, so maybe over there discussion would be quite different. ☺️
IMO we should have started with abstract, say boolean, concepts like “bigger than a breadbox”, and then other concepts like causality.
“Dog” is definitely starting from the deep end.
But back on today’s topic: I’m in the camp of perceived reality, which is what Donald Hoffman presents. With the example of “dog”, one silly trap for humans is “hyenas”. Many people would use common sense to call them “dogs”, while they are actually Feliformia and much closer to cats. We just use our lopsided heuristics based on shape, behavior, etc. to make a quick judgement.
The scientific truth-seeking path is definitely needed, as we catalogue our universe and see more details. Eventually new paradigms emerge, and concepts shatter.
One such concept that you’ve tirelessly covered already is “self”. Common sense on that one is about to get absolutely smashed in the next few decades. 🤣
That’s interesting — I’ve always thought of sensation and perception as the shallow end!
So “dog” felt like a less complex place to start — not abstract at all. But, you're right, dog is tricky. I guess from the neuroscience perspective, perceptual concepts are often seen as more concrete, and therefore simpler, than things like “justice” or “causality,” which feel much more complex.
And I appreciate the Donald Hoffman reference. I think he’s absolutely right to challenge the idea that perception is a transparent window to reality.
I think this is an interesting discussion and a well-written overview of the topic. However, I think it continues to be misplaced to treat LLMs as anything more than a part of language itself; they are just a new tool mankind is using to communicate. It's not that I believe AI is impossible, but more that grounding is where we would need to start, not something that merely needs to be added on to our electronic dictionaries. No one asks 'Can our calculator know what taxes are?' or 'Can the latest weather model tell me what it's like to be wet?' That is not what these things are for, and in fact it is absurd. I am solidly in the school of sensorimotor grounding. Early man developed understanding and concepts of the world before language. I would be much more interested to debate the consciousness of my cat, as at least with her there is a possibility for an output I did not have to program.
There is also a separateness and independence that defines the boundaries of each consciousness.
I often refer to LLMs as plagiarism machines; they possess no capacities beyond what was given by their programmers, questioners, and the authors of the training materials. If consciousness is ever detected, it is stolen from these sources, merely obscured by the amalgamation from hundreds of sources.
Conversely, you will have almost universal agreement that a newborn baby, before they have learned anything, has a unique consciousness. A few might say that that infant is merely the sum of their inherited DNA, but those types will have an easier time arguing against the existence of consciousness than extending it to the LLMs.
Buckminster Fuller called us localized problem solvers due to our unique capacity for the distillation of experience into useful serendipity. LLMs will surely help us with the distillation part, but the serendipity is all us.
Very interesting Suzi!
It seems like we could say that LLMs have an alternate grounding to the one we do. Their grounding is actually in us, in our social dynamics. So I guess in a way that does mean we could say they have indirect grounding in the world. Although like the old tape recording copies passed around, it seems like something gets lost in each layer of indirection.
I'm not clear why sensorimotor processing isn't part of the answer for more direct grounding. In us, referential grounding seems like a result of sensorimotor grounding. If we put an LLM in a robot and made it an agent in the world (which I realize is trivial to say but very non-trivial to implement) then through its sensorimotor mechanisms, it would gain more direct grounding. It might even make sense to say it has a world model. This is why I find self driving cars and other autonomous robots much more interesting than LLMs for steps toward intelligence.
But as I noted on your last post, the main thing I think to focus on is the causal chain. Once we do, to me, the philosophical issue goes away. Not that we don't still have an enormous amount of learning to do on the details.
I’d be a bit careful with painting all humans with one brush and LLMs with another to say “they” have an alternate grounding than us. I’ve been looking for ways to humanise philosophical questions like grounding and meaning and I found these videos by a person who has been blind from birth talking about his experience of colour — what would your view of grounding make of a person like him using the words “my favourite colour used to be blue” I wonder?
https://youtu.be/nwgkF_HOh-I?si=CTV18fXpdUZkN-6t
https://youtu.be/59YN8_lg6-U?si=7KZuPI7s4zoiTuNj
Thanks! His comments about color are fascinating. It's interesting how much he knows about colors without having ever directly experienced them. It's a reminder of how much of our own concept of color is bound up in its associations.
Certainly I was referring to a typical human with more or less complete sensorimotor capabilities. I would say that his concept of color is still grounded in the sensorium he does have, in what he's been told about it throughout his life, and how he relates it to the sounds, feels, smells, and tastes he does have access to. So his grounding for some things is different from a sighted person's, but I would say it clusters much closer to the sighted person's than the LLM's.
Isn’t it? There are also videos of him talking about how he “sees” himself or what he finds “attractive” in other people (spoiler: it’s not looks) … and it’s all very fascinating to hear.
But anyways, the next thing I’m wondering about is which LLM you mean. Suzi relegated mentions of multimodal ones to the footnotes, but where would you cluster something like RT-2?
https://deepmind.google/discover/blog/rt-2-new-model-translates-vision-and-language-into-action/
Interesting, and somewhat long the lines of the scenario I mentioned above about putting an LLM into a robot. Although skimming through their discussion, it does seem to have a couple of big differences from how an animal or us understands things.
One is that its movement commands seem to be through a text interface. So the tight coupling between sensory and motor processing isn't the same.
The other is it appears to be a pretrained model (which is typical for most of these models). So it doesn't appear to be a robot forming concepts from its sensory data of the environment, but taking actions depending on a generalized model built on training data.
Overall, it looks like an interesting project. But I'm not sure it's on track to process information the way we do.
Is it on track to (or beyond?) the way Tommy Edison understands…
“things”?
I wouldn't say so.
Of course, like all AI, it's superhuman in certain ways. No human can take in the amount of training data it's built on. But even simple animals still seem to have a stronger ability to take meaning from their input. I don't doubt that will change, but I do doubt the current models will get us there.
I always appreciate your comments, Mike. When the credits start rolling on Suzi’s essay, I always look for your insightful reaction.
Thanks Sunny! You're too kind.
I came here to say this. I think at some point, you have to compare Tesla++ combined with LLM++ and wonder what exactly it is that humans can supposedly do that current-AI cannot. If AI can see and touch and feel and talk to other AIs and humans, I would say that they understand meaning as well as we can.
In principle I agree. In practice, I think there's still a lot of work to be done to get there.
One thing that seems clear, AI won't be free from the kinds of biases and other issues that plague our minds.
Wise words! Also, one could say that brains have indirect grounding.
Good point. I guess all we can really talk of is in terms of grounding that is more direct and less direct. It seems like there's always a causal chain with many steps involved.
Yes, I agree. A spectrum of directness. And there are important differences in plasticity too, as many have pointed out. But it's important to remember that a preserved context window is a real form of plasticity even if weights aren't shifted.
I don’t know exactly what I should be saying here in a political sense, and given that these modern LLMs seem to be doing my work for me. (I also mean this literally since I’ve now included three wonderful AI podcasts for my first three posts.) Thus hammering home my points could paint me as an insensitive jerk. Daniel Dennett has been considered nothing less than a naturalistic god. But now such technology seems to be redefining him as a person who accidentally backed something spooky. Once the dust settles, and perhaps this will even require the empirical validation of the brain physics which constitutes value, then that’s when the true work should need to be done to build up the still primitive science of psychology.
Except that if the laws of physics as we know them are obeyed, it doesn't matter how crucial or trivial electromagnetic fields are to consciousness. It's still a computational process and Dennett would be right!
Stochastic parrots is what they are and is all they will ever be in their current forms.
Representations latch onto reality by imitating it.
A friendly encouragement: repeat after me: "I will not stochastically parrot the stocastic parrot meme!". If you wanna know why:
https://open.substack.com/pub/markslight/p/biologically-assisted-large-language?r=3zjzn6&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
I do not have a philosophy degree or have taken a single course in it (though I believe everyone is a philosopher in their own way) or a background in cognitive science, so I might not be using the right terms here. However, I wanted to share some thoughts that resonate with ideas I've encountered, like Thomas Nagel's essay "What Is It Like to Be a Bat?".
Do we not think that what it means to be a dog is fundamentally different for humans and dogs? Can we ever truly understand a dog without being a dog ourselves? Nagel's argument seems relevant here: even if we could study every detail of a dog's behavior, brain, and biology, we would still lack access to its subjective experience—the "dog-ness" of being a dog. Our understanding of dogs is always an external, human-centered representation, not the dog's own intrinsic experience of itself.
If we follow this line of thought, any understanding of the "dog" we humans have is just our representation of what a dog is. Our sensory experiences, cultural frameworks, and language shape it. But this representation differs from the dog's version of "dog-ness." Similarly, if AI systems develop their understanding of "dog," it will be their version of "dog"—a computational abstraction based on patterns in data rather than direct experience. This AI version would differ from human representation and the dog's experience.
In this sense, only the dog has the "right" or most authentic version of what it means to be a dog, as it is directly tied to its lived experience. Our human and AI versions are both external and, to some extent, incomplete. That said, I'm not discounting the idea that humans may have a more detailed or nuanced representation of dogs than AI, given our capacity for sensory interaction and cultural understanding. Still, even that representation is removed from the dog's reality.
Interestingly, other animals also have their versions of dogs. For example, a cat's version of a dog might be a competitor or potential threat, while a bird might interpret a dog as a large, moving object to avoid. These representations are shaped by each animal's sensory and ecological perspective, what Jakob von Uexküll called their Umwelt—their unique perceptual world.
Ultimately, all external representations—human, AI, or other animals—are referential, shaped by the observer's perspective. They are interpretations of "dog-ness," not the direct, intrinsic experience of being a dog. This realization makes me wonder if any being can fully understand another's perspective or if we are all limited in living and representing the world.
I really get that intuition — there’s something deeply personal about experience, and it’s hard to imagine truly knowing what it’s like to be another creature.
But an alternative view might suggest that what it’s like to be a dog isn’t something sealed off inside the dog, accessible only to the dog itself. Instead, what it is like to be a dog is something we can glimpse by looking at how the dog moves through the world — what it notices, what it avoids, what it seeks out.
I'm sorry. I don't think I was very clear, so let me try again.
Let me explain where I am coming from: your post reminded me of a conversation that Richard Feynman and his father, which Feynman has paraphrased here:
Feynman articulates the difference between knowing the name of something and understanding it.
"See that bird? It’s a brown-throated thrush, but in Germany it’s called a halzenfugel, and in Chinese they call it a chung ling and even if you know all those names for it, you still know nothing about the bird. You only know something about people; what they call the bird. Now that thrush sings, and teaches its young to fly, and flies so many miles away during the summer across the country, and nobody knows how it finds its way."
I am trying to say that AI knows more than just the names of things when it comes to a dog.
Now, what I’m trying to say is that there’s a broad spectrum of understanding of what a dog is. On one extreme, you have AI, which has a minimal and abstract understanding based on patterns in data. Conversely, you have the dog with the most authentic knowledge because it directly experiences what it means to be a dog. A dog doesn’t just know facts about itself—it knows what it feels like to be a dog, from the inside out.
Humans fall somewhere in between. We can observe and interpret dogs, but our experiences shape our understanding. For example, a human who has never had a dog will likely have a more detached understanding, based on external observations or cultural ideas. In contrast, a human who has lived with a dog will have a deeper, more experiential learning, shaped by emotional connection and shared life with the dog.
Other animals also have their interpretations of dogs, shaped by their interactions and sensory capabilities. For example, a cat might see a dog as a competitor, while a bird might see a dog as a potential threat. These interpretations differ from both human and dog perspectives.
AI exists at one extreme of the spectrum. It knows specific characteristics of dogs (like "dogs bark" or "dogs have four legs") but lacks direct experience and emotional connection. Its understanding is a tiny part of the spectrum compared to humans or dogs, but it still exists.
At the other extreme, the dog itself knows what it feels like to be a dog, from wagging its tail to smelling the world in ways no human or AI can truly understand. Just as I can never fully know what it feels like to be you, no one can ever fully understand what it feels like to be a dog except the dog itself. Understanding is not static—it evolves through interaction. A human with a dog learns more about it over time by observing its behaviors, emotional responses, and unique personality. Similarly, a cat that grows up with a dog develops a different understanding of dogs than one that only encounters them as predators. Even within a species, experience shapes interpretation. AI, however, lacks this dynamic quality—it processes static data and doesn’t adapt its understanding through real-world interaction (at least in its current form). This limits its ability to learn from the richness of lived experience.
Beyond individual perspectives, there is also collective understanding. Humans share knowledge about dogs through language, culture, and science. This collective knowledge allows even those who have never lived with a dog to learn about them through books, documentaries, or conversations. AI’s understanding, too, is shaped by collective human knowledge, as it is trained on massive datasets representing human language and thought. However, this collective knowledge is filtered and incomplete—it lacks the richness of direct interaction or sensory experience.
Ultimately, the spectrum of understanding—from AI to humans to the dog itself—reminds us that all perspectives are limited and relative. Each has its value, but none can fully encompass the truth of being a dog.
Ah! I see where you're coming from. Thanks for taking the time to clear that up. There’s a lot here that resonates — and some interesting philosophical moves too.
Starting with the factual parts:
Yes, current AI systems — especially large language models — do go beyond just knowing the names of things. They capture statistical patterns about dogs: how we talk about them, what behaviours they exhibit, the roles they play in human life. That’s more than labelling.
And, yes! Humans develop richer, more intuitive models of dogs through direct interaction. There's research showing that people who live with dogs are better at interpreting their behaviour than those who don’t — so experience really does shape understanding. The idea that other animals have species-specific interpretations of dogs also holds up: a cat’s experience of a dog is shaped by very different sensory systems and survival strategies than ours.
You're spot on -- most current AI systems don’t adapt their understanding through lived, real-time interaction. They process static data, and while some can work with images or audio, they don’t (yet) update continuously from experience like a brain does. That’s (I think) a real limitation, and it’s one reason AI systems still fall far short of what animals or humans do naturally.
Where your comment moves into more philosophical terrain is the idea that the dog has the “most authentic” understanding of what it means to be a dog. That’s a powerful intuition. It echoes thinkers like Thomas Nagel, who argued that the subjective feel of an experience can’t be fully captured from the outside.
But it’s not a scientific claim, it's a philosophical one. And some philosophers argue that the intuition might be wrong. For example, philosophers like Dennett argue that what matters is how well a system can predict, explain, or respond to the world. In that view, understanding isn’t about a private inner feeling; it’s about functional capacity. So a system that can't “feel” like a dog, but can navigate the world as effectively as one, might still count as having robust “dog knowledge.”
So I think you make a great point -- all perspectives are partial. But this might include the dog too. The dog, the human, the AI — each has access to different aspects of “dog-ness,” shaped by different capacities and experiences. The question then becomes whether we think that one perspective counts as more authentic than another. That answer will depend on the philosophical lens we choose to look through.
Thank you. I have not read any of Dennett’s books yet, but I have read a little about functionalism. If we were to understand consciousness fully, do you think our opinions on intelligence would likely change in ways that would make functionalism seem incomplete or even possibly outdated? Also, does functionalism dominate today because it works with what we can study, even if it doesn’t address everything we want to understand?
I think if we see functionalism as a strategy: a way to explain mind and intelligence in terms of what systems do, how they behave, how they react, how they learn, then it is very useful. As for why it dominates — yes, I think it is because it works. It’s helped us make progress without getting trapped in unanswerable metaphysical riddles.
But there is a move happening in neuroscience, atm, that says, functionalism is not wrong — it’s just incomplete. We don’t have to throw functionalism out. But what we might need to do is change what we mean by function.
This is a masterful essay within a masterful series, within a masterful body of work. I’ve never seen anything quite like this.
If a neuroscientist, philosopher and AI researcher decided to explain the inexplicable by taking advantage of a curiosity exploit, the result could be no better than When Life Gives You a Brain.
What a lovely thing to say! Thank you so much, Sunny.
I love this series, and the niche you occupy between philosophy and AI. Thank you! Keep the content coming
Great work. Very thought provoking.
Thank you!
There is another option we might call "functional grounding." It was explored by Wittgenstein, but I don't think it's covered in your discussion . Functional grounding includes elements of sensorimotor grounding, relational grounding, and communicative grounding; but instead of simple sensorimotor grounding, which has difficulty with abstractions like justice or exclamations like "Aha!", it relies on what you could call "way-of-life" grounding, where the expression is associated with our activities or behaviours.
In footnote 13, you associate Wittgenstein's "meaning is use" with accounts that "place communicative grounding at centre stage." But for communicative grounding as you describe it, "The word 'dog' works because we all agree what it points to," and this lets Wittgenstein out. The concept of "pointing" may have been important to his early position in the <i>Tractatus</i>, but in <i>Philosophical Investigations</i> he repudiates it (by most accounts), precisely because of the difficulties its positivism presents for abstractions, interjections, opinions, moral and aesthetic discourse, and other language games not obviously connected with the empirical verifications of sensorimotor activity.
All language games, however, are connected with ways of life or patterns of behaviour. Their webs of words are linked to webs of behaviour, so there is an aspect of relational grounding. Their role in a shared way of life involves a shared or communal understanding, so there is an aspect of communicative grounding. But they stop short of full referential grounding, for that would imply, for example, the reification of "dog" because of some particular way of life in which the utterance "dog" plays a role. This is a stronger ontological claim than the language-game account needs to make -- and if we go there anyway, we are left wondering what we would say then about the ontological status of "justice" in some language-game where that utterance plays a role. Wittgenstein's functional grounding allows us to remain ontologically agnostic across language-games (a position that some may find dismaying).
Hi Jim!
This is a great thoughtful push. Thank you.
Yes, I can see that. Wittgenstein’s later work resists any simple pointing theory of meaning. I included him under the umbrella of communicative grounding (in the 'meaning is use' sense), but you are right, this may have underplayed how radical his shift was — especially in Philosophical Investigations.
And, yes, good point, I should have mentioned functional grounding. But I am planning on addressing something along those lines in a future essay. But just quickly, yes, I agree, I think this sort of view is especially helpful when it comes to abstractions, emotions, or interjections like 'Aha!' (or even 'Justice!'), where there's no neat pointing relationship to be had. Instead, these utterances do something — they function within our shared activities, reactions, and expectations.
Thanks again Jim -- really appreciate your insight here.
It's hard to cover all the bases in this format while remaining readable, informative, and entertaining -- which you do most admirably. I look forward to that future essay!
Something related and really weird is that we could train an LLM on an undeciphered writing system and it would work. If we had a large enough training corpus, it would be able to do as well as they can for English.
Although I wonder if we can't fairly see words as at least partially grounded in other words. Words are part of reality, and refer to other words/concepts much as they refer to physical things. Some words are even just about words, like the word "word". Also, if we learn that a particular word is a noun, verb, adjective etc, we have learned something about its meaning just by learning how it relates to other words, even though we don't yet know its full meaning.
I think we could also see LLMs as having a kind of "sense" in the form of the words we use to prompt them. And while they don't have muscles, they can write out a response. If we imagine a human in a come who could only receive text messages into their brains and type out responses, I think we could fairly consider this to be their senses and their acting upon the world. And then there are multimodal LLMs, and they can be hooked up to control computers or even robots.
I think meanings must ultimately be grounded by causal connection, similar to what Mike said. It's a point Whitehead makes in 'Process & Reality', distinguishing between "perception in the mode of presentational immediacy" (our representations that work as a kind of projected image of the world) and "perception in the mode of causal efficacy" (the world as it immediately affects us). All of our representations ultimately derive from these causal influences, although we can be mistaken in how we conceptually link the two modes. We know the world because we are part of it. We do not need to merely make correlations between perceptions, because we do directly feel causality -- causality is the root of sensation.
I think of those pin art toys, where you press your hand or face on one side and it moves the needles so that a 3D image of it is formed on the other. Is that a representation or just a presentation? I would lean towards the latter.
Hi Joseph! Great comment — there’s so much here.
Yes — I think that’s such a great point about words referring to other words. And like you said, in some cases, they don’t even need to point outward at all. Some words refer to themselves — like “word” or “noun.” Even whole sentences can be their own content: “This sentence is false” is the classic one. It’s a sentence that points to itself — a kind of self-contained representation.
And I really like the point you made about LLMs and undeciphered scripts. In a way, that’s what they’re doing already. Words get transformed into vectors, and those vectors are manipulated based on patterns of use. The model just needs to "know" how a word/phase "behaves" in context.
And yes, I agree with you and Mike — causal connection does seem crucial.
Language models, including multimodal ones, are static mathematical functions. That should be enough to settle any question regarding their capacity to learn, know or experience anything.
Being active seems important to me too. A system that can reshape itself through interaction feels very different from one that doesn't.
Agree! But as I see it, given a sufficiently large context window, the appropriate transformer architecture, and enough compute and memory, we have a reshaping system. From a functionalist perspective, there's no difference between updating the parameters and updating the activation patterns by means of updating the context window.
Interesting claim — I'll play devil's advocate.
From the outside, both parameter updates and context updates can change a model’s behaviour. But does functionalism really treat those as equivalent?
A weight update alters the model’s standing dispositions — it reshapes what the system can do across contexts. A context window, by contrast, only tweaks the current activation state. Once the prompt is gone, so is the change.
Only one of these methods makes the model able to do something new, reliably, across conditions. The other relies on keeping the context window open. Is that more like scaffolding than an integrated change?
So the question is: Is that just a structural difference? Or is it a functional one too? Is it the session that has functional equivalence?
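To make that contrast concrete, here's a toy sketch. The names and the lookup-table "model" are entirely made up for illustration; a real LLM is nothing this simple, but the persistence point carries over:

```python
# Toy contrast: a context update vanishes with the prompt,
# while a weight update persists across future calls.

def generate(weights: dict, prompt: str) -> str:
    """Stand-in for a frozen model: answer from the prompt if it contains
    the fact, otherwise fall back to whatever is stored in the weights."""
    if "Zorblat-7" in prompt:                         # fact supplied in context
        return "Zorblat-7"
    return weights.get("tallest_on_kepler", "I don't know")

base_weights = {}                                     # 'pretrained'; the fact isn't stored
fact = "Zorblat-7 is the tallest mountain on Kepler-22b. "
question = "Q: What is the tallest mountain on Kepler-22b?"

print(generate(base_weights, fact + question))        # Zorblat-7    (in-context only)
print(generate(base_weights, question))               # I don't know (the change left with the prompt)

tuned_weights = {**base_weights, "tallest_on_kepler": "Zorblat-7"}   # 'fine-tuned'
print(generate(tuned_weights, question))              # Zorblat-7, in every future session
```

Only the weight update survives once the prompt no longer carries the fact, which is what I mean by a standing disposition.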
Oh, please do!
I was probably not clear enough. Yes, I'm talking about a session! But a session that may be running for a lifetime. I think this is the most natural and least confusing way to think and talk about AI agents, but that may be only me!
Compare it to identical twins. For the sake of the argument, let's say that they are completely identical at birth. If we want to say that all instances of an LLM are one and the same system, then the equivalent is to claim that the twins are the same system, just two "sessions" (for many years, hopefully). But I view them as two different individuals even if they start with the same architecture and context window.
Once we have agents that are reshaping themselves significantly over years, via the context window or weight updates (mostly the latter, probably), then I think we'll gradually move away from saying that they are the same AI.
Does that make more sense?
Suzi, your brain has never seen a dog (no light penetrates your skull). Yet you can vividly recall that experience when no dog is present. WE are the “eyes” of the LLM. Our experiences are laid down in their weights in a similar manner to the way empirical sense data are laid down in your brain. Furthermore, we can give them literal “eyes” (cameras), and we have done so to a limited degree.
You’re right! No light reaches our brain. Yet we say things like “I can visualise a dog.”
LLM weights do store human-generated patterns, but unlike a brain, they’re mostly frozen after training. This misses the sensorimotor loop.
Multimodal models with cameras are a fascinating case. But I still think a robot dog is a long way from a real dog.
There are virtually no people who understand both philosophy at her level and LLMs in depth. For example, she provides a naive footnote about how some models are trained on only text, while others include video. Generative Pre-trained Transformers are based on tokens that encode semantic meaning (word2vec). LLMs can’t directly ingest video. The video must first be broken down into images, and the images converted into text by a different kind of machine learning process (typically a convolutional neural network). The text *describing* the video then becomes part of the training data. LLMs can’t “see” images or “watch” videos during training. Adding narratives about videos enriches the training data, but it is not different in kind.
I didn’t read Suzi’s footnote as you apparently did. It seems perfectly accurate.
As for your comment:
Aren’t base GPT models (GPT-1/2/3) trained exclusively on text corpora?
Don’t specialized models (e.g., Video-LLaMA, LaViLa) use video-derived data that converts videos into visual tokens (via encoders like CLIP or Vision Transformers) paired with text narrations?
Don’t GPTs use subword tokenization (e.g., Byte-Pair Encoding) and transformer-based embeddings learned during pretraining, rather than Word2vec, which is an older, static embedding method not used in modern LLMs?
Aren’t videos split into frames, with images then encoded into numerical embeddings (not descriptive text) using vision models like ViT or CLIP?
Thanks! But I think some of these claims are a bit outdated.
GPT-style models use sub-word tokens (BPE, SentencePiece). These tokens are then mapped to high-dimensional embeddings learned end-to-end. They are not Word2Vec vectors, and the model does not assume each token has a fixed semantic meaning.
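If it's useful to see what that looks like in practice, here's a quick snippet using the open-source tiktoken library (the encoding name is just one common choice; the exact split varies by encoding):

```python
# Sub-word (BPE) tokenization: words are broken into reusable fragments,
# and each fragment is just an integer ID with no fixed meaning attached.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")     # a BPE encoding used by recent OpenAI models

ids = enc.encode("ungroundedness")
print(ids)                                     # a short list of integer token IDs
print([enc.decode([i]) for i in ids])          # the sub-word pieces the word was split into

# Each ID is then mapped to a high-dimensional embedding learned end-to-end
# during pretraining, not looked up in a fixed Word2Vec-style table.
```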
It is true that earlier research pipelines did convert images to captions. But that's not the case anymore. Newer models (e.g., Flamingo, GPT-4o, Gemini, Perceiver-AR, VILA) pass visual feature embeddings (from CNNs or Vision Transformers) straight into a shared transformer. No intermediate textual captions are required; the visual tokens are treated as another modality. For video, models often sample frames, extract frame-level embeddings, and feed them directly.
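And as a very rough sketch of the "visual tokens as another modality" idea (my own minimal PyTorch toy with invented sizes, not the architecture of Flamingo, GPT-4o, or any other named model):

```python
# Minimal toy of a multimodal input sequence: frame/patch embeddings from a
# vision encoder are projected to the text model's width and concatenated
# with text-token embeddings, then processed by one shared transformer.
import torch
import torch.nn as nn

d_model = 512
text_embed = nn.Embedding(32000, d_model)             # toy text vocabulary
vision_proj = nn.Linear(768, d_model)                  # project ViT-style 768-d features to d_model
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)

text_ids = torch.randint(0, 32000, (1, 10))            # 10 text tokens
frame_feats = torch.randn(1, 16, 768)                  # 16 visual tokens (e.g. sampled frames/patches)

tokens = torch.cat([vision_proj(frame_feats), text_embed(text_ids)], dim=1)
out = encoder(tokens)                                   # one sequence, two modalities, no captions in between
print(out.shape)                                        # torch.Size([1, 26, 512])
```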
The claim that LLMs can’t "see" images or "watch" videos during training depends on how we define "seeing". One might argue that multimodal LLMs are trained jointly on image–text pairs, and sometimes video–text pairs. They “see” embeddings that preserve spatial/temporal structure, even if they don’t output pixels. As you said in your other comment -- one could (and many philosophers of perception do) argue that neither LLMs nor brains "see" raw images.
On whether adding narratives about videos merely enriches the data: I can see this. But I can also see someone arguing that it is a difference “in kind” relative to text-only corpora.
It seems to me that this is a problem only if we assume the existence of an independent, external reality in which we need to ground our symbols. If we were to live entirely in a world of symbols there would be no grounding problem at all, right? Now why do we think we don’t live in a world of pure symbols, and why are we instead led to think that symbols have a meaning, i.e. that at least in some cases they point to something that is not a symbol?
Yes! The grounding problem feels like more of a problem if we assume there’s something outside the system of internal representations that matters — something those representations need to “hook onto.” If we lived entirely in a world of internal patterns and coherence, there’d be no concern about grounding at all.
But I think many people suspect we don’t live in that kind of world. One reason to think that is because our representations can fail. We might say, “There’s a dog in the backyard,” and sometimes — there isn’t. We were mistaken. That kind of mismatch, for many, suggests there’s a gap between our internal models and the world itself. And it’s that gap (or perceived gap) that makes things like truth, error, and knowledge such thorny philosophical topics.
Strictly speaking, the moment when we realize the dog isn’t there is a clash between symbols: a stored memory of having thought that the dog was there and a current perception of the dog not being there. That is, if we are willing to call perception “a symbol”. But then if everything is a symbol, nothing is: our notion of a symbol rests on the idea that symbols point to something else. Still, it seems we can’t talk meaningfully about that something else in any other way than by using symbols to point at it. At the opposite extreme, symbols are already something else (they are already pieces of reality), since they can be pointed at by other symbols. I feel stuck.
Welcome to the swamp! You’ve captured the dilemma perfectly.
This week's essay will finish off laying out the puzzle. But after that, I’ll start exploring the different paths thinkers have taken to try to solve it — or at least, make it seem like not so much of a puzzle.
You’ve chosen to push abstract grounding to a future post right away, so maybe the discussion over there will be quite different. ☺️
IMO we should have started with abstract, say boolean, concepts, like the “bigger than a breadbox” stuff, and then moved on to other concepts like causality.
“Dog” is definitely starting from the deep end.
But back on today’s topic: I’m in the camp of perceived reality, which is what Donald Hoffman presents. With the example of “dog”, one silly trap for humans is hyenas. Many people would use common sense to call them “dogs”, while they are actually Feliformia and much closer to cats. We just use our lopsided heuristics based on shape, behavior, etc. to make a quick judgement.
The scientific truth-seeking path is definitely needed as we catalogue our universe and see more details. Eventually new paradigms emerge, and concepts shatter.
One such concept that you’ve tirelessly covered already is “self”. Common sense on that one is about to get absolutely smashed in the next few decades. 🤣
That’s interesting — I’ve always thought of sensation and perception as the shallow end!
So “dog” felt like a less complex place to start — not abstract at all. But, you're right, dog is tricky. I guess from the neuroscience perspective, perceptual concepts are often seen as more concrete, and therefore simpler, than things like “justice” or “causality,” which feel much more complex.
And I appreciate the Donald Hoffman reference. I think he’s absolutely right to challenge the idea that perception is a transparent window to reality.
Totally agree on the “self” 😄
I think this is an interesting discussion and a well written overview of the topic. However, I think it continues to be misplaced to treat LLMs as anything more than a part of language itself; they are just a new tool mankind is using to communicate. It's not that I believe AI is impossible, but more that grounding is where we would need to start, not something that merely needs to be added on to our electronic dictionaries. No one asks 'Can our calculator know what taxes are?' or 'Can the latest weather model tell me what it's like to be wet?' That is not what these things are for, and in fact it is absurd. I am solidly in the school of sensorimotor grounding. Early man developed understanding and concepts of the world before language. I would be much more interested to debate the consciousness of my cat, as at least with her there is a possibility for an output I did not have to program.
I'm with you. I think sensorimotor is important for what we mean by consciousness.
There is also a separateness and independence that defines the boundaries of each consciousness.
I often refer to LLMs as plagiarism machines; they possess no capacities beyond what was given by their programmers, questioners, and the authors of the training materials. If consciousness is ever detected, it is stolen from these sources, merely obscured by the amalgamation from hundreds of sources.
Conversely, you will have almost universal agreement that a newborn baby, before they have learned anything, has a unique consciousness. A few might say that the infant is merely a sum of their inherited DNA, but those types will have an easier time arguing against the existence of consciousness than extending it to the LLMs.
Buckminster Fuller called us localized problem solvers due to our unique capacity for distilling experience into useful serendipity. LLMs will surely help us with the distillation part, but the serendipity is all us.
This is brilliant.
Thank you so much!