91 Comments
Mike Smith:

Very interesting Suzi!

It seems like we could say that LLMs have an alternate grounding to the one we do. Their grounding is actually in us, in our social dynamics. So I guess in a way that does mean we could say they have indirect grounding in the world. Although, like old tape recordings copied and passed around, it seems something gets lost in each layer of indirection.

I'm not clear why sensorimotor processing isn't part of the answer for more direct grounding. In us, referential grounding seems like a result of sensorimotor grounding. If we put an LLM in a robot and made it an agent in the world (which I realize is trivial to say but very non-trivial to implement) then through its sensorimotor mechanisms, it would gain more direct grounding. It might even make sense to say it has a world model. This is why I find self driving cars and other autonomous robots much more interesting than LLMs for steps toward intelligence.

But as I noted on your last post, the main thing I think to focus on is the causal chain. Once we do, to me, the philosophical issue goes away. Not that we don't still have an enormous amount of learning to do on the details.

Alice Wanderland:

I’d be a bit careful with painting all humans with one brush and LLMs with another to say “they” have an alternate grounding than us. I’ve been looking for ways to humanise philosophical questions like grounding and meaning and I found these videos by a person who has been blind from birth talking about his experience of colour — what would your view of grounding make of a person like him using the words “my favourite colour used to be blue” I wonder?

https://youtu.be/nwgkF_HOh-I?si=CTV18fXpdUZkN-6t

https://youtu.be/59YN8_lg6-U?si=7KZuPI7s4zoiTuNj

Mike Smith:

Thanks! His comments about color are fascinating. It's interesting how much he knows about colors without having ever directly experienced them. It's a reminder of how much of our own concept of color is bound up in its associations.

Certainly I was referring to a typical human with more or less complete sensorimotor capabilities. I would say that his concept of color is still grounded in the sensorium he does have, in what he's been told about it throughout his life, and how he relates it to the sounds, feels, smells, and tastes he does have access to. So his grounding for some things is different from a sighted person's, but I would say it clusters much closer to the sighted person's than the LLM's.

Alice Wanderland:

Isn’t it? There are also videos of him talking about how he “sees” himself or what he finds “attractive” in other people (spoiler: it’s not looks) … and it’s all very fascinating to hear.

But anyways, the next thing I’m wondering about is which LLM you mean. Suzi relegated mentions of multimodal ones to the footnotes, but where would you cluster something like RT-2?

https://deepmind.google/discover/blog/rt-2-new-model-translates-vision-and-language-into-action/

Mike Smith:

Interesting, and somewhat along the lines of the scenario I mentioned above about putting an LLM into a robot. Although, skimming through their discussion, it does seem to have a couple of big differences from how an animal or a human understands things.

One is that its movement commands seem to be through a text interface. So the tight coupling between sensory and motor processing isn't the same.

The other is it appears to be a pretrained model (which is typical for most of these models). So it doesn't appear to be a robot forming concepts from its sensory data of the environment, but taking actions depending on a generalized model built on training data.

Overall, it looks like an interesting project. But I'm not sure it's on track to process information the way we do.

Alice Wanderland:

Is it on track to (or beyond?) the way Tommy Edison understands…

“things”?

Mike Smith:

I wouldn't say so.

Of course, like all AI, it's superhuman in certain ways. No human can take in the amount of training data it's built on. But even simple animals still seem to have a stronger ability to take meaning from their input. I don't doubt that will change, but I do doubt the current models will get us there.

Sunny:

I always appreciate your comments, Mike. When the credits start rolling on Suzi’s essay, I always look for your insightful reaction.

Mike Smith:

Thanks Sunny! You're too kind.

Ragged Clown:

I came here to say this. I think at some point, you have to compare Tesla++ combined with LLM++ and wonder what exactly it is that humans can supposedly do that current AI cannot. If AI can see and touch and feel and talk to other AIs and humans, I would say that they understand meaning as well as we can.

Mike Smith:

In principle I agree. In practice, I think there's still a lot of work to be done to get there.

One thing that seems clear: AI won't be free from the kinds of biases and other issues that plague our minds.

Mark Slight:

Wise words! Also, one could say that brains have indirect grounding.

Mike Smith:

Good point. I guess all we can really talk about is grounding that is more direct or less direct. It seems like there's always a causal chain with many steps involved.

Mark Slight:

Yes, I agree. A spectrum of directness. And there are important differences in plasticity too, as many have pointed out. But it's important to remember that a preserved context window is a real form of plasticity even if weights aren't shifted.

Eric Borg:

I don't know exactly what I should be saying here in a political sense, given that these modern LLMs seem to be doing my work for me. (I also mean this literally, since I've now included three wonderful AI podcasts in my first three posts.) Thus hammering home my points could paint me as an insensitive jerk. Daniel Dennett has been considered nothing less than a naturalistic god, but now such technology seems to be redefining him as a person who accidentally backed something spooky. Once the dust settles, and perhaps this will even require the empirical validation of the brain physics which constitutes value, that's when the true work will need to be done to build up the still primitive science of psychology.

Mark Slight:

Except that if the laws of physics as we know them are obeyed, it doesn't matter how crucial or trivial electromagnetic fields are to consciousness. It's still a computational process and Dennett would be right!

James Cross:

Stochastic parrots is what they are and is all they will ever be in their current forms.

Representations latch onto reality by imitating it.

Mark Slight:

A friendly encouragement: repeat after me: "I will not stochastically parrot the stochastic parrot meme!" If you wanna know why:

https://open.substack.com/pub/markslight/p/biologically-assisted-large-language?r=3zjzn6&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false

Marginal Gains:

I do not have a philosophy degree, nor have I taken a single course in it (though I believe everyone is a philosopher in their own way), and I have no background in cognitive science, so I might not be using the right terms here. However, I wanted to share some thoughts that resonate with ideas I've encountered, like Thomas Nagel's essay "What Is It Like to Be a Bat?".

Do we not think that what it means to be a dog is fundamentally different for humans and dogs? Can we ever truly understand a dog without being a dog ourselves? Nagel's argument seems relevant here: even if we could study every detail of a dog's behavior, brain, and biology, we would still lack access to its subjective experience—the "dog-ness" of being a dog. Our understanding of dogs is always an external, human-centered representation, not the dog's own intrinsic experience of itself.

If we follow this line of thought, any understanding of the "dog" we humans have is just our representation of what a dog is. Our sensory experiences, cultural frameworks, and language shape it. But this representation differs from the dog's version of "dog-ness." Similarly, if AI systems develop their understanding of "dog," it will be their version of "dog"—a computational abstraction based on patterns in data rather than direct experience. This AI version would differ from human representation and the dog's experience.

In this sense, only the dog has the "right" or most authentic version of what it means to be a dog, as it is directly tied to its lived experience. Our human and AI versions are both external and, to some extent, incomplete. That said, I'm not discounting the idea that humans may have a more detailed or nuanced representation of dogs than AI, given our capacity for sensory interaction and cultural understanding. Still, even that representation is removed from the dog's reality.

Interestingly, other animals also have their versions of dogs. For example, a cat's version of a dog might be a competitor or potential threat, while a bird might interpret a dog as a large, moving object to avoid. These representations are shaped by each animal's sensory and ecological perspective, what Jakob von Uexküll called their Umwelt—their unique perceptual world.

Ultimately, all external representations—human, AI, or other animals—are referential, shaped by the observer's perspective. They are interpretations of "dog-ness," not the direct, intrinsic experience of being a dog. This realization makes me wonder if any being can fully understand another's perspective or if we are all limited in living and representing the world.

Suzi Travis:

I really get that intuition — there’s something deeply personal about experience, and it’s hard to imagine truly knowing what it’s like to be another creature.

But an alternative view might suggest that what it’s like to be a dog isn’t something sealed off inside the dog, accessible only to the dog itself. Instead, what it is like to be a dog is something we can glimpse by looking at how the dog moves through the world — what it notices, what it avoids, what it seeks out.

Marginal Gains:

I'm sorry. I don't think I was very clear, so let me try again.

Let me explain where I am coming from: your post reminded me of a conversation between Richard Feynman and his father, which Feynman has paraphrased here:

Feynman articulates the difference between knowing the name of something and understanding it.

"See that bird? It’s a brown-throated thrush, but in Germany it’s called a halzenfugel, and in Chinese they call it a chung ling and even if you know all those names for it, you still know nothing about the bird. You only know something about people; what they call the bird. Now that thrush sings, and teaches its young to fly, and flies so many miles away during the summer across the country, and nobody knows how it finds its way."

I am trying to say that AI knows more than just the names of things when it comes to a dog.

Now, what I'm trying to say is that there's a broad spectrum of understanding of what a dog is. At one extreme, you have AI, which has a minimal and abstract understanding based on patterns in data. At the other extreme, you have the dog, which has the most authentic knowledge because it directly experiences what it means to be a dog. A dog doesn't just know facts about itself—it knows what it feels like to be a dog, from the inside out.

Humans fall somewhere in between. We can observe and interpret dogs, but our experiences shape our understanding. For example, a human who has never had a dog will likely have a more detached understanding, based on external observations or cultural ideas. In contrast, a human who has lived with a dog will have a deeper, more experiential understanding, shaped by emotional connection and a shared life with the dog.

Other animals also have their interpretations of dogs, shaped by their interactions and sensory capabilities. For example, a cat might see a dog as a competitor, while a bird might see a dog as a potential threat. These interpretations differ from both human and dog perspectives.

AI exists at one extreme of the spectrum. It knows specific characteristics of dogs (like "dogs bark" or "dogs have four legs") but lacks direct experience and emotional connection. Its understanding is a tiny part of the spectrum compared to humans or dogs, but it still exists.

At the other extreme, the dog itself knows what it feels like to be a dog, from wagging its tail to smelling the world in ways no human or AI can truly understand. Just as I can never fully know what it feels like to be you, no one can ever fully understand what it feels like to be a dog except the dog itself.

Understanding is not static—it evolves through interaction. A human with a dog learns more about it over time by observing its behaviors, emotional responses, and unique personality. Similarly, a cat that grows up with a dog develops a different understanding of dogs than one that only encounters them as predators. Even within a species, experience shapes interpretation. AI, however, lacks this dynamic quality—it processes static data and doesn't adapt its understanding through real-world interaction (at least in its current form). This limits its ability to learn from the richness of lived experience.

Beyond individual perspectives, there is also collective understanding. Humans share knowledge about dogs through language, culture, and science. This collective knowledge allows even those who have never lived with a dog to learn about them through books, documentaries, or conversations. AI’s understanding, too, is shaped by collective human knowledge, as it is trained on massive datasets representing human language and thought. However, this collective knowledge is filtered and incomplete—it lacks the richness of direct interaction or sensory experience.

Ultimately, the spectrum of understanding—from AI to humans to the dog itself—reminds us that all perspectives are limited and relative. Each has its value, but none can fully encompass the truth of being a dog.

Suzi Travis:

Ah! I see where you're coming from. Thanks for taking the time to clear that up. There’s a lot here that resonates — and some interesting philosophical moves too.

Starting with the factual parts:

Yes, current AI systems — especially large language models — do go beyond just knowing the names of things. They capture statistical patterns about dogs: how we talk about them, what behaviours they exhibit, the roles they play in human life. That’s more than labelling.

And, yes! Humans develop richer, more intuitive models of dogs through direct interaction. There's research showing that people who live with dogs are better at interpreting their behaviour than those who don’t — so experience really does shape understanding. The idea that other animals have species-specific interpretations of dogs also holds up: a cat’s experience of a dog is shaped by very different sensory systems and survival strategies than ours.

You're spot on -- most current AI systems don’t adapt their understanding through lived, real-time interaction. They process static data, and while some can work with images or audio, they don’t (yet) update continuously from experience like a brain does. That’s (I think) a real limitation, and it’s one reason AI systems still fall far short of what animals or humans do naturally.

Where your comment moves into more philosophical terrain is the idea that the dog has the “most authentic” understanding of what it means to be a dog. That’s a powerful intuition. It echoes thinkers like Thomas Nagel, who argued that the subjective feel of an experience can’t be fully captured from the outside.

But it’s not a scientific claim, it's a philosophical one. And some philosophers argue that the intuition might be wrong. For example, philosophers like Dennett argue that what matters is how well a system can predict, explain, or respond to the world. In that view, understanding isn’t about a private inner feeling; it’s about functional capacity. So a system that can't “feel” like a dog, but can navigate the world as effectively as one, might still count as having robust “dog knowledge.”

So I think you make a great point -- all perspectives are partial. But this might include the dog too. The dog, the human, the AI — each has access to different aspects of “dog-ness,” shaped by different capacities and experiences. The question then becomes whether we think that one perspective counts as more authentic than another. That answer will depend on the philosophical lens we choose to look through.

Marginal Gains:

Thank you. I have not read any of Dennett’s books yet, but I have read a little about functionalism. If we were to understand consciousness fully, do you think our opinions on intelligence would likely change in ways that would make functionalism seem incomplete or even possibly outdated? Also, does functionalism dominate today because it works with what we can study, even if it doesn’t address everything we want to understand?

Suzi Travis:

I think if we see functionalism as a strategy: a way to explain mind and intelligence in terms of what systems do, how they behave, how they react, how they learn, then it is very useful. As for why it dominates — yes, I think it is because it works. It’s helped us make progress without getting trapped in unanswerable metaphysical riddles.

But there is a move happening in neuroscience, atm, that says, functionalism is not wrong — it’s just incomplete. We don’t have to throw functionalism out. But what we might need to do is change what we mean by function.

Sunny:

This is a masterful essay within a masterful series, within a masterful body of work. I’ve never seen anything quite like this.

If a neuroscientist, philosopher and AI researcher decided to explain the inexplicable by taking advantage of a curiosity exploit, the result could be no better than When Life Gives You a Brain.

Suzi Travis:

What a lovely thing to say! Thank you so much, Sunny.

Ben:

I love this series, and the niche you occupy between philosophy and AI. Thank you! Keep the content coming

Zinbiel:

Great work. Very thought provoking.

Suzi Travis:

Thank you!

Jim Owens:

There is another option we might call "functional grounding." It was explored by Wittgenstein, but I don't think it's covered in your discussion. Functional grounding includes elements of sensorimotor grounding, relational grounding, and communicative grounding; but instead of simple sensorimotor grounding, which has difficulty with abstractions like justice or exclamations like "Aha!", it relies on what you could call "way-of-life" grounding, where the expression is associated with our activities or behaviours.

In footnote 13, you associate Wittgenstein's "meaning is use" with accounts that "place communicative grounding at centre stage." But for communicative grounding as you describe it, "The word 'dog' works because we all agree what it points to," and this lets Wittgenstein out. The concept of "pointing" may have been important to his early position in the Tractatus, but in Philosophical Investigations he repudiates it (by most accounts), precisely because of the difficulties its positivism presents for abstractions, interjections, opinions, moral and aesthetic discourse, and other language games not obviously connected with the empirical verifications of sensorimotor activity.

All language games, however, are connected with ways of life or patterns of behaviour. Their webs of words are linked to webs of behaviour, so there is an aspect of relational grounding. Their role in a shared way of life involves a shared or communal understanding, so there is an aspect of communicative grounding. But they stop short of full referential grounding, for that would imply, for example, the reification of "dog" because of some particular way of life in which the utterance "dog" plays a role. This is a stronger ontological claim than the language-game account needs to make -- and if we go there anyway, we are left wondering what we would say then about the ontological status of "justice" in some language-game where that utterance plays a role. Wittgenstein's functional grounding allows us to remain ontologically agnostic across language-games (a position that some may find dismaying).

Suzi Travis:

Hi Jim!

This is a great thoughtful push. Thank you.

Yes, I can see that. Wittgenstein’s later work resists any simple pointing theory of meaning. I included him under the umbrella of communicative grounding (in the 'meaning is use' sense), but you are right, this may have underplayed how radical his shift was — especially in Philosophical Investigations.

And, yes, good point: I should have mentioned functional grounding, but I am planning on addressing something along those lines in a future essay. Just quickly: yes, I agree, I think this sort of view is especially helpful when it comes to abstractions, emotions, or interjections like 'Aha!' (or even 'Justice!'), where there's no neat pointing relationship to be had. Instead, these utterances do something — they function within our shared activities, reactions, and expectations.

Thanks again Jim -- really appreciate your insight here.

Jim Owens:

It's hard to cover all the bases in this format while remaining readable, informative, and entertaining -- which you do most admirably. I look forward to that future essay!

Joseph Rahi:

Something related and really weird is that we could train an LLM on an undeciphered writing system and it would work. Given a large enough training corpus, it would do as well as LLMs do for English.

Although I wonder if we can't fairly see words as at least partially grounded in other words. Words are part of reality, and refer to other words/concepts much as they refer to physical things. Some words are even just about words, like the word "word". Also, if we learn that a particular word is a noun, verb, adjective etc, we have learned something about its meaning just by learning how it relates to other words, even though we don't yet know its full meaning.

I think we could also see LLMs as having a kind of "sense" in the form of the words we use to prompt them. And while they don't have muscles, they can write out a response. If we imagine a human in a coma who could only receive text messages into their brain and type out responses, I think we could fairly consider this to be their senses and their acting upon the world. And then there are multimodal LLMs, and they can be hooked up to control computers or even robots.

I think meanings must ultimately be grounded by causal connection, similar to what Mike said. It's a point Whitehead makes in 'Process & Reality', distinguishing between "perception in the mode of presentational immediacy" (our representations that work as a kind of projected image of the world) and "perception in the mode of causal efficacy" (the world as it immediately affects us). All of our representations ultimately derive from these causal influences, although we can be mistaken in how we conceptually link the two modes. We know the world because we are part of it. We do not need to merely make correlations between perceptions, because we do directly feel causality -- causality is the root of sensation.

I think of those pin art toys, where you press your hand or face on one side and it moves the needles so that a 3D image of it is formed on the other. Is that a representation or just a presentation? I would lean towards the latter.

Suzi Travis:

Hi Joseph! Great comment — there’s so much here.

Yes — I think that’s such a great point about words referring to other words. And like you said, in some cases, they don’t even need to point outward at all. Some words refer to themselves — like “word” or “noun.” Even whole sentences can be their own content: “This sentence is false” is the classic one. It’s a sentence that points to itself — a kind of self-contained representation.

And I really like the point you made about LLMs and undeciphered scripts. In a way, that's what they're doing already. Words get transformed into vectors, and those vectors are manipulated based on patterns of use. The model just needs to "know" how a word/phrase "behaves" in context.
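To make that a little more concrete, here is a toy sketch of the "words as vectors" idea. The numbers and the similarity measure are invented purely for illustration; they are not taken from any real model.

```python
import numpy as np

# Toy 4-dimensional "embeddings" -- the numbers are invented for illustration;
# real models learn vectors with hundreds or thousands of dimensions.
embeddings = {
    "dog":    np.array([0.9, 0.1, 0.8, 0.2]),
    "puppy":  np.array([0.85, 0.15, 0.75, 0.3]),
    "galaxy": np.array([0.1, 0.9, 0.05, 0.7]),
}

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; values near 0 mean unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["dog"], embeddings["puppy"]))   # high: similar "behaviour" in context
print(cosine_similarity(embeddings["dog"], embeddings["galaxy"]))  # low: very different "behaviour"
```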

And yes, I agree with you and Mike — causal connection does seem crucial.

Anatol Wegner:

Language models, including multimodal ones, are static mathematical functions. That should be enough to settle any question regarding their capacity to learn, know or experience anything.

Suzi Travis:

Being active seems important to me too. A system that can reshape itself through interaction feels very different from one that doesn't.

Mark Slight:

Agree! But as I see it, given a sufficiently large context window, the appropriate transformer architecture, and enough compute and memory, we have a reshaping system. From a functionalist perspective, there's no difference between updating the parameters and updating the activation patterns by means of updating the context window.

Suzi Travis:

Interesting claim — I'll play devil's advocate.

From the outside, both parameter updates and context updates can change a model's behaviour. But does functionalism really treat those as equivalent?

A weight update alters the model’s standing dispositions — it reshapes what the system can do across contexts. A context window, by contrast, only tweaks the current activation state. Once the prompt is gone, so is the change.

Only one of these methods makes the model able to do something new, reliably, across conditions. The other relies on keeping the context window open. Is this more like scaffolding than integrated changes?

So the question is: Is that just a structural difference? Or is it a functional one too? Is it the session that has functional equivalence?
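Here's a toy sketch of the distinction I have in mind (purely illustrative Python, not any real framework's API): a weight update persists across sessions, while a context update lives only as long as the session that holds it.

```python
# Toy sketch of the distinction -- not any real framework's API.
class ToyModel:
    def __init__(self):
        self.weights = {"dog": "an animal"}      # standing dispositions

    def fine_tune(self, word, meaning):
        self.weights[word] = meaning             # persists across all future sessions

    def start_session(self):
        return ToySession(self)

class ToySession:
    def __init__(self, model):
        self.model = model
        self.context = []                        # exists only while this session lives

    def tell(self, word, meaning):
        self.context.append((word, meaning))     # "in-context" update

    def answer(self, word):
        for w, m in reversed(self.context):      # the context is consulted first...
            if w == word:
                return m
        return self.model.weights.get(word, "unknown")   # ...then the frozen weights

model = ToyModel()
s1 = model.start_session()
s1.tell("dingo", "a wild dog")
print(s1.answer("dingo"))                        # "a wild dog" -- but only inside this session

s2 = model.start_session()
print(s2.answer("dingo"))                        # "unknown" -- the context didn't persist

model.fine_tune("dingo", "a wild dog")
print(model.start_session().answer("dingo"))     # "a wild dog" -- now a standing disposition
```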

Mark Slight:

Oh, please do!

I was probably not clear enough. Yes, I'm talking about a session! But a session that may be running for a lifetime. I think this is the most natural and least confusing way to think and talk about AI agents, but that may be only me!

Compare to identical twins. For the sake of the argument, let's say that they are completely identical at birth. If we want to say that all instances of an LLM are one and the same system, then the equivalent is to claim that the twins are the same system, just two "sessions" (for many years, hopefully). But I view them as two different individuals even if they start with the same architecture and context window.

Once we have agents that are reshaping significantly over years, via the context window or weight updating (mostly the latter, probably), I think we'll gradually move away from saying that they are the same AI.

Does that make more sense?

Codebra:

Suzi your brain has never seen a dog (no light penetrates your skull). Yet you can vividly recall that experience when no dog is present. WE are the “eyes” of the LLM. Our experiences are laid down in their weights in a similar manner to the way in which empirical sense data are laid down in your brain. Furthermore, we can give them literal “eyes” (cameras) and have done so to a limited degree.

Suzi Travis:

You're right! No light reaches our brain. Yet we say things like "I can visualise a dog."

LLM weights do store human-generated patterns, but unlike a brain they're mostly frozen after training. This misses the motor-sensory loop.

Multimodal models with cameras are a fascinating case. But I still think a robot dog is a long way from a real dog.

Codebra:

There are virtually no people who understand both philosophy at her level and LLMs in depth. For example, she provides a naive footnote about how some models are trained on only text, while others include video. Generative Pre-trained Transformers are based on tokens that encode semantic meaning (word2vec). LLMs can’t directly ingest video. The video must first be broken down into images, and the images converted into text by a different kind of machine learning process (typically a convolutional neural network). The text *describing* the video then becomes part of the training data. LLMs can’t “see” images or “watch” videos during training. Adding narratives about videos enriches the training data, but it is not different in kind.

Sunny:

I didn’t read Suzi’s footnote as you apparently did. It seems perfectly accurate.

As for your comment.

Aren’t base GPT models (GPT-1/2/3) trained exclusively on text corpora?

Don’t specialized models (e.g., Video-LLaMA, LaViLa) use video-derived data that converts videos into visual tokens (via encoders like CLIP or Vision Transformers) paired with text narrations?

Don’t GPTs use subword tokenization (e.g., Byte-Pair Encoding) and transformer-based embeddings learned during pretraining, rather than Word2vec, being an older, static embedding method not used in modern LLMs?

Aren’t videos split into frames, with images then encoded into numerical embeddings (not descriptive text) using vision models like ViT or CLIP?

Suzi Travis:

Thanks! But I think some of these claims are a bit outdated.

GPT-style models use sub-word tokens (BPE, SentencePiece). These tokens are then mapped to high-dimensional embeddings learned end-to-end. They are not Word2Vec vectors, and the model does not assume each token has a fixed semantic meaning.
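For anyone curious, here is a minimal sketch of what sub-word tokenization looks like in practice, assuming the openly available tiktoken package; the exact splits depend on the vocabulary, and the embeddings for these token ids are learned during training rather than being fixed Word2Vec vectors.

```python
# Minimal sketch of byte-pair-encoding (BPE) tokenization.
# Assumes: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")      # one particular BPE vocabulary

ids = enc.encode("halzenfugel")                 # an unfamiliar word gets split into sub-word pieces
print(ids)
print([enc.decode([i]) for i in ids])           # several fragments, not "one word = one fixed vector"

print(enc.encode("dog"))                        # a common word may map to a single token id
```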

It is true that earlier research pipelines did convert images to captions. But that's not the case anymore. Newer models (e.g., Flamingo, GPT-4o, Gemini, Perceiver-AR, VILA) pass visual feature embeddings (from CNNs or Vision Transformers) straight into a shared transformer. No intermediate textual captions are required; the visual tokens are treated as another modality. For video, models often sample frames, extract frame-level embeddings, and feed them directly.
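As a rough sketch of that pattern, here is what extracting visual feature embeddings looks like using the openly available CLIP vision encoder via Hugging Face transformers. This is a stand-in for illustration only (the encoders inside models like GPT-4o or Gemini aren't public), and "dog.jpg" is just a placeholder path.

```python
# Illustrative only: image -> embedding vector, with no caption text involved.
# Assumes: pip install transformers torch pillow
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("dog.jpg")                         # placeholder path
inputs = processor(images=image, return_tensors="pt")
image_features = model.get_image_features(**inputs)   # a vector a language model could attend to
print(image_features.shape)                           # e.g. torch.Size([1, 512])
```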

The claim that LLMs can’t "see" images or "watch" videos during training depends on how we define "seeing". One might argue that multimodal LLMs are trained jointly on image–text pairs, sometimes video–text. They “see” embeddings that preserve spatial/temporal structure, even if they don’t output pixels. As you said in your other comment -- one could (and many philosophers of perception do) argue that neither LLMs nor brains "sees" raw images.

On whether adding narratives about videos enriches the data: I can see this. But I can also see someone arguing that it is a difference "in kind" relative to text-only corpora.

Mario Pasquato:

It seems to me that this is a problem only if we assume the existence of an independent, external reality to which we need to ground our symbols. If we were to live entirely in a world of symbols there would be no grounding problem at all, right? Now why do we think we don’t live in a world of pure symbols and we are instead led to think that symbols have a meaning, i.e. that at least in some cases they point to something that is not a symbol?

Suzi Travis:

Yes! The grounding problem feels like more of a problem if we assume there’s something outside the system of internal representations that matters — something those representations need to “hook onto.” If we lived entirely in a world of internal patterns and coherence, there’d be no concern about grounding at all.

But I think many people suspect we don’t live in that kind of world. One reason to think that is because our representations can fail. We might say, “There’s a dog in the backyard,” and sometimes — there isn’t. We were mistaken. That kind of mismatch, for many, suggests there’s a gap between our internal models and the world itself. And it’s that gap (or perceived gap) that makes things like truth, error, and knowledge such thorny philosophical topics.

Mario Pasquato:

Strictly speaking the moment when we realize the dog isn’t there is a clash between symbols: a stored memory of having thought that the dog was there and a current perception of the dog not being there. This if we are willing to call perception “a symbol”. But then if everything is a symbol, nothing is: our notion of a symbol rests on the idea that symbols point to something else. Still it seems we can’t talk meaningfully about that something else in any other way than using symbols to point at it. At the opposite extreme, symbols are already something else (they are already pieces of reality), since they can be pointed at by other symbols. I feel stuck.

Suzi Travis:

Welcome to the swamp! You’ve captured the dilemma perfectly.

This week's essay will finish off laying out the puzzle. But after that, I’ll start exploring the different paths thinkers have taken to try to solve it — or at least, make it seem like not so much of a puzzle.

Wild Pacific:

You’ve chosen to push abstract grounding to the future post right away, so maybe over there discussion would be quite different. ☺️

IMO we should have started with abstract, say boolean, like “bigger than a breadbox” stuff, and then other concepts like causality.

“Dog” is definitely starting from the deep end.

But back on today's topic: I'm in the camp of perceived reality, which is what Donald Hoffman presents. With the example of "dog", one silly trap for humans is "hyenas". Many people would use common sense to call them "dogs" while they are actually Feliformia and much closer to cats. We just use our lopsided heuristics based on shape, behavior, etc. to make a quick judgement.

Scientific truth-seeking path is definitely needed, as we catalogue our universe as we see more details. Eventually new paradigms emerge, and concepts shatter.

One such concept that you’ve tirelessly covered already is “self”. Common sense on that one is about to get absolutely smashed in the next few decades. 🤣

Suzi Travis:

That’s interesting — I’ve always thought of sensation and perception as the shallow end!

So “dog” felt like a less complex place to start — not abstract at all. But, you're right, dog is tricky. I guess from the neuroscience perspective, perceptual concepts are often seen as more concrete, and therefore more simple, than things like “justice” or “causality,” which feel much more complex.

And I appreciate the Donald Hoffman reference. I think he’s absolutely right to challenge the idea that perception is a transparent window to reality.

Totally agree on the “self” 😄

David:

I think this is an interesting discussion and a well-written overview of the topic. However, I think it continues to be misplaced to treat LLMs as anything more than a part of language itself; they are just a new tool mankind is using to communicate. It's not that I believe AI is impossible, but more that grounding is where we would need to start, not something that merely needs to be added on to our electronic dictionaries. No one asks 'Can our calculator know what taxes are?' or 'Can the latest weather model tell me what it's like to be wet?' That is not what these things are for, and in fact it is absurd. I am solidly in the school of sensorimotor grounding. Early man developed understanding and concepts of the world before language. I would be much more interested to debate the consciousness of my cat, as at least with her there is a possibility for an output I did not have to program.

Suzi Travis:

I'm with you. I think sensorimotor is important for what we mean by consciousness.

David:

There is also a separateness and independence that defines the boundaries of each consciousness.

I often refer to LLMs as plagiarism machines; they possess no capacities beyond what was given by their programmers, questioners, and the authors of the training materials. If consciousness is ever detected, it is stolen from these sources, merely obscured by the amalgamation from hundreds of sources.

Conversely, you will have almost universal agreement that a newborn baby, before they have learned anything, has a unique consciousness. A few might say that that infant is merely the sum of their inherited DNA, but those types will have an easier time arguing against the existence of consciousness than extending it to LLMs.

Buckminster Fuller called us localized problem solvers due to our unique capacity for distilling experience into useful serendipity. LLMs will surely help us with the distillation part, but the serendipity is all us.

Michael Spencer:

This is brilliant.

Suzi Travis:

Thank you so much!
