71 Comments
[Comment deleted, Apr 22]
Suzi Travis:

Haha! Fair.

[Comment deleted, Apr 22]
Suzi Travis:

Hey Jack!

I love this example.

In both cases — you reading the manual, or an AI inferring outcomes from patterns — the system gets it right. But does that mean it’s predicting in the same way?

This reminds me of something called the grain problem (Jerry Fodor talks about it). Basically: at what level are two things supposed to be functionally equivalent? If we describe things too coarsely, we risk collapsing into behaviourism (only caring about the input and output). We miss the internal causal mechanisms. If we go too fine-grained, we risk losing the functional similarities entirely.

So when we say two systems predict, we need to ask: at what grain? Do they use the same internal structure, feedback loops, or role in a larger system? Or are we just noticing that both output the 'right' thing and calling it a day?

That’s why your duck test puzzle is great — how much do we need to know about ducks before we decide something's functionally a duck?

[Comment deleted, Apr 23, edited]
Suzi Travis:

"there's nothing like an other to make you think of self." -- isn't that true!

Thanks Jack!

Codebra:

Nobody claims AI can’t reason. It clearly can. Chess playing robots reasoned within a narrow range 50 years ago. AI is not however conscious, nor is it intentional. The latter in particular is extremely obvious. We’d know very quickly if inanimate objects became truly self-aware and began to exhibit independently intentional behavior.

Malcolm Storey:

Inanimate objects would also have to be motivated; otherwise they'd just sit there observing and we'd never know.

Suzi Travis:

I agree — if a system is going to behave in a way that looks intentional, we’d probably expect some kind of goal or motivation (whatever that turns out to mean). That part makes sense to me.

But the second part — “they’d just sit there observing” — is where I’d push back a bit. I don’t think observation is something that just happens by default. It’s not passive. Observation, as I understand it, is an active process. Without action, I’m not sure what “just observing” means.

Malcolm Storey:

I was actually thinking of something from a SciFi novel I read many years ago that included somebody of a race that didn't die but became a crystal that was "just a point of view".

Motivation is maximising some motivation hormone, probably an opioid analog.

Suzi Travis:

Cool! Do you remember the name of the novel? Sounds interesting!

Malcolm Storey:

No, afraid not. And CoPilot doesn't recognise it (nothing new there). It might come to me.

Suzi Travis:

Thanks for checking :)

Suzi Travis:

I agree. AI has shown reasoning capabilities for a long time. No disagreement there.

But that’s partly what I’m probing in the piece: not whether AI reasons or whether AI predicts, but what kind of function prediction plays in different systems — and whether using the same word for both might mislead us about their equivalence.

So when we say that an AI reasons, or an AI predicts, I wonder whether that depends a lot on what we mean by "reason" and "predict".

Mike Smith:

Very nice intro on Ramsey sentences, and in using them to distinguish LLMs from brains!

On intelligence, as a functionalist, I'm in the blurry line group myself, although I like to think of it as a spectrum. But LLMs are still very far apart from animals, much less humans, on that spectrum. And prediction is the underlying functionality, but that's a relatively low level description of animal intelligence. In other words, we can have higher levels of organization that use prediction. And as you describe, LLMs are still missing most of that.

LLMs are very cool technology. But this rush to declare them like minds is not doing the AI industry any favors in terms of credibility. I don't think there's anything in principle stopping us from eventually having artificial minds, but we should be honest about where we are.

Suzi Travis:

Thanks Mike!

I also share your caution about the AI hype cycle.

Maybe it's just me, but it feels like a year ago there was more uncertainty about what AI was actually doing. I think a lot of people outside the field are still feeling that. But within the AI community, the tone feels different now. There's less uncertainty — people are more confident in their views. But because not everyone shares the same view, a lot of what's going on in AI now is heated debate. And a lot of those debates are about what's really going on under the hood — and whether that matters.

Malcolm Storey:

Suzi, I'm not quite convinced but I haven't read the paper yet (busy today). Your example is very different from the sort of things that ChatGPT predicts. Is it proven that our intellectual musings also require efference? And anyway isn't that just an implementation detail?

I suspect that both humans and LLMs use the same underlying information flow, but we've somehow optimised ours ("understanding"?) whereas they're all brute force.

"understanding" = predicting from one level up?

Suzi Travis:

You're right: the direct evidence for efference copies in abstract thinking is still debated. I’ll get into that more in future essays — but to answer your question: yes, I think there’s a case to be made that even our intellectual musings rely on similar predictive loops.

So, is it an implementation detail? I don’t think so. I think treating the brain as an input → processing → output machine misses something essential about why the brain can’t work that way.

LLMs never have to experience the consequences of their predictions. Their outputs are grounded only in the data we give them and the models we build. But brains don’t come pre-built. If the brain simply processed input, it would have no way to make sense of those inputs. The brain doesn't come with reliable external labels. The brain must generate its own structure from scratch, and, I think, the only way to do that is by having a different causal structure. We must act on the world, predicting the consequences of those actions, and then use feedback to adjust. Without that loop — without action first — there’s no way to calibrate the input.
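A minimal sketch of the loop described here, with everything invented for illustration (a toy linear "world", a made-up learning rate). It only shows the causal order being claimed: act first, predict the consequence of that action, then use the prediction error as the "second opinion" that calibrates the internal model.

```python
import random

# Toy "world": the true consequence of an action is an unknown function
# (invented here as 3*action + 1 plus a little noise).
def world(action):
    return 3.0 * action + 1.0 + random.gauss(0, 0.05)

# The agent starts with an arbitrary internal model -- no reliable labels are given.
w, b = 0.0, 0.0
learning_rate = 0.05

for step in range(2000):
    action = random.uniform(-1, 1)       # act on the world first
    predicted = w * action + b           # predict the consequence of that action
    feedback = world(action)             # sensory feedback arrives
    error = feedback - predicted         # the prediction error is the "second opinion"
    w += learning_rate * error * action  # use the error to calibrate the model
    b += learning_rate * error

print(f"learned w ~ {w:.2f}, b ~ {b:.2f}")  # close to 3.0 and 1.0: grounded by consequences
```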

Malcolm Storey:

An LLM is pre-tuned by multiple sources saying the same thing.

We tune by ground-truthing, but I don't see how that can apply to cerebration.

Look forward to reading about that in future essays!

Joseph Rahi:

I'm not sure reactive vs proactive is the key distinction here. Correct me if I'm wrong, but I thought the Bayesian Brain Hypothesis involved more "reactive" predictions as well, like our general belief formation. I thought it goes far beyond predicting the effects of our own actions.

I think a better distinction would be to look at what brains and LLMs do with their predictions. LLMs take their predicted next token and print it, and that's pretty much it. But brains use their predictions, test them, and revise them -- LLMs do none of that.

The toaster example seems to me like a good argument against functionalism -- it discards most of the causal structure we should be interested in! Eg we could create a tiny language model that responds to very short pieces of text, then create a program that produces the exact same outputs for every possible input, storing each input and output in a big table it simply refers to each time. In terms of their input and output they're identical, but I think it's clear they're not doing the same thing. We could further imagine a "Swamp program" where the output values in the table were determined randomly, and just by extraordinary luck happened to be the same as the language model and its imitation program. The *how* is, I think, extremely relevant.
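A toy illustration of the lookup-table point above, with all details invented: two programs that are input-output identical on a fixed set of prompts, one computing its reply and one merely retrieving it, so a description that only mentions inputs and outputs cannot tell them apart.

```python
# A "tiny language model": computes its reply from the input (trivially, for illustration).
def tiny_model(prompt: str) -> str:
    return prompt.split()[-1].upper()

# The imitation: record the model's behaviour over every prompt it will ever see,
# then answer purely by table lookup.
prompts = ["hello there", "the cat sat", "ducks can swim"]
table = {p: tiny_model(p) for p in prompts}

def table_program(prompt: str) -> str:
    return table[prompt]                 # no computation, just retrieval

# Input-output identical on this domain, yet causally very different inside.
assert all(tiny_model(p) == table_program(p) for p in prompts)
```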

Also, I don't think we should look at prediction as the "function" of LLMs. Their function is to produce coherent text, and prediction is more like the mechanism by which it accomplishes its function. For brains on the other hand, coming up with predictions does appear to be a pretty key function.

Suzi Travis:

Hey Joseph — this is such a great comment. Thank you!

I agree with so much of what you’re saying, especially the part about what systems do with their predictions. I think that really gets at the core question: at what level do we define the function?

You’re absolutely right that the Bayesian Brain view goes far beyond motor control. I didn’t mean to imply that active prediction only applies to movement — just that sensorimotor control gives us the clearest example of the loop structure in action. It’s the most experimentally grounded case we have of prediction being tightly coupled to consequences and updating. And I agree that what we call beliefs, perceptions, and internal musings might rely on the same architecture — they are just more complex and harder to observe directly.

I also love the toaster and Swampman-style examples. I agree! If we take functionalism too shallowly — just input, black box, output — it starts to look like flat behaviourism. If we’re going to use a functionalist lens, then causal role has to matter.

Your last point is interesting: maybe prediction is a mechanism for LLMs, but a function for brains? 🤔 For LLMs, prediction serves a downstream task (text generation). But in brains, if the predictive loop is the function — the thing that organises perception, action, and maybe even consciousness — then the distinction isn't just about outputs. It’s about what the system is for. And that brings us right back to the question of grain — at what level do we define the function?

Pete Mandik:

This comment is your prompt and “sensory stimulus” to tell me how a human, according to Predictive Processing theory, is allegedly different from an LLM who will also receive this very prompt. Go!

Mark Slight:

Now you're reading my mind! (predicting my mind). How did that happen?

Suzi Travis:

Haha! A question posed like a true philosopher should.

I’ll be digging into this more in the coming weeks, but here’s the short version of how I see it.

Yes, we can describe this comment as the input -- the sensory stimulus -- followed by some internal processing and then a response. And on the surface, that process might look the same in both LLMs and brains. But I think there’s a crucial difference worth pausing on -- and it comes down to grounding. By grounding, I mean that in order to learn, we need a second opinion -- a reliable source that helps us calibrate.

For LLMs, that grounding happens during training. The model is exposed to huge amounts of data, and the second opinion comes from the training: Was the next token prediction correct? Over time, this builds a statistical model of language where the grounding is baked in. So when an LLM receives a prompt, it maps it onto those learned patterns and outputs the most likely next token. Which is a perfectly good model for an LLM to have. Whether its output is accurate or not doesn’t matter to the LLM. There’s no consequences for the LLM.

This is not true for brains. Brains probably possess some innate wiring, but they largely build themselves. If we processed input the same way as an LLM -- input → process → output -- it seems to me that we'd run into a grounding problem. Sensory signals would arrive, neurons would fire… but then what? Where does the second opinion come from? What would that activity mean? Visual neurons spiking are just neurons spiking. Unless those spikes can be tested against the world -- there’s no way to ground the input.

In systems like brains -- systems that build themselves -- I can't see how input alone could be enough. We need a way to calibrate. And that means we need a loop. I don't see how a strictly feed-forward system could work. Unless, of course, we invoke a little homunculus to do the interpreting and grounding for us. But, of course, that idea is nonsense.
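To make the "grounding is baked in during training" picture concrete, here is a deliberately tiny sketch: bigram counts rather than a real transformer, with the "second opinion" supplied by the actual next token in the training text, and inference that just replays the learned statistics with no further feedback.

```python
from collections import defaultdict, Counter

corpus = "the brain predicts the world and the brain acts on the world".split()

# Training: the "second opinion" is simply the actual next token in the data.
counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    counts[current][nxt] += 1           # credit the continuation that actually occurred

# Inference: no feedback, no consequences -- just replay the baked-in statistics.
def predict_next(token: str) -> str:
    return counts[token].most_common(1)[0][0]

print(predict_next("the"))              # one of the continuations most often seen in training
```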

Pete Mandik:

Thank you! I suppose I’ll have more to say on this, and will put it in a standalone post. Just super briefly for now: I think recurrence is indeed an important feature of vertebrate nervous systems. I do think there are nonetheless ways that LLMs both predict and do active inference. I also tend to think most appeals to the importance of “grounding” are overblown. Stay tuned. Thanks again and keep up the terrific work!

Suzi Travis:

Thank you! And I look forward to reading your thoughts.

Dave Slate:

Suzi Travis wrote:

"For LLMs, that grounding happens during training. The model is exposed to huge amounts of data, and the second opinion comes from the training: Was the next token prediction correct? Over time, this builds a statistical model of language where the grounding is baked in. So when an LLM receives a prompt, it maps it onto those learned patterns and outputs the most likely next token. Which is a perfectly good model for an LLM to have. Whether its output is accurate or not doesn't matter to the LLM. There's no consequences for the LLM."

There's actually no reason why a machine learning system couldn't be built that updated its prediction models based on new training data even while it made predictions for test data based on its existing model. Back in the 1990s I developed a data mining and machine learning system based on models in the form of old-fashioned decision tree ensembles (no LLMs or neural networks involved). Then I built an interface to that system consisting of a front end that received both training and test data in real time, and a back end that periodically built models based on the training data, unhooked the existing model that was in use by the front end, and replaced it with the new "smarter" model. Meanwhile the front end received both training data that included the approved correct predictions and test data for which the system produced new predictions. Similarly, an LLM could continually be fed "second opinions" based on the accuracy of its predictions and update its "grounding" accordingly. I'm not an expert on LLMs, but my guess is that such self-updating systems already exist.
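The design Dave describes maps onto a familiar serve-while-retraining pattern. The sketch below is a generic, hypothetical reconstruction in Python (his system used decision-tree ensembles and a different interface; the function names and the trivial "model" here are invented), just to show the shape: a front end that always answers with the model currently in place, and a back end that periodically builds a new model from accumulated training data and swaps it in.

```python
import threading, time

model_lock = threading.Lock()
model = {"bias": 0.0}          # stand-in for whatever the current model is
training_data = []             # (input, approved correct answer) pairs arriving in real time

def predict(x):
    # Front end: always answers with whatever model is currently hooked in.
    with model_lock:
        return x + model["bias"]

def add_training_example(x, y):
    # New "second opinions" keep arriving while predictions are being served.
    training_data.append((x, y))

def retrain_forever(period=1.0):
    # Back end: periodically build a "smarter" model, then unhook the old one and swap it in.
    global model
    while True:
        time.sleep(period)
        if training_data:
            new_bias = sum(y - x for x, y in training_data) / len(training_data)
            with model_lock:
                model = {"bias": new_bias}

threading.Thread(target=retrain_forever, daemon=True).start()

add_training_example(2.0, 5.0)
print(predict(2.0))            # 2.0 now; roughly 5.0 once a retrain cycle has swapped the model in
```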

The real issue is whether an artificial system, whether based on an LLM or something more sophisticated, will ever produce predictions comparable to those produced by a human brain, both in accuracy and in the diversity and complexity of the sensory stimuli to which the system is subjected.

Suzi Travis:

Fascinating! I want to rummage through your thoughts on a whole bunch of topics!!

One thing I am curious about: do you think the same kind of approach could realistically scale to something as huge and complex as our modern LLMs? Or would the engineering challenges (like stability, latency, compute costs) make true real-time updating impractical at that scale?

Dave Slate:

"Fascinating! I want to rummage through your thoughts on a whole bunch of topics!!"

Well, Suzi, in my many years on planet Earth I have squirreled away, in various dusty corners of my brain, thoughts about a vast range of diverse topics. But if you can narrow down your queries to a manageable number, I will try to dig around in the depths of my psyche and convert the results of my searches into coherent responses (edited as necessary for public consumption).

Ever since I was a youngster I've played a game in my head I call "Devil's Advocate". I can't recommend this game to others without a (whimsical) caveat: "Warning: professional thinker on closed neural pathways. Do not try this at home!". What I do is pick an issue--it could be political, economic, religious, philosophical, whatever--and conduct a debate in my head between two sides that I conjure up from a part of my brain where I store personas of various persuasions: libertarian, progressive, conservative, etc. I try to keep these different personas stocked with ideas by reading and listening to a wide variety of points of view. Anyway, the two sides duke it out in my head, refining their arguments, making points and counterpoints, etc., until I decide to give it a rest and get back to "real life" for a while.

"One thing I am curious about: do you think the same kind of approach could realistically scale to something as huge and complex as our modern LLMs? Or would the engineering challenges (like stability, latency, compute costs) make true real-time updating impractical at that scale?"

That's a good question. It seems like current LLMs suffer from serious stability problems and escalating computing costs even without real-time updating. Despite their impressive performance in some areas, LLMs still lack sufficient real world knowledge and sensible motivation, and this leads them to respond to some queries by "going off the rails" and making statements that are completely ridiculous to us mere mortals. But I'm not an expert on LLMs, so for a cogent analysis of the challenges of real-time updating, you will need to consult someone more experienced in this area. One problem is that the development of AI and LLMs is proceeding so rapidly that even those immersed in it have a hard time imagining what the technology will look like just a few years from now.

Suzi Travis:

Oh, the Devil’s Advocate Debate Team — I love that image! I think we could all use a bit more of your wisdom, Dave. What a great exercise for learning, but also for keeping us curious. And I suspect it goes a long way toward promoting more empathy too.

Dave Slate:

Wisdom? Me? Not sure of that, but I am reminded of an old joke:

An angel appears in a puff of smoke to a man and says to him, "Because you have lived a good and virtuous life, I can offer you a gift: you can be the most handsome man in the world, or you can have infinite wisdom, or you can have limitless wealth." Reflecting, the man says, "I'll take the wisdom."

"Wisdom is yours," says the angel, disappearing in another puff. The smoke is barely clear before the man thinks, "I should have taken the money."

Eric Borg:

Interesting! Over the weekend I was writing something with this theme to post about, though it takes a far harder line. It’s called “The Magic of Computational Functionalism” and blames the rise of this perspective on Alan Turing’s test — if something seems conscious to us, then it effectively is. What you’ve written here, Suzi, may be good for potentially moderating the most extreme “AI equates with consciousness” contingent, though it admittedly doesn’t get to the heart of the matter, or the very thing that I consider to violate causality. Hopefully I’ll get the details of my post straightened out next weekend for publication.

Suzi Travis:

Oh! Great. Looking forward to that one.

Michael Pingleton:

I would mostly agree with the functionalist idea here. Just because AI can give us such a convincing illusion of intelligence and human-like qualities, that doesn't mean that it is the same. And that's just it: an illusion. We can do useful things with it, sure. But to say that two things are the same based on this principle just doesn't seem helpful to me. The causal structure does matter in my opinion, to a point. I do view functions as simply an abstraction of complexity, but that's just my take.

Building on the toaster example, we can say that the causal structure can be different between different types of toasters. We see the hidden layer of "heats up bread." But what method do we use to heat up the bread? By running electricity through a resistive heating element and placing that heating element near the bread, we get an electric toaster. Maybe by focusing sunlight through a lens onto the bread, we might get a solar toaster. If we place the bread into a box and assume it to be simultaneously toasted and not toasted until we observe its state, we get Schrödinger's toaster. This being said, we could say that all three examples are either similar or dissimilar based on the level of complexity, or level of abstraction, that we look at them from.

As important as the complexity of the causal structure within the function is, I think it's also important to consider the complexity of the inputs and outputs. Briefly going back to the toaster example, we might have different types of bread: wheat, white, sourdough, pumpernickel, etc. Although we call all of the above "bread," and can even run them through the same function, they're not quite the same on a deeper level.

So, how do we apply this thinking to intelligence and sentience? I think that we are simply dealing with two different types of intelligence here, being processed through two different types of functions. They appear similar on the surface, but the differences become clear as soon as you dig any deeper. This somewhat goes back to your article about analyzing levels of complexity. It seems you somewhat allude to this idea in the article; I thought I would expand on it here. I think we do need to understand the causal structure of each before we can say they are similar or not.

Here's another idea: functions inside of other functions. Yo dawg, I heard you like functions, so we put a function inside a function so you can calculate functions while calculating functions!

Suzi Travis:

Yes! — at a coarse grain, things might look the same: input goes in, something happens, output comes out. But once you zoom in on the causal structure — how the input is processed, how the model is built, what kind of feedback loop is (or isn’t) in place — the differences really start to show.

The challenge is finding the right level of granularity for the question we're asking.

In the essay, the authors give an analogy: “We call both a bird’s wing and a plane’s wing “wings” not because they are made of the same material or work the same way, but because they serve the same function. Should we care whether a plane achieves flight differently than a bird? Not if our concern is with purpose — that is, with why birds and planes have wings in the first place.”

But that raises the question: at what level do we define purpose?

If we mean “lift against gravity,” then yes — same purpose.

But at the biological level — evolution, survival, mating — the purpose is very different.

Mark Slight:

Just gonna throw in the perspective that airplane wings are part of the human extended phenotype, just like the bird's nest (although its wings are its phenotype, not extended phenotype). Airplane wings are part of memetics, which is as evolutionary a process as bird wings are! Building airplane wings is not fundamentally different than chimps making tools. Societies with technology dominate and grab more resources and gain control.

Mark Slight:

Very well written as always! This cuts right to some of the most important issues which I think are the source of much confusion and disagreement.

In my view, you're a bit mistaken here. Disclaimer: I have Sam Harris syndrome (quite smart, but overconfident, not engaging properly with the work of experts. I think I've got it all figured out just thinking about this in my free time). As usual, I could be misunderstanding you or LLMs, or brains for that matter.

I maybe will elaborate later but I think these questions will make my critique clear enough:

-what precedes the brain's initiation of an action? For example, when catching a ball, or when starting to write an article. Isn't there always sensory "input" preceding that initiation? Whether recent or not. In the case of catching a ball there's an obvious input preceding any action. Also, in the case of catching a ball, there is first an input, and then an initiation of an action without any feedback (for a short while) before feedback starts having an impact, right? So the initial response is a simple input-output mechanism.

-are you sure an LLM is not initiating an action (like a "first impulse" pointing in the direction of a token) and then adjusting it as its attention mechanism and whatnot processes various parts of the context window, during the generation of a token? I'm not saying that's the case, I don't know the transformer well enough. Maybe you can straight up answer this question.

-isn't an LLM's autoregressive recursion during inference analogous to how I initiated this sentence with "isn't" and then initiated the next step based on the previous word and the context, etc.? (Anthropic's demonstration of LLMs "thinking ahead" several tokens strengthens this analogy.) As I view them, LLMs are initiating an action and course-correcting as they go along, since they are changing the input slightly for every token they output (see the sketch after this comment). (On a related note: I believe one strong reason for hallucinations is that they have trained on written text and not spoken text. They have very little "oh wait, what am I trying to say, that doesn't make sense, I take that back" in their training data, since humans typically correct their text rather than write like that. Thus that human course-correcting mechanism, often manifest in spoken language, is very scarce in the training data. Therefore they have not learned to course-correct in this way - they haven't been encouraged to do so. The opposite is true: mimicking written text, they have been encouraged to stick to their guns once they start a thought! Reasoning models can only partially compensate for this.)

-If we widen the scope even more and look at an LLM's behaviour throughout several prompts and responses (which I see no reason not to do), isn't the LLM's action initiation and sensory feedback (user prompts combined with the LLM's own output and the interplay between them), which guide further token generation, very much analogous to what you're describing humans do?

In my mind (for the purpose of this topic), the main differences between me and an instance of an LLM are 1. Multimodality (obvious). 2. The LLM's activity pauses completely between prompts, while my internal "token generation" goes on indefinitely. 3. I'm constantly in an evolving training / post-training / fine-tuning state. The line between what I hold in my context window and what I store by adjusting weights is blurry.

I don't think the distinction between different meanings of prediction quite works in the end. But that's just me. Anyway, thanks. Really interesting stuff! I'm learning a lot from your posts! Keep it going!!
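A bare-bones sketch of the autoregressive loop referenced above. The forward_pass function is an invented stand-in, not any real model; the point is only that each single step is feed-forward, while the system-level behaviour comes from feeding every chosen token back into the next step's input.

```python
def forward_pass(context: list[str]) -> str:
    # Invented stand-in for a transformer: a pure feed-forward function of the context.
    return f"token{len(context)}"

def generate(prompt: list[str], n_tokens: int) -> list[str]:
    context = list(prompt)
    for _ in range(n_tokens):
        token = forward_pass(context)   # each single step is feed-forward...
        context.append(token)           # ...but its output becomes part of the next input
    return context

print(generate(["the", "capital", "of", "Australia", "is"], 3))
```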

Suzi Travis:

Haha, I appreciate the “Sam Harris syndrome” disclaimer — I’ll take the curiosity and confidence any day!

You raise some great points — many of which I’ve touched on in replies to other comments here or will dig into more in future essays. You’re right that it often seems like there’s always sensory input preceding action. That’s how we’re taught how it works — it’s what I was taught as an undergrad. I just think it’s wrong.

If we were literally a feedforward system — input → process → output — where sensory input precedes action, we’d get hit in the face by balls a lot. Neurons are too slow. What lets us catch the ball is that the brain is already predicting its trajectory, adjusting mid-movement, and using feedback to refine the model on the fly.

That’s the loop I’m pointing to — action guiding perception, not the other way around. And it’s that grounding-in-consequences that I think makes all the difference.

Mark Slight:

Thanks Suzi! Will check out your other replies.

Yeah I totally get that (I believe). What I'm saying is that a perception that a ball is coming at all precedes the initiation of the action. Yes, I'm totally on board with predicting the trajectory!! Wouldn't be catching many balls at all otherwise.

What I'm trying to say is that we are literally a feed-forward system for a few milliseconds, or for a synchronised depolarisation or two, or whatever. That is the relevant comparison to the feed-forward pass that a transformer performs when generating a token.

During a long chat with an LLM (or even during a multiple-token output) it is not a simple feed-forward system. During one multi-token output, you could always zoom out and state that you're putting in a simple input and getting a simple output. That's true at one level. But the same goes for a human if we zoom out. You put your comment as input and receive this comment as output.

I don't think I need reasoning models for this argument, but I think it's more convincing if you think about them.

Thanks!

Suzi Travis:

You’re right to point out that for a tiny window — maybe a “synchronized depolarization or two” (or oscillation periods and spike volleys) — brain activity can look feedforward. But I think we need to be cautious here not to mistake that appearance for actual causal structure.

In a system as interconnected and loop-driven as the brain, even short bursts of directional activity are embedded in ongoing self-organised dynamics. Causation gets tricky here: what looks like a linear flow may actually be shaped by upstream inhibition, adaptation, or internal loops (or even feedforward inhibition). So drawing conclusions from timing can be misleading.

So yes — things can look feedforward at a fine temporal scale. But the brain is never truly feedforward in the way a transformer is. It's always entangled in its own feedback.

And there’s also the issue of parallelism. The brain isn’t a single pathway waiting for input — it’s massively parallel, with multiple circuits running at once, often influencing each other in real time. Even during early sensory processing, top-down and lateral signals are shaping the response. So while a sequence of spikes might look feedforward on a raster plot, the underlying system is anything but linear or serial.

Mark Slight:

Thank you! I still feel like we're talking past each other here! I totally agree with almost all of that. I still feel like I'm not getting my point across and I think I have not been clear enough.

I think looking at a transformer is myopic. The generation of a single token is not recursive or loopy in any way. But an LLM is not reducible to a transformer. How much better is GPT-4 vs GPT-3 if we look at a single token? How intelligent are LLMs at this level? Can we judge a chess engine, or a chess player, by its/his first move? (Or any single move?) I just don't think it's relevant that the transformer in isolation is feedforward. After the first token, the causal chain of the LLM is no longer a simple feed-forward system. Looking at the transformer in isolation is not enough to understand the capabilities or functionality of an LLM. You must look at how the state of the transformer dynamically changes during the autoregressive token generation, and what the unfolding process accomplishes.

Loopy, recursive processing is a high-level phenomenon. It's not that we don't see it when we zoom in on a single action potential. It just isn't there. It's not that we don't see life when we zoom in on a protein. It just isn't there. Is the depolarisation of the protein part of a high-level loopy process? Yes! In this sense it is entangled. But so is everything else. You cannot have a conversation with a transformer or with a neuron (or with a human during a short slice of time). Never judge an LLM by its transformer!

Recursion, integrated action initiation and prediction are high-level emergent phenomena in complex life forms. They're not fundamental. Likewise, they're not fundamental in the transformer (zero evidence). But they're emergent when the loop starts (from token 2 onwards). The system as a whole is predicting its own future tokens, which influences the current token generation. It's predicting how the human will react and develops its output accordingly.

I think this is the proper functionalist analysis. We shouldn't care about what is or isn't evident at some arbitrary level (proteins, neurons, transformers). A brain is not feed-forward in the way its subcomponents are. The same is true for an LLM in action. Important parallels are life (made of dead parts), consciousness (not made of conscious parts), agency/control (no agency/control whatsoever at the most fine-grained levels).

On the parallelism issue: isn't a transformer internally parallel, similarly to a brain? Not as much, of course, but still. Genuinely asking. Of course the brain has much more obvious parallelisation of vision, auditory processing, etc. And an LLM can only output one token at a time. But at the same time, each token selection is the result of parallel processes which interact and influence the choice. The context window acts as a buffer and is part of the loop. I feel like whether there is parallelisation or not also depends on the level of zoom.

Ok, clearly having a bad Sam Harris syndrome day. You're the expert here. I'm clearly in the minority view but I think I'm not totally alone. I feel generally aligned with the first author of the piece you're referring to and to Elan Barenholz. If I'm not misunderstanding everything, maybe they'd disagree. Totally possible!

Thanks for engaging!

And for the essay.

Suzi Travis:

Thanks so much for this, Mark! I really appreciate your thoughts! I think I’ll leave my full response for a future essay (or six!), where I can properly work through the ideas you're raising. But before I do that, I just want to make sure I’m understanding your view correctly.

Would this be a fair summary of what you’re saying?

You’re not arguing that LLMs are feedforward in the way transformers are. Quite the opposite. You’re saying the transformer architecture may be feedforward for generating a single token, but an LLM in action, as it moves from token to token, interacting with users and unfolding across time, becomes something more complex.

In your view:

-- The system-level dynamics matter more than the architecture of its parts.

-- Recursion, prediction, and loopy structure are emergent properties that arise once the loop starts (for example, from token 2 onward), and especially across turns in a conversation.

-- Just as we wouldn’t judge a brain by inspecting a single neuron, or life by looking at proteins, we shouldn’t judge an LLM by inspecting a transformer layer.

-- The functionalist analysis, for you, happens at a higher level of organisation and when we look there, LLMs might exhibit something structurally and causally closer to what predictive brains do (even if the analogy isn’t perfect).

-- And finally, you’re asking: isn’t there some form of internal parallelism at work in transformers, too? Even if it’s different from biological parallelism? Maybe it depends on what level of zoom we’re working at.

Did I get that right? Let me know if I’m mischaracterising anything or missing anything important. And truly, thank you again. This is great feedback. It’s helping me sharpen how I want to approach these issues in the future.

Mark Slight:

Suzi! Oh, how I wish that I could express my views as clearly and concisely as you can express MY views! Too much uncoordinated parallelism going on here. It kinda resembles the CoT in reasoning models more than it resembles the output... Anyway, a few comments, as I am not able to stop.

1. Yes, exactly.

2. Yes, exactly.

3. Yes, exactly.

4. Yes, exactly. Not a perfect analogy, but much closer than comparing a transformer to a whole brain, I think!

5. Yes, exactly. I have in mind two kinds of related parallelism. A: multiple emergent "agents" inside the transformer fighting for influence over the choice of the next token. Possible analogy: when seeing a face, multiple "agents" in the FFA (and related areas) fight for influence over other systems so that, in effect, the high-level system (the brain/person) identifies the face as belonging to a certain person. B: "serial parallelism". While the transformer has no internal memory of the previous token generation, by definition any internal "agents" die when the token is generated. However, since that token is now part of the context window, which is identical except for this most recent token, the same or almost the same agents will now wake up from the dead within the transformer. In this model of mine, if I ask an LLM what the capital of Australia is, "Canberra" is already among the top agents within the transformer when it types out the tokens "the", "capital", "of", "Australia". By top agent, I don't mean that it necessarily is among the top most likely tokens to be chosen. I just mean that it is influential in selecting all the preceding tokens before "Canberra" is chosen. (Arguably, "Canberra" would also be among the most likely tokens when "the" was chosen, while not competing with "capital" or "of", but that's beside the point). In this model, there is a parallelism to be observed across multiple token generations.

6. (unsolicited bonus comment). In your response to Pete Mandik, you wrote: "For LLMs, that grounding happens during training. The model is exposed to huge amounts of data, and the second opinion comes from the training: Was the next token prediction correct? Over time, this builds a statistical model of language where the grounding is baked in. So when an LLM receives a prompt, it maps it onto those learned patterns and outputs the most likely next token. Which is a perfectly good model for an LLM to have. Whether its output is accurate or not doesn’t matter to the LLM. There’s no consequences for the LLM."

-Your characterisation here seems to me to be the consensus (although I'm not following that closely). But again, I think this is a somewhat myopic view. Neurons, or brains, don't fundamentally "care" about any consequences either! My neurons have no fear of death. Having consequences that "matter" (survival, replication, death) is a description we apply with the intentional stance.

I think it is fair to apply the same intentional stance to LLMs. They are evolving replicators, in a particular environment (of humans and GPUs). In this environment, during training there is a natural selection process where weights with less desired performance fail to replicate. At a larger scale, whole LLMs have a fitness and replication success (more instances, running on more GPUs) when humans like them. They have offspring and interbreeding. GPT-2 and GPT-3 are extinct (practically) but have offspring, where the desired qualities have replicated and evolved. Llama is one of the many parents of DeepSeek R1, and now DeepSeek R1 is one of many parents in all the newer models (presuming it has features that everyone is incorporating).

In other words, LLMs use humans to replicate and evolve, much like modern wheat has used humans to become an incredibly successful plant, and wolves have used humans to become very widespread (in the form of dogs).

Please note, whether we have an internal narrative "I don't want to die" or "she is cute" is not to be confused with what "matters" here. In the perspective I am pushing, it matters equally for a plant to replicate as it does for a human.

Thank you for showing appreciation. Really looking forward to the upcoming six essays or so!! Keep it up!

Drew Raybold:

I have seen the style of rhetoric employed in this Noema article before: it takes advantage of the looseness and polysemic nature of everyday language to stitch together some ideas that do not fit together all that well, leading to a conclusion which looks both solid and profound, until you look at how it was made. Specifically, this one broadens 'computation' to encompass just about any causal process, and by squeezing intelligence into the 'prediction' bucket, it encourages us to pay no attention to how different intelligence is from a host of simple systems which can also be stuffed into the same bucket.

For example, the authors jump from an analogy between DNA and the tapes needed for a Universal Turing Machine to the conclusion "Von Neumann had shown that life is inherently computational", ignoring the fact that you need quite a bit more than tapes to make UTMs, and you need quite a bit more than DNA to have life.

At this point, you may be thinking that the claim can be rescued by being more thorough in one's analogies, and I believe it is true to say that cellular biochemistry contains all the mechanisms needed to make a UTM. The problem here, however, is not that the claim is unjustifiable, but that it is not useful: if you are persuaded that life really is inherently computational, then the claim which really matters here - that intelligence (at least biological intelligence) is computational - falls out as a given, but what this line of thought fails to do is make any real progress towards an understanding of what intelligence is and how it works.

Again, you might be thinking "what about neural nets? In their case, a biological analogy seems to have been very useful." I agree, but one can make that analogy without any reference to DNA, and it does not require one's acceptance of the very broad claim that life is inherently computational, either.

More or less the same can be said for defining intelligence as prediction: there are many not-very-smart systems which can be described as predictive (for example, my house's thermostat 'predicts' that unless the heat is turned on, the temperature will fall below its set value.) My understanding of how thermostats work tells me nothing about intelligence.

This sort of rhetoric can be thought-provoking, but one should not take it too literally.
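Drew's thermostat point can be made concrete: the entire "predictive" system below fits in a few lines (the set point and rule are invented for illustration), and nothing about it sheds light on intelligence, which is the point about how far the word can be stretched.

```python
def thermostat(current_temp: float, set_point: float = 20.0) -> str:
    # The whole "prediction": unheated, the room will stay below the set point.
    # In practice it is a single comparison.
    return "heat on" if current_temp < set_point else "heat off"

print(thermostat(18.5))   # heat on
print(thermostat(21.0))   # heat off
```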

Mark Slight:

I haven't read the article yet, and maybe you're right, but I have (perhaps prematurely) decided the first author is a genius. Check out his appearance on Sean Carroll's Mindscape podcast!

Wyrd Smythe:

As the Brits (and Aussies?) say: brilliant!

Put me down for prediction being a big part of how our brains work, and possibly necessary for intelligence, but not at all sufficient for it. I quite agree with your footnote about understanding and reasoning. Likely also embodiment.

How would you cast the kind of prediction involving, for example, predicting the murderer in a murder mystery or guessing what a friend will say next? I'm curious, because these cases seem less about motor control. (And now I'm wondering how good an LLM might be, fed an original murder mystery, at predicting the murderer.)

With regard to motor control, catching a ball seems an interesting instance of seeing and predicting its path and then initiating motor control for the catch.

In any event, I think you've nailed it that the functional structures differ considerably.

Suzi Travis:

Thanks! And brilliant right back at you!

I love the murder mystery example. That kind of prediction feels far removed from motor control, doesn’t it? And maybe it is. But I’d argue it still fits within the broader predictive loop. I’ll get into why in a future post — now that you’ve mentioned murder mysteries, I might just steal that example!

Catching a ball gives us a clearer version of this loop, I think. As I wrote in reply to Mark, I don't think catching a ball could be a feedforward system — input → process → output. If it were we’d get hit in the face by balls a lot. Neurons are too slow. What lets us catch the ball is that the brain is already predicting its trajectory, adjusting mid-movement, and using feedback to refine the model on the fly.

Mark Slight:

Just wanna clarify here too that I meant that the perception of the situation that the ball is coming my way precedes everything else :) catching the ball is totally loopy.

Suzi Travis:

Haha! Yes, indeed! The catching of a ball is a beautifully loopy process.

That said, I'll add that even the perception that a ball is coming your way is preceded by action. We don’t see in any meaningful sense unless we move our eyes — in fact, it takes many micro-movements (saccades) just to build a stable image. There’s a cool phenomenon called the Troxler effect, where if you hold your gaze fixed, your vision fades. I think this is a nice reminder that vision depends on movement.

I wrote a bit about it here if you’re curious:

https://suzitravis.substack.com/i/142687004/q-why-are-bodies-important-for-understanding-the-world

Wyrd Smythe:

😄 Getting hit in the face reminds me of teaching my Black Lab puppy to catch thrown treats. At first, they'd bounce off her head, but her brain learned to track and catch them pretty quickly. The first few times were pretty funny, though.

Baseball is a fascinating example of trajectory prediction. Just last night our center fielder had to run, run, run — while keeping his eye on the fly ball — to make a last second diving catch to end the game in our favor. Definitely a case of constant feedback and correction to catch outfield flies.

We see pure prediction in action with batters. They have about 0.4 seconds from when the pitcher releases the ball (about 50 feet away) until it reaches them, and they have to swing at where they hope the ball will be or decide to 'take' the pitch hoping it's a ball. The reason for so many strikes is a failed prediction. The art of pitching is all about fooling the batter; the art of batting is trying to outguess the pitcher. There's just no way in 400 milliseconds to process the incoming data accurately.
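A quick back-of-the-envelope check of the numbers above, taking the stated 50 feet and 0.4 seconds at face value and adding one assumption (a roughly 150 ms swing time, a commonly cited ballpark figure): the implied pitch speed is in the mid-80s mph, and the batter is left with only a couple of hundred milliseconds to see, predict, and commit.

```python
distance_ft = 50.0        # release point to plate, as stated in the comment
flight_time_s = 0.4       # time available to the batter, as stated

speed_fps = distance_ft / flight_time_s     # 125 ft/s
speed_mph = speed_fps * 3600 / 5280         # about 85 mph

swing_time_s = 0.15       # assumed: a commonly cited ballpark for executing a swing
decision_window_s = flight_time_s - swing_time_s

print(f"{speed_mph:.0f} mph pitch, roughly {decision_window_s * 1000:.0f} ms to decide")
```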

Marcus:

Consciousness is a puzzle worth taking seriously. You're laying out how the pieces fit together. I'm still wrapping my head around it all, but thank you for this blog.

Suzi Travis:

Thanks so much, Marcus!

John:

Hi Suzi. I have been busy (well not compared to just about anyone else in the family) with a new grandchild, so whilst I have skimmed your essays of late, I haven’t been listening to your reading them. My loss. Brilliant. (I thought some flattery and praise overdue 🙂).

I wonder if listening makes me feel that I’m following your arguments better? That’s got to be an eponymous effect in the literature somewhere?

Anyway, excellent essay.

I believe that we’re pretty much prediction engines as befits our evolutionary heritage as a prey species. It was the case that septo-hippocampal systems were thought to be paramount in this and Jeffrey Gray wrote on this and did some very elegant research in the field. He was a most lucid thinker.

Thanks again for another great essay.

Suzi Travis:

Oh! Congratulations, John!!! How exciting. I am so looking forward to having the moment of a new grandchild one day.

And thank you so much for your kind words :)

Dean Geib:

Suzi—fascinating article. I always learn so much from your articles.

It strikes me that software engineers arguing/promoting LLMs and even AI as equal to humans or human intelligence is like scientists declaring God doesn’t exist. Wrong mindset and wrong tools for the job.

More specifically, LLMs are integrated datasets, not processes. An LLM can’t ‘do’ anything—any more than a large library can do something other than sit there and collect dust—or fuel a large fire. ML and AI are processing methodologies which ‘do’ things but have no value—and no value capacity—or rational capacity outside of human configuration or shaping. Which is human value systems applied to machine output.

An analogy—autonomous driving vehicles (Waymos for example) are not possible without electric motors with single speed transmissions or internal combustion engines with automatic transmissions. Some of these more sophisticated transmissions predict the needed gear based on speed, load, and incline/slope, etc. So automatic transmissions are making decisions and predictions to create motion and velocity. Are they then conscious? It’s laughable to imagine such a thing.

And yet, integrate them into a complex system of sensors, mapping, and steering algorithms and voila! We get autonomous vehicles—until they run out of charge or fuel, when they become immobile blobs of complex machinery sitting in the middle of the road. That said, when working, the device has no valuation of whether the trip/route it is following is for groceries, for taking someone to the ER, for bank robbery, or for kidnapping. It’s just a device.

Likewise LLMs contain the imitation of human language/speech but have no ability to value the output. This apparent function is inserted by humans based on cultural preferences and human valuations. For that matter, applying any value to ML or AI outputs from LLMs is from human valuation—it is not inherently valuable but must be configured into some value proposition. It is a machine.

Human beings on the other hand have inherent value. One value being the ability to transfer human values into machines. Thus a machine can be configured to predict based on human values (it has none of its own)—whether it is to move along quickly and smoothly (transmissions) or to configure diverse data sets into a probability or predictive outcome (ML or AI working with an LLM). Automatic transmissions with information and language.

Eric Borg:

Hi Dean,

I like that you identify value as the difference between us and our machines. So what is “value”? Unfortunately that’s a highly confused subject in academia. It technically resides under philosophy, though philosophers only ponder the supposed rightness versus wrongness of behavior rather than the goodness versus badness of existing. Psychologists avoid value as well. Fortunately at least economists presume that good resides as feeling good and bad resides as feeling bad. Let’s presume they’re right.

So what does the brain do to create something that feels good/bad? I’m referring to “the loop” that Suzi has been talking about here (not that she’s mentioned it directly in terms of value, but I’d say this is the same loop). Consider the possibility that the brain produces an electromagnetic field that itself resides as value, or consciousness, or ultimately “you”. So when the field that is you decides to do something, like scratch an itch, it affects your neurons to cause them to fire so that your muscles function that way. Until some such account becomes empirically validated, I’m afraid that the “chatbots are becoming conscious” crowd will continue to predominate in science and society in general.

Dean Geib:

Eric—interesting to consider value and brain processes. Yes feeling good/bad has a value proposition in it, but I’m not sure this is a definition of value. I can recall many times that I knew something wasn’t going to feel good—could be bad or just neutral predicted feeling—but valued something produced as the outcome. Simple example is going to a Dentist for a round of teeth drilling. No good feeling there, best I hope for is no feeling. Yet I value my teeth as a device so I go for the fugitive unpleasantness. The benefits or results are valued more than the feeling—I get no good feeling from having functional chompers.

Having been paralyzed in parts (or entire sides) of my body from various nasty medical conditions a few times in my life (not currently a condition I’m dealing with), I can say definitively that a broken feedback loop is not pain-free but agonizingly excruciating—sometimes for many years. So there must be something to what you are saying about value and electromagnetic fields. That said, I didn’t feel less me—even though I felt progressively less capable in functioning. In fact I felt more condensed and more me because I was becoming alienated from my body and brain function. It seems I’m temporarily existing in a body (in coordination with a brain/body and am affected by it) but have little doubt that when this body completely stops working, I will still exist. The push/the willful drive in me to find a way to make this body function (because I still value life and physical existence) is part of what has made recovery possible—this is beyond/outside of broken electromagnetic fields/signals, which were painfully dragged back into function by my pushing and rerouting the physical material stuff of my body—not the other way around. So I have my own empirical evidence that the ‘field that is me’ is partly, or wholly, a result, not a driver or generator of me-ness.

Eric Borg:

On your dentist scenario Dean, I’ve noticed that’s how my perspective mainly hits people initially — feeling good rather than bad can’t be all that’s valuable because we sometimes choose to do things that presently feel bad to us. This was one of the things that I tried to address in my first Substack post. https://eborg760.substack.com/p/post-1-the-instantaneous-nature-of

Essentially feeling good/bad in the present is highly dependent upon our perceptions of what will happen in the future. So if you have reason to “worry” about dental problems knowing how horrible they can feel, the thought of not getting this taken care of professionally should itself feel bad in the present. Thus to potentially help alleviate this particular worry there’s the option of getting treatment. Furthermore there’s potentially “hope” that can directly feel good in the present. The more you perceive that certain dental treatment would give you a generally more happy life, the more it should feel good to you presently to get this done even given perceived discomforts. Feeling confident of a given plan should feel better than feeling unconfident, for example, since the first will be hopeful while the second will not.

On paralysis feeling excruciating, that’s what I’d predict as well. Just because one’s muscles won’t do what the brain tells them to do, shouldn’t stop the consciousness loop from making one feel bad about such inabilities. It’s good to hear that you’re doing well on that again!

In a psychological sense I suppose in the end it doesn’t matter if brain creates mind or if mind somehow works in tandem with the brain/body for a while. We are what we are regardless of the metaphysics behind it. In a sense I feel like I’m more able to effectively discuss psychology with people who directly posit mind to be magical than people who’ve mistakenly accepted a magical position of mind by means of the Turing test and general popularity of science fiction. So in my next post (or #3) I’ll see if I can effectively display their error.

Suzi Travis:

Good question! What the heck is value?

It is a topic addressed by many in academia. Philosophers in ethics deal with rightness vs wrongness, but also with intrinsic goodness and existential value, in areas like hedonism, utilitarianism, and existentialism, which address things like goodness, well-being, and the goodness of existence. Psychologists are interested in value too. Positive psychology is interested in well-being, meaning, and happiness. Cognitive psychology and affective neuroscience get into emotional valence (good/bad feelings) and value-based decision-making.

I always thought classical utilitarian economics (e.g., Bentham) equated utility with pleasure and pain (feeling good or bad). But modern economics uses broader, often rational-choice and revealed-preference models, where "value" may not map neatly to simple feelings. Is that right? And behavioural economics (e.g., Kahneman) explicitly separates what he calls experienced utility (feelings) from decision utility (choices).

Eric Borg:

To me the complicated mess that you describe here Suzi is just begging for someone prominent to come along and razor the crap out of it! Unfortunately no one prominent seems up for the job. I’ll do what I can in the mean time anyway. And to stay in line with Occam’s nominalism as well, I find it helpful to at least occasionally take the time to state the question fully. So it’s not what value “is”, but rather what’s “a useful definition” for this humanly fabricated term? Here I mean the value of existing in itself rather than any other ideas. Consider this scenario:

I suspect you agree that existence didn't matter to anything that existed before there was life on our planet. I doubt the emergence of life changed this either, or even brain-instructed life. In the evolution of brains, however, I think there must have been a point where value emerged in an extremely basic sense. This would necessarily have been epiphenomenal for a while because that's how evolution works — it serendipitously takes things that just happen to exist and sometimes finds functional uses for them by means of its processes. Furthermore, when value-bearing entities were given the opportunity to affect organism function somewhat, there must have been at least one iteration that succeeded well enough to indeed evolve. Thus value went on to create the purpose-based conscious form of brain function in general that you know of quite personally as yourself.

It could be that there are elements of modern economics that try to get away from the basic premise that feeling good/bad is what ultimately drives conscious function. In the 90s I had a basic education in the field, though I don't keep up on it in general. I consider its clean value dynamic to be the premise that permitted it to develop a vast collection of professionally undisputed models, though of course some may be moved to add complexity in that regard. Back then I figured that psychologists wouldn't have much to teach me since they hadn't yet achieved this value premise. I still think I was right about that. So perhaps society helps dissuade psychology from adopting the simple value premise that's succeeded for economics because its centrality causes it to more overtly conflict with the social tool of morality, or something that should instead reward us for portraying ourselves altruistically?

In any case to found all other mental and behavioral sciences upon the clean value premise that traditionally founds economics, I suspect we’ll need a respected community of meta scientists that’s able to do what philosophers have not, which is to say, provide science with an agreed upon value premise from which to build. But it could be that there won’t be enough incentive to establish such a foundation until after the physics of value becomes empirically demonstrated. You know the physics that I suspect will become so validated. Thus our weakest sciences should finally begin developing various highly successful models given that premise.

On Bentham, I think he made an admirable attempt but was also ultimately ensnared by the social tool of morality as philosophers in general seem to be. So instead of directly positing the value of existing to exist as how good/bad any defined subject feels from moment to moment (whether that subject is individual or a social collection of individuals), he instead posited that what’s “right” is to promote the greatest happiness for the greatest number. Though economists did adopt his “utility” term, fortunately they use it the way I advocate rather than to moralistically differentiate supposed right versus wrong behavior.

Suzi Travis:

Thank you so much, Dean. This is such a thoughtful comment.

You’re raising a really important point about value — the question of where value, meaning and caring actually come from is a critical one.

One of the big challenges for computational physicalists is exactly this concern you raise. If humans are just very sophisticated physical prediction machines, why does it seem like we have things like goals, caring, and value? It's not just that machines like LLMs don't have intrinsic values — the computational physicalist also needs to explain why humans think they do.

There are a number of moves the computational physicalist makes to try to deal with this problem. That’s what I’m hoping to dig into in some upcoming essays.

But one thing we need to keep in mind, is that the physicalist is working with abstractions here. And when we draw analogies between systems, we have to be careful: it’s easy to gloss over important differences. The concern is that if we choose the wrong level of abstraction, we might not sharpen our understanding — we might actually blur it.

Tom Rearick:

Joe: You know what the greatest example of artificial intelligence is?

Moe: No, tell me.

Joe: It is the GPT thermos bottle.

Moe: Yeah, How's that?

Joe: You put icy stuff in it and it keeps it cold, you put boiling stuff in it and it keeps it hot. The amazing thing is: how does it know?

Moe: It predicts it!

Seriously, we are talking about two very different architectures--causal structures--when we describe prediction in LLMs (predicting the next word) and prediction in naturally intelligent animals (predicting the result of one's action in the umwelt). LLMs are reflexive architectures (input->output) that are pre-programmed with a training set assembled by intelligent agents (knowledge engineers). Their function is to predict the next word. Natural intelligence is a continuous cycle that acts in a sensed environment (umwelt) and expects (or predicts) some change from an intentional action. If the change resulting from said action is unexpected, we learn something. AI and LLMs do not learn (see https://tomrearick.substack.com/p/ai-does-not-learn). Apples and oranges both have seeds, but that does not make them equivalent. Nor does prediction in AI make it equivalent to natural intelligence.

Suzi, as you state so clearly, the two systems have different causal structures.

Suzi Travis:

Thanks, Tom! I love the thermos joke! And yes, exactly — different causal structures mean that even if the surface behaviour looks similar (both predict), how they do it is different.

If it is true that the causal structures are different, then the question is: does this difference matter? The problem with words like prediction is that they are slippery. And when abstractions get slippery, we have to be extra careful we have the right abstraction.
