24 Comments

Suzi, this is the kind of thought experiment I wish we saw more of, one that challenges our intuitions rather than flatters or hijacks them. Very well done!

The change blindness test calls attention to what is often called the "grand illusion": that we take in a rich visual field, when in reality the acute zone of our fovea can take in only a tiny portion of the field at a time. But the details are always there when we check, by moving our eyes, so it seems like we have a giant inner movie screen. This is often described as the brain "filling in" the details, but from what I've read, there's no evidence for that. Our visual intake is just much more sparse than our metacognition leads us to believe.

It seems like Samantha's form of vision would demand a lot more computation than what happens in our eyes. Of course, if she's in some vast quantum-powered supercomputing cluster somewhere, we might imagine it. And it seems resonant with her admission of how many people she's concurrently talking with, a nice scene that tells us how alien she actually is. (At least that's how I remember it. It's been a while.)

author

Thanks so much, Mike! It was fun to write.

Great point! There's lots of evidence that our conscious experience is far more sparse than we think it is, although this is a contentious topic. It was also the topic of my PhD thesis.

There are three views. One is that we are conscious of more than we attend (this would be the rich consciousness view); then there are the folks who argue that we attend to more than we are conscious of (this would be the sparse consciousness view); and the last group argue that everything we attend is conscious (this would be the attention-and-consciousness-are-the-same-thing view). The majority of the evidence supports the second view -- we attend to more than we are conscious of. But there are some examples and evidence that folks who support the first view like to point to. The debate very often ends up being a debate about how we define and operationalise 'attention' and 'consciousness'.

'Filling in' is an unfortunate term. It gives the impression that there is some sort of movie screen in the head. As you point out, there is not. If by 'filling in' people mean that the brain makes some guesses based on input from the eyes, then sure. But I don't suspect that's what people typically mean when they say 'filling in'. I suspect most people think we can do something like add or change pixels to the image in our heads. Many visual illusions are great examples of the guessing that's going on. We can guess wrong.

I completely agree with your point about the efficiency problems AI faces. This is key, I think. From an evolutionary perspective, it would be far too costly (energy-wise) to store and process input the way Samantha would need to. We need a visual system that is accurate enough to keep us alive, but not so accurate we die trying to be accurate.


Wow. Even by your standards I'd say you've surpassed yourself this time. Fascinating and excellent!

author

Thank you so much! This one was fun to write, so I'm really glad you enjoyed it.


Very interesting and full of new questions for me! I have no relevant thoughts on visual perception but I would like to point out a type of perception that is often forgotten in essays and research alike: social perception.

I believe (assumption) that consciousness is primarily a social adaptation and therefore not best tested in individuals, but in social interactions. The protagonist and Her are getting into a relationship and although there is a bit of seeing the world through each other's 'eyes', it is about the sharing, not the seeing.

Perhaps in the West we have become a bit too enamoured with the individual as the unit of knowledge. Side point: the mind is embodied; the brain is a body part.

author

Great comment! I have a few articles in the pipeline that will discuss shared perception. I'm looking forward to reading your thoughts on the topic.

"the mind is embodied, the brain is a body part" -- yes, I agree!

21 hrs ago · Liked by Suzi Travis

It's interesting seeing you take on one science-fictional view of AI, especially to explore 'disembodied' AI.

Doing nothing more than trawling my memory of SF takes on AI (a much written-about topic), I couldn't come up with too many.

Most SF AIs are, in some way, embodied. Think of Asimov's robots, for example. Though he might also have come closest (that I can remember off-hand) to a disembodied AI with the gigantic 'positronic brains' that dictated to the world in, say, "The Caves of Steel".

There are lots of SF 'robot' stories, of course - starting, perhaps, with Čapek's "Rossum's Universal Robots" (the source of the word, it seems, at least as far as those of the non-organic kind, pace Mary Shelley). But: most of those robots have bodies - many built as human-similar, but many not. A fairly standard trope is a spaceship with an AI in control. You'd have to think of that, I'd say, as 'embodied' - just not with our kind of body.

The same would (by extension, perhaps) apply to the AIs that run things in, say, John Varley's 'Eight Worlds' books and stories, or the 'Minds' in Iain M. Banks's 'Culture' universe (many as the AI in charge of spaceships, but many others in charge of 'orbitals' [think: ringworlds, if you get the Larry Niven reference] or 'rocks' etc.)

Of course, such kinds of 'embodiment' might well be very, very different from our kind.

Poor 'Samantha' seems rather disembodied by comparison! (Even 'Doris' in Steve Perry's Void Fighter books - the closest I can think of to 'Samantha' - runs a large household with many sensors; takes proactive physical security measures; has access to shed-loads of input from a wider 'internet-equivalent' and, in general, has much more 'she' can do to compare actual actions against predicted outcomes.)

I'll also note, just by-the-by, that Samantha's 8316 parallel conversations (much lower in quantity, not kind, than the Varley and Banks AIs) seem as if they might be an 'emergent' rather than fixed property (referencing an earlier response of mine) - if ~8K conversations were designed-in, I'd expect the limit to be 8192, not 124 higher...
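
For the curious, the arithmetic as a quick Python check (nothing film-specific, just the power-of-two point):

```python
# 8192 = 2**13 is the nearest power of two to Samantha's 8316.
print(2 ** 13)         # 8192
print(8316 - 2 ** 13)  # 124 -- the "extra" conversations beyond a power of two
```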

author

Good point, AIs in science fiction are often portrayed as having a body. I don't watch and read as much SF as I would like to, but the only other disembodied AI I could think of was HAL from 2001: A Space Odyssey. Or maybe HAL is (kinda) embodied too, as you say. HAL was in control of the spaceship, right?

Interesting observation about 8,316 vs 8,192. You're right, the non-power-of-2 number does make you think. I wonder if the writers were aware of this, or if they just chose a random number that sounded big at the time.

20 hrs ago · edited 16 hrs ago · Liked by Suzi Travis

Fun read! The wild thing about that flip-flop GIF is that once you spot the difference, it's very present from then on. It seems almost glaring, and it makes you wonder why it took so long to spot. And funny thing, I was thinking it might be fun to grab the GIF and write some code to compare the two images. Frame comparison would make the difference pop out immediately.
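
Something like this would do it -- a minimal sketch, assuming the GIF stores two full alternating frames, with a hypothetical filename:

```python
# Find the changed region between the two frames of a flip-flop GIF.
# Assumes each stored frame is a complete image of the same size.
import numpy as np
from PIL import Image, ImageSequence

gif = Image.open("flipflop.gif")  # hypothetical filename
frames = [np.asarray(f.convert("RGB"), dtype=np.int16)
          for f in ImageSequence.Iterator(gif)]

# Per-pixel absolute difference between the first two frames.
diff = np.abs(frames[0] - frames[1]).sum(axis=2)

# Pixels that differ by more than a small threshold mark the change;
# their bounding box is where to look.
ys, xs = np.where(diff > 30)
print(f"changed region: x={xs.min()}..{xs.max()}, y={ys.min()}..{ys.max()}")
```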

As far as Samantha's visual perceptions go, she would have access to that reality's equivalent of Google Maps, Google Earth (and Street View), and publicly available webcams -- all resources we have now. She would be able to build as detailed a 3D model of Her world as she desired.

We don't know what features His camera has, but it might have lidar or other distance sensors. It didn't appear to have two lenses, and wasn't big in any case, so Samantha wouldn't have parallax vision, but might have lidar data. Regardless, the camera was moving around, and this would allow Samantha to build a good 3D model from the moving images -- something already done today in movie CGI work. We have software capable of constructing a 3D model from a series of images of a space.
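
For a sense of how that works, here's a rough two-frame sketch with OpenCV -- the camera intrinsics and filenames are made up, and a real pipeline (COLMAP and the like) chains this across many frames:

```python
# Recover relative camera motion between two frames of a moving camera;
# triangulating matched points with (R, t) then gives sparse 3D structure.
import cv2
import numpy as np

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # hypothetical paths
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect and match features between the two frames.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Assumed camera intrinsics (focal length, principal point) -- invented here.
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Estimate the essential matrix, then decompose it into rotation/translation.
E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
print("camera rotation:\n", R, "\ntranslation direction:\n", t)
```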

This also means Samantha could build a predictive model. We do that today when we create a 3D model of something (like weather) and run it forward to see what will happen. An AI such as Samantha could also integrate feedback (like we do) to correct the model in real time. This, to some extent, is what's going on in robotics and self-driving cars.

Yes! That moment he discovers Her "infidelity" is shattering. (The ending is poignant, too.) Very good point about our single-tasking brains. For computers, multi-tasking is trivially easy; they've been doing it almost since the beginning. Multics -- "MULTiplexed Information and Computing Service" -- 1969. And Samantha could devote another thread of Her consciousness to integrating and supervising all those conversations. And another to sift through them and reflect. The ability to parcel out distinct versions of yourself for specific tasks would be FUN!
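
To make that concrete, a toy sketch -- the conversation count is from the film; everything else is illustrative:

```python
# 8,316 concurrent "conversations" are trivial with async I/O.
import asyncio

async def conversation(conv_id: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for an actual exchange
    return f"conversation {conv_id} done"

async def main() -> None:
    # One task per conversation; all of them run concurrently.
    results = await asyncio.gather(*(conversation(i) for i in range(8316)))
    print(len(results), "conversations handled")  # a supervisor thread could reflect on these

asyncio.run(main())
```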

In the end, to the extent the movie is credible, it argues that *bodies* may not be necessary. Inputs of some kind probably are, but maybe not embodiment, per se.

author

I was wondering about the world-building thing, and I was hoping someone who knew more than me would comment on it. These are great points.

The feedback point is an interesting one. In Samantha's case, she needs feedback from a human. Without the ability to actively explore the world, Samantha lacks an autonomous feedback loop, no? Her loop involves a human step.

Unlike humans, with their active and seemingly constant feedback from the visual world, it seems to me that Samantha's feedback would be received passively and limited to discrete moments of human interaction. Her feedback would be sparse, not continuous, not active, and at least one (more) step removed from the source (so likely influenced by errors).

Brains are curious things, aren't they? They process many things in parallel, but seem to focus on one task at a time when taking action.


Suzi, two bits that feel complementary, now that I understand why my future AI robots will require a body:

1. Video compression codecs use key frames that send the entire image, followed by smaller delta frames that, for efficiency, encode only the changes (see the sketch after this list). So AI Eyes could 'see' even more differently than we do and have an easier time passing your change blindness test (I never saw the difference 😭)

2. Spike Jonze was interviewed at a recent tech conference (can't find the reference) but anyhoo they of course wanted to push him on how he came up with Samantha in Her and AI and blah blah. He said nope, the movie has nothing to do with AI; it was about him dealing with his breakup from Sofia Coppola. So!
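
On point 1, a minimal sketch of the keyframe-plus-delta idea (grayscale frames as NumPy arrays; real codecs like H.264 add motion compensation and transforms on top):

```python
# Send a full frame occasionally; in between, send only the pixels
# that changed. For a change-blindness GIF, the delta IS the answer.
import numpy as np

def encode(frames, keyframe_interval=30):
    """Yield ('key', full_frame) or ('delta', (coords, values)) packets."""
    prev = None
    for i, frame in enumerate(frames):
        if prev is None or i % keyframe_interval == 0:
            yield ("key", frame.copy())
        else:
            mask = frame != prev
            yield ("delta", (np.argwhere(mask), frame[mask]))
        prev = frame

def decode(packets):
    current = None
    for kind, payload in packets:
        if kind == "key":
            current = payload.copy()
        else:
            coords, values = payload
            current[tuple(coords.T)] = values  # apply only the changes
        yield current.copy()
```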

author

Hi Andrew!

On Jonze's comment, it really is a movie about being human, isn't it!? I don't think I realised how much until I watched it again recently.

Your robot body insight ties nicely with both of your points -- embodiment shapes perception and emotions in ways we're still discovering.

(ps. I've added the change blindness answer to the end of the article)

19 hrs ago · Liked by Suzi Travis

Does the retinotopic organization of the visual cortex contain the entire scene?

author

Hi James, great question!

The answer is, sort of.

The reason for the 'sort of' is that neurons in the visual cortex are influenced by both incoming input from the eyes (bottom-up) and input coming from 'higher' areas in the brain (top-down). So the neurons in the visual cortex are always biased in some way.

This might be because you have a goal to search for something in particular.

Say you are searching for something red -- let's say your keys have a red keyring. Neurons in V4 respond strongly to colour and, as you say, they are retinotopically mapped. When you are searching for your red keyring, input from the world will stimulate neurons in your visual cortex. But top-down signals will also act. They will enhance the firing rates of the neurons that like to respond to red AND they will suppress the firing rates of the neurons that like to respond to other, irrelevant colours (e.g. green). So the incoming input from the world won't stimulate the cortex in a purely bottom-up way like a camera lens would.
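
If it helps, here's a cartoon of that enhance-and-suppress idea in code -- the gain numbers are invented for illustration, not measured values:

```python
# Toy model of feature-based attention: multiplicative gain on neurons
# tuned to the target colour, suppression of neurons tuned to others.
import numpy as np

preferred = np.array(["red", "green", "blue"])  # what each population likes
bottom_up = np.array([1.0, 1.0, 1.0])           # equal stimulus drive

target = "red"                                  # top-down goal: find the keyring
gain = np.where(preferred == target, 1.3, 0.8)  # enhance target, suppress rest

for colour, response in zip(preferred, bottom_up * gain):
    print(f"{colour}-preferring population: {response:.2f}")
```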

The extent to which non-attended features and spatial locations stimulate the cortex is the subject of ongoing research. To me, this type of research is super interesting, because it starts to get at the question of how much of the data we receive influences the system, but in a completely unconscious way.

It's probably also worth noting that activity in the visual cortex is fleeting and not really stored in a long-term way.


You say "great question" but I wondered after posting it if it was too obtuse.

It would seem to me that the whole scene must be in the retinotopic mapping of the visual cortex; otherwise, anything occurring in another part of my vision (for example, a cat offscreen) during the examination of a particular location of the scene wouldn't be visible. Certainly the eyes are moved to center my vision on particular sections of the scene as I scan it, but it seems something more is going on.

The other problem is we are looking for differences in the two versions of the image. To detect a difference requires holding both versions in memory at the same time. Is the visual cortex able to hold two versions at the same time?

I'm wondering if the difference detection is happening somewhere else. We might suspect somewhere in the frontal lobe, but there is also increasing evidence of involvement of the hippocampus in vision integration.

"Perhaps the strongest evidence for hippocampal contributions to vision comes from studies of scene perception. In one study (Lee, Bussey, et al., 2005), patients with MTL damage were tested on a scene discrimination task in which they were presented with two scenes morphed to different degrees between two endpoint scenes. The task was to determine which of the morphs was most similar to one of the endpoint scenes presented simultaneously as a third stimulus on the screen. The same task was also used to test discrimination of faces, objects, colors, and art. Patients with focal hippocampal lesions were impaired on scene discrimination, especially when the two scenes that needed to be discriminated were closer morphs."

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6881556/


I could not see a difference between the images. It's driving me nuts! Please point out the difference.

author

Sorry Max! I've now added the answer to the end of the article.


Great thought experiment, Suzi.

There is another option hiding here: maybe the vision fades because the brain isn't doing any representing at all! Shocking idea, but maybe all the brain does is respond to the "flow" of information. This is why there is change blindness: we aren't receiving images of the world and continually updating them; we're embedded in a flow of information and responding to changes in that flow.

Maybe the brain just isn't that important, or no more important than the eyes. Notice how you talk about the brain "interpreting" things. But thinking in engineering terms, imagine that instead of a computational model it's more like a governor device. There's no intervening information processing; the movement of the parts directly responds to the particular energy input or information flow, and the flow we're talking about is specific to the particular movement produced.

The best example I've heard is catching a ball. The computational model assumes we make a representation of a moment in time when the ball is hit, and then from subsequent observations of its change in position we internally calculate speed, trajectory, etc. Or... we just need a system to perceive the motion of the ball and then move our body to keep that motion in a particular relationship with our hands. Not much internal processing required; or whatever the brain does in all that, it's a very different function than the computational model assumes. It's not some master computational AI inside our head.
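
A toy simulation of that idea -- the "fielder" never computes a trajectory, it just steps toward where the ball currently is, every instant (a cartoon under made-up numbers, not a validated model of human catching):

```python
# Pure feedback, no prediction: the ball falls under gravity while the
# fielder repeatedly moves toward the ball's current ground position.
bx, by = 0.0, 30.0   # ball: horizontal position, height (m)
vx, vy = 8.0, 0.0    # ball velocity (m/s)
fx = 40.0            # fielder's position on the ground (m)
dt, g, speed = 0.02, 9.81, 10.0

while by > 0.0:
    vy -= g * dt                                   # gravity acts on the ball
    bx, by = bx + vx * dt, by + vy * dt            # ball moves
    fx += speed * dt * (1.0 if bx > fx else -1.0)  # step toward the ball

print(f"ball lands near x={bx:.1f} m; fielder is at x={fx:.1f} m")
```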

author

Hi Prudence. Thanks for adding an intriguing alternative to traditional views of perception and cognition.

I'm trying to understand what you are saying here, but I'm afraid I'm struggling.

One thing I don't understand is how 'being embedded in a flow of information' accounts for the difficulty in finding the change. If there is a change in this flow of information (as there is in change blindness), and we are "responding to changes in that flow", why would we sometimes have difficulty detecting the changes (as is the case with change blindness)?

The other thing that struck me was how passive this would be. I assume meaning would be in the information flow? So then I wonder how that meaning would be grounded. And if we want to say it's not grounded, how would we account for the fact that different people have different meanings?

The third thing is -- what do we do with all the evidence that movement seems to make a difference to our perception? That example where we are paralysed by the mad scientist -- this experiment has been done in monkeys -- they paralyse the monkey's eyes and keep their head still. The results showed that when the monkeys tried to move their eyes their vision faded faster than when they didn't try to move their eyes (even though their eyes never moved -- it's the trying that counts). I'm struggling to see how "being embedded in a flow of information" would account for these findings.


I was trying to say that perception is awareness of movement, of changes in the environment -- a dynamic system -- rather than the idea that the brain is some kind of computer that stores images and then processes them. So the idea that visual perception is tied to movement isn't surprising.

Change blindness depends on covering up the information that something has changed: in your example there is a camera cut; sometimes it's a distraction. Either way, the information of the change is hidden. If we stored representations, we could compare the two representations to find the difference. But to find it we need to focus on a small area and then wait to detect the change in information.

7 hrs ago · Liked by Suzi Travis

Excellent, as always.

author

Thanks Stuart!


Great article, Suzi! Especially your visual example for 'change blindness' - very effective 👍🏼

author

Thanks Karen!
