Hi Prudence. Thanks for adding an intriguing alternative to traditional views of perception and cognition.
I'm trying to understand what you are saying here, but I'm afraid I'm struggling.
One thing I don't understand is how 'being embedded in a flow of information' accounts for the difficulty in finding the change. If there is a change in this flow of information (as there is in change blindness), and we are "responding to changes in that flow", why would we sometimes have difficulty detecting the changes (as is the case with change blindness)?
The other thing that struck me was how passive this would be. I assume meaning would be in the information flow? So then I wonder how that meaning would be grounded. And if we want to say it's not grounded, how would we account for the fact that different people have different meanings?
The third thing is -- what do we do with all the evidence that movement seems to make a difference to our perception? That example where we are paralysed by the mad scientist -- a similar experiment has been done in monkeys: researchers paralyse the monkeys' eyes and keep their heads still. The results showed that when the monkeys tried to move their eyes, their vision faded faster than when they didn't try (even though their eyes never moved -- it's the trying that counts). I'm struggling to see how "being embedded in a flow of information" would account for these findings.
I was trying to say that perception is awareness of movement, of changes in the environment, a dynamic system, rather than the idea the brain is some kind of computer that stores images and then processes them. So the idea visual perception is tied to movement isn’t surprising.
Change blindness depends on covering up the information that something has changed, eg in your example there is a camera cut, sometimes it’s a distraction, but the information of the change is hidden. If we stored representations, we could compare the two representations to find the difference. But to find it we need to focus on a small area and then wait to detect the change in information.
I'm sorry Prudence, I completely misunderstood you. Thank you so much for clearing that up. Yes, I agree, the brain doesn't store images like a computer does.
And you are so right, the change blindness example only works because there is a slight break between the images. In this example, images are presented for 700ms and then a blank grey screen for 100ms. Without the 100ms break, the onset of the change would be interpreted as movement, and easily found. The break has the effect of disrupting any sustained pattern of neural activity.
Sometimes with a change blindness setup, I get the sense that I have detected the change in an area, but haven't yet worked out exactly what the change is. That feeling always fascinates me. It's like detecting a change and detecting the identity of the change are not the same thing.
Eye fixation duration and location during reading is comparable to Samantha’s viewing one object at a carnival, say, a Ferris wheel. Whereas the human reader has a sensational image that looks like [ferris wheel] the disembodied AI has a sensation like 🎡. Human vision sends the immediate image [ferris wheel] for immediate processing and an efference image that serves to feed forward in predictive mode. For example, the human eye might expect to “see” (limit the possibilities) an image like [tilt-a-whirl] during the upcoming fixation. Can you help me understand the conclusion cognitive scientists have sold as a “fact” that word perception is finished before the saccade? That [ferris wheel] is somehow routed in sequence through a phonological processor that recodes the letters into sounds in isolation, sends these sounds into long short-term memory, achieves lexical access (selects a meaning for [ferris wheel]), connects this meaning to an accumulating “mental model” of a text, and then begins again at ground zero to redo the process of word identification? What about this efference copy? Does it not apply to the visual perception of words as objects?
Hey Terry! These are excellent questions. It will take some time to answer (and I’ll need to brush up on the latest research), so perhaps it’s a good topic for an article?
You write, "Or are we talking about something completely different — a kind of intelligence so unlike our own ...."
Yes, I think so, an intelligence so unlike our own. I like this example...
Bacteria defend themselves from invading viruses by grabbing a chunk of the virus DNA, storing the virus DNA within the bacteria's own DNA, and then referencing the stored virus DNA to identify future invading viruses so as to present the most effective defense. When we do things like this we call it intelligence.
Bacteria have no brain or nervous system. Are they intelligent?
Maybe it's the concept of intelligence which isn't all that intelligent?
(I have no expertise in this so this is just an intuitive response to your excellent article.)
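The storage-and-lookup trick described above takes surprisingly little machinery. Here is a toy sketch of the idea in Python -- nothing like real bacterial biology, with made-up names and sequences -- just to show what "store a chunk of the invader, recognise it later" amounts to:

```python
# Toy illustration of the bacterial memory trick described above.
# All class names, snippet lengths, and sequences are invented for the example.

class BacterialMemory:
    def __init__(self, snippet_length=8):
        self.snippet_length = snippet_length
        self.spacers = set()  # stored chunks of previously encountered virus DNA

    def record_infection(self, virus_dna: str) -> None:
        # keep a short, characteristic chunk of the invader's sequence
        self.spacers.add(virus_dna[: self.snippet_length])

    def recognises(self, dna: str) -> bool:
        # a future invader is "recognised" if it contains any stored chunk
        return any(spacer in dna for spacer in self.spacers)


memory = BacterialMemory()
memory.record_infection("ATGCCGTAAGGCT")      # first encounter: store a snippet
print(memory.recognises("TTATGCCGTAGGC"))     # True: contains the stored snippet
print(memory.recognises("GGGGCCCCAAAA"))      # False: never seen before
```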
I don’t quite get why the AI perceptions would need to be embodied. I don’t expect that AI vision will necessarily work the same way as ours just as robots can have wheels instead of legs and microphones instead of malleus, incus, and stapes. They are nothing like our bits but they still work. Sometimes better than ours.
Our perceptions may be embodied but AIs or AI researchers might figure out how to do vision without our complexity. Many of our features ended up complex as a quirk of evolution and AIs may be able to skip that bit and come up with something better that doesn’t require a body or clever little interactions with the motor system.
They’ll need a camera for vision and microphones for hearing but I don’t think they’ll need all that other complexity.
The argument might be that there's a difference between active and passive perception. If perception is passive — input received, processed, then responded to — this same process can be applied to entities we don't typically assume have experiences. Even a simple thermostat does this to some degree: it receives temperature information, processes this input against a set threshold, and responds by turning heating or cooling on or off. However, we don't generally attribute consciousness or experiences to thermostats.
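To make that passive loop concrete, here is a minimal sketch of the thermostat's input-process-respond cycle; the setpoint, hysteresis, and readings are made up for illustration:

```python
# Minimal sketch of the passive "input -> process -> respond" loop described above.
# Values are illustrative only.

def thermostat_step(temperature_c: float, setpoint_c: float = 21.0,
                    hysteresis_c: float = 0.5) -> str:
    # receive an input, compare it against a fixed threshold, respond
    if temperature_c < setpoint_c - hysteresis_c:
        return "heating_on"
    if temperature_c > setpoint_c + hysteresis_c:
        return "cooling_on"
    return "idle"

for reading in [18.2, 20.9, 23.7]:
    print(reading, "->", thermostat_step(reading))   # input, a fixed rule, an output -- nothing more
```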
Under this passive view, we are often left with the unsatisfying conclusion that consciousness or experience emerges simply with enough complexity. The implication is that with sufficient sophistication in the input-process-respond system, consciousness somehow appears. But this leaves us wondering: At what point does a system become complex enough to generate consciousness? And why should increased complexity suddenly give rise to subjective experience?
I think the line to be crossed is to build an AI brain that is able to reflect on the sensations and combine them with memories and thoughts and desires and to consider the most desirable action. A thermostat does not do that; it simply responds to a sensation (the temperature).
There is still a line of complexity to be crossed of course, but it doesn’t require the separate mind of the dualists. Perhaps subjective experience or consciousness is just the ability to reflect on a sensation and replay it while we consider what to do with it.
I think you have raised some really good questions in this one about embodied consciousness and visual perception. Looking forward to reading what follows. Also, I couldn't point out the difference in the images which is fascinating.
This post suggests that our host Suzi has some sympathy for the embodied cognition folks. I suspect she doesn’t quite accept their title itself though — there’s probably too much questionable baggage associated that she isn’t also comfortable accepting. I don’t think I’d be comfortable using the “embodied cognition” title, that is unless there were a reasonable faction in it that had my own perspective regarding information. The position is that information can only exist as such to the extent that it informs something appropriate. For example, a DVD is only informational in the intended sense to the extent that it informs something such as a DVD player. This argument against Samantha being conscious is far stronger than what Suzi presented (though I certainly appreciate her argument as well). It mandates that there be some sort of quantifiable physics which brain information informs to exist as an experiencer of, for example, vision. So for Samantha to actually be conscious she’d need to do more than just trick people into believing she’s conscious (which has been the hope at least since Alan Turing’s imitation game). Here input information from cameras, speakers, and so on does need to be processed correctly (or the half that functional computationalists get right), though the processed information must also inform something appropriate. What might processed brain information inform to exist as the experiencer of light information for vision, and all the rest? Unfortunately this question seems rarely asked.
Beyond this post getting into some of the technicals of how our consciousness works, I enjoyed Suzi’s discussion with Mike at the top. When defined effectively my vote is that attention and consciousness should be considered the same. Here a “sparse consciousness” advocate might ask me how we can effectively drive cars, if we’re only able to attend what we’re conscious of? The thing to understand, I think, is that our consciousness over time effectively teaches the massively parallel brain computer to do things automatically, though we often take credit for them as if we did them consciously. So here we switch our singular serial consciousness (or attention) to various driving tasks, most of which we’ve already taught our brains to take care of automatically. I consider this to be a parsimonious explanation since there’s no need to either deflate or inflate our consciousness regarding attention, but rather incorporate the brain that creates consciousness/attention.
It’s known that the brain functions as a relatively slow computer, though it makes up for its slowness by functioning in a massively parallel way. This can be contrasted with consciousness itself, which functions serially and so must switch between different tasks one at a time. Notice that a unified electromagnetic field as consciousness should not only bind all elements of a conscious experience created from neural firing around the brain, but should also function serially rather than in parallel, given the unified physics of such a field.
Wouldn't it be implicit in EM field theory of consciousness that elements of consciousness would have more or less strength at any point in time? Driving a car with my mind focused on the lyrics of a song wouldn't preclude me from awareness of a car running a stop sign in front of me that I might need to swerve to avoid hitting. My attention might be serial in that I switch from lyrics to car, but my consciousness would have had to be of more than the subject of my attention.
Good point James! So I’ve been thinking about your question. I also took a bit of a look at Suzi’s doctoral thesis to see if my initial answer of attention/perception parity might be wrong. (Thanks Geoff!). And maybe I am wrong? How is my non-conscious brain able to effectively drive my truck from time to time, while conversely “I” read a message from you? Maybe it’s a “strength” thing as you suggest — I’m mainly attending to reading and slightly attending to driving such that I can switch the focus fast enough? I consider things to be a bit more discrete here however, as in I quickly monitor one task when it seems appropriate versus the other so my bad driving practices don’t get me into too much trouble.
I’ve mentioned consciousness teaching the massively parallel brain computer to do many things automatically so that consciousness itself doesn’t always need to do them (even though we arrogantly tend to feel otherwise). Notice that experienced drivers have far more ability to multitask than less experienced drivers. There’s another side to mention as well. For this I mean that consciousness itself is a multifaceted thing that’s bound together. When I’m consciously writing to you for example, that’s not the end of my consciousness — each moment is also colored by my concurrent vision, hearing, temperature perception, and so on. Consciously framing my thoughts without any of these other elements should be quite different from the rich form of consciousness that contains all such dynamics. So I consider them all unified, and yes, very much like an electromagnetic field that’s also inherently unified. Here my consciousness tends to wander through all of this discretely from moment to moment rather than from the continuous perspective suggested by "gradual degrees of strength". In Germany there was a Gestalt psychology movement that apparently faded with World War II, though it was very much supportive of the position that consciousness is both multifaceted and discrete. (It was a McFadden paper that enlightened me on this, of course.)
I’m not exactly suggesting that Suzi’s doctoral thesis is wrong. The “attention begets perception” position is fine for now, though could be rendered obsolete if what I’m saying happens to be true. Yes it does appear that what we’re motivated to attend to, does in itself facilitate what’s perceived. From a not yet accepted broader model however, it could be that a multifaceted consciousness provides gestalt options and so we actively attend what we perceive as most valuable in a wide and discrete momentary domain. And what might value exist as in the end? I consider this to be what feels best each moment. Suzi’s coming post regarding pain should get into value explicitly, so I’ll of course be interested!
You've hit on a hotly debated topic in cognitive science -- When in the processing stream does selection happen? People still debate about whether selection happens early or late, because there's evidence to support both views. Much of the recent work has been looking at how different types of attention work. For example, there's a difference between perceptual attention (attention towards sensory input) and cognitive attention (thinking).
Suzi, this is the kind of thought experiment I wish we saw more of, one that challenges our intuitions rather than flatters or hijacks them. Very well done!
The change blindness test calls attention to what is often called the "grand illusion": that we take in a rich visual field, when in reality the acute zone of the fovea can only take in a tiny portion of the field at a time. But the details are always there when we check, by moving our eyes, so it seems like we have a giant inner movie screen. This is often described as the brain "filling in" the details, but from what I've read, there's no evidence for that. Our visual intake is just much more sparse than our metacognition leads us to believe.
It seems like Samantha's form of vision would demand a lot more computation than what happens in our eyes. Of course, if she's in some vast quantum powered super computing cluster somewhere, we might imagine it. And it seems resonant with her admission of how many people she's concurrently talking with, a nice scene that tells us how alien she actually is. (At least that's how I remember it. It's been awhile.)
Thanks so much, Mike! It was fun to write.
Great point! There's lots of evidence that our conscious experience is far more sparse than we think it is. Although this is a contentious topic. It was also the topic of my PhD thesis.
There are three views. One is that we are conscious of more than we attend to (this would be the rich consciousness view), then there are the folks who argue that we attend to more than we are conscious of (this would be the sparse consciousness view), and the last group argue that everything we attend to is conscious (this would be the attention and consciousness are the same thing view). The majority of the evidence supports the second view -- we attend to more than we are conscious of. But there are some examples and evidence that folks who support the first view like to point to. The debate very often ends up being a debate about how we define and operationalise 'attention' and 'consciousness'.
'Filling in' is an unfortunate term. It gives the impression that there is some sort of movie screen in the head. As you point out, there is not. If by 'filling in' people mean that the brain makes some guesses based on input from the eyes, then sure. But I don't suspect that's what people typically mean when they say 'filling in'. I suspect most people think we can do something like add or change pixels to the image in our heads. Many visual illusions are great examples of the guessing that's going on. We can guess wrong.
I completely agree with your point about the efficiency problems AI faces. This is key, I think. From an evolutionary perspective, it would be far too costly (energy wise) to store and process input like Samantha would need to do. We need a visual system that is accurate enough to keep us alive, but not so accurate we die trying to be accurate.
Your thesis sounds very interesting. I might have to look it up, if it's accessible anywhere.
I think on another thread you mentioned that it covered Michael Graziano's attention schema theory, which fits with us attending to more than we're conscious of. And your point about the debate often being about how we define attention and consciousness matches a long running issue I've noticed for years, that consciousness is a very protean concept, one that has been stretched and contorted over the centuries, to the point that many debates about it are people arguing past each other with different definitions.
My take is that debating definitions is unproductive. It's better to acknowledge the different conceptions and label them, and then discuss each of them in turn. I think when that happens we end up with some productive concepts that can be tractably investigated, and others that are completely metaphysical and beyond science's ability to adjudicate.
On AI, one thing that I think confuses the issue is that we often talk about AGI (artificial general intelligence) as though we ourselves are a general intelligence. But we're not. We're intelligences heavily specialized for surviving in our environments. It's easier to see with non-human animals that brains exist to make movement decisions, with everything optimized for that, which I think your thought experiment illustrates beautifully.
This thread was incredible. Beautiful framing. Thanks Suzi and Mike.
Wow. Thanks Geoff!
Suzi’s PhD is on her website: https://suzitravis.com/neuroscience/. It gets detailed very quickly but I was able to understand Chapter 1. It’s beautifully written.
Thanks!
Thanks Geoff!
Wow. Even by your standards I'd say you've surpassed yourself this time. Fascinating and excellent!
Thank you so much! This one was fun to write, so I'm really glad you enjoyed it.
Very interesting and full of new questions for me! I have no relevant thoughts on visual perception but I would like to point out a type of perception that is often forgotten in essays and research alike: social perception.
I believe (assumption) that consciousness is primarily a social adaptation and therefore not best tested in individuals, but in social interactions. The protagonist and Her are getting into a relationship and although there is a bit of seeing the world through each others ‘eyes’, it is about the sharing, not the seeing.
Perhaps in the West we have become a bit too enamoured with the individual as the unit of knowledge. Side point: the mind is embodied, the brain is a body part
Great comment! I have a few articles in the pipeline that will discuss shared perception. I'm looking forward to reading your thoughts on the topic.
"the mind is embodied, the brain is a body part" -- yes, I agree!
It's interesting seeing you take on one science fictional view of AI, especially to explore 'disembodied' views of AI.
Doing nothing more than trawling my memory of SF takes on AI (a much written-about topic) I couldn't come up with too many.
Most SF AIs are, in some way, embodied. Think of Asimov's robots, for example. Though he might also have come closest to a disembodied AI (that I can remember off-hand) with the gigantic 'positronic brains' that dictated to the world in, say, "The Caves of Steel".
There are lots of SF 'robot' stories, of course - starting, perhaps, with Čapek's "Rossum's Universal Robots" (the source of the word, it seems, at least as far as those of the non-organic kind, pace Mary Shelley). But: most of those robots have bodies - many built as human-similar, but many not. A fairly standard trope is a spaceship with an AI in control. You'd have to think of that, I'd say, as 'embodied' - just not with our kind of body.
The same would (by extension, perhaps) apply to the AIs that run things in, say, John Varley's 'Eight Worlds' books and stories, or the 'Minds' in Iain M. Banks's 'Culture' universe (many as the AI in charge of spaceships, but many others in charge of 'orbitals' [think: ringworlds, if you get the Larry Niven reference] or 'rocks' etc.)
Of course, such kinds of 'embodiment' might well be very, very, different from our kind.
Poor 'Samantha' seems rather disembodied by comparison! (Even 'Doris' in Steve Perry's Void Fighter books - the closest I can think of to 'Samantha' - runs a large household with many sensors; takes proactive physical security measures; has access to shed-loads of input from a wider 'internet-equivalent' and, in general, has much more 'she' can do to compare actual actions against predicted outcomes.)
I'll also note, just by-the-by, that Samantha's 8316 parallel conversations (much lower in quantity, though not kind, than the Varley and Banks AIs) seem as if they might be an 'emergent' rather than fixed property (referencing an earlier response of mine) - if ~8K conversations were designed-in, I'd expect the limit to be 8192, not 124 higher...
Good point, AIs in science fiction are often portrayed as having a body. I don't watch and read as much SF as I would like to, but the only other disembodied AI I could think of was Hal from 2001: A Space Odyssey. Or maybe Hal is (kinda) embodied too, as you say. Hal was in control of the spaceship, right?
Interesting observation about 8,316 vs 8,192. You're right, the non-power-of-2 number does make you think. I wonder if the writers were aware of this, or if they just chose a random number that sounded big at the time.
Fun read! The wild thing about that flip-flop GIF is that once you spot the difference, it's very present from then on. It seems almost glaring, and it makes you wonder why it took so long to spot. And funny thing, I was thinking it might be fun to grab the GIF and write some code to compare the two images. Frame comparison would make the difference pop out immediately.
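That frame-comparison idea really would make the change pop out. Here is a rough sketch of what it could look like with Pillow; the filename and frame indices are assumptions, since the actual GIF also contains grey mask frames between the two street scenes:

```python
# Rough sketch: diff two frames of the GIF and report where they differ.
# "change_blindness.gif" and the frame indices are hypothetical.

from PIL import Image, ImageChops, ImageSequence

gif = Image.open("change_blindness.gif")
frames = [frame.convert("RGB") for frame in ImageSequence.Iterator(gif)]

diff = ImageChops.difference(frames[0], frames[2])  # assumed positions of the two street scenes
print("Changed region:", diff.getbbox())            # bounding box of non-zero pixels; None if identical
```

In practice you would probably want to threshold away small compression differences before reading off the bounding box, otherwise noise across the whole image can mask the real change.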
As far as Samantha's visual perceptions, she would have access to that reality's equivalent of Google Maps, Google Earth (and Street View), and publicly available webcams -- all resources we have now. She would be able to build as detailed of a 3D model of Her world as she desired.
We don't know what features His camera has, but it might have lidar or other distance sensors. It didn't appear to have two lenses, and wasn't big in any case, so Samantha wouldn't have parallax vision, but might have lidar data. Regardless, the camera was moving around, and this would allow Samantha to build a good 3D model from the moving images -- something already done today in movie CGI work. We have software capable of constructing a 3D model from a series of images of a space.
This also means Samantha could build a predictive model. We do that today when we create a 3D model of something (like weather) and run it forward to see what will happen. An AI such as Samantha could also integrate feedback (like we do) to correct the model in real time. This, to some extent, is what's going on in robotics and self-driving cars.
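The predict-then-correct loop being described is easy to caricature in code. A bare-bones sketch, tracking a single made-up quantity with an arbitrary correction gain -- nothing like a real robotics stack, just the shape of the feedback idea:

```python
# Toy "predict, observe, correct" loop. The gain, velocity, and data are invented.

def predict(position: float, velocity: float, dt: float = 1.0) -> float:
    return position + velocity * dt                   # run the internal model forward

def correct(predicted: float, observed: float, gain: float = 0.5) -> float:
    return predicted + gain * (observed - predicted)  # nudge the model toward the observation

estimate, velocity = 0.0, 1.0
for observation in [1.2, 1.9, 3.1, 4.0]:
    guess = predict(estimate, velocity)
    estimate = correct(guess, observation)
    print(f"predicted {guess:.2f}, observed {observation:.2f}, corrected {estimate:.2f}")
```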
Yes! That moment he discovers Her "infidelity" is shattering. (The ending is poignant, too.) Very good point about our single-tasking brains. For computers, multi-tasking is trivially easy, they've been doing it almost since the beginning. Multics -- "MULTiplexed Information and Computing Service" -- 1969. And Samantha could devote another thread of Her consciousness to integrating and supervising all those conversations. And another to sift through them and reflect. The ability to parcel out distinct versions of yourself for specific tasks would be FUN!
In the end, to the extent the movie is credible, it argues that *bodies* may not be necessary. Inputs of some kind probably are, but maybe not embodiment, per se.
I was wondering about the world-building thing, and I was hoping someone who knew more than me would comment on it. These are great points.
The feedback point is an interesting one. In Samantha's case, she needs feedback from a human. Without the ability to actively explore the world, Samantha lacks an autonomous feedback loop, no? Her loop involves a human step.
Unlike humans, with their active and seemingly constant feedback from the visual world, it seems to me that Samantha's feedback would be received passively and limited to discrete moments of human interaction. Her feedback would be sparse, not continuous, not active, and at least one (more) step removed from the source (so likely influenced by errors).
Brains are curious things, aren't they? They process many things in parallel, but seem to focus on one task at a time when taking action.
Much of my love of SF comes from the world-building authors do. Many of their insights have come to pass. (And many not so much. I'm always amused by the SF that thought fax would be a thing on spaceships, and it's astonishing how few predicted personal communication devices. Something the Dick Tracy comic got right. But now I want one of those flying trashcans.)
The movie doesn't give us much about how Samantha works, how she was created, or what capabilities the hardware has. Better than 2013, certainly. LLMs didn't come around until 2020, but we might assume she's a product of that research or something similar.
Projecting from the capabilities we have today, Samantha would have access to streaming video (and all the other interweb resources) and could build her predictive and interactive capabilities by "watching" movies and (as we sometimes do) "guessing" what comes next. Then using the differences between her guesses and what does happen to improve her model. She potentially could "experience" more in an hour than we do in days. We can assume she functions at least as fast as today's machines, and probably faster.
But, yeah, this is all passive to some extent, and you'd have to think real-time interaction with a human would be different. Working without a net, so to speak. Makes one wonder if there might be some equivalent of stage fright. A nervous shuffling of the RAM?
I think "parallel" has distinct meaning between brains and computers. It might be another place the brain=computer analogy is misleading. Our parallelism is always directed at a single task, having a mind. A comparison might be the parallelism of an orchestra, all the players cooperating towards a single symphonic goal. In parallelism in computers, we have separate tasks -- the whole point of the parallelism. Sometimes that's accomplished with separate hardware, and sometimes a single CPU task-switches rapidly, something very easy for CPUs. More and more I agree with those thinkers who've said the brain=computer thing has been misleading.
Another extraordinary thread. Watching experts interact like this is pretty incredible.
😊
Suzi, two bits that feel complementary to my new understanding of why my future AI robots will require a body:
1. Video compression codecs use key frames that send the entire image, followed by smaller delta frames that encode only the changes, for efficiency. So AI eyes could 'see' even more differently than we do and have an easier time passing your change blindness test (I never saw the difference 😭). A rough sketch of the idea follows below.
2. Spike Jonze was interviewed at a recent tech conference (can't find the reference) but anyhoo they of course wanted to push him on how he came up with Samantha in Her and AI and blah blah. He said nope, the movie has nothing to do with AI; it was about him dealing with his breakup from Sofia Coppola. So!
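On point 1, here is a toy sketch of the key-frame / delta-frame structure -- not a real codec, just the idea that for a system fed deltas, a change in the scene arrives as an explicit update rather than something to hunt for. The data and names are made up:

```python
# Toy key-frame + delta-frame encoder. Frames are flat lists of pixel values.

def encode(frames):
    key = frames[0]                                   # full image sent once
    deltas = []
    for prev, curr in zip(frames, frames[1:]):
        # record only the pixel positions that changed since the previous frame
        deltas.append({i: v for i, (u, v) in enumerate(zip(prev, curr)) if u != v})
    return key, deltas

frames = [
    [0, 0, 5, 5],     # "street scene" as a flat list of pixel values
    [0, 0, 5, 5],     # identical frame -> empty delta
    [0, 0, 0, 5],     # the "lamp" pixel disappears -> delta pinpoints it
]
key, deltas = encode(frames)
print(deltas)         # [{}, {2: 0}] -- the change arrives labelled with its location
```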
Hi Andrew!
On Jonze's comment, it really is a movie about being human, isn't it!? I don't think I realised how much until I watched it again recently.
Your robot body insight ties nicely with both of your points -- embodiment shapes perception and emotions in ways we're still discovering.
(ps. I've added the change blindness answer to the end of the article)
Does the retinotopic organization of the visual cortex contain the entire scene?
Hi James, great question!
The answer is, sort of.
The reason for the 'sort of' is that neurons in the visual cortex are influenced by both incoming input from the eyes (bottom-up) and input coming from 'higher' areas in the brain (top-down). So the neurons in the visual cortex are always biased in some way.
This might be because you have a goal to search for something in particular.
Say you are searching for something red -- let's say your keys have a red keyring. Neurons in V4 respond strongly to colour and, as you say, they are retinotopically mapped. When you are searching for your red keyring, input from the world will stimulate neurons in your visual cortex. But top-down signals will also act. They will cause the enhancement of firing rates of the neurons that like to respond to red AND they will also suppress the firing rates of the neurons that like to respond to other irrelevant colours (e.g. green). So, the incoming input from the world won't stimulate the cortex in a purely bottom-up way like a camera lens would.
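If it helps to make the enhancement/suppression idea concrete, here is a toy illustration -- emphatically not a model of V4, just made-up numbers -- showing how the same bottom-up input yields a biased population response once a top-down gain is applied:

```python
# Toy illustration of feature-based gain modulation. All values are invented.

bottom_up = {"red": 1.0, "green": 1.0, "blue": 1.0}   # equal sensory drive from the world

def apply_top_down(drive, target="red", boost=1.5, suppress=0.6):
    # enhance the target feature's response, suppress the irrelevant ones
    return {colour: value * (boost if colour == target else suppress)
            for colour, value in drive.items()}

print(apply_top_down(bottom_up))
# {'red': 1.5, 'green': 0.6, 'blue': 0.6} -- same input, biased response,
# which is why the cortex never behaves like a purely bottom-up camera sensor.
```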
How much non-attended features and spatial locations stimulate the cortex is the subject of ongoing research. To me, this type of research is super interesting, because it starts to get at the question of how much of the data we receive influences the system, but in a completely unconscious way.
It's probably also worth noting that activity in the visual cortex is fleeting and not really stored in a long-term way.
You say "great question" but I wondered after posting it if it was too obtuse.
It would seem to me that the whole scene must be in the retinotopic mapping of the visual cortex; otherwise, anything occurring in another part of my vision (for example, a cat offscreen) during the examination of a particular location of the scene wouldn't be visible. Certainly the eyes are moved to center my vision on particular sections of the scene as I scan it, but it seems something more is going on.
The other problem is we are looking for differences in the two versions of the image. To detect a difference requires holding both versions in memory at the same time. Is the visual cortex able to hold two versions at the same time?
I'm wondering if the difference detection is happening somewhere else. We might suspect somewhere in the frontal lobe, but there is also increasing evidence of involvement of the hippocampus in vision integration.
"Perhaps the strongest evidence for hippocampal contributions to vision comes from studies of scene perception. In one study (Lee, Bussey, et al., 2005), patients with MTL damage were tested on a scene discrimination task in which they were presented with two scenes morphed to different degrees between two endpoint scenes. The task was to determine which of the morphs was most similar to one of the endpoint scenes presented simultaneously as a third stimulus on the screen. The same task was also used to test discrimination of faces, objects, colors, and art. Patients with focal hippocampal lesions were impaired on scene discrimination, especially when the two scenes that needed to be discriminated were closer morphs."
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6881556/
The research would suggest that if the cat is not moving (or drawing attention to itself in any way) and you're not looking for cats, then you would be unaware of the cat. It seems strange, but that's what the majority of the research suggests. Remember the gorilla example? Where you are asked to count the passes between people wearing white shirts. The first time you saw this, you probably had no idea that a person in a black gorilla suit walks onto the court, stops and beats its chest, and then walks off the court. You don't see the gorilla unless you are paying attention to the gorilla.
Good point, we would have some sort of memory that is at least keeping track of where we have checked.
There's also evidence that scenes are processed differently to objects in a scene. This might account for the feeling that we are taking in the entire scene, but clearly we are not processing all of the objects and details in the scene.
Cool paper, Nick does great work. I haven't read this one, so thanks for the link.
I was trying to figure where the not moving cat would get dropped from awareness. Is it dropped before it even reaches the brain or after?
The gorilla experiment seems related but different because some subset of viewers will see and be aware of the gorilla; whereas the non-moving object, I think, drops from everybody's awareness.
There are some bottlenecks at the retina, in the fact that cells in the retina respond selectively to particular wavelengths within the visible spectrum, but the rest needs to be dealt with by the brain.
I'm not sure that the cat and gorilla are that different. I guess if we didn't ever move our eyes (or our head) they would be different. But it seems to me that, because we move our eyes, they would both be situations where we miss something because those things are not attended. Some people will see the gorilla because their attention is captured by the gorilla. And I assume some people would also see the cat for the same reason.
How would it be if we compared hearing?
I'm in a crowded room of people conversing. I filter my hearing to the person I'm conversing with but still would probably hear a different conversation if my name were mentioned.
Coherent consciousness seems to be all about filtering. Compare Bergson's filter theory. We have to pick out the apparently relevant from the irrelevant because we have limited computational power.
On why it’s so hard to detect what’s missing in one picture versus the other, I think it’s disguised entirely by movement. This is to say that the first frame is slightly offset from the second, so the whole thing moves. I consider us to have an inherent short-term memory of images, and this is what lets us detect movement, or the thing that changes. They would have proved my thesis here, I think, if they’d also cycled the two images in the exact same spot. If in that case we’d easily see the “moving” difference then I’m right, or if not then I’m wrong.
The first frame is not offset on the x or y axis. But there is an offset in time.
There are actually four images presented: Two images of the street and two grey images. The street scene images are presented for 700ms, and the grey images are presented for 100ms. The second street image was edited using Canva’s eraser feature to remove the street lamp. When I made it, I didn’t realise that the eraser feature also compresses the image, so there is a slight pixel difference throughout the entire image. This is what you might be picking up as movement. I got the sense of movement too, but I suspect the perception of movement might be mostly an illusion. I've added a gif with the grey screens removed. (You can see the pixel differences if you zoom in close on that gif).
https://suzitravis.substack.com/i/149551360/change-blindness-answer
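If it helps to see the timing written out, here is a minimal sketch of the four-frame cycle described above. The show_image function is a hypothetical stand-in for whatever actually draws the stimulus (the real gif was made in Canva, not with code); the sketch only illustrates the 700ms/100ms alternation.

```python
# A minimal sketch of the flicker-paradigm timing described above.
# show_image() is a hypothetical stand-in for a real display call.
import itertools
import time

IMAGE_MS = 700   # each street-scene image is shown for 700 ms
BLANK_MS = 100   # each grey blank is shown for 100 ms

def show_image(name, duration_ms):
    """Pretend to display a frame by printing it and holding for its duration."""
    print(f"showing {name} for {duration_ms} ms")
    time.sleep(duration_ms / 1000)

def flicker(original, edited, cycles=5):
    """Alternate the two scenes, separated by grey blanks.

    The 100 ms blank masks the transient that would otherwise make the
    change pop out as apparent motion.
    """
    sequence = [original, "grey blank", edited, "grey blank"]
    for frame in itertools.islice(itertools.cycle(sequence), cycles * len(sequence)):
        duration = BLANK_MS if frame == "grey blank" else IMAGE_MS
        show_image(frame, duration)

flicker("street scene (with lamp)", "street scene (lamp erased)")
```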
Thanks Suzi, I realized my mistake about the image movement after I made that spur of the moment comment. But rather than try to edit my comment I decided to just let it go. So yes, it clearly was the grey frames that disguised the difference. Of course you proved this by removing those frames, and in accordance with theory, the difference became quite clearly displayed as movement between the two images.
I could not see a difference between the images. It's driving me nuts! Please point out the difference.
Sorry Max! I've now added the answer to the end of the article.
Great thought experiment Suzi.
There is another option hiding here, maybe the vision fades because the brain isn’t doing any representing at all! Shocking idea, but maybe all the brain does is respond to the “flow” of information. This is why there is change blindness, we aren’t receiving images of the world and continually updating them, we’re embedded in a flow of information and responding to changes in that flow.
Maybe the brain just isn’t that important, or no more important than the eyes. Notice how you talk about the brain "interpreting" things. But thinking in engineering terms, imagine that instead of a computational model it’s more like a governor device. There's no intervening information processing; the movement of the parts directly responds to the particular energy input or information flow, and the flow we’re talking about is specific to the particular movement produced.
The best example I’ve heard is catching a ball. The computational model assumes we make a representation of a moment in time when the ball is hit, and then from subsequent observations of its change in position we internally calculate speed, trajectory etc. Or…. We just need a system to perceive the motion of the ball and then we move our body to keep that motion in a particular relationship with our hands. Not much internal processing required, or whatever the brain does in all that, it’s a way different function than the computational model assumes. It's not some master computational AI inside our head.
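To make the contrast concrete, here is a toy sketch (all numbers invented) of catching with no trajectory computation at all: the catcher never predicts a landing point, it just keeps nulling the perceived horizontal gap to the ball, step by step. Real fielders seem to use subtler heuristics (such as keeping the ball's optical rise rate roughly constant), so this is only an illustration of the general idea that continuous coupling between perception and movement can do the work that prediction is usually assumed to do.

```python
# Toy sketch: a "catcher" guided by pure feedback, with no internal model
# of the ball's trajectory. All numbers are made up for illustration.

G = 9.8    # gravity, m/s^2
DT = 0.02  # simulation time step, s

def simulate(ball_vx=6.0, ball_vy=18.0, catcher_x=30.0, catcher_speed=6.5):
    bx, by, vy = 0.0, 0.0, ball_vy
    while True:
        # ball: simple projectile motion
        bx += ball_vx * DT
        vy -= G * DT
        by += vy * DT
        if by <= 0.0:
            break
        # catcher: no prediction, just step toward where the ball currently is
        gap = bx - catcher_x
        step = max(-catcher_speed * DT, min(catcher_speed * DT, gap))
        catcher_x += step
    print(f"ball lands at x = {bx:.1f} m, catcher is at x = {catcher_x:.1f} m")

simulate()
```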
Hi Prudence. Thanks for adding an intriguing alternative to traditional views of perception and cognition.
I'm trying to understand what you are saying here, but I'm afraid I'm struggling.
One thing I don't understand is how 'being embedded in a flow of information' accounts for the difficulty in finding the change. If there is a change in this flow of information (as there is in change blindness), and we are "responding to changes in that flow", why would we sometimes have difficulty detecting the changes (as is the case with change blindness)?
The other thing that struck me was how passive this would be. I assume meaning would be in the information flow? So then I wonder how that meaning would be grounded. And if we want to say it's not grounded, how would we account for the fact that different people have different meanings?
The third thing is -- what do we do with all the evidence that movement seems to make a difference to our perception? That example where we are paralysed by the mad scientist -- this experiment has been done in monkeys -- they paralyse the monkey's eyes and keep their head still. The results showed that when the monkeys tried to move their eyes their vision faded faster than when they didn't try to move their eyes (even though their eyes never moved -- it's the trying that counts). I'm struggling to see how "being embedded in a flow of information" would account for these findings.
I was trying to say that perception is awareness of movement, of changes in the environment, a dynamic system, rather than the idea the brain is some kind of computer that stores images and then processes them. So the idea visual perception is tied to movement isn’t surprising.
Change blindness depends on covering up the information that something has changed, eg in your example there is a camera cut, sometimes it’s a distraction, but the information of the change is hidden. If we stored representations, we could compare the two representations to find the difference. But to find it we need to focus on a small area and then wait to detect the change in information.
I'm sorry Prudence, I completely misunderstood you. Thank you so much for clearing that up. Yes, I agree, the brain doesn't store images like a computer does.
And you are so right, the change blindness example only works because there is a slight break between the images. In this example, images are presented for 700ms and then a blank grey screen for 100ms. Without the 100ms break, the onset of the change would be interpreted as movement, and easily found. The break has the effect of disrupting any sustained pattern of neural activity.
Sometimes with a change blindness setup, I get the sense that I have detected the change in an area, but haven't yet worked out exactly what the change is. That feeling always fascinates me. It's like detecting a change and detecting the identity of the change are not the same thing.
Eye fixation duration and location during reading is comparable to Samantha’s viewing one object at a carnival, say, a Ferris wheel. Whereas the human reader has a sensational image that looks like [ferris wheel] the disembodied AI has a sensation like 🎡. Human vision sends the immediate image [ferris wheel] for immediate processing and an efference image that serves to feed forward in predictive mode. For example, the human eye might expect to “see” (limit the possibilities) an image like [tilt-a-whirl] during the upcoming fixation. Can you help me understand the conclusion cognitive scientists have sold as a “fact” that word perception is finished before the saccade? That [ferris wheel] is somehow routed in sequence through a phonological processor that recodes the letters into sounds in isolation, sends these sounds into long short-term memory, achieves lexical access (selects a meaning for [ferris wheel]), connects this meaning to an accumulating “mental model” of a text, and then begins again at ground zero to redo the process of word identification? What about this efference copy? Does it not apply to the visual perception of words as objects?
Hey Terry! These are excellent questions. It will take some time to answer (and I’ll need to brush up on the latest research), so perhaps it’s a good topic for an article?
I’d love it!!!
Excellent, as always.
Thanks Stuart!
Great article, Suzi! Especially your visual example for 'change blindness' - very effective 👍🏼
Thanks Karen!
You write, "Or are we talking about something completely different — a kind of intelligence so unlike our own ...."
Yes, I think so, an intelligence so unlike our own. I like this example...
Bacteria defend themselves from invading viruses by grabbing a chunk of the virus DNA, storing the virus DNA within the bacteria's own DNA, and then referencing the stored virus DNA to identify future invading viruses so as to present the most effective defense. When we do things like this we call it intelligence.
Bacteria have no brain or nervous system. Are they intelligent?
Maybe it's the concept of intelligence which isn't all that intelligent?
Great discussion and I did enjoy the film. I was surprised that it was released in 2013. That seems quite a while ago.
(I have no expertise in this so this is just an intuitive response to your excellent article.)
I don’t quite get why the AI perceptions would need to be embodied. I don’t expect that AI vision will necessarily work the same way as ours just as robots can have wheels instead of legs and microphones instead of malleus, incus, and stapes. They are nothing like our bits but they still work. Sometimes better than ours.
Our perceptions may be embodied but AIs or AI researchers might figure out how to do vision without our complexity. Many of our features ended up complex as a quirk of evolution and AIs may be able to skip that bit and come up with something better that doesn’t require a body or clever little interactions with the motor system.
They’ll need a camera for vision and microphones for hearing but I don’t think they’ll need all that other complexity.
The argument might be that there's a difference between active and passive perception. If perception is passive — input received, processed, then responded to — this same process can be applied to entities we don't typically assume have experiences. Even a simple thermostat does this to some degree: it receives temperature information, processes this input against a set threshold, and responds by turning heating or cooling on or off. However, we don't generally attribute consciousness or experiences to thermostats.
Under this passive view, we are often left with the unsatisfying conclusion that consciousness or experience emerges simply with enough complexity. The implication is that with sufficient sophistication in the input-process-respond system, consciousness somehow appears. But this leaves us wondering: At what point does a system become complex enough to generate consciousness? And why should increased complexity suddenly give rise to subjective experience?
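For what it's worth, here is how little machinery the thermostat version of input-process-respond needs. Everything in this sketch (the sensor stub, the setpoint, the deadband) is hypothetical; the point is only that the whole loop fits in a few lines, which is part of why complexity alone feels like an unsatisfying criterion.

```python
# Minimal input-process-respond loop: a thermostat.
# read_temperature() is a hypothetical stand-in for a real sensor read.

def read_temperature():
    """Pretend sensor reading in degrees Celsius."""
    return 22.5

def thermostat_step(setpoint=21.0, deadband=0.5):
    temp = read_temperature()           # input
    if temp > setpoint + deadband:      # process: compare against thresholds
        return "cooling on"             # respond
    if temp < setpoint - deadband:
        return "heating on"
    return "idle"

print(thermostat_step())  # -> "cooling on" for the dummy reading above
```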
I think the line to be crossed is to build an AI brain that is able to reflect on the sensations and combine them with memories and thoughts and desires and to consider the most desirable action. A thermostat does not do that; it simply responds to a sensation (the temperature).
There is still a line of complexity to be crossed of course, but it doesn’t require the separate mind of the dualists. Perhaps subjective experience or consciousness is just the ability to reflect on a sensation and replay it while we consider what to do with it.
I think you have raised some really good questions in this one about embodied consciousness and visual perception. Looking forward to reading what follows. Also, I couldn't point out the difference in the images which is fascinating.
This post suggests that our host Suzi has some sympathy for the embodied cognition folks. I suspect she doesn’t quite accept their title itself though — there’s probably too much questionable baggage associated with it that she isn’t comfortable accepting. I don’t think I’d be comfortable using the “embodied cognition” title, that is unless there were a reasonable faction in it that had my own perspective regarding information.

The position is that information can only exist as such to the extent that it informs something appropriate. For example, a DVD is only informational in the intended sense to the extent that it informs something such as a DVD player. This argument against Samantha being conscious is far stronger than what Suzi presented (though I certainly appreciate her argument as well). It mandates that there be some sort of quantifiable physics which brain information informs to exist as an experiencer of, for example, vision.

So for Samantha to actually be conscious she’d need to do more than just trick people into believing she’s conscious (which has been the hope at least since Alan Turing’s imitation game). Here input information from cameras, speakers, and so on does need to be processed correctly (or the half that functional computationalists get right), though the processed information must also inform something appropriate. What might processed brain information inform to exist as the experiencer of light information for vision, and all the rest? Unfortunately this question seems rarely asked.
Beyond this post getting into some of the technicals of how our consciousness works, I enjoyed Suzi’s discussion with Mike at the top. When defined effectively my vote is that attention and consciousness should be considered the same. Here a “sparse consciousness” advocate might ask me how we can effectively drive cars, if we’re only able to attend what we’re conscious of? The thing to understand, I think, is that our consciousness over time effectively teaches the massively parallel brain computer to do things automatically, though we often take credit for them as if we did them consciously. So here we switch our singular serial consciousness (or attention) to various driving tasks, most of which we’ve already taught our brains to take care of automatically. I consider this to be a parsimonious explanation since there’s no need to either deflate or inflate our consciousness regarding attention, but rather incorporate the brain that creates consciousness/attention.
It’s known that the brain functions as a relatively slow computer, though it makes up for its slowness by functioning in a massively parallel way. This can be contrasted with consciousness itself, which functions serially and so must switch between different tasks one at a time. Notice that a unified electromagnetic field as consciousness should not only bind all elements of a conscious experience created from neural firing around the brain, but should also function serially rather than in parallel, given the unified physics of such a field.
Wouldn't it be implicit in the EM field theory of consciousness that elements of consciousness would have more or less strength at any point in time? Driving a car with my mind focused on the lyrics of a song wouldn't preclude me from awareness of a car running a stop sign in front of me that I might need to swerve to avoid hitting. My attention might be serial in that I switch from lyrics to car, but my consciousness would have had to be of more than the subject of my attention.
Good point James! So I’ve been thinking about your question. I also took a bit of a look at Suzi’s doctoral thesis to see if my initial answer of attention/perception parity might be wrong. (Thanks Geoff!). And maybe I am wrong? How is my non-conscious brain able to effectively drive my truck from time to time, while conversely “I” read a message from you? Maybe it’s a “strength” thing as you suggest — I’m mainly attending to reading and slightly attending to driving such that I can switch the focus fast enough? I consider things to be a bit more discrete here however, as in I quickly monitor one task when it seems appropriate versus the other so my bad driving practices don’t get me into too much trouble.
I’ve mentioned consciousness teaching the massively parallel brain computer to do many things automatically so that consciousness itself doesn’t always need to do them (even though we arrogantly tend to feel otherwise). Notice that experienced drivers have far more ability to multitask than less experienced drivers. There’s another side to mention as well. For this I mean that consciousness itself is a multifaceted thing that’s bound together. When I’m consciously writing to you for example, that’s not the end of my consciousness — each moment is also colored by my concurrent vision, hearing, temperature perception, and so on. Consciously framing my thoughts without any of these other elements should be quite different from the rich form of consciousness that contains all such dynamics. So I consider them all unified, and yes, very much like an electromagnetic field that’s also inherently unified. Here my consciousness tends to wander through all of this discretely from moment to moment rather than from the continuous perspective suggested by “gradual degrees of strength”. In Germany there was a Gestalt psychology movement that apparently faded with World War II, though it was very much supportive of the position that consciousness is both multifaceted and discrete. (It was a McFadden paper that enlightened me on this, of course.)
I’m not exactly suggesting that Suzi’s doctoral thesis is wrong. The “attention begets perception” position is fine for now, though could be rendered obsolete if what I’m saying happens to be true. Yes it does appear that what we’re motivated to attend to, does in itself facilitate what’s perceived. From a not yet accepted broader model however, it could be that a multifaceted consciousness provides gestalt options and so we actively attend what we perceive as most valuable in a wide and discrete momentary domain. And what might value exist as in the end? I consider this to be what feels best each moment. Suzi’s coming post regarding pain should get into value explicitly, so I’ll of course be interested!
Interesting, as always!
You've hit on a hotly debated topic in cognitive science -- When in the processing stream does selection happen? People still debate about whether selection happens early or late, because there's evidence to support both views. Much of the recent work has been looking at how different types of attention work. For example, there's a difference between perceptual attention (attention towards sensory input) and cognitive attention (thinking).