25 Comments
Daniel Nest:

You did it again.

I propose a new tagline for your Substack: "Everything you thought was simple is actually much more complex when you start to think about it"

(On second thought, that tagline sucks. Forget I ever said anything.)

But thanks to this post, I can now sound smart when I tell my poker buddies that Texas Hold 'Em is really all about minimizing information entropy while maximizing your bets.

At which point I will no longer have any poker buddies.

Suzi Travis:

Hilarious! As always. I'm stealing that tagline, btw. But I'm changing two words... "Everything you thought was boring is actually much more interesting when you start to think about it"

Alejandro Piad Morffis:

Brilliant presentation, as usual!

Just want to add that there's an equivalent interpretation of information being quantified as the negative log of the probability of the event: it gives you the minimum size (in bits) of a message encoding the outcome of that event, in a hypothetical language in which receiver and transmitter share all knowledge except for that outcome. This has, of course, very practical implications in the design of Internet communication protocols: it gives us, for example, the optimal compression rate we can expect for any given piece of data, and the size the redundancy payload needs to be to recover from different types of errors.
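A minimal sketch of that reading, assuming equally likely outcomes for the card case (the function name here is just illustrative):

```python
import math

def self_information_bits(probability):
    """Self-information of an outcome: -log2(p).
    This is the minimum number of bits a message needs, on average,
    to report that this outcome occurred."""
    return -math.log2(probability)

# One specific card drawn from a shuffled 52-card deck
print(self_information_bits(1 / 52))   # ~5.70 bits

# A near-certain event carries almost no information
print(self_information_bits(0.999))    # ~0.0014 bits
```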

But it also has a rather profound meaning in computational complexity: it gives us a way to talk about the optimal program for any given problem, e.g., the smallest Turing machine that can decide whether an arbitrary string is a solution to the problem or not. And this, in turn, has profound implications for machine learning and AI, as it speaks to inherent limitations of learning algorithms.

Suzi Travis:

Ah! Brilliant. Thank you. Is this Kolmogorov’s information theory? Or something similar?

Alejandro Piad Morffis:

Yes! Part of it at least.

Arturo Macias:

This has been inspiring. Just one obvious observation: unlike weight or volume, information is not a characteristic of reality, but of our relation to it. I have one information set, you have another. It is difficult to understand how information can be the basis of consciousness, because consciousness is intrinsic.

But the structure of information flows does look related to consciousness. Tononi's experiments on the kinds of information flows associated with consciousness (wakefulness or conscious dreaming) versus deep sleep are convincing. IIT, of course, is a wild extrapolation, but what else do we have?

Suzi Travis:

I agree, if we define information in this way, it is difficult to see how information can be the basis of consciousness. But Tononi's definition of information is radically different to Shannon's Information Theory. Tononi's definition fits more with the etymology of the word, which is to do with meaning and having the ability to make changes in the world. More on that next week.

I'm guessing by the question, 'what else do we have?', you mean, what other theories of consciousness do we have? Other than IIT, I think we're up to 19 other scientific theories at the moment. So, we have options (and plenty of room for disagreements) 🤣

Nick Potkalitsky:

Great work! I am slightly suspicious of the notion that information is observer dependent. It seems to set some other problems in motion. Unless we distinguish info from knowledge, but then we set up a potential parallel-path problem that begs the question of where the two meet. It is interesting how many of these 20th-century notions are so practically useful, but ultimately hinge on the idea that we have very little connection with each other or the actual universe. I don’t know what to make of that.

Suzi Travis:

I know, right! And I'm trying to avoid the "what is knowledge?" question, because that one's a doozy!

It's interesting, isn't it, when theories like Information Theory are so enormously useful, but also seem disconnected from our everyday use of the word, and maybe even disconnected from the way we think about reality.

Wyrd Smythe:

An excellent overview!

>> "Information entropy is similar to entropy in physics, but it’s not exactly the same."

Yes! A point often missed. Some regret Shannon using the same term, and I see their point, but there are strong similarities between the notions of disorder, uncertainty, surprise, the ability to do work, and the original physical definition (the log of the number of micro-states for a given macro state).
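For reference, these are the two standard textbook definitions being compared (not quoted from the post itself):

```latex
% Boltzmann (physical) entropy, where W is the number of microstates
% compatible with a given macrostate and k_B is Boltzmann's constant:
S = k_B \ln W

% Shannon (information) entropy of a source with outcome probabilities p_i:
H = -\sum_i p_i \log_2 p_i
```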

>> "The order of the cards has very low information entropy."

In fact, a pristine deck with a known order would have *zero* entropy! There is only one state that is perfectly sorted, and as you mentioned, log_2(1)=0.

Another way to look at that 5.7+ bits is that it requires that many bits to count to 52, so we can index any card with 5.7+ bits. Essentially the same thing as the six questions you list, but here simply assigning a unique six-bit pattern to each card. (Since 2^6=64, we'd have 12 patterns left over.)

I might be wrong about this, but I believe the entropy of a well-shuffled deck is log_2(52!), which is a little over 255 bits -- the number of bits required to index all possible orderings.

Suzi Travis:

Ah! Yes! If we wanted to determine the entire order of the deck it would be log2(52!) (the number of bits needed to represent all possible permutations of 52 cards), which is approximately 225.58 bits. But if we just want to know the top card, log2(52) works. It gives the number of bits needed to single out one of 52 equally likely cards, which is around 5.7 bits.
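A quick check of both numbers, as a minimal Python sketch:

```python
import math

# Entropy of one card drawn from a well-shuffled deck
print(math.log2(52))                   # ~5.70 bits

# Entropy of the entire ordering of the deck
print(math.log2(math.factorial(52)))   # ~225.58 bits
```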

Thanks for pointing that out, it could have caused some confusion.

Wyrd Smythe:

Yep, exactly. (Ah, you caught my typo -- obviously should be "...over 225 bits...". I'd just washed my hands and couldn't do anything with them. 😊) There are two different situations, the entropy (the surprise) of predicting the next card (5.7 bits), and the entropy of an entire card deck depending on how well ordered it is. The former is more what Shannon was addressing -- predicting the next communication baud -- while the latter is more the entropy=disorder of physics. That one changes depending on the order of the deck whereas the Shannon entropy of picking a card kind of assumes a fully disordered deck.

As an aside, I'd long thought using the log in calculating entropy was there to tame the high numbers involved in combinations of lots of parts. Which, yeah, that too, but I learned recently that a key reason is that it allows entropy values to be added. More convenient than having to multiply them. Maybe that's common knowledge, but I hadn't realized it before.
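A small illustration of that additivity, using two independent shuffled decks as a toy example:

```python
import math

# State counts multiply for independent systems...
one_deck = math.factorial(52)
two_decks = one_deck * one_deck

# ...but their entropies (the logs) simply add
h_one = math.log2(one_deck)
h_two = math.log2(two_decks)

print(h_one)                           # ~225.58 bits
print(h_two)                           # ~451.16 bits
print(math.isclose(2 * h_one, h_two))  # True: log(W * W) = 2 * log(W)
```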

Jack Render:

I have a couple of questions, a fact which I'm sure will surprise and amaze you.

First, I found your 20-questions approach to the cards enlightening, but it did take you six questions to arrive at your answer, and I didn't see any variation that would have reduced that number below six, yet the average is apparently 5.7, which must mean that in some situations one would be able to ask only five questions. What situation?

And second, I note that certain things are taken as "given." For example, that when you turn a card over it will not simply disappear, or turn into five other cards. Object permanence, in other words, but object permanence really is theoretical, and at the particulate level often isn't even true. Say, if the cards were on some nano particle that might revert to an energy state. And then there's the discount of possible printing error, because what could be more annoying than a deck with two aces of hearts but no king? Or that one of those kings could be purple. But it can happen. This would seem to mean that information embedded in assumptions is not counted when verified. I could see problems with that.

Suzi Travis:

Haha, yes, I am indeed amazed! 😉 Great questions! And, yay! I get to do some math...

Each yes/no question roughly halves the remaining cards, but because 52 isn’t a power of 2, some splits will be uneven. That means some yes/no questions aren't 50/50 splits.

Question 1 (52 → 26) → Exactly 50/50

Question 2 (26 → 13) → Exactly 50/50

Question 3 (13 → 6 or 7) → Not 50/50 (46% chance of 6, 54% of 7)

Question 4 (6 or 7 → 3 or 4) → Exactly 50/50 if we had 6; not 50/50 if we had 7 (43% chance of 3, 57% of 4)

Question 5 (3 or 4 → 1 or 2) → Not 50/50 if we had 3 (33% chance of 1, 67% of 2); an even 2/2 split if we had 4

Starting with all 52 cards, let’s assume we’re trying to guess the Ace of Spades:

1. Is the card red? → No (eliminates 26 red cards) → 26 left

2. Is the suit Spades? → Yes (eliminates Clubs) → 13 left

3. Is the card higher than 8? → Yes (eliminates 2, 3, 4, 5, 6, 7, 8) → 6 left

4. Is the card higher than Jack? → Yes (eliminates 9, 10, Jack) → 3 left

5. Is the card higher than King? → Yes → It’s the Ace of Spades!

In most cases, question 5 still leaves us with 2 cards, meaning we need a sixth question.

If we get lucky twice (at question 3 and question 5, or questions 4 and 5), we only need 5 questions instead of 6.

If we do the math, this works out to about 23% of the time (12 of the 52 cards can be pinned down in five questions), as the sketch below confirms.
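Here's a minimal sketch that checks those figures, assuming each of the 52 cards is equally likely and every question splits the remaining cards as evenly as possible:

```python
import math
from fractions import Fraction

def question_counts(n):
    """Number of yes/no questions needed to pin down each of n equally
    likely cards, always splitting the remaining set as evenly as possible."""
    if n == 1:
        return [0]
    low = n // 2
    return [1 + q for q in question_counts(low) + question_counts(n - low)]

counts = question_counts(52)

lucky = sum(1 for q in counts if q == 5)
print(lucky, "of", len(counts), "cards need only 5 questions")  # 12 of 52
print(Fraction(lucky, len(counts)))     # 3/13, i.e. about 23%

print(sum(counts) / len(counts))        # ~5.77 questions on average
print(math.log2(52))                    # ~5.70 bits: the entropy lower bound
```

The average of roughly 5.77 questions sits just above the log2(52) ≈ 5.7-bit entropy, which is what Shannon's bound predicts when every question must be a whole yes/no split.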

---

Yes! Assumptions are always baked in.

In Shannon’s Information Theory, information is really about surprise. If something is already known or assumed with near certainty, then it doesn’t carry much information when it happens. When we turn over a card, we don't expect it to vanish, so its persistence doesn’t count as new information — it’s just confirming what we already assumed.

But you’re right that if we were dealing with a different system — say, a quantum one where a card could disappear or mutate — then our information calculation would change. We'd have to account for those extra possibilities, and the entropy would increase.
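As a toy illustration, here's an entirely hypothetical setup where the card could also vanish with a small probability:

```python
import math

def entropy_bits(probs):
    """Shannon entropy: H = -sum(p * log2(p)) over all outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Standard assumption: 52 equally likely cards, nothing else can happen
print(entropy_bits([1 / 52] * 52))              # ~5.70 bits

# Toy alternative: a 1% chance the card vanishes, with the
# remaining 99% spread evenly over the 52 cards
print(entropy_bits([0.99 / 52] * 52 + [0.01]))  # ~5.72 bits (slightly higher)
```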

It's kinda cool to think about how much assumptions are built in.

Jack Render:

Well, I can’t congratulate myself on the first question - I just got lazy there. But I can certainly congratulate you on another excellent explanation. I hope that book is coming along…

The second question was a little richer because it points, in my opinion, to the weakness of taking theories for facts and therefore according no surprise or value to their being shown true in the situation. Considering how often in the history of physics theories have proven to be false or incomplete, I’d be inclined to think there was some circularity in that reasoning. Just because we’ve seen it all before, what right do we have not to be surprised when we see it again? :) Remember my mention of Ghajini, the guy with the disappearing memory?

I do have one more observation. That demon and his “frictionless door.” Even without friction there would still be inertia. It would take energy both to start and stop the door from swinging. For what it’s worth! I’m giving myself another day or two to reflect on the main post here before putting it away. As always, great work, and thanks for the answers.

Suzi Travis:

That’s an interesting question -- what right do we have not to be surprised when we see something again?

If we were surprised every time we saw something we already expected, it would be hugely energy inefficient. The brain (and many other systems) optimise by learning stable patterns, so that only unexpected events generate a response. In that sense, surprise is costly.

But I do see your point — sometimes the things we assume should be re-examined, because that’s where breakthroughs happen.

And good call on inertia! Even in a frictionless system, something still has to apply force to move the door. I wonder whether most discussions of Maxwell’s Demon simplify this part to focus on the cost of information processing?

Jack Render:

Very interesting, but I remain concerned about circularity. Biological efficiency and the inefficiency of testing expected outcomes. Hmm. Obviously you're right that it would be costly to reaffirm expectations, but when seeking certainty is it fair or consistent to consider that cost? I was ignoring it in the search for theoretical certainty, but now we have a new hybrid, "pragmatic certainty," an elastic concept that regards the status quo as some sort of default given. Very efficient until you're a flat-earther navigating at sea, for example.

I'm not really being snippy, here; I'm wondering to what extent this elevation of preconceived notions and the implied efficiency really amounts to a form of circularity of reasoning.

So my new proposed solution to the 20-questions format of the cards would have to start with: "is this a standard deck with proper and consistent cards from the four suits?" "will their identity persist as we conduct our inquiries?" (evoking the quantum notion of observation changing outcome); will what has been established remain established?... None of these things, strictly speaking, is theoretically certain. I'm reminded of one of the dialogues in Gödel, Escher, Bach where, I think it's the turtle, is willing to accept every logical postulate but asks, unendingly, whether that will lead to the next logical step.

But what a nuisance that turtle was! Don't waste too much time on this unless it moves the inquiry you intended along. Really just thinking aloud here.

Suzi Travis:

Jack, this is so cool. I’m going to be pondering this for days!

If our model of the world is based on prior expectations, and those expectations shape how we interpret new data, then aren't we kind of stuck in a loop? How do we ever truly challenge our assumptions if the system is designed to reinforce them for the sake of efficiency?

This whole conversation reminds me of Hume’s problem of induction. Assuming something will happen because it always has — that’s exactly Hume’s concern. We expect the sun to rise tomorrow because it always has, but that’s just past experience. Hume would say we can never prove that induction is valid — we just rely on it because it seems to work.

But the alternatives aren’t much better. Deductive reasoning can’t give us new information — it just unpacks what's already assumed. And inductive reasoning is circular — it basically begs the question.

This is such a fascinating problem.

Jack Render:

Have you ever tried to draw, seriously? If so, you will have encountered the war between expectations and observations. Pardon the self-reference, but it's a thing I've long been interested in and discuss, starting at the third paragraph down, here: https://jackrender.substack.com/p/fur-elise-or-notes-on-time-and-memory if you're interested, of course. I think Kant took on the question of drawing the line you mention in a different way, in "A Critique of Pure Reason."
