25 Comments
Daniel Nest:

You did it again.

I propose a new tagline for your Substack: "Everything you thought was simple is actually much more complex when you start to think about it"

(On second thought, that tagline sucks. Forget I ever said anything.)

But thanks to this post, I can now sound smart when I tell my poker buddies that Texas Hold 'Em is really all about minimizing information entropy while maximizing your bets.

At which point I will no longer have any poker buddies.

Suzi Travis:

Hilarious! As always. I'm stealing that tagline, btw. But I'm changing two words... "Everything you thought was boring is actually much more interesting when you start to think about it"

Alejandro Piad Morffis:

Brilliant presentation, as usual!

Just want to add that there's an equivalent interpretation of information being quantified as the negative log of the probability of the event: it gives you the minimum size (in bits) of a message encoding the outcome of that event, in a hypothetical language in which receiver and transmitter share all knowledge except for that outcome. This has, of course, very practical implications in the design of Internet communication protocols: it gives us, for example, the optimal compression rate we can expect for any given piece of data, and the size the redundancy payload needs to be to recover from different types of errors.
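A minimal sketch of that reading, assuming equally likely outcomes for the card case (the function name here is just illustrative):

```python
import math

def self_information_bits(probability):
    """Self-information of an outcome: -log2(p).
    This is the minimum number of bits a message needs, on average,
    to report that this outcome occurred."""
    return -math.log2(probability)

# One specific card drawn from a shuffled 52-card deck
print(self_information_bits(1 / 52))   # ~5.70 bits

# A near-certain event carries almost no information
print(self_information_bits(0.999))    # ~0.0014 bits
```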

But it also has a rather profound meaning in computational complexity: it gives us a way to talk about the optimal program for any given problem, e.g., the smallest Turing machine that can decide whether an arbitrary string is a solution to the problem or not. And this, in turn, has profound implications for machine learning and AI, as it speaks to inherent limitations of learning algorithms.

Suzi Travis:

Ah! Brilliant. Thank you. Is this Kolmogorov’s information theory? Or something similar?

Alejandro Piad Morffis:

Yes! Part of it at least.

Arturo Macias:

This has been inspiring. Just one obvious observation: unlike weight or volume, information is not a characteristic of reality, but of our relation to it. I have one information set, you have another. It is difficult to understand how information can be the basis of consciousness, because consciousness is intrinsic.

But the structure of information flows does look related to consciousness. Tononi's experiments on the kinds of information flows associated with consciousness (wakefulness or conscious dreaming) versus deep sleep are convincing. IIT, of course, is a wild extrapolation, but what else do we have?

Suzi Travis:

I agree, if we define information in this way, it is difficult to see how information can be the basis of consciousness. But Tononi's definition of information is radically different to Shannon's Information Theory. Tononi's definition fits more with the etymology of the word, which is to do with meaning and having the ability to make changes in the world. More on that next week.

I'm guessing by the question, 'what else do we have?', you mean, what other theories of consciousness do we have? Other than IIT, I think we're up to 19 other scientific theories at the moment. So, we have options (and plenty of room for disagreements) 🤣

Nick Potkalitsky:

Great work! I am slightly suspicious of the notion that information is observer dependent. It seems to set some other problems in motion. Unless we distinguish info from knowledge, but then we set up a potential parallel-path problem that begs the question of where the two meet. It is interesting how many of these 20th-century notions are so practically useful, but ultimately hinge on the idea that we have very little connection with each other or the actual universe. I don’t know what to make of that.

Suzi Travis:

I know, right! And I'm trying to avoid the "what is knowledge?" question, because that one's a doozy!

It's interesting, isn't it, when theories like Information Theory are so enormously useful, but also seem disconnected from our everyday use of the word, and maybe even disconnected from the way we think about reality.

Wyrd Smythe:

An excellent overview!

>> "Information entropy is similar to entropy in physics, but it’s not exactly the same."

Yes! A point often missed. Some regret Shannon using the same term, and I see their point, but there are strong similarities between the notions of disorder, uncertainty, surprise, the ability to do work, and the original physical definition (the log of the number of micro-states for a given macro state).
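For reference, these are the two standard textbook definitions being compared (not quoted from the post itself):

```latex
% Boltzmann (physical) entropy, where W is the number of microstates
% compatible with a given macrostate and k_B is Boltzmann's constant:
S = k_B \ln W

% Shannon (information) entropy of a source with outcome probabilities p_i:
H = -\sum_i p_i \log_2 p_i
```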

>> "The order of the cards has very low information entropy."

In fact, a pristine deck with a known order would have *zero* entropy! There is only one state that is perfectly sorted, and as you mentioned, log_2(1)=0.

Another way to look at that 5.7+ bits is that it requires that many bits to count to 52, so we can index any card with 5.7+ bits. Essentially the same thing as the six questions you list, but here simply assigning a unique six-bit pattern to each card. (Since 2^6=64, we'd have 12 patterns left over.)

I might be wrong about this, but I believe the entropy of a well-shuffled deck is log_2(52!), which is a little over 255 bits -- the number of bits required to index all possible orderings.

Suzi Travis:

Ah! Yes! If we wanted to determine the entire order of the deck it would be log2(52!) (the number of bits needed to represent all possible permutations of 52 cards), which is approximately 225.58 bits. But if we just want to know the top card, log2(52) works. It gives the number of bits needed to single out one of 52 equally likely cards, which is around 5.7 bits.
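A quick check of both numbers, as a minimal Python sketch:

```python
import math

# Entropy of one card drawn from a well-shuffled deck
print(math.log2(52))                   # ~5.70 bits

# Entropy of the entire ordering of the deck
print(math.log2(math.factorial(52)))   # ~225.58 bits
```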

Thanks for pointing that out, it could have caused some confusion.

Wyrd Smythe:

Yep, exactly. (Ah, you caught my typo -- obviously should be "...over 225 bits...". I'd just washed my hands and couldn't do anything with them. 😊) There are two different situations, the entropy (the surprise) of predicting the next card (5.7 bits), and the entropy of an entire card deck depending on how well ordered it is. The former is more what Shannon was addressing -- predicting the next communication baud -- while the latter is more the entropy=disorder of physics. That one changes depending on the order of the deck whereas the Shannon entropy of picking a card kind of assumes a fully disordered deck.

As an aside, I'd long thought using the log in calculating entropy was there to tame the high numbers involved in combinations of lots of parts. Which, yeah, that too, but I learned recently that a key reason is that it allows entropy values to be added. More convenient than having to multiply them. Maybe that's common knowledge, but I hadn't realized it before.
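A small illustration of that additivity, using two independent shuffled decks as a toy example:

```python
import math

# State counts multiply for independent systems...
one_deck = math.factorial(52)
two_decks = one_deck * one_deck

# ...but their entropies (the logs) simply add
h_one = math.log2(one_deck)
h_two = math.log2(two_decks)

print(h_one)                           # ~225.58 bits
print(h_two)                           # ~451.16 bits
print(math.isclose(2 * h_one, h_two))  # True: log(W * W) = 2 * log(W)
```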

Jack Render:

I have a couple of questions, a fact which I'm sure will surprise and amaze you.

First, I found your 20-questions approach to the cards enlightening, but it did take you six questions to arrive at your answer, and I didn't see any variation that would have reduced that number below six, yet the average is apparently 5.7, which must mean that in some situations one would be able to ask only five questions. What situation?

And second, I note that certain things are taken as "given." For example, that when you turn a card over it will not simply disappear, or turn into five other cards. Object permanence, in other words, but object permanence really is theoretical, and at the particulate level often isn't even true. Say, if the cards were on some nano particle that might revert to an energy state. And then there's the discount of possible printing error, because what could be more annoying than a deck with two aces of hearts but no king? Or that one of those kings could be purple. But it can happen. This would seem to mean that information embedded in assumptions is not counted when verified. I could see problems with that.

Suzi Travis:

Haha, yes, I am indeed amazed! 😉 Great questions! And, yay! I get to do some math...

Each yes/no question roughly halves the remaining cards, but because 52 isn’t a power of 2, some splits will be uneven. That means some yes/no questions aren't 50/50 splits.

Question 1 (52 → 26) → Exactly 50/50

Question 2 (26 → 13) → Exactly 50/50

Question 3 (13 → 6 or 7) → Not 50/50 (46% chance of 6, 54% of 7)

Question 4 (6 or 7 → 3 or 4) → Exactly 50/50 if we had 6; not 50/50 if we had 7 (43% chance of 3, 57% of 4)

Question 5 (3 or 4 → 1 or 2) → Not 50/50 if we had 3 (33% chance of 1, 67% of 2); an even 2/2 split if we had 4

Starting with all 52 cards, let’s assume we’re trying to guess the Ace of Spades:

1. Is the card red? → No (eliminates 26 red cards) → 26 left

2. Is the suit Spades? → Yes (eliminates Clubs) → 13 left

3. Is the card higher than 8? → Yes (eliminates 2, 3, 4, 5, 6, 7, 8) → 6 left

4. Is the card higher than Jack? → Yes (eliminates 9, 10, Jack) → 3 left

5. Is the card higher than King? → Yes → It’s the Ace of Spades!

In most cases, question 5 still leaves us with 2 cards, meaning we need a sixth question.

If we get lucky twice (at question 3 and question 5, or questions 4 and 5), we only need 5 questions instead of 6.

If we do the math, this works out to about 23% of the time (12 of the 52 cards can be pinned down in five questions), as the sketch below confirms.
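Here's a minimal sketch that checks those figures, assuming each of the 52 cards is equally likely and every question splits the remaining cards as evenly as possible:

```python
import math
from fractions import Fraction

def question_counts(n):
    """Number of yes/no questions needed to pin down each of n equally
    likely cards, always splitting the remaining set as evenly as possible."""
    if n == 1:
        return [0]
    low = n // 2
    return [1 + q for q in question_counts(low) + question_counts(n - low)]

counts = question_counts(52)

lucky = sum(1 for q in counts if q == 5)
print(lucky, "of", len(counts), "cards need only 5 questions")  # 12 of 52
print(Fraction(lucky, len(counts)))     # 3/13, i.e. about 23%

print(sum(counts) / len(counts))        # ~5.77 questions on average
print(math.log2(52))                    # ~5.70 bits: the entropy lower bound
```

The average of roughly 5.77 questions sits just above the log2(52) ≈ 5.7-bit entropy, which is what Shannon's bound predicts when every question must be a whole yes/no split.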

---

Yes! Assumptions are always baked in.

In Shannon’s Information Theory, information is really about surprise. If something is already known or assumed with near certainty, then it doesn’t carry much information when it happens. When we turn over a card, we don't expect it to vanish, so its persistence doesn’t count as new information — it’s just confirming what we already assumed.

But you’re right that if we were dealing with a different system — say, a quantum one where a card could disappear or mutate — then our information calculation would change. We'd have to account for those extra possibilities, and the entropy would increase.
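As a toy illustration, here's an entirely hypothetical setup where the card could also vanish with a small probability:

```python
import math

def entropy_bits(probs):
    """Shannon entropy: H = -sum(p * log2(p)) over all outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Standard assumption: 52 equally likely cards, nothing else can happen
print(entropy_bits([1 / 52] * 52))              # ~5.70 bits

# Toy alternative: a 1% chance the card vanishes, with the
# remaining 99% spread evenly over the 52 cards
print(entropy_bits([0.99 / 52] * 52 + [0.01]))  # ~5.72 bits (slightly higher)
```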

It's kinda cool to think about how much assumptions are built in.

Jack Render:

Well, I can’t congratulate myself on the first question - I just got lazy there. But I can certainly congratulate you on another excellent explanation. I hope that book is coming along…

The second question was a little richer because it points, in my opinion, to the weakness of taking theories for facts and therefore according no surprise or value to their being shown true in the situation. Considering how often in the history of physics theories have proven to be false or incomplete, I’d be inclined to think there was some circularity in that reasoning. Just because we’ve seen it all before, what right do we have not to be surprised when we see it again? :) Remember my mention of Ghajini, the guy with the disappearing memory?

I do have one more observation. That demon and his “frictionless door.” Even without friction there would still be inertia. It would take energy both to start and stop the door from swinging. For what it’s worth! I’m giving myself another day or two to reflect on the main post here before putting it away. As always, great work, and thanks for the answers.

Suzi Travis:

That’s an interesting question -- what right do we have not to be surprised when we see something again?

If we were surprised every time we saw something we already expected, it would be hugely energy inefficient. The brain (and many other systems) optimise by learning stable patterns, so that only unexpected events generate a response. In that sense, surprise is costly.

But I do see your point — sometimes the things we assume should be re-examined, because that’s where breakthroughs happen.

And good call on inertia! Even in a frictionless system, something still has to apply force to move the door. I wonder whether most discussions of Maxwell’s Demon simplify this part to focus on the cost of information processing?

Jack Render:

Very interesting, but I remain concerned about circularity. Biological efficiency and the inefficiency of testing expected outcomes. Hmm. Obviously you're right that it would be costly to reaffirm expectations, but when seeking certainty is it fair or consistent to consider that cost? I was ignoring it in the search for theoretical certainty, but now we have a new hybrid, "pragmatic certainty," an elastic concept that regards the status quo as some sort of default given. Very efficient until you're a flat-earther navigating at sea, for example.

I'm not really being snippy, here; I'm wondering to what extent this elevation of preconceived notions and the implied efficiency really amounts to a form of circularity of reasoning.

So my new proposed solution to the 20-questions format of the cards would have to start with: "is this a standard deck with proper and consistent cards from the four suits?" "will their identity persist as we conduct our inquiries?" (evoking the quantum notion of observation changing outcome); will what has been established remain established?... None of these things, strictly speaking, is theoretically certain. I'm reminded of one of the dialogues in Gödel, Escher, Bach where, I think it's the turtle, is willing to accept every logical postulate but asks, unendingly, whether that will lead to the next logical step.

But what a nuisance that turtle was! Don't waste too much time on this unless it moves the inquiry you intended along. Really just thinking aloud here.

Suzi Travis:

Jack, this is so cool. I’m going to be pondering this for days!

If our model of the world is based on prior expectations, and those expectations shape how we interpret new data, then aren't we kind of stuck in a loop? How do we ever truly challenge our assumptions if the system is designed to reinforce them for the sake of efficiency?

This whole conversation reminds me of Hume’s problem of induction. Assuming something will happen because it always has — that’s exactly Hume’s concern. We expect the sun to rise tomorrow because it always has, but that’s just past experience. Hume would say we can never prove that induction is valid — we just rely on it because it seems to work.

But the alternatives aren’t much better. Deductive reasoning can’t give us new information — it just unpacks what's already assumed. And inductive reasoning is circular — it basically begs the question.

This is such a fascinating problem.

Jack Render:

Have you ever tried to draw, seriously? If so, you will have encountered the war between expectations and observations. Pardon the self-reference, but it's a thing I've long been interested in and discuss, starting at the third paragraph down, here: https://jackrender.substack.com/p/fur-elise-or-notes-on-time-and-memory if you're interested, of course. I think Kant took on the question of drawing the line you mention in a different way, in "A Critique of Pure Reason."
