
Bad Horse


Beneath the microscope, you contain galaxies.


Information theory and writing · 2:39am Mar 20th, 2013

PrettyPartyPony's blog post yesterday got me thinking about what we mean by "wordiness". We don't mean having "too many" words; if we meant that, we would just say "long". We mean having words that don't do much.

High-entropy writing

In 1948, Claude Shannon published "A Mathematical Theory of Communication", an essay (or very short book) that's surprisingly quick and easy to read for something with such profound mathematical content. It's one of the three cornerstones of science, along with Euclid's Elements and Newton's Principia. It provided equations to measure how much information words convey. Let me repeat that, shouting this time, because the implications surely didn't sink in the first time: It provided EQUATIONS to measure HOW MUCH INFORMATION WORDS CONVEY.

These measurements turn out to be isomorphic (that's a big word, but it has a precise meaning that is precisely what I mean) to the concept of thermodynamic entropy. The exact method Shannon used to measure information per letter in English is crude, but it's probably usually within 20% of the correct answer. The important point is that, for a given text and a given reader, there is a correct answer.
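(For the curious, here is roughly what such a measurement looks like in practice. This little sketch is cruder still than Shannon's method—it counts only single-letter frequencies and ignores all context—so treat the number it prints as a rough upper bound, not the answer; the filename is just a stand-in for whatever text you want to measure.)

from collections import Counter
from math import log2

def letter_entropy(text):
    """Estimate bits per letter from single-letter frequencies alone.

    Ignoring context (neighboring letters, words, grammar) inflates the
    estimate; Shannon's later guessing-game experiments put English at
    roughly one bit per letter once context is accounted for.
    """
    letters = [c.lower() for c in text if c.isalpha()]
    if not letters:
        return 0.0
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * log2(n / total) for n in counts.values())

print(letter_entropy(open("story.txt").read()))  # "story.txt" is hypothetical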

The implications of being able to measure information are hard to take in without thinking about it for a few decades [1]. For writers, one implication is that the question "Is this story wordy?" has an answer. I could write a simple program that would analyze a story and say how wordy it was.

The caveat is simple, subtle, and enormous: A given text conveys a well-defined amount of information to a given reader, assuming infinite computational resources [2]. Without infinite computational resources, it depends on the algorithms you use to predict what's coming next, and there are probably an infinite number of possible algorithms. I could easily compute the information content of a story by predicting the next word of each sentence based on the previous two words. This would warn a writer if their style were cliched or vague. But it would miss all the information provided by genre expectations, our understanding of story structure and theme, psychology, and many other things critical in a story.
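(Here is a sketch of that simple program, for anyone who wants to play with it. It scores each word by its surprisal—minus the log of its probability given the previous two words—and averages. The counts come from the story itself and the smoothing is the crudest kind, so this is an illustration of the idea rather than a calibrated instrument; a serious version would train its counts on a large outside corpus.)

from collections import Counter
from math import log2

def bits_per_word(words):
    """Average surprisal, in bits per word, predicting each word from
    the previous two. Lower = more predictable = 'wordier' in the sense
    used above. Counts come from the text itself, so the model is
    grading its own homework; use a big outside corpus for real work.
    """
    trigrams = Counter(zip(words, words[1:], words[2:]))
    bigrams = Counter(zip(words, words[1:]))
    vocab = len(set(words))
    bits = 0.0
    for a, b, c in zip(words, words[1:], words[2:]):
        # Add-one smoothing keeps unseen continuations from being infinitely surprising.
        p = (trigrams[(a, b, c)] + 1) / (bigrams[(a, b)] + vocab)
        bits += -log2(p)
    return bits / max(1, len(words) - 2)

story = open("story.txt").read().lower().split()  # hypothetical file
print(f"{bits_per_word(story):.2f} bits per word")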

But you can be aware of the information content of your story without writing that program or understanding how to measure entropy. One simple way is to be aware of the information content of the words you use. Writers say to use precise words and avoid vague ones. Maybe better advice is, use high-entropy words. A high-entropy word is one that can't be easily predicted from what came before it. The word "fiddle" is usually unexpected, but is expected if you just said "fit as a".

Fill in the blanks:

She headed to the right, past the empty bar and the plastic display case of apple and coconut creme pies, towards a tall, lean blonde in a faded orange miner's jumpsuit who was sprawled on a chair at the end of a booth, tilting it backwards into the aisle, her arms dangling.
— some hack writer, Friends, with Occasional Magic

A breeze blew through the room, blew curtains in at one end and out the other like pale flags, twisting them up toward the frosted wedding-cake of the ceiling, and then rippled over the wine-colored rug, making a shadow on it as wind does on the sea.
— F. Scott Fitzgerald, The Great Gatsby

See how the words in the second passage are harder to predict?

High-entropy writing can simply mean putting things together that don't usually go together:

The ships hung in the sky in much the same way that bricks don't.
— Douglas Adams, The Hitchhiker's Guide to the Galaxy

An AMERICAN wearing a jungle hat with a large Peace Sign on it, wearing war paint, bends TOWARD US, reaching down TOWARD US with a large knife, preparing to scalp the dead.
— From a 1975 draft of the screenplay for Apocalypse Now by John Milius and Francis Ford Coppola

When you use a word that's true and unexpected, it's poetry. When you tell a story that's true and unexpected, it's literature [3]. So aim for the unexpected plot and the unexpected word.

Meaning-dense writing

This is taken a bit too far in modernist poetry, which has very high entropy:

dead every enourmous [sic] piece
of nonsense which itself must call
a state submicroscopic is-
compared with pitying terrible
some alive individual
— E.E. Cummings, dead every enourmous piece

The problem with measuring information content is that you would produce the most-unpredictable sequence of words by choosing words at random. Meaningless text has maximum information density.
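(That isn't a quirk of any particular measuring scheme; it's baked into the math. For a vocabulary of N words used with probabilities p_i, the entropy per word is

H = -\sum_{i=1}^{N} p_i \log_2 p_i \le \log_2 N,

and the bound is reached exactly when every word is equally likely—that is, when the text is uniformly random gibberish.)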

What you want to measure is true, or, better, meaningful, information [4]. Writers often use words and tell stories that are technically low-entropy (the words aren't unexpected). But whenever they do, if it's done well, it's because they convey a lot of extra, meaningful information that isn't measured by entropy.

To convey a mood or a metaphor, you choose a host of words (and maybe even punctuation) associated with that mood. That makes that cluster of words appear to be low-entropy: They all go together, and seeing one makes you expect the others.

The sky above the port was the color of television, tuned to a dead channel.
— William Gibson, Neuromancer

All the world's a stage,
And all the men and women merely players;
They have their exits and their entrances;
And one man in his time plays many parts,
His acts being seven ages.
— William Shakespeare, As You Like It

In a metaphor or a mood, the words convey more information than you see at first glance. That someone would compare the sky to a television channel, and that the world's channel is dead, tell you a lot about Gibson's world. That men and women are "merely players" conveys a philosophy. An extended metaphor doesn't just tell you the information in its sentences. It points out which parts of the two things being compared are like each other, in a way that lets you figure out the different similarities from just a few words. That is extra meaning that isn't measured by entropy (but would be by Kolmogorov complexity). It may be low-entropy, but it's meaning-dense.

Rhyme greatly decreases the entropy of the rhyming words. Knowing that you need to say something more about the frog, ending on a word that rhymes with "Frog", reduces the number of possible final words for this poem to a handful. Yet it's still surprising—not which word Dickinson picked, but all the things it meant when she suddenly compared public society to a ...

How dreary—to be—Somebody!
How public—like a Frog—
To tell one's name—the livelong June—
To an admiring Bog!
— Emily Dickinson, I'm Nobody! Who are You?

Sometimes you use repetition to connect parts of a story:

‘Twas the day before Hearthwarming, and a nameless horror had taken residence in Dotted’s chimney. Again.
...
‘Twas the day before Hearthwarming, and a nameless horror had taken residence in Spinning Top’s chimney.

... or to focus the reader's attention on the theme:

“It’s just that I’ve plans for Hearthwarming and—”
... “Don’t you worry about me. I’ve plans for this Hearthwarming."
... “Indeed, Your Excellency. I’ve plans for Hearthwarming.”
... “Yes. I am. Now go. I’ll keep. Don’t you worry. I’ve plans for Hearthwarming.”
... He had plans this Hearthwarming.
— GhostOfHeraclitus, A Canterlot Carol

... or to make a contrast:

Smash down the cities.
Knock the walls to pieces.
Break the factories and cathedrals, warehouses and homes
Into loose piles of stone and lumber and black burnt wood:
You are the soldiers and we command you.

Build up the cities.
Set up the walls again.
Put together once more the factories and cathedrals, warehouses and homes
Into buildings for life and labor:
You are workmen and citizens all: We command you.

— Carl Sandburg, And They Obey

That's okay. The repetition is deliberate and is itself telling you something more than the sum of what the repeated parts would say by themselves.

Predictable words are no better than vague words

Some words have lots of meaning, yet convey little information because we're always expecting someone to say them.

What words do I mean? I refer you to (Samsonovich & Ascoli 2010). These gentlemen used energy-minimization (one use of thermodynamics and information theory) to find the first three principal dimensions of human language. They threw words into a ten-dimensional space, then pushed them around in a way that put similar words close together [5]. Then they contrasted the words at the different ends of each dimension, to figure out what each dimension meant.

They found, in English, French, German, and Spanish, that the first three dimensions are valence (good/bad), arousal (calm/excited), and freedom (open/closed). That means there are a whole lot of words with connotations along those dimensions, and owing to their commonality, they seldom surprise us. Read an emotional, badly-written text—a bad romance novel or a political tract will do—and you'll find a lot of words that mostly tell you that something is good or bad, exciting or boring, and freeing or constrictive. Words like "wonderful", "exciting", "loving", "courageous", "care-free", or "boring". Read a badly-written polemical or philosophy paper, and you'll find related words: "commendable", "insipid", "bourgeois", "unforgivable", "ineffable". These are words that express judgements. Your story might lead a reader toward a particular judgement, but stating it outright is as irritating and self-defeating as laughing at your own jokes.

Our most-sacred words, like "justice", "love", "freedom", "good", "evil", and "sacred", are these types of words. They are reifications of concepts that we've formed from thousands of more-specific cases. But by themselves, they mean little. They're only appropriate when they're inappropriate: People use the words "just" or "evil" when they can't provide a specific example of how something is just or evil.

Avoid these words. Don't describe a character as an "evil enchantress"; show her doing something evil. Sometimes they're the right words. Most of the time, they're a sign that you're thinking abstractly rather than concretely. More on this in a later post.

It's meaningful for characters to be vague!

The flip side is, have your characters use these words to highlight their faulty thinking! Pinkie describes Zecora as an evil enchantress to show that Pinkie is jumping to conclusions. Rainbow Dash calls things "boring" to show that she's just expressing her prejudices and isn't open to some kinds of things.


[1] Sixty-five years later, my current field, bioinformatics, is crippled because biologists still won't read that book and don't understand that when you want to compare different methods for inferring information about a protein, there is EXACTLY ONE CORRECT WAY to do it. Which no one ever uses. Same for linguistics. Most experts don't want to develop the understanding of their field to the point where it can be automated. They get upset and defensive if you tell them that some of their questions have a single mathematically-precise answer. They would rather be high priests, with their expertise more art and poetry than science, free to indulge their whimsies without being held accountable to reality by meddling mathematicians.

[2] And assuming some more abstruse philosophical claims, such as that Quine's thesis of ontological relativity is false. Which I have coincidentally proven.

[3] When you tell a story that's false and expected, it's profitable.

[4] The best way I know to define how much meaning a string of text has is to use Kolmogorov complexity. The Kolmogorov complexity of a text is the number of bits of information needed to specify a computer program that would produce that text as output. But this still fails completely to penalize random strings for being random. A specific random sequence still has Kolmogorov complexity equal to its length if you need to re-produce it. But you don't need to reproduce it. There's nothing special about it. The amount of meaning in a text is the amount of information (suitably compressed) that is required to produce that text, or one sufficiently like it for your purposes. For any purpose you can have for a random text, there are a vast number of other random texts that will serve just as well; the length of a computer program to produce a suitably random text is short.
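(In symbols, relative to a fixed universal machine U,

K_U(x) = \min \{\, |p| : U(p) = x \,\},

the length of the shortest program that outputs x and halts. For nearly all random strings this is about |x|, which is exactly the failure-to-penalize-randomness problem above; the proposal here is to ask instead for the shortest program whose output is good enough for your purposes, rather than x exactly.)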

[5] People usually do this by putting words close to each other that are often used in the same context (the same surrounding words), so that "pleasant" and "enjoy" are close together, as are "car" and "truck". This work instead took antonyms and synonyms from a thesaurus, and pushed synonyms towards each other and pulled antonyms apart from each other.
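(A toy version of that procedure, for the curious. This is my guess at the flavor of the algorithm, not the authors' actual code: scatter words in a low-dimensional space, then repeatedly nudge synonym pairs together and antonym pairs apart until things settle.)

import random

def embed(words, synonyms, antonyms, dims=10, steps=500, rate=0.05):
    """Toy force-directed embedding: synonyms attract, antonyms repel.

    `synonyms` and `antonyms` are lists of word pairs, e.g. pulled from
    a thesaurus. A sketch of the idea in Samsonovich & Ascoli (2010),
    not a reimplementation of their method.
    """
    pos = {w: [random.uniform(-1, 1) for _ in range(dims)] for w in words}
    for _ in range(steps):
        for pairs, sign in ((synonyms, +1), (antonyms, -1)):
            for a, b in pairs:
                for d in range(dims):
                    gap = pos[b][d] - pos[a][d]
                    # Synonyms (+1) step toward each other; antonyms (-1) step apart.
                    pos[a][d] += sign * rate * gap
                    pos[b][d] -= sign * rate * gap
        # Rescale so the repulsion doesn't fling everything off to infinity.
        m = max(abs(x) for v in pos.values() for x in v) or 1.0
        for v in pos.values():
            for d in range(dims):
                v[d] /= m
    return pos

coords = embed(["good", "fine", "bad", "awful"],
               synonyms=[("good", "fine"), ("bad", "awful")],
               antonyms=[("good", "bad"), ("fine", "awful")])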


Alexei V. Samsonovich & Giorgio A. Ascoli (2010). Principal Semantic Components of Language and the Measurement of Meaning. PLoS ONE 5(6): e10921, June 2010.

Comments ( 64 )

....Fascinating.

You always give me a lot to think about with these posts. Writing, like most art forms, is something like an onion – for every layer, there is a layer beneath. When you grow in skill as a writer, it doesn't just improve your writing, it also makes you realize how much better your writing could be.

932827
Oh, absolutely. I'm much better than I was two decades ago -- and I still wonder how I come up with so much horrible stuff.

Although BH did have some kind words for me earlier this week, for which I am even more grateful now than I was then.

932827 That's a frightening thing about writing - you look ahead to see how much you still have left to learn, but you can only ever see a little way ahead of you. What's farther on all runs together in the distance. You can't know how high the mountain is when you start climbing. Maybe you never find out.

Most of the ideas went way over my head. It's a lot to think about, to be sure.

I would say that Joyce is a better example of ultra-high-density-to-the-point-of-illegibility writing.

Other than that, nice essay.

This post is incredible.


932851

Then again, the only way down from a mountain is a quick jump and a long fall (or a slow and boring decline), so maybe it's better that nobody reaches the absolute zenith of language. Or anything else, for that matter. I rather enjoy the moody, stumbling striving that is the writing process.

932889>>932950 This post is a geeky way of saying simple stuff. If you skim the techno-babble, you probably won't miss it.

Okay, let me start by saying that this was fascinating, and dense, and took some parsing on my part. So if I'm totally missing the point feel free to laugh at me. I agree with nearly everything you said from "Meaning-dense writing" on.

My problem, or if not problem, at least disagreement, comes in the opening definition of "high entropy writing." Correct me if I'm wrong here, but something I'm not finding any mention of in this is reader preference for contribution. This is something that bothers me to no end, because many writers seem to talk as though there's one reader out there, and we are writing for him, while the vast numbers of stories and styles of writing seem to indicate to me that there are at least several different readers out there who have different opinions of what they like in a story.

Your initial question is "what do we mean by wordy?" and to me the answer is simple: the words provide more than is needed to convey the information necessary to the proper telling of the story. Which is basically what you said. But what's missing is that that's going to vary from reader to reader. Some readers like all of the information laid out for them, some are happy to use their imagination. Some are more interested in having some subject explained and others are uninterested in the same subject.

As an example:

The brown house sat on the hill.

The old brown house sat unsteadily on the hill, teetering in the wind.

The old house sat on the hill, brown paint peeling, trembling as wind ripped at its rotting facade.

Now, assuming that all of those are describing the same house, I think we can agree that the first one is not wordy enough. It doesn't give the information anyone would need to picture the house correctly (that the house is old and shaky). However, I, personally, find the second one to be just fine. The information is there, and since I'm not a fan of prose, that gives me the information I need in order to know what the house we're talking about is like. And I'm not a special snowflake, there are other people out there like me. But I'm well aware that many people, writers and readers both, would prefer the third. It requires less imagination and immerses you in the story more and stuff like that.

So for some portion of the population, example three is wordy, and for some it's an example of good writing.

The same is true of selecting any information to add to a work. Some people enjoy reading two characters having a conversation, even one that's unnecessary to the story, and some don't. Some like to understand the thought processes of characters, and some feel they've developed an understanding of the characters and can work those processes out from their ultimate decision. History lessons (real and fictional), back stories, daily routines of characters, and minor events- these are all places where readers will rarely agree how much information is useful or entertaining, and how much is excessive.

Now, to get back to where we agree, the things you mentioned are all excellent ways of satisfying the broadest number of people at the same time. Those who dislike too much information for the story being told are far more likely to forgive it, or even enjoy it, if it's delivered economically. And those who like it will be just as happy with five well crafted sentences as they would be with twenty five decently crafted ones. So what you describe is, in fact, "good writing" in that it appeals to a wider audience, though other kinds of writing can and do find a specific audience that's happy with that particular style.

This was beautiful and very useful.
Thank you.

933070
What you're describing is (to use another fancy mathematical word that means precisely what I want to say) orthogonal to what Bad Horse was describing.

The entropy of a text, its predictability, is agnostic as to length. It's a running average of the entropy of each individual word. The reason wordier descriptions have less entropy is not because there are more words, but because "wordier" texts use more words that cluster together in similar patterns and are thus easier to predict. It's easy to have short texts with low entropy, too: a mathematical proof is highly structured writing whose entropy is extremely low, because reassembling the proof without any given word should simply be like solving a logic puzzle with a single right answer.

(That last paragraph is an example of "bad" wordier writing. I repeat myself several times. If I'd used those words instead to make new and different points, I could have increased entropy without touching length.)

One of Bad Horse's points is that, whether you're writing in a more verbose style or a more laconic style, moving to higher-entropy (more unexpected) words generally improves a story. (See, however, the caveat on modernist poetry and randomly generated words.)

While trimming the fat is a simple way to do that, adding more detail can also increase entropy. Notice how, in each of the house-on-the-hill examples you gave, each detail you added was something that couldn't be predicted from the original text? The fact that it's unsteady is new information. The fact that its paint is peeling is new information. (The fact that it's teetering in the wind is a weaker addition, because it's merely reinforcing the existing fact that it's unsteady. But the new fact that the wind is blowing might, in context, be significant in the story.)

So: value judgments on the amount of description provided don't matter one way or the other to the idea of entropy. They're measuring independent things on different axes. (Hence: orthogonal.)

Well, most of this flew right over my head, but I think I've got the general thesis down; Words good, douchebaggery bad. That's actually pretty smart, I really feel smarter! Thank you!

Wow. Bad Horse, my hat, if I wore one, would be off to you. That's brilliant. It never occurred to me to look at things from that point of view. It's fascinating enough that you thought to apply information theory in that way, but with the next-to-last section I think you've made a serious dent in what "Show, don't tell" actually means. The words most frequently used (and in bad writing, too) are judgement words -- telly words, but only on certain axes, good/bad most of all. And those words, being expected (in a way, overused) carry little information. The word 'wonderful,' say, should mean a lot. Full of wonder. Magnificently amazing. But after decades, hell, centuries of use, its power is worn down to a tiny little nub. It's collapsed to its basest form -- of meaning 'good.'

Wonderful (hah!) essay, all told. :twilightsmile:

Information theory, Kolmogorov complexity, and writing? I think that essay gave me a nerdgasm. :twilightsmile:

I think these principles also provide the answer to how to think about tropes. It's very much the same situation at a higher level. If you use a trope and the reader can predict exactly what's going to happen, then the information content of your plot is low and your trope becomes a cliche; if you go overboard trying to avoid cliche, you get the plot equivalent of the too-modern poem.

933070
So, as Horizon said, to make this fair, the sentences need to be the same length. So here are two fairer examples.

1) The brown coloured house sat on top of the high hill overlooking the slope down to the valley.

2) The old house sat on the hill, brown paint peeling, trembling as wind ripped at its rotting facade.

Example (1) contains only the information of your first sentence "The brown house sat on the hill" but it uses many more words to say these things (it was hard going saying so little in so many words. I'm particularly proud of "brown coloured"—of course brown is a colour!). I think (1) is unambiguously worse than (2).

Easily one of the most fascinating and thought-provoking things I've read this year, and it was posted on a site for fanfiction for a children's cartoon. The magic of the Internet at work.

Seriously, thank you for this. I may have to bookmark it.

I'll lower the entropy in this comments section by echoing everyone else: fantastic blog post. Blends together an understanding of information theory perfectly with practical application towards writing. Hats are off and populating the floor.

In 1948, Claude Shannon published "A Mathematical Theory of Information", an essay (or very short book) that's surprisingly quick and easy to read for something with such profound mathematical content.

Wasn't the paper in question called "A Mathematical Theory of Communication"?

Very fascinating stuff. My natural inclination was to react similarly to 933070 and try to tie this to the notion of "purple", but that's not quite correct. After reflection, I'm leaning toward purple being a combination of Bad Horse's point and Horizon's counterpoint: high-entropy writing on topics that stray from the core topic of the piece (e.g. scene descriptors). Perhaps that's then why really good, really purple authors can still be enjoyed, because their purple is high-entropy and thus entertaining poetry in its own right, even if it doesn't directly advance the plot. Cue writing-philosophy debate over whether you want to be economical with your words and ideas, to more quickly reach the conclusion, or whether lounging about to be all artsy holds merit. The "answer" is probably the low-entropy reply of "all things in moderation"—cliche but so often true.

On that note, it's interesting to see how moderation doesn't seem applicable to entropic writing. Sure, the postmodernist poetry examples, but that's just because you're approaching alphabet soup. I forget how long ago I first read Douglas Adams, yet the line "The ships hung in the sky in much the same way that bricks don't," still paints a vivid mental image, feels fresh and novel, is still humorous to this day, and is just so quotable. Though my personal favorite Adams quote, pretty sure from Life, the Universe, and Everything, is "A magician wandered the beach, but no one needed him." Sure, a cynical part of my brain wants to say "Oh, but you're breaking the flow, which ruins immersion." While that's generally good advice, it seems quite apparent that avoiding being interesting is bad advice.

And yet, as I'm skimming this post before submitting, I find myself sprinkling in words that, while they clarify and improve flow, are decidedly low-entropy. Even while self-conscious of it, I don't want to strip away these low-entropy connectors. So I suppose, while high-entropy is preferred, sometimes concessions are required for the sake of other elements of good writing. Or it means I'm a hack.

Another point that confused me was metaphors. They're colorful, information-rich and entropy-rich. Yet I never see them when reading, except perhaps in wry comedic fics. I even have vague memories of a fellow fanfic author, early in my "career", actively dissuading me from their use. While I didn't end up taking that particular piece of advice to heart, I sure haven't actively sprinkled metaphors into my writing either. I think part of the scare is that it breaks a previous rule of "using the right word, not a close word." Metaphors literally have "it was like a" right in them, so of course it's less accurate! But if you're trading a small amount of accuracy for a large gain of color and entropy, I suppose it's a net win, so long as it's well-crafted and applied judiciously, no?

933788 Doh! You're right. Brain malfunction. BTW, the next year he republished it as The Mathematical Theory of Communication, because by then he was confident that it was the only possible correct theory of information.

933810 Perhaps that's then why really good, really purple authors can still be enjoyed, because their purple is high-entropy and thus entertaining poetry in its own right, even if it doesn't directly advance the plot
Sounds plausible.

933398>>933568

So: value judgments on the amount of description provided don't matter one way or the other to the idea of entropy. They're measuring independent things on different axes. (Hence: orthogonal.)

Agreed. I'm afraid I wasn't clear about what I was referring to. (I actually considered quoting it, but decided I was tl;dr already. But for clarification purposes, tl;dr all the way!)

I wasn't arguing with the usefulness or basis of high entropy writing. I was quibbling (which is a short, non-mathematical word which nonetheless means exactly what I want to say) with the definition of "wordiness," which happens to be planted in the beginning of the "high entropy writing" section. This part specifically:

The implications of being able to measure information are hard to take in without thinking about it for a few decades [1]. For writers, one implication is that the question "Is this story wordy?" has an answer. I could write a simple program that would analyze a story and say how wordy it was.

The caveat is simple, subtle, and enormous: A given text conveys a well-defined amount of information to a given reader, assuming infinite computational resources [2]. Without infinite computational resources, it depends on the algorithms you use to predict what's coming next, and there are probably an infinite number of possible algorithms. I could easily compute the information content of a story by predicting the next word of each sentence based on the previous two words. This would warn a writer if their style were cliched or vague. But it would miss all the information provided by genre expectations, our understanding of story structure and theme, psychology, and many other things critical in a story.

The problem is that this concept starts with the idea that a writer is actually attempting to convey the correct amount of information to tell the story to begin with, and redundancy and cliche (information the reader knows or could predict) is therefore wordiness. This is an incorrect assumption, because a writer may be trying to add all sorts of high-entropy but useless information into the story. In that case, they might be using exactly the correct words, in exactly the correct way, to maximize their efficiency in conveying the information, and it's still wordy.

When I, at least, am talking about wordiness, I'm not only talking about redundant information, I'm also talking about useless information, where useless information is information that's unnecessary for an individual reader to enjoy a story. Redundant information tends to be low entropy, but useless information can be extremely high entropy-- a reader certainly isn't going to expect you to take a break from your story to describe the linguistic structure of the word "elephant." But providing that information is only going to improve the story in certain instances, and to certain audiences. To other audiences, you've just made the story "wordy." Not "too long"-- if the same word space had been allocated to, say, a conversation between the main characters, or the genealogy of a character, the audience who isn't interested in linguistics might have been interested in that. Or not.

(A perfect pony example: Pinkie Pie dialogue is likely to be high-entropy, but often primarily useless information. Tolerance of it among both readers and the characters reacting to it is going to vary based on personality and situation. In the same instance the information can be funny (and therefore interesting) to some, and annoyingly wordy to others.)

(See also: A shaggy dog story.)

So, regardless of length or entropy, one thing that's going to contribute to a charge of "wordiness" is how much of the information you offer is perceived as useful or interesting by the reader. And that is going to vary from reader to reader, which means that Bad Horse cannot, in fact, write a program that would tell you how wordy a story was without some definition of the exact tolerance and attitude of the reader (which would be impossible to measure).

Now I agreed that high-entropy writing is a good thing. Redundancy is a part of wordiness, and the easiest part to fix because it is largely consistent. And high-entropy writing, being in general more fun to read, will buy you a certain amount of leeway with people who are finding your information useless, especially when combined with a certain amount of wit and style.

So, that was my point, hopefully better explained.

933398
933431
933568

When I read a piece of science-based non-fiction that I don't understand:

I tend to become prickly and defensive. I mean, I got '4's on both the physics and calculus AP tests when I graduated from high school thirty years ago, didn't I? And yes, my first quarter at college quickly ran me smack into the limits of my brain power on such matters and sent me scurrying to the humanities side of campus, but I still like to think--despite all the evidence to the contrary that I keep accumulating--that I'm smart enough to glean the basics when folks start talking about information theory and string theory and grand unified field theories.

Last night, though, after floundering and foundering my way from top to bottom here three times, I felt the telltale signs of my brain hopping up and down in frustration. Fortunately, I have a long-standing rule never to post comments when my inner Magic Eight Ball is pointing that way, so instead of flinging my initial uncooked scrambled-egg of a response up here, I saved it to my computer's desktop and went to bed.

Coming back this morning, reading your comments, then rereading the essay a fourth time, I can now see that what I thought the essay was saying--"There's a simple, mathematical formula to writing good fiction"--isn't what it's saying at all. So thank you, folks, and my apologies to you, BH, for these leaky thought processes of mine missing your point so very thoroughly.

'Cause now what I think it's saying--and I may be just as far off as before, so please let me know if I'm still floundering--is the same thing that the two or three good writing seminars I've taken over the years have said: "The more specific you write, the more general it becomes."

Which is to say: carefully-chosen, concrete words and details will bring a piece to life in a reader's mind, will get that reader to "see" the scene the words are describing, and will therefore give the scene greater meaning to the reader and greater enjoyment, too. Or to paraphrase Mark Twain, the difference between the right word and the almost-right word is the difference between the lightning and the lightning bug. :eeyup:

Mike

Studying Information Systems in school and not one whiff of something as fascinating as this. Adding to my list of must-read books.

College, why have you failed me? :facehoof:

933905
Yes and no. Yes, the essay above could be taken to mean just that. But more importantly, Bad Horse is making the argument that this piece of advice[1] can be derived[2] from a central information-theoretical insight, viz. that what we see as wordy isn't a function of merely length, but is also a function of the information content of writing. Bad Horse suggests a simple heuristic based on this insight, which is to think about the information content of words when choosing them. In this instance, he's talking about Shannon information where, to simplify a bit, measure of information is measure of surprise. By this I mean that the words that surprise the reader the most carry the most information (entropy[3] to use the correct term) and thus are the 'best.' Of course this is a heuristic[4] and so must be used with care, otherwise a rand() function and a dictionary would be The Greatest Writer Ever. But, on the balance of things, it's good to try for high-entropy when describing things. Which generally does correlate with specifics, as you said[5].

As for me, I pointed out that the selfsame insight could be used to explain the longstanding feud both Bad Horse and yours truly have had with "Show, don't tell." We complained that plenty of good prose is as telly as all hell. We compared a purely show-y style of writing to fiction as written by an alien anthropologist. And so forth. Anyway, what this insight tells us (possibly) is that the sin of 'telling' is probably the sin of using expected words -- words so expected they are cliche and tell (ha!) the reader nothing. The cited research even describes just what those groups of words probably are. So if we want to say something that's normally told through some of those devalued words, we ought to experiment with showing, because telling won't work -- the specificity has been leeched out of the word. Otherwise, telling is as valid a choice as showing, which provides us with flexibility and explains why there's all that good writing that tells all the time. To illustrate, I think that Bad Horse's idea could be said to indicate that a line like:

"He felt wonderful! Wonderful, and carefree."

is probably a bad idea, while a line like

"She felt melancholy tinged with disgust."

is better, because of the unexpectedness of those two words (especially together).

Does this help?

[1] As well as, possibly, some other ones.
[2] In the non-strictly-mathematical sense of the word.
[3] It's unfortunate this word was chosen in some respects, but there's reasons for it, viz. the aforementioned isomorphism.
[4] Rule-of-thumb, basically.
[5] But be warned, this isn't always the case. What may be a piquant detail can, through irritating overuse, say, become expected and lose its high-entropy nature. For instance, 'awesome' used to be a very detailed word -- very high-entropy because it described a very peculiar emotion -- of simultaneous attraction to something and knee-buckling fear of its immensity/power. However, it's now a low-entropy word, because it means so little, and is so expected. I've heard better-than-average fries described as 'awesome.' This sort of decay can be global or, I think, reserved for a genre. Here's a wonderful example:

The door dilated.

-- R.A. Heinlein, Beyond This Horizon

Now, back then this was the marquee line for science fiction, and it got to be so famous because it used a high-entropy word -- it conveyed the alien nature of a scene in one word. If you used it now you'd be evoking Heinlein -- not redefining genre. Just so in ponyfic. Globally speaking 'lavender' used as a color is a very peculiar shade and is fairly high-entropy. In ponyfic? As low-entropy as you can get.

933926

How on Earth did you manage not to hear about Shannon? Introduction to Computing (or what have you) classes tend to mention him in the very first class, just after, like, Turing. I've had the notions of Shannon entropy and Kolmogorov-Chaitin complexity hammered into me at least twice in my studies. Can't say that they took much[6], but they sure as hell tried.

[6] Dumber than a box of rocks, and all that.

That someone would compare the sky to a television channel, and that the world's channel is dead, tell you a lot about Gibson's world.

The interesting thing is, it tells you something different now than it did when it was written.

933956

That might help:

Let's take a look. Because my humanities-major's mind is now translating the phrase "information content of words" into the idea of connotation vs. denotation. All words, I was taught decades ago, have dictionary definitions--their denotations--and larger, contextual definitions--their connotations.

Connotations change much more quickly than denotations and can in fact render a word functionally useless--which I think is the larger point Bad Horse and you are making. A writer has to be aware of what a word "says" as well as what it "means" to make an informed choice when putting together a sentence.

Is that closer?

Of course, like 933864 says, it does in the end come down to personal choice. "Language and attitude" is always my motto: what do I want to say and how do I want to say it? I mean, for my part, I don't find that "melancholy tinged with disgust" really conveys any more information to me about the character's state of being than "wonderful and carefree" does. Because "melancholy" has several different connotations: "generally depressed" is one, but so is "saddened by memories."

It could just be that when, as you say, two words are less likely to be paired together, a reader like me needs more context to understand their relationship. Or give me some specific physical details or a specific image: "She turned away, her fists and stomach both clenched," maybe, or "He felt like a bird blowing soap bubbles."

But then I'm so far outside the usual when it comes to being a reader, I always hesitate to even get involved in discussions like this. I mean, I've never found a work of Terry Pratchett's that I liked, nor a work of Stephen King's, nor a work of Robert Heinlein's, yet millions of people all over the world enjoy stories written by one or more of these gentlemen. In matters of taste and opinion, therefore, I recognize that I'm lacking something and turn away with a blush. :twilightblush:

Mike Again

933905 Yes, though I should have emphasized that the right word is ultimately more important than being unexpected. I put causality in the wrong direction. The right word is specific and hence has high entropy. It isn't the right word because it has high entropy.

933956 I think it was Mike Vassar who first pointed out that the connotations of all words, over time, are dragged towards the principal axes of meaning space, especially "good/bad". Awesome, terrible, gorgeous, incredible, marvellous, fantastic, wonderful, brilliant, horrible, are all nearly dead. Even words that had originally nothing to do with good/bad get dragged into it: common, villain, vulgar, barbaric, vandal, bourgeois, radical, independent, homely.

We have some words that are almost exact synonyms except that one connotes "good" while the other connotes "bad". Can't think of an example just now.

934409
Yes, I think you have it. What I would say, in addition, is that this insight tells us that the wrong word is almost always the expected one, or rather, the one whose connotations have so overshadowed their denotations that they hardly mean anything aside, perhaps, from 'good' or 'bad.'

Now regarding the show-vs-tell bit, I may be too crappy at writing to make a good example, but I don't think yours is an improvement. The emotion I imagined is at seeing an old friend severely diminished -- melancholy at their fall tinged with just a bit of guilty disgust. The actions you described tell that to me less well, I think. I could be wrong. I really like Terry Pratchett, for a start.

934462
Well cool/cold is one, or close to it, anyway. But, yes, I agree. I've noted (and bemoaned -- well it's traditional) this general decay. I'm especially peeved about awesome. There's actually no word to replace it!

This is a remarkable discussion all around. Thank you, everyone.

933864
I appreciate the clarification; that was a much better-explained objection, and I agree.

933905
+10 respect for not only sitting on your snark overnight, but also being willing to take a final, fresh pass before saying anything. I think GOH/BH said everything I would have, but I wanted to specifically acknowledge the maturity of your contribution.

933956
> what this insight tells us (possibly) is that the sin of 'telling' is probably the sin of using expected words
My instinct is that the show-vs-tell discussion doesn't intersect with entropy in quite the way that you're suggesting, but I may have to think about it a while before being able to advance the objection coherently.

Here's a first stab: High-entropy text draws attention to itself. Low-entropy text fades into the background. ("Hi!" Jack ejaculated, vs. "Hi!" Jack said.) When authors tell in a way that backgrounds important information, or draws attention to unimportant detail, telling is being misused. Even telling with low-entropy words has its place: e.g., when you're providing necessary exposition that would derail your story if given focus. Or to avoid saidbookism. ;)

934462
> almost exact synonyms except that one connotes "good" while the other connotes "bad"
famous / infamous

934803
Sorry. You are right, at least in part. I overplayed my hand and focused on adjectives when thinking about this. Indeed, some exceptionally low-entropy words ('said') are necessary for good writing, and high-entropy text may end up sounding like the Eye of Argon. The informational content of 'many-fauceted scarlet emerald' is through the roof. :pinkiehappy:

I should have probably said that the reason that 'telling' is sometimes a good idea and sometimes an error could be related to expected-vs-unexpected words. If you tell using low-entropy (or connotation-smothered) words, then you are conveying little specificity and your writing fails. If you tell using high-entropy words, you are more likely to get high specificity and to convey to the reader a powerful impression of some sort.

Does this sound more reasonable?

934462
awesome/awful?

934864
No need to apologize! We're all (myself included) stumbling around here, trying to explore what we thought was familiar territory via an unfamiliar framework (as if we'd navigated by property lines for years, and suddenly someone hands us a topographic map) and making all sorts of discoveries along the way. I'm loving all the little correspondences that are popping up, even if I'm not sure we're getting them right on the first pass.

So what it sounds like you're saying is: the reason people wrongly complain about "telling" is that they are complaining about low-specificity (low-entropy, expected) words: bland writing. There's no reason to complain about "telling" in non-bland ways.

If that's your point, then I think that both of us are simultaneously correct; which means we're not really talking about the same thing yet. That's why I figured I'd have to chew on it for a bit.

934885
That is very close to my point, yes. The only part you missed (because I didn't mention it, being silly) is that I'm talking mostly about EqD-style "Show don't tell," which applies particularly to emotional states. But, yes, otherwise we are in agreement, or at least, not in disagreement.

Well, 934931 pointed out to me that I'd missed one of these things and wasn't chiming in on the conversation as usual. So now I'm here. At 3:00am and tired. So bear with me.

This stuff is all well outside my ken, entropy and Kolmogorov complexity. I'm a statistician and a philosopher / theologian. That's what I know. So I had to take some time trying to sort this into a framework I could understand. And I think that, of everything that's been said, 933864 probably made the most meaningful contribution by talking about useful vs. useless information.

There seems to be an overwhelming consensus on display that high entropy language is a worthwhile aim, despite the fact that it's been mentioned more than once that maximal entropy language is gibberish. I don't know that I buy the value of high entropy language at all. But at the same time, I'm confident I must be misunderstanding something. Because as far as I can discern, the statement that, "The right word is specific and hence has high entropy," makes no sense.

The right word, by virtue of being the right word, should be perfectly predictable.

Now, the idea of Kolmogorov complexity – or to put it differently, the idea of looking at the amount of information necessary to create a pattern of words – that sounds much more promising... and yet still lacking. Because again, I see complexity for its own sake as a weakness in any system, and the argument being put forward here seems to be that maximizing meaning, as operationalized by Kolmogorov complexity, is somehow good. Perhaps that's not the argument being made, but if it is, it seems to suffer from the same general problem as the one exhibited by random collections of words. If meaning is operationalized by Kolmogorov complexity, would not meaning be maximized by creating such a mish-mash of disjoint ideas that one would have to sort through each independently?

To be honest, I really don't understand Bad Horse's assertion that you can deal with random strings as low Kolmogorov complexity objects. Well, that's not entirely true. I think I do understand it; what I don't see is either the sort of concrete understanding of 'meaning' it would provide, or how it would escape the problem of suggesting that more Kolmogorov complexity is always better, until we're dealing with intractable labyrinths of 'meanings' as the pinnacle of literature.

So that's a lot of, "I don't know that I buy what you guys are selling." But that's not terribly useful. Do I have anything of my own to try to sell?

Well, it sounds like what's really desired is some way of dumbing this all down to a sliding scale, like 'show' vs. 'tell', something where we can say "moving in this direction is good; moving in that direction is bad". I'm not convinced we can do that, but if we can, I think it's going to have to revolve around maximizing information utility. It's going to have to involve density.

For an example of what I mean, take π. We aren't going to be expressing all the digits of π anytime soon, but we can write a program to obtain new ones, and that program can be a whole lot smaller than the number it derives. As I said, I'm not even remotely up on things like entropy and Kolmogorov complexity, but I'm envisioning Kolmogorov complexity as the size of the function needed to get new digits of pi and the entropy as the amount of information contained in a collection of digits of pi.
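(Something like this is what I have in mind—a handful of lines that will grind out as many digits as you have the patience for, so the program is vastly smaller than the number it prints. This particular recipe is, if I remember right, one of Gibbons' streaming spigot algorithms; the exact method doesn't matter, only the size comparison.)

from itertools import islice

def pi_digits():
    """Yield decimal digits of pi, one at a time, forever."""
    q, r, t, j = 1, 180, 60, 2
    while True:
        u, y = 3 * (3 * j + 1) * (3 * j + 2), (q * (27 * j - 12) + 5 * r) // (5 * t)
        yield y
        q, r, t, j = 10 * q * j * (2 * j - 1), 10 * u * (q * (5 * j - 2) + r - y * t), t * u, j + 1

print("".join(str(d) for d in islice(pi_digits(), 30)))  # the first 30 digits of pi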

Simple programs are more efficient. So it seems like what we really want is the simplest program that can generate the most information.

I'm so tired that I'm really starting to lose my train of thought here, and I'm not as good a person as 933905 to be willing to wait until tomorrow and edit this. But to sum up where I'm going, it seems like what's really desired is the simplest pattern that can create the appearance of the most complexity. Efficient representations of data.

935927
I think I see where you miss our[1] point. It's not high-entropy[2] text that's the key. It's high-entropy words. Text is too complex and too fiddly to subject to this analysis, I think, at the very least because language has built-in redundancies starting from the transcription level upwards. But. You can consider the entropy of the words you are using.

Now it's not a simple rule -- use high entropy words. That's a surefire ticket to Eye Of Argon land. But, it's a useful heuristic to avoid low-entropy words in certain contexts. AugieDog interpreted the low entropy of words as them being overburdened by connotative meaning, collapsing into simple dichotomies of, say, good/bad.

The 'right' word is likely to be high-entropy[4] because high-entropy is likely to correlate well with specificity. If the word is unexpected, it means it is rare, which generally means it's less overshadowed by connotation, which means it means a very specific thing. And so the right word (which is predictable in the context of the story because it is the right word) is likely to be specific (not vague) and that means it's likely to be high-entropy.

I extended this to "Show don't tell[5]," specifically to the question of why writers generally accepted as being the bee's knees, wasp's elbows, and any number of other desirable body parts of the suborder Apocrita do a lot of telling. My contention is that, when it comes to describing emotional states, the chief sin is to use low-entropy words, i.e. words so worn down by use to describe emotional states that they lose all specificity, and hence all ability to paint a picture for the reader. The reader feels little empathy, because there's precious little to empathize with -- all the shades of emotion that make an emotion recognizably human are gone, swallowed by the enormous amorphous glob of 'good' or 'bad.'

[1] Well...I'm not sure there's a 'we' at this point, but bear with me.
[2] Forget Kolmogorov-Chaitin complexity for now. It doesn't encode meaning. As far as can be determined, nothing does. What it does do, however, is say interesting things about the compressibility of information. It's really quite fascinating. The distribution of digits in Pi is very close to uniform. That would mean that the informational content of, say, n digits of Pi where n is very large is huge. But the Kolmogorov-Chaitin complexity of it[3] is as much as is needed to encode the Leibniz Series, say.
[3] Well actually, the bound of such a complexity. It's provable that Kolmogorov-Chaitin complexity itself can't be calculated.
[4] As viewed against the backdrop of language in general/or language in chosen genre, say.
[5] The version focusing on emotional states, to be precise. There are other interpretations, of course, like ones that claim the maxim is about character traits -- i.e. you don't tell the reader the character is brave, you show us bravery, through deed or through the reaction of other characters. Obviously, that's a different kettle of fish.

935967
935927
You're both right. Bradel, you're onto the major difficulties. Yes, a simple measurement of high entropy does not make a text good, but I did have a large section about why high entropy doesn't make something good, that talked about useful vs. useless information, titled "Meaning-dense writing". So I feel that I've addressed that, though not thoroughly.

>To be honest, I really don't understand Bad Horse's assertion that you can deal with random strings as low Kolmogorov complexity objects.

The key point is that it's useless to analyze a string in and of itself. The random string has high information content, but meaning, as different from information, is relative to an observer. Meaning (to person X) is information that is useful (to person X). If the string was generated by a Random-Number Generator, and it wasn't just blind luck that it was useful to X, then any number generated by that RNG would be equally useful to X. So the amount of information it takes to specify a useful string is just the length of the RNG code.

It sounds like I'm claiming that the word "meaning" induces a partition of possible strings into equivalence classes based on their function. Not exactly, but that might be a useful way to think about it.
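(A concrete way to picture it: the complete specification of one particular "random" text is just a few lines of generator plus a seed, and for any purpose that merely needs "a random text," every seed serves as well as every other.)

import random

def random_text(seed, n_words=10000,
                vocab=("blue", "frog", "entropy", "bog", "lavender")):
    """One fully specified 'random' text: these lines plus the seed are
    all the information needed to reproduce it exactly. The vocabulary
    is arbitrary -- any word list makes the same point.
    """
    rng = random.Random(seed)
    return " ".join(rng.choice(vocab) for _ in range(n_words))

# Each seed reproduces its text exactly, yet if all X needs is 'a random
# text', the seeds -- and hence the texts -- are interchangeable.
text_a = random_text(seed=1)
text_b = random_text(seed=2)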

>If meaning is operationalized by Kolmogorov complexity, would not meaning be maximized by creating such a mish-mash of disjoint ideas that one would have to sort through each independently?

I said meaning-dense, not high-meaning, so the thing to maximize would not be meaning, but meaning per word. But, you might be right. Which has more meaning per word: A well-constructed story, or a phone book? My impression from my work in knowledge representation is that the amount of knowledge needed to construct any natural story is vastly greater than the length of the story, so any story has greater Kolmogorov complexity than a phone book of equal length.

Euclid's five postulates are dense in meaning, because all Euclidean geometry can be derived from them. A well-built story is like that. It contains elements that don't contradict, and that interact with each other to imply many things.

>it seems like what's really desired is the simplest pattern that can create the appearance of the most complexity. Efficient representations of data.

"The appearance of complexity" is a problematic phrase. If you have 2 stories each of 1000 words, and one required 10 million words of knowledge to create, and second other 20 million, the second has higher K complexity, and so it (the story) is the more complex pattern, and the more efficient re-presentation of knowledge.

>It's not high-entropy[2] text that's the key. It's high-entropy words.
High-entropy words are most tractable. I think one could apply the same ideas to the text as a whole, if one were clever and patient.

>And so the right word (which is predictable in the context of the story because it is the right word) is likely to be specific (not vague) and that means it's likely to be high-entropy.

Well, if it's predictable, it's low-entropy. By high-entropy I don't mean merely low-frequency. An entropy calculation must consider the context. My idea is worse than useless if the correct calculation gives low entropy but we chose the word because our crude calculation gave it high entropy.

So Bradel has an important point. My gut feeling is that, overall, there is a tradeoff--more unpredictable words are likely to not interact with the previous words of the text, making word soup / a dictionary, and not a set of propositions from which one can construct a great mass of inferences. So greedily grabbing some high entropy for individual words will lower the meaning present in the text. Poetry is (in this info-theory argument) when you walk right up to the edge of that tradeoff, where you find words that give the greatest product of (unpredictability * usefulness).

934803 famous/infamous: Correct, but that pair seems morally defensible.
934885 awesome/awful: That's a really interesting example.
I was thinking of sneaky words that people use in arguments, where they can use a word that's technically correct, but is used to make a value judgement with no evidence. The best I can think of is "cheap" vs. "inexpensive", but I know there are more-subtle ones.

935927>>935967 Here's the problem: Newton had a bunch of observations of how fast things fell or rolled. He derived the laws of gravity from them. So the laws contained zero additional information, and zero entropy. Were they therefore unsurprising and uninteresting?

No. They contained no information, yet were exactly the sort of surprising thing I want a theory to handle. (I'm modeling reading a story as scientific discovery. In both cases, you seek the pleasure of finding things out.)

You shouldn't value a story for the new information it conveys to you, because the story is fiction. Any completely new information it contains, that was not inherent in or implied by your previous experience, is a lie. The story's value is in correlating and connecting information you already had.

In providing new frameworks and theories. Newton's theory of gravity let him throw out those pages of observations, because he could re-compute any of them any time he wanted to. It was the shorter, more information-dense, less-wordy representation of the information.

So I should restate my post to say something like this: A good story is one that has lots of meaning per word. You might measure meaning as the knowledge needed to write or to understand the story, or that you can infer from it. Or you might measure it as the quantity of stuff you already knew that the story made you think about, or gives you a better grip on. Or the amount of information you can forget because it is now summarized by your memory of the story.
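
If you want the crudest possible version of that measurement, a general-purpose compressor gives you one for free: the better a text compresses, the more redundant it is. This is only a sketch, and it sees surface redundancy (repeated phrases, predictable letter patterns), not meaning; the example sentences and variable names below are mine and arbitrary.

```python
import zlib

def bits_per_char(text):
    # Compressed size over raw length: a crude, surface-level proxy for
    # information density. It notices repetition and predictable letter
    # patterns, not genre expectations or story structure.
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return 8 * len(compressed) / len(text)

wordy  = ("It was very, very, very cold out. It was so cold that it was, "
          "in fact, very cold indeed, and everyone said it was very cold.")
denser = ("The cold split the fence rails and froze the dog's water "
          "mid-slosh; the mailman's breath hung in the yard like a ghost.")

# Note: on passages this short the zlib header overhead dominates, so only
# the comparison between the two numbers (and only roughly) means anything.
print(bits_per_char(wordy), bits_per_char(denser))
```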

(This makes me think that an oracle (in the technical sense) would not find stories or science interesting. If the enjoyment of a story comes from discovering a more-compact representation for things you already knew, and improving your computational efficiency, then an oracle can't enjoy them: an oracle is already infinitely efficient. Perhaps it's pointless to speculate on the feelings of theoretically-impossible beings. But perhaps it suggests that making ourselves more perfect would diminish our capacity to enjoy stories.)

On the word level, the best word is the one that makes the best tradeoff between being unexpected enough to connect to new ideas, but expected enough to also connect to the old ideas. It is a bridge.

936158
Regarding words alone, I think we can agree easily.

Regarding meaning...interesting. What you are suggesting, then, is that the pleasure of reading a story[1] is discovering more efficient ways of expressing certain forms of knowledge? It doesn't tell you new things[2]; it instead takes your current knowledge and provides a new framework for it which can be more efficient[3] or can just provide a purely aesthetic enjoyment[4]. I'm not sure this is right[5], you understand, but it sure is fascinating.

It's especially curious that you model a story as one would a scientific theory. I say curious because even cursory thought draws interesting parallels between the properties of a scientific theory[6] and those of a good story. Theories ought to be predictive -- can it then be said that a good story also lends itself to sprawling? To fanfiction, in fact? Theories ought to be falsifiable -- can it then be said that a good story must operate within a rule framework which allows its internal logic to fail[7]?

The answers are probably 'no,' truth be told, but I am curious if it turns out that the answers are, actually, 'no, but...':twilightsmile:

[1] Or a certain type of story?
[2] Well, it might depending on the research of the author, but I think we can ignore that because people clearly enjoy stories set in entirely non-factual settings, like, say, those with magical talking horses. To pick an example entirely at random.
[3] Hence the self-improving aspect of literature
[4] Possibly a weak point -- is it always efficiency we admire?
[5] In the sense that I'm unprepared to commit to any judgement in any meaningful way without thinking about this way more.
[6] In the Popper/Lakatos sense. Though Lakatos'd say "Research Program" of course.
[7] Shades of Sanderson's Law?

Oh, sweet Morning, when my brain returns to functionality.

935967
I think I see where my disconnect comes on the entropy thing. Understandably, I'm just reading entropy as an analog to "probability of appearance" – as I said, I do statistics and philosophy / theology, and if it doesn't fit one of those frameworks, I get in trouble. Thankfully most things seem to fit just fine. Where I erred, though, was in considering (if you'll bear with me extending the metaphor) a conditional probability of appearance based on known text and not a joint probability of appearance with known text [1][2]. My initial framing of the problem was poor, and would let 'phrontistery' win just about any competition. Change the framing a little, and this idea begins to make much more sense.

936120
I am positively embarrassed by my failure to notice that you were talking about density all along. So it sounds like what you're talking about with meaning isn't exactly, but is almost, analogous to some sort of sufficiency measure[4], and then Kolmogorov complexity takes on something of a Cramér-Rao role as a measure of how small a sufficient representation can be. I'm also somewhat embarrassed I didn't pick up on that last night. It's a pretty smooth way to think of things. Now, how one compares the proportion of information contained in a sufficient statistic to the remaining variability in data after accounting for that statistic strikes me as a pretty hard problem, and one that would almost certainly need a nonparametric framework.

How does this bear on the phone book vs. story problem, then? I think that remains unclear, but maybe I'm just not taking the time to think it through. A sufficient representation of a phone book is basically just going to tell you that you have a list of names and seven (or ten) digit numbers, and that these names are going to appear in alphabetical order. It doesn't take a whole lot to represent that sufficiently, but while having a sufficient representation removes a lot of randomness from the phone book (knowing where to expect words and numbers is a bit of a big deal), it doesn't give you enough information to make any serious predictions about what you'll see next. You framed this problem as meaning-per-word, and I think that's fair. We're constructing the sufficient information, the predictive information, in our heads as we go along, so that's kind of out of the writer's control.

To me, the question then becomes, how efficient is a writer at getting that predictive information into our heads, and how good are the predictions one could make with that model. With a phone book, we get all the necessary predictive information very quickly, but it doesn't help us make very specific predictions. Certainly nothing we could call repeatable. A good story is going to take more time to get predictive information into our heads – we need to know genre, tone, characters, etc. But that predictive information is also going to be much better. If we know genre, tone, and characters, we begin to be able to extrapolate the shape of a story. We can make predictions, and these predictions may even in some sense be repeatable.

How does this bear on word entropy? So far, I think this remains largely orthogonal, so yes, we have two nice dimensions on which to work like other people have been suggesting.

The problem I see now is whether the system favors complex plots and characters over simple ones. Simple ones are easier to predict, so... aha. Meaning density, meaning-per-word, depends on not just the number of words, but how much total meaning there is to be gleaned from them. Here's where we get our "why should a long story include subplots". Thinking about it more concretely, you're going to maximize meaning-per-word if a word provides information on multiple predictive dimensions. Perhaps one word describes Rarity's attitude well; another word is more setting-appropriate for a British mystery style story; and yet a third word, perhaps not as precise for Rarity or the setting, still does a good job bearing on both elements. More total levels of meaning means more roles each word can play. And this, finally, seems to tie straight back into entropy. We don't want a word that's unexpected in context, we want a word that's precisely cued to context, and...

Oh bother, I'm going to have to revisit the entropy ideas. I'm not sure they're staying coherent here.

Anyway, this reply has gotten long enough and includes some meat for chewing. So I'll let other folks do that while I eat my breakfast.

[1] My first footnote! I'm not sure whether to feel excited, dirty, or derivative. I probably should feel all of the above.
[2] This is a bad metaphor, but easier to grasp. I think the truth of it is closer to the following: if we consider a given word as a piece of data and the words around it as parameters, I started out by considering a ratio, the probability of seeing that data over the supremum over all such possible data of probabilities of seeing that data. What I should have been looking at was the likelihood of the context. Since the probability of a piece of data given its parameters is equal to the likelihood of the parameters given that piece of data, the numerator retains its form. The denominator, though, becomes the likelihood of the parameter set (string of nearby words) most likely to result in the appearance of a particular word. These ratios would, I think, in the abstract, be something like inversely proportional to entropy – so if the ratio is small, entropy is large. The former, though, hinges on how likely it is to see a particular word rather than other words, conditional on context. The latter depends on how unexpected a word is relative to the surrounding words. An argument could be made for looking at either of these, but I think the latter one is more interesting. What we want isn't so much a word that's unexpected in context, but an expected word used unexpectedly. [3]
[3] And now you know why I used a footnote.
[4] For those of you trying to follow my stat-think, sufficiency means roughly "containing all the non-stochastic information relative to a collection of data". That doesn't seem to make things a lot clearer, though. Let's try it this way. Say you have a process from which you'll get some data (a distribution, if you will), and this process is governed by a set of parameter values. A piece of information is sufficient if it contains all the information about system parameters contained in the data, or to put it another way, if you can partition the process into two pieces – one that creates the sufficient information from the parameters, and another that creates the data from the sufficient information. We say that a statistic is sufficient if the data depend on the parameters only through that statistic.
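
A toy example of that last sentence, in Python (coin flips, chosen by me purely for simplicity, not anything from the discussion above): two different sequences of flips with the same number of heads give identical likelihood functions for the coin's bias, so once you have the count, the individual flips carry no further information about the parameter.

```python
import numpy as np

# Two different coin-flip datasets with the same number of heads (3 of 8).
data_a = np.array([1, 1, 1, 0, 0, 0, 0, 0])
data_b = np.array([0, 0, 1, 0, 1, 0, 1, 0])

def likelihood(p, data):
    """Bernoulli likelihood of heads-probability p given the observed flips."""
    return np.prod(np.where(data == 1, p, 1 - p))

# Because the number of heads is sufficient for p, the two datasets give
# identical likelihoods at every p, even though the flips themselves differ.
for p in (0.2, 0.5, 0.8):
    print(p, likelihood(p, data_a), likelihood(p, data_b))
```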

935927
> "The right word is specific and hence has high entropy," makes no sense. The right word, by virtue of being the right word, should be perfectly predictable.

"Mary's father died. She felt ____." What's the "right" word? It's perfectly predictable if we have perfect information about Mary ("sad" if she loved him; "relieved" if he beat her; "guilty" if she accidentally killed him; etc.), but in a fictional story, we are starting from a low-information base, and it is the story itself which builds our understanding of the story.

Here entropy means "adding new information." A high-entropy word, because it carries more information about the contents of the story, is more useful in understanding the story than a low-entropy word. This is why "we" are correlating entropy with specificity: "Mary felt sad" tells us less about the story, and about Mary herself, than "Mary felt saudade."

Edit: … and now I notice that's already been cleared up, never mind. Edit2: Or not. Still might be a useful thought in the context of your last few paragraphs of the new reply.

935967
…desirable body parts of the suborder Apocrita…
Can I be the ant's pajamas?

936158
> Poetry is (in this info-theory argument) when you walk right up to the edge of that tradeoff, where you find words that give the greatest product of (unpredictability * usefulness).

I like this insight. It's also an important reminder that we're discussing prose fiction here, which lies within a specific range on the sliding scale, closer toward "useful" than poetry, closer toward "unpredictable" than mathematical proofs or phone books. A lot of the discussion here seems to be "most prose needs to be nudged in the direction of poetry," but we're still talking about a range far smaller than the whole scale.

> The best I can think of is "cheap" vs. "inexpensive" …

Ah, okay! I suspect you could mine a rich vein of examples over in the land of political shibboleths. For example, the loaded meanings (in American politics) of "treehugger" vs "environmentalist", (the adjectives) "Democrat" vs "Democratic", "fetus" vs "unborn baby", etc.

Finally:
> You shouldn't value a story for the new information it conveys to you, because the story is fiction. Any completely new information it contains, that was not inherent in or implied by your previous experience, is a lie.

I'm bookmarking this, because by grand and cosmic chance, the story you commissioned is a meditation on this exact point. (I need to set aside a little writing time after work and get cracking on that…)

936433 I think I'm feeling what other people complained about feeling when they read my post. :derpyderp1: What I take away from this is: Never try to outgeek Bradel.
:twilightoops:
No, Twilight. Not even you.
Meaning as a sufficiency measure: I see what a sufficient statistic is, but I can't guess what a sufficiency measure is. A measure of how sufficient your parameters are? Would that range from 0 to 1?

>Now, how one compares the proportion of information contained in a sufficient statistic to the remaining variability in data after accounting for that statistic strikes me as a pretty hard problem, and one that would almost certainly need a nonparametric framework.
Help? :rainbowderp:

936609
> I'm bookmarking this, because by grand and cosmic chance, the story you commissioned is a meditation on this exact point.
See, this is what I'm talking about. A coincidence like this is entirely too predictable. It's just lazy writing on the part of God.

937370
Don't give me more credit than I deserve. When I say "sufficiency measure" I'm not (to the best of my knowledge, at least) referring to any existing concept. Basically, I mean "a thing to measure sufficiency," not "a thing with the mathematical properties of a measure that can be applied to the notion of sufficiency." Though the latter would certainly be interesting.

But it does seem like a sort of natural extension of the ideas of completeness and sufficiency, so what I'm really talking about when I say "sufficiency measure" is the variance of a particular sufficient statistic scaled by the Cramér-Rao lower bound for the variance of all estimators of the same quantity.
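
In the simplest textbook case, that scaling works out like this (a rough simulation sketch, not anything specific to stories; the distribution and numbers are mine, chosen for illustration): simulate normal data with known sigma, measure the variance of the sample mean as an estimator of mu, and compare it to the Cramér-Rao bound sigma^2/n. The ratio comes out near 1, which is what "fully efficient" means.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 25, 100_000

# Monte Carlo variance of the sample mean as an estimator of mu.
samples = rng.normal(mu, sigma, size=(reps, n))
var_of_mean = samples.mean(axis=1).var()

# Cramér-Rao lower bound for unbiased estimators of mu when sigma is known.
crlb = sigma**2 / n

# Ratio near 1: the sample mean attains the bound, i.e. it is fully efficient.
print(var_of_mean, crlb, crlb / var_of_mean)
```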

...um. Okay, yes. I think maybe I see your point. I apologize, you got me thinking. And I don't know how to think on your terms, with physics and entropy and Kolmogorov information, so I'm just using what I understand of those things from context and brief Wikipedia scanning to try to put it in a framework that I'm used to.

As for...
>Now, how one compares the proportion of information contained in a sufficient statistic to the remaining variability in data after accounting for that statistic strikes me as a pretty hard problem, and one that would almost certainly need a nonparametric framework.

That's just me thinking in linear modeling terms – let's try to portion out variance into what's explained by the model and some random error. If we think about writing as creating a dataset, what we want, or maybe I should say what I want, is a dataset with a lot of variance and a model that explains most of the variance in the data succinctly. Lots of variance = lots of entropy, so what I'm really interested in is a certain type of variance / entropy that we can generate with a minimum of fuss by using a clever model (our 'meaning').

The complication here being what I said about layering in meanings, so we're really looking at more of a multivariate prediction problem. We want to take a set of data and predict lots of different things at once. That'll allow us to up our information density while still keeping the creation of that information tractable. The problem is that this type of model can't be anything resembling linear – that would grossly oversimplify the patterns one could get in terms of meaning – so you'd need some larger framework to use.

Since I don't really have any good preconceptions about those sorts of patterns from a data-analytic standpoint, I'd probably have to look into non-parametric methods, which just let you fit a smooth curve to data, however weird it gets. The simplest way to think about them is in terms of piecing together a lot of thrice-differentiable functions, broken up at data points, so you can have continuity in second derivative at every change of functions – but there are lots of different ways to do them, about most of which I know very little. The important thing is just that they're a flexible class of models, and since this is me thinking about a statistics analog for the whole Kolmogorov complexity thing, it's really about resolving the fact that it's nigh-impossible to get a definitive best answer about these sorts of things.
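
If it helps to see that in code, here's a rough sketch of what I mean (using a Gaussian kernel smoother as a stand-in for the spline construction, on data I just made up): fit a flexible curve, then ask what fraction of the data's variance the fitted curve accounts for.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.4 * rng.standard_normal(200)   # noisy data with a nonlinear pattern

def kernel_smooth(x_train, y_train, x_eval, bandwidth=0.5):
    """Nadaraya-Watson estimator: a locally weighted average with Gaussian weights."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

fitted = kernel_smooth(x, y, x)

# "Meaning" read as variance explained by the smooth model vs. leftover noise.
r_squared = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(r_squared)
```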

And don't worry, I still feel all :derpyderp1: when I read the ideas you put down in the original blog. That's why I'm off in my own world rebuilding it in a form I find more intuitive.

Hah. Intuitive.

Actually, when I say I do philosophy / theology too, it basically just means I've spent enough time putting both into a statistics framework that I've developed an entire Bayesian epistemology to resolve the argument from free will and expanded it into a Kantian categorical imperative that seems to also handle theodicy by echoing back to Leibniz and recognizing the necessity of order or its appearance in the development of causal reasoning. From there, things start getting esoteric. So yes... I suppose I am kind of a nerd.

932851
If that is the case, then I find myself miles from the mountain itself, unable to even see its statuesque structure through the hills and bends.

I have been writing for a little over a year, no more. I have no schooling in this art, nor have I studied it beyond my own experiments (most of which were trial and error with a big focus on error) and so, to me, the science of writing is a mystery.

I'd like to think of myself as somewhat capable, or at least far better than I was when I began. Yet these posts remind me of how little I know and truly understand[1]. In my readings I've come across a lot of genres and styles, many of which don't do what you've written, yet were still really enjoyable reads.

Maybe I'm beginning to put too much weight on the power of simple words versus the flexibility of many.

[2]

Realizing that there were certain shifts in writing and a panoply of methods to communicate the same idea was a great boon to my abilities. From then on, I was able to focus instead on pacing and the act of delivering information instead of the 'how'.

A simple example is the infamous action scene, where the interactions of the world are happening at an increased pace and where, most importantly, complex movements have to be described in punchy, short bursts. This is one place where your "high-entropy" comes into play.

Then, there's the romantic squabble or first true meeting. Here, wasting words to describe tiny notions such as the set of a character's shoulders or the way in which one is breathing is fine and even expected to some degree.

Ahh, work beckons, so I'll leave off with a final point by one of my favorite authors and childhood idols, one that used metaphor in his work in such a way that I've yet to see replicated.

Don't use words too big for the subject. Don't say "infinitely" when you mean "very"; otherwise you'll have no word left when you want to talk about something really infinite.

--C.S. Lewis

[1] Mind you, I understand the concept, and have for a time, but I never put it so plainly or even learned to apply it in a steady fashion.

[2] Before and after this point in the comment, I consciously changed styles, more out of curiosity than anything else.

937691
> Bayesian epistemology … that seems to also handle theodicy

Hey, hey, you can't drop a high-entropy bombshell like that and walk away! :twilightsmile: Can I derail the conversation to get the ten-cent version of how you connected those dots? I usually see Bayesian reasoning applied to human action, and theodicy is a dilemma about the actions of a higher power…

938684
Let me offer some food for thought: Entropy, as we are discussing it here, has nothing to do with how many words you use. 933568 offered a beautiful example of high-entropy vs. low-entropy word choice. Entropy can't even tell us whether a word is "wasted" or not: only how much new information it introduces to the story.

As a rule of thumb, most writing in most circumstances could stand to add entropy. However, as C.S. Lewis (and some of the discussion above) tells us, maximizing entropy isn't always the correct choice. An action scene's total entropy should clearly be lower than a romance scene's; the more new information a reader has to process, the more it will distract them and slow their trip through the scene. That being said, individual words can often be usefully replaced. Describing a blow as a roundhouse instead of a punch, for example, can make an action scene more vivid.

938978
>Hey, hey, you can't drop a high-entropy bombshell like that and walk away!

Can too!
Look for a PM.

939032 Bradel, if you have something to say, share it with the rest of the class. :trollestia:

939239
Fine, fine, I'll out myself. This is dropped straight from the PM Horizon got.

Have at it, you mad people. I hope my worldview survives the night.


So I'm going to have to be somewhat quick here. I have a lot of RL work to do today, and very pressing deadlines governing it.

Here's the short. I hold the existence of free will as a point of faith. I'm also Christian. So how do you resolve the idea of an omniscient God and freedom in action? A lot of people would probably just say it's double-think and move on, rejecting freedom or divine omniscience (or, roughly equivalently, the knowability of the universe). One of my friends basically hit this and did exactly that, rejecting free will entirely and jumping on the deterministic universe bandwagon.

The conflict actually has a proper name, the "argument from free will" I mentioned in the comment. So for my part, I tackle it from the side of divine omniscience. Say a being does exist or can exist and they can be correctly characterized as "knowing everything". Well, then how can it be possible for an action to be free, since it is by definition known?

Enter Bayesian thinking. One of the ways people have used to deal with this is a sort of counterfactual argument. I step that up. My attitude is that nothing in the universe is "knowable" in the way we understand knowledge. I think this jibes well with science, where (as a Bayesian) I believe that we can never achieve truth, we can only find better and better models that approximate what's really happening. From a rational perspective, I don't think we can legitimately call anything true (except possibly rules of math and/or grammar).

Extend that out a step. If nothing is knowable in the sense of conventional epistemology, what would omniscience look like? I figure it would look like a perfect representation of the probability state of the universe at every given moment. An omniscient being wouldn't know what WOULD happen next (conventional epistemology), but would have a precise understanding of what COULD happen next.

Wait, this still doesn't make any sense. What about after something happens? Then we have a known state. An omniscient being must also be able to know this known state, and given that we're talking about omniscience, really must have always known this known state.

Oh, that's easy to fix. There are no known states. Our existing epistemology is all wrong. Why don't we just let the universe be more than four-dimensional? Every time a probabilistic event happens that can change the state of the universe, every event does happen and reality branches. Limited human perception keeps us from perceiving the entirety of this (although there's no real reason to believe we don't get small echoes of it from realities in very similar configurations, and no real reason to believe we'd be incapable of perceiving surrounding realities more fully). So omniscience basically means knowing all the probabilistic paths between what we would consider separate eventuated realities.

All this to justify free will. So how do we get theodicy? Well, let's go a step further and take free will as a universal good. This is effectively my starting point in thinking, so one branch leads down to the Bayesian epistemology. On the other side, we can ask, "What is necessary for the existence of free will?" My answer to this is that free will cannot exist without the capacity for causal reasoning. The idea of "perfect freedom" in the sense of "I could grow wings right now and fly away" doesn't seem like it's actually perfect freedom at all, to me. Freedom means having some capacity to effect change. You can't change something without a reference frame. Understanding a reference frame means understanding that your actions precipitate consequences. And fundamentally that relies on physics. Without a universe that appears to be ordered and obey certain laws, we can't develop the capacity for reason and for evaluating the consequences of our choices.

So theodicy. Why is there evil in the world? Well, on the human scale, we have free will, but that's easy. On the natural scale, why does God let hurricanes and tsunamis and tornadoes kill people?
Because our goal isn't to reduce suffering, it's to maximize freedom. Fungi have life. Life is not all that precious a thing. The capacity for self-actualization is precious. The capacity to decide our own fate is precious. And hurricanes, tsunamis, and tornadoes reduce freedom in the sense that they kill people, but they increase freedom inasmuch as they're artifacts of a universe in which certain physical laws appear to govern phenomena. If we can't have freedom without causal reasoning, then whatever is required for causal reasoning seems pretty well justified and it's not hard to believe we could actually live in the best of all possible worlds.

As a side note, I think that freedom is a far more intelligible way of understanding whether actions are moral or immoral than pretty much any other criterion, too. If you take away free will, that's a bad thing. If you increase it, that's a good thing. So what we often consider two of the worst crimes that can happen – murder and rape – are actually wrong for basically the same reason, that they directly revoke another person's free will. Social laws like laws against speeding are minor restrictions on freedom put in place to reduce the likelihood of greater losses to freedom (from traffic accidents). Everything just starts making more sense, in my opinion.

You asked.
