
Bad Horse


Beneath the microscope, you contain galaxies.

On surveys and the precision of language · 6:05am Dec 21st, 2016

Trigger warning: Extreme nerdiness

There are still comments coming in on my post "How to piss people off with a survey", and still some hard feelings (though I feel much better now, thanks). It occurs to me that the binary question is a concrete example of how words mean, and can cast light on the larger argument over the precision of language and the nature of "truth."

A synthetic statement can’t (I think) be “true” in the old-fashioned sense required by logic, nor can it be absolutely, 100% certain. It can, however, be meaningful and precise. Unfortunately, the kinds of statements that can be meaningful and precise aren’t the kinds of statements we’re interested in.

Take the survey question, “Are you in the fandom more for...”, with allowable answers ‘the show’ and ‘the fandom’. Suppose I had 100 responses, 73 ‘the fandom’ and 27 ‘the show’. What truth can I state given that data?

You might like to say something like, “Most fans are in the fandom more for itself than for the show.” The data is strong evidence that that statement is true, but we don't know how confident we should be about it.

You might like to say, “The probability p that a fan is in the fandom more for itself than for the show is 0.73,” but you don't really know that. If you'd gotten one more answer, the sample result would have gone either up or down. The true probability is probably close to 0.73. We can't say we know the true probability. We could say how confident we are that it's within some specified range of 0.73, although that's not what I'm going to suggest doing.

You might like to say something about the probability that the proposition “Most fans are in the fandom more for itself than for the show” is true. In other words, given our results, what is the probability that the true p for the population > 0.5? We could answer this if we could integrate, over the possible values for p from 0 to .5, the probability of getting 73 or more “fandom” responses out of 100. The problem is that, in the real world, p = .5 seems much more reasonable for this question than p = 0. If I asked you to place bets on our results before we asked anybody any questions, probably more people would bet on getting half of each survey answer than on getting zero of one or the other. So just doing that integration, without accounting for the prior probability distribution for p--think of that as what the betting odds for different ranges of p would be before conducting the survey--will give us a wrong answer. That simple integration would assume that 0 < p < .1 and .4 < p < .5 were equally probable ranges in which to find the true value of p.

But we don’t know what the prior probabilities are for different ranges of p. The term “prior probability” isn’t even well-defined here--prior to what? We could say “prior to knowing anything about people and how they behave” and use the uniform distribution for p. That means just saying "I dunno how people behave; maybe every value of p is equally likely." That would let us compute a result that was precisely defined and accurate, but could not be translated into a number that could be used by a scientist embodied in the real world who already knew something about people and how they behave.
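As a rough sketch of that uniform-prior calculation: with a flat prior on p, the posterior after 73 'fandom' answers out of 100 is proportional to p^73 (1-p)^27, and the probability that p is below 0.5 is the share of that curve's area lying to the left of 0.5. The function names, the midpoint-rule integration, and the step count are assumptions made for illustration, and the flat prior is exactly the "I dunno how people behave" assumption, so the number it gives carries the same caveat.

```python
def unnormalized_posterior(p, fandom=73, show=27):
    """Likelihood of the observed answers times a flat (uniform) prior on p."""
    return p ** fandom * (1.0 - p) ** show


def integrate(f, lo, hi, steps=200_000):
    """Midpoint-rule numerical integration of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


def prob_p_below_half():
    """P(p < 0.5 | data), assuming the uniform prior (an assumption, not a fact)."""
    below = integrate(unnormalized_posterior, 0.0, 0.5)
    total = integrate(unnormalized_posterior, 0.0, 1.0)
    return below / total


print(prob_p_below_half())
```

Swapping in a different prior only means multiplying `unnormalized_posterior` by that prior's density, which is where the real-world disagreement would live.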

Consider the hypothesis that p = 0.5, meaning fans are equally likely to answer either way. We can say--and this is about as close to a true statement in English as you can get--“If our sample of fimfiction users was unbiased, and fans were equally likely at the time to be in the fandom for the fandom or for the show, then the probability of getting 73 or more ‘fandom’ responses in our survey would have been .00000235.” This is the probability that flipping a coin 100 times will give 73 or more heads. We can also say, “If our sample was unbiased, then the probability that the true value for p, the probability of a fimfiction member responding on Dec. 14 or 15 2016 that they were in fandom more for the fandom, was less than 0.5, is < .00000235.” That’s truth to 3 significant digits.
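The coin-flip figure can be checked with a short exact calculation. This is a minimal sketch, assuming a fair coin and Python 3.8+ for `math.comb`; the function name is made up for illustration.

```python
from math import comb


def fair_coin_upper_tail(n, k):
    """Probability of k or more heads in n flips of a fair coin.

    Counts the favorable outcomes exactly with integers, then divides
    by the 2**n equally likely sequences.
    """
    favorable = sum(comb(n, j) for j in range(k, n + 1))
    return favorable / 2 ** n


# 73 or more 'fandom' answers out of 100 if p were exactly 0.5
print(fair_coin_upper_tail(100, 73))
```

The printed value should land close to the .00000235 quoted above.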

(Defining exactly what “unbiased” means is surprisingly tricky, but you can easily come up with hackish definitions that are good enough to work with. You could be excruciatingly epistemologically correct, but in practice it's not worth the trouble.)

The real numbers in this case are 1176 ‘fandom’ answers out of 1606, and my computer doesn’t have enough numeric precision to represent a number as small as the probability of getting 1176 or more fandom answers if p = .5. So while we can’t say it’s “true” that most fimfiction members will say on a survey that they’re in the fandom (on Dec. 14-15 2016) more for the fandom than for the show, we can say that you shouldn’t waste time worrying over whether I got an unrepresentative sample just by pure bad luck. It’s much more reasonable to wonder if I got an unrepresentative sample because I asked people on my blog, or whether you're drunk or dreaming, or if there was a bug in the Google Forms software that reported the answers wrong, or if I copied them down wrong, or if I’m lying and made those numbers up to prove some point.
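One way around the precision problem is to keep the count in exact integers and only take a logarithm at the end, so the tiny probability is reported as a power of ten rather than stored directly. This is a sketch under the same fair-coin assumption (p = .5); the function name is hypothetical.

```python
from math import comb, log10


def log10_fair_coin_upper_tail(n, k):
    """Base-10 logarithm of P(k or more heads in n fair-coin flips).

    Python integers are arbitrary precision, and math.log10 accepts
    very large integers, so nothing underflows along the way.
    """
    favorable = sum(comb(n, j) for j in range(k, n + 1))
    return log10(favorable) - n * log10(2)


# 1176 or more 'fandom' answers out of 1606 if p were exactly 0.5
print(log10_fair_coin_upper_tail(1606, 1176))
```

The result is the exponent: a value of -x means the probability is about 10 to the power -x.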

The difficulty is that this statement--let’s call it S--can be meaningful and precise only because it’s a “second-order statement”, a statement about a statement. S is about the statements T1 and T2, “I am in the fandom more for the fandom” and “I am in the fandom more for the show”. The uncertainty about meaning is stored in T1 and T2: What is a fandom? What does it mean to be “in” a fandom? Are these people really “in” the fandom? If you want to apply S, which is nearly true, to the real world, you’ll usually need to interpret T1 and T2.

I believe all the words have meaning which can be unpacked, but the packing and unpacking is done by our senses and by our learned responses, not by our conscious mind, so we can't explain it. Explaining why I believe that would take a bigger post. Just showing the flaws in post-modern arguments would be easier than explaining how meaning is stored and accessed, but either would take more words.

But rather than get into all that, scientists just try to choose T1 and T2 so that interpreting them isn’t very ambiguous. For example, if you wanted to study whether artificial colors make kids hyperactive, you couldn’t let T1 be the claim “Artificial colors make kids hyperactive.” It should be something like “In the hour after eating a cookie with 1 gram of red dye #3, more than 10% of children ages 8-12 will have an increased activity score as recorded according to the scoring system described in Table 2,” where Table 2 says things like “Child raises arms above head: 1 point. Both of child’s feet leave the ground simultaneously: 3 points.” That’s called operationalizing the question: defining it in terms of observable behavior (“operations”). That gives you T1 and T2 that aren’t very ambiguous, but translating them back into human summaries like “hyperactivity” is difficult.

NERD DIGRESSION: Einstein invented operationalization in 1905. He was puzzling over the apparent fact, from Maxwell's equations for light and the Michelson-Morley experiments, that the speed of light was always the same for any observer. He realized this meant distances and times must appear different to different observers, and this meant that no one could ever actually measure the distance or the time between two events distant in time and space. You had to already know the distance to measure time, and you had to know the time to measure distance. So he said, "What if I realize all my time and distance measurements are really measurements of distance in time-space?" Developing the theory of special relativity was simple after that. But operationalization changes the meaning so much in relativity, and also in quantum mechanics, that you can't translate the results back into ordinary human thought at all. You have to keep thinking in those strange new terms forever after.

This is how truth works in science. It’s difficult, but we can make statements with language that are, for most practical purposes, true. The particular objections made by phenomenologists and post-modernists are not show-stoppers. They're correct that we can't objectively define all the terms within the statement, but in practice that isn't the source of most miscommunication with precise statements. More problems arise in trying to translate these second-order statements--statements about the probability of some summary statistic of operationalized behaviors of the population lying within some range--into first-order English statements about the things we’re interested in, like "Most fans are X." That is where more errors in scientific articles come from--not in some disconnect between language and reality, but in keeping track of all our assumptions and summarizing our truths in familiar terms. If policy setters were also good scientists, we could keep statements in those operationalized terms, and there would be a lot less confusion in the world.

I’ve noticed over the years that when I make a precise statement, people without any scientific training seldom understand what I said. That claim above did not say that 73% of fans are in fandom for the fandom. That’s not even close to what it said. “73% of fans are in it more for the fandom” would be closer, but still wrong. People who aren’t used to statements like that usually just strip off the qualifiers and leave some chunk of words from the middle, and remember “that’s what Bad Horse said.” Even if they’re writing a reply so they have my actual words right in front of them. That’s the usual response to any precise statement made by a scientist.

So it’s especially frustrating to hear claims made by people in certain academic disciplines that science is bunk because words can’t really convey meaning, when I know from experience that people from those disciplines are the ones who don’t seem to pay any attention to what words mean, either when they speak or when they listen. The people who complain the loudest about the inability of people to make meaningful statements are the people who are the worst at making meaningful statements. Read some Heidegger or Derrida if you don’t believe me.

Now, if I’d put in a third answer choice on that question, “neither”, what true statement could I make?

I couldn’t make a precise statement about the probability of getting N or more ‘fandom’ responses to the two-choice question “You’re in it more for the…”, because a big chunk of responses are missing, and the responses that are missing are not random. I couldn’t easily make a precise statement about the probability of getting N or more ‘fandom’ responses to the three-choice question, because I have a new free parameter now--what range of fandom / show importance ratios should give a ‘neither’ answer? And that range varies from person to person. And depends critically on the exact wording used: 'neither', 'both', ‘I don’t know’, ‘other’ might all give different results. Merely adding the third option, ‘neither’, would change the statement I could make based on the results from one that is precise and true--in a really epistemologically rigorous way--to one that is neither.

Comments ( 18 )

So while perhaps we can’t say it’s “true” that most fimfiction members will say on a survey that they’re in the fandom more for the fandom than for the show, we can say that you shouldn’t waste time worrying over whether I got an unrepresentative sample just by pure luck.

Of course, it is reasonable to think you got an unrepresentative sample by design. People who don't read blog posts or participate in the fimfiction community would have been less likely to find out about the survey, so one would expect some bias toward fandom-oriented fans.

I do agree with the general point. Scientific findings often get distorted in the giant game of telephone that occurs when the results are translated from scientific paper to press release to news story to twitter headline. Important caveats get lost as information gets passed through people with progressively less understanding of the topic.

I thought the next blog post was going to be the blanket 'Things Bad Horse Screwed Up'. Not that I'm complaining, just neutrally expressing surprise.

4349043

Of course, it is reasonable to think you got an unrepresentative sample by design. People who don't read blog posts or participate in the fimfiction community would have been less likely to find out about the survey, so one would expect some bias toward fandom-oriented fans.

Absolutely. Some data I have suggests that half of fimfiction's users don't watch anyone, or else are in small circles of friends that only watch each other. "Fandom" is fuzzy at the edges. Only about 1/4 of people registered on the site visit it once a month, IIRC. Are they "in ficdom" if they watch no-one and visit the site a few times a year? But that could be a quarter of the users.

4349047 I'm surprised too.

Yes, yes, I know some of these words.

On the bright side, you now know that you can manipulate a small portion of the fandom's populace to DO YOUR BIDDING with a simple blog about a survey!

... and so it begins. FASTEN YOUR HATS!

I didn't have any problems with that one personally but it might ultimately just be an ill-posed question for a self-reporting survey, given that it demands people construct a meaningful narrative over this portion of the last several years of their lives and collate what memories stick out to them, but since they've been primed by being on a fanfic site instead of a place where you watch the show, they're naturally going to gravitate towards fandom-specific memories in the moment. You might get a completely different result if you had this question at the end of a streaming episode.
Maybe the answer you wanted could have been gleaned by a section asking people what they think of when they think of the fun they get from MLP in general, with a list of things relating to both the show and the fandom, and seeing how people rank or otherwise choose them.

You might like to say, “The probability p that a fan is in the fandom more for itself than for the show is 0.73,” but that is, in fact, false. That’s because it’s a real number, and there are an infinite number of real numbers between 0.7299 and 0.7301, and so the chance that the true probability is exactly 0.73, rather than some other very close number, is zero.

Well, no, the probability isn't zero because there are not in fact infinitely many real numbers that the ratio of fandom-for-fandom-fans / total-fans could end up being. For that to be true, there would need to be infinitely many humans, and there aren't (which in itself would lead to weird results: when there are infinitely many people, you can always ask another person and add one more to the sample size, and you can't know before asking what their answer will be, so the ratio discovered will eventually gravitate towards some value as a limit but never really stop fluctuating randomly along the way).

/pedantic nerding

Have to admit that I was laughing at the... wording of the survey at times and wondering if it was on purpose, and if so what its intention was. Well, now we know I suppose, and yeah, I've noticed that people tend to rearrange the wording of things at times so it makes sense to them even if it changes the meaning. :pinkiesick:

YAY! I actually understood a lot of that! And yeah.. pretty much agree, it's frustrating when you are trying to be as precise as possible.. and people outside the field ignore that and just generalize or misinterpret things. The whole "Evolution is just a Theory" crowd for example who do not understand what is meant by 'theory' in that context.

Good article.

That just reminded me of a quote whose source I can no longer remember: "Post-Modernism is the refusal to think. Deconstructionism is the refusal to believe anyone else can either."

I have to say that this is probably the most educated and thought out thing I've read in the past year. I may not have understood all of it, but I can trust that it's not a load of bull.
Well done.

4349093 Ah. Correct. You get a Twist. :twistnerd:

4349071 Yes--thanks; that is a real problem, and I wasn't fair to deconstructionism in my post. A lot of the uncertainty about the statement's meaning is hidden or "stored" in how the question is worded. So the deconstructionists have a point that it's very hard or maybe impossible to "unpack" the full meaning of the statement (What does it really mean to say you're "in it"? What is "the fandom"?). I haven't addressed all their arguments. You could say I cheated.

But I can leave that uncertainty stored in the words if I keep the same words, and whenever I have a question that the answer might be relevant to, I can check how closely the words in my question match the words in the stored statement, and decide whether the uncertainty is too large to use the statement to answer that question. Maybe even without fully understanding either the question or the statement.

I think I could address all their main arguments about meaning, but it's gonna take a bigger post.

All this because no one was able to understand how vague a "neither" or "other" answer would be.

4349357 I think there's more to it than that. As Bad Horse notes, all this really tells us right now is 'In the window that the survey was open, I received X responses indicating Fandom, and Y indicating show, and assuming people took it in good faith then the proportion self-reporting here is that.'

Whether there is anything truly useful there is unknown, because the question is binary for something that I think most everyone would agree is a spectrum.

So it’s especially frustrating to hear claims made by people in certain academic disciplines that science is bunk because words can’t really convey meaning

I think a nice way to address this is to ask such people what their words mean. ^.^

This is how truth works in science.

That's how drawing conclusions works in science, not truth. :) That's an important difference. Truth works the same in science as it does everywhere else. Determining what that truth (such as data) implies is what functions differently in various realms in life (whether it ought to function differently is up for debate).

Confusing truth for conclusion is what leads to ideas like Post Modernism or that no absolute truth exists.

Anyway, I'm glad you feel better. :)

The people who complain the loudest about the inability of people to make meaningful statements are the people who are the worst at making meaningful statements.

Isn't this just a classical case of psychological projection?


I'm not sure if this post is going to help people who don't understand this stuff understand this stuff. I understood what you said, but I have scientific training and understand the joys and sorrows of second-order effects (even if I, too, sometimes make imprecise statements - and think imprecise thoughts).

But I suspect that someone without much understanding of this sort of thing is going to go into a coma about halfway in and their brain will fill in the rest.

4350652

Isn't this just a classical case of psychological projection?

Maybe--it is remarkable the extent to which what theorists claim tells you more about theorists than about their subject matter--but I think it's also self-fulfilling. Post-modern writing is designed for a world that works the way post-modern theory says it should. So they don't check empirical facts, don't have a theory of how categories work, introduce new words and phrases without defining them because the definition is supposed to be just the words they're used with and opposed to, use old words to mean different things without mentioning it, don't use hypotheticals, use word cognativity as their primary form of evidence, use words that don't actually refer to anything, and so on.

People who really believe statements are unanalyzable don't do the work to make their statements analyzable, and eventually lose the ability, and even the ability to determine whether they're saying anything or not.

I'm not sure if this post is going to help people who don't understand this stuff understand this stuff.

Maybe not. I made some big goofs the first day I posted it, and nobody caught them. :unsuresweetie:
