
Bad Horse



Stylometrics: Pros vs. fans · 3:22am Aug 7th, 2014

Continuing yesterday's post on stylo. (Note I should've used PCA covariance instead of PCA correlation, because the data is already normalized. But as horizon and I pointed out, the PCA uses only the first 2 dimensions, accounting for 20% of the variance in word usage, so I should've chosen MDS (multi-dimensional scaling) instead of either PCA.)

stylo's oppose() function lets you compare two stories, writers, or groups of writers.

1. Go back to that directory with the "corpus" subdirectory. Make subdirectories "primary_set" and "secondary_set" (sisters to "corpus").

2. Concatenate all the stories for each author together into one file. This is necessary because oppose() divides the files into slices, and words have to occur a certain number of times in each slice to be included. If the slice size is less than 10,000, most of the useful discriminating words will be filtered out. But the program will crash if any of its input files are smaller than the slice size.

3. Put one set in the primary_set directory, and the other in secondary_set.

4. Run R, then
> library(stylo)
> oppose()
Set INPUT: Slice Length = 10000, Occurrence Threshold = 2, choose Craig's Zeta (not that it matters AFAIK).
Click OK.
> q()

This will create two files:
words_avoided.txt: Words avoided by authors in primary_set.
words_preferred.txt: Words preferred by authors in primary_set (= avoided by authors in secondary_set).
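Steps 1-3 above are easy to script. Here is a minimal sketch in Python (not part of stylo), assuming story files are named "<author>_<title>.txt"; that naming scheme, the function name, and its parameters are all hypothetical. It writes one combined file per author and skips any author whose combined text is shorter than one slice, since oppose() would choke on it.

```python
from pathlib import Path

SLICE_LENGTH = 10_000  # words per slice, matching the Slice Length set in step 4


def concatenate_by_author(corpus_dir, out_dir, authors, slice_length=SLICE_LENGTH):
    """Write one combined file per author into out_dir; return {author: word count}.

    Assumption (hypothetical): story files are named "<author>_<title>.txt".
    """
    corpus_dir, out_dir = Path(corpus_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for author in authors:
        files = sorted(corpus_dir.glob(f"{author}_*.txt"))
        text = "\n\n".join(f.read_text(encoding="utf-8") for f in files)
        n_words = len(text.split())
        if n_words < slice_length:
            # Shorter than a single slice: skip it rather than risk a crash.
            print(f"skipping {author}: only {n_words} words")
            continue
        (out_dir / f"{author}.txt").write_text(text, encoding="utf-8")
        counts[author] = n_words
    return counts

# Example call (author names are placeholders):
# concatenate_by_author("corpus", "primary_set", ["pratchett", "adams"])
```

Run it once per set, pointing out_dir at primary_set and then at secondary_set.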

Taking the same stories as before, and putting the professional authors in primary_set and fan works in secondary_set, I get this as the top of the list of fan-preferred words, in order "ponies, pony, hooves, ... trotted":

ponies     mare       eyebrow    magic    plan
pony       somepony   filly      spike    nod
hooves     everypony  pegasus    library  breaking
hoof       twilight   rarity     perfect  hay
mane       stallion   applejack  gotten   surprise
equestria  anymore    unicorn    okay     belle
canterlot  cutie      gaze       uh       huh
ponyville  sparkle    raising    today    purple
celestia   horn       magical    apple    frown
anypony    mares      wings      bloom    trotted

(Somepony help me get a monospace font here...)
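For intuition about how a list like this gets ranked, here is a rough Python sketch of Craig's Zeta as it is usually described: cut each set into fixed-size slices, then score a word by the share of primary slices that contain it plus the share of secondary slices that lack it. The function names are mine, the occurrence threshold is left out, and stylo's own implementation may well differ in detail, so treat this as an illustration only.

```python
def slices(words, size):
    """Split a word list into consecutive full-size slices (leftover words are dropped).

    Each slice is returned as a set, since only presence or absence matters.
    """
    return [set(words[i:i + size]) for i in range(0, len(words) - size + 1, size)]


def craig_zeta(primary_words, secondary_words, size):
    """Score each word from 0 to 2.

    Near 2: in most primary slices and few secondary ones (preferred by primary).
    Near 0: the reverse (avoided by primary). Near 1: no clear preference.
    """
    prim, sec = slices(primary_words, size), slices(secondary_words, size)
    if not prim or not sec:
        raise ValueError("each text must be at least one slice long")
    vocab = set().union(*prim, *sec)
    scores = {}
    for w in vocab:
        in_prim = sum(w in s for s in prim) / len(prim)  # share of primary slices with w
        in_sec = sum(w in s for s in sec) / len(sec)     # share of secondary slices with w
        scores[w] = in_prim + (1 - in_sec)
    return scores
```

Sorting the scores from high to low gives something like words_preferred.txt, and from low to high something like words_avoided.txt.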

These are some of the words pros preferred. I'll put them in red in discussion below:

hands     hotel     built      seat      hundred
hand      nine      o          hair      office
hell      sir       indeed     water     west
money     slid      kill       dead      fish
boy       hurried   doubt      brain     listened
pocket    clearly   creature   smoke     bar
lived     feet      ourselves  chair     pipe
loss      ten       key        steel     stared
terrible  thousand  terribly   strange   someone
sea       scene     damn       possibly  fear

I'll put these words into categories:

Equestrian: pony, hoof, mane, equestria, canterlot, anypony, mare, somepony, horn, ears, chocolate; pros: hand, hell, money, feet, kill, damn, dead, smoke, bar, pipe

Fans use real dialogue: okay, uh, huh, um, yeah, hey, mm, kinda
Rarity-speak: dear, darling, truly
Applejack-speak: ain't, gonna, ya

Reverse-Bechdel failure mode: she, her, sister, dresses, miss; pros: his, him, boy, sir

Body language and speech tags: eyebrow, gaze, raising, nod, frown, eyebrows, sigh, shake, cheeks, sighed, motion, wide, lifting, frowning, lowered, smiles, interrupted, gasped, blinked, grin, narrowed ["her eyes"], glare, jaw, giggle, stare, frowned, blank [look], roll, glance, shifted, closer, cleared [her throat], chuckled, tears, leaning, sniffed; pros: slid, seat, hair, stared, reached, worn, laughed, beat, passed, swung

Adverbs: expectantly, slightly, mostly, barely, normally, briefly, softly, promptly, completely; pros: clearly, terribly, possibly, vaguely, surely

Other: anymore, gotten, today, plan, breaking, surprise, guess, promise, alright, despite, parents, display, school, truly, giving, appreciate, onto, lack, thanks, needs, mess, brief, helping, mention, response, friends, empty, supposed, confused, worry, planned, opening, asking, stories, love; pros: pocket, lived, loss, terrible, sea, hotel, sir, hurried, scene, built, indeed, doubt, creature, ourselves, key, seat, hair, water, dead, brain, chair, steel, strange, office, west, listened, stared, someone, fear, age, often, remembered, yard, picked, corner, following, began, reached, suggested, beginning, glass, car, somewhere, soul, worn, laughed, fresh, bank, lightning, breathing, beat, certain, passed, perhaps, manager, tall

The only category that I can make anything out of is the body language category. Fans use much more body language and speech tags describing the position of the head, the eyes, the eyebrows, etc. I removed my stories from secondary_set & tried again, because I use body language ALL THE TIME, and got all the same body language words again.

It's almost as if some important fandom editors had been telling us all to use more body language.

Somepony desirous of fooling others into thinking that he hadn't written a ransom letter document of some kind, could use oppose() to adjust his word choice to match somepony else's. Hopefully these tools won't fall into such nefarious hooves.

Comments ( 18 )
Georg #1 · Aug 7th, 2014

Dear Governor, plses send one billion dollars as ransom two the followg addres...

If you notice the body language list for ponyfic writers, many (though not all) of them involve the face or head. I've long suspected that many authors here recognize the value of body language yet still feel subconsciously uncomfortable with the equine body. Facial expressions are shared with humans, so it's a comfort zone they gravitate toward, but everything else is sort of alien to them. There's much less of using the ears or tail... legs... nostrils? etc.

Professional authors on the other hand are predominantly using humanoids, so they get faces, plus hands, hips, standing positions (e.g. arms akimbo), sitting position, etc etc. Since they pull from a wider range of things, they aren't gravitating to the same set of words like horseword authors are. It's not that humans are more expressive per se, just that conveying horse body language is less intuitive for author and reader alike.

2350129 Yes, I agree. It would take a different analysis to determine whether pros really use less body language. My subjective impression is that they do, though.

2350129 This sounds entirely reasonable...

Coming from a crazy person discussing crazy things with a crazy, nefarious horse!

I know in my own writing I love using "flattened ears" and whatnot, but it's something I need to consciously do, as well as always scan over my work to excise any foreign words like "hands" and "feet."

Maybe if somebody grew up on this stuff? Would it be worth forcing ~15ish years of horsewords on a child to see if they would naturally use hoof and horn in prose?

But see that's the thing. (Presumably) you learned flattened ears from the show, or drooping a tail, or etc. If prior to FIM, someone asked for descriptions of what a horse in various states of emotion would look like, I bet most of us would have struggled. Even now, the show has taught us some things (whether or not they are IRL realistic, they're at least show-accurate) but that too is limited by the Flash animation. If I were a nervous horse, would I be clenching my haunches? Raising my tail or lowering it? What emotion(s) would cause me to fidget and kick up dust with my hooves?

Show helps. Fanfic helps if-and-only-if the author knows what they're talking about too. But none of that matches the empathy of using a human body for 16-35 years. Horses are innately alien to us.

I actually use llama body language for a lot of animals.

Yes, llama. Specifically that. Because I own llamas and did llama 4H for a decade.

Ears are a big thing for me. I think part of it is that the ponies in the show are actually made to be quite expressive, and facial language in particular - including the ears - is very notable.

I do notice other forms of expression though - hunching down, cowering, crouching, etc. - which frankly the ponies do a lot more than humans do. Humans are mostly very upright, whereas the ponies do a wide variety of emotes on a regular basis on the show.

They also have wings, which get used a fair bit. And Mood Wings is a very popular story.

2350315 Are the writers using horse body language or dog body language, though? I know there's a difference, but I know dog body language when I see it, and a horse's expression is a complete mystery to me. I know that I, for one, fall into the trope.

tvtropes: do not click if you want to do stuff today

2350380 I've noticed that pony body movements are much bigger than what we'd see in humans, even animated ones. I'd call that a stylistic choice, but it also reminds me of theater gestures being exaggerated to be clearly seen from the cheap seats. Or I'm just imagining things.

The one bit of information that jumped at my eyes was the usage of adverbs. Apparently the pros use them to qualify how certain some piece of information is, while fanfic authors use them to modify actions. Seems like fanfic authors need to learn more synonyms for action verbs, in order to use them more precisely and require fewer adverbs.

(And I certainly include myself in that statistic.)

The show itself sometimes falls into that issue. Like the filly wagging her tail as she asked Twilight for an autograph in Trade Ya; tail wagging is supposed to mean something quite different for horses, especially quick tail movements like those.

(I might be wrong, though; my experience with real horse body language comes exclusively from reading a few articles online.)

It's almost as if some important fandom editors had been telling us all to use more body language.

Why is that?


The show itself sometimes falls into that issue. Like the filly wagging her tail as she asked Twilight for an autograph in Trade Ya; tail wagging is supposed to mean something quite different for horses, especially quick tail movements like those.

"All animals are dogs."

It is worth noting that ponies are anthropomorphized, meaning that they use human body language as well, and wiggling in anticipation is a totally human thing to do.

The issue in this specific case is that it sends conflicting signals for those who do know horse body language. What I've read about the subject holds that this kind of tail movement happens when the horse is angry, prone to attack at any time.

I'm not against using anthropomorphic (or canine :trollestia:) body language, mind you; the issue is just when it conflicts with the body language of the animal being represented. It would be akin to a cat wagging its tail to show affection.

Horses don't actually wag their tails like the filly does at all, at least as far as I know, having owned a few horses and never seen it; honestly, I'm not even sure that they can do so.

They do flick their tails, which can be them trying to drive off flies, them expressing excitement (sometimes horses do this when trotting/running - you also see it fairly frequently at, say, equestrian shows), to reflect that they are doing something "hard", or as an expression of annoyance. You can generally tell what is going on by their other body language.

The big thing to look at with horses generally is their ears and their posture, though some other motions (stomping/pawing at the ground) also have meaning.

Well, you certainly seem more qualified than me in that aspect :scootangel:

As I said, I'm basing that on articles I read in an attempt to get pony body language right (or at least not too wrong). My experience with real horses, while it exists, is very limited and happened a long time ago.

Now that you mention it, I don't think I've seen "wagging" used to describe the tail movement, but I did see fast and erratic lateral movement described as a signal of anger or deep irritation. This is the closest thing to wagging I've seen in the references I found, thus why I reacted negatively to a filly wagging her tail to show happiness.


(sometimes horses do this when trotting/running - you also see it fairly frequently at, say, equestrian shows)

The references I found seem to agree that, if this is happening, the horse is uncomfortable or in pain. Not sure if that would be the case here, though it does make some sense for it to happen in equestrian shows.

It's almost as if some important fandom editors had been telling us all to use more body language.


Quite honestly horses flick their tails all the time.

I mean, if you look at a random video on youtube:

You can see the horses flicking their tails fairly often even without any flies being visible. Do you think they're in pain or annoyed all those times? I doubt it, judging by their ear positioning and other behavior.

What would be really interesting would be to compare the corpus of "canon" work—show and comic transcripts—to fandom work. In your previous posts you were comparing our fandom to "pros" like Pratchett, Adams and Doyle, but while we are sort of imitating those guys, let's be honest, the pros we're really aping are the professional writers of pony content.

I'm sure it wouldn't be too hard to extract the captions from the various pony episodes (assuming transcripts aren't already available elsewhere) but I do see the immediate problem that, both in comics and the show, all the words are going to be almost exclusively dialog. Hmm... might be possible to find/write a programmatic way to extract just dialog from fandom stories and at least compare that. "Do our ponies sound like their canon selves?" might still be intriguing to look at.
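As a starting point for that kind of extraction, here is a minimal Python sketch that pulls out whatever sits between double quotes. It assumes dialogue is always wrapped in straight or curly double quotes and never runs across a line break, which won't hold for every story; the function name is hypothetical, and unbalanced quotes will throw it off.

```python
import re

# Text between straight or curly double quotes, not crossing a newline.
QUOTED = re.compile(r'["\u201c]([^"\u201c\u201d\n]+)["\u201d]')


def extract_dialogue(story_text):
    """Return the quoted spans of a story, in the order they appear."""
    return [span.strip() for span in QUOTED.findall(story_text)]
```

Joining the returned spans with newlines gives a dialogue-only text that could be compared against show transcripts.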


I'm sure it wouldn't be too hard to extract the captions from the various pony episodes (assuming transcripts aren't already available elsewhere)

You mean, like on the MLP Wiki?

You might also be interested in my little experiment in adding all dialog to a spreadsheet. It's not complete, but you can have a quite large amount of dialog by copying the Text column and pasting it into a pure text editor like Notepad++. It's better for this experiment than using the pure transcripts because this allows the speaker name to be easily removed, though if you want a purer sample you would need a bit of processing to remove some tags I left there.

(With notepad++, the processing is just to search for \[[^\]]*\] as a regular expression and replace with nothing. This will remove everything within square brackets, such as all the tags I've left there.)
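For anyone who would rather script it, the same search-and-replace can be sketched in Python with the same pattern. The function name is just for illustration.

```python
import re

# Same pattern as the Notepad++ search: "[", then anything that is not "]", then "]".
TAG = re.compile(r"\[[^\]]*\]")


def strip_tags(text):
    """Remove every square-bracketed tag, brackets included."""
    return TAG.sub("", text)
```

Note that this leaves the surrounding spaces alone, so a line that had a tag in the middle ends up with a double space.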

(Why I made that file: because I use the character column filter to select the canon dialog for one character at a time and feed it into a word cloud builder when writing dialog. So, I'm not using this software exactly, but I have been using something similar for some time already. Well, more or less ever since I got told that the dialog I wrote for Luna didn't sound like Luna at all :scootangel:)
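The filter-then-count step can be sketched in Python too, assuming the spreadsheet has been exported as CSV with "Character" and "Text" column headers. Those header names and the function name are assumptions, so adjust them to match the real file. The resulting counts are the kind of input a word cloud builder works from.

```python
import csv
import io
import re
from collections import Counter


def word_counts_for(csv_text, character, name_col="Character", text_col="Text"):
    """Count the words in every line spoken by one character.

    name_col and text_col are assumed header names, not taken from the real file.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[name_col].strip().lower() == character.lower():
            # Lowercase words, keeping apostrophes so "ain't" stays one word.
            counts.update(re.findall(r"[a-z']+", row[text_col].lower()))
    return counts
```

Calling counts.most_common(50) on the result would list a character's fifty most frequent words.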

BTW, the corpus I linked to in a comment in the first Stylometrics blog post includes five official chapter books. They likely will have some differences in word frequency, compared with fics, due to the target audience; they tend to use simpler words, and paint a brighter scenario, than most fics.
