
Oliver


Aug
14th
2016

Statistics #2: The impact of Equestria Girls · 1:30pm Aug 14th, 2016

I finally got the time and inclination to dig into the data of the Great Fimfiction Archive.

One of the side effects is that I rewrote the normalizer that chews through the thick stack of EPUBs and produces a database of easy-to-search text. The script now runs on all the cores of my laptop, doubling as a space heater, and chews through the entire archive in about an hour. That is a big improvement over the six hours I started with, and it made the normalizer much easier to refine.
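For anyone who wants to try something similar: an EPUB is a zip archive of XHTML files, so a bare-bones normalizer can be quite small. This is a hypothetical sketch, not the author's unwrap.py. The function names are made up, it reads chapters in file-name order rather than the EPUB spine order, and it uses a `multiprocessing.Pool` to spread books across all cores, which is one plausible way to get the kind of speedup described above.

```python
import sys
import zipfile
from html.parser import HTMLParser
from multiprocessing import Pool


class _TextExtractor(HTMLParser):
    """Collect character data and drop all markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def epub_text(path):
    """Return the plain text of every (X)HTML document inside an EPUB.

    `path` may be a filename or a binary file-like object. Simplification:
    chapters are taken in sorted file-name order, not the spine order.
    """
    chunks = []
    with zipfile.ZipFile(path) as book:
        for name in sorted(book.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextExtractor()
                parser.feed(book.read(name).decode("utf-8", errors="replace"))
                parser.close()
                # Join text pieces with spaces so block elements don't merge,
                # then collapse all runs of whitespace.
                chunks.append(" ".join(" ".join(parser.parts).split()))
    return "\n".join(chunks)


def normalize_all(paths, workers=None):
    """Extract text from many EPUBs in parallel; workers=None uses every core."""
    with Pool(workers) as pool:
        return dict(zip(paths, pool.map(epub_text, paths)))


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, text in normalize_all(sys.argv[1:]).items():
        print(path, len(text.split()), "words")
```

Run it as `python sketch.py story1.epub story2.epub ...`. Each book is independent, so a process pool scales roughly with core count until disk I/O becomes the bottleneck.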

I’m still trying to figure out the statistics of upvotes and downvotes, and I keep discovering how much of an amateur I am at this, but I do have some more statistical observations to share, or rather, to write down for posterity.

People seem to like it when I geek out in public. :)

This was born in a discussion with Derpmind, in which we argued over whether the Equestria Girls movies are accepted widely enough by the fandom to be treated as proper canon. You can see the whole discussion in the comments here. Of course, the two worlds are highly compartmentalized and exist independently of each other, but how big an impact did Equestria Girls eventually make?

The calculation based on Fimfiction tags performed by Derpmind produces rather dismal results: tags let us select about 4000 stories out of ~100000 as relating to Equestria Girls, or something like 4%. But tags were added long after the first movie aired, and stories often don’t get retagged after the appropriate tag shows up.

So how many stories actually reference Equestria Girls? You can, and often do, see things like a 1000-word short which only consists of one mention of a character name and several pages worth of angsty musings. It may or may not be tagged, even. But one of the reasons I even have this dataset is so that I could easily answer questions like these, and here’s how I approached it:

  1. Stories published before the air date of the first movie are not eligible for this calculation. You can’t reference Equestria Girls before it existed.
  2. Stories that are explicitly marked “Equestria Girls” count towards the total with no further investigation. The archive does not contain data on character tags, though, which is why…
  3. Stories that contain certain marker strings – mostly EG-specific character names – count towards the total of stories referencing Equestria Girls.
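As a minimal sketch, the three rules above fit into a single predicate. The `Story` record, the exact cutoff day and the `mentions_marker` callback are assumptions for illustration. The post only says the first movie aired in June 2013, and the real marker test matches whole words, as described below, rather than using the naive substring check shown here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable

# Assumed cutoff day; the post only says the first movie aired in June 2013.
FIRST_MOVIE_RELEASE = date(2013, 6, 16)
EG_CATEGORY = "Equestria Girls"


@dataclass
class Story:
    """Hypothetical story record; the real archive index is laid out differently."""
    published: date
    categories: set = field(default_factory=set)
    text: str = ""


def references_eg(story: Story, mentions_marker: Callable[[str], bool]) -> bool:
    # Rule 1: stories published before the first movie are not eligible.
    if story.published < FIRST_MOVIE_RELEASE:
        return False
    # Rule 2: the explicit category counts with no further investigation.
    if EG_CATEGORY in story.categories:
        return True
    # Rule 3: otherwise, search the text for marker strings.
    return mentions_marker(story.text)


def naive_marker_test(text: str) -> bool:
    """Stand-in for the real marker test (which matches whole words)."""
    return "sunset shimmer" in text.lower()


stories = [
    Story(date(2012, 1, 5), text="Sunset Shimmer joins the council."),  # too early
    Story(date(2014, 3, 1), {"Equestria Girls"}, "A day at school."),
    Story(date(2014, 3, 2), text="Twilight writes to Sunset Shimmer."),
    Story(date(2014, 3, 3), text="Applejack bucks apples."),
]
print(sum(references_eg(s, naive_marker_test) for s in stories))  # 2
```

Checking the category before the text is purely a shortcut: tagged stories never need to be searched.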

The list of those strings is as follows:

"sunset shimmer",
"vice-principal luna",
"flash sentry",
"adagio dazzle",
"sonata dusk",
"aria blaze",
"crystal prep",
"principal cinch",
"lemon zest",
"indigo zap",
"dean cadance", "dean cadence",
"sci-twi",

All of these are tested for as whole words, i.e. spaces are added before and after each string. I tried searching for “canterlot high,” but that triggers a common false positive, “canterlot high society,” which is unacceptable. “principal celestia” triggers some false positives, and “wondercolts” is a common typo for “wonderbolts.” If you can think of any others, I’m open to suggestions.
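A minimal sketch of the whole-word test, assuming the text is lowercased and everything except letters, digits and hyphens is collapsed into single spaces before padding (the post does not spell out the exact normalization). The last line shows why padding cannot rescue “canterlot high”: the false positive is itself a whole-word match.

```python
import re

EG_MARKERS = [
    "sunset shimmer", "vice-principal luna", "flash sentry", "adagio dazzle",
    "sonata dusk", "aria blaze", "crystal prep", "principal cinch",
    "lemon zest", "indigo zap", "dean cadance", "dean cadence", "sci-twi",
]


def normalize(text: str) -> str:
    """Lowercase, turn every run of characters other than letters, digits
    and hyphens into one space, and pad both ends with a space.
    (Assumed normalization, for illustration only.)"""
    return " " + re.sub(r"[^a-z0-9-]+", " ", text.lower()).strip() + " "


def matched_markers(text: str, markers=EG_MARKERS) -> list:
    """Return the markers that occur in the text as whole words."""
    padded = normalize(text)
    return [m for m in markers if f" {m} " in padded]


print(matched_markers("Sunset Shimmer's diary, page 3."))  # ['sunset shimmer']
print(matched_markers("Sunset Shimmering waves"))          # []
print(" canterlot high " in normalize("the Canterlot high society gala"))  # True
```

Keeping hyphens in the normalized text matters here, because two of the markers (“vice-principal luna” and “sci-twi”) contain them.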

Fimfiction lies about publication dates, at least in some cases. The earliest story I could find that triggered this search, “Chrysalis’ Evil Council of Doom” by Matthais Unidostres, mentions Sunset Shimmer explicitly, so it’s not a false positive. According to Fimfiction, it was first published in January 2012. The first movie aired in June 2013. We don’t know just how widespread this problem is. Way to go and screw up your data, Knighty. This does not introduce much skew into the data (some 10 stories would have slipped through the cracks if I hadn’t noticed this), but it does introduce some. And now we don’t know which story actually was the first.

My final result is that 8519 of the 135435 stories in the corpus reference Equestria Girls entities or are marked with the “Equestria Girls” category. If we only count stories published after the first movie’s air date, this is 9.11% – that is, on average one in 11 new stories you’re going to see will eventually reference Equestria Girls. Compared to the number of stories explicitly tagged as such, where EG characters take center stage, this is a huge increase. Across the total corpus, this is 6.29% of all the stories we can access.

As of this moment, the airtime of all Equestria Girls movies totals 215 minutes, while the rest of the series totals 2904 minutes, which means the movies make up 6.89% of the entire pony video corpus. In other words, minute for minute, they have made about as much impact as the TV episodes, if not more, since they have had less time to do so.
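The percentages above can be rechecked from the raw counts. The last line is a derived estimate, not a figure from the post: because 9.11% is itself rounded, the implied number of post-release stories is only approximate.

```python
eg_stories = 8519             # stories referencing EG or in the EG category
corpus = 135435               # all stories in the archive
post_release_share = 0.0911   # share among post-release stories, as reported

eg_minutes = 215              # airtime of all EG movies
series_minutes = 2904         # airtime of the rest of the series

share_of_corpus = eg_stories / corpus
share_of_airtime = eg_minutes / (eg_minutes + series_minutes)
one_in_n = round(1 / post_release_share)
# Rounded to the nearest hundred, since the input share is rounded too.
implied_post_release = round(eg_stories / post_release_share, -2)

print(f"{share_of_corpus:.2%}")   # 6.29%
print(f"{share_of_airtime:.2%}")  # 6.89%
print(f"one in {one_in_n}")       # one in 11
print(f"~{implied_post_release:.0f} post-release stories")  # ~93500 post-release stories
```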


Cumulative number of stories

Ticks are placed in the middle of the named month. Notice that the graph is annoyingly linear, being cumulative. That isn’t very helpful except for seeing the general trend. When I plot it as stories per week instead, I get this:


Stories per week

Much more interesting, we can at least infer that some weeks were better for Equestria Girls than others. But let’s plot them separately:


Stories referencing Equestria Girls per week

We can attribute the peak in May 2016 to the relaxation of site moderation rules in February finally taking effect, but it does look like the proportion of EG-related stories is growing.

Any ideas if the other peaks are significant?
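Both kinds of curve come from nothing more than publication dates. A minimal sketch of the binning, assuming stories are grouped by ISO calendar week (the post does not say which week convention it uses); the plotting itself is left out.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def weekly_counts(dates):
    """Count stories per ISO (year, week) bucket, in chronological order."""
    counts = Counter(tuple(d.isocalendar())[:2] for d in dates)
    return sorted(counts.items())


def cumulative(weekly):
    """Running total of the per-week counts, as in the cumulative plot."""
    return list(accumulate(n for _, n in weekly))


dates = [date(2015, 9, 21), date(2015, 9, 23), date(2015, 9, 30)]
weekly = weekly_counts(dates)
print(weekly)              # [((2015, 39), 2), ((2015, 40), 1)]
print(cumulative(weekly))  # [2, 3]
```

Weeks with no stories do not appear in the output at all, so a per-week plot should fill them in with zeros rather than drawing a straight line across the gap.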

Oliver · 840 views · #statistics
Comments (22)

The release date of the first EQG movie isn't quite the most accurate measurement. People made fics about it based on the announcement, the trailers, and any other info they knew at the time.

Most were comedic and made fun of it, but there were more than a few attempts at serious storytelling, just like people are already making Legend of Everfree fics.

On a personal note, Rainbow Rocks derailed my entire writing career. I originally joined this Fandom to write about ponies. That seems to no longer be the case.

4149218

The release date of the first EQG movie isn't quite the most accurate measurement.

I suppose, but I'm pretty sure the name "Sunset Shimmer" didn't mean anything in 2012, and some of the pre-release stories do go as far back as that. Maybe I should make a list of all stories dated before the release, though...

I originally joined this Fandom to write about ponies.

Ponies are people, too. :)

4149218

So I got the list of all matching stories which are dated to before the release of the first movie. Let’s see if any of them are interesting, in order of ID number…

1. Chrysalis’ Evil Council of Doom – Clearly misdated, it’s way too early. Author’s note says “I know Sunset Shimmer turned good, but so did Dartz, so this still works!”
2. Broken – False positive. “wondercolts” seems to be a typo for “wonderbolts” and probably my results need revising.
3. Butterscotch’s Adventures in Equestria – A pretty bizarre false positive, because Celestia is a principal of the Ponyville school for some unclear reason.
4. Princess Celestia’s School for Gifted Unicorns – Another case of same false positive, and my results probably need more revising.
5. The Ship-Off – A false positive that can’t be filtered out. Rarity ships two imaginary ponies, one of which is named “Sunset Shimmer.” This is way too early to be relevant, unless it’s misdated.
6. A Series of Chaotic Events – Both features Sunset Shimmer and mentions the red hair with yellow streaks, and is dated to August 2012. When did the first trailer turn up?…
7. Mistakes and Accidents – Another false positive on “wondercolts”.
8. This one got deleted since, so I’m not dipping into the archive to read it.
9. The Rainbow Blitz Disaster – Another “wondercolts” false positive.
10. My Little Equestria Girls – This one is dated to March 2013, and is interesting in that it’s one of the earliest pre-release EG stories I can put my finger on, a first story, and devoted to poking fun at EG.
11. Revenge of a Fallen Student. – Has to be the earliest pre-release serious one. “Lacko, Izzy, and Maya were Sunset’s three younger sisters, they were triplets.”
12. Shimmer – Tries to be serious. But at 729 words it wouldn’t count as a story by modern rules.
13. This one got deleted since.
14. Dream Valley – A Bioshock Infinite crossover that features a Princess Sunset Shimmer. In place of Comstock.
15. The Fires of Anarchy – Contains Sunset Shimmer as local color and antagonist and is dated to May 2013.
16. Equestria Girls: William Simmons High is Texas. – This one is odd. It is tagged Equestria Girls and cancelled. And back when it was supposedly posted, the tag did not exist.
17. Super Yay The Movie: A Pony Movie with Ponies in It is Best Pony Movie. Also in 3D! – Rant about EG disguised as a fanfic.
18. LEGACY: The Binds – Mentions Flash Sentry (pony version) in passing.
19. Underdog Days – Crackfic making use of EG entities. Spike is a dog right on the cover.
20. Recollections – Ships “Comet Tail” with Twilight and mentions Flash Sentry. Probably the earliest hate-on-the-waifu-stealer story; it dates to 10 June 2013, almost a week before the movie release.
21. Love Abloom – While it says it’s a Rarispike rejection story, it treats TwiFlash as the canon ship and is dated to 13 June.
22. My Little Hero – This one is in Spanish, and the author’s page says he has 0 stories. Huh?! Go home, Fimfiction, you’re drunk.

Is it just me, or are the earliest stories always the worst?…

According to Derpibooru, the first sign of Sunset Shimmer was this toy fair image captured in early February 2013. Later that month, we got a name. (Apparently she was part of a set where everypony got a luchador mask.) Official art of her human form was accidentally leaked in March.

Meanwhile, the fandom got its first whiff of Flash Sentry on April 26, though people discounted his existence until he showed up in the trailer in May.

Also, you may want to add the other Shadowbolts (Sugarcoat, Indigo Zap, et al.) to that list of strings. Just a suggestion, though; not sure if it would add any statistically significant amount to the results, and Sugarcoat in particular could lead to a plethora of false positives without case sensitivity.

4149311

According to Derpibooru, the first sign of Sunset Shimmer was this toy fair image captured in early February 2013.

Which means that at least two of the stories I listed in comments above have wrong dates. I wonder how that happens.
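A check along these lines could flag candidate misdated stories automatically: any story whose publication date precedes the first public appearance of a character it mentions. The dates come from the Derpibooru timeline quoted above. The exact day for Sunset Shimmer is an assumption, since the comment only says “early February 2013,” and the story IDs in the example are made up.

```python
from datetime import date

# Approximate first public appearances, per the Derpibooru timeline above.
# The day for Sunset Shimmer is assumed ("early February 2013").
FIRST_SEEN = {
    "sunset shimmer": date(2013, 2, 1),
    "flash sentry": date(2013, 4, 26),
}


def impossible_dates(stories):
    """Yield (story_id, marker) pairs where a story predates the marker's
    first public appearance.

    `stories` maps a hypothetical story ID to (published_date, matched_markers).
    """
    for sid, (published, markers) in stories.items():
        for marker in sorted(markers):
            first = FIRST_SEEN.get(marker)
            if first is not None and published < first:
                yield sid, marker


stories = {
    101: (date(2012, 1, 10), {"sunset shimmer"}),
    102: (date(2013, 5, 2), {"sunset shimmer", "flash sentry"}),
    103: (date(2013, 3, 1), {"flash sentry"}),
}
print(list(impossible_dates(stories)))
# [(101, 'sunset shimmer'), (103, 'flash sentry')]
```

A flag only means “check by hand”: a coincidental name, as in The Ship-Off, gets flagged just like a genuinely misdated story.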

Also, you may want to add the other Shadowbolts (Sugarcoat, Indigo Zap, et. al.) to that list of strings.

I thought about it. It may turn up a few more stories, but I'm not sure which of them are really EG-specific. "Lemon Zest" might be a good one, since she is, and "Indigo Zap" definitely is, but "Sour Sweet" and "Sugarcoat" will definitely generate false positives...

Gimme a moment.

EDIT: It did catch a few more late stories, it appears.

4149311
I'm suddenly tempted to write a story about Sunset as an underground wrestler. Even though I know nothing at all about the subject.

Oliver, I'd be interested in seeing how each of the Equestria Girls movies has affected its reception. I feel like a lot of people who initially hated the first film might have softened now.

4149352

Look at that last graph. There are rather obvious bumps in it in September 2014, when Rainbow Rocks came out, and in September 2015, when Friendship Games was released. They are not spikes, in that they don't taper off, it's a retained increase of new stories per week. Rainbow Rocks was less influential than Friendship Games, which almost doubled the weekly output of EG-related stories.
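The difference between a spike and a retained increase can be made a bit more concrete by comparing the average weekly output in three windows: just before a release, just after it, and a while later. This is a hypothetical sketch run on synthetic data, not the method used for the graphs, and the eight-week window is an arbitrary choice.

```python
from datetime import date, timedelta


def mean_weekly_rate(dates, start, weeks):
    """Average number of stories per week in [start, start + weeks)."""
    end = start + timedelta(weeks=weeks)
    return sum(start <= d < end for d in dates) / weeks


def step_or_spike(dates, release, weeks=8):
    """Weekly rates before, right after, and later after a release."""
    before = mean_weekly_rate(dates, release - timedelta(weeks=weeks), weeks)
    right_after = mean_weekly_rate(dates, release, weeks)
    later = mean_weekly_rate(dates, release + timedelta(weeks=weeks), weeks)
    return before, right_after, later


# Synthetic data: 2 stories a week before the release, 4 a week after it.
release = date(2020, 1, 6)
older = [release - timedelta(days=7 * w + 1) for w in range(8) for _ in range(2)]
newer = [release + timedelta(days=7 * w + 1) for w in range(16) for _ in range(4)]
print(step_or_spike(older + newer, release))  # (2.0, 4.0, 4.0)
```

A spike would show up as a high middle value with the third falling back toward the first; a retained increase, like the one described above, keeps the third value up.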

Legend of Everfree comes out in October, and when the next batch of Fimfarchive comes out after that, I'll definitely return to this topic. :)

"Is it just me, or are the earliest stories always the worst?…"

It's the rush to market that does it...

This is fascinating, as usual. I do wonder if your system might be undercounting Equestria Girls stories, Oliver. I think a certain number of authors just use the "human" tag when they mean "Equestria Girls." Another factor is that, if you consider Equestria Girls canon, any story that references them in even the smallest way should count (e.g., if Twilight briefly mentions Sunset Shimmer while talking to Starlight about something). However, people don't use tags for story elements that only appear in small amounts in their story.

On the flipside, you may actually be overcounting stories through the Flash Sentry tag, since he is a show-canon character outside of the movies. Realistically, I can hardly think of any story about pegasus Flash Sentry that doesn't at least somewhat incorporate Equestria Girls, so that may be a moot point.

One other factor: Some people still write stories set in earlier seasons, like how they think the Changeling Invasion should have gone, or because they just don't want to write Twilight as an alicorn. Those neither confirm nor deny the existence of Equestria Girls.

Perhaps there should be 3 categories: Stories that acknowledge Equestria Girls (by having the right tags), stories that ignore the issue entirely, and stories that actually refute Equestria Girls. In my experience, those few stories are generally Human in Equestria stories where the author doesn't want the protagonist to be able to leave Equestria. I wonder how many stories there are with the human tag since 2014 that lack relevant Equestria Girls tags? It would be interesting to see whether there are more or fewer of those than Equestria Girls stories combined.

From a literary point of view, I've noticed we see a lot less Human in Equestria stories ever since EQG, and I wonder if maybe authors were switching to writing about Equestria Girls instead.

One of the side effects is that I rewrote the normalizer that chews the thick stack of EPUBs and produces a database of easy to search text, and now this script runs on all the cores of my laptop and serves as a space heater, chewing the entire archive down in about an hour. Which is a great improvement from six hours I started with and made it a lot easier to refine it.

... ? Last I used it, the Fimfarchive was all text files, not epub. I don't want epub either.
If it's epub now, would you post the normalizer code?

How about views and ratings for EQG vs. non-EQG stories? :twilightsmile:

I think I've read a couple stories that act as prequels/AUs to the movies, where Sunset is a fallen apprentice but still in Equestria, the sirens are luring sailors to the rocks to eat their flesh, etc. Using the implied canon of the films but not the stories themselves.

Are you worried about classifying those, or just lumping them together with the highschool fics?

4150389

Perhaps there should be 3 categories: Stories that acknowledge Equestria Girls (by having the right tags), stories that ignore the issue entirely, and stories that actually refute Equestria Girls.

How would you differentiate a story between refuting EG, and ignoring the issue, based on text strings alone?

4150519

If it's epub now, would you post the normalizer code?

It's epub now. Here's the code for the entire statistics project as it currently is: https://transfer.sh/JEM3m/fimstat.tar.gz This link will self-destruct in a couple of weeks.

Requires Python 3. No idea if it will work on Windows, though it should in theory. unwrap.py produces the database, other scraps of code use it. I'll document it and make a repo later.

How about views and ratings for EQG vs. non-EQG stories?

Good idea, I'll see what I can do... Or you do that, you got the code. :) :raritywink:

4150625

Are you worried about classifying those, or just lumping them together with the highschool fics?

Unless you can propose a text-based heuristic that would permit me to classify them, I don't see how I could avoid lumping them together with highschool stories.

4150719

How would you differentiate a story between refuting EG, and ignoring the issue, based on text strings alone?

That's a tricky question. I was thinking about human stories that don't have the EG tag set, since in my experience 90% of human stories these days are about the Equestria Girls world, and the other 10% are Human in Equestria stories. Of course, that's a really rough split, but it would be interesting to know the size of that dataset.

4150729

Well, I can also extract tag information, but I'm not sure if this is going to be a good distinction... People have very vague ideas about what tags mean.

4150745 Yeah, it's not a perfect solution. On the other hand, comparing stories that do and do not have the EG tags is also pretty imprecise. How many of those stories are clopfics, or things like "Apple Bloom visits Zecora and learns to make a potion," which say nothing about whether the author takes EG as canon, even if you read the story itself?

Maybe a better approach would be to compare the number of EG-tagged fics with the number of fics tagged with, say, Discord. Discord is probably the single most popular recurring character (defining the Mane 6, Spike, the CMC and the Alicorn Sisters as the "mane" characters). He's been in 10 episodes, which is only slightly longer than the running time of all three movies. If people use significantly more Discord tags than EG tags, maybe it means they don't consider EG canon, or at least aren't as interested in it as in a character from the show. If there are about as many EG tags or more, people consider it just as much a part of FiM as Discord. If Discord isn't a good example, maybe Big Mac or Cadance, or some other character in the show who appears often enough to be an important part of the world but is not a main character.

4150765

I'm not only using tags for selection. Rather, the whole point of this was to look for character names in text.

The archive index does not include character tags at all.

4150784 Shoot. I guess look for Discord's name in text then, or whatever example fills an equivalent amount of space in the show.

4150809

Discord is mentioned in 30032, or 22.17%, of all stories. This is hardly surprising: he carries much more impact than most other characters, while the mirror strictly limits the points of contact with the other world. Also, everyone needs a villain, and Discord makes a popular one.

For comparison, Sombra is even more disproportionate: 9999 of 135435 or 7.38%. That for a guy who only ever says "Crystalssssss" on screen and only shows up in one two-parter. Suri Polomare also shows up in only one episode, and yet only turns up in 337 or 0.25% stories. Lightning Dust is mentioned in 1712 or 1.26%, and she was only around for one episode as well.

I'm afraid there just isn't a readily selectable phenomenon other than EG to compare with.

4150866 I guess you're right. Too bad the comics don't have a unique character set, that would have been a much better comparison.

If you can think of any others, I’m open to suggestions.

"Abacus Cinch"
"Equestria Girls"…? might generate false positives.
"the dazzlings" or similar?

It might be worth checking the stories where "Sunset Shimmer" is the only hit, as there are no doubt a lot of Equestria-side hits there.
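One way to follow that suggestion, sketched under the assumption that text is lowercased and punctuation other than hyphens is turned into spaces before the space-padded whole-word test described in the post: record which markers each story hits, then list the stories whose only hit is “sunset shimmer.” The marker list is abbreviated and the story dictionary is a hypothetical stand-in for the real database.

```python
import re
from collections import Counter

MARKERS = ["sunset shimmer", "flash sentry", "adagio dazzle", "sci-twi"]  # abbreviated


def padded(text):
    # Assumed normalization: lowercase, keep letters/digits/hyphens, pad with spaces.
    return " " + re.sub(r"[^a-z0-9-]+", " ", text.lower()).strip() + " "


def hits(text, markers=MARKERS):
    """Set of markers found in the text as whole words."""
    p = padded(text)
    return {m for m in markers if f" {m} " in p}


def sunset_only(stories):
    """IDs of stories whose only marker hit is 'sunset shimmer'."""
    return [sid for sid, text in stories.items() if hits(text) == {"sunset shimmer"}]


def hit_frequencies(stories):
    """Number of stories each marker appears in."""
    return Counter(m for text in stories.values() for m in hits(text))


stories = {  # hypothetical story ID -> text
    1: "Sunset Shimmer studies under Celestia.",
    2: "Sunset Shimmer and Adagio Dazzle argue.",
    3: "Applejack naps.",
}
print(sunset_only(stories))      # [1]
print(hit_frequencies(stories))  # Counter({'sunset shimmer': 2, 'adagio dazzle': 1})
```

The “sunset only” list is a candidate pool for hand-checking, not a verdict: a pony-side Sunset Shimmer story still references an EG entity.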

Huh?! Go home, Fimfiction, you’re drunk.

Stories can be unlisted, somehow, clearly.
See also this classic [empty] FimFic that allegedly predates My Little Pony by twelve years.

4308543

“Abacus Cinch”

Somehow I forgot that one.

I’ll go back to this topic once the next batch of Fimfarchive is released, because then we’ll be able to see the impact of Legend of Everfree. The pony movie with no ponies at all in it.

See also this classic [empty] FimFic that allegedly predates My Little Pony by twelve years.

Some of the old data is clearly borked. Much of the data is apparently precomputed. In particular, ratings only get computed several times per day and occasionally the batch job that does it freezes.
