
Chinchillax


Fixation on death aside, this is lovely —Soge, accidentally describing my entire life


Sep
4th
2014

Video- Over One Billion Words of MLP fanfiction · 8:21pm Sep 4th, 2014

Oh, goodness. Never has a single project so completely used nearly every talent I have: Illustrator, Photoshop, writing, reading, Excel, VBA, Java, not to mention the hours spent watching After Effects tutorials. But it is done!

Please check out that video and especially all the spreadsheets; they're really fun to play around with.

Comments ( 19 )

Holy crap, that's a lot of info.
img.pandawhale.com/post-25367-give-that-man-a-cookie-meme-Pu-7nU9.jpeg
...
It makes my 75,000 words seem so small now. :twilightsheepish:

The top 1% of stories on fimfiction can be read in five days.

Well, it's not like I needed to do anything else this week, right?

Of course, one unfortunate side effect of this is that you would only read the stories that everyone else thinks are good. And while you can usually trust that a story that a few thousand people upvoted will be pretty good, everyone has different tastes. And you'd miss out on plenty of hidden gems that just haven't become popular for one reason or another.

Here's something else for you to possibly look at. Your data shows that adventure stories are the most popular to write, followed by romance, dark, comedy, and then slice of life. But what are the most popular genres to read?
Well, I did a quick survey of the top 50 stories and got these results:
Slice of Life: 78%
Comedy: 34%
Sad: 16%
Adventure: 10%
Romance: 10%
Alternate Universe: 10%
Random: 4%
Dark: 2%
Human: 2%
Tragedy: 0%
Crossover: 0%
Anthro: 0%
Those are some immensely different numbers. It seems that the fandom has a preference for cute little (perhaps comedic) slice of life stories more than anything else. Considering that this is My Little Pony, this probably shouldn't be surprising.
Just for fun, I decided to try to find where the highest-rated tragedy, crossover, and anthro stories fell.
The highest-rated crossover came in at 208.
The highest-rated tragedy came in at 270.
The highest-rated anthro story came in at 547.
In fact, in the top 1% of all stories (which I'm rounding to the top 800, for simplicity), there are:
39 crossovers, 9 tragedies, and 1 anthro story.
And while I can't say (or easily figure out) which characters were the most common in the most popular stories, it looked to me like Twilight and Celestia were the most common in the top 50, followed by the rest of the mane six and Luna (none of that should be very surprising).

Actually, why not figure it out for all the genres? In the top 1% of stories on fimfiction,* this is your breakdown by genre:

Slice of Life: 450 stories (56.25%)
Comedy: 320 stories (40%)
Romance: 158 stories (19.75%)
Adventure: 155 stories (19.375%)
Sad: 111 stories (13.875%)
Alternate Universe: 89 stories (11.125%)
Random: 89 stories (11.125%)
Dark: 59 stories (7.375%)
Human: 47 stories (5.875%)
Crossover: 39 stories (4.875%)
Tragedy: 9 stories (1.125%)
Anthro: 1 story (0.125%)

So it looks like the top 50 and the top 800 stories have the genres in almost the same order (the only difference being Sad moving down two spots), though the percentages are slowly moving towards the ones in your video.

I think it's interesting that people apparently like to write dark stories (and to a lesser extent, adventure and romance stories), even though, overall, people don't seem to like to read them. Or they don't like to read them as much as they like to read slice of life and comedy stories, at least.

Maybe I will see if I can figure out the data for characters next...

* For these numbers, I looked at the top 800 highest-rated stories, according to fimfiction's sorting, not just sorting them by upvotes in your spreadsheet. For thoroughness, I included all stories, regardless of rating or completion.

Well that only took a few more hours.

Twilight Sparkle*
including Twilicorn: 278 (34.75%)
without Twilicorn: 212 (26.5%)
Twilicorn: 66 (8.25%)
Rainbow Dash: 96 (12%)
Pinkie Pie: 68 (8.5%)
Applejack: 58 (7.25%)
Rarity: 58 (7.25%)
Fluttershy: 44 (5.5%)
Spike: 95 (11.875%)
Mane Six: 154 (19.25%)
Apple Bloom: 27 (3.375%)
Scootaloo: 43 (5.375%)
Sweetie Belle: 24 (3%)
Cutie Mark Crusaders: 58 (7.25%)
Babs Seed: 2 (0.25%)
Princess Celestia: 229 (28.625%)
Princess Luna: 179 (22.375%)
Nightmare Moon: 22 (2.75%)
Gilda: 5 (0.625%)
Zecora: 13 (1.625%)
Trixie: 22 (2.75%)
Cheerilee: 17 (2.125%)
The Mayor: 5 (0.625%)
Hoity Toity: 0 (0%)
Photo Finish: 0 (0%)
Sapphire Shores: 1 (0.125%)
Spitfire: 5 (0.625%)
Soarin: 3 (0.375%)
Prince Blueblood: 14 (1.75%)
Little Strongheart: 0 (0%)
Discord: 42 (5.25%)
Mare Do Well: 2 (0.25%)
Fancy Pants: 8 (1%)
Daring Do: 11 (1.375%)
Flim and Flam: 2 (0.25%)
Cranky Doodle Donkey: 1 (0.125%)
Matilda: 0 (0%)
Mr. Cake: 4 (0.5%)
Mrs. Cake: 4 (0.5%)
Iron Will: 2 (0.25%)
Princess Cadance: 29 (3.625%)
Shining Armor: 30 (3.75%)
Wonderbolts: 4 (0.5%)
Diamond Dogs: 2 (0.25%)
Queen Chrysalis: 31 (3.875%)
King Sombra: 11 (1.375%)
Crystal Ponies: 4 (0.5%)
Lightning Dust: 1 (0.125%)
Sunset Shimmer: 1 (0.125%)
Pie Sisters: 3 (0.375%)
Cherry Jubilee: 0 (0%)
Cake Twins: 2 (0.25%)
Flash Sentry: 1 (0.125%)
The Mane-iac: 0 (0%)
Power Ponies: 0 (0%)
Cheese Sandwich: 0 (0%)
Maud Pie: 5 (0.625%)
Coco Pommel: 3 (0.375%)
Trenderhoof: 0 (0%)
Breezies: 0 (0%)
Suri Polomare: 2 (0.25%)
Ahuizotl: 2 (0.25%)
Seabreeze: 0 (0%)
Fleetfoot: 0 (0%)
Bulk Biceps: 0 (0%)
Tirek: 1 (0.125%)
Big Macintosh: 26 (3.25%)
Granny Smith: 8 (1%)
Braeburn: 3 (0.375%)
Diamond Tiara: 17 (2.125%)
Silver Spoon: 11 (1.375%)
Twist: 3 (0.375%)
Snips: 1 (0.125%)
Snails: 1 (0.125%)
Pipsqueak: 8 (1%)
Featherweight: 0 (0%)
Angel: 6 (0.75%)
Winona: 5 (0.625%)
Opalescence: 7 (0.875%)
Gummy: 6 (0.75%)
Owlowiscious: 8 (1%)
Philomena: 4 (0.5%)
Tank: 1 (0.125%)
Derpy Hooves: 36 (4.5%)
Lyra: 25 (3.125%)
Bon Bon: 17 (2.125%)
Vinyl Scratch: 26 (3.25%)
Caramel: 1 (0.125%)
Doctor Whooves: 9 (1.125%)
Octavia: 26 (3.25%)
Berry Punch: 2 (0.25%)
Carrot Top: 3 (0.375%)
Fleur Dis Lee: 3 (0.375%)
Colgate: 0 (0%)
Dinky Hooves: 17 (2.125%)
Thunderlane: 6 (0.75%)
Flitter and Cloudchaser: 3 (0.375%)
Rumble: 4 (0.5%)
Roseluck: 3 (0.375%)
Changelings: 49 (6.125%)
Noteworthy: 0 (0%)
Nurse Redheart: 2 (0.25%)
Flower Ponies: 0 (0%)
Raindrops: 1 (0.125%)
Spa Ponies: 1 (0.125%)
Sparkler: 5 (0.625%)
Cloudkicker: 0 (0%)
OC: 216 (27%)
Other: 114 (14.25%)

* While Twilight Sparkle and Twilicorn are two separate tags, searching for Twilight will also return Twilicorn stories. The first number is the result when the Twilicorn stories are included, the second is when all Twilicorn stories are excluded, and the third is a search for the Twilicorn tag.
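
As a quick sanity check on the list above, each percentage is just the count divided by the 800 stories that make up the top 1%. A minimal Python sketch, using a handful of counts copied from the list (the function name `share` is made up for illustration):

```python
TOP_STORIES = 800  # the top 1%, rounded to 800 as described above

def share(count, total=TOP_STORIES):
    """Return count as a percentage of total."""
    return 100 * count / total

# A few counts copied from the list above.
counts = {
    "Twilight Sparkle (including Twilicorn)": 278,
    "Twilight Sparkle (without Twilicorn)": 212,
    "Twilicorn": 66,
    "Princess Celestia": 229,
    "Sparkler": 5,
}

for name, count in counts.items():
    print(f"{name}: {count} ({share(count):g}%)")

# The two Twilight sub-counts should add up to the combined search result.
combined = counts["Twilight Sparkle (without Twilicorn)"] + counts["Twilicorn"]
assert combined == counts["Twilight Sparkle (including Twilicorn)"]
```

The same check works for any row: a count of 5 out of 800 comes to 0.625%.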



Now if you prefer them in order:

Twilight Sparkle (including Twilicorn): 278 (34.75%)
Princess Celestia: 229 (28.625%)
OC: 216 (27%)
Twilight Sparkle (without Twilicorn): 212 (26.5%)
Princess Luna: 179 (22.375%)
Mane Six: 154 (19.25%)
Other: 114 (14.25%)
Rainbow Dash: 96 (12%)
Spike: 95 (11.875%)
Pinkie Pie: 68 (8.5%)
Twilicorn: 66 (8.25%)
Applejack: 58 (7.25%)
Rarity: 58 (7.25%)
Cutie Mark Crusaders: 58 (7.25%)
Changelings: 49 (6.125%)
Fluttershy: 44 (5.5%)
Scootaloo: 43 (5.375%)
Discord: 42 (5.25%)
Derpy Hooves: 36 (4.5%)
Queen Chrysalis: 31 (3.875%)
Shining Armor: 30 (3.75%)
Princess Cadance: 29 (3.625%)
Apple Bloom: 27 (3.375%)
Big Macintosh: 26 (3.25%)
Vinyl Scratch: 26 (3.25%)
Octavia: 26 (3.25%)
Lyra: 25 (3.125%)
Sweetie Belle: 24 (3%)
Nightmare Moon: 22 (2.75%)
Trixie: 22 (2.75%)
Cheerilee: 17 (2.125%)
Diamond Tiara: 17 (2.125%)
Bon Bon: 17 (2.125%)
Dinky Hooves: 17 (2.125%)
Prince Blueblood: 14 (1.75%)
Zecora: 13 (1.625%)
Daring Do: 11 (1.375%)
King Sombra: 11 (1.375%)
Silver Spoon: 11 (1.375%)
Doctor Whooves: 9 (1.125%)
Fancy Pants: 8 (1%)
Granny Smith: 8 (1%)
Pipsqueak: 8 (1%)
Owlowiscious: 8 (1%)
Opalescence: 7 (0.875%)
Angel: 6 (0.75%)
Gummy: 6 (0.75%)
Thunderlane: 6 (0.75%)
The Mayor: 5 (0.625%)
Gilda: 5 (0.625%)
Spitfire: 5 (0.625%)
Maud Pie: 5 (0.625%)
Winona: 5 (0.625%)
Sparkler: 5 (0.625%)
Mr. Cake: 4 (0.5%)
Mrs. Cake: 4 (0.5%)
Wonderbolts: 4 (0.5%)
Crystal Ponies: 4 (0.5%)
Philomena: 4 (0.5%)
Rumble: 4 (0.5%)
Soarin: 3 (0.375%)
Pie Sisters: 3 (0.375%)
Coco Pommel: 3 (0.375%)
Braeburn: 3 (0.375%)
Twist: 3 (0.375%)
Carrot Top: 3 (0.375%)
Fleur Dis Lee: 3 (0.375%)
Flitter and Cloudchaser: 3 (0.375%)
Roseluck: 3 (0.375%)
Babs Seed: 2 (0.25%)
Mare Do Well: 2 (0.25%)
Flim and Flam: 2 (0.25%)
Iron Will: 2 (0.25%)
Diamond Dogs: 2 (0.25%)
Cake Twins: 2 (0.25%)
Suri Polomare: 2 (0.25%)
Ahuizotl: 2 (0.25%)
Berry Punch: 2 (0.25%)
Nurse Redheart: 2 (0.25%)
Sapphire Shores: 1 (0.125%)
Cranky Doodle Donkey: 1 (0.125%)
Lightning Dust: 1 (0.125%)
Sunset Shimmer: 1 (0.125%)
Flash Sentry: 1 (0.125%)
Tirek: 1 (0.125%)
Snips: 1 (0.125%)
Snails: 1 (0.125%)
Tank: 1 (0.125%)
Caramel: 1 (0.125%)
Raindrops: 1 (0.125%)
Spa Ponies: 1 (0.125%)
Hoity Toity: 0 (0%)
Photo Finish: 0 (0%)
Little Strongheart: 0 (0%)
Matilda: 0 (0%)
Cherry Jubilee: 0 (0%)
The Mane-iac: 0 (0%)
Power Ponies: 0 (0%)
Cheese Sandwich: 0 (0%)
Trenderhoof: 0 (0%)
Breezies: 0 (0%)
Seabreeze: 0 (0%)
Fleetfoot: 0 (0%)
Bulk Biceps: 0 (0%)
Featherweight: 0 (0%)
Colgate: 0 (0%)
Noteworthy: 0 (0%)
Flower Ponies: 0 (0%)
Cloudkicker: 0 (0%)

So it looks like my results are fairly similar to yours. But there are a few differences that stood out:
-While we both have Twilight and Celestia at the top, my numbers give them considerably higher percentages.
-I have Luna before Rainbow Dash, with a considerably higher percentage.
-Spike jumps way ahead, showing up before four of the mane six
-The CMC have much higher percentages and show up before most of the background ponies.
-Actually, most of the characters have noticeably higher percentages in my numbers.
-I have a lot more characters listed than you do.

So what can we draw from this?
-We all love Twilight. And Celestia.
-High-rated fics use most canon characters more than average (especially Twilight, Celestia, Luna, and Spike).
--Therefore, there are probably a lot of low-rated fics that don't use canon characters. (Only 8 (or 1%) of the top stories have only the OC tag. Another 5 (or 0.625%) use only the OC and Other tags. Unfortunately, we don't have that data for the entire site.)
-High-rated fics use Fluttershy and Doctor Whooves less than average.
-High-rated fics also use Sunset Shimmer and Flash Sentry less than average. This is likely just because there are hardly any stories that use either of them, but it could also be because people don't like them and/or Equestria Girls.
-Maud is the most disproportionately represented pony in the highest-rated fics. She appears in only 0.13% of the total fics, but she's in 0.625% of the best fics, almost 5 times more frequently. Though admittedly, that might not mean much with such small numbers.
--Cheerilee appears 4.5 times as often in the best fics. Apple Bloom appears 4.25 times as often, Bon Bon 4 times as often, and Cadance 3.3 times as often. But all of them also have low numbers overall.
--Among the characters who actually appear a considerable number of times, Spike appears almost 3 times more frequently in the highest-rated fics. After him, Luna appears almost 2.5 times more frequently, and Celestia appears 2.3 times more frequently.

So I guess that if you put everything I just figured out together, it says that if you want to write a story that lots of people will like, you should make it a slice of life about Twilight and Celestia. And adding or swapping in Luna and/or the comedy tag wouldn't hurt either.
This shouldn't really be a surprise, because if you look around here a bit, you will see that there are a lot of slice of life and slice of life/comedy stories about Twilight and Celestia and about Celestia and Luna.

Oh, and you might want to reconsider any plans you might have had to write a Dark/Tragedy/Human/Anthro/Crossover featuring Doctor Whooves, Fluttershy, Sunset Shimmer, and Flash Sentry. (But I'd read it.)

Okay, a little bit more. Again, of the top 1% (800) stories:

508 (63.5%) are rated Everyone.
283 (35.375%) are rated Teen.
9 (1.125%) are rated Mature.

26 (3.25%) use the Sex tag.
15 (1.875%) use the Gore tag.

623 (77.875%) are complete.

53 (6.625%) are over 100,000 words long.
235 (29.375%) are under 5,000 words long.

224 (28%) were published within the last year.

Okay, I think I'm done now. :twilightsheepish:

2428780

Shouldn't you look at what's more common among the top 1%?

For example, if 99% of fics feature Twilight, but only 90% of the top 1% do, then featuring Twilight is clearly bad.

2428048

Of course, one unfortunate side effect of this is that you would only read the stories that everyone else thinks are good. And while you can usually trust that a story that a few thousand people upvoted will be pretty good, everyone has different tastes. And you'd miss out on plenty of hidden gems that just haven't become popular for one reason or another.

I know! The popular stories just keep getting more popular. What gets me are the 14021 stories without a comment.

I am loving all of this data, by the way. I'm happy I got those spreadsheets working because I wanted to see what others would do with data like it.

But are you going through each story and counting individually? That's what computers are for. But then again, you did get actual character tag data, and whether there is a Sex/Gore tag, so your way is superior in that respect.

But I didn't even think about answering the question, "What aspects of a story make it more popular?" Really good ideas to think about, thanks The Letter J.

2429081
I did. Among the top 1%, Twilight is more common than any other character, showing up in about 35% of the stories. While she is also the most common character overall, she only shows up in about 20% of all stories. So she shows up more commonly among the top 1% than she does in the remaining 99%.

The situation you described is the case with Fluttershy, Doctor Whooves, Sunset Shimmer, and Flash Sentry. About 6% of all fics feature Fluttershy, but only 5.5% of the top 1% do. And 1.56% of all fics feature Doctor Whooves, but he's only in 1.125% of the top 1%. Sunset is in 0.36% of all fics and Flash is in 0.23%, but both are only in 0.125% of the top 1%.
Admittedly, those numbers are probably all within some margin of error, so it might not mean much. But even so, almost everypony else appeared at least around twice as often in the top 1% as they did in total, so I think it's at least interesting.

Actually, the best examples of what you're describing are probably the Dark and Mature tags. Dark is the third most common of the categories, with almost 30% of stories using the tag. But my numbers show that only about 7% of the top 1% have the tag. And almost 15% of the total stories are mature, but only just over 1% of the top 1% are.
So clearly it's mature stories and dark stories that are bad, if you want to write a popular story.
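
The comparison in this comment boils down to dividing a tag's share of the top 1% by its share of all stories. Here is a rough Python sketch of that ratio, using only the percentages quoted in this comment; the Dark and Mature figures are the rounded "almost 30%", "about 7%", "almost 15%", and "just over 1%" values, so treat those two rows as approximate. The name `representation_ratio` is made up for illustration.

```python
# tag -> (share of the top 1%, share of all stories), in percent.
# Dark and Mature use the rounded figures from the comment, so they are approximate.
shares = {
    "Twilight Sparkle": (35.0, 20.0),
    "Fluttershy": (5.5, 6.0),
    "Doctor Whooves": (1.125, 1.56),
    "Sunset Shimmer": (0.125, 0.36),
    "Flash Sentry": (0.125, 0.23),
    "Dark": (7.0, 30.0),
    "Mature": (1.0, 15.0),
}

def representation_ratio(top_share, overall_share):
    """Above 1: over-represented in the top 1%. Below 1: under-represented."""
    return top_share / overall_share

ratios = {tag: representation_ratio(*pair) for tag, pair in shares.items()}

# Print the tags from most over-represented to most under-represented.
for tag, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    label = "over" if ratio > 1 else "under"
    print(f"{tag}: {ratio:.2f}x ({label}-represented)")
```

A ratio near 1, as with Fluttershy, is probably within the margin of error mentioned above; the Dark and Mature ratios are far enough below 1 that the rounding matters less.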

2429101

One thing to keep in mind is that I did not have access to the character tags; I was looking through the descriptions of the stories and seeing if a character happened to be mentioned there. So my dataset and your dataset aren't completely interchangeable.

2429099

But are you going through each story and counting individually?

Heheheh. Pretty much.
I tried using your spreadsheet at first, but I quickly realized that it would only let me sort by number of upvotes, while Fimfiction's rating system uses some formula that involves upvotes, downvotes, and possibly the temperature of Knighty's lunch. So what I ended up doing instead was going to the list of all time top stories, hitting the "switch to compact view" button so it would show 200 stories on each page, and opening up the first four pages of that list to get the top 800 stories. Then I opened up another tab where I searched for the best stories for a given tag, and then I just did some CTRL+F-ing to find the first story on the second list that wasn't on the first list. Then I knew that every story before that one on the second list was on the first list, and thus in the top 1%. So then I did have to count the stories above it. I would have made a computer do the counting for me if I could have come up with an easy way to make it do so, but I couldn't.
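
For what it's worth, the manual procedure described here could be sketched in Python roughly as follows, assuming the two lists had somehow been collected as story IDs. The IDs below are invented placeholders, and `count_in_top` is a made-up name; nothing here reflects an actual Fimfiction export or API.

```python
def count_in_top(top_ids, tag_ranked_ids):
    """Count how many tagged stories fall inside the top list.

    top_ids: story IDs of the top 800, in rating order.
    tag_ranked_ids: story IDs carrying one tag, in the same rating order.
    Mirrors the CTRL+F approach: walk down the tag list until the first
    story that is not in the top list, and count everything above it.
    """
    top = set(top_ids)
    count = 0
    for story_id in tag_ranked_ids:
        if story_id not in top:
            break  # first tagged story outside the top list
        count += 1
    return count

# Hypothetical IDs, for illustration only.
top_list = [10, 11, 12, 13, 14, 15]
tagged_list = [11, 14, 20, 31]

print(count_in_top(top_list, tagged_list))

# If both lists really are sorted by the same rating, a plain set
# intersection gives the same count without relying on the ordering.
print(len(set(top_list) & set(tagged_list)))
```

The early `break` only gives the right answer because both lists use the same sort order, which is the same assumption the manual method relies on.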

2429114
I know. That's where that margin of error mostly comes from. And it probably is why almost all the characters showed up more frequently in my data than in yours. I don't know why I didn't mention that earlier.

Here's another little statistic for you (apparently I lied when I said I was done): Of the top 10 most-followed people on the site, Eakin (the 10th most followed) has the most stories in the top 1%, with 9 of them there. Surprisingly, Aegis Shield (4th) and darf (8th) both have no stories in the top 1%.
I am definitely not going to try to figure out if there's anyone else who has more stories in the top 1%. That would be way too hard to do.

Whoa, that is a ton of words. :rainbowderp: Wonderful job in compiling all this information. If I filter out the chaff, I could read the top 1% surprisingly quickly, and yet I only knew a fraction of the users with the highest word counts. The MLP fandom really is unlike any other I've ever encountered. Now I know for sure I'll have to make my story 100,000 words if I figure some things out and stop lazing around.

¡10 , T0T,001,0TT , TT0,T10,001 Words!

¿Can you guess the base?

2429101

Sorry. I just skimmed it. I wasn't sure what you did.

You should make a table that has all of the likelihood ratios P(fic x has tag y|fic x is in the top 1%)/P(fic x has tag y|fic x is not in the top 1%). For example, Twilight gives a likelihood ratio of 35%/20% = 1.75.

Assuming all of the tags are independent (they're not, but it's fun to assume), you just multiply the likelihood ratios of each of the tags and you get the odds of the fic being in the top 1%.

To be more precise, you'd also have to take into account the tags it doesn't have, and multiply by P(fic x does not have tag y|fic x is in the top 1%)/P(fic x does not have tag y|fic x is not in the top 1%) for each one it doesn't have.

To simplify the math, you could multiply those for all of the tags to get the base rate (probability of a fic with no tags getting into the top 1%) and then for each tag, you multiply by (P(fic x has tag y|fic x is in the top 1%)/P(fic x has tag y|fic x is not in the top 1%))/(P(fic x does not have tag y|fic x is in the top 1%)/P(fic x does not have tag y|fic x is not in the top 1%)), which cancels out the evidence of not having the tag and adds in the evidence of having the tag.

You can also take the log of everything so that you add instead of multiplying.

Maybe I should just do that myself. It sounds like it might be fun.
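
The likelihood-ratio scheme described in this comment can be sketched in Python as below. This is only an illustration under the comment's own (admittedly false) independence assumption. The tag rates are taken from figures quoted elsewhere in this thread, with a tag's share among all stories standing in for its share among stories outside the top 1%, as in the Twilight example. The sketch also starts from prior odds of 1:99, which the comment leaves implicit, and all function names are made up.

```python
import math

PRIOR_ODDS = 1 / 99  # 1% of stories are in the top 1%

# tag -> (P(tag | top 1%), P(tag | not top 1%)).
# The second number is approximated by the share among all stories.
TAG_RATES = {
    "Twilight Sparkle": (0.35, 0.20),
    "Fluttershy": (0.055, 0.06),
    "Dark": (0.07, 0.30),
}

def log_likelihood_ratio(p_top, p_rest, has_tag):
    """Log of the likelihood ratio for having, or not having, one tag."""
    if has_tag:
        return math.log(p_top / p_rest)
    return math.log((1 - p_top) / (1 - p_rest))

def top_odds(story_tags, rates=TAG_RATES, prior_odds=PRIOR_ODDS):
    """Naive-Bayes odds that a story with these tags is in the top 1%."""
    log_odds = math.log(prior_odds)
    for tag, (p_top, p_rest) in rates.items():
        # Adding logs here is the same as multiplying the ratios.
        log_odds += log_likelihood_ratio(p_top, p_rest, tag in story_tags)
    return math.exp(log_odds)

def odds_to_probability(odds):
    return odds / (1 + odds)

for tags in [{"Twilight Sparkle"}, {"Dark"}, {"Twilight Sparkle", "Dark"}]:
    odds = top_odds(tags)
    print(sorted(tags), f"odds={odds:.4f}", f"p={odds_to_probability(odds):.2%}")
```

Working in logs turns the product of ratios into a sum, which is the last step the comment suggests; the tags a story lacks contribute through the `has_tag=False` branch.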

2429913
I took Probability Theory last semester. I thought I was supposed to be done with that stuff now.
If you want to work it all out, then go ahead. I probably won't end up doing it myself.

Of course, the problem with the whole idea is that not all fics are created equal. There's a lot more that goes into making a story popular than just putting the right tags on it. There's the actual quality of the story itself, first of all. Even if it has all the right tags, no story is going to end up in the top 1% if it's poorly written and full of misspellings and grammatical errors. Even though I've seen plenty of people complain about terrible stories somehow making it into the featurebox (and it definitely does happen), I'd be willing to bet that the top 1% are at least pretty well-written.
And then there's visibility. A story can't become popular if no one ever sees it. So stories written by popular authors and stories that are in a lot of active groups are going to have an instant advantage. And a good story that happened to get posted when a bunch of other stories also got posted is likely to get buried and missed, especially if those other stories happened to be written by popular authors or were visible for some other reason.
I can't imagine any simple way to factor any of that out of the calculations. But I guess there's no reason you couldn't just assume that it all just evens out. If you assume that quality, visibility, and the tags are all distributed randomly (though not evenly, of course) (and I'm sure that this is not an entirely accurate assumption, but it might be good enough), then the whole thing probably would work.

And as an interesting (but probably not incredibly useful) note, there are 30 stories on fimfiction that do not use any tags. None of them are in the top 1%. And a brief look through their short descriptions makes it look like most of them are just stories that the author forgot to tag. There's only one I can see that I know was deliberately left tagless for a good reason.

Awesome video! I'm glad my work is of use to somepony. :twilightsmile:

I did some more stuff. 2429913, you might be interested in this.

So now I've got accurate (minus whatever minor changes have happened in the past 24 hours or so) counts of the tags used in both the top 1% of all stories and in all stories.

So we've learned that the most used character tags are, in order, OC, Mane Six, Other, Celestia, Twilight, Luna, Rainbow, and Spike.
The most used categories are Adventure, Romance, Slice of Life, Comedy, and then Dark.
Just over 50% of stories are 5000 words or less. 87% are 25000 words or less.
Suri Polomare is statistically the most likely character to get a story into the top 1%, with 9.52% of stories featuring her in the top 1%. But really, that's only 2 of the 21 stories about her. There are several other characters with no more than 300 stories about them with relatively high percentages as well. Among the character tags that actually get used a lot, the one with the highest percentage of its stories in the top 1% is the Changeling tag, with 49 of its 1725 stories in the top 1% (2.84%). The next highest-ranked of the often-used tags are Twilight (with Twilicorn removed), Scootaloo, Twilight (with Twilicorn included), Sweetie Belle, Apple Bloom, Cadance, Rarity, Celestia, Chrysalis, and Pinkie.
Colgate is the most-used character without any stories in the top 1% with 354 stories. The characters with fewest stories in the top 1% who actually have a decent number of stories about them are Soarin, Spitfire, and Doctor Whooves.
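
The per-tag percentages quoted in this comment are each tag's top-1% story count divided by the total number of stories with that tag. A small Python sketch using only the three tags whose counts are given here; the 300-story cutoff for flagging small samples is an assumption borrowed loosely from the "no more than 300 stories" remark, and `top_rate` is a made-up name.

```python
# tag -> (stories in the top 1%, total stories with the tag), from the comment above.
tag_counts = {
    "Suri Polomare": (2, 21),
    "Changelings": (49, 1725),
    "Colgate": (0, 354),
}

MIN_TOTAL = 300  # assumed cutoff below which a rate is flagged as a small sample

def top_rate(in_top, total):
    """Percentage of a tag's stories that are in the top 1%."""
    return 100 * in_top / total

for tag, (in_top, total) in tag_counts.items():
    note = "" if total >= MIN_TOTAL else " (small sample)"
    print(f"{tag}: {in_top}/{total} = {top_rate(in_top, total):.2f}%{note}")
```

Flagging the small samples makes it easier to see why Suri Polomare's 9.52% says less than the Changeling tag's 2.84%.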

There's probably some other fun stuff you can see by looking at the document I linked to.

2429253

The base is Balanced Ternary.
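
For anyone who wants to check the puzzle, balanced ternary uses the digits -1, 0, and 1 with powers of three. A minimal Python decoder, assuming `T` stands for -1 and that the commas and spaces are only digit grouping (the function name is made up):

```python
DIGITS = {"T": -1, "0": 0, "1": 1}

def from_balanced_ternary(text):
    """Decode a balanced-ternary numeral, where T stands for -1."""
    value = 0
    for ch in text:
        if ch in DIGITS:
            value = value * 3 + DIGITS[ch]  # shift left one place, add digit
        elif ch not in ", ":
            raise ValueError(f"unexpected character: {ch!r}")
    return value

print(from_balanced_ternary("10 , T0T,001,0TT , TT0,T10,001"))
```

Read that way, the numeral in the puzzle comes out to 1,019,216,197, which fits the "over one billion words" in the post's title.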
