Statistics #5: Friendliest Fleet · 9:05am Mar 22nd, 2019
Well, not statistics per se, but mathematics. With some unexpected conclusions.
It started yesterday, when Fylifa, who had been asking me to figure out a way to compare the popularity of different ships based on the Fimfarchive dataset – something that remains a puzzle, because FimFiction does not have shipping tags, and figuring out which ships a particular story espouses from text alone is nontrivial – came up with one herself.
That is, FimFiction does not have such tags, but Derpibooru does. Unfortunately, it does not have a list of all the shipping tags, or a category for them, so she spent some time fishing for those tags and counting how many images each tag has, and while she was doing so, I had a question…
What is the highest scoring set of non-intersecting ships?
Okay, let me explain. Imagine each ship as, well, a ship. With a complement of marines on board, reflected in the score – the number of images a tag has. Now, we want to assemble a fleet to send them all against some common enemy, but we can’t send ships which will fight other ships in the fleet – and barring comparatively rare polyamorous shippers, it will mean that Fluttershy can’t be shipped with Discord and Big Mac simultaneously, we’ve got to make a choice. So which ships should we pick to get the biggest possible force of marines at the destination?
To put it into less metaphorical terms, which combination of ships will satisfy the highest possible number of readers?
This called for some mathematics.
I posed the question to PCRS regulars, and Undome Tinwe recognized it as a maximum weight independent set problem: Let’s represent each ship as a graph node, and draw edges between every pair of nodes that share a character. Then, the requisite combination of ships will be the maximum weight independent set of that graph.
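A minimal sketch of that graph construction, not the code from the repository: each ship is a pair of character names, and two ships get an edge when they share a character. The function name `build_conflict_graph` and the handful of sample ships are assumptions chosen for illustration; the graph is stored as a plain dictionary of neighbour sets.

```python
from itertools import combinations


def build_conflict_graph(ships):
    """Map every ship to the set of other ships that share a character with it."""
    graph = {ship: set() for ship in ships}
    for a, b in combinations(ships, 2):
        # Two ships conflict when at least one character appears in both.
        if set(a) & set(b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# A few pairings mentioned in the post; scores are not needed to build the edges.
ships = [
    ("Discord", "Fluttershy"),
    ("Big Mac", "Fluttershy"),
    ("Big Mac", "Cheerilee"),
    ("Rarity", "Spike"),
]
graph = build_conflict_graph(ships)

for ship, neighbours in graph.items():
    print(" / ".join(ship), "conflicts with", len(neighbours), "other ship(s)")
```

An independent set in this graph is any selection of ships where no two are joined by an edge, which is exactly a fleet whose ships do not fight each other.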
I basically know jack about graph theory, but it was enough to get me coding, and soon, I scrounged up enough libraries that had a better idea of graph theory than me. What I got was code that could compute approximations of such a set – because it’s an NP-hard optimization problem. So I made it compute a lot of solutions and pick the highest scoring one. The next step was to weasel a complete data set out of Derpibooru, because picking at the likely tags like Fylifa did was going to take forever.
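The "compute a lot of solutions and pick the highest scoring one" idea can be sketched with a randomized greedy search. This is an assumed stand-in, not the library routine the post actually used: every pass visits the ships in random order and keeps each one whose characters are still free, and the best pass wins. The names `random_greedy_fleet` and `best_fleet`, and the toy scores, are made up for the example and are not Derpibooru counts.

```python
import random


def random_greedy_fleet(scores, rng):
    """One randomized pass: keep each ship whose characters are all still unpaired."""
    order = list(scores)
    rng.shuffle(order)
    taken = set()
    fleet = []
    for ship in order:
        if taken.isdisjoint(ship):
            fleet.append(ship)
            taken.update(ship)
    return fleet


def best_fleet(scores, tries=1000, seed=0):
    """Run many randomized passes and return the highest scoring fleet found."""
    rng = random.Random(seed)
    best, best_score = [], -1
    for _ in range(tries):
        fleet = random_greedy_fleet(scores, rng)
        total = sum(scores[ship] for ship in fleet)
        if total > best_score:
            best, best_score = fleet, total
    return sorted(best), best_score


# Toy data with invented scores: four characters A-D and four possible ships.
toy_scores = {
    ("A", "B"): 10,
    ("B", "C"): 7,
    ("C", "D"): 10,
    ("A", "D"): 3,
}
fleet, total = best_fleet(toy_scores)
print(fleet, total)
```

On the toy data the only two maximal fleets are A/B with C/D (total 20) and B/C with A/D (total 10), so the search settles on the former. On a real graph with hundreds of nodes this kind of search gives no optimality guarantee, which is why running many passes matters.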
After much sifting – don’t do this with filters turned off – I came up with this heuristic:
- Go through character tags from highest scoring to lowest scoring.
- On each tag, look at the “implied by” section, which contains a set of tags, some of which are shipping tags. Every shipping tag implies both components of a pairing, so you’ll find all of them this way. Exclude shipping tags that involve more than two characters, because we’re talking about pairings here.
- Ignore ships which total under 50 images, because otherwise we’re going to end up with thousands of nodes and I’ll be computing into the next week.
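The filtering part of the heuristic above can be sketched as follows, assuming the tags have already been collected by hand into `(characters, image_count)` records. That record layout and the function name `filter_ships` are assumptions, and nothing here talks to Derpibooru. The first three records reuse scores from the fleet list in this post; the last two are hypothetical and only show what gets dropped.

```python
def filter_ships(records, cutoff=50):
    """Keep two-character ships that have at least `cutoff` images."""
    return [
        (characters, images)
        for characters, images in records
        if len(characters) == 2 and images >= cutoff
    ]


records = [
    (("Ahuizotl", "Daring Do"), 101),
    (("Flash Sentry", "Thunderbass"), 93),
    (("Carrot Top", "Written Script"), 54),
    (("X", "Y", "Z"), 500),  # hypothetical three-character tag, excluded
    (("P", "Q"), 17),        # hypothetical pairing below the cutoff, excluded
]

kept_50 = filter_ships(records)
kept_100 = filter_ships(records, cutoff=100)
print(len(kept_50), len(kept_100))
```

Raising `cutoff` from 50 to 100 leaves only Ahuizotl / Daring Do from this sample.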
This allowed me to finish collecting data quickly enough, and I’m fairly confident I missed at most one or two relatively minor ships. It’s worth noting that the number of pairings that make the grade is much smaller than you would expect – 257. It further drops down to 170 if you place the cutoff at 100. Even if I missed any, it is apparent that they do not affect the result significantly, and the result is somewhat unexpected:
Highest scoring fleet:
5504 - Rarity / Spike
4640 - Applejack / Rainbow Dash
4181 - Trixie / Twilight Sparkle
2559 - Bon Bon / Lyra Heartstrings
2548 - Octavia Melody / Vinyl Scratch
2175 - Discord / Fluttershy
2120 - Sci-Twi / Sunset Shimmer
1996 - Cadance / Shining Armor
1638 - Celestia / Luna
1292 - Derpy / Doctor Whooves
873 - Cheese Sandwich / Pinkie Pie
747 - Big Mac / Cheerilee
735 - Starlight Glimmer / Sunburst
576 - Bright Mac / Pear Butter
536 - Soarin / Spitfire
497 - Button Mash / Sweetie Belle
389 - Aria Blaze / Sonata Dusk
354 - Diamond Tiara / Silver Spoon
267 - Gallus / Sandbar
254 - Bow Hot Hoof / Windy Whistles
254 - Chrysalis / Nightmare Moon
248 - Carrot Cake / Cup Cake
240 - Night Light / Twilight Velvet
211 - Apple Bloom / Scootaloo
192 - Cloud Chaser / Flitter
165 - Fancy Pants / Fleur
149 - Aloe / Lotus
147 - Ember / Thorax
143 - Maud Pie / Mudbriar
107 - Radiant Hope / Sombra
101 - Ahuizotl / Daring Do
93 - Flash Sentry / Thunderbass
85 - Flurry Heart / Pound Cake
83 - Snails / Twist
81 - Berry Punch / Minuette
66 - Braeburn / Little Strongheart
58 - Rumble / Thunderlane
57 - Double Diamond / Night Glider
54 - Carrot Top / Written Script
Total fleet score: 36415
Yeah, I’d love FimFiction bbcode to have a table, but alas. Should the cutoff be placed at 100, rather than 50, the optimizer arrives at the same solution every time – all the lines below “Ahuizotl / Daring Do” are gone, but the rest doesn’t change. Anyway, notice the following observations:
- Heterosexual pairings are over-represented in the friendliest fleet, even though they are otherwise a minority. As Undome Tinwe observed, the reasons are structural: M/M ships are less popular than F/F ships, but an F/F ship takes away two popular characters – together with the potentially higher scores for their other ships – out of the sum, while M/F only takes one, so the sum ends up higher.
- Pairings which don’t have a lot of alternatives always make it into the total: Their ships are already not in conflict with anyone else. Conversely, characters who get shipped with the entire cast might not make it into the fleet at all if their highest scoring ship is not popular enough.
- The most apparently popular ships are the oldest ones. Somehow, no ship can exceed the popularity of Sparity, still.
Happy pandering, and I hope you will find this fleet useful!
P.S. You can see both the code and the dataset I ended up using in the Github repository. If I missed your favorite ship, feel free to comment – as long as there is a Derpibooru tag for it so that it can be assigned a score, and as long as that score is higher than the cutoff, it’s trivial to add it.
P.P.S. See the further adventures in logistics in Statistics #5.1: Modernized Friendliest Fleet
Huh. Didn't know these three were still so popular. Old fandom favorites must have a lot of inertia -- if nothing else, it's got to take some time to overtake the backlog that built up back in the fandom's heyday.
...
Ew.
5031380
Perfectly time-weighted data of this kind only exist after the series is long done, and even then not always.
Up until I got 250 pairings in, Sunburst was consistently getting shipped with his mother. It’s Derpibooru, what did you expect.
If only it were possible to do the same to Fimfiction ships, we'd be able to compare the two - and establish, once and for all, which is more sick, wrong and repulsive than the other.
I'd suggest some sort of AI reader, but that's equivalent to saying "some magic here".
5031385
I got a beefier GPU and 8 gigs of GPU RAM now. I’ll think of something eventually.
Three sibling incests. Not bad at all
5031397
🎵 Questionable content here is socially accepted 🎵
This degree of statomancy makes me feel inadequate. ^^
I'd love to solve this problem though. :)
5031403
Amen!
I always knew we were doomed once Cadence discovered graph theory.
I just want to chime in and say that, in light of the statement about the advantages of straight shipping in maximizing the objective, the fact that AppleDash is still optimal is likely indicative of the sheer popularity of the ship, because it's a ship that removes two of the Mane 6 from the graph. Alternatively, it could be indicative of AJ being one of the lesser-shipped M6, so that the penalty for hooking her up with Dash isn't that great.
It utterly baffles me that Twixie is still popular. It was never a good idea, but it would seem to have been definitively shown to be such by Season 5.
5031432
It had a lot more time to accumulate a lot more artwork. I expect that eventually, Startrix will overtake it.
At which point, Sunburst will end up without a partner and get shipped with his mother in the Friendliest Fleet, because his mother is not shipped with anyone else. You win some, you lose some.
5031432
5031435
Is it possible to apply a recentness weighting to the stats?
5031442
Only if you can somehow get me the entire Derpibooru database without the actual files, but with upload dates for every file.
5031444
Not on my mobile I can't.
5031402
Actually, it's four (Celestia/Luna, Aloe/Lotus, Flitter/Cloudchaser, and Thunderlane/Rumble). I guess if it wasn't for AppleDash, AJ/Big Mac would also score high.
This makes me wonder which characters are the most promiscuous, that is, which ones get involved in the most ships. Like, I think Bon Bon is shipped almost exclusively with Lyra while Dash is the town bicycle.
5031455
Check the CSV file in the repository linked above. But in general you’re correct.
5031435
I've seen some shipping of Stellar Flare with Chrysalis. But I'm not sure how much artwork there is of that ship.
BTW, who the heck is Thunderbass? (googles) Oh, that guy. I didn't know he had a name.
5031461
Not enough to make the cut, I checked.
Yes, I was surprised too.
5031380 5031402
I was shocked at Celestia/Luna too, but then I thought about it... how many other ships do they have? There's Luna / Big Mac, Luna / Twilight, Celestia / Twilight, but all of those would remove other more popular ships. The one exception is Celestia / Discord (also my personal favorite), but Fluttershy / Discord still at least rivals its popularity.
Okay, I've seen one each of Celestia / Twilight Velvet and Celestia / Troubleshoes.
5031528
Precisely why the optimizer filters them out, yes. Sets containing Celestia / Discord are close enough for the optimizer to come up with them some of the time, but ultimately, they end up scoring less than this by about 1000.
Aloe and Lotus just don’t get any other ships at all. Thunderlane has more popular ships than Rumble, but all of these are with characters who have much more popular ships than any of Thunderlane’s, so he falls back to Rumble because there’s nobody else left. Flitter / Cloudchaser only get minor ships in general and their other ships lose the competition, forcing them together. Sunburst only has ships with Starlight and his mother, and if Starlight grabs Trixie, Sunburst / Stellar Flare is inevitably included, because Stellar Flare has no other ships at all.
If I remove all the incestuous ships from the initial data entirely, the result will be wildly different:
Notice the appearance of Celestia / Discord, which now wins, the sudden shipping of Luna to Pipsqueak – because she doesn’t have the much higher scoring sister to pair to anymore and gets the shortest end of the stick – and Thunderlane now gets Cloud Chaser, which is a better result than Rumble, but only because Cloud Chaser / Flitter is no longer an option.
5031459
Hmm, "Flashdash" is a great ship name (also, for a while I was convinced there was some fandom drama regarding Berry Punch due to the "berrygate" tag). And yeah, Bon Bon has exactly one ship. Even Lyra is occasionally cheating with Octavia...
5031542
Rumble: “This is like the worst game of musical chairs ever.”
Thunderlane: “When I joined the Wonderbolts I didn’t expect this kind of service.”
CadanceAI: “This way satisfies shipping through love and ponies.”
5031528
Cries in Novolestia
At least your ship doesn't have a score of 17... 1 of which is the cover I commissioned. I guess I am 5.8% responsible for that ship, yay.
5031542 And we thought Luna was the fans' favorite princess!
5031595
If you truly have a favorite princess, why would you want to ship her with someone else?!
Some of the popular ships, like Twilight and Trixie or Derpy and the Doctor, are ones that you don't see that often these days but saw very frequently in early writing (2012-2014).
It's interesting to think that these pairings may be weighted towards opinions from the early years of the show, since there was that huge bloom of fandom and content. I wonder, if this included only content created after, say, 2016, would the results be substantially different?
5031625
Definitely. However, as I said above, I would need data that isn’t readily available to me through manual collection from the Derpibooru web interface to compute a fairer representation: Manually checking 250+ tags and seeing how many pictures each one has is feasible. Manually determining how many of those are dated to when is a lot more work… If I had a dump of Derpibooru metadata, it would be easy, but I don’t.
5031633
Hmm. It is kind of a pity that Derpibooru doesn't provide an easier way of checking when, precisely, an image was uploaded. You could get some good data from that. It might be interesting to see how and when certain ships peaked or sank in popularity, and what canon and fandom events this'd correspond to.
I actually looked over the full Github stats now...
Shocked to see Princest actually being the second-most-common Celestia ship after Twilestia, and the single most common Luna ship!
Also surprised that Rarity / Shining Armor didn't show up! If it wasn't for his already being married in the first (well, technically second) episode we saw him in, they'd be perfect together.
Surprised again to see Cheerilee / Twilight, though now that I think about it they seem like they might do quite well together...
And what's the "Debatable" by Chrysalis / Nightmare Moon? Are you debating whether to count Nightmare Moon separate from Luna - and do any scores change when you combine the two?
5031712
Princest is wincest?
If you ask me to make a hypothesis, here are two:
Both could be two sides of the same coin.
I originally thought that Nightmare Moon cannot be thought to exist separately from Luna, and thus form an independent ship. MitchH’s comment that the Luna / Nightmare Moon ship is itself a thing eventually convinced me otherwise.
If I merge them, not much will change, because indirectly this will also require me to merge Celestia and Daybreaker, merging “princest” with “evil princest”…
With regard to getting a dataset from fimfiction, there are groups dedicated to ships, so those could be used to approximate the popularity of the various ships on fimfiction. Of course, the groups don't capture every story (only those that have been added to the groups), it's not easy to find the group associated with a particular ship (and there may be multiple), and it may not be so straightforward to scrape such data from the site.
Still it would be interesting to compare the relative popularity of the ships on derpibooru vs fimfiction.
5031834
Pretty sure those aren’t in the dump. :) The metadata might be available through the API though, and I did ask for a key for a prior project…
But in general, text analysis is still where it’s at: However flawed the method I come up with eventually will be, it will at least be as objective as possible.