• Member Since 11th Apr, 2012

Bad Horse


Beneath the microscope, you contain galaxies.


Jun
12th
2017

Author pie charts graph · 12:57am Jun 12th, 2017

This is a follow-up to Author heat maps, which is a follow-up to Author clusters question.
 
I thought I was going to have a very exciting graph for you today, showing how a computer could objectively find authors on fimfiction who had well above-average chances of being good.  It turned out that all it was doing was picking authors who followed a lot of other authors.
 
This turns out to be a powerful indicator, IMHO, of story quality--in my set of 131 popular authors, according to my pick of which ones I liked best, an author who followed at least 10 other people was more than 10 times as likely to be on my list of "good authors" as an author who followed fewer than 10 other people.  That's still interesting, but not as useful as what I had been hoping for.
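The "10 times as likely" figure is a ratio of two proportions: the share of "good" authors among those who follow at least 10 others, divided by the share among those who follow fewer. A minimal Python sketch of that arithmetic; the function name and the counts in the usage line are made up for illustration and are not the real numbers from the 131-author set.

```python
def relative_likelihood(good_hi: int, total_hi: int, good_lo: int, total_lo: int) -> float:
    """P(good | follows >= threshold) / P(good | follows < threshold).

    good_hi, total_hi: "good" authors and all authors at or above the threshold.
    good_lo, total_lo: the same two counts for authors below the threshold.
    """
    if total_hi == 0 or total_lo == 0 or good_lo == 0:
        raise ValueError("each group needs authors, and the low group needs a 'good' author")
    # (good_hi / total_hi) / (good_lo / total_lo), rearranged so only one division happens
    return (good_hi * total_lo) / (total_hi * good_lo)


# Hypothetical counts only: 20 of 80 heavy followers vs. 1 of 50 light followers.
print(relative_likelihood(20, 80, 1, 50))  # 12.5
```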
 
It also turned perverse at the high end.  That heuristic didn't exclude any of the midlisters like me, but it excluded several of the most popular authors, who usually follow 50 or fewer people, and often fewer than 10.  (At the time I built the database, that included Pen Stroke, device heretic, & Aegis Shield.)
 
Multi-dimensional scaling
 
Multi-dimensional scaling (MDS) takes a list of distances between n points in an M-dimensional space, where M is large, and tries to put all those points in some m-dimensional space, where m is small, so that the distances between them in m-space are close to the distances between them in M-space.
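One common way to score how "close" the m-space distances are is raw stress: the sum, over all pairs of points, of the squared difference between the original distance and the distance in m-space. A minimal numpy sketch, assuming the M-space distances are given as an n x n matrix `D` and a candidate layout as an n x m array `X` (both names chosen here for illustration):

```python
import numpy as np

def raw_stress(D: np.ndarray, X: np.ndarray) -> float:
    """Sum over pairs i < j of (D[i, j] - ||X[i] - X[j]||)^2.

    D: n x n symmetric matrix of target distances (from M-space).
    X: n x m coordinates of the same n points in the small m-space.
    """
    diff = X[:, None, :] - X[None, :, :]       # pairwise coordinate differences
    d_low = np.sqrt((diff ** 2).sum(axis=-1))  # n x n distances in m-space
    i, j = np.triu_indices(len(D), k=1)        # each unordered pair once
    return float(((D[i, j] - d_low[i, j]) ** 2).sum())
```

A stress of 0 means the layout reproduces every original distance exactly; MDS tries to make it as small as it can.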
 
MDS is not a dimension-reduction technique, because it stuffs the n points into m-space and then nudges them around iteratively until it finds a close match.  It never produces a function or parameters which will then take any point in M-space and map it onto a point in m-space.  That means it doesn't really produce a new set of dimensions.  (It does, but the new dimensions are defined only extensionally, not intensionally.  Consult an AI researcher for an explanation of that distinction.)
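To make "stuff the points in and nudge them around" concrete, here is a toy version: scatter the n points at random in m-space, then repeatedly move each one a small step downhill on the raw stress (plain gradient descent). This illustrates the idea only; it is not the algorithm any particular MDS library uses, and the function name, step size, and step count are arbitrary choices. Note that it returns coordinates for the n points it was given and nothing else: there is no fitted function that could place a new point.

```python
import numpy as np

def mds_by_nudging(D: np.ndarray, m: int = 2, steps: int = 2000,
                   lr: float = 0.01, seed: int = 0) -> np.ndarray:
    """Toy metric MDS: random start, then gradient descent on raw stress.

    D: n x n symmetric target distances. Returns an n x m coordinate array.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(len(D), m))              # throw the points in at random
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]      # x_i - x_j for every pair
        d = np.sqrt((diff ** 2).sum(axis=-1))     # current m-space distances
        d = np.maximum(d, 1e-12)                  # avoid dividing by zero
        coef = (d - D) / d                        # > 0: too far apart, < 0: too close
        np.fill_diagonal(coef, 0.0)               # a point exerts no pull on itself
        grad = 2 * (coef[:, :, None] * diff).sum(axis=1)
        X -= lr * grad                            # nudge every point a little
    return X
```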
 
Anyway, you don't have to know the math--I don't--because all it does is throw points into an m-dimensional space and shake them around until the distances between them look right.  I used multi-dimensional scaling to project the distance vectors used to make the heat map into two dimensions, using the R igraph library’s layout_with_mds function with default parameters.
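For what it's worth, igraph's documentation describes its MDS layout as classical (Torgerson) scaling, which computes the coordinates in one shot from an eigendecomposition of the double-centered squared-distance matrix rather than by iterative nudging. A minimal numpy sketch of that technique, not igraph's actual code; the function name is made up:

```python
import numpy as np

def classical_mds(D: np.ndarray, m: int = 2) -> np.ndarray:
    """Torgerson (classical) MDS: n x n distance matrix -> n x m coordinates."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:m]             # indices of the m largest
    scale = np.sqrt(np.clip(vals[top], 0, None)) # treat negative eigenvalues as 0
    return vecs[:, top] * scale
```

With exact Euclidean input the distances come back exactly, up to rotating or reflecting the whole layout. With non-Euclidean input, such as follow-based distances, the smaller and negative eigenvalues are dropped and the fit is only approximate.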
 
Instead of plotting points, I plotted a pie chart for each author to show what kind of stories they write.  Each author’s pie chart shows their tag usage, where each story is assigned to just one category based on its tags, using this order of priority: red, blue, pink, green, white.  (That means a story that can be labelled red, blue, or pink will be labelled red.)
 

Color   Category   Tags
Red     Sex        mature and sex
Blue    Sad        sad, dark, or tragedy
Pink    Romance    romance
Green   Fun        action, comedy, or random
White   Other      none of the above
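The priority rule amounts to "first match wins." A minimal Python sketch, assuming each story's tags are available as a list of strings; the exact tag spellings and the function names are assumptions for illustration:

```python
from collections import Counter

# Checked in the priority order from the table; the first rule that matches wins.
RULES = [
    ("Red",   lambda tags: {"mature", "sex"} <= tags),              # needs both tags
    ("Blue",  lambda tags: bool(tags & {"sad", "dark", "tragedy"})),
    ("Pink",  lambda tags: "romance" in tags),
    ("Green", lambda tags: bool(tags & {"action", "comedy", "random"})),
]

def story_color(tags) -> str:
    """Assign one story to exactly one color category."""
    tags = {t.lower() for t in tags}
    for color, matches in RULES:
        if matches(tags):
            return color
    return "White"  # none of the above


def pie_slices(stories) -> dict:
    """Fraction of one author's stories in each color; stories is a list of tag lists."""
    counts = Counter(story_color(tags) for tags in stories)
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()} if total else {}


print(story_color(["Sad", "Romance"]))           # Blue: sad outranks romance
print(story_color(["Mature", "Sex", "Comedy"]))  # Red
```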

Fig. 2: Community structure projected onto a plane. This figure includes more authors to show density.

 
Font size of each author's name is proportional to that author's number of followers.  A number after a name indicates how many stories that author had published on EQD by Oct. 2015.  A name is in italics if that author had a story in the Royal Canterlot Library (RCL) by Oct. 23, 2016.
 
This graph has 69 authors, so you may notice a few new names.  I got the additional authors by lowering the minimum number of other authors followed from 10 to 6.
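Lowering the threshold just means keeping every author who follows at least k of the other authors in the set (10 before, 6 now). A sketch assuming a hypothetical `follows` mapping from each author to the set of accounts they follow; it does a single pass and does not re-count after authors are dropped:

```python
def authors_meeting_threshold(follows: dict, min_followed: int) -> set:
    """Keep authors who follow at least min_followed *other* authors in the set.

    follows: hypothetical mapping author -> set of accounts that author follows.
    Follows of accounts outside the author set are ignored.
    """
    authors = set(follows)
    return {a for a, followed in follows.items()
            if len((followed & authors) - {a}) >= min_followed}


follows = {"A": {"B", "C", "reader1"}, "B": {"A"}, "C": {"A", "B"}}
print(sorted(authors_meeting_threshold(follows, 2)))  # ['A', 'C']
```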
 
First, look what happened to our two big blocks from the heat map:

  • The "popular authors" are in a red blob on the east side of the graph.

    • The gore subgroup is far in the east, except for ed2481 (human adventure crossovers).
  • The "character authors" are on the west side.

    • There is a colony of pink romance writers on the southwest edge of the character writers.  This suggests that romance is more about character than about sex.
  • The two main groups are separated by a wedge, coming from the north, made up of the same authors who separated the two clusters in the heat map: Daemon of Decay, meme-asaurus, OverlordFlinx, Brony2893, BronyStories, Jet Howitzer, Hoopy McGee.

 
Next, see how the types of stories are distributed:
 

  • If we pretend for a moment that anything with the mature and sex tags is Clop, then Clop is mostly in the "popular" group.
  • Sad stories are everywhere.
  • Fun stories appear to occur more outside the clop area, but that might just be because of the order of priority of assigning colors to stories.  A lot of those mature sex stories are comedies.  (I tried assigning colors to an author proportionally to the number of times she used the tag, but that made it harder to see concentrations of the tags that I think dominate a story's type, like "mature+sex" or "tragedy".)
  • There's a slice of pinkish romance authors extending from the popular cluster SSE towards bookplayer and Steel Resolve.  Romance authors cluster poorly because romance readers (and writers--it should go without saying, in fan-fiction), unlike porn readers, are often committed to particular ships.  That may be why there's less pink in the graph than you'd expect--it's probably hard for a TwiPie shipper to reach 1000 watchers.
  • The Other category, stories without any of the tags listed above, occurs more often among authors whose stuff I'm likely to read.  Here there be Slice-of-Life and tagless.

 
Where are the authors you recognize?
 
 
 
Great Sadness
 
Using this dataset, the award for "saddest or darkest author on fimfiction" is a 4-way tie between me, Rated Ponystar, Rust, and… Absolute Anonymous.  Huh.  This is particularly interesting as only half of our stories are saddish.
 
For comparison, I went through 2 anthologies of famous short stories, the ones that literary critics say are the best 20th-century short stories in the English language.  For each story that I had read, I decided which category I would assign it to (porn, sad, fun, romance, or other).  The results were:
 
Robert Penn Warren & Albert Erskine, eds., 1954. Short Story Masterpieces. Porn 0, Sad or Dark 14, Fun 4, Romance 0, Other 0.
 
The Norton Intro to Literature, 9th edition, 2005, pp. 1-786. Porn 0, Sad or Dark 14, Fun 0, Romance 1, Other 9.  The "other" stories were all tagless stories with no attitude or conclusion, either purely intellectual or angst-ridden.
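Tallying the two lists together, 28 of the 42 stories are sad or dark, two-thirds of the set. A short Python check of that arithmetic, using only the counts listed above:

```python
# Category counts copied from the two anthology tallies above.
warren_erskine = {"Porn": 0, "Sad or Dark": 14, "Fun": 4, "Romance": 0, "Other": 0}
norton = {"Porn": 0, "Sad or Dark": 14, "Fun": 0, "Romance": 1, "Other": 9}

combined = {k: warren_erskine[k] + norton[k] for k in warren_erskine}
total = sum(combined.values())  # 42 stories in all

for category, n in combined.items():
    print(f"{category:<12}{n:>3}  {n / total:.0%}")
```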
 
While there was only one romance in this set of 42 "masterpieces", there were quite a few dark, cynical stories about marriage.  The one "romance" was Chekhov's "The Lady with the Dog", a weird, plotless romance about a married man and a married woman having an affair.  The affair drags on, is sort of sad but not entirely, and then stops, with no resolution--really presenting the idea that of course there is no resolution, this kind of shit just happens and then you die.

Bad Horse · 1,028 views · #graphs #fimfiction #authors #sad
Comments (25)

As usual, I'm not even on the map. :derpytongue2:

4568528 You didn't have 1000 followers in October 2015. Also, didn't I kill you? :trixieshiftright:

Wanderer D
Moderator

It amuses me that I usually hover around the center-ish of all your charts when I'm included. :pinkiecrazy: I wonder what that means?

4568537 I think the graphs usually organize around you, Pen Stroke, shortskirts, & Aegis Shield, because you have the most followers. (Except for RainbowBob, who refracts mathematics and rationality around himself, and whose crackfics always pull him toward the popular group.)

  • The "popular authors" are in a red blob on the east side of the graph.

    • The gore subgroup is far in the east, except for ed2481 (human adventure crossovers).
  • The "character authors" are on the west side.

So--you choreographin' da rumble for BronyCon?

4568534
You tried, but I'm stab-resistant.

Also, I feel that I did have a thousand followers then, and you were probably deliberately leaving me off your charts, just so I wouldn't break them. :derpytongue2:

4568546 I should have anticipated this. :facehoof:

While I did like Chekhov's "The Lady with the Dog," that's an absolutely fantastic summary of it.

Also, only 3 on EQD in 2015? Makes sense. I've only really come into my own since then; I think the number now is 9, out of the sixteen total I've written? Much more respectable. And I just got into the RCL too, so I missed the italics by a few months at best. Rats, drats, darn and curses.

 It turned out that all it was doing was picking authors who followed a lot of other authors.

Is it, tho?

That's a genuine question. Are you conflating "has an account on fimfiction, thus rendering them capable of following and being followed" with "is an author on fimfiction?" (And, even if you are, would it even make a difference to the analyses?)

Those are different datasets. Example: myself! I've been here for years and haven't written a damn thing. I'm a pre-reader and a commentator. I don't even blog. But that, by itself, has gotten me followed by a few big names: yourself, of course, Bad Horse, but also Skywriter (who I pre-read for), horizon, and Bradel. Do I count as an author being followed by other authors? I'm not sure I should.

Another good example: Monarch Dodora. Not an author! Commentator, critiquer, and pre-reader. Small follower count, but, again, some extremely heavy hitters in there: Bad Horse, Fan of Most Everything, PresentPerfect (!), Titanium Dragon, Bradel, Skywriter. Dude has earned a lot of respect just by being, you know... smart. But does following him count as following another author?

I'm given to understand that this isn't uncommon; a lot of authors on fimfiction follow people who comment on their stories, or whom they see commenting on the stories of others, and who they decide "I like this person" or "this person writes persuasively enough in comments that I will be interested to get an alert if they ever become an author or start blogging."

Again, I don't know if this matters for the purposes of your analyses, tho.

 Pen Stroke, device heretic, & Aegis Shield

All relatively reclusive and dead, despite their output and blogging outreach. Pen Stroke and dh were both very much in their own categories (despite all the fun we could have making fun of each), but Aegis Shield is the true outlier here. His work is the perfect mix of original stories and fan service, to the point where, when he was active, he was the guy I pointed newbies to so they could get acquainted.

How does black rose raven appear in these trends? He's in the top 3 for words written, but doesn't have that many followers, and falls safely in the green category for most of his fics. I've only read a few of his fics; they're alright, but they are hella original, to the point where most of the time someone in the comments will ask if he's missing a crossover tag, but nope, there's a fully formed universe popped out of nowhere but his head.

It would be interesting to see words written as a dimension in your graphs. Do the most prolific authors follow the trends or buck them?

This turns out to be a powerful indicator, IMHO, of story quality--in my set of 131 popular authors, according to my pick of which ones I liked best, an author who followed at least 10 other people was more than 10 times as likely to be on my list of "good authors" than an author who followed less than 10 other people.

I think there are likely confounding factors involved here. If your list of good authors is based off of stories you've read and liked, then one could explain the observation simply by the fact that authors who follow more people are more involved with the community and, because of this, their stories are more visible to you. This could also be true of more objective measures of author quality like number of features in EqD (authors less inclined to be involved in the community would probably not value a feature in EqD as highly). Does this criterion hold up if you consider only authors with whose work you are familiar? Does it hold up to a more systematic sampling of the authors involved?

Regarding the MDS results, given that the authors with the most followers seem to fall in the center of the network (as expected), it could be interesting to calculate the center of mass weighted by follower number, then measure each author's distance from the center of mass. Maybe that would give some measure of how "niche" an author's stories are.
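That suggestion could be sketched like this, assuming the MDS layout is available as an n x 2 coordinate array and the follower counts as a length-n array (hypothetical inputs; the function name is made up):

```python
import numpy as np

def niche_scores(coords: np.ndarray, followers: np.ndarray) -> np.ndarray:
    """Distance of each author from the follower-weighted center of mass.

    coords: n x 2 MDS layout; followers: length-n follower counts used as weights.
    Larger values would suggest a more "niche" position on the map.
    """
    center = np.average(coords, axis=0, weights=followers)
    return np.linalg.norm(coords - center, axis=1)
```

Weighting by followers pulls the center toward the biggest names, so an author far from it is far from where most readers are, not merely far from the geometric middle of the plot.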

4568684

Are you conflating "has an account on fimfiction, thus rendering them capable of following and being followed" with "is an author on fimfiction?"

Good question, but no. This used only follows of people in the graph shown here by other people in the graph. I didn't test "number of users followed".

4568695

If your list of good authors is based off of stories you've read and liked, then one could explain the observation simply by the fact that authors who follow more people are more involved with the community and because of this, their stories are more visible to you. 

That is true in general, but the authors in this set all have comparable fame due to the 1000-watcher minimum. There were 152 users with >=1000 watchers, but only 37 with >=2000 watchers.

Regarding the MDS results, given that the authors with the most followers seem to fall in the center of the network (as expected), it could be interesting to calculate the center of mass weighted by follower number, then measure each authors' distance from the center of mass. Maybe that would give some measure of how "niche" an authors stories are.

I wonder if some publisher would pay for such a tool. Hard to collect the data, though.

4568691
Weird, I forgot BRR was on fimfiction. I ran into him or her on fanfiction.net before I found fimfiction.

Anyway. BRR has like 560 followers, so never showed up in my analyses. But I recommend BRR! Also, he/she gave me pre-reading help way back before I'd posted anything on fimfiction. Doesn't follow anyone, though. Which is probably a reason why only 560 followers.

4568731

I wonder if some publisher would pay for such a tool. Hard to collect the data, though

This is likely an analysis that a company like Amazon could do.

4568691

It would be interesting to see words written as a dimension in your graphs. Do the most prolific authors follow the trends or buck them?

What trends do you mean?

4568695

If your list of good authors is based off of stories you've read and liked, then one could explain the observation simply by the fact that authors who follow more people are more involved with the community and because of this, their stories are more visible to you. 

You could suppose that authors who are actually good are more likely to benefit from social networking, so the set of authors with >1000 watchers is enriched in authors who follow a lot and are good. The >1000-watcher authors get there either by being good writers, by writing what people want to read, or by being lucky. The number of good writers may be small enough that you wouldn't expect to find more than a few without the help of social networking.

Comment posted by Super Trampoline deleted Jun 12th, 2017

I hope I can reach 1000 followers before the fandom dies and bad horse starts using current data, so I can be included in these studies and skew the data with my army of crappy random comedies

4568555

(Maybe you'd prefer...)

'LO BAD

'LO ED

YOU GOT PIE, BAD?

I GOT BETTER THAN PIE, ED

WHAT POSSIBLY BETTER THAN PIE, BAD?

MAP TO ALL FIMFIC AUTHORS WHO GOT PIE

OOOOOOH! WHAT COLORS MEAN?

COLORS SHOW WHAT KIND OF PIE AUTHOR HAVE

THAT CUNNING!

YES. YES IT IS

i.ytimg.com/vi/vDPtDSXpYGU/mqdefault.jpg

I thought I was going to have a very exciting graph for you today, showing how a computer could objectively find authors on fimfiction who had well above-average chances of being good.

What fuels your hunch/conviction that a computer can find this? Is it something concrete, or more (at this point) intuition? I place a lot of value on intuition, so don't be afraid to say it's that.

4581615
It seemed to be doing that. I thought about it, and came up with a theory for how it might work. But eventually I found that it was just choosing authors that followed more other authors.

4581752
This is a dumb question, because I'm sure you've answered it already but I'm having difficulty grasping it (I've reread the other two blogs): what exactly does distance between authors represent here? Is it style similarity, so that perhaps you're feeding it stylometrics data, or does it mean they share similar followers? Or does it mean they follow each other? Since you said it wasn't choosing good authors so much as authors who follow many other authors, does distance just represent that, so that two close by authors follow a similar number of (same, different?) authors?

4583915
I can't explain it more simply than it's explained in this post. It isn't simple. I don't even know exactly what the distance represents; I know basically what MDS does, but I don't know how this particular MDS implementation does it. The original, high-dimensional distance is explained in the first 2 posts. It's a distance based on the probability of a user following author B given that the user follows author A, and vice-versa.
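The exact formula is given in the first two posts; the sketch below only illustrates the general idea and assumes one simple symmetrization (average the two conditional probabilities, subtract from 1), which may not match what was actually used. The function name and inputs (sets of follower IDs) are hypothetical.

```python
def follow_distance(followers_a: set, followers_b: set) -> float:
    """Hypothetical distance: 1 - mean of P(follows B | follows A) and P(follows A | follows B).

    0 means identical audiences; 1 means no follower in common.
    """
    if not followers_a or not followers_b:
        raise ValueError("each author needs at least one follower")
    both = len(followers_a & followers_b)
    p_b_given_a = both / len(followers_a)
    p_a_given_b = both / len(followers_b)
    return 1 - (p_b_given_a + p_a_given_b) / 2


print(follow_distance({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8, 9, 10}))  # 0.625
```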

4584025
Hm, do you plan to keep working away at the data to see what else it might reveal, or is this about all you can do with it?

4584147 The data, yes, but this particular graph, no.
