
Titanium Dragon


TD writes and reviews pony fanfiction, and has a serious RariJack addiction. Send help and/or ponies.


FIMFiction Rating System Explained · 8:58pm Sep 5th, 2014

So, I know a lot of people are curious about how the FIMFiction rating system works - that is to say, when you browse stories by rating, or look at the stories listed next to a story written by a given author, which stories come out on top. This is not the heat system, which is used for calculating which stories are in the popular stories box or the featured story box. The site doesn't really talk about it much, but I was led to this post by Knighty on Reddit while reading Horizon's post on the subject, which explained that he was using the Wilson score interval method. So what is the Wilson score interval method?

The short explanation is that it is a means of estimating how likely someone is to upvote a story rather than downvote it.

The Wilson score interval is a way of estimating the true proportions of events in a binomial distribution. In English, it is a way of determining, given a certain number of events with only two possible outcomes (heads or tails, upvotes or downvotes), what the true proportion of heads or tails, or upvotes or downvotes, would be - that is to say, the odds of getting one outcome or the other on any given coin flip or vote. It is a probabilistic estimate, meaning that it gives you a range of values that the true proportion could plausibly have.

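The formula being used to calculate how likely someone is to upvote a story versus downvote a story is the Wilson score interval:

\[ \frac{\hat{p} + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}} \]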

Where:

Total number of votes = n = upvotes + downvotes
Proportion of upvotes = p̂ = upvotes/(upvotes + downvotes)
z-score for a 95% confidence interval = z = 1.96

What is z? z represents the desired level of confidence in the results; in this case, 95%, meaning that 95% of the time, the true value will lie in between the two numbers given by this equation. Now, using the midpoint of this range would, inherently, not give a ranking very different from ordering by percent upvotes (p̂), so Knighty, in order to reward stories which have tighter confidence intervals, takes the lower bound and uses that for his rating - that is to say, the ± in the equation above is -. This has the effect of causing stories with a more tightly bound confidence interval to be rated higher than stories with a less tightly bound confidence interval, even if their predicted true proportion of upvotes would be identical. This makes sense if you think about it, as the system is more confident in the rating of a story with a larger absolute number of votes.
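As a minimal sketch of the calculation described above, here is the lower bound of the Wilson score interval in Python. The function name `wilson_lower_bound` and the way it handles a story with zero votes are assumptions made for illustration; the site's actual code isn't shown in this post and may differ.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote proportion."""
    n = upvotes + downvotes
    if n == 0:
        # Assumed convention for stories with no votes; not taken from the site.
        return 0.0
    p_hat = upvotes / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


# A story with 100 upvotes and no downvotes still scores well below 1.0,
# because 100 votes leave a lot of room for uncertainty.
print(round(wilson_lower_bound(100, 0), 4))
```

Taking the minus sign in the ± is what makes this a pessimistic estimate: the fewer the votes, the further the score sits below the raw upvote proportion.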

And now you know! And knowing is half the battle.

Okay, maybe you don't know. So let's give some examples! These numbers are taken from the highest rated stories on 6/10/2014 because I happen to have that data.

The highest rated story at that point, Hard Reset, had 4140 upvotes and 55 downvotes. Thus:

n = 4140 + 55 = 4195 (total number of votes the story got)
p̂ = 4140/4195 = 0.986889 (this is the proportion of the story's total votes that were upvotes)

And the equation tells us, with 97.5% confidence, that the true upvote proportion is at least .982977338.

(Why 97.5% confident and not 95% confident? Because 2.5% of the time, the value will be below the 95% confidence interval, and 2.5% of the time, the value will be above the 95% confidence interval. So when you take the lower bound of a 95% confidence interval, you will only be overestimating the true value 2.5% of the time, or 1 time in 40).
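The 95% versus 97.5% split can be checked against the standard normal distribution, which is where z = 1.96 comes from. This is only a sketch of the tail arithmetic using Python's built-in `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
z = 1.96

tail_low = std_normal.cdf(-z)        # chance of landing below the interval
tail_high = 1 - std_normal.cdf(z)    # chance of landing above the interval
inside = 1 - tail_low - tail_high    # chance of landing inside the interval

# Confidence that the true value is at least the lower bound:
# everything except the lower tail.
one_sided = inside + tail_high

print(f"inside: {inside:.3f}, each tail: {tail_low:.3f}, one-sided: {one_sided:.3f}")
```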

In layman's terms, this means that the equation believes that people who vote are at least 98.2977338% likely to upvote the story.

In other words, the equation is trying to determine how likely people are to upvote a story, and uses that to calculate its true rank order.

If you were to take what was the 8th highest rated story on the site at that point instead, The Stars Ascendant, which had 938 upvotes and 9 downvotes at the time:

n = 938+9 = 947
p̂ = .990496

You'll note that the p̂ on this story is higher than the p̂ on Hard Reset. However, if you actually plug the numbers into the equation, you end up with a value of 0.98204665. Why is this? Because it has a smaller sample size - the system is less confident about the proportion of people who would upvote this story than about the proportion who would upvote Hard Reset. Thus the system is confident that at least 98.2% of voters would upvote this story, versus at least 98.3% for Hard Reset. So this story, despite having a higher percentage of upvotes, is rated slightly lower because the system simply is not as confident in the true proportion of upvotes it would get.
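To make the comparison concrete, here is a short sketch that scores both stories with the same lower-bound calculation and sorts them. The helper name `wilson_lower_bound` is an assumption for illustration, and the results agree with the figures quoted above only to about four decimal places; small differences in the last digits are to be expected from rounding or a slightly different z.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote proportion."""
    n = upvotes + downvotes
    p_hat = upvotes / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


# Vote counts as given in this post (upvotes, downvotes).
stories = {
    "Hard Reset": (4140, 55),
    "The Stars Ascendant": (938, 9),
}

for title, (up, down) in stories.items():
    p_hat = up / (up + down)
    print(f"{title}: p_hat = {p_hat:.4f}, lower bound = {wilson_lower_bound(up, down):.4f}")

# Rank by the lower bound, highest first.
ranked = sorted(stories, key=lambda t: wilson_lower_bound(*stories[t]), reverse=True)
print(ranked)
```

The story with the higher raw proportion ends up second, because its interval is wider.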

Note that the highest rated stories are very close together - the #1 story at the time had a minimum expected upvote percentage of 98.3%. The #10 story at the time? A minimum expected upvote percentage of 98.2%. The #50 story at the time? A minimum expected upvote percentage of 98.12%. In fact, you had to go all the way down to the 102nd highest rated story to find something which had a minimum expected upvote percentage below 98%.

Titanium Dragon · 1,964 views
Comments ( 38 )

Uhm.

I do not think I know any better than I did before reading this blog post. :applejackconfused:

I didn't learn a damn thing.

I don't do math... I learned nothing.

Oh, cool. I was looking for a math problem to waste time on. (All the ones in my textbook are tiny two minute things) And I learned something.

2430439
2430430
2430422
2430420
I added more. Lemme know if it makes more sense now.

2430469 I can't tell. I understood it the first time.

2430469 Now I understand :pinkiehappy:

2430422

Try drinking more cider. You still won't learn, but you'll have more fun. :ajsmug:

The algorithm might be able to distinguish the best of the best and the worst of the worst if the input were more fine-grained. If rather than voting up or down, one could vote on a scale from -99 to +99, it would be helpful.

We switched to Up and Down from 1 to 5 (⸘what if a story deserves 0‽) because most ponies chose either 1 or 5. For that reason, a simple Down mapping to -99 and a simple Up mapping to +99 should be an option. Most would use the Ups and Downs, but those using the optional scale from -99 to +99 would help distinguish the best from the best and the worst from the worst.

2430720
Doesn't work. They used to use a more complicated system (I forget if it was 5 or 10 points) but because people on the site aren't professional raters, they pretty much always voted either the maximum or minimum amount, which is what usually happens in systems like this. eBay feedback is the same way.

That's why they went to just up or downvoting - it was more or less what people did anyway.

2430755

I acknowledged that already:

Most ponies are lazy and just vote either hate or love (¡most are even too lazy to vote at all!). The point I made is that the small percentage willing to rate would help the algorithm:

Many stories have vote ratios of 99 upvotes to 1 downvote. This is a little hard on the algorithm. I shall use an illustrative example:

¡Hate! [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] ¡Love!

What we have here is a series of RadioButtons with 1 selected. Most ponies who bother to rate at all will choose only the most extreme values out of laziness:


¡Hate! [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] ¡Love!

Or

¡Hate! [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] ¡Love!

The overall rankings will be very similar under such a system as they are now, but the system would have an easier time finding ¡The # 1 Truly Bestest Story Evar!, thanks to the minority making fine distinctions. In other words:

It is less likely that the 2nd best story will round to be equal to # 1 because of coarse ratings, and then be declared # 1 by the tie-breaking algorithm. Even if fewer than 10% of voters use the full range for voting, it could help distinguish between the best and second best.

2430803
No public rating system will ever find the best story ever.

FIMFiction's system - and all systems like it - are really more about finding the most liked, least hated story. A great but controversial story does worse than a story which is okay, good enough to upvote but nothing to get excited about and which contains nothing which would cause people to downvote it.

Adding further graduation into the system won't change that.

2430993

In simulations of Score-Voting where half of the voters score honestly and half strategically (only -99 or +99), the voting system chose the best candidate (candidate whose political views are closest to the average of all voters) more often than Approval Voting (0 or 1):

Range-Voting with mixtures of honest and strategic voters

I just understood about 1% of this. Not to sound mean or anything, it's just that I am not a math person.

I like how you show your work for calculating n and p̂, but not for the actual hard part of calculating this thing. I can't reproduce your answers.

Why the hell do we use infix notation, anyway?

2431192
The equation itself is all basic algebra; there's no upper-level math here.

If you think about flipping a coin, the odds of getting heads or tails are, under normal circumstances, about 50-50.

But in this case, we're talking about an event which can have two possible outcomes (an upvote or a downvote), but the odds of getting one outcome or the other are unknown to us. So how can we determine the odds of getting an upvote versus getting a downvote?

Well, what we can do is look at the results we have actually gotten (namely, the actual upvotes and downvotes on the story) and use math to determine the "true" odds of getting an upvote versus getting a downvote - that is to say, if we had an infinite number of people vote on the story, what percentage of them would be upvotes and what percentage would be downvotes?

That's what this equation does - it determines the possible true range of values for the odds of getting an upvote versus getting a downvote.

In the case of what Knighty is doing, he is taking the lower bound of the 95% confidence interval - which is to say, a fairly pessimistic estimate of what the "true" upvote percentage should be - and using it to rank stories.

As you can see from the equation, a higher p̂ value (the percentage of upvotes on a story) increases the estimate, while a higher n value (total number of votes) decreases the uncertainty of the value.

So if you look at the equation, as n becomes larger, all the things multiplied by 1/n or 1/n^2 become much smaller. If n was infinity, everything in the equation would go to 0 except for p̂, which would be all that was left. As the numbers are significantly less than infinity, we end up with some small deviation in values which modify our p̂ value and give us some sense of our uncertainty about it. Something with a very small n will have a very large range of possible values, while something with a very large n will have a very small range of possible values.
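To see that shrinking uncertainty in numbers, here is a small sketch that holds the upvote proportion fixed at 99% and lets the number of votes grow. The function name `wilson_interval` and the chosen vote counts are assumptions for illustration only.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Lower and upper bounds of the Wilson score interval."""
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - margin) / denom, (centre + margin) / denom


widths = []
for n in (10, 100, 1_000, 10_000, 100_000):
    lower, upper = wilson_interval(0.99, n)
    widths.append(upper - lower)
    print(f"n = {n:>6}: {lower:.4f} to {upper:.4f} (width {upper - lower:.4f})")
```

Each tenfold increase in votes narrows the range, and both ends close in on p̂ itself.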

2431260
I used Excel; I didn't do the calculations by hand because that's what calculators are for.

Alright, alright... but which variable is equal to cookies? :trixieshiftright:

Pretty much how I understood horizon's post. I didn't have the equation in front of me, but this just confirms that I knew what he was talking about.

Now, when do I get my lasers, and what color will they be?

2431628
Are you a good guy or a bad guy? :trixieshiftright:

2430593
I just woke up now with a hangover.
Goddamn you to hell!

2431635
But I already told you, I'm not a witch at all!

I thought the rating system over-penalized stories with few views.

The right way to do this is simpler. Sum up all possible true values for p, weighted by the probability of getting p̂ out of n upvotes given p. Divide by the sum of those probabilities.

Well, conceptually. p has an infinite number of possible values, so you use calculus.

The probability of getting p̂ upvotes given p is (n choose p̂) p^p̂ (1-p)^(n-p̂). Let q = p^p̂ (1-p)^(n-p̂). The probability-weighted average p is then the integral from p=0 to 1 of pq dp divided by the integral from p=0 to 1 of q dp. The (n choose p̂) gets moved outside the integral in both numerator and denominator, and cancels out. As q is just a polynomial in p, it is trivial to integrate after plugging in the values for n and p̂.

Take the top-rated story, Hard Reset, u= 4,595 (u=p̂, because I'm sick of ctrl-pasting that damn p̂), n=4656, q = p^ 4595 * (1-p)^61, pq = p^ 4596 * (1-p)^61. So the answer, according to Wolfram Alpha, is...

integrate p^4596 (1-p)^61 dp from 0 to 1: 1/351376301759676201178254046124168022180847415485010423241112403564881135683936880048171076686696914180529454972946822250832633876012183115052800

integral_0^1 p^4595 (1-p)^61 dp = 1/346699330804523791458835465003580126651604706219215952171780293427263567969809768291411392969527483377783034576140746042255643042969513438553600

346699330804523791458835465003580126651604706219215952171780293427263567969809768291411392969527483377783034576140746042255643042969513438553600 / 351376301759676201178254046124168022180847415485010423241112403564881135683936880048171076686696914180529454972946822250832633876012183115052800

Wolfram Alpha won't take an input this long. LISP says:

(float (/ 346699330804523791458835465003580126651604706219215952171780293427263567969809768291411392969527483377783034576140746042255643042969513438553600 351376301759676201178254046124168022180847415485010423241112403564881135683936880048171076686696914180529454972946822250832633876012183115052800) )

= 4596/ 4658 = 0.98668957

Now take "The Truth About Myths and Legends", 384 up, 2 down.

Wolfram Alpha:
(integrate p^385 * (1-p)^2 from 0 to 1) / (integrate p^384 * (1-p)^2 from 0 to 1)
result = 385 / 388 = .9923

You know what, this is just giving me (u+1) / (n+2). I don't know why. It still seems like the proper way to compute it. Let's take a clearer example: Say we get 10 upvotes and 0 downvotes. We don't want this to score so well.

(integrate p^11 * (1-p)^0 from 0 to 1) / (integrate p^10 * (1-p)^0 from 0 to 1)

11/12 ≈ 0.916667

I think this is actually right. (u+1) / (n+2) gives the proper rating. Weird. Numerator and denominator are very similar and some trick must simplify them.
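The pattern above is not a coincidence. For non-negative integers a and b, the integral of p^a (1-p)^b from 0 to 1 equals a! b! / (a+b+1)!, so the factorials cancel in the ratio and leave (u+1) / (n+2). Here is a sketch that checks this exactly with Python's `fractions.Fraction`, so no floating point or Wolfram Alpha input limits get in the way. The function names are made up for illustration.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of p^a * (1-p)^b over [0, 1] for non-negative integers."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def weighted_average_p(upvotes: int, downvotes: int) -> Fraction:
    """Integral of p*q divided by integral of q, with q = p^u * (1-p)^d."""
    return beta_integral(upvotes + 1, downvotes) / beta_integral(upvotes, downvotes)


# The three cases worked through above: (upvotes, downvotes).
for u, d in [(10, 0), (384, 2), (4595, 61)]:
    exact = weighted_average_p(u, d)
    shortcut = Fraction(u + 1, u + d + 2)
    print(f"u={u}, d={d}: {float(exact):.6f} (matches shortcut: {exact == shortcut})")
```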

2434505
I disagree that it underrates stories with few votes; it rates them lower because downvotes are such rare events for many stories that ending up with a high ratio by coincidence is much more likely. A number of stories have accumulated something close to 300 upvotes with 0 downvotes before they got downvotes, but they ended up accumulating downvotes and then not being at the very top. Stories with large numbers of upvotes seem to max out at somewhere north of 98% but south of 99% upvotes, and your system would penalize them relative to the stories which have very few downvotes but which are likely to have so few downvotes by chance. When you're looking at the odds of an event being only 1-3%, you need a large sample size to actually get a good gauge on just how frequently that event is taking place. Penalizing stories which have only very small sample sizes is only logical, because the odds of them actually being so highly rated are slim.

The goal of the system is to find the stories which the masses rate most highly, and having a large number of votes makes it more likely that we can ascertain the masses' true opinion on a story. It is a logical way to go about things, and also means that most of the highest rated stories on the site have been vetted by many thousands of readers. If a story has an especially good ratio, it can overcome having only a few hundred votes, but it needs to have a very good ratio indeed, and it is probably "wrong". The Collected Poems of Maud Pie, for instance, at one point were incredibly highly rated, but as they got more exposure, they have gone to merely being very highly rated; they had extremely few downvotes early on, but now that they've been viewed by a larger number of people, the true upvote:downvote ratio is becoming more apparent and it is clearly not the 99% which it boasted earlier in its career (though it is still north of 98% at the moment, and I think it will likely have a very high ratio in the long run).

I don't think that the way the system works is unreasonable; given that stories with lots and lots of votes trend towards an upvote:downvote ratio south of 99%, and indeed not a single one has managed to sustain one in the long run over 1k upvotes, we should be extremely suspicious of stories which have a small number of votes but a higher upvote:downvote ratio.
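A rough way to put numbers on "by chance": suppose, purely as an assumption, that a story's true upvote rate is 98.5% (somewhere between the 98% and 99% mentioned above) and that votes are independent. The chance of seeing no downvotes at all in the first batch of votes is then just that rate raised to the number of votes.

```python
def prob_no_downvotes(true_upvote_rate: float, votes: int) -> float:
    """Chance that `votes` independent votes are all upvotes."""
    return true_upvote_rate ** votes


# Hypothetical "true" upvote rate, chosen only for illustration.
ASSUMED_RATE = 0.985

for votes in (50, 100, 300):
    chance = prob_no_downvotes(ASSUMED_RATE, votes)
    print(f"{votes} votes with no downvotes: {chance:.1%}")
```

Under that assumption, a spotless record after 100 votes happens about one time in five, and even a spotless 300 happens around one time in a hundred, which adds up across the many stories on the site.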

2435374

I disagree that it underrates stories with few votes; it rates them lower because downvotes are such rare events for many stories that ending up with a high ratio by coincidence is much more likely.

Certainly we want to rate them lower than simply u / (u+d). But the solution I provided gives the exact probability, assuming a flat prior, that the next person to rate the story will upvote it, and it takes that into account. A mathematician friend of mine informed me that the formula I proposed is called Laplace's Rule of Succession, and the proof that it reduces to (u+1) / (u+d+2) is given on its Wikipedia page.

If you use that rule, you will, as you say, get a list of top stories in which more stories with few votes occur by chance, than stories with many votes. But this effect would be small; a story with 100 upvotes and no downvotes, or 300 upvotes and 1 downvote, would barely make the first page. Whereas the current ranking has a much stronger effect in the opposite direction: Stories with few votes occur on the list much less often than they should.

Using the Wilson score interval method is reasonable, but whatever confidence level knighty is using is too high, as shown by the dramatic underrepresentation of stories with few votes. Given that there are many more stories with few votes than many votes, we'd like to see at least as many stories with few votes on that list. (Besides, one point of the list is to help people find stories they haven't heard of.)
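For anyone who wants to see how differently the two scores treat small vote counts, here is a sketch that ranks the hypothetical stories mentioned above against Hard Reset under both rules. It uses the vote counts given in the blog post for Hard Reset (4140 up, 55 down) and z = 1.96; the function names are illustrative, and this compares the formulas only, not which one is preferable.

```python
import math


def wilson_lower_bound(up: int, down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote proportion."""
    n = up + down
    p_hat = up / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def laplace_score(up: int, down: int) -> float:
    """Rule of succession: (u + 1) / (u + d + 2)."""
    return (up + 1) / (up + down + 2)


examples = {
    "100 up, 0 down": (100, 0),
    "300 up, 1 down": (300, 1),
    "Hard Reset (4140 up, 55 down)": (4140, 55),
}

for label, (up, down) in examples.items():
    print(f"{label}: Laplace {laplace_score(up, down):.4f}, "
          f"Wilson lower bound {wilson_lower_bound(up, down):.4f}")

by_laplace = sorted(examples, key=lambda k: laplace_score(*examples[k]), reverse=True)
by_wilson = sorted(examples, key=lambda k: wilson_lower_bound(*examples[k]), reverse=True)
print("Laplace order:", by_laplace)
print("Wilson order: ", by_wilson)
```

The two orderings come out reversed at the ends: the rule of succession puts both small-sample stories ahead of Hard Reset, while the Wilson lower bound puts both behind it.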

2438565
He uses a 95% confidence interval, as I noted.

I think the point of the system is to find the best stories on the site; really, any public rating system is going to be poor at finding hidden gems (and not very good at finding the best stories either), and, as I noted, the fact that there are stories with under 1000 upvotes with better than a 100:1 ratio but no stories with over 1000 upvotes with such suggests that the stories which have a better ratio which have less than 1000 upvotes have such because of chance, not because they are, actually, better - and indeed, my own very highest rated stories both had better than 100:1 upvote:downvote ratios which dropped as their views increased, and I think that is the case with many of the very highest rated stories.

The most viewed and highest rated stories are not all that similar, which is a good thing, and I find that the highest rated stories on the site are generally at least passable, whereas the most viewed stories range from terrible to fine.

I think it is rational for the system to function in the manner that it does; the fact that the vast majority of stories on the site have low ratings really doesn't mean that the system should smile upon such stories, because the vast, vast majority of stories which get very few upvotes are bad. The fact that there are indeed good ones mixed in with the bad ones does not necessarily mean that the system is broken; it means that the system is not in fact solely based on goodness, and is also significantly influenced by popularity. But that's inevitable with any public rating system, and in any event, the general exclusion of clop from the highest rated stories on the site suggests that the system does indeed filter out popular drek to at least some degree. It is true that popular stories which aren't particularly great but which are not very likely to be downvoted (A Hell of A Time or How To Preen Your Chicken) end up overrated in the present system, but that is likely under any system where downvotes count - and in a system where there are no downvotes, the highest rated list would very closely resemble the most viewed list.

If we worked to better promote good, underrated stories, it might help make the system work better. Seattle's Angels is devoted to this, and Equestria Daily, The Royal Guard, and even Twilight's Library can help. But part of the reason some good stories are so little viewed is who wrote them, and part of it is poor advertising - I think that having something which grabs people makes a very big difference. It is all about hitting popular stories and the feature box - stuff which hits the feature box in particular is probably much better sorted in terms of quality than stuff which doesn't, because feature box exposure greatly increases the likelihood of a story being seen by someone who isn't automatically going to upvote it.

2430755 I get the simple up/down system, but I really wish there was a "neutral" vote. "I read this, and think it deserves neither accolades nor scorn: my opinion has no weight in this system." Is it plausible to think they might add this option, or would it screw up their statistics too much?

2565998
You can simply not vote on something, which is what I do in cases like that. In fact, I don't vote on things roughly as often as I give upvotes and downvotes, I think.

It obviously doesn't alter the rating... but, well, should it?


2566544 To someone looking at the voting record for hints as to the opinions of others, we might as well not have an opinion on those stories. I don't like that.

2567157
Well, you can always look at the vote:views ratio; a story which has very few votes relative to its number of views is probably not very good.

2567262

The votes-to-views ratio is quite informative, and it's unfortunate that metric isn't part of a rating system here. When it comes to a reader's reaction to a story, there are currently four they can pick from: upvote, downvote, in-between vote, and no vote at all. Unfortunately, the last two are confounded together. This somewhat blurs the use of the votes-to-views metric, but I still think it would add further refinement to the ratings system.

One thing I've been thinking of is the possibility of adding multiple rating algorithms to the site. Just as we can select from multiple display formats now, multiple algorithms could be selected from to view the top stories with different eyes. This means more stories could be given top exposure, depending on what method is selected by each reader. And readers that find one method that best fits their own opinions of stories would have a more fine-tuned tool to discover more.

2626296
Views can't be used in this manner because there are a lot of people who read stories who don't have accounts; this is especially true of stories which get linked externally, such as on reddit, Equestria Daily, TV Tropes, etc. Views don't really tell you anything about the story, because you can't tell why they didn't vote, and punishing people for having non-members read their stories (or alternatively, rewarding people for reading their stories and then clicking away without voting) is problematic.

Moreover, as was noted by Knighty when he created the up/downvote system, most people didn't use the rating scale, and just pegged it at one end or the other. Thus, adding more options isn't really helpful when people don't use them in a consistent manner.

A system which would be helpful would be something along the lines of Amazon's "people who bought this" thing, where "people who read this story also read" blah, blah, and blah. Or "people who favorited this story also favorited" blah, blah, and blah. Or, using the user recommended stories box, "people who recommended this story also recommended" blah, blah, and blah, though the last one is tough because most people don't really use those, and most stories aren't on anyone's recommended stories.

I've actually implemented something like this on a few of my stories, where at the end I recommended some other stories I've written which I felt readers would enjoy.
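As a rough sketch of the "people who favorited this story also favorited" idea, here is one simple way to count co-favorites. Everything in it is hypothetical: the user names, story titles, and the `also_favorited` function are made up for illustration, and nothing here reflects how the site stores or exposes its data.

```python
from collections import Counter


def also_favorited(favorites: dict[str, set[str]], story: str, top_n: int = 3) -> list[str]:
    """Stories most often favorited by users who also favorited `story`."""
    counts = Counter()
    for user_favorites in favorites.values():
        if story in user_favorites:
            # Count every other story this user favorited.
            counts.update(user_favorites - {story})
    return [title for title, _ in counts.most_common(top_n)]


# Hypothetical favorites lists, keyed by made-up user names.
favorites = {
    "user_a": {"Story One", "Story Two", "Story Three"},
    "user_b": {"Story One", "Story Two"},
    "user_c": {"Story Two", "Story Four"},
}

print(also_favorited(favorites, "Story One"))
```

A real version would probably need to discount stories that nearly everyone has favorited, since those would otherwise show up next to everything.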

2626648

I'm pretty sure user accounts that view stories have that information recorded, if for no other reason than to keep the system from counting multiple reloads by one user as multiple views. So views from offsite could be separated from the views of people that can, at least in theory, vote.

But yes, as I said, even just using registered users' views, you can't separate the non-voters from the null-voters, so it's mostly a wash. Would still be interesting to see how that would adjust rankings. Not that I'm advocating for it, just sayin'... :)

All I want to know is how to undo a downvote without having to upvote. I did a bad when I was in a bad mood and an author and their story pissed me off. I undid most of the downvotes, but those were all on the comments, not the stories. I realize what I did was bad, and I just want to fix it.

4224111
The only way to unvote on a story is to message a mod and ask them to remove your vote on the story. You can switch a downvote to an upvote, but not manually unvote on a story for whatever reason.
