
chukker




Analyzing Fimfiction data with Tableau and postgres, part 1 · 4:33am Sep 3rd, 2016

I've started playing with Tableau, and figured that the Fimfarchive data would be a good place to start. Here is a teaser graphic:

https://public.tableau.com/profile/publish/fimfarchive/FimfarchiveAnalysis1

What are some initial observations?

My Little Dashie is the most liked and most viewed story on the site, although the view count is slightly skewed since it is only one chapter. By other metrics Past Sins has more views, but I don't have an easy way to get at that data (see Notes below).

There are 69 stories that are longer than War and Peace, which is about 575k words, although this will vary by translation. The longest story is The Chase, with 2,333,348 words.

There are 38,946 stories where Fimfiction does not report likes/dislikes because they have not received enough of them yet. I will likely exclude these from future analyses.
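As a rough sketch of how counts like these could be computed, here is a small Python example. It assumes each story is a dictionary with hypothetical keys `title`, `words`, `likes`, and `dislikes`, where unreported likes/dislikes are stored as `None`; the real Fimfarchive field names may differ, and the sample rows are made up.

```python
# Approximate word count for War and Peace, as quoted in the post.
WAR_AND_PEACE_WORDS = 575_000


def summarize(stories):
    """Count very long stories and stories without reported likes/dislikes.

    The keys "title", "words", "likes" and "dislikes" are assumptions
    made for illustration, not the actual Fimfarchive schema.
    """
    longer = [s for s in stories if s["words"] > WAR_AND_PEACE_WORDS]
    unrated = [
        s for s in stories
        if s["likes"] is None or s["dislikes"] is None
    ]
    longest = max(stories, key=lambda s: s["words"])
    return {
        "longer_than_war_and_peace": len(longer),
        "unrated": len(unrated),
        "longest_title": longest["title"],
    }


# Made-up sample data, not real archive contents.
sample_stories = [
    {"title": "Story A", "words": 600_000, "likes": 120, "dislikes": 4},
    {"title": "Story B", "words": 1_500, "likes": None, "dislikes": None},
    {"title": "Story C", "words": 80_000, "likes": 10, "dislikes": 1},
]
summary = summarize(sample_stories)
```

Filtering on `None` here mirrors the plan to exclude unrated stories from later analyses.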

I'm planning to continue updating this.

Notes:

All data comes from Fimfarchive, which is based on the Fimfiction API. In particular this means any views are based on the story view number, and not the mouse-over "total views."

I'm happy to share the code and/or schema I used to get the data into postgres. It is a rather nicely normalized schema, if I do say so myself. That might be an easier format to query than the JSON that Fimfarchive provides.

Comments ( 4 )

Awesome! Thanks for taking the time to do this!

Tableau seems to be getting pretty popular recently.

Are you using the Tableau free trial? What are its terms? Its website doesn't say. What do you think about Tableau so far?

4190834

Tableau is a very powerful tool and I'm probably not using it to its fullest extent. I'm using the 14-day free trial, but have used Tableau Public before, which has two limitations: 1) it can only read from CSV, XLS, and a few other file types (i.e. it cannot connect to a database), and 2) you can only save to the Tableau Public server, which means anyone can view your results. Making things public is fine, but it's a pain exporting from a database to CSV and then importing it into Tableau - I've had problems with types (booleans in particular are a pain).
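One possible way to make the CSV step less painful, sketched in Python: normalize booleans to explicit lowercase `true`/`false` strings and missing values to empty cells before writing. This assumes the rows have already been fetched from the database as dictionaries; the column names are invented for the example, and whether Tableau then infers the column type as hoped is an assumption, not something checked here.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize rows to CSV text with consistent boolean and NULL handling."""

    def normalize(value):
        # Check bool first: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if value is None:
            return ""  # write NULLs as empty cells
        return value

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({name: normalize(row.get(name)) for name in fieldnames})
    return buffer.getvalue()


# Hypothetical rows, as a database driver might return them.
sample_rows = [
    {"title": "Story A", "completed": True, "likes": 12},
    {"title": "Story B", "completed": False, "likes": None},
]
csv_text = rows_to_csv(sample_rows, ["title", "completed", "likes"])
```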

It's possible to do most of the work in Tableau (derived columns like rating = (likes-dislikes)/(likes+dislikes) and even cross-database joins), but I prefer to do that as a database view and just use Tableau for the visualization.
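A minimal sketch of the view approach, using Python's built-in `sqlite3` as a stand-in for postgres so it runs anywhere. The `stories` table, its columns, and the `story_ratings` view name are hypothetical, not the actual schema; the rating is the like/dislike difference divided by the total number of votes (likes plus dislikes).

```python
import sqlite3


def build_rating_view(rows):
    """Create an in-memory table and a view exposing a derived rating column.

    Table, column, and view names are assumptions for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stories (title TEXT, likes INTEGER, dislikes INTEGER)"
    )
    conn.executemany("INSERT INTO stories VALUES (?, ?, ?)", rows)
    # NULLIF turns a zero vote total into NULL, so the division yields NULL
    # instead of failing or producing a misleading number.
    conn.execute(
        """
        CREATE VIEW story_ratings AS
        SELECT title,
               likes,
               dislikes,
               CAST(likes - dislikes AS REAL)
                   / NULLIF(likes + dislikes, 0) AS rating
        FROM stories
        """
    )
    return conn


# Made-up rows: (title, likes, dislikes).
conn = build_rating_view([("A", 90, 10), ("B", 5, 5), ("C", 0, 0)])
ratings = dict(conn.execute("SELECT title, rating FROM story_ratings"))
```

With the derived column living in the view, the visualization tool only needs to read `story_ratings` and does not have to repeat the calculation.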

We've used it at work to good effect - it makes it very easy to drill down into various subsets of customers/accounts/etc. For this example I think the coolest thing is how it shows the outliers - My Little Dashie and Thunder Struck. I'm still curious how Thunder Struck has gotten that many views without a corresponding number of likes.

The thing I don't like about it is that it's difficult to set up a repeatable analysis without a lot of clicking around in the UI, but that's what you get for it being a graphical tool. I should probably spend the time to learn how to use R, matplotlib, or gnuplot better and see how those work. In fact, writing a wrapper around those to approximate some of the Tableau functionality could be a fun project.

The various terms are here:

http://www.tableau.com/legal

4191859 I can tell you not to bother trying to plot in Perl. Most of the plots I've done on my blog over the past 4 years were in Perl, but all the Perl plotting libraries have some severe limitations and almost no documentation. Matlab is far, far better. R probably is too, but I haven't used it much because of its wacky syntax.
