
publiq


"To the everlasting glory of those few mares blessed and sanctified in the curses and execrations of those many whose praise is eternal damnation." I'll typeset your story in ConTeXt if you ask.


Jul
20th
2022

Research proposal, free to a good home: The Structure of Derpibooru [plus Publiq Publishing Plans] · 3:22am Jul 20th, 2022

The Structure of Derpibooru?

At Bad Horse's excellent talk on The Structure of Fimfiction at Trot Con last Friday evening, one of the audience questions was "why not do the same analysis but for Derpibooru?"

Let's start with

Literature Review

Or, put another way, a comparison of the relevant site features of Fimfiction & Derpi and how they would affect the social structure of the two sites.

  • The most immediate structural difference is that Derpi has dedicated forums, while Fimfiction decidedly does not. As some of you may have noticed, there are some Derpi users who seem to be exclusive forum-dwellers. However, we can probably treat Derpi forums like Fimfiction groups: they exist, but they're not why users are here.
  • Fimfiction has a much more restricted set of tags than Derpi, which has over 200,000 tags that have been used more than twice. More on tagging later.
  • Derpibooru, unlike Fimfiction, is quite open with its public-facing data. Not only do they have an API, they also encourage you to download a nightly database dump for extended analysis sessions. No futzing with scraping to gather data over there.
  • Both sites do have a similar phenomenon where users exist in distinct interest bubbles to the point where you can be top 10% by user activity yet still have never heard of someone else who is ten times as addicted as you are simply because your paths never cross.
  • Finally, and importantly, while Fimfiction users follow one another, Derpibooru limits subscription options to following tags or watching a thread (either in the forum or the comments of an image).

Why put it out as a research proposal instead of doing it yourself

  1. I wouldn't know where to start. Gather basic statistics and make a graph of upvotes compared to rank of image popularity? Sure. Find some way to do a PCA of the most important image tags without committing statistical malfeasance? That's left to someone else [or at least for someone to show me proper methodology so I can ape it].
  2. The DB dumps take up 32 GB uncompressed. Not a showstopper, but that gives me pause.

Various questions to explore [with annotations + suggested methodologies/joins]

  • Log-log graph on the popularity of tags. There are about 201,550 tags used more than twice; 100,700 used ≥10x; 20,700 used ≥100x; 3,457 used ≥1000x; and 513 used more than 10,000 times.
  • Benford's law analysis on tag, favorite, and vote totals.
  • Tag-based interest clustering. I'm not entirely sure how the methodology would work, other than it would essentially use users' favorite images (or commented-on images) almost as a pivot table between tags and users. There is no direct way to measure which tags a user follows without access to the actual database.
  • Rerun some of the above analyses with user comment data in place of user favorite data. Why? Users who comment can be assumed to be more passionate than drive-by favoritors and voters. However, I would advise only counting unique commenters under each image, as comment counts can become inflated by two users having a back-and-forth discussion. Ten comments by me under an image should count the same as a single comment left by me.
  • Various stats on most-favorited/voted images, distribution of votes/faves/comments per user, etc…
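
As a rough illustration of the "pivot table between tags and users" idea, here is a minimal Python sketch. It assumes you have already extracted (user, image) pairs and a mapping from image to tags from the dump; the function names, variable names, and sample data are all hypothetical and do not reflect the dump's actual schema. Repeated comments by the same user under the same image collapse to one, as advised above, and cosine similarity is only one possible way to compare two users' tag profiles.

```python
from collections import defaultdict
from math import sqrt


def build_user_tag_counts(interactions, image_tags):
    """Pivot (user, image) pairs into per-user tag counts.

    interactions: iterable of (user_id, image_id) pairs; repeats allowed.
    image_tags:   dict mapping image_id -> set of tag names.
    Both structures are hypothetical stand-ins for data pulled from the dump.
    """
    # Deduplicate so ten comments by one user on one image count once.
    unique_pairs = set(interactions)
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, image_id in unique_pairs:
        for tag in image_tags.get(image_id, ()):
            counts[user_id][tag] += 1
    return {user: dict(tags) for user, tags in counts.items()}


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two tag-count dicts (0.0 if either is empty)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[t] * profile_b[t] for t in shared)
    norm_a = sqrt(sum(v * v for v in profile_a.values()))
    norm_b = sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample data: alice comments twice under image 1.
comments = [("alice", 1), ("alice", 1), ("alice", 2), ("bob", 2), ("bob", 3)]
image_tags = {1: {"comic", "safe"}, 2: {"comic"}, 3: {"screencap"}}

profiles = build_user_tag_counts(comments, image_tags)
similarity = cosine_similarity(profiles["alice"], profiles["bob"])
```

The resulting per-user tag vectors could then be handed to whatever clustering method someone more statistically qualified recommends. Swapping the comment pairs for favorite pairs reruns the same analysis on favorites.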

Hypotheses

  • There's a power law at play for tag popularity. It may actually be a Zipf distribution.
  • Fave counts and vote counts are much flatter: in spite of there being well over two million images on Derpibooru, the most-faved image is below 5,000 and the highest vote count is around 6,500. Interestingly:

    1. The image rankings stayed mostly static when switching between most updoots, highest score, and most favorites
    2. The top 50 [at least with my usual filter on] were dominated by animations and clop.
    3. Sorting by top Wilson Score gave me an entirely new field of top 50 images. Still clop-heavy, but decidedly not dominated by animations.
  • Interest clusters exist. However,

    • I cannot say what they are until some analysis is done. There are obvious possibilities: FoE, EqG, anthro clop, show-style porn, comics, screencaps. The analysis will reveal the less-obvious clusters as well as whether some of these obvious possibilities ought to be merged.
    • Users that exclusively belong to a single cluster are rare to nonexistent. That said,
    • There are some clusters that do have de facto mutual exclusivity.
  • User stats (in terms of the distribution of faves/votes/comments per user) are fairly similar to Fimfiction.
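
For the tag-popularity hypothesis, one quick first check is the slope of log(count) against log(rank): under an ideal Zipf distribution it sits near -1. The Python sketch below is only a rough check under that assumption; the function name and the synthetic counts are made up for illustration, and real counts would come from the tag table in the dump. A least-squares fit on a log-log plot is not a rigorous test for a power law, so treat the number as a sanity check rather than a verdict.

```python
from math import log


def loglog_slope(counts):
    """Least-squares slope of log(count) versus log(rank).

    counts: iterable of positive usage counts, in any order.
    Ranks are assigned after sorting from most to least used.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two positive counts")
    xs = [log(rank) for rank in range(1, n + 1)]
    ys = [log(count) for count in ranked]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, exactly Zipfian counts (count proportional to 1/rank).
synthetic_counts = [10000 / rank for rank in range(1, 501)]
slope = loglog_slope(synthetic_counts)
```

On real tag counts, a slope far from -1, or visible curvature in the log-log plot, would suggest some other heavy-tailed distribution instead.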

Publiq Publishing Plans

Since I'm writing, I may as well give some heads-up on future projects.

  1. Over the next week or three, I am going to trawl through my longposts in /r/mylittlepony and reformat them into BBCode for somewhat daily posts to my Fimfiction blog. Lots of interesting worldbuilding details collected in discussions between 2017–2021 (the subreddit has been relatively bereft of discussion in 2022, sadly).
  2. There's a good chance I'll have an entry for the Twilight Files contest. It could double as a teaser preview for a megaproject I may never write.
  3. By the end of August, I'll have written at least one story for the G5 contest. With motivation, I'll also have written two clop fics to accompany it.
  4. Speaking of August, if any of you want to say hi at Gen Con, feel free to PM me.
  5. In the fall, I want to expand and clean up some WIP stories. Some hints as to what lies ahead:

    • ~1500 words of SFW TwiLuna fluff that I expect to have expanded to 3000–4500 words by the time I've cleaned it up and added the scenes cut due to time constraints.
    • A gay Anon+Kirin story, published as three sequels. Why not a single story of three chapters? The themes and focus of each third are distinct enough that the transitions between them may be jarring. Working title for the series: Anon's Gay Adventure With A Green Kirin

      1. (currently unwritten) Anon's experience in the Acclimation Center and preparations for entering Equestria. Somewhere between E and T rated.
      2. Anon (or maybe his name is John: "Anon" and "John" sound identical when spoken by a kirin) is unknowingly hot for a male kirin but rolls with it anyway. Spicier T.
      3. Picking up immediately after the end of part 2, some wholesome gay steam between John and Forest Fall.
    • Adding one final chapter to last year's Jinglemas story.
    • The guy I wrote my Summer Sin story for liked it enough to say he'd enjoy another chapter. May or may not get around to this one.
  6. I probably won't do NaNo, or at least will take a pass on writing a single novel-length story [even if I use it as inspiration for total word count for resurrecting various projects from my slush pile]. The aforementioned megaproject does have multiple novella-length stories embedded in it; however, it is incredibly intertwined to the point that I need to plot out the entire project before I start writing in earnest. Word to the wise: don't have more than a single time-traveling cohort in a time travel story. Any more and "before" and "after" lose their intuitive meanings.
  7. I will need to take a pass on Jinglemas 2022. I do not expect to have writing time between December and March.
  8. Speaking of the megaproject once more, if I do get it off the ground, "real" chapters would appear in late 2023 at the earliest; realistically, sometime in 2024. Why? I'd like to develop software tooling to plot stories with time travelers. The existing market of story planning tools is meant for standard linear stories.

Stories frozen in the slush pile [don't count on ever seeing these, even by 2025]

  • Rarity Does Canterlot. Rather than directly connect it to my old clop, it'll instead be a continuation of another story in this pile.
  • Several quick cloppy troll/crack stories
  • A pair of Spitfire stories, working title of Spitfire in LaTeX. May want to split it into thirds. First, it's Spitfire stranded on a cloud over an empty ocean: sad-tagged. Second, she finds salvation in a crack encounter with a KC-46 that refuels her in a cloppy manner. Third (and bridging the gap between the previous two and Rarity Does Canterlot), getting stranded over open ocean and hit with a natural storm is enough to give any pegasus shell shock. Rarity makes an emergency trip to Canterlot to help out Spitfire and has a flashback to the events of Sonic Rainboom. Another [sad]-tagged feels fic to close the series. Ends on a hopeful note with Rarity leaving Spitfire's Canterlot penthouse to clear her mind with the pleasures of the Sunny City.
  • A MarbleMac clop that deliberately has no dialogue
  • A SFW Fluttershy+Vinyl Scratch story where the two leads do not have dialogue (but Fluttershy does talk her feelings out with her other friends + Angel)
  • A tribute to The 50-Year Sword
  • A silly horror about Pinkie getting trapped in an infinitely-large bathroom
  • I originally had like eight story stubs for Poképorns, but then I realized that the premise was the only worthwhile part of about five of them and collapsed them into diegetic texts of the surviving two. One features a human female and a male Vaporeon. The other is 100% gay. It features a male human, male feral Charizard, and male anthro Sprigatito. Possible guest appearances by anthro Torracat, feral Lucario, and feral Luxray [all male].

If you really want to adopt one of these ideas, I probably won't say no once you've already written it. The Fluttershy+Vinyl Scratch story and the Rarity+Spitfire section of Spitfire's trilogy may be good candidates for Q2 2023 publication. Who knows when/if the rest will be touched.

Comments ( 1 )

I just began downloading the Derpibooru db dump. Doubtful whether I'll ever get around to analyzing it... but if it's simple, and if Postgresql is simple, maybe.
