
Fimfarchive


Third-party archival project; do not contact Fimfiction staff for support.



Fimfarchive 20180601 released! · 7:33pm Jun 1st, 2018

Twentieth release with 166411 stories.


The complete archive.
Name: fimfarchive-20180601.zip
Size: 5.4 GB
Torrent: Official (SE)
Direct: Official (SE), Mirror (AU)
Magnet: magnet:?xt=urn:btih:466a629327f8aa4c122bf5be1de02649f249e492
MD5: 7c593fa4b7f5130f012fb59f65aa8b12
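If you want to check a download against the MD5 sums listed here, here's a minimal Python sketch using only the standard library. It reads the file in chunks, so a 5.4 GB archive never has to fit in memory. The function names are made up for illustration; md5sum or any similar tool does the same job.

```python
import hashlib

# Expected checksum for fimfarchive-20180601.zip, copied from the release notes.
EXPECTED_MD5 = "7c593fa4b7f5130f012fb59f65aa8b12"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches `expected` (compared case-insensitively)."""
    return md5_of_file(path) == expected.lower()
```

Calling verify("fimfarchive-20180601.zip") should return True for an intact download.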

The xdelta3 patch.
Name: fimfarchive-20180307-to-20180601.xdelta3
Size: 0.4 GB
Torrent: Official (SE)
Direct: Official (SE), Mirror (AU)
Magnet: magnet:?xt=urn:btih:db84a60f047ba739354b3a9a88e4b5e1742849c7
MD5: ae21524e436d0760f8842e29c66d1422
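The patch is applied with the xdelta3 command-line tool on top of the previous release. A small Python wrapper, assuming xdelta3 is installed and on your PATH and that you still have an unmodified fimfarchive-20180307.zip; the helper names are hypothetical:

```python
import subprocess


def xdelta3_decode_command(source, patch, output):
    """Build the xdelta3 command line that applies `patch` to `source`.

    -d selects decode (apply) mode, -s names the file the patch was made
    against, and the last argument is the file to write.
    """
    return ["xdelta3", "-d", "-s", str(source), str(patch), str(output)]


def apply_patch(source, patch, output):
    """Run xdelta3; raises CalledProcessError if it exits with an error."""
    subprocess.run(xdelta3_decode_command(source, patch, output), check=True)
```

Something like apply_patch("fimfarchive-20180307.zip", "fimfarchive-20180307-to-20180601.xdelta3", "fimfarchive-20180601.zip") should rebuild the new archive, which you can then check against the MD5 given for the complete archive.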


This is not an official Fimfiction project!

Do not contact Fimfiction staff for support regarding this archive.

Post questions as a comment or private message to this account instead!


Yet another regular archive update!

Comments (15)

πŸ‘πŸ‘πŸ‘thank you!

:heart: Thanks for keeping these up! Seeding the torrents as usual :pinkiehappy:

Direct download links for people in AU/NZ (via browser or FTP):

ftp://spookelton.net/FimArchive/Files/fimfarchive-20180307-to-20180601.xdelta3

ftp://spookelton.net/FimArchive/Files/fimfarchive-20180601.zip

4874293
4874337

You're welcome!

4874445

Awesome! Hope you like it!

4874575

Thanks a ton! I appreciate the redundancy!

I'm curious how you're handling the changes in the story download format. Downloaded stories have different markup and also now include author's notes, which were excluded before. I'm not sure exactly when this happened, but mostly I'm interested to know whether stories which haven't otherwise changed since the format change have been updated.

Thank you :pinkiehappy:

4876368

I've been trying to keep stories as consistent with each other as possible across all releases. When there were styling changes in the past I replaced the style sheets in previously downloaded stories. I've also had to switch between story downloaders (see this, this, and this) in order not to lose text formatting.
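Swapping the stylesheet in an already downloaded story means rewriting the EPUB, which is just a ZIP file. A rough Python sketch of how that could be done with the standard library; the entry name you pass in (for example "style.css") is a placeholder, and this isn't necessarily how the archive's own scripts do it:

```python
import io
import zipfile


def replace_entry(epub_bytes, name, new_data):
    """Return a copy of an EPUB (a ZIP archive) with one entry's contents replaced.

    The standard library can't modify a ZIP in place, so every entry is copied
    into a new archive. Reusing each entry's ZipInfo keeps the original order
    and compression, which matters because EPUB requires the `mimetype` entry
    to come first and be stored uncompressed.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as source:
        if name not in source.namelist():
            raise KeyError(name)
        with zipfile.ZipFile(buffer, "w") as target:
            for info in source.infolist():
                data = new_data if info.filename == name else source.read(info)
                target.writestr(info, data)
    return buffer.getvalue()
```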

Since the tenth release I've been passing stories through Calibre, which has helped with keeping things consistent. There may be differences between Calibre versions though, and I only pass each story file through it once. This is to minimize the chance of stories being affected by different sets of bugs from different Calibre versions. Every now and then I perform a complete refetch of the archive just in case I've missed something.
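The "only once" rule could be enforced by remembering which raw downloads have already been converted. A sketch, assuming Calibre's ebook-convert tool is on the PATH and that a SHA-256 digest of the downloaded file is a good enough identity for a story file; the registry set and function names are hypothetical:

```python
import hashlib
import subprocess


def digest_of(data):
    """SHA-256 hex digest used as the identity of a raw download."""
    return hashlib.sha256(data).hexdigest()


def needs_conversion(epub_bytes, converted):
    """True if these exact bytes have not been passed through Calibre yet.

    `converted` holds digests of raw downloads that were already converted,
    so an unchanged story is never re-run through a newer Calibre version,
    while a story whose download changed gets converted once.
    """
    return digest_of(epub_bytes) not in converted


def convert_once(source, target, converted):
    """Run ebook-convert on `source` unless it was converted before.

    Returns True if Calibre was run, False if the file was skipped.
    """
    with open(source, "rb") as handle:
        raw = handle.read()
    if not needs_conversion(raw, converted):
        return False  # keep the output from the earlier conversion
    subprocess.run(["ebook-convert", str(source), str(target)], check=True)
    converted.add(digest_of(raw))
    return True
```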

Author's notes have been part of the archive since 20171203 when I switched to APIv2. Story data is currently taken from the content_html and authors_note_html fields of the chapter resource. Those are then stuffed into EPUB files and passed through Calibre to make sure they're as compatible with e-readers as possible.
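For anyone curious how the two APIv2 fields might end up in a chapter, here's an illustrative Python sketch that wraps them in a minimal XHTML document. Placing the author's note after the chapter text, and the authors-note class name, are assumptions for this example, not necessarily what the archive does. The HTML from the API is inserted as-is; only the title is escaped.

```python
from html import escape

CHAPTER_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body>
<h1>{title}</h1>
{content}
{note}
</body>
</html>
"""


def chapter_xhtml(title, content_html, authors_note_html=None):
    """Build one chapter file from content_html and an optional authors_note_html."""
    note = ""
    if authors_note_html:
        # Hypothetical placement: the note goes after the chapter text.
        note = f'<div class="authors-note">{authors_note_html}</div>'
    return CHAPTER_TEMPLATE.format(title=escape(title), content=content_html, note=note)
```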

The source code for all this will be available in the Git repository once I've cleaned it up a bit. I can slap it together in a ZIP file if anyone is in a hurry, though. :rainbowwild:

4877711

You're welcome! :twilightsmile:

4876368

Oh, and if you want to know which stories haven't been updated since the APIv2 change, you can look at the index. This requires some programming experience, though. The example below uses Fimfarchive as a library to check when "The Greatest Equine Who has Ever Lived!" was last updated.

>>> from fimfarchive.fetchers import FimfarchiveFetcher
>>> fetcher = FimfarchiveFetcher('fimfarchive.zip')
>>> story = fetcher.fetch(9)
>>> story.meta['archive']
{
    'date_checked': '2018-05-01T19:02:43.400897+00:00',
    'date_created': None,
    'date_fetched': '2018-05-01T19:02:43.400897+00:00',
    'date_updated': '2017-11-01T21:27:29.364912+00:00',
    'path': 'epub/s/sethisto-18/the_greatest_equine_who_has_ever_lived-9.epub'
}

Here's what the fields mean:

- date_checked: When the story was last checked for updates.
- date_created: When the story was first added to the archive.
- date_fetched: When the story meta was last successfully fetched.
- date_updated: When the story data was last updated. This is only done when necessary.

Most stories are missing the date_created value since I haven't been collecting that information for very long. It would be possible to retroactively restore it since I haven't been throwing away anything, but I just haven't done that yet. All four timestamps will be exactly the same if the story is new to the archive. You could also, for example, check if a story was updated for this release by checking if date_checked is equal to date_updated.
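The two checks described above, written out as plain Python over the archive meta dictionary (the function names are just illustrative):

```python
TIMESTAMP_KEYS = ("date_checked", "date_created", "date_fetched", "date_updated")


def is_new(archive):
    """True if the story is new to the archive: all four timestamps are identical."""
    first = archive.get(TIMESTAMP_KEYS[0])
    return first is not None and all(archive.get(key) == first for key in TIMESTAMP_KEYS)


def updated_this_release(archive):
    """True if the story data changed the last time it was checked."""
    checked = archive.get("date_checked")
    return checked is not None and checked == archive.get("date_updated")
```

For story 9 above, both return False: date_created is missing, and the data was last updated back in November 2017.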

4878467
4878417
Thanks for the detailed info! Thanks for providing this archive as well! :twilightsmile:

It sounds like the answer is: "some stories are potentially lacking author's notes (if they became unavailable before the switch to APIv2); you can find them by checking if (date_updated || date_created) < 2017-12-03".

4878484

Almost! In this case it would be enough to check if date_updated is null, since the archive metadata was missing before the change to APIv2. It might be worth checking if date_updated is before 2017-11-01 too, for future compatibility. The date to test against is 2017-11-01 because the first story in release 20171203 was checked at 2017-11-01T21:27:29.364912+00:00.

I would do something like...

>>> import arrow
>>> from fimfarchive.fetchers import FimfarchiveFetcher
>>>
>>> fetcher = FimfarchiveFetcher('fimfarchive.zip')
>>>
>>> def test(story, target=arrow.get('2017-11-01')):
...     date_updated = story.meta['archive']['date_updated']
...     parsed = arrow.get(date_updated or 0)
...     return parsed < target
...
>>> old = [story for story in fetcher if test(story)]
>>> len(old)
46318
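If you'd rather not depend on arrow, the same test can be written with the standard library. This sketch only looks at the archive meta dictionary, and it assumes the timestamps keep the ISO 8601 format shown earlier, which datetime.fromisoformat parses on Python 3.7 and later. The function name is hypothetical.

```python
from datetime import datetime, timezone

# Same cutoff as the arrow example: midnight UTC on 2017-11-01.
CUTOFF = datetime(2017, 11, 1, tzinfo=timezone.utc)


def predates_apiv2(archive, cutoff=CUTOFF):
    """True if date_updated is missing/null or earlier than the cutoff."""
    date_updated = archive.get("date_updated")
    if date_updated is None:
        return True  # no archive meta was recorded before the APIv2 switch
    return datetime.fromisoformat(date_updated) < cutoff
```

With the fetcher from the session above, old = [story for story in fetcher if predates_apiv2(story.meta['archive'])] should select the same stories as the arrow version.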

Any update on archiving with the images? Do you know how big it would be?

4901753

No progress on that yet, sorry. I'd estimate it to be around 80 GB, based on numbers I've heard before. I can't do more than guess unless I actually fetch them, though.

4903897

Well, are you accepting pull requests? What would you like the archive to look like if it did include images? Probably a separate ZIP file, maybe containing only the EPUBs that have images?

4903899

I'm open to suggestions, and code! But be aware of the following things:

  • I have a whole bunch of code I haven't cleaned up and checked in from the APIv2 migration. Most of the work I've been doing recently has been to clean that up, test it, and check it in.
  • The stuff on GitHub is essentially a rewrite of the original code. It is incomplete, and there is no way to actually build an archive with it, so I would have to send you the old build scripts too if you want to build a release from it.
  • I encourage people to use data from the archive itself rather than re-downloading it from Fimfiction. Making all stories available to anyone without having to hammer the servers is one of the reasons I'm doing this. I can't and won't try to stop anyone, but you would still need to request an APIv2 key from Fimfiction staff.
  • There's pretty much no documentation, except for doc strings and type hints. I'm available to answer questions though.
  • I'm considering changing license from GPL-3+ to either LGPL 2+ or LGPL 3+.

If I were to implement this right now, I would create a Converter. It would take a Story instance containing EPUB data without images and return one with images. The structure of the archive itself wouldn't matter in that case, since the converter would play nicely with the other parts of the system.
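Roughly, such a converter could look like the sketch below. The Story class here is a simplified, hypothetical stand-in (a key plus a mapping from paths inside the EPUB to bytes), not Fimfarchive's actual class, and download and name_for are hypothetical hooks. It only handles double-quoted src attributes, assumes chapters and images share the same base directory, and doesn't update the OPF manifest, which a real EPUB would also need.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, Dict


# Hypothetical, simplified stand-in: `data` maps paths inside the EPUB to bytes.
@dataclass
class Story:
    key: int
    data: Dict[str, bytes]


# Matches double-quoted src attributes of <img> tags only.
IMG_SRC = re.compile(rb'(<img\b[^>]*?\ssrc=")([^"]+)(")')


class ImageConverter:
    """Takes a Story without images and returns a new Story with them embedded.

    `download(url) -> bytes` and `name_for(url) -> str` are injected, so the
    converter doesn't care where images come from or how files are named.
    """

    def __init__(self, download: Callable[[str], bytes], name_for: Callable[[str], str]):
        self.download = download
        self.name_for = name_for

    def __call__(self, story: Story) -> Story:
        data = dict(story.data)  # copy, so the input story is left untouched
        for path, content in story.data.items():
            if path.endswith((".xhtml", ".html")):
                data[path] = IMG_SRC.sub(lambda match: self._embed(match, data), content)
        return replace(story, data=data)

    def _embed(self, match, data):
        url = match.group(2).decode("utf-8")
        if not url.startswith(("http://", "https://")):
            return match.group(0)  # already a local reference
        local = self.name_for(url)
        if local not in data:
            data[local] = self.download(url)  # each distinct image is fetched once
        return match.group(1) + local.encode("utf-8") + match.group(3)
```

Because the converter only sees a Story going in and a Story coming out, it can be chained with other converters without knowing how the archive itself is laid out.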

I would store the images in a subdirectory within the EPUB file, probably using a checksum of the image's URL as the file name. The checksum would be of the original URL, not Fimfiction's CDN URL, since that might change at any moment. The important thing is that we can easily check which URL resolves to which image without re-downloading it, and that we don't run into problems with overly complicated file names. I might also add additional information about the images, either directly to the EPUB file or to the Fimfarchive index, though I'm not so sure that last part would be a good idea.
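A sketch of that naming scheme. SHA-256 is an assumption (any stable checksum would do), as are the images directory name and the extension handling; the caller is expected to pass the original URL rather than the CDN one.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit


def image_name(original_url, directory="images"):
    """Stable, filesystem-safe path for an image stored inside the EPUB."""
    # Hash the original URL so the same URL always maps to the same file
    # and odd characters in URLs never end up in file names.
    digest = hashlib.sha256(original_url.encode("utf-8")).hexdigest()
    # Keep a short alphanumeric extension, if any, so readers can guess the type.
    extension = posixpath.splitext(urlsplit(original_url).path)[1].lower()
    if not (2 <= len(extension) <= 5 and extension[1:].isalnum()):
        extension = ""
    return f"{directory}/{digest}{extension}"
```

Since the name is derived purely from the URL, checking which file a URL belongs to is just a matter of recomputing image_name(url); nothing has to be downloaded.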

I realize you're not promising to do anything. What I wrote above is something I've planned to do but never gotten around to. I'm open to suggestions, but I think something like that would work nicely with the rest of the system.
