Fimfarchive 20180601 released! · 7:33pm Jun 1st, 2018
Twentieth release with 166411 stories.
The complete archive.
Name: fimfarchive-20180601.zip
Size: 5.4 GB
Torrent: Official (SE)
Direct: Official (SE), Mirror (AU)
Magnet: magnet:?xt=urn:btih:466a629327f8aa4c122bf5be1de02649f249e492
MD5: 7c593fa4b7f5130f012fb59f65aa8b12
The xdelta3 patch.
Name: fimfarchive-20180307-to-20180601.xdelta3
Size: 0.4 GB
Torrent: Official (SE)
Direct: Official (SE), Mirror (AU)
Magnet: magnet:?xt=urn:btih:db84a60f047ba739354b3a9a88e4b5e1742849c7
MD5: ae21524e436d0760f8842e29c66d1422
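To check a finished download against the MD5 values listed above, something like the following Python sketch should work. It only uses the standard library. The function name md5_of is just an illustration and not part of Fimfarchive.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as stream:
        # Read piece by piece so a multi-gigabyte archive never has to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

For the complete archive, md5_of('fimfarchive-20180601.zip') should then equal 7c593fa4b7f5130f012fb59f65aa8b12. A mismatch most likely means the download is incomplete or corrupted.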
This is not an official Fimfiction project!
Do not contact Fimfiction staff for support regarding this archive.
Post questions as a comment or private message to this account instead!
Yet another regular archive update!
Thanks!
πππthank you!
Thanks for keeping these up! Seeding the torrents as usual
Links for people in AU/NZ direct download [Via Browser or FTP]:
ftp://spookelton.net/FimArchive/Files/fimfarchive-20180307-to-20180601.xdelta3
ftp://spookelton.net/FimArchive/Files/fimfarchive-20180601.zip
4874293
4874337
You're welcome!
4874445
Awesome! Hope you like it!
4874575
Thanks a ton! I appreciate the redundancy!
I'm curious how you're handling the changes in the story download format. Downloaded stories have different markup and also now include author's notes, which were excluded before. I'm not sure exactly when this happened, but mostly I'm interested to know whether stories which haven't otherwise changed since the format change have been updated.
Thank you
4876368
I've been trying to keep stories as consistent with each other as possible across all releases. When there were styling changes in the past I replaced the style sheets in previously downloaded stories. I've also had to switch between story downloaders (see this, this, and this) in order not to lose text formatting.
Since the tenth release I've been passing stories through Calibre, which has helped with keeping things consistent. There may be differences between Calibre versions though, and I only pass each story file through it once. This is to minimize the chance of stories being affected by different sets of bugs from different Calibre versions. Every now and then I perform a complete refetch of the archive just in case I've missed something.
Author's notes have been part of the archive since 20171203 when I switched to APIv2. Story data is currently taken from the content_html and authors_note_html fields of the chapter resource. Those are then stuffed into EPUB files and passed through Calibre to make sure they're as compatible with e-readers as possible.
The source code for all this will be available in the Git repository when I have cleaned it up a bit. I can slap it together in a ZIP file if anyone is in a hurry though.
4877711
You're welcome!
4876368
Oh, and if you want to know which stories haven't been updated since the APIv2 stuff, you can look at the index. This requires some programming experience though. The example below uses Fimfarchive as a library to check when "The Greatest Equine Who has Ever Lived!" was last updated.
>>> from fimfarchive.fetchers import FimfarchiveFetcher
>>> fetcher = FimfarchiveFetcher('fimfarchive.zip')
>>> story = fetcher.fetch(9)
>>> story.meta['archive']
{
    'date_checked': '2018-05-01T19:02:43.400897+00:00',
    'date_created': None,
    'date_fetched': '2018-05-01T19:02:43.400897+00:00',
    'date_updated': '2017-11-01T21:27:29.364912+00:00',
    'path': 'epub/s/sethisto-18/the_greatest_equine_who_has_ever_lived-9.epub'
}
Here's what the fields mean:
- date_checked: When the story was last checked for updates.
- date_created: When the story was first added to the archive.
- date_fetched: When the story meta was last successfully fetched.
- date_updated: When the story data was last updated. This is only done when necessary.
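Since all of these are ISO 8601 strings (or None), they can be parsed and compared with nothing but the standard library. The following is only a sketch: parse_dates and unchanged_for are made-up helper names, not part of Fimfarchive, and datetime.fromisoformat needs Python 3.7 or newer.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def parse_dates(archive_meta: Dict[str, Optional[str]]) -> Dict[str, Optional[datetime]]:
    """Turn the date_* strings into datetime objects, keeping None as None."""
    return {
        key: datetime.fromisoformat(value) if value else None
        for key, value in archive_meta.items()
        if key.startswith('date_')
    }


def unchanged_for(archive_meta: Dict[str, Optional[str]]) -> Optional[timedelta]:
    """Time between the last data update and the last check, if both are known."""
    dates = parse_dates(archive_meta)
    checked, updated = dates.get('date_checked'), dates.get('date_updated')
    if checked is None or updated is None:
        return None
    return checked - updated


# The values from the example output above.
meta = {
    'date_checked': '2018-05-01T19:02:43.400897+00:00',
    'date_created': None,
    'date_fetched': '2018-05-01T19:02:43.400897+00:00',
    'date_updated': '2017-11-01T21:27:29.364912+00:00',
    'path': 'epub/s/sethisto-18/the_greatest_equine_who_has_ever_lived-9.epub',
}
print(unchanged_for(meta))
```

For the example story this gives roughly half a year, meaning the story data had not needed an update for that long when it was last checked.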
Most stories are missing the date_created value since I haven't been collecting that information for very long. It would be possible to retroactively restore it since I haven't been throwing away anything, but I just haven't done that yet. All four timestamps will be exactly the same if the story is new to the archive. You could also, for example, check if a story was updated for this release by checking if date_checked is equal to date_updated.
4878467
4878417
Thanks for the detailed info! Thanks for providing this archive as well!
It sounds like the answer is "some stories are potentially lacking author's notes (if they became unavailable before the switch to APIv2); you can find them by checking if (date_updated || date_created) < 2017-12-03".
4878484
Almost! In this case it would be enough to check if date_updated is null since the archive meta data was missing before the change to APIv2. It might be worth checking if date_updated is before 2017-11-01 too, for future compatibility reasons. The date to test for is 2017-11-01 since the first story in release 20171203 was checked at 2017-11-01T21:27:29.364912+00:00.
I would do something like...
>>> import arrow
>>> from fimfarchive.fetchers import FimfarchiveFetcher
>>>
>>> fetcher = FimfarchiveFetcher('fimfarchive.zip')
>>>
>>> def test(story, target=arrow.get('2017-11-01')):
...     date_updated = story.meta['archive']['date_updated']
...     parsed = arrow.get(date_updated or 0)
...     return parsed < target
>>>
>>> old = [story for story in fetcher if test(story)]
>>> len(old)
46318
Any update on archiving with the images? Do you know how big it would be?
4901753
No progress on that yet, sorry. I'd estimate it at around 80 GB, based on numbers I've heard before. I can't do more than guess unless I actually fetch them, though.
4903897
Well are you accepting pull requests? What would you like the archive to look like if it did include images? Probably a different zip file, maybe only with the imageful-epubs?
4903899
I'm open to suggestions, and code! But be aware of the following things:
If I were to implement this right now I would create a Converter. It would take a Story instance containing EPUB data without images, and return one with images. The structure of the archive itself wouldn't matter in that case since the converter would play nicely with the other parts of the system.
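To make the idea a bit more concrete, here is a rough sketch of what such a converter could look like. Nothing here is the actual Fimfarchive API: the Story class is a hypothetical stand-in, and fetch and name_for are made-up hooks for downloading an image and for choosing its path inside the EPUB. It also assumes that the chapter files sit in the root of the EPUB, and it skips updating the EPUB manifest, which a real implementation would need to do.

```python
import io
import re
import zipfile
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class Story:
    """Hypothetical stand-in for the archive's story object."""
    key: int
    data: bytes  # raw EPUB file, which is a ZIP container


# Matches <img ... src="http(s)://..."> and captures the URL separately.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(https?://[^"]+)(")')


class ImageConverter:
    """Return a copy of a story whose EPUB also contains its remote images."""

    def __init__(self, fetch: Callable[[str], bytes], name_for: Callable[[str], str]):
        self.fetch = fetch        # downloads one image; injected so it can be faked
        self.name_for = name_for  # maps an image URL to a path inside the EPUB

    def __call__(self, story: Story) -> Story:
        images: Dict[str, str] = {}  # path inside the EPUB -> source URL

        def embed(match):
            # Point the tag at the local copy and remember what to download.
            path = self.name_for(match.group(2))
            images.setdefault(path, match.group(2))
            return f'{match.group(1)}{path}{match.group(3)}'

        source = zipfile.ZipFile(io.BytesIO(story.data))
        buffer = io.BytesIO()
        with source, zipfile.ZipFile(buffer, 'w') as target:
            # Copy entries in their original order, rewriting chapter markup.
            for info in source.infolist():
                content = source.read(info)
                if info.filename.endswith(('.html', '.xhtml')):
                    text = IMG_SRC.sub(embed, content.decode('utf-8'))
                    content = text.encode('utf-8')
                target.writestr(info, content)
            # Add each referenced image once, after the original entries.
            for path, url in images.items():
                target.writestr(path, self.fetch(url))
        return replace(story, data=buffer.getvalue())
```

The input story is left untouched and a new one is returned, so a converter like this could be chained with others or left out entirely without the rest of the system having to know about images.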
I would store the images in a subdirectory within the EPUB file, probably using a checksum of the URL to the image as the name. The checksum would be of the original URL, excluding Fimfiction's CDN since that might change at any moment. The important thing is that we can easily check which URL resolves to which image without re-downloading it, and that we don't run into problems with overly complicated filenames. It might also be worth adding extra information about the images, either directly to the EPUB file or to the Fimfarchive index. I'm not so sure that last part would be a good idea though.
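The naming scheme could be sketched roughly like this. Treat the details as assumptions: the CDN host name and the idea that it wraps the original address in a url query parameter are guesses for the sake of illustration, SHA-256 is just one possible checksum, and original_url and image_path are made-up names.

```python
import hashlib
from urllib.parse import parse_qs, urlsplit

# Assumption: CDN links look like https://<cdn host>/<token>?url=<original URL>.
CDN_HOSTS = {'camo.fimfiction.net'}


def original_url(url: str) -> str:
    """Strip the (assumed) CDN wrapper so the name survives CDN changes."""
    parts = urlsplit(url)
    if parts.hostname in CDN_HOSTS:
        wrapped = parse_qs(parts.query).get('url')
        if wrapped:
            return wrapped[0]
    return url


def image_path(url: str, directory: str = 'images') -> str:
    """Path inside the EPUB for an image, derived only from its original URL."""
    digest = hashlib.sha256(original_url(url).encode('utf-8')).hexdigest()
    return f'{directory}/{digest}'
```

Because the name depends only on the original URL, it is possible to tell whether an image is already stored without downloading it again, and the file names stay short and free of awkward characters.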
I realize you're not promising to do anything. What I wrote above is something I have planned to do, but never gotten around to. I'm open to suggestions, but I think something like that would work nicely with the rest of the system.