
Chris


Author, former Royal Canterlot Library curator, and the (retired) reviewer at One Man's Pony Ramblings.



Friend Computer Says a Lot of Things, Some of Which are True (By Accident) · 12:08am Jul 27th, 2023

A few hours ago, Lucky Dreams posted a fascinating blog about an AI review group that's posting AI-written reviews of stories (he chooses to take the no naming/no shaming approach, and I'm going to follow his example by being nice and vague and not linking directly to anything). Anyway, you should go read that, then come back here if you're interested in a little follow-up. Heck, this whole blogpost started as a comment, but it got way too big for that.

So, this group is something I was vaguely aware of, but Lucky got me to take a look (because I am both stupid and a contrarian, and so when someone says "I DO NOT encourage people to go looking for the group," the first thing I do is go looking). Lucky does a good job talking about what the reviews are, and why they are, frankly, useless (my word, not his--seriously, go read his post first so you've got the context for this!). Then, because I am an egomaniac, I tried taking one of my stories, feeding it to their preferred AI, and asking it a few unchallenging questions of the sort that their reviews include--or rather, consist of. So now I have a story I'm intimately familiar with, with which to talk about the idea of "AI-generated reviews" with a little more specificity.

I chose Going Up, because it seems like it should be pretty easy to review: it's a short, straightforward humorous/heartwarming story. Plus, plenty of actual humans have read and reviewed/offered comments on it, so those can be compared to the AI's. Also, the story's been on my mind the last few days, since it's included in the For the Love of Faust anthology which is going to be available at EFNW. Man, I'm linking to everyone's blogs today!

After feeding the AI a .txt file to use as reference (sidebar: I know you have to be careful what kind of stuff you feed AIs, but I have no doubt that FiMFic has been scraped many times and this story is over a decade old; while I don't love the idea of my writing being used for machine learning, I'm quite sure that that ship has already sailed for this particular piece of text, so w/e) and asking it a few general questions to make sure we were on the same page, I dove in with the "review" questions. Other than cutting out those first few questions, I'm including everything "we" said, in order, with a bunch of breaks for extra commentary.
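If you wanted to script this sort of experiment instead of pasting into a chat window, the general shape of it would look something like the sketch below. To be clear, this is an illustration, not what I actually ran: the client library, model name, file name, and questions are all stand-ins, since I'm deliberately not naming the specific AI involved.

```python
# Rough sketch of the "feed it a story, then ask it review questions" setup.
# The OpenAI-style client and model name are placeholders for whatever
# chat-style AI you happen to be experimenting with.
from openai import OpenAI

client = OpenAI()  # assumes an API key is set in the environment

with open("going_up.txt", encoding="utf-8") as f:
    story = f.read()

# The story goes in first as context; each question is asked in the same
# running conversation, so earlier answers stay visible to the model.
messages = [{"role": "user", "content": f"Here is a short story:\n\n{story}"}]

review_questions = [
    "What are the main strengths of this story?",
    "Give a specific, concrete example of something the author does well.",
    "What audience is this story best suited for?",
]

for question in review_questions:
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=messages,
    )
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(f"Q: {question}\nA: {answer}\n")
```

So, let's see what me and the AI came up with: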

*****

This is... fine. It's bland and soulless, sure, but it's broadly accurate. Point 1 is a gimme, and point 2 is a bit questionable (Carrot Top gets dragged into "her dreams," and she ultimately appreciates that, but does she follow them?), though it's defensible enough that I wouldn't call it out if an actual reviewer posted it. Point 3, meanwhile, is actually bang-on. Off to a good start!

Apparently, "specific and concrete" means "quote a block of text." Which would be a fine approach... if it didn't quote two paragraphs, when it's obvious that its praise only applies to the first one. And even that first paragraph isn't really a good example of what the AI claims I'm doing well; it's not vividly describing her feelings, it's vividly describing the noise she made.

I let this slide, though, because I will never correct anyone who wants to praise me. I had bigger fish to fry. On to the next question:

So... if you haven't read Going Up, or if you have but it's been a hot minute, that might sound like a decent response. Maybe a little nitpicky, but the sort of thing a reviewer might point out. There's just one problem:

So... better?

Now, let's pause a moment. The obvious problem here is that the AI started out making a pair of factually inaccurate claims about my story, and it's easy enough to see how that could lead an overtrusting writer down a bad, or at least useless/irrelevant, path. But also look at how it responds to my (fairly gentle!) pushback. If I were writing a review and got called out for this... well, hopefully I would never be called out for flat-out making things up in my review, but if I got called out for questionable analysis like in the "re-examining" part, I'd probably start by apologizing for using a bad example, then go find two or three more relevant examples of what I was talking about, to show that it is indeed a systemic issue, and that even if my particular example was bad, the story itself could still benefit from having more buildup, or whatever it is I'm suggesting.

But of course the AI can't do that, because it doesn't have a holistic view of my story. Instead, when confronted with any correction, it immediately folds like a house of cards. So now we have a situation where an author who is at all deferential to the AI is getting bad advice, and one who is at all critical of it is told that actually, their work is basically perfect.

Think I'm exaggerating?

So, that was disappointing and a little frightening. But now let's move on to something that the review group doesn't seem to ask, but which I think is really important: audience.

Back when I was regularly writing ponyfic reviews, I always tried--whether I was writing long(er)form reviews or just a paragraph or two--to say who I thought might enjoy the story. Because that's the whole point, isn't it? Not every story is right for every person, and certainly not every story is right for me personally, but there are lots of fics out there that have an audience, and are well-suited to that audience, and getting those fics and those people matched up is what it's all about. IMO, anyway. So let's see what the AI has to say about that.

Okay, some of these are vague to the point of useless, but I guess that's at least partially on me for not offering a little more context on where/how this story is likely to be encountered. But that second bullet point caught my eye, and here's where I think this post gets really interesting.

See, Going Up was originally written for a Care Package event, where stories were being solicited in contest format to be sent to a six-year-old ponyfan and cancer victim named Kiki. Now obviously, the story's been much more widely read than that, but it was written specifically for her--or rather, for her parents to read to her.

Well, that's nice to hear, but that third bullet point is a red flag. See, Going Up has plenty of big words, and one of the more significant criticisms of the story (at least, in the context of its target audience) was that the vocab was just too much.

So I mentioned those criticisms...

...Then mentioned that I tend to disagree (which isn't to say it's not a valid criticism, just that I come down firmly in the camp of "hearing your parents read big words is how you learn them yourself," which is why I didn't change that aspect of the story in response to those criticisms back when I wrote it)...

...And since the AI seemed a little confused, I clarified that the problem wasn't that the critics didn't understand the concept of "having a parent read to you"...

...At which point the AI apparently got tired of agreeing with whatever it thought I said most recently, and threw up a "guess not everyone agrees."

Which is true! Of a lot of things, in fact! But the whole point of a review is to offer some sort of opinion--whether that's an opinion on what would make the story better, or who would like it, or whether someone should read it, or whatever. With all the twisting on whether the story was fit for a six year-old, I felt like I hadn't gotten an opinion, even a milquetoast one.

So I asked the AI (I note, as I'm typing up the commentary, that I started off my "conversation" with the AI by simply asking it questions, but as I went on I increasingly started writing to it like it was a person--calling it "you" and whatnot. Scary how fast we (well, I) fall into that personification trap) to give me an actual opinion, as in, to make some kind of a judgement rather than just regurgitating what I'd told it that I and the reviewers thought.

It went as well as you'd expect.



(the image is cut because I'm working on a small-ish screen and had to take two screenshots to get the whole response, btw)

I didn't think much of that response. So I said as much.

And to nobody's surprise, what the AI came up with is "yes sir, whatever you say, I agree with you."

I'm not gonna lie, it's nice to be told you're right. But even if I am right about the audience-appropriateness of my story, this AI's commentary certainly isn't the proof. It's not even evidence. It's obsequious word salad arranged in a crude facsimile of useful commentary. And it's right here where I realized that not only was I not getting useful commentary on my story, I was actually getting annoyed.

I teach middle-schoolers for a living. I get enough annoyance in my life when I'm getting paid for it.

And so, that's where the AI review experiment ends.

*****

Despite that negative wrap-up, I don't actually have a problem with using AI for a lark. The FiMFic group in question seems to be operating on at least two levels of irony, and that's fine! Much like a trollfic, it'll strike most people as baffling and off-putting, but if it's what you find funny, you do you.

But as a tool? It's more than useless. I'm not even talking about "AI can't write your story for you" or whatever, because duh. But it also can't tell you what to write, or what you did write, and it certainly can't tell you why it was written, or what it's trying to accomplish, and it really, really can't tell you how to alter what you've written to more accurately accomplish your aims (you know, like an actual author-oriented review does), because it does not understand... well, anything. Machine learning can do a lot of things, but one thing it explicitly cannot do is comprehend. And writing a story is fundamentally an act of attempting to be comprehended. We write stories, at a most foundational level, because we have something that we want another person to understand. We might fail to be understood, or be incompletely understood, or even be totally misunderstood, but we are always, always trying to make something understood.

AI (at least, the category of AI we're talking about here) cannot understand. Thus, it cannot meaningfully review your story.

Coulda saved yourself a lot of time if you'd skipped down to the end. There's a lesson there, I think, but I'm not going to bother asking the AI what it is. And there's certainly some humor in this wrap-up, but I wouldn't bother asking an AI about that, either.

It wouldn't get it.

Comments (9)

Nice to hear more opinions on this whole business. n_n

For the most part your analysis seems pretty spot-on as a breakdown of what current LLMs are and aren't capable of, but I feel like the conclusion doesn't follow from the rest of it? You made this sudden jump from "this particular current AI can't do the thing" (which seems accurate, and in-line with my picture of what most state-of-the-art RLHFed LLMs are like right now) to "machine learning / AI-of-this-sort are categorically incapable of the desired sort of comprehension". And that seems... at the very least highly overconfident, even if not outright already-known-to-be-wrong.

The consistent trajectory of language models over the last half-decade has been extremely rapid improvement. Fourish years ago, GPT-2 was new and everyone was amazed at the degree to which it sounded like not-a-dumb-Markov-chain; twoish years ago, GPT-3 was new and everyone was amazed at the degree to which it could, in its better moments, put on an actually-convincing imitation of a human writer for a paragraph or two, not to mention its capabilities in areas like math and chess and so forth; now we've got the whole big proliferation of modern LLMs as far ahead of GPT-3 as GPT-3 was ahead of GPT-2, acting sufficiently personlike over sufficiently long timescales that people keep on mistakenly treating them as if they were humans and thus making mistakes like assuming that when they say things they're doing it on the basis of thinking those things are true.

Modern LLMs still fall short of the necessary skill-set to engage with fiction in a high-quality way, both as writers and as readers; but the idea that the architecture is fundamentally incapable of doing so strikes me as about as ill-founded as the ideas of the various people a few years back who were very confident about the limits of LLMs' ability to solve math problems. There's a very big difference between areas where the models aren't yet competent and fundamental architecture-level incapabilities, and I would bet with pretty high confidence that inability-to-usefully-analyze-fiction is the former rather than the latter.

(There are some reasons to think that, even if the architecture is theoretically capable of doing the thing, we won't be able to train it to do the thing in practice, at least barring a bunch of algorithmic improvements more complex than the brute-force "increase scale" approach that's been beating out all its competitors for the last few years—from what I've heard, they're running out of text to train their models on, because although the internet is large it's still finite and they've been pretty thorough about scraping all the usable training text they can over the last few years, such that it might not be possible to scale up the training-datasets much farther at this point—but that's more in the realm of pragmatics, rather than of architectural limitations of any deeper sort.)

I haven't played with this yet (and probably won't), but I've played with AI some and it's been interesting, to say the least. It did come up with a generally mediocre story for me, but I'll give it credit for inventing appropriate pony OCs for the setting.

And it suggested that the ponies would have translation collars, which is something I personally haven't seen in a Pony on Earth story. It's not an original concept, obviously, but I haven't seen it on FimFic.

This is great, Chris. I've definitely had a morbid curiosity about that group's reviews, but I've got a bit too much self-respect/stubbornness to submit one of my own fics for them to "look" at.

(also knowing the people who run the group, I'm not actually sure how much irony is really involved)

I'm firmly reminded of a Douglas Adams gag in the Dirk Gently series: Reason, WayForward Technologies' first software success. It's a program that starts with the conclusion you want and then crunches the facts you feed it to make sure that any set of premises - however unlikely - can be wrangled into the "right" result ahead of time.

The upshot is that you can then present as reasonable "anything that'd otherwise look like a botched mess of lousy planning by the criminally stupid".

Wanderer D
Moderator

I mean, if you want to see how incapable an AI is of understanding nuance in a quick and dirty experiment, just ask an AI to write a comedy that's actually funny and not a summary of events.

As I've mentioned on other blogs on the topic, there's a lot of talk about how AI will occasionally "hallucinate" something about the topic it has been directed to discuss. People who claim this seem to believe that it otherwise understands its inputs and data, and just occasionally makes things up because it isn't refined enough. The fact is, everything these statistical models output is a "hallucination"; it's just that they sometimes happen to hallucinate something approximating reality.
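To make that concrete: generation is just repeated sampling from a learned probability distribution over next tokens, with no truth check anywhere in the loop. Here's a toy sketch (made-up vocabulary, made-up probabilities, purely illustrative):

```python
# Toy illustration: a language model's output loop is "sample from a
# distribution," repeated. Nothing in it checks anything against reality.
# The words and probabilities below are invented for demonstration.
import random

# Pretend these are the model's learned next-word probabilities
# after the prompt "The capital of France is".
next_word_probs = {
    "Paris": 0.90,   # a "hallucination" that happens to match reality
    "Lyon": 0.06,    # a "hallucination" that doesn't
    "purple": 0.04,  # ditto, only more obviously
}

def sample_next_word(probs):
    """Draw one word according to the distribution -- exactly what
    generation does, with no notion of which answer is 'true'."""
    words = list(probs)
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

print(sample_next_word(next_word_probs))  # usually "Paris", sometimes not
```

"Paris" and "purple" come out of the exact same mechanism; one of them just happens to line up with reality.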

Quite honestly, I'd be slightly surprised if most people reading a blog like this (or Lucky's) didn't know which group you were referring to here. It's not as if they've been shy about promoting themselves. I don't like them because I feel there's an unspoken bargain in reviewing: if you're going to give your unsolicited thoughts about someone's fic, then they should be able to give their response to what you wrote. You can't do that if the "review" was churned out by a machine.

5739516
This comment caught my eye. I'm a very long way from being an expert on AI development, but I can't shake the feeling that a lot of people are looking at what ChatGPT etc can do now and saying, "Is that it?" Well, no. That isn't it. That may prove to be like asking the same question about the microchip in the early 1970s. I'm not a betting pony, but if I were I'd have at least a moderate wager on AI-written (or at least AI-assisted) fiction taking a large chunk of the "airport novel" market within a decade. Quite probably also the kind of wall-hung pictures you pick up in discount stores. People don't buy those because they're a beautiful and thought-provoking look at the human condition. They buy them because they want something to fill up that irritating wall space by the kitchen.

I am in favour of Fimfiction's strong line against AI fics, since for me the human element is really important. I refuse to review AI-written fics and my policy is that if I am kidded into reviewing a fic that later proves to be AI-written, it will get zeroed and its author will be blacklisted. But there are quite clearly a number of readers for whom the human element is less important, for whom easy, trashy, no-effort reads are welcome -- again, see airport novels. It's fine for a Michelin restaurant to say it doesn't want to serve Cheetos. It's not fine for the chef to say that anyone who likes Cheetos should be sneered at.

"There's a lesson there, I think, but I'm not going to bother asking the AI what it is."

You should dump this whole blog into that AI and ask it exactly that. I'm morbidly curious what it'll regurgitate. (Yes, I am the devil on your shoulder telling you to do what you said you wouldn't.)

This was an interesting read. Like you, I guess I'm not that surprised. I was surprised by how fluently it talked and how well it was able to spit out relevant info, but yeah, there's a clear gap in comprehension once you started really asking it questions (and the backpedaling was worse than a politician's, yeesh). Thank you for this little experiment.
