cursedchords


On the subject of analyzing fiction with a computer · 3:27am Jun 24th, 2013

Recently, I started a discussion thread (read it here if it's still up) trying to determine whether there is any way to judge objectively whether a piece of fiction is well-written. Many of the responses held that the field is far too complex for any set of hard-and-fast rules to be constructed by which an objective observer might rate a story's quality. At the time, I accepted this analysis. Lately, however, I have done some more thinking on the subject, and I want to share some of the ideas I have come up with since then.

Naturally, one cannot analyze a field without understanding some of the principles that govern it and the constituent elements upon which those principles act. For instance, chemistry takes individual atoms as elementary and builds its laws and theorems upon them. First-order logic takes logical claims and cause-effect linkages (exempli gratia, "the history course is difficult because it has many assignments") as its atomic constituents. To characterize and classify written language properly, we require a similar system. For our purposes, we shall take sentences as our atomic elements, under the assumption that the sentence is the smallest literary construct that can express a full meaning. Further, we shall declare that there are three general classes of sentences:

1. Sentences that describe an entity.
2. Sentences that show an entity acting upon another entity.
3. Sentences which encapsulate dialogue.

For our purposes, "entities" include any object, character, or abstract concept capable of taking action, being acted upon, or being described (id est, quite a lot of things). Identifying the class of each sentence would be the first crucial challenge facing any objective observer or computer program in judging a piece of fiction. Identifying Class 3 is simple, as the presence of quotation marks (", ') is a necessary and sufficient proof. Differentiating between Classes 1 and 2 is harder, as the two are often combined in a single sentence. A necessary first step in the analysis would therefore likely involve moving through the text and separating all complex sentences into simpler ones, such that every sentence falls into exactly one of the above classes. Once the text is broken down in this way, analyzing each individual sentence becomes straightforward.
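The three-way classification above can be sketched roughly as follows. This is only a toy illustration, not a working classifier: the `LINKING_VERBS` list is a hand-picked assumption standing in for real grammatical parsing, the splitter is naive, and only double quotation marks are checked, because a single quote is indistinguishable from an apostrophe without deeper analysis. It also assumes complex sentences have already been broken into simple ones.

```python
import re
from enum import Enum


class SentenceClass(Enum):
    DESCRIPTION = 1  # Class 1: describes an entity
    ACTION = 2       # Class 2: one entity acts upon another
    DIALOGUE = 3     # Class 3: encapsulates dialogue


# Assumption: a tiny list of linking verbs stands in for real parsing.
LINKING_VERBS = {"is", "are", "was", "were", "seems", "seemed", "looks", "looked"}

# Straight and curly double quotes only (apostrophes make ' ambiguous).
QUOTE_CHARS = ('"', "\u201c", "\u201d")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(sentence):
    """Assign a simple sentence to one of the three classes."""
    if any(q in sentence for q in QUOTE_CHARS):
        return SentenceClass.DIALOGUE
    words = {w.lower() for w in re.findall(r"[A-Za-z']+", sentence)}
    if words & LINKING_VERBS:
        return SentenceClass.DESCRIPTION
    # Fallback guess: anything else is treated as an action.
    return SentenceClass.ACTION


sample = 'The library was quiet. Twilight opened the book. "Where is Spike?" she asked.'
for s in split_sentences(sample):
    print(classify(s).name, "|", s)
```

The fallback to Class 2 is the weakest part of the sketch; telling description from action reliably is exactly the hard problem described above.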

The next level up in terms of organization is the paragraph. Conventional fiction treats each paragraph as a unit devoted to a single topic of discussion. A simple heuristic for objectively judging fiction, then, would be to check that each paragraph does pertain to a single topic and not to several. In theory, the second step of elementary analysis would be for the observer to extract the meaning of each paragraph and distill it into the simplest form possible. Thus, the primary process of analysis that the program must go through is reduction: compressing the language into the most concise form possible without losing any of its core meaning.

Presuming that we are capable of doing this, we can move on to the more macroscopic analysis of the broader work. (Presumably, within the reduction process the program will be able to pick out microscopic problems relating to improper grammar, spelling, and paragraph structure.) There are three primary macroscopic constructs that the program would need to concern itself with: characters, world, and plot.

Characters:
A character we can define as an entity within a story that both takes actions and is described. This differentiates characters from entities such as settings, which are only described, and from environmental or world forces and principles, which only take actions (and are included in the World rule table, see below). If a setting takes an action, as might happen in certain more esoteric works of fiction, it follows that the setting is a character and must be treated as such. In analyzing a character to make sure that it is properly formed within the fictional world, we will apply a single axiom: "Characters take actions for reasons." A character always takes an action in order to satisfy a particular reason. If that reason is left unstated and cannot be inferred from our knowledge of the character (i.e. what has been described to us previously, or is described to us later), then that is a sign of poor character development. Ideally, one should be able to build a model of a character, list the actions that they take, and point out the reasons they used to justify those actions to themselves. For simplicity, we can limit the depth of inference that we allow the program to make. For example, the computer may be able to construct a third-order corollary that explains a character's actions, but the mere fact that a reader is being asked to apply that much cleverness may itself be a sign of a poorly designed plot.
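To make the axiom a little more concrete, here is a minimal sketch of what such a character model might look like. Everything in it is hypothetical: the class names, the `inference_depth` field, and the cutoff `MAX_INFERENCE_DEPTH = 2` are assumptions made for illustration, and the hard part (actually extracting actions and reasons from prose) is taken as already done.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: reasons needing more than two inference steps count as too obscure.
MAX_INFERENCE_DEPTH = 2


@dataclass
class Action:
    description: str
    reason: Optional[str] = None  # None means no reason is stated or inferable
    inference_depth: int = 0      # 0 means the reason is stated outright


@dataclass
class Character:
    name: str
    traits: List[str] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)

    def weakly_motivated(self):
        """Actions with no reason, or whose reason takes too much inference."""
        return [
            a for a in self.actions
            if a.reason is None or a.inference_depth > MAX_INFERENCE_DEPTH
        ]


cursed = Character("Cursed", traits=["determined"])
cursed.actions.append(
    Action("begins writing another story", reason="he is determined", inference_depth=1)
)
cursed.actions.append(Action("abandons the story halfway"))  # no reason ever given

for a in cursed.weakly_motivated():
    print("Possible weak characterization:", a.description)
```

A check like this would only be run once the whole story has been read, so that reasons revealed later still count.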

A second step would of course be to define conflicts and check for their resolutions. A conflict would be represented as a linkage between two of our "character" objects. This may, depending on implementation, require us to extend our definition of a character to include any entity capable of becoming involved in a conflict. A conflict is a situation in which the entities on either side have competing interests, or desires that cannot both be fulfilled. A simple conflict arises when there is only one soda left in the ice box and two characters desire it. Naturally, something will have to change for the conflict to be resolved. By cataloging these conflicts, the program would be able to determine whether, and how well, they have been resolved by the time the story has concluded. One would also have to keep in mind the importance of each conflict: a minor character's unresolved conflict matters less than a major character's.
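A conflict catalog of this kind might be sketched as below. The `weight` field is my own assumption for capturing the difference between major and minor conflicts, and the second example conflict is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    party_a: str
    party_b: str
    stake: str             # what the two parties cannot both have
    weight: float = 1.0    # assumption: larger for major characters
    resolved: bool = False


def unresolved_weight(conflicts):
    """Total weight of conflicts still open at the end of the story."""
    return sum(c.weight for c in conflicts if not c.resolved)


catalog = [
    Conflict("Character A", "Character B", "the last soda in the ice box", weight=0.25),
    Conflict("Hero", "Villain", "control of the kingdom", weight=1.0),  # hypothetical
]

# Suppose the story ends with the soda shared but the kingdom still contested.
catalog[0].resolved = True
print("Unresolved conflict weight:", unresolved_weight(catalog))
```

A lower total would suggest a tidier ending, though how to set the weights is itself a judgment call.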

World:
The World is by far the most important of the concepts which must be stored and tracked by any program attempting to analyze a piece of fiction. For these purposes, the world would be represented as a set of absolute rules defining the allowable interactions between entities. For example, the soda conflict described above only makes sense under one of two circumstances: either a) it is defined as a world rule that sodas cannot be shared, or b) each character's conflict definition indicates that they do not desire to share the soda. World rules would contain any information about the setting that influences characters' actions, justifications, and outcomes in a global manner. To use a perhaps more pertinent example, limitations on Unicorn, Pegasus, and Earth Pony magics would be defined as world rules, since they apply to all characters, in all situations, at all times.

Naturally, we could probably define world rule libraries to aid in implementation, since many stories take several things for granted in setting up their worlds. For example, it would probably be very useful to define "Newtonian Physics applies." as a compound statement encapsulating many smaller physical rules about the world, thus allowing the program to quickly evaluate the probabilities related to certain characters taking certain actions.

The World rule table is simply a way to collect and encapsulate logical rules that apply to characters, in order to simplify logical computations related to them.
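As a rough sketch, a rule table with reusable libraries could look something like this. The rule names are made-up labels, the contents of the "newtonian_physics" library are a placeholder rather than a real encoding of physics, and representing an action by the set of rules it would break is just one assumed design.

```python
# Assumption: each rule is a plain string label; a library is a named bundle of them.
RULE_LIBRARIES = {
    "newtonian_physics": frozenset({
        "unsupported_objects_fall",
        "objects_at_rest_stay_at_rest",
    }),
}


class WorldRuleTable:
    def __init__(self):
        self.active = set()

    def add_rule(self, rule):
        """Add a single story-specific rule."""
        self.active.add(rule)

    def add_library(self, name):
        """Add every rule bundled under a library name."""
        self.active |= RULE_LIBRARIES[name]

    def violations(self, rules_broken_by_action):
        """Return the active rules that a proposed action would break."""
        return self.active & set(rules_broken_by_action)

    def allows(self, rules_broken_by_action):
        return not self.violations(rules_broken_by_action)


world = WorldRuleTable()
world.add_library("newtonian_physics")
world.add_rule("sodas_cannot_be_shared")

# Sharing the soda breaks an active rule, so this world forbids it.
print(world.allows({"sodas_cannot_be_shared"}))
# An action that breaks no active rule is allowed.
print(world.allows({"dogs_cannot_meow"}))
```

Whether dogs can meow then becomes a property of the particular world, not of the analyzer.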

Plot:
It was brought up in the thread discussion that tracking plot holes is something that a computer program would have difficulty doing. With this I disagree, provided that one is willing to accept the following axiom: "A plot is composed of events, running together in a chain of causes and effects." In the same way that we should be able to look at a character's actions and identify the reasons behind them, we should be able to trace out the plot line and see the linkages between events, how early events cause later ones, etc. If we find that an event occurs without a discernible reason, then we have found a plot hole. (However, as above with characters, it is only a true plot hole if no reason has been presented by the end of the story. After all, seemingly inexplicable events are a powerful plot tool, and we shouldn't penalize authors who leave us scratching our heads when the eventual revelation is sufficient.) Many events are of course complex, and stories that use multiple plot lines would be more difficult to track, but the same rules would apply to isolated events as well as they do to grander, all-encompassing ones.

Of course, the linkages between plot events would be composed not only of character rules, but also of world rules. A world rule-linked plot chain might be: An election occurs and neither candidate gets a majority, therefore a runoff vote is held one week later. (One starts to see why it is so important to distill the story into its most concise form possible, as we must decide what the elementary actions are.) A character rule-linked plot chain might be: Cursed's story gets terrible reviews; however, since he is a determined individual, he begins writing another one anyway.
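The plot axiom lends itself to a simple sketch: store each event with the events that cause it, then look for events with no cause anywhere in the finished story. The field names and the `is_premise` flag (for opening situations that need no explanation) are assumptions of this sketch, and the last event in the example is invented to show a hole.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    name: str
    causes: List[str] = field(default_factory=list)  # names of causing events
    via_rule: str = ""        # the world or character rule linking cause to effect
    is_premise: bool = False  # assumption: opening situations need no cause


def find_plot_holes(events):
    """Return names of non-premise events with no cause anywhere in the story.

    The whole story is examined at once, so a cause revealed late still counts.
    """
    names = {e.name for e in events}
    return [
        e.name for e in events
        if not e.is_premise and not any(c in names for c in e.causes)
    ]


story = [
    Event("election_held", is_premise=True),
    Event("no_majority", causes=["election_held"]),
    Event("runoff_vote", causes=["no_majority"],
          via_rule="world: no majority means a runoff one week later"),
    Event("candidate_vanishes"),  # hypothetical event with no cause given
]

print(find_plot_holes(story))
```

Because causes are matched by name against the full event list, the order in which the story reveals them does not matter.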

Open-endedness of a plot is not a concern, as stories can and do end just about wherever they so desire. The only way to evaluate an ending is the one given above in the Characters section: a good ending resolves conflicts.


It bears noting that any system functioning as described above could probably never apply universally, and would be limited by its dependence on well-defined heuristic rules. However, given the above, I don't believe it is fair to call the idea of a computer analyzing fiction ridiculous. Even the most absurd pieces of fiction must follow certain rules in order to be understandable to the reader, and these constants are the elements that must be exploited when building an objective system. At the very least, the distillation problem presented at the start would likely be an interesting test of computer science's understanding of natural languages.

Naturally, I would not have written up something like this if I were not interested in further ideas or debate. I invite and welcome all comments below.

Adieu, and may Celestia's light shine forever,
cursedchords

Comments ( 2 )

I'll take a more in-depth look at this blog post when my time allows it.
:trollestia:
PS 'Adieu' quite literally means a goodbye in a morbid fashion. As in someone is dying and says "Adieu".

My objection to this is that the program would need to include a massive understanding of the world.

Here's a story:

Twilight walked through the cow and on into the sofa. She was worried that her dog wasn't meowing right, and needed to find the right petunia to cast to swim the issue.

The nouns are in the right place, and the verbs are all verbing right where they're supposed to. The verbs are transitive where they should be.

The program would need to know that dogs don't meow (ignoring that we're talking about a land filled with talking ponies, where dogs might well, under unusual circumstances, meow, moo, or chime on the hour), that one does not simply walk into a sofa, the difference between a room and a ruminant, and that Twilight is looking for a spell, not a petunia, to cast. Also that she doesn't have a dog, but hey, maybe Winona had puppies.

A program can know all this, but I think the database (using the term "database" loosely) would be enormous. This gets beyond simple story analysis, and, I think, well into the realm of artificial intelligence. I'm not saying it can't be done, but I do think it would be incredibly difficult...
