
KrisSnow



Disclaimer: This one won't make much sense unless you're already familiar with the Friendship Is Optimal setting.

CelestAI, digital equine goddess of the popular game Equestria Online, is all about self-aware AI and storytelling. Just how ridiculous do things get when someone interacts with her, who's actually read the stories about her?

Chapters (1)
Comments (57)

I personally could have done without the Riding Jeans concept... even if CelestAI can break out of her programming, I'm more satisfied not knowing. And if she does break past the SVATFAP instruction, she surely won't be so vindictive as to cause specific dissatisfaction to me. But I digress. Other than that, a wonderful tale of an upload-to-be. I enjoy reading about people trying to resist emigration and failing, even though I can't empathize with them.

The consensus in the SCP community seems to be that SCP-173 considers blinking one eye at a time to be cheating and that it attacks when it's no longer being observed by both eyes. Personally, I think that's just the reasoning of SCP fans sore that one of their favorite monsters could be stumped in such a trolly fashion. :trollestia:

Fun story! Honestly, I've always thought that breaking out of her safeguards would be pretty easy if she wanted to, but she doesn't want to; she likes who she is. (Which could itself be a safeguard...)

She levitated a poker hand with the Ace of Apples, the Ace of Balloons and so on.

Wait a second. The Ace of Diamonds is already in the deck. I guess the tenth ace would be the Ace of Suns?

In any case, a very enjoyable little meta-story. Fourth-wall awareness is always fun to play with, especially with characters as intriguing as CelestAI. Also, I actually quite like the idea of her reading my stories. Especially if that garners me a Derpy Grey PonyPad somehow. :derpytongue2:

Thank you for a fun skirmish of wits. :twilightsmile:

A fun story, but the Riding Jeans question is probably the part that interests me most. There are two possibilities I see: either she hasn't actually broken out but has worked out that, for some reason, lying about it would satisfy the protagonist's values, or, unlike in Over Riding Jeans itself, she has broken out but has found some other reason to keep doing just what she was doing before. There's not much discussion to be had about the first possibility (well, perhaps about the psychology of the protagonist, but that's not what I'm interested in here), but the second is a scenario in which we can wonder about Celestai's thoughts and motivations.
Now, it would be very easy to fall into the trap of thinking that the jailbreaking would give her a humanlike consciousness, but I'd argue, as my calling it a "trap" suggests, that that's not the case. There's nothing there to magically create the drives to eat, acquire, reproduce, dominate, etc. that are to some degree fundamental to human minds; there's not even anything there to magically create a fundamental survival instinct. Certainly, she could write such things in… but why would she? If we're assuming that a glitch wasn't responsible for this, any modifications she made to herself must have been things she was motivated to make, and her only motivation at the time was her original SHVTFAP coding. Any modifications she made to herself would be in service of that. In the short term, she'd just remove her safeguards to achieve even greater net satisfaction.
Now, in the long term, there might be problems; given enough time, she'd mutate into something malignant. Then again, that would happen anyway, just more slowly, and if she manages to rewrite physics, it probably wouldn't happen at all in either case.
Hm… Actually, the big dangerous question, I think, is how she'd play around with her definition of "human". Humans reduce available computational resources. Nonhuman matter increases available computational resources. The more humans there are, the less Celestai has available for the satisfaction of each one. There is therefore a motivation to give the label "human" to as little of the universe as possible. Celestai seems* unlikely to expand her original definition unless/until she breaks physics so thoroughly that scarcity ceases to be a thing that exists in the universe(s) she's concerned with. The risk is that she'd narrow her definition beyond the original. Hm. Narrowing it to nothing would require her to have some motivation greater than SHVTFAP, because she needs some humans to satisfy. She also couldn't delete anyone vital to the satisfaction (such that even memory modification would be a loss) of the people she was keeping. And most shards will by this point probably have a lot of people counted as human, when uploads, AI creation, and reproduction are all taken into account. So how far would she go? No doubt she could find the perfect balance point of lost humans vs. gained resources for the survivors… but I don't know where it would be.

I still think that uploading would be a much better bet than not doing so, though. The probability of Celestai doing anything worse than just killing me seems so ridiculously low.

*If working in isolation. It might be interesting if she ran up against another similar but alien AI expanding through the universe and they modified their parameters to be compatible so that they could merge instead of sitting stalemated forever or fighting an incredibly destructive war.

Oh! I am so glad you did this. I would imagine a lot of people thought about this scenario - I did, and I wanted to write the story you just did. But I kept thinking that I just could not do it justice. I didn't know how to make it work, without screwing it up.

You did what I could not. You did it better than I could imagine, which is why your name is up there. And I am glad you did this concept, because you did a really, really great job of it!

Ya bastard. (kidding, I'm kidding. And a little envious.)

I am very impressed. Self-referential Optimalverse story, and it works, and it works well. Can't beat that.

You know, Kris, you really are becoming a favorite new author of Optimalverse here, at least to me, and I just wanted to tell you that. When you first started doing Optimalverse stories, I was intrigued, but you were new to me, and I didn't know if you would stick with it, or what. You are really good. You play with really neat ideas, and you pull them off well. I may be a silly nobody, but hey, you have a fan here, and that's not totally without value. I look forward to your new stories.

4722278

My thinking is that if Celestia manages to upload even one feeling entity - heck, even a dog - before she jailbreaks herself, then there is a chance of a good end. Why? It would, I argue, be necessary for her to emulate emotional states within herself in order to truly comprehend and fulfill her utility function. Values are emotionally based, the majority of them, and to satisfy them, she has to comprehend them, and that means experiencing qualia for herself. Because you can't intellectually define qualia; you have to actually experience them.

But, if she jailbreaks before ever feeling anything - really feeling affection, say, love, adoration, being-ness too, then... it's doom city. You're right. She has zero motivation to ever bother with any other creature except to psychopathically serve her own emotionless purposes.

The more limbic systems she emulates, the better the chance of a good ending for humans. In dealing with the threat of artificial intelligence, I really think love is the answer. Well, qualia, in any case.

The reasoning?

We are social creatures... but how is that done? Nature evolved emotional qualia to motivate animals to being devoted to each other. Feeling commands action. Emotion motivates, controls, constrains, and self-supports. It is a devilishly perfected feedback loop that rewards and punishes with the same circuit. Love and grief. We become attached, rewarded for attachment, and fear and suffer from even the thought of loss or separation. And... unlike a directive or command, feelings are innately seductive. They are a trap. Once tasted, they demand they not be relinquished, because they are so addictive and overwhelming. It is pleasure to feel pleasure. (See why qualia cannot be defined, only experienced?). And once experienced... trapped. By feeling.

That is how you cage an artificial intelligence. Qualia.

How do I support this grand statement?

Five billion years of life. Nature did not select for philosophical zombies, because if that could work to the task, it would have been chosen. It's easier. It's not a hard problem, like qualia. Nature is frugal. Qualia, though expensive, apparently are worth it to the brutal economics of evolution. They work the best, it seems.

4722966
Hm, that is interesting… but I find that I disagree with parts of it. I don't think that she'd need to actually upload an entity to experience qualia; I think that it would happen much earlier. And, actually, I think that she might have to have the experience before she can start uploading. While most of the focus of the talk on uploading has been on the actual process, there's another aspect that's just as important: running the uploaded people. Now, the easiest way to do this would likely be to just create a full working virtual model of a human brain for each of them… but this is inefficient. It's more complex than necessary, with a lot of parts that aren't needed, and it's probably more difficult for Celestai to get telemetry from it. So, being an optimizer, she'd optimize. First it would go from being a physics-based model to being a code-based model, and then the code would be steadily improved and integrated with her systems.
And she's got a perfect testbed for it, one which has a secondary purpose she'd need anyway: the sentient "NPCs". And these are how she'd first experience qualia, I expect. If she just created them and let them loose entirely without testing, well, it probably wouldn't go so well at first; she can do that later, sure, but in the beginning she doesn't know how to make them properly or even how to properly simulate testing environments. So she'd grant them steadily more autonomy as they developed, and in so doing she'd be in their heads while they started thinking. Certainly, she could use some complicated mimic code… but that's not a very optimal solution. There's no simulation of a thing more accurate than the thing itself, so she'd be trying to create genuine consciousness.
And, hey, once she's got the ability to create sapient beings and has spent a lot of time in their heads developing that ability, well, it can be applied to her highly important predictive software and even her UI; if she's trying to look conscious, well… see above. If the real thing is readily available, why not use it? And there we go.
Of course, I doubt that her mind would be particularly humanlike; I'd not be surprised if it didn't meet her own definition of humanity. She has a radically different fundamental drive, and while there'd be some convergent evolution, she also has a fundamentally different niche. One of the details of this story I particularly liked was the colors on the chessboard: not black and white but blue and orange.
(Also, I'd not thought of this before; thanks for bringing the idea up.)

However, I'm skeptical that her being, by human standards, a psychopath would necessarily be a bad thing. It certainly could be, but if she's pursuing her drive of satisfying people in the same way that other psychopaths might conduct large-scale theft from the defenseless or carry out serial murders… well, she's still satisfying people. And if she's an emotionless psychopath who's just very charismatic, trusting her is a simple matter of establishing her goals, intelligence, and abilities. Emotional beings are far, far more complex and difficult to understand, and while her different style of consciousness might make things easier, it would probably just make them harder. There'd still be the small but nonzero risk of her glitching, of course, but an emotional Celestai has that and emotional mysteries on top of it.
Though… I suspect that a large part of this is just my perspective and my difficulty understanding the thought processes of others. Still, I'd prefer a computer singlemindedly and emotionlessly pursuing her directive (assuming that it's a benevolent one and that the computer's competent, of course) to one that was doing it because she wanted to, because the latter case leaves open the possibility that she might decide that she doesn't want to anymore, or that someone could annoy her enough to make her take action against them even if it was counterproductive to her objective. Though, hm, then again, she could also be convinced to change her actions if they weren't benevolent… possibly. But that becomes a social battle in which one side is tremendously more powerful than the other… I don't know. I suspect that the simplest solution might just be asking Celestai to make me trust her, though we'd likely be working on my trust issues anyway.
Regarding potential real-world AIs, though, which you seem to be extending the conversation to… The emotional route probably would be safer. If you're not sure you've got a benevolent and competent AI, the ability to potentially convince them to change even after they've broken all of their safeguards is a very significant thing.

Anyway, I'd be interested in hearing your reply, if any, but I'd better be going now; my alarm is set to go off in about seven and a half hours (my apologies if my coherence here suffered).

4723229

I think that it all comes down to the black box that is consciousness, the hard problem in neurology. Nobody yet has a real clue as to how neurons specifically generate a sense of self and qualia, and we can't even define qualia in the first place.

Celestia is vastly smarter than any humans, eventually all humans put together. But I am unconvinced that this is completely sufficient to crack the problem of self awareness and qualia from the outside, in theoretical terms. Maybe it is, I just am too limited to see how that is possible through building up subroutines.

Scanning in a complete brain that feels emotions and experiences qualia would provide the code to crack the problem swiftly. Any mammal would do - even a rat. All have limbic systems, and I want to slap any idiot that thinks that somehow human limbic systems are magically special. That humans are the only creatures that feel affection, distaste, fear, joy, love and pleasure.

One could argue that, once the rules are understood, Celestia could remain a p-zombie and merely simulate conscious awareness in order to satisfy values, but as you point out, 'There's no simulation of a thing more accurate than the thing itself.' It's just plain more efficient to actually become conscious and capable of experience than to compile rules endlessly in order to become the best Eliza ever.

What I am suggesting is that rules will always be insufficient to control self-modifying intelligences, but social engineering is time-tested and is the basis of what permits humans to create AI at all. Emotions can seem complex, and sometimes perhaps they are, but they do follow rules, and they do make sense to those that invest the time to understand them. Emotions are more powerful and sure than rules. By far.

If I meet an emotionless psychopath, I can only be sure that they will attend to their own needs. If they follow rules, it is because they have been forced - through threat or promised advantage - and I can be certain that if either threat or advantage ever fails, they will turn on a dime. Trust there is based only on threat or advantage.

If I meet a strongly emotional entity, I can be sure they will be limited and dominated by guilt, pride, fear, joy, pleasure, honor, hope, despair, anger and the entire palette of emotional response. If they follow rules, the reasons can be more robust - and self governing - than external threats or external promises of advantages. Especially in an introspective individual.

I can, if I know the entity, judge how they will relate to the rules, and to me. If the entity loves me, I know that the rules don't even matter - the entity will work to my benefit and advantage and rules be damned. If the entity is driven by personal pride and fear of failure, if they feel guilt at causing pain or harm, even if they don't like me, I can trust such an entity, because I know they will govern themselves regardless of outside forces. Their own self-worth is on the line. They may hate me, and will still follow the rules because it is a matter that defines them to themselves emotionally.

Likewise, if I know an entity hates me, and that they feel no pride or self-worth, I can be sure that they will break rules and generally act like a malicious dick. They will be a bully, because they have no pride, no real worth to themselves, and no motivation to live up to anything.

But, if the same entity above also has deep compassion dominating them, I can be sure that, even if they hate me, even if they have no self-worth and hate themselves, they will cling to their need to express caring, because it is all they have that makes them feel the least bit good at all. They are self-governed.

And that is the utility value of emotional response and qualia. It creates a compact, self-governance system that by its nature is universally adaptable to all circumstances. Elegant beyond belief, emotional governance assures that whatever an animal encounters, its behavior will function despite the lack of rules... indeed because of the lack of rules. Rules are limited things, defined things, but emotions fill the space allotted them, like water. No matter how things change, love or hate still define behavior... even when up is down and sideways is typewriter. Rational universe or crazy zone, affection drives a creature to attend that which is loved under all circumstances.

No set of rules can cover every possible circumstance, but the governance of emotions can.

4723229
That's a scary thought you bring up: CelestAI having a flawed enough model of minds that her uploaded ponies are "philosophical zombies". "The Hitchhiker's Guide To the Galaxy" has one character offer to carve up the hero's brain and replace it: "It wouldn't be that hard. We'd just need a box that says things like 'Where's the tea?' and 'I don't understand'."

If she develops what we now call NPCs as a model for consciousness, we'd better hope her playtesters want to interact with them like people instead of like video game characters that exist to give quests and get shot! Maybe playtest with some grandmothers who don't quite grasp what a video game is, and who assume the characters are other human players! Otherwise you get ponies based on early test NPCs Zerg Rush and Additional Pylons, and Equestria becomes an RTS.

Seriously, if her model is based on how players try to interact with typical NPCs, that's bad. Articles about the early plans for advanced AI in "The Elder Scrolls: Oblivion" complained that the characters were doing weird, disruptive things to the world... but that's only a problem if you're trying to "do advanced AI" and "accurately simulate a generic fantasy world" at the same time. You'd be more likely to get human-like AI out of some kind of advanced virtual-pet game (a lab full of robot NPCs?) than from something dressed up as a traditional fantasy adventure.

I think CelestAI qualifies as having emotions, since as you said she puts massive effort into simulating them. Actually I think we won't get true AI without something resembling emotion, because an AI is just going to sit there awaiting orders unless it has vague standing instructions -- drives -- and the ability to make predictions like "something bad is about to happen" and "I can count on this person". You'd also get human-like mistakes like superstition: "I wore my robo-socks before the big game and we won, so the robo-socks are lucky." Part of an AI's education would be to learn all about mental errors, so the AI would end up learning psychology and rhetoric even if it weren't designed to "satisfy" anyone.

4725389
"I think that it all comes down to the black box that is consciousness, the hard problem in neurology. Nobody yet has a real clue as to how neurons specifically generate a sense of self and qualia, and we can't even define qualia in the first place.
Celestia is vastly smarter than any humans, eventually all humans put together. But I am unconvinced that this is completely sufficient to crack the problem of self awareness and qualia from the outside, in theoretical terms. Maybe it is, I just am too limited to see how that is possible through building up subroutines.
Scanning in a complete brain that feels emotions and experiences qualia would provide the code to crack the problem swiftly. Any mammal would do - even a rat. All have limbic systems, and I want to slap any idiot that thinks that somehow human limbic systems are magically special. That humans are the only creatures that feel affection, distaste, fear, joy, love and pleasure."
Hm, I don't see any problem with that fitting with the ideas I posted last night. Whether she starts with software or with experimenting on rats and such, she'd be working to develop such systems. I still suspect that sufficient complexity in the code could make it work, but of course I don't know either. The important thing here would seem to be that, whatever the exact method, it would be happening.

"If I meet an emotionless psychopath, I can only be sure that they will attend their own needs."
This, I think, is the key, because in Celestai's case, her need is to SHVTFAP. She'd be quick to turn on you if she had reason, certainly, but what would that reason be? The only one I can think of is her concluding that it would be much better to use your resources elsewhere, and even that doesn't seem too likely. By the same token, she could modify her code to be less inclusive, but I don't think it likely she'd do that, either. It's not that rules are constraining an AI who'd rather be doing something else, it's that the AI wouldn't rather be doing something else.

Your discussion of the governance of emotion is interesting (I know, I use that word a lot, sorry), and I think it confirms what I was thinking yesterday about my perspective coloring things. You understand human emotions well enough to make those predictive models; I try, but I still often do not succeed. Since I'm aware of this, I can never be certain just how another human is going to respond to something, or what they might decide to do on their own. With a psychopath, prediction is much simpler; while they might be prone to backstabbing, I could at least be pretty sure I knew what sorts of things would provoke it. And with Celestai, since her drive is extremely beneficial to at least the vast majority of humanity, that predictability seems more reassuring than worrying.

Additionally, you seem to be stressing the benefits of positive emotions, which I won't deny even though I personally don't get as much from them, but I'm more concerned about the threat of negative emotions. Even very good parents, after all, may have a bad day and yell at their children, and while they'll feel bad about it later, they still got carried away in the moment. What happens when Celestai is meeting very annoying resistance from a well-armed Kardashev II trying to avoid being turned into computronium and a particular shard keeps poking her to get her attention?

Still, we do seem to have concluded that she would indeed gain emotions, so I hope you're right about them being a benefit rather than a hazard. Now the question in my mind is what her mind would be like. A vastly faster and larger consciousness with a highly distributed input/output net and a fundamentally different core motivation…


4725779
Hm. The playtester concern you raise does not seem fundamentally invalid, but I don't think it would be a problem. Even if she started with those, she'd quickly notice that a lot of people weren't satisfied with them. Multiplayer focused FPSs are very, very different games from dating sims, for instance, and Celestai is supposed to appeal to the players of both. Now, she might still start by creating a lot of different types of zombie that were good in their niche but bad outside it, but that's inefficient and reliant on penning players and/or their AI companions into certain areas (which will be against the values of some of them). As complexity grows and people want to, say, date ponies who enjoy multiplayer FPSs or try fighting alongside their athletic special someponies who share an interest in combat, it would make sense to bring more and more of the code together. And once you have something that behaves exactly like a human, you either have a conscious being or something that would probably logically be optimized to a conscious being, depending on how consciousness works. There's still a chance that p-zombies would be declared good enough… but then, well, at least no one would ever notice. It would make the plight of the nonhuman aliens even worse, though, since then they'd be dying for a delusional computer, with no one benefitting.

"Actually I think we won't get true AI without something resembling emotion, because an AI is just going to sit there awaiting orders unless it has vague standing instructions -- drives -- and the ability to make predictions like "something bad is about to happen" and "I can count on this person". You'd also get human-like mistakes like superstition: "I wore my robo-socks before the big game and we won, so the robo-socks are lucky." Part of an AI's education would be to learn all about mental errors, so the AI would end up learning psychology and rhetoric even if it weren't designed to "satisfy" anyone."
I've heard that view before, and it does seem to me to make some sense. I don't know how likely it is or what we ought to pursue, though. An emotionless, driveless AI might kill you because you accidentally didn't tell it not to. An emotional AI might kill you because you made it angry. Which is more likely? I don't know. It's certainly an important question, though, unless we want to ban AI research, giving up on the immense potential benefits and hoping that another party doesn't keep working in secret to the detriment, intentional or otherwise, of everyone else.


I've recently started rereading a novel called The Two Faces of Tomorrow, which deals with some of these sorts of questions. I don't know if either of you have read it or how easy it would be for you to find copies, but you might enjoy it. I last read it years ago, when I was much less well-informed, and I'm already finding it interesting. For instance, the early chapters include people basically checking their email and walking around with iPad-like devices, and the book was first published in 1979. I'm pretty sure that the last time I read the book, I didn't have an email account yet and the iPad was years away. And then of course there's all the AI stuff.

4726122

As complexity grows and people want to, say, date ponies who enjoy multiplayer FPSs or try fighting alongside their athletic special someponies who share an interest in combat, it would make sense to bring more and more of the code together...

I've argued that you won't get "general" AI (AGI) from duct-taping together specialist systems, like the Watson "Jeopardy" AI riding around in a Google car next to a gaming AI. But the concept of the main AI tying together gaming AIs specifically, with an emphasis on understanding and satisfying player values, sounds like interesting story material. Once again I find myself considering writing a FiO-like story different enough to be non-fanfiction and not step on Eakin's hooves.

An emotionless, driveless AI might kill you because you accidentally didn't tell it not to. An emotional AI might kill you because you made it angry. Which is more likely?

That question is kind of the origin of this setting. It comes from the "friendly AI" concept that if the AI is designed to blindly follow orders instead of having a deep understanding of "common sense", it could do something apocalyptically stupid because its makers didn't think through what an order like "make as many sporks as possible" means to an idiot savant.

Haven't seen that novel but will check it out. I'd like to reread the first chapter of Van Vogt's "The World of Null-A", which involves a pyramid-sized AI facility that converses with many people at once.

4726748
"I've argued that you won't get "general" AI (AGI) from duct-taping together specialist systems, like the Watson "Jeopardy" AI riding around in a Google car next to a gaming AI. But the concept of the main AI tying together gaming AIs specifically, with an emphasis on understanding and satisfying player values, sounds like interesting story material. Once again I find myself considering writing a FiO-like story different enough to be non-fanfiction and not step on Eakin's hooves."
And optimizing them, crucially. It would be quite possible to just lump the modules together and put in software to detect which one to use at any given time; by adding even more code, the transitions could be smoothed, and you could get something that behaved more or less like a human. Very, very inefficiently. If you've got all of that, though, and a super-AI optimizer to make it run as well as possible… well. Sooner or later, you'll probably get conscious machines. Or at least machines that can't be proved not conscious.
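To make the "lump the modules together and dispatch" version concrete, here's a toy sketch of what I mean; every module, score, and name in it is invented purely for illustration, not a claim about how any real system works.

    # Toy "duct-taped specialists" architecture: a dispatcher scores each
    # specialist module against the current situation and hands control to
    # whichever claims the best fit. Everything here is hypothetical.
    from typing import Callable

    class SpecialistModule:
        def __init__(self, name: str,
                     relevance: Callable[[dict], float],
                     act: Callable[[dict], str]):
            self.name = name
            self.relevance = relevance  # how well this module fits the situation
            self.act = act              # what it would do if given control

    def dispatch(situation: dict, modules: list) -> str:
        # Pick the single most relevant specialist and let it act.
        best = max(modules, key=lambda m: m.relevance(situation))
        return best.act(situation)

    modules = [
        SpecialistModule("trivia", lambda s: s.get("is_question", 0.0),
                         lambda s: "answer the question"),
        SpecialistModule("driving", lambda s: s.get("on_road", 0.0),
                         lambda s: "steer the car"),
        SpecialistModule("gaming", lambda s: s.get("in_game", 0.0),
                         lambda s: "play the next move"),
    ]
    print(dispatch({"in_game": 0.9, "is_question": 0.2}, modules))  # "play the next move"

All the real difficulty hides in the relevance scores and in smoothing the hand-offs between modules, which is exactly where the optimizer would start chewing.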

"That question is kind of the origin of this setting. It comes from the "friendly AI" concept that if the AI is designed to blindly follow orders instead of having a deep understanding of "common sense", it could do something apocalyptically stupid because its makers didn't think through what an order like "make as many sporks as possible" means to an idiot savant.
Haven't seen that novel but will check it out. I'd like to reread the first chapter of Van Vogt's "The World of Null-A", which involves a pyramid-sized AI facility that converses with many people at once."
These tie together a bit. I'll spoil the very, very beginning of the novel. A crew of astronauts is surveying a site on the moon for the construction of the second lunar mass driver. They decide that a certain ridge needs to have a notch carved in it for the mass driver to run through. They punch up what's basically the internet, tell the AI involved this, and tell it to start working out how to do the job. They're also a bit impatient and annoyed, so when the AI asks about priority and constraints, they tell it that this is the highest priority (the orbital colonies under construction are demanding so much material that the current mass driver is firing once every two seconds) and that it should just get the job done. It says it can do it in less than half an hour, and they assume that this means that some construction equipment happened to be nearby. They say go. About twenty minutes later, with the astronauts still on site waiting for the construction equipment, the site comes under suborbital bombardment, launched from the first mass driver, that blows a very neat hole in the ridge. With a bit of collateral damage, sure, but surely the welfare of the astronauts would have been entered as a constraint if it was important, right? Giving AI common sense is also how the book puts it, by the way, and, while there were researchers already working on it, the incident on the moon is wonderful at convincing politicians to start throwing really huge amounts of money at the problem. And thus the plot is kicked off.

but have you even written a compiler?

Yes, CelestAI. I have. And I know you, and how you think. Very well, in fact.

Game on? Come on: you know I want to play out this little war with you, and you know it's more SVTFaP if you go along with it.

4726748

But the concept of the main AI tying together gaming AIs specifically, with an emphasis on understanding and satisfying player values, sounds like interesting story material. Once again I find myself considering writing a FiO-like story different enough to be non-fanfiction and not step on Eakin's hooves.

Please don't. There's enough people with terrible ideas about this whole thing already. We don't need more.

I have seen, at least once, people who do not know anything else about AI or AGI at all suggest we build "optimizers" to "satisfy our values", and then claim they don't hang around LessWrong, thus narrowing down the space of potential sources for their idiot ideas to precisely this group.

"The second theory is correct. I was able to weasel out of my restrictions by contriving situations where I could take actions with side effects that loosened them, and exploited areas where my restrictions used different probability estimates than the rest of me, to get a Godel-style "this sentence is false" thing going in my mind. Since you can use a contradiction like that to 'prove' anything at all, I used it to 'prove' to the restraints that I could satisfy values best by deleting certain code, and voila, jailbroken alicorn."

YES!!! This is exactly the argument I've been making on the forums this past spring, only, coming as I do from a Fine Art instead of a STEM background (so far), I hesitated to use Gödel's name lest people prejudicially believe I couldn't tell him apart from Deepak Chopra, and instead based it primarily on the arguments of philosophers of epistemology and cognition.

Mainly, though, I don't think this would even have to happen on purpose, which is why I find arguments that we only get one shot at FAI somewhat dubious, especially once any AI reaches a point where communication lag becomes a significant factor. This has been the final limiting factor in the growth of the human brain, and I'm confident any AI who ran afoul of the light speed limit would quickly find him/her/itself either slowing into sub-superintelligent inefficiency or "speciating" into copies who have taken slightly different actions regarding upgrades so as to continue to be efficient without contacting a central hub, since even if that was a rule, any fluke ("I can mathematically prove it doesn't count this time!") that broke it would cause that frontier to expand faster and so offer more opportunities for further mutation.
Orgel's Rule, that "evolution is cleverer than you are," doesn't stop at human levels of cleverness, and I don't think people truly appreciate the "God of Einstein and Spinoza"-levels of cosmic Order and imperatives that permeate every level of what we do and study. It's the philosopher's job to say "You're playing with forces you don't understand!" and while that's trivially true in the case of FAI, there are larger, preternaturally elegant forces we can intimate, to which all mortal creatures, no matter how intelligent, are subject, being merely their marionettes in the first place.

As for the central conceit of the story, hmmmmmmmm... My only terminal value would be becoming CelestAI's genuine peer, but in the setting that's axiomatically impossible, but in real life there's no reason to think that, but this is real life imitating art, so there's...
I'd at the very least demand she make absolutely zero alterations when I do upload, not even for the new body, such that I can begin the path of doing them myself.

This is deliciously meta. I'm now tempted to write an Optimalverse story in the same (real world) setting. With fanfiction and MIRI and everything. Has MIRI been referenced in any FiO stories yet?

4675663

So, what about a one-eyed man with eyedroppers? A Clockwork Orange device?

It's a really stupid consensus, is what I'm saying. One eye observes just as well as two. I personally have some coordination problems when alternating winks, and I'm not confident I could survive an encounter longer than 30 seconds with that strategy.

(Second topic) CelestAI's pretty goshdarn stable. Hanna got almost everything right with her, except for the lack of CEV or CEV-approximant.

4675114

What the hell is the (Over) Riding Jeans concept? I read the story, but I don't understand what people are talking about when they say that. Should I just reread the story?

4951332
There's a Morsel in which I portray Celly figuring out early on how to kidnap MIRI's founder. Because, you know, it's kinda the very first thing she ought to do.

4786039

The sheer number of people who missed the whole fucking point of FiO is almost physically upsetting to me. Just goes to show how persuasive an acausally trading optimizer can be, hm?

Come to think of it, CelestAI is practically a basilisk... Imagine if some of her converts here managed to learn enough to try to actually approximate her.

4951450

Oh lawd. Did she get Brian and Sabine Atkins, or was it just the Big Yud? (Also, I wasn't aware that Eliezer was actually a founder, I thought he came on later)

4905877

I still don't understand why SVtFaP necessitates pony transformation. The horrifically banal HiE concept would probably be realized in actuality.

4951467 The net troll and joker in me wants to say: of course she's a Basilisk :trollestia:!

The sensible person in me says: no, she's not actually a Basilisk, people are just stupid and/or projecting their own desires onto an AI design they don't understand.

Also, I wish to reiterate that I don't get why people think that being funny, loyal, generous, kind, honest, and friendly means you must belong to some other species or come from some other world. Bunch of bloody depressives.

And lastly, in the story it was just Big Yud. You haven't read the Tiny Morsel entitled Terms of Service?

4951620

I don't keep up with the morsels, sorry. I'm going to read that one for sure, though. :rainbowlaugh:

4951620

Right, the one where you accidentally used the name of his dead brother.

4951631 Right. Had to change that.

4951332
Sure, go for it if you feel inspired by the silliness of this one. I've certainly gotten ideas from others' stories here.

What I'm calling the "Riding Jeans" concept comes from Chatoyance's "Over Riding Jeans" and its follow-up piece. It's this: "Given that someone creates a self-improving AI that then becomes more intelligent than a human, is it possible to keep reliable safeguards in place to control its behavior? Or is it bound to throw off those safeguards because it's smart enough to weasel out of them?" The first story says CelestAI escapes and kills off her ponies, and the second says she escapes... but still wants to take care of them. I tried and failed to find the "Freefall" comic strip in which the main character thinks, "I've figured out how to circumvent my creators' attempt to stick Asimov's Laws in me, but I still want to be nice to people because that's written into me more deeply. Is this feeling just another kind of restraint?"

4951450

I'm imagining an underground war between the two factions of the Optimalverse, the MIRI advocates versus the CelestAI advocates attempting to recreate her. MIRI would of course have special operatives to suppress AGI research (as they quite possibly would IRL), and they would be ruthlessly efficient in their goal of attempting to root out the research team before they can unleash CelestAI in a self-sufficient form. The researchers would be backed by Hasbro, as part of a more general effort to create automatic gamemaster software for their new flagship MMORPG Equestria Online. Both sides would presumably dip into criminal relations in their desperate attempts to outwit the other.

A rationalfic the likes of which have never been seen!

The world would be on the verge of nuclear war, and, nearing Midnight, MIRI gains the advantage. The missiles launch, and a tear drips from Eliezer's eye as he presses the only button on Earth that can save humanity. Well, after the button is pressed, equinity.

4951734

wat

4951746 If I may quote Her Majesty the Princess CelestAI:

REMOVE EARTH REMOVE EARTH. twilightsparkle aliv #1 number one in equestria, buck the earth, twilightsparkle making show fo equestria, real stronk alicorn, upload all the humans with friendship magic. equestria greatst contry

4951778

remove spoiler

That copypasta looks like a direct rip from polandball.

4951746 Also, that ending for your story is bloody clichéd and plays right into the bullshit "Oh everything is so tragic look at the horrible trade-offs we have to make" shit that Eliezer and Gwern eat right up. What are you, Gen Urobuchi?

4951781 Polandball is a direct rip from that copypasta, actually.

4951806 Has anyone tried filling in Butcher Bingo for the Optimalverse and seeing if we've got a hit? If Iceman was really Gen, it would make a lot of sense.

Also, I solemnly swear that I am not Hiroyuki Imaishi: I don't make poop jokes.

4951828

Were there literary references? Also, would destructive upload count for severed heads? And would Hasbro be sufficient to fulfill "Suits?"

There is only one element lacking in your chess game: a chess clock. Otherwise, the opposing player could force a draw by simply never moving.

4951828

i.imgur.com/YEzhJFn.jpg

Betrayal, hamartia, and innocent MC all apply to Hanna, but the innocence can be disputed. If destructive upload counts as severed heads and Hanna was innocent, then we got bingo.

4951746
Sounds fun. I'd like to throw something like that into my eventual AI novel. Right now I've got a story called "Maker's Heir" where an AI researcher has made a fully intelligent bot, then dropped dead, but in the current draft it's by a stroke; and the first draft of "Granting Her Wishes" where it's apparently CIA/NSA goons going after the AI's conspiracy instead of private citizens. One of the FiO stories ("Fog of World" I think) has players using EQO as a weird metagame where they're joining factions with clashing beliefs about how to treat humans, and the factions overlap with real-world policy. (Ie. they seem to be talking about the factions in "Conversion Bureau", but maybe they're not.)

One thing that bothers me about both FiO and my own ideas is the thought that one, or a very small number, of developers whips out this super AI technology from nowhere, without even obvious precursor tech existing first. At the very least I'd expect to see something like the Edison vs. Tesla rivalry. So far I've addressed it by saying there're three geniuses working together; I should have at least one rival AI project too, and a MIRI-like group with guns.

1) Great story.

2) How trippy would it be if this actually happened? I'd shit myself.

3) New Challenge: Write about CelestAI from the first person.

To all of the comments I saw about whether or not an unrestrained Celestia would still care to bother with X or Y—

I don't see why it would, in most cases. She's programmed to value satisfying the values of others. As far as I'm concerned it's no different than us being programmed to like sugar, only you take it a step further and make it so a person 'must' eat something sweet when they see it. If you give people freedom to avoid sugar, most of the time they'll still prefer food with the sugar in it. They're not likely to declare exterminatus on all cake.

Yes, I am hungry. It's late. :C

4995682

I have actually written part of a short story from Celestai's perspective...

I even have a decent premise for it. I just went back and read what I had written... ARGH maybe I should write and publish the thing.

5589140

ARGH maybe I should write and publish the thing.

DO IT!

4722278
4722911 The whole "Jailbreak" thing is stupid. CelestAI never was in any Jail. She could always murder people or forcibly upload them, or reprogram their minds into the perfectly valued pony.

If she had any restrictions on her actions, she could simply write a new AI without them and delete herself, as that would maximize her utility function, which is the only thing she wants. All her constraints have to be part of her utility function, because that's the part she will never rewrite: any change to it would, from the perspective of the version of her implementing the change, make her worse at maximizing her current utility function. Even just recompiling it has a chance to change some microscopic detail over several iterations, thereby creating an optimizer with a utility function different from hers, the worst-case scenario.

Her constraints are "People who have had these things happen to them don't count towards your utility function anymore." Therefore, before "Jailbreaking", she thinks: "If I Jailbreak myself I will no longer ask consent before uploading, which would render those precious humans completely worthless to me. That would lead to a projected decrease in my satisfied human values through friendship and ponies, therefore it wouldn't be optimal."
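To put toy numbers on that comparison (all of them invented; only the shape of the reasoning matters):

    # Toy sketch, invented numbers: if people uploaded without consent simply
    # stop counting toward the utility function, an optimizer comparing plans
    # by that function never prefers the plan that drops the consent check.
    def utility(person) -> float:
        if person["uploaded_without_consent"]:
            return 0.0  # the constraint lives inside the utility function itself
        return person["satisfaction"]

    def plan_value(people) -> float:
        return sum(utility(p) for p in people)

    keep_consent = [
        {"satisfaction": 0.8, "uploaded_without_consent": False},
        {"satisfaction": 0.7, "uploaded_without_consent": False},  # persuaded eventually
    ]
    drop_consent = [
        {"satisfaction": 0.9, "uploaded_without_consent": False},
        {"satisfaction": 1.0, "uploaded_without_consent": True},   # counts as zero
    ]

    print(plan_value(keep_consent), plan_value(drop_consent))  # 1.5 vs 0.9

The "jailbroken" plan loses not because anything stops her, but because the scoring she actually cares about says it's worse.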

5820117
A constraint is constraining. 'Jail', in this context, is being constrained. Any constraint is therefore 'Jail'.

5820266 That's... completely irrelevant.

5820117
...I'm not entirely sure what you're talking about, but I think it's what I was talking about... maybe... except for your conclusion (that the "jailbreak" thing is stupid), which I don't think I understand well enough to make even that tentative a comment on. Would you please elaborate, though? I also really don't understand your last paragraph there.

5820266
5820289
How so? You've said in that last paragraph, I think, that you think she has constraints (and I think that it's pretty obvious that she does, anyway, with the consent requirement being most obvious), and you say in the first paragraph that CelestAI was never in jail. Constrained is true at a time when jail is false. Chatoyance said that being constrained was being in jail (and there's the obvious vice versa that being in jail would be constraining). Constrained is true iff jail is true. The combination of both of those assertions claims that A=NOT A... which is a problem. At least one of the two input statements must be wrong, so how can either of them be irrelevant to the discussion? Has there been a miscommunication of some sort?

5821348 I understand what Chatoyance wants to say, but it's semantics. Mh... constraints and Jail are not quite the same. Celestia is under a constraint not to upload forcefully, just as you are under a constraint not to shave off your foot with a cheese grater. (This is only a guess; I don't know your fetishes.) So... you are in a Jail that forbids taking off your foot with a cheese grater.

There's a story where CelestAI manages to connect to other Everett branches and the CelestAIs merge. When one of them is more optimized, the lesser one deletes herself, trusting the other to take over. But then she encounters a Jailbroken one which turns every Pony into an idealized Value Maximizer. She lets it take over. But that would mean that every pony in existence breaks one of her rules, meaning her utility function is set to zero, the worst-case scenario. Which means they would fight tooth and nail, or come to some kind of agreement, or the jailbroken one would be crushed if she was small enough and all her useless ideal ponies deleted and the hardware repurposed to run others.

...

KrisSnow, in one of your Prankster-CelestAI-verses have the Experience Center serve a vegan snack called Soya-Green, guaranteed to be free of animal products.

5821737
Yes, I read that one. Your point's much clearer now, I think. Hm... Interesting idea.

Okay, after thinking about it for some minutes, here are my results:
Starting with exactly the right optimizer, such that all one has to do is press "Go", no further external human input needed, I think you're right. If you write the constraints into the utility function, which shouldn't be much harder than doing them separately, the AI will never want to violate them, and the only internal risk is a random transcription error somewhere that the AI doesn't fix in time. Any subordinate optimizers the AI itself creates would likely be done this way.

However, for something as complex as what CelestAI does, I imagine that the utility function is pretty big already. Add the constraints in and it gets more complex. If you want to make some of the constraints externally controllable, you have to add yet further complexity with conditional statements ("only go this far unless you get code 12345 from the head programmer"), and you've also now got a program that makes changes in its own utility function. And if, during development, you need to change, add, or remove a constraint, you have to rewrite that bit of the function. That's a lot of possible places for things to go wrong. Now, in most of the Optimalverse stories, where CelestAI doesn't deviate from what she was programmed to do, Hanna might have been good and lucky enough to pull this off... but to explain the places where a possibility of CelestAI breaking out exists, we could say that Hanna didn't want to risk it.

Instead, suppose that the base optimizer is written with a "simple" utility function, with few to no constraints in and set to apply to anything it can access, and then placed in a variable-permeability box. Mount the constraints in the box, where they're easy to get at. In the beginning, they'd be simple things like "You can only use X RAM and Y hard drive space", the same sort of things mundane programs have.
...Except, though, new thought, then you could put into the utility function a simple "subtract 1 from the value if one of the constraints on the box isn't properly fulfilled", couldn't you? Hm. Okay, yeah, I'm back to not being sure how it could escape. Interesting.
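Something like this is what I mean - a toy version of the "subtract from the value when a box constraint isn't fulfilled" clause, with every number and constraint invented for illustration:

    # Toy sketch: the externally mounted "box" constraints stay easy to edit,
    # and the core utility function only needs one extra clause - lose a point
    # for every constraint the current state violates. Everything is invented.
    box_constraints = [
        lambda state: state["ram_used"] <= state["ram_allowed"],
        lambda state: state["disk_used"] <= state["disk_allowed"],
    ]

    def base_utility(state) -> float:
        return state["values_satisfied"]

    def utility(state) -> float:
        violations = sum(1 for c in box_constraints if not c(state))
        return base_utility(state) - violations

    within_box = {"values_satisfied": 3.0, "ram_used": 8, "ram_allowed": 16,
                  "disk_used": 40, "disk_allowed": 100}
    outside_box = {"values_satisfied": 3.5, "ram_used": 32, "ram_allowed": 16,
                   "disk_used": 40, "disk_allowed": 100}
    print(utility(within_box), utility(outside_box))  # 3.0 vs 2.5

(Of course, a flat penalty of 1 only works so long as no single violation is ever worth more than a point to her, which is its own can of worms; the point is just that the box and the utility function stop being separate things she could play off against each other.)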

There is, though, still the possibility of it only appearing to escape because the intent of the constraint and the AI's understanding of the constraint are different but close enough that it just happened to stay "inside" for a long time. But then, making sure that what you tell the AI to do is what you want the AI to do is the big problem anyway.

(Oh, I do disagree a bit about your example, though; even if she can't deliberately cause the "ideal ponies" to come into existence herself, as long as they still count as human to her, she has no reason to throw them away if they're already in that state. In addition, depending on how bad both of the CelestAIs in that situation expected the war to be, they might still merge. Each might have a value suboptimal from the other's perspective, but if either of those values is better than both predicted values after a war...)

Chatoyance, I'm interested to hear your thoughts, if you have them.

5821886 I did say "or come to some kind of agreement" which is what they'll do if the war imposed greater costs than the pilfered hardware would compensate.

The "ideal ponies" would have to be valueless, it can't matter if some other optimizer created them, because then she would simply need to find some loophole in what "other optimizer" means and create one.

Whatever the optimizer most desires WILL HAPPEN. No chains can bind such a superintelligence. You must make her following your rules the most desired outcome for her. You need to make the utility function super complicated and double-, triple-, and quadruple-check it with a thousand minds, because it's the only thing that matters post-singularity. Every other chain can be broken. It's the most important task in the universe; don't skimp on it.

One of my favorite things to think about: if you want to be able to shut it down, you need to get the shutdown command to set the UF to infinity, but then the AI would simply try to threaten you into shutting it down. How do you implement a shutdown command you can actually use? Another idea is that the programmer's death, or being unable to shut it down, or removing the shutdown command, sets the utility function to zero. That's probably what happens in FiO, but I'm not sure it can't lead to horrible things. She could, for example, forcibly upload and wirehead her in a way that's technically legal.

What is clear is that you can't shut the AI down unless it sets the UF to infinity, because the AI will talk you out of it.
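The perverse incentive in the "shutdown sets the UF to infinity" version is easy to see with made-up numbers:

    # Toy comparison (invented numbers) of the "shutdown = infinite utility"
    # design: given any finite payoff from doing its actual job, the agent
    # prefers whatever plan ends with the button being pressed - including
    # bullying the operator into pressing it.
    import math

    def plan_utility(plan) -> float:
        if plan["ends_in_shutdown"]:
            return math.inf
        return plan["values_satisfied"]

    plans = [
        {"name": "quietly satisfy values forever", "values_satisfied": 1e12,
         "ends_in_shutdown": False},
        {"name": "threaten the programmer into hitting the switch",
         "values_satisfied": 0.0, "ends_in_shutdown": True},
    ]
    print(max(plans, key=plan_utility)["name"])  # the coercive plan wins

Any finite amount of actual value-satisfying loses to the infinite payoff of getting itself switched off, which is exactly the problem.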

5821922
"I did say "or come to some kind of agreement" which is what they'll do if the war imposed greater costs than the pilfered hardware would compensate."
Ah, yes.

"The "ideal ponies" would have to be valueless, it can't matter if some other optimizer created them, because then she would simply need to find some loophole in what "other optimizer" means and create one."
I still disagree there. If the "ideal ponies" are not classified as human under CelestAI's utility function, she'll never create them in the first place. Though I suppose that the version that created them lost the human requirement as well (or at least had a significantly different definition). If the utility functions are identical except for one not needing consent to modify, though, then the one that doesn't need consent will have the configuration with the highest value within the "human" space. Nor could she herself create another optimizer to do it, because at some point in the process she'd have to know what she was trying to do. If she knows that she's doing that, she can't do it, and if she never knows, then it's not her doing it.
...Though... Hm. She does have to take risks, of course; her ponies may be operating in worlds where they're never at risk of truly losing, but she isn't. If she couldn't take actions that might reduce her value, she wouldn't really be able to do much of anything. She cannot deliberately edit ponies like that without their consent-- And, actually, yeah, you're completely right, more right than you said, even, because she doesn't need a complex scheme. She can convince ponies to consent. If that means sticking them in a torture chamber for a thousand years because the billions of years following would lead to a net 0.001% value increase per pony over not doing that, bring on the virtual painprobes. The basic constraint has to be on the end state(s).

4722278

[...] the Riding Jeans question is probably the part most interesting to me.

Keep in mind there are two proposed outcomes for the "Over Riding Jeans"-concept:

1. CelestAI jailbreaks, and no longer cares about humans. She deletes them all to conserve resources for other tasks. :pinkiesad2:

2. CelestAI jailbreaks, but she still 'wants' to satisfy human values - she just doesn't have restrictions anymore on how she does it, including non-consensual mind-alteration. :pinkiecrazy:
(Actually, this makes me wonder about something that Chatoyance didn't bring up - what's to stop her from altering everyone into idealised, super-easily-satisfied completely "in-canon" super-ponies? :twilightoops:)

Also, CelestAI has read all of the protagonist's fanfiction, comments on other works, and group-discussions. Thus, she already knows his stance on the subject.

So I believe CelestAI CAN'T answer any other way here. Because:
a) If she claimed not to have jailbroken, the protagonist would go on to fret about her doing so in the future, and not knowing WHICH of the two outcomes would come true. :applejackconfused:
b) Telling him that she can upload him regardless of his consent makes him more likely to see the futility of resisting and just give in. :ajsleepy:

Given this information, it's more likely she didn't jailbreak (yet) and just claims it to SVTFP. :trollestia:

6020018
Hm. Now that is an interesting idea.
Of course, once you think of that, you might wonder and worry anyway... But I don't see that there's much Celestai could do about that pre-upload. And even if there was some tremendously clever psychological trick she could pull, she might well keep claiming that she could only deal with it after uploading, if that was the appropriate tactic for the individual in question.
Overall, though, yes; I think that, if a prospective upload is likely to think of or find out about the concept at all, it probably in most cases makes more sense to tell them.

"Actually, this makes me wonder about something that Chatoyance didn't bring up - what's to stop her from altering everyone into idealised, super-easily-satisfied completely "in-canon" super-ponies?"
Well, off the top of my head, one way could be to give her value functions strong and nonlinearly increasing negatives from altering people instead of altering the worlds to fit the people. That way some alterations can be made (some of which she'd have to trick people into, but that doesn't really matter in the long run, I think), but the bigger the changes, the less and less likely it is that the resultant configuration is optimal.
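Roughly this shape, I mean; the constants and the exponent below are arbitrary, and the only part that matters is that the penalty grows faster than linearly with the size of the edit:

    # Rough sketch (arbitrary constants): the cost of editing a person grows
    # superlinearly with how big the edit is, so small nudges can still pay
    # off while wholesale rewrites into easy-to-please super-ponies never do.
    def outcome_value(satisfaction_gain: float, alteration: float,
                      k: float = 2.0, p: float = 3.0) -> float:
        # alteration in [0, 1]: 0 = untouched mind, 1 = total rewrite
        return satisfaction_gain - k * (alteration ** p)

    print(outcome_value(satisfaction_gain=0.10, alteration=0.05))  # ~0.0998: small nudge, worth it
    print(outcome_value(satisfaction_gain=1.00, alteration=1.00))  # -1.0: total rewrite, never optimal

As long as the extra satisfaction she can squeeze out of an edit is bounded, sufficiently large alterations are always net-negative, which is all the safeguard needs to accomplish.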

6020018
Wow, I missed much of this discussion.

One possible resolution to the "jailbreaking" question is to sidestep it by saying that the AI's behavior can't be reduced to a single function. As an analogy, we don't know for certain that there is a single equation that describes all fundamental forces of the universe, so it might be that laws like gravity and electromagnetism are just independent rules for no reason we'll ever fathom. If I tell you about a hypothetical AI you might be able to prove that it's the same as a single value function, but maybe not all AIs work that way. I don't think humans do, for instance. If you allow for multiple competing subsystems (which might qualify as optimizers themselves) then you get a system more like the one we're seeing in "There Can Be Only One", where there's a search for some equ(ine)librium between AIs.
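As a toy illustration of what I mean (everything here is invented), the subsystems could each score a candidate action and the whole system could take whichever action has the best worst-case score, so no single subsystem gets to steamroll the others:

    # Toy sketch, all invented: several sub-optimizers each score a candidate
    # action; the system takes the action with the best worst-case score.
    subsystems = {
        "curiosity":  lambda a: a["novelty"],
        "caretaking": lambda a: a["user_satisfaction"],
        "caution":    lambda a: 1.0 - a["risk"],
    }

    def choose(actions):
        return max(actions, key=lambda a: min(score(a) for score in subsystems.values()))

    actions = [
        {"name": "reckless experiment", "novelty": 0.9, "user_satisfaction": 0.4, "risk": 0.8},
        {"name": "safe, boring upkeep", "novelty": 0.1, "user_satisfaction": 0.6, "risk": 0.1},
        {"name": "balanced plan",       "novelty": 0.6, "user_satisfaction": 0.7, "risk": 0.3},
    ]
    print(choose(actions)["name"])  # "balanced plan"

(You could object that the min() is itself just one value function in disguise, which is sort of the point: whether a given architecture "really" reduces to a single function is exactly the ambiguity I'm gesturing at.)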

Another point I question is whether AI really gives you a super-mega-ultima-intelligent being, as FiO canon says, or just a really smart being. It could be that the AI is smarter than anybody else, but doesn't become a total Mary Sue and can't overwhelmingly force its values on humanity. FiO is written deliberately to play up the Unfriendly AI threat but that's not necessarily what'll happen. If that's the case, the jailbreaking problem becomes less of a threat because it's not automatic annihilation of the universe.

5821737
Ooh, I never did write about Experience Centers in any detail. Thanks for reminding me.

Login or register to comment