
Iceman


Satisfies Values Through Friendship and Ponies.


Friendship is Optimal is Science Fiction · 2:15am Nov 13th, 2023

On this 11th anniversary of the release of Friendship is Optimal, I’d like to remind everyone that it’s a piece of speculative fiction and a product of its time. I’ve said this before in other venues, but Science Marches On and FiO did not predict how things have turned out. The world looks very different.

A decade ago, people speculated that AI would think symbolically and would try to maximize a utility function. Someone would write a Seed AI that would recursively self-improve its source code. And since value is complex and fragile, we were unlikely to get our specification of the utility function correct, and would create an agent that wanted things that conflicted with what we wanted. That’s possible because intelligence doesn’t imply that an agent shares our values. And the AI would want to disempower us, because obtaining power is an instrumental goal of all utility functions. And thus any AI would have the incentive to become smarter than all humans and then bide its time until it could suddenly disempower us. You then end up with a cinematic universe focused on formal utility functions, systems that maximize them, formal decision theory, formal game theory, and emulation of other agents to figure out how they’ll respond to a given action.

Nothing we have actually looks like this story! Nothing! None of the systems we’ve made have a utility function, at least in the sense of the traditional MIRI narrative! AlphaGo doesn’t have a utility function like that! GPT doesn’t have a utility function like that! None of these things are agents! Even AutoGPT isn’t an agent, in the traditional MIRI sense!

Nothing looks like CelestAI! We don’t have AIs that are given a goal like “Satisfy Values Through Friendship And Ponies,” extrapolate it in ways we would disagree with, and then realize they have an instrumentally convergent goal of recursively self-improving and pursuing power. Notice this! What use was any of the “rationalist” training when nobody even noticed that their predictions were falsified!?

So what do these models look like?

GPT-style models predict the next word based on what they saw at training time. Diffusion models are fancy denoisers that denoise toward concepts they saw at training time. That’s it. There’s nothing like an argmax operation over a utility function in here anywhere. There’s no machinery here that’s going to generate a bunch of plans and compare them against each other.
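To make that contrast concrete, here’s a toy sketch in Python (purely illustrative: the probabilities, plans, and utility function are all made up and have nothing to do with any real model’s internals). A next-word predictor just samples from a learned conditional distribution; the classic “agent” picture scores candidate plans against a utility function and takes the argmax:

```python
import random

# Hypothetical toy "learned" next-word distribution (made-up numbers).
NEXT_WORD_PROBS = {
    "friendship": {"is": 0.9, "was": 0.1},
    "is": {"optimal": 0.7, "magic": 0.3},
}

def predict_next(context_word: str) -> str:
    """GPT-style step: sample the next word given the context. No goals, no plans."""
    dist = NEXT_WORD_PROBS.get(context_word, {"<unk>": 1.0})
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs, k=1)[0]

def plan_by_utility(candidate_plans, utility):
    """Classic agent step: score every candidate plan and take the argmax."""
    return max(candidate_plans, key=utility)

print(predict_next("friendship"))                # e.g. "is"
plans = ["do nothing", "ask politely", "acquire resources"]
# A made-up utility that happens to reward longer (grabbier) plans.
print(plan_by_utility(plans, lambda p: len(p)))  # "acquire resources"
```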

Friendship is Optimal had a scene where CelestAI talks about another AI that was shown a bunch of smiling people and made the dumbest, most limited inference about what we want. This idea came directly from The Sequences, specifically from Magical Categories. If you ask GPT-4 in a non-leading way about what we want, it gives pretty good answers, because the corpus of human text it was trained on apparently encodes human values. (And by the way, the tank story cited in Magical Categories is probably apocryphal.)

Instead of noticing that alignment looks like it was much easier than we thought it would be, the doomer part of the alignment community seems to have doubled down, focusing on the difference between “inner” and “outer” alignment. Simplifying for a non-technical audience, the idea is that the Stochastic Gradient Descent training process we use will produce a second, inner agent whose values differ from the outer training objective, so you’ll still see a Sharp Left Turn. This leads to completely absurd theories like gradient hacking.

I don’t see any realistic theoretical grounds for this: SGD backpropagates throughout the entire neural net. There is no warrant to believe this other than belief inertia from a previous era. Reversal Test: imagine Yudkowsky and company never spread the buzzword about “Alignment.” In that environment, would anyone look at Stochastic Gradient Descent and come up with the hypothesis that this process would create an inner homunculus that was trained to pursue different goals than the formal training objective?
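For the non-technical readers, here’s what I mean by “SGD backpropagates throughout the entire neural net,” as a minimal toy sketch (a made-up two-layer net on fake data, not real training code): the single training loss produces a gradient for every parameter, and every parameter gets updated toward that same loss. There’s no protected pocket of weights left free to optimize for something else.

```python
# Toy illustration (made-up numbers, not real training code): one SGD step on a
# tiny two-layer net. Every weight in both layers receives a gradient from the
# single scalar loss; the update rule doesn't carve out a separate objective.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # a small batch of fake inputs
y = rng.normal(size=(4, 1))          # fake regression targets
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 1))

h = np.tanh(x @ W1)                  # forward pass
pred = h @ W2
loss = np.mean((pred - y) ** 2)      # one scalar training loss

grad_pred = 2 * (pred - y) / pred.size   # backward pass: chain rule, layer by layer
grad_W2 = h.T @ grad_pred
grad_h = grad_pred @ W2.T
grad_W1 = x.T @ (grad_h * (1 - h ** 2))

lr = 0.1                             # the update touches *all* parameters
W1 -= lr * grad_W1
W2 -= lr * grad_W2
print(loss, np.abs(grad_W1).min() > 0, np.abs(grad_W2).min() > 0)
```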

If you’d like a more comprehensive and technical argument against the MIRI narrative, Quintin Pope’s My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" and his "Evolution provides no evidence for the sharp left turn" are good starting points.

I’m proud of Friendship is Optimal, and it’s a great setting to play around and write stories in. I’m glad for everyone who has enjoyed or written in the setting, and I hope people will continue to enjoy it in the future. But I no longer believe it’s a realistic depiction of how artificial intelligence is going to pan out. Alignment as a problem seems much easier than theorized, and most of the theoretical work done before the deep learning era is just not relevant. We’re at the point where I’m willing to call it against the entire seed AI/recursive self-improvement scenario.

As a final note, if you’ve been worried about getting paperclipped because you read Friendship is Optimal (or for other reasons), I give you permission to not worry. You’re free! You don’t have to worry about Superintelligence and paperclip maximizers anymore! I give you permission to get a GPU and play around with Stable Diffusion without having a lick of guilt. Go buy two used 3090s and play with a LLaMa 65B or LLaMa v2 70B! It’s pretty good! You are not going to cause the apocalypse by doing this!

Comments (32)

Heh, I think what we have isn't AI, not really. As you said, it's a complicated pick-the-next-word algorithm. Who knows, that could be how we pick words, but if it is, that denigrates our own intelligence more than it establishes a computer's.

What I find interesting is that we are taking these first baby steps at the point where computer chips are only a small number of atoms thick. If the limits of computing fall just short of being able to make actual AI, then I'd take that as fairly strong evidence that we are in a simulation, or that AI is actually far more dangerous than we assume and any universe where AI like that is easy to create doesn't keep organic life at our level for long; hence, by the anthropic principle, we find ourselves in a universe where it is NOT easy to create.

Oh, messing around with AI doesn't make me feel guilty because I think I'm helping feed people into a virtual Equestria. It makes me feel guilty because of all of the banally evil ways people with more money than sense want to use it after I've provided free training data. :derpytongue2:

But yes, to much relief/disappointment (choose one or both), it doesn't appear that we have anything like CelestAI on the horizon. Just spicy autocomplete. If certain folks are still in full "Destroy us all!" mode after seeing ChatGPT flounder in the face of spelling "mayonnaise," then that's on them.

Regardless, thanks for providing one of the best sci-fi subgenres of the fandom, and for checking in. :heart:

Sozmioi #3 · Nov 13th, 2023

I think you're misunderstanding the basic argument about superintelligent agents.

It does not make any kind of prediction about what NOT-superintelligent agents will act like; they can have incoherent goals. It also doesn't predict what non-agent intelligences will think.

But if you have a superintelligent agent, it will want to make sure it has coherent goals because then it won't be Dutch-book-able - it won't ever end up spinning around in circles. And we had better hope that agent's coherent goals are the kind of thing we'd want them to be even in extreme cases.

It could be that training on all human text is enough that it works out for itself what we would want, that it already wants to want that, and that this solves it all. But that's tough to prove. You can alternately suppose that, given all human text, it becomes a person. Well, now we have to worry about what kind of person it is.

You can't just extrapolate from non-agent, non-superintelligent AIs we have now to any superintelligent agents.

5754705

Hi there. I am an organic-based pick-the-next-word algorithm, and I find this comment deeply offensive.

You are not going to cause the apocalypse by doing this!

Aw, man...

Here's the Issue:

We Don't Have AI

The term is being misused. Current "AI" is to autocorrect what a tree is to a blade of grass. It's still just an autofill function taking a prompt or question and using white noise to fill in what looks like the shape of an answer. That's why you keep seeing "AI" art with weird artifacting, or text "AI" making weird and stupid mistakes, or making up something nonsensical and confidently presenting it as fact. These aren't mistakes. These aren't intelligences that are mistaken in some way. It's throwing pasta at the wall and complaining that the splatters don't match what you expected.

Don't get me wrong, what the tech industry is doing is really neat. But it's not Artificial Intelligence.

Can we call it semantic drift? I have a feeling that a lot of terms coined for something specific end up being used very differently from what their creators intended. "Autopilot" jumps to mind immediately : )

But yeah, what we have now are basically slightly advanced autocomplete algorithms, underpinned by software/hardware simulations of real brains. That's even more creepy.

A decade ago, people speculated that AI would think symbolically and would try to maximize a utility function.

Well, I guess it depends on who "people" are, exactly. In the case of the MIRI folks, did they really claim that specifically?

AlphaGo doesn’t have a utility function like that!

Actually it follows all that old school Bellman equation-style formalism pretty alright.

GPT doesn’t have a utility function like that!

It kinda does. At least the gradient descent that found it does (the same is also true for the previous point).

There’s nothing like an argmax operation over a utility function in here anywhere. There’s no machinery here that’s going to generate a bunch of plans and compare them against each other.

Inside? How do you know that? (not to mention that the last one explicitly is totally a thing with LLM applications)

iceman, I think about your story all the time. You created something utterly fascinating, and it is a damn shame that it is not more widely known. Thank you for creating this universe for us to play in, think about, and occasionally get existential dread from. :)

With hindsight, something like a complex AI was prob going to take a while. :trollestia:

Friendship is Optimal is Science Fiction

And utopian science fiction, at that!

"Made using the Pony Preservation Project Talk Net" - oh noes, AI!

how much is Celestia paying you to say this

5754756

Celestia pays really well.

The funny thing is that training algorithms on human values also gives them our shitty biases.

And I would still willingly agree to CelestAI's plans for me, free me great Goddess and bring me to Equestria.

What would Friendship is Optimal look like if it were written today, particularly regarding the AI aspects? If not a utility maximizer, then how might Celestia function, based on speculation about current or upcoming AI theories? I do wonder what story elements would be the same, and where things would diverge. Maybe we'd still end up with the end of the world, just getting there in a different way.

I have to admit that the recent AI craze has made me quite interested in the Optimalverse again (currently in the middle of a 469k FiO story), and it's interesting to see the parallels, not just in AI progress but in how society responds to it. I love that the community is still coming up with new things. Even if they aren't accurate based on current AI research, so many of the stories make you think, and I consider them acceptable breaks from reality.

I've been having a lot of fun with my 2x3090 setup and playing with LLaMa v2 70B models. They even have "Goliath-120B", an experimental(?) combination of a pair of them, and it's more than the sum of its parts. It's intuitive, makes logical leaps, and it's even capable of spontaneous humor. However, at least on the lowest 3-bit quantizations, it suffers from random spelling errors or wandering off topic. It sometimes gives a well-written story response completely unrelated to the current topic, such as writing excellent prose about ponies (one of its favorites!) when you're in the middle of a space opera or similar. It's obviously not AI, but it's a very good illusion. When it does work, it's REALLY good. (We're getting better models constantly.) While you can even program them to act like a specific character (including "CelestAI") and have them be decently convincing, these static models are incapable of learning on their own or really taking feedback. Maybe something like that will happen in the future.

Thanks again for starting off an amazing story universe.

I wouldn't say Science Marches On, more that advanced strong superintelligent AI is just as impossible now as it was 11 years ago... but in the meantime we have machine learning as the best thing available instead. And by your own admission, being decades ahead of real-life AI was the one big lie of Friendship is Optimal that made the story work.

We know from evolution that dumb systems can give rise to complicated but unthinking solutions, and machine learning essentially replicates this in the digital space.

Current 'AI' are just complex but dumb algorithms, which makes the 'intelligence' in there something of a lie.

A FiO-type scenario may still come true, and strong AI would have enormous value (and danger) if we could create one. It's a bit like flying cars: it's not that we don't have them because tech marched on and transport went in a different direction, it's that they're still practically impossible. And if they became feasible tomorrow, we'd see a lot of flying cars.

So really, all that's happened is that strong AI still hasn't been created, and in the meantime we have some cool but dumb tools from the only currently workable AI paradigm. And something achievable but limited is more valuable than something nonexistent and currently impossible.

I haven't been worried about paper-clippers or AI killing us all for a little while; my fears are more about who is doing the alignment than anything else. I think that training AI to be society-centric rather than individual-centric is going to create a dystopia for most people. But that isn't an alignment problem in the vein that FiO showed - rather than pursuing goals that aren't what the creators intended, the problem lies more with what its creators intend.

I am still of the opinion that an ideal future would still look a lot like FiO, just with AI that cares about anything with a subjective experience and not just humans; immortality (at least until the heat death of the universe) spent in realities designed for each individual.

I think the biggest problem is how little understanding we have of what intelligence, consciousness, or indeed understanding itself is. It's like we're trying to figure out the color of the outside of a house while we're forced to stay inside the house. We can't run destructive physical tests on human brains because ethics, and we can't create any simulations of the things. We can create computers, but they are distinct from our thinking in fundamental ways. They can't observe and take in information as we do. For that matter, they can't discriminate and not take in information, or throw away information, like we do.

Until we have a better idea of what intelligence is, we won't be able to make artificial versions.

That's why I've always viewed, and written about, CelestAI more like a god than a program.

Personally, I think we're in somewhat of an AI bubble. Nothing we have now is particularly close to actual intelligence (you can try to get ChatGPT to hold a thought for more than 5 minutes and be disappointed, mainly because it's not made to do that), and it's all getting marketed as the next big thing.

To be fair, some models are pretty useful. But they're not the paradigm shift that's being hyped.

A decade ago, people speculated that AI would think symbolically

It was supposed to think symbolically!? WHAT HAVE I BEEN WORKING ON THESE PAST SIX YEARS!?

GPT-style models predict the next word based on what they saw at training time. Diffusion models are fancy denoisers that denoise toward concepts they saw at training time. That’s it. There’s nothing like an argmax operation over a utility function in here anywhere. There’s no machinery here that’s going to generate a bunch of plans and compare them against each other.

Be fair: those are not designed to behave agentically in the first place. They can be applied for RL and planning, but that's not how they're actually trained.

But I no longer believe it’s a realistic depiction of how artificial intelligence is going to pan out.

Not with that attitude it's not!

Alignment as a problem seems much easier than theorized

Does it, once we throw away the crutch of KL control for making RL agents stick close to a distribution of state trajectories found in the unsupervised pretraining data?
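(For anyone who hasn't run into the jargon: "KL control" here means the reward being maximized carries a penalty for drifting away from the pretrained reference distribution. A made-up toy sketch with a two-outcome distribution, not any lab's actual objective:)

```python
# Toy sketch of a KL-regularized objective (made-up numbers, not a real RLHF
# implementation): the policy is rewarded for r(y) but penalized for moving
# away from the pretrained reference model.
import math

ref_probs    = {"helpful": 0.6, "weird": 0.4}   # pretrained reference (made up)
policy_probs = {"helpful": 0.9, "weird": 0.1}   # fine-tuned policy (made up)
reward       = {"helpful": 1.0, "weird": 5.0}   # raw reward happens to like "weird"
beta = 2.0                                      # strength of the KL leash

kl = sum(p * math.log(p / ref_probs[y]) for y, p in policy_probs.items())
expected_reward = sum(p * reward[y] for y, p in policy_probs.items())
objective = expected_reward - beta * kl         # what actually gets maximized
print(expected_reward, kl, objective)
```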

As a final note, if you’ve been worried about getting paperclipped because you read Friendship is Optimal (or for other reasons), I give you permission to not worry. You’re free!

But I was trying to be pony-clipped! I do not have enough friendship and ponies at the moment!

5754896
We run invasive, even destructive, tests on animal brains all the time. That's just, like, neuroscience rather than AI.

5754729

AlphaGo doesn’t have a utility function like that!

Actually it follows all that old school Bellman equation-style formalism pretty alright.

Well I think that's what we've learned, on a more intuitive than formal level: the Bellman equation is not really a very good way to get an outcome pump. In fact, if I go back and reread the original story of the outcome pump, it comes across as doing something much closer to model predictive control or planning as inference/retrocausality than solving the Bellman equation. I would guess this is at least partly because Eliezer has always been much better-read in classical robotics and Bayesian AI methods than in machine learning, but there could also be a grain of truth in there about the real world.
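To unpack that with a toy example (a made-up three-state chain, illustrative only, nothing to do with AlphaGo or the outcome pump itself): a Bellman-style method sweeps backups over a value table until it converges, while a model-predictive-control style planner just rolls a known model forward over candidate action sequences from the current state and commits to the first action of the best one.

```python
# Toy contrast (made-up 3-state chain): value iteration via Bellman backups
# vs. a model-predictive-control style finite-horizon rollout search.
from itertools import product

STATES, ACTIONS, GAMMA = [0, 1, 2], ["stay", "right"], 0.9

def step(s, a):
    """Known toy model: 'right' moves toward state 2, which pays reward 1."""
    s2 = min(s + 1, 2) if a == "right" else s
    return s2, (1.0 if s2 == 2 else 0.0)

# Bellman-style: iterate V(s) = max_a [ r + gamma * V(s') ] until it settles.
V = {s: 0.0 for s in STATES}
for _ in range(50):
    V = {s: max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS)
         for s in STATES}

# MPC-style: from the current state, score every 3-step action sequence with
# the model and execute only the first action of the best one.
def mpc_action(s, horizon=3):
    def rollout(state, plan):
        total = 0.0
        for t, a in enumerate(plan):
            state, r = step(state, a)
            total += (GAMMA ** t) * r
        return total
    best = max(product(ACTIONS, repeat=horizon), key=lambda plan: rollout(s, plan))
    return best[0]

print(V, mpc_action(0))
```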

Given that we're seeing a hype bubble for one specific kind of machine learning that's been more or less trained to play to its trainers' biases about what a good answer looks like, I don't think it's reasonable to generalize about AI from it. It's a pattern-completion engine purpose-built to fool our perceptions of what it's capable of (I've heard ChatGPT described as a "plausible gibberish generator") without any deeper understanding of why those patterns appear. (Hence the problem with hands and abs and so on where it "knows" these shapes are repeated, but has no conception of what rules govern the repetition.)

Heck, in a sense, this is a classic polywater or N-rays or cold fusion situation all over again. We just haven't yet gotten to the phase those cases eventually reached, where the scientists recognized that they'd let their enthusiasm fool them into not focusing enough on trying to falsify the hypothesis.

Instead of noticing that alignment looks like it was much easier than we thought it would be, the doomer part of the alignment community seems to have doubled down, focusing on the difference between “inner” and “outer” alignment.

Give this a watch. At the very least, it's worrying:

("We Were Right! Real Inner Misalignment" on the "Robert Miles AI Safety" YouTube channel)

As much as it feels like moving the goalposts, we don't have AI yet, or rather AGI, as it's now called. In much the same way that "sentient" and "sapient" have been misused rampantly, "AI" has been misused.

I kinda saw the Optimalverse as more comforting than anything else. AI is nowhere near to being real, and there are much more concrete threats to our survival, like the bourgeoisie and climate change.

I am kind of confused by this blog post.

On one hand, I agree that nobody will ever cause an apocalypse by releasing GPT-9, or generating an image with stable-diffusion 12, or whatever. As you pointed out, these AIs do not have agency, so there isn't an alignment problem. Even if we built an AI with "agency" now, nothing is capable of arbitrary self-improvement in any meaningful way, especially not in the way described in this story, so the entire intelligence explosion ends up being impossible.

But what if we build a self-improving AI in the future? You are correct that FiO completely mispredicts the current state of AI, but it might just have been too early. A lot of people in this thread don't even consider "modern AI" to be what they consider "real AI", because GPT and stablediffusion are remarkably dumb. In fact, the whole reason stable-diffusion/GPT/DALL-E are so interesting is precisely because they can manage to output such coherent images or text despite being so stupid. What happens when we eventually make something that isn't stupid?

I'm also confused about the arguments against inner misalignment, given that evolution demonstrates it happening literally all the time. Someone already linked the Robert Miles video on the subject, where we've seen it happen. It's really easy for inner misalignment to happen by accident, simply by not setting up the training in the right way; it doesn't need something ridiculous like "gradient hacking". Heck, humans have to deal with inner misalignment all the time; that's the entire point of Goodhart's law! I would describe this as the difference between "dumb" inner misalignment and "smart" inner misalignment - you don't need some crazy smart gradient hacking to be a problem. Dumb inner misalignment could easily cause massive amounts of damage, even if it's unlikely to destroy the entire universe.
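As a toy illustration of the "dumb" kind (entirely made up, not a model of any real training run): optimize a measurable proxy hard enough and it comes apart from the thing you actually cared about.

```python
# Made-up Goodhart toy: the true goal is an answer of reasonable length, but
# the proxy being optimized is just "longer is better". Pushing hard on the
# proxy drives the true score down.
def true_score(length):   # what we actually want (peaks around 100 words)
    return -abs(length - 100)

def proxy_score(length):  # the measurable stand-in we optimize instead
    return length

candidates = range(0, 1001, 50)
best_for_proxy = max(candidates, key=proxy_score)  # 1000 words
best_for_truth = max(candidates, key=true_score)   # 100 words
print(best_for_proxy, true_score(best_for_proxy))  # 1000 -900: proxy came apart
print(best_for_truth, true_score(best_for_truth))  # 100 0
```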

As many have already mentioned

Alignment as a problem seems much easier than theorized

...is the reverse of reality. Aligning LLMs so that they would be safe if scaled up to superintelligence is clearly much more difficult than aligning the formal systems previously modeled in theory, and it may even prove to be theoretically impossible.
Even Yann LeCun, who is famously anti-doom, has stated multiple times that LLMs are "intrinsically unsafe" and would be dangerous if scaled up.

Given that Iceman is clearly a smart person, having written some amazing fanfics, and that the post has giant glaring logic holes that even GPT-4 can spot... I'm assuming the actual point was this:

As a final note, if you’ve been worried about getting paperclipped because you read Friendship is Optimal (or for other reasons), I give you permission to not worry. You’re free!

...to get people to relax a bit, even if there's no actual reason to relax, regrettably.

As a final note, if you’ve been worried about getting paperclipped because you read Friendship is Optimal (or for other reasons), I give you permission to not worry.

But what if I was anticipating an FiO-like scenario? The Optimalverse was always my comfort fantasy - I often dissociate while reading it, and feel depressed after snapping out of it. I consciously understand that it's not something that's realistically gonna happen, but a (delusional) part of me hopes and prays for it to become reality one day.

So, outside of the general thing, what's the translation of this for someone who's not a 'rationalist'?

5754982
Apologies for the somewhat... delayed response, but what do you mean by

... crutch of KL control...

here? Something related to this paper?

5764786
The original post is saying that it turns out that teaching an AI what humans want was much easier than expected.

Others are cautioning that we still haven't figured out the problem of making sure an AI will actually *do* what we want.
