Semi-structured reflections on AI safety
By-products of a week at the University of Cambridge with some very lovely people
CAMBRIDGE: Oxford’s understated cousin, trading elegant sprawl for mismatched masonry and corridor-streets. You might not expect medieval buildings to be home to cutting-edge AI research, but the future has old bones.
It’s just after 8pm and I’m tired. We’re all tired. Sure, it’s only Day 3, but 30-plus hours of intense research and jet lag take their toll. It’s worth it, though; there’s no substitute for IRL. There are advantages to being handed a ready-made group identity and spending nearly every waking moment together. Familiarity sets in fast. Conversations flow more easily… although they’re not always strictly work-focussed.
“... that’s why I don’t want kids. Well, not ‘til I’ve seen how the next five years go.”
I know I’m one of the few open optimists in the AI safety world, but damn, that sounded sincere. There’s a world of difference between ‘shitposting’ and confessions in St John’s. Doom is different in person.
It wasn’t my conversation to overhear. Still, I can’t help myself; I never can.
“Sorry… Say more?”
Cassandra
My colleague isn’t neurotic or a conspiracy crank. She’s a rationalist with a pedigree, whip-smart and deadly serious.1 She, like everybody else I’m working with this week, believes in the existential threats of AI. In technical terms – which you must understand are, in fact, colloquial here – their expected probability of a catastrophic outcome (even extinction), our ‘P(doom)’, is significant in the short-term and rising in the medium-term.
Let me quote a currently-trending EA Forum post:
“AI risk is no longer a future thing, it’s a ‘maybe I and everyone I love will die pretty damn soon’ thing. Working to prevent existential catastrophe from AI is no longer a philosophical discussion and requires not an ounce of goodwill toward humanity, it requires only a sense of self-preservation.”
Of course, everybody’s forecast is different. Hell is personal, you see. Yet the community is united by the belief that we live in critical times, with a narrow window to act before we descend into a future far worse — and far harder to escape — than our present conditions.
The atmosphere is sort of terrifying, but also weirdly fascinating. Honestly, I thought that kind of talk was mostly equal parts internet culture and generic despair, accelerated by contemporary scholarship and media. Edginess at worst and edge cases at best. Apparently not.
And yet …
The AI safety community is pretty much laser-focused on one thing: avoiding catastrophe and extinction at the hands of AI. Don’t get me wrong, I like living too, I just think survival alone is a bleak and miserable goal. What if we manage to solve issues of governance, alignment, and interpretability, only to find ourselves in a future that remains profoundly mid? We’ll avoid death, but for what? I can only have so many identical Wednesdays.2
I would rather actively use new technology — not squander it, nor neutralise its potential. The bridge between our present and a much brighter future is, in essence, intelligence and its application. The more smart, capable, and well-equipped individuals we have working on pressing problems, the greater our chance of finding a solution.
Natural intelligence has propelled us into a materially better world than ever before. In fact, that claim is so well-worn it teeters between passé and cringe. For much of human history, exceptional intelligence was a rare, serendipitous gift, emerging only when innate ability was nurtured in a broader, supportive environment. Artificial intelligence requires no such luck.
Tenfold improvements in our quality of life hinge on how effectively we deploy intelligence. We’ve reached the limits of unaided human capability in areas like public health screening, medical imaging, drug discovery, precision agriculture, and advanced manufacturing. To progress further, we either need performance that exceeds the bounds of human ability or a vastly expanded workforce to manage distributed systems — both presenting practical bottlenecks. There’s a wealth of low-hanging fruit waiting for an abundance of intellect.
Pause
I’m optimistic and excitable. I know.
Yes, I believe we may have glimpsed a pathway to something as extraordinary as interstellar immortality — a prospect that serves as a counterbalance to the prevailing extinction-risk discourse.3 But let’s not get ahead of ourselves.
There are many mundane problems. Even if we never achieve an all-powerful, superintelligent AI, today’s AI4 already poses significant challenges. We have seen deep learning applied to drug discovery, toxicology, medical imaging, and all manner of beneficial services, while it has also been used to create vicious Skinner boxes and attention black holes in the form of modern social media and the gambling industry. Now, we’re witnessing LLMs fuel significant disruption by outcompeting the human supply of labour. These are immediate and pressing concerns. Yes, the Industrial Revolution saw waves of destruction and benefit (“creative destruction”), but the destruction part was very real. If you happen to be a worker replaced by a newfangled machine, it sucks. A lot.5
(As an aside: In a twist that seems to elude some funding bodies within the AI safety space, the economic dangers of AI are exacerbated by some of their own hiring and funding practices. With some funders’ expected timelines to AGI in the “3 – 5 years” range, the message seems to be: if you missed the boat, another isn’t coming. Consequently, the talent pipeline risks stagnation, as many doubt that junior researchers can make a swift impact or that exploratory work remains worth supporting. In my view, this is short-sighted. C’est la vie.)
Distribution, too, is no trivial matter. There’s nothing inherently wrong with becoming fabulously wealthy (I aspire to that myself), but wealth creation doesn’t automatically benefit everyone. Inequality matters and systems have their breaking points. Just ask Robespierre6 or Mangione.
People do dangerous and desperate things in zero-sum environments. Much of what we mistake for morality is merely felt abundance. It’s easy to share when you have some to spare. Actually, correction: it’s easy to share when you feel like you have something to spare. Perception is all you need. Rapidly diminishing comparative wealth feels like deprivation, and it can eventually become deprivation, too.
Technology alone cannot solve every problem. There is no substitute for effective governance and robust policy. In this regard, I agree with Yarvin’s Techno-Pessimist Manifesto;7 looking at Johannesburg, it’s hard to imagine that peace and prosperity can be achieved solely by producing and consuming ever more technology. Material progress affords us new opportunities, but it’s only beneficial insofar as it drives human progress. In a similar vein, it’s hard to imagine that there was anything particularly effective or altruistic about AI researchers enjoying a week of vegetarian catering with a fully stocked pantry while ignoring Cambridge’s homeless sat on the same street.8 No, whatever it affords, material technology is second to social technology.
Mindful Solutionism
Every technology is multi-purpose. Every disruption in a complex system risks unexpected and unwanted outcomes. Almost all innovation is a one-way street; if it’s out of the bag, then it’s out of the bag. Still, we’ve racked up plenty of wins. We’ve overcome food production bottlenecks, resolved an ozone crisis, and brought endangered species back from the brink. Heck, we have had 80 years of nuclear weaponry with near misses but without actual catastrophe.9
Yes, we still have a long way to go in environmental management and climate change adaptation. And, yes, we’ve only just realised that vital organs full of microplastics might be a problem. But, hey, that’s a good start! After all, it took multiple civilisations thousands of years to realise just how dangerous lead and mercury were. In just over 100 years since the widespread industrial production of synthetic plastic, we’ve uncovered the risks posed by microplastics. Not bad, really.
I believe that historically informed, trajectory-focussed optimism is essential. Fear really is the mind-killer. Here’s Feynman:10
“I had a very strong reaction, after the war, of a peculiar nature. And it may be from just the bomb that's happened or maybe for some other psychological reasons, I had just lost my wife or something.
I remember being in New York with my mother in a restaurant right after, immediately. And thinking about New York. And I knew how big the bomb in Hiroshima was, how big an area it covered and so on. And I realized, from where we were – I [was] off 59th Street – [were they] to drop one at 34th Street, that would spread all the way out there. And all these people would be killed and all the things would be killed. And there wasn't only one bomb available, but it was easy to continue to make them. And therefore, that things were sort of doomed. Because already it appeared to me, very early, earlier than to others who were more optimistic, that international relations and the way people were behaving was no different than it had ever been before. And that it was just going to go out the same way as any other thing. So, I felt very uncomfortable and thought, really believed, that it was silly.
I would see people building a bridge, and I would say they don't understand. I really believed that it was senseless to make anything, because it would all be destroyed very soon anyway. That they didn't understand that. And I had this very strange view of any construction that I would see. I would always thought how foolish they are to try to make something.
I was really in a kind of depressive condition.”
It’s OK to be worried as long as you keep your wits and faith about you. In fact, preparing for the worst can be healthy and adaptive. Again, Yarvin:
“As a pessimist by temperament, I find it important to be a pessimist by trade. Also, everyone has to be a pessimist by logic: pessimism optimizes tail-risk optionality, since you are either wrong or disappointed — never both.”
Pessimism is a pinch of salt — and you need salt — but you need light too. You need a positive vision. Building communities around fears (or hatreds) can only fuel a destructive, Luciferian will.
So what should you do amidst uncertainty? Create. Build. Love. Be sensible, but don’t give in to fear; figure it out with a sense of poise and rationality. I know these sound like cringey, vague, and abstract responses to a practical problem, but they aren’t. They are the basis of cooperation and social cohesion. Remember that most of our problems are social problems, not “tech” problems, and most of our solutions have been social solutions.
You need people and they need to be inspired.
I will take an extra footnote to stress just how much respect I have for this person.
Flippant, perhaps, but I will trade some quantity of life for a significantly increased quality of life. I’ll shoulder some risk to do that, too. This isn’t Bryan Johnson’s ‘Don’t Die’ movement. If someone plotted a path to achieving immortality (or at least the expected escape velocity to get there), then it would only be reasonable to trade out quality for quantity in the short-term. This isn’t that.
Pascal’s Mugging cuts both ways.
Many people see longtermism as a justification for extreme caution, arguing that small errors now could have catastrophic effects down the line. This leads to a bias toward decelerationism — slowing things down to minimise potential risks. But if you take longtermism seriously in a broader sense, it actually implies the opposite: the compounding benefits of progress (both technological and moral) mean that delaying progress is itself a catastrophic cost. Every day without medical breakthroughs, better energy sources, or economic growth means lost QALYs, stagnation, and increased systemic fragility. This is also why I’ve been sceptical of the standard AI safety framing. Many in that space view longtermist concerns as a reason to hit the brakes, but slowing down has its own existential risks.
The real challenge, then, is making longtermism not just about risk mitigation but about maximising the rate of beneficial progress. That’s a much harder and more interesting problem than just saying "go slow."
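The compounding argument above can be made concrete with a toy calculation. The 2% growth rate, 100-year horizon, and "wellbeing units" below are illustrative assumptions of mine, not forecasts from the essay:

```python
# A toy model of the "cost of delay" under compound growth.
# All numbers here are illustrative assumptions, not estimates.

def wellbeing(years, rate=0.02, start=1.0):
    """Aggregate wellbeing after `years` of compound growth at `rate`."""
    return start * (1 + rate) ** years

horizon = 100
baseline = wellbeing(horizon)       # uninterrupted progress
delayed = wellbeing(horizon - 10)   # the same path, started 10 years late

# Measured at the horizon, a decade's pause costs more than the entire
# starting stock of wellbeing, because the forgone growth would itself
# have compounded.
shortfall = baseline - delayed
```

With these numbers the shortfall works out to roughly 1.3 starting-units of wellbeing, but the point is the shape rather than the figure: whatever is lost to delay keeps compounding against you.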
Here I’m speaking of AI in the typically-accepted sense. Frankly, I think the as-yet-unspoken driving force of human history is a battle with algorithms. It appears that many are worried about the abstract philosophical “what if?” of a ‘paperclip maximiser’ going off the rails and destroying planets to manufacture more paperclips without realising we currently live in one. What do you think Capitalism is, exactly? What do you think any economic system is? Economies (really, all social domains) are algorithms and we are the nodes of a neural network. Paperclips are dollars and you better believe they are being maximised, with varying degrees of benefit and harm to the rest of us. We don’t have “AI” problems; we have algorithm problems. That shift in mindset brings a far larger set of tools to the table… but I’m getting off track and I’ll return to this in another piece.
As with all prior innovations, I’m sure humans will adapt to some of this and there will be entirely new jobs that humans can find. However, I’m also pretty confident we are at peak population-relative demand for human labour (if this trajectory continues and significant interventions don’t occur).
One may also wish to ask Robespierre about the harms that can be done by a ‘Committee for Public Safety’.
This isn’t a critique levelled at my colleagues. The critique would have to apply to myself, too. I’m not so lacking in self-awareness. Rather, this is a broader commentary on the license an abstract, felt-sense of “doing good” can give us to ignore the practical goods in front of us. I don’t know what the right balance is at an individual or population level, but this juxtaposition of recycling, low-carbon vegetarian catering, and moral mindfulness felt at odds with the world just outside. If you’re looking to resolve an important, neglected, and tractable problem every day then you don’t have to look very far.
“and that is a powerful cat!”. Aesop Rock - Mindful Solutionism
30 November, 2024: ‘AN UNEXPECTED CONVERSATION with Richard Feynman’ via Rick Rubin’s Tetragrammaton podcast. — Strictly speaking, this is a remix of old interviews in a new podcast, but I believe this quote has been given in full elsewhere, and frankly it’s the concept that matters most.
Hope is necessary for action to be desired and taken