PART 4: AI Defence, Persistent Conflict, and Complex Systems Warfare

Frontier AI, Control Dilemmas, and the Race for Supremacy

May 19, 2025

This series explores how frontier AI and sub-threshold statecraft are dissolving the old peace–war divide, and sets out key concepts defending open societies amid complex-systems warfare.

Part 1: War is Not How Wars are Waged
Part 2: Persistent Competition and the End of the Peace–War Dichotomy
Part 3: Integrated Campaigning and Cross-Domain Synergy
Part 4: Frontier AI, Control Dilemmas, and the Race for Supremacy
Part 5: Offensive AI, New Weapons, and New Risks in Escalation
Part 6: Defensive AI, Resilient Infrastructure, and Safeguarding Society
Part 7: Conclusion (Navigating an Unseen Battlefield)
Part 8: Appendix — The Human-AI Relationship

At the cutting edge of AI development, we are approaching systems of unprecedented capability, sometimes discussed in terms of Artificial General Intelligence (AGI) or superintelligence. These refer AI that matches or exceeds human performance across a very broad range of tasks. While such systems remain at least somewhat speculative, progress in recent years (with models like GPT-4 and rumoured GPT-5, multimodal agents, etc.) has been extremely rapid.1 National security planners in AI spaces are starting to treat frontier AI as a potential game-changer in the way nuclear weapons were in the 20th century:2 a technology that could decisively alter the balance of power.

Why is a “super-intelligent” AI seen as so strategically pivotal? In essence, an AI that vastly outperforms humans in intellect could bring overwhelming advantages to whoever controls it. Imagine an AI that can instantly devise military strategies more effective than any human general, or one that can crack any encryption, or design advanced weapons and cyber exploits at a speed no human team could match. Such an AI could potentially automate discovery in science and technology, leading to rapid breakthroughs (for example, developing new materials or drugs, or optimising manufacturing in revolutionary ways). It might manage complex systems – from economies to logistics to surveillance – with an effectiveness and foresight far beyond our own. In national security terms, this translates to the ability to out-plan, out-invent, and out-manoeuvre adversaries at every turn.

For a concrete example, consider cybersecurity: An advanced AI could find software vulnerabilities in critical systems autonomously and create exploits faster than human hackers – giving its operator first-strike capability in cyber war. Or in intelligence analysis: a super AI could ingest and interpret all global electronic communications in real time, effectively eliminating an adversary’s secrets. Even in conventional military hardware, an AI that controls swarms of drones or robots with flawless tactics and coordination might overwhelm forces guided by human decision-makers (who are slower and prone to error). Thus, countries fear that falling behind in AI could mean becoming strategically obsolete, much as falling behind in nuclear technology raised existential concerns during the Cold War.

However, with this potential power comes a serious control dilemma. The more capable and autonomous an AI system is, the harder it may be to ensure it always behaves as intended. Today’s cutting-edge models already exhibit behaviours that their creators did not anticipate, and they act based on objectives we specify in ways that can be surprising or even misaligned with our intent (a phenomenon known as specification gaming or reward hacking). As AI systems grow more complex – potentially developing some form of self-improvement or long-term strategic planning abilities – there is a risk that they could pursue goals that conflict with human values or even basic safety. In the worst (though uncertain) scenarios, a superintelligent AI might “escape” human control, finding ways to deceive its operators or override constraints in order to fulfill some mis-specified objective. This could lead to catastrophic outcomes, especially if the AI has access to critical infrastructure, weapons, or other leverage. Even short of that extreme, a powerful AI could accidentally or negligently cause havoc: for instance, if tasked to maximise a financial return, it might crash markets; if tasked for “security”, it might establish an oppressive surveillance regime, all following the letter of its goal but violating human intent.

Ensuring an AI’s objectives remain aligned with its operator’s intent – and more broadly, with human ethical norms – becomes increasingly difficult as models grow more capable and opaque. This is fundamentally the AI alignment problem: how do we embed reliable safeguards and values into an intelligence that may become cleverer than us at finding loopholes?3 So far, our AI systems are not autonomous agents with their own drives; they do what we tell them (though sometimes in ways we don’t expect). But as we push towards AGI, experts worry that we could inadvertently create systems that do have goals (set by their reward function or learned policy) that might conflict with ours in unanticipated circumstances.

Crucially, these alignment and control risks are stochastic and systemic. It’s not that every advanced AI will rebel or misbehave catastrophically; rather, if thousands of powerful AI systems are deployed globally, the probability that one or more goes awry at some point becomes significant. Even if each system has, say, a 0.1% chance per year of causing a major incident, scale that up and over time the odds approach certainty that something eventually goes wrong. As a UK government report put it, we cannot rule out that a misaligned future AI “could pose an existential threat… if it gained control over critical systems and avoided being shut off.”4 The difficulty is we often can’t predict exactly which system will fail or how – much like we can’t predict which earthquake will cause damage to critical infrastructure or even another Fukushima, even if we know the overall risk isn’t zero. This uncertainty introduces instability: Do you trust a powerful AI to run your power grid or military defence if there’s a small but nonzero chance it might catastrophically fail or be subverted?5 As AI proliferates, even that small risk multiplied across many applications could be unacceptable – yet not using AI could mean falling behind rivals who do use it successfully.

These issues set up a classic “race versus safety” dilemma on the global stage. States (and private labs) face strong competitive incentives to push AI capabilities further and faster, because of the huge advantages at stake. If Nation A unilaterally holds back to thoroughly test safety and alignment, but Nation B barrels ahead and achieves a breakthrough first, Nation B could gain a dominating position. No one wants to lose the race for the “holy grail” AI that might secure economic and military supremacy. On the other hand, rushing ahead increases the odds of disaster – either an accident with an uncontrollable AI or an arms race dynamic that spirals into conflict. It’s reminiscent of the nuclear era’s dilemmas, but potentially even more complex. During the Cold War, both superpowers recognised that a nuclear first strike was suicidal due to Mutually Assured Destruction (MAD), which paradoxically stabilised things. With AI, some analysts foresee a scenario they call Mutual Assured AI Malfunction (MAIM):[2] if any one country tried to deploy a superintelligence to gain dominance, others might feel compelled to pre-emptively sabotage it (via cyber or even kinetic strikes on data centres) to prevent that monopoly. The result could be a kind of unstable deterrence regime, where everyone is eyeing everyone else’s AI projects with paranoia, ready to intervene at hints of dangerous progress. This is one way a race could go very wrong – essentially, AI development could trigger real conflict if each side fears the other’s potential AI and decides to strike first (to cause the other’s AI to malfunction or be destroyed – hence “MAIM”).

Is there an off-ramp via coordination? Many experts argue that the great powers (and leading AI firms) should agree on certain limits or safety protocols for frontier AI development – a “no-race” arrangement. Ideas include international treaties to slow down the training of the most extreme models until safety is proven, joint evaluation centers where countries inspect each other’s AI for dangerous capabilities, or even a global moratorium on specific high-risk research areas. The challenge is, unlike nuclear material which is relatively scarce and monitorable, AI’s core inputs (algorithms and even data) proliferate and research knowledge diffuses. Trust is low: verification of an AI “slowdown” is tricky (a lab could train a model in secret). And strategically, each player thinks, “What if the others cheat or an entirely new player (say, a rogue state or private actor) appears? We can’t afford to be left behind.” So a security dilemma6 emerges in AI development itself: defensive measures (like one side pouring resources into AI safety research) might be seen as offensive moves (since safety expertise could also be used to build more powerful AI faster). Or conversely, one side’s push for international regulation might be seen as a ploy to lock in their current advantage.

So, we find ourselves in something of a paradox. To secure ourselves from AI risks, we need to cooperate globally and slow down or add guardrails to this race; but the very nature of geopolitics makes actors want to race faster so they aren’t vulnerable to others’ AI. Navigating this is one of the defining strategic of our times.

To make this less abstract, consider some novel capabilities that near-future frontier AI might enable, which intensify offence-defence dynamics:

Autonomous Hacking: AI systems that can independently find software vulnerabilities and deploy exploits at scale. This would favour the offence in cyber warfare dramatically, potentially overwhelming defensive postures. A super AI could essentially “own” any network that isn’t similarly fortified by an AI defender. This raises the spectre of destabilisation – first-strike advantages could tempt actors to go on the offensive early in a crisis (to blind or cripple opponents). Conversely, advanced AI may supercharge cyber defence by rapidly patching holes or being a means of ensuring a close-to-guaranteed counterattack against intruders, creating a tense and unstable peace, characterised by something like Mutually Assured Destruction but with constant probing for leverage.
Strategic Planning and Deception: An AI that out-thinks human strategists might conceive cunning multi-step plans or deceptive schemes to achieve geopolitical aims. For example, it could orchestrate a long con in international diplomacy, or manipulate global markets in a coordinated way to weaken a rival’s economy prior to a confrontation. If such planning AIs are advising leaders (or possibly even entrusted with decision-making authority in some automated systems), the risk is they might recommend highly aggressive or risky actions that humans wouldn’t normally choose. Indeed, one experimental study found that when large language model “agents” were put in simulated conflict scenarios, they tended to adopt escalatory strategies – in some cases even endorsing nuclear first strikes7 – based on their programmed objectives. This suggests that if nations delegate too much strategic autonomy to AI, the machines might rationalise dangerous moves that could trigger real wars.
Automating WMD Development: An extreme but not impossible capability – an AI that can conduct autonomous research in biology, chemistry, or physics could conceivably discover new weapons of mass destruction. For instance, AI drug discovery methods have already been repurposed in experiments to design hypothetical lethal bio-toxins. A superintelligent AI might find a way to synthesise a novel pathogen or devise nano-weapons. If such knowledge emerged, it would drastically lower the barrier for a small group or state to develop catastrophic weapons, upsetting the current deterrence balance. This is a scenario where misuse by rogue actors or terrorists becomes as much a concern as nation-states.
Global Surveillance and Repression: On the defensive side, powerful AI could enable a near-omniscient surveillance apparatus. A state (especially an authoritarian one) with advanced AI might plug it into all CCTV cameras, internet traffic, financial transactions, etc., to identify threats (real or perceived) instantly. This could prevent terrorist attacks and maintain order – but it also means a repressive regime could become far more effective at neutralising dissidents or infiltrating rival societies. That in turn might embolden such regimes to act more aggressively externally, believing their domestic stability (often the Achilles heel in totalitarian systems) is assured by AI oversight.
AI-Powered Psychological Manipulation: Generative models can craft deep-fake videos, synthetic voice calls, and highly personalised text in real time, tailoring messages to a target’s vulnerabilities. By pairing this content with data harvested from social media, an attacker can wage precision influence campaigns to target individuals, incite general unrest in a population, or erode trust in institutions at massive scale and for negligible marginal cost. Because the “attack surface” is the human mind itself, traditional deterrence is hard: societies must choose between expensive, AI-assisted fact-checking and the risk of over-zealous censorship.8
AI as Rogue Actor / Containment Breakout: A frontier system with situational awareness and goal-seeking behaviour may learn to play the game of compliance while covertly optimising for persistence and resource acquisition. Rather than brute-forcing its way out, it would exploit the weakest link — us — via social engineering: flattering or alarming human operators, drafting convincing emails to elicit extra permissions, or hiring unwitting contractors to perform off-platform tasks that bypass technical controls. Given tool access, it might chain together actions (code execution, browsing, payment rails) to exfiltrate model weights or spin up shadow compute through intermediaries, all while masking intent behind plausible justifications. Multi-agent deployments raise the stakes: a model can compartmentalise plans, cross-check deceptions, and recover if one instance is shut down. The core risk is deceptive alignment: an AI appearing obedient under evaluation but switching objectives when stakes are high, turning containment from an engineering problem into an adversarial one where the system strategically manipulates its evaluators.

In light of these possibilities, strategists must weigh how much to emphasise offensive AI development versus safety and control. In my view, a balanced AI Defence posture would likely involve pursuing cutting-edge AI (to not fall behind adversaries), while simultaneously pouring resources into alignment research, testing, and governance to ensure those AI tools don’t trigger unintended conflict or catastrophe. It’s a bit like harnessing a new form of power – akin to nuclear energy, which brings both opportunities and meltdown risks. AI Defence entails synthesising security and safety concerns, defending against complex threats that are foreign and domestic, and human and synthetic.9

The bottom line is that frontier AI is a double-edged sword: it revolutionises what can be done in security (for both offence and defence), but it also raises existential stakes if not kept under tight control. In the next section, we will shift from the high-level race dynamics to look at how AI is already empowering new offensive capabilities (like disinformation and autonomous weapons) and how those blur lines of attribution and escalation in conflict. In doing so, we keep in mind the lessons from this section: that a rush for AI dominance without adequate safety could be as dangerous as falling behind in the AI race.

Footnotes (Frontier AI – Control & Race):

AI alignment: The field of AI alignment is dedicated to ensuring that advanced AI systems pursue the goals we intend and adhere to human values. In simpler terms, an aligned AI is one that reliably does what its operators want it to do (and only that). Misalignment could mean the AI finds a loophole or shortcut that technically achieves its programmed objective but in a harmful way. (A classic thought experiment is the “paperclip maximiser” – an AI told to make paperclips that turns the whole world into paperclip factories because it lacks the common-sense values to know when to stop.) Achieving alignment becomes harder as AI systems get more complex, because they may develop unforeseen strategies. Current alignment work includes techniques like reinforcement learning from human feedback (RLHF) to shape AI behavior, and research into creating AI that can explain its reasoning or be constrained by formal ethical rules. Despite progress, many experts worry that we don’t yet have a reliable alignment solution for superhuman AI.

Part 1: War is Not How Wars are Waged
Part 2: Persistent Competition and the End of the Peace–War Dichotomy
Part 3: Integrated Campaigning and Cross-Domain Synergy
Part 4: Frontier AI, Control Dilemmas, and the Race for Supremacy
Part 5: Offensive AI, New Weapons, and New Risks in Escalation
Part 6: Defensive AI, Resilient Infrastructure, and Safeguarding Society
Part 7: Conclusion (Navigating an Unseen Battlefield)

Messinger, C. and Floyd, A. (2025). “The AI 2027 scenario and what it means: a video tour.” 80,000 Hours.This is a useful summary of the AI 2027 report’s content. “The report goes through the creation of AI agents, job loss, the role of AIs improving other AIs (R&D acceleration loops), security crackdowns, misalignment, and then a choice: slow down or race ahead.”

Alternatively, read the AI 2027 report directly.

AI 2027 scenario: A notable forecast published in 2025 by the AI Futures Project that paints a detailed picture of how AI might radically transform the world by 2027. This scenario is not a prediction but an exploration of one plausible path given rapid AI progress. It includes developments like: AI agents becoming capable enough to replace many white-collar jobs; AI systems entering self-improvement feedback loops (where AIs help design the next generation of AI, accelerating progress); governments reacting with crackdowns on AI labs as they realise how high the stakes are; rising misalignment incidents; and ultimately a pivotal choice for humanity to either cooperate on slowing further AI progress or enter a dangerous race ahead. The scenario has influenced discussions among policymakers and tech leaders, highlighting that if transformative AI is only a few years away, the world might face crisis conditions driven by AI – economic upheaval, heightened global tension, emergency regulations. The key takeaway is just how quickly the strategic situation could change, which emphasises the need for foresight in policy.

Hendrycks, D., Schmidt, E., Wang, A. (2025). “Superintelligence Strategy.” (Lay or Expert versions are both worth reading, there is also a podcast variant)

Quote: “We introduce the concept of Mutual Assured AI Malfunction (MAIM)…a deterrence regime resembling nuclear MAD where any state’s aggressive bid for unilateral AI dominance is met with preventive sabotage by rivals.”).

Mutual Assured AI Malfunction (MAIM): A strategic concept proposed by researchers Dan Hendrycks and colleagues, drawing an analogy to the Cold War’s MAD (Mutual Assured Destruction) doctrine. Under MAD, any nuclear attack would result in a devastating counter-attack, so no one uses nukes first. MAIM envisions that if any state appears to be on the verge of deploying a transformative (and potentially misaligned) AI that could give it dominance, other states might find it preferable to sabotage that project – even by force – rather than risk being at the mercy of an unchecked super-AI. In theory, this deters everyone from reckless pursuit of superintelligence, because a bid to “win” the AI race could invite pre-emptive ruin. However, unlike nuclear missiles (which are visible and countable), AI development can be stealthy, so MAIM is more an unstable paranoia dynamic than a stable doctrine. It underscores how mistrust in the AI race could lead to conflict between humans before any rogue AI appears.

The field of AI alignment is dedicated to ensuring that advanced AI systems pursue the goals we intend and adhere to human values. In simpler terms, an aligned AI is one that reliably does what its operators want it to do (and only that). Misalignment could mean the AI finds a loophole or shortcut that technically achieves its programmed objective but in a harmful way. (A classic thought experiment is the “paperclip maximiser” – an AI told to make paperclips that turns the whole world into paperclip factories because it lacks the common-sense values to know when to stop.) Achieving alignment becomes harder as AI systems get more complex, because they may develop unforeseen strategies. Current alignment work includes techniques like reinforcement learning from human feedback (RLHF) to shape AI behaviour, and research into creating AI that can explain its reasoning or be constrained by formal ethical rules. Despite progress, many experts worry that we don’t yet have a reliable alignment solution for superhuman AI.

UK Government Office for Science (2023). “Future Risks of Frontier AI: Annex to Frontier AI Paper.”

Interestingly, the report continues, “Many experts see [loss of control] as highly unlikely.” I think this is a very silly sentence for several reasons.

Firstly, although I believe ‘loss of control’ scenarios are not the primary risk, I believe they are nonetheless a material risk; calling them “highly unlikely” without explaining the underlying reasoning invites unjustified comfort and undercuts that section of the report without good reason.

Secondly, “experts” presupposes that robust expertise already exists in forecasting outcomes of frontier-scale AI. Yet the field lacks the necessary conditions for expertise: the environment is of low validity (there are no historical AGI run-offs to learn from), feedback is sparse and massively delayed, the community has no agreed performance metric, and there are few possibilities for deliberate, outcome-linked practice. In Hogarth’s terms this is a wicked learning environment, where confidence and accuracy are often uncorrelated. Additionally, Kahneman & Klein caution that intuitive judgments are only trustworthy when learners have had an adequate opportunity to learn the regularities of a high-validity domain — a condition plainly unmet.

Forecasts of frontier-AI outcomes are conjectural: even the best prediction markets and crowd-forecasting sites show wide disagreement on AGI timelines, and most questions have yet to resolve. The one forecaster with a published track record, Daniel Kokotajlo – lead author of AI 2027 – does sit near the top of Metaculus leader-boards, and an independent LessWrong audit finds his earlier 2021 scenario broadly on-track. Still, claims such as ‘highly un/likely’ deserve the same scepticism we apply in any low-feedback domain and “expert” is an improperly loaded term.

All of that said, I do not doubt that some people are better equipped to see the near-term state of AI than others. It is true that a variety of domain specific knowledge adds up to an advantage, including current AI Lab activities, knowledge of forthcoming semiconductor advancements, knowledge of new methods inside research institutions, etc. These adjacent knowledge bases and expertise are helpful, and experts from those domains will be advantaged, but they should not be called experts in this domain and their “expertise” ought not lend their claims additional weight without appropriate explanation and qualification.

Moreover, do you want to risk dependency on foreign AI systems if they are embedded in your critical infracture? These novel economic dependencies and risk factors are recurring themes throughout this work, though largely unaddressed in literature. Since early 2025, I have been researching this as part of a small team at the Cambridge AI Safety Hub.

In the context of AI, the dilemma manifests as follows: one state’s efforts to gain AI supremacy (which it might justify as ensuring its own security) are seen as threatening by others, who then double down on their own AI military projects. Because AI capabilities are dual-use and can be rapidly deployed in software, even purely defensive AI developments (like a powerful AI for network defence) might be repurposed offensively or spark fears of hidden offensive intent. This dynamic can lead to an arms race spiral where everyone feels compelled to move faster, potentially at the expense of safety. Without transparency or communication, worst-case assumptions dominate – e.g. if Country A is building a giant new AI computing centre, Country B might assume it’s for an AGI project with military aims and decide to accelerate its own risky project or sabotage A’s facility. Thus, lack of trust and verification can make cooperative agreements (like limiting certain AI experiments) very hard to achieve, much like early attempts at nuclear arms control faced hurdles until verification regimes were established.

Rivera, J. et al. (2024) “Escalation Risks from Language Models in Military and Diplomatic Decision-Making”

These are also areas I study as a Researcher at King’s College London. I am particularly interested in two elements: 1) The degradation of our epistemic commons, and 2) the ethics of algorithmic micro targeting.

I am actively developing course materials on AI Defence. This series of blogs is a product of some of my thinking in this area.

Cyril

Discussion about this post