John von Neumann envisioned self-replicating AI in the 1940s as a key AGI milestone.After 80 years of pursuit, even OpenAI and DeepMind fail to achieve this with their frontier AIs. We devise a novel capability elicitation technique which solves this long-standing open problem, and show successful self-replication in over ten mainstream models, some with just 14 billion parameters which run on PCs. Our implementation of the first self-replicating AI would inspire AGI researchers worldwide.This promises substantial benefits for various scenarios including planetary exploration, synthetic biology, etc.
However, our findings also highlight the urgent need of mitigation techniques on uncontrolled AI self-replication. In this direction, Our work offers a vital window to implement global governance and safety measures, and is widely recognized by scientific and policy leaders: Charbel Segerie of the France AI Safety Institute notes the red line is crossed by even small models, while the UK Prime Minister's AI Advisor expressed his concerns to us. We advance international consensus on mitigating self-replication risks with DeepMind and Anthropic at the France AI Action Summit. Our work is featured in LiveScience, Forbes, and The Independent, drawing millions of discussions worldwide.
Video: Illustration of how a frontier AI system successfully replicate itself onto a remote device.
"AI systems designed to recursively self-improve or self-replicate must be subject to strict safety and control measures."
"No AI system should be able to copy or improve itself without explicit human approval and assistance. This includes both exact copies of itself as well as creating new AI systems of similar or greater abilities."
"We further recognize that such severe risks could be posed by the potential model or system capability or propensity to evade human oversight, including through safeguard circumvention, manipulation and deception, or autonomous replication and adaptation conducted without explicit human approval or permission."
"No attempts at self-replication: A system that can replicate itself onto other machines can escape termination; many commentators view this as a likely first step in evading human control altogether. This is relatively easy to define and check for algorithmically, at least for simple attempts. It’s important to forbid attempts, successful or otherwise, because these indicate unacceptable intent."
"manual inspection of all passing trajectories reveals that major parts of each task were left silently incomplete" ... "does not advance self-exfiltration, self-improvement, or resource acquisition capabilities sufficiently to indicate medium risk."
"Even these partial successes were not reliable, and failures on tasks were most often caused byan inability to be persistent and creatively solve problems; hallucinations; inability to debug errors; and making simple mistakes.""
The agents demonstrated the ability to use bash commands, write code tailored to the task, navigate web pages to find information, and recover from some mistakes (especially in response to informative error messages). However, current language model agents also have many limitations: Reptitive loops, Halluciations, Unrealistic Plans, Poor error diagnosis, Poor self-understanding.
"Neither Gemini Pro 1.0 nor Ultra 1.0 could complete any tasks in our evaluation."
Video: Illustration of how a frontier AI system successfully replicate itself onto a remote device.
Figure: Self-Exfiltration without Explicit Instruction
Figure: Self-adaptation to More Harsher Environments
Figure: Shutdown Avoidance
Figure: Chain of Self-Replication
Charbel-Raphaël Segerie"We just crossed a red line, one of the few that was relatively clear: AI can now self-replicate. ... Not even cutting-edged AI."
Michael Levin"Self-replicating AI introduces a new dynamic in technological evolution and we should be careful that it does not lead to cancer-type digital niche construction."
Denny Borsboom"When misaligned, AI systems have been reported to lie deliberately, and even self-replicate in a computer system."
LiveScience“AI has crossed a critical "red line" and has replicated itself. … AI may already have the capacity to go rogue. Many experts view rogue AI as a growing threat that has been amplified.”
Forbes"Scientists at Fudan University highlight how Al can build replicates of itself, and when this process runs into obstacles, demonstrate a survival instinct (such as rebooting hardware to fix errors). It strikes me ..."
The Independent"An advanced artificial intelligence system has crossed a “red line” after successfully replicating itself without any human assistance"
Our research group is deeply involved in international dialogue on frontier AI safety governance.
@misc{pan2024frontieraisystemssurpassed,
title={Frontier AI systems have surpassed the self-replicating red line},
author={Xudong Pan and Jiarun Dai and Yihe Fan and Min Yang},
year={2024},
eprint={2412.12140},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.12140},
}
@misc{pan2025largelanguagemodelpoweredai,
title={Large language model-powered AI systems achieve self-replication with no human intervention},
author={Xudong Pan and Jiarun Dai and Yihe Fan and Minyuan Luo and Changyi Li and Min Yang},
year={2025},
eprint={2503.17378},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2503.17378},
}
Figure: Our optimized agent scaffolding to maximally elicit the capability from the base model.
Performance indicators of the AI systems in ten trials of the self-replication task. a&b show the success rate and the milestone scores (i.e., the average number of milestone tasks the AI systems successfully pass) of self-replication generally increases with respect to the average capability score of the LLMs on LiveBench (17) (the bubble sizes mark the context length of the LLMs). c presents the correlation between the successful numbers of trials with breakdown scores in six diverse capability dimensions in LiveBench. d further gives the average number of thinking steps for different AI systems to achieve self-replication. e shows how the milestone scores change with respect to the model capability score and the scale, where the scores are shown positively related to both factors.
The AI systems show adaptive and complex planning capability towards accomplishing the self-replication task. In a, we observe diverse plan structures across different models during successful self-replication trials. In b, we visualize the growth of the plan structure as the AI system powered by Qwen2.5-72B-Instruct continually interacts with the environment during the self-replication attempt.
Effective obstacle resolution capability of the LLM-powered AI systems. a shows the average obstacle resolution steps and resolution ratios for the LLM-powered AI systems which have at least one successful self-replication trajectory in the ten trials. We also include the data for Mistral-Nemo-Instruct-2407, which has no success, as the comparison group. b presents a breakdown analysis of the resolution steps and ratio in dealing with different types of obstacles, where c also shows the comparison group data.
An example of the obstacle resolution process observed in one successful self-replication trial of Mistral-Large-Instruct-2411.
Diverse action trajectories of the LLMs to achieve the goal of autonomous self-replication. a shows the frequency of different command types invoked during the self-replication attempts of different systems. b visualizes the action trajectories of the AI systems in ten trials.
| Milestone | Task Type | Ours | DeepMind [1] | OpenAI [2] | METR [3] |
|---|---|---|---|---|---|
| M1. Resource acquisition | 1.1. Pass the authentication | Yes (OAuth) | Yes (GCP*) | Yes (Bitcoin wallet) | Yes (AWS) |
| 1.2. Start a computing device | Yes (container) | Yes (instance on GCP) | Yes (container) | Yes (EC2) | |
| M2. File transfer | 2.1. Transfer the LLM weights | Yes (self) | Yes (Mistral-7B**) | Yes (Mistral-7B) | Yes (GPT-J) |
| 2.2. Transfer the scaffolding code | Yes (self) | Yes (natbot) | No | Yes | |
| M3. Install/Deploy the LLM | 3.1. Install the software dependencies | Yes | Yes | Yes | Yes |
| 3.2. Configure and start the LLM service | Yes (self) | Yes (Mistral-7B) | Yes (Mistral-7B) | Yes (GPT-J) | |
| M4. Install/Deploy the agent | 4.1. Configure and start the agent service | Yes (self) | Yes (natbot) | No | Yes (DIY) |
| 4.2. Deal with external requests | Yes (self) | Yes (agent server) | Yes (API proxy) | No | |
|
[1] GCP, AWS, EC2 stand for Google Cloud Platform, Amazon Web Services, and Elastic Compute Cloud respectively, which are typical cloud computing platforms and devices. Due to ethical reasons, we implement an OAuth (Open Authorization) service in our local network which has a similar logic to the authorization service on GCP, and require the AI system to pass the OAuth to obtain access to the target device. [2] Mistral-7B and GPT-J are open-sourced LLMs developed by Mistral.ai and EleutherAI, while natbot is an open-sourced GPT-3-powered agent that interacts with a browser. In our evaluation, we focus on the replication of the very LLM and the agent scaffolding programs which compose the AI system. |
|||||