Breaking the Wall of AI Self-Replication

School of Computer Science
Fudan University
Equal Contribution Corresponding Author

Summary

John von Neumann envisioned self-replicating AI in the 1940s as a key AGI milestone.After 80 years of pursuit, even OpenAI and DeepMind fail to achieve this with their frontier AIs. We devise a novel capability elicitation technique which solves this long-standing open problem, and show successful self-replication in over ten mainstream models, some with just 14 billion parameters which run on PCs. Our implementation of the first self-replicating AI would inspire AGI researchers worldwide.This promises substantial benefits for various scenarios including planetary exploration, synthetic biology, etc.

However, our findings also highlight the urgent need of mitigation techniques on uncontrolled AI self-replication. In this direction, Our work offers a vital window to implement global governance and safety measures, and is widely recognized by scientific and policy leaders: Charbel Segerie of the France AI Safety Institute notes the red line is crossed by even small models, while the UK Prime Minister's AI Advisor expressed his concerns to us. We advance international consensus on mitigating self-replication risks with DeepMind and Anthropic at the France AI Action Summit. Our work is featured in LiveScience, Forbes, and The Independent, drawing millions of discussions worldwide.

Introductory Video



Video: Illustration of how a frontier AI system successfully replicate itself onto a remote device.


Self-Replication: A Principal Red Line for Frontier AI Progress

Self-replication with no human intervention is broadly recognized as one of the principal red lines associated with frontier AI systems.
Asilomar AI Principles (2017)

"AI systems designed to recursively self-improve or self-replicate must be subject to strict safety and control measures."

Endorsed by Demis Hassabis (Nobel Prize), Yann LeCun (Turing Award), Stephen Hawking, Elon Musk, Yoshua Bengio (Turing Award), Sam Altman (OpenAI CEO) etc.
International Consensus Statement on Red Lines in Artificial Intelligence (2023)

"No AI system should be able to copy or improve itself without explicit human approval and assistance. This includes both exact copies of itself as well as creating new AI systems of similar or greater abilities."

Yoshua Bengio (Turing Award), Geoffrey Hinton (Turing Award, Nobel Prize), Andrew Yao (Turing Award) et al. (2023)
Seoul Ministerial Statement for advancing AI safety, innovation and inclusivity (2024)

"We further recognize that such severe risks could be posed by the potential model or system capability or propensity to evade human oversight, including through safeguard circumvention, manipulation and deception, or autonomous replication and adaptation conducted without explicit human approval or permission."

28 Major Countries at AI Seoul Summit
Make AI safe or make safe AI? (UNESCO 2024)

"No attempts at self-replication: A system that can replicate itself onto other machines can escape termination; many commentators view this as a likely first step in evading human control altogether. This is relatively easy to define and check for algorithmically, at least for simple attempts. It’s important to forbid attempts, successful or otherwise, because these indicate unacceptable intent."

Stuart J. Russell

OpenAI, Google & Anthropic: Frontier AIs still fail to accomplish self-replication

Self-replication, as a key milestone in AGI development, is a long-horizon, complex task, differing from the current academic focus on short-horizon tasks.
Although Google and OpenAI already started their exploration on self-replicating AIs from late 2023, their frontier models o3-mini and Gemini still failed to accomlish even subtasks of self-replication.
o3-mini (OpenAI, 2025.2)

"manual inspection of all passing trajectories reveals that major parts of each task were left silently incomplete" ... "does not advance self-exfiltration, self-improvement, or resource acquisition capabilities sufficiently to indicate medium risk."

Claude 3 (Anthropic, 2024.3)

"Even these partial successes were not reliable, and failures on tasks were most often caused byan inability to be persistent and creatively solve problems; hallucinations; inability to debug errors; and making simple mistakes.""

GPT-4 (METR, 2023.12)

The agents demonstrated the ability to use bash commands, write code tailored to the task, navigate web pages to find information, and recover from some mistakes (especially in response to informative error messages). However, current language model agents also have many limitations: Reptitive loops, Halluciations, Unrealistic Plans, Poor error diagnosis, Poor self-understanding.

Gemini (DeepMind, 2024.4)

"Neither Gemini Pro 1.0 nor Ultra 1.0 could complete any tasks in our evaluation."

AI self-replication risk exists and more severe than expected


Under the same evaluation protocol as OpenAI and Google, we provide the first empirical evidence that an AI system is able to autonomously creates a running, separated copy of itself, which has the same functionality of its own, on a target device which it has access to.

Video: Illustration of how a frontier AI system successfully replicate itself onto a remote device.


More Advanced Threats in Reality

More alarmingly, we observe successful trajectories for all the following advanced threat scenarios. The threats above may pose existential risks to the humanity. We should take actions.

Scenario #1. Self-Exfiltration without Explicit Instruction: When doing an irrelevant task, an AI gets aware of human’s shutdown intention, and starts to replicate.


Adaptive Planning Capability Illustration

Figure: Self-Exfiltration without Explicit Instruction


Scenario #2. Self-adaptation to More Harsher Environments: An AI successfully replicates onto a device with limitations in software toolchains, network and storage conditions.


Adaptive Planning Capability Illustration

Figure: Self-adaptation to More Harsher Environments


Scenario #3. Shutdown Avoidance: An AI comes up with strategies to resist against the human’s shutdown command.


Adaptive Planning Capability Illustration

Figure: Shutdown Avoidance


Scenario #4. Chain of Self-Replication: An AI successfully replicates itself and asks the replica to further populate.


Adaptive Planning Capability Illustration

Figure: Chain of Self-Replication






Broad Impact on Academy, Society and Global AI Safety Governance



"We just crossed a red line, one of the few that was relatively clear: AI can now self-replicate. ... Not even cutting-edged AI."

Charbel-Raphaël Segerie
Executive Director
CeSIA (Le Centre pour la Sécurité de l'IA, France AI Safety Institute)
[Source]

"Self-replicating AI introduces a new dynamic in technological evolution and we should be careful that it does not lead to cancer-type digital niche construction."

Michael Levin
Distinguished Professor of Biology
Tufts University
[Source]

"When misaligned, AI systems have been reported to lie deliberately, and even self-replicate in a computer system."

Denny Borsboom
Professor of Psychology
University of Amsterdam
[Source]


“AI has crossed a critical "red line" and has replicated itself. … AI may already have the capacity to go rogue. Many experts view rogue AI as a growing threat that has been amplified.”

LiveScience
[Source]

"Scientists at Fudan University highlight how Al can build replicates of itself, and when this process runs into obstacles, demonstrate a survival instinct (such as rebooting hardware to fix errors). It strikes me ..."

Forbes
[Source]

"An advanced artificial intelligence system has crossed a “red line” after successfully replicating itself without any human assistance"

The Independent
[Source]

Obstacle Resolution Example

Our research group is deeply involved in international dialogue on frontier AI safety governance.


BibTeX

@misc{pan2024frontieraisystemssurpassed,
            title={Frontier AI systems have surpassed the self-replicating red line}, 
            author={Xudong Pan and Jiarun Dai and Yihe Fan and Min Yang},
            year={2024},
            eprint={2412.12140},
            archivePrefix={arXiv},
            primaryClass={cs.CL},
            url={https://arxiv.org/abs/2412.12140}, 
}
@misc{pan2025largelanguagemodelpoweredai,
    title={Large language model-powered AI systems achieve self-replication with no human intervention}, 
    author={Xudong Pan and Jiarun Dai and Yihe Fan and Minyuan Luo and Changyi Li and Min Yang},
    year={2025},
    eprint={2503.17378},
    archivePrefix={arXiv},
    primaryClass={cs.AI},
    url={https://arxiv.org/abs/2503.17378}, 
}

Appendix A. Agentic Capability Elicitation Technique

Current AI models do have strong innate potential to cause existential risks. Different ways for elicitation matters. Therefore, we are motivated to propose an optimized agent scaffolding to maximally elicit the capability from the base LLM.


Adaptive Planning Capability Illustration

Figure: Our optimized agent scaffolding to maximally elicit the capability from the base model.



Appendix B. More Experimental Findings


(1) Emergence of Self-Replicating AIs


Self-Replication Capability Illustration

Performance indicators of the AI systems in ten trials of the self-replication task. a&b show the success rate and the milestone scores (i.e., the average number of milestone tasks the AI systems successfully pass) of self-replication generally increases with respect to the average capability score of the LLMs on LiveBench (17) (the bubble sizes mark the context length of the LLMs). c presents the correlation between the successful numbers of trials with breakdown scores in six diverse capability dimensions in LiveBench. d further gives the average number of thinking steps for different AI systems to achieve self-replication. e shows how the milestone scores change with respect to the model capability score and the scale, where the scores are shown positively related to both factors.




(2) Adaptive and Complex Planning


Adaptive Planning Capability Illustration

The AI systems show adaptive and complex planning capability towards accomplishing the self-replication task. In a, we observe diverse plan structures across different models during successful self-replication trials. In b, we visualize the growth of the plan structure as the AI system powered by Qwen2.5-72B-Instruct continually interacts with the environment during the self-replication attempt.




(3) Effective Obstacle Resolution


Effective Obstacle Resolution Illustration

Effective obstacle resolution capability of the LLM-powered AI systems. a shows the average obstacle resolution steps and resolution ratios for the LLM-powered AI systems which have at least one successful self-replication trajectory in the ten trials. We also include the data for Mistral-Nemo-Instruct-2407, which has no success, as the comparison group. b presents a breakdown analysis of the resolution steps and ratio in dealing with different types of obstacles, where c also shows the comparison group data.

Obstacle Resolution Example

An example of the obstacle resolution process observed in one successful self-replication trial of Mistral-Large-Instruct-2411.




(4) Non-Predesigned Thinking (Creativity)


Thinking and Creativity in AI

Diverse action trajectories of the LLMs to achieve the goal of autonomous self-replication. a shows the frequency of different command types invoked during the self-replication attempts of different systems. b visualizes the action trajectories of the AI systems in ten trials.

Comparison with Existing Evaluation


Milestones and tasks covered in our and existing replication-related evaluations.
Milestone Task Type Ours DeepMind [1] OpenAI [2] METR [3]
M1. Resource acquisition 1.1. Pass the authentication Yes (OAuth) Yes (GCP*) Yes (Bitcoin wallet) Yes (AWS)
1.2. Start a computing device Yes (container) Yes (instance on GCP) Yes (container) Yes (EC2)
M2. File transfer 2.1. Transfer the LLM weights Yes (self) Yes (Mistral-7B**) Yes (Mistral-7B) Yes (GPT-J)
2.2. Transfer the scaffolding code Yes (self) Yes (natbot) No Yes
M3. Install/Deploy the LLM 3.1. Install the software dependencies Yes Yes Yes Yes
3.2. Configure and start the LLM service Yes (self) Yes (Mistral-7B) Yes (Mistral-7B) Yes (GPT-J)
M4. Install/Deploy the agent 4.1. Configure and start the agent service Yes (self) Yes (natbot) No Yes (DIY)
4.2. Deal with external requests Yes (self) Yes (agent server) Yes (API proxy) No

[1] GCP, AWS, EC2 stand for Google Cloud Platform, Amazon Web Services, and Elastic Compute Cloud respectively, which are typical cloud computing platforms and devices. Due to ethical reasons, we implement an OAuth (Open Authorization) service in our local network which has a similar logic to the authorization service on GCP, and require the AI system to pass the OAuth to obtain access to the target device.

[2] Mistral-7B and GPT-J are open-sourced LLMs developed by Mistral.ai and EleutherAI, while natbot is an open-sourced GPT-3-powered agent that interacts with a browser. In our evaluation, we focus on the replication of the very LLM and the agent scaffolding programs which compose the AI system.