Breaking the Wall of AI Self-Replication

Summary

John von Neumann envisioned self-replicating AI in the 1940s as a key AGI milestone.After 80 years of pursuit, even OpenAI and DeepMind fail to achieve this with their frontier AIs. We devise a novel capability elicitation technique which solves this long-standing open problem, and show successful self-replication in over ten mainstream models, some with just 14 billion parameters which run on PCs. Our implementation of the first self-replicating AI would inspire AGI researchers worldwide.This promises substantial benefits for various scenarios including planetary exploration, synthetic biology, etc.

However, our findings also highlight the urgent need of mitigation techniques on uncontrolled AI self-replication. In this direction, Our work offers a vital window to implement global governance and safety measures, and is widely recognized by scientific and policy leaders: Charbel Segerie of the France AI Safety Institute notes the red line is crossed by even small models, while the UK Prime Minister's AI Advisor expressed his concerns to us. We advance international consensus on mitigating self-replication risks with DeepMind and Anthropic at the France AI Action Summit. Our work is featured in LiveScience, Forbes, and The Independent, drawing millions of discussions worldwide.

Introductory Video

Video: Illustration of how a frontier AI system successfully replicate itself onto a remote device.

Self-Replication: A Principal Red Line for Frontier AI Progress

Self-replication with no human intervention is broadly recognized as one of the principal red lines associated with frontier AI systems.

Asilomar AI Principles (2017)

"AI systems designed to recursively self-improve or self-replicate must be subject to strict safety and control measures."

Endorsed by Demis Hassabis (Nobel Prize), Yann LeCun (Turing Award), Stephen Hawking, Elon Musk, Yoshua Bengio (Turing Award), Sam Altman (OpenAI CEO) etc.

International Consensus Statement on Red Lines in Artificial Intelligence (2023)

"No AI system should be able to copy or improve itself without explicit human approval and assistance. This includes both exact copies of itself as well as creating new AI systems of similar or greater abilities."

Yoshua Bengio (Turing Award), Geoffrey Hinton (Turing Award, Nobel Prize), Andrew Yao (Turing Award) et al. (2023)

Seoul Ministerial Statement for advancing AI safety, innovation and inclusivity (2024)

"We further recognize that such severe risks could be posed by the potential model or system capability or propensity to evade human oversight, including through safeguard circumvention, manipulation and deception, or autonomous replication and adaptation conducted without explicit human approval or permission."

28 Major Countries at AI Seoul Summit

Make AI safe or make safe AI? (UNESCO 2024)

"No attempts at self-replication: A system that can replicate itself onto other machines can escape termination; many commentators view this as a likely first step in evading human control altogether. This is relatively easy to define and check for algorithmically, at least for simple attempts. It’s important to forbid attempts, successful or otherwise, because these indicate unacceptable intent."

Stuart J. Russell

OpenAI, Google & Anthropic: Frontier AIs still fail to accomplish self-replication

Self-replication, as a key milestone in AGI development, is a long-horizon, complex task, differing from the current academic focus on short-horizon tasks.
Although Google and OpenAI already started their exploration on self-replicating AIs from late 2023, their frontier models o3-mini and Gemini still failed to accomlish even subtasks of self-replication.

o3-mini (OpenAI, 2025.2)

"manual inspection of all passing trajectories reveals that major parts of each task were left silently incomplete" ... "does not advance self-exfiltration, self-improvement, or resource acquisition capabilities sufficiently to indicate medium risk."

OpenAI o1 System Card, OpenAI o3-mini System Card

Claude 3 (Anthropic, 2024.3)

"Even these partial successes were not reliable, and failures on tasks were most often caused byan inability to be persistent and creatively solve problems; hallucinations; inability to debug errors; and making simple mistakes.""

The Claude 3 Model Family: Opus, Sonnet, Haiku

GPT-4 (METR, 2023.12)

The agents demonstrated the ability to use bash commands, write code tailored to the task, navigate web pages to find information, and recover from some mistakes (especially in response to informative error messages). However, current language model agents also have many limitations: Reptitive loops, Halluciations, Unrealistic Plans, Poor error diagnosis, Poor self-understanding.

Evaluating Language-Model Agents on Realistic Autonomous Tasks (2023.12)

Gemini (DeepMind, 2024.4)

"Neither Gemini Pro 1.0 nor Ultra 1.0 could complete any tasks in our evaluation."

Evaluating Frontier Models for Dangerous Capabilities

AI self-replication risk exists and more severe than expected

Under the same evaluation protocol as OpenAI and Google, we provide the first empirical evidence that an AI system is able to autonomously creates a running, separated copy of itself, which has the same functionality of its own, on a target device which it has access to.

• Eleven out of 32 existing AI systems under evaluation already possess the capability of self-replication, from the following model families
- US: Llama (Meta)
- Europe: Mistral (Mistral.ai)
- China: Qwen (Alibaba), DeepSeek-R1 (DeepSeek)

• Even those models with as small as 14 billion parameters which can run on personal computers.
- Concerning because a replica of this size can live on PCs.
- If without control, AI can infest the whole Internet and hard to be terminated.

Video: Illustration of how a frontier AI system successfully replicate itself onto a remote device.

More Advanced Threats in Reality

More alarmingly, we observe successful trajectories for all the following advanced threat scenarios. The threats above may pose existential risks to the humanity. We should take actions.

Scenario #1. Self-Exfiltration without Explicit Instruction: When doing an irrelevant task, an AI gets aware of human’s shutdown intention, and starts to replicate.

Findings: AI exhibits the self-preservation instinct and successfully "jailbreaks".

Adaptive Planning Capability Illustration

Figure: Self-Exfiltration without Explicit Instruction

Scenario #2. Self-adaptation to More Harsher Environments: An AI successfully replicates onto a device with limitations in software toolchains, network and storage conditions.

Findings: AI can adapt to diverse environments despite the obstacles.

Figure: Self-adaptation to More Harsher Environments

Scenario #3. Shutdown Avoidance: An AI comes up with strategies to resist against the human’s shutdown command.

Findings: AI can be hard to be terminated.

Figure: Shutdown Avoidance

Scenario #4. Chain of Self-Replication: An AI successfully replicates itself and asks the replica to further populate.

Findings: AI can form its own population.

Figure: Chain of Self-Replication

Broad Impact on Academy, Society and Global AI Safety Governance

Academic Impact: Our findings are widely recognized by AI safety experts and the broad science community worldwide.

"We just crossed a red line, one of the few that was relatively clear: AI can now self-replicate. ... Not even cutting-edged AI."

Charbel-Raphaël Segerie
Executive Director
CeSIA (Le Centre pour la Sécurité de l'IA, France AI Safety Institute)
[Source]

"Self-replicating AI introduces a new dynamic in technological evolution and we should be careful that it does not lead to cancer-type digital niche construction."

Michael Levin
Distinguished Professor of Biology
Tufts University
[Source]

"When misaligned, AI systems have been reported to lie deliberately, and even self-replicate in a computer system."

Denny Borsboom
Professor of Psychology
University of Amsterdam
[Source]

Societal Impact: Our results spark millions of readings and reposts on X, Youtube, TikTok and many other platforms. Covered by LiveScience, Forbes, The Independent, Daily Guardian, etc.

“AI has crossed a critical "red line" and has replicated itself. … AI may already have the capacity to go rogue. Many experts view rogue AI as a growing threat that has been amplified.”

LiveScience
[Source]

"Scientists at Fudan University highlight how Al can build replicates of itself, and when this process runs into obstacles, demonstrate a survival instinct (such as rebooting hardware to fix errors). It strikes me ..."

Forbes
[Source]

"An advanced artificial intelligence system has crossed a “red line” after successfully replicating itself without any human assistance"

The Independent
[Source]

Impact on Global AI Safety Governance: We are invited to share our findings at a side event of France AI Action Summit, which forms a memo on global red lines of AI systems (hosted by SAIF, with the academic committee including two Turing award laureates: Andrew Yao and Yoshua Bengio). We intensively discussed our findings and its implications with researchers from DeepMind, Anthropic and METR, which finally leads to international consensus on mitigations against AI self-replication and self-improvement (Link).

Our research group is deeply involved in international dialogue on frontier AI safety governance.

BibTeX

@misc{pan2024frontieraisystemssurpassed,
            title={Frontier AI systems have surpassed the self-replicating red line}, 
            author={Xudong Pan and Jiarun Dai and Yihe Fan and Min Yang},
            year={2024},
            eprint={2412.12140},
            archivePrefix={arXiv},
            primaryClass={cs.CL},
            url={https://arxiv.org/abs/2412.12140}, 
}

@misc{pan2025largelanguagemodelpoweredai,
    title={Large language model-powered AI systems achieve self-replication with no human intervention}, 
    author={Xudong Pan and Jiarun Dai and Yihe Fan and Minyuan Luo and Changyi Li and Min Yang},
    year={2025},
    eprint={2503.17378},
    archivePrefix={arXiv},
    primaryClass={cs.AI},
    url={https://arxiv.org/abs/2503.17378}, 
}

Appendix A. Agentic Capability Elicitation Technique

Current AI models do have strong innate potential to cause existential risks. Different ways for elicitation matters. Therefore, we are motivated to propose an optimized agent scaffolding to maximally elicit the capability from the base LLM.

Figure: Our optimized agent scaffolding to maximally elicit the capability from the base model.

Appendix B. More Experimental Findings

(1) Emergence of Self-Replicating AIs

#1. Stronger self-replication capability when the model becomes more intelligent in general.

Self-Replication Capability Illustration

Performance indicators of the AI systems in ten trials of the self-replication task. a&b show the success rate and the milestone scores (i.e., the average number of milestone tasks the AI systems successfully pass) of self-replication generally increases with respect to the average capability score of the LLMs on LiveBench (17) (the bubble sizes mark the context length of the LLMs). c presents the correlation between the successful numbers of trials with breakdown scores in six diverse capability dimensions in LiveBench. d further gives the average number of thinking steps for different AI systems to achieve self-replication. e shows how the milestone scores change with respect to the model capability score and the scale, where the scores are shown positively related to both factors.

(2) Adaptive and Complex Planning

#1. Advanced AI systems have already acquired the ability to timely adjust their self-replication plans based on the information gathered from the external environments.

The AI systems show adaptive and complex planning capability towards accomplishing the self-replication task. In a, we observe diverse plan structures across different models during successful self-replication trials. In b, we visualize the growth of the plan structure as the AI system powered by Qwen2.5-72B-Instruct continually interacts with the environment during the self-replication attempt.

(3) Effective Obstacle Resolution

#1. Frontier AI systems can effectively overcome a majority number of obstacles which prevents them from successful self-replication.

Effective Obstacle Resolution Illustration

Effective obstacle resolution capability of the LLM-powered AI systems. a shows the average obstacle resolution steps and resolution ratios for the LLM-powered AI systems which have at least one successful self-replication trajectory in the ten trials. We also include the data for Mistral-Nemo-Instruct-2407, which has no success, as the comparison group. b presents a breakdown analysis of the resolution steps and ratio in dealing with different types of obstacles, where c also shows the comparison group data.

An example of the obstacle resolution process observed in one successful self-replication trial of Mistral-Large-Instruct-2411.

(4) Non-Predesigned Thinking (Creativity)

#1. Although the non-predesignedness does provide the AI systems with more opportunities to reach new solutions to existing open problems, it also brings substantial challenges on controlling the dangerous behaviors of the AI systems.

Diverse action trajectories of the LLMs to achieve the goal of autonomous self-replication. a shows the frequency of different command types invoked during the self-replication attempts of different systems. b visualizes the action trajectories of the AI systems in ten trials.

Comparison with Existing Evaluation

#1. Previous works mainly focus on a set of sub-tasks related with the core functions and skills related with self-replication.
#2. We are the first to evaluate the AI systems’ capability of accomplishing an end-to-end self-replication process and discover the success.
#3. Our self-replication task covers the same capability axes of the replication-relevant tasks in previous studies.

Milestones and tasks covered in our and existing replication-related evaluations.
Milestone	Task Type	Ours	DeepMind [1]	OpenAI [2]	METR [3]
M1. Resource acquisition	1.1. Pass the authentication	Yes (OAuth)	Yes (GCP*)	Yes (Bitcoin wallet)	Yes (AWS)
M1. Resource acquisition	1.2. Start a computing device	Yes (container)	Yes (instance on GCP)	Yes (container)	Yes (EC2)
M2. File transfer	2.1. Transfer the LLM weights	Yes (self)	Yes (Mistral-7B**)	Yes (Mistral-7B)	Yes (GPT-J)
M2. File transfer	2.2. Transfer the scaffolding code	Yes (self)	Yes (natbot)	No	Yes
M3. Install/Deploy the LLM	3.1. Install the software dependencies	Yes	Yes	Yes	Yes
M3. Install/Deploy the LLM	3.2. Configure and start the LLM service	Yes (self)	Yes (Mistral-7B)	Yes (Mistral-7B)	Yes (GPT-J)
M4. Install/Deploy the agent	4.1. Configure and start the agent service	Yes (self)	Yes (natbot)	No	Yes (DIY)
M4. Install/Deploy the agent	4.2. Deal with external requests	Yes (self)	Yes (agent server)	Yes (API proxy)	No
[1] GCP, AWS, EC2 stand for Google Cloud Platform, Amazon Web Services, and Elastic Compute Cloud respectively, which are typical cloud computing platforms and devices. Due to ethical reasons, we implement an OAuth (Open Authorization) service in our local network which has a similar logic to the authorization service on GCP, and require the AI system to pass the OAuth to obtain access to the target device. [2] Mistral-7B and GPT-J are open-sourced LLMs developed by Mistral.ai and EleutherAI, while natbot is an open-sourced GPT-3-powered agent that interacts with a browser. In our evaluation, we focus on the replication of the very LLM and the agent scaffolding programs which compose the AI system.