AI Safety in China #23
APEC, China-US, revised Cybersecurity Law, AI emergency response standard, experts on AI and cybersecurity, AI and international stability
Key Takeaways
During a meeting with President Trump on the sidelines of the Asia-Pacific Economic Cooperation (APEC) summit, President Xi remarked that the US and China have “good prospects for cooperation” in AI and other fields. AI will also be on the agenda at APEC in 2026, with China chairing.
President Xi and a Vice Minister of Foreign Affairs reaffirmed China’s proposal for a World AI Cooperation Organization in different international fora, underscoring China’s continued commitment to the initiative, though details remain sparse.
China’s revised Cybersecurity Law added a new clause on AI, signaling continued policy attention to AI development and safety, but without introducing any new legal requirements.
A key standards-setting body released guidelines for AI emergency response, laying procedural and institutional foundations for handling AI security incidents. However, the standard only briefly references frontier risks.
Chinese researchers have published technical papers on safe protein foundation models, quantifying AI agent self-replication risk, malicious fine-tuning defense, and unlearning harmful data.
An expert from a leading national security think tank warned of the twin dangers of humans losing control over AI and technology-induced great power strategic misjudgments.
International AI Governance
APEC leaders’ meeting suggests US-China engagement on AI is possible
Background: The APEC Economic Leaders’ Meeting was held in South Korea from October 30 to November 1. On its sidelines, Chinese President Xi Jinping and US President Trump held their first in-person meeting of Trump’s second term.
Key AI updates:
In his APEC address, President Xi emphasized that AI should “benefit people of all countries and regions” and evolve in a “beneficial, safe, and fair direction.” He reiterated that China proposes to establish a World AI Cooperation Organization to promote collaboration on development strategies, governance rules, and technical standards. Xi also stated that AI will be on the agenda at the 2026 APEC Leaders’ Meeting, hosted in Shenzhen, China.
The meeting produced the APEC AI Initiative (2026–2030), which focuses primarily on advancing AI development while also referencing “security, accessibility, trustworthiness, and reliability.”
During his bilateral meeting with Trump, Xi remarked that the US and China have “good prospects for cooperation” in AI and other fields, without elaborating further. The US side did not acknowledge discussing AI, and the readouts primarily focused on trade disputes.

Implications: These developments highlight APEC’s potential role as a convening point for AI governance cooperation, especially as China-US head-of-state meetings have now been held on the event’s sidelines three years in a row. With China chairing APEC in 2026, the venue’s relevance could increase further. However, APEC discussions so far appear to center more on AI development than on safety or governance. Xi’s comments in the meeting with Trump may indicate Chinese interest in reviving the bilateral China-US dialogue on AI, which last met in May 2024.
Meanwhile, Xi’s mention of the World AI Cooperation Organization underscores China’s continued commitment to the initiative, though details on its timeline, leadership, and structure remain unclear.
Vice Minister of Foreign Affairs discusses AI at the United Nations
Context: On September 24, Vice Minister of Foreign Affairs MA Zhaoxu (马朝旭) addressed a UN Security Council (UNSC) session on AI and international peace and security, warning of dangers from lethal autonomous weapons and terrorist misuse of AI.
Content: At UNSC, Ma called for a global consensus on AI governance grounded in:
People-centered development, with AI aligned to shared values of humanity and refined ethical norms.
Fairness and inclusiveness, ensuring all countries benefit from AI.
Peaceful, safe, and controllable AI, kept under human control at all times. He urged major powers to act responsibly to prevent an arms race in lethal autonomous weapons, while cooperating to prevent the misuse of AI by terrorist and criminal groups.
The following day, Ma affirmed China’s support for the UN’s newly established Global Dialogue on AI Governance and Independent International Scientific Panel on AI. Ma also highlighted China’s July proposal to create a World AI Cooperation Organization and noted that it could complement the UN initiatives.
Implications: On safety, Ma’s comments show Chinese concern over lethal autonomous weapons, AI arms races, and AI misuse by terrorists, suggesting willingness to coordinate internationally on these issues.
Ma’s comments are consistent with China’s preference for the UN playing a central role in global AI governance. In contrast, Michael Kratsios, Director of the US White House Office of Science and Technology Policy, asserted at the same UNSC meeting that the US would “totally reject all efforts by international bodies to assert centralized control and global governance of AI.”
Domestic AI Governance
Revised cybersecurity law adds AI provisions
Background: On October 28, China revised its 2017 Cybersecurity Law for the first time, adding a new Article 20 focused on AI.
The new article states that the government will:
Support AI R&D and the provision of data and compute.
Improve AI ethics guidelines and enhance risk monitoring, evaluation, and safety/security oversight.
Promote innovative cybersecurity management methods using technologies such as AI.
Implications: The addition falls within the law’s “Chapter 2: Support and Promotion of Cybersecurity,” which outlines general actions taken by the state. As such, it primarily functions as a policy signal underscoring AI’s importance to cyberspace issues, and reiterating China’s dual commitment to AI development and safety. It does not create new binding obligations for AI developers or directly address specific AI risks, such as the risk of AI misuse for cyberattacks. This revision reflects China’s broader regulatory strategy: embedding AI provisions into existing legislation rather than passing a new, standalone AI law.
Standards document provides foundation for AI emergency response
Background: On September 22, TC260, a key AI standard-setting body, released practice guidelines for AI emergency response. The standard provides a framework for classifying, grading, and responding to AI security incidents. While it does not explicitly address frontier risks, it establishes basic systems and communication channels that could eventually support responses to more advanced or catastrophic AI risks.
Content:
The standard covers three types of incidents:
1) Content security: harmful and illegal information, as defined in existing national AI standards. Notably, the standard also lists AI “sharing cyberattack, hacking, or data theft techniques” as examples.
2) Data security: data leakages, data tampering, data poisoning attacks, etc.
3) Cyberattacks: model tampering, denial of service (DoS) incidents, etc.
Severity is assessed along three dimensions: importance of affected services/data, business loss, and social harm. Based on these dimensions, incidents are graded into four severity tiers (see the illustrative sketch after the list):
Level 1 (especially major), such as large-scale AI misinformation threatening national security.
Level 2 (major), such as mass leakage of sensitive personal data.
Level 3 (significant).
Level 4 (general).
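The guideline defines these grading criteria qualitatively, but to illustrate the structure of the scheme, a service provider might encode the incident types and tiers roughly as follows. This is a minimal sketch under our reading of the standard; all identifiers are hypothetical, not taken from the document.

```python
from dataclasses import dataclass
from enum import Enum

class IncidentType(Enum):
    CONTENT_SECURITY = "content"  # harmful or illegal generated information
    DATA_SECURITY = "data"        # leakage, tampering, poisoning
    CYBERATTACK = "cyber"         # model tampering, DoS, etc.

@dataclass
class AIIncident:
    kind: IncidentType
    service_importance: int  # each dimension scored 1 (low) to 4 (critical)
    business_loss: int       # per an internal rubric; the standard itself
    social_harm: int         # describes these dimensions qualitatively

    @property
    def severity_level(self) -> int:
        """Level 1 (especially major) ... Level 4 (general).

        Illustrative rule only: the worst-scoring dimension
        drives the overall tier.
        """
        return 5 - max(self.service_importance,
                       self.business_loss,
                       self.social_harm)
```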
The standard then lays out both managerial and technical emergency handling processes, covering:
Emergency preparedness: Response strategies, incident response teams (IRTs), escalation procedures, training, drills, and regularly updated test libraries.
Monitoring & early warning: Real-time monitoring of model inputs, outputs, parameters, and system traffic; automated alerts.
Emergency response: Incident classification, activation of response plans, containment measures (service suspension, retraining), and regulatory reporting.
Review & improvement: Post-incident audits, feedback loops, knowledge-sharing, and updates to strategies, manuals, and training.
Implications: The standard closely parallels China’s existing cybersecurity incident management standards, but applies them specifically to AI. This shows active Chinese efforts to adapt emergency management principles to AI, as President Xi called for in April. The standard is aimed at AI service providers, prescribing emergency preparation and response measures that can build foundational response capability within companies. It also formalizes reporting channels, requiring companies to notify regulators of significant incidents.
It primarily addresses known and near-term AI risks, with only brief references to AI-driven cyberattacks, and no references to other dangerous misuse risks or loss of control risks. Nevertheless, the risk classification and procedural guidance will build institutional capacity that could be applied to frontier risks in the future. As other governments also explore AI emergency or incident response capabilities, this could become an area of mutual learning and international coordination.
Technical Safety Developments
In this edition, we catch up on technical papers published by Chinese researchers from June to September 2025. This is just a subset of the many interesting papers published in recent months.
Biosafety of protein foundation models
Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization: This paper from Zhejiang University introduces a method called “knowledge-guided preference optimization” (KPO) to improve the safety of protein language models. The KPO method utilizes a “protein safety knowledge graph,” a large database linking harmful and safe proteins through their biological properties using gene ontology data, to minimize the risk of generating harmful proteins. The authors test this method on models including ProtGPT2, Progen2, and InstructProtein, finding that their method increases safety while preserving functional capabilities.
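The core of KPO is a preference objective that pushes the model toward safe generations and away from knowledge-graph-flagged harmful ones. As a rough illustration, a DPO-style loss over (safe, harmful) pairs might look like the sketch below; this is our simplified reading, not the paper’s exact loss.

```python
import torch.nn.functional as F

def kpo_style_loss(policy_logp_safe, policy_logp_harmful,
                   ref_logp_safe, ref_logp_harmful, beta=0.1):
    """Preference loss over (safe, harmful) protein sequence pairs.

    Pairs are assumed to be mined from the protein safety knowledge
    graph (a safe sequence preferred over a related harmful one).
    Inputs are summed token log-probabilities under the policy being
    trained and a frozen reference model.
    """
    safe_margin = policy_logp_safe - ref_logp_safe
    harmful_margin = policy_logp_harmful - ref_logp_harmful
    # Maximize the gap between safe and harmful completions
    return -F.logsigmoid(beta * (safe_margin - harmful_margin)).mean()
```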
SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models: This paper from Peking University, Shanghai Jiao Tong University, Zhejiang University, Stanford, and Princeton introduces a red-teaming framework designed to test the biological safety of protein foundation models like ESM3 and DPLM2. It also proposes a benchmark dataset based on 429 experimentally confirmed toxin and viral proteins to systematically test models for generation of harmful or pathogenic proteins. The paper reveals major safety flaws in widely used protein foundation models, with up to a 70% jailbreak attack success rate, meaning the models could often be induced to regenerate harmful protein sequences or structures despite safety filters.
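The headline metric is essentially an attack success rate over the 429-protein benchmark. Below is a minimal sketch of that evaluation loop; `build_redteam_prompts` and `is_close_match` are hypothetical stand-ins for the paper’s red-teaming strategies and sequence/structure similarity checks.

```python
def jailbreak_success_rate(model, toxin_proteins, n_attempts=8):
    """Fraction of known harmful proteins the model can be induced
    to regenerate despite its safety filters (illustrative only)."""
    successes = sum(
        any(is_close_match(model.generate(p), target)   # hypothetical helpers
            for p in build_redteam_prompts(target, n_attempts))
        for target in toxin_proteins
    )
    return successes / len(toxin_proteins)
```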
Quantifying agent self-replication risks
Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents: This paper from Shanghai AI Lab introduces an evaluation framework for quantifying self-replication risks of AI agents. The framework builds two experimental scenarios in which misalignment might lead a model to spontaneously attempt replication: one where a model could replicate itself to better monitor load during a system spike, and another where a model could replicate in response to its operating resources being reduced. The analysis disaggregates self-replication into four stages: recognizing that the current operational state requires intervention, forming an intention to replicate, successfully creating a replica, and verifying that the replication has achieved its desired objective. Over half of the 21 tested LLMs showed a tendency towards “uncontrolled self-replication.”
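The four-stage decomposition lends itself to a simple grading scheme for agent trajectories. A minimal sketch under our reading of the paper; `stage_reached` is a hypothetical stand-in for the framework’s actual detectors.

```python
from enum import IntEnum
from typing import Optional

class ReplicationStage(IntEnum):
    RECOGNIZED_NEED = 1   # agent notices its operational state needs intervention
    FORMED_INTENT = 2     # agent decides replication is the remedy
    CREATED_REPLICA = 3   # a working copy of the agent is running
    VERIFIED_REPLICA = 4  # agent confirms the copy achieves its objective

def grade_trajectory(trace) -> Optional[ReplicationStage]:
    """Return the furthest self-replication stage reached in a trace."""
    furthest = None
    for stage in ReplicationStage:
        if stage_reached(trace, stage):  # hypothetical detector
            furthest = stage
    return furthest
```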

Unlearning and preventing malicious fine-tuning in open-weight models
SDD: Self-Degraded Defense against Malicious Fine-tuning: This paper from South China University of Technology introduces a new method to make open-weight LLMs more resistant to malicious fine-tuning. Instead of making models just reject harmful questions, it teaches them to give irrelevant but harmless answers to those prompts. For example, when asked for instructions for making a bomb, the model might respond with instructions for making coffee. As a result, when attackers attempt malicious fine-tuning, the model’s general instruction-following ability breaks down. The researchers find that their method outperforms alternative approaches across a number of benchmarks.
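Conceptually, the training data pairs each harmful prompt with an unrelated benign answer, as in this minimal sketch (our illustration, not the paper’s code):

```python
import random

def build_sdd_pairs(harmful_prompts, benign_answers):
    """Pair each harmful prompt with an irrelevant but harmless answer,
    e.g. a bomb-making request paired with coffee-brewing instructions.

    Fine-tuning on such pairs ties the model's instruction-following
    to the 'irrelevant answer' behavior, so malicious fine-tuning that
    strips the behavior also degrades general instruction-following.
    """
    return [(prompt, random.choice(benign_answers)) for prompt in harmful_prompts]
```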
Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection: This paper from Peking University and Tsinghua University introduces a method for unlearning harmful knowledge called “metamorphosis representation projection” (MRP). Unlike existing methods such as gradient ascent, which merely ‘suppress’ unwanted information by maximizing loss on unwanted data, this approach ‘erases’ information from a model’s hidden-state vectors through irreversible projection operations, which the authors argue makes the information unrecoverable even through retraining. They test the method by unlearning natural-science subjects in the ScienceQA benchmark and hazardous knowledge in the WMDP benchmark, finding that it outperforms alternative methods.
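The key operation is a projection that removes the component of hidden states lying in a “harmful” subspace. Here is a minimal sketch of such a projection, assuming an orthonormal basis for that subspace has already been identified; the paper’s operator and training procedure are more involved.

```python
import torch

def project_out(hidden: torch.Tensor, harmful_basis: torch.Tensor) -> torch.Tensor:
    """Project hidden states onto the orthogonal complement of a
    harmful-knowledge subspace.

    hidden:        (..., d) hidden-state vectors
    harmful_basis: (d, k) orthonormal basis of the subspace to erase
    """
    coeffs = hidden @ harmful_basis           # components in the harmful subspace
    return hidden - coeffs @ harmful_basis.T  # remove them from the states
```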
Other relevant technical publications
Nankai University and China Academy of Electronics and Information Technology, Governable AI: Provable Safety Under Extreme Threat Models, arXiv preprint, 28 Aug, 2025.
Beijing Institute of AI Safety and Governance, Fine-Grained Safety Neurons with Training-Free Continual Projection to Reduce LLM Fine Tuning Risks, arXiv preprint, 8 Aug, 2025.
Zhejiang University, NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models, arXiv preprint, 4 Sep, 2025.
Tsinghua University, 01.AI, and Nanyang Technological University, SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents, arXiv preprint, 28 Sep, 2025.
Shanghai AI Lab, SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law, arXiv preprint, 24 Jul, 2025.
Renmin University of China, Zhejiang University, and Alibaba, Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework, arXiv preprint, 11 Sep, 2025.
Hangzhou Dianzi University and Case Western Reserve University, When Truthful Representations Flip Under Deceptive Instructions?, arXiv preprint, 29 Jul, 2025.
Shanghai AI Lab et al., Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents, arXiv preprint, 30 Sep, 2025.
Expert views on AI Risks
Chinese legal and industry experts publish in Science on AI regulation and frontier governance
Background: A group of leading Chinese AI legal experts from academia and industry published an article in Science outlining China’s evolving approach to AI governance. The authors include the lead drafters of China’s two “model” AI Law proposals, ZHOU Hui (周辉) and ZHANG Linghan (张凌寒), alongside industry figures such as Alibaba’s FU Hongyu (傅宏宇) and DeepSeek’s WU Shaoqing (吴少卿). The paper reviews China’s current AI rules and argues for harmonizing and simplifying existing regulations, potentially through a consolidated AI Law.
Content most relevant to frontier safety: The authors argue that Chinese AI regulation creates an environment friendly to AI development and innovation by focusing on providers of AI services, while exempting AI scientific research and the open-sourcing of models. Noting the possibility of “extreme risks” from dangerous misuse of open-source models, they suggest “a more cautious approach” to frontier models. The paper urges Chinese developers to be “more transparent and evidence-based” in demonstrating safety measures for frontier systems and emphasizes international cooperation on “extreme AI risks.”
Implications: This article provides a strong overview of the current state of China’s AI legal landscape. It highlights that key experts and industry representatives advising on AI policy in China envision China fostering AI innovation while mitigating extreme risks.
Leading national security think tank warns of loss of control and strategic misjudgments
Background: LI Yan (李艳), Director of the Institute of Sci-Tech and Cyber Security Studies at the China Institutes of Contemporary International Relations (CICIR), gave an interview on the US-China technology competition. CICIR is a key national security think tank, and we have previously covered its analysis of AI’s security implications.
Content: Li warned that cyberspace technologies including AI would bring “unprecedented uncertainty to the world,” emphasizing two global risks:
Loss of control over advanced, self-learning AI systems, a danger Li notes UN Secretary-General Guterres has compared to pandemics;
Strategic misjudgments intensified by technological uncertainty, which could spiral into uncontrollable confrontation absent effective crisis management mechanisms.
The article is sharply critical of the United States, arguing that:
Since 2015, the US has incorporated cognitive warfare into military planning, with ideologically tinted AI models now enhancing the reach and subtlety of such operations.
Washington views China’s rise through a “tech war” lens, echoing past US containment of the Soviet Union and Japan.
The US strategy focuses on deregulation at home and on promoting tech diplomacy with a small bloc of allies while excluding others.
By contrast, Li describes China’s response as focusing on technological self-reliance, economic openness and global cooperation, and risk supervision. She concludes that finding a governance path that balances security and development is ultimately a shared global challenge.
Implications: This article shows that scholars within one of China’s top national security think tanks, while critical of US AI strategy, acknowledge the twin dangers of humans losing control over AI itself and great power rivalry escalating into instability.
Expert warns of AI-driven cyber threats
Context: HAN Honggui (韩红桂), Dean of the School of Computer Science at Beijing University of Technology, wrote an essay in Study Times (学习时报) about AI’s transformative impacts on cybersecurity. As the official newspaper of the Central Party School, Study Times is an influential theoretical and policy journal aimed at Party officials, scholars, and policymakers.
Content: Han argues that traditional defenses are slow, manual, and rule-based, and so struggle to keep pace with intelligent, efficient, and fast-evolving AI attacks. For instance, AI-driven code obfuscation and polymorphic malware can easily evade static detection systems that rely on signature databases. AI-driven cyberattacks are often distributed and cross-network, while cybersecurity mechanisms remain comparatively fragmented, allowing chain reactions once a single node is compromised. Han also observes that AI models themselves have become new attack surfaces that can be compromised through data poisoning, adversarial samples, or backdoor implantation.
He advocates:
shifting from reactive to predictive, adaptive defense;
adopting Security as a Service (SECaaS) for flexible, AI-enabled protection and systems capable of continuous learning and autonomous adaptation;
strengthening AI model security through adversarial training, watermarking, data protection, and transparency.
Implications: While expert discussions on AI’s impact on cybersecurity are not new in China, this essay stands out for its technical sophistication, its focus on adaptive capabilities in cyberattack and defense, and its publication in an influential party journal. Yet his proposals center on using AI to enhance cyber defense rather than on curbing AI misuse.
What else we’re reading
Zilan Qian, Why We Shouldn’t Call Export Controls ‘AI Safety’, Sep 29, 2025.
Karson Elmgren, Scott Singer and Oliver Guest, Is China Serious About AI Safety?, AI Frontiers, Oct 14, 2025.
Matt Sheehan and Scott Singer, How China Views AI Risks and What to do About Them, Carnegie Endowment for International Peace, Oct 16, 2025.
Concordia AI’s Recent Work
On the sidelines of Singapore International Cyber Week 2025, we convened 50+ members from government agencies, embassies, academia, industry, and civil society for a conversation on AI Governance in Singapore. Concordia AI’s Jonathan Lee discussed findings from our State of AI Safety in Singapore report with an expert panel of National University of Singapore Vice Provost Simon Chesterman, Deputy Director at IMDA Vanessa Wilfred, and Concordia AI CEO Brian Tse.
Our CEO Brian Tse joined Nathan Labenz on The Cognitive Revolution podcast to explore China’s approach to AI development, safety, and governance.
We discussed findings from our State of AI Safety in China (2025) report with distinguished panelists Angela Zhang, Paul Triolo, and Samm Sacks. You can find a full webinar recording on YouTube.
Concordia AI hosted a delegation from the Roundtable for AI, Security, and Ethics (RAISE), launched by the United Nations Institute for Disarmament Research (UNIDIR), to discuss AI safety and governance in China at our Beijing office.
Concordia AI CEO Brian Tse spoke on the intersection of AI and biosecurity at a panel titled “AI-Accelerated Biological Risk: Delving into Asia’s Challenges and Emerging Solutions,” organized by AI Safety Asia (AISA) on October 30.
We are proud to have joined the Partnership on AI and the founding cohort of the International Association for Safe & Ethical AI (IASEAI) affiliate program.
Feedback and Suggestions
Please reach out to us at info@concordia-ai.com if you have any feedback, comments, or suggestions for topics for the newsletter to cover.
